# https://old.reddit.com/r/learnpython/comments/ql7m0c/rejoining_a_data_frame_after_a_scrape_on_index/
# ifreeski420.py

import pandas as pd
# https://tqdm.github.io/
from tqdm import tqdm


def get_bio(url, index):
    # ...code to scrape profile bio...
    # Some of the URL rows are empty
    # and I think it de-couples from the index
    # when trying to merge everything back together.
    s = f"get_bio({url}, index)" if url != "url_2" else "bio not found"
    df = pd.DataFrame([s], columns=["bio"])
    print(df)
    return df
#                      bio
# 0  get_bio(url_0, index)
#                      bio
# 0  get_bio(url_1, index)
#              bio
# 0  bio not found
#                      bio
# 0  get_bio(url_3, index)
#                      bio
# 0  get_bio(url_4, index)


df_list = []
df = pd.DataFrame({'player_profile': [f"url_{i}" for i in range(5)]})
print("\nInitial df")
print(df)
# Initial df
#    player_profile
# 0           url_0
# 1           url_1
# 2           url_2
# 3           url_3
# 4           url_4

# for athlete_row in tqdm(df.iterrows()):
for athlete_row in df.iterrows():
    # iterrows() yields (row label, row Series) tuples.
    url = athlete_row[1]['player_profile']
    # athlete_row.index would be the tuple's built-in index() method,
    # not the row label, so take the first tuple element instead.
    index = athlete_row[0]
    data = get_bio(url, index)
    ## VERY SUSPICIOUS!
    ## data is undefined when get_bio() raises error
    # try:
    #     data = get_bio(url, index)
    # except:
    #     continue
    df_list.append(data)

final_bio_frame = pd.concat(df_list).reset_index(drop=True)
print("\nfinal_bio_frame")
print(final_bio_frame)
# final_bio_frame
#                      bio
# 0  get_bio(url_0, index)
# 1  get_bio(url_1, index)
# 2          bio not found
# 3  get_bio(url_3, index)
# 4  get_bio(url_4, index)

final = pd.merge(df, final_bio_frame, how='left', left_index=True, right_index=True)
print("\nfinal")
print(final)
# final
#    player_profile                    bio
# 0           url_0  get_bio(url_0, index)
# 1           url_1  get_bio(url_1, index)
# 2           url_2          bio not found
# 3           url_3  get_bio(url_3, index)
# 4           url_4  get_bio(url_4, index)
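The commented-out try/except in the script hints at why the bios can drift away from their rows: if a failed scrape is skipped with `continue`, the list of results becomes shorter than the original frame, and `reset_index(drop=True)` then renumbers the survivors so they merge onto the wrong rows. A minimal sketch of one way around this, assuming a hypothetical `fetch_bio()` stand-in for the real scraper (it is not part of the original script), is to key each result by the row label it came from:

```python
import pandas as pd


def fetch_bio(url):
    """Hypothetical stand-in for the real scraper; fails for one URL."""
    if url == "url_2":
        raise ValueError(f"no bio at {url}")
    return f"bio for {url}"


df = pd.DataFrame({"player_profile": [f"url_{i}" for i in range(5)]})

bios = {}
for index, row in df.iterrows():
    try:
        bios[index] = fetch_bio(row["player_profile"])
    except ValueError:
        # Skip the failed row; its label is simply absent from `bios`.
        continue

# A Series built from the dict is indexed by the original row labels,
# so join() aligns on them and leaves NaN where the scrape failed.
final = df.join(pd.Series(bios, name="bio"))
print(final)
```

Because the join aligns on labels rather than on position, a skipped row cannot shift the bios that follow it. For a progress bar, the loop iterable could be wrapped as `tqdm(df.iterrows(), total=len(df))`, since `iterrows()` is a generator and does not report its own length.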
Related posts
- Helper class for tracking the progress of iteration in CLI
- I have this function I have written that shows how much of a percentage is done given progress in a loop..so..if you are iterating through a loop that is 500 long, at 200 it says "40%",240 "48%", and so on, but, how do you just change the value on the screen, not print a new one on a new line?
- I keep getting this issue, can anyone help??
- [2022 Day11 (Part2)] [python] brute force
- How to implement a progress bar for non verbose commands?