DataFrame.at creates a new row instead of inserting value - python

I have a function that loops through two lists, makes API calls based on the values contained in those lists and then merges responses into a single dataframe as below:
import json
import numpy as np
import pandas as pd
import requests

master_df = pd.DataFrame()
links = [109340678,60375713,...]
ids = [353474,184335,...]

for id in ids:
    url = "https://myurl/{}.json".format(id)
    response = requests.get(url, headers=headers)  # 'headers' is defined elsewhere
    json_obj = json.loads(response.content)
    data = json_obj['data']
    df = pd.DataFrame.from_dict(data)
    df = df.set_index('field').T
    # replace a particular pattern with NaN, insert content from the loop below
    df = df.replace(to_replace='.*pattern.*', value=np.NaN, regex=True)
    for link in links:
        url_download = "https://myurl/{id}/{link}".format(id=id, link=link)
        response = requests.get(url_download, headers=headers)
        df.at[0, link] = response.content
    master_df = master_df.append(df)
I save the results in an Excel workbook, but get the following, where the value is response.content:
I inspected every iteration of the loop and can't quite understand why, despite specifying 0 as the row index, I get an extra row created every time.
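For context on the behavior itself: .at, like .loc, enlarges the frame when given a label that does not exist in the index, and after df.set_index('field').T the row index holds the former 'field' values rather than the integer 0, so df.at[0, link] adds a new row labeled 0. A minimal sketch of that behavior, with made-up data:

import pandas as pd

# after set_index('field').T, the single row is labeled by the old column name
df = pd.DataFrame({'a': [1]}, index=['value'])

df.at[0, 'a'] = 99        # label 0 is not in the index: a new row is created
print(df.index.tolist())  # ['value', 0]

df.at['value', 'a'] = 99  # existing label: the value is updated in place

Using df.at[df.index[0], link] = response.content targets the existing row regardless of its label.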

Related

Scraping Table Data from Multiple URLS, but first link is repeating

I'm looking to iterate through the URL with "count" as a variable ranging from 1 to 65.
Right now I'm close, but I'm really struggling to figure out the last piece: I'm receiving the same table (from variable 1) 65 times, instead of receiving the different tables.
import requests
import pandas as pd

url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc/{count}'

res = []
for count in range(1, 65):
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[-1]
    res.append(df)

print(res)
df.to_csv('my data.csv')
Any thoughts?
A few errors:
Your URL was templated incorrectly: it still contains {count} literally, because the string is never formatted with the loop variable.
If you want to get pages 1 to 65, use range(1, 66), since range excludes the stop value.
Unless you want to export only the last dataframe, you need to concatenate all of them first.
# No count here, we will add it later
url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc'

res = []
for count in range(1, 66):
    # pd.read_html accepts a URL too, so no need to make a separate request
    df_list = pd.read_html(f"{url}/{count}")
    res.append(df_list[-1])

pd.concat(res).to_csv('my data.csv')
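One note on the snippet above: pd.read_html depends on an HTML parser being installed (lxml, or BeautifulSoup with html5lib), so the call assumes one of those is available in the environment.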

Is it possible to merge several data sets into one DataFrame from a function?

The function uses a name from the loop and produces 10 or more data sets. I want to merge them into one DataFrame and write it to a CSV file.
import pandas as pd
import requests

def f(name):
    url = "https://apiurl"
    response = requests.get(url, params={'page': 1})
    records = []
    for page_number in range(1, response.json().get("pages") + 1):
        response = requests.get(url, params={'page': page_number})
        records += response.json().get('records')
    df = pd.DataFrame(records)
    return df
The for loop that calls the function:
for row in valdf.itertuples():
    name = valdf.loc[row.Index, 'Account_ID']
    df1 = f(name)
    print(df1)
When I tried df.to_csv('df.csv'), it only keeps the last result from the loop. Is it possible to merge them into one DataFrame and export?
Create a list outside of the loop and use pd.concat():
dfs = []
for row in valdf.itertuples():
    name = valdf.loc[row.Index, 'Account_ID']
    df1 = f(name)
    dfs.append(df1)

all_df = pd.concat(dfs)
This assumes the dataframes all share the same columns; otherwise pd.concat will fill the missing ones with NaN.
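To get the single exported file the question asks about, write the combined frame once after the loop; ignore_index=True renumbers the rows if the per-call frames reuse the default integer index (the file name here is just an example):

all_df = pd.concat(dfs, ignore_index=True)
all_df.to_csv('all_records.csv', index=False)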

convert json to dataframe in for loops in python

I'm trying to call the data using an API and make a dataframe using for loops with the returned JSON. I am able to create the first dataframe, but my for loop only returns the first json -> dataframe. After a few days of struggling, I decided to ask for guidance from the experts here..
import requests
import json
import pandas as pd

# create an empty DataFrame object
df = pd.DataFrame()

# api header
headers = {"Accept": "application/json", "Authorization": "api_secret"}

# emails for the loop
email_list = ["abc@gmail.com", "xyz@gmail.com"]

# supposed to read the 2 emails in the list and append each df, but only reads the first one...
for i in email_list:
    querystring = {"where": i}
    response = requests.request("GET", "https://example.com/api/2.0/export", headers=headers, params=querystring)
    with open('test.jsonl', 'w') as writefile:
        writefile.write(response.text)
    data = [json.loads(line) for line in open('test.jsonl', 'r')]
    FIELDS = ["event"]
    df = pd.json_normalize(data)[FIELDS]
    df = df.append(df)
I wonder if I need to change something in the df append, but I can't pinpoint what needs to be changed. Thank you so much in advance!
df = pd.json_normalize(data)[FIELDS]
df = df.append(df)
overwrites the dataframe each time. Instead, create a new one before appending:
df2 = pd.json_normalize(data)[FIELDS]
df = df.append(df2)
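Worth noting as an aside: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the usual shape of the same fix is to collect the frames in a list and concatenate once after the loop. A minimal sketch reusing the question's names:

frames = []
for i in email_list:
    # ... same request/parse steps as in the question ...
    frames.append(pd.json_normalize(data)[FIELDS])
df = pd.concat(frames, ignore_index=True)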

How to create a loop that appends new rows to CSV each time it loops?

I'm working with an API and trying to pull out a complete list of surveys by looping through every user's API token. My idea for the loop is that it reads each API token (stored in a list) one at a time, stores the data, converts it to a pandas dataframe, and then stores the data in a CSV file. I've so far created a script that successfully loops through the list of API tokens but just overwrites the CSV file each time. Here is my script currently:
import json
import pandas as pd
import requests

apiToken = ["n0000000000001", "N0000000002"]

for x in apiToken:
    baseUrl = "https://group.qualtrics.com/API/v3/surveys"
    headers = {
        "x-api-token": x,
    }
    response = requests.get(baseUrl, headers=headers)
    surveys = response.text
    surveys2 = json.loads(response.text)
    surveys3 = surveys2["result"]["elements"]
    df = pd.DataFrame(surveys3)
    df.to_csv('survey_list.csv', index=False)
What changes do I need to make in order for the new rows to be appended on to the CSV rather than overwrite it?
Assuming your code is right, this should work: make a list of the separate dataframes and concat them together. This works if all the DataFrames in the list (dfs) have the same column names.
apiToken = ["n0000000000001", "N0000000002"]

dfs = []
for x in apiToken:
    baseUrl = "https://group.qualtrics.com/API/v3/surveys"
    headers = {
        "x-api-token": x,
    }
    response = requests.get(baseUrl, headers=headers)
    surveys = response.text
    surveys2 = json.loads(response.text)
    surveys3 = surveys2["result"]["elements"]
    dfs.append(pd.DataFrame(surveys3))

final_df = pd.concat(dfs)
final_df.to_csv('survey_list.csv', index=False)
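A small follow-up note: each per-token frame carries its own default 0..n index, so pd.concat(dfs, ignore_index=True) is worth considering if you want the combined rows renumbered before writing the CSV.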
Using the csv package, you can append lines to your CSV as follows:
import csv
...
with open(filename, 'a', newline='') as filecsv:  # newline='' avoids blank lines on Windows
    writer = csv.writer(filecsv)
    line = [...]  # whatever csv line you want to append
    writer.writerow(line)

Why is my for loop overwriting instead of appending CSV?

I am trying to scrape the IB website. I have created the URLs to iterate over, and I am able to extract the required information, but it seems the dataframe keeps being overwritten instead of appended to.
import pandas as pd
from pandas import DataFrame as df
from bs4 import BeautifulSoup
import csv
import requests

base_url = "https://www.interactivebrokers.com/en/index.php?f=2222&exch=mexi&showcategories=STK&p=&cc=&limit=100"

n = 1
url_list = []
while n <= 2:
    url = (base_url + "&page=%d" % n)
    url_list.append(url)
    n = n + 1

def parse_websites(url_list):
    for url in url_list:
        html_string = requests.get(url)
        soup = BeautifulSoup(html_string.text, 'lxml')  # Parse the HTML as a string
        table = soup.find('div', {'class': 'table-responsive no-margin'})  # Grab the first table
        df = pd.DataFrame(columns=range(0, 4), index=[0])  # I know the size
        for row_marker, row in enumerate(table.find_all('tr')):
            column_marker = 0
            columns = row.find_all('td')
            try:
                df.loc[row_marker] = [column.get_text() for column in columns]
            except ValueError:
                # It's a safe way when [column.get_text() for column in columns] is an empty list.
                continue
        print(df)
        df.to_csv('path_to_file\\test1.csv')

parse_websites(url_list)
Can you please take a look at my code and advise what I am doing wrong?
One solution, if you want to append the data frames to the file, is to write in append mode:
df.to_csv('path_to_file\\test1.csv', mode='a', header=False)
otherwise you should create the data frame outside as mentioned in the comments.
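One caveat with append mode: with header=False on every write the file never gets a header row, while the default header=True would repeat the header for every chunk. A common idiom, assuming the question's path, is to write the header only when the file does not exist yet:

import os

path = 'path_to_file\\test1.csv'
df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)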
If you define a data structure from within a loop, each iteration of the loop will redefine the data structure, meaning that the work is being overwritten. The dataframe should be defined outside of the loop if you do not want it to be overwritten.
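A minimal sketch of that advice applied to the question's loop (names reused from the snippet above; the per-row parsing is simplified):

def parse_websites(url_list):
    rows = []  # accumulate rows across all pages instead of rebuilding df per URL
    for url in url_list:
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        table = soup.find('div', {'class': 'table-responsive no-margin'})
        for row in table.find_all('tr'):
            cells = [td.get_text() for td in row.find_all('td')]
            if cells:  # skip header/empty rows
                rows.append(cells)
    df = pd.DataFrame(rows)  # one frame, built once, outside the per-URL loop
    df.to_csv('path_to_file\\test1.csv', index=False)  # single write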
