Python: How to set up bounds for index

I'm trying to convert a JSON file to Excel and modify it.
After normalizing the JSON and trying to add columns, I get an error saying "index 20 is out of bounds for axis 0 with size 19". However, when I normalize three things from the JSON I don't get this error, but when I normalize just two things I do.
Here's my code:
import requests
import pandas as pd
from functools import reduce

def get_data(link: str):
    resp = requests.get(link)  # fetch the link
    txt = resp.json()
    data = pd.DataFrame(txt['products'])  # data
    return txt

def main():
    # get JSON data from the link
    json = get_data(link='https://0f91c5da166bc1b5a70cce01e1f0370c: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#shopeclat.myshopify.com/admin/api/2022-07/products.json')
    v = pd.json_normalize(json['products'], record_path=['variants'], meta=['id', 'title', 'body_html', 'vendor', 'product_type', 'created_at', 'updated_at', 'status', 'image', 'tags'], record_prefix='varients_')
    i = pd.json_normalize(json['products'], record_path=['images'], meta=['id', 'title', 'body_html', 'vendor', 'product_type', 'created_at', 'updated_at', 'status', 'image', 'tags'], record_prefix='images_')
    # merge both datasets on id
    df = [v, i]
    final_df = reduce(lambda left, right: pd.merge(left, right, on=['id'], how='outer'), df)
    print("Exporting csv files ....")
    final_df.to_csv('Bound.csv', index=False)

if __name__ == '__main__':
    main()

Maybe .explode() is what you want:
import requests
import pandas as pd

url = "https://0f91c5da166bc1b5a70cce01e1f0370c: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#shopeclat.myshopify.com/admin/api/2022-07/products.json"

df = (
    pd.DataFrame(requests.get(url).json()["products"])
    .explode("variants")
    .explode("options")
    .explode("images")
)

df = pd.concat(
    [
        df,
        df.pop("variants").apply(pd.Series).add_prefix("v_"),
        df.pop("options").apply(pd.Series).add_prefix("o_"),
        df.pop("images").apply(pd.Series).add_prefix("imgs_"),
    ],
    axis=1,
)

df.to_csv("out.csv", index=False)
Creates out.csv (screenshot from LibreOffice omitted).
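The explode/pop pattern above can be seen in isolation on a small in-memory sample (toy product data, not the Shopify payload):

```python
import pandas as pd

# Toy stand-in for the API payload: each product carries a list of variant dicts.
products = [
    {"id": 1, "title": "Shirt", "variants": [{"sku": "S"}, {"sku": "M"}]},
    {"id": 2, "title": "Mug", "variants": [{"sku": "STD"}]},
]

# explode() turns each list element into its own row.
df = pd.DataFrame(products).explode("variants").reset_index(drop=True)

# pop() removes the dict column; apply(pd.Series) expands each dict into columns.
df = pd.concat(
    [df, df.pop("variants").apply(pd.Series).add_prefix("v_")],
    axis=1,
)
print(df)
```

Each nested dict ends up as its own prefixed column (v_sku here), one row per variant, which is what the out.csv above contains for the real payload.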


Add rows back to the top of a dataframe

I have a raw dataframe that looks like this:
I am trying to import this data as a CSV, do some calculations on it, and then export it. Before doing this, however, I need to remove the three lines of "header information" but keep them, as I will need to add them back to the dataframe prior to exporting. I have done this using the following lines of code:
import pandas as pd
data = pd.read_csv(r"test.csv", header = None)
info = data.iloc[0:3,]
data = data.iloc[3:,]
data.columns = data.iloc[0]
data = data[1:]
data = data.reset_index(drop = True)
The problem I am having is: how do I add the rows stored in info back to the top of the dataframe, so that the format matches the CSV I imported?
Thank you
You can merge the two data frames by stacking them with pandas' concat() (the older DataFrame.append() method was removed in pandas 2.0). Check the result by printing final_data.
import pandas as pd

data = pd.read_csv(r"test.csv", header=None)
info = data.iloc[0:3]
data = data.iloc[3:]
data.columns = data.iloc[0]
data = data[1:]
data = data.reset_index(drop=True)
# The first row of data became the column header, so convert it back into a row
data = pd.concat([data.columns.to_frame().T, data], ignore_index=True)
data.columns = range(len(data.columns))
final_data = pd.concat([info, data])
final_data = final_data.reset_index(drop=True)
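The round trip can be checked end to end with a toy frame standing in for test.csv (the metadata values here are made up):

```python
import pandas as pd

# Toy stand-in for the CSV read with header=None: three metadata rows,
# then a column-header row, then the actual data.
raw = pd.DataFrame([
    ["meta1", ""],
    ["meta2", ""],
    ["meta3", ""],
    ["a", "b"],
    ["1", "2"],
    ["3", "4"],
])

info = raw.iloc[0:3]          # the three metadata rows
data = raw.iloc[3:]
data.columns = data.iloc[0]   # promote the first remaining row to header
data = data[1:].reset_index(drop=True)

# ...calculations on `data` would go here...

# Demote the header back to a row, restore integer column labels,
# and stack the metadata rows back on top.
data = pd.concat([data.columns.to_frame().T, data], ignore_index=True)
data.columns = range(len(data.columns))
restored = pd.concat([info, data], ignore_index=True)
print(restored)
```

After the round trip, restored has the same shape and values as the original raw frame, which is what the exported CSV needs.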

How to fill cell by cell of an empty pandas dataframe which has zero columns with a loop?

I need to scrape hundreds of pages, and instead of storing the whole JSON of each page I want to store just a few columns from each page in a pandas dataframe. However, at the beginning, when the dataframe is empty (no columns and no rows), I have a problem, and the loop below does not work correctly:
import pandas as pd
import requests

cids = [4100, 4101, 4102, 4103, 4104]
df = pd.DataFrame()
for i in cids:
    url_info = requests.get(f'myurl/{i}/profile')
    jdata = url_info.json()
    df['Customer_id'] = i
    df['Name'] = jdata['user']['profile']['Name']
    ...
In this case, what should I do?
You can solve this by using enumerate(), together with loc:
for index, i in enumerate(cids):
    url_info = requests.get(f'myurl/{i}/profile')
    jdata = url_info.json()
    df.loc[index, 'Customer_id'] = i
    df.loc[index, 'Name'] = jdata['user']['profile']['Name']
If you specify your column names when you create your empty dataframe, as follows:
df = pd.DataFrame(columns=['Customer_id', 'Name'])
then you can append a new one-row dataframe on each iteration of your for loop. Note that DataFrame.append() was removed in pandas 2.0, so use pd.concat() instead:
df = pd.concat([df, pd.DataFrame([{'Customer_id': i, 'Name': jdata['user']['profile']['Name']}])], ignore_index=True)
(plus any other columns you populate).
import pandas as pd
import requests

cids = [4100, 4101, 4102, 4103, 4104]
df = pd.DataFrame(columns=['Customer_id', 'Name'])
for i in cids:
    url_info = requests.get(f'myurl/{i}/profile')
    jdata = url_info.json()
    row = pd.DataFrame([{'Customer_id': i, 'Name': jdata['user']['profile']['Name']}])
    df = pd.concat([df, row], ignore_index=True)
It should be noted that growing a DataFrame row by row in a loop is usually inefficient (see here), so a better way is to save your results as a list of lists (df_data) and then turn that into a DataFrame, as below:
cids = [4100, 4101, 4102, 4103, 4104]
df_data = []
for i in cids:
    url_info = requests.get(f'myurl/{i}/profile')
    jdata = url_info.json()
    df_data.append([i, jdata['user']['profile']['Name']])
df = pd.DataFrame(df_data, columns=['Customer_id', 'Name'])
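The list-of-lists pattern can be checked without the network by stubbing the JSON payloads (the myurl endpoint and the 'user'/'profile' keys come from the question; the customer names here are made up):

```python
import pandas as pd

# Fake payloads standing in for url_info.json() responses, keyed by customer id.
fake_responses = {
    4100: {'user': {'profile': {'Name': 'Alice'}}},
    4101: {'user': {'profile': {'Name': 'Bob'}}},
}

# Collect one plain list per row, then build the DataFrame once at the end.
df_data = []
for i, jdata in fake_responses.items():
    df_data.append([i, jdata['user']['profile']['Name']])

df = pd.DataFrame(df_data, columns=['Customer_id', 'Name'])
print(df)
```

Building the frame once from df_data avoids the per-iteration copy that repeated concatenation incurs.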

To merge more than one list of table data and save as csv format using pandas

From the following code, when I iterate and print I get all the table data, but when I store it in CSV format using pandas I get only the first list of table data. How can I store all of them in a single CSV file?
import requests
import pandas as pd

isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    df_list = pd.read_html(html)
    dfs = df_list
    # print(dfs)
    for df in dfs:
        df.to_csv('data.csv', header=False, index=True)
        # print(df)
The idea is to collect all the data frames in dfs and then write them out as one CSV:
import requests
import pandas as pd

isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
dfs = []
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    dfs.extend(pd.read_html(html))

df = pd.concat(dfs)
df.to_csv('data.csv', header=False, index=True)
Your loop was overwriting the file. The version below does not save everything as one file but writes a separate file for each iteration, so you can see what was going wrong:
isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
k = 0
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    df_list = pd.read_html(html)
    dfs = df_list
    # print(dfs)
    for df in dfs:
        df.to_csv(str(k) + 'data.csv', header=False, index=True)
        # print(df)
        k = k + 1
An easy answer would be to use pd.concat() to create a new df and save that. What do you want the CSV to look like, though? The result of this concatenation would be:
["CSV"]: https://i.stack.imgur.com/BvZ1X.png
I don't know whether this is sufficient, as the data is not really labelled (which might be a problem if you plan to search for more than two funds).
import requests
import pandas as pd

funds = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
df_list = []
for fund in funds:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={fund}').content
    df_list.extend(pd.read_html(html))

df_final = pd.concat(df_list)
# print(df_final)
df_final.to_csv('data.csv', header=False, index=True)
(I replaced isin with fund, as isin is already used as a method name in pandas.)
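The labelling concern above can be addressed with the keys argument of pd.concat(), which tags every row with the fund it came from; a sketch on toy tables standing in for what pd.read_html would return:

```python
import pandas as pd

# Toy stand-ins for the tables pd.read_html would return for each fund.
tables = {
    'LU0526609390:EUR': pd.DataFrame({'Date': ['2022-07-01'], 'Close': [10.1]}),
    'IE00BHBX0Z19:EUR': pd.DataFrame({'Date': ['2022-07-01'], 'Close': [20.2]}),
}

# keys= adds an outer index level, so each row stays labelled with its fund
# even after everything lands in one CSV.
df = pd.concat(list(tables.values()), keys=list(tables.keys()), names=['fund', 'row'])
df.to_csv('data.csv', index=True)
print(df)
```

With the fund identifier in the index, searching for more than two funds in the combined file stays unambiguous.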

Formatting of JSON file

Can we convert the highlighted INTEGER values to STRING values? (refer to the image below)
https://i.stack.imgur.com/3JbLQ.png
CODE
import pandas as pd

filename = "newsample2.csv"
jsonFileName = "myjson2.json"

df = pd.read_csv(filename)
df.to_json(jsonFileName, indent=4)
print(df)
Try doing something like this:
import pandas as pd

filename = "newsample2.csv"
jsonFileName = "myjson2.json"

df = pd.read_csv(filename)
df['index'] = df.index
df.to_json(jsonFileName, indent=4)
print(df)
This will take the indices of your data and store them in an index column, so they become part of your data.
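If the goal is specifically to have the integer values appear as strings in the JSON output, casting the relevant columns with astype(str) before to_json is one way; a sketch with a toy CSV, since the real newsample2.csv is only shown in a screenshot (the id/qty column names are made up):

```python
import io
import pandas as pd

# Toy CSV standing in for newsample2.csv.
csv_text = "id,qty\n1,10\n2,20\n"
df = pd.read_csv(io.StringIO(csv_text))

# Cast the column whose integers should be serialized as JSON strings.
df['id'] = df['id'].astype(str)
json_text = df.to_json(indent=4)
print(json_text)
```

In the resulting JSON the id values are quoted strings while qty stays numeric.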

Generating Dataframe from JSON URL in a column in another DataFrame

I am trying to generate one dataframe based on the JSON URLs stored in a column of another dataframe called Data.
import requests
import pandas as pd
import numpy as np
resp = requests.get('https://financialmodelingprep.com/api/v3/company/stock/list')
txt = resp.json()
Data = pd.DataFrame(txt['symbolsList'])
Data = Data.assign(keymetric= 'https://financialmodelingprep.com/api/v3/company-key-metrics/'+ Data.symbol + '?period=quarter')
Data = Data.assign(profile= 'https://financialmodelingprep.com/api/v3/company/profile/'+ Data.symbol)
Data = Data.assign(financials= 'https://financialmodelingprep.com/api/v3/financial-statement-growth/'+ Data.symbol + '?period=quarter')
I have 3 problems:
1) When I download the JSON URL into the dataframe Data, the output does not include the symbol (in the code below, 'AAPL'):
resp = requests.get('https://financialmodelingprep.com/api/v3/company-key-metrics/AAPL?period=quarter')
txt = resp.json()
key = pd.DataFrame(txt['metrics'])
2) I don't know how to automate the code above, using the column 'keymetric' in the dataframe Data as input.
3) Once the process is done, I would like to have just one dataframe instead of one per symbol.
Expected output for keymetrics: each column should be separate, not all aggregated under one column called 'keymetric'.
This code should work:
import pandas as pd
import requests

resp = requests.get('https://financialmodelingprep.com/api/v3/company/stock/list')
txt = resp.json()
Data = pd.DataFrame(txt['symbolsList'])

def get_value(symbol):
    resp_keymetric = requests.get(f'https://financialmodelingprep.com/api/v3/company-key-metrics/{symbol}?period=quarter')
    resp_profile = requests.get(f'https://financialmodelingprep.com/api/v3/company/profile/{symbol}?period=quarter')
    resp_financials = requests.get(f'https://financialmodelingprep.com/api/v3/financial-statement-growth/{symbol}?period=quarter')
    try:
        txt_keymetric = resp_keymetric.json()['metrics'][0]
        txt_profile = resp_profile.json()['profile']
        txt_financials = resp_financials.json()['growth'][0]
        df_keymetric = pd.DataFrame([txt_keymetric])
        df_profile = pd.DataFrame([txt_profile])
        df_financials = pd.DataFrame([txt_financials])
        df = pd.concat([df_keymetric, df_profile, df_financials], axis=1)
        return df
    except (KeyError, IndexError, ValueError):
        return None  # skip symbols whose payload is missing or malformed

result = []
for symbol in Data['symbol'].values.tolist()[:5]:
    df = get_value(symbol)
    if df is not None:
        result.append(df)

result_df = pd.concat(result, axis=0)
print(result_df)
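The combine step (one-row frames glued side by side per symbol, then stacked) can be seen with stubbed payloads; the pe/sector/growth keys here are made up, not the real API schema. Inserting the symbol into each row also addresses problem 1:

```python
import pandas as pd

# Stub payloads standing in for the three API responses per symbol.
fake = {
    'AAPL': ({'pe': 25.0}, {'sector': 'Tech'}, {'growth': 0.1}),
    'MSFT': ({'pe': 30.0}, {'sector': 'Tech'}, {'growth': 0.2}),
}

result = []
for symbol, (keymetric, profile, financials) in fake.items():
    # One-row frame per payload, glued side by side (axis=1)...
    row = pd.concat(
        [pd.DataFrame([keymetric]), pd.DataFrame([profile]), pd.DataFrame([financials])],
        axis=1,
    )
    row.insert(0, 'symbol', symbol)  # keep the symbol so it survives stacking
    result.append(row)

# ...then all symbols stacked on top of each other (axis=0).
result_df = pd.concat(result, axis=0, ignore_index=True)
print(result_df)
```

Each key from the payloads becomes its own column, and the symbol column keeps the rows identifiable in the single combined dataframe.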
