I'm a novice when it comes to Python, and to learn it I have been working on a side project. My goal is to track the prices of my YGO cards using the yu-gi-oh prices API: https://yugiohprices.docs.apiary.io/#
I am manually entering the print tag for each card and then having the API pull the data and populate the spreadsheet with the card's name and trait in addition to the price data, so that the sheet is refreshed every time I run the code.
My idea was to use a for loop to have the API look up each print tag, store the information in an empty dictionary, and then write the results to the Excel file (a rough sketch of this idea is shown after my code below). I added an example of the spreadsheet.
Please let me know if I can clarify further. Any suggestions to the code that would help me achieve the goal of this project would be appreciated. Thanks in advance.
import requests
import json
import pandas as pd
df = pd.read_excel("api_ygo.xlsx")
print(df[:5]) # see the first 5 rows
response = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag')
print(response.json())
data = []
for i in df:
    print_tag = i[2]
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag' + print_tag)
    data.append(print_tag)
print(data)
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(response.json())
Example Spreadsheet
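The loop-and-dictionary flow described in the question could look roughly like the sketch below. This is only a minimal illustration: the column name print_tag and the output file name are assumptions, and the answer that follows shows a cleaner apply-based version.
import requests
import pandas as pd

df = pd.read_excel("api_ygo.xlsx")  # spreadsheet with a print_tag column (assumed column name)
results = {}  # print tag -> API response

for tag in df["print_tag"]:
    resp = requests.get("http://yugiohprices.com/api/price_for_print_tag/" + tag)
    results[tag] = resp.json()

# flatten the collected responses and write them out next to the tags
pd.json_normalize(list(results.values())).to_excel("api_ygo_prices.xlsx", index=False)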
Iterating over a pandas dataframe can be done using df.apply(). This has the added advantage that you can store the results directly in your dataframe.
First define a function that returns the desired result. Then apply the relevant column to that function while assigning the output to a new column:
import requests
import pandas as pd
import time
df = pd.DataFrame(['EP1-EN002', 'LED6-EN007', 'DRL2-EN041'], columns=['print_tag']) #just dummy data, in your case this is pd.read_excel
def get_tag(print_tag):
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/' + print_tag) #this url works, the one in your code wasn't correct
    time.sleep(1) #sleep for a second to prevent sending too many API calls per minute
    return request.json()
df['result'] = df['print_tag'].apply(get_tag)
You can now export this column to a list of dictionaries with df['result'].tolist(). Or even better, you can flatten the results into a new dataframe with pd.json_normalize:
df2 = pd.json_normalize(df['result'])
df2.to_excel('output.xlsx') # save dataframe as new excel file
I am working with this feed: https://www.goalserve.com/getfeed/2633a3fcbb2b4558740708d89fcf1b20/football/nfl-standings
I watched a couple of videos about extracting JSON objects with Python, but I haven't been able to find any that deal with objects nested within objects.
import pandas as pd
import random
import numpy as np
import requests
import json
#%%
response = requests.get("https://www.goalserve.com/getfeed/2633a3fcbb2b4558740708d89fcf1b20/football/nfl-standings?json=1")
output = response.json()
# Extract specific node content.
data = output['standings']['category']['league']
# print(output['standings']['category']['league'])
#print(type(data))
#for team in range(len(data)):
# print(team['division'])
# Dump data as string
data = json.dumps(output, indent=4)
print(data)
for team in data['division']['team']:
    print(team)
I get this error: "string indices must be integers". How can I access the attributes for each team record?
Hi. Looking at your code, one flaw I can see is that there is no validation of the response. Add these two checks before manipulating the data:
if response.status_code == 200
if response and response.text
After validating, you can pass the response on for the data manipulation.
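A minimal sketch of how those checks might be wired in before parsing (the URL is the one from the question; the rest is plain requests usage):
import requests

response = requests.get("https://www.goalserve.com/getfeed/2633a3fcbb2b4558740708d89fcf1b20/football/nfl-standings?json=1")

if response.status_code == 200 and response.text:  # only parse when the call succeeded and returned a body
    output = response.json()
else:
    print("Request failed with status", response.status_code)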
There are a couple of problems here. The first is that this line:
data = json.dumps(output, indent=4)
converts the data structure back into a string, so you can no longer index it like a dictionary.
The second problem is that the data structure here is a mix of dictionaries and lists, so you need to treat data, which is the list of leagues, as a list rather than a dictionary. The same goes for league['division'] and division['team'].
Here's a version that prints the team names:
import pandas as pd
import random
import numpy as np
import requests
import json
response = requests.get("https://www.goalserve.com/getfeed/2633a3fcbb2b4558740708d89fcf1b20/football/nfl-standings?json=1")
output = response.json()
data = output['standings']['category']['league']
for league in data:
    for division in league['division']:
        for team in division['team']:
            print(team['name'])
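Since the goal is to access the attributes of each team record, here is a follow-up sketch that collects every team dictionary from the nested structure and flattens it into a pandas DataFrame. It continues from the data variable above and makes no assumptions about which fields the feed provides:
import pandas as pd

teams = [team
         for league in data
         for division in league['division']
         for team in division['team']]

teams_df = pd.json_normalize(teams)  # one row per team, one column per attribute in the feed
print(teams_df.head())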
I would like the following code to download the xlsx file from the URL and save it to my drive.
I receive this error:
AttributeError: 'str' object has no attribute 'content'
Below is the code:
import requests
import xlrd
import pandas as pd
filed = 'https://www.icicipruamc.com/downloads/others/monthly-portfolio-disclosures/monthly-portfolio-disclosure-november19/Arbitrage.xlsx'
resp = requests.get(filed)
workbook = xlrd.open_workbook(file_contents = filed.content)
worksheet = workbook.sheet_by_index(0)
first_row = worksheet.row(0)
df = pd.DataFrame(first_row)
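The traceback points at the root cause: .content is read from filed, which is the URL string, instead of resp, the response object. A minimal sketch of the direct fix, continuing from the code above and assuming an xlrd version that can still open .xlsx files:
# continuing from resp = requests.get(filed) above
workbook = xlrd.open_workbook(file_contents=resp.content)  # bytes of the response, not the URL string
worksheet = workbook.sheet_by_index(0)
first_row = worksheet.row(0)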
pandas already has a function that converts an Excel file directly into a pandas DataFrame (using xlrd):
import pandas as pd
MY_EXCEL_URL="www.yes.com/xl.xlsx"
xl_df = pd.read_excel(MY_EXCEL_URL,
                      sheet_name='my_sheet',
                      skiprows=range(5),
                      skipfooter=0)
then you can handle/save the file using pd.DataFrame.to_excel.
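For example, a one-line sketch of writing the frame back out (the local file name is just an assumption):
xl_df.to_excel('local_copy.xlsx', index=False)  # write the DataFrame to a local Excel file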
This function works; I tested the individual components. The ICICI URL you have gives me a 404, so make sure the website works and actually serves an Excel sheet before trying this out.
import requests
import pandas as pd
def excel_to_pandas(URL, local_path):
    resp = requests.get(URL)
    with open(local_path, 'wb') as output:
        output.write(resp.content)
    df = pd.read_excel(local_path)
    return df
print(excel_to_pandas("www.websiteforxls.com", '~/Desktop/my_downloaded.xls'))
As a footnote, this was super simple. And I'm disappointed you couldn't do this on your own. I might not have been able to do this 5 years ago, and that's why I decided to help.
If you want to code, learn the basics, literally the basics: classes, functions, variables, types, OOP principles. That's all you need to start. Then you need to learn how to search and how to make different components work together the way you require them to. And with SO, if you show some effort, we are happy to help. We are a community, not a place to solve your homework. Try harder next time.
I am trying to read multi-level JSON with pandas and store the data in a DataFrame for further work or for printing. My main goal is to understand how to read data from each level of the JSON.
Here are my first steps, which work:
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
d = req.json()
#what is next step?
#something like this? df = pd.DataFrame.from_dict(d.Data)
Could you tell me how to read:
the 1st level (columns PageIndex, PageSize, TotalCount, Data)
the 2nd level (from Data: columns Code, Timestamp, Category, snapshots)
the 3rd level (from Data and snapshots: columns Code, DateFrom, DateTo, Type ...)
Do you have a good tip for further work with the data?
Or maybe you will tell me that pandas is not the best way to read this JSON.
Here is the JSON (my JSON file is also available to download from OneDrive):
{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo 
Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}
Thank you
I think using pandas to process this JSON is not a good choice, because pandas is designed for structured, tabular data, while in your example you are dealing with multi-level, nested data.
But if you insist on doing that, you can extract tabular data from your JSON structure. For example, you can extract the array at JSON_ROOT."Data"."snapshots" into a list and load it into a pd.DataFrame. Otherwise, you can only store the JSON structure as a string in a single column of a pd.DataFrame.
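A minimal sketch of that extraction, assuming d is the parsed response from the first snippet (pd.json_normalize, or pandas.io.json.json_normalize on older pandas, does the same flattening in one call):
import pandas as pd

# collect every snapshot dictionary from every item in Data
snapshots = [snap for item in d['Data'] for snap in item['snapshots']]
snapshots_df = pd.DataFrame(snapshots)

# equivalent one-liner that also flattens nested fields such as Address
snapshots_df = pd.json_normalize(d, record_path=['Data', 'snapshots'])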
From the answers above I am no wiser than before.
So I will try to reduce my question to a single question.
How can I get a table with 4 columns:
Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City
This is my code, but it needs correcting and I do not know how. The code works, but it returns 30 columns and not exactly what I want.
import pandas as pd
import requests
import pandas.io.json as pd_json
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()
df = pd_json.json_normalize(fin,
                            record_path=['Data', 'snapshots'],
                            record_prefix='Data.',
                            errors='ignore')
print(df)
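To narrow that output down to the four requested columns, a minimal sketch continuing from the df produced above; the column names follow from record_prefix='Data.' (the Code inside each snapshot matches the parent Data.Code in the sample JSON), and Address fields that are missing in some snapshots simply come out as NaN:
wanted = ['Data.Code', 'Data.DateFrom', 'Data.Address.Street', 'Data.Address.City']
df_small = df.reindex(columns=wanted)  # keep only these columns, fill any missing ones with NaN
print(df_small)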
Thank you for help.