I would like the following code to download the xlsx file from the URL and save it to my drive.
I receive this error:
AttributeError: 'str' object has no attribute 'content'
Below is the code:
import requests
import xlrd
import pandas as pd
filed = 'https://www.icicipruamc.com/downloads/others/monthly-portfolio-disclosures/monthly-portfolio-disclosure-november19/Arbitrage.xlsx'
resp = requests.get(filed)
workbook = xlrd.open_workbook(file_contents = filed.content)
worksheet = workbook.sheet_by_index(0)
first_row = worksheet.row(0)
df = pd.DataFrame(first_row)
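pandas already has a function that converts Excel directly into a pandas DataFrame (it used xlrd under the hood; recent pandas versions use openpyxl for .xlsx files):
import pandas as pd
# Placeholder URL; the scheme (https://) is needed, otherwise the string is treated as a local path.
MY_EXCEL_URL = "https://www.yes.com/xl.xlsx"
xl_df = pd.read_excel(MY_EXCEL_URL,
                      sheet_name='my_sheet',
                      skiprows=range(5),
                      skipfooter=0)
Then you can handle or save the file using pd.DataFrame.to_excel.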
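On the AttributeError itself: in the question's code, `filed` is the URL string, while the downloaded bytes are on the `Response` object that `requests.get` returns (`resp`), so `.content` has to be read from `resp`. A minimal sketch of that fix, assuming the goal is just to get the raw bytes; the helper name `fetch_bytes` and the 30-second timeout are illustrative choices, not part of the original code:

```python
import requests


def fetch_bytes(url, timeout=30):
    """Download url and return the raw response body as bytes.

    fetch_bytes and the timeout default are hypothetical, for illustration.
    """
    resp = requests.get(url, timeout=timeout)  # resp is a Response; url is just a str
    resp.raise_for_status()                    # raise on 404 and other HTTP errors
    return resp.content                        # .content lives on resp, not on url
```

The returned bytes can be written to a file opened in 'wb' mode, or wrapped in `io.BytesIO` and passed to `pd.read_excel` without touching the disk.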
The function below works; I tested the individual components. The ICICI URL in your question gives me a 404, so make sure the URL is valid and actually serves an Excel file before trying this out.
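import os
import requests
import pandas as pd

def excel_to_pandas(URL, local_path):
    # Expand "~" explicitly; open() does not do it.
    local_path = os.path.expanduser(local_path)
    resp = requests.get(URL)
    resp.raise_for_status()  # stop on a 404 instead of saving an error page
    with open(local_path, 'wb') as output:
        output.write(resp.content)
    df = pd.read_excel(local_path)
    return df

# Placeholder URL; requests needs the scheme (https://).
print(excel_to_pandas("https://www.websiteforxls.com", '~/Desktop/my_downloaded.xls'))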
As a footnote, this was super simple, and I'm disappointed you couldn't do this on your own. I might not have been able to do this 5 years ago, and that's why I decided to help.
If you want to code, learn the basics, literally the basics: classes, functions, variables, types, OOP principles. That's all you need to start. Then you need to learn how to search, and how to make different components work together the way you require them to. And with SO, if you show some effort, we are happy to help. We are a community, not a place to solve your homework. Try harder next time.
I am trying to read multi-level JSON with pandas and store the data in a DataFrame for further processing or printing. My main goal is to understand how to read data from each level of the JSON.
Here are my first steps, which work:
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
d = req.json()
#what is next step?
#something like this? df = pd.DataFrame.from_dict(d.Data)
Could you tell me how to read:
1st level (columns PageIndex, PageSize, TotalCount, Data)
2nd level (from Data, columns Code, Timestamp, Category, snapshots)
3rd level (from Data and snapshots, columns Code, DateFrom, DateTo, Type ...)
Do you have any good tips for further work with the data?
Or maybe you will tell me that pandas is not the best way to read JSON.
Here is the JSON:
my json file to download from OneDrive
{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}
Thank you
I think using pandas to process this JSON is not a good choice, because pandas is designed for structured (tabular) data, whereas your example is multi-level nested data.
But if you insist on doing that, you can extract the tabular parts from your JSON structure. For example, you can collect the objects under JSON_ROOT."Data"."snapshots" into a list and load that into a pd.DataFrame. Otherwise, you can only store the JSON structure as a string in a single pd.DataFrame column.
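The extraction described above can be sketched roughly as follows. This is only one possible reading of the three levels, using a heavily trimmed copy of the JSON from the question (most snapshot fields are dropped) in place of `req.json()`; the variable names `paging`, `data_df`, and `snapshots_df` are illustrative.

```python
import pandas as pd

# Trimmed copy of the response in the question; stands in for req.json().
d = {
    "PageIndex": 0,
    "PageSize": 2,
    "TotalCount": 100248,
    "Data": [
        {
            "Code": "859182400102974",
            "Timestamp": "2019-04-17T12:16:51Z",
            "Category": 0,
            "snapshots": [
                {"Code": "859182400102974", "DateFrom": "2016-12-31T23:00:00Z", "Type": "CCO"},
                {"Code": "859182400102974", "DateFrom": "2017-05-09T22:00:00Z", "Type": "CCO"},
            ],
        },
        {
            "Code": "859182400104897",
            "Timestamp": "2019-04-17T12:16:51Z",
            "Category": 0,
            "snapshots": [
                {"Code": "859182400104897", "DateFrom": "2016-11-18T23:00:00Z", "Type": "CCO"},
                {"Code": "859182400104897", "DateFrom": "2017-11-05T23:00:00Z", "Type": "CCO"},
            ],
        },
    ],
}

# 1st level: the paging fields sit directly on the top-level dict.
paging = {key: d[key] for key in ("PageIndex", "PageSize", "TotalCount")}

# 2nd level: one row per element of d["Data"]; the nested list is left out.
data_df = pd.DataFrame(
    [{k: v for k, v in item.items() if k != "snapshots"} for item in d["Data"]]
)

# 3rd level: one row per snapshot, collected across all Data items.
snapshots_df = pd.DataFrame(
    [snap for item in d["Data"] for snap in item["snapshots"]]
)

print(paging)
print(data_df)
print(snapshots_df)
```

Each level is just ordinary dict and list access on the parsed JSON; pandas only comes in once a level has been reduced to a flat list of records.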
The answers above have not left me any wiser.
So I will try to reduce my question to a single one.
How can I get a table with 4 columns:
Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City
This is my code, but it needs to be corrected and I do not know how. The code works, but it returns 30 columns, which is not exactly what I want.
import pandas as pd
import requests
import pandas.io.json as pd_json
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()
df = pd_json.json_normalize(fin,
                            record_path=['Data', 'snapshots'],
                            record_prefix='Data.',
                            errors='ignore'
                            )
print(df)
Thank you for your help.
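One way to get exactly those four columns is to build the rows by hand from the nested structure and only then hand them to pandas. The sketch below is an illustration under assumptions, not a definitive answer: `fin` is a trimmed copy of the JSON from the question standing in for `req.json()`, the helper name `snapshot_table` is hypothetical, and snapshots without an `Address` are assumed to be acceptable as empty cells.

```python
import pandas as pd

# Trimmed copy of the JSON from the question; stands in for req.json().
fin = {
    "Data": [
        {
            "Code": "859182400102974",
            "snapshots": [
                {"DateFrom": "2016-12-31T23:00:00Z", "Name": "Petra"},
                {"DateFrom": "2017-05-09T22:00:00Z", "Name": "Petra"},
            ],
        },
        {
            "Code": "859182400104897",
            "snapshots": [
                {
                    "DateFrom": "2016-11-18T23:00:00Z",
                    "Address": {"Street": "Okružní", "City": "Semovo Ústí", "PostCode": "39102"},
                },
                {
                    "DateFrom": "2017-11-05T23:00:00Z",
                    "Address": {"Street": "Okružní", "City": "Semovo Ústí", "PostCode": "39102"},
                },
            ],
        },
    ]
}


def snapshot_table(payload):
    """Return one row per snapshot with the four requested columns.

    snapshot_table is a hypothetical helper name used for illustration.
    """
    rows = []
    for item in payload["Data"]:
        for snap in item["snapshots"]:
            # Some snapshots in the sample have no Address at all.
            address = snap.get("Address", {})
            rows.append({
                "Data.Code": item["Code"],
                "Data.snapshots.DateFrom": snap["DateFrom"],
                "Data.snapshots.Address.Street": address.get("Street"),
                "Data.snapshots.Address.City": address.get("City"),
            })
    return pd.DataFrame(rows)


df = snapshot_table(fin)
print(df)
```

Alternatively, since the `json_normalize` call in the question already flattens `Address` into dotted column names, selecting a subset of its columns (for example with `df.reindex(columns=[...])`) should give a similar table; the exact column names depend on `record_prefix`.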