Python- pull csv from ftp url with credentials - python

Newb to python and working with APIs. My source create an ftp url where they are dumping files on a daily basis and I would like to grab file to perform engineering + analysis. My problem is, how do I specify username and password to pull the csv?
import pandas as pd
data = pd.read_csv('http://site-ftp.site.com/test/cat/filename.csv)
How do I include credentials to this?
PS- url is fake for the sake of an example.

With older versions of Pandas, you can use something like requests.get() to download the CSV data into memory. Then you could use StringIO to make the data "file like" so that pd.read_csv() can read it in. This approach avoids having to first write the data to a file.
import requests
import pandas as pd
from io import StringIO
csv = requests.get("http://site-ftp.site.com/test/cat/filename.csv", auth=HTTPBasicAuth('user', 'password'))
data = pd.read_csv(StringIO(csv.text))
print(data)
From pandas 0.19.2, the pd.read_csv() function now lets you just pass the URL directly. For example:
data = pd.read_csv('http://site-ftp.site.com/test/cat/filename.csv')

Related

How do you run an excel function from within Python?

We are using an excel plugin to pull some data from an API. Our excel file contains a column with an entity identifier, and we use an excel formula to pull data for this entity from the internet.
Is it possible to run this from within Python?
I could export my pd.DataFrame to csv, open it with excel, append the data I want, and read it back into pandas... but is there a quicker way?
You can import the request and extract data using the Json() method
import pandas as pd
import requests
url = 'https://api.covid19api.com/summary'
r = requests.get(url)
json = r.json()
json
Then you have the data and just need to include it in your dataframe

Using Pandas, how to read a csv file inside a zip file which you fetch using an url[Python]

This url
https://ihmecovid19storage.blob.core.windows.net/latest/ihme-covid19.zip
contains 2 csv files, and 1 pdf which is updated daily, containing Covid-19 Data.
I want to be able to load the Summary_stats_all_locs.csv as a Pandas DataFrame.
Usually if there is a url that points to a csv I can just use df = pd.read_csv(url) but since the csv is inside a zip, I can't do that here.
How would I do this?
Thanks
You will need to first fetch the file, then load it using the ZipFile module. Pandas can read csvs from inside a zip actually, but the problem here is there are multiple, so we need to this and specify the file name.
import requests
import pandas as pd
from zipfile import ZipFile
from io import BytesIO
r = requests.get("https://ihmecovid19storage.blob.core.windows.net/latest/ihme-covid19.zip")
files = ZipFile(BytesIO(r.content))
pd.read_csv(files.open("2020_05_16/Summary_stats_all_locs.csv"))

How to download xlsx file from URL and save in data frame via python

I would like the following code to download the xlsx files from the URL and save in drive.
I receive this error:
AttributeError: 'str' object has no attribute 'content'
Below is the code:
import requests
import xlrd
import pandas as pd
filed = 'https://www.icicipruamc.com/downloads/others/monthly-portfolio-disclosures/monthly-portfolio-disclosure-november19/Arbitrage.xlsx'
resp = requests.get(filed)
workbook = xlrd.open_workbook(file_contents = filed.content)
worksheet = workbook.sheet_by_index(0)
first_row = worksheet.row(0)
df = pd.DataFrame(first_row)
pandas already has a function thas converts excel direclty into pandas dataframe (using xlrd):
import pandas as pd
MY_EXCEL_URL="www.yes.com/xl.xlsx"
xl_df = pd.read_excel(MY_EXCEL_URL,
sheet_name='my_sheet',
skiprows=range(5),
skipfooter=0)
then yo can handle /save file using pd.DataFrame.to_excel
This function works, tested individual components. The ICICI website you have seems to give me a 404. So make sure the website works and has an excel sheet before trying this out.
import requests
import pandas as pd
def excel_to_pandas(URL, local_path):
resp = requests.get(URL)
with open(local_path, 'wb') as output:
output.write(resp.content)
df = pd.read_excel(local_path)
return df
print(excel_to_pandas("www.websiteforxls.com", '~/Desktop/my_downloaded.xls'))
As a footnote, this was super simple. And I'm disappointed you couldn't do this on your own. I might not have been able to do this 5 years ago, and that's why I decided to help.
If you want to code. Learn the basics, literally the basics: Class, Functions, Variables, Types, OOP principals. And that's all you need to start. Then you need to learn how to search, and make different components to work together the way you require them too. And with SO, if you show some effort, we are happy to help. We are a community, not a place to solve your homework. Try harder next time.

Read multi-level json in python with pandas from url

I try to read multi-level JSON with pandas and store data in the data-frame for next work with it or for print. The main goal for me is to understand how to read data from each level of JSON.
Here you are my first steps, which works:
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
d = req.json()
#what is next step?
#something like this? df = pd.DataFrame.from_dict(d.Data)
Could you tell me, how to read:
1st level (columns PageIndex, PageSize, TotalCount, Data)
2 level (from Data columns Code, Timestamp, Category, snapshots)
3 level (from Data and snapshots columns Code, DateFrom, DateTo, Type ...)
some good tip for next work with data?
maybe you tell me, that using pandas is not the best way how to read JSON
Here is json:
my json file to download from OneDrive
{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}
Thank you
I think using pandas to process JSON is not a good choice, because pandas is trying to deal with structural data, but in your example you are dealing with multi-level unstructured data.
But if you insist to do that, you can extract structural data from your JSON structure. For example, you can extract the array in JSON_ROOT."Data"."snapshots" into an ArrayList and save it into pd.DataFrame. Otherwise, you can only save the JSON structure as a string in one column in pd.DataFrame.
From answers above I am not more clever as before.
So I try to reduce my question to one question.
How Can I get table with 4 columns:
Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City
This is my code, but it is necessary to correct it, but I do not how. The Code works but it returns 30 columns and not exactly what I want.
import pandas as pd
import requests
import pandas.io.json as pd_json
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()
df = pd_json.json_normalize(fin,
record_path=['Data','snapshots'],
record_prefix = 'Data.',
errors = 'ignore'
)
print(df)
Thank you for help.

How to use Python Web Scraping to download CSV file then convert it to Pandas Dataframe?

I'd like my script to do the following:
1) Access this website:
2) Import a CSV file titled "Sales Data with Leading Indicator"
3) Convert it to pandas Dataframe for data analysis.
Currently, the code I have is this:
response = request.urlopen("http://vincentarelbundock.github.io/Rdatasets/datasets.html")
csv = response.read()
Thanks in advance
pandas.read_csv() method accepts a URL to a csv file as its buffer, so
import pandas as pd
pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/datasets/BJsales.csv')
Should basically work. See further info here .

Categories