python - import data from url to pandas - python

Can you please help me import the data from this URL into a pandas DataFrame? It is a time series of a mutual fund, and I need to run some statistical analysis and plot it.
http://tools.morningstar.it/api/rest.svc/timeseries_price/jbyiq3rhyf?currencyId=EURtype=Morningstar&frequency=daily&startDate=2008-04-01&priceType=&outputType=COMPACTJSON&id=F00000YU62]2]0]FOITA$$ALL&applyTrackRecordExtension=true
Any hint is appreciated to understand how it works
thanks

Answer to get result:
import requests
import pandas as pd
URL ='http://tools.morningstar.it/api/rest.svc/timeseries_price/jbyiq3rhyf?currencyId=EURtype=Morningstar&frequency=daily&startDate=2008-04-01&priceType=&outputType=COMPACTJSON&id=F00000YU62]2]0]FOITA$$ALL&applyTrackRecordExtension=true'
r = requests.get(URL)
# Avoid: a = eval(r.content). Never use eval on text fetched from the web.
df = pd.DataFrame(r.json())
Answer to understand what's going on
The first version of my answer used a trick that is not generally recommended.
First, I used requests to get the data from the URL and then evaluated it with Python's eval function; as you can see, the result is a nested list. It is a better idea to use r.json().
pandas.DataFrame is a constructor that builds a DataFrame from several kinds of input, for example a nested list or JSON-like data (such as dictionaries).
In many cases, though, data from the web is delimited text that can be loaded with pd.read_csv, which parses it using the sep and lineterminator parameters.

Or you can just use pd.read_json(URL)
import pandas as pd
URL = "http://your.url.com/api"
pd.read_json(URL)
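For the statistics and plot mentioned in the question, a minimal sketch follows. It assumes the endpoint returns a nested list of [timestamp in milliseconds, price] pairs; the sample values and the column names timestamp_ms and price are made up for illustration, and the real payload may differ.

```python
import pandas as pd

# Hypothetical sample shaped like the nested list the API is assumed to return:
# [timestamp in milliseconds since the epoch, price]
sample = [
    [1207008000000, 100.0],
    [1207094400000, 101.5],
    [1207180800000, 99.8],
]

# Name the columns, then turn the millisecond timestamps into a datetime index
prices = pd.DataFrame(sample, columns=["timestamp_ms", "price"])
prices["date"] = pd.to_datetime(prices["timestamp_ms"], unit="ms")
prices = prices.set_index("date").drop(columns="timestamp_ms")

# Daily percentage returns and basic summary statistics
prices["return"] = prices["price"].pct_change()
summary = prices["price"].describe()
print(summary)
# prices["price"].plot() would draw the series if matplotlib is installed
```

With real data, replace sample with r.json() from the answer above.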

Related

Converting print output into a dataframe

I am currently working on a world bank data project, and would like to convert the following output into a simple pandas dataframe.
import pandas as pd
# Requires wbgapi; if it is not installed, run: pip install wbgapi
import wbgapi
print(wbgapi.economy.info())
Note, you may need to pip install wbgapi if you do not already have it.
I wish to convert the output of this print statement, which is essentially a table, so that I can use pandas functions such as sort_values on it.
I tried the following, which I was 99.9% sure wouldn't work because the output is of course a string, not a dict:
economies = wbgapi.economy.info()
df = pd.DataFrame(economies)
I have looked at approaches such as converting a string to a list, but I cannot figure out how to handle the string in question (the output of print(wbgapi.economy.info())), given its spacing and column-like layout rather than being a block of text.
Any help would be greatly appreciated.
Try this:
df = wbgapi.economy.DataFrame()
I would suggest having a look at the documentation either way:
https://pypi.org/project/wbgapi/
To convert the output of wbgapi.economy.info() into a pandas DataFrame, you can try the json_normalize() function, which is available as pd.json_normalize in current pandas versions (the older pandas.io.json import path is deprecated). Here's an example code snippet:
import pandas as pd
import wbgapi as wb
economies = wb.economy.info()
df = pd.json_normalize(economies)
In this code, we first import the necessary modules, pandas and wbgapi. Then, we use wb.economy.info() to get the data we want to convert to a DataFrame. Finally, we use pd.json_normalize() to convert the data to a DataFrame.
Note that json_normalize() expects a dictionary or a list of dictionaries, so this only works if wb.economy.info() returns data in that form; if it does, no additional string or list manipulation is needed and the economies data can be passed to it directly.
Once you have the DataFrame, you can use Pandas functions like sort_values() and others to manipulate and analyze the data.
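As a generic fallback for text that is already printed as aligned columns, pandas can parse fixed-width text with pd.read_fwf. The sketch below uses a made-up three-column sample, not real wbgapi output, and assumes the columns are separated by runs of spaces that line up in every row.

```python
import io
import pandas as pd

# Hypothetical column-aligned text, similar in spirit to a printed table
text = (
    "id   value        region\n"
    "ABW  Aruba        LCN\n"
    "AFG  Afghanistan  SAS\n"
)

# read_fwf infers the column boundaries from the whitespace shared by all rows
table = pd.read_fwf(io.StringIO(text))

# The result is a regular DataFrame, so sort_values and friends work
by_name = table.sort_values("value")
print(by_name)
```

When the library offers a direct DataFrame method, as in the first answer, that is usually the simpler route.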

I am having difficulty creating a pandas DataFrame from data at a particular URL. Could someone look into this?

import pandas as pd
daf = pd.read_html('https://github.com/justmarkham/DAT8/blob/master/data/beer.txt' )
This extracts the dataset from the URL, but I am having trouble setting up the DataFrame with the required index. Please let me know how to organise the dataset properly. If my question is unclear, run the code and you should see what I am asking.
You could use pandas.read_csv:
import pandas as pd
# Use the raw file URL; the /blob/ page is an HTML page, not the data itself
daf = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/beer.txt', sep=' ')
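Since the question is about getting the required index, here is a minimal sketch of read_csv with a space separator and index_col. The inline sample rows and the column names name, calories and sodium are assumptions for illustration; check the real file's header before relying on them.

```python
import io
import pandas as pd

# Hypothetical space-separated sample with a header row
text = (
    "name calories sodium\n"
    "BeerA 144 15\n"
    "BeerB 151 19\n"
)

# sep=" " splits on single spaces; index_col picks the column used as the row index
beers = pd.read_csv(io.StringIO(text), sep=" ", index_col="name")
print(beers)
```

The same arguments apply when a URL is passed instead of the StringIO object.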

How to get data from object in Python

I want to get the discord.user_id, I am VERY new to python and just need help getting this data.
I have tried everything and there is no clear answer online.
currently, this works to get a data point in the attributes section
pledge.relationship('patron').attribute('first_name')
You should try this:
import pandas as pd
df = pd.read_json('path_to_your/file.json')
The output will be a DataFrame, a table in which the JSON attributes become the column names. You will have to manipulate it afterwards, which is preferable because operations on DataFrames are optimized for processing time.
Here is the official documentation, take a look.
Assuming the whole object is called myObject, you can obtain the discord.user_id by calling myObject.json_data.attributes.social_connections.discord.user_id
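If the data is available as a plain nested dictionary (an assumption; the real object may expose attributes or methods instead), the same path can be followed with keys, or flattened with pd.json_normalize. The payload below is made up for illustration.

```python
import pandas as pd

# Hypothetical payload; the real object's layout may differ
payload = {
    "attributes": {
        "first_name": "Ada",
        "social_connections": {"discord": {"user_id": "12345"}},
    }
}

# Walk the nesting one key at a time
user_id = payload["attributes"]["social_connections"]["discord"]["user_id"]

# Or flatten the nesting into dotted column names
flat = pd.json_normalize(payload)
print(user_id)
print(flat.columns.tolist())
```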

implement a text classifier with python

I am trying to implement a Persian text classifier with Python. I use Excel to store my data and build my data set.
I would be thankful for any suggestions on a better implementation.
I tried the code below to access the body of the messages that meet my conditions and store them. I took a screenshot of my Excel file to help explain.
For example, I want to store the body of the messages whose "foolish" column (column F) has the value 1 (true).
https://ibb.co/DzS1RpY "screenshot"
import pandas as pd
file='1.xlsx'
sorted=pd.read_excel(file,index_col='foolish')
var=sorted[['body']][sorted['foolish']=='1']
print(var.head())
The expected result is the body of rows 2, 4, 6 and 8.
Try assigning like this:
df_data=df["body"][df["foolish"]==1.0]
Don't use - in variable names; it is a Python operator. Use _ (underscore) instead.
Also note that this returns a Series.
For a DataFrame, use:
df_data = pd.DataFrame(df['body'][df["foolish"]==1.0])
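The filtering above can also be written with a boolean mask and .loc. This is a minimal sketch on a small made-up table; only the column names body and foolish come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet
df = pd.DataFrame(
    {
        "body": ["msg a", "msg b", "msg c", "msg d"],
        "foolish": [1, 0, 1, 0],
    }
)

# Boolean mask: True for the rows where the flag is set
mask = df["foolish"] == 1

# A single column label returns a Series; a list of labels returns a DataFrame
foolish_bodies = df.loc[mask, "body"]
foolish_frame = df.loc[mask, ["body"]]
print(foolish_bodies)
```

Note that the mask only works if foolish is a regular column; reading the file with index_col='foolish' moves it into the index instead.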

Can Pandas read a nested JSON blob without parsing sub JSON structures?

I'm trying to parse a JSON blob with Pandas without parsing the nested JSON structures. Here's an example of what I mean.
import json
import pandas as pd
x = json.loads('{"test":"something", "yes":{"nest":10}}')
df = pd.DataFrame(x)
When I do df.head() I get the following:
           test  yes
nest  something   10
What I really want is ...
        test           yes
1  something  {"nest": 10}
Any ideas on how to do this with Pandas? I have workaround ideas, but I'm parsing GBs of JSON files and do not want to be dependent on a slow for loop to convert and prep the information for Pandas. It would be great to do this efficiently while still utilizing the speed of Pandas.
Note: This question has been corrected to fix an error in my reference to JSON objects.
I'm trying to parse a JSON blob with Pandas
No you're not. You're just constructing a DataFrame out of a plain old Python dict. That dict might have been parsed from JSON somewhere else in your code, or it may never have been JSON in the first place. It doesn't matter; either way, you're not using Pandas's JSON parsing. In fact, if you did try to construct a DataFrame directly out of a JSON string, you would get a PandasError.
If you do use Pandas parsing, you can use its options, as documented in pandas.read_json. For example:
>>> j = '{"test": "something", "yes": {"nest": 10}}'
>>> pd.read_json(j, typ='series')
test        something
yes     {u'nest': 10}
dtype: object
(Of course that's obviously a Series, not a DataFrame. But I'm not sure exactly what you want your DataFrame to be here…)
But if you've already parsed the JSON elsewhere, you obviously can't use Pandas's data parsing on it.
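If the records are already parsed into dicts, one way to get the layout shown in the question is to pass a list of records to the DataFrame constructor, which keeps each nested dict as a single cell. This is a minimal sketch with made-up input lines; it is a suggestion, not something the original answer proposed.

```python
import json
import pandas as pd

# Hypothetical input: one JSON document per line
lines = [
    '{"test": "something", "yes": {"nest": 10}}',
    '{"test": "other", "yes": {"nest": 20}}',
]

# A list of dicts gives one row per record; nested dicts are stored as objects
records = [json.loads(line) for line in lines]
df = pd.DataFrame(records)

# Optional: turn the nested dicts back into JSON strings
df["yes_json"] = df["yes"].map(json.dumps)
print(df)
```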
Also:
… and do not want to be dependent on a slow for loop to convert and prep the information for Pandas …
Then use, e.g., a dict comprehension, generator expression, itertools function, or something else that can do the looping in C instead of in Python.
However, I doubt that the speed of looping over the JSON strings is actually a real performance issue here, compared to the cost of parsing the JSON, building the Pandas structures, etc. Figure out what's actually taking the time by profiling, then optimize that, instead of just picking some random part of your code and hoping it makes a difference.
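A minimal profiling sketch using the standard-library cProfile and pstats modules is below. The load function and the generated input lines are made up for illustration; substitute your own loading code.

```python
import cProfile
import io
import json
import pstats

import pandas as pd

# Hypothetical input: 1000 small JSON documents
lines = ['{"test": "something", "yes": {"nest": %d}}' % i for i in range(1000)]


def load(json_lines):
    """Parse each line, then build a DataFrame from the records."""
    records = [json.loads(line) for line in json_lines]
    return pd.DataFrame(records)


# Profile only the loading step
profiler = cProfile.Profile()
profiler.enable()
df = load(lines)
profiler.disable()

# Write the ten most expensive calls by cumulative time into a string
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report shows how the time splits between the json.loads calls and the DataFrame construction, which is the comparison the answer recommends making before optimizing.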
