Converting print output into a dataframe - python

I am currently working on a World Bank data project and would like to convert the following output into a simple pandas DataFrame.
import pandas as pd
# pip install wbgapi in the command prompt if you don't already have it
import wbgapi
print(wbgapi.economy.info())
Note: you may need to pip install wbgapi if you do not already have it.
I wish to convert the output of this print statement, which pulls up what is essentially a table, into a DataFrame so that I can then use pandas functions on it (sort_values and so on).
I tried the following, which I was 99.9% sure wouldn't work since the output is of course a string, not a dict:
economies = wbgapi.economy.info()
df = pd.DataFrame(economies)
As for approaches such as converting the string to a list, I cannot figure out how to handle the string in question (the output of print(wbgapi.economy.info())) given its spacing and column-like layout, as opposed to a plain block of text.
Any help would be greatly appreciated.

Try this:
df = wbgapi.economy.DataFrame()
I would suggest having a look at the documentation either way:
https://pypi.org/project/wbgapi/
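Once you have it as a DataFrame, the usual pandas operations apply. A minimal sketch (the 'name' column is an assumption based on the wbgapi economy metadata; adjust to whatever columns you actually see):
import wbgapi as wb

df = wb.economy.DataFrame()  # one row per economy, as a regular pandas DataFrame
print(df.sort_values('name').head())  # 'name' is an assumed column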

To convert the output of wbgapi.economy.info() into a pandas DataFrame, you can use json_normalize(). Since pandas 1.0 it is available directly as pd.json_normalize() (the old pandas.io.json import is deprecated). One caveat: wb.economy.info() returns a print-friendly object rather than a plain list of dicts, so pass json_normalize() the underlying records instead, e.g. via wb.economy.list(). Here's an example that should work:
import pandas as pd
import wbgapi as wb
economies = list(wb.economy.list())
df = pd.json_normalize(economies)
In this code, we first import the necessary modules, then use wb.economy.list() to fetch the records we want, and finally pass them to pd.json_normalize() to build the DataFrame.
Note that json_normalize() expects a dict or a list of dicts, so no additional string manipulation is needed; the list of economy records can be passed to it directly.
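Where json_normalize() really earns its keep is with nested records, where nested keys are flattened into dotted column names. A small self-contained illustration (the record shape here is invented for the example, not actual wbgapi output):
import pandas as pd

records = [{'id': 'USA', 'region': {'id': 'NAC', 'value': 'North America'}}]
print(pd.json_normalize(records))
#     id region.id   region.value
# 0  USA       NAC  North America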
Once you have the DataFrame, you can use Pandas functions like sort_values() and others to manipulate and analyze the data.

Related

python/pandas : Pandas changing the value adding extra digits in values [duplicate]

I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.
When I import the csv file (and other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that the values are shown as 2470.6911370000003 which actually should be 2470.691137. Or the value 2484.30691 is shown as 2484.3069100000002.
This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via read_csv by giving the dtype argument as {'columnname': np.float64}. Still the issue did not go away.
How can I get the values imported and shown exactly as they are in the source csv file?
Pandas uses a dedicated decimal-to-binary converter that sacrifices accuracy for speed.
Passing float_precision='round_trip' to read_csv fixes this.
See the read_csv documentation for more detail on this option.
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" (with n the number of decimal places) to the corresponding method.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't quite use the same solution. Unfortunately the float_precision option only exists when using the C engine, not with the python engine. So if you have to use the python engine for some other reason (for example because the C engine can't deal with regex literals as delimiters), this little "trick" worked for me:
In the pd.read_csv arguments, set dtype=str and then convert your DataFrame to whatever dtype you want, e.g. df = df.astype('float64').
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.
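A minimal sketch of that workaround (file name, separator and column name are placeholders):
import pandas as pd

# python engine because the separator is a regex; float_precision is unavailable here
df = pd.read_csv('data.csv', sep=r';+', engine='python', dtype=str)
# convert the numeric column(s) back to floats afterwards
df['value'] = df['value'].astype('float64')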

How to get data from object in Python

I want to get the discord.user_id. I am VERY new to Python and just need help getting this data.
I have tried everything and there is no clear answer online.
Currently, this works to get a data point in the attributes section:
pledge.relationship('patron').attribute('first_name')
You should try this:
import pandas as pd
df = pd.read_json('path_to_your/file.json')
The output will be a DataFrame, a table-like structure in which the JSON attributes become the column names. You will have to manipulate it afterwards, which is preferable, as operations on DataFrames are optimized in terms of processing time.
Take a look at the official pandas documentation for read_json.
Assuming the whole object is called myObject, you can obtain the discord.user_id by calling myObject.json_data.attributes.social_connections.discord.user_id
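If what you have is the raw JSON payload rather than a wrapper object, the equivalent plain-dict access would look like the sketch below; the key path simply mirrors the attribute chain above and is an assumption about the payload shape:
import json

with open('pledge.json') as f:  # hypothetical file holding the API response
    data = json.load(f)

# dict-style equivalent of .attributes.social_connections.discord.user_id
user_id = data['attributes']['social_connections']['discord']['user_id']
print(user_id)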

python - import data from url to pandas

Can you please help me with code to import the data from this url into a pandas DataFrame? It is a time series of a mutual fund, and I need to do some statistical analysis and plotting.
http://tools.morningstar.it/api/rest.svc/timeseries_price/jbyiq3rhyf?currencyId=EURtype=Morningstar&frequency=daily&startDate=2008-04-01&priceType=&outputType=COMPACTJSON&id=F00000YU62]2]0]FOITA$$ALL&applyTrackRecordExtension=true
Any hint to help me understand how it works is appreciated.
Thanks
To get the result:
import requests
import pandas as pd
URL ='http://tools.morningstar.it/api/rest.svc/timeseries_price/jbyiq3rhyf?currencyId=EURtype=Morningstar&frequency=daily&startDate=2008-04-01&priceType=&outputType=COMPACTJSON&id=F00000YU62]2]0]FOITA$$ALL&applyTrackRecordExtension=true'
r = requests.get(URL)
# a = eval(r.content)  # never use eval on text fetched from the web
df = pd.DataFrame(r.json())
To understand what's going on:
In my answer I use a little trick that is not always recommended.
First, I used requests to get the data from the URL and then evaluated it using Python's eval function; as you can see, it's a nested list. But it's a better idea to use r.json().
pandas.DataFrame is a constructor that builds a DataFrame from several kinds of input, for example nested lists or JSON-like data (such as dictionaries).
In most cases, though, results from the web can become a pandas DataFrame using pd.read_csv, which parses the data using sep and lineterminator; see the sketch below.
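A minimal read_csv sketch (the URL is a placeholder for any CSV endpoint):
import pandas as pd

# read_csv accepts URLs directly; sep and lineterminator control the parsing
df = pd.read_csv('http://example.com/data.csv', sep=',', lineterminator='\n')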
Or you can just use pd.read_json(URL):
import pandas as pd
URL = "http://your.url.com/api"
pd.read_json(URL)

Converting python Dataframe to Matlab file

I am trying to convert a pandas DataFrame to a Matlab (.mat) file.
I initially have a txt file (an EEG signal) that I import using pandas.read_csv:
MyDataFrame = pd.read_csv("data.txt", sep=';', decimal='.'), data.txt being a 2D array with labels. This creates a DataFrame of labelled columns of numeric data.
In order to convert it to .mat, I tried this solution where the idea is to convert the dataframe into a dictionary of lists but after trying every aspect of this solution it's still unsuccessful.
scipy.io.savemat('EEG_data.mat', {'struct':MyDataFrame.to_dict("list")})
It did create a .mat file, but it did not save my dataframe properly: all the values are basically gone, and the remaining labels are empty when you look into them.
I also tried using mat4py which is designed to export python structures into Matlab files, but it did not work either. I don't understand why, because converting my dataframe to a dictionary of lists is exactly what should be done according to the mat4py documentation.
I believe that the reason the previous solutions haven't worked for you is that your DataFrame column names are not valid MATLAB struct field names, because they contain spaces and/or start with digit characters.
When I do:
import pandas as pd
import scipy.io
MyDataFrame = pd.read_csv('eeg.txt',sep=';',decimal='.')
truncDataFrame = MyDataFrame[0:1000] # reduce data size for test purposes
scipy.io.savemat('EEGdata1.mat', {'struct1':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with the 4 fields reltime, datetime, iSensor and quality. Each of these has 1000 elements, so the data from these columns has been converted, but the rest of your data is missing.
However if I first rename the DataFrame columns:
truncDataFrame.rename(columns=lambda x:'col_' + x.replace(' ', '_'), inplace=True)
scipy.io.savemat('EEGdata2.mat', {'struct2':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with 36 fields. This is not the same format as your mat4py solution but it does contain (as far as I can see) all the data from the source DataFrame.
(Note that in your question you are creating a .mat file that contains a variable called struct; when this is loaded into MATLAB it masks the built-in struct datatype, which might also cause issues with subsequent MATLAB code.)
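A quick way to sanity-check the saved file from the Python side (assuming the EEGdata2.mat written above):
import scipy.io

back = scipy.io.loadmat('EEGdata2.mat')
print(back['struct2'].dtype.names)  # field names of the saved struct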
I finally found a solution thanks to this post. There, the poster did not create a dictionary of lists but a dictionary of integers, which worked on my side. It is a small example, easily reproducible. Then I tried to manually add lists by entering values like [1, 2], and it did not work. But what worked was when I manually added tuples!
MyDataFrame needs to be converted to a dictionary, and if a dictionary of lists doesn't work, try tuples.
For beginners: lists are written with [] and tuples with ().
This worked for me:
import mat4py as mp
EEGdata = MyDataFrame.apply(tuple).to_dict()
mp.savemat('EEGdata.mat',{'structs': EEGdata})
EEGdata.mat should now be readable by Matlab, as it is on my side.

Can Pandas read a nested JSON blob without parsing sub JSON structures?

I'm trying to parse a JSON blob with Pandas without parsing the nested JSON structures. Here's an example of what I mean.
import json
import pandas as pd
x = json.loads('{"test":"something", "yes":{"nest":10}}')
df = pd.DataFrame(x)
When I do df.head() I get the following:
           test  yes
nest  something   10
What I really want is ...
        test           yes
1  something  {"nest": 10}
Any ideas on how to do this with Pandas? I have workaround ideas, but I'm parsing GBs of JSON files and do not want to be dependent on a slow for loop to convert and prep the information for Pandas. It would be great to do this efficiently while still utilizing the speed of Pandas.
Note: there's been a correction to this question to fix an error in my reference to JSON objects.
I'm trying to parse a JSON blob with Pandas
No you're not. You're just constructing a DataFrame out of a plain old Python dict. That dict might have been parsed from JSON somewhere else in your code, or it may never have been JSON in the first place. It doesn't matter; either way, you're not using Pandas's JSON parsing. In fact, if you did try to construct a DataFrame directly out of a JSON string, you would get a PandasError.
If you do use Pandas parsing, you can use its options, as documented in pandas.read_json. For example:
>>> j = '{"test": "something", "yes": {"nest": 10}}'
>>> pd.read_json(j, typ='series')
test something
yes {u'nest': 10}
dtype: object
(Of course that's obviously a Series, not a DataFrame. But I'm not sure exactly what you want your DataFrame to be here…)
But if you've already parsed the JSON elsewhere, you obviously can't use Pandas's data parsing on it.
Also:
… and do not want to be dependent on a slow for loop to convert and prep the information for Pandas …
Then use, e.g., a dict comprehension, generator expression, itertools function, or something else that can do the looping in C instead of in Python.
However, I doubt that the speed of looping over the JSON strings is actually a real performance issue here, compared to the cost of parsing the JSON, building the Pandas structures, etc. Figure out what's actually taking the time by profiling, then optimize that, instead of just picking some random part of your code and hoping it makes a difference.
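As a concrete illustration of keeping the per-record work out of hand-written Python bookkeeping: building the DataFrame from a sequence of parsed records preserves nested dicts as cell values, which is the behaviour asked for above (the two JSON strings here are made-up examples):
import json
import pandas as pd

raw = ['{"test": "a", "yes": {"nest": 10}}',
       '{"test": "b", "yes": {"nest": 20}}']

# generator expression: one parsed dict per record, looping stays in C/pandas
records = (json.loads(s) for s in raw)
df = pd.DataFrame(records)
print(df)
#   test           yes
# 0    a  {'nest': 10}
# 1    b  {'nest': 20}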
