Convert multicolumn json to pandas dataframe - python

I want to convert a json file to a pandas dataframe.
My json's format is:
{'n': '{"nbTxs":952,"address":0xaf6787931e7369113b667c2cb09449de88951144,"nbTxsSend":473,"TotalAmountOut":5.649219999999994e-13,"nbConnectedAddressesReceived":15,"nbDistinctAmount":313,"MaxAmountReceived":3.75e-14,"MaxAmountSend":3.1795e-14,"community":222249,"type":Person,"AmountRatio":0.9769629723144068,"nbDistinctAmountSend":278,"MaxAmount":3.75e-14,"nbTxsReceived":479,"nbConnectedAddressesSend":12,"TotalAmountIn":5.782429999999999e-13,"nbConnectedAddresses":25,"nbDistinctAmountReceived":285}',
's': '{"tx_hash":0x097d22cd626396f9034e6aa3222d34684eea4b0fa755c5f5a21f32910dfd6b75,"value_eth":1.57e-15,"timestamp":2017-07-28 06:57:09 UTC}',
'i': '{"date":Jun 2017,"address":0x1f573d6fb3f13d689ff844b4ce37794d79a7ff1c,"name":Bancor,"raised":$153,000,000,"type":ICO}'},
{'n': '{"nbTxs":7298,"address":0xcab007003d241d7e7e8c1092ab93911b669ebd0c,"nbTxsSend":3340,"TotalAmountOut":6.252459000000004e-12,"nbConnectedAddressesReceived":18,"nbDistinctAmount":2243,"MaxAmountReceived":3.033e-14,"MaxAmountSend":3.1996e-14,"community":222249,"type":Person,"AmountRatio":1.0002945308767859,"nbDistinctAmountSend":1629,"MaxAmount":3.1996e-14,"nbTxsReceived":3958,"nbConnectedAddressesSend":6,"TotalAmountIn":6.25061800000001e-12,"nbConnectedAddresses":20,"nbDistinctAmountReceived":1770}',
's': '{"tx_hash":0x198427cb789f34b14a222bf7a9ed95e52ed545fe6378ee3efcecee845f10f1ce,"value_eth":1.119e-15,"timestamp":2017-08-21 13:50:53 UTC}',
'i': '{"date":Jun 2017,"address":0x1f573d6fb3f13d689ff844b4ce37794d79a7ff1c,"name":Bancor,"raised":$153,000,000,"type":ICO}'}
Using:
import pandas as pd
import json
from pandas import json_normalize
with open("data/ico.json", "r") as f:
    data = json.loads(f.read())
df_nested_list = pd.json_normalize(data, record_path=['n'])
I get the TypeError:
{'n': '{"nbTxs":186,"address":0x6c5208924c5b302f756a79776a8b2918a041ad4d,"nbTxsSend":186,"TotalAmountOut":0.0,"score_behavior":31,"nbDistinctAmount":1,"MaxAmountSend":0.0,"community":9040,"type":Person,"s[...]"score_bulk":10}
for path n. Must be list or null.
I tried to use the pd.json_normalize(data, record_path =[...]) function to convert my json into a pandas dataframe.
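The error message suggests that record_path expects the value under 'n' to be a list of records, whereas here it is a single string. The inner strings are also not strict JSON (values such as the hex address or Person are unquoted), so json.loads would likely reject them as well. Below is a minimal sketch of one possible workaround, assuming every inner value follows the "key":value pattern shown above. The shortened sample row, the regular expression and the helper names parse_loose and flatten are illustrative assumptions, not part of the original post.

```python
import re

import pandas as pd

# Hypothetical sample: one row, shortened from the data in the question.
rows = [
    {
        "n": '{"nbTxs":952,"address":0xaf67,"type":Person}',
        "s": '{"tx_hash":0x097d,"value_eth":1.57e-15,"timestamp":2017-07-28 06:57:09 UTC}',
        "i": '{"date":Jun 2017,"name":Bancor,"raised":$153,000,000,"type":ICO}',
    },
]

# Matches "key":value, where the value runs up to the next ,"key": or the final brace.
PAIR = re.compile(r'"(\w+)":(.*?)(?=,"\w+":|\}$)')


def parse_loose(text):
    """Parse a pseudo-JSON string with unquoted values into a dict of strings."""
    return dict(PAIR.findall(text))


def flatten(row):
    """Merge the parsed 'n', 's' and 'i' objects into one dict with prefixed keys."""
    flat = {}
    for outer_key, text in row.items():
        for key, value in parse_loose(text).items():
            flat[f"{outer_key}.{key}"] = value
    return flat


df = pd.DataFrame([flatten(row) for row in rows])

# Every value is still a string at this point; convert numeric columns explicitly.
df["n.nbTxs"] = pd.to_numeric(df["n.nbTxs"])
print(df.T)
```

Prefixing the column names with the outer key keeps fields such as type apart, since both 'n' and 'i' contain one.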

Related

Converting an xlsx file to a dictionary in Python pandas

I am trying to import a dataframe from an xlsx file into Python and then convert this dataframe to a dictionary. This is what my Excel file looks like:
   A  B
1  a  b
2  c  d
where A and B are names of columns and 1 and 2 are names of rows.
I want to convert the data frame to a dictionary in python, using pandas. My code is pretty simple:
import pandas as pd
my_dict = pd.read_excel(r'.\inflation.xlsx', sheet_name='Sheet2', index_col=0).to_dict()
print(my_dict)
What I want to get is:
{'a':'b', 'c':'d'}
But what I get is:
{'b':{'c':'d'}}
What might be the issue?
This does what is requested:
import pandas as pd
d = pd.read_excel(r'.\inflation.xlsx', sheet_name='Sheet2', index_col=0, header=None).transpose().to_dict('records')[0]
print(d)
Output:
{'a': 'b', 'c': 'd'}
The to_dict() function takes an orient parameter which controls the layout of the returned dictionary (for example 'dict', 'list', 'records' or 'index'). Other orientations may suit you better if you have more rows.
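To make the effect of orient concrete, here is a small sketch that builds the frame in memory rather than reading the Excel file. The column name B and the index values are assumptions chosen to mirror the sheet in the question.

```python
import pandas as pd

# Stand-in for the sheet: index a/c, a single column holding b/d.
df = pd.DataFrame({"B": ["b", "d"]}, index=["a", "c"])

by_column = df.to_dict()            # {'B': {'a': 'b', 'c': 'd'}}
by_index = df.to_dict("index")      # {'a': {'B': 'b'}, 'c': {'B': 'd'}}
as_records = df.to_dict("records")  # [{'B': 'b'}, {'B': 'd'}]
flat = df["B"].to_dict()            # {'a': 'b', 'c': 'd'}

print(by_column, by_index, as_records, flat, sep="\n")
```

Selecting the single column first and calling to_dict() on the resulting Series gives the flat mapping the question asks for without a transpose.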
This should work
import pandas as pd
my_dict = pd.read_excel(r'.\inflation.xlsx', sheet_name='Sheet2', header=0, index_col=None).to_dict('records')
print(my_dict)

How to write a list of dicts to a csv and create a pandas dataframe from the CSV buffer in Python?

I am trying to create a csv object from a list of dicts and then create a pandas dataframe by reading that csv object as a string buffer. But the resulting pandas dataframe doesn't look right to me, and I am not sure how to format it correctly. Could anyone suggest the right approach? Here is the code I am using:
import pandas as pd
import io
import csv
data = [{"x":123,"y":146},{"x":146,"y":None},
{"x":187,"y":123},{"x":114,"y":184},{"x":1328,"y":977}]
output = io.StringIO()
writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(data)
output.getvalue()
pd.read_csv(io.StringIO(output.getvalue()))
The last line generates the following in one single line:
{'x': 123, 'y': 146} {'x': 146, 'y': None} {'x': 187, 'y': 123} {'x': 114, 'y': 184} {'x': 1328, 'y': 977}
I would like to format this as x and y as column names and the respective values as the rows.
Thanks
You can use the following code:
import pandas as pd
data = [{"x":123,"y":146},
{"x":146,"y":None},
{"x":187,"y":123},
{"x":114,"y":184},
{"x":1328,"y":977}]
data = pd.DataFrame(data)
data.to_csv("/tmp/test.csv", index=False)
pd.read_csv('/tmp/test.csv', index_col=None)
Here is a much easier way to do it
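import io
import json
import pandas as pd
data = [{"x":123,"y":146},{"x":146,"y":None},
{"x":187,"y":123},{"x":114,"y":184},{"x":1328,"y":977}]
data = json.dumps(data)
# Recent pandas versions expect a file-like object here rather than a raw JSON string
df = pd.read_json(io.StringIO(data))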
print(df)
Output:
      x      y
0   123  146.0
1   146    NaN
2   187  123.0
3   114  184.0
4  1328  977.0
Note that column "y" is coerced to float because NaN is a float.
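If the CSV really has to live in an in-memory buffer, as in the question, csv.DictWriter may be closer to the original intent than csv.writer, because it writes one row per dict and a header row taken from the keys. A minimal sketch, assuming the column order x, y:

```python
import csv
import io

import pandas as pd

data = [{"x": 123, "y": 146}, {"x": 146, "y": None},
        {"x": 187, "y": 123}, {"x": 114, "y": 184}, {"x": 1328, "y": 977}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["x", "y"])
writer.writeheader()     # first line: x,y
writer.writerows(data)   # one CSV line per dict; None is written as an empty field

buffer.seek(0)           # rewind so read_csv starts at the beginning
df = pd.read_csv(buffer)
print(df)
```

The original attempt passed the whole list to writerow, which treats each dict as a single cell of one row; that is why everything ended up on one line.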

How to read JSON with Pandas along with the list of dictionary

How can I convert a dictionary into a dataframe with pandas? I want to get all dictionary values into separate columns of the dataframe.
{
"jobId":"3355f555ffr1af3fae56b8b74d02",
"runVerId":"333",
"totalNumberofJobs":30,
"startIndex":0,
"issue":[
{
"id":"00a9a6248fhf9849fj45",
"path":"",
"jobId":"33fj484jjfjb74d02",
"plugin":"SSL",
"vcid":"763.2",
"method":null,
"flawDescription":"testdjfk kkdkdkrikssllss",
"flawRemediation":"Lakkdjnjdjj jdjdjkiedksk kskskkfkfk",
"paramType":"ASIS",
"paramName":"NONE",
"paramDescription":null,
"originalArg":"ddsd",
"injectedArg":"",
"referrerUrl":null,
"host":"",
"port":8020,
"found":null,
"secure":null,
"insecure":true,
"rawEvent":"sjsjjjhjjduennnjfjfiibcbckskscbjkkkskkkfdfdfdfdfdsX3Byb3RvY29scyBUTFN2MS4xIFRMU3YxLjJcIiBpbiB0aG45df5f2f1g2fgf5g12f12df121f2df1d2f12d2vIGRlddsdjskskkskskkcncncmJlODciLCJvcmlnaW5hbGFyZyI6IlNTTHYzIiwicGFyYW1kZXNjcmlwdGlvbiI6InNzbDMiLCJwYXJhbW5hbWUiOiJOT05FIiwicGFyYW10eXBlIjoiQVNJUyIsInBsdWdpbiI6IlNTTCIsInBvcnQiOjgwLCJyZWZlcnJlcnVybCI6IiIsInJlcHJvIjpbXSwicmVxdWVzdCI6IiIsInJlc3BvbnNlIjoiIiwic2VjdXJlIjpmYWxzZSwidGltZXN0YW1wIjoxNTQ2NTUwNDA4MTU4LCJ2Y2lkIjoiNzU3LjgwMiJ9"
}
]
}
You can use the json_normalize function of pandas:
import json
import pandas as pd
with open('pathToJson.json') as data_file:
    data = json.load(data_file)
# 'issue' is the nested list to expand into rows; the second list names the
# top-level fields to repeat on every row (add the field names you want here)
df = pd.json_normalize(data, 'issue', ['jobId', 'runVerId', 'totalNumberofJobs', 'startIndex'],
                       record_prefix='issue_')
Here df is your dataframe, and the columns that come from the nested data have names starting with the prefix issue_.
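To see which columns this produces without a file on disk, here is a small sketch using a trimmed copy of the JSON from the question. The second entry in the issue list is a hypothetical addition, included only to show that each nested record becomes its own row.

```python
import pandas as pd

data = {
    "jobId": "3355f555ffr1af3fae56b8b74d02",
    "runVerId": "333",
    "issue": [
        {"id": "00a9a6248fhf9849fj45", "plugin": "SSL", "port": 8020},
        # Hypothetical second issue, not present in the original data.
        {"id": "hypothetical-id", "plugin": "SSL", "port": 8021},
    ],
}

df = pd.json_normalize(
    data,
    record_path="issue",          # the nested list to expand into rows
    meta=["jobId", "runVerId"],   # top-level fields repeated on every row
    record_prefix="issue_",       # prefix for columns coming from the nested records
)
print(df.columns.tolist())
# ['issue_id', 'issue_plugin', 'issue_port', 'jobId', 'runVerId']
```

The prefix also keeps the nested jobId field from the full data apart from the top-level jobId used as a meta column.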

Python - How To Convert Pandas Dataframe To JSON Object?

I'm using df.to_json() to convert dataframe to json. But it gives me a json string and not an object.
How can I get JSON object?
Also, when I'm appending this data to an array, it adds single quote before and after the json and it ruins the json structure.
How can I export to json object and append properly?
Code Used:
array = []
array.append(df1.to_json(orient='records', lines=True))
array.append(df2.to_json(orient='records', lines=True))
Result:
['{"test":"w","param":1}', '{"test":"w2","param":2}']
Required Result:
[{"test":"w","param":1},{"test":"w2","param":2}]
Thank you!
I believe you need to create a dict and then convert it to JSON:
import json
d = df1.to_dict(orient='records')
j = json.dumps(d)
Or if possible:
j = df1.to_json(orient='records')
Here's what worked for me:
import pandas as pd
import json
df = pd.DataFrame([{"test":"w","param":1},{"test":"w2","param":2}])
print(df)
  test  param
0    w      1
1   w2      2
So now we convert to a json string:
d = df.to_json(orient='records')
print(d)
'[{"test":"w","param":1},{"test":"w2","param":2}]'
And now we parse this string to a list of dicts:
data = json.loads(d)
print(data)
[{'test': 'w', 'param': 1}, {'test': 'w2', 'param': 2}]
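Applied to the original problem of collecting rows from several dataframes, the same idea might look like the sketch below. The frames df1 and df2 are stand-ins, since the question does not show their contents.

```python
import json

import pandas as pd

# Stand-in frames; the real df1 and df2 are not shown in the question.
df1 = pd.DataFrame([{"test": "w", "param": 1}])
df2 = pd.DataFrame([{"test": "w2", "param": 2}])

records = []
for df in (df1, df2):
    # to_json gives a string; json.loads turns it into a list of plain dicts,
    # and extend adds those dicts individually instead of nesting the list.
    records.extend(json.loads(df.to_json(orient="records")))

payload = json.dumps(records)  # serialize once, at the very end
print(payload)
```

Appending the output of to_json directly stores strings in the list, which is where the extra quotes in the question come from.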

Python - Pandas - read_csv - Convert a csv into a dictionary to be passed to pd.DataFrame

I have a csv containing some ticks of a currency pair:
{'bid':1.2, 'instrument': 'EUR_USD'}
{'bid':1.5, 'instrument': 'EUR_USD'}
I would like to convert this csv into a dictionary, e.g.:
mydict = {0:{'bid':1.2, 'instrument': 'EUR_USD'}, 1: {'bid':1.5, 'instrument': 'EUR_USD'}}
or to whatever iterable that can be read by the pandas DataFrame class:
pd.DataFrame(mydict)
which would yield the following dataframe:
   bid instrument
0  1.2    EUR_USD
1  1.5    EUR_USD
In other words, the labels of the csv should be the columns of the dataframe.
Do you have any suggestions on how to achieve this?
python 3.4.1
pandas 0.15.2
setup
from io import StringIO
import pandas as pd
txt = """{'bid':1.2, 'instrument': 'EUR_USD'}
{'bid':1.5, 'instrument': 'EUR_USD'}"""
# read each line as one string (sep=';' keeps the commas intact) and take
# the single resulting column as a pandas Series of dictionary strings
s = pd.read_csv(StringIO(txt), sep=';', header=None)[0]
solution
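import ast
# ast.literal_eval only parses Python literals, so it is safer than eval on file contents
pd.DataFrame(s.apply(ast.literal_eval).tolist())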
