Convert JSON with nested dicts into a data frame with Python - python

Can someone explain how I can convert the following JSON into a simple data frame with the following headings?
----- sample----
{
"last_scanned_block": 14968718,
"blocks": {
"13965799": {
"0x9603846aff5c425277e483de16179a68dbc739debcc5449ea99e45c9d0924430": {
"165": {
"from": "0x0000000000000000000000000000000000000000",
"to": "0x01f87c337be5636Cd9B3D48F1159768A7e7837A5",
"value": 100000000000000000000000000,
"timestamp": "2022-01-08T16:19:02"
}
}
},
"13965820": {
"0xd4a4122734a522c40504c8b0ab43b9aa40ac821cd9913179b3ae64e5b166fc57": {
"226": {
"from": "0x01f87c337be5636Cd9B3D48F1159768A7e7837A5",
"to": "0xEa3Fa123Eb40CEEaeED390D8d6dE6AF95f044AF7",
"value": 610000000000000000000000,
"timestamp": "2022-01-08T16:25:12"
}
}
},
--- end----
I'd like the df to have the following 8 column headings and values for each row
(value examples for first row)
Last_scanned_block: 14968718
block: 13965799
hex: 0x9603846aff5c425277e483de16179a68dbc739debcc5449ea99e45c9d0924430
number: 165
from: 0x0000000000000000000000000000000000000000
to: 0x01f87c337be5636Cd9B3D48F1159768A7e7837A5
value: 100000000000000000000000000
timestamp: 2022-01-08T16:19:02
Thanks

I would build a new dictionary from the JSON that is passed in. Instead of the nested dictionaries you have above, you want one flat dictionary keyed by your headings, where each entry has the form:
*heading name* : *list of values*
For the first row, the resulting dictionary would be:
d = {"Last_scanned_block" : [14968718], "block" : [13965799], "hex" : ["0x9603846aff5c425277e483de16179a68dbc739debcc5449ea99e45c9d0924430"], "number" : [165], "from" : ["0x0000000000000000000000000000000000000000"], "to" : ["0x01f87c337be5636Cd9B3D48F1159768A7e7837A5"], "value": [100000000000000000000000000], "timestamp" : ["2022-01-08T16:19:02"]}
Every time you read another record, append its values to the respective lists in the dictionary, so that all the lists stay the same length.
Once the dictionary is complete, pass it to pandas, along the lines of:
import pandas
frame = pandas.DataFrame(data=d)
print(frame)
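As a rough sketch of the flattening step described above, the nested levels can be walked with three loops. It builds one dict per row rather than one list per heading, a variation that pandas.DataFrame also accepts. The function name flatten is made up for illustration, and the sample is the two-block excerpt from the question with the closing braces added (an assumption, since the original sample is cut off).

```python
import json

# Two-block excerpt of the question's sample; the closing braces are added
# here (assumption) because the original sample is truncated.
raw = """
{
  "last_scanned_block": 14968718,
  "blocks": {
    "13965799": {
      "0x9603846aff5c425277e483de16179a68dbc739debcc5449ea99e45c9d0924430": {
        "165": {
          "from": "0x0000000000000000000000000000000000000000",
          "to": "0x01f87c337be5636Cd9B3D48F1159768A7e7837A5",
          "value": 100000000000000000000000000,
          "timestamp": "2022-01-08T16:19:02"
        }
      }
    },
    "13965820": {
      "0xd4a4122734a522c40504c8b0ab43b9aa40ac821cd9913179b3ae64e5b166fc57": {
        "226": {
          "from": "0x01f87c337be5636Cd9B3D48F1159768A7e7837A5",
          "to": "0xEa3Fa123Eb40CEEaeED390D8d6dE6AF95f044AF7",
          "value": 610000000000000000000000,
          "timestamp": "2022-01-08T16:25:12"
        }
      }
    }
  }
}
"""


def flatten(doc):
    """Return one flat dict per innermost record (hypothetical helper)."""
    rows = []
    for block, transactions in doc["blocks"].items():
        for tx_hash, entries in transactions.items():
            for number, fields in entries.items():
                rows.append({
                    "Last_scanned_block": doc["last_scanned_block"],
                    "block": int(block),
                    "hex": tx_hash,
                    "number": int(number),
                    **fields,  # from, to, value, timestamp
                })
    return rows


rows = flatten(json.loads(raw))
for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame(rows) should then give one row per record with the 8 requested headings. The value column will most likely end up with object dtype, because the numbers are too large for a 64-bit integer.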

Related

Generating json file from dict python with changing values from function

I have functions which generate data that I add to a dict. The thing is, I want my JSON file to contain multiple entries, not only one key/value pair as in my code. It should look like this:
{"data":[
{"key":"Shyam", "value":10.4},
{"key":"Bob", "value":12.5},
{"key":"Jai", "value":24.2}
]}
This is how it looks at the moment, with only one key/value pair added:
{
"key": "Amadm",
"value": 14.5
}
This is my code to build the dict before json.dumps:
def gen_dict(key, value, ts):
    data = {
        "name": key,
        "value": value,
        "ts": ts
    }
    return data
json_object = json.dumps(gen_dict(gen_key(), gen_value()), indent = 4)
So my question is how to put more than one key/value pair in the dict and then convert it to a JSON object like the example shown above.
You are creating a single dictionary; what you want is a list of dictionaries.
Assuming that each call to gen_key() and gen_value() generates a single instance of the data, you can use:
import json
import random
import string
# Some random key
def gen_key():
    return ''.join(random.choice(string.ascii_lowercase) for x in range(5))
# Some random value
def gen_value():
    return random.choice(range(1000))
# Build a list of dictionaries and wrap it under the "data" key
s = json.dumps({"data": [
    {"name": gen_key(), "value": gen_value()} for i in range(3)]}, indent=4)
print(s)
output:
{
"data": [
{
"name": "rrqct",
"value": 162
},
{
"name": "vbuyq",
"value": 422
},
{
"name": "kfyqt",
"value": 7
}
]
}
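If the records arrive one at a time instead of in a single comprehension, the same idea can be sketched by appending each dict to a list and dumping the list at the end. This keeps the three-argument gen_dict from the question; the ISO-format UTC timestamp passed as ts is an assumption, since the question does not show how ts is produced.

```python
import json
import random
import string
from datetime import datetime, timezone


def gen_key():
    # Random 5-letter key, as in the answer above
    return "".join(random.choice(string.ascii_lowercase) for _ in range(5))


def gen_value():
    # Random integer below 1000, as in the answer above
    return random.randrange(1000)


def gen_dict(key, value, ts):
    # Same shape as the question's gen_dict
    return {"name": key, "value": value, "ts": ts}


records = []
for _ in range(3):
    # Assumption: ts is an ISO-8601 UTC timestamp string
    ts = datetime.now(timezone.utc).isoformat()
    records.append(gen_dict(gen_key(), gen_value(), ts))

# Wrap the accumulated list under "data" only when serialising
json_object = json.dumps({"data": records}, indent=4)
print(json_object)
```

Keeping the list separate from the final wrapper means more records can be appended later before the JSON is written out.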

Convert Excel columns into nested arrays/lists using pandas

I have some data in a CSV file and I need to convert it into lists and arrays in JSON format. Here is a sample of the data:
The desired output is:
{
"topics":[
{
"topicID":1,
"labels":[
{
"phrase":"security level",
"prob":0.3
},
{
"phrase":" hack",
"prob":0.3
},
{
"phrase":"our server lab",
"prob":0.2
},
{
"phrase":" people",
"prob":0.2
},
{
"phrase":" trouble",
"prob":0.2
}
]
},
{
"topicID":2,
"labels":[
{
"phrase":"base3",
"prob":0.4806
}
]
}
]
}
and so on.
I have extracted just 5 columns to get the topics array:
df.loc[:, ['t_1', 't_2', 't_3','t_4','t_5']]
and I have converted the topic columns to an array:
topic_list = df[['t_1', 't_2', 't_3', 't_4', 't_5']].values
but I am clueless about how to append the phrases and other columns to this array.

Add missing fields with null values as per position mentioned in the config file in Python while parsing the JSON file data

I have a config file:
Position,ColumnName
1,TXS_ID
4,TXX_NAME
8,AGE
The config file lists only positions 1, 4 and 8, so only 3 columns are available. Positions 2 and 3, between 1 and 4, are missing, and I want to fill them with null values.
I am trying to parse the data from a JSON file with Python according to this config file, and I need to place the columns at the positions given above. When the Python script runs, if "TXS_ID" is available it should pick the data from the JSON file, and since I don't have fields 2 and 3, I want to keep them as null.
Sample output file
TSX_ID,,,TXX_NAME,,,,AGE
10000,,,AAAAAAAAA,,,,40
The data should be extracted from the JSON file according to the config file I specify, and any position that is missing, as in the example above, should be filled with nulls. Please help me if there is any way to achieve this.
Below is the sample JSON file.
{
"entities": [
{
"id": "XXXXXXXXXXXXXXX",
"data": {
"attributes": {
"TSX_ID": {
"values": [
{
"value": 10000
}
]
},
"TXX_NAME": {
"values": [
{
"value": "AAAAAAAAA"
}
]
},
"AGE": {
"values": [
{
"value": "40"
}
]
}
}
}
}
]
}
Assuming that the config file line 1,TXS_ID has a typo and is actually 1,TSX_ID, this program works with your sample data (see explanations in comments):
import json
import pandas
# read the "config file" into a Series of the "ColumnName"s
# (read_csv's squeeze=True was removed in pandas 2.0; DataFrame.squeeze replaces it):
config = pandas.read_csv('config', index_col='Position').squeeze('columns')
maxdex = config.index.max() # get the maximum Position
# fill the Positions missing in the "config file" with empty "ColumnName"s:
config = config.reindex(range(1, maxdex+1), fill_value='')
with open('sample.json') as f:
    sample = json.load(f)
# create an empty DataFrame with the desired columns:
output = pandas.DataFrame(columns=config.values)
# now insert the nested JSON data values into the given columns:
for a in config.values:
    if a: # only if not an empty column name, of course
        output[a] = [av['value'] for e in sample['entities']
                     for av in e['data']['attributes'][a]['values']]
output.to_csv('output.csv', index=False)
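For comparison, here is a sketch of the same position-filling idea using only the standard library (csv and json) instead of pandas. It is a different technique from the program above, not a restatement of it: it writes one row per entity, takes only the first entry of each "values" list, and leaves a cell empty when an attribute is missing. The helper names build_header and entity_row are made up, the config uses the corrected TSX_ID spelling, and the data is held in memory rather than read from files.

```python
import csv
import io
import json

# Config with the TSX_ID spelling assumed in the answer above
config_text = "Position,ColumnName\n1,TSX_ID\n4,TXX_NAME\n8,AGE\n"

# The sample JSON from the question, compacted
sample = json.loads("""
{"entities": [{"id": "XXXXXXXXXXXXXXX", "data": {"attributes": {
  "TSX_ID": {"values": [{"value": 10000}]},
  "TXX_NAME": {"values": [{"value": "AAAAAAAAA"}]},
  "AGE": {"values": [{"value": "40"}]}
}}}]}
""")


def build_header(text):
    """Map every position from 1 to the maximum onto a column name or ''."""
    reader = csv.DictReader(io.StringIO(text))
    names = {int(row["Position"]): row["ColumnName"] for row in reader}
    return [names.get(pos, "") for pos in range(1, max(names) + 1)]


def entity_row(entity, header):
    """Pick the first value of each configured attribute; '' otherwise."""
    attrs = entity["data"]["attributes"]
    row = []
    for name in header:
        values = attrs.get(name, {}).get("values", []) if name else []
        row.append(values[0]["value"] if values else "")
    return row


header = build_header(config_text)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
for entity in sample["entities"]:
    writer.writerow(entity_row(entity, header))
print(out.getvalue())
```

Replacing the io.StringIO objects with real file handles would read the config from disk and write the CSV to disk in the same way.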

JSON extract to pandas dataframe

I'm currently trying to process JSON as a pandas dataframe. I receive a continuous stream of JSON structures that are simply appended to one another, all on a single line. I saved the stream to a .txt file and now want to analyse it with pandas.
Example snippet:
{"positionFlightMessage":{"messageUuid":"95b3b6ca-5dd2-44b4-918a-baa51022d143","schemaVersion":"1.0-RC1","timestamp":1533134514,"flightNumber":"DLH1601","position":{"waypoint":{"latitude":44.14525,"longitude":-1.31849},"flightLevel":340,"heading":24.0},"messageSource":"ADSB","flightUniqueId":"AFR1601-1532928365-airline-0002","airlineIcaoCode":"AFR","atcCallsign":"AFR89GA","fuel":{},"speed":{"groundSpeed":442.0},"altitude":{"altitude":34000.0},"nextPosition":{"waypoint":{}},"messageSubtype":"ADSB"}}{"positionFlightMessage":{"messageUuid":"884708c1-2fff-4ebf-b72c-bbc6ed2c3623","schemaVersion":"1.0-RC1","timestamp":1533134515,"flightNumber":"DLH012","position":{"waypoint":{"latitude":37.34542,"longitude":143.79951},"flightLevel":320,"heading":54.0},"messageSource":"ADSB","flightUniqueId":"EVA12-1532928367-airline-0096","airlineIcaoCode":"DLH","atcCallsign":"EVA012","fuel":{},"speed":{"groundSpeed":462.0},"altitude":{"altitude":32000.0},"nextPosition":{"waypoint":{}},"messageSubtype":"ADSB"}}...
As you can see in this small snippet, every JSON object starts with {"positionFlightMessage": and ends with "messageSubtype":"ADSB"}}
After one JSON object ends, the next one follows immediately.
What I need is a table built from it, like this:
95b3b6ca-5dd2-44b4-918a-baa51022d143 1.0-RC1 1533134514 DLH1601 44.14525 -1.31849 340 24.0 ADSB AFR1601-1532928365-airline-0002 AFR AFR89GA 442.0 34000.0 ADSB
884708c1-2fff-4ebf-b72c-bbc6ed2c3623 1.0-RC1 1533134515 DLH012 37.34542 143.79951 320 54.0 ADSB EVA12-1532928367-airline-0096 DLH EVA012 462.0 32000.0 ADSB
I tried to use pandas read_json but I get an error.
import pandas as pd
df = pd.read_json("tD.txt",orient='columns')
df.head()
ValueError: Trailing data
tD.txt contains the snippet given above, without the trailing dots (...).
I think the problem is that the JSON objects are simply appended to each other. I could add a new line after every
messageSubtype":"ADSB"}}
and then read it, but maybe you have a solution that converts the big txt file directly into a df.
Try to get the stream of json to output like the following:
Notice the starting '[' and the ending ']'.
Also notice the ',' between each json input.
data = [{
"positionFlightMessage": {
"messageUuid": "95b3b6ca-5dd2-44b4-918a-baa51022d143",
"schemaVersion": "1.0-RC1",
"timestamp": 1533134514,
"flightNumber": "DLH1601",
"position": {
"waypoint": {
"latitude": 44.14525,
"longitude": -1.31849
},
"flightLevel": 340,
"heading": 24.0
},
"messageSource": "ADSB",
"flightUniqueId": "AFR1601-1532928365-airline-0002",
"airlineIcaoCode": "AFR",
"atcCallsign": "AFR89GA",
"fuel": {},
"speed": {
"groundSpeed": 442.0
},
"altitude": {
"altitude": 34000.0
},
"nextPosition": {
"waypoint": {}
},
"messageSubtype": "ADSB"
}
}, {
"positionFlightMessage": {
"messageUuid": "884708c1-2fff-4ebf-b72c-bbc6ed2c3623",
"schemaVersion": "1.0-RC1",
"timestamp": 1533134515,
"flightNumber": "DLH012",
"position": {
"waypoint": {
"latitude": 37.34542,
"longitude": 143.79951
},
"flightLevel": 320,
"heading": 54.0
},
"messageSource": "ADSB",
"flightUniqueId": "EVA12-1532928367-airline-0096",
"airlineIcaoCode": "DLH",
"atcCallsign": "EVA012",
"fuel": {},
"speed": {
"groundSpeed": 462.0
},
"altitude": {
"altitude": 32000.0
},
"nextPosition": {
"waypoint": {}
},
"messageSubtype": "ADSB"
}
}]
Now you should be able to loop over each element of the list and append it to the pandas df.
print(len(data))
for i in range(0,len(data)):
    # here I just show messageSource only. Up to you to find out the rest..
    print(data[i]['positionFlightMessage']['messageSource'])
    # instead of printing here you should append it to the pandas df.
Hope this helps you out a bit.
Now here's a solution for your JSON as is using regex.
s = '{"positionFlightMessage":{"messageUuid":"95b3b6ca-5dd2-44b4-918a-baa51022d143","schemaVersion":"1.0-RC1","timestamp":1533134514,"flightNumber":"DLH1601","position":{"waypoint":{"latitude":44.14525,"longitude":-1.31849},"flightLevel":340,"heading":24.0},"messageSource":"ADSB","flightUniqueId":"AFR1601-1532928365-airline-0002","airlineIcaoCode":"AFR","atcCallsign":"AFR89GA","fuel":{},"speed":{"groundSpeed":442.0},"altitude":{"altitude":34000.0},"nextPosition":{"waypoint":{}},"messageSubtype":"ADSB"}}{"positionFlightMessage":{"messageUuid":"884708c1-2fff-4ebf-b72c-bbc6ed2c3623","schemaVersion":"1.0-RC1","timestamp":1533134515,"flightNumber":"DLH012","position":{"waypoint":{"latitude":37.34542,"longitude":143.79951},"flightLevel":320,"heading":54.0},"messageSource":"ADSB","flightUniqueId":"EVA12-1532928367-airline-0096","airlineIcaoCode":"DLH","atcCallsign":"EVA012","fuel":{},"speed":{"groundSpeed":462.0},"altitude":{"altitude":32000.0},"nextPosition":{"waypoint":{}},"messageSubtype":"ADSB"}}'
import re
import json
import pandas as pd
# put a comma before every object, drop the leading comma, and wrap everything in [ ]
replaced = json.loads('[' + re.sub(r'\{"positionFlightMessage', ',{"positionFlightMessage', s)[1:] + ']')
# every element has a single "positionFlightMessage" key; use its inner dict as the row
df = pd.DataFrame([item['positionFlightMessage'] for item in replaced])
print(df)
First we replace all occurrences of {"positionFlightMessage with ,{"positionFlightMessage and discard the first separator, which turns the text into a valid JSON array.
Loading the parsed list directly into a dataframe would give only one column, so we build the dataframe from the inner dictionary of each message instead, which gives one column per message field.
From this dataframe, you can perform some more cleaning, since nested fields such as position are still dictionaries.
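Another option that avoids rewriting the text at all is to let the standard library's json.JSONDecoder.raw_decode read one object at a time and report where it stopped. The sketch below is only an illustration: the helper name iter_json_objects is made up, and the stream is a shortened version of the question's snippet that keeps just a few fields.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip any whitespace or newlines between documents
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened version of the question's stream (only a few fields kept)
stream = (
    '{"positionFlightMessage":{"flightNumber":"DLH1601",'
    '"position":{"waypoint":{"latitude":44.14525,"longitude":-1.31849}},'
    '"messageSubtype":"ADSB"}}'
    '{"positionFlightMessage":{"flightNumber":"DLH012",'
    '"position":{"waypoint":{"latitude":37.34542,"longitude":143.79951}},'
    '"messageSubtype":"ADSB"}}'
)

messages = [obj["positionFlightMessage"] for obj in iter_json_objects(stream)]
print(len(messages))
print(messages[0]["position"]["waypoint"])
```

The resulting list of dicts can be handed to pandas.json_normalize(messages), which should flatten the nested fields into dotted column names such as position.waypoint.latitude.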

How to parse nested json and construct relational database columns from dict values using python

Below is my sample JSON. I am trying to extract the "attributes" part of the JSON and insert it into a relational database, but I need to turn the "name" values into relational columns and insert the "value" values into the table. For example, for
{"name":"ID","value":"528BE6D9FD"} "ID" should become a column and 528BE6D9FD should be inserted under "ID". I am just beginning to learn Python, so I am not sure how to construct columns from dictionary values.
import json
d = 'C:/adapters/sample1.json'
json_data = open(d).read()
json_file = json.loads(json_data)
for children in json_file["events"]:
    #print (children)
    for grandchildren in children["attributes"]:
        #print(grandchildren)
        for key, value in grandchildren.items():
            #if key == 'name':
            print(value)
{
"events":[
{
"timestamp":"2010-11-20T11:08:00.978Z",
"code":"Event",
"namespace":null,
"version":null,
"attributes":[
{
"name":"ID",
"value":"528BE6D9FD"
},
{
"name":"Total",
"value":67
},
{
"name":"PostalCode",
"value":"6064"
},
{
"name":"Category",
"value":"More"
},
{
"name":"State",
"value":"QL"
},
{
"name":"orderDateTime",
"value":"2010-07-20T12:08:13Z"
},
{
"name":"CategoryID",
"value":"1091"
},
{
"name":"billingCountry",
"value":"US"
},
{
"name":"shipping",
"value":"Go"
},
{
"name":"orderFee",
"value":77
},
{
"name":"Name",
"value":"Roy"
}
]
}
]
}
To extract the attributes list from your JSON data, I would do it like so:
import json
json_path = "c:\\adapters\\sample1.json"
with open(json_path) as json_file:
    json_dict = json.load(json_file)
# attributes of the first event: a list of {"name": ..., "value": ...} dicts
attributes = json_dict['events'][0]['attributes']
Now, I don't know which database system you are using, but regardless, you can extract the names and values with list comprehensions like so:
names = [attr['name'] for attr in attributes]
values = [attr['value'] for attr in attributes]
Then create a table if needed, use names as the column headers, and insert values as a single row in the same order as names.
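Since the database system is not stated, here is a minimal sketch of that last step using SQLite through Python's built-in sqlite3 module, purely as an assumption for illustration. The table name events and the helper quote_identifier are made up, and only the first three attributes from the sample are used. Column names come from the data, so they are quoted as identifiers, while the values go through ? placeholders.

```python
import sqlite3

# First three attributes from the question's sample
attributes = [
    {"name": "ID", "value": "528BE6D9FD"},
    {"name": "Total", "value": 67},
    {"name": "PostalCode", "value": "6064"},
]
names = [attr["name"] for attr in attributes]
values = [attr["value"] for attr in attributes]


def quote_identifier(name):
    """Quote a column name for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


columns = ", ".join(quote_identifier(name) for name in names)
placeholders = ", ".join("?" for _ in names)

# Assumption: an in-memory SQLite database and a table called "events"
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE events ({columns})")
conn.execute(f"INSERT INTO events ({columns}) VALUES ({placeholders})", values)

row = conn.execute("SELECT * FROM events").fetchone()
print(row)
conn.close()
```

Other database systems have their own quoting and typing rules, so the CREATE TABLE statement would likely need explicit column types there.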
