I'm trying to convert JSON obtained from a Python GET request (using the requests library) into a pandas DataFrame.
I've tried some other solutions on the subject, including json_normalize, but it does not appear to be working: the DataFrame comes out as a single column of dictionaries.
response = requests.get(myUrl, headers=head)
data = response.json()
#what now?
This gives me the following JSON:
"data": [
{
"timestamp": "2019-04-10T11:40:13.437Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.010000228881836
},
{
"comp": "humid",
"value": 34.4900016784668
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 166
},
{
"comp": "pm25",
"value": 4
},
{
"comp": "lux",
"value": 961.4000244140625
},
{
"comp": "spl_a",
"value": 45.70000076293945
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -2
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
}
How do I convert this into a DataFrame? The end result is supposed to have the following headers:
You can import the json package and use its loads() method to convert a JSON string into a dict object, then index into that dict by key to pull out the values you want to put into a DataFrame.
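For the JSON shape shown in the question, a minimal sketch of that idea, assuming the response really has a top-level "data" list like the one above (myUrl and head are the variables from the question):
import pandas as pd
import requests

response = requests.get(myUrl, headers=head)
payload = response.json()  # parse the body into a dict

rows = []
for record in payload["data"]:
    # keep the scalar fields and spread each sensor reading into its own column
    row = {"timestamp": record["timestamp"], "score": record["score"]}
    for sensor in record["sensors"]:
        row[sensor["comp"]] = sensor["value"]
    rows.append(row)

df = pd.DataFrame(rows)
print(df.head())
If you prefer json_normalize, pd.json_normalize(payload["data"], record_path="sensors", meta=["timestamp", "score"]) gives a long format instead (one row per sensor reading), which you can then pivot into the same wide layout.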
Related
I am new to Python and want to convert a CSV file into a JSON file. The JSON is nested with a dynamic structure, and that structure is defined by the CSV header.
From csv input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
To json output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Is this something that can be done with the existing pandas API, or do I have to write a lot of complex code to dynamically construct the JSON object? Thanks.
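As far as I know there is no single pandas call that rebuilds this exact nesting (json_normalize goes in the opposite direction), but the code does not have to be complex: split each header on "/", treat purely numeric parts as list indices, and build the structure recursively. A rough sketch of that idea, assuming the input file is named input.csv (a hypothetical name) and uses the header row shown above:
import csv
import json

def set_path(container, parts, value):
    # Walk/create nested dicts and lists along a slash-separated path such as
    # "salary_info/details/0/grade"; purely numeric parts become list indices.
    key = int(parts[0]) if parts[0].isdigit() else parts[0]
    if len(parts) == 1:
        if isinstance(key, int):
            while len(container) <= key:      # grow the list up to this index
                container.append(None)
        container[key] = value
        return
    child = [] if parts[1].isdigit() else {}
    if isinstance(key, int):
        while len(container) <= key:
            container.append([] if parts[1].isdigit() else {})
        child = container[key]
    else:
        child = container.setdefault(key, child)
    set_path(child, parts[1:], value)

def convert(raw):
    # Keep numbers as numbers where the CSV cell looks numeric.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw.strip()

records = []
with open("input.csv", newline="") as f:      # hypothetical file name
    for row in csv.DictReader(f, skipinitialspace=True):
        record = {}
        for header, raw in row.items():
            if raw is None or raw == "":
                continue
            set_path(record, header.strip().split("/"), convert(raw))
        records.append(record)

print(json.dumps(records, indent=2))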
I have this JSON file:
{
"entityId": "proc_1234",
"displayName": "oracle12",
"firstSeenTms": 1639034760000,
"lastSeenTms": 1650386100000,
"properties": {
"detectedName": "oracle.sysman.gcagent.tmmain.TMMain",
"bitness": "64",
"jvmVendor": "IBM",
"metadata": [
{
"key": "COMMAND_LINE_ARGS",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0"
},
{
"key": "EXE_NAME",
"value": "java"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java"
},
{
"key": "JAVA_MAIN_CLASS",
"value": "oracle.sysman.gcagent.tmmain.TMMain"
},
{
"key": "EXE_PATH",
"value": "/usr/local/oracle/oem/agent12c/agent_13.3.0.0.0/oracle_common/jdk/bin/java"
}
]
}
}
I need to extract entityId, detectedName, EXE_NAME, and EXE_PATH from the JSON file.
The output should look like this:
entityId detectedName EXE_NAME EXE_PATH
proc_1234 oracle.sysman.gcagent.tmmain.TMMain java /usr/local/oracle/oem/agent*c/agent_*/oracle_common/jdk/bin/java
I have tried this:
Procdf = (pd.json_normalize(resp2, record_path=['properties', 'metadata'], meta=['entityId'])
          .drop_duplicates(subset=['key'])
          .query("key in ['EXE_NAME','EXE_PATH']")
          .pivot('entityId', 'key', 'value', 'detectedName')
          .reset_index())
I get this error:
TypeError: pivot() takes from 1 to 4 positional arguments but 5 were given
It is not clear to me what exactly the purpose of the pivot is, but you are trying to pivot on detectedName, which is not in your dataframe. The code below might be what you need.
import pandas as pd

det_name = resp2['properties']['detectedName']
dataframe = (pd.json_normalize(resp2, record_path=['properties', 'metadata'], meta=['entityId'])
             .drop_duplicates(subset=['key'])
             .query("key in ['EXE_NAME','EXE_PATH']")
             .assign(detectedName=det_name)
             .T)
print(type(dataframe))
<class 'pandas.core.frame.DataFrame'>
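If the goal is the one-row layout from the question (entityId, detectedName, EXE_NAME and EXE_PATH as columns), here is an alternative sketch, assuming resp2 is the parsed JSON shown above. pivot only accepts index, columns and values, so the extra 'detectedName' argument is what raised the TypeError; detectedName can be attached afterwards with assign:
import pandas as pd

long_df = (pd.json_normalize(resp2, record_path=['properties', 'metadata'], meta=['entityId'])
           .drop_duplicates(subset=['key'])   # keeps the first EXE_PATH entry
           .query("key in ['EXE_NAME', 'EXE_PATH']"))

wide_df = (long_df.pivot(index='entityId', columns='key', values='value')
           .reset_index()
           .assign(detectedName=resp2['properties']['detectedName']))

print(wide_df[['entityId', 'detectedName', 'EXE_NAME', 'EXE_PATH']])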
I am new to converting a pandas DataFrame into a JSON object.
I have a DataFrame:
The expected JSON output after conversion is this:
{
"Name": {
"id": "Max",
},
"Favorites" : [
{
"id":"Apple",
"priority":"High",
"Count":"4"
},
{
"id":"Oranges",
"priority":"Medium",
"Count":"2"
},
{
"id":"Banana",
"priority":"Low",
"Count":"1"
}
]
}
Here's a freebie. Hope it helps you learn how to write it yourself in the future :)
output = []
for index, row in df.iterrows():
entry = {
"Name": {
"id": row['Names']
},
"Favorites": [
{
"id": row['High_Priority_Goods_Name'],
"priority": "High",
"count": row['High_Priority_Goods_Count']
},
{
"id": row['Medium_Priority_Goods_Name'],
"priority": "Medium",
"count": row['Medium_Priority_Goods_Count']
},
{
"id": row['Low_Priority_Goods_Name'],
"priority": "Low",
"count": row['Low_Priority_Goods_Count']
}
]
}
output.append(entry)
print(output)
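If you need an actual JSON string or file rather than a Python list of dicts, a small follow-up on the output list built above (favorites.json is just a hypothetical file name):
import json

json_string = json.dumps(output, indent=4)
with open("favorites.json", "w") as f:   # hypothetical output file
    f.write(json_string)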
I am trying to get JSON output from an API request and then load it into an Excel file.
The problem is that if I pass the response to json.dumps(), the result is not parsable, but if I treat it as text and run it through a JSON formatter, it parses fine.
I wrote the code below to write to CSV, but I actually want an Excel file.
Here is what the sample response.text variable in my actual code looks like:
{
"value": [
{
"correlationId": "xxxxxxxxxx",
"eventName": {
"value": "EndRequest",
"localizedValue": "EndRequest"
},
"id": "/subscriptions/xxxxxxxxxx/resourcegroups/xxxxxxxxx/providers/Microsoft.Compute/virtualMachines/xxxxxx/extensions/enablevmaccess/events/xxxxxxxxxx/ticks/xxxxxxxx",
"level": "Informational",
"resourceGroupName": "xxxxxx",
"resourceProviderName": {
"value": "Microsoft.Compute",
"localizedValue": "Microsoft.Compute"
},
"operationName": {
"value": "Microsoft.Compute/virtualMachines/extensions/write",
"localizedValue": "Microsoft.Compute/virtualMachines/extensions/write"
},
"status": {
"value": "Succeeded",
"localizedValue": "Succeeded"
},
"eventTimestamp": "2020-08-06T12:47:02.0657952Z",
"submissionTimestamp": "2020-08-06T12:49:03.137537Z"
},
{
"correlationId": "xxxxxxxxxx",
"eventName": {
"value": "EndRequest",
"localizedValue": "EndRequest"
},
"id": "/subscriptions/xxxxxxxxxx/resourcegroups/xxxxxxxxx/providers/Microsoft.Compute/virtualMachines/xxxxxx/extensions/enablevmaccess/events/xxxxxxxxxx/ticks/xxxxxxxx",
"level": "Informational",
"resourceGroupName": "xxxxxx",
"resourceProviderName": {
"value": "Microsoft.Compute",
"localizedValue": "Microsoft.Compute"
},
"operationName": {
"value": "Microsoft.Compute/virtualMachines/extensions/write",
"localizedValue": "Microsoft.Compute/virtualMachines/extensions/write"
},
"status": {
"value": "Succeeded",
"localizedValue": "Succeeded"
},
"eventTimestamp": "2020-08-06T12:47:02.0657952Z",
"submissionTimestamp": "2020-08-06T12:49:03.137537Z"
},
]
}
Here is the code I am trying:
import datetime
from datetime import timedelta
import requests

# subscription_id, credential and compute_client come from the Azure SDK setup (not shown here)
d_date = datetime.datetime.now()
today = d_date.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
print(today)
N = 10
date_N_days_ago = datetime.datetime.now() - timedelta(days=N)
start_date = date_N_days_ago.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
print(start_date)
vm_list = compute_client.virtual_machines.list_all()
for vm_general in vm_list:
    general_view = vm_general.id.split("/")
    resource_group = general_view[4]
    print(resource_group)
    BASE_URL = f"https://management.azure.com/subscriptions/{subscription_id}/providers/microsoft.insights/eventtypes/management/values?api-version=2015-04-01&$filter=eventTimestamp ge {start_date} and eventTimestamp le {today} and resourceGroupName eq {resource_group}&$select=eventName,id,resourceGroupName,resourceProviderName,operationName,status,eventTimestamp,correlationId,submissionTimestamp,level"
    headers = {
        "Authorization": 'Bearer ' + credential.token["access_token"]
    }
    response = requests.get(BASE_URL, headers=headers)
    # if I change the line below to df_json = response.json() it says AttributeError: 'str' object has no attribute 'json'
    df_json = response.text  # this is a string, but I am able to parse it properly in a JSON formatter
    print(df_json)
    with open(r'c:\csv\logs_test.csv', 'w') as f:
        for key in df_json.keys():
            f.write("%s,%s\n" % (key, df_json[key]))
    break
I am getting an error like:
AttributeError: 'str' object has no attribute 'keys'
Expected result:
I actually need to write to XLS (Excel) format with the columns correlationId, eventName, id, resourceGroupName, resourceProviderName, operationName, status, eventTimestamp, submissionTimestamp.
You can use eval to convert the text to a dictionary and then use pandas to convert it to an Excel file.
import pandas as pd

response_dict = eval(response.text)  # evaluates the body as a Python literal; it tolerates the trailing comma that json.loads rejects
df = pd.DataFrame(response_dict['value'])
df['tag'] = "Managed by IT"
file_name = 'data.xls'
df.to_excel(file_name, index = False)
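A safer variant of the same idea, if you would rather not run eval on an HTTP response body: ast.literal_eval only parses Python literals, and it also tolerates the trailing comma in the sample that json.loads rejects. A sketch, assuming response is the requests response from the question:
import ast
import pandas as pd

response_dict = ast.literal_eval(response.text)  # parses literals only, no arbitrary code execution
df = pd.DataFrame(response_dict['value'])
df.to_excel('data.xls', index=False)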
The easiest way is to convert to a pandas DataFrame and then to an XLS file.
You will have to install xlwt: pip install xlwt.
import pandas as pd
data = {
"value": [
{
"correlationId": "xxxxxxxxxx",
"eventName": {
"value": "EndRequest",
"localizedValue": "EndRequest"
},
"id": "/subscriptions/xxxxxxxxxx/resourcegroups/xxxxxxxxx/providers/Microsoft.Compute/virtualMachines/xxxxxx/extensions/enablevmaccess/events/xxxxxxxxxx/ticks/xxxxxxxx",
"level": "Informational",
"resourceGroupName": "xxxxxx",
"resourceProviderName": {
"value": "Microsoft.Compute",
"localizedValue": "Microsoft.Compute"
},
"operationName": {
"value": "Microsoft.Compute/virtualMachines/extensions/write",
"localizedValue": "Microsoft.Compute/virtualMachines/extensions/write"
},
"status": {
"value": "Succeeded",
"localizedValue": "Succeeded"
},
"eventTimestamp": "2020-08-06T12:47:02.0657952Z",
"submissionTimestamp": "2020-08-06T12:49:03.137537Z"
},
{
"correlationId": "xxxxxxxxxx",
"eventName": {
"value": "EndRequest",
"localizedValue": "EndRequest"
},
"id": "/subscriptions/xxxxxxxxxx/resourcegroups/xxxxxxxxx/providers/Microsoft.Compute/virtualMachines/xxxxxx/extensions/enablevmaccess/events/xxxxxxxxxx/ticks/xxxxxxxx",
"level": "Informational",
"resourceGroupName": "xxxxxx",
"resourceProviderName": {
"value": "Microsoft.Compute",
"localizedValue": "Microsoft.Compute"
},
"operationName": {
"value": "Microsoft.Compute/virtualMachines/extensions/write",
"localizedValue": "Microsoft.Compute/virtualMachines/extensions/write"
},
"status": {
"value": "Succeeded",
"localizedValue": "Succeeded"
},
"eventTimestamp": "2020-08-06T12:47:02.0657952Z",
"submissionTimestamp": "2020-08-06T12:49:03.137537Z"
}
]
}
df = pd.json_normalize(data['value'])
cols = ["correlationId","eventName.value","id","resourceGroupName","resourceProviderName.value","operationName.value","status.value","eventTimestamp","submissionTimestamp"]
df[cols].to_excel("data.xls", index=False)
Instead of json, use demjson. Install the library with pip install demjson; the json module only parses strictly valid JSON, and your sample response has a trailing comma that makes it invalid.
import demjson
data = demjson.decode(response.text)
# remaining code goes on
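Putting the pieces together, a rough end-to-end sketch, assuming response is the requests response from the question and demjson is installed (logs.xlsx is a hypothetical output name):
import demjson
import pandas as pd

data = demjson.decode(response.text)  # tolerates the trailing comma that json.loads rejects
df = pd.json_normalize(data['value'])
cols = ["correlationId", "eventName.value", "id", "resourceGroupName",
        "resourceProviderName.value", "operationName.value", "status.value",
        "eventTimestamp", "submissionTimestamp"]
df[cols].to_excel("logs.xlsx", index=False)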
I currently have a json file that looks like this....
{
"data": [
{
"tag": "cashandequivalents",
"value": 10027000000.0
},
{
"tag": "shortterminvestments",
"value": 101000000.0
},
{
"tag": "accountsreceivable",
"value": 4635000000.0
},
{
"tag": "netinventory",
"value": 1386000000.0
}...
but what I am trying to get to is this:
{
"cashandequivalents": 10027000000.0,
"shortterminvestments":101000000.0 ,
"accountsreceivable":4635000000.0,
"netinventory":1386000000.0
}
I just don't know how to go about this.
Maybe there is an easier way, but this seems the most logical to me, because the next step is writer.writerow to CSV.
So eventually the CSV will look like:
cashandequivalents | shortterminvestments | accountsreceivable | netinventory
100027000000 101000000000 46350000000 13860000000
########### ############ ########### ...........
(writer.writeheader will be done outside of the loop, so I am only writing the values, not the "tags".)
Thanks
A naive solution:
import json
json_data = {
"data": [
{
"tag": "cashandequivalents",
"value": 10027000000.0
},
{
"tag": "shortterminvestments",
"value": 101000000.0
},
{
"tag": "accountsreceivable",
"value": 4635000000.0
},
{
"tag": "netinventory",
"value": 1386000000.0
}
]
}
result = dict()
for entry in json_data['data']:
result[entry['tag']] = entry['value']
print(json.dumps(result, indent=4))
Output
{
"shortterminvestments": 101000000.0,
"netinventory": 1386000000.0,
"accountsreceivable": 4635000000.0,
"cashandequivalents": 10027000000.0
}
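Since the question says the next step is writer.writerow to CSV, here is a short follow-up sketch on the result dict built above (output.csv is a hypothetical file name; writeheader stays outside any loop, as the question intends):
import csv

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(result.keys()))
    writer.writeheader()
    writer.writerow(result)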
The easiest and cleanest way to do this is with a dictionary comprehension.
d = {
"data": [
{
"tag": "cashandequivalents",
"value": 10027000000.0
},
{
"tag": "shortterminvestments",
"value": 101000000.0
},
{
"tag": "accountsreceivable",
"value": 4635000000.0
},
{
"tag": "netinventory",
"value": 1386000000.0
}
]
}
newDict = {i['tag']: i['value'] for i in d['data']}
# {'netinventory': 1386000000.0, 'shortterminvestments': 101000000.0, 'accountsreceivable': 4635000000.0, 'cashandequivalents': 10027000000.0}
This iterates through the list contained under the "data" key of your original dictionary and builds the new dictionary in a single expression, using each entry's tag as the key and its value as the value.