I have a JSON file that I need to convert to CSV. The JSON contains an object holding an array, and all the attributes I need are inside that array, but the code I am trying converts each array element into a single value, when I actually want all of the attributes from those JSON objects.
JSON file content
{
    "leads": [
        {
            "id": "31Y2V29CH0X82",
            "product_type": "prelist"
        },
        {
            "id": "2N649TAJBA50Z",
            "product_type": "prelist"
        }
    ],
    "has_next_page": true,
    "next_cursor": "2022-07-27T20:02:13.856000-07:00"
}
Python code
import pandas as pd
df = pd.read_json(r'C:\Users\Ron\Desktop\Test\Product_List.json')
df.to_csv(r'C:\Users\Ron\Desktop\Test\New_Products.csv', index=None)
The output I am getting puts each lead object into a single cell, but what I want is one row per lead with the attributes split out. How do I get the attributes as CSV columns with headers?
I think you'll have to do this row by row.
data = {"leads": [{"id": "31Y2V29CH0X82", "product_type": "prelist"}, {"id": "2N649TAJBA50Z", "product_type": "prelist"}], "has_next_page": True,
"next_cursor": "2022-07-27T20:02:13.856000-07:00"}
headers = data.copy()
del headers['leads']
rows = []
for row in data['leads']:
row.update( headers )
rows.append( row )
import pandas as pd
df = pd.DataFrame( rows )
print(df)
Output:
id product_type has_next_page next_cursor
0 31Y2V29CH0X82 prelist True 2022-07-27T20:02:13.856000-07:00
1 2N649TAJBA50Z prelist True 2022-07-27T20:02:13.856000-07:00
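For what it's worth, pandas can also do this flattening itself via json_normalize; a minimal sketch, assuming the file is the Product_List.json shown in the question:
import json

import pandas as pd

with open(r'C:\Users\Ron\Desktop\Test\Product_List.json') as f:
    data = json.load(f)

# explode the 'leads' array into rows and repeat the top-level fields on each row
df = pd.json_normalize(data, record_path='leads',
                       meta=['has_next_page', 'next_cursor'])
df.to_csv(r'C:\Users\Ron\Desktop\Test\New_Products.csv', index=False)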
Related
So I have this data that I scraped
[
    {
        "id": 4321069,
        "points": 52535,
        "name": "Dennis",
        "avatar": "",
        "leaderboardPosition": 1,
        "rank": ""
    },
    {
        "id": 9281450,
        "points": 40930,
        "name": "Dinh",
        "avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/aeaf8cc1-65b2-4d07-a838-1f078bbd2b60.png",
        "leaderboardPosition": 2,
        "rank": ""
    },
    {
        "id": 1087209,
        "points": 26053,
        "name": "Sophia",
        "avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/c3e9ffb1-df72-46e8-9cd5-c66a000e98fa.png",
        "leaderboardPosition": 3,
        "rank": ""
And so on... it's a big leaderboard of 20 people.
Scraped with this code
import json
import requests
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
# print for all time:
data = requests.get(url_all_time).json()
# for item in data:
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for item in data:
    print(item['name'], item['points'])
And I want to be able to create a table from this data.
Every time I scrape, I want to add the points to the table under a new column whose header is the date of the scrape. So basically what I was thinking is that my index = usernames and the header = date. The problem is, I can't even manage to make a CSV file with just the NAME/POINTS columns.
The only thing I have succeeded in doing so far is writing ALL the data into a CSV file. I haven't been able to pinpoint just the data I want, like I do in the print statement.
EDIT: After reading what @Shijith posted, I succeeded in transferring the data to .csv, but given what I have in mind (adding more data as time goes by), I was asking myself whether I should write the code with an index or without.
WITH
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table.to_csv('products.csv', index=True, encoding='utf-8')
WITHOUT
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, columns=['points','name'])
table.to_csv('products.csv', index=False, encoding='utf-8')
Have you tried just reading the json directly into a pandas dataframe? From here it should be pretty easy to transform it like you want. You could add a column for today's date and pivot it.
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
df = pd.read_json(url_all_time)
df['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
df.pivot(index='name', columns='date', values='points')
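As for the "new column per scrape" part of the question, one way it could look is to keep a history CSV and merge today's points into it on every run. A rough sketch, assuming a local file named leaderboard_history.csv (a hypothetical name, not from the original post):
import os

import pandas as pd

url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
today = pd.Timestamp.today().strftime('%m-%d-%Y')

# today's snapshot: names as index, a single column named after today's date
snapshot = pd.read_json(url_all_time).set_index('name')[['points']]
snapshot = snapshot.rename(columns={'points': today})

history_path = 'leaderboard_history.csv'  # hypothetical history file
if os.path.exists(history_path):
    history = pd.read_csv(history_path, index_col='name')
    history[today] = snapshot[today]      # add (or overwrite) today's column
else:
    history = snapshot
history.to_csv(history_path)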
I'm currently working on a project that analyzes multiple data sources. The other sources are fine, but I'm having a lot of trouble with JSON and its sometimes deeply nested structure. I have tried turning the JSON into a Python dictionary, but that approach starts to struggle as the structure gets more complicated. For example, with this sample JSON file:
{
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani@gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani@gmail.com"
        }
    ]
}
After converting it to a dictionary, dict.keys() only returns "Employees".
I then resorted to a pandas dataframe instead, and I could achieve what I wanted by calling json_normalize(dict['Employees'], sep="_"), but my problem is that it must work for ALL JSONs, and looking at the data beforehand is not an option, so normalizing this way will not always work. Is there some way I could write a function that takes in any JSON and converts it into a nice pandas dataframe? I have searched for about 2 weeks for answers but with no luck regarding my specific problem. Thanks
I've had to do that in the past (flatten out a big nested JSON). This blog was really helpful. Would something like this work for you?
Note, as the others have stated, making this work for EVERY JSON is a tall task; I'm merely offering a way to get started if you have a wider range of JSON objects. I'm assuming they will be relatively CLOSE to what you posted as an example, with hopefully similar structures.
jsonStr = '''{
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani@gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani@gmail.com"
        }
    ]
}'''
It flattens the entire JSON into a single row, which you can then put into a dataframe. In this case it creates 1 row with 18 columns. It then iterates through those columns, using the number embedded in each column name to reconstruct multiple rows. If you had a differently nested JSON, it should theoretically still work, but you'll have to test it out.
import json
import re

import pandas as pd


def flatten_json(y):
    """Recursively flatten nested dicts/lists into a single {path: value} dict."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)   # one row, 18 flattened columns for this input

# rebuild rows: the number embedded in each flattened key is the row index
results = pd.DataFrame()
columns_list = list(flat.keys())
for item in columns_list:
    row_idx = re.findall(r'\_(\d+)\_', item)[0]
    column = item.replace('_' + row_idx + '_', '_')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

print(results)
Output:
print (results)
Employees_userId ... Employees_emailAddress
0 rirani ... romin.k.irani@gmail.com
1 nirani ... neilrirani@gmail.com
[2 rows x 9 columns]
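As a quick sanity check of the "should work for other shapes" claim, feeding flatten_json a record with one more level of nesting (a hypothetical address object, not from the question) simply produces longer keys:
nested = {"Employees": [{"userId": "rirani",
                         "address": {"city": "Sunnyvale", "zip": "94085"}}]}
print(flatten_json(nested))
# expected keys: 'Employees_0_userId', 'Employees_0_address_city', 'Employees_0_address_zip'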
d = {
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani@gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani@gmail.com"
        }
    ]
}
import pandas as pd
df = pd.DataFrame([x.values() for x in d["Employees"]], columns=d["Employees"][0].keys())
print(df)
Output
userId jobTitleName firstName ... region phoneNumber emailAddress
0 rirani Developer Romin ... CA 408-1234567 romin.k.irani@gmail.com
1 nirani Developer Neil ... CA 408-1111111 neilrirani@gmail.com
[2 rows x 9 columns]
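A small side note: since d["Employees"] is already a list of flat dicts, pandas can build the same table from it directly; this also keeps working if the records ever have slightly different keys, because columns are aligned by dict key rather than by position:
import pandas as pd

df = pd.DataFrame(d["Employees"])  # columns come straight from the dict keys
print(df)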
For the particular JSON data given, my approach, which uses only the pandas package, follows:
import pandas as pd
# the JSON as a Python dict object
jsn = {
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani@gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani@gmail.com"
        }
    ]
}
# get the main key, here 'Employees' at index 0
emp = list(jsn.keys())[0]
# when you have several keys at this level, e.g. 'Employers' as well,
# .. you need to handle all of them too (your task)

# get all the sub-keys of the first record under the main key
all_keys = jsn[emp][0].keys()

# build the dataframe
result_df = pd.DataFrame()  # init an empty dataframe
for key in all_keys:
    col_vals = []
    for ea in jsn[emp]:
        col_vals.append(ea[key])
    # add a new column to the dataframe using the sub-key as its header;
    # note that the values here could themselves be nested objects
    # .. such as dicts, lists, or JSON
    result_df[key] = col_vals

print(result_df.to_string())
Output:
userId lastName jobTitleName phoneNumber emailAddress employeeCode preferredFullName firstName region
0 rirani Irani Developer 408-1234567 romin.k.irani@gmail.com E1 Romin Irani Romin CA
1 nirani Irani Developer 408-1111111 neilrirani@gmail.com E2 Neil Irani Neil CA
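If the column values do turn out to be nested objects, as the comment above warns, one option (a sketch, not part of the answer itself) is to let pd.json_normalize expand them into flattened columns:
import pandas as pd

# json_normalize flattens nested dicts inside each record, joining key parts with sep
flat_df = pd.json_normalize(jsn[emp], sep='_')
print(flat_df.to_string())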
I have the following nested JSON file, which I need to convert into a pandas dataframe. The main problem is that there is only one unique item in the whole JSON, and it is very deeply nested.
I tried to solve this problem with the following code, but it gives repeating output.
[{
    "questions": [
        {
            "key": "years-age",
            "responseKey": null,
            "responseText": "27",
            "responseKeys": null
        },
        {
            "key": "gender",
            "responseKey": "male",
            "responseText": null,
            "responseKeys": null
        }
    ],
    "transactions": [
        {
            "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
            "catId": "21001000",
            "tType": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
            "name": "Online Transfer FROM CHECKING 1200454623",
            "category": [
                "Transfer",
                "Acc Transfer"
            ]
        }
    ],
    "institutions": [
        {
            "InstName": "Citizens company",
            "InstId": "inst_1",
            "accounts": [
                {
                    "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
                    "pAccType": "depo",
                    "pAccSubtype": "check",
                    "_id": "5ad38837e806efaa90da4849"
                }
            ]
        }
    ]
}]
I need to convert this to pandas dataframe as follows:
id pAccId tId
5ad38837e806efaa90da4849 v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ 80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53
The main problem I am facing is with the "id", as it is very deeply nested and is the only unique key in the JSON.
Here is my code:
import json

import pandas as pd

with open('sub.json') as f:
    data = json.load(f)

csv = ''
for k in data:
    for t in k.get("institutions"):
        # note: the [0] indices always pick the first account/transaction,
        # which is why the output repeats (the transaction index was left
        # blank in the original post and is assumed to be 0 here)
        csv += k['institutions'][0]['accounts'][0]['_id']
        csv += "\t"
        csv += k['institutions'][0]['accounts'][0]['pAccId']
        csv += "\t"
        csv += k['transactions'][0]['tId']
        csv += "\t"
        csv += "\n"

text_file = open("new_sub.csv", "w")
text_file.write(csv)
text_file.close()
I hope the above code makes sense; I am new to Python.
Read the JSON file and create a dictionary mapping each account's pAccId to the account itself.
Build a sequence of transactions as well.
import json

with open('sub.json', 'r') as file:
    records = json.load(file)

# map each account's pAccId to the full account record
accounts = {
    account['pAccId']: account
    for record in records
    for institution in record['institutions']
    for account in institution['accounts']
}

# lazily iterate over every transaction in every record
transactions = (
    transaction
    for record in records
    for transaction in record['transactions']
)
Open a csv file. For each transaction, get the matching account from the accounts dictionary.
with open('new_sub.csv', 'w') as file:
    file.write('id, pAccId, tId\n')
    for transaction in transactions:
        pAccId = transaction['accId']
        account = accounts[pAccId]
        _id = account['_id']
        tId = transaction['tId']
        file.write(f"{_id}, {pAccId}, {tId}\n")
Finally, read the csv file into a pandas.DataFrame.
import pandas as pd

df = pd.read_csv('new_sub.csv')
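If the intermediate CSV isn't actually needed, the same join can feed a DataFrame directly; a sketch reusing the records list and the accounts dictionary built above:
import pandas as pd

rows = [
    {
        'id': accounts[transaction['accId']]['_id'],
        'pAccId': transaction['accId'],
        'tId': transaction['tId'],
    }
    for record in records
    for transaction in record['transactions']
]
df = pd.DataFrame(rows)
print(df)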
I wish to get the value of consumptionSavings from the following JSON, which is stored as a .txt file.
{
    "_id": "58edf905746de21c401a3dce",
    "sites": [{
        "ecms": [{
            "consumptionSavings": 148,
            "equipmentCost": 3455,
            {
                "energySource": "Electricity",
                "consumptionReduction": {
                    "amount": 345435,
                    "unit": "MWh"
                },
                "projectDurationMonths": 36
            }
        }
    }
    ]
]
}
I wrote the following code to extract the value of consumptionSavings:
import xlwings as xw
import pandas as pd
import json
data = json.load(open('data.txt'))
# Create a Pandas dataframe from the data.
df = pd.DataFrame({'data':[data["sites"]["ecms"]["consumptionSavings"]]})
wb = xw.Book('Values.xlsx')
ws = wb.sheets['Sheet1']
ws.range('C3').options(index=False).value = df
wb = xw.Book('Result.xlsx')
wb.save()
xw.apps[0].quit()
and it returns the following error:
TypeError: list indices must be integers or slices, not str
I am a bit confused about how that could be. Thank you.
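For reference, the TypeError comes from using a string key on a list: "sites" and "ecms" are JSON arrays, so they need an integer index before the next key lookup. Assuming the file parses and the nesting is as intended, something along these lines should reach the value:
import json

with open('data.txt') as f:
    data = json.load(f)

# "sites" and "ecms" are lists, so pick an element (here the first) before the key lookup
consumption_savings = data["sites"][0]["ecms"][0]["consumptionSavings"]
print(consumption_savings)  # 148 for the sample above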
I want to combine some meta information together with a Pandas DataFrame as a JSON string.
I can call df.to_json(orient='values') to get the DataFrame's data as an array, but how do I combine it with some additional data?
result = {
    'meta': {'some': 'meta info'},
    'data': [[dataframe.values], [list], [...]]
}
I could also ask: How do I merge a Python object (meta: {...}) into a serialised JSON string (df.to_json())?
You can always convert JSON into Python data.
import json

df_json = df.to_json(orient='values')  # JSON string from the DataFrame
py_data = json.loads(df_json)          # back to Python data (list of rows)
result['extra_data'] = py_data         # merge into the dict holding the meta info
json_all = json.dumps(result)          # serialize everything to JSON again
EDIT:
I found a better solution: use pandas.json.dumps.
The standard json module has problems with the numpy numbers that appear in dictionaries made by pandas.
import pandas as pd
result = { 'meta': {'some': 'meta info'} }
df = pd.DataFrame([[1,2,3], [.1,.2,.3]], columns=('a','b','c'))
# result['extra_data'] = df.to_dict()  # as a dictionary
result['extra_data'] = df
print(pd.json.dumps(result))
result
{
    "extra_data":{
        "a":{"0":1.0,"1":0.1},
        "c":{"0":3.0,"1":0.3},
        "b":{"0":2.0,"1":0.2}
    },
    "meta":{"some":"meta info"}
}
or
import pandas as pd
result = { 'meta': {'some': 'meta info'} }
df = pd.DataFrame([[1,2,3], [.1,.2,.3]], columns=('a','b','c'))
df_dict = df.to_dict()
df_dict['extra_data'] = result
print(pd.json.dumps(df_dict))
result
{
    "a":{"0":1.0,"1":0.1},
    "c":{"0":3.0,"1":0.3},
    "b":{"0":2.0,"1":0.2},
    "extra_data":{"meta":{"some":"meta info"}}
}
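Note that newer pandas releases no longer ship pd.json, so on a current install the same idea can be expressed with the standard json module; a minimal sketch (default=str is just a guard for any values json refuses to serialize, such as numpy scalars):
import json

import pandas as pd

result = {'meta': {'some': 'meta info'}}
df = pd.DataFrame([[1, 2, 3], [.1, .2, .3]], columns=('a', 'b', 'c'))

result['extra_data'] = df.to_dict()     # plain nested dicts
print(json.dumps(result, default=str))  # default=str catches non-serializable values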