I have a DataFrame with 10 rows like this:
id  name       team
1   Employee1  Team1
2   Employee2  Team2
...
How can I generate 10 JSON files from the DataFrame with Python?
Here is the format of each json file:
{
"Company": "Company",
"id": "1",
"name": "Employee1",
"team": "Team1"
}
The field "Company": "Company" is the same in all JSON files.
The name of each JSON file is the employee's name (e.g. Employee1.json).
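I don't really like iterrows, but since you need one file per row, I cannot see how to vectorize the operation:
for _, row in df.astype(str).iterrows():  # astype(str) writes id as "1" rather than 1
    row['Company'] = 'Company'  # row is a copy, so df itself is unchanged
    row.to_json(row['name'] + '.json')  # e.g. Employee1.json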
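If the key order and the string id of the required format matter, a minimal sketch of the same one-file-per-row loop using DataFrame.to_dict(orient="records") and the standard json module could look like this. The function name write_employee_files is hypothetical, and it assumes the columns are exactly id, name and team:

```python
import json

import pandas as pd


def write_employee_files(df: pd.DataFrame, company: str = "Company") -> list:
    """Write one <name>.json file per row of df and return the file names.

    Assumes df has the columns id, name and team, as in the question.
    """
    written = []
    # astype(str) turns the integer id 1 into "1", as in the required format
    for record in df.astype(str).to_dict(orient="records"):
        # Put "Company" first, followed by id, name and team
        payload = {"Company": company, **record}
        filename = record["name"] + ".json"
        with open(filename, "w") as f:
            json.dump(payload, f, indent=4)
        written.append(filename)
    return written
```

Calling write_employee_files(df) writes Employee1.json, Employee2.json, ... into the current directory and returns their names.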
You could use apply in the following way:
df.apply(lambda x: x.to_json(), axis=1)
and pass a file name built from the employee name inside to_json; the name is available to you in x, e.g. x.to_json(x['name'] + '.json').
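Spelled out, the apply approach might look like the sketch below (write_with_apply is a hypothetical name; it assumes the columns id, name and team). The constant column is added with assign, and the columns are reordered so that "Company" comes first:

```python
import pandas as pd


def write_with_apply(df: pd.DataFrame) -> list:
    """Write one <name>.json file per row using DataFrame.apply(axis=1).

    Assumes df has the columns id, name and team, as in the question.
    """
    # Add the constant field and reorder so "Company" is the first key;
    # astype(str) writes the id as "1" rather than 1
    out = df.assign(Company="Company")[["Company", "id", "name", "team"]].astype(str)
    # With axis=1 each row arrives as a Series; Series.to_json(path) writes the
    # file and returns None, so apply is used here only for its side effect
    out.apply(lambda row: row.to_json(row["name"] + ".json"), axis=1)
    return [name + ".json" for name in out["name"]]
```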
Another approach is to iterate over the index:
for i in df.index:
    df.loc[i].to_json("{}.json".format(df.loc[i, 'name']))  # file named after the employee
How can I convert a DataFrame to a JSON file in Python using pandas?
I don't know how to get the name and carmodel columns; I only get the price from the DataFrame.
I have a DataFrame like:
name   carmodel
ACURA  CL          6.806155e+08
       OTHER       2.280000e+08
       EL          1.300000e+08
       MDX         7.828750e+08
       RDX         3.850000e+08
...
VOLVO  XC90        3.748778e+09
ZOTYE  OTHER       1.887500e+08
       HUNTER      1.390000e+08
       T600        4.200000e+08
       Z8          4.754000e+08
I want to convert it to:
{
'ACURA':
{
'CL':6.806155e+08,
'OTHER':2.280000e+08,
'MDX':7.828750e+08,
},
'VOLVO':
{
'XC90':3.748778e+09,
},
'ZOTYE':
{
'OTHER':1.887500e+08,
'HUNTER':1.390000e+08,
'T600':4.200000e+08,
}
}
Here is my code:
import pandas as pd

df = pd.read_csv(r'D:\vucar\scraper\result.csv', dtype='unicode')
df['price'] = pd.to_numeric(df['price'], downcast='float')
cars = df[['name','carmodel','price']].sort_values('name').groupby(['name','carmodel']).mean()['price']
print(cars)
You could move the carmodel level of the index back into a column, group by name again while turning each group's carmodel/price pairs into a dictionary, and then save the result as JSON:
cars = df[['name','carmodel','price']].sort_values('name').groupby(['name','carmodel']).mean().reset_index(level='carmodel')
cars.groupby('name').apply(lambda x: x.set_index('carmodel')['price'].to_dict()).to_json('filename.json', orient='index')
output:
{"ACURA":{"CL":680615500.0,"EL":130000000.0,"MDX":782875000.0,"OTHER":228000000.0,"RDX":385000000.0},"VOLVO":{"XC90":3748778000.0},"ZOTYE":{"HUNTER":139000000.0,"OTHER":188750000.0,"T600":420000000.0,"Z8":475400000.0}}
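As an alternative to the second groupby, the nested dictionary can also be built directly from the cars Series in the question (the mean price indexed by name and carmodel) with a plain loop. A sketch, where nested_price_dict is a hypothetical helper and the sample rows reuse a few values from the question:

```python
import json

import pandas as pd


def nested_price_dict(cars: pd.Series) -> dict:
    """Turn a Series indexed by (name, carmodel) into {name: {carmodel: price}}."""
    result = {}
    for (name, model), price in cars.items():
        # setdefault creates the inner dict the first time a brand is seen
        result.setdefault(name, {})[model] = float(price)
    return result


# Sample rows shaped like the question's result.csv (values taken from the question)
df = pd.DataFrame({
    "name": ["ACURA", "ACURA", "VOLVO", "ZOTYE"],
    "carmodel": ["CL", "MDX", "XC90", "T600"],
    "price": [680615500.0, 782875000.0, 3748778000.0, 420000000.0],
})
cars = df.groupby(["name", "carmodel"])["price"].mean()
print(json.dumps(nested_price_dict(cars)))
```

With the real data, json.dump(nested_price_dict(cars), f) writes the same nested structure to an open file f.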
I am using Python's json module to extract data from a JSON file that was generated from data fetched from a Firebase database.
From this dataset I want to extract all of the data under bills in each entry of the JSON file. For that I created a separate list to hold all of the bills elements in the dataset.
When converted to CSV, the dataset looks like this: [image: CSV for one entry]
The following code does this. However, some entries have null values, shown as [] (see the CSV). I used a list to store only the bills that have data in the bills column, skipping all the null entries. But the required output ends up stored only in the first index of the new list or array.
My code is as below:
import json

import numpy as np

filedata = open('requireddataset.json', 'r')
data = json.load(filedata)
listoffields = []  # To produce it into a list with fields for dic
for dic in data:
    try:
        listoffields.append(dic['bills'])  # only non-essential bill categories.
    except KeyError:
        pass
# print(listoffields[3])  # This would return the first payment entry within
#                         # the JSON Array of objects.
for val in listoffields:
    if val != []:
        x = val[0]  # only val[0] would contain data
        # print(x)
        myarray = np.array(val)
        print(myarray[0])  # All of the data stored in only one index, any way to change this?
This is the output: [image: output]
This is what the original JSON file looks like: requireddataset.json
Essentially, my question is this: listoffields holds the bills field of every entry in the JSON file, and each entry in bills in turn contains id, value, role and many other keys. Is there a way to extract only the values and compute their sum?
In the JSON file, one entry looks like this:
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": "----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at": "", "name": ""}]}]
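One way to get the sum is to skip the intermediate list and NumPy entirely and iterate over the bills directly. A minimal sketch, assuming every bill's value is a number (or a numeric string) and that some entries may lack a bills key; sum_bill_values and the trimmed sample are hypothetical:

```python
import json


def sum_bill_values(entries):
    """Sum the "value" field of every bill in every entry.

    Entries without a "bills" key, or whose bills list is empty ([]),
    contribute nothing. Assumes each value is a number or a numeric string.
    """
    return sum(
        float(bill.get("value", 0))
        for entry in entries
        for bill in entry.get("bills", [])
    )


# A trimmed, hypothetical sample in the same shape as requireddataset.json
data = json.loads("""
[
  {"name": "", "bills": [{"name": "Supermercado", "category": "feeding", "value": 100.0}]},
  {"name": "", "bills": []},
  {"name": ""}
]
""")
print(sum_bill_values(data))
```

With the real file, pass in the list returned by json.load on requireddataset.json.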
I'm trying to replace the 'starters' column of this DataFrame
          starters
roster_id
Bob           3086
Bob           1234
Cam           6130
...            ...
... ...
with the player names from a large nested dict like this. The values in my 'starters' column are the keys.
{
"3086": {
"team": "NE",
"player_id":"3086",
"full_name": "tombrady",
},
"1234": {
"team": "SEA",
"player_id":"1234",
"full_name": "RussellWilson",
},
"6130": {
"team": "BUF",
"player_id":"6130",
"full_name": "DevinSingletary",
},
...
}
I tried using DataFrame.replace(dict) and DataFrame.map(dict), but that gives me back all the player info instead of just the name.
Is there a way to do this with a nested dict? Thanks.
Let df be the DataFrame and d the dictionary; then you can use apply with axis=1 to replace the column:
df['starters'] = df.apply(lambda x: d[str(x.starters)]['full_name'], axis=1)
I am not sure if I understand your question correctly. Have you tried using dict['full_name'] instead of simply dict?
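Following that idea, one option is to flatten the nested dict into a plain id-to-full_name mapping first and then use Series.map, which avoids calling a Python function per row. A sketch; replace_with_names is a hypothetical helper, and the player ids in starters are assumed to be integers or strings:

```python
import pandas as pd


def replace_with_names(df: pd.DataFrame, players: dict) -> pd.DataFrame:
    """Return a copy of df whose 'starters' column holds full names.

    players is the nested dict from the question, keyed by player id strings.
    """
    # Flatten the nested dict to a plain id -> full_name mapping first
    id_to_name = {pid: info["full_name"] for pid, info in players.items()}
    out = df.copy()
    # The dict keys are strings, so convert the ids before mapping;
    # ids missing from the dict become NaN
    out["starters"] = out["starters"].astype(str).map(id_to_name)
    return out
```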
Try pd.concat with Series.map:
>>> pd.concat([
...     df,
...     pd.DataFrame.from_records(
...         df.astype(str)
...           .starters
...           .map(dct)
...           .values
...     ).set_index(df.index)
... ], axis=1)
starters team player_id full_name
roster_id
Bob 3086 NE 3086 tombrady
Bob 1234 SEA 1234 RussellWilson
Cam 6130 BUF 6130 DevinSingletary
I have a column that is a list of dictionaries. I extracted only the values of the name key and saved them to a list. Since I need to feed the column to a TfidfVectorizer, I need each cell to be a string of words. My code is as follows:
import json

def transform(s, to_extract):
    # s is a JSON string holding a list of dicts
    return [obj[to_extract] for obj in json.loads(s)]

cols = ['genres', 'keywords']
for col in cols:
    lst = df[col]
    df[col] = list(map(lambda x: transform(x, to_extract='name'), lst))
    df[col] = [', '.join(x) for x in df[col]]
For testing, here are 2 rows:
data = {'genres': [[{"id": 851, "name": "dual identity"},{"id": 2038, "name": "love of one's life"}],
[{"id": 5983, "name": "pizza boy"},{"id": 8828, "name": "marvel comic"}]],
'keywords': [[{"id": 9663, "name": "sequel"},{"id": 9715, "name": "superhero"}],
[{"id": 14991, "name": "tentacle"},{"id": 34079, "name": "death", "id": 163074, "name": "super villain"}]]
}
df = pd.DataFrame(data)
I'm able to extract the necessary data and save it accordingly. However, I find the code too verbose, and I would like to know if there's a more Pythonic way to achieve the same outcome.
The desired output for each cell is a string delimited only by commas, e.g. 'dual identity,love of one's life'.
Is this what you need?
df.applymap(lambda x : pd.DataFrame(x).name.tolist())
Out[278]:
genres keywords
0 [dual identity, love of one's life] [sequel, superhero]
1 [pizza boy, marvel comic] [tentacle, super villain]
Update
df.applymap(lambda x : pd.DataFrame(x).name.str.cat(sep=','))
Out[280]:
genres keywords
0 dual identity,love of one's life sequel,superhero
1 pizza boy,marvel comic tentacle,super villain
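A variant that avoids building a DataFrame for every cell is to join the names directly with Series.map. In this sketch, join_names and names_to_strings are hypothetical helpers; since the question's code calls json.loads while the test data holds lists, the helper accepts either form:

```python
import json

import pandas as pd


def join_names(cell, key="name", sep=","):
    """Join the `key` values of a list of dicts into one string."""
    # Real data may hold JSON strings (hence json.loads in the question);
    # the test data holds lists, so accept both
    items = json.loads(cell) if isinstance(cell, str) else cell
    return sep.join(item[key] for item in items)


def names_to_strings(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Return a copy of df with each column in cols turned into a comma-delimited string."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].map(join_names)
    return out
```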
I need to load the contents of a JSON file into a pandas DataFrame in a particular layout so that I can run pandasql to transform the data and run it through a scoring model.
file = C:\scoring_model\json.js (contents of 'file' are below)
{
    "response":{
        "version":"1.1",
        "token":"dsfgf",
        "body":{
            "customer":{
                "customer_id":"1234567",
                "verified":"true"
            },
            "contact":{
                "email":"mr#abc.com",
                "mobile_number":"0123456789"
            },
            "personal":{
                "gender": "m",
                "title":"Dr.",
                "last_name":"Muster",
                "first_name":"Max",
                "family_status":"single",
                "dob":"1985-12-23"
            }
        }
    }
}
I need the DataFrame to look like this (all values on the same row; I split it over two lines to fit this question):
version | token | customer_id | verified | email | mobile_number | gender |
1.1 | dsfgf | 1234567 | true | mr#abc.com | 0123456789 | m |
title | last_name | first_name |family_status | dob
Dr. | Muster | Max | single | 23.12.1985
I have looked at the other questions on this topic and tried various ways to load the JSON file into pandas:
import pandas as pd

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = pd.read_json(f.read())

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = f.readlines()
I also tried pd.Panel() as in the solution to "Python Pandas: How to split a sorted dictionary in a column of a dataframe", using the results of yo = f.readlines(). I thought about splitting the contents of each cell on ("") and putting the pieces into different columns, but have had no luck so far.
If you load the entire JSON as a dict (or list), e.g. using json.load, you can use json_normalize:
In [11]: d = {"response": {"body": {"contact": {"email": "mr#abc.com", "mobile_number": "0123456789"}, "personal": {"last_name": "Muster", "gender": "m", "first_name": "Max", "dob": "1985-12-23", "family_status": "single", "title": "Dr."}, "customer": {"verified": "true", "customer_id": "1234567"}}, "token": "dsfgf", "version": "1.1"}}
In [12]: df = pd.json_normalize(d)
In [13]: df.columns = df.columns.map(lambda x: x.split(".")[-1])
In [14]: df
Out[14]:
email mobile_number customer_id verified dob family_status first_name gender last_name title token version
0 mr#abc.com 0123456789 1234567 true 1985-12-23 single Max m Muster Dr. dsfgf 1.1
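To also match the exact layout requested in the question (column order, and dob written as 23.12.1985), the same json_normalize result can be reordered and the date reformatted. A sketch, assuming the leaf key names are unique once the dotted prefixes are stripped; flatten_response and WANTED are hypothetical names:

```python
import pandas as pd

# Column order requested in the question
WANTED = ["version", "token", "customer_id", "verified", "email", "mobile_number",
          "gender", "title", "last_name", "first_name", "family_status", "dob"]


def flatten_response(d: dict) -> pd.DataFrame:
    """Flatten the nested response dict into one row laid out as in the question."""
    df = pd.json_normalize(d)
    # Keep only the last part of dotted names, e.g. response.body.personal.dob -> dob
    df.columns = [c.split(".")[-1] for c in df.columns]
    # The question shows dob as 23.12.1985, i.e. day.month.year
    df["dob"] = pd.to_datetime(df["dob"], format="%Y-%m-%d").dt.strftime("%d.%m.%Y")
    return df[WANTED]
```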
It's much easier if you deserialize the JSON using the built-in json module first (instead of pd.read_json()) and then flatten it using pd.json_normalize().
import json

import pandas as pd

# deserialize
with open(r'C:\scoring_model\json.js', 'r') as f:
    data = json.load(f)
# flatten
df = pd.json_normalize(data)
If a dictionary is passed to json_normalize(), it's flattened into a single row; if a list of dictionaries is passed, each dictionary becomes its own row. So if the nested structure contains only key-value pairs, pd.json_normalize() with no other parameters suffices to flatten it.
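A small illustration of the two cases, using fragments of the data from the question, with the resulting columns and shape noted in comments:

```python
import pandas as pd

# A dict is flattened into a single row; nested keys become dotted column names
one_row = pd.json_normalize({"version": "1.1", "customer": {"id": "1234567", "verified": "true"}})

# A list of dicts gives one row per dict
two_rows = pd.json_normalize([
    {"email": "mr#abc.com", "mobile_number": "0123456789"},
    {"email": "ms#abc.com", "mobile_number": "9876543210"},
])

print(one_row.columns.tolist())  # ['version', 'customer.id', 'customer.verified']
print(two_rows.shape)            # (2, 2)
```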
However, if the data contains a list (a JSON array nested inside the file), pass the record_path= argument so pandas can find the path to the records. For example, suppose the data looks like the following (note that the value under "body" is a list, i.e. a list of records):
data = {
"response":[
{
"version":"1.1",
"customer": {"id": "1234567", "verified":"true"},
"body":[
{"email":"mr#abc.com", "mobile_number":"0123456789"},
{"email":"ms#abc.com", "mobile_number":"9876543210"}
]
},
{
"version":"1.2",
"customer": {"id": "0987654", "verified":"true"},
"body":[
{"email":"master#abc.com", "mobile_number":"9999999999"}
]
}
]
}
Then you can pass record_path= to tell pandas that the records are under "body", and pass meta= to set the paths to the metadata. Note that "version" and "customer" are at the same level as "body" in the data, but "id" is nested one level deeper, so you need to pass a list (['customer', 'id']) to get the value under "id".
df = pd.json_normalize(data['response'], record_path=['body'], meta=['version', ['customer', 'id']])
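Running that call on the data above gives one row per element of the "body" lists, with version and customer.id repeated onto each row. A self-contained version, with the resulting columns noted in comments:

```python
import pandas as pd

data = {
    "response": [
        {"version": "1.1",
         "customer": {"id": "1234567", "verified": "true"},
         "body": [{"email": "mr#abc.com", "mobile_number": "0123456789"},
                  {"email": "ms#abc.com", "mobile_number": "9876543210"}]},
        {"version": "1.2",
         "customer": {"id": "0987654", "verified": "true"},
         "body": [{"email": "master#abc.com", "mobile_number": "9999999999"}]},
    ]
}

df = pd.json_normalize(
    data["response"],
    record_path=["body"],                  # one row per element of each "body" list
    meta=["version", ["customer", "id"]],  # copied onto every row of its record
)
# columns: email, mobile_number, version, customer.id (3 rows in total)
print(df)
```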