I have a bunch of dataframes, like the ones below:
import pandas as pd
data1 = [['1', '2', 'mary', 123], ['1', '3', 'john', 234 ], ['2', '4', 'layla', 345 ]]
data2 = [['2', '6', 'josh', 345], ['1', '2', 'dolores', 987], ['1', '4', 'kate', 843]]
df1 = pd.DataFrame(data1, columns = ['state', 'city', 'name', 'number1'])
df2 = pd.DataFrame(data2, columns = ['state', 'city', 'name', 'number1'])
For some silly reason I need to transform each row into a dictionary and collect them in a list, like this:
list(
    df1.apply(
        lambda x: {
            "profile": {"state": x["state"], "city": x["city"], "name": x["name"]},
            "number1": x["number1"],
        },
        axis=1,
    )
)
which returns exactly what I need:
[{'profile': {'state': '1', 'city': '2', 'name': 'mary'}, 'number1': 123},
{'profile': {'state': '1', 'city': '3', 'name': 'john'}, 'number1': 234},
{'profile': {'state': '2', 'city': '4', 'name': 'layla'}, 'number1': 345}]
It works if I do it for each dataframe, but I need to write a function so I can use it later. Also, I need to be able to store the results for df1 and df2 separately after the operation.
I tried something like this:
df_list = [df1, df2]
for row in df_list:
    row = list(row.apply(lambda x: {'send': {'state':x['state'], 'city':x['city'], 'name':x['name']}, 'number1':x['number1']}, axis=1))
but this keeps only the result for the last dataframe in the list (df2), stored in row.
also, I tried something like this (and a lot of other stuff):
new_values = []
for row in df_list:
    row = list(row.apply(lambda x: {'send': {'state':x['state'],'city':x['city'],'name':x['name']},'number1':x['number1']}, axis=1))
    new_values.append(df_list)
I know it might be because I'm not saving the row value anywhere. I've read a lot of posts here similar to my problem, but I couldn't manage to fully use them. Any help will be appreciated, I'm really stuck here.
Do you mean this?
def func(df):
    return list(df.apply(lambda x:{'profile' : {'state': x['state'],'city': x['city'],'name':x['name']},'number1': x['number1']}, axis=1))
You can use it like this:
df1 = func(df1)
Also, if you want to apply it to all of the dataframes at once:
df1, df2 = [func(df) for df in [df1, df2]]
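The loop in the question loses its results because assigning to the loop variable only rebinds that name; it never changes df_list or stores anything. A minimal sketch of one way to keep each result separately, assuming a dictionary keyed by a label is acceptable (the names to_profile_records and results are hypothetical, and to_dict("records") is swapped in for apply):

```python
import pandas as pd

data1 = [['1', '2', 'mary', 123], ['1', '3', 'john', 234], ['2', '4', 'layla', 345]]
data2 = [['2', '6', 'josh', 345], ['1', '2', 'dolores', 987], ['1', '4', 'kate', 843]]
cols = ['state', 'city', 'name', 'number1']
df1 = pd.DataFrame(data1, columns=cols)
df2 = pd.DataFrame(data2, columns=cols)


def to_profile_records(df):
    """Return one nested dict per row of df (hypothetical helper name)."""
    return [
        {
            "profile": {"state": row["state"], "city": row["city"], "name": row["name"]},
            "number1": row["number1"],
        }
        for row in df.to_dict("records")
    ]


# Store each converted list under its own key instead of rebinding the loop variable.
results = {label: to_profile_records(df) for label, df in {"df1": df1, "df2": df2}.items()}
```

results["df1"] and results["df2"] can then be used independently, and the original dataframes are left untouched.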
Related
I have a dataframe like this:
df = pd.DataFrame({
'ID': ['1', '4', '4', '3', '3', '3'],
'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']
})
and I have a dictionary named tag_dict:
{'1': {'Granted'},
'3': {'Granted'}}
The keys of the dictionary match some of the IDs in the ID column of the dataframe.
Now I want to create a new column "Tag" in the dataframe such that:
if a value in the ID column matches a key of the dictionary, the value for that key is placed in the field; otherwise '-' is placed there.
The output should look like this:
df = pd.DataFrame({
'ID': ['1', '4', '4', '3', '3', '3'],
'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket'],
'tag':['Granted','-','-','Granted','Granted','Granted']
})
import pandas as pd
df = pd.DataFrame({
'ID': ['1', '4', '4', '3', '3', '3'],
'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']})
# I've removed the {} around your items. Feel free to add more key:value pairs
my_dict = {'1': 'Granted', '3': 'Granted'}
# use .map() to match your keys to their values
df['Tag'] = df['ID'].map(my_dict)
# if required, fill in NaN values with '-'
nan_rows = df['Tag'].isna()
df.loc[nan_rows, 'Tag'] = '-'
df
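The two-step NaN replacement above can also be written as a single chained call. A small sketch, assuming the same my_dict with plain string values:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '4', '4', '3', '3', '3'],
    'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket'],
})
my_dict = {'1': 'Granted', '3': 'Granted'}

# map() leaves NaN for IDs that are not keys of my_dict; fillna() replaces those with '-'
df['Tag'] = df['ID'].map(my_dict).fillna('-')
```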
I'm not sure what the purpose of the curly brackets around Granted is, but you could use apply:
df = pd.DataFrame({
'ID': ['1', '4', '4', '3', '3', '3'],
'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']
})
tag_dict = {'1': 'Granted',
'3': 'Granted'}
df['tag'] = df['ID'].apply(lambda x: tag_dict.get(x, '-'))
print(df)
Output:
ID club tag
0 1 arts Granted
1 4 math -
2 4 theatre -
3 3 poetry Granted
4 3 dance Granted
5 3 cricket Granted
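Solution with .map, keeping the original tag_dict whose values are sets:
# [*x][0] unpacks the set and takes its first element
df["tag"] = df["ID"].map(tag_dict).apply(lambda x: "-" if pd.isna(x) else [*x][0])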
print(df)
Prints:
ID club tag
0 1 arts Granted
1 4 math -
2 4 theatre -
3 3 poetry Granted
4 3 dance Granted
5 3 cricket Granted
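If the set-valued dictionary from the question has to be kept as the input, another option is to flatten it once and then map as usual. This is only a sketch and assumes every set holds exactly one tag; flat_tags is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '4', '4', '3', '3', '3'],
    'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket'],
})
tag_dict = {'1': {'Granted'}, '3': {'Granted'}}

# Assumption: each set contains a single tag, so next(iter(...)) picks that one element.
flat_tags = {key: next(iter(tags)) for key, tags in tag_dict.items()}

df['tag'] = df['ID'].map(flat_tags).fillna('-')
```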
How can I sort a dictionary using the values from a list?
names = ['bread', 'banana', 'apple']
prices = ['2', '4', '1']
...
dict = {'name': names, 'price': prices}
dict is now {'name': ['bread', 'banana', 'apple'], 'price': ['2', '4', '1']}
I want to sort the dictionary so that the first name corresponds to the lowest price, the second name to the next lowest, and so on.
Is this possible to achieve with sorting on a dictionary?
Example
sorted_dict = {'name': ['apple', 'bread', 'banana'], 'price': ['1', '2', '4']}
IIUC, you want to sort the first list (in name) based on the values of the second list (in price).
If that's what you want, then a quick way is to use pandas, since the data structure you have (dict of lists), fits really nicely with a pd.DataFrame.
import pandas as pd
pd.DataFrame(d).sort_values('price').to_dict('list')
{'name': ['apple', 'bread', 'banana'], 'price': ['1', '2', '4']}
Added an example as per the OP's modified request:
names = ['bread', 'banana', 'apple']
prices = ['2', '4', '1']
description = ['a','x','b']
...
d = {'name': names, 'price': prices, 'description':description}
pd.DataFrame(d).sort_values('price').to_dict('list')
{'name': ['apple', 'bread', 'banana'],
'price': ['1', '2', '4'],
'description': ['b', 'a', 'x']}
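The same reordering can be sketched without pandas. Note that the prices are strings, so a plain sort compares them as text ('10' would come before '9'); the sketch below assumes they should be compared as numbers and converts them with float() for the comparison only. The names order and sorted_d are hypothetical:

```python
names = ['bread', 'banana', 'apple']
prices = ['2', '4', '1']
d = {'name': names, 'price': prices}

# Row positions in ascending price order; float() compares the prices numerically.
order = sorted(range(len(d['price'])), key=lambda i: float(d['price'][i]))

# Apply the same ordering to every list in the dictionary.
sorted_d = {key: [values[i] for i in order] for key, values in d.items()}
```

Because every list is reordered with the same index order, extra keys such as 'description' stay aligned with their names.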
I usually work with data that looks like this: {'id': '1', 'start_date': '2012-04-8', 'end_date': '2012-08-06'}, but now I have something very different. I have a list of items where each pair of elements represents one item:
data = [
{'id': '1', 'field': 'end_tmie', 'value': '2012-08-06'},
{'id': '1', 'field': 'start_date', 'value': '2012-04-8'},
{'id': '2', 'field': 'end_tmie', 'value': '2012-01-06'},
{'id': '2', 'field': 'start_date', 'value': '2012-03-8'},
]
Goal 1: how do I get the duration (end_tmie minus start_date) for each pair of data points with the same id in pandas?
Desired data:
df = [
{'id': '1', 'durations': '2012-08-06 - 2012-04-8'},
{'id': '2', 'durations': '2012-01-06 - 2012-03-8'},
]
Goal 2: how do I reshape the data to look like this?
df = [
{'id':'1', 'start':'2012-04-8', 'end':'2012-08-06'},
{'id':'2', 'start':'2012-03-8', 'end':'2012-01-06'},
]
First create the DataFrame with the constructor, then use DataFrame.pivot and rename the columns. For the duration, subtract the columns and convert the timedeltas to days with Series.dt.days:
df = pd.DataFrame(data)
df['value'] = pd.to_datetime(df['value'])
df = df.pivot(index='id',columns='field',values='value').rename(columns={'start_date':'start','end_tmie':'end'})
df['durations'] = df['end'].sub(df['start']).dt.days
Finally, select the columns to export and convert them with DataFrame.to_dict:
d1 = df['durations'].reset_index().to_dict('records')
print (d1)
[{'id': '1', 'durations': 120}, {'id': '2', 'durations': -62}]
d2 = df[['start','end']].apply(lambda x: x.dt.strftime('%Y-%m-%d')).reset_index().to_dict('records')
print (d2)
[{'id': '1', 'start': '2012-04-08', 'end': '2012-08-06'},
{'id': '2', 'start': '2012-03-08', 'end': '2012-01-06'}]
Assuming there is at most one start_date and one end_tmie value for each id, pd.pivot_table() should do the job.
>>> import pandas as pd
>>> data = [
... {'id': '1', 'field': 'end_tmie', 'value': '2012-08-06'},
... {'id': '1', 'field': 'start_date', 'value': '2012-04-8'},
... {'id': '2', 'field': 'end_tmie', 'value': '2012-01-06'},
... {'id': '2', 'field': 'start_date', 'value': '2012-03-8'},
... ]
>>> df = pd.DataFrame(data)
>>> df.pivot_table('value', 'id', 'field', lambda x: x).sort_index(ascending=False, axis=1).assign(duration=lambda x: pd.to_datetime(x['end_tmie']) - pd.to_datetime(x['start_date']))
field start_date end_tmie duration
id
1 2012-04-8 2012-08-06 120 days
2 2012-03-8 2012-01-06 -62 days
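A variant of the pivot_table approach, sketched with the built-in "first" aggregation instead of an identity lambda and with the dates parsed up front. It relies on the same assumption of one value per id and field; wide and duration_days are hypothetical names:

```python
import pandas as pd

data = [
    {'id': '1', 'field': 'end_tmie', 'value': '2012-08-06'},
    {'id': '1', 'field': 'start_date', 'value': '2012-04-8'},
    {'id': '2', 'field': 'end_tmie', 'value': '2012-01-06'},
    {'id': '2', 'field': 'start_date', 'value': '2012-03-8'},
]

df = pd.DataFrame(data)
df['value'] = pd.to_datetime(df['value'], format='%Y-%m-%d')

# One row per id, one column per field; "first" simply keeps the single value in each cell.
wide = df.pivot_table(index='id', columns='field', values='value', aggfunc='first')

# Subtract the two datetime columns and keep the difference as whole days.
wide['duration_days'] = (wide['end_tmie'] - wide['start_date']).dt.days
```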
I have a single df that includes multiple json strings per row that need reading and normalizing.
I can read out the json info and normalize the columns by storing each row as a new dataframe in a list, which I have done with the code below.
However I need to append the original unique Id in the original df (i.e. 'id': ['9clpa','g659am']) - which is lost in my current code.
The expected output is a list of dataframes per Id that include the exploded json info, with an additional column including Id (which will be repeated for each row of the final df).
I hope that makes sense, any suggestions are very welcome. thanks so much
dataframe
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
current code
out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
expected output
pd.DataFrame(data={'id': ['9clpa','9clpa'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
pd.DataFrame(data={'id': ['g659am','g659am'],'qi': ['01','02'], 'answers': ['{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}', '"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"']})
df = pd.DataFrame(data={'id': ['9clpa','g659am'],'i2': [('{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}'),('{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"G","value":"3"},{"answer":"V","value":"4"}]},{"qi":"02","answers":[{"answer":"M","value":"1"},{"answer":"F","value":"2"},{"answer":"A","value":"3"},{"answer":"B","value":"4"},{"answer":"G","value":"5"},{"answer":"NC","value":"6"},{"answer":"O","value":"7"} ]}]}')]})
out={}
columns1 = ['id','qi','answers']
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    df_new = pd.DataFrame(data=out[i],columns=columns1)
    df_new = df_new.assign(id = lambda x: df.id[i])
    display(df_new)
You can use assign with a lambda function to add the value of 'id' to the newly formed dataframe.
Edit: You can set the position of the 'id' column through columns1, which defines the column order when you create the dataframe.
You are just missing the step that assigns the id to your dataframe after you normalize the columns:
out={}
for i in range(len(df)):
    out[i] = pd.read_json(df.i2[i])
    out[i] = pd.json_normalize(out[i].q)
    out[i]['id'] = df.id[i]
    out[i] = out[i].loc[:, ['id','qi','answers']]
Output:
>>> out[0]
id qi answers
0 9clpa 01 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'G', 'value': '3'}, {'answer': 'V', 'value': '4'}]
1 9clpa 02 [{'answer': 'M', 'value': '1'}, {'answer': 'F', 'value': '2'}, {'answer': 'A', 'value': '3'}, {'answer': 'B', 'value': '4'}, {'answer': 'G', 'value': '5'}, {'answer': 'NC', 'value': '6'}, {'answer': 'O', 'value': '7'}]
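The same idea can be sketched with the standard json module doing the parsing, which avoids passing a literal JSON string to pd.read_json. The data below is a shortened stand-in with the same structure as the question's (fewer answers per question), and keying the result by id rather than by position is an assumption made for illustration:

```python
import json

import pandas as pd

# Shortened stand-in for the question's dataframe: same keys, fewer answers.
df = pd.DataFrame({
    'id': ['9clpa', 'g659am'],
    'i2': [
        '{"t":"unique678","q":[{"qi":"01","answers":[{"answer":"M","value":"1"}]},'
        '{"qi":"02","answers":[{"answer":"F","value":"2"}]}]}',
        '{"t":"unique428","q":[{"qi":"01","answers":[{"answer":"G","value":"3"}]},'
        '{"qi":"02","answers":[{"answer":"V","value":"4"}]}]}',
    ],
})

out = {}
for row_id, raw in zip(df['id'], df['i2']):
    parsed = json.loads(raw)                 # plain dict with keys "t" and "q"
    frame = pd.json_normalize(parsed['q'])   # one row per question, columns qi and answers
    frame.insert(0, 'id', row_id)            # repeat the original id on every row
    out[row_id] = frame
```

Each entry of out is then a dataframe with the columns id, qi and answers for one original row.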
You can use .json_normalize (doc here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html)
(from https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e)
How do I reshape my dataframe, built as shown below, using Python? (The before and after tables were images and are not reproduced here.)
df1 = pd.DataFrame({'Name':['John', 'Martin', 'Ricky'], 'Age': ['25', '27', '22'], 'Car1': ['Hyundai', 'VW', 'Ford'], 'Car2': ['Maruti', 'Merc', 'NA']})
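You want:
# melt the car columns of df1 into a single 'car' column, one row per person and car
df_melted = pd.melt(df1, id_vars=['Name', 'Age'], value_vars=['Car1', 'Car2'], var_name='car_number', value_name='car')
df_melted.drop('car_number', axis=1, inplace=True)
df_melted.sort_values('Name', inplace=True)
# dropna() removes real NaN values only; the string 'NA' in df1 is kept
df_melted.dropna(inplace=True)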
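A self-contained sketch of the melt approach on the question's df1. Since the target table is not available, it assumes the goal is one row per person and car, and it filters out the placeholder string 'NA' explicitly because dropna() would not remove it. long_df is a hypothetical name:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Martin', 'Ricky'], 'Age': ['25', '27', '22'],
                    'Car1': ['Hyundai', 'VW', 'Ford'], 'Car2': ['Maruti', 'Merc', 'NA']})

long_df = (
    df1.melt(id_vars=['Name', 'Age'], value_vars=['Car1', 'Car2'],
             var_name='car_number', value_name='car')
       .drop(columns='car_number')            # the Car1/Car2 label is not needed afterwards
       .query("car != 'NA'")                  # 'NA' is a plain string here, not a missing value
       .sort_values('Name', kind='stable')    # stable sort keeps Car1 before Car2 per person
       .reset_index(drop=True)
)
```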