I'm new to Python.
I have a DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': [1, 2, 3],
                   'conversions': [[{'action_type': 'type1',
                                     'value': '1',
                                     'value_plus_10': '11'},
                                    {'action_type': 'type2',
                                     'value': '2',
                                     'value_plus_10': '12'}],
                                   np.nan,
                                   [{'action_type': 'type3',
                                     'value': '3',
                                     'value_plus_10': '13'},
                                    {'action_type': 'type4',
                                     'value': '4',
                                     'value_plus_10': '14'}]]})
where each value in the conversions column is either a list or NaN.
The values in conversions look like this:
print(df['conversions'][0])
>>> [{'action_type': 'type1', 'value': '1', 'value_plus_10': '11'}, {'action_type': 'type2', 'value': '2', 'value_plus_10': '12'}]
But that's hard to manipulate, so I want each element in conversions to be either a DataFrame or NaN, like this:
print(df['conversions'][0])
>>>
action_type value value_plus_10
0 type1 1 11
1 type2 2 12
print(df['conversions'][1])
>>> nan
print(df['conversions'][2])
>>>
action_type value value_plus_10
0 type3 3 13
1 type4 4 14
Here's what I tried:
df['conversions'] = df['conversions'].apply(lambda x : pd.DataFrame(x) if type(x)=='list' else x)
which runs without error, but nothing actually changes.
I could only find ways to convert a Series to a DataFrame, but what I'm trying to do is convert the elements of a Series into DataFrames.
Is that possible? Thanks a lot!
Edit: sorry for the unclear expected output; I hope it's clear now.
You can apply the DataFrame constructor to the conversions column:
df['conversions'] = df['conversions'].apply(lambda x: pd.DataFrame(x) if isinstance(x, list) else x)
print(df['conversions'][0])
Output:
action_type value value_plus_10
0 type1 1 11
1 type2 2 12
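For reference, the original attempt changed nothing because type(x) returns the class object list, never the string 'list', so the condition is always False and apply returns every element unchanged. A quick check:
print(type([1, 2]) == 'list')    # False: a type is never equal to a string
print(isinstance([1, 2], list))  # True: the correct test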
Edit: it seems I misread your question (which is a bit unclear, to be fair), since you say this doesn't give the expected result. Are you trying to get all elements into one DataFrame? In that case you can use concat:
df_out = pd.concat([
    pd.DataFrame(x) for x in df['conversions'] if isinstance(x, list)
])
print(df_out)
Output:
action_type value value_plus_10
0 type1 1 11
1 type2 2 12
0 type3 3 13
1 type4 4 14
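Note that the repeated 0/1 index labels come from the individual frames; if you'd rather have a fresh running index, pd.concat accepts ignore_index=True:
df_out = pd.concat(
    [pd.DataFrame(x) for x in df['conversions'] if isinstance(x, list)],
    ignore_index=True,
)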
I have a dataframe like this:
df:
Collection ID
0 [{'tom': 'one'}, {'tom': 'two'}] 10
1 [{'nick': 'one'}] 10
2 [{'julie': 'one'}] 14
When the 'ID' column has duplicate values, I want to set a new column 'status' to 1 on the duplicate row whose 'Collection' list is longest, and 0 on the others (a row with a unique 'ID', like ID 14, also gets 1).
Resultant df should look like:
df:
Collection ID status
0 [{'tom': 'one'}, {'tom': 'two'}] 10 1
1 [{'nick': 'one'}] 10 0
2 [{'julie': 'one'}] 14 1
I tried np.where, which was the closest approach to my problem I could find on Stack Overflow, but I'm failing to find an alternative to df['Collection'].str.len() that gives me the length of each list:
df['status'] = np.where(df["Collection"].str.len() > 1, 1, 0)
Thanks in advance.
The df as a dict:
{'Collection': {0: [{'tom': 'one'}, {'tom': 'two'}],
1: [{'nick': 'one'}],
2: [{'julie': 'one'}]},
'ID': {0: 10, 1: 10, 2: 14}}
IIUC, you can do:
df.loc[df.assign(l=df['Collection'].apply(len)).groupby('ID').idxmax()['l'], 'status'] = 1
df['status'] = df['status'].fillna(0).astype(int)
In later versions of pandas, you may need to pass numeric_only=True to idxmax().
Output:
Collection ID status
0 [{'tom': 'one'}, {'tom': 'two'}] 10 1
1 [{'nick': 'one'}] 10 0
2 [{'julie': 'one'}] 14 1
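For newer pandas versions, here is a minimal sketch of the same idea that sidesteps the numeric_only issue by selecting the helper column before calling idxmax():
lengths = df['Collection'].map(len)
idx = df.assign(l=lengths).groupby('ID')['l'].idxmax()  # row index of the longest list per ID
df['status'] = 0
df.loc[idx, 'status'] = 1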
A possible solution:
df['status'] = df['Collection'].map(len)
df['status'] = (df.groupby('ID', sort=False)
                  .apply(lambda g: 1 * g['status'].eq(max(g['status'])))
                  .reset_index(drop=True))
Output:
Collection ID status
0 [{'tom': 'one'}, {'tom': 'two'}] 10 1
1 [{'nick': 'one'}] 10 0
2 [{'julie': 'one'}] 14 1
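A more concise variant of the same idea uses transform to broadcast each group's maximum length (note this swaps apply for transform, and any ties within a group would all get 1):
lens = df['Collection'].map(len)
df['status'] = lens.eq(lens.groupby(df['ID']).transform('max')).astype(int)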
Here is a simple pandas DataFrame:
import pandas as pd

data = {'Name': ['John', 'Dav', 'Ann', 'Mike', 'Dany'],
        'Number': ['2', '3', '2', '4', '2']}
df = pd.DataFrame(data, columns=['Name', 'Number'])
df
I would like to add a third column named "color" whose value is 'Red' if Number is 2 and 'Blue' if Number is 3.
This DataFrame has just 5 rows here; in reality it has thousands of rows, so I cannot simply fill in the column manually.
You can use .map:
dct = {2: "Red", 3: "Blue"}
df["color"] = df["Number"].astype(int).map(dct) # remove .astype(int) if the values are already integer
print(df)
Prints:
Name Number color
0 John 2 Red
1 Dav 3 Blue
2 Ann 2 Red
3 Mike 4 NaN
4 Dany 2 Red
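If you'd rather have a default label than NaN for numbers missing from the dict (Mike's 4 above), you can chain fillna; the "Unknown" value here is just an assumed placeholder:
df["color"] = df["Number"].astype(int).map(dct).fillna("Unknown")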
df = pd.DataFrame({'id': ['id1', 'id1', 'id2', 'id2'],
                   'value': ['1', '2', '10', '20'],
                   'index': ['day1', 'day2', 'day1', 'day2']})
How can I transform this data correctly (and concisely) with pandas so that it results in:
      id1  id2
day1    1   10
day2    2   20
Maybe something with groupby but without aggregation? I don't know what to google; can you help me?
Thank you very much.
Use pandas pivot_table. It reshapes the DataFrame based on the given index, columns, and values:
pd.pivot_table(df, index=['index'], columns=['id'], values='value').reset_index()
Just remember to cast the value column to float or integer first: pivot_table aggregates, and the default aggregation cannot handle strings.
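A minimal end-to-end sketch, assuming the DataFrame from the question is named df:
df['value'] = df['value'].astype(int)  # pivot_table aggregates, so values must be numeric
out = pd.pivot_table(df, index='index', columns='id', values='value').reset_index()
print(out)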
I have a list of lists of dictionaries, such as the following:
[[{'ID': '1', 'Value': '100'},
  {'ID': '2', 'Value': '200'}],
 [{'ID': '2', 'Value': '300'},
  {'ID': '2', 'Value': '300'}],
 ...]
I want to convert it into a denormalized DataFrame that has a new column for each key, such as:
   ID Value  ID Value
0   1   100   2   300
1   2   200   2   300
If one item has three ID/Value pairs, those cells should be null for the other items. Running pd.DataFrame on the list creates only one ID and one Value column and stacks all the values underneath. How can I get them as separate columns?
You can do it with the concat function:
data = [pd.DataFrame(i) for i in input_data]
out = pd.concat(data, axis=1)
print(out)
Prints:
ID Value ID Value
0 1 100 2 300
1 2 200 2 300
The key is axis=1, which concatenates along the column axis.
Edit:
I just saw the requirement about zeros for the "shorter" columns. This code produces NaN instead of zero, but that is quickly fixed with the fillna() method:
out = out.fillna(value=0)
Example:
import pandas as pd
input_data = [[{'ID': '1', 'Value': '100'},
               {'ID': '2', 'Value': '200'}],
              [{'ID': '2', 'Value': '300'},
               {'ID': '2', 'Value': '300'}],
              [{'ID': '2', 'Value': '300'},
               {'ID': '2', 'Value': '300'},
               {'ID': '3', 'Value': '300'}]]
data = [pd.DataFrame(i) for i in input_data]
out = pd.concat(data, axis=1)
out = out.fillna(value=0)
print(out)
Prints:
ID Value ID Value ID Value
0 1 100 2 300 2 300
1 2 200 2 300 2 300
2 0 0 0 0 3 300
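One caveat with this approach: the repeated ID/Value column names can be awkward to select from later. If that matters, pd.concat's keys argument labels each block with an extra column level:
out = pd.concat(data, axis=1, keys=range(len(data)))
# columns become (0, 'ID'), (0, 'Value'), (1, 'ID'), (1, 'Value'), ...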
I am trying to group some signals and concatenate the text at the same time. For that I use something similar to the code below, where the custom function sum_x concatenates the strings under 'text':
import pandas as pd

lst = [{'name': 'A', 'reg': '1', 'text': 'txt1', 'value': 5},
       {'name': 'A', 'reg': '1', 'text': 'txt2', 'value': 2},
       {'name': 'B', 'reg': '2', 'text': 'txt3', 'value': 2}]
data = pd.DataFrame(lst)

sum_x = lambda x: x.sum()
data.groupby(by=['name', 'reg']).apply(sum_x)
Out[48]:
         name reg      text value
name reg
A    1     AA  11  txt1txt2     7
B    2      B   2      txt3     2
This, however, does not produce the expected result: the text column was concatenated as expected, but the 'by' columns were concatenated as well ('AA' and '11'), and the result has an extra index level.
Is it possible to obtain something like:
name reg text value
A 1 txt1txt2 7
B 2 txt3 2
where the columns in 'by' are preserved?
Try this:
In [21]: data.groupby(['name', 'reg']).agg({'value':'sum', 'text':'sum'}).reset_index()
Out[21]:
name reg text value
0 A 1 txt1txt2 7
1 B 2 txt3 2
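As a side note, 'sum' on a column of strings concatenates them, which works but is a bit opaque. Here is a sketch of the same result with named aggregation (pandas 0.25+), using an explicit ''.join:
out = data.groupby(['name', 'reg'], as_index=False).agg(
    text=('text', ''.join),   # concatenate the strings in each group
    value=('value', 'sum'),
)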
You can select the columns the groupby operation works on:
In [21]: data.groupby(by=['name', 'reg'])[['text', 'value']].apply(sum_x)
Out[21]:
text value
name reg
A 1 txt1txt2 7
B 2 txt3 2
Finally, if you do not want the name and reg in the index, you can use reset_index():
In [22]: data.groupby(by=['name', 'reg'])[['text', 'value']].apply(sum_x).reset_index()
Out[22]:
name reg text value
0 A 1 txt1txt2 7
1 B 2 txt3 2
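Alternatively, passing as_index=False to groupby keeps name and reg as regular columns, so the reset_index() step isn't needed; a minimal sketch, assuming your pandas version concatenates strings under sum():
data.groupby(['name', 'reg'], as_index=False)[['text', 'value']].sum()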