For each row, I am computing values and storing them in a dictionary. I want to be able to take the dictionary and add it to the row where the keys are columns.
For example:
Dataframe
A B C
1 2 3
Dictionary:
{
'D': 4,
'E': 5
}
Result:
A B C D E
1 2 3 4 5
There will be more than one row in the dataframe, and for each row I'm computing a dictionary that might not necessarily have the same exact keys.
I ended up doing this to get it to work:
applied_df = df.apply(lambda row: func(row), axis='columns', result_type='expand')
df = pd.concat([df, applied_df], axis='columns')
def func(row):
    ...
    return pd.Series(dictionary)
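For reference, a minimal runnable version of this workaround, with a hypothetical `func` standing in for the per-row computation:

```python
import pandas as pd

def func(row):
    # hypothetical per-row computation; returns the new columns as a Series
    return pd.Series({'D': row['A'] + 3, 'E': row['B'] + 3})

df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
applied_df = df.apply(func, axis='columns', result_type='expand')
df = pd.concat([df, applied_df], axis='columns')
print(df)
#    A  B  C  D  E
# 0  1  2  3  4  5
```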
If you want the dict values to appear in each row of the original dataframe, use:
d = {
'D': 4,
'E': 5
}
df_result = df.join(df.apply(lambda x: pd.Series(d), axis=1))
Demo
Data Input:
df
A B C
0 1 2 3
1 11 12 13
Output:
df_result = df.join(df.apply(lambda x: pd.Series(d), axis=1))
A B C D E
0 1 2 3 4 5
1 11 12 13 4 5
If you just want the dict to appear in the first row of the original dataframe, use:
d = {
'D': 4,
'E': 5
}
df_result = df.join(pd.Series(d).to_frame().T)
A B C D E
0 1 2 3 4.0 5.0
1 11 12 13 NaN NaN
Simply loop over your dictionary and assign the values.
df = pd.DataFrame(columns=['A', 'B', 'C'], data=[[1,2,3]])
# You can test with df = pd.DataFrame(columns=['A', 'B', 'C'], data=[[1,2,3], [8,0,33]]), too.
d = {
'D': 4,
'E': 5
}
for k, v in d.items():
    df[k] = v
print(df)
Output:
   A  B  C  D  E
0  1  2  3  4  5
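Since the question notes that the dictionaries may not share the same keys across rows, here is a hedged sketch (with a made-up `compute` function) showing that returning a `pd.Series` per row aligns differing keys into columns, filling missing ones with NaN:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 11], 'B': [2, 12], 'C': [3, 13]})

def compute(row):
    # made-up per-row computation; note the keys differ between rows
    return {'D': row['A'] + 3} if row['A'] == 1 else {'E': row['A'] + 3}

# returning a Series per row lets pandas align the differing keys into columns
expanded = df.apply(lambda row: pd.Series(compute(row)), axis='columns')
result = pd.concat([df, expanded], axis='columns')
# row 0 gets D=4.0 and E=NaN; row 1 gets D=NaN and E=14.0
```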
I have a dataset like:
Data
a
a
a
a
a
b
b
b
a
a
b
I want to add a column like the one below. Each value has the form a1,1: the first part (a1) labels which run of the event this is (first run of a, second run of a, and so on), and the number after the comma is the position within that run. Is there a way to do this using Python?
Data Frequency
a a1,1
a a1,2
a a1,3
a a1,4
a a1,5
b b1,1
b b1,2
b b1,3
a a2,1
a a2,2
b b2,1
You can use:
# identify changes in Data
m = df['Data'].ne(df['Data'].shift()).cumsum()
# cumulated increments within groups
g1 = df.groupby(m).cumcount().add(1).astype(str)
# increments of different subgroups per Data
g2 = (df.loc[~m.duplicated(), 'Data']
        .groupby(df['Data']).cumcount().add(1)
        .reindex(df.index, method='ffill')
        .astype(str)
      )
df['Frequency'] = df['Data'].add(g2+','+g1)
output:
Data Frequency
0 a a1,1
1 a a1,2
2 a a1,3
3 a a1,4
4 a a1,5
5 b b1,1
6 b b1,2
7 b b1,3
8 a a2,1
9 a a2,2
10 b b2,1
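Put together as a self-contained script with the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Data': list('aaaaabbbaab')})

# identify consecutive runs of the same value
m = df['Data'].ne(df['Data'].shift()).cumsum()
# 1-based position within each run
g1 = df.groupby(m).cumcount().add(1).astype(str)
# run number per value: 1st run of 'a', 2nd run of 'a', ...
g2 = (df.loc[~m.duplicated(), 'Data']
        .groupby(df['Data']).cumcount().add(1)
        .reindex(df.index, method='ffill')
        .astype(str))
df['Frequency'] = df['Data'].add(g2 + ',' + g1)
print(df)
```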
Code:
from itertools import groupby
k = [key for key, _group in groupby(df['Data'].tolist())] #OUTPUT ['a', 'b', 'a', 'b']
Key = [v+f'{k[:i].count(v)+1}' for i,v in enumerate(k)] #OUTPUT ['a1', 'b1', 'a2', 'b2']
Sum = [sum(1 for _ in _group) for key, _group in groupby(df['Data'].tolist())] #OUTPUT [5, 3, 2, 1]
df['Frequency'] = [f'{K},{S}' for I, K in enumerate(Key) for S in range(1, Sum[I]+1)]
Output:
   Data Frequency
0     a      a1,1
1     a      a1,2
2     a      a1,3
3     a      a1,4
4     a      a1,5
5     b      b1,1
6     b      b1,2
7     b      b1,3
8     a      a2,1
9     a      a2,2
10    b      b2,1
def function1(dd: pd.DataFrame):
    dd2 = (dd.assign(col2=dd.col1.ne(dd.col1.shift()).cumsum())
             .assign(col2=lambda dd: dd.Data + dd.col2.astype(str))
             .assign(rk=dd.groupby('col1').col1.transform('cumcount').astype(int) + 1)
             .assign(col3=lambda dd: dd.col2 + ',' + dd.rk.astype(str)))
    return dd2.loc[:, ['Data', 'col3']]
df1.assign(col1=df1.ne(df1.shift()).cumsum()).groupby(['Data']).apply(function1)
Data col3
0 a a1,1
1 a a1,2
2 a a1,3
3 a a1,4
4 a a1,5
5 b b1,1
6 b b1,2
7 b b1,3
8 a a2,1
9 a a2,2
10 b b2,1
Let’s say I have the following Pandas dataframe, where the 'key' column only contains unique strings:
import pandas as pd
df = pd.DataFrame({'key':['b','d','c','a','e','f'], 'value': [0,0,0,0,0,0]})
df
key value
0 b 0
1 d 0
2 c 0
3 a 0
4 e 0
5 f 0
Now I have a list of unique keys and a list of corresponding values:
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3, 4]
I want to update the 'value' column in the same order as the lists, so that each row gets a matched 'key' and 'value' ('a' to 1, 'b' to 2, 'c' to 3, 'd' to 4). I am using the following code, but the dataframe seems to update the values from top to bottom instead, which I don't quite understand:
df.loc[df['key'].isin(keys),'value'] = values
df
key value
0 b 1
1 d 2
2 c 3
3 a 4
4 e 0
5 f 0
To be clear, I am expecting to get
key value
0 b 2
1 d 4
2 c 3
3 a 1
4 e 0
5 f 0
Any suggestions?
Use map:
dd = dict(zip(keys, values))
df['value'] = df['key'].map(dd).fillna(df['value'])
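A runnable check with the question's data (note that because `map` leaves NaN for unmatched keys, the column becomes float after `fillna`):

```python
import pandas as pd

df = pd.DataFrame({'key': ['b', 'd', 'c', 'a', 'e', 'f'],
                   'value': [0, 0, 0, 0, 0, 0]})
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3, 4]

dd = dict(zip(keys, values))
df['value'] = df['key'].map(dd).fillna(df['value'])
print(df)
# values align by key label: b->2, d->4, c->3, a->1; e and f keep 0
```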
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3, 4]
# form a dictionary with keys and values list
d=dict(zip(keys, values))
# update the value where a mapping exists, using loc and map
df.loc[df['key'].map(d).notna(), 'value'] = df['key'].map(d)
df
key value
0 b 2
1 d 4
2 c 3
3 a 1
4 e 0
5 f 0
with a temporary dataframe:
temp_df = df.set_index('key')
temp_df.loc[keys] = np.array(values).reshape(-1, 1)
df = temp_df.reset_index()
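The temporary-dataframe approach, assembled into a runnable sketch; the point is that `.loc` with a list of labels aligns by key, not by row position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['b', 'd', 'c', 'a', 'e', 'f'],
                   'value': [0, 0, 0, 0, 0, 0]})
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3, 4]

temp_df = df.set_index('key')
# label-based assignment: each value lands on the row whose key matches
temp_df.loc[keys] = np.array(values).reshape(-1, 1)
df = temp_df.reset_index()
print(df)
```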
Let's say I have the following Pandas dataframe. It is what it is and the input can't be changed.
df1 = pd.DataFrame(np.array([['a', 1,'e', 5],
['b', 2, 'f', 6],
['c', 3, 'g', 7],
['d', 4, 'h', 8]]))
df1.columns = [1,1,2,2]
See how the columns have the same name? The output I want is to have columns with the same name combined (not summed or concatenated), meaning the second column 1 is added to the end of the first column 1, like so:
df2 = pd.DataFrame(np.array([['a', 'e'],
['b','f'],
['c', 'g'],
['d', 'h'],
[1,5],
[2,6],
[3,7],
[4,8]]))
df2.columns = [1,2]
How do I do this? I can do it manually, except I actually have like 10 column titles, about 100 iterations of each title, and several thousand rows, so it takes forever and I have to redo it with each new dataset.
EDIT: the columns in actual datasets are unequal in length.
Try with groupby and explode:
output = df1.groupby(level=0, axis=1).agg(lambda x: x.values.tolist()).explode(df1.columns.unique().tolist())
>>> output
1 2
0 a e
0 1 5
1 b f
1 2 6
2 c g
2 3 7
3 d h
3 4 8
Edit:
To reorder the rows, you can do:
output = output.assign(order=output.groupby(level=0).cumcount()).sort_values("order",ignore_index=True).drop("order",axis=1)
>>> output
1 2
0 a e
1 b f
2 c g
3 d h
4 1 5
5 2 6
6 3 7
7 4 8
Depending on the size of your data, you could split the data into a dictionary and then create a new data frame from that:
df1 = pd.DataFrame(np.array([['a', 1, 'e', 5],
['b', 2, 'f', 6],
['c', 3, 'g', 7],
['d', 4, 'h', 8]]))
df1.columns = [1, 1, 2, 2]
dictionary = {}
for column in df1.columns.unique():
    items = []
    # transpose so the duplicated columns are collected one after another,
    # column by column rather than row by row
    for sub in df1[[column]].T.values.tolist():
        items += sub
    dictionary[column] = items
new_df = pd.DataFrame(dictionary)
print(new_df)
You can use a dictionary whose default value is list and loop through the dataframe columns. Use the column name as dictionary key and append the column value to the dictionary value.
from collections import defaultdict
d = defaultdict(list)
for i, col in enumerate(df1.columns):
    d[col].extend(df1.iloc[:, i].values.tolist())
df = pd.DataFrame.from_dict(d, orient='index').T
print(df)
1 2
0 a e
1 b f
2 c g
3 d h
4 1 5
5 2 6
6 3 7
7 4 8
For df1.columns = [1,1,2,3], the output is
1 2 3
0 a e 5
1 b f 6
2 c g 7
3 d h 8
4 1 None None
5 2 None None
6 3 None None
7 4 None None
If I understand correctly, this seems to work:
pd.concat([s.reset_index(drop=True) for _, s in df1.melt().groupby("variable")["value"]], axis=1)
Output:
In [3]: pd.concat([s.reset_index(drop=True) for _, s in df1.melt().groupby("variable")["value"]], axis=1)
Out[3]:
value value
0 a e
1 b f
2 c g
3 d h
4 1 5
5 2 6
6 3 7
7 4 8
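The only blemish is the duplicated `value` column labels; assuming the same df1 as above, they can be restored from the original (unique) column names after concatenating:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 1, 'e', 5],
                             ['b', 2, 'f', 6],
                             ['c', 3, 'g', 7],
                             ['d', 4, 'h', 8]]))
df1.columns = [1, 1, 2, 2]

# melt stacks the columns end to end; groupby("variable") regroups same-named ones
out = pd.concat(
    [s.reset_index(drop=True) for _, s in df1.melt().groupby("variable")["value"]],
    axis=1,
)
# restore the original (de-duplicated) column labels
out.columns = df1.columns.unique()
print(out)
```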
I have a DataFrame, in which there a column which is composed of dict, I want to extract all the keys and values and make them as two new columns.
a b
0 1 {'a': 1, 'b': 2}
1 2 {'k': 4, 'v': 6}
2 3 {'z': 3}
The output would be
a k v
0 1 a 1
1 1 b 2
2 2 k 4
3 2 v 6
4 3 z 3
Use a list comprehension to flatten the values into a list of tuples, then pass it to the DataFrame constructor:
L = [(x, k, v) for x, y in df[['a','b']].values for k, v in y.items()]
df = pd.DataFrame(L, columns=['a','k','v'])
print (df)
a k v
0 1 a 1
1 1 b 2
2 2 k 4
3 2 v 6
4 3 z 3
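Assembled with the sample frame so it can be run directly (a and b are the question's columns):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [{'a': 1, 'b': 2}, {'k': 4, 'v': 6}, {'z': 3}]})

# one (a, key, value) tuple per dictionary entry, flattened across rows
L = [(x, k, v) for x, y in df[['a', 'b']].values for k, v in y.items()]
out = pd.DataFrame(L, columns=['a', 'k', 'v'])
print(out)
```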
EDIT: For a general solution that works with any unique index, extract the b column with DataFrame.pop, record the index values in a helper idx column, set it as the index, and finally use DataFrame.join:
L = [(x, k, v) for x, y in df.pop('b').items() for k, v in y.items()]
df1 = pd.DataFrame(L, columns=['idx','k','v']).set_index('idx').rename_axis(None)
df = df.join(df1).reset_index(drop=True)
print (df)
a k v
0 1 a 1
1 1 b 2
2 2 k 4
3 2 v 6
4 3 z 3
You can try groupby, apply, rename_axis and reset_index:
>>> df.groupby('a').apply(lambda x: pd.Series(x.b.iloc[0], name='v'))
       .rename_axis(['a','k']).reset_index()
a k v
0 1 a 1
1 1 b 2
2 2 k 4
3 2 v 6
4 3 z 3
First, I created the original data frame:
df = pd.DataFrame({'a': [1, 2, 3],
'b': [{'a': 1, 'b': 2},
{'k': 4, 'v': 6},
{'z': 3}]
})
Then, iterate over rows of the data frame, and iterate over the dictionary in Column B:
ts = list()
for row in df.itertuples():
    for key, value in row.b.items():
        t = (row.Index, row.a, key, value)
        ts.append(t)
print(pd.DataFrame(data=ts, columns=['Index', 'a', 'k', 'v']).set_index('Index'))
a k v
Index
0 1 a 1
0 1 b 2
1 2 k 4
1 2 v 6
2 3 z 3
You could expand the dictionary, explode the column and apply the pd.Series function to get your result :
df = pd.DataFrame({"a": [1, 2, 3], "b": [{"a": 1, "b": 2}, {"k": 4, "v": 6}, {"z": 3}]})
divider = df.columns.get_loc("b")
# expand dictionary within the `b` column
df["b"] = [tuple(entry.items()) for entry in df.b]
# merge dataframe before `b`, the exploded `b` column, and the dataframe after `b`
merger = (
    df.iloc[:, :-divider],
    df.b.explode().apply(pd.Series).set_axis(["k", "v"], axis=1),
    df.iloc[:, :-(divider + 1)],
)
pd.concat(merger, axis=1)
a k v
0 1 a 1
0 1 b 2
1 2 k 4
1 2 v 6
2 3 z 3
Consider we have 2 dataframes:
df = pd.DataFrame(columns = ['a','b','c']) ##empty
d = {'a': [1, 2], 'b': [3, 4]}
df1 = pd.DataFrame(data=d)
How can I join them so that the result is:
   a  b    c
0  1  3  NaN
1  2  4  NaN
Use reindex with the columns of df:
df = pd.DataFrame(columns = ['a','b','c'])
d = {'a': [1, 2], 'b': [3, 4]}
df1 = pd.DataFrame(data=d).reindex(columns=df.columns)
print (df1)
a b c
0 1 3 NaN
1 2 4 NaN
Difference between the solutions: if the columns are not in sorted order, you get different output:
#different order
df = pd.DataFrame(columns = ['c','a','b'])
d = {'a': [1, 2], 'b': [3, 4]}
df1 = pd.DataFrame(data=d)
print (df1.reindex(columns=df.columns))
c a b
0 NaN 1 3
1 NaN 2 4
print (df1.merge(df,how='left'))
a b c
0 1 3 NaN
1 2 4 NaN
How can I join them?

If the dataframe already exists somewhere (you are not creating a new one), do:
df1.merge(df,how='left')
a b c
0 1 3 NaN
1 2 4 NaN
Note: This produces sorted columns, so if the columns are already in sorted order it will work fine; otherwise it won't.