Putting rows of pandas dataframe into list form - python

I have a pandas dataframe of the form
   T1 T2
0   A  B
1   C  D
2   B  C
3   D  E
4   F  A
I would like to generate another pandas dataframe in which each of the unique items from T1 and T2 has its own row, with one column holding the name of that unique item and another holding a list of the items it shared a row with in the original dataframe. For example, in this case I would be looking for something of the form:
  Name    List
0    A  [B, F]
1    B  [A, C]
2    C  [D, B]
3    D  [C, E]
4    E     [D]
5    F     [A]
Can someone suggest a proper pandonic (like pythonic but for pandas :)) way to do this? Thanks in advance!
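For reference, a minimal reconstruction of the example frame (my construction, assumed from the question):
import pandas as pd

df = pd.DataFrame({'T1': list('ACBDF'), 'T2': list('BDCEA')})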

IIUC, make a copy with the column labels swapped via pandas.DataFrame.columns, concatenate it to the original, and group:
df2 = df.copy()
df2.columns = df.columns[::-1]  # swap the column labels
new_df = pd.concat([df, df2])   # every pair now appears in both directions
new_df.groupby("T1")["T2"].apply(list).reset_index()
Output:
  T1      T2
0  A  [B, F]
1  B  [C, A]
2  C  [D, B]
3  D  [E, C]
4  E     [D]
5  F     [A]
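If you want the exact column names from the question, a small rename at the end does it (a sketch, reusing the same new_df):
out = (new_df.groupby('T1')['T2']
             .apply(list)
             .reset_index()
             .rename(columns={'T1': 'Name', 'T2': 'List'}))
print(out)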

Related

Aggregate values pandas

I have a pandas dataframe like this:
Id  A  B  C  D
 1  a  b  c  d
 2  a  b     d
 2  a     c  d
 3  a        d
 3  a  b  c
I want to fill the empty values in columns B, C and D using the values contained in the other rows that share the same Id.
The resulting data frame should be the following:
Id  A  B  C  D
 1  a  b  c  d
 2  a  b  c  d
 3  a  b  c  d
It is also possible to have different values in the first column (A) for the same Id. In that case, instead of keeping the first instance, I would prefer to put another value indicating this event. So, for example:
Id  A  B  C  D
 1  a  b  c  d
 2  a  b     d
 2  x     c  d
It becomes:
Id  A  B  C  D
 1  a  b  c  d
 2  f  b  c  d
IIUC, you can use groupby with agg:
>>> df.groupby('Id').agg({'A': lambda x: x.iloc[0] if len(x.unique()) == 1 else 'f',
...                       'B': 'first', 'C': 'first', 'D': 'first'})
    A  B  C  D
Id
1   a  b  c  d
2   f  b  c  d
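If there are more columns than B, C and D, the agg dict can be built programmatically instead of spelled out (a sketch along the same lines, assuming only 'Id' and 'A' need special handling):
# every plain column just takes the first value in its group
agg_map = {c: 'first' for c in df.columns if c not in ('Id', 'A')}
# 'A' keeps its value only if it is unique within the group, else the marker 'f'
agg_map['A'] = lambda x: x.iloc[0] if len(x.unique()) == 1 else 'f'
result = df.groupby('Id').agg(agg_map)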
The best way I can think of to do this is to iterate through each unique Id, slice its rows out of the original dataframe, and construct a new row by merging the relevant rows:
def aggregate(df):
    # assumes every cell (including Id) is a string and empty cells are ""
    ids = df['Id'].unique()
    rows = []
    for id in ids:
        relevant = df[df['Id'] == id]          # all rows for this Id
        newrow = {c: "" for c in df.columns}
        for _, row in relevant.iterrows():
            for col in newrow:
                if row[col]:                   # skip empty cells
                    if len(newrow[col]):
                        if newrow[col][-1] == row[col]:
                            continue           # value already recorded
                    newrow[col] += row[col]    # conflicting values concatenate
        rows.append(newrow)
    return pd.DataFrame(rows)
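Note that this loop flags conflicting values by concatenating them (Id 2 in the last example would come out as 'ax' in column A) rather than substituting a marker such as 'f'; adapt the inner check if you need the marker behaviour. Usage is simply:
result = aggregate(df)
print(result)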

Pandas swap values for columns

What's the simplest way to achieve the below with pandas?
df1 =
   A  B  C
0  1  1  2
1  2  3  1
2  3  3  2
to
df_result =
        1    2       3
0  [A, B]  [C]      []
1     [C]  [A]     [B]
2      []  [C]  [A, B]
Thanks in advance
Use DataFrame.stack with Series.reset_index to get a DataFrame, aggregate lists with groupby, reshape with Series.unstack, and last remove the index and column names with DataFrame.rename_axis:
df = (df.stack()
        .reset_index(name='val')
        .groupby(['level_0', 'val'])['level_1']
        .agg(list)
        .unstack(fill_value=[])
        .rename_axis(index=None, columns=None))
print(df)
        1    2       3
0  [A, B]  [C]      []
1     [C]  [A]     [B]
2      []  [C]  [A, B]
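An alternative per-row sketch (mine, not from the original answer): for every value that appears anywhere in the frame, collect the columns holding it in each row; values absent from a row naturally map to empty lists.
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 3, 3], 'C': [2, 1, 2]})

vals = sorted(set(df1.to_numpy().ravel()))   # every distinct cell value
df_result = df1.apply(
    lambda row: pd.Series({v: row.index[row == v].tolist() for v in vals}),
    axis=1)
print(df_result)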

Repeating other column values when using pandas.Series.explode()

I have a pandas dataframe of the form
           a  b
0     [a, b]  0
1  [c, d, e]  1
I have written a function to create a list of partial lists:
def partials(l):
    result = []
    for i, elem in enumerate(l):
        result.append(l[:i + 1])
    return result
which, when applied to the series df['a'] and then exploded, using df['a'].apply(partials).explode(), correctly gives:
0          [a]
0       [a, b]
1          [c]
1       [c, d]
1    [c, d, e]
However, this series is necessarily longer than the original. How can I apply this function in place to column a of my dataframe, such that column b repeats its value wherever the corresponding line from the original dataframe is 'exploded', like this?
           a  b
0        [a]  0
0     [a, b]  0
1        [c]  1
1     [c, d]  1
1  [c, d, e]  1
You can join back:
(df['a'].apply(partials)
        .explode()
        .to_frame()
        .join(df.drop('a', axis=1))  # the repeated index realigns b on the join
)
Output:
           a  b
0        [a]  0
0     [a, b]  0
1        [c]  1
1     [c, d]  1
1  [c, d, e]  1
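On pandas 0.25 or later you can also skip the join and let DataFrame.explode repeat b for you (a sketch, assuming the df and partials from the question):
out = df.assign(a=df['a'].apply(partials)).explode('a')
print(out)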

Counting each unique array of an array in each row of a column in a data frame

I am practicing pandas and Python, and I am not so good with for loops. I have a data frame as below; let's say this is df:
Name  Value
A     [[A, B], [C, D]]
B     [[A, B], [D, E]]
C     [[D, E], [K, L], [M, L]]
D     [[K, L]]
I want to go through each row, find the unique arrays, and count them. I have tried np.unique(a, return_index=True), which returns two different lists, but my problem is that I don't know how to go through each array.
Expected result would be:
Value   Counts
[A, B]  2
[D, E]  2
[K, L]  2
[C, D]  1
[M, L]  1
Thank you very much.
Use DataFrame.explode, available in pandas 0.25+:
df.explode('Value')['Value'].value_counts()
Output:
[K, L]    2
[A, B]    2
[D, E]    2
[C, D]    1
[M, L]    1
Name: Value, dtype: int64
Use Series.explode with Series.value_counts:
df = df['Value'].explode().value_counts().rename_axis('Value').reset_index(name='Counts')
print(df)
    Value  Counts
0  [D, E]       2
1  [A, B]       2
2  [K, L]       2
3  [C, D]       1
4  [M, L]       1
NumPy solution:
a, v = np.unique(np.concatenate(df['Value']), axis=0, return_counts=True)
df = pd.DataFrame({'Value': a.tolist(), 'Counts': v})
print(df)
    Value  Counts
0  [A, B]       2
1  [C, D]       1
2  [D, E]       2
3  [K, L]       2
4  [M, L]       1
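One caveat (my note, not from the answers above): value_counts hashes its values, and Python lists are unhashable, so depending on your pandas version the explode-based versions may raise a TypeError. Mapping the lists to tuples first is a safe workaround, assuming the original df:
counts = (df['Value'].explode()   # one inner list per row
            .map(tuple)           # tuples are hashable, lists are not
            .value_counts())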

python 3 get the column name depending on a condition [duplicate]

This question already has answers here:
Create a column in a dataframe that is a string of characters summarizing data in other columns (3 answers)
Closed 4 years ago.
So I have a pandas df (Python 3.6) like this:
index  A  B  C  ...
A      1  5  0
B      0  0  1
C      1  2  4
...
As you can see, the index values are the same as the column names.
What I'm trying to do is to get a new column in the dataframe that holds the names of the columns where the value is greater than 0:
index  A  B  C  ...  NewColumn
A      1  5  0       [A, B]
B      0  0  1       [C]
C      1  2  4       [A, B, C]
...
I've been trying with iterrows with no success. I also know I can melt and pivot, but I think there should be a way with apply and a lambda, maybe?
Thanks in advance
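For reference, a minimal reconstruction of the example frame (my construction, assumed from the question):
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1], 'B': [5, 0, 2], 'C': [0, 1, 4]},
                  index=list('ABC'))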
If the new column should be a string, compare with DataFrame.gt, take the dot product with the column names, and last remove the trailing separator:
df['NewColumn'] = df.gt(0).dot(df.columns + ', ').str.rstrip(', ')
print(df)
   A  B  C NewColumn
A  1  5  0      A, B
B  0  0  1         C
C  1  2  4   A, B, C
And for lists, use apply with a lambda function:
df['NewColumn'] = df.gt(0).apply(lambda x: x.index[x].tolist(), axis=1)
print(df)
   A  B  C  NewColumn
A  1  5  0     [A, B]
B  0  0  1        [C]
C  1  2  4  [A, B, C]
Use:
df['NewColumn'] = df.apply(lambda x: list(x[x.gt(0)].index), axis=1)
   A  B  C  NewColumn
A  1  5  0     [A, B]
B  0  0  1        [C]
C  1  2  4  [A, B, C]
You could use .gt to check which values are greater than 0 and .dot to obtain the corresponding columns, then .apply(list) to turn the results into lists:
df.loc[:, 'NewColumn'] = df.gt(0).dot(df.columns).apply(list)
       A  B  C  NewColumn
index
A      1  5  0     [A, B]
B      0  0  1        [C]
C      1  2  4  [A, B, C]
Note: this works with single-letter column names only; otherwise you could do the following, using @ (the operator form of .dot):
df.loc[:, 'NewColumn'] = ((df.gt(0) @ df.columns.map('{},'.format))
                          .str.rstrip(',').str.split(','))
       A  B  C  NewColumn
index
A      1  5  0     [A, B]
B      0  0  1        [C]
C      1  2  4  [A, B, C]
