Pivot table based on the first value of the group in Pandas - python

I have the following DataFrame:
I'm trying to pivot it in pandas and achieve the following format:
I tried the classic approach with pd.pivot_table(), but it doesn't work out:
pd.pivot_table(df, values='col2', index=[df.index], columns='col1')
Would appreciate some suggestions :) Thanks!
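(The example frames were posted as images; a plausible reconstruction of the input, inferred from the pivoted outputs below, is shown here. The exact row order is an assumption.)
import pandas as pd

df = pd.DataFrame({'col1': list('abcabcabc'),             # hypothetical group labels
                   'col2': [1, 2, 9, 4, 5, 0, 6, 8, 7]})  # values consistent with the outputs below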

You can use pivot and then dropna for each column:
>>> df.pivot(columns='col1', values='col2').apply(lambda x: x.dropna().tolist()).astype(int)
col1  a  b  c
0     1  2  9
1     4  5  0
2     6  8  7
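Note that this lines up only because every col1 group has the same number of values; with unequal group sizes, the apply returns a Series of lists instead of a rectangular DataFrame and the astype(int) call fails.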

Another option is to create a Series of lists using groupby.agg; then construct a DataFrame:
out = (df.groupby('col1')['col2'].agg(list)
         .pipe(lambda x: pd.DataFrame(zip(*x), columns=x.index.tolist())))
Output:
   a  b  c
0  1  2  9
1  4  5  0
2  6  8  7
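For clarity, the intermediate Series of lists that pipe receives looks like this (a sketch against the reconstructed df above):
lists = df.groupby('col1')['col2'].agg(list)
# col1
# a    [1, 4, 6]
# b    [2, 5, 8]
# c    [9, 0, 7]
# Name: col2, dtype: object
out = pd.DataFrame(zip(*lists), columns=lists.index.tolist())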

Related

Appending rows to existing pandas dataframe

I have a pandas dataframe df1
   a  b
0  1  2
1  3  4
I have another dataframe in the form of a dictionary
dictionary = {'2' : [5, 6], '3' : [7, 8]}
I want to append the dictionary values as rows in dataframe df1. I am using pandas.DataFrame.from_dict() to convert the dictionary into a dataframe. The constraint is that, when I do it, I cannot provide any value for the 'columns' argument of from_dict().
So when I try to concatenate the two dataframes, pandas adds the contents of the new dataframe as new columns. I do not want that. The final output I want is in this format:
   a  b
0  1  2
1  3  4
2  5  6
3  7  8
Can someone tell me how to do this in the least painful way?
Use concat with the help of pd.DataFrame.from_dict, setting df1's columns during the conversion:
out = pd.concat([df1,
                 pd.DataFrame.from_dict(dictionary, orient='index',
                                        columns=df1.columns)])
Output:
   a  b
0  1  2
1  3  4
2  5  6
3  7  8
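Note that the dictionary keys ('2' and '3', as strings) become the index labels of the appended rows. If you would rather have a fresh 0..n-1 integer index, concat accepts ignore_index=True:
out = pd.concat([df1,
                 pd.DataFrame.from_dict(dictionary, orient='index',
                                        columns=df1.columns)],
                ignore_index=True)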
Another possible solution uses numpy.vstack:
import numpy as np

out = pd.DataFrame(np.vstack([df1.values,
                              np.array(list(dictionary.values()))]),
                   columns=df1.columns)
Output:
   a  b
0  1  2
1  3  4
2  5  6
3  7  8
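Note that the vstack route rebuilds the frame from scratch: the dictionary keys are dropped in favor of a fresh RangeIndex, and if df1 had mixed dtypes, df1.values would upcast everything to object.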

Python pandas groupby get list in the cell (excel)

I'm very new to pandas in Python.
Here's what I'm trying to achieve.
Data before:
What I'm trying to get:
I haven't found a specific way to do this with groupby.
Please help.
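(The screenshots are missing; an input consistent with the output below might be:)
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 5],   # hypothetical values, reverse-engineered
                   'b': [2, 2, 3, 2, 2, 3, 6],   # from the grouped output shown below
                   'c': [3, 4, 5, 3, 4, 5, 7]})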
You can try groupby and then agg:
df['c'] = df['c'].astype(str)
out = df.groupby(['a', 'b'])['c'].agg(','.join).reset_index()
print(out)
   a  b    c
0  1  2  3,4
1  1  3    5
2  2  2  3,4
3  2  3    5
4  5  6    7
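Since the question is about Excel data, you can write the result back to a sheet afterwards (the filename here is made up):
out.to_excel('grouped.xlsx', index=False)  # 'grouped.xlsx' is a hypothetical output path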

Pandas How to find duplicate row in group

Everyone, I'm trying to find duplicate rows in a doubly grouped DataFrame, and I don't understand how to do it.
df_part[df_part.income_flag==1].groupby(['app_id', 'month_num'])['amnt'].duplicate()
For example df:
So I want to see something like this:
So, if I use this code, I see that there are two rows with the same 'amnt' value, 0.387677, but in different months... that's the information I need:
(df_part[(df_part.income_flag==2) & df_part.duplicated(['app_id','amnt'], keep=False)]
    .groupby(['app_id', 'amnt', 'month_num'])['month_num'].count().head(10))
app_id  amnt      month_num
0       0.348838  3             1
        0.387677  6             1
                  10            2
        0.426544  2             2
        0.475654  2             1
        0.488173  1             1
1       0.297589  1             1
                  4             1
        0.348838  2             1
        0.426544  8             3
Name: month_num, dtype: int64
Thanks all.
I think you need to chain another mask with & (bitwise AND), using DataFrame.duplicated, and then use GroupBy.size:
df = (df_part[(df_part.income_flag==1) & df_part.duplicated(['app_id','amnt'], keep=False)]
        .groupby('app_id')['amnt']
        .size()
        .reset_index(name='duplicate_count'))
print (df)
   app_id  duplicate_count
0      12                2
1      13                3
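Since the source frame was posted as an image, here is a minimal sketch with made-up data (chosen to reproduce the counts above), showing how keep=False marks every member of a duplicated group:
import pandas as pd

df_part = pd.DataFrame({'app_id': [12, 12, 13, 13, 13, 14],
                        'amnt': [0.38, 0.38, 0.42, 0.42, 0.42, 0.50],
                        'income_flag': 1})

dups = df_part.duplicated(['app_id', 'amnt'], keep=False)  # True for every row of a duplicate group
out = (df_part[(df_part.income_flag == 1) & dups]
         .groupby('app_id')['amnt']
         .size()
         .reset_index(name='duplicate_count'))
# app_id 14 is excluded because its amnt value appears only once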

How to apply .astype() method to a dataframe in Python?

I want to convert multiple columns in a dataframe (pandas) to the type "category" using the method .astype. Here is my code:
df['Field_1'].astype('category').cat.codes
works, however
categories = df.select_dtypes('object')
categories['Field_1'].cat.codes
doesn't.
Would someone please tell me why?
In general, the question is how to apply a method (.astype) to a dataframe. I know how to apply a method to a single column, but applying it to a whole dataframe hasn't been successful, even with a for loop, since the loop yields a Series and the method .cat.codes is not applicable to it.
I think you need to process each column separately with DataFrame.apply and a lambda function. Your code failed because select_dtypes does not convert anything, so categories['Field_1'] is still of object dtype and the .cat accessor only works on categorical columns (nor is .cat.codes implemented for a whole DataFrame):
df = pd.DataFrame({
    'A': list('acbdac'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': list('dddbbb')
})
cols = df.select_dtypes('object').columns
df[cols] = df[cols].apply(lambda x: x.astype('category').cat.codes)
print (df)
   A  B  C  D
0  0  4  7  1
1  2  5  8  1
2  1  4  9  1
3  3  5  4  0
4  0  5  2  0
5  2  4  3  0
A similar idea (not sure whether the output is always identical) is to convert all columns to categorical first with DataFrame.astype:
cols = df.select_dtypes('object').columns
df[cols] = df[cols].astype('category').apply(lambda x: x.cat.codes)
print (df)
   A  B  C  D
0  0  4  7  1
1  2  5  8  1
2  1  4  9  1
3  3  5  4  0
4  0  5  2  0
5  2  4  3  0
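Both snippets agree here because the categories are inferred per column in both cases (sorted unique values), so the codes come out identical. If you only need the categorical dtype rather than integer codes, a single astype call is enough; and if you want codes assigned by order of appearance instead of sorted order, pd.factorize is a possible alternative (a sketch, not what the answer used):
# keep the categorical dtype instead of extracting integer codes:
df[cols] = df[cols].astype('category')

# or number the values by order of appearance rather than sorted order:
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0])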

Is there a way in pandas to create an integer in a new column if a row contains a specific string

For example, I have the following dataframe:
I want to transform the dataframe above into something like this:
Thanks for any kind of help!
Run:
df['Number'] = df.svn_changes.str.match(r'r\d+').cumsum()
Yes, use str.contains with a regex and cumsum:
df = pd.DataFrame({'svn_changes': ['r123456', 'RowValueRow', 'ValueRowValue',
                                   'some_string_string', 'r234566', 'ValueRowValue',
                                   'some_string_string', 'r123789', 'something_here',
                                   'ValueRowValue', 'String_2', 'String_4']})
df['Number'] = df['svn_changes'].str.contains(r'r\d+').cumsum()
print(df)
Output:
           svn_changes  Number
0              r123456       1
1          RowValueRow       1
2        ValueRowValue       1
3   some_string_string       1
4              r234566       2
5        ValueRowValue       2
6   some_string_string       2
7              r123789       3
8       something_here       3
9        ValueRowValue       3
10            String_2       3
11            String_4       3
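Aside: str.match anchors the pattern at the start of each string, while str.contains searches anywhere in it; with this data both flag exactly the r<digits> rows, so the two snippets agree:
df['svn_changes'].str.match(r'r\d+')     # anchored at the beginning of the string
df['svn_changes'].str.contains(r'r\d+')  # matches anywhere in the string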
Here's a simple reusable line you can use to do that:
df['new_col'] = df['old_col'].str.contains('string_to_match')*1
The new column will have value 1 if the string is present in this column, and 0 otherwise.
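Multiplying the boolean Series by 1 casts it to integers; .astype(int) is a more explicit spelling of the same cast:
df['new_col'] = df['old_col'].str.contains('string_to_match').astype(int)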
