I have the following dataframe:
teste.head(5)
card_id feature_1 feature_2
0 C_ID_92a2005557 5 2
1 C_ID_3d0044924f 4 1
2 C_ID_d639edf6cd 2 2
3 C_ID_186d6a6901 4 3
4 C_ID_cdbd2c0db2 1 3
And I have this other dataframe:
historical.head(5)
authorized_flag card_id city_id category_1 installments category_3 merchant_category_id merchant_id
0 Y C_ID_cdbd2c0db2 88 N 0 A 80 M_ID_e020e9b302
1 Y C_ID_92a2005557 88 N 0 A 367 M_ID_86ec983688
2 Y C_ID_d639edf6cd 88 N 0 A 80 M_ID_979ed661fc
3 Y C_ID_d639edf6cd 88 N 0 A 560 M_ID_e6d5ae8ea6
4 Y C_ID_92a2005557 88 N 0 A 80 M_ID_e020e9b302
Comments:
The first dataframe has only some information about the card_id and the value I want to predict (target)
The second dataframe looks like the history of each card_id containing the columns I need to merge to the first dataframe (giving more information / columns for each card_id)
Obviously, card_id is repeated many times in the second dataframe, so from it I need to build a new dataframe with a single row per card_id (not letting card_id multiply).
I can use:
historical.groupby('card_id').size()
and create a new column with the number of times each card_id was used.
But I'm not able to do this with the rest of the columns: I need to sum the values in each column per card_id so that I can then merge the result with the first dataframe.
Can you help me create the new columns in the best way?
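For what it's worth, a minimal sketch of one way to do this is groupby.agg followed by a merge. The output column names (n_transactions, total_installments, n_authorized) are made up for illustration, and the aggregation choices are assumptions: sums only make sense for numeric columns, while categorical ones usually need a count or a mode instead.
import pandas as pd

# aggregate the history so there is exactly one row per card_id
agg = (historical
       .groupby('card_id')
       .agg(n_transactions=('card_id', 'size'),
            total_installments=('installments', 'sum'),
            n_authorized=('authorized_flag', lambda s: (s == 'Y').sum()))
       .reset_index())

# bring the per-card aggregates over to the first dataframe
teste = teste.merge(agg, on='card_id', how='left')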
I have a dataframe ("MUNg") like this:
MUN_id Col1
1-2 a
3 b
4-5-6 c
...
And another dataframe ("ppc") like this:
id population
0 1 20
1 2 25
2 3 4
3 4 45
4 5 100
5 6 50
...
I need to create a column in "MUNg" that contains the total population, obtained by summing the population values from "ppc" for the ids present in MUN_id.
Expected result:
MUN_id Col1 total_population
1-2 a 45
3 b 4
4-5-6 c 195
...
I'm not showing what I tried, because I am new to Python and don't know how to approach this.
MUNg['total_population']=?
Many thanks!
You can split and explode your string into new rows, map the population data, and group by the original index to get the sum:
MUNg['total_population'] = (MUNg['MUN_id']
.str.split('-')
.explode()
.astype(int) # required if "id" in "ppc" is an integer, comment if string
.map(ppc.set_index('id')['population'])
.groupby(level=0).sum()
)
output:
MUN_id Col1 total_population
0 1-2 a 45
1 3 b 4
2 4-5-6 c 195
I have multiple dataframes that I scraped from a website using pandas.read_html(). The tables have different numbers of rows and columns, but they belong to a single entity, so I want to add each of those dataframes to an individual cell of the same row.
Here is an example.
I have the following dataframes:
df1 = pd.DataFrame([[1, 2, 3]] * 2)
df1
df2 = pd.DataFrame([['a', 'b']] * 3)
df2
df3 = pd.DataFrame([[23, 565, 34, 67, 34]] * 1)
df3
df4 = pd.DataFrame([['df', 'grgrd', 'weddv', 'dfdf', 're', 45, 93]] * 5)
df4
and this is how I am trying to do it:
d = {}
d['a'] = df1
d['b'] = df2
d['c'] = df3
d['d'] = df4
df_out = pd.DataFrame([d])
but the result looks like this:
a b c d
0 0 1 2 0 1 2 3 1 1 2 3 0 1 0 a b 1 a b 2 a b 0 1 2 3 4 0 23 565 34 67 34 0 1 2 3 4 5 6 0 df g...
Looks like the index counters are also added as values to the cells.
How do I remove indices values?
Is there a way that they are stored in a way that they would appear as a table within individual cells?
Is there a better way to do it?
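A minimal sketch of one workaround, assuming you only need the raw cell values rather than DataFrame objects, is to store each table as a plain list of rows, which drops the index and column labels from the cells:
# v.values.tolist() keeps only the data, so the scraped row/column
# labels no longer appear inside the cells
d = {'a': df1, 'b': df2, 'c': df3, 'd': df4}
df_out = pd.DataFrame([{k: v.values.tolist() for k, v in d.items()}])
Each cell then holds a nested list such as [[1, 2, 3], [1, 2, 3]] instead of a full DataFrame repr.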
I basically have two related columns in a dataframe in Python. One column is binary (1, 0, 0, 1, 0, etc.) and the other holds a related value (200, 34, 124, etc.). I want to take all the zero rows with their corresponding values in the adjacent column and create a new dataframe, and do the same for all the ones. An illustration of the columns is below:
Location Price
1 24
0 200
0 56
0 89
1 101
1 94
1 3
You can make two new dataframes with just ones and zeros like this, IIUC:
df[df.Location == 0]
# Location Price
#1 0 200
#2 0 56
#3 0 89
df[df.Location == 1]
# Location Price
#0 1 24
#4 1 101
#5 1 94
#6 1 3
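If you want to keep working with the two subsets, a small follow-up (the variable names df_zeros and df_ones are just illustrative) is to assign each filter to its own dataframe:
# reset_index(drop=True) gives each subset a fresh 0..n-1 index and
# returns a new dataframe, so the subsets are safe to modify later
df_zeros = df[df.Location == 0].reset_index(drop=True)
df_ones = df[df.Location == 1].reset_index(drop=True)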
I have a dataframe like this:
userid itemid timestamp
1 1 50
1 2 50
1 3 50
1 4 60
2 1 40
2 2 50
I want to drop all rows whose userid occurs more than 2 times and get a new dataframe as follows. Can someone help me? Thanks.
userid itemid timestamp
2 1 40
2 2 50
You can use pd.Series.value_counts to find the userid values that occur more than twice, then use pd.Series.isin to filter them out of your original dataframe.
c = df['userid'].value_counts()
idx = c[c > 2].index
res = df[~df['userid'].isin(idx)]
print(res)
userid itemid timestamp
4 2 1 40
5 2 2 50
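As an aside, the same filter can be written in one line with GroupBy.transform, which broadcasts each group's size back onto the original rows (same result, different idiom):
# keep only rows whose userid appears at most twice
res = df[df.groupby('userid')['userid'].transform('size') <= 2]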
I have a pandas dataframe that looks something like this:
id group gender age_grp status
1 1 m over21 active
2 4 m under21 active
3 2 f over21 inactive
I have over 100 columns and thousands of rows. I am trying to create a single pandas dataframe of the value_counts of each of the columns. So I want something that looks like this:
group1
gender m 100
f 89
age_grp over21 98
under21 11
status active 87
inactive 42
Anyone know a simple way to iteratively concat the value_counts from each of the 100+ columns in the original dataset while capturing the name of the columns as a hierarchical index?
Eventually I want to be able to merge with another dataframe of a different group to look like this:
group1 group2
gender m 100 75
f 89 92
age_grp over21 98 71
under21 11 22
status active 87 44
inactive 42 13
Thanks!
This should do it:
df.stack().groupby(level=1).value_counts()
id 1 1
2 1
3 1
group 1 1
2 1
4 1
gender m 2
f 1
age_grp over21 2
under21 1
status active 2
inactive 1
dtype: int64
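For the eventual two-group layout, one sketch is to compute the same counts per group and concatenate them side by side; df_g1 and df_g2 here are assumed stand-ins for the two groups' dataframes:
import pandas as pd

g1 = df_g1.stack().groupby(level=1).value_counts().rename('group1')
g2 = df_g2.stack().groupby(level=1).value_counts().rename('group2')

# align on the (column, value) index; combinations missing from one
# group show up as NaN
combined = pd.concat([g1, g2], axis=1)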