Multiple new columns dependent on other column value - python

I have a dataframe that looks like this:
Node Node1 Length Spaces Dist T
1 2 600 30 300 100
1 3 400 20 200 100
2 1 600 30 300 100
2 6 500 25 250 400
3 1 400 20 200 100
3 4 400 20 200 200
3 12 400 20 200 200
4 3 400 20 200 200
4 5 200 10 100 500
4 11 600 30 300 1400
5 4 200 10 100 500
5 6 400 20 200 200
5 9 500 25 250 800
6 2 500 25 250 400
6 5 400 20 200 200
6 8 200 10 100 800
This tells us that, for example in the first row, there are 30 spaces between nodes 1 and 2. How could I create, say, 30 new columns with a value of 1 to represent each space separately, and then do the same for each row?

The code below should work (column 'A' stands in for your 'Spaces'):
import numpy as np
import pandas as pd

# Demo frame: column 'A' plays the role of the 'Spaces' count
df = pd.DataFrame(np.random.randint(0, 10, size=(100, 4)), columns=list('ABCD'))

max_val = df['A'].max()
for itr in range(1, max_val + 1):
    colname = 'A%d' % itr
    # 1 if the row has at least itr spaces, 0 otherwise
    df[colname] = (df['A'] >= itr).astype(int)
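Applied directly to the question's frame, a minimal sketch (assuming the frame is named df and the count column is 'Spaces'):
max_spaces = df['Spaces'].max()
for i in range(1, max_spaces + 1):
    # Space1..SpaceN: 1 if the row has at least i spaces, else 0
    df['Space%d' % i] = (df['Spaces'] >= i).astype(int)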

Related

Comparing the Date Columns of Two Dataframes and Keeping the Rows with the same Dates

I have two dataframes, df1 and df2, and I would like to keep the rows in each dataframe whose dates also appear in the other one:
df1
> Date Price Volume
0 2002-01-04 100 200
1 2002-01-05 200 400
2 2002-01-06 300 600
3 2002-01-07 400 800
4 2002-01-08 500 1000
5 2002-01-09 600 1200
6 2002-01-10 700 1400
df2
> Date Price Volume
0 2002-01-04 100 200
1 2002-01-05 200 400
2 2002-01-06 300 600
3 2002-01-07 400 800
4 2002-01-09 500 1000
5 2002-01-11 600 1200
6 2002-01-12 700 1400
7 2002-01-13 800 1600
Desired output:
df1
> Date Price Volume
0 2002-01-04 100 200
1 2002-01-05 200 400
2 2002-01-06 300 600
3 2002-01-07 400 800
5 2002-01-09 600 1200
df2
> Date Price Volume
0 2002-01-04 100 200
1 2002-01-05 200 400
2 2002-01-06 300 600
3 2002-01-07 400 800
4 2002-01-09 500 1000
First set the Date column as the index, then do the following (Index.intersection preserves order and, unlike a plain set, is accepted by .loc):
common_index = df1.index.intersection(df2.index)
df1 = df1.loc[common_index].copy()
df2 = df2.loc[common_index].copy()
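A self-contained sketch (the frame construction here is hypothetical, abbreviated from the question's data):
import pandas as pd

df1 = pd.DataFrame({'Date': ['2002-01-04', '2002-01-05', '2002-01-09'],
                    'Price': [100, 200, 600]}).set_index('Date')
df2 = pd.DataFrame({'Date': ['2002-01-04', '2002-01-09', '2002-01-11'],
                    'Price': [100, 500, 600]}).set_index('Date')

# Intersection of the two date indexes; .loc then keeps only those rows
common_index = df1.index.intersection(df2.index)
print(df1.loc[common_index])
print(df2.loc[common_index])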

Groupby sequence in order by date, find the min, max based on other column value

I started learning pandas 40 days ago and only know the basic functions.
I have a data frame as shown below.
ID Status Date Cost
0 1 F 2017-06-22 500
1 1 M 2017-07-22 100
2 1 P 2017-10-22 100
3 1 F 2018-06-22 600
4 1 P 2018-08-22 150
5 1 F 2018-10-22 120
6 1 F 2019-03-22 750
7 2 M 2017-06-29 200
8 2 F 2017-09-29 600
9 2 F 2018-01-29 500
10 2 M 2018-03-29 100
11 2 P 2018-08-29 100
12 2 M 2018-10-29 100
13 2 F 2018-12-29 500
14 3 M 2017-03-20 300
15 3 F 2018-06-20 700
16 3 P 2018-08-20 100
17 3 M 2018-10-20 250
18 3 F 2018-11-20 100
19 3 P 2018-12-20 100
20 3 F 2019-03-20 600
22 4 M 2017-08-10 800
23 4 F 2018-06-10 100
24 4 P 2018-08-10 120
25 4 F 2018-10-10 500
26 4 M 2019-01-10 200
27 4 F 2019-06-10 600
31 7 M 2017-08-10 800
32 7 F 2018-06-10 100
33 7 P 2018-08-10 20
34 7 F 2018-10-10 500
35 7 F 2019-01-10 200
The data set is sorted by ID and Date.
Please note that the last Status of every ID is F.
From the above data frame I would like to prepare the data frame below.
ID SLS Cost#SLS Min_Cost Max_Cost Avg_Cost
1 F 120 100 600 261.67
2 M 100 100 600 266.67
3 P 100 100 700 258.33
4 M 200 100 800 344.00
7 F 500 20 800 360.00
SLS = Second Last Status
Please note that Min, Max and Avg Cost are calculated without considering the last row of each ID.
Then replace Cost#SLS with 1000 wherever SLS == F.
The expected data frame is as shown below.
ID SLS Cost#SLS Min_Cost Max_Cost Avg_Cost
1 F 1000 100 600 261.67
2 M 100 100 600 266.67
3 P 100 100 700 258.33
4 M 200 100 800 344.00
7 F 1000 20 800 360.00
Here is one way, slightly modifying piR's answer:
# Drop each ID's last row, then aggregate per ID
s = df[df.ID.duplicated(keep='last')].groupby('ID').agg(
    {'Status': ['last'], 'Cost': ['last', 'min', 'max', 'mean']})
# Replace Cost#SLS with 1000 where the second-last status is F
s.loc[s[('Status', 'last')] == 'F', ('Cost', 'last')] = 1000
s
Status Cost
last last min max mean
ID
1 F 1000 100 600 261.666667
2 M 100 100 600 266.666667
3 P 100 100 700 258.333333
4 M 200 100 800 344.000000
7 F 1000 20 800 355.000000
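If you also want flat column names matching the desired output, a short follow-up sketch (the rename list is an assumption based on the question's headers):
# Flatten the MultiIndex columns and rename to match the question
out = s.copy()
out.columns = ['SLS', 'Cost#SLS', 'Min_Cost', 'Max_Cost', 'Avg_Cost']
out = out.round(2).reset_index()
print(out)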

Python pandas: groupby and divide by the first value of each group

I have a pandas dataframe like this.
>data
ID Distance Speed
1 100 40
1 200 20
1 200 10
2 400 20
2 500 30
2 100 40
2 600 20
2 700 90
3 800 80
3 700 10
3 400 20
I want to group the table by ID and create a new column Time by dividing each value in the Distance column by the first value of the Speed column in each ID group. So the result should look like this.
>data
ID Distance Speed Time
1 100 40 2.5
1 200 20 5
1 200 10 5
2 400 20 20
2 500 30 25
2 100 40 5
2 600 20 30
2 700 90 35
3 800 80 10
3 700 10 8.75
3 400 20 5
My attempt:
data['Time'] = data['Distance'] / data.loc[data.groupby('ID')['Speed'].head(1).index, 'Speed']
But the result is not what I expect. How can this be done?
The attempt misaligns because the right-hand side is indexed only by each group's first row. Use transform with 'first' to return a Series of the same length as the original df:
data['Time'] = data['Distance'] / data.groupby('ID')['Speed'].transform('first')
Or use drop_duplicates with map:
s = data.drop_duplicates('ID').set_index('ID')['Speed']
data['Time'] = data['Distance'] / data['ID'].map(s)
print(data)
ID Distance Speed Time
0 1 100 40 2.50
1 1 200 20 5.00
2 1 200 10 5.00
3 2 400 20 20.00
4 2 500 30 25.00
5 2 100 40 5.00
6 2 600 20 30.00
7 2 700 90 35.00
8 3 800 80 10.00
9 3 700 10 8.75
10 3 400 20 5.00
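For intuition, transform('first') broadcasts each group's first Speed back to every row of that group; a rough equivalent with a lambda (hypothetical and slower):
data['Time'] = data['Distance'] / data.groupby('ID')['Speed'].transform(lambda s: s.iloc[0])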

Add column in dataframe from another dataframe doing some arithmetic calculations python

I have a table in a pandas dataframe df:
id product_1 product_2 count
1 100 200 10
2 200 600 20
3 100 500 30
4 400 100 40
5 500 700 50
6 200 500 60
7 100 400 70
I also have another table in dataframe df2:
product price
100 5
200 10
300 15
400 20
500 25
600 30
700 35
I have to create a new column price_product_2 in my first df, taking the price values from df2 that correspond to product_2.
I also need to find the percentage difference of product_2 with respect to product_1
and make one more column, %_diff.
I.e. say product_1 = 100 and product_2 = 200; then product_2's price is 200% of product_1's.
Similarly, if product_1 = 400 and product_2 = 100, it is a decline in price,
so product_2 is -25% of product_1.
My final output should be df =
id product_1 product_2 count price_product_2 %_diff
1 100 200 10 10 +200
2 200 600 20 30 +300
3 100 500 30 25 +500
4 400 100 40 5 -25
5 500 700 50 35 +140
6 200 500 60 25 +250
7 100 400 70 20 -71.42
Any ideas how to achieve this?
I was trying to use map:
df['price_product_2'] = df['product_2'].map(df2.set_index('product')['price'])
But I could only get that one column; how do I get the %_diff column?
Use merge (or map) twice, once for each product column, then calculate the difference.
# Add prices for products 1 and 2
# (price_x is product_1's price, price_y is product_2's)
df3 = (df1
       .merge(df2, left_on='product_1', right_on='product')
       .merge(df2, left_on='product_2', right_on='product'))
# Calculate the percent difference
df3['pct_diff'] = (df3.price_y - df3.price_x) / df3.price_x
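If you want the question's %_diff convention instead (product_2's price as a percentage of product_1's), a small follow-up on the df3 built above:
# product_2's price as a percentage of product_1's price
df3['%_diff'] = df3.price_y / df3.price_x * 100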
Suppose you have the following data frames:
In [32]: df1
Out[32]:
index id product_1 product_2 count
0 0 1 100 200 10
1 1 2 200 600 20
2 2 3 100 500 30
3 3 4 400 100 40
4 4 5 500 700 50
5 5 6 200 500 60
6 6 7 100 400 70
In [33]: df2
Out[33]:
product price
0 100 5
1 200 10
2 300 15
3 400 20
4 500 25
5 600 30
6 700 35
It is probably easier simply to set product as the index for df2:
In [35]: df2.set_index('product', inplace=True)
In [36]: df2
Out[36]:
price
product
100 5
200 10
300 15
400 20
500 25
600 30
700 35
Then you can do things like the following:
In [37]: df2.loc[df1['product_2']]
Out[37]:
price
product
200 10
600 30
500 25
100 5
700 35
500 25
400 20
Use .values explicitly when assigning, or else the product index will misalign the rows:
In [38]: df1['price_product_2'] = df2.loc[df1['product_2']].values
In [39]: df1
Out[39]:
index id product_1 product_2 count price_product_2
0 0 1 100 200 10 10
1 1 2 200 600 20 30
2 2 3 100 500 30 25
3 3 4 400 100 40 5
4 4 5 500 700 50 35
5 5 6 200 500 60 25
6 6 7 100 400 70 20
For the percentage difference, you can also use vectorized operations (in this data the product numbers are proportional to the prices, so dividing them gives the same ratio):
In [40]: df1.product_2 / df1.product_1 * 100
Out[40]:
0 200.0
1 300.0
2 500.0
3 25.0
4 140.0
5 250.0
6 400.0
dtype: float64
Solution with map via a dict d, dividing with div:
d = df2.set_index('product')['price'].to_dict()
df['price_product_2'] = df['product_2'].map(d)
df['price_product_1'] = df['product_1'].map(d)
df['diff'] = df['price_product_2'].div(df['price_product_1']).mul(100)
print(df)
id product_1 product_2 count price_product_2 price_product_1 diff
0 1 100 200 10 10 5 200.0
1 2 200 600 20 30 10 300.0
2 3 100 500 30 25 5 500.0
3 4 400 100 40 5 20 25.0
4 5 500 700 50 35 25 140.0
5 6 200 500 60 25 10 250.0
6 7 100 400 70 20 5 400.0
But it seems only the division is necessary: since every price here is the product number multiplied by the same constant, dividing the product_2 and product_1 columns directly gives the same result:
df['diff1'] = df['product_2'].div(df['product_1']).mul(100)
print(df)
id product_1 product_2 count diff1
0 1 100 200 10 200.0
1 2 200 600 20 300.0
2 3 100 500 30 500.0
3 4 400 100 40 25.0
4 5 500 700 50 140.0
5 6 200 500 60 250.0
6 7 100 400 70 400.0

Conditional shift in pandas

The following pandas DataFrame is an example that I need to deal with:
Group Amount
1 1 100
2 1 300
3 1 400
4 1 700
5 2 500
6 2 900
Here's the result that I want after calculation:
Group Amount Difference
1 1 100 100
2 1 300 200
3 1 400 100
4 1 700 300
5 2 500 500
6 2 900 400
I know that df["Difference"] = df["Amount"] - df["Amount"].shift(1) can produce the difference between consecutive rows, but what can I do when the difference has to be computed within each Group?
Group by 'Group' and call transform on the 'Amount' column with pd.Series.diff, then call fillna, passing the 'Amount' column so each group's first row keeps its own amount:
In [110]:
df['Difference'] = df.groupby('Group')['Amount'].transform(pd.Series.diff).fillna(df['Amount'])
df
Out[110]:
Group Amount Difference
1 1 100 100
2 1 300 200
3 1 400 100
4 1 700 300
5 2 500 500
6 2 900 400
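Since the question mentions shift, a per-group shift gives the same result (fillna(0) makes each group's first row subtract nothing):
df['Difference'] = df['Amount'] - df.groupby('Group')['Amount'].shift(1).fillna(0)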
