I have two dataframes
df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index = ['a','b','c', 'a'], columns = ['d','e'])
d e
a 1 2
b 3 4
c 5 6
a 7 8
df2 = pd.DataFrame([['a', 10],['b',20],['c',30],['f',40]])
0 1
0 a 10
1 b 20
2 c 30
3 f 40
I want each row of df1 multiplied by the factor in df2 corresponding to its index label (e.g. 20 for b), so my output should look like:
d e
a 10 20
b 60 80
c 150 180
a 70 80
Kindly provide a solution that scales, since df1 can be hundreds of rows long; all I could think of was looping through df1.index.
Use set_index and reindex to align df2 with df1, then mul:
In [1150]: df1.mul(df2.set_index(0).reindex(df1.index)[1], axis=0)
Out[1150]:
d e
a 10 20
b 60 80
c 150 180
a 70 80
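Equivalently (a sketch of my own, not part of the original answer), you can build a dict from df2 and map df1's index through it before multiplying row-wise:

import pandas as pd

df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index=['a','b','c','a'], columns=['d','e'])
df2 = pd.DataFrame([['a',10],['b',20],['c',30],['f',40]])

# map each (possibly duplicated) index label to its factor, then multiply row-wise
factors = df1.index.map(dict(df2.values))
result = df1.mul(factors, axis=0)

This avoids a Python-level loop entirely, so it scales fine to hundreds of rows.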
Create a mapping and call df.apply:
In [1128]: mapping = dict(df2.values)
In [1129]: df1.apply(lambda x: x * mapping[x.name], 1)
Out[1129]:
d e
a 10 20
b 60 80
c 150 180
a 70 80
IIUC (this assumes import numpy as np):
In [55]: df1 * pd.DataFrame(np.tile(df2[[1]],2), columns=df1.columns, index=df2[0])
Out[55]:
d e
a 10 20
a 70 80
b 60 80
c 150 180
Helper DF:
In [57]: pd.DataFrame(np.tile(df2[[1]],2), columns=df1.columns, index=df2[0])
Out[57]:
d e
0
a 10 10
b 20 20
c 30 30
This is straightforward. You just make sure they have a common axis, and then you can combine them.
Put the lookup column into the index:
df2.set_index(0, inplace=True)
1
0
a 10
b 20
c 30
Now you can put that column into df1 very easily:
df1['multiplying_factor'] = df2[1]
Now you just want to multiply two columns:
df1['final_value'] = df1.e*df1.multiplying_factor
Now df1 looks like:
d e multiplying_factor final_value
a 1 2 10 20
b 3 4 20 80
c 5 6 30 180
a 7 8 10 80
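If you want both d and e scaled, as in the desired output at the top, a small extension (my addition, not part of this answer) multiplies the whole frame by the factor column row-wise:

# scale every column at once instead of one column at a time
scaled = df1[['d', 'e']].mul(df1['multiplying_factor'], axis=0)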
There are other questions on the same topic and they helped but I have an extra twist.
I have a dataframe with multiple values in each (but not all) cells.
df = pd.DataFrame({'a':["10-30-410","20-40-500","25-50"], 'b':["5-8-9","4", "99"]})
           a      b
0  10-30-410  5-8-9
1  20-40-500      4
2      25-50     99
How can I split each cell by the dash "-" and create three new dataframes? Note that not all cells have multiple values, in which case the second and third dataframes get NA or blank (treating these as strings).
So I need df1 to be the first of those values:
    a   b
0  10   5
1  20   4
2  25  99
And df2 would be:
    a  b
0  30  8
1  40
2  50
And likewise for df3:
     a  b
0  410  9
1  500
2
I got df1 with this:
df1 = df.replace(r'(\d+).*(\d+).*(\d+)+', r'\1', regex=True)
But df2 doesn't quite work. I get the second values but also 4 and 99, which should be blank.
df2 = df.replace(r'(\d+)-(\d+).*', r'\2', regex=True)
    a   b
0  30   8
1  40   4   <- should be blank
2  50  99   <- should be blank
Is this the right approach? I'm pretty good on regex but fuzzy with groups. Thank you.
Use str.split + concat + stack to get the data in a more usable format:
new_df = pd.concat(
(df['a'].str.split('-', expand=True),
df['b'].str.split('-', expand=True)),
keys=('a', 'b'),
axis=1
).stack(dropna=False).droplevel(0)
new_df:
a b
0 10 5
1 30 8
2 410 9
0 20 4
1 40 None
2 500 None
0 25 99
1 50 None
2 None None
Expandable option for n cols:
cols = ['a', 'b']
new_df = pd.concat(
(df[c].str.split('-', expand=True) for c in cols),
keys=cols,
axis=1
).stack(dropna=False).droplevel(0)
Then groupby level 0 + reset_index to create a list of dataframes:
dfs = [g.reset_index(drop=True) for _, g in new_df.groupby(level=0)]
dfs:
[ a b
0 10 5
1 20 4
2 25 99,
a b
0 30 8
1 40 None
2 50 None,
a b
0 410 9
1 500 None
2 None None]
Complete Working Example:
import pandas as pd
df = pd.DataFrame({
'a': ["10-30-410", "20-40-500", "25-50"],
'b': ["5-8-9", "4", "99"]
})
cols = ['a', 'b']
new_df = pd.concat(
(df[c].str.split('-', expand=True) for c in cols),
keys=cols,
axis=1
).stack(dropna=False).droplevel(0)
dfs = [g.reset_index(drop=True) for _, g in new_df.groupby(level=0)]
print(dfs)
You can also try with filter:
k = pd.concat((df[c].str.split('-', expand=True).add_prefix(c + '-')
               for c in df.columns), axis=1).fillna('')
df1 = k.filter(like='0')
df2 = k.filter(like='1')
df3 = k.filter(like='2')
NOTE: To strip the digit from the column names, use: k.filter(like='0').rename(columns=lambda x: x.split('-')[0])
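A compact alternative (a sketch, not taken from the answers above): index into the split lists with .str[i], which yields NaN wherever a cell has no i-th piece, matching the desired blanks:

import pandas as pd

df = pd.DataFrame({'a': ["10-30-410", "20-40-500", "25-50"],
                   'b': ["5-8-9", "4", "99"]})

# .str[i] picks the i-th piece of each split list, or NaN when absent;
# chain .fillna('') if you want blanks instead of NaN
dfs = [df.apply(lambda col: col.str.split('-').str[i]) for i in range(3)]
df1, df2, df3 = dfs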
I have two dataframes. Each row in dataframe A has a list of indices corresponding to entries in dataframe B, plus a set of other values. I want to join the two dataframes so that each entry in B gets the other values from the row of A whose index list contains that entry's index.
So far, starting from this answer, I have found a way of extracting the rows in B for the list of indices in each row of A, but only row by row, and I am not sure where to go from there. I am also not sure whether there is a better way to do this dynamically in pandas, since the length of the index lists can vary.
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
print(A)
>> indices a1 a2
>> 0 [0, 1] a 100
>> 1 [2, 3] b 200
>> 2 [4, 5] c 300
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
print(B)
>> b
>> 0 10
>> 1 20
>> 2 30
>> 3 40
>> 4 50
>> 5 60
# This is the desired output
out = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60],
"a1": ["a","a", "b", "b", "c", "c"],
"a2": [100,100,200,200,300,300]
})
print(out)
>> b a1 a2
>> 0 10 a 100
>> 1 20 a 100
>> 2 30 b 200
>> 3 40 b 200
>> 4 50 c 300
>> 5 60 c 300
If you have pandas >=0.25, you can use explode:
C = A.explode('indices')
This gives:
indices a1 a2
0 0 a 100
0 1 a 100
1 2 b 200
1 3 b 200
2 4 c 300
2 5 c 300
Then do:
output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values
output.drop('indices', axis = 1, inplace = True)
Final Output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
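The same result can be reached a little more directly (a sketch, assuming every index in B appears exactly once across A.indices): explode, cast the exploded keys to int, and join on them:

C = A.explode('indices')
C['indices'] = C['indices'].astype(int)  # explode leaves the column as object dtype
out = B.join(C.set_index('indices'))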
Using pd.merge:
df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)
df = (pd.DataFrame(df2.indices.values.tolist(), index=df2.index)
        .stack().reset_index().drop('level_2', axis=1).set_index(0))
pd.merge(B, df, left_index=True, right_index=True)
Output
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Here you go:
helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)
Result:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
You can also use melt instead of stack, but it's more complicated as you must drop columns you don't need:
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)
0 1 a1 a2
0 0 1 a 100
1 2 3 b 200
2 4 5 c 300
AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)
a1 a2 val
0 a 100 0
1 b 200 2
2 c 300 4
3 a 100 1
4 b 200 3
5 c 300 5
pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)
Out[8]:
a1 a2 b
0 a 100 10
2 b 200 30
4 c 300 50
1 a 100 20
3 b 200 40
5 c 300 60
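If you want B's row order and column layout back, a small addition (not in the original answer) is to sort the melted frame on val and join from B instead:

out = B.join(AA.set_index('val').sort_index())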
This solution will handle indices of varying lengths.
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
.merge(A, left_index = True, right_index = True) \
.drop(["indices"], axis = 1)\
.melt(id_vars = ['a1', 'a2'], value_name = "index")\
.drop("variable", axis = 1)\
.dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
B.merge(A, left_index=True, right_index=True)
Final Output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
I have a dataframe of intervals, each with an associated label. I need to group and aggregate rows that lie within a given distance of each other.
For example, rows whose start/end fall within 3 units of the start/end of other rows are grouped, and their label fields concatenated:
In [16]: df = pd.DataFrame([
...: [ 1, 3,'a'], [ 4,10,'b'],
...: [15,17,'c'], [18,20,'d'],
...: [27,30,'e'], [31,40,'f'], [41,42,'g'],
...: [50,54,'h']],
...: columns=['start', 'end', 'label'])
...:
In [17]: df
Out[17]:
start end label
0 1 3 a
1 4 10 b
2 15 17 c
3 18 20 d
4 27 30 e
5 31 40 f
6 41 42 g
7 50 54 h
Desired output:
In [18]: df_desired = group_by_interval(df)
In [19]: df_desired
Out[19]:
start end label
0 1 10 a b
1 15 20 c d
2 27 42 e f g
3 50 54 h
How can I execute this sort of grouping by interval with a dataframe?
I found one similar SO question here, but it's a little different since I don't know where to cut a priori.
You can create a grouper based on the condition and aggregate:
grouper = ((df['start'] - df['end'].shift()) > 3).cumsum()
df.groupby(grouper).agg({'start': 'first', 'end': 'last', 'label': lambda x: ' '.join(x)})
start end label
0 1 10 a b
1 15 20 c d
2 27 42 e f g
3 50 54 h
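For reference, a self-contained version of the same idea (a sketch; ' '.join can also be passed to agg directly instead of wrapping it in a lambda):

import pandas as pd

df = pd.DataFrame([[1, 3, 'a'], [4, 10, 'b'],
                   [15, 17, 'c'], [18, 20, 'd'],
                   [27, 30, 'e'], [31, 40, 'f'], [41, 42, 'g'],
                   [50, 54, 'h']], columns=['start', 'end', 'label'])

# start a new group whenever the gap to the previous interval's end exceeds 3
grouper = (df['start'] - df['end'].shift() > 3).cumsum()
out = df.groupby(grouper).agg({'start': 'first', 'end': 'last', 'label': ' '.join})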
I would like to join 2 dataframes so that the result is the intersection of the two datasets on the key column.
By doing this:
result = pd.merge(df1,df2,on='key', how='inner')
I will get what I need, but with the extra columns of df2. I only want df1's columns in the result (I do not want to delete them afterwards).
Any ideas?
Thanks,
Here is a generic solution that works for one or for multiple joining (key) columns:
Setup:
In [28]: a = pd.DataFrame({'a':[1,2,3,4], 'b':[10,20,30,40], 'c':list('abcd')})
In [29]: b = pd.DataFrame({'a':[3,4,5,6], 'b':[30,41,51,61], 'c':list('efgh')})
In [30]: a
Out[30]:
a b c
0 1 10 a
1 2 20 b
2 3 30 c
3 4 40 d
In [31]: b
Out[31]:
a b c
0 3 30 e
1 4 41 f
2 5 51 g
3 6 61 h
multiple joining keys:
In [32]: join_cols = ['a','b']
In [33]: a.merge(b[join_cols], on=join_cols)
Out[33]:
a b c
0 3 30 c
single joining key:
In [34]: join_cols = ['a']
In [35]: a.merge(b[join_cols], on=join_cols)
Out[35]:
a b c
0 3 30 c
1 4 40 d
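One caveat worth noting (my addition, not part of the original answer): if b contains duplicate rows in the joining columns, the merge will duplicate the matching rows of a. Deduplicating the key frame first avoids that:

a.merge(b[join_cols].drop_duplicates(), on=join_cols)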
I have a dataframe as listed below:
In []: dff = pd.DataFrame({'A': np.arange(8),
'B': list('aabbbbcc'),
'C':np.random.randint(100,size=8)})
which I have grouped on column B:
In []: grouped = dff.groupby('B')
Now I want to filter dff based on the differences of values within column C: if a row's value in C differs by more than a threshold from every other value in its group, remove that row.
If dff is:
A B C
0 0 a 18
1 1 a 25
2 2 b 56
3 3 b 62
4 4 b 46
5 5 b 56
6 6 c 74
7 7 c 3
Then, a threshold of 10 for C will produce a final table like:
A B C
0 0 a 18
1 1 a 25
2 2 b 56
3 3 b 62
4 4 b 46
5 5 b 56
Here group c (lowercase) is removed, since the difference between its two rows exceeds 10, while group b keeps all its rows because each value is within 10 of another.
I think I'd do the hard work in numpy:
In [11]: a = np.array([2, 3, 14, 15, 54])
In [12]: res = np.abs(a[:, np.newaxis] - a) < 10 # Note: perhaps you want <= 10.
In [13]: np.fill_diagonal(res, False)
In [14]: res.any(0)
Out[14]: array([ True, True, True, True, False], dtype=bool)
You could wrap this in a function:
In [15]: def has_close(a, n=10):
res = np.abs(a[:, np.newaxis] - a) < n
np.fill_diagonal(res, False)
return res.any(0)
In [16]: g = dff.groupby('B', as_index=False)
In [17]: g.apply(lambda x: x[has_close(x.C.values)])
Out[17]:
A B C
0 0 a 18
1 1 a 25
2 2 b 56
3 3 b 62
5 5 b 56
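Putting it all together as a runnable sketch (using <= rather than <, so a pair exactly 10 apart counts as close and all of group b survives, as in the question's expected output):

import numpy as np
import pandas as pd

def has_close(a, n=10):
    # True where an entry has at least one *other* entry within n
    res = np.abs(a[:, np.newaxis] - a) <= n
    np.fill_diagonal(res, False)
    return res.any(0)

dff = pd.DataFrame({'A': range(8),
                    'B': list('aabbbbcc'),
                    'C': [18, 25, 56, 62, 46, 56, 74, 3]})
out = dff.groupby('B', group_keys=False).apply(lambda x: x[has_close(x.C.values)])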