Below is the code and output. What I'm trying to get is shown in the "exp" column; as you can see, the "countif" column just counts all 5 columns, but I want it to count only the negative values. So, for example, at index 0 the count should equal 2, since row 0 has two negative values.
What am I doing wrong?
Python
import pandas as pd
import numpy as np
a = ['A','B','C','B','C','A','A','B','C','C','A','C','B','A']
b = [2,4,1,1,2,5,-1,2,2,3,4,3,3,3]
c = [-2,4,1,-1,2,5,1,2,2,3,4,3,3,3]
d = [-2,-4,1,-1,2,5,1,2,2,3,4,3,3,3]
exp = [2,1,0,2,0,0,1,0,0,0,0,0,0,0]
df1 = pd.DataFrame({'b':b,'c':c,'d':d,'exp':exp}, columns=['b','c','d','exp'])
df1['sumif'] = df1.where(df1<0,0).sum(1)
df1['countif'] = df1.where(df1<0,0).count(1)
df1
# df1.sort_values(['a','countif'], ascending=[True, True])
Output

    b  c  d  exp  sumif  countif
0   2 -2 -2    2     -4        5
1   4  4 -4    1     -4        5
2   1  1  1    0      0        5
3   1 -1 -1    2     -2        5
4   2  2  2    0      0        5
5   5  5  5    0      0        5
6  -1  1  1    1     -1        5
7   2  2  2    0      0        5
8   2  2  2    0      0        5
9   3  3  3    0      0        5
10  4  4  4    0      0        5
11  3  3  3    0      0        5
12  3  3  3    0      0        5
13  3  3  3    0      0        5
You don't need where here; you can simply use df.lt combined with df.sum(axis=1):
In [1329]: df1['exp'] = df1.lt(0).sum(1)
In [1330]: df1
Out[1330]:
b c d exp
0 2 -2 -2 2
1 4 4 -4 1
2 1 1 1 0
3 1 -1 -1 2
4 2 2 2 0
5 5 5 5 0
6 -1 1 1 1
7 2 2 2 0
8 2 2 2 0
9 3 3 3 0
10 4 4 4 0
11 3 3 3 0
12 3 3 3 0
13 3 3 3 0
EDIT: As per OP's comment, here is the solution restricted to the first three columns using iloc and .lt:
In [1609]: df1['exp'] = df1.iloc[:, :3].lt(0).sum(1)
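If you prefer selecting the columns by name rather than by position, an equivalent sketch (assuming the three value columns are named b, c and d, as in the question):

df1['exp'] = df1[['b', 'c', 'd']].lt(0).sum(axis=1)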
First, DataFrame.where works differently: it replaces the values where the condition is False with 0 (here, False means greater than or equal to 0), so it cannot be used for counting:
print (df1.iloc[:, :3].where(df1<0,0))
b c d
0 0 -2 -2
1 0 0 -4
2 0 0 0
3 0 -1 -1
4 0 0 0
5 0 0 0
6 -1 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
You need to compare the first 3 columns for being less than 0 and then sum:
df1['exp1'] = (df1.iloc[:, :3] < 0).sum(1)
#If need compare all columns
#df1['exp1'] = (df1 < 0).sum(1)
print (df1)
b c d exp exp1
0 2 -2 -2 2 2
1 4 4 -4 1 1
2 1 1 1 0 0
3 1 -1 -1 2 2
4 2 2 2 0 0
5 5 5 5 0 0
6 -1 1 1 1 1
7 2 2 2 0 0
8 2 2 2 0 0
9 3 3 3 0 0
10 4 4 4 0 0
11 3 3 3 0 0
12 3 3 3 0 0
13 3 3 3 0 0
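To tie this back to the original attempt: where is actually fine for the conditional sum, because the zeros it fills in do not change the total; it is only the conditional count that needs a boolean comparison. A minimal sketch on the same df1:

# conditional sum: the filled-in zeros are harmless here
df1['sumif'] = df1.iloc[:, :3].where(df1.iloc[:, :3] < 0, 0).sum(1)
# conditional count: sum a boolean mask instead of using count()
df1['countif'] = df1.iloc[:, :3].lt(0).sum(1)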
Related
I have the following code:
data={'id':[1,2,3,4,5,6,7,8,9,10,11],
'value':[1,0,1,0,1,1,1,0,0,1,0]}
df=pd.DataFrame.from_dict(data)
df
Out[8]:
id value
0 1 1
1 2 0
2 3 1
3 4 0
4 5 1
5 6 1
6 7 1
7 8 0
8 9 0
9 10 1
10 11 0
I want to create a flag column that marks consecutive repeated values with 1, starting from the second occurrence and ignoring the first.
With my current solution:
df['flag'] = df.value.groupby([df.value, df.value.diff().ne(0).cumsum()]).transform('size').ge(2).astype(int)
Out[8]:
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 1
5 6 1 1
6 7 1 1
7 8 0 1
8 9 0 1
9 10 1 0
10 11 0 0
While I need a solution like this, where the first occurrence in each run is flagged as 0 and the flag is 1 starting from the second:
Out[8]:
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 0 0
8 9 0 1
9 10 1 0
10 11 0 0
Create consecutive groups by comparing the values against their Series.shift with Series.ne and accumulating with Series.cumsum, build a per-group counter with GroupBy.cumcount, test whether it is greater than 0 with Series.gt, and last map True/False to 1/0 by casting to integers with Series.astype:
df['flag'] = (df.groupby(df['value'].ne(df['value'].shift()).cumsum())
.cumcount()
.gt(0)
.astype(int))
print (df)
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 0 0
8 9 0 1
9 10 1 0
10 11 0 0
How it works:
print (df.assign(g = df['value'].ne(df['value'].shift()).cumsum(),
counter = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount(),
mask = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount().gt(0)))
id value g counter mask
0 1 1 1 0 False
1 2 0 2 0 False
2 3 1 3 0 False
3 4 0 4 0 False
4 5 1 5 0 False
5 6 1 5 1 True
6 7 1 5 2 True
7 8 0 6 0 False
8 9 0 6 1 True
9 10 1 7 0 False
10 11 0 8 0 False
Use groupby.cumcount and a custom grouper:
# group by identical successive values
grp = df['value'].ne(df['value'].shift()).cumsum()
# flag all but the first one (>0)
# convert the booleans True/False to integers 1/0
df['flag'] = df.groupby(grp).cumcount().gt(0).astype(int)
Generic code to skip the first N occurrences in each run:
N = 1
grp = df['value'].ne(df['value'].shift()).cumsum()
df['flag'] = df.groupby(grp).cumcount().ge(N).astype(int)
Output:
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 0 0
8 9 0 1
9 10 1 0
10 11 0 0
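For example, with N = 2 only the third and later members of each run get flagged; a quick check reusing grp from above (on this data only id 7, the third consecutive 1, ends up with flag = 1):

N = 2
df['flag'] = df.groupby(grp).cumcount().ge(N).astype(int)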
I have a pandas dataframe as follows:
df2
amount 1 2 3 4
0 5 1 1 1 1
1 7 0 1 1 1
2 9 0 0 0 1
3 8 0 0 1 0
4 2 0 0 0 1
What I want to do is replace the 1s in every row with the value of the amount field in that row and leave the zeros as they are. The output should look like this:
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2
I've tried applying a lambda function row-wise like this, but I'm running into errors:
df2.apply(lambda x: x.loc[i].replace(0, x['amount']) for i in len(x), axis=1)
Any help would be much appreciated. Thanks
Let's use mask:
df2.mask(df2 == 1, df2['amount'], axis=0)
Output:
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2
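Since DataFrame.where is the inverse of DataFrame.mask, the same result can be written with the condition flipped; a sketch on the same df2:

df2.where(df2 != 1, df2['amount'], axis=0)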
You can also do it with the pandas.DataFrame.mul() method, like this:
>>> df2.iloc[:, 1:] = df2.iloc[:, 1:].mul(df2['amount'], axis=0)
>>> print(df2)
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2
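Note that mul only works here because the indicator columns are exactly 0/1. If they could hold other nonzero markers, a hedged variation is to normalize them to booleans first:

df2.iloc[:, 1:] = df2.iloc[:, 1:].ne(0).mul(df2['amount'], axis=0)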
I have the following dataframe:
df = pd.DataFrame(np.array([[4, 1], [1,1], [5,1], [1,3], [7,8], [np.NaN,8]]), columns=['a', 'b'])
a b
0 4 1
1 1 1
2 5 1
3 1 3
4 7 8
5 NaN 8
Now I would like to do a value_counts() on the columns for values from 1 to 9 which should give me the following:
a b
1 2 3
2 0 0
3 0 1
4 1 0
5 1 0
6 0 0
7 1 0
8 0 2
9 0 0
That means I just count the number of occurrences of the values 1 to 9 for each column. How can this be done? I would like to get this format so that I can afterwards apply df.plot(kind='bar', stacked=True) to get a stacked bar plot with the discrete values from 1 to 9 on the x axis and the counts for a and b on the y axis.
Use pd.value_counts:
df.apply(pd.value_counts).reindex(range(10)).fillna(0)
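If you want the index to match the desired 1-to-9 range exactly, a small variation is to reindex over that range and cast back to int:

df.apply(pd.value_counts).reindex(range(1, 10)).fillna(0).astype(int)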
Use np.bincount on each column (casting to int first, since bincount rejects floats):
df.apply(lambda x: np.bincount(x.dropna().astype(int), minlength=10))
a b
0 0 0
1 2 3
2 0 0
3 0 1
4 1 0
5 1 0
6 0 0
7 1 0
8 0 2
9 0 0
Alternatively, using a list comprehension instead of apply.
pd.DataFrame([
    np.bincount(df[c].dropna().astype(int), minlength=10) for c in df
], index=df.columns).T
a b
0 0 0
1 2 3
2 0 0
3 0 1
4 1 0
5 1 0
6 0 0
7 1 0
8 0 2
9 0 0
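Whichever counting variant you use, the result feeds straight into the stacked bar plot the question asks for; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt
counts = df.apply(pd.value_counts).reindex(range(1, 10)).fillna(0)
counts.plot(kind='bar', stacked=True)
plt.show()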
>>> df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3],
'b': [0,0,1,1,0,0,1,1,0,0,1,1,],
'c': [5,5,5,8,9,9,6,6,7,8,9,9]})
>>> df
a b c
0 1 0 5
1 1 0 5
2 1 1 5
3 1 1 8
4 2 0 9
5 2 0 9
6 2 1 6
7 2 1 6
8 3 0 7
9 3 0 8
10 3 1 9
11 3 1 9
Is there an alternative way to get this output?
>>> pd.pivot_table(df, index=['a','b'], columns='c', aggfunc=len, fill_value=0).reset_index()
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
I have a large df (>~1m rows) with len(df.c.unique()) being 134, so the pivot is taking forever.
This result, however, is returned within a second on my actual df:
>>> df.groupby(by = ['a', 'b', 'c']).size().reset_index()
a b c 0
0 1 0 5 2
1 1 1 5 1
2 1 1 8 1
3 2 0 9 2
4 2 1 6 2
5 3 0 7 1
6 3 0 8 1
7 3 1 9 2
so I was wondering whether I could manually construct the desired outcome from the output above.
1. Here's one way, using unstack:
df.groupby(by = ['a', 'b', 'c']).size().unstack(fill_value=0).reset_index()
Output:
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
2. Here's another way:
pd.crosstab([df.a,df.b], df.c).reset_index()
Output:
c a b 5 6 7 8 9
0 1 0 2 0 0 0 0
1 1 1 1 0 0 1 0
2 2 0 0 0 0 0 2
3 2 1 0 2 0 0 0
4 3 0 0 0 1 1 0
5 3 1 0 0 0 0 2
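If you want to see which of the three is fastest on your real ~1m-row frame, a quick timing sketch (no claims about the numbers; they depend on your data and machine):

import timeit
t_pivot = timeit.timeit(lambda: pd.pivot_table(df, index=['a', 'b'], columns='c', aggfunc=len, fill_value=0), number=10)
t_unstack = timeit.timeit(lambda: df.groupby(['a', 'b', 'c']).size().unstack(fill_value=0), number=10)
t_crosstab = timeit.timeit(lambda: pd.crosstab([df.a, df.b], df.c), number=10)
print(t_pivot, t_unstack, t_crosstab)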
I have
{"A":[0,1], "B":[4,5], "C":[0,1], "D":[0,1]}
and what I want is:
A B C D
0 4 0 0
0 4 0 1
0 4 1 0
0 4 1 1
1 4 0 0
...and so on. Basically all the combinations of values for each of the categories.
What would be the best way to achieve this?
If x is your dict:
>>> pandas.DataFrame(list(itertools.product(*x.values())), columns=x.keys())
A C B D
0 0 0 4 0
1 0 0 4 1
2 0 0 5 0
3 0 0 5 1
4 0 1 4 0
5 0 1 4 1
6 0 1 5 0
7 0 1 5 1
8 1 0 4 0
9 1 0 4 1
10 1 0 5 0
11 1 0 5 1
12 1 1 4 0
13 1 1 4 1
14 1 1 5 0
15 1 1 5 1
If you want the columns in a particular order you'll need to reorder them afterwards (with, e.g., df[["A", "B", "C", "D"]]).
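An alternative sketch that keeps the insertion order of the dict keys, so no reordering is needed on Python 3.7+ (from_product and to_frame exist in pandas 0.24+):

>>> pandas.MultiIndex.from_product(list(x.values()), names=list(x.keys())).to_frame(index=False)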