Count items with condition in a DataFrame in Python

I have a DataFrame like this:
index  column1  column2  column3
1      30       55       62
2      69       20       40
3      23       62       23
...
How can I count the number of values greater than 50 across all the elements in the table above?
I'm trying the method below:
count = 0
for column in df.columns:
    count += df[df[column] > 50][column].count()
Is this a proper way to do it, or is there a more effective suggestion?

You can just check all the values at once and then sum() them, since True evaluates to 1 and False to 0:
df.gt(50).sum().sum()
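For example, on the sample data from the question (a minimal sketch; the frame is reconstructed from the table at the top):
import pandas as pd

df = pd.DataFrame({'column1': [30, 69, 23],
                   'column2': [55, 20, 62],
                   'column3': [62, 40, 23]})
print(df.gt(50).sum())        # per-column counts: 1, 2, 1
print(df.gt(50).sum().sum())  # grand total: 4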

(df > 54).values.sum() will do what you're looking for. Here is the complete code to get the results:
>>> df = pd.DataFrame(np.random.randint(0,100,size=(5, 2)), columns=list('AB'))
>>> df
    A   B
0  68  92
1  47  53
2   5  35
3  75  82
4  51  89
>>> (df > 54).values.sum()
5
>>>
Basically, what I'm doing is creating a mask of True/False values over the entire DataFrame based on the condition (in this case > 54), and then just rolling up the DataFrame, because True/False is equal to 1/0 when summed.
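To make the intermediate step visible, the mask for the frame above looks like this:
>>> (df > 54)
       A      B
0   True   True
1  False  False
2  False  False
3   True   True
4  False   True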

Related

How do I create a new column with values from the next row of another column in python?

Say I have a dataframe as such:
id  pos_X  pos_y
1   100    0
2   68     17
3   42     28
4   94     35
5   15     59
6   84     19
This is my desired dataframe:
id  pos_X  pos_y  pos_xend  pos_yend
1   100    0      68        17
2   42     28     94        35
3   15     59     84        19
Basically the new column will have the values from the next row. How can I do this?
You can use a pivot:
import numpy as np

out = (df
       .drop(columns='id')
       .assign(idx=np.arange(len(df)) // 2,
               col=np.where(np.arange(len(df)) % 2, 'end', ''))
       .pivot(index='idx', columns='col')
       .pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
       )
output:
     pos_X  pos_Xend  pos_y  pos_yend
idx
0      100        68      0        17
1       42        94     28        35
2       15        84     59        19
You only need to create a new DataFrame. You can build it with a for loop that walks over the old DataFrame:
import pandas as pf
old_datas = {'id': [1, 2], 'pos_x': [100, 68], 'pos_y': [0, 17]}
old_df = pf.DataFrame(data=old_datas)
new_pos_x = []
new_pos_y = []
pos_xend = []
pos_yend = []
new_id = []
for i in range(len(old_df)):
    if i % 2 == 0:
        # even rows open a new pair
        new_pos_x.append(old_df.iloc[i]['pos_x'])
        new_pos_y.append(old_df.iloc[i]['pos_y'])
        new_id.append(i // 2 + 1)
    else:
        # odd rows close the pair with the "end" values
        pos_xend.append(old_df.iloc[i]['pos_x'])
        pos_yend.append(old_df.iloc[i]['pos_y'])
new_datas = {'id': new_id, 'pos_x': new_pos_x, 'pos_y': new_pos_y,
             'pos_xend': pos_xend, 'pos_yend': pos_yend}
new_df = pf.DataFrame(data=new_datas)
print(new_df)
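With the two-row sample above, this should print something like:
   id  pos_x  pos_y  pos_xend  pos_yend
0   1    100      0        68        17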
# select alternate rows, the first set starting from 0, the second from 1;
# reset the index and concat based on the index,
# choosing which columns to use in the concat
df2 = pd.concat(
    [df.loc[::2][['id', 'pos_X', 'pos_y']].reset_index(drop=True),
     df.loc[1::2][['pos_X', 'pos_y']].reset_index(drop=True).add_suffix('end')],
    axis=1)
# reset the ID column
df2['id'] = np.arange(0, len(df2))
df2
or
# drop the extra column after concat
df2 = pd.concat(
    [df.loc[::2].reset_index(drop=True),
     df.loc[1::2].reset_index(drop=True).add_suffix('end')],
    axis=1).drop(columns='idend')
# reset the ID column
df2['id'] = np.arange(0, len(df2))
df2
   id  pos_X  pos_y  pos_Xend  pos_yend
0   0    100      0        68        17
1   1     42     28        94        35
2   2     15     59        84        19
Selecting odd and even rows and then concatenating them would solve the problem. Something like this:
import pandas as pd
df = pd.DataFrame({'X': [100, 68, 12, 6, 21], 'Y': [0, 17, 32, 23, 14]})
print(df)
# select even and odd rows
even_df = df.iloc[::2].reset_index(drop=True)
odd_df = df.iloc[1::2].reset_index(drop=True) # odd
# concatenate columns
result = pd.concat([even_df, odd_df], axis=1)
print(result)
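Note (a hedged aside): with an odd number of rows, as in this sample, the last even row has no partner, so the concatenated frame ends with NaNs, and the columns carry duplicate names. A small sketch that handles both:
import pandas as pd

df = pd.DataFrame({'X': [100, 68, 12, 6, 21], 'Y': [0, 17, 32, 23, 14]})
even_df = df.iloc[::2].reset_index(drop=True)
odd_df = df.iloc[1::2].reset_index(drop=True)
# suffix the "end" columns and drop the unpaired final row
result = pd.concat([even_df, odd_df.add_suffix('_end')], axis=1).dropna()
print(result)
#      X   Y  X_end  Y_end
# 0  100   0   68.0   17.0
# 1   12  32    6.0   23.0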
I think you are taking alternate rows, so I would suggest something like this, considering data to be a pandas DataFrame:
df = """your data"""
posx_end = df.loc[df['id'] % 2 == 0]['pos_X'].values
posy_end = df.loc[df['id'] % 2 == 0]['pos_y'].values
df = df.loc[df['id'] % 2 != 0].copy()
df['posx_end'] = posx_end
df['posy_end'] = posy_end
edit:
add the following lines as well for id column formatting
df['id'] = range(1, len(df)+1)
df.set_index('id', inplace=True)
result:
    pos_X  pos_y  posx_end  posy_end
id
1     100      0        68        17
2      42     28        94        35
3      15     59        84        19
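A further alternative, sketched here on the question's data: shift(-1) pulls each next row's values alongside, after which every other row is kept. This is a hedged sketch, not a tested answer from the thread.
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'pos_X': [100, 68, 42, 94, 15, 84],
                   'pos_y': [0, 17, 28, 35, 59, 19]})
out = (df.assign(pos_xend=df['pos_X'].shift(-1),
                 pos_yend=df['pos_y'].shift(-1))
         .iloc[::2]                 # keep the first row of each pair
         .reset_index(drop=True))
out['id'] = range(1, len(out) + 1)  # renumber the ids
print(out)
#    id  pos_X  pos_y  pos_xend  pos_yend
# 0   1    100      0      68.0      17.0
# 1   2     42     28      94.0      35.0
# 2   3     15     59      84.0      19.0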

Pandas Multiindex get values from first entry of index

I have the following multiindex dataframe:
from io import StringIO
import pandas as pd
datastring = StringIO("""File,no,runtime,value1,value2
A,0, 0,12,34
A,0, 1,13,34
A,0, 2,23,34
A,1, 6,23,38
A,1, 7,22,38
B,0,17,15,35
B,0,18,17,35
C,0,34,23,32
C,0,35,21,32
""")
df = pd.read_csv(datastring, sep=',')
df.set_index(['File','no',df.index], inplace=True)
>>> df
            runtime  value1  value2
File no
A    0  0         0      12      34
        1         1      13      34
        2         2      23      34
     1  3         6      23      38
        4         7      22      38
B    0  5        17      15      35
        6        18      17      35
C    0  7        34      23      32
        8        35      21      32
What I would like to get is just the first value2 for each combination of File and no:
A 0 34
A 1 38
B 0 35
C 0 32
The most similar questions I could find were these:
Resample pandas dataframe only knowing result measurement count
MultiIndex-based indexing in pandas
Select rows in pandas MultiIndex DataFrame
but I was unable to construct a solution from them. The best I got was slicing with pd.IndexSlice, but as the values technically are still there (just not on display), something like
idx = pd.IndexSlice
df.loc[idx[:, 0], :]
could, for example, filter for no == 0, but it would still return all the remaining rows of the dataframe.
Is a multiindex even the right tool for the task at hand? How to solve this?
Use GroupBy.first on the first and second levels of the MultiIndex:
s = df.groupby(level=[0,1])['value2'].first()
print (s)
File  no
A     0    34
      1    38
B     0    35
C     0    32
Name: value2, dtype: int64
If a one-column DataFrame is needed, use a one-element list:
df1 = df.groupby(level=[0,1])[['value2']].first()
print (df1)
         value2
File no
A    0       34
     1       38
B    0       35
C    0       32
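If a flat, unindexed frame is preferred, a small follow-up on the same data (a sketch; the reset_index call is the only addition):
flat = df.groupby(level=[0, 1])['value2'].first().reset_index()
print(flat)
#   File  no  value2
# 0    A   0      34
# 1    A   1      38
# 2    B   0      35
# 3    C   0      32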
Another idea is to remove the 3rd level with DataFrame.reset_index and filter out duplicated (File, no) pairs with Index.duplicated and boolean indexing:
df2 = df.reset_index(level=2, drop=True)
s = df2.loc[~df2.index.duplicated(), 'value2']
print (s)
File  no
A     0    34
      1    38
B     0    35
C     0    32
Name: value2, dtype: int64
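first() also works without selecting a column, returning the first row per (File, no) group for every column (same data as above):
print(df.groupby(level=[0, 1]).first())
#          runtime  value1  value2
# File no
# A    0         0      12      34
#      1         6      23      38
# B    0        17      15      35
# C    0        34      23      32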
For the sake of completeness, I would like to add another method (which I would not have found without the answer by jezrael).
s = df.groupby(level=[0,1])['value2'].nth(0)
This can be generalized to finding any entry, not merely the first:
t = df.groupby(level=[0,1])['value1'].nth(1)
Note that the selection was changed from value2 to value1 because, for the former, the results of nth(0) and nth(1) would have been identical.
Pandas documentation link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.nth.html

Generate Column Value in Pandas based on previous rows

Let us assume I am taking a temperature measurement at a regular interval and recording the values in a pandas DataFrame:
day  temperature [F]
0    89
1    91
2    93
3    88
4    91
5    91
6    93
Now I want to create another column which is set to 1 if and only if the current value and the value before it are both above a certain level. In my scenario I want a column value of 1 when two consecutive values are above 90, thus yielding:
day  temperature  Above limit?
0    89           0
1    91           0
2    93           1
3    88           0
4    91           0
5    91           1
6    93           1
Despite some SO and Google digging, it's not clear whether I should use iloc[x], loc[x], or something else in a for loop.
You are looking for the shift function in pandas.
import io
import pandas as pd
data = """
day temperature Expected
0 89 0
1 91 0
2 93 1
3 88 0
4 91 0
5 91 1
6 93 1
"""
data = io.StringIO(data)
df = pd.read_csv(data, sep=r'\s+')
df['Result'] = ((df['temperature'].shift(1) > 90) & (df['temperature'] > 90)).astype(int)
# Validation
(df['Result'] == df['Expected']).all()
Try this:
df = pd.DataFrame({'temperature': [89, 91, 93, 88, 90, 91, 91, 93]})
limit = 90
df['Above'] = ((df['temperature'] > limit) & (df['temperature'].shift(1) > limit)).astype(int)
df
In the future, please include the code needed for testing (in this case, the DataFrame construction line).
df['limit'] = ""
df.iloc[0, 2] = 0
for i in range(1, len(df)):
    if df.iloc[i, 1] > 90 and df.iloc[i-1, 1] > 90:
        df.iloc[i, 2] = 1
    else:
        df.iloc[i, 2] = 0
Here iloc[i, 2] refers to the i-th row and column index 2 (the limit column). Hope this helps.
Solution using shift():
>>> threshold = 90
>>> df['Above limit?'] = 0
>>> df.loc[(df['temperature [F]'] > threshold) & (df['temperature [F]'].shift(1) > threshold), 'Above limit?'] = 1
>>> df
   day  temperature [F]  Above limit?
0    0               89             0
1    1               91             0
2    2               93             1
3    3               88             0
4    4               91             0
5    5               91             1
6    6               93             1
Try using rolling(window=2) and then apply() as follows:
df["limit"] = df['temperature'].rolling(2).apply(lambda x: int(x[0] > 90) & int(x[-1] > 90), raw=True)

how to add complementary intervals in pandas dataframe

Let's say that I have a signal of 100 samples, L = 100.
In this signal I found some intervals that I label as "OK". The intervals are stored in a Pandas DataFrame that looks like this:
c = pd.DataFrame(np.array([[10, 26], [50, 84]]), columns=['Start', 'End'])
c['Value']='OK'
How can I add the complementary intervals in another dataframe in order to have something like this:
d = pd.DataFrame(np.array([[0, 9], [10, 26], [27, 49], [50, 84], [85, 100]]), columns=['Start', 'End'])
d['Value']=['Check','OK','Check','OK','Check']
You can use the first DataFrame to create the second one and merge, as suggested by @jezrael:
d = pd.DataFrame({"Start":[0] + sorted(pd.concat([c.Start , c.End+1])), "End": sorted(pd.concat([c.Start-1 , c.End]))+[100]} )
d = pd.merge(d, c, how='left')
d['Value'] = d['Value'].fillna('Check')
d = d.reindex_axis(["Start","End","Value"], axis=1)
output
   Start  End  Value
0      0    9  Check
1     10   26     OK
2     27   49  Check
3     50   84     OK
4     85  100  Check
I think you need:
d = pd.merge(d, c, how='left')
d['Value'] = d['Value'].fillna('Check')
print (d)
   Start  End  Value
0      0    9  Check
1     10   26     OK
2     27   49  Check
3     50   84     OK
4     85  100  Check
EDIT:
You can use numpy.concatenate with numpy.sort, numpy.column_stack, and the DataFrame constructor for the new df. Finally, merge with fillna, passing a dict to replace the missing values in the Value column:
s = np.sort(np.concatenate([[0], c['Start'].values, c['End'].values + 1]))
e = np.sort(np.concatenate([c['Start'].values - 1, c['End'].values, [100]]))
d = pd.DataFrame(np.column_stack([s,e]), columns=['Start','End'])
d = pd.merge(d, c, how='left').fillna({'Value':'Check'})
print (d)
   Start  End  Value
0      0    9  Check
1     10   26     OK
2     27   49  Check
3     50   84     OK
4     85  100  Check
EDIT1:
New values are first added to c with loc, then the frame is reshaped to a Series with stack and shifted. Finally, the DataFrame is built back with unstack:
b = c.copy()
max_val = 100
min_val = 0
c.loc[-1, 'Start'] = max_val + 1
a = c[['Start','End']].stack(dropna=False).shift().fillna(min_val - 1).astype(int).unstack()
a['Start'] = a['Start'] + 1
a['End'] = a['End'] - 1
a['Value'] = 'Check'
print (a)
    Start  End  Value
 0      0    9  Check
 1     27   49  Check
-1     85  100  Check
d = pd.concat([b, a]).sort_values('Start').reset_index(drop=True)
print (d)
   Start  End  Value
0      0    9  Check
1     10   26     OK
2     27   49  Check
3     50   84     OK
4     85  100  Check
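For completeness, a plain-loop sketch of the same idea (a hedged sketch, assuming the intervals in c are sorted and non-overlapping, and the signal spans 0 to L = 100 as in the question):
import numpy as np
import pandas as pd

c = pd.DataFrame(np.array([[10, 26], [50, 84]]), columns=['Start', 'End'])
c['Value'] = 'OK'

rows, cursor, L = [], 0, 100
for start, end in zip(c['Start'], c['End']):
    if cursor < start:                      # gap before this "OK" interval
        rows.append((cursor, start - 1, 'Check'))
    rows.append((start, end, 'OK'))
    cursor = end + 1
if cursor <= L:                             # trailing gap after the last interval
    rows.append((cursor, L, 'Check'))
d = pd.DataFrame(rows, columns=['Start', 'End', 'Value'])
print(d)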

Pandas individual item using index and column

I have a csv file, test.csv. I am trying to use pandas to select items depending on whether the second value is above a certain value, e.g.
index  A   B
0      44  1
1      45  2
2      46  57
3      47  598
4      48  5
So what I would like is: if B is larger than 50, give me the values in A as integers which I could assign to a variable.
edit 1:
Sorry for the poor explanation. The final purpose of this is that I want to look in table 1:
index  A   B
0      44  1
1      45  2
2      46  57
3      47  598
4      48  5
for any values above 50 in column B and get the column A value and then look in table 2:
index  A   B
5      44  12
6      45  13
7      46  14
8      47  15
9      48  16
so in the end I want to end up with the value in column B of table two, which I can print out as an integer and not as a Series. If this is not possible using pandas then OK, but is there a way to do it in any case?
You can use DataFrame slicing to get the values you want:
import pandas as pd
f = pd.read_csv('yourfile.csv')
f[f['B'] > 50].A
In this code,
f['B'] > 50
is the condition, returning a boolean array of True/False according to whether each value meets the condition; the corresponding A values are then selected.
This would be the output:
2 46
3 47
Name: A, dtype: int64
Is this what you wanted?
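For the edited follow-up, a hedged sketch (the table variable names are assumed): take the A values that pass the filter in table 1, look them up in table 2, and print the matching B values as plain ints rather than a Series.
import pandas as pd

table1 = pd.DataFrame({'A': [44, 45, 46, 47, 48], 'B': [1, 2, 57, 598, 5]})
table2 = pd.DataFrame({'A': [44, 45, 46, 47, 48], 'B': [12, 13, 14, 15, 16]})

wanted_a = table1.loc[table1['B'] > 50, 'A']          # A values where B > 50 in table 1
matches = table2.loc[table2['A'].isin(wanted_a), 'B']  # corresponding B values in table 2
for value in matches:
    print(int(value))   # prints 14, then 15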
