I have a pandas DataFrame with 2 columns that basically looks like this. If I have True in B, I want the last non-NaN value of A in C. Is that even possible?
Actual table:
A B
0 754 False
1 None False
2 None False
3 None False
4 None True
5 999 False
6 None False
7 None True
8 None False
9 875 False
Wanted table:
A B C
0 754 False 754
1 None False NaN
2 None False NaN
3 None False NaN
4 None True NaN
5 999 False 999
6 None False NaN
7 None True NaN
8 None False NaN
9 875 False NaN
I'm unclear as to exactly what you want, but what you describe in the text can be achieved by:
import math

# each row where B is True gets its own index value as a marker in C
df.loc[df.B, 'C'] = df.loc[df.B].index
# for every marked row, replace the marker with the last non-NaN value of A before it
for n in df.C:
    if not math.isnan(n):
        cap = df.C[df.C == n].index[0]
        df.loc[cap, 'C'] = df.A[df.A[:cap].last_valid_index()]
output:
A B C
0 754.0 False NaN
1 NaN False NaN
2 NaN False NaN
3 NaN False NaN
4 NaN True 754.0
5 999.0 False NaN
6 NaN False NaN
7 NaN True 999.0
8 NaN False NaN
9 875.0 False NaN
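A more concise alternative (a sketch, untested against your exact data, assuming column A stores the missing entries as NaN): forward-fill A so each row carries the most recent non-NaN value, then keep it only where B is True:

df['C'] = df['A'].ffill().where(df['B'])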
I'm still quite new to Python and programming in general. With luck, I have the right idea, but I can't quite get this to work.
With my example df, I want iteration to start when entry == 1.
import pandas as pd
import numpy as np
nan = np.nan
a = [0,0,4,4,4,4,6,6]
b = [4,4,4,4,4,4,4,4]
entry = [nan,nan,nan,nan,1,nan,nan,nan]
df = pd.DataFrame(columns=['a', 'b', 'entry'])
df = df.assign(a=a, b=b, entry=entry)
I wrote a function, with little success. It returns an error, unhashable type: 'slice'. FWIW, I'm applying this function to groups of various lengths.
def exit_row(df):
    start = df.index[df.entry == 1]
    df.loc[start:, (df.a > df.b), 'exit'] = 1
    return df
Ideally, the result would be as below:
a b entry exit
0 0 4 NaN NaN
1 0 4 NaN NaN
2 4 4 NaN NaN
3 4 4 NaN NaN
4 4 4 1.0 NaN
5 4 4 NaN NaN
6 6 4 NaN 1
7 6 4 NaN 1
Any advice much appreciated. I had wondered if I should attempt a For loop instead, though I often find them difficult to read.
You can use boolean indexing:
# what are the rows after entry?
m1 = df['entry'].notna().cummax()
# in which rows is a>b?
m2 = df['a'].gt(df['b'])
# set 1 where both conditions are True
df.loc[m1 & m2, 'exit'] = 1
output:
a b entry exit
0 0 4 NaN NaN
1 0 4 NaN NaN
2 4 4 NaN NaN
3 4 4 NaN NaN
4 4 4 1.0 NaN
5 4 4 NaN NaN
6 6 4 NaN 1.0
7 6 4 NaN 1.0
Intermediates:
a b entry notna m1 m2 m1&m2 exit
0 0 4 NaN False False False False NaN
1 0 4 NaN False False False False NaN
2 4 4 NaN False False False False NaN
3 4 4 NaN False False False False NaN
4 4 4 1.0 True True False False NaN
5 4 4 NaN False True False False NaN
6 6 4 NaN False True True True 1.0
7 6 4 NaN False True True True 1.0
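Since the question mentions applying this to groups of various lengths, the same masks can be built per group (a sketch; 'group' is a hypothetical grouping column not present in the sample df):

# flag the rows at/after each group's entry row
m1 = df.groupby('group')['entry'].transform(lambda s: s.notna().cummax())
# rows where a > b
m2 = df['a'].gt(df['b'])
df.loc[m1 & m2, 'exit'] = 1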
I have a df like this:
Low isLower isPrevHigher isNextHigher
0 22470.0 True False True
1 22480.0 NaN False True
2 22576.6 NaN False True
3 22600.4 NaN False False
4 22583.5 NaN True True
5 22652.2 NaN False True
6 22656.8 NaN False False
7 22646.5 NaN True False
8 22600.0 NaN True False
9 22555.0 NaN True True
10 22580.1 NaN False True
11 22620.0 NaN False True
12 22682.2 NaN False False
13 22681.0 NaN True True
14 22710.8 NaN False False
15 22657.2 NaN True False
16 22623.0 NaN True True
17 22634.0 NaN False True
18 22660.0 NaN False True
19 22673.6 NaN False True
20 22721.2 NaN False False
21 22580.0 NaN True False
22 22552.6 NaN True False
23 22382.6 True True False
24 22353.0 True True False
25 22341.7 True True False
26 22312.4 True True False
27 22256.4 True True True
28 22310.6 True False False
29 22286.0 True True True
30 22306.8 True False True
31 22386.3 True False False
I want to drop all rows after the first isLower == True & isPrevHigher == True & isNextHigher == True.
So everything after row 27.
# index of the first row where all three flags are True
drop_row = df[df[['isLower', 'isPrevHigher', 'isNextHigher']].eq(True).all(axis=1)].index[0]
# keep everything up to and including that row
df = df[df.index <= drop_row]
print(df)
Output:
Low isLower isPrevHigher isNextHigher
0 22470.0 True False True
1 22480.0 NaN False True
2 22576.6 NaN False True
3 22600.4 NaN False False
4 22583.5 NaN True True
5 22652.2 NaN False True
6 22656.8 NaN False False
7 22646.5 NaN True False
8 22600.0 NaN True False
9 22555.0 NaN True True
10 22580.1 NaN False True
11 22620.0 NaN False True
12 22682.2 NaN False False
13 22681.0 NaN True True
14 22710.8 NaN False False
15 22657.2 NaN True False
16 22623.0 NaN True True
17 22634.0 NaN False True
18 22660.0 NaN False True
19 22673.6 NaN False True
20 22721.2 NaN False False
21 22580.0 NaN True False
22 22552.6 NaN True False
23 22382.6 True True False
24 22353.0 True True False
25 22341.7 True True False
26 22312.4 True True False
27 22256.4 True True True
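A shorter variant of the same idea (a sketch, assuming at least one row matches): build the mask once, take the first True with idxmax, and slice with loc, which includes the end label:

# first row where all three flags are True
mask = df[['isLower', 'isPrevHigher', 'isNextHigher']].eq(True).all(axis=1)
# loc slicing is inclusive, so row 27 is kept
df = df.loc[:mask.idxmax()]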
You may want to drop rows on/after the first row with ALL empty values:
# create another data frame
df = pd.DataFrame(
{'direction': ['north', 'east', 'south', None, 'up', 'down'],
'amount': [10, 20, 30, None, 100, 200]})
# does the whole row consist of `None`
df['row_is_none'] = df.isna().all(axis=1)
# calculate the cumulative sum of the new column
df['row_is_non_accum'] = df['row_is_none'].cumsum()
# create boolean mask and perform drop (see the completed step below)
print(df)
direction amount row_is_none row_is_non_accum
0 north 10.0 False 0
1 east 20.0 False 0
2 south 30.0 False 0
3 None NaN True 1
4 up 100.0 False 1
5 down 200.0 False 1
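To complete that example, the mask and drop step could look like this (keeping only the rows before the first all-empty row):

# rows before the first all-empty row still have a cumulative sum of 0
df = df[df['row_is_non_accum'] == 0]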
This will find the rows where all the specified columns are True, take the lowest matching index as a number variable, and then use iloc to keep all rows up to and including that index:
import numpy as np

first_all_true = df.iloc[np.where((df['isLower'] == True) & (df['isPrevHigher'] == True) & (df['isNextHigher'] == True))].index[0]
df.iloc[0:first_all_true + 1]
Using boolean indexing with help of all, the boolean NOT (~), shift and cummin (the shift keeps the first all-True row itself in the result):
df[(~df[['isLower', 'isPrevHigher', 'isNextHigher']].eq(True).all(axis=1)).shift(fill_value=True).cummin()]
NB. untested answer
I have a list containing the paths of several CSVs.
How can I check, on each loop iteration, whether the CSV has any empty columns, and delete them if it does?
Code:
for i in list1:
    if (list1.columns = '').any():
        i.remove that column
Hope this explains what I am talking about.
Sample:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    '': list('abcdef'),
    'B': [4, 5, 4, 5, 5, np.nan],
    'C': [''] * 6,
    'D': [np.nan] * 6,
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabb') + ['']
})
print (df)
B C D E F
0 a 4.0 NaN 5 a
1 b 5.0 NaN 3 a
2 c 4.0 NaN 6 a
3 d 5.0 NaN 9 b
4 e 5.0 NaN 2 b
5 f NaN NaN 4
Remove the first column because it has an empty column name - that means keeping only the columns whose name is not empty, with loc and boolean indexing:
df1 = df.loc[:, df.columns != '']
print (df1)
B C D E F
0 4.0 NaN 5 a
1 5.0 NaN 3 a
2 4.0 NaN 6 a
3 5.0 NaN 9 b
4 5.0 NaN 2 b
5 NaN NaN 4
Remove column C because it contains only empty strings - compare all values against the empty string and keep columns with at least one True via DataFrame.any, again filtering with loc and boolean indexing:
df2 = df.loc[:, (df != '').any()]
print (df2)
B D E F
0 a 4.0 NaN 5 a
1 b 5.0 NaN 3 a
2 c 4.0 NaN 6 a
3 d 5.0 NaN 9 b
4 e 5.0 NaN 2 b
5 f NaN NaN 4
print ((df != ''))
B C D E F
0 True True False True True True
1 True True False True True True
2 True True False True True True
3 True True False True True True
4 True True False True True True
5 True True False True True False
print ((df != '').any())
True
B True
C False
D True
E True
F True
dtype: bool
Remove column D because it contains only missing values, with dropna:
df3 = df.dropna(axis=1, how='all')
print (df3)
B C E F
0 a 4.0 5 a
1 b 5.0 3 a
2 c 4.0 6 a
3 d 5.0 9 b
4 e 5.0 2 b
5 f NaN 4
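Putting the three checks together for the list of CSV paths from the question (a sketch; list1 holds the file paths, and the cleaned frames are collected in a new list):

import pandas as pd

cleaned = []
for path in list1:
    df = pd.read_csv(path)
    # drop columns with an empty name
    df = df.loc[:, df.columns != '']
    # drop columns containing only empty strings
    df = df.loc[:, (df != '').any()]
    # drop columns containing only missing values
    df = df.dropna(axis=1, how='all')
    cleaned.append(df)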
I'm dealing with a problem involving an Excel sheet and Python. I have successfully retrieved the specific columns and rows from the Excel file with pandas. Now I want to display only the rows and columns which have a "none" or "empty" value (see the sample image of the Excel sheet).
In the above image, I need the rows and columns whose value is none. For example, the "estfix" column has several none values, so I need to check the column's values and, if one is none, print its corresponding row and column. Hope you understand.
Code I tried:
import pandas as pd

wb = pd.ExcelFile(r"pathname details.xlsx")
sheet_1 = pd.read_excel(r"pathname details.xlsx", sheet_name=0)  # sheetname= is the deprecated spelling
c = sheet_1[["bugid", "sev", "estfix", "manager", "director"]]
print(c)
I'm using python 3.6. Thanks in advance!
I'm expecting output like this (here NaN is considered as None):
Use isnull with any to check for at least one True per row:
a = df[df.isnull().any(axis=1)]
For columns with rows:
b = df.loc[df.isnull().any(axis=1), df.isnull().any()]
Sample:
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,np.nan,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,np.nan,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4.0 7 1.0 5 a
1 b NaN 8 3.0 3 a
2 c 4.0 9 NaN 6 a
3 d 5.0 4 7.0 9 b
4 e 5.0 2 1.0 2 b
5 f 4.0 3 0.0 4 b
a = df[df.isnull().any(axis=1)]
print (a)
A B C D E F
1 b NaN 8 3.0 3 a
2 c 4.0 9 NaN 6 a
b = df.loc[df.isnull().any(axis=1), df.isnull().any()]
print (b)
B D
1 NaN 3.0
2 4.0 NaN
Detail:
print (df.isnull())
A B C D E F
0 False False False False False False
1 False True False False False False
2 False False False True False False
3 False False False False False False
4 False False False False False False
5 False False False False False False
print (df.isnull().any(axis=1))
0 False
1 True
2 True
3 False
4 False
5 False
dtype: bool
print (df.isnull().any())
A False
B True
C False
D True
E False
F False
dtype: bool
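If you also need to print each missing cell's row label and column name individually, here is a small sketch using numpy.where:

import numpy as np

# positional (row, column) coordinates of every missing cell
rows, cols = np.where(df.isnull())
for r, c in zip(rows, cols):
    print(df.index[r], df.columns[c])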
Let's say I am updating my dataframe with another dataframe (df2):
import pandas as pd
import numpy as np
df = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux'],
                   'A': [1, np.nan, 1, 1],
                   'B': [1, np.nan, np.nan, 1],
                   'C': [np.nan, 1, np.nan, 1],
                   'D': [1, np.nan, 1, np.nan],
                   }).set_index(['axis1'])
print (df)

df2 = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux', 'A'],
                    'A': [1, 1, np.nan, np.nan, np.nan],
                    'E': [1, np.nan, 1, 1, 1],
                    }).set_index(['axis1'])

df = df.reindex(columns=df2.columns.union(df.columns),
                index=df2.index.union(df.index))
df.update(df2)
print (df)
Is there a command to get the number of cells that were updated (changed from NaN to 1)?
I want to use this to track changes to my dataframe.
There is no built-in method in pandas that I can think of; you would have to save the original df prior to the update and then compare. The trick is that NaN != NaN, so a naive comparison overcounts the changes. Here df3 is a copy of df prior to the call to update:
In [104]:
df.update(df2)
df
Out[104]:
A B C D E
axis1
A NaN NaN NaN NaN 1
Apple 1 NaN NaN 1 1
Linux 1 1 1 NaN 1
Unix 1 1 NaN 1 1
Window 1 NaN 1 NaN NaN
[5 rows x 5 columns]
In [105]:
df3
Out[105]:
A B C D E
axis1
A NaN NaN NaN NaN NaN
Apple 1 NaN NaN 1 NaN
Linux 1 1 1 NaN NaN
Unix 1 1 NaN 1 NaN
Window NaN NaN 1 NaN NaN
[5 rows x 5 columns]
In [106]:
# compare but notice that NaN comparison returns True
df!=df3
Out[106]:
A B C D E
axis1
A True True True True True
Apple False True True False True
Linux False False False True True
Unix False False True False True
Window True True False True True
[5 rows x 5 columns]
In [107]:
# use np.count_nonzero for easy counting; note this gives the wrong result because NaN != NaN
np.count_nonzero(df!=df3)
Out[107]:
16
In [132]:
~((df == df3) | (np.isnan(df) & np.isnan(df3)))
Out[132]:
A B C D E
axis1
A False False False False True
Apple False False False False True
Linux False False False False True
Unix False False False False True
Window True False False False False
[5 rows x 5 columns]
In [133]:
np.count_nonzero(~((df == df3) | (np.isnan(df) & np.isnan(df3))))
Out[133]:
5
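The same NaN-aware comparison can be written with pandas' own methods (a sketch; unlike np.isnan, isna also works on non-numeric columns):

# cells count as changed unless they are equal or both missing
changed = df.ne(df3) & ~(df.isna() & df3.isna())
print(int(changed.to_numpy().sum()))  # 5 for the frames above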