I am trying to iterate through a dataframe that has null values in the column myCol. I am able to iterate through the dataframe fine, but when I try to select only the null values I get an error.
The end goal is to force a value into the fields that are null, which is why I am iterating first to identify which rows those are.
for index, row in df.iterrows():
    if row['myCol'].isnull():
        print('true')
AttributeError: 'str' object has no attribute 'isnull'
I tried comparing the column to 'None', since that is the value I see when I print the rows of the dataframe. Still no luck:
for index, row in df.iterrows():
    if row['myCol'] == 'None':
        print('true')
No returned rows
Any help greatly appreciated!
You can use pd.isnull() to check if a value is null or not:
for index, row in df.iterrows():
    if pd.isnull(row['myCol']):
        print('true')
But it seems like what you actually need is df.fillna(myValue), where myValue is the value you want to force into the NULL fields. Also, to find the NULL fields in a dataframe you can use df.myCol.isnull() instead of looping through rows and checking them individually.
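A minimal sketch of that vectorized approach (myValue here is just a placeholder for whatever value you want to fill in):

import pandas as pd

# Boolean Series marking the null rows
null_mask = df['myCol'].isnull()
print(null_mask.sum())  # how many nulls there are

# Fill the nulls in that column with the chosen value
df['myCol'] = df['myCol'].fillna(myValue)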
If the column is of string type, you might also want to check whether it is an empty string:
for index, row in df.iterrows():
    if row['myCol'] == "":
        print('true')
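The same check can be done without the loop; a small sketch combining both conditions:

# Rows where myCol is null or an empty string
mask = df['myCol'].isnull() | (df['myCol'] == "")
print(df[mask])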
Related
I have a table with a column of text data. I want to get the frequency counts for each word so I have this code:
cm9_list = (df.cm9.str.split(expand=True).stack().value_counts()).reset_index()
which produces a dataframe-like object. It says object type when I use dtypes. I change the column headers:
cm9_list.columns.values[0] = 'word'
cm9_list.columns.values[1] = 'frequency'
and then I want to remove the record in the word column that has the 'nan' value (I do some text processing before this to strip punctuation, stop words, etc., so I think these 'nan' values were inserted into null cells during that process).
I am getting an error when I try to run this code:
cm9_list = cm9_list[cm9_list.columns[0] != 'nan']
That says:
KeyError: True
And I have also tried:
cm9_list = cm9_list[cm9_list['word'] != 'nan']
and get this:
KeyError: 'word'
I have no idea what these errors mean. All I can think of is that it doesn't recognize word as a column name. When I check the column names though, it looks normal:
Index(['word', 'frequency'], dtype='object')
What could be the issue?
TIA!!
In cm9_list[cm9_list.columns[0] != 'nan'], the expression cm9_list.columns[0] != 'nan' compares a single column label to 'nan' and evaluates to the scalar True, and True isn't a column key in cm9_list, hence KeyError: True.
The second error suggests that cm9_list really has no column named 'word': assigning into cm9_list.columns.values mutates the underlying array without properly updating the index's lookup table, so the rename never takes effect for lookups, even though printing the columns shows the new names.
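A sketch of a fix along those lines: assign a fresh list to .columns instead of mutating .columns.values, then filter with a boolean mask over the column:

# Rename by assigning a new list, which rebuilds the column index properly
cm9_list.columns = ['word', 'frequency']

# Keep only the rows whose word is not the string 'nan'
cm9_list = cm9_list[cm9_list['word'] != 'nan']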
The dataframe is named df_buysellrcds; the values to remove are in a list named remove_list.
Hi all,
Below is my code logic. I want to loop over the 5 records in remove_list and remove rows from df_buysellrcds where the Stock column value equals a value in the list and the Type column value equals 'Buy'.
for name in remove_list:
    for i, row in df_buysellrcds:
        if row["Stock"] == name and row["Type"] == "Buy":
            df_actualHoldings = df_buysellrcds.drop(df_buysellrcds.index[i])
However I got this error:
File "<ipython-input-10-a399d857a077>", line 2, in <module>
for i, row in df_buysellrcds:
ValueError: too many values to unpack (expected 2)
Any ideas how should I improve my code?
My objective is to:
1) Keep all rows with Type = 'Buy' whose Stock value is not in the list
2) Keep all rows with Type = 'Sell'
3) Remove rows with Type = 'Buy' if the Stock value exists in the list
For example, the first 2 records are removed but the 3rd record is retained for Stock value 'Genting'.
Thanks for the help.
The ValueError is actually raised because iterating directly over a dataframe yields its column labels, not (index, row) pairs; you would need df_buysellrcds.iterrows() for that. Even then, deleting rows while iterating is fragile; one way around it is to simply keep track of the indices of the rows to be deleted and drop them afterwards (see the sketch below). In this situation, however, the expressiveness of pandas allows us to bypass the iteration entirely.
To keep only the elements whose Type is not Buy or whose Stock is not present in remove_list,
df_actualHoldings = df_buysellrcds[(df_buysellrcds.Type != "Buy") | ~df_buysellrcds.Stock.isin(remove_list)]
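For reference, the index-collecting version mentioned above might look like this sketch (using the question's column names):

# Collect the labels of rows to drop, then drop them all at once
to_drop = []
for i, row in df_buysellrcds.iterrows():  # note .iterrows(), not the bare dataframe
    if row["Stock"] in remove_list and row["Type"] == "Buy":
        to_drop.append(i)
df_actualHoldings = df_buysellrcds.drop(to_drop)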
So I have a
df = pd.read_excel(...)
This loop does work:
for i, row in df.iterrows():  # loop through rows
    a = df[df.columns].SignalName[i]  # column "SignalName" of row i is read
    b = row[7]  # column "Bus-Signalname" of row i, taken primitively (hardcoded)
Access to a is OK. How can I replace the hardcoded b = (row[7]) with a dynamically located "Bus-Signalname" element from the Excel table? What are the ways to do this?
b = df[df.columns].Bus-Signalname[i]
does not work.
To access the whole column, run: df['Bus-Signalname'].
So-called attribute notation (df.Bus-Signalname) will not work here, since "-" is not allowed as part of an attribute name.
It is treated as the minus operator, so:
the expression before it is df.Bus, but df probably has no column with this name, so an exception is thrown;
what occurs after it (Signalname) is expected to be e.g. a variable, but you probably have no such variable, which is another reason an exception could be raised.
Note also that you then wrote [i].
As I understand it, i is an integer and you want to access element No. i of this column.
Note that the column you retrieved is a Series with the same index as your whole DataFrame.
If the index is a default one (consecutive numbers, starting from 0), you will succeed. Otherwise (if the index does not contain the value i), you will fail.
A more pandasonic syntax to access an element in a DataFrame is:
df.loc[i, 'Bus-Signalname']
where i is the index of the row in question and Bus-Signalname is the column name.
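Put together inside the loop, a minimal sketch (assuming the default integer index that read_excel produces):

for i, row in df.iterrows():
    a = row['SignalName']            # simpler than df[df.columns].SignalName[i]
    b = df.loc[i, 'Bus-Signalname']  # or, equivalently here, row['Bus-Signalname']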
@Valdi_Bo, thank you. In the loop, both
df.loc[i, 'Bus-Signalname']
and
df['Bus-Signalname'][i]
work.
I have a dataframe with 2 columns and I want to create a 3rd column that returns True or False for each row, according to whether the value in column A is contained in the value in column B.
Here's my code:
C = []
for index, row in df.iterrows():
    if row['A'][index] in row['B'][index]:
        C[index] = True
    else:
        C[index] = False
I get the following errors:
1) TypeError: 'float' object is not subscriptable
2) IndexError: list assignment index out of range
How can I solve these errors?
I think the problem is that some values of row['A'] or row['B'] are floats. A float cannot be subscripted, so row['A'][index] effectively becomes float[index], which is what raises the first error. Are you expecting a string value there? It is possible that not all values in the dataframe have the same data type.
Secondly, index is the row's label, so I don't know why you are using it to subscript the values. I would need to look at the data to say more, but even if row['A'] held a string or array value that could be subscripted, the index may be too large. For example:
row['A'] = "hello"
a = row['A'][10]
will give you the index error.
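As for the second error, C[index] = True fails because C starts as an empty list, so there is no slot at index to assign to; C.append(...) would be needed instead. For the original goal, a small sketch that sidesteps both problems (str() guards against float values such as NaN; assumes columns A and B hold text-like data):

# Row-wise containment check without manual index bookkeeping
df['C'] = df.apply(lambda row: str(row['A']) in str(row['B']), axis=1)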
My question is very similar to this one: Find unique values in a Pandas dataframe, irrespective of row or column location
I am very new to coding, so I apologize in advance for anything cringeworthy.
I have a .csv file which I open as a pandas dataframe, and I would like to be able to return the unique values across the entire dataframe, as well as all unique strings.
I have tried:
for row in df:
    pd.unique(df.values.ravel())
This fails to iterate through rows.
The following code prints what I want:
for index, row in df.iterrows():
    if isinstance(row, object):
        print('%s\n%s' % (index, row))
However, trying to place these values into a previously defined set (myset = set()) fails when I hit a blank column (NoneType error):
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update(print('%s\n%s' % (index, row)))
I get closest to what I was when I try the following:
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update('%s\n%s' % (index, row))
However, my set prints out a list of characters rather than the strings/floats/values that appear on my screen when I print above.
Someone please help point out where I fail miserably at this task. Thanks!
I think the following should work for almost any dataframe. It will extract every value that is unique in the entire dataframe.
Post a comment if you encounter a problem, I'll try to solve it.
# Replace all Nones / NAs with empty strings, so they won't bother us later
df = df.fillna('')

# Prepare a list
list_sets = []

# Iterate over the columns (much faster than over rows)
for col in df.columns:
    # List containing all the unique values of this column
    this_set = list(set(df[col].values))
    # Extend the combined list
    list_sets = list_sets + this_set

# Deduplicate the combined list
final_set = list(set(list_sets))

# For completeness, remove the empty string introduced by the fillna step
final_set.remove('')
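Incidentally, the column loop above can be collapsed into the one-liner the question was circling around; a sketch:

# Flatten all values, deduplicate, and drop the fillna placeholder
final_set = [v for v in pd.unique(df.fillna('').values.ravel()) if v != '']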
Edit:
I think I know what happens. You must have some float columns, and fillna is failing on those, as the code I gave you was replacing missing values with an empty string. Try one of these:
df = df.fillna(np.nan)
df = df.fillna(0)
For the first point, you'll need to import numpy first (import numpy as np). It must already be installed as you have pandas.
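If you want to keep the empty-string trick but apply it only where it is safe, a hedged alternative sketch is to fill just the string (object) columns and leave the numeric ones alone:

import pandas as pd

# Fill NAs with '' only in the object-dtype (string) columns,
# so numeric columns keep their NaNs untouched
obj_cols = df.select_dtypes(include='object').columns
df[obj_cols] = df[obj_cols].fillna('')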