I'm trying to use iterrows() for a DataFrame... The column could have a value such as Fred,William,John, and I want to count how many names are listed. The following code works great...
for index, row in search_df.iterrows():
    print(len(row["Name"].split(",")))
However, when I try to actually use the value from len(), it gives me errors, such as:
for index, row in search_df.iterrows():
    row["Number of Names"] = len(row["Name"].split(","))
That will give me an error.. 'float' object has no attribute 'split' or something..
And when I do:
row["Number of Names"] = len(row["Name"].str.split(","))
It will give the error: 'str' object has no attribute 'str'
Looks like a string, but it's a float... Try to treat it as a string, it's already a string... Frustration...
If you are working on dataframe, try this:
df["Name"].value_counts()
Don't use a loop.
Refer to: pandas create new column based on values from other columns / apply a function of multiple columns, row-wise
def count_names(row):
    return len(row['Name'].split(','))
df["Number of Names"] = df.apply(count_names, axis=1)
Splitting on ',' and then counting the elements seems inefficient to me. You can use count instead:
search_df['Name'].apply(lambda x: str(x).count(',')+1)
Nevermind... I worked it out:
for index, row in search_df.iterrows():
    num_names = len(str(row["Name"]).split(","))
    search_df.loc[index, "Number of Names"] = num_names
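For what it's worth, the loop can be avoided entirely; a minimal vectorized sketch with toy data (assuming the column is called "Name", and noting that missing values become the string "None"/"nan" after conversion and so count as one name):

```python
import pandas as pd

# Toy frame standing in for search_df
search_df = pd.DataFrame({"Name": ["Fred,William,John", "Ann", None]})

# astype(str) guards against NaN floats; split then take the list length
search_df["Number of Names"] = (
    search_df["Name"].astype(str).str.split(",").str.len()
)
print(search_df["Number of Names"].tolist())  # [3, 1, 1]
```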
Related
How can I filter a dataframe and then do a group by?
df.query("'result_margin' > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
I am getting this error
TypeError: '>' not supported between instances of 'str' and 'int'
I am trying to filter where result_margin is greater than 100, then group by the specified columns and print the records.
Using 'result_margin' (with quotes) treats it as a string literal rather than a column reference.
You would need to remove the quotes:
df.query("result_margin > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
Or, if you might have columns that contain spaces, add backticks:
df.query("`result_margin` > 100").groupby(['city','season','toss_winner','toss_decision','winner'])['winner'].size()
You need to convert 'result_margin' to int. Try:
df['result_margin'] = df['result_margin'].astype(int)
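Putting the type conversion and the query together, a minimal end-to-end sketch (the data values here are made up):

```python
import pandas as pd

# Toy frame with result_margin stored as strings, as in the question
df = pd.DataFrame({
    "result_margin": ["50", "150", "200"],
    "city": ["A", "B", "B"],
    "season": [2019, 2019, 2020],
    "toss_winner": ["x", "y", "y"],
    "toss_decision": ["bat", "field", "bat"],
    "winner": ["x", "y", "y"],
})

# Convert to int first, then the numeric comparison in query works
df["result_margin"] = df["result_margin"].astype(int)
out = (
    df.query("result_margin > 100")
      .groupby(["city", "season", "toss_winner", "toss_decision", "winner"])["winner"]
      .size()
)
print(out)
```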
For the filter, I always create a new dataframe.
df_new = df[df['result_margin'] > 100].groupby(['city','season','toss_winner','toss_decision','winner']).agg(WinnerCount=pd.NamedAgg(column='winner', aggfunc='count'))
I don't use the size method but instead opt for using the agg method and create a new column. You can also try replacing the
agg(WinnerCount = pd.NamedAgg(column='winner',aggfunc='count'))
with
['winner'].size()
Dataset
I'm trying to check for a win from the WINorLOSS column, but I'm getting the following error:
Code and Error Message
The variable combined.WINorLOSS is a Series object, and you can't compare an iterable (like a list, dict, Series, etc.) with a string value that way. I think you meant to do:
for i in combined.WINorLOSS:
    if i == 'W':
        hteamw += 1
    else:
        ateamw += 1
You can't compare a Series of values (like your WINorLOSS dataframe column) to a single string value. However, you can use the following to count the 'L' and 'W' values in your column:
hteamw = combined['WINorLOSS'].value_counts()['W']
hteaml = combined['WINorLOSS'].value_counts()['L']
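A self-contained illustration of the value_counts approach, with toy data standing in for combined:

```python
import pandas as pd

# Toy frame; the real data comes from the asker's CSV
combined = pd.DataFrame({"WINorLOSS": ["W", "L", "W", "W", "L"]})

counts = combined["WINorLOSS"].value_counts()
hteamw = counts["W"]  # number of wins
hteaml = counts["L"]  # number of losses
print(hteamw, hteaml)  # 3 2
```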
I have a dataframe df, and one of the features, called mort_acc, has missing data. I want to filter out those rows that contain missing data for mort_acc, and I used the following way:
df[df['mort_acc'].apply(lambda x:x == " ")]
It didn't work; I got output 0. So I used the following lambda way:
df[df['mort_acc'].apply(lambda x:len(x)<0)]
That didn't work either, and this time I got the error object of type 'float' has no len()
So I tried this way
df[df['mort_acc'].apply(lambda x:x == NaN)]
The error happened again: name 'NaN' is not defined
Does anyone know how to do it?
bad_values_row_mask = df['mort_acc'].isna()
df[bad_values_row_mask]
Sounds like what you want, I guess.
There is no name NaN defined in Python; use pd.isna() to check whether a value is NaN.
df[df['mort_acc'].apply(lambda x:pd.isna(x))]
This will give you the rows where the column has NaN values.
df[df.mort_acc.isnull()]
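For reference, isnull() is an alias for isna(), so the answers above are equivalent; a quick sketch with toy data:

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the asker's df
df = pd.DataFrame({"mort_acc": [1.0, np.nan, 3.0, np.nan]})

# Either mask selects the rows where mort_acc is missing
missing = df[df["mort_acc"].isna()]
print(len(missing))  # 2
```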
So I have a df with three columns: The first contains a name, the second an ID, and the third a list of IDs (delimited by commas). For guys with an identical name in the first column, I'd like to check if the ID in the second column of the one guy appears in the list of IDs in the third column of the other guy.
name id id2
Gabor 665 123
Hoak 667 100,111,112
Sherr 668 1,2,3
Hoak 669 667,500,600
Rine 670 73331,999
Rine 671 670,15
So basically I'd like python to note that there's two guys called "Hoak" and check if the id 667 of Hoak No.1 appears in the other Hoak's id2-list (which it does). I've tried to start with a cheap approach that does it manually for whatever name I specify, let's say for "Hoak" (i=1):
import pandas as pd
df = pd.read_excel (...)
for i in range(0, len(df)):
    if df['name'][i] == df['name'][1]:
        if df['id'][1] in df['id2'][i]:
            print(i)
However, I'm getting
TypeError: argument of type 'float' is not iterable
I've tried to add all sorts of variations, like .string or str(), or things like if df['id2'][i].str.contains("667"), but I can't work it out, getting errors like
AttributeError: 'float' object has no attribute 'string'
Thanks for your help
You need to set dtype in read_excel to avoid float problems.
Data type to force. Only a single dtype is allowed. If None, infer
import pandas as pd

df = pd.read_excel(io="test.xls", header=0, dtype={'name': str, 'id': str, 'id2': str})
for i in range(0, len(df)):
    if df['name'][i] == df['name'][1]:
        if df['id'][1] in df['id2'][i]:
            print(i)
Next, you need to correct the search algorithm.
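One way the corrected search could look (a sketch, not a definitive fix): split id2 into exact tokens, so that id 667 matches only a full entry and not, say, a substring of 1667, and compare every pair of rows that share a name:

```python
import pandas as pd

# The sample data from the question, with IDs read as strings
df = pd.DataFrame({
    "name": ["Gabor", "Hoak", "Sherr", "Hoak", "Rine", "Rine"],
    "id":   ["665", "667", "668", "669", "670", "671"],
    "id2":  ["123", "100,111,112", "1,2,3", "667,500,600", "73331,999", "670,15"],
})

matches = []
for i in range(len(df)):
    for j in range(len(df)):
        if i != j and df["name"][i] == df["name"][j]:
            # Exact token comparison instead of substring search
            if df["id"][i] in df["id2"][j].split(","):
                matches.append((df["name"][i], df["id"][i], j))
print(matches)  # [('Hoak', '667', 3), ('Rine', '670', 5)]
```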
A more pandas-style approach is to group the rows by name and see if the set of all IDs in each group intersects with the set of all ID2s in the same group:
df['id2'] = df['id2'].astype(str).str.split(',').apply(set)
df['id'] = df['id'].astype(str) # if needed
df.groupby('name')\
.apply(lambda x: set(x['id']) & set.union(*x['id2']))
#name
#Gabor {}
#Hoak {667}
#Rine {670}
#Sherr {}
Try changing this condition:
if df['id'][1] in df['id2'][i]:
with this:
if isinstance(df['id2'][i], list) and df['id'][1] in df['id2'][i]:
    ...
elif df['id'][1] == df['id2'][i]:
    ...
The problem may be that rows with only one value are read as a float rather than a list, so you can't iterate through them.
df = pd.read_excel is giving you floats, per your error messages. Have you tried just printing out i in your first loop? Work your way down through your nested for-loops once that bug is gone.
To solve the first bug, you need to set dtype in read_excel to avoid float problems.
What I want to do I figured would look like this:
(t in df[self.target]).any()
But I am getting:
AttributeError: 'bool' object has no attribute 'any'
You can use Pandas str methods (docs).
df[self.target].str.contains(t).any()
I'm assuming it's a pandas DataFrame. Try this:
(df[self.target] == t).any()
EDIT:
any((t in k for k in df[self.target]))
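Note that the three suggestions are not equivalent: str.contains(t) does a substring match, == t does an exact match, and the generator version keeps the substring semantics of the original in test. A small sketch with toy data (the column name and t are assumptions):

```python
import pandas as pd

# Toy stand-in for df[self.target]; column name and t are made up
df = pd.DataFrame({"target": ["apple pie", "banana", "apple"]})
t = "pie"

substring_hit = df["target"].str.contains(t).any()  # True: "apple pie" contains "pie"
exact_hit = (df["target"] == t).any()               # False: no cell equals "pie" exactly
in_hit = any(t in k for k in df["target"])          # True: same substring semantics as `in`
print(substring_hit, exact_hit, in_hit)
```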