Dataframe "name" contains the names of people's first 10 job employers.
I want to retrieve all the names of employers that contain "foundation".
My purpose is to better understand the employers' names that contains "foundation".
Here is the code that I screwed up:
name = employ[['nameCurrentEmployer',
               'name2ndEmployer', 'name3rdEmployer',
               'name4thEmployer', 'name5thEmployer',
               'name6thEmployer', 'name7thEmployer',
               'name8thEmployer', 'name9thEmployer',
               'name10thEmployer']]
print(name.loc[name.str.contains('foundation', case=False)])
And the error is:
AttributeError: 'DataFrame' object has no attribute 'str'
Thank you!
You get AttributeError: 'DataFrame' object has no attribute 'str' because str is an accessor on a Series, not on a DataFrame.
From the docs:
Series.str can be used to access the values of the series as strings
and apply several methods to it. These can be accessed like
Series.str.<function/property>.
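For instance, on a single Series (a minimal sketch with made-up values):
import pandas as pd

s = pd.Series(["Ford Foundation", "Acme Corp"])
print(s.str.contains("foundation", case=False))
# 0     True
# 1    False
# dtype: bool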
So if you have multiple columns like ["name6thEmployer", "name7thEmployer"] and so on in your DataFrame called name, then the simplest way to approach it would be:
columns = ["name6thEmployer", "name7thEmployer", ...]
for column in columns:
# for example, if you just want to count them up
print(name[name[column].str.contains("foundation")][column].value_counts())
Try:
foundation_series = df['name'].str.contains('foundation', regex=True)
print(df[foundation_series.values])
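If you want to pull the matching employer names out of all ten columns at once, a loop-free sketch (using the name DataFrame built in the question) could be:
matches = name.stack()  # flatten all employer columns into one Series; NaN is dropped
print(matches[matches.str.contains('foundation', case=False, na=False)])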
I have a dataframe with product names and volumes. I also have two variables with per-unit costs.
LVP_per_unit_cost=xxxx
HVP_per_unit_cost=xxxx
However, I would like to apply the per-unit cost only to selected product types. To achieve this I am using isin() within a user-defined function.
I am getting an error message:
AttributeError: 'str' object has no attribute 'isin'
Here is my code:
LVP_list = ['BACS', 'FP', 'SEPA']
HVP_list = ['HVP', 'CLS']

def calclate_cost(row):
    if row['prod_final'].isin(LVP_list):
        return row['volume'] * LVP_per_unit_cost
    elif row['prod_final'] == (HVP_list):
        return row['volume'] * HVP_per_unit_cost
    else:
        return 0

mguk['cost_usd'] = mguk.apply(calclate_cost, axis=1)
Could you please help?
row['prod_final'] is a string containing the value of that column in the current row, not a Pandas series. So use the regular in operator.
if row['prod_final'] in LVP_list:
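Note that the elif branch has the same problem: comparing a string to a list with == is always False, so it needs in as well. A corrected sketch of the whole function, keeping the names from the question:

def calclate_cost(row):
    if row['prod_final'] in LVP_list:
        return row['volume'] * LVP_per_unit_cost
    elif row['prod_final'] in HVP_list:
        return row['volume'] * HVP_per_unit_cost
    return 0

mguk['cost_usd'] = mguk.apply(calclate_cost, axis=1)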
I'm trying to use iterrows() on a DataFrame. The column could have a value such as Fred,William,John and I want to count how many names are listed. The following code works great:
for index, row in search_df.iterrows():
    print(len(row["Name"].split(",")))
However, when I try to actually use the value from len(), I get errors. For example:
for index, row in search_df.iterrows():
    row["Number of Names"] = len(row["Name"].split(","))
That gives me an error: 'float' object has no attribute 'split'. And when I do:
row["Number of Names"] = len(row["Name"].str.split(","))
it gives the error: 'str' object has no attribute 'str'.
Looks like a string, but it's a float... Try to treat it as a string, it's already a string... Frustration...
If you are working on a dataframe, try this:
df["Name"].value_counts()
Don't use a loop.
Refer - pandas create new column based on values from other columns / apply a function of multiple columns, row-wise
def count_names(row):
    return len(row['Name'].split(','))

df["Number of Names"] = df.apply(count_names, axis=1)
Splitting on , and then counting the elements seems inefficient to me. You can use count instead.
search_df['Name'].apply(lambda x: str(x).count(',')+1)
Never mind... I worked it out:
for index, row in search_df.iterrows():
    num_language = len(str(row["Language"]).split(","))
    search_df.loc[index, "Number of Names"] = num_language
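For reference, the same idea without a loop (a sketch; it mirrors the str() conversion above, so NaN rows count as 1):
search_df["Number of Names"] = search_df["Language"].astype(str).str.count(",") + 1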
I have some code to convert all columns of my DataFrame of type 'object' to type 'category'. I'm looping through my DF by column, which has already been filtered to object types, and converting only those columns with a low ratio of unique values.
converted_obj = pd.DataFrame()
for col in df201911_obj.columns:
    num_unique_values = len(df201911_obj[col].unique())
    num_total_values = len(df201911_obj[col])
    if num_unique_values / num_total_values < 0.5:
        converted_obj.loc[:, col] = df201911_obj[col].astype('category')
    else:
        converted_obj.loc[:, col] = df201911_obj[col]
I am getting an AttributeError: 'DataFrame' object has no attribute 'unique'
I checked the data types of all the columns with a loop, since there are over 100 columns:
for col in df201911_obj.columns:
    print(type(col))
And they are all: <class 'str'>
Why am I getting this error?
On an AttributeError, try reloading all your libraries first, then reload all your predefined variables and other dataframes, then delete the code part that is causing the problem and rewrite it (this fixes any syntax error, if any).
I tried the above fix and it worked on
AttributeError: 'DataFrame' object has no attribute 'dublicated'
I was using Google Colab, and the problem was with loading the dataframe.
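For what it's worth, one common cause of 'DataFrame' object has no attribute 'unique' (not confirmed in this thread) is duplicate column labels, in which case df[col] returns a DataFrame instead of a Series. A quick check:
# if this prints any labels, df201911_obj[col] yields a DataFrame for them
print(df201911_obj.columns[df201911_obj.columns.duplicated()])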
I have a data frame which looks like this:
Now I am comparing whether two columns (i.e. Complaint and Compliment) have equal values or not. I have written a function:
def col_comp(x):
    return x['Complaint'].isin(x['Compliment'])
When I apply this function to the dataframe, i.e.
df.apply(col_comp, axis=1)
I get an error message:
AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')
Any suggestions about where I am making the mistake?
isin requires an iterable. You are providing individual data points (floats) with apply and col_comp. What you should use is == in your function col_comp, instead of isin. Even better, you can compare the columns in one call:
df['Complaint'] == df['Compliment']
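For example, to keep the result as a new column (a minimal sketch; the column name is_equal is made up):
df['is_equal'] = df['Complaint'] == df['Compliment']
print(df['is_equal'].value_counts())  # how many rows match vs. don't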
I have a CSV file with multiple columns containing empty strings. Upon reading the CSV into a pandas dataframe, the empty strings get converted to NaN.
Now I want to append a string tag- to the strings already present in the columns, but only to those cells that have some value in them and not to those with NaN.
This is what I was trying to do:
with open('file1.csv', 'r') as file:
    for chunk in pd.read_csv(file, chunksize=1000, header=0, names=['A', 'B', 'C', 'D']):
        if len(chunk) >= 1:
            if chunk['A'].notna:
                chunk['A'] = "tag-" + chunk['A'].astype(str)
            if chunk['B'].notna:
                chunk['B'] = "tag-" + chunk['B'].astype(str)
            if chunk['C'].notna:
                chunk['C'] = "tag-" + chunk['C'].astype(str)
            if chunk['D'].notna:
                chunk['D'] = "tag-" + chunk['D'].astype(str)
And this is the error I'm getting:
AttributeError: 'Series' object has no attribute 'notna'
The final output that I want should be something like this:
A,B,C,D
tag-a,tab-b,tag-c,
tag-a,tag-b,,
tag-a,,,
,,tag-c,
,,,tag-d
,tag-b,,tag-d
I believe you need mask to add tag- to all columns together:
for chunk in pd.read_csv('file1.csv', chunksize=2, header=0, names=['A', 'B', 'C', 'D']):
    if len(chunk) >= 1:
        m1 = chunk.notna()
        chunk = chunk.mask(m1, "tag-" + chunk.astype(str))
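An end-to-end sketch of the chunked version (output.csv is a made-up target file; mode='a' appends, so it assumes a fresh file):
import pandas as pd

header = True
for chunk in pd.read_csv('file1.csv', chunksize=1000, header=0,
                         names=['A', 'B', 'C', 'D']):
    # prepend "tag-" only where a value is present; NaN cells stay empty
    chunk = chunk.mask(chunk.notna(), "tag-" + chunk.astype(str))
    chunk.to_csv('output.csv', mode='a', header=header, index=False)
    header = False  # write the header only for the first chunk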
You need to upgrade to the latest version of pandas, 0.21.0.
You can check the docs:
In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore in all cases where .isnull() and .notnull() methods are defined, these have additional methods named .isna() and .notna(), these are included for classes Categorical, Index, Series, and DataFrame. (GH15001).
The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.
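To confirm which version you are running (.notna() exists from 0.21.0 onward):
import pandas as pd

print(pd.__version__)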