'Series' object has no attribute 'values_counts' - python

When I try to apply the values_count() method to a Series within a function, I am told that 'Series' object has no attribute 'values_counts'.
def replace_1_occ_feat(col_list, df):
    for col in col_list:
        feat_1_occ = df[col].values_counts()[df[col].values_counts() == 1].index
        feat_means = df[col].groupby(col)['SalePrice'].mean()
        feat_means_no_1_occ = feat_means.iloc[feat_means.difference(feat_1_occ),:]
        for feat in feat_1_occ:
            # Find the closest mean SalePrice
            replacement = (feat_means_no_1_occ - feat_means.iloc[feat,:]).idxmin()
            df.col.replace(feat, replacement, inplace = True)
However, when I run df.column.values_count() outside a function, it works.
The problem occurs on the first line where the values_counts() method is used.
I checked the pandas version; it's 0.23.0.

The function is value_counts(). Note only count is plural.
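A minimal sketch of the correctly spelled call, applied to the "values that occur only once" step from the question. The sample Series is made up for illustration.

```python
import pandas as pd

# Made-up sample data standing in for one column of the question's df.
s = pd.Series(["a", "b", "a", "c", "a", "b"])

# value_counts() (singular "value", plural "counts") returns a Series
# mapping each distinct value to the number of times it occurs.
counts = s.value_counts()

# Labels that occur exactly once, as in the question's first line.
once = counts[counts == 1].index
print(list(once))
```

Calling value_counts() once and reusing the result also avoids computing the counts twice per column.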

Related

I am having trouble indexing a dataframe

So I have a csv, and I am trying to load it into a dataframe via
df = pd.read_csv("watchlist.csv", sep='\s{2,}',)
It seems to work fine when I print(df)
Also, when I print columns, this is the output I get.
print(df.columns) #- OUTPUT:
Index([',Name,Growth,Recommendation,CurrentRatio,TotalCash,Debt,Revenue,PercentageSharesOut,PercentageInstitutions,PercentageInsiders,PricetoBook,ShortRatio,RegularMarketPrice'], dtype='object')
The trouble I'm having, is that when I try to then go and access a column with something like
med_debt = math.floor(df.Debt), or even
print(df.Debt)
I get an attribute error:
AttributeError: 'DataFrame' object has no attribute 'Debt'
Any assistance here would be appreciated.
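With sep='\s{2,}', pandas splits each line on runs of two or more whitespace characters instead of on commas, so the whole header line is read as a single column name (one string). For example: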
>>> df = pd.read_csv("weather", sep='\s{2,}')
>>> df.columns
Index(['Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),
Visibility (km),Stn Press (kPa),Weather'], dtype='object')
>>> df.index
RangeIndex(start=0, stop=8784, step=1)
When you then try to access a specific column, e.g. math.floor(df.Debt), it raises
AttributeError: 'DataFrame' object has no attribute 'Debt'
or, with df["Debt"]:
raise KeyError(key) from err
(KeyError: 'Debt')
To access specific columns of df this way, use:
df = pd.read_csv("watchlist.csv")
The separator is not splitting the CSV correctly. Try leaving it out and letting read_csv use its default separator, a comma, instead.
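Here is a small self-contained sketch of the difference, using an in-memory CSV in place of watchlist.csv. The sample rows are made up, and index_col=0 is an assumption based on the header line starting with a comma (an unnamed first column). Taking the median before math.floor is also an assumption about what med_debt was meant to hold; math.floor needs a single number, not a whole column.

```python
import io
import math

import pandas as pd

# Made-up stand-in for watchlist.csv; the header begins with a comma,
# as in the question's output.
csv_text = ",Name,Debt\n0,Acme,12.7\n1,Globex,3.2\n"

# Splitting on 2+ whitespace characters: the header contains none,
# so the whole line becomes one column name.
wrong = pd.read_csv(io.StringIO(csv_text), sep=r"\s{2,}", engine="python")
print(list(wrong.columns))

# Default comma separator; index_col=0 (an assumption) uses the unnamed
# first column as the index.
right = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(right.columns))

# Reduce the column to one number before calling math.floor.
med_debt = math.floor(right["Debt"].median())
```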

How to fix "'Series' objects are mutable, thus they cannot be hashed" with Pandas and NumPy

I have the following code, where I read with Pandas a CSV file to create one 'Dataframe' (the attributeValues) and a 'Series' (the attributeClassValues).
dataSegmentToProcess = pd.read_csv('myCSV', nrows=2000, header=None)
attributeValues = dataSegmentToProcess[dataSegmentToProcess.columns[:-1]]
attributeClassValues = dataSegmentToProcess[dataSegmentToProcess.columns[-1]]
Then, in the for loop below, I get the error "Series objects are mutable, thus they cannot be hashed" on the last line of my code:
means = np.zeros([numberOfUniqueClasses, dimension])
for i in range(numberOfUniqueClasses):
    idx = (attributeClassValues == uniqueClasses[i])
    means[i, :] = np.mean(attributeValues[idx, :], axis=0)
'numberOfUniqueClasses' and 'dimension' are integers.
The attributeValues is a dataframe, not a series object, so how come I get this error?
Any idea will be welcomed!
PS: I can do my job with the 'read_csv` (without Pandas), but for obvious reasons, I prefer using Pandas.
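One likely cause: attributeValues[idx, :] is NumPy-style indexing, but plain [] on a DataFrame treats its argument as a column key, so pandas ends up trying to use the tuple (idx, slice), which contains the boolean Series idx, as a key. Row-and-column selection goes through .loc (or .iloc) instead. A minimal sketch, with a small made-up frame standing in for the CSV, and with uniqueClasses and dimension computed the way the question appears to intend:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV: three attribute columns plus a class label.
dataSegmentToProcess = pd.DataFrame(
    [[1.0, 2.0, 3.0, "a"],
     [3.0, 4.0, 5.0, "a"],
     [10.0, 20.0, 30.0, "b"]]
)
attributeValues = dataSegmentToProcess[dataSegmentToProcess.columns[:-1]]
attributeClassValues = dataSegmentToProcess[dataSegmentToProcess.columns[-1]]

# Assumed definitions; the question does not show how these were built.
uniqueClasses = attributeClassValues.unique()
numberOfUniqueClasses = len(uniqueClasses)
dimension = attributeValues.shape[1]

means = np.zeros([numberOfUniqueClasses, dimension])
for i in range(numberOfUniqueClasses):
    idx = attributeClassValues == uniqueClasses[i]
    # .loc accepts a (row mask, column slice) pair; plain [] does not.
    means[i, :] = attributeValues.loc[idx, :].mean(axis=0)

print(means)
```

attributeValues[idx] (boolean mask only, no column part) would select the same rows.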

Pandas Python user input an attribute for dataframe object

I'm trying to allow the user to input the attribute for the dataframe object.
I've tried changing my input into a string. I've also tried using my input saved to a variable. Both these options do not work.
data = pd.read_csv('2019FallEnrollees.csv')
input1_col = input("Enter comparison group A: ")
input2_col = input("Enter comparison group B: ")
input1_str= str(input1_col)
input2_str = str(input2_col)
test = data[['CUM_GPA', input1_str, input2_str]]
# error here! 'test' does not have attribute 'input1_str' or 'input1_col'
df_1 = test[(test.input1_str == 0) & (test.input2_str == 0)]
df_2 = test[(test.input1_col == 1) & (test.input2_col == 0)]
print(stats.ttest_ind(df_1.CUM_GPA, df_2.CUM_GPA, equal_var = False))
The error message says
AttributeError: 'DataFrame' object has no attribute 'input1_str'
or
AttributeError: 'DataFrame' object has no attribute 'input1_col'
Welcome!
To access a column whose name is stored in a variable, you cannot use attribute access: test.input1_str looks for a column literally named input1_str, not for the value of the variable.
Try data[column] instead, or in your case test[input1_col].
Before you do so, make sure the column exists and the user is not entering a nonexistent column.
Column names can also be integers, in which case converting the input to a string would not match.
You can get all the DataFrame's column names with data.columns (or list(data.columns) if you want a regular list), and in fact you can rename the columns by assigning data.columns = ["Column Header 1", "Column Header 2", ...].
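A rough sketch of how that could look. The frame and the column names ATHLETE and HONORS are made up, and select_groups is a hypothetical helper; in the real script the two names would come from input().

```python
import pandas as pd

# Made-up stand-in for 2019FallEnrollees.csv.
data = pd.DataFrame({
    "CUM_GPA": [3.1, 2.8, 3.6, 3.9],
    "ATHLETE": [0, 1, 0, 1],
    "HONORS": [0, 0, 1, 0],
})

def select_groups(frame, col_a, col_b):
    """Hypothetical helper: split frame into the two comparison groups."""
    # Reject names that are not actual columns before indexing.
    for name in (col_a, col_b):
        if name not in frame.columns:
            raise KeyError(f"no such column: {name!r}")
    test = frame[["CUM_GPA", col_a, col_b]]
    # Bracket indexing uses the *value* of col_a / col_b as the column name.
    df_1 = test[(test[col_a] == 0) & (test[col_b] == 0)]
    df_2 = test[(test[col_a] == 1) & (test[col_b] == 0)]
    return df_1, df_2

# In the real script these strings would be the user's input.
df_1, df_2 = select_groups(data, "ATHLETE", "HONORS")
print(df_1.CUM_GPA.tolist(), df_2.CUM_GPA.tolist())
```

df_1.CUM_GPA still works here because CUM_GPA is the literal column name, not a variable.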

Don't understand this AttributeError: 'function' object has no attribute 'isalpha'

Refer to the following code:
# import
import pandas as pd
import numpy as np
import string
# create data frame
data = {'Name': ['Jas,on', 'Mo.lly', 'Ti;na', 'J:ake', '!Amy', "Myself"]}
df = pd.DataFrame(data, columns = ['Name'])
df
# get cleanName - Function
def getCleanName(pName):
    vRetVals = pName.translate(str.maketrans(" ", " ", string.punctuation))
    return(vRetVals)
# clean Name
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v:v.isalpha()).sum())
df['Name'] = [getCleanName for n in df.Name]
print("PostClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
Issue
When the below line is run for the first time, it runs properly:
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
when the same line is run for the second time, it gives the following error
AttributeError: 'function' object has no attribute 'isalpha'
Any ideas, what is causing the issue?
You forgot to call getCleanName, so your list ends up a bunch of identical references to the function. Change it to:
df['Name'] = [getCleanName(n) for n in df.Name]
#                         ^^^ changed
to actually call the function and use the results.
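To see the difference side by side, here is a small sketch using the names from the question. The translation table is written with empty strings, which behaves the same for this data.

```python
import string

import pandas as pd

def getCleanName(pName):
    # Delete every punctuation character from the string.
    return pName.translate(str.maketrans("", "", string.punctuation))

names = pd.Series(['Jas,on', 'Mo.lly', 'Ti;na', 'J:ake', '!Amy', 'Myself'])

# Without the call: six references to the function object itself,
# which is why .isalpha() later fails with "'function' object ...".
not_called = [getCleanName for n in names]

# With the call: six cleaned strings.
called = [getCleanName(n) for n in names]
print(called)

# Series.map does the same thing without an explicit list comprehension.
mapped = names.map(getCleanName)
```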

Some operations on DataFrame

I am working on parsing a *.csv file, so I am trying to create a class that simplifies some operations on a DataFrame.
I've created two methods in order to parse a column 'z' that contains values for the 'Price' column.
def subr(self):
    isone = self.df.z == 1.0
    if isone.any():
        atone = self.df.Price[isone].iloc[0]
        self.df.loc[self.df.z.between(0.8, 2.5), 'Benchmark'] = atone
        # df.loc[(df.r >= .8) & (df.r <= 1.4), 'value'] = atone
    return self.df
def obtain_z(self):
    "Return a column with z for E_ref"
    self.z_col = self.subr()
    self.dfnew = self.df.groupby((self.df.z < self.df.z.shift()).cumsum()).apply(self.z_col)
    return self.dfnew
def main():
    x = ParseDataBase('data.csv')
    file_content = x.read_file()
    new_df = x.obtain_z()
I'm getting the following error:
'DataFrame' objects are mutable, thus they cannot be hashed
'DataFrame' objects are mutable means that we can change elements of that Frame. I'm not sure when I'm hashing.
I noticed the use of apply(self.z_col) is going wrong.
I also have no clue how to fix it.
You are passing the DataFrame self.df returned by self.subr() to apply, but apply only takes a function (a callable) as its argument. Pass the function itself, not the result of calling it.
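A minimal sketch of what passing a function could look like, assuming the intent is to fill 'Benchmark' separately within each block of rows. The free function fill_benchmark, the name block_id, and the sample data are all made up for illustration; the function receives each group as its argument instead of reading self.df.

```python
import pandas as pd

def fill_benchmark(group):
    """Hypothetical per-group version of subr(): operates on one group."""
    group = group.copy()
    isone = group["z"] == 1.0
    if isone.any():
        atone = group.loc[isone, "Price"].iloc[0]
        group.loc[group["z"].between(0.8, 2.5), "Benchmark"] = atone
    return group

# Made-up data: z restarts at row 3, so there are two blocks.
df = pd.DataFrame({
    "z": [0.5, 1.0, 2.0, 0.7, 1.0, 3.0],
    "Price": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
})

# A new block starts wherever z drops below the previous value.
block_id = (df["z"] < df["z"].shift()).cumsum().rename("block")

# Pass the function object itself (no parentheses) to apply.
result = df.groupby(block_id, group_keys=False).apply(fill_benchmark)
print(result)
```

Inside the class, the equivalent would be a method that takes the group as a parameter and is passed as self.some_method, without calling it.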
