When I try to apply the values_count() method to series within a function, I am told that 'Series' object has no attribute 'values_counts'.
def replace_1_occ_feat(col_list, df):
for col in col_list:
feat_1_occ = df[col].values_counts()[df[col].values_counts() == 1].index
feat_means = df[col].groupby(col)['SalePrice'].mean()
feat_means_no_1_occ = feat_means.iloc[feat_means.difference(feat_1_occ),:]
for feat in feat_1_occ:
# Find the closest mean SalePrice
replacement = (feat_means_no_1_occ - feat_means.iloc[feat,:]).idxmin()
df.col.replace(feat, replacement, inplace = True)
However when running df.column.values_count() outside a function it works.
The problem occurs on the first line when the values_counts() methods is used.
I checked the pandas version it's 0.23.0.
The function is value_counts(). Note only count is plural.
Related
So I have a csv, and I am trying to load it into a dataframe via
df = pd.read_csv("watchlist.csv", sep='\s{2,}',)
It seems to work fine when I print(df)
Also, when I print columns, this is the output I get.
print(df.columns) #- OUTPUT:
Index([',Name,Growth,Recommendation,CurrentRatio,TotalCash,Debt,Revenue,PercentageSharesOut,PercentageInstitutions,PercentageInsiders,PricetoBook,ShortRatio,RegularMarketPrice'], dtype='object')
The trouble I'm having, is that when I try to then go and access a column with something like
med_debt = math.floor(df.Debt), or even
print(df.Debt)
I get an attribute error:
AttributeError: 'DataFrame' object has no attribute 'Debt'
Any assistance here would be appreicated
sep='\s{2,}' parameter will cause column list to become an object of type string, example:
>>> df = pd.read_csv("weather", sep='\s{2,}')
>>> df.columns
Index(['Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),
Visibility (km),Stn Press (kPa),Weather'], dtype='object')
>>> df.index
RangeIndex(start=0, stop=8784, step=1)
When you try to access a specific column math.floor(df.Debt) it returns
AttributeError: 'DataFrame' object has no attribute 'Debt'
or maybe df["Debt"]
raise KeyError(key) from err
(KeyError: 'Debt')
To have access on specific columns of df by this way, use:
df = pd.read_csv("watchlist.csv")
The separator is not separating the csv correctly, try leaving it out and letting the csv reader use the default value of , instead.
I have the following code, where I read with Pandas a CSV file to create one 'Dataframe' (the attributeValues) and a 'Series' (the attributeClassValues).
dataSegmentToProcess = pd.read_csv('myCSV', nrows=2000, header=None)
attributeValues = dataSegmentToProcess[dataSegmentToProcess.columns[:-1]]
attributeClassValues = dataSegmentToProcess[dataSegmentToProcess.columns[-1]]
Then, in the for loop, below I got the error "Series objects are mutable, thus they cannot be hashed" in the last line of my code:
means = np.zeros([numberOfUniqueClasses, dimension])
for i in range(numberOfUniqueClasses):
idx = (attributeClassValues == uniqueClasses[i])
means[i, :] = np.mean(attributeValues[idx, :], axis=0)
'numberOfUniqueClasses' and 'dimension' are integers.
The attributeValues is a dataframe, not a series object, so how come I get this error?
Any idea will be welcomed!
PS: I can do my job with the 'read_csv` (without Pandas), but for obvious reasons, I prefer using Pandas.
I'm trying to allow the user to input the attribute for the dataframe object.
I've tried changing my input into a string. I've also tried using my input saved to a variable. Both these options do not work.
data = pd.read_csv('2019FallEnrollees.csv')
input1_col = input("Enter comparison group A: ")
input2_col = input("Enter comparison group B: ")
input1_str= str(input1_col)
input2_str = str(input2_col)
test = data[['CUM_GPA', input1_str, input2_str]]
# error here! 'test' does not have attribute 'input1_str' or 'input1_col'
df_1 = test[(test.input1_str == 0) & (test.input2_str == 0)]
df_2 = test[(test.input1_col == 1) & (test.input2_col == 0)]
print(stats.ttest_ind(df_1.CUM_GPA, df_2.CUM_GPA, equal_var = False))
The error messages says
"AttributeError: 'DataFrame' object has no attribute 'input1_str'
or
"AttributeError: 'DataFrame' object has no attribute 'input1_col'
Welcome!
To access a column in pandas you cannot use data.column
Try data[column] or in your case test[input1_col]
Before you do so, make sure the column does exist and the user is not inputting a nonexistant column.
Sometimes the column name can be an integer and converting to a string may also be a concern
You can get a list of all the dataframe columns through running data.columns (if you want a regular array: list(data.columns)) and infact you can alter the column names through running data.columns = ["Column Header 1" , "Column Header 2" etc.]
Refer the following code
# import
import pandas as pd
import numpy as np
import string
# create data frame
data = {'Name': ['Jas,on', 'Mo.lly', 'Ti;na', 'J:ake', '!Amy', "Myself"]}
df = pd.DataFrame(data, columns = ['Name'])
df
# get cleanName - Function
def getCleanName(pName):
vRetVals = pName.translate(str.maketrans(" ", " ", string.punctuation))
return(vRetVals)
# clean Name
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v:v.isalpha()).sum())
df['Name'] = [getCleanName for n in df.Name]
print("PostClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
Issue
When the below line is run for the first time, it runs properly:
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
when the same line is run for the second time, it gives the following error
AttributeError: 'function' object has no attribute 'isalpha'
Any ideas, what is causing the issue?
You forgot to call getCleanName, so your list ends up a bunch of identical references to the function. Change it to:
df['Name'] = [getCleanName(n) for n in df.Name]
# ^^^ changed
to actually call the function and use the results.
I am working on praising a *.csv file. Therefore I try to create a class which helps me to simplify some operations on DataFrame.
I've created two methods in order to parse a column 'z' that contains values for the 'Price' column.
def subr(self):
isone = self.df.z == 1.0
if isone.any():
atone = self.df.Price[isone].iloc[0]
self.df.loc[self.df.z.between(0.8, 2.5), 'Benchmark'] = atone
# df.loc[(df.r >= .8) & (df.r <= 1.4), 'value'] = atone
return self.df
def obtain_z(self):
"Return a column with z for E_ref"
self.z_col = self.subr()
self.dfnew = self.df.groupby((self.df.z < self.df.z.shift()).cumsum()).apply(self.z_col)
return self.dfnew
def main():
x = ParseDataBase('data.csv')
file_content = x.read_file()
new_df = x.obtain_z()
I'm getting the following error:
'DataFrame' objects are mutable, thus they cannot be hashed
'DataFrame' objects are mutable means that we can change elements of that Frame. I'm not sure when I'm hashing.
I noticed the use of apply(self.z_col) is going wrong.
I also have no clue how to fix it.
You are passing the DataFrame self.df returned by self.subr() to apply, but actually apply only takes functions as parameters (see examples here).