Don't understand this AttributeError: 'function' object has no attribute 'isalpha' - python

Refer to the following code:
# import
import pandas as pd
import numpy as np
import string
# create data frame
data = {'Name': ['Jas,on', 'Mo.lly', 'Ti;na', 'J:ake', '!Amy', "Myself"]}
df = pd.DataFrame(data, columns = ['Name'])
df
# get cleanName - Function
def getCleanName(pName):
    vRetVals = pName.translate(str.maketrans(" ", " ", string.punctuation))
    return(vRetVals)
# clean Name
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v:v.isalpha()).sum())
df['Name'] = [getCleanName for n in df.Name]
print("PostClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
Issue
When the below line is run for the first time, it runs properly:
print("PreClean Good Rows", df.shape[0] - df.Name.map(lambda v: v.isalpha()).sum())
When the same line is run for the second time, it gives the following error:
AttributeError: 'function' object has no attribute 'isalpha'
Any ideas what is causing the issue?

You forgot to call getCleanName, so your list ends up as a bunch of identical references to the function object. Change it to:
df['Name'] = [getCleanName(n) for n in df.Name]
# ^^^ changed
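to actually call the function and use the results. The PreClean line only fails on the second run because, by then, the buggy line has already replaced every entry of df.Name with the function object, and a function has no isalpha method.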
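A minimal runnable sketch of the fix, reusing the question's sample data. It also shows Series.str.translate as a vectorized alternative to the list comprehension. One side note: df.shape[0] minus the number of alphabetic names counts the rows that still contain non-letters, so the question's "Good Rows" label is really a bad-row count; the sketch names it bad_before / bad_after (illustrative names, not from the question).

```python
import string
import pandas as pd

data = {'Name': ['Jas,on', 'Mo.lly', 'Ti;na', 'J:ake', '!Amy', "Myself"]}
df = pd.DataFrame(data, columns=['Name'])

# Translation table that deletes every punctuation character
table = str.maketrans("", "", string.punctuation)

def getCleanName(pName):
    return pName.translate(table)

# Rows whose name is NOT purely alphabetic (i.e. rows still needing cleaning)
bad_before = df.shape[0] - df.Name.map(str.isalpha).sum()

# Note the (n): the function is called once per element
df['Name'] = [getCleanName(n) for n in df.Name]

bad_after = df.shape[0] - df.Name.map(str.isalpha).sum()
print(bad_before, bad_after)  # 5 0

# Vectorized alternative on a fresh copy of the raw names
alt = pd.Series(data['Name']).str.translate(table)
print(alt.tolist() == df.Name.tolist())  # True
```

Because the column now holds strings rather than function objects, the check line can be rerun any number of times.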

Related

Checking if another DataFrame with same name has been created. Error: 'str' object has no attribute 'append'

I'm running a for loop using pandas that checks whether a DataFrame with the same name has already been created. If it has, the loop should just append the values to the corresponding columns. If it has not, it should create the df and append the values to the named columns.
dflistglobal = []

# For loop that generates a, b, and c variables every time it runs.
# The following code runs inside the for loop: every time it runs it should
# generate a, b, and c, then check whether a df has been created with a specific
# name. If yes, it should append the values to that "listname"; if not, it should
# create a new one named "listname". The name changes every iteration, and it
# varies but can be repeated during this for loop.
if listname not in dflistglobal:
    dflistglobal.append(listname)
    listname = pd.DataFrame(columns=['a', 'b', 'c'])
    listname = listname.append({'a': a, 'b': b, 'c': c}, ignore_index=True)
else:
    listname = listname.append({'a': a, 'b': b, 'c': c}, ignore_index=True)
I am getting the following error:
File "test.py", line 150, in <module>
functiontest(image, results, list)
File "test.py", line 68, in funtiontest
listname = listname.append({'a':a, 'b':b, 'c':c}, ignore_index=True)
AttributeError: 'str' object has no attribute 'append'
The initial if statement runs fine, but the else statement causes problems.
I solved this issue by not creating separate pandas DataFrames inside the loop. In the for loop I generated a unique identifier for each listname, then appended a, b, c and listname to a list. At the end you have one large df that can be split up with groupby.
Not sure if this will be helpful for anyone, but avoiding creating many pandas dfs and using a list instead was the best approach for me.
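A minimal sketch of that list-based approach, assuming each loop iteration yields a listname plus a, b and c values (the records and the collect_rows helper here are hypothetical): rows are collected as plain dicts, the DataFrame is built once after the loop, and groupby splits it back into one frame per name.

```python
import pandas as pd

def collect_rows(records):
    """Build one DataFrame from (listname, a, b, c) tuples."""
    rows = []
    for listname, a, b, c in records:
        # Appending a plain dict is cheap compared with growing a DataFrame
        rows.append({"listname": listname, "a": a, "b": b, "c": c})
    # Build the DataFrame once, after the loop
    return pd.DataFrame(rows, columns=["listname", "a", "b", "c"])

# Hypothetical values standing in for what the real loop generates
records = [("x", 1, 2, 3), ("y", 4, 5, 6), ("x", 7, 8, 9)]
df = collect_rows(records)

# One sub-DataFrame per listname, similar to separately named DataFrames
groups = {name: g.reset_index(drop=True) for name, g in df.groupby("listname")}
print(groups["x"])
```

The dict comprehension is only there to show how each name's rows can be pulled out again; filtering with df[df["listname"] == "x"] works just as well.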
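That error tells you that listname is a string, and str has no append method.
Adding the name to dflistglobal does not create a variable with that name: the if branch only rebinds the local variable listname to a new DataFrame, and on a later iteration the loop assigns a string to listname again, so the else branch ends up calling append on that string.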
EDIT: Possible solution
I'm not sure how you are naming your DataFrames, and I don't see how you can access them afterwards.
Instead of using a list, you can store your DataFrames inside a dictionary, e.g. df_dict = {"name": df}. This lets you easily access each DataFrame by name.
import pandas as pd
import random

df_dict = {}

# For loop
for _ in range(10):
    # Logic to get your variables (example)
    a = random.randint(1, 10)
    b = random.randint(1, 10)
    c = random.randint(1, 10)
    # Logic to get your DataFrame name (example)
    df_name = f"dataframe{random.randint(1, 10)}"
    # One-row DataFrame holding this iteration's values
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    new_row = pd.DataFrame([{'a': a, 'b': b, 'c': c}])
    if df_name not in df_dict:
        # DataFrame with same name does not exist: start it with this row
        df_dict[df_name] = new_row
    else:
        # DataFrame with same name exists: add the row to it
        df_dict[df_name] = pd.concat([df_dict[df_name], new_row], ignore_index=True)
Also, for more info, you may want to visit this question
I hope it was clear and it helps.

TypeError: 'str' object does not support item assignment pandas add column

I'm new to pandas. I'm trying to add a new column to a df (df['new_col']), but when I do, I get this error:
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
df = pd.DataFrame(j['result'])
#add column
df = df['new_col'] = '12'
print(df.head())
Traceback (most recent call last):
File "pandas_csv.py", line 8, in <module>
df = df['new_col'] = '12'
TypeError: 'str' object does not support item assignment
Any idea?
Just to explain why that's happening, here's a simpler MCVE of the problem:
d = {1: "a"}
d = d[1] = "3"
TypeError: 'str' object does not support item assignment
This happens because, as described here, df = df['new_col'] = '12' is equivalent to:
df = '12'
df['new_col'] = '12'
Now, it should be obvious why the error is happening. df is overwritten with a string before the 'new_col' assignment happens.
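A pure-Python check of that evaluation order, reusing the MCVE above: the right-hand value is evaluated once and then bound to each target from left to right, so the name is rebound to the string before the item assignment runs.

```python
# Chained assignment: evaluate "3" once, then assign it to each target
# from left to right.
d = {1: "a"}
error = None
try:
    d = d[1] = "3"  # first d = "3", then "3"[1] = "3" raises TypeError
except TypeError as exc:
    error = str(exc)

print(repr(d))  # '3'
print(error)    # 'str' object does not support item assignment
```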
Just replace
df = df['new_col'] = '12'
by
df['new_col'] = '12'
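A small sketch of both options on a local stand-in DataFrame (the hash/value data is hypothetical, so no network call is needed). Item assignment mutates df in place and must not be combined with df = ...; if an expression that returns the DataFrame is what you were after, DataFrame.assign returns a new DataFrame that can safely be rebound.

```python
import pandas as pd

# Hypothetical stand-in for pd.DataFrame(j['result'])
df = pd.DataFrame({"hash": ["0xabc", "0xdef"], "value": ["1", "2"]})

# Option 1: in-place item assignment; no "df =" on the left
df["new_col"] = "12"

# Option 2: assign() returns a new DataFrame, so rebinding the name is fine
df_other = pd.DataFrame({"hash": ["0xabc", "0xdef"], "value": ["1", "2"]})
df_other = df_other.assign(new_col="12")

print(df.columns.tolist())        # ['hash', 'value', 'new_col']
print(df_other.columns.tolist())  # ['hash', 'value', 'new_col']
```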

Pandas - AttributeError: 'NoneType' object has no attribute 'pipe'

I loaded a CSV, tried to pipe some functions, and got the following error:
AttributeError: 'NoneType' object has no attribute 'pipe'
df = pd.read_csv('file.csv')
def func1(df):
    df['newcol'] = ...some code
def func2(df):
    df['newcol2'] = ...some code
(
    df.pipe(func1)
      .pipe(func2)
)
When I print out df, it prints the DataFrame normally. No idea why I get that error. pandas v0.24.2, Python v3.7.
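You need to return df from func1, because whatever func1 returns is what .pipe passes on to func2. A function without a return statement returns None, so the second .pipe is called on None.
I had the same issue because I was not returning df from a function in the middle of the chain. Hope this helps!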
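A minimal sketch of a working pipe chain; the column logic is hypothetical, standing in for "...some code". Each step returns a DataFrame (here a copy, so the original df is left untouched), and the last function shows what happens without a return.

```python
import pandas as pd

def func1(df):
    df = df.copy()                    # avoid mutating the caller's DataFrame
    df["newcol"] = df["x"] * 2        # hypothetical stand-in for "...some code"
    return df                         # this is what the next .pipe() receives

def func2(df):
    df = df.copy()
    df["newcol2"] = df["newcol"] + 1  # hypothetical
    return df

def func_no_return(df):
    df["tmp"] = 0                     # no return statement, so this returns None

df = pd.DataFrame({"x": [1, 2, 3]})
result = df.pipe(func1).pipe(func2)
print(result)

broken = pd.DataFrame({"x": [1]}).pipe(func_no_return)
print(broken)  # None
```

Calling .pipe(...) on broken would raise the same AttributeError: 'NoneType' object has no attribute 'pipe'.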

'Series' object has no attribute 'values_counts'

When I try to apply the values_count() method to a Series within a function, I am told that 'Series' object has no attribute 'values_counts'.
def replace_1_occ_feat(col_list, df):
    for col in col_list:
        feat_1_occ = df[col].values_counts()[df[col].values_counts() == 1].index
        feat_means = df[col].groupby(col)['SalePrice'].mean()
        feat_means_no_1_occ = feat_means.iloc[feat_means.difference(feat_1_occ),:]
        for feat in feat_1_occ:
            # Find the closest mean SalePrice
            replacement = (feat_means_no_1_occ - feat_means.iloc[feat,:]).idxmin()
            df.col.replace(feat, replacement, inplace = True)
However, when running df.column.values_count() outside a function, it works.
The problem occurs on the first line, where the values_counts() method is used.
I checked the pandas version; it's 0.23.0.
The method is value_counts(). Note that only "counts" is plural, not "value".
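A quick illustration with a made-up Series (the values are hypothetical) of the correctly spelled method, including the counts == 1 filter the question's function uses to find categories that occur only once:

```python
import pandas as pd

s = pd.Series(["A", "B", "A", "C", "A", "B"])  # hypothetical category column

counts = s.value_counts()           # value_counts, not values_counts
single = counts[counts == 1].index  # categories seen exactly once

print(counts)
print(list(single))                 # ['C']
print(hasattr(s, "values_counts"))  # False
```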

Problems transforming data in a dataframe

I've written the function (tested and working) below:
import pandas as pd
def ConvertStrDateToWeekId(strDate):
    dateformat = '2016-7-15 22:44:09'  # example of the expected input format
    aDate = pd.to_datetime(strDate)
    wk = aDate.isocalendar()[1]
    yr = aDate.isocalendar()[0]
    Format_4_5_4_date = str(yr) + str(wk)
    return Format_4_5_4_date
From what I have seen online, I should be able to use it this way:
ml_poLines = result.value.select('PURCHASEORDERNUMBER', 'ITEMNUMBER', 'PRODUCTCOLORID', 'RECEIVINGWAREHOUSEID', ConvertStrDateToWeekId('CONFIRMEDDELIVERYDATE'))
However, when I show() my DataFrame, the CONFIRMEDDELIVERYDATE column still contains the original datetime string, and no errors are raised.
I've also tried this:
ml_poLines['WeekId'] = (ConvertStrDateToWeekId(ml_poLines['CONFIRMEDDELIVERYDATE']))
and get the following error:
"ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions." which makes no sense to me.
I've also tried this with no success.
x = ml_poLines.toPandas()
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
ml_poLines2 = spark.createDataFrame(x)
ml_poLines2.show()
The above generates the following error:
AttributeError: 'Series' object has no attribute 'isocalendar'
What have I done wrong?
Your function ConvertStrDateToWeekId takes a single string, but in the following line the argument of the function call is a Series of strings:
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
A possible workaround for this error is to use the pandas apply method:
x['testDates'] = x['CONFIRMEDDELIVERYDATE'].apply(ConvertStrDateToWeekId)
But without more information about the kind of data you are processing it is hard to provide further help.
This was the workaround that I got to work:
# convert the confirmedDeliveryDate to a WeekId
x = ml_poLines.toPandas()
x['WeekId'] = x[['ITEMNUMBER', 'CONFIRMEDDELIVERYDATE']].apply(lambda y: ConvertStrDateToWeekId(y['CONFIRMEDDELIVERYDATE']), axis=1)
ml_poLines = spark.createDataFrame(x)
ml_poLines.show()
Not quite as clean as I would like.
Maybe someone else can propose a cleaner solution.
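If the data goes through pandas anyway, as in the workaround above, a possibly cleaner sketch is to apply the scalar function to the single column with Series.apply, or to skip the function and use the vectorized Series.dt.isocalendar() (available in pandas 1.1 and later), which returns year, week and day columns. The sample dates and the WeekIdApply column name are hypothetical, and the Spark round trip is left out. Both variants reproduce the question's str(yr) + str(wk) format, which does not zero-pad single-digit weeks.

```python
import pandas as pd

def ConvertStrDateToWeekId(strDate):
    # Same logic as the question's function: works on one string at a time
    aDate = pd.to_datetime(strDate)
    return str(aDate.isocalendar()[0]) + str(aDate.isocalendar()[1])

# Hypothetical sample data standing in for ml_poLines.toPandas()
x = pd.DataFrame({"CONFIRMEDDELIVERYDATE": ["2016-07-15 22:44:09",
                                            "2016-12-30 08:00:00"]})

# Option 1: call the scalar function once per element of the column
x["WeekIdApply"] = x["CONFIRMEDDELIVERYDATE"].apply(ConvertStrDateToWeekId)

# Option 2: vectorized ISO calendar fields for the whole column
iso = pd.to_datetime(x["CONFIRMEDDELIVERYDATE"]).dt.isocalendar()
x["WeekId"] = iso["year"].astype(str) + iso["week"].astype(str)

print(x)
```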
