'function' object has no attribute 'str' in pandas - python

I am using the code below to read a CSV file and split its strings, which are separated by /.
The data is:
SRC_PATH TGT_PATH
/users/sn/Retail /users/am/am
/users/sn/Retail Reports/abc /users/am/am
/users/sn/Automation /users/am/am
/users/sn/Nidh /users/am/xzy
import pandas as pd
df = pd.read_csv(r'E:\RCTemplate.csv',index_col=None, header=0)
s1 = df.SRC_PATH.str.split('/', expand=True)
I get the correct split data in s1, but when I do the same operation on a single row, it throws the error "'function' object has no attribute 'str'".
The error is thrown by the code below:
df2= [(df.SRC_PATH.iloc[0])]
df4=pd.DataFrame([(df.SRC_PATH.iloc[0])],columns = ['first'])
newvar = df4.first.str.split('/', expand=True)

pandas thinks you are trying to access the method DataFrame.first(), so df4.first returns that method rather than your column, and a function has no .str accessor.
This is why it is best practice to use square brackets to access DataFrame columns rather than .column attribute access:
df4['first'].str.split() instead of df4.first.str.split()
Note that attribute access causes common issues whenever a column name collides with an existing attribute or method (a column called 'name' is a frequent example), and a host of other problems.
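A minimal sketch of the collision described above. The column name 'count' is an assumption added for illustration, since DataFrame.count exists across pandas versions, whereas whether DataFrame.first exists depends on the version installed. The sample paths are taken from the question's data.

```python
import pandas as pd

# One-row frame; 'count' is a hypothetical second column that shares its
# name with a DataFrame method.
df4 = pd.DataFrame({"first": ["/users/sn/Retail"], "count": ["/users/am/am"]})

# Bracket access always returns the column, whatever it is called.
parts = df4["first"].str.split("/", expand=True)

# Attribute access resolves to the method when the name collides.
attribute_is_method = callable(df4.count)
bracket_is_method = callable(df4["count"])

print(parts.iloc[0].tolist())
print(attribute_is_method, bracket_is_method)
```

Because the path starts with a slash, the first element of the split is an empty string.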

Related

Replace string in one part pandas dataframe

print(df["date"].str.replace("2016","16"))
The code above works fine. What I really want to do is to make this replacement in just a small part of the data-frame. Something like:
df.loc[2:4,["date"]].str.replace("2016","16")
However here I get an error:
AttributeError: 'DataFrame' object has no attribute 'str'
What about df['date'].loc[2:4].str.replace('2016', '16')?
df.loc[2:4, ["date"]] returns a DataFrame because the column is given as a list. By selecting ['date'] first you know you are dealing with a Series, which does have the .str accessor.
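The answer returns the replaced values but does not write them back. A possible sketch of doing the replacement in place for rows 2 to 4 follows; the sample dates are made up, and regex=False is passed explicitly because the default for that parameter has differed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..5.
df = pd.DataFrame({"date": ["01-2016", "02-2016", "03-2016",
                            "04-2016", "05-2016", "06-2016"]})

# A scalar column label gives a Series, so .str is available.
# .loc slices by label and includes both endpoints (rows 2, 3 and 4).
df.loc[2:4, "date"] = df.loc[2:4, "date"].str.replace("2016", "16", regex=False)

print(df["date"].tolist())
```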

Problem with pandas 'to_csv' of 'DataFrameGroupBy' objects

I want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.
Python 3.7
This is my dataframe
This is my code
groups = clustering_df.groupby(clustering_df['Family Number'])
groups.apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv('grouped.csv')
Error Message
(AttributeError: Cannot access callable attribute 'to_csv' of 'DataFrameGroupBy' objects, try using the 'apply' method)
You just need to do this:
groups = clustering_df.groupby(clustering_df['Family Number'])
groups = groups.apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv('grouped.csv')
What went wrong is that the result of the groupby-apply was never saved. The apply runs and may display its output, depending on which IDE or notebook you use, but groups still refers to the DataFrameGroupBy object. To write a file, apply the function to the groupby object, assign the result to a variable, and call to_csv on that variable.
Chaining works as well:
groups = clustering_df.groupby(clustering_df['Family Number']).apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv("grouped.csv")
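As an alternative to the apply-based answers above (not what they do), here is a sketch of two other ways to get grouped data into CSV form: sorting the plain DataFrame by the grouping column, or producing one CSV text per group by iterating over the groupby. The sample rows and the 'Name' column are assumptions, and the output goes to an in-memory buffer instead of 'grouped.csv' so the example has no side effects.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's clustering_df.
clustering_df = pd.DataFrame({
    "Family Number": [2, 1, 2, 1],
    "Name": ["c", "a", "d", "b"],
})

# Option 1: a stable sort keeps each family's rows together, and the
# result is an ordinary DataFrame, which has to_csv.
sorted_df = clustering_df.sort_values(by="Family Number", kind="stable")
buffer = io.StringIO()
sorted_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Option 2: iterate over the groups; each group is itself a DataFrame.
per_group_csv = {
    key: group.to_csv(index=False)
    for key, group in clustering_df.groupby("Family Number")
}

print(csv_text)
```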

Handling unicode names in DataFrame

I want to convert all my data in a DataFrame to uppercase. When I start conversion from column names I get this error:
Code:
xl = pd.ExcelFile(target_processed_directory + filename)
# check sheet names
print(xl.sheet_names[0])
# sheet to pandas dataframe
df = xl.parse(xl.sheet_names[0])
# make whole dataframe uppercase
df.columns = map(str.upper, df.columns)
Error :
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
When using Pandas you'll want to avoid for loops in Python, and you'll usually want to avoid map() as well. Those are the slow ways to do things, and if you want to build good habits, you'll avoid them whenever you can.
There are fast vectorized string operations available for Pandas string sequences. In this case, you want:
df.columns = df.columns.str.upper()
Docs: http://pandas.pydata.org/pandas-docs/stable/text.html
Try using a list comprehension instead of mapping str.upper.
df.columns = [c.upper() for c in df.columns]
In Python 2.7, the distinction between strings and unicode is preventing you from applying a string method to a unicode object, despite the fact that the names of the methods are the same.
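A short sketch of the vectorized approach on a small frame. The column names and values are made up, and the second step assumes every column holds strings, which may not be true of the asker's spreadsheet.

```python
import pandas as pd

# Hypothetical data; both columns contain strings.
df = pd.DataFrame({"name": ["ann"], "city": ["oslo"]})

# Uppercase the column labels with the vectorized string accessor.
df.columns = df.columns.str.upper()

# Uppercase the values column by column (assumes all columns are strings).
upper_df = df.apply(lambda col: col.str.upper())

print(list(upper_df.columns), upper_df.iloc[0].tolist())
```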

AttributeError: 'list' object has no attribute 'rename'

df.rename(columns={'nan': 'RK', 'PP': 'PLAYER','SH':'TEAM','nan':'GP','nan':'G','nan':'A','nan':'PTS','nan':'+/-','nan':'PIM','nan':'PTS/G','nan':'SOG','nan':'PCT','nan':'GWG','nan':'PPG','nan':'PPA','nan':'SHG','nan':'SHA'}, inplace=True)
This is my code to rename the columns according to http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2
I want both the tables to have same column names. I am using python2 in spyder IDE.
When I run the code above, it gives me this error:
AttributeError: 'list' object has no attribute 'rename'
The original question was posted a long time ago, but I just came across the same issue and found the solution here: pd.read_html() imports a list rather than a dataframe
pd.read_html returns a list of DataFrames, since the page may contain more than one table. Add one more line of code before you try your rename:
dfs = pd.read_html(url, header=0)
and then df = dfs[0]. The df variable is now a DataFrame, so the df.rename command from the original question will run.
This should also fix it, where df is your dataset:
df.columns=['a','b','c','d','e','f']
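The two fixes above can be sketched without fetching the page. The list below is a hand-built stand-in for what pd.read_html returns (calling read_html itself needs an HTML parser such as lxml to be installed), and the row values are invented.

```python
import pandas as pd

# Stand-in for dfs = pd.read_html(url, header=0): a list of DataFrames.
dfs = [pd.DataFrame([[1, "A. Player", "NYR"]], columns=["nan", "PP", "SH"])]

got_list = isinstance(dfs, list)  # a list has no .rename method

# Pick the table you want before renaming.
df = dfs[0]
renamed = df.rename(columns={"PP": "PLAYER", "SH": "TEAM"})

# Assigning the full list of labels renames by position instead, which
# helps when several columns share the same placeholder name.
positional = df.copy()
positional.columns = ["RK", "PLAYER", "TEAM"]

print(list(renamed.columns), list(positional.columns))
```

Note that a dict literal with the key 'nan' repeated many times, as in the question, keeps only the last value for that key, so positional assignment is the safer route for those columns.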

How to resolve AttributeError: 'DataFrame' object has no attribute

I know that this kind of question was asked before and I've checked all the answers and I have tried several times to find a solution but in vain.
I load a DataFrame with pandas from a CSV file that I uploaded.
When I type data.Country and data.Year, I get the first and second columns displayed. However, when I type data.Number, it gives me this error every time:
AttributeError: 'DataFrame' object has no attribute 'Number'.
Check your DataFrame with data.columns
It should print something like this
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden whitespace. Then you can rename with
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is actually something like " Number" or "Number ", with a residual space. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, occurs because . notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a quick way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
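A small sketch of the diagnosis and fix described in these answers, assuming the problem really is a trailing space. The data is made up to mirror the question's Country, Year and Number columns.

```python
import pandas as pd

# Hypothetical frame whose third column name carries a trailing space.
data = pd.DataFrame({"Country": ["FR"], "Year": [2001], "Number ": [5]})

# repr() makes stray whitespace visible in the column names.
raw_names = [repr(name) for name in data.columns]

# Attribute access fails because no column is called exactly "Number".
try:
    data.Number
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# Strip leading and trailing whitespace from every column name.
data.columns = data.columns.str.strip()
numbers = data["Number"].tolist()

print(raw_names, lookup_failed, numbers)
```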
data = pd.read_csv('/your file name', delim_whitespace=True)
data.Number
If the file is whitespace-delimited, you can now run this code with no error.
Quick fix: change how Excel converts imported files. Go to 'File', then 'Options', then 'Advanced'. Scroll down and uncheck 'Use system separators'. Also change 'Decimal separator' to '.' and 'Thousands separator' to ','. Then simply re-save your file in the CSV (Comma delimited) format. The root cause is usually associated with how the CSV file is created. Trust that helps. The point is, why use extra code if it is not necessary? Cross-platform understanding and integration is key in engineering/development.
I'd like to make it simple for you.
The reason for "'DataFrame' object has no attribute 'Number'" (or 'Close', or any other column name) is that the column name looks like "Number" but is in reality " Number" or "Number ". The extra space is there because the column name is written that way in the Excel sheet. You can change it in Excel or you can write
data.columns = data.columns.str.strip() or df.columns = df.columns.str.strip()
but in some cases it may still throw the same error after the query.
Changing the name in the Excel sheet will definitely work.
Replace ";" with "," as the delimiter in the CSV file.
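If the file is semicolon-delimited, an alternative to editing it is to tell pandas about the delimiter. The sketch below uses made-up CSV text to show why the attribute goes missing: with the default comma separator, the whole header is read as a single column name.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited file contents.
raw = "Country;Year;Number\nFR;2001;5\n"

# Default sep="," finds no commas, so there is just one column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing sep=";" splits the header into the expected columns.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(list(wrong.columns))
print(list(right.columns), right["Number"].tolist())
```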
I realize this is not the same usecase but this might help:
In my case, my DataFrame object didn't have the column I wanted to do an operation on.
The following conditional statement allowed me to avoid the AttributeError:
if '<column_name>' in test_data.columns:
    # do your operation on the column
    pass
