How to resolve AttributeError: 'DataFrame' object has no attribute - python

I know this kind of question has been asked before. I have checked the answers and tried several times to find a solution, but in vain.
I load a DataFrame with pandas from a CSV file.
When I type data.Country and data.Year, the first and second columns are displayed. However, when I type data.Number, I get this error every time:
AttributeError: 'DataFrame' object has no attribute 'Number'.

Check your DataFrame's column names with data.columns.
It should print something like this:
Index([u'regiment', u'company', u'name', u'postTestScore'], dtype='object')
Check for hidden whitespace in the names. If you find any, you can rename the column with:
data = data.rename(columns={'Number ': 'Number'})

I think the column name that contains "Number" is actually something like " Number" or "Number ", with a residual space. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is raised because dot notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a quick way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
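The whitespace problem described above can be reproduced with a small sketch. The CSV contents here are made up for illustration (the original file is not shown), with a trailing space deliberately placed in the third header:

```python
import io
import pandas as pd

# Hypothetical CSV text: the third header is "Number " (note the trailing space).
csv_text = "Country,Year,Number \nFrance,2019,10\nSpain,2020,12\n"
data = pd.read_csv(io.StringIO(csv_text))

# Printing the names as a list shows the quotes, so stray spaces become visible.
print(data.columns.tolist())

# Strip leading and trailing whitespace from every column name.
data.columns = data.columns.str.strip()

# Both access styles now work.
print(data.Number.tolist())
print(data["Number"].tolist())
```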

data = pd.read_csv('/your file name', delim_whitespace=True)
data.Number
Now you can run this code with no error, provided the file really is whitespace-delimited.

Quick fix: change how Excel converts imported files. Go to 'File', then 'Options', then 'Advanced'. Scroll down and uncheck 'Use system separators'. Also change 'Decimal separator' to '.' and 'Thousands separator' to ','. Then re-save your file in the CSV (Comma delimited) format. The root cause is usually how the CSV file was created. The point is, why use extra code if it is not necessary? Cross-platform understanding and integration is key in engineering and development.

I'd like to make it simple for you.
The reason for "'DataFrame' object has no attribute 'Number'" (or 'Close', or any other column name) is that the column name looks like "Number" but is really " Number" or "Number ". The extra space is there because the column name is written that way in the Excel sheet. You can change it in Excel, or you can write
data.columns = data.columns.str.strip() (or df.columns = df.columns.str.strip() if your DataFrame is called df)
but in some cases the same error may still appear afterwards.
Changing the name in the Excel sheet will definitely work.

Replace ";" with "," as the delimiter in the CSV file.
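If editing the file is not convenient, an alternative (not what this answer proposes) is to tell pandas about the delimiter through the sep parameter of read_csv. A minimal sketch, assuming a semicolon-delimited file whose contents are invented here:

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data.
csv_text = "Country;Year;Number\nFrance;2019;10\n"

# With the default comma separator, the whole header becomes one column name,
# so data.Number does not exist.
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.columns.tolist())

# Passing sep=";" splits the columns as intended.
right = pd.read_csv(io.StringIO(csv_text), sep=";")
print(right.columns.tolist())
```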

I realize this is not the same use case, but this might help:
In my case, my DataFrame object didn't have the column I wanted to do an operation on.
The following conditional statement allowed me to avoid the AttributeError:
if '<column_name>' in test_data.columns:
    pass  # do your operation on the column here

Related

Why Doesn't Python Recognize the Column Name (KeyError)

I imported stock/options data into a data frame and want to use pandas to manually filter for specific criteria. I renamed a few columns and then later on I tried to do a bit of cleaning so I can work with the data.
I tried to replace percentage signs then convert the data type to a float by doing this:
df = df['IV'].str.rstrip("%").astype(float)
df = df['IV_Rank'].str.rstrip("%").astype(float)/100
df = df['IV PCT'].str.rstrip("%").astype(float)/100
When I run that code I get the error message KeyError: 'IV'. I got the same error for the other columns when I ran each line independently, even after copying and pasting the column names and trying the old names. I am not sure what to do, and some help would be appreciated.
That's because each line overwrites the entire DataFrame with a single column. This is what I think you are trying to do:
df['IV'] = df['IV'].str.rstrip("%").astype(float)
df['IV_Rank'] = df['IV_Rank'].str.rstrip("%").astype(float)/100
df['IV PCT'] = df['IV PCT'].str.rstrip("%").astype(float)/100
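To see why the original lines fail, here is a small sketch with made-up values (the real stock data is not shown). After the first assignment the name no longer refers to a DataFrame, so any later column lookup raises KeyError:

```python
import pandas as pd

# Hypothetical sample data standing in for the imported options table.
df = pd.DataFrame({"IV": ["25%", "40%"], "IV_Rank": ["10%", "55%"]})

# Assigning the result to the whole variable leaves only a Series behind.
overwritten = df["IV"].str.rstrip("%").astype(float)
print(type(overwritten).__name__)

try:
    overwritten["IV_Rank"]  # the other columns are gone
except KeyError as exc:
    print("KeyError:", exc)

# Assigning back to the column keeps the rest of the DataFrame intact.
df["IV"] = df["IV"].str.rstrip("%").astype(float)
df["IV_Rank"] = df["IV_Rank"].str.rstrip("%").astype(float) / 100
print(df.dtypes)
```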

'function' object has no attribute 'str' in pandas

I am using the code below to read a CSV file and split strings separated by /.
The data is:
SRC_PATH TGT_PATH
/users/sn/Retail /users/am/am
/users/sn/Retail Reports/abc /users/am/am
/users/sn/Automation /users/am/am
/users/sn/Nidh /users/am/xzy
import pandas as pd
df = pd.read_csv(r'E:\RCTemplate.csv', index_col=None, header=0)
s1 = df.SRC_PATH.str.split('/', expand=True)
I get the correct split data in s1, but when I do a similar operation on a single row, it throws the error "'function' object has no attribute 'str'".
The error is thrown by the code below:
df2= [(df.SRC_PATH.iloc[0])]
df4=pd.DataFrame([(df.SRC_PATH.iloc[0])],columns = ['first'])
newvar = df4.first.str.split('/', expand=True)
pandas thinks you are trying to access the method DataFrame.first().
This is why it's best practice to use square brackets to access DataFrame columns rather than attribute (.column) access.
Use df4['first'].str.split() instead of df4.first.str.split().
Note that attribute access also causes common issues with things like a column called 'name', and a host of other problems whenever a column name collides with an existing attribute or method.
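A minimal sketch of the bracket-access fix, using the first path from the question. The second DataFrame, with a column named 'count', is a hypothetical illustration of the same kind of name collision:

```python
import pandas as pd

# One-row DataFrame like df4 in the question.
df4 = pd.DataFrame(["/users/sn/Retail"], columns=["first"])

# Bracket access always returns the column, whatever its name.
newvar = df4["first"].str.split("/", expand=True)
print(newvar.iloc[0].tolist())

# Hypothetical collision: 'count' is also a DataFrame method, so attribute
# access returns the bound method rather than the column.
shadowed = pd.DataFrame({"count": [1, 2]})
print(callable(shadowed.count))
print(shadowed["count"].tolist())
```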

Pandas apply giving float error when operation is performed on string

I have a dataframe with a column test that contains test names, which I am using to extract the grade each test was written for. Because I know that the string used for the test name always has the grade in it as the next digit after the date, I have been extracting that data using this line of code:
df['Grade'] = df['test'].apply(lambda x: str(list(filter(str.isdigit, x[10:]))[0]))
This line, however, gives a TypeError: 'float' object is not subscriptable. Now, I should note that before I ran this, I did a check with df.dtypes and the column test was listed as object. That makes sense, as the string for the test names are something like 2015-2016_math_grade_7, so there is no way it could be seen as a float by Pandas.
I have checked, and test names are the only data in that column, so I have no idea why I am getting this type error. No matter what I change the code to, I get this error, because I need to perform a string operation after slicing x. (I have also tried df['Grade'] = df['test'].apply(lambda x: str(re.sub(r"\D", "", str(x[:10])))).)
I should also note that I have used this code before and it worked perfectly, but for some reason it fails on this data set, if that helps.
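One common cause of this symptom, offered here as an assumption since the data set is not shown, is a missing value in the column. A column with dtype object can still hold NaN, which is a float, and slicing it raises exactly this TypeError. The sketch below uses invented test names to show how to check for that and how the vectorised .str methods skip missing values:

```python
import pandas as pd

# Hypothetical column: two test names and one missing value (NaN is a float).
tests = pd.Series(
    ["2015-2016_math_grade_7", float("nan"), "2016-2017_read_grade_4"],
    dtype=object,
)
df = pd.DataFrame({"test": tests})

# The dtype is still object, but not every element is a string.
print(df["test"].dtype)
print(df["test"].apply(lambda x: type(x).__name__).tolist())

# .str methods propagate NaN instead of raising, so the first digit after
# the 10-character date prefix can be extracted safely.
df["Grade"] = df["test"].str[10:].str.extract(r"(\d)", expand=False)
print(df)
```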

pandas module to trim columns in python

Any idea why the code below can't keep the first column of my CSV file? I would like to keep several columns in a new CSV file, first column included. When I include the name of the first column in the selection,
I get an error:
"Type" not index.
import pandas as pd
f = pd.read_csv("1.csv")
keep_col = ['Type','Pol','Country','User Site Code','PG','Status']
new_f = f[keep_col]
new_f.to_csv("2.csv", index=False)
Thanks a lot.
Try f.columns.values.tolist() and check the name of the first column in the output. It sounds like there is an encoding issue when you are reading the CSV. You can try specifying the "encoding" option in your pd.read_csv() call to see if that gets rid of the extra characters at the front. Otherwise, you can use f.rename(columns={'F48FBFBFType': 'Type'}) to change whatever the current name of your first column is to simply 'Type'.
You are better off specifying the columns to read from your CSV file.
pd.read_csv('1.csv', usecols=keep_col).to_csv("2.csv", index=False)
Do you have any special characters in your first column?
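If the extra characters turn out to be a UTF-8 byte order mark, which is only an assumption about this particular file, the "utf-8-sig" codec drops it on read. A sketch with invented file contents held in memory:

```python
import io
import pandas as pd

# Hypothetical file contents: a UTF-8 BOM (EF BB BF) precedes the first header.
raw = b"\xef\xbb\xbfType,Pol,Country\nA,1,FR\nB,2,DE\n"

# "utf-8-sig" decodes UTF-8 and discards a leading BOM if one is present.
f = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(f.columns.tolist())

keep_col = ["Type", "Country"]
new_f = f[keep_col]
print(new_f)
```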

find column name in dataframe

When using IPython for interactive manipulation, the autocomplete feature helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName":
x = "ALongVariableName"
relevantColumn = df[x]
Instead I type "df.AL<Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName  # now how can I get x?
But that series object doesn't seem to carry its name or index in the dataframe. Did I miss it?
Thanks!
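For what it's worth, a Series selected from a DataFrame does keep the column label in its name attribute. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with one long column name.
df = pd.DataFrame({"ALongVariableName": [1, 2, 3], "Other": [4, 5, 6]})

relevantColumn = df.ALongVariableName

# Series.name holds the column label the Series was selected by.
x = relevantColumn.name
print(x)

# The label can be used for bracket access afterwards.
print(df[x].tolist())
```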
