AttributeError: 'list' object has no attribute 'rename' - python

df.rename(columns={'nan': 'RK', 'PP': 'PLAYER','SH':'TEAM','nan':'GP','nan':'G','nan':'A','nan':'PTS','nan':'+/-','nan':'PIM','nan':'PTS/G','nan':'SOG','nan':'PCT','nan':'GWG','nan':'PPG','nan':'PPA','nan':'SHG','nan':'SHA'}, inplace=True)
This is my code to rename the columns to match the table at http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2
I want both tables to have the same column names. I am using Python 2 in the Spyder IDE.
When I run the code above, it gives me this error:
AttributeError: 'list' object has no attribute 'rename'

The original question was posted a long time ago, but I just came across the same issue and found the solution here: pd.read_html() imports a list rather than a dataframe.
When you call pd.read_html you get a list of DataFrames, since the website may contain more than one table. Add one more line of code before you attempt the rename:
dfs = pd.read_html(url, header=0)
and then df = dfs[0]; the df variable is now a DataFrame, which lets you run the df.rename call from the original question.
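A minimal sketch of the full pattern, using the ESPN URL from the question (the rename mapping is abbreviated here for illustration):
import pandas as pd
url = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"
dfs = pd.read_html(url, header=0)  # returns a *list* of DataFrames, one per table on the page
df = dfs[0]  # select the table you want before calling any DataFrame method
df.rename(columns={'PP': 'PLAYER', 'SH': 'TEAM'}, inplace=True)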

This should also fix it, where df is your dataset (the list must have exactly one entry per column):
df.columns = ['a', 'b', 'c', 'd', 'e', 'f']

Related

AttributeError: 'DataFrame' object has no attribute 'dtype' error in pyspark

I have categoryDf, which is a Spark DataFrame, and it prints successfully:
categoryDf.limit(10).toPandas()
I want to join it to another Spark DataFrame, so I tried this:
df1=spark.read.parquet("D:\\source\\202204121920-seller_central_opportunity_explorer_niche_summary.parquet")
#df1.limit(5).toPandas()
df2=df1.join(categoryDf,df1["category_id"] == categoryDf["cat_id"])
df2.show()
When I use df2.show() I can see the output: the join is happening successfully. But when I change it to df2.limit(10).toPandas(), I see the error:
AttributeError: 'DataFrame' object has no attribute 'dtype' error in pyspark
I want to see how the data looks after the join, which is why I tried df2.limit(10).toPandas(). Is there any other way to inspect the data, given that the join itself succeeds?
My Python version is 3.7.7.
My Spark version is 2.4.4.
I faced the same problem; in my case it was because I had duplicate column names after the join.
I see you have report_date and marketplaceid in both dataframes. For each duplicated pair, you need to either drop one or both, or rename one of them.
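A minimal sketch of the rename approach, assuming report_date and marketplaceid are the duplicated names (withColumnRenamed is a standard PySpark DataFrame method):
# rename the duplicates on one side before the join, so toPandas() sees unique names
categoryDf2 = (categoryDf
               .withColumnRenamed("report_date", "cat_report_date")
               .withColumnRenamed("marketplaceid", "cat_marketplaceid"))
df2 = df1.join(categoryDf2, df1["category_id"] == categoryDf2["cat_id"])
df2.limit(10).toPandas()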

Getting attribute error: Series object has no attribute 'explode' [duplicate]

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 2 years ago.
I am trying to run a Python script in which I use explode(). Locally it works fine, but when I run it on the server it gives an error.
I am using the code below:
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order']).apply(lambda x: x.astype(str).str.split(',').explode()).reset_index())
Error I am getting:
("'Series' object has no attribute 'explode'", u'occurred at index comb_fld_order')
The problem is different pandas versions, because Series.explode works only in later versions:
New in version 0.25.0.
Try:
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order'])[col].str.split(',', expand=True).stack())
where col is the name of the string column which you wish to split and explode.
Generally, expand=True does the horizontal explode (one split part per column), while stack then moves everything into one column.
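For illustration, a minimal runnable sketch of the split/stack pattern on made-up data (the comb_fld_order column name is taken from the error message above):
import pandas as pd
df = pd.DataFrame({'rule_id': [1, 2], 'comb_fld_order': ['a,b', 'c']})
out = (df.set_index('rule_id')['comb_fld_order']
         .str.split(',', expand=True)   # one split part per column
         .stack()                       # stack the parts into a single column
         .reset_index(level=-1, drop=True)
         .reset_index(name='comb_fld_order'))
print(out)  # rule_id/value rows: (1, a), (1, b), (2, c)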
I used to below code to get rid off from explode():
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order'])['comb_fld_order']
.astype(str)
.str.split(',', expand=True)
.stack()
.reset_index(level=-1, drop=True)
.reset_index(name='comb_fld_order'))

'function' object has no attribute 'str' in pandas

I am using the code below to read a CSV file and split the path strings separated by /.
The data is:
SRC_PATH TGT_PATH
/users/sn/Retail /users/am/am
/users/sn/Retail Reports/abc /users/am/am
/users/sn/Automation /users/am/am
/users/sn/Nidh /users/am/xzy
import pandas as pd
df = pd.read_csv('E:\RCTemplate.csv',index_col=None, header=0)
s1 = df.SRC_PATH.str.split('/', expand=True)
I get the correct split data in s1, but when I do a similar operation on a single row it throws the error "'function' object has no attribute 'str'".
The error is thrown by the code below:
df2= [(df.SRC_PATH.iloc[0])]
df4=pd.DataFrame([(df.SRC_PATH.iloc[0])],columns = ['first'])
newvar = df4.first.str.split('/', expand=True)
Pandas thinks you are trying to access the method DataFrame.first().
This is why it is best practice to use square brackets to access dataframe columns rather than .column attribute access:
df4['first'].str.split('/', expand=True) instead of df4.first.str.split('/', expand=True)
Note that this causes common issues with things like a column called 'name' ending up shadowed by the name attribute of the dataframe, and a host of other problems.
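A minimal sketch of the fix, reusing df4 from the question:
import pandas as pd
df4 = pd.DataFrame(['/users/sn/Retail'], columns=['first'])
# df4.first resolves to the DataFrame.first() method, not the column, so .str fails;
# bracket indexing always returns the column
newvar = df4['first'].str.split('/', expand=True)
print(newvar)  # one row with the parts: '', 'users', 'sn', 'Retail'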

Problem with pandas 'to_csv' of 'DataFrameGroupBy' objects

I want to output a pandas groupby dataframe to CSV. I tried various Stack Overflow solutions, but they have not worked.
Python 3.7
This is my dataframe
This is my code
groups = clustering_df.groupby(clustering_df['Family Number'])
groups.apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv('grouped.csv')
Error Message
(AttributeError: Cannot access callable attribute 'to_csv' of 'DataFrameGroupBy' objects, try using the 'apply' method)
You just need to do this:
groups = clustering_df.groupby(clustering_df['Family Number'])
groups = groups.apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv('grouped.csv')
What you have done is not save the result of the groupby-apply. It would get applied and might print output depending on which IDE/notebook you use, but to save it to a file you have to apply the function on the groupby object, assign the result to a variable, and then write that variable to CSV.
Chaining works as well:
groups = clustering_df.groupby(clustering_df['Family Number']).apply(lambda clustering_df: clustering_df.sort_values(by=['Family Number']))
groups.to_csv("grouped.csv")
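For illustration, a minimal runnable sketch on made-up data (the Family Number column name is taken from the question):
import pandas as pd
clustering_df = pd.DataFrame({'Family Number': [2, 1, 1], 'Value': [30, 10, 20]})
groups = (clustering_df
          .groupby(clustering_df['Family Number'])
          .apply(lambda g: g.sort_values(by=['Family Number'])))
groups.to_csv('grouped.csv')  # works: groups is now a DataFrame, not a DataFrameGroupBy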

pd.merge throwing error while executing through .bat file

The Python script does not run when executed from a .bat file, but runs seamlessly in the editor.
The error relates to a datatype difference in the pd.merge call, although the datatype given to the columns is the same in both dataframes.
df2a["supply"] = df2a["supply"].astype(str)
df2["supply_typ"] = df2["supply_typ"].astype(str)
df2a["supply_typ"] = df2a["supply_typ"].astype(str)
df = (pd.merge(df2, df2a, how=join,
               on=['entity_id', 'pare', 'grome', 'buame', 'tame', 'prd', 'gsn',
                   'supply', 'supply_typ'],
               suffixes=['gs2', 'gs2x']))
While running the .bat file, I am getting the following error from pd.merge:
You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat
Not a direct answer, but it contains code that cannot be formatted in a comment and should be enough to solve the problem.
When pandas says that you are trying to merge on float64 and object columns, it is certainly right. It may not be evident because pandas relies on numpy, and a numpy object column can store any data.
I ended up with a simple function to diagnose all of those data type problems:
def show_types(df):
    for i, c in enumerate(df.columns):
        print(df[c].dtype, type(df.iat[0, i]))
It shows both the pandas dtype of each column of a dataframe and the actual type of the first element of each column. It can help to see the difference between columns containing str elements and others containing datetime.datetime ones, while the dtype is just object.
Use it on both of your dataframes, and the problem should become evident...
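A sketch of how the diagnosis and fix might look, assuming the mismatch turns out to be on the supply column (the column names are taken from the question; the actual offender may differ):
show_types(df2)   # e.g. float64 <class 'numpy.float64'> on one side...
show_types(df2a)  # ...and object <class 'str'> on the other
# once the offending column is identified, cast both sides to the same type before merging
df2["supply"] = df2["supply"].astype(str)
df2a["supply"] = df2a["supply"].astype(str)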
