Save a pandas dataframe - python

I have a dataframe that I have generated myself like:
I would like to save it with pickle (the only way I know of) using:
df.to_pickle(file_name)
and then load it again with:
df = pd.read_pickle(file_name)
The problem is that the resulting file cannot be read by the other program:
df = pd.read_pickle("dataframe.pkl")
print(df)
I'm getting an AttributeError:
AttributeError: 'DataFrame' object has no attribute '_data'
Thanks for your help.
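One likely cause, offered as an assumption because the question does not state the pandas versions involved, is that the pickle was written by a newer pandas than the one reading it. Pickle files are tied to the library's internals, so the two programs need compatible pandas versions. A minimal sketch of a version-independent alternative is to exchange the data as CSV instead. The frame contents and the temporary path below are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the dataframe from the question.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Compare this value in both programs; a mismatch can break read_pickle.
print(pd.__version__)

# CSV does not depend on pandas internals, so any pandas version can read it.
path = os.path.join(tempfile.mkdtemp(), "dataframe.csv")
df.to_csv(path, index=False)

restored = pd.read_csv(path)
print(restored)
```

CSV does not preserve every dtype or the index, so it is a trade-off: aligning the pandas versions in both environments keeps pickle usable.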

Related

'DataFrame' object has no attribute '_internal'

I am trying to run the line of code:
pd.get_dummies(pd_df, columns = ['ethnicity'])
However, I keep getting the error 'DataFrame' object has no attribute '_internal'. It looks like it's linked to the ...pyspark/pandas/namespace.py file, so I am not sure how to fix it.
Unfortunately, the dataframe itself is private, so I can't show or describe it on Stack Overflow, but any information about why this could be happening would be greatly appreciated!
The example below works perfectly, but the same call fails in my code. The only difference is that my DataFrame has been converted from PySpark to pandas:
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000, 49000, 55000, 67000, 65000, 67000],
    "region": ["East", "North", "East", "South", "West", "West", "South", "West", "West", "East", np.nan],
})
pd.get_dummies(sales_data, columns=['region'])
I had the same error. I was mixing up ps (pyspark.pandas) and pd (pandas).
Make sure your aliases are correct and that you are not accidentally binding the pandas alias to pyspark.pandas, for example:
import pyspark.pandas as pd
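To check which library a frame actually comes from, one option is to look at the module that defines its type. The sketch below uses plain pandas only, with a shortened, hypothetical version of the sales data. The name frame_module is introduced here for illustration.

```python
import numpy as np
import pandas as pd  # plain pandas; pyspark.pandas is conventionally aliased as ps

# Shortened, hypothetical version of the sales data from the question.
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia"],
    "sales": [50000, 52000, 90000],
    "region": ["East", "North", np.nan],
})

# The defining module shows which library the frame belongs to.
frame_module = type(sales_data).__module__
print(frame_module)

# pandas.get_dummies expects a plain pandas object.
dummies = pd.get_dummies(sales_data, columns=["region"])
print(dummies.columns.tolist())
```

If the printed module starts with pyspark rather than pandas, the frame and the get_dummies function come from different libraries, which is the kind of mismatch this answer describes.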

'list' object has no attribute 'to_csv'

I have a dataframe that I want to group by the region column. The grouping itself works fine.
This is what I am doing:
import pandas as pd
#df=pd.read_csv(r'C:\Users\mobeen\Downloads\pminus.csv')
df=pd.read_csv(r'C:\Users\final.csv')
print(df)
df1=[v for k, v in df.groupby('region')]
df1
df1.to_csv('filename2',na_rep='Nan',index=False)
but when I then try to write the output to a CSV file, it throws the following error:
AttributeError: 'list' object has no attribute 'to_csv'
How can I write it into csv?
I already checked this but it is not working
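You are collecting the subgroups into a list, and a list has no to_csv method. You can loop over the list and write each group to the CSV file in append mode:
dfs = [v for k, v in df.groupby('region')]
for group in dfs:
    # each call writes the header row again unless header=False is passed
    group.to_csv('filename2', mode='a', na_rep='Nan', index=False)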
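As a variation on the loop above, each group can be written to its own file, which avoids repeated header rows in a single appended file. This is a sketch under assumptions: the sample data, the temporary output directory, and the region_<name>.csv naming scheme are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for final.csv.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "sales": [10, 20, 30, None],
})

out_dir = tempfile.mkdtemp()
written = []

# groupby yields (key, sub-frame) pairs; each sub-frame has its own to_csv.
for region, group in df.groupby("region"):
    path = os.path.join(out_dir, f"region_{region}.csv")
    group.to_csv(path, na_rep="NaN", index=False)
    written.append(path)

print(written)
```

Each resulting file can then be read back on its own with pd.read_csv.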
You are calling to_csv from the module, as in pd.to_csv(df1,'filename2',na_rep='Nan',index=False). Call df1.to_csv('filename2',na_rep='Nan',index=False) instead, as to_csv is a method of the dataframe itself.
Although your code does not show you calling to_csv(), the error suggests that you're calling the function directly on the pandas module, i.e. pd.to_csv(). You should call the function on the dataframe, like so: df.to_csv(filename).

TypeError: 'module' object is not callable for time on Koalas dataframe

I am facing a small issue with a line of code that I am converting from pandas to Koalas.
Note: I am running my code in Databricks.
The following line is pandas code:
input_data['t_avail'] = np.where(input_data['purchase_time'] != time(0, 0), 1, 0)
I did the conversion to Koalas as follows. Note that input_data is already defined as a Koalas dataframe before this line of code.
# Add a new column called 't_avail' in input_data Koalas dataframe
input_data = input_data.assign(
    t_avail = (input_data['purchase_time'] != time(0, 0))
)
I get the following error with the Koalas conversion: TypeError: 'module' object is not callable
I am not sure what the issue with the time module is. I just want to fill the t_avail column based on whether each entry in the purchase_time column holds a non-empty time.
Could someone help me resolve the issue? I think I am missing something silly.
Thank you to all.
As you say, you import the time module in your code.
The error occurs because you write time(0, 0): time is a module, and you are calling it as if it were a function.
You can use this instead:
input_data = input_data.assign(
    t_avail = ((input_data['purchase_time']).str.strip() != "")
)
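If time(0, 0) in the original pandas line was meant to be datetime.time (an assumption, since the question does not show its imports), importing the class rather than the time module avoids the TypeError. The sketch below uses plain pandas with hypothetical sample values, because Koalas needs a Spark environment to run.

```python
from datetime import time  # the class datetime.time, not the stdlib time module

import numpy as np
import pandas as pd

# Hypothetical sample data: midnight stands for "no purchase time recorded".
input_data = pd.DataFrame(
    {"purchase_time": [time(0, 0), time(9, 30), time(0, 0)]}
)

# 1 where a real purchase time is present, 0 where it is midnight.
input_data["t_avail"] = np.where(
    input_data["purchase_time"] != time(0, 0), 1, 0
)
print(input_data)
```

With `import time` instead, the name time refers to the module, and calling it raises the 'module' object is not callable error from the question.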

AttributeError: 'DataFrame' object has no attribute 'save'

I'm trying to save a pandas DataFrame in a binary data format. My book says that all pandas objects have a save method that writes the data to disk as a pickle, but when I run the code I get an error. Is there a save method for pandas objects in newer versions of pandas? I'm using pandas 0.25.3.
import pandas as pd
frame = pd.read_csv('PandasTest.csv')
frame.save('PandasTest_Pickle')
The error is:
AttributeError: 'DataFrame' object has no attribute 'save'
As others suggested in the comments, use the to_pickle and read_pickle methods. For example:
import pandas as pd
frame=pd.read_csv('data.csv')
frame.to_pickle('frame_pickle')
pd.read_pickle('frame_pickle')
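A self-contained round trip can confirm that the pickled frame comes back unchanged. This is a minimal sketch: the frame contents and the temporary file location are hypothetical, chosen so the example runs without PandasTest.csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data in place of the CSV file from the question.
frame = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.75, 1.0]})

# Write the frame to a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "frame_pickle")
frame.to_pickle(path)

# Read it back and compare it with the original.
restored = pd.read_pickle(path)
print(restored.equals(frame))
```

The comparison holds when the same pandas installation writes and reads the file. Reading a pickle with a different pandas version is not guaranteed to work.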

Reading an excel data set saved as CSV file in pandas

There is a very similar question to the one I am about to ask posted here:
Reading an Excel file in python using pandas
However, when I attempt to use the solutions posted there, I get:
AttributeError: 'DataFrame' object has no attribute 'read'
All I want to do is convert this Excel sheet into a pandas DataFrame so that I can perform data analysis on some of the subjects in my table. I am very new to this, so any information, advice, or feedback would be greatly appreciated.
Here's my code:
import pandas
file = pandas.read_csv('FILENAME.csv', 'rb')
# reads specified file name from my computer in Pandas format
print file.read()
By the way, I also tried running the same query with
file = pandas.read_excel('FILENAME.csv', 'rb') returning the same error.
Finally, when I try to resave the file as a .xlsx I am unable to open the document.
Cheers!
read_csv() returns a dataframe by itself, so there is no need to convert it. Just assign the result to a variable.
I think this should work:
import pandas as pd  # importing the package under a short alias is the usual convention and makes it easier to reference later
file = pd.read_csv('FILENAME.csv')
print(file)
Your error message means exactly what it says: AttributeError: 'DataFrame' object has no attribute 'read'
When you use pandas.read_csv, you are already reading the CSV file into a dataframe. By the way, you don't need the 'rb' argument:
df = pandas.read_csv('FILENAME.csv')
You can call print(df), but you cannot call print(df.read()), because the dataframe object doesn't have a .read() method. This is what's causing your error.
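The point that read_csv already returns a DataFrame can be checked with a small in-memory example. This is a sketch with hypothetical CSV content, using io.StringIO in place of FILENAME.csv so that it runs anywhere.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for FILENAME.csv.
csv_text = "subject,score\nmath,90\nart,85\n"

# read_csv accepts a path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
print(type(df).__name__)
print(df.head())

# The result is ready for analysis; there is no further read() step.
mean_score = df["score"].mean()
print(mean_score)
```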
