Save pandas dataframe to file including index [duplicate] - python

This question already has answers here:
How to reversibly store and load a Pandas dataframe to/from disk
(13 answers)
Saving and Loading of dataframe to csv results in Unnamed columns
(4 answers)
Closed 6 months ago.
Which file format can be used to save a Pandas DataFrame object and then load it back with the proper index? That is, if column blah was the index before saving it to the file, I want blah to be the index again after loading, without having to tell Pandas so.

df.to_pickle('file.pickle')
df = pd.read_pickle('file.pickle')
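
As a quick check of the pickle round trip, here is a minimal sketch. The sample data, the temporary directory and the CSV comparison are assumptions added for illustration; only the column name blah and the two pickle calls come from the question and answer.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; "blah" is the index column named in the question.
df = pd.DataFrame({"blah": [10, 20, 30], "value": ["a", "b", "c"]}).set_index("blah")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Pickle stores the whole object, index included.
    pickle_path = os.path.join(tmp_dir, "file.pickle")
    df.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

    # For comparison: CSV is plain text, so the index comes back as a column.
    csv_path = os.path.join(tmp_dir, "file.csv")
    df.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path)

print(restored.index.name)     # index name after the pickle round trip
print(list(from_csv.columns))  # columns after the CSV round trip
```

With CSV you would have to pass index_col='blah' to read_csv to get the index back, which is exactly the extra step the question wants to avoid.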

Related

Python Pandas .str.split() creates an extra column that can't be dropped [duplicate]

This question already has answers here:
How to avoid pandas creating an index in a saved csv
(6 answers)
Closed 5 months ago.
I'm using the pandas split function to create new columns from an existing one. All of that works fine and I get my expected columns created. The issue is that it is creating an additional column in the exported csv file. So, it says there are 3 columns when there are actually 4.
I've tried various functions to drop that column, but it isn't recognized as part of the data frame so it can't be successfully removed.
Hopefully someone has had this issue and can offer a possible solution.
[example of the csv data frame output with the unnecessary column added]
Column A doesn't come from split; it is the index of your dataframe, which to_csv writes out by default. You can turn that off by setting index=False in df.to_csv:
df.to_csv('{PATH}.csv', index=False)
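
To see where the extra column comes from, here is a small sketch. The column names and sample values are made up for illustration; it just compares the header line that to_csv produces with and without the index.

```python
import pandas as pd

# Hypothetical data: one column split into two new ones.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})
df[["first", "last"]] = df["full_name"].str.split(" ", expand=True)

# to_csv returns the CSV text when no path is given.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The default output has an extra leading field for the index.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

The leading empty header field in the first output is the index, which is why it can't be dropped like a regular column.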

Problem using Pandas for joining dataframes [duplicate]

This question already has answers here:
Import multiple CSV files into pandas and concatenate into one DataFrame
(20 answers)
How do I combine two dataframes?
(8 answers)
Closed 8 months ago.
I am trying to join a lot of CSV files into a single dataframe after doing some conversions and filters. When I use the append method for the sn2 dataframe, the exported CSV contains all the data I want. However, when I use pd.concat for the sn3 dataframe, only the data from the last CSV is exported. What am I missing?
sn2=pd.DataFrame()
sn3=pd.DataFrame()
files=os.listdir(load_path)
for file in files:
    df_temp=pd.read_csv(load_path+file)
    df_temp['Date']=file.split('.')[0]
    df_temp['Date']=pd.to_datetime(df_temp['Date'],format='%Y%m%d%H%M')
    filter1=df_temp['Name']=='Atribute1'
    temp1=df_temp[filter1]
    sn2=sn2.append(temp1)
    filter2=df_temp['Name']=='Atribute2'
    temp2=df_temp[filter2]
    sn3=pd.concat([temp2])
You have to pass all the dataframes that you want to concatenate to concat:
sn3 = pd.concat([sn3, temp2])
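
An alternative sketch, not from the original answer: collect the filtered pieces in plain Python lists and call pd.concat once after the loop. This also avoids DataFrame.append, which was removed in pandas 2.0. The in-memory frames below are hypothetical stand-ins for the CSV files, and the names sn2_parts and sn3_parts are made up for the example.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from each CSV file.
frames_by_file = [
    pd.DataFrame({"Name": ["Atribute1", "Atribute2"], "Value": [1, 2]}),
    pd.DataFrame({"Name": ["Atribute2", "Atribute1"], "Value": [3, 4]}),
]

sn2_parts = []
sn3_parts = []
for df_temp in frames_by_file:
    # Keep each filtered piece; nothing is overwritten between iterations.
    sn2_parts.append(df_temp[df_temp["Name"] == "Atribute1"])
    sn3_parts.append(df_temp[df_temp["Name"] == "Atribute2"])

# Concatenate once at the end instead of growing a frame inside the loop.
sn2 = pd.concat(sn2_parts, ignore_index=True)
sn3 = pd.concat(sn3_parts, ignore_index=True)
```

Concatenating once is usually cheaper than rebuilding the dataframe on every iteration.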

How do you remove every second row in a pandas dataframe? [duplicate]

This question already has answers here:
pandas read_csv remove blank rows
(4 answers)
Closed 1 year ago.
I’ve read a file into a dataframe, and every second row is n/a. How do I remove the offending blank rows?
There are probably many ways to do this, but I just use iloc, which keeps every other row starting from the first:
df = df.iloc[::2,:]
Try it and let me know if it worked for you.
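
If the blank rows are not guaranteed to sit at every second position, a sketch using dropna may be safer. The sample data below is an assumption; how='all' drops only rows where every column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data where every second row is entirely missing.
df = pd.DataFrame(
    {
        "a": [1, np.nan, 2, np.nan],
        "b": ["x", np.nan, "y", np.nan],
    }
)

# Drop rows in which all values are NaN, then renumber the index.
cleaned = df.dropna(how="all").reset_index(drop=True)
print(cleaned)
```

Unlike the iloc slice, this does not depend on where the blank rows are.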

How to append pandas dataframe with similar names in a loop? [duplicate]

This question already has answers here:
Using a loop in Python to name variables [duplicate]
(5 answers)
Append multiple pandas data frames at once
(5 answers)
Closed 5 years ago.
I have pandas data frames named x1, x2, ..., x100, all with the same columns.
I want to append them all using a for loop. How can I do that?
I know how to append two dataframes, but how do I do it for 100 of them? The main problem here is how to get a dynamic variable name.
I want to append the data frames, not concat.
x=x1.append(x2)
x=x.append(x3)
and so on.
I want to do this in a loop.
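
One possible approach, sketched under assumptions: instead of 100 separate variables, keep the frames in a dict keyed by name and look them up in a loop. The dict frames and the sample columns are hypothetical. Note that DataFrame.append was removed in pandas 2.0, so this sketch uses pd.concat, which stacks the rows the same way.

```python
import pandas as pd

# Hypothetical: build the frames into a dict instead of variables x1..x100.
frames = {
    f"x{i}": pd.DataFrame({"id": [i], "value": [i * 10]})
    for i in range(1, 101)
}

# Look the frames up by name in order, then stack them row-wise.
ordered = [frames[f"x{i}"] for i in range(1, 101)]
x = pd.concat(ordered, ignore_index=True)
print(x.shape)
```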

Not able to view all columns in Pandas Data frame [duplicate]

This question already has answers here:
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 4 years ago.
I am trying to output all columns of a data frame.
Here is the code below:
df_advertiser_activity_part_qa = df_advertiser_activity_part.loc[(df_advertiser_activity_part['advertiser_id']==209988 )]
df_advertiser_activity_part_qa = df_advertiser_activity_part_qa.sort_values(by='date_each_day_et')
df_advertiser_activity_part_qa
When I output the data frame, not all columns get displayed. It has 21 columns, and between some columns there are just three dots "...". I am using IPython notebook. Is there a way to avoid this?
Try:
pandas.set_option('display.max_columns', None)
Depending on how many columns you have, though, this may not be a good idea. The data is being abbreviated because you have too many columns to fit practically on the screen.
You might be better off saving to a .csv to inspect the data.
df.to_csv('myfile.csv')
or if you have lots of rows:
df.head(1000).to_csv('myfile.csv')
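
If you only need the full view once, a sketch with pd.option_context may be handy: it changes the display option inside the with block and restores the previous setting afterwards. The 30-column frame below is made up for the example.

```python
import pandas as pd

# Hypothetical wide frame with more columns than usually fit on screen.
df = pd.DataFrame({f"col_{i}": [i, i + 1] for i in range(30)})

# Temporarily lift the column limit; the old setting is restored on exit.
with pd.option_context("display.max_columns", None):
    full_text = repr(df)

print(full_text)
```

This leaves the global display settings untouched for the rest of the notebook.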
