I have created a DataFrame df by merging two lists using the following command:
import pandas as pd
df=pd.DataFrame({'Name' : list1,'Probability' : list2})
But I'd like to remove the first column (the index column) and make the column called Name the first column. I tried del df['index'] and index_col=0, but neither worked. I also looked at reset_index(), and that is not what I need. I would like to completely remove the index column from a DataFrame that has been created like this. Someone please help!
You can use set_index (see the docs):
import pandas as pd
list1 = [1,2]
list2 = [2,5]
df=pd.DataFrame({'Name' : list1,'Probability' : list2})
print (df)
   Name  Probability
0     1            2
1     2            5
df.set_index('Name', inplace=True)
print (df)
      Probability
Name
1               2
2               5
If you also need to remove the index name:
df.set_index('Name', inplace=True)
#pandas 0.18.0 and higher
df = df.rename_axis(None)
#pandas below 0.18.0
#df.index.name = None
print (df)
   Probability
1            2
2            5
If you want to save your dataframe to a spreadsheet for a report, it is possible to format the DataFrame to eliminate the index column using xlsxwriter:
writer = pd.ExcelWriter("Probability.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Probability', startrow=3, startcol=0, index=False)
writer.close()  # writer.save() on older pandas versions
index=False will then save your dataframe without the index column.
I use this all the time when building reports from my dataframes.
I think the best way is to hide the index using the hide_index method
df = df.style.hide_index()
this will hide the index from the dataframe.
I'm working with a CSV file in PyCharm and want to delete the automatically generated index column. When I print the DataFrame, however, the answer I get in the terminal is "None". All the answers from other users indicate that the reset_index method should work.
If I just write df = df.reset_index(drop=True), it does not delete the column either.
import pandas as pd
df = pd.read_csv("music.csv")
df['id'] = df.index + 1
cols = list(df.columns.values)
df = df[[cols[-1]]+cols[:3]]
df = df.reset_index(drop=True, inplace=True)
print(df)
I agree with @It_is_Chris. Also, this assignment is wrong, because reset_index called with inplace=True returns None:
df = df.reset_index(drop=True, inplace=True)
It should instead be:
df.reset_index(drop=True, inplace=True)
or
df = df.reset_index(drop=True)
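To see why: any pandas method called with inplace=True mutates the object and returns None, so assigning the result back discards the DataFrame. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20]}, index=[5, 7])

# inplace=True mutates df and returns None
result = df.reset_index(drop=True, inplace=True)
print(result)              # None
print(df.index.tolist())   # [0, 1]
```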
Since you said you're trying to "delete the automatically-generated index column" I could think of two solutions!
First solution:
Assign a column of your dataset as the DataFrame's index. If your dataset already contains an index/numbering column, you could do something like this:
#assuming your first column in the dataset is your index column which has the index number of zero
df = pd.read_csv("yourfile.csv", index_col=0)
#you won't see the automatically-generated index column anymore
df.head()
Second solution:
You could delete it in the final csv:
#To export your df to a csv without the automatically-generated index column
df.to_csv("yourfile.csv", index=False)
I am trying to drop all rows from a dataframe where any entry in any column of the row has the value zero.
A minimal working example is below.
import pandas as pd
df = pd.read_excel('trial.xlsx',sheet_name=None)
df
I am getting the dataframe as follows
OrderedDict([('Sheet1',   type  query  answers
0  abc    100       90
1  def      0        0
2  ghi      0        0
3  jkl      5        1
4  mno      1        1)])
I am trying to remove the rows using dropna() with the following code:
df = df.dropna()
df
I am getting an error saying 'collections.OrderedDict' object has no attribute 'dropna'. I tried going through the various answers provided here and here, but the error remains.
Any help would be greatly appreciated!
The reason you are getting an OrderedDict object is that you are passing sheet_name=None to the read_excel method. This loads all the sheets into a dictionary of DataFrames.
If you only need the one sheet, specify it in the sheet_name parameter, otherwise remove it to read the first sheet.
import pandas as pd
df = pd.read_excel('trial.xlsx') #without sheet_name will read first sheet
print(type(df))
df = df.dropna()
or
import pandas as pd
df = pd.read_excel('trial.xlsx', sheet_name='Sheet1') #reads specific sheet
print(type(df))
df = df.dropna()
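As an aside, since the stated goal was dropping rows where any column is zero, note that dropna only removes missing (NaN) values. Once df is a real DataFrame, a boolean mask does the zero filtering; a sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'type': ['abc', 'def', 'ghi', 'jkl', 'mno'],
                   'query': [100, 0, 0, 5, 1],
                   'answers': [90, 0, 0, 1, 1]})

# keep only rows where every numeric column is non-zero
numeric = df.select_dtypes('number')
df = df[(numeric != 0).all(axis=1)]
print(df)
```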
I am doing some analysis on several different categories and want all of it on the same tab in a spreadsheet. So I have two DataFrames, but their columns and contents are different.
dataframe 1
colA colB colC
row 1
row 2
row 3
dataframe 2
colD colE colF
row 1
row 2
row 3
I want to export both of these dataframes to one Excel sheet, one after the other. The analyses are different lengths, and I want dataframe 2 to sit right below dataframe 1 on the sheet.
import pandas as pd
from openpyxl import load_workbook

book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book  # on pandas >= 1.5, open with ExcelWriter(..., mode='a') instead
writer.sheets = {ws.title: ws for ws in book.worksheets}
df1.to_excel(writer, sheet_name='Sheet1',
             startrow=writer.sheets['Sheet1'].max_row,
             index=False, header=False)
writer.close()
# then do the same steps for any additional dataframes
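Alternatively, when both frames can be written in the same pass, a single ExcelWriter with startrow offsets avoids reopening the workbook; a minimal sketch (the file name, sheet name, and sample data are assumptions):

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [1, 2, 3], 'colB': [4, 5, 6], 'colC': [7, 8, 9]})
df2 = pd.DataFrame({'colD': [1, 2, 3], 'colE': [4, 5, 6], 'colF': [7, 8, 9]})

with pd.ExcelWriter('test.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    # place df2 one blank row below df1 (header row + len(df1) + 1 blank)
    df2.to_excel(writer, sheet_name='Sheet1',
                 startrow=len(df1) + 2, index=False)
```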
You can add an extra row to the second DataFrame with the values same as the column names. And then simply use pd.concat()
df2.columns = df1.columns
pd.concat([df1, df2])
First make the columns of both the dataframes to be the same and then use pd.concat to append df2 to the end of df1
You can create a new dataframe from this and export it to CSV:
df = pd.concat([df1,df2])
df.to_csv('filename.csv')
If you also want the header of the second dataframe in your final CSV file, create df2 like this: df2 = pd.read_csv('df2.csv', names=df1.columns)
import numpy as np

df1 = pd.DataFrame(np.vstack([df1.columns, df1]))
# this converts the column names into the first data row
df2 = pd.DataFrame(np.vstack([df2.columns, df2]))
# same with the other dataframe
# concat these dataframes and save as Excel without index or header
pd.concat((df1, df2)).to_excel('filename.xlsx', header=False, index=False)
How do I set my indexes from "Unnamed" to the first line of my dataframe in Python?
import pandas as pd
df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skipfooter=31)
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
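Put together, the two steps above look like this on a toy frame with "Unnamed" headers (the sample data is made up for illustration):

```python
import pandas as pd

# toy frame standing in for the mis-parsed Excel sheet
df = pd.DataFrame([['Name', 'Probability'], [1, 2], [2, 5]],
                  columns=['Unnamed: 0', 'Unnamed: 1'])

df.columns = df.loc[0, :].values   # promote the first row to column names
df.drop(0, inplace=True)           # drop that row from the data
df.reset_index(drop=True, inplace=True)
print(df.columns.tolist())         # ['Name', 'Probability']
```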
I am encountering pretty strange behavior. If I let
dict = {'newcol':[1,5], 'othercol':[12,-10]}
df = pandas.DataFrame(data=dict)
print df['newcol']
I get back a pandas Series object with 1 and 5 in it. Great.
print df
I get back the DataFrame as I would expect. Cool.
But what if I want to add to a DataFrame a little at a time? (My use case is saving metrics for machine learner training runs happening in parallel, where each process gets a number and then adds to only that row of the DataFrame.)
I can do the following:
df = pandas.DataFrame()
df['newcol'] = pandas.Series()
df['othercol'] = pandas.Series()
df['newcol'].loc[0] = 1
df['newcol'].loc[1] = 5
df['othercol'].loc[0] = 12
df['othercol'].loc[1] = -10
print df['newcol']
I get back the pandas Series I would expect, identical to creating the DataFrame by the first method.
print df
I see printed that df is an Empty DataFrame with columns [newcol, othercol].
Clearly in the second method the DataFrame's contents are equivalent to the first method. So why is it not smart enough to know it is filled? Is there a function I can call to update the DataFrame's knowledge of its own Series so all these (possibly out-of-order) Series can be unified in to a consistent DataFrame?
You can assign data to an empty DataFrame as follows:
df = pd.DataFrame()
df['newcol'] = pd.Series()
df['othercol'] = pd.Series()
df.loc[0, 'newcol'] = 1
df.loc[1, 'newcol'] = 5
df.loc[0, 'othercol'] = 12
df.loc[1, 'othercol'] = -10
print(df)
   newcol  othercol
0     1.0      12.0
1     5.0     -10.0
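The reason the original version stayed empty is chained indexing: df['newcol'].loc[0] = 1 assigns into an intermediate Series rather than the frame itself, while df.loc[0, 'newcol'] enlarges the frame in place. A runnable sketch of the working pattern, with rows arriving out of order as in the parallel-training use case:

```python
import pandas as pd

df = pd.DataFrame()
df['newcol'] = pd.Series(dtype=float)
df['othercol'] = pd.Series(dtype=float)

# df.loc[row, col] enlarges the frame in place; rows may arrive in any order
df.loc[1, 'newcol'] = 5
df.loc[0, 'newcol'] = 1
df.loc[0, 'othercol'] = 12
df.loc[1, 'othercol'] = -10

print(df.sort_index())
```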