Removing index column in pandas when reading a csv - python

I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.read_csv('Efficiency_Data.csv', header=0, index_col=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print(efficiency)
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "

When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. For example:
To write:
df.to_csv(filename, index=False)
and to read from the CSV:
pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
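A minimal round-trip sketch of the above (column names made up for illustration; a StringIO buffer stands in for the file):

```python
import io
import pandas as pd

df = pd.DataFrame({"Energy": [1, 2, 3], "Efficiency": [0.9, 0.8, 0.7]})

# Write without the index, then read back from the same buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

print(df2.columns.tolist())  # ['Energy', 'Efficiency']
```

No stray "Unnamed: 0" column appears on the way back in.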

df.reset_index(drop=True, inplace=True)

DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index().
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
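A short sketch of the reset_index behaviour described above (index values made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Efficiency": [0.9, 0.8]}, index=[10, 20])
print(df.index.tolist())  # [10, 20]

# drop=True discards the old index instead of inserting it as a column.
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1]
```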

You can set one of the columns as an index in case it is an "id" for example.
In this case the index column will be replaced by one of the columns you have chosen.
df.set_index('id', inplace=True)

If your problem is the same as mine, where you just want to reset the column headers to 0 through the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
Not a good idea if you have heterogeneous data types, since DataFrame.values coerces everything to a single dtype. Better to just use
df.columns = range(len(df.columns))

You can specify which column to use as the index in your CSV file with the index_col parameter of the read_csv function.
If this doesn't solve your problem, please provide an example of your data.

One thing that I do is df = df.reset_index()
then df = df.drop(['index'], axis=1)

To avoid creating the default index column when reading, you can set index_col to False and keep the header at zero. Here is an example of how you can do it.
recording = pd.read_excel("file.xls",
                          sheet_name="sheet1",
                          header=0,
                          index_col=False)
header=0 makes the first row of the file the column headers, so you can use those names later for calling the columns.

It works for me this way:
df = data.set_index("name of the column header to use as the index column")

Related

How do I split rows in DataFrame?

I want to split the rows while maintaining the values.
How can I split the rows like that?
The data frame below is an example, followed by the output that I want to see.
You can use pd.melt(). Read the documentation for more information: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I tried working on your problem.
import pandas as pd
melted_df = data.melt(id_vars=['value'], var_name="ToBeDropped", value_name="ID1")
This would show a warning because of the ambiguity of the string passed for the value_name argument. It also creates a new column, which I have already named 'ToBeDropped'. The code below will remove that column for you.
df = melted_df.drop(columns = ['ToBeDropped'])
'df' will be your desired output.
via wide_to_long:
df = pd.wide_to_long(df, stubnames='ID', i='value',
                     j='ID_number').reset_index(0)
via set_index and stack:
df = df.set_index('value').stack().reset_index(name='IDs').drop('level_1', axis=1)
via melt:
df = df.melt(id_vars='value', value_name="ID1").drop('variable', axis=1)
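A runnable sketch of the melt variant, on a small made-up frame with the column names assumed from the question (value, ID1, ID2):

```python
import pandas as pd

# Hypothetical wide frame: one 'value' key with two ID columns.
df = pd.DataFrame({"value": ["a", "b"], "ID1": [1, 3], "ID2": [2, 4]})

# melt stacks ID1/ID2 into rows; the 'variable' label column is then dropped.
long_df = df.melt(id_vars="value", value_name="IDs").drop(columns="variable")
print(long_df)
#   value  IDs
# 0     a    1
# 1     b    3
# 2     a    2
# 3     b    4
```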

Pandas DF NotImplementedError: Writing to Excel with MultiIndex columns and no index ('index'=False) is not yet implemented

I have these lines of code reading and writing an excel:
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=False)
When it tries to write the excel I get the following error:
NotImplementedError: Writing to Excel with MultiIndex columns and no index ('index'=False) is not yet implemented
I have no idea why this is happening, and I cannot find a concrete answer online.
Please help.
You can simply set index=True instead of False.
That is because you have a MultiIndex in your dataframe's columns.
You can either reset_index() or drop your level=1 index.
If you don't want the index in the output, you can first insert it as a regular column:
df.insert(0, 'index', df.index)  # something like this
Multi-index columns can actually be exported to Excel. You just have to set index=True.
So for the example the solution becomes...
df = pd.read_excel(file_path, sheet_name, header=[0, 1])
df.to_excel(output_path, index=True)
NB. This is true as of Pandas version 1.2.0
In older pandas versions, multi-index columns could not be exported to Excel with index=False. A workaround is to transform the multi-index into ordinary columns, and then export it to Excel.
df = df.reset_index()
df.to_excel('file_name.xlsx', index=False)
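Another workaround, if you do need index=False on an older pandas version, is to flatten the MultiIndex columns into single strings before exporting (a sketch; the "_" separator and column names are arbitrary choices):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  columns=pd.MultiIndex.from_tuples([("A", "x"), ("A", "y")]))

# Join each column tuple into a single string, e.g. ('A', 'x') -> 'A_x'.
df.columns = ["_".join(col) for col in df.columns]
print(df.columns.tolist())  # ['A_x', 'A_y']
# df.to_excel("out.xlsx", index=False)  # now exports without the MultiIndex error
```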

How to drop the first row number column pandas?

This question may sound similar to other questions posted, but I'm posting this after searching long for this exact solution.
So, I have a JSON from which I'm creating a pandas dataframe:
from pandas import json_normalize

col_list = ["allocation", "completion_date", "has_expanded_access"]
final_data = dict((k, d[k]) for k in col_list if k in d)
a = json_normalize(final_data)
And then this:
I tried saving with:
df = df.reset_index(drop=True)
And
df = df.rename_axis(None)
As suggested in a few answers, but to no avail: when I try to save it, this default first column containing the row index comes out with a blank (null) header, and even if I try to drop it, it doesn't work. Any help?
Try
df.to_csv('df_name.csv', sep = ';', encoding = 'cp1251', index = False)
to save df without indices.
Or set one of the columns as the index with
df = df.set_index('col_name')
If you want to save the dataframe as csv file then you can do this:
df.to_csv(filename, index=False)

Remove all rows in .csv except first that feature a duplicate cell in column

Due to a regex error I have many rows in a .csv file that are the same but with slightly different formatting; the URL is always the common variable. I need to find all duplicates of the URL in the column "tx" and delete all but the first one.
The .csv is ~50k rows. System is Windows.
What I tried:
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("dupes.csv")
# dropping ALL duplicte values
df = data.drop_duplicates(subset ="TxHash\tx", keep = "first", inplace = True)
data.to_csv('nodupes.csv', index=False)
All columns have \t at the end, and I'm unsure how to get rid of them; I have also tried numerous variations, including setting new headers with pandas. I have tried many solutions, but most result in this error:
raise KeyError(diff)
KeyError: Index(['TxHash\t'], dtype='object')
The default separator in read_csv is ',', so for a tab-separated file it is necessary to add sep='\t'. Also, an inplace operation returns None, so either remove inplace=True or don't assign the result back. Two possible solutions:
data = pd.read_csv("dupes.csv", sep='\t')
df = data.drop_duplicates(subset ="TxHash")
print (df)
data.drop_duplicates(subset ="TxHash", inplace=True)
print (data)
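A self-contained sketch of the fix above (the tab-separated sample data is made up; a StringIO buffer stands in for dupes.csv):

```python
import io
import pandas as pd

# Hypothetical tab-separated file with duplicate TxHash values.
raw = "TxHash\turl\nabc\thttp://x\nabc\thttp://x\ndef\thttp://y\n"
data = pd.read_csv(io.StringIO(raw), sep="\t")

# keep="first" (the default) retains only the first occurrence of each hash.
deduped = data.drop_duplicates(subset="TxHash")
print(deduped["TxHash"].tolist())  # ['abc', 'def']
```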

Need help to solve the Unnamed and to change it in dataframe in pandas

How do I set my column headers from "Unnamed" to the first line of my dataframe in Python?
import pandas as pd
df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skipfooter=31)
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
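A minimal sketch of the column-promotion approach above (the frame and its contents are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose real headers landed in row 0.
df = pd.DataFrame([["Name", "Qty"], ["apples", 3], ["pears", 5]])

df.columns = df.loc[0, :].values   # promote row 0 to column headers
df = df.drop(0).reset_index(drop=True)  # then drop that row
print(df.columns.tolist())  # ['Name', 'Qty']
```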
