Dropping index in DataFrame for CSV file - python

I am working with a CSV file in PyCharm and want to delete the automatically generated index column. When I print the DataFrame, however, the output I get in the terminal is "None". All the answers from other users indicate that the reset_index method should work.
If I just write "df = df.reset_index(drop=True)", it does not delete the column either.
import pandas as pd
df = pd.read_csv("music.csv")
df['id'] = df.index + 1
cols = list(df.columns.values)
df = df[[cols[-1]]+cols[:3]]
df = df.reset_index(drop=True, inplace=True)
print(df)

I agree with #It_is_Chris. Also,
This is incorrect, because with inplace=True the call returns None:
df = df.reset_index(drop=True, inplace=True)
It should be like this:
df.reset_index(drop=True, inplace=True)
or
df = df.reset_index(drop=True)
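To see why the print shows None, here is a minimal sketch using a small made-up DataFrame (the column name and values are placeholders, not the contents of music.csv):

```python
import pandas as pd

# Hypothetical stand-in for the data read from music.csv
df = pd.DataFrame({"title": ["a", "b", "c"]}, index=[10, 20, 30])

# With inplace=True the frame is modified and the call itself returns None
result = df.reset_index(drop=True, inplace=True)
print(result)          # None
print(list(df.index))  # [0, 1, 2]

# Without inplace, a new frame is returned and must be assigned
df2 = df.reset_index(drop=True)
```

Note that neither form removes the index: a DataFrame always has one, and reset_index(drop=True) only replaces it with the default 0, 1, 2, ... numbering.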

Since you said you're trying to "delete the automatically-generated index column", I can think of two solutions.
First solution:
Use an existing column of your dataset as the index. If your dataset has already been indexed/numbered, you could do something like this:
#assuming your first column in the dataset is your index column which has the index number of zero
df = pd.read_csv("yourfile.csv", index_col=0)
#you won't see the automatically-generated index column anymore
df.head()
Second solution:
You could delete it in the final csv:
#To export your df to a csv without the automatically-generated index column
df.to_csv("yourfile.csv", index=False)
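Both solutions can be tried without touching a real file. The following is only a sketch: the CSV text and the column names id and title are invented, and io.StringIO stands in for "yourfile.csv".

```python
import io
import pandas as pd

# Hypothetical CSV whose first column already holds row numbers
csv_text = "id,title\n1,Song A\n2,Song B\n"

# First solution: use the first column as the index instead of a generated one
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Second solution: write the CSV without the index column
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())

# For display only, the index can also be hidden when printing
print(df.to_string(index=False))
```

With index=False the id column is not written, because it is now the index; use index=True (the default) if it should stay in the file.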

Related

How can I add a column that has the same value

I was trying to add a new column to my dataset, but when I did, the column only had a value at one index.
Is there a way to put one value in all rows of a column?
import pandas as pd
df = pd.read_json('file_1.json', lines=True)
df2 = pd.read_json('file_2.json', lines=True)
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
görüş_column = ['Milet İttifakı']
df3['Siyasi Yönelim'] = görüş_column
As I understand it, this could be a possible solution.
You have mentioned these lines of code:
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
You can change the first one to
df3 = pd.concat([df,df2],axis=1)  # axis=1 appends the second dataframe as columns; the default axis=0 appends it as rows
The second point is
df3 = df3.loc[:, ['renderedContent']]
I think you meant to write this instead of df3 = df.loc[:, ['renderedContent']], which selects from df and discards the concatenated result.
I hope this solves your problem.
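For the original goal of putting one value in every row, assigning a plain string instead of a one-element list is usually enough, since pandas repeats a scalar for all rows. A minimal sketch with made-up data in place of the two JSON files; it assumes both files share the same columns, so it keeps the default row-wise concat:

```python
import pandas as pd

# Hypothetical stand-ins for file_1.json and file_2.json
df = pd.DataFrame({"renderedContent": ["a", "b"]})
df2 = pd.DataFrame({"renderedContent": ["c"]})

# Stack the rows, renumber them, then keep the one column of interest
df3 = pd.concat([df, df2], ignore_index=True)
df3 = df3.loc[:, ["renderedContent"]].copy()

# A scalar (not a one-element list) is repeated for every row
df3["Siyasi Yönelim"] = "Milet İttifakı"
print(df3)
```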

How can I restore a column that I already deleted from a pandas DataFrame

Using df.drop() I removed the "ID" column from the df, and now I want to get that column back.
df.drop('ID', axis=1, inplace=True)
df
# shows me df without ID column
What method should I use?
I found that all you need to do is re-run the first command that imports the df :)
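Once a column has been dropped with inplace=True, pandas keeps no copy of it, so reloading is the only way back. To avoid that situation, one option is to keep the original frame or the removed column around. A small sketch with invented data:

```python
import pandas as pd

# Hypothetical frame; in the question it comes from the original import
df = pd.DataFrame({"ID": [1, 2, 3], "name": ["x", "y", "z"]})

# drop() without inplace returns a new frame and leaves df untouched
without_id = df.drop("ID", axis=1)

# Alternatively, pop() removes the column in place but hands it back
saved_id = df.pop("ID")
df.insert(0, "ID", saved_id)  # put it back as the first column
```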

csv file data cleaning process

See the attached screenshot. I want to delete all the rows which contain entries in the 'Unnamed' column.
I know that the column can be removed with data.drop(data.columns[27], axis=1, inplace=True), but that won't delete the entire rows with it.
import pandas as pd
import numpy as np
data = pd.read_csv('/home/syed/ML-Notebook/FL-P1/DATASET_FRAUDE.csv',
                   engine='python',
                   encoding='latin1',
                   parse_dates=['FECHA_SINIESTRO','FECHA_INI_VIGENCIA','FECHA_FIN_VIGENCIA','FECHA_DENUNCIO'])
#data.drop(data.columns[27], axis=1, inplace=True)
print(data.info())
df = df[df['Unnamed: 27'].astype(str).map(len) >0]
df
Drop Column:
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
To delete rows matching a condition you can do:
df = df.drop(df[df.column_name == 'Unnamed'].index)
This question should also be helpful: Deleting DataFrame row in Pandas based on column value
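Applied to the question, and assuming "rows which contain entries" means rows where 'Unnamed: 27' is not empty (NaN), a sketch could look like this. The amount column and the values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Unnamed: 27' is empty except for one stray entry
df = pd.DataFrame({
    "amount": [100, 200, 300],
    "Unnamed: 27": [np.nan, "stray", np.nan],
})

# Keep only the rows where the stray column is empty
df = df[df["Unnamed: 27"].isna()]

# Then drop the now-empty column itself
df = df.drop(columns="Unnamed: 27")
print(df)
```

The attempt in the question, astype(str).map(len) > 0, keeps every row, because NaN becomes the three-character string 'nan'.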

how to get row number in dataframe and store as Id?

I want to get the row number of the dataframe and store it in a new column called Id. Please advise me on how to code it.
Current dataframe:
expected outcome with new Id column:
If the first column is the index, use DataFrame.insert, subtracting 1 if necessary:
df.insert(0, 'Id', df.index - 1)
For a general solution that counts rows regardless of the index values (requires import numpy as np):
df.insert(0, 'Id', np.arange(len(df)))
import pandas as pd
df = pd.DataFrame({'a':[34, 23,37,38],'b':[1,2,3,4]})
# use column 'a' as the index, then copy the index values into a new column
df.set_index('a', inplace=True)
Id = list(df.index)
df['Id'] = Id
First, reset the index; it will be added as a new column named index. Then rename that column to the desired name.
import pandas as pd
df = (df.reset_index()
        .rename(columns={'index': 'ID'})
)
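As a runnable comparison, the sketch below numbers the rows by position, which works whatever the index contains. The sample values are reused from the answer above, and starting the numbering at 1 is an assumption about the expected outcome:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [34, 23, 37, 38], "b": [1, 2, 3, 4]})
df = df.set_index("a")  # a non-default index, to show it does not matter

# Row position as Id, starting at 1 (drop the "+ 1" for 0-based numbering)
df.insert(0, "Id", np.arange(len(df)) + 1)
print(df)
```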

Need help to solve the Unnamed and to change it in dataframe in pandas

How do I set my indexes from "Unnamed" to the first line of my dataframe in Python?
import pandas as pd
df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skipfooter=31)  # skipfooter replaces the older skip_footer; read_excel has no index argument
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
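A small sketch of the first approach, using an invented frame in which the real header landed in the first data row (the names date and value are placeholders):

```python
import pandas as pd

# Hypothetical sheet whose real header ended up in the first data row
df = pd.DataFrame(
    [["date", "value"], ["2020-01-01", "5"], ["2020-01-02", "7"]],
    columns=["Unnamed: 0", "Unnamed: 1"],
)

# Promote row 0 to column names, then drop it and renumber the rows
df.columns = df.iloc[0].values
df = df.drop(df.index[0]).reset_index(drop=True)
print(df)
```

Columns promoted this way keep the dtype they were read with (here strings), which is one reason skipping the extra row at read time tends to be cleaner.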
