I want to get the row number of each row in the dataframe and store it in a new column called Id. How can I code this?
Current dataframe:
expected outcome with new Id column:
If the first column is the index, use DataFrame.insert and, if necessary, subtract 1:
df.insert(0, 'Id', df.index - 1)
If you need a counter column as a general solution that works with any index values:
import numpy as np
df.insert(0, 'Id', np.arange(len(df)))
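To make the difference between the two variants concrete, here is a minimal sketch. The data and the 1-based index are made up for illustration; they are not the asker's frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the index deliberately starts at 1.
df = pd.DataFrame({"a": [34, 23, 37, 38]}, index=[1, 2, 3, 4])

# Variant 1: derive Id from the existing index, shifted so it starts at 0.
by_index = df.copy()
by_index.insert(0, "Id", by_index.index - 1)

# Variant 2: positional row number, independent of the index values.
by_position = df.copy()
by_position.insert(0, "Id", np.arange(len(by_position)))
```

The second variant is probably the safer choice when the index is not a clean integer sequence.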
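import pandas as pd
df = pd.DataFrame({'a': [34, 23, 37, 38], 'b': [1, 2, 3, 4]})
df.set_index('a', inplace=True)  # the column name must be passed as a string
Id = list(df.index)
df['Id'] = Id  # assign the list built above, not the built-in id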
First, reset the index; this adds it as a new column named index. Then rename that column to the desired name:
import pandas as pd
df = (df.reset_index()
        .rename(columns={'index': 'ID'}))
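A small runnable sketch of this reset-and-rename approach, assuming a made-up frame with the default, unnamed index:

```python
import pandas as pd

# Hypothetical data with the default, unnamed RangeIndex.
df = pd.DataFrame({"b": [1, 2, 3, 4]})

# reset_index() moves the unnamed index into a column called "index";
# rename() then gives that column the desired name.
out = df.reset_index().rename(columns={"index": "ID"})
```

Note that if the index already has a name, reset_index() uses that name for the new column instead of "index", so the rename mapping would need to be adjusted.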
Related
I have a pandas dataframe that comes from an API, so I don't have much control over its structure. It looks similar to this:
I want to have datetime as one column and value as another column. Any hints?
You can use T to transpose the dataframe and then reset_index to turn the old column labels into a regular column. You may need to change that column's name from index:
df = df.T.reset_index()
df.columns = df.iloc[0]
df = df[1:]
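The sample frame from the question is not shown, so the following is only a sketch under an assumed layout: timestamps as column labels, a single row of values, and the first column holding the label. All names and values are hypothetical.

```python
import pandas as pd

# Assumed input shape (not the asker's real data).
raw = pd.DataFrame(
    [["value", 10, 20]],
    columns=["datetime", "2021-01-01", "2021-01-02"],
)

df = raw.T.reset_index()            # old column labels become a regular column
df.columns = df.iloc[0]             # the first row holds the real header
df = df[1:].reset_index(drop=True)  # drop the header row and renumber
```

If the real frame is laid out differently, the header step (df.columns = df.iloc[0]) may not be needed at all.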
Working with a CSV file in PyCharm. I want to delete the automatically-generated index column. When I print it, however, the answer I get in the terminal is "None". All the answers by other users indicate that the reset_index method should work.
If I just say "df = df.reset_index(drop=True)" it does not delete the column, either.
import pandas as pd
df = pd.read_csv("music.csv")
df['id'] = df.index + 1
cols = list(df.columns.values)
df = df[[cols[-1]]+cols[:3]]
df = df.reset_index(drop=True, inplace=True)
print(df)
I agree with #It_is_Chris. Also,
this line is wrong, because reset_index returns None when inplace=True:
df = df.reset_index(drop=True, inplace=True)
It should be like this:
df.reset_index(drop=True, inplace=True)
or
df = df.reset_index(drop=True)
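A quick way to see the difference for yourself, using a made-up frame:

```python
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])

# With inplace=True the frame is modified and the call returns None,
# so assigning the result back to df would replace the frame with None.
result = df.reset_index(drop=True, inplace=True)

# Without inplace, a new frame is returned and must be assigned.
fresh = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
fresh = fresh.reset_index(drop=True)
```

Here result is None, while both df and fresh end up with a fresh 0-based index.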
Since you said you're trying to "delete the automatically-generated index column", I can think of two solutions.
First solution:
Use a column of your dataset as the index. If your dataset has already been indexed/numbered, you could do something like this:
#assuming your first column in the dataset is your index column which has the index number of zero
df = pd.read_csv("yourfile.csv", index_col=0)
#you won't see the automatically-generated index column anymore
df.head()
Second solution:
You could delete it in the final csv:
#To export your df to a csv without the automatically-generated index column
df.to_csv("yourfile.csv", index=False)
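Both solutions can be tried without touching a real file. This sketch uses an in-memory buffer and made-up CSV content in place of yourfile.csv:

```python
import io
import pandas as pd

# Hypothetical CSV text whose first, unnamed column is a previously saved index.
csv_text = ",title\n0,Song A\n1,Song B\n"

# First solution: treat the first column as the index when reading.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Second solution: leave the index out when writing.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
exported = buffer.getvalue()
```

After this, df has only the title column, and exported contains the header plus the two rows without any index column.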
Using df.drop() I removed the "ID" column from the df, and now I want to get that column back.
df.drop('ID', axis=1, inplace=True)
df
# shows me df without ID column
What method should I use?
I found that all you need to do is re-run the command that originally loaded the df :)
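Reloading is needed because an in-place drop discards the column. If reloading is expensive, one possible alternative (my assumption, not something from the thread) is to keep the column around before removing it, for example with DataFrame.pop. The data below is made up.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"ID": [1, 2, 3], "score": [0.5, 0.7, 0.9]})

# pop() removes the column in place and returns it as a Series.
saved_id = df.pop("ID")
columns_without_id = list(df.columns)

# Later, put the column back in the first position.
df.insert(0, "ID", saved_id)
```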
Please, I am trying to name the index column but I can't. I want to be able to name it so that I can reference it to view the index values, which are dates. I have tried
df3.rename(columns={0:'Date'}, inplace=True) but it's not working.
Please can someone help me out? Thank you.
Note that the dataframe index cannot be accessed using df['Date'].
If you want to rename the index, you can use DataFrame.rename_axis:
df=df.rename_axis(index='Date')
If you want to access it as a column, you have to turn it into a column using:
df=df.reset_index()
then you can use:
df['Date']
otherwise you can access the index by:
df.index
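Put together, the steps above look roughly like this. The frame df3 here is a made-up stand-in with a date index, since the original data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for df3: an unnamed index of dates.
df3 = pd.DataFrame(
    {"C1": [1, 2]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)

# Name the index; it is still an index, not a column.
named = df3.rename_axis(index="Date")

# Turn the named index into a regular column so named["Date"]-style access works.
as_column = named.reset_index()
```

With named, the dates are reached through named.index; with as_column, through as_column["Date"].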
Since you did not provide an example dataframe, here is an arbitrary example to demonstrate the idea.
import datetime as dt
import pandas as pd
data = {'C1' : [1, 2],
'Date' : [dt.datetime.now().strftime('%Y-%m-%d'),dt.datetime.now().strftime('%Y-%m-%d')]}
df = pd.DataFrame(data)
df.index = df["Date"]
del df["Date"]
print(df.index.name)  # prints the name of the index, here "Date"
print(df) #print the dataframe
The goal here is to find the columns that do not exist in df and create them with null values.
I have a list of column names like below:
column_list = ('column_1', 'column_2', 'column_3')
When I try to check whether each column exists, I only get True for the columns that exist and never get False for the ones that are missing.
for column in column_list:
    print(df.columns.isin(column_list).any())
In PySpark, I can achieve this using the below:
for column in column_list:
    if column not in df.columns:
        df = df.withColumn(column, lit(''))
How can I achieve the same using Pandas?
Here is how I would approach it:
import numpy as np
for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
Using np.isin, assign, and kwargs unpacking:
s = np.isin(column_list, df.columns)
df = df.assign(**{k:None for k in np.array(column_list)[~s]})
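For comparison, here are both answers run side by side on a made-up frame in which only column_1 exists (the frame itself is an assumption for illustration):

```python
import numpy as np
import pandas as pd

column_list = ("column_1", "column_2", "column_3")
df = pd.DataFrame({"column_1": [1, 2]})  # hypothetical: only column_1 exists

# Loop approach: add each missing column filled with NaN.
looped = df.copy()
for col in column_list:
    if col not in looped.columns:
        looped[col] = np.nan

# np.isin + assign: build the missing columns in a single call.
present = np.isin(column_list, df.columns)
assigned = df.assign(**{k: None for k in np.array(column_list)[~present]})
```

Both end up with the same three columns. One difference worth knowing: np.nan gives float columns, while None gives object columns.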