I have a dataframe with 14343 rows, but when I check df.info() it shows 14365 rows: after the last data row there are cells that explain the column names, and pandas counts them as rows. I tried the following code, but it did not seem to work: df.drop(df.index[14344, ])
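In case it helps, here is a minimal sketch of one way to keep only the first 14343 rows by position and discard the stray trailing cells (the frame below is a stand-in built only for illustration, not the asker's data):

import pandas as pd

# stand-in for the asker's frame: 14343 real rows plus a few stray
# trailing rows that pandas picked up as data
df = pd.DataFrame({"col": range(14343 + 3)})

# keep only the first 14343 rows by position; everything after them is dropped
df = df.iloc[:14343]

# equivalently: df = df.drop(df.index[14343:])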
I am trying to delete the second row of a pandas DataFrame, but I was unable to do so.
Code used:
df1.drop([1,],axis=0, inplace=True)
Note: the second row is empty; that's why I want it gone.
This one works for me
df1.drop(1, inplace=True)
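For what it's worth, drop(1) removes the row whose index label is 1, which is the second row only while the default RangeIndex (0, 1, 2, ...) is intact. A small sketch of both options, using a made-up frame rather than the asker's data:

import pandas as pd

# toy frame, not the asker's data
df1 = pd.DataFrame({"a": [10, None, 30], "b": [1, None, 3]})

# drop by index label: removes the row labelled 1
df1.drop(1, inplace=True)

# drop by position instead, regardless of the labels:
# df1 = df1.drop(df1.index[1])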
I am trying to reshape a data frame from 2 rows to 1 row, but I am encountering some issues. Do you have any idea how to do that? Here are the code and df:
Thanks!
If you are looking to convert two rows into one, you can do the following...
Stack the dataframe and reset the index at level=1, which converts the data and columns into a stack. This ends up with each of the column headers as a column (called level_1) and the data as another column (called 0).
Then set the index to level_1, which moves the column names into the index.
Remove the index name (level_1), then transpose the dataframe.
Code is shown below.
df3 = df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None
df3 = df3.T
Output
df3
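To make the steps concrete, here is a self-contained sketch on a small two-row frame invented for the example (the column names and values are assumptions, not the asker's data):

import pandas as pd

# toy two-row frame standing in for df3
df3 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# stack -> one long column of values, with the original column names
# ending up in the column called 'level_1' after reset_index
df3 = df3.stack().reset_index(level=1).set_index('level_1')
df3.index.name = None

# transpose so everything ends up in a single row
df3 = df3.T
print(df3)
#    a  b  a  b
# 0  1  3  2  4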
I've extracted a non-standard Excel table with the help of openpyxl and have done part of the work of converting it to a pandas dataframe. But now I'm stuck with this problem.
I want to select just a range of rows and columns and get the data from them, e.g. take cells from rows 4 to 12 and columns J to X. I hope you understand me.
Sorry for my English.
You can try something like this:
df = pd.read_excel('data.xlsx', skiprows=4, usecols='J:X', nrows=9)
If the number of rows is not fixed, you can use your second column as a delimiter:
df = pd.read_excel('data.xlsx', skiprows=4, usecols='J:X')
df = df[df.iloc[:, 1].notna()]
You could skip the rows as you read the Excel file into a DataFrame, initially dropping the first 4 rows, and then manipulate the DataFrame as follows.
The first line reads the file, skipping the first 4 rows.
The second line drops a range of rows from the dataframe (startRow and endRow being integer positions within the row index).
The third line drops 2 columns from the dataframe.
df = pd.read_excel('fileName.xlsx', skiprows=4)
df.drop(df.index[startRow:endRow + 1], inplace=True)
df.drop(['column1', 'column2'], axis=1, inplace=True)
I see a lot of questions about dropping rows that have a certain value in a column, or dropping entire columns, but suppose we have a Pandas DataFrame like the one below.
In this case, how could one write a line to go through the CSV and drop all rows like 2 and 4? Thank you.
You could try
~((~df).all(axis=1))
to get a boolean mask of the rows you want to keep (True for every row that is not entirely False). To get the dataframe with just those rows, you would use
df = df[~((~df).all(axis=1))]
A more detailed explanation is here:
Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError
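As a quick illustration, here is the mask applied to a small all-boolean frame invented for the example (rows 2 and 4 are entirely False, mirroring the question):

import pandas as pd

# toy all-boolean frame; rows 2 and 4 contain only False
df = pd.DataFrame({
    "a": [True, False, False, True, False],
    "b": [True, True, False, False, False],
})

mask = ~((~df).all(axis=1))   # True where at least one value in the row is True
df = df[mask]                 # keeps rows 0, 1 and 3; rows 2 and 4 are dropped
print(df)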
This should help (it drops every row in which all values are False):
for i in list(df.index):              # iterate over a copy of the row labels
    value = df.shape[1]               # number of columns
    count = 0
    for column_name in df.columns:
        if df.loc[i, column_name] == False:
            count = count + 1
    if count == value:                # every value in the row is False
        df.drop(index=i, inplace=True)
I have a dataframe df with two columns date and data. I want to take the first difference of the data column and add it as a new column.
It seems that df.set_index('date').shift() or df.set_index('date').diff() gives me the desired result. However, when I try to add it as a new column, I get NaN for all the rows.
How can I fix this command:
df['firstdiff'] = df.set_index('date').shift()
to make it work?
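In case it's useful, a minimal sketch of one way around the all-NaN column, assuming the cause is that the shifted/diffed result carries a date index that no longer aligns with df's integer index (the column names come from the question; the sample values are made up):

import pandas as pd

# toy frame with the question's column names and invented values
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=4),
    "data": [10.0, 12.0, 15.0, 11.0],
})

# diff the data column directly, so the result keeps df's own index
df['firstdiff'] = df['data'].diff()

# or, if you want to go through set_index, strip the index before assigning:
# df['firstdiff'] = df.set_index('date')['data'].diff().to_numpy()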