Drop rows and reset_index in a dataframe [duplicate] - python

This question already has answers here:
Pandas reset index is not taking effect [duplicate]
(4 answers)
I was wondering why reset_index() has no effect in the following piece of code.
import pandas as pd
data = [0,10,20,30,40,50]
df = pd.DataFrame(data, columns=['Numbers'])
df.drop(df.index[2:4], inplace=True)
df.reset_index()
df
   Numbers
0        0
1       10
4       40
5       50
UPDATE:
If I use df.reset_index(inplace=True), I get a new column, which I do not want.
   index  Numbers
0      0        0
1      1       10
2      4       40
3      5       50

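reset_index() defaults to inplace=False, so it returns a new DataFrame and leaves df unchanged. Either assign the result back (df = df.reset_index()) or pass inplace=True. To keep the old index from being added as a new column, also pass drop=True.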
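A minimal sketch of the difference, reusing the question's data. Calling reset_index() without assigning the result leaves df untouched, while assigning it back with drop=True renumbers the rows and discards the old index. The variable name before is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([0, 10, 20, 30, 40, 50], columns=["Numbers"])
df.drop(df.index[2:4], inplace=True)

# reset_index() returns a new DataFrame; without assignment the result is discarded
df.reset_index()
before = df.index.tolist()  # the index still has the gap left by drop()

# assign the result back, and drop the old index instead of keeping it as a column
df = df.reset_index(drop=True)

print(before)
print(df)
```

df.reset_index(drop=True, inplace=True) should have the same effect if you prefer to modify df in place.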

Please try this code
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'column_name': [1, 2, 0, 4, 0, 6]})
# drop rows where column 'column_name' has value of 0
df = df[df['column_name'] != 0]
# reset the index of the resulting DataFrame
df = df.reset_index(drop=True)
print(df)

Related

Doing .diff() on pandas column(s) gives wrong output? [duplicate]

This question already has answers here:
Subtract consecutive columns in a Pandas or Pyspark Dataframe
(2 answers)
I am trying to take the difference of a column using .diff() in a dataframe with a date column and a value column.
import pandas as pd
d = {'Date':['11/11/2011', '11/12/2011', '11/13/2011'], 'a': [2, 3,4]}
df1 = pd.DataFrame(data=d)
df1.diff(axis = 1)
Pandas gives me this output:
         Date  a
0  11/11/2011  2
1  11/12/2011  3
2  11/13/2011  4
This is just df1, not the difference. I expected the output to be:
         Date    a
0  11/11/2011  NaN
1  11/12/2011    1
2  11/13/2011    1
df1.set_index('Date').diff(axis = 0) saves the day
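axis=1 means you are subtracting across columns, not rows. Your target result is a row-wise difference, so use axis=0 instead (it is the default).
Second, the Date column holds strings, and subtracting strings is not meaningful: Python does not support it and raises a TypeError. Restrict diff() to the numeric column, or move Date into the index first.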
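As a rough sketch of both options, using the question's data: either move Date into the index before calling diff(), or call diff() on the numeric column only. The column name a_diff is made up for this example.

```python
import pandas as pd

d = {"Date": ["11/11/2011", "11/12/2011", "11/13/2011"], "a": [2, 3, 4]}
df1 = pd.DataFrame(data=d)

# Option 1: Date becomes the index, so diff() only sees the numeric column
result = df1.set_index("Date").diff(axis=0)

# Option 2: keep Date as a column and diff only column 'a'
df1["a_diff"] = df1["a"].diff()

print(result)
print(df1)
```

In both cases the first row is NaN because it has no previous row to subtract.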

Pandas set index or reindex without changing the order of the data frame

Hello, I have a dataframe that I sorted, so the index is no longer in order. I want to reassign the index so that the sorted rows have a sequential index. I have not been able to figure this out: should I remove the index, or is there a way to set it? When I reindex, it sorts by the index, which undoes my sort.
Solution
I made some dummy data to show this. I hope this answers your question. Leave comments if you have any questions.
import pandas as pd
df = pd.DataFrame({'x': [1,2,3], 'y': [120, 8, 32]})
df = df.reset_index(drop=False).rename(columns={'index': 'ID'})
df = df.sort_values(by='y', ascending=True)
# After Sorting
print(df)
print("-----------------------")
# After Recovering
print(df.reindex(df.ID.to_list()).drop(columns='ID'))
Output:
   ID  x    y
1   1  2    8
2   2  3   32
0   0  1  120
-----------------------
   x    y
1  2    8
2  3   32
0  1  120
df=df.reset_index(drop=True)? – ansev 1 min ago
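To spell out the suggestion in the comment, here is a small sketch using the same dummy data. It assumes the goal is a sequential 0..n-1 index on the already sorted rows; the names sorted_df and same are only for illustration, and the ignore_index argument of sort_values assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [120, 8, 32]})

# sort first, then replace the shuffled index with 0..n-1 (row order is kept)
sorted_df = df.sort_values(by="y", ascending=True).reset_index(drop=True)

# equivalent one-step form on pandas >= 1.0
same = df.sort_values(by="y", ascending=True, ignore_index=True)

print(sorted_df)
```

reset_index(drop=True) only relabels the rows, so the sorted order is preserved, whereas reindex() rearranges rows to match the labels it is given.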

How to add a column onto a dataframe to sum the rows [duplicate]

This question already has answers here:
Find the max of two or more columns with pandas
(3 answers)
I am looking to add a column to my data frame that holds the sum of all values in each row.
For example:
1  2  Column I Want to Add
4  9  13
7  1  8
You can use df.sum(axis = 1), which sums across each row and returns a Series that you can assign as a new column (not a row):
import pandas as pd
df = pd.DataFrame({1: [4, 7], 2: [9, 1]})
df['COLUMN I Want to Add'] = df.sum(axis = 1)
print(df)
Output:
   1  2  COLUMN I Want to Add
0  4  9                    13
1  7  1                     8
df['new_col'] = df['col1'] + df['col2']

simple pivot dataframe in pandas [duplicate]

This question already has answers here:
How to switch columns rows in a pandas dataframe
(2 answers)
Have a simple df:
df = pd.DataFrame({"v": [1, 2]}, index = pd.Index(data = ["a", "b"], name="colname"))
Want to reshape it to look like this:
   a  b
0  1  2
How do I do that? I looked at the docs for pd.pivot and pd.pivot_table but
df.reset_index().pivot(columns = "colname", values = "v")
produces a df that has NaNs obviously.
Update: I want DataFrames, not Series, because I am going to concatenate a bunch of them together to store the results of a computation.
From your setup
         v
colname
a        1
b        2
Seems like you need to transpose
>>> df.T
or
>>> df.transpose()
Which yields
colname  a  b
v        1  2
You can always reset the index to get 0 and set the column name to None to get your expected output
ndf = df.T.reset_index(drop=True)
ndf.columns.name = None
   a  b
0  1  2
How about:
df.T.reset_index(drop=True)
[out]
colname  a  b
0        1  2
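Since the update mentions concatenating a bunch of these one-row frames, here is a rough sketch of how that could look. The helper to_row and the second frame df2 are made up for this example and are not part of the question.

```python
import pandas as pd

def to_row(frame):
    """Transpose a one-column frame into a single-row DataFrame with a clean header."""
    row = frame.T.reset_index(drop=True)
    row.columns.name = None  # remove the leftover 'colname' label
    return row

df = pd.DataFrame({"v": [1, 2]}, index=pd.Index(data=["a", "b"], name="colname"))
df2 = pd.DataFrame({"v": [3, 4]}, index=pd.Index(data=["a", "b"], name="colname"))

# each piece is a DataFrame (not a Series), so concat stacks them as rows
combined = pd.concat([to_row(df), to_row(df2)], ignore_index=True)
print(combined)
```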

Adding values to end of pandas data frame from beginning of data frame [duplicate]

This question already has answers here:
Create a Pandas Dataframe by appending one row at a time
(31 answers)
I have a pandas data frame that looks something like this:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
And I would like to add row 0 to the end of the data frame and to get a new data frame that looks like this:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
3  1  2  3
What can I do in pandas to do this?
You can try (on pandas versions before 2.0, where DataFrame.append still exists):
df = df.append(df.iloc[0], ignore_index=True)
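On pandas 2.0 and later, where DataFrame.append is no longer available, the same result can be had with pd.concat. This is a minimal sketch using the question's data.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

# df.iloc[[0]] (double brackets) keeps row 0 as a one-row DataFrame rather than a Series
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)
print(df)
```

ignore_index=True renumbers the result 0..3, so the copied row gets index 3 instead of a second 0.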
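If you are inserting data from a list, this might help. Note that it places the new row at the top of the data frame; to append at the end instead, df.loc[len(df)] = [1, 2, 3] works when the index is the default 0..n-1.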
import pandas as pd
df = pd.DataFrame( [ [1,2,3], [2,5,7], [7,8,9]], columns=['A', 'B', 'C'])
print(df)
df.loc[-1] = [1,2,3] # list you want to insert
df.index = df.index + 1 # shifting index
df = df.sort_index() # sorting by index
print(df)
