Reset column index of pandas dataframe - python

Is it possible to reset the columns so they become the first row of the DataFrame? For example,
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
a b
0 1 4
1 2 5
2 3 6
Desired output:
df2 = df.reset_column() ???
0 1
0 a b
1 1 4
2 2 5
3 3 6

You can also chain reset_index calls:
df.T.reset_index().T.reset_index(drop=True)
0 1
0 a b
1 1 4
2 2 5
3 3 6

Use numpy.vstack to stack the column labels on top of the values (this assumes import numpy as np):
In [57]: pd.DataFrame(np.vstack([df.columns, df]))
Out[57]:
0 1
0 a b
1 1 4
2 2 5
3 3 6
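A self-contained version of the snippet above: np.vstack stacks the column labels on top of the values as one 2-D array, and the rebuilt frame gets default integer labels on both axes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Stack the column labels on top of the values, then rebuild the frame.
out = pd.DataFrame(np.vstack([df.columns, df]))
print(out)
```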

Insert the column names as the first row, then reset the index:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.loc[-1] = df.columns
df.index = df.index + 1
df = df.sort_index()
df.columns = [0,1]
df
0 1
0 a b
1 1 4
2 2 5
3 3 6

Pandas - Attach column to a DataFrame

I have two dataframes, which for simplicity look like:
A B C D E
1 2 3 4 5
5 4 3 2 1
1 3 5 7 9
9 7 5 3 1
And the second one looks like:
F
0
1
0
1
So, both dataframes have the SAME number of rows.
I want to attach column F to the first dataframe:
A B C D E F
1 2 3 4 5 0
5 4 3 2 1 1
1 3 5 7 9 0
9 7 5 3 1 1
I have already tried various methods, such as joins, iloc, and assigning df['F'] manually, and I don't seem to find an answer. Most of the time I get F added to the dataframe but filled with NaN: the rows where the first dataframe had data get NaN in F, and I end up with double the number of rows, NaN everywhere except in F, where the data is OK.
It seems you want to add column F to the first dataframe regardless of the index of either dataframe. In that case, assign the underlying NumPy array of column F:
df1['F'] = df2['F'].to_numpy()
Out[131]:
A B C D E F
0 1 2 3 4 5 0
1 5 4 3 2 1 1
2 1 3 5 7 9 0
3 9 7 5 3 1 1
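The NaN behaviour described in the question is index alignment at work. A minimal reproduction (the indices here are invented for illustration), followed by the .to_numpy() fix:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 5, 1, 9]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'F': [0, 1, 0, 1]}, index=[10, 11, 12, 13])  # different index

# Plain assignment aligns on the index; nothing matches, so G is all NaN.
df1['G'] = df2['F']
print(df1['G'].isna().all())

# Stripping the index with .to_numpy() assigns purely by position.
df1['F'] = df2['F'].to_numpy()
print(df1)
```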
You just have to create a new column on the original dataframe, assigning it the column of the second dataframe (this works because both frames share the same default index here):
Generating the example:
import pandas as pd
data1 = {"A": [1, 5, 1, 9],
"B": [2, 4, 3, 7],
"C": [3, 3, 5, 5],
"D": [4, 2, 7, 3],
"E": [5, 1, 9, 1]}
data2 = {"F": [0, 1, 0, 1]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
#creating the column
df1["F"] = df2.F
df1
> A B C D E F
> 0 1 2 3 4 5 0
> 1 5 4 3 2 1 1
> 2 1 3 5 7 9 0
> 3 9 7 5 3 1 1

Use meshgrid for rows with common values in columns

my dataframes:
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 2, 3], [7, 8, 8]]),columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 2, 3], [5, 8, 8]]),columns=['a', 'b', 'c'])
df1,df2:
a b c
0 1 2 3
1 4 2 3
2 7 8 8
a b c
0 1 2 3
1 4 2 3
2 5 8 8
I want to combine the values of column a from both df's in all combinations, but only for rows where the values in columns b and c are equal.
Right now I only have a solution for all combinations in general, with this code:
x = np.array(np.meshgrid(df1.a.values,
                         df2.a.values)).T.reshape(-1, 2)
df = pd.DataFrame(x)
print(df)
0 1
0 1 1
1 1 4
2 1 5
3 4 1
4 4 4
5 4 5
6 7 1
7 7 4
8 7 5
expected output for df1.a and df2.a only for rows where df1.b==df2.b and df1.c==df2.c:
0 1
0 1 1
1 1 4
2 4 1
3 4 4
4 7 5
So basically I need to group by common rows in the selected columns b and c.
You should try DataFrame.merge with an inner merge (the default):
df1.merge(df2, on=['b', 'c'])[['a_x', 'a_y']]
a_x a_y
0 1 1
1 1 4
2 4 1
3 4 4
4 7 5
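A self-contained version of the merge. The suffixes parameter (a standard DataFrame.merge option, shown here as an optional refinement) replaces the default _x/_y with more readable names:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 2, 3], [7, 8, 8]]), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 2, 3], [5, 8, 8]]), columns=['a', 'b', 'c'])

# Inner merge on b and c keeps every pairing of rows whose (b, c) values match;
# the clashing 'a' columns get the given suffixes instead of _x/_y.
out = df1.merge(df2, on=['b', 'c'], suffixes=('_df1', '_df2'))[['a_df1', 'a_df2']]
print(out)
```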

Get values and column names

I have a pandas data frame that looks something like this:
data = {'1' : [0, 2, 0, 0], '2' : [5, 0, 0, 2], '3' : [2, 0, 0, 0], '4' : [0, 7, 0, 0]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd'])
df
1 2 3 4
a 0 5 2 0
b 2 0 0 7
c 0 0 0 0
d 0 2 0 0
I know I can get the maximum value and the corresponding column name for each row by doing (respectively):
df.max(1)
df.idxmax(1)
How can I get the values and the column name for every cell that is not zero?
So in this case, I'd want 2 tables, one giving me each value != 0 for each row:
a 5
a 2
b 2
b 7
d 2
And one giving me the column names for those values:
a 2
a 3
b 1
b 4
d 2
Thanks!
You can use stack to get a Series, filter it by boolean indexing, then rename_axis and reset_index, and finally drop a column or select a subset of columns:
s = df.stack()
df1 = s[s != 0].rename_axis(['a','b']).reset_index(name='c')
print (df1)
a b c
0 a 2 5
1 a 3 2
2 b 1 2
3 b 4 7
4 d 2 2
df2 = df1.drop('b', axis=1)
print (df2)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1.drop('c', axis=1)
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2
df3 = df1[['a','c']]
print (df3)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1[['a','b']]
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2
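An alternative sketch, not from the answer above: numpy.nonzero yields the (row, column) positions of every non-zero cell directly, which gives both requested tables in one pass.

```python
import numpy as np
import pandas as pd

data = {'1': [0, 2, 0, 0], '2': [5, 0, 0, 2], '3': [2, 0, 0, 0], '4': [0, 7, 0, 0]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# np.nonzero scans row by row and returns the (row, col) indices of non-zero cells.
rows, cols = np.nonzero(df.to_numpy())

# The non-zero values, labelled by their row index ...
values = pd.Series(df.to_numpy()[rows, cols], index=df.index[rows])
# ... and the column name of each of those cells.
names = pd.Series(df.columns[cols], index=df.index[rows])

print(values)
print(names)
```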

Pandas number rows within group in increasing order

Given the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a']})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!
Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
Use the groupby.rank function. Here is a working example:
df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
C1 C2
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
C1 C2 RANK
0 a 1 1.0
1 a 2 2.0
2 a 3 3.0
3 b 4 1.0
4 b 5 2.0
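As an aside (not part of either answer; the data is invented for illustration): cumcount numbers rows by position within each group, while rank(method='first') numbers them by value, so the two only agree when each group's values already appear in order.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'],
                   'C2': [3, 1, 2, 5, 4]})  # deliberately unsorted within groups

# cumcount numbers rows by their position within each group ...
df['POS'] = df.groupby('C1').cumcount() + 1

# ... while rank(method='first') numbers them by the value of C2
# (rank returns floats, hence the astype).
df['RANK'] = df.groupby('C1')['C2'].rank(method='first').astype(int)
print(df)
```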

Assign series to DataFrame with unequal indices

I have the following dataframe and series with different indices, and I would like to add series 's' to dataframe df2.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [1, 1, 2, 2], 'c': [1, 2, 3,4]})
>>> df
a b c
0 1 1 1
1 2 1 2
2 2 2 3
3 3 2 4
>>> df2 = df.set_index(['a', 'b'])
>>> df2
c
a b
1 1 1
2 1 2
2 3
3 2 4
>>> s = pd.Series([10, 20, 30], pd.MultiIndex.from_tuples([[1], [2], [3]], names=['a']))
>>> s
a
1 10
2 20
3 30
dtype: int64
>>> df2['x'] = s
>>> df2
c x
a b
1 1 1 NaN
2 1 2 NaN
2 3 NaN
3 2 4 NaN
I know column 'x' is set to NaN because the row indices don't match, but is there a way to add series 's' by only taking into account the matching index level?
The expected result is
>>> df2
c x
a b
1 1 1 10
2 1 2 20
2 3 20 # because index a=2 (ignored 'b' because it didn't exist in series 's')
3 2 4 30
You can use DataFrame.join:
>>> df2.join(pd.DataFrame({"x": s}))
c x
a b
1 1 1 10
2 1 2 20
2 3 20
3 2 4 30
