Python: dataframe pivot with duplicate labels

I have a pandas DataFrame as below:
index ColumnName ColumnValue
0 A 1
1 B 2
2 C 3
3 A 4
4 B 5
5 C 6
6 A 7
7 B 8
8 C 9
I want output like the following, as a pandas DataFrame:
A B C
1 2 3
4 5 6
7 8 9
Can anyone suggest how I can achieve the desired output?
Regards
Vipul

The first solution that came to mind is a for loop over the unique ColumnName values, as below. If you want to do it with the pivot method, someone else may be able to help.
columns = df['ColumnName'].unique()
data = {}
for column in columns:
    data[column] = list(df[df['ColumnName'] == column]['ColumnValue'])
pd.DataFrame(data)
which will give you the below output
A B C
0 1 2 3
1 4 5 6
2 7 8 9
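For the pivot route mentioned above, here is a minimal sketch. It assumes every label occurs the same number of times; the helper column name `occurrence` and the variable `wide` are made up for illustration. Numbering each repeat of a label with `groupby(...).cumcount()` gives `pivot` a unique (row, column) pair to work with, which avoids the duplicate-label error.

```python
import pandas as pd

df = pd.DataFrame({
    "ColumnName": list("ABC") * 3,
    "ColumnValue": list(range(1, 10)),
})

# Number each repeat of a label: the first A is 0, the second A is 1, and so on.
occurrence = df.groupby("ColumnName").cumcount()

# (occurrence, ColumnName) is now unique, so pivot no longer sees duplicates.
wide = (
    df.assign(occurrence=occurrence)
      .pivot(index="occurrence", columns="ColumnName", values="ColumnValue")
      .rename_axis(index=None, columns=None)  # drop the axis names for a plain table
)
print(wide)
```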

Related

Add all column values repeated of one data frame to other in pandas

Having two data frames:
df1 = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
a b
0 1 4
1 2 5
2 3 6
df2 = pd.DataFrame({'c':[7],'d':[8]})
c d
0 7 8
The goal is to add all df2 column values to df1, repeated on every row, to produce the following result. Assume the two data frames do not share any column names.
a b c d
0 1 4 7 8
1 2 5 7 8
2 3 6 7 8
If the column names are strings, you can use DataFrame.assign and unpack the Series created by selecting the first row of df2:
df = df1.assign(**df2.iloc[0])
print (df)
a b c d
0 1 4 7 8
1 2 5 7 8
2 3 6 7 8
Another idea is to repeat the values along df1.index with DataFrame.reindex and then use DataFrame.join (this works here because the first index value of df2 is the same as the first index value of df1):
df = df1.join(df2.reindex(df1.index, method='ffill'))
print (df)
a b c d
0 1 4 7 8
1 2 5 7 8
2 3 6 7 8
If the original df1 has no missing values, you can instead join first and forward-fill the missing values as the last step. Note that this changes the new columns to floats (thanks, Dishin H Goyan):
df = df1.join(df2).ffill()
print (df)
a b c d
0 1 4 7.0 8.0
1 2 5 7.0 8.0
2 3 6 7.0 8.0
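A further option, not part of the answer above, is a cross join. This is a sketch assuming pandas 1.2 or later, where `how='cross'` is available in `DataFrame.merge`. Because df2 has a single row, the Cartesian product simply repeats that row next to every row of df1, and the integer dtypes are kept.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = pd.DataFrame({'c': [7], 'd': [8]})

# Cartesian product: every row of df1 paired with the single row of df2.
df = df1.merge(df2, how='cross')
print(df)
```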

Move dataframe column below the other(pandas Python)

I want to shift a specific column down by one (I don't know whether another library could help me).
import pandas as pd
#pd.set_option('display.max_rows',100)
fac=pd.read_excel('TEST.xlsm',sheet_name="DC - Consumables",header=None, skiprows=1)
df = pd.DataFrame(fac)
df1=df.iloc[0:864,20:39]
df2=df.iloc[0:864,40:59]
df1=pd.concat([df1,df2])
print (df1)
I want one block of columns to be below the other:
A B C    A B C
1 2 3    6 7 8
4 5 8    4 1 9
My code prints this:
A B C
1 2 3
4 5 8
A B C
6 7 8
4 1 9
I need the second column (dataframe) to be below the first column, like this:
A B C
1 2 3
4 5 8
A B C
6 7 8
4 1 9
Please help me
Try pd.concat().
df3 = pd.concat([df1, df2])
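One thing to check, as an assumption about the question's data: pd.concat aligns frames by column label. When a sheet is read with header=None, the columns are labelled by position, so the two iloc slices carry different labels and will not stack cleanly. A minimal sketch with stand-in data (the names `left`, `right` and `stacked` are illustrative):

```python
import pandas as pd

# Stand-in for the spreadsheet: two 3-column blocks side by side, no header.
df = pd.DataFrame([[1, 2, 3, 6, 7, 8],
                   [4, 5, 8, 4, 1, 9]])

left = df.iloc[:, 0:3]           # column labels 0, 1, 2
right = df.iloc[:, 3:6].copy()   # column labels 3, 4, 5

# Give the second block the same labels so concat stacks it underneath.
right.columns = left.columns

stacked = pd.concat([left, right], ignore_index=True)
print(stacked)
```

Without the relabelling step, the result would have six columns, with NaN wherever a block has no value for a given label.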

Select Columns of a DataFrame based on another DataFrame

I am trying to select a subset of a DataFrame based on the columns of another DataFrame.
The DataFrames look like this:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
a b
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
I want to get all rows of the first Dataframe for the columns which are included in both DataFrames. My result should look like this:
a b
0 0 1
1 4 5
2 8 9
3 12 13
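You can use pd.Index.intersection. Older answers also use the & operator as shorthand, but in pandas 2.x & on an Index is an element-wise logical operation rather than a set intersection, so the explicit method is safer:
intersection_cols = df1.columns.intersection(df2.columns)
res = df1[intersection_cols]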
import pandas as pd
data1=[[0,1,2,3,],[4,5,6,7],[8,9,10,11],[12,13,14,15]]
data2=[[0,1],[2,3],[4,5],[6,7],[8,9]]
df1 = pd.DataFrame(data=data1,columns=['a','b','c','d'])
df2 = pd.DataFrame(data=data2,columns=['a','b'])
df1[df1.columns.intersection(df2.columns)]
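As an alternative sketch, a boolean mask built with Index.isin selects the shared columns and keeps them in df1's original order. The variable names `shared` and `res` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
                   columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]], columns=['a', 'b'])

# True for each df1 column whose label also appears in df2.
shared = df1.columns.isin(df2.columns)

# Keep all rows, but only the shared columns.
res = df1.loc[:, shared]
print(res)
```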

How to sum values by value in other columns in pandas in Python?

Hi I am dealing with some data by using pandas.
I am facing a problem but here I'll try to simplify it.
Suppose I have a dataset that looks like this:
# Incidents Place Month
0 3 A 1
1 5 B 1
2 2 C 2
3 2 B 2
4 6 C 3
5 3 A 1
So I want to sum the # of incidents by the place, that is, I want to have a result like
P #
A 6(3+3)
B 7(5+2)
C 8(2+6)
stored in a pandas DataFrame. I don't care about other columns at this point.
My next question: if I also want to use the data in the Month column, I'd like a result that looks like
P M #
A 1 6(3+3)
B 1 5
B 2 2
C 2 2
C 3 6
How can I achieve these results in pandas? I have tried groupby and some other functions but I cannot reach the point...
Any help is appreciated!
You can do it in this way:
In [35]: df
Out[35]:
# Incidents Place Month
0 3 A 1
1 5 B 1
2 2 C 2
3 2 B 2
4 6 C 3
5 3 A 1
In [36]: df.groupby('Place')['# Incidents'].sum().reset_index()
Out[36]:
Place # Incidents
0 A 6
1 B 7
2 C 8
In [37]: df.groupby(['Place', 'Month'])['# Incidents'].sum().reset_index()
Out[37]:
Place Month # Incidents
0 A 1 6
1 B 1 5
2 B 2 2
3 C 2 2
4 C 3 6
The pandas documentation on groupby has many more examples.
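The same two results can also be sketched with as_index=False, which keeps the grouping keys as ordinary columns so no reset_index() call is needed. The data below is retyped from the question; the names `by_place` and `by_place_month` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    '# Incidents': [3, 5, 2, 2, 6, 3],
    'Place': ['A', 'B', 'C', 'B', 'C', 'A'],
    'Month': [1, 1, 2, 2, 3, 1],
})

# Sum incidents per place; the key stays a regular column.
by_place = df.groupby('Place', as_index=False)['# Incidents'].sum()

# Sum incidents per (place, month) pair.
by_place_month = df.groupby(['Place', 'Month'], as_index=False)['# Incidents'].sum()

print(by_place)
print(by_place_month)
```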

Python: Applying a function to DataFrame taking input from the new calculated column

I'm facing a problem with applying a function to a DataFrame (to model a solar collector based on annual hourly weather data).
Suppose I have the following (simplified) DataFrame:
df2:
A B C
0 11 13 5
1 6 7 4
2 8 3 6
3 4 8 7
4 0 1 7
Now I have defined a function that takes all rows as input to create a new column called D, but I want the function to also take the last calculated value of D (except of course for the first row as no value for D is calculated) as input.
def Funct(x):
    D = x['A']+x['B']+x['C']+(x-1)['D']
I know that the function above is not working, but it gives an idea of what I want.
So to summarise:
Create a function that creates a new column in the dataframe and takes the value of the new column one row above it as input
Can somebody help me?
Thanks in advance.
It sounds like you are calculating a cumulative sum. In that case, use cumsum:
In [45]: df['D'] = (df['A']+df['B']+df['C']).cumsum()
In [46]: df
Out[46]:
A B C D
0 11 13 5 29
1 6 7 4 46
2 8 3 6 63
3 4 8 7 82
4 0 1 7 90
[5 rows x 4 columns]
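cumsum covers the case where the previous D is simply added. If the real model combines the previous D in some other way, one possible sketch is itertools.accumulate with a custom step function. The `step` recurrence below (half of the previous D plus the current row sum) is purely hypothetical and only stands in for whatever the collector model needs.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'A': [11, 6, 8, 4, 0],
                   'B': [13, 7, 3, 8, 1],
                   'C': [5, 4, 6, 7, 7]})

row_sum = df['A'] + df['B'] + df['C']

def step(prev_d, current):
    # Hypothetical recurrence: the new D depends on the previous D.
    return 0.5 * prev_d + current

# accumulate feeds each result back in as prev_d; the first D is the first row sum.
df['D'] = list(accumulate(row_sum, step))
print(df)
```

With the default addition in place of `step`, accumulate reproduces the cumsum result. The loop runs in Python rather than vectorised code, so it will be slower on long hourly series.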
Are you looking for this?
You can use shift to align the previous row with current row and then you can do your operation.
In [7]: df
Out[7]:
a b
1 1 1
2 2 2
3 3 3
4 4 4
[4 rows x 2 columns]
In [8]: df['c'] = df['b'].shift(1) # First row will be NaN
In [9]: df
Out[9]:
a b c
1 1 1 NaN
2 2 2 1
3 3 3 2
4 4 4 3
[4 rows x 3 columns]
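If the NaN in the first row is unwanted, shift also accepts a fill_value argument (available since pandas 0.24). A small sketch follows, where using 0 as the starting value is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4]}, index=[1, 2, 3, 4])

# Shift down by one row and fill the gap with 0, so the column stays integer.
df['c'] = df['b'].shift(1, fill_value=0)
print(df)
```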
