DataFrame transform column values to new columns - python

I have the following Series:

project  id         type
First    130403725  PRODUCT    68
                    EMPTY       2
Six      130405706  PRODUCT    24
         132517244  PRODUCT    33
         132607436  PRODUCT    87
How can I transform the type values into new columns:

                   PRODUCT  EMPTY
project id
First   130403725       68      2
Six     130405706       24      0
        132517244       33      0
        132607436       87      0

This is a classic pivot. pivot itself needs a DataFrame and leaves NaN for missing combinations, so reset the index first and use pivot_table, which can fill the gaps with 0:
df_pivoted = df.reset_index().pivot_table(index=["project", "id"], columns="type", values="value", fill_value=0)
Here "value" stands in for the value column, which isn't named in the question; it would be clearer if you named it.

Use unstack, because this is a Series with a MultiIndex:
s1 = s.unstack(fill_value=0)
print (s1)
type EMPTY PRODUCT
project id
First 130403725 2 68
Six 130405706 0 24
132517244 0 33
132607436 0 87
For a DataFrame:
df = s.unstack(fill_value=0).reset_index().rename_axis(None, axis=1)
print (df)
project id EMPTY PRODUCT
0 First 130403725 2 68
1 Six 130405706 0 24
2 Six 132517244 0 33
3 Six 132607436 0 87
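Putting the accepted approach together as a runnable sketch — the index tuples below are reconstructed from the question's display (the Series itself is unnamed there):

```python
import pandas as pd

# Rebuild the example Series with a (project, id, type) MultiIndex
s = pd.Series(
    [68, 2, 24, 33, 87],
    index=pd.MultiIndex.from_tuples(
        [
            ("First", 130403725, "PRODUCT"),
            ("First", 130403725, "EMPTY"),
            ("Six", 130405706, "PRODUCT"),
            ("Six", 132517244, "PRODUCT"),
            ("Six", 132607436, "PRODUCT"),
        ],
        names=["project", "id", "type"],
    ),
)

# Move the innermost level ("type") into columns, filling gaps with 0,
# then flatten back to a regular DataFrame
df = s.unstack(fill_value=0).reset_index().rename_axis(None, axis=1)
print(df)
```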

Related

How to combine dataframes based on index column name

Hello, I am new to Python. I have two dataframes and a list of tickers, and I would like to combine the two dataframes based on that list. My second dataframe had the tickers imported from an Excel sheet, so its column names are in a different order; I am not sure whether that changes anything.
df1 looks like:

df1
index  ABC  DEF  XYZ
avg      2    6   12
std      1    2    3
var     24   25   35
max     56   66   78
df2
index   10    40    96
ticker  XYZ   ABC   DEF
Sector  Auto  Tech  Mining
I would like to combine them based on their ticker names into a third df with all the information, so it looks something like this:
df3
index   ABC   DEF     XYZ
avg       2     6      12
std       1     2       3
var      24    25      35
max      56    66      78
Sector  Tech  Mining  Auto
I have tried this:
df3 = pd.concat([df1, df2], ignore_index=True)
but it made a df where they were side by side instead of one combined df. Any help would be appreciated.
You need to set the index first:
df2 = df2.set_index('index').T.set_index('ticker').T
out = pd.concat([df1,df2])
ABC DEF XYZ
index
avg 2 6 12
std 1 2 3
var 24 25 35
max 56 66 78
Sector Tech Mining Auto
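As a runnable sketch of the answer — the two frames below are reconstructed from the layout shown in the question:

```python
import pandas as pd

# df1: statistics per ticker, tickers as columns
df1 = pd.DataFrame(
    {"ABC": [2, 1, 24, 56], "DEF": [6, 2, 25, 66], "XYZ": [12, 3, 35, 78]},
    index=pd.Index(["avg", "std", "var", "max"], name="index"),
)

# df2: tickers in a different column order, as imported from Excel
df2 = pd.DataFrame(
    {
        "index": ["ticker", "Sector"],
        10: ["XYZ", "Auto"],
        40: ["ABC", "Tech"],
        96: ["DEF", "Mining"],
    }
)

# Promote the ticker row to column labels, keeping only the Sector row,
# then stack under df1; concat aligns the shared ABC/DEF/XYZ columns
df2 = df2.set_index("index").T.set_index("ticker").T
out = pd.concat([df1, df2])
print(out)
```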

Split Two Related DataFrame Columns into Two New DataFrames

I basically have two related columns in a dataframe in Python. One of the columns is binary, i.e. 1, 0, 0, 1, 0, etc., and the next column has a related value, i.e. 200, 34, 124, etc. I want to take all the zero values with their corresponding values in the adjacent column and create a new dataframe, and do the same for all the ones. An illustration of the columns is below:
Location Price
1 24
0 200
0 56
0 89
1 101
1 94
1 3
You can make two new dataframes with just ones and zeros like this, IIUC:
df[df.Location == 0]
# Location Price
#1 0 200
#2 0 56
#3 0 89
df[df.Location == 1]
# Location Price
#0 1 24
#4 1 101
#5 1 94
#6 1 3
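As a self-contained sketch of the same boolean-mask filtering, with the question's data rebuilt inline:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    {"Location": [1, 0, 0, 0, 1, 1, 1], "Price": [24, 200, 56, 89, 101, 94, 3]}
)

# Boolean masks pick the rows; .copy() detaches each result from the
# original frame so later edits don't trigger chained-assignment warnings
zeros = df[df.Location == 0].copy()
ones = df[df.Location == 1].copy()
print(zeros)
print(ones)
```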

Pandas Multiindex get values from first entry of index

I have the following multiindex dataframe:
from io import StringIO
import pandas as pd
datastring = StringIO("""File,no,runtime,value1,value2
A,0, 0,12,34
A,0, 1,13,34
A,0, 2,23,34
A,1, 6,23,38
A,1, 7,22,38
B,0,17,15,35
B,0,18,17,35
C,0,34,23,32
C,0,35,21,32
""")
df = pd.read_csv(datastring, sep=',')
df.set_index(['File','no',df.index], inplace=True)
>> df
runtime value1 value2
File no
A 0 0 0 12 34
1 1 13 34
2 2 23 34
1 3 6 23 38
4 7 22 38
B 0 5 17 15 35
6 18 17 35
C 0 7 34 23 32
8 35 21 32
What I would like to get is just the first value of every (File, no) combination:
A 0 34
A 1 38
B 0 35
C 0 32
The most similar questions I could find were these:
Resample pandas dataframe only knowing result measurement count
MultiIndex-based indexing in pandas
Select rows in pandas MultiIndex DataFrame
but I was unable to construct a solution from them. The best I got was slicing with IndexSlice, but as the values are technically still there (just not displayed), the result is that
idx = pd.IndexSlice
df.loc[idx[:,0],:]
can, for example, filter for the 0 value but still returns every row of the matching groups, not just the first.
Is a multiindex even the right tool for the task at hand? How to solve this?
Use GroupBy.first on the first and second levels of the MultiIndex:
s = df.groupby(level=[0,1])['value2'].first()
print (s)
File no
A 0 34
1 38
B 0 35
C 0 32
Name: value2, dtype: int64
If you need a one-column DataFrame, use a one-element list:
df1 = df.groupby(level=[0,1])[['value2']].first()
print (df1)
value2
File no
A 0 34
1 38
B 0 35
C 0 32
Another idea is to remove the 3rd level with DataFrame.reset_index and drop repeated (File, no) entries with Index.duplicated and boolean indexing:
df2 = df.reset_index(level=2, drop=True)
s = df2.loc[~df2.index.duplicated(), 'value2']
print (s)
File no
A 0 34
1 38
B 0 35
C 0 32
Name: value2, dtype: int64
For the sake of completeness, I would like to add another method (which I would not have found without the answer by jezrael).
s = df.groupby(level=[0,1])['value2'].nth(0)
This can be generalized to finding any entry, not merely the first:
t = df.groupby(level=[0,1])['value1'].nth(1)
Note that the selection was changed from value2 to value1 as for the former, the results of nth(0) and nth(1) would have been identical.
Pandas documentation link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.nth.html
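One behavioral difference worth noting between the two methods: GroupBy.first returns the first non-null value in each group, while nth(0) is purely positional and will happily return a NaN. A small sketch with invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [np.nan, 1.0, 2.0, 3.0]})

# first() returns the first non-null value per group: "a" -> 1.0
by_first = df.groupby("g")["v"].first()

# nth(0) is positional: the row at position 0 of group "a" is the NaN itself
by_nth = df.groupby("g")["v"].nth(0)

print(by_first)
print(by_nth)
```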

Iteratively Capture Value Counts in Single DataFrame

I have a pandas dataframe that looks something like this:
id group gender age_grp status
1 1 m over21 active
2 4 m under21 active
3 2 f over21 inactive
I have over 100 columns and thousands of rows. I am trying to create a single pandas dataframe of the value_counts of each of the columns. So I want something that looks like this:
group1
gender m 100
f 89
age over21 98
under21 11
status active 87
inactive 42
Any one know a simple way I can iteratively concat the value_counts from each of the 100+ columns in the original dataset while capturing the name of the columns as a hierarchical index?
Eventually I want to be able to merge with another dataframe of a different group to look like this:
group1 group2
gender m 100 75
f 89 92
age over21 98 71
under21 11 22
status active 87 44
inactive 42 13
Thanks!
This should do it:
df.stack().groupby(level=1).value_counts()
id 1 1
2 1
3 1
group 1 1
2 1
4 1
gender m 2
f 1
age_grp over21 2
under21 1
status active 2
inactive 1
dtype: int64
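To get the exact layout from the question, name each counts Series after its group and align them side by side with concat; the second frame below is invented for illustration, since the question only shows group1's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["m", "m", "f"],
        "age_grp": ["over21", "under21", "over21"],
        "status": ["active", "active", "inactive"],
    }
)

# One value_counts Series per group; rename() sets the eventual column name
g1 = df.stack().groupby(level=1).value_counts().rename("group1")

# A second, hypothetical group counted the same way
df2 = pd.DataFrame(
    {
        "gender": ["f", "f", "m"],
        "age_grp": ["under21", "over21", "over21"],
        "status": ["inactive", "active", "active"],
    }
)
g2 = df2.stack().groupby(level=1).value_counts().rename("group2")

# Align on the (column, value) MultiIndex; combinations missing from
# one group would show up as NaN
out = pd.concat([g1, g2], axis=1)
print(out)
```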

Pandas individual item using index and column

I have a csv file, test.csv. I am trying to use pandas to select items depending on whether the second value is above a certain value, e.g.:
index A B
0 44 1
1 45 2
2 46 57
3 47 598
4 48 5
So what I would like is: if B is larger than 50, then give me the values in A as integers, which I could assign to a variable.
Edit 1:
Sorry for the poor explanation. The final purpose of this is that I want to look in table 1:
index A B
0 44 1
1 45 2
2 46 57
3 47 598
4 48 5
for any values above 50 in column B and get the column A value and then look in table 2:
index A B
5 44 12
6 45 13
7 46 14
8 47 15
9 48 16
so in the end I want to end up with the values in column B of table 2, which I can print out as integers and not as a Series. If this is not possible using pandas then OK, but is there a way to do it in any case?
You can use dataframe slicing to get the values you want:
import pandas as pd
f = pd.read_csv('yourfile.csv')
f[f['B'] > 50].A
In this code,
f['B'] > 50
is the condition, returning a boolean array of True/False for each row depending on whether it meets the condition; the corresponding A values are then selected.
This would be the output:
2 46
3 47
Name: A, dtype: int64
Is this what you wanted?
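The edit also asks for the matching column-B values from table 2 as plain integers. A sketch, assuming the two tables line up on column A as shown in the question — isin filters table 2 by the qualifying A values, and tolist() converts the result out of a Series:

```python
import pandas as pd

# The two tables from the question, rebuilt inline
table1 = pd.DataFrame({"A": [44, 45, 46, 47, 48], "B": [1, 2, 57, 598, 5]})
table2 = pd.DataFrame(
    {"A": [44, 45, 46, 47, 48], "B": [12, 13, 14, 15, 16]}, index=[5, 6, 7, 8, 9]
)

# A values in table 1 where B exceeds 50
a_vals = table1.loc[table1["B"] > 50, "A"]

# Look up those A values in table 2 and pull column B out as plain ints
result = table2.loc[table2["A"].isin(a_vals), "B"].tolist()
print(result)  # [14, 15]
```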
