This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
Considering the following dataframe:
import pandas as pd
import numpy as np
import random
np.random.seed(10)
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                    'y': [10, 10, 10, 10, 10, 20, 20, 20, 20, 20],
                    'z': np.random.normal(size=10)})
I want to turn the x values into columns and the y values into the index (in decreasing order), with the corresponding z values as the cells. The result should look something like df2:
df2 = pd.DataFrame(np.random.randn(2,5), index = [20,10], columns=[1,2,3,4,5])
How can I convert df1 into df2's style?
You can use pandas.pivot_table:
res = df1.pivot_table(index='y', columns='x', values='z')
You may wish to remove or change your index names, but this is your result:
x          1         2         3         4         5
y
10  1.331587  0.715279 -1.545400 -0.008384  0.621336
20 -0.720086  0.265512  0.108549  0.004291 -0.174600
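The question asks for the y index in decreasing order, while pivot_table returns it sorted ascending. A minimal sketch of one way to get there, assuming the df1 from the question; the chained sort_index and rename_axis calls are an addition to the answer above, not part of it:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                    'y': [10, 10, 10, 10, 10, 20, 20, 20, 20, 20],
                    'z': np.random.normal(size=10)})

res = (df1.pivot_table(index='y', columns='x', values='z')
          .sort_index(ascending=False)               # y descending: 20, then 10
          .rename_axis(index=None, columns=None))    # drop the 'y' and 'x' axis names
print(res)
```

Because every (x, y) pair occurs only once here, the default mean aggregation of pivot_table leaves the z values unchanged.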
This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Closed 7 months ago.
Is there any way that I can drop the value if its index = column index.
I mean, this is my toy dataframe
d = {'Non': [1, 2,4,5,2,7], 'Schzerando': [3, 4,8,4,7,7], 'cc': [1,2,0.75,0.25,0.3,1]}
df = pd.DataFrame(data=d)
df
Then I just want to keep the rows where df["cc"] is 1 or 2, like this:
Toy dataframe to try.
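You can filter the rows with a boolean mask. Wrap each comparison in parentheses, because | binds more tightly than ==. Casting cc with astype('Int64') first is not needed here, and it raises an error on fractional values such as 0.75.
df = df[(df['cc'] == 1) | (df['cc'] == 2) | (df['cc'] == 3)]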
Or you can put all the values you want to keep in a list and use pandas isin:
f_list = [1,2,3]
df[df['cc'].isin(f_list)]
df[(df['cc'] == 1) | (df['cc'] == 2)]
Output:
   Non  Schzerando   cc
0    1           3  1.0
1    2           4  2.0
5    7           7  1.0
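To check that the parenthesised mask and isin select the same rows, here is a small sketch on the toy dataframe from the question. The variable names mask_or, mask_isin and kept are made up for illustration:

```python
import pandas as pd

d = {'Non': [1, 2, 4, 5, 2, 7],
     'Schzerando': [3, 4, 8, 4, 7, 7],
     'cc': [1, 2, 0.75, 0.25, 0.3, 1]}
df = pd.DataFrame(data=d)

# Each comparison needs its own parentheses: | binds tighter than ==
mask_or = (df['cc'] == 1) | (df['cc'] == 2)

# isin expresses the same test against a list of allowed values
mask_isin = df['cc'].isin([1, 2])

kept = df[mask_isin]
print(mask_or.equals(mask_isin))
print(kept)
```

With more than two or three allowed values, isin tends to stay readable where a chain of | comparisons does not.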
I have a list of arrays:
[array([10,20,30]), array([5,6,7])]
How can I turn it into a pandas DataFrame? pd.DataFrame() puts the arrays in one column. The desired result is:
0 1 2
10 20 30
5 6 7
Here, 0 1 2 are the column names.
import pandas as pd
import numpy as np
a = [np.array([10,20,30]), np.array([5,6,7])]
print(pd.DataFrame(a))
Make sure you prefix array with np, as in np.array.
import pandas as pd
import numpy as np
arrays = [np.array([10,20,30]), np.array([5,6,7])]  # named arrays to avoid shadowing the built-in list
df = pd.DataFrame(arrays)
print(df)
output:
    0   1   2
0  10  20  30
1   5   6   7
If you still get an error, is the list of arrays a result from previous data manipulation or did you manually type out the values / array lists?
This question already has answers here:
How to apply a function to two columns of Pandas dataframe
(15 answers)
Closed 3 years ago.
I am new to Pandas. Would like to know how to apply a function to two columns in a dataframe and map the output from the function to a new column in the dataframe. Is this at all possible with pandas syntax or should I resort to native Python to iterate over the rows in the dataframe columns to generate the new column?
a b
1 2
3 1
2 9
Question is how to get, for example, the multiplication of the two numbers in a new column c
a b c
1 2 2
3 1 3
2 9 18
You can do this with pandas.
For example:
def funcMul(row):
    return row['a']*row['b']
Then,
df['c'] = df.apply(funcMul, axis=1)
Output:
   a  b   c
0  1  2   2
1  3  1   3
2  2  9  18
You can do the following with pandas
import pandas as pd
def func(r):
    # index by label; positional r[0] on a labelled Series is deprecated in recent pandas
    return r['a']*r['b']
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df['c'] = df.apply(func, axis = 1)
Also, here is the official documentation https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
The comment by harvpan shows the simplest way to achieve your specific example, but here is a generic way to do what you asked:
def functionUsedInApply(row):
    """ The function logic for the apply function comes here.
    row: A Pandas Series containing a row of df.
    """
    return row['a'] * row['b']

def functionUsedInMap(value):
    """ This function is used in the map after the apply.
    For this example, if the value is larger than 5,
    return the cube, otherwise, return the square.
    value: a value of whatever type is returned by functionUsedInApply.
    """
    if value > 5:
        return value**3
    else:
        return value**2

df['new_column_name'] = df.apply(functionUsedInApply, axis=1).map(functionUsedInMap)
The code above first multiplies columns a and b, then returns the square of that product when a*b <= 5 and the cube when a*b > 5.
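For element-wise arithmetic like the product in the question, plain column arithmetic is usually enough and avoids calling a Python function once per row. The sketch below rebuilds the example frame from the question. Using numpy.where for the square-or-cube step is an assumption about how that step could be vectorized, not something taken from the answers above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 2], 'b': [2, 1, 9]})

# Column arithmetic is applied element-wise, row by row
df['c'] = df['a'] * df['b']

# Vectorized counterpart of the apply + map pipeline:
# cube the product where it exceeds 5, square it otherwise
df['new_column_name'] = np.where(df['c'] > 5, df['c'] ** 3, df['c'] ** 2)
print(df)
```

apply with axis=1 remains the more general tool when the per-row logic cannot be written as column operations.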
This question already has answers here:
Conditional Replace Pandas
(7 answers)
Closed 3 years ago.
I want to set the "Score" column to 1 in each row where the condition below is true:
import pandas as pd
import numpy as np
df = pd.read_csv(Path1 + 'Test.csv')
df.replace(np.nan, 0, inplace=True)
df[(df.Day7 >= 500)]
Sample Value
Output
Could you please try the following?
df['score']=np.where(df['Day7']>=500,1,"")
Or, as per the OP's comment (adding @anky_91's enhanced solution here):
np.where((df['Day7']>=500)&(df['Day7']<1000),1,"")
Printing df then gives the following output:
          Cat  Day7 score
0  Advertisir   145
1       Blogs    56
2    Business    92
3   Classfied    23
4   Continuin   110
5   Corporate  1974     1
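If a numeric column is preferred over blank strings, one option is to fill the non-matching rows with 0 instead of "". A minimal sketch, assuming only the Day7 values shown in the output above; the column name InRange is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Day7': [145, 56, 92, 23, 110, 1974]})

# 1 where Day7 is at least 500, 0 everywhere else
df['Score'] = np.where(df['Day7'] >= 500, 1, 0)

# Same idea for the bounded variant: 500 <= Day7 < 1000
df['InRange'] = np.where((df['Day7'] >= 500) & (df['Day7'] < 1000), 1, 0)
print(df)
```

Keeping the column numeric makes later sums or filters on it straightforward.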
You are halfway there. Just use df.loc[mask, "Score"] = 1:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Day7":np.random.rand(5)*1000,
"Score": np.random.rand(5)})
print(df)
df.loc[(df.Day7>=500), "Score"] = 1
print(df)
df = df.assign(Score=0)
df.Score = (df.Day7 >= 500).astype(int)  # cast the boolean mask to 0/1
This question already has an answer here:
Dataframe selecting Max for a column but output values of another
(1 answer)
Closed 4 years ago.
In the following code, how can I select, for each row, the element of data2 at the column index given by data1.idxmax(axis=1)?
data1 = pd.DataFrame([[1,2], [4,3], [5,6]])
data2 = pd.DataFrame([[10,20], [30,40], [50,60]])
data1.idxmax(axis=1)
The result should be a pd.Series or pd.DataFrame containing [20,30,60].
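Use the lookup function (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only works on older versions):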
i = data1.idxmax(axis=1)
data2.lookup(i.index, i.values)
This will give you an array with the values. To get the result as a Series, simply create it:
pd.Series(data2.lookup(i.index, i.values))
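On pandas versions that no longer provide DataFrame.lookup, the same selection can be sketched with NumPy integer indexing. This is one possible replacement, assuming data1 and data2 share the same shape and labels as in the question; the names rows, cols and result are made up for illustration:

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame([[1, 2], [4, 3], [5, 6]])
data2 = pd.DataFrame([[10, 20], [30, 40], [50, 60]])

i = data1.idxmax(axis=1)                          # column label of the max in each row
rows = np.arange(len(data2))                      # positional row numbers 0..n-1
cols = data2.columns.get_indexer(i.to_numpy())    # column labels -> positions

# Pick one element per row: (row 0, cols[0]), (row 1, cols[1]), ...
result = pd.Series(data2.to_numpy()[rows, cols], index=data2.index)
print(result)
```

Converting labels to positions with get_indexer keeps the sketch valid when the column labels are not simply 0, 1, 2.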
You can try max with axis=1 and eq with axis=0:
data2[data1.eq(data1.max(axis=1), axis=0)].stack()
Out[193]:
0  1    20.0
1  0    30.0
2  1    60.0
dtype: float64