How to add value if condition matches, python [duplicate] - python

This question already has answers here:
Conditional Replace Pandas
(7 answers)
Closed 3 years ago.
I want to add "1" in each row of the "Score" column where the statement below is true:
import pandas as pd
import numpy as np
df = pd.read_csv(Path1 + 'Test.csv')
df.replace(np.nan, 0, inplace=True)
df[(df.Day7 >= 500)]

Could you please try the following.
df['score']=np.where(df['Day7']>=500,1,"")
Or, as per the OP's comment (adding #anky_91's enhanced solution here):
np.where((df['Day7']>=500)&(df['Day7']<1000),1,"")
When we print df, the following will be the output.
Cat Day7 score
0 Advertisir 145
1 Blogs 56
2 Business 92
3 Classfied 23
4 Continuin 110
5 Corporate 1974 1
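For completeness, here is a minimal runnable sketch of both expressions; the sample values and column names are assumptions based on the output above. Note that mixing 1 with "" leaves the column non-numeric, so using 0 or np.nan as the fallback is often preferable.
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the output shown above
df = pd.DataFrame({'Cat': ['Advertisir', 'Blogs', 'Corporate'],
                   'Day7': [145, 56, 1974]})

# 1 where Day7 >= 500, empty string otherwise
df['score'] = np.where(df['Day7'] >= 500, 1, "")

# Enhanced condition from the comment: 1 only when 500 <= Day7 < 1000
df['banded'] = np.where((df['Day7'] >= 500) & (df['Day7'] < 1000), 1, "")
print(df)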

You are halfway there. Just use df.loc[mask, "Score"] = 1:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Day7": np.random.rand(5)*1000,
                   "Score": np.random.rand(5)})
print(df)
df.loc[(df.Day7>=500), "Score"] = 1
print(df)

df = df.assign(Score=0)    # initialise the column
df.Score = df.Day7 >= 500  # note: this stores True/False rather than 1/0
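If literal 1/0 values are wanted rather than booleans, the comparison can be cast to int; a small sketch with made-up Day7 values:
import pandas as pd

df = pd.DataFrame({'Day7': [145, 56, 1974]})   # hypothetical sample values
df['Score'] = (df['Day7'] >= 500).astype(int)  # True/False cast to 1/0
print(df)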

Related

How to turn list of arrays into dataframe?

I have a list of arrays:
[array([10,20,30]), array([5,6,7])]
How do I turn it into a pandas DataFrame? pd.DataFrame() puts the arrays in one column. The desired result is:
0 1 2
10 20 30
5 6 7
0 1 2 here are the column names.
import pandas as pd
import numpy as np
a = [np.array([10,20,30]), np.array([5,6,7])]
print(pd.DataFrame(a))
Make sure you put the np. prefix before array(...).
import pandas as pd
import numpy as np
arrays = [np.array([10,20,30]), np.array([5,6,7])]  # avoid shadowing the built-in name list
df = pd.DataFrame(arrays)
print(df)
output:
0 1 2
0 10 20 30
1 5 6 7
If you still get an error, is the list of arrays a result from previous data manipulation or did you manually type out the values / array lists?
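If the arrays do come from earlier processing, it can help to confirm they all have the same length before building the frame; a quick check, assuming the list is named a as above:
import numpy as np
import pandas as pd

a = [np.array([10, 20, 30]), np.array([5, 6, 7])]

# Verify every array has the same shape before constructing the DataFrame
print([arr.shape for arr in a])   # [(3,), (3,)]

df = pd.DataFrame(np.vstack(a))   # stacking explicitly also works when the shapes match
print(df)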

Dropping every row with len > 2, Pandas, Python

Suppose I have a dataframe
. Values
0 25
1 897
2 48
3 28
4 214
5 25
I am trying to drop all rows with len > 2 with the following code but nothing happens when I run it.
import pandas as pd
df = pd.read_csv('File.csv')
for index in df.index:
    if len(df.loc[index, 'Sevens']) > 2:
        df.drop([index])
    else:
        pass
Use Series.str.len in boolean indexing:
df1 = df[df['Value'].str.len() <=2]
If the values are numbers:
df1 = df[df['Value'].astype(str).str.len() <=2]
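A short runnable sketch of the same idea, using the sample values from the question (the column name Value is an assumption, matching the answer above):
import pandas as pd

# Sample values from the question
df = pd.DataFrame({'Value': [25, 897, 48, 28, 214, 25]})

# Keep only rows whose value has at most 2 characters when written as a string
df1 = df[df['Value'].astype(str).str.len() <= 2]
print(df1)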

How do I replace the values of a dataframe on a condition

I have a pandas dataframe with more than 50 columns. All the data except the 1st column is float. I want to replace any value greater than 5.75 with 100. Can someone advise a function to do this?
The replace function is not working, since its to-replace value only matches on equality and cannot take a greater-than condition.
This can be done using np.where:
df['ColumnName'] = np.where(df['ColumnName'] > 5.75, 100, df['ColumnName'])
You can make a custom function and pass it to apply:
import pandas as pd
import random
df = pd.DataFrame({'col_name': [random.randint(0,10) for x in range(100)]})
def f(x):
    if x >= 5.75:
        return 100
    return x
df['modified'] = df['col_name'].apply(f)
print(df.head())
col_name modified
0 2 2
1 5 5
2 7 100
3 1 1
4 9 100
If you have a dataframe:
import pandas as pd
import random
df = pd.DataFrame({'first_column': [random.uniform(5,6) for x in range(10)]})
print(df)
Gives me:
first_column
0 5.620439
1 5.640604
2 5.286608
3 5.642898
4 5.742910
5 5.096862
6 5.360492
7 5.923234
8 5.489964
9 5.127154
Then check if the value is greater than 5.75:
df[df > 5.75] = 100
print(df)
Gives me:
first_column
0 5.620439
1 5.640604
2 5.286608
3 5.642898
4 5.742910
5 5.096862
6 5.360492
7 100.000000
8 5.489964
9 5.127154
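Since the question says the first column is not float, one option (a sketch, not part of the original answer) is to restrict the comparison to the numeric columns only, e.g. via select_dtypes:
import pandas as pd

# Hypothetical frame: one text column plus float columns
df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'x': [5.1, 5.9, 6.2],
                   'y': [5.8, 1.0, 5.76]})

num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].mask(df[num_cols] > 5.75, 100)   # replace values above the threshold
print(df)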
import numpy as np
import pandas as pd
# Create df
np.random.seed(0)
df = pd.DataFrame(2*np.random.randn(100,50))
for col_name in df.columns[1:]:  # skip the first column
    df.loc[:,col_name][df.loc[:,col_name] > 5.75] = 100
Or, as a one-liner (this assumes a column literally named value):
np.where(df.value > 5.75, 100, df.value)
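The chained indexing in the loop above can trigger pandas' SettingWithCopyWarning; an equivalent sketch using a single .loc assignment with a boolean row mask per column:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(2 * np.random.randn(100, 50))

for col_name in df.columns[1:]:                   # skip the first column
    df.loc[df[col_name] > 5.75, col_name] = 100   # one .loc assignment, no chained indexing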

Converting DataFrame in Python [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
Considering the following dataframe:
import pandas as pd
import numpy as np
import random
np.random.seed(10)
df1 = pd.DataFrame({'x': [1,2,3,4,5,1,2,3,4,5],
                    'y': [10,10,10,10,10,20,20,20,20,20],
                    'z': np.random.normal(size=10)
                    })
I want to convert the x values into columns and y values into index (decreasing) with corresponding z values in the dataframe. It's something like this df2:
df2 = pd.DataFrame(np.random.randn(2,5), index = [20,10], columns=[1,2,3,4,5])
How can I convert df1 into df2's style?
You can use pandas.pivot_table:
res = df1.pivot_table(index='y', columns='x', values='z')
You may wish to remove or change your index names, but this is your result:
x 1 2 3 4 5
y
10 1.331587 0.715279 -1.545400 -0.008384 0.621336
20 -0.720086 0.265512 0.108549 0.004291 -0.174600
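The desired df2 has y in decreasing order (20 above 10), while pivot_table sorts the index ascending; a follow-up sort_index flips it (a small addition, not part of the original answer):
import numpy as np
import pandas as pd

np.random.seed(10)
df1 = pd.DataFrame({'x': [1,2,3,4,5,1,2,3,4,5],
                    'y': [10,10,10,10,10,20,20,20,20,20],
                    'z': np.random.normal(size=10)})

res = df1.pivot_table(index='y', columns='x', values='z')
res = res.sort_index(ascending=False)   # put y = 20 above y = 10, matching df2's layout
print(res)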

Count the majority value across rows in a DataFrame in Python

I have a DataFrame like:
df = np.array([[1,5,3,4,5,5,6,],[1,2,2,3,4,5,6],[1,2,3,4,5,6,6]])
df = pd.DataFrame(df)
and my expected output is the majority value of each row, like:
0 5
1 2
2 6
I'm new to Pandas. Thank you for any help.
With pandas version 0.13.0, you can use df.mode(axis=1)
(check your version with pd.__version__)
df.mode(axis=1)
0
0 5
1 2
2 6
[3 rows x 1 columns]
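df.mode(axis=1) returns a DataFrame because a row can have more than one mode; to get a single column like the expected output, you can take the first mode column, for example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1,5,3,4,5,5,6],
                            [1,2,2,3,4,5,6],
                            [1,2,3,4,5,6,6]]))

row_modes = df.mode(axis=1)[0]   # first mode of each row as a Series
print(row_modes)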
The concept you are looking for is the mode, which is the most commonly occurring number in a set. SciPy and Pandas both have ways to compute modes, through scipy.stats.mode and pandas.DataFrame.mode (which works along an axis). So for this example you could say:
import numpy as np
import scipy.stats

df = np.array([[1,5,3,4,5,5,6],[1,2,2,3,4,5,6],[1,2,3,4,5,6,6]])
results = np.zeros(len(df))                                      # one slot per row
for i in np.arange(len(df)):
    results[i] = np.atleast_1d(scipy.stats.mode(df[i]).mode)[0]  # most common value in row i
This returns a NumPy array with the mode of each row. To do the same thing with Pandas you can do:
df = np.array([[1,5,3,4,5,5,6,],[1,2,2,3,4,5,6],[1,2,3,4,5,6,6]])
df = pd.DataFrame(df)
df.mode(axis = 1)
The documentation is here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.mode.html
