I want to modify specific data points in a pandas DataFrame under a condition. For example, in the following table, I want to divide the values in Column A by 2, but only in rows where Column B is greater than 1.
Column A  Column B
1         1
2         1
3         1
4         2
5         2
6         2
7         1
8         1
Expected output:
Column A  Column B
1         1
2         1
3         1
2         2
2.5       2
3         2
7         1
8         1
How can I modify the data frame with pandas?
df.loc[df["Column B"] > 1,"Column A"] = df["Column A"]/2
Hope it Helps...
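A minimal runnable sketch of the same idea, using the data from the question. Casting Column A to float first is my own addition (an assumption about what you want): halved values such as 2.5 do not fit in an integer column, and recent pandas versions may warn about the implicit upcast.

import pandas as pd

df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, 5, 6, 7, 8],
    "Column B": [1, 1, 1, 2, 2, 2, 1, 1],
})

# Cast to float first so that halved values such as 2.5 fit in the column.
df["Column A"] = df["Column A"].astype(float)

# Boolean mask selecting the rows where Column B is greater than 1.
mask = df["Column B"] > 1

# Only the selected rows of Column A are overwritten.
df.loc[mask, "Column A"] = df.loc[mask, "Column A"] / 2
print(df)

Rows where Column B is 1 are left untouched.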
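You can try 'where'. It keeps the values where the condition is True and replaces the rest, so you pass the opposite condition (B <= 1) and give the replacement values as the second argument.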
import pandas as pd
data=pd.DataFrame(data={'A':[1,2,3,4,5,6,7,8],'B':[1,1,1,2,2,2,1,1]})
data.A=data.A.where(data.B<=1,data.A/2)
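If inverting the condition feels awkward, Series.mask is the counterpart of where: it replaces values where the condition is True. A small sketch with the same data (the variable names follow the snippet above):

import pandas as pd

data = pd.DataFrame(data={'A': [1, 2, 3, 4, 5, 6, 7, 8],
                          'B': [1, 1, 1, 2, 2, 2, 1, 1]})

# mask() replaces the values where the condition is True,
# so the condition can be written exactly as in the question.
data['A'] = data['A'].mask(data['B'] > 1, data['A'] / 2)
print(data)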
try this:
df["A"]=df.apply(lambda x:x["A"]/2 if x["B"]>1 else x["A"],axis=1)
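Note that apply with axis=1 calls the lambda once per row, which can get slow on large frames. A vectorized sketch of the same logic with numpy.where, assuming the columns are named A and B as in the line above:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8],
                   "B": [1, 1, 1, 2, 2, 2, 1, 1]})

# np.where(condition, value_if_true, value_if_false) works element-wise
# on whole columns, so no Python-level loop over rows is needed.
df["A"] = np.where(df["B"] > 1, df["A"] / 2, df["A"])
print(df)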
Related
I am working with a pandas dataframe where I have the following two columns: "personID" and "points". I would like to create a third variable ("localMin") which will store the minimum value of the column "points" at each point in the dataframe as compared with all previous values in the "points" column for each personID (see image below).
Does anyone have an idea how to achieve this most efficiently? I have approached this problem using shift() with different period sizes, but of course, shift is sensitive to variations in the sequence and doesn't always produce the output I would expect.
Thank you in advance!
Use groupby.cummin:
df['localMin'] = df.groupby('personID')['points'].cummin()
Example:
df = pd.DataFrame({'personID': list('AAAAAABBBBBB'),
'points': [3,4,2,6,1,2,4,3,1,2,6,1]
})
df['localMin'] = df.groupby('personID')['points'].cummin()
output:
personID points localMin
0 A 3 3
1 A 4 3
2 A 2 2
3 A 6 2
4 A 1 1
5 A 2 1
6 B 4 4
7 B 3 3
8 B 1 1
9 B 2 1
10 B 6 1
11 B 1 1
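Since the question mentions that shift() was sensitive to the row sequence, here is a small sketch showing that the grouped cummin does not need each person's rows to be contiguous. The interleaved data below is made up for illustration:

import pandas as pd

# Hypothetical data: rows of persons A and B are interleaved.
df = pd.DataFrame({'personID': ['A', 'B', 'A', 'B'],
                   'points': [3, 4, 2, 5]})

# The running minimum is computed within each personID group,
# and the result is aligned back to the original row order.
df['localMin'] = df.groupby('personID')['points'].cummin()
print(df)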
So suppose I have a dataframe like:
A B
0 1 1
1 2 4
2 3 9
I want to have one long dataframe where there are three columns row, col, value like:
row col value
0 0 A 1
1 1 A 2
2 2 A 3
3 0 B 1
4 1 B 4
5 2 B 9
Basically making a 2D array into 1D and remembering the row and column of each entry so the resulting dataframe would be of shape (n*m , 3).
How is this possible with Pandas?
Actually the order of entries in the resulting dataframe isn't important for me.
Use melt:
df = df.reset_index()
df.melt(id_vars=['index'], value_vars=['A','B'])
This should give you what you want, with the columns named index, variable and value. Let me know if it works.
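To get the exact column names from the question (row, col, value), one possible sketch is to name the index before resetting it and to pass var_name and value_name to melt. The name long_df is just a placeholder:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 9]})

long_df = (
    df.rename_axis('row')   # name the index so reset_index() creates a 'row' column
      .reset_index()
      .melt(id_vars='row', var_name='col', value_name='value')
)
print(long_df)

The result has n*m rows and three columns, one row per cell of the original frame.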
For example, I have the following dataframe:
I want to transform the dataframe from above to something like this:
Thanks for any kind of help!
Run:
df['Number'] = df.svn_changes.str.match(r'r\d+').cumsum()
Yes, it is possible with str.contains, a regex, and cumsum:
df = pd.DataFrame({'svn_changes':['r123456','RowValueRow','ValueRowValue',
'some_string_string','r234566','ValueRowValue',
'some_string_string','r123789','something_here',
'ValueRowValue','String_2','String_4']})
df['Number'] = df['svn_changes'].str.contains(r'r\d+').cumsum()
print(df)
Output:
svn_changes Number
0 r123456 1
1 RowValueRow 1
2 ValueRowValue 1
3 some_string_string 1
4 r234566 2
5 ValueRowValue 2
6 some_string_string 2
7 r123789 3
8 something_here 3
9 ValueRowValue 3
10 String_2 3
11 String_4 3
Here's a simple reusable line you can use to do that:
df['new_col'] = df['old_col'].str.contains('string_to_match')*1
The new column will have value 1 if the string is present in this column, and 0 otherwise.
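One caveat: if the column contains missing values, str.contains returns NaN for them by default, and the multiplication keeps the NaN. A sketch that treats missing values as "no match" via na=False; the sample data is made up, and regex=False is my assumption that a literal substring match is intended:

import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({'old_col': ['has string_to_match inside', 'no match', None]})

# na=False treats missing values as "not present";
# regex=False matches the text literally instead of as a pattern.
df['new_col'] = (
    df['old_col']
    .str.contains('string_to_match', regex=False, na=False)
    .astype(int)
)
print(df)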
I have a dataframe extracted from an Excel file which I have manipulated to be in the following form (there are multiple rows, but this is reduced to make my question as clear as possible):
|A|B|C|A|B|C|
index 0: 1 2 3 4 5 6
As you can see there are repetitions of the column names. I would like to merge this dataframe to look like the following:
|A|B|C|
index 0: 1 2 3
index 1: 4 5 6
I have tried to use the melt function but have not had any success thus far.
import pandas as pd
df = pd.DataFrame([[1,2,3,4,5,6]], columns = ['A', 'B','C','A', 'B','C'])
df
A B C A B C
0 1 2 3 4 5 6
pd.concat(x for _, x in df.groupby(df.columns.duplicated(), axis=1))
A B C
0 1 2 3
0 4 5 6
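As far as I know, groupby with axis=1 is deprecated in recent pandas releases, and df.columns.duplicated() only separates the first occurrence from all later ones, so it would not split three or more repeats cleanly. A possible alternative sketch numbers each occurrence of a column name and stacks one block per occurrence (the names occurrence, blocks and result are placeholders):

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                  columns=['A', 'B', 'C', 'A', 'B', 'C'])

# For each column, count how many times its name has appeared before:
# the first A/B/C get 0, the second A/B/C get 1, and so on.
names = pd.Series(df.columns)
occurrence = names.groupby(names).cumcount().to_numpy()

# Select one block of columns per occurrence number and stack the blocks.
blocks = [df.loc[:, occurrence == k] for k in range(occurrence.max() + 1)]
result = pd.concat(blocks, ignore_index=True)
print(result)

With ignore_index=True the rows are renumbered 0, 1 instead of repeating the original index.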
How can I drop rows that are exact duplicates of another row? Say I have a data frame that looks like this:
A B C
1 2 3
3 2 2
1 2 3
My real data frame is a lot larger than this. Is there a way to have Python look at every row and, if all the values in a row are exactly the same as in another row, drop that row? I want this to take the whole data frame into account; I don't want to specify which columns to check for unique values.
You can use the DataFrame.drop_duplicates() method:
In [23]: df
Out[23]:
A B C
0 1 2 3
1 3 2 2
2 1 2 3
In [24]: df.drop_duplicates()
Out[24]:
A B C
0 1 2 3
1 3 2 2
You can get a de-duplicated dataframe with the inverse of .duplicated (the column list is optional; with no argument, all columns are compared):
df[~df.duplicated(['A','B','C'])]
Returns:
>>> df[~df.duplicated(['A','B','C'])]
A B C
0 1 2 3
1 3 2 2
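For completeness, a small sketch of two drop_duplicates options that may be useful here, using the data from the question. The variable names are placeholders:

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1], 'B': [2, 2, 2], 'C': [3, 2, 3]})

# Keep the first copy of each duplicated row and renumber the index.
deduped = df.drop_duplicates(ignore_index=True)

# keep=False drops every row that has a duplicate, including the first copy.
no_repeats = df.drop_duplicates(keep=False)

print(deduped)
print(no_repeats)

Neither call modifies df itself; both return a new DataFrame.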