[Dataframe image]
The operation I intend to perform: whenever there is a '2' in column 3, take the 'column 1' value of that entry, subtract the 'column 1' value of the previous entry, and multiply the result by a constant integer (say 5).
For example: in the image there is a '2' in column 3 at 6:00; the 'column 1' value of that entry is 0.011333 and the previous 'column 1' entry is 0.008583, so the calculation is:
(0.011333 - 0.008583) * 5
I want to perform this every time a '2' appears in column 3 of the dataframe. Please help; I am not able to get the right code to perform the above operation.
Hope this helps:
You can use df.shift(1) to get the previous row's values and np.where to apply the calculation only to the rows satisfying your condition:
import pandas as pd
import numpy as np

df = pd.DataFrame([['ABC', 1, 0, 0],
                   ['DEF', 2, 0, 0],
                   ['GHI', 3, 0, 0],
                   ['JKL', 4, 0, 2],
                   ['MNO', 5, 0, 2],
                   ['PQR', 6, 0, 2],
                   ['STU', 7, 0, 0]],
                  columns=['Date & Time', 'column 1', 'column 2', 'column 3'])
df['new'] = np.where(df['column 3'] == 2, (df['column 1'] - df['column 1'].shift(1)) * 5, 0)
print(df)
Output:
  Date & Time  column 1  column 2  column 3  new
0         ABC         1         0         0  0.0
1         DEF         2         0         0  0.0
2         GHI         3         0         0  0.0
3         JKL         4         0         2  5.0
4         MNO         5         0         2  5.0
5         PQR         6         0         2  5.0
6         STU         7         0         0  0.0
You can adjust the calculation as needed. In the else part you can put np.nan or any other value or calculation.
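For instance, a minimal variant of the np.where line above (assuming the same df) that leaves non-matching rows as NaN instead of 0:

df['new'] = np.where(df['column 3'] == 2,
                     (df['column 1'] - df['column 1'].shift(1)) * 5,
                     np.nan)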
Would something like this do the job?
dataframe = [
    [1, 3, 6, 6, 7],
    [4, 3, 5, 6, 7],
    [12, 3, 2, 6, 7],
    [2, 3, 7, 6, 7],
    [9, 3, 5, 6, 7],
    [13, 3, 2, 6, 7]
]

constant = 5
list_of_outputs = []
for row in dataframe:
    if row[2] == 2:  # a '2' in the third column
        try:
            output = (row[0] - prev_entry) * constant
            list_of_outputs.append(output)
        except NameError:  # no previous entry yet
            print("No previous entry!")
    prev_entry = row[0]
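If the data actually lives in a pandas DataFrame rather than a plain list of lists, a minimal sketch of the same idea (assuming a DataFrame df with the question's column names 'column 1' and 'column 3') could be:

constant = 5
outputs = []
prev_value = None
for _, row in df.iterrows():
    if row['column 3'] == 2 and prev_value is not None:
        outputs.append((row['column 1'] - prev_value) * constant)
    prev_value = row['column 1']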
Perhaps this question will help you
I think in an SQL way, so basically you first make a new column filled with the value from the row above it:
df['column1_lagged'] = df['column 1'].shift(1)
Then you create another column that does the calculation:
constant = 5
df['calculation'] = (df['column 1'] - df['column1_lagged'])*constant
After that you just slice the dataframe by your condition (rows where column 3 is 2):
condition = df['column 3'] == 2
df[condition]
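Putting the three steps together, one possible sketch (column names taken from the question):

constant = 5
df['column1_lagged'] = df['column 1'].shift(1)
df['calculation'] = (df['column 1'] - df['column1_lagged']) * constant
result = df[df['column 3'] == 2]  # only the rows where column 3 is 2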
Related
I have the following dataframe:
df = pd.DataFrame({'A': [0, 1, 0],
                   'B': [1, 1, 1]},
                  index=['2020-01-01', '2020-02-01', '2020-03-01'])
I'm trying to replace every value where a 1 is present with an increasing number. I'm looking for something like:
df.replace(1, value=3)
which works great, but instead of the fixed number 3 I need the number to keep increasing (I want to use it as an ID), something like:
number += 1
If I combine those two, it doesn't work (or at least I can't find the correct syntax). I'd like to obtain the following result:
df = pd.DataFrame({'A': [0, 2, 0],
                   'B': [1, 3, 4]},
                  index=['2020-01-01', '2020-02-01', '2020-03-01'])
Note: I cannot use any command that relies on specifying column or row names, because the table has 2600 columns and 5000 rows.
Element-wise assignment on the array obtained from df.values can work.
More specifically, a range starting from 1 up to the number of 1s (inclusive) is assigned onto the locations of the 1 elements in the value array. The modified array is then put back into the original dataframe.
Code
(Data as given)
1. Row-first ordering (what the OP wants)
arr = df.values
mask = (arr > 0)
arr[mask] = range(1, mask.sum() + 1)
for i, col in enumerate(df.columns):
    df[col] = arr[:, i]
# Result
print(df)
A B
2020-01-01 0 1
2020-02-01 2 3
2020-03-01 0 4
2. Column-first ordering (another possibility)
arr_tr = df.values.transpose()
mask_tr = (arr_tr > 0)
arr_tr[mask_tr] = range(1, mask_tr.sum() + 1)
for i, col in enumerate(df.columns):
    df[col] = arr_tr[i, :]
# Result
print(df)
A B
2020-01-01 0 2
2020-02-01 1 3
2020-03-01 0 4
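As a small variant (not part of the original answer), the per-column loop can be avoided by writing the whole array back in one step, e.g. for the row-first case:

import numpy as np

arr = df.values                            # start again from the original data
mask = (arr > 0)
arr[mask] = np.arange(1, mask.sum() + 1)   # row-first numbering
df[:] = arr                                # write the modified array back into the frame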
I have a pandas dataframe and want to replace the current row's data with the previous row's data if the value of a certain column in the current row is 1, but I have had no success yet. Any help appreciated.
It could be done like this:
# B is the column that lets you know if the row should change or not
for i in range(1, len(df)):
    if df.loc[i, 'B'] == 1:
        df.loc[i, :] = df.loc[i-1, :]
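As a quick illustration of this loop, a minimal sketch with made-up data (assuming a default RangeIndex, since df.loc[i, ...] is indexed by label here):

import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [0, 1, 0]})
for i in range(1, len(df)):
    if df.loc[i, 'B'] == 1:
        df.loc[i, :] = df.loc[i - 1, :]
print(df)
#     A  B
# 0  10  0
# 1  10  0
# 2  30  0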
This can be done just using the shift operator and assigning. If b is the column we want to condition on, then we first create a condition based on b:
import pandas as pd
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [0, 1, 1, 0, 0, 1],
    'c': [7, 8, 9, 0, 1, 2]}
)
# this is our condition (which row we're changing)
index_to_change = df['b'] == 1
# this is a list of columns we want to change
cols_to_change = ['a', 'c']
df.loc[index_to_change, cols_to_change] = df[cols_to_change].shift(1).loc[index_to_change]
Output:
In []: df
Out[]:
a b c
0 1.0 0 7.0
1 1.0 1 7.0
2 2.0 1 8.0
3 4.0 0 0.0
4 5.0 0 1.0
5 5.0 1 1.0
I have a Pandas dataframe df with 102 columns. Each column is named differently, say A, B, C, etc., giving the original dataframe the following structure:
Column A. Column B. Column C. ....
Row 1.
Row 2.
---
Row n
I would like to change the column names from A, B, C, etc. to F1, F2, F3, ..., F102. I tried using df.columns but wasn't successful in renaming them this way. Is there a simple way to rename all columns to F1 through F102 automatically, instead of renaming each column individually?
df.columns=["F"+str(i) for i in range(1, 103)]
Note:
Instead of a “magic” number 103 you may use the calculated number of columns (+ 1), e.g.
len(df.columns) + 1, or
df.shape[1] + 1.
(Thanks to ALollz for this tip in his comment.)
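For example, a version that derives the count from the frame itself:

df.columns = ["F" + str(i) for i in range(1, df.shape[1] + 1)]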
One way to do this is to work with a pair of lists: take the column names as a list and rename each entry by its loop index:
import pandas as pd
d = {'Column A': [1, 2, 3, 4, 5, 4, 3, 2, 1], 'Column B': [1, 2, 3, 4, 5, 4, 3, 2, 1], 'Column c': [1, 2, 3, 4, 5, 4, 3, 2, 1]}
dataFrame = pd.DataFrame(data=d)
cols = list(dataFrame.columns.values)  # list of the original column names
index = 1  # start at 1
for column in cols:
    cols[index - 1] = "F" + str(index)  # rename the column based on the index
    index += 1  # move to the next index
vals = dataFrame.values.tolist()  # get the values of the rows
newDataFrame = pd.DataFrame(vals, columns=cols)  # new dataframe with the new column names and the row values
print(newDataFrame)
Output:
F1 F2 F3
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 4 4 4
6 3 3 3
7 2 2 2
8 1 1 1
I need to replace the value of a certain cell with the value from another cell if a certain condition is met. This is my (incomplete) attempt:
for r in df:
    if df['col1'] > 1:
        df['col2']
    else:
I am hoping for every value in column 1 to be replaced with its respective value from column 2 whenever the value in that row's column 1 is greater than 1.
No need to loop through the entire dataframe.
idx = df['col1'] > 1
df.loc[idx, 'col1'] = df.loc[idx, 'col2']
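An equivalent vectorised one-liner with numpy.where (an alternative, not part of the answer above):

import numpy as np

df['col1'] = np.where(df['col1'] > 1, df['col2'], df['col1'])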
Using a for loop:
for i, row in df.iterrows():
    if row['col1'] > 1:
        df.loc[i, 'col1'] = row['col2']  # assign via df.loc, not via the row copy
    # add elif / else branches here for other conditions and assignments
Here is an example
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 4, 6]})
print(df)
print('----------')
# the condition here is A^2 == B
df.loc[df['A'] * df['A'] == df['B'], 'A'] = df['B']
print(df)
output
A B
0 1 4
1 2 4
2 3 6
----------
A B
0 1 4
1 4 4
2 3 6
I'm working in Python. I have two dataframes df1 and df2:
d1 = {'timestamp1': [88148 , 5617900, 5622548, 5645748, 6603950, 6666502], 'col01': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d1)
d2 = {'timestamp2': [5629500, 5643050, 6578800, 6583150, 6611350], 'col02': [7, 8, 9, 10, 11], 'col03': [0, 1, 0, 0, 1]}
df2 = pd.DataFrame(d2)
I want to create a new column in df1 with the value of the minimum timestamp of df2 greater than the current df1 timestamp, where df2['col03'] is zero. This is the way I did it:
df1['colnew'] = np.nan
TSs = df1['timestamp1']
for TS in TSs:
    values = df2['timestamp2'][(df2['timestamp2'] > TS) & (df2['col03'] == 0)]
    if not values.empty:
        df1.loc[df1['timestamp1'] == TS, 'colnew'] = values.iloc[0]
It works, but I'd prefer not to use a for loop. Is there a better way to do this?
Use pandas.merge_asof with direction='forward':
pd.merge_asof(
df1, df2.loc[df2.col03 == 0, ['timestamp2']],
left_on='timestamp1', right_on='timestamp2', direction='forward'
).rename(columns=dict(timestamp2='colnew'))
col01 timestamp1 colnew
0 1 88148 5629500.0
1 2 5617900 5629500.0
2 3 5622548 5629500.0
3 4 5645748 6578800.0
4 5 6603950 NaN
5 6 6666502 NaN
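If the goal is to add the column to df1 in place, one possible sketch (assuming both frames are sorted on their timestamp keys, as in the example) is to take just the matched timestamp2 values:

df1['colnew'] = pd.merge_asof(
    df1, df2.loc[df2.col03 == 0, ['timestamp2']],
    left_on='timestamp1', right_on='timestamp2', direction='forward'
)['timestamp2'].values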
Give the apply method a try.
def func(x):
    values = df2['timestamp2'][(df2['timestamp2'] > x) & (df2['col03'] == 0)]
    if not values.empty:
        return values.iloc[0]
    else:
        return np.nan

df1["timestamp1"].apply(func)
You can create a separate function to do what has to be done.
The output is your new column
0 5629500.0
1 5629500.0
2 5629500.0
3 6578800.0
4 NaN
5 NaN
Name: timestamp1, dtype: float64
It is not a one-line solution, but it helps keep things organised.
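To actually store the result as the new column, the returned Series can simply be assigned back:

df1['colnew'] = df1['timestamp1'].apply(func)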