I have a dataset where I need to match a value against column A and fetch the corresponding value from column B.
For example, I have to check whether 1 is matched in column A; if true, then print "First Page".
Similarly, every value in column A has to be matched against, say, X, and if it matches, print the corresponding value from column B.
By using df.iloc you can get the row or column you want by integer position.
By using a boolean mask you can filter the data frame to get the row you want (where column A equals some value) and take the value in the second column with .iloc[0, 1].
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4,3,2,1]}
df = pd.DataFrame(data=d)
df
col1 col2
0 1 4
1 2 3
2 3 2
3 4 1
# a is the value in the first column and df is the data frame
def a2b(a, df):
    return df[df.iloc[:, 0] == a].iloc[0, 1]
a2b(2, df)
returns 3
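If no row matches, .iloc[0, 1] raises an IndexError. A small defensive variant (a sketch, not part of the original answer; a2b_safe is a hypothetical name) that returns None instead:
# Sketch: same lookup as a2b, but returns None when nothing in the first column matches.
def a2b_safe(a, df):
    matches = df[df.iloc[:, 0] == a]
    if matches.empty:
        return None
    return matches.iloc[0, 1]
a2b_safe(5, df)  # returns None instead of raising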
I want to update the Freq column in df1 using the Freq column in df2, as shown below.
import pandas as pd
data = {'Cell':[1,2,3,4,'10-05','10-09'], 'Freq':[True, True, True, True, True, True]}
df1 = pd.DataFrame(data)
Dataframe 1
data2 = {'Cell-1':[1,1,1,1,1,1,2,2,2,2,2,2],'Cell-2':[1,2,3,4,'10-05','10-09',1,2,3,4,'10-05','10-09'] ,'Freq':[True, False,True,False,True,True,True, False,True,False,True,False]}
df2 = pd.DataFrame(data2)
Dataframe 2
In df1, column 1 holds the keys while column 2 holds the corresponding value, which in this case is either True or False.
Let's take, for example, key = 1 in Dataframe 1. This key has multiple rows in Dataframe 2, as shown in the figure. The multiple rows for key = 1 in Dataframe 2 come from the values in column 2 of Dataframe 2, which in turn are keys into Dataframe 1, and it is column 2 of df1 that I want to update.
Algorithm in action (figure)
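No answer is included here, and the exact update rule depends on the missing figures. Purely as a sketch of one possible interpretation, where the Freq flags of all df2 rows whose Cell-2 matches a df1 key are combined with a logical AND (the aggregation choice is an assumption):
# Sketch only: assumes the intended rule is "AND together the Freq flags of the
# df2 rows whose Cell-2 equals the df1 key"; swap .all() for another aggregation if needed.
combined = df2.groupby('Cell-2')['Freq'].all()
df1['Freq'] = df1['Cell'].map(combined).fillna(df1['Freq'])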
For a dataframe:
df = pd.DataFrame({"A":[0,0],"B":[0,1],"C":[1,2],"D":[2,2]})
How can I obtain the column name (or column index) where the value is greater than or equal to 2 (or some other value),
and put it in a new column of df, say df["TAG"], so that the result looks like
df = pd.DataFrame({"A":[0,0],"B":[0,1],"C":[1,2],"D":[2,2],"TAG":["D","C"]})
I tried
df["TAG"] = np.where(df[cols] >= 2, df.columns, '')
where cols is the list of df's columns.
So far I have only found how to get the row index when matching a value in pandas.
In Excel we can do something similar with MATCH(TRUE,INDEX($A:$D>=2,0),) applied to multiple rows.
Any help or hints are appreciated. Thank you so much in advance.
Try idxmax:
>>> df['TAG'] = df.ge(2).T.idxmax()
>>> df
A B C D TAG
0 0 0 1 2 D
1 0 1 2 2 C
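One caveat, not from the original answer: if a row had no value greater than or equal to 2, idxmax would still return the first column label, because all entries are False. A sketch that falls back to an empty string in that case (the fallback value is an assumption):
mask = df[['A', 'B', 'C', 'D']].ge(2)
# Keep the idxmax result only where at least one column qualifies; otherwise use ''.
df['TAG'] = mask.T.idxmax().where(mask.any(axis=1), '')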
I have a dataframe containing 3 columns. I want to get the value in the first column corresponding to the last entry in the second column, together with the values in the first column whose associated values in the second column differ from that last entry by at least 8, and put them in a list. Since 18 is the reference, I want its associated value from col1 to be in the list, and I want a data frame as the output. I am trying to figure out how I can do this in pandas.
col1 col2 col3
a 0 1
b 2 1
c 13 1
d 18 1
The output that I want is:
col1 col3
[d, b, a] 1
Thanks in advance.
From the way I interpreted your question, d should not be included, since 18 - 18 = 0 < 8.
Regardless, I took a three-step approach to this problem.
# Get the desired reference value (the last entry in col2)
last_entry = df.iloc[-1]['col2']
# Select only rows whose difference from the reference is at least 8,
# or the row that is the last entry itself
qry = "{ref} - col2 >= 8 or index == {idx}".format(ref=last_entry, idx=len(df) - 1)
diff_gt_8 = df.query(qry)
# For each value of col3, collect the values of col1 into a list and convert to a DataFrame
pd.DataFrame(diff_gt_8.groupby('col3')['col1'].apply(list))
To compare each value to the previous one:
df[(df['col2'] - df['col2'].shift(1)) < 12]
Note that df['col2'].shift(1) returns a series with all rows shifted down by 1, so we can compare each row of df['col2'] to its previous row.
The first row will never be included, since its shifted value is NaN (not a number).
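To make the shift behaviour concrete, here is a tiny illustration using the col2 values from the question:
import pandas as pd
s = pd.Series([0, 2, 13, 18])
s.shift(1)   # -> NaN, 0.0, 2.0, 13.0  (every value moved down one row)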
UPDATE
If I've understood your new question correctly, this is what you want.
import numpy as np
last_two_rows = df.iloc[-2:, :].copy()  # select the last two rows; copy to avoid chained-assignment issues
if (last_two_rows['col2'].iloc[-1] - last_two_rows['col2'].iloc[-2]) < 12:
    last_two_rows.loc[last_two_rows.index[-2], 'col1'] = np.nan
last_two_rows[['col1']]
I have a large dataframe containing lots of columns.
For each row/index in the dataframe I do some operations, read in some ancillary data, etc., and get a new value. Is there a way to add that new value into a new column at the correct row/index?
I could use .assign to add a new column, but I'm looping over the rows and only generating the data for one value at a time (generating it is quite involved). When a value is generated I'd like to add it to the dataframe immediately, rather than waiting until I've generated the entire series.
This doesn't work and gives a key error:
df['new_column_name'].iloc[this_row]=value
Do I need to initialise the column first or something?
There are two steps to create & populate a new column using only a row number...
(in this approach iloc is not used)
First, get the row index value by using the row number
rowIndex = df.index[someRowNumber]
Then, use row index with the loc function to reference the specific row and add the new column / value
df.loc[rowIndex, 'New Column Title'] = "some value"
These two steps can be combined into one line as follows:
df.loc[df.index[someRowNumber], 'New Column Title'] = "some value"
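If you do want to stay close to the .iloc attempt from the question, one option (a sketch; the column name and value are illustrative) is to create the column first and then assign by integer position:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
# Initialise the column, then assign by row position via iloc and get_loc.
df['new_column_name'] = pd.NA
df.iloc[1, df.columns.get_loc('new_column_name')] = 42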
If you have a dataframe like
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'X': [1.5, 6.777, 2.444, np.nan], 'Y': [1.111, np.nan, 8.77, np.nan], 'Z': [5.0, 2.333, 10, 6.6666]})
Instead of iloc, you can use .loc with a row indexer and a column name, like df.loc[row_indexer, column_indexer] = value:
df.loc[[0,3],'Z'] = 3
Output:
X Y Z
0 1.500 1.111 3.000
1 6.777 NaN 2.333
2 2.444 8.770 10.000
3 NaN NaN 3.000
If you want to add values to certain rows in a new column, depending on values in other cells of the dataframe you can do it like this:
import pandas as pd
df = pd.DataFrame(data={"A":[1,1,2,2], "B":[1,2,3,4]})
Add a value in a new column based on the values in column "A":
df.loc[df.A == 2, "C"] = 100
This creates the column "C" and adds the value 100 to it wherever column "A" is 2.
Output:
A B C
0 1 1 NaN
1 1 2 NaN
2 2 3 100
3 2 4 100
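As a small extension of the same pattern (a sketch, not part of the original answer), several conditions can be combined with & or |:
# Hypothetical example: tag rows where A is 2 and B is greater than 3.
df.loc[(df.A == 2) & (df.B > 3), "C"] = 200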
It is not necessary to initialise the column first.
You can just use the pandas built-in function DataFrame.at.
It takes a single row label and a single column label.
df.at[4, 'B'] = 10
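.at also supports setting with enlargement, so the column is created if it does not exist yet; a minimal, self-contained sketch (the surrounding data frame is an assumption):
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df.at[4, 'B'] = 10   # creates column 'B'; the other rows are filled with NaN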
I have a pandas dataframe like the one constructed in the code below:
I would like to iterate through column 3 and if an element exists, add a new row to the dataframe, using the value in column 3 as the new value in column 2, while also using the data in columns 0 and 1 from the row where it was found as the values for columns 0 and 1 in the newly added row:
Here, row 2 is the newly added row. The values in columns 0 and 1 in this row come from the row where "D" was found, and now column 2 of the new row contains the value from column 3 in the first row, "D".
Here is one way to do it, but surely there must be a more general solution, especially if I wish to scan more than a single column:
a = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
b = a.copy()
for tu in a.itertuples(index=False):  # Iterate by row
    if tu[3]:  # If a value exists in column 3
        b = b.append([[tu[0], tu[1], tu[3]]], ignore_index=True)  # Append a new row using the correct tuple elements.
You can do this without any loops by creating a new df with the columns you want and appending it to the original.
import pandas as pd
import numpy as np
df = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
ndf = df[pd.notnull(df[3])][[0,1,3]]
ndf.columns = [0,1,2]
df = df.append(ndf, ignore_index=True)
This will leave NaN for the new missing values, which you can then change to None.
df[3] = df[3].where((pd.notnull(df[3])), None)
prints
0 1 2 3
0 A B C D
1 1 2 C None
2 A B D None
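Note that DataFrame.append was removed in pandas 2.0, so on current pandas the same append step can be written with pd.concat (a sketch of the equivalent line):
# pandas >= 2.0 equivalent of df.append(ndf, ignore_index=True)
df = pd.concat([df, ndf], ignore_index=True)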
This may be a bit more general (assuming your column labels are integers and that you are always looking to fill the preceding columns in this pattern):
import pandas as pd
def append_rows(scan_row, scanned_dataframe):
    new_df = pd.DataFrame()
    for i, row in scanned_dataframe.iterrows():
        if row[scan_row]:
            # Take the values of the preceding columns, then the scanned value
            new_row = [row[j] for j in range(scan_row - 1)]
            new_row.append(row[scan_row])
            print(new_row)
            new_df = new_df.append([new_row], ignore_index=True)
    return new_df
a = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
b = a.copy()
b = b.append(append_rows(3,a))