I have a dataframe:

df
uid
1
2
3
...

I want to assign a new column with values 0 or 1 depending on the uid, according to a mapping I will define:

df
uid new
1   0
2   0
3   1
...
You would need to specify the underlying logic. That said, there are many possible approaches.

Considering an explicit mapping with map:
mapper = {1: 0, 2: 0, 3: 1}
df['new'] = df['uid'].map(mapper)
# or
mapper = {0: [1, 2], 1: [3]}
df['new'] = df['uid'].map({k:v for v,l in mapper.items() for k in l})
Or, using a list of the uids that should map to 1, with isin and conversion to int:
target = [3]
df['new'] = df['uid'].isin(target).astype(int)
Output:
uid new
0 1 0
1 2 0
2 3 1
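If the rule later grows beyond a single 0/1 split, np.select generalizes the isin approach to several labelled conditions; a minimal sketch, where the rules themselves are just placeholders:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'uid': [1, 2, 3]})

# Each condition is checked in order; the first match wins (hypothetical rules)
conditions = [df['uid'].isin([3]), df['uid'].isin([1, 2])]
choices = [1, 0]
df['new'] = np.select(conditions, choices, default=-1)  # -1 marks unmapped uids
```

The default argument gives you an explicit value for uids no rule covers, which a plain dict-based map would leave as NaN.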
If there is a functional relationship between uid and new, you can create a function that defines the mapping (here (value - 1) // 2 reproduces the desired 0, 0, 1 for uids 1, 2, 3):

def mapping(value):
    new_value = (value - 1) // 2
    return new_value

Then:

df["new"] = df["uid"].apply(mapping)

Or directly:

df["new"] = df["uid"].apply(lambda value: (value - 1) // 2)
From the three uids given, one possible relation is: a uid that is divisible by 3 is assigned 1, otherwise 0. (I'm not sure this is the intended relation, since only three uid values were given.)

You can apply np.where(): np.where(condition, x, y) assigns x where the condition holds and y otherwise.
import pandas as pd
import numpy as np
df = pd.DataFrame({'uid': [1, 2, 3]})
df["new"] = np.where(df["uid"] % 3 == 0, 1, 0)
print(df)
Output:
uid new
0 1 0
1 2 0
2 3 1
Please see example dataframe below:
I'm trying to match the values of column X with the column names and retrieve the value from that matched column, so that:
A B C X result
1 2 3 B 2
5 6 7 A 5
8 9 1 C 1
Any ideas?
Here are a couple of methods:
# Apply Method:
df['result'] = df.apply(lambda x: df.loc[x.name, x['X']], axis=1)
# List comprehension Method:
df['result'] = [df.loc[i, x] for i, x in enumerate(df.X)]
# Pure Pandas Method:
df['result'] = (df.melt('X', ignore_index=False)
                  .loc[lambda x: x['X'].eq(x['variable']), 'value'])
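For large frames, a NumPy-based positional lookup is another option; a sketch, assuming every label in X actually exists as a column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 8], 'B': [2, 6, 9],
                   'C': [3, 7, 1], 'X': ['B', 'A', 'C']})

# Pair each row position with the column position named in X,
# then pull those cells out of the underlying array in one shot
rows = np.arange(len(df))
cols = df.columns.get_indexer(df['X'])
df['result'] = df.to_numpy()[rows, cols]
```

This avoids a per-row apply entirely; the cost is one conversion of the frame to a NumPy array.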
Here I just build a dataframe from your example and call it df (avoiding the name dict, which shadows the builtin):

data = {
    'A': (1, 5, 8),
    'B': (2, 6, 9),
    'C': (3, 7, 1),
    'X': ('B', 'A', 'C')}
df = pd.DataFrame(data)
You can extract the value from another column based on 'X' using the following code. There may be a better way than converting to a list and taking the first element; .iloc[0] on the selection would also work:

list(df.loc[df['X'] == 'B', 'B'])[0]

I'm going to create a column called 'result', fill it with 'NA', and then replace the values based on your conditions. The loop below iterates over the values of column X, extracts each matched value, and uses .loc to write it back into the dataframe:

df['result'] = 'NA'
for idx, val in enumerate(df['X']):
    extracted = list(df.loc[df['X'] == val, val])[0]
    df.loc[idx, 'result'] = extracted
Here it is as a function:

def search_replace(dataframe, search_col='X', new_col_name='result'):
    dataframe[new_col_name] = 'NA'
    for idx, val in enumerate(dataframe[search_col]):
        extracted = list(dataframe.loc[dataframe[search_col] == val, val])[0]
        dataframe.loc[idx, new_col_name] = extracted
    return dataframe
and the output
>>> search_replace(df)
A B C X result
0 1 2 3 B 2
1 5 6 7 A 5
2 8 9 1 C 1
I have a df that looks something like this:
  name  A  B  C  D
1  bar  1  0  1  1
2  foo  0  0  0  1
3  cat  1  0 -1  0
4  pet  0  0  0  1
5  ser  0  0 -1  0
6 chet  0  0  0  1
I need to use the loc method to add values in a new column ('E') based on the values of the other columns taken as a group; for instance, if the values are [1, 0, 0, 0], the value in column E should be 1. I've tried this:

d = {'A': 1, 'B': 0, 'C': 0, 'D': 0}
A = pd.Series(data=d, index=['A', 'B', 'C', 'D'])
df.loc[df.iloc[:, 1:] == A, 'E'] = 1

It didn't work. I need to use loc or another numpy-based method, since the dataset is huge. If it is possible to avoid creating a Series for the comparison, that would also be great; ideally I'd extract the values of columns A, B, C, D and compare them as a group for each row.
You can compare the values with A and test whether every column in a row matches using DataFrame.all; compare only the value columns, since name is not part of A:

df.loc[(df[['A', 'B', 'C', 'D']] == A).all(axis=1), 'E'] = 1

For a 0/1 column:

df['E'] = (df[['A', 'B', 'C', 'D']] == A).all(axis=1).astype(int)
df['E'] = np.where((df[['A', 'B', 'C', 'D']] == A).all(axis=1), 1, 0)
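A self-contained sketch of this comparison on the question's sample data (note that no row in this sample actually matches [1, 0, 0, 0], so E comes out all zeros here):

```python
import pandas as pd

df = pd.DataFrame({'name': ['bar', 'foo', 'cat', 'pet', 'ser', 'chet'],
                   'A': [1, 0, 1, 0, 0, 0],
                   'B': [0, 0, 0, 0, 0, 0],
                   'C': [1, 0, -1, 0, -1, 0],
                   'D': [1, 1, 0, 1, 0, 1]})
target = pd.Series({'A': 1, 'B': 0, 'C': 0, 'D': 0})

# The comparison aligns on column labels; all(axis=1) then requires
# every compared column in a row to match the target
mask = (df[['A', 'B', 'C', 'D']] == target).all(axis=1)
df['E'] = mask.astype(int)
```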
In my program I have a CSV file with 4 columns. Where the x, y values are 0, 0, I want to change them to my desired values without changing the other x, y values. Can you please help me change these values?

I tried the code below, but the other x, y values also change, because it adds 3 to the whole X column. I only want to change the 0, 0 x, y values to my desired values, so can you please guide me? Thank you in advance.
import pandas as pd
df = pd.read_csv("Tunnel.csv",delimiter= ',')
df['X'] = df['X'] + 3
df['Y'] = df['Y'] + 4
print(df)
This is my csv_file
You can select the zero entries and update only those with loc:

df.loc[df['X'] == 0, 'X'] += 3
df.loc[df['Y'] == 0, 'Y'] += 4
To write your dataframe to a CSV file named file_name, use to_csv:
file_name = 'file.csv'
df.to_csv(file_name)
You could use the df.loc method as follows:

df.loc[(df['X'] == 0) & (df['Y'] == 0), ['X', 'Y']] = 3, 4
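If the intent is instead to replace zeros in each column independently (rather than only rows where both are zero), Series.mask is a compact alternative; a sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'X': [0, 2, 0], 'Y': [0, 5, 1]})

# mask replaces values where the condition holds and keeps the rest
df['X'] = df['X'].mask(df['X'].eq(0), 3)
df['Y'] = df['Y'].mask(df['Y'].eq(0), 4)
```

Whether the both-zero or either-zero reading is wanted depends on the question's data, so check which condition you actually need.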
Another way around, using df.items() (iteritems was removed in pandas 2.0, and mutating the yielded Series in place is unreliable under copy-on-write, so write each column back explicitly):

>>> df = pd.DataFrame({'a': [0, 0, 2], 'b': [0, 2, 1]})
>>> df
   a  b
0  0  0
1  0  2
2  2  1
>>> for key, val in df.items():
...     df[key] = val.mask(val == 0, 3)
...
>>> df
   a  b
0  3  3
1  3  2
2  2  1
You can use the apply function to set these values. Below is the code:
import pandas as pd
df=pd.read_csv('t.csv',header=None, names=['x','y','depth','color'])
dfc=df.copy()
dfc['x']=dfc['x'].apply(lambda t: 1 if t==0 else t)
dfc['y']=dfc['y'].apply(lambda t: 1 if t==0 else t)
Recently, I have been converting from SAS to Python pandas. One question I have: does pandas have a retain-like feature as in SAS, so that I can dynamically reference the last record? In the following code I have to manually loop through each line and reference the previous record. It seems pretty slow compared to the similar SAS program. Is there any way to make it more efficient in pandas? Thank you.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 1, 1, 1], 'B': [0, 0, 1, 0]})
df['C'] = np.nan
df['lag_C'] = np.nan
for row in df.index:
    if row == df.head(1).index:
        df.loc[row, 'C'] = (df.loc[row, 'A'] == 0) + 0
    else:
        if (df.loc[row, 'B'] == 1):
            df.loc[row, 'C'] = 1
        elif (df.loc[row, 'lag_C'] == 0):
            df.loc[row, 'C'] = 0
        elif (df.loc[row, 'lag_C'] != 0):
            df.loc[row, 'C'] = df.loc[row, 'lag_C'] + 1
    if row != df.tail(1).index:
        df.loc[row + 1, 'lag_C'] = df.loc[row, 'C']
It is a fairly complicated algorithm, but I'll try a vectorized approach.

If I understand it correctly, a cumulative sum can be used, as in this question. The last column, lag_C, is simply column C shifted by one.

However, this approach can't be used directly for the first rows of df, because those rows are computed from the first value of column A (and sometimes column B). So I created a helper column D, where the problematic rows are marked and later copied back into the output column C when the conditions hold.

I changed the input data to test these problematic first rows; I tried all three possibilities of the first rows of column B combined with the first row of column A.

My input conditions are: columns A and B contain only 1 or 0; columns C and lag_C are helper columns containing only NaN.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1,1,1,1,1,0,0,1,1,0,0], 'B': [0,0,1,1,0,0,0,1,0,1,0]})
df1 = pd.DataFrame({'A': [1,1,1,1,1,0,0,1,1,0,0], 'B': [0,0,1,1,0,0,0,1,0,1,0]})
#cumulative sum of column B
df1['C'] = df1['B'].cumsum()
df1['lag_C'] = 1
#first 'group' with min value is problematic, copy to column D for latter use
df1.loc[df1['C'] == df1['C'].min(), 'D'] = df1['B']
#cumulative sums of groups to column C
df1['C']= df1.groupby(['C'])['lag_C'].cumsum()
#correct problematic states in column C, use value from D
if (df1['A'].loc[0] == 1):
    df1.loc[df1['D'].notnull(), 'C'] = df1['D']
if ((df1['A'].loc[0] == 1) & (df1['B'].loc[0] == 1)):
    df1.loc[df1['D'].notnull(), 'C'] = 0
del df1['D']
#shifted column lag_C from column C
df1['lag_C'] = df1['C'].shift(1)
print(df1)
# A B C lag_C
#0 1 0 0 NaN
#1 1 0 0 0
#2 1 1 1 0
#3 1 1 1 1
#4 1 0 2 1
#5 0 0 3 2
#6 0 0 4 3
#7 1 1 1 4
#8 1 0 2 1
#9 0 1 1 2
#10 0 0 2 1
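On current pandas the same counter can be sketched more directly: group rows by the running count of B, number the rows within each group, and zero out the leading group when the first A is 1. This is a sketch that reproduces the C column printed above for this input; the A[0] == 0 case falls out of the cumcount naturally:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0],
                   'B': [0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0]})

grp = df['B'].cumsum()                # a new group starts at each B == 1
c = df.groupby(grp).cumcount() + 1    # 1, 2, 3, ... within each group
if df.loc[0, 'A'] == 1:
    c.loc[grp.eq(0)] = 0              # leading rows before the first B == 1
df['C'] = c
df['lag_C'] = df['C'].shift(1)
```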
I have a list of lists as below
[[1, 2], [1, 3]]
The DataFrame is similar to
A B C
0 1 2 4
1 0 1 2
2 1 3 0
I would like to filter the DataFrame, keeping a row if the value in column A equals the first element of any of the nested lists and the value in column B of that row equals the second element of the same nested list.
Thus the resulting DataFrame should be
A B C
0 1 2 4
2 1 3 0
The code below does what you need:

import pandas

tmp_filter = pandas.DataFrame(None)  # The dataframe you want

# Create your list and your dataframe
tmp_list = [[1, 2], [1, 3]]
tmp_df = pandas.DataFrame([[1, 2, 4], [0, 1, 2], [1, 3, 0]], columns=['A', 'B', 'C'])

# This function goes through the df column by column and
# only keeps the rows with the values you want
def pass_true_df(df, cond):
    for i, c in enumerate(cond):
        df = df[df.iloc[:, i] == c]
    return df

# Pass through your list and append the rows you want to keep
for i in tmp_list:
    tmp_filter = pandas.concat([tmp_filter, pass_true_df(tmp_df, i)])
import pandas

df = pandas.DataFrame([[1, 2, 4], [0, 1, 2], [1, 3, 0], [0, 2, 5], [1, 4, 0]],
                      columns=['A', 'B', 'C'])
filt = pandas.DataFrame([[1, 2], [1, 3], [0, 2]],
                        columns=['A', 'B'])
accum = []
# grouped to-filter
data_g = df.groupby('A')
for k2, v2 in data_g:
    accum.append(v2[v2.B.isin(filt.B[filt.A == k2])])
print(pandas.concat(accum))
result:
A B C
3 0 2 5
0 1 2 4
2 1 3 0
(I made the data and filter a little more complicated as a test.)
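A pair-wise filter like this can also be expressed as an inner merge on the two key columns; a sketch on the question's original data (note that merge resets the original index):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4], [0, 1, 2], [1, 3, 0]], columns=['A', 'B', 'C'])
pairs = pd.DataFrame([[1, 2], [1, 3]], columns=['A', 'B'])

# An inner merge keeps exactly the rows whose (A, B) pair appears in pairs
result = df.merge(pairs, on=['A', 'B'])
```

If the original index matters, an isin check against a MultiIndex built from the key columns preserves it, at the cost of a slightly longer expression.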