Create new column based on the sum of selected columns in python

I have a dataframe like this
ID Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Win
A 1 1 1 1 1 1 1 1 Yes
B 0 1 0 1 0 1 0 1 No
C 0 1 0 1 0 1 0 1 No
D 1 1 0 1 1 1 1 1 Yes
E 1 1 0 1 1 1 1 1 Yes
F 1 1 0 1 1 1 1 1 Yes
G 0 0 1 0 0 0 0 0 No
H 0 0 0 0 0 0 0 0 No
I 1 0 1 0 1 0 1 0 No
In the above dataframe, I want to create the column 'Win' and assign the value 'Yes' if the sum of Q001 and Q002 is equal to or greater than 2, and 'No' if it is lower than 2. How can I do this in Python?

Use np.where() to return a value conditional on other columns.
import numpy as np
df['Win'] = np.where(df['Q001'] + df['Q002'] >= 2, 'Yes', 'No')
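For reference, here is a self-contained version of the np.where line above. It uses a few rows of the question's data and, to keep it short, includes only the columns the condition needs:

```python
import numpy as np
import pandas as pd

# A subset of the question's data: only the columns the condition uses.
df = pd.DataFrame({
    'ID': ['A', 'B', 'G', 'I'],
    'Q001': [1, 0, 0, 1],
    'Q002': [1, 1, 0, 0],
})

# np.where picks 'Yes' where the row-wise condition holds and 'No' elsewhere.
df['Win'] = np.where(df['Q001'] + df['Q002'] >= 2, 'Yes', 'No')
print(df)
```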

Alternatively, sum the selected columns row-wise and compare the total:
df['Win'] = np.where(df[['Q001','Q002']].sum(1)>=2,'Yes','No')
df
Output:
ID Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Win
0 A 1 1 1 1 1 1 1 1 Yes
1 B 0 1 0 1 0 1 0 1 No
2 C 0 1 0 1 0 1 0 1 No
3 D 1 1 0 1 1 1 1 1 Yes
4 E 1 1 0 1 1 1 1 1 Yes
5 F 1 1 0 1 1 1 1 1 Yes
6 G 0 0 1 0 0 0 0 0 No
7 H 0 0 0 0 0 0 0 0 No
8 I 1 0 1 0 1 0 1 0 No

Simply use:
import numpy as np
cols = ['Q001', 'Q002']
df['Win'] = np.where(df[cols].sum(axis=1).ge(2),
'Yes', 'No')
This scales to any number of columns: just list them in cols.
Output:
ID Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Win
0 A 1 1 1 1 1 1 1 1 Yes
1 B 0 1 0 1 0 1 0 1 No
2 C 0 1 0 1 0 1 0 1 No
3 D 1 1 0 1 1 1 1 1 Yes
4 E 1 1 0 1 1 1 1 1 Yes
5 F 1 1 0 1 1 1 1 1 Yes
6 G 0 0 1 0 0 0 0 0 No
7 H 0 0 0 0 0 0 0 0 No
8 I 1 0 1 0 1 0 1 0 No
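To show that the cols list generalizes, here is a sketch with three columns from the question's data. The choice of columns and the threshold of 2 are arbitrary assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'G', 'H'],
    'Q001': [1, 0, 0, 0],
    'Q002': [1, 1, 0, 0],
    'Q003': [1, 0, 1, 0],
})

cols = ['Q001', 'Q002', 'Q003']  # any list of column names works here
threshold = 2                    # hypothetical cut-off for this example

# Sum across the selected columns (axis=1 means row-wise), then compare.
df['Win'] = np.where(df[cols].sum(axis=1).ge(threshold), 'Yes', 'No')
print(df)
```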

You could calculate the column as a Boolean series and replace the values with Yes and No (if you must):
df['Win'] = (df['Q001'] + df['Q002'] >= 2).replace({False: 'No', True: 'Yes'})
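Here is a runnable version of the Boolean-series idea. It uses Series.map instead of replace. This is a swapped-in alternative, and it behaves the same here because every value in the series is a bool that appears as a key in the dict:

```python
import pandas as pd

df = pd.DataFrame({'Q001': [1, 0, 1], 'Q002': [1, 1, 0]})

# Build the condition as a Boolean Series first ...
win = (df['Q001'] + df['Q002']) >= 2
# ... keep it as bool if that is enough, or map it to labels.
df['Win'] = win.map({True: 'Yes', False: 'No'})
print(df)
```

Keeping win as a plain bool column is often more convenient for later filtering, for example df[win].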

Related

Most efficient way to create a binary matrix of users/purchases?

I have data where there are N users and K possible items. The data is in the form of a dictionary like data[user] = [item1, item2, ...]. I want to take this dictionary and create an N x K matrix where the (n,k) entry is 1 if user n has purchased item k and 0 otherwise. Below is sample data.
import random
random.seed(10)
# Users
N = list(range(10))
# Items represented by an integer
K = list(range(1000))
# I have a dict of {user: [item1, item2...itemK]}
# where k differs by user
data = {x:random.sample(K, random.randint(1,50)) for x in N}
# Now I want to create an N x K matrix, where rows are users, columns are items, and the (n,k) entry
# is 1 if user n has item k in their list and 0 otherwise.

If I understand your question correctly, you can convert each user's list of items to a set and then test membership for each item.
Note: I lowered the number of items to 50 (to represent it better on screen):
import random
random.seed(10)
# Users
N = list(range(10))
# Items represented by an integer
K = list(range(50))
# I have a dict of {user: [item1, item2...itemK]}
# where k differs by user
data = {x: random.sample(K, random.randint(1, 50)) for x in N}
# create matrix:
matrix = []
for v in data.values():
    v = set(v)
    matrix.append([int(i in v) for i in K])
# print matrix:
for row in matrix:
    print(*row)
Prints (each row is a different user):
1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1
1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 1

At minimum, every user in the dictionary and every item that user has must be visited once.
# Assuming users and items are integers 0..len(N)-1 and 0..len(K)-1, as in the question
# Build each row separately: [[0]*len(K)]*len(N) would make every row the same list object
mat = [[0] * len(K) for _ in N]
for ui in data:
    for i in data[ui]:
        mat[ui][i] = 1
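If a user can have repeated items, you can deduplicate them first. This only skips redundant assignments, since setting an entry to 1 twice gives the same result:
mat = [[0] * len(K) for _ in N]
for ui in data:
    for i in set(data[ui]):
        mat[ui][i] = 1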
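If NumPy is available, the same direct-indexing idea can be written with a preallocated array and fancy indexing, which sets all of a user's items in one call. This is a swapped-in technique, not part of either answer. It assumes, as in the question, that users and items are the integers 0..len(N)-1 and 0..len(K)-1:

```python
import random

import numpy as np

random.seed(10)
N = list(range(10))    # users
K = list(range(1000))  # items
data = {x: random.sample(K, random.randint(1, 50)) for x in N}

# One row per user, one column per item; int8 keeps the matrix small.
mat = np.zeros((len(N), len(K)), dtype=np.int8)
for user, items in data.items():
    # Fancy indexing writes 1 at every listed item of this user's row;
    # duplicate items are harmless because they all write the same value.
    mat[user, items] = 1

print(mat.shape)
```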

Analyze values around a main value in 2D matrix

In Python, I have a series of 2D arrays containing both negative and positive decimal values. For each matrix I have to find the values that fall within a given range, and up to this point I have succeeded.
Once I have found these values and their indices, however, I have to analyze their surroundings (for example, with a submatrix of known size) and, depending on the values I find there (through a condition), assign the value 0 or 1.
Thanks in advance everyone
Update:
I expect to get a matrix made of zones of 1s and 0s, where each value is decided by the values surrounding my main value according to an initial condition.
Part of my 2D matrix = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 2 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 3 3 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
]
I would like to analyze the neighborhood of each value 2. If its neighborhood contains only 1s, I assign a condition (e.g. True) or a specific value.
On the other hand, if its surroundings contain other values equal to 2, I would like to extend the search (up to a maximum distance of 3 cells from the identified value) until the condition is satisfied (a neighborhood made only of 1s).
Thanks
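Here is a minimal sketch of the mechanics described above. It finds the cells equal to 2 with np.argwhere, cuts out a square window around each one (clipped at the matrix edges), and tests a condition on the neighbors for radius 1, 2 and 3. The function names and the only_ones condition are hypothetical placeholders, because the exact rule is not fully specified:

```python
import numpy as np

def neighbors(mat, r, c, k):
    """Values within k cells of (r, c), clipped at the edges, center excluded."""
    r0, r1 = max(r - k, 0), min(r + k + 1, mat.shape[0])
    c0, c1 = max(c - k, 0), min(c + k + 1, mat.shape[1])
    window = mat[r0:r1, c0:c1]
    mask = np.ones(window.shape, dtype=bool)
    mask[r - r0, c - c0] = False  # drop the center cell itself
    return window[mask]

def only_ones(values):
    # Hypothetical condition: every neighbor equals 1.
    return bool(np.all(values == 1))

def classify(mat, target=2, max_k=3, condition=only_ones):
    """0/1 matrix: 1 where a `target` cell satisfies `condition`
    for some window radius k = 1..max_k, 0 everywhere else."""
    out = np.zeros(mat.shape, dtype=int)
    for r, c in np.argwhere(mat == target):
        for k in range(1, max_k + 1):  # widen the search one ring at a time
            if condition(neighbors(mat, r, c, k)):
                out[r, c] = 1
                break
    return out

m = np.array([
    [1, 1, 1, 0],
    [1, 2, 1, 0],
    [1, 1, 1, 0],
])
print(classify(m))
```

With only_ones as written, widening the window can never turn a failure into a success, because a larger window contains the smaller one. The condition passed to classify is the part you would adapt, for example to look only at the newly added ring.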

Accumulate 1 and Reset to 0 once condition is met

I have the dataset below and I am trying to accumulate a count while ColA is 0, resetting it to 0 (and starting the count again) whenever ColA is 1.
ColA
1
0
1
1
0
1
0
0
0
1
0
0
0
My expected result is as below.
ColA Accumulate
1 0
0 1
1 0
1 0
0 1
1 0
0 1
0 2
0 3
1 0
0 1
0 2
0 3
The code I currently use, and its result:
test['Value'] = np.where ( (test['ColA']==1),test['ColA'].cumsum() ,0)
ColA Value
1 0
0 1
1 0
1 0
0 2
1 0
0 3
0 4
0 5
1 0
0 6
0 7
0 8

Use cumsum if performance is important:
a = df['ColA'] == 0
cumsumed = a.cumsum()
df['Accumulate'] = cumsumed-cumsumed.where(~a).ffill().fillna(0).astype(int)
print (df)
ColA Accumulate
0 1 0
1 0 1
2 1 0
3 1 0
4 0 1
5 1 0
6 0 1
7 0 2
8 0 3
9 1 0
10 0 1
11 0 2
12 0 3
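Here is a runnable version of the cumsum answer on the question's data. The intermediate name last_reset is introduced only to show what the where/ffill step computes:

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]})

a = df['ColA'] == 0    # True where we should count
cumsumed = a.cumsum()  # running count of zeros over the whole column
# At each 1 (~a), remember the running count so far; ffill carries it
# forward so every later row knows the count at its most recent reset.
last_reset = cumsumed.where(~a).ffill().fillna(0).astype(int)
df['Accumulate'] = cumsumed - last_reset
print(df)
```

Every step is a vectorized pandas operation, which is why this approach avoids a Python-level loop.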

This should do it:
test['Value'] = (test['ColA']==0) * 1 * (test['ColA'].groupby((test['ColA'] != test['ColA'].shift()).cumsum()).cumcount() + 1)
It is an adaptation of this answer.
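This breaks the groupby answer into named steps to show how it works. The names run_id and position_in_run are illustrative and not part of the original answer:

```python
import pandas as pd

test = pd.DataFrame({'ColA': [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]})

# A new run starts wherever the value differs from the previous row;
# cumsum over those change points gives every consecutive run its own id.
run_id = (test['ColA'] != test['ColA'].shift()).cumsum()

# cumcount numbers the rows inside each run 0, 1, 2, ...; +1 makes it 1-based.
position_in_run = test['ColA'].groupby(run_id).cumcount() + 1

# Keep the position only inside runs of zeros; rows with ColA == 1 become 0.
test['Value'] = (test['ColA'] == 0).astype(int) * position_in_run
print(test)
```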

How do I open a binary matrix and convert it into a 2D array or a dataframe?

I have a binary matrix in a txt file that looks as follows:
0011011000
1011011000
0011011000
0011011010
1011011000
1011011000
0011011000
1011011000
0100100101
1011011000
I want to turn this into a 2D array or a dataframe with one digit per column and the rows as shown. I've tried numpy and pandas, but the output has only one column that contains each whole line as a single number. I want to be able to access each column on its own.
One of the snippets I've tried is:
with open("a1data1.txt") as myfile:
    dat1 = myfile.read().split('\n')
dat1 = pd.DataFrame(dat1)

Use read_fwf with the widths parameter (one character per column):
import pandas as pd
df = pd.read_fwf("a1data1.txt", header=None, widths=[1]*10)
print (df)
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 1 1 0 0 0
1 1 0 1 1 0 1 1 0 0 0
2 0 0 1 1 0 1 1 0 0 0
3 0 0 1 1 0 1 1 0 1 0
4 1 0 1 1 0 1 1 0 0 0
5 1 0 1 1 0 1 1 0 0 0
6 0 0 1 1 0 1 1 0 0 0
7 1 0 1 1 0 1 1 0 0 0
8 0 1 0 0 1 0 0 1 0 1
9 1 0 1 1 0 1 1 0 0 0
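Here is a self-contained check of the read_fwf call. It uses io.StringIO as a stand-in for a1data1.txt (an assumption, so the snippet runs without the file):

```python
import io

import pandas as pd

# Stand-in for a1data1.txt: the first three lines of the question's file.
text = "0011011000\n1011011000\n0011011000\n"

# widths=[1] * 10 tells read_fwf that every field is exactly one character
# wide; this assumes every line has exactly 10 digits.
df = pd.read_fwf(io.StringIO(text), header=None, widths=[1] * 10)
print(df)
```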

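After reading your txt as in your code (one string per row in column 0, here called df), you can fix it with the following. Note that the resulting cells are the characters '0' and '1', not integers:
pd.DataFrame(df[0].apply(list).values.tolist())
Output: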
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 1 1 0 0 0
1 1 0 1 1 0 1 1 0 0 0
2 0 0 1 1 0 1 1 0 0 0
3 0 0 1 1 0 1 1 0 1 0
4 1 0 1 1 0 1 1 0 0 0
5 1 0 1 1 0 1 1 0 0 0
6 0 0 1 1 0 1 1 0 0 0
7 1 0 1 1 0 1 1 0 0 0
8 0 1 0 0 1 0 0 1 0 1
9 1 0 1 1 0 1 1 0 0 0
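This sketch skips the intermediate one-column DataFrame and converts the characters to integers, again with io.StringIO standing in for the file. The np.genfromtxt line is an alternative for when a plain 2D array is enough, because an integer delimiter is interpreted as a fixed field width:

```python
import io

import numpy as np
import pandas as pd

text = "0011011000\n1011011000\n0100100101\n"  # stand-in for a1data1.txt

# splitlines() avoids the empty last element that split('\n') produces
# when the file ends with a newline.
lines = text.splitlines()

# list('0011011000') -> ['0', '0', '1', ...]; astype(int) turns the
# one-character strings into numbers.
df = pd.DataFrame([list(line) for line in lines]).astype(int)

# NumPy alternative: delimiter=1 splits every line into 1-character fields.
arr = np.genfromtxt(io.StringIO(text), delimiter=1, dtype=int)
print(df)
```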

Select the values in the pop dataframe whose index is in the index_par dataframe

I'm building a genetic algorithm for feature selection and I'm having some difficulties.
I have a pop dataframe (the population) consisting of 20 individuals and 9 features:
0 1 2 3 4 5 6 7 8
0 0 1 1 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 0
2 0 1 0 0 1 0 0 0 1
3 0 0 0 1 1 0 0 1 1
4 1 0 0 1 1 1 1 1 0
5 1 1 0 0 0 1 0 1 1
6 0 0 1 1 0 1 1 1 1
7 1 1 0 0 1 1 1 1 1
8 0 0 0 0 1 0 0 1 1
9 1 0 1 1 1 1 1 1 1
10 0 0 1 1 0 1 0 1 1
11 1 1 1 0 1 1 0 0 0
12 0 0 1 0 0 0 1 1 0
13 0 0 1 1 1 1 1 1 0
14 1 1 1 1 0 0 0 1 0
15 1 1 0 1 1 1 0 1 1
16 1 0 1 0 1 1 1 0 0
17 1 1 0 0 1 1 0 0 1
18 1 0 1 0 0 0 1 0 0
19 1 1 1 1 1 1 1 0 0
I also have an index_par dataframe containing index numbers:
0
0 0
1 1
2 4
3 5
4 8
5 10
6 11
7 13
8 14
9 19
index_par holds the indexes of the parents selected for crossover.
How can I select the rows of pop whose index is in index_par? Thanks in advance.

I think you need loc with column 0 of index_par:
index_par = pd.DataFrame({0:[0,1,4,5,8,10,11,13,14,19]})
df3 = pop.loc[index_par[0]]
print (df3)
0 1 2 3 4 5 6 7 8
0 0 1 1 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 0
4 1 0 0 1 1 1 1 1 0
5 1 1 0 0 0 1 0 1 1
8 0 0 0 0 1 0 0 1 1
10 0 0 1 1 0 1 0 1 1
11 1 1 1 0 1 1 0 0 0
13 0 0 1 1 1 1 1 1 0
14 1 1 1 1 0 0 0 1 0
19 1 1 1 1 1 1 1 0 0
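Here is a small self-contained example of the loc approach. The 5 x 3 population below is made-up data, not the question's. The example also shows iloc, which selects by position rather than by label. The two give the same rows here only because pop has the default index:

```python
import pandas as pd

# Hypothetical small population: 5 individuals x 3 features.
pop = pd.DataFrame([[0, 1, 1], [0, 0, 1], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
index_par = pd.DataFrame({0: [0, 3, 4]})

# loc selects by index label, so this works as long as pop's index labels
# are the values stored in index_par (true for the default RangeIndex).
parents = pop.loc[index_par[0]]

# iloc selects by position instead; it matches loc here only because the
# labels and positions coincide.
parents_by_position = pop.iloc[index_par[0].to_numpy()]
print(parents)
```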
