When the column "event" has a value different than 0, and rows above from column "input1" has value 5, I need to copy the value from row above of column "label".
What I have now:
input1 input2 input3 event label
0 0 0 0 0
5 5 0 0 2
5 5 0 0 2
0 0 0 24 0
0 0 0 0 0
5 0 5 0 3
5 0 5 0 3
5 0 5 0 3
0 0 0 25 0
0 0 0 0 0
What I need to happen:
input1 input2 input3 event label marker
0 0 0 0 0 0
5 5 0 0 2 0
5 5 0 0 2 0
0 0 0 24 0 2
0 0 0 0 0 0
5 0 5 0 3 0
5 0 5 0 3 0
5 0 5 0 3 0
0 0 0 25 0 3
0 0 0 0 0 0
You can use boolean masks eq/ne, and shift to get the previous values, then select with where:
# is the previous input1 equal to 5?
m1 = df['input1'].shift().eq(5)
# is event not 0?
m2 = df['event'].ne(0)
# get the previous label if both conditions are true, else 0
df['marker'] = df['label'].shift(fill_value=0).where(m1&m2, 0)
# OR
# df['marker'] = df['label'].shift().where(m1&m2, 0).convert_dtypes()
output:
input1 input2 input3 event label marker
0 0 0 0 0 0 0
1 5 5 0 0 2 0
2 5 5 0 0 2 0
3 0 0 0 24 0 2
4 0 0 0 0 0 0
5 5 0 5 0 3 0
6 5 0 5 0 3 0
7 5 0 5 0 3 0
8 0 0 0 25 0 3
9 0 0 0 0 0 0
Related
In the pandas data frame, the one-hot encoded vectors are present as columns, i.e:
Rows A B C D E
0 0 0 0 1 0
1 0 0 1 0 0
2 0 1 0 0 0
3 0 0 0 1 0
4 1 0 0 0 0
4 0 0 0 0 1
How to convert these columns into one data frame column by label encoding them in python? i.e:
Rows A
0 4
1 3
2 2
3 4
4 1
5 5
Also need suggestion on this that some rows have multiple 1s, how to handle those rows because we can have only one category at a time.
Try with argmax
#df=df.set_index('Rows')
df['New']=df.values.argmax(1)+1
df
Out[231]:
A B C D E New
Rows
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
4 0 0 0 0 1 5
argmaxis the way to go, adding another way using idxmax and get_indexer:
df['New'] = df.columns.get_indexer(df.idxmax(1))+1
#df.idxmax(1).map(df.columns.get_loc)+1
print(df)
Rows A B C D E New
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
5 0 0 0 0 1 5
Also need suggestion on this that some rows have multiple 1s, how to
handle those rows because we can have only one category at a time.
In this case you dot your DataFrame of dummies with an array of all the powers of 2 (based on the number of columns). This ensures that the presence of any unique combination of dummies (A, A+B, A+B+C, B+C, ...) will have a unique category label. (Added a few rows at the bottom to illustrate the unique counting)
df['Category'] = df.dot(2**np.arange(df.shape[1]))
A B C D E Category
Rows
0 0 0 0 1 0 8
1 0 0 1 0 0 4
2 0 1 0 0 0 2
3 0 0 0 1 0 8
4 1 0 0 0 0 1
5 0 0 0 0 1 16
6 1 0 0 0 1 17
7 0 1 0 0 1 18
8 1 1 0 0 1 19
Another readable solution on top of other great solutions provided that works for ANY type of variables in your dataframe:
df['variables'] = np.where(df.values)[1]+1
output:
A B C D E variables
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
5 0 0 0 0 1 5
I am trying to do cumulative sum by intervals ie. with cumsum being reset to zero if the next value to accumulate is 0. Below is an example with the desired result following. I have tried using numpy 'convolve' and 'groupby' but can't get come up with a way to do the reset except by creating a def that loops over all the rows. Is there a clever approach I'm missing? Note that the real data in column 'x' are real numbers separated by 0's.
import numpy as np
import pandas as pd
a = pd.DataFrame([[0,0],[1,0],[1,0],[1,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],\
[0,0],[0,0],[0,0],[0,0],[1,0],[1,0],[0,0]], columns=["x","y"])
def patch(k):
k["z"] = k.x.cumsum()
return k
print(patch(a))
Current output:
x y z
0 0 0 0
1 1 0 1
2 1 0 2
3 1 0 3
4 0 0 3
6 0 0 3
7 0 0 3
9 0 0 3
10 0 0 3
12 0 0 3
13 1 0 4
15 1 0 5
16 0 0 5
Desired output:
x y z
0 0 0 0
1 1 0 1
2 1 0 2
3 1 0 3
4 0 0 0
6 0 0 0
7 0 0 0
9 0 0 0
10 0 0 0
12 0 0 0
13 1 0 1
15 1 0 2
16 0 0 0
Do groupby on cumsum:
a['z'] = a.groupby(a['x'].eq(0).cumsum())['x'].cumsum()
Output:
x y z
0 0 0 0
1 1 0 1
2 1 0 2
3 1 0 3
4 0 0 0
6 0 0 0
7 0 0 0
9 0 0 0
10 0 0 0
12 0 0 0
13 1 0 1
15 1 0 2
16 0 0 0
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 1 1 1 0 0 0 0 0 1 1 0 0 3 3 0 0 0 4 4 0 0 0 5 5 5 5 0 0 2 2 2 2 2 0 2 2 2 2 2 0 0 0 6 6 6 6 6 6 0 6 6 6 6]
[0 1 1 0 0 0 0 0 0 0 0 0 0 3 3 0 0 0 4 4 0 0 5 5 5 5 5 5 0 2 2 2 2 2 2 2 2 2 2 2 2 0 0 6 6 6 6 6 6 6 6 6 6 6]
[1 1 1 0 0 0 0 0 0 0 0 0 0 3 3 0 0 0 4 4 0 5 5 5 0 0 5 5 5 0 2 2 0 0 2 2 0 0 0 2 2 0 0 6 6 0 0 6 6 6 0 0 6 6]
[1 1 1 0 0 0 0 0 0 0 0 0 0 3 3 0 0 0 4 4 0 5 5 5 5 0 0 0 0 0 2 2 0 2 2 2 0 0 0 2 2 2 0 6 6 0 0 0 6 6 0 0 6 6]
[1 1 1 0 0 0 0 0 0 0 0 0 0 3 3 0 0 0 4 4 0 0 5 5 5 5 5 5 0 0 2 2 0 2 2 2 0 0 0 2 2 2 0 6 6 0 0 0 6 6 0 0 6 6]
[0 1 1 0 0 0 0 0 0 7 0 0 0 3 3 0 0 0 4 4 0 0 0 0 5 5 5 5 5 0 2 2 0 2 2 2 0 0 0 2 2 2 0 6 6 0 0 0 6 6 0 0 6 6]]
what i want is to change the value of every none 0 number to 1.
what i can do:
for element in list:
for sub_element in element:
if sub_element != 0:
sub_element = 1
How can i do that in numpy?
If your numpy array is named a, you can use something like this:
a[a!=0.0] = 1
proof:
>>> a = numpy.array([0.0, 1.0, 2.0, 3.0, 0.0, 10.0])
>>> a
array([ 0., 1., 2., 3., 0., 10.])
>>> a[a!=0.0] = 1
>>> a
array([ 0., 1., 1., 1., 0., 1.])
This works because a != 0.0 will return array with True/False values
where the condition is met and then the assignment is performed only on those
elements where there is True:
>>> a != 0
array([False, True, True, True, False, True], dtype=bool)
Also, this works with any other condition.
Given a NumPy array A:
res = A.astype(bool).astype(int)
For efficiency, and most use cases, you can keep your array as Boolean, omitting the integer conversion.
This works because 0 is the only integer considered False; all others are True. Converting a Boolean array to int is then a trivial mapping of False to 0 and True to 1.
Has a table like this:
ID Word
1 take
2 the
3 long
4 long
5 road
6 and
7 walk
8 it
9 walk
10 it
Wanna to use pivot table in pandas to get distinct words in columns and 1 and 0 in Values. Smth like this matrix:
ID Take The Long Road And Walk It
1 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0
3 0 0 1 0 0 0 0
4 0 0 1 0 0 0 0
5 0 0 0 1 0 0 0
and so on
Trying to use pivot table but not familiar with pandas syntax yet:
import pandas as pd
data = pd.read_csv('dataset.txt', sep='|', encoding='latin1')
table = pd.pivot_table(data,index=["ID"],columns=pd.unique(data["Word"].values),fill_value=0)
How can I rewrite pivot table function to deal with it?
You can use concatwith str.get_dummies:
print pd.concat([df['ID'], df['Word'].str.get_dummies()], axis=1)
ID and it long road take the walk
0 1 0 0 0 0 1 0 0
1 2 0 0 0 0 0 1 0
2 3 0 0 1 0 0 0 0
3 4 0 0 1 0 0 0 0
4 5 0 0 0 1 0 0 0
5 6 1 0 0 0 0 0 0
6 7 0 0 0 0 0 0 1
7 8 0 1 0 0 0 0 0
8 9 0 0 0 0 0 0 1
9 10 0 1 0 0 0 0 0
Or as Edchum mentioned in comments - pd.get_dummies:
print pd.concat([df['ID'], pd.get_dummies(df['Word'])], axis=1)
ID and it long road take the walk
0 1 0 0 0 0 1 0 0
1 2 0 0 0 0 0 1 0
2 3 0 0 1 0 0 0 0
3 4 0 0 1 0 0 0 0
4 5 0 0 0 1 0 0 0
5 6 1 0 0 0 0 0 0
6 7 0 0 0 0 0 0 1
7 8 0 1 0 0 0 0 0
8 9 0 0 0 0 0 0 1
9 10 0 1 0 0 0 0 0
I'm at a point where I have a list of tuples like this:
[('Ben', '5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5'),
('Moose', '5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0'),
...]
it continues on for some time.
I'm relatively new to python (and programming in general), so I'm struggling to figure out how to create a new tuple with the item[1] element of each tuple modified, but the item[0] element of each left untouched. All I want to do is make the second element of each tuple into a list of all those numbers. I tried to just .split() it, but tuples are immutable.
How would you approach this problem, then?
Ultimately its for a beginning practice example. We're making a very primitive movie recommender based on the ratings for each film, and each of those numbers is a rating that person gave corresponding to the film.
Since tuples are immutable, you'll have to create new ones based on old ones:
data = [('Ben', '5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5'),
('Moose', '5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0')]
data = [(name, ratings.split()) for name, ratings in data]
If you aren't familiar with list comprehensions, this is equivalent to
new_data = []
for name, ratings in data:
new_data.append( (name, ratings.split()) )
data = new_data