Python pandas cumsum with reset by value in another column

I've got success/failure data on several simulations. Each simulation consists of several trials and I want a cumulative sum of the successes per simulation. Here's an example of my data:
data = pd.DataFrame([[0, 0, 0],
[0, 1, 0],
[0, 2, 1],
[0, 3, 0],
[1, 0, 1],
[1, 1, 0],
[1, 2, 0],
[1, 3, 1],
[2, 0, 0],
[2, 1, 1],
[2, 2, 1],
[2, 3, 1],
[0, 0, 0],
[0, 1, 1],
[0, 2, 1],
[0, 3, 0]],
columns=['simulation', 'trial', 'success'])
Using this answer, I came up with the following code but it isn't quite working and I can't figure out why.
cumsum = data['success'].cumsum()
reset = -cumsum[data['trial'] == 0].diff().fillna(cumsum)
data['cumsum'] = data['success'].where(data['trial'] != 0, reset).cumsum()
The resulting column is [0, 0, 1, 1, -1, -1, -1, 0, -1, 0, 1, 2, -1, 0, 1, 1] but I expect [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 0, 1, 2, 2]

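Group the rows by consecutive runs of 'simulation', then take the cumulative sum of 'success' within each run. Grouping on 'simulation' alone would merge the two separate blocks of simulation 0, so the key below increments every time the value changes: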
data.groupby(data.simulation.ne(data.simulation.shift()).cumsum())['success'].cumsum()
or
data.groupby((data.simulation!=data.simulation.shift()).cumsum())['success'].cumsum()
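
As a self-contained sketch of the approach above, using the data from the question: the names run_id and trial_key are only illustrative, and the trial_key variant assumes every simulation starts at trial 0, which holds for the sample data but may not hold in general.

```python
import pandas as pd

# Same values as the DataFrame in the question, built column by column
simulation = [0] * 4 + [1] * 4 + [2] * 4 + [0] * 4
trial = [0, 1, 2, 3] * 4
success = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
data = pd.DataFrame({'simulation': simulation, 'trial': trial, 'success': success})

# A new run starts wherever 'simulation' differs from the previous row;
# the cumulative sum of those change flags labels each consecutive run.
run_id = data['simulation'].ne(data['simulation'].shift()).cumsum()

# Cumulative sum of 'success' restarted at the beginning of every run
data['cumsum'] = data.groupby(run_id)['success'].cumsum()
print(data['cumsum'].tolist())

# Assumption: each simulation begins at trial 0, so this key marks the same runs
trial_key = data['trial'].eq(0).cumsum()
same = data.groupby(trial_key)['success'].cumsum().equals(data['cumsum'])
print(same)
```

Either key gives a group label that stays constant inside a run and changes at its start, which is what makes the per-group cumsum reset.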

Related

Concatenate all 2 dimensional values in a dictionary. (Output is Torch tensor)

I want to concatenate all 2 dimensional values in a dictionary.
The number of rows of these values is always the same.
D = {'a': [[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
'b': [[1, 1],
[1, 1],
[1, 1]],
'c': [[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]]
}
The output must be a torch tensor:
tensor([[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]])
Any help would be appreciated!!

import torch
print(torch.cat(tuple([torch.tensor(D[name]) for name in D.keys()]), dim=1))
Output:
tensor([[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]])

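from itertools import chain
# All values share the same number of rows, so take it from any one of them
n_rows = len(next(iter(D.values())))
l = []
for i in range(n_rows):
    t = [D[k][i] for k in D]  # i-th row of every value
    l.append(list(chain.from_iterable(t)))
print(l)
Output (a nested list; pass it to torch.tensor(l) to get a tensor):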
[[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]]
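
The row-wise flattening above can also be sketched with zip, which pairs up the i-th rows of all values without indexing. This is only a sketch; it relies on dictionaries preserving insertion order (Python 3.7+), and the name rows is illustrative. The result is still a plain nested list, so it would need to go through torch.tensor(rows) to become a tensor.

```python
from itertools import chain

D = {
    'a': [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    'b': [[1, 1], [1, 1], [1, 1]],
    'c': [[2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2]],
}

# zip(*D.values()) yields, for each row index, the matching row from every value;
# chain.from_iterable then flattens those rows into one long row.
rows = [list(chain.from_iterable(parts)) for parts in zip(*D.values())]
print(rows)
```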

Merge two arrays with the same dimension based on a condition

I have two arrays with the same dimension:
a = [
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1], ]
b = [
[0, 1, 1, 0],
[0, 0, 0, 0],
[2, 0, 0, 2],
[0, 0, 0, 0], ]
I would like to create a new one, only changing the values where B is not 0 and is different than A. The result would be:
c = [
[1, 1, 1, 1],
[1, 0, 0, 1],
[2, 0, 0, 2],
[1, 1, 1, 1], ]
How can I do this?

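After converting both lists to NumPy arrays, you can assign through a boolean mask (this modifies a in place):
import numpy as np
a, b = np.array(a), np.array(b)
a[b != 0] = b[b != 0]
a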
array([[1, 1, 1, 1],
[1, 0, 0, 1],
[2, 0, 0, 2],
[1, 1, 1, 1]])

Here is one that I find easy to parse:
>>> np.where(b,b,a)
array([[1, 1, 1, 1],
[1, 0, 0, 1],
[2, 0, 0, 2],
[1, 1, 1, 1]])
This picks each value from the second argument (b) where the first argument is nonzero, and from the third argument (a) everywhere else. It assumes a and b are NumPy arrays.
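
For completeness, a minimal runnable sketch of the np.where approach starting from the lists in the question. The explicit b != 0 condition is equivalent to passing b itself and is used here only for readability; unlike masked assignment, it leaves a untouched.

```python
import numpy as np

a = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1]])
b = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [2, 0, 0, 2],
              [0, 0, 0, 0]])

# Take b where b is nonzero, otherwise keep a; a and b themselves are not modified
c = np.where(b != 0, b, a)
print(c)
```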

You don't need numpy. Here is a solution you can actually read line by line:
a = [
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1], ]
b = [
[0, 1, 1, 0],
[0, 0, 0, 0],
[2, 0, 0, 2],
[0, 0, 0, 0], ]
# Copy each row; a[:] would share the inner lists, so changing c would also change a
c = [row[:] for row in a]
# Overwrite only where b is nonzero and differs from a
for lineindex, line in enumerate(a):
    for index, x in enumerate(line):
        if x != b[lineindex][index] and b[lineindex][index] != 0:
            c[lineindex][index] = b[lineindex][index]
print(c)

Present ndarray of integer elements as ndarray of arrays

I'm having problems with vectorized function application to ndarrays.
What is a good and working way to do this?
Input:
y_train
array([0, 0, 2, 1, 2, 0, 2, 1, 0, 0, 1, 2, 1, 0, 0, 0, 2, 0, 2, 1, 2, 2,
1, 2, 2, 0, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 1, 2, 2, 2, 0, 2, 1, 0,
0, 0, 1, 2, 0, 2, 2, 1, 2, 2, 1, 2, 2, 2, 0, 1, 1, 1, 1, 2, 0, 0,
0, 1, 1, 1, 0, 2, 0, 1, 1, 2, 0, 2, 2, 2, 2, 0, 2, 2, 0, 0, 0, 1,
2, 0, 1, 0, 0, 1, 2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 0, 2, 2, 0, 2, 0,
2, 1, 1, 0, 1, 0, 2, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0])
Desired Output:
array([[0,0],
[0,0],
[0,1],
[1,0],
...
..])
I have:
def func(x):
    return np.array([int(x) for x in list(np.binary_repr(x,width=2,))])
func(y_train)
TypeError Traceback (most recent call last)
<ipython-input-178-ca45ba935147> in <module>
TypeError: only integer scalar arrays can be converted to a scalar index

Based on the additional chat conversation, it looks like you want to convert the 130-element array to a (130, 2) shaped array, replacing 0 with [0,0], 1 with [1,0], and 2 with [0,1].
To do this, create a dictionary that maps each label to its pair, then look up every element of y_train in it.
The code is as follows:
y_train = [0, 0, 2, 1, 2, 0, 2, 1, 0, 0, 1, 2, 1, 0, 0, 0, 2, 0, 2, 1, 2, 2,
1, 2, 2, 0, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 1, 2, 2, 2, 0, 2, 1, 0,
0, 0, 1, 2, 0, 2, 2, 1, 2, 2, 1, 2, 2, 2, 0, 1, 1, 1, 1, 2, 0, 0,
0, 1, 1, 1, 0, 2, 0, 1, 1, 2, 0, 2, 2, 2, 2, 0, 2, 2, 0, 0, 0, 1,
2, 0, 1, 0, 0, 1, 2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 0, 2, 2, 0, 2, 0,
2, 1, 1, 0, 1, 0, 2, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
d = {0:[0,0],1:[1,0],2:[0,1]}
arr = [d[i] for i in y_train]
print (arr)
The output of this will be:
[[0, 0], [0, 0], [0, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 0], [0, 0], [0, 0], [1, 0], [0, 1], [1, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [0, 0], [1, 0], [0, 1], [1, 0], [1, 0], [0, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 1], [1, 0], [0, 1], [0, 1], [0, 1], [0, 0], [0, 1], [1, 0], [0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [0, 0], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 0], [1, 0], [1, 0], [0, 1], [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [0, 0], [1, 0], [0, 0], [0, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 0], [0, 1], [0, 0], [1, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 0], [0, 0], [1, 0], [0, 0], [0, 1], [0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [1, 0], [0, 0], [1, 0], [1, 0], [0, 0], [0, 0], [0, 0]]
You can also achieve this using list(map(d.get, y_train)) where d is the dictionary with the lookup values.
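
Since the question asked for a vectorized approach, here is a sketch of the same lookup done with NumPy integer indexing instead of a Python loop. The name lookup is illustrative, and only the first eight labels from the question are used to keep the example short.

```python
import numpy as np

y_train = np.array([0, 0, 2, 1, 2, 0, 2, 1])  # first few labels from the question

# Row i of the table is the pair that label i maps to (0 -> [0,0], 1 -> [1,0], 2 -> [0,1])
lookup = np.array([[0, 0],
                   [1, 0],
                   [0, 1]])

# Indexing the table with the label array picks one row per label
encoded = lookup[y_train]
print(encoded.shape)
print(encoded)
```

The result is already an ndarray of shape (len(y_train), 2), so no conversion from a list of lists is needed.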
Alternatively, if you want the values reshaped into a two-column array:
import numpy as np
y_train = [0, 0, 2, 1, 2, 0, 2, 1, 0, 0, 1, 2, 1, 0, 0, 0, 2, 0, 2, 1, 2, 2,
1, 2, 2, 0, 1, 2, 1, 1, 2, 1, 1, 2, 0, 2, 1, 2, 2, 2, 0, 2, 1, 0,
0, 0, 1, 2, 0, 2, 2, 1, 2, 2, 1, 2, 2, 2, 0, 1, 1, 1, 1, 2, 0, 0,
0, 1, 1, 1, 0, 2, 0, 1, 1, 2, 0, 2, 2, 2, 2, 0, 2, 2, 0, 0, 0, 1,
2, 0, 1, 0, 0, 1, 2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 0, 2, 2, 0, 2, 0,
2, 1, 1, 0, 1, 0, 2, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
ln = len(y_train) #find the length of the list
arr = np.array(y_train) #convert y_train to numpy array
arr1 = arr.reshape(ln//2,2) #convert it to length/2 for rows, 2 for columns
print (arr1)
The output of this will be:
[[0 0]
[2 1]
[2 0]
[2 1]
[0 0]
[1 2]
......
[2 0]
[0 0]
[1 1]
[1 1]
[0 1]
[1 0]
[0 0]]

How to set 0-1 matrix using a vector of indices using numpy?

Matrix A:
A = np.array([[3, 0, 0, 8, 3],
[9, 3, 2, 2, 6],
[5, 5, 4, 2, 8],
[3, 8, 7, 1, 2],
[3, 9, 1, 5, 5]])
Matrix B: each row lists the column indices to select in the corresponding row of matrix A.
B = np.array([[1, 2],
[3, 4],
[1, 3],
[0, 1],
[2, 3]])
The entries of A whose column indices appear in the corresponding row of B should be set to 1, and all others to 0.
Then the result will be:
A = np.array([[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 1, 0]])
I don't want to use for loop, how can I do it with numpy?

We can index using arrays. For axis 0, np.arange(len(B)) covers every row. For axis 1, we transpose B so that its shape (2, 5) broadcasts against the row indices of shape (5,); each row of B.T then supplies one column index per row of the result.
>>> C = np.zeros_like(A)
>>> C[np.arange(len(B)), B.T] = 1
>>> C
array([[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 1, 0]])

>>> B = np.array([[1, 2],
... [3, 4],
... [1, 3],
... [0, 1],
... [2, 3]])
Convenient but a bit wasteful
>>> np.identity(5,int)[B].sum(1)
array([[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 1, 0]])
More economical but also more typing
>>> out = np.zeros((5,5),int)
>>> out[np.c_[:5],B] = 1
>>> out
array([[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 1, 0]])
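
Another way to express the same assignment, shown here only as an alternative sketch, is np.put_along_axis (available in NumPy 1.15 and later), which writes values at the given indices along one axis so the row indices do not have to be built by hand.

```python
import numpy as np

B = np.array([[1, 2],
              [3, 4],
              [1, 3],
              [0, 1],
              [2, 3]])

out = np.zeros((5, 5), dtype=int)

# For every row r, set out[r, B[r, k]] = 1 for each column index in B[r]
np.put_along_axis(out, B, 1, axis=1)
print(out)
```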

how to convert Node and Arc sets representation to an adjacency matrix in python

What am I doing wrong?
N=[1, 2, 3, 4, 5, 6]
A=[[1, 2], [1, 3], [1, 5], [2, 3], [2, 4], [3, 4], [3, 5], [4, 6], [5, 6]]
for i in range(len(N)):
    for j in range(len(N)):
        my_list1 = [i[0] for i in A]
        my_list2 = [i[1] for i in A]
        print(my_list1)
        print(my_list2)
I expect the output below, but instead I'm getting [1, 1, 1, 2, 2, 3, 3, 4, 5]
repeated multiple times:
ADJ=[[0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 0], \
[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]]

The simplest way to approach this is to build an empty adjacency matrix first, then populate it with a single pass through A. Here's a simple example that ignores the contents of N and assumes the nodes are labeled 1 through len(N), so each label minus one is its row or column index.
#!/usr/bin/env python3
def show(matrix):
    # Print one row per line, followed by a blank line
    for row in matrix:
        print(row)
    print()
N = [1, 2, 3, 4, 5, 6]
A = [[1, 2], [1, 3], [1, 5], [2, 3], [2, 4], [3, 4], [3, 5], [4, 6], [5, 6]]
adj_target = [
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0]
]
show(adj_target)
size = len(N)
# Build independent rows; [[0]*size]*size would repeat the same row object
adj = [[0]*size for _ in range(size)]
#show(adj)
# Node labels start at 1, so subtract 1 to get matrix indices
for u,v in A:
    adj[u-1][v-1] += 1
show(adj)
print(adj == adj_target)
output
[0, 1, 1, 0, 1, 0]
[0, 0, 1, 1, 0, 0]
[0, 0, 0, 1, 1, 0]
[0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0]
[0, 1, 1, 0, 1, 0]
[0, 0, 1, 1, 0, 0]
[0, 0, 0, 1, 1, 0]
[0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0]
True
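
If the node labels are not guaranteed to be 1 through len(N), a variant that actually uses N might look like the sketch below. The function name adjacency_matrix and the index dictionary are illustrative, not part of the answer above, and the sketch sets each entry to 1 rather than counting repeated arcs.

```python
def adjacency_matrix(nodes, arcs):
    """Build a 0-1 adjacency matrix for directed arcs over the given nodes."""
    # Map each node label to its row/column position in the matrix
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    adj = [[0] * size for _ in range(size)]
    for u, v in arcs:
        adj[index[u]][index[v]] = 1
    return adj


N = [1, 2, 3, 4, 5, 6]
A = [[1, 2], [1, 3], [1, 5], [2, 3], [2, 4], [3, 4], [3, 5], [4, 6], [5, 6]]

ADJ = adjacency_matrix(N, A)
for row in ADJ:
    print(row)
```

Because the positions come from the dictionary, the same function would also work for labels such as strings, as long as every arc endpoint appears in the node list.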
