I'm sure this has been asked before, but I couldn't find exactly what I was looking for.
I have a np.array and I would like to create an additional column (C2) which has values dependent on another column (C1).
In pseudocode, I would like to make a column where (for j = 2..n):
R1C2 = R1C1
IF |Rj-1C2 - RjC1| < 20 THEN RjC2 = Rj-1C2
ELSE RjC2 = RjC1
I'm quite new to Python, but I'm sure this is pretty straightforward. I basically just need to know how to express this formula in Python for an np.array.
Thank you
This is pretty specific. I'm not sure there is a simple formula for this, because you are recursively generating the column rather than using existing data. You could do the following, where a is the index of your old column and b is the index of the column you want to fill in:
arr[0, b] = arr[0, a]
n = arr.shape[0]  # number of rows
for j in range(1, n):
    arr[j, b] = arr[j - 1, b] if abs(arr[j - 1, b] - arr[j, a]) < 20 else arr[j, a]
I'm going to use zero-based indexing (i.e. row 0 is the first row, row 1 the second row, col 0 the first column, col 1 the second column, etc.) for ease of explanation and code implementation.
The logic
Say we have a numpy array like this (call it array a). As per your specification, both columns in the first row are the same.
a = np.array(
    [
        [10, 10],
        [15, None],
        [50, None]
    ]
)
Here n is 3 (the number of rows).
The loop variable j takes the range from index 1 (inclusive) to n (exclusive). For our dummy example, j would be 1, 2 (i.e. 2 iterations).
Note that Numpy indexing looks like this:
a[0][1] means first row (row 0), second column (col 1).
a[1][1] means second row (row 1), second column (col 1).
The condition being:
if abs(a[j-1][1] - a[j][0]) < 20 ... then a[j][1] = a[j-1][1]
otherwise, a[j][1] = a[j][0]
i.e. Expected output:
[
[10, 10],
[15, 10],
[50, 50]
]
The code
This is a straight Numpy implementation
import numpy as np
# Create a sample numpy array as per specification
a = np.array(
    [
        [10, 10],
        [15, None],
        [50, None]
    ]
)
# get number of rows there are for looping upper bound
# for our dummy example, n = 3
n = a.shape[0]
# do the loop
for j in range(1, n):
    if abs(a[j-1][1] - a[j][0]) < 20:
        a[j][1] = a[j-1][1]
    else:
        a[j][1] = a[j][0]
# the array `a` is now updated to...
# array([[10, 10],
# [15, 10],
# [50, 50]], dtype=object)
Also, I would suggest you to rename your question from the original:
Create column based on row value of another column.
to the new:
Update column based on row value of another column.
... since you always have exactly two columns (but possibly many rows).
Related
This is a continuation of another question I asked before (Dataframe add element from a column based on values contiguity from another columns). I got a solution using a pandas DataFrame, but not with two plain lists, which is where I am stuck.
I have 2 lists:
a=[2,3,4,1]
b=[5,6,7,2,8,9,1,2,3,4]
I would like to sum elements of b in groups whose sizes are given by the values of a:
The first element of a is 2, so I add the first 2 elements of b (5+6).
The second element of a is 3, so I add the next 3 elements of b (7+2+8),
and so on.
I tried a for loop, but the sum always starts from the first element of b. Is there a way to get the result I want without changing b or creating another list?
Is this what you're looking for?
a=[2,3,4,1]
b=[5,6,7,2,8,9,1,2,3,4]
c = []
index = 0
for item in a:
    c.append(sum(b[index: index + item]))
    index += item
print(c)
Output
[11, 17, 15, 4]
numpy:
import numpy as np
np.add.reduceat(b, np.cumsum(np.concatenate([[0], a[:-1]])))
# array([11, 17, 15, 4])
python:
import itertools as it
bi = iter(b)
[sum(it.islice(bi,x)) for x in a]
# [11, 17, 15, 4]
I would use numpy.cumsum to get a running sum of the starting index for the next series of sums. Then you can zip that index list against itself offset by 1 to determine the slice to sum for each iteration.
>>> from numpy import cumsum
>>> starts = cumsum([0] + a)
>>> [sum(b[i:j]) for i,j in zip(starts, starts[1:])]
[11, 17, 15, 4]
a=[2,3,4,1]
b=[5,6,7,2,8,9,1,2,3,4]
new = []
i=0
for x in range(len(a)):
    el = a[x]
    new.append(sum(b[i:i+el]))
    i = i + el
print(new)
#[11, 17, 15, 4]
Without creating a new intermediary list, you could do something like this:
[sum(b[sum(a[:i]):][:a[i]]) for i in range(len(a))]
Although it is somewhat computationally heavy, since sum(a[:i]) recomputes the offset for every i. A for loop that builds the list c, as in @Balaji Ambresh's answer, is a much more efficient approach.
I have a list of ndarrays (vectors), each with shape (1, 300).
My goal is to find the duplicate vectors in the list, sum them, and divide by the size of the list; the resulting vector will replace each duplicate vector.
For example, a is a list of ND arrays, a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and the last element are duplicates.
their sum would be :[4,6,2],
which will be divided by the size of a list of vectors, size = 3.
Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]
I have tried to use a Counter but it doesn't work for ndarrays.
What is the Numpy way?
Thanks.
If you have numpy 1.13 or higher, this is pretty simple:
import numpy as np

def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]
If you don't have 1.13, you'll need some trick to convert a into a 1-d array first. I recommend @Jaime's excellent answer using np.void here.
How it works:
u is the unique rows of a (usually not in their original order)
c is the number of times each row of u is repeated in a
inv is the set of indices that maps u back to a, i.e. u[inv] == a
p is the multiplier for each row of u based on your requirements: 1 if c == 1, and c / n (where n is the number of rows of a) if c > 1. [:, None] turns it into a column vector so that it broadcasts well with u
Finally, u * p is indexed back to the original row locations by [inv]
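As a quick check, here is f run on the example list from the question, converted to a 2-D array (a sketch; the only change from the code above is an added reshape on inv, to guard against inverse-shape differences across NumPy versions):

```python
import numpy as np

def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    # reshape guards against inverse-shape differences across NumPy versions
    return (u * p)[inv.reshape(-1)]

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
result = f(a)
# rows 0 and 2 become [4/3, 2, 2/3]; row 1 stays [5, 65, -1]
```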
You can use numpy.unique with return_counts=True:
elements, count = np.unique(a, axis=0, return_counts=True)
return_counts=True makes it also return the number of occurrences of each unique row.
The output looks like this:
(array([[ 2, 3, 1],
[ 5, 65, -1]]), array([2, 1]))
Then you can multiply them like this :
(count * elements.T).T
Output :
array([[ 4, 6, 2],
[ 5, 65, -1]])
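That product gives the per-group sums; to reach the expected output you would still divide by the number of vectors and scatter the averaged rows back using return_inverse. A minimal sketch combining those steps:

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
elements, inverse, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)

averaged = (count[:, None] * elements) / len(a)   # sum of duplicates / list size
averaged[count == 1] = elements[count == 1]       # leave non-duplicates untouched
result = averaged[inverse.reshape(-1)]            # scatter back to original positions
# result: rows 0 and 2 are [4/3, 2, 2/3]; row 1 stays [5, 65, -1]
```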
Apologies if this is a simple question - I'm new to Python and numpy - but I'd be very grateful for your help.
I've got a 2D numpy array of data arranged in rows and columns, with the first column representing time, and subsequent columns showing values of different parameters at each point in time.
I want to read down a given column of data from top to bottom (i.e. time = 0 to time = number of rows), and test each element in that column in sequence to find the very first instance (and only the first instance) where the data values in that column meet given criteria.
This is different to testing 'all' or 'any' of the elements in a column 'all at once' by testing and iterating using the numpy arange() function.
As a minimal working example in pseudocode, if my array is:
myarray =
[[1, 4 ....]
[2, 3 ....]
[3, 8 ....]
[4, 9 ....]....]
...where the first column is time, and the second column contains the values of data collected at each time point.
I want to be able to iterate over the rows in sequence from top to bottom and test:
threshold = 5
for row = 0 to number of rows:
    if data in [column 1, row] > threshold:
        print "The first time point at which the data exceed the threshold is at time = 3"
        break
What is the most Pythonic (i.e. efficient and intelligible) way of doing this?
Is it necessary to convert the array into a list before iterating & testing, or is it possible to sequentially iterate and test over the array directly?
Hope this makes some sort of sense...
Many thanks in anticipation
Dave
Try this code:
>>> myarray = [[1, 4], [2, 3], [3, 8], [4, 9]]
>>> stop = False
>>> for row in myarray:
...     for d in row:
...         if d > 5:
...             print("Row: ", row, "Data: ", d)
...             stop = True
...             break
...     if stop:
...         break
...
('Row: ', [3, 8], 'Data: ', 8)
>>>
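If the data are in a numpy array, a vectorized alternative (a sketch; it tests only the data column, and relies on np.argmax returning the index of the first True in a boolean mask) is:

```python
import numpy as np

myarray = np.array([[1, 4], [2, 3], [3, 8], [4, 9]])
threshold = 5

mask = myarray[:, 1] > threshold      # test only the data column
if mask.any():                        # guard: argmax on an all-False mask returns 0
    first = np.argmax(mask)           # index of the first row exceeding the threshold
    print("First exceeded at time =", myarray[first, 0])
# prints: First exceeded at time = 3
```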
I have two 2D arrays, e.g.,
A = [[1,0],[2,0],[3,0],[4,0]]
B = [[2,0.3],[4,0.1]]
Although the arrays are much larger, with A about 10x the size of B, and about 100,000 rows in A. I want to replace rows in A with the row in B whenever the 1st elements of the rows match, and leave the other rows in A unchanged. In the above example, I want to end up with:
[[1,0],[2,0.3],[3,0],[4,0.1]]
How do I do this, preferably efficiently?
We will have to iterate through the entire array A once in any case, since we are transforming it. What we can speed up, though, is the lookup of whether a particular first element of A exists in B. To that end, it is efficient to build a dictionary out of B; that way, lookup is constant time. I am assuming here that the first element of each row of A matches at most one element of B.
Transforming B to a dict can be done this way:
transformed_B = { item[0]: item[1] for item in B}
Replacing the elements in A could then be done with:
transformed_A = [[item[0], transformed_B[item[0]]] if item[0] in transformed_B else item for item in A]
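With the example arrays from the question, this produces the desired result:

```python
A = [[1, 0], [2, 0], [3, 0], [4, 0]]
B = [[2, 0.3], [4, 0.1]]

transformed_B = {item[0]: item[1] for item in B}  # first element -> replacement value
transformed_A = [[item[0], transformed_B[item[0]]] if item[0] in transformed_B
                 else item for item in A]
print(transformed_A)
# [[1, 0], [2, 0.3], [3, 0], [4, 0.1]]
```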
Another option is to sort the smallest array and use binary search to find matching values. You can do this in a vectorized manner as follows:
a = np.zeros((1000, 2))
b = np.zeros((100, 2))
a[:, 0] = np.random.randint(200, size=(1000,))
b[:, 0] = np.random.choice(np.arange(100), size=(100,), replace=False)
b[:, 1] = np.random.rand(100)
# sort and binary search
b_sort = b[np.argsort(b[:, 0])]
idx = np.searchsorted(b_sort[:, 0], a[:, 0])
# don't look at indices larger than largest possible in b_sort
mask = idx < b.shape[0]
# check whether the value at the returned index really is the same
mask[mask] &= b_sort[idx[mask], 0] == a[:, 0][mask]
# copy the second column for positions fulfilling both conditions
a[:, 1][mask] = b_sort[idx[mask], 1]
# only values < 100 should have a second column != 0
>>> a
array([[ 7.40000000e+01, 5.38114946e-01],
[ 8.80000000e+01, 9.21309165e-01],
[ 8.60000000e+01, 1.86336715e-01],
...,
[ 1.88000000e+02, 0.00000000e+00],
[ 5.00000000e+00, 3.81152557e-01],
[ 1.38000000e+02, 0.00000000e+00]]
)
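Applied to the small arrays from the question, the same steps look like this (a sketch; A is cast to float so it can hold the fractional values from B):

```python
import numpy as np

A = np.array([[1, 0], [2, 0], [3, 0], [4, 0]], dtype=float)
B = np.array([[2, 0.3], [4, 0.1]])

b_sort = B[np.argsort(B[:, 0])]                   # sort B by its first column
idx = np.searchsorted(b_sort[:, 0], A[:, 0])      # candidate positions in b_sort
mask = idx < B.shape[0]                           # drop out-of-range indices
mask[mask] &= b_sort[idx[mask], 0] == A[mask, 0]  # keep only exact matches
A[mask, 1] = b_sort[idx[mask], 1]                 # copy matching second-column values
# A is now [[1, 0], [2, 0.3], [3, 0], [4, 0.1]]
```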
I have two numpy arrays, both M by N. X contains random values. Y contains booleans. Array A contains indices of rows in X whose entries need to be replaced with the value -1. I want to replace values only where Y is True.
Here is some code to do that:
M=30
N=40
X = np.zeros((M,N)) # random values, but 0s work too
Y = np.where(np.random.rand(M,N) > .5, True, False)
A=np.array([ 7, 8, 10, 13]), # in my setting, it's (1,4), not (4,)
for i in A[0]:
    X[i][Y[A][i] == True] = -1
However, what I actually want is to replace only some of the entries. List B contains how many need to be replaced for each index in A. It's already ordered, so A[0][0] corresponds to B[0], etc. Also, it's guaranteed that if B[i] = k, then the corresponding row in Y has at least k Trues.
B = [1,2,1,1]
Then for each index i (in loop),
X[i][Y[A][i]==True][0:B[i]] = -1
This doesn't work. Any ideas on a fix?
Unfortunately, I don't have an elegant answer; however, this works:
M=30
N=40
X = np.zeros((M,N)) # random values, but 0s work too
Y = np.where(np.random.rand(M,N) > .5, True, False)
A=np.array([ 7, 8, 10, 13]), # in my setting, it's (1,4), not (4,)
B = [1,2,1,1]
# position in row where X should equal - 1, i.e. X[7,a0], X[8,a1], etc
a0=np.where(Y[7]==True)[0][0]
a1=np.where(Y[8]==True)[0][0]
a2=np.where(Y[8]==True)[0][1]
a3=np.where(Y[10]==True)[0][0]
a4=np.where(Y[13]==True)[0][0]
# For each row (i) indexed by A, take only B[i] entries where Y[i]==True. Assume these indices in X = -1
for i in range(len(A[0])):
    X[A[0][i]][(Y[A][i] == True).nonzero()[0][0:B[i]]] = -1
np.sum(X) # should be -5
X[7,a0]+X[8,a1]+X[8,a2]+X[10,a3]+X[13,a4] # should be -5
It is not clear what you want to do, here is my understanding:
import numpy as np

m, n = 30, 40
x = np.zeros((m, n))
y = np.random.rand(m, n) > 0.5  # no need for where here
a = np.array([7, 8, 10, 13])
x[a] = np.where(y[a], -1, x[a])  # need where here
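To additionally honor B (replace only the first B[i] True entries in each indexed row), one vectorized sketch uses a cumulative count of Trues along each row. It assumes, as the question states, that each indexed row of Y has at least B[i] True entries; the small arrays here are made-up illustration data:

```python
import numpy as np

X = np.zeros((4, 5))
Y = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 1, 0, 0, 1]], dtype=bool)
A = np.array([0, 1, 3])
B = np.array([1, 2, 2])

rows = Y[A]                         # the rows of Y selected by A
rank = np.cumsum(rows, axis=1)      # running count of Trues in each row
mask = rows & (rank <= B[:, None])  # keep only the first B[i] Trues per row
sub = X[A]                          # fancy indexing returns a copy, so edit a copy...
sub[mask] = -1
X[A] = sub                          # ...and write it back
# X now has exactly B.sum() == 5 entries equal to -1
```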