I have the following numpy array
u = np.array([a1, b1, a2, b2, ..., an, bn])
where I would like to take the differences between successive a elements and successive b elements, ending up with a numpy array:
u_result = np.array([(a2-a1), (b2-b1), (a3-a2), (b3-b2), ..., (an-a_(n-1)), (bn-b_(n-1))])
How can I do this without too much array splitting and without for loops? I'm using this in a larger loop, so ideally I would like to do this efficiently (and learn something new).
(I hope the indexing of the resulting array is clear)
Or simply perform a subtraction:
u = np.array([3, 2, 5, 3, 7, 8, 12, 28])
u[2:] - u[:-2]
Output:
array([ 2, 1, 2, 5, 5, 20])
You can use ravel to rearrange the result to match the layout of your original vector if needed.
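For reference, a sketch of an equivalent formulation (assuming an even-length u of interleaved pairs) that reshapes into (n, 2) pairs and differentiates along the pair axis:
import numpy as np

u = np.array([3, 2, 5, 3, 7, 8, 12, 28])   # [a1, b1, a2, b2, ...]
pairs = u.reshape(-1, 2)                   # row i holds (a_i, b_i)
u_result = np.diff(pairs, axis=0).ravel()  # [a2-a1, b2-b1, a3-a2, ...]
print(u_result)                            # [ 2  1  2  5  5 20]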
Short answer:
u_r = np.ravel([np.diff(u[::2]),
                np.diff(u[1::2])], 'F')
And here is a longer, more detailed explanation:
1) Separate the a and b elements in u; this can be achieved with slice indexing.
2) Differentiate a and b; np.diff keeps the code simple.
3) Ravel the differentiated values back together.
#------- Create u ---------------
import numpy as np
a_aux = np.array([50,49,47,43,39,34,28])
b_aux = np.array([1,2,3,4,5,6,7])
u = np.ravel([a_aux, b_aux], 'F')
print(u)
# [50  1 49  2 47  3 43  4 39  5 34  6 28  7]
#--------------------------------
#1)
# get a as the elements with index 0, 2, 4, ...
a = u[::2]
# get b as the elements with index 1, 3, 5, ...
b = u[1::2]
#2)
# differentiate
ad = np.diff(a)
bd = np.diff(b)
#3)
# ravel, interleaving one element from each array
u_result = np.ravel([ad, bd], 'F')
print(u_result)
# [-1  1 -2  1 -4  1 -4  1 -5  1 -6  1]
You can try it this way. First, split the a and b elements apart using array[::2] and array[1::2]. Then subtract the a elements from the b elements (array[1::2] - array[::2]).
import numpy as np
array = np.array([7, 8, 9, 6, 5, 2])
u_result = array[1::2] - array[::2]
print(u_result)
# [ 1 -3 -3]
Looks like you need to use np.roll:
shift = 2
u = np.array([1, 11, 2, 12, 3, 13, 4, 14])
shifted_u = np.roll(u, -shift)
(shifted_u - u)[:-shift]
Returns:
array([1, 1, 1, 1, 1, 1])
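Note that np.roll returns a copy of the input array, so for very large u the plain slicing approach u[2:] - u[:-2] shown earlier avoids an extra allocation.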
I currently have some code where I've created a mask which checks whether a variable matches the first position in a sequence, called index_pos_overload. If it matches, the variable is chosen, and the check ends. However, I want to use this mask not only to check whether the number satisfies the condition of the mask, but also, if it doesn't, to move along to the next value in the sequence which does. It's essentially to pick out a row in my pandas data column, hyst. My code currently looks like this:
import pandas as pd
from itertools import chain
hyst = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2 ,1, 5, 2]})
possible_overload_cycle = 1
index_pos_overload = chain.from_iterable((hyst.index[i])
                                         for i in range(0, len(hyst)-1, 5))
if (possible_overload_cycle == index_pos_overload):
    hyst_overload_cycle = possible_overload_cycle
else:
    hyst_overload_cycle = 5 #next value in iterable where index_pos_overload is true
The expected output of hyst_overload_cycle should be this:
print(hyst_overload_cycle)
5
I've included my logic as to how I think this should work - possible_overload_cycle = 1 does not point to the first position in the dataframe, so hyst_overload_cycle should be returned as 5, the first position in the mask. I hope I've made sense, as I can't quite seem to work out how I would go about this programmatically.
If I understood you correctly, it may be simpler than you think:
- index_pos_overload can be an array / list; there is no need to use complex constructs to store a sequence of values
- to find the first non-zero value in index_pos_overload, you can simply use np.nonzero(index_pos_overload)[0][0] (the first [0] selects the dimension, the second selects the index within that axis) and then use that to index the original index_pos_overload array
The code would look like:
import numpy as np
import pandas as pd
hyst = pd.DataFrame({"test":[12, 4, 5, 4, 1, 3, 2, 5, 10, 9, 7, 5, 3, 6, 3, 2 ,1, 5, 2]})
possible_overload_cycle = 1
index_pos_overload = np.array([hyst.index[i] for i in range(0, len(hyst)-1, 5)])
if possible_overload_cycle in index_pos_overload:
    hyst_overload_cycle = possible_overload_cycle
else:
    hyst_overload_cycle = index_pos_overload[np.nonzero(index_pos_overload)[0][0]]
print(hyst_overload_cycle)
# 5
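For the example data, index_pos_overload comes out as array([0, 5, 10, 15]). Since possible_overload_cycle = 1 is not in it, np.nonzero(index_pos_overload)[0] is array([1, 2, 3]) (index 0 holds the value 0, which is excluded), so the entry selected is index_pos_overload[1] == 5.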
Numpy has a repeat function that repeats each element of an array a given (per element) number of times.
I want to implement a function that does a similar thing but repeats not individual elements, but variably sized blocks of consecutive elements. Essentially I want the following function:
import numpy as np

def repeat_blocks(a, sizes, repeats):
    b = []
    start = 0
    for i, size in enumerate(sizes):
        end = start + size
        b.extend([a[start:end]] * repeats[i])
        start = end
    return np.concatenate(b)
For example, given
a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])
then
repeat_blocks(a, sizes, repeats)
returns
array([ 0, 1, 2,
0, 1, 2,
3, 4, 5, 6, 7,
3, 4, 5, 6, 7,
3, 4, 5, 6, 7,
8, 9,
8, 9,
10, 11, 12, 13, 14, 15,
16, 17, 18, 19,
16, 17, 18, 19,
16, 17, 18, 19 ])
I want to push these loops into numpy in the name of performance. Is this possible? If so, how?
Here's one vectorized approach using cumsum -
# Get repeats for each group using group lengths/sizes
r1 = np.repeat(np.arange(len(sizes)), repeats)
# Get total size of output array, as needed to initialize output indexing array
N = (sizes*repeats).sum() # or np.dot(sizes, repeats)
# Initialize the indexing array with ones, as we need to set up incremental
# indexing within each group when cumulatively summed at the final stage.
# Two steps here:
# 1. Within each group we have multiple sequences, so offset the start of
#    each sequence by the lengths of the sequences preceding it.
id_ar = np.ones(N, dtype=int)
id_ar[0] = 0
insert_index = sizes[r1[:-1]].cumsum()
insert_val = (1-sizes)[r1[:-1]]
# 2. For each group, make sure the indexing starts from the next group's
# first element. So, simply assign 1s there.
insert_val[r1[1:] != r1[:-1]] = 1
# Assign index-offseting values
id_ar[insert_index] = insert_val
# Finally index into input array for the group repeated o/p
out = a[id_ar.cumsum()]
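For convenience, here is a sketch collecting the steps above into one function (the name repeat_blocks_vec is mine) and running it on the example from the question:
import numpy as np

def repeat_blocks_vec(a, sizes, repeats):
    # The cumsum-based recipe from above, wrapped into a function.
    r1 = np.repeat(np.arange(len(sizes)), repeats)
    N = (sizes * repeats).sum()
    id_ar = np.ones(N, dtype=int)
    id_ar[0] = 0
    insert_index = sizes[r1[:-1]].cumsum()
    insert_val = (1 - sizes)[r1[:-1]]
    insert_val[r1[1:] != r1[:-1]] = 1
    id_ar[insert_index] = insert_val
    return a[id_ar.cumsum()]

a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])
print(repeat_blocks_vec(a, sizes, repeats))
# matches the expected output listed in the question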
This function is a great candidate to speed up using Numba:
import numba
import numpy as np

@numba.njit
def repeat_blocks_jit(a, sizes, repeats):
    out = np.empty((sizes * repeats).sum(), a.dtype)
    start = 0
    oi = 0
    for i, size in enumerate(sizes):
        end = start + size
        for rep in range(repeats[i]):
            oe = oi + size
            out[oi:oe] = a[start:end]
            oi = oe
        start = end
    return out
This is significantly faster than Divakar's pure NumPy solution, and a lot closer to your original code. I made no effort at all to optimize it. Note that np.dot() and np.repeat() can't be used here, but that doesn't matter when all the code gets compiled.
Plus, since it is njit, meaning "nopython" mode, you can even use @numba.njit(nogil=True) and get a multicore speedup if you have many of these calls to make.
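A quick sketch of how one might exercise it with the example arrays from the question (note the first call includes JIT compilation time):
a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])
print(repeat_blocks_jit(a, sizes, repeats))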
I am trying to make a dict that could hold some array snippets
like [127:130, 122:124], but dict = {1:[127:130, 122:124], 2:[127:129, 122:123]} doesn't work.
Is there a way to do this? It doesn't need to be a dict, but I want a bunch of these areas to be callable.
So I have 256x256 arrays and I want to select small areas in them for some calculations:
fft[127:130, 122:124]
It would be great if the whole part between the brackets could be stored in a dict.
You could use the slice function. It returns a slice object that can be stored in a dictionary; store each pair as a tuple so it can be used directly as a multidimensional index (modern numpy rejects a list of slices here), e.g.:
slice_1 = slice(127, 130)
slice_2 = slice(122, 124)
slice_a = slice(127, 129)
slice_b = slice(122, 123)
d = {1: (slice_1, slice_2),
     2: (slice_a, slice_b)}
x = fft[d[1]] # Same as fft[127:130, 122:124]
y = fft[d[2]] # Same as fft[127:129, 122:123]
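As an aside, numpy also ships np.s_, which builds the same tuples of slice objects straight from bracket syntax; a small sketch (fft here is a stand-in zero array, since the question's data isn't shown):
import numpy as np

fft = np.zeros((256, 256))        # stand-in for the real data
d = {1: np.s_[127:130, 122:124],  # tuple of slice objects
     2: np.s_[127:129, 122:123]}
x = fft[d[1]]                     # same as fft[127:130, 122:124]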
Slicing numpy arrays returns a view, not a copy; maybe this is what you are looking for?
import numpy
a = numpy.arange(10)
b = a[3:6] # array([3, 4, 5])
a[4] = 0
#b is now array([ 3, 0, 5])
b[1] = 1
#a is now array([0, 1, 2, 3, 1, 5, 6, 7, 8, 9])
I want to extract a slice of length 10, beginning at index 2, of a numpy array A:
import numpy
A = numpy.array([1, 3, 5, 3, 9])

def bigslice(A, begin_at, length):
    a = A[begin_at:begin_at + length]
    while len(a) + len(A) < length:
        a = numpy.concatenate((a, A))
    return numpy.concatenate((a, A[:length - len(a)]))

print(bigslice(A, begin_at=2, length=10))
# [5 3 9 1 3 5 3 9 1 3]
This is correct. But I'm looking for a more efficient way to do this (especially when I'll have arrays of thousands of elements in the end): I suspect the concatenate used here creates lots of new temporary arrays, which would be inefficient.
How can I do the same thing more efficiently?
Since the middle part of the result is already known to you (i.e. whole repetitions of the full array), you can simply construct the middle portion using np.tile:
def cyclical_slice(A, start, length):
    arr_l = len(A)
    head = A[start:start + length]           # partial first pass
    remaining = length - len(head)
    middle = np.tile(A, remaining // arr_l)  # whole repetitions of A
    tail = A[:remaining % arr_l]             # partial last pass
    return np.concatenate([head, middle, tail])
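With the example from the question, cyclical_slice(A, 2, 10) then returns array([5, 3, 9, 1, 3, 5, 3, 9, 1, 3]), matching bigslice.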
Your code doesn't seem to guarantee that you get a slice of length length, e.g.
>>> A = numpy.array([1,3,5,3,9])
>>> bigslice(A, 0, 3)
array([1, 3, 5, 3, 9, 1, 3, 5])
Assuming that this is an oversight, maybe you could use np.pad, e.g.
import numpy as np

def wpad(A, begin_at, length):
    to_pad = max(length + begin_at - len(A), 0)
    return np.pad(A, (0, to_pad), mode='wrap')[begin_at:begin_at + length]
which gives
>>> wpad(A, 0, 3)
array([1, 3, 5])
>>> wpad(A, 0, 10)
array([1, 3, 5, 3, 9, 1, 3, 5, 3, 9])
>>> wpad(A, 2, 10)
array([5, 3, 9, 1, 3, 5, 3, 9, 1, 3])
and so on.
I'm looking to quickly (hopefully without a for loop) generate a Numpy array of the form:
array([a,a,a,a,0,0,0,0,0,b,b,b,0,0,0, c,c,0,0....])
Where a, b, c and other values are repeated at different points for different ranges. I'm really thinking of something like this:
import numpy as np
a = np.zeros(100)
a[0:3,9:11,15:16] = np.array([a,b,c])
Which obviously doesn't work. Any suggestions?
Edit (jterrace answered the original question):
The data is coming in the form of an N*M Numpy array. Each row is mostly zeros, occasionally interspersed with sequences of non-zero numbers. I want to replace all elements of each such sequence with the last value of the sequence. I'll take any fast method to do this! Using where and diff a few times, we can get the start and stop indices of each run.
raw_data = array([[.....], [....]])
starts = (array([0, 0, 0, 1, 1, 1, 1, ...]), array([3, 9, 32, 7, 22, 45, 57, ...]))
stops = (array([0, 0, 0, 1, 1, 1, 1, ...]), array([5, 12, 50, 10, 30, 51, 65, ...]))
last_values = raw_data[stops]
length_to_repeat = stops[1]-starts[1]
Note that starts[0] and stops[0] are the same information (which row the run is occurring on). At this point, since the only route I know of is what jterrace suggest, we'll need to go through some contortions to get similar start/stop positions for the zeros, then interleave the zero start/stop with the values start/stops, and interleave the number 0 with the last_values array. Then we loop over each row, doing something like:
for i in range(N):
    values_in_this_row = where(starts[0] == i)[0]
    output[i] = numpy.repeat(last_values[values_in_this_row], length_to_repeat[values_in_this_row])
Does that make sense, or should I explain some more?
If you have the values and repeat counts fully specified, you can do it this way:
>>> import numpy
>>> values = numpy.array([1,0,2,0,3,0])
>>> counts = numpy.array([4,5,3,3,2,2])
>>> numpy.repeat(values, counts)
array([1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 3, 3, 0, 0])
You can use numpy.r_ (the output below assumes a, b, c = 1, 2, 3):
>>> np.r_[[a]*4, [b]*3, [c]*2]
array([1, 1, 1, 1, 2, 2, 2, 3, 3])