Splitting up an array in Python into sub-arrays

I have an array containing the numbers 1 to 160. I want to split it into 10 arrays of sixteen numbers each. This is what I have so far:
import numpy as np

amplitude = []
for i in range(0, 160):
    amplitude.append(i + 1)
print(amplitude)
# split arrays up into a line for each sample
traceno = 10  # number of traces in file
samplesno = 16  # number of samples in each trace. This won't change.
amplitude_split = np.zeros((traceno, samplesno), dtype=int)
# fill in the arrays with amplitude/sample numbers
for i in range(len(amplitude)):
    for j in range(traceno):
        for k in range(samplesno):
            amplitude_split[j, k] = amplitude[i]
print(amplitude_split[1, :])
As an output I only get [160 160 160 160 160 160 160 160 160 160 160 160 160 160 160 160]
Where I require something along the lines of:
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
[17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
etc...

You are nesting the loops, so for every value amplitude[i] the two inner loops overwrite the whole new array with that same number. After the last outer iteration, every cell holds the last value, 160.
You only need to copy the list into a 1D NumPy array and then reshape it:
amplitude_split=np.array(amplitude, dtype=int).reshape((traceno,samplesno))
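If you would rather keep explicit loops, the fix is to drop the outer loop over amplitude and compute which element belongs in cell (j, k). A minimal sketch, assuming the row-major layout asked for in the question (row j holds the elements starting at flat position j * samplesno):

```python
import numpy as np

amplitude = list(range(1, 161))
traceno = 10    # number of traces (rows)
samplesno = 16  # number of samples per trace (columns)

amplitude_split = np.zeros((traceno, samplesno), dtype=int)
for j in range(traceno):
    for k in range(samplesno):
        # Row j starts at flat position j * samplesno; k is the offset within that row
        amplitude_split[j, k] = amplitude[j * samplesno + k]

print(amplitude_split[0, :])
print(amplitude_split[1, :])
```

Each cell is written exactly once, which is what the reshape does for you in a single call.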

Well, if we're using Numpy arrays, we can use Numpy functionality:
amplitude = np.arange(1, 161)
amplitude_split = amplitude.reshape(10, 16)
Otherwise, you've already been linked to how to do it for plain lists, but I'd like to point out that you still don't need a loop to fill amplitude in the first place:
amplitude = list(range(1, 161))
In general, with Python you should try hard not to think in terms of starting with an initially blank "storage" area that you then fill in. Just create the data you want directly (by conversions of the sort above, by list comprehensions etc., or if necessary by calling .append()) rather than overwriting a dummy value.
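For plain lists, slicing in steps of 16 gives the same grouping without NumPy. A minimal sketch; the name chunks is illustrative:

```python
amplitude = list(range(1, 161))
samplesno = 16  # elements per sub-list

# Each slice amplitude[i:i + samplesno] is one row; i steps through 0, 16, 32, ...
chunks = [amplitude[i:i + samplesno] for i in range(0, len(amplitude), samplesno)]

print(len(chunks))  # 10
print(chunks[1])    # second row: 17 through 32
```

If the length is not a multiple of 16, the last sub-list is simply shorter; the itertools grouper recipe pads it instead.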

See grouper in https://docs.python.org/2/library/itertools.html#recipes
from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
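In Python 3 the recipe uses itertools.zip_longest (izip_longest was its Python 2 name). It works because [iter(iterable)] * n holds n references to one single iterator, so zip_longest pulls n consecutive items into each tuple. A self-contained sketch applying it to the 1 to 160 list:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to the same iterator: each output tuple consumes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

rows = list(grouper(range(1, 161), 16))
print(len(rows))  # 10
print(rows[0])    # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
print(list(grouper('ABCDEFG', 3, 'x')))  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

The result is a list of tuples; wrap it in np.array(rows) if you need a 10x16 array.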


Select only values below 50 from array, add 5 then multiply by 2. The other values should remain unchanged

I have a NumPy array that I got using
array = np.arange(2,201,2).reshape(25,4)
which gave me this:
[[  2   4   6   8]
 [ 10  12  14  16]
 [ 18  20  22  24]
 [ 26  28  30  32]
 [ 34  36  38  40]
 [ 42  44  46  48]
 [ 50  52  54  56]
 [ 58  60  62  64]
 [ 66  68  70  72]
 [ 74  76  78  80]
 [ 82  84  86  88]
 [ 90  92  94  96]
 [ 98 100 102 104]
 [106 108 110 112]
 [114 116 118 120]
 [122 124 126 128]
 [130 132 134 136]
 [138 140 142 144]
 [146 148 150 152]
 [154 156 158 160]
 [162 164 166 168]
 [170 172 174 176]
 [178 180 182 184]
 [186 188 190 192]
 [194 196 198 200]]
But now I'm instructed to select only the values below 50 from "array", add 5 to these values, and then multiply by 2. The other values should remain unchanged, and everything should be saved as "array". This is a school assignment, so I don't have the expected output, but the result should be an array with the same 25x4 shape in which the first 6 rows (the values 2 through 48, i.e. those under 50) are changed and all other values stay the same. I've tried the following code:
for i in array:
    if array < 50:
        print((i+5)*2)
    else:
        print(i)
and I'm getting an error that says -
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Any help would be greatly appreciated, since I can't find any other articles with similar questions.
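The error comes from the test if array < 50:. Comparing a NumPy array with a number does not produce one True/False; it produces a boolean array of the same shape, and if cannot decide whether such an array is "true". A small sketch reproducing it (the names row, mask and raised are illustrative):

```python
import numpy as np

row = np.array([2, 4, 60, 80])
mask = row < 50
print(mask)  # [ True  True False False]

raised = False
try:
    if mask:  # ambiguous: two elements are True, two are False
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Explicit reductions collapse the array to a single bool and are unambiguous
print(mask.any(), mask.all())  # True False
```

This is why the test has to be made per element (as in the loops below) or expressed as a mask over the whole array.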
There are two ways to address this question: a Python one and a NumPy one (NumPy is not Python...).
Python way:
You have a sequence of sequences (the rows). You can use a double loop to test the values one at a time and replace the ones that need it:
for row in array:  # iterate over the rows
    for i, val in enumerate(row):  # then the values in the row
        if val < 50:  # test them
            row[i] = (val + 5) * 2  # and replace
This works as long as the outer iteration gives you direct access to the row container (not a copy). That is true for both Python lists and NumPy arrays, but may not be guaranteed for every type of container. The safest way is to keep the indexes and modify array directly:
for i in range(len(array)):
    for j in range(len(array[i])):
        if array[i][j] < 50:
            array[i][j] = (array[i][j] + 5) * 2
Numpy way:
The power of NumPy is to provide high-speed iteration over its arrays; in NumPy terminology this is called vectorization. You should first extract the relevant indexes and then change the values in one single vectorized operation:
ix = np.where(array < 50)
array[ix] = (array[ix] + 5) * 2
For large arrays, this second way should be at least an order of magnitude faster than the first one.
For your question, the correct way is the one that matches your current lesson, either Python or numpy...
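In the NumPy way, np.where is not strictly needed for indexing: the boolean array array < 50 can itself be used as an index. A sketch of two equivalent vectorized variants (the names mask, in_place and selected are illustrative); the second uses the three-argument form of np.where, which builds a new array instead of modifying one in place:

```python
import numpy as np

array = np.arange(2, 201, 2).reshape(25, 4)

# Boolean mask: True wherever the value is below 50 (same 25x4 shape)
mask = array < 50

# Variant 1: update only the masked cells (done on a copy here to compare both variants)
in_place = array.copy()
in_place[mask] = (in_place[mask] + 5) * 2

# Variant 2: pick the transformed value where mask is True, the original elsewhere
selected = np.where(mask, (array + 5) * 2, array)

print(np.array_equal(in_place, selected))  # True
print(selected[:7])
```

For the assignment itself, array[mask] = (array[mask] + 5) * 2 modifies array directly.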
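import numpy as np
array = np.arange(2,201,2).reshape(25,4)
# the nested comprehension keeps the 25x4 row structure; np.array turns it back into an array
array = np.array([[(element+5)*2 if element < 50 else element for element in innerList] for innerList in array])
print(array)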

How to vectorize more than one vector in Julia?

I want to write vectorized-style code in Julia, defining a function that takes more than one vector as arguments, like below.
using PyPlot;
m=[453 21 90;34 1 44;13 553 66]
a = [1,2,3]
b=[1,2,3]
f(x,y) = m[x,y]
f.(a,b)
#= expected result
3×3 Matrix{Int64}:
453 21 90
34 1 44
13 553 66
=#
[real result]
3-element Vector{Int64}:
453
1
66
The dot notation pairs the elements of a and b one by one, so it only picks m[1,1], m[2,2] and m[3,3] (the diagonal) and makes a vector with just 3 elements instead of a 3 x 3 matrix.
How can I write to get the expected result?
Any information would be appreciated.
One of the two vectors needs to be a row vector so that Julia understands what you want to do. This simple example should help you understand Julia broadcasting:
julia> [1,2,3] .+ [10,20,30] # both have the same dimensions
3-element Vector{Int64}:
11
22
33
julia> [1,2,3]' .+ [10,20,30]
# first has dimensions (1,3) and second (3,1) => result is dimension (3,3)
3×3 Matrix{Int64}:
11 12 13
21 22 23
31 32 33
You're looking for
julia> f.(a, b')
3×3 Matrix{Int64}:
453 21 90
34 1 44
13 553 66
Note the relevant section in the documentation for broadcast (type ?broadcast into a REPL session to access it):
Singleton and missing dimensions are expanded to match the extents of the other arguments by virtually repeating the value.
a is treated as a 3x1 matrix (but has the type Vector{T}), while b' is used as a 1x3 matrix (with the type Adjoint{T, Vector{T}}). These are broadcast to the resulting 3x3 matrix.
When using a and b directly, no expansion of dimensions is necessary, and you end up with a 3-element vector (effectively 3x1).
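NumPy follows the same rule (singleton dimensions are expanded to match the other arguments), which can help when comparing the two. A minimal Python sketch of the same example, assuming 0-based indexing (hence the - 1) and using [:, None] / [None, :] to make a column and a row:

```python
import numpy as np

m = np.array([[453, 21, 90], [34, 1, 44], [13, 553, 66]])
a = np.array([1, 2, 3])  # 1-based positions, as in the Julia example
b = np.array([1, 2, 3])

# Shapes (3,) and (3,) already match: elements are paired one by one,
# so only m[0, 0], m[1, 1], m[2, 2] (the diagonal) are picked
diag = m[a - 1, b - 1]

# Column (3, 1) against row (1, 3): the singleton dimensions are expanded,
# giving every (row, column) combination, i.e. a 3x3 result
grid = m[(a - 1)[:, None], (b - 1)[None, :]]

print(diag)
print(grid)
```

The column/row split plays the same role as a versus b' in f.(a, b').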

Send an array with specific size to workers

I want to send an array of size 336, split into parts, to my 8 workers. I want workers 0-7 to get the sizes 12, 18, 30, 36, 48, 54, 66 and 72, i.e. alternately adding 6 and 12. So far I have only managed to cut the array into 10 pieces.
This is what I came up with:
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
v = np.random.rand(100, 1)  # array
if rank == 0:
    # Process to send data to the different processes. Just send even chunks to the processes.
    for i in range(1, size):
        v_splitted = [np.array_split(v, 10)[i-1]]
        comm.send(v_splitted, dest=i, tag=i)
else:
    # worker processes: each receives data from the master process
    data = comm.recv(source=0, tag=rank)
How do I make sure that each worker gets the desired size?
You can use accumulate from itertools and zip to build a list of slices, then use those to break your array into chunks of the desired sizes:
from itertools import accumulate

sizes = [12,18,30,36,48,54,66,72] # or [*accumulate([12,6]*4)]
breaks = [*accumulate(sizes)]
slices = [slice(s,e) for s,e in zip([0]+breaks,breaks)]
v = list(range(336))
for i,chunk in enumerate(slices):
    print(len(v[chunk]),":",*v[chunk][:3],"...",*v[chunk][-3:])
    # comm.send(v[chunk], dest=i, tag=i)
output:
12 : 0 1 2 ... 9 10 11
18 : 12 13 14 ... 27 28 29
30 : 30 31 32 ... 57 58 59
36 : 60 61 62 ... 93 94 95
48 : 96 97 98 ... 141 142 143
54 : 144 145 146 ... 195 196 197
66 : 198 199 200 ... 261 262 263
72 : 264 265 266 ... 333 334 335
how it works
The breaks list contains the cumulative numbers of items that are processed at the end of each chunk:
[12, 30, 60, 96, 144, 198, 264, 336]
These numbers correspond to the end of index ranges that would represent each chunk of data. To obtain the start of these ranges, we simply need to pair each end value with the end value of the preceding chunk (the first chunk starting at zero):
starts (s): [0] + [12, 30, 60, 96, 144, 198, 264, 336]
ends   (e):       [12, 30, 60, 96, 144, 198, 264, 336]
ranges:     (0,12), (12,30), (30,60) ... (264,336)
This is what the slices variable contains, except that, to facilitate later use, it holds slice() objects instead of range() objects. The slice objects can be used directly as subscripts on the list or array containing the data (e.g. v[slice]). The zip() function pairs each end value with the previous end (i.e. the start), which is obtained by offsetting breaks with one extra leading entry (zero); because zip() stops at the shorter input, the final 336 in the starts list is simply ignored.
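If the data is already a NumPy array, np.split accepts the same cumulative break points directly. A sketch reusing the sizes from the question; np.split cuts before each given index, so the last cumulative total is dropped:

```python
from itertools import accumulate
import numpy as np

sizes = [12, 18, 30, 36, 48, 54, 66, 72]
breaks = list(accumulate(sizes))  # [12, 30, 60, 96, 144, 198, 264, 336]

v = np.arange(336)
# Split points are where each chunk ends; the final total (336) is the array end, so omit it
chunks = np.split(v, breaks[:-1])

for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk[0], chunk[-1])
```

Each chunks[i] could then be handed to comm.send as in the answer; the MPI wiring is not shown here.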

numpy: take multiple range subsets of the same size

What I'm looking for
import numpy as np

# I have an array
x = np.arange(0, 100)
# I have a size n
n = 10
# I have a random set of numbers
indexes = np.random.randint(n, 100, 10)
# What I want is a matrix where every row i holds the n elements of x just before position indexes[i]
res = np.empty((len(indexes), n), int)
for (i, v) in np.ndenumerate(indexes):
    res[i] = x[v-n:v]
To rephrase the title: I am looking for a way to take multiple subsets (of the same size) of an initial array.
To be clear, this loop-based version works; I just want to know if there is a more elegant, NumPy-style way to achieve this.
The following does what you are asking for. It uses numpy.lib.stride_tricks.as_strided to create a special view on the data which can be indexed in the desired way.
import numpy as np
from numpy.lib import stride_tricks
x = np.arange(100)
k = 10
i = np.random.randint(k, len(x)+1, size=(5,))
xx = stride_tricks.as_strided(x, strides=np.repeat(x.strides, 2), shape=(len(x)-k+1, k))
print(i)
print(xx[i-k])
Sample output:
[ 69 85 100 37 54]
[[59 60 61 62 63 64 65 66 67 68]
[75 76 77 78 79 80 81 82 83 84]
[90 91 92 93 94 95 96 97 98 99]
[27 28 29 30 31 32 33 34 35 36]
[44 45 46 47 48 49 50 51 52 53]]
A bit of explanation. Arrays store not only data but also a small "header" with layout information. Among this information are the strides, which tell NumPy how to translate linear memory into n-dimensional indices. There is one stride per dimension: the offset, in bytes, at which the next element along that dimension can be found. So the strides for a 2d array are (row offset, element offset). as_strided lets you manipulate an array's strides directly; by setting the row offset equal to the element offset, we create a view that looks like
0 1 2 ...
1 2 3 ...
2 3 4 ...
...
Note that no data are copied at this stage; for example, all the 2s refer to the same memory location in the original array, which is why this solution should be quite efficient.
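Since NumPy 1.20, sliding_window_view builds the same kind of overlapping-window view but checks the shape for you (as_strided does not check bounds). A sketch, assuming NumPy 1.20 or later, that also shows a stride-free alternative using a broadcast index array; all names here are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = np.arange(100)
n = 10
rng = np.random.default_rng(0)
indexes = rng.integers(n, len(x) + 1, size=5)  # window end positions (exclusive)

# Read-only view of all length-n windows, shape (len(x) - n + 1, n); no copy is made
windows = sliding_window_view(x, n)
res_view = windows[indexes - n]

# Alternative without stride tricks: a (5, 1) column plus a (n,) row broadcasts to (5, n)
offsets = np.arange(-n, 0)  # -10, -9, ..., -1
res_fancy = x[indexes[:, None] + offsets]

# Reference: the loop from the question
res_loop = np.array([x[v - n:v] for v in indexes])
print(np.array_equal(res_view, res_loop), np.array_equal(res_fancy, res_loop))  # True True
```

The final fancy-indexing step copies the selected rows in both variants; only the windows view itself is copy-free.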

Find multiple maximum values in a 2d array fast

The situation is as follows:
I have a 2D numpy array. Its shape is (1002, 1004). Each element contains a value between 0 and Inf. What I now want to do is find the 1000 largest values and store the corresponding indices in a list named x and a list named y. This is because I want to plot the maximum values, and the indices correspond to the real-time x and y positions of the values.
What I have so far is:
import os
import numpy

x = numpy.zeros(500)
y = numpy.zeros(500)
for idx in range(500):
    x[idx] = numpy.unravel_index(full.argmax(), full.shape)[0]
    y[idx] = numpy.unravel_index(full.argmax(), full.shape)[1]
    full[full == full.max()] = 0.
print(os.times())
Here full is my 2D numpy array. As can be seen from the for loop, I only determine the first 500 maximum values at the moment. This, however, already takes about 5 s. For the 1000 largest values, the user time should ideally be around 0.5 s. I've noticed that a very time-consuming part is setting the previous maximum value to 0 each time. How can I speed things up?
Thank you so much!
If you have NumPy 1.8 or later, you can use the argpartition function or method.
Here's a script that calculates x and y:
import numpy as np
# Create an array to work with.
np.random.seed(123)
full = np.random.randint(1, 99, size=(8, 8))
# Get the indices for the largest `num_largest` values.
num_largest = 8
indices = (-full).argpartition(num_largest, axis=None)[:num_largest]
# OR, if you want to avoid the temporary array created by `-full`:
# indices = full.argpartition(full.size - num_largest, axis=None)[-num_largest:]
x, y = np.unravel_index(indices, full.shape)
print("full:")
print(full)
print("x =", x)
print("y =", y)
print("Largest values:", full[x, y])
print("Compare to: ", np.sort(full, axis=None)[-num_largest:])
Output:
full:
[[67 93 18 84 58 87 98 97]
[48 74 33 47 97 26 84 79]
[37 97 81 69 50 56 68 3]
[85 40 67 85 48 62 49 8]
[93 53 98 86 95 28 35 98]
[77 41 4 70 65 76 35 59]
[11 23 78 19 16 28 31 53]
[71 27 81 7 15 76 55 72]]
x = [0 2 4 4 0 1 4 0]
y = [6 1 7 2 7 4 4 1]
Largest values: [98 97 98 98 97 97 95 93]
Compare to: [93 95 97 97 97 98 98 98]
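argpartition only guarantees that the num_largest candidates come first, not that they are sorted (note the order of "Largest values" above). If a descending order is wanted, a sketch that sorts just those k candidates afterwards, costing O(k log k) on top of the partition:

```python
import numpy as np

np.random.seed(123)
full = np.random.randint(1, 99, size=(8, 8))
num_largest = 8

# Unordered flat indices of the num_largest largest values, as in the answer
indices = (-full).argpartition(num_largest, axis=None)[:num_largest]

# Sort only those candidates by value, largest first
order = np.argsort(-full.ravel()[indices], kind="stable")
indices = indices[order]

x, y = np.unravel_index(indices, full.shape)
print(full[x, y])
```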
You could loop through the array as @Inspired suggests, but looping through NumPy arrays item by item tends to be slower than using NumPy functions, since the NumPy functions are implemented in C/Fortran, while an item-by-item loop runs Python code for every element.
So, although sorting is O(n log n), it may be quicker than a Python-based one-pass O(n) solution. Below, np.unique performs the sort:
import numpy as np
def nlargest_indices(arr, n):
    uniques = np.unique(arr)
    threshold = uniques[-n]
    return np.where(arr >= threshold)
full = np.random.random((1002,1004))
x, y = nlargest_indices(full, 10)
print(full[x, y])
print(x)
# [ 2 7 217 267 299 683 775 825 853]
print(y)
# [645 621 132 242 556 439 621 884 367]
Here is a timeit benchmark comparing nlargest_indices (above) to
def nlargest_indices_orig(full, n):
    full = full.copy()
    x = np.zeros(n)
    y = np.zeros(n)
    for idx in range(n):
        x[idx] = np.unravel_index(full.argmax(), full.shape)[0]
        y[idx] = np.unravel_index(full.argmax(), full.shape)[1]
        full[full == full.max()] = 0.
    return x, y
In [97]: %timeit nlargest_indices_orig(full, 500)
1 loops, best of 3: 5 s per loop
In [98]: %timeit nlargest_indices(full, 500)
10 loops, best of 3: 133 ms per loop
For timeit purposes I needed to copy the array inside nlargest_indices_orig, lest full get mutated by the timing loop.
Benchmarking the copying operation:
def base(full, n):
    full = full.copy()
In [102]: %timeit base(full, 500)
100 loops, best of 3: 4.11 ms per loop
shows this added about 4ms to the 5s benchmark for nlargest_indices_orig.
Warning: nlargest_indices and nlargest_indices_orig may return different results if arr contains repeated values.
nlargest_indices finds the n largest distinct values in arr (np.unique drops duplicates) and then returns the x and y indices of every location holding one of those values, so it can return more than n positions.
nlargest_indices_orig finds the n largest values in arr and then returns one x and y index for each large value. If there is more than one x and y corresponding to the same large value, then some locations where large values occur may be missed.
They also return indices in a different order, but I suppose that does not matter for your purpose of plotting.
If you want to know the indices of the n max/min values in the 2d array, my solution (for the largest) is:
indx = divmod((-full).argpartition(num_largest,axis=None)[:num_largest],full.shape[1])
This finds the indices of the largest values in the flattened array and then determines the index in the 2d array from the quotient (row) and remainder (column) of dividing by the number of columns.
Never mind. Benchmarking shows the unravel method is twice as fast, at least for num_largest = 3.
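The divmod idea works because a row-major flat index f sits at row f // ncols and column f % ncols, so the divisor must be the number of columns, full.shape[1]. A sketch checking it against np.unravel_index on a deliberately non-square array (the random data and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
full = rng.random((4, 6))  # deliberately non-square
num_largest = 3

flat = (-full).argpartition(num_largest, axis=None)[:num_largest]

# Row-major flat index f -> (f // ncols, f % ncols)
rows_dm, cols_dm = divmod(flat, full.shape[1])
rows_ur, cols_ur = np.unravel_index(flat, full.shape)

print(np.array_equal(rows_dm, rows_ur) and np.array_equal(cols_dm, cols_ur))  # True
```

For a square array full.shape[0] and full.shape[1] coincide, which can hide the wrong divisor.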
I'm afraid the most time-consuming part is recalculating the maximum. You have to compute the maximum of 1002*1004 numbers 500 times, which amounts to about 500 million comparisons.
Probably you should write your own algorithm that finds the solution in one pass: keep only the 1000 greatest numbers (or their indices) somewhere while scanning your 2D array (without modifying the source array). I think some sort of binary heap (have a look at heapq) would suit that storage well.
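A sketch of the one-pass heap idea with Python's heapq, assuming a min-heap of (value, row, col) tuples; nlargest_indices_heap is a hypothetical name. The heap never holds more than n entries, so each of the N elements costs at most O(log n), but the loop runs in Python, so on a 1002x1004 array it will likely be slower than the argpartition approach:

```python
import heapq
import numpy as np

def nlargest_indices_heap(full, n):
    """Return row and column lists of the n largest values, largest first."""
    heap = []  # min-heap of (value, row, col); heap[0] is the smallest value kept so far
    for (r, c), value in np.ndenumerate(full):
        if len(heap) < n:
            heapq.heappush(heap, (value, int(r), int(c)))
        elif value > heap[0][0]:
            # The new value beats the smallest of the current top n: swap it in
            heapq.heapreplace(heap, (value, int(r), int(c)))
    heap.sort(reverse=True)  # largest first
    x = [r for _, r, _ in heap]
    y = [c for _, _, c in heap]
    return x, y

demo = np.array([[1, 9, 3], [7, 5, 8]])
x, y = nlargest_indices_heap(demo, 3)
print(x, y)  # [0, 1, 1] [1, 2, 0]
```

The source array is only read, never modified, which matches the suggestion above.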
