array/list/tuple of numpy arrays with variable length? - Python

In Python, I have the following problem, made into a toy example:
import random
import numpy as np

x_arr = np.array([], dtype=object)
for x in range(5):
    y_arr = np.array([], dtype=object)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr, y)
    if random.random() < 0.9:
        x_arr = np.append(x_arr, y_arr)
This results in:
>>> x_arr
array([4, 0, 1, 2, 4, 0, 3, 4], dtype=object)
I would like to have
array([array([4]), array([0, 1, 2, 4]), array([0, 3, 4])], dtype=object)
So apparently, in this run the array y_arr was written into x_arr 3 out of 5 (variable) times, with lengths 1, 4, and 3 (variable).
append() puts the results into one long 1D structure, whereas I would like to keep it 2D. Also, considering the example, it might be that no numbers get written at all (if you are 'unlucky' with the random numbers). So I have an array of arrays whose length is unknown a priori, each with an a priori unknown number of elements. How would I approach this in Python, other than finding an upper bound on both and storing a lot of zeros?

You could do it in a two-step process: first append a placeholder element, then set that element. This circumvents the automatic flattening that np.append() performs when axis=None (the default behavior), as documented here.
import random
import numpy as np

x_arr = np.array([], dtype=object).reshape((1, 0))
for x in range(5):
    y_arr = np.array([], dtype=np.int32)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr, y)
    if random.random() < 0.9:
        # append a placeholder, then overwrite it with the sub-array
        x_arr = np.append(x_arr, 0)
        x_arr[-1] = y_arr

print(type(x_arr))
print(x_arr)
This gives:
<class 'numpy.ndarray'>
[array([0, 1, 2]) array([0, 1, 2, 3]) array([0, 1, 4]) array([0, 1, 3, 4])
 array([2, 3])]
Also, why not use a Python list for x_arr (or y_arr)? Nested numpy arrays are not really useful when they are not proper multidimensional ndarrays.
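For illustration, here is a minimal sketch of the same loop with a plain list for x_arr (the variable names mirror the question; only the container type changes):
import random
import numpy as np

x_arr = []  # a plain list grows naturally and never flattens its elements
for x in range(5):
    y_arr = np.array([], dtype=np.int32)
    for y in range(5):
        if random.random() < 0.5:
            y_arr = np.append(y_arr, y)
    if random.random() < 0.9:
        x_arr.append(y_arr)  # list.append keeps each sub-array intact

# If an object ndarray is really needed at the end, build it explicitly.
# np.array(x_arr, dtype=object) can produce a 2D array when the sub-arrays
# happen to have equal lengths, so np.empty plus slice assignment is safer.
x_obj = np.empty(len(x_arr), dtype=object)
x_obj[:] = x_arr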

Related

Use information of two arrays to create a third one

I have two numpy arrays and want to create a third one using the information in these two.
Here is a simple example:
have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])
solution = np.array([[1, 1, 3, 4], [5, 5, 5, 8]])
What I want is to use the "use" array, which tells me how many times the first element of each row in my "have" array should be used.
So the 2 in "use" means that I want two copies of "1" in my new array "solution". Similarly, for the "3" in "use", I want my new array to have three copies of "5". The rest of "have" should stay the same.
It is important to do this using the "use" array (or a numpy array in general).
Do you have any ideas?
If the data structures are small and performance is not an issue, you can do it as simply as:
np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])
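Applied to the example arrays from the question, this one-liner reproduces the desired solution:
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

# each row: repeat the first element b[0] times, then keep the tail unchanged
solution = np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])
print(solution)
# [[1 1 3 4]
#  [5 5 5 8]]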
Simply iterate through have and replace the values based on use:
for i in range(use.shape[0]):
    have[i, :use[i, 0]] = np.repeat(have[i, 0], use[i, 0])
Using only numpy operations:
First create a boolean mask of the same shape as have, where mask[i, j] is True if j < use[i, 0] and False otherwise. So mask is True exactly at the indices that are to be replaced by the first column's value. Then use np.where to do the replacement.
n, m = have.shape
mask = np.repeat(np.arange(m)[None, :], n, axis=0) < use
have = np.where(mask, have[:, 0:1], have)
Output:
>>> have
array([[1, 1, 3, 4],
       [5, 5, 5, 8]])
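As a side note (an addition, not part of the original answer), broadcasting can build the same mask without the explicit np.repeat: comparing the (m,) row of indices against the (n, 1) use column broadcasts to an (n, m) boolean array directly.
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

# np.arange(m) has shape (m,), use has shape (n, 1); the < comparison
# broadcasts to the same (n, m) mask as the np.repeat version above
mask = np.arange(have.shape[1]) < use
have = np.where(mask, have[:, 0:1], have)
print(have)
# [[1 1 3 4]
#  [5 5 5 8]]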
If performance matters, you can use np.apply_along_axis().
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

def rep1st(arr):
    # arr = [use_value, have_row...]: repeat the first data value, keep the tail
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep + 1:]])
    return res

solution = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))
Update:
As #hpaulj said, the apply_along_axis method above is actually not as efficient as I expected; I had misunderstood it. Reference: numpy np.apply_along_axis function speed up?.
However, I ran some tests on the current methods:
import numpy as np
from timeit import timeit

def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep + 1:]])
    return res

def test(row, col, run):
    have = np.random.randint(0, 100, size=(row, col))
    use = np.random.randint(0, col, size=(row, 1))
    d = locals()
    d.update(globals())
    # method by me
    t1 = timeit("np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))", number=run, globals=d)
    # method by #quantummind
    t2 = timeit("np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])", number=run, globals=d)
    # method by #Amit Vikram Singh
    t3 = timeit(
        "np.where(np.repeat(np.arange(have.shape[1])[None, :], have.shape[0], axis=0) < use, have[:, 0:1], have)",
        number=run, globals=d
    )
    print(f"{t1:8.6f}, {t2:8.6f}, {t3:8.6f}")

test(1000, 10, 10)
test(100, 100, 10)
test(10, 1000, 10)
test(1000000, 10, 1)
test(100000, 100, 1)
test(10000, 1000, 1)
test(1000, 10000, 1)
test(100, 100000, 1)
test(10, 1000000, 1)
Results:
0.062488, 0.028484, 0.000408
0.010787, 0.013811, 0.000270
0.001057, 0.009146, 0.000216
6.146863, 3.210017, 0.044232
0.585289, 1.186013, 0.034110
0.091086, 0.961570, 0.026294
0.039448, 0.917052, 0.022553
0.028719, 0.919377, 0.022751
0.035121, 1.027036, 0.025216
It shows that the second method proposed by #Amit Vikram Singh (the fully vectorized np.where approach, t3) performs well even when the arrays are huge.

Filtering of array elements by another array in numpy

Here is a simple example:
import numpy as np
x=np.random.rand(5,5)
k,p = np.where(x>0.5)
k and p are arrays of indices
Now I have a list of rows that should be considered, m = [0, 2, 4], so I need to find all entries of k that are in the list m.
I came up with a very simple but horribly inefficient solution:
d = np.array([ (a,b) for a,b in zip(k,p) if a in m])
The solution works, but it is very slow. I'm looking for a better, more efficient one. I need to do a few million such operations with a dynamically adjusted m, so the efficiency of the algorithm is really critical.
Maybe this is faster:
d = np.dstack((k, p))[0]
print(d[np.isin(d[:, 0], m)])
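As a small aside (not from the original answer), np.column_stack builds the same (N, 2) array of index pairs a bit more directly than np.dstack:
import numpy as np

x = np.random.rand(5, 5)
k, p = np.where(x > 0.5)
m = [0, 2, 4]

d = np.column_stack((k, p))    # shape (N, 2): one (row, col) pair per match
print(d[np.isin(d[:, 0], m)])  # keep only pairs whose row index is in m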
You could use isin() to get a boolean mask which you can use to index k.
>>> x=np.random.rand(3,3)
>>> x
array([[0.74043564, 0.48328081, 0.82396324],
       [0.40693944, 0.24951958, 0.18043229],
       [0.46623863, 0.53559775, 0.98956277]])
>>> k, p = np.where(x > 0.5)
>>> p
array([0, 2, 1, 2])
>>> k
array([0, 0, 2, 2])
>>> m
array([0, 1])
>>> np.isin(k, m)
array([ True, True, False, False])
>>> k[np.isin(k, m)]
array([0, 0])
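The same boolean mask also filters p, so the (row, col) pairs the question builds with a list comprehension can be recovered without a Python loop; a short sketch continuing the session above:
mask = np.isin(k, m)                     # True where the row index is in m
d = np.column_stack((k[mask], p[mask]))  # the filtered (row, col) pairs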
How about:
import numpy as np
m = np.array([0, 2, 4])
k, p = np.where(x[m, :] > 0.5)
k = m[k]
print(list(zip(k, p)))
This only considers the interesting rows (and then zips them into 2D index pairs).

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A using the equation A[i, j] = f(i, j) for some f (it's not important what this is).
How can I do so concisely, avoiding two for loops?
numpy.fromfunction fits the bill here.
Example from doc:
>>> import numpy as np
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
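One caveat worth noting (an addition, not from the original answer): fromfunction calls the function once with whole index arrays rather than once per element, so the function must accept array arguments. Scalar-only logic can be wrapped with np.vectorize, for example:
import numpy as np

# the lambda uses scalar if/else logic, so it cannot take index arrays
# directly; np.vectorize applies it elementwise instead
f = np.vectorize(lambda i, j: 1 if i == j else 0)
print(np.fromfunction(f, (3, 3), dtype=int))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]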
One could also get the indices of your array with numpy.indices and then apply the function f in a vectorized fashion:
import numpy as np
shape = 1000, 1000
Xi, Yj = np.indices(shape)
A = (2 * Xi + 3 * Yj).astype(int)  # or any other function f(Xi, Yj)

Switch value of elements on some positions

I have a numpy array X of size N, filled with 0 and 1.
I generate a sample S of size M.
I want to flip (invert) the elements of X at each position in the sample S.
I want to ask whether this is possible without using loops, but using some atomic operation from the numpy mask module.
I want to avoid any kind of loop like
for i in sample:
    X[i] = 1 - X[i]
and replace it with a single call in pylab.
Is this possible?
Use X[sample] = 1 - X[sample].
For example:
>>> import numpy as np
>>> X = np.array([1, 1, 0, 1, 1])
>>> sample = [1,2,3]
>>> X[sample]
array([1, 0, 1])
>>> X[sample] = 1 - X[sample]
>>> X
array([1, 0, 1, 0, 1])
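As a side note (not in the original answer), for an integer 0/1 array the same flip can be done in place with XOR, avoiding the temporary array that 1 - X[sample] allocates:
import numpy as np

X = np.array([1, 1, 0, 1, 1])
sample = [1, 2, 3]
X[sample] ^= 1  # XOR with 1 flips 0 <-> 1 in place
print(X)        # [1 0 1 0 1]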

Nonzero function help, Python Numpy

I have two arrays, and I have a compound condition like this: new_arr < 0 and old_arr > 0.
I am using nonzero but I am getting an error. The code I have is this:
indices = nonzero(new_arr < 0 and old_arr > 0)
I tried:
indices = nonzero(new_arr < 0) and nonzero(old_arr > 0)
But it gave me incorrect results.
Is there any way around this? And is there a way to get the common indices from two nonzero calls? For example, if:
indices1 = nonzero(new_arr < 0)
indices2 = nonzero(old_arr > 0)
and these two indices would contain:
indices1 = array([0, 1, 3])
indices2 = array([2, 3, 4])
The correct result would be to get the common element of these two (in this case, the element 3). Something like this:
result = common(indices1, indices2)
Try indices = nonzero((new_arr < 0) & (old_arr > 0)):
In [5]: import numpy as np
In [6]: old_arr = np.array([ 0,-1, 0,-1, 1, 1, 0, 1])
In [7]: new_arr = np.array([ 1, 1,-1,-1,-1,-1, 1, 1])
In [8]: np.nonzero((new_arr < 0) & (old_arr > 0))
Out[8]: (array([4, 5]),)
Try
indices = nonzero(logical_and(new_arr < 0, old_arr > 0))
(Thinking about it, my previous example wasn't all that useful if all it did was return nonzero(condition) anyway.)
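For the second part of the question, finding the indices common to two nonzero results, np.intersect1d does exactly that; a short sketch using the example index arrays from the question:
import numpy as np

indices1 = np.array([0, 1, 3])
indices2 = np.array([2, 3, 4])
result = np.intersect1d(indices1, indices2)
print(result)  # [3]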
