Converting list of tuples into numpy array and reshaping it? - python

I have three lists.
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
c = [11, 12, 13, 14, 15]
I combine them and make one list of tuples using list comprehension
combine_list = [(a1, b1, c1) for a1 in a for b1 in b for c1 in c]
This combine_list has 5*5*5 = 125 tuples.
Now I want to convert this combine_list into a numpy array with shape (5, 5, 5). So, I use the following code:
import numpy as np
combine_array = np.asarray(combine_list).reshape(5, 5, 5)
This gives me an error:
ValueError: total size of new array must be unchanged
But when I try to reshape a list of 125 single numbers (no tuple elements) into a numpy array, no such error occurs.
Any help on how to reshape a list of tuples into a numpy array?

Not sure if this is what you want, but you can use multi-dimensional list comprehension.
combine_list = [[[ (i, j, k) for k in c] for j in b] for i in a]
combine_array = np.asarray(combine_list)
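For reference, the array built this way should come out with shape (5, 5, 5, 3), since each innermost element is a 3-tuple. A quick check (a sketch, reusing the lists from the question):
import numpy as np
a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
c = [11, 12, 13, 14, 15]
# the nested comprehension keeps each (i, j, k) triple as the last axis
combine_list = [[[(i, j, k) for k in c] for j in b] for i in a]
combine_array = np.asarray(combine_list)
print(combine_array.shape)     # (5, 5, 5, 3)
print(combine_array[0, 0, 0])  # [ 1  6 11]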

If you really need the 3 ints packed into a 5x5x5 array, then you need a dtype of 3 ints. Using itertools.product to combine the lists:
>>> import itertools
>>> np.array(list(itertools.product(a,b,c)), dtype='int8,int8,int8').reshape(5,5,5)
Alternatively, just include the 3 elements in the reshape:
>>> np.array(list(itertools.product(a,b,c))).reshape(5,5,5,3)
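As a quick sanity check of why the original reshape(5, 5, 5) failed and what the two variants above produce (a sketch, assuming the same a, b and c as in the question):
>>> import itertools
>>> import numpy as np
>>> flat = np.array(list(itertools.product(a, b, c)))
>>> flat.shape   # 125 rows of 3 ints = 375 scalars, so reshape(5, 5, 5) cannot work
(125, 3)
>>> rec = np.array(list(itertools.product(a, b, c)), dtype='int8,int8,int8').reshape(5, 5, 5)
>>> rec.shape    # each element is a 3-field record
(5, 5, 5)
>>> rec[0, 0, 0]
(1, 6, 11)
>>> flat.reshape(5, 5, 5, 3).shape
(5, 5, 5, 3)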

Related

Remove infinite values from numpy array

I would like to remove elements from one array B that have the same index as the inf elements from another array A.
I have two numpy arrays such as
A = np.array([1,2,3,4, float('inf')])
B = np.array([5, 6, 7, 8, 9])
If I do B[A>2], the output is array([7, 8, 9]). However, if I do B[math.isfinite(A)], then I get an error
TypeError: only size-1 arrays can be converted to Python scalars
How can I select the elements from B where the value in A is not infinity?
I think you have the answer in your question:
B = B[A != float('inf')]
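If you also want to guard against -inf (and NaN), the NumPy counterpart of math.isfinite is np.isfinite, which works elementwise on whole arrays. A sketch of that alternative (not part of the original answer):
import numpy as np
A = np.array([1, 2, 3, 4, float('inf')])
B = np.array([5, 6, 7, 8, 9])
# np.isfinite returns a boolean mask that is False for inf, -inf and NaN
B_finite = B[np.isfinite(A)]
# B_finite is array([5, 6, 7, 8])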

Delete array of values from numpy array

This post is an extension of this question.
I would like to delete multiple elements from a numpy array that have certain values. That is, for
import numpy as np
a = np.array([1, 1, 2, 5, 6, 8, 8, 8, 9])
How do I delete one instance of each of the values [1, 5, 8], such that the output is [1, 2, 6, 8, 8, 9]? All I have found in the documentation for array removal is np.setdiff1d, but that removes all instances of each number. How can this be adapted?
Use outer comparison and argmax to remove each value only once. For large arrays this will be memory intensive, since the created mask has a.size * r.size elements.
r = np.array([1, 5, 8])
m = (a == r[:, None]).argmax(1)
np.delete(a, m)
array([1, 2, 6, 8, 8, 9])
This does assume that each value in r appears in a at least once; otherwise the element at index 0 will get deleted, since argmax finds no match and returns 0.
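If some values in r might be absent from a, one way to avoid that spurious deletion at index 0 is to drop the non-matching rows of the mask before taking argmax. A sketch building on the code above (the extra value 42 is only there to illustrate):
r = np.array([1, 5, 8, 42])        # 42 does not occur in a
hits = a == r[:, None]             # outer comparison, shape (len(r), len(a))
present = hits.any(axis=1)         # which values of r actually occur in a
m = hits[present].argmax(axis=1)   # index of the first match for each present value
np.delete(a, m)
# array([1, 2, 6, 8, 8, 9])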
delNums = [np.where(a == x)[0][0] for x in [1,5,8]]
a = np.delete(a, delNums)
Here, delNums contains the indices of the first occurrences of the values 1, 5, 8, and np.delete() removes the values at those indices.
OUTPUT:
[1 2 6 8 8 9]

What is the most pythonic way to split a 2d array into arrays of each row?

I have a function foo that returns an array with shape (1000, 2).
How can I split it into two arrays a and b, each of length 1000?
I'm looking for something like this:
a;b = foo()
I'm looking for an answer that can easily generalize to the case in which the shape is (1000, 5) or so.
The zip(*...) idiom transposes a nested (multi-dimensional) Python list:
x = [[1,2], [3,4], [5,6]]
# get columns
a, b = zip(*x) # zip(*foo())
# a, b = map(list, zip(*x)) # if you prefer lists over tuples
a
# (1, 3, 5)
# get rows
a, b, c = x
a
# [1, 2]
Transpose and unpack?
a, b = foo().T
>>> a, b = np.arange(20).reshape(-1, 2).T
>>> a
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
>>> b
array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
You can use numpy.hsplit.
x = np.arange(12).reshape((3, 4))
np.hsplit(x, x.shape[1])
This returns a list of subarrays. Note that for a 2d input the subarrays will have shape (n, 1), unless you wrap a function around the call to squeeze them to 1d:
def split_1d(arr_2d):
    """Split a 2d NumPy array on its columns."""
    split = np.hsplit(arr_2d, arr_2d.shape[1])
    split = [np.squeeze(arr) for arr in split]
    return split
a, b, c, d = split_1d(x)
a
# array([0, 4, 8])
d
# array([ 3, 7, 11])
You could just use list comprehensions, e.g.
(a,b)=([i[0] for i in mylist],[i[1] for i in mylist])
To generalise you could use a comprehension within a comprehension:
(a,b,c,d,e)=([row[i] for row in mylist] for i in range(5))
You can do this simply by using the zip function:
def foo(mylist):
    return zip(*mylist)
Now call foo and unpack the result into as many names as mylist has columns:
mylist = [[1, 2], [3, 4], [5, 6]]
a, b = foo(mylist)
# a = (1, 3, 5)
# b = (2, 4, 6)
So this is a little nuts, but if you want to assign different letters to each sub-array in your array, and do so for any number of sub-arrays (up to 26 because alphabet), you could do:
import string
letters = list(string.ascii_lowercase) # get all of the lower-case letters
arr_dict = {k: v for k, v in zip(letters, foo())}
or more simply (for the last line):
arr_dict = dict(zip(letters, foo()))
Then you can access each individual element as arr_dict['a'] or arr_dict['b']. This feels a little mad-scientist-ey to me, but I thought it was fun.

Find indices of grouped-item matches between two arrays

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
I have two arrays of integers, each of which is grouped into pairs of consecutive items (i.e. indices [0, 1], [2, 3], and so on).
No pair appears twice in either array, in either the same or the reverse order.
One array is significantly larger than, and inclusive of, the other.
I am trying to figure out an efficient way to get the indices of the larger array's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group ([3, 4]) should not be matched at indices 11, 12, because there 3 is the second element of the pair [1, 3] and 4 is the first element of [4, 7].
Since you are grouping your arrays by pairs, you can reshape them into 2 columns for comparison. You can then compare each of the elements in the shorter array to the longer array, and reduce the boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2,  3],
       [ 6,  7],
       [10, 11]])
If you want a flat array, simply ravel ix
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])
Here's one approach making use of a NumPy view onto each group of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b):  # a, b are 2d arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)  # both inputs must be C-contiguous for the void view
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
def grouped_indices(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    idx = sidx[np.searchsorted(a0v, b0v, sorter=sidx)]
    return ((idx*2)[:,None] + [0,1]).ravel()
If some group from b has no match in a, we could filter it out using the mask a0v[idx] == b0v.
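Concretely, that mask can be applied before building the pair indices; a sketch of a guarded variant (the name grouped_indices_safe and the out-of-range clamp are additions, reusing view1D from above):
def grouped_indices_safe(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    pos = np.searchsorted(a0v, b0v, sorter=sidx)
    pos[pos == len(a0v)] = 0          # clamp positions that fell past the end
    idx = sidx[pos]
    idx = idx[a0v[idx] == b0v]        # keep only groups of b actually present in a
    return ((idx*2)[:,None] + [0,1]).ravel()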
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way? The solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index] = a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
       [array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist) # array of objects
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
       [([6, 7],), ([8, 9],), ([10, 11],)]],
      dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Based on hpaulj's answer, I provide a little more generic solution. a is an array of dimension N which shall be converted to an array b of dimension N1, with dtype object, holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N = 5
N1 = 3
# create array a with dimension N
a = np.random.random(np.random.randint(2, 20, size=N))
a_shape = a.shape
b_shape = a_shape[:N1]      # shape of array b
b_arr_shape = a_shape[N1:]  # shape of the arrays stored in b
# Solution 1 with the list() method (faster)
b = np.empty(np.prod(b_shape), dtype=object)  # init b
b[:] = list(a.reshape((-1,) + b_arr_shape))
b = b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape)))                    # dim of b
print("Dimension of array in b: {}".format(len(b[0, 0, 0].shape)))  # dim of arrays in b
# Solution 2 with an ndindex loop (slower)
b = np.empty(b_shape, dtype=object)
for index in np.ndindex(b_shape):
    b[index] = a[index]
print("Dimension of b: {}".format(len(b.shape)))                    # dim of b
print("Dimension of array in b: {}".format(len(b[0, 0, 0].shape)))  # dim of arrays in b
