Convert array of lists to array of tuples/triples - python

I have a 2D Numpy array with 3 columns. It looks something like this: array([[0, 20, 1], [1,2,1], ........, [20,1,1]]). It is basically an array of lists. How can I convert this matrix into array([(0,20,1), (1,2,1), ........., (20,1,1)])? I want the output to be an array of triples. I have been trying to use the tuple and map functions described in Convert numpy array to tuple:
import numpy as np
R = mydata #my data is sparse matrix of 1's and 0's
#First row
#R[0] = array([0,0,1,1]) #Just a sample
(rows, cols) = np.where(R)
vals = R[rows, cols]
QQ = list(zip(rows, cols, vals)) #in Python 3 zip returns an iterator, so make it a list
QT = tuple(map(tuple, np.array(QQ))) #type of QT is tuple
QTA = np.array(QT) #type is array
#QTA gives an array of lists
#QTA[0] = array([0, 2, 1])
#QTA[1] = array([0, 3, 1])
But the desired output is that QTA should be an array of tuples, i.e. QTA = array([(0,2,1), (0,3,1)]).

Your 2d array is not a list of lists, but it readily converts to one:
a.tolist()
As Jimbo shows, you can convert this to a list of tuples with a comprehension (a map will also work). But when you try to wrap that in an array, you get the 2d array again. That's because np.array tries to create as large a dimensioned array as the data allows. And with sublists (or tuples) all of the same length, that's a 2d array.
To preserve tuples you have to switch to an object dtype array or a structured array. For example:
a = np.array([[0, 20, 1], [1,2,1]])
a1=np.empty((2,), dtype=object)
a1[:]=[tuple(i) for i in a]
a1
# array([(0, 20, 1), (1, 2, 1)], dtype=object)
Here I create an empty array with dtype object, the most general kind. Then I assign values using a list of tuples, which is the proper data structure for this task.
An alternative is a structured dtype with three integer fields:
a1=np.empty((2,), dtype='int,int,int')
....
array([(0, 20, 1), (1, 2, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Or in one step: np.array([tuple(i) for i in a], dtype='int,int,int')
By contrast, a1=np.empty((2,), dtype='(3,)int') produces the 2d array again, and dt=np.dtype([('f0', '<i4', 3)]) produces
array([([0, 20, 1],), ([1, 2, 1],)],
dtype=[('f0', '<i4', (3,))])
which nests 1d arrays in the tuples. So it looks like object or 3 fields is the closest we can get to an array of tuples.
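To see the 2d collapse and both workarounds side by side, here is a minimal sketch using the three-row sample from the question. The variable names are illustrative, and the explicit 'i8,i8,i8' field type is an assumption ('int,int,int' would use the platform's default integer instead). The object array is filled element by element, which stores each tuple as a single object.

```python
import numpy as np

a = np.array([[0, 20, 1], [1, 2, 1], [20, 1, 1]])
rows = [tuple(r) for r in a.tolist()]  # list of tuples of plain Python ints

# Plain np.array turns equal-length tuples straight back into a 2d array.
plain = np.array(rows)

# Option 1: a 1d object array whose elements are the tuples themselves.
obj = np.empty(len(rows), dtype=object)
for i, t in enumerate(rows):
    obj[i] = t  # assigning to a single element stores the tuple as one object

# Option 2: a 1d structured array with three integer fields per record.
structured = np.array(rows, dtype='i8,i8,i8')

print(plain.shape, obj.shape, structured.shape)
print(obj)
print(structured)
```

The structured version stores the numbers compactly and converts back to real tuples with structured.tolist(), while the object version holds ordinary Python tuples at the cost of numpy's fast numeric operations.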

Not a great solution, but this will work:
>>> import numpy as np
# taken from your sample
>>> a = np.array([[0, 20, 1], [1,2,1], [20,1,1]])
# construct an empty array with matching length (dtype=tuple becomes dtype=object)
>>> b = np.empty((3,), dtype=tuple)
# manually put the values into a tuple and store it in b
>>> for i,n in enumerate(a):
...     b[i] = (n[0],n[1],n[2])
...
>>> b
array([(0, 20, 1), (1, 2, 1), (20, 1, 1)], dtype=object)
>>> type(b)
numpy.ndarray
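The np.where pipeline from the question can also be written for Python 3 so that it ends in an array of triples. This is a minimal sketch: the small matrix R below is a hypothetical stand-in for mydata (its first row matches the sample R[0] from the question), and the field names row, col and val are illustrative choices.

```python
import numpy as np

# Hypothetical stand-in for the asker's sparse 0/1 matrix `mydata`.
R = np.array([[0, 0, 1, 1],
              [1, 0, 0, 1]])

rows, cols = np.where(R)  # coordinates of the nonzero entries, in row-major order
vals = R[rows, cols]      # the values at those coordinates

# zip() is lazy in Python 3, so materialize it as a list of tuples first.
triples = list(zip(rows.tolist(), cols.tolist(), vals.tolist()))

# A structured dtype keeps each (row, col, val) triple as one element.
QTA = np.array(triples, dtype=[('row', 'i8'), ('col', 'i8'), ('val', 'i8')])
print(QTA)
```

Each field can then be pulled out as a 1d array, for example QTA['col'].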

Related

How to input a 1-D array of dimensions into numpy.random.randn?

Say I have a 1-D array dims:
dims = np.array((1,2,3,4))
I want to create an n-th order normally distributed tensor, where n is the size of dims and dims[i] is the size of the i-th dimension.
I tried to do
A = np.random.randn(dims)
But this doesn't work. I could do
A = np.random.randn(1,2,3,4)
which would work, but n can be large and can itself be random. How can I pass in an array of dimension sizes in this case?
Use unpacking with an asterisk:
np.random.randn(*dims)
Unpacking is standard Python; randn's signature is randn(d0, d1, ..., dn), with one integer per dimension:
In [174]: A = np.random.randn(*dims)
In [175]: A.shape
Out[175]: (1, 2, 3, 4)
The randn docs suggest standard_normal, which takes a shape tuple (or an array that can be treated as a tuple):
In [176]: B = np.random.standard_normal(dims)
In [177]: B.shape
Out[177]: (1, 2, 3, 4)
In fact, the docs say new code should use a Generator from default_rng, which has no randn method but does have standard_normal:
In [180]: rgn = np.random.default_rng()
In [181]: rgn.randn
Traceback (most recent call last):
File "<ipython-input-181-b8e8c46209d0>", line 1, in <module>
rgn.randn
AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randn'
In [182]: rgn.standard_normal(dims).shape
Out[182]: (1, 2, 3, 4)
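Putting the three variants together, here is a short sketch with the question's dims. Seeding default_rng with 0 is only for repeatable runs, and the extra six-axis example with random lengths is an illustration of "n can be random", not part of the original answer.

```python
import numpy as np

dims = np.array((1, 2, 3, 4))

# Legacy function: randn(d0, d1, ..., dn) wants one integer per axis, so unpack.
A = np.random.randn(*dims)

# Legacy function that takes the whole shape as a single argument.
B = np.random.standard_normal(tuple(dims))

# Recommended Generator API (seeded here only to make runs repeatable).
rng = np.random.default_rng(0)
C = rng.standard_normal(tuple(dims))

# The number of axes simply follows len(dims), however many entries it has.
more_dims = rng.integers(1, 4, size=6)  # six random axis lengths in [1, 3]
D = rng.standard_normal(tuple(more_dims))
print(A.shape, B.shape, C.shape, D.shape)
```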

extracting a column as a vector from a matrix in python

I have a CSV file, which I am converting to a matrix using the following commands:
reader = csv.reader(open("spambase_X.csv", "r"), delimiter=",")
x = list(reader)
result = numpy.array(x)
print(result.shape) #outputs (57,4601)
Now I want to extract the first column of the matrix result, which I am doing as follows:
col1=(result[:, 1])
print(col1.shape) #outputs (57,)
Why isn't it printing as (57, 1)? How can I do that?
TIA
Yes, it will return an array of shape (57,). If you want it to be (57, 1), you can use reshape():
col1=(result[:, 1]).reshape(-1,1)
You can index with a list, [1], instead of a scalar:
In [284]: result[:,[1]].shape
Out[284]: (2, 1)
Data input:
In [285]: result
Out[285]:
array([[1, 2, 3],
       [1, 2, 3]])
More information:
In [286]: result[:,[1]]
Out[286]:
array([[2],
       [2]])
In [287]: result[:,1]
Out[287]: array([2, 2])
col1 = result[:, 1] is a 1D array, which is why its shape is (57,). (Note that index 1 selects the second column; the first column is result[:, 0].)
You can convert it to a 2D array with a single column doing:
col1[:, np.newaxis] # shape: (57, 1)
If you want a 2D array with a single row you can do:
col1[np.newaxis, :] # shape: (1, 57)
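The options from these answers can be compared on a small matrix. This sketch uses a hypothetical 3x4 array in place of the 57-row CSV data; the length-1 slice result[:, 1:2] is an additional standard option not mentioned above, included because it also keeps the axis.

```python
import numpy as np

# Hypothetical stand-in for the CSV matrix (3 rows instead of 57).
result = np.arange(12).reshape(3, 4)

col = result[:, 1]                         # a scalar index removes that axis: 1-D
col_reshape = result[:, 1].reshape(-1, 1)  # -1 lets numpy infer the row count
col_list = result[:, [1]]                  # a list index keeps the axis
col_newaxis = result[:, 1][:, np.newaxis]  # re-insert the dropped axis
col_slice = result[:, 1:2]                 # a length-1 slice also keeps the axis
print(col.shape, col_reshape.shape, col_list.shape, col_newaxis.shape, col_slice.shape)
```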

merging multiple numpy arrays

I have 3 numpy arrays which store image data of shape (4,100,100).
arr1= np.load(r'C:\Users\x\Desktop\py\output\a1.npy')
arr2= np.load(r'C:\Users\x\Desktop\py\output\a2.npy')
arr3= np.load(r'C:\Users\x\Desktop\py\output\a3.npy')
I want to merge all 3 arrays into 1 array.
I have tried in this way:
merg_arr = np.zeros((len(arr1)+len(arr2)+len(arr3), 4,100,100), dtype=arr1.dtype)
Now this makes an array of the required length, but I don't know how to copy all the data into it. Maybe using a loop?
This will do the trick:
merge_arr = np.concatenate([arr1, arr2, arr3], axis=0)
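np.concatenate joins arrays along an existing axis (here axis 0), so their dimensions need to match except along that axis. np.stack, by contrast, arranges arrays along a new dimension and needs all of their shapes to be identical.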
Demo:
import numpy as np
arr1 = np.empty((60, 4, 10, 10))
arr2 = np.empty((14, 4, 10, 10))
arr3 = np.empty((6, 4, 10, 10))
merge_arr = np.concatenate([arr1, arr2, arr3], axis=0)
print(merge_arr.shape) # (80, 4, 10, 10)
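For comparison, the preallocate-then-copy approach from the question also works if each array is copied into its own slice of the output, no element-wise loop needed. This is a minimal sketch with hypothetical random arrays standing in for the loaded .npy files; the final np.stack call only illustrates the shape rule described above.

```python
import numpy as np

# Hypothetical stand-ins for the three loaded arrays: 60, 14 and 6 images of shape (4, 10, 10).
arrs = [np.random.rand(n, 4, 10, 10) for n in (60, 14, 6)]

# One call joins them along the existing first axis.
merged = np.concatenate(arrs, axis=0)

# Preallocate, then copy each whole block into its slice of the output.
total = sum(len(arr) for arr in arrs)
out = np.zeros((total,) + arrs[0].shape[1:], dtype=arrs[0].dtype)
start = 0
for arr in arrs:
    out[start:start + len(arr)] = arr
    start += len(arr)

# np.stack needs identical shapes and adds a new leading axis instead.
stacked = np.stack([arrs[0][:6], arrs[1][:6], arrs[2]])
print(merged.shape, out.shape, stacked.shape)
```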

Dynamically indexing/choosing the dimension of numpy array

I'm working on a CNN and am stuck on a tensor algorithm.
I want to be able to iterate through a list, or tuple, of dimensions and choose a range of elements of X (a multi dimensional array) from that dimension, while leaving the other dimensions alone.
x = np.random.random((10,3,32,32)) #some multi dimensional array
dims = [2,3] #aka the 32s
#for a dimension in dims
#I want the array of numbers from i:i+window in that dimension
#something like
arr1 = x.index(i:i+3,axis = dims[0])
#returns shape 10,3,3,32
arr2 = arr1.index(i:i+3,axis = dims[1])
#returns shape 10,3,3,3
np.take should work for you (read its docs)
In [237]: x=np.ones((10,3,32,32),int)
In [238]: dims=[2,3]
In [239]: arr1=x.take(range(1,1+3), axis=dims[0])
In [240]: arr1.shape
Out[240]: (10, 3, 3, 32)
In [241]: arr2=x.take(range(1,1+3), axis=dims[1])
In [242]: arr2.shape
Out[242]: (10, 3, 32, 3)
You can try slicing with
arr1 = x[:,:,i:i+3,:]
and
arr2 = arr1[:,:,:,i:i+3]
Shape is then
>>> x[:,:,i:i+3,:].shape
(10, 3, 3, 32)
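When the axis is only known at run time, the slicing approach can be generalized by building the index tuple programmatically. The helper window below is hypothetical (not a numpy function); it is a sketch of that idea, applied to each axis in dims in turn, with np.take chained on the intermediate result as a cross-check.

```python
import numpy as np

def window(x, axis, start, size):
    """Hypothetical helper: take x[start:start+size] along `axis`, full range elsewhere."""
    index = [slice(None)] * x.ndim
    index[axis] = slice(start, start + size)
    return x[tuple(index)]  # basic slicing, so the result is a view of x

x = np.random.random((10, 3, 32, 32))
dims = [2, 3]
i = 1

arr = x
for d in dims:
    arr = window(arr, d, i, 3)  # narrows axis 2, then axis 3

# np.take along each axis in turn gives the same values, but as a new array.
arr_take = x.take(range(i, i + 3), axis=dims[0]).take(range(i, i + 3), axis=dims[1])
print(arr.shape, arr_take.shape)
```

The slicing version avoids copying data, which can matter inside a convolution loop; np.take is convenient when the positions are an arbitrary list rather than a contiguous range.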

How to add names to a numpy array without changing its dimension?

I have an existing two-column numpy array to which I need to add column names. Passing those in via dtype works in the toy example shown in Block 1 below. With my actual array, though, as shown in Block 2, the same approach is having an unexpected (to me!) side-effect of changing the array dimensions.
How can I convert my actual array, the one named Y in the second block below, to an array having named columns, like I did for array A in the first block?
Block 1: (Columns of A named without reshaping dimension)
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
A
# array([[ 1, 2],
# [ 3, 4],
# [ 50, 100]])
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
A
# array([[(1, 2)],
# [(3, 4)],
# [(50, 100)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
Block 2: (Naming columns of my actual array, Y, reshapes its dimension)
import numpy as np
## Code to reproduce Y, the array I'm actually dealing with
RING = [1,2,2,3,3,3]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
Y = X.T
Y
# array([[1, 3],
# [2, 2],
# [3, 2],
# [4, 1],
# [5, 1],
# [6, 1]])
## My unsuccessful attempt to add names to the array's columns
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
Y.dtype=dt
Y
# array([[(1, 2), (3, 2)],
# [(3, 4), (2, 1)],
# [(5, 6), (1, 1)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
## What I'd like instead of the results shown just above
# array([[(1, 3)],
# [(2, 2)],
# [(3, 2)],
# [(4, 1)],
# [(5, 1)],
# [(6, 1)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
First, because your question asks about giving names to arrays, I feel obligated to point out that using "structured arrays" just for the purpose of giving names is probably not the best approach. We often like to name rows/columns when working with tables; if that is your case, I suggest you try something like pandas, which is awesome. If you simply want to organize some data in your code, a dictionary of arrays is often much better than a structured array, so for example you can do:
Y = {'ID':X[0], 'Ring':X[1]}
With that out of the way, if you want to use a structured array, here is the clearest way to do it in my opinion:
import numpy as np
RING = [1,2,2,3,3,3]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
dt = {'names':['ID', 'Ring'], 'formats':[int, int]}
Y = np.zeros(len(RING), dtype=dt)
Y['ID'] = X[0]
Y['Ring'] = X[1]
Another page, store-different-datatypes-in-one-numpy-array, includes a nice solution for adding names to an array, which can then be used as columns.
Example:
r = np.rec.fromarrays([x1,x2,x3], names='a,b,c')  # np.rec is the public name for np.core.records
# x1, x2, x3 are flattened (1-D) arrays
# a,b,c are the field names
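Applied to the data from the question, the same call produces a 1-D record array with one record per ID. This sketch reuses the question's ID and RING lists and assumes np.rec.fromarrays, the public spelling of np.core.records.fromarrays.

```python
import numpy as np

ID = np.array([1, 2, 3, 4, 5, 6])
RING = np.array([1, 2, 2, 3, 3, 3])

# One record per position, built from 1-D column arrays; names come from the string.
r = np.rec.fromarrays([ID, RING], names='ID,Ring')

# A recarray allows both key-style and attribute-style field access.
ids = r['ID']
rings = r.Ring
print(r)
```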
This is because Y is not C-contiguous; you can check this with Y.flags:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
You can call Y.copy() or Y.ravel() first:
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
print(Y.ravel().view(dt))  # the result shape is (6,)
print(Y.copy().view(dt))   # the result shape is (6, 1)
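The contiguity fix can be checked end to end. This is a sketch under one stated assumption: the data is created explicitly as int32 (which the asker's output suggests was the platform default), so each 2-column row is exactly 8 bytes, the size of one two-field record. np.ascontiguousarray is used here as an equivalent of Y.copy() for this purpose.

```python
import numpy as np

RING = [1, 2, 2, 3, 3, 3]
ID = [1, 2, 3, 4, 5, 6]

# Assumption: int32 so one row (2 x 4 bytes) matches one 8-byte record.
Y = np.array([ID, RING], dtype=np.int32).T
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# The transpose is a view with column-major strides, not a C-contiguous array.
print(Y.flags['C_CONTIGUOUS'])

Yc = np.ascontiguousarray(Y)  # row-major copy, like Y.copy()
named = Yc.view(dt)           # each row's 8 bytes become one record: shape (6, 1)
flat = named.reshape(-1)      # drop the trailing length-1 axis: shape (6,)
print(flat)
```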
Are you completely sure about the outputs for A and Y? I get something different using Python 2.7.6 and numpy 1.8.1.
My initial output for A is the same as yours, as it should be. After running the following code for the first example
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
the contents of array A are actually
array([[(1, 0), (3, 0)],
[(2, 0), (2, 0)],
[(3, 0), (2, 0)],
[(4, 0), (1, 0)],
[(5, 0), (1, 0)],
[(6, 0), (1, 0)]],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
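This makes somewhat more sense to me than the output you added. dtype determines the data type of every element in the array, and the new definition states that every element should contain two fields, so it does. The second field is 0 because assigning to .dtype reinterprets the existing bytes instead of converting values: the original integers are 64-bit here, so each 8-byte element is split into two 32-bit fields, and for small non-negative numbers the upper half (the second field on a little-endian machine) is all zeros.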
However, if you would like to make numpy group columns of your existing array so that every row contains only one element, but with each element having two fields, you could introduce a small code change.
Since a tuple is needed to make numpy group elements into a more complex data type, you could make this happen by creating a new array and turning every row of the existing array into a tuple. Here is a simple working example:
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
Using this short piece of code, array B becomes
array([(1, 2), (3, 4), (50, 100)],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
To make B a 2D array, it is enough to write
B = B.reshape(len(B), 1)  # reshape returns a new array; in this case, even B.size would work instead of len(B)
For the second example, a similar thing needs to be done to make Y a structured array:
Y = np.array(list(map(tuple, X.T)), dtype=dt)
After doing this for your second example, array Y looks like this
array([(1, 3), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
You may notice that the output is not the same as the one you expected, but this one is simpler: instead of writing Y[0,0] to get the first element, you can just write Y[0]. To make this array 2D as well, you can use reshape, just as with B.
Try rewriting the definition of X:
X = np.array(list(zip(ID, RING)))  # list() is needed in Python 3, where zip is lazy
and then you don't need to define Y = X.T.
