I have an existing two-column numpy array to which I need to add column names. Passing those in via dtype works in the toy example shown in Block 1 below. With my actual array, though, as shown in Block 2, the same approach is having an unexpected (to me!) side-effect of changing the array dimensions.
How can I convert my actual array, the one named Y in the second block below, to an array having named columns, like I did for array A in the first block?
Block 1: (Columns of A named without reshaping dimension)
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
A
# array([[ 1, 2],
# [ 3, 4],
# [ 50, 100]])
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
A
# array([[(1, 2)],
# [(3, 4)],
# [(50, 100)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
Block 2: (Naming columns of my actual array, Y, reshapes its dimension)
import numpy as np
## Code to reproduce Y, the array I'm actually dealing with
RING = [3,2,2,1,1,1]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
Y = X.T
Y
# array([[1, 3],
# [2, 2],
# [3, 2],
# [4, 1],
# [5, 1],
# [6, 1]])
## My unsuccessful attempt to add names to the array's columns
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
Y.dtype=dt
Y
# array([[(1, 2), (3, 2)],
# [(3, 4), (2, 1)],
# [(5, 6), (1, 1)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
## What I'd like instead of the results shown just above
# array([[(1, 3)],
# [(2, 2)],
# [(3, 2)],
# [(4, 1)],
# [(5, 1)],
# [(6, 1)]],
# dtype=[('ID', '<i4'), ('Ring', '<i4')])
First, because your question asks about giving names to arrays, I feel obligated to point out that using "structured arrays" purely for the purpose of giving names is probably not the best approach. We often like to give names to rows/columns when we're working with tables; if that is the case, I suggest you try something like pandas, which is awesome. If you simply want to organize some data in your code, a dictionary of arrays is often much better than a structured array, so for example you can do:
Y = {'ID':X[0], 'Ring':X[1]}
With that out of the way, if you want to use a structured array, here is the clearest way to do it in my opinion:
import numpy as np
RING = [1,2,2,3,3,3]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
dt = {'names':['ID', 'Ring'], 'formats':[int, int]}
Y = np.zeros(len(RING), dtype=dt)
Y['ID'] = X[0]
Y['Ring'] = X[1]
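As a rough sketch of how the named fields can then be used (the names first_record and ids_on_ring_3 are made up for illustration), fields are selected by name and rows by position or boolean mask:

```python
import numpy as np

ID = [1, 2, 3, 4, 5, 6]
RING = [1, 2, 2, 3, 3, 3]

dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
Y = np.zeros(len(ID), dtype=dt)
Y['ID'] = ID
Y['Ring'] = RING

# A single row is one record; .item() turns it into a plain tuple.
first_record = Y[0].item()

# A boolean mask built from one field selects rows; a field name selects a column.
ids_on_ring_3 = Y[Y['Ring'] == 3]['ID']
```

The array stays one-dimensional here; each element is a record holding both fields.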
See the related question "Store different datatypes in one NumPy array", another page that includes a nice solution for adding names to an array so they can be used as column names.
Example:
r = np.rec.fromarrays([x1, x2, x3], names='a,b,c')
# x1, x2, x3 are flattened (1-D) arrays
# a, b, c are the field names
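A minimal sketch of that call, assuming the columns are available as separate 1-D sequences. It goes through the public np.rec namespace and borrows the ID and Ring data from the question:

```python
import numpy as np

ID = [1, 2, 3, 4, 5, 6]
RING = [1, 2, 2, 3, 3, 3]

# Each input sequence becomes one named field of the record array.
r = np.rec.fromarrays([ID, RING], names='ID,Ring')

ids = r['ID']      # field access by name, as with any structured array
rings = r.Ring     # record arrays also expose fields as attributes
```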
This happens because Y is not C_CONTIGUOUS. You can check that with Y.flags:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
You can call Y.copy() or Y.ravel() first:
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
print(Y.ravel().view(dt)) # the result shape is (6, )
print(Y.copy().view(dt)) # the result shape is (6, 1)
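The following is a sketch of why the copy matters. It assumes the data is explicitly stored as int32 so that two values fill exactly one record, and it uses np.ascontiguousarray as a stand-in for Y.copy():

```python
import numpy as np

ID = [1, 2, 3, 4, 5, 6]
RING = [1, 2, 2, 3, 3, 3]
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# Force int32 so each row (2 x 4 bytes) matches the 8-byte record size.
X = np.array([ID, RING], dtype=np.int32)
Y = X.T                          # a transposed view, so not C-contiguous

Yc = np.ascontiguousarray(Y)     # row-major copy holding the same values
S = Yc.view(dt)                  # each row is reinterpreted as one record
```

After the copy, the two values of every row sit next to each other in memory, which is what the view relies on.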
Are you completely sure about the outputs for A and Y? I get something different using Python 2.7.6 and numpy 1.8.1.
My initial output for A is the same as yours, as it should be. After running the following code for the first example
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
the contents of array A are actually
array([[(1, 0), (3, 0)],
[(2, 0), (2, 0)],
[(3, 0), (2, 0)],
[(4, 0), (1, 0)],
[(5, 0), (1, 0)],
[(6, 0), (1, 0)]],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
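This makes somewhat more sense to me than the output you added. The dtype determines how the bytes of every element are interpreted, and assigning a new dtype reinterprets the existing buffer rather than converting the values. If the integers in the array are stored as 64-bit values (the default on many platforms), each one is 8 bytes wide and is read as one element with two 32-bit fields: on a little-endian machine the first field holds the original value and the second field holds the upper 4 bytes, which are 0 for small numbers.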
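One way to see where such zeros can come from is the sketch below. It assumes the original array held 64-bit integers on a little-endian machine; the byte order and widths are spelled out explicitly so the result does not depend on the platform:

```python
import numpy as np

# Two rows of 64-bit little-endian integers.
a = np.array([[1, 3], [2, 2]], dtype='<i8')

# A record made of two 32-bit fields is also 8 bytes wide,
# so every single integer is reinterpreted as one (ID, Ring) pair.
dt = np.dtype([('ID', '<i4'), ('Ring', '<i4')])
v = a.view(dt)
```

The shape stays (2, 2), and the Ring field only ever sees the upper half of each 64-bit value.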
However, if you would like to make numpy group columns of your existing array so that every row contains only one element, but with each element having two fields, you could introduce a small code change.
Since a tuple is needed to make numpy group elements into a more complex data-type, you could make this happen by creating a new array and turning every row of the existing array into a tuple. Here is a simple working example
import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)
Using this short piece of code, array B becomes
array([(1, 2), (3, 4), (50, 100)],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
To make B a 2D array, it is enough to write
B.reshape(len(B), 1) # in this case, even B.size would work instead of len(B)
For the second example, a similar thing needs to be done to make Y a structured array:
Y = np.array(list(map(tuple, X.T)), dtype=dt)
After doing this for your second example, array Y looks like this
array([(1, 3), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)],
dtype=[('ID', '<i4'), ('Ring', '<i4')])
Note that the output is not the same as the one you expected, but this one is simpler: instead of writing Y[0,0] to get the first element, you can just write Y[0]. To make this array 2D as well, you can use reshape, just as with B.
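Another option is to let NumPy do the conversion. This is a sketch that assumes a NumPy version shipping numpy.lib.recfunctions.unstructured_to_structured (1.16 or later, as far as I know). It casts the values to the field types, so the integer width and memory layout of the input should not matter:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ID = [1, 2, 3, 4, 5, 6]
RING = [1, 2, 2, 3, 3, 3]
X = np.array([ID, RING])
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# The last axis of the input (length 2) is folded into the two fields.
Y = rfn.unstructured_to_structured(X.T, dt)

# Optional: one record per row in a 2D column, as in the question.
Y2d = Y.reshape(-1, 1)
```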
Try re-writing the definition of X:
X = np.array(list(zip(ID, RING)))  # list() is needed on Python 3, where zip returns an iterator
and then you don't need to define Y = X.T
I have an array of the following form (with many more elements):
coords = np.array(
[[(2, 1), 1613, 655],
[(2, 5), 906, 245],
[(5, 2), 0, 0]])
And I would like to find the index of a specific tuple. For example, I might be looking for the position of the tuple (2, 5), which should be in position 1 in this case.
I have tried with np.where and np.argwhere, with no luck:
pos = np.argwhere(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
pos = np.where(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
How can I get the index of a tuple?
If you intend to use a numpy array containing objects, all comparison will be done using python itself. At that point, you have given up almost all the advantages of numpy and may as well use a list:
coords = coords.tolist()
index = next((i for i, n in enumerate(coords) if n[0] == (2, 5)), -1)
If you really want to use numpy, I suggest you transform your data appropriately. Two simple options come to mind. You can either expand your tuple and create an array of shape (N, 4), or you can create a structured array that preserves the arrangement of the data as a unit, and has shape (N,). The former is much simpler, while the latter is, in my opinion, more elegant.
If you flatten the coordinates:
coords = np.array([[x[0][0], x[0][1], x[1], x[2]] for x in coords])
index = np.flatnonzero(np.all(coords[:, :2] == [2, 5], axis=1))
The structured solution:
coordt = np.dtype([('x', np.int_), ('y', np.int_)])
dt = np.dtype([('coord', coordt), ('a', np.int_), ('b', np.int_)])
coords = np.array([((2, 1), 1613, 655), ((2, 5), 906, 245), ((5, 2), 0, 0)], dtype=dt)
index = np.flatnonzero(coords['coord'] == np.array((2, 5), dtype=coordt))
You can also just transform the first part of your data to a real numpy array, and operate on that:
coords = np.array(coords[:, 0].tolist())
index = np.flatnonzero((coords == [2, 5]).all(axis=1))
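For completeness, here is a sketch of the flattened variant that starts from a plain Python list instead of an object array. Recent NumPy releases refuse to build an array from ragged nested sequences unless dtype=object is given, so this sidesteps the problem. The names rows, flat, target and hits are made up for illustration:

```python
import numpy as np

rows = [[(2, 1), 1613, 655],
        [(2, 5), 906, 245],
        [(5, 2), 0, 0]]

# Unpack every row into four plain integers -> regular array of shape (N, 4).
flat = np.array([[x, y, a, b] for (x, y), a, b in rows])

target = (2, 5)
# Compare the first two columns with the target and keep rows where both match.
hits = np.flatnonzero((flat[:, :2] == target).all(axis=1))

# -1 signals "not found", mirroring the list-based version.
index = int(hits[0]) if hits.size else -1
```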
You should not compare (2, 5) and coords, but compare (2, 5) and coords[:, 0].
Try this code.
np.where([np.array_equal(coords[:, 0][i], (2, 5)) for i in range(len(coords))])[0]
Try this one
import numpy as np
coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]])
tpl=(2,5)
i=0 # index of the column in which the tuple you are looking for is listed
pos=([t[i] for t in coords].index(tpl))
print(pos)
Assuming your target tuple (e.g. (2,5) ) is always in the first column of the numpy array coords i.e. coords[:,0] you can simply do the following without any loops!
[*coords[:,0]].index((2,5))
If the tuples aren't necessarily in the first column always, then you can use,
[*coords.flatten()].index((2,5))//3
Hope that helps.
First of all, the tuple (2, 5) is in position 0 as it is the first element of the list [(2, 5), 906, 245].
And second of all, you can use basic python functions to check the index of a tuple in that array. Here's how you do it:
>>> coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]])
>>>
>>> coords_list = cl = list(coords)
>>> cl
[[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]]
>>>
>>> tuple_to_be_checked = tuple_ = (2, 5)
>>> tuple_
(2, 5)
>>>
>>> for i in range(0, len(cl), 1): # Dynamically works for any array `cl`
...     for j in range(0, len(cl[i]), 1): # Dynamic; works for any list `cl[i]`
...         if cl[i][j] == tuple_: # Found the tuple
...             # Print tuple index and containing list index
...             print(f'Tuple at index {j} of list at index {i}')
...             break # Break to avoid unwanted loops
...
Tuple at index 0 of list at index 1
>>>
I want to change a value of a given numpy array to a multiplication of other elements of the array. To do that, I want to extract the multi_index and manipulate it so that I can identify a position and use it (e.g. nditer through all elements and always do 'current position in array = next position + position above in array').
I tried to call a function with the multi_index of the current position, and I want that function to take it and, e.g., advance it by one position (<0 , 1> ---> <0 , 2> while <0 , n> n>=length otherwise <0 , 1> ---> <1 , 0>).
import numpy as np
def fill(multi_index):
    "This will get the new value of the current iteration value judging from its index"
    return a[(multi_index + <0,1>) + (multi_index + <0,-1>) + (multi_index + <1,0>) + (multi_index + <-1,0>)]
#a = np.random.uniform(0, 100, size=(100, 100))
a = np.arange(6).reshape(2,3)
it = np.nditer(a, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    it[0] = fill(it.multi_index)
    print(it[0])
    it.iternext()
"""for x in np.nditer(a, flags=['multi_index'], op_flags=['readwrite']):
print(x)"""
I don't understand how to extract the actual "coordinates" from the multi_index. I am kinda new to python so please try to explain it thoroughly if possible. Thanks.
Edit: Before this I only coded in C++ and a bit of Java, so I am used to working mainly with arrays. In C++ it would be something like this:
int main() {
    double a[100][100] = {};
    // Interior cells only, so every neighbour index stays in bounds
    for (int i = 1; i < 99; i++) {
        for (int j = 1; j < 99; j++) {
            a[i][j] = 0.25 * (a[i][j+1] + a[i][j-1] + a[i+1][j] + a[i-1][j]);
        }
    }
    return 0;
}
In [152]: a = np.arange(6).reshape(2,3)
In [153]: a
Out[153]:
array([[0, 1, 2],
[3, 4, 5]])
Let's run your nditer and look at its values:
In [157]: it = np.nditer(a, flags=['multi_index'], op_flags=['readwrite'])
In [158]: while not it.finished:
     ...:     print(it.multi_index, a[it.multi_index], it[0], type(it[0]))
     ...:     it.iternext()
     ...:
(0, 0) 0 0 <class 'numpy.ndarray'>
(0, 1) 1 1 <class 'numpy.ndarray'>
(0, 2) 2 2 <class 'numpy.ndarray'>
(1, 0) 3 3 <class 'numpy.ndarray'>
(1, 1) 4 4 <class 'numpy.ndarray'>
(1, 2) 5 5 <class 'numpy.ndarray'>
At each iteration multi_index is a tuple of the i,j indices. a[it.multi_index] then selects that item from the array. it[0] is also that item, but wrapped as a 0d array. If you aren't comfortable with the idea of a 0d array (shape ()) then nditer is not the tool for you (at this time).
If you just want the sequential indexing tuples, ndindex works just as well:
In [162]: list(np.ndindex(a.shape))
Out[162]: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
(In fact, np.lib.index_tricks.py shows that ndindex uses the nditer multi_index. nditer isn't commonly used in numpy Python-level code.)
Or to get indices plus value:
In [177]: list(np.ndenumerate(a))
Out[177]: [((0, 0), 0), ((0, 1), 1), ((0, 2), 2), ((1, 0), 3), ((1, 1), 4), ((1, 2), 5)]
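Because each index is an ordinary tuple, it can be unpacked into i and j and used for arithmetic. The sketch below looks up the right-hand neighbour of every element that has one; the dictionary right_neighbours is made up for illustration:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

right_neighbours = {}
for (i, j), value in np.ndenumerate(a):
    # Skip the last column, where j + 1 would be out of bounds.
    if j + 1 < a.shape[1]:
        right_neighbours[(i, j)] = int(a[i, j + 1])
```

The same unpacking works on it.multi_index inside an nditer loop.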
Just values in flat order:
In [178]: a.ravel()
Out[178]: array([0, 1, 2, 3, 4, 5])
BUT, in numpy we prefer not to iterate at all. Instead we try to write code that works with the whole array, using the fast compiled numpy methods. Iteration on arrays is slow, slower than iteration on lists.
===
Looks like your iteration, in a somewhat stylized sense, is:
for i in range(n):
    for j in range(m):
        a[i,j] = ( a[i,j+1] + a[i,j-1] + a[i+1,j] + a[i-1,j] )/4
There are some details to worry about. What about the edges, where j+/-1 is out of bounds? And is this calculation sequential, so that a[i,j] depends on the changes just made to a[i,j-1], or is it buffered?
In general sequential, iterative calculations on an array like this are a bad fit for numpy.
On the other hand, buffered calculations can be nicely done with whole-array slices
x[1:-1, 1:-1] = (x[1:-1, :-2] + x[1:-1, 2:] + x[:-2, 1:-1] + x[2:, 1:-1])/4
SciPy also has some convolution functions that perform calculations on moving windows.
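As a sketch of the buffered, whole-array approach (the 5x5 test array and the names x and new are made up for illustration), each interior cell is replaced by the mean of its four neighbours, all read from the unmodified array:

```python
import numpy as np

x = np.zeros((5, 5))
x[2, 2] = 4.0          # a single spike in the middle

new = x.copy()         # edges keep their old values
# Left, right, upper and lower neighbours of every interior cell.
new[1:-1, 1:-1] = (x[1:-1, :-2] + x[1:-1, 2:] +
                   x[:-2, 1:-1] + x[2:, 1:-1]) / 4
```

The spike is spread evenly over its four neighbours, and the centre cell, whose neighbours were all zero, becomes zero.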
I have a 2D NumPy array with 3 columns. It looks something like this: array([[0, 20, 1], [1,2,1], ........, [20,1,1]]). It is basically an array of lists. How can I convert this matrix into array([(0,20,1), (1,2,1), ........., (20,1,1)])? I want the output to be an array of triples. I have been trying to use the tuple and map functions described in "Convert numpy array to tuple":
R = mydata #my data is sparse matrix of 1's and 0's
#First row
#R[0] = array([0,0,1,1]) #Just a sample
(rows, cols) = np.where(R)
vals = R[rows, cols]
QQ = list(zip(rows, cols, vals)) # list() is needed on Python 3, where zip returns an iterator
QT = tuple(map(tuple, np.array(QQ))) #type of QT is tuple
QTA = np.array(QT) #type is array
#QTA gives an array of lists
#QTA[0] = array([0, 2, 1])
#QTA[1] = array([0, 3, 1])
But the desired output is for QTA to be an array of tuples, i.e. QTA = array([(0,2,1), (0,3,1)]).
Your 2d array is not a list of lists, but it readily converts to that
a.tolist()
As Jimbo shows, you can convert this to a list of tuples with a comprehension (a map will also work). But when you try to wrap that in an array, you get the 2d array again. That's because np.array tries to create as large a dimensioned array as the data allows. And with sublists (or tuples) all of the same length, that's a 2d array.
To preserve tuples you have to switch to an object-dtype or structured array. For example:
a = np.array([[0, 20, 1], [1,2,1]])
a1=np.empty((2,), dtype=object)
a1[:]=[tuple(i) for i in a]
a1
# array([(0, 20, 1), (1, 2, 1)], dtype=object)
Here I create an empty array with dtype object, the most general kind. Then I assign values using a list of tuples, which is the proper data structure for this task.
An alternative dtype is
a1=np.empty((2,), dtype='int,int,int')
....
array([(0, 20, 1), (1, 2, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Or in one step: np.array([tuple(i) for i in a], dtype='int,int,int')
a1=np.empty((2,), dtype='(3,)int') produces the 2d array. dt=np.dtype([('f0', '<i4', 3)]) produces
array([([0, 20, 1],), ([1, 2, 1],)],
dtype=[('f0', '<i4', (3,))])
which nests 1d arrays in the tuples. So it looks like object or 3 fields is the closest we can get to an array of tuples.
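A short sketch of the three-field variant end to end, assuming 32-bit integer fields are acceptable (the names triples and as_tuples are made up for illustration):

```python
import numpy as np

a = np.array([[0, 20, 1], [1, 2, 1], [20, 1, 1]])

# One record per row; the comma-separated dtype string creates fields f0, f1, f2.
triples = np.array([tuple(row) for row in a], dtype='i4,i4,i4')

# tolist() on a structured array gives back a list of plain tuples.
as_tuples = triples.tolist()
```

Each element of triples prints like a tuple, and individual columns remain available as triples['f0'] and so on.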
Not a great solution, but this will work:
# take from your sample
>>> a = np.array([[0, 20, 1], [1,2,1], [20,1,1]])
# construct an empty array with matching length
>>> b = np.empty((3,), dtype=tuple)
# manually put values into tuple and store in b
>>> for i, n in enumerate(a):
...     b[i] = (n[0], n[1], n[2])
...
>>> b
array([(0, 20, 1), (1, 2, 1), (20, 1, 1)], dtype=object)
>>>type(b)
numpy.ndarray
Suppose I create a 2 dimensional array
m = np.random.normal(0, 1, size=(1000, 2))
q = np.zeros(shape=(1000,1))
print(m[:,0] - q)
When I take m[:,0].shape I get (1000,) as opposed to (1000,1) which is what I want. How do I coerce m[:,0] to a (1000,1) array?
By selecting the 0th column in particular, as you've noticed, you reduce the dimensionality:
>>> m = np.random.normal(0, 1, size=(5, 2))
>>> m[:,0].shape
(5,)
You have a lot of options to get a 5x1 object back out. You can index using a list, rather than an integer:
>>> m[:, [0]].shape
(5, 1)
You can ask for "all the columns up to but not including 1":
>>> m[:,:1].shape
(5, 1)
Or you can use None (or np.newaxis), which is a general trick to extend the dimensions:
>>> m[:,0,None].shape
(5, 1)
>>> m[:,0][:,None].shape
(5, 1)
>>> m[:,0, None, None].shape
(5, 1, 1)
Finally, you can reshape:
>>> m[:,0].reshape(5,1).shape
(5, 1)
but I'd use one of the other methods for a case like this.
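To illustrate why the shape matters for the subtraction in the question, here is a small sketch with 5 rows instead of 1000 (the names wrong and right are made up for illustration):

```python
import numpy as np

m = np.arange(10.0).reshape(5, 2)
q = np.zeros((5, 1))

# (5,) against (5, 1) broadcasts to a full (5, 5) table of differences.
wrong = m[:, 0] - q

# Keeping the column two-dimensional gives the element-wise (5, 1) result.
right = m[:, [0]] - q
```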
I have a numpy structured array of the following form:
x = np.array([(1,2,3)]*2, [('t', np.int16), ('x', np.int8), ('y', np.int8)])
I now want to generate views into this array that team up 't' with either 'x' or 'y'. The usual syntax creates a copy:
v_copy = x[['t', 'y']]
v_copy
#array([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('y', '|i1')])
v_copy.base is None
#True
This is not unexpected, since picking two fields is "fancy indexing", at which point numpy gives up and makes a copy. Since my actual records are large, I want to avoid the copy at all costs.
It is not at all true that the required elements cannot be accessed within numpy's strided memory model. Looking at the individual bytes in memory:
x.view(np.int8)
#array([1, 0, 2, 3, 1, 0, 2, 3], dtype=int8)
one can figure out the necessary strides:
v = np.recarray((2,2), [('b', np.int8)], buf=x, strides=(4,3))
v
#rec.array([[(1,), (3,)],
# [(1,), (3,)]],
# dtype=[('b', '|i1')])
v.base is x
#True
Clearly, v points to the correct locations in memory without having created a copy. Unfortunately, numpy won't allow me to reinterpret these memory locations as the original data types:
v_view = v.view([('t', np.int16), ('y', np.int8)])
#ValueError: new type not compatible with array.
Is there a way to trick numpy into doing this cast, so that an array v_view equivalent to v_copy is created, but without having made a copy? Perhaps working directly on v.__array_interface__, as is done in np.lib.stride_tricks.as_strided()?
You can construct a suitable dtype like so
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2)))
and then do
y = np.recarray(x.shape, buf=x, strides=x.strides, dtype=dt2)
In future Numpy versions (> 1.6), you can also do
dt2 = np.dtype(dict(names=('t', 'x'), formats=(np.int16, np.int8), offsets=(0, 2), itemsize=4))
y = x.view(dt2)
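Applied to the 't' and 'y' pairing from the question, the same idea might look like the sketch below. The offsets (0 and 3) and the itemsize (4) are read off the original dtype, and dt_ty is a name made up for illustration:

```python
import numpy as np

x = np.array([(1, 2, 3)] * 2,
             dtype=[('t', np.int16), ('x', np.int8), ('y', np.int8)])

# Same 4-byte record, but only 't' (offset 0) and 'y' (offset 3) are exposed.
dt_ty = np.dtype(dict(names=('t', 'y'),
                      formats=(np.int16, np.int8),
                      offsets=(0, 3),
                      itemsize=4))
v = x.view(dt_ty)

# Writing through the view changes the original array, so no copy was made.
v['t'][0] = 7
```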
This works with numpy 1.6.x and avoids creating a recarray:
dt2 = {'t': (np.int16, 0), 'y': (np.int8, 3)}
v_view = np.ndarray(x.shape, dtype=dt2, buffer=x, strides=x.strides)
v_view
#array([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True
One can wrap this in a class overloading np.ndarray:
class arrayview(np.ndarray):
    def __new__(subtype, x, fields):
        dtype = {f: x.dtype.fields[f] for f in fields}
        return np.ndarray.__new__(subtype, x.shape, dtype,
                                  buffer=x, strides=x.strides)
v_view = arrayview(x, ('t', 'y'))
v_view
#arrayview([(1, 3), (1, 3)],
# dtype=[('t', '<i2'), ('', '|V1'), ('y', '|i1')])
v_view.base is x
#True