How to convert 1D numpy array of tuples to 2D numpy array? - python

I have a numpy array of tuples:
import numpy as np
the_tuples = np.array([(1, 4), (7, 8)], dtype=[('f0', '<i4'), ('f1', '<i4')])
I would like to have a 2D numpy array instead:
the_2Darray = np.array([[1,4],[7,8]])
I have tried doing several things, such as
import numpy as np
the_tuples = np.array([(1, 4), (7, 8)], dtype=[('f0', '<i4'), ('f1', '<i4')])
the_2Darray = np.array([*the_tuples])
How can I convert it?

It turned out I needed to add the_tuples.astype(object) first:
import numpy as np
the_tuples = np.array([(1, 4), (7, 8)], dtype=[('f0', '<i4'), ('f1', '<i4')])
the_tuples = the_tuples.astype(object)
the_2Darray = np.array([*the_tuples])

This is a structured array - 1d with a compound dtype:
In [2]: arr = np.array([(1, 4), (7, 8)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [3]: arr.shape, arr.dtype
Out[3]: ((2,), dtype([('f0', '<i4'), ('f1', '<i4')]))
recfunctions has a function designed to do such a conversion:
In [4]: import numpy.lib.recfunctions as rf
In [5]: arr1 = rf.structured_to_unstructured(arr)
In [6]: arr1
Out[6]:
array([[1, 4],
[7, 8]])
A view works if all fields share the same dtype, but the 2d shape isn't preserved:
In [7]: arr.view('<i4')
Out[7]: array([1, 4, 7, 8])
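If the fields really do share one dtype, the missing shape can be restored with a reshape; a minimal sketch (it assumes the compound dtype has no padding):
import numpy as np
arr = np.array([(1, 4), (7, 8)], dtype=[('f0', '<i4'), ('f1', '<i4')])
# Reinterpret the buffer as plain int32, then restore one row per record.
arr2d = arr.view('<i4').reshape(arr.shape[0], -1)
# arr2d is now array([[1, 4], [7, 8]], dtype=int32)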
The tolist method produces a list of tuples (rather than a list of lists):
In [9]: arr.tolist()
Out[9]: [(1, 4), (7, 8)]
Which can be used as the basis for making a new array:
In [10]: np.array(arr.tolist())
Out[10]:
array([[1, 4],
[7, 8]])
Note that when you created the array (In [2]) you had to use the list-of-tuples format.
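structured_to_unstructured is also the tool to reach for when the fields have different dtypes, since it casts everything to a common type; a small sketch of my own (not from the question):
import numpy as np
import numpy.lib.recfunctions as rf
mixed = np.array([(1, 2.5), (7, 8.5)], dtype=[('f0', '<i4'), ('f1', '<f4')])
# int32 and float32 promote to float64 unless an explicit dtype= is passed.
out = rf.structured_to_unstructured(mixed)
# out -> array([[1. , 2.5], [7. , 8.5]])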

Related

numpy genfromtxt - infer column header if headers not provided

I understand that with genfromtxt, the defaultfmt parameter can be used to generate default column names, which is useful if column names are not in the input data. And defaultfmt, if not provided, defaults to f%i. E.g.
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int))
array([(1, 2.0, 3), (4, 5.0, 6)],
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8')])
So here we have autogenerated column names f0, f1, f2.
But what if I want numpy to infer both column headers and data type? I thought you could do that with dtype=None, like this:
>>> data3 = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data3, dtype=None, ???) # some parameter combo
array([(1, 2, 3), (4, 5, 6)],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
I still want the automatically generated column names of f0, f1...etc. And I want numpy to automatically determine the datatypes based on the data, which I thought was the whole point of doing dtype=None.
EDIT
But unfortunately that doesn't ALWAYS work.
This case works when I have both floats and ints.
>>> data3b = StringIO("1 2 3.0\n 4 5 6.0")
>>> np.genfromtxt(data3b, dtype=None)
array([(1, 2, 3.), (4, 5, 6.)],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<f8')])
So numpy correctly inferred a dtype of i8 for the first two columns, and f8 for the last column.
But if I provide all ints, the inferred column names disappear.
>>> data3c = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data3c, dtype=None)
array([[1, 2, 3],
[4, 5, 6]])
My identical code may or may not work depending on the input data? That doesn't sound right.
And yes I know there's pandas. But I'm not using pandas on purpose. So please bear with me on that.
In [2]: txt = '''1,2,3
...: 4,5,6'''.splitlines()
Default 2d array of floats:
In [6]: np.genfromtxt(txt, delimiter=',',encoding=None)
Out[6]:
array([[1., 2., 3.],
[4., 5., 6.]])
2d of ints:
In [7]: np.genfromtxt(txt, dtype=None, delimiter=',',encoding=None)
Out[7]:
array([[1, 2, 3],
[4, 5, 6]])
Specified field dtypes:
In [8]: np.genfromtxt(txt, dtype='i,i,i', delimiter=',',encoding=None)
Out[8]:
array([(1, 2, 3), (4, 5, 6)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Specified field names:
In [9]: np.genfromtxt(txt, dtype=None, delimiter=',',encoding=None, names=['a','b','c'])
Out[9]:
array([(1, 2, 3), (4, 5, 6)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
The unstructured array can be converted to structured with:
In [10]: import numpy.lib.recfunctions as rf
In [11]: rf.unstructured_to_structured(Out[7])
Out[11]:
array([(1, 2, 3), (4, 5, 6)],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
In numpy the default, preferred array is multidimensional and numeric. That's why it produces Out[7] if it can.
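So to get field names and per-column types for all-int input, one option is to convert the plain 2d result afterwards with an explicit structured dtype; a sketch (the field names mimic the autogenerated f0, f1, ... style):
import numpy as np
import numpy.lib.recfunctions as rf
txt = ['1,2,3', '4,5,6']
plain = np.genfromtxt(txt, dtype=None, delimiter=',', encoding=None)   # plain 2d int array
dt = np.dtype([('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
structured = rf.unstructured_to_structured(plain, dtype=dt)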

numpy: creating recarray fast with different column types

I am trying to create a recarray from a series of numpy arrays with column names and mixed variable types.
The following works but is slow:
import numpy as np
a = np.array([1,2,3,4], dtype=np.int)
b = np.array([6,6,6,6], dtype=np.int)
c = np.array([-1.,-2.-1.,-1.], dtype=np.float32)
d = np.array(list(zip(a, b, c)), dtype=[('a', np.int), ('b', np.int), ('c', np.float32)])
d = d.view(np.recarray)
I think there should be a way to do this with np.stack((a,b,c), axis=-1), which is faster than the list(zip()) method. However, there does not seem to be a trivial way to do the stacking and preserve the column types. This link does seem to show how to do it, but it's pretty clunky and I hope there is a better way.
Thanks for the help!
np.rec.fromarrays is probably what you want:
>>> np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])
rec.array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
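If you want to pin the field dtypes yourself (or end up with a plain structured array rather than a recarray), fromarrays also accepts an explicit dtype; a small sketch:
import numpy as np
a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])
rec = np.rec.fromarrays([a, b, c], dtype=dt)
arr = rec.view(np.ndarray)   # plain structured array, if attribute-style access isn't needed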
Here's the field by field approach that I commented on:
In [308]: a = np.array([1,2,3,4], dtype=np.int)
...: b = np.array([6,6,6,6], dtype=np.int)
...: c = np.array([-1.,-2.,-1.,-1.], dtype=np.float32)
...: dt = np.dtype([('a',np.int),('b',np.int),('c',np.float32)])
...:
...:
(I had to correct your copy-n-pasted c).
In [309]: arr = np.zeros(a.shape, dtype=dt)
In [310]: for name, x in zip(dt.names, [a,b,c]):
...: arr[name] = x
...:
In [311]: arr
Out[311]:
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
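For reuse, that loop is easy to wrap in a small helper; a sketch (the function name is mine):
import numpy as np
def columns_to_structured(arrays, names):
    # Pack equal-length 1d arrays into one structured array, field by field.
    dt = np.dtype([(name, arr.dtype) for name, arr in zip(names, arrays)])
    out = np.zeros(arrays[0].shape, dtype=dt)
    for name, arr in zip(names, arrays):
        out[name] = arr
    return out
# e.g. columns_to_structured([a, b, c], ['a', 'b', 'c'])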
Since typically the array will have many more records (rows) than fields, this should be faster than the list-of-tuples approach. In this small case it is probably comparable in speed.
In [312]: np.array(list(zip(a,b,c)), dtype=dt)
Out[312]:
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
rec.fromarrays, after some setup to determine the dtype, does:
_array = recarray(shape, descr)
# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]
The only way to use stack is to create recarrays first:
In [315]: [np.rec.fromarrays((i,j,k), dtype=dt) for i,j,k in zip(a,b,c)]
Out[315]:
[rec.array((1, 6, -1.),
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
rec.array((2, 6, -2.),
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
rec.array((3, 6, -1.),
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
rec.array((4, 6, -1.),
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])]
In [316]: np.stack(_)
Out[316]:
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
dtype=(numpy.record, [('a', '<i8'), ('b', '<i8'), ('c', '<f4')]))
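If the relative speed matters for your sizes, it is easy to time the alternatives directly; a rough sketch with timeit (actual numbers will depend on the row count and number of fields):
import timeit
import numpy as np
n = 100_000
a = np.arange(n, dtype=np.int64)
b = np.full(n, 6, dtype=np.int64)
c = np.random.rand(n).astype(np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])
def via_zip():
    return np.array(list(zip(a, b, c)), dtype=dt)
def via_fields():
    out = np.zeros(n, dtype=dt)
    for name, x in zip(dt.names, (a, b, c)):
        out[name] = x
    return out
print('zip    :', timeit.timeit(via_zip, number=5))
print('fields :', timeit.timeit(via_fields, number=5))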

In place sorting by column fails on slices

I'm trying to sort a numpy array by a specific column (in-place) using the solution from this answer. For the most part it works, but it fails on any array that's a view on another array:
In [35]: columnnum = 2
In [36]: a = np.array([[1,2,3], [4,7,5], [9,0,1]])
In [37]: a
Out[37]:
array([[1, 2, 3],
[4, 7, 5],
[9, 0, 1]])
In [38]: b = a[:,(0, 2)]
In [39]: b
Out[39]:
array([[1, 3],
[4, 5],
[9, 1]])
In [40]: a.view(','.join([a.dtype.str] * a.shape[1])).sort(order=['f%d' % columnnum], axis=0)
In [41]: a
Out[41]:
array([[9, 0, 1],
[1, 2, 3],
[4, 7, 5]])
In [42]: b.view(','.join([b.dtype.str] * b.shape[1])).sort(order=['f%d' % columnnum], axis=0)
ValueError: new type not compatible with array.
It looks like numpy doesn't support views of views, which makes a certain amount of sense, but I now can't figure out how to get the view I need for any array, whether it itself is a view or not. So far, I haven't been able to find any way to get the necessary information about the view I have to construct the new one I need.
For now, I'm using the l = l[l[:,columnnum].argsort()] in-place sorting method, which works fine, but since I'm operating on large datasets, I'd like to avoid the extra memory overhead of the argsort() call (the list of indexes). Is there either a way to get the necessary information about the view or to do the sort by column?
In [1019]: a=np.array([[1,2,3],[4,7,5],[9,0,1]])
In [1020]: b=a[:,(0,2)]
This is the a that you are sorting: a structured array with 3 fields. It uses the same data buffer, but interprets each group of 3 ints as fields rather than columns.
In [1021]: a.view('i,i,i')
Out[1021]:
array([[(1, 2, 3)],
[(4, 7, 5)],
[(9, 0, 1)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
By the same logic, you try to view b:
In [1022]: b.view('i,i')
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of non-C contiguous array by
descriptor assignment is deprecated. To maintain
the Fortran contiguity of a multidimensional Fortran
array, use 'a.T.view(...).T' instead
#!/usr/bin/python3
....
ValueError: new type not compatible with array.
But if I use 3 fields instead of 2, it works (but with the same warning):
In [1023]: b.view('i,i,i')
/usr/local/bin/ipython3:1: DeprecationWarning:...
Out[1023]:
array([[(1, 4, 9), (3, 5, 1)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
The problem is that b is Fortran order. Check b.flags or
In [1026]: a.strides
Out[1026]: (12, 4)
In [1027]: b.strides
Out[1027]: (4, 12)
b is a copy, not a view. I don't know, off hand, why this construction of b changed the order.
Heeding the warning, I can do:
In [1047]: b.T.view('i,i,i').T
Out[1047]:
array([[(1, 4, 9), (3, 5, 1)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
A default (C-order) copy of b can be viewed as 2 fields:
In [1042]: b1=b.copy()
In [1043]: b1.strides
Out[1043]: (8, 4)
In [1044]: b1.view('i,i')
Out[1044]:
array([[(1, 3)],
[(4, 5)],
[(9, 1)]],
dtype=[('f0', '<i4'), ('f1', '<i4')])
A footnote on: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
The memory layout of an advanced indexing result is optimized for each indexing operation and no particular memory order can be assumed.
b in this case was constructed with advanced indexing, and thus is a copy. Even a true view might not be viewable this way:
In [1052]: a[:,:2].view('i,i')
....
ValueError: new type not compatible with array.
In [1054]: a[:,:2].copy().view('i,i')
Out[1054]:
array([[(1, 2)],
[(4, 7)],
[(9, 0)]],
dtype=[('f0', '<i4'), ('f1', '<i4')])
The view would be selecting a subset of the values: 'i,i,x,i,i,x,i,i,x...', and that does not translate into a structured dtype.
The structured view of a does: '(i,i,i),(i,i,i),...'
You can select a subset of the fields of a structured array:
In [1059]: a1=a.view('i,i,i')
In [1060]: a1
Out[1060]:
array([[(1, 2, 3)],
[(4, 7, 5)],
[(9, 0, 1)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
In [1061]: b1=a1[['f0','f2']]
In [1062]: b1
Out[1062]:
array([[(1, 3)],
[(4, 5)],
[(9, 1)]],
dtype=[('f0', '<i4'), ('f2', '<i4')])
But there are limits as to what you can do with such a view. Values can be changed in a1, and seen in a and b1. But I get an error if I try to change values in b1.
This is on the development edge.
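Pulling the contiguity requirement together, here is a sketch of a helper that uses the structured-view trick when it can, and falls back to argsort (with its extra index array) when it can't; the function name is mine:
import numpy as np
def sort_rows_by_column(a, col):
    # Sort a 2d array by one column, in place via a structured view when possible.
    if a.ndim != 2:
        raise ValueError('expected a 2d array')
    if a.flags['C_CONTIGUOUS']:
        # Rows map cleanly onto records only for a C-contiguous buffer.
        dt = ','.join([a.dtype.str] * a.shape[1])
        a.view(dt).sort(order=['f%d' % col], axis=0)
    else:
        # Fancy-indexed results like b above: fall back to argsort.
        a[:] = a[np.argsort(a[:, col], kind='stable')]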

Is it possible to retain the datatype of individual numpy arrays with concatenation

import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
a = np.reshape(a,(1,6))
b = np.array([[1,2.1],[3.5,4],[5,6.8]])
b = np.reshape(b,(1,6))
c = np.concatenate((a,b))
When I concatenate the arrays a and b with np.concatenate, I get an array of type float. Is it possible to retain the datatype of the individual arrays after concatenation?
There are mixed types... If I have read you correctly, you essentially want to pair the values from 'a' and 'b'. If that is the case, you can flatten your input arrays and reassemble them while retaining the appropriate dtype. This is one approach, shown verbosely so you can alter the format if you want to construct it differently.
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])
a = a.flatten()
b = np.array([[1, 2.1], [3.5, 4], [5, 6.8]])
b = b.flatten()
dt = a.dtype.descr + b.dtype.descr
c = np.array(list(zip(a, b)), dtype=dt)
frmt = """
:Array 'a'... {}
:Array 'b'... {}
:Combined dtype ... {}
:Resultant structured array...
{!r:}
:Viewed column-wise
{!r:}
:
"""
print(frmt.format(a, b, dt, c, c.reshape(c.shape[0], 1)))
output
:Array 'a'... [1 2 3 4 5 6]
:Array 'b'... [ 1.000 2.100 3.500 4.000 5.000 6.800]
:Combined dtype ... [('', '<i8'), ('', '<f8')]
:Resultant structured array...
array([(1, 1.0), (2, 2.1), (3, 3.5), (4, 4.0), (5, 5.0), (6, 6.8)],
dtype=[('f0', '<i8'), ('f1', '<f8')])
:Viewed column-wise
array([[(1, 1.0)],
[(2, 2.1)],
[(3, 3.5)],
[(4, 4.0)],
[(5, 5.0)],
[(6, 6.8)]],
dtype=[('f0', '<i8'), ('f1', '<f8')])
:
As an alternative to all that, you can exploit some optional tools from
https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py
from numpy.lib import recfunctions as rfn
a0 = np.array([[1, 2],[3, 4],[5, 6]])
b0 = np.array([[1, 2.1],[3.5, 4],[5, 6.8]])
d = rfn.merge_arrays((a0, b0), flatten=True, usemask=False, asrecarray=False)
Yielding...
>>> c
array([(1, 1.0), (2, 2.1), (3, 3.5), (4, 4.0), (5, 5.0), (6, 6.8)],
dtype=[('f0', '<i8'), ('f1', '<f8')])
>>> d
array([(1, 1.0), (2, 2.1), (3, 3.5), (4, 4.0), (5, 5.0), (6, 6.8)],
dtype=[('f0', '<i8'), ('f1', '<f8')])
>>> c == d
array([1, 1, 1, 1, 1, 1], dtype=bool)
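Either way, each original dtype survives and the columns can be pulled back out by field name; a quick self-contained check (the variable names mirror the ones above):
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1.0, 2.1, 3.5, 4.0, 5.0, 6.8])
c = np.array(list(zip(a, b)), dtype=a.dtype.descr + b.dtype.descr)
ints = c['f0']     # integer column, dtype preserved from a
floats = c['f1']   # float column, dtype preserved from b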
EDIT
Based on your comments, it appears you need to consider how you really want to keep the data together. An array with a dtype of 'object' is possible, but probably not very useful and not the best way of organizing the data. Consider saving to a *.npz file instead if your intent is archiving (see the sketch after the example below). In any event...
a = np.arange(9).reshape(3,3)
>>> b = np.arange(9.).reshape(3,3)
>>> c = np.array(["array a", "array b"])
>>> d = np.array([a,b,c], dtype='object')
>>> d
array([array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]),
array([[ 0.000, 1.000, 2.000],
[ 3.000, 4.000, 5.000],
[ 6.000, 7.000, 8.000]]),
array(['array a', 'array b'],
dtype='<U7')], dtype=object)
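For the archiving route, np.savez keeps each array separate with its own dtype and shape; a minimal sketch (the file name is arbitrary):
import numpy as np
a = np.arange(9).reshape(3, 3)
b = np.arange(9.).reshape(3, 3)
np.savez('arrays.npz', a=a, b=b)    # each array is stored with its own dtype
with np.load('arrays.npz') as data:
    a_back = data['a']   # int array
    b_back = data['b']   # float array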

why setting dtype on numpy array changes dimensions?

I am trying to understand how to set the dtype of an array. My original numpy array has dimensions (583760, 7), i.e. 583760 rows and 7 columns. I am setting the dtype as follows:
>>> allRics.shape
(583760, 7)
>>> allRics.dtype = [('idx', np.float), ('opened', np.float), ('time', np.float),('trdp1',np.float),('trdp0',np.float),('dt',np.float),('value',np.float)]
>>> allRics.shape
(583760, 1)
Why is there a change in the original shape of the array? What causes this change? I am basically trying to sort the original numpy array by the time column, and that's why I am setting the dtype. But after the dimension change, I am not able to sort the array:
>>> x=np.sort(allRics,order='time')
There is no change in the output of the above command. Could you please advise?
You are turning your array into a structured array. Basically, instead of a 2D array of plain numbers it is now treated as an array of structs, one per row. Take a look at a simpler example below:
>>> import numpy as np
>>> arr = np.array([(1,2,3),(3,4,5)])
>>> arr
array([[1, 2, 3],
[3, 4, 5]])
>>> arr.shape
(2, 3)
>>> arr.dtype=[('a',int),('b',int),('c', int)]
>>> arr # Notice the tuples inside the elements
array([[(1, 2, 3)],
[(3, 4, 5)]],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr.shape
(2, 1)
The structured array not sorting is most assuredly a bug. A workaround is to actually declare the array as a structured array to begin with:
>>> arr_s = np.sort(arr, order='b')
>>> arr_s
array([[(1, 2, 3)],
[(3, 4, 5)]],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> dtype=[('a',np.int64),('b',np.int64),('c', np.int64)]
>>> arr = np.array([(5,2,3),(3,4,1)], dtype=dtype)
>>> arr
array([(5, 2, 3), (3, 4, 1)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='a')
>>> arr_s
array([(3, 4, 1), (5, 2, 3)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='b')
>>> arr_s
array([(5, 2, 3), (3, 4, 1)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='c')
>>> arr_s
array([(3, 4, 1), (5, 2, 3)],
dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>>
You might be able to avoid using structured arrays altogether if all you are using them for is sorting. You could do something like:
new_order = np.argsort(allRics[:, 2])
x = allRics[new_order]
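Alternatively, if an in-place sort of the original float array by the time column is the real goal, a structured view (rather than reassigning .dtype) leaves the original array usable and reorders it in place; a sketch assuming allRics is a C-contiguous float64 array:
import numpy as np
allRics = np.random.rand(10, 7)            # stand-in for the real (583760, 7) data
names = ['idx', 'opened', 'time', 'trdp1', 'trdp0', 'dt', 'value']
dt = np.dtype([(name, np.float64) for name in names])
rics_view = allRics.view(dt)[:, 0]         # same buffer, shape (n,)
rics_view.sort(order='time')               # reorders the rows of allRics in place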
