I have one 2D array and one 1D array. I would like to zip them together.
import numpy as np
arr2D = [[5.88964708e-02, -2.38142395e-01, -4.95821417e-01, -7.07269274e-01],
[0.53363666, 0.1654723 , -0.16439857, -0.44880487]]
arr2D = np.asarray(arr2D)
arr1D = np.arange(7, 8.5+0.5, 0.5)
arr1D = np.asarray(arr1D)
res = np.array(list(zip(arr1D, arr2D)))
print(res)
which results in:
[[7.0 array([ 0.05889647, -0.2381424 , -0.49582142, -0.70726927])]
[7.5 array([ 0.53363666, 0.1654723 , -0.16439857, -0.44880487])]]
But I am trying to get:
[[(7.0, 0.05889647), (7.5, -0.2381424), (8.0, -0.49582142), (8.5, -0.70726927)],
 [(7.0, 0.53363666), (7.5, 0.1654723), (8.0, -0.16439857), (8.5, -0.44880487)]]
How can I do this?
You were almost there! Here's a solution:
list(map(lambda x: list(zip(arr1D, x)), arr2D))
[[(7.0, 0.0588964708),
(7.5, -0.238142395),
(8.0, -0.495821417),
(8.5, -0.707269274)],
[(7.0, 0.53363666), (7.5, 0.1654723), (8.0, -0.16439857), (8.5, -0.44880487)]]
In [382]: arr2D = [[5.88964708e-02, -2.38142395e-01, -4.95821417e-01, -7.07269274e-01],
...: [0.53363666, 0.1654723 , -0.16439857, -0.44880487]]
...: arr2D = np.asarray(arr2D)
...: arr1D = np.arange(7, 8.5+0.5, 0.5) # already an array
In [384]: arr2D.shape
Out[384]: (2, 4)
In [385]: arr1D.shape
Out[385]: (4,)
zip iterates on the first dimension of the arguments, and stops with the shortest:
In [387]: [[i,j[0:2]] for i,j in zip(arr1D, arr2D)]
Out[387]:
[[7.0, array([ 0.05889647, -0.2381424 ])],
[7.5, array([0.53363666, 0.1654723 ])]]
If we transpose the 2D array, so it is now (4, 2), we get a four-element list:
In [389]: [[i,j] for i,j in zip(arr1D, arr2D.T)]
Out[389]:
[[7.0, array([0.05889647, 0.53363666])],
[7.5, array([-0.2381424, 0.1654723])],
[8.0, array([-0.49582142, -0.16439857])],
[8.5, array([-0.70726927, -0.44880487])]]
We could add another level of iteration to get the desired pairs:
In [390]: [[(i,k) for k in j] for i,j in zip(arr1D, arr2D.T)]
Out[390]:
[[(7.0, 0.0588964708), (7.0, 0.53363666)],
[(7.5, -0.238142395), (7.5, 0.1654723)],
[(8.0, -0.495821417), (8.0, -0.16439857)],
[(8.5, -0.707269274), (8.5, -0.44880487)]]
and with the list-transpose idiom, zip(*...) applied to the previous output:
In [391]: list(zip(*_))
Out[391]:
[((7.0, 0.0588964708), (7.5, -0.238142395), (8.0, -0.495821417), (8.5, -0.707269274)),
((7.0, 0.53363666), (7.5, 0.1654723), (8.0, -0.16439857), (8.5, -0.44880487))]
Or we can get that result directly by moving the zip into an inner loop:
[[(i,k) for i,k in zip(arr1D, row)] for row in arr2D]
In other words, you are pairing the elements of arr1D with the elements of each row of arr2D, rather than with the whole row.
Since you already have arrays, one of the array solutions might be better, but I'm trying to clarify what is happening with zip.
numpy
There are various ways of building a numpy array from these arrays. Since you want to repeat the arr1D values:
This repeat makes a (2, 4) array that matches arr2D (tile also works):
In [400]: arr1D[None,:].repeat(2,0)
Out[400]:
array([[7. , 7.5, 8. , 8.5],
[7. , 7.5, 8. , 8.5]])
In [401]: arr2D
Out[401]:
array([[ 0.05889647, -0.2381424 , -0.49582142, -0.70726927],
[ 0.53363666, 0.1654723 , -0.16439857, -0.44880487]])
which can then be joined on a new trailing axis:
In [402]: np.stack((_400, arr2D), axis=2)
Out[402]:
array([[[ 7. , 0.05889647],
[ 7.5 , -0.2381424 ],
[ 8. , -0.49582142],
[ 8.5 , -0.70726927]],
[[ 7. , 0.53363666],
[ 7.5 , 0.1654723 ],
[ 8. , -0.16439857],
[ 8.5 , -0.44880487]]])
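The same (2, 4, 2) result can also be reached without the explicit repeat by letting numpy broadcast the two inputs first. A minimal sketch, not part of the original answer, using broadcast_arrays:
# broadcast arr1D (4,) against arr2D (2, 4), then pair them on a new last axis
paired = np.stack(np.broadcast_arrays(arr1D, arr2D), axis=-1)
paired.shape     # (2, 4, 2); paired[i, j] is (arr1D[j], arr2D[i, j])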
Or a structured array with tuple-like display:
In [406]: arr = np.zeros((2,4), dtype='f,f')
In [407]: arr
Out[407]:
array([[(0., 0.), (0., 0.), (0., 0.), (0., 0.)],
[(0., 0.), (0., 0.), (0., 0.), (0., 0.)]],
dtype=[('f0', '<f4'), ('f1', '<f4')])
In [408]: arr['f1'] = arr2D
In [409]: arr['f0'] = _400
In [410]: arr
Out[410]:
array([[(7. , 0.05889647), (7.5, -0.2381424 ), (8. , -0.49582142),
(8.5, -0.70726925)],
[(7. , 0.5336367 ), (7.5, 0.1654723 ), (8. , -0.16439857),
(8.5, -0.44880486)]], dtype=[('f0', '<f4'), ('f1', '<f4')])
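A side note on the small rounding visible above (-0.70726925 instead of -0.70726927): 'f,f' means 32-bit floats. If that matters, the same approach with 64-bit fields keeps the original values; a sketch, not from the original answer:
arr64 = np.zeros((2, 4), dtype='f8,f8')     # 64-bit fields instead of 32-bit
arr64['f0'] = arr1D[None, :].repeat(2, 0)   # the repeated 7.0 ... 8.5 values
arr64['f1'] = arr2D                         # values kept at double precision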
You can use numpy.tile to expand the 1d array, and then use numpy.dstack, namely:
import numpy as np
arr2D = np.array([[5.88964708e-02, -2.38142395e-01, -4.95821417e-01, -7.07269274e-01],
[0.53363666, 0.1654723 , -0.16439857, -0.44880487]])
arr1D = np.arange(7, 8.5+0.5, 0.5)
np.dstack([np.tile(arr1D, (2,1)), arr2D])
array([[[ 7. , 0.05889647],
[ 7.5 , -0.2381424 ],
[ 8. , -0.49582142],
[ 8.5 , -0.70726927]],
[[ 7. , 0.53363666],
[ 7.5 , 0.1654723 ],
[ 8. , -0.16439857],
[ 8.5 , -0.44880487]]])
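If the number of rows is not fixed at 2, the tile count can be taken from arr2D itself; a small variation on the snippet above, not from the original post:
np.dstack([np.tile(arr1D, (arr2D.shape[0], 1)), arr2D])   # same (2, 4, 2) result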
Related
I am trying to convert my array of lists into an array of tuples.
results=
array([[1. , 0.0342787 ],
[0. , 0.04436508],
[1. , 0.09101833 ],
[0. , 0.03492954],
[1. , 0.06059857]])
results1=np.empty((5,), dtype=object)
results1[:] = np.array([tuple(i) for i in results])
results1
I tried the above following the advice given here but I get the error ValueError: could not broadcast input array from shape (5,2) into shape (5).
How do I create a numpy array of tuples from a numpy array of lists?
Try this, in order to get an array of tuples as mentioned in the title:
import numpy as np
results = np.array([[1. , 0.0342787 ],
[0. , 0.04436508],
[1. , 0.09101833],
[0. , 0.03492954],
[1. , 0.06059857]])
temp = []
for item in results:
temp.append(tuple(item))
results1= np.empty(len(temp), dtype=object)
results1[:] = temp
print(results1)
# array([(1.0, 0.0342787), (0.0, 0.04436508), (1.0, 0.09101833),
# (0.0, 0.03492954), (1.0, 0.06059857)], dtype=object)
Remove the np.array() call from the assignment step, i.e. use results1[:] = [tuple(i) for i in results], and it will work like a breeze. When you pass the plain list to np.array, the highest possible number of axes is guessed automatically, and your tuples, each a pair of numbers, end up reproducing a (5, 2) matrix.
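To see the shape guessing in action, compare the two assignments directly; a short illustration using the question's own names, not part of the original answer:
np.array([tuple(i) for i in results]).shape   # (5, 2): the tuples become a second axis
results1[:] = [tuple(i) for i in results]     # assigning the plain list keeps the tuples intact
results1.shape                                # (5,)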
Why not do this?:
import numpy as np
results= np.array([[1. , 0.0342787 ],
[0. , 0.04436508],
[1. , 0.09101833 ],
[0. , 0.03492954],
[1. , 0.06059857]])
results1 = [tuple(i) for i in results]
results1
Output:
[(1.0, 0.0342787), (0.0, 0.04436508), (1.0, 0.09101833),
 (0.0, 0.03492954), (1.0, 0.06059857)]
Working from the examples in my answer at your link, Convert array of lists to array of tuples/triple:
In [22]: results=np.array([[1. , 0.0342787 ],
...: [0. , 0.04436508],
...: [1. , 0.09101833 ],
...: [0. , 0.03492954],
...: [1. , 0.06059857]])
In [23]: a1 = np.empty((5,), object)
In [24]: a1[:]= [tuple(i) for i in results]
In [25]: a1
Out[25]:
array([(1.0, 0.0342787), (0.0, 0.04436508), (1.0, 0.09101833),
(0.0, 0.03492954), (1.0, 0.06059857)], dtype=object)
or the structured array:
In [26]: a1 = np.array([tuple(i) for i in results], dtype='i,i')
In [27]: a1
Out[27]:
array([(1, 0), (0, 0), (1, 0), (0, 0), (1, 0)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
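Note that the 'i,i' dtype above casts both columns to integer, so the fractional second column collapses to 0. With float fields the values survive; a quick sketch, with a2 as an illustrative name:
a2 = np.array([tuple(i) for i in results], dtype='f8,f8')
a2['f1']    # array([0.0342787 , 0.04436508, 0.09101833, 0.03492954, 0.06059857])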
You got the error because you did not follow my answer:
In [30]: a1[:]= np.array([tuple(i) for i in results])
Traceback (most recent call last):
File "<ipython-input-30-5c1cc6c4105a>", line 1, in <module>
a1[:]= np.array([tuple(i) for i in results])
ValueError: could not broadcast input array from shape (5,2) into shape (5)
The a1[:]=... assign works for a list, but not for an array.
Note that wrapping the tuple list in an array just reproduces the original results:
In [31]: np.array([tuple(i) for i in results])
Out[31]:
array([[1. , 0.0342787 ],
[0. , 0.04436508],
[1. , 0.09101833],
[0. , 0.03492954],
[1. , 0.06059857]])
A list of tuples:
In [32]: [tuple(i) for i in results]
Out[32]:
[(1.0, 0.0342787),
(0.0, 0.04436508),
(1.0, 0.09101833),
(0.0, 0.03492954),
(1.0, 0.06059857)]
I have an array of lists (correction: an N-dimensional array)
s_cluster_data
Out[410]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
...,
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
I would like to make the third column an integer. I've tried to assign a dtype as follows
dtype=[('A','f8'),('B','f8'),('C','i4')]
s_cluster_data = np.array(s_cluster_data, dtype=dtype)
s_cluster_data
Out[414]:
array([[( 0.9607611 , 0.9607611 , 0), ( 0.19538569, 0.19538569, 0),
( 0. , 0. , 0)],
[( 1.03990463, 1.03990463, 1), ( 0.22274072, 0.22274072, 0),
( 0. , 0. , 0)],
[( 1.09430461, 1.09430461, 1), ( 0.22603228, 0.22603228, 0),
( 0. , 0. , 0)],
...,
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
This creates an array of lists of tuples (correction: an array with that dtype), with each element of the lists becoming a separate tuple.
I've also tried to take the array apart, read it back in as an array of tuples, and then return it to its original state.
list_cluster = s_cluster_data.tolist() # py list
tuple_cluster = [tuple(l) for l in list_cluster] # list of tuples
dtype=[('A','f8'),('B','f8'),('C','i4')]
sd_cluster_data = np.array(tuple_cluster, dtype=dtype) # array of tuples with dtype
sd_cluster_data
Out: ...,
(1.0020371 , -0.56034073, 2), (1.18264038, -0.55773913, 2),
(1.00550194, -0.55359672, 2), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
So ideally the above output is what I would like to see, but as an array of lists, not an array of tuples.
I tried to take the array apart and merge it back as lists
x_val_arr = np.array([x[0] for x in sd_cluster_data])
y_val_arr = np.array([x[1] for x in sd_cluster_data])
cluster_id_arr = np.array([x[2] for x in sd_cluster_data])
coordinates_arr = np.stack((x_val_arr,y_val_arr,cluster_id_arr),axis=1)
But once again I get floats in the third column
coordinates_arr
Out[416]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
...,
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
This is probably a question born of my lack of domain knowledge, but do ndarrays not support mixed data types when built from lists rather than tuples?
In [87]: import numpy.lib.recfunctions as rf
In [88]: arr = np.array([[ 0.9607611 , 0.19538569, 0. ],
...: [ 1.03990463, 0.22274072, 0. ],
...: [ 1.09430461, 0.22603228, 0. ],
...: [ 1.10802461, -0.54190659, 2. ],
...: [ 0.9288097 , -0.49195368, 2. ],
...: [ 0.81606986, -0.47141286, 2. ]])
In [89]: arr
Out[89]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
There are various ways of constructing a structured array from a 2D array like this. Recent versions provide a convenient unstructured_to_structured function:
In [90]: dt = np.dtype([('A','f8'),('B','f8'),('C','i4')])
In [92]: rf.unstructured_to_structured(arr, dt)
Out[92]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
Each row of arr has been turned into a structured record, displayed as a tuple.
A functionally equivalent approach is to create a 'blank' array, and assign field values by name:
In [93]: res = np.zeros(arr.shape[0], dt)
In [94]: res
Out[94]:
array([(0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0),
(0., 0., 0)], dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
In [95]: res['A'] = arr[:,0]
In [96]: res['B'] = arr[:,1]
In [97]: res['C'] = arr[:,2]
In [98]: res
Out[98]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
and to belabor the point, we could also make the structured array from a list of tuples:
In [104]: np.array([tuple(row) for row in arr.tolist()], dt)
Out[104]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
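Whichever construction you use, the third column is now genuinely integer; field access on the res array from In [98] above returns an int array:
res['C']          # array([0, 0, 0, 2, 2, 2], dtype=int32)
res['A'].dtype    # dtype('float64')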
The problem might be in the way you pass data to np.array. The rows of the array should be tuples.
a = np.array([( 0.9607611 , 0.19538569, 0. )], dtype='f8, f8, i4')
will create an array
array([(0.9607611, 0.19538569, 0)],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
I wanted to access my array both as a 3-element entity (a 3D position) and by individual elements (each of the x, y, z coordinates).
After some research, I ended up doing the following.
>>> import numpy as np
>>> arr = np.zeros(5, dtype={'pos': (('<f8', (3,)), 0),
'x': (('<f8', 1), 0),
'y': (('<f8', 1), 8),
'z': (('<f8', 1), 16)})
>>> arr["x"] = 1
>>> arr["y"] = 2
>>> arr["z"] = 3
# I can access the whole array by "pos"
>>> print(arr["pos"])
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
However, I've always been making arrays in this style:
>>> arr = np.zeros(10, dtype=[("pos", "f8", (3,))])
But I can't find a way to specify both the offset and the shape of the element at the same time in this style. Is there a way to do this?
In reference to the docs page, https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.dtypes.html
you are using the fields dictionary form, with (data-type, offset) value
{'field1': ..., 'field2': ..., ...}
dt1 = {'pos': (('<f8', (3,)), 0),
'x': (('<f8', 1), 0),
'y': (('<f8', 1), 8),
'z': (('<f8', 1), 16)}
The display for the resulting dtype is the other dictionary format:
{'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}
In [15]: np.dtype(dt1)
Out[15]: dtype({'names':['x','pos','y','z'],
'formats':['<f8',('<f8', (3,)),'<f8','<f8'],
'offsets':[0,0,8,16], 'itemsize':24})
In [16]: np.dtype(dt1).fields
Out[16]:
mappingproxy({'pos': (dtype(('<f8', (3,))), 0),
'x': (dtype('float64'), 0),
'y': (dtype('float64'), 8),
'z': (dtype('float64'), 16)})
Offsets aren't mentioned anywhere else on the documentation page.
The last format is a union type. It's a little unclear as to whether that's allowed or discouraged. The examples don't seem to work. There have been some changes in how multifield indexing works, and that may have affected this.
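For the original question, specifying both the offset and the shape in one dtype, the 'names/formats/offsets/itemsize' dictionary form that numpy itself displays in Out[15] can also be written out directly. A sketch of that form; whether union-style overlapping fields stay supported going forward is exactly the uncertainty noted above:
dt2 = np.dtype({'names':   ['pos', 'x', 'y', 'z'],
                'formats': [('<f8', (3,)), '<f8', '<f8', '<f8'],
                'offsets': [0, 0, 8, 16],
                'itemsize': 24})
np.zeros(5, dtype=dt2)   # 'pos' overlaps the x/y/z fields, as in the question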
Let's play around with various ways of viewing the array:
In [25]: arr
Out[25]:
array([(0., [ 0. , 10. , 0. ], 10., 0. ),
(1., [ 1. , 11. , 0.1], 11., 0.1),
(2., [ 2. , 12. , 0.2], 12., 0.2),
(3., [ 3. , 13. , 0.3], 13., 0.3),
(4., [ 4. , 14. , 0.4], 14., 0.4)],
dtype={'names':['x','pos','y','z'], 'formats':['<f8',('<f8', (3,)),'<f8','<f8'], 'offsets':[0,0,8,16], 'itemsize':24})
In [29]: dt3=[('x','<f8'),('y','<f8'),('z','<f8')]
In [30]: np.dtype(dt3)
Out[30]: dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
In [31]: np.dtype(dt3).fields
Out[31]:
mappingproxy({'x': (dtype('float64'), 0),
'y': (dtype('float64'), 8),
'z': (dtype('float64'), 16)})
In [32]: arr.view(dt3)
Out[32]:
array([(0., 10., 0. ), (1., 11., 0.1), (2., 12., 0.2), (3., 13., 0.3),
(4., 14., 0.4)], dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
In [33]: arr['pos']
Out[33]:
array([[ 0. , 10. , 0. ],
[ 1. , 11. , 0.1],
[ 2. , 12. , 0.2],
[ 3. , 13. , 0.3],
[ 4. , 14. , 0.4]])
In [35]: arr.view('f8').reshape(5,3)
Out[35]:
array([[ 0. , 10. , 0. ],
[ 1. , 11. , 0.1],
[ 2. , 12. , 0.2],
[ 3. , 13. , 0.3],
[ 4. , 14. , 0.4]])
In [36]: dt4 = [('pos', '<f8', (3,))]    # dt4 was not shown above; inferred from Out[37]
In [37]: arr.view(dt4)
Out[37]:
array([([ 0. , 10. , 0. ],), ([ 1. , 11. , 0.1],),
([ 2. , 12. , 0.2],), ([ 3. , 13. , 0.3],),
([ 4. , 14. , 0.4],)], dtype=[('pos', '<f8', (3,))])
In [38]: arr.view(dt4)['pos']
Out[38]:
array([[ 0. , 10. , 0. ],
[ 1. , 11. , 0.1],
[ 2. , 12. , 0.2],
[ 3. , 13. , 0.3],
[ 4. , 14. , 0.4]])
According to the Python Cookbook, below is how to write a list of tuples to a binary file:
from struct import Struct

def write_records(records, format, f):
    '''
    Write a sequence of tuples to a binary file of structures.
    '''
    record_struct = Struct(format)
    for r in records:
        f.write(record_struct.pack(*r))

# Example
if __name__ == '__main__':
    records = [ (1, 2.3, 4.5),
                (6, 7.8, 9.0),
                (12, 13.4, 56.7) ]
    with open('data.b', 'wb') as f:
        write_records(records, '<idd', f)
And it works well.
For reading (a large amount of binary data), the author recommended the following:
>>> import numpy as np
>>> f = open('data.b', 'rb')
>>> records = np.fromfile(f, dtype='<i,<d,<d')
>>> records
array([(1, 2.3, 4.5), (6, 7.8, 9.0), (12, 13.4, 56.7)],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8')])
>>> records[0]
(1, 2.3, 4.5)
>>> records[1]
(6, 7.8, 9.0)
>>>
That also works, but these records do not form a normal numpy array. For instance, type(records[0]) returns <type 'numpy.void'>. Even worse, I cannot extract the first column using X = records[:, 0].
Is there a way to efficiently load list(or any other types) from binary file into a normal numpy array?
Thanks in advance.
In [196]: rec = np.fromfile('data.b', dtype='<i,<d,<d')
In [198]: rec
Out[198]:
array([( 1, 2.3, 4.5), ( 6, 7.8, 9. ), (12, 13.4, 56.7)],
dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8')])
This is a 1d structured array
In [199]: rec['f0']
Out[199]: array([ 1, 6, 12], dtype=int32)
In [200]: rec.shape
Out[200]: (3,)
In [201]: rec.dtype
Out[201]: dtype([('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8')])
Note that its tolist looks identical to your original records:
In [202]: rec.tolist()
Out[202]: [(1, 2.3, 4.5), (6, 7.8, 9.0), (12, 13.4, 56.7)]
In [203]: records
Out[203]: [(1, 2.3, 4.5), (6, 7.8, 9.0), (12, 13.4, 56.7)]
You could create a 2d array from either list with:
In [204]: arr2 = np.array(rec.tolist())
In [205]: arr2
Out[205]:
array([[ 1. , 2.3, 4.5],
[ 6. , 7.8, 9. ],
[ 12. , 13.4, 56.7]])
In [206]: arr2.shape
Out[206]: (3, 3)
There are other ways of converting a structured array to a 'regular' array, but this is the simplest and most consistent.
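For reference, one of those other ways is structured_to_unstructured from numpy.lib.recfunctions (available in newer numpy releases); it does the same conversion without the round trip through a list, upcasting the integer field to float:
import numpy.lib.recfunctions as rf
rf.structured_to_unstructured(rec)   # (3, 3) float array, same values as arr2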
The tolist of a regular array uses nested lists. The tuples in the structured version are intended to convey a difference:
In [207]: arr2.tolist()
Out[207]: [[1.0, 2.3, 4.5], [6.0, 7.8, 9.0], [12.0, 13.4, 56.7]]
In the structured array the first field is integer. In the regular array the first column is the same as the others, float.
If the binary file contained all floats, you could load it as a 1d array of floats and reshape:
In [208]: with open('data.f', 'wb') as f:
...: write_records(records, 'ddd', f)
In [210]: rec2 = np.fromfile('data.f', dtype='<d')
In [211]: rec2
Out[211]: array([ 1. , 2.3, 4.5, 6. , 7.8, 9. , 12. , 13.4, 56.7])
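The reshape step mentioned above would then be (three fields per record, matching the 'ddd' format):
rec2.reshape(-1, 3)   # back to a (3, 3) regular float array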
But to take advantage of any record structure in the binary file, you have to load by records as well, which means a structured array:
In [213]: rec3 = np.fromfile('data.f', dtype='d,d,d')
In [214]: rec3
Out[214]:
array([( 1., 2.3, 4.5), ( 6., 7.8, 9. ), ( 12., 13.4, 56.7)],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8')])
I have an exported pandas dataframe that is now a numpy.array object.
subset = array[:4,:]
array([[ 2. , 12. , 33.33333333, 2. ,
33.33333333, 12. ],
[ 2. , 2. , 33.33333333, 2. ,
33.33333333, 2. ],
[ 2.8 , 8. , 45.83333333, 2.75 ,
46.66666667, 13. ],
[ 3.11320755, 75. , 56. , 3.24 ,
52.83018868, 33. ]])
print subset.dtype
dtype('float64')
I want to convert the column values to specific types and set column names as well; this means I need to convert it to an ndarray.
Here are my dtypes:
[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
('NULL_COUNT_B', '<f8')]
When I go to convert the array, I get:
ValueError: new type not compatible with array.
How do you cast each column to a specific type so I can convert the array to an ndarray?
Thanks
You already have an ndarray. What you are seeking is a structured array, one with this compound dtype. First see if pandas can do it for you. If that fails we might be able to do something with tolist and a list comprehension.
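A minimal pandas-side sketch, assuming the original DataFrame is still available under a name such as df (not shown in the question): to_records builds a record array with one field per column, keeping each column's dtype, which may already be what you need:
rec = df.to_records(index=False)   # numpy record array, one field per column
rec.dtype                          # per-column dtypes, field names taken from the columns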
In [84]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...:     ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
    ...:     ('NULL_COUNT_B', '<f8')]
In [85]: subset=np.array([[ 2.        , 12.        , 33.33333333,  2.        ,
    ...:                    33.33333333, 12.        ],
    ...:                  [ 2.        ,  2.        , 33.33333333,  2.        ,
    ...:                    33.33333333,  2.        ],
    ...:                  [ 2.8       ,  8.        , 45.83333333,  2.75      ,
    ...:                    46.66666667, 13.        ],
    ...:                  [ 3.11320755, 75.        , 56.        ,  3.24      ,
    ...:                    52.83018868, 33.        ]])
In [86]: subset
Out[86]:
array([[ 2. , 12. , 33.33333333, 2. ,
33.33333333, 12. ],
[ 2. , 2. , 33.33333333, 2. ,
33.33333333, 2. ],
[ 2.8 , 8. , 45.83333333, 2.75 ,
46.66666667, 13. ],
[ 3.11320755, 75. , 56. , 3.24 ,
52.83018868, 33. ]])
Now make an array with dt. Input for a structured array has to be a list of tuples, so I'm using tolist and a list comprehension:
In [87]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
....
ValueError: field 'NULL_COUNT_B' occurs more than once
In [88]: subset.shape
Out[88]: (4, 6)
In [89]: dt
Out[89]:
[('PERCENT_A_NEW', '<f8'),
('JoinField', '<i4'),
('NULL_COUNT_B', '<f8'),
('PERCENT_COMP_B', '<f8'),
('RANKING_A', '<f8'),
('RANKING_B', '<f8'),
('NULL_COUNT_B', '<f8')]
In [90]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...:     ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')]
In [91]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
Out[91]:
array([(2.0, 12, 33.33333333, 2.0, 33.33333333, 12.0),
(2.0, 2, 33.33333333, 2.0, 33.33333333, 2.0),
(2.8, 8, 45.83333333, 2.75, 46.66666667, 13.0),
(3.11320755, 75, 56.0, 3.24, 52.83018868, 33.0)],
dtype=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')])