Unable to subtract specific fields within structured numpy arrays - python

While trying to subtract two fields within a structured numpy array, the following error occurs:
In [8]: print serPos['pos'] - hisPos['pos']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-8a22559cfb2d> in <module>()
----> 1 print serPos['pos'] - hisPos['pos']
TypeError: ufunc 'subtract' did not contain a loop with signature matching types
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
Given the standard float dtype, why would I be unable to perform this subtraction?
To reproduce these conditions, the following example code is provided:
import numpy as np
raw = np.dtype([('residue', int),
                ('pos', [('x', float),
                         ('y', float),
                         ('z', float)])])
serPos = np.empty([0,2],dtype=raw)
hisPos = np.empty([0,2],dtype=raw)
serPos = np.append(serPos, np.array([(1,(1,2,3))], dtype=raw))
hisPos = np.append(hisPos, np.array([(1,(1,2,3))], dtype=raw))
print(serPos['pos'], hisPos['pos'])  # prints fine
print(serPos['pos'] - hisPos['pos'])  # errors with ufunc error
Any suggestions would be greatly appreciated!

The dtype for serPos['pos'] is compound:
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
Subtraction (and other such operations) has not been defined for compound dtypes. It doesn't work for the raw dtype either.
You could subtract the individual fields:
serPos['pos']['x']-hisPos['pos']['x']
You can also view serPos['pos'] as a 2d array (3 columns) and subtract in that form. This works because each 'pos' record is three adjacent float64 values (24 bytes), the same size as a (3,) float subarray:
serPos['pos'].view((float,(3,)))
should produce an (N,3) 2d array.
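Here is a minimal runnable sketch of the three approaches. It uses two made-up records per array; the values are illustrative and not from the question. structured_to_unstructured is a real numpy.lib.recfunctions function, but it assumes NumPy 1.16 or newer. The .view route assumes the 'pos' fields are three adjacent float64 values with no padding.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same nested dtype as in the question: an int field plus a 3-field 'pos' record
raw = np.dtype([('residue', int),
                ('pos', [('x', float), ('y', float), ('z', float)])])

# Two small made-up records per array (illustrative data only)
serPos = np.array([(1, (1.0, 2.0, 3.0)), (2, (4.0, 5.0, 6.0))], dtype=raw)
hisPos = np.array([(1, (0.5, 0.5, 0.5)), (2, (1.0, 1.0, 1.0))], dtype=raw)

# Option 1: subtract one field at a time; each leaf field is a plain float array
dx = serPos['pos']['x'] - hisPos['pos']['x']

# Option 2: reinterpret each 24-byte 'pos' record as 3 floats -> shape (N, 3)
diff_view = serPos['pos'].view((float, (3,))) - hisPos['pos'].view((float, (3,)))

# Option 3 (assumes NumPy >= 1.16): copy the fields into an unstructured (N, 3) array
diff_copy = (rfn.structured_to_unstructured(serPos['pos'])
             - rfn.structured_to_unstructured(hisPos['pos']))

print(dx)  # [0.5 3. ]
print(diff_view)
```

The view avoids a copy, while structured_to_unstructured is more explicit and also copes with fields of mixed numeric types.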

Related

Adding a field to a structured numpy array

This has been addressed before (here, here and here). I want to add a new field to a structured array returned by numpy genfromtxt (also asked here).
My new problem is that the csv file I'm reading has only a header line and a single data row:
output-Summary.csv:
Wedge, DWD, Yield (wedge), Efficiency
1, 16.097825, 44283299.473156, 2750887.118836
I'm reading it via genfromtxt and calculate a new value 'tl':
test_out = np.genfromtxt('output-Summary.csv', delimiter=',', names=True)
tl = 300 / test_out['DWD']
test_out looks like this:
array((1., 16.097825, 44283299.473156, 2750887.118836),
dtype=[('Wedge', '<f8'), ('DWD', '<f8'), ('Yield_wedge', '<f8'), ('Efficiency', '<f8')])
Using recfunctions.append_fields (as suggested in the examples above) fails because it calls len() on the size 1 array:
from numpy.lib import recfunctions as rfn
rfn.append_fields(test_out,'tl',tl)
TypeError: len() of unsized object
Searching for alternatives (one of the answers here), I find that mlab.rec_append_fields works well (but it is deprecated):
mlab.rec_append_fields(test_out,'tl',tl)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: MatplotlibDeprecationWarning: The rec_append_fields function was deprecated in version 2.2.
"""Entry point for launching an IPython kernel.
rec.array((1., 16.097825, 44283299.473156, 2750887.118836, 18.63605798),
dtype=[('Wedge', '<f8'), ('DWD', '<f8'), ('Yield_wedge', '<f8'), ('Efficiency', '<f8'), ('tl', '<f8')])
I can also copy the array over to a new structured array "by hand" as suggested here. This works:
new_dt = np.dtype(test_out.dtype.descr + [('tl', '<f8')])  # old fields plus 'tl'
test_out_new = np.zeros(test_out.shape, dtype=new_dt)
for name in test_out.dtype.names:
    test_out_new[name] = test_out[name]
test_out_new['tl'] = tl
So, in summary: is there a way to get recfunctions.append_fields to work with the genfromtxt output from my single-row csv file?
I would really rather use a standard way to handle this than a home brew.
Reshape the array (and the new field) to shape (1,). With just one data line, genfromtxt loads the data as a 0d array, shape (). The rfn code isn't heavily used, and isn't as robust as it should be. In other words, the 'standard way' is still a bit buggy.
For example:
In [201]: arr=np.array((1,2,3), dtype='i,i,i')
In [202]: arr.reshape(1)
Out[202]: array([(1, 2, 3)], dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
In [203]: rfn.append_fields(arr.reshape(1), 't1',[1], usemask=False)
Out[203]:
array([(1, 2, 3, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('t1', '<i8')])
There's nothing wrong with the home brew. Most of the rfn functions use that mechanism: define a new dtype, create a recipient array with that dtype, and copy the fields over, name by name.
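Applied to the question's data, the reshape fix might look like the sketch below. It assumes the CSV contents can be fed through io.StringIO in place of output-Summary.csv. The name new_0d is only for illustration.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in for output-Summary.csv from the question (header + one data row)
csv_text = """Wedge, DWD, Yield (wedge), Efficiency
1, 16.097825, 44283299.473156, 2750887.118836
"""
test_out = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)
tl = 300 / test_out['DWD']  # 0d, like test_out itself

# append_fields calls len(), so give both inputs shape (1,) first
new = rfn.append_fields(test_out.reshape(1), 'tl', np.reshape(tl, 1),
                        usemask=False)

# Optionally drop back to the 0d shape genfromtxt produced
new_0d = new.reshape(())
```

usemask=False returns a plain ndarray rather than a masked array, which matches the output of the home-brew version.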

Apply function to single column of structured numpy array in Python

I have a structured numpy array with two columns. One column contains a series of date times as strings, and the other contains measured values corresponding to that date.
data = array([('date1', 2.3), ('date3', 2.4), ...]
dtype=[('date', '<U16'), ('val', '<f8')])
I also have a number of functions similar to the following:
def example_func(x):
    return 5*x + 1
I am trying to apply example_func to the second column of my array and generate the result
array([('date1', 12.5), ('date3', 13.0), ...]
dtype=[('date', '<U16'), ('val', '<f8')])
Everything I try, however, either raises a future warning from numpy or requires a for loop. Any ideas on how I can do this efficiently?
This works for me:
In [7]: example_func(data['val'])
Out[7]: array([ 12.5, 13. ])
In [8]: data['val'] = example_func(data['val'])
In [9]: data
Out[9]:
array([('date1', 12.5), ('date3', 13. )],
dtype=[('date', '<U16'), ('val', '<f8')])
In [10]: np.__version__
Out[10]: '1.12.0'
I have gotten FutureWarnings when accessing several fields (with a list of names) and then attempting some sort of modification. The warning suggests making a copy. But I can't generate such a warning with a single field access like this.
In [15]: data[['val', 'date']]
Out[15]:
array([( 12.5, 'date1'), ( 13. , 'date3')],
dtype=[('val', '<f8'), ('date', '<U16')])
In [16]: data[['val', 'date']][0] = (12, 'date2')
/usr/local/bin/ipython3:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned
by numpy.diagonal or by selecting multiple fields in a structured
array. This code will likely break in a future numpy release --
see numpy.diagonal or arrays.indexing reference docs for details.
The quick fix is to make an explicit copy (e.g., do
arr.diagonal().copy() or arr[['f0','f1']].copy()).
The developers aren't happy with how several fields are accessed at once. It's ok to read them, but writing to them is under evaluation. And in 1.13 there's some change about copying fields by position rather than by name.
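Below is a short sketch of the single-field pattern, followed by the explicit-copy route for several fields. It uses two sample rows with assumed values, and original, result and pair are illustrative names. Whether multi-field indexing returns a copy or a view depends on the NumPy version, so the .copy() keeps the behavior the same either way.

```python
import numpy as np

def example_func(x):
    return 5 * x + 1

original = np.array([('date1', 2.3), ('date3', 2.4)],
                    dtype=[('date', '<U16'), ('val', '<f8')])

# Work on a copy so `original` stays unchanged for comparison
result = original.copy()
# result['val'] is a view of one field: example_func runs vectorized on the
# whole column, and assigning back writes into `result` in place
result['val'] = example_func(result['val'])

# Several fields at once: take an explicit copy before modifying it
pair = original[['val', 'date']].copy()
pair[0] = (12.0, 'date2')
```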

Numpy: preserving dtype after column_stack()

Python 3, NumPy 1.10
Let's say I have something like
some_array = ['a', 'b', 'c']
bool_array = numpy.array([False for x in range(len(some_array))], dtype='bool_')
Then bool_array will be [False, False, False], with bool dtype.
And when I do
another_array = numpy.column_stack((some_array, bool_array)), both columns become str, which I don't want.
What I want is to preserve the bool type in the second column. I don't care about the type of the first column.
Do I need to create another array? It seems like the solution is to pass a dtype as with structured arrays, but I'd rather not copy the whole array generated by column_stack().
I don't think you can have multiple data types in one ordinary (non-structured) numpy array. You might want to try a pandas DataFrame. For example, if you have
some_array=['a','b','c']
bool_array=[False, False, False]
you can define a DataFrame:
import pandas as pd
df=pd.DataFrame(bool_array, index=some_array, columns=[1])
This will look like
1
a False
b False
c False
The bool_array will keep its type, but you will have to use DataFrame operations to access and manipulate it.
There are 2 (main) ways of adding values to a structured array - with a list of tuples or by copying values to each field.
e.g. by field:
In [368]: arr=np.zeros((3,),dtype='S10,bool')
In [369]: arr
Out[369]:
array([(b'', False), (b'', False), (b'', False)],
dtype=[('f0', 'S10'), ('f1', '?')])
In [371]: arr['f0']=['one','two','three']
In [372]: arr['f1']=[True,False,True]
In [373]: arr
Out[373]:
array([(b'one', True), (b'two', False), (b'three', True)],
dtype=[('f0', 'S10'), ('f1', '?')])
If you have a list of sublists, you'll need to convert the sublists to tuples:
In [378]: alist=[['one',True],['two',False],['three',True]]
In [379]: np.array([tuple(i) for i in alist], dtype=arr.dtype)
Out[379]:
array([(b'one', True), (b'two', False), (b'three', True)],
dtype=[('f0', 'S10'), ('f1', '?')])
The column_stack won't help as an intermediate array, since it has lost the original column dtypes.
You could also explore the functions in numpy.lib.recfunctions (from numpy.lib import recfunctions). That module has functions to merge arrays. But the ones I have looked at use the field copy method, so they won't save any time.
In [381]: recfunctions.merge_arrays([['one','two','three'],[False,True,False]])
Out[381]:
array([('one', False), ('two', True), ('three', False)],
dtype=[('f0', '<U5'), ('f1', '?')])
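The sketch below skips column_stack entirely and builds the structured array directly from the two inputs, so the bool column never passes through a string array. The field names name and flag, and the 'U1' width, are assumptions for illustration.

```python
import numpy as np

some_array = ['a', 'b', 'c']
bool_array = np.zeros(len(some_array), dtype=bool)  # [False, False, False]

# Hypothetical field names; each field keeps its own dtype
dt = np.dtype([('name', 'U1'), ('flag', bool)])

# Route 1: allocate, then fill field by field (no column_stack intermediate)
combined = np.empty(len(some_array), dtype=dt)
combined['name'] = some_array
combined['flag'] = bool_array

# Route 2: build from a list of tuples (zip pairs up the two columns)
from_tuples = np.array(list(zip(some_array, bool_array)), dtype=dt)
```

Both routes copy the data once into the final array. That is the same cost as column_stack, but without losing the bool dtype.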

When dtype is specified with genfromtxt a 2D array becomes 1D - how to prevent this?

As seen here:
http://library.isr.ist.utl.pt/docs/numpy/user/basics.io.genfromtxt.html#choosing-the-data-type
"In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as items in the sequence. The field names are defined with the names keyword."
The problem is: how do I get around this? I want to use genfromtxt with a data file whose columns are, e.g., int, string, int.
If I do:
dtype=(int, "|S5", int)
Then the entire shape changes from (x, y) to merely (x, ) and I get 'too many indices' errors when I try to use masks.
When I use dtype=None I get to keep the 2D structure, but it often makes mistakes if a column looks like it could be a number in the 1st row (this often occurs in my data set).
What is the best way to get around this?
You cannot have a 2D array here: each row would have to be a 1D array with mixed dtypes, which is not possible.
Having an array of records shouldn't be a problem:
In [1]: import numpy as np
In [2]: !cat test.txt
42 foo 41
40 bar 39
In [3]: data = np.genfromtxt('test.txt',
..: dtype=np.dtype([('f1', int), ('f2', np.str_, 5), ('f3', int)]))
In [4]: data
Out[4]:
array([(42, 'foo', 41), (40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])
In [5]: data['f3']
Out[5]: array([41, 39])
In [6]: data['f3'][1]
Out[6]: 39
If you need a masked array, look here: How can I mask elements of a record array in Numpy?
To mask by 1st column value:
In [7]: data['f1'] == 40
Out[7]: array([False, True], dtype=bool)
In [8]: data[data['f1'] == 40]
Out[8]:
array([(40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])
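The same idea works as a self-contained sketch, with the file contents held in an io.StringIO object instead of test.txt. It assumes a NumPy recent enough to accept genfromtxt's encoding argument (1.14+). The last line shows one way to get a plain 2D integer array when only the numeric columns are needed.

```python
import io
import numpy as np

# Same two-line file as test.txt above, held in memory
text = "42 foo 41\n40 bar 39\n"
dt = np.dtype([('f1', int), ('f2', 'U5'), ('f3', int)])
data = np.genfromtxt(io.StringIO(text), dtype=dt, encoding=None)

# A boolean mask built from one field selects whole records (rows)
rows = data[data['f1'] == 40]

# Conditions on several fields combine with & and |
rows2 = data[(data['f1'] > 40) & (data['f3'] == 41)]

# If the numeric columns are needed as a plain 2D array, stack those fields
nums = np.column_stack((data['f1'], data['f3']))
```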

List of tuples to Numpy recarray

Given a list of tuples, where each tuple represents a row in a table, e.g.
tab = [('a',1),('b',2)]
Is there an easy way to convert this to a record array? I tried
np.recarray(tab,dtype=[('name',str),('value',int)])
which doesn't seem to work.
Try:
np.rec.fromrecords(tab)
rec.array([('a', 1), ('b', 2)],
dtype=[('f0', '|S1'), ('f1', '<i4')])
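np.recarray's first argument is a shape, not data, which is why the attempt in the question does not build the table. The sketch below shows two working routes under Python 3. The field names are assumed. Under Python 3 the string field comes out as a unicode ('<U1') field rather than the '|S1' shown above.

```python
import numpy as np

tab = [('a', 1), ('b', 2)]

# np.rec.fromrecords accepts the list of tuples directly; names= labels the fields
r = np.rec.fromrecords(tab, names='name,value')

# Alternative: build a plain structured array with an explicit dtype, then view
# it as a recarray to get attribute access (r2.name, r2.value)
r2 = np.array(tab, dtype=[('name', 'U1'), ('value', int)]).view(np.recarray)
```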
