Adding a field to a structured numpy array (4) - python

This has been addressed before (here, here and here). I want to add a new field to a structure array returned by numpy genfromtxt (also asked here).
My new problem is that the csv file I'm reading has only a header line and a single data row:
output-Summary.csv:
Wedge, DWD, Yield (wedge), Efficiency
1, 16.097825, 44283299.473156, 2750887.118836
I'm reading it via genfromtxt and calculate a new value 'tl':
test_out = np.genfromtxt('output-Summary.csv', delimiter=',', names=True)
tl = 300 / test_out['DWD']
test_out looks like this:
array((1., 16.097825, 44283299.473156, 2750887.118836),
dtype=[('Wedge', '<f8'), ('DWD', '<f8'), ('Yield_wedge', '<f8'), ('Efficiency', '<f8')])
Using recfunctions.append_fields (as suggested in the examples 1-3 above) fails over the use of len() for the size 1 array:
from numpy.lib import recfunctions as rfn
rfn.append_fields(test_out,'tl',tl)
TypeError: len() of unsized object
Searching for alternatives (one of the answers here) I find that mlab.rec_append_fields works well (but is deprecated):
mlab.rec_append_fields(test_out,'tl',tl)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: MatplotlibDeprecationWarning: The rec_append_fields function was deprecated in version 2.2.
"""Entry point for launching an IPython kernel.
rec.array((1., 16.097825, 44283299.473156, 2750887.118836, 18.63605798),
dtype=[('Wedge', '<f8'), ('DWD', '<f8'), ('Yield_wedge', '<f8'), ('Efficiency', '<f8'), ('tl', '<f8')])
I can also copy the array over to a new structured array "by hand" as suggested here. This works:
test_out_new = np.zeros(test_out.shape, dtype=new_dt)
for name in test_out.dtype.names:
test_out_new[name]=test_out[name]
test_out_new['tl']=tl
So in summary - is there a way to get recfunctions.append_fields to work with the genfromtxt output from my single row csv file?
I would really rather use a standard way to handle this rather than a home brew..

Reshape the array (and new field) to size (1,). With just one line, the genfromtxt is loading the data as a 0d array, shape (). The rfn code isn't heavily used, and isn't a robust as it should be. In other words, the 'standard way' is still bit buggy.
For example:
In [201]: arr=np.array((1,2,3), dtype='i,i,i')
In [202]: arr.reshape(1)
Out[202]: array([(1, 2, 3)], dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
In [203]: rfn.append_fields(arr.reshape(1), 't1',[1], usemask=False)
Out[203]:
array([(1, 2, 3, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('t1', '<i8')])
Nothing wrong with the home_brew. Most of the rfn functions use that mechanism - define a new dtype, create a recipient array with that dtype, and copy the fields over, name by name.

Related

Converting a list of ints or floats read from a file to a numpy array

I have bunch of int values that I have to read from a file and store it in a numpy array. This is how I am doing it:
el_connect = np.zeros([NEls,3],dtype=int)
for i in range(0,NEls):
connct = file.readline().strip().split()
for j in range(0,len(connct)):
el_connect[i,j] = int(connct[j])
This is how I am currently doing it. Is there a better way to do it where I can eliminate the second for loop?
Other questions I have regarding this are:
How can I deal with the scenario where certain columns are ints and other columns are floats because numpy arrays cannot handle multiple data types?
Also, how can I throw exception if the format of the file is not as I expected? Just a few examples would do.
You could get rid of both loops by using np.genfromtxt() assuming space separated values (which I'm inferring from the above code).
In []:
data = '''1 2 2.2 3
3 4 4.1 2'''
np.genfromtxt(StringIO(data)) # Replace `StringIO(data)` with filename
Out[]:
array([[1. , 2. , 2.2, 3. ],
[3. , 4. , 4.1, 2. ]])
np.genfromtxt() infers np.float64 for the array if you have mixed set of ints and floats but if you want to explicitly describe the types you can with:
np.genfromtxt(StringIO(data), [np.int32, np.int32, np.float, np.int32])
Out[]:
array([(1, 2, 2.2, 3), (3, 4, 4.1, 2)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<i4')])

Apply function to single column of structured numpy array in Python

I have a structured numpy array with two columns. One column contains a series of date times as strings, and the other contains measured values corresponding to that date.
data = array([('date1', 2.3), ('date3', 2.4), ...]
dtype=[('date', '<U16'), ('val', '<f8')])
I also have a number of functions similar to the following:
def example_func(x):
return 5*x + 1
I am trying to apply example_func to the second column of my array and generate the result
array([('date1', 12.5), ('date3', 11.6), ...]
dtype=[('date', '<U16'), ('val', '<f8')])
Everything I try, however, either raises a future warning from numpy or requires a for loop. Any ideas on how I can do this efficiently?
This works for me:
In [7]: example_func(data['val'])
Out[7]: array([ 12.5, 13. ])
In [8]: data['val'] = example_func(data['val'])
In [9]: data
Out[9]:
array([('date1', 12.5), ('date3', 13. )],
dtype=[('date', '<U16'), ('val', '<f8')])
In [10]: np.__version__
Out[10]: '1.12.0'
I have gotten future warnings when accessing several fields (with a list of names), and then attempting some sort of modification. It suggests making a copy etc. But I can't generate such a warning with a single field access like this.
In [15]: data[['val', 'date']]
Out[15]:
array([( 12.5, 'date1'), ( 13. , 'date3')],
dtype=[('val', '<f8'), ('date', '<U16')])
In [16]: data[['val', 'date']][0] = (12, 'date2')
/usr/local/bin/ipython3:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned
by numpy.diagonal or by selecting multiple fields in a structured
array. This code will likely break in a future numpy release --
see numpy.diagonal or arrays.indexing reference docs for details.
The quick fix is to make an explicit copy (e.g., do
arr.diagonal().copy() or arr[['f0','f1']].copy()).
Developers aren't happy with how they access several fields at once. It's ok to read them, but changing is under evaluation. And in '1.13' there's some change about copying fields by position rather than name.

Unable to subtract specific fields within structured numpy arrays

While trying to subtract to fields within a structured numpy array, the following error occurs:
In [8]: print serPos['pos'] - hisPos['pos']
---------------------------------------------------------------------------
TypeError
Traceback (most recent call last) <ipython-input-8-8a22559cfb2d> in <module>()
----> 1 print serPos['pos'] - hisPos['pos']
TypeError: ufunc 'subtract' did not contain a loop with signature matching types
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
Given the standard float dtype, why would I be unable to perform this subtraction?
To reproduce these conditions, the following example code is provided:
import numpy as np
raw = np.dtype([('residue', int),
('pos', [('x', float),
('y', float),
('z', float)])])
serPos = np.empty([0,2],dtype=raw)
hisPos = np.empty([0,2],dtype=raw)
serPos = np.append(serPos, np.array([(1,(1,2,3))], dtype=raw))
hisPos = np.append(hisPos, np.array([(1,(1,2,3))], dtype=raw))
print serPos['pos'], hisPos['pos'] # prints fine
print serPos['pos'] - hisPos['pos'] # errors with ufunc error
Any suggestions would be greatly appreciated!
The dtype for serPos['pos'] is compound
dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
subtraction (and other such operations) has not been defined for compound dtype. It doesn't work for the raw dtype either.
You could subtract the individual fields
serPos['pos']['x']-hisPos['pos']['x']
I think we can also view serPos['pos'] as a 2d array (3 columns) and subtract that form. But I need to test the syntax.
serPos['pos'].view((float,(3,)))
should produce a (N,3) 2d array.

When dtype is specified with genfromtxt a 2D array becomes 1D - how to prevent this?

As seen here:
http://library.isr.ist.utl.pt/docs/numpy/user/basics.io.genfromtxt.html#choosing-the-data-type
"In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as items in the sequence. The field names are defined with the names keyword."
The problem is how do I get around this? I want to use genfromtxt with a data file with columns that are, e.g. int, string, int.
If I do:
dtype=(int, "|S5|", int)
Then the entire shape changes from (x, y) to merely (x, ) and I get 'too many indices' errors when I try to use masks.
When I use dtype=None I get to keep the 2D structure, but it often makes mistakes if the 1st row the column looks like it could be a number (this often occurs in my data set).
How am I best to get around this?
You cannot have a 2D array, it would mean having 1D arrays with mixed dtype for each row, which is not possible.
Having an array of records shouldn't be a problem:
In [1]: import numpy as np
In [2]: !cat test.txt
42 foo 41
40 bar 39
In [3]: data = np.genfromtxt('test.txt',
..: dtype=np.dtype([('f1', int), ('f2', np.str_, 5), ('f3', int)]))
In [4]: data
Out[4]:
array([(42, 'foo', 41), (40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])
In [5]: data['f3']
Out[5]: array([41, 39])
In [6]: data['f3'][1]
Out[6]: 39
If you need a masked array, look here: How can I mask elements of a record array in Numpy?
To mask by 1st column value:
In [7]: data['f1'] == 40
Out[7]: array([False, True], dtype=bool)
In [8]: data[data['f1'] == 40]
Out[8]:
array([(40, 'bar', 39)],
dtype=[('f1', '<i8'), ('f2', '<U5'), ('f3', '<i8')])

List of tuples to Numpy recarray

Given a list of tuples, where each tuple represents a row in a table, e.g.
tab = [('a',1),('b',2)]
Is there an easy way to convert this to a record array? I tried
np.recarray(tab,dtype=[('name',str),('value',int)])
which doesn't seem to work.
try
np.rec.fromrecords(tab)
rec.array([('a', 1), ('b', 2)],
dtype=[('f0', '|S1'), ('f1', '<i4')])

Categories