Python h5py - 'Shape tuple is incompatible with data' error

Python h5py - 'Shape tuple is incompatible with data' error - python

I am trying to write a program which puts data into a .h5 file. There should be 3 columns, one with the number of the variable (from counter in the for loop. one with the name of the variable (shown in the 2nd column of list_of_vars), and one with its unit (3rd column of list_of_vars).
Code is below:
import numpy as np
import h5py as h5
list_of_vars = [
('ADC_alt', 'ADC_alt', 'ft'),
('ADC_temp', 'ADC_temp', 'degC'),
('ADC_ias', 'ADC_ias', 'kts'),
('ADC_tas', 'ADC_tas', 'kts'),
('ADC_aos', 'ADC_aos', 'deg'),
('ADC_aoa', 'ADC_aoa', 'deg'),
]
#write new h5 file
var = h5.File('telemetry.h5','w')
for counter, val in enumerate(list_of_vars):
varnum = var.create_dataset('n°', (6,), data = counter)
varname = var.create_dataset('Variable name', (6,), dtype = 'str_', data = val[1])
varunit = var.create_dataset('Unit', (6,), dtype = 'str_', data = val[2])
data = np.array(varname,varunit)
print(data)
However, when I run it, I get the error ValueError: Shape tuple is incompatible with data
What is wrong here?

Lots of little problems to correct. If I understand, you want to create ONE heterogeneous dataset (with 1 field(column) of ints named 'n°', and 2 fields(columns) of strings named 'Variable name' and 'Unit'). What your code is trying to create is 18 separate datasets (3 created with each loop thru enumerate(list_of_vars).
There is a trick when working with heterogeneous datasets: If you add row wise, you have to reference the dataset row AND column indices, OR add the entire row. I prefer to load data field/column wise. Generally you have fewer fields than rows -- fewer loops == fewer write cycles == faster.
Here is the process you want. It creates the dataset, then adds the data for each field from the count, then list_of_vars[1], then from list_of_vars[2]. At the end it reads and prints the data from the dataset. Code below:
#write new h5 file
with h5.File('telemetry.h5','w') as var:
dt = np.dtype( [('n°', int), ('Variable name','S10'), ('Unit', 'S10')] )
dset = var.create_dataset('data', dtype=dt, shape=(len(list_of_vars),))
dset['n°'] = np.arange(len(list_of_vars))
dset['Variable name'] = [val[1] for val in list_of_vars]
dset['Unit'] = [val[2] for val in list_of_vars]
data = dset[:]
print(data)
If you prefer to use the enumerate loop, use this method. It loads items by row index. For completeness, it also shows how to index the dataset by [row,field name], but do not recommend it.
#write new h5 file
with h5.File('telemetry.h5','w') as var:
dt = np.dtype( [('n°', int), ('Variable name','S10'), ('Unit', 'S10')] )
dset = var.create_dataset('data',dtype=dt,shape=(len(list_of_vars),))
for counter, val in enumerate(list_of_vars):
dset[counter] = (counter, val[1], val[2])
# alternate row/field indexing method:
# dset[counter,'n°'] = counter
# dset[counter,'Variable name'] = val[1]
# dset[counter,'Unit'] = val[2]
data = dset[:]
print(data)

When asking about a problem, show the whole error.
When I run your code I get:
1129:~/mypy$ python3 stack68181330.py
Traceback (most recent call last):
File "stack68181330.py", line 18, in <module>
varnum = var.create_dataset('n°', (6,), data = counter)
File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/group.py", line 149, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/dataset.py", line 61, in make_new_dset
raise ValueError("Shape tuple is incompatible with data")
ValueError: Shape tuple is incompatible with data
It's having problems with your first dataset creation:
var.create_dataset('n°', (6,), data = counter)
What are you saying here? Make a dataset with name 'n*', and shape (6,) - 6 elements. But what is counter? It's the current enumerate value, one integer. Do you see how the shape (6,) doesn't match the data?
The other dataset lines potentially have similar problems.
It doesn't get on to another problem. You are doing the create_dataset repeatedly in the loop. Once you have a dataset named 'n*', you can't create another with the same name.
I suspect you want to make one dataset, with 6 slots, and repeatedly assign counter values to it. Not to repeatedly create a dataset with the same name.
Let's change the dataset creation and write to something that works:
varnum = var.create_dataset('n°', (6,), dtype=int)
varname = var.create_dataset('Variable name', (6,), dtype = 'S10')
varunit = var.create_dataset('Unit', (6,), dtype = 'S10')
for counter, val in enumerate(list_of_vars):
varnum[counter] = counter
varname[counter] = val[1]
varunit[counter] = val[2]
var.flush()
print(varnum, varnum[:])
print(varname)
print(varname[:])
print(varunit)
print(varunit[:])
and run:
1144:~/mypy$ python3 stack68181330.py
<HDF5 dataset "n°": shape (6,), type "<i8"> [0 1 2 3 4 5]
<HDF5 dataset "Variable name": shape (6,), type "|S10">
[b'ADC_alt' b'ADC_temp' b'ADC_ias' b'ADC_tas' b'ADC_aos' b'ADC_aoa']
<HDF5 dataset "Unit": shape (6,), type "|S10">
[b'ft' b'degC' b'kts' b'kts' b'deg' b'deg]
The b'ft' string display is bytestrings, the result of using S10. I think there are ways of specifying unicode, but I haven't looked at the h5py docs in a while.
There are simpler ways of writing this data, but I chose to keep it close to your attempt, to better illustrate the basics of both Python iteration, and h5py use.
I could write the data directly to the datasets, without iteration with:
Make array from the list:
arr = np.array(list_of_vars, dtype='S')
print(arr)
varnum = var.create_dataset('n°', np.arange(arr.shape[0]))
varname = var.create_dataset('Variable name', data=arr[:,1])
varunit = var.create_dataset('Unit', data=arr[:,2])
I let it deduce shape and dtype from the data.

Related

Is there a way to extend a PyTables EArray in the second dimension?

I have a 2D array that can grow to larger sizes than I'm able to fit on memory, so I'm trying to store it in a h5 file using Pytables. The number of rows is known beforehand but the length of each row is not known and is variable between rows. After some research, I thought something along these lines would work, where I can set the extendable dimension as the second dimension.
filename = os.path.join(tempfile.mkdtemp(), 'example.h5')
h5_file = open_file(filename, mode="w", title="Example Extendable Array")
h5_group = h5_file.create_group("/", "example_on_dim_2")
e_array = h5_file.create_earray(h5_group, "example", Int32Atom(shape=()), (100, 0)) # Assume num of rows is 100
# Add some item to index 2
print(e_array[2]) # should print an empty array
e_array[2] = np.append(e_array[2], 5) # add the value 5 to row 2
print(e_array[2]) # should print [5], currently printing empty array
I'm not sure if it's possible to add elements in this way (I might have misunderstood the way earrays work), but any help would be greatly appreciated!

You're close...but have a small misunderstanding of some of the arguments and behavior. When you create the EArray with shape=(100,0), you don't have any data...just an object designated to have 100 rows that can add columns. Also, you need to use e_array.append() to add data, not np.append(). Also, if you are going to create a very large array, consider defining the expectedrows= parameter for improved performance as the EArray grows.
Take a look at this code.
import tables as tb
import numpy as np
filename = 'example.h5'
with tb.File(filename, mode="w", title="Example Extendable Array") as h5_file :
h5_group = h5_file.create_group("/", "example_on_dim_2")
# Assume num of rows is 100
#e_array = h5_file.create_earray(h5_group, "example", Int32Atom(shape=()), (100, 0))
e_array = h5_file.create_earray(h5_group, "example", atom=tb.IntAtom(), shape=(100, 0))
print (e_array.shape)
e_array.append(np.arange(100,dtype=int).reshape(100,1)) # append a column of values
print (e_array.shape)
print(e_array[2]) # prints [2]

Here is an example showing how to create a VLArray (Variable Length). It is similar to the EArray example above, and follows the example from the Pytables doc (link in comment above). However, although a VLArray supports variable length rows, it does not have a mechanism to add items to an existing row (AFAIK).
import tables as tb
import numpy as np
filename = 'example_vlarray.h5'
with tb.File(filename, mode="w", title="Example Variable Length Array") as h5_file :
h5_group = h5_file.create_group("/", "vl_example")
vlarray = h5_file.create_vlarray(h5_group, "example", tb.IntAtom(), "ragged array of ints",)
# Append some (variable length) rows:
vlarray.append(np.array([0]))
vlarray.append(np.array([1, 2]))
vlarray.append([3, 4, 5])
vlarray.append([6, 7, 8, 9])
# Now, read it through an iterator:
print('-->', vlarray.title)
for x in vlarray:
print('%s[%d]--> %s' % (vlarray.name, vlarray.nrow, x))

Creating a dataset from multiple hdf5 groups

creating a dataset from multiple hdf5 groups
Code for groups with
np.array(hdf.get('all my groups'))
I have then added code for creating a dataset from groups.
with h5py.File('/train.h5', 'w') as hdf:
hdf.create_dataset('train', data=one_T+two_T+three_T+four_T+five_T)
The error message being
ValueError: operands could not be broadcast together with shapes(534456,4) (534456,14)
The numbers in each group are the same other than the varying column lengths. 5 separate groups to one dataset.

This answer addresses the OP's request in comments to my first answer ("an example would be ds_1 all columns, ds_2 first two columns, ds_3 column 4 and 6, ds_4 all columns"). The process is very similar, but the input is "slightly more complicated" than the first answer. As a result I used a different approach to define dataset names and colums to be copied. Differences:
The first solution iterates over the dataset names from the "keys()" (copying each dataset completely, appending to a dataset in the new file). The size of the new dataset is calculated by summing sizes of all datasets.
The second solution uses 2 lists to define 1) dataset names (ds_list) and 2) associated columns to copy from each dataset (col_list is a of lists). The size of the new dataset is calculated by summing the number of columns in col_list. I used "fancy indexing" to extract the columns using col_list.
How you decide to do this depends on your data.
Note: for simplicity, I deleted the dtype and shape tests. You should include these to avoid errors with "real world" problems.
Code below:
# Data for file1
arr1 = np.random.random(120).reshape(20,6)
arr2 = np.random.random(120).reshape(20,6)
arr3 = np.random.random(120).reshape(20,6)
arr4 = np.random.random(120).reshape(20,6)
# Create file1 with 4 datasets
with h5py.File('file1.h5','w') as h5f :
h5f.create_dataset('ds_1',data=arr1)
h5f.create_dataset('ds_2',data=arr2)
h5f.create_dataset('ds_3',data=arr3)
h5f.create_dataset('ds_4',data=arr4)
# Open file1 for reading and file2 for writing
with h5py.File('file1.h5','r') as h5f1 , \
h5py.File('file2.h5','w') as h5f2 :
# Loop over datasets in file1 to get dtype and rows (should test compatibility)
for i, ds in enumerate(h5f1.keys()) :
if i == 0:
ds_0_dtype = h5f1[ds].dtype
n_rows = h5f1[ds].shape[0]
break
# Create new empty dataset with appropriate dtype and size
# Use maxshape parameter to make resizable in the future
ds_list = ['ds_1','ds_2','ds_3','ds_4']
col_list =[ [0,1,2,3,4,5], [0,1], [3,5], [0,1,2,3,4,5] ]
n_cols = sum( [ len(c) for c in col_list])
h5f2.create_dataset('combined', dtype=ds_0_dtype, shape=(n_rows,n_cols), maxshape=(n_rows,None))
# Loop over datasets in file1, read data into xfer_arr, and write to file2
first = 0
for ds, cols in zip(ds_list, col_list) :
xfer_arr = h5f1[ds][:,cols]
last = first + xfer_arr.shape[1]
h5f2['combined'][:, first:last] = xfer_arr[:]
first = last

Here you go; a simple example to copy values from 3 datasets in file1 to a single dataset in file2. I included some tests to verify compatible dtype and shape. The code to create file1 are included at the top. Comments in the code should explain the process. I have another post that shows multiple ways to copy data between 2 HDF5 files. See this post: How can I combine multiple .h5 file?
import h5py
import numpy as np
import sys
# Data for file1
arr1 = np.random.random(80).reshape(20,4)
arr2 = np.random.random(40).reshape(20,2)
arr3 = np.random.random(60).reshape(20,3)
#Create file1 with 3 datasets
with h5py.File('file1.h5','w') as h5f :
h5f.create_dataset('ds_1',data=arr1)
h5f.create_dataset('ds_2',data=arr2)
h5f.create_dataset('ds_3',data=arr3)
# Open file1 for reading and file2 for writing
with h5py.File('file1.h5','r') as h5f1 , \
h5py.File('file2.h5','w') as h5f2 :
# Loop over datasets in file1 and check data compatiblity
for i, ds in enumerate(h5f1.keys()) :
if i == 0:
ds_0 = ds
ds_0_dtype = h5f1[ds].dtype
n_rows = h5f1[ds].shape[0]
n_cols = h5f1[ds].shape[1]
else:
if h5f1[ds].dtype != ds_0_dtype :
print(f'Dset 0:{ds_0}: dtype:{ds_0_dtype}')
print(f'Dset {i}:{ds}: dtype:{h5f1[ds].dtype}')
sys.exit('Error: incompatible dataset dtypes')
if h5f1[ds].shape[0] != n_rows :
print(f'Dset 0:{ds_0}: shape[0]:{n_rows}')
print(f'Dset {i}:{ds}: shape[0]:{h5f1[ds].shape[0]}')
sys.exit('Error: incompatible dataset shape')
n_cols += h5f1[ds].shape[1]
prev_ds = ds
# Create new empty dataset with appropriate dtype and size
# Using maxshape paramater to make resizable in the future
h5f2.create_dataset('ds_123', dtype=ds_0_dtype, shape=(n_rows,n_cols), maxshape=(n_rows,None))
# Loop over datasets in file1, read data into xfer_arr, and write to file2
first = 0
for ds in h5f1.keys() :
xfer_arr = h5f1[ds][:]
last = first + xfer_arr.shape[1]
h5f2['ds_123'][:, first:last] = xfer_arr[:]
first = last

Store ndarray in a PyTable (and how to define the Col()-type)

TL;DR: I have a PyTable with a float32 Col and get an error when writing a numpy-float32-array into it. (How) can I store a numpy-array (float32) in the Column of a PyTables table?
I'm new to PyTables - following a recommendation of TFtables (a lib to use HDF5 in Tensorflow), I'm using it to store all my HDF5 data (currently distributed in batches in several files with each three datasets) within a table in a single HDF5 file. Datasets are
'data' : (n_elements, 1024, 1024, 4)#float32
'label' : (n_elements, 1024, 1024, 1)#uint8
'weights' : (n_elements, 1024, 1024, 1)#float32
where the n_elements are distributed over several files that I want to merge into one now (to allow unordered access).
So when I build my table, I figured each dataset represents a column. I built everything in a generic way that allows to do this for an arbitrary number of datasets:
# gets dtypes (and shapes) of the dsets (accessed by dset_keys = ['data', 'label', 'weights']
dtypes, shapes = _determine_shape(hdf5_files, dset_keys)
# to dynamically generate a table, I'm using a dict (not a class as in the PyTables tutorials)
# the dict is (conform with the doc): { 'col_name' : Col()-class-descendent }
table_description = {dset_keys[i]: tables.Col.from_dtype(dtypes[i]) for i in range(len(dset_keys))}
# create a file, a group-node and attach a table to it
h5file = tables.open_file(destination_file, mode="w", title="merged")
group = h5file.create_group("/", 'main', 'Node for data table')
table = h5file.create_table(group, 'data_table', table_description, "Collected data with %s" % (str(val_keys)))
The dtypes that I get for each dsets (read with h5py) are obviously the ones of the numpy arrays (ndarray) that reading the dset returns: float32 or uint8. So the Col()-types are Float32Col an UInt8Col. I naively assumed that I can now write a float32-array into this col, but filling in data with:
dummy_data = np.zeros([1024,1024,3], float32) # normally data read from other files
sample = table.row
sample['data'] = dummy_data
results in TypeError: invalid type (<class 'numpy.ndarray'>) for column ``data``. So now I feel stupid for assuming I'd be able to write an array in there, BUT there are no "ArrayCol()" types offered, neither are there any hints in the PyTables doc as to whether or how it is possible to write an array into a column. How do I do this?
There are "shape" arguments in the Col() class and it's descendents, so it should be possible, otherwise what are these for?!

I know it's a bit late, but I think the answer to your problem lies in the shape parameter for Float32Col.
Here's how it's used in the documentation:
from tables import *
from numpy import *
# Describe a particle record
class Particle(IsDescription):
name = StringCol(itemsize=16) # 16-character string
lati = Int32Col() # integer
longi = Int32Col() # integer
pressure = Float32Col(shape=(2,3)) # array of floats (single-precision)
temperature = Float64Col(shape=(2,3)) # array of doubles (double-precision)
# Open a file in "w"rite mode
fileh = open_file("tutorial2.h5", mode = "w")
# Get the HDF5 root group
root = fileh.root
# Create the groups:
for groupname in ("Particles", "Events"):
group = fileh.create_group(root, groupname)
# Now, create and fill the tables in Particles group
gparticles = root.Particles
# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
# Create a table
table = fileh.create_table("/Particles", tablename, Particle, "Particles: "+tablename)
# Get the record object associated with the table:
particle = table.row
# Fill the table with 257 particles
for i in xrange(257):
# First, assign the values to the Particle record
particle['name'] = 'Particle: %6d' % (i)
particle['lati'] = i
particle['longi'] = 10 - i
########### Detectable errors start here. Play with them!
particle['pressure'] = array(i*arange(2*3)).reshape((2,4)) # Incorrect
#particle['pressure'] = array(i*arange(2*3)).reshape((2,3)) # Correct
########### End of errors
particle['temperature'] = (i**2) # Broadcasting
# This injects the Record values
particle.append()
# Flush the table buffers
table.flush()
Here's the link to the part of the documentation I'm referring to
https://www.pytables.org/usersguide/tutorials.html

Edit: I just saw that the tables.Col.from_type(type, shape) allows using the precision of a type (float32 instead of float alone). The rest stays the same (takes a string and shape).
The factory function tables.Col.from_kind(kind, shape) can be used to construct a Col-Type that supports ndarrays. What "kind" is and how to use this isn't documented anywhere I found; however with trial and error I found that allowed "kind"s are strings of basic datatypes. I.e.: 'float', 'uint', ... without the precision (NOT 'float64')
Since I get numpy.dtypes from h5py reading a dataset (dset.dtype), these have to be cast to str and the precision needs to be removed.
In the end the relevant lines look like this:
# get key, dtype and shapes of elements per dataset from the datasource files
val_keys, dtypes, element_shapes = _get_dtypes(datasources, element_axis=element_axis)
# for storing arrays in columns apparently one has to use "kind"
# "kind" cannot be created with dtype but only a string representing
# the dtype w/o precision, e.g. 'float' or 'uint'
dtypes_kind = [''.join(i for i in str(dtype) if not i.isdigit()) for dtype in dtypes]
# create table description as dictionary
description = {val_keys[i]: tables.Col.from_kind(dtypes_kind[i], shape=element_shapes[i]) for i in range(len(val_keys))}
Then writing data into the table finally works as suggested:
sample = table.row
sample[key] = my_array
Since it all felt a bit "hacky" and isn't documented well, I am still wondering, whether this is not an intended use for PyTables and would leave this question open for abit to see if s.o. knows more about this...

h5py extend dataset without knowing row index

EDIT to show more accurately my situation
import h5py # necessary for storing data
import numpy as np
dat = np.random.random([3, 2])
# create 'test.hdf5' if not exist, otherwise open
with h5py.File('test.? Shdf5', 'a') as f:
# create group or get ref to existing group
group = f.require_group('test_group')
# create dataset or get ref to existing dataset
dataset = group.require_dataset('test_set', shape=(0, dat.shape[1]),
maxshape=(None, dat.shape[1]),
dtype=float, chunks=True)
dataset_shape = dataset.shape # get shape of current dataset
dataset_new_length = dataset_shape[0] + dat.shape[0] # new row length
dataset.resize((dataset_new_length, dat.shape[1])) # increase row length dataset
dataset[dataset_shape[0]:dataset_new_length] = dat # add new data to dataset
Problem
The first time
I run this script, it works without problems. The 2nd time however shape=(0, dat.shape[1]) does not match anymore, because it became shape=(3, dat.shape[1]). The whole point of require_dataset() is that you don't have to make a try: except: construction, however because of the required shape=(), it does not work with a maxshape=(None, dat.shape[1]).
Also, shape=(None, dat.shape[1]) is not allowed.
Question
Is there a solution using require_dataset() that does not use a try: except: like this (if you replace dataset = group.require_dataset() with this piece of code, the script runs without problems)?:
try:
dataset = group.create_dataset('test_set', shape=(0, dat.shape[1]),
maxshape=(None, dat.shape[1]),
dtype=float, chunks=True)
except:
dataset = group['test_set']
Better solution would be
A command where I don't need to know the size would be even better, as something like this:
dataset = dataset.extend(data)

Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names

I am running Python 2.6. I have the following example where I am trying to concatenate the date and time string columns from a csv file. Based on the dtype I set (None vs object), I am seeing some differences in behavior that I cannot explained, see Question 1 and 2 at the end of the post. The exception returned is not too descriptive, and the dtype documentation doesn't mention any specific behavior to expect when dtype is set to object.
Here is the snippet:
#! /usr/bin/python
import numpy as np
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())
# (Fail) case 1: dtype=None splicing a column fails
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr1 = np.genfromtxt(data, dtype=None, delimiter=',',skiprows=1)# skiprows=1 for the row with units
arr1.dtype.names = header # assign the header to names
# so we can do y=arr['Speed']
y1 = arr1['Speed']
# Q1 IndexError: invalid index
#a1 = arr1[:,0]
#print a1
# EDIT1:
print "arr1.shape "
print arr1.shape # (3,)
# Fails as expected TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'
# z1 = arr1['Date'] + arr1['Time']
# This can be workaround by specifying dtype=object, which leads to case 2
data.seek(0) # resets
# (Fail) case 2: dtype=object assign header fails
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr2 = np.genfromtxt(data, dtype=object, delimiter=',',skiprows=1) # skiprows=1 for the row with units
# Q2 ValueError: there are no fields define
#arr2.dtype.names = header # assign the header to names. so we can use it to do indexing
# ie y=arr['Speed']
# y2 = arr['Date'] + arr['Time'] # column headings were assigned previously by arr.dtype.names = header
data.seek(0) # resets
# (Good) case 3: dtype=object but don't assign headers
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr3 = np.genfromtxt(data, dtype=object, delimiter=',',skiprows=1) # skiprows=1 for the row with units
y3 = arr3[:,0] + arr3[:,1] # slice the columns
print y3
# case 4: dtype=None, all data are ints, array dimension 2-D
# simulate a csv file
from StringIO import StringIO
data2 = StringIO("""
Title
Date,Time,Speed
,,(m/s)
45,46,85
12,13,86
50,46,87
""".strip())
next(data2) # eat away the title line
header = [item.strip() for item in next(data2).split(',')] # get the headers
arr4 = np.genfromtxt(data2, dtype=None, delimiter=',',skiprows=1)# skiprows=1 for the row with units
#arr4.dtype.names = header # Value error
print "arr4.shape "
print arr4.shape # (3,3)
data2.seek(0) # resets
Question 1: At comment Q1, why can I not slice a column, when dtype=None?
This could be avoided by
a) arr1=np-genfromtxt... was initialized with dtype=object like case 3,
b) arr1.dtype.names=... wascommented out to avoid the Value error in case 2
Question 2: At comment Q2, why can I not set the dtype.names when dtype=object?
EDIT1:
Added a case 4 that shows when the dimension of the array would be 2-D if the values in the simulated csv files are all ints instead. One can slice the column, but assigning the dtype.names would still fail.
Update the term 'splice' to 'slice'.

Question 1
This is indexing, not 'splicing', and you can't index into the columns of data for exactly the same reason I explained to you before in my answer to Question 7 here. Look at arr1.shape - it is (3,), i.e. arr1 is 1D, not 2D. There are no columns for you to index into.
Now look at the shape of arr2 - you'll see that it's (3,3). Why is this? If you do specify dtype=desired_type, np.genfromtxt will treat every delimited part of your input string the same (i.e. as desired_type), and it will give you an ordinary, non-structured numpy array back.
I'm not quite sure what you wanted to do with this line:
z1 = arr1['Date'] + arr1['Time']
Did you mean to concatenate the date and time strings together like this: '2012-04-01 00:10'? You could do it like this:
z1 = [d + ' ' + t for d,t in zip(arr1['Date'],arr1['Time'])]
It depends what you want to do with the output (this will give you a list of strings, not a numpy array).
I should point out that, as of version 1.7, Numpy has core array types that support datetime functionality. This would allow you to do much more useful things like computing time deltas etc.
dts = np.array(z1,dtype=np.datetime64)
Edit:
If you want to plot timeseries data, you can use matplotlib.dates.strpdate2num to convert your strings to matplotlib datenums, then use plot_date():
from matplotlib import dates
from matplotlib import pyplot as pp
# convert date and time strings to matplotlib datenums
dtconv = dates.strpdate2num('%Y-%m-%d%H:%M')
datenums = [dtconv(d+t) for d,t in zip(arr1['Date'],arr1['Time'])]
# use plot_date to plot timeseries
pp.plot_date(datenums,arr1['Speed'],'-ob')
You should also take a look at Pandas, which has some nice tools for visualising timeseries data.
Question 2
You can't set the names of arr2 because it is not a structured array (see above).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python h5py - 'Shape tuple is incompatible with data' error - python

Related

Is there a way to extend a PyTables EArray in the second dimension?

Creating a dataset from multiple hdf5 groups

Store ndarray in a PyTable (and how to define the Col()-type)

h5py extend dataset without knowing row index

Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names

Categories

Resources