Hello, I was given an HDF5 (.h5) file with an associated XDMF file. The visualization looks like this. Here the color is just the feature ID. I wanted to add some data to the h5 file to be able to visualize it in ParaView. The newly added data does not appear in ParaView, although it is clearly there when viewed in HDFView. The data I'm trying to add are the datasets titled engineering stress and true stress. The only difference I noticed is that the number of attributes for these is zero, while it's 5 for the rest, but I don't know what to do with that information.
Here's the code I currently have set up:
nf_product = h5py.File(filename, "a")
e_princ = np.empty((280, 150, 280, 3))
t_princ = e_princ
for i in tqdm(range(grain_count)):
    a = np.where(feature_ID == i+1)
    e_princ[a,0] = eng_stress[i,0]
    e_princ[a,1] = eng_stress[i,1]
    e_princ[a,2] = eng_stress[i,2]
    t_princ[a,0] = true_stress[i,0]
    t_princ[a,1] = true_stress[i,1]
    t_princ[a,2] = true_stress[i,2]
EngineeringStress = nf_product.create_dataset('DataContainers/nfHEDM/CellData/EngineeringStressPrinciple', data=np.float32(e_princ))
TrueStress = nf_product.create_dataset('DataContainers/nfHEDM/CellData/TrueStressPrinciple', data=np.float32(t_princ))
I am new to using HDF5 and XDMF files, so I may be going about this entirely wrong, but the way I understand it, an XDMF file acts as a pointer to the data in the HDF5 file, so I can't understand why the new data doesn't appear in ParaView.
First, did you close the file with nf_product.close()? If not, the new datasets may not have been flushed from memory. You may also need to flush the buffers with nf_product.flush(). Better yet, use a Python with/as file context manager and it is done automatically.
Next, you can simply use data=e_princ (and data=t_princ); there is no need to cast a NumPy array to a NumPy array.
Finally, verify the values in e_princ and t_princ. I think they will be the same, because t_princ = e_princ makes both names reference the same NumPy object. You need to create t_princ as its own empty array, the same way as e_princ. Also, the arrays have 4 dimensions, but you only use 2 indices when you populate them with [a,0]. Be sure that works as expected.
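Putting those three fixes together, a minimal sketch might look like this (assuming, as in your snippet, that feature_ID has shape (280, 150, 280) and that eng_stress and true_stress each have shape (grain_count, 3); the boolean mask replaces the ambiguous [a,0] indexing):
import numpy as np
import h5py

e_princ = np.empty((280, 150, 280, 3), dtype=np.float32)
t_princ = np.empty((280, 150, 280, 3), dtype=np.float32)  # its own array, not an alias

for i in range(grain_count):
    mask = feature_ID == i + 1         # boolean mask over the 3-D grid
    e_princ[mask] = eng_stress[i]      # broadcasts the 3-vector into every masked cell
    t_princ[mask] = true_stress[i]

# The context manager flushes and closes the file automatically.
with h5py.File(filename, "a") as nf_product:
    nf_product.create_dataset('DataContainers/nfHEDM/CellData/EngineeringStressPrinciple', data=e_princ)
    nf_product.create_dataset('DataContainers/nfHEDM/CellData/TrueStressPrinciple', data=t_princ)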
Trying to extract a sample from a large file using numpy's memmap:
# indices - boolean vector indicating which lines we want to extract
file_big = np.memmap('path_big_file', dtype='int16', shape=(indices.shape[0], L))
file_small = np.memmap('new_path_for_small_file', dtype='int16', shape=(indices.sum(), L))
The expected result would be that a new file will be created with only part of the data, as identified by the indices.
# place data in files:
file_small[:] = file_big[indices]
The above is the procedure described in the manual. It does not work: the call fails for not having enough memory, even though memory should not be an issue, since I am only using memmap and not loading the data into memory.
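For what it's worth, a hedged guess at the cause (the full error message isn't shown): fancy indexing a memmap, as in file_big[indices], materializes the whole selection as an ordinary in-memory array, and np.memmap only creates a new file when opened with mode='w+'. A sketch that copies in chunks, so the selection never has to fit in memory at once:
import numpy as np

# 'path_big_file', L, and indices are taken from the question above.
file_big = np.memmap('path_big_file', dtype='int16', mode='r',
                     shape=(indices.shape[0], L))
file_small = np.memmap('new_path_for_small_file', dtype='int16', mode='w+',
                       shape=(int(indices.sum()), L))

chunk = 10000                          # rows per chunk; tune to available memory
out = 0
for start in range(0, indices.shape[0], chunk):
    sel = indices[start:start + chunk]
    n = int(sel.sum())
    file_small[out:out + n] = file_big[start:start + chunk][sel]
    out += n
file_small.flush()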
I'm trying to create a new fits file from an initial template.fits
This template.fits has a table in extension 1 with 3915 rows; my new file, instead, must have more than 50000 rows.
The part of the code is the following:
hdulist = fits.open('/Users/Martina/Desktop/Ubuntu_Condivisa/Post_Doc_IAPS/ASTRI/ASTRI_scienceTools/Astrisim_MC/template.fits')
hdu0 = hdulist[0]
hdu0.writeto(out_pile+'.fits', clobber=True)
hdu1 = hdulist[1]
hdu1.header['NAXIS2'] = na
hdu1.header['ONTIME'] = tsec
hdu1.header['LIVETIME'] = tsec
hdu1.writeto(out_pile+'.fits', clobber=True)
hdu1_data = hdu1.data
for j in range(na-1):
    hdu1_data[j+1][1] = j+1
    hdu1_data[j+1][3] = t[j]+0.
    hdu1_data[j+1][7] = ra[j]
    hdu1_data[j+1][8] = dec[j]
    hdu1_data[j+1][21] = enetot[j]
hdu1.writeto(out_pile+'.fits', clobber=True)
When I try to fill the new table (the last part of the code), the error is the following:
Traceback (most recent call last):
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\utils\decorators.py", line 734, in __get__
return obj.__dict__[self._key]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Astrisim_MC_4.py", line 340, in
hdu1_data=hdu1.data
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\utils\decorators.py", line 736, in __get__
val = self.fget(obj)
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\io\fits\hdu\table.py", line 404, in data
data = self._get_tbdata()
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\io\fits\hdu\table.py", line 171, in _get_tbdata
self._data_offset)
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\io\fits\hdu\base.py", line 478, in _get_raw_data
return self._file.readarray(offset=offset, dtype=code, shape=shape)
File "C:\Users\Martina\AppData\Local\Programs\Python\Python36\lib\site-packages\astropy\io\fits\file.py", line 279, in readarray
buffer=self._mmap)
TypeError: buffer is too small for requested array
I tried to vary the number of rows and the code works correctly up to 3969 rows.
How can I solve the problem?
Thank you very much in advance,
cheers!
Martina
Your initial problem is here, where you did this:
hdu1.header['NAXIS2'] = na
A natural thing to think you might be able to do, but you actually should not. In general, when working with astropy.io.fits, one should almost never manually mess with keywords in the FITS header that describe the structure of the data itself. This stems in part from the design of FITS itself--that it mixes these structural keywords in with metadata keywords--and in part from a design issue with astropy.io.fits: that it lets you manipulate these keywords at all, or that it doesn't more tightly tie the data to them. I wrote about this issue at more length here: https://github.com/astropy/astropy/issues/3836 but never got around to adding more explanation of this to the documentation.
Basically the way you can think about it is that when a FITS file is opened, its header is first read and parsed into a Header object containing all the header keywords. Some book-keeping is also done to keep track of how much data is in the file after the header. Then when you access the data of the HDU the header keywords are used to determine what the type and shape of the data is. So by doing something like
hdu1.header['NAXIS2'] = na
hdu1_data = hdu1.data
this isn't somehow growing the data in the file. Instead it's just confusing it into thinking there are more rows of data in the file than there actually are, hence the error "buffer is too small for requested array". The "buffer" it's referring to in this case is the rest of the data in the file, and you're requesting that it read an array that's longer than the data in the file.
The fact that it allows you to break this at all is a bug in Astropy, IMO. When the file is first opened it should save away all the correct structural keywords in the background, so that the data can still be loaded properly even if the user accidentally modifies these keywords (or perhaps the user should be completely prevented from modifying these keywords directly).
That's a long way to explain where you went wrong, but maybe it will help better understand how the library works.
As to your actual question, I think @Evert's advice is good: use the higher-level and easier-to-work-with astropy.table to create a new table that's the size you need, and then copy the existing table into the new one. You can also open the FITS table directly as a Table object with Table.read. I think you can copy the FITS metadata over as well, but I'm not sure exactly the best way to do that.
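As a rough sketch of that approach (hedged: it assumes the table sits in extension 1, that na is the desired total row count, and the output file name is just a placeholder), you can copy the old rows into a larger zero-filled array and wrap it back up as a Table:
import numpy as np
from astropy.table import Table

t_old = Table.read('template.fits', hdu=1)
arr_old = t_old.as_array()                   # plain numpy structured array
arr_new = np.zeros(na, dtype=arr_old.dtype)  # na rows, same columns
arr_new[:len(arr_old)] = arr_old             # copy the original rows in
t_new = Table(arr_new, meta=t_old.meta)      # meta carries over (most) header keywords
t_new.write('out_file.fits', overwrite=True)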
One other minor comment unrelated to your main question--when working with arrays you don't have to (and in fact shouldn't) use for loops to perform vectorizable operations.
For example since this is just looping over array indices:
for j in range(na-1):
    hdu1_data[j+1][1] = j+1
    hdu1_data[j+1][3] = t[j]+0.
    hdu1_data[j+1][7] = ra[j]
    hdu1_data[j+1][8] = dec[j]
    hdu1_data[j+1][21] = enetot[j]
you can write vectorized column assignments instead. Note that hdu1_data is a FITS record array, so a whole column is selected with .field() (or by column name), not with a second [] index:
hdu1_data.field(1)[:] = np.arange(na)
hdu1_data.field(3)[:] = t + 0.
hdu1_data.field(7)[:] = ra
and so on. (I'm not sure why you were doing j+1, since that skips the first row, but the point still stands.) This assumes, of course, that the array being updated (hdu1_data, in this case) already has na rows. But that's why you need to grow or concatenate your array first if it's not already that size.
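Equivalently, and somewhat less error-prone than numeric field indices, FITS record arrays can be assigned by column name. The names below are hypothetical, since your template's actual column names aren't shown:
hdu1_data['TIME'] = t
hdu1_data['RA'] = ra
hdu1_data['DEC'] = dec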
I often write scripts for various 3D packages (3ds Max, Maya, etc.), and that is why I am interested in Alembic, a file format that has been getting a lot of attention lately.
Quick explanation for anyone who does not know this project: Alembic - www.alembic.io - is a file format created for containing 3D meshes and the data connected with them. It uses a tree-like structure, as you may see below, with one root node, its children, the children of those children, and so on. The objects at these nodes can have properties.
I am trying to learn how to use Alembic with Python.
There are some tutorials on the docs page of the project, and I'm having some problems with this one:
http://docs.alembic.io/python/cask.html
It's about using the cask module - a wrapper that should make manipulating the contents of files easier.
This part:
a = cask.Archive("animatedcube.abc")
r = cask.Xform()
x = a.top.children["cube1"]
a.top.children["root"] = r
r.children["cube1"] = x
a.write_to_file("/var/tmp/cask_insert_node.abc")
works well. After that there's a new file, "cask_insert_node.abc", and it has the objects, as expected.
But when I'm adding some properties to objects, like this:
a = cask.Archive("animatedcube.abc")
r = cask.Xform()
x = a.top.children["cube1"]
x.properties['new_property'] = cask.Property()
a.top.children["root"] = r
r.children["cube1"] = x
a.write_to_file("/var/tmp/cask_insert_node.abc")
the "cube1" object in a resulting file do not contain property "new_property".
The saving process is the problem; I know the property had been added to "cube1" before saving. I checked it another way, with a function I wrote that creates a graph of the objects in an archive.
The code for this module is here:
source
Does anyone know what I am doing wrong? How can I save the properties? Is there some other way?
Sadly, cask doesn't support this. One cannot modify an archive and have the result saved (this is somehow related to how Alembic streams the data off of disk). What you'll want to do is create an output archive:
oArchive = alembic.Abc.CreateArchiveWithInfo(...)
then copy all desired data from your input archive over to your output archive. That includes the time samplings:
for i in range(iArchive.getNumTimeSamplings()):
    oArchive.addTimeSampling(iArchive.getTimeSampling(i))
and the objects, recursing through iArchive.getTop() and oArchive.getTop(), defining output properties (alembic.Abc.OArrayProperty or OScalarProperty) as you encounter them in the iArchive. As these are defined, you can interject your new values as samples to the property at that time.
It's a real beast, and something that cask really ought to support. In fact, someone in the Alembic community should just do everyone a favor and write a cask2 (casket?) which wraps all of this into simple calls like you instinctively tried to do.
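To make the shape of that concrete, here is a rough, untested skeleton of the copy loop (heavily hedged: the exact PyAlembic binding signatures, especially CreateArchiveWithInfo's arguments, vary between Alembic versions, and the property-copying body is elided):
import alembic

iArchive = alembic.Abc.IArchive("animatedcube.abc")
oArchive = alembic.Abc.CreateArchiveWithInfo(
    "/var/tmp/cask_insert_node.abc",
    "my_app",                    # application writer string (assumed argument)
    "copy with new_property")    # user description (assumed argument)

# Mirror every time sampling onto the output archive.
for i in range(iArchive.getNumTimeSamplings()):
    oArchive.addTimeSampling(iArchive.getTimeSampling(i))

def copy_object(i_obj, o_parent):
    # Create the mirrored output object.
    o_obj = alembic.Abc.OObject(o_parent, i_obj.getName())
    # Here you would walk i_obj's properties, defining
    # alembic.Abc.OArrayProperty / OScalarProperty on o_obj and writing
    # their samples, injecting your new values where desired (elided).
    for n in range(i_obj.getNumChildren()):
        copy_object(i_obj.getChild(n), o_obj)

copy_object(iArchive.getTop(), oArchive.getTop())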
I am trying to update a FITS file with a new column of data. My file has a Primary HDU, and two other HDUs, each one including a table.
Since adding a new column to the table of an already existing FITS file is a pain (unsolvable, see here and here), I changed my mind and try to focus on creating a new file with a modified table.
This means I have to copy all the rest from the original file (Primary HDU, other HDUs, etc.). Is there a standard way to do this? Or, what is the best (fastest?) way, possibly avoiding to copy each element one by one "by hand"?
On the topic of adding new columns, have you seen this documentation? This is the most straightforward way to create a new table with the new column added. This necessarily involves creating a new binary table HDU, since it describes different data.
Or have you looked into the Astropy table interface? It supports reading and writing FITS tables; see here. It basically works the same way but goes to some more effort to hide the details. This is the interface that the PyFITS/astropy.io.fits table interface is gradually being replaced with, since it actually provides a good table interface.
Adding a new HDU or replacing an existing HDU in an existing FITS file is simply a matter of opening that file and updating the HDUList data structure (which works like a normal Python list) and writing the updated HDUList to a new file.
A full example might look something like:
try:
    from astropy.io import fits
except ImportError:
    import pyfits as fits

import numpy as np

with fits.open('path/to/file.fits') as hdul:
    table_hdu = hdul[1]  # If the table is the first extension HDU
    new_column = fits.Column(name='NEWCOL', format='D',
                             array=np.zeros(len(table_hdu.data)))
    new_columns = fits.ColDefs([new_column])
    # (in newer Astropy versions, fits.new_table has been replaced by
    # fits.BinTableHDU.from_columns)
    new_table_hdu = fits.new_table(table_hdu.columns + new_columns)
    # Replace the original table HDU with the new one
    hdul[1] = new_table_hdu
    hdul.writeto('path/to/new_file.fits')
Something roughly like that should work. This will be easier in Astropy once the new Table interface is fully integrated but for now that's what it involves. There is no reason to do anything "by hand" so to speak.
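For comparison, a minimal sketch of the same operation through the Table interface mentioned above (assuming the table is in extension 1):
import numpy as np
from astropy.table import Table

t = Table.read('path/to/file.fits', hdu=1)
t['NEWCOL'] = np.zeros(len(t))   # the new column, zero-filled as above
t.write('path/to/new_file.fits')
Note that this writes out just that one table (plus a minimal primary HDU); the other HDUs from the original file would still need to be copied over separately.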
I am trying to write a 100x6 numpy array to a data file using
# print array to file
data = open("store.dat", "w")
for i in xrange(len(array)):
    data.write(str(array[i]) + "\n")
but when the file writes, it automatically splits the line after 68 characters, i.e. 5 columns of a 6-column array. For example, I want to see
[ 35.47842918 21.82382715 3.18277209 0.38992263 1.17862342 0.46170848]
as one of the lines, but am getting
[ 35.47842918 21.82382715 3.18277209 0.38992263 1.17862342
0.46170848]
I've narrowed it down to a problem with str(array[i]) and its decision to wrap itself onto a new line.
Secondly, is there a better way to be going about this? I'm very new to Python and know little about proper coding in it. Ultimately, I'm writing out a simulation of stars to later be read by a module that will render the coordinates in VPython. I thought that, to avoid the problems of real-time rendering, I could just pass a file once the simulation is complete. Is this inefficient? Should I be passing an object instead?
Thank you
Perhaps it would be more convenient to write it using numpy.save instead?
Alternatively, you can also use:
numpy.array_str(array[i], max_line_width=1000000)
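For illustration, both suggestions in one short sketch (store.npy and store.dat are just placeholder names, and array is the 100x6 array from the question):
import numpy as np

# Option 1: skip text entirely and store the raw array for later reloading.
np.save("store.npy", array)
coords = np.load("store.npy")

# Option 2: keep the text format, but widen the per-row line limit.
with open("store.dat", "w") as f:
    for i in range(len(array)):
        f.write(np.array_str(array[i], max_line_width=1000000) + "\n")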