Copy FITS file HDUs and data - python

I am trying to update a FITS file with a new column of data. My file has a Primary HDU and two other HDUs, each containing a table.
Since adding a new column to the table of an already existing FITS file is a pain (unsolvable, see here and here), I changed my mind and am now trying to create a new file with a modified table.
This means I have to copy everything else from the original file (Primary HDU, other HDUs, etc.). Is there a standard way to do this? Or, what is the best (fastest?) way, ideally avoiding copying each element one by one "by hand"?

On the topic of adding new columns, have you seen this documentation? This is the most straightforward way to create a new table with the new column added. It necessarily involves creating a new binary table HDU, since it describes different data.
Or have you looked into the Astropy Table interface? It supports reading and writing FITS tables; see here. It works basically the same way but goes to more effort to hide the details. It is the interface that PyFITS/astropy.io.fits is gradually being replaced with, since it actually provides a good table interface.
Adding a new HDU, or replacing an existing HDU, in an existing FITS file is simply a matter of opening that file, updating the HDUList data structure (which works like a normal Python list), and writing the updated HDUList to a new file.
A full example might look something like:
import numpy as np

try:
    from astropy.io import fits
except ImportError:
    import pyfits as fits

with fits.open('path/to/file.fits') as hdul:
    table_hdu = hdul[1]  # if the table is the first extension HDU
    new_column = fits.Column(name='NEWCOL', format='D',
                             array=np.zeros(len(table_hdu.data)))
    new_columns = fits.ColDefs([new_column])
    new_table_hdu = fits.BinTableHDU.from_columns(table_hdu.columns + new_columns)
    # Replace the original table HDU with the new one
    hdul[1] = new_table_hdu
    hdul.writeto('path/to/new_file.fits')
Something roughly like that should work. This will be easier in Astropy once the new Table interface is fully integrated, but for now that's what it involves. There is no reason to do anything "by hand", so to speak.
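For comparison, a minimal sketch of the same column addition via the astropy.table interface mentioned above (the file paths and column name are placeholders):

import numpy as np
from astropy.table import Table

# Read the first extension HDU as a Table
t = Table.read('path/to/file.fits', hdu=1)
# Adding a column is a dict-style assignment
t['NEWCOL'] = np.zeros(len(t))
t.write('path/to/new_file.fits', format='fits')

Note that Table.write only writes the table itself, so the other HDUs from the original file would still need to be carried over with the HDUList approach shown above.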

Related

Changes in h5 file aren't reflected in xdmf file

Hello, I was given an h5 file with an xdmf file associated with it. The visualization looks like this. Here the color is just the feature ID. I wanted to add some data to the h5 file to be able to visualize it in ParaView. The newly added data does not appear in ParaView, although it is clearly there when using HDFView. The data I'm trying to add are the datasets titled engineering stress and true stress. The only difference I noticed is that the number of attributes for these is zero while it's 5 for the rest, but I don't know what to do with that information.
Here's the code I currently have set up:
nf_product = h5py.File(filename, "a")
e_princ = np.empty((280, 150, 280, 3))
t_princ = e_princ
for i in tqdm(range(grain_count)):
    a = np.where(feature_ID == i + 1)
    e_princ[a, 0] = eng_stress[i, 0]
    e_princ[a, 1] = eng_stress[i, 1]
    e_princ[a, 2] = eng_stress[i, 2]
    t_princ[a, 0] = true_stress[i, 0]
    t_princ[a, 1] = true_stress[i, 1]
    t_princ[a, 2] = true_stress[i, 2]
EngineeringStress = nf_product.create_dataset('DataContainers/nfHEDM/CellData/EngineeringStressPrinciple', data=np.float32(e_princ))
TrueStress = nf_product.create_dataset('DataContainers/nfHEDM/CellData/TrueStressPrinciple', data=np.float32(t_princ))
I am new to using h5 and xdmf files, so I may be going about this entirely wrong, but the way I understand it, an xdmf file acts as a pointer to the data in the h5 file, so I can't understand why the new data doesn't appear in ParaView.
First, did you close the file with nf_product.close()? If not, new datasets may not have been flushed from memory. You may also need to flush the buffers with nf_product.flush(). Better, use the Python with/as file context manager and it is done automatically.
Next, you can simply use data=e_princ (and t_princ); there is no need to cast a numpy array to a numpy array.
Finally, verify the values in e_princ and t_princ. I think they will be the same, because t_princ = e_princ makes them reference the same numpy object. You need to create t_princ as a separate empty array, the same way as e_princ. Also, the arrays have 4 indices, and you only use 2 when you populate them with [a,0]; be sure that works as expected.
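Putting those points together, a corrected sketch might look like this (grain_count, feature_ID, eng_stress and true_stress come from the question's surrounding code; the 4-axis indexing is an assumption about the intent):

import numpy as np
import h5py

e_princ = np.empty((280, 150, 280, 3))
t_princ = np.empty((280, 150, 280, 3))  # a separate array, not an alias of e_princ

for i in range(grain_count):
    # assuming feature_ID is a 3D array matching the first three axes
    a = np.where(feature_ID == i + 1)
    for j in range(3):
        e_princ[a + (j,)] = eng_stress[i, j]  # index all four axes explicitly
        t_princ[a + (j,)] = true_stress[i, j]

# The context manager flushes and closes the file automatically
with h5py.File(filename, "a") as nf_product:
    nf_product.create_dataset('DataContainers/nfHEDM/CellData/EngineeringStressPrinciple', data=e_princ)
    nf_product.create_dataset('DataContainers/nfHEDM/CellData/TrueStressPrinciple', data=t_princ)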

Appending Excel cell values using pandas

Edit: I found a solution to my question. More or less: look at the user manual for openpyxl instead of online tutorials; the tutorials raised errors when I tried them (I tried more than one) and their thought process was significantly different from the thought process in the user manual. I also ended up not using pandas as much as I thought I would.
I am trying to append certain values in an Excel file with multiple sheets based on user inputs and then rewrite it to the Excel file (without deleting the rest of the sheets). So far I have tried this, which seems to combine the data, but I didn't quite see how it applied to what I am doing, since I want to modify part of one sheet instead of rewriting the whole Excel file. I have also tried a few other things with ExcelWriter, but I don't quite understand it, since it usually wipes all the data in the file (I may be using it wrong).
# episode is a string inputted by the user; this makes a data frame for the specified sheet
episode_dataframe = pd.read_excel(r'All_excerpts (Siena Copy)_test.xlsx', sheet_name=episode)
# resources is also a user-inputted string; this sets the corresponding cell in the data frame
episode_dataframe.loc[(int(pass_num) - 1), 'Resources'] = resources
path_R = open("All_excerpts (Siena Copy)_test.xlsx", "rb")
with pd.ExcelWriter(path_R) as writer:
    # I copied this from here; I think it should make the writer for to_excel? I don't fully know
    writer.book = openpyxl.load_workbook(path_R)
    # this should write the sheet data frame onto the file, but I don't want it to delete the other sheets
    episode_dataframe.to_excel(writer, sheet_name=episode, engine=openpyxl, if_sheet_exsits='replace')
Additionally, I have been running into a bunch of other smaller errors; a big one was "'Workbook' object has no attribute 'add worksheet'", even though I'm not trying to add a worksheet, and I could not get their solution to work.
I am a bit of a novice at Python, so my code might be a bit of a mess.
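Since the edit above points to plain openpyxl as the eventual solution, here is a minimal sketch of that route (the 'Resources' column index is a hypothetical placeholder; episode, pass_num and resources are the user inputs from the question):

import openpyxl

# Load the existing workbook; all sheets stay intact
wb = openpyxl.load_workbook('All_excerpts (Siena Copy)_test.xlsx')
ws = wb[episode]  # pick the user-specified sheet by name

# Update a single cell instead of rewriting the whole sheet;
# +1 because openpyxl rows are 1-indexed and row 1 holds the headers
resources_col = 5  # hypothetical column index of 'Resources'
ws.cell(row=int(pass_num) + 1, column=resources_col, value=resources)

wb.save('All_excerpts (Siena Copy)_test.xlsx')  # the other sheets are preserved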

Combining Astropy FITS files?

So I have some Astropy FITS tables that I save (they all have the same format, column names, etc.). I want to take all these FITS files and combine them to make one large FITS file.
Currently, I am playing around with the astropy.io append and update functions to no avail.
Any help would be greatly appreciated.
So I have it working now. This is what I did essentially:
from astropy.table import Table, vstack

# Read in the fits table you want to append
append_table = Table.read(input_file, format='fits')
# Read in the large table you want to append to
base_table = Table.read('base_file.fits', format='fits')
# Use Astropy's 'vstack' function and overwrite the file
concat_table = vstack([base_table, append_table])
concat_table.write('base_file.fits', format='fits', overwrite=True)
In my case, all the columns are the same for every table, so I just looped through all the FITS files and appended them one at a time. There are probably other ways to do this, but I found this the easiest; a sketch of the loop is below.
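For reference, a sketch of that loop (the file list is a placeholder):

from astropy.table import Table, vstack

input_files = ['table1.fits', 'table2.fits', 'table3.fits']  # hypothetical inputs

# Stack everything in one call rather than re-reading the growing base file
tables = [Table.read(f, format='fits') for f in input_files]
combined = vstack(tables)
combined.write('combined.fits', format='fits', overwrite=True)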

How to store complex csv data in django?

I am working on a Django project where a user can upload a CSV file that is stored in the database. In most CSV files I have seen, the first row contains the headers and the values sit underneath, but in my case the headers are in a column, like this (my CSV data).
I don't understand how to save this type of data in my Django model.
You can transpose your data. I think it is more appropriate for your dataset in order to do real analysis. Usually things such as id values would be the row index, and names such as company_id, company_name, etc. would be the columns. This will allow you to do further analysis (mean, std, variances, pct_change, group_by) and use pandas to its fullest. That said:
import pandas as pd

df = pd.read_csv('yourcsvfile.csv')
df2 = df.T  # transpose: the header column becomes the header row
Also, as @H.E. Lee pointed out, in order to save to your database you can either use the dataframe's to_sql method to save to MySQL (e.g. via your connection), use to_json and then import the data if you're using MongoDB, or manually write your own transformation to your database.
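For the to_sql route, a minimal sketch assuming a SQLAlchemy connection (the connection string and table name are placeholders):

from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:password@localhost/mydb')  # placeholder credentials

# df2 is the transposed frame from above; if_exists controls replace/append behaviour
df2.to_sql('company_data', con=engine, if_exists='replace', index=True)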
You can flip it with the built-in csv module quite easily, no need for cumbersome modules like pandas (which in turn requires NumPy...). Since you didn't specify the Python version you're using, and this procedure differs slightly between versions, I'll assume Python 3.x:
import csv

# open("file.csv", "rb") in Python 2.x
with open("file.csv", "r", newline="") as f:  # open the file for reading
    data = list(map(list, zip(*csv.reader(f))))  # read the CSV and flip it
If you're using Python 2.x you should also use itertools.izip() instead of zip() and you don't have to turn the map() output into a list (it already is).
Also, if the rows are uneven in your CSV you might want to use itertools.zip_longest() (itertools.izip_longest() in Python 2.x) instead.
Either way, this will give you a 2D list, data, where the first element is your header row and the rest are the related data. What you plan to do from there depends purely on your DB. If you want to deal with the data only, just skip the first element of data when iterating and you're done.
Given your data, it may be best to store each row as a string entry using a TextField. That way you can be sure not to lose any structure going forward.
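A minimal sketch of such a model (the model and field names are illustrative):

from django.db import models

class CsvRow(models.Model):
    # one raw CSV row per record, preserving the original structure
    content = models.TextField()

Each flipped row could then be saved with something like CsvRow.objects.create(content=','.join(row)).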

Pyfits or astropy.io.fits add row to binary table in fits file

How can I add a single row to a binary table inside a large FITS file using pyfits, astropy.io.fits, or maybe some other Python library?
This file is used as a log, so every second a single row will be added; eventually the size of the file will reach gigabytes, so reading the whole file and writing it back, or keeping a copy of the data in memory and writing it to the file every second, is actually impossible. With pyfits or astropy.io.fits, so far I could only read everything into memory, add a new row, and then write it back.
Example: I create a FITS file like this:
import numpy, pyfits
data = numpy.array([1.0])
col = pyfits.Column(name='index', format='E', array=data)
cols = pyfits.ColDefs([col])
tbhdu = pyfits.BinTableHDU.from_columns(cols)
tbhdu.writeto('test.fits')
And I want to add some new value to the column 'index', i.e. add one more row to the binary table.
Solution: This is a trivial task for the cfitsio library (method fits_insert_row(...)), so I use a Python module based on it: https://github.com/esheldon/fitsio
Here is the solution using fitsio. To create a new FITS file one can do:
import numpy
from fitsio import FITS

fits = FITS('test.fits', 'rw')
data = numpy.zeros(1, dtype=[('index', 'i4')])
data[0]['index'] = 1
fits.write(data)
fits.close()
To append a row:
fits = FITS('test.fits', 'rw')
# You can actually reuse the already opened FITS object;
# to flush the changes you just need: fits.reopen()
data = numpy.zeros(1, dtype=[('index', 'i4')])
data[0]['index'] = 2
fits[1].append(data)
fits.close()
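Given the one-row-per-second logging use case, the comment above suggests keeping the file open and flushing with fits.reopen(); a sketch of that pattern (the loop and timing are illustrative):

import time
import numpy
from fitsio import FITS

fits = FITS('test.fits', 'rw')
row = numpy.zeros(1, dtype=[('index', 'i4')])
for i in range(10):  # stand-in for the once-per-second logger
    row[0]['index'] = i
    fits[1].append(row)  # appends in place; the table is not re-read
    fits.reopen()  # flush changes to disk, per the note above
    time.sleep(1)
fits.close()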
Thank you for your help.
