Combining Astropy FITS files?

So I have some Astropy FITS tables that I save (they all have the same format, column names, etc.). I want to take all these FITS files and combine them to make one large FITS file.
Currently, I am playing around with the astropy.io append and update functions, to no avail.
Any help would be greatly appreciated.

So I have it working now. This is what I did essentially:
from astropy.table import Table, vstack

# Read in the FITS table you want to append
append_table = Table.read(input_file, format='fits')
# Read in the large table you want to append to
base_table = Table.read('base_file.fits', format='fits')
# Use Astropy's 'vstack' function and overwrite the file
concat_table = vstack([base_table, append_table])
concat_table.write('base_file.fits', format='fits', overwrite=True)
In my case, all the columns are the same for every table, so I just looped through all the FITS files and appended them one at a time (a sketch of that idea is below). There are probably other ways to do this, but I found this was the easiest.
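A minimal sketch of that idea; note it reads everything and stacks once at the end rather than rewriting the growing base file on each iteration, and the glob pattern table_*.fits is just a placeholder:
import glob

from astropy.table import Table, vstack

# Hypothetical: gather all the per-chunk FITS tables in the working directory
input_files = sorted(glob.glob('table_*.fits'))

tables = [Table.read(f, format='fits') for f in input_files]
combined = vstack(tables)
combined.write('combined.fits', format='fits', overwrite=True)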

Related


How can I display multiple pandas functions created in Python in the same CSV file
So I have multiple data tables saved as pandas DataFrames, and I want to output all of them into the same CSV for ease of access. However, I am not really sure of the best way to go about this, as I want to maintain each DataFrame's inherent structure (i.e. columns and index) while combining them all into one single file.
You have 2 choices:
Either you combine them first (pd.concat()), with all the advantages and limitations of that approach, and then call .to_csv(), which will print 1 file. If they are structurally the same, this is great because you will be able to read the file again.
Or, you call .to_csv() multiple times and save the output in a "buffer", which you can then write (see here). This is probably the only way if your DataFrames are very different from a structural perspective, but it makes them a mess to read later.
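A minimal sketch of the buffer approach; the three DataFrames here are made-up stand-ins for the question's file_1, file_2, file_3:
import io

import pandas as pd

# Placeholder DataFrames standing in for the question's tables
file_1 = pd.DataFrame({'a': [1, 2]})
file_2 = pd.DataFrame({'b': [3, 4]})
file_3 = pd.DataFrame({'c': [5, 6]})

buffer = io.StringIO()
for df in (file_1, file_2, file_3):
    df.to_csv(buffer)   # each table keeps its own header and index
    buffer.write('\n')  # blank line separating the stacked tables

with open('Combined.csv', 'w') as f:
    f.write(buffer.getvalue())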
Is .json output an option for what you want to do?
Thanks a lot for the comment Kingotto. I used the first option and added this code, which helped me arrange my functions horizontally and export the file to CSV like this:
frames = pd.concat([file_1, file_2, file_3], axis=1)
# save the dataframe
frames.to_csv('Combined.csv', index=False)

How to copy a partial or skeleton h5py file

I have a few questions wrapped up into this issue. I realize this might be a convoluted post and can provide extra details.
A code package I use can produce large .h5 files (source.h5) (100+ GB), where almost all of this data resides in one dataset (group2/D). I want to make a new .h5 file (dest.h5) using Python that contains all datasets of source.h5 except group2/D, without needing to copy the entire file. I will then condense group2/D after some postprocessing and write a new, much smaller group2/D into dest.h5. However, I need to keep source.h5, because this postprocessing may need to be performed multiple times into multiple destination files.
source.h5 is always structured the same, and the layout cannot be changed in either source.h5 or dest.h5. Each letter below is a dataset:
group1/A
group1/B
group2/C
group2/D
I thus want to initially make a file with this format:
group1/A
group1/B
group2/C
and again, fill in group2/D later. Simply copying source.h5 multiple times is always possible, but I'd like to avoid copying a huge file a bunch of times, because disk space is limited and this isn't a one-off case.
I searched and found this question (How to partially copy using python an Hdf5 file into a new one keeping the same structure?) and tested if dest.h5 would be the same as source.h5:
fs = h5py.File('source.h5', 'r')
fd = h5py.File('dest.h5', 'w')
# Copy all of group1 in one call
fs.copy('group1', fd)
# Recreate group2 by hand, then copy its members individually
fd.create_group('group2')
fs.copy('group2/C', fd['/group2'])
fs.copy('group2/D', fd['/group2'])
fs.close()
fd.close()
but the code package I use couldn't read the file I created (which I need it to be able to do), implying there was some critical data loss in this operation (the file sizes also differ by 7 kB). I'm assuming the problem was when I created group2 manually, because I checked with numpy that the values in the group1 datasets exactly match in both source.h5 and dest.h5. Before I did any digging into what data is missing, I wanted to get a few things out of the way:
Question 1: Is there .h5 file metadata that accompanies each group or dataset? If so, how can I see it so I can create a group2 in dest.h5 that exactly matches the one in source.h5? Is there a way to see if 2 groups (not datasets) exactly match each other?
Question 2: Alternatively, is it possible to simply copy the data structure of a .h5 file (i.e. groups and datasets with empty lists as a skeleton file) so that fields can be populated later? Or, as a subset of this question, is there a way to copy a blank dataset to another file such that any metadata is retained (assuming there is some)?
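For Questions 1 and 2, a minimal sketch of one way to look at this: HDF5 attaches attributes to groups and datasets, which h5py exposes via .attrs, and h5py's create_dataset_like can allocate an empty dataset with the same shape, dtype, and storage settings. Whether attributes are what your code package is missing is only an assumption here:
import h5py

with h5py.File('source.h5', 'r') as fs, h5py.File('dest.h5', 'w') as fd:
    for gname in ('group1', 'group2'):
        grp = fd.create_group(gname)
        # Copy group-level attributes (HDF5 metadata) verbatim
        for key, value in fs[gname].attrs.items():
            grp.attrs[key] = value
        for dname, dset in fs[gname].items():
            # Allocate a skeleton dataset with matching shape/dtype; no data copied
            grp.create_dataset_like(dname, dset)
With HDF5's usual lazy space allocation, datasets created this way but never written should not consume anywhere near the full 100+ GB on disk.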
Question 3: Finally, to avoid all this, is it possible to just copy a subset of source.h5 to dest.h5? With something like:
fs.copy(['group1','group2/C'], fd)
Thanks for your time. I appreciate you reading this far
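And for Question 3: h5py's copy takes a single source per call rather than a list, but a short loop over the paths you want gets the same effect. A sketch, using the example layout above:
import h5py

to_copy = ['group1', 'group2/C']  # everything except group2/D

with h5py.File('source.h5', 'r') as fs, h5py.File('dest.h5', 'w') as fd:
    for path in to_copy:
        # Make sure the destination parent group exists before copying into it
        parent = path.rsplit('/', 1)[0] if '/' in path else '/'
        fs.copy(path, fd.require_group(parent))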

Make custom spreadsheets with Python

I have a pandas data frame with two columns:
year experience and salary
I want to save a csv file with these two columns and also have some stats at the head of the file as in the image:
Is there any option to handle this with pandas or any other library, or do I have to write a script that writes the file line by line, adding the commas between fields?
Pandas does not support what you want to do here. The problem is that your format is not valid CSV. The CSV RFC (RFC 4180) states that "each record is located on a separate line", implying that a line corresponds to a record, with an optional header line. Your format adds the average and max values, which do not correspond to records.
As I see it, you have three paths to go from here:
i. You create two separate data frames and map them to CSV files (super precise would be 3), one with your records, one with the additional values.
ii. Write your data frame to CSV first, then open that file and insert your additional values at the top.
iii. If your goal is an import into Excel, however, #gefero's suggestion is the right hint: try using the xlsxwriter package to directly write to cells in a spreadsheet.
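A minimal sketch of option ii; the column names and the particular stats rows are assumptions, since the image isn't reproduced here:
import pandas as pd

df = pd.DataFrame({'year experience': [1, 3, 5],
                   'salary': [40000, 55000, 70000]})

# Write the stats lines first, then append the regular CSV records below them
with open('salaries.csv', 'w') as f:
    f.write('average salary,{}\n'.format(df['salary'].mean()))
    f.write('max salary,{}\n'.format(df['salary'].max()))
df.to_csv('salaries.csv', mode='a', index=False)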
You can read the file as two separate parts (stats and csv)
Reading stats:
number_of_stats_rows = 3
stats = pandas.read_csv(file_path, nrows=number_of_stats_rows, header=None).fillna('')
Reading remaining file:
other_data = pandas.read_csv(file_path, skiprows=number_of_stats_rows).fillna('')
Take a look at xlsxwriter. Perhaps it's what you are looking for.

Copy FITS file HDUs and data

I am trying to update a FITS file with a new column of data. My file has a Primary HDU, and two other HDUs, each one including a table.
Since adding a new column to the table of an already existing FITS file is a pain (unsolvable, see here and here), I changed my mind and am now trying to focus on creating a new file with a modified table.
This means I have to copy all the rest from the original file (Primary HDU, other HDUs, etc.). Is there a standard way to do this? Or, what is the best (fastest?) way, possibly avoiding to copy each element one by one "by hand"?
On the topic of adding a new column, have you seen this documentation? That is the most straightforward way to create a new table with the new column added. This necessarily involves creating a new binary table HDU, since it describes different data.
Or have you looked into the Astropy Table interface? It supports reading and writing FITS tables; see here. It basically works the same way but goes to some more effort to hide the details. This is the interface that is gradually replacing the PyFITS/astropy.io.fits table interface, since it actually provides a good table interface.
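A minimal sketch of that Table route; note the caveat that, written this way, only the table itself plus a minimal primary HDU end up in the new file, so the other HDUs from the original would still need to be carried over separately:
import numpy as np
from astropy.table import Table

t = Table.read('path/to/file.fits', hdu=1)  # read the first extension HDU
t['NEWCOL'] = np.zeros(len(t))              # adding a column is one assignment
t.write('path/to/new_file.fits', format='fits')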
Adding a new HDU or replacing an existing HDU in an existing FITS file is simply a matter of opening that file and updating the HDUList data structure (which works like a normal Python list) and writing the updated HDUList to a new file.
A full example might look something like:
import numpy as np

try:
    from astropy.io import fits
except ImportError:
    import pyfits as fits

with fits.open('path/to/file.fits') as hdul:
    table_hdu = hdul[1]  # If the table is the first extension HDU
    new_column = fits.Column(name='NEWCOL', format='D',
                             array=np.zeros(len(table_hdu.data)))
    new_columns = fits.ColDefs([new_column])
    # (In current Astropy, fits.new_table has been superseded by
    # fits.BinTableHDU.from_columns.)
    new_table_hdu = fits.new_table(table_hdu.columns + new_columns)
    # Replace the original table HDU with the new one
    hdul[1] = new_table_hdu
    hdul.writeto('path/to/new_file.fits')
Something roughly like that should work. This will be easier in Astropy once the new Table interface is fully integrated but for now that's what it involves. There is no reason to do anything "by hand" so to speak.

Exporting a list to a CSV/space separated and each sublist in its own column

I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists into CSV in columns. (Basically, it's how another program will be able to use the data I've generated.) I have the group called [frames] which contains [frame001], [frame002], [frame003], etc. I would like the CSV file that's generated to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought if I could save the file as CSV I could manipulate it in Excel, however, I figure there is a solution that I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.
What you're describing is that you want to transpose a 2-dimensional array of data. In Python you can achieve this easily with the zip function, as long as the inner lists are all the same length.
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.izip_longest (itertools.zip_longest in Python 3) to fill the remaining fields with some default value (even '').
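A self-contained sketch of the whole thing in Python 3, using space separation as the question prefers; the frame values are made up for illustration:
import csv
from itertools import zip_longest

# Made-up stand-ins for frame001, frame002, ... from the question
frames = [[1.0, 2.0, 3.0],
          [4.0, 5.0],
          [6.0, 7.0, 8.0]]

with open('frames.csv', 'w', newline='') as f:
    out = csv.writer(f, delimiter=' ', quoting=csv.QUOTE_ALL)
    # zip_longest(*frames) transposes: row i holds element i of every frame
    out.writerows(zip_longest(*frames, fillvalue=''))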
