Combine two FITS files by taking a 'pizza wedge' slice? - python

I would like to combine two FITS files by taking a slice out of one and inserting it into the other. The slice would be based on an angle measured from the centre pixel; see the example image below:
Can this be done using Astropy? There are many questions on combining FITS files on the site, but most of them are about simply adding two files together, rather than combining segments like this.

Here is one recommended approach:
1. Read in your two files
Assuming the data is in an ImageHDU data array:
from astropy.io import fits
# read the numpy arrays out of the files
# assuming they contain ImageHDUs
data1 = fits.getdata('file1.fits')
data2 = fits.getdata('file2.fits')
2. Cut out the sections and put them into a new numpy array
Build up indices1 and indices2 for the desired sections of the new file; a simple numpy boolean index can then fill each section of a new numpy array.
Inspired by https://stackoverflow.com/a/18354475/15531842, the sector_mask function defined in that answer can be used to get indices for each array using angular slices.
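For reference, here is roughly what that sector_mask looks like (lightly adapted from the linked answer; treat it as a sketch rather than a drop-in):
import numpy as np

def sector_mask(shape, centre, radius, angle_range):
    # Return a boolean mask for a circular sector ('pizza wedge').
    # The start/stop angles in angle_range are given in degrees.
    x, y = np.ogrid[:shape[0], :shape[1]]
    cx, cy = centre
    tmin, tmax = np.deg2rad(angle_range)
    # ensure the stop angle is greater than the start angle
    if tmax < tmin:
        tmax += 2 * np.pi
    # convert cartesian --> polar coordinates
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    theta = np.arctan2(x - cx, y - cy) - tmin
    # wrap angles between 0 and 2*pi
    theta %= (2 * np.pi)
    # circular mask combined with angular mask
    circmask = r2 <= radius * radius
    anglemask = theta <= (tmax - tmin)
    return circmask * anglemask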
mask = sector_mask(data1.shape, centre=(53,38), radius=100, angle_range=(280,340))
indices1 = ~mask
indices2 = mask
Then these indices can be used to transfer the data into a new array.
import numpy as np
newdata = np.zeros_like(data1)
newdata[indices1] = data1[indices1]
newdata[indices2] = data2[indices2]
If the coordinate system is well known, it may be possible to use astropy's Cutout2D class, although I was not able to figure out how to fully use it; it wasn't clear from the example whether it can do an angular slice. See the astropy example at https://docs.astropy.org/en/stable/nddata/utils.html#d-cutout-using-an-angular-size
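For what it's worth, the 'angular size' in that example refers to the cutout's extent expressed in sky units (a rectangle on the sky), not a pie-shaped wedge. A minimal sketch of that usage, assuming the file carries a valid WCS (the coordinates below are placeholders):
from astropy.io import fits
from astropy.wcs import WCS
from astropy.nddata import Cutout2D
from astropy.coordinates import SkyCoord
import astropy.units as u

hdu = fits.open('file1.fits')[0]
wcs = WCS(hdu.header)
position = SkyCoord(ra=197.87 * u.deg, dec=-1.32 * u.deg)  # placeholder coordinates
size = u.Quantity((1.5, 2.5), u.arcmin)  # (ny, nx) angular extent on the sky
cutout = Cutout2D(hdu.data, position, size, wcs=wcs)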
3a. Then write out the new array as a new file.
If special header information is not needed in the new file, the numpy array with the new image can be written out to a FITS file with one line of astropy code.
# this is an easy way to write a numpy array to FITS file
# no header information is carried over
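# pass overwrite=True if the output file may already exist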
fits.writeto('file_combined.fits', data=newdata)
3b. Carry the FITS header information over to the new file
If there is a desire to carry header information over, an ImageHDU can be built from the numpy array together with the desired header (a fits.Header, which can be constructed from a dictionary of keyword/value pairs).
img_hdu = fits.ImageHDU(data=newdata, header=fits.Header(my_header_dict))
hdu_list = fits.HDUList()
hdu_list.append(fits.PrimaryHDU())
hdu_list.append(img_hdu)
hdu_list.writeto('file_combined.fits')

Related

Xarray to merge two hdf5 files with different dimension lengths

I have some instrumental data saved in HDF5 format as multiple 2-D arrays along with the measuring time. As shown in the attached figures, d1 and d2 are two independent files which the instrument recorded at different times. They have the same data variables, and the only difference is the length of phony_dim_0, which represents the total number of data points and varies with measurement time.
These files need to be loaded into specific software provided by the instrument company to obtain meaningful results. I want to merge multiple files with Python xarray while keeping their original format, and then load the merged file into the software.
Here is my attempt:
import os
import numpy as np
import xarray

files = os.listdir("DATA_PATH")
d1 = xarray.open_dataset(files[0])
d2 = xarray.open_dataset(files[1])
## copy a new one to save the merged data array
d0 = d1
vars_ = [c for c in d1]
for var in vars_:
    d0[var].values = np.vstack([d1[var], d2[var]])
The error looks like this:
replacement data must match the Variable's shape. replacement data has shape (761, 200); Variable has shape (441, 200)
I thought about two solutions to this problem:
expanding the dimension length to the total length of all merged files;
creating a new empty dataset in the same format as d1 and d2.
However, I still could not figure out the functions to achieve either. Any comments or suggestions would be appreciated.
Supplemental information
dataset example [d1],[d2]
I'm not familiar with xarray, so I can't help with your code. However, you don't need xarray to copy HDF5 data; h5py is designed to work nicely with HDF5 data as NumPy arrays, and is all you need to merge the data.
A note about xarray: it uses different nomenclature than HDF5 and h5py. Xarray refers to the files as 'datasets' and calls the HDF5 datasets 'data variables'. HDF5/h5py nomenclature is more frequently used, so I will use it for the rest of my post.
There are some things to consider when merging datasets across 2 or more HDF5 files. They are:
Consistency of the data schema (which you have checked).
Consistency of attributes. If datasets have different attribute names or values, the merge process gets a lot more complicated! (Yours appear to be consistent.)
It's preferable to create resizable datasets in the merged file (see the short sketch after this list). This simplifies the process, as you don't need to know the total size when you initially create the dataset. Better yet, you can add more data later (if/when you have more files).
I looked at your files. You have 8 HDF5 datasets in each file. One nice thing: the datasets are resizable, which simplifies the merge process. Also, although your datasets have a lot of attributes, they appear to be common to both files. That also simplifies the process.
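To illustrate what 'resizable' means here, a minimal sketch (the file and dataset names are made up for the example):
import h5py
import numpy as np

with h5py.File('example.h5', 'w') as h5f:
    # maxshape=(None, 200) makes axis 0 unlimited, so the dataset
    # can be grown later with .resize()
    dset = h5f.create_dataset('data', shape=(0, 200),
                              maxshape=(None, 200), dtype='f8')
    block = np.random.rand(100, 200)
    dset.resize(dset.shape[0] + block.shape[0], axis=0)
    dset[-block.shape[0]:] = block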
The code below goes through the following steps to merge the data:
1. Open the new merged file for writing.
2. Open the first data file (read-only).
3. Loop through all datasets:
a. Use the group copy function to copy each dataset (data plus maxshape parameters, and attribute names and values).
4. Open the second data file (read-only).
5. Loop through all datasets and do the following:
a. Get the size of the two datasets (existing and to-be-added).
b. Increase the size of the HDF5 dataset with the .resize() method.
c. Write the values from the new dataset to the end of the existing one.
At the end it loops through all 3 files and prints the shape and maxshape of every dataset (for visual comparison).
Code below:
import h5py

files = ['211008_778183_m.h5', '211008_778624_m.h5', 'merged_.h5']

# Create the merge file:
with h5py.File('merged_.h5', 'w') as h5fw:
    # Open first HDF5 file and copy each dataset.
    # Will use maxshape and attributes from the existing dataset.
    with h5py.File(files[0], 'r') as h5fr:
        for ds in h5fr.keys():
            h5fw.copy(h5fr[ds], h5fw, name=ds)
    # Open second HDF5 file and copy data from each dataset.
    # Resizes the existing dataset as needed to hold the new data.
    with h5py.File(files[1], 'r') as h5fr:
        for ds in h5fr.keys():
            ds_a0 = h5fw[ds].shape[0]
            add_a0 = h5fr[ds].shape[0]
            h5fw[ds].resize(ds_a0 + add_a0, axis=0)
            h5fw[ds][ds_a0:] = h5fr[ds][:]

for fname in files:
    print(f'Working on file: {fname}')
    with h5py.File(fname, 'r') as h5f:
        for ds, h5obj in h5f.items():
            print(f'for: {ds}; shape={h5obj.shape}, maxshape={h5obj.maxshape}')
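For completeness, the pure-xarray route the question was attempting would presumably be a concat along the record dimension rather than assignment into a fixed-shape variable. An untested sketch, assuming the files open cleanly with open_dataset and share variable names (depending on versions you may need engine='h5netcdf' and a phony_dims option, and note that to_netcdf writes netCDF-flavoured HDF5, which the vendor software may or may not accept):
import xarray as xr

d1 = xr.open_dataset('211008_778183_m.h5')
d2 = xr.open_dataset('211008_778624_m.h5')
# grow the record dimension instead of overwriting fixed-shape values
merged = xr.concat([d1, d2], dim='phony_dim_0')
merged.to_netcdf('merged_xr.h5')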

Transposing data from automatically created lists

I am trying to transpose data from lists that are automatically created for each text file. Each text file gets its own version of the listsR list. I then place these lists into another list, listlist, so that I can manage a list of lists. I know how to do this using declared lists, but this code needs the flexibility to handle any number of text files, transpose the lists, and take the average at each index across all the lists. The goal is to create baselines from the text files.
import os
import csv

os.chdir('/Users/thomaswolff/Desktop/baseline2')

def listNew():
    listlist = []
    for data in os.listdir(os.getcwd()):
        if data.endswith('.TXT') and 'baseline' in data:
            with open(data, 'rU') as file:
                listsR = []
                for row in csv.DictReader(file):
                    listsR.append(float(row[' IRI R e']))
                listlist.append(listsR)
    return listlist
This works fine for placing the data into lists of lists, but I need to transpose the data according to index, so that the first row of the result holds the first element from each list within listlist. Text files of the data I'm using can be found on GitHub in the baseline.txt files:
As you can see from this code I've written in the past, I know how to do this using declared lists, but this is different.
To transpose listlist you can use zip()
list_list_transpose = zip(*listlist)
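In Python 3, zip() returns an iterator, so wrap it in list() if you need to index the result. And since the stated goal is an average at each index across all the lists, the same trick finishes the job:
# per-index averages across all of the lists
averages = [sum(col) / len(col) for col in zip(*listlist)]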

Creating Arrays from csv files in python

So I have a data file from which I must extract specific data. Using:
import numpy
x = 15         # need a way for the code to assess how many lines to skip in the given data
maxcol = 2000  # need a way to find the final row of the data
data = numpy.genfromtxt('data.dat.csv', skip_header=x, delimiter=',')
column_one = data[0:maxcol, 0]
column_two = data[0:maxcol, 1]
this gives me an array for the specific case where there are (x=)15 lines of metadata above the required data and the number of rows of data is (maxcol=)2000. In what way do I go about changing the code to handle any values of x and maxcol?
Use pandas. Its read_csv function does all that you want (I don't include its equivalent of delimiter, sep=',', because comma-delimited is the default):
import pandas as pd
data = pd.read_csv('data.dat.csv', skiprows=x, nrows=maxcol)
If you really want that as a numpy array, you can do this:
data = data.values
But you can probably just leave it as a pandas DataFrame.
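To address the 'any value for x and maxcol' part: nrows can simply be omitted (read_csv reads to the end of the file), and the number of metadata lines can be detected by scanning for the first row that parses as numbers. A sketch, where count_header_lines is a made-up helper, not a pandas feature:
def count_header_lines(fname):
    # count leading lines until one parses as comma-separated numbers
    with open(fname) as fh:
        for i, line in enumerate(fh):
            try:
                [float(tok) for tok in line.split(',')]
                return i
            except ValueError:
                continue
    raise ValueError('no numeric data found in %s' % fname)

x = count_header_lines('data.dat.csv')
data = pd.read_csv('data.dat.csv', skiprows=x, header=None)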

How to output the numbers in an array from a Fourier transform to a single line in an Excel file using Python

I want to output the numeric result from image processing to an .xls file, in one line, exactly in cells arranged horizontally. Can anybody advise which Python module to use and what code to add? In other words, how do I arrange the digits from an array and place them into Excel cells horizontally?
Code fragment:
import glob
import mahotas
import numpy as np
from scipy import fftpack

def fouriertransform(self):  # function for FT computation
    for filename in glob.iglob('*.tif'):
        imgfourier = mahotas.imread(filename)  # read the image
        arrayfourier = np.array([imgfourier])  # make an array (not used below)
        # Take the fourier transform of the image.
        F1 = fftpack.fft2(imgfourier)
        # Now shift so that low spatial frequencies are in the center.
        F2 = fftpack.fftshift(F1)
        # the 2D power spectrum is:
        psd2D = np.abs(F2) ** 2
        print(psd2D)
        f.write(str(psd2D))  # write to file (f is an already-open file object)
This should be a comment, but I don't have enough rep.
What other people say
It looks like this is a duplicate of 'Python Writing a numpy array to a CSV File', which uses numpy.savetxt to generate a csv file (which Excel can read).
Example
Numpy is probably the easiest way to go, in my opinion. You could get fancy and use the ndarray.flatten() method; after that, it would be as simple as saving the array with a comma separating each element. Something like this (tested!) is even simpler:
import numpy as np
arr = np.ones((3,3))
np.savetxt("test.csv",arr,delimiter=',',newline=',')
Note this is a small hack, since newlines are treated as commas. This makes "test.csv" look like:
1,1,1,1,1,1,1,1,1, # there are 9 ones there, I promise! 3x3
Note there is a trailing comma. I was able to open this in Excel no problem, voila!
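If the trailing comma bothers you, reshaping the array into a single row lets savetxt write one clean line instead (same idea, no newline trick needed):
import numpy as np

arr = np.ones((3, 3))
# one row, as many columns as there are elements
np.savetxt("test.csv", arr.reshape(1, -1), delimiter=',', fmt='%g')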

rpy2: Converting a data.frame to a numpy array

I have a data.frame in R. It contains a lot of data: gene expression levels from many (125) arrays. I'd like the data in Python, due mostly to my incompetence in R and the fact that this was supposed to be a 30 minute job.
I would like the following code to work. To understand this code, know that the variable path contains the full path to my data set which, when loaded, gives me a variable called immgen. Know that immgen is an object (a Bioconductor ExpressionSet object) and that exprs(immgen) returns a data frame with 125 columns (experiments) and tens of thousands of rows (named genes). (Just in case it's not clear, this is Python code, using robjects.r to call R code)
import numpy as np
import rpy2.robjects as robjects
# ... some code to build path
robjects.r("load('%s')"%path) # loads immgen
e = robjects.r['data.frame']("exprs(immgen)")
expression_data = np.array(e)
This code runs, but expression_data is simply array([[1]]).
I'm pretty sure that e doesn't represent the data frame generated by exprs() due to things like:
In [40]: e._get_ncol()
Out[40]: 1
In [41]: e._get_nrow()
Out[41]: 1
But then again who knows? Even if e did represent my data.frame, that it doesn't convert straight to an array would be fair enough - a data frame has more in it than an array (rownames and colnames) and so maybe life shouldn't be this easy. However I still can't work out how to perform the conversion. The documentation is a bit too terse for me, though my limited understanding of the headings in the docs implies that this should be possible.
Anyone any thoughts?
This is the most straightforward and reliable way I've found to transfer a data frame from R to Python.
To begin with, I think exchanging the data through the R bindings is an unnecessary complication. R provides a simple method to export data; likewise, NumPy has decent methods for data import. The file format is the only common interface required here.
data(iris)
iris$Species = unclass(iris$Species)
write.table(iris, file="/path/to/my/file/np_iris.txt", row.names=F, sep=",")
# now start a python session
import numpy as NP
fpath = "/path/to/my/file/np_iris.txt"
A = NP.loadtxt(fpath, comments="#", delimiter=",", skiprows=1)
# print(type(A))
# returns: <type 'numpy.ndarray'>
print(A.shape)
# returns: (150, 5)
print(A[1:5,])
# returns:
[[ 4.9  3.   1.4  0.2  1. ]
[ 4.7  3.2  1.3  0.2  1. ]
[ 4.6  3.1  1.5  0.2  1. ]
[ 5.   3.6  1.4  0.2  1. ]]
According to the documentation (and my own experience, for what it's worth), loadtxt is the preferred method for conventional data import.
You can also pass loadtxt a data type for each column via the dtype argument (a single structured dtype, with one field per column). Notice skiprows=1, which steps over the line of column headers.
Finally, I converted the data frame's factor column to integer (which is actually the underlying data type of a factor) prior to exporting; unclass is probably the easiest way to do this.
If you have big data (i.e., you don't want to load the entire data file into memory but still need to access it), NumPy's memory-mapped data structure (memmap) is a good choice:
from tempfile import mkdtemp
import os.path as path

filename = path.join(mkdtemp(), 'tempfile.dat')
# now create a memory-mapped file with shape and data type
# based on the original R data frame:
A = NP.memmap(filename, dtype="float32", mode="w+", shape=(150, 5))
# call flush() to write any changes you make to the array back to disk
# to write data to the memmap array (actually an array-like memory-map
# to the data stored on disk):
A[:] = somedata[:]  # somedata stands for the array loaded earlier
Why go through a data.frame when exprs(immgen) returns a matrix and your end goal is to have your data in a matrix?
Passing the matrix to numpy is straightforward (and can even be made without making a copy):
http://rpy.sourceforge.net/rpy2/doc-2.1/html/numpy.html#from-rpy2-to-numpy
This should beat, in both simplicity and efficiency, the suggestion of going through a text representation of numerical data in flat files as a way to exchange data.
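A rough sketch of that matrix route via rpy2's numpy converter (the numpy2ri module has existed across several rpy2 versions, though the exact API varies; Biobase must be available in R for exprs()):
import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import numpy2ri

numpy2ri.activate()  # auto-convert R arrays/matrices to numpy arrays

# ... some code to build path, as in the question
robjects.r("load('%s')" % path)  # loads immgen
robjects.r("library(Biobase)")   # provides exprs()
m = robjects.r("exprs(immgen)")  # an R matrix
expression_data = np.asarray(m)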
You seem to be working with Bioconductor classes, and might be interested in the following:
http://pypi.python.org/pypi/rpy2-bioconductor-extensions/
