I have a large binary file that I want to read as a 48x1414339 array. I read it in this way:
import array

with open(fname, 'rb') as f:
    s = f.read()
a = array.array('f', s)
But this gives me a flat 1D array. Is there a way to keep the columns distinct?
Wrap it in a class and implement e.g. __getitem__() to convert an index pair to the linear index. Using separate arrays is probably just adding overhead unless you plan to use rows separately.
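For illustration, here is a minimal sketch of such a wrapper (the class name is made up, and it assumes the file stores the array row by row with 1414339 values per row):
import array

class Array2D:
    def __init__(self, data, ncols):
        self.data = data    # the flat array.array('f')
        self.ncols = ncols  # number of values per row

    def __getitem__(self, index):
        row, col = index    # map the (row, col) pair to a linear index
        return self.data[row * self.ncols + col]

a2d = Array2D(a, 1414339)  # a is the array.array read above
print(a2d[0, 5])           # element in row 0, column 5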
How can I save a ragged tensor as a file on my disk and then reuse it in calculations, opening it from the disk? The tensor consists of a nested array of numbers with 4 digits after the decimal point. (I'm working in Google Colab and using Google Drive to save my files; I only know a little Python.)
Here is my data:
I take the column "sim_fasttex", which is a list of lists of different lengths, reshape each of them according to "h" and "w", and collect all these matrices in one list. So finally it is going to be a ragged tensor of shape (number of rows in the initial table, variable length of a matrix, variable height of a matrix).
I don't know your full context, but you can save almost any Python object to a file using the pickle module, like this:
import pickle

the_object = object  # stand-in for whatever you want to save
with open("a_file_name.pkl", "wb") as f:
    pickle.dump(the_object, f)
And later you can load that same object:
import pickle

with open("a_file_name.pkl", "rb") as f:
    the_object = pickle.load(f)
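Applied to this use case, a small sketch: the ragged tensor can be kept as a plain list of 2D numpy arrays of varying shape (the shapes below are hypothetical) and pickled as one object:
import pickle
import numpy as np

# hypothetical stand-ins for the reshaped "sim_fasttex" matrices
ragged = [np.zeros((3, 4)), np.zeros((2, 7)), np.zeros((5, 5))]

with open("ragged.pkl", "wb") as f:
    pickle.dump(ragged, f)

with open("ragged.pkl", "rb") as f:
    restored = pickle.load(f)

print([m.shape for m in restored])  # [(3, 4), (2, 7), (5, 5)]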
I would like to combine two FITS files, by taking a slice out of one and inserting it into the other. The slice would be based on an angle measured from the centre pixel, see example image below:
Can this be done using Astropy? There are many questions on combining FITS files on the site, but most of these are about simply adding two files together, rather than combining segments like this.
Here is one recommended approach:
1. Read in your two files
Assuming the data is in an ImageHDU data array:
from astropy.io import fits
# read the numpy arrays out of the files
# assuming they contain ImageHDUs
data1 = fits.getdata('file1.fits')
data2 = fits.getdata('file2.fits')
2. Cut out the sections and put them into a new numpy array
Build up indices1 and indices2 for the desired sections; a simple numpy index then fills each section of a new numpy array. Inspired by https://stackoverflow.com/a/18354475/15531842, the sector_mask function defined in that answer can be used to get indices for each array using angular slices.
mask = sector_mask(data1.shape, centre=(53,38), radius=100, angle_range=(280,340))
indices1 = ~mask
indices2 = mask
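For reference, a minimal sketch of what such a sector_mask could look like, along the lines of the linked answer (the exact angle convention is an assumption; check against the original):
import numpy as np

def sector_mask(shape, centre, radius, angle_range):
    # boolean mask: True for pixels within `radius` of `centre`
    # and within the (min, max) angle range, given in degrees
    x, y = np.ogrid[:shape[0], :shape[1]]
    cx, cy = centre
    tmin, tmax = np.deg2rad(angle_range)
    if tmax < tmin:      # wrap the range if it crosses 0/360
        tmax += 2 * np.pi
    r2 = (x - cx)**2 + (y - cy)**2
    theta = np.arctan2(x - cx, y - cy) - tmin
    theta %= 2 * np.pi   # map every angle into [0, 2*pi)
    return (r2 <= radius * radius) & (theta <= (tmax - tmin))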
Then these indices can be used to transfer the data into a new array.
import numpy as np
newdata = np.zeros_like(data1)
newdata[indices1] = data1[indices1]
newdata[indices2] = data2[indices2]
If the coordinate system is well known, then it may be possible to use astropy's Cutout2D class, although I was not able to figure out how to fully use it, and it wasn't clear from the example whether it can do an angular slice. See the astropy example at https://docs.astropy.org/en/stable/nddata/utils.html#d-cutout-using-an-angular-size
3a. Write out the new array as a new file
If special header information is not needed in the new file, then the numpy array with the new image can be written out to a FITS file with one line of astropy code.
# this is an easy way to write a numpy array to FITS file
# no header information is carried over
fits.writeto('file_combined.fits', data=newdata)
3b. Carry the FITS header information over to the new file
If there is a desire to carry over header information then an ImageHDU can be built from the numpy array and include the desired header as a dictionary.
img_hdu = fits.ImageHDU(data=newdata, header=my_header_dict)
hdu_list = fits.HDUList()
hdu_list.append(fits.PrimaryHDU())
hdu_list.append(img_hdu)
hdu_list.writeto('file_combined.fits')
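One caveat: depending on the astropy version, the header argument may need to be an actual fits.Header rather than a plain dict. A hedged sketch of the conversion (the keywords here are made up):
from astropy.io import fits

# hypothetical keywords, purely for illustration
my_header_dict = {'OBJECT': 'combined', 'ORIGIN': 'sector merge'}
my_header = fits.Header(my_header_dict)  # Header accepts dict-like input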
I am trying to port this bit of MATLAB code to Python:
MATLAB
function write_file(im, name)
    fp = fopen(name, 'wb');
    M = size(im);
    fwrite(fp, [M(1) M(2) M(3)], 'int');
    fwrite(fp, im(:), 'float');
    fclose(fp);
end
where im is a 3D matrix. As far as I understand, the function writes a binary file with a header row containing the matrix size; the header is made of 3 integers. Then im is written as a single column of floats. In MATLAB this takes a few seconds for a 150MB file.
Python
import struct
import numpy as np
def write_image(im, file_name):
    with open(file_name, 'wb') as f:
        l = im.shape[0] * im.shape[1] * im.shape[2]
        header = np.array([im.shape[0], im.shape[1], im.shape[2]])
        header_bin = struct.pack("I" * 3, *header)
        f.write(header_bin)
        im_bin = struct.pack("f" * l, *np.reshape(im, (l, 1), order='F'))
        f.write(im_bin)
where im is a numpy array. This code works well, as I compared its output with the binary returned by MATLAB and they are the same. However, for the 150MB file it takes several seconds and tends to drain all the memory (in the image linked I stopped the execution to avoid that, but you can see how it builds up!).
This does not make sense to me as I am running the function on a 15GB of RAM PC. How come a 150MB file processing requires so much memory?
I'd be happy to use a different method, as long as it is possible to have two formats for the header and the data column.
There is no need to use struct to save your array. numpy.ndarray has a convenience method for saving itself in binary mode: ndarray.tofile. The following should be much more efficient than creating a gigantic string with the same number of elements as your array:
import numpy as np

def write_image(im, file_name):
    with open(file_name, 'wb') as f:
        # note: np.array(im.shape) uses the platform default integer;
        # pass dtype=np.int32 if the header must match struct.pack("I" * 3)
        np.array(im.shape).tofile(f)
        im.T.tofile(f)
tofile always saves in row-major C order, while MATLAB uses column-major Fortran order. The simplest way to get around this is to save the transpose of the array. In general, ndarray.T should create a view (wrapper object pointing to the same underlying data) instead of a copy, so your memory usage should not increase noticeably from this operation.
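As a sanity check, here is a sketch of the reading side (assumptions: the image data is float32 and the header was written with a 64-bit platform default integer; adjust the dtypes to your actual data):
import numpy as np

def read_image(file_name):
    with open(file_name, 'rb') as f:
        # read the 3-integer header written by np.array(im.shape).tofile(f)
        shape = np.fromfile(f, dtype=np.int64, count=3)
        flat = np.fromfile(f, dtype=np.float32)
    # the data was written as im.T in C order, i.e. im in Fortran order
    return flat.reshape(tuple(shape), order='F')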
I want to output the numeric result from image processing to an .xls file, in one line, exactly in cells horizontally. Can anybody advise which Python module to use and what code to add? In other words, how do I arrange digits from an array and put them exactly into Excel cells horizontally?
Code fragment:
import glob
import mahotas
import numpy as np
from scipy import fftpack

def fouriertransform(self):  # function for FT computation
    for filename in glob.iglob('*.tif'):
        imgfourier = mahotas.imread(filename)  # read the image
        arrayfourier = np.array([imgfourier])  # make an array
        # Take the fourier transform of the image.
        F1 = fftpack.fft2(imgfourier)
        # Now shift so that low spatial frequencies are in the center.
        F2 = fftpack.fftshift(F1)
        # the 2D power spectrum is:
        psd2D = np.abs(F2)**2
        print(psd2D)
        f.write(str(psd2D))  # write to file (f is an already-open file object)
This should be a comment, but I don't have enough rep.
What other people say
It looks like this is a duplicate of Python Writing a numpy array to a CSV File, which uses numpy.savetxt to generate a csv file (which excel can read)
Example
Numpy is probably the easiest way to go, in my opinion. You could get fancy and use the ndarray.flatten method. After that, it would be as simple as saving the array with a comma separating each element. Something like this (tested!) is even simpler:
import numpy as np

arr = np.ones((3, 3))
# fmt='%g' keeps the numbers compact so the output matches the sample below
np.savetxt("test.csv", arr, delimiter=',', newline=',', fmt='%g')
Note this is a small hack, since newlines are treated as commas. This makes "test.csv" look like:
1,1,1,1,1,1,1,1,1, # there are 9 ones there, I promise! 3x3
Note there is a trailing comma. I was able to open this in excel no problem, voila!
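If the goal is one row of cells per processed image, here is a hedged sketch (the two arrays are stand-ins for the psd2D results; this assumes all images have the same size):
import numpy as np

# stand-ins for the psd2D arrays computed per image
results = [np.ones((3, 3)), np.zeros((3, 3))]

# flatten each spectrum into one row, then write one row per image
rows = np.vstack([r.ravel() for r in results])
np.savetxt("results.csv", rows, delimiter=',', fmt='%g')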
I am currently trying to read a Fortran file with Python using the following technique:
import struct

mylist = []
with open(myfile, "rb") as f:
    for i in range(0, n):
        s = struct.unpack('=f', f.read(4))
        mylist.append(s[0])
But it is very slow for large arrays. Is there a way to read the contents of the entire loop in one go and put them into mylist, to avoid converting and appending each item one by one?
Thank you very much.
This is what the array module is for:
import array

a = array.array('f')
a.fromfile(f, n)  # f is the open binary file, n the number of floats to read
Now you can use the array object like a normal sequence type. You can also convert it to a list if you need to, via its tolist() method.
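For completeness, a sketch of how the whole loop from the question could be replaced, assuming myfile and n are defined as there:
import array

with open(myfile, "rb") as f:
    a = array.array('f')
    a.fromfile(f, n)   # reads n 4-byte floats in a single call
mylist = a.tolist()    # optional: convert to a plain list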