I'm looking in the NumPy docs for how to save an array, and they provide this:
import numpy as np
from tempfile import TemporaryFile
outfile = TemporaryFile()
np.save(outfile, array)
I tried doing it without the tempfile part and it worked, so I'm wondering: what's the point of that?
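Indeed, a plain filename works just as well; the docs use TemporaryFile only so the example cleans up after itself. A minimal round-trip sketch (the path below is illustrative, not from the docs):

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)

# Saving to an ordinary path works fine; a temporary file is only
# useful when you want the file discarded automatically.
path = os.path.join(tempfile.gettempdir(), "demo_array.npy")
np.save(path, arr)

loaded = np.load(path)
print(np.array_equal(arr, loaded))  # True
os.remove(path)
```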
Sorry for the novice question. I'm just starting to learn Python and I don't have any coding background. I already ended up doing this process manually, but I'm curious what the automated process would look like and would like to learn from the example.
So I have a folder of 50 npz files. I need to pull a specific 29x5 array from each npz file and concatenate all of it into a single csv. This is what I did manually:
import numpy as np
import os
os.chdir('D:/Documents/WorkingDir')
data1=np.load('file1.npz', mmap_mode='r')
array1 = data1.f.array
#data2=etc.
#array2=etc.
grandarray = np.concatenate((array1,array2), axis = 0)
np.savetxt('grandarray.csv', grandarray, delimiter=",")
I gather you can use glob to get a list of all files in the same folder with the .npz extension, but I can't figure out how to turn my manual process into a script and automate it. I'll gladly take links to tutorial websites that can get me going in this direction as well. Thank you all for your time.
You need iteration. A plain loop would be fine, but a list comprehension works nicely here.
import glob
import numpy as np
import os
os.chdir('D:/Documents/WorkingDir')
filenames = glob.glob('*.npz')
data_arrays = [np.load(filename, mmap_mode='r').f.array for filename in filenames]
grandarray = np.concatenate(data_arrays, axis = 0)
np.savetxt('grandarray.csv', grandarray, delimiter=",")
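The comprehension can also be written as an explicit loop. Here is a sketch wrapped in a helper function so it works for any folder; the name `combine_npz_to_csv` is my own, and it assumes each .npz file stores the target array under the key `'array'`:

```python
import glob
import os

import numpy as np

def combine_npz_to_csv(folder, array_name, out_csv):
    """Load `array_name` from every .npz file in `folder`,
    stack the pieces row-wise, and write them to one CSV."""
    filenames = sorted(glob.glob(os.path.join(folder, '*.npz')))
    arrays = []
    for filename in filenames:
        # NpzFile supports the context manager protocol, so the
        # file handle is closed as soon as the array is read.
        with np.load(filename) as data:
            arrays.append(data[array_name])
    grand = np.concatenate(arrays, axis=0)
    np.savetxt(out_csv, grand, delimiter=",")
    return grand
```

Called as `combine_npz_to_csv('D:/Documents/WorkingDir', 'array', 'grandarray.csv')`, this avoids the `os.chdir` step entirely.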
I am trying to concatenate many numpy arrays; each array is stored in its own file. The problem is that I have a lot of files, and there isn't enough memory to create one big array Data_Array = np.zeros((1000000,7000)) to put all my files into. So I found in this question, Combining NumPy arrays, that I can use np.concatenate:
import numpy as np
import matplotlib.pyplot as plt

file1 = np.load('file1_Path.npy')
file2 = np.load('file2_Path.npy')
file3 = np.load('file3_Path.npy')
file4 = np.load('file4_Path.npy')
dataArray = np.concatenate((file1, file2, file3, file4), axis=0)
test = dataArray.shape
print(test)
print(dataArray)
print(dataArray.shape)
plt.plot(dataArray.T)
plt.show()
This approach gives me a very good result, but now I need to replace file1, file2, file3, and file4 with the path to the folder containing my files:
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys
fpath ="Path_To_Big_File"
npyfilespath =r'Path_To_Many_Numpy_Files'
os.chdir(npyfilespath)
npfiles= glob.glob("*.npy")
npfiles.sort()
for i, npfile in enumerate(npfiles):
    dataArray = np.concatenate(npfile, axis=0)
    np.save(fpath, all_arrays)
It gives me this error:
np.concatenate(npfile, axis=0)
ValueError: zero-dimensional arrays cannot be concatenated
Could you please help me make np.concatenate work here?
If you wish to use large arrays, just use np.memmap instead of loading the data into memory. The advantage of memmap is that data is always saved to disk when necessary. For example, you can create a memory mapped array in the following way:
import numpy as np
# np.int was removed from recent NumPy releases; use an explicit dtype
a = np.memmap('myFile', dtype=np.int64, mode='w+', shape=(1000000, 8000))
You can then use 'a' as a normal numpy array.
The limit is then your hard disk! This creates a file on disk that you can read later: just change the mode to 'r' and read data from the array.
More info about memmap here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
In order to fill that array from npy files of shape (1,8000), just write:
for i, npFile in enumerate(npFiles):
    a[i, :] = np.load(npFile)
a.flush()
The flush method ensures everything has been written to disk.
I have 166600 numpy files, and I want to combine them into one big numpy file, file by file.
I mean that the big file must be built from the beginning: first the first file is read and written, so the big file contains only that file; then the second file is read and appended, so the big file contains the first two files; and so on.
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys
fpath ="path_Of_my_final_Big_File"
npyfilespath ="path_of_my_numpy_files"
os.chdir(npyfilespath)
npfiles= glob.glob("*.npy")
npfiles.sort()
all_arrays = np.zeros((166600, 8000))
for i, npfile in enumerate(npfiles):
    all_arrays[i] = np.load(os.path.join(npyfilespath, npfile))
np.save(fpath, all_arrays)
If I understand your questions correctly, you can use numpy.concatenate for this:
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys
fpath ="path_Of_my_final_Big_File"
npyfilespath ="path_of_my_numpy_files"
os.chdir(npyfilespath)
npfiles= glob.glob("*.npy")
npfiles.sort()
all_arrays = []
for npfile in npfiles:
    all_arrays.append(np.load(os.path.join(npyfilespath, npfile)))
np.save(fpath, np.concatenate(all_arrays))
Depending on the shape of your arrays and the intended concatenation, you might need to specify the axis parameter of concatenate.
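A quick illustration of what the axis parameter changes, with two small arrays of the same shape:

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((2, 3))

# axis=0 stacks the arrays row-wise (column counts must match)...
print(np.concatenate([a, b], axis=0).shape)  # (4, 3)

# ...while axis=1 stacks them column-wise (row counts must match).
print(np.concatenate([a, b], axis=1).shape)  # (2, 6)
```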
I have part of an API in Flask that currently returns a Numpy array in Json, I need to offer the option to return as a CSV rather than as Json.
The only way I have successfully done this is to save the NumPy array as a CSV using numpy.savetxt and then serve that file. I found (in "How can I generate a file on the fly and delete it after download?") that I can avoid leaving the file behind, but that still feels kludgy.
Is there a way to return a Numpy array as a CSV without going via the file?
Yes you can,
from flask import make_response

@app.route('/download')
def download():
    csv = convert_numpy_array_to_csv(your_numpy_array)
    response = make_response(csv)
    response.headers["Content-Disposition"] = "attachment; filename=array.csv"
    return response
You don't have to save the CSV to a file. If you can't avoid creating files, you can create them in your temp folder (obtained via import tempfile; tempfile.gettempdir()). Files there are typically cleaned up automatically, for example when your system is restarted.
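If you do go the temp-folder route, a sketch of writing the CSV there looks like this (the array contents here are illustrative):

```python
import tempfile

import numpy as np

arr = np.arange(12).reshape(3, 4)

# NamedTemporaryFile with delete=False yields a path inside
# tempfile.gettempdir() that savetxt can write to; you can then
# serve the file and remove it yourself afterwards.
with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False) as f:
    np.savetxt(f, arr, delimiter=',')
    tmp_path = f.name

print(tmp_path)
```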
Well, if you want to use the numpy.savetxt function, then you can just use cStringIO:
from cStringIO import StringIO
output = StringIO()
numpy.savetxt(output, numpy_array)
csv_string = output.getvalue()
For python 3 you would import StringIO or BytesIO from the io module instead.
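On Python 3 the same trick looks like this: savetxt writes text, so io.StringIO works as the target directly.

```python
import io

import numpy as np

arr = np.arange(6).reshape(2, 3)

# Write CSV text into an in-memory buffer instead of a file.
output = io.StringIO()
np.savetxt(output, arr, delimiter=',')
csv_string = output.getvalue()

print(csv_string)
```

The resulting csv_string can be handed straight to make_response in the Flask answer above, with no file involved.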
I am writing a function that converts a 2d python array into a matlab file. Here is my code so far...
def save_array(arr, fname):
    import scipy.io
    import numpy
    out_dict = {}
    out_dict[fname] = arr
    scipy.io.savemat(fname.mat, out_dict)
I want fname to be a string, but I am not sure how I can get the savemat part to work.
import scipy.io
import numpy as np

def save_array(arr, arrname, fname):
    """
    Save an array to a .mat file

    Inputs:
      arr: ndarray to save
      arrname: name to save the array as (string)
      fname: .mat filename (string)
    """
    out_dict = {arrname: arr}
    scipy.io.savemat(fname, out_dict)

save_array(np.array([1, 2, 3]), 'arr', 'test.mat')
Might be worth doing a python tutorial or two. This is very basic stuff!