I have a large numpy array and I'd like to dump it into a file using ASCII format. I would like to specify the format. This works:
import numpy
a = numpy.random.rand(5)
fmt = "{:.11e}\n"
with open("out.dat", "w") as f:
    for item in a:
        f.write(fmt.format(item))
but is slow because I manually loop over all entries of a. Is there a way to handle this in only one write operation?
Provided RAM is not an issue, you can try formatting the array to a string and then exporting it:
import numpy as np
# "%.11e" matches the exponent format asked for in the question
a_str = np.array2string(a, formatter={'float_kind': lambda x: "%.11e" % x}, separator='\n', threshold=np.inf)[1:-1]
with open("out.dat", "w") as f:
    f.write(a_str)
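Another option that avoids the manual loop is numpy.savetxt, which takes a printf-style fmt argument and writes one entry per line for a 1-D array. A minimal sketch, assuming the array is one- or two-dimensional; the temporary output path is only there to keep the example self-contained:

```python
import os
import tempfile

import numpy as np

a = np.random.rand(5)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "out.dat")

# fmt is a printf-style format; for a 1-D array each entry goes on its own line.
np.savetxt(out_path, a, fmt="%.11e")

# Read the file back to inspect what was written.
with open(out_path) as f:
    lines = f.read().splitlines()
```

Each element of lines should then be one number in exponent notation with 11 decimals.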
I'm converting my code from MATLAB to Python. In MATLAB, I usually save my variables like this:
for i = 1:4
    for j = 1:3
        save(['data_', int2str(i), '_', int2str(j), '.mat'], 'var1', 'var2')
    end
end
so in the end I have files like data_1_1, data_1_2, etc.
How can I modify the code below to follow a similar naming convention?
import pickle
import numpy as np
Tests = 5
data = {}
for i in range(Tests):
    for j in range(4):
        data['r'] = 5
        data['m'] = 500
        data['n'] = 500
        data['X'] = np.random.rand(data['m'], data['n'])
        data['Y'] = np.random.rand(data['m'], data['n'])
        with open('data{}.pickle'.format(i), 'wb') as f:
            pickle.dump(data, f)
I would like to save my pickled files as, say, data_1_2, etc.
Help! I'm new to Python. Thanks!
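One possible way to get names like data_1_2 is to put both loop indices into the format string. This is only a sketch: out_dir is a hypothetical temporary directory, the arrays are shrunk to 50x50 to keep the example light, and the + 1 is an assumption that you want MATLAB-style 1-based numbering.

```python
import os
import pickle
import tempfile

import numpy as np

tests = 5
out_dir = tempfile.mkdtemp()  # hypothetical output directory

for i in range(tests):
    for j in range(4):
        # Smaller arrays than in the question, just to keep the sketch quick.
        data = {'r': 5, 'm': 50, 'n': 50}
        data['X'] = np.random.rand(data['m'], data['n'])
        data['Y'] = np.random.rand(data['m'], data['n'])
        # Both indices go into the name; + 1 mimics MATLAB's 1-based counting.
        filename = os.path.join(out_dir, 'data_{}_{}.pickle'.format(i + 1, j + 1))
        with open(filename, 'wb') as f:
            pickle.dump(data, f)

saved = sorted(os.listdir(out_dir))
```

Drop the + 1 if you would rather keep Python's 0-based indices in the file names.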
I use Python for scientific computation. I have several numpy arrays, each of which is huge, so I cannot store all of them in memory at the same time. I want to save the arrays to disk and read them back one at a time to do some calculation. What is a Pythonic way to do this?
I know if all the data are stored in the memory, I can create a list named array_list like this:
array_list = []
for i0 in range(n_array):
    t_ayyay = do_some_calculate()
    array_list.append(t_ayyay)
and when I want to use them:
for i0 in range(n_array):
    t_ayyay = array_list[i0]
    # do something.
How can I save array_list to disk so that I can read each object by index without loading all of them into memory?
Thanks.
Pickle is your friend for serialization.
import pickle
some_list = [....]
# pickle files must be opened in binary mode
with open("some_list.pickle", "wb") as pickle_out:
    pickle.dump(some_list, pickle_out)
To load your saved list:
with open("some_list.pickle", "rb") as pickle_in:
    some_list = pickle.load(pickle_in)
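Note that pickling the whole list still loads everything at once when reading it back. If the goal is to fetch one array by index, a possible alternative is one .npy file per array via numpy.save and numpy.load. This is a sketch under assumptions: array_path and the file naming scheme are hypothetical, and do_some_calculate here is just a stand-in for the real computation.

```python
import os
import tempfile

import numpy as np

n_array = 3
out_dir = tempfile.mkdtemp()  # hypothetical storage directory


def array_path(index):
    # Hypothetical naming scheme: one .npy file per list index.
    return os.path.join(out_dir, "array_{}.npy".format(index))


def do_some_calculate(index):
    # Stand-in for the real computation from the question.
    return np.full((4, 4), float(index))


# Write phase: only one array is held in memory at a time.
for i0 in range(n_array):
    np.save(array_path(i0), do_some_calculate(i0))

# Read phase: load a single array by its index when it is needed.
second = np.load(array_path(1))

# mmap_mode="r" maps the file instead of reading it fully, handy for slicing huge arrays.
third_row = np.load(array_path(2), mmap_mode="r")[0]
```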
I am working with datasets stored in large text files. For the analysis I am carrying out, I open the files, extract parts of the dataset and compare the extracted subsets. My code works like so:
from math import ceil
# text mode, so the lines can be written back out as strings
with open("seqs.txt", "r") as f:
    f = f.readlines()
assert type(f) == list, "ERROR: file object not converted to list"
fives = int( ceil(0.05*len(f)) )
thirds = int( ceil(len(f)/3) )
## top/bottom 5% of dataset
low_5=f[0:fives]
top_5=f[-fives:]
## top/bottom 1/3 of dataset
low_33=f[0:thirds]
top_33=f[-thirds:]
## Write lists to file
# top-5
with open("high-5.out","w") as outfile1:
    for i in top_5:
        outfile1.write("%s" %i)
# low-5
with open("low-5.out","w") as outfile2:
    for i in low_5:
        outfile2.write("%s" %i)
# top-33
with open("high-33.out","w") as outfile3:
    for i in top_33:
        outfile3.write("%s" %i)
# low-33
with open("low-33.out","w") as outfile4:
    for i in low_33:
        outfile4.write("%s" %i)
I am trying to find a cleverer way of automating the process of writing the lists out to files. In this case there are only four, but in future cases I may end up with as many as 15-25 lists, so I would like a function to take care of this. I wrote the following:
def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" %i)
but the resulting file only contains the final list when I call the function like so:
write_to_file(low_33,low_5,top_33,top_5)
I understand that I have to define an output file for each list (which I am not doing in the function above), I'm just not sure how to implement this. Any ideas?
Make your variable names match your filenames and then use a dictionary to hold them instead of keeping them in the global namespace:
# the values are the lists built in the question
data = {'high_5': top_5,
        'low_5': low_5,
        'high_33': top_33,
        'low_33': low_33}
for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)
This keeps your data in a single, easy-to-use place, and, assuming you want to apply the same actions to every list, you can keep using the same pattern.
As mentioned by PM2Ring below, it would be advisable to use underscores (as you do in the variable names) instead of dashes (as you do in the filenames), because you can then pass the dictionary keys as keyword arguments into a writing function:
write_to_file(**data)
This would equate to:
write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data
From this you could use one of the functions defined by the other answers.
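To make the keyword-argument idea concrete, here is a rough sketch of a writer that accepts **kwargs. The out_dir parameter and the toy data are assumptions added for illustration; they are not part of the original suggestion.

```python
import os
import tempfile


def write_to_file(out_dir, **named_lists):
    # Each keyword name becomes the file stem, each value the lines to write.
    paths = []
    for name, lines in named_lists.items():
        path = os.path.join(out_dir, "{}.out".format(name))
        with open(path, "w") as outfile:
            outfile.writelines(lines)
        paths.append(path)
    return paths


# Toy stand-ins for the real slices of the dataset.
data = {"low_5": ["a\n"], "high_5": ["z\n"]}
out_dir = tempfile.mkdtemp()  # hypothetical output directory
written = write_to_file(out_dir, **data)
```

Because keyword names must be valid identifiers, this only works with underscore-style keys such as low_5, not low-5.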
You could have one output file per argument by incrementing a counter for each argument. For example:
def write_to_file(*args):
    for index, i in enumerate(args):
        with open("{}.out".format(index+1), "w") as outfile:
            # write every line of the list, not the list's repr
            outfile.writelines(i)
The example above will create output files "1.out", "2.out", "3.out", and "4.out".
Alternatively, if you had specific names you wanted to use (as in your original code), you could do something like the following:
def write_to_file(args):
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            # write every line of the list, not the list's repr
            outfile.writelines(data)
args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)
which would create output files "low-33.out", "low-5.out", "high-33.out", and "high-5.out".
Don't try to be clever. Instead, aim for code that is readable and easy to understand. You can group repeated code into a function, for example:
from math import ceil
def save_to_file(data, filename):
    # text mode, because the items are strings
    with open(filename, 'w') as f:
        for item in data:
            f.write('{}'.format(item))
with open('data.txt') as f:
    numbers = list(f)
five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)
save_to_file(numbers[:five_percent], 'low-5.out')
save_to_file(numbers[-five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[-thirty_three_percent:], 'high-33.out')
Update
If you have quite a number of lists to write, then it makes sense to use a loop. I suggest having two functions, save_top_n_percent and save_low_n_percent, to help with the job. They contain a little duplicated code, but separating them into two functions makes the code clearer and easier to understand.
def save_to_file(data, filename):
    # text mode, because the items are strings
    with open(filename, 'w') as f:
        for item in data:
            f.write(item)
def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[-n_percent:], 'top-{}.out'.format(n))
def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))
with open('data.txt') as f:
    numbers = list(f)
for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
On this line you are opening up a file called .out each time and writing to it.
with open(".out", "w") as outfile:
You need to make the ".out" filename unique for each i in args. You can achieve this by passing each argument as a list that contains the file name and the data.
def write_to_file(*args):
    for i in args:
        # i[0] is the file name, i[1] is the list of lines
        with open("%s.out" % i[0], "w") as outfile:
            outfile.writelines(i[1])
And pass in arguments like so...
write_to_file(["low_33",low_33],["low_5",low_5],["top_33",top_33],["top_5",top_5])
You are creating a file called '.out' and overwriting it each time.
def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        # look up the list by its variable name
        contents = globals()[i]
        with open(filename, "w") as outfile:
            outfile.writelines(contents)
write_to_file("low_33", "low_5", "top_33", "top_5")
https://stackoverflow.com/a/6504497/3583980 (variable name from a string)
This will create low_33.out, low_5.out, top_33.out, top_5.out and their contents will be the lists stored in these variables.
I need to write a long list of ints and floats with Python the same way fwrite would do in C - in a binary form.
This is necessary to create input files for another piece of code I am working with.
What is the best way to do this?
You can do this quite simply with the struct module.
For example, to write a list of 32-bit integers in binary:
import struct
ints = [10,50,100,2500,256]
# binary mode is required for packed bytes
with open('output', 'wb') as fh:
    data = struct.pack('i' * len(ints), *ints)
    fh.write(data)
On a little-endian machine this writes b'\n\x00\x00\x002\x00\x00\x00d\x00\x00\x00\xc4\t\x00\x00\x00\x01\x00\x00'
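Since the question mentions both ints and floats, and the reader is another program, it may help to pin the byte order explicitly. A small sketch, assuming the consumer expects little-endian 32-bit ints followed by 64-bit doubles; that layout and the file name are assumptions, so check what the other code actually reads:

```python
import os
import struct
import tempfile

ints = [10, 50, 100]
floats = [0.5, 2.25]

# "<" fixes little-endian order with no padding; "i" is a 32-bit int, "d" a 64-bit float.
fmt = "<{}i{}d".format(len(ints), len(floats))
payload = struct.pack(fmt, *ints, *floats)

path = os.path.join(tempfile.mkdtemp(), "output.bin")  # hypothetical file name
with open(path, "wb") as fh:
    fh.write(payload)

# Reading back uses the same format string.
with open(path, "rb") as fh:
    values = struct.unpack(fmt, fh.read())
```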
Have a look at numpy's tofile.
With the array method tofile you can write binary data:
import numpy as num
# define output format (32-bit float); tofile itself takes no dtype argument
numdtype = num.dtype('f4')
# convert, then write the raw bytes
myarray.astype(numdtype).tofile('filename')
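For completeness, here is a round trip with numpy.fromfile. It is only a sketch: the array contents, the file name, and the choice of little-endian 32-bit floats are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

myarray = np.arange(6, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "data.bin")  # hypothetical file name

# tofile writes raw bytes only: no shape, dtype, or byte-order header.
myarray.astype("<f4").tofile(path)

# The reader has to supply the same dtype (and reshape if needed).
restored = np.fromfile(path, dtype="<f4")
```

Because the file carries no metadata, the dtype and shape have to be agreed on with whatever program reads it.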
Another way is to use numpy memmaps:
# create memmap
data = num.memmap('filename', mode='w+', dtype=num.float64, offset=myoffset, shape=(my_shape), order='C')
# put some data into it:
data[1:10] = num.random.rand(9)
# flush to disk:
data.flush()
del data
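A self-contained version of the memmap idea, including reopening the file for reading, could look like the following. The file name and the shape of 100 elements are placeholders chosen for this sketch.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.dat")  # hypothetical file name

# mode="w+" creates (or overwrites) the file with the requested shape.
data = np.memmap(path, mode="w+", dtype=np.float64, shape=(100,))
data[1:10] = np.random.rand(9)
data.flush()
del data

# Reopen read-only; slices are read from disk on demand.
view = np.memmap(path, mode="r", dtype=np.float64, shape=(100,))
first = float(view[0])
chunk = np.array(view[1:10])  # copy the slice into ordinary memory
```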