Save file to pickle with nested loop indices as file names - python

I'm converting my code from MATLAB to Python. In MATLAB, to save my variables I usually do this:
for i = 1:4
    for j = 1:3
        save(['data_', int2str(i), '_', int2str(j), '.mat'], 'var1', 'var2')
    end
end
so in the end I have files like data_1_1, data_1_2, etc.
How can I modify the code below to have a similar naming convention?
import pickle
import numpy as np

Tests = 5
data = {}
for i in range(Tests):
    for j in range(4):
        data['r'] = 5
        data['m'] = 500
        data['n'] = 500
        data['X'] = np.random.rand(data['m'], data['n'])
        data['Y'] = np.random.rand(data['m'], data['n'])
        with open('data{}.pickle'.format(i), 'wb') as f:
            pickle.dump(data, f)
I would like to save my pickled files as, say, data_1_2, etc.
Help! I'm new to Python. Thanks!
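A minimal sketch of one way to do this, assuming you want filenames that mirror the MATLAB 1-based convention (data_1_1.pickle, data_1_2.pickle, ...):
import pickle
import numpy as np

Tests = 5
for i in range(Tests):
    for j in range(4):
        data = {'r': 5, 'm': 500, 'n': 500}
        data['X'] = np.random.rand(data['m'], data['n'])
        data['Y'] = np.random.rand(data['m'], data['n'])
        # i and j start at 0 in Python, so add 1 to match MATLAB's 1-based names
        with open('data_{}_{}.pickle'.format(i + 1, j + 1), 'wb') as f:
            pickle.dump(data, f)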

Saving multiple arrays in a text file (python)

I have 500 arrays named A0 to A499 (all arrays are of different sizes). I want to save these arrays in a text file. Is there any way to get this done? Also, if possible, I would like to keep their names (A0, A1, etc.) so that it is easier to recall them later.
I am able to save a single array using np.savetxt, but I have no idea how to do it for these 500 arrays.
Thank you very much.
for i in range(500):
    exec("A%s = SMtoM(outputS(115, 15, 0.62))" % (i))
This is how I made my 500 arrays!
Using pickle:
import pickle

def load(filename):
    with open(filename, 'rb') as f:
        my_lists = pickle.load(f)
    return my_lists

def save(filename, my_lists):
    with open(filename, 'wb') as f:
        pickle.dump(my_lists, f)

# Where my_lists = [A0, A1, ..., A499]
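A hedged usage sketch of those two functions, assuming the 500 arrays are collected in a single list instead of 500 separate A0..A499 names (np.random.rand stands in here for the question's SMtoM(outputS(...)) call, which is not defined in the post):
import numpy as np

# build the arrays in a list rather than with exec
my_lists = [np.random.rand(i % 10 + 1) for i in range(500)]

save('arrays.pickle', my_lists)       # uses the save() defined above
restored = load('arrays.pickle')      # uses the load() defined above
print(len(restored))                  # 500
print(np.allclose(my_lists[0], restored[0]))  # True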
Try:
outpt = open("file.txt", "w")
for array in arraylist:
    # write each array as a line
    outpt.write(str(array))
    outpt.write("\n")
outpt.close()
Pickle works even better, as mentioned above.

How to read in multiple documents with same code?

So I have a couple of documents, each of which has an x and y coordinate (among other things). I wrote some code which is able to filter out said x and y coordinates and store them in float variables.
Now, ideally I'd want to find a way to run the same code on all the documents I have (the number is not fixed, but let's say 3 for now), extract the x and y coordinates of each document, and calculate an average of these 3 x-values and 3 y-values.
How would I approach this? I've never done this before.
I successfully created the code to extract the relevant data from 1 file.
Also note: in reality each file has more than just 1 set of x and y coordinates, but this does not matter for the problem discussed at hand.
I'm just saying that so that the code does not confuse you.
with open('TestData.txt', 'r') as f:
    full_array = f.readlines()

del full_array[1:31]
del full_array[len(full_array) - 4:len(full_array)]

single_line = full_array[1].split(", ")
x_coord = float(single_line[0].replace("1 Location: ", ""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ", ""))
# Remove unnecessary stuff
category = single_line[6].replace(" Type: Class: 1D Descr: None", "")
In the end, I'd like to avoid writing the same code again for each file, especially since the number of files may vary. Right now I have 3 files, which equals 3 sets of coordinates, but on another day I might have 5, for example.
Use os.walk to find the files that you want. Then do your calculation for each file.
https://docs.python.org/2/library/os.html#os.walk
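A minimal sketch of collecting the matching file paths with os.walk (the root directory 'data' and the .txt extension are assumptions, adjust them to your layout):
import os

paths = []
for dirpath, dirnames, filenames in os.walk('data'):
    for name in filenames:
        if name.endswith('.txt'):
            paths.append(os.path.join(dirpath, name))

# paths now holds every matching file; run your parsing code on each entry
print(paths)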
First of all, create a function that reads a file via its filename and does the parsing your way. Then iterate through the directory; I assume the files are all in the same directory.
Here is the basic code:
import os

def readFile(filename):
    try:
        with open(filename, 'r') as file:
            data = file.read()
        return data
    except:
        return ""

folder = 'C:\\Users\\UserName\\Documents'
for filename in os.listdir(folder):
    #print(filename)
    # os.listdir gives bare names, so join them with the folder path
    data = readFile(os.path.join(folder, filename))
    print(data)
    #parse here
    #do the calculation here
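Putting the question's parsing code into such a function and averaging over all files, a hedged sketch might look like this (the folder path and the .txt extension are assumptions):
import os

def extract_coords(path):
    # the slicing/replace logic copied from the question
    with open(path, 'r') as f:
        full_array = f.readlines()
    del full_array[1:31]
    del full_array[len(full_array) - 4:len(full_array)]
    single_line = full_array[1].split(", ")
    x = float(single_line[0].replace("1 Location: ", ""))
    y = float(single_line[1])
    return x, y

folder = 'C:\\Users\\UserName\\Documents'   # assumed location of the files
xs, ys = [], []
for name in os.listdir(folder):
    if name.endswith('.txt'):               # assumed extension
        x, y = extract_coords(os.path.join(folder, name))
        xs.append(x)
        ys.append(y)

print(sum(xs) / len(xs), sum(ys) / len(ys))  # average x and y across all files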

Repeating code when sorting information from a txt-file - Python

I'm having some trouble keeping my code from repeating itself, like the title says, when I import data from a txt file. My question is whether there is a smarter way to loop the function. I'm still very new to Python in general, so I don't have good knowledge in this area.
The code that I'm using is the following
with open("fundamenta.txt") as fundamenta:
fundamenta_list = []
for row in fundamenta:
info_1 = row.strip()
fundamenta_list.append(info_1)
namerow_1 = fundamenta_list[1]
sol_1 = fundamenta_list[2]
pe_1 = fundamenta_list[3]
ps_1 = fundamenta_list[4]
namerow_2 = fundamenta_list[5]
sol_2 = fundamenta_list[6]
pe_2 = fundamenta_list[7]
ps_2 = fundamenta_list[8]
namerow_3 = fundamenta_list[9]
sol_3 = fundamenta_list[10]
pe_3 = fundamenta_list[11]
ps_3 = fundamenta_list[12]
So when the code is reading from "fundamenta_list", how do I change it to prevent code repetition?
It looks to me like your input file has records stored as blocks of 4 rows, which in turn are namerow, sol, pe, and ps, and you'll be creating objects that take these 4 fields. Assuming your object is called MyObject, you can do something like:
with open("test.data") as f:
objects = []
while f:
try:
(namerow, sol, pe, ps) = next(f).strip(), next(f).strip(), next(f).strip(), next(f).strip()
objects.append(MyObject(namerow, sol, pe, ps))
except:
break
then you can access your objects as objects[0] etc.
You could even make it into a function returning the list of objects like in Moyote's answer.
If I understood your question correctly, you may want to make a function out of your code, so you can avoid repeating the same code.
You can do this:
def read_file_and_save_to_list(file_name):
    with open(file_name) as f:
        list_to_return = []
        for row in f:
            list_to_return.append(row.strip())
        return list_to_return
Then afterwards you can call the function like this:
fundamenta_list = read_file_and_save_to_list("fundamenta.txt")
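To also get rid of the repeated namerow_1/sol_1/... assignments, a hedged sketch that groups the returned list into records of 4 fields (assuming the file really does repeat namerow, sol, pe, ps in that order, starting at index 1 as in the original code):
fundamenta_list = read_file_and_save_to_list("fundamenta.txt")

records = []
for start in range(1, len(fundamenta_list) - 3, 4):
    namerow, sol, pe, ps = fundamenta_list[start:start + 4]
    records.append({'namerow': namerow, 'sol': sol, 'pe': pe, 'ps': ps})

print(records[0]['namerow'])   # what used to be namerow_1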

Writing multiple lists to multiple output files

I am working with datasets stored in large text files. For the analysis I am carrying out, I open the files, extract parts of the dataset and compare the extracted subsets. My code works like so:
from math import ceil

with open("seqs.txt","rb") as f:
    f = f.readlines()

assert type(f) == list, "ERROR: file object not converted to list"

fives = int( ceil(0.05*len(f)) )
thirds = int( ceil(len(f)/3) )

## top/bottom 5% of dataset
low_5 = f[0:fives]
top_5 = f[-fives:]

## top/bottom 1/3 of dataset
low_33 = f[0:thirds]
top_33 = f[-thirds:]

## Write lists to file
# top-5
with open("high-5.out","w") as outfile1:
    for i in top_5:
        outfile1.write("%s" % i)

# low-5
with open("low-5.out","w") as outfile2:
    for i in low_5:
        outfile2.write("%s" % i)

# top-33
with open("high-33.out","w") as outfile3:
    for i in top_33:
        outfile3.write("%s" % i)

# low-33
with open("low-33.out","w") as outfile4:
    for i in low_33:
        outfile4.write("%s" % i)
I am trying to find a more clever way of automating the process of writing the lists out to files. In this case there are only four, but in future cases where I may end up with as many as 15-25 lists, I would like some function to take care of this. I wrote the following:
def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" % i)
but the resulting file only contains the final list when I call the function like so:
write_to_file(low_33,low_5,top_33,top_5)
I understand that I have to define an output file for each list (which I am not doing in the function above); I'm just not sure how to implement this. Any ideas?
Make your variable names match your filenames and then use a dictionary to hold them instead of keeping them in the global namespace:
data = {'high_5': f[-fives:],
        'low_5': f[:fives],
        'high_33': f[-thirds:],
        'low_33': f[:thirds]}

for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)
This keeps your data in a single, easy-to-use place, and assuming you want to apply the same actions to all of it, you can continue using the same paradigm.
As mentioned by PM2Ring below, it would be advisable to use underscores (as you do in the variable names) instead of dashes (as you do in the filenames), because by doing so you can pass the dictionary keys as keyword arguments into a writing function:
write_to_file(**data)
This would equate to:
write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data
From this you could use one of the functions defined by the other answers.
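For completeness, a hedged sketch of one possible keyword-argument version of write_to_file that the write_to_file(**data) call above could feed into:
def write_to_file(**named_lists):
    # each keyword becomes the filename stem, each value is the list to write
    for name, lines in named_lists.items():
        with open('{}.out'.format(name), 'w') as outfile:
            for line in lines:
                outfile.write('%s' % line)

# either pass keywords explicitly or unpack the dictionary
write_to_file(low_5=['a\n', 'b\n'], high_5=['y\n', 'z\n'])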
You could have one output file per argument by incrementing a counter for each argument. For example:
def write_to_file(*args):
    for index, i in enumerate(args):
        with open("{}.out".format(index+1), "w") as outfile:
            outfile.write("%s" % i)
The example above will create output files "1.out", "2.out", "3.out", and "4.out".
Alternatively, if you had specific names you wanted to use (as in your original code), you could do something like the following:
def write_to_file(args):
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            outfile.write("%s" % data)

args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)
which would create output files "low-33.out", "low-5.out", "high-33.out", and "high-5.out".
Don't try to be clever. Instead, aim to have your code readable and easy to understand. You can group repeated code into a function, for example:
from math import ceil

def save_to_file(data, filename):
    with open(filename, 'wb') as f:
        for item in data:
            f.write('{}'.format(item))

with open('data.txt') as f:
    numbers = list(f)

five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)

save_to_file(numbers[:five_percent], 'low-5.out')
save_to_file(numbers[-five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[-thirty_three_percent:], 'high-33.out')
Update
If you have quite a number of lists to write, then it makes sense to use a loop. I suggest having two functions, save_top_n_percent and save_low_n_percent, to help with the job. They contain a little duplicated code, but by separating them into two functions, the code is clearer and easier to understand.
def save_to_file(data, filename):
    with open(filename, 'wb') as f:
        for item in data:
            f.write(item)

def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[-n_percent:], 'top-{}.out'.format(n))

def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))

with open('data.txt') as f:
    numbers = list(f)

for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
On this line you are opening up a file called .out each time and writing to it.
with open(".out", "w") as outfile:
You need to make the ".out" unique for each i in args. You can achieve this by passing in lists as the args, where each list contains the file name and the data.
def write_to_file(*args):
    for i in args:
        with open("%s.out" % i[0], "w") as outfile:
            outfile.write("%s" % i[1])
And pass in arguments like so...
write_to_file(["low_33",low_33],["low_5",low_5],["top_33",top_33],["top_5",top_5])
You are creating a file called '.out' and overwriting it each time.
def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        contents = globals()[i]
        with open(filename, "w") as outfile:
            outfile.write("%s" % contents)
write_to_file("low_33", "low_5", "top_33", "top_5")
https://stackoverflow.com/a/6504497/3583980 (variable name from a string)
This will create low_33.out, low_5.out, top_33.out, top_5.out and their contents will be the lists stored in these variables.

How to make file name a variable using np.savetxt in python?

Is it possible to make the output filename a variable using np.savetxt? I have multiple input files from which I will read, perform some calculations, and output the results in a file. Right now I am changing the file name each time for a different output, but is there a way to do it automatically? The code I used is as below:
np.savetxt('ES-0.dat', np.c_[strain_percent, es_avg, es_std])
I would like to change the file name to ES-25.dat, ES-50.dat, ES-75.dat, etc. This is also dependent upon the input file, which I read like this:
flistC11 = glob.glob('ES-0')
Is there also a way to automatically change the input file to ES-25, ES-50, ES-75, etc.?
I tried using loops, but both the input and output have to be inside quotes, which does not allow me to make them variables. Any idea how I can solve this problem? My work will be much easier then.
Added information after Saullo Castro's answer:
The file that I'm reading (ES*) consists of two simple columns like this:
200 7.94
200 6.55
200 6.01
200 7.64
200 6.33
200 7.96
200 7.92
The whole script is as below:
import numpy as np
import glob
import sys

flistC11 = glob.glob('ES-s*')

#%strain
fdata4 = []
for fname in flistC11:
    load = np.loadtxt(fname)
    fdata4.append(load[:,0]) #change to 0=strain or 1=%ES
fdata_arry4 = np.array(fdata4)
print fdata_arry4
strain = np.mean(fdata_arry4[0,:])
strain_percent = strain/10
print strain_percent

#ES
fdata5 = []
for fname in flistC11:
    load = np.loadtxt(fname)
    fdata5.append(load[:,1]) #change to 0=strain or 1=%ES
fdata_arry5 = np.array(fdata5)
print fdata_arry5
es_avg = np.mean(fdata_arry5[0,:])
es_std = np.std(fdata_arry5[0,:])
print es_avg
print es_std

np.savetxt('{0}.dat'.format(fname), np.c_[strain_percent, es_avg, es_std])
You can do something like:
flistC11 = glob.glob('ES*')
for fname in flistC11:
    # ...something...
    np.savetxt('{0}.dat'.format(fname), np.c_[strain_percent, es_avg, es_std])
Note that using ES* will tell glob() to return the names of all files beginning with ES.
EDIT:
Based on your comments it seems you actually want something like this:
import glob
import numpy as np

flistC11 = glob.glob('ES-s*')
for fname in flistC11:
    strains, stresses = np.loadtxt(fname, unpack=True)
    strain = np.mean(strains)
    strain_percent = strain/10
    print fname, strain_percent
    es_avg = np.mean(stresses)
    es_std = np.std(stresses)
    print fname, es_avg, es_std
    np.savetxt('{0}.dat'.format(fname), np.c_[strain_percent, es_avg, es_std])
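If the input files follow a fixed numeric pattern (ES-0, ES-25, ES-50, ...), a hedged sketch of generating both the input and output names from a loop variable instead of glob (the step of 25 and the ES- prefix are assumptions based on the names in the question):
import numpy as np

for n in range(0, 100, 25):                  # 0, 25, 50, 75
    in_name = 'ES-{}'.format(n)              # input file: ES-0, ES-25, ...
    out_name = 'ES-{}.dat'.format(n)         # output file: ES-0.dat, ES-25.dat, ...
    data = np.loadtxt(in_name)
    strain_percent = np.mean(data[:, 0]) / 10
    es_avg = np.mean(data[:, 1])
    es_std = np.std(data[:, 1])
    np.savetxt(out_name, np.c_[strain_percent, es_avg, es_std])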
It's not entirely clear where your error is (where is line 15?), but let's assume it is in the load. You have:
fdata4 = []
for fname in flistC11:
    load = np.loadtxt(fname)
    fdata4.append(load[:,0]) #change to 0=strain or 1=%ES
I'd suggest changing this to:
fdata4 = []
for fname in flistC11:
    print fname # check that the names make sense
    load = np.loadtxt(fname)
    print load.shape # check that the shape is as expected
    # maybe print more of 'load' here
    # I assume you want to collect 'load' from all files, not just the last
    fdata4.append(load[:,0]) #change to 0=strain or 1=%ES
print fdata4
In an IPython shell, I had no problem producing:
In [90]: flistC11 = ['ES0', 'ES1', 'ES2']

In [91]: for fname in flistC11:
   ....:     np.savetxt('{}.dat'.format(fname), np.arange(10))
   ....:

In [92]: glob.glob('ES*')
Out[92]: ['ES2.dat', 'ES0.dat', 'ES1.dat']
