np.savetxt isn't storing array information - python

I have the following code:
import numpy as np

# Time integration
T = 28
AT = 5/(1440)
L = T/AT
tr = np.linspace(AT, T, AT)  # I set the minimum value to AT to avoid a DivisionByZero error (in the beta_cc(i) function)
np.savetxt('tiempo_real.csv', tr, delimiter=",")
# Parameters
fcm28 = 40
beta_cc = 0
fcm = 0
s = 0
# Hardening coefficient (s)
ct = input("Cement Type (1, 2 or 3): ")
print("Cement Type: " + str(ct))
if int(ct) == 1:
    s = 0.2
elif int(ct) == 2:
    s = 0.25
elif int(ct) == 3:
    s = 0.38
else:
    print("Invalid answer")
# fcm determination
iter = 1
maxiter = 8065
while iter < maxiter:
    iter += 1
    beta_cc = np.exp(s*(1-(28/tr))**0.5)
    fcm = beta_cc*fcm28
np.savetxt('Fcm_Results.csv', fcm, delimiter=",")
The code runs without errors and it creates the two desired files, but no information is stored in either.
What I would like np.savetxt to do is create a .CSV file with the result of fcm at every iteration (i.e. an array of 8064 values).
Instead of the while loop, I had previously tried a for loop, but as the timestep is a float, I had some problems with it.
Thank you very much.
PS. Not sure if I should mention: I used Python3 on Ubuntu.

If anyone has the same issue: I solved this by changing the loop to a for loop, appending the iterative values of the functions (beta_cc & fcm) to a list, and then using the savetxt command.
iteration = 0
maxiteration = 8064
fcmM1 = []
tiemporeal = []
for i in range(iteration, maxiteration):
    def beta_cc(i):
        return np.exp(s*(1-(28/tr)**0.5))
    def fcm(i):
        return beta_cc(i)*fcm28
    tr = tr + AT
    fcmM1.append(fcm(i))
    tiemporeal.append(tr)
np.savetxt('M1_Resultados_fcm.csv', fcmM1, delimiter=",", header="Fcm", fmt="%s")
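For reference, the loop can be avoided entirely: beta_cc and fcm are elementwise functions of the time vector, so NumPy can evaluate all 8064 steps at once. A minimal sketch using the same constants and the corrected formula from the fix above (the output file name is just illustrative):

import numpy as np

T = 28                       # total time in days
AT = 5/1440                  # 5-minute timestep, in days
L = int(T/AT)                # 8064 steps
s = 0.25                     # hardening coefficient (cement type 2)
fcm28 = 40

tr = np.linspace(AT, T, L)                  # starts at AT to avoid dividing by zero
beta_cc = np.exp(s*(1 - (28/tr)**0.5))      # hardening function, evaluated elementwise
fcm = beta_cc*fcm28                         # one value per time step

np.savetxt('fcm_vectorized.csv', fcm, delimiter=",", header="Fcm")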

Related

Google TensorFlow crash course: issues with the Representation programming exercises, Task 2: Make Better Use of Latitude

Hi, I've run into another roadblock in the TensorFlow crash course, at the Representation programming exercises on this page:
https://developers.google.com/…/repres…/programming-exercise
I'm at Task 2: Make Better Use of Latitude.
It seems I've narrowed the issue down to where I convert the raw latitude data into "buckets" or ranges, which are represented as 1 or 0 in my feature. The actual code and issue are in the pastebin; any advice would be great, thanks!
https://pastebin.com/xvV2A9Ac
This is the code that converts the raw latitude data in my pandas DataFrame into "buckets" or ranges, as Google calls them:
LATITUDE_RANGES = zip(xrange(32, 44), xrange(33, 45))
I changed the above code, replacing xrange with plain range, since xrange no longer exists in Python 3.
Could this be the problem, using range instead of xrange? See below for my conundrum.
def select_and_transform_features(source_df):
    selected_examples = pd.DataFrame()
    selected_examples["median_income"] = source_df["median_income"]
    for r in LATITUDE_RANGES:
        selected_examples["latitude_%d_to_%d" % r] = source_df["latitude"].apply(
            lambda l: 1.0 if l >= r[0] and l < r[1] else 0.0)
    return selected_examples
The next two lines run the above function and convert my existing training and validation data sets into ranges or buckets for latitude:
selected_training_examples = select_and_transform_features(training_examples)
selected_validation_examples = select_and_transform_features(validation_examples)
this is the training model
_ = train_model(
    learning_rate=0.01,
    steps=500,
    batch_size=5,
    training_examples=selected_training_examples,
    training_targets=training_targets,
    validation_examples=selected_validation_examples,
    validation_targets=validation_targets)
THE PROBLEM:
OK, so here is how I understand the problem. When I run the training model, it throws this error:
ValueError: Feature latitude_32_to_33 is not in features dictionary.
So I inspected selected_training_examples and selected_validation_examples, and here's what I found. If I run
selected_training_examples = select_and_transform_features(training_examples)
then I get the proper data set when I call selected_training_examples, which yields all the feature "buckets", including the latitude_32_to_33 feature.
but when I run the next function
selected_validation_examples = select_and_transform_features(validation_examples)
it yields no buckets or ranges, resulting in the same
ValueError: Feature latitude_32_to_33 is not in features dictionary.
So I next tried disabling the first call:
selected_training_examples = select_and_transform_features(training_examples)
and I just ran the second function
selected_validation_examples = select_and_transform_features(validation_examples)
If I do this, I then get the desired dataset for selected_validation_examples.
The problem now is that running the first call no longer gives me the "buckets", and I'm back to where I began. I guess my question is: how are the two calls affecting each other, so that each prevents the other from giving me the dataset I need when I run them together?
Thanks in advance!
A Python developer gave me the solution, so I just wanted to share: LATITUDE_RANGES = zip(xrange(32, 44), xrange(33, 45)) can only be consumed once the way it was written (zip returns a one-shot iterator in Python 3), so I placed it inside the succeeding select_and_transform_features(source_df) function, which solved the issue. Thanks again everyone.
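For anyone else who lands here, the underlying mechanics: in Python 3, zip() returns a one-shot iterator, so the first call to select_and_transform_features() exhausts LATITUDE_RANGES and the second call iterates over nothing (hence no bucket columns). A minimal sketch of the two usual fixes, assuming the same ranges as above:

import pandas as pd

# Fix 1: materialize the ranges once -- a list can be traversed repeatedly.
LATITUDE_RANGES = list(zip(range(32, 44), range(33, 45)))

# Fix 2 (what the poster did): rebuild the iterator on every call.
def select_and_transform_features(source_df):
    latitude_ranges = zip(range(32, 44), range(33, 45))
    selected_examples = pd.DataFrame()
    selected_examples["median_income"] = source_df["median_income"]
    for r in latitude_ranges:
        selected_examples["latitude_%d_to_%d" % r] = source_df["latitude"].apply(
            lambda l: 1.0 if l >= r[0] and l < r[1] else 0.0)
    return selected_examples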

unclear index error python

This is my third thread on StackOverflow.
I think I've already learnt a LOT by reading threads here and clearing up my doubts.
I'm trying to transform an Excel table in my own Python script. I've done a lot already, and now that I'm almost finished with the script, I'm getting an error message that I can't really understand. Here is my code (I tried to give as much information as possible!):
def _sensitivity_analysis(datasource):
    # datasource is a list with data to be used by the _HBV_model() function
    datasource_length = len(datasource)  # the size of the data time series
    sense_param = parameter_vector  # collects the parameter data from the global vector (parameter_vector)
    sense_index = np.linspace(0, 11, 12)  # vector with the indexes of the parameters to be analyzed (0-11)
    sense_factor = np.linspace(0.5, 2, 31)  # vector with the variance factors that multiply the original parameter value
    ns_sense = []  # list to be filled with Nasch-Sutcliff values (data for the sensitivity analysis)
    for i in range(sense_factor.shape[0]):  # start column loop
        ns_sense.append([])  # create column in ns_sense matrix
        for j in range(sense_index.shape[0]):  # start row loop
            aux = sense_factor[i]*sense_param[j]  # multiplies the param[j] value by the factor[i] value
            print(i, j, aux)  # debug purposes
            sense_param[j] = aux  # substitutes the original parameter value with the modified one
            hbv = _HBV_model(datasource, sense_param)  # runs the model calculations (works awesomely!)
            sqrdiff = _square_diff()  # square-difference calculations for Nasch-Sutcliff
            average = _qcalc_qmed()  # average calculations for Nasch-Sutcliff
            nasch = _nasch_sutcliff(sqrdiff, average)  # returns the Nasch-Sutcliff calculation value
            ns_sense[i].insert(j, nasch)  # inserts the value into ns_sense(i, j) for further use
            sense_param = np.array([np.float64(catchment_area), np.float64(thresh_temp),
                                    np.float64(degreeday_factor), np.float64(field_capacity),
                                    np.float64(shape_coeficient), np.float64(model_paramC),
                                    np.float64(surfaceflow_param), np.float64(thresh_surface_level),
                                    np.float64(interflow_param), np.float64(baseflow_param),
                                    np.float64(percolation_param), np.float64(soilmoist_param)])  # restores sense_param to its original values
            for i in range(len(datasource)):  # _HBV_model() turns the original data (index 5) into fully calculated data (index 17),
                for j in range(12):  # so it must be returned to its original state before a new loop
                    datasource[i].pop()  # data is popped out
    print(ns_sense)  # debug purposes
So, when I run _sensitivity_analysis(datasource) I receive this message:
File "<ipython-input-47-c9748eaba818>", line 4, in <module>
aux = sense_factor[i]*sense_param[j]
IndexError: index 3652 is out of bounds for axis 0 with size 31
I'm fully aware that it's complaining about an index that doesn't exist and therefore can't be accessed.
To explain my situation: datasource is a list with 3652 elements. But I can't see how the console is trying to access index 3652 of sense_factor, as I'm not asking it to do so. The only place I use a value that large is in the final loop:
for i in range(len(datasource)):
I'm really lost. I'd really appreciate it if you guys could help me! If you need more info, I can provide it.
Guess: sense_factor = np.linspace(0.5, 2, 31) has 31 elements; you ask for element 3652 and it naturally blows up. i takes this value in your final loop. Rewrite the final loop as:
for k in range(len(datasource)):
    for m in range(12):
        datasource[k].pop()
However, your code has a broader issue: you shouldn't be using indexes at all; instead, use for loops directly on the arrays.
You reused your variable names, here:
for i in range(sense_factor.shape[0]):
    ...
    for j in range(sense_index.shape[0]):
and then here:
for i in range(len(datasource)):
    for j in range(12):
so in aux = sense_factor[i]*sense_param[j], you're using the wrong value of i, and it's basically a fluke that you're not using the wrong value of j too.
Don't reuse variable names in the same scope.
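To make the "loop directly on the arrays" advice concrete, here is a minimal sketch of the same nested loop with no reusable index variables at all (sense_param is a stand-in value here; the real one comes from parameter_vector):

import numpy as np

sense_factor = np.linspace(0.5, 2, 31)
sense_param = np.ones(12)            # stand-in for the real parameter vector

ns_sense = []
for factor in sense_factor:          # no index i for an inner loop to clobber
    row = []
    for param in sense_param:        # likewise, no index j
        row.append(factor*param)     # the aux product, computed from the values directly
    ns_sense.append(row)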

Python input/output optimisation

I think this code takes too long to execute, so maybe there are better ways to do this. I'm not looking for an answer about parallelising the for loops or using more than one processor.
What I'm trying to do is read values from a file using np.genfromtxt(file). I have 209*500*16 of these files. For each of the 500 cases I want to extract the minimum of the highest 1000 values gathered over the loop of 209 files, and write these 500 values to 16 different files. If a file is missing or its data doesn't have the adequate size, the info is written to the missing_all file.
The questions are:
Is this the best method to open a file?
Is this the best method to write to files?
How can I make this code faster?
Code:
import numpy as np
import os.path

output_filename2 = '/home/missing_all.txt'
target2 = open(output_filename2, 'w')
for w in range(16):
    group = 1200 + 50*w
    output_filename = '/home/veto_%s.txt' % (group)
    target = open(output_filename, 'w')
    for z in range(1, 501):
        sig_b = np.zeros((209*300))
        y = 0
        for index in range(1, 210):
            file = '/home/BandNo_%s_%s/%s_209.dat' % (group, z, index)
            if not os.path.isfile(file):
                sig_b[y:y+300] = 0
                y = y + 300
                target2.write('%s %s %s\n' % (group, z, index))
                continue
            data = np.genfromtxt(file)
            if (data.shape[0] < 300):
                sig_b[y:y+300] = 0
                y = y + 300
                target2.write('%s %s %s\n' % (group, z, index))
                continue
            sig_b[y:y+300] = np.sort(data[:, 4])[::-1][0:300]
            y = y + 300
        sig_b = np.sort(sig_b[:])[::-1][0:1000]
        target.write('%s\n' % (sig_b[-1]))
Profiler
You can use a profiler to figure out what parts of your script take the most time. This way you know exactly what takes the most time and can optimize those lines instead of blindly trying to optimize your code. The time invested to figure out how the profiler works will pay for itself easily later on.
Some possible slow-downs
Here are some guesses, but they really are only guesses.
You open() only 17 files, so it probably doesn't matter how exactly you do this.
I don't know much about writing-performance. Using file.write() seems fine to me.
np.genfromtxt probably takes quite a while (depends on your input files); is np.loadtxt an alternative for you? The docs state you can use it for data without holes.
Using a binary file format instead of text could speed up reading the file.
You sort your array on every iteration. Is there a way to sort it only at the end? (See the np.partition sketch at the end of this answer.)
Usually asking the file system for information is not very fast, i.e. os.path.isfile(file) is potentially slow. You could try building a cache of all the children of the parent directory and using that instead (see the sketch after this list).
Similarly, if most of your files exist, using exceptions can be faster:
try:
    data = np.genfromtxt(file)
except FileNotFoundError:  # not sure if this is the correct exception
    sig_b[y:y+300] = 0
    y += 300
    target2.write('%s %s %s\n' % (group, z, index))
    continue
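To make the caching suggestion above concrete, here is a minimal sketch (make_isfile_cache is a hypothetical helper, not a standard-library function):

import os

def make_isfile_cache():
    # One os.listdir() call per directory instead of one stat call per file.
    cache = {}
    def cached_isfile(path):
        dirname, name = os.path.split(path)
        if dirname not in cache:
            try:
                cache[dirname] = set(os.listdir(dirname))
            except OSError:          # the directory itself is missing
                cache[dirname] = set()
        return name in cache[dirname]
    return cached_isfile

isfile = make_isfile_cache()         # drop-in replacement for os.path.isfile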
I didn't try to understand your code in detail. Maybe you can reduce the necessary work by using a smarter algorithm?
PS: I like that you try to put all equal signs on the same column. Unfortunately here it makes it harder to read your code.
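One further thought on the sorting point: since only the minimum of the highest 1000 values is needed, a full descending sort does redundant work. np.partition finds the k-th largest element in roughly linear time; a small sketch (min_of_top_k is a hypothetical helper):

import numpy as np

def min_of_top_k(values, k):
    # np.partition accepts a negative kth: position -k ends up holding the
    # k-th largest element, which is exactly the minimum of the top k.
    return np.partition(values, -k)[-k]

sig_b = np.random.rand(209*300)
assert min_of_top_k(sig_b, 1000) == np.sort(sig_b)[::-1][999]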

Having an issue with using median function in numpy

I am having an issue with using the median function in numpy. The code used to work on a previous computer, but when I tried to run it on my new machine I got the error "cannot perform reduce with flexible type". In order to try to fix this, I attempted to use the map() function to make sure my list held floating-point values, and got this error message: could not convert string to float: .
After some more attempts at debugging, it seems that my issue is with the splitting of the lines in my input file. The lines are of the form 2456893.248202,4.490 and I want to split on the ",". However, when I print out the list for the second column of that line, I get
4
.
4
9
0
so it somehow seems to be splitting the value into individual characters, though I'm not sure how. The relevant section of code is below; I appreciate any thoughts or ideas, and thanks in advance.
import glob
import numpy as np

def curve_split(fn):
    with open(fn) as f:
        for line in f:
            line = line.strip()
            time, lc = line.split(",")
    # debugging stuff
    g = open('test.txt', 'w')
    l1 = map(lambda x: x + '\n', lc)
    g.writelines(l1)
    g.close()
    # end debugging stuff
    return time, lc
if __name__ == '__main__':
    # place where I keep the lightcurve files from the image subtraction
    dirname = '/home/kuehn/m4/kepler/subtraction/detrending'
    files = glob.glob(dirname + '/*lc')
    print(len(files))
    # in order to create our lightcurve array, we need to know
    # the length of one of our lightcurve files
    lc0 = curve_split(files[0])
    lcarr = np.zeros([len(files), len(lc0)])
    # loop through every file
    for i, fn in enumerate(files):
        time, lc = curve_split(fn)
        lc = map(float, lc)
        # debugging
        print(fn[5:58])
        print(lc)
        print(time)
        # end debugging
        lcm = lc/np.median(float(lc))
        #lcm = ((lc[qual0]-np.median(lc[qual0]))/
        #       np.median(lc[qual0]))
        lcarr[i] = lcm
        print(fn, i, len(files))
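No answer was posted for this one, but the one-character-per-line output is the classic symptom of iterating over a single string: curve_split() returns only the last line's time and lc, both as strings, and map() then walks "4.490" character by character (which is also why np.median later chokes on the string data). A sketch of a version that accumulates every line as floats, assuming the two-column time,value format shown above:

import numpy as np

def curve_split(fn):
    times, lc = [], []
    with open(fn) as f:
        for line in f:
            t, v = line.strip().split(",")
            times.append(float(t))   # e.g. 2456893.248202
            lc.append(float(v))      # e.g. 4.490
    return np.array(times), np.array(lc)

# Usage: median-normalize with no further conversion needed.
# time, lc = curve_split(files[0])
# lcm = lc / np.median(lc)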

Iterate over two big arrays at once

I have to iterate over two arrays which are 1000x1000. I already reduced the resolution to 100x100 to make the iteration faster, but it still takes about 15 minutes for ONE array!
So I tried to iterate over both at the same time, for which I found this:
for index, (x,y) in ndenumerate(izip(x_array,y_array)):
but then I get the error:
ValueError: too many values to unpack
Here is my full Python code; I hope you can help me make this a lot faster, because this is for my master's thesis and in the end I have to run it about 100 times...
area_length = 11
d_circle = (area_length-1)/2
xdis_new = xdis.copy()
ydis_new = ydis.copy()
ie, je = xdis_new.shape
while (np.isnan(np.sum(xdis_new))) and (np.isnan(np.sum(ydis_new))):
    xdis_interpolated = xdis_new.copy()
    ydis_interpolated = ydis_new.copy()
    # itx=np.nditer(xdis_new,flags=['multi_index'])
    # for x in itx:
    #     print 'next x and y'
    for index, (x, y) in ndenumerate(izip(xdis_new, ydis_new)):
        if np.isnan(x):
            print 'index', index[0], index[1]
            print 'interpolate'
            # define indices of the interpolation area
            i1 = index[0]-(area_length-1)/2
            if i1 < 0:
                i1 = 0
            i2 = index[0]+((area_length+1)/2)
            if i2 > ie:
                i2 = ie
            j1 = index[1]-(area_length-1)/2
            if j1 < 0:
                j1 = 0
            j2 = index[1]+((area_length+1)/2)
            if j2 > je:
                j2 = je
            # -->
            print 'i1', i1, '', 'i2', i2
            print 'j1', j1, '', 'j2', j2
            area_values = xdis_new[i1:i2, j1:j2]
            print area_values
            b = area_values[~np.isnan(area_values)]
            if len(b) >= ((area_length-1)/2)*4:
                xi, yi = meshgrid(arange(len(area_values[0, :])), arange(len(area_values[:, 0])))
                weight = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                d = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                weight_fac = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                weighted_area = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                d = sqrt((xi-xi[(area_length-1)/2, (area_length-1)/2])*(xi-xi[(area_length-1)/2, (area_length-1)/2])+(yi-yi[(area_length-1)/2, (area_length-1)/2])*(yi-yi[(area_length-1)/2, (area_length-1)/2]))
                weight = 1/d
                weight[where(d == 0)] = 0
                weight[where(d > d_circle)] = 0
                weight[where(np.isnan(area_values))] = 0
                weight_sum = np.sum(weight.flatten())
                weight_fac = weight/weight_sum
                weighted_area = area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m = nansum(weighted_area)
                xdis_interpolated[index] = m
                print 'm', m
            else:
                print 'insufficient elements'
        if np.isnan(y):
            print 'index', index[0], index[1]
            print 'interpolate'
            # define indices of the interpolation area
            i1 = index[0]-(area_length-1)/2
            if i1 < 0:
                i1 = 0
            i2 = index[0]+((area_length+1)/2)
            if i2 > ie:
                i2 = ie
            j1 = index[1]-(area_length-1)/2
            if j1 < 0:
                j1 = 0
            j2 = index[1]+((area_length+1)/2)
            if j2 > je:
                j2 = je
            # -->
            print 'i1', i1, '', 'i2', i2
            print 'j1', j1, '', 'j2', j2
            area_values = ydis_new[i1:i2, j1:j2]
            print area_values
            b = area_values[~np.isnan(area_values)]
            if len(b) >= ((area_length-1)/2)*4:
                xi, yi = meshgrid(arange(len(area_values[0, :])), arange(len(area_values[:, 0])))
                weight = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                d = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                weight_fac = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                weighted_area = zeros((len(area_values[0, :]), len(area_values[:, 0])))
                d = sqrt((xi-xi[(area_length-1)/2, (area_length-1)/2])*(xi-xi[(area_length-1)/2, (area_length-1)/2])+(yi-yi[(area_length-1)/2, (area_length-1)/2])*(yi-yi[(area_length-1)/2, (area_length-1)/2]))
                weight = 1/d
                weight[where(d == 0)] = 0
                weight[where(d > d_circle)] = 0
                weight[where(np.isnan(area_values))] = 0
                weight_sum = np.sum(weight.flatten())
                weight_fac = weight/weight_sum
                weighted_area = area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m = nansum(weighted_area)
                ydis_interpolated[index] = m
                print 'm', m
            else:
                print 'insufficient elements'
        else:
            print 'no need to interpolate'
    xdis_new = xdis_interpolated
    ydis_new = ydis_interpolated
Some advice:
Profile your code to see what is the slowest part. It may not be the iteration but the computations that need to be done each time.
Reduce function calls as much as possible. Function calls are not free in Python.
Rewrite the slowest part as a C extension and then call that C function in your Python code (see Extending and Embedding the Python interpreter).
This page has some good advice as well.
You specifically asked about iterating over two arrays in a single loop. Here is a way to do that:
l1 = ["abc", "def", "hi"]
l2 = ["ghi", "jkl", "lst"]
for f, s in zip(l1, l2):
    print "%s : %s" % (f, s)
The print statement above is Python 2 syntax. Note that in Python 2, zip builds the whole list of pairs up front, so you can use itertools.izip there for lazy iteration; in Python 3, zip is already lazy.
You may use this as your for loop:
for index, x in ndenumerate((x_array, y_array)):
But it won't help you much, because your computer can't do two things at the same time.
Profiling is definitely a good start to identify where all the time spent actually goes.
I usually use the cProfile module, as it requires minimal overhead and gives me more than enough information.
import cProfile
import pstats
cProfile.run('main()', "ProfileData.txt", 'tottime')
p = pstats.Stats('ProfileData.txt')
p.sort_stats('cumulative').print_stats(100)
In your example, you would have to wrap your code into a main() function to be able to use this code snippet at the very end of your file.
Comment #1: You don't want to use ndenumerate on the izip iterator, as it'll just hand you back the iterator itself, which isn't what you want.
Comment #2:
i1 = index[0]-(area_length-1)/2
if i1 < 0:
    i1 = 0
could be simplified to i1 = max(index[0]-(area_length-1)/2, 0), and you could store your (area_length+/-1)/2 values in dedicated variables.
Idea #1: try to iterate over flat versions of the arrays, i.e. with something like
for (i, (x, y)) in enumerate(izip(xdis_new.flat, ydis_new.flat)):
You can recover the original indices via divmod(i, xdis_new.shape[-1]), as you'll be iterating by rows first.
Idea #2: iterate only over the NaNs, i.e. index your arrays with np.isnan(xdis_new)|np.isnan(ydis_new); that could save you some iterations (see the sketch at the end of this answer).
EDIT #1
You probably don't need to initialize d, weight_fac and weighted_area in your loop, as you compute them from scratch anyway.
Your weight[where(d == 0)] can be simplified to weight[d == 0] (the same goes for the other where() indexing).
Do you need weight_fac? Can't you just compute weight and then normalize it in place? That would save you some temporary arrays.
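To flesh out Idea #2: np.where on the NaN mask yields exactly the coordinates that need interpolating, so the full-grid scan disappears. A minimal sketch with a toy stand-in for the real grids:

import numpy as np

xdis_new = np.random.rand(100, 100)  # toy stand-in for the real displacement grid
xdis_new[20, 30] = np.nan

# zip(*np.where(...)) yields one (row, col) pair per NaN cell.
for i, j in zip(*np.where(np.isnan(xdis_new))):
    i1, i2 = max(i - 5, 0), min(i + 6, xdis_new.shape[0])   # clipped 11x11 window
    j1, j2 = max(j - 5, 0), min(j + 6, xdis_new.shape[1])
    area_values = xdis_new[i1:i2, j1:j2]
    # ... inverse-distance weighting over area_values, as in the question ...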
