Beginner problems with references to arrays in python 3.1.1 - python

As part of the last assignment in a beginner python programing class, I have been assigned a traveling sales man problem. I settled on a recursive function to find each permutation and the sum of the distances between the destinations, however, I am have a lot of problems with references. Arrays in different instances of the Permute and Main functions of TSP seem to be pointing to the same reference.
from math import sqrt
class TSP:
def __init__(self):
self.CartisianCoordinates = [['A',[1,1]], ['B',[2,2]], ['C',[2,1]], ['D',[1,2]], ['E',[3,3]]]
self.Array = []
self.Max = 0
self.StoredList = ['',0]
def Distance(self, i1, i2):
x1 = self.CartisianCoordinates[i1][1][0]
y1 = self.CartisianCoordinates[i1][1][1]
x2 = self.CartisianCoordinates[i2][1][0]
y2 = self.CartisianCoordinates[i2][1][1]
return sqrt(pow((x2 - x1), 2) + pow((y2 - y1), 2))
def Evaluate(self):
temparray = []
Data = []
for i in range(len(self.CartisianCoordinates)):
Data.append([])
for i1 in range(len(self.CartisianCoordinates)):
for i2 in range(len(self.CartisianCoordinates)):
if i1 != i2:
temparray.append(self.Distance(i1, i2))
else:
temparray.append('X')
Data[i1] = temparray
temparray = []
self.Array = Data
self.Max = len(Data)
def Permute(self,varray,index,vcarry,mcarry): #Problem Class
array = varray[:]
carry = vcarry[:]
for i in range(self.Max):
print ('ARRAY:', array)
print (index,i,carry,array[index][i])
if array[index][i] != 'X':
carry[0] += self.CartisianCoordinates[i][0]
carry[1] += array[index][i]
if len(carry) != self.Max:
temparray = array[:]
for j in range(self.Max):temparray[j][i] = 'X'
index = i
mcarry += self.Permute(temparray,index,carry,mcarry)
else:
return mcarry
print ('pass',mcarry)
return mcarry
def Main(self):
out = []
self.Evaluate()
for i in range(self.Max):
array = self.Array[:] #array appears to maintain the same reference after each copy, resulting in an incorrect array being passed to Permute after the first iteration.
print (self.Array[:])
for j in range(self.Max):array[j][i] = 'X'
print('I:', i, array)
out.append(self.Permute(array,i,[str(self.CartisianCoordinates[i][0]),0],[]))
return out
SalesPerson = TSP()
print(SalesPerson.Main())
It would be greatly appreciated if you could provide me with help in solving the reference problems I am having. Thank you.

Slicing a list (by using [:] as you do) does not create a deep copy -- it creates a shallow copy. That means that if the list contains references to other lists, the copy will contain the same references -- not references to new lists. Put another way, only the list itself is copied, not its elements or its elements' elements.
What you want here is a deep copy. Instead of
array = self.Array[:]
try
array = copy.deepcopy(self.Array)
for which you'll need import copy.

Related

Append a 2D array while looping through it

I want to store certain values in a 2D array. In the below code. I want sT to be total. When the inner loop runs the values to be stored in rows and then next column when the outer loop increment happens.
class pricing_lookback:
def __init__(self,spot,rate,sigma,time,sims,steps):
self.spot = spot
self.rate = rate
self.sigma = sigma
self.time = time
self.sims = sims
self.steps = steps
self.dt = self.time/self.steps
def call_floatingstrike(self):
simulationS = np.array([])
simulationSt = np.array([])
call2 = np.array([])
total = np.empty(shape=[self.steps, self.sims])
for j in range(self.sims):
sT = self.spot
pathwiseminS = np.array([])
for i in range(self.steps):
phi= np.random.normal()
sT *= np.exp((self.rate-0.5*self.sigma*self.sigma)*self.dt + self.sigma*phi*np.sqrt(self.dt))
pathwiseminS = np.append(pathwiseminS, sT)
np.append(total,[[j,sT]])###This should store values in rows of j column
#print (pathwiseminS)
#tst1 = np.append(tst1, pathwiseminS[1])
call2 = np.append(call2, max(pathwiseminS[self.steps-1]-self.spot,0))
#print (pathwiseminS[self.steps-1])
#print(call2)
simulationSt = np.append(simulationSt,pathwiseminS[self.steps-1])
simulationS = np.append(simulationS,min(pathwiseminS))
call = max(np.average(simulationSt) - np.average(simulationS),0)
return call, total#,call2,
Here is a simple example of what I think you are trying to do:
for i in range(5):
row = np.random.rand(5,)
if i == 0:
my_array = row
else:
my_array = np.vstack((my_array, row))
print(row)
However, this is not very efficient with memory, especially if you are dealing with large arrays, as this has to allocate new memory on every loop. It would be much better to preallocate an empty array and then populate it if possible.
To answer the question of how to append a column, it would be something like this:
import numpy as np
x = np.random.rand(5, 4)
column_to_append = np.random.rand(5,)
np.insert(x, x.shape[1], column_to_append, axis=1)
Again, this is not memory efficient and should be avoided whenever possible. Preallocation is much better.

Make my Nested loops Works simpler (Operating Time is Higher)

I am a learner in nested loops in python.
Below I have written my code. I want to make my code simpler, since when I run the code it takes so much time to produce the result.
I have a list which contains 1000 values:
Brake_index_values = [ 44990678, 44990679, 44990680, 44990681, 44990682, 44990683,
44997076, 44990684, 44997077, 44990685,
...
44960673, 8195083, 8979525, 100107546, 11089058, 43040161,
43059162, 100100533, 10180192, 10036189]
I am storing the element no 1 in another list
original_top_brake_index = [Brake_index_values[0]]
I created a temporary list called temp and a numpy array for iteration through Loop:
temp =[]
arr = np.arange(0,1000,1)
Loop operation:
for i in range(1, len(Brake_index_values)):
if top_15_brake <= 15:
a1 = Brake_index_values[i]
#a2 = Brake_index_values[j]
a3 = arr[:i]
for j in a3:
a2 = range(Brake_index_values[j] - 30000, Brake_index_values[j] + 30000)
if a1 in a2:
pass
else:
temp.append(a1)
if len(temp)== len(a3):
original_top_brake_index.append(a1)
top_15_brake += 1
del temp[:]
else:
del temp[:]
continue
I am comparing the Brake_index_values[1] element available between the range of 30000 before and after Brake_index_values[0] element, that is `range(Brake_index_values[0]-30000, Brake_index_values[0]+30000).
If the Brake_index_values[1] available between the range, I should ignore that element and go for the next element Brake_index_values[2] and follow the same process as before for Brake_index_values[0] & Brake_index_values[1]
If it is available, store the Value, in original_top_brake_index thorough append operation.
In other words :
(Lets take 3 values a,b & c. I am checking whether the value b is in range between (a-30000 to a+30000). Possibility 1: If b is in between (a-30000 to a+30000) , neglect that element (Here I am storing inside a temporary list). Then the same process continues with c (next element) Possibility 2: If b is not in b/w those range put b in another list called original_top_brake_index
(this another list is the actual result what i needed)
The result I get:
It is working, but it takes so much time to complete the operation and sometimes it shows MemoryError.
I just want my code to work simpler and efficient with simple operations.
Try this code (with numpy):
import numpy as np
original_top_brake_index = [Brake_index_values[0]]
top_15_brake = 0
Brake_index_values = np.array(Brake_index_values)
for i, a1 in enumerate(Brake_index_values[0:]):
if top_15_brake > 15:
break
m = (Brake_index_values[:i] - a1)
if np.logical_or(m > 30000, m < - 30000).all():
original_top_brake_index.append(a1)
top_15_brake += 1
Note: you can probably make it even more efficient, but this already should reduce the number of operations significantly (and doesn't change much the logic of your original code)
We can use the bisect module to shorten the elements we actually have to lookup by finding the smallest element that's greater or less than the current value. We will use recipes from here
Let's look at this example:
from bisect import bisect_left, bisect_right
def find_lt(a, x):
'Find rightmost value less than x'
i = bisect_left(a, x)
if i:
return a[i-1]
return
def find_gt(a, x):
'Find leftmost value greater than x'
i = bisect_right(a, x)
if i != len(a):
return a[i]
return
vals = [44990678, 44990679, 44990680, 44990681, 44990682, 589548954, 493459734, 3948305434, 34939349534]
vals.sort() # we have to sort the values for bisect to work
passed = []
originals = []
for val in vals:
passed.append(val)
l = find_lt(passed, val)
m = find_gt(passed, val)
cond1 = (l and l + 30000 >= val)
cond2 = (m and m - 30000 <= val)
if not l and not m:
originals.append(val)
continue
elif cond1 or cond2:
continue
else:
originals.append(val)
Which gives us:
print(originals)
[44990678, 493459734, 589548954, 3948305434, 34939349534]
There might be another, more mathematical way to do this, but this should at least simplify your code.

How do you use a list as an index argument for numpy ndarrays?

So I have a problem that might be super duper simple.
I have these numpy ndarrays that I allocated and want to assign values to them via indices returned as lists. It might be easier if I showed you some example code. The questionable code I have is at the bottom, and in my testing (before actually taking this to scale) I keep getting syntax errors :'(
EDIT: edited to make it easier to troubleshoot and put some example code at the bottoms
import numpy as np
def do_stuff(index, mask):
# this is where the calculations are made
magic = sum(mask)
return index, magic
def foo(full_index, comparison_dims, *xargs):
# I have this function executed in Parallel since I'm using a machine with 36 nodes per core, and can access upto 16 cores for each script #blessed
# figure out how many dimensions there are, and how big they are
parent_dims = []
parent_diffs = []
for j in xargs:
parent_dims += [len(j)]
parent_diffs += [j[1] - j[0]] # this is used to find a mask
index = [] # this is where the individual dimension indices will be stored
dim_n = 0
# loop through the dimensions
while dim_n < len(parent_dims):
dim_index = full_index % parent_dims[dim_n]
index += [dim_index]
if dim_n == 0:
mask = (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
(comparison_dims[dim_n] <= xargs[dim_n][dim_index] +parent_diffs[dim_n] / 2)
else:
mask *= (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
(comparison_dims[dim_n] <=xargs[dim_n][dim_index] + parent_diffs[dim_n] / 2)
full_index //= parent_dims[dim_n]
dim_n += 1
return do_stuff(index, mask)
def bar(comparison_dims, *xargs):
if len(xargs) == comparison_dims.shape[0]:
pass
elif len(comparison_dims.shape) == 2:
pass
else:
raise ValueError("silly person, you failed")
from joblib import Parallel, delayed
dims = []
for j in xargs:
dims += [len(j)]
myArray = np.empty(tuple(dims))
results = Parallel(n_jobs=1)(
delayed(foo)(
index, comparison_dims, *xargs)
for index in range(np.prod(dims))
)
# LOOK HERE, HELP HERE!
for index_list, result in results:
# I thought this would work, but oh golly was I was wrong, index_list here is a list of ints, and result is a value
# for example index, result = [0,3,7], 45.4
# so in execution, that would yield: myArray[0,3,7] = 45.4
# instead it yields SyntaxError because I don't know what I'm doing XD
myArray[*index_list] = result
return myArray
Any ideas how I can make that work. What do I need to do?
I'm not the sharpest tool in the shed, but I think with your help we might be able to figure this out!
A quick example to troubleshoot this problem would be:
compareDims = np.array([np.random.rand(1000), np.random.rand(1000)])
dim0 = np.arange(0,1,1./20)
dim1 = np.arange(0,1,1./30)
myArray = bar(compareDims, dim0, dim1)
To index a numpy array with an arbitrary list of multidimensional indices. you actually need to use a tuple:
for index_list, result in results:
myArray[tuple(index_list)] = result

I need to vectorize the following in order for the code can run faster

This portion I was able to vectorize and get rid of a nested loop.
def EMalgofast(obsdata, beta, pjt):
n = np.shape(obsdata)[0]
g = np.shape(pjt)[0]
zijtpo = np.zeros(shape=(n,g))
for j in range(g):
zijtpo[:,j] = pjt[j]*stats.expon.pdf(obsdata,scale=beta[j])
zijdenom = np.sum(zijtpo, axis=1)
zijtpo = zijtpo/np.reshape(zijdenom, (n,1))
pjtpo = np.mean(zijtpo, axis=0)
I wasn't able to vectorize the portion below. I need to figure that out
betajtpo_1 = []
for j in range(g):
num = 0
denom = 0
for i in range(n):
num = num + zijtpo[i][j]*obsdata[i]
denom = denom + zijtpo[i][j]
betajtpo_1.append(num/denom)
betajtpo = np.asarray(betajtpo_1)
return(pjtpo,betajtpo)
I'm guessing Python is not your first programming language based on what I see. The reason I'm saying this is that in python, normally we don't have to deal with manipulating indexes. You act directly on the value or the key returned. Make sure not to take this as an offense, I do the same coming from C++ myself. It's a hard to remove habits ;).
If you're interested in performance, there is a good presentation by Raymond Hettinger on what to do in Python to be optimised and beautiful :
https://www.youtube.com/watch?v=OSGv2VnC0go
As for the code you need help with, is this helping for you? It's unfortunatly untested as I need to leave...
ref:
Iterating over a numpy array
http://docs.scipy.org/doc/numpy/reference/generated/numpy.true_divide.html
def EMalgofast(obsdata, beta, pjt):
n = np.shape(obsdata)[0]
g = np.shape(pjt)[0]
zijtpo = np.zeros(shape=(n,g))
for j in range(g):
zijtpo[:,j] = pjt[j]*stats.expon.pdf(obsdata,scale=beta[j])
zijdenom = np.sum(zijtpo, axis=1)
zijtpo = zijtpo/np.reshape(zijdenom, (n,1))
pjtpo = np.mean(zijtpo, axis=0)
betajtpo_1 = []
#manipulating an array of numerator and denominator instead of creating objects each iteration
num=np.zeros(shape=(g,1))
denom=np.zeros(shape=(g,1))
#generating the num and denom real value for the end result
for (x,y), value in numpy.ndenumerate(zijtpo):
num[x],denom[x] = num[x] + value *obsdata[y],denom[x] + value
#dividing all at once after instead of inside the loop
betajtpo_1= np.true_divide(num/denom)
betajtpo = np.asarray(betajtpo_1)
return(pjtpo,betajtpo)
Please leave me some feedback !
Regards,
Eric Lafontaine

Python -- using regex on a class instance

I have a class that was taking in lists of 1's and 0's and performing GF(2) finite field arithmetic operations. It used to work until I tried to make it take the input in polynomial format. As for how the finite arithmetic will be done after fixing the regex issue, I was thinking about overloading the operators.
The actual code in parsePolyToListInput(input) works when outside the class. The problem seems to be in the regex, which errors that it will only take in a string (this makes sense), but does not seem to initialize with self.expr as a parameter (that's a problem). The #staticmethod just before the initialization was an attempt to salvage the unbound error as it the polynomial was passed in, but this is apparently completely wrong. Just to save you time if you decide to look at any of the arithmetic operations, modular inverse does not work (seems to be due to the formatting issue after every iteration of that while loop for division in the function and what the return type is):
import re
class gf2poly:
#binary arithemtic on polynomials
##staticmethod
def __init__(self,expr):
self.expr = expr
#self.expr = [int(i) for i in expr]
self.expr = gf2poly.parsePolyToListInput(self.expr)
def convert(self): #to clarify the input if necessary
convertToString = str(self.expr)
print "expression is %s"%(convertToString)
def id(self): #returns modulus 2 (1,0,0,1,1,....) for input lists
return [int(self.expr[i])%2 for i in range(len(self.expr))]
def listToInt(self): #converts list to integer for later use
result = gf2poly.id(self)
return int(''.join(map(str,result)))
def prepBinary(a,b): #converts to base 2 and orders min and max for use
a = gf2poly.listToInt(a); b = gf2poly.listToInt(b)
bina = int(str(a),2); binb = int(str(b),2)
a = min(bina,binb); b = max(bina,binb);
return a,b
#staticmethod
def outFormat(raw):
raw = str(raw[::-1]); g = [] #reverse binary string for enumeration
[g.append(i) for i,c in enumerate(raw) if c == '1']
processed = "x**"+' + x**'.join(map(str, g[::-1]))
if len(g) == 0: return 0 #return 0 if list empty
return processed #returns result in gf(2) polynomial form
def parsePolyToListInput(poly):
c = [int(i.group(0)) for i in re.finditer(r'\d+', poly)] #re.finditer returns an iterator
#m = max(c)
return [1 if x in c else 0 for x in xrange(max(c), -1, -1)]
#return d
def add(self,other): #accepts 2 lists as parameters
a = gf2poly.listToInt(self); b = gf2poly.listToInt(other)
bina = int(str(a),2); binb = int(str(b),2)
m = bina^binb; z = "{0:b}".format(m)
return z #returns binary string
def subtract(self,other): #basically same as add() but built differently
result = [self.expr[i] ^ other.expr[i] for i in range(len(max(self.expr,other.expr)))]
return int(''.join(map(str,result)))
def multiply(a,b): #a,b are lists like (1,0,1,0,0,1,....)
a,b = gf2poly.prepBinary(a,b)
g = []; bitsa = "{0:b}".format(a)
[g.append((b<<i)*int(bit)) for i,bit in enumerate(bitsa)]
m = reduce(lambda x,y: x^y,g); z = "{0:b}".format(m)
return z #returns product of 2 polynomials in gf2
def divide(a,b): #a,b are lists like (1,0,1,0,0,1,....)
a,b = gf2poly.prepBinary(a,b)
bitsa = "{0:b}".format(a); bitsb = "{0:b}".format(b)
difflen = len(str(bitsb)) - len(str(bitsa))
c = a<<difflen; q=0
while difflen >= 0 and b != 0: #a is divisor, b is dividend, b/a
q+=1<<difflen; b = b^c # b/a because of sorting in prep
lendif = abs(len(str(bin(b))) - len(str(bin(c))))
c = c>>lendif; difflen -= lendif
r = "{0:b}".format(b); q = "{0:b}".format(q)
return r,q #returns r remainder and q quotient in gf2 division
def remainder(a,b): #separate function for clarity when calling
r = gf2poly.divide(a,b)[0]; r = int(str(r),2)
return "{0:b}".format(r)
def quotient(a,b): #separate function for clarity when calling
q = gf2poly.divide(a,b)[1]; q = int(str(q),2)
return "{0:b}".format(q)
def extendedEuclideanGF2(a,b): # extended euclidean. a,b are GF(2) polynomials in list form
inita,initb=a,b; x,prevx=0,1; y,prevy = 1,0
while sum(b) != 0:
q = gf2poly.quotient(a,b);
q = list(q); q = [int(x) for x in q]
#q = list(q);
#q = tuple([int(i) for i in q])
q = gf2poly(q)
a,b = b,gf2poly.remainder(a,b);
#a = map(list, a);
#b = [list(x) for x in a];
#a = [int(x) for x in a]; b = [int(x) for x in b];
b = list(b); b = [int(x) for x in b]
#b = list(b);
#b = tuple([int(i) for i in b])
b = gf2poly(b)
#x,prevx = (prevx-q*x, x);
#y,prevy=(prevy-q*y, y)
print "types ",type(q),type(a),type(b)
#q=a//b; a,b = b,a%b; x,prevx = (prevx-q*x, x); y,prevy=(prevy-q*y, y)
#print("%d * %d + %d * %d = %d" % (inita,prevx,initb,prevy,a))
return a,prevx,prevy # returns gcd of (a,b), and factors s and t
def modular_inverse(a,mod): # where a,mod are GF(2) polynomials in list form
gcd,s,t = gf2poly.extendedEuclideanGF2(a,mod); mi = gf2poly.remainder(s,mod)
#gcd,s,t = ext_euc_alg_i(a,mod); mi = s%mod
if gcd !=1: return False
#print ("%d * %d mod %d = 1"%(a,mi,mod))
return mi # returns modular inverse of a,mod
I usually test it with this input:
a = x**14 + x**1 + x**0
p1 = gf2poly(a)
b = x**6 + x**2 + x**1
p2 = gf2poly(b)
The first thing you might notice about my code is that it's not very good. There are 2 reasons for that:
1) I wrote it so that the 1st version could do work in the finite field GF(2), and output in polynomial format. Then the next versions were supposed to be able to take polynomial inputs, and also perform the crucial 'modular inverse' function which is not working as planned (this means it's actually not working at all).
2) I'm teaching myself Python (I'm actually teaching myself programming overall), so any constructive criticism from pro Python programmers is welcome as I'm trying to break myself of beginner habits as quickly as possible.
EDIT:
Maybe some more of the code I've been testing with will help clarify what works and what doesn't:
t1 = [1,1,1]; t2 = [1,0,1]; t3 = [1,1]; t4 = [1, 0, 1, 1, 1, 1, 1]
t5 = [1,1,1,1]; t6 = [1,1,0,1]; t7 = [1,0,1,1,0]
f1 = gf2poly(t1); f2 = gf2poly(t2); f3 = gf2poly(t3); f4 = gf2poly(t4)
f5 = gf2poly(t5);f6 = gf2poly(t6);f7 = gf2poly(t7)
##print "subtract: ",a.subtract(b)
##print "add: ",a.add(b)
##print "multiply: ",gf2poly.multiply(f1,f3)
##print "multiply: ",gf2poly.multiply(f1,f2)
##print "multiply: ",gf2poly.multiply(f3,f4)
##print "degree a: ",a.degree()
##print "degree c: ",c.degree()
##print "divide: ",gf2poly.divide(f1,b)
##print "divide: ",gf2poly.divide(f4,a)
##print "divide: ",gf2poly.divide(f4,f2)
##print "divide: ",gf2poly.divide(f2,a)
##print "***********************************"
##print "quotient: ",gf2poly.quotient(f2,f5)
##print "remainder: ",gf2poly.remainder(f2,f5)
##testq = gf2poly.quotient(f4,f2)
##testr = gf2poly.remainder(f4,f2)
##print "quotient: ",testq,type(testq)
##print "remainder: ",testr,type(testr)
##print "***********************************"
##print "outFormat testp: ",gf2poly.outFormat(testq)
##print "outFormat testr: ",gf2poly.outFormat(testr)
##print "***********************************"
#print "gf2poly.modular_inverse(): ",gf2poly.modular_inverse(f2,f3)
print "p1 ",p1 #,type(f2),type(f3)
#print "parsePolyToListInput ",gf2poly.parsePolyToListInput(a)
Part of your problem is that you haven't declared self as an argument for parsePolyToListInput. When you call a method, the instance you call it on is implicitly bound as the first argument. Naming the first argument self is a convention, not a strict requirement - the instance is being bound to poly, which you then try to run a regexp over.
It looks me like there's some confusion in your design here about what's behavior of individual instances of the class and what's class-level or module-level behavior. In Python, it's perfectly acceptable to leave something that doesn't take an instance of a class as a parameter defined as a module-level function rather than shoehorning it in awkwardly. parsePolyToListInput might be one such function.
Your add implementation, similarly, has a comment saying it "accepts 2 lists as parameters". In fact, it's going to get a gf2poly instance as its first argument - this is probably right if you're planning to do operator overloading, but it means the second argument should also be a gf2poly instance as well.
EDIT:
Yeah, your example code shows a breakdown between class behavior and instance behavior. Either your multiply call should look something like this:
print "multiply: ",f1.multiply(f3)
Or multiply shouldn't be a method at all:
gfpoly.py:
def multiply(f1, f2):
a,b = prepBinary(a,b)
g = []; bitsa = "{0:b}".format(a)
[g.append((b<<i)*int(bit)) for i,bit in enumerate(bitsa)]
m = reduce(lambda x,y: x^y,g); z = "{0:b}".format(m)
return z #returns product of 2 polynomials in gf2
That latter approach is, for instance, how the standard math library does things.
The advantage of defining a multiplication method is that you could name it appropriately (http://docs.python.org/2/reference/datamodel.html#special-method-names) and use it with the * operator:
print "multiply: ",f1 *f3

Categories