My code suddenly stops writing beyond 15500 iterations - Python

I'm studying how to code in Python and I'm trying to recreate some code I wrote in college.
The code is based on a 2D Ising model applied to epidemiology. What it does is:
it constructs a 2D 100x100 array using numpy, and assigns a value of -1 to each element.
The energy is calculated based on the function calc_h in the script below.
Then, the code randomly selects a cell from the lattice, changes the value to 1, then calculates the energy of the system again.
Then the code checks whether the energy of the new configuration is less than or equal to that of the previous one. If it is, it "accepts" the change. If not, an acceptance probability is compared against a random number to decide whether the change is "accepted". This part is done in the metropolis function.
The code repeats the process using a while loop until the maximum specified iteration, max_iterations.
The code tallies the number of elements with a -1 value (the s variable) and the number of elements with a 1 value (the i variable) in the countSI function. The script appends to a text file every 500 iterations.
THE PROBLEM
I ran the script and, besides taking very long to execute, the tallying stops at 15500. The code doesn't throw any error; it just keeps running. I waited around 3 hours for it to finish, but it still only gets up to 15500 iterations.
I've tried commenting out the CSV-writing block and printing the values instead, to observe it as it happens, and there I see it stop at 15500 again.
I have no idea what's wrong, as it doesn't throw any error and the code doesn't stop.
Here's the whole script. I put a description of what each part does below each block:
import numpy as np
import random as r
import math as m
import csv
init_size = input("Input array size: ")
size = int(init_size)
This part initializes the size of the 2D array. For observation purposes, I selected a 100 by 100 lattice.
def check_overflow(indx, size):
    if indx == size - 1:
        return -indx
    else:
        return 1
I use this function in calc_h to implement a circular (periodic) boundary condition. Simply put, the edges of the lattice are connected to one another.
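A quick toy check of the wrap-around (not part of the original post): for size = 4 the "next" neighbor of the last index wraps back to 0, while Python's negative indexing (as in pop[r, c-1]) already wraps the "previous" neighbor for free.

size = 4
for c in range(size):
    buffc = check_overflow(c, size)   # -c at the far edge, +1 otherwise
    print(c, '->', c + buffc)         # prints 0->1, 1->2, 2->3, 3->0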
def calc_h(pop, J1, size):
    h_sum = 0
    r = 0
    c = 0
    while r < size:
        buffr = check_overflow(r, size)
        while c < size:
            buffc = check_overflow(c, size)
            h_sum = h_sum + J1*pop[r,c] * pop[r,c-1] * pop[r-1,c] * pop[r+buffr,c] * pop[r,c+buffc]
            c = c + 1
        c = 0
        r = r + 1
    return h_sum
This function calculates the energy of the system by summing, over every cell, the product of the cell's value with those of its top, bottom, left and right neighbors, multiplied by a constant J.
def metropolis(h, h0, T_):
    if h <= h0:
        return 1
    else:
        rand_h = r.random()
        p = m.exp(-(h - h0)/T_)
        if rand_h <= p:
            return 1
        else:
            return 0
This determines whether the change from -1 to 1 is accepted depending on what calc_h gets.
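As a quick sanity check of the acceptance rule (toy numbers, not from the post, reusing the m alias and metropolis defined above): with h - h0 = 2 and T = 4.0, a higher-energy move should be accepted with probability exp(-2/4) ≈ 0.61, and a Monte-Carlo average over many calls should land near that value.

print(m.exp(-(6 - 4) / 4.0))                                        # ~0.6065
print(sum(metropolis(6, 4, 4.0) for _ in range(10000)) / 10000.0)   # should be close to 0.61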
def countSI(pop, sz, iter):
    s = np.count_nonzero(pop == -1)
    i = np.count_nonzero(pop == 1)
    row = [iter, s, i]
    with open('data.txt', 'a') as si_csv:
        tally_data = csv.writer(si_csv)
        tally_data.writerow(row)
        si_csv.seek(0)
This part tallies the number of -1's and 1's in the lattice.
def main():
    J = 1
    T = 4.0
    max_iterations = 150000
    population = np.full((size, size), -1, np.int8) # initialize population array
The 2D array is initialized in population.
    h_0 = calc_h(population, J, size)
    turn = 1
    while turn <= max_iterations:
        inf_x = r.randint(1, size) - 1
        inf_y = r.randint(1, size) - 1
        while population[inf_x, inf_y] == 1:
            inf_x = r.randint(1, size) - 1
            inf_y = r.randint(1, size) - 1
        population[inf_x, inf_y] = 1
        h = calc_h(population, J, size)
        accept_i = metropolis(h, h_0, T)
This is the main loop, where a random cell is selected, and whether the change is accepted or not is determined by the function metropolis.
        if (accept_i == 0):
            population[inf_x, inf_y] = -1
        if turn % 500 == 0:
            countSI(population, size, turn)
The script tallies every 500th iteration.
        turn = turn + 1
        h_0 = h

main()
The expected output is a text file with the tallies of s and i at every 500th iteration, something that looks like this:
500,9736,264
1000,9472,528
1500,9197,803
2000,8913,1087
2500,8611,1389
3000,8292,1708
3500,7968,2032
4000,7643,2357
4500,7312,2688
5000,6960,3040
5500,6613,3387
6000,6257,3743
6500,5913,4087
7000,5570,4430
7500,5212,4788
I have no idea where to start looking for a solution. At first I thought the CSV writing was causing the problem, but probing with the print function proved otherwise. I tried to make the code as concise as I could.
I hope you guys can help! I really wanna learn this language and start simulating a lot of stuff, and I think this mini project is a great starting step for me.
Thanks a lot!

Answer provided by @randomir in the comments:
Your code is probably wrong. It will block in that nested while loop whenever the number of spins to flip is smaller than the number of iterations. In your example from the previous comment, the size of the population is 10000 and you want to flip 15500 spins. Note that once a spin is flipped up (with 100% probability), it will only be flipped down with a smaller probability, due to Metropolis sampling.
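In other words: every accepted change turns one more -1 cell into a 1, and nothing ever turns a 1 back into a -1 (rejections only revert the current attempt). So once all 10000 cells are 1, the inner cell-selection loop can never find a -1 cell and spins forever. A minimal, hypothetical guard (not from the original answer) inside the main while loop would make the hang visible instead:

        # hypothetical guard: stop once no -1 cells remain, because the
        # selection loop below could then never terminate
        if np.count_nonzero(population == -1) == 0:
            print("Lattice saturated at iteration", turn)
            break
        inf_x = r.randint(1, size) - 1
        inf_y = r.randint(1, size) - 1
        while population[inf_x, inf_y] == 1:
            inf_x = r.randint(1, size) - 1
            inf_y = r.randint(1, size) - 1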

Related

Searching for the 'p' value in a coin toss simulation

Newbie to coding, attempting to find the 'p' value in a coin toss simulation.
Currently getting the attribute error:
'int' object has no attribute 'sum'.
How could that be? Please help.
import numpy as np
import random

attempts = 0
t = 0
for I in range(10000):
    attempts = random.randint(2, 30)
    if (attempts.sum >= 22):
        t += 1
p = t / 10000
print(p)
If you are just trying to toss a coin 10,000 times and see how many turn up heads (or tails, if you prefer), then this is a simple way to do it. The random.random function returns a number x such that 0 <= x < 1, so 50% of the time it should be less than .5.
import random

tosses = 100000
t = 0
for i in range(tosses):
    if random.random() < .5:
        t += 1
p = t / tosses
print(p)
attempts is the most recent random integer you generated. An int has no attribute (data field) sum. Since you haven't adequately described what you think your code does, we can't fix the problem.
Python's sum function adds up a sequence of items; see the documentation for examples.
You try to count something with variable m, but you give it no initial value.
You set t to 0, and later divide it by your loop limit, but you've never changed the value; this will be 0.0.
Update after OP comments
I think I understand now: you want to estimate the probability of getting at least 22 heads (or whatever side you choose) in a set of 30 tosses of a fair coin. I'll do my best to utilize your original code.
First of all, you have to toss a fair coin; the function call you made generates a random integer in the range [2, 30]. Instead, you need a call such as the one below, made in groups of 30:
flip = random.randint(0,1)
This gives you a 0 or 1. Let's assume that we want to count 1 results: this allows us to simply add the series:
count = sum(random.randint(0,1) for _ in range(30))
This will loop 30 times, put the results in a list, and add them up; there's your count of desired flips. Now, do 10,000 of those 30-flip groups, checking each for at least 22 heads:
import random

t = 0
for i in range(10000):
    count = sum(random.randint(0, 1) for _ in range(30))
    if (count >= 22):
        t += 1
p = t / 10000
print(p)
Now, if you want to tighten this even more, use the fact that a successful comparison (i.e. True) will evaluate to 1; False will be 0: make all 10,000 trials in an outer comprehension (in-line for):
t = sum(
    sum(random.randint(0, 1) for flip in range(30)) >= 22
    for trial in range(10000))
print(t / 10000)
flip and trial are dummy loop variables; you can use whatever two you like.
Finally, it's usually better style to make named variables for your algorithm's parameters, such as
threshold = 22
trial_limit = 10000
flip_limit = 30
and use those names in your code.
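Putting that together, a quick sketch of the whole simulation with named parameters (the names are just the suggestions above, nothing standard):

import random

threshold = 22       # minimum number of heads that counts as a success
trial_limit = 10000  # number of flip groups to simulate
flip_limit = 30      # tosses per group

t = sum(
    sum(random.randint(0, 1) for _ in range(flip_limit)) >= threshold
    for _ in range(trial_limit))
print(t / trial_limit)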

Processing big list using python

So I'm trying to solve a challenge and have come across a dead end. My solution works when the list is small or medium, but when it has over 50000 elements it just times out.
a = int(input().strip())
b = list(map(int, input().split()))
result = []
flag = []
for i in range(len(b)):
    temp = a - b[i]
    if (temp >= 0 and temp in flag):
        if (temp < b[i]):
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flag.remove(temp)
    else:
        flag.append(b[i])
    result.sort()
for i in result:
    print(i[0], i[1])
where
a = 10
and b = [2, 4, 6, 8, 5]
The solution should find every pair of elements in b that sums to a.
**Edit: ** Updated full code
flag is a list, potentially of the same order of magnitude as b. So, when you do temp in flag, that's a linear search: it has to check every value in flag to see if that value is == temp. So, that's 50,000 comparisons. And you're doing that once per element in a linear walk over b. So, your total time is quadratic: 50,000 * 50,000 = 2,500,000,000. (And flag.remove is also linear time.)
If you replace flag with a set, you can test it for membership (and remove from it) in constant time. So your total time drops from quadratic to linear, or 50,000 steps, which is a lot faster than 2 billion:
flagset = set(flag)
for i in range(len(b)):
temp = a - b[i]
if(temp >=0 and temp in flagset):
if(temp<b[i]):
result.append((temp,b[i]))
else:
result.append((b[i],temp))
flagset.remove(temp)
else:
flagset.add(b[i])
flag = list(flagset)
If flag needs to retain duplicate values, then it's a multiset, not a set, which you can implement with a Counter:
import collections

flagset = collections.Counter(flag)
for i in range(len(b)):
    temp = a - b[i]
    if (temp >= 0 and flagset[temp]):
        if (temp < b[i]):
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flagset[temp] -= 1
    else:
        flagset[b[i]] += 1
flag = list(flagset.elements())
In your edited code, you've got another list that's potentially of the same size, result, and you're sorting that list every time through the loop.
Sorting takes log-linear time. Since you do it up to 50,000 times, that's around log(50,000) * 50,000 * 50,000, or roughly 30 billion steps.
If you needed to keep result in order throughout the operation, you'd want to use a logarithmic data structure, like a binary search tree or a skiplist, so you could insert each new element in the right place in logarithmic time, which would mean just around 800,000 steps.
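For example, the third-party sortedcontainers package (not in the standard library; pip install sortedcontainers) offers a SortedList with roughly logarithmic inserts; a tiny sketch:

from sortedcontainers import SortedList

result = SortedList()
result.add((4, 6))
result.add((2, 8))
print(list(result))  # [(2, 8), (4, 6)] -- kept sorted on every insert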
But you don’t need it in order until the end. So, much more simply, just move the result.sort out of the loop and do it at the end.

Append keeps on appending the same item, does not append the right ones, Python

This is what I have imported:
import random
import matplotlib.pyplot as plt
from math import log, e, ceil, floor
import numpy as np
from numpy import arange,array
import pdb
from random import randint
Here I define the function matrix(p,m)
def matrix(p, m):  # A matrix with zeros everywhere, except in the middle entry of every row
    v = [0]*m
    v[(m+1)/2 - 1] = 1
    vv = array([v,]*p)
    return vv
ct = np.zeros(5) # Here I chose 5 because I wanted to work with an example, but it should be p in general
Here I define MHops which basically takes the dimensions of the matrix, the matrix and the vector ct and gives me a new matrix mm and a new vector ct
def MHops(p, m, mm, ct):
    k = 0
    while k < p:  # This 'spans' the rows
        i = 0
        while i < m:  # This 'spans' the columns
            if mm[k][i] == 0:
                i += 1
            else:
                R = random.random()
                t = -log(1-R, e)  # Calculate time of the hopping
                ct[k] = ct[k] + t
                r = random.random()
                if 0 <= r < 0.5:  # particle hops right
                    if 0 <= i < m-1:
                        mm[k][i] = 0
                        mm[k][i+1] = 1
                        break
                    else:
                        break  # Because it is at the boundary
                else:  # particle hops left
                    if 0 < i <= m-1:
                        mm[k][i] = 0
                        mm[k][i-1] = 1
                        break
                    else:  # Because it is at the boundary
                        break
                break
        k += 1
    return (mm, ct)  # Gives me the new matrix showing the new positions of the particles and a new vector of times, showing the time taken by each particle to hop
Now what I want to do is iterate this process, but I want to be able to visualize every step in a list. In short, what I am doing is:
1. creating a matrix representing a lattice, where 0 means there is no particle in that slot and 1 means there is a particle there.
2. creating a function MHops which simulates a random walk of one step and gives me the new matrix and a vector ct which shows the times at which the particles move.
Now I want a vector or an array with 2*n objects, i.e. the matrix mm and the vector ct for each of n iterations. I want them in an array, list, or something like that because I need to use them later on.
Here starts my problem:
I create an empty list and use append to add items at every iteration of the while loop. However, the result I get is a list d of n identical objects, all coming from the last iteration!
Hence my function for the iteration is the following:
def rep_MHops(n, p, m, mm, ct):
    mat = mm
    cct = ct
    d = []
    i = 0
    while i < n:
        y = MHops(p, m, mat, cct)  # Calculate the hop, so y is a tuple y = (mm, ct)
        mat = y[0]  # I reset mat and cct so that for the next iteration, I go further
        cct = y[1]
        d.append(mat)
        d.append(cct)
        i += 1
    return d

z = rep_MHops(3, 5, 5, matrix(5, 5), ct)  # If you check this, it doesn't work
print z
However it doesn't work, and I don't understand why. What I am doing is calling MHops, then setting the new matrix and the new vector to those in its output, and doing this again. If you run this code you will see that y works, i.e. the vector of times increases and the lattice matrix changes; however, when I append these to d, d is basically a list of n equal objects, where each object is the last iteration.
What is my mistake?
Furthermore, if you have any coding advice for this code it would be more than welcome; I am not sure this is an efficient way.
Just so you understand better: I would like to use the final vector d in another function where, first of all, I pick a random time T, then I check every odd entry (every ct), and hence every entry of every ct, and see whether these numbers are less than or equal to T. If so, the movement of that particle happened; otherwise it didn't.
From this I will then try to visualize the result with matplotlib, using a histogram or something similar.
Does anyone know how to run this kind of simulation in MATLAB? Do you think it would be easier?
You're passing and storing by references not copies, so on the next iteration of your loop MHops alters your previously stored version in d. Use import copy; d.append(copy.deepcopy(mat)) to instead store a copy which won't be altered later.
Why?
Python is passing the list by reference, and every loop you're storing a reference to the same matrix object in d.
I had a look through python docs, and the only mention I can find is
"how do i write a function with output parameters (call by reference)".
Here's a simpler example of your code:
def rep_MHops(mat_init):
    mat = mat_init
    d = []
    for i in range(5):
        mat = MHops(mat)
        d.append(mat)
    return d

def MHops(mat):
    mat[0] += 1
    return mat

mat_init = [10]
z = rep_MHops(mat_init)
print(z)
When run gives:
[[15], [15], [15], [15], [15]]
Python passes every argument as a reference to an object; the difference is that a list is mutable, so in-place changes made inside the function are visible to the caller, while an integer is immutable and cannot be changed in place. Here's a slightly modified version of the above example which operates on a single integer:
def rep_MHops_simple(mat_init):
    mat = mat_init
    d = []
    for i in range(5):
        mat = MHops_simple(mat)
        d.append(mat)
    return d

def MHops_simple(mat):
    mat += 1
    return mat

z = rep_MHops_simple(mat_init=10)
print(z)
When run gives:
[11, 12, 13, 14, 15]
which is the behaviour you were expecting.
This SO answer How do I pass a variable by reference? explains it very well.
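Applied to the question's code, the fix is small (a sketch, assuming the MHops defined in the question; copies of ct are stored too, since it is also mutated in place):

import copy

def rep_MHops(n, p, m, mm, ct):
    mat = mm
    cct = ct
    d = []
    for i in range(n):
        mat, cct = MHops(p, m, mat, cct)
        d.append(copy.deepcopy(mat))  # snapshot of the lattice at this step
        d.append(copy.deepcopy(cct))  # snapshot of the hop times
    return d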

Simple Genetic Algorithm meeting local optimum for "Hello World"

My target was simple: using a genetic algorithm to reproduce the classical "Hello, World" string.
My code was based on this post. The code mainly contains 4 parts:
1. Generate the population, which consists of several different individuals.
2. Define the fitness and grade functions, which evaluate how good or bad an individual is based on comparison with the target.
3. Filter the population and keep len(pop)*retain individuals.
4. Add some other individuals and mutate randomly.
The parents' DNA is passed on to their children to comprise the whole population.
I modified the code so it looks like this:
import numpy as np
import string
from operator import add
from random import random, randint

def population(GENSIZE, target):
    p = []
    for i in range(0, GENSIZE):
        individual = [np.random.choice(list(string.printable[:-5])) for j in range(0, len(target))]
        p.append(individual)
    return p

def fitness(source, target):
    fitval = 0
    for i in range(0, len(source)-1):
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)

def grade(pop, target):
    'Find average fitness for a population.'
    summed = reduce(add, (fitness(x, target) for x in pop))
    return summed / (len(pop) * 1.0)

def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
    graded = [(fitness(x, target), x) for x in pop]
    graded = [x[1] for x in sorted(graded)]
    retain_length = int(len(graded)*retain)
    parents = graded[:retain_length]
    # randomly add other individuals to
    # promote genetic diversity
    for individual in graded[retain_length:]:
        if random_select > random():
            parents.append(individual)
    # mutate some individuals
    for individual in parents:
        if mutate > random():
            pos_to_mutate = randint(0, len(individual)-1)
            individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
    # crossover parents to create children
    parents_length = len(parents)
    desired_length = len(pop) - parents_length
    children = []
    while len(children) < desired_length:
        male = randint(0, parents_length-1)
        female = randint(0, parents_length-1)
        if male != female:
            male = parents[male]
            female = parents[female]
            half = len(male) / 2
            child = male[:half] + female[half:]
            children.append(child)
    parents.extend(children)
    return parents

GENSIZE = 40
target = "Hello, World"
p = population(GENSIZE, target)
fitness_history = [grade(p, target),]
for i in xrange(20):
    p = evolve(p, target)
    fitness_history.append(grade(p, target))
# print p
for datum in fitness_history:
    print datum
But it seems that the result can't fit the target well.
I tried changing GENSIZE and the loop count (more generations).
But the result always gets stuck. Sometimes increasing the loop count helps find an optimal solution, but when I change the loop count to a much larger number, like for i in xrange(10000), the program fails with:
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
ValueError: chr() arg not in range(256)
Anyway, how can I modify my code to get a good result?
Any advice would be appreciated.
The chr function in Python2 only accepts values in the range 0 <= i < 256.
You are passing:
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
So you need to check that the result of
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
is not going to be outside that range, and take corrective action before passing to chr if it is outside that range.
EDIT
A reasonable fix for the ValueError might be to take the amended value modulo 256 before passing to chr:
chr((ord(individual[pos_to_mutate]) + np.random.randint(-1, 1)) % 256)
There is another bug: the fitness calculation doesn't take the final element of the candidate list into account: it should be:
def fitness(source, target):
    fitval = 0
    for i in range(0, len(source)):  # <- len(source), not len(source) - 1
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)
Given that source and target must be of equal length, the function can be written as:
def fitness(source, target):
    return sum((ord(t) - ord(s)) ** 2 for (t, s) in zip(target, source))
The real question was, why doesn't the code provided evolve random strings until the target string is reached.
The answer, I believe, is it may, but will take a lot of iterations to do so.
Consider, in the blog post referenced in the question, each iteration generates a child which replaces the least fit member of the gene pool if the child is fitter. The selection of the child's parent is biased towards fitter parents, increasing the likelihood that the child will enter the gene pool and increase the overall "fitness" of the pool. Consequently the members of the gene pool converge on the desired result within a few thousand iterations.
In the code in the question, the probability of mutation is much lower given the initial conditions, that is, the defaults of the evolve function.
Parents that are retained have only a 1% chance of mutating, and since np.random.randint(-1, 1) draws from {-1, 0} (the upper bound is exclusive), half of those "mutations" change nothing and the rest can only ever decrease a character's code.
Discarded parents are replaced by individuals created by merging two retained individuals. Since only 20% of parents are retained, the population can converge on a local minimum where each new child is effectively a copy of an existing parent, and so no diversity is introduced.
So apart from fixing the two bugs, the way to converge more quickly on the target is to experiment with the initial conditions and to consider changing the code in the question to inject more diversity, for example by mutating children as in the original blog post, or by extending the range of possible mutations.
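A hedged sketch of one way to do that (the helper name is made up, not from the blog post): mutate children too, and let a mutation replace a character with any printable one, so changes can move in either direction.

import string
from random import random, randint, choice

def mutate_individual(individual, rate=0.05):
    # with probability `rate`, overwrite one random position with a random
    # printable character (the same alphabet the population uses)
    if rate > random():
        pos = randint(0, len(individual) - 1)
        individual[pos] = choice(string.printable[:-5])
    return individual

# e.g. inside evolve(), just before parents.extend(children):
# children = [mutate_individual(child) for child in children]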

Optimization error on random word splitting function Python

I wrote a word splitting function. It splits a word into random contiguous chunks. For example, if the input is 'runtime', each of the outputs below is possible:
['runtime']
['r','untime']
['r','u','n','t','i','m','e']
....
But its runtime is very high when I want to split 100k words. Do you have any suggestions for optimizing it or writing it smarter?
def random_multisplitter(word):
    from numpy import mod
    spw = []
    length = len(word)
    rand = random_int(word)  # random_int is a helper not shown in the question
    if rand == length:  # probability of not splitting
        return [word]
    else:
        div = mod(rand, (length + 1))  # defining division points
        bound = length - div
        spw.append(div)
        while div != 0:
            rand = random_int(word)
            div = mod(rand, (bound + 1))
            bound = bound - div
            spw.append(div)
        result = spw
        b = 0
        points = []
        for x in range(len(result) - 1):  # calculating splitting points
            b = b + result[x]
            points.append(b)
        xy = 0
        t = []
        for i in points:
            t.append(word[xy:i])
            xy = i
        if word[xy:len(word)] != '':
            t.append(word[xy:len(word)])
        if type(t) != list:
            return [t]
        return t
I don't get what you're doing there, but the outcomes are definitely not all equally probable for your code. Therefore the code is not working, and StackOverflow is the right place even if you didn't know it.
How do I know that your code doesn't work? The Law of Large Numbers! It looked suspicious, so I generated one million samples with your function and plotted the distribution of outcomes (figure omitted; on a logarithmic y-axis, the estimated probabilities varied a lot!).
So now some code that is both a lot faster and actually producing equally probable results:
import random

def random_multisplitter(word):
    # add's bits will tell whether a char shall be added to the last
    # substring or be the beginning of its own substring
    add = random.randint(0, 2**len(word) - 1)
    # append a 0 bit to make sure the first char starts the first substring
    add <<= 1
    res = []
    for char in word:
        # see if the last bit is 1
        if add & 1:
            res[-1] += char
        else:
            res.append(char)
        # shift to the next bit
        add >>= 1
    return res
This is what Blckknght suggested and, believe it or not, I had the same idea about an hour before they posted their comment, but I had no time to write this answer.
Anyway, here are the estimated probabilities of that function (figure omitted): all gathered around 1/64 = 0.015625, suggesting that the probability distribution is uniform.
The timings on my machine using python2.7 are 4.56 µs for this function and 20.1 µs for your function.
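For reference, a quick usage sketch of the bit-mask version (output varies per call, since the split is random):

print(random_multisplitter('runtime'))  # e.g. ['r', 'unt', 'ime']
print(random_multisplitter('runtime'))  # e.g. ['runt', 'ime']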
