Matrix of variable size [i x j] (Python, Numpy)

Matrix of variable size [i x j] (Python, Numpy) - python

I am attempting to build a simple genetic algorithm that will optimize to an input string, but am having trouble building the [individual x genome] matrix (row n is individual n's genome.) I want to be able to change the population size, mutation rate, and other parameters to study how that affects convergence rate and program efficiency.
This is what I have so far:
import random
import itertools
import numpy as np
def evolve():
goal = 'Hello, World!' #string to optimize towards
ideal = list(goal)
#converting the string into a list of integers
for i in range (0,len(ideal)):
ideal [i] = ord(ideal[i])
print(ideal)
popSize = 10 #population size
genome = len(ideal) #determineing the length of the genome to be the length of the target string
mut = 0.03 #mutation rate
S = 4 #tournament size
best = float("inf") #initial best is very large
maxVal = max(ideal)
minVal = min(ideal)
print (maxVal)
i = 0 #counting variables assigned to solve UnboundLocalError
j = 0
print(maxVal, minVal)
#constructing initial population array (individual x genome)
pop = np.empty([popSize, len(ideal)])
for i, j in itertools.product(range(i), range(j)):
pop[i, j] = [i, random.randint(minVal,maxVal)]
print(pop)
This produces a matrix of the population size with the correct genome length, but the genomes are something like:
[ 6.91364167e-310 6.91364167e-310 1.80613009e-316 1.80613009e-316
5.07224590e-317 0.00000000e+000 6.04100487e+151 3.13149876e-120
1.11787892e+253 1.47872844e-028 7.34486815e+223 1.26594941e-118
7.63858409e+228]
I need them to be random integers corresponding to random ASCII characters .
What am I doing wrong with this method?
Is there a way to make this faster?
I found my current method here:
building an nxn matrix in python numpy, for any n
I found another method that I do not understand, but seems faster and simper, if I can use it here I would like to.
Initialise numpy array of unknown length
Thank you for any assistance you can provide.

Your loop isn't executing because i and j are both 0, so range(i) and range(j) are empty. Also you can't assign a list [i,random] to an array value (np.empty defaults to np.float64). I've simply changed it to only store the random number, but if you really want to store a list, you can change the creation of pop to be pop = np.empty([popSize, len(ideal)],dtype=list)
Otherwise use this for the last lines:
for i, j in itertools.product(range(popSize), range(len(ideal))):
pop[i, j] = random.randint(minVal,maxVal)

Related

Analyzing the complexity matrix path-finding

Recently in my homework, I was assinged to solve the following problem:
Given a matrix of order nxn of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that go only through zeros (they are not necessarily disjoint) where you could only walk down or to the right, never up or left. Return a matrix of the same order where the [i,j] entry is the number of paths in the original matrix that go through [i,j], the solution has to be recursive.
My solution in python:
def find_zero_paths(M):
n,m = len(M),len(M[0])
dict = {}
for i in range(n):
for j in range(m):
M_top,M_bot = blocks(M,i,j)
X,Y = find_num_paths(M_top),find_num_paths(M_bot)
dict[(i,j)] = X*Y
L = [[dict[(i,j)] for j in range(m)] for i in range(n)]
return L[0][0],L
def blocks(M,k,l):
n,m = len(M),len(M[0])
assert k<n and l<m
M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
M_bot = [[M[i][j] for i in range(k,n)] for j in range(l,m)]
return [M_top,M_bot]
def find_num_paths(M):
dict = {(1, 1): 1}
X = find_num_mem(M, dict)
return X
def find_num_mem(M,dict):
n, m = len(M), len(M[0])
if M[n-1][m-1] != 0:
return 0
elif (n,m) in dict:
return dict[(n,m)]
elif n == 1 and m > 1:
new_M = [M[0][:m-1]]
X = find_num_mem(new_M,dict)
dict[(n,m-1)] = X
return X
elif m == 1 and n>1:
new_M = M[:n-1]
X = find_num_mem(new_M, dict)
dict[(n-1,m)] = X
return X
new_M1 = M[:n-1]
new_M2 = [M[i][:m-1] for i in range(n)]
X,Y = find_num_mem(new_M1, dict),find_num_mem(new_M2, dict)
dict[(n-1,m)],dict[(n,m-1)] = X,Y
return X+Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix is equal to the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys are matricies of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1 where M is the original matrix and whose values are the number of paths from the top of the matrix to the bottom. After analizing the complexity of my code I arrived at the conclusion that it is O(n^6).
Now, my instructor said this code is exponential (for find_zero_paths), however, I disagree.
The recursion tree (for find_num_paths) size is bounded by the number of submatrices of the form above which is O(n^2). Also, each time we add a new matrix to the dictionary we do it in polynomial time (only slicing lists), SO... the total complexity is polynomial (poly*poly = poly). Also, the function 'blocks' runs in polynomial time, and hence 'find_zero_paths' runs in polynomial time (2 lists of polynomial-size times a function which runs in polynomial time) so all in all the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound is wrong or is it exponential and I am missing something?

Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, as quick note. Please don't use dict as a variable name. It hurts ^^. Dict is a reserved keyword for a dictionary constructor in python. It is a bad practice to overwrite it with your variable.
First, your approach of counting M_top * M_bottom is good, if you were to compute only one cell in the matrix. In the way you go about it, you are unnecessarily computing some blocks over and over again - that is why I pondered about the recursion, I would use dynamic programming for this one. Once from the start to end, once from end to start, then I would go and compute the products and be done with it. No need for O(n^6) of separate computations. Sine you have to use recursion, I would recommend caching the partial results and reusing them wherever possible.
Second, the root of the issue and the cause of your invisible-ish exponent. It is hidden in the find_num_mem function. Say you compute the last element in the matrix - the result[N][N] field and let us consider the simplest case, where the matrix is full of zeroes so every possible path exists.
In the first step, your recursion creates branches [N][N-1] and [N-1][N].
In the second step, [N-1][N-1], [N][N-2], [N-2][N], [N-1][N-1]
In the third step, you once again create two branches from every previous step - a beautiful example of an exponential explosion.
Now how to go about it: You will quickly notice that some of the branches are being duplicated over and over. Cache the results.

amplitude spectrum in Python

I have a given array with a length of over 1'000'000 and values between 0 and 255 (included) as integers. Now I would like to plot on the x-axis the integers from 0 to 255 and on the y-axis the quantity of the corresponding x value in the given array (called Arr in my current code).
I thought about this code:
list = []
for i in range(0, 256):
icounter = 0
for x in range(len(Arr)):
if Arr[x] == i:
icounter += 1
list.append(icounter)
But is there any way I can do this a little bit faster (it takes me several minutes at the moment)? I thought about an import ..., but wasn't able to find a good package for this.

Use numpy.bincount for this task (look for more details here)
import numpy as np
list = np.bincount(Arr)

While I completely agree with the previous answers that you should use a standard histogram algorithm, it's quite easy to greatly speed up your own implementation. Its problem is that you pass through the entire input for each bin, over and over again. It would be much faster to only process the input once, and then write only to the relevant bin:
def hist(arr):
nbins = 256
result = [0] * nbins # or np.zeroes(nbins)
for y in arr:
if y>=0 and y<nbins:
result[y] += 1
return result

Reset array in a nested loop while saving array at each iteration

I am running some simulations that take a numpy array as input, continue for several iterations until some condition is met (stochastically), and then repeat again using the same starting array. Each complete simulation starts with the same array and the number of steps to complete each simulation isn't known beforehand (but it's fine to put a limit on it, maxt).
I would like to save the array (X) after each step through each simulation, preferably in a large multi-dimensional array. Below I'm saving each simulation output in a list, and saving the array with copy.copy. I appreciate there are other methods I could use (using tuples for instance) so is there a more efficient method for doing this in Python?
Note: I appreciate this is a trivial example and that one could vectorize the code below. In the actual application being used, however, I have to use a loop as the stochasticity is introduced in a more complicated manner.
import numpy as np, copy
N = 10
Xstart = np.zeros(N)
Xstart[0] = 1
num_sims = 3
prob = 0.2
maxt = 20
analysis_result = []
for i in range(num_sims):
print("-------- Starting new simulation --------")
t = 0
X = copy.copy(Xstart)
# Create a new array to store results, save the array
sim_result = np.zeros((maxt, N))
sim_result[t,:] = X
while( (np.count_nonzero(X) < N) & (t < maxt) ):
print(X)
# Increment elements of the array stochastically
X[(np.random.rand(N) < prob)] += 1
# Save the array for time t
sim_result[t,:] = copy.copy(X)
t += 1
print(X)
analysis_result.append(sim_result[:t,:])

Append keeps on appending the same item, does not append the right ones, Python

This is what I have imported:
import random
import matplotlib.pyplot as plt
from math import log, e, ceil, floor
import numpy as np
from numpy import arange,array
import pdb
from random import randint
Here I define the function matrix(p,m)
def matrix(p,m): # A matrix with zeros everywhere, except in every entry in the middle of the row
v = [0]*m
v[(m+1)/2 - 1] = 1
vv = array([v,]*p)
return vv
ct = np.zeros(5) # Here, I choose 5 cause I wanted to work with an example, but should be p in general
Here I define MHops which basically takes the dimensions of the matrix, the matrix and the vector ct and gives me a new matrix mm and a new vector ct
def MHops(p,m,mm,ct):
k = 0
while k < p : # This 'spans' the rows
i = 0
while i < m : # This 'spans' the columns
if mm[k][i] == 0 :
i+=1
else:
R = random.random()
t = -log(1-R,e) # Calculate time of the hopping
ct[k] = ct[k] + t
r = random.random()
if 0 <= r < 0.5 : # particle hops right
if 0 <= i < m-1:
mm[k][i] = 0
mm[k][i+1] = 1
break
else:
break # Because it is at the boundary
else: # particle hops left
if 0 < i <=m-1:
mm[k][i] = 0
mm[k][i-1] = 1
break
else: # Because it is at the boundary
break
break
k+=1
return (mm,ct) # Gives me the new matrix showing the new position of the particles and a new vector of times, showing the times taken by each particle to hop
Now what I wanna do is iterating this process, but I wanna be able to visualize every step in a list. In short what I am doing is:
1. creating a matrix representing a lattice, where 0 means there is no particle in that slot and 1 means there is a particle there.
2. create a function MHops which simulate a random walk of one step and gives me the new matrix and a vector ct which shows the times at which the particles move.
Now I want to have a vector or an array where I have 2*n objects, i.e. the matrix mm and the vector ct for n iterations. I want the in a array, list or something like this cause I need to use them later on.
Here starts my problem:
I create an empty list, I use append to append items at every iteration of the while loop. However the result that I get is a list d with n equal objects coming from the last iteration!
Hence my function for the iteration is the following:
def rep_MHops(n,p,m,mm,ct):
mat = mm
cct = ct
d = []
i = 0
while i < n :
y = MHops(p,m,mat,cct) # Calculate the hop, so y is a tuple y = (mm,ct)
mat = y[0] # I reset mat and cct so that for the next iteration, I go further
cct = y[1]
d.append(mat)
d.append(cct)
i+=1
return d
z = rep_MHops(3,5,5,matrix(5,5),ct) #If you check this, it doesn't work
print z
However it doesn't work, I don't understand why. What I am doing is using MHops, then I want to set the new matrix and the new vector as those in the output of MHops and doing this again. However if you run this code, you will see that v works, i.e. the vector of the times increases and the matrix of the lattice change, however when I append this to d, d is basically a list of n equal objects, where the object are the last iteration.
What is my mistake?
Furthermore if you have any coding advice for this code, they would be more than welcome, I am not sure this is an efficient way.
Just to let you understand better, I would like to use the final vector d in another function where first of all I pick a random time T, then I would basically check every odd entry (every ct) and hence check every entry of every ct and see if these numbers are less than or equal to T. If this happens, then the movement of the particle happened, otherwise it didn't.
From this then I will try to visualize with matpotlibt the result with an histogram or something similar.
Is there anyone who knows how to run this kind of simulation in matlab? Do you think it would be easier?

You're passing and storing by references not copies, so on the next iteration of your loop MHops alters your previously stored version in d. Use import copy; d.append(copy.deepcopy(mat)) to instead store a copy which won't be altered later.
Why?
Python is passing the list by reference, and every loop you're storing a reference to the same matrix object in d.
I had a look through python docs, and the only mention I can find is
"how do i write a function with output parameters (call by reference)".
Here's a simpler example of your code:
def rep_MHops(mat_init):
mat = mat_init
d = []
for i in range(5):
mat = MHops(mat)
d.append(mat)
return d
def MHops(mat):
mat[0] += 1
return mat
mat_init = [10]
z = rep_MHops(mat_init)
print(z)
When run gives:
[[15], [15], [15], [15], [15]]
Python only passes mutable objects (such as lists) by reference. An integer isn't a mutable object, here's a slightly modified version of the above example which operates on a single integer:
def rep_MHops_simple(mat_init):
mat = mat_init
d = []
for i in range(5):
mat = MHops_simple(mat)
d.append(mat)
return d
def MHops_simple(mat):
mat += 1
return mat
z = rep_MHops_simple(mat_init=10)
print(z)
When run gives:
[11, 12, 13, 14, 15]
which is the behaviour you were expecting.
This SO answer How do I pass a variable by reference? explains it very well.

Compress an array in python?

Is there a way to "compress" an array in python so as to keep the same range but simply decrease the number of elements to a given value?
For example I have an array with 1000 elements and I want to modify it to have 100. Specifically I have a numpy array that is
x = linspace(-1,1,1000)
But because of the way in which I am using it in my project, I can't simply recreate it using linspace as it will not always be in the domain of -1 to 1 and have 1000 elements. These parameters change and I don't have access to them in the function I am defining. So I need a way to compress the array while keeping the -1 to 1 mapping. Think of it as decreasing the "resolution" of the array. Is this possible with any built in functions or different libraries?

A simple way to "resample" your array is to group it into chunks, then average each chunk:
(Chunking function is from this answer)
# Chunking function
def chunks(l, n):
for i in xrange(0, len(l), n):
yield l[i:i+n]
# Resampling function
def resample(arr, newLength):
chunkSize = len(arr)/newLength
return [np.mean(chunk) for chunk in chunks(arr, chunkSize)]
# Example:
import numpy as np
x = np.linspace(-1,1,15)
y = resample(x, 5)
print y
# Result:
# [-0.85714285714285721, -0.4285714285714286, -3.7007434154171883e-17, 0.42857142857142844, 0.8571428571428571]
As you can see, the range of the resampled array does drift inward, but this effect would be much smaller for larger arrays.
It's not clear to me whether the arrays will always be generated by numpy.linspace or not. If so, there are simpler ways of doing this, like simply picking each nth member of the original array, where n is determined by the "compression" ratio:
def linearResample(arr, newLength):
spacing = len(arr) / newLength
return arr[::spacing]

You could pick items at random to reduce any bias you have in the reduction. If the original sample is unordered it would just be:
import random
sample = range(1000)
def reduce(sample, count):
work = sample[:]
random.shuffle(work)
return work[:count]
If order matters, then use enum to track position and reassemble
def reduce(sample, count):
indexed = [item for item in enumerate(sample)]
random.shuffle(indexed)
trimmed = indexed[:count]
trimmed.sort()
return [item for index,item in trimmed]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Matrix of variable size [i x j] (Python, Numpy) - python

Related

Analyzing the complexity matrix path-finding

amplitude spectrum in Python

Reset array in a nested loop while saving array at each iteration

Append keeps on appending the same item, does not append the right ones, Python

Compress an array in python?

Categories

Resources