I am trying to implement a simple mapping from the rows of a 2-D NumPy array to a set of values.
For each row in the array I need to pick the value that corresponds to it and append it to an array.
For example:
[0, 1, 0, 0] -> 3
...
[1, 0, 1, 0] -> 2
But my first implementation made me wonder whether I was doing something wrong or very inefficient given the size of my dataset, so I wrote this workaround, which avoids explicit for loops and tries to speed up execution with a dictionary lookup.
import numpy as np
# function to perform the search and return the index accordingly (it is supposed to be fast because of data structure)
def get_val(n):
    map_list = {0: [0, 1, 0], 1: [0, 1, 0], 2: [1, 0, 0], 3: [0, 0, 1]}
    map_vals = list(map_list.values())
    index = map_vals.index(list(n))
    return index
# set of arbitrary arrays
li = np.array([[0, 1, 0], [0, 0, 1]])
# here is the performance improvement attempt with the help of the function above
arr = [get_val(n) for n in li]
print(arr)
I'm not completely sure if this is the correct way to do it for getting the needed value for a set like this. If there is a better way, please let me know.
Otherwise, I refer to my main question:
what is the best way possible to optimize the code?
Thanks so much for your help.
You can try using matrix multiplication (dot product):
a=np.array([[0, 0, 0],[0, 1, 0], [1, 0, 0], [0, 0, 1]]) # dict values
c=np.array([0,1,2,3]) # dict keys
li = np.array([[0, 1, 0], [0, 0, 1]])
b=np.linalg.pinv(a)@c # decoding table
result=li@b
print(result)
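If the rows only ever contain 0s and 1s, another option is to pack each row into a single integer and use it as an index into a lookup table. The sketch below is only an illustration under that assumption; the names `weights`, `codes` and `table` are made up here and do not come from the question.

```python
import numpy as np

# Same data as in the answer above: dict keys and the bit patterns they map from.
keys = np.array([0, 1, 2, 3])
patterns = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
li = np.array([[0, 1, 0], [0, 0, 1]])

# Assumption: every row is binary, so it can be read as a base-2 number.
weights = 1 << np.arange(patterns.shape[1])[::-1]   # [4, 2, 1]
codes = patterns @ weights                          # one integer code per pattern

# Lookup table indexed by code; -1 marks bit patterns that have no key.
table = np.full(1 << patterns.shape[1], -1)
table[codes] = keys

result = table[li @ weights]
print(result)
```

Unlike the pseudo-inverse approach, this keeps the result as integers and flags unknown rows with -1 instead of returning an arbitrary number.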
I need to calculate the eigenvalues of an 8x8 matrix and plot each eigenvalue as a function of a symbolic variable occurring in the matrix. For the matrix I'm using, I get 8 different eigenvalues, each of which is a function of "W", my symbolic variable.
Using Python, I tried calculating the eigenvalues with SciPy and SymPy. That more or less worked, but the results are stored in a way I find confusing (I'm a newbie who doesn't understand much about programming yet), and I couldn't find a way to extract a single eigenvalue in order to plot it.
import numpy as np
import sympy as sp
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
[2*W, 4*wl, 0, 0, 0, 0, 0, 0],
[0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
[0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
[0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
[np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
[0, 0, 0, 0, 0, 0, w0, W],
[0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
eva1p = evaRR[0] # <- this is my try to refer to the first eigenvalue
In the end I hope to get a plot over "W", where the interesting range is [-0.002, 0.002]. For those interested, this is about atomic physics: W is the Rabi frequency, and I'm looking at so-called dressed states.
You're not doing anything incorrectly -- I think you're just getting caught up because your eigenvalues look so jumbled and complicated.
import numpy as np
import sympy as sp
import matplotlib.pyplot as plt
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
[2*W, 4*wl, 0, 0, 0, 0, 0, 0],
[0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
[0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
[0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
[np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
[0, 0, 0, 0, 0, 0, w0, W],
[0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
# The above is copied from your question
# We have to answer what exactly the eigenvalue is in this case
print(type(evaRR[0])) # >>> Piecewise
# Okay, so it's a piecewise function (link to documentation below).
# In the documentation we see that we can use the .subs method to evaluate
# the piecewise function by substituting a symbol for a value. For instance,
print(evaRR[0].subs(W, 0)) # Will substitute 0 for W
# This prints out something really nasty with tons of fractions..
# We can evaluate this mess with sympy's numerical evaluation method, N
print(sp.N(evaRR[0].subs(W, 0)))
# >>> 0.00222190090611143 - 6.49672880062804e-34*I
# That's looking more like it! Notice the e-34 exponent on the imaginary part...
# I think it's safe to assume we can just trim that off.
# This is done by setting the chop keyword to True when using N:
print(sp.N(evaRR[0].subs(W, 0), chop=True)) # >>> 0.00222190090611143
# Now let's try to plot each of the eigenvalues over your specified range
fig, ax = plt.subplots(3, 3) # 3x3 grid of plots (for our 8 e.vals)
ax = ax.flatten() # This is so we can index the axes easier
plot_range = np.linspace(-0.002, 0.002, 10) # Range from -0.002 to 0.002 with 10 steps
for n in range(8):
    current_eigenval = evaRR[n]
    # There may be a way to vectorize this computation, but I'm not familiar enough with sympy.
    evaluated_array = np.zeros(np.size(plot_range))
    # This will be our Y-axis (the eigenvalue at each W). It is set to be the same shape as
    # plot_range and is initially filled with all zeros.
    for i in range(np.size(plot_range)):
        evaluated_array[i] = sp.N(current_eigenval.subs(W, plot_range[i]),
                                  chop=True)
        # The above line is evaluating your eigenvalue at a specific point,
        # approximating it numerically, and then chopping off the imaginary part.
    ax[n].plot(plot_range, evaluated_array, "c-")
    ax[n].set_title("Eigenvalue #{}".format(n))
    ax[n].grid()
plt.tight_layout()
plt.show()
And as promised, the Piecewise documentation.
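If only the plot is needed, a purely numerical route may be simpler than evaluating the symbolic expressions point by point. This is a different technique from the one above, not a SymPy feature: it rebuilds the matrix for each numeric W and calls `np.linalg.eigvalsh`, which is applicable because the matrix in the question is real and symmetric. The helper name `build_matrix` and the choice of 201 sample points are assumptions made for this sketch.

```python
import numpy as np

w0 = 1 / 780
wl = 1 / 1064


def build_matrix(W):
    """Return the 8x8 matrix from the question for one numeric value of W."""
    s2, s3 = np.sqrt(2), np.sqrt(3)
    return np.array([
        [w0 + 3*wl, 2*W, 0, 0, 0, s3*W, 0, 0],
        [2*W, 4*wl, 0, 0, 0, 0, 0, 0],
        [0, 0, 2*wl + w0, s3*W, 0, 0, 0, s2*W],
        [0, 0, s3*W, 3*wl, 0, 0, 0, 0],
        [0, 0, 0, 0, wl + w0, s2*W, 0, 0],
        [s3*W, 0, 0, 0, s2*W, 2*wl, 0, 0],
        [0, 0, 0, 0, 0, 0, w0, W],
        [0, 0, s2*W, 0, 0, 0, W, wl],
    ])


# Sample the range of interest; 201 points is an arbitrary choice.
W_values = np.linspace(-0.002, 0.002, 201)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
# so row i holds the 8 sorted eigenvalues at W_values[i].
eigenvalues = np.array([np.linalg.eigvalsh(build_matrix(W)) for W in W_values])
print(eigenvalues.shape)
```

With matplotlib, `plt.plot(W_values, eigenvalues)` then draws all eight curves in one figure. Because the values are sorted at every W, each column is the n-th smallest eigenvalue rather than one particular symbolic branch, which matters wherever two branches cross.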
I need a fitness proportionate selection approach for a GA; however, my population can't lose its structure (order). While generating the probabilities, I believe the individuals get the wrong weights. The program is:
population=[[[0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1], [6], [0]],
[[0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1], [4], [1]],
[[0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0], [6], [2]],
[[1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0], [4], [3]]]
population_d={'0,0,1,0,1,1,0,1,1,1,1,0,0,0,0,1': 6,
'0,0,1,1,1,0,0,1,1,0,1,1,0,0,0,1': 4,
'0,1,1,0,1,1,0,0,1,1,1,0,0,1,0,0': 6,
'1,0,0,1,1,1,0,0,1,1,0,1,1,0,0,0': 4}
import random

def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = (sum(fitness))
    relative_fitness = [f/total_fit for f in fitness]
    probabilities = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return (probabilities)

def FitnessProportionateSelection(population, probabilities, number):
    chosen = []
    for n in range(number):
        r = random.random()
        for (i, individual) in enumerate(population):
            if r <= probabilities[i]:
                chosen.append(list(individual))
                break
    return chosen
number=2
The population element is: [[individual],[fitness],[counter]]
The probabilities function output is: [0.42857142857142855, 0.5714285714285714, 0.8571428571428571, 1.0]
What I notice here is that each weight is added to the next one, so the values are not ordered by fitness, and I think a higher weight ends up being given to the chromosome with the lowest fitness.
I don't want to sort the population because I need to index the lists by position later, so I think I would get wrong matches.
Does anyone know a possible solution, package or different approach to perform a weighted selection in this case?
P.S.: I know the dictionary may be redundant here, but I had several other problems using the list itself.
Edit: I tried to use random.choices() as you can see below (using relative fitness):
def FitnessChoices(population, probabilities, number):
    return random.choices(population, probabilities, number)
But I get this error: TypeError: choices() takes from 2 to 3 positional arguments but 4 were given
Thank you!
Using random.choices is certainly a good idea. You just need to understand the function call. You have to specify whether your probabilities are marginal or cumulative. So you could use either
import random
def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    return relative_fitness

def FitnessChoices(population, relative_fitness, number):
    return random.choices(population, weights = relative_fitness, k = number)
or
import random
def ProbabilityList(population_d):
    fitness = population_d.values()
    total_fit = sum(fitness)
    relative_fitness = [f/total_fit for f in fitness]
    cum_probs = [sum(relative_fitness[:i+1]) for i in range(len(relative_fitness))]
    return cum_probs

def FitnessChoices(population, cum_probs, number):
    return random.choices(population, cum_weights = cum_probs, k = number)
I'd recommend having a look at the difference between keyword and positional arguments in Python.
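As a quick check that the two call styles agree and that the population order is left untouched, here is a small sketch. It takes the fitness straight from the population list instead of the dictionary and uses `itertools.accumulate` for the running sum; both are choices made for this illustration, not part of the code above. The cumulative values only mark interval boundaries, so each individual's share is the gap to the previous value (6, 4, 6, 4 out of 20), not the cumulative value itself.

```python
import random
from itertools import accumulate

population = [[[0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1], [6], [0]],
              [[0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1], [4], [1]],
              [[0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0], [6], [2]],
              [[1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0], [4], [3]]]

# Each element is [[individual], [fitness], [counter]], so the fitness is item[1][0].
fitness = [item[1][0] for item in population]   # [6, 4, 6, 4]
cum_fitness = list(accumulate(fitness))         # [6, 10, 16, 20]

# random.choices does not require the weights to sum to 1.
random.seed(0)
picks_weights = random.choices(population, weights=fitness, k=2)

random.seed(0)
picks_cum = random.choices(population, cum_weights=cum_fitness, k=2)

# With the same seed, both calls select the same individuals.
print(picks_weights == picks_cum)
```

Since the population is never sorted, indexing it by position later on still works.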
Say you have an image in the form of a numpy.array:
vals=numpy.array([[3,24,25,6,2],[8,7,6,3,2],[1,4,23,23,1],[45,4,6,7,8],[17,11,2,86,84]])
And you want to compute how many cells are inside each object, given a threshold value of 17 (example):
from scipy import ndimage
from skimage.measure import regionprops
blobs = numpy.where(vals>17, 1, 0)
labels, no_objects = ndimage.label(blobs)
props = regionprops(blobs)
If you check, this gives an image with 4 distinct objects over the threshold:
In[1]: blobs
Out[1]:
array([[0, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 1, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1]])
In fact:
In[2]: no_objects
Out[2]: 4
I want to compute the number of cells (or area) of each object. The intended outcome is a dictionary with the object ID: number of cells format:
size={0:2,1:2,2:1,3:2}
My attempt:
size={}
for label in props:
    size[label]=props[label].area
Returns an error:
Traceback (most recent call last):
File "<ipython-input-76-e7744547aa17>", line 3, in <module>
size[label]=props[label].area
TypeError: list indices must be integers, not _RegionProperties
I understand I am using label incorrectly, but the intent is to iterate over the objects. How to do this?
A bit of testing and research sometimes goes a long way.
The problem is both with blobs, because it is not carrying the different labels but only 0,1 values, and label, which needs to be replaced by an iterator looping over range(0,no_objects).
This solution seems to be working:
import skimage.measure as measure
import numpy
from scipy import ndimage
from skimage.measure import regionprops
vals=numpy.array([[3,24,25,6,2],[8,7,6,3,2],[1,4,23,23,1],[45,4,6,7,8],[17,11,2,86,84]])
blobs = numpy.where(vals>17, 1, 0)
labels, no_objects = ndimage.label(blobs)
#blobs is not in an amicable type to be processed right now, so:
labelled=ndimage.label(blobs)
resh_labelled=labelled[0].reshape((vals.shape[0],vals.shape[1])) #labelled is a tuple: only the first element matters
#here come the props
props=measure.regionprops(resh_labelled)
#here come the sought-after areas
size={i:props[i].area for i in range (0, no_objects)}
Result:
In[1]: size
Out[1]: {0: 2, 1: 2, 2: 1, 3: 2}
And if anyone wants to check for the labels:
In[2]: labels
Out[2]:
array([[0, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 2, 2, 0],
[3, 0, 0, 0, 0],
[0, 0, 0, 4, 4]])
And if anyone wants to plot the 4 objects found:
import matplotlib.pyplot as plt
plt.set_cmap('OrRd')
plt.imshow(labels,origin='upper')
To answer the original question:
You have to apply regionprops to the labeled image: props = regionprops(labels)
You can then construct the dictionary using:
size = {r.label: r.area for r in props}
which yields
{1: 2, 2: 2, 3: 1, 4: 2}
regionprops will generate a lot more information than just the area of each blob. So, if you only need the pixel count of each blob, an alternative with a focus on performance is to use np.bincount on the labels obtained with ndimage.label, like so -
np.bincount(labels.ravel())[1:]
Thus, for the given sample -
In [53]: labeled_areas = np.bincount(labels.ravel())[1:]
In [54]: labeled_areas
Out[54]: array([2, 2, 1, 2])
To have these results in a dictionary, one additional step would be -
In [55]: dict(zip(range(no_objects), labeled_areas))
Out[55]: {0: 2, 1: 2, 2: 1, 3: 2}
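For reference, here is a self-contained sketch of the bincount approach that can be run as is. It keys the dictionary by the label IDs that ndimage.label assigns (starting at 1) instead of range(no_objects); that is a choice made for this example so the keys line up with the values in the labels array.

```python
import numpy as np
from scipy import ndimage

vals = np.array([[3, 24, 25, 6, 2],
                 [8, 7, 6, 3, 2],
                 [1, 4, 23, 23, 1],
                 [45, 4, 6, 7, 8],
                 [17, 11, 2, 86, 84]])

# 1 where the value exceeds the threshold, 0 elsewhere.
blobs = (vals > 17).astype(int)

# Label connected regions; label 0 is the background.
labels, no_objects = ndimage.label(blobs)

# bincount counts how often each label occurs; drop the background count.
areas = np.bincount(labels.ravel())[1:]

# Map label ID -> number of cells, converting NumPy integers to plain ints.
size = {label_id: int(area) for label_id, area in enumerate(areas, start=1)}
print(size)
```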
Take the probability distribution of an XOR gate in which every configuration is equally probable (configurations are given by outcomes_sub; the probability mass function by pmf_xor_sub):
import numpy as np
import itertools as it
outcomes_sub = [list(item) for item in list(it.product([0,1], repeat=3))]
pmf_xor_sub = np.array([1/4, 0, 0, 1/4, 0, 1/4, 1/4, 0])
Now take the probability distribution corresponding to two uncorrelated such XORs:
outcomes = [outcome1 + outcome2 for (outcome1, outcome2)
in it.product(outcomes_sub, outcomes_sub)]
pmf_xor = [pmf1 * pmf2 for (pmf1, pmf2)
in it.product(pmf_xor_sub, pmf_xor_sub)]
And create some data based on it:
indices = np.random.choice(len(outcomes), 10000, p=pmf_xor)
data_xor = np.array([outcomes[index] for index in indices])
data_xor looks like this:
array([[1, 1, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 0],
...,
[0, 1, 1, 1, 1, 0],
[1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
I.e., two independent XORs back to back. What's the right way to perform dimensionality reduction on it? PCA won't work (because the dependence is non-linear, right?):
from sklearn import decomposition
pca_xor = decomposition.PCA()
pca_xor.fit(data_xor)
Now, pca_xor.explained_variance_ratio_ gives:
array([ 0.17145045, 0.17018817, 0.16758773, 0.16575979, 0.16410862,
0.16090524], dtype=float32)
No two components stand out. I understand that a non-linear method such as kernel PCA should work here, but I am struggling to find pointers to ways of applying it to my problem.
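One way to see why PCA finds nothing here is to compute the exact covariance matrix of the distribution rather than estimating it from samples. In pmf_xor_sub the third bit is the XOR of the first two, and under that distribution every pair of bits is independent, so the covariance comes out as a multiple of the identity. The sketch below only checks this; the names `mean`, `centered` and `cov` are introduced for the illustration.

```python
import itertools as it

import numpy as np

outcomes_sub = np.array(list(it.product([0, 1], repeat=3)))
pmf_xor_sub = np.array([1/4, 0, 0, 1/4, 0, 1/4, 1/4, 0])

# Joint distribution of two independent XOR triples (64 outcomes, 6 bits each).
outcomes = np.array([np.concatenate(pair)
                     for pair in it.product(outcomes_sub, outcomes_sub)])
pmf_xor = np.array([p1 * p2 for p1, p2 in it.product(pmf_xor_sub, pmf_xor_sub)])

# Exact mean and covariance under the pmf (no sampling involved).
mean = pmf_xor @ outcomes
centered = outcomes - mean
cov = (centered * pmf_xor[:, None]).T @ centered

# PCA diagonalizes the covariance; its eigenvalues give the explained variance.
explained_ratio = np.linalg.eigvalsh(cov) / np.trace(cov)
print(np.round(cov, 3))
print(explained_ratio)
```

Every bit has variance 0.25 and all covariances vanish, so each component explains exactly 1/6 of the variance. The sampled ratios above scatter around that value only because of sampling noise. The structure lives in the three-way relation between the bits, which second-order statistics cannot capture.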
To give a bit more context: what I am actually after is ways to bring out the structure in data_xor: two big XOR blobs, each of which is composed of some finer-grained stuff. If I am going about it all wrong, feel free to point that out too.