I am trying to extract some features from an image, but each of the extracted features is really small. The easiest way to extract larger features seems to be to use a larger structuring element, but the following code fails when ITER > 1.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage, misc

lena = misc.lena().astype(np.float64)
lena /= ndimage.maximum(lena)
lena = lena > 0.54  # convert to binary image
# =====================
ITER = 1  # || FAILS WHEN ITER > 1 ||
# =====================
struct = ndimage.generate_binary_structure(2, 1)
struct = ndimage.iterate_structure(struct, ITER)
lena_label, n = ndimage.label(lena, struct)
slices = ndimage.find_objects(lena_label)
images = [lena[sl] for sl in slices]
plt.imshow(images[0])
plt.show()
The error raised is:
RuntimeError: structure dimensions must be equal to 3
The structure parameter of the ndimage.label function determines the connectivity of the input. When the input is represented as a rectangular matrix, this connectivity usually refers to either the 4 or the 8 neighbors around a point p. SciPy follows this convention and limits the accepted structure to such cases, so it raises an error when anything larger than 3x3 is passed to the function.
If you really want to do such a thing, you first need to define very clearly the connectivity you are trying to describe, and then implement it. A simpler way is to first dilate the input and then label it. This effectively gives the larger features that would be labeled with a larger structure parameter, as in the sketch below.
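A minimal sketch of that dilate-then-label approach (the binary image img and the iteration count ITER here are placeholders, not taken from the question):

import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64) > 0.8                          # placeholder binary image
struct = ndimage.generate_binary_structure(2, 1)

ITER = 3
dilated = ndimage.binary_dilation(img, structure=struct, iterations=ITER)
labels, n = ndimage.label(dilated, struct)                  # plain 3x3 structure is fine here
slices = ndimage.find_objects(labels)
features = [img[sl] for sl in slices]                       # crop the original image per feature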
I want to create a numpy 3-dimensional array that would represent a Texas hold'em starting-hand matrix with corresponding frequencies for performing a certain action in a given spot facing some action (for example, UTG facing a 3-bet from the BU).
If you google "preflop hand chart" you will find thousands of pictures of hand matrices where the actions fold/call/raise are usually indicated by different colors.
I want to represent that in a numpy 3-dimensional array WITH DIFFERENT DATA TYPES, with 13 rows x 13 columns and any number of "layers" in the 3rd dimension depending on the number of actions I want to store; for example I might want to store min raise/raise 3x/raise all in/call/fold. For that I would need a different data type for the first layer of the 3rd dimension and integers or decimals for the other layers. The first layer would be just the text representing the starting-hand combination (like "AA" or "89suited") and the rest of the cells would be numeric.
I created an image for easier understanding of what I mean.
The green layer would be strings representing the hand matrix.
The yellow layer would be the number of combinations of that starting hand.
The blue layer would be, for example, how often you raise. If you look at the picture you will see that AKs gets raised 81% of the time, while AQs only 34% of the time.
To get the green layer you would type:
array[:,:,0]
The yellow layer would be:
array[:,:,1]
and so forth.
I know how to create a solution for my problem using JSON, a dictionary or some other tool, but in the interest of learning and for the challenge I would like to solve it using numpy.
I also know how to create an array of all text: I could store the numbers as strings, retrieve them as such and convert them, but that solution is also unsatisfactory.
Plus, it would be beneficial to have it as a numpy array because of all the slicing and summing you can do on an array, like finding the total number of hands that get raised, which in this case would be the sum of (number of combos, i.e. layer 2, multiplied by the frequency with which each individual starting hand gets raised).
So the question boils down to: how do you create a 3D numpy array with different data types from the start?
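A plain ndarray cannot mix dtypes along one axis, so here is a minimal sketch of the closest numpy equivalent: a 13x13 structured array with named fields in place of a numeric third index (the field names and sample values below are made up for illustration):

import numpy as np

dt = np.dtype([('hand', 'U10'),              # "green layer", e.g. "AKs"
               ('combos', np.int64),         # "yellow layer"
               ('raise_freq', np.float64)])  # "blue layer"
grid = np.zeros((13, 13), dtype=dt)

grid[0, 1] = ('AKs', 4, 0.81)                # illustrative values only

# field access replaces array[:, :, k]
hands = grid['hand']
total_raised = (grid['combos'] * grid['raise_freq']).sum()

An object-dtype 3D array would keep the array[:,:,k] indexing instead, but at the cost of losing fast, type-safe numeric operations on the numeric layers.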
Can you suggest a numpy/scipy module function that can find local maxima/minima in data loaded from a text file? I was trying to use the nearest-neighbours approach, but fluctuations in the data cause false identifications. Is it possible to use the neighbours approach but use 20 data points as the sample_len?
scipy.signal.argrelmax looks for relative maxima in an array (there is also argrelmin for minima). It has an order keyword argument which allows you to compare against, e.g., 20 neighbours on each side.
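A short sketch of how that might look, assuming the data is a single column in a text file (the filename data.txt is a placeholder):

import numpy as np
from scipy.signal import argrelmax, argrelmin

y = np.loadtxt('data.txt')        # 1D array of values

# compare each point against its 20 neighbours on either side
max_idx = argrelmax(y, order=20)[0]
min_idx = argrelmin(y, order=20)[0]

print(y[max_idx])                 # values at the local maxima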
I have discrete temporal data which may have missing values.
I also have a mask indicating where those values are.
How can I perform an efficient interpolation to fill in those values?
In practice I have a TxCxJ tensor (Q). Some elements are, let's say, corrupted. Given a corrupted element Q[t,c,j], I would like to fill that value with an interpolation between Q[t-1,c,j] and Q[t+1,c,j].
Also, in the worst case I may find several consecutive corrupted elements:
Q[t_0:t_1,c,j]
to be filled with the interpolation between
Q[t_0-1,c,j] and Q[t_1+1,c,j]
It is OK to use linear interpolation with numpy or pytorch (or any other suitable library that I don't need to study for 5 months :/). I can code it with for loops, but I was looking for an efficient library that lets me pass a mask, or some sort of clever indexing/masking, so that I don't have to run the algorithm over the known points.
Thaaanks
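A minimal numpy sketch of the interpolation described above, assuming mask is a boolean TxCxJ array that is True at corrupted entries (the shapes and data are placeholders). It still loops over the C and J axes, but each time series is filled in one vectorized np.interp call, and runs of consecutive corrupted values are bridged between the nearest good neighbours:

import numpy as np

T, C, J = 50, 3, 4
Q = np.random.randn(T, C, J)
mask = np.random.rand(T, C, J) < 0.1      # placeholder mask of corrupted entries

t = np.arange(T)
Q_filled = Q.copy()
for c in range(C):
    for j in range(J):
        bad = mask[:, c, j]
        if bad.any() and not bad.all():
            # straight line between the nearest uncorrupted neighbours
            Q_filled[bad, c, j] = np.interp(t[bad], t[~bad], Q[~bad, c, j])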
I want to compute Spectrum_3 = Spectrum_1/Spectrum_2, but they have different sizes. How should I proceed? Since I am dealing with spectra, my approach is to decrease the resolution of Spectrum_1 so that the data sizes match (if you come from astrophysics: is this a correct approach?). Anyhow, I (think I) need to bin the data from Spectrum_1 in such a way that its size matches the size of Spectrum_2.
arr1.size is 313136
synth_spec2.size is 102888
arr1_new = arr1.reshape(-1, 2).mean(axis=1)  # should be the answer, but
                                             # I don't fully understand it.
I need
len(arr1_new) == len(synth_spec2) #True
Generally you need to interpolate the two spectra onto a common wavelength grid, paying careful attention to the ends of the spectra if they don't overlap fully. I would suggest having a look at the synphot package and in particular the SourceSpectrum classes. Despite the name, it has support for a variety of spectra, as synthetic photometry is normally done by assembling a suitable source spectrum, applying reddening/extinction etc. to it, then multiplying by a filter bandpass (which is also spectrum-like, being transmission against wavelength) and integrating to derive a flux.
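As a bare-bones illustration of "interpolate onto a common wavelength grid" (synphot handles units, overlap and proper binning far more carefully), here is a numpy sketch; the wavelength and flux arrays below are placeholders standing in for the two real spectra:

import numpy as np

# placeholder spectra; in practice these come from the two data files
wave1 = np.linspace(4000.0, 7000.0, 313136)
flux1 = np.ones_like(wave1)
wave2 = np.linspace(4500.0, 6500.0, 102888)
flux2 = np.ones_like(wave2)

# restrict to the overlapping wavelength range before dividing
lo = max(wave1.min(), wave2.min())
hi = min(wave1.max(), wave2.max())
common = np.linspace(lo, hi, flux2.size)    # common grid, here matching Spectrum_2

flux1_c = np.interp(common, wave1, flux1)
flux2_c = np.interp(common, wave2, flux2)
ratio = flux1_c / flux2_c                   # Spectrum_3 on the common grid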
I have written Python (2.7.3) code in which I aim to create a weighted sum of 16 data sets and compare the result to some expected value. My problem is to find the weighting coefficients that will produce the best fit to the model. To do this, I have been experimenting with scipy's optimize.minimize routines, but have had mixed results.
Each of my individual data sets is stored as a 15x15 ndarray, so their weighted sum is also a 15x15 array. I define my own 'model' of what the sum should look like (also a 15x15 array), and quantify the goodness of fit between my result and the model using a basic least squares calculation.
R = np.sum(np.abs(model / np.max(model) - myresult)**2)
'myresult' is produced as a function of some set of parameters 'wts'. I want to find the set of parameters 'wts' which will minimise R.
To do so, I have been trying this:
res = minimize(get_best_weightings, wts, bounds=bnds, method='SLSQP', options={'disp': True, 'eps': 100})
Where my objective function is:
def get_best_weightings(wts):
    # first 16 entries are the real parts of the weights, last 16 the imaginary parts
    wts_tr = wts[0:16]
    wts_ti = wts[16:32]
    for i, j in enumerate(portlist):
        originalwtsr[j] = wts_tr[i]
        originalwtsi[j] = wts_ti[i]
    realwts = originalwtsr
    imagwts = originalwtsi
    myresult = make_weighted_beam(realwts, imagwts, 1)
    # least-squares difference between the normalised model and the weighted sum
    R = np.sum((np.abs(modelbeam / np.max(modelbeam) - myresult))**2)
    return R
The input (wts) is an ndarray of shape (32,), and the output, R, is just a scalar, which should get smaller as my fit gets better. By my understanding, this is exactly the sort of problem ("Minimization of scalar function of one or more variables.") that scipy.optimize.minimize is designed to optimize (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.minimize.html).
However, when I run the code, although the optimization routine seems to iterate over different values of all the elements of wts, only a few of them seem to 'stick'. That is, all but four of the values are returned with the same values as my initial guess. To illustrate, I plot the values of my initial guess for wts (in blue) and the optimized values (in red). You can see that for most elements the two lines overlap.
Image:
http://imgur.com/p1hQuz7
Changing just these few parameters is not enough to get a good answer, and I can't understand why the other parameters aren't also being optimised. I suspect that maybe I'm not understanding the nature of my minimization problem, so I'm hoping someone here can point out where I'm going wrong.
I have experimented with a variety of minimize's built-in methods (I am by no means committed to SLSQP, or certain that it's the most appropriate choice), and with a variety of 'step sizes' eps. The bounds I am using for my parameters are all (-4000, 4000). I only have scipy version 0.11, so I haven't tested a basinhopping routine to get the global minimum (this needs 0.12). I have looked at scipy.optimize.brute, but haven't tried implementing it yet - I thought I'd check if anyone can steer me in a better direction first.
Any advice appreciated! Sorry for the wall of text and the possibly (probably?) idiotic question. I can post more of my code if necessary, but it's pretty long and unpolished.