Matrix with given numbers in random places in python/numpy - python

I have an NxN matrix filled with zeros. Now I want to add to the matrix, say, n ones and m twos to random places. I.e. I want to create a matrix where there is some fixed amount of a given number at random places and possibly a fixed amount of some other given number in random places. How do I do this?
In Matlab I would do this by making a random permutation of the matrix indices with randperm() and then filling the n first indices given by randperm of the matrix with ones and m next with twos.

You can use numpy.random.shuffle to randomly permute an array in-place.
>>> import numpy as np
>>> X = np.zeros(N * N)
>>> X[:n] = 1
>>> X[n:n+m] = 2
>>> np.random.shuffle(X)
>>> X = X.reshape((N, N))

Would numpy.random.permutation be what you are looking for?
You can do something like this:
In [9]: a=numpy.zeros(100)
In [10]: p=numpy.random.permutation(100)
In [11]: a[p[:10]]=1
In [12]: a[p[10:20]]=2
In [13]: a.reshape(10,10)
Out[13]:
array([[ 0., 1., 0., 0., 0., 2., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0., 2., 0.],
[ 0., 2., 0., 0., 0., 0., 2., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 2., 0., 2., 1., 1., 0.],
[ 0., 0., 0., 0., 1., 0., 2., 0., 0., 0.],
[ 0., 2., 0., 2., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 2., 0., 0., 0., 1., 0.]])
Here we create a random permutation, then set the first 10 indices taken from the permutation in a to 1, then the next 10 indices to 2.
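The same idea can be sketched with NumPy's newer Generator API. This is only an illustration, assuming a NumPy version that provides np.random.default_rng; the function name fill_random and the seed argument are hypothetical and not part of either answer above.

```python
import numpy as np

def fill_random(N, n, m, seed=None):
    """Return an N x N integer matrix with n ones and m twos at random places."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(N * N, dtype=int)
    # Draw n + m distinct flat positions, so ones and twos can never overlap.
    picks = rng.choice(N * N, size=n + m, replace=False)
    flat[picks[:n]] = 1
    flat[picks[n:]] = 2
    return flat.reshape(N, N)

grid = fill_random(10, 10, 10, seed=0)
print((grid == 1).sum(), (grid == 2).sum())
```

Because the positions are drawn without replacement, the counts of ones and twos are exact as long as n + m does not exceed N * N.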

To generate the indices of the elements for where to add ones and twos, what about this?
# assuming N, n and m exist.
In [1]: import random
In [3]: indices = [(i, j) for i in range(N) for j in range(N)]
In [4]: random_indices = random.sample(indices, n + m)
In [5]: ones = random_indices[:n]
In [6]: twos = random_indices[n:]
Corrected as commented by Petr Viktorin in order not to have overlapping indexes in ones and twos.
An alternate way to generate the indices:
In [7]: import itertools
In [8]: indices = list(itertools.product(range(N), range(N)))
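The snippets above stop once the index pairs are chosen. A minimal sketch of the remaining step, writing the values into the matrix, could look like this; the concrete values of N, n and m and the names cells, chosen and X are assumptions made for illustration.

```python
import random
import numpy as np

N, n, m = 5, 3, 4  # hypothetical sizes, chosen only for this example

cells = [(i, j) for i in range(N) for j in range(N)]
chosen = random.sample(cells, n + m)   # distinct (row, col) pairs
ones, twos = chosen[:n], chosen[n:]

X = np.zeros((N, N), dtype=int)
# Split each list of pairs into a row-index array and a column-index array.
r1, c1 = np.array(ones).T
r2, c2 = np.array(twos).T
X[r1, c1] = 1
X[r2, c2] = 2
print(X)
```

The unpacking through np.array(...).T assumes n and m are both at least 1; an empty list would need separate handling.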

Related

Indexing numpy matrix

So let's say I have a (4,10) array initialized to zeros, and I have an input array in the form [2,7,0,3]. The input array will modify the zeros matrix to look like this:
[[0,0,1,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,1,0,0],
[1,0,0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0,0,0]]
I know I can do that by looping over the input array and indexing the matrix with something like matrix[i][input_target[i]], but I tried to do it without a loop, using something like:
matrix[:, input_target] = 1, but that sets entire columns to 1 instead of single elements.
Apparently the way to do it is:
matrix[range(input_target.shape[0]), input_target] = 1. Why does this work when the colon does not?
Thanks!
You only wish to update one column for each row. Therefore, with advanced indexing you must explicitly provide those row identifiers:
A = np.zeros((4, 10))
A[np.arange(A.shape[0]), [2, 7, 0, 3]] = 1
Result:
array([[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]])
Using a colon for the row indexer will tell NumPy to update all rows for the specified columns:
A[:, [2, 7, 0, 3]] = 1
array([[ 1., 0., 1., 1., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 1., 1., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 1., 1., 0., 0., 0., 1., 0., 0.],
[ 1., 0., 1., 1., 0., 0., 0., 1., 0., 0.]])
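The difference between the two forms is easier to see when reading values rather than assigning them. The following sketch uses an array of distinct numbers so each selected element is identifiable; the names rows, cols, paired and block are chosen for illustration only.

```python
import numpy as np

A = np.arange(40).reshape(4, 10)   # A[i, j] == 10 * i + j
cols = np.array([2, 7, 0, 3])
rows = np.arange(A.shape[0])

# Two index arrays are paired element by element:
# this picks A[0, 2], A[1, 7], A[2, 0], A[3, 3].
paired = A[rows, cols]
print(paired)        # [ 2 17 20 33]

# A slice combined with an index array takes every row for each listed column.
block = A[:, cols]
print(block.shape)   # (4, 4)
```

Assignment follows the same selection rule, which is why the paired form touches four elements while the colon form touches four whole columns.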

Calculate the area of two separate geometries in Python

I have been stumped on this problem for a while now and was wondering if anyone would be able to help. So let's say I have a binary image as shown below and I would like to count the black elements (zero). The problem is I want to know the number of elements associated with 'background' and 'trapezoid' in the middle individually, so output two values. What would be the easiest way to approach this? I have been trying to do it without using a mask but is that even possible? I have the numpy and scipy libraries if that helps.
You can use two functions from scipy.ndimage: label and find_objects (older SciPy versions also exposed them under scipy.ndimage.measurements, which is now deprecated).
First invert the array, because the label function treats zero as background.
inverted = 1 - binary_image_array
Then call label to find the different regions:
labeled_array, num_features = scipy.ndimage.label(inverted)
For this particular array, where you already know there are exactly two black blobs, labeled_array now marks the two regions with the labels 1 and 2.
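The answer stops at the labeling step. One way to get the two pixel counts from the labeled array is np.bincount, sketched below on a small synthetic image; the image itself is made up for illustration (a white ring enclosing a black square, not the trapezoid from the question), and the default 4-connectivity of label is assumed to be acceptable.

```python
import numpy as np
from scipy import ndimage

# Hypothetical binary image: 0 = black, 1 = white.
img = np.zeros((8, 8), dtype=int)
img[1:7, 1:7] = 1      # white 6x6 square
img[3:5, 3:5] = 0      # enclosed black 2x2 region

inverted = 1 - img     # label() treats zero as background
labeled, num_features = ndimage.label(inverted)

# bincount counts pixels per label; index 0 holds the (formerly white) pixels.
sizes = np.bincount(labeled.ravel())[1:]
print(num_features, sizes)
```

Here one region has 28 pixels (the outer background) and the other has 4 (the enclosed square). If diagonal neighbours should count as connected, a different structure argument would have to be passed to label.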
Obviously, the scipy approach is a good answer.
I was thinking that you might be able to work with numpy.cumsum and numpy.diff to find an enclosed area.
The cumulative sum will be zero while you are in the black area, then increase by one for every pixel in the white area, be stable again while you traverse the enclosed area, then start increasing again, etc.
The second order difference then finds places where the jumps occur, and you are left with a "classified" map. No guarantee that this generalizes, just an idea.
a = numpy.zeros((10,10))
a[3:7,3:7] = 1
a[4:6, 4:6] = 0
y = numpy.cumsum(a, axis=0)
x = numpy.cumsum(a, axis=1)
yy= numpy.diff(y, n=2, axis=0)
xx = numpy.diff(x, n=2, axis=1)
numpy.dot(xx,yy)
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 2., 2., 2., 2., 0., 0., 0.],
[ 0., 0., 0., 2., 4., 4., 2., 0., 0., 0.],
[ 0., 0., 0., 2., 4., 4., 2., 0., 0., 0.],
[ 0., 0., 0., 2., 2., 2., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

Inexplicable behavior when using vlen with h5py

I am using h5py to build a dataset. Since I want to store arrays whose length varies along one dimension, I use the h5py special dtype vlen. However, I see behavior I can't explain; maybe you can help me understand what is happening:
>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]:
array([[ 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]:
array([ array([ 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.], dtype=float32),
array([ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.], dtype=float32),
array([ 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.], dtype=float32),
array([ 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32),
array([ 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)], dtype=object)
I do expect the train_targets[0] to be of this shape, however I can't recognize the rows in my array. They seem to be totally jumbled about, however it is consistent. By which I mean that every time I try the above code, train_targets[0] looks the same.
To clarify: the first element in my train_targets, in this case test, has shape (5,11), however the second element might be of shape (5,38) which is why I use vlen.
Thank you for your help
Mat
I think
train_targets[0] = test
has stored your (5,11) array as an F-ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.
That's what you get back in train_targets[0] - an array of 5 arrays, each shape (11,), with values taken from test (order F).
So I think there are 2 issues - what a 2d shape means, and what vlen allows.
My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.
When I make a file following the string vlen example:
>>> f = h5py.File('foo.hdf5', 'w')
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
and then do:
ds[0]='this one string'
and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.
ds[0,0]='another'
is the correct way to set just one element.
vlen is 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shape (11,) and (38,) with vlen, but not 2d ones.
Actually, train_targets output is reproduced with:
In [54]: test1=np.empty((5,),dtype=object)
In [55]: for i in range(5):
   ....:     test1[i]=test.T.flatten()[i:i+11]
It's 11 values taken from the transpose (F order), but shifted for each sub array.
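If vlen elements can only be 1d, one possible workaround is to flatten each 2d sample before storing it and to keep its shape next to it. The sketch below shows only that round trip in plain NumPy, with an object array standing in for a one-dimensional vlen dataset. It does not call h5py, and the sample data and the names flat, shapes and restored are assumptions made for illustration.

```python
import numpy as np

# Hypothetical samples: same number of rows, different number of columns.
samples = [
    np.arange(55, dtype=np.float32).reshape(5, 11),
    np.arange(190, dtype=np.float32).reshape(5, 38),
]

# One 1-D array per sample (what a vlen element could hold), plus its shape.
flat = np.empty(len(samples), dtype=object)
shapes = np.empty((len(samples), 2), dtype=np.int64)
for i, s in enumerate(samples):
    flat[i] = s.ravel()      # C order, so reshape() below restores it exactly
    shapes[i] = s.shape

# Reading back: reshape each 1-D element with its recorded shape.
restored = [flat[i].reshape(tuple(shapes[i])) for i in range(len(samples))]
print([r.shape for r in restored])
```

With h5py, flat and shapes would presumably become two datasets with a single leading dimension each, written one element at a time in the same way as ds[0,0] above.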

How to Make Sense of Fourier Transform Results in Audio Frequency Analysis

I am doing audio analysis in Python. My end goal is to get a list of frequencies and their respective volumes, like { frequency : volume (0.0 - 1.0) }.
I have my audio data as a list of frames with values between -1.0 and +1.0. I used numpy's Fourier transform on this list, numpy.fft.fft(). But the resulting data makes no sense to me.
I do understand that the Fourier transform transforms from the time to the frequency domain, but not quite how it works mathematically. That's why I don't quite understand the results.
What do the values in the list that numpy.fft.fft() returns mean? How do I work with it/interpret it?
What would be the max/min values of the fourier transform performed on a list as described above be?
How can I get to my end goal of a dictionary in the form { frequency : volume (0.0 - 1.0) }?
Thank you. Sorry if my lack of understanding of the fourier transform made you facepalm.
Consider the FFT of a single period of a sine wave:
>>> t = np.linspace(0, 2*np.pi, 100)
>>> x = np.sin(t)
>>> f = np.fft.rfft(x)
>>> np.round(np.abs(f), 0)
array([ 0., 50., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.])
The FFT returns an array of complex numbers which give the amplitude and phase of the frequencies. Assuming you're only interested in the amplitude, I've used np.abs to get the magnitude for each frequency and rounded it to the nearest integer using np.round(__, 0). You can see the spike at index 1, indicating that a sine wave with a period equal to the number of samples was found.
Now make the wave a bit more complex
>>> x = np.sin(t) + np.sin(3*t) + np.sin(5*t)
>>> f = np.fft.rfft(x)
>>> np.round(np.abs(f), 0)
array([ 0., 50., 1., 50., 0., 48., 4., 2., 2., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.])
We now see spikes at indices 1, 3 and 5, corresponding to our input: sine waves with periods of n, n/3 and n/5 (where n is the number of input samples).
EDIT
Here's a good conceptual explanation of the Fourier transform: http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
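To get from the spike indices to the { frequency : volume } dictionary asked for in the question, the bin indices have to be converted to Hz and the magnitudes rescaled. The sketch below is one way to do that, not a definitive recipe: the function name spectrum, the sample rate and the 440 Hz test tone are assumptions, and the 2/n scaling is the usual normalisation for the amplitude of a sine that falls exactly on a bin (it overstates the DC and Nyquist bins by a factor of two).

```python
import numpy as np

def spectrum(samples, sample_rate):
    """Map each FFT bin frequency in Hz to an amplitude estimate."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    # Scale so a pure sine of amplitude A shows up as roughly A in its bin.
    mags = np.abs(np.fft.rfft(samples)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return dict(zip(freqs.tolist(), mags.tolist()))

# Hypothetical test signal: one second of a 440 Hz tone with amplitude 0.5.
rate = 8000
t = np.arange(rate) / rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

spec = spectrum(tone, rate)
peak = max(spec, key=spec.get)
print(peak, spec[peak])
```

For real audio the tones rarely fall exactly on a bin, so the energy spreads over neighbouring bins, as the small non-zero values in the outputs above show; a window function is commonly applied before the FFT to reduce this.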

python scatter threshold function not working

I want a scatterplot in which values exceeding a particular threshold have a different color than the ones "inside" the threshold.
Here is what I wrote so far:
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
n = 100
x = rnd.uniform(low = -1, high = 1, size = n)
y = rnd.uniform(low = -1, high = 1, size = n)
a = x**2 + y**2
c = np.zeros(n)
for i in range(n):
    if a[i] <= 1:
        c[i] = 0
    else:
        c[i] = 1
plt.scatter(x,y, color = c)
plt.show()
the output is a completely black scatter plot.
c = array([ 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.,
0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 1., 1., 0., 0., 1., 1., 1.])
I tried the following:
for i in range(n):
    if a[i] <= 1:
        c[i] = "r"
    else:
        c[i] = "g"
ValueError: could not convert string to float: r
and several other variations of the theme. However I am stuck. Please help, thank you very much for your time.
Best wishes
You have c defined as an array of floats with this line:
c = np.zeros(n)
But then in your second code snippet you are trying to set c as a string.
c[i] = "r"
Choose a new name for your string array:
cs = []
for i in range(n):
    if a[i] <= 1:
        cs.append("r")
    else:
        cs.append("g")
Then pass cs to scatter, for example plt.scatter(x, y, c=cs). If scatter complains about cs not being a numpy array, you can convert it with numpy.array(cs) or use numpy.chararray.
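The loop can also be replaced by a vectorised comparison. The sketch below builds the colour array with np.where; it leaves the plotting call as a comment so the snippet runs without a display, and the seed and the names inside and colors are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the example repeatable
n = 100
x = rng.uniform(-1, 1, size=n)
y = rng.uniform(-1, 1, size=n)

# True where the point lies inside the unit circle.
inside = x**2 + y**2 <= 1

# "r" for points inside the threshold, "g" for the rest.
colors = np.where(inside, "r", "g")

# With matplotlib available, the colours would be passed like this:
# plt.scatter(x, y, c=colors.tolist()); plt.show()
print(colors[:10])
```

Because np.where returns a string array here, there is no attempt to store letters in a float array, which is what raised the ValueError in the question.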
