Generating a random line on a 2D array in Python

I've got a 2D array of zeros, 250 by 250, and I want to generate a random straight line of a specific length (not yet decided). Since it's a line, the cells that turn from zero to one must be connected in some way (vertically, horizontally, or diagonally), and the line also has to be straight. How could I do this? I'm quite stuck with this problem; any help would be appreciated.

We can do:
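import numpy as np
SIZE = 250
arr = np.zeros((SIZE, SIZE))
# Candidate slopes: every integer in [-SIZE, SIZE) plus the reciprocals of the non-zero ones
M_POS = np.arange(-SIZE, SIZE)
M_POS = np.r_[M_POS, 1 / M_POS[M_POS != 0]]
M = np.random.choice(M_POS, 1)[0]                   # random slope
N = np.random.choice(np.arange(-SIZE, SIZE), 1)[0]  # random intercept
L = 50                                              # number of points to draw
P0 = np.array([0, N])                               # starting point
# Step per point; the 0 < abs(M) guard avoids dividing by zero when M == 0.
# Note that the cells are only connected when the axis-1 step is at most 1 in magnitude.
X_Y = np.array([1, 1 / M]) if 0 < abs(M) < 1 else np.array([1, M])
draw_in = np.add(np.repeat([P0], L, axis=0),
                 np.repeat([X_Y], L, axis=0) * np.arange(L)[:, np.newaxis]).astype(int)
# Keep only the points that fall inside the array (index 0 is valid, hence >=)
draw_in = draw_in[((draw_in < SIZE) & (draw_in >= 0)).all(axis=1)]
arr[draw_in[:, 0], draw_in[:, 1]] = 1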
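A minimal sketch of an alternative, assuming the line length is counted in cells rather than as a Euclidean distance: pick a random direction, scale the step so that its larger component is exactly one cell, and round to the nearest cell. The name random_line and its parameters are illustrative and do not come from the answer above.

```python
import numpy as np

def random_line(size=250, length=50, rng=None):
    """Return (arr, cells): a size x size array with one random straight line of ones."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.zeros((size, size), dtype=int)

    # Random direction; half a turn is enough because a line has no orientation.
    angle = rng.uniform(0.0, np.pi)
    step = np.array([np.sin(angle), np.cos(angle)])
    # Scale so the dominant component is exactly +/-1: consecutive cells then
    # differ by at most one row and one column, i.e. they stay 8-connected.
    step /= np.abs(step).max()

    offsets = np.rint(np.arange(length)[:, None] * step).astype(int)

    # Choose the start so that the whole line fits inside the array.
    low = -offsets.min(axis=0)
    high = size - 1 - offsets.max(axis=0)
    start = rng.integers(low, high + 1)

    cells = start + offsets
    arr[cells[:, 0], cells[:, 1]] = 1
    return arr, cells

arr, cells = random_line(rng=np.random.default_rng(0))
print(arr.sum())  # number of cells set to one
```

Because the dominant axis advances by exactly one cell per step, every cell is distinct and the number of ones equals the requested length.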

Related

np.where() to eliminate data, where coordinates are too close to each other

I'm doing aperture photometry on a cluster of stars, and to get easier detection of background signal, I want to only look at stars further apart than n pixels (n=16 in my case).
I have two arrays, xs and ys, with the x- and y-values of all the stars' coordinates.
Using np.where, I'm supposed to find the indexes of all stars whose distance to every other star is >= n.
So far, my method has been a for-loop:
import numpy as np
# Lists of coordinates w. values between 0 and 2000 for 5000 stars
xs = np.random.rand(5000)*2000
ys = np.random.rand(5000)*2000
# for-loop, wherein the np.where statement in question is situated
n = 16
for i in range(len(xs)):
    index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) >= n)
Because the stars are clustered closely together, I expected a severe reduction in data, but even with n=1000 I still had around 4000 data points left.
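Using just NumPy (and borrowing part of another answer):
X = np.random.rand(5000,2) * 2000
XX = np.einsum('ij, ij ->i', X, X)              # squared norm of each point
D_squared = XX[:, None] + XX - 2 * X.dot(X.T)   # pairwise squared distances
np.fill_diagonal(D_squared, np.inf)             # ignore each star's zero distance to itself
out = np.where(D_squared.min(axis = 0) > n**2)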
Using scipy.spatial.distance.pdist:
from scipy.spatial.distance import pdist, squareform
D_squared = squareform(pdist(X, metric = 'sqeuclidean'))
np.fill_diagonal(D_squared, np.inf)  # squareform puts zeros on the diagonal
out = np.where(D_squared.min(axis = 0) > n**2)
Using a KDTree for maximum speed:
from scipy.spatial import KDTree
X_tree = KDTree(X)
# query_pairs returns every pair (i, j) of points that are at most n apart
in_radius = np.array(list(X_tree.query_pairs(n))).flatten()
out = np.where(~np.isin(np.arange(X.shape[0]), in_radius))
np.random.seed(seed=1)
xs = np.random.rand(5000,1)*2000
ys = np.random.rand(5000,1)*2000
n = 16
mask = (xs>=0)  # start with every star kept
for i in range(len(xs)):
    if mask[i]:
        # drop every star within n of star i, then keep star i itself
        index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) <= n)
        mask[index] = False
        mask[i] = True
x = xs[mask]
y = ys[mask]
print(len(x))
4220
You can use np.subtract.outer to create the pairwise comparisons. Then you check for each row whether the distance is below 16 for exactly one item (which is the comparison with the particular star itself):
distances = np.sqrt(
    np.subtract.outer(xs, xs)**2
    + np.subtract.outer(ys, ys)**2
)
indices = np.nonzero(np.sum(distances < 16, axis=1) == 1)
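To make the row-count idea concrete, here is a small sketch with four made-up stars (the coordinates are hypothetical, chosen so that the first two are 5 pixels apart and the other two are far from everything):

```python
import numpy as np

# Hypothetical coordinates: stars 0 and 1 are 5 px apart, stars 2 and 3 are isolated.
xs = np.array([0.0, 3.0, 100.0, 500.0])
ys = np.array([0.0, 4.0, 100.0, 500.0])
n = 16

# Pairwise Euclidean distances, shape (4, 4); the diagonal is each star's
# distance to itself, which is 0.
distances = np.hypot(np.subtract.outer(xs, xs), np.subtract.outer(ys, ys))

# Every row contains at least one hit (the star itself). A star is isolated
# when that self-comparison is the only distance below n.
neighbours = np.sum(distances < n, axis=1)
isolated = np.nonzero(neighbours == 1)[0]
print(isolated)  # [2 3]
```

For 5000 stars, each pairwise matrix holds 25 million float64 values (about 200 MB), so this approach trades memory for simplicity.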

Turn grid into a checkerboard pattern in python?

I have successfully created a grid, but am now trying to turn it into a checkerboard pattern, preferably using a variant of the floodfill command. How do I make sure the program recognizes which squares are even and which are odd?
Currently the IDE is set so that m[i][j] = 1 gives blue while m[i][j] = 0 gives red, which I am happy to keep, so I do not need to define the colors. Thank you.
Code I have so far :
from pylab import *
from numpy import *
from math import *
m=zeros((100,100))
for i in range(100):
    for j in range(100):
        if (math.floor(i) % 10) != 0:
            if (math.floor(j) % 10) != 0:
                m[i][j]= 1
            else:
                m[i][j]= 0
imshow(m)
show()
Code output: [image not included]
import numpy as np
def make_checkerboard(n_rows, n_columns, square_size):
    n_rows_, n_columns_ = int(n_rows/square_size + 1), int(n_columns/square_size + 1)
    rows_grid, columns_grid = np.meshgrid(range(n_rows_), range(n_columns_), indexing='ij')
    high_res_checkerboard = np.mod(rows_grid, 2) + np.mod(columns_grid, 2) == 1
    square = np.ones((square_size,square_size))
    checkerboard = np.kron(high_res_checkerboard, square)[:n_rows,:n_columns]
    return checkerboard
square_size = 5
n_rows = 14
n_columns = 67
checkerboard = make_checkerboard(n_rows, n_columns, square_size)
You can check the sum of the two indices (row and column) and color it with the first color if it's odd and second otherwise. Something like:
for i in range(nrows):
    for j in range(ncols):
        m[i][j] = 0 if (i+j)%2 else 1
Use the modulus operation:
m[i][j] = (i+j) % 2
I would create a linear array, fill every second value and reshape.
In your case (even number of columns), append one extra column and get rid of it after reshaping:
import numpy as np
rows = 100
cols = 100 + 1 # the column count must be odd so the pattern alternates between rows; we fix it later
m = np.zeros((rows*cols, 1)) # create array
m[::2] = 1 # fill every second value
m = np.reshape(m, (rows, cols)) # reshape array to matrix
m = m[:, :-1] # cut the additional column
You can create a checkerboard-style array with NumPy, then enlarge it to your desired canvas size with SciPy's imresize. Note that scipy.misc.imresize was removed in SciPy 1.3, so this step needs an older SciPy release.
Thus, the steps would be :
1) Create a NumPy array of shape (10,10) corresponding to 10 x 10 sized checkerboard pattern. To do so, start with zeros array and fill the alternate rows and columns with ones :
arr = np.zeros((10,10),dtype=int)
arr[::2,::2] = 1
arr[1::2,1::2] = 1
2) Resize the array 10x to have (100,100) pixel sized output image :
from scipy.misc import imresize # Importing required function
out = imresize(arr,10*np.array(arr.shape),interp='nearest')/255
Output: [image not included]
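scipy.misc.imresize was removed in SciPy 1.3, so the resize step above will not run on current installations. A sketch of the same enlargement using only NumPy, assuming a nearest-neighbour (blocky) upscale is what is wanted: np.kron replaces every element of the small board with a 10 x 10 block of that value.

```python
import numpy as np

# 10 x 10 checkerboard of single cells, built as in step 1.
arr = np.zeros((10, 10), dtype=int)
arr[::2, ::2] = 1
arr[1::2, 1::2] = 1

# Kronecker product with a block of ones: each cell becomes a 10 x 10 square.
out = np.kron(arr, np.ones((10, 10), dtype=int))
print(out.shape)  # (100, 100)
```

No division by 255 is needed here, because the values stay 0 and 1 throughout.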
With only minimal modification of your code it would look something like this:
from pylab import *
from numpy import *
from math import *
m=zeros((100,100))
for i in range(100):
    for j in range(100):
        if (math.floor(i) % 10) != 0:
            if (math.floor(j) % 10) != 0:
                m[i][j]= 1
        if (int(i / 10) + int(j / 10)) % 2: # the only two extra lines.
            m[i][j] = 0 #
imshow(m)
show()
Alternatively just this (assuming you really need the 100x100) to get rid of the "boundary lines":
m=zeros((100,100))
for i in range(100):
    for j in range(100):
        m[i][j] = (int(i / 10) + int(j / 10)) % 2
Cheers.
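The double loop over i and j can also be written without Python loops. A sketch using np.add.outer, where size and square are illustrative names for the canvas width and the side length of one square:

```python
import numpy as np

size, square = 100, 10

# Index of the square each pixel falls into along one axis: 0,0,...,1,1,...,9
block = np.arange(size) // square

# Parity of (row block + column block) gives the checkerboard colour.
m = np.add.outer(block, block) % 2

# Reference: the explicit double loop, for comparison.
m_loop = np.zeros((size, size), dtype=int)
for i in range(size):
    for j in range(size):
        m_loop[i][j] = (i // square + j // square) % 2

print(np.array_equal(m, m_loop))  # True
```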

3D distance vectorization

I need help vectorizing this code. Right now, with N=100, it takes a minute or so to run. I would like to speed that up. I have done something like this for a double loop, but never with a triple loop, and I am having difficulties.
import numpy as np
N = 100
n = 12
r = np.sqrt(2)
x = np.arange(-N,N+1)
y = np.arange(-N,N+1)
z = np.arange(-N,N+1)
C = 0
for i in x:
    for j in y:
        for k in z:
            if (i+j+k)%2==0 and (i*i+j*j+k*k!=0):
                p = np.sqrt(i*i+j*j+k*k)
                p = p/r
                q = (1/p)**n
                C += q
print('\n')
print(C)
The meshgrid/where/indexing solution is already extremely fast. I made it about 65 % faster. This is not too much, but I explain it anyway, step by step:
It was easiest for me to approach this problem with all 3D vectors in the grid being columns in one large 2D 3 x M array. meshgrid is the right tool for creating all the combinations (note that numpy version >= 1.7 is required for a 3D meshgrid), and vstack + reshape bring the data into the desired form. Example:
>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1],
       [0, 1, 0, 1, 0, 1, 0, 1]])
Each column is one 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and length 1 in all dimensions).
Let's call this array vectors (it contains all 3D vectors representing all points in the grid). Then, prepare a bool mask for selecting those vectors fulfilling your mod2 criterion:
mod2bool = np.sum(vectors, axis=0) % 2 == 0
np.sum(vectors, axis=0) creates a 1-D array of length M containing the element sum of each column vector. Hence, mod2bool is a length-M array with one bool value per column vector. Now use this bool mask:
vectorsubset = vectors[:,mod2bool]
This selects all rows (:) and uses boolean indexing for filtering the columns, both are fast operations in numpy. Calculate the lengths of the remaining vectors, using the native numpy approach:
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
This is quite fast -- however, scipy.stats.ss and bottleneck.ss can perform the squared sum calculation even faster than this.
Transform the lengths using your instructions:
with np.errstate(divide='ignore'):
    p = (r/lengths)**n
This divides a finite number by zero for the zero-length vector at the origin, resulting in an Inf in the output array. This is entirely fine. We use NumPy's errstate context manager to make sure that these zero divisions do not raise an exception or emit a runtime warning.
Now sum up the finite elements (ignore the infs) and return the sum:
return np.sum(p[np.isfinite(p)])
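The steps above can be assembled into one function and checked against the original triple loop on a small grid. This is a sketch: the function names lattice_sum and lattice_sum_loops are illustrative, and N=3 is used only to keep the loop version fast.

```python
import numpy as np

def lattice_sum(N, n=12, r=np.sqrt(2)):
    """Vectorized sum over grid points with even coordinate sum, origin excluded."""
    axis = np.arange(-N, N + 1)
    # All grid points as columns of a 3 x M array.
    vectors = np.vstack(np.meshgrid(axis, axis, axis)).reshape(3, -1)
    # Keep the columns whose coordinate sum is even.
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:, mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    # The origin has length 0 and yields inf, which is filtered out below.
    with np.errstate(divide='ignore'):
        p = (r / lengths) ** n
    return np.sum(p[np.isfinite(p)])

def lattice_sum_loops(N, n=12, r=np.sqrt(2)):
    """Reference implementation with explicit loops, as in the question."""
    total = 0.0
    for i in range(-N, N + 1):
        for j in range(-N, N + 1):
            for k in range(-N, N + 1):
                if (i + j + k) % 2 == 0 and (i*i + j*j + k*k) != 0:
                    total += (r / np.sqrt(i*i + j*j + k*k)) ** n
    return total

print(np.isclose(lattice_sum(3), lattice_sum_loops(3)))  # True
```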
I have implemented this method two times below. Once exactly like just explained, and once involving bottleneck's ss and nansum functions. I have also added your method for comparison, and a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing, but rather creates Infs, and finally sums up the isfinite way.
import sys
import numpy as np
import bottleneck as bn

N = 100
n = 12
r = np.sqrt(2)

x,y,z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x,y,z)).reshape(3, -1)

def measure_time(func):
    import time
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func

@measure_time
def method_columnvecs(vectors):
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:,mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])

@measure_time
def method_columnvecs_opt(vectors):
    # On my system, bn.nansum is even slightly faster than np.sum.
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
    # Use ss from bottleneck or scipy.stats (axis=0 is default).
    lengths = np.sqrt(bn.ss(vectors[:,mod2bool]))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return bn.nansum(p[np.isfinite(p)])

@measure_time
def method_original(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z)!=0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    p=np.sqrt(x*x+y*y+z*z)/r
    return np.sum((1/p)**n)

@measure_time
def method_original_finitesum(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    lengths = np.sqrt(x*x+y*y+z*z)
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])

print(method_columnvecs(gridvectors))
print(method_columnvecs_opt(gridvectors))
print(method_original(x,y,z))
print(method_original_finitesum(x,y,z))
This is the output:
$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965
All methods produce the same result. Your method becomes a bit faster when doing the isfinite style sum. My methods are faster, but I would say that this is an exercise of academic nature rather than an important improvement :-)
I have one question left: you were saying that for N=3, the calculation should produce a 12. Even yours doesn't do this. All methods above produce 12.1317530867 for N=3. Is this expected?
Thanks to @Bill, I was able to get this to work. It is very fast now. Perhaps it could be done better, especially the two masks that replace the two conditions I originally had inside the for loops.
from __future__ import division
import numpy as np
N = 100
n = 12
r = np.sqrt(2)
x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
ind = np.where((x*x+y*y+z*z)!=0)
x = x[ind]
y = y[ind]
z = z[ind]
p=np.sqrt(x*x+y*y+z*z)/r
ans = (1/p)**n
ans = np.sum(ans)
print('ans')
print(ans)

Can I vectorize this 2d array indexing where the 2nd dimension depends on the value of the first?

In the example below I have a 2D array that has some real results that are shifted and padded. The shifts depend on the row (the padding is used to make the array rectangular as required by numpy). Is it possible to extract the real results without a Python loop?
import numpy as np
# results are 'shifted' where the shift depends on the row
shifts = np.array([0, 8, 4, 2], dtype=int)
max_shift = shifts.max()
n = len(shifts)
t = 10 # length of the real results we care about
a = np.empty((n, t + max_shift), dtype=int)
b = np.empty((n, t), dtype=int)
for i in range(n):
    a[i] = np.concatenate([[0] * shifts[i],               # shift
                           (i+1) * np.arange(1, t+1),     # real data
                           [0] * (max_shift - shifts[i])  # padding
                           ])
print("shifted and padded\n", a)
# I'd like to remove this Python loop if possible
for i in range(n):
    b[i] = a[i, shifts[i]:shifts[i] + t]
print("real data\n", b)
You can use two index arrays to get the data out:
a[np.arange(4)[:, None], shifts[:, None] + np.arange(10)]
or:
i, j = np.ogrid[:4, :10]
a[i, shifts[:, None]+j]
This is called advanced indexing in the NumPy documentation.
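A small sketch of why this works, using the shapes from the question: the row index has shape (n, 1) and the column index has shape (n, t), so they broadcast to (n, t) and each output element b[i, j] is read from a[i, shifts[i] + j]. The variable names rows, cols and expected are illustrative.

```python
import numpy as np

shifts = np.array([0, 8, 4, 2])
n, t = len(shifts), 10

# Build the shifted and zero-padded input, as in the question.
a = np.zeros((n, t + shifts.max()), dtype=int)
for i in range(n):
    a[i, shifts[i]:shifts[i] + t] = (i + 1) * np.arange(1, t + 1)

rows = np.arange(n)[:, None]            # shape (n, 1): which row to read
cols = shifts[:, None] + np.arange(t)   # shape (n, t): which column in that row
b = a[rows, cols]                       # broadcast to shape (n, t)

# Reference result from the explicit loop.
expected = np.stack([a[i, shifts[i]:shifts[i] + t] for i in range(n)])
print(np.array_equal(b, expected))  # True
```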

How do I vectorize these two numpy operations without using a for loop?

There is an operation in NumPy that I've found hard to implement without looping.
I have two inputs, beta and x:
beta.shape = (M,N,K) and x.shape = (I,K)
The operation I'm interested in can be done with a for loop as follows:
result = np.zeros((M,N,I,K)) # buffer to save my operation results
for m in range(M):
    for n in range(N):
        beta_ = beta[m][n] # has shape (K,)
        result[m][n] = x * beta_
Are there any tricks that let me do this without loops, so that the whole operation is computationally efficient?
You're interested in multiplying the elements across the common K dimension, and keeping the results along the remaining dimensions.
This means you can use np.einsum, writing the subscripts of beta, x, and the result shape you're interested in as 'mnk,ik->mnik':
import numpy as np
M = 4
N = 3
I = 7
K = 6
beta = np.arange(M*N*K).reshape(M,N,K)
x = np.arange(I*K).reshape(I,K)
result1 = np.zeros((M,N,I,K)) # buffer to save my operation results
for m in range(M):
    for n in range(N):
        beta_ = beta[m][n] # has shape (K,)
        result1[m][n] = x * beta_
result2 = np.einsum('mnk,ik->mnik', beta, x)
print (np.array_equal(result1,result2))
True
Not part of the question, but while talking about np.einsum: if you want to sum across any of these dimensions, you can omit its letter from the output subscripts:
import numpy as np
M = 4
N = 3
I = 7
K = 6
beta = np.arange(M*N*K).reshape(M,N,K)
x = np.arange(I*K).reshape(I,K)
result1 = np.zeros((M,N,I,K)) # buffer to save my operation results
for m in range(M):
    for n in range(N):
        beta_ = beta[m][n] # has shape (K,)
        result1[m][n] = x * beta_
result1 = result1.sum(axis=1)
result2 = np.einsum('mnk,ik->mik', beta, x)
print (np.array_equal(result1,result2))
True
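For comparison, the same element-wise product can be written with plain broadcasting instead of np.einsum. This is a sketch of an alternative, not part of the answer above: inserting a length-1 axis into beta lets NumPy broadcast it against x.

```python
import numpy as np

M, N, I, K = 4, 3, 7, 6
beta = np.arange(M*N*K).reshape(M, N, K)
x = np.arange(I*K).reshape(I, K)

via_einsum = np.einsum('mnk,ik->mnik', beta, x)

# beta[:, :, None, :] has shape (M, N, 1, K); x has shape (I, K).
# Broadcasting aligns the trailing axes and yields shape (M, N, I, K).
via_broadcast = beta[:, :, None, :] * x

# Dropping 'n' from the output subscripts sums over that axis.
summed = np.einsum('mnk,ik->mik', beta, x)

print(np.array_equal(via_einsum, via_broadcast))         # True
print(np.array_equal(summed, via_broadcast.sum(axis=1)))  # True
```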
