I am trying to create a function which takes two inputs: one is an n*m matrix, and the second is K, an integer value. The distance between the cells A[3][2] and A[1][4] is |1-3| + |4-2| = 4. The expected output from the function is the count of cells with cell distance greater than K.
Cell here is each entry in the given matrix A. For example, A[0][0] is a cell and it has an entry value of 1 in the matrix.
I have created a function like this:
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
def findw(K, matrix):
    m_c = matrix.copy()
    result = 0
    for i, j in zip(range(len(matrix)), range(len(m_c))):
        for k, l in zip(range(len(matrix[i])), range(len(m_c[j]))):
            D = abs(i - l) + abs(j - k)
            print(i, k)
            print(j, l)
            print(D)
            if D > K:
                result += 1
    return result
findw(1, A)
The output I got from the above function for the given matrix A with K = 1 is 9, but I am expecting 3. From the output I also realized that my function always takes the same index pair for both matrices, for example (0,0) or (1,0), etc. See the print output below.
findw(1, A)
0 0
0 0
0
0 1
0 1
2
0 2
0 2
4
1 0
1 0
2
1 1
1 1
0
1 2
1 2
2
2 0
2 0
4
2 1
2 1
2
2 2
2 2
0
3 0
3 0
6
3 1
3 1
4
3 2
3 2
2
Out[120]: 9
It looks like my function never iterates over cases where the indices into the two matrices differ, for example matrix[0][0] and m_c[0][1].
How can I resolve this issue?
Working under the assumption that it is only the positions holding the value 1 that you care about, you could first enumerate those indices and then loop over pairs of them. itertools is a natural tool to use here:
from itertools import product, combinations

def D(p, q):
    i, j = p
    k, l = q
    return abs(i - k) + abs(j - l)

def findw(k, matrix):
    m = len(matrix)
    n = len(matrix[0])
    result = 0
    indices = [(i, j) for i, j in product(range(m), range(n)) if matrix[i][j] == 1]
    for p, q in combinations(indices, 2):
        d = D(p, q)
        if d > k:
            print(p, q, d)
            result += 1
    return result
# test:
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
print(findw(1, A))
Output:
(0, 0) (2, 2) 4
(0, 0) (3, 1) 4
(2, 2) (3, 1) 2
3
I have a Size x Size array that is initialized with 0s only. I want to fill it with randomly picked integers so that they form a triangle. I tried making a for loop with another two nested in its body, but it doesn't seem to work.
pyramid = [[1, 0, 0],
           [4, 8, 0],
           [1, 5, 3]]
This is the desired format. Here is my attempt:
pyramid = [[0]*rows]*rows
for i in range(0, 3, 1):
    for j in range(0, 3, 1):
        for k in range(0, i+1, 1):
            pyramid[i][k] = random.randint(1, 10)
You have an array that is initially:
0 1 2 3
-------
0|0 0 0 0
1|0 0 0 0
2|0 0 0 0
3|0 0 0 0
You want it to be
0 1 2 3
-------
0|4 0 0 0
1|1 5 0 0
2|9 8 4 0
3|2 5 7 1
Notice that if you iterate over rows, you only need to fill each row up to the diagonal (including the diagonal).
So for row 0, you fill up to column 0. For row 1, you do columns 0 and 1. For row 2, you do columns 0, 1, and 2. And so on, so that for row i, you do columns 0, 1, ..., i-1, i.
You could iterate over rows in one loop (i.e. row ranges between 0 and N-1 inclusive) and then in an inner loop iterate over the columns for that row - i.e. the column for row i should range between 0 and i inclusive.
You will need to use the value of rows. Additionally, your j loop does not add anything. You need to loop through a 2-dimensional array, so two loops should be sufficient.
for i in range(rows):
    for j in range(i+1):
        pyramid[i][j] = random.randint(1, 10)
Additionally, you are initializing the 2-D list so that every row references the same object. So if you change pyramid[i][j], you also change pyramid[i+1][j].
To prevent this, use pyramid = [[0 for i in range(rows)] for i in range(rows)]
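To see the aliasing in action, here is a quick demonstration of plain Python list semantics:

rows = 3
bad = [[0] * rows] * rows                 # all three rows are the same list object
bad[0][0] = 9
print(bad)    # [[9, 0, 0], [9, 0, 0], [9, 0, 0]] -- every row changed

good = [[0] * rows for _ in range(rows)]  # each row is a fresh list
good[0][0] = 9
print(good)   # [[9, 0, 0], [0, 0, 0], [0, 0, 0]]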
This should work fine:
import random

pyramid = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]
]

x = len(pyramid)
y = len(pyramid[0])

for i in range(x):
    for j in range(y):
        if i >= j:
            pyramid[i][j] = random.randint(1, 10)
I'm looking to express in pure Python what is being done by the np.kron function.
Let's say I have these lists:
v1 = [1, 0, 0, 1]
v2 = [1, 0, 0, 1]
I would like to define a function that creates a list of lists by multiplying all of v1 by each element of v2 in turn. So these two lists would produce 4 lists:
[[1 * v1], [0 * v1], [0 * v1], [1 * v1]] (each element of v2 scaling the whole of v1)
Currently, I can get the right lists from a list comprehension:
i = [[a*b for a in v1] for b in v2]
>>[[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
Those lists are correct, but when I convert to an np.array and reshape it, the 1s sit at the corners rather than down the diagonal:
print(np.array(i).reshape(4,4))
[[1 0 0 1]
 [0 0 0 0]
 [0 0 0 0]
 [1 0 0 1]]
If np.kron is passed v1 and v2 after converting them to numpy arrays and reshaping each to 2x2, it gives:
i2 = np.kron((np.array(v1).reshape(2,2)),(np.array(v2).reshape(2,2)))
[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]
which is a beautiful 4 x 4 identity matrix; that's what I'm looking to express in pure Python, rather than using the np.kron function.
While the kron solution is by far the simplest (and you can always dump it into a list with .tolist()), let's look at a pure Python implementation.
There are two parts here: how to implement kron, and how to reshape a list. One way to do it is to make a grid that tells you which element to get from where at each location.
So the index into v1 would look like:
0 1 0 1
2 3 2 3
0 1 0 1
2 3 2 3
The index into v2 is something like
0 0 1 1
0 0 1 1
2 2 3 3
2 2 3 3
You can convert these into an expression in terms of the row r and column c. The index into v1 looks something like
i1 = 2 * (r % 2) + (c % 2)
For v2, you can write
i2 = 2 * (r // 2) + (c // 2)
Of course, it's the constant 2 that would change if you tried to shape the inputs or outputs differently.
Now you can just write a nested comprehension:
output = [[v1[2 * (r % 2) + (c % 2)] * v2[2 * (r // 2) + (c // 2)] for c in range(4)] for r in range(4)]
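To generalize the same idea to arbitrary shapes rather than hard-coded 2x2 factors, here is a sketch of a pure-Python kron over nested lists. It follows np.kron's convention, where the first argument supplies the outer block structure (the comprehension above uses v1 as the inner factor, which is invisible there because v1 == v2); the function name is just illustrative:

def kron(a, b):
    # a and b are 2-D nested lists; the result has shape (m*p) x (n*q)
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q]
             for c in range(n * q)]
            for r in range(m * p)]

For example, kron([[1, 0], [0, 1]], [[1, 0], [0, 1]]) returns the 4 x 4 identity as a list of lists.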
Suppose I have an (n*m) binary matrix df similar to the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.binomial(1, .3, size=(6,8)))
0 1 2 3 4 5 6 7
------------------------------
0 | 0 0 0 0 0 1 1 0
1 | 0 1 0 0 0 0 0 0
2 | 0 0 0 0 1 0 0 0
3 | 0 0 0 0 0 1 0 1
4 | 0 1 1 0 1 0 0 0
5 | 1 0 1 1 1 0 0 1
I want to shuffle the values in the matrix to create a new_df of the same shape, such that both marginal distributions are preserved, for example:
0 1 2 3 4 5 6 7
------------------------------
0 | 0 0 0 0 1 0 0 1
1 | 0 0 0 0 1 0 0 0
2 | 0 0 0 0 0 0 0 1
3 | 0 1 1 0 0 0 0 0
4 | 1 0 0 0 1 1 0 0
5 | 0 1 1 1 0 1 1 0
In the new matrix, the sum of each row is equal to the sum of the corresponding row in the original matrix, and likewise, columns in the new matrix have the same sum as the corresponding column in the original matrix.
The solution is pretty easy to check:
# rows have the same marginal distribution
assert(all(df.sum(axis=1) == new_df.sum(axis=1)))
# columns have the same marginal distribution
assert(all(df.sum(axis=0) == new_df.sum(axis=0)))
If n*m is small, I can use a brute-force approach to the shuffle:
def shuffle_2d(df):
    """Shuffles a multidimensional binary array, preserving marginal distributions"""
    # get a list of indices where the df is 1
    rowlist = []
    collist = []
    for i_row, row in df.iterrows():
        for i_col, val in row.items():
            if val == 1:
                rowlist.append(i_row)
                collist.append(i_col)
    # create an empty df of the same shape
    new_df = pd.DataFrame(index=df.index, columns=df.columns, data=0)
    # shuffle until you get no repeat coordinates
    # this is so you don't increment the same cell in the matrix twice
    repeats = 999
    while repeats > 1:
        pairs = list(zip(np.random.permutation(rowlist), np.random.permutation(collist)))
        repeats = pd.Series(pairs).value_counts().max()
    # populate new data frame at indicated points
    for i_row, i_col in pairs:
        new_df.at[i_row, i_col] += 1
    return new_df
The problem is that the brute force approach scales poorly. (As in that line from Indiana Jones and the Last Crusade: https://youtu.be/Ubw5N8iVDHI?t=3)
As a quick demo, for an n*n matrix, the number of attempts needed to get an acceptable shuffle looks like this (in one run):
n attempts
2 1
3 2
4 4
5 1
6 1
7 11
8 9
9 22
10 4416
11 800
12 66
13 234
14 5329
15 26501
16 27555
17 5932
18 668902
...
Is there a straightforward solution that preserves the exact marginal distributions (or tells you where no other pattern is possible that preserves that distribution)?
As a fallback, I could also use an approximation algorithm that could minimize the sum of squared errors on each row.
Thanks! =)
EDIT:
For some reason I wasn't finding existing answers before I wrote this question, but after posting it they all show up in the sidebar:
Is it possible to shuffle a 2D matrix while preserving row AND column frequencies?
Randomize matrix in perl, keeping row and column totals the same
Sometimes all you need to do is ask...
Thanks mostly to https://stackoverflow.com/a/2137012/6361632 for inspiration, here's a solution that seems to work:
def flip1(m):
    """
    Chooses a single (i0, j0) location in the matrix to 'flip'.
    Then randomly selects a different (i, j) location that creates
    a quad [(i0, j0), (i0, j), (i, j0), (i, j)] in which flipping every
    element leaves the marginal distributions unaltered.
    Changes those elements, and returns 1.
    If such a quad cannot be completed from the original position,
    does nothing and returns 0.
    """
    i0 = np.random.randint(m.shape[0])
    j0 = np.random.randint(m.shape[1])
    level = m[i0, j0]
    flip = 0 if level == 1 else 1  # the opposite value
    for i in np.random.permutation(range(m.shape[0])):  # try in random order
        if (i != i0 and                 # don't swap with self
                m[i, j0] != level):     # maybe swap with a cell that holds the opposite value
            for j in np.random.permutation(range(m.shape[1])):
                if (j != j0 and                 # don't swap with self
                        m[i, j] == level and    # check that the other swaps work
                        m[i0, j] != level):
                    # make the swaps
                    m[i0, j0] = flip
                    m[i0, j] = level
                    m[i, j0] = level
                    m[i, j] = flip
                    return 1
    return 0

def shuffle(m1, n=100):
    m2 = m1.copy()
    f_success = np.mean([flip1(m2) for _ in range(n)])
    # f_success is the fraction of flip attempts that succeed, for diagnostics
    # print(f_success)
    # check the answer
    assert(all(m1.sum(axis=1) == m2.sum(axis=1)))
    assert(all(m1.sum(axis=0) == m2.sum(axis=0)))
    return m2
Which we can call as:
m1 = np.random.binomial(1, .3, size=(6,8))
array([[0, 0, 0, 1, 1, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 1, 0, 1],
       [1, 1, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 1, 0, 0],
       [1, 0, 1, 0, 1, 0, 0, 0]])
m2 = shuffle(m1)
array([[0, 0, 0, 0, 1, 1, 0, 1],
       [1, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 1, 1],
       [1, 1, 1, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0],
       [1, 0, 0, 1, 0, 0, 0, 1]])
How many iterations do we need to get to a steady-state distribution? I've set a default of 100 here, which is sufficient for these small matrices.
Below I plot the correlation between the original and shuffled matrices (over 500 runs) for various numbers of iterations.
corrs = []
for _ in range(500):
    m1 = np.random.binomial(1, .3, size=(9, 9))  # create starting matrix
    m2 = shuffle(m1, n_iters)  # n_iters is set in an outer loop over candidate iteration counts
    corrs.append(np.corrcoef(m1.flatten(), m2.flatten())[1, 0])
plt.hist(corrs, bins=40, alpha=.4, label=n_iters)
For a 9x9 matrix, we see improvements up until about 25 iterations, beyond which we're in a steady state.
For an 18x18 matrix, we see small gains going from 100 to 250 iterations, but not much beyond.
Note that the correlation between starting and ending distributions is lower for larger matrices, but it takes us longer to get there.
You have to look for two rows and two columns whose four intersection cells form the pattern 1 0 / 0 1 (or the other way around, 0 1 / 1 0). You can then switch those values (to 0 1 / 1 0, or 1 0 / 0 1 respectively) without changing any row or column total.
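A minimal sketch of one such swap attempt (the function name try_swap and the use of numpy's Generator API are my own choices, not from the original answer):

import numpy as np

def try_swap(m, rng):
    # Pick two distinct rows and two distinct columns at random.
    r1, r2 = rng.choice(m.shape[0], size=2, replace=False)
    c1, c2 = rng.choice(m.shape[1], size=2, replace=False)
    sub = m[np.ix_([r1, r2], [c1, c2])]
    # The four cells form [[1, 0], [0, 1]] or [[0, 1], [1, 0]] exactly when the
    # diagonal entries match, the off-diagonal entries match, and the two differ.
    if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
        m[np.ix_([r1, r2], [c1, c2])] = 1 - sub  # flip all four cells; marginals unchanged
        return True
    return False

rng = np.random.default_rng()
for _ in range(100):  # repeat to mix the matrix, as in the flip1/shuffle answer above
    try_swap(m1, rng)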
There is even an algorithm that can sample from all possible matrices with identical marginals (implemented in the R package RaschSampler), developed by Verhelst (2008).
A newer algorithm by Wang (2020), more efficient in some cases, is also available.
I want to:
Create a vector list from 0 to 4, i.e. [0, 1, 2, 3, 4] and from that
Create a matrix containing a "tiered list" from 0 to 4, 3 times over, once for each dimension. The matrix has 4^3 = 64 rows, so for example
T = [0 0 0
     0 0 1
     0 0 2
     0 0 3
     0 0 4
     0 1 0
     0 1 1
     0 1 2
     0 1 3
     0 1 4
     0 2 0
     ...
     1 0 0
     ...
     1 1 0
     ...
     4 4 4]
This is what I have so far:
n = 5
ind = list(range(0, n))
print(ind)
I am just getting started with Python so any help would be greatly appreciated!
The Python itertools module's product() function can do this:
for code in itertools.product(range(5), repeat=3):
    print(code)
Giving the result:
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 0, 3)
...
(4, 4, 2)
(4, 4, 3)
(4, 4, 4)
So to make this into a matrix:
import itertools

matrix = []
for code in itertools.product(range(5), repeat=3):
    matrix.append(list(code))
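Equivalently, as a single comprehension:

matrix = [list(code) for code in itertools.product(range(5), repeat=3)]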
list_ = []
for a in range(5):
    for b in range(5):
        for c in range(5):
            list_.append([a, b, c])  # append each triple as its own row
print(list_)
Note, you really want the matrix to have 5^3 = 125 rows. The basic answer is to just iterate in nested for loops:
T = []
for a in range(5):
    for b in range(5):
        for c in range(5):
            T.append([a, b, c])
There are other, probably faster, ways of doing this, but for sheer get 'er done velocity, it's hard to beat this.
big_array = np.array((
    [0,1,0,0,1,0,0,1],
    [0,1,0,0,0,0,0,0],
    [0,1,0,0,1,0,0,0],
    [0,0,0,0,1,0,0,0],
    [1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
 [0 1 0 0 0 0 0 0]
 [0 1 0 0 1 0 0 0]
 [0 0 0 0 1 0 0 0]
 [1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and, for each 2x2 cluster of 0s, set all values within that cluster to 5? This is what the output would look like:
[[0 1 5 5 1 5 5 1]
 [0 1 5 5 0 5 5 0]
 [0 1 5 5 1 5 5 0]
 [0 0 5 5 1 5 5 0]
 [1 0 5 5 1 5 5 0]]
My thought is to use advanced indexing to set each 2x2 block to 5, but I think it would be really slow to simply iterate like this (a direct sketch of these steps follows the list):
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
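For reference, a direct brute-force rendering of those three steps. It assumes a 'cluster' means any axis-aligned, all-zero 2x2 window, so it may mark a few cells that the sample output above leaves untouched; the function name is hypothetical:

import numpy as np

def mark_zero_blocks(a, fill=5):
    out = a.copy()
    for r in range(a.shape[0] - 1):
        for c in range(a.shape[1] - 1):
            if not a[r:r+2, c:c+2].any():  # steps 1 and 2: all four cells are 0
                out[r:r+2, c:c+2] = fill   # step 3: mark the whole block
    return out

print(mark_zero_blocks(big_array))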
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
    if big_array[i] == big_array[p + 1]:
        big_array[i] = 5
        big_array[p + 1] = 5
        print(big_array)
    i = i + 1
    p = p + 1
Output:
[1, 7, 5, 5, 3]
This is an example, not complete, correct code.
Here's a solution by viewing the array as blocks.
First you need to define the function rolling_window from https://gist.github.com/seberg/3866040/revisions.
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through blkinds as defined here, then test the 2x2 block obtained from big at each element and reassign if necessary; a sketch of that idea follows.
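A sketch of that alternative, written with numpy's sliding_window_view (available since NumPy 1.20) in place of the gist's rolling_window; like the code above, treat it as illustrative rather than tested against the original:

import numpy as np

big = np.array([[0,1,0,0,1,0,0,1],
                [0,1,0,0,0,0,0,0],
                [0,1,0,0,1,0,0,0],
                [0,0,0,0,1,0,0,0],
                [1,0,0,0,1,0,0,0]])

# All 2x2 windows as a read-only view; shape (rows-1, cols-1, 2, 2).
windows = np.lib.stride_tricks.sliding_window_view(big, (2, 2))
# Boolean grid marking top-left corners whose window is entirely zero.
zero_corners = ~windows.any(axis=(2, 3))
# Paint each such window in the original array.
for r, c in zip(*np.nonzero(zero_corners)):
    big[r:r+2, c:c+2] = 5
print(big)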
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product

m = np.array((
    [0,1,0,0,1,0,0,1],
    [0,1,0,0,0,0,0,0],
    [0,1,0,0,1,0,0,0],
    [0,0,0,0,1,0,0,0],
    [1,0,0,0,1,0,0,0]))

h = 2
w = 2

rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
          for r, c in product(range(rr), range(cc))
          if not m[r:r + h, c:c + w].any()]

for s in slices:
    m[s] = 5

print(m)
[[0 1 5 5 1 5 5 1]
 [0 1 5 5 0 5 5 5]
 [0 1 5 5 1 5 5 5]
 [0 5 5 5 1 5 5 5]
 [1 5 5 5 1 5 5 5]]