How to create matrix in python of repeating number? - python

I want to:
Create a vector list from 0 to 4, i.e. [0, 1, 2, 3, 4] and from that
Create a matrix containing a "tiered list" from 0 to 4, 3 times over, once for each dimension. The matrix has 4^3 = 64 rows, so for example
T = [0 0 0
0 0 1
0 0 2
0 0 3
0 0 4
0 1 0
0 1 1
0 1 2
0 1 3
0 1 4
0 2 0
...
1 0 0
...
1 1 0
....
4 4 4]
This is what I have so far:
n=5;
ind=list(range(0,n))
print(ind)
I am just getting started with Python so any help would be greatly appreciated!

Python's itertools.product() function can do this:
import itertools
for code in itertools.product(range(5), repeat=3):
    print(code)
Giving the result:
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 0, 3)
...
(4, 4, 2)
(4, 4, 3)
(4, 4, 4)
So to make this into a matrix:
import itertools
matrix = []
for code in itertools.product(range(5), repeat=3):
    matrix.append(list(code))

list_ = []
for a in range(5):
    for b in range(5):
        for c in range(5):
            list_.append([a, b, c])  # append one row; += would flatten the values into a single long list
print(list_)

Note, you really want the matrix to have 5^3 = 125 rows. The basic answer is to just iterate in nested for loops:
T = []
for a in range(5):
    for b in range(5):
        for c in range(5):
            T.append([a, b, c])
There are other, probably faster, ways of doing this, but for sheer get 'er done velocity, it's hard to beat this.
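As one possible sketch of those other ways, the same 125-row table can be built as a NumPy array, either by wrapping itertools.product or with np.indices. The names n, dims, T and T2 are illustrative only, and neither variant has been benchmarked here.

```python
import itertools

import numpy as np

n, dims = 5, 3  # values 0..4, three columns

# Variant 1: materialize the Cartesian product, then convert to an array.
T = np.array(list(itertools.product(range(n), repeat=dims)))

# Variant 2: pure NumPy. np.indices returns one index grid per dimension;
# flattening each grid and transposing yields one row per combination.
T2 = np.indices((n,) * dims).reshape(dims, -1).T

print(T.shape)    # (125, 3)
print(T[:6])      # first rows: [0 0 0] ... [0 1 0]
```

Both variants list the rows in the same order as the nested loops, with the last column changing fastest.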

Related

Function to use indexes in a matrix

I am trying to create a function which takes two inputs. One input is the matrix (n*m), and the second is K, an integer value. The distance between two cells is the sum of the absolute differences of their indices; for example, the distance between the cells A[3][2] and A[1][4] is |1-3| + |4-2| = 4. The expected output from the function is the count of cell pairs whose distance is greater than K.
Cell here is each entry in the given matrix A. For example, A[0][0] is a cell and it has an entry value of 1 in the matrix.
I have created a function like this:
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
def findw(K, matrix):
    m_c = matrix.copy()
    result = 0
    for i, j in zip(range(len(matrix)), range(len(m_c))):
        for k, l in zip(range(len(matrix[i])), range(len(m_c[j]))):
            D = abs(i - l) + abs(j - k)
            print(i, k)
            print(j, l)
            print(D)
            if D > K:
                result += 1
    return result
findw(1, A)
The output I got from the above function for the given matrix A with K = 1 is 9. But I am expecting 3. From the output I also realized that for both the matrices my function is always taking same value, for example (0,0) or (1,0), etc. See the print output below.
findw(1, A)
0 0
0 0
0
0 1
0 1
2
0 2
0 2
4
1 0
1 0
2
1 1
1 1
0
1 2
1 2
2
2 0
2 0
4
2 1
2 1
2
2 2
2 2
0
3 0
3 0
6
3 1
3 1
4
3 2
3 2
2
Out[120]: 9
It looks like my function is not iterating for cells where the indexes for both matrices are different. For example, matrix[0][0] and m_c[0][1].
How can I resolve this issue?
Working under the assumption that you only care about the positions which hold the value 1, you could first enumerate those indices and then loop over every pair of them. itertools is a natural tool to use here:
from itertools import product, combinations
def D(p, q):
    i, j = p
    k, l = q
    return abs(i - k) + abs(j - l)
def findw(k, matrix):
    m = len(matrix)
    n = len(matrix[0])
    result = 0
    indices = [(i, j) for i, j in product(range(m), range(n)) if matrix[i][j] == 1]
    for p, q in combinations(indices, 2):
        d = D(p, q)
        if d > k:
            print(p, q, d)
            result += 1
    return result
#test:
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
print(findw(1, A))
Output:
(0, 0) (2, 2) 4
(0, 0) (3, 1) 4
(2, 2) (3, 1) 2
3
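The behaviour seen in the question, where both index pairs were always identical, comes from zip, which walks its arguments in lockstep. A small sketch of the difference between zip and itertools.product (the variable names are illustrative):

```python
from itertools import product

rows = range(2)

# zip pairs the i-th item of one iterable with the i-th item of the other,
# so only "diagonal" combinations appear.
lockstep = list(zip(rows, rows))

# product yields every combination, which is what comparing
# each cell with every other cell requires.
all_pairs = list(product(rows, rows))

print(lockstep)   # [(0, 0), (1, 1)]
print(all_pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```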

Numpy / Pandas slicing based on intervals

Trying to figure out a way to slice non-contiguous and non-equal length rows of a pandas / numpy matrix so I can set the values to a common value. Has anyone come across an elegant solution for this?
import numpy as np
import pandas as pd
x = pd.DataFrame(np.arange(12).reshape(3,4))
#x is the matrix we want to index into
"""
x before:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
"""
y = pd.DataFrame([[0,3],[2,2],[1,2],[0,0]])
#y is a matrix where each row holds the start idx and end idx for one column of x
"""
   0  1
0  0  3
1  2  2
2  1  2
3  0  0
"""
What I'm looking for is a way to effectively select different length slices of x based on the rows of y
x[y] = 0
"""
x afterwards:
array([[ 0, 1, 2, 0],
[ 0, 5, 0, 7],
[ 0, 0, 0, 11]])
"""
Masking can still be useful, because even if a loop cannot be entirely avoided, the main dataframe x would not need to be involved in the loop, so this should speed things up:
mask = np.zeros_like(x, dtype=bool)
for i in range(len(y)):
    mask[y.iloc[i, 0]:(y.iloc[i, 1] + 1), i] = True
x[mask] = 0
x
0 1 2 3
0 0 1 2 0
1 0 5 0 7
2 0 0 0 11
As a further improvement, consider defining y as a NumPy array if possible.
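A minimal sketch of that suggestion, assuming y holds inclusive [start, end] row indices for each column (as the loop above does) and, for simplicity, that x is also held as a plain NumPy array. The variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# One [start, end] pair per column of x; end is treated as inclusive (assumption).
y = np.array([[0, 3], [2, 2], [1, 2], [0, 0]])

mask = np.zeros(x.shape, dtype=bool)
for col, (start, end) in enumerate(y):
    # Slices past the last row are clipped automatically by NumPy.
    mask[start:end + 1, col] = True

x[mask] = 0
print(x)
```

Iterating over the rows of a NumPy array avoids the repeated .iloc lookups, which tend to dominate the cost of the loop on a DataFrame.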
I customized this answer to your problem:
y_t = y.values.transpose()
# y as defined above holds inclusive end indices, so compare with '>= r'; for exclusive ends, change '>= r' below to '> r'
r = np.arange(x.shape[0])
mask = ((y_t[0,:,None] <= r) & (y_t[1,:,None] >= r)).transpose()
res = x.where(~mask, 0)
res
# 0 1 2 3
# 0 0 1 2 0
# 1 0 5 0 7
# 2 0 0 0 11

Add element after each element in numpy array python

I am just starting off with numpy and am trying to create a function that takes in an array (x), converts this into a np.array, and returns a numpy array with 0,0,0,0 added after each element.
It should look like so:
input array: [4,5,6]
output: [4,0,0,0,0,5,0,0,0,0,6,0,0,0,0]
I have tried the following:
import numpy as np
x = np.asarray([4,5,6])
y = np.array([])
for index, value in enumerate(x):
    y = np.insert(x, index+1, [0,0,0,0])
    print(y)
which returns:
[4 0 0 0 0 5 6]
[4 5 0 0 0 0 6]
[4 5 6 0 0 0 0]
So basically I need to combine the output into one single numpy array rather than three lists.
Would anybody know how to solve this?
Many thanks!
Use NumPy's np.zeros function:
import numpy as np
inputArray = [4,5,6]
newArray = np.zeros(5*len(inputArray),dtype=int)
newArray[::5] = inputArray
The slice assignment newArray[::5] = inputArray overwrites the values at indices 0, 5 and 10 with 4, 5 and 6,
so _____[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
becomes [4 0 0 0 0 5 0 0 0 0 6 0 0 0 0]
>>> newArray
array([4, 0, 0, 0, 0, 5, 0, 0, 0, 0, 6, 0, 0, 0, 0])
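Since the question asks for a function, the same idea could be wrapped up as follows. This is only a sketch: the name pad_after_each and the n_zeros parameter are made up for illustration.

```python
import numpy as np


def pad_after_each(values, n_zeros=4):
    """Return a NumPy array with n_zeros zeros inserted after every element."""
    arr = np.asarray(values)
    step = n_zeros + 1
    # Start from all zeros, then drop the original values into every step-th slot.
    out = np.zeros(arr.size * step, dtype=arr.dtype)
    out[::step] = arr
    return out


print(pad_after_each([4, 5, 6]))
```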
I haven't used numpy to solve this problem, but this code seems to return your required output:
a = [4,5,6]
b = [0,0,0,0]
c = []
for x in a:
    c = c + [x] + b
print(c)
I hope this helps!

Index of identical rows in a NumPy array

I already asked a variation of this question, but I still have a problem regarding the runtime of my code.
I have a NumPy array with 15000 rows and 44 columns. My goal is to find out which rows are equal and collect their indices in lists, like this:
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
1 2 3 4 5
Result:
equal_rows1 = [1,2,3]
equal_rows2 = [0,4]
So far I have been using the following code:
import numpy as np
input_data = np.load('IN.npy')
equal_inputs1 = []
equal_inputs2 = []
for i in range(len(input_data)):
    for j in range(i+1,len(input_data)):
        if np.array_equal(input_data[i],input_data[j]):
            equal_inputs1.append(i)
            equal_inputs2.append(j)
The problem is that it takes a lot of time to return the desired arrays and that this allows only 2 different "similar row lists" although there can be more. Is there any better solution for this, especially regarding the runtime?
This is pretty simple with pandas groupby:
df
A B C D E
0 1 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 1 0 0 0 0
5 1 2 3 4 5
[g.index.tolist() for _, g in df.groupby(df.columns.tolist()) if len(g.index) > 1]
# [[1, 2, 3], [0, 4]]
If you are dealing with many rows and many unique groups, this might get a bit slow. The performance depends on your data. Perhaps there is a faster NumPy alternative, but this is certainly the easiest to understand.
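One possible NumPy-only alternative, sketched here without any benchmarking, is np.unique with axis=0, which labels each row with the id of its unique row. The array data below is just the small example from the question.

```python
import numpy as np

data = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 2, 3, 4, 5],
])

# inverse[i] is the id of the unique row that row i matches;
# counts[g] is how many rows share unique row g.
_, inverse, counts = np.unique(data, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.ravel()  # guard against shape differences between NumPy versions

# Keep only the groups that occur more than once.
groups = [np.flatnonzero(inverse == g).tolist() for g in np.flatnonzero(counts > 1)]
print(groups)  # [[1, 2, 3], [0, 4]]
```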
You can use collections.defaultdict, which retains the row values as keys:
from collections import defaultdict
dd = defaultdict(list)
for idx, row in enumerate(df.values):
    dd[tuple(row)].append(idx)
print(list(dd.values()))
# [[0, 4], [1, 2, 3], [5]]
print(dd)
# defaultdict(<class 'list'>, {(1, 0, 0, 0, 0): [0, 4],
# (0, 0, 0, 0, 0): [1, 2, 3],
# (1, 2, 3, 4, 5): [5]})
You can, if you wish, filter out unique rows via a dictionary comprehension.
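A short sketch of that filtering step, using a plain list of rows in place of df.values so the example is self-contained (the name duplicates is illustrative):

```python
from collections import defaultdict

rows = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 2, 3, 4, 5],
]

dd = defaultdict(list)
for idx, row in enumerate(rows):
    dd[tuple(row)].append(idx)

# Keep only the row values that occur at more than one index.
duplicates = {key: idxs for key, idxs in dd.items() if len(idxs) > 1}
print(list(duplicates.values()))  # [[0, 4], [1, 2, 3]]
```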

For loop iteration with range restarting at index 1

I'm trying to iterate through a list with a step of two indices at a time and, once the loop reaches the end, to restart it from index 1 rather than index 0.
I have already read different articles on Stack Overflow that suggest a while loop workaround. However, I'm looking for an option which simply uses the element in my for loop with range, without itertools, other libraries or a nested loop.
Here is my code:
j = [0,0,1,1,2,2,3,3,9,11]
count = 0
for i in range(len(j)):
    if i >= len(j)/2:
        print(j[len(j)-i])
        count += 1
    else:
        count +=1
        print(j[i*2],i)
Here is the output:
0 0
1 1
2 2
3 3
9 4
2
2
1
1
0
The loop does not restart from where it is supposed to.
Here is the desired output:
0 0
1 1
2 2
3 3
9 4
0 5
1 6
2 7
3 8
11 9
How can I fix it?
You can do that by combining two range() calls like:
Code:
j = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]
for i in (j[k] for k in
          (list(range(0, len(j), 2)) + list(range(1, len(j), 2)))):
    print(i)
and using an itertools solution:
import itertools as it
for i in it.chain.from_iterable((it.islice(j, 0, len(j), 2),
                                 it.islice(j, 1, len(j), 2))):
    print(i)
Results:
0
1
2
3
9
0
1
2
3
11
Another itertools solution:
import itertools as it
lst = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]
a, b = it.tee(lst)
next(b)
for i, x in enumerate(it.islice(it.chain(a, b), None, None, 2)):
    print(x, i)
Output
0 0
1 1
2 2
3 3
9 4
0 5
1 6
2 7
3 8
11 9
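For a plain list, the same ordering can also be sketched with slicing alone, which avoids itertools and nested loops as the question requested. The name reordered is illustrative.

```python
j = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]

# Even positions first (0, 2, 4, ...), then odd positions (1, 3, 5, ...).
reordered = j[0::2] + j[1::2]

for i, value in enumerate(reordered):
    print(value, i)
```

This copies the list once, so it suits lists that fit comfortably in memory; the iterator-based versions avoid that copy.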
