Python: How to read one element first and followed by two elements?

I would like to scan through sequences and return 1 or 0 at each position to indicate whether a given element is present or absent there. For example, for the sequence XYZXYZ:
X Y Z X Y Z
1 0 0 1 0 0 - X
0 1 0 0 1 0 - Y
0 0 1 0 0 1 - Z
0 0 0 0 0 0 - XX
1 1 0 1 1 0 - XY
0 0 0 0 0 0 - XZ
0 0 0 0 0 0 - YX
0 0 0 0 0 0 - YY
0 1 1 0 1 1 - YZ
0 0 1 1 0 0 - ZX
0 0 0 0 0 0 - ZY
0 0 0 0 0 0 - ZZ
For a two-element pattern such as XY, every position covered by a match gets the value 1: the position of X and the position of the Y that follows it.
The example code below only scans one element at a time. When I replaced this line of code,
CHARS = ['X','Y','Z']
to
CHARS = ['X','Y','Z','XX','XY','XZ',...,'ZZ']
it does not match the two-element entries.
The code below returns the binary values for each sequence on one line: first the block for X, then the block for Y, then the block for Z.
import numpy as np
seqs = ["XYZXYZ","YZYZYZ"]
CHARS = ['X','Y','Z']
CHARS_COUNT = len(CHARS)
maxlen = max(map(len, seqs))
res = np.zeros((len(seqs), CHARS_COUNT * maxlen), dtype=np.uint8)
for si, seq in enumerate(seqs):
    seqlen = len(seq)
    arr = np.array(list(seq))  # one character per array element
    for ii, char in enumerate(CHARS):
        # block ii of row si marks the positions where seq equals char
        res[si][ii*seqlen:(ii+1)*seqlen][arr == char] = 1
print(res)
Example output of the code above:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1]]
How can I make it scan single elements first and then pairs of elements?
Expected output:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0]]

I'm not sure I understand all the details, but this is what I'd do:
import numpy as np

seqs = ['xyzxyz', 'yzyzyz']
chars = ['x','y','z','xx','xy','xz','yx','yy','yz','zx','zy','zz']
N = len(chars)
out = []
for i, seq in enumerate(seqs):
    M = len(seq) # if different seqs have different lengths, this will break!
    tmp = np.array([], dtype=int)
    for c in chars:
        o = np.array([0]*M)
        index = -1
        try:
            # str.index raises ValueError once there are no more matches
            while True:
                index = seq.index(c, index+1)
                o[index:(index+len(c))] = 1
        except ValueError:
            pass
        finally:
            tmp = np.r_[tmp, o]
    out.append(tmp)
out = np.array(out)
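
The same idea can be packaged as a function that builds the pattern list itself instead of typing it out. This is only a sketch under assumptions: the names `encode`, `alphabet` and `max_k` are made up for illustration, and the pattern order (X, Y, Z, XX, XY, ..., ZZ) is taken from the table in the question.

```python
from itertools import product

import numpy as np


def encode(seq, alphabet="XYZ", max_k=2):
    """Return one flat 0/1 vector: a block of len(seq) values per pattern."""
    # All patterns of length 1..max_k, in the order X, Y, Z, XX, XY, ..., ZZ.
    patterns = ["".join(p)
                for k in range(1, max_k + 1)
                for p in product(alphabet, repeat=k)]
    blocks = []
    for pattern in patterns:
        block = np.zeros(len(seq), dtype=np.uint8)
        start = seq.find(pattern)
        while start != -1:
            # Mark every position covered by this (possibly overlapping) match.
            block[start:start + len(pattern)] = 1
            start = seq.find(pattern, start + 1)
        blocks.append(block)
    return np.concatenate(blocks)


res = np.array([encode(s) for s in ["XYZXYZ", "YZYZYZ"]])
print(res.shape)  # (2, 72)
```

Stacking the rows with `np.array` still assumes that all sequences have the same length; sequences of different lengths would need padding first.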

Related

Python keeps updating wrong list

I'm trying to write code that simulates the spread of something across a 2D list with an n x n structure. My issue is this: when I create a temporary copy of my original list via temp = [*board], temp = board[:], etc., changes to the copy still show up in both lists, and instead of returning
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
returns
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
My code is here:
def spread(board, iterations, size):
    temp = board[:]
    for iteration in range(iterations):
        for x in range(size):
            for y in range(size):
                if board[x][y] == 1:
                    if x+1 < size:
                        temp[x+1][y] = 1
                    if x-1 >= 0:
                        temp[x-1][y] = 1
                    if y+1 < size:
                        temp[x][y+1] = 1
                    if y-1 >= 0:
                        temp[x][y-1] = 1
        board = temp[:]
    return board
and I called it via
new_board = spread(my_board, 1, 15)

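This is programming 101. Remember, lists are stored on the heap, and variables only hold references (pointers) to them.
So the variable board really points to the place on the heap where the outer list is stored, and each element of that outer list points to a row list. board[:] (like [*board]) makes a shallow copy: you get a new outer list, but its elements are references to the very same row lists. Writing temp[x+1][y] = 1 therefore changes a row that board also sees. I suggest taking a look at this using Python Tutor: https://pythontutor.com/visualize.html#mode=edit
The simplest form of the effect is plain aliasing. For example: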
b = [1,2,3,4,5]
a = b
a[0] = 2
print(b)
will output
[2, 2, 3, 4, 5]
Try it out in python tutor and you'll see what's happening!
To solve your problem, create a deep copy:
def deep_copy(board):
    temp = []
    for i in range(len(board)):
        row_copy = []
        for j in range(len(board[0])):
            row_copy.append(board[i][j])
        temp.append(row_copy)
    return temp
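
For illustration, here is one way the spread function from the question could look once the copies are independent. It is a sketch, not the asker's final code: it uses the standard library's copy.deepcopy instead of a hand-written copy, drops the size parameter in favour of len(board), and assumes a square board.

```python
import copy


def spread(board, iterations):
    """Return a new board in which every 1 has spread to its 4 neighbours."""
    size = len(board)                    # assumes a square board
    current = copy.deepcopy(board)       # never touch the caller's board
    for _ in range(iterations):
        nxt = copy.deepcopy(current)     # read from current, write to nxt
        for x in range(size):
            for y in range(size):
                if current[x][y] != 1:
                    continue
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < size and 0 <= ny < size:
                        nxt[nx][ny] = 1
        current = nxt
    return current


my_board = [[0] * 5 for _ in range(5)]
my_board[2][2] = 1
for row in spread(my_board, 1):
    print(*row)
```

Reading from one grid and writing to another is what keeps a cell that was set during this pass from spreading again in the same pass.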

Bounding box of numpy array with periodic boundary conditions (wrapping)

I would like to do something similar to this question, or this other one, but using periodic boundary conditions (wrapping). I'll make a quick example.
Let's say I have the following numpy array:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
Then, by using one of the methods proposed in the two linked questions, I am able to extract the bounding box of non-zero values:
0 0 0 1 1 1 1 1
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
1 1 1 1 0 0 0 0
However, if the non-zero elements "cross" the border and come back on the other side, like so:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
Then the result is:
1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 1 1 1 0 0
which is not what I want. I would like the result to be the same as the previous case. I am trying to figure out an intelligent way to do this, but I am stuck. Anybody have ideas?

We can adapt this answer like so:
import numpy as np

def wrapped_bbox(a):
    dims = [*range(1,a.ndim)]
    bb = np.empty((a.ndim,2),int)
    i = 0
    while True:
        n = a.shape[i]
        r = np.arange(1,2*n+1)
        ai = np.any(a,axis=tuple(dims))
        r1_a = np.where(ai,r.reshape(2,n),0).ravel()
        aux = np.maximum.accumulate(r1_a)
        aux = r-aux
        idx = aux.argmax()
        mx = aux[idx]
        if mx > n:
            bb[i] = 0,n
        else:
            bb[i] = idx+1, idx+1 - mx
            if bb[i,0] >= n:
                bb[i,0] -= n
            elif bb[i,1] == 0:
                bb[i,1] = n
        if i == len(dims):
            return bb
        dims[i] -= 1
        i += 1

# example
x = """
......
.x...-
..x...
.....x
"""
x = np.array(x.strip().split())[:,None].view("U1")
x = (x == 'x').view('u1')
print(x)
for r in range(x.shape[1]):
    print(wrapped_bbox(np.roll(x,r,axis=1)))
Run:
[[0 0 0 0 0 0] # x
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 1]]
[[1 4] # bbox vert
[5 3]] # bbox horz, note wraparound (left > right)
[[1 4]
[0 4]] # roll by 1
[[1 4]
[1 5]] # roll by 2
[[1 4]
[2 6]] # etc.
[[1 4]
[3 1]]
[[1 4]
[4 2]]
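
To actually extract the box from such (start, stop) pairs, one option is np.take with mode='wrap'. The helper below is a sketch under assumptions: the name crop_wrapped is made up, and it reads each pair as a half-open range that wraps around when start is not smaller than stop, which is how the output above appears to behave. The bounding box is typed in by hand from the first result above.

```python
import numpy as np


def crop_wrapped(a, bb):
    """Crop `a` to half-open (start, stop) ranges, wrapping around each axis."""
    out = a
    for axis, (start, stop) in enumerate(bb):
        n = a.shape[axis]
        # Number of cells from start to stop going forward with wraparound;
        # a result of 0 is taken to mean "the whole axis".
        length = (stop - start) % n or n
        idx = np.arange(start, start + length)
        out = np.take(out, idx, axis=axis, mode='wrap')
    return out


x = np.array([[0, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1]], dtype='u1')
bb = [(1, 4), (5, 3)]   # rows 1..3, columns 5, 0, 1, 2
print(crop_wrapped(x, bb))
```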

Search for unknown numbers in a txt and plot them

Recently, I started to evaluate some data with Python. However, it seems complicated to evaluate and manipulate my recorded data.
For instance, my .txt file consists of:
1551356567 0598523403 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523436 0000003362 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523469 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523502 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523535 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523766 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523799 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523832 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523865 0000003314 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523898 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356567 0598523931 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1551356568 0598524756 0000003384 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
The important values are only the third column (with 3362) and the first one (1551...), whereby the third column should be the x axis and the first the y axis. Only the lines with a value not equal to 0 are important. The idea is to create a loop which searches for values in the third column, and if there is a value != 0, then this value should be saved in a x-list (x) and the corresponding y value in a y-list (y).
Currently my script to read and manipulate the data looks like this:
import numpy as np
rawdata = np.loadtxt("file.txt")
num_lines = sum(1 for line in open("file.txt"))
with open("file.txt") as hv:
    line = hv.readline()
    x = list()
    y = list()
    i = 1
    j = 0
    while line != num_lines:
        if rawdata[j][2] != 0:
            x = x.append(rawdata[j][2])
            y = x.append(rawdata[j][0])
        else:
            j += 1
        if i == num_lines:
            break
        i += 1
print(x)
print(y)
I think there are some local and global variable problems, but I couldn't solve them in order to, let's say, "update" my lists with the new values. In the end there should be lists containing only:
[3362, 3314, 3384] for x and
[1551356567, 1551356567, 1551356568] for y
Do you have any suggestions how I can "update" my list?

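The lists are not being "updated" because list.append modifies the list in place and returns None, so x = x.append(...) replaces your list with None. Call x.append(...) without assigning the result.
As you read each line, split it on whitespace and convert each column to an integer:
x = []
y = []
with open('file.txt') as f:
    for line in f:
        data = [int(col) for col in line.split()]
        if data[2] != 0:
            x.append(data[2])
            y.append(data[0])
print(x)
print(y)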
Output:
[3362, 3314, 3384]
[1551356567, 1551356567, 1551356568]
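
Since the question already loads the file with np.loadtxt, the same filtering can also be done with a boolean mask instead of a loop. This is a sketch: the in-memory sample below stands in for file.txt and is abbreviated to five columns, which is an assumption made only to keep the example short.

```python
import io

import numpy as np

# Stand-in for open("file.txt"); rows shortened to five columns.
sample = io.StringIO(
    "1551356567 0598523403 0 0 0\n"
    "1551356567 0598523436 0000003362 0 0\n"
    "1551356567 0598523469 0 0 0\n"
    "1551356568 0598524756 0000003384 0 0\n"
)
rawdata = np.loadtxt(sample)            # floats by default

mask = rawdata[:, 2] != 0               # True for rows with a value in column 3
x = rawdata[mask, 2].astype(np.int64)
y = rawdata[mask, 0].astype(np.int64)
print(x.tolist())
print(y.tolist())
```

The resulting arrays can be passed directly to a plotting function.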

Check if every section of matrix is all 0 using numpy

I want to check if all entries of a matrix A within 10 indices of a given entry (x,y) are zero. I think something like this should do it
(numpy.take(A,[x-10:x+10,y-10:y+10]) == 0).all()
but I'm getting an invalid syntax error. I think I'm not constructing the index ranges correctly. Any suggestions?

Don't worry about using take, just index your array like this:
(A[x-10:x+10,y-10:y+10] == 0).all()

A simple boolean check against the entries of the submatrix will do:
np.all(A[x-10:x+11,y-10:y+11]==0)
(Note that the upper index of a slice is not included, so I changed the ranges to x-10:x+11 and y-10:y+11.)

Suppose A is an array of shape (19,19):
import numpy as np
H = W = 19
x, y = 1, 1
N = 10
A = np.random.randint(10, size=(H,W))
Then
In [433]: A[x-N:x+N,y-N:y+N]
Out[433]: array([[4]])
Since x-N is 1-10 = -9, A[x-N:x+N,y-N:y+N] is equivalent to A[-9:11,-9:11],
which is equivalent to A[19-9:11,19-9:11] which is the same as A[10:11,10:11].
So only one value is selected.
That's not giving you "all entries of a matrix A within 10 indices of a given
entry (x,y)".
Instead, you could generate the desired subregion using a boolean mask:
X, Y = np.ogrid[0:H,0:W]
mask = (np.abs(X - x) < N) & (np.abs(Y - y) < N)
Once you have the mask, you can select the subregion where the mask is True using A[mask], and test if every value is zero with
(A[mask] == 0).all()
import numpy as np
np.random.seed(2015)
H = W = 19
x, y = 1, 1
N = 10
A = np.random.randint(10, size=(H,W))
print(A[x-N:x+N,y-N:y+N])
# [[4]]
X, Y = np.ogrid[0:H,0:W]
mask = (np.abs(X - x) < N) & (np.abs(Y - y) < N)
print(mask.astype(int))
# [[1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
print((A[mask] == 0).all())
# False
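
If plain slicing is preferred over a mask, the negative-start problem can also be avoided by clamping the lower bounds at 0. The helper below is a sketch: the name neighborhood_is_zero is made up, and it treats "within n indices" as inclusive (offsets -n to +n), which is slightly wider than the < N mask above.

```python
import numpy as np


def neighborhood_is_zero(A, x, y, n=10):
    """True if every entry within n rows and n columns of (x, y) is zero."""
    # A negative start would count from the end of the axis, so clamp it at 0.
    # Stops past the end of the axis are simply truncated by slicing.
    r0 = max(x - n, 0)
    c0 = max(y - n, 0)
    return bool((A[r0:x + n + 1, c0:y + n + 1] == 0).all())


A = np.zeros((19, 19), dtype=int)
A[18, 18] = 1                          # far away from (1, 1)
print(neighborhood_is_zero(A, 1, 1))   # True
A[11, 11] = 1                          # exactly 10 rows and columns away
print(neighborhood_is_zero(A, 1, 1))   # False
```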

Iterating in a area contained in a multidimensional array in python, why it doesn't work?

I need to fill an area in a 10*10 matrix with a real number, and for this I wrote the following code in Python:
# 'x' and 'x1' are points in a surface that delimit the area
[x, x1, y, y1] = [0, 5, 0, 2]
surfaceXY = [[0]*10]*10
for i in range(x, x1):
    for j in range(y, y1):
        surfaceXY[i][j] = 5
for k in range(10):
    for l in range(10):
        print(surfaceXY[k][l], end=' ')
    print()
I want this output:
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
but the code actually outputs:
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
Can someone explain why this happens, and what the right way to solve this problem in Python is?

Note that [[0]*10]*10 does not create a list of ten separate lists with ten zeroes each. It creates one list of ten zeroes and an outer list that refers to that same inner list ten times.
Since there is only one inner list, assigning to surfaceXY[i][j] changes column j in every row of your grid.
Use
[[0]*10 for _ in range(10)]
See 2d array of zeros for additional discussion.

To me it looks like all of the rows have the same id (memory location), meaning they are the same data structure.
>>> id(surfaceXY[0])
4548151416
>>> id(surfaceXY[1])
4548151416
So this means that the multiplication operator on a list copies the references (pointers) to its elements and not the data they point to.
Try this initializer:
surfaceXY = [[0]*10 for i in range(10)]
Test:
>>> [x, x1, y, y1] = [0, 5, 0, 2]
>>> surfaceXY = [[0]*10 for i in range(10)]
>>> for i in range(x, x1):
...     for j in range(y, y1):
...         surfaceXY[i][j] = 5
...
>>> for k in range(10):
...     for l in range(10):
...         print(surfaceXY[k][l], end=' ')
...     print()
...
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
>>>

The problem is with your list creation. This:
surfaceXY = [[0]*10]*10
creates ten references to the same list: every row you think you are making actually points to one single row. When you change one of them, all of them appear modified, hence the result you are seeing.
To solve this, one would use this:
surfaceXY = [[0 for _ in range(10)] for _ in range(10)]
Or alternatively:
surfaceXY = [[0]*10 for _ in range(10)]
Here's my run with your modified code :)
>>> surfaceXY = [[0 for _ in range(10)] for _ in range(10)]
>>> for i in range(x, x1):
        for j in range(y, y1):
            surfaceXY[i][j] = 5
>>> for k in range(10):
        for l in range(10):
            print(surfaceXY[k][l], end=' ')
        print('', end='\n')
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
5 5 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
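
If NumPy is an option, the aliasing problem does not arise, because a 2D array is a single block of data and a rectangular region can be assigned with one slice. This is a sketch of an alternative, not something the answers above proposed; the variable name surface is made up.

```python
import numpy as np

x, x1, y, y1 = 0, 5, 0, 2
surface = np.zeros((10, 10), dtype=int)
surface[x:x1, y:y1] = 5      # rows x..x1-1, columns y..y1-1
print(surface)
```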
