Split an array into several arrays by defined boundaries, python


I have a NumPy array with 64 columns and 49 rows. Each row represents a separate message and contains several pieces of information. Where a piece of information starts or ends can be recognized by a change in value. Here is an excerpt of the array:
[[1 1 0 0 2 2 2 2 1 0 0 0 0 2 ... 2 2 2]
[0 0 0 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2]
[2 0 0 1 2 0 0 0 0 0 0 0 0 0 ... 1 1 0]
.
.
.
[0 1 0 1 0 1 0 1 0 0 0 0 0 0 ... 2 2 2]]
The first piece of information in the first row therefore occupies the first two positions [11]. Because the value changes from 1 to 0, I know that the second piece of information is in the third and fourth positions [00]. The third occupies the following four positions [2222]. The next one consists only of [1]. And so on.
Once I have identified the positions of each piece of information in a message, I have to apply these boundaries to my signal arrays. My first binary signal array has 64 columns and 3031 rows:
[[1 1 0 0 0 0 0 1 0 1 0 1 0 0 ... 1 0 0 1]
[1 0 1 0 1 1 1 1 1 0 0 1 1 0 ... 1 1 1 0]
[0 1 0 1 1 1 0 0 1 0 0 1 1 1 ... 1 1 1 0]
.
.
.
[1 0 1 0 0 1 0 0 0 0 1 1 0 1 ... 1 1 1 0]]
My first array (first information from the first signal) consists of the first two positions as determined by the previous array. The output should look like this:
[[11]
[10]
[01]
.
.
.
[10]]
The output of the second array (third and fourth position) should be the following:
[[00]
[10]
[01]
.
.
.
[10]]
The output of the third array:
[[0001]
[1111]
[1100]
.
.
.
[0100]]
Unfortunately I do not know how to create and apply the initial boundaries of the first array to the binary arrays. Does anyone have a solution for this?
Thanks for the help!

Sorry, I placed the hint about where you should create a loop in the wrong place. See if this code works. (I tried to explain NumPy slicing a little in the comments, but you can learn more here: Numpy indexing and slicing.)
import itertools
import numpy as np
# Def to reshape signals according to message
def reshape(lst1, lst2):
    iterator = iter(lst2)
    return [[next(iterator) for _ in sublist]
            for sublist in lst1]
# Arrays
array_1 = np.array([[1,1,0,0,2,2,2,2,1,0,0,0,0,2],
                    [0,0,0,2,2,2,2,2,2,2,2,2,2,2],
                    [2,0,0,1,2,0,0,0,0,0,0,0,0,0],
                    [0,1,0,1,0,1,0,1,0,0,0,0,0,0]])
array_2 = np.array([[1,1,0,0,0,0,0,1,0,1,0,1,0,0],
                    [1,0,1,0,1,1,1,1,1,0,0,1,1,0],
                    [0,1,0,1,1,1,0,0,1,0,0,1,1,1],
                    [1,0,1,0,0,1,0,0,0,0,1,1,0,1]])
# Group messages into pieces of information
signal_list = []
for lists in array_1:
    signal_list.append([list(group) for key, group in itertools.groupby(lists)])
# Index for all messages
All_messages = {}
# Do this for each message:
for rows in range(len(array_1)):
    # Reshapes each signal according to current message
    # (dtype=object because the information blocks have different lengths)
    signals_reshape = np.array([reshape(signal_list[rows], array_2[i]) for i in range(len(array_2))], dtype=object)
    # Create list to append all signals in current message
    all_signal = []
    # Do this for each information block (every signal has the same number of blocks)
    for i in range(len(signals_reshape[0])):
        '''
        Append information blocks
        1st [:] = retrieve in all signals
        2nd [:] = retrieve the whole signal
        3rd [:,i] = retrieve information block from specific column
        Example: signals_reshape[0][0][0] retrieves the first information element of first information block of the first signal
        signals_reshape[0][0][:] retrieves all the information elements from the first information block from the first signal
        signals_reshape[:][:][:,0] retrieves the first information block from all the signals
        '''
        all_signal.append(signals_reshape[:][:][:,i].flatten())
    # add message information to dictionary (+ 1 is so that the names start at Message1 and not Message0)
    All_messages["Message{0}".format(rows+1)] = all_signal
print(All_messages['Message1'])
print(All_messages['Message2'])
print(All_messages['Message3'])
print(All_messages['Message4'])

See if this can help you. This example returns the information for the 1st message, but you should be able to create a loop for all 49 messages and assign it to a new list.
import itertools
import numpy as np
# Def to reshape signals according to message
def reshape(lst1, lst2):
    iterator = iter(lst2)
    return [[next(iterator) for _ in sublist]
            for sublist in lst1]
# Arrays
array_1 = np.array([[1,1,0,0,2,2,2,2,1,0,0,0,0,2],
                    [0,0,0,2,2,2,2,2,2,2,2,2,2,2],
                    [2,0,0,1,2,0,0,0,0,0,0,0,0,0]])
array_2 = np.array([[1,1,0,0,0,0,0,1,0,1,0,1,0,0],
                    [1,0,1,0,1,1,1,1,1,0,0,1,1,0],
                    [0,1,0,1,1,1,0,0,1,0,0,1,1,1]])
# Group messages into pieces of information
signal_list = []
for lists in array_1:
    signal_list.append([list(group) for key, group in itertools.groupby(lists)])
# reshapes signals for each message to be used
# (dtype=object because the information blocks have different lengths)
signals_reshape = np.array([reshape(signal_list[0], array_2[i]) for i in range(len(array_2))], dtype=object)
print(signals_reshape[0])
# Get information from first message (Can create a loop to do the same for all 49 messages)
final_list_1 = []
for i in range(len(signals_reshape[0])):
    final_list_1.append(signals_reshape[:][:][:,i].flatten())
print(final_list_1[0])
print(final_list_1[1])
print(final_list_1[2])
Output:
final_list_1[0]
[list([1, 1]), list([1, 0]), list([0, 1])]
final_list_1[1]
[list([0, 0]), list([1, 0]), list([0, 1])]
final_list_1[2]
[list([0, 0, 0, 1]) list([1, 1, 1, 1]) list([1, 1, 0, 0])]

Credits to @Kempie, who solved the problem. I just adapted his code to my needs, shortened it a bit and fixed some small bugs.
import itertools
import numpy as np
# Def to reshape signals according to message
def reshape(lst1, lst2):
    iterator = iter(lst2)
    return [[next(iterator) for _ in sublist]
            for sublist in lst1]
# Arrays
array_1 = np.array([[1,1,0,0,2,2,2,2,1,0,0,0,0,2],
                    [0,0,0,2,2,2,2,2,2,2,2,2,2,2],
                    [2,0,0,1,2,0,0,0,0,0,0,0,0,0],
                    [0,1,0,1,0,1,0,1,0,0,0,0,0,0]])
# in my case, array_2 was a list (see difference of code to Kempie's solution);
# the loop below indexes it as array_2[rows][i], so it presumably held one 2D signal array per message
array_2 = np.array([[1,1,0,0,0,0,0,1,0,1,0,1,0,0],
                    [1,0,1,0,1,1,1,1,1,0,0,1,1,0],
                    [0,1,0,1,1,1,0,0,1,0,0,1,1,1],
                    [1,0,1,0,0,1,0,0,0,0,1,1,0,1]])
# Group messages into pieces of information
signal_list = []
for lists in array_1:
    signal_list.append([list(group) for key, group in itertools.groupby(lists)])
signals_reshape_list = []
# Do this for each message (as array_2 is a list, we must work with indices):
for rows in range(len(array_1)):
    # Reshapes each signal according to current message
    signals_reshape = np.array([reshape(signal_list[rows], array_2[rows][i]) for i in range(len(array_2[rows]))], dtype=object)
    signals_reshape_list.append(signals_reshape)
# print first signal of third message e.g.
print(signals_reshape_list[2][:,0])
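
For comparison, here is a minimal sketch of a more NumPy-centric route to the same blocks. It is not the code from the answers above: it assumes the boundaries are simply the positions where a message row changes value, and it uses np.diff, np.flatnonzero and np.split instead of itertools.groupby. The helper name split_by_message is made up for this illustration.

```python
import numpy as np

def split_by_message(message, signals):
    """Split the columns of `signals` wherever the value in `message` changes."""
    message = np.asarray(message)
    # np.diff is non-zero exactly where two neighbouring values differ;
    # adding 1 gives the index at which each new block starts.
    boundaries = np.flatnonzero(np.diff(message)) + 1
    return np.split(np.asarray(signals), boundaries, axis=1)

array_1 = np.array([[1,1,0,0,2,2,2,2,1,0,0,0,0,2],
                    [0,0,0,2,2,2,2,2,2,2,2,2,2,2]])
array_2 = np.array([[1,1,0,0,0,0,0,1,0,1,0,1,0,0],
                    [1,0,1,0,1,1,1,1,1,0,0,1,1,0],
                    [0,1,0,1,1,1,0,0,1,0,0,1,1,1],
                    [1,0,1,0,0,1,0,0,0,0,1,1,0,1]])

# One list of 2D arrays per message; each 2D array holds one information
# block for every signal (rows = signals, columns = positions in the block).
blocks_first_message = split_by_message(array_1[0], array_2)
print(blocks_first_message[0])   # positions 1-2 of every signal
print(blocks_first_message[2])   # positions 5-8 of every signal
```

Because every block is a regular 2D array, no object arrays are needed, and looping over the rows of array_1 gives the blocks for all messages.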

Related

find index where element change sign from 0 to 1

I have a DataFrame like below, where I want to find index where element goes from 0 to 1.
Below code gets all the instances I want, however it also includes cases where sign changes from -1 to 0, which I don't want.
import numpy as np
import pandas as pd
df=pd.DataFrame([0,1,1,1,0,1,0,-1,0])
df[np.sign(df[0]).diff(periods=1).eq(1)]
Output:
0
1 1
5 1
8 0

Just add another condition:
filtered = df[np.sign(df[0]).diff(1).eq(1) & np.sign(df[0]).eq(1)]
Output:
>>> filtered
0
1 1
5 1
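
An alternative sketch, assuming the goal is literally "the previous value is 0 and the current value is 1": compare the column with a shifted copy of itself instead of using np.sign and diff. The variable names col and rising are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([0, 1, 1, 1, 0, 1, 0, -1, 0])
col = df[0]

# shift(1) moves every value down one row, so row i sees the value of row i-1.
# The first row has no predecessor (NaN), which never equals 0.
rising = col.shift(1).eq(0) & col.eq(1)

print(df[rising].index.tolist())  # [1, 5]
```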

Python 9x9 and 3x3 array validation excluding 0

I am trying to check whether any numbers are duplicated in a 9x9 array, but I need to exclude all zeros, as they are the ones I will solve later. I would like to validate that there are no duplicates in the rows and columns, checking only the numbers 1 to 9. An example input array would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
    line = (test[:,c])
    print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
    line = (test[r,:])
    print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
    for c0 in range(3,9,3):
        test1 = test[:r0,:c0]
        for r in range(3):
            line = (test1[r,:])
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = (test1[:,c])
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.

It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test[r0:r0+3, c0:c0+3]
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
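
Putting the two pieces together, a minimal sketch might look like this. The function names has_no_duplicates and is_valid_board are invented for this example, and the board is a small made-up one rather than the board from the question.

```python
import numpy as np

def has_no_duplicates(cells):
    """Return True if the non-zero values in `cells` are all distinct."""
    nonzeros = [int(x) for x in np.asarray(cells).flatten() if x != 0]
    return len(nonzeros) == len(set(nonzeros))

def is_valid_board(board):
    """Check every row, column and 3x3 box of a 9x9 board, ignoring zeros."""
    board = np.asarray(board)
    rows_ok = all(has_no_duplicates(board[r, :]) for r in range(9))
    cols_ok = all(has_no_duplicates(board[:, c]) for c in range(9))
    boxes_ok = all(has_no_duplicates(board[r0:r0 + 3, c0:c0 + 3])
                   for r0 in range(0, 9, 3)
                   for c0 in range(0, 9, 3))
    return rows_ok and cols_ok and boxes_ok

board = np.zeros((9, 9), dtype=int)
board[0, 0], board[0, 3], board[1, 1] = 1, 7, 3
print(is_valid_board(board))   # True

board[2, 2] = 1                # same box as board[0, 0], different row and column
print(is_valid_board(board))   # False
```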

Excluding zeros is fairly straight forward by masking the array
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want
As for the 3x3 subarrays, you can loop through them as you did in your example.
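
As a rough sketch of that loop for rows and columns, assuming test is a 2D integer array (the helper names line_is_unique and rows_and_columns_valid are made up here):

```python
import numpy as np

def line_is_unique(line):
    """True if the non-zero entries of a 1-D array contain no repeats."""
    non_zero = line[line != 0]            # the boolean mask drops the zeros
    return np.unique(non_zero).size == non_zero.size

def rows_and_columns_valid(test):
    """Apply the masked uniqueness check to every row and every column."""
    test = np.asarray(test)
    rows_ok = all(line_is_unique(test[r, :]) for r in range(test.shape[0]))
    cols_ok = all(line_is_unique(test[:, c]) for c in range(test.shape[1]))
    return rows_ok and cols_ok

test = np.zeros((9, 9), dtype=int)
test[0, 0], test[0, 5], test[4, 0] = 1, 2, 3
print(rows_and_columns_valid(test))   # True

test[0, 8] = 2                        # second 2 in the first row
print(rows_and_columns_valid(test))   # False
```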

When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
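def count_bits16(arr):
    count = arr - ((arr >> 1) & 0x5555)
    count = (count & 0x3333) + ((count >> 2) & 0x3333)
    count = (count + (count >> 4)) & 0x0F0F
    # keep only the low byte: sum() widens uint16, so the product is not truncated to 16 bits
    return ((count * 0x0101) >> 8) & 0xFF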
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
(count_bits16(rows) == np.count_nonzero(board, axis=1)).all() and \
(count_bits16(cols) == np.count_nonzero(board, axis=0)).all() and \
(count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
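
For reference, the whole idea can be assembled into one self-contained sketch. This is a variation rather than the exact code above: it builds the sets with an explicit bitwise OR (np.bitwise_or.reduce) instead of sum, and it counts bits with a plain loop over the nine possible bit positions instead of the bit-twiddling routine. The names count_bits and is_valid_board_bits are invented for the example.

```python
import numpy as np

def count_bits(values, width=9):
    """Count the set bits of each element by testing bit positions 0..width-1."""
    values = np.asarray(values, dtype=np.int64)
    return sum((values >> k) & 1 for k in range(width))

def is_valid_board_bits(board):
    """Return True if no row, column or 3x3 block repeats a non-zero value."""
    board = np.asarray(board, dtype=np.int64)
    bits = (1 << board) >> 1                  # value v -> bit v-1, 0 -> empty set
    rows = np.bitwise_or.reduce(bits, axis=1)
    cols = np.bitwise_or.reduce(bits, axis=0)
    # axes after reshape: block row, row in block, block column, column in block
    boxes4 = bits.reshape(3, 3, 3, 3)
    blocks = np.bitwise_or.reduce(np.bitwise_or.reduce(boxes4, axis=3), axis=1)
    return bool(
        (count_bits(rows) == np.count_nonzero(board, axis=1)).all()
        and (count_bits(cols) == np.count_nonzero(board, axis=0)).all()
        and (count_bits(blocks)
             == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
    )

board = np.zeros((9, 9), dtype=int)
board[0, 0], board[0, 3], board[1, 1] = 1, 7, 3
print(is_valid_board_bits(board))   # True

board[2, 2] = 1                     # repeats the 1 inside the top-left block
print(is_valid_board_bits(board))   # False
```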

Multidimension array indexing and column-accessing

I have a 3 dimensional array like
[[[ 1 4 4 ..., 952 0 0]
[ 2 4 4 ..., 33 0 0]
[ 3 4 4 ..., 1945 0 0]
...,
[4079 1 1 ..., 0 0 0]
[4080 2 2 ..., 0 0 0]
[4081 1 1 ..., 0 0 0]]
[[ 1 4 4 ..., 952 0 0]
[ 2 4 4 ..., 33 0 0]
[ 3 4 4 ..., 1945 0 0]
...,
[4079 1 1 ..., 0 0 0]
[4080 2 2 ..., 0 0 0]
[4081 1 1 ..., 0 0 0]]
.....
[[ 1 4 4 ..., 952 0 0]
[ 2 4 4 ..., 33 0 0]
[ 3 4 4 ..., 1945 0 0]
...,
[4079 1 1 ..., 0 0 0]
[4080 2 2 ..., 0 0 0]
[4081 1 1 ..., 0 0 0]]]
This array has 5 data blocks in total. Each data block has 4081 lines and 9 columns.
My question is about accessing columns block by block.
I would like to index data blocks, lines, and columns, access the columns, and do some work with if conditions inside loops. I know how to access a column in a 2D array, like:
column_1 = [row[0] for row in inputfile]
but how can I access the columns of each data block?
I tried the following (inputfile is the 3D array above):
for i in range(len(inputfile)):
    AAA[i] = [row[0] for row in inputfile]
print AAA[2]
But it says "name 'AAA' is not defined". How can I access the column for each data block? Do I need to create [None] arrays first? Is there any other way that does not use empty arrays?
Also, how can I access specific elements of the accessed columns? For example, AAA[i][j] = the j-th line of the first column in the i-th data block. Should I use one more for loop for line-wise access?
P.S. I tried to analyze this 3D array like this:
for i in range(len(inputfile)): ### number of datablock = 5
    for j in range(len(inputfile[i])): ### number of lines per a datablock = 4081
        AAA = inputfile[i][j] ### Store first column for each datablocks to AAA
        print AAA[0] ### Working as I intended to access 1st column.
        print AAA[0][1] ### Not working, invalid index to scalar variable. I can't access to the each elemnt.
But this way, I cannot access the individual elements of the first column, AAA[0]. How can I access each element here?
I thought maybe 2 indices were not enough, so I used 3 for loops:
for i in range(len(inputfile)): ### number of datablock = 5
    for j in range(len(inputfile[i])): ### number of lines per a datablock = 4081
        for k in range(len(inputfile[i][j])): ### number of columns per line = 9
            AAA = inputfile[i][j][0]
            print AAA[0]
Still, I cannot access the individual elements of the first column; it says 'invalid index to scalar variable'. Also, AAA contains each element nine times, like this:
>>> print AAA
1
1
1
1
1
1
1
1
1
2
2
...
4080
4080
4080
4081
4081
4081
4081
4081
4081
4081
4081
4081
Like this, each element repeats 9 times, which is not what I want.
I want to use indices during my analysis, and I will use the index values themselves in it. I want to access the columns, and each of their elements, by index in this 3D array. How can I do this?

A good practice is to leverage zip:
For example:
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> for i in a:
...     for j in b:
...         print i, b
...
1 [4, 5, 6]
1 [4, 5, 6]
1 [4, 5, 6]
2 [4, 5, 6]
2 [4, 5, 6]
2 [4, 5, 6]
3 [4, 5, 6]
3 [4, 5, 6]
3 [4, 5, 6]
>>> for i,j in zip(a,b):
...     print i,j
...
1 4
2 5
3 6

Unless you're using something like NumPy, Python doesn't have multi-dimensional arrays as such. Instead, the structure you've shown is a list of lists of lists of integers. (Your choice of inputfile as the variable name is confusing here; such a variable would usually contain a file handle, iterating over which would yield one string per line, but I digress...)
Unfortunately, I'm having trouble understanding exactly what you're trying to accomplish, but at one point, you seem to want a single list consisting of the first column of each row. That's as simple as:
column = [row[0] for block in inputfile for row in block]
Granted, this isn't really a column in the mathematical sense, but it might possibly perhaps be what you want.
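
If the first column should instead stay grouped per block, so that AAA[i][j] means "block i, line j", a nested comprehension is one way to sketch it. The small inputfile below is made-up toy data with 2 blocks of 3 rows, not the real dataset.

```python
# Toy stand-in for the real data: 2 blocks, 3 rows per block, 3 columns per row.
inputfile = [
    [[1, 4, 4], [2, 4, 4], [3, 4, 4]],
    [[11, 5, 5], [12, 5, 5], [13, 5, 5]],
]

# One list per block, each holding the first column of that block.
AAA = [[row[0] for row in block] for block in inputfile]

print(AAA[0])      # first column of the first block
print(AAA[1][2])   # third line of the first column in the second block
```

No pre-sized [None] lists are needed, because the comprehension builds each inner list as it goes.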
Now, as to why your other attempts failed:
for i in range(len(inputfile)):
    AAA[i] = [row[0] for row in inputfile]
print AAA[2]
As the error message states, AAA is not defined. Python doesn't let you assign to an index of an undefined variable, because it doesn't know whether that variable is supposed to be a list, a dict, or something more exotic. For lists in particular, it also doesn't let you assign to an index that doesn't yet exist; instead, the append or extend methods are used for that:
AAA = []
for i, block in enumerate(inputfile):
    for j, row in enumerate(block):
        AAA.append(row[0])
print AAA[2]
(However, that isn't quite as efficient as the list comprehension above.)
for i in range(len(inputfile)): ### number of datablock = 5
    for j in range(len(inputfile)): ### number of lines per a datablock = 4081
        AAA = inputfile[i][j] ### Store first column for each datablocks to AAA
        print AAA[0] ### Working as I intended to access 1st column.
        print AAA[0][1] ### Not working, invalid index to scalar variable. I can't access to the each elemnt.
There's an obvious problem with the range in the second line, and an inefficiency in looking up inputfile[i] multiple times, but the real problem is in the last line. At this point, AAA refers to one of the rows of one of the blocks; for example, on the first time through, given your dataset above,
AAA == [ 1 4 4 ..., 952 0 0]
It's a single list, with no references to the data structure as a whole. AAA[0] works to access the number in the first column, 1, because that's how lists operate. The second column of that row will be in AAA[1], and so on. But AAA[0][1] throws an error, because it's equivalent to (AAA[0])[1], which in this case is equal to (1)[1], but numbers can't be indexed. (What's the second element of the number 1?)
for i in range(len(inputfile)): ### number of datablock = 5
    for j in range(len(inputfile[i])): ### number of lines per a datablock = 4081
        for k in range(len(inputfile[i][j])): ### number of columns per line = 9
            AAA = inputfile[i][j][0]
            print AAA[0]
This time, your for loops, though still inefficient, are at least correct, if you want to iterate over every number in the whole data structure. At the bottom, you'll find that inputfile[i][j][k] is integer k in row j in block i of the data structure. However, you're throwing out k entirely, and printing the first element of the row, once for each item in the row. (The fact that it's repeated exactly as many times as you have columns should have been a clue.) And once again, you can't index any further once you get to the integers; there is no inputfile[i][j][0][0].
Granted, once you get to an element, you can look at nearby elements by changing the indexes. For example, a three-dimensional cellular automaton might want to look at each of its neighbors. With proper corrections for the edges of the data, and checks to ensure that each block and row are the right length (Python won't do that for you), that might look something like:
for i, block in enumerate(inputfile):
    for j, row in enumerate(block):
        for k, num in enumerate(row):
            # sum() takes a single iterable, so the neighbors go in a list
            neighbors = sum([
                inputfile[i][j][k-1],
                inputfile[i][j][k+1],
                inputfile[i][j-1][k],
                inputfile[i][j+1][k],
                inputfile[i-1][j][k],
                inputfile[i+1][j][k],
            ])
            alive = 3 <= neighbors <= 4
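
One possible form of the edge corrections mentioned above is sketched here. It assumes that neighbors outside the data simply count as absent; the helper name neighbor_sum is hypothetical.

```python
def neighbor_sum(data, i, j, k):
    """Sum the six face-adjacent neighbors of data[i][j][k], skipping any outside the data."""
    offsets = ((-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1))
    total = 0
    for di, dj, dk in offsets:
        ni, nj, nk = i + di, j + dj, k + dk
        # Explicit bounds checks: a negative index would silently wrap around,
        # and an index past the end would raise IndexError.
        if (0 <= ni < len(data)
                and 0 <= nj < len(data[ni])
                and 0 <= nk < len(data[ni][nj])):
            total += data[ni][nj][nk]
    return total

# 3 blocks x 3 rows x 3 columns, every cell set to 1
cube = [[[1] * 3 for _ in range(3)] for _ in range(3)]
print(neighbor_sum(cube, 1, 1, 1))   # centre cell: all 6 neighbors exist
print(neighbor_sum(cube, 0, 0, 0))   # corner cell: only 3 neighbors exist
```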

Counting of adjacent cells in a numpy array

It is past midnight, and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (that is, the number of array fields with other values, e.g. zeros, in the vicinity of my array values), summed over every valid value.
Example:
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print(a)
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can I count the zeros in this way if the structure of my values varies?
I suspect I have to make use of SciPy's binary_dilation function, which can enlarge the value structure, but simply counting the overlaps does not lead me to the correct sum, or does it?
print(ndimage.binary_dilation(a,s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]

Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print(numpy.sum(c * a))
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
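
To illustrate what the convolution computes, and how other kernels fit in, here is a NumPy-only sketch that adds shifted copies of the zero mask instead of calling scipy.signal.convolve2d. For a symmetric kernel it yields the same count. The function name count_adjacent_zeros is made up, and cells outside the array are assumed not to count as zeros.

```python
import numpy as np

def count_adjacent_zeros(a, structure):
    """Sum, over every non-zero cell of `a`, the zeros that `structure` covers around it.

    `structure` is a small odd-sized 0/1 neighbourhood, e.g. a 3x3 block of ones.
    """
    a = np.asarray(a)
    structure = np.asarray(structure, dtype=int)
    zeros = (a == 0).astype(int)
    pr, pc = structure.shape[0] // 2, structure.shape[1] // 2
    padded = np.pad(zeros, ((pr, pr), (pc, pc)), mode="constant")
    counts = np.zeros_like(zeros)
    # Add one shifted copy of the zero mask per active kernel cell.
    for dr in range(structure.shape[0]):
        for dc in range(structure.shape[1]):
            if structure[dr, dc]:
                counts += padded[dr:dr + a.shape[0], dc:dc + a.shape[1]]
    # Keep only the counts that sit on non-zero cells of the input.
    return int(counts[a != 0].sum())

a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1

full = np.ones((3, 3), dtype=int)                    # 8-connected neighbourhood
plus = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])   # orthogonal neighbours only
print(count_adjacent_zeros(a, full))   # 24, the total from the question
print(count_adjacent_zeros(a, plus))   # 10
```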

I think you already got it. After dilation, the number of 1s is 19; minus the 5 of the starting shape, you have 14, which is the number of zeros surrounding your shape. Your total of 24 counts some of those zeros more than once.

indexing numpy multidimensional arrays

I need to access this numpy array, sometimes with only the rows where the last column is 0, and sometimes the rows where the value of the last column is 1.
y = [0 0 0 0
1 2 1 1
2 -6 0 1
3 4 1 0]
I have to do this over and over, but would prefer to avoid creating duplicate arrays or having to recalculate each time. Is there some way that I can identify the indices concerned and just call them? So that I can do this:
>>print y[LAST_COLUMN_IS_0]
[0 0 0 0
3 4 1 0]
>>print y[LAST_COLUMN_IS_1]
[1 2 1 1
2 -6 0 1]
P.S. The number of columns in the array never changes, it's always going to have 4 columns.

You can use numpy's boolean indexing to identify which rows you want to select, and numpy's fancy indexing/slicing to select the whole row.
print y[y[:,-1] == 0, :]
print y[y[:,-1] == 1, :]
You can save y[:,-1] == 0 and ... == 1 as usual, since they are just numpy arrays.
(The y[:,-1] selects the whole of the last column, and the == equality check happens element-wise, resulting in an array of booleans.)
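
A short sketch of saving the masks under the names used in the question (the array literal is the question's y written out as a 2D array):

```python
import numpy as np

y = np.array([[0, 0, 0, 0],
              [1, 2, 1, 1],
              [2, -6, 0, 1],
              [3, 4, 1, 0]])

# Compute each boolean mask once and reuse it as an index.
LAST_COLUMN_IS_0 = y[:, -1] == 0
LAST_COLUMN_IS_1 = y[:, -1] == 1

print(y[LAST_COLUMN_IS_0])   # rows [0 0 0 0] and [3 4 1 0]
print(y[LAST_COLUMN_IS_1])   # rows [1 2 1 1] and [2 -6 0 1]
```

Two caveats: indexing with a boolean mask returns a copy rather than a view, and a saved mask goes stale if the last column of y changes afterwards.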
