2-dimensional Array decomposition in Python - python

I would appreciate your help translating this into Python 3: a routine that decomposes an input array of any size into smaller 4x4 square arrays.
I have tried chunking and the array functions in NumPy, but I could not get them to do this.
Here is my Perl code, which works well; I want to compare its efficiency against a Python version.
use strict;
use warnings;
use Data::Dumper;

my $width = 4;   # side length N of each square sub-array
my $DEBUG = 0;

sub make_array {
    my $input = shift;
    my @parts = split '-', $input;
    my $result = [];
    # Test for valid number of lines in inputs
    my $lines = scalar @parts;
    if ($lines % $width) {
        die "Invalid line count $lines not divisible by $width";
        # Or could pad here by adding an entire row of '0'.
    }
    # Chunk input lines into NxN subarrays:
    # loop across all input lines in steps of N lines
    my $line_width = 0;
    for (my $nn = 0; $nn < $lines; $nn += $width) {
        # make a temp array to hold $width rows of input
        my @temp = (0 .. $width - 1);
        for my $ii (0 .. $width - 1) {
            my $p = $parts[$nn + $ii];
            # number of zeros needed to reach the next multiple of $width
            my $padding_needed = ($width - length($p) % $width) % $width;
            if ($padding_needed != 0) {
                print "'$p' is not divisible by correct width of $width, adding $padding_needed zeros\n";
                $p .= "0" x $padding_needed;
            }
            if ($line_width == 0) {
                $line_width = length($p);
            }
            $temp[$ii] = $p;
        }
        # now process temp array left to right, creating keys
        my $chunks = $line_width / $width;
        if ($DEBUG) { print "chunks: $chunks\n"; }
        for (my $zz = 0; $zz < $chunks; $zz++) {
            if ($DEBUG) { print "zz:$zz\n"; }
            my $key = '';
            for (my $yy = 0; $yy < $width; $yy++) {
                my $qq = $temp[$yy];
                $key .= substr($qq, $zz * $width, $width) . "-";
            }
            chop $key;    # lose the trailing '-'
            if ($DEBUG) { print "Key: $key\n"; }
            push @$result, $key;
        }
    }
    if ($DEBUG) {
        print "Reformatted input:";
        print Dumper $result;
        my $count = scalar @$result;
        print "There are $count keys to check against the lookup table\n";
    }
    return $result;
}
As an example, I have the following 8 x 12 matrix (8 rows of 12 columns):
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
and I want it decomposed into 6 square 4x4 submatrices:
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
The original matrix is read from a text file in the following format:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011
So the program needs to split it on the commas and treat each chunk as a row of the large matrix. The 6 submatrices should be output in the same format, so the first one would be:
0000,0000,0000,0000
The program should decompose any input matrix into square matrices of side j (say 4). If the original matrix's dimensions are not multiples of 4, it should disregard the remaining chunks that cannot form a complete 4x4 matrix.
Several large matrices of different sizes may appear in the input file, one per line. For example, the original large matrix together with another matrix would look like this in a text file:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011\n
0101,0101,0101,0101
The program should then retrieve two sets of sub-arrays: six 4x4 arrays for the first matrix and a single 4x4 array for the second. A solution for the single-matrix case is of course fine too.
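A plain-Python sketch of the requested behaviour, assuming comma-separated rows with one matrix per line of the file. It follows the stated requirement of discarding incomplete chunks rather than the Perl version's zero-padding. The names `decompose` and `decompose_file` are hypothetical.

```python
def decompose(line, n=4):
    """Split one comma-separated matrix into n x n blocks in the same text format.

    Blocks are ordered top-to-bottom, then left-to-right within each band of
    n rows. Rows/columns that cannot fill a complete block are ignored.
    """
    rows = line.strip().split(",")
    width = min(len(r) for r in rows)  # shortest row bounds the usable columns
    blocks = []
    for top in range(0, len(rows) - n + 1, n):      # first row of each band
        band = rows[top:top + n]
        for left in range(0, width - n + 1, n):      # first column of each block
            blocks.append(",".join(r[left:left + n] for r in band))
    return blocks


def decompose_file(path, n=4):
    """Hypothetical helper: one matrix per non-empty line of a text file."""
    with open(path) as f:
        return [decompose(line, n) for line in f if line.strip()]
```

For the 8 x 12 example this yields six blocks, the first being `0000,0000,0000,0000`; the `0101,0101,0101,0101` line yields a single block.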

This is easy with NumPy. Suppose we have a 12x12 array:
In [1]: import numpy as np
In [2]: a = np.arange(144).reshape([-1,12])
In [3]: a
Out[3]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[ 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
[ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[ 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83],
[ 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95],
[ 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107],
[108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119],
[120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131],
[132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143]])
To select the top-left 4x4 array, use slicing:
In [4]: a[0:4,0:4]
Out[4]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15],
[24, 25, 26, 27],
[36, 37, 38, 39]])
The bottom-right sub-array is:
In [7]: a[8:12,8:12]
Out[7]:
array([[104, 105, 106, 107],
[116, 117, 118, 119],
[128, 129, 130, 131],
[140, 141, 142, 143]])
The remaining sub-arrays follow the same slicing pattern.
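Instead of writing each slice by hand, all tiles can be produced at once with `reshape` and `swapaxes`. A minimal sketch (the function name `to_blocks` is hypothetical), which first trims rows and columns that cannot fill a whole tile, as the question requires:

```python
import numpy as np

def to_blocks(a, n=4):
    """Return shape (rows//n, cols//n, n, n); entry [i, j] is a[i*n:(i+1)*n, j*n:(j+1)*n]."""
    r, c = a.shape[0] // n, a.shape[1] // n
    trimmed = a[:r * n, :c * n]  # drop partial tiles at the bottom/right edges
    # Split each axis into (tile index, offset inside tile), then swap so
    # the two tile indices come first.
    return trimmed.reshape(r, n, c, n).swapaxes(1, 2)
```

For the 12x12 array above, `to_blocks(a)[0, 0]` equals `a[0:4, 0:4]` and `to_blocks(a)[2, 2]` equals `a[8:12, 8:12]`. For the question's string rows, one option is to build a character array with `np.array([list(row) for row in rows])` and join each tile's rows back with `"".join`.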

Related

Trace the column following the string column for a given dataset

I'm looking for generic logic that can be applied to any dataset to find the column that comes right after the string column.
Problem:
The 5th column should be the sum of the first column and the column following the one with the string datatype.
{
[10, 20, 'tyh', 30],
[66, 88, 'ltk', 99],
[41, 31, 'qed', 11]
}
Expected output:
{
[10, 20, 'tyh', 30, 40],
[66, 88, 'ltk', 99, 165],
[41, 31, 'qed', 11, 52]
}
The logic should also work for the DataFrame below:
{
[111, 'tfy', 122, 133],
[167, 'elt', 187, 197],
[143, 'xqe', 132, 112]
}
Expected output:
{
[111, 'tfy', 122, 133, 244],
[167, 'elt', 187, 197, 364],
[143, 'xqe', 132, 112, 255]
}
What I have so far:
data[4] = data[0] + data[3]
but this is hard-coded, so it won't work for the second DataFrame.
str_col_no = df.columns[df.dtypes==object][0]
df[4] = df[0] + df[str_col_no+1]
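The answer above relies on integer column labels (`str_col_no + 1`). A hedged pandas sketch that works by position instead, assuming the string column is the first non-numeric column (the function name is hypothetical). Note that under the stated rule the second DataFrame gives 233, 354, 275; the expected values 244, 364, 255 shown in the question equal the first column plus the last column instead.

```python
import pandas as pd

def add_sum_after_string_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a new column: first column + column right after the string column."""
    out = df.copy()
    # Label of the first non-numeric column (assumed to be the string column);
    # exclude="number" also catches dedicated string dtypes, not only object.
    str_col = out.select_dtypes(exclude="number").columns[0]
    # Work with positions so any column labels are fine.
    pos = out.columns.get_loc(str_col)
    new_label = len(out.columns)  # e.g. 4 for a four-column frame
    out[new_label] = out.iloc[:, 0] + out.iloc[:, pos + 1]
    return out
```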

Modifying alternate indices of 3d numpy array

I have a NumPy array with shape (140, 23, 2): 140 frames, 23 objects, and x,y locations. The data has been generated by a GAN, and when I animate the movement it's very jittery. I want to smooth it by converting the coordinates of each object so that every odd-numbered index becomes the midpoint between the even-numbered indices on either side of it, e.g.
x[1] = (x[0] + x[2]) / 2
x[3] = (x[2] + x[4]) / 2
Below is my code:
def smooth_coordinates(df):
    # df shape is (140, 23, 2)
    # iterate through each object (23)
    for j in range(len(df[0])):
        # iterate through 140 frames
        for i in range(len(df)):
            # if it's an odd number and the index allows at least 1 index after it
            if (i % 2 != 0) and (i < len(df) - 1):
                df[i][j][0] = (df[i-1][j][0] + df[i+1][j][0]) / 2
                df[i][j][1] = (df[i-1][j][1] + df[i+1][j][1]) / 2
    return df
Aside from it being very inefficient, my input df and output df are identical. Any suggestions for how to achieve this more efficiently?
import numpy as np
a = np.random.randint(100, size= [140, 23, 2]) # input array
b = a.copy()
i = np.ogrid[1: a.shape[0]-1: 2] # odd indicies
i
>>> [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,
27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51,
53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77,
79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103,
105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137]
(a == b).all() # testing for equality
>>> True
a[i] = (a[i-1] + a[i+1]) / 2 # averaging positions across frames
(a == b).all() # testing for equality again
>>> False
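The same idea wrapped in a function, sketched with plain slicing instead of `np.ogrid` (the name `smooth_odd_frames` is hypothetical). It works on a float copy, which avoids the truncation that occurs when averages are assigned back into an integer array. This also explains the "identical" result in the question: `smooth_coordinates` modifies the array it is given and returns that same object, so input and output are always the same array.

```python
import numpy as np

def smooth_odd_frames(frames):
    """Return a copy where each interior odd frame is the midpoint of its even neighbours."""
    out = frames.astype(float)  # astype returns a new array, so the input is untouched
    # out[1:-1:2] are the odd frames 1, 3, ... that have a frame on both sides;
    # out[0:-2:2] and out[2::2] are the even frames just before and after them.
    out[1:-1:2] = (out[0:-2:2] + out[2::2]) / 2
    return out
```

For 140 frames this updates frames 1, 3, ..., 137 and leaves the even frames and the final odd frame 139 unchanged.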

Remove elements in a list if difference with previous element less than value

Given a list of numbers in ascending order, I need to keep only those elements for which the difference between consecutive kept elements is greater than or equal to a certain value (10 in my case).
Given:
list = [10,15,17,21,34,36,42,67,75,84,92,94,103,115]
Goal:
list=[10,21,34,67,84,94,115]
You could use a while loop and a variable to track the index you are currently looking at. Starting at index 1, check whether the number at this index minus the number at the previous index is less than 10. If it is, delete this index but keep the index counter the same, so that you look at the next number, which has now moved into this index. If the difference is 10 or more, increase the index to look at the next number. The print line inside the loop only shows the comparisons; you can remove it.
nums = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
index = 1
while index < len(nums):
    print(f"comparing {nums[index-1]} with {nums[index]} nums list {nums}")
    if nums[index] - nums[index - 1] < 10:
        del nums[index]
    else:
        index += 1
print(nums)
OUTPUT
comparing 10 with 15 nums list [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 10 with 17 nums list [10, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 10 with 21 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 21 with 34 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 36 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 42 nums list [10, 21, 34, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 67 nums list [10, 21, 34, 67, 75, 84, 92, 94, 103, 115]
comparing 67 with 75 nums list [10, 21, 34, 67, 75, 84, 92, 94, 103, 115]
comparing 67 with 84 nums list [10, 21, 34, 67, 84, 92, 94, 103, 115]
comparing 84 with 92 nums list [10, 21, 34, 67, 84, 92, 94, 103, 115]
comparing 84 with 94 nums list [10, 21, 34, 67, 84, 94, 103, 115]
comparing 94 with 103 nums list [10, 21, 34, 67, 84, 94, 103, 115]
comparing 94 with 115 nums list [10, 21, 34, 67, 84, 94, 115]
[10, 21, 34, 67, 84, 94, 115]
You could build up the list in a loop. Start with the first number in the list. Keep track of the last number chosen to be in the new list. Add an item to the new list only when it differs from the last number chosen by at least the target amount:
my_list = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
last_num = my_list[0]
new_list = [last_num]
for x in my_list[1:]:
    if x - last_num >= 10:
        new_list.append(x)
        last_num = x
print(new_list)  # prints [10, 21, 34, 67, 84, 94, 115]
This problem can be solved fairly simply by iterating over your initial values and adding each one to the new list only when it is at least the minimum distance away from the last value added.
Additionally, by putting this functionality into a function, you can easily swap out the values or the minimum distance.
values = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]

def foo(elements, distance):
    elements = sorted(elements)  # sorting the user input
    new_elements = [elements[0]]  # make a new list for output
    for element in elements[1:]:  # iterate over the remaining elements...
        if element - new_elements[-1] >= distance:
            # this is the condition you described above
            new_elements.append(element)
    return new_elements

print(foo(values, 10))
# >>> [10, 21, 34, 67, 84, 94, 115]
print(foo(values, 5))
# >>> [10, 15, 21, 34, 42, 67, 75, 84, 92, 103, 115]
A few other notes here...
I sorted the list before processing it. You may not want that for your particular application, but it seemed sensible since your sample data was already sorted. If you don't want to sort the data before building the list, remove the sorted() call on the line commented "sorting the user input".
I named the function foo because I was lazy and didn't want to think about the name. I highly recommend that you give it a more descriptive name.
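A generator variant of the same greedy idea, sketched under the assumption that the input is already in the desired order (it does not sort). Unlike `foo`, it also accepts an empty input and any iterable. The name `spaced` is hypothetical.

```python
from typing import Iterable, Iterator

def spaced(values: Iterable[float], min_gap: float) -> Iterator[float]:
    """Yield values in their given order, skipping any closer than min_gap to the last kept one."""
    last = None
    for v in values:
        if last is None or v - last >= min_gap:
            yield v
            last = v
```

Wrap it in `list(...)` to get a list, e.g. `list(spaced(values, 10))`.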

How to use numpy to flip this array?

Suppose I have an array like this:
a = array([[[ 29, 29, 27],
            [ 36, 38, 40],
            [ 86, 88, 89]],
           [[200, 200, 198],
            [199, 199, 197],
            [194, 194, 194]]])
and I want to reverse the order of the last element of each row within each 3x3 block, so that it becomes:
b = array([[[ 29, 29, 89],    # 27 became 89
            [ 36, 38, 40],
            [ 86, 88, 27]],   # 89 became 27
           [[200, 200, 194],  # 198 became 194
            [199, 199, 197],
            [194, 194, 198]]]) # 194 became 198
I looked up the NumPy manual, but I still cannot figure out a solution. np.flip and np.fliplr look suitable in this case, but how do I use them?
Index the array to select the sub-array, using:
> a[:,:,-1]
array([[ 27,  40,  89],
       [198, 197, 194]])
This selects the last element along the 3rd dimension of a. The sub-array is of shape (2,3). Then reverse the selection using:
a[:,:,-1][:,::-1]
The second slice, [:,::-1], takes everything along the first dimension as-is ([:]) and all of the elements along the second dimension, but reversed ([::-1]). The slice syntax basically says: start at the first element and go to the last element ([:]), but do it in reverse order ([::-1]). You could write it in pseudo-code as [start here : end here : use this step size]. The -1 tells it to walk backwards.
Finally, assign the result back to the same slice of the original array. This overwrites the original values of a in place:
a[:,:,-1] = a[:,:,-1][:,::-1]
> a
array([[[ 29, 29, 89],
[ 36, 38, 40],
[ 86, 88, 27]],
[[200, 200, 194],
[199, 199, 197],
[194, 194, 198]]])
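Since the question asked about np.flip, here is the same operation sketched with np.flip(..., axis=1) on a copy, so the original array is kept. The .copy() on the right-hand side is a defensive step so that the reversed view being assigned does not share memory with the slice it is written into.

```python
import numpy as np

a = np.array([[[29, 29, 27],
               [36, 38, 40],
               [86, 88, 89]],
              [[200, 200, 198],
               [199, 199, 197],
               [194, 194, 194]]])

b = a.copy()  # keep the original a intact
# b[:, :, -1] has shape (2, 3): the last column of each 3x3 block.
# np.flip(..., axis=1) reverses each of those rows.
b[:, :, -1] = np.flip(b[:, :, -1], axis=1).copy()
```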

Save numpy array as binary to read from FORTRAN

I have a series of NumPy arrays that I need to save in a loop as raw binary float32 (without any header information), which then needs to be read from FORTRAN.
import numpy as np
f = open('test.bin', 'wb+')
for i in range(0, 10):
    np_data = np.random.rand(10, 5)
    fortran_data = np.asfortranarray(np_data, 'float32')
    fortran_data.tofile(f)
f.close()
Is this the correct way to write the binary file in Python so that I can read it correctly from FORTRAN? Your suggestions will be highly appreciated.
The code you wrote is almost right, but the .tofile method always writes the data in C order, regardless of the array's memory layout, so np.asfortranarray() does not change what ends up in the binary file. In my tests, the matrix has to be transposed before writing for Fortran to read it correctly without any other adjustment (that is, in Fortran you can declare the actual matrix dimensions without needing any transpose).
The code below illustrates this with a 3D matrix (which I usually need to use):
import numpy as np
a = np.arange(1, 10*3*4 + 1)
b = a.reshape(10,12,order='F')
array([[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 113],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104, 114],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95, 105, 115],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96, 106, 116],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97, 107, 117],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98, 108, 118],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, 119],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]])
b is already in Fortran order
c=b.reshape(10,3,4, order='F')
print(c[:,:,0])
[[ 1 11 21]
[ 2 12 22]
[ 3 13 23]
[ 4 14 24]
[ 5 15 25]
[ 6 16 26]
[ 7 17 27]
[ 8 18 28]
[ 9 19 29]
[10 20 30]]
Then I save the matrix c in a binary file:
c.T.tofile('test_c.bin')
With the following Fortran code, I can read the binary data in the same order in which I created the c matrix in Python:
PROGRAM read_saved_python
IMPLICIT NONE
INTEGER(KIND=8),ALLOCATABLE :: matrix(:,:,:)
INTEGER :: Nx, Ny, Nz
Nx = 10
Ny = 3
Nz = 4
ALLOCATE(matrix(Nx, Ny, Nz))
OPEN(33, FILE="/home/victor/test_c.bin",&
FORM="UNFORMATTED", STATUS="UNKNOWN", ACTION="READ", ACCESS='STREAM')
READ(33) matrix
write(*,*) matrix(:,1,1)
CLOSE(33)
DEALLOCATE(matrix)
END PROGRAM read_saved_python
Notice that in Fortran the indexes start at 1 and the output is printed in column order (first column, then second, then third). If you don't transpose the matrix here (c.T.tofile('test_c.bin')), the matrix read in Fortran will not be the one you wanted, even if you use np.asfortranarray as you did. (I even tried np.asfortranarray(c).T.tofile('/home/victor/teste_d.bin') just to make sure, and the data is still written in C order in the binary file.)
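Applied to the loop in the question, a minimal sketch might look like this. The function name is hypothetical, and it assumes the Fortran side declares 4-byte REAL arrays of shape (10, 5), opens the file with ACCESS='STREAM' as in the program above, and uses the same byte order as the machine that wrote the file.

```python
import numpy as np

def write_fortran_stream(path, arrays):
    """Write arrays back-to-back as headerless float32 data in column-major order."""
    with open(path, "wb") as f:
        for arr in arrays:
            data = np.asarray(arr, dtype=np.float32)
            # tofile() always writes in C (row-major) order. The C order of
            # data.T is the Fortran (column-major) order of data, so writing
            # the transpose gives the layout Fortran expects.
            data.T.tofile(f)
```

Reading the file back with np.fromfile(path, dtype=np.float32) and reshaping each 50-value chunk with order='F' recovers the original arrays, which is a quick way to check the layout from Python before moving to Fortran.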
You will need the array's metadata (such as its shape and dtype) to read it in FORTRAN. This website (https://scipy.github.io/old-wiki/pages/Cookbook/InputOutput.html) has some information on using libnpy to write the file, and an example program, fex.f95, to read the binary file.
