Save numpy array as binary to read from FORTRAN - python

I have a series of numpy arrays; I need to save these arrays in a loop as raw binary float32 (without any header information) so that they can be read from FORTRAN.
import numpy as np
f=open('test.bin','wb+')
for i in range(0, 10):
    np_data = np.random.rand(10, 5)
    fortran_data = np.asfortranarray(np_data, 'float32')
    fortran_data.tofile(f)
f.close()
Is this the correct way to write the file so that I can read this binary file from FORTRAN correctly? Your suggestions will be highly appreciated.

The code you wrote is almost right, but the .tofile method always writes the array in C order. I don't know why np.asfortranarray() does not prevent this when writing to the binary file, but I tested it and, unfortunately, we need to transpose the matrix before writing so that it reads correctly in Fortran without any other concern (meaning that in Fortran you can declare the actual matrix dimensions without needing any transpose).
The code below illustrates, with a 3D matrix (which I usually need to use), what I am saying:
a = np.arange(1,10*3*4+1)
b = a.reshape(10,12,order='F')
array([[ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111],
[ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112],
[ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 113],
[ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104, 114],
[ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95, 105, 115],
[ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96, 106, 116],
[ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97, 107, 117],
[ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98, 108, 118],
[ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, 119],
[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]])
b is already in Fortran order
c=b.reshape(10,3,4, order='F')
print(c[:,:,0])
[[ 1 11 21]
[ 2 12 22]
[ 3 13 23]
[ 4 14 24]
[ 5 15 25]
[ 6 16 26]
[ 7 17 27]
[ 8 18 28]
[ 9 19 29]
[10 20 30]]
Then I save the matrix c in a binary file:
c.T.tofile('test_c.bin')
So, using this Fortran code I am able to read the binary data in the correct order I created the c matrix in Python:
PROGRAM read_saved_python
  IMPLICIT NONE
  INTEGER(KIND=8), ALLOCATABLE :: matrix(:,:,:)
  INTEGER :: Nx, Ny, Nz
  Nx = 10
  Ny = 3
  Nz = 4
  ALLOCATE(matrix(Nx, Ny, Nz))
  OPEN(33, FILE="/home/victor/test_c.bin", &
       FORM="UNFORMATTED", STATUS="UNKNOWN", ACTION="READ", ACCESS='STREAM')
  READ(33) matrix
  WRITE(*,*) matrix(:,1,1)
  CLOSE(33)
  DEALLOCATE(matrix)
END PROGRAM read_saved_python
Notice that in Fortran the indexes start at 1 and the print shows the data in column order (in this case: the first column, then the second, then the third). If you do not transpose the matrix in c.T.tofile('test_c.bin'), you will notice when reading in Fortran that the matrix is not as you wanted, even if you use np.asfortranarray as you did. I even tried np.asfortranarray(c).T.tofile('/home/victor/teste_d.bin') just to make sure, but the matrix is still written to the binary file in C order.
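Applied to the loop in the original question, a minimal sketch of the fix could look like this (my own adaptation, assuming you really want single-precision float32 on disk and one (10, 5) array written after another):
import numpy as np

with open('test.bin', 'wb+') as f:
    for i in range(10):
        np_data = np.random.rand(10, 5).astype(np.float32)  # ensure float32; np.random.rand returns float64
        np_data.T.tofile(f)  # transpose before tofile so Fortran can READ straight into a (10, 5) array
On the Fortran side each record would then be read with ACCESS='STREAM' into a REAL(KIND=4) array of shape (10, 5), analogous to the program above.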

You will need the metadata of the array to read it in FORTRAN. This website (https://scipy.github.io/old-wiki/pages/Cookbook/InputOutput.html) has some information on using libnpy to write the file and an example program (fex.f95) to read the binary file.

Related

Modifying alternate indices of 3d numpy array

I have a numpy array with shape (140, 23, 2), being 140 frames, 23 objects, and x,y locations. The data has been generated by a GAN, and when I animate the movement it is very jittery. I want to smooth it by adjusting the coordinates for each object so that every odd-numbered index becomes the mid-point between the even-numbered indices on either side of it, e.g.
x[1] = (x[0] + x[2]) / 2
x[3] = (x[2] + x[4]) / 2
Below is my code:
def smooth_coordinates(df):
    # df shape is (140, 23, 2)
    # iterate through each object (23)
    for j in range(len(df[0])):
        # iterate through 140 frames
        for i in range(len(df)):
            # if the index is odd and there is at least one index after it
            if (i % 2 != 0) and (i < (len(df[0]) - 2)):
                df[i][j][0] = (df[i-1][j][0] + df[i+1][j][0]) / 2
                df[i][j][1] = (df[i-1][j][1] + df[i+1][j][1]) / 2
    return df
Aside from it being very inefficient, my input df and output df are identical. Any suggestions for how to achieve this more efficiently?
import numpy as np
a = np.random.randint(100, size= [140, 23, 2]) # input array
b = a.copy()
i = np.ogrid[1: a.shape[0]-1: 2] # odd indices
i
>>> [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,
27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51,
53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77,
79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103,
105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
131, 133, 135, 137]
(a == b).all() # testing for equality
>>> True
a[i] = (a[i-1] + a[i+1]) / 2 # averaging positions across frames
(a == b).all() # testing for equality again
>>> False
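For reference, the same update can also be written with plain slices instead of ogrid; a sketch equivalent to the indexing above (only interior odd frames are averaged):
b = a.copy()
b[1:-1:2] = (b[0:-2:2] + b[2::2]) / 2  # every odd frame becomes the midpoint of its two neighbouring even frames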

Faster/lazier way to evenly and randomly split m*n elements into n groups (each with m elements) in Python

I want to split m*n elements (e.g., 1, 2, ..., m*n) into n groups randomly and evenly, such that each group has m random elements. Each group will process k (k >= 1) elements at a time from its own group, at the same speed as the others (via some synchronization mechanism), until every group has processed all of its own elements. Each group actually runs in an independent process/thread.
I use numpy.random.choice(m*n, m*n, replace=False) to generate the permutation first, and then each group indexes its own slice of the permuted result.
The problem is that when m*n is very large (e.g., >= 1e8), this is very slow (tens of seconds or minutes).
Is there any faster/lazier way to do this? I think it might be possible to do this lazily: instead of generating the whole permuted result up front, create a generator, and have each group draw k elements at a time; the effect should be identical to the method I currently use. But I don't know how to achieve this lazy approach, or whether it can actually be implemented.
You can make a generator that will progressively shuffle (a copy of) the list and lazily yield distinct groups:
import random

def rndGroups(A, size):
    A = A.copy()                        # work on a copy (if needed)
    p = len(A)                          # target position of random item
    for _ in range(0, len(A), size):    # work in chunks of group size
        for _ in range(size):           # create one group
            i = random.randrange(p)     # random index in remaining items
            p -= 1                      # update randomized position
            A[i], A[p] = A[p], A[i]     # swap items
        yield A[p:p+size]               # return shuffled sub-range
Output:
A = list(range(100))
iG = iter(rndGroups(A, 10))    # 10 groups of 10 items
s = set()                      # set to validate uniqueness
for _ in range(10):            # 10 groups
    g = next(iG)               # get the next group from the generator
    s.update(g)                # to check that all items are distinct
    print(g)
print(len(s))                  # must get 100 distinct values from the groups
[87, 19, 85, 90, 35, 55, 86, 58, 96, 68]
[38, 92, 93, 78, 39, 62, 43, 20, 66, 44]
[34, 75, 72, 50, 42, 52, 60, 81, 80, 41]
[13, 14, 83, 28, 53, 5, 94, 67, 79, 95]
[9, 33, 0, 76, 4, 23, 2, 3, 32, 65]
[61, 24, 31, 77, 36, 40, 47, 49, 7, 97]
[63, 15, 29, 25, 11, 82, 71, 89, 91, 30]
[12, 22, 99, 37, 73, 69, 45, 1, 88, 51]
[74, 70, 98, 26, 59, 6, 64, 46, 27, 21]
[48, 17, 18, 8, 54, 10, 57, 84, 16, 56]
100
This will take just as long as pre-shuffling the whole list (if not longer), but it lets you start/feed threads as you go, thus increasing parallelism.
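If the elements are just the integers 0..m*n-1, the same lazy partial Fisher-Yates idea can be sketched on a NumPy index array; the function name, the seed parameter, and yielding index groups (rather than shuffling an existing list) are my own choices, not part of the answer above:
import numpy as np

def rnd_groups_np(total, size, seed=None):
    # lazily yield random, disjoint groups of `size` indices from range(total)
    rng = np.random.default_rng(seed)
    a = np.arange(total)
    p = total                           # boundary of the not-yet-assigned region
    for _ in range(0, total, size):
        for _ in range(size):           # partial Fisher-Yates: place `size` items at the end
            i = rng.integers(p)
            p -= 1
            a[i], a[p] = a[p], a[i]
        yield a[p:p + size].copy()      # copy so the caller gets an independent array
Each yielded group is a uniformly random, disjoint subset, exactly like a slice of a full permutation, but groups are only materialized as they are requested.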

Elementwise subtraction in numpy arrays

I have two numpy arrays of different dimensions:
x.shape = (1,1,M) and Y.shape = (N,N).
How do I perform Z = x - Y efficiently in Python, such that Z.shape = (N,N,M), where - is an elementwise subtraction operation?
For example, M=10
x = array([[[1, 2, 3, 4, 5 , 6, 7, 8, 9, 10]]])
and N=8
Y = array([[11, 12, 13, 14, 15, 16, 17, 18],
[21, 22, 23, 24, 25, 26, 27, 28],
[31, 32, 33, 34, 35, 36, 37, 38],
[41, 42, 43, 44, 45, 46, 47, 48],
[51, 52, 53, 54, 55, 56, 57, 58],
[61, 62, 63, 64, 65, 66, 67, 68],
[71, 72, 73, 74, 75, 76, 77, 78],
[81, 82, 83, 84, 85, 86, 87, 88]])
Now the idea is to get a Z such that
Z[:,:,0] = array([[1-11, 1-12, 1-13, 1-14, 1-15, 1-16, 1-17, 1-18],
[1-21, 1-22, 1-23, 1-24, 1-25, 1-26, 1-27, 1-28],
[1-31, 1-32, 1-33, 1-34, 1-35, 1-36, 1-37, 1-38],
[1-41, 1-42, 1-43, 1-44, 1-45, 1-46, 1-47, 1-48],
[1-51, 1-52, 1-53, 1-54, 1-55, 1-56, 1-57, 1-58],
[1-61, 1-62, 1-63, 1-64, 1-65, 1-66, 1-67, 1-68],
[1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78],
[1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88]])
and
Z[:,:,9] = array([[10-11, 10-12, 10-13, 10-14, 10-15, 10-16, 10-17, 10-18],
[10-21, 10-22, 10-23, 10-24, 10-25, 10-26, 10-27, 10-28],
[10-31, 10-32, 10-33, 10-34, 10-35, 10-36, 10-37, 10-38],
[10-41, 10-42, 10-43, 10-44, 10-45, 10-46, 10-47, 10-48],
[10-51, 10-52, 10-53, 10-54, 10-55, 10-56, 10-57, 10-58],
[10-61, 10-62, 10-63, 10-64, 10-65, 10-66, 10-67, 10-68],
[10-71, 10-72, 10-73, 10-74, 10-75, 10-76, 10-77, 10-78],
[10-81, 10-82, 10-83, 10-84, 10-85, 10-86, 10-87, 10-88]])
and so on.
It is easy to do in MATLAB using just the - operator, but in NumPy a plain x - Y does not broadcast for these shapes.
The answer is to use a different shape for Y:
>>> y = Y.reshape((8, 8, 1))
>>> (x - y).shape
(8, 8, 10)
You can compute your result without explicitly creating a reshaped array, by using NumPy broadcasting.
The key to success is to add a new dimension to Y, using np.newaxis:
Z = x - Y[:, :, np.newaxis]
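A small end-to-end sketch with the shapes from the question (the construction of Y below is just a convenient way to reproduce the example values; any (8, 8) array works the same way):
import numpy as np

M, N = 10, 8
x = np.arange(1, M + 1).reshape(1, 1, M)                      # (1, 1, M): the values 1..10 from the question
Y = 10 * np.arange(1, N + 1)[:, None] + np.arange(1, N + 1)   # (N, N): 11..18, 21..28, ..., 81..88
Z = x - Y[:, :, np.newaxis]                                    # (1,1,M) - (N,N,1) broadcasts to (N,N,M)
print(Z.shape)       # (8, 8, 10)
print(Z[0, 0, 0])    # 1 - 11 = -10
print(Z[7, 7, 9])    # 10 - 88 = -78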

Remove elements in a list if difference with previous element less than value

Given a list of numbers in ascending order, it is necessary to keep only elements such that the difference between consecutive kept elements is greater than or equal to a certain value (10 in my case).
Given:
list = [10,15,17,21,34,36,42,67,75,84,92,94,103,115]
Goal:
list=[10,21,34,67,84,94,115]
You could use a while loop and a variable to track the index you are currently looking at. Starting at index 1, check whether the number at this index minus the number at the previous index is less than 10. If it is, delete this index but keep the index counter the same, so we look at the next number that has now moved into this index. If the difference is 10 or more, increase the index to look at the next number. There is an additional print line in the loop; you can remove it, it is only there to show the comparisons.
nums = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
index = 1
while index < len(nums):
    print(f"comparing {nums[index-1]} with {nums[index]} nums list {nums}")
    if nums[index] - nums[index - 1] < 10:
        del nums[index]
    else:
        index += 1
print(nums)
OUTPUT
comparing 10 with 15 nums list [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 10 with 17 nums list [10, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 10 with 21 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 21 with 34 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 36 nums list [10, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 42 nums list [10, 21, 34, 42, 67, 75, 84, 92, 94, 103, 115]
comparing 34 with 67 nums list [10, 21, 34, 67, 75, 84, 92, 94, 103, 115]
comparing 67 with 75 nums list [10, 21, 34, 67, 75, 84, 92, 94, 103, 115]
comparing 67 with 84 nums list [10, 21, 34, 67, 84, 92, 94, 103, 115]
comparing 84 with 92 nums list [10, 21, 34, 67, 84, 92, 94, 103, 115]
comparing 84 with 94 nums list [10, 21, 34, 67, 84, 94, 103, 115]
comparing 94 with 103 nums list [10, 21, 34, 67, 84, 94, 103, 115]
comparing 94 with 115 nums list [10, 21, 34, 67, 84, 94, 115]
[10, 21, 34, 67, 84, 94, 115]
You could build up the list in a loop. Start with the first number in the list. Keep track of the last number chosen to be in the new list. Add an item to the new list only when it differs from the last number chosen by at least the target amount:
my_list = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]
last_num = my_list[0]
new_list = [last_num]
for x in my_list[1:]:
    if x - last_num >= 10:
        new_list.append(x)
        last_num = x
print(new_list)  # prints [10, 21, 34, 67, 84, 94, 115]
This problem can be solved fairly simply by iterating over your initial values and adding them to the new list only when your difference-of-x condition is met.
Additionally, by putting this functionality into a function, you can easily swap out the values or the minimum distance.
values = [10, 15, 17, 21, 34, 36, 42, 67, 75, 84, 92, 94, 103, 115]

def foo(elements, distance):
    elements = sorted(elements)       # sorting the user input
    new_elements = [elements[0]]      # make a new list for output
    for element in elements[1:]:      # iterate over the remaining elements...
        if element - new_elements[-1] >= distance:
            # this is the condition you described above
            new_elements.append(element)
    return new_elements

print(foo(values, 10))
# >>> [10, 21, 34, 67, 84, 94, 115]
print(foo(values, 5))
# >>> [10, 15, 21, 34, 42, 67, 75, 84, 92, 103, 115]
A few other notes here...
I sorted the array before processing it. You may not want to do that for your particular application, but it seemed to make sense since your sample data was already sorted. If you don't want to sort the data before building the list, you can remove the sorted() call on the line I commented above.
I named the function foo because I was lazy and didn't want to think about the name. I highly recommend that you give it a more descriptive name.

2-dimensional Array decomposition in Python

I would appreciate your help with a translation to Python 3 of code that decomposes an input array of any size into smaller square arrays of length 4.
I have tried chunking and the array functions in numpy, but they were of no use for this.
Here is my Perl code, which works well, but I want to compare it to Python (for efficiency).
sub make_array {
    my $input = shift;
    my $result;
    my @parts = split '-', $input;
    $result = [];
    # Test for valid number of lines in inputs
    my $lines = scalar @parts;
    if ($lines % $width) {
        die "Invalid line count $lines not divisible by $width";
        # Or could pad here by adding an entire row of '0'.
    }
    # Chunk input lines into NxN subarrays
    # loop across all input lines in steps of N lines
    my $line_width = 0;
    for (my $nn = 0; $nn < $lines; $nn += $width) {
        # make a temp array to handle $width rows of input
        my @temp = (0..$width-1);
        for my $ii (0..$width-1) {
            my $p = $parts[$nn+$ii];
            my $padding_needed = length($p) % $width;
            if ($padding_needed != 0) {
                print "'$p' is not divisible by correct width of $width, Adding $padding_needed zeros\n";
                for my $pp (0..$padding_needed) {
                    $p .= "0";
                }
            }
            if ($line_width == 0) {
                $line_width = length($p);
            }
            $temp[$ii] = $p;
        }
        # now process temp array left to right, creating keys
        my $chunks = ($line_width/$width);
        if ($DEBUG) { print "chunks: $chunks\n"; }
        for (my $zz = 0; $zz < $chunks; $zz++) {
            if ($DEBUG) { print "zz:$zz\n"; }
            my $key;
            for (my $yy = 0; $yy < $width; $yy++) {
                my $qq = $temp[$yy];
                $key .= substr($qq, $zz*$width, $width) . "-";
            }
            chop $key; # lose the trailing '-'
            if ($DEBUG) { print "Key: $key\n"; }
            push @$result, $key;
        }
    }
    if ($DEBUG) {
        print "Reformatted input:";
        print Dumper $result;
        my $count = scalar @$result;
        print "There are $count keys to check against the lookup table\n";
    }
    return $result;
}
As an example, I have the following 8 x 12 matrix:
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
and I want it decomposed into 6 square submatrices of length 4:
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
The original matrix comes from a file (the program should read it from a text file) in the following format:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011
So the program needs to split it at the separators and take each chunk as a row of the large matrix. The 6 submatrices should come in the same input format, hence the first one would be:
0000,0000,0000,0000
The program should decompose any input matrix into square matrices of length j, say 4; if the size of the original matrix is not a multiple of 4, it should disregard the remaining chunks that cannot form a 4x4 matrix.
Several large matrices of different sizes could come in the original input file, with line breaks as separators. For example, the original large matrix together with another matrix would look like the following in a text file:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011\n
0101,0101,0101,0101
The program should retrieve the 2 sets of subarrays: 6 arrays of 4x4 for the first matrix and a single 4x4 for the second. If you solve it just for the single-matrix case, that is of course fine.
This is easy with numpy. Suppose we have a 12x12 array:
In [1]: import numpy as np
In [2]: a = np.arange(144).reshape([-1,12])
In [3]: a
Out[3]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[ 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
[ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[ 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83],
[ 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95],
[ 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107],
[108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119],
[120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131],
[132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143]])
To select the top-left 4x4 array, use slicing:
In [4]: a[0:4,0:4]
Out[4]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15],
[24, 25, 26, 27],
[36, 37, 38, 39]])
The right-bottom sub-array is:
In [7]: a[8:12,8:12]
Out[7]:
array([[104, 105, 106, 107],
[116, 117, 118, 119],
[128, 129, 130, 131],
[140, 141, 142, 143]])
You can guess the rest...
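For the actual decomposition into 4x4 blocks, one possible sketch uses reshape and swapaxes; the file name matrices.txt and the parsing below are my own assumptions based on the comma-separated format in the question, and rows or columns that cannot fill a complete 4x4 block are simply dropped, as requested:
import numpy as np

def decompose(line, j=4):
    # split one comma-separated line of digit strings into j x j blocks
    rows = line.strip().split(',')
    a = np.array([list(r) for r in rows])           # 2-D array of single characters
    nr, nc = (a.shape[0] // j) * j, (a.shape[1] // j) * j
    a = a[:nr, :nc]                                 # drop rows/cols that cannot form a full block
    blocks = a.reshape(nr // j, j, nc // j, j).swapaxes(1, 2)
    return blocks.reshape(-1, j, j)                 # one j x j block per entry

with open('matrices.txt') as fh:                    # hypothetical input file name
    for line in fh:
        if line.strip():
            for block in decompose(line):
                print(','.join(''.join(row) for row in block))   # e.g. "0000,0000,0000,0000"
            print()
For the 8 x 12 example this yields 6 blocks, the first being 0000,0000,0000,0000 as described in the question.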
