Represent 3 values as 1 in numpy array - python

I am attempting to convert a 3-channel numpy array to a single-channel numpy array. I want to combine all 3 element values into 1 number using:
x << 16 + y << 8 + z
My code below does that, but it seems to make a lot of the numbers zero. Is that correct, or am I doing something wrong? Should those last 2 numbers be zero or something else?
import cv2
import numpy as np

ar = np.array((
    ((255,255,255),),
    ((255,20,255),),
    ((0,255,255),),   # this becomes zero, is that correct?
    ((22,10,12),),    # this becomes zero, is that correct?
), dtype='uint8')
c1, c2, c3 = cv2.split(ar)
single = np.int32(c1) << 16 + np.int32(c2) << 8 + np.int32(c3)
print(single)
print(ar.shape)
[[1069547520]
[ 522240]
[ 0]
[ 0]]
(4, 1, 3)

Add a column of zeros to make the array 4 bytes wide:
ar4 = np.insert(ar, 0, 0, 2)
Then simply view it as a big-endian array of 4-byte integers:
ar4.view('>u4')
This gives:
array([[[16777215]],
       [[16717055]],
       [[   65535]],
       [[ 1444364]]], dtype=uint32)
The only step here which really takes time is np.insert(), so if you are able to add that extra column while loading your data, the rest of the transformation is basically free (i.e. does not require copying data).
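Worth noting for the question itself: in Python, `+` binds tighter than `<<`, so `x << 16 + y << 8 + z` is parsed as `(x << (16 + y)) << (8 + z)`, which is why so many results collapse to zero. A minimal sketch of the intended packing, with explicit parentheses (values taken from the third row of the example):

```python
import numpy as np

# pack (0, 255, 255) into one 24-bit number; the parentheses force the
# intended shift-then-add grouping
c1, c2, c3 = np.int32(0), np.int32(255), np.int32(255)
packed = (c1 << 16) + (c2 << 8) + c3
# packed == 65535, matching the view('>u4') result above
```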

Transforming different arrays into a loop

I was wondering if it was possible to transform these next process into a loop so that I can use one word for this (not as a vector):
Data0 = np.zeros(dem0.shape, dtype=np.int32)
Data0[zipp[0] >= 0] = 1
Data1 = np.zeros(dem1.shape, dtype=np.int32)
Data1[zipp[1] >= 0] = 1
Data2 = np.zeros(dem2.shape, dtype=np.int32)
Data2[zipp[2] >= 0] = 1
Data3 = np.zeros(dem3.shape, dtype=np.int32)
Data3[zipp[3] >= 0] = 1
As you can see, there is one shape per layer (four layers total). I am trying to apply the corresponding "zipp" vector position to each dem.shape for each layer I have (in the vector zipp[i], each i is an array for each dem).
What I want is to replace with the number 1 those values greater than or equal to zero in the array contained in zipp[i], for each layer/shape/dem.
However, I must deliver the result as a word, not a vector or array, so I've been thinking of a loop but haven't figured it out just yet.
Thank you :)
I'm not quite sure what you mean by delivering the result "as a word not a vector or array", but assuming all of these arrays have the same shape you can reduce this to a couple of lines (maybe someone else knows how to do it in 1):
data = np.zeros_like(zipp, dtype=np.int32)
data[zipp >= 0] = 1
If you just want a boolean array of where zipp is greater than or equal to 0, you can do that in 1 line (avoid naming it bool, which shadows the builtin):
mask = np.greater_equal(zipp, 0)
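For the "do it in 1" line mentioned above: if zipp is a single numpy array (an assumption, since the question stacks four layers), the zeros-and-ones array can be produced directly by casting the boolean mask:

```python
import numpy as np

zipp = np.array([[-1.5, 0.0], [2.0, -3.0]])  # illustrative data
data = (zipp >= 0).astype(np.int32)          # 1 where zipp >= 0, else 0
# data == [[0, 1], [1, 0]]
```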

Add mask to a byte-array knowing bits length and starting position

I have to apply a bit mask to a CAN-bus payload message (8 bytes) to filter out a single signal (there are multiple signals in a message) in Python 3. My inputs are:
Length of the signal I want to filter in binary (think about a set of '1's).
The starting position of the signal.
The problem is that the signal can start in the middle of a byte and occupy more than 1 byte.
For example, I have to filter a signal with starting bit position = 50 and length = 10.
The mask will be byte 6 = (00111111) and byte 7 = (11000000), with all other bytes set to 0.
I've tried building an array of bytes with 1s and OR-ing (|) it with an empty 8-byte array to get the mask, and also creating the 8-byte array directly, but I can't figure out how to shift the bits to the correct starting position.
I tried the bitstring module and bytearray but couldn't find a good solution.
Could anyone help?
Thank you very much.
Edit: adding non-functional code if signal starts in the middle of byte:
my_mask_byte = [0, 0, 0, 0, 0, 0, 0, 0]
message_bit_pos = 50
message_signal_length = 10
byte_pos = message_bit_pos // 8
bit_pos = message_bit_pos % 8
for i in range(0, message_signal_length):
    if i < 8:
        my_mask_byte[byte_pos + i // 8] |= 1 << (i + bit_pos)
    else:
        my_mask_byte[byte_pos + i // 8] |= 1 << (i - 8)
for byte in my_mask_byte:
    print(bin(byte))
The mask should be byte 6 = (00111111) and byte 7 = (11110000); you missed 2 bits, since the length is 10.
You can easily achieve this with numpy:
import numpy as np

message_bit_pos = 50
message_signal_length = 10

mask = np.uint64(0)
while message_signal_length > 0:
    mask |= np.uint64(1 << (64 - message_bit_pos - message_signal_length))
    message_signal_length -= 1
print(f'mask: 0b{int(mask):064b}')

n = np.uint64(0b0000000000000000011000000000000000000000000000000011111111000000)
print(f'n: 0b{int(n):064b}')
n &= mask
print(f'n&m: 0b{int(n):064b}')
output:
mask: 0b0000000000000000000000000000000000000000000000000011111111110000
n: 0b0000000000000000011000000000000000000000000000000011111111000000
n&m: 0b0000000000000000000000000000000000000000000000000011111111000000
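If you'd rather avoid numpy, the same mask can be built with plain integers. This is a sketch under the question's bit numbering (bit 0 is the most significant bit of byte 0); the helper name is made up:

```python
def build_mask(start_bit, length, n_bytes=8):
    """Set `length` bits starting at `start_bit`, MSB-first per byte."""
    mask = [0] * n_bytes
    for i in range(start_bit, start_bit + length):
        mask[i // 8] |= 0x80 >> (i % 8)
    return mask

mask = build_mask(50, 10)
# byte 6 -> 0b00111111, byte 7 -> 0b11110000, all other bytes 0
```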

Iterate the code in a shortest way for the whole dataset

I have a very big df:
df.shape = (106, 3364)
I want to calculate the so-called Fréchet distance by using this Frechet Distance between 2 curves approach, and it works well. Example:
x = df['1']
x1 = df['1.1']
p = np.array([x, x1])
y = df['2']
y1 = df['2.1']
q = np.array([y, y1])
P_final = list(zip(p[0], p[1]))
Q_final = list(zip(q[0], q[1]))
from frechetdist import frdist
frdist(P_final,Q_final)
But I cannot do it row by row, like:
`1 and 1.1` to `1 and 1.1` which is equal to 0
`1 and 1.1` to `2 and 2.1` which is equal to some number
...
`1 and 1.1` to `1682 and 1682.1` which is equal to some number
I want to create something (my first idea is a for loop, but maybe you have a better solution) to calculate this frdist(P_final, Q_final) between:
first row to all rows (including itself)
second row to all rows (including itself)
Finally, I am supposed to get a matrix of size (106, 106) with 0 on the diagonal (because the distance between a curve and itself is 0):
matrix =
        0   1   2   3   4   5  ...  105
  0     0
  1         0
  2             0
  3                 0
  4                     0
  5                         0
 ...                            ...
 105                                  0
Not including my trial code because it is confusing everyone!
EDITED:
Sample data:
1 1.1 2 2.1 3 3.1 4 4.1 5 5.1
0 43.1024 6.7498 45.1027 5.7500 45.1072 3.7568 45.1076 8.7563 42.1076 8.7563
1 46.0595 1.6829 45.0595 9.6829 45.0564 4.6820 45.0533 8.6796 42.0501 3.6775
2 25.0695 5.5454 44.9727 8.6660 41.9726 2.6666 84.9566 3.8484 44.9566 1.8484
3 35.0281 7.7525 45.0322 3.7465 14.0369 3.7463 62.0386 7.7549 65.0422 7.7599
4 35.0292 7.5616 45.0292 4.5616 23.0292 3.5616 45.0292 7.5616 25.0293 7.5613
I just used my own sample data in your format (I hope):
import pandas as pd
import numpy as np
from frechetdist import frdist

# create sample data
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 8, 9], [2, 3, 4, 5, 2, 2], [3, 4, 5, 6, 7, 3]],
                  columns=['1', '1.1', '2', '2.1', '3', '3.1'])

# this matrix will hold the result
res = np.ndarray(shape=(df.shape[1] // 2, df.shape[1] // 2), dtype=np.float32)

for row in range(res.shape[0]):
    for col in range(row, res.shape[1]):
        # extract the two curves as lists of (x, y) points
        P = list(zip(df.loc[:, f'{row+1}'], df.loc[:, f'{row+1}.1']))
        Q = list(zip(df.loc[:, f'{col+1}'], df.loc[:, f'{col+1}.1']))
        # calculate distance
        dist = frdist(P, Q)
        # store it in both triangles (the matrix is symmetric)
        res[row, col] = dist
        res[col, row] = dist

# output
print(res)
Output:
[[0. 4. 7.5498343]
[4. 0. 5.5677643]
[7.5498343 5.5677643 0. ]]
Hope that helps
EDIT: Some general tips:
If speed matters: check whether frdist also accepts a numpy array of shape (n_values, 2); then you could skip the rather expensive zip-and-unpack step and use the arrays directly, or build the data in the format your library needs from the start.
Generally, use better column names (3 and 3.1 is not very descriptive). Why not call them x3 and y3, or x3 and f_x3?
I would actually put the data into two separate matrices. If you look at the code, I had to do some not-so-obvious things, like iterating over the shape divided by two and building indices from string operations, because of the given table layout.
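The last tip above (two separate matrices) could look like this; the column selection assumes the x/y naming pattern of the sample data, which is a guess on my part:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [2, 3, 4, 5]],
                  columns=['1', '1.1', '2', '2.1'])

# split the interleaved columns into x-values and y-values
xs = df[[c for c in df.columns if '.' not in c]].to_numpy().T      # x-columns
ys = df[[c for c in df.columns if c.endswith('.1')]].to_numpy().T  # y-columns
curves = np.stack([xs, ys], axis=-1)  # shape (n_curves, n_points, 2)
```

With this layout, each `curves[i]` is already the list of (x, y) points a distance function needs, and no string-based indexing is required.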

Python Numpy - Square Values Issue

I'm trying to square all the elements in a numpy array, but the results are not what I'm expecting (i.e. some are negative numbers and none are the actual square values). Can anyone please explain what I'm doing wrong and/or what's going on?
import numpy as np
import math
f = 'file.bin'
frameNum = 25600
channelNum = 2640
data = np.fromfile(f,dtype=np.int16)
total = frameNum*channelNum*2
rs = data[:total].reshape(channelNum,-1) #reshaping the data a little. Omitting added values at the end.
I = rs[:,::2] # pull out every other column
print "Shape :", I.shape
print "I : ", I[1,:10]
print "I**2 : ", I[1,:10]**2
print "I*I : ",I[1,:10]* I[1,:10]
print "np.square : ",np.square(I[1,:10])
exit()
Output:
Shape : (2640L, 25600L)
I : [-5302 -5500 -5873 -5398 -5536 -6708 -6860 -6506 -6065 -6363]
I**2 : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
I*I : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
np.square : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
Any suggestions?
It is because of the dtype=np.int16. You are allowing only 16 bits to represent the numbers, and (-5302)**2 is larger than the maximum value (32767) that a signed 16-bit integer can take. So you're seeing only the lowest 16 bits of the result, the first of which is interpreted (or, from your point of view, misinterpreted) as a sign bit.
Convert your array to a different dtype - for example
I = np.array( I, dtype=np.int32 )
or
I = np.array( I, dtype=np.float64 )
before performing numerical operations that might go out of range.
With dtype=np.int16, the largest-magnitude integers you can square are +181 and -181: the square of 182 exceeds 32767 and overflows. Even with dtype=np.int32, the largest-magnitude integers you can square are +46340 and -46340; the square of 46341 overflows.
This is the reason:
>>> a = np.array([-5302, -5500], dtype=np.int16)
>>> a * a
array([ -3740, -27632], dtype=int16)
This is the solution:
>>> b = np.array([-5302, -5500], dtype=np.int32)
>>> b * b
array([28111204, 30250000], dtype=int32)
Change:
data = np.fromfile(f, dtype=np.int16)
into:
data = np.fromfile(f, dtype=np.int16).astype(np.int32)
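The fix boils down to widening the dtype before multiplying. A minimal sketch using .astype, so the original int16 buffer is left untouched:

```python
import numpy as np

a = np.array([-5302, -5500], dtype=np.int16)
sq = a.astype(np.int64) ** 2   # widen first, then square: no overflow
# sq == [28111204, 30250000]
```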

Speeding up computations with numpy matrices

I have two matrices, both filled with zeros and ones. One is big (3000 x 2000 elements) and the other is smaller (20 x 20 elements). I am doing something like:
newMatrix = (size of bigMatrix), filled with zeros
l = (a constant)
for y in xrange(0, len(bigMatrix[0])):
    for x in xrange(0, len(bigMatrix)):
        for b in xrange(0, len(smallMatrix[0])):
            for a in xrange(0, len(smallMatrix)):
                if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]):
                    newMatrix[x, y] = 1
This is painfully slow. Am I doing anything wrong? Is there a smart way to make it faster?
edit: Basically I am, for each (x,y) in the big matrix, checking all the pixels of both big matrix and the small matrix around (x,y) to see if they are 1. If they are 1, then I set that value on newMatrix. I am doing a sort of collision detection.
I can think of a couple of optimisations here.
As you are using 4 nested Python "for" loops, you are about as slow as you can be.
I can't figure out exactly what you are looking for, but for one thing, if the density of "1"s in your big matrix is low, you can use the array's .any() method on slices of bigMatrix to quickly check whether any elements are set there; that alone could give a several-fold speed increase:
step = len(smallMatrix[0])
for y in xrange(0, len(bigMatrix[0]), step):
    for x in xrange(0, len(bigMatrix), step):
        if not bigMatrix[x: x+step, y: y+step].any():
            continue
        (...)
At this point, if you still need to act on each element, you use another pair of indices to walk each position inside the step, but I think you get the idea.
Apart from using built-in numpy operations like this .any() call, you could also add some control-flow code to break out of the (b, a) loop when the first matching pixel is found (for example, insert a "break" statement inside your last "if", and another if..break pair for the "b" loop).
I really can't figure out exactly what your intent is, so I can't give you more specific code.
Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2d convolution of a small bitarray over the big bitarray. There's a convolve2d function in scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeroes:
import numpy as np

def sparse_convolve_of_bools(a, b):
    if a.size < b.size:
        a, b = b, a
    offsets = list(zip(*np.nonzero(b)))  # list() so len() works on Python 3
    n = len(offsets)
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int32
    result = np.zeros(np.array(a.shape) + b.shape - (1, 1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result
On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, at roughly 20 ms per nonzero element.
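A self-contained Python 3 version of the same shift-and-add idea, checked on a deliberately tiny example (the sizes here are illustrative, not the 3000x2000 case):

```python
import numpy as np

def sparse_convolve_of_bools(a, b):
    # add one shifted copy of `a` per nonzero element of `b`
    if a.size < b.size:
        a, b = b, a
    result = np.zeros((a.shape[0] + b.shape[0] - 1,
                       a.shape[1] + b.shape[1] - 1), dtype=np.int32)
    for r, c in zip(*np.nonzero(b)):
        result[r:r + a.shape[0], c:c + a.shape[1]] += a
    return result

big = np.zeros((8, 8), dtype=np.int32)
big[2, 3] = 1                           # a single set pixel
small = np.ones((3, 3), dtype=np.int32)
out = sparse_convolve_of_bools(big, small)
# out has shape (10, 10); the single 1 is smeared over a 3x3 block
```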
If your bits are really packed 8 per byte / 32 per int, and you can reduce your smallMatrix to 20x16, then try the following, here for a single row. (Is newMatrix[x, y] = 1 when any bit of the 20x16 around x, y is 1? What are you really looking for?)
python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """
import numpy as np
bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]

def hit16( bits, mask, nhit ):
    """
    slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
    bits: long np.array( uint16 )
    mask: 16 bits, int
    out: nhit[j] += 1 where pair & mask != 0
    """
    left = bits[0]
    for b in bits[1:]:
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit): print "hit16:", nhit
' '
hit16( bits, mask, nhit )
'
# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc
