Calculating distance between each consecutive element of an array - python

Let's say that I have the following array of numbers:
array([[-3,  3],
       [ 2, -1],
       [-4, -4],
       [-4, -4],
       [ 0,  3],
       [-3, -2],
       [-4, -2]])
I would then like to compute the norm of the distance between each pair of consecutive numbers in the columns, i.e.
array([[norm(2--3),  norm(-1-3)],
       [norm(-4-2),  norm(-4--1)],
       [norm(-4--4), norm(-4--4)],
       [norm(0--4),  norm(3--4)],
       [norm(-3-0),  norm(-2-3)],
       [norm(-4--3), norm(-2--2)]])
I would then like to take the mean of each column.
Is there a quick and efficient way of doing this in Python? I've been trying but have had no luck so far.
Thank you for your help!

This will do the job:
np.mean(np.absolute(a[1:]-a[:-1]),0)
This returns
array([ 3.16666667, 3.16666667])
Explanation:
First of all, np.absolute(a[1:]-a[:-1]) returns
array([[5, 4],
       [6, 3],
       [0, 0],
       [4, 7],
       [3, 5],
       [1, 0]])
which is the array of the absolute values of the differences (I assume that by norm of a number you mean absolute value). Then applying np.mean with axis=0 returns the average value of every column.
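An equivalent sketch using np.diff, which computes exactly a[1:] - a[:-1] along a given axis (same result, arguably clearer intent):

```python
import numpy as np

# Sample array from the question
a = np.array([[-3,  3],
              [ 2, -1],
              [-4, -4],
              [-4, -4],
              [ 0,  3],
              [-3, -2],
              [-4, -2]])

# np.diff(a, axis=0) gives the consecutive row differences a[1:] - a[:-1]
result = np.mean(np.abs(np.diff(a, axis=0)), axis=0)
print(result)  # -> [3.16666667 3.16666667]
```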

Related

Use numpy to mask a row containing only zeros

I have a large array of point cloud data which is generated using the Azure Kinect. All erroneous measurements are assigned the coordinate [0,0,0]. I want to remove all coordinates with the value [0,0,0]. Since my array is rather large (1 million points) and since I need to do this process in real time, speed is of the essence.
In my current approach I try to use numpy to mask out all rows that contain three zeroes ([0,0,0]). However, the np.ma.masked_equal function does not evaluate an entire row, but only evaluates single elements. As a result, rows that contain at least one 0 are already filtered by this approach. I only want rows to be filtered when all values in the row are 0. Find an example of my code below:
my_data = np.array([[1,2,3],[0,0,0],[3,4,5],[2,5,7],[0,0,1]])
my_data = np.ma.masked_equal(my_data, [0,0,0])
my_data = np.ma.compress_rows(my_data)
output
array([[1, 2, 3],
       [3, 4, 5],
       [2, 5, 7]])
desired output
array([[1, 2, 3],
       [3, 4, 5],
       [2, 5, 7],
       [0, 0, 1]])
Find all data points that are 0 (doesn't require np.ma module) and then select all rows that do not contain all zeros:
import numpy as np
my_data = np.array([[1, 2, 3], [0, 0, 0], [3, 4, 5], [2, 5, 7], [0, 0, 1]])
my_data[~(my_data == 0).all(axis=1)]
Output:
array([[1, 2, 3],
       [3, 4, 5],
       [2, 5, 7],
       [0, 0, 1]])
Instead of using the np.ma.masked_equal and np.ma.compress_rows functions, you can use the np.all function to check if all values in a row are equal to [0, 0, 0]. This should be faster than your method as it evaluates all values in a row at once.
mask = np.all(my_data == [0, 0, 0], axis=1)
my_data = my_data[~mask]
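Both answers reduce to the same boolean row mask; a minimal end-to-end sketch with the example data from the question:

```python
import numpy as np

my_data = np.array([[1, 2, 3], [0, 0, 0], [3, 4, 5], [2, 5, 7], [0, 0, 1]])

# A row is dropped only if every entry in it is zero
mask = (my_data == 0).all(axis=1)
filtered = my_data[~mask]
print(filtered)  # [0, 0, 0] row removed; [0, 0, 1] row kept
```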

Replace numpy subarray when element matches a condition

I have an n x m x 3 numpy array. This represents a middle-step towards an RGB representation of a complex-function plotter. When the function being plotted takes infinite values or has singularities, parts of the RGB data become NaNs.
I'm looking for an efficient way to replace a row containing a NaN with a row of my choice, perhaps [0, 0, 0] or [1, 1, 1]. In terms of the RGB values, this has the effect of replacing poorly-behaving pixels with white or black pixels. By efficient, I mean some way that takes advantage of numpy's vectorization and speed.
Please note that I am not looking to merely replace the NaN values with 0 (which I know how to do with numpy.where); if a row contains a NaN, I want to replace the whole row. I suspect this can be done nicely in numpy, but I'm not sure how.
Concrete Question
Suppose we are given a 2 x 2 x 3 array arr. If a row contains a 5, I want to replace the row with [0, 0, 0]. Trivial code that does this slowly is as follows.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
# so arr is
# array([[[1, 2, 3],
#         [4, 5, 6]],
#
#        [[1, 3, 5],
#         [2, 4, 6]]])
# Trivial and slow version to replace rows containing 5 with [0,0,0]
for i in range(len(arr)):
    for j in range(len(arr[i])):
        if 5 in arr[i][j]:
            arr[i][j] = np.array([0, 0, 0])
# Now arr is
#
# array([[[1, 2, 3],
#         [0, 0, 0]],
#
#        [[0, 0, 0],
#         [2, 4, 6]]])
How can we accomplish this taking advantage of numpy?
A simpler way would be -
arr[np.isin(arr,5).any(-1)] = 0
If it's just a single value that you are looking for, then we could simplify to -
arr[(arr==5).any(-1)] = 0
If you are looking to match against NaN, we need to do the comparison differently and use np.isnan instead -
arr[np.isnan(arr).any(-1)] = 0
If you are looking to assign array values, instead of just 0, the solutions stay the same. Hence it would be -
arr[(arr==5).any(-1)] = new_array
Using np.broadcast_to
arr[np.broadcast_to((arr == 5).any(-1)[..., None], arr.shape)] = 0
array([[[1, 2, 3],
        [0, 0, 0]],

       [[0, 0, 0],
        [2, 4, 6]]])
Just as FYI, based on your description, if you want to find np.nans instead of integers like 5, you shouldn't use ==, but rather np.isnan
arr[np.broadcast_to((np.isnan(arr)).any(-1)[..., None], arr.shape)] = 0
You can do it using the in1d function as below:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
arr[np.in1d(arr,5).reshape(arr.shape).any(axis=2)] = [0,0,0]
arr
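For the NaN case the question is ultimately about, a small self-contained sketch (the array values here are made up for illustration; note the array must be a float dtype to hold NaN):

```python
import numpy as np

# Hypothetical RGB-like array containing NaNs
arr = np.array([[[1.0, 2.0, 3.0], [4.0, np.nan, 6.0]],
                [[np.nan, 3.0, 5.0], [2.0, 4.0, 6.0]]])

# Replace every innermost row that contains a NaN with [0, 0, 0]
arr[np.isnan(arr).any(-1)] = 0
print(arr)
```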

Python n-dimensional array combinations

Suppose an arbitrary number of arrays of arbitrary length. I would like to construct the n-dimensional array of all the combinations from the values in the arrays. Or even better, a list of all the combinations.
However, I would also like the previous "diagonal" element along each combination, except when such an element does not exist, in which case the values which do not exist are set to say -inf.
Take for ex. the following simple 2-D case:
v1=[-2,2]
v2=[-3,3]
From which I would get all the combinations
[[-2, -3],
 [-2, 3],
 [2, -3],
 [2, 3]]
Or in 2D array / matrix form
       -3      3
-2   -2,-3   -2,3
 2    2,-3    2,3
Now I would also like a new column with the previous "diagonal" elements (in this case there is only 1 real such case) for each element. By previous "diagonal" element I mean the element at index i-1, j-1, k-1, ..., n-1. On the margins we take all the previous values that are possible.
  1        2
-2,-3   -inf,-inf
-2, 3   -inf,-3
 2,-3   -2,-inf
 2, 3   -2,-3
Edit: here is the code for the 2D case, which is not much use for the general n-case.
import math
v1=[-3,-1,2,4]
v2=[-2,0,2]
tmp=[]
tmp2=[]
for i in range(0, len(v1)):
    for j in range(0, len(v2)):
        tmp.append([v1[i], v2[j]])
        if i == 0 and j == 0:
            tmp2.append([-math.inf, -math.inf])
        elif i == 0:
            tmp2.append([-math.inf, v2[j-1]])
        elif j == 0:
            tmp2.append([v1[i-1], -math.inf])
        else:
            tmp2.append([v1[i-1], v2[j-1]])
And so
tmp
[[-3, -2],
 [-3, 0],
 [-3, 2],
 [-1, -2],
 [-1, 0],
 [-1, 2],
 [2, -2],
 [2, 0],
 [2, 2],
 [4, -2],
 [4, 0],
 [4, 2]]
and
tmp2
[[-inf, -inf],
 [-inf, -2],
 [-inf, 0],
 [-3, -inf],
 [-3, -2],
 [-3, 0],
 [-1, -inf],
 [-1, -2],
 [-1, 0],
 [2, -inf],
 [2, -2],
 [2, 0]]
Take a look at itertools.product().
To get the "diagonals" you could take the product of the vectors' indices instead of the vectors themselves. That way you can access the values of each combination as well as the previous values of the combination.
Example:
import itertools
import math

v1 = [-2, 2]
v2 = [-3, 3]
vectors = [v1, v2]
combs = list(itertools.product(*[range(len(v)) for v in vectors]))
print(combs)
[(0, 0), (0, 1), (1, 0), (1, 1)]
print([[vectors[vi][ci] for vi, ci in enumerate(comb)] for comb in combs])
[[-2, -3], [-2, 3], [2, -3], [2, 3]]
print([[(vectors[vi][ci-1] if ci > 0 else -math.inf) for vi, ci in enumerate(comb)] for comb in combs])
[[-inf, -inf], [-inf, -3], [-2, -inf], [-2, -3]]
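The index-product idea generalizes to any number of vectors. A sketch of the n-dimensional case (the helper name combos_with_previous is made up), using -inf on the margins as in the question:

```python
import itertools
import math

def combos_with_previous(*vectors):
    """Return (combinations, previous-"diagonal" elements) for any number of vectors."""
    combs, prevs = [], []
    # Iterate over index tuples so each position knows its predecessor
    for idx in itertools.product(*[range(len(v)) for v in vectors]):
        combs.append([v[i] for v, i in zip(vectors, idx)])
        # Index i-1 in each vector, or -inf on the margin (i == 0)
        prevs.append([v[i - 1] if i > 0 else -math.inf
                      for v, i in zip(vectors, idx)])
    return combs, prevs

combs, prevs = combos_with_previous([-2, 2], [-3, 3])
print(combs)  # [[-2, -3], [-2, 3], [2, -3], [2, 3]]
print(prevs)  # [[-inf, -inf], [-inf, -3], [-2, -inf], [-2, -3]]
```

The same call works unchanged for three or more vectors of arbitrary lengths.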

Reorganizing a 3d numpy array

I've tried and searched for a few days; I've come closer, but I need your help.
I have a 3d array in python,
shape(files)
>> (31,2049,2)
which corresponds to 31 input files with 2 columns of data with 2048 rows and a header.
I'd like to sort this array based on the header, which is a number, in each file.
I tried to follow NumPy: sorting 3D array but keeping 2nd dimension assigned to first, but I'm incredibly confused.
First I tried to get my headers for the argsort; I thought I could do
sortval = files[:][0][0]
but this does not work.
Then I simply did a for loop to iterate and get my headers
for i in xrange(shape(files)[0]):
    sortval.append(files[i][0][0])
Then
sortedIdx = np.argsort(sortval)
This works; however, I don't understand what's happening in the last line:
files = files[np.arange(len(files))[:,np.newaxis], sortedIdx]
Help would be appreciated.
Another way to do this is with np.take
header = a[:,0,0]
sorted = np.take(a, np.argsort(header), axis=0)
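A quick sketch of the np.take approach on a small stand-in array (hypothetical values; shape (3, 3, 2) rather than the question's (31, 2049, 2), with the header number at position [0, 0] of each file):

```python
import numpy as np

# Stand-in for the stack of files: 3 "files", each 3 rows x 2 columns
a = np.array([[[3, 1], [3, 7], [0, 3]],
              [[2, 9], [1, 0], [9, 2]],
              [[9, 2], [8, 8], [8, 0]]])

header = a[:, 0, 0]                            # one header per file: [3 2 9]
sorted_a = np.take(a, np.argsort(header), axis=0)
print(sorted_a[:, 0, 0])                       # -> [2 3 9]
```

np.take with axis=0 here is equivalent to the fancy-indexing form a[np.argsort(header)].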
Here we can use a simple example to demonstrate what your code is doing:
First we create a random 3D numpy matrix:
a = (np.random.rand(3,3,2)*10).astype(int)
array([[[3, 1],
        [3, 7],
        [0, 3]],

       [[2, 9],
        [1, 0],
        [9, 2]],

       [[9, 2],
        [8, 8],
        [8, 0]]])
Then a[:] gives a itself, and a[:][0][0] is just the first row of the first 2D array in a, which is:
a[:][0]
# array([[3, 1],
#        [3, 7],
#        [0, 3]])
a[:][0][0]
# array([3, 1])
What you want are the headers, which are 3, 2, 9 in this example, so we can use a[:, 0, 0] to extract them:
a[:,0,0]
# array([3, 2, 9])
Now we sort the above list and get an index array:
np.argsort(a[:,0,0])
# array([1, 0, 2])
In order to rearrange the entire 3D array, we need to slice the array with correct order. And np.arange(len(a))[:,np.newaxis] is equal to np.arange(len(a)).reshape(-1,1) which creates a sequential 2D index array:
np.arange(len(a))[:,np.newaxis]
# array([[0],
#        [1],
#        [2]])
Without the 2D index array, the slicing collapses to 2 dimensions:
a[np.arange(3), np.argsort(a[:,0,0])]
# array([[3, 7],
#        [2, 9],
#        [8, 0]])
With the 2D index array, we can perform 3D slicing and keep the shape:
a[np.arange(3).reshape(-1,1), np.argsort(a[:,0,0])]
array([[[3, 7],
        [3, 1],
        [0, 3]],

       [[1, 0],
        [2, 9],
        [9, 2]],

       [[8, 8],
        [9, 2],
        [8, 0]]])
And above is the final result you want.
Edit:
To rearrange the 2D arrays, one could use:
a[np.argsort(a[:,0,0])]
array([[[2, 9],
        [1, 0],
        [9, 2]],

       [[3, 1],
        [3, 7],
        [0, 3]],

       [[9, 2],
        [8, 8],
        [8, 0]]])

Trying to add a column to a data file

I have a data file with 2 columns, x ranging from -5 to 4 and f(x). I need to add a third column with |f(x)| the absolute value of f(x). Then I need to export the 3 columns as a new data file.
Currently my code looks like this:
from numpy import *
data = genfromtxt("task1.dat")
c = []
ab = abs(data[:,1])
ablist = ab.tolist()
datalist = data.tolist()
c.append(ablist)
c.append(datalist)
A = asarray (c)
savetxt("task1b.dat", A)
It gives me the following error message for line "A = asarray(c)":
ValueError: setting an array element with a sequence.
Does someone know a quick and efficient way to add this column and export the data file?
You are getting a list within a list in c.
Anyway, I think this is much clearer:
import numpy as np
data = np.genfromtxt("task1.dat")
data_new = np.hstack((data, np.abs(data[:,-1]).reshape((-1,1))))
np.savetxt("task_out.dat", data_new)
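Since task1.dat isn't available here, a small synthetic stand-in (made-up values) illustrates the same hstack pattern:

```python
import numpy as np

# Synthetic stand-in for the two-column x, f(x) data file
data = np.array([[-5.0, -4.0],
                 [-1.0,  0.0],
                 [ 3.0,  4.0]])

# Append |f(x)| as a third column; reshape turns the 1-D result
# into an (n, 1) column so hstack can join it to the (n, 2) data
data_new = np.hstack((data, np.abs(data[:, -1]).reshape((-1, 1))))
print(data_new)
```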
c is a list and when you execute
c.append(ablist)
c.append (datalist)
it appends 2 lists of different shapes to the list c. It will probably end up looking like this:
c == [[[....], [....]], [....]]
which numpy.asarray cannot parse because of that shape difference
(I am saying probably because I am assuming there is a 2d matrix in genfromtxt("task1.dat"))
what you can do to concatenate the columns is
from numpy import *
data = genfromtxt("task1.dat")
ab = abs(data[:,1])
c = concatenate((data, ab.reshape(-1,1)), axis=1)
savetxt("task1b.dat", c)
data is a 2d array like:
In [54]: data=np.arange(-5,5).reshape(5,2)
In [55]: data
Out[55]:
array([[-5, -4],
       [-3, -2],
       [-1,  0],
       [ 1,  2],
       [ 3,  4]])
In [56]: ab=abs(data[:,1])
There are various ways to concatenate 2 arrays. In this case, data is 2d, and ab is 1d, so you have to take some steps to ensure they are both 2d. np.column_stack does that for us.
In [58]: np.column_stack((data,ab))
Out[58]:
array([[-5, -4,  4],
       [-3, -2,  2],
       [-1,  0,  0],
       [ 1,  2,  2],
       [ 3,  4,  4]])
With a little change in indexing we could make ab a column array from the start, and simply concatenate on the 2nd axis:
ab=abs(data[:,[1]])
np.concatenate((data,ab),axis=1)
==================
The same numbers with your tolist produce a c like
In [72]: [ab.tolist()]+[data.tolist()]
Out[72]: [[4, 2, 0, 2, 4], [[-5, -4], [-3, -2], [-1, 0], [1, 2], [3, 4]]]
That is not good input for array.
To go the list route you need to do an iteration over a zip:
In [86]: list(zip(data,ab))
Out[86]:
[(array([-5, -4]), 4),
(array([-3, -2]), 2),
(array([-1, 0]), 0),
(array([1, 2]), 2),
(array([3, 4]), 4)]
In [87]: c=[]
In [88]: for i,j in zip(data,ab):
   ....:     c.append(i.tolist()+[j])
   ....:
In [89]: c
Out[89]: [[-5, -4, 4], [-3, -2, 2], [-1, 0, 0], [1, 2, 2], [3, 4, 4]]
In [90]: np.array(c)
Out[90]:
array([[-5, -4,  4],
       [-3, -2,  2],
       [-1,  0,  0],
       [ 1,  2,  2],
       [ 3,  4,  4]])
Obviously this will be slower than the array concatenate, but studying this might help you understand both arrays and lists.
