Numpy arrays with rows meeting 3 conditions involving the matching of elements

This question has three related parts. Consider the numpy array sample, P, having 4 columns.
import numpy as np
P = np.array([[-4, 5, 2, -3],
              [-5, 6, 0, -5],
              [-6, 5, -2, 5],
              [1, -2, 1, -2],
              [2, -4, -6, 8],
              [-4, 9, -4, 2],
              [0, -8, -8, 1]])
I'm hoping to learn how to build three new arrays:
a) P1: This is P where the first element of a row has a match in the last 3 elements.
b) P2: This is P where the first 2 elements of a row have a match in the last 2 elements.
c) P3: This is P where the first 3 elements of a row have a match in the last element.
The outcomes, for the small sample array, would be:
P1 = [[-5, 6, 0, -5],
      [1, -2, 1, -2],
      [-4, 9, -4, 2]]
P2 = [[-5, 6, 0, -5],
      [-6, 5, -2, 5],
      [1, -2, 1, -2],
      [-4, 9, -4, 2],
      [0, -8, -8, 1]]
P3 = [[-5, 6, 0, -5],
      [-6, 5, -2, 5],
      [1, -2, 1, -2]]

P1 and P3 are constructed the same way:
P1mask = (P[:, 0:1] == P[:, 1:]).any(axis=1)
P3mask = (P[:, -1:] == P[:, :-1]).any(axis=1)
P1 = P[P1mask, :]
P3 = P[P3mask, :]
The only really interesting thing here is that I'm indexing the columns as slices 0:1 and -1: instead of just 0 and -1 to preserve shape and enable broadcasting.
P2 can be constructed in a similar manner, although the solution is not very general:
P2mask = (P[:, 0:1] == P[:, 2:]).any(axis=1) | (P[:, 1:2] == P[:, 2:]).any(axis=1)
P2 = P[P2mask, :]
A more general solution would be to broadcast the two segments together with a new dimension so that the comparison done with | manually above can be automated:
split = 2
P2mask = (P[:, :split, None] == P[:, None, split:]).any(axis=(1, 2))
P2 = P[P2mask, :]
P1 and P3 are just the cases for split = 1 and split = 3, respectively.
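The split-based mask can be wrapped in a small helper so that all three arrays come from one function. This is only a sketch: the name `rows_with_match` is hypothetical and not part of NumPy.

```python
import numpy as np

P = np.array([[-4, 5, 2, -3],
              [-5, 6, 0, -5],
              [-6, 5, -2, 5],
              [1, -2, 1, -2],
              [2, -4, -6, 8],
              [-4, 9, -4, 2],
              [0, -8, -8, 1]])

def rows_with_match(arr, split):
    """Keep rows where any of the first `split` elements equals any later element."""
    left = arr[:, :split, None]    # shape (n_rows, split, 1)
    right = arr[:, None, split:]   # shape (n_rows, 1, n_cols - split)
    # Broadcasting compares every left element with every right element per row.
    mask = (left == right).any(axis=(1, 2))
    return arr[mask]

P1, P2, P3 = (rows_with_match(P, s) for s in (1, 2, 3))
```

The intermediate comparison array has shape (n_rows, split, n_cols - split), so memory grows with the product of the two segment widths.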

You want to select all rows that fulfill a given condition, so you need to iterate over the rows of P, build a boolean array and apply it to the rows of P. In your case, the easiest way I can think of to check if there are shared elements, is to create two sets and check if their intersection is empty or not. This can be done via set.isdisjoint.
Final code:
P1 = P[[not set(row[:1]).isdisjoint(row[1:]) for row in P], :]
Analogous for P2 and P3.
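For completeness, here is a sketch of the same set-based idea with the split position as a parameter, so P2 and P3 need no extra code. The helper name `rows_sharing_elements` is made up for this illustration.

```python
import numpy as np

P = np.array([[-4, 5, 2, -3],
              [-5, 6, 0, -5],
              [-6, 5, -2, 5],
              [1, -2, 1, -2],
              [2, -4, -6, 8],
              [-4, 9, -4, 2],
              [0, -8, -8, 1]])

def rows_sharing_elements(arr, split):
    """Row-by-row check: do the first `split` values share a value with the rest?"""
    keep = [not set(row[:split]).isdisjoint(row[split:]) for row in arr.tolist()]
    return arr[np.array(keep, dtype=bool)]

P1 = rows_sharing_elements(P, 1)
P2 = rows_sharing_elements(P, 2)
P3 = rows_sharing_elements(P, 3)
```

This loops in Python, so it will likely be slower than the broadcasting mask on large arrays, but it is easy to read.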

Related

Replace consecutive duplicates in 2D numpy array

I have a two dimensional numpy array x:
import numpy as np
x = np.array([
[1, 2, 8, 4, 5, 5, 5, 3],
[0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (let's take -1), while leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
    consec_dup = np.zeros(x.shape, dtype=bool)
    consec_dup[:, 1:] = np.diff(x, axis=1) == 0
    x[consec_dup] = -1
    return x
# current output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, 5, -1, -1, 3],
# [ 0, 2, -1, -1, -1, 1, -1, 4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, -1, 5, -1, 3],
# [ 0, -1, 2, -1, -1, 1, -1, 4]])
Note that for consecutive duplicate sequences with an even number of occurrences, the middle-left value should be left unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1].
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!

The main problem is that you need the length of each run of consecutive values. This is not easy to get with numpy, but using itertools.groupby we can solve it using the following code.
import itertools
import numpy as np
x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
def replace_row(arr: np.ndarray, new_val=-1):
    results = []
    for val, count in itertools.groupby(arr):
        k = len(list(count))  # length of this run of equal values
        results.extend([new_val] * ((k - 1) // 2))  # fillers before the kept value
        results.append(val)  # middle (or middle-left) occurrence
        results.extend([new_val] * (k // 2))  # fillers after the kept value
    return np.fromiter(results, arr.dtype)
if __name__ == '__main__':
    for idx, row in enumerate(x):
        x[idx, :] = replace_row(row)
    print(x)
Output:
[[ 1 2 8 4 -1 5 -1 3]
[ 0 -1 2 -1 -1 1 -1 4]]
This isn't vectorized, but can be combined with multi threading since every row is handled one by one.
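Since the question asks for a vectorized solution, here is one possible sketch that avoids the per-row Python loop. It finds where each run starts, derives the run lengths from the gaps between starts, and keeps only the middle (or middle-left) position of each run. The function name is hypothetical, and unlike the question's version it returns a new array instead of modifying x in place.

```python
import numpy as np

def replace_consecutive_duplicates_middle(x, new_val=-1):
    x = np.asarray(x)
    # True wherever a new run begins; column 0 always starts a run,
    # so runs can never continue across a row boundary.
    start = np.ones(x.shape, dtype=bool)
    start[:, 1:] = x[:, 1:] != x[:, :-1]
    flat_start = np.flatnonzero(start.ravel())
    # Run length = distance to the next run start (or to the end of the data).
    run_len = np.diff(np.append(flat_start, x.size))
    # Middle position; for even lengths this picks the middle-left one.
    keep = flat_start + (run_len - 1) // 2
    out = np.full(x.size, new_val, dtype=x.dtype)
    out[keep] = x.ravel()[keep]
    return out.reshape(x.shape)

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4],
])
result = replace_consecutive_duplicates_middle(x)
```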

Python n-dimensional array combinations

Suppose an arbitrary number of arrays of arbitrary length. I would like to construct the n-dimensional array of all the combinations from the values in the arrays. Or even better, a list of all the combinations.
However, I would also like the previous "diagonal" element along each combination, except when such an element does not exist, in which case the values which do not exist are set to say -inf.
Take for ex. the following simple 2-D case:
v1=[-2,2]
v2=[-3,3]
From which I would get all the combinations
[[-2,-3],
[-2,3],
[2,-3],
[2,3]]
Or in 2D array / matrix form
        -3       3
-2    -2,-3    -2,3
 2     2,-3     2,3
Now I would also like a new column with the previous "diagonal" elements (in this case there is only 1 real such case) for each element. By previous "diagonal" element I mean the element at index i-1, j-1, k-1, ..., n-1. On the margins we take all the previous values that are possible.
  1         2
-2,-3    -inf,-inf
-2, 3    -inf,-3
 2,-3    -2,-inf
 2, 3    -2,-3
Edit: here is the code for the 2D case, which is not much use for the general n-case.
import math
v1=[-3,-1,2,4]
v2=[-2,0,2]
tmp=[]
tmp2=[]
for i in range(0,len(v1)):
    for j in range(0,len(v2)):
        tmp.append([v1[i],v2[j]])
        if i==0 and j==0:
            tmp2.append([-math.inf,-math.inf])
        elif i==0:
            tmp2.append([-math.inf,v2[j-1]])
        elif j==0:
            tmp2.append([v1[i-1],-math.inf])
        else:
            tmp2.append([v1[i-1],v2[j-1]])
And so
tmp
[[-3, -2],
[-3, 0],
[-3, 2],
[-1, -2],
[-1, 0],
[-1, 2],
[2, -2],
[2, 0],
[2, 2],
[4, -2],
[4, 0],
[4, 2]]
and
tmp2
[[-inf, -inf],
[-inf, -2],
[-inf, 0],
[-3, -inf],
[-3, -2],
[-3, 0],
[-1, -inf],
[-1, -2],
[-1, 0],
[2, -inf],
[2, -2],
[2, 0]]

Take a look at itertools.product().
To get the "diagonals" you could take the product of the vectors' indices instead of the vectors themselves. That way you can access the values of each combination as well as the previous values of the combination.
Example:
import itertools
import math
v1=[-2,2]
v2=[-3,3]
vectors = [v1, v2]
combs = list(itertools.product(*[range(len(v)) for v in vectors]))
print(combs)
[(0, 0), (0, 1), (1, 0), (1, 1)]
print([[vectors[vi][ci] for vi, ci in enumerate(comb)] for comb in combs])
[[-2, -3], [-2, 3], [2, -3], [2, 3]]
print([[(vectors[vi][ci-1] if ci > 0 else -math.inf) for vi, ci in enumerate(comb)] for comb in combs])
[[-inf, -inf], [-inf, -3], [-2, -inf], [-2, -3]]
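The index-product idea generalizes to any number of vectors. The sketch below wraps it in a function (the name `combos_with_previous` and the `fill` parameter are invented for this example) and reproduces the tmp and tmp2 lists from the question's 2D case.

```python
import itertools
import math

def combos_with_previous(*vectors, fill=-math.inf):
    """Return all combinations and, for each, the previous 'diagonal' element."""
    combos, previous = [], []
    # Iterate over index tuples rather than values so index - 1 is available.
    for idx in itertools.product(*(range(len(v)) for v in vectors)):
        combos.append([v[i] for v, i in zip(vectors, idx)])
        # An index of 0 has no predecessor along that axis, so use the fill value.
        previous.append([v[i - 1] if i > 0 else fill for v, i in zip(vectors, idx)])
    return combos, previous

tmp, tmp2 = combos_with_previous([-3, -1, 2, 4], [-2, 0, 2])
```

Because the function accepts `*vectors`, a third or fourth list can be passed without changing the code.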

Python built-in heap (heapq): Odd behavior if inverted (max-heap)

I'm trying to use the Python (2.0) built-in min-heap data structure from the heapq module (https://docs.python.org/3/library/heapq.html) to build a max-heap. To do that I simply use the negative of the numbers I need to push into my heap.
Using this (max-heap version):
import heapq
h=[]
for i in xrange(10):
    heapq.heappush(h,-i)
    print h
I get something which doesn't look correct:
[0]
[-1, 0]
[-2, 0, -1]
[-3, -2, -1, 0]
[-4, -3, -1, 0, -2]
[-5, -3, -4, 0, -2, -1]
[-6, -3, -5, 0, -2, -1, -4]
[-7, -6, -5, -3, -2, -1, -4, 0]
[-8, -7, -5, -6, -2, -1, -4, 0, -3]
[-9, -8, -5, -6, -7, -1, -4, 0, -3, -2]
The min-heap version instead looks fine:
import heapq
h=[]
for i in xrange(10):
    heapq.heappush(h,i)
    print h
As you can see:
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5]
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
What am I missing?
I've checked other SE questions/answers (e.g., python topN max heap, use heapq or self implement?, What do I use for a max-heap implementation in Python?, etc.) but they don't mention this issue.

As @user2357112 already mentioned, it is a min-heap. There is nothing wrong with the output. The difference between the two inputs is that in the first scenario (the negated values) you push the data in reverse-sorted order, and in the second scenario you push it in sorted order.
the min-heap property: the value of each node is greater than or equal
to the value of its parent, with the minimum-value element at the
root.
Case 1 : Reverse Sorted Input = 10,9,8,7,6
    10
[10]
     9
    /
  10
[9,10]
     8
    / \
  10   9
[8,10,9]
       7
      / \
     8   9
    /
  10
[7,8,9,10]
       6
      / \
     7   9
    / \
  10   8
[6,7,9,10,8]
Case 2 : Sorted Input = 1,2,3,4,5
    1
[1]
    1
   /
  2
[1,2]
    1
   / \
  2   3
[1,2,3]
      1
     / \
    2   3
   /
  4
[1,2,3,4]
      1
     / \
    2   3
   / \
  4   5
[1,2,3,4,5]
If you are interested in how the heap is built and how it balances after every input, go to the following url. You can insert one element at a time and see it in action.
https://www.cs.usfca.edu/~galles/JavascriptVisual/Heap.html

The invariant of a min-heap is that each node is less than or equal to each of its children; there is no implied ordering between the two children (and therefore, there can be many valid orderings of a given set of values; the only value that has an absolutely fixed position is the minimum one, at the root of the tree). Note that this is true of your output:
                  ,------------------,
  ,---+---,   ,---|----------+---,   |
  |   V   V   |   |          V   V   V
[-9, -8, -5, -6, -7, -1, -4, 0, -3, -2]
      |   |   ^   ^   ^   ^
      `---|---+---'   |   |
          `-----------+---'
The fact that your other example ended up in completely sorted order is merely a coincidence, based on the different order in which items were inserted into the heap.
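The invariant can also be checked programmatically. The sketch below is written for Python 3 (the question uses Python 2 syntax) and defines a hypothetical helper `is_min_heap`; it relies on the usual array layout in which the parent of index i sits at index (i - 1) // 2.

```python
import heapq

def is_min_heap(items):
    """True if every element is >= its parent, i.e. the min-heap invariant holds."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))

h = []
for i in range(10):
    heapq.heappush(h, -i)   # negate the values to emulate a max-heap

heap_ok = is_min_heap(h)

# Popping and negating again yields the original values from largest to smallest,
# even though the underlying list never looked sorted.
popped = [-heapq.heappop(h) for _ in range(len(h))]
```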

Trying to add a column to a data file

I have a data file with 2 columns, x ranging from -5 to 4 and f(x). I need to add a third column with |f(x)| the absolute value of f(x). Then I need to export the 3 columns as a new data file.
Currently my code looks like this:
from numpy import *
data = genfromtxt("task1.dat")
c = []
ab = abs(data[:,1])
ablist = ab.tolist()
datalist = data.tolist()
c.append(ablist)
c.append (datalist)
A = asarray (c)
savetxt("task1b.dat", A)
It gives me the following error message for line "A = asarray(c)":
ValueError : setting an array element with a sequence.
Does someone know a quick and efficient way to add this column and export the data file?

You are getting a list within a list in c.
Anyway, I think this is much clearer:
import numpy as np
data = np.genfromtxt("task1.dat")
data_new = np.hstack((data, np.abs(data[:,-1]).reshape((-1,1))))
np.savetxt("task_out.dat", data_new)

c is a list, and when you execute
c.append(ablist)
c.append (datalist)
it appends two lists of different shapes to c. It will probably end up looking like this:
c == [[....], [[....], [....]]]
which numpy.asarray cannot parse because of that shape difference.
(I say "probably" because I am assuming that genfromtxt("task1.dat") returns a 2D matrix.)
What you can do to concatenate the columns is:
from numpy import *
data = genfromtxt("task1.dat")
ab = abs(data[:,1])
c = concatenate((data, ab.reshape(-1, 1)), axis=1)
savetxt("task1b.dat", c)

data is a 2d array like:
In [54]: data=np.arange(-5,5).reshape(5,2)
In [55]: data
Out[55]:
array([[-5, -4],
[-3, -2],
[-1, 0],
[ 1, 2],
[ 3, 4]])
In [56]: ab=abs(data[:,1])
There are various ways to concatenate 2 arrays. In this case, data is 2d, and ab is 1d, so you have to take some steps to ensure they are both 2d. np.column_stack does that for us.
In [58]: np.column_stack((data,ab))
Out[58]:
array([[-5, -4, 4],
[-3, -2, 2],
[-1, 0, 0],
[ 1, 2, 2],
[ 3, 4, 4]])
With a little change in indexing we could make ab a column array from the start, and simply concatenate on the 2nd axis:
ab=abs(data[:,[1]])
np.concatenate((data,ab),axis=1)
==================
The same numbers with your tolist produce a c like
In [72]: [ab.tolist()]+[data.tolist()]
Out[72]: [[4, 2, 0, 2, 4], [[-5, -4], [-3, -2], [-1, 0], [1, 2], [3, 4]]]
That is not good input for array.
To go the list route you need to do an iteration over a zip:
In [86]: list(zip(data,ab))
Out[86]:
[(array([-5, -4]), 4),
(array([-3, -2]), 2),
(array([-1, 0]), 0),
(array([1, 2]), 2),
(array([3, 4]), 4)]
In [87]: c=[]
In [88]: for i,j in zip(data,ab):
    c.append(i.tolist()+[j])
   ....:
In [89]: c
Out[89]: [[-5, -4, 4], [-3, -2, 2], [-1, 0, 0], [1, 2, 2], [3, 4, 4]]
In [90]: np.array(c)
Out[90]:
array([[-5, -4, 4],
[-3, -2, 2],
[-1, 0, 0],
[ 1, 2, 2],
[ 3, 4, 4]])
Obviously this will be slower than the array concatenate, but studying this might help you understand both arrays and lists.
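Putting the pieces together, here is a self-contained sketch of the whole load, append and save round trip. It uses in-memory text buffers as stand-ins for "task1.dat" and "task1b.dat" so that it runs without any files; the sample numbers are made up.

```python
import io
import numpy as np

# Stand-in for "task1.dat": two whitespace-separated columns, x and f(x).
raw = io.StringIO("-5 -4\n-3 -2\n-1 0\n1 2\n3 4\n")
data = np.genfromtxt(raw)

# Append |f(x)| as a third column; column_stack turns the 1D array into a column.
result = np.column_stack((data, np.abs(data[:, 1])))

# Write to an in-memory buffer; pass a filename such as "task1b.dat"
# instead to write a real file.
out = io.StringIO()
np.savetxt(out, result, fmt="%g")
text = out.getvalue()
```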

What is the purpose of meshgrid in Python / NumPy?

Can someone explain to me what is the purpose of meshgrid function in Numpy? I know it creates some kind of grid of coordinates for plotting, but I can't really see the direct benefit of it.
I am studying "Python Machine Learning" from Sebastian Raschka, and he is using it for plotting the decision borders. See input 11 here.
I have also tried this code from official documentation, but, again, the output doesn't really make sense to me.
x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
h = plt.contourf(x,y,z)
Please, if possible, also show me a lot of real-world examples.

The purpose of meshgrid is to create a rectangular grid out of an array of x values and an array of y values.
So, for example, suppose we want to create a grid with a point at each integer value between 0 and 4 in both the x and y directions. To create a rectangular grid, we need every combination of the x and y points.
This is going to be 25 points, right? So if we wanted to create an x and y array for all of these points, we could do the following.
x[0,0] = 0 y[0,0] = 0
x[0,1] = 1 y[0,1] = 0
x[0,2] = 2 y[0,2] = 0
x[0,3] = 3 y[0,3] = 0
x[0,4] = 4 y[0,4] = 0
x[1,0] = 0 y[1,0] = 1
x[1,1] = 1 y[1,1] = 1
...
x[4,3] = 3 y[4,3] = 4
x[4,4] = 4 y[4,4] = 4
This would result in the following x and y matrices, such that the pairing of the corresponding element in each matrix gives the x and y coordinates of a point in the grid.
x =   0 1 2 3 4        y =   0 0 0 0 0
      0 1 2 3 4              1 1 1 1 1
      0 1 2 3 4              2 2 2 2 2
      0 1 2 3 4              3 3 3 3 3
      0 1 2 3 4              4 4 4 4 4
We can then plot these to verify that they are a grid:
plt.plot(x,y, marker='.', color='k', linestyle='none')
Obviously, this gets very tedious especially for large ranges of x and y. Instead, meshgrid can actually generate this for us: all we have to specify are the unique x and y values.
xvalues = np.array([0, 1, 2, 3, 4]);
yvalues = np.array([0, 1, 2, 3, 4]);
Now, when we call meshgrid, we get the previous output automatically.
xx, yy = np.meshgrid(xvalues, yvalues)
plt.plot(xx, yy, marker='.', color='k', linestyle='none')
Creation of these rectangular grids is useful for a number of tasks. In the example that you have provided in your post, it is simply a way to sample a function (sin(x**2 + y**2) / (x**2 + y**2)) over a range of values for x and y.
Because this function has been sampled on a rectangular grid, the function can now be visualized as an "image".
Additionally, the result can now be passed to functions which expect data on rectangular grid (i.e. contourf)
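As a quick check of the description above, this sketch builds the 5 x 5 grid, confirms that the two arrays line up as (x, y) pairs, and samples a simple function on it. The function x**2 + y**2 is chosen only for illustration.

```python
import numpy as np

xvalues = np.array([0, 1, 2, 3, 4])
yvalues = np.array([0, 1, 2, 3, 4])
xx, yy = np.meshgrid(xvalues, yvalues)

# Every row of xx repeats the x values; every column of yy repeats the y values.
# Pairing the two arrays element by element gives the 25 grid points.
points = np.column_stack((xx.ravel(), yy.ravel()))

# Sample a function on the whole grid in one vectorized expression.
z = xx**2 + yy**2
```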

Actually the purpose of np.meshgrid is already mentioned in the documentation:
np.meshgrid
Return coordinate matrices from coordinate vectors.
Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,..., xn.
So its primary purpose is to create coordinate matrices.
You probably just asked yourself:
Why do we need to create coordinate matrices?
The reason you need coordinate matrices with Python/NumPy is that there is no direct relation from coordinates to values, except when your coordinates start with zero and are purely positive integers. Then you can just use the indices of an array as the index.
However when that's not the case you somehow need to store coordinates alongside your data. That's where grids come in.
Suppose your data is:
1 2 1
2 5 2
1 2 1
However, each value represents a 3 x 2 kilometer area (horizontal x vertical). Suppose your origin is the upper left corner and you want arrays that represent the distance you could use:
import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)
where v is:
array([[0, 0, 0],
[2, 2, 2],
[4, 4, 4]])
and h:
array([[0, 3, 6],
[0, 3, 6],
[0, 3, 6]])
So if you have two indices, let's say x and y (that's why the return value of meshgrid is usually named xx or xs instead of x; in this case I chose h for horizontal), then you can get the x coordinate of the point, the y coordinate of the point and the value at that point by using:
h[x, y] # horizontal coordinate
v[x, y] # vertical coordinate
data[x, y] # value
That makes it much easier to keep track of coordinates and (even more importantly) you can pass them to functions that need to know the coordinates.
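Here is a small runnable version of the kilometer example. One detail worth spelling out: with NumPy's usual [row, column] indexing, the first index moves vertically and the second horizontally. The names `row` and `col` are introduced here for clarity and are not from the answer.

```python
import numpy as np

data = np.array([[1, 2, 1],
                 [2, 5, 2],
                 [1, 2, 1]])

# Each cell is 3 km wide and 2 km tall; the origin is the upper-left corner.
h, v = np.meshgrid(np.arange(3) * 3, np.arange(3) * 2)

row, col = 1, 2                      # second row, third column
horizontal_km = h[row, col]          # distance to the right of the origin
vertical_km = v[row, col]            # distance below the origin
value = data[row, col]
```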
A slightly longer explanation
However, np.meshgrid itself isn't often used directly; mostly one uses one of the similar objects np.mgrid or np.ogrid.
Here np.mgrid represents the sparse=False case and np.ogrid the sparse=True case (I refer to the sparse argument of np.meshgrid). Note that there is a significant difference between np.meshgrid on the one hand and np.ogrid and np.mgrid on the other: the first two returned values (if there are two or more) are reversed. Often this doesn't matter, but you should give meaningful variable names depending on the context.
For example, in case of a 2D grid and matplotlib.pyplot.imshow it makes sense to name the first returned item of np.meshgrid x and the second one y, while it's the other way around for np.mgrid and np.ogrid.
np.ogrid and sparse grids
>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]])
>>> yy
array([[-5],
[-4],
[-3],
[-2],
[-1],
[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5]])
As already said the output is reversed when compared to np.meshgrid, that's why I unpacked it as yy, xx instead of xx, yy:
>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]])
>>> yy
array([[-5],
[-4],
[-3],
[-2],
[-1],
[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5]])
This already looks like coordinates, specifically the x and y lines for 2D plots.
Visualized:
yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title('ogrid (sparse meshgrid)')
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")
np.mgrid and dense/fleshed out grids
>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
[-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
[-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
[-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[ 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]])
The same applies here: The output is reversed compared to np.meshgrid:
>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
[-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
[-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
[-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[ 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]])
Unlike ogrid these arrays contain all xx and yy coordinates in the -5 <= xx <= 5; -5 <= yy <= 5 grid.
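The reversed ordering can be verified directly. The sketch below also uses the `indexing="ij"` argument of np.meshgrid, which is not mentioned in the answer but, as far as I know, makes meshgrid return its arrays in the same order as np.mgrid and np.ogrid.

```python
import numpy as np

r = np.arange(-5, 6)

yy_m, xx_m = np.mgrid[-5:6, -5:6]   # dense grid, first output varies along rows
yy_o, xx_o = np.ogrid[-5:6, -5:6]   # open (sparse) grid

# Default indexing="xy": the first two outputs are swapped relative to mgrid.
xx_xy, yy_xy = np.meshgrid(r, r)

# indexing="ij": same output order as mgrid / ogrid.
yy_ij, xx_ij = np.meshgrid(r, r, indexing="ij")
yy_sp, xx_sp = np.meshgrid(r, r, indexing="ij", sparse=True)
```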
yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title('mgrid (dense meshgrid)')
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")
Functionality
It's not only limited to 2D, these functions work for arbitrary dimensions (well, there is a maximum number of arguments given to function in Python and a maximum number of dimensions that NumPy allows):
>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
x1
array([[[[0]]],
[[[1]]],
[[[2]]]])
x2
array([[[[1]],
[[2]],
[[3]]]])
x3
array([[[[2],
[3],
[4]]]])
x4
array([[[[3, 4, 5]]]])
>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
# Identical output so it's omitted here.
Even if these also work for 1D there are two (much more common) 1D grid creation functions:
np.arange
np.linspace
Besides the start and stop arguments, np.mgrid and np.ogrid also support a step argument (even complex steps, which are interpreted as the number of points):
>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1 # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
[3., 3., 3., 3.],
[5., 5., 5., 5.],
[7., 7., 7., 7.],
[9., 9., 9., 9.]])
>>> x2 # The dimension with the "number of steps"
array([[ 1., 4., 7., 10.],
[ 1., 4., 7., 10.],
[ 1., 4., 7., 10.],
[ 1., 4., 7., 10.],
[ 1., 4., 7., 10.]])
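To make the two kinds of step concrete, this sketch compares one-dimensional np.mgrid results with the more common 1D functions: a real step behaves like np.arange, while a complex step behaves like np.linspace with the stop value included.

```python
import numpy as np

by_step = np.mgrid[1:10:2]     # real step: spacing of 2, stop value excluded
by_count = np.mgrid[1:10:4j]   # complex step: 4 points, stop value included

same_as_arange = np.array_equal(by_step, np.arange(1, 10, 2))
same_as_linspace = np.allclose(by_count, np.linspace(1, 10, 4))
```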
Applications
You specifically asked about the purpose and in fact, these grids are extremely useful if you need a coordinate system.
For example if you have a NumPy function that calculates the distance in two dimensions:
def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)
And you want to know the distance of each point:
>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys) # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
7.07106781, 7. , 7.07106781, 7.28010989, 7.61577311],
[8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
6.08276253, 6. , 6.08276253, 6.32455532, 6.70820393],
[7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
5.09901951, 5. , 5.09901951, 5.38516481, 5.83095189],
[7.21110255, 6.40312424, 5.65685425, 5. , 4.47213595,
4.12310563, 4. , 4.12310563, 4.47213595, 5. ],
[6.70820393, 5.83095189, 5. , 4.24264069, 3.60555128,
3.16227766, 3. , 3.16227766, 3.60555128, 4.24264069],
[6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
2.23606798, 2. , 2.23606798, 2.82842712, 3.60555128],
[6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
1.41421356, 1. , 1.41421356, 2.23606798, 3.16227766],
[6. , 5. , 4. , 3. , 2. ,
1. , 0. , 1. , 2. , 3. ],
[6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
1.41421356, 1. , 1.41421356, 2.23606798, 3.16227766],
[6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
2.23606798, 2. , 2.23606798, 2.82842712, 3.60555128]])
The output would be identical if one passed in a dense grid instead of an open grid. NumPy's broadcasting makes it possible!
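A short sketch confirming that claim: the same distance function is evaluated once on an open grid and once on a dense grid, and the results are compared.

```python
import numpy as np

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x - x_point, y - y_point)

ys_open, xs_open = np.ogrid[-5:5, -5:5]     # shapes (10, 1) and (1, 10)
ys_dense, xs_dense = np.mgrid[-5:5, -5:5]   # both of shape (10, 10)

# Broadcasting expands the open grid to the full (10, 10) result.
d_open = distance_2d(1, 2, xs_open, ys_open)
d_dense = distance_2d(1, 2, xs_dense, ys_dense)
```

The open grid stores far fewer numbers, which matters once the resolution increases.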
Let's visualize the result:
plt.figure()
plt.title('distance to point (1, 2)')
plt.imshow(distances, origin='lower', interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel()) # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()
And this is also when NumPy's mgrid and ogrid become very convenient because they allow you to easily change the resolution of your grids:
ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above
However, since imshow doesn't support x and y inputs one has to change the ticks by hand. It would be really convenient if it would accept the x and y coordinates, right?
It's easy to write functions with NumPy that deal naturally with grids. Furthermore, there are several functions in NumPy, SciPy, matplotlib that expect you to pass in the grid.
I like images so let's explore matplotlib.pyplot.contour:
ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)
Note how the coordinates are already correctly set! That wouldn't be the case if you just passed in the density.
Or to give another fun example using astropy models (this time I don't care much about the coordinates, I just use them to create some grid):
from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100,
                            x_mean=np.random.randint(0, 100),
                            y_mean=np.random.randint(0, 100),
                            x_stddev=3,
                            y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70,
                            x_0=np.random.randint(0, 100),
                            y_0=np.random.randint(0, 100),
                            radius=5)
    z += a2d(x, y)
Although that's just "for the looks", several functions related to functional models and fitting in SciPy and other libraries require grids (the documentation for scipy.interpolate.interp2d and scipy.interpolate.griddata, for example, even shows examples using np.mgrid). Most of these work with open grids and dense grids, however some only work with one of them.

Suppose you have a function:
def sinus2d(x, y):
    return np.sin(x) + np.sin(y)
and you want, for example, to see what it looks like in the range 0 to 2*pi. How would you do it? There np.meshgrid comes in:
xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
z = sinus2d(xx, yy) # Create the image on this grid
and such a plot would look like:
import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', interpolation='none')
plt.show()
So np.meshgrid is just a convenience. In principle the same could be done by:
z2 = sinus2d(np.linspace(0,2*np.pi,100)[:,None], np.linspace(0,2*np.pi,100)[None,:])
but there you need to be aware of your dimensions (suppose you have more than two ...) and the right broadcasting. np.meshgrid does all of this for you.
Also meshgrid allows you to delete coordinates together with the data if you, for example, want to do an interpolation but exclude certain values:
condition = z>0.6
z_new = z[condition] # This will make your array 1D
So how would you do the interpolation now? You can give x and y to an interpolation function like scipy.interpolate.interp2d (deprecated in SciPy 1.10 and removed in later releases), so you need a way to know which coordinates were deleted:
x_new = xx[condition]
y_new = yy[condition]
and then you can still interpolate with the "right" coordinates (try it without the meshgrid and you will have a lot of extra code):
from scipy.interpolate import interp2d
interpolated = interp2d(x_new, y_new, z_new)
and the original meshgrid allows you to get the interpolation on the original grid again:
interpolated_grid = interpolated(xx[0], yy[:, 0]).reshape(xx.shape)
These are just some examples where I used meshgrid; there might be a lot more.

Short answer
The purpose of meshgrid is to help replace slow Python loops with the faster vectorized operations available in the NumPy library. meshgrid's role is to prepare the 2D arrays required by the vectorized operation.
Basic example showing the principle
Let's say we have two sequences of values,
a = [2,7,9,20]
b = [1,6,7,9] 
and we want to perform an operation on each possible pair of values, one taken from the first list, one taken from the second list. We also want to store the result. For example, let's say we want to get the sum of the values for each possible pair.
Slow and laborious method
c = []
for i in range(len(b)):
    row = []
    for j in range(len(a)):
        row.append (a[j] + b[i])
    c.append (row)
print (c)
Result:
[[3, 8, 10, 21],
[8, 13, 15, 26],
[9, 14, 16, 27],
[11, 16, 18, 29]]
Fast and easy method
i,j = np.meshgrid (a,b)
c = i + j
print (c)
Result:
[[ 3 8 10 21]
[ 8 13 15 26]
[ 9 14 16 27]
[11 16 18 29]]
You can see from this basic illustration how the explicit slow Python loops have been replaced by hidden faster C loops in the NumPy library. This principle is widely used for 3D operations, including colored pixel maps. The common example is a 3D plot.
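For comparison, the sketch below computes the same table of pairwise sums three ways: with meshgrid, with explicit broadcasting, and with np.add.outer. The last two are alternatives I am adding for illustration; they are not part of the answer.

```python
import numpy as np

a = np.array([2, 7, 9, 20])
b = np.array([1, 6, 7, 9])

# 1) meshgrid materializes both 2D operands, then adds them elementwise.
i, j = np.meshgrid(a, b)
c_mesh = i + j

# 2) Broadcasting a row against a column gives the same result
#    without building the intermediate 2D arrays.
c_bcast = a[None, :] + b[:, None]

# 3) The ufunc "outer" method: entry [r, s] is b[r] + a[s].
c_outer = np.add.outer(b, a)
```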
Common use: 3D plot
x = np.arange(-4, 4, 0.25)
y = np.arange(-4, 4, 0.25)
X, Y = np.meshgrid(x, y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
(Borrowed from this site)
meshgrid is used to create pairs of coordinates between -4 and +4 with .25 increments in each direction X and Y. Each pair is then used to find R, and Z from it. This way of preparing "a grid" of coordinates is frequently used in plotting 3D surfaces, or coloring 2D surfaces.
Meshgrid under the hood
The two arrays prepared by meshgrid are:
(array([[ 2, 7, 9, 20],
[ 2, 7, 9, 20],
[ 2, 7, 9, 20],
[ 2, 7, 9, 20]]),
array([[1, 1, 1, 1],
[6, 6, 6, 6],
[7, 7, 7, 7],
[9, 9, 9, 9]]))
These arrays are created by repeating the values provided, either horizontally or vertically. The two arrays are shape compatible for a vector operation.
Origin
numpy.meshgrid comes from MATLAB, like many other NumPy functions. So you can also study the examples from MATLAB to see meshgrid in use, the code for the 3D plotting looks the same in MATLAB.

meshgrid helps in creating a rectangular grid from two 1-D arrays of all pairs of points from the two arrays.
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])
Now, if you have defined a function f(x,y) and you want to apply this function to all the possible combinations of points from the arrays 'x' and 'y', then you can do this:
f(*np.meshgrid(x, y))
Say, if your function just produces the product of two elements, then this is how a cartesian product can be achieved, efficiently for large arrays.
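A minimal sketch of that idea, assuming f is a simple product. It also shows one way to list the coordinate pairs themselves by stacking the two grids; np.stack and the reshape are my additions, not taken from the answer.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

def f(x, y):
    return x * y

# Apply f to every (x, y) combination at once.
table = f(*np.meshgrid(x, y))

# The combinations themselves, one (x, y) pair per row.
pairs = np.stack(np.meshgrid(x, y), axis=-1).reshape(-1, 2)
```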
Referred from here

Basic Idea
Given possible x values, xs, (think of them as the tick-marks on the x-axis of a plot) and possible y values, ys, meshgrid generates the corresponding set of (x, y) grid points---analogous to set((x, y) for x in xs for y in ys). For example, if xs=[1,2,3] and ys=[4,5,6], we'd get the set of coordinates {(1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}.
Form of the Return Value
However, the representation that meshgrid returns is different from the above expression in two ways:
First, meshgrid lays out the grid points in a 2d array: rows correspond to different y-values, columns correspond to different x-values---as in list(list((x, y) for x in xs) for y in ys), which would give the following array:
[[(1,4), (2,4), (3,4)],
[(1,5), (2,5), (3,5)],
[(1,6), (2,6), (3,6)]]
Second, meshgrid returns the x and y coordinates separately (i.e. in two different numpy 2d arrays):
xcoords, ycoords = (
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]),
array([[4, 4, 4],
[5, 5, 5],
[6, 6, 6]]))
# same thing using np.meshgrid:
xcoords, ycoords = np.meshgrid([1,2,3], [4,5,6])
# same thing without meshgrid:
xcoords = np.array([xs] * len(ys))
ycoords = np.array([ys] * len(xs)).T
Note, np.meshgrid can also generate grids for higher dimensions. Given xs, ys, and zs, you'd get back xcoords, ycoords, zcoords as 3d arrays. meshgrid also supports reverse ordering of the dimensions as well as sparse representation of the result.
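The options mentioned in the note can be seen from the output shapes alone. This sketch uses made-up input lengths of 3, 2 and 4 so that each axis is easy to tell apart.

```python
import numpy as np

xs, ys, zs = [1, 2, 3], [4, 5], [6, 7, 8, 9]

# Default indexing="xy": the first two axes are swapped, giving
# shape (len(ys), len(xs), len(zs)).
X, Y, Z = np.meshgrid(xs, ys, zs)

# indexing="ij": axes follow the argument order,
# giving shape (len(xs), len(ys), len(zs)).
Xi, Yi, Zi = np.meshgrid(xs, ys, zs, indexing="ij")

# sparse=True: each output keeps only its own axis and relies on broadcasting.
Xs, Ys, Zs = np.meshgrid(xs, ys, zs, sparse=True)
```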
Applications
Why would we want this form of output?
Apply a function at every point on a grid:
One motivation is that binary operators like (+, -, *, /, **) are overloaded for numpy arrays as elementwise operations. This means that if I have a function def f(x, y): return (x - y) ** 2 that works on two scalars, I can also apply it on two numpy arrays to get an array of elementwise results: e.g. f(xcoords, ycoords) or f(*np.meshgrid(xs, ys)) gives the following on the above example:
array([[ 9, 4, 1],
[16, 9, 4],
[25, 16, 9]])
Higher dimensional outer product: I'm not sure how efficient this is, but you can get high-dimensional outer products this way: np.prod(np.meshgrid([1,2,3], [1,2], [1,2,3,4]), axis=0).
Contour plots in matplotlib: I came across meshgrid when investigating drawing contour plots with matplotlib for plotting decision boundaries. For this, you generate a grid with meshgrid, evaluate the function at each grid point (e.g. as shown above), and then pass the xcoords, ycoords, and computed f-values (i.e. zcoords) into the contourf function.

Behind the scenes:
import numpy as np
def meshgrid(x , y):
    XX = []
    YY = []
    for colm in range(len(y)):
        XX.append([])
        YY.append([])
        for row in range(len(x)):
            XX[colm].append(x[row])
            YY[colm].append(y[colm])
    return np.asarray(XX), np.asarray(YY)
Let's take the dataset of @Sarsaparilla's answer as an example:
y = [7, 6, 5]
x = [1, 2, 3, 4]
xx, yy = meshgrid(x , y)
and it outputs:
>>> xx
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
>>> yy
array([[7, 7, 7, 7],
[6, 6, 6, 6],
[5, 5, 5, 5]])
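The same repetition can be written without Python loops. The sketch below uses np.tile and compares the result with np.meshgrid; the function name `meshgrid_2d` is hypothetical, and this is not a claim about how NumPy implements meshgrid internally.

```python
import numpy as np

def meshgrid_2d(x, y):
    """Two-dimensional equivalent of np.meshgrid(x, y), built with np.tile."""
    x = np.asarray(x)
    y = np.asarray(y)
    xx = np.tile(x, (len(y), 1))            # repeat x as rows, one per y value
    yy = np.tile(y[:, None], (1, len(x)))   # repeat y as columns, one per x value
    return xx, yy

x = [1, 2, 3, 4]
y = [7, 6, 5]
xx, yy = meshgrid_2d(x, y)
ref_xx, ref_yy = np.meshgrid(x, y)
```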
