I'm trying to make a Python app that plots a graph after the user enters the data, but the problem is that y_array and x_array do not have the same length. When I run the program, this error is raised:
ValueError: x and y must have same first dimension, but have shapes () and ()
How can I draw a graph when the X and Y arrays have different lengths?
Here is a minimal example that raises the same error:
import matplotlib.pyplot as plt
y = [0, 8, 9, 3, 0]
x = [1, 2, 3, 4, 5, 6, 7]
plt.plot(x, y)
plt.show()
This is virtually a copy/paste of the answer found here, but I'll show what I did to get these to match.
First, we need to decide which array to resample: the x_array of length 7 or the y_array of length 5. I'll show both, starting with the former. Note that I am using numpy arrays, not lists.
Let's load the modules
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp
and the arrays
y = np.array([0, 8, 9, 3, 0])
x = np.array([1, 2, 3, 4, 5, 6, 7])
In both cases, we use interp.interp1d which is described in detail in the documentation.
For the x_array to be reduced to the length of the y_array:
x_inter = interp.interp1d(np.arange(x.size), x)
x_ = x_inter(np.linspace(0,x.size-1,y.size))
print(len(x_), len(y))
# Prints 5 5
plt.plot(x_,y)
plt.show()
Which gives a plot with 5 points (figure not shown),
and for the y_array to be increased to the length of the x_array:
y_inter = interp.interp1d(np.arange(y.size), y)
y_ = y_inter(np.linspace(0,y.size-1,x.size))
print(len(x), len(y_))
# Prints 7 7
plt.plot(x,y_)
plt.show()
Which gives a plot with 7 points (figure not shown).
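If SciPy is not available, the same kind of linear resampling can probably be done with np.interp alone. The following is a minimal sketch, assuming that linear interpolation over the index positions is acceptable for your data; the names old_positions, new_positions and y_resampled are only illustrative.

```python
import numpy as np

y = np.array([0, 8, 9, 3, 0])
x = np.array([1, 2, 3, 4, 5, 6, 7])

# Index positions of the original y samples (0..4)
old_positions = np.arange(y.size)
# x.size evenly spaced positions covering the same range
new_positions = np.linspace(0, y.size - 1, x.size)

# Linearly interpolate y at the new positions so it has as many points as x
y_resampled = np.interp(new_positions, old_positions, y)

print(len(x), len(y_resampled))
```

The resampled array can then be passed to plt.plot(x, y_resampled) in the same way as y_ above.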
I am trying to get the median of each row of a 2D torch.tensor, but the result is not what I expect when compared with a standard Python list or NumPy:
import torch
import numpy as np
from statistics import median
print(torch.__version__)
>>> 0.4.1
y = [[1, 2, 3, 5, 9, 1],[1, 2, 3, 5, 9, 1]]
median(y[0])
>>> 2.5
np.median(y,axis=1)
>>> array([2.5, 2.5])
yt = torch.tensor(y,dtype=torch.float32)
yt.median(1)[0]
>>> tensor([2., 2.])
Looks like this is the intended behaviour of Torch, as mentioned in these issues:
https://github.com/pytorch/pytorch/issues/1837
https://github.com/torch/torch7/pull/182
The reasoning, as given in the links above:
Median returns 'middle' element in case of odd-many elements, otherwise one-before-middle element (could also do the other convention to take mean of the two around-the-middle elements, but that would be twice more expensive, so I decided for this one).
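To make the two conventions concrete, here is a small pure-Python sketch. It only illustrates the convention described in the quote (it is not Torch's implementation), and the names lower_middle and upper_middle are made up for this example.

```python
from statistics import median, median_low

values = [1, 2, 3, 5, 9, 1]
ordered = sorted(values)  # [1, 1, 2, 3, 5, 9]
n = len(ordered)

# For an even count, the "one-before-middle" element and the one after it
lower_middle = ordered[(n - 1) // 2]
upper_middle = ordered[n // 2]

print(lower_middle)                        # 2   (the convention in the quote)
print((lower_middle + upper_middle) / 2)   # 2.5 (mean of the two middle elements)
print(median_low(values), median(values))  # 2 2.5
```

For an odd number of elements both indices coincide, so the two conventions agree.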
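You can emulate NumPy's median with PyTorch. Appending the maximum makes an even-length input odd, so the median of the extended tensor is the upper of the two middle elements; averaging it with the plain median gives the NumPy result: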
import torch
import numpy as np
y =[1, 2, 3, 5, 9, 1]
print("numpy=",np.median(y))
print(sorted([1, 2, 3, 5, 9, 1]))
yt = torch.tensor(y,dtype=torch.float32)
ymax = torch.tensor([yt.max()])
print("torch=",yt.median())
print("torch_fixed=",(torch.cat((yt,ymax)).median()+yt.median())/2.)
I want to get the intersecting (common) rows across two 2D numpy arrays. E.g., if the following arrays are passed as inputs:
array([[1, 4],
[2, 5],
[3, 6]])
array([[1, 4],
[3, 6],
[7, 8]])
the output should be:
array([[1, 4],
[3, 6]])
I know how to do this with loops. I'm looking for a Pythonic/NumPy way to do this.
For short arrays, using sets is probably the clearest and most readable way to do it.
Another way is to use numpy.intersect1d. You'll have to trick it into treating the rows as a single value, though... This makes things a bit less readable...
import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
For large arrays, this should be considerably faster than using sets.
You could use Python's sets:
>>> import numpy as np
>>> A = np.array([[1,4],[2,5],[3,6]])
>>> B = np.array([[1,4],[3,6],[7,8]])
>>> aset = set([tuple(x) for x in A])
>>> bset = set([tuple(x) for x in B])
>>> np.array([x for x in aset & bset])
array([[1, 4],
[3, 6]])
As Rob Cowie points out, this can be done more concisely as
np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)])
There's probably a way to do this without all the going back and forth from arrays to tuples, but it's not coming to me right now.
I could not understand why no pure NumPy way had been suggested, so I found one that uses NumPy broadcasting. The basic idea is to turn one of the arrays into a 3D array by swapping axes. Let's construct two arrays:
a=np.random.randint(10, size=(5, 3))
b=np.zeros_like(a)
b[:4,:]=a[np.random.randint(a.shape[0], size=4), :]
With my run it gave:
a=array([[5, 6, 3],
[8, 1, 0],
[2, 1, 4],
[8, 0, 6],
[6, 7, 6]])
b=array([[2, 1, 4],
[2, 1, 4],
[6, 7, 6],
[5, 6, 3],
[0, 0, 0]])
The steps are (the arrays can be interchanged):
# a is n x m and b is k x m
c = np.swapaxes(a[:,:,None],1,2)==b # reshape a to n x 1 x m, then compare with b
# c has shape n x k x m because the comparison broadcasts;
# c[i,j,:] holds the elementwise comparison between a[i,:] and b[j,:]
# Reduce to n x k with a product (1 only where two whole rows match):
c = np.prod(c,axis=2)
# To get around duplicates://
# Cumulative sum along axis 0 (over the rows of a)
c = c*np.cumsum(c,axis=0)
# Compare with 1, so that each column keeps only its first matching row of a
c = c==1
#//
# Sum along axis 1 (over the rows of b), giving a length-n boolean vector
c = np.sum(c,axis=1).astype(bool)
# The intersection between a and b is a[c]
result = a[c]
As a two-line function, to reduce memory use (correct me if I'm wrong):
def array_row_intersection(a,b):
    tmp=np.prod(np.swapaxes(a[:,:,None],1,2)==b,axis=2)
    return a[np.sum(np.cumsum(tmp,axis=0)*tmp==1,axis=1).astype(bool)]
which gave result for my example:
result=array([[5, 6, 3],
[2, 1, 4],
[6, 7, 6]])
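As a quick check of the duplicate handling, here is the same function applied to a small hand-made input in which one row of a appears twice. The input arrays are made up for illustration; the function body is copied from above.

```python
import numpy as np

def array_row_intersection(a, b):
    # tmp[i, j] is 1 where row i of a equals row j of b
    tmp = np.prod(np.swapaxes(a[:, :, None], 1, 2) == b, axis=2)
    # keep only the first row of a that matches each row of b
    return a[np.sum(np.cumsum(tmp, axis=0) * tmp == 1, axis=1).astype(bool)]

a = np.array([[1, 4], [2, 5], [1, 4], [3, 6]])  # [1, 4] appears twice
b = np.array([[1, 4], [3, 6], [7, 8]])

result = array_row_intersection(a, b)
print(result)
# [[1 4]
#  [3 6]]
```

Only the first occurrence of [1, 4] survives, which is what the steps between the "//" markers are for.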
This is faster than the set solutions, as it uses only simple NumPy operations and reduces the number of dimensions at each step. I may have made mistakes in my comments, as I arrived at the answer by experimentation and instinct. The equivalent for column intersection can be found either by transposing the arrays or by changing the steps a little. If duplicates are wanted, skip the steps between the "//" markers. The function can also be edited to return only the boolean index array, which came in handy for me when indexing different arrays with the same vector. Below is a benchmark of the voted answer against mine (the number of elements in each dimension affects which one to choose):
Code:
import timeit
import numpy as np

def voted_answer(A,B):
    nrows, ncols = A.shape
    dtype={'names':['f{}'.format(i) for i in range(ncols)],
           'formats':ncols * [A.dtype]}
    C = np.intersect1d(A.view(dtype), B.view(dtype))
    return C.view(A.dtype).reshape(-1, ncols)
a_small=np.random.randint(10, size=(10, 10))
b_small=np.zeros_like(a_small)
b_small=a_small[np.random.randint(a_small.shape[0],size=[a_small.shape[0]]),:]
a_big_row=np.random.randint(10, size=(10, 1000))
b_big_row=a_big_row[np.random.randint(a_big_row.shape[0],size=[a_big_row.shape[0]]),:]
a_big_col=np.random.randint(10, size=(1000, 10))
b_big_col=a_big_col[np.random.randint(a_big_col.shape[0],size=[a_big_col.shape[0]]),:]
a_big_all=np.random.randint(10, size=(100,100))
b_big_all=a_big_all[np.random.randint(a_big_all.shape[0],size=[a_big_all.shape[0]]),:]
print('Small arrays:')
print('\t Voted answer:', timeit.timeit(lambda:voted_answer(a_small,b_small),number=100)/100)
print('\t Proposed answer:', timeit.timeit(lambda:array_row_intersection(a_small,b_small),number=100)/100)
print('Big column arrays:')
print('\t Voted answer:', timeit.timeit(lambda:voted_answer(a_big_col,b_big_col),number=100)/100)
print('\t Proposed answer:', timeit.timeit(lambda:array_row_intersection(a_big_col,b_big_col),number=100)/100)
print('Big row arrays:')
print('\t Voted answer:', timeit.timeit(lambda:voted_answer(a_big_row,b_big_row),number=100)/100)
print('\t Proposed answer:', timeit.timeit(lambda:array_row_intersection(a_big_row,b_big_row),number=100)/100)
print('Big arrays:')
print('\t Voted answer:', timeit.timeit(lambda:voted_answer(a_big_all,b_big_all),number=100)/100)
print('\t Proposed answer:', timeit.timeit(lambda:array_row_intersection(a_big_all,b_big_all),number=100)/100)
with results:
Small arrays:
Voted answer: 7.47108459473e-05
Proposed answer: 2.47001647949e-05
Big column arrays:
Voted answer: 0.00198730945587
Proposed answer: 0.0560171294212
Big row arrays:
Voted answer: 0.00500325918198
Proposed answer: 0.000308241844177
Big arrays:
Voted answer: 0.000864889621735
Proposed answer: 0.00257176160812
The verdict: if you have to compare two arrays with many rows (for example, two big arrays of 2D points), or arrays that are big in both dimensions, use the voted answer. In these timings the proposed answer was faster only for small arrays and for arrays with few rows and many columns. So the right choice depends on the shape of your data.
Numpy broadcasting
We can create a boolean mask using broadcasting which can be then used to filter the rows in array A which are also present in array B
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
m = (A[:, None] == B).all(-1).any(1)
>>> A[m]
array([[1, 4],
[3, 6]])
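To see what each part of the one-line mask does, the expression can be split into steps. This is only an unpacked version of the line above; the intermediate names eq and rows_equal are made up for this sketch.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# A[:, None] has shape (3, 1, 2); comparing with B (3, 2) broadcasts to (3, 3, 2)
eq = A[:, None] == B
# rows_equal[i, j] is True when every element of A[i] equals B[j]
rows_equal = eq.all(-1)
# m[i] is True when A[i] matches at least one row of B
m = rows_equal.any(1)

print(eq.shape, rows_equal.shape, m.shape)  # (3, 3, 2) (3, 3) (3,)
print(m)                                    # [ True False  True]
```

Note that the intermediate array has one entry per (row of A, row of B, column) triple, so memory use grows with the product of the two row counts.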
Another way to achieve this using structured array:
>>> a = np.array([[3, 1, 2], [5, 8, 9], [7, 4, 3]])
>>> b = np.array([[2, 3, 0], [3, 1, 2], [7, 4, 3]])
>>> av = a.view([('', a.dtype)] * a.shape[1]).ravel()
>>> bv = b.view([('', b.dtype)] * b.shape[1]).ravel()
>>> np.intersect1d(av, bv).view(a.dtype).reshape(-1, a.shape[1])
array([[3, 1, 2],
[7, 4, 3]])
Just for clarity, the structured view looks like this:
>>> a.view([('', a.dtype)] * a.shape[1])
array([[(3, 1, 2)],
[(5, 8, 9)],
[(7, 4, 3)]],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
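np.array(list(set(map(tuple, a)).intersection(map(tuple, b))))
This could also work: turn each row into a tuple, intersect the two sets, and convert the result back to an array. The set has to be wrapped in list(), otherwise np.array stores the set itself in a 0-d object array. Note that the row order of the result is not guaranteed.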
Without Index
Visit https://gist.github.com/RashidLadj/971c7235ce796836853fcf55b4876f3c
def intersect2D(Array_A, Array_B):
    """
    Find row intersection between 2D numpy arrays, a and b.
    """
    # ''' Using Tuple ''' #
    intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(tuple(x) == tuple(y))]))
    print ("intersectionList = \n",intersectionList)

    # ''' Using Numpy function "array_equal" ''' #
    # This method is valid for an ndarray
    intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(np.array_equal(x, y))]))
    print ("intersectionList = \n",intersectionList)

    # ''' Using set and bitwise and '''
    intersectionList = [list(y) for y in (set([tuple(x) for x in Array_A]) & set([tuple(x) for x in Array_B]))]
    print ("intersectionList = \n",intersectionList)

    return intersectionList
With Index
Visit https://gist.github.com/RashidLadj/bac71f3d3380064de2f9abe0ae43c19e
def intersect2D(Array_A, Array_B):
    """
    Find row intersection between 2D numpy arrays, a and b.
    Returns another numpy array with shared rows and index of items in A & B arrays
    """
    # [[IDX], [IDY], [value]] where Equal
    # dtype=object is needed because each entry mixes two ints with a row array
    # ''' Using Tuple ''' #
    IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(tuple(x) == tuple(y))], dtype=object).T

    # ''' Using Numpy array_equal ''' #
    IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(np.array_equal(x, y))], dtype=object).T

    idx, idy, intersectionList = (IndexEqual[0], IndexEqual[1], IndexEqual[2]) if len(IndexEqual) != 0 else ([], [], [])
    return intersectionList, idx, idy
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
def matching_rows(A,B):
    matches=[i for i in range(B.shape[0]) if np.any(np.all(A==B[i],axis=1))]
    if len(matches)==0:
        return B[matches]
    return np.unique(B[matches],axis=0)
>>> matching_rows(A,B)
array([[1, 4],
[3, 6]])
This of course assumes the rows are all the same length.
import numpy as np
A=np.array([[1, 4],
[2, 5],
[3, 6]])
B=np.array([[1, 4],
[3, 6],
[7, 8]])
intersectingRows=[(B==irow).all(axis=1).any() for irow in A]
print(A[intersectingRows])
In Python, I have the following problem, made into a toy example:
import random
import numpy as np
x_arr = np.array([], dtype = object)
for x in range(5):
    y_arr = np.array([], dtype=object)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr,y)
    if random.random() < 0.9:
        x_arr = np.append(x_arr, y_arr)
#This results in
>>> x_arr
array([4, 0, 1, 2, 4, 0, 3, 4], dtype=object)
I would like to have
array([array([4]), array([0, 1, 2, 4]), array([0, 3, 4])], dtype=object)
So apparently, in this run the array y_arr is written into x_arr 3 out of 5 times (this count varies), with lengths 1, 4, and 3 (these vary too).
append() puts the results into one long 1D structure, whereas I would like to keep it 2D. Also, considering the example, it might be that no numbers get written at all (if you are 'unlucky' with the random numbers). So I have an array of arrays whose size is unknown a priori, and each inner array has an a priori unknown number of elements. How would I approach this in Python, other than finding an upper bound on both and storing a lot of zeros?
You could do it as a two-step process: first append a placeholder element, then set that element to the array. This circumvents the automatic flattening that happens in np.append() when axis=None (the default behavior), as documented here.
import random
import numpy as np
x_arr = np.array([], dtype = object).reshape((1,0))
for x in range(5):
    y_arr = np.array([], dtype=np.int32)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr,y)
    if random.random() < 0.9:
        x_arr = np.append(x_arr, 0)
        x_arr[-1] = y_arr
print(type(x_arr))
print(x_arr)
This gives:
<type 'numpy.ndarray'>
[array([0, 1, 2]) array([0, 1, 2, 3]) array([0, 1, 4]) array([0, 1, 3, 4])
array([2, 3])]
Also, why not use a Python list for x_arr (or y_arr)? Nested NumPy arrays are not really useful when they cannot form a regular ndarray.
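A minimal sketch of the list-based variant might look like this. It assumes that a plain list of 1-D arrays is an acceptable container; the names rows and kept, and the fixed seed, are only there for illustration.

```python
import random
import numpy as np

random.seed(0)  # fixed seed only so that the sketch is repeatable

rows = []  # plain Python list: appending never flattens anything
for x in range(5):
    # collect the y values that pass the random test
    kept = [y for y in range(5) if random.random() < 0.5]
    if random.random() < 0.9:
        rows.append(np.array(kept, dtype=np.int32))

# Number of stored rows and the length of each one
print(len(rows), [len(row) for row in rows])
```

Each element of rows keeps its own length, so no upper bound or zero padding is needed, and an empty result is simply an empty list.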
I have a NumPy matrix X and I would like to append to it, as new variables, all possible products of two of its columns.
So if X=(x1,x2,x3) I want X=(x1,x2,x3,x1x2,x2x3,x1x3)
Is there an elegant way to do that?
I think a combination of numpy and itertools should work
EDIT:
Very good answers, but do they take into account that X is a matrix, so that x1, x2, x3 can themselves be arrays?
EDIT:
A Real example
a=array([[1,2,3],[4,5,6]])
Itertools should be the answer here.
import itertools
a = [1, 2, 3]
p = (x * y for x, y in itertools.combinations(a, 2))
print(list(itertools.chain(a, p)))
Result:
[1, 2, 3, 2, 3, 6] # 1, 2, 3, 2 x 1, 3 x 1, 3 x 2
I think Samy's solution is pretty good. If you need to use numpy, you could transform it a little like this:
from itertools import combinations
from numpy import prod
x = [1, 2, 3]
print(x + list(map(prod, combinations(x, 2))))
Gives the same output as Samy's solution:
[1, 2, 3, 2, 3, 6]
If your arrays are small, then Samy's pure-Python solution using itertools.combinations should be fine:
from itertools import combinations, chain
def all_products1(a):
    p = (x * y for x, y in combinations(a, 2))
    return list(chain(a, p))
But if your arrays are large, then you'll get a substantial speedup by fully vectorizing the computation, using numpy.triu_indices, like this:
import numpy as np
def all_products2(a):
    x, y = np.triu_indices(len(a), 1)
    return np.r_[a, a[x] * a[y]]
Let's compare these:
>>> data = np.random.uniform(0, 100, (10000,))
>>> timeit(lambda:all_products1(data), number=1)
53.745754408999346
>>> timeit(lambda:all_products2(data), number=1)
12.26144006299728
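To see why np.triu_indices yields exactly the pairs needed, it may help to look at the indices for a small input. This is a small illustration on a made-up three-element array; the name products is only for this sketch.

```python
import numpy as np

a = np.array([1, 2, 3])

# Index pairs (i, j) with i < j: the strict upper triangle of a 3 x 3 matrix
x, y = np.triu_indices(len(a), 1)
print(x, y)  # [0 0 1] [1 2 2]

# Pairwise products a[i] * a[j], appended after the original values
products = a[x] * a[y]
print(np.r_[a, products])  # [1 2 3 2 3 6]
```

The pairs come out in the same order as itertools.combinations(a, 2), so the result matches the pure-Python version.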
The solution using numpy.triu_indices also works for multi-dimensional data:
>>> np.random.uniform(0, 100, (3,2))
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808]])
>>> all_products2(_)
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808],
[ 6014.22480172, 771.41777239],
[ 5625.39908354, 1373.00597677],
[ 8324.59122432, 4587.57109368]])
If you want to operate on columns rather than rows, use:
def all_products3(a):
    x, y = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:,x] * a[:,y]]
For example:
>>> np.random.uniform(0, 100, (2,3))
array([[ 33.0062385 , 28.17575024, 20.42504351],
[ 40.84235995, 61.12417428, 58.74835028]])
>>> all_products3(_)
array([[ 33.0062385 , 28.17575024, 20.42504351, 929.97553238,
674.15385734, 575.4909246 ],
[ 40.84235995, 61.12417428, 58.74835028, 2496.45552756,
2399.42126888, 3590.94440122]])
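Applied to the integer example from the question, the column version can be checked by hand. The function is repeated here so that the snippet runs on its own.

```python
import numpy as np

def all_products3(a):
    # column index pairs (i, j) with i < j
    x, y = np.triu_indices(a.shape[1], 1)
    # append the pairwise column products as new columns
    return np.c_[a, a[:, x] * a[:, y]]

a = np.array([[1, 2, 3], [4, 5, 6]])
result = all_products3(a)
print(result)
# [[ 1  2  3  2  3  6]
#  [ 4  5  6 20 24 30]]
```

The new columns appear in the order x1x2, x1x3, x2x3, which differs slightly from the order written in the question.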