I have an array of Cartesian coordinates
xy = np.array([[0,0], [2,3], [3,4], [2,5], [5,2]])
which I want to convert into an array of complex numbers representing the same:
c = np.array([0, 2+3j, 3+4j, 2+5j, 5+2j])
My current solution is this:
c = np.sum(xy * [1,1j], axis=1)
This works but seems crude to me, and probably there is a nicer version with some built-in magic using np.complex() or similar, but the only way I found to use this was
c = np.array(list(map(lambda c: np.complex(*c), xy)))
This doesn't look like an improvement.
Can anybody point me to a better solution, maybe using one of the many numpy functions I don't know by heart (is there a numpy.cartesian_to_complex() working on arrays I haven't found yet?), or maybe using some implicit conversion when applying a clever combination of operators?
Recognize that complex128 is just a pair of floats. You can then do this using a "view" which is free, after converting the dtype from int to float (which I'm guessing your real code might already do):
xy.astype(float).view(np.complex128)
The astype() converts the integers to floats, which requires construction of a new array, but once that's done the view() is "free" in terms of runtime.
The above gives you shape=(n,1); you can np.squeeze() it to remove the extra dimension. This is also just a view operation, so takes basically no time.
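Putting the pieces above together, a minimal sketch (using the example xy from the question) might look like this. It assumes the float copy is C-contiguous, which is what astype() returns by default:

```python
import numpy as np

xy = np.array([[0, 0], [2, 3], [3, 4], [2, 5], [5, 2]])

# astype() makes a float64 copy; view() reinterprets each (x, y) pair
# of floats as one complex128 value without copying again.
c = xy.astype(np.float64).view(np.complex128)

# The view has shape (n, 1); squeeze drops the trailing axis.
c = c.squeeze(-1)

print(c.shape)
print(c)
```

The view only works because the two floats of each row sit next to each other in memory, which is exactly the layout of a complex128 value.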
How about
c=xy[:,0]+1j*xy[:,1]
xy[:,0] will give an array of all elements in the 0th column of xy and xy[:,1] will give that of the 1st column.
Multiply xy[:,1] with 1j to make it imaginary and then add the result with xy[:,0].
I have two real arrays (a and b), and I would like to create a complex array (c) which takes the two real arrays as its real and imaginary parts respectively.
The simplest one would be
c = a + b * 1.0j
However, since my data size is quite large, such code is not very efficient.
We can also do the following,
c = np.empty(data_shape, dtype=np.complex128)
c.real = a
c.imag = b
I am wondering whether there is a better way to do that (e.g. using a buffer or something)?
Thank you very much!
Since the real and imaginary parts of each element have to be contiguous, you will have to allocate another buffer to interleave the data no matter what. The second method shown in the question is therefore about as efficient as you're likely to get. One alternative would be
np.stack((a, b), axis=-1).view(np.complex128).squeeze(-1)
This works for any array shape, not just 1D. It ensures proper interleaving by stacking along the last dimension in C order.
This assumes that your datatype is np.float64. If not, either promote to float (e.g. a = a.astype(float)), or change np.complex128 to the matching complex type (e.g. np.complex64 for np.float32 inputs).
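As a rough check that the approaches discussed here agree, the sketch below builds the complex array three ways. The input arrays and their shape are made up for illustration, and it assumes float64 inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4))  # real parts (illustrative data)
b = rng.random((3, 4))  # imaginary parts (illustrative data)

# Preallocate a complex buffer and fill the two halves in place.
c_fill = np.empty(a.shape, dtype=np.complex128)
c_fill.real = a
c_fill.imag = b

# Interleave along a new last axis, then reinterpret pairs as complex.
c_view = np.stack((a, b), axis=-1).view(np.complex128).squeeze(-1)

# The simple arithmetic version, which allocates temporaries.
c_sum = a + 1j * b

print(np.array_equal(c_fill, c_view), np.allclose(c_fill, c_sum))
```

Note that the buffer must be created with a complex dtype; the imaginary part of a real-valued array cannot be assigned.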
I am facing a mystery right now. I get strange results in some program and I think it may be related to the computation since I got different results with my functions compared to manual computation.
This is from my program, I am printing the values pre-computation :
print("\nPrecomputation:\nmatrix\n:", matrix)
tmp = likelihood_left * likelihood_right
print("\nconditional_dep:", tmp)
print("\nfinal result:", matrix @ tmp)
I got the following output:
Precomputation:
matrix:
[array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294])
array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784])
array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768])
array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674])
array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
conditional_dep: [0.01391123 0.01388155 0.17221067 0.02675524 0.01033257]
final result: [0.07995043 0.03485223 0.02184015 0.04721548 0.05323298]
The thing is when I compute the following code:
matrix = [np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]),
np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]),
np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]),
np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]),
np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
tmp = np.asarray([0.01391123, 0.01388155, 0.17221067, 0.02675524, 0.01033257])
matrix @ tmp
The values in use are exactly the same as they should be in the computation before but I get the following result:
array([0.04171218, 0.04535276, 0.02546353, 0.04688848, 0.03106443])
This result is then obviously different than the previous one and is the true one (I computed the dot product by hand).
I have been facing this problem the whole day and I did not find anything useful online. If any of you have any even tiny idea where it can come from I'd be really happy :D
Thanks in advance
Yann
PS: I can show more of the code if needed.
PS2: I don't know if it is relevant but this is used in a dynamic programming algorithm.
To recap our discussion in the comments, in the first part ("pre-computation"), the following is true about the matrix object:
>>> matrix.shape
(5,)
>>> matrix.dtype
dtype('O') # aka object
And as you say, this is due to matrix being a slice of a larger, non-uniform array. Let's recreate this situation:
>>> matrix = np.array([[], np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]), np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]), np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]), np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]), np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])], dtype=object)[1:]
It is now not a matrix with scalars in rows and columns, but a column vector of column vectors. Technically, matrix @ tmp is an operation between two 1-D arrays and hence NumPy should, according to the documentation, calculate the inner product of the two. This is true in this case, with the convention that the sum be over the first axis:
>>> np.array([matrix[i] * tmp[i] for i in range(5)]).sum(axis=0)
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
>>> matrix @ tmp
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
This is essentially the same as taking the transpose of the proper 2-D matrix before the multiplication:
>>> np.stack(matrix).T @ tmp
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
Equivalently, as noted by @jirasssimok:
>>> tmp @ np.stack(matrix)
array([0.07995043, 0.03485222, 0.02184015, 0.04721548, 0.05323298])
Hence the erroneous or unexpected result.
As you have already resolved to do in the comments, this can be avoided in the future by ensuring all matrices are proper 2-D arrays.
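To make the difference concrete, here is a small sketch with made-up 2x2 values (not the data from the question). The weighted sum is written out explicitly, following the expression shown earlier in this answer, and np.stack is used to build the proper 2-D array:

```python
import numpy as np

# A 1-D object array whose elements are themselves 1-D float arrays,
# mimicking a slice of a larger non-uniform array.
rows = np.empty(2, dtype=object)
rows[0] = np.array([1.0, 2.0])
rows[1] = np.array([3.0, 4.0])
v = np.array([10.0, 100.0])

# Inner product over the first axis: sum_i rows[i] * v[i].
object_style = sum(rows[i] * v[i] for i in range(len(v)))

# Converting to a proper 2-D float array first.
proper = np.stack(list(rows))

print(rows.shape, rows.dtype, proper.shape, proper.dtype)
print(object_style)   # same as v @ proper
print(proper @ v)     # the row-wise dot products usually intended
```

Checking shape and dtype right after slicing is a cheap way to catch this kind of problem early.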
It looks like you got the operands switched in one of your matrix multiplications.
Using the same values of matrix and tmp that you provided, matrix @ tmp and tmp @ matrix provide the two results you showed (see note 1).
matrix = [np.array([0.08078721, 0.5802404 , 0.16957052, 0.09629893, 0.07310294]),
np.array([0.14633129, 0.45458744, 0.20096238, 0.02142105, 0.17669784]),
np.array([0.41198731, 0.06197812, 0.05934063, 0.23325626, 0.23343768]),
np.array([0.15686545, 0.29516415, 0.20095091, 0.14720275, 0.19981674]),
np.array([0.15965914, 0.18383683, 0.10606946, 0.14234812, 0.40808645])]
tmp = np.asarray([0.01391123, 0.01388155, 0.17221067, 0.02675524, 0.01033257])
print(matrix @ tmp)  # [0.04171218 0.04535276 0.02546353 0.04688848 0.03106443]
print(tmp @ matrix)  # [0.07995043 0.03485222 0.02184015 0.04721548 0.05323298]
To make it a little more obvious what your code is doing, you might also consider using np.dot instead of @. If you pass matrix as the first argument and tmp as the second, it will have the result you want, and make it clearer that you're conceptually calculating dot products rather than multiplying matrices.
As an additional note, if you're performing matrix operations on matrix, it might be better if it were a single two-dimensional array instead of a list of 1-dimensional arrays. This will prevent errors of the sort you'll see right now if you try to run matrix @ matrix. This would also let you say matrix.dot(tmp) instead of np.dot(matrix, tmp) if you wanted to.
(I'd guess that you can use np.stack or a similar function to create matrix, or you can call np.stack on matrix after creating it.)
Note 1: Because tmp has only one dimension and matrix has two, NumPy can and will treat tmp as whichever type of vector makes the multiplication work (a 1-D operand is temporarily promoted to 2-D and the extra dimension is removed afterwards). So tmp is treated as a column vector in matrix @ tmp and a row vector in tmp @ matrix.
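The behaviour described in the note can be seen from the result shapes. This is a small sketch with an arbitrary 2x3 array and made-up vectors, not the data from the question:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
right_vec = np.array([1.0, 0.0, -1.0])  # length 3, used on the right
left_vec = np.array([1.0, 10.0])        # length 2, used on the left

# On the right, the 1-D operand acts like a column vector:
# one dot product per row of M.
per_row = M @ right_vec

# On the left, the 1-D operand acts like a row vector:
# one dot product per column of M.
per_col = left_vec @ M

print(per_row.shape, per_col.shape)
```

With a square matrix both orders are valid, which is why swapping the operands gives a different answer instead of an error.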
I am trying to multiply all the row values and column values of a 2 dimensional numpy array with an explicit for-loop:
product_0 = 1
product_1 = 1
for x in arr:
    product_0 *= x[0]
    product_1 *= x[1]
I realize the product will blow up to an extremely large number, but in my previous experience Python has had no problem dealing with extremely large numbers.
So from what I can tell this is a problem with numpy except I am not storing the gigantic product in a numpy array or any numpy data type for that matter its just a normal python variable.
Any idea how to fix this?
Using non-in-place multiplication (product_0 = x[0]*product_0) hasn't helped.
Python ints are represented with arbitrary precision, so they cannot overflow. But NumPy uses fixed-width C integers under the hood, so the largest signed 64-bit integer is 2^63 - 1. Your number is far beyond this value: on average it is about ((716-1)/2)^86507.
When you extract x[0] in the for loop, it is still a NumPy object. To use the full power of Python integers you need to convert it explicitly to a Python int, like this:
product_0 = 1
product_1 = 1
for x in arr:
    t = int(x[0])
    product_0 = product_0 * t
    product_1 = product_1 * int(x[1])
and it will not overflow.
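A small self-contained sketch of the difference, using a made-up two-row array whose first-column product (2**80) does not fit into 64 bits:

```python
import numpy as np

arr = np.array([[2**40, 3],
                [2**40, 5]], dtype=np.int64)

product_0 = 1
product_1 = 1
for x in arr:
    # int() turns the NumPy scalar into an arbitrary-precision Python int.
    product_0 *= int(x[0])
    product_1 *= int(x[1])

# Staying inside NumPy keeps the fixed-width int64 type, which
# cannot hold 2**80, so this is not the true product.
fixed_width = np.prod(arr[:, 0])

print(product_0)
print(type(product_0), fixed_width.dtype)
```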
Following your comment, which makes your question more specific, your original problem is to calculate the geometric mean of the array for each row/column. Here is a solution:
First I generate an array that has the same properties as yours:
arr = np.resize(np.random.randint(1,716,86507*2 ),(86507,2))
Then, calculate the geometric mean for each column/row:
from scipy import stats
gm_0 = stats.mstats.gmean(arr, axis = 0)
gm_1 = stats.mstats.gmean(arr, axis = 1)
gm_0 will be an array that contains the geometric means of the x and y coordinates (one per column). gm_1 instead contains the geometric mean of each row.
Hope this solves your problem!
You say
So from what I can tell this is a problem with numpy except I am not storing the gigantic product in a numpy array or any numpy data type for that matter its just a normal python variable.
Your product may not be a NumPy array, but it is using a NumPy data type. x[0] and x[1] are NumPy scalars, and multiplying a Python int by a NumPy scalar produces a NumPy scalar. NumPy integers have a finite range.
While you technically could call int on x[0] and x[1] to get a Python int, it'd probably be better to avoid needing such huge ints. You say you're trying to perform this multiplication to compute a geometric mean; in that case, it'd be better to compute the geometric mean by transforming to and from logarithms, or to use scipy.stats.mstats.gmean, which uses logarithms under the hood.
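The logarithm approach mentioned above can be sketched as follows. The small array is made up for illustration, and the sketch assumes all entries are strictly positive (the logarithm is undefined otherwise):

```python
import numpy as np

arr = np.array([[1, 4],
                [4, 16],
                [16, 1]], dtype=np.int64)

# Geometric mean = exp(mean(log(values))). The huge product is
# never formed, so nothing overflows.
gm_cols = np.exp(np.log(arr).mean(axis=0))  # one value per column
gm_rows = np.exp(np.log(arr).mean(axis=1))  # one value per row

print(gm_cols)
print(gm_rows)
```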
Your NumPy may be built to use 32-bit rather than 64-bit integers, so while Python can handle this, NumPy will overflow at smaller values. If you want to use NumPy, I recommend that you build it from source.
Edit
After some testing with
import numpy as np
x=np.abs(np.random.randn(1000,2)*1000)
np.max(x)
prod1=np.dtype('int32').type(1)
prod2=np.dtype('int32').type(1)
k=0
for i,j in x:
    prod1*=i
    prod2*=j
    k+=1
    print(k," ",prod1,prod2)
1.797693134e308 is the maximum value (to this many digits) that my NumPy scalar was able to hold.
If you run this you will see that NumPy can handle quite large values, but although you said your maximum value is around 700, my scalar overflowed even with only 1000 values.
As for how to fix this: rather than doing it manually, the answer using SciPy seems more viable and does produce a result, so I suggest you go with that.
from scipy.stats.mstats import gmean
x=np.abs(np.random.randn(1000,2)*1000)
print(gmean(x,axis=0))
You can achieve what you want with the following command in numpy:
import numpy as np
product_0, product_1 = np.prod(arr.astype(np.float64), axis=0)
It can still reach np.inf if your numbers are large enough, but that can happen for any type.
I tried to solve a PDE numerically, and in the course of this I ran into a triple-nested for loop over the three spatial dimensions. This construct is nested in another time loop, so you can imagine that the computation takes forever for sufficiently large node numbers. The code block looks like this:
for jy in range(0,cy-1):
    for jx in range(0,cx-1):
        for jz in range(0,cz-1):
            T[n+1,jx,jy,jz] = (T[n,jx,jy,jz]
                + s*(T[n,jx-1,jy,jz] - 2*T[n,jx,jy,jz] + T[n,jx+1,jy,jz])
                + s*(T[n,jx,jy-1,jz] - 2*T[n,jx,jy,jz] + T[n,jx,jy+1,jz])
                + s*(T[n,jx,jy,jz-1] - 2*T[n,jx,jy,jz] + T[n,jx,jy,jz+1]))
It might look intimidating at first, but it is quite simple. I have a 3-dimensional matrix representing a solid bulk material, where each entry is the current temperature at that point. The next temperature at each point is calculated from the point itself and the points directly next to it, so 6 neighbours in total. In the case of a 1-dimensional solid the solution is just a simple matrix multiplication. Is there any chance to represent the 3-loop system above as a simple matrix solution like in the 1D case?
Best regards!
With numpy you can easily do these kinds of matrix operations,
e.g. for a 3x3x3 array
import numpy as np
T = np.random.random((3,3,3))
T = T*T - 2*T ... etc.
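One way the three inner loops from the question could be replaced by whole-array operations is with shifted slices. This is only a sketch: the function name step is made up, and it assumes the boundary values stay fixed and only interior points are updated, whereas the loop in the question starts at index 0 and therefore wraps around through index -1.

```python
import numpy as np

def step(T_n, s):
    """Advance one time step on the interior points (hypothetical helper).

    Boundary values are copied unchanged (an assumption of this sketch).
    """
    T_next = T_n.copy()
    c = T_n[1:-1, 1:-1, 1:-1]  # all interior points at once
    T_next[1:-1, 1:-1, 1:-1] = c + s * (
        T_n[:-2, 1:-1, 1:-1] + T_n[2:, 1:-1, 1:-1]    # x neighbours
        + T_n[1:-1, :-2, 1:-1] + T_n[1:-1, 2:, 1:-1]  # y neighbours
        + T_n[1:-1, 1:-1, :-2] + T_n[1:-1, 1:-1, 2:]  # z neighbours
        - 6.0 * c
    )
    return T_next

# Tiny demo: a single hot point in the middle of a cold block.
T0 = np.zeros((5, 5, 5))
T0[2, 2, 2] = 1.0
T1 = step(T0, 0.1)
print(T1[2, 2, 2], T1[1, 2, 2])
```

Inside the time loop this would be used as T[n+1] = step(T[n], s), which leaves only the loop over n in Python.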
First off, you need to be a bit more careful with your terminology. A "matrix" is a 2-dimensional array of numbers, so you are really talking about an array. NumPy (which SciPy builds on) has a data type called an ndarray. You need to be very careful manipulating them, because although they are sometimes used to represent matrices, there are operations that can be performed on 2-D arrays that are not mathematically legal for matrices.
I strongly recommend you use @ and not * to perform multiplication of 1- or 2-D matrices, and be sure to add code to check that the operations you are doing are legal mathematically. As a trivial example, NumPy lets you add a 1 x n or an n x 1 vector to an n x n matrix, even though that is not mathematically correct. The reason it allows it is, as intimated above, because there is no true matrix type in Python.
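A short illustration of these points, with made-up 2x2 values: * multiplies element by element, @ performs the matrix product, and adding a length-n vector to an n x n array is silently accepted.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
row = np.array([10.0, 20.0])

elementwise = A * B      # element-by-element product
matrix_product = A @ B   # linear-algebra matrix product

# Not a legal matrix operation, but NumPy adds `row` to every row of A.
broadcast_sum = A + row

print(elementwise)
print(matrix_product)
print(broadcast_sum)
```

An explicit shape check (for example, asserting that two operands have the same shape before adding them) is one way to guard against the last case.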
It very well may be that you can reformulate your problem to use a 3-D array, and by experimentation find the particular operation you are trying to perform. Just keep in mind that the rules of linear algebra are only casually applied in Python.
I have a dataset on which I'm trying to apply some arithmetical method.
The thing is, it gives me relatively large numbers, and when I do it with numpy, they're stored as 0.
The weird thing is, when I compute the numbers separately, they have an int value; they only become zeros when I compute them using numpy.
x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]:36298069767006890
vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])
I have of course checked their types:
type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]:True
How can I compute these large numbers using numpy without seeing them turned into zeros?
Note that if we remove the factor 10 at the beginning the problem changes slightly (but I think the reason might be similar):
x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]:311075541538526549
vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
-224960293581823801, -368934881474191033])
The negative numbers indicate that the largest number of the int64 type has been exceeded, don't they?
As Nils Werner already mentioned, numpy's native C types cannot store numbers that large, but python itself can, since its int objects use an arbitrary-length implementation.
So what you can do is tell numpy not to convert the numbers to C types but to use the python objects instead. This will be slower, but it will work.
In [14]: x = np.array([18,30,31,31,15], dtype=object)
In [15]: 150**x
Out[15]:
array([1477891880035400390625000000000000000000L,
191751059232884086668491363525390625000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
28762658884932613000273704528808593750000000000000000000000000000000L,
437893890380859375000000000000000L], dtype=object)
In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations they won't be performed on the numpy array but on the objects behind the references.
I think you're still able to use most of the numpy functions with this workaround but they will definitely be a lot slower than usual.
But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.
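Applied to the full expression from the question, the object-dtype workaround could look like the sketch below. It uses floor division (//) to stay in exact integers, which is an assumption about what is wanted; plain / on Python ints returns floats in Python 3.

```python
import numpy as np

x = np.array([18, 30, 31, 31, 15], dtype=object)

# Every element is a Python int, so the arithmetic is done with
# arbitrary precision, element by element.
powers = 150 ** x
vector = 10 * powers // x

print(vector.dtype)
print(vector[0])
```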
Just for completeness, if precision is not an issue, you can also use floats:
In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)
In [20]: 150**x
Out[20]:
array([ 1.47789188e+39, 1.91751059e+65, 2.87626589e+67,
2.87626589e+67, 4.37893890e+32])
150 ** 28 is way beyond what an int64 variable can represent (it's in the ballpark of 8e60 while the maximum possible value of an unsigned int64 is roughly 18e18).
Python may be using an arbitrary length integer implementation, but NumPy doesn't.
As you deduced correctly, negative numbers are a symptom of an int overflow.