Numpy array being rounded? Subtraction of small floats in Python

I am assigning elements of a numpy array to the difference of two small-valued Python floats. When I do this and try to verify the results by printing to the command line, the array is reported as all zeros. Here is my code:
import numpy as np
np.set_printoptions(precision=20)
pc1x = float(-0.438765)
pc2x = float(-0.394747)
v1 = np.array([0,0,0])
v1[0] = pc1x-pc2x
print pc1x
print pc2x
print v1
The output looks like this:
-0.438765
-0.394747
[0 0 0]
I expected this for v1:
[-0.044018 0 0]
I am new to numpy; I admit this may be an obvious misunderstanding of how numpy and float work. I thought that changing the numpy print options would fix it, but no luck. Any help is great! Thanks!

You're declaring the array with v1 = np.array([0,0,0]), from which numpy infers that you want an int array. Any subsequent assignment into it keeps that int dtype, so your small float is cast back to int on assignment (resulting in all zeros). Declare it with
v1 = np.array([0,0,0],dtype=float)
There's a whole wealth of numpy-specific and platform-specific datatypes, detailed on the dtype docs page.
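For example, here is a small sketch of the truncation behavior (variable names hypothetical):
import numpy as np
v1 = np.array([0, 0, 0])
print(v1.dtype)         # int64 (platform-dependent)
v1[0] = -0.044018       # float is truncated toward zero on assignment
print(v1)               # [0 0 0]
v2 = v1.astype(float)   # astype returns a new float64 copy
v2[0] = -0.044018
print(v2)               # now keeps the fractional value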

You are creating the array with an integer datatype (since you don't specify one, NumPy infers the type from the initial data you gave it). Make it a float:
>>> v1 = np.array([0,0,0], dtype=float)
>>> v1[0] = pc1x-pc2x
>>> print v1
[-0.04401800000000000157 0. 0. ]
Or change the incoming datatype:
>>> v1 = np.array([0.0, 0.0, 0.0])
>>> v1[0] = pc1x-pc2x
>>> print v1
[-0.04401800000000000157 0. 0. ]
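As a side note, np.zeros (and np.ones) default to float64, so they avoid the problem without spelling out a dtype:
>>> v1 = np.zeros(3)
>>> v1.dtype
dtype('float64')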

Time Series Clustering of Numpy Objects

Any idea or suggestion would be appreciated! I have several numpy objects of the same style (u1, u2, u3, ...), each of which looks like:
Object 1:
[[Timestamp('2004-02-28 00:59:16'), 19.9884],
[Timestamp('2004-02-28 01:03:16'), 19.3024],
...
[Timestamp('2004-02-28 01:06:16'), 19.1652]]
Object 2:
[[Timestamp('2004-02-28 01:08:17'), 19.567],
[Timestamp('2004-02-28 01:10:16'), 19.5376],
...
[Timestamp('2004-02-28 01:26:47'), 19.4788]]
I would like to find which of these objects have the same "trends" in the time series by clustering them. I tried several ways, including:
from sklearn.neighbors import NearestNeighbors
X = np.array([u1, u2, u3])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)
print(distances)
Some of my errors:
TypeError: float() argument must be a string or a number, not 'Timestamp'
ValueError: setting an array element with a sequence.
TypeError: only size-1 arrays can be converted to Python scalars
Conclusion
Can someone at least give me a suggestion on what I should do? Thanks!
(1) Your first error means that each Timestamp must be converted into a string or a number. Convert them to numbers via .value, which gives nanoseconds since the Unix epoch (1970-01-01). For lists:
u1 = list(map(lambda el: (el[0].value / 1e9, el[1]), u1))
u2 = list(map(lambda el: (el[0].value / 1e9, el[1]), u2))
...
(2) np.array([u1, u2, u3]) produces a 3D array instead of the usually expected 2D one. This is likely the cause of the second error (a number was expected but a sequence was found, because of the redundant dimension). Replace it with one of the following:
X = np.array(u1 + u2 + ...) # for lists
X = pd.concat([u1, u2, ...], axis=0) # for dataframes
With these changes the code runs. Output using your sample data:
[[ 0. 240.00098041]
[ 0. 180.00005229]
[ 0. 121.00066712]
[ 0. 119.00000363]
[ 0. 119.00000363]
[ 0. 991.00000174]]
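Putting both fixes together, here is a minimal runnable sketch (the u1/u2 rows are stand-ins for your data):
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

u1 = [[pd.Timestamp('2004-02-28 00:59:16'), 19.9884],
      [pd.Timestamp('2004-02-28 01:03:16'), 19.3024]]
u2 = [[pd.Timestamp('2004-02-28 01:08:17'), 19.567],
      [pd.Timestamp('2004-02-28 01:10:16'), 19.5376]]

# (1) Timestamp -> seconds since the Unix epoch
u1 = [(el[0].value / 1e9, el[1]) for el in u1]
u2 = [(el[0].value / 1e9, el[1]) for el in u2]

# (2) one 2D array of shape (n_samples, 2), not a 3D stack
X = np.array(u1 + u2)

nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)
print(distances)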

How to multiply a number with a negative power in Python

When I try to raise this to a negative power it just returns an error. I use:
A = np.array([[1,2,0], [2,4,-2], [0,-2,3]])
From the screenshot, I can see this is homework, and it asks for the matrix inverse. In maths this is written as A^(-1):
import numpy as np
A = np.array([[1,2,0], [2,4,-2], [0,-2,3]])
np.linalg.inv(A)
array([[-2. , 1.5 , 1. ],
[ 1.5 , -0.75, -0.5 ],
[ 1. , -0.5 , 0. ]])
In numpy, you cannot raise integers to negative integer powers (read this).
In plain Python, the ** operator handles a negative power without any error:
In [6]: A = 20
In [7]: print(A ** -1)
0.05
You can also use pow(),
In [1]: A = 20
In [2]: pow(20, -1)
Out[2]: 0.05
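For completeness, a quick sketch of the numpy restriction itself (the array values are arbitrary):
import numpy as np
a = np.array([1, 2, 4])       # integer dtype
# a ** -1 raises: ValueError: Integers to negative integer powers are not allowed.
print(a.astype(float) ** -1)  # [1.   0.5  0.25] -- element-wise, not the matrix inverse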
If you're working with matrices, it's a good idea to ensure that they are instances of the numpy.matrix type rather than the more-generic numpy.ndarray.
import numpy as np
M = np.matrix([[ ... ]])
To convert an existing generic array to a matrix you can also pass it into np.asmatrix().
Once you have a matrix instance M, one way to get the inverse is M.I
To avoid the "integers not allowed" problem, ensure that the dtype of your matrix is floating-point, not integer (specify dtype=float in the call to matrix() or asmatrix()).
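For instance, a short sketch using the question's matrix (note that numpy.matrix is deprecated in recent NumPy releases, so np.linalg.inv on a plain ndarray is the more future-proof route):
import numpy as np
M = np.asmatrix(np.array([[1, 2, 0], [2, 4, -2], [0, -2, 3]], dtype=float))
print(M.I)  # same result as np.linalg.inv(A) above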
To raise a number to a negative power, you can assign the exponent to another variable and use it directly. (Avoid naming the variable pow, as that shadows Python's built-in pow() function.)
Now put the below in your code:
exponent = -3
value = 5**exponent
print(value)  # 0.008
Execute the code and you will see the result.
Hope it helps... 🤗🤗🤗

Why do I keep getting this error 'RuntimeWarning: overflow encountered in int_scalars'

I am trying to multiply all the row values and column values of a 2 dimensional numpy array with an explicit for-loop:
product_0 = 1
product_1 = 1
for x in arr:
    product_0 *= x[0]
    product_1 *= x[1]
I realize the product will blow up to an extremely large number, but from my previous experience Python has had no problem dealing with extremely large numbers.
So from what I can tell this is a problem with numpy, except I am not storing the gigantic product in a numpy array or any numpy data type for that matter; it's just a normal Python variable.
Any idea how to fix this?
Using non-inplace multiplication hasn't helped: product_0 = x[0]*product_0
Python ints have arbitrary precision, so they cannot overflow. But numpy uses C under the hood, where integers have a fixed width; the largest signed 64-bit integer is 2^63 - 1. Your product is far beyond this value, being on average ((716-1)/2)^86507.
When you extract x[0] in the for loop, it is still a numpy scalar. To use the full power of Python integers you need to explicitly convert it to a Python int, like this:
product_0 = 1
product_1 = 1
for x in arr:
    t = int(x[0])
    product_0 = product_0 * t
and it will not overflow.
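Equivalently, on Python 3.8+ you can let math.prod drive the loop; a sketch assuming arr is the 2D array from the question:
import math
product_0 = math.prod(int(v) for v in arr[:, 0])
product_1 = math.prod(int(v) for v in arr[:, 1])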
Following your comment, which makes your question more specific: your original problem is to calculate the geometric mean of the array along each row/column. Here is a solution.
First, I generate an array that has the same properties as yours:
arr = np.resize(np.random.randint(1,716,86507*2 ),(86507,2))
Then, calculate the geometric mean for each column/row:
from scipy import stats
gm_0 = stats.mstats.gmean(arr, axis = 0)
gm_1 = stats.mstats.gmean(arr, axis = 1)
gm_0 will be an array that contains the geometric mean of the x and y columns; gm_1 instead contains the geometric mean of each row.
Hope this solves your problem!
You say
So from what I can tell this is a problem with numpy except I am not storing the gigantic product in a numpy array or any numpy data type for that matter its just a normal python variable.
Your product may not be a NumPy array, but it is using a NumPy data type. x[0] and x[1] are NumPy scalars, and multiplying a Python int by a NumPy scalar produces a NumPy scalar. NumPy integers have a finite range.
While you technically could call int on x[0] and x[1] to get a Python int, it'd probably be better to avoid needing such huge ints. You say you're trying to perform this multiplication to compute a geometric mean; in that case, it'd be better to compute the geometric mean by transforming to and from logarithms, or to use scipy.stats.mstats.gmean, which uses logarithms under the hood.
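For illustration, a minimal sketch of the logarithm route, assuming arr is the 2D positive-integer array from the question (logs require strictly positive values):
import numpy as np
log_means = np.log(arr).mean(axis=0)  # mean of the logs, column-wise
gm = np.exp(log_means)                # geometric mean; matches gmean(arr, axis=0)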
NumPy's integers are fixed-width (and on some builds, notably Windows, the default integer is 32-bit rather than 64-bit), so while Python can handle this, numpy will overflow at much smaller values; if you want wider integers in numpy, specify the dtype explicitly.
Edit
After some testing with
import numpy as np
x=np.abs(np.random.randn(1000,2)*1000)
np.max(x)
prod1=np.dtype('int32').type(1)
prod2=np.dtype('int32').type(1)
k=0
for i, j in x:
    prod1 *= i
    prod2 *= j
    k += 1
    print(k, " ", prod1, prod2)
1.797693134e308 is the maximum value my numpy scalar was able to take (this is the float64 maximum; the integer products are promoted to float64 once multiplied by the float elements).
If you run this you will see that numpy is able to handle quite a large value, but you said your max value is around 700, and even with 1000 values my scalar overflowed.
As for how to fix this, rather than doing it manually, the answer using scipy seems more viable and gets the result, so I suggest you go forward with that:
from scipy.stats.mstats import gmean
x=np.abs(np.random.randn(1000,2)*1000)
print(gmean(x,axis=0))
You can achieve what you want with the following command in numpy:
import numpy as np
product_0 = np.prod(arr.astype(np.float64))
It can still reach np.inf if your numbers are large enough, but that can happen with any floating-point type.

How to generate numbers between 0-1 in 4200 steps

I want to generate floating point numbers between 0 and 1 that are not random. I would like the range to consist of 4200 values, so in Python I did 1/4200 to get the step needed to go from 0 to 1 in 4200 steps. This gave me the value 0.0002380952380952381, which I confirmed by doing 0.0002380952380952381*4200 = 1 (in Python). I have tried:
y_axis = [0.1964457, 0.20904465, 0.22422191, 0.68414455, 0.5341106, 0.49412863]
x1 = [0.18536805, 0.22449078, 0.26378343, 0.73328144, 0.63372454, 0.60280087, 0.49412863]
y2_axis = [0.18536805, 0.22449078, 0.26378343, ..., 0.73328144, 0.63372454, 0.60280087, 0.49412863]
plt.plot(pl.frange(0,1,0.0002380952380952381), y_axis)
plt.plot(x1, y2_axis)
This returns: ValueError: x and y must have same first dimension, but have shapes (4201,) and (4200,)
I would like help with resolving this; otherwise, any other method that works would also be appreciated. I am sure other solutions are available and this may be long-winded. Thank you.
To generate the numbers, you can use a list comprehension (range(4201) gives 4201 values, so both endpoints 0 and 1 are included with a step of exactly 1/4200):
[i/4200 for i in range(4201)]
Numpy makes this really easy:
>>> import numpy as np
>>> np.linspace(0, 1, 4200)
array([ 0.00000000e+00, 2.38151941e-04, 4.76303882e-04, ...,
9.99523696e-01, 9.99761848e-01, 1.00000000e+00])
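One subtlety worth noting: np.linspace includes both endpoints by default, so 4200 points are spaced 1/4199 apart, not the 1/4200 you computed. If that exact step matters:
>>> np.linspace(0, 1, 4201)                  # 4201 points, step exactly 1/4200
>>> np.linspace(0, 1, 4200, endpoint=False)  # 4200 points, step 1/4200, excludes 1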

Why is numpy array's .tolist() creating long doubles?

I have some math operations that produce a numpy array of results with about 8 significant figures. When I use tolist() on my array y_axis, it creates what I assume are 32-bit numbers.
However, I wonder if the extra digits are just garbage. I assume they are, but the conversion seems intelligent enough to adjust the last digit so that rounding makes sense.
print "y_axis:",y_axis
y_axis = y_axis.tolist()
print "y_axis:",y_axis
y_axis: [-0.99636686 0.08357361 -0.01638707]
y_axis: [-0.9963668578012771, 0.08357361233570479, -0.01638706796138937]
So my question is: if this is not garbage, does using tolist actually improve the accuracy of my calculations, or is Python always using the entire number and just not displaying it?
When you call print y_axis on a numpy array, you are getting a truncated version of the numbers that numpy is actually storing internally. The way in which it is truncated depends on how numpy's printing options are set.
>>> arr = np.array([22/7., 1/13.]) # init array
>>> arr # np.array default printing
array([ 3.14285714, 0.07692308])
>>> arr[0] # scalar default printing
3.1428571428571428
>>> np.set_printoptions(precision=24) # increase np.array print "precision"
>>> arr # np.array high-"precision" print
array([ 3.142857142857142793701541, 0.076923076923076927347012])
>>> float.hex(arr[0]) # actual underlying representation
'0x1.9249249249249p+1'
The reason it looks like you're "gaining accuracy" when you print out the .tolist()ed form of y_axis is that by default, more digits are printed when you call print on a list than when you call print on a numpy array.
In actuality, the numbers stored internally by either a list or a numpy array should be identical (and should correspond to the last line above, generated with float.hex(arr[0])), since numpy uses numpy.float64 by default, and python float objects are also 64 bits by default.
My understanding is that numpy is not showing you the full precision so that the matrices lay out consistently. The list shouldn't have any more precision than its numpy.array counterpart:
>>> v = -0.9963668578012771
>>> a = numpy.array([v])
>>> a
array([-0.99636686])
>>> a.tolist()
[-0.9963668578012771]
>>> a[0] == v
True
>>> a.tolist()[0] == v
True
