I'm using NumPy's loadtxt function to read in a large set of data, and the data appears to be rounded off. For example, the number in the text file is -3.79000000000005E+01, but NumPy reads it in as -37.9. I've set the dtype to np.float64 in the loadtxt call. Is there any way to keep the precision of the original data file?
loadtxt is not rounding the number. What you are seeing is the way NumPy chooses to print the array:
In [80]: import numpy as np
In [81]: x = np.loadtxt('test.dat', dtype = np.float64)
In [82]: print(x)
-37.9
The actual value stored is the np.float64 closest to the value in the file:
In [83]: x
Out[83]: array(-37.9000000000005)
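As a quick check (a sketch, reusing the x from above): pulling the 0-d array out as a Python float shows the full stored value, because Python prints the shortest string that round-trips exactly:
print(x.item())  # -37.9000000000005 -- the precision was never lost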
Or, in the more likely instance that you have a higher dimensional array,
In [2]: x = np.loadtxt('test.dat', dtype = np.float64)
If the repr of x looks truncated:
In [3]: x
Out[3]: array([-37.9, -37.9])
you can use np.set_printoptions to get higher precision:
In [4]: np.get_printoptions()
Out[4]:
{'edgeitems': 3,
'infstr': 'inf',
'linewidth': 75,
'nanstr': 'nan',
'precision': 8,
'suppress': False,
'threshold': 1000}
In [5]: np.set_printoptions(precision = 17)
In [6]: x
Out[6]: array([-37.90000000000050306, -37.90000000000050306])
(Thanks to @mgilson for pointing this out.)
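If you only need the extra digits temporarily, NumPy 1.15+ also provides printoptions as a context manager, so the global print settings are restored on exit (a sketch):
import numpy as np
# precision is raised only inside the with-block
with np.printoptions(precision=17):
    print(x)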
Related
I want the values of the np.mean function to be roughly the same before and after the dtype change. The dtype has to remain float32.
array = np.random.randint(0, high=255, size=(3, 12000, 12000), dtype="int")
array = array[:, 500:10000, 500:10000]
array = array.reshape((-1, 3))
# array.shape is now (90250000, 3)
print(np.mean(array, axis=0), array.dtype)  # Nr.1
array = array.astype("float32")
print(np.mean(array, axis=0), array.dtype)  # Nr.2
Results of the two print functions:
[127.003107 127.00156286 126.99015613] int32
[47.589664 47.589664 47.589664] float32
Adding a .copy() to the view line has no effect. The size of the view affects how far off the float mean is: changing both of the last two slices to [500:8000] results in:
[76.35497 76.35497 76.35497] float32
Around [500:5000] and below, both means are actually about the same.
Changing the code starting from the reshape line:
array = array.reshape((-1, 3))
array_float = array.astype("float32")
print(np.all(array_float == array), array.dtype, array_float.dtype)
Results in:
True int32 float32
So if the values are the same, why are the results from np.mean different ?
Your array:
In [50]: arr.shape, arr.dtype
Out[50]: ((90250000, 3), dtype('int32'))
You could have gotten the same thing with np.random.randint(0, high=255, size=(90250000, 3), dtype="int"); in fact we don't need that size-3 dimension. Anyway, it's just many numbers in the [0, 255) range.
The expected mean:
In [51]: np.mean(arr, axis=0)
Out[51]: array([126.9822936 , 126.99682718, 126.99214526])
But notice what we get if we just sum those numbers:
In [52]: np.sum(arr, axis=0)
Out[52]: array([-1424749891, -1423438235, -1423860778])
The int32 sum has overflowed and wrapped around; there are too many numbers for a 32-bit accumulator. So mean must be doing something more sophisticated than simply summing and dividing by the count.
Taking the mean of the float32 version gives the funny values:
In [53]: np.mean(arr.astype('float32'), axis=0)
Out[53]: array([47.589664, 47.589664, 47.589664], dtype=float32)
while float64 matches the int case (though the conversion takes a while):
In [54]: np.mean(arr.astype('float64'), axis=0)
Out[54]: array([126.9822936 , 126.99682718, 126.99214526])
It looks like the float mean is just doing the sum-and-divide method:
In [56]: np.sum(arr.astype('float64'), axis=0)
Out[56]: array([1.14601520e+10, 1.14614637e+10, 1.14610411e+10])
In [57]: np.sum(arr.astype('float32'), axis=0)
Out[57]: array([4.2949673e+09, 4.2949673e+09, 4.2949673e+09], dtype=float32)
In [58]: Out[56]/arr.shape[0]
Out[58]: array([126.9822936 , 126.99682718, 126.99214526])
In [59]: Out[57]/arr.shape[0]
Out[59]: array([47.58966533, 47.58966533, 47.58966533])
While the sum is within the range of float32:
In [60]: np.finfo('float32')
Out[60]: finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)
it does not have enough precision: float32 carries only 24 significant bits, so once the running total is large, adding one more small element no longer changes it.
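A minimal demonstration of this absorption effect (my own illustration, not from the original post; note that the printed float32 sum, 4.2949673e+09, is 2**32):
import numpy as np

s = np.float32(2**32)            # the value the float32 running sum gets stuck at
print(np.spacing(s))             # 512.0 -- gap between adjacent float32 values here
print(s + np.float32(255) == s)  # True -- any addend below 256 rounds back to s
print(2**32 / 90250000)          # ~47.5897, which is exactly the "funny" mean above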
Note that the Python builtin sum has problems with the int version:
In [70]: sum(arr[:,0])
C:\Users\paul\AppData\Local\Temp\ipykernel_1128\1456076714.py:1: RuntimeWarning: overflow encountered in long_scalars
sum(arr[:,0])
Out[70]: -1424749891
The standard library's math.fsum handles large sums better:
In [71]: math.fsum(arr[:,0])
Out[71]: 11460151997.0
Sum on the long ints also works fine:
In [72]: np.sum(arr.astype('int64'),axis=0)
Out[72]: array([11460151997, 11461463653, 11461041110], dtype=int64)
From the np.mean docs:
dtype : data-type, optional
Type to use in computing the mean. For integer inputs, the default
is `float64`; for floating point inputs, it is the same as the
input dtype.
Notes
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below). Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.
Playing with the dtype parameter:
In [74]: np.mean(arr, axis=0, dtype='int32')
Out[74]: array([-15, -15, -15])
In [75]: np.mean(arr, axis=0, dtype='int64')
Out[75]: array([126, 126, 126], dtype=int64)
In [76]: np.mean(arr, axis=0, dtype='float32')
Out[76]: array([47.589664, 47.589664, 47.589664], dtype=float32)
In [77]: np.mean(arr, axis=0, dtype='float64')
Out[77]: array([126.9822936 , 126.99682718, 126.99214526])
The -15 is explained by:
In [78]: -1424749891/arr.shape[0]
Out[78]: -15.786702393351801
In sum, if you want accurate results you need to use float64, either via the default mean dtype or an appropriate astype. Working with float32 can cause problems, especially with this many elements.
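Note that you don't have to convert the stored data: mean accepts the accumulator dtype directly, so the array can stay float32 as the question requires (a sketch; arr32 is just an illustrative name):
arr32 = arr.astype("float32")                    # storage stays float32
print(np.mean(arr32, axis=0, dtype=np.float64))  # accumulates in float64: accurate
print(arr32.dtype)                               # float32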
Changing to "float64" solves the problem.
array = np.random.randint(0, high=255, size=(3, 12000, 12000), dtype="int")
array = array[:, 500:10000, 500:10000]
array = array.reshape((-1, 3))
# array.shape is now (90250000, 3)
print(array.mean(axis=0), array.dtype)  # Nr.1
array = array.astype("float64")
print(array.mean(axis=0), array.dtype)  # Nr.2
Results in:
[126.98418438 126.9969912 127.00242922] int32
[126.98418438 126.9969912 127.00242922] float64
I have a pandas.Series in which every element is a numpy.array.
For example:
p = pandas.Series([numpy.array([1,2]), numpy.array([2,4])])
I'm trying to convert the whole Series into a two-dimensional (2, 2) numpy.array. For that I use the values attribute of the Series, but it returns a one-dimensional numpy array in which each element is itself a numpy.array, with dtype object:
In [18]: p = pandas.Series([numpy.array([1,2]), numpy.array([2,4])])
In [19]: p.values
Out[19]: array([array([1, 2]), array([2, 4])], dtype=object)
The result I would like to achieve is the same as if the data had been created as a numpy array in the first place:
In [23]: a = numpy.array([numpy.array([1,2]), numpy.array([2,4])])
In [24]: a
Out[24]:
array([[1, 2],
[2, 4]])
In [25]: a.shape
Out[25]: (2, 2)
Does anyone have an idea of how to make such a conversion? The to_numpy method doesn't work for me either.
I would suggest the following conversion:
import numpy as np
import pandas as pd
p = pd.Series([np.array([1,2]), np.array([2,4])])
np.array(p.values.tolist()).shape
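Alternatively (a sketch, assuming all the element arrays have the same length), np.stack builds the 2-D array directly from the object array, without the round-trip through a Python list:
import numpy as np
import pandas as pd

p = pd.Series([np.array([1, 2]), np.array([2, 4])])
a = np.stack(p.values)  # stacks the row arrays along a new first axis
print(a.shape)          # (2, 2)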
I have a 1024^3 data set to read, but the array is too large, so I want to keep only every 2^n-th point. For example, if I set skip = 2, the array becomes 512^3.
import numpy as np
nx = 1024
filename = 'E:/AUTOIGNITION/Z1/Velocity1_inertHIT.bin'
U = np.fromfile(filename, dtype=np.float32, count=nx**3).reshape(nx, nx, nx)
How do I do this by using reshape?
If I understand correctly, you want to sample every 2^n-th value of the array, in each of the 3 dimensions.
You can do this by indexing as follows.
>>> import numpy as np
>>> n = 1
>>> example_array = np.zeros((1024, 1024, 1024))
>>> N = 2**n  # note: ** is exponentiation; ^ is bitwise XOR in Python
>>> example_array_sampled = example_array[::N, ::N, ::N]
>>> example_array_sampled.shape
(512, 512, 512)
Verifying that it actually samples evenly:
>>> np.arange(64).reshape((4,4,4))[::2,::2,::2]
array([[[ 0, 2],
[ 8, 10]],
[[32, 34],
[40, 42]]])
This particular code still assumes that you will read in the whole array in memory before indexing into it. And the variable example_array_sampled will be a view into the original array.
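If the file is too large to read at once, a memory-mapped array applies the same indexing without loading the whole file first (a sketch using the question's filename and nx; np.array then copies only the sampled values into memory):
import numpy as np

nx, n = 1024, 1
N = 2**n
filename = 'E:/AUTOIGNITION/Z1/Velocity1_inertHIT.bin'

# map the file instead of reading it; no data is loaded yet
U = np.memmap(filename, dtype=np.float32, mode='r', shape=(nx, nx, nx))
U_sampled = np.array(U[::N, ::N, ::N])  # materialize the sampled points
print(U_sampled.shape)  # (512, 512, 512)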
More info on indexing numpy arrays: https://numpy.org/doc/stable/reference/arrays.indexing.html
Some info on numpy views: https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
Related question (in one dimension): subsampling every nth entry in a numpy array
I am trying to round a numpy array that is output by a Keras model prediction. However, after executing numpy.round/numpy.around, there is no change.
The end goal is for each value to be rounded down to 0 if it is at or below 0.50, or rounded up to 1 if it is above 0.50.
The code is here:
from keras.models import load_model
import numpy
model = load_model('tried.h5')
data = numpy.loadtxt("AppData\Roaming\MetaQuotes\Terminal\94DDB309C90B408373EFC53AC730F336\MQL4\Files\indicatorout.csv", delimiter=",")
data = numpy.array([data])
print(data)
outdata = model.predict(data)
print(outdata)
numpy.around(outdata, 0)
print(outdata)
numpy.savetxt("AppData\Roaming\MetaQuotes\Terminal\94DDB309C90B408373EFC53AC730F336\MQL4\Files\modelout.txt", outdata)
The logs are also here:
Using TensorFlow backend.
[[1.19539070e+01 1.72686310e+01 2.24426384e+01 1.82771435e+01
2.23788052e+01 1.62105408e+01 1.44595184e+01 1.90179043e+01
1.71749554e+01 1.69194088e+01 1.89911938e+01 1.76701393e+01
5.19613740e-01 5.38522415e+01 9.64037247e+01 1.73570000e-04
4.35710000e-04 9.55710000e-04]]
[[0.4215713]]
[[0.4215713]]
Any help would be greatly appreciated, thank you.
I assume that you want the elements of the array rounded to some n decimal places. Below is an illustration of doing so:
# sample array to work with
In [21]: arr = np.random.randn(4)
In [22]: arr
Out[22]: array([-0.94817409, -1.61453252, 0.16566428, -0.53507549])
# round to 3 decimal places; note that `arr` is still unaffected.
In [23]: arr.round(decimals=3)
Out[23]: array([-0.948, -1.615, 0.166, -0.535])
# if you want to round it to nearest integer
In [24]: arr_rint = np.rint(arr)
In [25]: arr_rint
Out[25]: array([-1., -2., 0., -1.])
To make the decimal rounding work in place, specify the out= argument:
In [26]: arr.round(decimals=3, out=arr)
Out[26]: array([-0.948, -1.615, 0.166, -0.535])
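Finally, note why the question's code saw no change: numpy.around returns a new array, and that result was discarded rather than assigned back to outdata. A minimal fix, reusing the question's variable names (a sketch):
# rebind the name to the rounded result
outdata = numpy.around(outdata, 0)

# or, to implement the stated cutoff directly (> 0.50 becomes 1, else 0):
outdata = (outdata > 0.5).astype(numpy.float32)
As a side note, np.around rounds halves to the nearest even value, so 0.5 rounds to 0.0, which happens to match the "at or below 0.50 goes to 0" requirement here.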
Simple rounding of a floating point numpy array seems not to be working for some reason.
I get a numpy array from reading a huge img (shape (7352, 7472)). Example values:
>>> imarray[3500:3503, 5000:5003]
array([[ 73.33999634, 73.40000153, 73.45999908],
[ 73.30999756, 73.37999725, 73.43000031],
[ 73.30000305, 73.36000061, 73.41000366]], dtype=float32)
And for rounding I've just been trying numpy.around() on the raw values, also writing the values to a new array, a copy of the raw array, but for some reason it has no effect:
arr = imarray
numpy.around(imarray, decimals=3, out=arr)
arr[3500, 5000]  # results in 73.3399963379, same when accessing imarray
So, even higher precision!
Is that because the array is so big?
I need to round it to get the most frequent value (mode), and I'm looking for a way to avoid pulling in more and more libraries.
Your array has dtype float32. That is a 4-byte float.
The closest float to 73.340 representable using float32 is roughly 73.33999634:
In [62]: x = np.array([73.33999634, 73.340], dtype = np.float32)
In [63]: x
Out[63]: array([ 73.33999634, 73.33999634], dtype=float32)
So I think np.around is rounding correctly; it is just that your dtype's granularity is too coarse to represent the number you might be expecting.
In [60]: y = np.around(x, decimals = 3)
In [61]: y
Out[61]: array([ 73.33999634, 73.33999634], dtype=float32)
Whereas, if the dtype were np.float64:
In [64]: x = np.array([73.33999634, 73.340], dtype = np.float64)
In [65]: y = np.around(x, decimals = 3)
In [66]: y
Out[66]: array([ 73.34, 73.34])
Note that even though printed representation for y shows 73.34, it is not necessarily true that the real number 73.34 is exactly representable as a float64 either. The float64 representation is probably just so close to 73.34 that NumPy chooses to print it as 73.34.
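A quick way to see this exactly (a sketch, using only the standard library): Decimal(float) converts a float with no rounding, so it prints the true binary value behind the float64 literal:
from decimal import Decimal

# Decimal(float) is exact: this shows the full stored value of the
# float64 literal 73.34 -- close to, but not exactly, 73.34
print(Decimal(73.34))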
The answer by @unutbu is absolutely correct. NumPy is rounding it as close to the number as the dtype's precision allows. The only thing I have to add is that you can use numpy.set_printoptions to change how the array is displayed:
>>> import numpy as np
>>> x = np.array([73.33999634, 73.340], dtype = np.float32)
>>> y = np.round(x, decimals = 3)
>>> y
array([ 73.33999634, 73.33999634], dtype=float32)
>>> np.set_printoptions(precision=3)
>>> y
array([ 73.34, 73.34], dtype=float32)