Truncating decimal digits in a NumPy array of floats - Python

I want to truncate the float values within a numpy array, e.g.
2.34341232 --> 2.34
I read the post on truncating a floating point number, but it covers only a single float. I don't want to run a loop over the numpy array; that would be quite expensive. Is there a built-in numpy method that can do this easily? I need the output as a float, not a string.

Try out this modified version of numpy.trunc().
import numpy as np

def trunc(values, decs=0):
    return np.trunc(values * 10**decs) / (10**decs)
Sadly, the numpy.trunc function doesn't support truncating to a given number of decimal places. Luckily, multiplying the argument by a power of ten before truncating, then dividing the result by the same power, gives the expected results.
vec = np.array([-4.79, -0.38, -0.001, 0.011, 0.4444, 2.34341232, 6.999])
trunc(vec, decs=2)
which returns:
array([-4.79, -0.38, -0.  ,  0.01,  0.44,  2.34,  6.99])

Use numpy.round (note that this rounds rather than truncates, so e.g. 6.999 becomes 7.0 instead of 6.99):
import numpy as np
a = np.arange(4) ** np.pi
a
=> array([ 0. , 1. , 8.82497783, 31.5442807 ])
a.round(decimals=2)
=> array([ 0. , 1. , 8.82, 31.54])

Related

Round a numpy array to .5 or .0 only

I have to round every element inside a numpy array to either .5 or .0. I know the np.round() method, however it is not useful for this specific task, since it only lets me round to a whole number of decimal places.
Here there is an example of what I should do:
x = np.array([2.99845, 4.51845, 0.33365, 0.22501, 2.48523])
x_rounded = some_function(x)
>>> x_rounded
array([3.0, 4.5, 0.5, 0.0, 2.5])
Is there a built-in method to do so, or do I have to create it?
If I have to create that method myself, is there an efficient way? I'm working on a big dataset, so I would like to avoid iterating over each element.
import numpy as np
x = np.array([2.99845, 4.51845, 0.33365, 0.22501, 2.48523])
np.round(2 * x) / 2
Output:
array([3. , 4.5, 0.5, 0. , 2.5])
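This works by scaling by 2, rounding to the nearest integer, and scaling back. More generally, the same scale-round-unscale trick works for any fraction 1/n; here is a small sketch (round_to_fraction is just an illustrative name):
import numpy as np

def round_to_fraction(x, n):
    # Round each element to the nearest multiple of 1/n.
    return np.round(x * n) / n

x = np.array([2.99845, 4.51845, 0.33365, 0.22501, 2.48523])
round_to_fraction(x, 2)  # -> array([3. , 4.5, 0.5, 0. , 2.5])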

Markov Clustering in Python

As the title says, I'm trying to get a Markov Clustering Algorithm to work in Python, namely Python 3.7.
Unfortunately, it's not doing much of anything, and it's driving me up the wall trying to fix it.
EDIT: First, I've adjusted the main code so that each column sums to 1, even if it's not perfectly balanced. I'm going to try to account for that in the final answer.
To be clear, the biggest problem is that the numbers spiral out of control, into such easily-understandable values as 5.56268465e-309, and I don't know how to turn that into something understandable.
Here's the code so far:
import numpy as np
import math

## How far you'd like your random-walkers to go (bigger number -> more walking)
EXPANSION_POWER = 2
## How tightly clustered you'd like your final picture to be (bigger number -> more clusters)
INFLATION_POWER = 2
ITERATION_COUNT = 10

def normalize(matrix):
    return matrix / np.sum(matrix, axis=0)

def expand(matrix, power):
    return np.linalg.matrix_power(matrix, power)

def inflate(matrix, power):
    for entry in np.nditer(transition_matrix, op_flags=['readwrite']):
        entry[...] = math.pow(entry, power)
    return matrix

def run(matrix):
    #np.fill_diagonal(matrix, 1)
    #print(matrix)
    matrix = normalize(matrix)
    print(matrix)
    for _ in range(ITERATION_COUNT):
        matrix = normalize(inflate(expand(matrix, EXPANSION_POWER), INFLATION_POWER))
    return matrix
transition_matrix = np.array ([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0.5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0.33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0.33,0,0,0.5,0,0,0,0,0,0,0,0,0,0.125,1],
[0,0,0,0.33,0,0,0.5,1,1,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.166,0,0,0,0,0.2,0,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,0,0,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,0,0,0,0,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0.5,0,1,0,0,0.125,0],
[0,0,0,0,0.167,0,0,0,0,0.2,0.25,0,1,0,1,0,0.125,0],
[0,0,0,0,0,0.34,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0],
[0,0,0,0,0,0.33,0,0,0,0,0,0,0,0,0,0.5,0,0]])
run(transition_matrix)
print(transition_matrix)
This is part of a uni assignment - I need to do this array both weighted and unweighted (though the weighted part can just wait until I've got the bloody thing working at all). Any tips or suggestions?
Your transition matrix is not valid.
>>> transition_matrix.sum(axis=0)
matrix([[1.  , 1.  , 0.99, 0.99, 0.96, 0.99, 1.  , 1.  , 0.  , 1.  ,
         1.  , 1.  , 1.  , 0.  , 0.  , 1.  , 0.88, 1.  ]])
Not only do some of your columns not sum to 1, some of them sum to 0.
This means when you try to normalize your matrix, you will end up with nan because you are dividing by 0.
Lastly, is there a reason you are using a NumPy matrix instead of a plain NumPy array, which is the recommended container for such data? Using NumPy arrays simplifies some of the operations, such as raising each entry to a power. There are also behavioural differences between NumPy matrix and NumPy array that can result in subtle bugs.
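To illustrate both points, here is a minimal sketch, assuming that zero-sum columns should simply be left unnormalized (one possible convention; your assignment may dictate another):
import numpy as np

def normalize(matrix):
    col_sums = matrix.sum(axis=0)
    safe = np.where(col_sums == 0, 1.0, col_sums)  # avoid dividing by zero
    return matrix / safe

def inflate(matrix, power):
    # With a NumPy array, ** is applied elementwise, so no nditer loop is needed.
    return matrix ** power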

Cancellation in a numpy array operation involving a scalar

I'm using NumPy version 1.7.1.
Now I came across a strange cancellation I don't understand:
>>> import numpy as np
>>> a = np.array([ 883, 931, 874], dtype=np.float32)
Mathematically, a+0.1-a should be 0.1.
Now let's calculate the value of this expression along with its absolute and relative error:
>>> a+0.1-a
array([ 0.09997559, 0.09997559, 0.09997559], dtype=float32)
>>> (a+0.1-a)-0.1
array([ -2.44155526e-05, -2.44155526e-05, -2.44155526e-05], dtype=float32)
>>> ((a+0.1-a)-0.1) / 0.1
array([-0.00024416, -0.00024416, -0.00024416], dtype=float32)
First question: this is quite a high absolute and relative error; this is just catastrophic cancellation, isn't it?
Second question: When I use an array instead of the scalar, NumPy is able to calculate with much more precision, see the relative error:
>>> a+np.array((0.1,)*3)-a
array([ 0.1, 0.1, 0.1])
>>> (a+np.array((0.1,)*3)-a)-0.1
array([ 2.27318164e-14, 2.27318164e-14, 2.27318164e-14])
This is just the numerical representation of 0.1 I guess.
But why is NumPy not able to handle this the same way if a scalar is used instead of an array as in a+0.1-a?
If you use double precision, the scenario changes. What you are getting is expected behaviour for single precision (np.float32):
a = np.array([ 883, 931, 874], dtype=np.float64)
a+0.1-a
# array([ 0.1, 0.1, 0.1])
((a+0.1-a)-0.1) / 0.1
# array([ 2.27318164e-13, 2.27318164e-13, 2.27318164e-13])
Using np.array((0.1,)*3) in the middle of the expression turned everything to float64, which explains the higher precision in the second result.
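You can verify this promotion directly by inspecting dtypes. A quick sketch (promotion rules have shifted somewhat across NumPy versions, but these two cases are stable):
import numpy as np

a = np.array([883, 931, 874], dtype=np.float32)
print((a + 0.1).dtype)                  # float32: the Python float adapts to the array dtype
print((a + np.array([0.1] * 3)).dtype)  # float64: the float64 array wins the promotion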

Checking existence of an array inside an array of arrays in Python [duplicate]

This question already has answers here: Python: Find number of occurrences of given array within two-dimensional array (6 answers). Closed 9 years ago.
I have a numpy array of arrays:
qv=array([[-1.075, -1.075, -3. ],
[-1.05 , -1.075, -3. ],
[-1.025, -1.075, -3. ],
...,
[-0.975, -0.925, -2. ],
[-0.95 , -0.925, -2. ],
[-0.925, -0.925, -2. ]])
And I want to determine if an array is contained in that 2-D array and return its index.
qt=array([-1. , -1.05, -3. ])
I can convert both arrays to lists and use the list.index() function:
qlist=qv.tolist()
ql=qt.tolist()
qindex=qlist.index(ql)
But I would like to avoid doing this because I think it will be a performance hit.
This should do the trick:
import numpy as np
np.where((qv == qt).all(-1))
Or
import numpy as np
tol = 1e-8
diff = (qv - qt)
np.where((abs(diff) < tol).all(-1))
The second method might be more appropriate when floating-point precision issues come into play. Also, there might be a better approach if you have many qt vectors to test against, for example scipy.spatial.KDTree.
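For many queries, one sketch along those lines (assuming you can tolerate a small numerical tolerance standing in for exact equality):
import numpy as np
from scipy.spatial import cKDTree

qv = np.array([[-1.075, -1.075, -3.0],
               [-1.05 , -1.075, -3.0],
               [-1.05 , -1.0  , -3.0]])
qt = np.array([-1.05, -1.0, -3.0])

tree = cKDTree(qv)
dist, idx = tree.query(qt)   # nearest row of qv and its distance to qt
if dist < 1e-8:              # treat as a match within a small tolerance
    print("found at index", idx)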

Euclidean Distances between points

I have an array of points in numpy:
points = rand(dim, n_points)
And I want to:
Calculate all the L2 norms (Euclidean distances) between a certain point and all other points
Calculate all pairwise distances.
Preferably all in numpy with no for loops. How can one do it?
If you're willing to use SciPy, the scipy.spatial.distance module (the functions cdist and/or pdist) does exactly what you want, with all the looping done in C. You can do it with broadcasting too, but there's some extra memory overhead.
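A short sketch of that SciPy route; note that points is laid out (dim, n_points) in the question, so it is transposed first:
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.random.rand(3, 4)     # dim = 3, n_points = 4, column-wise as in the question
X = points.T                      # transpose to the (n_points, dim) layout SciPy expects

one_to_all = cdist(X[:1], X)      # distances from the first point to every point, shape (1, 4)
all_pairs = squareform(pdist(X))  # full symmetric (4, 4) pairwise-distance matrix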
This might help with the second part:
import numpy as np

p = np.random.rand(3, 4)  # column-wise, so each of the 4 vectors has length 3
np.sqrt(((p[:, np.newaxis, :] - p[:, :, np.newaxis]) ** 2).sum(axis=0))
which gives
array([[ 0. , 0.37355868, 0.64896708, 1.14974483],
[ 0.37355868, 0. , 0.6277216 , 1.19625254],
[ 0.64896708, 0.6277216 , 0. , 0.77465192],
[ 1.14974483, 1.19625254, 0.77465192, 0. ]])
if p was
array([[ 0.46193242, 0.11934744, 0.3836483 , 0.84897951],
[ 0.19102709, 0.33050367, 0.36382587, 0.96880535],
[ 0.84963349, 0.79740414, 0.22901247, 0.09652746]])
and you can check one of the entries via
np.sqrt(((p[:, 0] - p[:, 2]) ** 2).sum())
0.64896708223796884
The trick is to insert np.newaxis and let broadcasting do the work: p[:, np.newaxis, :] has shape (3, 1, 4) and p[:, :, np.newaxis] has shape (3, 4, 1), so their difference broadcasts to shape (3, 4, 4), and summing over axis 0 gives the (4, 4) distance matrix.
Good luck!
