How to ignore specific numbers in a numpy moving average?

How to ignore specific numbers in a numpy moving average? - python

Let's say I have a simple numpy array:
a = np.array([1,2,3,4,5,6,7])
I can calculate the moving average of a window with size 3 simply like:
np.convolve(a,np.ones(3),'valid') / 3
which would yield
array([2., 3., 4., 5., 6.])
Now, I would like to take a moving average but exclude anytime the number '2' appears. In other words, for the first 3 numbers, originally, it would be (1 + 2 + 3) / 3 = 2. Now, I would like to do (1 + 3) / 2 = 2. How can I specify a user-defined number to ignore and calculate the running mean without including this user-defined number? I would like to keep this to some sort of numpy function without bringing in pandas.

You could replace the unwanted values with 0 using a mask and separately compute the number of valid items, then compute the ratio:
a = np.array([1,2,3,4,5,6,7])
mask = a != 2
num = np.convolve(np.where(mask, a, 0), np.ones(3), 'valid')
denom = np.convolve(mask, np.ones(3), 'valid')
out = num/denom
Output:
array([2. , 3.5, 4. , 5. , 6. ])

Related

How to do element-wise rounding of NumPy array to first non-zero digit?

I would like to "round" (not exact a mathematical rounding) the elements of a numpy array in the following way:
Given a numpy NxN or NxM 2D array with digit between 0.00001 to 9.99999 like
a=np.array([[1.232, 1.872,2.732,0.123],
[0.0019, 0.025, 1.854, 0.00017],
[1.457, 0.0021, 2.34 , 9.99],
[1.527, 3.3, 0.012 , 0.005]]
)
I would like basically to "round" this numpy array by selecting the first non-zero digit (irregardless of the digit that follows the first non-zero digit) of each element
giving the output:
output =np.array([[1.0, 1.0, 2.0, 0.1],
[0.001, 0.02, 1.0, 0.0001],
[1.0, 0.002, 2 , 9.0],
[1, 3, 0.01 , 0.005]]
)
thanks for any help!

You could use np.logspace and np.seachsorted to determine the order of magnitude of each element and then floor divide and multiply back
po10 = np.logspace(-10,10,21)
oom = po10[po10.searchsorted(a)-1]
a//oom*oom
# array([[1.e+00, 1.e+00, 2.e+00, 1.e-01],
# [1.e-03, 2.e-02, 1.e+00, 1.e-04],
# [1.e+00, 2.e-03, 2.e+00, 9.e+00],
# [1.e+00, 3.e+00, 1.e-02, 5.e-03]])

What you would want to do is to keep a fixed number of significant figures.
This functionality is not integrated into NumPy.
To get only the 1 significant figure, you could look into either #PaulPanzer or #darcamo answers (assuming that you only have positive values).
If you want something that works a specified number of significant figures, you could use something like:
def significant_figures(arr, num=1):
# : compute the order of magnitude
order = np.zeros_like(arr)
mask = arr != 0
order[mask] = np.floor(np.log10(np.abs(arr[mask])))
del mask # free unused memory
# : compute the corresponding precision
prec = num - order - 1
return np.round(arr * 10.0 ** prec) / 10.0 ** prec
print(significant_figures(a, 1))
# [[1.e+00 2.e+00 3.e+00 1.e-01]
# [2.e-03 2.e-02 2.e+00 2.e-04]
# [1.e+00 2.e-03 2.e+00 1.e+01]
# [2.e+00 3.e+00 1.e-02 5.e-03]]
print(significant_figures(a, 2))
# [[1.2e+00 1.9e+00 2.7e+00 1.2e-01]
# [1.9e-03 2.5e-02 1.9e+00 1.7e-04]
# [1.5e+00 2.1e-03 2.3e+00 1.0e+01]
# [1.5e+00 3.3e+00 1.2e-02 5.0e-03]]
EDIT
For truncated output use np.floor() instead of np.round() just before the return.

First get the powers of 10 for each number in the array with
powers = np.floor(np.log10(a))
In your example this gives us
array([[ 0., 0., 0., -1.],
[-3., -2., 0., -4.],
[ 0., -3., 0., 0.],
[ 0., 0., -2., -3.]])
Now, if we divide the i-th element in the array by 10**power_i we essentially move each number non-zero element in the array to the first position. Now we can simple take the floor to remove the other non-zero digits and then multiply the result by 10**power_i to get back to the original scale.
The complete solution is then only the code below
powers = np.floor(np.log10(a))
10**powers * np.floor(a/10**powers)
What about numbers greater than or equal to 10?
For this you can simply take np.floor of the original value in the array. We can do this easily with a mask. You can modify the answer as below
powers = np.floor(np.log10(a))
result = 10**powers * np.floor(a/10**powers)
mask = a >= 10
result[mask] = np.floor(a[mask])
You can also use a mask to avoid computing the powers and logarithm for numbers that will just be replaced later.

How can I generate random matrix value chromosome in python which the sum of each chromosome equals to 1?

I want to generate random matrix chromosome with max value 1.0, min value 0, and the size is 10, Which every each rows equals to 1.
I have randomize chromosome using random.uniform in matrix with max value 1.0, min value 0, and the size is 10. How do I need to do if I want every each rows equals to 1? Thank you.
Input
import random
import numpy
def create_reference_solution(chromosome_length):
reference = numpy.random.uniform(low=0, high=1.0, size=(10,chromosome_length)) # Create array chromosome
return reference
print(create_reference_solution(4)) # print array
Output
[[0.49610843 0.73632368 0.38089333 0.38195847]
[0.97371743 0.8245768 0.7576383 0.69000418]
[0.57430261 0.02274222 0.36947273 0.69866079]
[0.89639171 0.69387191 0.23348819 0.98811965]
[0.14153835 0.10603574 0.25907029 0.349709 ]
[0.04914772 0.54748797 0.18464009 0.99592558]
[0.09897709 0.71638782 0.31578413 0.15487327]
[0.19852756 0.5675573 0.09665754 0.27815583]
[0.9085627 0.0907393 0.0585448 0.00976053]
[0.05092392 0.46098409 0.12467901 0.48316205]]

You can make an array of random numbers between zero and one that is one less than your chromosome length. Sort the list, then find the differences between 0 and the first, first and second...last and 1.0. You can think of these as divisions of the unit. np.diff() is handy for this because you can prepend 0 and append 1 to get exactly what you want.
For example:
import numpy as np
chromosome_length = 4
reference = np.random.uniform(low=0, high=1.0, size=(10,chromosome_length - 1)) # Create array chromosome
reference.sort(axis = 1)
diffs = np.diff(reference, prepend=0, append=1, axis=1)
print(diffs)
np.sum(diffs, axis=1)
diffs will be something like:
[[0.33912643 0.06899929 0.39308693 0.19878736]
[0.09431517 0.1920815 0.5591725 0.15443084]
[0.13874118 0.05951455 0.45170353 0.35004074]
[0.07826248 0.09976879 0.27325618 0.54871255]
[0.01879091 0.28365535 0.5275187 0.17003504]
[0.13071614 0.60090562 0.1776917 0.09068653]
[0.03938235 0.59978608 0.00799955 0.35283202]
[0.14483008 0.51857752 0.31868394 0.01790846]
[0.42866068 0.12372462 0.07070687 0.37690784]
[0.25118504 0.10828291 0.45142439 0.18910767]]
The sums of the rows is:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Python Optimization: Using vector technique to find power of each matrix in an numpy array

3D numpy array A contains a series (in this example, I am choosing 3) of 2D numpy array D of shape 2 x 2. The D matrix is as follows:
D = np.array([[1,2],[3,4]])
A is initialized and assigned as below:
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
Now, essentially what I require after the execution of the codes is:
Mathematically, A = {D^0, D^1, D^2} = {D0, D1, D2}
where D0 = [[1,0],[0,1]], D1 = [[1,2],[3,4]], D2=[[7,10],[15,22]]
Is it possible to apply power to each matrix element in A without using a for-loop? I would be doing larger matrices with more in the series.
I had defined, n = np.array([0,1,2]) # corresponding to powers 0, 1 and 2 and tried
Result = np.power(A,n) but I do not get the desired output.
Is there are an efficient way to do it?
Full code:
D = np.array([[1,2],[3,4]])
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
n = np.array([0,1,2])
Result = np.power(A,n) # ------> Not the desired output.

A cumulative product exists in numpy, but not for matrices. Therefore, you need to make your own 'matcumprod' function. You can use np.dot for this, but np.matmul (or #) is specialized for matrix multiplication.
Since you state your powers always go from 0 to some_power, I suggest the following function:
def matcumprod(D, upto):
Res = np.empty((upto, *D.shape), dtype=A.dtype)
Res[0, :, :] = np.eye(D.shape[0])
Res[1, :, :] = D.copy()
for i in range(1,upto):
Res[i, :, :] = Res[i-1,:,:] # D
return Res
By the way, a loop often times outperforms a built-in numpy function if the latter uses a lot of memory, so don't fret over it if your powers stay within bounds...

Alright, i spent a lot of time on this problem but could not seem to find a vectorized solution in the way you'd like. So i would like to instead first propose a basic solution, and then perhaps an optimization if you require finding continuous powers.
The function you're looking for is called numpy.linalg.matrix_power
import numpy as np
D = np.matrix([[1,2],[3,4]])
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
np.zeros(A.shape)
n = np.array([0,1,2])
result = [np.linalg.matrix_power(D, i) for i in n]
np.array(result)
#Output:
array([[[ 1, 0],
[ 0, 1]],
[[ 1, 2],
[ 3, 4]],
[[ 7, 10],
[15, 22]]])
However, if you notice, you end up calculating multiple powers for the same base matrix. We could instead utilize the intermediate results and go from there, using numpy.linalg.multi_dot
def all_powers_arr_of_matrix(A):
result = np.zeros(A.shape)
result[0] = np.linalg.matrix_power(A[0], 0)
for i in range(1, A.shape[0]):
result[i] = np.linalg.multi_dot([result[i - 1], A[i]])
return result
result = all_powers_arr_of_matrix(A)
#Output:
array([[[ 1., 0.],
[ 0., 1.]],
[[ 1., 2.],
[ 3., 4.]],
[[ 7., 10.],
[15., 22.]]])
Also, we can avoid creating the matrix A entirely, saving some time.
def all_powers_matrix(D, *rangeargs): #end exclusive
''' Expects 2D matrix.
Use as all_powers_matrix(D, end) or
all_powers_matrix(D, start, end)
'''
if len(rangeargs) == 1:
start = 0
end = rangeargs[0]
elif len(rangeargs) == 2:
start = rangeargs[0]
end = rangeargs[1]
else:
print("incorrect args")
return None
result = np.zeros((end - start, *D.shape))
result[0] = np.linalg.matrix_power(A[0], start)
for i in range(start + 1, end):
result[i] = np.linalg.multi_dot([result[i - 1], D])
return result
return result
result = all_powers_matrix(D, 3)
#Output:
array([[[ 1., 0.],
[ 0., 1.]],
[[ 1., 2.],
[ 3., 4.]],
[[ 7., 10.],
[15., 22.]]])
Note that you'd need to add error handling if you decide to use these functions as-is.

To calculate power of matrix D, one way could be to find the eigenvalues and right eigenvectors of it with np.linalg.eig and then raise the power of the diagonal matrix as it is easier, then after some manipulation, you can use two np.einsum to calculate A
#get eigvalues and eigvectors
eigval, eigvect = np.linalg.eig(D)
# to check how it works, you can do:
print (np.dot(eigvect*eigval,np.linalg.inv(eigvect)))
#[[1. 2.]
# [3. 4.]]
# so you get back on D
#use power as ufunc of outer with n on the eigenvalues to get all the one you want
arrp = np.power.outer( eigval, n).T
#apply_along_axis to create the diagonal matrix along the last axis
diagp = np.apply_along_axis( np.diag, axis=-1, arr=arrp)
#finally use two np.einsum to calculate with the subscript to get what you want
A = np.einsum('lij,jk -> lik',
np.einsum('ij,kjl -> kil',eigvect,diagp), np.linalg.inv(eigvect)).round()
print (A)
print (A.shape)
#[[[ 1. 0.]
# [-0. 1.]]
#
# [[ 1. 2.]
# [ 3. 4.]]
#
# [[ 7. 10.]
# [15. 22.]]]
#
#(3, 2, 2)

I don't have a full solution, but there are some things I wanted to mention which are a bit too long for the comments.
You might first look into addition chain exponentiation if you are computing big powers of big matrices. This is basically asking how many matrix multiplications are required to compute A^k for a given k. For instance A^5 = A(A^2)^2 so you need to only three matrix multiplies: A^2 and (A^2)^2 and A(A^2)^2. This might be the simplest way to gain some efficiency, but you will probably still have to use explicit loops.
Your question is also related to the problem of computing Ax, A^2x, ... , A^kx for a given A and x. This is an active area of research right now (search "matrix powers kernel"), since computing such a sequence efficiently is useful for parallel/communication avoiding Krylov subspace methods. If you're looking for a very efficient solution to your problem it might be worth looking into some of the results about this.

How to efficiently apply functions to values in an array based on condition?

I have an array arorg like this:
import numpy as np
arorg = np.array([[-1., 2., -4.], [0.5, -1.5, 3]])
and another array values that looks as follows:
values = np.array([1., 0., 2.])
values has the same number of entries as arorg has columns.
Now I want to apply functions to the entries or arorg depending on whether they are positive or negative:
def neg_fun(val1, val2):
return val1 / (val1 + abs(val2))
def pos_fun(val1, val2):
return 1. / ((val1 / val2) + 1.)
Thereby, val2 is the (absolute) value in arorg and val1 - this is the tricky part - comes from values; if I apply pos_fun and neg_fun to column i in arorg, val1 should be values[i].
I currently implement that as follows:
ar = arorg.copy()
for (x, y) in zip(*np.where(ar > 0)):
ar.itemset((x, y), pos_fun(values[y], ar.item(x, y)))
for (x, y) in zip(*np.where(ar < 0)):
ar.itemset((x, y), neg_fun(values[y], ar.item(x, y)))
which gives me the desired output:
array([[ 0.5 , 1. , 0.33333333],
[ 0.33333333, 0. , 0.6 ]])
As I have to do these calculations very often, I am wondering whether there is a more efficient way of doing this. Something like
np.where(arorg > 0, pos_fun(xxxx), arorg)
would be great but I don't know how to pass the arguments correctly (the xxx). Any suggestions?

As hinted in the question, here's one using np.where.
First off, we are using a direct translation of the function implementation to generate values/arrays for both positive and negative cases. Then, with a mask of positive values, we will choose between those two arrays using np.where.
Thus, the implementation would look something along these lines -
# Get positive and negative values for all elements
val1 = values
val2 = arorg
neg_vals = val1 / (val1 + np.abs(val2))
pos_vals = 1. / ((val1 / val2) + 1.)
# Get a positive mask and choose between positive and negative values
pos_mask = arorg > 0
out = np.where(pos_mask, pos_vals, neg_vals)

You don't need to apply function to zipped elements of arrays, you can accomplish the same thing through simple array operations and slicing.
First, get the positive and negative calculation, saved as arrays. Then create a return array of zeros (just as a default value), and populate it using boolean slices of pos and neg:
import numpy as np
arorg = np.array([[-1., 2., -4.], [0.5, -1.5, 3]])
values = np.array([1., 0., 2.])
pos = 1. / ((values / arorg) + 1)
neg = values / (values + np.abs(arorg))
ret = np.zeros_like(arorg)
ret[arorg>0] = pos[arorg>0]
ret[arorg<=0] = neg[arorg<=0]
ret
# returns:
array([[ 0.5 , 1. , 0.33333333],
[ 0.33333333, 0. , 0.6 ]])

import numpy as np
arorg = np.array([[-1., 2., -4.], [0.5, -1.5, 3]])
values = np.array([1., 0., 2.])
p = 1.0/(values/arorg+1)
n = values/(values+abs(arorg))
#using np.place to extract negative values and put them to p
np.place(p,arorg<0,n[arorg<0])
print(p)
[[ 0.5 1. 0.33333333]
[ 0.33333333 0. 0.6 ]]

Filtering histogram edges and counts

Consider a histogram calculation of a numpy array that returns percentages:
# 500 random numbers between 0 and 10,000
values = np.random.uniform(0,10000,500)
# Histogram using e.g. 200 buckets
perc, edges = np.histogram(values, bins=200,
weights=np.zeros_like(values) + 100/values.size)
The above returns two arrays:
perc containing the % (i.e. percentages) of values within each pair of consecutive edges[ix] and edges[ix+1] out of the total.
edges of length len(hist)+1
Now, say that I want to filter perc and edges so that I only end up with the percentages and edges for values contained within a new range [m, M]. '
That is, I want to work with the sub-arrays of perc and edges corresponding to the interval of values within [m, M]. Needless to say, the new array of percentages would still refer to the total fraction count of the input array. We just want to filter perc and edges to end up with the correct sub-arrays.
How can I post-process perc and edges to do so?
The values of m and M can be any number of course. In the example above, we can assume e.g. m = 0 and M = 200.

m = 0; M = 200
mask = [(m < edges) & (edges < M)]
>>> edges[mask]
array([ 37.4789683 , 87.07491593, 136.67086357, 186.2668112 ])
Let's work on a smaller dataset so that it is easier to understand:
np.random.seed(0)
values = np.random.uniform(0, 100, 10)
values.sort()
>>> values
array([ 38.34415188, 42.36547993, 43.75872113, 54.4883183 ,
54.88135039, 60.27633761, 64.58941131, 71.51893664,
89.17730008, 96.36627605])
# Histogram using e.g. 10 buckets
perc, edges = np.histogram(values, bins=10,
weights=np.zeros_like(values) + 100./values.size)
>>> perc
array([ 30., 0., 20., 10., 10., 10., 0., 0., 10., 10.])
>>> edges
array([ 38.34415188, 44.1463643 , 49.94857672, 55.75078913,
61.55300155, 67.35521397, 73.15742638, 78.9596388 ,
84.76185122, 90.56406363, 96.36627605])
m = 0; M = 50
mask = (m <= edges) & (edges < M)
>>> mask
array([ True, True, True, False, False, False, False, False, False,
False, False], dtype=bool)
>>> edges[mask]
array([ 38.34415188, 44.1463643 , 49.94857672])
>>> perc[mask[:-1]][:-1]
array([ 30., 0.])
m = 40; M = 60
mask = (m < edges) & (edges < M)
>>> edges[mask]
array([ 44.1463643 , 49.94857672, 55.75078913])
>>> perc[mask[:-1]][:-1]
array([ 0., 20.])

Well you might need some mathematics for this. The bins are equally spaced so you can determine which bin is the first to include and which is the last by using the width of each bin:
bin_width = edges[1] - edges[0]
Now compute the first and last valid bin:
first = math.floor((m - edges[0]) / bin_width) + 1 # How many bins from the left
last = math.floor((edges[-1] - M) / bin_width) + 1 # How many bins from the right
(Ignore the +1 for both if you want to include the bin containing m or M - but then be careful that you don't end up with negative values for first and last!)
Now you know how many bins to include:
valid_edges = edges[first:-last]
valid_perc = perc[first:-last]
This will exclude the first first points and the last last points.
Might be that I haven't payed enough attention to rounding and there is an "off by one" error included but I think the idea is sound. :-)
You probably need to catch special cases like M > edges[-1] but for readability I haven't included these.
Or if the bins are not equally spaced use boolean masks instead of the calculation:
first = edged[edges < m].size + 1
last = edged[edges > M].size + 1

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to ignore specific numbers in a numpy moving average? - python

Related

How to do element-wise rounding of NumPy array to first non-zero digit?

How can I generate random matrix value chromosome in python which the sum of each chromosome equals to 1?

Python Optimization: Using vector technique to find power of each matrix in an numpy array

How to efficiently apply functions to values in an array based on condition?

Filtering histogram edges and counts

Categories

Resources