I'm writing a function to find the percentage change using NumPy and function calls. So far what I have is:
def change(a, b):
    answer = (np.subtract(a[b+1], a[b])) / a[b+1] * 100
    return answer
print(change(a,0))
"a" is the array I have made and b will be the index/numbers I am trying to calculate.
For example:
My array is
[[1, 2, 3, 5, 7],
 [1, 4, 5, 6, 7],
 [5, 8, 9, 10, 32],
 [3, 5, 6, 13, 11]]
How would I calculate the percentage change from 1 to 2 (= 0.5), or from 1 to 4 (= 0.75), or from 5 to 7, and so on?
Note: I know how to get the change mathematically; I'm just not sure how to do it in Python/NumPy.
If I understand correctly that you're trying to find the percent change within each row, then you can do:
>>> np.diff(a) / a[:,1:] * 100
Which gives you:
array([[ 50. , 33.33333333, 40. , 28.57142857],
[ 75. , 20. , 16.66666667, 14.28571429],
[ 37.5 , 11.11111111, 10. , 68.75 ],
[ 40. , 16.66666667, 53.84615385, -18.18181818]])
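For completeness, here is a runnable version of the above, with the array from the question defined explicitly (a minimal sketch):
import numpy as np

a = np.array([[1, 2, 3, 5, 7],
              [1, 4, 5, 6, 7],
              [5, 8, 9, 10, 32],
              [3, 5, 6, 13, 11]])

# np.diff works along the last axis by default; dividing by a[:, 1:]
# matches the question's (new - old) / new convention
print(np.diff(a) / a[:, 1:] * 100)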
I know you asked this question with NumPy in mind and got answers above:
import numpy as np
np.diff(a) / a[:,1:]
I attempted to solve this with Pandas, for those who have the same question but would rather use Pandas instead of NumPy:
import pandas as pd
data = [[1,2,3,4,5],
[1,4,5,6,7],
[5,8,9,10,32],
[3,5,6,13,11]]
df = pd.DataFrame(data)
df_change = df.rolling(1, axis=1).sum().pct_change(axis=1)
print(df_change)
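As a side note, rolling(1, axis=1).sum() is effectively an identity operation (each window holds a single value), so the same result should come from pct_change alone:
# Equivalent, since a rolling window of size 1 leaves the frame unchanged
df_change = df.pct_change(axis=1)
Be aware that pct_change divides by the previous value, i.e. the left-to-right convention discussed in a later answer, not the divide-by-the-new-value formula from the question.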
I suggest simply shifting the array; the computation then basically becomes a one-liner.
import numpy as np
arr = np.array(
[
[1, 2, 3, 5, 7],
[1, 4, 5, 6, 7],
[5, 8, 9, 10, 32],
[3, 5, 6, 13, 11],
]
)
# Percentage change from row to row
pct_chg_row = arr[1:] / arr[:-1] - 1
[[ 0. 1. 0.66666667 0.2 0. ]
[ 4. 1. 0.8 0.66666667 3.57142857]
[-0.4 -0.375 -0.33333333 0.3 -0.65625 ]]
# Percentage change from column to column
pct_chg_col = arr[:, 1:] / arr[:, :-1] - 1
[[ 1. 0.5 0.66666667 0.4 ]
[ 3. 0.25 0.2 0.16666667]
[ 0.6 0.125 0.11111111 2.2 ]
[ 0.66666667 0.2 1.16666667 -0.15384615]]
You can easily generalize the task so that you are not limited to the change between adjacent rows/columns, but can compute the change across n rows/columns.
n = 2
pct_chg_row_generalized = arr[n:] / arr[:-n] - 1
[[4. 3. 2. 1. 3.57142857]
[2. 0.25 0.2 1.16666667 0.57142857]]
pct_chg_col_generalized = arr[:, n:] / arr[:, :-n] - 1
[[2. 1.5 1.33333333]
[4. 0.5 0.4 ]
[0.8 0.25 2.55555556]
[1. 1.6 0.83333333]]
If the output array must have the same shape as the input array, you need to insert the appropriate number of np.nan values.
out_row = np.full_like(arr, np.nan, dtype=float)
out_row[n:] = arr[n:] / arr[:-n] - 1
[[ nan nan nan nan nan]
[ nan nan nan nan nan]
[4. 3. 2. 1. 3.57142857]
[2. 0.25 0.2 1.16666667 0.57142857]]
out_col = np.full_like(arr, np.nan, dtype=float)
out_col[:, n:] = arr[:, n:] / arr[:, :-n] - 1
[[ nan nan 2. 1.5 1.33333333]
[ nan nan 4. 0.5 0.4 ]
[ nan nan 0.8 0.25 2.55555556]
[ nan nan 1. 1.6 0.83333333]]
Finally, a small function for the general 2D case might look like this:
def np_pct_chg(arr: np.ndarray, n: int = 1, axis: int = 0) -> np.ndarray:
    out = np.full_like(arr, np.nan, dtype=float)
    if axis == 0:
        out[n:] = arr[n:] / arr[:-n] - 1
    elif axis == 1:
        out[:, n:] = arr[:, n:] / arr[:, :-n] - 1
    return out
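A quick usage check against the array above (the values match the column-to-column results shown earlier, shifted by the leading NaN):
>>> np_pct_chg(arr, n=1, axis=1)
array([[        nan,  1.        ,  0.5       ,  0.66666667,  0.4       ],
       [        nan,  3.        ,  0.25      ,  0.2       ,  0.16666667],
       [        nan,  0.6       ,  0.125     ,  0.11111111,  2.2       ],
       [        nan,  0.66666667,  0.2       ,  1.16666667, -0.15384615]])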
The accepted answer is close, but incorrect if you're trying to take the % difference from left to right.
You should get the following percent difference:
1,2,3,5,7 --> 100%, 50%, 66.66%, 40%
check for yourself: https://www.calculatorsoup.com/calculators/algebra/percent-change-calculator.php
Going by what Josmoor98 said, you can use np.diff(a) / a[:,:-1] * 100 to get the percent difference from left to right, which will give you the correct answer.
array([[100. , 50. , 66.66666667, 40. ],
[300. , 25. , 20. , 16.66666667],
[ 60. , 12.5 , 11.11111111, 220. ],
[ 66.66666667, 20. , 116.66666667, -15.38461538]])
import numpy as np
a = np.array([[1,2,3,5,7],
[1,4,5,6,7],
[5,8,9,10,32],
[3,5,6,13,11]])
np.array([(i[:-1]/i[1:]) for i in a])
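Note that this computes the ratio of each element to its right-hand neighbour (e.g. 1/2 = 0.5), not a percent change as such. If you want the question's (new - old) / new formula, subtracting the ratio from 1 should do it:
# (new - old) / new, per row; e.g. 1 -> 2 gives 1 - 1/2 = 0.5
np.array([1 - i[:-1] / i[1:] for i in a])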
Combine all your arrays.
Then make a DataFrame from them:
df = pd.DataFrame(data=your_array)
Use the pct_change() method on the DataFrame; it will calculate the % change for all rows in the DataFrame.
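A minimal sketch of those steps, assuming the 4x5 array from the question:
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 5, 7],
              [1, 4, 5, 6, 7],
              [5, 8, 9, 10, 32],
              [3, 5, 6, 13, 11]])

df = pd.DataFrame(a)
# pct_change works down the rows by default; axis=1 goes left to right
print(df.pct_change(axis=1))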
I am a beginner in Python and am stuck on a problem. I have two lists of 60 floating point numbers; let's call them start and end. The numbers in both lists are not in increasing or decreasing order.
start = []  # 60 floating point numbers
end = []    # 60 floating point numbers
I would like to find 1000 interpolated values between start[0] and end[0], and repeat the process for all 60 pairs in the lists. How do I go about it?
You can do this with a list comprehension and numpy.linspace:
import numpy as np
[np.linspace(first, last, 1000) for first, last in zip(start, end)]
As a small example (with fewer values)
>>> start = [1, 5, 10]
>>> end = [2, 10, 20]
>>> [np.linspace(first, last, 5) for first, last in zip(start, end)]
[array([ 1. , 1.25, 1.5 , 1.75, 2. ]),
array([ 5. , 6.25, 7.5 , 8.75, 10. ]),
array([ 10. , 12.5, 15. , 17.5, 20. ])]
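As a side note, recent NumPy versions (1.16+, as I recall) accept array-like start/stop values, so the list comprehension can be replaced by a single vectorized call:
# Shape (60, 1000): row i interpolates from start[i] to end[i]
result = np.linspace(np.array(start), np.array(end), 1000, axis=1)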
I have two vectors:
time_vec = np.array([0.2,0.23,0.3,0.4,0.5,...., 28....])
values_vec = np.array([500,200,220,250,200,...., 218....])
time_vec.shape == values_vec.shape
Now, I want to bin the values into 0.5-second intervals and take the mean of the values in each bin. So, for example:
value_vec = np.array([mean_of(500, 200, 220, 250, 200), mean_of(values in the next 0.5 second interval), ...])
Is there a NumPy method I am missing that bins the values and takes the mean of each bin?
You may use np.ufunc.reduceat. You just need to find the breaking points, i.e. where floor(t / .5) changes:
say for:
>>> t
array([ 0. , 0.025 , 0.2125, 0.2375, 0.2625, 0.3375, 0.475 , 0.6875, 0.7 , 0.7375, 0.8 , 0.9 ,
0.925 , 1.05 , 1.1375, 1.15 , 1.1625, 1.1875, 1.1875, 1.225 ])
>>> b
array([ 0.8144, 0.3734, 1.4734, 0.6307, -0.611 , -0.8762, 1.6064, 0.3863, -0.0103, -1.6889, -0.4328, -0.7373,
1.7856, 0.8938, -1.1574, -0.4029, -0.4352, -0.4412, -1.7819, -0.3298])
the break points are:
>>> i = np.r_[0, 1 + np.nonzero(np.diff(np.floor(t / .5)))[0]]
>>> i
array([ 0, 7, 13])
and the sum over each interval is:
>>> np.add.reduceat(b, i)
array([ 3.411 , -0.6975, -3.6545])
and the mean is the sum over each interval divided by its length:
>>> np.add.reduceat(b, i) / np.diff(np.r_[i, len(b)])
array([ 0.4873, -0.1162, -0.5221])
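Wrapped up as a small function in terms of the question's variable names (a sketch of the same reduceat idea; it assumes time_vec is sorted):
import numpy as np

def bin_means(time_vec, values_vec, width=0.5):
    # Indices where a new `width`-second interval begins
    i = np.r_[0, 1 + np.nonzero(np.diff(np.floor(time_vec / width)))[0]]
    # Sum per interval divided by the number of samples in that interval
    return np.add.reduceat(values_vec, i) / np.diff(np.r_[i, len(values_vec)])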
You can pass a weights= parameter to np.histogram to compute the summed values within each time bin, then normalize by the bin count:
# 0.5 second time bins to average within
tmin = time_vec.min()
tmax = time_vec.max()
bins = np.arange(tmin - (tmin % 0.5), tmax - (tmax % 0.5) + 0.5, 0.5)
# summed values within each bin
bin_sums, edges = np.histogram(time_vec, bins=bins, weights=values_vec)
# number of values within each bin
bin_counts, edges = np.histogram(time_vec, bins=bins)
# average value within each bin
bin_means = bin_sums / bin_counts
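One caveat: if a bin contains no samples, bin_counts is zero there, so the division emits a warning and yields NaN. You can make that explicit along these lines:
# NaN for empty bins, without a divide-by-zero warning
bin_means = np.where(bin_counts > 0, bin_sums / np.maximum(bin_counts, 1), np.nan)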
You can use np.bincount that is supposedly pretty efficient for such binning operations. Here's an implementation based on it to solve our case -
# Find indices where the 0.5s intervals shift to the next one
A = time_vec * 2
idx = np.searchsorted(A, np.arange(1, int(np.ceil(A.max()))), 'right')

# Set up an ID array such that all values in the same 0.5s interval share an ID
out = np.zeros(A.size, dtype=int)
out[idx[idx < A.size]] = 1
ID = out.cumsum()

# Finally use bincount to sum and count elements with the same ID,
# and thus get the mean value per ID
mean_vec = np.bincount(ID, values_vec) / np.bincount(ID)
Sample run -
In [189]: time_vec
Out[189]:
array([ 0.2 , 0.23, 0.3 , 0.4 , 0.5 , 0.7 , 0.8 , 0.92, 0.95,
1. , 1.11, 1.5 , 2. , 2.3 , 2.5 , 4.5 ])
In [190]: values_vec
Out[190]: array([36, 11, 93, 32, 72, 75, 26, 28, 77, 31, 60, 77, 76, 32, 6, 85])
In [191]: ID
Out[191]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 5], dtype=int32)
In [192]: mean_vec
Out[192]: array([ 48.8, 47.4, 68.5, 76. , 19. , 85. ])
I've got a list of sorted samples. They're sorted by their sample time, where each sample is taken one second after the previous one.
I'd like to find the minimum value in a neighborhood of a specified size.
For example, given a neighborhood size of 2 and the following samples:
samples = [ 5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2 ]
I'd expect the following output: [5, 2, 5, 2]
What would be the best way to achieve this in numpy/scipy?
Edit: explained the reasoning behind the min values:
5 - the 2-number window next to it is [12.3, 12.3]; 5 is smaller.
2 - to the left: [12.3, 7], to the right: [6, 9]. 2 is the min.
5 - to the left: [9, 10], to the right: [9, 17]. 5 is the min.
Notice that 9 isn't a min, as there's a 2-wide window to its left and to its right containing a smaller value (2).
Use scipy's argrelextrema:
>>> import numpy as np
>>> from scipy.signal import argrelextrema
>>> data = np.array([ 5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2 ])
>>> radius = 2 # number of elements to the left and right to compare to
>>> argrelextrema(data, np.less, order=radius)
(array([4, 8]),)
This suggests that the numbers at positions 4 and 8 (2 and 5) are the smallest ones within a size-2 neighbourhood. The numbers at the boundaries (5 and 2) are not detected, since argrelextrema only supports clip or wrap boundary conditions. As for your question, I guess you are interested in them too. To detect them, it is easy to add reflect boundary conditions first:
>>> new_data = np.pad(data, radius, mode='reflect')
>>> new_data
array([ 12.3, 12.3, 5. , 12.3, 12.3, 7. , 2. , 6. , 9. ,
10. , 5. , 9. , 17. , 2. , 17. , 9. ])
With the data padded with the corresponding boundary conditions, we can now apply the previous extrema detector:
>>> arg_minimas = argrelextrema(new_data, np.less, order=radius)[0] - radius
>>> arg_minimas
array([ 0, 4, 8, 11])
This returns the positions where the local extrema (minima in this case, since np.less) occur within a sliding window of radius=2.
NOTE the -radius, which undoes the +radius index offset introduced by padding the array with reflect boundary conditions via np.pad.
EDIT: if you are interested in the values rather than the positions, it is straightforward:
>>> data[arg_minimas]
array([ 5., 2., 5., 2.])
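The whole recipe wrapped in a small helper might look like this (a sketch following the steps above):
import numpy as np
from scipy.signal import argrelextrema

def local_minima(data, radius):
    # Reflect-pad so minima at the boundaries are detected too
    padded = np.pad(data, radius, mode='reflect')
    idx = argrelextrema(padded, np.less, order=radius)[0] - radius
    return data[idx]   # local_minima(data, 2) -> array([5., 2., 5., 2.])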
It seems that, basically, you are finding local minima in a sliding window, where the window slides such that the end of the previous window acts as the start of the next one. For such a specific problem, suggested in this solution is a vectorized approach that uses broadcasting -
import numpy as np

# Inputs
N = 2
samples = [5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2]

# Convert the input list to a NumPy array
S = np.asarray(samples)

# Calculate the number of Infs to be appended at the end
append_endlen = int(2*N*np.ceil((S.size+1)/(2*N)) - 1 - S.size)

# Append Infs at the start and end of the input array
S1 = np.concatenate((np.repeat(np.inf, N), S, np.repeat(np.inf, append_endlen)), 0)

# Number of sliding windows
num_windows = int((S1.size-1)/(2*N))

# Gather windowed values from the input array into rows,
# then take the minimum of each row to get the desired local minima
indexed_vals = S1[np.arange(num_windows)[:, None]*2*N + np.arange(2*N+1)]
out = indexed_vals.min(1)
Sample runs
Run # 1: Original input data
In [105]: S # Input array
Out[105]:
array([ 5. , 12.3, 12.3, 7. , 2. , 6. , 9. , 10. , 5. ,
9. , 17. , 2. ])
In [106]: N # Window radius
Out[106]: 2
In [107]: out # Output array
Out[107]: array([ 5., 2., 5., 2.])
Run # 2: Modified input data, Window radius = 2
In [101]: S # Input array
Out[101]:
array([ 5. , 12.3, 12.3, 7. , 2. , 6. , 9. , 10. , 5. ,
9. , 17. , 2. , 0. , -3. , 7. , 99. , 1. , 0. ,
-4. , -2. ])
In [102]: N # Window radius
Out[102]: 2
In [103]: out # Output array
Out[103]: array([ 5., 2., 5., -3., -4., -4.])
Run # 3: Modified input data, Window radius = 3
In [97]: S # Input array
Out[97]:
array([ 5. , 12.3, 12.3, 7. , 2. , 6. , 9. , 10. , 5. ,
9. , 17. , 2. , 0. , -3. , 7. , 99. , 1. , 0. ,
-4. , -2. ])
In [98]: N # Window radius
Out[98]: 3
In [99]: out # Output array
Out[99]: array([ 5., 2., -3., -4.])
>>> import numpy as np
>>> a = np.array(samples)
>>> [a[max(i-2, 0):i+2].min() for i in range(1, a.size)]
[5.0, 5.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 2.0, 2.0]
As Divakar pointed out in the comments, this is what a sliding window yields. If you want to remove duplicates, that can be done separately
This will look through each window, find the minimum value, and add it to a list if the window's minimum value isn't equal to the most recently added value.
samples = [5, 12.3, 12.3, 7, 2, 6, 9, 10, 5, 9, 17, 2]
neighborhood = 2
minima = []
for i in range(len(samples)):
    window = samples[max(0, i - neighborhood):i + neighborhood + 1]
    windowMin = min(window)
    if minima == [] or windowMin != minima[-1]:
        minima.append(windowMin)
This gives the output you described:
print(minima)
> [5, 2, 5, 2]
However, #imaluengo's answer is better since it will include both of two consecutive equal minimum values if they have different indices in the original list!
I have an array (2000 x 2000) of floats and I want to classify the numbers.
So all numbers between 10 and 20 should be replaced with 15, and numbers between 20 and 60 should be replaced with 40, and so on.
I wrote something that loops over all the rows and columns with a couple of if statements, but it takes forever to run on large arrays. Does anybody know how to speed things up?
for a in range(grid.shape[0]):  # grid is an array
    for b in range(grid.shape[1]):
        for c in range(len(z)):
            if z[c][0] <= grid[a][b] < z[c][1]:  # z is a list of [lower, upper, replace_value]
                grid[a][b] = z[c][2]
Would something like this work for you?
>>> import numpy as np
>>> grid = np.random.random((5,5)) * 100
>>> z = np.array([0, 10, 20, 60, 100.])
>>> replace_value = np.array([np.nan, 5., 15., 40., 80.])
>>> grid = replace_value[z.searchsorted(grid)]
>>> print(grid)
[[ 15. 40. 80. 80. 15.]
[ 80. 40. 15. 80. 80.]
[ 15. 80. 5. 15. 40.]
[ 40. 80. 5. 5. 80.]
[ 40. 5. 80. 5. 40.]]
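For what it's worth, np.digitize expresses the same lookup and may read more naturally for binning; applied to an unclassified float grid it gives the same result, except possibly for values that fall exactly on a bin edge (digitize and searchsorted handle those boundaries differently by default):
>>> raw = np.random.random((5, 5)) * 100   # a fresh unclassified grid
>>> classified = replace_value[np.digitize(raw, z)]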