What is the best way to create a NumPy array of a given size with values drawn randomly and uniformly between -1 and 1?
I tried 2*np.random.rand(size)-1
I'm not sure. Try:
s = np.random.uniform(-1, 1, size)
reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html
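As a quick check of the two approaches mentioned so far, here is a minimal sketch. It uses the newer default_rng Generator rather than the legacy np.random functions (that choice, the seed, and the names rng, size, a and b are my assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
size = 1000                     # illustrative size

a = rng.uniform(-1, 1, size)    # samples drawn from [-1, 1)
b = 2 * rng.random(size) - 1    # rescale [0, 1) to [-1, 1)

# Both arrays have the requested size and stay within [-1, 1).
print(a.shape, a.min() >= -1, a.max() < 1)
print(b.shape, b.min() >= -1, b.max() < 1)
```

Both forms should give the same distribution; uniform simply states the bounds more directly.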
I can use numpy.arange:
import numpy as np
print(np.arange(start=-1.0, stop=1.0, step=0.2, dtype=float))  # np.float was removed in NumPy 1.24; use float
The step parameter sets the spacing, and therefore the number, of the elements. Note that the result is evenly spaced rather than random.
In your solution, np.random.rand(size) returns random floats in the half-open interval [0.0, 1.0).
This means 2 * np.random.rand(size) - 1 returns numbers in the half-open interval [0, 2) - 1 = [-1, 1), i.e. a range including -1 but not 1.
If this is what you wish to do then it is okay.
But, if you wish to generate numbers in the open interval (-1, 1), i.e. between -1 and 1 and hence not including either -1 or 1, may I suggest the following -
from numpy.random import default_rng
rg = default_rng(2)
size = (5,5)
rand_arr = rg.random(size)
rand_signs = rg.choice([-1,1], size)
rand_arr = rand_arr * rand_signs
print(rand_arr)
I have used the Generator API that NumPy now recommends; see https://numpy.org/devdocs/reference/random/index.html#quick-start
100% working Code:
a = np.random.uniform(-1,1)
print(a)
I have an array of magnetometer data with artifacts every two hours due to power cycling.
I'd like to replace those indices with NaN so that the length of the array is preserved.
Here's a code example, adapted from https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html.
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime
# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
'sp_phys',
'THG_L2_MAG_'+ 'PG2',
start,
end,
['thg_mag_'+ 'pg2']
)
x =data['UT']
y =data['VERTICAL_DOWN_-_Z']
def reject_outliers(y): # y is the data in a 1D numpy array
    n = 5 # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = [x for x in y if (x > mean - 2 * sd)]
    final_list = [x for x in final_list if (x < mean + 2 * sd)]
    return final_list
px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
px.line(y=y, x=x)
# px.scatter(y) # It looks like the outliers are successfully dropped.
# px.line(y=reject_outliers(y), x=x) # This is the line I'd like to see work.
When I run 'px.scatter(reject_outliers(y))', it looks like the outliers are successfully getting dropped:
...but that's looking at the culled y vector relative to the index, rather than the datetime vector x as in the above plot. As the debugging text indicates, the vector is shortened because the outlier values are dropped rather than replaced.
How can I edit my `reject_outliers()` function to set those values to NaN, or to adjacent values, so that the length of the array stays the same and I can plot my data?
Use else in the list comprehension along the lines of:
[x if x_condition else other_value for x in y]
Got a less compact version to work. Full code:
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime
# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
'sp_phys',
'THG_L2_MAG_'+ 'PG2',
start,
end,
['thg_mag_'+ 'pg2']
)
x =data['UT']
y =data['VERTICAL_DOWN_-_Z']
def reject_outliers(y): # y is the data in a 1D numpy array
    mean = np.mean(y)
    sd = np.std(y)
    final_list = np.copy(y)
    for n in range(len(y)):
        final_list[n] = y[n] if y[n] > mean - 5 * sd else np.nan
        final_list[n] = final_list[n] if final_list[n] < mean + 5 * sd else np.nan
    return final_list
px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
# px.line(y=y, x=x)
px.line(y=reject_outliers(y), x=x) # This is the line I wanted to get working - check!
More compact answer, sent via email by a friend:
In numpy you can select/index based on a Boolean array, and then make assignment with it:
def reject_outliers(y): # y is the data in a 1D numpy array
    n = 5 # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = y.copy()
    final_list[np.abs(y - mean) > n * sd] = np.nan
    return final_list
I also noticed that you didn’t use the value of n in your example code.
Alternatively, you can use the np.where function (https://numpy.org/doc/stable/reference/generated/numpy.where.html):
np.where(np.abs(y - mean) > n * sd, np.nan, y)
You don’t need the .copy() if you don’t mind modifying the input array.
Replace np.mean and np.std with np.nanmean and np.nanstd if you want the function to work on arrays that already contain nans, i.e. if you want to use this function recursively.
The answer about using if else in a list comprehension would work, but avoiding the list comprehension makes the function much faster if the arrays are large.
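Putting the suggestions above together, here is a minimal sketch on synthetic data. The spike values, the seed, and the n=5 default are assumptions for illustration, not the real magnetometer data. It uses np.where with the nan-aware statistics so the function can be applied more than once.

```python
import numpy as np

def reject_outliers(y, n=5):
    """Return a copy of y with values more than n std devs from the mean set to NaN."""
    y = np.asarray(y, dtype=float)      # NaN requires a float array
    mean = np.nanmean(y)                # nan-aware, so repeated calls work
    sd = np.nanstd(y)
    return np.where(np.abs(y - mean) > n * sd, np.nan, y)

# Synthetic stand-in for the magnetometer data (hypothetical values).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 1000)
y[::100] = 50.0                         # ten artificial spikes

cleaned = reject_outliers(y)
print(len(y), len(cleaned))             # the length is preserved
print(int(np.isnan(cleaned).sum()))     # number of values replaced by NaN
```

Because the output has the same length as the input, it can be plotted directly against the original time vector.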
I'd like to sample n random numbers from a linspace without replacement and do so in batches. Thus, each sample in the batch should not have repeated numbers, but numbers may repeat across the batch.
The following code shows how I do it by calling Generator.choice repeatedly.
import numpy as np
low, high = 0, 10
sample_shape = (3,)
n = 5
rng = np.random.default_rng() # or previously instantiated RNG
space = np.linspace(start=low, stop=high, num=1000)
samples = np.stack(
[
rng.choice(space, size=n, replace=False)
for _ in range(np.prod(sample_shape, dtype=int))
]
)
samples = samples.reshape(sample_shape + (n,))
print(f"samples.shape: {samples.shape}")
print(samples)
Current output:
samples.shape: (3, 5)
[[4.15415415 5.56556557 1.38138138 7.78778779 7.03703704]
[1.48148148 6.996997 0.91091091 3.28328328 2.93293293]
[7.82782783 9.65965966 9.94994995 5.84584585 5.26526527]]
However, this procedure turns out to be a big bottleneck in my code. Is there a more efficient way of performing this?
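One possible vectorized alternative, offered as a sketch rather than a guaranteed speed-up: draw one row of random keys per sample and argsort each row, which gives an independent random permutation of the indices per row. Taking the first n columns then yields n distinct indices per sample. The names batch, keys and idx are mine, and whether this beats the loop depends on the batch size and len(space), so it should be measured.

```python
import numpy as np

low, high = 0, 10
sample_shape = (3,)
n = 5

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
space = np.linspace(start=low, stop=high, num=1000)
batch = int(np.prod(sample_shape, dtype=int))

# One row of random keys per sample; argsort turns each row into a
# uniformly random permutation of the indices 0..space.size-1.
keys = rng.random((batch, space.size))
idx = np.argsort(keys, axis=1)[:, :n]   # first n indices of each permutation

samples = space[idx].reshape(sample_shape + (n,))
print(f"samples.shape: {samples.shape}")
print(samples)
```

Values can still repeat across rows, but never within a row, because each row comes from a single permutation.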
I want to calculate root mean square of a function in Python. My function is in a simple form like y = f(x). x and y are arrays.
I tried Numpy and Scipy Docs and couldn't find anything.
I'm going to assume that you want to compute the expression given by the following pseudocode:
ms = 0
for i = 1 ... N
ms = ms + y[i]^2
ms = ms / N
rms = sqrt(ms)
i.e. the square root of the mean of the squared values of elements of y.
In numpy, you can simply square y, take its mean and then its square root as follows:
rms = np.sqrt(np.mean(y**2))
So, for example:
>>> y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 1]) # Six 1's
>>> y.size
10
>>> np.mean(y**2)
0.59999999999999998
>>> np.sqrt(np.mean(y**2))
0.7745966692414834
Do clarify your question if you mean to ask something else.
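You could use scikit-learn's mean_squared_error with squared=False (newer scikit-learn releases deprecate this argument in favor of root_mean_squared_error):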
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_actual,[0 for _ in y_actual], squared=False)
numpy.std(x) approaches rms(x) as mean(x) approaches 0 (thanks to @Seb), as is often the case for sound recordings, vibrations, and other signals that fluctuate around zero.
rms = lambda x_seq: (sum(x*x for x in x_seq)/len(x_seq))**(1/2)
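The relation between the two can be checked numerically. The sketch below assumes an illustrative signal with a non-zero mean and uses NumPy's default population standard deviation (ddof=0), for which rms**2 equals std**2 + mean**2.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=10_000)  # illustrative signal, non-zero mean

rms = np.sqrt(np.mean(x**2))
std = np.std(x)          # population standard deviation (ddof=0)
mean = np.mean(x)

# rms**2 == std**2 + mean**2, so rms and std agree when the mean is zero.
print(np.isclose(rms**2, std**2 + mean**2))
# Equivalently, std is the rms of the mean-removed signal.
print(np.isclose(np.sqrt(np.mean((x - mean)**2)), std))
```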
In case you'd like to split your array into frames before computing the RMS, this is a numpy solution:
nframes = 1000
rms = np.array([
np.sqrt(np.mean(arr**2))
for arr in np.array_split(arr,nframes)
])
If you'd like to specify frame length instead of frame counts, you'd do this first:
frame_length = 200
arr_length = arr.shape[0]
nframes = arr_length // frame_length +1
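Note that np.array_split produces frames of nearly equal size, so the frame lengths only approximate frame_length. If exact frame lengths matter, one option is to reshape instead. The helper name framed_rms is hypothetical, and dropping the trailing partial frame is a design choice, not the only reasonable one.

```python
import numpy as np

def framed_rms(arr, frame_length):
    """RMS of consecutive, non-overlapping frames of exactly frame_length samples."""
    arr = np.asarray(arr, dtype=float)
    nframes = arr.shape[0] // frame_length           # whole frames only
    # Drop any trailing partial frame, then view the rest as (nframes, frame_length).
    frames = arr[:nframes * frame_length].reshape(nframes, frame_length)
    return np.sqrt(np.mean(frames**2, axis=1))

print(framed_rms(np.arange(10), 5))
```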
I have an image in a NumPy array of shape (3, height, width) and I want to create an array of subimage views. I know exactly how many subimages I will have and can create each one in a loop.
That's how I did it:
result_array = np.empty(
shape=(
int((res_img.shape[WIDTH] - SUB_IMG_WIDTH + 1) / step * (
res_img.shape[HEIGHT] - SUB_IMG_HEIGHT + 1) / step),
SUB_IMG_LAYERS, SUB_IMG_HEIGHT, SUB_IMG_WIDTH),
dtype=np.dtype(float))
for i in range(0, img.shape[WIDTH] - sub_img_shape[WIDTH], step):
    for ii in range(0, img.shape[HEIGHT] - sub_img_shape[HEIGHT], step):
        result_array[index] = img[:, i:i + sub_img_shape[WIDTH], ii:ii + sub_img_shape[HEIGHT]]
But instead of an array of views, I get an array of copies. That's not a problem by itself, since I don't need to modify them, just use them simultaneously on the GPU, but it consumes a terrible amount of memory: my images are about 1000x600 and I have roughly 100,000 subimages, so my array of subimages consumes 3-4 GB of RAM.
I tried to store views in python list, like that:
for i in range(0, img.shape[WIDTH] - sub_img_shape[WIDTH], step):
    for ii in range(0, img.shape[HEIGHT] - sub_img_shape[HEIGHT], step):
        result_array.append(img[:, i:i + sub_img_shape[WIDTH], ii:ii + sub_img_shape[HEIGHT]])
And it worked, but I doubt that it's a good method. Any way I can do this with a numpy array and not a python list?
You can do it using the as_strided function:
import numpy as np
from numpy.lib.stride_tricks import as_strided
N=10
L=4*N
H=3*N
step=5
a=(np.arange(3*H*L)%256).reshape(3,H,L)
(k,j,i)=a.strides
b=as_strided(a,shape=(H//step,L//step,3,step,step),strides=(j*step,i*step,k,j,i))  # // keeps the shape integral on Python 3
b then addresses each block without copying.
In [29]: np.all(b[1,2]==a[:,5:10,10:15])
Out[29]: True
In [30]: a[:,5,10]=0 # modification of a
In [31]: np.all(b[1,2]==a[:,5:10,10:15])
Out[31]: True # b also modified
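For overlapping subimages taken every step pixels, as in the question, a possibly safer alternative to hand-written strides is sliding_window_view (available in NumPy 1.20 and later). This is a sketch with made-up image and window sizes; the names sub_h, sub_w and windows are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(3 * 30 * 40, dtype=float).reshape(3, 30, 40)  # (layers, height, width)
sub_h, sub_w, step = 8, 8, 4                                   # illustrative sizes

# Windows over the height and width axes, then keep every step-th window.
windows = sliding_window_view(img, (sub_h, sub_w), axis=(1, 2))[:, ::step, ::step]
# Reorder to (rows, cols, layers, sub_h, sub_w); this is still a view, not a copy.
windows = windows.transpose(1, 2, 0, 3, 4)

print(windows.shape)
print(np.shares_memory(windows, img))
print(np.array_equal(windows[1, 2], img[:, 4:12, 8:16]))
```

The view is read-only by default. Reshaping it into a flat (N, 3, sub_h, sub_w) array would generally force a copy, so it is best indexed as a grid.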
With numpy or scipy, is there any existing method that will return the endpoints of an interval which contains a specified percent of the values in a 1D array? I realize that this is simple to write myself, but it seems like the kind of thing that might be built in, although I can't find it.
E.g:
>>> import numpy as np
>>> x = np.random.randn(100000)
>>> print(np.bounding_interval(x, 0.68))
Would give approximately (-1, 1)
You can use np.percentile:
In [29]: x = np.random.randn(100000)
In [30]: p = 0.68
In [31]: lo = 50*(1 - p)
In [32]: hi = 50*(1 + p)
In [33]: np.percentile(x, [lo, hi])
Out[33]: array([-0.99206523, 1.0006089 ])
There is also scipy.stats.scoreatpercentile:
In [34]: scoreatpercentile(x, [lo, hi])
Out[34]: array([-0.99206523, 1.0006089 ])
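Wrapped up as a function matching the call in the question, the percentile approach might look like the sketch below. The name bounding_interval comes from the question's hypothetical np.bounding_interval; it is not an existing NumPy function, and the interval is assumed to be central (equal tails on each side).

```python
import numpy as np

def bounding_interval(x, p):
    """Central interval containing a fraction p of the values in x."""
    lo = 50 * (1 - p)    # lower percentile
    hi = 50 * (1 + p)    # upper percentile
    lower, upper = np.percentile(x, [lo, hi])
    return lower, upper

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
print(bounding_interval(x, 0.68))   # roughly (-1, 1) for standard normal data
```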
I don't know of a built-in function to do it, but you can write one using the math package to specify approximate indices like this:
from __future__ import division
import math
import numpy as np
def bound_interval(arr_in, interval):
    lhs = (1 - interval) / 2 # Specify left-hand side chunk to exclude
    rhs = 1 - lhs # and the right-hand side
    sorted_arr = np.sort(arr_in) # renamed to avoid shadowing the built-in sorted()
    lower = sorted_arr[int(math.floor(lhs * len(arr_in)))] # use floor to get index
    upper = sorted_arr[int(math.floor(rhs * len(arr_in)))]
    return (lower, upper)
On your specified array, I got the interval (-0.99072237819851039, 0.98691691784955549). Pretty close to (-1, 1)!