Python - generate random numbers in a ratio

I need a script that returns a list of random numbers from the range (-100, +100) with a positive/negative ratio of 2:1. My current code returns an arbitrary ratio:
import numpy as np

x = []
for _ in range(10):
    y = np.random.randint(-100, 100)
    x.append(y)
print(x)

import numpy as np

neg = np.random.randint(-100, 0, 10)  # 10 negative values in [-100, -1]
poz = np.random.randint(1, 101, 20)   # 20 positive values in [1, 100]
res = np.concatenate((neg, poz), axis=0)
print(res)
np.random.shuffle(res)  # if you need to mix the order
print(res)
This is one option.
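A hedged alternative sketch, not part of the original answer: draw signs and magnitudes separately with np.random.default_rng, so the 2:1 positive/negative split holds in expectation rather than by exact count:

import numpy as np

rng = np.random.default_rng()
# P(positive) = 2/3, P(negative) = 1/3; magnitudes drawn from [1, 100].
signs = rng.choice([1, -1], size=30, p=[2/3, 1/3])
magnitudes = rng.integers(1, 101, size=30)
res = signs * magnitudes
print(res)

If the ratio must be exact rather than in expectation, stick with the concatenate-and-shuffle approach above.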


Replace outlier values with NaN in numpy? (preserve length of array)

I have an array of magnetometer data with artifacts every two hours due to power cycling.
I'd like to replace those indices with NaN so that the length of the array is preserved.
Here's a code example, adapted from https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html.
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime

# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
    'sp_phys',
    'THG_L2_MAG_' + 'PG2',
    start,
    end,
    ['thg_mag_' + 'pg2']
)
x = data['UT']
y = data['VERTICAL_DOWN_-_Z']

def reject_outliers(y):  # y is the data in a 1D numpy array
    n = 5  # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = [x for x in y if (x > mean - 2 * sd)]
    final_list = [x for x in final_list if (x < mean + 2 * sd)]
    return final_list

px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
px.line(y=y, x=x)
# px.scatter(y)  # It looks like the outliers are successfully dropped.
# px.line(y=reject_outliers(y), x=x)  # This is the line I'd like to see work.
When I run px.scatter(reject_outliers(y)), it looks like the outliers are successfully being dropped, but that plot shows the culled y vector against its index rather than against the datetime vector x as in the plot above. As the debugging output indicates, the vector is shortened because the outlier values are dropped rather than replaced.
How can I edit my reject_outliers() function to assign those values to NaN, or to adjacent values, so that the length of the array stays the same and I can plot my data?
Use else in the list comprehension along the lines of:
[x if x_condition else other_value for x in y]
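For instance, a minimal sketch applying that pattern to the question's function (keeping the 5-standard-deviation cutoff, and assuming numpy is imported as np):

def reject_outliers(y):  # y is a 1D numpy array
    n = 5  # cutoff in standard deviations
    mean = np.mean(y)
    sd = np.std(y)
    # Keep in-range values; replace out-of-range values with NaN:
    return [v if abs(v - mean) < n * sd else np.nan for v in y]

The returned list has the same length as y, so it can be plotted against x.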
Got a less compact version to work. Full code:
import numpy as np
import plotly.express as px
# For pulling data from CDAweb:
from ai import cdas
import datetime

# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
    'sp_phys',
    'THG_L2_MAG_' + 'PG2',
    start,
    end,
    ['thg_mag_' + 'pg2']
)
x = data['UT']
y = data['VERTICAL_DOWN_-_Z']

def reject_outliers(y):  # y is the data in a 1D numpy array
    mean = np.mean(y)
    sd = np.std(y)
    final_list = np.copy(y)
    for n in range(len(y)):
        final_list[n] = y[n] if y[n] > mean - 5 * sd else np.nan
        final_list[n] = final_list[n] if final_list[n] < mean + 5 * sd else np.nan
    return final_list

px.scatter(reject_outliers(y))
print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
# px.line(y=y, x=x)
px.line(y=reject_outliers(y), x=x)  # This is the line I wanted to get working - check!
A more compact answer, sent via email by a friend:
In numpy you can select/index based on a Boolean array and then assign through it:
def reject_outliers(y):  # y is the data in a 1D numpy array
    n = 5  # 5 std deviations
    mean = np.mean(y)
    sd = np.std(y)
    final_list = y.copy()
    final_list[np.abs(y - mean) > n * sd] = np.nan
    return final_list
I also noticed that you didn’t use the value of n in your example code.
Alternatively, you can use np.where (https://numpy.org/doc/stable/reference/generated/numpy.where.html):
np.where(np.abs(y - mean) > n * sd, np.nan, y)
You don’t need the .copy() if you don’t mind modifying the input array.
Replace np.mean and np.std with np.nanmean and np.nanstd if you want the function to work on arrays that already contain NaNs, i.e. if you want to apply the function to its own output.
The answer about using if else in a list comprehension would work, but avoiding the list comprehension makes the function much faster if the arrays are large.
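Putting those suggestions together, a sketch of the np.where variant that also tolerates pre-existing NaNs (assuming numpy is imported as np):

def reject_outliers(y, n=5):  # y is a 1D numpy array; n is the cutoff in std deviations
    mean = np.nanmean(y)  # NaN-aware, so already-masked values are ignored
    sd = np.nanstd(y)
    return np.where(np.abs(y - mean) > n * sd, np.nan, y)

np.where builds a new array, so the input is left unmodified.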

ValueError: Must pass 2-d input. shape=(1, 50, 2)

I want to make a list of 50 random numbers from one to one hundred, take the square and the square root of each, and put them into a pandas DataFrame (the random numbers as the row index and the two derived lists as the columns).
My code looks like this:
import random
import math
import numpy as np
import pandas as pd
y = random.sample(range(1, 100), 50)
y1 = [f**2 for f in y]
y2 = [round(math.sqrt(f2)) for f2 in y]
whole = y + y1 + y2
whole2 = (np.array(whole)).reshape((50,3))
df = pd.DataFrame([whole2[:,1:]], index=whole2[:,0], columns=['Number', 'Square','Square Root'])
I would appreciate it if someone could tell me where and how I went wrong. Thanks!
Your logic is a little off here: concatenating y + y1 + y2 and reshaping to (50, 3) interleaves values from different lists into each row, and wrapping the slice in another list passes DataFrame a 3-D input of shape (1, 50, 2), hence the ValueError in the title. I think np.stack is a little more intuitive here, and using y as the index means the random numbers appear only once:
import random
import math
import numpy as np
import pandas as pd
y = random.sample(range(1, 100), 50)
y1 = [f**2 for f in y]
y2 = [round(math.sqrt(f2)) for f2 in y]
whole = np.stack([y1, y2], axis=-1)
df = pd.DataFrame(whole, index=y, columns=['Square','Square Root'])
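As a design note, a dict constructor gives an equivalent frame without the stacking step (a sketch using the same y, y1, y2):

df = pd.DataFrame({'Square': y1, 'Square Root': y2}, index=y)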

Creating a vector of values based on a test using a for loop

This feels like it should be a simple problem, but I am newer to Python; in R I would use a foreach loop, which gives me an option to combine the results.
I have tried a for loop that lets me print out all the values I need, but I want them collected into a vector of values that I can use later.
from scipy.stats import gamma
import scipy.stats as stats
import numpy as np
import random

data2 = np.random.gamma(1, 2, size=500)
gammT = np.log(data2 + 1)
mean = np.mean(gammT)
sd = np.std(gammT)
a = (mean / sd)**2
b = (sd**2) / mean
for i in range(1, 100):
    gammT = random.sample(list(gammT), 500)
    gamm = np.random.gamma(a, b, size=len(gammT))
    s = stats.anderson_ksamp([gammT, gamm])
    s = s[2]
    print(s)
So I am able to print all the values I want, but I want them gathered together in a vector of values. I have tried to append and make lists but have not been able to get them together.
import scipy.stats as stats
import numpy as np
import random

data2 = np.random.gamma(1, 2, size=500)
gammT = np.log(data2 + 1)
mean = np.mean(gammT)
sd = np.std(gammT)
a = (mean / sd)**2
b = (sd**2) / mean
# Initialize an empty list to collect the statistics:
result = []
for i in range(100):
    # Removed (1, 100): you only need range(100) for 100 elements.
    gammT = random.sample(list(gammT), 500)
    gamm = np.random.gamma(a, b, size=len(gammT))
    s = stats.anderson_ksamp([gammT, gamm])
    s = s[2]  # the significance level of the test
    # Append each calculation to the list:
    result.append(s)
    print(s)
print(result)
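If you want a numpy vector directly rather than a Python list, a sketch using a preallocated array (same setup as above):

result = np.empty(100)
for i in range(100):
    resampled = random.sample(list(gammT), 500)
    gamm = np.random.gamma(a, b, size=len(resampled))
    result[i] = stats.anderson_ksamp([resampled, gamm])[2]
print(result)

This plays the role of foreach's combine in R: each iteration writes its statistic into one slot of the result vector.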

Use a more accurate array of x values to generate line of best fit in matplotlib?

I am currently stuck on a problem where I am required to generate a curve of best fit using a more precise x array from 250 to 1000 in steps of 10. Here is my code so far:
import numpy as np
from numpy import polyfit, polyval
import matplotlib.pyplot as plt

x = [250, 300, 350, 400, 450, 500, 550, 600, 700, 750, 800, 900, 1000]
x = np.array(x)
y = [0.791, 0.846, 0.895, 0.939, 0.978, 1.014, 1.046, 1.075, 1.102, 1.148, 1.169, 1.204, 1.234]
y = np.array(y)
r = polyfit(x, y, 3)
fit = polyval(r, x)
plt.plot(x, fit, 'b')
plt.plot(x, y, color='r', marker='x')
plt.show()
If I understand correctly, you are trying to create an array of numbers from a to b in steps of c.
With pure Python you can use:
list(range(a, b, c))  # in your case list(range(250, 1001, 10)); the stop value is exclusive, so 1001 includes 1000
Or, since you are using numpy, you can directly make the numpy array:
np.arange(a, b, c)
To create an array in steps you can use numpy.arange([start,] stop[, step]):
import numpy as np
x = np.arange(250, 1001, 10)  # use 1001 so the range includes 1000
To generate values from 250-1000, use range(start, stop, step):
x = range(250,1001,10)
x = np.array(x)
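Putting the pieces together, a sketch that fits the cubic to the measured points and evaluates it on the finer grid (same data as in the question):

import numpy as np
from numpy import polyfit, polyval
import matplotlib.pyplot as plt

x = np.array([250, 300, 350, 400, 450, 500, 550, 600, 700, 750, 800, 900, 1000])
y = np.array([0.791, 0.846, 0.895, 0.939, 0.978, 1.014, 1.046, 1.075, 1.102, 1.148, 1.169, 1.204, 1.234])

r = polyfit(x, y, 3)           # fit a cubic to the measured points
xs = np.arange(250, 1001, 10)  # finer grid, 250 to 1000 in steps of 10
plt.plot(xs, polyval(r, xs), 'b')                        # smooth fitted curve
plt.plot(x, y, color='r', marker='x', linestyle='none')  # original data points
plt.show()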

Calculating the t-value using numpy

As part of an exercise, I needed to check whether a given sample's true mean is 1.75 by computing the t-value with numpy and comparing it with the output from scipy.
Code:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
np.random.seed(seed=42) # make example reproducible
n = 100
x = np.random.normal(loc=1.78, scale=.1, size=n) # the sample is here
tval, pval = stats.ttest_1samp(x, 1.75)
var_x = x.var(ddof=1)
std_x = np.sqrt(var_x)
tval1 = (x.mean() - 1.75)/(std_x*np.sqrt(n))
print("Scipy: ",tval,"\nNumpy: ",tval1)
The output from scipy is 2.1598800019529265, while the output from numpy is 0.021598800019529265.
I guess the logic I used is incorrect; please advise.
You made a mistake in the denominator. It should be
tval1 = (x.mean() - 1.75) / (std_x / np.sqrt(n))  # std_x divided by root n
That's why there is a factor of 100 difference between your scipy and numpy outputs: multiplying by sqrt(n) = 10 instead of dividing by it scales the result by (1/10)/10 = 1/100.
Here is the Wikipedia article on Student's t-test.
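For reference, the one-sample t-statistic being computed here is

t = (x̄ − μ₀) / (s / √n)

where x̄ is the sample mean, μ₀ = 1.75 is the hypothesized mean, s is the sample standard deviation (with ddof=1), and n is the sample size.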
An example using another sample size:
np.random.seed(seed=42)
n = 369
x = np.random.normal(loc=1.78, scale=.1, size=n) # the sample is here
tval, pval = stats.ttest_1samp(x, 1.75)
var_x = x.var(ddof=1)
std_x = np.sqrt(var_x)
tval1 = (x.mean() - 1.75)/(std_x / np.sqrt(n))
print("Scipy: ",tval,"\nNumpy: ",tval1)
# Output:
# Scipy: 6.306500305262841
# Numpy: 6.306500305262841
