I'm trying to figure out how to implement a function in Python to generate arrays like this (please ignore the fact that every entry is the same number):
[[0101111010, 0101111010, 0101111010, 0101111010, 0101111010, 0101111010, 0101111010],
[0101111010, 0101111010, 0101111010, 0101111010, 0101111010, 0101111010, 0101111010]]
I wrote this code, but I don't know if it's the best approach:
import numpy as np
import random
sol_per_pop = 20
num_genes = 10
pop_size = (sol_per_pop,num_genes)
population = np.random.randint(2, size=pop_size)
print(population)
I don't want to use strings. I want to find the best solution. Thank you!
I don't really see what this would be useful for. But it might still be fun.
import random
# a random 64-bit pattern, re-read as a base-10 integer made of 0s and 1s
int("{:b}".format(random.randint(0, (1 << 64) - 1)))
Or:
import random
r = random.randint(0, (1 << 64) - 1)
# place bit i at the 10**i decimal position
sum(10**i * ((r >> i) & 1) for i in range(64))
Or, if we must use numpy:
import numpy as np
# 10**0, 10**1, ..., 10**63 (floats, so only the first ~15-16 digits stay exact)
significance = 10**(np.cumsum(np.ones(64)) - 1)
np.sum(significance * np.random.default_rng().integers(0, 2, 64))
Yet another idea:
import numpy as np
rng = np.random.default_rng()
significance = 10**(np.cumsum(np.ones(64)) - 1)
result = np.zeros(64)
# set a random number of randomly chosen digit positions to 1
result[rng.choice(range(64), rng.integers(0, 64))] = 1
np.sum(significance * result)
As you can see, there are many approaches to solving the same problem.
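To tie this back to the shape asked about in the question, here is a minimal sketch that builds a whole 2D array of such numbers at once (the 10-digit length and the sol_per_pop/num_genes values are just taken from the question's code):
import numpy as np

rng = np.random.default_rng()
sol_per_pop, num_genes = 20, 10          # shape from the question's pop_size
digits = 10                              # digits in each binary-looking number

# draw 0/1 digits, then collapse the last axis into base-10 integers
bits = rng.integers(0, 2, size=(sol_per_pop, num_genes, digits))
significance = 10 ** np.arange(digits)   # 1, 10, ..., 10**9, safely within int64
population = np.sum(bits * significance, axis=-1)
print(population)
Note that, as integers, leading zeros are dropped (0101111010 becomes 101111010), which is unavoidable without using strings.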
While I'm able to get the population standard deviation for a 1D list in Python, I'm unable to apply statistics.pstdev() to a 2D array, but I believe it is something as simple as modifying the function:
def StdDevArr(ArrayList):
    output = [pstdev(elem) for elem in zip(*ArrayList)]
    return output
to work with concatenated 2D arrays (concatenated confusion matrices) instead of concatenated 1D arrays, though I'm not sure how to do it. I am also unsure whether np.std(ddof=0) is the same: the author of a similar question seems to think so, as does the answer here, but a comment on that answer says they are not equal, so I'm still unsure. In any case, it would be useful to know how to do this with pstdev, since I use it in other parts of my actual code, though if you know for sure whether pstdev() == std(ddof=0), that would be helpful as well.
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from numpy import random
from statistics import pstdev
def randomboolean(size):
    bitarray = random.randint(2, size=size)  # size random 0/1 values
    return bitarray

def StdDevArr(ArrayList):
    output = [pstdev(elem) for elem in zip(*ArrayList)]
    return output
size = 10
NumberofTrials = 10
ConfMatArr = []
for i in range(NumberofTrials):
    data = {'y_Actual': randomboolean(size),
            'y_Predicted': randomboolean(size)
            }
    df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
    confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'], margins=True)
    ConfMatArr.append(confusion_matrix)
cm_concat = pd.concat(ConfMatArr)
cm_group = cm_concat.groupby(cm_concat.index)
cm_groupmean = cm_group.mean()
# cm_popstd = cm_group.std(ddof=0)
cm_popstd = StdDevArr(cm_group)
sn.heatmap(cm_groupmean, annot=True)
plt.show()
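For the pstdev() == std(ddof=0) part specifically, here is a minimal check on a plain 1D list (a sketch only; it does not touch the confusion-matrix case):
import numpy as np
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(pstdev(data))            # population standard deviation: 2.0
print(np.std(data, ddof=0))    # ddof=0 divides by N, the population form: 2.0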
I am trying to learn a bit of signal processing, specifically using Python. Here's some sample code I wrote.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import deconvolve
a = np.linspace(-1,1,50)
b = np.linspace(-1,1,50)**2
c = np.convolve(a,b,mode='same')
quotient,remainder = deconvolve(c,b)
plt.plot(a/max(a),"g")
plt.plot(b/max(b),"r")
plt.plot(c/max(c),"b")
plt.plot(remainder/max(remainder),"k")
#plt.plot(quotient/max(quotient),"k")
plt.legend(['a_original','b_original','convolution_a_b','deconvolution_a_b'])
In my understanding, deconvolving the convolved array should return exactly the same array 'a', since I am using 'b' as the filter. This is clearly not the case, as seen from the plots below.
I am not really sure if my mathematical understanding of deconvolution is wrong or if there is something wrong with the code. Any help is greatly appreciated!
You are using mode='same', which is not compatible with scipy's deconvolve: deconvolve performs a polynomial long division, so it needs the full convolution, and mode='same' drops the edge samples that the division relies on. Try mode='full'; it should work much better.
Here is the corrected code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import deconvolve
a = np.linspace(-1,1,50)
b = np.linspace(-1,1,50)**2
c = np.convolve(a,b,mode='full')
quotient,remainder = deconvolve(c,b)
plt.plot(a,"g")
plt.plot(b,"r")
plt.plot(c,"b")
plt.plot(quotient,"k")
plt.xlim(0,50)
plt.ylim(-6,2)
plt.legend(['a_original','b_original','convolution_a_b','deconvolution_c_b'])
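As a quick sanity check of the recovery (a sketch; how small the residual is depends on floating-point round-off in the polynomial division):
import numpy as np
from scipy.signal import deconvolve

a = np.linspace(-1,1,50)
b = np.linspace(-1,1,50)**2
c = np.convolve(a,b,mode='full')        # full convolution: len(a) + len(b) - 1 samples
quotient, remainder = deconvolve(c, b)  # polynomial long division of c by b

print(quotient.shape, a.shape)          # both (50,): the quotient has the length of a
print(np.max(np.abs(quotient - a)))     # expected to be small, i.e. a is recovered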
I am trying to create a function that will take a NumPy distribution (dstr) name as an argument and plot a histogram of random data points from that distribution.
If it only works on NumPy distributions that require one argument, that is okay. I'm just really stuck trying to construct the np.random.distribution() call...
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
#Define a function (Fnc) that produces random numpy distributions (dstr)
#Fnc args: [npy dstr name as lst of str], [num of data pts]
def get_rand_dstr(dstr_name):
    npdstr = dstr_name
    dstr = np.random.npdstr(user.input("How many datapoints?"))
    #here pass each dstr from dstr_name through for loop
    #for loop will prompt user for required args of dstr (nbr of desired datapoints)
    return plt.hist(df)
get_rand_dstr('chisquare')
Use this code; it might help you:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
def get_rand_dstr(dstr_name):
    # npdstr = dstr_name
    # to use any distribution you need to manipulate the string here,
    # because the number of args differs between distributions
    dstr = 'np.random.{}({})'.format(dstr_name, (input("How many datapoints?")))
    print(dstr)
    df = eval(dstr)
    print(df)
    # dstr1 = np.random.chisquare(int(input("How many datapoints?")))
    # print(dstr1)
    return plt.hist(df)
# get_rand_dstr('geometric')
get_rand_dstr('chisquare')
The accepted answer is incorrect and does not work. The problem is that NumPy's random distributions take different required arguments, so it's a little fiddly to pass size to all of them because it's a kwarg. (That's why the example in the accepted solution returns the wrong number of samples — only 1, not the 5 that were requested. That's because the first argument for chisquare is df, not size.)
It's common to want to invoke functions by name. Besides not working, the accepted answer uses eval(), which is a commonly suggested solution to this problem but is generally accepted to be a bad idea, for various reasons.
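To see the wrong-number-of-samples problem concretely, compare what the eval-based call effectively does with what was intended (the df=1 below is an arbitrary assumption, as in the dictionary example that follows):
import numpy as np

print(np.random.chisquare(5))             # one sample with df=5, not 5 samples
print(np.random.chisquare(df=1, size=5))  # 5 samples, which is what was wanted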
A better way to achieve what you want is to define a dictionary that maps strings representing the names of functions to the functions themselves. For example:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
DISTRIBUTIONS = {
    'standard_cauchy': np.random.standard_cauchy,
    'standard_exponential': np.random.standard_exponential,
    'standard_normal': np.random.standard_normal,
    'chisquare': lambda size: np.random.chisquare(df=1, size=size),
}

def get_rand_dstr(dstr_name):
    npdstr = DISTRIBUTIONS[dstr_name]
    size = int(input("How many datapoints?"))
    dstr = npdstr(size=size)
    return plt.hist(dstr)
get_rand_dstr('chisquare')
This works fine — for the functions I made keys for. You could make more — there are 35 I think — but the problem is that they don't all have the same API. In other words, you can't call them all just with size as an argument. For example, np.random.chisquare() requires the parameter df or 'degrees of freedom'. Other functions require other things. You could make assumptions about those things and wrap all of the function calls (like I did, above, for chisquare)... if that's what you want to do?
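For instance, extending the same DISTRIBUTIONS dictionary to a two-parameter distribution just means fixing the extra parameters in the wrapper (the n=10 and p=0.5 here are arbitrary assumptions, exactly like the df=1 used for chisquare above):
# builds on the DISTRIBUTIONS dict and get_rand_dstr defined above
DISTRIBUTIONS['binomial'] = lambda size: np.random.binomial(n=10, p=0.5, size=size)

get_rand_dstr('binomial')   # dispatches like any other key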
I wish to perform a Fourier transform of the function 'stress' from 0 to infinity and extract the real and imaginary parts. I have the following code that does it using a numerical integration technique:
import numpy as np
from scipy.integrate import trapz
import fileinput
import sys,string
window = 200000 # length of the array I wish to transform (number of data points)
time = np.linspace(1,window,window)
freq = np.logspace(-5,2,window)
output = [0]*len(freq)
for index,f in enumerate(freq):
    # stress is the input data array (defined elsewhere); integrate stress(t)*exp(-1j*f*t) dt
    visco = trapz(stress*np.exp(-1j*f*time),time)
    soln = visco*(1j*f)
    output[index] = soln
print 'f storage loss'
for i in range(len(freq)):
    print freq[i],output[i].real,output[i].imag
This gives me a nice transformation of my input data.
Now I have an array of size 2x10^6, and using the above technique is not feasible (the computation time scales as O(N^2)), so I have turned to the built-in fft function in numpy.
There aren't too many arguments that you can specify to change this function, and so I'm finding it difficult to customize it to my needs.
So far I have
import numpy as np
import fileinput
import sys, string
np.set_printoptions(threshold='nan')
N = len(stress)
fvi = np.fft.fft(stress,n=N)
gprime = fvi.real
gdoubleprime = fvi.imag
for i in range(len(stress)):
    print gprime[i], gdoubleprime[i]
And it's not giving me accurate results.
The DFT in Python is of the form A_k = sum(a_m * exp(-2*pi*i*m*k/n)), where the summation runs from m = 0 to m = n-1 (http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.fft.html). How can I change it to the form I used in my first code, i.e. exp(-1j*freq*t) (where freq is the frequency and t is the time, both of which are already predefined)? Or is there some post-processing of the data that I have to do?
Thanks in advance for all your help.
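For reference, here is a minimal sketch of the post-processing that relates np.fft.fft to the integral form above. The decaying stress array is only a hypothetical stand-in for the real data, and note that the DFT only evaluates the transform on its own linear frequency grid w_k = 2*pi*k/(N*dt), not on an arbitrary logspace grid:
import numpy as np

# hypothetical stand-in data; the real stress array comes from the simulation
window = 200000
time = np.linspace(1, window, window)
stress = np.exp(-time / 1000.0)

dt = time[1] - time[0]                    # uniform sample spacing
N = len(stress)

# np.fft.fft computes A_k = sum_m a_m * exp(-2j*pi*m*k/N); multiplying by dt
# turns that sum into a rectangular-rule approximation of the integral of
# stress(t) * exp(-1j*w*t) dt at the angular frequencies w_k
fvi = np.fft.fft(stress) * dt
w = 2 * np.pi * np.fft.fftfreq(N, d=dt)   # w_k = 2*pi*k/(N*dt)

# the DFT treats the first sample as t = 0; since time starts at 1,
# this phase factor restores the original time origin
fvi *= np.exp(-1j * w * time[0])

soln = 1j * w * fvi                       # same 1j*f post-processing as the trapz loop
print(w[1], soln[1].real, soln[1].imag)   # storage/loss pair at the first nonzero frequency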
I'm pretty new to programming and I have a quick question. I am trying to make a Gaussian function for a range of stars. However, I want the size of undercurve to be 100 for all the stars. I was thinking of writing a while loop that runs while the total length of undercurve is below 100. However, I get an error, and I'm guessing it has something to do with it being a list. I'm showing you guys my code to see if you can help me out here. Thanks!
I get a syntax error: can't assign to function call
import numpy
import random
import math
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import scipy
from scipy import stats
from math import sqrt
from numpy import zeros
from numpy import numarray
variance = input("Input variance of the star:")
mean = input("Input mean of the star:")
space=numpy.linspace(-4,1,1000)
sigma = sqrt(variance)
Max = max(mlab.normpdf(space,mean,sigma))
normalized = (mlab.normpdf(space,mean,sigma))/Max
def random_y_pt():
    return random.uniform(0,1)

def random_x_pt():
    return random.uniform(-4,1)

import random

def undercurve(size):
    result = []
    for i in range(0,size):
        y = random_y_pt()
        x = random_x_pt()
        if y < scipy.stats.norm(scale=variance,loc=mean).pdf(x):
            result.append((x))
    return result
size = 1
while len(undercurve(size)) < 100:
undercurve(size) = undercurve(1)+undercurve(size)
print undercurve(size)
plt.hist(undercurve(size),bins=20)
plt.show()
If your error is something like SyntaxError: can't assign to function call, then that's because of your line
undercurve(size) = undercurve(1)+undercurve(size)
This tries to assign the result of the right-hand side to undercurve(size), which you cannot do.
It sounds like you actually want to see just the first 100 items in the list returned by undercurve(size). For that, use
undercurve(size)[:100]
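If the underlying goal is to keep sampling until 100 points have landed under the curve, a small accumulating loop over a plain list (reusing undercurve and plt from the question, rather than assigning to a function call) is one way to sketch it:
samples = []
while len(samples) < 100:
    samples += undercurve(1)     # undercurve() returns a (possibly empty) list
samples = samples[:100]          # trim in case the last call overshoots 100

plt.hist(samples, bins=20)
plt.show()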