How to compute average within given percentiles in Python? - python

I am doing some scientific computing and I couldn't find an elegant way of performing the following operation. Suppose I have a 2-dimensional numpy array D which stores measurements of a given quantity at several times along the day. Each row corresponds to a different measuring instrument and each column corresponds to a different moment in the day at which the measurement was done.
Consider a list of desired percentiles. For example:
quantiles = [0.25, 0.5, 0.75]
My goal is to compute the average measurement by quantile group at each moment in the day. In other words, given a column of measurements, I would like to split the measurements from that column into groups according to the quantiles above and then take the average within each group. Using the example, I would have 4 groups at each moment of the day: the measurements below the 25th percentile, those between the 25th and 50th percentiles, those between the 50th and 75th, and finally those above the 75th percentile. Therefore, if m is the number of moments in the day when measurements were taken and q is the number of elements in the quantiles variable, my desired output would be a (q+1) x m numpy array.
Currently, I am doing this in the most inefficient and hard-coded way possible. Here we go:
quantiles = [0.25, 0.5, 0.75]
window = "30min"
moments = pd.date_range(start = "9:30", end = "16:00", freq = window).time
quantile_curves = np.zeros((len(quantiles)+1, len(moments)-1))
EmpQuantiles = np.quantile(D, quantiles, axis = 0)
for moment in range(len(moments)-1):
    quantile_curves[0, moment] = np.mean(D[:, moment][D[:, moment] < EmpQuantiles[0, moment]])
    quantile_curves[1, moment] = np.mean(D[:, moment][np.logical_and(D[:, moment] > EmpQuantiles[0, moment], D[:, moment] < EmpQuantiles[1, moment])])
    quantile_curves[2, moment] = np.mean(D[:, moment][np.logical_and(D[:, moment] > EmpQuantiles[1, moment], D[:, moment] < EmpQuantiles[2, moment])])
    quantile_curves[3, moment] = np.mean(D[:, moment][D[:, moment] > EmpQuantiles[2, moment]])
What's a simpler, more elegant way of doing this? I couldn't find an answer here, though there is a related (but not identical) question in R: ddply multiple quantiles by group
I intend to plot the evolution of the in-group average over the day. The plot I get is shown below (I am satisfied with the plot and I get the result I want; I am just looking for a better way of computing the quantile_curves variable):
Thanks a lot in advance!

You can do it efficiently using masked_arrays:
import numpy as np
quantiles = [0.25, 0.5, 0.75]
print('quantiles:\n', quantiles)
moments = [f'moment {i}' for i in range(5)]
print('nb of moments:\n', len(moments))
nb_measurements = 10000
D = np.random.rand(nb_measurements,len(moments))
quantile_values = np.quantile(D,quantiles,axis=0)
print('quantile_values (for each moment):\n', quantile_values)
quantile_curves = np.zeros((len(quantiles)+1,len(moments)))
quantile_curves[0, :] = np.mean(np.ma.masked_array(D, mask=D>quantile_values[[0],:]), axis=0)
for q in range(len(quantiles)-1):
    quantile_curves[q+1, :] = np.mean(np.ma.masked_array(D, mask=np.logical_or(D<quantile_values[[q],:], D>quantile_values[[q+1],:])), axis=0)
quantile_curves[len(quantiles), :] = np.mean(np.ma.masked_array(D, mask=D<quantile_values[[len(quantiles)-1],:]), axis=0)
print('mean for each group and at each moment:')
print(quantile_curves)
Output:
% python3 script.py
quantiles:
[0.25, 0.5, 0.75]
nb of moments:
5
quantile_values (for each moment):
[[0.25271343 0.25434056 0.24658732 0.24612319 0.25221014]
[0.51114344 0.50103699 0.49671249 0.49113293 0.49819521]
[0.75629377 0.75427293 0.74676209 0.74211813 0.7490436 ]]
mean for each group and at each moment:
[[0.12650993 0.12823392 0.12492136 0.12200609 0.12655318]
[0.3826476 0.373516 0.37050513 0.36974876 0.37722219]
[0.63454102 0.63023986 0.62280545 0.61696283 0.6238492 ]
[0.87866019 0.87614489 0.87492553 0.87253142 0.87403426]]
Note that I'm using random values between 0 and 1, which is why the quantile values (the boundaries of the group intervals) are almost equal to the quantiles themselves. Also note that this code works for an arbitrary number of quantiles or moments.
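For comparison, here is a sketch of an alternative approach (not part of the answer above, just an illustration): label each measurement with its quantile group via np.digitize, then average per group and per moment. Values falling exactly on a quantile boundary are handled slightly differently here (they are assigned to a group instead of being excluded).
import numpy as np
quantiles = [0.25, 0.5, 0.75]
D = np.random.rand(10000, 5)                 # measurements x moments (toy data)
edges = np.quantile(D, quantiles, axis=0)    # shape: (len(quantiles), nb_moments)
# group label 0..len(quantiles) for every measurement, column by column
groups = np.stack([np.digitize(D[:, m], edges[:, m]) for m in range(D.shape[1])], axis=1)
quantile_curves = np.array([[D[groups[:, m] == g, m].mean() for m in range(D.shape[1])]
                            for g in range(len(quantiles) + 1)])
print(quantile_curves)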

Related

Histogram one array based on another array

I have two numpy arrays:
rates = [1.1, 0.8...]
zenith_angles = [45, 20, ....]
both rates and zenith_angles have the same length.
I also have some pre-defined zenith_angle bins,
zen_bins = [0, 10, 20,...]
What I need to do is bin the rates according to its corresponding zenith angle bins.
An ugly way to do it is
nbin = len(zen_bins)-1
norm_binned_zen = [[0]]*nbin
for i in range(nbin):
    norm_binned_zen[i] = [0]
for i in range(len(rates)):
    ind = np.searchsorted(zen_bins, zenith_angles[i])  # the corresponding bin number
    norm_binned_zen[ind-1].append(rates[i])
This is not very pythonic and is time consuming for large arrays. I believe there must be some more elegant way to do it?
The starting data (here randomly generated):
import numpy as np
rates = np.random.random(100)
zenith_angles = np.random.random(100)*90.0
zen_bins = np.linspace(0, 90, 10)
Since you are using numpy, you can use a one line solution:
norm_binned_zen = [rates[np.where((zenith_angles > low) & (zenith_angles <= high))] for low, high in zip(zen_bins[:-1], zen_bins[1:])]
Breaking this line into steps:
The list comprehension loops over pairs of the low and high edges of each bin.
numpy.where is used to find the indexes of the angles inside the given bin in the zenith_angles array.
numpy indexing is used to select the rates values at the indexes obtained in the previous step.
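If only a per-bin statistic (say the mean rate) is needed, a similar sketch using np.digitize avoids keeping the intermediate lists around; this is just an illustration, not part of the answer above:
import numpy as np

rates = np.random.random(100)
zenith_angles = np.random.random(100) * 90.0
zen_bins = np.linspace(0, 90, 10)

bin_idx = np.digitize(zenith_angles, zen_bins)   # 1 .. len(zen_bins)-1 for angles inside the range
mean_rate_per_bin = [rates[bin_idx == i].mean() if np.any(bin_idx == i) else np.nan
                     for i in range(1, len(zen_bins))]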

Python code to find minimum distance between points and a curve

I have some data and I have plotted magnitude against wavelength (the blue points). I then have some code that reads a model stellar population from a file, and plots this on the same graph (the pink line). In this code, there is a scale that can be adjusted that moves this line up or down on the graph. So far I have been changing the scale so that the line is as close as I can tell by eye to my points, but I would like to write some code that would calculate the value of the scale for which the total distance between my points and the line is a minimum. This is my code so far:
#Import modules
from math import *
import numpy as np
import matplotlib.pyplot as plt
# Specify data
wavelength = np.array([357.389,445.832,472.355,547.783,620.246,752.243,891.252,2164.089])
magnitude = np.array([24.0394,23.1925,23.1642,22.4794,21.7496,20.9047,20.4671,19.427])
# Create Graph
#plt.scatter(wavelength, magnitude)
#plt.ylim([25,18])
#plt.xlim([300,2200])
#plt.xlabel('wavelength (nm)')
#plt.ylabel('magnitude')
#plt.title('object 1')
#plt.show()
#plt.close()
#now - here is some code that reads a model stellar population model from a file
lines = open('fig7b.dat').readlines()
wavelengths, luminosities = [],[]
for l in lines:
    s = l.split()
    wl = s[0]
    old = s[-1]
    if '#' not in wl:
        wavelengths.append(float(wl)) #wavelength in angstroms
        luminosities.append(float(old)) #luminosities are in log units!
scale = 3.5
c=3.e8
wavelengths = np.array(wavelengths)
nus = c/(wavelengths*1.e-10)
luminosities = np.array(luminosities) + scale
luminosity_density = np.log10(((10**luminosities)*wavelengths)/nus)
#plt.plot(wavelengths,luminosity_density)
#z = 1.0
#plt.plot(wavelengths*(1+z),luminosity_density,color='r')
#plt.axis([900, 10000, 25,31])
#plt.savefig('sed.png')
#plt.show()
#plt.close()
Mpc_to_cm = 3.086e24 #convert Mpc to cm
z = 0.3448 #our chosen redshift
D_L = 1841.7 * Mpc_to_cm
#remember luminosity_density is logged at the moment
flux_density = (10**luminosity_density) * (1+z) / (4*pi*D_L**2) #units will be erg/s/cm^2/Hz
#now turn that into an AB magnitude - goes back to log
AB_mag = -2.5*np.log10(flux_density) - 48.6
#try plotting your photometry on here and play with z and D_L
plt.plot(wavelengths*(1+z),AB_mag,color='pink')
plt.scatter(wavelength*10., magnitude,color='cornflowerblue')
plt.axis([900, 25000, 30,18])
plt.xlabel('wavelength')
plt.ylabel('magnitude')
plt.title('object 1')
plt.savefig('sed_ab.png')
plt.show()
which gives a graph that looks like this:
Also it would be helpful to print the best scale value.
I'm very new to python and programming in general and the pink line isn't a simple equation (in the file I was given it is made up of a lot of data points) so I have been getting a bit stuck. Apologies if I am not using the correct language to describe my problem, and for the long code - a lot of the comments were previous plots my supervisor has kept from before when I had separate plots. (I am using python 2.7)
A link to fig7b.dat: https://drive.google.com/open?id=0B_tOncLLEAYsbG8wcHJMYVowOXc
First, create a list of points from the curve data so that each point corresponds to a point in your first list (each corresponding pair of points will have the same X coordinate, i.e. the same wavelength).
Then the shift that minimizes the distance between these two sets of points is simply: (sum(points2)-sum(points1))/len(points1).
Look at the following example
points1 = [1.1, 1.4, 1.8, 1.9, 2.3, 1.7, 1.9, 2.7]
points2 = [8.4, 3.5, 2.9, 7.6, 0.1, 2.2, 3.3, 4.8]
def min_distance(first, second):
    assert len(first) == len(second)  # must have same size
    result = (sum(second) - sum(first)) / len(first)
    return result
print("Adding this value to the first series of points")
print("will provice minimum distance between curves")
print(min_distance(points1,points2))
Running this will print the value 2.25. If you add 2.25 to all values of points1, you will get the minimum possible distance between the two sets of points (which is 62.36 in this particular case).
In your problem, points1 will be the magnitude array. points2 will be the points from fig7b.dat corresponding to the wavelengths.
This assumes you want to minimize the sum of squares between the points and the curve. It also assumes distances are measured vertically (that is why you need to extract the points with the corresponding wavelengths).
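(For reference, a quick derivation of why this is the minimizer: writing f(s) = sum_i (points1_i + s - points2_i)^2, setting f'(s) = 2 * sum_i (points1_i + s - points2_i) = 0 gives s = (sum(points2) - sum(points1)) / len(points1), which is exactly what min_distance returns.)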
If you want to write your own little piece of code without using scipy.optimize, I would recommend:
use an interpolation of your theoretical spectrum to evaluate the theoretical value at each of your observed wavelengths:
https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
e.g.:
from scipy.interpolate import interp1d
f2 = interp1d(wavelengths, luminosities, kind='cubic')
Then you can calculate \chi^{2} for every scale value you want to try and afterwards find the minimum.
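For concreteness, here is a rough sketch of that scan. Treat it as an illustration only: it reuses the variable names from the question's script, takes \chi^{2} to be a plain sum of squared magnitude residuals, and assumes the model curve covers all observed wavelengths; luminosities_unscaled is a hypothetical name for the log-luminosities read from fig7b.dat before any scale is added.
from scipy.interpolate import interp1d
import numpy as np

def ab_mag_for_scale(s):
    # same pipeline as in the question, but with the trial scale s instead of the fixed 3.5
    lum = np.array(luminosities_unscaled) + s
    lum_density = np.log10(((10**lum) * wavelengths) / nus)
    flux_density = (10**lum_density) * (1 + z) / (4 * np.pi * D_L**2)
    return -2.5 * np.log10(flux_density) - 48.6

scales = np.arange(0.0, 7.0, 0.01)   # range of trial scales, adjust as needed
chi2 = []
for s in scales:
    model = interp1d(wavelengths * (1 + z), ab_mag_for_scale(s), kind='cubic')
    residuals = magnitude - model(wavelength * 10.0)   # observed minus model, at the observed wavelengths
    chi2.append(np.sum(residuals**2))

best_scale = scales[np.argmin(chi2)]
print(best_scale)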

Limiting a sequence of ratios to a range whilst maintaining overall increase/decrease of values they are multiplying

Sorry my maths isn't fantastic so you'll have to bear with me.
Let's say I have a ratio limit of 3.
I have a numpy array of sizes that are to be multiplied by the ratios and a numpy array of the ratios, some of which are within the limit, some of which aren't.
I need the ratios that are above the limit to be set to the limit, and the ratios that are below the limit to be increased to account for the reduction of the ratios that were over the limit. The result would be that the sum of the sizes multiplied by their ratios stays the same, but no individual size is scaled by more than the limit.
In [1]: import numpy as np
In [2]: sizes = np.array([2.0,4.0,6.0,8.0,10.0])
In [3]: ratios = np.array([0.5, 0.5, 5.0, 4.0, 0.5])
In [4]: print np.sum(sizes * ratios)
70.0
#result after limiting ratios would still be 70
Edit:
So in the example above the resulting ratios would be:
np.array([1.75, 1.75, 3.0, 3.0, 1.75])
In [4]: print np.sum(sizes * ratios)
70.0
The ratios that were previously above the limit have been reduced and the ratios that were below have been raised to compensate.
I think you are looking for something like this:
import numpy as np
def Spread_Ratios(ratios, sizes):
    if np.dot(ratios, sizes)/np.sum(sizes) > 3.:
        print 'There is no solution!\n'
        return None
    if np.any(ratios > 3.):
        score = np.dot(sizes, ratios)
        ratios_reduced = np.where(ratios > 3., 3., ratios)
        score_reduced = np.dot(sizes, ratios_reduced)
        delta_ratios = (score - score_reduced) / np.sum(sizes[ratios < 3.])
        new_ratios = ratios_reduced + np.where(ratios < 3., delta_ratios, 0.)
        return Spread_Ratios(new_ratios, sizes)
    else:
        return ratios, sizes
The recursive definition is necessary since it is possible that a ratio below 3 (but close to it) is lifted above 3.
Furthermore it is possible that there exists no solution at all. This case is handled with the first if condition.
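For example, applying it to the arrays from the question reproduces the ratios given in the edit:
sizes = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
ratios = np.array([0.5, 0.5, 5.0, 4.0, 0.5])
new_ratios, sizes = Spread_Ratios(ratios, sizes)
print new_ratios                  # -> 1.75, 1.75, 3.0, 3.0, 1.75
print np.sum(sizes * new_ratios)  # -> 70.0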

NumPy or SciPy to calculate weighted median

I'm trying to automate a process that JMP does (Analyze->Distribution, entering column A as the "Y value", using subsequent columns as the "weight" value). In JMP you have to do this one column at a time - I'd like to use Python to loop through all of the columns and create an array showing, say, the median of each column.
For example, if the mass array is [0, 10, 20, 30], and the weight array for column 1 is [30, 191, 9, 0], the weighted median of the mass array should be 10. However, I'm not sure how to arrive at this answer.
So far I've
imported the csv showing the weights as an array, masking values of 0, and
created an array of the "Y value" the same shape and size as the weights array (113x32). I'm not entirely sure I need to do this, but thought it would be easier than a for loop for the purpose of weighting.
I'm not sure exactly where to go from here. Basically the "Y value" is a range of masses, and all of the columns in the array represent the number of data points found for each mass. I need to find the median mass, based on the frequency with which they were reported.
I'm not an expert in Python or statistics, so if I've omitted any details that would be useful let me know!
Update: here's some code for what I've done so far:
#Boilerplate & Import files
import csv
import scipy as sp
from scipy import stats
from scipy.stats import norm
import numpy as np
from numpy import genfromtxt
import pandas as pd
import matplotlib.pyplot as plt
inputFile = '/Users/cl/prov.csv'
origArray = genfromtxt(inputFile, delimiter = ",")
nArray = np.array(origArray)
dimensions = nArray.shape
shape = np.asarray(dimensions)
#Mask values ==0
maTest = np.ma.masked_equal(nArray,0)
#Create array of masses the same shape as the weights (nArray)
fieldLength = shape[0]
rowLength = shape[1]
massArr = []
for i in range(rowLength):
    createArr = np.arange(0, fieldLength*10, 10)
    nCreateArr = np.array(createArr)
    massArr.append(nCreateArr)
nCreateArr = np.array(massArr)
nmassArr = nCreateArr.transpose()
What we can do, if I understood your problem correctly, is to sum up the observations; dividing that by 2 gives us the observation number corresponding to the median. From there we need to figure out which observation this number corresponds to.
One trick here, is to calculate the observation sums with np.cumsum. Which gives us a running cumulative sum.
Example:
np.cumsum([1,2,3,4]) -> [ 1, 3, 6, 10]
Each element is the sum of all previous elements and itself. We have 10 observations here, so the median would be the 5th observation. (We get 5 by dividing the last element by 2.)
Now looking at the cumsum result, we can easily see that that must be the observation between the second and third elements (observation 3 and 6).
So all we need to do, is figure out the index of where the median (5) will fit.
np.searchsorted does exactly what we need. It will find the index at which to insert an element into an array so that it stays sorted.
The code to do it like so:
import numpy as np
#my test data
freq_count = np.array([[30, 191, 9, 0], [10, 20, 300, 10], [10,20,30,40], [100,10,10,10], [1,1,1,100]])
c = np.cumsum(freq_count, axis=1)
indices = [np.searchsorted(row, row[-1]/2.0) for row in c]
masses = [i * 10 for i in indices] #Correct if the masses are indeed 0, 10, 20,...
#This is just for explanation.
print "median masses is:", masses
print freq_count
print np.hstack((c, c[:, -1, np.newaxis]/2.0))
Output will be:
median masses is: [10 20 20 0 30]
[[ 30 191 9 0] <- The test data
[ 10 20 300 10]
[ 10 20 30 40]
[100 10 10 10]
[ 1 1 1 100]]
[[ 30. 221. 230. 230. 115. ] <- cumsum results with median added to the end.
[ 10. 30. 330. 340. 170. ] you can see from this where they fit in.
[ 10. 30. 60. 100. 50. ]
[ 100. 110. 120. 130. 65. ]
[ 1. 2. 3. 103. 51.5]]
wquantiles is a small python package that will do exactly what you need. It just uses np.cumsum() and np.interp() under the hood.
Since this is the top hit on Google for weighted median in NumPy, I will add my minimal function to select the weighted median from two arrays without changing their contents, and with no assumptions about the order of the values (on the off-chance that anyone else comes here looking for a quick recipe for the same exact pre-conditions).
def weighted_median(values, weights):
    i = np.argsort(values)
    c = np.cumsum(weights[i])
    return values[i[np.searchsorted(c, 0.5 * c[-1])]]
Using argsort lets us maintain the alignment between the two arrays without changing or copying their content. It should be straightforward to extend it to an arbitrary number of arbitrary quantiles.
Update
Since it may not be fully obvious at first blush exactly how easy it is to extend to arbitrary quantiles, here is the code:
def weighted_quantiles(values, weights, quantiles=0.5):
    i = np.argsort(values)
    c = np.cumsum(weights[i])
    return values[i[np.searchsorted(c, np.array(quantiles) * c[-1])]]
This defaults to the median, but you can pass in any quantile, or a list of quantiles. The return type is equivalent to what you pass in as quantiles, with lists promoted to NumPy arrays. With enough uniformly distributed values and weights, the results track the requested quantiles quite closely:
>>> weighted_quantiles(np.random.rand(10000), np.random.rand(10000), [0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99])
array([0.01235101, 0.05341077, 0.25355715, 0.50678338, 0.75697424,0.94962936, 0.98980785])
>>> weighted_quantiles(np.random.rand(10000), np.random.rand(10000), 0.5)
0.5036283072043176
>>> weighted_quantiles(np.random.rand(10000), np.random.rand(10000), [0.5])
array([0.49851076])
Update 2
In small data sets where the median/quantile is not actually observed, it may be important to be able to interpolate a point between two observations. This can be fairly easily added by calculating the midpoint between two numbers in the case where the weight mass is split equally (or quantile/1-quantile) between them. Due to the need for a conditional, this function always returns a NumPy array, even when quantiles is a single scalar. The inputs also need to be NumPy arrays now (except quantiles, which may still be a single number).
def weighted_quantiles_interpolate(values, weights, quantiles=0.5):
    i = np.argsort(values)
    c = np.cumsum(weights[i])
    q = np.searchsorted(c, quantiles * c[-1])
    return np.where(c[q]/c[-1] == quantiles, 0.5 * (values[i[q]] + values[i[q+1]]), values[i[q]])
This function will fail with arrays smaller than 2 (the original would handle non-empty arrays).
>>> weighted_quantiles_interpolate(np.array([2, 1]), np.array([1, 1]), 0.5)
array(1.5)
Note that this extension is fairly unlikely to be needed when working with actual data sets where we typically have (a) large data sets, and (b) real-values weights that make the odds of ending up exactly at a quantile edge very long, and probably due to rounding errors when it does happen. Including it for completeness nonetheless.
I ended up writing this function based on @muzzle's and @maesers' replies:
def weighted_quantiles(values, weights, quantiles=0.5, interpolate=False):
    i = values.argsort()
    sorted_weights = weights[i]
    sorted_values = values[i]
    Sn = sorted_weights.cumsum()
    if interpolate:
        Pn = (Sn - sorted_weights/2) / Sn[-1]
        return np.interp(quantiles, Pn, sorted_values)
    else:
        return sorted_values[np.searchsorted(Sn, quantiles * Sn[-1])]
The difference between interpolate True and False is as follows:
weighted_quantiles(np.array([1, 2, 3, 4]), np.ones(4))
> 2
weighted_quantiles(np.array([1, 2, 3, 4]), np.ones(4), interpolate=True)
> 2.5
(there is no difference for uneven arrays such as [1, 2, 3, 4, 5])
Speed tests show it is just as performant as @maesers' function in the uninterpolated case, and twice as performant in the interpolated case.
Sharing some code that I got help with. It allows you to run stats on each column of an Excel spreadsheet.
import xlrd
import sys
import csv
import numpy as np
import itertools
from itertools import chain
book = xlrd.open_workbook('/filepath/workbook.xlsx')
sh = book.sheet_by_name("Sheet1")
ofile = '/outputfilepath/workbook.csv'
masses = sh.col_values(0, start_rowx=1)  # first column has mass
ages = sh.row_values(0, start_colx=1)    # first row has age ranges

count = 1
age_columns = []
for a in ages:
    age_columns.append(sh.col_values(count, start_rowx=1))
    count += 1

stats = []
count = 0
for a in ages:
    # create a tuple with the mass vector
    age_mass = zip(masses, age_columns[count])
    count += 1
    # replicate element[0] for element[1] times
    expanded = list(list(itertools.repeat(am[0], int(am[1]))) for am in age_mass)
    # flatten into one big list
    medianlist = [x for t in expanded for x in t]
    # convert to array and mask out zeroes
    npa = np.array(medianlist)
    npa = np.ma.masked_equal(npa, 0)
    median = np.median(npa)
    meanMass = np.average(npa)
    maxMass = np.max(npa)
    minMass = np.min(npa)
    stdev = np.std(npa)
    stats1 = [median, meanMass, maxMass, minMass, stdev]
    print stats1
    stats.append(stats1)

np.savetxt(ofile, stats, fmt="%d")

Discretization of probability array in Python

I have a numpy array (actually imported from a GIS raster map) which contains probability values of occurrence of a species, like the following example:
a = np.random.randint(1, 20, 1200).reshape(40, 30)
b = (a * 1.0) / a.sum()
Now I want to get a discrete version of that array again. For example, if I have 100 individuals located on the area of that array (1200 cells), how are they distributed? Of course they should be distributed according to their probability, meaning lower values indicate a lower probability of occurrence. However, since this is statistics, there is still a chance that an individual ends up in a low-probability cell. It should be possible for multiple individuals to occupy one cell...
It is like transforming a continuous distribution curve back into a histogram. Just as many different histograms may result in the same distribution curve, it should also work the other way round; accordingly, applying the algorithm I am looking for will produce different discrete values each time.
...is there any algorithm in python which can do that? As I am not that familiar with discretization maybe someone can help.
Use random.choice with bincount:
np.bincount(np.random.choice(b.size, 100, p=b.flat),
minlength=b.size).reshape(b.shape)
If you don't have NumPy 1.7, you can replace random.choice with:
np.searchsorted(np.cumsum(b), np.random.random(100))
giving:
np.bincount(np.searchsorted(np.cumsum(b), np.random.random(100)),
minlength=b.size).reshape(b.shape)
So far I think ecatmur's answer seems quite reasonable and simple.
I just want to add maybe a more "applied" example. Consider a die with 6 faces (6 numbers); each number/result has a probability of 1/6. Displaying the die in the form of an array could look like:
b = np.array([[1,1,1],[1,1,1]])/6.0
Thus rolling the dice 100 times (n=100) results in following simulation:
np.bincount(np.searchsorted(np.cumsum(b), np.random.random(n)),minlength=b.size).reshape(b.shape)
I think that can be an appropriate approach for such an application.
Thus thank you ecatmur for your help!
/Johannes
This is similar to a question I had earlier this month.
import random
def RandFloats(Size):
    Scalar = 1.0
    VectorSize = Size
    RandomVector = [random.random() for i in range(VectorSize)]
    RandomVectorSum = sum(RandomVector)
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
    return RandomVector
from numpy.random import multinomial
import math
def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
    """
    Inputs:
    ListSize = the size of the list to return
    ListSumValue = The sum of list values
    Distribution = can be 'uniform' for uniform distribution, 'normal' for a normal distribution ~ N(0,1) with +/- 5 sigma (default), or a list of size 'ListSize' or 'ListSize - 1' for an empirical (arbitrary) distribution. Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).
    Output:
    A list of random integers of length 'ListSize' whose sum is 'ListSumValue'.
    """
    if type(Distribution) == list:
        DistributionSize = len(Distribution)
        if ListSize == DistributionSize or (ListSize-1) == DistributionSize:
            Values = multinomial(ListSumValue, Distribution, size=1)
            OutputValue = Values[0]
        elif Distribution.lower() == 'uniform':  # I do not recommend this!!!! I see that it is not as random (at least on my computer) as I had hoped
            UniformDistro = [1.0/ListSize for i in range(ListSize)]  # 1.0 rather than 1 to avoid integer division under Python 2
            Values = multinomial(ListSumValue, UniformDistro, size=1)
            OutputValue = Values[0]
        elif Distribution.lower() == 'normal':
            """
            Normal Distribution Construction....It's very flexible and hideous
            Assume a +-3 sigma range. Warning, this may or may not be a suitable range for your implementation!
            If one wishes to explore a different range, then change the LowSigma and HighSigma values
            """
            LowSigma = -3   # -3 sigma
            HighSigma = 3   # +3 sigma
            StepSize = 1/(float(ListSize) - 1)
            ZValues = [(LowSigma * (1-i*StepSize) + (i*StepSize)*HighSigma) for i in range(int(ListSize))]
            # Construction parameters for N(Mean, Variance) - Default is N(0,1)
            Mean = 0
            Var = 1
            # NormalDistro = [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues]
            NormalDistro = list()
            for i in range(len(ZValues)):
                if i == 0:
                    ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                    NormalDistro.append(ERFCVAL)
                elif i == len(ZValues) - 1:
                    ERFCVAL = NormalDistro[0]
                    NormalDistro.append(ERFCVAL)
                else:
                    ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                    ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2))
                    ERFCVAL = ERFCVAL1 - ERFCVAL2
                    NormalDistro.append(ERFCVAL)
            #print "Normal Distribution sum = %f"%sum(NormalDistro)
            Values = multinomial(ListSumValue, NormalDistro, size=1)
            OutputValue = Values[0]
        else:
            raise ValueError('Cannot create desired vector')
        return OutputValue
    else:
        raise ValueError('Cannot create desired vector')
    return OutputValue
ProbabilityDistribution = RandFloats(1200)  # this is your probability distribution for your 1200 cell array
SizeDistribution = RandIntVec(1200, 100, Distribution=ProbabilityDistribution)  # for a 1200 cell array, whose sum is 100 with the given probability distribution
The two important lines are the last two lines in the code above.
