I'm implementing the Nearest Centroid Classification algorithm and I'm kind of blocked on how to use numpy.mean in my case.
So suppose I have a dataset X made of spherical clusters:
[[ 0.39151059 3.48203037]
[-0.68677876 1.45377717]
[ 2.30803493 4.19341503]
[ 0.50395297 2.87076658]
[ 0.06677012 3.23265678]
[-0.24135103 3.78044279]
[-0.05660036 2.37695381]
[ 0.74210998 -3.2654815 ]
[ 0.05815341 -2.41905942]
[ 0.72126958 -1.71081388]
[ 1.03581142 -4.09666955]
[ 0.23209714 -1.86675298]
[-0.49136284 -1.55736028]
[ 0.00654881 -2.22505305]]
and the labeled vector Y:
[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
(The scatter plot of an example with 100 2D data points is omitted here.)
The NCC algorithm consists of first calculating the class mean of each class (0 and 1: that's blue and red) and then finding the nearest class centroid for each new data point.
This is my current function:
def mean_ncc(X, Y):
    # find unique classes
    cids = np.unique(Y)  # [0. 1.]
    # compute class means
    mu = np.zeros((len(cids), X.shape[1]))  # [[0. 0.] [0. 0.]] when Y has two unique labels (0 and 1)
    for class_idx, class_label in enumerate(cids):
        mu[class_idx, :] =  # problem here
    return mu
So here I want an array containing the class means of the '0' (blue) points and the '1' (red) points.
How can I select just the elements of X over which the mean should be calculated?
I would like to do something like this:
for class_idx, class_label in enumerate(cids):
    mu[class_idx, :] = np.mean(X[only the elements that have the same class_label], axis=0)
Is it possible or is there another way to implement this?
You could use something like this:
import numpy as np
tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]
tags_np = np.array(tags)
values_np = np.array(values)
print(values_np[tags_np == 1].mean())
EDIT: You will surely need to look more into the axis parameter for the mean function:
import numpy as np
values = [[5, 4],
[5, 4],
[4, 3],
[4, 3]]
values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])
print(values_np[tags_np == 0].mean(axis=0))
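Applied to the question's mean_ncc, the missing line is just that boolean mask; a minimal sketch combining the two:
import numpy as np

def mean_ncc(X, Y):
    # find the unique class labels
    cids = np.unique(Y)
    # one row of class means per label
    mu = np.zeros((len(cids), X.shape[1]))
    for class_idx, class_label in enumerate(cids):
        # the boolean mask keeps only the rows of X with this label
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu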
Sorry if this post is a duplicate, I couldn't find an answer... I have the following code:
import numpy as np
V = np.array([[6, 10, 0],
[2, 5, 0],
[0, 0, 0]])
subarr = np.array([[V[0][0], V[0][1]], [V[1][0], V[1][1]]])
det = np.linalg.det(subarr)
cross = np.cross(V[0], V[1])
print(f"Det: {det}")
print(f"Cross: {cross}")
I would expect det to return 10.0 and cross to return [0, 0, 10] in this case, the last number being equal to the det. However, Python returns
Det: 10.000000000000002
Cross: [ 0 0 10]
Can someone please explain why?
What you're seeing is floating-point inaccuracy.
And in case you're wondering how you end up with floats when finding the determinant of a matrix made up of integers (where the usual calculation method is just 6*5 - 2*10 = 10), np.linalg.det uses LU decomposition to find the determinant. This isn't very efficient for 2x2 matrices, but is much more efficient when you have bigger matrices.
For your 2x2, you get:
scipy.linalg.lu(A, 1)
Out:
(array([[ 1.        ,  0.        ],
        [ 0.33333333,  1.        ]]),
 array([[ 6.        , 10.        ],
        [ 0.        ,  1.66666667]]))
The determinant is just the product of U's diagonal: 6.0 * 1.66666667 at display precision, which in full double precision works out to 10.000000000000002 rather than exactly 10, due to floating-point rounding.
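To reproduce this yourself, here is a minimal sketch (assuming SciPy is installed) that recovers the determinant from the LU factors:
import numpy as np
from scipy.linalg import lu

A = np.array([[6, 10],
              [2, 5]])
# permute_l=True folds the permutation into L, returning (PL, U)
pl, u = lu(A, permute_l=True)
# PL has a unit diagonal here, so det(A) is the product of U's diagonal
print(np.prod(np.diag(u)))  # slightly off from the exact 10
print(np.linalg.det(A))     # 10.000000000000002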
What is the most idiomatic way to produce a cumulative sum which "fades" out as it moves along? Let me explain with an example.
>>> np.array([1,0,-1,0,0]).cumsum()
array([1, 1, 0, 0, 0], dtype=int32)
But I would like to provide a factor <1 and produce something like:
>>> np.array([1,0,-1,0,0]).cumsum_with_factor(0.5)
array([1.0, 0.5, -0.75, -0.375, -0.1875], dtype=float64)
It's a big plus if it's fast!
Your result can be obtained by linear convolution:
signal = np.array([1,0,-1,0,0])
kernel = 0.5**np.arange(5)
np.convolve(signal, kernel, mode='full')
# array([ 1.    ,  0.5   , -0.75  , -0.375 , -0.1875, -0.125 , -0.0625,
#         0.    ,  0.    ])
Keep the first len(signal) entries to match the desired output (the kernel must be at least as long as the signal for these to be exact); mode='full' also returns the fading tail. If performance is a consideration, use scipy.signal.fftconvolve, which is a faster implementation of the same logic.
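Since the fade is just the first-order recurrence y[n] = x[n] + f*y[n-1], another option (a sketch assuming SciPy) is scipy.signal.lfilter, which is exact for any signal length:
import numpy as np
from scipy.signal import lfilter

signal = np.array([1, 0, -1, 0, 0], dtype=float)
factor = 0.5
# y[n] = signal[n] + factor * y[n-1], i.e. a first-order IIR filter
result = lfilter([1.0], [1.0, -factor], signal)
print(result)  # [ 1.      0.5    -0.75   -0.375  -0.1875]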
Hi, I have to increase the number of points inside a vector in order to enlarge it to a fixed size. For example:
for this simple vector
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> len(a)
# 6
Now I want to get a vector of size 11, taking the vector a as base; the result will be
# array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
EDIT 1
What I need is a function that takes the base vector and the number of values the resulting vector must have, and returns a new vector with size equal to that parameter. Something like
def enlargeVector(vector, size):
.....
return newVector
to use like:
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> b = enlargeVector(a, 200)
>>> len(b)
# 200
and b contains the result of linear, cubic, or whatever interpolation method
There are many methods to do this within scipy.interpolate. My favourite is UnivariateSpline, which produces a degree-k spline guaranteed to be (k-1)-times continuously differentiable.
To use it:
import numpy as np
from scipy.interpolate import UnivariateSpline

a = np.array([0, 1, 2, 3, 4, 5])
old_indices = np.arange(0, len(a))
new_length = 11
new_indices = np.linspace(0, len(a) - 1, new_length)
spl = UnivariateSpline(old_indices, a, k=3, s=0)
new_array = spl(new_indices)
The s is a smoothing factor that you should set to 0 in this case (since the data are exact).
Note that for the problem you have specified (since a just increases monotonically by 1), this is overkill: the np.linspace call alone already gives the desired output.
EDIT: clarified that the length is arbitrary
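Wrapped into the function signature the question asks for, here is a minimal sketch using np.interp for the linear case (substitute the spline above for cubic):
import numpy as np

def enlargeVector(vector, size):
    # map the new sample positions onto the old index range
    old_indices = np.arange(len(vector))
    new_indices = np.linspace(0, len(vector) - 1, size)
    # linear interpolation between the original points
    return np.interp(new_indices, old_indices, vector)

a = np.array([0, 1, 2, 3, 4, 5])
b = enlargeVector(a, 200)
print(len(b))  # 200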
As AGML pointed out, there are tools to do this, but how about a pure NumPy solution:
In [20]: a = np.arange(6)
In [21]: temp = np.dstack((a[:-1], a[:-1] + np.diff(a) / 2.0)).ravel()
In [22]: temp
Out[22]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
In [23]: np.hstack((temp, [a[-1]]))
Out[23]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
I am looking to implement an Ant Colony Optimization algorithm in Python, though am new to both Python and Object Oriented Programming so the learning curve has been rather steep. At this point, I am stuck as to how to address the following situation:
As ants walk around a 2D grid, they will encounter obstacles, pheromone deposits by other ants, food, etc. What data structure do I use to represent this 2D world and the aforementioned properties of each cell?
I had tried a 2D array, thinking that array[x-coord][y-coord] could point to a {} (dictionary) with the appropriate properties (Obstacle: 'Yes / 'No', Pheromone Level: X %, etc.). Unfortunately, though NumPy lets me create a 2D array, I cannot assign dictionary objects to the various coordinates.
from numpy import *
myArray = array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12]])
myArray[2][2]={}
Returns:
Traceback (most recent call last):
File "/Users/amormachine/Desktop/PythonTest.py", line 7, in <module>
myArray[2][2]={}
TypeError: long() argument must be a string or a number, not 'dict'
[Finished in 0.6s with exit code 1]
I am not committed to either dictionaries or this paradigm for implementing this project and would certainly appreciate the wisdom of the group.
Sure you can, you just can't if your dtype is int... so make your array with dtype=object and you can use objects:
In [43]: a = [[{},{},{}],[{},{},{}]]
In [44]: a = numpy.array(a)
In [45]: a[1][1] = {'hello':'world','something':5}
In [46]: a
Out[46]:
array([[{}, {}, {}],
[{}, {'hello': 'world', 'something': 5}, {}]], dtype=object)
Although I'm not sure what you will gain by using numpy with objects; you may be better off just leaving it as a list of lists, as sketched below.
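For instance, a plain list-of-lists version (a sketch with hypothetical property names) needs no NumPy at all:
# each cell is its own dict of properties
grid = [[{'obstacle': False, 'pheromone': 0.0} for _ in range(4)]
        for _ in range(3)]
grid[2][2]['pheromone'] = 0.5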
In plain Python I would be going for the list-of-dicts approach but with NumPy I find it more natural to work with separate arrays for different attributes rather than trying to keep things in one structure.
import numpy as np
grid_shape = (120,80)
# example of random initialization with this grid shape
pheremone_level = np.random.rand(*grid_shape)
obstacle = np.random.rand(*grid_shape) > 0.8
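With separate arrays, typical per-cell updates become whole-array operations; a sketch with hypothetical evaporation and deposit values:
# evaporate a fraction of the pheromone everywhere (rate is hypothetical)
pheremone_level *= 0.95

# an ant at (row, col) deposits pheromone unless the cell is an obstacle
row, col = 10, 20
if not obstacle[row, col]:
    pheremone_level[row, col] += 1.0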
As @bitwise says, it entirely depends on what operations you want to perform. Generally the "correct" way in NumPy will be much closer to how you would write it in Matlab than in non-NumPy Python. Unfortunately I'm not familiar with how Ant Colony Optimization works, so I can't say what's more suitable.
I was looking for something related to structured 2D grids and Google led me to this page.
Although my solution is not entirely related to the grids asked about in the question, I didn't want to open a duplicate question for a 'structured 2D grid' data structure, so I'm posting my solution here. I hope it will be useful to the audience searching for a structured 2D grid who are redirected here by search engines.
Note: the method returns only the cell vertices and the vertex connectivity of each cell. Other quantities like cell volume, cell centroid, circumcircle, incircle, etc., as needed for the application, can easily be generated by adding additional routines.
import numpy as np
import matplotlib.pyplot as plt
def create_structured_grid(corner1=None, corner2=None, nx=5, ny=5, plt_=True, annotate=True):
"""
creates a structured grid of rectangular lattice
input:
------
corner1 : [x_start, y_start]
corner2 : [x_end, y_end]
nx : numpts in x
ny : numpts in y
plt_ : boolean whether to plot or not
annotate: whether to annotate the grid points or not
output:
-------
vertex_array : numpy.array((numpts, dim),dtype=float) of vertices
connectivity : numpy.array((num_cells, 2**dim), dtyp=int) of
vertex connectivity for each cell
plots : additionally plots if boolean values are true
"""
#corner1 = np.array([0.0, 0.0])
#corner2 = np.array([1.0, 1.0])
dim = len(corner1) #currently only for 2D,
x_pts = np.linspace(corner1[0], corner2[0], nx)
y_pts = np.linspace(corner1[1], corner2[1], ny)
Xv, Yv = np.meshgrid(x_pts, y_pts)
numpts = nx*ny
vertex_array = np.zeros((numpts, 2), dtype=float)
vertex_array[:,0] = np.reshape(Xv, numpts)
vertex_array[:,1] = np.reshape(Yv, numpts)
    num_cells = (nx - 1) * (ny - 1)
connectivity = np.zeros((num_cells, int(2**dim)), dtype=int)
rows = ny-1
cols = nx-1
for row in range(rows):
for col in range(cols):
num = nx*row + col
connectivity[cols*row + col] = [num+0, num+1, num+nx, num+nx+1]
if plt_:
X,Y = vertex_array.T
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal')
plt.scatter(X,Y, marker='o', s=50, color='g', alpha=1.0)
        plt.plot(Xv, Yv, linewidth=2, color='k')      # grid lines in one direction
        plt.plot(Xv.T, Yv.T, linewidth=2, color='k')  # grid lines in the other direction
if annotate:
for idx, cc in enumerate(vertex_array):
plt.text(cc[0], cc[1], str(idx), color='k', verticalalignment='bottom', horizontalalignment='right', fontsize='medium')
plt.show(block=False)
return vertex_array, connectivity
A call to function can be like this:
c1 = np.array([0.0, 0.0])
c2 = np.array([1.0, 1.0])
vertices, connectivity = create_structured_grid(corner1=c1, corner2=c2, nx=4, ny=4)
vertices = array([[ 0. , 0. ],
[ 0.33333333, 0. ],
[ 0.66666667, 0. ],
[ 1. , 0. ],
[ 0. , 0.33333333],
[ 0.33333333, 0.33333333],
[ 0.66666667, 0.33333333],
[ 1. , 0.33333333],
[ 0. , 0.66666667],
[ 0.33333333, 0.66666667],
[ 0.66666667, 0.66666667],
[ 1. , 0.66666667],
[ 0. , 1. ],
[ 0.33333333, 1. ],
[ 0.66666667, 1. ],
[ 1. , 1. ]])
connectivity = array([[ 0,  1,  4,  5],
                      [ 1,  2,  5,  6],
                      [ 2,  3,  6,  7],
                      [ 4,  5,  8,  9],
                      [ 5,  6,  9, 10],
                      [ 6,  7, 10, 11],
                      [ 8,  9, 12, 13],
                      [ 9, 10, 13, 14],
                      [10, 11, 14, 15]])