Create normally distributed columns, with different mean values - python

I have the following numpy matrix:
import numpy as np
import scipy.stats

R = np.matrix(np.ones([3, 3]))
# Update R matrix based on sales statistics
for i in range(0, len(R)):
    for j in range(0, len(R)):
        R[j, i] = scipy.stats.norm(2, 1).pdf(i) * 100
print(R)
[[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]]
I would like to transform each column by mapping the row index (0, 1, 2) to the corresponding density value of a normal distribution whose mean is that column's value: 5.39909665 for the first column, 24.19707245 for the second, and 39.89422804 for the third, with a standard deviation of 1.
Ultimately, this creates a matrix like:
[[norm(5.39, 1).pdf(0), norm(24.197, 1).pdf(0), ...],
 [norm(5.39, 1).pdf(1), norm(24.197, 1).pdf(1), ...],
 [norm(5.39, 1).pdf(2), norm(24.197, 1).pdf(2), ...]]
How can I create the final matrix?

The pdf method works much like any numpy function, in the sense that you can pass in arrays of the same shape, in combination with scalars. You can create R with something like:
ix = np.repeat(np.arange(3), 3).reshape((3, 3))  # row index; use ix.T for the column index
R = scipy.stats.norm(2, 1).pdf(ix.T) * 100
which gives:
array([[  5.39909665,  24.19707245,  39.89422804],
       [  5.39909665,  24.19707245,  39.89422804],
       [  5.39909665,  24.19707245,  39.89422804]])
Following the same logic, if you want your [i, j] entry to be scipy.stats.norm(scipy.stats.norm(2, 1).pdf(j) * 100, 1).pdf(i) (as in the matrix you show as the desired result), use:
scipy.stats.norm(scipy.stats.norm(2, 1).pdf(ix.T) * 100, 1).pdf(ix)
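Putting it together, a minimal end-to-end sketch (nothing beyond numpy and scipy.stats; the names ix, means, and final are illustrative):
import numpy as np
import scipy.stats

ix = np.repeat(np.arange(3), 3).reshape((3, 3))  # ix[i, j] == i
means = scipy.stats.norm(2, 1).pdf(ix.T) * 100   # means[i, j] == norm(2, 1).pdf(j) * 100
final = scipy.stats.norm(means, 1).pdf(ix)       # final[i, j] == norm(means[i, j], 1).pdf(i)
print(final)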

Related

How to tabulate several vertical arrays in python?

I have used the following "for loop" to generate a series of column vectors:
for x in time:
    S = dot(M, S)
    print S
where M is an (n x n) matrix and S is an (n x 1) matrix; at each step I take the product of the two.
The result is displayed as a sequence of separate column vectors. I would like the result to be displayed side by side, in a table. The number of column vectors is not limited to 4; rather, it is n.
If you really need each iteration as a new column, I guess you will need to accumulate the results in a 2d array and print at the end. For example:
import numpy as np
M = np.random.randn(5, 5)
S = np.random.randn(5, 1)
for x in range(4):
    S = np.c_[S, np.dot(M, S[:, -1])]
np.set_printoptions(precision=5, linewidth=120)
print S
which prints:
[[ 0.19891 -0.46714 2.09736 -5.01507 14.7212 ]
[ 0.6387 0.81975 -2.25251 6.5098 -8.27462]
[ -0.44047 0.3941 1.81101 -7.24052 23.07632]
[ -0.17742 0.88452 -2.80172 10.06426 -21.50157]
[ 0.57601 -0.80838 1.14127 1.15622 -4.11907]]
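A variation on the same idea, as a sketch in Python 3 syntax (the list cols is illustrative): accumulate the iterates in a list and stack once at the end, instead of growing the array on every pass:
import numpy as np

M = np.random.randn(5, 5)
S = np.random.randn(5, 1)

cols = [S]
for _ in range(4):
    cols.append(np.dot(M, cols[-1]))  # next iterate M.S

table = np.hstack(cols)  # shape (5, 5): one column per iteration
np.set_printoptions(precision=5, linewidth=120)
print(table)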

Averaging out sections of a multiple row array in Python

I've got a 2-row array called C like this:
from numpy import *
A = array([1, 2, 3, 4, 5])
B = array([50, 40, 30, 20, 10])
C = vstack((A, B))
I want to take all the columns in C where the value in the first row falls between i and i+2, and average them. I can do this with just A no problem:
i = 0
A_avg = []
while i < 6:
    selection = A[logical_and(A >= i, A < i+2)]
    A_avg.append(mean(selection))
    i += 2
then A_avg is:
[1.0,2.5,4.5]
I want to carry out the same process with my two-row array C, but I want to take the average of each row separately, while doing it in a way that's dictated by the first row. For example, for C, I want to end up with a 2 x 3 array that looks like:
[[1.0,2.5,4.5],
[50,35,15]]
Where the first row is A averaged in blocks between i and i+2 as before, and the second row is B averaged in the same blocks as A, regardless of the values it has. So the first entry is unchanged, the next two get averaged together, and the next two get averaged together, for each row separately. Anyone know of a clever way to do this? Many thanks!
I hope this is not too clever. TIL boolean indexing does not broadcast, so I had to manually do the broadcasting. Let me know if anything is unclear.
import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = np.array([50, 40, 30, 20, 10])
C = np.vstack((A, B))
i = np.arange(0, 6, 2)[:, None]
selections = np.logical_and(A >= i, A < i+2)[None]
D, selections = np.broadcast_arrays(C[:, None], selections)
D = D.astype(float)  # float allows nan, and astype makes a copy, since broadcast_arrays returns views
D[~selections] = np.nan  # exclude these elements from the mean
D = np.nanmean(D, axis=-1)
Then,
>>> D
array([[  1. ,   2.5,   4.5],
       [ 50. ,  35. ,  15. ]])
Another way is to use np.histogram to bin your data. This may be faster for large arrays, but it is only useful for a few rows, since a histogram must be computed with different weights for each row:
bins = np.arange(0, 7, 2) # include the end
n = np.histogram(A, bins)[0] # number of columns in each bin
a_mean = np.histogram(A, bins, weights=A)[0]/n
b_mean = np.histogram(A, bins, weights=B)[0]/n
D = np.vstack([a_mean, b_mean])
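A third sketch along the same lines (an assumed alternative, not from the original answers): bin each column once with np.digitize, then reduce each row with np.bincount:
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([50, 40, 30, 20, 10])
C = np.vstack((A, B)).astype(float)

bin_idx = np.digitize(A, bins=[2, 4])       # bin of each column: 0, 1, or 2
counts = np.bincount(bin_idx, minlength=3)  # number of columns per bin
sums = np.array([np.bincount(bin_idx, weights=row, minlength=3) for row in C])
D = sums / counts                           # (2, 3) array of per-bin means
print(D)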

Get indices of matrix from upper triangle

I have a symmetric matrix represented as a numpy array, like the following example:
[[ 1. 0.01735908 0.01628629 0.0183845 0.01678901 0.00990739 0.03326491 0.0167446 ]
[ 0.01735908 1. 0.0213712 0.02364181 0.02603567 0.01807505 0.0130358 0.0107082 ]
[ 0.01628629 0.0213712 1. 0.01293289 0.02041379 0.01791615 0.00991932 0.01632739]
[ 0.0183845 0.02364181 0.01293289 1. 0.02429031 0.01190878 0.02007371 0.01399866]
[ 0.01678901 0.02603567 0.02041379 0.02429031 1. 0.01496896 0.00924174 0.00698689]
[ 0.00990739 0.01807505 0.01791615 0.01190878 0.01496896 1. 0.0110924 0.01514519]
[ 0.03326491 0.0130358 0.00991932 0.02007371 0.00924174 0.0110924 1. 0.00808803]
[ 0.0167446 0.0107082 0.01632739 0.01399866 0.00698689 0.01514519 0.00808803 1. ]]
And I need to find the indices (row and column) of the greatest value without considering the diagonal. Since it is a symmetric matrix, I just took the upper triangle of the matrix.
ind = np.triu_indices(M_size, 1)
And then the index of the max value
max_ind = np.argmax(H[ind])
However, max_ind is an index into the flattened vector produced by taking the upper triangle with triu_indices. How do I know which row and column the value I've just found belongs to?
The matrix could be any size but it's always symmetric. Do you know a better method to achieve the same?
Thank you
Couldn't you do this by using np.triu to return a copy of your matrix with all but the upper triangle zeroed, then just use np.argmax and np.unravel_index to get the row/column indices?
Example:
import numpy as np

x = np.zeros((10, 10))
x[3, 8] = 1
upper = np.triu(x, 1)
idx = np.argmax(upper)
row, col = np.unravel_index(idx, upper.shape)
The drawback of this method is that it creates a copy of the input matrix, but it should still be a lot quicker than looping over elements in Python. It also assumes that the maximum value in the upper triangle is > 0.
You can use the value of max_ind as an index into the ind data
max_ind = np.argmax(H[ind])
Out: 23
ind[0][max_ind], ind[1][max_ind]
Out: (4, 6)
Validate this by looking for the maximum in the entire matrix (won't always work -- data-dependent):
np.unravel_index(np.argmax(H), H.shape)
Out: (4, 6)
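A self-contained sketch of this recovery, with a hypothetical symmetric matrix H built just for the demonstration:
import numpy as np

rng = np.random.RandomState(0)
H = rng.rand(8, 8)
H = (H + H.T) / 2         # symmetrize
np.fill_diagonal(H, 1.0)

ind = np.triu_indices(H.shape[0], 1)  # strict upper triangle
max_ind = np.argmax(H[ind])           # position within the flattened triangle
row, col = ind[0][max_ind], ind[1][max_ind]
print(row, col, H[row, col])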
There's probably a neater "numpy way" to do this, but this is what comes to mind first:
import operator

answer = None
biggest = 0
for r, row in enumerate(matrix):
    if r + 1 >= len(row):  # last row has nothing right of the diagonal
        continue
    i, elem = max(enumerate(row[r+1:]), key=operator.itemgetter(1))
    if elem > biggest:
        biggest, answer = elem, (r, r + 1 + i)  # offset i by the slice start

Find two disjoint pairs of pairs that sum to the same vector

This is a follow-up to Find two pairs of pairs that sum to the same value.
I have random 2d arrays which I make using
import numpy as np
from itertools import combinations

n = 50
m = 50  # row count (assumed here; any value works)
A = np.random.randint(2, size=(m, n))
I would like to determine whether the matrix has two disjoint pairs of pairs of columns which sum to the same column vector. I am looking for a fast method to do this. In the previous problem, ((0,1), (0,2)) was acceptable as a pair of pairs of column indices, but in this case it is not, as 0 is in both pairs.
The accepted answer to the previous question is so cleverly optimised that I can't see how to make this simple-looking change, unfortunately. (I am interested in columns rather than rows in this question, but I can always just do A.transpose().)
Here is some code to show it testing all 4 by 4 arrays.
n = 4
nxn = np.arange(n*n).reshape(n, -1)
count = 0
for i in xrange(2**(n*n)):
    A = (i >> nxn) % 2
    p = 1
    for firstpair in combinations(range(n), 2):
        for secondpair in combinations(range(n), 2):
            if firstpair < secondpair and not set(firstpair) & set(secondpair):
                if np.array_equal(A[firstpair[0]] + A[firstpair[1]],
                                  A[secondpair[0]] + A[secondpair[1]]):
                    if p:
                        count += 1
                        p = 0
print count
This should output 3136.
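As an aside, the nested double loop above can be written a little more directly; a sketch (the helper name is illustrative) that enumerates unordered pairs of column-pairs, so the firstpair < secondpair check disappears:
import numpy as np
from itertools import combinations

def has_disjoint_pair_of_pairs(A, n):
    # enumerate unordered pairs of pairs; skip any two pairs sharing an index
    for firstpair, secondpair in combinations(combinations(range(n), 2), 2):
        if set(firstpair) & set(secondpair):
            continue
        if np.array_equal(A[firstpair[0]] + A[firstpair[1]],
                          A[secondpair[0]] + A[secondpair[1]]):
            return True
    return False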
Here is my solution, extended to do what I believe you want. It isn't entirely clear, though: one may get an arbitrary number of row-pairs that sum to the same total, and there may exist unique subsets of rows within them that also sum to the same value. For instance:
Given this set of row-pairs that sum to the same total
[[19 19 30 30]
[11 16 11 16]]
There exists a unique subset of these rows that may still be counted as valid; but should it?
[[19 30]
[16 11]]
Anyway, I hope those details are easy to deal with, given the code below.
import numpy as np

n = 20
# also works for non-square A
A = np.random.randint(2, size=(n*6, n)).astype(np.int8)
##A = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 1]], np.uint8)
##A = np.zeros((6,6))
# force the inclusion of some hits, to keep our algorithm on its toes
##A[0] = A[1]

def base_pack_lazy(a, base, dtype=np.uint64):
    """
    pack the last axis of an array as minimal base representation
    lazily yields packed columns of the original matrix
    """
    a = np.ascontiguousarray(np.rollaxis(a, -1))
    packing = int(np.dtype(dtype).itemsize * 8 / (float(base) / 2))
    for columns in np.array_split(a, (len(a)-1)//packing+1):
        R = np.zeros(a.shape[1:], dtype)
        for col in columns:
            R *= base
            R += col
        yield R

def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)  # note: this scatter operation requires numpy 1.8; use a sparse matrix otherwise!
    return unique, count, inverse

def voidview(arr):
    """view the last axis of an array as a void object. can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])

def has_identical_row_sums_lazy(A, combinations_index):
    """
    compute the existence of combinations of rows summing to the same vector,
    given an nxm matrix A and an index matrix specifying all combinations
    naively, we need to compute the sum of each row combination at least once, giving n^3 computations
    however, this isn't strictly required; we can lazily consider the columns, giving an early exit opportunity
    all nicely vectorized of course
    """
    multiplicity, combinations = combinations_index.shape
    # list of indices into combinations_index, denoting possibly interacting combinations
    active_combinations = np.arange(combinations, dtype=np.uint32)
    # keep all packed columns; we might need them later
    columns = []
    for packed_column in base_pack_lazy(A, base=multiplicity+1):  # loop over packed columns
        columns.append(packed_column)
        # compute rowsums only for a fixed number of columns at a time;
        # this is O(n^2) rather than O(n^3), and after considering the first column,
        # we can typically already exclude almost all combinations
        partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index)
        # find duplicates in this column
        unique, count, inverse = unique_count(partial_rowsums)
        # prune those combinations which we can exclude as having different sums, based on columns inspected thus far
        active_combinations = active_combinations[count[inverse] > 1]
        # early exit; no pairs
        if len(active_combinations) == 0:
            return False
    """
    we now have a small set of relevant combinations, but we have lost the details of their particulars
    to see which combinations of rows do sum to the same value, we need to consider rows as a whole
    we can simply apply the same mechanism, but for all columns at the same time,
    but only for the selected subset of row combinations known to be relevant
    """
    # construct full packed matrix
    B = np.ascontiguousarray(np.vstack(columns).T)
    # perform all relevant sums, over all columns
    rowsums = sum(B[I[active_combinations]] for I in combinations_index)
    # find the unique rowsums, by viewing rows as a void object
    unique, count, inverse = unique_count(voidview(rowsums))
    # if this fails, we did something wrong in deciding on active combinations
    assert(np.all(count > 1))
    # loop over all sets of rows that sum to an identical unique value
    for i in xrange(len(unique)):
        # set of indexes into combinations_index;
        # note that there may be more than two combinations that sum to the same value; we grab them all here
        combinations_group = active_combinations[inverse == i]
        # associated row-combinations; array of shape=(multiplicity, group_size)
        row_combinations = combinations_index[:, combinations_group]
        # if no duplicate rows are involved, we have a match
        if len(np.unique(row_combinations[:, [0, -1]])) == multiplicity*2:
            print row_combinations
            return True
    # none of the identical rowsums met the uniqueness criteria
    return False

def has_identical_triple_row_sums(A):
    n = len(A)
    idx = np.array([(i, j, k)
                    for i in xrange(n)
                    for j in xrange(n)
                    for k in xrange(n)
                    if i < j and j < k], dtype=np.uint16)
    idx = np.ascontiguousarray(idx.T)
    return has_identical_row_sums_lazy(A, idx)

def has_identical_double_row_sums(A):
    n = len(A)
    idx = np.array(np.tril_indices(n, -1), dtype=np.int32)
    return has_identical_row_sums_lazy(A, idx)

from time import clock
t = clock()
for i in xrange(1):
    ##print has_identical_double_row_sums(A)
    print has_identical_triple_row_sums(A)
print clock()-t
Edit: code cleanup

Find the closest value above and under in an array?

I have an array consisting of [element number, x-coordinate, y-coordinate, z-coordinate, radius (polar coordinates), θ (polar coordinates)].
In this array I need to find the two values closest to a specified number: the one above and the one below.
These need to be found in the last column of the array, which holds the θ values.
The values range from 0 to 1.5707 radians (0 to 90 degrees), and in our case we want to be able to choose how many specified numbers we use:
number = 9
anglestep = math.pi/2 / number
anglerange = np.arange(0, math.pi/2 + anglestep, anglestep)  # math.pi/2 + anglestep so that we get math.pi/2 in the array
As an example, I need to find the two values above and below the specified value 0.17 in the following data:
[...['4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405],
['4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374],
['4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121, 0.11284330708918186],
['4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648],
['4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783],
['4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019],
['4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243],
['4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624],
['4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338],
['4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599]..]
So what I want as output would in this case be the two values: (0.16651831290560243, 0.17975113416156624)
In [30]: np.max(arr[arr < .17])
Out[30]: 0.16651831290560243
In [31]: np.min(arr[arr > .17])
Out[31]: 0.17975113416156624
@NPE's answer is correct for a 1d array, but first you must access the angle column of your array. How to do that depends on the dtype (data type) of your array; your array seems to include both strings and floats, which a plain numpy array does not allow. There are two ways it might be solved: make everything a float, or use a structured dtype:
All floats
arr = np.array([
['4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405 ],
['4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374],
['4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121 , 0.11284330708918186],
['4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648],
['4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783],
['4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019 ],
['4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243],
['4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624],
['4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338],
['4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599 ]], dtype=float)
Then, to apply @Jaime's method, use
i = np.searchsorted(arr[:, -1], 0.17)
below = arr[i-1]
above = arr[i]
below
# array([ 4.54100000e+03, 8.32986550e+00, 4.95605049e+01, 8.12513000e-02, 5.02556495e+01, 1.66518313e-01])
above
# array([ 4.54200000e+03, 9.01412110e+00, 4.96065178e+01, 8.11457000e-02, 5.04188555e+01, 1.79751134e-01])
If you want just the angles, then just slice by column as well:
below_ang = arr[i-1, -1]
above_ang = arr[i, -1]
below_ang, above_ang
#(0.166518313, 0.179751134)
Note that this assumes that arr is sorted by angle.
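If it is not already sorted, a one-line sketch to order the rows by the angle column first:
arr = arr[np.argsort(arr[:, -1])]  # sort rows by the last (angle) column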
Structured array:
arr = np.array([('4549', '4.2158604', '49.4799309', '0.0833661', 49.65920902290997, 0.0849981532744405 ),
('4535', '4.2867651', '49.4913025', '0.0813997', 49.67660795755971, 0.08640089283783374),
('4537', '5.6042995', '49.4534569', '0.0811241', 49.7699967073121 , 0.11284330708918186),
('4538', '6.2840257', '49.4676971', '0.0809942', 49.86523874780516, 0.12635612935285648),
('4539', '6.9654546', '49.4909363', '0.0814121', 49.97869879894153, 0.13982362821749783),
('4540', '7.6476088', '49.5210190', '0.0813955', 50.10805567128103, 0.1532211602749019 ),
('4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243),
('4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624),
('4529', '9.3985014', '49.6320610', '0.0812080', 50.51409018950577, 0.18714756393388338),
('4531', '10.3884563', '49.7157669', '0.0812043', 50.78954127329902, 0.2059930152826599)],
dtype=[('id', 'S4'), ('x', 'S10'), ('y', 'S10'), ('z', 'S9'), ('rad', '<f8'), ('ang', '<f8')])
i = np.searchsorted(arr['ang'], 0.17)
below = arr[i-1]
above = arr[i]
below
# ('4541', '8.3298655', '49.5605049', '0.0812513', 50.25564948531672, 0.16651831290560243)
above
# ('4542', '9.0141211', '49.6065178', '0.0811457', 50.41885547537927, 0.17975113416156624)
Doing it for several values
First, an easier way to set up your range is with linspace, which automatically includes the start and the end, and is specified by the number of points rather than the step. Instead of:
number = 9
anglestep = math.pi/2 / number
anglerange = np.arange(0, math.pi/2 + anglestep, anglestep)  # math.pi/2 + anglestep so that we get math.pi/2 in the array
Use
number = 9
anglerange = np.linspace(0, math.pi/2, number) # start, end, number
Now, searchsorted will actually find several points for you just as easily:
locs = np.searchsorted(arr['ang'], anglerange)
belows = arr['ang'][locs-1]
aboves = arr['ang'][locs]
For example, I'll set anglerange = [0.1, 0.17, 0.2] since the full range isn't in your sample data:
belows
# array([ 0.08640089, 0.16651831, 0.18714756])
aboves
# array([ 0.11284331, 0.17975113, 0.20599302])
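One caveat, as a hedged sketch: np.searchsorted returns 0 for a query below the smallest angle and len(arr) for one above the largest, so locs - 1 would wrap around and locs would overrun. Clipping keeps both lookups in range:
locs = np.searchsorted(arr['ang'], anglerange)
locs = np.clip(locs, 1, len(arr) - 1)  # keep both locs - 1 and locs valid
belows = arr['ang'][locs - 1]
aboves = arr['ang'][locs]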
