Concatenate many arrays in Python

I have this code to concatenate two arrays.
import numpy as np
from hmmlearn import hmm
model = hmm.MultinomialHMM(n_components=3, n_iter=10,algorithm='map',tol=0.00001)
sequence3 = np.array([[2, 1, 0, 1]]).T
sequence4 = np.array([[2, 1, 0, 1, 1]]).T
sample = np.concatenate([sequence3, sequence4])
lengths = [len(sequence3), len(sequence4)]
model.fit(sample,lengths)
It works correctly. But what if I have more than two arrays, say 10? How can I do the same thing?
import numpy as np
from hmmlearn import hmm
model = hmm.MultinomialHMM(n_components=3, n_iter=10,algorithm='map',tol=0.00001)
sample = np.array([])
lengths = []
for i in range(1:10)
?????????????
model.fit(sample,lengths)

In order to concatenate more than one array, you simply concatenate the array with the concatenation of all the previous arrays.
import numpy as np
# Create arrays
arrays = [
    np.array([1, 2, 3]),
    np.array([4, 5, 6]),
    np.array([7, 8, 9]),
]
# Create an empty array to accumulate into (note: its dtype is float)
sample = np.array([])
for array in arrays:
    sample = np.concatenate([sample, array])
# Print results
print('sample', sample)
print('length', len(sample))
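Since np.concatenate accepts a whole list of arrays, the loop can also be skipped. Here is a minimal sketch, assuming the sequences are column vectors as in the question. The `sequences` list and its random contents are made up for illustration, and the hmmlearn call is left as a comment.

```python
import numpy as np

# Hypothetical stand-ins for the ten observation sequences (column vectors
# of lengths 3, 4, ..., 12 with symbols drawn from {0, 1, 2}).
rng = np.random.default_rng(0)
sequences = [rng.integers(0, 3, size=(n, 1)) for n in range(3, 13)]

# Concatenate all sequences at once along axis 0.
sample = np.concatenate(sequences)
# One length per sequence, in the same order as they were concatenated.
lengths = [len(seq) for seq in sequences]

print(sample.shape, lengths)
# model.fit(sample, lengths) would then be called as in the question.
```

Concatenating once at the end also avoids copying the growing array on every iteration.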

You can use np.vstack. According to its documentation, it is
equivalent to np.concatenate(tup, axis=0) if tup contains arrays that
are at least 2-dimensional.
Store your arrays in a list, say array_list, and stack them:
print(np.vstack(array_list))
Sample:
import numpy as np
sequence3 = np.array([[2, 1]]).T
sequence4 = np.array([[2, 5]]).T
sequence5 = np.array([[4, 5]]).T
sequence6 = np.array([[6, 7]]).T
array_list=[sequence3,sequence4,sequence5,sequence6]
sample = np.vstack(array_list)
lengths = [len(seq) for seq in array_list]
print(sample)
[[2]
 [1]
 [2]
 [5]
 [4]
 [5]
 [6]
 [7]]
Hope it helps!

Related

List of arrays in python numpy

import numpy as np
A = np.empty((0, 3))
temp = np.array([1, 1, 1])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
B = [A]
temp = np.array([2, 2, 0])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
B = B.append(A)
So it does not work. How do I make a list of numpy arrays? The problem is that I have N types of points. Every type has M points. Every point is a 3-coordinate array. Because I don't know the values of N and M in advance, I need to do everything dynamically. When I had N = 1, vstack worked perfectly, but now every type has its own M, and the array is not uniform anymore. So my guess is that I need to work with numpy/vstack just as if I had N = 1, but afterwards store these np.empty((0, 3)) arrays somewhere. Is that possible? Maybe in some empty object-type dictionary?
Thank you very much in advance!
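One thing worth noting about the snippet above: list.append modifies the list in place and returns None, so `B = B.append(A)` rebinds B to None. A minimal sketch of keeping one array per point type in a plain Python list follows. The name `groups` and the sample coordinates are assumptions for illustration, and each type starts from its own empty array.

```python
import numpy as np

groups = []  # will hold one (M_k, 3) array per point type

# Hypothetical input: (number of points, coordinates) for each type.
for n_points, coords in [(2, [1, 1, 1]), (4, [2, 2, 0])]:
    A = np.empty((0, 3))              # fresh array for this type
    for _ in range(n_points):
        A = np.vstack([A, np.array(coords)])
    groups.append(A)                  # do not assign the result: append returns None

print([g.shape for g in groups])
```

A list works here because the arrays may have different numbers of rows; a dictionary keyed by type name would work the same way.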

transform n-dimensional numpy array to a 2D-array based on a variable in Python

I am trying to find the minimum value in an N-dimensional array spanned by (N-Parameters of varying values) and take out a 2-dimensional array spanned by 2 of the (N-Parameters) around the minimum value to make a contour plot.
I can do this by hard coding the different cases, but it should preferably be done using a variable list of which axis should be extracted (contour_param).
Please see the code below for some clarification.
import numpy as np
np.random.seed(10) # seed the random generator for reproducibility
#Example for a 3D input array (my_data)
param_sizes = [2, 3, 4]
#Generate a data_cube
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
#find minimum
min_pos = np.where(my_data == my_data.min())
#what I want:
#define a parameter with the indices of the axes to be used for the contour plot, i.e.: contour_param = [0, 1]
#for contour_param = [0, 1] I would need the 2D array:
result = my_data[:, :, min_pos[2][0]]
#for contour_param = [1, 2] I would need the 2D array:
result = my_data[min_pos[0][0], :, :]
#What I have tried is to convert min_pos to a list and change the entries to arrays:
contour_param = [0, 1]
min_pos = list(np.where(my_data == my_data.min()))
min_pos[contour_param[0]] = np.arange(param_sizes[contour_param[0]])
min_pos[contour_param[1]] = np.arange(param_sizes[contour_param[1]])
result = my_data[min_pos] #This throws an error
#In an attempt to clarify - I have included a sample for a 4D array
#Example for a 4D array
param_sizes = [2, 3, 4, 3]
#Generate a data_cube
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
#find minimum
min_pos = np.where(my_data == my_data.min())
#for contour_param = [0, 1] I would need the 2D array:
result = my_data[:, :, min_pos[2][0], min_pos[3][0]]
#for contour_param = [1, 2] I would need the 2D array:
result = my_data[min_pos[0][0], :, :, min_pos[3][0]]
Great Question...
You can make use of np.s_ for that, as you can build up your slicer with that.
For instance the function:
def build_slicer(contour_param, min_pos):
    n_dims = len(min_pos)  # np.where returns a tuple with one index array per axis
    assert all(0 <= axis < n_dims for axis in contour_param)
    output = []  # init an empty output list
    for main_index in range(n_dims):
        if main_index in contour_param:
            output.append(np.s_[:])  # keep this axis whole
        else:
            output.append(np.s_[min_pos[main_index][0]])  # fix this axis at the minimum
    return tuple(output)
would return:
import numpy as np
np.random.seed(10)
param_sizes = [2, 3, 4]
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
min_pos = np.where(my_data == my_data.min())
contour_param = [0,2]
build_slicer(contour_param,min_pos)
>>> (slice(None, None, None), 2, slice(None, None, None))
You can then use this to slice your array:
slicer = build_slicer(contour_param, min_pos)
my_data[slicer]
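The same idea can be sketched in a self-contained form. This variant is an assumption, not the answer's exact code: it locates the minimum with np.argmin and np.unravel_index instead of np.where, and the helper name `extract_plane` is made up for illustration.

```python
import numpy as np

def extract_plane(data, contour_param):
    """Return the 2D slice through the global minimum of `data`,
    spanned by the two axes listed in `contour_param`."""
    # Index of the (first) global minimum, one integer per axis.
    min_index = np.unravel_index(np.argmin(data), data.shape)
    # Full slice on the contour axes, fixed index on all other axes.
    slicer = tuple(
        slice(None) if axis in contour_param else min_index[axis]
        for axis in range(data.ndim)
    )
    return data[slicer]

rng = np.random.default_rng(10)
my_data = rng.normal(size=(2, 3, 4, 3))   # 4D example as in the question
plane = extract_plane(my_data, [1, 2])
print(plane.shape)
```

Indexing with a tuple of slices and integers is what makes this work; indexing with a list, as in the attempt from the question, triggers different (fancy) indexing behaviour.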

parallelize zonal computation on numpy array

I am trying to compute the mode over all cells of the same zone (same value) in a numpy array. An example is given below. In this example the sequential approach works fine, but the multiprocessing approach does nothing, and I cannot find my mistake.
Does someone see my error?
I would like to parallelize the computation because my real array is a 10k * 10k array with 1M zones.
import numpy as np
import scipy.stats as ss
import multiprocessing as mp
def zone_mode(i, a, b, output):
    to_extract = np.where(a == i)
    val = b[to_extract]
    output[to_extract] = ss.mode(val)[0][0]
    return output
def zone_mode0(i, a, b):
    to_extract = np.where(a == i)
    val = b[to_extract]
    output = ss.mode(val)[0][0]
    return output
np.random.seed(1)
zone = np.array([[1, 1, 1, 2, 3],
                 [1, 1, 2, 2, 3],
                 [4, 2, 2, 3, 3],
                 [4, 4, 5, 5, 3],
                 [4, 6, 6, 5, 5],
                 [6, 6, 6, 5, 5]])
values = np.random.randint(8, size=zone.shape)
output = np.zeros_like(zone).astype(float)  # np.float was removed from recent NumPy releases
for i in np.unique(zone):
    output = zone_mode(i, zone, values, output)
# for multiprocessing
zone0 = zone - 1
pool = mp.Pool(mp.cpu_count() - 1)
results = [pool.apply(zone_mode0, args=(u, zone0, values)) for u in np.unique(zone0)]
pool.close()
output = results[zone0]
For non-negative integers in the arrays zone and values, we can use np.bincount. The basic idea is to treat zone and values as row and column indices on a 2D grid, so we can map each pair to its equivalent linear index. Those linear indices are used as bins for binned counting with np.bincount. After reshaping the counts to one row per zone, the argmax along each row is the mode of that zone. The modes are mapped back to the zone grid by indexing with zone.
Hence, the solution would be -
m = zone.max()+1
n = values.max()+1
ids = zone*n + values
c = np.bincount(ids.ravel(),minlength=m*n).reshape(-1,n).argmax(1)
out = c[zone]
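As a sanity check, the bincount steps above can be wrapped in a function and compared against a straightforward loop. This is only a sketch: the function name `zonal_mode` and the random `values` are assumptions, and ties are resolved towards the smallest value in both versions.

```python
import numpy as np

def zonal_mode(zone, values):
    """Mode of `values` within each zone, broadcast back onto the zone grid.
    Assumes both arrays hold non-negative integers."""
    m = zone.max() + 1
    n = values.max() + 1
    ids = zone * n + values                      # linear index of (zone, value)
    counts = np.bincount(ids.ravel(), minlength=m * n).reshape(m, n)
    return counts.argmax(axis=1)[zone]           # most frequent value per zone

zone = np.array([[1, 1, 1, 2, 3],
                 [1, 1, 2, 2, 3],
                 [4, 2, 2, 3, 3],
                 [4, 4, 5, 5, 3],
                 [4, 6, 6, 5, 5],
                 [6, 6, 6, 5, 5]])
rng = np.random.default_rng(1)
values = rng.integers(8, size=zone.shape)

out = zonal_mode(zone, values)

# Reference result: loop over zones and count values with np.unique.
expected = np.zeros_like(zone)
for z in np.unique(zone):
    mask = zone == z
    vals, cnt = np.unique(values[mask], return_counts=True)
    expected[mask] = vals[cnt.argmax()]

print(np.array_equal(out, expected))
```

Note that the count table has (zone.max() + 1) * (values.max() + 1) entries, which is why the sparse variant is worth considering when the integers are widely spread.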
For sparse data (widely spread integers in the input arrays), we can use a sparse matrix to get the argmax IDs c. Hence, with SciPy's sparse matrix -
from scipy.sparse import coo_matrix
data = np.ones(zone.size,dtype=int)
r,c = zone.ravel(),values.ravel()
c = coo_matrix((data,(r,c))).argmax(1).A1
For slight perf. boost, specify the shape -
c = coo_matrix((data,(r,c)),shape=(m,n)).argmax(1).A1
Solving for generic values
We will make use of pandas.factorize, like so -
import pandas as pd
ids,unq = pd.factorize(values.flat)
v = ids.reshape(values.shape)
# .. same steps as earlier with bincount, using v in place of values
out = unq[c[zone]]
Note that in case of ties, it picks an arbitrary element of values: the tied value that appears first in the flattened array. If you want the first one in sorted order (the smallest), use pd.factorize(values.flat, sort=True).
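The generic-values step can also be sketched without pandas by swapping pd.factorize for np.unique(..., return_inverse=True). This is a substitution, not the answer's method, and the small arrays below are made up for illustration. Because np.unique returns sorted values, ties go to the smallest value.

```python
import numpy as np

# Hypothetical example with float values and three zones (0, 1, 2).
zone = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [2, 2, 2]])
values = np.array([[0.5, 0.5, -1.0],
                   [2.5, -1.0, 7.0],
                   [7.0, 7.0, 0.5]])

# Replace each value by its integer code; unq[code] recovers the value.
unq, inv = np.unique(values, return_inverse=True)
v = inv.reshape(values.shape)

# Same bincount steps as before, using the codes v in place of values.
m = zone.max() + 1
n = len(unq)
ids = zone * n + v
c = np.bincount(ids.ravel(), minlength=m * n).reshape(m, n).argmax(axis=1)

out = unq[c[zone]]   # map codes back to the original values
print(out)
```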

adding rows after for loop in python

Hi, I am new to Python. How can I add a row to an initially empty matrix on each iteration of a for loop? This is how I do it in Matlab.
n = 10000;
tic
A = [];
for i = 1:n
Ai = rand(1,3);
A = [A;Ai];
end
toc
You can use append to add rows at the end of a list:
a = []
for i in range(10):
    a.append([1, 2, 3])
print(a)
For example:
Add a list to an empty matrix as a row, where [i for i in range(4)] gives you the list [0, 1, 2, 3]:
# empty matrix
mat = []
mat.append([i for i in range(4)])
print(mat)
# output
[[0, 1, 2, 3]]
A one-liner:
import random
matrix = [[random.randint(1, 3) for _ in range(10)]]
I wonder why would you need a one-row "matrix", as opposed to just a list of random numbers.
If you want a matrix with 1000 identical rows, create a row and add it 1000 times.
row = [random.randint(1, 3) for _ in range(10)]
matrix = [row for _ in range(1000)]
Or a funny syntax-sugared version:
matrix = [row] * 1000
Please note that the matrix will have 1000 references to the same row, not 1000 different copies of the row. If you mutate any value in any row, it will affect every row.
If you want a mutable matrix initially filled with 1000 identical rows, make a copy of the row each time:
matrix = [list(row) for _ in range(1000)]
Each invocation of list creates a new list filled with numbers from row.
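The difference between shared references and copies can be seen in a small sketch (three rows instead of 1000; the variable names `shared` and `copies` are made up for illustration):

```python
row = [1, 2, 3]

shared = [row] * 3                       # three references to the same list
copies = [list(row) for _ in range(3)]   # three independent copies

shared[0][0] = 99   # visible through every entry of `shared`
copies[0][0] = 99   # affects only the first copy

print(shared)  # [[99, 2, 3], [99, 2, 3], [99, 2, 3]]
print(copies)  # [[99, 2, 3], [1, 2, 3], [1, 2, 3]]
```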
Try to use numpy.random
import numpy as np
A = np.random.randint(1,4, size=(1000, 3))
and A should be the matrix you want.
For example,
import numpy as np
a = np.random.randint(1,4, size=(10, 3))
a looks like:
array([[1, 3, 3],
       [3, 1, 1],
       [2, 2, 1],
       [3, 2, 3],
       [2, 1, 2],
       [3, 3, 2],
       [2, 1, 2],
       [1, 1, 2],
       [2, 1, 1],
       [1, 1, 3]])
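Or, closer to the Matlab loop, collect the rows in a list and stack them once at the end:
import numpy as np
a = []
for i in range(2):
    ai = np.random.rand(1, 3)  # one random row, like rand(1,3)
    a.append(ai)
A = np.vstack(a)  # stack the collected rows into one 2D array
print(A)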

Python NumPy: How to fill a matrix using an equation

I wish to initialise a matrix A, using the equation A_i,j = f(i,j) for some f (It's not important what this is).
How can I do so concisely avoiding a situation where I have two for loops?
numpy.fromfunction fits the bill here.
Example from doc:
>>> import numpy as np
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
One could also get the indexes of your array with numpy.indices and then apply the function f in a vectorized fashion,
import numpy as np
shape = 1000, 1000
Xi, Yj = np.indices(shape)
A = (2*Xi + 3*Yj).astype(int) # or any other function f(Xi, Yj); np.int was removed from recent NumPy releases
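Both approaches can be checked against the explicit double loop. In this sketch, `f` is a hypothetical stand-in for the real function. np.fromfunction calls f once with whole index arrays, not once per element, so f has to be written with array operations.

```python
import numpy as np

def f(i, j):
    # Hypothetical example function; works on scalars and on arrays.
    return 2 * i + 3 * j

shape = (4, 5)

A_fromfunction = np.fromfunction(f, shape, dtype=int)

I, J = np.indices(shape)       # I[i, j] == i and J[i, j] == j
A_indices = f(I, J)

# Reference: the two nested loops the question wants to avoid.
A_loops = np.array([[f(i, j) for j in range(shape[1])]
                    for i in range(shape[0])])

print(np.array_equal(A_fromfunction, A_loops), np.array_equal(A_indices, A_loops))
```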
