How to stack multiple features of different shapes in NumPy?


Is there a better way to write the following code block? For each scene I want to create a 1D array containing features a-e, so that the final result has the shape m x n, where m is the number of scenes and n is the combined length of all the features.
The shapes of features a-d are unknown and can differ from each other. For example, feature a could have shape 100 x 3 x 3 x 5 while feature b could have shape 30 x 4. Feature e is simply a boolean.
import numpy as np
import torch

inputs = []
for scene in scenes:
    # flatten each feature to 1D and append the boolean as a single value
    inp = np.concatenate((
        scene['a'].flatten(),
        scene['b'].flatten(),
        scene['c'].flatten(),
        scene['d'].flatten(),
        [scene['e'] == True]))
    inputs.append(inp)
inputs = torch.FloatTensor(inputs)

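Let's say we know that ['a', 'b', 'c', 'd', 'e'] are the only keys in each scene (so they are accessible via scene.keys()), and that every scene dict stores them in the same order, since s.values() follows insertion order. Then the following code works: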
# np.vstack and np.hstack need a list, not a generator or map object, in recent NumPy versions
output = np.vstack([
    np.hstack([np.array(x).flatten() for x in s.values()]) for s in scenes
])
inputs = torch.FloatTensor(output)
To test it, I wrote a scene generator function that creates dictionaries similar to the ones you described, and used it to create 10 synthetic scenes:
import numpy as np

def scene_generator():
    scene = dict()
    scene['a'] = np.random.random((100, 3, 3, 5))
    scene['b'] = np.random.random((30, 4))
    scene['c'] = np.random.random((15, 15, 2))
    scene['d'] = np.random.random((2, 2, 2))
    scene['e'] = True
    return scene

scenes = [scene_generator() for _ in range(10)]
output = np.vstack([
    np.hstack([np.array(x).flatten() for x in s.values()]) for s in scenes
])
print(output.shape)
# (10, 5079)
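If you would rather not depend on every dict storing its keys in the same order, a minimal sketch is to pass the key order explicitly. The helper names stack_scene_features and make_scene are hypothetical, and the float32 dtype is an assumption chosen to match torch.FloatTensor:

```python
import numpy as np

def stack_scene_features(scenes, keys=("a", "b", "c", "d", "e")):
    """Flatten each scene's features in a fixed key order and stack them into an (m, n) array."""
    rows = [
        # np.ravel accepts arrays and scalars (such as the boolean 'e') and returns a 1-D array
        np.concatenate([np.ravel(np.asarray(scene[k], dtype=np.float32)) for k in keys])
        for scene in scenes
    ]
    # np.stack takes a list and checks that every row has the same length
    return np.stack(rows)

def make_scene(rng):
    # Hypothetical generator with the same feature shapes as the test above
    return {
        "a": rng.random((100, 3, 3, 5)),
        "b": rng.random((30, 4)),
        "c": rng.random((15, 15, 2)),
        "d": rng.random((2, 2, 2)),
        "e": True,
    }

rng = np.random.default_rng(0)
scenes = [make_scene(rng) for _ in range(10)]
features = stack_scene_features(scenes)
print(features.shape)  # (10, 5079)
```

torch.from_numpy(features) should then give a float32 tensor that shares memory with the array, which avoids building the tensor from a Python list.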

Related

All combinations of all elements of a 2D array

So I have the matrix A:
A = [[0, 0, 1, -1],
     [0, 0, 1, -1],
     [0, 0, 1, -1],
     [0, 0, 1, -1]]
and I want all the possible combinations of these elements, meaning the values can change across rows as well as columns. In this situation I would expect 4^4 = 256 possibilities. I have tried:
combs = np.array(list(itertools.product(*A)))
It does create the (256, 4) matrix I want, but all the rows are equal: I get the vector [0, 0, 1, -1] 256 times.
Here is an example:
output = [[0, 0, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 1],
          [0, 1, 1, 1],
          [1, 1, 1, 1],
          [-1, 1, 1, -1],
          [-1, -1, -1, -1],
          ...
          [0, -1, 0, -1]]
Another example: if
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
the output should be all the possible combinations of arrays that the matrix can form:
Combs = [[1, 1, 1],
         [1, 1, 2],
         [1, 1, 3],
         [1, 1, ...9],
         [2, 1, 1],
         [2, 2, 1],
         [1, 2, 1],
         ...]
Another example would be:
I have the vector layers
layers = [1, 2, 3, 4, 5]
and the vector angle
angle = [0, 90, 45, -45]
Each layer can have one of the angles, so I create a matrix A:
A = [[0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45],
     [0, 90, 45, -45]]
Great, but now I want to know all possible combinations that the layers can have. For example, layer 1 can have an angle of 0º, layer 2 an angle of 90º, layer 3 an angle of 0º, layer 4 an angle of 45º and layer 5 an angle of 0º. This creates the array
Comb = [0, 90, 0, 45, 0]
So all the combinations would be in a matrix:
Comb = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 90],
        [0, 0, 0, 90, 90],
        [0, 0, 90, 90, 90],
        [0, 90, 90, 90, 90],
        [90, 90, 90, 90, 90],
        ...
        [0, 45, 45, 45, 45],
        [0, 45, 90, -45, 90]]
How can I generalize this process to bigger matrices? Am I doing something wrong?
Thank you!

It's fine to use np.array together with list(iterable), especially in your case, where the iterable is itertools.product(*A). However, this can be optimised, since you know the shape of the output array in advance.
There are many ways to compute a Cartesian product, so I'll just list mine:
Methods of Cartesian Product
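import itertools
import numpy as np

def numpy_product_itertools(arr):
    return np.array(list(itertools.product(*arr)))

def numpy_product_fromiter(arr):
    # one integer field per input list, so each product tuple becomes one structured record
    dt = np.dtype([('', np.intp)]*len(arr))  # or np.dtype(','.join('i'*len(arr))), but then view as np.intc below
    indices = np.fromiter(itertools.product(*arr), dt)
    # reinterpret the records as a plain 2D integer array
    return indices.view(np.intp).reshape(-1, len(arr))

def numpy_product_meshgrid(arr):
    # indexing='ij' makes the row order match itertools.product
    return np.stack(np.meshgrid(*arr, indexing='ij'), axis=-1).reshape(-1, len(arr))

def numpy_product_broadcast(arr):  # a slightly different type of output: a list of tuples
    items = [np.array(item) for item in arr]
    # row k of idx is (None, ..., Ellipsis, ..., None), giving input k its own axis
    idx = np.where(np.eye(len(arr)), Ellipsis, None)
    out = [x[tuple(i)] for x, i in zip(items, idx)]
    return list(np.broadcast(*out))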
Example of usage
A = [[1,2,3], [4,5], [7]]
numpy_product_itertools(A)
numpy_product_fromiter(A)
numpy_product_meshgrid(A)
numpy_product_broadcast(A)
Comparison of performance
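# benchit is a third-party benchmarking package
import benchit
benchit.setparams(rep=1)
# Jupyter/IPython magic for inline plots; drop it in a plain Python script
%matplotlib inline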
sizes = [3,4,5,6,7]
N = sizes[-1]
arr = [np.arange(0,100,10).tolist()] * N
fns = [numpy_product_itertools, numpy_product_fromiter, numpy_product_meshgrid, numpy_product_broadcast]
in_ = {s: (arr[:s],) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Cartesian product of N arrays of length=10')
t.plot(logx=False, figsize=(12, 6), fontsize=14)
Note that a numba implementation, although not included here, beats most of these methods.
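For the layers/angles case in the question, where every row of A is identical, a sketch using the repeat argument of itertools.product avoids building A at all. Separately, because the first A contains 0 twice, its 256 products include duplicate rows; np.unique(..., axis=0) keeps only the distinct ones:

```python
import itertools
import numpy as np

angles = [0, 90, 45, -45]
n_layers = 5

# Every layer picks independently from the same angle list, so repeat= replaces *A
combos = np.array(list(itertools.product(angles, repeat=n_layers)))
print(combos.shape)  # (1024, 5)

# The first A from the question: 0 appears twice in each row
A = [[0, 0, 1, -1]] * 4
dup = np.array(list(itertools.product(*A)))
unique_rows = np.unique(dup, axis=0)  # drop repeated combinations
print(dup.shape, unique_rows.shape)  # (256, 4) (81, 4)
```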

Slicing 2D numpy array periodically

I have a 300x300 numpy array from which I want to keep elements periodically. Specifically, along both axes I want to keep the first 5 elements, then discard 15, keep 5, discard 15, and so on. This should result in a 75x75 array. How can this be done?

You can create a 1D mask that encodes the keep/discard pattern, repeat it to the length of each axis, and then apply it to the array. Here is an example:
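import numpy as np
size = 300
# test array: array[i, j] = i * j
array = np.arange(size).reshape((size, 1)) * np.arange(size).reshape((1, size))
# one period of the pattern: 5 True (keep) followed by 15 False (discard)
mask = np.concatenate((np.ones(5), np.zeros(15))).astype(bool)
period = len(mask)
# repeat the period size // period times and flatten it into a length-300 mask
mask = np.repeat(mask.reshape((1, period)), repeats=size // period, axis=0)
mask = np.concatenate(mask, axis=0)
# apply the mask to the rows, then to the columns
result = array[mask][:, mask]
print(result.shape)
# (75, 75)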

You can view the array as series of 20x20 blocks, of which you want to keep the upper-left 5x5 portion. Let's say you have
keep = 5
discard = 15
This only works if
assert all(s % (keep + discard) == 0 for s in arr.shape)
First compute the shape of the view and use it:
block = keep + discard
shape1 = (arr.shape[0] // block, block, arr.shape[1] // block, block)
view = arr.reshape(shape1)[:, :keep, :, :keep]
The following operation creates a copy of the data, because the sliced view is not contiguous in memory:
shape2 = (shape1[0] * keep, shape1[2] * keep)
result = view.reshape(shape2)
You can compute shape1 and shape2 in a more general manner with something like
shape1 = tuple(
    np.stack((np.array(arr.shape) // block,
              np.full(arr.ndim, block)), -1).ravel())
shape2 = tuple(np.array(shape1[::2]) * keep)
I would recommend packaging this into a function.
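Packaged as a function, the reshape trick might look like the sketch below. keep_periodic is a hypothetical name; it builds shape1 and the slicing index with plain tuples rather than np.stack, and like the code above it assumes every dimension is a multiple of keep + discard:

```python
import numpy as np

def keep_periodic(arr, keep, discard):
    """Keep the first `keep` of every `keep + discard` elements along each axis."""
    block = keep + discard
    if any(s % block for s in arr.shape):
        raise ValueError("every dimension must be a multiple of keep + discard")
    # Split each axis of length s into (s // block, block)
    shape1 = tuple(d for s in arr.shape for d in (s // block, block))
    # Keep the first `keep` positions on every within-block axis (the odd axes)
    index = tuple(slice(None) if ax % 2 == 0 else slice(0, keep)
                  for ax in range(2 * arr.ndim))
    view = arr.reshape(shape1)[index]
    # Merging the axes back copies, because the sliced view is not contiguous
    return view.reshape(tuple(s // block * keep for s in arr.shape))

arr = np.arange(300 * 300).reshape(300, 300)
result = keep_periodic(arr, keep=5, discard=15)
print(result.shape)  # (75, 75)
```

Because the index tuple is built per axis, the same function should also handle 3D or higher-dimensional arrays.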

Here is my first idea for a solution. I will update it later if I think of one with fewer lines. This should work even if the input is not square:
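output = []
for i in range(len(arr)):
    tmp = []
    if i % (15+5) < 5:  # keep the first 5 rows, then discard the next 15
        for j in range(len(arr[i])):
            if j % (15+5) < 5:  # keep the first 5 columns, then discard the next 15
                tmp.append(arr[i, j])
        # append only kept rows, so discarded rows leave no empty lists behind
        output.append(tmp)
output = np.array(output)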
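The same modulo test can be vectorized without Python loops. This sketch (keep_periodic_ix is a hypothetical name) builds the kept row and column indices separately and uses np.ix_, so like the loop it does not require a square input or dimensions that are multiples of 20:

```python
import numpy as np

def keep_periodic_ix(arr, keep=5, discard=15):
    """Keep index i along each axis when i % (keep + discard) < keep."""
    period = keep + discard
    rows = np.flatnonzero(np.arange(arr.shape[0]) % period < keep)
    cols = np.flatnonzero(np.arange(arr.shape[1]) % period < keep)
    # np.ix_ builds an open mesh, so every (row, col) pair is selected
    return arr[np.ix_(rows, cols)]

arr = np.arange(300 * 210).reshape(300, 210)  # non-square input
out = keep_periodic_ix(arr)
print(out.shape)  # (75, 55)
```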
Update:
Building off of Yang's answer, here is another way, which uses np.tile to repeat an array a given number of times along each axis. This relies on the input array being square.
import numpy as np
# Define one instance of the keep/discard box
keep, discard = 5, 15
mask = np.concatenate([np.ones(keep), np.zeros(discard)])
mask_2d = mask.reshape((keep+discard,1)) * mask.reshape((1,keep+discard))
# Tile it out -- overshoot, then trim to match size
count = len(arr)//len(mask_2d) + 1
tiled = np.tile(mask_2d, [count,count]).astype('bool')
tiled = tiled[:len(arr), :len(arr)]
# Apply the mask to the input array
dim = sum(tiled[0])
output = arr[tiled].reshape((dim,dim))

Another option using meshgrid and a modulo:
# MyArray = 300x300 numpy array
r = np.r_[0:300] # indices 0 to 299
xv, yv = np.meshgrid(r, r) # x and y grid
mask = ((xv%20)<5) & ((yv%20)<5) # We create the boolean mask
result = MyArray[mask].reshape((75,75)) # We apply the mask and reshape the final output

Wrapping array operation into a function

The input X to my network has the shape (10, 1, 5, 4). I want to boxplot the distribution of each of the four input features, for each class. For example:
X = np.random.randn(10, 1, 5, 4)
a = np.zeros(5, dtype=int)
b = np.ones(5, dtype=int)
y = np.hstack((a,b))
print(X.shape)
print(y.shape)
(10, 1, 5, 4)
(10,)
Then I separate the input X into the respective classes, like this:
class0, class1 = [], []
for i in range(len(y)):
    if y[i] == 0:
        class0.append(X[i])
    else:
        class1.append(X[i])
class0 = np.array(class0)
class1 = np.array(class1)
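The loop can also be replaced with boolean indexing on the first axis. A minimal sketch using the same X and y as above:

```python
import numpy as np

X = np.random.randn(10, 1, 5, 4)
y = np.hstack((np.zeros(5, dtype=int), np.ones(5, dtype=int)))

# A boolean mask over the first axis selects whole observations
class0 = X[y == 0]
class1 = X[y == 1]
print(class0.shape, class1.shape)  # (5, 1, 5, 4) (5, 1, 5, 4)
```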
Taking class0 as an example, I can manipulate it so that the four features are arranged one per column (col1, col2, col3, col4), like this:
def transformer(myclass):
    #reshape the class
    k = myclass.transpose((0,1,3,2))
    #access individual feature
    s = k[0][:,0].reshape(-1,1)
    a = k[0][:,1].reshape(-1,1)
    j = k[0][:, 2].reshape(-1,1)
    b = k[0][:, 3].reshape(-1,1)
    rslt = [s,a,j,b]
    return rslt
Then plot the features:
sns.boxplot(data=transformer(class0))
This is the general idea of my workflow. Note that the function transformer is hardcoded to access only the first observation (element) of the class it takes as input.
Question: How do I modify my function so that it accesses all observations of the class, not just a single example, and therefore generalises? That is, col1 should contain the first-column feature values of every example in the class.
I tried writing the following:
def mytransformer(myclass):
    #first, transpose class
    k = myclass.transpose((0,1,3,2))
    #speed
    for i in range(k):
        s = k[i][:,0].reshape(-1,1)
    return s
Which gives the error:
mytransformer(class0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-5451e55f03d9> in <module>()
----> 1 mytransformer(class0)
<ipython-input-14-d1a2c8098caf> in mytransformer(myclass)
3 myclass = myclass.transpose((0,1,3,2))
4 #speed
----> 5 for i in range(myclass):
6 s = k[i][:,0].reshape(-1,1)
7 return s
TypeError: only integer scalar arrays can be converted to a scalar index
Also, is there a way to add a legend to the boxplot so that I can give a name to each feature?

For your first question: you are passing a NumPy array to range(), but range() needs an integer argument. You probably want:
for i in range(len(k)):
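To cover all observations instead of only the first one, one option is to collapse every axis except the feature axis. This sketch assumes, as in X, that the four features sit on the last axis; transformer_all is a hypothetical name:

```python
import numpy as np

def transformer_all(myclass):
    """Return one (n_obs * 5, 1) column per feature, covering every observation in the class."""
    n_features = myclass.shape[-1]
    # Collapse all other axes: each row holds the features of one (observation, row) pair
    flat = myclass.reshape(-1, n_features)
    return [flat[:, j].reshape(-1, 1) for j in range(n_features)]

class0 = np.random.randn(5, 1, 5, 4)
cols = transformer_all(class0)
print(len(cols), cols[0].shape)  # 4 (25, 1)
```

For naming the features, one option is to stack the columns into a pandas DataFrame with named columns, e.g. pd.DataFrame(np.hstack(cols), columns=names); when sns.boxplot receives such a wide-form DataFrame, it should label each box with its column name.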

Rearrange 3D array in python

I have large binary 3D data, and I want to rearrange it into a sequence of values ordered by traversing the original data in sub-arrays of size 4x4x4.
For example, if the data were 2D, I would rearrange it in 2x2 sub-arrays [example image].
I used simple loops for this, but just iterating over the loops took far too long. I am trying to use some NumPy functions instead, but I am new to SciPy.
My code looks like this:
import numpy as np

x,y,z = 1200,800,400
data = np.fromfile(file_name, dtype=np.float32)  # file_name and output are defined elsewhere
data.shape = (z,y,x)
new_data = np.empty(shape=x*y*z, dtype = np.float32)
index = 0
for zz in range(0,z,4):
    for yy in range(0,y,4):
        for xx in range(0,x,4):
            for zShift in range(4):
                for yShift in range(4):
                    for xShift in range(4):
                        new_data[index] = data[zz+zShift, yy+yShift, xx+xShift]
                        index+=1
new_data.tofile(output)
However, this takes a lot of time. Any ideas for a better implementation?
As I said, the code works as intended, but I need a smarter, more Pythonic way to produce the output.
Thank you!

import numpy as np

x,y,z = 1200,800,400
data = np.empty([x,y,z])
# numpy calculates the size of the -1 dimension
out = data.reshape(-1, 4, 4, 4)
print(out.shape)
# (6000000, 4, 4, 4)
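Note that a plain reshape(-1, 4, 4, 4) only cuts the flat buffer into consecutive runs of 64 values in memory order, so it does not gather the spatial 4x4x4 blocks that the question's loops visit. A sketch that does reproduce the loop order splits each axis into (block index, offset) and moves the block indices to the front; to_block_order is a hypothetical name, and it assumes every dimension is a multiple of the block size:

```python
import numpy as np

def to_block_order(data, b=4):
    """Reorder a (z, y, x) array so each b*b*b block is contiguous, in the nested-loop order."""
    z, y, x = data.shape
    # Split each axis into (block index, offset within block)
    split = data.reshape(z // b, b, y // b, b, x // b, b)
    # Block indices (zz, yy, xx) first, offsets (zShift, yShift, xShift) last
    return split.transpose(0, 2, 4, 1, 3, 5).ravel()

# Small test array; in the question, data has shape (400, 800, 1200)
data = np.arange(8 * 12 * 16, dtype=np.float32).reshape(8, 12, 16)
new_data = to_block_order(data)
blocks = new_data.reshape(-1, 4, 4, 4)  # one 4x4x4 array per block
print(blocks.shape)  # (24, 4, 4, 4)
```

In the question's setting this would be new_data = to_block_order(data) followed by new_data.tofile(output). The ravel after the transpose makes one copy of the data, which should still be far cheaper than the element-by-element Python loop.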

Perform the following test with a smaller array and block size:
import numpy as np

x, y, z = 4, 4, 4  # Dimensions
stp = 2  # Block size (in each dimension)
# Create the test array
arr = np.arange(x * y * z).reshape((x, y, z))
And to create a list of "blocks", run:
new_data = []
for xx in range(0, x, stp):
    for yy in range(0, y, stp):
        for zz in range(0, z, stp):
            print('Index:', xx, yy, zz)
            obj = arr[xx:xx+stp, yy:yy+stp, zz:zz+stp].copy()
            print(obj)
            new_data.append(obj)
In the target version of your code:
restore the original values of x, y and z,
read the array from your source,
change stp back to 4,
drop the test printouts.
Note also that your code adds individual elements to new_data while iterating over blocks of size 4 * 4 * 4, whereas you wrote that you want a sequence of smaller arrays (i.e. slices) of size 4 * 4 * 4, which is what my code produces. So if you need a list of slices (smaller arrays) rather than a single 4-D array, use my code.

Using multiple independent variables in Python lmfit

I am trying to fit a model to some data. The independent variables are called A and B, and they are columns in a Pandas DataFrame. I am trying to fit two parameters against the y column of the data frame.
Previously, with curve_fit from Scipy, I could do:
import numpy as np
from scipy.optimize import curve_fit

def fun(X, p1, p2):
    A, B = X
    return np.exp(p1*A) + p2*B

X = (df['A'].tolist(), df['B'].tolist())
popt, pcov = curve_fit(fun, X, df['y'].tolist())
But now, I'm using lmfit, where I cannot simply "pack" the independent variables like with curve_fit:
import numpy as np
from lmfit import Model

def fun(A, B, p1 = 1, p2 = 1):
    return np.exp(p1*A) + p2*B

model = Model(fun, independent_vars=['A', 'B'])
How do I run model.fit() here? The FAQ is not really helpful—what do I have to flatten in the first place?

I created a complete, working example with two independent variables:
import pandas as pd
import numpy as np
from lmfit import Model
df = pd.DataFrame({
'A' : pd.Series([1, 1, 1, 2, 2, 2, 2]),
'B' : pd.Series([5, 4, 6, 6, 5, 6, 5]),
'target' : pd.Series([87.79, 40.89, 215.30, 238.65, 111.15, 238.65, 111.15])
})
def fun(A, B, p1 = 1, p2 = 1):
    return p1 * np.exp(A) + p2 * np.exp(B)
model = Model(fun, independent_vars=['A', 'B'])
fit = model.fit(df['target'], A = df['A'], B = df['B'])
The trick is to pass every independent variable to fit() as a keyword argument named after the corresponding function argument.

First, create a model from a function of multiple independent variables, for example:
def random_func(x,y,a,b,c):
    return a*x**3+b*y**2+c
Second, specify which arguments of the function are the independent variables, for example:
from lmfit import Model
model = Model(random_func,independent_vars=['x','y'])
Third, set initial values for the model parameters, for example:
model.set_param_hint('a',value=2)
model.set_param_hint('b',value=3)
model.set_param_hint('c',value=4)
Finally, set up your x-axis and y-axis values, load the data, and do the fit, like this:
x = np.arange(0,2,0.1)
y = np.arange(0,2,0.1)
z = np.loadtxt('filename')
However, a direct fit does not work well: the 2D data array has to be flattened into a 1D array, and so do the coordinates. Keeping the model as it is, we need to create new 1D coordinate arrays:
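x1d = []
y1d = []
# x outer, y inner: this matches z.flatten() if z has shape (len(x), len(y))
for i in x:
    for j in y:
        x1d.append(i)  # list.append returns None, so don't reassign its result
        y1d.append(j)
x1d = np.array(x1d)  # arrays are needed for x**3 in random_func
y1d = np.array(y1d)
z1d = z.flatten()
result = model.fit(z1d, x = x1d, y = y1d)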
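The nested loop can also be replaced by np.meshgrid. In this sketch, z is synthetic data standing in for np.loadtxt('filename'), and it assumes z has shape (len(x), len(y)); indexing="ij" keeps the flattened coordinates aligned with z.ravel():

```python
import numpy as np

x = np.arange(0, 2, 0.1)
y = np.arange(0, 2, 0.1)

# Synthetic 2D data with shape (len(x), len(y)): z[i, j] = x[i]**3 + y[j]**2
z = np.add.outer(x ** 3, y ** 2)

# indexing="ij" gives xx[i, j] = x[i] and yy[i, j] = y[j], matching z's layout
xx, yy = np.meshgrid(x, y, indexing="ij")
x1d, y1d, z1d = xx.ravel(), yy.ravel(), z.ravel()
# result = model.fit(z1d, x=x1d, y=y1d)
```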
