Adding a bias to numpy array - python

I have two numpy arrays, one is 2D of dimension (a, b) and the other is 1D of dimension a. I want to add the single value for each index of the 1D array to each member of the same index in the 2D array. Here's an example of what I'm trying to do:
import numpy as np
firstArr = np.random.random((5,6))
secondArr = np.linspace(0, 4, 5)
I know I can do what I want with the loop:
for i in range(5):
    firstArr[i] = firstArr[i] + secondArr[i]
But this is not very Pythonic, and an explicit Python loop will no doubt take a long time when I do it over millions of iterations.
What's the solution, here? Thanks!
Edit:
I think I found the answer: use np.newaxis. This works, but I don't know if there's a more efficient way to do this. Here's how it would work:
arr = firstArr + secondArr[:, np.newaxis]
I'm leaving the question open for now because there's probably a more efficient way of doing this, but this at least works. It can also, per Numpy documentation, be written as:
arr = firstArr + secondArr[:, None]
I'll admit I don't know exactly what this does (still looking into that), so any guidance would be appreciated.
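To see what the indexing does: secondArr[:, np.newaxis] (or [:, None], since np.newaxis is just an alias for None) turns the shape (5,) into (5, 1), and broadcasting then stretches that single column across the 6 columns of firstArr. A minimal sketch comparing it with the loop, using the arrays from the question (the names looped and column are only for illustration):

```python
import numpy as np

firstArr = np.random.random((5, 6))
secondArr = np.linspace(0, 4, 5)

# Loop version from the question, done on a copy so firstArr stays unchanged.
looped = firstArr.copy()
for i in range(5):
    looped[i] = looped[i] + secondArr[i]

# np.newaxis is an alias for None; it inserts an axis of length 1.
column = secondArr[:, np.newaxis]
print(secondArr.shape, column.shape)  # (5,) (5, 1)

# (5, 6) + (5, 1): the length-1 axis is broadcast across the 6 columns.
arr = firstArr + column
print(np.allclose(arr, looped))  # True
```

The broadcast version does the whole addition in compiled code, so there is usually no need to look for something faster than this.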

Related

More efficient way to access rows based on a list of indices in 2d numpy array?

I have a 2D NumPy array arr. It's a relatively big one: arr.shape = (2400, 60000)
What I'm currently doing is the following:
randomly (with replacement) select arr.shape[0] indices
access (row-wise) chosen indices of arr
calculating column-wise averages and selecting max value
I repeat this k times.
It looks something like:
no_rows = arr.shape[0]
indicies = np.array(range(no_rows))
my_vals = []
for k in range(no_samples):
    random_idxs = np.random.choice(indicies, size=no_rows, replace=True)
    my_vals.append(
        arr[random_idxs].mean(axis=0).max()
    )
My problem is that it is very slow. With my arr size, it takes ~3s for 1 loop. As I want a sample that is bigger than 1k, my current solution is pretty bad (1k*~3s -> ~1h). I've profiled it and the bottleneck is accessing rows based on indices. "mean" and "max" work fast. np.random.choice is also ok.
Do you see any area for improvement? Is there a more efficient way of accessing the rows, or better, a faster approach that solves the problem without this step?
What I tried so far:
numpy.take (slower)
numpy.ravel :
something similar to:
random_idxs = np.random.choice(sample_idxs, size=sample_size, replace=True)
test = random_idxs.ravel()[arr.ravel()].reshape(arr.shape)
similar approach to current one but without loop. I created 3d arr and accessed rows across additional dimension in one go
Since advanced indexing generates a copy, arr[random_idxs] allocates a huge block of memory (as large as arr itself) on every iteration.
So one of the simplest ways to improve efficiency is to process the columns batch-wise.
BATCH = 512
max(arr[random_idxs,i:i+BATCH].mean(axis=0).max() for i in range(0,arr.shape[1],BATCH))
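A minimal sketch of the batched expression placed inside the resampling loop. The array here is a small random stand-in (an assumption, so the example runs quickly), and no_samples and BATCH are illustrative values; the check at the end only confirms that batching gives the same number as the unbatched expression.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((240, 6000))   # small stand-in for the (2400, 60000) array
no_rows = arr.shape[0]
no_samples = 3                  # illustrative; the question wants > 1000
BATCH = 512                     # columns per chunk, worth tuning

my_vals = []
for k in range(no_samples):
    random_idxs = rng.integers(0, no_rows, size=no_rows)  # with replacement
    # Each chunk copies only (no_rows, BATCH) elements instead of the full array.
    batched = max(
        arr[random_idxs, i:i + BATCH].mean(axis=0).max()
        for i in range(0, arr.shape[1], BATCH)
    )
    my_vals.append(batched)

    # Same result as indexing all columns at once.
    assert np.isclose(batched, arr[random_idxs].mean(axis=0).max())
```

Whether this is actually faster depends on the machine and on BATCH, so it is worth timing a few chunk sizes on the real array.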
This is not a general solution to the problem, but should make your specific problem much faster. Basically, arr.mean(axis=0).max() won't change, so why not take random samples from that array?
Something like:
mean_max = arr.mean(axis=0).max()
my_vals = np.array([np.random.choice(mean_max, size=len(mean_max), replace=True) for i in range(no_samples)])
You may even be able to do: my_vals = np.random.choice(mean_max, size=(no_samples, len(mean_max)), replace=True), but I'm not sure how, if at all, that would change your statistics.
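Note that arr.mean(axis=0).max() evaluates to a single number, so len(mean_max) would fail as written, and resampling a precomputed statistic is not the same as resampling rows. A different way to avoid the big copy, offered here as a sketch and not taken from the answers above: the column means of a resample depend only on how many times each row was drawn, so they can be computed from np.bincount counts and a matrix product. The small array is a stand-in, and counts and col_means are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random((240, 600))    # small stand-in array (assumption)
no_rows = arr.shape[0]
no_samples = 5

my_vals = []
for k in range(no_samples):
    random_idxs = rng.integers(0, no_rows, size=no_rows)
    # counts[r] = how many times row r was drawn in this resample.
    counts = np.bincount(random_idxs, minlength=no_rows)
    # Weighted column means; no (no_rows, n_cols) copy of arr is made.
    col_means = counts @ arr / no_rows
    my_vals.append(col_means.max())

    # Agrees with the direct fancy-indexing version.
    assert np.isclose(col_means.max(), arr[random_idxs].mean(axis=0).max())
```

With many resamples, the counts could presumably be stacked into a (no_samples, no_rows) matrix and multiplied in one go, at the cost of a (no_samples, n_cols) result in memory.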

Numpy Array Multiplication

I couldn't seem to find this problem on Stack Overflow, although I'm sure someone has asked this before.
I have two numpy arrays as follows:
a = np.ones(shape = (2,10))
b = np.ones(2)
I want to multiply the first row of a (10 elements) by the first number in b, and the second row by the second number. I can do this using lists as follows:
np.array([x*y for x,y in zip(b,a)])
I was wondering if there is a way to do this in numpy that would be a similar one liner to the list method.
I am aware I can reshape a to (1,2,10) and b to (2,1) to effectively achieve this. Is this the only solution, or is there a numpy method that can do this without manually reshaping?
This might be what you are looking for:
a*np.tile(np.expand_dims(b,axis=1),(1,10))
If you want to make use of the automatic numpy broadcasting, you need to reshape b first:
np.multiply(a, b.reshape(2,1))
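For what it's worth, the reshape does not have to be spelled out with explicit sizes, and only b needs the extra axis; a can stay (2, 10). A short sketch showing a few equivalent spellings against the list comprehension from the question (b is given distinct values here, an assumption, so that the row scaling is visible):

```python
import numpy as np

a = np.ones(shape=(2, 10))
b = np.array([2.0, 3.0])   # distinct values so the row scaling is visible

expected = np.array([x * y for x, y in zip(b, a)])

via_none = a * b[:, None]              # b becomes shape (2, 1)
via_reshape = a * b.reshape(-1, 1)     # same thing, size inferred
via_transpose = (a.T * b).T            # broadcast along the last axis instead

print(np.array_equal(via_none, expected))  # True
```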

expand numpy array in n dimensions

I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array with known numbers (let's call it X) and I want to make it j times bigger (in each dimension).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for n=3) is:
import numpy as np
new_shape = (np.array(X.shape) - 1 ) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general, I would have to modify the third line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the third line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
etc. etc. I want to avoid having to write code for each case of ndim
p.s. If there is a better tool to do the entire process (such as 'inner-padding') that I am not aware of, I will be happy to learn about it.
Thank you
array = array[..., np.newaxis] will add another dimension
This article might help
You can use slice notation -
slicer = tuple(slice(None,None,ratio) for i in range(X.ndim))
new_array[slicer] = X
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),)*new_array.ndim] = ...
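Putting the slicing tuple together with the setup from the question, a sketch that works for any ndim. The function name expand_with_gaps is made up for illustration; X and ratio follow the question.

```python
import numpy as np

def expand_with_gaps(X, ratio):
    """Place X into a zero array at every `ratio`-th position along each axis."""
    new_shape = tuple((np.array(X.shape) - 1) * ratio + 1)
    new_array = np.zeros(shape=new_shape, dtype=X.dtype)
    # One slice(None, None, ratio) per axis, i.e. [::ratio, ::ratio, ...].
    slicer = (slice(None, None, ratio),) * X.ndim
    new_array[slicer] = X
    return new_array

X = np.arange(1, 7).reshape(2, 3)
print(expand_with_gaps(X, 2))
# [[1 0 2 0 3]
#  [0 0 0 0 0]
#  [4 0 5 0 6]]
```

The index has to be a tuple; a plain list of slices would be interpreted differently by NumPy.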

How to get arrays that output result in brackets like [1][2][3] to [1 2 3]

The title kind of says it all. I have this (excerpt):
import numpy as np
import matplotlib.pyplot as plt
number_of_particles=1000
phi = np.arccos(1-2*np.random.uniform(0.0,1.,(number_of_particles,1)))
vc=2*np.pi
mux=-vc*np.sin(phi)
and I get out
[[-4.91272413]
[-5.30620302]
[-5.22400513]
[-5.5243784 ]
[-5.65050497]...]
which is correct, but I want it to be in the format
[-4.91272413 -5.30620302 -5.22400513 -5.5243784 -5.65050497....]
Feel like there should be a simple solution, but I couldn't find it.
Suppose your array is represented by the variable arr.
You can do,
l = ''
for i in arr:
    l = l + str(i[0]) + ' '
arr = [l]
Use this command:
new_mux = [i[0] for i in mux]
But I need it in an array, so then I add this
new_mux=np.array(new_mux)
and I get the desired output.
There's a method transpose in numpy's array object
mux.transpose()[0]
(I just noticed that this is a very old question, but since I have typed up this answer, and I believe it is simpler and more efficient than the existing ones, I'll post it...)
Notice that when you do
np.random.uniform(0.0,1.,(number_of_particles, 1))
you are creating a two-dimensional array with number_of_particles rows and one column. If you want a one-dimensional array throughout, you could do
np.random.uniform(0.0,1.,(number_of_particles,))
instead.
If you want to keep things 2d, but reshape mux for some reason, you can... well, reshape it:
mux_1d = mux.reshape(-1)
-1 here means "reshape it to one axis (because there’s just one number) and figure out automatically how many elements there should be along that axis (because the number is -1)."
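A quick sketch comparing the options mentioned in the answers on a small (n, 1) array; all of them give the same 1D result. np.pi is used for the pi that the question leaves undefined, and the flat_* names are only for illustration.

```python
import numpy as np

number_of_particles = 5
phi = np.arccos(1 - 2 * np.random.uniform(0.0, 1.0, (number_of_particles, 1)))
vc = 2 * np.pi
mux = -vc * np.sin(phi)
print(mux.shape)                 # (5, 1)

flat_reshape = mux.reshape(-1)   # a view when possible
flat_ravel = mux.ravel()         # likewise
flat_column = mux[:, 0]          # pick the only column
flat_transpose = mux.transpose()[0]

print(flat_reshape.shape)        # (5,)
```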

Concatenating Numpy array to Numpy array of arrays

I'm trying to make a for loop that, on each iteration, adds an array to the end of an array of arrays, and I can't quite put my finger on how to do it.
The general idea of the program:
for x in range(0,longnumber):
    generatenewarray
    add new array to end of array
So for example, the output of:
newArray = [1,2,3]
array = [[1,2,3,4],[1,4,3]]
would be: [[1,2,3,4],[1,4,3],[1,2,3]]
If the wording is poor let me know and I can try and edit it to be better!
Is this what you need?
list_of_arrays = []
for x in range(0,longnumber):
    a = generatenewarray()  # placeholder for whatever builds the next array
    list_of_arrays.append(a)
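It's not pretty, but this will work. You turn both numpy arrays into lists, append the new list to the list of lists, and finally convert the result into a new numpy array (dtype=object is needed because the rows have different lengths):
np.array(array.tolist() + [newArray.tolist()], dtype=object)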
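A runnable sketch of the append-to-a-list approach. generate_new_array is a hypothetical stand-in for whatever produces each array; because the rows have different lengths, the result is probably best kept as a Python list, with an object array built at the end only if an array is really needed.

```python
import numpy as np

def generate_new_array(x):
    # Hypothetical generator: an array whose length depends on x.
    return np.arange(1, x + 2)

longnumber = 4
list_of_arrays = []
for x in range(longnumber):
    # Appending to a list is cheap; growing a NumPy array copies it each time.
    list_of_arrays.append(generate_new_array(x))

# Optional: wrap in a 1D object array, one ragged row per element.
as_object_array = np.empty(len(list_of_arrays), dtype=object)
for i, row in enumerate(list_of_arrays):
    as_object_array[i] = row
print(as_object_array.shape)   # (4,)
```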
