I need to use the math library to do some calculations on an array.
I tried something like this:
import numpy as np
import math
a = np.array([0, 1, 2, 3])
a1 = np.vectorize(a)
print("sin(a) = \n", math.sin(a1))
Unfortunately, it does not work. An error occurs: "TypeError: must be real number, not vectorize".
How can I use the vectorize function to do this kind of calculation?
The whole point of numpy is that you don't need any math method or any list comprehension:
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a + 1
array([1, 2, 3, 4])
>>> np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001])
>>> a ** 2
array([0, 1, 4, 9])
>>> np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692])
You can use a as if it were a scalar and you get the corresponding array.
If you really need to use math.sin (hint: you don't), you can vectorize it (the function itself, not the array):
>>> vsin = np.vectorize(math.sin)
>>> vsin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001])
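As a quick check of the point above, here is a minimal sketch that runs the three approaches on the same array and compares the results. The variable names (native, wrapped, looped) are only illustrative. Keep in mind that np.vectorize is essentially a Python-level loop, so it is a convenience rather than a speedup.

```python
import math
import numpy as np

a = np.array([0, 1, 2, 3])

# Native ufunc: operates on the whole array at once.
native = np.sin(a)

# math.sin wrapped with np.vectorize: the function is vectorized, not the array.
vsin = np.vectorize(math.sin)
wrapped = vsin(a)

# Plain list comprehension, calling math.sin one element at a time.
looped = np.array([math.sin(x) for x in a])

print(np.allclose(native, wrapped), np.allclose(native, looped))
```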
import numpy as np
import math
a = np.array([0, 1, 2, 3])
print("sin(a) = \n", [math.sin(x) for x in a])
math.sin requires one real number at a time.
I'd like to slice a numpy array parametrically inside a function so that I get the array elements I need for my computation. I know how to slice an array by index, but I am more interested in slicing driven by a parameter, so that I don't have to hard-code the index. In my case, I have a coefficient array c, a power array p, and a parameter num_order. Basically, num_order decides where the arrays are sliced. Here is my attempt:
import numpy as np
c=[1,1/2, -1/6, 1/12]
p= [1,2,3,4]
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)  # keep each term
    return output
So, if num_order=2, I also slice the first two terms of c and p with c_new = c[:-2], p_new = p[:-2], giving c_new = [1, 1/2], p_new = [1, 2], and so on. Is there a better way to slice elements from two or more arrays based on the parameter num_order? Can anyone point me to an elegant way to do this in a parameterized function?
Instead of writing c_new = c[:-1], p_new = p[:-1] if num_order=3, and c_new = c[:-2], p_new = p[:-2] if num_order=2, and so on, is there a more elegant, parametric way to do this efficiently in a Python function? Thanks!
I'm not sure if this is the output you want (if you could please update your question to include the expected output that would be helpful):
import numpy as np
c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow_numpy(x, num_order):
    # Slice the first num_order coefficients and powers, then broadcast against x.
    return c[:num_order, None] * np.power(x[None], p[:num_order, None])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)  # collect each term; without this the list stays empty
    return np.asarray(output)
for num_order in range(1, len(p)):
    assert np.array_equal(arr_pow(x, num_order), arr_pow_numpy(x, num_order)), f"{num_order}"
The idea here is to use NumPy broadcasting plus NumPy slicing to achieve the result you want without for loops and in a parametric way.
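To make the broadcasting step more concrete, here is a small sketch that spells out the intermediate shapes for num_order = 2. The names c_col, p_col and terms are introduced only for illustration.

```python
import numpy as np

c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])

num_order = 2

# c[:num_order] and p[:num_order] keep the first num_order terms.
# Indexing with None adds an axis, giving column vectors of shape (num_order, 1).
c_col = c[:num_order, None]
p_col = p[:num_order, None]

# x[None] has shape (1, 8); broadcasting against (num_order, 1) yields (num_order, 8).
terms = c_col * np.power(x[None], p_col)

print(c_col.shape, x[None].shape, terms.shape)  # (2, 1) (1, 8) (2, 8)
```

Row i of terms holds c[i] * x ** p[i], so changing num_order changes only how many rows are produced.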
Use the following:
num_order = 2
np.array([c[i] * np.power(x, p[i]) for i in range(num_order)])
# Out:
# array([[ 1. , 1. , 2. , 3. , 5. , 8. , 13. , 21. ],
# [ 0.5, 0.5, 2. , 4.5, 12.5, 32. , 84.5, 220.5]])
I would like to calculate the geometric mean of some data that include NaNs. How can I do it?
I know how to calculate the arithmetic mean while ignoring NaNs with the following code:
import numpy as np
M = np.nanmean(data, axis=2)
So how can I do the same for the geometric mean?
You could use the identity (I only found it in the German Wikipedia, but there are probably other sources as well):
$$\bar{x}_{\mathrm{geom}} = a^{\frac{1}{n} \sum_{i=1}^{n} \log_a x_i}$$
This identity follows from applying the logarithm rules to the usual definition of the geometric mean:
$$\bar{x}_{\mathrm{geom}} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
The base a can be chosen arbitrarily, so you could use np.log (with np.exp as the inverse operation):
import numpy as np
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    inverse_valids = 1. / np.sum(~np.isnan(arr), axis=axis)  # could be a problem for all-nan-axis
    rhs = inverse_valids * np.nansum(np.log(arr), axis=axis)
    return np.exp(rhs)
And it seems to work:
>>> l = [[1, 2, 3], [1, np.nan, 3], [np.nan, 2, np.nan]]
>>> nangmean(l)
>>> nangmean(l, axis=1)
array([ 1.81712059, 1.73205081, 2. ])
>>> nangmean(l, axis=0)
array([ 1., 2., 3.])
np.nanprod was added in NumPy 1.10, so you could also use the usual definition directly:
import numpy as np
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    valids = np.sum(~np.isnan(arr), axis=axis)
    prod = np.nanprod(arr, axis=axis)
    return np.power(prod, 1. / valids)
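The two variants should agree on ordinary data, but they can behave differently numerically. The sketch below is only an illustration under assumptions: the names gmean_log and gmean_prod are made up here to keep both versions side by side, and the array big is an invented input chosen so that the raw product overflows float64 while the sum of logs does not.

```python
import numpy as np

def gmean_log(arr, axis=None):
    # Log-based form: exp of the mean of the logs, ignoring NaNs.
    arr = np.asarray(arr, dtype=float)
    valids = np.sum(~np.isnan(arr), axis=axis)
    return np.exp(np.nansum(np.log(arr), axis=axis) / valids)

def gmean_prod(arr, axis=None):
    # Product-based form: n-th root of the product, ignoring NaNs.
    arr = np.asarray(arr, dtype=float)
    valids = np.sum(~np.isnan(arr), axis=axis)
    return np.power(np.nanprod(arr, axis=axis), 1.0 / valids)

data = [[1, 2, 3], [1, np.nan, 3], [np.nan, 2, np.nan]]
print(np.allclose(gmean_log(data, axis=1), gmean_prod(data, axis=1)))

# Assumed illustrative input: 400 values of 1e10 overflow the product.
big = np.full(400, 1e10)
with np.errstate(over="ignore"):
    print(gmean_prod(big))  # inf, because the product exceeds the float64 range
print(gmean_log(big))       # approximately 1e10
```

For long axes or large values, the log-based form is therefore the safer choice of the two.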
In Python, I have the following problem, made into a toy example:
import random
import numpy as np
x_arr = np.array([], dtype = object)
for x in range(5):
    y_arr = np.array([], dtype=object)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr,y)
    if random.random() < 0.9:
        x_arr = np.append(x_arr, y_arr)
#This results in
>>> x_arr
array([4, 0, 1, 2, 4, 0, 3, 4], dtype=object)
I would like to have
array([array([4]), array([0, 1, 2, 4]), array([0, 3, 4])], dtype=object)
So apparently, in this run the array y_arr is written into x_arr 3 out of 5 times (the count varies), with lengths 1, 4, and 3 (which also vary).
append() puts the results in one long 1D structure, whereas I would like to keep it 2D. Also, considering the example, it might be that no numbers get written at all (if you are 'unlucky' with the random numbers). So I have an array of arrays whose number is unknown a priori, each with an a priori unknown number of elements. How would I approach this in Python, other than finding an upper bound on both and storing a lot of zeros?
You could do it in a two-step process: first append a placeholder element, then assign the array to that element. This circumvents the automatic flattening that happens in np.append() when axis=None (the default), as described in the np.append documentation.
import random
import numpy as np
x_arr = np.array([], dtype = object).reshape((1,0))
for x in range(5):
    y_arr = np.array([], dtype=np.int32)
    for y in range(5):
        r = random.random()
        if r < 0.5:
            y_arr = np.append(y_arr,y)
    if random.random() < 0.9:
        x_arr = np.append(x_arr, 0)  # step 1: append a placeholder
        x_arr[-1] = y_arr            # step 2: store the array in that slot
print(type(x_arr))
print(x_arr)
This gives:
<type 'numpy.ndarray'>
[array([0, 1, 2]) array([0, 1, 2, 3]) array([0, 1, 4]) array([0, 1, 3, 4])
array([2, 3])]
Also, why not use a Python list for x_arr (or y_arr)? Nested numpy arrays are not really useful when they do not form a regular n-dimensional array.
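A minimal sketch of the list-based suggestion, assuming the same toy setup as in the question. The name x_list and the fixed seed are additions made here only so that the example is repeatable.

```python
import random
import numpy as np

random.seed(0)  # fixed seed only so the sketch is repeatable

x_list = []  # outer container: a plain Python list
for x in range(5):
    # Collect the selected y values in a list, then convert once.
    y_vals = [y for y in range(5) if random.random() < 0.5]
    if random.random() < 0.9:
        x_list.append(np.array(y_vals, dtype=int))

# Each entry keeps its own length; nothing is flattened or zero-padded.
for row in x_list:
    print(row)
```

Appending to a list is also cheaper than calling np.append repeatedly, since np.append copies the whole array on every call.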
I have a numpy matrix X and I would like to add to this matrix as new variables all the possible products between 2 columns.
So if X=(x1,x2,x3) I want X=(x1,x2,x3,x1x2,x2x3,x1x3)
Is there an elegant way to do that?
I think a combination of numpy and itertools should work
Very good answers, but do they take into account that X is a matrix, so that x1, x2, x3 can themselves be arrays?
A Real example
Itertools should be the answer here.
import itertools
a = [1, 2, 3]
p = (x * y for x, y in itertools.combinations(a, 2))
print(list(itertools.chain(a, p)))
[1, 2, 3, 2, 3, 6] # 1, 2, 3, 2 x 1, 3 x 1, 3 x 2
I think Samy's solution is pretty good. If you need to use numpy, you could transform it a little like this:
from itertools import combinations
from numpy import prod
x = [1, 2, 3]
print(x + list(map(prod, combinations(x, 2))))
Gives the same output as Samy's solution:
[1, 2, 3, 2, 3, 6]
If your arrays are small, then Samy's pure-Python solution using itertools.combinations should be fine:
from itertools import combinations, chain
def all_products1(a):
    p = (x * y for x, y in combinations(a, 2))
    return list(chain(a, p))
But if your arrays are large, then you'll get a substantial speedup by fully vectorizing the computation, using numpy.triu_indices, like this:
import numpy as np
def all_products2(a):
    x, y = np.triu_indices(len(a), 1)
    return np.r_[a, a[x] * a[y]]
Let's compare these:
>>> from timeit import timeit
>>> data = np.random.uniform(0, 100, (10000,))
>>> timeit(lambda:all_products1(data), number=1)
>>> timeit(lambda:all_products2(data), number=1)
The solution using numpy.triu_indices also works for multi-dimensional data:
>>> np.random.uniform(0, 100, (3,2))
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808]])
>>> all_products2(_)
array([[ 63.75071196, 15.19461254],
[ 94.33972762, 50.76916376],
[ 88.24056878, 90.36136808],
[ 6014.22480172, 771.41777239],
[ 5625.39908354, 1373.00597677],
[ 8324.59122432, 4587.57109368]])
If you want to operate on columns rather than rows, use:
def all_products3(a):
    x, y = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:,x] * a[:,y]]
For example:
>>> np.random.uniform(0, 100, (2,3))
array([[ 33.0062385 , 28.17575024, 20.42504351],
[ 40.84235995, 61.12417428, 58.74835028]])
>>> all_products3(_)
array([[ 33.0062385 , 28.17575024, 20.42504351, 929.97553238,
674.15385734, 575.4909246 ],
[ 40.84235995, 61.12417428, 58.74835028, 2496.45552756,
2399.42126888, 3590.94440122]])
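The column-wise version is easier to verify by hand on small integers. The sketch below uses a made-up 2x3 matrix X with columns x1, x2, x3. Note that the appended columns come out in the order x1x2, x1x3, x2x3, which differs slightly from the order written in the question.

```python
import numpy as np

def all_products3(a):
    # Index pairs (i, j) with i < j over the columns.
    i, j = np.triu_indices(a.shape[1], 1)
    return np.c_[a, a[:, i] * a[:, j]]

# Hypothetical 2x3 matrix with columns x1, x2, x3.
X = np.array([[1, 2, 3],
              [4, 5, 6]])

result = all_products3(X)
print(result.tolist())  # [[1, 2, 3, 2, 3, 6], [4, 5, 6, 20, 24, 30]]
```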
I have a numpy array X of size N, filled with 0s and 1s.
I generate a sample S of size M.
I want to flip the elements of X at each position given by the sample S.
Is this possible without loops, using some atomic operation from the numpy mask module?
I want to avoid any type of loop like
for i in sample:
    X[i] = 1-X[i]
and replace it with a single call in pylab.
Is that possible?
Use X[sample] = 1 - X[sample].
For example:
>>> import numpy as np
>>> X = np.array([1, 1, 0, 1, 1])
>>> sample = [1,2,3]
>>> X[sample]
array([1, 0, 1])
>>> X[sample] = 1 - X[sample]
>>> X
array([1, 0, 1, 0, 1])
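One caveat, sketched below under the assumption that the sample may contain repeated indices (the question does not say whether it can): the fancy-indexed assignment flips each listed position once, whereas the original loop flips a position once per occurrence. If the loop's behaviour is what you need, the unbuffered ufunc method np.bitwise_xor.at is one way to reproduce it on an integer array.

```python
import numpy as np

X = np.array([1, 1, 0, 1, 1])
sample = [1, 2, 2]  # hypothetical sample in which index 2 appears twice

# Fancy-indexed assignment: each listed position is flipped once, duplicates included.
A = X.copy()
A[sample] = 1 - A[sample]

# Unbuffered XOR: applies the flip once per occurrence, like the explicit loop.
B = X.copy()
np.bitwise_xor.at(B, sample, 1)

print(A.tolist())  # [1, 0, 1, 1, 1]
print(B.tolist())  # [1, 0, 0, 1, 1]
```

If the sample is drawn without replacement, the two approaches give the same result and the simpler assignment is enough.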