I have a NumPy array:
import numpy as np
pval = np.array([[0., 0., 0., 0., 0., 0., 0., 0.],
                 [0., 0., 0., 0., 0., 0., 0., 0.]])
And a vectorized function:
def getnpx(age):
    return pval[0] + age
vgetnpx = np.frompyfunc(getnpx, 1, 1)
vgetnpx(1)
The output:
array([1., 1., 1., 1., 1., 1., 1., 1.])
However, if I want to pass pval in as an argument:
def getnpx(mt, age):
    return mt[0] + age
vgetnpx = np.frompyfunc(getnpx, 2, 1)
vgetnpx(pval,1)
I received an error:
TypeError: 'float' object is not subscriptable
What is the correct way to pass pval as an argument? Can anyone help?
I don't see why you are trying to use frompyfunc. That's for passing array arguments to a function that only takes scalar inputs.
In [97]: pval=np.array([[0., 0.,0., 0., 0.,0., 0., 0.],
...: [0., 0., 0., 0., 0.,0., 0., 0.]])
In the first case you use the global pval and just one age value. No need for frompyfunc:
In [98]: pval[0]+1
Out[98]: array([1., 1., 1., 1., 1., 1., 1., 1.])
And if you want to pass pval as argument, just do:
In [99]: def foo(mt,age):
    ...:     return mt[0]+age
    ...:
In [100]: foo(pval,1)
Out[100]: array([1., 1., 1., 1., 1., 1., 1., 1.])
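For completeness, here is why the frompyfunc version failed: frompyfunc broadcasts its inputs and calls the Python function once per scalar element, so mt arrives as a single float, not the whole array. A minimal sketch reproducing the error (with a zero-filled stand-in for pval):

```python
import numpy as np

pval = np.zeros((2, 8))

def getnpx(mt, age):
    # frompyfunc calls this once per broadcast element pair,
    # so mt is a plain float here, and a float has no [0]
    return mt[0] + age

vgetnpx = np.frompyfunc(getnpx, 2, 1)

try:
    vgetnpx(pval, 1)
except TypeError as e:
    print(e)  # 'float' object is not subscriptable
```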
You gave a link to an earlier question that I answered. The sticky point in that case was that your function returned an array that could vary in size. I showed how to use it with a list comprehension. I also showed how to tweak vectorize so it would be happy returning an object dtype result. Alternatively, use frompyfunc to return that object. In all those cases the function argument was a scalar, a single number.
If your goal is to add a different age to each row of pval, just do:
In [102]: pval + np.array([[1],[2]])
Out[102]:
array([[1., 1., 1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2., 2., 2.]])
I have a 2-dimensional array with np.shape(input)=(a,b) that looks like
input=array[array_1[0,0,0,1,0,1,2,0,3,3,2,...,entry_b],...array_a[1,0,0,1,2,2,0,3,1,3,3,...,entry_b]]
Now I want to create an array with np.shape(output)=(a,b,b) in which each entry is 1 where the two corresponding input entries have the same value, and 0 otherwise,
for example:
input=[[1,0,0,0,1,2]]
output=[array([[1., 0., 0., 0., 1., 0.],
[0., 1., 1., 1., 0., 0.],
[0., 1., 1., 1., 0., 0.],
[0., 1., 1., 1., 0., 0.],
[1., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1.]])]
My code so far is looking like:
def get_matrix(svdata, padding_size):
    List = []
    for k in svdata:
        matrix = np.zeros((padding_size, padding_size))
        for l in range(padding_size):
            for m in range(padding_size):
                if k[l] == k[m]:
                    matrix[l][m] = 1
        List.append(matrix)
    return List
But it takes 2:30 min for an input array of shape (2000,256). How can I make this more efficient using built-in NumPy operations?
res = input[:,:,None]==input[:,None,:]
This should give a boolean (a,b,b) array. Then
res = res.astype(int)
to get a 0/1 array
You're trying to create the array y where y[i,j,k] is 1 if input[i,j] == input[i, k]. At least that's what I think you're trying to do.
So y = input[:,:,None] == input[:,None,:] will give you a boolean array. You can then convert that to np.dtype('float64') using astype(...) if you want.
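A quick check of that broadcasting expression against the example from the question (input renamed to inp, since input shadows a Python builtin):

```python
import numpy as np

inp = np.array([[1, 0, 0, 0, 1, 2]])

# compare each element of a row with every other element of that row:
# (a, b, 1) == (a, 1, b) broadcasts to (a, b, b)
res = (inp[:, :, None] == inp[:, None, :]).astype(float)

print(res.shape)  # (1, 6, 6)
print(res[0])
```

res[0] matches the 6x6 output array shown in the question.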
I have a 2-D numpy array let's say like this:
matrix([[1., 0., 0., ..., 1., 0., 0.],
[1., 0., 0., ..., 0., 1., 1.],
[1., 0., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.]])
I want to transform it into a 3-D numpy array based on the values of a column of a dataframe. Let's say the column is like this:
df = pd.DataFrame({"Case":[1,1,2,2,3,4]})
The final 3-D array should look like this:
matrix([
[
[1., 0., 0., ..., 1., 0., 0.], [1., 0., 0., ..., 0., 1., 1.]
],
[
[1., 0., 0., ..., 1., 0., 0.], [1., 1., 0., ..., 1., 0., 0.]
],
[
[1., 1., 0., ..., 1., 0., 0.]
],
[
[1., 1., 0., ..., 1., 0., 0.]
]
])
The first 2 arrays of the initial 2-D array becomes a 2-D array of the final 3-D array because from the column of the dataframe the first and second rows both have the same values of '1'.
Similarly, the next 2 arrays become another 2-D array of 2 arrays because the next two values of the column of the dataframe are '2', so they belong together.
There is only one row for the values '3' and '4', so the next 2-D arrays of the 3-D array have only 1 array each.
So, basically if two or more numbers of the column of the dataframe are same, then those indices of rows of the 2-D initial matrix belong together and are transformed into a 2-D matrix and pushed as an element of the final 3-D matrix.
How do I do this?
Numpy doesn't have very good support for arrays with rows of different length, but you can make it a list of 2D arrays instead:
M = np.array(
[[1., 0., 0., ..., 1., 0., 0.],
[1., 0., 0., ..., 0., 1., 1.],
[1., 0., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.],
[1., 1., 0., ..., 1., 0., 0.]]
)
df = pd.DataFrame({"Case":[1,1,2,2,3,4]})
M_per_case = [
np.stack([M[index] for index in df.index[df['Case'] == case]])
for case in set(df['Case'])
]
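An equivalent sketch using pandas groupby, which iterates the cases in sorted order and collects each group's row indices in one pass (M here is a small stand-in matrix, since the one in the question is elided):

```python
import numpy as np
import pandas as pd

# small stand-in for the 6-row matrix in the question
M = np.array([[1., 0.],
              [1., 1.],
              [0., 0.],
              [0., 1.],
              [1., 0.],
              [1., 1.]])
df = pd.DataFrame({"Case": [1, 1, 2, 2, 3, 4]})

# slice M with the row indices of each Case group
M_per_case = [M[g.index.to_numpy()] for _, g in df.groupby("Case")]

print([a.shape for a in M_per_case])  # [(2, 2), (2, 2), (1, 2), (1, 2)]
```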
I am using h5py to build a dataset. Since I want to store arrays with a varying number of rows, I use the h5py special dtype vlen. However, I experience behavior I can't explain; maybe you can help me understand what is happening:
>>> import h5py
>>> import numpy as np
>>> fp = h5py.File(datasource_fname, mode='w')
>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>> test
Out[130]:
array([[ 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])
>>> train_targets[0] = test
>>> train_targets[0]
Out[138]:
array([ array([ 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.], dtype=float32),
array([ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.], dtype=float32),
array([ 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.], dtype=float32),
array([ 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32),
array([ 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)], dtype=object)
I do expect train_targets[0] to be of this shape, but I can't recognize the rows of my array in it. They seem to be totally jumbled up; however, it is consistent, by which I mean that every time I run the above code, train_targets[0] looks the same.
To clarify: the first element in my train_targets, in this case test, has shape (5,11), however the second element might be of shape (5,38) which is why I use vlen.
Thank you for your help
Mat
I think
train_targets[0] = test
has stored your (5,11) array as an F ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.
That's what you get back in train_targets[0] - an array of 5 arrays, each shape (11,), with values taken from test (order F).
So I think there are 2 issues - what a 2d shape means, and what vlen allows.
My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.
When I make a file following the string vlen example:
>>> f = h5py.File('foo.hdf5')
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
and then do:
ds[0]='this one string'
and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.
ds[0,0]='another'
is the correct way to set just one element.
vlen is 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shape (11,) and (38,) with vlen, but not 2d ones.
Actually, train_targets output is reproduced with:
In [54]: test1 = np.empty((5,), dtype=object)
In [55]: for i in range(5):
    ...:     test1[i] = test.T.flatten()[i:i+11]
It's 11 values taken from the transpose (F order), but shifted for each sub array.
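That reconstruction can be checked in plain NumPy with a stand-in for test whose values are all distinct (this only mimics what h5py appears to do; it is not h5py itself):

```python
import numpy as np

# stand-in for `test`: 5 rows of 11 distinct values
test = np.arange(55, dtype='float32').reshape(5, 11)

# read the buffer in Fortran (column-major) order, then take
# five overlapping length-11 windows, shifted by one each time
flat_F = test.T.flatten()
jumbled = [flat_F[i:i + 11] for i in range(5)]

# the first window starts with test's first column, not its first row
print(jumbled[0][:5])
```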
Just found some unexpected behaviour in Numpy 1.8.1 in the triu function.
import numpy as np
a = np.zeros((4, 4))
a[1:, 0] = np.inf
a
array([[  0.,   0.,   0.,   0.],
       [ inf,   0.,   0.,   0.],
       [ inf,   0.,   0.,   0.],
       [ inf,   0.,   0.,   0.]])
np.triu(a)
array([[  0.,   0.,   0.,   0.],
       [ nan,   0.,   0.,   0.],
       [ nan,   0.,   0.,   0.],
       [ nan,   0.,   0.,   0.]])
Would this behaviour ever be desirable? Or shall I file a bug report?
Edit
I raised an issue on the Numpy github page
1. Explanation
Looks like you ignored the RuntimeWarning:
>>> np.triu(a)
twodim_base.py:450: RuntimeWarning: invalid value encountered in multiply
out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
The source code for numpy.triu is as follows:
def triu(m, k=0):
    m = asanyarray(m)
    out = multiply((1 - tri(m.shape[0], m.shape[1], k - 1, dtype=m.dtype)), m)
    return out
This uses numpy.tri to get an array with ones below the diagonal and zeros above, and subtracts this from 1 to get an array with zeros below the diagonal and ones above:
>>> 1 - np.tri(4, 4, -1)
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
Then it multiplies this element-wise with the original array. So where the original array has inf, the result has inf * 0 which is NaN.
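The root of the NaN is IEEE 754 arithmetic, which NumPy follows: zero times infinity is undefined, so it evaluates to NaN:

```python
import numpy as np

with np.errstate(invalid='ignore'):  # silence the RuntimeWarning
    print(np.float64(0.0) * np.inf)  # nan
```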
2. Workaround
Use numpy.tril_indices to generate the indices of the lower triangle, and set all those entries to zero:
>>> a = np.ones((4, 4))
>>> a[1:, 0] = np.inf
>>> a
array([[ 1., 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.],
[ inf, 1., 1., 1.]])
>>> a[np.tril_indices(4, -1)] = 0
>>> a
array([[ 1., 1., 1., 1.],
[ 0., 1., 1., 1.],
[ 0., 0., 1., 1.],
[ 0., 0., 0., 1.]])
(Depending on what you are going to do with a, you might want to take a copy before zeroing these entries.)
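Another workaround that leaves a untouched: build the strict lower-triangle mask with np.tri and use np.where, which selects values instead of multiplying by a 0/1 array, so 0 * inf never occurs:

```python
import numpy as np

a = np.ones((4, 4))
a[1:, 0] = np.inf

# True strictly below the main diagonal
mask = np.tri(*a.shape, k=-1, dtype=bool)

# where the mask is True take 0.0, elsewhere keep a; no multiplication
upper = np.where(mask, 0.0, a)
print(upper)
```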