I have a float numpy array x, which contains values like 0, .5, 1, 1.5, etc. I want to convert the float values into integers based on some equation and store them in a new array, newx. I did this:
newx = np.zeros(x.shape[0])
for i in range(x.shape[0]):
    newx[i] = (2 * x[i]) + 1
print(newx)
However, when printing newx, I get values like
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18.])
newx must be used in some later process, and there it must be an integer array; when I try to use it in that process, I get an error stating that it must be of integer or Boolean type. Can anyone please tell me what mistake I've made?
Thank You.
NumPy is specifically designed for array manipulation, so try not to iterate over a numpy array the way you did; element-by-element Python loops lead to much higher run times. It is also worth reading about how numpy datatypes differ a little from the built-in datatypes.
Anyway, here is working code for your problem:
newx = x * 2 + 1
newx = np.int16(newx)  # as easy as this ;)
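If the downstream process needs a particular integer width, astype is the more general spelling. A minimal sketch, assuming x is the float array from the question and numpy is imported as np:
import numpy as np

x = np.array([0., .5, 1., 1.5])       # example input like the question's
newx = (2 * x + 1).astype(np.int64)   # vectorized; no Python loop needed
print(newx)                           # [1 2 3 4]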
I'm trying to learn to write test functions for unit testing, and I've written this code:
from scipy.signal import argrelextrema
import numpy as np

def getMax(arr):
    '''
    Return the location of the maxima in an array

    Parameters
    ----------
    arr : array_like
        A 1D array

    Returns
    -------
    max : tuple containing one single array
        Indices of elements corresponding to the maxima
    '''
    return argrelextrema(np.array(arr), np.greater)
And then I've made a new file where I import the original and write a test function with assert, like this:
import bjworking as bj

def testgetmax():
    assert(bj.getMax([1., 2., 3., 2., 3., 4., 5., 6., 5., 4.]) == [2, 7])
    return

print(testgetmax())
And I get an AssertionError when I don't think I should. If I just do this:
import bjworking as bj

def testgetmax():
    test = bj.getMax([1., 2., 3., 2., 3., 4., 5., 6., 5., 4.])
    return test

print(testgetmax())
I get
(array([2, 7], dtype=int64),)
So what's wrong with my assertion test? By the way, please tell me if there are cleaner and/or better ways of doing unit tests than what I just did! (Also, is there a way to figure out what causes the assertion error and what the compared value actually is?)
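For what it's worth, a minimal sketch of an assertion that accounts for argrelextrema returning a tuple of arrays (assuming the same bjworking module as above):
import numpy as np
import bjworking as bj

def testgetmax():
    result = bj.getMax([1., 2., 3., 2., 3., 4., 5., 6., 5., 4.])
    # argrelextrema returns a tuple with one index array per axis,
    # so compare the first array rather than the tuple itself
    assert np.array_equal(result[0], [2, 7])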
Let A = np.ones([3,3,5]).
Is there any linear algebra operation which will return
array([ 9., 9., 9., 9., 9.]) without any looping?
A.sum(axis=(0, 1))
Call the standard sum routine with a tuple of axes to sum over.
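A quick demonstration:
import numpy as np

A = np.ones([3, 3, 5])
print(A.sum(axis=(0, 1)))   # [9. 9. 9. 9. 9.] -- each of the 5 slices sums 3*3 ones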
I have a numpy.array with dimension dim_array. I'm looking to obtain a median filter like scipy.signal.medfilt(data, window_len).
This in fact doesn't work with my numpy.array, maybe because its shape is (dim_array, 1) and not (dim_array,).
How do I obtain such a filter?
Next, another question: how can I obtain other filters, i.e., min, max, and mean?
Based on this post, we could create sliding windows to get a 2D array with those windows set as its rows. The windows would merely be views into the data array, so there is no extra memory consumption, which makes this pretty efficient. Then, we would simply apply the reducing functions along each row, i.e. with axis=1.
Thus, for example, the sliding median could be computed like so:
np.median(strided_app(data, window_len, 1), axis=1)
For the other filters, just use the respective function names there: np.min, np.max, and np.mean. Please note this is meant as a generic solution covering any reduction that works along an axis.
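On NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view builds the same zero-copy windows without a helper function; a sketch under that assumption:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.array([2., 8., 0., 4., 1., 9., 9., 0.])
windows = sliding_window_view(data, 3)   # rows are views into data, no copies
print(np.median(windows, axis=1))        # [2. 4. 1. 4. 9. 9.]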
For the best performance, one should still look into the specific functions that are built for those purposes. For the four requested filters, we have these built-ins:
Median: scipy.signal.medfilt
Max: scipy.ndimage.maximum_filter1d
Min: scipy.ndimage.minimum_filter1d
Mean: scipy.ndimage.uniform_filter1d
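For instance, the 1D max filter can be called directly; note that these functions center the window and pad the edges (mode='reflect' by default), unlike the valid-only sliding windows above:
from scipy.ndimage import maximum_filter1d
import numpy as np

data = np.array([2., 8., 0., 4., 1., 9., 9., 0.])
print(maximum_filter1d(data, size=3))   # sliding max with a centered window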
The fact that applying a median filter with window size 1 does not change the array gives us the freedom to apply the median filter row-wise or column-wise.
For example, this code
from scipy.ndimage import median_filter
import numpy as np
arr = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
median_filter(arr, size=3, cval=0, mode='constant')
# with cval=0, mode='constant' the input array is extended with zeros
# when the window overlaps the edges, just for visibility and ease of calculation
outputs the expected array filtered with a (3, 3) window:
array([[0., 2., 0.],
[2., 5., 3.],
[0., 5., 0.]])
Because median_filter automatically extends the size to all dimensions, we can get the same effect with:
median_filter(arr, size=(3, 3), cval=0, mode='constant')
Now, we can also apply median_filter row-wise by setting the first element of size to 1:
median_filter(arr, size=(1, 3), cval=0, mode='constant')
Output:
array([[1., 2., 2.],
[4., 5., 5.],
[7., 8., 8.]])
And column-wise, with the same logic:
median_filter(arr, size=(3, 1), cval=0, mode='constant')
Output:
array([[1., 2., 3.],
[4., 5., 6.],
[4., 5., 6.]])
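The same size trick carries over to the other filters asked about; for example, a row-wise sliding minimum on the same arr (a sketch, with the same zero padding):
from scipy.ndimage import minimum_filter

minimum_filter(arr, size=(1, 3), cval=0, mode='constant')
Output:
array([[0., 1., 0.],
       [0., 4., 0.],
       [0., 7., 0.]])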
I want to read data from a file that has many missing values, as in this example:
1,2,3,4,5
6,,,7,8
,,9,10,11
I am using the numpy.loadtxt function:
data = numpy.loadtxt('test.data', delimiter=',')
The problem is that the missing values break loadtxt (I get a "ValueError: could not convert string to float:", no doubt because of the two or more consecutive delimiters).
Is there a way to do this automatically, with loadtxt or another function, or do I have to bite the bullet and parse each line manually?
I'd probably use genfromtxt:
>>> from numpy import genfromtxt
>>> genfromtxt("missing1.dat", delimiter=",")
array([[ 1., 2., 3., 4., 5.],
[ 6., nan, nan, 7., 8.],
[ nan, nan, 9., 10., 11.]])
and then do whatever you like with the nans (change them to something, use a mask instead, etc.). Some of this can be done inline:
>>> genfromtxt("missing1.dat", delimiter=",", filling_values=99)
array([[ 1., 2., 3., 4., 5.],
[ 6., 99., 99., 7., 8.],
[ 99., 99., 9., 10., 11.]])
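Another inline option, if a mask is preferable to a sentinel value, is usemask=True, which returns a masked array (a sketch, continuing with the same file):
>>> data = genfromtxt("missing1.dat", delimiter=",", usemask=True)
>>> data.mean(axis=0)   # masked cells are skipped: [3.5, 2.0, 6.0, 7.0, 8.0]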
The error occurs because the function expects to return a numpy array with all cells of the same type.
If you want a table with mixed strings and numbers, you should read it into a structured array instead. You probably also want to add skip_header=1 to skip the first line; i.e., in your case, something like:
np.genfromtxt('upeak_names.txt', delimiter="\t", dtype="S10,S10,f4,S10,f4,S10,f4",
              names=["id", "name", "Distance", "name2", "Distance2", "name3", "Distance3"],
              skip_header=1)
See also:
Documentation for genfromtxt:
https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.genfromtxt.html
Documentation for structured arrays in numpy:
https://docs.scipy.org/doc/numpy-1.15.0/user/basics.rec.html
In numpy, if you want to calculate the sine of each entry of a matrix (elementwise), then
a = numpy.arange(0, 27, 3).reshape(3, 3)
numpy.sin(a)
will get the job done! If you want each entry raised to a power, say 2, then
a**2
will do it.
But if you have a sparse matrix, things seem more difficult. At least I haven't figured out a way to do that besides iterating over each entry of a lil_matrix and operating on it.
I've found this question on SO and tried to adapt this answer, but I was not successful.
The goal is to calculate, elementwise, the square root (or the power 1/2) of a scipy.sparse matrix in CSR format.
What would you suggest?
The following trick works for any operation which maps zero to zero, and only for those operations, because it only touches the non-zero elements. I.e., it will work for sin and sqrt but not for cos.
Let X be some CSR matrix...
>>> from scipy.sparse import csr_matrix
>>> X = csr_matrix(np.arange(10).reshape(2, 5), dtype=float)
>>> X.A
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.]])
The non-zero elements' values are X.data:
>>> X.data
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.])
which you can update in-place:
>>> X.data[:] = np.sqrt(X.data)
>>> X.A
array([[ 0. , 1. , 1.41421356, 1.73205081, 2. ],
[ 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ]])
Update: in recent versions of SciPy, you can do things like X.sqrt(), where X is a sparse matrix, to get a new copy with the square roots of the elements of X.
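On an older SciPy, or if the original matrix must stay unchanged, the same .data trick works on a copy; a sketch in the style of the session above:
>>> Y = X.copy()                # leave X intact
>>> Y.data = np.sqrt(Y.data)    # same zero-preserving update, out of place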