I'm implementing the Nearest Centroid Classification algorithm and I'm kind of blocked on how to use numpy.mean in my case.
So suppose I have a dataset X made of spherical clusters:
[[ 0.39151059 3.48203037]
[-0.68677876 1.45377717]
[ 2.30803493 4.19341503]
[ 0.50395297 2.87076658]
[ 0.06677012 3.23265678]
[-0.24135103 3.78044279]
[-0.05660036 2.37695381]
[ 0.74210998 -3.2654815 ]
[ 0.05815341 -2.41905942]
[ 0.72126958 -1.71081388]
[ 1.03581142 -4.09666955]
[ 0.23209714 -1.86675298]
[-0.49136284 -1.55736028]
[ 0.00654881 -2.22505305]]
and the labeled vector Y:
[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
(The scatter plot of an example with 100 2D data points is omitted here.)
The NCC algorithm consists of first calculating the class mean of each class (0 and 1: that's blue and red) and then finding the nearest class centroid for each new data point.
This is my current function:
def mean_ncc(X, Y):
    # find unique classes
    cids = np.unique(Y)  # [0. 1.]
    # compute class means
    mu = np.zeros((len(cids), X.shape[1]))  # [[0. 0.] [0. 0.]] when Y has two unique labels (0 and 1)
    for class_idx, class_label in enumerate(cids):
        mu[class_idx, :] =  # problem here
    return mu
So here I want an array containing the class means of the '0' (blue) points and the '1' (red) points.
How can I select just the elements of X over which the mean should be calculated?
I would like to do something like this:
for class_idx, class_label in enumerate(cids):
    mu[class_idx, :] = np.mean(X[only the elements that have the same class_label], axis=0)
Is it possible or is there another way to implement this?
You could use something like this:
import numpy as np
tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]
tags_np = np.array(tags)
values_np = np.array(values)
print(values_np[tags_np == 1].mean())
EDIT: You will surely need to look more into the axis parameter for the mean function:
import numpy as np
values = [[5, 4],
[5, 4],
[4, 3],
[4, 3]]
values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])
print(values_np[tags_np == 0].mean(axis=0))
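Applied to the question's mean_ncc, the missing line is just that boolean mask; a minimal sketch combining the two:
import numpy as np

def mean_ncc(X, Y):
    # find the unique class labels
    cids = np.unique(Y)
    # one row of class means per label
    mu = np.zeros((len(cids), X.shape[1]))
    for class_idx, class_label in enumerate(cids):
        # the boolean mask keeps only the rows of X with this label
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu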
Sorry if this post is a duplicate, I couldn't find an answer... I have the following code:
import numpy as np
V = np.array([[6, 10, 0],
[2, 5, 0],
[0, 0, 0]])
subarr = np.array([[V[0][0], V[0][1]], [V[1][0], V[1][1]]])
det = np.linalg.det(subarr)
cross = np.cross(V[0], V[1])
print(f"Det: {det}")
print(f"Cross: {cross}")
I would expect det to return 10.0 and cross to return [0, 0, 10] in this case, the last number being equal to the det. However, Python returns
Det: 10.000000000000002
Cross: [ 0 0 10]
Can someone please explain why?
What you're seeing is floating-point inaccuracy.
And in case you're wondering how you end up with floats when finding the determinant of a matrix made up of integers (where the usual calculation method is just 6*5 - 2*10 = 10), np.linalg.det uses LU decomposition to find the determinant. This isn't very efficient for 2x2 matrices, but is much more efficient when you have bigger matrices.
For your 2x2, you get:
scipy.linalg.lu(A, 1)
Out:
(array([[ 1.        ,  0.        ],
        [ 0.33333333,  1.        ]]),
 array([[ 6.        , 10.        ],
        [ 0.        ,  1.66666667]]))
The determinant is just the product of U's diagonal: 6.0 * 1.66666667 at display precision, which in full double precision works out to 10.000000000000002 rather than exactly 10, due to floating-point rounding.
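To reproduce this yourself, here is a minimal sketch (assuming SciPy is installed) that recovers the determinant from the LU factors:
import numpy as np
from scipy.linalg import lu

A = np.array([[6, 10],
              [2, 5]])
# permute_l=True folds the permutation into L, returning (PL, U)
pl, u = lu(A, permute_l=True)
# PL has a unit diagonal here, so det(A) is the product of U's diagonal
print(np.prod(np.diag(u)))  # slightly off from the exact 10
print(np.linalg.det(A))     # 10.000000000000002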
What is the most idiomatic way to produce a cumulative sum which "fades" out as it moves along? Let me explain with an example.
>>> np.array([1,0,-1,0,0]).cumsum()
array([1, 1, 0, 0, 0], dtype=int32)
But I would like to provide a factor <1 and produce something like:
>>> np.array([1,0,-1,0,0]).cumsum_with_factor(0.5)
array([1.0, 0.5, -0.75, -0.375, -0.1875], dtype=float64)
It's a big plus if it's fast!
Your result can be obtained by linear convolution:
signal = np.array([1,0,-1,0,0])
kernel = 0.5**np.arange(5)
np.convolve(signal, kernel, mode='full')
# array([ 1.    ,  0.5   , -0.75  , -0.375 , -0.1875, -0.125 , -0.0625,
#         0.    ,  0.    ])
Keep the first len(signal) entries to match the desired output (the kernel must be at least as long as the signal for these to be exact); mode='full' also returns the fading tail. If performance is a consideration, use scipy.signal.fftconvolve, which is a faster implementation of the same logic.
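Since the fade is just the first-order recurrence y[n] = x[n] + f*y[n-1], another option (a sketch assuming SciPy) is scipy.signal.lfilter, which is exact for any signal length:
import numpy as np
from scipy.signal import lfilter

signal = np.array([1, 0, -1, 0, 0], dtype=float)
factor = 0.5
# y[n] = signal[n] + factor * y[n-1], i.e. a first-order IIR filter
result = lfilter([1.0], [1.0, -factor], signal)
print(result)  # [ 1.      0.5    -0.75   -0.375  -0.1875]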
Hi, I have to increase the number of points inside a vector in order to enlarge it to a fixed size. For example:
for this simple vector
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> len(a)
# 6
Now I want to get a vector of size 11, taking the vector a as base; the result will be
# array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
EDIT 1
What I need is a function that takes the base vector and the number of values the resulting vector must have, and returns a new vector with size equal to that parameter. Something like
def enlargeVector(vector, size):
.....
return newVector
to use like:
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> b = enlargeVector(a, 200)
>>> len(b)
# 200
and b contains the result of linear, cubic, or whatever interpolation method
There are many methods to do this within scipy.interpolate. My favourite is UnivariateSpline, which produces a degree-k spline guaranteed to be (k-1)-times continuously differentiable.
To use it:
import numpy as np
from scipy.interpolate import UnivariateSpline

a = np.array([0, 1, 2, 3, 4, 5])
old_indices = np.arange(0, len(a))
new_length = 11
new_indices = np.linspace(0, len(a) - 1, new_length)
spl = UnivariateSpline(old_indices, a, k=3, s=0)
new_array = spl(new_indices)
The s is a smoothing factor that you should set to 0 in this case (since the data are exact).
Note that for the problem you have specified (since a just increases monotonically by 1), this is overkill: the np.linspace call alone already gives the desired output.
EDIT: clarified that the length is arbitrary
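Wrapped into the function signature the question asks for, here is a minimal sketch using np.interp for the linear case (substitute the spline above for cubic):
import numpy as np

def enlargeVector(vector, size):
    # map the new sample positions onto the old index range
    old_indices = np.arange(len(vector))
    new_indices = np.linspace(0, len(vector) - 1, size)
    # linear interpolation between the original points
    return np.interp(new_indices, old_indices, vector)

a = np.array([0, 1, 2, 3, 4, 5])
b = enlargeVector(a, 200)
print(len(b))  # 200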
As AGML pointed out, there are tools to do this, but how about a pure NumPy solution:
In [20]: a = np.arange(6)
In [21]: temp = np.dstack((a[:-1], a[:-1] + np.diff(a) / 2.0)).ravel()
In [22]: temp
Out[22]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
In [23]: np.hstack((temp, [a[-1]]))
Out[23]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
I am looking to implement an Ant Colony Optimization algorithm in Python, though am new to both Python and Object Oriented Programming so the learning curve has been rather steep. At this point, I am stuck as to how to address the following situation:
As ants walk around a 2D grid, they will encounter obstacles, pheromone deposits by other ants, food, etc. What data structure do I use to represent this 2D world and the aforementioned properties of each cell?
I had tried a 2D array, thinking that array[x-coord][y-coord] could point to a {} (dictionary) with the appropriate properties (Obstacle: 'Yes / 'No', Pheromone Level: X %, etc.). Unfortunately, though NumPy lets me create a 2D array, I cannot assign dictionary objects to the various coordinates.
from numpy import *
myArray = array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12]])
myArray[2][2]={}
Returns:
Traceback (most recent call last):
File "/Users/amormachine/Desktop/PythonTest.py", line 7, in <module>
myArray[2][2]={}
TypeError: long() argument must be a string or a number, not 'dict'
[Finished in 0.6s with exit code 1]
I am not committed to either dictionaries or this paradigm for implementing this project and would certainly appreciate the wisdom of the group.
Sure you can, you just can't if your dtype is int... so make your array with dtype=object and you can use objects:
In [43]: a = [[{},{},{}],[{},{},{}]]
In [44]: a = numpy.array(a)
In [45]: a[1][1] = {'hello':'world','something':5}
In [46]: a
Out[46]:
array([[{}, {}, {}],
[{}, {'hello': 'world', 'something': 5}, {}]], dtype=object)
Although I'm not sure what you will gain by using numpy with objects; you may be better off just leaving it as a list of lists, as sketched below.
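For instance, a plain list-of-lists version (a sketch with hypothetical property names) needs no NumPy at all:
# each cell is its own dict of properties
grid = [[{'obstacle': False, 'pheromone': 0.0} for _ in range(4)]
        for _ in range(3)]
grid[2][2]['pheromone'] = 0.5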
In plain Python I would be going for the list-of-dicts approach but with NumPy I find it more natural to work with separate arrays for different attributes rather than trying to keep things in one structure.
import numpy as np
grid_shape = (120,80)
# example of random initialization with this grid shape
pheremone_level = np.random.rand(*grid_shape)
obstacle = np.random.rand(*grid_shape) > 0.8
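With separate arrays, typical per-cell updates become whole-array operations; a sketch with hypothetical evaporation and deposit values:
# evaporate a fraction of the pheromone everywhere (rate is hypothetical)
pheremone_level *= 0.95

# an ant at (row, col) deposits pheromone unless the cell is an obstacle
row, col = 10, 20
if not obstacle[row, col]:
    pheremone_level[row, col] += 1.0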
As @bitwise says, it entirely depends on what operations you want to perform. Generally the "correct" way in NumPy will be much closer to how you would write it in Matlab than in non-NumPy Python. Unfortunately I'm not familiar with how Ant Colony Optimization works, so I can't say what's more suitable.
I was looking for something related to structured 2D grids and Google led me to this page.
Although my solution is not entirely related to the grids asked about in the question, I didn't want to open a duplicate question for a 'structured 2D grid' data structure, so I'm posting my solution here. I hope it will be useful to the audience searching for a structured 2D grid who are redirected here by search engines.
Note: the method returns only the cell vertices and the vertex connectivity of each cell. Other quantities like cell volume, cell centroid, circumcircle, incircle, etc., as needed for the application, can easily be generated by adding additional routines.
import numpy as np
import matplotlib.pyplot as plt
def create_structured_grid(corner1=None, corner2=None, nx=5, ny=5, plt_=True, annotate=True):
"""
creates a structured grid of rectangular lattice
input:
------
corner1 : [x_start, y_start]
corner2 : [x_end, y_end]
nx : numpts in x
ny : numpts in y
plt_ : boolean whether to plot or not
annotate: whether to annotate the grid points or not
output:
-------
vertex_array : numpy.array((numpts, dim),dtype=float) of vertices
connectivity : numpy.array((num_cells, 2**dim), dtyp=int) of
vertex connectivity for each cell
plots : additionally plots if boolean values are true
"""
#corner1 = np.array([0.0, 0.0])
#corner2 = np.array([1.0, 1.0])
dim = len(corner1) #currently only for 2D,
x_pts = np.linspace(corner1[0], corner2[0], nx)
y_pts = np.linspace(corner1[1], corner2[1], ny)
Xv, Yv = np.meshgrid(x_pts, y_pts)
numpts = nx*ny
vertex_array = np.zeros((numpts, 2), dtype=float)
vertex_array[:,0] = np.reshape(Xv, numpts)
vertex_array[:,1] = np.reshape(Yv, numpts)
    num_cells = (nx - 1) * (ny - 1)
connectivity = np.zeros((num_cells, int(2**dim)), dtype=int)
rows = ny-1
cols = nx-1
for row in range(rows):
for col in range(cols):
num = nx*row + col
connectivity[cols*row + col] = [num+0, num+1, num+nx, num+nx+1]
if plt_:
X,Y = vertex_array.T
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal')
plt.scatter(X,Y, marker='o', s=50, color='g', alpha=1.0)
        plt.plot(Xv, Yv, linewidth=2, color='k')      # grid lines in one direction
        plt.plot(Xv.T, Yv.T, linewidth=2, color='k')  # grid lines in the other direction
if annotate:
for idx, cc in enumerate(vertex_array):
plt.text(cc[0], cc[1], str(idx), color='k', verticalalignment='bottom', horizontalalignment='right', fontsize='medium')
plt.show(block=False)
return vertex_array, connectivity
A call to function can be like this:
c1 = np.array([0.0, 0.0])
c2 = np.array([1.0, 1.0])
vertices, connectivity = create_structured_grid(corner1=c1, corner2=c2, nx=4, ny=4)
vertices = array([[ 0. , 0. ],
[ 0.33333333, 0. ],
[ 0.66666667, 0. ],
[ 1. , 0. ],
[ 0. , 0.33333333],
[ 0.33333333, 0.33333333],
[ 0.66666667, 0.33333333],
[ 1. , 0.33333333],
[ 0. , 0.66666667],
[ 0.33333333, 0.66666667],
[ 0.66666667, 0.66666667],
[ 1. , 0.66666667],
[ 0. , 1. ],
[ 0.33333333, 1. ],
[ 0.66666667, 1. ],
[ 1. , 1. ]])
connectivity = array([[ 0,  1,  4,  5],
                      [ 1,  2,  5,  6],
                      [ 2,  3,  6,  7],
                      [ 4,  5,  8,  9],
                      [ 5,  6,  9, 10],
                      [ 6,  7, 10, 11],
                      [ 8,  9, 12, 13],
                      [ 9, 10, 13, 14],
                      [10, 11, 14, 15]])