using indices with multiple values, how to get the smallest one - python

I have an index to choose elements from one array. But sometimes the index might have repeated entries... in that case I would like to choose the corresponding smaller value. Is it possible?
import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist
print(arr)
what I get:
[ 1. 0. 0. 1. 0. 3.]
what I would like to get:
[ 1. 0. 0. 1. 0. 1.]
addendum
Actually, I have a third array holding the (vector) values to be inserted. So the problem is to insert rows from values into arr at the positions given by index, as in the following. However, when several rows share the same index, I want to choose the one with the minimum dist.
index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values
print(arr)
I get:
[[ 0. 1.]
[ 0. 0.]
[ 0. 0.]
[ 2. 3.]
[ 0. 0.]
[ 6. 7.]]
I would like to get:
[[ 0. 1.]
[ 0. 0.]
[ 0. 0.]
[ 2. 3.]
[ 0. 0.]
[ 4. 5.]]

Use groupby in pandas:
import numpy as np
import pandas as pd
index = [0,3,5,5]
dist = [1,1,1,3]
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)

If index is sorted, then itertools.groupby could be used to group that list.
import itertools
np.array([(key, min(d for _, d in group))
          for key, group in itertools.groupby(zip(index, dist), lambda x: x[0])])
produces
array([[0, 1],
[3, 1],
[5, 1]])
This is about 8x slower than the version using np.unique, so for N=1000 it is similar to the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. It looks like the Pandas approach has a substantial startup cost, which limits its speed for small N.
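The np.unique version referred to above is not shown here. As a rough sketch of a NumPy-only approach (not necessarily the one the answer had in mind), np.minimum.at can handle the 1-D case, while sorting with np.lexsort and taking the first occurrence of each index via np.unique can handle the addendum with vector values. The np.inf sentinel and the names order, uniq and first are choices made for this illustration only.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# 1-D case: unbuffered in-place minimum, so every repeated index is applied.
arr = np.full(6, np.inf)          # inf marks "nothing written yet" (assumption)
np.minimum.at(arr, index, dist)
arr[np.isinf(arr)] = 0            # slots that were never written go back to 0
print(arr)

# Vector case: sort by index, breaking ties by dist (the last key is primary).
order = np.lexsort((dist, index))
# In sorted order, the first occurrence of each index has the smallest dist.
uniq, first = np.unique(index[order], return_index=True)
arr2 = np.zeros((6, 2))
arr2[uniq] = values[order[first]]
print(arr2)
```

Unlike the itertools.groupby version, this sketch does not require index to be sorted beforehand.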

Related

How to append to a Numpy array using a for-loop

I'm trying to learn how to work with NumPy arrays in Python and am working on a task where the goal is to append values from a square function to an np array.
To be specific, I'm trying to build the array so that the result looks like this:
[[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], ...]
In other words, much like using a for loop to append to a nested list, like this:
N = 101
def f(x):
    return x**2
list1 = []
for i in range(N+1):
    list1.append([i])
    list1[i].append(f(i))
print(list1)
When I try to do this similarly with NumPy arrays, like below:
import numpy as np
N = 101
x_min = 1
x_max = 10
y = np.zeros(N)
x = np.linspace(x_min,x_max, N)
def f(x):
    return x**2
for i in y:
    np.append(y,f(x))
print(y)
I get the following output:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.]
... which is obviously wrong
Arrays as a datatype are quite new to me, so I would massively appreciate it if anyone could help me out.
Best regards from a rookie who is motivated to learn and welcome all help.
It is kind of un-numpy-thonic (if that's a thing) to mix and match numpy arrays with vanilla python operations like for loops and appending. If I were to do this in pure numpy I would first start with your original array
>>> import numpy as np
>>> N = 101
>>> values = np.arange(N)
>>> values
array([ 0, 1, 2, ..., 99, 100])
then I would generate your squared array to create your 2D result
>>> values = np.array([values, values**2])
>>> values.T
array([[ 0, 0],
[ 1, 1],
[ 2, 4],
...
[ 98, 9604],
[ 99, 9801],
[ 100, 10000]])
Numpy gets its speed advantages in two primary ways:
Faster execution of large numbers of repeated operations (i.e. without Python for loops)
Avoiding moving data in memory (i.e. re-allocating memory space).
It's impossible to implement an indefinite append operation with Numpy arrays and still get both of these advantages. So don't do it!
I can't see in your example why an append is necessary because you know the size of the result array in advance (N).
Perhaps what you are looking for instead is vectorized function execution and assignment:
y[:] = f(x)
print(y)
(Instead of your for loop.)
This produces:
[ 1. 1.1881 1.3924 1.6129 1.8496 2.1025 2.3716 2.6569
2.9584 3.2761 3.61 3.9601 4.3264 4.7089 5.1076 5.5225
5.9536 6.4009 6.8644 7.3441 7.84 8.3521 8.8804 9.4249
9.9856 10.5625 11.1556 11.7649 12.3904 13.0321 13.69 14.3641
...etc.
Or, to get a similar output to your first bit of code:
y = np.zeros((N, 2))
y[:, 0] = x
y[:, 1] = f(x)
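As for why the loop in the question leaves y unchanged: np.append does not modify its argument. It returns a new array, and the question's loop discards that return value. A minimal illustration, using a deliberately tiny array chosen for this sketch:

```python
import numpy as np

y = np.zeros(3)
result = np.append(y, [1.0, 4.0])  # returns a new, longer array

print(y)       # y itself still holds its original 3 zeros
print(result)  # the returned copy holds 5 elements
```

Since every call copies the whole array, preallocating and assigning (as above) avoids the repeated copying.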
You could simply compute both columns with vectorized operations and column_stack them.
import numpy as np
N = 101
col1 = np.arange(N)
col2 = col1**2
list1 = np.column_stack((col1, col2))

Create lower triangular matrix from a vector in python

I want to create a Python program that computes a matrix from a vector of coefficients. So let's say we have the following vector of coefficients c = [c0, c1, c2] = [0, 1, 0]; then I want to compute the matrix A (the matrix itself is missing here; judging by the answers below, it has c0 on the main diagonal, c1 on the first subdiagonal, c2 on the second subdiagonal, and zeros above the diagonal).
So how do I go from the vector c to a lower triangular matrix A? I know how to index it manually, but I need a program that can do it. I was thinking about a for loop inside another for loop, but I struggle with how to do it in practice. What do you think should be done here?
One way (assuming you're using plain Python lists and not numpy or anything):
src = [0, 1, 0]
dst = [
    [
        src[i-j] if i >= j else 0
        for j in range(len(src))
    ] for i in range(len(src))
]
You can try the following:
import numpy as np
c = [1, 2, 3, 4, 5]
n = len(c)
a = np.zeros((n,n))
for i in range(n):
    np.fill_diagonal(a[i:, :], c[i])
print(a)
It gives:
[[1. 0. 0. 0. 0.]
[2. 1. 0. 0. 0.]
[3. 2. 1. 0. 0.]
[4. 3. 2. 1. 0.]
[5. 4. 3. 2. 1.]]
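A loop-free variant is also possible. The following is only a sketch, assuming the intended matrix is A[i][j] = c[i-j] for i >= j and 0 otherwise, as in the answers above. It builds the row and column index grids with np.indices and masks the upper triangle with np.where. The names i and j are illustrative.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5])
n = len(c)

# i[r, s] = r and j[r, s] = s for every cell of the n x n grid.
i, j = np.indices((n, n))

# Above the diagonal i - j is negative and wraps around as an index,
# but np.where discards those entries and writes 0 there instead.
a = np.where(i >= j, c[i - j], 0)
print(a)
```

The result keeps the integer dtype of c, whereas the fill_diagonal version above produces floats because it starts from np.zeros.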

Matrix element repetition bug

I'm trying to create a matrix that reads:
[0,1,2]
[3,4,5]
[6,7,8]
However, my elements keep repeating. How do I fix this?
import numpy as np
n = 3
X = np.empty(shape=[0, n])
for i in range(3):
    for j in range(1,4):
        for k in range(1,7):
            X = np.append(X, [[(3*i) , ((3*j)-2), ((3*k)-1)]], axis=0)
print(X)
Results:
[[ 0. 1. 2.]
[ 0. 1. 5.]
[ 0. 1. 8.]
[ 0. 1. 11.]
[ 0. 1. 14.]
[ 0. 1. 17.]
[ 0. 4. 2.]
[ 0. 4. 5.]
I'm not really sure how you think your code was supposed to work. You are appending a row to X on every iteration of the innermost loop, so 3 * 3 * 6 = 54 times, and you end up with a 54 x 3 matrix.
I think maybe you meant to do:
for i in range(3):
    X = np.append(X, [[3*i, 3*i+1, 3*i+2]], axis=0)
Just so you know, appending to an array in a loop is usually discouraged (just create a list of lists, then turn it into a numpy array).
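The list-of-lists suggestion could look like the following sketch. The name rows is illustrative.

```python
import numpy as np

rows = []
for i in range(3):
    # Collect each row in a plain Python list first.
    rows.append([3*i, 3*i + 1, 3*i + 2])

# Convert once at the end instead of calling np.append on every iteration.
X = np.array(rows)
print(X)
```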
You could also do
>>> np.arange(9).reshape((3,3))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])

Update 3 and 4 dimension elements of numpy array

I have a numpy array of shape [12, 80, 5, 5]. I want to modify the values of the 3rd and 4th dimensions for each element.
For example:
import numpy as np
x = np.zeros((12, 80, 5, 5))
print(x[0,0,:,:])
Output:
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
Modify values:
y = np.ones((5,5))
x[0,0,:,:] = y
print(x[0,0,:,:])
Output:
[[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]]
I can modify for all x[i,j,:,:] using two for loops. But, I was wondering if there is any pythonic way to do it without running two loops. Just curious to know :)
UPDATE
Actual use case:
dict_weights = copy.deepcopy(combined_weights)
for i in range(0, len(combined_weights[each_layer][:, 0, 0, 0])):
    for j in range(0, len(combined_weights[each_layer][0, :, 0, 0])):
        # Extract 5x5
        trans_weight = combined_weights[each_layer][i,j]
        trans_weight = np.fliplr(np.flipud(trans_weight ))
        # Update
        dict_weights[each_layer][i, j] = trans_weight
NOTE: The dimensions i, j of combined_weights can vary. There are around 200 elements in this list with varied i and j dimensions, but 3rd and 4th dimensions are always same (i.e. 5x5).
I just want to know if I can update the elements combined_weights[:,:,5, 5] with transposed values without running 2 for loops.
Thanks.
Simply do -
dict_weights[each_layer] = combined_weights[each_layer][...,::-1,::-1]
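To see that the slicing matches the double loop, here is a small check on random data. The array weights and its shape (3, 4, 5, 5) are stand-ins invented for this sketch, not the real combined_weights.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((3, 4, 5, 5))  # hypothetical stand-in for one layer

# Reverse the last two axes of every 5x5 block in one step.
flipped = weights[..., ::-1, ::-1]

# Reference result computed with the original two-loop approach.
expected = np.empty_like(weights)
for i in range(weights.shape[0]):
    for j in range(weights.shape[1]):
        expected[i, j] = np.fliplr(np.flipud(weights[i, j]))

print(np.array_equal(flipped, expected))
```

Note that basic slicing returns a view of the original data, so adding .copy() may be worthwhile if the result must be independent of combined_weights.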

First n elements of row in numpy array

I'm trying to implement a k-nearest neighbour classifier in Python, and so I want to calculate the Euclidean distance. I have a dataset that I have converted into a big numpy array
[[ 0. 0. 4. ..., 1. 0. 1.]
[ 0. 0. 5. ..., 0. 0. 1.]
[ 0. 0. 14. ..., 16. 9. 1.]
...,
[ 0. 0. 3. ..., 2. 0. 3.]
[ 0. 1. 7. ..., 0. 0. 3.]
[ 0. 2. 10. ..., 0. 0. 3.]]
where the last element of each row indicates the class. So when calculating the Euclidean distance, I obviously don't want to include the last element. I thought I could do the following
for row in dataset:
    distance = euclidean_distance(vector, row[:dataset.shape[1] - 1])
but that still includes the last element
print(row)
>>> [[ 0. 0. 4. ..., 1. 0. 1.]]
print(row[:dataset.shape[1] - 1])
>>> [[ 0. 0. 4. ..., 1. 0. 1.]]
as you can see both are the same.
You can subset the data using numpy slicing. If you find yourself iterating over a numpy array, stop and try to find a method that takes advantage of the vectorized nature of numpy operations.
Assuming your array is called arr:
data_points = arr[:,:-1]
classes = arr[:,-1]
For distance to vector calculations:
To find the Euclidean distance between a 1d array and all of the rows of a 2d array, you can use the following. It assumes the 1d array is v and the 2d array is arr (features only, without the class column).
dist = np.sqrt(np.power(arr - v, 2).sum(axis=1))
dist will be a 1d array of distances.
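Putting the slicing and the distance step together, a minimal sketch for the k-nearest-neighbour use case might look like this. The toy data, the query vector v and k = 2 are made up for illustration. The double brackets in the question's printed row suggest each row was still two-dimensional (as happens with np.matrix), in which case row[:n] slices rows rather than columns; that is an inference from the output, and the sketch assumes a plain 2d ndarray.

```python
import numpy as np

# Toy dataset: two feature columns followed by a class label (hypothetical).
arr = np.array([[0.0, 0.0, 1.0],
                [3.0, 4.0, 1.0],
                [6.0, 8.0, 3.0]])

data_points = arr[:, :-1]   # every column except the last
classes = arr[:, -1]        # the class labels
v = np.array([0.0, 0.0])    # query vector

# Euclidean distance from v to every row at once.
dist = np.sqrt(np.power(data_points - v, 2).sum(axis=1))

k = 2
nearest = np.argsort(dist)[:k]   # indices of the k closest rows
print(dist)
print(classes[nearest])
```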
For pairwise calculations:
The following function takes a 2d array of numbers and returns the upper-triangular matrix of pairwise distances using the given L-norm (the Euclidean distance is the L=2 case).
def pairwise_distance(arr, L=2):
    d = arr.shape[0]
    out = np.zeros((d, d))  # square matrix, one entry per pair of rows
    for f in range(1, d):
        # fill the f-th superdiagonal with sum(|row_i - row_(i+f)|**L)
        out[:-f].ravel()[f::d+1] = np.power(np.abs(arr[:-f]-arr[f:]), L).sum(axis=1)
    return np.power(out, 1.0/L)
