I have several lists that can only contain the following values: 0, 0.5, 1, 1.5
I want to efficiently convert each of these lists into probability mass functions. So if a list is as follows: [0.5, 0.5, 1, 1.5], the PMF will look like this: [0, 0.5, 0.25, 0.25].
I need to do this many times (and with very large lists), so avoiding looping will be optimal, if at all possible. What's the most efficient way to make this happen?
Edit: Here's my current approach. It feels like a really inefficient and inelegant way to do it:
import numpy as np
from collections import Counter

def get_distribution(samplemodes1):
    n, bin_edges = np.histogram(samplemodes1, bins = 9)
    totalcount = np.sum(n)
    bin_probability = n / totalcount
    bins_per_point = np.fmin(np.digitize(samplemodes1, bin_edges), len(bin_edges)-1)
    probability_perpoint = [bin_probability[bins_per_point[i]-1] for i in range(len(samplemodes1))]
    counts = Counter(samplemodes1)
    total = sum(counts.values())
    probability_mass = {k:v/total for k,v in counts.items()}
    #print(probability_mass)
    key_values = {}
    if(0 in probability_mass):
        key_values[0] = probability_mass.get(0)
    else:
        key_values[0] = 0
    if(0.5 in probability_mass):
        key_values[0.5] = probability_mass.get(0.5)
    else:
        key_values[0.5] = 0
    if(1 in probability_mass):
        key_values[1] = probability_mass.get(1)
    else:
        key_values[1] = 0
    if(1.5 in probability_mass):
        key_values[1.5] = probability_mass.get(1.5)
    else:
        key_values[1.5] = 0
    distribution = list(key_values.values())
    return distribution
Here are some solutions for you to benchmark:
Using collections.Counter
from collections import Counter
bins = [0, 0.5, 1, 1.5]
a = [0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5]
denom = len(a)
counts = Counter(a)
pmf = [counts[bin]/denom for bin in bins]
NumPy based solution
import numpy as np
bins = [0, 0.5, 1, 1.5]
a = np.array([0.5, 0.5, 1.0, 0.5, 1.0, 1.5, 0.5])
denom = len(a)
pmf = [(a == bin).sum()/denom for bin in bins]
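You can probably do better with np.bincount() instead. It only counts non-negative integers, so the values have to be mapped to integer indices first (for example, by multiplying by 2).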
Further reading on this idea: https://thispointer.com/count-occurrences-of-a-value-in-numpy-array-in-python/
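The np.bincount() idea mentioned above could look roughly like this. It is a minimal sketch that assumes every value is a multiple of 0.5 between 0 and 1.5, as the question states; the function name pmf_bincount is made up for illustration.

```python
import numpy as np

def pmf_bincount(values):
    """PMF over the bins [0, 0.5, 1, 1.5] without a Python-level loop."""
    a = np.asarray(values, dtype=float)
    # Map 0, 0.5, 1, 1.5 to the integer indices 0, 1, 2, 3.
    idx = np.rint(a * 2).astype(np.intp)
    # minlength=4 keeps a slot for bins that never occur.
    counts = np.bincount(idx, minlength=4)
    return counts / a.size

print(pmf_bincount([0.5, 0.5, 1, 1.5]))
```

For the list in the question this yields the probabilities 0, 0.5, 0.25 and 0.25. An empty input would divide by zero, so guard against that if it can happen.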
I'm trying to find the fundamental matrix between two images. The points of correspondence in my images are given as follows -
pts1_list = [
[224.95256042, 321.64755249],
[280.72879028, 296.15835571],
[302.34194946, 364.82437134],
[434.68283081, 402.86990356],
[244.64321899, 308.50286865],
[488.62979126, 216.26953125],
[214.77470398, 430.75869751],
[299.20846558, 312.07217407],
[266.94125366, 119.36679077],
[384.41549683, 442.05865479],
[475.28448486, 254.28138733]
]
pts2_list = [
[253.88285828, 335.00772095],
[304.884552, 308.89205933],
[325.33914185, 375.91308594],
[455.15515137, 411.18075562],
[271.48794556, 322.07028198],
[515.11816406, 221.74610901],
[245.31390381, 441.54830933],
[321.74771118, 324.31417847],
[289.86627197, 137.46456909],
[403.3711853, 451.08905029],
[496.16610718, 261.36074829]
]
I have found some code that does what I'm looking for, but it looks like it works only for 3D points.
I've linked the reference code here and here, but fundamentally, the code snippets that I am looking at are:
import numpy

def compute_fundamental(x1, x2):
    '''Computes the fundamental matrix from corresponding points x1, x2 using
    the 8 point algorithm.'''
    n = x1.shape[1]
    if x2.shape[1] != n:
        raise ValueError('Number of points do not match.')
    # Normalization is done in compute_fundamental_normalized().
    A = numpy.zeros((n, 9))
    for i in range(n):
        A[i] = [x1[0, i] * x2[0, i], x1[0, i] * x2[1, i], x1[0, i] * x2[2, i],
                x1[1, i] * x2[0, i], x1[1, i] * x2[1, i], x1[1, i] * x2[2, i],
                x1[2, i] * x2[0, i], x1[2, i] * x2[1, i], x1[2, i] * x2[2, i],
                ]
    # Solve A*f = 0 using least squares.
    U, S, V = numpy.linalg.svd(A)
    F = V[-1].reshape(3, 3)
    # Constrain F to rank 2 by zeroing out last singular value.
    U, S, V = numpy.linalg.svd(F)
    S[2] = 0
    F = numpy.dot(U, numpy.dot(numpy.diag(S), V))
    return F / F[2, 2]
and
def setUp(self):
    points = array([
        [-1.1, -1.1, -1.1], [ 1.4, -1.4, -1.4], [-1.5, 1.5, -1], [ 1, 1.8, -1],
        [-1.2, -1.2, 1.2], [ 1.3, -1.3, 1.3], [-1.6, 1.6, 1], [ 1, 1.7, 1],
    ])
    points = homography.make_homog(points.T)
    P = hstack((eye(3), array([[0], [0], [0]])))
    cam = camera.Camera(P)
    self.x = cam.project(points)
    r = [0.05, 0.1, 0.15]
    rot = camera.rotation_matrix(r)
    cam.P = dot(cam.P, rot)
    cam.P[:, 3] = array([1, 0, 0])
    self.x2 = cam.project(points)

def testComputeFundamental(self):
    E = sfm.compute_fundamental(self.x2[:, :8], self.x[:, :8])
In this code, the parameters being passed are three-dimensional, whereas my points have only two coordinates each. I would like to know how to modify this code and how the A matrix should be calculated in my case. Thank you.
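# OpenCV works with the 2D points directly; pass them as NumPy arrays rather than nested lists.
F, _ = cv2.findFundamentalMat(np.float32(pts1_list), np.float32(pts2_list))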
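If you would rather adapt the quoted 8-point code, note that its three rows are homogeneous image coordinates (x, y, 1), not 3D points, so 2D pixel coordinates only need a row of ones appended. The following is a rough sketch under that assumption. The helper names to_homogeneous and eight_point are made up for illustration, only the first eight correspondences from the question are used, and the coordinate normalization that the quoted comment defers to compute_fundamental_normalized() is left out.

```python
import numpy as np

def to_homogeneous(pts):
    """Turn an (N, 2) list of pixel coordinates into a 3 x N array of [x, y, 1]."""
    pts = np.asarray(pts, dtype=float)
    return np.vstack([pts.T, np.ones(len(pts))])

def eight_point(x1, x2):
    """Unnormalized 8-point estimate with the same row layout as the quoted code."""
    # Row n holds every product x1[j, n] * x2[k, n], ordered j-major (9 columns).
    A = np.einsum('jn,kn->njk', x1, x2).reshape(-1, 9)
    # The last right singular vector is the least-squares solution of A f = 0.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    return U @ np.diag(S) @ Vt

pts1 = [[224.95256042, 321.64755249], [280.72879028, 296.15835571],
        [302.34194946, 364.82437134], [434.68283081, 402.86990356],
        [244.64321899, 308.50286865], [488.62979126, 216.26953125],
        [214.77470398, 430.75869751], [299.20846558, 312.07217407]]
pts2 = [[253.88285828, 335.00772095], [304.884552, 308.89205933],
        [325.33914185, 375.91308594], [455.15515137, 411.18075562],
        [271.48794556, 322.07028198], [515.11816406, 221.74610901],
        [245.31390381, 441.54830933], [321.74771118, 324.31417847]]

F = eight_point(to_homogeneous(pts1), to_homogeneous(pts2))
print(F)
```

The sketch skips the final division by F[2, 2] from the quoted code, since that entry may be close to zero. With raw pixel coordinates the result can be numerically poor, so normalizing the points first is usually worthwhile.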
Here is what I am trying to do.
Take the list:
list1 = [0,2]
This list has start point 0 and end point 2.
Now, if we were to take the midpoint of this list, the list would become:
list1 = [0,1,2]
Now, if we were to recursively split up the list again (take the midpoints of the midpoints), the list would become:
list1 = [0,.5,1,1.5,2]
I need a function that will generate lists like this, preferably by keeping track of a variable. So, for instance, let's say there is a variable, n, that keeps track of something. When n = 1, the list might be [0,1,2] and when n = 2, the list might be [0,.5,1,1.5,2], and I am going to increment the value of n to keep track of how many times I have divided up the list.
I know you need to use recursion for this, but I'm not sure how to implement it.
Should be something like this:
def recursive(list1,a,b,n):
    """list 1 is a list of values, a and b are the start
    and end points of the list, and n is an int representing
    how many times the list needs to be divided"""
    mid = len(list1)//2
    # stuff
Could someone help me write this function? Not for homework, part of a project I'm working on that involves using mesh analysis to divide up rectangle into parts.
This is what I have so far:
def recursive(a,b,list1,n):
    w = b - a
    mid = a + w / 2
    left = list1[0:mid]
    right = list1[mid:len(list1)-1]
    return recursive(a,mid,list1,n) + mid + recursive(mid,b,list1,n)
but I'm not sure how to incorporate n into here.
NOTE: The list1 would initially be [a,b] - I would just manually enter that but I'm sure there's a better way to do it.
You've generated some interesting answers. Here are two more.
My first uses an iterator to avoid slicing the list and is recursive because that seems like the most natural formulation.
def list_split(orig, n):
    if not n:
        return orig
    else:
        li = iter(orig)
        this = next(li)
        result = [this]
        for nxt in li:
            result.extend([(this+nxt)/2, nxt])
            this = nxt
        return list_split(result, n-1)

for i in range(6):
    print(i, list_split([0, 2], i))
This prints
0 [0, 2]
1 [0, 1.0, 2]
2 [0, 0.5, 1.0, 1.5, 2]
3 [0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
4 [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
5 [0, 0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625, 0.6875, 0.75, 0.8125, 0.875, 0.9375, 1.0, 1.0625, 1.125, 1.1875, 1.25, 1.3125, 1.375, 1.4375, 1.5, 1.5625, 1.625, 1.6875, 1.75, 1.8125, 1.875, 1.9375, 2]
My second is based on the observation that recursion isn't necessary if you always start from two elements. Suppose those elements are mn and mx. After N applications of the split operation the list will have 2**N + 1 elements, so the numerical distance between neighbouring elements will be (mx-mn)/(2**N).
Given this information it should therefore be possible to deterministically compute the elements of the array, or even easier to use numpy.linspace like this:
import numpy

def grid(emin, emax, N):
    return numpy.linspace(emin, emax, 2**N+1)
This appears to give the same answers, and will probably serve you best in the long run.
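To check that claim rather than eyeball it, a small comparison like the one below can help. It is only a sketch: split_once is a made-up, non-recursive stand-in for one round of midpoint insertion, and grid is restated so the snippet runs on its own.

```python
import numpy as np

def split_once(values):
    """Insert the midpoint between every pair of neighbouring values."""
    out = [values[0]]
    for left, right in zip(values, values[1:]):
        out.extend([(left + right) / 2, right])
    return out

def grid(emin, emax, n):
    """Closed-form version: 2**n + 1 evenly spaced points."""
    return np.linspace(emin, emax, 2**n + 1)

for n in range(6):
    pts = [0, 2]
    for _ in range(n):
        pts = split_once(pts)
    # Both constructions should agree up to floating-point rounding.
    assert np.allclose(pts, grid(0, 2, n))
print("midpoint splitting and linspace agree for n = 0..5")
```

Note that linspace returns a NumPy array of floats, while the splitting approach keeps the original endpoints as Python ints.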
You can use some arithmetic and slicing to figure out the size of the result, and fill it efficiently with values.
While not required, you can implement a recursive call by wrapping this functionality in a simple helper function, which checks what iteration of splitting you are on, and splits the list further if you are not at your limit.
def expand(a):
    """
    expands a list based on average values between every two values
    """
    o = [0] * ((len(a) * 2) - 1)
    o[::2] = a
    o[1::2] = [(x+y)/2 for x, y in zip(a, a[1:])]
    return o

def rec_expand(a, n):
    if n == 0:
        return a
    else:
        return rec_expand(expand(a), n-1)
In action
>>> rec_expand([0, 2], 2)
[0, 0.5, 1.0, 1.5, 2]
>>> rec_expand([0, 2], 4)
[0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2]
You could do this with a for loop
import numpy as np

def add_midpoints(orig_list, n):
    for i in range(n):
        new_list = []
        for j in range(len(orig_list)-1):
            new_list.append(np.mean(orig_list[j:(j+2)]))
        orig_list = orig_list + new_list
        orig_list.sort()
    return orig_list
add_midpoints([0,2],1)
[0, 1.0, 2]
add_midpoints([0,2],2)
[0, 0.5, 1.0, 1.5, 2]
add_midpoints([0,2],3)
[0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2]
You can also do this without recursion or nested loops, using a single list comprehension. What we're doing here is just making a binary scale between two numbers, like on most Imperial system rulers.
def binary_scale(start, stop, level):
    length = stop - start
    scale = 2 ** level
    return [start + i * length / scale for i in range(scale + 1)]
In use:
>>> binary_scale(0, 10, 0)
[0.0, 10.0]
>>> binary_scale(0, 10, 2)
[0.0, 2.5, 5.0, 7.5, 10.0]
>>> binary_scale(10, 0, 1)
[10.0, 5.0, 0.0]
Fun with anti-patterns:
def expand(a, n):
    for _ in range(n):
        a[:-1] = sum(([a[i], (a[i] + a[i + 1]) / 2] for i in range(len(a) - 1)), [])
    return a
print(expand([0, 2], 2))
OUTPUT
% python3 test.py
[0, 0.5, 1.0, 1.5, 2]
%
I'm supposed to normalize an array. I've read about normalization and come across this formula:
x_normalized = (x - x_min) / (x_max - x_min)
I wrote the following function for it:
def normalize_list(list):
    max_value = max(list)
    min_value = min(list)
    for i in range(0, len(list)):
        list[i] = (list[i] - min_value) / (max_value - min_value)
That is supposed to normalize an array of elements.
Then I have come across this: https://stackoverflow.com/a/21031303/6209399
Which says you can normalize an array by simply doing this:
def normalize_list_numpy(list):
    normalized_list = list / np.linalg.norm(list)
    return normalized_list
If I normalize this test array test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9] with my own function and with the numpy method, I get these answers:
My own function: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
The numpy way: [0.059234887775909233, 0.11846977555181847, 0.17770466332772769, 0.23693955110363693, 0.29617443887954614, 0.35540932665545538, 0.41464421443136462, 0.47387910220727386, 0.5331139899831830
Why do the functions give different answers? Are there other ways to normalize an array of data? What does numpy.linalg.norm(list) do? What am I getting wrong?
There are different types of normalization. You are using min-max normalization. The min-max normalization from scikit learn is as follows.
import numpy as np
from sklearn.preprocessing import minmax_scale
# your function
def normalize_list(list_normal):
    max_value = max(list_normal)
    min_value = min(list_normal)
    for i in range(len(list_normal)):
        list_normal[i] = (list_normal[i] - min_value) / (max_value - min_value)
    return list_normal

# scikit-learn version
def normalize_list_numpy(list_numpy):
    normalized_list = minmax_scale(list_numpy)
    return normalized_list
test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test_array_numpy = np.array(test_array)
print(normalize_list(test_array))
print(normalize_list_numpy(test_array_numpy))
Output:
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
minmax_scale uses exactly your formula for normalization/scaling:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html
@OuuGiii: NOTE: It is not a good idea to use Python built-in function names as variable names. list() is a Python builtin, so its use as a variable name should be avoided.
The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.
One NumPy solution would be this:
import numpy as np
def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / (np.ptp(x))
print(normalize(test_array))
# [ 0. 0.125 0.25 0.375 0.5 0.625 0.75 0.875 1. ]
Here np.ptp is peak-to-peak, i.e. the
range of values (maximum - minimum) along an axis.
This approach scales the values to the interval [0, 1], as pointed out by @phg.
The more traditional definition of normalization would be to scale to a 0 mean and unit variance:
x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0
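Or use sklearn.preprocessing.scale as a pre-canned function for this. Note that sklearn.preprocessing.normalize does something different: it rescales each sample to unit norm.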
Using test_array / np.linalg.norm(test_array) creates a result that is of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're talking about two different fields here, one being statistics and the other being linear algebra.
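To see the three notions of "normalization" discussed here side by side, the sketch below applies each to the same test array using plain NumPy. The variable names min_max, z_score and unit are just labels chosen for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# Min-max scaling: values end up in the interval [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Standardization: zero mean and unit (population) standard deviation.
z_score = (x - x.mean()) / x.std()

# Unit-length scaling: the Euclidean norm of the result is 1.
unit = x / np.linalg.norm(x)

print(min_max.min(), min_max.max())
print(z_score.mean(), z_score.std())
print(np.linalg.norm(unit))
```

Which one is appropriate depends on what the downstream code expects, so it is worth checking which definition a given library or paper means by "normalize".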
The power of NumPy is its broadcasting, which lets you write vectorized array operations without explicit looping. So you do not need to write a function with an explicit for loop, which is slow, especially if your dataset is large.
A vectorized way of doing min-max normalization is
test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
normalized_test_array = (test_array - test_array.min()) / (test_array.max() - test_array.min())
output >> [ 0., 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1. ]
I have periodic data with the index being a floating point number like so:
time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9,-1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
df.plot(marker='o')
I want to create a cross(df, y_val, direction='rise' | 'fall' | 'cross') function that returns an array of times (indexes) with all the interpolated points where the voltage values equal y_val. For 'rise' only the values where the slope is positive are returned; for 'fall' only the values with a negative slope are returned; for 'cross' both are returned. So if y_val=0 and direction='cross' then an array with 10 values would be returned with the X values of the crossing points (the first one being about 0.05).
I was thinking this could be done with an iterator but was wondering if there was a better way to do this.
Thanks. I'm loving Pandas and the Pandas community.
To do this I ended up with the following. It is a vectorized version which is 150x faster than one that uses a loop.
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

def cross(series, cross=0, direction='cross'):
    """
    Given a Series returns all the index values where the data values equal
    the 'cross' value.
    Direction can be 'rising' (for rising edge), 'falling' (for only falling
    edge), or 'cross' for both edges
    """
    # Find if values are above or below yvalue crossing:
    above=series.values > cross
    below=np.logical_not(above)
    left_shifted_above = above[1:]
    left_shifted_below = below[1:]
    x_crossings = []
    # Find indexes on left side of crossing point
    if direction == 'rising':
        idxs = (left_shifted_above & below[0:-1]).nonzero()[0]
    elif direction == 'falling':
        idxs = (left_shifted_below & above[0:-1]).nonzero()[0]
    else:
        rising = left_shifted_above & below[0:-1]
        falling = left_shifted_below & above[0:-1]
        idxs = (rising | falling).nonzero()[0]
    # Calculate x crossings with interpolation using formula for a line:
    x1 = series.index.values[idxs]
    x2 = series.index.values[idxs+1]
    y1 = series.values[idxs]
    y2 = series.values[idxs+1]
    x_crossings = (cross-y1)*(x2-x1)/(y2-y1) + x1
    return x_crossings

# Test it out:
time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9,-1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
x_crossings = cross(df['voltage'])
y_crossings = np.zeros(x_crossings.shape)
plt.plot(time, voltage, '-ob', x_crossings, y_crossings, 'or')
plt.grid(True)
It was quite satisfying when this worked. Any improvements that can be made?
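One possible simplification, offered as a sketch rather than a drop-in replacement: for the 'cross' case the rising and falling masks can be merged into a single comparison of neighbouring "above" flags. The version below works on plain arrays so it can be tried without pandas or matplotlib; the function name crossings and its level parameter are made up for this example.

```python
import numpy as np

def crossings(x, y, level=0.0):
    """Linearly interpolated x positions where y crosses `level`, either direction."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    above = y > level
    # A crossing lies between samples i and i+1 when their 'above' flags differ.
    idx = np.nonzero(above[1:] != above[:-1])[0]
    x1, x2 = x[idx], x[idx + 1]
    y1, y2 = y[idx], y[idx + 1]
    # Equation of the line through (x1, y1) and (x2, y2), solved for y == level.
    return x1 + (level - y1) * (x2 - x1) / (y2 - y1)

time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9, -1.2, 0.95, -1.1, 1.11]
xs = crossings(time, voltage)
print(len(xs), xs[0])
```

Because the flags of the two neighbouring samples differ, y1 and y2 can never be equal at a detected crossing, so the division is safe. A sample that sits exactly on the level counts as "not above", which matches the behaviour of the function above.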