I am trying to calculate the distance between two lists so I can find the shortest distance between all coordinates.
Here is my code:
import random
import math
import copy

def calculate_distance(starting_x, starting_y, destination_x, destination_y):
    distance = math.hypot(destination_x - starting_x, destination_y - starting_y)  # Euclidean (straight-line) distance between two points
    return distance

def nearest_neighbour_algorithm(selected_map):
    temp_map = copy.deepcopy(selected_map)
    optermised_map = []  # empty optimised list to fill up
    # pop() removes the last element of temp_map; it becomes the starting point
    optermised_map.append(temp_map.pop())
    for x in range(len(temp_map)):
        nearest_value = 1000
        neares_index = 0
        for i in range(len(temp_map[x])):
            current_value = calculate_distance(*optermised_map[x], *temp_map[x])
I get an error at this part and I'm not sure why:
for i in range(len(temp_map[x])):
    current_value = calculate_distance(*optermised_map[x], *temp_map[x])
I am trying to find the distance between the points in these two lists, and the error I get at the for loop is that my list index is out of range.
On the first iteration optermised_map has length 1, so optermised_map[x] raises an IndexError as soon as x reaches 1, because x runs over range(len(temp_map)), which is likely longer than 1. I think you may have wanted:
for i in range(len(optermised_map)):
    current_value = calculate_distance(*optermised_map[i], *temp_map[x])
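For reference, here is one way the whole nearest-neighbour pass could look once the indexing is sorted out. This is a minimal sketch, not the asker's finished code: the name nearest_neighbour_order and the choice to always measure from the last point added to the route are assumptions. Starting from the last element mirrors the temp_map.pop() call in the question.

```python
import math


def calculate_distance(starting_x, starting_y, destination_x, destination_y):
    """Euclidean (straight-line) distance between two points."""
    return math.hypot(destination_x - starting_x, destination_y - starting_y)


def nearest_neighbour_order(points):
    """Greedy ordering: repeatedly hop to the closest point not yet visited.

    Hypothetical helper name; `points` is assumed to be a non-empty list
    of (x, y) pairs.
    """
    remaining = list(points)      # shallow copy so the caller's list is untouched
    route = [remaining.pop()]     # last element is the starting point
    while remaining:
        last = route[-1]
        # index of the remaining point closest to the end of the route
        nearest_index = min(
            range(len(remaining)),
            key=lambda i: calculate_distance(*last, *remaining[i]),
        )
        route.append(remaining.pop(nearest_index))
    return route


print(nearest_neighbour_order([(0, 0), (5, 5), (1, 1), (6, 6)]))
# [(6, 6), (5, 5), (1, 1), (0, 0)]
```

Because each step only indexes into the list it is currently looping over, no index can run past the end.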
Are the lengths of the lists the same? I could be wrong, but this sounds like a cosine similarity exercise to me. Check out this very simple exercise.
from scipy import spatial
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
result
# 0.97228425171235
dataSetI = [1, 2, 3, 10]
dataSetII = [2, 4, 6, 20]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
result
# 1.0
dataSetI = [10, 200, 234, 500]
dataSetII = [45, 3, 19, 20]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)
result
# 0.4991255575740505
In the second example, we can see that the ratios of the numbers in the two lists are exactly the same, even though the numbers themselves are different, so the similarity is 1.0. Cosine similarity focuses on the ratios of the numbers, not their magnitudes.
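To see why only the ratios matter, the same quantity can be computed by hand. This is a small sketch using only the standard library; cosine_similarity is a hypothetical helper name, and it assumes both lists have the same length and neither is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths (norms)."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Doubling every value scales the dot product and the norms by the same
# factor, so the result stays at 1.0 (up to floating-point rounding).
print(cosine_similarity([1, 2, 3, 10], [2, 4, 6, 20]))
```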
I have a list with values. From this list I would like to get the outliers.
import numpy as np

list_of_values = [2, 3, 100, 5, 53, 5, 4, 7]

def detect_outlier(data):
    threshold = 3
    mean_1 = np.mean(data)
    std_1 = np.std(data)
    # keep the values whose z-score exceeds the threshold
    outliers = [y for y in data if (np.abs((y - mean_1) / std_1) > threshold)]
    return outliers

print(detect_outlier(list_of_values))
However, my print turns up empty, aka a [] without anything in it. Any ideas?
The mean is 22.375 and std_1 is about 33.413, so even the most extreme value, 100, has a z-score of only (100 - 22.375) / 33.413 ≈ 2.32. Every element stays below the threshold of 3, so nothing ends up in the list.
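The z-scores can be checked directly. The sketch below uses the standard library instead of NumPy (statistics.pstdev is the population standard deviation, which matches np.std's default); the threshold parameter is an addition for illustration. Note also that with only eight values a population z-score can never exceed sqrt(8 - 1) ≈ 2.65, so a threshold of 3 cannot fire on a list this short.

```python
import statistics


def z_scores(data):
    """Z-score of every value, using the population standard deviation."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    return [(y - mean) / std for y in data]


def detect_outlier(data, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    return [y for y, z in zip(data, z_scores(data)) if abs(z) > threshold]


list_of_values = [2, 3, 100, 5, 53, 5, 4, 7]
print(detect_outlier(list_of_values))               # []
print(detect_outlier(list_of_values, threshold=2))  # [100]
```

Lowering the threshold is one option; for small samples a median-based rule may be more robust, but that is a separate design choice.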
I would like to know how to use the python random.sample() function within a for-loop to generate multiple sample lists that are not identical.
For example, right now I have:
for i in range(3):
    sample = random.sample(range(10), k=2)
This will generate 3 sample lists containing two numbers each, but I would like to make sure none of those sample lists are identical. (It is okay if there are repeating values, i.e., (2,1), (3,2), (3,7) would be okay, but (2,1), (1,2), (5,4) would not.)
If you specifically need to "use random.sample() within a for-loop", then you could keep track of samples that you've seen, and check that new ones haven't been seen yet.
import random
seen = set()
for i in range(3):
    while True:
        sample = random.sample(range(10), k=2)
        print(f'TESTING: {sample = }')  # For demo
        fr = frozenset(sample)
        if fr not in seen:
            seen.add(fr)
            break
    print(sample)
Example output:
TESTING: sample = [0, 7]
[0, 7]
TESTING: sample = [0, 7]
TESTING: sample = [1, 5]
[1, 5]
TESTING: sample = [7, 0]
TESTING: sample = [3, 5]
[3, 5]
Here I made seen a set to allow fast lookups, and I converted sample to a frozenset so that order doesn't matter in comparisons. It has to be frozen because a set can't contain another set.
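However, this could be very slow with different inputs, especially with more iterations of i or a smaller range to draw samples from, because more and more draws get rejected. In theory its runtime is unbounded (and the loop never ends if you ask for more distinct samples than exist), but in practice random's number generator is finite.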
Alternatives
There are other ways to do the same thing that could be much more performant. For example, you could take a big random sample, then chunk it into the desired size:
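import random
n = 3
k = 2
upper = 10
sample = random.sample(range(upper), k=k*n)
# chunks() is assumed to be a helper that splits a list into consecutive pieces of length k
for chunk in chunks(sample, k):
    print(chunk)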
Example output:
[6, 5]
[3, 0]
[1, 8]
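With this approach, you'll never get any number repeated across samples, as in [[2,1], [3,2], [3,7]], because the big sample contains only unique numbers. It also requires k*n <= upper, since random.sample() cannot draw more items than the population holds.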
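A self-contained version of the chunking idea might look like this. The chunks generator and the distinct_samples wrapper are hypothetical helpers written for this sketch, not part of the random module.

```python
import random


def chunks(seq, size):
    """Yield consecutive slices of `seq`, each `size` items long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def distinct_samples(upper, k, n):
    """Return n samples of k numbers each, with no number used twice."""
    if k * n > upper:
        raise ValueError("not enough distinct numbers for n samples of size k")
    flat = random.sample(range(upper), k=k * n)  # one big draw, all unique
    return list(chunks(flat, k))


for chunk in distinct_samples(upper=10, k=2, n=3):
    print(chunk)
```

Since every number appears at most once overall, no two samples can contain the same pair, in either order.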
This approach was inspired by Sven Marnach's answer on "Non-repetitive random number in numpy", which I coincidentally just read today.
It looks like you are trying to build a nested list of items drawn from the original list without repetition. You can try the code below.
import random
mylist = list(range(50))
def randomlist(mylist, k):
    length = lambda: len(mylist)
    newlist = []
    while length() >= k:
        # pop k random items, so no item can be picked twice
        newlist.append([mylist.pop(random.randint(0, length() - 1)) for _ in range(k)])
    newlist.append(mylist)  # leftover items (fewer than k)
    return newlist
randomlist(mylist, 6)
[[2, 20, 36, 46, 14, 30],
[4, 12, 13, 3, 28, 5],
[45, 37, 18, 9, 34, 24],
[31, 48, 11, 6, 19, 17],
[40, 38, 0, 7, 22, 42],
[23, 25, 47, 41, 16, 39],
[8, 33, 10, 43, 15, 26],
[1, 49, 35, 44, 27, 21],
[29, 32]]
This should do the trick.
import random
import math
# create set to store samples
a = set()
# number of distinct elements in the population
m = 10
# sample size
k = 2
# number of samples
n = 3
# this protects against an infinite loop (see Safety Note)
if n > math.comb(m, k):
    print(
        f"Error: {math.comb(m, k)} is the number of {k}-combinations "
        f"from a set of {m} distinct elements."
    )
    exit()
# the meat
while len(a) < n:
    a.add(tuple(sorted(random.sample(range(m), k=k))))
print(a)
With a set you are guaranteed to get a collection with no duplicate elements. On its own, though, a set would treat (1, 2) and (2, 1) as two different elements, which is why sorted is applied. So if [1, 2] is drawn, sorted([1, 2]) returns [1, 2]. And if [2, 1] is subsequently drawn, sorted([2, 1]) also returns [1, 2], which won't be added to the set because (1, 2) is already in it. We use tuple because objects in a set have to be hashable and list objects are not.
I hope this helps. Any questions, please let me know.
Safety Note
To avoid an infinite loop when you change 3 to some large number, you need to know the maximum number of possible samples of the type that you desire.
The relevant mathematical concept for this is a combination.
Suppose your first argument of random.sample() is range(m) where
m is some arbitrary positive integer. Note that this means that the
sample will be drawn from a population of m distinct members
without replacement.
Suppose that you wish to have n samples of length k in total.
The number of possible k-combinations from the set of m distinct elements is
m! / (k! * (m - k)!)
You can get this value via
from math import comb
num_comb = comb(m, k)
comb(m, k) gives the number of different ways to choose k elements from m elements without repetition and without order, which is exactly what we want.
So in the example above, m = 10, k = 2, n = 3.
With these m and k, the number of possible k-combinations from the set of m distinct elements is 45.
You need to ensure that n is at most 45 if you want to use those specific m and k and avoid an infinite loop.
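The arithmetic can be double-checked in a few lines. This is a small sketch: count_combinations and enough_combinations are hypothetical names, and the first simply spells out the factorial formula so it can be compared against math.comb.

```python
from math import comb, factorial


def count_combinations(m, k):
    """m! / (k! * (m - k)!), written out explicitly."""
    return factorial(m) // (factorial(k) * factorial(m - k))


def enough_combinations(m, k, n):
    """True if n distinct k-combinations can be drawn from m elements."""
    return n <= comb(m, k)


m, k, n = 10, 2, 3
print(count_combinations(m, k), comb(m, k))  # 45 45
print(enough_combinations(m, k, n))          # True
print(enough_combinations(m, k, 46))         # False
```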
I want to find out if the maximum value in a list has a smaller index than the minimum value in a list. If there are two or more indices with the minimum value, I want to look at the greatest index. If there are two or more indices with the maximum value, I want to look at the smallest index. Now my code looks like this:
maximum = max(lijst)
minimum = min(lijst)
if lijst.index(maximum) <= lijst.index(minimum):
    ...
But this doesn't give me the indices I want with this kind of list:
[2, 9, 15, 36, 36, 3, 2, 36]
Now I want to look at the largest index of the minimum value (which is 6 in this case) and the smallest index for the maximum value (which is 3 in this case). Does someone know how to find these indices?
You can get the min/max value of a list using min/max, then use enumerate to collect the indices where that value occurs, and finally apply min/max to the list of indices. Example:
my_list = [2, 9, 15, 36, 36, 3, 2, 36]
maxval = max(my_list)
indices = [index for index, val in enumerate(my_list) if val == maxval]
# indices is [3, 4, 7]
maxIndex = max(indices)
# maxIndex is 7; use min(indices) instead to get the first occurrence, 3
So if you want to check whether the maximum comes before the minimum, get each value's index this way and compare the two.
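Applied to the question as asked (first index of the maximum, last index of the minimum), the enumerate idea could be wrapped up as follows. The function name is made up for this sketch, and it assumes a non-empty list.

```python
def first_max_and_last_min_index(values):
    """Return (smallest index of the maximum, largest index of the minimum)."""
    maxval = max(values)
    minval = min(values)
    first_max = min(i for i, v in enumerate(values) if v == maxval)
    last_min = max(i for i, v in enumerate(values) if v == minval)
    return first_max, last_min


my_list = [2, 9, 15, 36, 36, 3, 2, 36]
first_max, last_min = first_max_and_last_min_index(my_list)
print(first_max, last_min)    # 3 6
print(first_max <= last_min)  # True
```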
You can use the list's index method with a start position. (Lists have no find method; that one belongs to str. index raises ValueError instead of returning -1.) You can find the last minimum value by continuing to search until index raises ValueError.
maximum = max(li)
minimum = min(li)
i1 = li.index(maximum)
i2 = li.index(minimum)
while True:
    try:
        # look for another occurrence after the one found so far
        i2 = li.index(minimum, i2 + 1)
    except ValueError:
        break
if i1 < i2:
    ...
To get the index of the first maximum:
l.index(max(l))
To get the index of the last minimum, search a reversed copy of the list and map the position back (l[::-1] is used here rather than l.reverse(), which would reverse l in place):
len(l) - 1 - l[::-1].index(min(l))
What you probably had in mind
Although there are other answers I wanted to work on something along the lines of your code. It may not be the most efficient but I believe it is what you had in mind:
my_list = [2, 9, 15, 36, 36, 3, 2, 36]
maximum = max(my_list)
minimum = min(my_list)
first_maximum_index = my_list.index(maximum)
last_minimum_index = len(my_list)-1 - my_list[::-1].index(minimum)
if first_maximum_index <= last_minimum_index:
    print("Yes!")
.index() returns the index of the first occurrence of a value in the list. So, to find the last minimum value, you need to reverse the list before using .index(), which is this portion:
my_list[::-1].index(minimum)
That gives you the index of the minimum value, BUT it is an index into the reversed list. Now you have to "reverse" this index by subtracting it from the last index, len(my_list)-1, which gives you the final expression:
len(my_list)-1 - my_list[::-1].index(minimum)
After that, you can compare the indices as you did.
A more efficient method
Now, here's a more efficient solution (though longer, and perhaps less readable). If you notice, you are running through the list about 4 times (worst case) in the code above. You can reduce it to running through the list once:
my_list = [2, 9, 15, 36, 36, 3, 2, 36]
# Track the running minimum and maximum, and the answer for the part of the list seen so far
current_min = float("inf")
current_max = float("-inf")
is_before = False
for val in my_list:
    if val > current_max:
        is_before = False
        current_max = val
    if val <= current_min:
        current_min = val
        is_before = True
if is_before:
    print("Yes!")
The trick here is to think about growing prefixes of the list, and what the answer would be for each one:
[2] # ???
[2, 9] # False
[2, 9, 15] # False
[2, 9, 15, 36] # False
[2, 9, 15, 36, 36] # False
[2, 9, 15, 36, 36, 3] # False
[2, 9, 15, 36, 36, 3, 2] # True
[2, 9, 15, 36, 36, 3, 2, 36] # True
If you look closely, the result changes from True to False when there is a new maximum value. Similarly, the result changes from False to True when there is a new or existing minimum value introduced at the end of the list.
These correspond to the block of code:
# If value introduced is the new maximum
if val > current_max:
    is_before = False
    current_max = val
# If value introduced is an existing or new minimum
if val <= current_min:
    current_min = val
    is_before = True
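One way to gain confidence in the single-pass version is to compare it with the index-based version on many random lists. The sketch below does that; both function names are made up here, and lists are assumed to be non-empty.

```python
import random


def max_before_min_one_pass(values):
    """Single pass: is the first maximum at or before the last minimum?"""
    current_min = float("inf")
    current_max = float("-inf")
    is_before = False
    for val in values:
        if val > current_max:   # new maximum: it now comes after every minimum so far
            is_before = False
            current_max = val
        if val <= current_min:  # new or repeated minimum at the end of the prefix
            current_min = val
            is_before = True
    return is_before


def max_before_min_indexed(values):
    """Reference version built on .index(), as in the earlier snippet."""
    first_max = values.index(max(values))
    last_min = len(values) - 1 - values[::-1].index(min(values))
    return first_max <= last_min


random.seed(0)
for _ in range(1000):
    sample = [random.randint(0, 5) for _ in range(random.randint(1, 8))]
    assert max_before_min_one_pass(sample) == max_before_min_indexed(sample)

print(max_before_min_one_pass([2, 9, 15, 36, 36, 3, 2, 36]))  # True
```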
I need a vector that stores the median values of the medians of the main list "v". I have tried something with the following code but I am only able to write some values in the correct way.
v=[1,2,3,4,5,6,7,8,9,10]
final=[]
nfac=0
for j in range(0, 4):
    nfac = j + 1
    for k in range(0, nfac):
        if k % 2 == 0:
            final.append(v[10/2**(nfac)-1])
        else:
            final.append(v[9-10/2**(nfac)])
The first median in v=[1,2,3,4,5,6,7,8,9,10] is 5
Then I want the medians of the remaining sublists [1,2,3,4] and [6,7,8,9,10]. I.e. 2 and 8 respectively. And so on.
The list "final" must be in the following form:
final=[5,2,8,1,3,6,9,4,7,10]
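Note that the task as you defined it is basically a breadth-first listing of a balanced binary tree built from the array: each median is a node, and the sublists to its left and right become its subtrees. (Strictly speaking this is a balanced binary search tree rather than a binary heap, although the function below keeps the name construct_heap.)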
Definitely start by defining a helper function for finding the median:
def split_by_median(l):
    median_ind = (len(l)-1) // 2
    median = l[median_ind]
    left = l[:median_ind]
    right = l[median_ind+1:] if len(l) > 1 else []
    return median, left, right
Following the example you give, you want to process the resulting sublists in a breadth-first manner, so we need a queue to remember the following tasks:
from collections import deque
def construct_heap(v):
    lists_to_process = deque([sorted(v)])
    nodes = []
    while lists_to_process:
        head = lists_to_process.popleft()
        if len(head) == 0:
            continue
        median, left, right = split_by_median(head)
        nodes.append(median)
        lists_to_process.append(left)
        lists_to_process.append(right)
    return nodes
So calling the function finally:
print(construct_heap([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])) # [5, 2, 8, 1, 3, 6, 9, 4, 7, 10]
print(construct_heap([5, 1, 2])) # [2, 1, 5]
print(construct_heap([1, 0, 0.5, -1])) # [0, -1, 0.5, 1]
print(construct_heap([])) # []
I want to write a code which locates the two closest numbers to a given argument without using any modules. Below is what I have at the moment.
list1 = (1,3,5,8,12)
x = 9
for value in list1:
    point = list1[x - 1], list1[x + 1]
The expected output is [8,12].
Try this:
t = [abs(i - x) for i in list1]
sorted_list = sorted(enumerate(t), key=lambda i: i[1])
(list1[sorted_list[0][0]], list1[sorted_list[1][0]])
It returns a tuple of your desired values: (8, 12)
You can use a list sorted by squared distance to achieve it. By calculating a difference and squaring it you get rid of negative distances. Things that are close to the target value, i.e.
9 - x = 0 ==> squared 0
10 - x = 1 ==> squared 1
8 - x = -1 ==> squared 1
12 - x = 3 ==> squared 9 etc.
stay close; things far away get even farther away.
# your "list" was a tuple - make it a list
data = [1,3,5,8,12]
x = 9
# calculate the difference between each value and your target value x
diffs = [t-x for t in data]
print(diffs)
# sort all diffs by the squared difference
diffsSorted = sorted(diffs, key=lambda d: d**2)
print(diffsSorted)
# take the lowest 2 of them
diffVals = diffsSorted[0:2]
print(diffVals)
# add x on top again
values = [t + x for t in diffVals]
print(values)
# Harder to understand, but you could reduce it into a one-liner:
allInOne = [k + x for k in sorted([t - x for t in data], key=lambda d: d**2)][:2]
print(allInOne)
Output:
[-8, -6, -4, -1, 3] # difference between x and each number in data
# [64, 36, 16, 1, 9] are the squared distances
[-1, 3, -4, -6, -8] # sorted differences
[-1, 3] # first 2 - just add x back
[8, 12] # result by stepwise
[8, 12] # result allInOne
Intermediate steps (not printed):
[64, 36, 16, 1, 9] # squared differences - we use that as key to
# sort [-8, -6, -4, -1, 3] into [-1, 3, -4, -6, -8]
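The same idea can be written without shifting the values by x and back: sort the original numbers by their absolute distance to x and keep the first two. A minimal sketch, with no imports as the question requested; the name closest_values and the parameter n are assumptions for illustration.

```python
def closest_values(data, x, n=2):
    """Return the n values of `data` nearest to x, in ascending order."""
    nearest = sorted(data, key=lambda value: abs(value - x))[:n]
    return sorted(nearest)


print(closest_values((1, 3, 5, 8, 12), 9))  # [8, 12]
```

Using abs() as the sort key gives the same ordering as squaring, since both grow with the distance from x.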
Some measurements comparing heapq and list-approach (changing to abs() instead of squared):
import timeit
setup = """
import random
random.seed(42)
rnd = random.choices(range(1,100),k={})
import heapq
x = 42
n = 5 """
h = "fit = heapq.nsmallest(n, rnd, key = lambda L: abs(x - L))"
l = "allInOne=[k+x for k in sorted([t-x for t in rnd], key = lambda x:abs(x))][:n]"
rt = {}
for k in range(1, 6):
    s = setup.format(10**k)
    rt[10**k] = (timeit.timeit(l, setup=s, number=1000), timeit.timeit(h, setup=s, number=1000))
print(rt)
Output:
# rnd-size: (list approach, heapq), total seconds for 1000 runs
{
10: ( 0.06346651086960813, 0.11704596144812314),
100: ( 0.5278085906813885, 0.8281634763797711),
1000: ( 5.032436315978541, 7.462741343986483),
10000: ( 54.45165343575938, 79.96112521267483),
100000: (577.708372381287, 835.539905495399)
}
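In these measurements the list approach is always faster than heapq; heapq is (especially for bigger lists) far better in terms of space complexity, because it only keeps the n best candidates instead of a sorted copy of the whole list.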
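Whichever variant is chosen, both should pick values at the same distances from x. The sketch below checks that on random data; the two wrapper names are hypothetical, and the sorted variant is a simplified form of the list approach (it sorts the values directly rather than the differences).

```python
import heapq
import random


def nearest_with_heapq(values, x, n):
    """Keeps only n candidates at a time while scanning the data."""
    return heapq.nsmallest(n, values, key=lambda v: abs(x - v))


def nearest_with_sorted(values, x, n):
    """Sorts a full copy of the data, then slices off the first n."""
    return sorted(values, key=lambda v: abs(x - v))[:n]


random.seed(42)
rnd = random.choices(range(1, 100), k=1000)
x, n = 42, 5

from_heap = nearest_with_heapq(rnd, x, n)
from_sort = nearest_with_sorted(rnd, x, n)

# Compare distances rather than raw values, so ties cannot cause a mismatch.
print([abs(x - v) for v in from_heap] == [abs(x - v) for v in from_sort])
```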