I have two dataframes (attached image). For each row in Table-1:
Part 1: I need to find the row in Table-2 that gives the minimum Euclidean distance. Output-1 is the expected answer.
Part 2: I need to find the row in Table-2 that gives the minimum Euclidean distance, with the only difference that a row from Table-2 cannot be selected more than once. Output-2 is the expected answer.
I tried the following code to get the distances, but I'm not sure how to add the other fields:
import numpy as np
from scipy.spatial import distance
s1 = np.array([(2,2), (3,0), (4,1)])
s2 = np.array([(1,3), (2,2),(3,0),(0,1)])
print(distance.cdist(s1,s2).min(axis=1))
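cdist(...).min(axis=1) returns only the smallest distances. To know which Table-2 row produced each one, take argmin instead (distance.cdist(s1, s2).argmin(axis=1) works the same way). The sketch below uses only NumPy and computes the full distance matrix by broadcasting, so extra fields are simply extra columns. If the tables are pandas DataFrames, the numeric columns could be passed in with .to_numpy(); which columns to select is an assumption.

```python
import numpy as np

# Each row is one record; each column is one field (add columns for more fields)
s1 = np.array([(2, 2), (3, 0), (4, 1)], dtype=float)          # Table-1
s2 = np.array([(1, 3), (2, 2), (3, 0), (0, 1)], dtype=float)  # Table-2

# Pairwise Euclidean distances via broadcasting: shape (len(s1), len(s2))
dists = np.sqrt(((s1[:, None, :] - s2[None, :, :]) ** 2).sum(axis=-1))

nearest = dists.argmin(axis=1)                 # index of the closest Table-2 row
min_dist = dists[np.arange(len(s1)), nearest]  # the matching minimum distances

for i, (j, d) in enumerate(zip(nearest, min_dist)):
    print(f"Table-1 row {i} -> Table-2 row {j} (distance {d:.2f})")
```

Indices are zero-based, so Table-2 row 1 is the point (2, 2).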
Two dataframes and the expected output: (image not included)
The code below gives the desired output for Part 1, and there is a commented-out print statement if you want to see every pairwise distance.
It also works with lists of different lengths.
Credit also to: How can the Euclidean distance be calculated with NumPy?
Hope it helps:
import numpy
from numpy import linalg as LA

list1 = [(2,2), (3,0), (4,1)]
list2 = [(1,3), (2,2), (3,0), (0,1)]

# Letter names: 'a', 'b', 'c' for Table-1 rows, then 'd', 'e', ... for Table-2 rows
names = range(0, len(list1) + len(list2))
names = [chr(ord('`') + number + 1) for number in names]

i = -1
for tup1 in list1:
    collector = {}  # distances from the current Table-1 row to every Table-2 row
    j = len(list1)  # Start Table-2 names
    i += 1
    name1 = names[i]
    for tup2 in list2:
        name2 = names[j]
        a = numpy.array(tup1)
        b = numpy.array(tup2)
        # print("{} | {} -->".format(name1, name2), tup1, tup2, " ", numpy.around(LA.norm(a - b), 2))
        j += 1
        collector["{} | {}".format(name1, name2)] = numpy.around(LA.norm(a - b), 2)
        if j == len(names):
            # all Table-2 rows checked: report the closest one
            min_key = min(collector, key=collector.get)
            print(min_key, "-->", collector[min_key])
Output:
a | e --> 0.0
b | f --> 0.0
c | f --> 1.41
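The answer above only covers Part 1. Since Output-2 is not reproduced in the text, this sketch assumes one interpretation of Part 2: Table-1 rows are processed in order, and each takes the nearest Table-2 row that has not been taken yet (a greedy pass). greedy_unique_nearest is a hypothetical helper name.

```python
import numpy as np

def greedy_unique_nearest(s1, s2):
    """For each row of s1 in order, pick the nearest row of s2 that is still unused."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    if len(s1) > len(s2):
        raise ValueError("Table-2 needs at least as many rows as Table-1")
    # Full pairwise Euclidean distance matrix, shape (len(s1), len(s2))
    dists = np.sqrt(((s1[:, None, :] - s2[None, :, :]) ** 2).sum(axis=-1))
    used = np.zeros(len(s2), dtype=bool)
    matches = []
    for i in range(len(s1)):
        candidates = np.where(used, np.inf, dists[i])  # hide rows already taken
        j = int(candidates.argmin())
        used[j] = True
        matches.append((i, j, float(dists[i, j])))
    return matches

table1 = [(2, 2), (3, 0), (4, 1)]
table2 = [(1, 3), (2, 2), (3, 0), (0, 1)]
for i, j, d in greedy_unique_nearest(table1, table2):
    print(f"Table-1 row {i} -> Table-2 row {j} (distance {d:.2f})")
```

If Part 2 instead means minimising the total distance over all pairs, scipy.optimize.linear_sum_assignment applied to the distance matrix solves that assignment problem, and its pairing can differ from the greedy one.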
This is a homework assignment I was given, and I have been struggling to write the solution.
Write a program that finds the longest adjacent sequence of colors in a matrix (2D grid). Colors are represented by the characters ‘R’, ‘G’ and ‘B’ (Red, Green and Blue respectively).
You will be provided with 4 individual test cases, which must also be included in your solution.
An example of your solution root directory should look like this:
solutionRootDir
| - (my solution files and folders)
| - tests/
    | - test_1
    | - test_2
    | - test_3
    | - test_4
Individual test case input format:
First, read two whitespace-separated 32-bit integers from the provided test case; they represent the size (rows and cols) of the matrix.
Next, read rows newline-separated lines of 8-bit characters.
Your program should find the longest adjacent sequence (diagonal fields do not count as adjacent) and print its length to standard output.
NOTE: if several sequences share the longest length, simply print that length.
test_1
Provided input:
3 3
R R B
G G R
R B G
Expected Output:
2
test_2
Provided input:
4 4
R R R G
G B R G
R G G G
G G B B
Expected Output:
7
test_3
Provided input:
6 6
R R B B B B
B R B B G B
B G G B R B
B B R B G B
R B R B R B
R B B B G B
Expected Output:
22
test_4
Provided input:
1000 1000
1000 rows of 1000 R’s
Expected Output:
1000000
Your program entry point should accept from one to four additional parameters.
Those parameters indicate the names of the test cases that your program should run.
• Example 1: ./myprogram test_1 test_3
• Example 2: ./myprogram test_1 test_2 test_3 test_4
• You can assume that the input from the user will be correct (no validation is required).
My attempt so far:
import numpy as np

a = int(input("Enter rows: "))
b = int(input("Enter columns: "))
rgb = ["R", "G", "B"]
T = [[0 for col in range(b)] for row in range(a)]
for row in range(a):
    for col in range(b):
        T[row][col] = np.random.choice(rgb)
for r in T:
    for c in r:
        print(c, end=" ")
    print()

def solution(t):
    rows: int = len(t)
    cols: int = len(t[0])
    longest = np.empty((rows, cols))
    longest_sean = 1
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            target = t[i][j]
            current = 1
            for ii in range(i, rows):
                for jj in range(j, cols):
                    length = 1
                    if target == t[ii][jj]:
                        length += longest[ii][jj]
                    current = max(current, length)
            longest[i][j] = current
            longest_sean = max(current, longest_sean)
    return longest_sean

print(solution(T))
To get the parameters from the command line, you have to use sys.argv (from sys import argv). Then convert your text files to Python lists like this:
def load(file):
    with open(file+".txt") as f:
        data = f.readlines()
    res = []
    for row in data:
        res.append([])
        for element in row:
            if element != "\n" and element != " ":
                res[-1].append(element)
    return res
which creates a 2D list containing "R", "B" and "G". Then you can look for the largest connected area of one value using this function:
def findLargest(data):
    visited = set()  # a set gives fast membership tests, unlike a list
    area = []
    length = 0
    movement = [(1,0), (0,1), (-1,0), (0,-1)]  # down, right, up, left (no diagonals)

    def recScan(x, y, scanArea):
        visited.add((x, y))
        scanArea.append((x, y))
        for dx, dy in movement:
            newX, newY = x+dx, y+dy
            if newX >= 0 and newY >= 0 and newX < len(data) and newY < len(data[newX]):
                if data[x][y] == data[newX][newY] and (newX, newY) not in visited:
                    recScan(newX, newY, scanArea)
        return scanArea

    for x in range(len(data)):
        for y in range(len(data[x])):
            if (x, y) not in visited:
                newArea = recScan(x, y, [])
                if len(newArea) > length:
                    length = len(newArea)
                    area = newArea
    return length, area
recScan recursively visits every adjacent field with the same value that hasn't been visited yet. Then just call the functions like this:
from sys import argv

if __name__ == "__main__":
    for file in argv[1:]:
        data = load(file)
        length, area = findLargest(data)
        print(length)  # the task only asks for the length
argv[1:] is required because the first element of argv is the script you are executing. My directory structure is:
main.py
test_1.txt
test_2.txt
test_3.txt
test_4.txt
and test_1 through test_4 look like this, just with other values:
R R B B B B
B R B B G B
B G G B R B
B B R B G B
R B R B R B
R B B B G B
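The recursive recScan goes one call deeper for every cell of a region, so on test_4 (a single 1,000,000-cell region) it exceeds Python's default recursion limit of 1000. A sketch of the same flood fill done iteratively with a queue (breadth-first search), which also parses the input format from the task (a "rows cols" line, then the grid). The tests/ path follows the directory layout in the task; parse_grid and longest_sequence are hypothetical names.

```python
import sys
from collections import deque

def parse_grid(text):
    """Parse a 'rows cols' line followed by `rows` lines of R/G/B (spaces optional)."""
    lines = text.splitlines()
    rows, cols = map(int, lines[0].split())
    return [[ch for ch in line if not ch.isspace()][:cols] for line in lines[1:1 + rows]]

def longest_sequence(grid):
    """Size of the largest 4-connected region of equal characters (no recursion)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                # up, down, left, right only: diagonals are not adjacent
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < rows and 0 <= ny < cols and not seen[nx][ny] \
                            and grid[nx][ny] == color:
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            best = max(best, size)
    return best

if __name__ == "__main__":
    # e.g. python main.py test_1 test_3  (assumes the files live in tests/)
    for name in sys.argv[1:]:
        with open(f"tests/{name}") as f:
            print(longest_sequence(parse_grid(f.read())))
```

Each cell is enqueued at most once, so the run time grows linearly with the number of cells.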
I am writing a program that evaluates a function on the interval 0 to 9 with a step size of 0.005. That requires 1800 calculations (9 / 0.005), plus a way to find the maximum value of the function and the x argument that produced it.
What is the recommended approach, and which loops should I use, to calculate the function 1800 times, find its maximum value and output the argument value that produced that maximum?
My idea was to generate two lists, one for the interval (1800 items) and one for the calculated values (also 1800), then find the max in the calculated list and the matching x argument in the other list, using the list index or some other method.
from operator import itemgetter
import math

myfile = open("result.txt", "w")
data = []
step=0.005
rng=9
lim=rng/step
print(lim)
xs=[x * step for x in range(rng)]
lim_int=int(lim)
print(xs)
for i in range(lim_int):
    num=itemgetter(i)(xs)
    x=math.sin(num)* math.exp(-num/100)
    print(i, x)
    data.append(x)
for i in range(rng):
    text = str(i)
    text2 = str(data[i])
    print(text, text2)
    myfile.write(text + ' ' + text2 + '\n')
i=1
while i < rng:
    i=i+1
    num2=itemgetter(i)(xs)
    v=math.sin(num2)* math.exp(-num2/100)
    if v==max(data):
        arg=num2
        break
print('largest function value', max(data))
print('function argument value used', arg)
myfile.close()
NumPy is the widely used, fast package for this kind of vectorised computation:
import numpy as np
x = np.arange(0, 9, 0.005)
f = np.sin(x)*np.exp(-x/100)
print("max is: ", np.max(f))
print("index of max is: ", np.argmax(f))
output:
max is: 0.98446367206362
index of max is: 312
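The question also asks for the x argument itself, not just its index; indexing x with the same argmax gives it. A short sketch:

```python
import numpy as np

x = np.arange(0, 9, 0.005)        # 1800 sample points: 0.000 ... 8.995
f = np.sin(x) * np.exp(-x / 100)  # all 1800 evaluations at once

i_max = int(np.argmax(f))         # index of the first maximum
print("max is:", f[i_max])
print("x at max is:", x[i_max])
```

If the exact number of points matters, np.linspace(0, 9, 1800, endpoint=False) fixes the count instead of deriving it from a floating-point step.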
If for some reason you want a native Python solution (without using max and the list method index), you can do something like this:
import math

step = 0.005
rng = 9
lim = int(rng/step)
x = [x_i*step for x_i in range(lim + 1)]
f = [math.exp(-x_i/100)*math.sin(x_i) for x_i in x]
max_ind = 0
f_max = f[max_ind]
for j, f_x in enumerate(f):
    if f_x > f_max:  # keep the largest value seen so far and its index
        f_max = f_x
        max_ind = j
print('largest function value', f_max)
print('function argument value used', x[max_ind])
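The two-list idea from the question also works directly when the built-ins are allowed: build the x values and the function values side by side, then look up the position of the maximum with max() and list.index(). A minimal sketch, using 1800 points (0.000 to 8.995, the same grid as the NumPy arange answer):

```python
import math

step = 0.005
n = round(9 / step)                  # 1800 points; round() avoids truncating a value like 1799.999...
xs = [i * step for i in range(n)]    # the interval: 0.000, 0.005, ..., 8.995
ys = [math.sin(x) * math.exp(-x / 100) for x in xs]  # the calculated values

best = ys.index(max(ys))             # position of the largest value (the first one if tied)
print("largest function value", ys[best])
print("function argument value used", xs[best])
```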
I have the example lists below. I want to find out whether two values come from the same list, and if so, which list they both come from.
list1 = ['a','b','c','d','e']
list2 = ['f','g','h','i','j']
c = 'b'
d = 'e'
I used a for loop to check whether the values exist in the lists, but I'm not sure how to find out which list a value actually comes from:
for x,y in zip(list1,list2):
    if c and d in x or y:
        print(True)
Please advise if there is any work around.
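The condition c and d in x or y does not test what it looks like: Python parses it as (c and (d in x)) or y, and a non-empty string y is always truthy, so it prints True on every iteration. A minimal sketch that tests each value against each whole list instead; the dictionary of list names and the helper common_list are illustrative, not from the question.

```python
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['f', 'g', 'h', 'i', 'j']

def common_list(c, d, lists):
    """Return the name of the first list containing both c and d, else None."""
    for name, values in lists.items():
        if c in values and d in values:  # each value is tested separately
            return name
    return None

lists = {"list1": list1, "list2": list2}
print(common_list('b', 'e', lists))  # list1
print(common_list('b', 'g', lists))  # None
```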
First, you might want to inspect the distribution of the size values, to see where you can improve the result with the least effort:
df_inspect = df.copy()
# keep only the letters of each size value, upper-cased (e.g. "9.5 Wide" -> "WIDE")
df_inspect["size.value"] = df_inspect["size.value"].map(lambda x: ''.join(y.upper() for y in x if y.isalpha()))
size_counts = df_inspect["size.value"].value_counts()  # most frequent category first
Then create a solution for the most frequent size category, here "Wide":
long = "adasda, 9.5 W US"
short = "9.5 Wide"

def get_intersection(s1, s2):
    """Return the longest substring of s1 that also occurs in s2."""
    res = ''
    l_s1 = len(s1)
    for i in range(l_s1):
        for j in range(i + 1, l_s1 + 1):  # + 1 so substrings can end at the last character
            t = s1[i:j]
            if t in s2 and len(t) > len(res):
                res = t
    return res

print(len(get_intersection(long, short)) / len(short) >= 0.6)  # True: "9.5 W" is 5 of 8 characters
Then apply the solution to the dataframe row by row (axis=1; without it, apply passes whole columns to the lambda):
df["defective_attributes"] = df.apply(lambda x: len(get_intersection(x["item_name.value"], x["size.value"])) / len(x["size.value"]) < 0.6, axis=1)
get_intersection searches for the longest common substring of the item name and the size value. A row counts as not defective if that substring covers at least 60% of the size value, so defective_attributes is True when the coverage is below 60%.
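get_intersection tries every substring of s1, which is roughly cubic in the string length. As an alternative (a swapped-in technique, not part of the answer), the standard library's difflib.SequenceMatcher.find_longest_match returns the longest matching block directly; with no junk function and autojunk=False that block is the longest common substring. overlap_ratio is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def overlap_ratio(item_name, size_value):
    """Share of size_value covered by its longest common substring with item_name."""
    if not size_value:
        return 0.0
    matcher = SequenceMatcher(None, item_name, size_value, autojunk=False)
    match = matcher.find_longest_match(0, len(item_name), 0, len(size_value))
    return match.size / len(size_value)

print(overlap_ratio("adasda, 9.5 W US", "9.5 Wide"))  # 0.625 ("9.5 W" is 5 of 8 characters)
```

It could stand in for len(get_intersection(...)) / len(...) inside the df.apply(..., axis=1) lambda.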
I was given a task to create the smallest possible number from the digits of two numbers, keeping all the zeros.
But I cannot solve it, because my code does not place the zeros correctly. If the input is
245
36
the output is 23456 and that's correct. But with input:
40
305
it outputs [0, 0, 3, 4, 5], but it should be 30045.
Here's my code:
f1 = [int(x) for x in input()]
f2 = [int(y) for y in input()]
f = f1+f2
for each in range(len(f)):
    for eacc in range(each+1, len(f)):
        if f[each] > f[eacc]:
            f[each], f[eacc] = f[eacc], f[each]
for zero in range(len(f)):
    if f[zero] == 0 and f[0] > 0:
        f.remove(0)
        f.insert(zero+1, 0)
        break
print(f)
n1 = 40
n2 = 305
# sort the digits lexicographically: ['0', '0', '3', '4', '5']
ns = sorted(str(n1) + str(n2))
# move the first non-zero digit to the start
i = ns.count('0')
if 0 < i < len(ns):
    ns.insert(0, ns.pop(i))
print(''.join(ns))  # 30045
Remove all the zeros, get all the permutations and take the min, then insert the zeros at index 1:
from itertools import permutations
a=list('40')+list('305')
a=list(map(int,a))
num_of_zero=a.count(0) # get the count of zeros
for i in range(num_of_zero):
    a.pop(a.index(0))
new_list=list(min(list(permutations(a)))) # get all the permutations
for i in range(num_of_zero):
    new_list.insert(1,0) # insert zeros at index 1 shifting all element to the right
print(''.join(map(str,new_list)))#30045
Without permutations, sorted also works (and scales far better, since the number of permutations grows factorially with the number of digits):
a=list('40')+list('305')
a=list(map(int,a))
num_of_zero=a.count(0)
for i in range(num_of_zero):
    a.pop(a.index(0))
new_list=sorted(a)
for i in range(num_of_zero):
    new_list.insert(1,0)
print(''.join(map(str,new_list)))#30045
Using numpy
import numpy as np
a=list('40')+list('305')
a=list(map(int,a))
new_list=sorted(a) # sorted will return [0,0,3,4,5]
I = np.nonzero(new_list) # tuple of index arrays; I[0] holds the indices of the non-zero digits
if len(I[0]) > 0:
    first_non_zero_value=new_list.pop(I[0][0]) # remove the smallest non-zero digit
    new_list.insert(0,first_non_zero_value)    # and put it in front
print(''.join(map(str,new_list)))#30045
Here you could use itertools.permutations. First, use map to turn each int into a list of its digits, then concatenate the two lists into one list of 5 ints. permutations then generates every possible ordering of those 5 digits. From these, take the min with a list comprehension that filters out any ordering beginning with 0 (if i[0] is false for a leading 0). The result is a tuple of ints, so convert each element to str, join them and print:
from itertools import permutations
a = 40
b = 305
a = [*map(int, str(a))]
b = [*map(int, str(b))]
c = a + b
combo = list(permutations(c, len(c)))
res = min([i for i in combo if i[0]])
res = [str(i) for i in res]
print(''.join(res))
# 30045
If a = 0, b = 0 is a possible input, a try/except block is necessary, because min() raises ValueError on the empty list:
try:
    res = min([i for i in combo if i[0]])
    res = [str(i) for i in res]
    print(int(''.join(res)))
except ValueError:
    res = 0
    print(res)
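The sort-based idea from the first answer can be wrapped in a function that also covers the all-zero case without try/except and avoids the factorial cost of permutations. A sketch assuming non-negative integer inputs; smallest_number is a hypothetical name.

```python
def smallest_number(*numbers):
    """Smallest number using every digit of the inputs, with no leading zero."""
    digits = sorted("".join(str(n) for n in numbers))  # 40, 305 -> ['0', '0', '3', '4', '5']
    zeros = digits.count("0")
    if zeros == len(digits):        # only zeros, e.g. 0 and 0
        return 0
    first = digits.pop(zeros)       # the smallest non-zero digit goes in front
    return int(first + "".join(digits))

print(smallest_number(40, 305))   # 30045
print(smallest_number(245, 36))   # 23456
print(smallest_number(0, 0))      # 0
```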
I have a homework assignment that I just finished, but my solution looks pretty horrendous. I know there's a much simpler and more efficient way to get the correct output, but I just can't figure it out.
Here's the objective of the assignment:
Write a program that stores the following values in a 2D list (these will be hardcoded):
2.42 11.42 13.86 72.32
56.59 88.52 4.33 87.70
73.72 50.50 7.97 84.47
The program should determine the maximum and average of each column
The output should look like this:
2.42 11.42 13.86 72.32
56.59 88.52 4.33 87.70
73.72 50.50 7.97 84.47
============================
73.72 88.52 13.86 87.70 column max
44.24 50.15 8.72 81.50 column average
I did the printing of the 2D list below; my problem is calculating the max and averages.
data = [ [ 2.42, 11.42, 13.86, 72.32],
         [ 56.59, 88.52, 4.33, 87.70],
         [ 73.72, 50.50, 7.97, 84.47] ]
emptylist = []
r = 0
while r < 3:
    c = 0
    while c < 4 :
        print "%5.2f" % data[r][c] ,
        c = c + 1
    r = r + 1
    print
print "=" * 25
This prints the top half, but the code I wrote to calculate the max and average is bad. For the max, I basically compared all the values in each column to each other with if/elif statements, and for the average I added each column's values together, averaged them, then printed. Is there any way to calculate the bottom part with some sort of loop? Maybe something like the following:
for numbers in data:
    r = 0 #row index
    c = 0 #column index
    emptylist= []
    while c < 4 :
        while r < 3 :
            sum = data[r][c]
            totalsum = totalsum + sum
            avg = totalsum / float(rows)
            emptylist.append(avg) #not sure if this would work? here im just trying to
            r = r + 1             #dump averages into an emptylist to print the values
        c = c + 1                 #in it later?
or something like that, where I'm not manually adding each value by its column and row index. The max one I have no clue how to do in a loop. Also, NO LIST METHODS can be used; only append and len() are allowed. Any help?
Here is what you're looking for:
num_rows = len(data)
num_cols = len(data[0])
max_values = [0]*num_cols # Assuming the numbers in the array are all positive
avg_values = [0]*num_cols
for row_data in data:
    for col_idx, col_data in enumerate(row_data):
        max_values[col_idx] = max(max_values[col_idx], col_data) # Max of two values
        avg_values[col_idx] += col_data  # running column sum
for i in range(num_cols):
    avg_values[i] /= num_rows  # column sum -> column average
max_values then contains the maximum of each column, and avg_values contains the average of each column. You can print them as usual:
for num in max_values:
    print num,
print
for num in avg_values:
    print num,
print
or simply (if allowed; join needs strings, so convert the floats first):
print ' '.join(str(num) for num in max_values)
print ' '.join(str(num) for num in avg_values)
I would suggest making two new lists, each the same length as one of your rows, keeping a running max in one and a running average in the other:
maxes = [0] * 4 # equivalent to [0, 0, 0, 0]
avgs = [0] * 4
for row in data: # this gives one row at a time
    for c in range(4): # equivalent to for c in [0,1,2,3]:
        # first, check if the max is big enough:
        if row[c] > maxes[c]:
            maxes[c] = row[c]
        # next, add this row's share of the average (divide by the number of rows):
        avgs[c] += row[c] / float(len(data))
You can print them like so:
for m in maxes:
    print "%5.2f" % m,
print
for s in avgs:
    print "%5.2f" % s,
If you are allowed to use the enumerate function, this can be done a little more nicely:
for i, val in enumerate(row):
    print i, val
0 2.42
1 11.42
2 13.86
3 72.32
It gives us both the index and the value, so we can use it like this:
maxes = [0] * 4
sums = [0] * 4
for row in data:
    for c, val in enumerate(row):
        #first, check if the max is big enough:
        if val > maxes[c]:
            maxes[c] = val
        # next, add that value to the sum:
        sums[c] += val
avgs = []
for s in sums:
    avgs.append(s / float(len(data)))  # column sum -> column average
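Putting the pieces together, here is a sketch in Python 3 (the question's code is Python 2) that uses only len() and append on lists and prints the table in the requested layout. The outer loop runs over columns so that the max and the sum of one column are computed together; starting the max from the first row rather than 0 also handles negative values. The column widths are a guess at the expected layout.

```python
# Hard-coded table from the assignment
data = [[2.42, 11.42, 13.86, 72.32],
        [56.59, 88.52, 4.33, 87.70],
        [73.72, 50.50, 7.97, 84.47]]

num_rows = len(data)
num_cols = len(data[0])

col_max = []
col_avg = []
for c in range(num_cols):
    biggest = data[0][c]      # start from the first row, so negative values also work
    total = 0.0
    for r in range(num_rows):
        value = data[r][c]
        if value > biggest:
            biggest = value
        total += value
    col_max.append(biggest)   # append is the only list method used
    col_avg.append(total / num_rows)

def print_row(values, label=""):
    """Print values in fixed-width columns, followed by an optional label."""
    for v in values:
        print("%6.2f" % v, end=" ")
    print(label)

for row in data:
    print_row(row)
print("=" * 28)
print_row(col_max, "column max")
print_row(col_avg, "column average")
```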