I am wondering how I can go about condensing these elif statements into a method of some sort. I also don't know how to store a chosen coordinate so that I can perform checks on surrounding coordinates. I know my code is nooby, but so am I; I learn better starting with the long way :)
Below is how I'm going about storing a coordinate inside a variable. (Not sure this is even the right way to do it yet...)
grab = board[x][y]
if SjumpX == 'A1':
    grab = board[0][0]
elif SjumpX == 'A2':
    grab = board[0][1]
elif SjumpX == 'A3':
    grab = board[0][2]
elif SjumpX == 'A4':
    grab = board[0][3]
elif SjumpX == 'B1':
    grab = board[1][0]
elif SjumpX == 'B2':
    grab = board[1][1]
elif SjumpX == 'B3':
    grab = board[1][2]
elif SjumpX == 'B4':
    grab = board[1][3]
elif SjumpX == 'C1':
    grab = board[2][0]
elif SjumpX == 'C2':
    grab = board[2][1]
elif SjumpX == 'C3':
    grab = board[2][2]
elif SjumpX == 'C4':
    grab = board[2][3]
SjumpX is the coordinate of the piece my player wants to grab, and DjumpX is the coordinate of the destination. My logic behind this is that if the player enters a coordinate (i.e. A1, B2, C3, ...), I can store that coordinate in the variable 'grab', then use that variable to test whether the destination coordinate is empty, and whether the coordinate between the two holds an opposing player's piece.
Here is the board:
1 2 3 4
A - X O X
B X O - O
C O X O X
This is where I'm checking that the "jumpable" destination coordinates are empty, based on the current coordinates of my 'grab' variable. In this case 'A3' <==> grab = board[0][2]
if ((grab[x][y-2] == '-' or grab[x][y+2] == '-' or grab[x-2][y] == '-' or grab[x+2][y] == '-') and
        (grab[x][y-1] == 'X' or grab[x][y+1] == 'X' or grab[x-1][y] == 'X' or grab[x+1][y] == 'X')):
My main Questions Are:
1- How do I condense my huge elif statement list?
2- What is the correct format/process to store a coordinate to perform checks on surrounding coordinate content?
3- How can I condense my if statement that checks whether the destination coordinate is empty ('-')?
We can make a map from the coordinate labels to board indices, then use it to initialize the grab, i.e.:
field_map = {'A1': (0, 0), 'A2': (0, 1), ...}
if SjumpX in field_map:
    x, y = field_map[SjumpX]
    grab = board[x][y]
I hope this helps.
Assuming you want to grab the board's position corresponding to the SjumpX value, the following is a simple way to do it.
grab = board[ord(SjumpX[0]) - 65][int(SjumpX[1]) - 1]
This converts the first letter of SjumpX (A, B, C, ...) to its ASCII ordinal value (65, 66, 67, ...). Since the offset is 65, subtracting it gives the row index you need (0, 1, 2, ...).
On the other hand, you could go for the direct method suggested by @khachik's comment.
grab = board[{'A':0, 'B':1, 'C':2}[SjumpX[0]]][int(SjumpX[1]) - 1]
This directly maps (A, B, C) to (0, 1, 2), although this statement would grow longer for larger boards (D, E, and so on).
I have two suggestions:
First: Keep an adjacency list or a matrix representation for the board (this depends on your design; I personally like adjacency lists better)
# Adding only some of the values here
adjacency = {'A1': ['A2', 'B1'], 'A2': ['A1', 'A3', 'B2'], 'B1': ['A1', 'B2', 'C1']}
val_map = {'A1': '-', 'B1': 'X'}
grab = SjumpX
# You can iterate over this list to get the neighbouring values
nearby_ele = adjacency[grab]
Second: Store the mapping of (row, col) in a dict: {'A1': (0,0), 'A2': (0,1)}. A dict gives constant-time lookup, so you can get the coordinate directly, making things fast. Use a matrix representation for the values:
coord_map = {'A1': (0, 0), 'A2': (0, 1), 'A3': (0, 2), 'A4': (0, 3),
             'B1': (1, 0), 'B2': (1, 1), 'B3': (1, 2), 'B4': (1, 3),
             'C1': (2, 0), 'C2': (2, 1), 'C3': (2, 2), 'C4': (2, 3),
             }
val_map = [['-', 'X', 'O', 'X'], ['X', 'O', '-', 'O'], ['O', 'X', 'O', 'X']]
grab = coord_map[SjumpX]
nearby_ele = [(grab[0] - 1, grab[1]), (grab[0] + 1, grab[1]),
              (grab[0], grab[1] - 1), (grab[0], grab[1] + 1)]
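For question 3, a minimal sketch building on coord_map and val_map above (the bounds checks are my own addition, since stepping off the board would otherwise raise an IndexError or silently wrap around with negative indices):
def jumpable_destinations(board, pos):
    # Return the empty squares reachable from pos by jumping over an 'X'
    x, y = pos
    rows, cols = len(board), len(board[0])
    destinations = []
    for dx, dy in ((-2, 0), (2, 0), (0, -2), (0, 2)):
        nx, ny = x + dx, y + dy            # landing square
        mx, my = x + dx // 2, y + dy // 2  # square being jumped over
        if (0 <= nx < rows and 0 <= ny < cols
                and board[nx][ny] == '-' and board[mx][my] == 'X'):
            destinations.append((nx, ny))
    return destinations
print(jumpable_destinations(val_map, coord_map['A3']))  # -> [(0, 0)]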
Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1:
>>> most_popular_character("HelloWorld")
'l'
Example 2:
>>> most_popular_character("gggcccbb")
'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] = 1
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
sorted_chars = sorted(char_count) # sort the dictionary
char_count = char_count.keys() # place the dictionary in a list
max_per = 0
for i in range(len(sorted_chars) - 1):
if sorted_chars[i] >= sorted_chars[i+1]:
max_per = sorted_chars[i]
break
return max_per
My function returns 0 right now, and I think the problem is in the last for loop and if statement, but I can't figure out what it is.
If you have any suggestions on how to adjust the code it would be very appreciated!
Your dictionary didn't get off to a good start: you forgot to add 1 to the character count, and instead reset it to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
# NOTE: you might want to convert the entire string to upper or lower case first, depending on the use
# e.g. my_string = my_string.lower()
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] += 1 # add 1 to it
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
# Never underestimate the power of print in debugging
print(char_count)
# max(char_count.values()) will give the highest value
# But there may be more than 1 item with the highest count, so get them all
max_keys = [key for key, value in char_count.items() if value == max(char_count.values())]
# Choose the lowest by sorting them and pick the first item
low_item = sorted(max_keys)[0]
return low_item, max(char_count.values())
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4
def most_popular_character(my_string):
history_l = [l for l in my_string] #each letter in string
char_dict = {} #creating dict
for item in history_l: #for each letter in string
char_dict[item] = history_l.count(item)
return [max(char_dict.values()),min(char_dict.values())]
I didn't understand the last part about the minimum, so I made this function return the maximum frequency and the minimum frequency as a list!
Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
... chars = Counter(my_string)
... return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.
I would like to get the common elements of two given strings such that duplicates are taken care of: if a letter occurs 3 times in the first string and 2 times in the second one, then it has to occur 2 times in the common string. The lengths of the two strings may be different. E.g.
s1 = 'aebcdee'
s2 = 'aaeedfskm'
common = 'aeed'
I cannot use the intersection between two sets. What would be the easiest way to find the result 'common'? Thanks.
Well there are multiple ways in which you can get the desired result. For me the simplest algorithm to get the answer would be:
Define an empty dict. Like d = {}
Iterate through each character of the first string:
if the character is not present in the dictionary, add the character to the dictionary.
else increment the count of character in the dictionary.
Create a variable as common = ""
Iterate through the second string's characters; if the count of a character in the dictionary above is greater than 0, decrement its value and add the character to common
Do whatever you want to do with the common
The complete code for this problem:
s1 = 'aebcdee'
s2 = 'aaeedfskm'
d = {}
for c in s1:
if c in d:
d[c] += 1
else:
d[c] = 1
common = ""
for c in s2:
if c in d and d[c] > 0:
common += c
d[c] -= 1
print(common)
You can use two arrays (length 26).
One array is for the 1st string and 2nd array is for the second string.
Initialize both the arrays to 0.
The 1st array's 0th index denotes the number of "a" in 1st string,
1st index denotes number of "b" in 1st string, similarly till - 25th index denotes number of "z" in 1st string.
Similarly, you can create an array for the second string and store the count of
each alphabet in their corresponding index.
s1 = 'aebcdee'
s2 = 'aaeedfs'
[Image: the two count arrays for the above s1 and s2 values appeared here.]
Now you can run through the 1st String
s1 = 'aebcdee'
for each alphabet find the
K = minimum of ( [ count(alphabet) in Array 1 ], [ count(alphabet) in Array 2 ] )
and print that alphabet K times.
Then make that alphabet's count 0 in both arrays. (If you didn't make it zero, the algorithm might print the same alphabet again when it appears later.)
Complexity - O( length(S1) )
Note - You can also run through the string having a minimum length to reduce the complexity.
In that case Complexity - O( minimum [ length(S1), length(S2) ] )
Please let me know if you want the implementation of this.
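Until then, here is a minimal sketch of the idea (variable names are mine; it assumes lowercase English letters only, as the approach above does):
s1 = 'aebcdee'
s2 = 'aaeedfskm'
# One count array per string: index 0 counts 'a', ..., index 25 counts 'z'
count1 = [0] * 26
count2 = [0] * 26
for ch in s1:
    count1[ord(ch) - ord('a')] += 1
for ch in s2:
    count2[ord(ch) - ord('a')] += 1
common = ''
for ch in s1:
    i = ord(ch) - ord('a')
    k = min(count1[i], count2[i])
    common += ch * k
    # zero both counts so the same alphabet is not printed again later
    count1[i] = count2[i] = 0
print(common)  # -> 'aeed'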
You can use collections.Counter to count each char in the two strings; for each char that exists in both, take the min of the counts and build the new string by joining them.
from collections import Counter, defaultdict
from itertools import zip_longest
s1 = 'aebcdee'
s2 = 'aaeedfskm'
# Create a dictionary whose values are lists, so we can append each char's counts
res = defaultdict(list)
# get count of each char
cnt1 = Counter(s1) # -> {'e': 3, 'a': 1, 'b': 1, 'c': 1, 'd': 1}
cnt2 = Counter(s2) # -> {'a': 2, 'e': 2, 'd': 1, 'f': 1, 's': 1, 'k': 1, 'm': 1}
# To append the counts in one step, we can zip the two Counters; because the
# two strings may have different numbers of unique chars, we use
# 'itertools.zip_longest'
for a, b in zip_longest(cnt1, cnt2):
    # list(zip_longest(cnt1, cnt2)) -> [('a', 'a'), ('e', 'e'), ('b', 'd'),
    #                                   ('c', 'f'), ('d', 's'), (None, 'k'),
    #                                   (None, 'm')]
    # zip_longest pads the shorter side with None, so check that 'a' and 'b'
    # are not None before appending
    if a: res[a].append(cnt1[a])
    if b: res[b].append(cnt2[b])
# res -> {'a': [1, 2], 'e': [3, 2], 'b': [1], 'd': [1, 1], 'c': [1], 'f': [1], 's': [1], 'k': [1], 'm': [1]}
# If the 'list' for a char has more than one entry, the char occurs in both
# strings, so repeat it min(counts) times in the result.
out = ''.join(k * min(v) for k, v in res.items() if len(v) > 1)
print(out)
# aeed
We can use this approach for multiple strings, like three strings.
s1 = 'aebcdee'
s2 = 'aaeedfskm'
s3 = 'aaeeezzxx'
res = defaultdict(list)
cnt1 = Counter(s1)
cnt2 = Counter(s2)
cnt3 = Counter(s3)
for a,b,c in zip_longest(cnt1 , cnt2, cnt3):
if a: res[a].append(cnt1[a])
if b: res[b].append(cnt2[b])
if c: res[c].append(cnt3[c])
out = ''.join(k* min(v) for k,v in res.items() if len(v)>1)
print(out)
# aeed
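As an aside, Counter also supports multiset intersection directly with the & operator. Note this is not a set intersection (duplicates survive), so it may or may not fit the constraint in the question; with it, the whole thing collapses to one line:
from collections import Counter
s1 = 'aebcdee'
s2 = 'aaeedfskm'
# Counter & Counter keeps each char with the minimum of its two counts
print(''.join((Counter(s1) & Counter(s2)).elements()))  # -> 'aeed'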
s1="ckglter"
s2="ancjkle"
final_list=[]
if(len(s1)<len(s2)):
for i in s1:
if(i in s2):
final_list.append(i)
else:
for i in s2:
if(i in s1):
final_list.append(i)
print(final_list)
You can also do it like this: just iterate through the shorter string with a for loop and append each common character to an empty list.
Background:
I am attempting to determine the difference between technical drawing dimensions and actual measured dimensions. These dimensions are stored within two dictionaries: actual_data and drawing_data. Because the actual measurements can be taken in many ways, they are stored as lists to accommodate the following cases:
1x1 list. This represents a singular dimension with no variation or tolerance specified.
1x2 list. This represents a dimension that has an upper and lower limit or tolerance.
Nx1 list. This represents a list of singular dimensions taken from one surface of a large component.
Nx2 list. This represents a list of dimensions that each have an upper and lower limit or tolerance.
To avoid too many conditionals, I break up all matrices into lists and assign each new list (either 1x1 or 1x2) a new key.
My Script:
import numpy as np
drawing_data = {
'A': [394.60],
'B': [629.85, 629.92],
'C': [759.95, 760.00],
'D': [839.95, 840.00],
'E': [1779.50, 1780.50]
}
actual_data = {
'A': [390.00],
'B': [629.88, 629.90],
'C': [760.17, 760.25],
'D': [[840.12, 840.18], [840.04, 840.06], [840.07, 840.07]],
'E': [1780.00, 1780.00]
}
As you can see, Dimension D has an expected measurement ranging between 839.95mm and 840.00mm. However, due to the size of the component, it is measured in three places (hence, the list of lists).
def dimensional_deviation(drawing, actual):
# If the actual data is of the type Nx1 or Nx2, it is split and converted into new dimensions of the
# type 1x1 or 1x2 - new dictionary keys are created to accommodate the new dimensions, and the
# original dictionary entries are deleted.
#---------------------------------------------------------------------------------------------------
new_dict_drawing, new_dict_actual, keys_to_remove = {}, {}, []
for dimension in drawing:
if type(actual.get(dimension)[0]) is list:
keys_to_remove.append(dimension) # Create a list of unnecessary dimensions
for i, sublist in enumerate(actual.get(dimension)):
new_dict_drawing[f'{dimension}{i + 1}'] = drawing.get(dimension) # Create new dimension
new_dict_actual[f'{dimension}{i + 1}'] = sublist # Create new dimension
for dimension in keys_to_remove: # Remove all unnecessary dimensions from the original dicts
drawing.pop(dimension, None)
actual.pop(dimension, None)
# Merge dictionaries:
drawing = {**drawing, **new_dict_drawing}
actual = {**actual, **new_dict_actual}
#---------------------------------------------------------------------------------------------------
# Determine the deviation between the drawing and actual dimensions. The average of the upper and
# lower bounds is used to simplify each case.
#---------------------------------------------------------------------------------------------------
inspection_results = {}
for dimension in drawing:
drawing_value, actual_value = drawing.get(dimension), actual.get(dimension)
drawing_ave, actual_ave = np.mean(drawing_value), np.mean(actual_value)
deviation = drawing_ave - actual_ave
# Create new dictionary of the repair requirements:
#---------------------------------------------------------------------------------------------------
inspection_results[f'{dimension}'] = round(deviation, 3)
return inspection_results
My Problem:
When I call the above for the first time, I get the desired output. Dimension D is broken up as expected and the deviation is calculated. However, when I call the same function a second time, everything regarding Dimension D is completely neglected as though the key did not exist:
print('First function call:')
print(dimensional_deviation(drawing=drawing_data, actual=actual_data))
print('---------------------------------------------------------------------------------------')
print('Second function call:')
print(dimensional_deviation(drawing=drawing_data, actual=actual_data))
print('---------------------------------------------------------------------------------------')
Resulting in:
First function call:
{'A': 4.6, 'B': -0.005, 'C': -0.235, 'E': 0.0, 'D1': -0.175, 'D2': -0.075, 'D3': -0.095}
---------------------------------------------------------------------------------------
Second function call:
{'A': 4.6, 'B': -0.005, 'C': -0.235, 'E': 0.0}
---------------------------------------------------------------------------------------
I believe I am overwriting my drawing_data and actual_data somewhere, but I cannot find the issue. Additionally, this is one of my first times using dictionaries and I suspect that my key creation and deletion may not be best practice.
In my comments you will see Create a list of unnecessary dimensions - an example of this would be Dimension D, as it is newly accounted for in D1, D2 and D3.
Could somebody please explain to me why I get this result on each subsequent function call after the first?
The issue was that the original dictionaries were being modified (see above comments):
import numpy as np
drawing_data = {
'A': [394.60],
'B': [629.85, 629.92],
'C': [759.95, 760.00],
'D': [839.95, 840.00],
'E': [1779.50, 1780.50]
}
actual_data = {
'A': [390.00],
'B': [629.88, 629.90],
'C': [760.17, 760.25],
'D': [[840.12, 840.18], [840.04, 840.06], [840.07, 840.07]],
'E': [1780.00, 1780.00]
}
#-------------------------------------------------------------------------------------------------------
# The 'dimensional deviation' function takes the drawing data and actual data as arguments, returning a
# dictionary of dimensions and whether they require rectification or not (based on the drawing data).
#-------------------------------------------------------------------------------------------------------
def dimensional_deviation(drawing, actual):
temp_dict_drawing = {}
for key in drawing:
temp_dict_drawing[key] = drawing.get(key)
temp_dict_actual = {}
for key in drawing:
temp_dict_actual[key] = actual.get(key)
# If the actual data is of the type Nx1 or Nx2, it is split and converted into new dimensions of the
# type 1x1 or 1x2 - new dictionary keys are created to accommodate the new dimensions, and the
# original dictionary entries are deleted.
#---------------------------------------------------------------------------------------------------
new_dict_drawing, new_dict_actual, keys_to_remove = {}, {}, []
for dimension in temp_dict_drawing:
if type(temp_dict_actual.get(dimension)[0]) is list:
keys_to_remove.append(dimension) # Create a list of unnecessary dimensions
for i, sublist in enumerate(temp_dict_actual.get(dimension)):
new_dict_drawing[f'{dimension}{i + 1}'] = temp_dict_drawing.get(dimension) # Create new dimension
new_dict_actual[f'{dimension}{i + 1}'] = sublist # Create new dimension
for dimension in keys_to_remove: # Remove all unnecessary dimensions from the original dicts
temp_dict_drawing.pop(dimension)
temp_dict_actual.pop(dimension)
# Merge dictionaries:
drawing_new = {**temp_dict_drawing, **new_dict_drawing}
actual_new = {**temp_dict_actual, **new_dict_actual}
#---------------------------------------------------------------------------------------------------
# Determine the deviation between the drawing and actual dimensions. The average of the upper and
# lower bounds is used to simplify each case.
#---------------------------------------------------------------------------------------------------
inspection_results = {}
for dimension in drawing_new:
drawing_value, actual_value = drawing_new.get(dimension), actual_new.get(dimension) # Fetch the data
drawing_ave, actual_ave = np.mean(drawing_value), np.mean(actual_value) # Calculate averages
deviation = drawing_ave - actual_ave
# Create new dictionary of the repair requirements:
#---------------------------------------------------------------------------------------------------
inspection_results[f'{dimension}'] = round(deviation, 3)
return inspection_results
Alternatively, copy.deepcopy() also yielded the correct results:
Deep copy of a dict in python
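For reference, a minimal sketch of the deepcopy alternative (only the top of the function changes; the rest of the body stays as in the question):
import copy

def dimensional_deviation(drawing, actual):
    # Deep-copy the inputs so the caller's dictionaries, including the
    # nested lists inside them, are never mutated
    drawing = copy.deepcopy(drawing)
    actual = copy.deepcopy(actual)
    # ... original function body unchanged ...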
I have a list like this.
a = [
['a1', 'b2', 'c3'],
['c3', 'd4', 'a1'],
['b2', 'a1', 'e5'],
['d4', 'a1', 'b2'],
['c3', 'b2', 'a1']
]
I'll be given x (e.g. 'a1'). I have to find the co-occurrence of a1 with every other element, sort it, and retrieve the top n (e.g. top 2).
my answer should be
[
{'product_id': 'b2', 'count': 4},
{'product_id': 'c3', 'count': 3},
]
my current code looks like this:
import itertools

def compute(x):
    set_a = list(set(itertools.chain(*a)))
    count_dict = []
    for i in range(len(set_a)):
        if x == set_a[i]:
            continue
        count = 0
        for j in range(len(a)):
            # note: 'if x and set_a[i] in a[j]' would only test set_a[i]'s
            # membership; both items must be checked explicitly
            if x in a[j] and set_a[i] in a[j]:
                count += 1
        if count > 0:
            count_dict.append({'product_id': set_a[i], 'count': count})
    count_dict = sorted(count_dict, key=lambda k: k['count'], reverse=True)[:2]
    return count_dict
And it works beautifully for smaller inputs. However, my actual input has 70000 unique items instead of 5 (a to e) and 1.3 million rows instead of 5, and hence m×n becomes very expensive. Is there a faster way to do this?
"Faster" is a very general term. Do you need a shorter total processing time, or shorter response time for a request? Is this for only one request, or do you want a system that handles repeated inputs?
If what you need is the fastest response time for repeated inputs, then convert this entire list of lists into a graph, with each element as a node, and the edge weight being the number of co-occurrences between the two elements. You make a single pass over the data to build the graph. For each node, sort the edge list by weight. From there, each request is a simple lookup: return the weight of the node's top edge, which is one hash lookup and two direct-access operations (base address + offset).
UPDATE after OP's response
"fastest response" seals the algorithm, then. What you want to have is a simple dict, keyed by each node. The value of each node is a sorted list of related elements and their counts.
A graph package (say, networkx) will give you a good entry to this, but may not retain a node's edges in fast form, nor sorted by weight. Instead, pre-process your data set. For each row, you have a list of related elements. Let's just look at the processing for some row in the midst of the data set; call the elements a5, b2, z1, and the dict d. Assume that a5 and b2 are already in your dict.
Using `itertools`, iterate through the six ordered pairs.
(a5, b2):
d[a5][b2] += 1
(a5, z1):
d[a5][z1] = 1 (creates a new entry under a5)
(b2, a5):
d[b2][a5] += 1
(b2, z1):
d[b2][z1] = 1 (creates a new entry under b2)
(z1, a5):
d[z1] = {} (creates a new z1 entry in d)
d[z1][a5] = 1 (creates a new entry under z1)
(z1, b2):
d[z1][b2] = 1 (creates a new entry under z1)
You'll want to use defaultdict to save yourself the hassle of detecting and initializing new entries.
With all of that handled, you now want to sort each of those sub-dicts into order based on the sub-level values. This leaves you with an ordered sequence for each element. When you need to access the top n connected elements, you go straight to the dict and extract them:
top = d[elem][:n]
Can you finish the coding from there?
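Not part of the answer above, but a minimal sketch of how that plan might look (it assumes the list is named a, as in the question):
from collections import defaultdict
from itertools import permutations

def build_index(rows):
    d = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # count every ordered pair of co-occurring elements in the row
        for u, v in permutations(row, 2):
            d[u][v] += 1
    # turn each sub-dict into a list of (element, count), largest count first
    return {u: sorted(nbrs.items(), key=lambda kv: kv[1], reverse=True)
            for u, nbrs in d.items()}

index = build_index(a)
print(index['a1'][:2])  # -> [('b2', 4), ('c3', 3)]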
As mentioned by @Prune, you haven't said whether you want shorter processing time or shorter response time, so I will explain two approaches to this problem.
The optimised code approach (for less processing time)
import itertools
from heapq import nlargest
from operator import itemgetter

# say we have K THREADS
def compute(x, top_n=2):
    # first find the unique items and save them somewhere easily accessible
    set_a = list(set(itertools.chain(*a)))
    # find which of your ROWS x exists in
    selected_rows = []
    for i, row in enumerate(a):  # this whole loop can be parallelized
        if x in row:
            selected_rows.append(i)  # append index of the row to selected_rows
    # time complexity till now is still O(M*N), but each row can be evaluated
    # independently; if the M rows are run in parallel on K threads,
    # this part takes (M/K)*N
    count_dict = []
    # now the same thing you did earlier, but the second loop looks at fewer rows
    for val in set_a:
        if val == x:
            continue
        count = 0
        for ri in selected_rows:  # this whole part can be parallelized as well
            if val in a[ri]:
                count += 1
        count_dict.append({'product_id': val, 'count': count})
    # if the selected rows size in the worst case is M itself, and our unique
    # values are U, the complexity will be (U/K)*(M/K)*N
    res = nlargest(top_n, count_dict, key=itemgetter('count'))
    return res
Let's calculate the time complexity here.
If we have K threads then
O((M/K)*N) + O((U/K)*(M/K)*N)
where
M---> Total rows
N---> Total Columns
U---> Unique Values
K---> number of threads
Graph approach as suggested by Prune
# other approach, adding to Prune's suggestion
import itertools

big_dictionary = {}
set_a = list(set(itertools.chain(*a)))
for x in set_a:
    big_dictionary[x] = []
    for y in set_a:
        if x == y:
            continue
        count = 0
        for arr in a:
            if (x in arr) and (y in arr):
                count += 1
        big_dictionary[x].append((y, count))
for x in big_dictionary:
    big_dictionary[x] = sorted(big_dictionary[x], key=lambda v: v[1], reverse=True)
Let's calculate the time complexity for this one. The one-time pre-processing cost will be:
O(U*U*M*N)
where
M---> Total rows
N---> Total Columns
U---> Unique Values
But once this big_dictionary has been calculated, it takes just one step to get your top N values.
For example if we want to get top3 values for a1
result=big_dictionary['a1'][:3]
I followed the defaultdict approach as suggested by @Prune above. Here's the final code:
import numpy as np
from collections import defaultdict
def recommender(input_item, b_list, n):
    count = []
    top_items = []
    for x in b.keys():
        lst_2 = b[x]
        # number of rows shared between the input item and item x
        common_transactions = len(set(b_list) & set(lst_2))
        count.append(common_transactions)
    # take the top n+1 indices by count (descending) and drop the first,
    # which is the input item itself
    top_ids = list((np.argsort(count)[:-n-2:-1])[1:])
    top_values_counts = [count[i] for i in top_ids]
    key_list = list(b.keys())
    for i, v in enumerate(top_ids):
        item_id = key_list[v]
        top_items.append({item_id: top_values_counts[i]})
    print(top_items)
    return top_items
a = [
['a1', 'b2', 'c3'],
['c3', 'd4', 'a1'],
['b2', 'a1', 'e5'],
['d4', 'a1', 'b2'],
['c3', 'b2', 'a1']
]
b = defaultdict(list)
for i, s in enumerate(a):
for key in s :
b[key].append(i)
input_item = str(input("Enter the item_id: "))
n = int(input("How many values to be retrieved? (eg: top 5, top 2, etc.): "))
top_items = recommender(input_item, b[input_item], n)
Here's the output for top 3 for 'a1':
[{'b2': 4}, {'c3': 3}, {'d4': 2}]
Thanks!!!
I have a dataset:
from pandas import DataFrame
Cars = {'1': [140.8731392,142.3481116,146.7621232,144.9406286,144.8725356,145.3976902],
'2': [147.6279494,141.4455089,147.3953295,144.6467237,146.406241,147.0695877],
'3': [140.7164976,143.4675429,145.9967808,141.7831729,144.4806287,147.7805723],
'4': [149.359966,147.0236556,146.2931072,148.478762,149.565317,143.9501002],
'5': [145.9216418,143.3376241,145.2974838,148.80916,143.7103238,145.4369799],
'6': [146.2192954,149.0914385,146.3690445,143.3845218,140.1431644,149.6484708]
}
df = DataFrame(Cars,columns= ['1', '2', '3', '4', '5', '6'])
print (df)
If I try to identify which columns contain outliers, using:
import numpy as np

outlier_numbers = []
explanations = []
for col in df.columns:
    quartile_01, quartile_03 = np.percentile(df[col].dropna(), [25, 75])
    iqr = quartile_03 - quartile_01
    lower_bound = quartile_01 - (1.5 * iqr)
    upper_bound = quartile_03 + (1.5 * iqr)
    outliers_number = ((df[col] < lower_bound) | (df[col] > upper_bound)).sum()
    explanation = f"The lower and upper bound of the range for '{col}' respectively is: {lower_bound} and {upper_bound}"
    if outliers_number > 0:
        outlier_numbers.append(outliers_number)
        explanations.append(explanation)
a_dict = {key: value for key, value in zip(outlier_numbers, explanations)}
values_checking = len(outlier_numbers) == 0
and then print outliers_number for each column, I will get [0, 1, 0, 0, 1, 0], meaning that cols 2 and 5 contain outliers. But if I check the zipped a_dict, I get: {1: "The lower and upper bound of the range for '5' respectively is: 141.5670700125 and 148.3405201125"}, which doesn't make sense to me. Why did only one element, not two, get zipped?
The source of your problem is that outlier_numbers contains [1, 1] (for both columns 2 and 5 the number of outliers is 1). Then, when you create the dictionary, the first pair from zip gets key 1 and the first explanation; the second pair also has key == 1, so the (only) existing item is overwritten with the second explanation.
Maybe the keys should be column names, not the numbers of outliers?
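A minimal sketch of that suggestion (variable names are mine): key the dictionary by column name, which is unique, so nothing gets overwritten:
outlier_info = {}
for col in df.columns:
    q1, q3 = np.percentile(df[col].dropna(), [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = ((df[col] < lower) | (df[col] > upper)).sum()
    if n_outliers > 0:
        # each column keeps its own count and explanation
        outlier_info[col] = (n_outliers, f"The lower and upper bound of the range for '{col}' respectively is: {lower} and {upper}")
With this, outlier_info['2'] and outlier_info['5'] each survive, instead of colliding on the shared key 1.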