N length list of nearby ranked elements, excluding matching id - python

I have two data sets:
A list of ids, categories, and sorted counts of id occurrence, grouped_df.
A list of ids, categories, and matching names, label_df.
I wish to match the category for a given id and then pull n matching names, excluding the given id, from a ranked list using only that category. I want ids that are above and below the matched id but not that id itself.
I have something that works for even numbers but not for odds.
def rolling_match(id, n_matches=3):
    roll_length = (n_matches//2) # number of positions to move from selected id.
    index_length = (n_matches+1) #length to include from the rolled list.
    label = label_df.loc[id][0] # category label
    arr = grouped_df.loc[label] # subset category label
    idx = len(grouped_df.loc[label][(id):])+roll_length #position of first element in list
    indices = np.delete(np.roll(arr.index, idx)[:index_length], roll_length)
    # pulls a rolled list around chosen id and drops that element.
    return label_df.loc[indices]
Thanks for any help!

To make your solution work for odd numbers, you need to modify the roll_length and index_length calculations to handle the case where n_matches is odd.
def rolling_match(id, n_matches=3):
    roll_length = (n_matches-1) // 2 # number of positions to move from selected id.
    index_length = n_matches # length to include from the rolled list.
    label = label_df.loc[id][0] # category label
    arr = grouped_df.loc[label] # subset category label
    idx = len(grouped_df.loc[label][(id):]) + roll_length # position of first element in list
    indices = np.delete(np.roll(arr.index, idx)[:index_length], roll_length)
    # pulls a rolled list around chosen id and drops that element.
    return label_df.loc[indices]
In this modified code, roll_length is calculated as (n_matches-1) // 2 to handle odd numbers, and index_length is set to n_matches to include all n_matches names in the returned list.

I got my version working; the problem is related to Ramesh's point. What confused me was that I needed to subtract 1 before the floor division and also add 1 to get the right index length.
def rolling_match(id, n_matches=3):
    roll_length = (n_matches-1)//2 # number of positions to move from selected id.
    index_length = (n_matches+1) #length to include from the rolled list.
    label = label_df.loc[id][0] # category label
    arr = grouped_df.loc[label] # subset category label
    idx = len(grouped_df.loc[label][(id):])+roll_length #position of first element in list
    indices = np.delete(np.roll(arr.index, idx)[:index_length], roll_length)
    # pulls a rolled list around chosen id and drops that element.
    return label_df.loc[indices]
I am interested in recommendations for improving or refactoring this.
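One possible refactoring, sketched under assumptions rather than taken from the thread: work on a plain list of ids that is already ranked within one category, and pick the neighbours by offset instead of rolling and deleting. The name nearby_ids and the split of n_matches // 2 neighbours before the id and the rest after it are hypothetical choices; the wrap-around at the ends mimics what np.roll does. With pandas, the ranked list could plausibly come from grouped_df.loc[label].index.tolist().

```python
def nearby_ids(ranked_ids, target, n_matches=3):
    """Return n_matches ids ranked around target, excluding target itself."""
    size = len(ranked_ids)
    if n_matches > size - 1:
        raise ValueError("not enough other ids in this category")
    pos = ranked_ids.index(target)
    before = n_matches // 2          # neighbours ranked ahead of target
    after = n_matches - before       # neighbours ranked behind target
    offsets = list(range(-before, 0)) + list(range(1, after + 1))
    # The modulo wraps around the ends of the list, like np.roll.
    return [ranked_ids[(pos + off) % size] for off in offsets]


ranked = ["a", "b", "c", "d", "e", "f"]
print(nearby_ids(ranked, "c", 3))  # ['b', 'd', 'e']
print(nearby_ids(ranked, "c", 4))  # ['a', 'b', 'd', 'e']
print(nearby_ids(ranked, "a", 2))  # ['f', 'b']
```

Because the target's offset (0) is never generated, there is no element to delete afterwards, and odd and even n_matches go through the same path.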

Related

How would you build in criteria when using random to assign people to groups

I would like to assign people randomly to groups. I have used the following code from an existing script, but I would like to add a criterion where "Kimani" is always no. 2 in a group
'''
import random
participants=["Alex","Elsie","Elise","Kimani","Ryan","Chris","Paul","Chris1","Pau2l",
              "Chris3","Paul3"]
group=1
membersInGroup=5
for participant in participants[:]: # only modification
    if membersInGroup==5:
        print("Group {} consists of;".format(group))
        membersInGroup=0
        group+=1
    person=random.choice(participants)
    print(person)
    membersInGroup+=1
    participants.remove(str(person))
'''
You could do something like this:
import math
Kimani_group = math.ceil(random.randint(1,len(participants)) / 5) # round up to the nearest random selection of a group
participants.remove(str("Kimani")) # remove Kimani as their group has already been selected, just need to insert them
for count in range(len(participants) + 1): # add +1 to participants as Kimani was part of the count but removed; changed count to the index of the loop
    if membersInGroup==5:
        print("Group {} consists of;".format(group))
        membersInGroup=0
        group+=1
    if count % 5 == 1 and math.ceil((count + 1) / 5) == Kimani_group: # check if the second position in the group and that the group is the preselected group
        print("Kimani")
        membersInGroup+=1
        continue # skip the rest of the code in this iteration and continue to the next iteration
    person=random.choice(participants)
    print(person)
    membersInGroup+=1
    participants.remove(str(person))
This makes Kimani number 2 in a group they join.
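An alternative sketch, offered as an assumption-laden variant rather than the method used above: shuffle everyone else, split them into groups, and insert the fixed person at index 1 of a randomly chosen group. With 11 participants and groups of 5 the last group has a single member and therefore no second slot, so the sketch only picks among groups with at least two members. The function name make_groups and its parameters are hypothetical, and names are assumed to be unique.

```python
import random


def make_groups(participants, fixed_name, group_size=5, rng=random):
    """Split participants into random groups with fixed_name second in one group."""
    if participants.count(fixed_name) != 1:
        raise ValueError("fixed_name must appear exactly once")
    others = [p for p in participants if p != fixed_name]
    rng.shuffle(others)

    total = len(participants)
    # Size of each group when the full list is cut into chunks of group_size.
    sizes = [min(group_size, total - start) for start in range(0, total, group_size)]
    # Only groups with at least two members have a "number 2" position.
    eligible = [i for i, size in enumerate(sizes) if size >= 2]
    if not eligible:
        raise ValueError("no group is large enough to have a second member")
    chosen = rng.choice(eligible)

    groups = []
    remaining = iter(others)
    for i, size in enumerate(sizes):
        take = size - 1 if i == chosen else size  # leave room for fixed_name
        members = [next(remaining) for _ in range(take)]
        if i == chosen:
            members.insert(1, fixed_name)
        groups.append(members)
    return groups


people = ["Alex", "Elsie", "Elise", "Kimani", "Ryan", "Chris", "Paul",
          "Chris1", "Pau2l", "Chris3", "Paul3"]
for number, members in enumerate(make_groups(people, "Kimani"), start=1):
    print("Group {} consists of: {}".format(number, ", ".join(members)))
```

Passing a seeded random.Random instance as rng makes the grouping reproducible, which helps when checking the result.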

Extracting multiple data from a single list

I'm working on a text file that contains several kinds of information. I converted it into a list in Python and right now I'm trying to separate the different data into different lists. The data is presented as follows:
CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4 and then repeat, an example would be:
P03133 Auxiliar helper un 203.02 417.54 437.22 675.80
My approach to it until now has been:
Creating lists to store each piece of information:
codes = []
description = []
unity = []
cost = []
Through loops finding a code, based on the code's structure, and using the code's index as base to find the remaining values.
Finding a code's easy, it's a distinct type of information amongst the other data.
For the remaining values I made a loop to find the next value that is numeric after a code. That way I can delimit the rest of the indexes:
The unity would be the code's index + index until isnumeric - 1, hence it's the first information prior to the first numeric value in each line.
The cost would be the code's index + index until isnumeric + 2, the third value is the only one I need to store.
The description is a little harder, the number of elements that compose it varies across the list. So I used slicing starting at code's index + 1 and ending at index until isnumeric - 2.
for i, carc in enumerate(txtl):
    if carc[0] == "P" and carc[1].isnumeric():
        codes.append(carc)
        j = 0
        while not txtl[i+j].isnumeric():
            j = j + 1
        description.append(" ".join(txtl[i+1:i+j-2]))
        unity.append(txtl[i+j-1])
        cost.append(txtl[i+j])
I'm facing some problems with this approach: although there will always be more elements in the list after a code, I'm getting this error:
while not txtl[i+j].isnumeric():
txtl[i+j] list index out of range.
Any solution that debugs my code, or a new approach to the problem, is welcome.
OBS: I will also have to do this for a very similar data source, where the code is just a sequence of 7 digits and thus harder to find among the other data. Any solution that covers this case is also appreciated!
A slight addition to your code should resolve this:
while i+j < len(txtl) and not txtl[i+j].isnumeric():
    j += 1
The and operator short-circuits: once i+j is out of bounds the first condition is False, so txtl[i+j] is never evaluated.
Also, please use a list of dict items instead of 4 different lists, e.g.:
thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})
In this way you always have the correct values together in the list, and you don't need to keep track of where you are with indexes or other black magic...
EDIT after comment interactions:
Ok, so in this case you split the line you are processing on the space character, and then process the words in the line.
from pprint import pprint # just for pretty printing
textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []
def handle_line(textl: str):
    description = ''
    unity = None
    values = []
    for word in textl.split()[1:]:
        # it splits on space characters by default
        # you can ignore the first item in the list, as this will always be the code
        # str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
        if not word.replace(',', '').replace('.', '').isnumeric():
            if len(description) == 0:
                description = word
            else:
                description = f'{description} {word}' # I like f-strings
        elif not unity:
            # if unity is still None, that means it has not been set yet
            unity = word
        else:
            values.append(word)
    return {'code': textl.split()[0], 'description': description, 'unity': unity, 'values': values}
the_list.append(handle_line(textl))
pprint(the_list)
str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
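In the sample line the unit (un) is itself non-numeric, so a numeric test alone cannot tell it apart from the description. A minimal sketch that slices by position instead, assuming each record sits on its own line, always ends with exactly four values as in the layout CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4, and uses a dot as the decimal separator. The name parse_record and the extra cost key (the third value, which the question says is the one to keep) are hypothetical.

```python
def parse_record(line):
    """Split one 'CODE DESCRIPTION... UNIT V1 V2 V3 V4' line into a dict."""
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError(f"expected a code, a unit and four values, got: {line!r}")
    values = [float(v) for v in tokens[-4:]]   # the last four tokens are the values
    return {
        "code": tokens[0],
        "description": " ".join(tokens[1:-5]),  # everything between code and unit
        "unity": tokens[-5],                    # the token just before the values
        "values": values,
        "cost": values[2],                      # third value
    }


record = parse_record("P03133 Auxiliar helper un 203.02 417.54 437.22 675.80")
print(record["description"], "|", record["unity"], "|", record["cost"])
# Auxiliar helper | un | 437.22
```

Since nothing here depends on what the code looks like, the same slicing would also apply to records whose code is a plain 7-digit number, provided the line layout is the same.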

Execute an action based on how many times an element appears in a list

I have a list :
CASE 1 : group_member = ['MEU1', 'MEU1','MEU2', 'MEU1','MEU1','MEU2','MEU1','MEU2','MEU1','MEU3']
CASE 2 : group_member = ['MEU1','MEU2','MEU3','None','None']
CASE 3 : group_member = ['MEU1','MEU2','MEU3','MEU1','CEU1']
What I'm trying to do is insert a value into an SQL table if at least 70% of the list has the same value, or send mail to some users if the share is below 70%.
For the list I have above it will be the first case because EU1 value is bigger than 70%.
I tried something like this :
from collections import Counter
freqDict = Counter(group_member)
size = len(group_member)
if len(group_member) > 0 :
    for (key,val) in freqDict.items():
        if (val >= (7*size/10)):
            print(">= than70%")
            insert_into_table(group)
        elif (val <(7*size/10)) :
            print("under 70%")
            send_mail_notification(group)
The problem with this is that it checks every key/value pair, so even if one value is >= 70% it still enters the elif for the other values and sends mail multiple times for the same group, which is unacceptable. I haven't found a solution for this yet.
How can I avoid this? For the list above it should only insert the value into the table and move on to the next list; for the second list it should send a mail notification exactly once, because there is no element >= 70%.
I need to implement the following cases:
If >=70% is the same value (ex MEU1 in CASE1) then insert into a table.
If >=70% is in the same unit (M) but not the same tribe, as in CASE 3 where 4 of 5 elements start with M and so belong to the same unit, then send a notification.
I believe you should check whether at least one item has a share greater than or equal to 70%, and send mail only if there is no such item. This means the decision to send mail has to come after you have gone through the whole list.
from collections import Counter
freqDict = Counter(group_member)
size = len(group_member)
foundBigVal = False
if len(group_member) > 0 :
    for (key,val) in freqDict.items():
        if (val >= (7*size/10)):
            print(">= than70%")
            insert_into_table(group)
            foundBigVal = True
            break #no need to check the list further since only one value can reach 70%
    if not foundBigVal:
        #no value reached 70%, so we enter this part exactly once
        print("under 70%")
        send_mail_notification(group)
I put the if outside the loop in order to call send_mail_notification once but check each element inside the list.
You could try something like this:
from collections import Counter
a = ['EU1', 'EU1','EU2', 'EU1','EU1','EU2','EU1','EU2','EU1','EU3']
max_percentage = Counter(a).most_common(1)[0][1]/len(a)*100
most_common(1)[0] returns a tuple in the form of (most_common_element, its_count). We just extract the count and divide it by the total length of the list to find the percentage.
Although it seems that the percentage of EU1 in the above list is 60%, not 70, as you mentioned.
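Both rules from the question (same value, then same unit) can be combined into a single decision that is made once per list. The following is only a sketch under assumptions: the unit is taken to be the first character of each entry, as the M in MEU1 suggests, and the function name decide_action and its return labels are hypothetical. The caller would map "insert" to insert_into_table and the two notify labels to send_mail_notification.

```python
from collections import Counter


def decide_action(group_member, threshold_pct=70):
    """Return one action label for the whole list."""
    size = len(group_member)
    if size == 0:
        return "skip"

    def reaches_threshold(count):
        # Integer comparison avoids floating-point rounding at the boundary.
        return count * 100 >= threshold_pct * size

    top_value_count = Counter(group_member).most_common(1)[0][1]
    if reaches_threshold(top_value_count):
        return "insert"

    # Assumption: the unit is the first character of each entry.
    top_unit_count = Counter(m[:1] for m in group_member).most_common(1)[0][1]
    if reaches_threshold(top_unit_count):
        return "notify_same_unit"
    return "notify"


case1 = ['MEU1', 'MEU1', 'MEU2', 'MEU1', 'MEU1', 'MEU2', 'MEU1', 'MEU2', 'MEU1', 'MEU3']
case2 = ['MEU1', 'MEU2', 'MEU3', 'None', 'None']
case3 = ['MEU1', 'MEU2', 'MEU3', 'MEU1', 'CEU1']
print(decide_action(case1))  # notify_same_unit (MEU1 is only 60% of the list)
print(decide_action(case2))  # notify
print(decide_action(case3))  # notify_same_unit
```

Because the function returns a single label, each list triggers at most one insert or one mail.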

Iterating over a list of lists with a conditional in Python

I have three input lists as follows:
fill_rgn_pts = [[0,1,2,3,4,5,6,7],[0,1,2,3,4,5],[0,1,2,3],[0,1,2,3]]
fill_rgn = [[region1],[region2],[region3],[region4]]
rooms = [[room1],[room2],[room3],[room4],[room5],[room6]]
I am trying to pair fill_rgn and rooms based on whether all fill_rgn_pts are contained within the room. Here's what I have tried so far:
valid_rooms, valid_fill_rgn, invalid_rooms = [], [], []
for i in rooms:
    for list, region in zip(fill_rgn_pts, fill_rgn):
        if all(i.IsPointInRoom(j) == True for j in list):
            valid_rooms.append(i)
            valid_fill_rgn.append(region)
        else:
            invalid_rooms.append(i)
OUT = valid_fill_rgn, valid_rooms, invalid_rooms
What I am getting back from this are three lists:
valid_fill_rgn = [[region1],[region2],[region3],[region4]]
valid_rooms = [[room1],[room2],[room3],[room4]]
invalid_rooms = [[room1],[room1],[room1],[room2],[room2],[room2],[room3],[room3],[room3],[room4],[room4],[room4],[room4],[room5],[room5],[room5],[room6],[room6],[room6],[room6]]
The first two lists look exactly how I want them since they paired up a region and room how I expected it. The third list however returns too many items. I am getting three extra values for each room, which makes me think that I am iterating over something that I shouldn't. Ideas?
If I understand your problem correctly, you can fix this by changing invalid_rooms and valid_rooms to sets, which do not allow duplicates.
Your loop adds the room again for every point-list/region pair it checks, which is why you keep getting duplicates in your invalid_rooms list. Once a room is marked as valid or invalid, you don't need to add it again.
Further, each room seems to be wrapped in a single-item list such as [room1]. Lists cannot be stored in a set, so it would be better just to have the individual rooms:
rooms = [room1,room2,room3,room4,room5,room6]
Why not compute the invalid rooms last, after finding the valid rooms?
fill_rgn_pts = [[0,1,2,3,4,5,6,7],[0,1,2,3,4,5],[0,1,2,3],[0,1,2,3]]
fill_rgn = [[region1],[region2],[region3],[region4]]
rooms = [[room1],[room2],[room3],[room4],[room5],[room6]]
valid_rooms, valid_fill_rgn = [], []
for i in rooms:
    for list, region in zip(fill_rgn_pts, fill_rgn):
        if all(i.IsPointInRoom(j) == True for j in list):
            valid_rooms.append(i)
            valid_fill_rgn.append(region)
invalid_rooms = [room for room in rooms if room not in valid_rooms]
OUT = valid_fill_rgn, valid_rooms, invalid_rooms
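A self-contained sketch of the same idea, so it can be run without the host application. The class BoxRoom is a hypothetical stand-in for the real room objects: only the IsPointInRoom method is assumed, and points are reduced to single numbers for illustration. The function name pair_regions_with_rooms is likewise made up here.

```python
class BoxRoom:
    """Hypothetical stand-in: a room covering the interval [lo, hi]."""

    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi

    def IsPointInRoom(self, point):
        return self.lo <= point <= self.hi


def pair_regions_with_rooms(rooms, fill_rgn_pts, fill_rgn):
    """Pair each region with every room that contains all of its points."""
    pairs = []
    matched = set()  # positions of rooms that matched at least one region
    for idx, room in enumerate(rooms):
        for pts, region in zip(fill_rgn_pts, fill_rgn):
            if all(room.IsPointInRoom(p) for p in pts):
                pairs.append((region, room))
                matched.add(idx)
    # A room is invalid only if it matched no region at all.
    invalid = [room for idx, room in enumerate(rooms) if idx not in matched]
    return pairs, invalid


rooms = [BoxRoom("room1", 0, 7), BoxRoom("room2", 10, 20), BoxRoom("room3", 100, 200)]
pairs, invalid = pair_regions_with_rooms(rooms, [[0, 1, 2], [11, 12]], ["region1", "region2"])
print([(region, room.name) for region, room in pairs])  # [('region1', 'room1'), ('region2', 'room2')]
print([room.name for room in invalid])                  # ['room3']
```

Tracking matched rooms by position avoids relying on the room objects being hashable or comparable.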

Changing order of items in tkinter listbox

Is there an easier way to change the order of items in a tkinter listbox than deleting the values for specific key, then re-entering new info?
For example, I want to be able to re-arrange items in a listbox. If I want to swap the position of two, this is what I've done. It works, but I just want to see if there's a quicker way to do this.
def moveup(self,selection):
    value1 = int(selection[0]) - 1 #value to be moved down one position
    value2 = selection #value to be moved up one position
    nameAbove = self.fileListSorted.get(value1) #name to be moved down
    nameBelow = self.fileListSorted.get(value2) #name to be moved up
    self.fileListSorted.delete(value1,value1)
    self.fileListSorted.insert(value1,nameBelow)
    self.fileListSorted.delete(value2,value2)
    self.fileListSorted.insert(value2,nameAbove)
Is there an easier way to change the order of items in a tkinter listbox than deleting the values for specific key, then re-entering new info?
No. Deleting and re-inserting is the only way. If you just want to move a single item up by one you can do it with only one delete and insert, though.
def move_up(self, pos):
    """ Moves the item at position pos up by one """
    if pos == 0:
        return
    text = self.fileListSorted.get(pos)
    self.fileListSorted.delete(pos)
    self.fileListSorted.insert(pos-1, text)
To expand on Tim's answer, it is possible to do this for multiple items as well if you use the curselection() method of tkinter.Listbox.
l = self.lstListBox
posList = l.curselection()
# exit if the list is empty
if not posList:
    return
for pos in posList:
    # skip if item is at the top
    if pos == 0:
        continue
    text = l.get(pos)
    l.delete(pos)
    l.insert(pos-1, text)
This would move all selected items up 1 position. It could also be easily adapted to move the items down. You would have to check if the item was at the end of the list instead of the top, and then add 1 to the index instead of subtract. You would also want to reverse the list for the loop so that the changing indexes wouldn't mess up future moves in the set.
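The downward variant described above could look roughly like this. It is a sketch, not tested against a live widget here: it takes the listbox as a parameter and uses only the Listbox methods size(), curselection(), get(), delete(), insert() and selection_set(). The function name move_selected_down is hypothetical, and curselection() is assumed to return integer indexes, as the comparison pos == 0 in the code above already implies.

```python
def move_selected_down(listbox):
    """Move every selected row of a tkinter Listbox down by one position."""
    last = listbox.size() - 1
    # Walk the selection bottom-up so earlier moves do not shift later indexes.
    for pos in reversed(listbox.curselection()):
        if pos == last:
            # Already at the bottom, or resting on a selected row that could not move.
            last -= 1
            continue
        text = listbox.get(pos)
        listbox.delete(pos)
        listbox.insert(pos + 1, text)
        listbox.selection_set(pos + 1)  # keep the moved row selected
```

Re-selecting the moved row matters because deleting an item also drops its selection, so without it a second click on a "move down" button would find nothing selected.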
