Python Comparing lists averages sorting - python

I am currently working on a Python problem that involves taking a list consisting of 2 sublists of numbers and an identifier, for a total of three things. The procedure name is compareTeams(lstTeams), and it is meant to calculate the average winning percentage of teams over a number of seasons. The first list is the games won, the second list is the games lost. The procedure takes a list of these lists and computes each team's average by dividing the games won by the total games played for every season, summing those ratios, and then dividing by the length of the list. Both lists have the same size. It then sorts the averages from greatest to least as pairs, with the identifier as the first element of each pair. To provide an example:
teamA = [[6, 4, 8, 5, 0], [3, 6, 0, 2, 4], 'A'] avg winning percentage = 0.56
(in case my explanation is poor and hard to follow, for teamA, the percentage is calculated as (6/9 + 4/10 + 8/8 + 5/7 + 0/4) / 5)
teamB = [[3, 6, 8, 2, 4], [3, 6, 8, 2, 4], 'B'] avg winning percentage = 0.50
teamC = [[3, 6, 8, 2, 4], [0, 0, 0, 0, 0], 'C'] avg winning percentage = 1
compareTeams([teamA, teamB, teamC]) gives [['C', 1],['A', 0.56],['B', 0.50]]
I have given this problem a good amount of thought, but am new to python, so I am unsure if I am calling everything correctly. The interpreter I am using does not even display my procedure when I run it, which leads me to believe that I may be doing something wrong. Here is my code:
def compareTeams(lstTeams):
    a = 0
    x = 0
    lst = []
    y = lstTeams[a]
    for a in range(0, len(y)):
        x = x + ((float(y[0][0]) / (y[1][0])) / len(y[0]))
        a = a + 1
    lst.append(x)
    return lst.reverse(lst.sort())
Is this correct? Am I doing anything wrong? Any help would be greatly appreciated.
NOTE: I am using python 2.7 for this.

You can use zip here:
def compare_team(teams):
    lis = []
    for team in teams:
        # zip fetches items from the same index one by one from the lists passed to it
        avg = sum((x * 1.0) / (x + y) for x, y in zip(team[0], team[1])) / len(team[0])
        lis.append([team[-1], avg])
    lis.sort(key=lambda x: x[1], reverse=True)  # reverse sort based on the second item
    return lis
>>> compare_team([teamA, teamB, teamC])
[['C', 1.0], ['A', 0.5561904761904761], ['B', 0.5]]
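One more thing worth pointing out about the question's last line, return lst.reverse(lst.sort()): list.reverse() takes no arguments, so that call raises a TypeError, and both list.sort() and list.reverse() work in place and return None anyway. A minimal sketch of the difference, using made-up averages rather than the real computation:

```python
averages = [['A', 0.56], ['B', 0.50], ['C', 1.0]]

# list.sort() works in place and returns None,
# so its result should not be returned or passed along.
in_place = list(averages)
result = in_place.sort(key=lambda pair: pair[1], reverse=True)
print(result)     # None
print(in_place)   # [['C', 1.0], ['A', 0.56], ['B', 0.5]]

# sorted() builds and returns a new list instead.
ranked = sorted(averages, key=lambda pair: pair[1], reverse=True)
print(ranked)     # [['C', 1.0], ['A', 0.56], ['B', 0.5]]
```

Either form works for the ranking; just don't return the value of sort() itself.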

Related

Fast way to get items removed from a list and items appended to a list based on an old version and a new version

I have a class in Python that contains a list, which can be updated by calling a function. This list has several associated lists, which are all rebuilt whenever the list is updated. However, one of the lists has roughly 40,000 items, so updating it and then rebuilding all the related lists takes a very long time! Instead of remaking all the associated lists every time I update the main list, I want to find out what has been added to the main list and what has been removed, and only update those specific parts.
Below is some code that gets what items have been removed, where self.buildings is the list that contains the old list, and globalvars[self.name] gets the current list.
def updatelist(self):
    globalvars = globals()
    removed = []
    for building in self.buildings:
        if building not in globalvars[self.name]:
            removed.append(building)
However, while this code works, it takes a very long time for the list with 40,000 items. I am certain that I could iterate over the list and find which items have been removed without using in, whose linear cost on a list makes this approach unusable.
Finally, I want some code which can also identify what has been appended to the list. Items are never inserted at a random place, so new items only ever appear at the end, which should make this easier to code.
A sample input would be:
self.buildings = [[3, 2, 3, 5], [0, 0, 1, 1], [1, 1, 5, 5], [6, 2, 3, 3]]
globalvars[self.name] = [[0, 0, 1, 1], [1, 1, 5, 5], [8, 2, 6, 3], [6, 2, 7, 0]]
An ideal output would be:
removed = [[3, 2, 3, 5], [6, 2, 3, 3]]
appended = [[8, 2, 6, 3], [6, 2, 7, 0]]
How would I achieve this in an optimised way?
Thanks in advance.
As I was discussing in the comments, using sets will speed up your code, but the entries have to be hashable. If the input is in fact a list of lists, converting it to a set of tuples does the trick; if the items are instances of a more complicated class, you must define a __hash__ method for it.
The code below shows the use of sets for the data in the question.
from random import randint
# construct the large random arrays
large_list_size = 10000
small_list_size = 4
first_list = []
second_list = []
first_list.append([1,1,1,1])
second_list.append([1,1,1,1])
for i in range(large_list_size):
    # keep drawing random entries until one is not already in first_list
    small_list_instance = [1,1,1,1]
    while small_list_instance in first_list:
        small_list_instance = []
        for j in range(small_list_size):
            small_list_instance.append(randint(0,10))
    first_list.append(small_list_instance)
    # same for second_list
    small_list_instance = [1, 1, 1, 1]
    while small_list_instance in second_list:
        small_list_instance = []
        for j in range(small_list_size):
            small_list_instance.append(randint(0, 10))
    second_list.append(small_list_instance)
# list implementation in question
def updatelist(first,second):
    removed = []
    for building in first:
        if building not in second:
            removed.append(building)
    return removed
# sets implementation
def updatelist_sets(first,second):
    first_set = set([(*x,) for x in first])
    second_set = set([(*x,) for x in second])
    removed = first_set.difference(second_set)
    return removed
# measure lists time
import time
retries = 10
t1 = time.time()
for i in range(retries):
    a = updatelist(first_list,second_list)
t2 = time.time()
time_lists = (t2-t1)/retries
print('using lists = ',time_lists)
# measure sets time
t1 = time.time()
for i in range(retries):
    b = updatelist_sets(first_list,second_list)
t2 = time.time()
time_sets = (t2-t1)/retries
print('using sets = ',time_sets)
print('speedup = ',time_lists/time_sets)
# convert b to be like a in structure
b = [list(x) for x in b]
# compare equality of both outputs
if sorted(a) == sorted(b):
    print('both are equal')
you get
using lists = 1.0185745239257813
using sets = 0.004686903953552246
speedup = 217.32353255367963
both are equal
Needless to say, most of the time is spent constructing the sets of tuples, so if you change your original data structure to a set of hashable objects, the code will be even faster.
Edit: fixed the last comparison as @Kelly Bundy suggested.
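The question also asks for the appended items, and for the results as lists in their original order. A minimal sketch of that, assuming every entry is a list of hashable values; diff_lists is just an illustrative name, not something from the question:

```python
def diff_lists(old, new):
    """Return (removed, appended) for two lists of lists, keeping input order."""
    # Tuples are hashable, so membership tests against these sets are fast.
    old_keys = {tuple(item) for item in old}
    new_keys = {tuple(item) for item in new}
    removed = [item for item in old if tuple(item) not in new_keys]
    appended = [item for item in new if tuple(item) not in old_keys]
    return removed, appended

old = [[3, 2, 3, 5], [0, 0, 1, 1], [1, 1, 5, 5], [6, 2, 3, 3]]
new = [[0, 0, 1, 1], [1, 1, 5, 5], [8, 2, 6, 3], [6, 2, 7, 0]]
removed, appended = diff_lists(old, new)
print(removed)   # [[3, 2, 3, 5], [6, 2, 3, 3]]
print(appended)  # [[8, 2, 6, 3], [6, 2, 7, 0]]
```

Iterating over the original lists rather than taking a set difference keeps the order of the items, at the cost of one extra pass.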

Find 4 values in a list that are close together

I am trying to find the 4 closest values in a given list whose differences are within a defined value. The list can be of any length and is sorted in increasing order. Below is what I have tried:
holdlist=[]
m=[]
nlist = []
t = 1
q = [2,3,5,6,7,8]
for i in range(len(q)-1):
    for j in range(i+1,len(q)):
        if abs(q[i]-q[j])<=1:
            holdlist.append(i)
            holdlist.append(j)
            t=t+1
            break
        else:
            if t != 4:
                holdlist=[]
                t=1
            elif t == 4:
                nlist = holdlist
                holdlist=[]
                t=1
nlist = list(dict.fromkeys(nlist))
for num in nlist:
    m.append(q[num])
The defined difference value here is 1. "q" is the list, and I am trying to get the result in "m" to be [5,6,7,8], but it turns out to be an empty list.
This works only if the list "q" is [5,6,7,8,10,11]. My guess is that after comparing the last value, the for loop ends and the result never gets copied from "holdlist" into "nlist".
Is there a more elegant way of writing the code?
Thank you.
One solution would be to sort the input list and find the smallest window of four elements. Given the example input, this is
min([sorted(q)[i:i+4] for i in range(len(q) - 3)],
    key=lambda w: w[3] - w[0])
However, for other inputs this still returns a window even when the smallest window has a spacing bigger than 1. I'd still use this solution, with a bit of error handling:
assert len(q) >= 4
answer = min([sorted(q)[i:i+4] for i in range(len(q) - 3)], key=lambda w: w[3] - w[0])
assert answer[3] - answer[0] < 4
Written out and annotated:
def closest_four(q):  # closest_four is just an illustrative name
    if len(q) < 4:
        raise RuntimeError("Need at least four members in the list!")
    sorted_q = sorted(q)
    windows = [sorted_q[i:i+4] for i in range(len(q) - 3)]  # All the chunks of four elements
    def size(window):
        """The size of the window."""
        return window[3] - window[0]
    answer = min(windows, key=size)  # The smallest window, by size
    if size(answer) > 3:
        return "No group of four elements has a maximum distance of 1"
    return answer
This would be one easy approach to find the four closest numbers in a list:
# Let's have a list of numbers. It has to be at least 4 numbers long.
numbers = [10, 4, 9, 1, 7, 12, 25, 26, 28, 29, 30, 77, 92]
numbers.sort()
# Now we have a sorted list.
max_delta = 4  # assumed threshold: allowed gap between the first and last of the four numbers
delta = numbers[3] - numbers[0]  # How close together the first four numbers of the sorted list are
idx = 0  # Let's save our starting index
for i in range(len(numbers) - 3):
    d = numbers[i + 3] - numbers[i]
    if d < delta:
        # If some window is closer together, save its spread and the index where it starts
        delta = d
        idx = i
if delta <= max_delta:
    print("closest numbers are {}".format(numbers[idx:idx + 4]))
else:
    print("No sequence within the defined difference was found")
Here is my jab at the issue for OP's reference, as @kojiro and @ex4 have already supplied answers that deserve credit.
def find_neighbor(nums, dist, k=4):
    res = []
    nums.sort()
    for i in range(len(nums) - k + 1):  # + 1 so the last window is checked too
        if nums[i + k - 1] - nums[i] <= dist * k:
            res.append(nums[i: i + k])
    return res
Here is the function in action:
>>> nums = [10, 11, 5, 6, 7, 8, 9] # slightly modified input for better demo
>>> find_neighbor(nums, 1)
[[5, 6, 7, 8], [6, 7, 8, 9], [7, 8, 9, 10], [8, 9, 10, 11]]
Assuming sorting is allowed in tackling this problem, we first sort the input array. (I decided to sort in place for a marginal performance gain, but we could use sorted(nums) as well.) Then we essentially slide a window of size k over the array and check whether the difference between the first and last element within that window is less than or equal to dist * k. In the provided example, for instance, we would expect the difference between the two elements to be less than or equal to 1 * 4 = 4. If such a window exists, we append that subarray to res, which we return in the end.
If the goal is to find a window instead of all windows, we could simply return the subarray without appending it to res.
You can do this in a generic fashion (i.e. for any size of delta or resulting largest group) using the zip function:
def deltaGroups(aList,maxDiff):
    sList = sorted(aList)
    diffs = [ (b-a)<=maxDiff for a,b in zip(sList,sList[1:]) ]
    breaks = [ i for i,(d0,d1) in enumerate(zip(diffs,diffs[1:]),1) if d0!=d1 ]
    groups = [ sList[s:e+1] for s,e in zip([0]+breaks,breaks+[len(sList)]) if diffs[s] ]
    return groups
Here's how it works:
Sort the list in order to have each number next to the closest other numbers
Identify positions where the next number is within the allowed distance (diffs)
Get the index positions where compliance with the allowed distance changes (breaks) from eligible to non-eligible and from non-eligible to eligible
This corresponds to start and end of segments of the sorted list that have consecutive eligible pairs.
Extract subsets of the sorted list based on the start/end positions of consecutive eligible differences (groups)
The deltaGroups function returns a list of groups with at least 2 values that are within the distance constraints. You can use it to find the largest group using the max() function.
output:
q = [10,11,5,6,7,8]
m = deltaGroups(q,1)
print(q)
print(m)
print(max(m,key=len))
# [10, 11, 5, 6, 7, 8]
# [[5, 6, 7, 8], [10, 11]]
# [5, 6, 7, 8]
q = [15,1,9,3,6,16,8]
m = deltaGroups(q,2)
print(q)
print(m)
print(max(m,key=len))
# [15, 1, 9, 3, 6, 16, 8]
# [[1, 3], [6, 8, 9], [15, 16]]
# [6, 8, 9]
m = deltaGroups(q,3)
print(m)
print(max(m,key=len))
# [[1, 3, 6, 8, 9], [15, 16]]
# [1, 3, 6, 8, 9]
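If only the first group of exactly four values is needed, which is what the question originally asked for, the same idea of comparing neighbours in the sorted list can be sketched as a single pass. This is only an illustration under the assumption that "close together" means every adjacent gap is at most max_diff; first_run is a made-up name:

```python
def first_run(values, max_diff=1, size=4):
    """Return the first `size` consecutive values of a sorted list whose
    adjacent gaps are all <= max_diff, or [] if there is no such run."""
    run = values[:1]
    for prev, cur in zip(values, values[1:]):
        if cur - prev <= max_diff:
            run.append(cur)   # still inside a run of close values
        else:
            run = [cur]       # gap too large: start a new run here
        if len(run) == size:
            return run
    return []

print(first_run([2, 3, 5, 6, 7, 8]))      # [5, 6, 7, 8]
print(first_run([5, 6, 7, 8, 10, 11]))    # [5, 6, 7, 8]
print(first_run([1, 3, 5]))               # []
```

Because the check happens inside the loop, a run that ends on the last element is still reported, which is the case the code in the question misses.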

How to find the average of a nested list using a for loop?

I can't import any modules to find the average, only a 'for loop'.
I essentially have to find the average of a nested list.
def get_average(map: List[List[int]]) -> float:
    """Return the average across all cells in the map.
    >>> get_average(3X3)
    5.0
    >>> get_average(4X4)
    3.8125
    """
    total = 0
    for sublist in range(len(map)): #gives sublist
        for i in range(sublist): #access the items in the sublist
            total = total + i
    average = total / len(map)
    return average
The output for get_average(4X4) is 1.0
L =[[1, 2, 6, 5],
[4, 5, 3, 2],
[7, 9, 8, 1],
[1, 2, 1, 4]]
def func(l):
    total_sum = sum([sum(i) for i in l])
    # sum each inner list, store the results in a list, then sum that list
    count = sum([len(i) for i in l])  # total number of elements across all inner lists
    return total_sum/count  # return average
print(func(L))
Output:
3.8125
What the OP's code should be:
def get_average_elevation(elevation_map):
    """Return the average elevation across all cells in the elevation map
    elevation_map.
    Precondition: elevation_map is a valid elevation map.
    >>> get_average_elevation(UNIQUE_3X3)
    5.0
    >>> get_average_elevation(FOUR_BY_FOUR)
    3.8125
    """
    total = 0
    count = 0
    for sublist in range(len(elevation_map)):  # gives sublist index
        for i in range(len(elevation_map[sublist])):  # gives index of item in sublist
            count += 1
            total = total + elevation_map[sublist][i]
    return total/count
l = [[1, 2, 6, 5],
     [4, 5, 3, 2],
     [7, 9, 8, 1],
     [1, 2, 1, 4]]
print(get_average_elevation(l))
Why the difference? Here is the reason.
Say a list is l = [1, 2, 3, 4].
For this list, for i in range(len(l)) iterates 4 times, but it does not give the elements inside the list (which is what the OP expected). range(len(l)) yields the indices 0 through len(l) - 1, i.e. from start to end - 1.
What the OP wanted was the elements themselves; for that, use for element in l, which gives each individual element (in this question, each inner list).
Also, to get the average the OP needs the sum of all elements, and the average must be computed once, after the loops have finished.
Finally, the OP needs a counter for the total number of elements in order to compute the average.
You misunderstand the difference between an index and the list contents.
for sublist in range(len(elevation_map)): #gives sublist
No, it does not. sublist iterates through the indices of elevation_map, the values 0-3.
for i in range(sublist): #access the items in the sublist
Again, no. i iterates through the values 0 to sublist - 1, where sublist itself is in the range 0-3.
As a result, total winds up being the sum 0 + (0 + 1) + (0 + 1 + 2) = 4. That's how you got a mean of 1.0
Instead, write your loops to work as your comments describe:
def get_average_elevation(elevation_map):
    """Return the average elevation across all cells in the elevation map
    elevation_map.
    Precondition: elevation_map is a valid elevation map.
    >>> get_average_elevation(UNIQUE_3X3)
    5.0
    >>> get_average_elevation(FOUR_BY_FOUR)
    3.8125
    """
    total = 0
    for sublist in elevation_map: #gives sublist
        for i in sublist: #access the items in the sublist
            total = total + i
    average = total / len(elevation_map)
    return average
Now, this adds up all 16 elements and divides by the quantity of rows, giving you 15.25 as the result. I think you want the quantity of elements, so you'll need to count or compute that, instead.
Can you take it from there?
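For anyone who wants to check their result, one way to finish it is to count the cells while summing them. This is only a sketch using plain for loops and no imports; the two test maps are the ones that appear elsewhere in this thread:

```python
def get_average_elevation(elevation_map):
    """Return the average across all cells of a list of lists of numbers."""
    total = 0
    count = 0
    for sublist in elevation_map:   # each row of the map
        for value in sublist:       # each cell in that row
            total = total + value
            count = count + 1
    # Divide by the number of cells, not the number of rows.
    return total / float(count)

FOUR_BY_FOUR = [[1, 2, 6, 5],
                [4, 5, 3, 2],
                [7, 9, 8, 1],
                [1, 2, 1, 4]]
UNIQUE_3X3 = [[1, 2, 3],
              [9, 8, 7],
              [4, 5, 6]]
print(get_average_elevation(FOUR_BY_FOUR))  # 3.8125
print(get_average_elevation(UNIQUE_3X3))    # 5.0
```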
Your code is likely not working because for sublist in range(len(elevation_map)): iterates over the indices 0, 1, 2, 3, not over the rows themselves. You never access the actual numbers within the elevation_map array. The inner loop suffers from the same issue.
You can make the code simpler by using a list comprehension to flatten the array, then get the average from the flattened list.
flat_list = [item for sublist in elevation_map for item in sublist]
average = sum(flat_list) / len(flat_list)
You can just turn your list of lists in to a flat list, then use the sum function:
FOUR_BY_FOUR = [[1, 2, 6, 5],
[4, 5, 3, 2],
[7, 9, 8, 1],
[1, 2, 1, 4]]
UNIQUE_3X3 = [[1, 2, 3],
[9, 8, 7],
[4, 5, 6]]
answer = [i for k in FOUR_BY_FOUR for i in k]
print(sum(answer)/16)
answer = [i for k in UNIQUE_3X3 for i in k]
print(sum(answer)/9)
This returns:
3.8125
5.0

Subset sum with `itertools.combinations`

I have a list of integers in python, let's say:
weight = [7, 5, 3, 2, 9, 1]
How should I use itertools.combinations to find all of the possible subset sums of these integers?
So that I get (an example of the desired output with 3 integers, weight = [7, 5, 3]):
sums = [ [7], [7+5], [7+3], [7+5+3], [5], [5+3], [3] ]
Associated with these weights I have another array called luggages that is a list of lists with the luggage name and its correspondent weight in this format:
luggages = [["samsonite", 7], ["Berkin", 5], ["Catelli", 3] .....]
I created an array called weight in this manner.
weight = numpy.array([c[1] for c in luggages])
I could do this for the luggage names need be.
I attempted to use itertools.combinations in this manner (upon suggestion):
comb = [combinations(weight, i) for i in range(len(luggages))]
My goal: To print out all the possible subsets of luggage names that I can bring on a trip given the max_weight = 23 kg of all the combination of each subset that satisfies the condition that the subsets sum equals EXACTLY 23 KG.
In simpler terms I have to print out a list with the names of the luggages that if its weights were summed would equal the max_weight = 23 EXACTLY.
Keep in mind: each luggage can be selected only once in each subset, but it can appear in as many subsets as possible. Also, the number of items in each subset is irrelevant: it can be 1 luggage, 2, 3... as long as their sum equals exactly 23.
Working on the traveling salesman, are we? You can do this using everyone's favorite Python feature, list comprehensions:
import itertools
weight = [7, 5, 3, 2, 9, 1]
cmb = []
for x in range(1, len(weight) + 1):
    cmb += itertools.combinations(weight, x)
# cmb now contains all combos; keep only the ones that sum exactly to the limit
limit = 23
valid_combos = [i for i in cmb if sum(i) == limit]
print(valid_combos)
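Since the goal is to print luggage names rather than weights, the same loop can run over the [name, weight] pairs directly. A rough sketch: only the first three entries of luggages come from the question, the other three names are made-up placeholders so that the weights match the list above.

```python
from itertools import combinations

# First three entries are from the question; "bag_d", "bag_e" and "bag_f"
# are hypothetical names standing in for the elided part of the list.
luggages = [["samsonite", 7], ["Berkin", 5], ["Catelli", 3],
            ["bag_d", 2], ["bag_e", 9], ["bag_f", 1]]
max_weight = 23

matches = []
for size in range(1, len(luggages) + 1):
    for combo in combinations(luggages, size):
        # combo is a tuple of [name, weight] pairs, each used at most once
        if sum(weight for _, weight in combo) == max_weight:
            matches.append([name for name, _ in combo])

print(matches)  # [['samsonite', 'Berkin', 'bag_d', 'bag_e']]
```

Keep in mind this checks every subset, so the work doubles with each extra luggage; for a handful of bags that is fine.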

numbers in a list, find average of surroundings

Question:
Given a list listA of numbers, write a program that generates a new list listB with the same number of elements as listA, such that each element in the new list is the average of its neighbors and itself in the original list.
For example, if listA = [5, 1, 3, 8, 4], listB = [3.0, 3.0, 4.0, 5.0, 6.0], where:
(5 + 1)/2 = 3.0
(5 + 1 + 3)/3 = 3.0
(1 + 3 + 8)/3 = 4.0
(3 + 8 + 4)/3 = 5.0
(8 + 4)/2 = 6.0
So I can get the first part and the last part, since they only deal with 2 numbers, but I cannot get the middle part. My loop is wrong, but I don't know exactly how. This is what I have so far:
listA= [5,1,3,8,4]
N=len(listA)
print(listA)
listB=[]
listB.append((listA[0]+listA[1])/2)
y=0
x=1
while x in listA:
    y=((listA[x-1] + list[x] + list[x+1])/3)
    listB.append(y)
    y=y+1
listB.append((listA[-1]+listA[-2])/2)
print(listB)
You can do this using iterators without having to resort to looping through indices:
import itertools
def neighbours(items, fill=None):
    before = itertools.chain([fill], items)
    after = itertools.chain(items, [fill]) #You could use itertools.zip_longest() later instead.
    next(after)
    for a, b, c in zip(before, items, after):
        yield [value for value in (a, b, c) if value is not fill]
Used like so:
>>> items = [5, 1, 3, 8, 4]
>>> [sum(values)/len(values) for values in neighbours(items)]
[3.0, 3.0, 4.0, 5.0, 6.0]
So how does this work? We create some extra iterators for the before and after values. We use itertools.chain to add an extra fill value to the beginning and end respectively, so that we get the right values at the right time (and do not run out of items). We then advance the after iterator by one to put it in the right position, and loop through all three together, yielding only the values that are not the fill value. This means we can just loop through in a very natural way.
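To make the alignment concrete, here is a small sketch that builds the same three streams with plain lists (instead of iterators) so the triples can be printed and inspected:

```python
items = [5, 1, 3, 8, 4]
fill = None

before = [fill] + items        # shifted right: each item's left neighbour
after = items[1:] + [fill]     # shifted left: each item's right neighbour

# zip stops at the shortest input, which is items itself.
triples = list(zip(before, items, after))
print(triples)
# [(None, 5, 1), (5, 1, 3), (1, 3, 8), (3, 8, 4), (8, 4, None)]
```

Dropping the None entries from each triple leaves exactly the values that get averaged.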
Note that this requires a list, as an iterator will be exhausted. If you need it to work lazily on an iterator, the following example uses itertools.tee() to do the job:
def neighbours(items, fill=None):
    b, i, a = itertools.tee(items, 3)
    before = itertools.chain([fill], b)
    after = itertools.chain(a, [fill])
    next(a)
    for a, b, c in zip(before, i, after):
        yield [value for value in (a, b, c) if value is not fill]
list_a = [5, 1, 3, 8, 4]
# Start list_b with the special case first element
list_b = [sum(list_a[:2]) / 2.0]
# Iterate over each remaining index in the list
for i in range(1, len(list_a)):
    # Get the slice of the element and its neighbors
    neighborhood = list_a[i-1:i+2]
    # Add the average of the element and its neighbors;
    # have to calculate the len of neighborhood to handle
    # the last element special case
    list_b.append(sum(neighborhood) / float(len(neighborhood)))
Could use arcpy.AverageNearestNeighbor_stats
Otherwise, if you like loops:
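import numpy as np
listA = [5, 1, 3, 8, 4]
b = []
for r in range(len(listA)):
    if r == 0:
        # first element: only a right-hand neighbour
        b.append(np.mean(listA[r:r+2]))
    else:
        # for the last element the slice simply stops at the end of the list
        b.append(np.mean(listA[r-1:r+2]))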
It looks like you've got the right idea. Your code is a little tough to follow though, try using more descriptive variable names in the future :) It makes it easier on every one.
Here's my quick and dirty solution:
def calcAverages(listOfNums):
    outputList = []
    for i in range(len(listOfNums)):
        if i == 0:
            outputList.append((listOfNums[0] + listOfNums[1]) / 2.)
        elif i == len(listOfNums)-1:
            outputList.append((listOfNums[i] + listOfNums[i-1]) / 2.)
        else:
            outputList.append((listOfNums[i-1] +
                               listOfNums[i] +
                               listOfNums[i+1]) / 3.)
    return outputList
if __name__ == '__main__':
    listOne = [5, 1, 3, 8, 4, 7, 20, 12]
    print(calcAverages(listOne))
I opted for a for loop instead of a while. This doesn't make a big difference, but I feel the syntax is easier to follow.
for i in range(len(listOfNums)):
We create a loop which will iterate over the length of the input list.
Next we handle the two "special" cases: the beginning and end of the list.
if i == 0:
    outputList.append((listOfNums[0] + listOfNums[1]) / 2.)
elif i == len(listOfNums)-1:
    outputList.append((listOfNums[i] + listOfNums[i-1]) / 2.)
So, if our index is 0, we're at the beginning, and so we add the value at the current index, 0, and the value at the next index, 1, average them, and append the result to our output list.
If our index is equal to the length of our list - 1, we know we're on the last element. (We use -1 because lists are indexed starting at 0, while length is not. Without the -1, this check would never match, and the else branch would raise an IndexError on the last element.) And thus, we take the value at that position, add it to the value at the previous position in the list, and finally append the average of those numbers to the output list.
else:
    outputList.append((listOfNums[i-1] +
                       listOfNums[i] +
                       listOfNums[i+1]) / 3.)
Finally, for all of the other cases, we simply grab the value at the current index, and those immediately above and below it, and then append the averaged result to our output list.
In [31]: lis=[5, 1, 3, 8, 4]
In [32]: new_lis=[lis[:2]]+[lis[i:i+3] for i in range(len(lis)-1)]
In [33]: new_lis
Out[33]: [[5, 1], [5, 1, 3], [1, 3, 8], [3, 8, 4], [8, 4]]
#now using sum() and len() on the above list in a list comprehension
In [35]: [sum(x)/float(len(x)) for x in new_lis]
Out[35]: [3.0, 3.0, 4.0, 5.0, 6.0]
or using zip() (wrapped in list() so it also works on Python 3, where zip returns an iterator):
In [36]: list1=[lis[:2]] + list(zip(lis,lis[1:],lis[2:])) + [lis[-2:]]
In [37]: list1
Out[37]: [[5, 1], (5, 1, 3), (1, 3, 8), (3, 8, 4), [8, 4]]
In [38]: [sum(x)/float(len(x)) for x in list1]
Out[38]: [3.0, 3.0, 4.0, 5.0, 6.0]
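The slicing idea can also be written without treating the first element separately, by clamping the start of the slice at 0. A minimal sketch; neighbour_averages is a made-up name:

```python
def neighbour_averages(values):
    """Average each element with whichever neighbours it actually has."""
    averages = []
    for i in range(len(values)):
        # Clamp the start at 0: a negative start would count from the end
        # of the list. A stop past the end is simply cut off by slicing.
        window = values[max(i - 1, 0):i + 2]
        averages.append(sum(window) / float(len(window)))
    return averages

print(neighbour_averages([5, 1, 3, 8, 4]))  # [3.0, 3.0, 4.0, 5.0, 6.0]
```

Dividing by the length of the window, rather than a fixed 2 or 3, is what lets both ends fall out of the same expression.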
