I am working on an optimization project where I have a series of dictionaries with tuples as keys and another dictionary (a decision variable with Gurobi) where the key is the first element of the tuples in the other dictionaries. I need to be able to do the following:
data1 = {(place, person): q}
data2 = {person: s}
x = {place: var}
qx = {k: x[k]*data1[k] for k in x}
total1 = {}
for key, value in qx.items():
    person = key[1]
    if person in total1:
        total1[person] = total1[person] + value
    else:
        total1[person] = value
total2 = {k: total1[k]/data2[k] for k in total1}
(Please note that the data1, data2, and x dictionaries are very large, 10,000+ distinct place/person pairs).
This same process works when I use the raw data in place of the decision variable, which uses the same (place, person) key. Unfortunately, my variable within the Gurobi model itself must be a dictionary and it cannot contain the person key value.
Is there any way to iterate over just the first value in the tuple key?
EDIT:
Here are some sample values (sensitive data, so placeholder values):
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}
Values in data1 are all integers, data2 are hours, and x are small decimals.
Outputs in total2 should look similar to the following (assuming there are many other rows for each person):
total2 = {a: 0.85, b: 1.2, c: 1.01}
This code is essentially calculating a "productivity score" for each person. The decision variable, x, is looking only at each individual place for business purposes, so it cannot include the person identifiers. Also, the Gurobi package is very limiting about how things can be formatted, so I have not found a way to even use the tuple key for x.
Generally, the most efficient way to aggregate values into bins is to use a for loop and store the values in a dictionary, as you did with total1 in your example. In the code below, I have fixed your qx line so it runs, but I don't know if this matches your intention. I also used total1.setdefault to streamline the code a little:
a, b, c = 'a', 'b', 'c'
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}
qx = {(place, person): x[place] * value for (place, person), value in data1.items()}
total1 = {}
for (place, person), value in qx.items():
    total1.setdefault(person, 0.0)
    total1[person] += value
total2 = {k: total1[k] / data2[k] for k in total1}
print(total2)
# {'a': 0.0071794871794871795, 'c': 0.013571428571428571, 'b': 0.19117647058823528}
But this doesn't produce the result you asked for. I can't tell at a glance how you get the result you showed, but this may help you move in the right direction.
It might also be easier to read if you moved the qx logic into the loop, like this:
total1 = {}
for (place, person), value in data1.items():
    total1.setdefault(person, 0.0)
    total1[person] += x[place] * value
total2 = {k: total1[k] / data2[k] for k in total1}
Or, if you want to do this often, it might be worth creating a cross-reference between persons and their matching places, as @martijn-pieters suggested (note that you still need a for loop to do the initial cross-referencing):
# create a list of valid places for each person
places_for_person = {}
for place, person in data1:
    places_for_person.setdefault(person, [])
    places_for_person[person].append(place)
# now do the calculation
total2 = {
    person:
        sum(
            data1[place, person] * x[place]
            for place in places_for_person[person]
        ) / data2[person]
    for person in data2
}
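The single-loop aggregation above can also be sketched with collections.defaultdict, which removes the setdefault call. This is only a sketch checked with the plain placeholder numbers from the question; the function name productivity is made up for illustration, and I have not verified how it behaves when x holds Gurobi variables instead of floats.

```python
from collections import defaultdict

# Placeholder sample values taken from the question (not real data).
data1 = {(1, "a"): 28, (1, "c"): 57, (2, "b"): 125}
data2 = {"a": 7.8, "b": 8.5, "c": 8.4}
x = {1: 0.002, 2: 0.013}


def productivity(data1, data2, x):
    """Sum x[place] * q per person, then divide by that person's hours."""
    total1 = defaultdict(float)  # missing persons start at 0.0
    for (place, person), q in data1.items():
        # Unpacking the tuple key gives direct access to its first element.
        total1[person] += x[place] * q
    return {person: total / data2[person] for person, total in total1.items()}


total2 = productivity(data1, data2, x)
print(total2)
```

Unpacking the key in the for statement is what lets x be indexed by place alone while the totals are still grouped by person.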
For creating a new dictionary removing the tuple:
a, b, c = "a", "b", "c"
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
total = list()
spot = 0
for key in data1:
    total.append([key[1]])  # start a new [person] pair from the second element of the tuple key
    total[spot].append(data1[key])  # append the value to the pair at the current position
    spot += 1  # advance to the next position in the list
total = dict(total)  # convert the list of [person, value] pairs to a dictionary
print(total)
Output:
{'a': 28, 'c': 57, 'b': 125}
Related
I want to see the modeling output with two data frames.
One data frame has a target value of 1 to 8 and another has only 1,2,3,5,6,7
I made a dictionary to map the values, and I made a code as below to make the probability.
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'f'}
def func(val):
    for key, value in my_dict.items():
        if val == key:
            return value
    return "There is no such Key"
inputData = [1, 2, 3, 4, 5]
inputData2 = np.array([inputData])
index = 1;
result_data = OrderedDict()
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    result_data[func(index)] = round(x,2)
    index += 1
print("result_name : ", max(result_data.items(), key=operator.itemgetter(1))[0])
print("result_value : ", max(xgb_model.predict_proba(inputData2, ntree_limit=None, validate_features=False, base_margin=None)[0]))
print(result_data)
But with the second data frame, the keys are shifted.
For example, a: 0.2, b: 0.2, c: 0.1, e: 0.1, f: 0.1, g: 0.3 should appear, but the actual output is:
a:0.2, b:0.2, c:0.1, d:0.1, e:0.1, f:0.3
I don't know what I should do.
So I've been working on the code below.
With it, only a:0.2, b:0.2, c:0.1 comes out and then it stops.
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    if index not in y.target.unique().tolist():
        continue
    result_data[func(index)] = round(x,2)
    index += 1
Please let me know if the code is unclear.
I hope someone can help. Thank you.
In the second model that has 8 coefficients, you overwrite the value for f since it is defined both for the 6th as well as for the 8th element. Your dict should be defined as:
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'h'}
But you could make the code much simpler by just using a string ("_abcdefgh") to get the correct letter for each index. You could, then, just use result_data[mystring[i]]= and drop the function.
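A minimal sketch of the string-lookup idea, which also handles the model trained on targets 1, 2, 3, 5, 6, 7. It assumes you have the list of target values in the same order as the probability columns (for instance the sorted unique targets); the names LABELS, label_probabilities, classes and probabilities are made up for illustration, and the numbers are the ones from the question.

```python
# Index 0 is a filler character so that target value 1 maps to 'a'.
LABELS = "_abcdefgh"


def label_probabilities(classes, probabilities):
    """Pair each probability with the letter of its own target value."""
    return {LABELS[c]: round(p, 2) for c, p in zip(classes, probabilities)}


# Assumed: target values present in the second data frame, in column order.
classes = [1, 2, 3, 5, 6, 7]
# Assumed: one row of predicted probabilities, one entry per class.
probabilities = [0.2, 0.2, 0.1, 0.1, 0.1, 0.3]

result = label_probabilities(classes, probabilities)
print(result)
```

Because the label comes from the target value rather than from a running counter, a missing class (4 here) no longer shifts the later letters.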
Question:
I have a list in the following format:
x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
The algorithm:
Combine all inner lists with the same starting 2 values, the third value doesn't have to be the same to combine them
e.g. "hello",0,5 is combined with "hello",0,8
But not combined with "hello",1,1
The 3rd value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)
Note: by all 3rd vals I am referring to the 3rd value of each inner list of duplicates
e.g. "hello",0,5 and "hello",0,8 becomes hello,0,6.5
Desired output: (Order of list doesn't matter)
x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]
Question:
How can I implement this algorithm in Python?
Ideally it would be efficient as this will be used on very large lists.
If anything is unclear let me know and I will explain.
Edit: I have tried to change the list to a set to remove duplicates, however this doesn't account for the third variable in the inner lists and therefore doesn't work.
Solution Performance:
Thanks to everyone who has provided a solution to this problem! Here are the results based on a speed test of all the functions: (results figure not included)
Update using running sum and count
I figured out how to improve my previous code (see original below). You can keep running totals and counts, then compute the averages at the end, which avoids recording all the individual numbers.
from collections import defaultdict
class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0
    def add(self, value):
        self.total += value
        self.count += 1
    def calculate(self):
        return self.total / self.count
def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
Original answer
This probably won't be very efficient since it has to accumulate all the values to average them. I think you could get around that by having a running average with a weighting factored in, but I'm not quite sure how to do that.
from collections import defaultdict
def avg(nums):
    return sum(nums) / len(nums)
def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out
print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
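For what it's worth, the weighted running average mentioned in the original answer can be sketched as an incremental mean update: new_mean = mean + (value - mean) / count. This is just one possible way to do it, and the function name average_by_key is made up for illustration.

```python
def average_by_key(rows):
    """Average the third element of each row, grouped by the first two."""
    means = {}   # key -> mean of the values seen so far
    counts = {}  # key -> number of values seen so far
    for first, second, value in rows:
        key = (first, second)
        counts[key] = counts.get(key, 0) + 1
        mean = means.get(key, 0.0)
        # Incremental update: shift the mean towards the new value by 1/count.
        means[key] = mean + (value - mean) / counts[key]
    return [[*key, mean] for key, mean in means.items()]


x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]
result = average_by_key(x)
print(result)
```

Like the running sum and count, this keeps only two numbers per key; the main difference is that the stored mean is always up to date.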
You can try using groupby.
m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
from itertools import groupby
m.sort(key=lambda x: x[0] + str(x[1]))
for i, j in groupby(m, lambda x: x[0] + str(x[1])):
    ss = 0
    c = 0.0
    for k in j:
        ss += k[2]
        c += 1.0
    print([k[0], k[1], ss / c])
This should be O(N), someone correct me if I'm wrong:
def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """
    # Dict in format (string, int): [int, count_int]
    # Our list is in this format, for example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so the dict keys are tuples of the first 2 values of each sublist (since that pair needs to be unique),
    # while the values are a list of the summed third elements plus a counter (which counts every time we see a
    # duplicate key, so we can divide by it and get the average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # Value is the third element of our sublist plus a counter. Since this is the first value, set the counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If the key does exist, add to the running sum and increment the counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1
    # we have a dict, so we need to convert it to a list (and calculate the averages on the way)
    return _convert_my_dict_to_list(my_dict)
def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, key is in form of tuple (string, int) and values are in form of list [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list
my_algorithm(x)
This will return:
[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
While your expected return is:
[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]
If you really need ints then you can modify _convert_my_dict_to_list function.
Here's my variation on this theme: a groupby sans the expensive sort. I also changed the problem to make the input and output a list of tuples as these are fixed-size records:
from itertools import groupby
from operator import itemgetter
from collections import defaultdict
data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]
dictionary = defaultdict(complex)
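for key, group in groupby(data, itemgetter(slice(2))):
    # groupby only groups consecutive rows, so count every row in the group, not just the group itself
    values = [value for (string, number, value) in group]
    dictionary[key] += sum(values) + len(values) * 1j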
array = [(*key, value.real / value.imag) for key, value in dictionary.items()]
print(array)
OUTPUT
> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>
Thanks to @wjandrea for the itemgetter replacement for lambda. (And yes, I am using complex numbers in passing for the average: the real part tracks the total and the imaginary part tracks the count.)
I am wondering how I can print all keys in a dictionary on one line and values on the next so they line up.
The task is to create a solitaire card game in Python. I have made most of it already, but I wish to improve the visuals. I know how to use a for loop to print a line for each key and value, but the school assignment asks me to do it this way. I also tried creating a new list for each line and printing them with print(list1) and print(list2), but that just looks ugly.
FireKort ={
'A': None,#in my code i have the cards as objects here with value
#and type
'B': None,#ex. object1: 8, cloves; object2: King, hearts
'C': None,
'D': None,
'E': None,
'F': None,
'G': None,
'H': None
}
def f_printK():
    global FireKort
    for key in FireKort:
        print('Stokk:',key,' Gjenstående:',len(FireKort[key]))
        try:
            print(FireKort[key][0].sort, FireKort[key][0].valør)
        except:
            print('tom')
##here are the lists i tried:
## navn=[]
## kort=[]
## antall=[]
## for key in FireKort:
## navn.append((key+' '))
## kort.append([FireKort[key][0].sort,FireKort[key][0].valør])
## antall.append( str(len(FireKort[key])))
## print(navn)
## print(kort)
## print(antall)
A B C D E F G H
[♦9][♣A][♠Q][♣8][♦8][♣J][♣10][♦7]
4 4 4 4 4 4 4 4
Have you tried using pprint?
The pprint module provides a capability to "pretty-print" arbitrary Python data structures.
https://docs.python.org/2/library/pprint.html
Try this:
d = { ... }
keys = [ str(q) for q in d.keys() ]
values = [ str(q) for q in d.values() ]
txts = [ (str(a), str(b)) for a, b in zip(keys, values) ]
sizes = [ max(len(a), len(b)) for a, b in txts ]
formats = [ '%%%ds' % q for q in sizes ]
print(' '.join(a % b for a, b in zip (formats, keys)))
print(' '.join(a % b for a, b in zip (formats, values)))
In short:
first we get str values of keys and values of dictionary d (since we're going to use them twice, we might as well store them locally)
we calculate max size of each "column"
we create formats for % operator
and we print
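The steps above can be sketched as a runnable function with a small sample dictionary. This variant uses f-string width specifiers instead of the % operator, which is my substitution rather than the code above; the name aligned_rows and the sample cards are made up for illustration.

```python
def aligned_rows(d):
    """Return (key_row, value_row) with each column padded to a common width."""
    keys = [str(k) for k in d]
    values = [str(v) for v in d.values()]
    # Each column is as wide as the longer of its key and its value.
    widths = [max(len(k), len(v)) for k, v in zip(keys, values)]
    # Right-align every cell to its column width, as '%5s' would.
    key_row = " ".join(f"{k:>{w}}" for k, w in zip(keys, widths))
    value_row = " ".join(f"{v:>{w}}" for v, w in zip(values, widths))
    return key_row, value_row


cards = {"A": "D9", "B": "C10", "C": "SQ"}  # hypothetical sample data
key_row, value_row = aligned_rows(cards)
print(key_row)
print(value_row)
```

Sizing each column separately keeps short columns narrow, which matters when one card label (such as a 10) is longer than the rest.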
It could be done using ljust method of str.
Example:
d = {'A':'some','B':'words','C':'of','D':'different','E':'length'}
keys = list(d.keys())
values = list(d.values())
longest = max([len(i) for i in keys]+[len(i) for i in values])
print(*[i.ljust(longest) for i in keys])
print(*[i.ljust(longest) for i in values])
Output:
A         B         C         D         E
some      words     of        different length
Note that I relied on the fact that .keys() and .values() return keys and values in the same order, as long as the dict is not modified between the two calls.
Heyo everyone, I have a question.
I have three variables, rF, tF, and dF.
Now these values can range from -100 to +100. I want to check all of them and see if they are less than 1; if they are, set them to 1.
An easy way of doing this is just 3 if statements, like
if rF < 1:
    rF = 1
if tF < 1:
    tF = 1
if dF < 1:
    dF = 1
However, as you can see, this looks bad, and if I had, say, 50 of these values, this could get out of hand quite easily.
I tried to put them in an array like so:
for item in [rF, tF, dF]:
    if item < 1:
        item = 1
However this doesn't work. I believe that when you do that you create a completely different object (the array), and when you change the items you are not changing the variables themselves but the values of the array.
So my question is: What is an elegant way of doing this?
Why not use a dictionary, if you've only got three variables of which to keep track?
rF, tF, dF = 100, -100, 1
d = {'rF': rF, 'tF': tF, 'dF': dF}
for k in d:
    if d[k] < 1:
        d[k] = 1
print(d)
{'rF': 100, 'tF': 1, 'dF': 1}
Then if you're referencing any of those values later, you can simply do this (as a trivial example):
def f(var):
    print("'%s' is equal to %d" % (var, d[var]))
>>> f('rF')
'rF' is equal to 100
If you really wanted to use lists, and you knew the order of your list, you could do this (but dictionaries are made for this type of problem):
arr = [rF, tF, dF]
arr = [1 if x < 1 else x for x in arr]
print(arr)
[100, 1, 1]
Note that the list comprehension approach won't actually change the values of rF, tF, and dF.
You can simply use a dictionary and then unpack the dict:
d = {'rF': rF, 'tF': tF, 'dF': dF}
for key in d:
    if d[key] < 1:
        d[key] = 1
rF, tF, dF = d['rF'], d['tF'], d['dF']
You can use the following instead of the last line:
rF, tF, dF = map(d.get, ('rF', 'tF', 'dF'))
Here's exactly what you asked for:
rF = -3
tF = 9
dF = -2
myenv = locals()
for k in list(myenv.keys()):
    if len(k) == 2 and k[1] == "F":
        myenv[k] = max(1, myenv[k])
print(rF, tF, dF)
# prints 1 9 1
This may accidentally modify variables you don't really want to change, and assigning through locals() only takes effect at module level (inside a function it does not change the local variables), so I recommend using a proper data structure instead of hacking the user environment.
Edit: Fixed a RuntimeError: dictionary changed size during iteration. A dictionary cannot be modified while it is being iterated over. Avoid this by first copying the dictionary keys and iterating over the copy instead of the dictionary itself. This should now work in both Python 2 and 3; before, it only worked in Python 2.
Use a list comprehension and the max function.
items = [-32, 0, 43]
items = [max(1, item) for item in items]
rF, tF, dF = items
print(rF, tF, dF)
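If there were many such values, the dictionary and max ideas from the answers could be combined. A minimal sketch, assuming the values live in a dictionary keyed by name; the helper name clamp_min and its floor parameter are made up for illustration.

```python
def clamp_min(values, floor=1):
    """Return a new dict in which every value below floor is raised to floor."""
    return {name: max(floor, value) for name, value in values.items()}


forces = {"rF": -3, "tF": 9, "dF": -2}  # hypothetical sample values
forces = clamp_min(forces)
print(forces)  # {'rF': 1, 'tF': 9, 'dF': 1}
```

This scales to 50 values without any extra code, at the cost of accessing them as forces['rF'] rather than as separate variables.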
I'm creating a class where one of the methods inserts a new item into a sorted list. The item must be inserted at the correct (sorted) position in the list. However, I'm not allowed to use any built-in list functions or methods other than [], [:], +, and len. This is the part that's really confusing to me.
What would be the best way in going about this?
Use the insort function of the bisect module:
import bisect
a = [1, 2, 4, 5]
bisect.insort(a, 3)
print(a)
Output
[1, 2, 3, 4, 5]
Hint 1: You might want to study the Python code in the bisect module.
Hint 2: Slicing can be used for list insertion:
>>> s = ['a', 'b', 'd', 'e']
>>> s[2:2] = ['c']
>>> s
['a', 'b', 'c', 'd', 'e']
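Putting the two hints together, here is a minimal sketch that stays within the question's restrictions ([], [:], + and len only). It borrows the binary search idea from the bisect module but builds the result by slicing and concatenation instead of calling list.insert; the function name insert_sorted is made up for illustration.

```python
def insert_sorted(items, value):
    """Return a new sorted list with value inserted; items must be sorted."""
    lo, hi = 0, len(items)
    # Binary search for the first position whose element is greater than value.
    while lo < hi:
        mid = (lo + hi) // 2
        if value < items[mid]:
            hi = mid
        else:
            lo = mid + 1
    # Build the result with slicing and + only.
    return items[:lo] + [value] + items[lo:]


print(insert_sorted([1, 2, 4, 5], 3))
```

Equal values end up to the right of existing ones, and an empty list or a value larger than every element needs no special case because lo simply ends at len(items).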
You should use the bisect module. Note that the list needs to be sorted before using bisect.insort_left.
Compared with appending and re-sorting, it's a pretty big difference, as the timings below show.
>>> l = [0, 2, 4, 5, 9]
>>> bisect.insort_left(l,8)
>>> l
[0, 2, 4, 5, 8, 9]
timeit.timeit("l.append(8); l = sorted(l)",setup="l = [4,2,0,9,5]; import bisect; l = sorted(l)",number=10000)
1.2235019207000732
timeit.timeit("bisect.insort_left(l,8)",setup="l = [4,2,0,9,5]; import bisect; l=sorted(l)",number=10000)
0.041441917419433594
I'm learning algorithms right now, so I wondered how the bisect module is written.
Here is the code from the bisect module for inserting an item into a sorted list, which uses binary search (dichotomy):
def insort_right(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)
If there are no artificial restrictions, bisect.insort() should be used as described by stanga. However, as Velda mentioned in a comment, most real-world problems go beyond sorting pure numbers.
Fortunately, as commented by drakenation, the solution applies to any comparable objects. For example, bisect.insort() also works with a custom dataclass that implements __lt__():
from bisect import insort
from dataclasses import dataclass
@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    def __lt__(self, other):
        return self.age < other.age
persons = []
insort(persons, Person('John', 'Doe', 30))
insort(persons, Person('Jane', 'Doe', 28))
insort(persons, Person('Santa', 'Claus', 1750))
# [Person(first_name='Jane', last_name='Doe', age=28), Person(first_name='John', last_name='Doe', age=30), Person(first_name='Santa', last_name='Claus', age=1750)]
However, in the case of tuples, it would be desirable to sort by an arbitrary key. By default, tuples are sorted by their first item (first name), then by the next item (last name), and so on.
As a solution you can manage an additional list of keys:
from bisect import bisect
persons = []
ages = []
def insert_person(person):
    age = person[2]
    i = bisect(ages, age)
    persons.insert(i, person)
    ages.insert(i, age)
insert_person(('John', 'Doe', 30))
insert_person(('Jane', 'Doe', 28))
insert_person(('Santa', 'Claus', 1750))
Official solution: The documentation of bisect.insort() refers to a recipe that shows how to use the module to implement this functionality in a custom class SortedCollection, so that it can be used as follows:
>>> s = SortedCollection(key=itemgetter(2))
>>> for record in [
... ('roger', 'young', 30),
... ('angela', 'jones', 28),
... ('bill', 'smith', 22),
... ('david', 'thomas', 32)]:
... s.insert(record)
>>> pprint(list(s)) # show records sorted by age
[('bill', 'smith', 22),
('angela', 'jones', 28),
('roger', 'young', 30),
('david', 'thomas', 32)]
Following is the relevant extract of the class required to make the example work. Basically, the SortedCollection manages an additional list of keys in parallel to the items list to find out where to insert the new tuple (and its key).
from bisect import bisect_left
class SortedCollection(object):
    def __init__(self, iterable=(), key=None):
        self._given_key = key
        key = (lambda x: x) if key is None else key
        decorated = sorted((key(item), item) for item in iterable)
        self._keys = [k for k, item in decorated]
        self._items = [item for k, item in decorated]
        self._key = key
    def __getitem__(self, i):
        return self._items[i]
    def __iter__(self):
        return iter(self._items)
    def insert(self, item):
        'Insert a new item. If equal keys are found, add to the left'
        k = self._key(item)
        i = bisect_left(self._keys, k)
        self._keys.insert(i, k)
        self._items.insert(i, item)
Note that list.insert() as well as bisect.insort() have O(n) complexity. Thus, as commented by nz_21, manually iterating through the sorted list to find the right position would be just as good in terms of complexity. In fact, simply sorting the list after appending a new value will probably be fine too: Python's Timsort has a worst-case complexity of O(n log(n)) and runs in close to linear time on input that is already almost sorted. For completeness, however, note that a balanced binary search tree (BST) would allow insertions in O(log(n)) time.
This is a possible solution for you:
a = [15, 12, 10]
b = sorted(a)
print(b)  # --> b = [10, 12, 15]
c = 13
for i in range(len(b)):
    if b[i] > c:
        break
else:
    i = len(b)  # no larger element found (or b is empty): insert at the end
d = b[:i] + [c] + b[i:]
print(d)  # --> d = [10, 12, 13, 15]
# function to insert a number into a sorted list
def pstatement(value_returned):
    return print('new sorted list =', value_returned)
def insert(values, n):
    # parameter renamed from "input" to avoid shadowing the built-in; the list must not be empty
    print('input list = ', values)
    print('number to insert = ', n)
    print('range to iterate is =', len(values))
    first = values[0]
    print('first element =', first)
    last = values[-1]
    print('last element =', last)
    if first > n:
        result = [n] + values[:]
        return pstatement(result)
    elif last < n:
        result = values[:] + [n]
        return pstatement(result)
    else:
        for i in range(len(values)):
            if values[i] > n:
                break
        result = values[:i] + [n] + values[i:]
        return pstatement(result)
# Input values
listq = [2, 4, 5]
n = 1
insert(listq, n)
There are many ways to do this. Here is a simple, naive program that does the same using the built-in Python function sorted():
def sorted_inserter():
    list_in = []
    n1 = int(input("How many items in the list : "))
    for i in range (n1):
        e1 = int(input("Enter numbers in list : "))
        list_in.append(e1)
    print("The input list is : ",list_in)
    print("Any more items to be inserted ?")
    n2 = int(input("How many more numbers to be added ? : "))
    for j in range (n2):
        e2= int(input("Add more numbers : "))
        list_in.append(e2)
    list_sorted=sorted(list_in)
    print("The sorted list is: ",list_sorted)
sorted_inserter()
The output is
How many items in the list : 4
Enter numbers in list : 1
Enter numbers in list : 2
Enter numbers in list : 123
Enter numbers in list : 523
The input list is : [1, 2, 123, 523]
Any more items to be inserted ?
How many more numbers to be added ? : 1
Add more numbers : 9
The sorted list is: [1, 2, 9, 123, 523]
To add to the existing answers: when you want to insert an element into a list of tuples where the first element is comparable and the second is not, you can use the key parameter of the bisect.insort function (available since Python 3.10) as follows:
import bisect
class B:
    pass
a = [(1, B()), (2, B()), (3, B())]
bisect.insort(a, (3, B()), key=lambda x: x[0])
print(a)
Without the lambda function passed as the key argument of bisect.insort, the code would raise a TypeError, because the function would compare whole tuples and fall back to the second element as a tie breaker, which isn't comparable by default.
This is one way to build the list and insert a value into the sorted list:
a = []
num = int(input('How many numbers: '))
for n in range(num):
    numbers = int(input('Enter values:'))
    a.append(numbers)
b = sorted(a)
print(b)
c = int(input("enter value:"))
index = len(b)  # default: insert at the end if no larger element exists
for i in range(len(b)):
    if b[i] > c:
        index = i
        break
d = b[:index] + [c] + b[index:]
print(d)