How to merge dictionary values where the key count varies

I'm having trouble merging the values of a dictionary whose number of keys varies.
I found a working example using two lists:
t1 = [1,2,3]
t2 = ["a","b","c"]
output = list(zip(t1, t2))
which gives [(1, 'a'), (2, 'b'), (3, 'c')]. First success.
But I need to zip all the values from a dictionary whose number of keys varies (sometimes there are 2 keys, sometimes 4, and so on).
Is there a way to call zip with a dynamic number of inputs, depending on the number of keys?
Let's say:
from collections import OrderedDict
t1 = [1,2,3]
t2 = ["a","b","c"]
generated_rows = OrderedDict()
generated_rows['t1'] = t1
generated_rows['t2'] = t2
output = list(zip(??*))
The expected output would be the same as above:
[(1, 'a'), (2, 'b'), (3, 'c')]
but the arguments to zip should somehow come from the dictionary dynamically. The following dicts, with varying numbers of keys, should all work:
d1 = {'k1':[0,1,2], 'k2':['a','b','c']}
d2 = {'k1':[0,1,2], 'k2':['a','b','c'], 'k3':['x','y','z']}
d3 = ...
Solution (thanks to Todd):
d1 = {'k1':[0,1,2], 'k2':['a','b','c']}
o = list(zip(*d1.values()))
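A minimal sketch of the same idea applied to both sample dictionaries. It relies on the fact that, since Python 3.7, a plain dict iterates over its values in insertion order, so the columns come out in the order the keys were added. The helper name zip_values is hypothetical.

```python
def zip_values(d):
    """Zip together all list values of d, however many keys it has."""
    # *d.values() passes each value to zip() as a separate argument
    return list(zip(*d.values()))


d1 = {'k1': [0, 1, 2], 'k2': ['a', 'b', 'c']}
d2 = {'k1': [0, 1, 2], 'k2': ['a', 'b', 'c'], 'k3': ['x', 'y', 'z']}

print(zip_values(d1))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(zip_values(d2))  # [(0, 'a', 'x'), (1, 'b', 'y'), (2, 'c', 'z')]
```

Note that zip() stops at the shortest list; if the lists can differ in length and you want padding instead, itertools.zip_longest accepts the same *d.values() unpacking.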

If your second piece of code accurately represents what you want to do with N different lists, then the code would probably be:
t1 = [ 1, 2, 3 ]
t2 = [ 'a', 'b', 'c' ]
# And so on
x = []
x.append( t1 )
x.append( t2 )
# And so on
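output = list(zip(*x))
In Python 2, zip() already returns a list, so the extra list() isn't needed there; in Python 3, zip() returns an iterator, so wrap it in list() when you need an actual list. The * operator is sometimes referred to as the 'splat' operator; used like this, it unpacks the list into separate arguments, so zip(*x) is equivalent to zip(t1, t2).
A list is used instead of a dictionary because the 'splat' operator unpacks things in whatever order the type in question uses when iterating over it, and plain dictionaries did not guarantee any particular order before Python 3.7. An OrderedDict (or, from Python 3.7 on, a plain dict) also works, because both iterate in insertion order, as long as the keys are inserted in the order you want.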
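If the column order should not depend on how the dictionary happened to be built, one option is to pass the key order explicitly. This is a sketch; the helper name zip_in_order and its keys parameter are hypothetical.

```python
def zip_in_order(d, keys):
    """Zip the lists stored under `keys`, in exactly that order."""
    # A generator expression can be unpacked with * just like a list
    return list(zip(*(d[k] for k in keys)))


# Keys deliberately inserted in the "wrong" order
generated_rows = {'t2': ['a', 'b', 'c'], 't1': [1, 2, 3]}

print(zip_in_order(generated_rows, ['t1', 't2']))          # [(1, 'a'), (2, 'b'), (3, 'c')]
print(zip_in_order(generated_rows, sorted(generated_rows)))  # same result: keys sorted by name
```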

Related

Printing Parallel Function Outputs in True Order w/Python

I'm looking to print everything in order from a parallelized Python script. Note that the lines are printed out of order (here a1 comes after b2 and c3). Is there a way to give the function below a wait feature? If you rerun it, the print order is sometimes correct for shorter batches, but I'm looking for a reproducible solution to this issue.
from joblib import Parallel, delayed, parallel_backend
import multiprocessing
testFrame = [['a',1], ['b', 2], ['c', 3]]
def testPrint(letr, numbr):
    print(letr + str(numbr))
    return letr + str(numbr)
with parallel_backend('multiprocessing'):
    num_cores = multiprocessing.cpu_count()
    results = Parallel(n_jobs = num_cores)(delayed(testPrint)(letr = testFrame[i][0],
                                                             numbr = testFrame[i][1]) for i in range(len(testFrame)))
print('##########')
for test in results:
    print(test)
Output:
b2
c3
a1
##########
a1
b2
c3
Seeking:
a1
b2
c3
##########
a1
b2
c3

Once you launch tasks in separate processes, you no longer control the order of execution, so you cannot expect the actions of those tasks (such as printing) to happen in any predictable order, especially if the tasks take varying lengths of time.
If you are parallelizing a task/function over a sequence of arguments and want to reorder the results to match the original sequence, you can pass sequence information to the task/function, have the task return it along with its result, and use it to reconstruct the original order.
If the original function looks like this:
import random
import time
def f(arg):
    l,n = arg
    #do stuff
    time.sleep(random.uniform(.1,10.))
    result = f'{l}{n}'
    return result
Refactor the function to accept the sequence information and pass it through with the return value.
import random
import time
def f(arg):
    indx, (l,n) = arg
    time.sleep(random.uniform(.1,10.))
    result = (indx,f'{l}{n}') # the index rides along with the result
    return result
enumerate could be used to add the sequence information to the sequence of data:
originaldata = list(zip('abcdefghijklmnopqrstuvwxyz', range(26)))
dataplus = enumerate(originaldata)
Now the arguments have the form (index, originalarg): (0, ('a', 0)), (1, ('b', 1)), ...
And the values returned from the processes look like this (if collected in a list):
[(14, 'o14'), (23, 'x23'), (1, 'b1'), (4, 'e4'), (13, 'n13'),...]
These are easily sorted on the first item of each result (key=lambda item: item[0]); after sorting, pick out the second items to get the values you really want: results = [item[1] for item in results].
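The index-and-sort approach above can be sketched end to end. To keep the sketch self-contained and portable, it swaps in the standard library's concurrent.futures.ThreadPoolExecutor for joblib/multiprocessing; the reordering logic would be the same with ProcessPoolExecutor. The helper name run_in_original_order is hypothetical, and the sleep only simulates tasks of varying length.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def f(arg):
    """Task that carries its sequence index through to the result."""
    indx, (l, n) = arg
    time.sleep(random.uniform(0.0, 0.05))  # simulate work of varying length
    return (indx, f'{l}{n}')


def run_in_original_order(data, max_workers=8):
    """Run f over data concurrently, then restore the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, item) for item in enumerate(data)]
        # as_completed yields futures in completion order, not input order
        unordered = [fut.result() for fut in as_completed(futures)]
    unordered.sort(key=lambda item: item[0])  # sort on the index
    return [item[1] for item in unordered]    # drop the index


originaldata = list(zip('abcdefghijklmnopqrstuvwxyz', range(26)))
results = run_in_original_order(originaldata)
print(results[:3])  # ['a0', 'b1', 'c2']
```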

return the name of a sorted value in Python 3

I have values like
amity = 0
erudite = 2
etc.
And I am able to sort the integers with
print (sorted([amity, abnegation, candor, erudite, dauntless]))
but I want the variable names to be attached to the integers as well, so that when the numbers are sorted I can tell what each number means.
Is there a way to do this?

Define a mapping between the names and the numbers:
numbers = dict(dauntless=42, amity=0, abnegation=1, candor=4, erudite=2)
Then sort:
d = sorted(numbers.items(), key=lambda x: x[1])
print(d)
# [('amity', 0), ('abnegation', 1), ('erudite', 2), ('candor', 4), ('dauntless', 42)]
To keep the result as a mapping/dictionary, call collections.OrderedDict on the sorted list:
from collections import OrderedDict
print(OrderedDict(d))
# OrderedDict([('amity', 0), ('abnegation', 1), ('erudite', 2), ('candor', 4), ('dauntless', 42)])
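Two variations on the answer above, as a sketch: if you only need the names in value order, sorted can use the dictionary's own get method as the key; and since Python 3.7 a plain dict keeps insertion order, so dict() can stand in for OrderedDict.

```python
numbers = dict(dauntless=42, amity=0, abnegation=1, candor=4, erudite=2)

# Names only, ordered by their associated value
names_by_value = sorted(numbers, key=numbers.get)
print(names_by_value)
# ['amity', 'abnegation', 'erudite', 'candor', 'dauntless']

# A plain dict preserves insertion order on Python 3.7+
ordered = dict(sorted(numbers.items(), key=lambda kv: kv[1]))
print(ordered)
# {'amity': 0, 'abnegation': 1, 'erudite': 2, 'candor': 4, 'dauntless': 42}
```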

Python has a built-in data type called a dictionary, which maps keys to values. That is pretty much what you asked for in your question: attaching a value to a specific key.
You can read a bit more about dictionaries here.
What I think you should do is create a dictionary that maps each integer value to the name of its variable as a string, as shown below:
amity = 0
erudite = 2
abnegation = 50
dauntless = 10
lista = [amity, erudite, abnegation, dauntless]
dictionary = {} # initialize dictionary
dictionary[amity] = 'amity' # You're mapping the value 0 to the string 'amity', not to the variable amity.
dictionary[abnegation] = 'abnegation'
dictionary[erudite] = 'erudite'
dictionary[dauntless] = 'dauntless'
# Note: if two variables share the same value, the later assignment overwrites the earlier name.
print(dictionary) # prints all key, value pairs in the dictionary
print(dictionary[0]) # outputs amity
for item in sorted(lista):
    print(dictionary[item]) # prints the names in order of their values

Python sort multiple lists by date and print list names

Working in Python 3.5.2, I have four lists of dates, each in ascending order, where the lists are not of equal length. Each list of dates is generated by a lookup into a longer list of dates. A sample date value and its data type are shown below:
In: print (date, type(date))
Out: 725722.0 <class 'numpy.float64'>
I build each list of dates using a respective loop. To see the values I convert to strings and print each list. So I could sort with data type as numpy float64 or convert to string. Relevant values of actual data in each list (based on specific filter settings) are shown below:
a = ['12-17-1987', '11-22-1989', '03-05-1990', '11-12-1990']
b = ['12-16-1987', '03-02-1990', '11-12-1990']
c = ['10-09-1986', '12-16-1987', '03-05-1990', '11-12-1990']
d = ['10-16-1985', '08-20-1986', '10-15-1986', '12-16-1987', '03-02-1990']
I need to sort dates from all four lists in ascending order by mm-dd-yyyy, print each date, and beside each date print the name of the respective list, as shown in the example below:
# Desired Printout
10-16-1985 d
08-20-1986 d
10-09-1986 c
10-15-1986 d
12-16-1987 b
12-16-1987 c
12-16-1987 d
12-17-1987 a
11-22-1989 a
03-02-1990 b
03-02-1990 d
03-05-1990 a
03-05-1990 c
11-12-1990 a
11-12-1990 b
11-12-1990 c
This will give me visual confirmation of a sequence of events in four different sets of data. I would try to create a dictionary and sort by date for print to screen or disk but I have noticed similar answers using map or lambda functions that may provide a more elegant solution. If I am storing this information on disk what is the best data structure and solution?
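If you keep the numpy.float64 values instead of converting them to strings, they can be sorted numerically and formatted only when printing. The sketch below assumes those floats are day ordinals as understood by datetime.date.fromordinal; under that assumption 725722 maps to 12-17-1987, the first date in list a. The sample lists are rebuilt from a few of the question's dates, tag, ordinal_to_str and ordinals are hypothetical helper names, and heapq.merge is used because each list is already in ascending order.

```python
import datetime
import heapq


def tag(values, name):
    """Pair every value in one (already sorted) list with the list's name."""
    return ((v, name) for v in values)


def ordinal_to_str(value):
    """Assumes value is a day ordinal, e.g. 725722.0 -> '12-17-1987'."""
    return datetime.date.fromordinal(int(value)).strftime('%m-%d-%Y')


def ordinals(*dates):
    """Build float ordinals like the ones in the question (sample data only)."""
    return [float(d.toordinal()) for d in dates]


a = ordinals(datetime.date(1987, 12, 17), datetime.date(1989, 11, 22))
b = ordinals(datetime.date(1987, 12, 16), datetime.date(1990, 3, 2))

# Merge the pre-sorted lists; ties on the same date fall back to the list name
for value, name in heapq.merge(tag(a, 'a'), tag(b, 'b')):
    print(ordinal_to_str(value), name)
```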

I have a couple comments on this one:
"Best" is ambiguous. It could mean minimized algorithmic complexity, minimized runtime, minimized memory usage, simplest to implement or read, least amount of code, etc.
Unless you have thousands of entries, it might not be worth optimizing your data structure or algorithm. The community's accepted best practice is to profile and optimize what's slow about your entire program.
A simple implementation could be nothing more than joining the lists and sorting them with the sorted built-in. For example, here are a few options you might consider for sorting:
import datetime
a = ['7-1-1987', '1-1-1990']
b = ['7-2-1987', '1-5-1990']
c = ['7-1-1987', '1-3-1990']
d = ['1-10-1985', '7-10-1986']
# hold on to list name
a = [(i, 'a') for i in a] # [(date, list_name), ...]
b = [(i, 'b') for i in b]
c = [(i, 'c') for i in c]
d = [(i, 'd') for i in d]
dates = a + b + c + d # combine into one flat list
for i in dates: print(i)
Output
('7-1-1987', 'a')
('1-1-1990', 'a')
('7-2-1987', 'b')
('1-5-1990', 'b')
('7-1-1987', 'c')
('1-3-1990', 'c')
('1-10-1985', 'd')
('7-10-1986', 'd')
Approach 1 - Parse each date string to a datetime.date object, sort the list in place, and output a list of (date, list name) tuples.
dates_1 = [(datetime.datetime.strptime(d, '%m-%d-%Y').date(), l) for d, l in dates]
dates_1.sort()
for i in dates_1: print(i)
Output
(datetime.date(1985, 1, 10), 'd')
(datetime.date(1986, 7, 10), 'd')
(datetime.date(1987, 7, 1), 'a')
(datetime.date(1987, 7, 1), 'c')
(datetime.date(1987, 7, 2), 'b')
(datetime.date(1990, 1, 1), 'a')
(datetime.date(1990, 1, 3), 'c')
(datetime.date(1990, 1, 5), 'b')
Approach 2 - Sort the dates using a lambda function that parses them on the fly, and output a (new) list of strings.
dates_2 = sorted(dates, key=lambda d: (datetime.datetime.strptime(d[0], '%m-%d-%Y').date(), d[1]))
for i in dates_2: print(i)
Output
('1-10-1985', 'd')
('7-10-1986', 'd')
('7-1-1987', 'a')
('7-1-1987', 'c')
('7-2-1987', 'b')
('1-1-1990', 'a')
('1-3-1990', 'c')
('1-5-1990', 'b')
Approach 3 - Use heapq.merge, which takes advantage of each input list already being sorted, to merge them more efficiently. Credit to @friendlydog for the suggestion.
import datetime
import heapq
a = ['7-1-1987', '1-1-1990']
b = ['7-2-1987', '1-5-1990']
c = ['7-1-1987', '1-3-1990']
d = ['1-10-1985', '7-10-1986']
def strs_to_dates(date_strs, list_name):
    """
    Convert a list of date strings to a generator of (date, str) tuples.
    """
    return ((datetime.datetime.strptime(date, '%m-%d-%Y').date(), list_name) for date in date_strs)
a = strs_to_dates(a, 'a')
b = strs_to_dates(b, 'b')
c = strs_to_dates(c, 'c')
d = strs_to_dates(d, 'd')
dates_3 = heapq.merge(a, b, c, d)
for i in dates_3: print(i)
Output
(datetime.date(1985, 1, 10), 'd')
(datetime.date(1986, 7, 10), 'd')
(datetime.date(1987, 7, 1), 'a')
(datetime.date(1987, 7, 1), 'c')
(datetime.date(1987, 7, 2), 'b')
(datetime.date(1990, 1, 1), 'a')
(datetime.date(1990, 1, 3), 'c')
(datetime.date(1990, 1, 5), 'b')
Notes:
I assumed the format of your input strings is 'month-day-year'.
I assumed when the same date is in multiple lists, that you'd want to secondarily sort alphanumerically by list name.
I left formatting the output list as an exercise for the reader.
All three approaches work under both Python 2 and Python 3.
In Approach 2, the key argument is a lambda. Without it, sorted would compare the strings alphabetically; the key overrides that so the dates sort by year, then month, then day.
A more elaborate implementation could take advantage of the guarantee that the lists are pre-sorted. Wikipedia has a list of merge algorithms to consider.

This should do the trick. The key parses each m-d-yyyy string into a date, because month-first strings don't sort chronologically when compared as plain strings:
import datetime
import itertools
lists = dict(a=['7-1-1987', '1-1-1990'],
             b=['7-2-1987', '1-5-1990'],
             c=['7-1-1987', '1-3-1990'],
             d=['1-10-1985', '7-10-1986'])
def by_date(item):
    # item is (date_string, list_name): compare parsed dates first, then list names
    return datetime.datetime.strptime(item[0], '%m-%d-%Y'), item[1]
for d, v in sorted(itertools.chain(*([(e, n) for e in v] for n, v in lists.items())), key=by_date):
    print(d, v)
If your dates were stored as yyyy-mm-dd instead, plain sorted() without a key would already put them in date order.

# a, b, c, d are the question's four lists of 'mm-dd-yyyy' strings.
# Create the list of all dates, combining the four lists you have. Keep
# the information about which list each value comes from
all_dates = [(x, 'a') for x in a] + [(x, 'b') for x in b] + [(x, 'c') for x in c] + [(x, 'd') for x in d]
# Sort with a simple date parser as the key. The way it works is:
# 1. It takes a date 11-12-2012 (mm-dd-yyyy) and splits it by '-' so that we get ['11', '12', '2012']
# 2. Reorders the parts to year, month, day so that the year is the most significant (['2012', '11', '12'])
# 3. Applies int to each so that they are compared as numbers ((2012, 11, 12)). Python compares tuples element by element
def date_key(item):
    month, day, year = item[0].split('-')
    return int(year), int(month), int(day)
all_dates.sort(key=date_key)
# Print the result
for date in all_dates:
    print(' '.join(date))

You honestly don't need anything that fancy. Take the min of the first item of every non-empty list (comparing them as dates, not strings), pop(0) that item from the list it came from, and print it; repeat until all the lists are empty. That's a simple way to do it that makes sense and is efficient enough for short lists. I could provide you the code, but this should be clear enough.
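A sketch of that min/pop idea on the question's four lists. Two details are assumptions: the heads are compared as parsed dates, because mm-dd-yyyy strings do not sort chronologically as plain strings, and the smallest head is removed with pop(0), since the lists are ascending and pop() with no argument would remove the last item. merge_by_min is a hypothetical name.

```python
import datetime


def parse(date_str):
    """Turn an 'mm-dd-yyyy' string into a date so comparisons are chronological."""
    return datetime.datetime.strptime(date_str, '%m-%d-%Y').date()


def merge_by_min(named_lists):
    """Repeatedly pop the smallest head among the lists; ties go to the earlier name."""
    queues = {name: list(values) for name, values in named_lists.items()}  # copies
    merged = []
    while any(queues.values()):
        candidates = [n for n in queues if queues[n]]  # lists that still have items
        name = min(candidates, key=lambda n: (parse(queues[n][0]), n))
        merged.append((queues[name].pop(0), name))
    return merged


lists = {
    'a': ['12-17-1987', '11-22-1989', '03-05-1990', '11-12-1990'],
    'b': ['12-16-1987', '03-02-1990', '11-12-1990'],
    'c': ['10-09-1986', '12-16-1987', '03-05-1990', '11-12-1990'],
    'd': ['10-16-1985', '08-20-1986', '10-15-1986', '12-16-1987', '03-02-1990'],
}

for date_str, name in merge_by_min(lists):
    print(date_str, name)
```

This prints the desired printout from the question. Because pop(0) shifts the remaining items, each removal is linear in the list length; for long lists heapq.merge does the same job more efficiently.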

Sum second value in tuple for each given first value in tuples using Python

I'm working with a large set of records and need to sum a given field for each customer account to reach an overall account balance. While I can probably put the data in any reasonable form, I figured the easiest would be a list of tuples (cust_id, balance_contribution) as I process each record. After the round of processing, I'd like to add up the second item for each cust_id, and I am trying to do it without looping through the data thousands of times.
As an example, the input data could look like: [(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(2,20.00)]
And I want the output to be something like this:
[(1,125.00),(2,50.00)]
I've read other questions where people have just wanted to add the values of the second element of the tuple using the form of sum(i for i, j in a), but that doesn't separate them by the first element.
I also found the discussion "python sum tuple list based on tuple first value", which puts the values into a list assigned to each key (cust_id) in a dictionary. I suppose I could then figure out how to add up the values in each list?
Any thoughts on a better approach to this?
Thank you in advance.

import collections
def total(records):
    dct = collections.defaultdict(int) # missing keys start at 0
    for cust_id, contrib in records:
        dct[cust_id] += contrib
    return sorted(dct.items()) # list of (cust_id, total) tuples, sorted by cust_id
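If the balance contributions are money, summing them as floats can accumulate binary rounding error (for example, 0.1 + 0.2 gives 0.30000000000000004). As an optional sketch, here is the same one-pass defaultdict approach using decimal.Decimal; converting each float through str() keeps the short decimal form Python displays for it. The name total_decimal is hypothetical.

```python
import collections
from decimal import Decimal


def total_decimal(records):
    """Sum contributions per cust_id in a single pass, using exact decimals."""
    dct = collections.defaultdict(Decimal)  # Decimal() is Decimal('0')
    for cust_id, contrib in records:
        dct[cust_id] += Decimal(str(contrib))  # str() avoids binary-float artifacts
    return sorted(dct.items())


records = [(1, 125.50), (2, 30.00), (1, 24.50), (1, -25.00), (2, 20.00)]
print(total_decimal(records))  # [(1, Decimal('125.0')), (2, Decimal('50.0'))]
```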

Would the following code be useful?
in_list = [(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
totals = {}
for uid, x in in_list:
    if uid not in totals:
        totals[uid] = x
    else:
        totals[uid] += x
print(totals)
Output:
{1: 125.0, 2: 30.0, 3: 20.0}

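People usually like one-liners in Python:
[(uk,sum([vv for kk,vv in data if kk==uk])) for uk in set([k for k,v in data])]
When
data=[(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
the output is
[(1, 125.0), (2, 30.0), (3, 20.0)]
Note that this rescans data once for every distinct key, so on a large set of records a single-pass dictionary approach will be faster.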

Here's an itertools solution:
>>> from itertools import groupby
>>> x
[(1, 125.5), (2, 30.0), (1, 24.5), (1, -25.0), (2, 20.0)]
>>> sorted(x)
[(1, -25.0), (1, 24.5), (1, 125.5), (2, 20.0), (2, 30.0)]
>>> for a, b in groupby(sorted(x), key=lambda item: item[0]):
...     print(a, sum(item[1] for item in b))
...
1 125.0
2 50.0
groupby only groups consecutive items with the same key, which is why x is sorted first.

Converting tuple to integer

I have this Python function that takes 2 arguments (a string and a dictionary) and returns a float. The function is designed to take the average of the scores stored in a dictionary that maps strings to scores.
def happiness_score(string, dic):
    keys = string.lower().split()
    v = sum(dic[key] for key in keys)
    return float(v)/len(keys)
I have this test case, which works:
print(happiness_score("a b" , {"a":(1.2) , "b":(3.4)}))
2.3
I also have a test case with tuples:
print(happiness_score("a b" , {"a":(1,2) , "b":(3,4)}))
How can I change my code to convert any given tuple to an integer, so that my program still runs?

Attempting to use my ninja mind-reading skills to guess how you want to convert a tuple to a float, perhaps you want:
def tup2float(tup):
    return float('.'.join(str(x) for x in tup))
This will only work with 2-tuples...
Some results:
>>> tup2float((1,2))
1.2
>>> tup2float((2,3))
2.3
>>> tup2float((2,30))
2.3
>>> tup2float((2,32))
2.32

I'm going to assume that you want the answer in this case to be (2, 3). In other words, you want to treat each element in the tuple as the source of a separate mean.
def happiness_score(dic):
    scores = map(sum, zip(*dic.values()))
    return [float(v)/len(dic) for v in scores]
The first step is to group the values by tuple position:
d = {'a': (1, 2), 'b': (3, 4)}
list(zip(*d.values()))
[(1, 3), (2, 4)]
Then apply sum to each tuple using map.
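The function above drops the string argument. Here is a sketch that keeps the original happiness_score(string, dic) signature and averages each tuple position over only the words in the string; that this is the intended behavior is an assumption. Like the question's version, it expects every word to be a key of dic and every value to be a tuple of the same length.

```python
def happiness_score(string, dic):
    """Average each tuple position over the scores of the words in string."""
    keys = string.lower().split()
    # zip(*...) groups all first elements together, all second elements together, ...
    columns = zip(*(dic[key] for key in keys))
    return [float(sum(column)) / len(keys) for column in columns]


print(happiness_score("a b", {"a": (1, 2), "b": (3, 4)}))  # [2.0, 3.0]
```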

Assuming you read the file into a list called lines:
hrs, miles = zip(*[(float(line.split()[-4]), float(line.split()[-2])) for line in lines if 'miles' in line and 'hrs' in line])
total_hrs = sum(hrs)
total_miles = sum(miles)
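A single-pass sketch of the same totals that also copes with input containing no matching lines, where the zip(*...) unpacking above would raise a ValueError. The line format is an assumption inferred from the indices [-4] and [-2], i.e. lines ending in "<hours> hrs <miles> miles"; the sample lines and the name sum_hrs_miles are hypothetical.

```python
def sum_hrs_miles(lines):
    """Total the hours and miles from lines ending in '<h> hrs <m> miles'."""
    total_hrs = 0.0
    total_miles = 0.0
    for line in lines:
        # check for both words explicitly; `'miles' and 'hrs' in line` only tests 'hrs'
        if 'miles' in line and 'hrs' in line:
            parts = line.split()  # split once per line
            total_hrs += float(parts[-4])
            total_miles += float(parts[-2])
    return total_hrs, total_miles


sample_lines = ['Mon: 0.5 hrs 3.0 miles', 'note without numbers', 'Tue: 1.25 hrs 7.5 miles']
print(sum_hrs_miles(sample_lines))  # (1.75, 10.5)
```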

We Keep Coding
