Converting two lists and one string to a dictionary - Python

P.S.: Thank you, everybody, especially Matthias Fripp. I just reviewed the question and you are right, I made a mistake: the string is the value, not the key.
num=[1,2,3,4,5,6]
pow=[1,4,9,16,25,36]
s= ":subtraction"
dic={1:1 ,0:s , 2:4,2:s, 3:9,6:s, 4:16,12:s.......}
There is an easy way to convert two lists to a dictionary:
newdic=dict(zip(list1,list2))
but for this problem I have no clue, even with a comprehension:
print({num[i]:pow[i] for i in range(len(num))})

As others have said, a dict cannot contain duplicate keys, but with a little tweaking you can make each key unique. I used OrderedDict to keep the insertion order of the keys:
from pprint import pprint
from collections import OrderedDict
num=[1,2,3,4,5,6]
pow=[1,4,9,16,25,36]
pprint(OrderedDict(sum([[[a, b], ['subtraction ({}-{}):'.format(a, b), a-b]] for a, b in zip(num, pow)], [])))
Prints:
OrderedDict([(1, 1),
             ('subtraction (1-1):', 0),
             (2, 4),
             ('subtraction (2-4):', -2),
             (3, 9),
             ('subtraction (3-9):', -6),
             (4, 16),
             ('subtraction (4-16):', -12),
             (5, 25),
             ('subtraction (5-25):', -20),
             (6, 36),
             ('subtraction (6-36):', -30)])
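`sum(list_of_lists, [])` builds a brand-new list at every step, so it slows down quadratically as the input grows. A minimal sketch of the same idea that flattens with `itertools.chain.from_iterable` instead (on Python 3.7+, a plain `dict` would also keep the insertion order):

```python
from collections import OrderedDict
from itertools import chain

num = [1, 2, 3, 4, 5, 6]
pow = [1, 4, 9, 16, 25, 36]

# For each (a, b) yield the original pair, then a labelled difference;
# the label embeds a and b, so every key stays unique.
pairs = chain.from_iterable(
    ((a, b), ('subtraction ({}-{}):'.format(a, b), a - b))
    for a, b in zip(num, pow)
)
result = OrderedDict(pairs)
print(result[2], result['subtraction (2-4):'])  # 4 -2
```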

In principle, this would do what you want:
nums = [(n, p) for (n, p) in zip(num, pow)]
diffs = [('subtraction', p-n) for (n, p) in zip(num, pow)]
items = nums + diffs
dic = dict(items)
However, a dictionary cannot have multiple items with the same key, so each of your "subtraction" items will be replaced by the next one added to the dictionary, and you'll only get the last one. So you might prefer to work with the items list directly.
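To see the collapse concretely, here is a minimal sketch with the question's data: every 'subtraction' pair shares one key, so `dict()` keeps only the value from the last pair, while the plain list still holds everything.

```python
num = [1, 2, 3, 4, 5, 6]
pow = [1, 4, 9, 16, 25, 36]

nums = [(n, p) for n, p in zip(num, pow)]
diffs = [('subtraction', p - n) for n, p in zip(num, pow)]
items = nums + diffs

dic = dict(items)
print(dic['subtraction'])    # 30: only the last difference (36 - 6) survives
print(len(items), len(dic))  # 12 7
```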
If you need the items in the interleaved order you've shown, that will take a little more work. Maybe something like this:
items = []
for n, p in zip(num, pow):
    items.append((n, p))
    items.append(('subtraction', p-n))
# dict() keeps only the last 'subtraction' value, but on Python 3.7+
# a plain dict at least preserves insertion order (on older versions,
# use collections.OrderedDict for that)
dic = dict(items)
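If every difference needs to survive, one alternative (a suggestion, not part of the answer above) is to key by the number and store the square and the difference together, so no key is ever repeated:

```python
num = [1, 2, 3, 4, 5, 6]
pow = [1, 4, 9, 16, 25, 36]

# One entry per number: (power, power - number)
dic = {n: (p, p - n) for n, p in zip(num, pow)}
print(dic[4])  # (16, 12)
```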

Related

Finding the set of strings with the maximum int value from a list of (string, int) pairs

I have a list of (str, int) pairs:
list_word = [('AND', 1), ('BECAUSE', 1), ('OF', 1), ('AFRIAD', 1), ('NEVER', 1), ('CATS', 2), ('ARE', 2), ('FRIENDS', 1), ('DOGS', 2)]
This basically says how many times each word showed up in a text.
What I want to get is the set of words with the maximum number of occurrences, along with that maximum number. So, in the above example, I want to get
(set(['CATS', 'DOGS','ARE']), 2)
The only solution I can think of is looping through the list. Is there a more elegant way of doing this?
Two linear scans: the first finds the maximal count:
from operator import itemgetter
maxcount = max(map(itemgetter(1), mylist))  # mylist is the question's list_word
then a second pulls out the words you care about:
maxset = {word for word, count in mylist if count == maxcount}, maxcount  # a (set, count) tuple
If you need the sets for more than just the maximal count, you can use collections.defaultdict to accumulate by count in a single pass:
from collections import defaultdict
sets_by_count = defaultdict(set)
for word, count in mylist:
    sets_by_count[count].add(word)
This can then be followed by allcounts = sorted(sets_by_count.items(), key=itemgetter(0), reverse=True) to get a list of (count, set) pairs ordered from highest to lowest count. The sorting work is minimal, since it sorts only one item per distinct count, not every word.
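Putting the single pass and the sort together on the question's data gives a minimal sketch like this (variable names follow the answer; `list_word` is the question's list):

```python
from collections import defaultdict
from operator import itemgetter

list_word = [('AND', 1), ('BECAUSE', 1), ('OF', 1), ('AFRIAD', 1), ('NEVER', 1),
             ('CATS', 2), ('ARE', 2), ('FRIENDS', 1), ('DOGS', 2)]

# Single pass: group words by their count
sets_by_count = defaultdict(set)
for word, count in list_word:
    sets_by_count[count].add(word)

# Sort only the distinct counts, highest first
allcounts = sorted(sets_by_count.items(), key=itemgetter(0), reverse=True)
top_count, top_words = allcounts[0]
print(top_count, sorted(top_words))  # 2 ['ARE', 'CATS', 'DOGS']
```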
Convert the list to a dict whose keys are counts and whose values are sets of words, then find the maximum key and its corresponding value:
from collections import defaultdict
my_list = [('AND', 1), ('BECAUSE', 1), ('OF', 1), ('AFRIAD', 1), ('NEVER', 1), ('CATS', 2), ('ARE', 2), ('FRIENDS', 1), ('DOGS', 2)]
my_dict = defaultdict(set)
for k, v in my_list:
    my_dict[v].add(k)
max_value = max(my_dict.keys())
print((my_dict[max_value], max_value))
# prints something like: ({'CATS', 'ARE', 'DOGS'}, 2)  (set order may vary)
While the more Pythonic solutions are certainly easier on the eye, needing two scans, or building data structures you don't really want, makes them significantly slower.
The following fairly boring solution is about 55% faster than the dict solution and about 70% faster than the comprehension-based solutions on the provided example data (with my implementations, machine, benchmarking, etc.).
This is almost certainly due to the single scan here rather than two.
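word_occs = [
    ('AND', 1), ('BECAUSE', 1), ('OF', 1), ('AFRIAD', 1), ('NEVER', 1),
    ('CATS', 2), ('ARE', 2), ('FRIENDS', 1), ('DOGS', 2)
]
def linear_scan(word_occs):
    # assumes counts are non-negative, as word counts are
    max_val = 0
    max_set = set()  # empty set rather than None, so a count of 0 can't crash .add()
    for word, occ in word_occs:
        if occ == max_val:
            max_set.add(word)  # ties the current maximum
        elif occ > max_val:
            max_val, max_set = occ, {word}  # new maximum: start a fresh set
    return max_set, max_val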
To be fair, they are all blazing fast and in your case readability might be more important.
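If you want to check the speed claims on your own machine, here is a rough sketch using `timeit`. It first confirms that the three approaches agree. The function names other than `linear_scan` are hypothetical wrappers around the answers above, and timings vary by machine, so no numbers are shown.

```python
import timeit
from collections import defaultdict
from operator import itemgetter

word_occs = [('AND', 1), ('BECAUSE', 1), ('OF', 1), ('AFRIAD', 1), ('NEVER', 1),
             ('CATS', 2), ('ARE', 2), ('FRIENDS', 1), ('DOGS', 2)]

def two_scans(pairs):
    # hypothetical wrapper: max first, then a filtering comprehension
    maxcount = max(map(itemgetter(1), pairs))
    return {word for word, count in pairs if count == maxcount}, maxcount

def by_dict(pairs):
    # hypothetical wrapper: group by count, then take the largest key
    groups = defaultdict(set)
    for word, count in pairs:
        groups[count].add(word)
    top = max(groups)
    return groups[top], top

def linear_scan(pairs):
    # single pass, as in the answer above (counts assumed non-negative)
    max_val, max_set = 0, set()
    for word, occ in pairs:
        if occ == max_val:
            max_set.add(word)
        elif occ > max_val:
            max_val, max_set = occ, {word}
    return max_set, max_val

# All three must return the same (set, count) tuple
assert two_scans(word_occs) == by_dict(word_occs) == linear_scan(word_occs)

for fn in (two_scans, by_dict, linear_scan):
    seconds = timeit.timeit(lambda: fn(word_occs), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s")
```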

Creating combinations of a value list with the existing key - PySpark

My RDD consists of data that looks like:
(k, [v1,v2,v3...])
I want to create all combinations of two from the value part.
So the final map should look like:
(k1, (v1,v2))
(k1, (v1,v3))
(k1, (v2,v3))
I know that to get the value pairs, I would use something like
rdd.cartesian(rdd).filter(case (a,b) => a < b)
However, that requires the entire RDD to be passed (right?), not just the value part. I am unsure how to arrive at my desired result; I suspect it's a groupBy.
Also, ultimately, I want to get to key/value pairs that look like
((k1,v1,v2),1)
I know how to get from what I am looking for to that, but maybe it's easier to go straight there?
Thanks.
I think Israel's answer is incomplete, so I'll go a step further.
import itertools
# sc is assumed to be an existing SparkContext
a = sc.parallelize([
    (1, [1,2,3,4]),
    (2, [3,4,5,6]),
    (3, [-1,2,3,4])
])
def combinations(row):
    l = row[1]
    k = row[0]
    return [(k, v) for v in itertools.combinations(l, 2)]
# flatMap applies combinations and flattens the per-row lists in one step
a.flatMap(combinations).take(3)
# [(1, (1, 2)), (1, (1, 3)), (1, (1, 4))]
Use itertools to create the combinations. Here is a demo:
import itertools
k, v1, v2, v3 = 'k1 v1 v2 v3'.split()
a = (k, [v1,v2,v3])
b = itertools.combinations(a[1], 2)
data = [(k, pair) for pair in b]
data will be:
[('k1', ('v1', 'v2')), ('k1', ('v1', 'v3')), ('k1', ('v2', 'v3'))]
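The asker's final ((k1,v1,v2),1) form can be produced in the same step. A minimal sketch, where `keyed_pair_counts` is a hypothetical helper; it is plain Python, so it can be tested without Spark and then passed to `rdd.flatMap`:

```python
import itertools

def keyed_pair_counts(row):
    """Turn (k, [v1, v2, ...]) into [((k, v1, v2), 1), ...] for every 2-combination."""
    k, values = row
    return [((k, v1, v2), 1) for v1, v2 in itertools.combinations(values, 2)]

print(keyed_pair_counts(('k1', ['v1', 'v2', 'v3'])))
# [(('k1', 'v1', 'v2'), 1), (('k1', 'v1', 'v3'), 1), (('k1', 'v2', 'v3'), 1)]

# With PySpark, assuming `rdd` holds (k, list) rows:
# pairs = rdd.flatMap(keyed_pair_counts)
```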
I have written this algorithm, but with larger numbers it either doesn't work or is very slow. It will run on a big-data cluster (Cloudera), so I think I have to port the function to PySpark. Please give me a hand if you can.
import pandas as pd
import itertools as itts
number_list = [10953, 10423, 10053]
def reducer(nums):
    def ranges(n):
        print(n)
        return range(n, -1, -1)
    num_list = list(map(ranges, nums))
    return list(itts.product(*num_list))
data=pd.DataFrame(reducer(number_list))
print(data)
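One likely reason it struggles is sheer output size: `itertools.product` over ranges with n+1 elements each yields (n1+1)(n2+1)(n3+1) tuples, and `list(...)` plus the DataFrame try to hold all of them in memory at once. A quick sketch of the arithmetic (`math.prod` needs Python 3.8+), which may be worth checking before porting anything to PySpark:

```python
import math
from itertools import islice, product

number_list = [10953, 10423, 10053]

# Each range(n, -1, -1) has n + 1 elements, so the product has this many rows
total_rows = math.prod(n + 1 for n in number_list)
print(f"{total_rows:,}")  # 1,148,010,922,784

# product() itself is lazy: inspect the first rows without materialising them
first_rows = list(islice(product(*(range(n, -1, -1) for n in number_list)), 3))
print(first_rows)  # [(10953, 10423, 10053), (10953, 10423, 10052), (10953, 10423, 10051)]
```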

Sum the second value of each tuple, grouped by the first value, using Python

I'm working with a large set of records and need to sum a given field for each customer account to reach an overall account balance. While I can probably put the data in any reasonable form, I figured the easiest would be a list of tuples (cust_id, balance_contribution) built as I process each record. After the round of processing, I'd like to add up the second item for each cust_id, and I am trying to do it without looping through the data thousands of times.
As an example, the input data could look like: [(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(2,20.00)]
And I want the output to be something like this:
[(1,125.00),(2,50.00)]
I've read other questions where people just wanted to add up the second elements of the tuples using the form sum(j for i, j in a), but that doesn't separate them by the first element.
I also found the discussion "python sum tuple list based on tuple first value", which puts the values in a list assigned to each key (cust_id) in a dictionary. I suppose I could then figure out how to add up the values in each list?
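The dict-of-lists idea can be finished with one more comprehension. A minimal sketch using `collections.defaultdict(list)`; `records` is a hypothetical name for the input list:

```python
from collections import defaultdict

records = [(1, 125.50), (2, 30.00), (1, 24.50), (1, -25.00), (2, 20.00)]

# Collect every contribution under its customer id
by_customer = defaultdict(list)
for cust_id, contrib in records:
    by_customer[cust_id].append(contrib)

# Then add up each customer's list
totals = [(cust_id, sum(contribs)) for cust_id, contribs in by_customer.items()]
print(totals)  # [(1, 125.0), (2, 50.0)]
```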
Any thoughts on a better approach to this?
Thank you in advance.
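import collections

def total(records):
    # defaultdict(int) starts each new cust_id at 0
    dct = collections.defaultdict(int)
    for cust_id, contrib in records:
        dct[cust_id] += contrib
    return list(dct.items())
# total([(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(2,20.00)]) returns [(1, 125.0), (2, 50.0)]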
Would the following code be useful?
in_list = [(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
totals = {}
for uid, x in in_list:
    if uid not in totals:
        totals[uid] = x
    else:
        totals[uid] += x
print(totals)
Output:
{1: 125.0, 2: 30.0, 3: 20.0}
People usually like one-liners in Python (though this one rescans the whole list once per distinct key, so it gets slow on large inputs):
[(uk,sum([vv for kk,vv in data if kk==uk])) for uk in set([k for k,v in data])]
When
data=[(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
The output is
[(1, 125.0), (2, 30.0), (3, 20.0)]
Here's an itertools solution:
>>> from itertools import groupby
>>> x = [(1, 125.5), (2, 30.0), (1, 24.5), (1, -25.0), (2, 20.0)]
>>> sorted(x)
[(1, -25.0), (1, 24.5), (1, 125.5), (2, 20.0), (2, 30.0)]
>>> for a, b in groupby(sorted(x), key=lambda item: item[0]):
...     print(a, sum(item[1] for item in b))
...
1 125.0
2 50.0

Sorting a dictionary by value then by key [duplicate]

This question already has answers here:
Sorting a dictionary by value then key
(3 answers)
Closed 6 years ago.
This seems like it has to be a dupe but my SO-searching-fu is poor today...
Say I have a dictionary of integer keys and values. How can I sort it by value descending, then by key descending (for equal values)?
Input:
{12:2, 9:1, 14:2}
{100:1, 90:4, 99:3, 92:1, 101:1}
Output:
[(14,2), (12,2), (9,1)] # output from print
[(90,4), (99,3), (101,1), (100,1), (92,1)]
In [62]: y={100:1, 90:4, 99:3, 92:1, 101:1}
In [63]: sorted(y.items(), key=lambda x: (x[1],x[0]), reverse=True)
Out[63]: [(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]
The key=lambda x: (x[1],x[0]) tells sorted that for each item x in y.items(), use (x[1],x[0]) as the proxy value to be sorted. Since x is of the form (key,value), (x[1],x[0]) yields (value,key). This causes sorted to sort by value first, then by key for tie-breakers.
reverse=True tells sorted to present the result in descending, rather than ascending order.
See this wiki page for a great tutorial on sorting in Python.
PS. I tried using key=reversed instead, but reversed(x) returns an iterator, which does not compare as needed here.
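Reversing each (key, value) pair does work if you use slicing, because `kv[::-1]` returns a tuple rather than an iterator. And if you ever need mixed directions (value descending, key ascending), negating the numeric part of the sort key is a common trick. A short sketch:

```python
y = {100: 1, 90: 4, 99: 3, 92: 1, 101: 1}

# (key, value)[::-1] gives (value, key), which compares like any tuple
desc_both = sorted(y.items(), key=lambda kv: kv[::-1], reverse=True)
print(desc_both)  # [(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]

# Value descending, key ascending: negate the value (numeric values only)
mixed = sorted(y.items(), key=lambda kv: (-kv[1], kv[0]))
print(mixed)  # [(90, 4), (99, 3), (92, 1), (100, 1), (101, 1)]
```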
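Maybe this is more explicit (Python 3 removed both cmp and tuple parameters, so the comparison function goes through functools.cmp_to_key):
>>> from functools import cmp_to_key
>>> y = {100:1, 90:4, 99:3, 92:1, 101:1}
>>> def reverse_comparison(a, b):
...     (a1, a2), (b1, b2) = a, b
...     # old-style cmp of (value, key), reversed: negative means a sorts first
...     return ((b2, b1) > (a2, a1)) - ((b2, b1) < (a2, a1))
...
>>> sorted(y.items(), key=cmp_to_key(reverse_comparison))
[(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]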
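Try this (value descending, then key descending):
>>> from functools import cmp_to_key
>>> d = {100:1, 90:4, 99:3, 92:1, 101:1}
>>> sorted(d.items(), key=cmp_to_key(lambda a, b: b[1] - a[1] or b[0] - a[0]))
[(90, 4), (99, 3), (101, 1), (100, 1), (92, 1)]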

Why does python dict behave this way?

I have some code written like so:
class Invite(models.Model):
    STATE_UNKNOWN = 0
    STATE_WILL_PLAY = 1
    STATE_WONT_PLAY = 2
    STATE_READY = 3
    STATE_CHOICES = ((STATE_UNKNOWN, _("Unknown")),
                     (STATE_WILL_PLAY, _("Yes, I'll play")),
                     (STATE_WONT_PLAY, _("Sorry, can't play")),
                     (STATE_READY, _("I'm ready to play now")))
    ...
    def change_state(self, state):
        assert(state in dict(Invite.STATE_CHOICES))
This code works like I want it to, but I'm curious as to why it works this way. It is admittedly very convenient that it does work this way, but it seems like maybe I'm missing some underlying philosophy as to why that is.
If I try something like:
dict(((1,2,3), (2,2,3), (3,2,3)))
ValueError: dictionary update sequence element #0 has length 3; 2 is required
it doesn't create a dict that looks like
{1: (2,3), 2: (2,3), 3: (2,3)}
So the general pattern is not to take the first part of the tuple as the key and the rest as the value. Is there some fundamental underpinning that causes this behavior, or is it just, well, that it happens to be convenient?
I think it's somewhat obvious. In your example, (1,2,3) is a single object, and the idea behind a dictionary is to map a key to a value (i.e. an object).
So consider the output:
>>> list(dict(((1, (2, 3)), (2, (2, 3)))).items())
[(1, (2, 3)), (2, (2, 3))]
But you can also do something like this:
>>> dict((((1, 2), 3), ((2, 2), 3)))
{(1, 2): 3, (2, 2): 3}
Here the key is an object too, in this case also a tuple.
So in your example:
dict(((1,2,3), (2,2,3), (3,2,3)))
how do you know which part of each tuple is the key and which is the value?
If you find this annoying, it's a simple fix to write your own constructor:
def special_dict(*args):
    return dict((arg[0], arg[1:]) for arg in args)
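For example, with that helper (repeated here so the snippet runs on its own), the triples from the question use the first element as the key and the remaining elements, as a tuple, as the value:

```python
def special_dict(*args):
    # first element becomes the key, the rest of the tuple becomes the value
    return dict((arg[0], arg[1:]) for arg in args)

print(special_dict((1, 2, 3), (2, 2, 3), (3, 2, 3)))
# {1: (2, 3), 2: (2, 3), 3: (2, 3)}
```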
Also, regarding Rafe's comment, you should define the dictionary right away:
class Invite(models.Model):
    STATE_UNKNOWN = 0
    STATE_WILL_PLAY = 1
    STATE_WONT_PLAY = 2
    STATE_READY = 3
    STATE_CHOICES = dict(((STATE_UNKNOWN, _("Unknown")),
                          (STATE_WILL_PLAY, _("Yes, I'll play")),
                          (STATE_WONT_PLAY, _("Sorry, can't play")),
                          (STATE_READY, _("I'm ready to play now"))))
    ...
    def change_state(self, state):
        assert(state in Invite.STATE_CHOICES)
If you ever want to iterate over the states, all you have to do is:
for state, description in Invite.STATE_CHOICES.items():
    print("{0} == {1}".format(state, description))
The construction of the dictionary in your change_state function is unnecessarily costly.
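When you define the Django field, just do (inside the Invite class body itself, drop the Invite. prefix, since the class name is not bound until the class statement finishes):
models.IntegerField(choices=sorted(Invite.STATE_CHOICES.items()))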
The constructor of dict accepts (among other things) a sequence of (key, value) tuples. Your second example passes a sequence of tuples of length 3 instead of 2, and hence fails.
dict([(1, (2, 3)), (2, (2, 3)), (3, (2, 3))])
however will create the dictionary
{1: (2, 3), 2: (2, 3), 3: (2, 3)}
The general pattern is just this: you can create a dict from a list (in general: iterable) of pairs, treated as (key, value). Anything longer would be arbitrary: why (1,2,3)->{1:(2,3)} and not (1,2,3)-> {(1,2):3}?
Moreover, the pairs<->dict conversion is obviously two-way. With triples it couldn't be (see the above example).
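A small sketch of that two-way property, and of why triples are ambiguous (both splits below are equally plausible readings of the same triple):

```python
d = {1: (2, 3), 2: (2, 3), 3: (2, 3)}

# dict -> pairs -> dict gives back an equal dict
pairs = list(d.items())
assert dict(pairs) == d

# A triple has no single natural split into key and value
triple = (1, 2, 3)
split_first = {triple[0]: triple[1:]}   # {1: (2, 3)}
split_last = {triple[:-1]: triple[-1]}  # {(1, 2): 3}
print(split_first, split_last)
```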
