How to get 'specific' keys with maximum value in dictionary? - python

My dict is:
rec_Dict = {'000000000500test.0010': -103,
'000000000500test.0012': -104,
'000000000501test.0015': -105,
'000000000501test.0017': -106}
I know how to find maximum value:
>>print 'max:' + str(max(recB_Dict.iteritems(), key=operator.itemgetter(1)))
max:(u'000000000500test.0010', -103)`
But I want to find keys beginning with '000000000501test', but not including '000000000501test.0015' or any starting with '000000000500test'.
It should print like:
max:(u'000000000501test.0015', -105)`
How can I use keyword to get?

I can't understand the conditions you want to filter keys, but you can use the below scripts (just fix the conditions)
genetator_filter = genetator_filter = ((a,b) for a,b in rec_Dict.iteritems() if (not '.0015' in a) and (not '000000000500test.' in a) )
#(you need to fix filter conditions for keys)
print 'max:' + str(max(genetator_filter, key = lambda x:x[1]))

Separating responsibility to achieve the final result, you can find your max based on what you are looking to match on exactly. Then using that max, just output the value. Granted, some will argue that it is not the most optimized, or not the most functional way to go about it. But, personally, it works just fine, and achieves the result with good enough performance. Furthermore, makes it more readable and easy to test.
Get the max based on the part of the string you want by extracting keys and finding the max:
max_key_substr = max(i.split('.')[0] for i in rec_Dict)
Iterate with that max_key_substr and output the key/value pair:
for key, value in rec_Dict.items():
if max_key_substr in key:
print(key, value)
The output will be:
000000000501test.0015 -105
000000000501test.0017 -106

What you say it should print like doesn't make sense because the key '000000000501test.0015'should have been excluded according to other things you said.
Ignoring that, you could use a generator expression to sift-out the items you don't want processed:
from operator import itemgetter
rec_Dict = {'000000000500test.0010': -103,
'000000000500test.0012': -104,
'000000000501test.0015': -105,
'000000000501test.0017': -106}
def get_max(items):
def sift(record):
key, value = record
return key.startswith('000000000501') and not key.endswith('.0015')
max_record = max((item for item in items if sift(item)), key=itemgetter(1))
return max_record
print(get_max(rec_Dict.iteritems())) # -> ('000000000501test.0017', -106)

Related

Get max value in Python in case of tiebreaker

I would like to know of a way to get the reverse alphabetical order item in the case of a tiebreaker using max(lst,key=lst.count) or if there is a better way.
For example if I had the following list:
mylist = ['fire','water','water','fire']
Even though both water and fire occur twice, I would like it to return 'water' since it comes first in the reverse alphabetical order instead of it returning the first available value.
I created a function that gets the max value based on two keys, being one primary and one secondary.
Here it is:
def max_2_keys(__iter: iter, primary, secondary):
srtd = sorted(__iter, key=primary, reverse=True)
filtered = filter(lambda x: primary(x) == primary(srtd[0]), srtd)
return sorted(filtered, key=secondary, reverse=True)[0]
in your case, you can execute the following lines:
from string import printable
max_2_keys(lst,primary=lst.count, secondary=lambda x: printable[::-1].index(x[0]))
```

How to use Python Decorator to change only one part of function?

I am practically repeating the same code with only one minor change in each function, but an essential change.
I have about 4 functions that look similar to this:
def list_expenses(self):
explist = [(key,item.amount) for key, item in self.expensedict.iteritems()] #create a list from the dictionary, making a tuple of dictkey and object values
sortedlist = reversed(sorted(explist, key = lambda (k,a): (a))) #sort the list based on the value of the amount in the tuples of sorted list. Reverse to get high to low
for ka in sortedlist:
k, a = ka
print k , a
def list_income(self):
inclist = [(key,item.amount) for key, item in self.incomedict.iteritems()] #create a list from the dictionary, making a tuple of dictkey and object values
sortedlist = reversed(sorted(inclist, key = lambda (k,a): (a))) #sort the list based on the value of the amount in the tuples of sorted list. Reverse to get high to low
for ka in sortedlist:
k, a = ka
print k , a
I believe this is what they refer to as violating "DRY", however I don't have any idea how I can change this to be more DRYlike, as I have two seperate dictionaries(expensedict and incomedict) that I need to work with.
I did some google searching and found something called decorators, and I have a very basic understanding of how they work, but no clue how I would apply it to this.
So my request/question:
Is this a candidate for a decorator, and if a decorator is
necessary, could I get hint as to what the decorator should do?
Pseudocode is fine. I don't mind struggling. I just need something
to start with.
What do you think about using a separate function (as a private method) for list processing? For example, you may do the following:
def __list_processing(self, list):
#do the generic processing of your lists
def list_expenses(self):
#invoke __list_processing with self.expensedict as a parameter
def list_income(self):
#invoke __list_processing with self.incomedict as a parameter
It looks better since all the complicated processing is in a single place, list_expenses and list_income etc are the corresponding wrapper functions.

From an (ID, number) pair keep only those pairs that contain the largest number

I am new in python and I would like some help for a small problem. I have a file whose each line has an ID plus an associated number. More than one numbers can be associated to the same ID. How is it possible to get only the ID plus the largest number associated with it in python?
Example:
Input: ID_file.txt
ENSG00000133246 2013 ENSG00000133246 540
ENSG00000133246 2010
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
ENSG00000158458 2553
What I want is the following:
ENSG00000133246 2013
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
Thanks in advance for any help!
I would think there are many ways to do this I would though use a dictionary
from collections import defaultdict
id_value_dict = defaultdict()
for line in open(idfile.txt).readlines():
id, value = line.strip().split()
if id not in id_value_dict:
id_value_dict[id] = int(value)
else:
if id_value_dict[id] < int(value):
id_value_dict[id] = int(value)
Next step is to get the dictionary written out
out_ref = open(outputfile.txt,'w')
for key, value in id_value_dict:
outref.write(key + '\t' + str(value)
outref.close()
There are slicker ways to do this, I think the dictionary could be written in a one-liner using a lamda or a list-comprehension but I like to start simple
Just in case you need the results sorted there are lots of ways to do it but I think it is critical to understand working with lists and dictionaries in python as I have found that the learning to think about the right data container is usually the key to solving many of my problems but I am still a new. Any way if you need the sorted results a straightforward was is to
id_value_dict.keys().sort()
SO this is one of the slick things about python id_value__dict.keys() is a list of the keys of the dictionary sorted
out_ref = open(outputfile.txt,'w')
for key in id_value_dict.keys():
outref.write(key + '\t' + str(id_value_dict[key])
outref.close()
its really tricky because you might want (I know I always want) to code
my_sorted_list = id_value_dict.keys().sort()
However you will find that my_sorted_list does not exist (NoneType)
Given that your input consists of nothing but contiguous runs for each ID—that is, as soon as you see another ID, you never see the previous ID again—you can just do this:
import itertools
import operator
with open('ID_file.txt') as idfile, open('max_ID_file.txt', 'w') as maxidfile:
keyvalpairs = (line.strip().split(None, 1) for line in idfile)
for key, group in itertools.groupby(keyvalpairs, operator.itemgetter(0)):
maxval = max(int(keyval[1]) for keyval in group)
maxidfile.write('{} {}\n'.format(key, maxval))
To see what this does, let's go over it line by line.
A file is just an iterable full of lines, so for line in idfile means exactly what you'd expect. For each line, we're calling strip to get rid of extraneous whitespace, then split(None, 1) to split it on the first space, so we end up with an iterable full of pairs of strings.
Next, we use groupby to change that into an iterable full of (key, group) pairs. Try printing out list(keyvalpairs) to see what it looks like.
Then we iterate over that, and just use max to get the largest value in each group.
And finally, we print out the key and the max value for the group.

How to retrieve from python dict where key is only partially known?

I have a dict that has string-type keys whose exact values I can't know (because they're generated dynamically elsewhere). However, I know that that the key I want contains a particular substring, and that a single key with this substring is definitely in the dict.
What's the best, or "most pythonic" way to retrieve the value for this key?
I thought of two strategies, but both irk me:
for k,v in some_dict.items():
if 'substring' in k:
value = v
break
-- OR --
value = [v for (k,v) in some_dict.items() if 'substring' in k][0]
The first method is bulky and somewhat ugly, while the second is cleaner, but the extra step of indexing into the list comprehension (the [0]) irks me. Is there a better way to express the second version, or a more concise way to write the first?
There is an option to write the second version with the performance attributes of the first one.
Use a generator expression instead of list comprehension:
value = next(v for (k,v) in some_dict.iteritems() if 'substring' in k)
The expression inside the parenthesis will return an iterator which you will then ask to provide the next, i.e. first element. No further elements are processed.
How about this:
value = (v for (k,v) in some_dict.iteritems() if 'substring' in k).next()
It will stop immediately when it finds the first match.
But it still has O(n) complexity, where n is the number of key-value pairs. You need something like a suffix list or a suffix tree to speed up searching.
If there are many keys but the string is easy to reconstruct from the substring, then it can be faster reconstructing it. e.g. often you know the start of the key but not the datestamp that has been appended on. (so you may only have to try 365 dates rather than iterate through millions of keys for example).
It's unlikely to be the case but I thought I would suggest it anyway.
e.g.
>>> names={'bob_k':32,'james_r':443,'sarah_p':12}
>>> firstname='james' #you know the substring james because you have a list of firstnames
>>> for c in "abcdefghijklmnopqrstuvwxyz":
... name="%s_%s"%(firstname,c)
... if name in names:
... print name
...
james_r
class MyDict(dict):
def __init__(self, *kwargs):
dict.__init__(self, *kwargs)
def __getitem__(self,x):
return next(v for (k,v) in self.iteritems() if x in k)
# Defining several dicos ----------------------------------------------------
some_dict = {'abc4589':4578,'abc7812':798,'kjuy45763':1002}
another_dict = {'boumboum14':'WSZE x478',
'tagada4783':'ocean11',
'maracuna102455':None}
still_another = {12:'jfg',45:'klsjgf'}
# Selecting the dicos whose __getitem__ method will be changed -------------
name,obj = None,None
selected_dicos = [ (name,obj) for (name,obj) in globals().iteritems()
if type(obj)==dict
and all(type(x)==str for x in obj.iterkeys())]
print 'names of selected_dicos ==',[ name for (name,obj) in selected_dicos]
# Transforming the selected dicos in instances of class MyDict -----------
for k,v in selected_dicos:
globals()[k] = MyDict(v)
# Exemple of getting a value ---------------------------------------------
print "some_dict['7812'] ==",some_dict['7812']
result
names of selected_dicos == ['another_dict', 'some_dict']
some_dict['7812'] == 798
I prefer the first version, although I'd use some_dict.iteritems() (if you're on Python 2) because then you don't have to build an entire list of all the items beforehand. Instead you iterate through the dict and break as soon as you're done.
On Python 3, some_dict.items(2) already results in a dictionary view, so that's already a suitable iterator.

Map list of tuples into a dictionary

I've got a list of tuples extracted from a table in a DB which looks like (key , foreignkey , value). There is a many to one relationship between the key and foreignkeys and I'd like to convert it into a dict indexed by the foreignkey containing the sum of all values with that foreignkey, i.e. { foreignkey , sumof( value ) }. I wrote something that's rather verbose:
myDict = {}
for item in myTupleList:
if item[1] in myDict:
myDict [ item[1] ] += item[2]
else:
myDict [ item[1] ] = item[2]
but after seeing this question's answer or these two there's got to be a more concise way of expressing what I'd like to do. And if this is a repeat, I missed it and will remove the question if you can provide the link.
Assuming all your values are ints, you could use a defaultdict to make this easier:
from collections import defaultdict
myDict = defaultdict(int)
for item in myTupleList:
myDict[item[1]] += item[2]
defaultdict is like a dictionary, except if you try to get a key that isn't there it fills in the value returned by the callable - in this case, int, which returns 0 when called with no arguments.
UPDATE: Thanks to #gnibbler for reminding me, but tuples can be unpacked in a for loop:
from collections import defaultdict
myDict = defaultdict(int)
for _, key, val in myTupleList:
myDict[key] += val
Here, the 3-item tuple gets unpacked into the variables _, key, and val. _ is a common placeholder name in Python, used to indicate that the value isn't really important. Using this, we can avoid the hairy item[1] and item[2] indexing. We can't rely on this if the tuples in myTupleList aren't all the same size, but I bet they are.
(We also avoid the situation of someone looking at the code and thinking it's broken because the writer thought arrays were 1-indexed, which is what I thought when I first read the code. I wasn't alleviated of this until I read the question. In the above loop, however, it's obvious that myTupleList is a tuple of three elements, and we just don't need the first one.)
from collections import defaultdict
myDict = defaultdict(int)
for _, key, value in myTupleList:
myDict[key] += value
Here's my (tongue in cheek) answer:
myDict = reduce(lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1], myTupleList, {})
It is ugly and bad, but here is how it works.
The first argument to reduce (because it isn't clear there) is lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1]. I will talk about this later, but for now, I'll just call it joe (no offense to any people named Joe intended). The reduce function basically works like this:
joe(joe(joe({}, myTupleList[0]), myTupleList[1]), myTupleList[2])
And that's for a three element list. As you can see, it basically uses its first argument to sort of accumulate each result into the final answer. In this case, the final answer is the dictionary you wanted.
Now for joe itself. Here is joe as a def:
def joe(myDict, tupleItem):
myDict[tupleItem[1]] = myDict.get(tupleItem[1], 0) + tupleItem[2]
return myDict
Unfortunately, no form of = or return is allowed in a Python lambda so that has to be gotten around. I get around the lack of = by calling the dicts __setitem__ function directly. I get around the lack of return in by creating a tuple with the return value of __setitem__ and the dictionary and then return the tuple element containing the dictionary. I will slowly alter joe so you can see how I accomplished this.
First, remove the =:
def joe(myDict, tupleItem):
# Using __setitem__ to avoid using '='
myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2])
return myDict
Next, make the entire expression evaluate to the value we want to return:
def joe(myDict, tupleItem):
return (myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2]),
myDict)[1]
I have run across this use-case for reduce and dict many times in my Python programming. In my opinion, dict could use a member function reduceto(keyfunc, reduce_func, iterable, default_val=None). keyfunc would take the current value from the iterable and return the key. reduce_func would take the existing value in the dictionary and the value from the iterable and return the new value for the dictionary. default_val would be what was passed into reduce_func if the dictionary was missing a key. The return value should be the dictionary itself so you could do things like:
myDict = dict().reduceto(lambda t: t[1], lambda o, t: o + t, myTupleList, 0)
Maybe not exactly readable but it should work:
fks = dict([ (v[1], True) for v in myTupleList ]).keys()
myDict = dict([ (fk, sum([ v[2] for v in myTupleList if v[1] == fk ])) for fk in fks ])
The first line finds all unique foreign keys. The second line builds your dictionary by first constructing a list of (fk, sum(all values for this fk))-pairs and turning that into a dictionary.
Look at SQLAlchemy and see if that does all the mapping you need and perhaps more

Categories