Multiple dynamic sort values in lambda - python

I can sort a list using values looked up in several mappings. But how can I build the lambda for an unknown number of dynamic maps in s?
l = ['1','2','3']
s = [{ '3':'a', '2':'a', '1':'c'},{ '3':'z', '2':'a', '1':'b'},{ '1':34, '2':123, '3':1000}]
sorted(l, key=lambda x: (s[0][x], s[1][x], s[2][x]))

Yes. Why hardcode this when you can iterate over s and build the key with a list comprehension:
sorted(l, key=lambda x: [p[x] for p in s])
(Note that the key is now a list rather than a tuple, but that doesn't matter, since lists and tuples are both compared element by element.)
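As a minimal sketch of the same idea, the key can be wrapped in a small helper that accepts any number of maps. The name sort_by_maps is made up for illustration; it assumes every item is a key in every map and that the values within one map can be compared with each other.

```python
def sort_by_maps(items, maps):
    """Sort items by the values they map to in each dict of `maps`, in order.

    Hypothetical helper: assumes every item is a key in every map.
    """
    # One key component per map, however many maps there are.
    return sorted(items, key=lambda x: tuple(m[x] for m in maps))


l = ['1', '2', '3']
s = [
    {'3': 'a', '2': 'a', '1': 'c'},
    {'3': 'z', '2': 'a', '1': 'b'},
    {'1': 34, '2': 123, '3': 1000},
]

result = sort_by_maps(l, s)
print(result)  # ['2', '3', '1']
```

Appending another dict to s adds one more tie-breaker without touching the helper.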

You can sort by an arbitrary number of criteria using a comparator, like so:
from functools import cmp_to_key
l = ['1','2','3']
s = [{ '3':'a', '2':'a', '1':'c'},{ '3':'z', '2':'a', '1':'b'},{ '1':34, '2':123, '3':1000}]
def compare(x,y):
    for dc in s:
        if dc[x] < dc[y]:
            return -1
        elif dc[x] > dc[y]:
            return 1
    return 0
sorted(l, key=cmp_to_key(compare))
There's no need to use a lambda function for this one.
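One case where a comparator may be more convenient than a key is when some maps should sort ascending and others descending. The sketch below is only an illustration of that: make_compare and the descending flags are hypothetical names, not part of the answer above.

```python
from functools import cmp_to_key


def make_compare(maps, descending):
    """Build a comparator over `maps`; descending[i] flips the i-th criterion.

    Hypothetical helper for illustration only.
    """
    def compare(x, y):
        for dc, desc in zip(maps, descending):
            if dc[x] == dc[y]:
                continue  # tie on this map, fall through to the next one
            result = -1 if dc[x] < dc[y] else 1
            return -result if desc else result
        return 0  # equal on every map
    return compare


l = ['1', '2', '3']
s = [
    {'3': 'a', '2': 'a', '1': 'c'},
    {'3': 'z', '2': 'a', '1': 'b'},
    {'1': 34, '2': 123, '3': 1000},
]

# First map ascending, second descending, third ascending (arbitrary choice).
mixed = sorted(l, key=cmp_to_key(make_compare(s, [False, True, False])))
print(mixed)  # ['3', '2', '1']
```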

Related

parse data using regex and converting it into tuple

I need to find the city with the highest population using regex. The data is presented like this:
data = ["id,name,poppulation,is_capital",
"3024,eu_kyiv,24834,y",
"3025,eu_volynia,20231,n",
"3026,eu_galych,23745,n",
"4892,me_medina,18038,n",
"4401,af_cairo,18946,y",
"4700,me_tabriz,13421,n",
"4899,me_bagdad,22723,y",
"6600,af_zulu,09720,n"]
I've done this so far:
def max_population(data):
    lst = []
    for items in data:
        a = re.findall(r',\S+_\S+,[0-9]+', items)
        lst += [[b for b in i.split(',') if b] for i in a]
    return max(lst, key=lambda x:int(x[1]))
But the function should return a (str, int) tuple. Is it possible to change my code so that it returns a tuple without iterating over the list once again?
All your strings are separated by commas. You could split each one, check that the third field is a number, and compare it with the population stored in the best tuple found so far.
If it is greater, set it as the new highest value.
def max_population(data):
    result = None
    for s in data:
        parts = s.split(",")
        # the header row has no numeric third field, so it is skipped
        if not parts[2].isdigit():
            continue
        tup = (parts[1], int(parts[2]))
        if result is None or tup[1] > result[1]:
            result = tup
    return result
print(max_population(data))
Output
('eu_kyiv', 24834)
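Since the rows are plain comma-separated text with a header, the standard csv module could do the splitting as well. This is a sketch of that alternative, not the approach from the answer above; it relies on the header spelling "poppulation" exactly as it appears in the data.

```python
import csv

data = ["id,name,poppulation,is_capital",
        "3024,eu_kyiv,24834,y",
        "3025,eu_volynia,20231,n",
        "3026,eu_galych,23745,n",
        "4892,me_medina,18038,n",
        "4401,af_cairo,18946,y",
        "4700,me_tabriz,13421,n",
        "4899,me_bagdad,22723,y",
        "6600,af_zulu,09720,n"]


def max_population(rows):
    """Return (name, population) for the row with the largest population."""
    reader = csv.DictReader(rows)  # the first row supplies the column names
    best = max(reader, key=lambda row: int(row["poppulation"]))
    return best["name"], int(best["poppulation"])


result = max_population(data)
print(result)  # ('eu_kyiv', 24834)
```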
The following long line returns the wanted (str, int) tuple:
def max_population(data):
    p=max([(re.findall(r"(\w*),\d*,\w$",i)[0],int(re.findall(r"(\d*),\w$",i)[0])) for n,i in enumerate(data) if n>0],key=lambda x:int(x[1]) )
    return p
In this line, enumerate(data) and n>0 are used to skip the header "id,name,poppulation,is_capital". If data has no header, the line would be:
def max_population(data):
    p=max([(re.findall(r"(\w*),\d*,\w$",i)[0],int(re.findall(r"(\d*),\w$",i)[0])) for i in data],key=lambda x:int(x[1]) )
    return p
The result for both is ('eu_kyiv', 24834)
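A somewhat more readable regex variant is to compile one pattern with two capture groups and let non-matching lines, such as the header, drop out on their own. This is a sketch under the assumption that every data row looks like digits, name, digits, y or n; the pattern name ROW is made up, and only a few of the rows are repeated here.

```python
import re

# Assumed row layout: id, name, population, capital flag (y/n).
ROW = re.compile(r"\d+,(\w+),(\d+),[yn]")

data = ["id,name,poppulation,is_capital",
        "3024,eu_kyiv,24834,y",
        "3025,eu_volynia,20231,n",
        "4899,me_bagdad,22723,y",
        "6600,af_zulu,09720,n"]


def max_population(lines):
    """Return the (name, population) pair with the largest population."""
    matches = (ROW.fullmatch(line) for line in lines)
    # The header does not match the pattern, so it yields None and is skipped.
    pairs = ((m.group(1), int(m.group(2))) for m in matches if m)
    return max(pairs, key=lambda pair: pair[1])


result = max_population(data)
print(result)  # ('eu_kyiv', 24834)
```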
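Create a list of tuples instead of a list of lists. Note that the population is still a string here, so this prints ('eu_kyiv', '24834'); convert it with int() if a true (str, int) tuple is required.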
import re
data = ["id,name,poppulation,is_capital",
"3024,eu_kyiv,24834,y",
"3025,eu_volynia,20231,n",
"3026,eu_galych,23745,n",
"4892,me_medina,18038,n",
"4401,af_cairo,18946,y",
"4700,me_tabriz,13421,n",
"4899,me_bagdad,22723,y",
"6600,af_zulu,09720,n"]
def max_population(data):
    lst = []
    for items in data:
        a = re.findall(r',\S+_\S+,[0-9]+', items)
        lst += [tuple(b for b in i.split(',') if b) for i in a]
    return max(lst, key=lambda x:int(x[1]))
print(max_population(data))
You could create a mapping function to map the types to the data and use the operator.itemgetter function as your key in max:
from operator import itemgetter
def f(row):
    # Use a tuple of types to cast str to the desired type
    types = (str, int)
    # slice here to get the city and population values
    return tuple(t(val) for t, val in zip(types, row.split(',')[1:3]))
# Have max consume a map on the data excluding the
# header row (hence the slice)
max(map(f, data[1:]), key=itemgetter(1))
('eu_kyiv', 24834)

What is an easy way to remove duplicates from only part of the string in Python?

I have a list of strings that goes like this:
1;213;164
2;213;164
3;213;164
4;213;164
5;213;164
6;213;164
7;213;164
8;213;164
9;145;112
10;145;112
11;145;112
12;145;112
13;145;112
14;145;112
15;145;112
16;145;112
17;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
I would like to remove all duplicates where the last two numbers are the same. So after running it through the program I would get something like this:
1;213;164
9;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
But something like
8;213;164
15;145;112
1001;1;151
1002;2;81
1003;3;171
1004;4;31
would also be correct.
Here is a nice and fast trick you can use (assuming l is your list):
list({ s.split(';', 1)[1] : s for s in l }.values())
No need to import anything, and fast as can be.
In general you can define:
def custom_unique(L, keyfunc):
    return list({ keyfunc(li): li for li in L }.values())
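The dict comprehension keeps the last string seen for each key, because later entries overwrite earlier ones. If the first occurrence is preferred, a small variation with setdefault works; the sketch below assumes Python 3.7 or later, where dicts preserve insertion order, and unique_keep_first is a hypothetical name.

```python
def unique_keep_first(items, keyfunc):
    """Keep the first item for each key, in order of first appearance."""
    seen = {}
    for item in items:
        # setdefault stores the item only if the key is not present yet
        seen.setdefault(keyfunc(item), item)
    return list(seen.values())


lines = ['1;213;164', '2;213;164', '9;145;112', '10;145;112', '1001;1;151']


def tail(s):
    """Everything after the first ';' (the part that must be unique)."""
    return s.split(';', 1)[1]


first = unique_keep_first(lines, tail)
last = list({tail(s): s for s in lines}.values())
print(first)  # ['1;213;164', '9;145;112', '1001;1;151']
print(last)   # ['2;213;164', '10;145;112', '1001;1;151']
```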
You can group the items by this key and then use the first item in each group (assuming l is your list).
import itertools
keyfunc = lambda x: x.split(";", 1)[1]
[next(g) for k, g in itertools.groupby(sorted(l, key=keyfunc), keyfunc)]
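Here is some code that runs on the first few items; just replace my list with yours:
x = [
'7;213;164',
'8;213;164',
'9;145;112',
'10;145;112',
'11;145;112',
]
new_list = []
for i in x:
    check = True
    s_part = i[i.find(';'):]  # everything from the first ';' onward
    for j in new_list:
        # compare the same part of each kept string; a plain substring
        # test (s_part in j) could match a longer number by accident
        if j[j.find(';'):] == s_part:
            check = False
    if check:
        new_list.append(i)
print(new_list)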
Output:
['7;213;164', '9;145;112']

How to sort a list of strings by frequency?

I have a list of files
example_list = ['7.gif', '8.gif', '123.html']
There are over 700k elements and I need to sort them by frequency to see the most accessed file and least accessed file.
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
When I run this it never finishes. I have tried other methods too, but got no results.
Your algorithm takes unnecessarily quadratic time. The following is linear:
from collections import Counter
resl2 = [k for k,v in Counter(resl).items() if v > 500]
If you need them sorted, then do something like
resl2 = [(k,v) for k,v in Counter(resl).items() if v > 500]
resl2.sort(key=lambda kv: kv[1])
resl2 = [k for k,v in resl2]
From your comment:
I just need to find out which file occurs the most.
So:
import statistics
statistics.mode(example_list)
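To see both the most and the least accessed file in one pass, Counter.most_common() with no argument returns every (file, count) pair ordered from highest to lowest count. A small sketch with made-up sample data standing in for the 700k-element list:

```python
from collections import Counter

# Made-up sample; the real list would be the full access log.
resl = ['7.gif', '8.gif', '7.gif', '123.html', '7.gif', '8.gif']

counts = Counter(resl)          # one pass over the list
ranked = counts.most_common()   # all (file, count) pairs, highest count first

most_accessed = ranked[0]
least_accessed = ranked[-1]
print(most_accessed)   # ('7.gif', 3)
print(least_accessed)  # ('123.html', 1)
```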
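Note that i represents an element of the list and not an integer index:
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
Change it to this.
for i in range(0,len(resl)-1):
    if i > 500:
        resl2.append(resl[i])
print(resl2)
Be aware that this version compares the position i with 500 rather than the number of occurrences, so it collects the elements stored after index 500 (and range(0, len(resl)-1) stops before the last element); it does not select files by frequency.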
You can do this trick using a set ;)
Here is a minimal example with a list of files that keeps the ones appearing at least 2 times:
files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif', '0.gif', '0.tmp', '0.doc', '0.gif']
file_set = set(files)
files_freq = [0]*len(file_set)
for n,file in enumerate(file_set):
    files_freq[n] = files.count(file)
sorted_list = [f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
print(sorted_list)
and the output will be: ['0.gif', '0.doc']
The set reduces the list to one entry per file, and the loop counts how often each file occurs.
After that, the list comprehension does the trick:
[f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
The sorted call orders the (count, file) pairs from zip(files_freq, file_set) by the count (that is what key=lambda x: x[0] selects), reverse=True puts the highest frequencies first, and the if n >= 2 filter keeps only the files that appear two or more times.

python - Expected dict, got list error

In the function below there is a dictionary called task_ranku.
I'm trying to sort it by its values and print the dictionary.
However, when I add these lines to sort and print it, I get the error "Expected dict, got list".
Could anybody explain what I am doing wrong?
cdef dict task_ranku_sorted = sorted(task_ranku.values())
for key, value in task_ranku_sorted.iteritems():
    print key, value
def heft_order(object nxgraph, PlatformModel platform_model):
    """
    Order task according to HEFT ranku.
    Args:
        nxgraph: full task graph as networkx.DiGraph
        platform_model: cscheduling.PlatformModel instance
    Returns:
        a list of tasks in a HEFT order
    """
    cdef double mean_speed = platform_model.mean_speed
    cdef double mean_bandwidth = platform_model.mean_bandwidth
    cdef double mean_latency = platform_model.mean_latency
    cdef dict task_ranku = {}
    for idx, task in enumerate(list(reversed(list(networkx.topological_sort(nxgraph))))):
        ecomt_and_rank = [
            task_ranku[child] + (edge["weight"] / mean_bandwidth + mean_latency)
            for child, edge in nxgraph[task].items()
        ] or [0]
        task_ranku[task] = task.amount / mean_speed + max(ecomt_and_rank) + 1
    # use node name as an additional sort condition to deal with zero-weight tasks (e.g. root)
    return sorted(nxgraph.nodes(), key=lambda node: (task_ranku[node], node.name), reverse=True)
task_ranku_sorted is not a dictionary, it is a list:
task_ranku_sorted = sorted(task_ranku.values())
because you call values() on your original dictionary and sort only the list of values. Assigning that list to a variable declared as cdef dict is what raises the error.
You can check it:
print type(task_ranku_sorted)
cdef dict task_ranku_sorted = sorted(task_ranku.values())
This produces a sorted list of the values, so task_ranku_sorted is a list, which cannot be stored in a variable declared as dict.
The next line would fail as well: iteritems() exists only on dict objects, not on lists, so task_ranku_sorted.iteritems() cannot work.
If you want to sort a dictionary by value, see the following options:
import operator
sorted_dictinory = sorted(dictinory.items(), key=operator.itemgetter(1))
OR
sorted_dictinory = sorted(dictinory.items(), key=lambda x: x[1])
OR
from collections import OrderedDict
sorted_dictinory = OrderedDict(sorted(dictinory.items(), key=lambda x: x[1]))
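For completeness, here is how the sort-and-print step from the question might look in plain Python 3, where items() replaces iteritems(). The values in task_ranku are invented for the example; in Cython the result would be held in a list (or an untyped variable), not in a cdef dict.

```python
# Invented ranks standing in for the real task_ranku contents.
task_ranku = {"task_a": 3.0, "task_b": 1.5, "task_c": 2.0}

# sorted() over items() yields a list of (key, value) tuples ordered by value.
task_ranku_sorted = sorted(task_ranku.items(), key=lambda kv: kv[1])

for key, value in task_ranku_sorted:
    print(key, value)
```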

Finding the dictionary keys whose values are numerically highest

Given a Python dict of the form:
dict = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258, ......}
Is there an easy way to print the first x keys with the highest numeric values? That is, say:
Beth 9102
Cecil 3258
Currently this is my attempt:
max = 0
max_word = ""
for key, value in w.word_counts.iteritems():
    if value > max:
        if key not in stop_words:
            max = value
            max_word = key
print max_word
I'd simply sort the items by the second value and then pick the first K elements :
d_items = sorted(d.items(), key=lambda x: -x[1])
print d_items[:2]
[('Beth', 9102), ('Cecil', 3258)]
The complexity of this approach is O(N log N + K), not that different from optimal O(N + K log K) (using QuickSelect and sorting just the first K elements).
Using collections.Counter.most_common:
>>> from collections import Counter
>>> d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
>>> c = Counter(d)
>>> c.most_common(2)
[('Beth', 9102), ('Cecil', 3258)]
It uses sorted (O(n*log n)), or heapq.nlargest(k) that might be faster than sorted if k << n, or max() if k==1.
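heapq.nlargest can also be called directly, which makes it easy to filter out stop words before picking the top entries, as the attempt in the question does. A Python 3 sketch; the contents of stop_words are an assumption made for the example.

```python
import heapq

d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
stop_words = {'Beth'}  # assumed example; the question does not show this set

# Lazily drop stop words, then take the 2 pairs with the largest values.
candidates = ((k, v) for k, v in d.items() if k not in stop_words)
top = heapq.nlargest(2, candidates, key=lambda kv: kv[1])
print(top)  # [('Cecil', 3258), ('Alice', 2341)]
```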
>>> sorted(dict.items(), key=lambda x: x[1], reverse=True)[:2]
[('Beth', 9102), ('Cecil', 3258)]
items = sorted(w.word_counts.items(), lambda x, y: cmp(x[1], y[1]), None, True)
items[:5]
Replace 5 with the number of elements you want to get.
d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}
vs = sorted(d, key=d.get,reverse=True)
l = [(x,d.get(x)) for x in vs[0:2]]
In [4]: l
Out[4]: [('Beth', 9102), ('Cecil', 3258)]
Convert dict to list of tuples [(2341, 'Alice'), ...] then sort it (without key=lambda ...).
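A short Python 3 sketch of that last suggestion: with the value placed first in each tuple, the default tuple ordering already sorts by value, so no key function is needed. Ties on the value would fall back to comparing the names.

```python
d = {'Alice': 2341, 'Beth': 9102, 'Cecil': 3258}

# Value first, so plain tuple comparison orders by the number.
pairs = sorted(((v, k) for k, v in d.items()), reverse=True)
top_two = pairs[:2]
print(top_two)  # [(9102, 'Beth'), (3258, 'Cecil')]
```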
