I need to write a MapReduce job with MRJob, but I can't get it to work.
I have a big list in which each item holds two codes and a sentence, as follows:
L = ["E-0053 C-0169 It's goig to be a good day\n", 'D-0312 B-0291 Peter has arrived late\n',
     'A-0417 B-0187 for more information please call the following number\n']
I need to use MapReduce to obtain a list that counts the words in each sentence, keyed by the pair of letters taken from the two codes. For the example list, the solution would be:
[EC 6, DB 4, AB 8]
I've tried with:
C1 = [i[0] for i in L]
C2 = [i[7] for i in L]
C1_C2 = [C1[i] + C2[i] for i in range(len(C1))]
class count(MRJob):
    def mapper(self, _, C1_C2):
        [elem.split() for elem in L]
        yield C1_C2, [(len(i)-2) for i in sentence]
    def reducer(self, key, values):
        yield key, sum(values)
count.run()
You could try this:
L = [
    "E-0053 C-0169 It's goig to be a good day\n",
    "D-0312 B-0291 Peter has arrived late\n",
    "A-0417 B-0187 for more information please call the following number\n"
]
result = [i[0] + i[7] + " " + str(len(i.split()) - 2) for i in L]
print(result)
Output:
['EC 7', 'DB 4', 'AB 8']
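If the map and reduce steps themselves matter, the same count can be sketched as a mapper and a reducer. This is a plain-Python sketch, not MRJob code: the function names mapper, reducer and run_local are hypothetical, and run_local only imitates the grouping step that a MapReduce framework would do between the two phases. In MRJob the two generators would become methods of an MRJob subclass.

```python
from collections import defaultdict

L = [
    "E-0053 C-0169 It's goig to be a good day\n",
    "D-0312 B-0291 Peter has arrived late\n",
    "A-0417 B-0187 for more information please call the following number\n",
]

def mapper(line):
    # Emit (letter pair, word count) for one input line.
    parts = line.split()
    key = parts[0][0] + parts[1][0]   # first letter of each code
    yield key, len(parts) - 2         # words, not counting the two codes

def reducer(key, values):
    # Sum all word counts emitted for the same letter pair.
    yield key, sum(values)

def run_local(lines):
    # Hypothetical driver: group mapper output by key, then reduce each group.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reducer(key, values))
    return results

print(run_local(L))
```

With the three example lines every pair occurs once, so the reducer has nothing to add up; it only matters once the same pair appears on several lines.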
I want to sort the following dictionary by the values stored under the key score, each of which is a list.
student = {"name" : [["Peter"], ["May"], ["Sharon"]],
           "score" : [[1,5,3], [3,2,6], [5,9,2]]}
For a better representation:
Peter ---- [1,5,3]
May ---- [3,2,6]
Sharon ---- [5,9,2]
Here I want to use the second element of each score list to sort the names into a list.
The expected result is:
name_list = ("May", "Peter", "Sharon")
I have tried to use
sorted_x = sorted(student.items(), key=operator.itemgetter(1))
and
for x in sorted(student, key=lambda k: k['score'][1]):
    name_list.append(x['name'])
but neither works.
First, pair up the student names and scores using zip:
zip(student["name"], student["score"])
This gives you a more convenient representation (in Python 3, wrap the call in list() to see it):
[(['Peter'], [1, 5, 3]),
 (['May'], [3, 2, 6]),
 (['Sharon'], [5, 9, 2])]
Then sort this list by the second score and extract the student names:
In [10]: [ i[0][0] for i in sorted(zip(student["name"], student["score"]), key=lambda x: x[1][1])]
Out[10]: ['May', 'Peter', 'Sharon']
Read up on the built-in sorted function first: https://docs.python.org/2/howto/sorting.html#sortinghowto
This would probably work for you:
student = {
    "name": [["Peter"], ["May"], ["Sharon"]],
    "score": [[1,5,3], [3,2,6], [5,9,2]]
}
# pair up the names and scores
# this gives one tuple for each student:
# the first item is their name as a 1-item list
# the second item is their list of scores
pairs = zip(student['name'], student['score'])
# sort the tuples by the second score:
pairs_sorted = sorted(
    pairs,
    key=lambda t: t[1][1]
)
# get the names, in order
names = [n[0] for n, s in pairs_sorted]
print(names)
# ['May', 'Peter', 'Sharon']
If you want to use dictionaries, I suggest using OrderedDict:
name = ["Peter", "May", "Sharon"]
score = [[1,5,3], [3,2,6], [5,9,2]]
d = {n: s for (n, s) in zip(name, score)}
from collections import OrderedDict
ordered = OrderedDict(sorted(d.items(), key=lambda t: t[1][1]))
list(ordered) # to retrieve the names
But, if not, the following would be a simpler approach:
name = ["Peter", "May", "Sharon"]
score = [[1,5,3], [3,2,6], [5,9,2]]
d = [(n, s) for (n, s) in zip(name, score)]
ordered = sorted(d, key=lambda t: t[1][1])
names_ordered = [item[0] for item in ordered]
I'm using a dictionary with the id as the key and a list of names as the value. What I'm trying to do is collect, in a single list, the names from every value list that contains a given name. For example, with the name tim:
{'id 1': ['timmeh', 'user543', 'tim'], 'id 2': ['tim', 'timmeh', '!anon0543']}
whois_list = ['timmeh', 'user543', 'tim', '!anon0543']
The bot appends only the names that are not in the list yet. This is the code that produces this example:
def who(name):
    whois_list = []
    if not any(l for l in whois.whoisDB.values() if name.lower() in l):
        return "No alias found for <b>%s</b>." % name.title()
    else:
        for l in whois.whoisDB.values():
            if name.lower() in l:
                for names in l:
                    if names not in whois_list:
                        whois_list.append(names)
        return "Possible alias found for <b>%s</b>: %s" % (name.title(), whois_list)
The issue is: I do not want to have a double loop in this code, but I'm not really sure how to do it, if it's possible.
A logically equivalent, but shorter and more efficient solution is to use sets instead of lists.
Your innermost for loop simply extends whois_list with every non-duplicate name in l. If you originally define whois_list = set([]) then you can replace the three lines of the inner for loop with:
whois_list = whois_list.union(l)
For example,
>>> a = set([1,2,3])
>>> a = a.union([3,4,5])
>>> a
set([1, 2, 3, 4, 5])
You'll notice a prints out slightly differently, indicating that it is a set instead of a list. If this is a problem, you could convert it right before your return statement as in
>>> a = list(a)
>>> a
[1, 2, 3, 4, 5]
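Put together, the set-based version could look like the sketch below. The function name find_aliases and its whois_db parameter are hypothetical: they stand in for who() and whois.whoisDB so that the example runs on its own. The result is sorted only to make the output order predictable, since sets are unordered.

```python
def find_aliases(whois_db, name):
    """Collect every name that shares an id with `name` (hypothetical helper)."""
    wanted = name.lower()
    aliases = set()
    for names in whois_db.values():
        if wanted in names:
            # The set drops duplicates, so no inner loop is needed.
            aliases = aliases.union(names)
    return sorted(aliases)

whois_db = {
    'id 1': ['timmeh', 'user543', 'tim'],
    'id 2': ['tim', 'timmeh', '!anon0543'],
}
print(find_aliases(whois_db, 'Tim'))
```

An empty result plays the role of the "No alias found" branch, so the separate any() pass over the dictionary is no longer required.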
I have a list of tuples, for example.
[('ABC', 'Abcair', 1.50), ('DEF', 'Defir', 5.60), ('GHI', 'Ghiair',3.22), ('ANZ', 'Anzplace', 26.25), ('ARG', 'Argair', 12.22), ('CEN', 'Cenair', 11.22), ('CNU', 'Cununun',3.01)]
I have an input command as such
code_input = input('Please list portfolio: ').upper()
Where a person will input any number of 3 letter codes separated by a comma, which I then format using
no_spaces_codes = code_input.replace(" ", "")
code_list = no_spaces_codes.split(",")
So, "Ank , ABc,DEF" becomes ['ANK', 'ABC', 'DEF']
Then I print these headings formatted
header="{0:<6}{1:<20}{2:>8}".format("Code","Place","Number")
print(header)
I then need to search the list of tuples for the 3-letter codes and print the matching values under the headings, formatted the same way. Codes that are not in the list should not be printed. For example:
Code  Name                   Price
ABC   Abcair                  5.30
DEF   Defair                 11.22
I have gotten this far.
for code in b:
    if code[0] == (code_list[1]):
        print(code[:])
        break
Which prints
Code Name Price
('CEN', 'Contact', 11.22)
But I cannot get any further than this.
You can do that with:
place, price = next((c[1:] for c in b if c[0] == code_input), ('Not found', 0))
but you really want to use a dictionary instead:
code_dict = {k: (v, p) for k, v, p in b}
after which matching becomes a simple lookup:
place, price = code_dict.get(code_input, ('Not found', 0))
Demonstration:
>>> b = [('ABC', 'Abcair', 1.50), ('DEF', 'Defir', 5.60), ('GHI', 'Ghiair',3.22), ('ANZ', 'Anzplace', 26.25), ('ARG', 'Argair', 12.22), ('CEN', 'Cenair', 11.22), ('CNU', 'Cununun',3.01)]
>>> code_input = 'CEN'
>>> place, price = next((c[1:] for c in b if c[0] == code_input), ('Not found', 0))
>>> print(code_input, place, price)
CEN Cenair 11.22
>>> code_dict = {k: (v, p) for k, v, p in b}
>>> place, price = code_dict.get(code_input, ('Not found', 0))
>>> print(code_input, place, price)
CEN Cenair 11.22
With the code_dict mapping, lookups will be much, much faster when doing multiple lookups, especially when there are non-existing entries in the list. To put this together with the rest of your code:
code_input = input('Please list portfolio: ').upper()
code_dict = {k: (v, p) for k, v, p in b}
line = "{0:<6}{1:<20}{2:>8}"
print(line.format("Code", "Place", "Number"))
for code in code_input.split(','):
    code = code.strip()
    if code not in code_dict:
        continue  # skip codes not in the mapping
    place, price = code_dict[code]
    print(line.format(code, place, price))
Which for your "Ank , ABc,DEF" input would print:
Code  Place                 Number
ABC   Abcair                   1.5
DEF   Defir                    5.6
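For reuse or testing, the lookup and the formatting can be wrapped in a function that returns the lines instead of printing them. This is only a sketch: the name portfolio_lines is hypothetical, and formatting the number with two decimals is an assumption made to match the 5.30 style shown in the question.

```python
def portfolio_lines(records, code_input):
    """Return the header plus one formatted line per known code (hypothetical helper)."""
    line = "{0:<6}{1:<20}{2:>8}"
    code_dict = {code: (place, price) for code, place, price in records}
    rows = [line.format("Code", "Place", "Number")]
    for code in code_input.upper().split(","):
        code = code.strip()
        if code not in code_dict:
            continue  # unknown codes are silently skipped
        place, price = code_dict[code]
        # Assumption: always show two decimals, e.g. 1.5 -> 1.50
        rows.append(line.format(code, place, "{:.2f}".format(price)))
    return rows

b = [('ABC', 'Abcair', 1.50), ('DEF', 'Defir', 5.60), ('GHI', 'Ghiair', 3.22)]
for row in portfolio_lines(b, "Ank , ABc,DEF"):
    print(row)
```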
result = [v for v in list_of_tuples if v[0] in code_list]
for v in result:
    print(v)  # Or format 'v' tuple in any way you want.
I'm creating a class where one of the methods inserts a new item into a sorted list. The item must be inserted at the correct (sorted) position. I'm not allowed to use any built-in list functions or methods other than [], [:], +, and len, though. This is the part that's really confusing to me.
What would be the best way to go about this?
Use the insort function of the bisect module:
import bisect
a = [1, 2, 4, 5]
bisect.insort(a, 3)
print(a)
Output
[1, 2, 3, 4, 5]
Hint 1: You might want to study the Python code in the bisect module.
Hint 2: Slicing can be used for list insertion:
>>> s = ['a', 'b', 'd', 'e']
>>> s[2:2] = ['c']
>>> s
['a', 'b', 'c', 'd', 'e']
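Combining both hints, here is a minimal sketch that stays within the stated restriction ([], [:], + and len only). The function name insert_sorted is hypothetical. It finds the position with a binary search and builds a new list by slicing and concatenation, so the original list is left unchanged.

```python
def insert_sorted(items, value):
    """Return a new sorted list with `value` added; `items` must already be sorted."""
    lo, hi = 0, len(items)
    # Binary search: narrow [lo, hi) down to the insertion point.
    while lo < hi:
        mid = (lo + hi) // 2
        if value < items[mid]:
            hi = mid
        else:
            lo = mid + 1   # equal items end up to the left of the new value
    # Only slicing and + are used to build the result.
    return items[:lo] + [value] + items[lo:]

print(insert_sorted([1, 2, 4, 5], 3))
```

Inside a class, the method would assign the returned list back to the attribute that holds the data.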
You should use the bisect module. Note that the list needs to be sorted before you call bisect.insort_left.
Compared with appending and re-sorting, the speed difference is pretty big:
>>> import bisect, timeit
>>> l = [0, 2, 4, 5, 9]
>>> bisect.insort_left(l, 8)
>>> l
[0, 2, 4, 5, 8, 9]
>>> timeit.timeit("l.append(8); l = sorted(l)", setup="l = [4,2,0,9,5]; import bisect; l = sorted(l)", number=10000)
1.2235019207000732
>>> timeit.timeit("bisect.insort_left(l,8)", setup="l = [4,2,0,9,5]; import bisect; l=sorted(l)", number=10000)
0.041441917419433594
I'm learning algorithms right now, so I wondered how the bisect module is written.
Here is the code from the bisect module that inserts an item into a sorted list, using binary search (bisection):
def insort_right(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)
If there are no artificial restrictions, bisect.insort() should be used as described by stanga. However, as Velda mentioned in a comment, most real-world problems go beyond sorting pure numbers.
Fortunately, as commented by drakenation, the solution applies to any comparable objects. For example, bisect.insort() also works with a custom dataclass that implements __lt__():
from bisect import insort
from dataclasses import dataclass
@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    def __lt__(self, other):
        # order persons by age only
        return self.age < other.age
persons = []
insort(persons, Person('John', 'Doe', 30))
insort(persons, Person('Jane', 'Doe', 28))
insort(persons, Person('Santa', 'Claus', 1750))
# [Person(first_name='Jane', last_name='Doe', age=28), Person(first_name='John', last_name='Doe', age=30), Person(first_name='Santa', last_name='Claus', age=1750)]
However, in the case of tuples, it would be desirable to sort by an arbitrary key. By default, tuples are sorted by their first item (first name), then by the next item (last name), and so on.
As a solution you can manage an additional list of keys:
from bisect import bisect
persons = []
ages = []
def insert_person(person):
    age = person[2]
    i = bisect(ages, age)
    persons.insert(i, person)
    ages.insert(i, age)
insert_person(('John', 'Doe', 30))
insert_person(('Jane', 'Doe', 28))
insert_person(('Santa', 'Claus', 1750))
Official solution: The documentation of bisect.insort() links to a recipe that shows how to use the function to implement this functionality in a custom class, SortedCollection, which can then be used as follows:
>>> s = SortedCollection(key=itemgetter(2))
>>> for record in [
...         ('roger', 'young', 30),
...         ('angela', 'jones', 28),
...         ('bill', 'smith', 22),
...         ('david', 'thomas', 32)]:
...     s.insert(record)
>>> pprint(list(s))  # show records sorted by age
[('bill', 'smith', 22),
 ('angela', 'jones', 28),
 ('roger', 'young', 30),
 ('david', 'thomas', 32)]
Following is the relevant extract of the class required to make the example work. Basically, the SortedCollection manages an additional list of keys in parallel to the items list to find out where to insert the new tuple (and its key).
from bisect import bisect_left
class SortedCollection(object):
    def __init__(self, iterable=(), key=None):
        self._given_key = key
        key = (lambda x: x) if key is None else key
        decorated = sorted((key(item), item) for item in iterable)
        self._keys = [k for k, item in decorated]
        self._items = [item for k, item in decorated]
        self._key = key
    def __getitem__(self, i):
        return self._items[i]
    def __iter__(self):
        return iter(self._items)
    def insert(self, item):
        'Insert a new item. If equal keys are found, add to the left'
        k = self._key(item)
        i = bisect_left(self._keys, k)
        self._keys.insert(i, k)
        self._items.insert(i, item)
Note that list.insert() as well as bisect.insort() have O(n) complexity, because the elements after the insertion point must be shifted. Thus, as commented by nz_21, manually iterating through the sorted list, looking for the right position, would be just as good in terms of complexity. In fact, simply sorting the list after appending a new value will probably be fine, too: Python's Timsort has a worst-case complexity of O(n log(n)) and is adaptive, so it runs in close to linear time on input that is already almost sorted. For completeness, however, note that a balanced binary search tree (BST) would allow insertions in O(log(n)) time.
This is a possible solution for you:
a = [15, 12, 10]
b = sorted(a)
print(b)  # --> [10, 12, 15]
c = 13
i = len(b)  # default: append at the end if no larger element is found
for j in range(len(b)):
    if b[j] > c:
        i = j
        break
d = b[:i] + [c] + b[i:]
print(d)  # --> [10, 12, 13, 15]
# function to insert a number into a sorted (non-empty) list
def pstatement(value_returned):
    return print('new sorted list =', value_returned)
# the names items and result avoid shadowing the built-ins input and list
def insert(items, n):
    print('input list = ', items)
    print('number to insert = ', n)
    print('range to iterate is =', len(items))
    first = items[0]
    print('first element =', first)
    last = items[-1]
    print('last element =', last)
    if first > n:
        result = [n] + items[:]
        return pstatement(result)
    elif last < n:
        result = items[:] + [n]
        return pstatement(result)
    else:
        for i in range(len(items)):
            if items[i] > n:
                break
        result = items[:i] + [n] + items[i:]
        return pstatement(result)
# Input values
listq = [2, 4, 5]
n = 1
insert(listq, n)
There are many ways to do this. Here is a simple, naive program that does it using the built-in Python function sorted():
def sorted_inserter():
    list_in = []
    n1 = int(input("How many items in the list : "))
    for i in range(n1):
        e1 = int(input("Enter numbers in list : "))
        list_in.append(e1)
    print("The input list is : ", list_in)
    print("Any more items to be inserted ?")
    n2 = int(input("How many more numbers to be added ? : "))
    for j in range(n2):
        e2 = int(input("Add more numbers : "))
        list_in.append(e2)
    list_sorted = sorted(list_in)
    print("The sorted list is: ", list_sorted)
sorted_inserter()
The output is
How many items in the list : 4
Enter numbers in list : 1
Enter numbers in list : 2
Enter numbers in list : 123
Enter numbers in list : 523
The input list is : [1, 2, 123, 523]
Any more items to be inserted ?
How many more numbers to be added ? : 1
Add more numbers : 9
The sorted list is: [1, 2, 9, 123, 523]
To add to the existing answers: when you want to insert an element into a list of tuples where the first element is comparable and the second is not, you can use the key parameter of the bisect.insort function (available since Python 3.10) as follows:
import bisect
class B:
    pass
a = [(1, B()), (2, B()), (3, B())]
bisect.insort(a, (3, B()), key=lambda x: x[0])
print(a)
Without the lambda passed as the key argument of bisect.insort, the code would throw a TypeError, because the function would try to compare the second elements of the tuples as a tie breaker, and those aren't comparable by default.
This is one way to build the list and then insert a value into the sorted list:
a = []
num = int(input('How many numbers: '))
for n in range(num):
    numbers = int(input('Enter values:'))
    a.append(numbers)
b = sorted(a)
print(b)
c = int(input("enter value:"))
index = len(b)  # default: append at the end if no larger element is found
for i in range(len(b)):
    if b[i] > c:
        index = i
        break
d = b[:index] + [c] + b[index:]
print(d)