How do you nest list comprehensions - python

I have been assigned homework and have spent hours running in circles attempting to nest comprehensions. Specifically, I am attempting to find vowels in a string (say, S = 'This is an easy assignment') and have it return the vowels from the string in a list (so, [1], [1], [1], [2], [3])
I figured out quickly [len(x) for x in S.lower().split()]
to give the length of the words, but cannot successfully get it to produce the required output. This problem cannot use anything but list comprehensions.

Something like:
[[len([v for v in x if v in 'aeiou'])] for x in S.lower().split()]
Not sure why you'd want the counts to be single element lists themselves, but that would do it with only list comprehensions (plus a len call).
Note: A suggested edit was to use [sum(x.count(v) for v in 'aeiou') for x in S.lower().split()]; this is a reasonable way to do it (though x.count would mean traversing the string five times; in CPython it might still be faster, but in JIT-ed interpreters a single pass is likely superior), and to keep it use list comprehensions, not generator expressions, you'd have to pointlessly realize the list before summing: [sum([x.count(v) for v in 'aeiou']) for x in S.lower().split()]

Related

Index error trying to deduplicate a list only using if and for

My goal is to write a program that removes duplicated elements from a list – E.g., [2,3,4,5,4,6,5] →[2,3,4,5,6] without the function set (only if and for)
At the end of my code I got stuck. I have tried to change everything in the if statement but it got me to nowhere, same error repeating itself here:
n=int(input('enter the number of elements in your list '))
mylist=[]
for i in range (n):
for j in range (n):
ele=input(' ')
if mylist[i]!=mylist[j]: ***here is the error exactly , I dont really understand what does the out-of-range above problem have to do with the if statement right here***
mylist.append(ele)
print(mylist)
However, I changed nearly everything and I still got the following error:
if mylist[i]!=mylist[j]:
IndexError: list index out of range
Why does this issue keep coming back? ps: I cant use the function set because I am required to use if and for only
You can do this all with a list comprehension that checks if the value has previously appeared in the list slice that you have already looped over.
mylist = [1,2,3,3,4,5,6,4,1,2,4,3]
outlist = [x for index, x in enumerate(mylist) if x not in mylist[:index]]
print(outlist)
Output
[1, 2, 3, 4, 5, 6]
In case you're not familiar with list comprehensions yet, the above comprehension is functionally equivalent to:
outlist = []
for index, x in enumerate(mylist):
if x not in mylist[:index]:
outlist.append(x)
print(outlist)
A set would be more efficient, but
>>> def dedupe(L):
... seen = []
... return [seen.append(e) or e for e in L if e not in seen]
...
>>> dedupe([2,3,4,5,4,6,5])
[2, 3, 4, 5, 6]
This is not just a toy solution. Because sets can only contain hashable types, you might really have reason to resort to this kind of thing. This approach entails a linear search of the seen items, which is fine for small cases, but is slow compared to hashing.
It still may be possible to do better than this in some cases. Even if you can't hash them, if you can at least sort the seen elements, then you can speed up the search to logarithmic time by using a binary search. (See bisect).

Dictionary Comprehension from List of Dictionaries [duplicate]

I have a list of tuples with duplicates and I've converted them to a dictionary using this code I found here:
https://stackoverflow.com/a/61201134/2415706
mylist = [(a,1),(a,2),(b,3)]
result = {}
for i in mylist:
result.setdefault(i[0],[]).append(i[1])
print(result)
>>> result = {a:[1,2], b:[3]}
I recall learning that most for loops can be re-written as comprehensions so I wanted to practice but I've failed for the past hour to make one work.
I read this: https://stackoverflow.com/a/56011919/2415706 and now I haven't been able to find another library that does this but I'm also not sure if this comprehension I want to write is a bad idea since append mutates things.
Comprehension is meant to map items in a sequence independent of each other, and is not suitable for aggregations such as the case in your question, where the sub-list an item appends to depends on a sub-list that a previous item appends to.
You can produce the desired output with a nested comprehension if you must, but it would turn what would've been solved in O(n) time complexity with a loop into one that takes O(n ^ 2) instead:
{k: [v for s, v in mylist if s == k] for k, _ in mylist}

python array creation syntax [for in range]

I came across the following syntax to create a python array. It is strange to me.
Can anyone explain it to me? And how should I learn this kind of syntax?
[str(index) for index in range(100)]
First of all, this is not an array. This is a list. Python does have built-in arrays, but they are rarely used (google the array module, if you're interested). The structure you see is called list comprehension. This is the fastest way to do vectorized stuff in pure Python. Let's get through some examples.
Simple list comprehensions are written this way:
[item for item in iterable] - this will build a list containing all items of an iterable.
Actually, you can do something with each item using an expression or a function: [item**2 for item in iterable] - this will square each element, or [f(item) for item in iterable] - f is a function.
You can even add if and else statements like this [number for number in xrange(10) if not number % 2] - this will create a list of even numbers; ['even' if not number % 2 else 'odd' for number in range(10)] - this is how you use else statements.
You can nest list comprehensions [[character for character in word] for word in words] - this will create a list of lists. List comprehensions are similar to generator expressions, so you should google Python docs for additional information.
List comprehensions and generator expressions are among the most powerful and valuable Python features. Just start an interactive session and play for a while.
P.S.
There are other types of comprehensions that create sets and dictionaries. They use the same concept. Google them for additional information.
List comprehension itself is concept derived from mathematics' set comprehension, where to get new set, you specify parent set and the rule to filter out its elements.
In its simplest but full form list comprehension looks like this:
[f(i) for i in range(1000) if i % 2 == 0]
range(1000) - set of values you iterates through. It could be any iterable (list, tuple etc). range is just a function, which returns list of consecutive numbers, e.g. range(4) -> [0, 1, 2, 3]
i - variable will be assigned on each iteration.
if i%2 == 0 - rule condition to filter values. If condition is not True, resulting list will not contain this element.
f(i) - any python code or function on i, result of which will be in resulting list.
For understand concept of list comprehensions, try them out in python console, and look at output. Here is some of them:
[i for i in [1,2,3,4]]
[i for i in range(10)]
[i**2 for i in range(10)]
[max(4, i) for i in range(10)]
[(1 if i>5 else -1) for i in range(10)]
[i for i in range(10) if i % 2 == 0]
I recommend you to unwrap all comprehensions you face into for-loops to better understand their mechanics and syntax until you get used to them. For example, your comprehension can be unwrapped this way:
newlist = []
for index in range(100)
newlist.append(str(index))
I hope it's clear now.

python list.iteritems replacement

I've got a list in which some items shall be moved into a separate list (by a comparator function). Those elements are pure dicts. The question is how should I iterate over such list.
When iterating the simplest way, for element in mylist, then I don't know the index of the element. There's no .iteritems() methods for lists, which could be useful here. So I've tried to use for index in range(len(mylist)):, which [1] seems over-complicated as for python and [2] does not satisfy me, since range(len()) is calculated once in the beginning and if I remove an element from the list during iteration, I'll get IndexError: list index out of range.
Finally, my question is - how should I iterate over a python list, to be able to remove elements from the list (using a comparator function and put them in another list)?
You can use enumerate function and make a temporary copy of the list:
for i, value in enumerate(old_list[:]):
# i == index
# value == dictionary
# you can safely remove from old_list because we are iterating over copy
Creating a new list really isn't much of a problem compared to removing items from the old one. Similarly, iterating twice is a very minor performance hit, probably swamped by other factors. Unless you have a very good reason to do otherwise, backed by profiling your code, I'd recommend iterating twice and building two new lists:
from itertools import ifilter, ifilterfalse
l1 = list(ifilter(condition, l))
l2 = list(ifilterfalse(condition, l))
You can slice-assign the contents of one of the new lists into the original if you want:
l[:] = l1
If you're absolutely sure you want a 1-pass solution, and you're absolutely sure you want to modify the original list in place instead of creating a copy, the following avoids quadratic performance hits from popping from the middle of a list:
j = 0
l2 = []
for i in range(len(l)):
if condition(l[i]):
l[j] = l[i]
j += 1
else:
l2.append(l[i])
del l[j:]
We move each element of the list directly to its final position without wasting time shifting elements that don't really need to be shifted. We could use for item in l if we wanted, and it'd probably be a bit faster, but when the algorithm involves modifying the thing we're iterating over, I prefer the explicit index.
I prefer not to touch the original list and do as #Martol1ni, but one way to do it in place and not be affected by the removal of elements would be to iterate backwards:
for i in reversed(range(len()):
# do the filtering...
That will affect only the indices of elements that you have tested/removed already
Try the filter command, and you can override the original list with it too if you don't need it.
def cmp(i): #Comparator function returning a boolean for a given item
...
# mylist is the initial list
mylist = filter(cmp, mylist)
mylist is now a generator of suitable items. You can use list(mylist) if you need to use it more than once.
Haven't tried this yet but.. i'll give it a quick shot:
new_list = [old.pop(i) for i, x in reversed(list(enumerate(old))) if comparator(x)]
You can do this, might be one line too much though.
new_list1 = [x for x in old_list if your_comparator(x)]
new_list2 = [x for x in old_list if x not in new_list1]

Fast String within List Searching

Using Python 3, I have a list containing over 100,000 strings (list1), each 300 characters long at most. I also have a list of over 9 million substrings (list2)--I want to count how many elements a substring in list2 appears in. For instance,
list1 = ['cat', 'caa', 'doa', 'oat']
list2 = ['at', 'ca', 'do']
I want the function to return (mapped to list2):
[2, 2, 1]
Normally, this is very simple and requires very little. However, due to the massive size of the lists, I have efficiency issues. I want to find the fastest way to return that counter list.
I've tried list comprehensions, generators, maps, loops of all kinds and I have yet to find an fast way to do this easy task. What is theoretically the fastest way to complete this objective, preferably taking O(len(list2)) steps very quickly?
set M = len(list1) and N = len(list2)
For each of the N entries in list2 you're going to have to do M comparisons against the entries in list1. That's a worst case running time of O(M x N). If you take it further, lets take each entry in list2 to be of length 1 and each entry in list1 to be of length 300, then you got a running time of O(300M x N).
If performance is really an issue, try dynamic programming. Here is a start:
1) sort list2 in ascending order of length like so:
['scorch', 'scorching', 'dump', 'dumpster', 'dumpsters']
2) sort them into sublists such that each preceeding entry is a subset of the proceeding entry like so:
[['scorch', 'scorching'] , ['dump', 'dumpster', 'dumpsters']]
3) Now if you compare against list1 and 'scorch' is not in there, then you don't have to search for 'scorching' either. Likewise if 'dump' is not in there, neither is 'dumpster' or 'dumpsters'
note the worst case run time is still the same
I believe this task can be solved in linear time with an Aho Corasick string matching machine.
See this answer for additional information (maybe you get ideas from the other answers to that question, too - it is almost the same task and I think Aho Corasick is the theoretically fastest way to solve this).
You will have to modify the string matching machine in such way, that instead of returning the match it increases the counter of every matched substring by one. (This should be only a minor modification).
Not sure how you could avoid having some sort of O(n**2) algorithm. Here is a simple implementation.
>>> def some_sort_of_count(list1, list2):
>>> return [sum(x in y for y in list1) for x in list2]
>>>
>>> list1 = ['cat', 'caa', 'doa', 'oat']
>>> list2 = ['at', 'ca', 'do']
>>> some_sort_of_count(list1, list2)
[2, 2, 1]

Categories