How do I flatten nested lists of lists in python? - python

I have created a function to split paths into lists of directories in python like so:
splitAllPaths = lambda path: flatten([[splitAllPaths(start), end] if start else end for (start, end) in [os.path.split(path)]])
with this helper function:
# this only works one level deep
def flatten(list_of_lists):
    return list(itertools.chain.from_iterable(list_of_lists))
The output from this function looks like so:
> splitAllPaths('./dirname/dirname2/foo.bar')
[[[['.'], 'dirname'], 'dirname2'], 'foo.bar']
Now I want this as a flat list. My attempts are as follows (with the output):
> flatten(splitAllPaths('./dirname/dirname2/foo.bar'))
['.', 'd', 'i', 'r', 'n', 'a', 'm', 'e', 'd', 'i', 'r', 'n', 'a', 'm', 'e', '2', 'f', 'o', 'o', '.', 'b', 'a', 'r']
and
> reduce(list.__add__, (list(mi) for mi in splitAllPaths('./dirname/dirname2/foo.bar')))
[[['.'], 'dirname'], 'dirname2', 'f', 'o', 'o', '.', 'b', 'a', 'r']
How do I unfold this list correctly (I would also welcome any suggestions for how to improve my splitAllPaths function)?

This is a less general answer, but it solves your original problem, although its elegance is debatable.
The main idea is that generating the list in reversed order (as in ['file', 'user', 'home', '/']) is quite easy, so you can just create that and reverse it at the end. So it boils down to:
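import os

def split_paths(path):
    def split_paths_reverse(path):
        # Peel components off the right-hand end, so they come out last-to-first.
        head, tail = os.path.split(path)
        while head and tail:
            yield tail
            head, tail = os.path.split(head)
        # What remains is either the root in head ('/') or the first
        # component of a relative path in tail ('.').
        yield head or tail
    return reversed(tuple(split_paths_reverse(path)))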
Example:
test = '/home/user/file.txt'
print(list(split_paths(test)))
['/', 'home', 'user', 'file.txt']
You could also avoid the explicit reversing part by putting each element in a stack and then removing them, but that's up to you.
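On Python 3.4 or later, the standard library's pathlib can do the same split without a hand-written loop. This is only a minimal sketch and not the approach of the answer above: `PurePosixPath` is chosen here so the result does not depend on the operating system, and the helper name `path_parts` is made up for illustration. Note that pathlib drops a leading './' when it normalizes the path.

```python
from pathlib import PurePosixPath

def path_parts(path):
    """Return the components of a POSIX-style path as a flat list."""
    return list(PurePosixPath(path).parts)

absolute = path_parts('/home/user/file.txt')
relative = path_parts('./dirname/dirname2/foo.bar')
print(absolute)  # ['/', 'home', 'user', 'file.txt']
print(relative)  # ['dirname', 'dirname2', 'foo.bar']
```

If the leading '.' matters to you, the os.path.split loop above keeps it, whereas pathlib does not.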

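The shortest way that comes to mind would be the following (Python 2 only, since it relies on the two-argument form of str.translate, and it breaks if a name contains a bracket, quote, or comma):
listoflists = [[[['.'], 'dirname'], 'dirname2'], 'foo.bar']
str(listoflists).translate(None, "[]'").split(', ')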

I solved this by writing a (non-general) foldr. I think better, more practical solutions are provided by #L3viathan in the comments.
attempt = lambda list: attempt(list[0] + list[1:]) if len(list[0]) > 1 else list[0] + list[1:]
Output
> attempt([[[['.'], 'dirname'], 'dirname2'], 'foo.bar'])
['.', 'dirname', 'dirname2', 'foo.bar']
I've also now written it in terms of a general foldr1
> foldr1 = lambda func, list: foldr1(func, func(list[0], list[1:])) if len(list[0]) > 1 else func(list[0], list[1:])
> foldr1(list.__add__, [[[['.'], 'dirname'], 'dirname2'], 'foo.bar'])
['.', 'dirname', 'dirname2', 'foo.bar']
NOTE: Could someone more familiar with folds than me confirm that this is a foldr and not a foldl? (I often get them confused.)
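For nesting of arbitrary depth, a recursive generator is a more general alternative to the folds above. The following is a minimal sketch assuming Python 3.3+ (for `yield from`); the name `flatten_deep` is hypothetical. Strings are deliberately treated as leaves, which avoids the character-by-character splitting seen in the question.

```python
def flatten_deep(nested):
    """Yield the leaves of arbitrarily nested lists/tuples, left to right."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            # Recurse into containers; strings are not treated as containers.
            yield from flatten_deep(item)
        else:
            yield item

flat = list(flatten_deep([[[['.'], 'dirname'], 'dirname2'], 'foo.bar']))
print(flat)  # ['.', 'dirname', 'dirname2', 'foo.bar']
```

Because it recurses once per nesting level, very deeply nested input could hit Python's recursion limit.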

Related

Is passing an array slice to a function in Python an O(1) or O(N) operation?

I have a function that takes an input string of characters and reverses them according to the white space breaks.
For example:
input: arr = [ 'p', 'e', 'r', 'f', 'e', 'c', 't', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'r', 'a', 'c', 't', 'i', 'c', 'e' ]
output: [ 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'e', 'r', 'f', 'e', 'c', 't' ]
To reverse the 'words', I use the following function
def reverse_word(arr):
    i = 0
    j = len(arr) - 1
    while i < j:
        arr[j], arr[i] = arr[i], arr[j]
        i += 1
        j -= 1
    return arr
def reverse_words(arr):
    arr.reverse()
    p1 = 0
    for i, v in enumerate(arr):
        if v == ' ':
            if arr[p1] != ' ':
                arr[p1:i] = reverse_word(arr[p1:i])
            p1 = i + 1
    arr[p1:] = reverse_word(arr[p1:])
    return arr
My question is: Is the call to reverse an O(1) or O(N) space operation? I assumed O(N), but someone else said it was O(1). I assumed O(N) because in the worst case, with one word, the entire array will need to be copied onto the call stack. Space is not "constant" because the space allocated to the call depends on the input length.
To answer your question first: yes, the reverse function you defined is an O(1) space operation (even though it's wrong and will never end). The reason is that when you pass a list to a function in Python, the list is not copied; only a reference to it is passed (a pointer, if you are familiar with C concepts). So no matter how long your array is, the space used by the call itself is constant.
Your question is meaningful on its own, but in this program it does not change the overall bound. The big-O space and time of an algorithm are determined by its most expensive part, and your code has other operations that use O(N) space. For example, each slice such as arr[p1:i] builds a new list before it is passed to reverse_word.
BTW, it's not best practice to define a function with the same name as another method you may use (in this case, reverse).
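To keep the extra space constant, the slices can be replaced by index bounds into the one shared list. The sketch below is one way to do that, not the asker's code; the names `reverse_range` and `reverse_words_in_place` are made up for illustration.

```python
def reverse_range(arr, lo, hi):
    """Reverse arr[lo..hi] (inclusive) in place, without slicing."""
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo += 1
        hi -= 1

def reverse_words_in_place(arr):
    """Reverse the order of space-separated words using O(1) extra space."""
    arr.reverse()  # list.reverse() works in place
    start = 0
    for i in range(len(arr) + 1):
        # A space or the end of the list closes the current word.
        if i == len(arr) or arr[i] == ' ':
            reverse_range(arr, start, i - 1)
            start = i + 1
    return arr

chars = list('perfect makes practice')
piece = chars[0:7]               # slicing copies: piece is a new list
result = reverse_words_in_place(chars)
print(piece is chars)            # False
print(result is chars)           # True
print(''.join(result))           # practice makes perfect
```

Passing `chars` hands over a reference to the same list, while `chars[0:7]` allocates a new list whose size grows with the slice.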

Cannot find glitch in program using recursion for multiple nested for-loops

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w', 'x', 'y', 'z']
endlist = []
def loopfunc(n, lis):
    if n ==0:
        endlist.append(lis[0]+lis[1]+lis[2]+lis[3]+lis[4])
    for i in alphabet:
        if n >0:
            lis.append(i)
            loopfunc(n-1, lis )
loopfunc(5, [])
This program is supposed to make endlist be:
endlist = [aaaaa, aaaab, aaaac, ... zzzzy, zzzzz]
But it makes it:
endlist = [aaaaa, aaaaa, aaaaa, ... , aaaaa]
The length is right, but it won't make different words. Can anyone help me see why?
The only thing you ever add to endlist is the first 5 elements of lis, and since you have a single lis that is shared among all the recursive calls (note that you never create a new list in this code other than the initial values for endlist and lis, so every append to lis is happening to the same list), those first 5 elements are always the a values that you appended in your first 5 recursive calls. The rest of the alphabet goes onto the end of lis and is never reached by any of your other code.
Since you want strings in the end, it's a little easier just to use strings for collecting your items. This avoids the shared mutable reference that is causing your issue. With that, the recursion becomes pretty concise:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def loopfunc(n, lis=""):
    if n < 1:
        return [lis]
    res = []
    for a in alphabet:
        res.extend(loopfunc(n-1, lis + a))
    return res
l = loopfunc(5)
print(l[0], l[1], l[-1], l[-2])
# aaaaa aaaab zzzzz zzzzy
Note that with n=5 you'll have almost 12 million combinations. If you plan on having larger n values, it may be worth rewriting this as a generator.
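A generator version could look like the sketch below (Python 3.3+ for `yield from`). The names `ALPHABET` and `iter_words` are chosen here for illustration. It produces the words one at a time instead of building the whole list in memory.

```python
from itertools import islice

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def iter_words(n, prefix=''):
    """Lazily yield every n-letter word over ALPHABET in lexicographic order."""
    if n < 1:
        yield prefix
        return
    for letter in ALPHABET:
        yield from iter_words(n - 1, prefix + letter)

first_three = list(islice(iter_words(5), 3))
print(first_three)  # ['aaaaa', 'aaaab', 'aaaac']
```

The standard library offers the same sequence without explicit recursion: `(''.join(p) for p in itertools.product(ALPHABET, repeat=n))`.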

python function argument update

I am working on a function to print the list elements in reverse order using recursion. I came up with the following code
class Solution:
    def __init__(self):
        self.count=0
    def reverseString(self, s):
        def helper(s):
            """
            Do not return anything, modify s in-place instead.
            """
            print(s[:])
            if len(s)>1:
                s[0],s[len(s)-1]=s[len(s)-1],s[0]
                print('s[0]',s[0])
                print('s[len(s)-1]',s[len(s)-1])
                helper(s[1:len(s)-1])
        helper(s)
As you see, I am using print statements to debug the code. I get the following output
['h', 'e', 'l', 'p', 'o']
s[0] o
s[len(s)-1] h
['e', 'l', 'p']
s[0] p
s[len(s)-1] e
['l']
['o', 'e', 'l', 'p', 'h']
I see that my logic is working, but there is something fundamental I am missing about how variables are updated at the local and global level. Can someone explain why I am swapping the first and last list elements but my list output is not correct? I expect the output to be ['o', 'p', 'l', 'e', 'h']
On the other hand below modification seems to work fine
class Solution:
    def __init__(self):
        self.count=0
    def reverseString(self, s):
        def helper(left,right):
            """
            Do not return anything, modify s in-place instead.
            """
            print(s[:])
            if left<right:
                s[left],s[right]=s[right],s[left]
                print('s[0]',s[left])
                print('s[len(s)-1]',s[right])
                helper(left+1,right-1)
        helper(0,len(s)-1)
x=Solution()
s=["h","e","l","p","o"]
x.reverseString(s)
print(s)
['h', 'e', 'l', 'p', 'o']
s[0] o
s[len(s)-1] h
['o', 'e', 'l', 'p', 'h']
s[0] p
s[len(s)-1] e
['o', 'p', 'l', 'e', 'h']
['o', 'p', 'l', 'e', 'h']
I looked at the discussion Python inplace update of function arguments? and Immutable vs Mutable types which could possibly be related.
Your code swaps only the first and last elements of the original list. Every later swap happens on s[1:len(s)-1], and a slice is a new list, so those swaps never reach the list the caller holds. Your code should find a way to swap all the elements of the original list, not just the first and last.
I think something like the code below might do the trick.
Notice that the print comes after the recursive call. This is on purpose: as the call stack unwinds, the calls finish in reverse order, so the print statements placed after the recursive call run in reverse order too.
def reverse_print(list):
    if list: # As long as the list is not empty, proceed with the recursion / print.
        reverse_print(list[1:]) # Start from the next list element.
        print(list[0]) # The print statement is after the recursive call, on purpose.
    else: # The list shrinks by one element per call until it is empty.
        return
s = ["h", "e", "l", "p", "o"]
reverse_print(s)
When run, this prints the single-character strings in reverse order, one per line:
o
p
l
e
h
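The reason the first version in the question leaves the middle untouched can be shown in isolation: a slice is a new list, so in-place changes to it do not reach the original. A minimal sketch, where `swap_ends` is a hypothetical helper:

```python
def swap_ends(items):
    """Swap the first and last elements of items in place."""
    items[0], items[-1] = items[-1], items[0]

s = ['h', 'e', 'l', 'p', 'o']
middle = s[1:len(s) - 1]   # a new list: ['e', 'l', 'p']
swap_ends(middle)          # changes the copy only
print(middle)              # ['p', 'l', 'e']
print(s)                   # ['h', 'e', 'l', 'p', 'o']

swap_ends(s)               # the same list object the caller holds
print(s)                   # ['o', 'e', 'l', 'p', 'h']
```

This is why the index-based helper(left, right) in the question works: it always mutates the one shared list.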

Is that a tag list or something else?

I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
    # wordsUniqeTags holds the set of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        if wordsUniqeTags.has_key(w):
            wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
        else:
            wordsUniqeTags[w] = set([t])
    # Starting to count
    res = []
    for w in wordsUniqeTags:
        if len(wordsUniqeTags[w]) >= n:
            res.append((w, wordsUniqeTags[w]))
    return res
MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because who's and what's don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.
It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
    # wordsUniqeTags holds the set of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        wordsUniqeTags.setdefault(w, set()).add(t)
    # Starting to count
    return [(word,tags) for word,tags in wordsUniqeTags.iteritems() if len(tags) >= n]
You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).
How about making use of the Counter and defaultdict functionality in the collections module?
from collections import defaultdict, Counter
def most_ambiguous_words(words, n):
    counts = defaultdict(Counter)
    for (word,tag) in words:
        counts[word][tag] += 1
    # use the loop variable w here, and >= for "at least n" tags
    return [(w, counts[w].keys()) for w in counts if len(counts[w]) >= n]
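The answers above are written for Python 2 (has_key, iteritems). As a hedged sketch of the same idea on Python 3, a defaultdict of sets is enough when the tag counts themselves are not needed. The sample data below is made up and merely stands in for brown.tagged_words(), so the block runs without NLTK.

```python
from collections import defaultdict

def most_ambiguous_words(tagged_words, n):
    """Return (word, tags) pairs for words seen with at least n distinct tags."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        tags_by_word[word].add(tag)   # add the whole tag string, not its characters
    return [(word, tags) for word, tags in tags_by_word.items() if len(tags) >= n]

# Made-up sample data standing in for brown.tagged_words()
sample = [('that', 'CS'), ('that', 'DT'), ('that', 'WPS'), ('dog', 'NN'), ('dog', 'NN')]
ambiguous = most_ambiguous_words(sample, 3)
print(ambiguous)
```

Only 'that' qualifies here, since 'dog' was seen with a single tag.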

Ordered Sets Python 2.7

I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function. However, this reorders my list, which for my particular case is unacceptable.
Below is a function I wrote which does this. However, I'm wondering if there's a better/faster way. Also, any comments on it would be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None, and that the items are in order (ie, ['a', 'a', 'a', 'b', 'b', 'c', 'd'])
The above function returns ['a', 'a', 'a', 'b', 'b', 'c', 'd'] as ['a', 'b', 'c', 'd'].
Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None.
    return [item for item in lst
            if item not in dset and not dset.add(item)]
Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True
# prints a b c d
for x in d:
    print x,
print
Assuming the input sequence is unordered, here's an O(N) solution (in both space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    return OrderedDict((item,None) for item in _list).keys()
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
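On Python 3.7 and later, where plain dicts are guaranteed to keep insertion order, the same trick works without OrderedDict. This is a sketch for newer interpreters, not for the 2.7 setup in the question; the name `dedupe_keep_order` is made up, and the items must be hashable.

```python
def dedupe_keep_order(items):
    """Drop duplicates, keeping the first occurrence of each hashable item."""
    # dict keys are unique and, from Python 3.7, iterate in insertion order.
    return list(dict.fromkeys(items))

result = dedupe_keep_order(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
print(result)  # ['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
```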
I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
If your list isn't sorted, then your question doesn't make sense.
E.g. [1,2,1] could become [1,2] or [2,1].
If your list is large, you may want to write your result back into the same list using a slice assignment to save on memory:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
For in-place deletion, see Remove items from a list while iterating or Remove items from a list while iterating without using extra memory in Python.
One trick you can use: if you know x is sorted and you know x[i] == x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list).
So while you can't beat n operations if everything in the list is unique, i.e. len(set(x)) == len(x), there is probably an algorithm that has n comparisons as its worst case but n/2 comparisons as its best case (or fewer than n/2 if you somehow know in advance that len(x)/len(set(x)) > 2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer type approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x) == len(set(x)) it still uses only n operations.
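One possible reading of the binary-search idea, as a sketch only: for sorted input, the standard bisect module can jump straight past each run of equal values. This is an assumption about what the paragraph above has in mind, not the optimal algorithm it speculates about, and `dedupe_sorted` is a hypothetical name.

```python
from bisect import bisect_right

def dedupe_sorted(xs):
    """Remove duplicates from a sorted list by skipping each run of equal values."""
    result = []
    i = 0
    while i < len(xs):
        result.append(xs[i])
        # Binary search for the end of the current run, starting at i.
        i = bisect_right(xs, xs[i], i)
    return result

deduped = dedupe_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd'])
print(deduped)  # ['a', 'b', 'c', 'd']
```

With k distinct values this does roughly k binary searches, which helps when runs are long but is slower than the simple linear scan when almost every item is unique.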
There is a unique_everseen solution described in the itertools recipes:
http://docs.python.org/2/library/itertools.html
from itertools import ifilterfalse  # Python 2; named filterfalse in Python 3

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Looks ok to me. If you really want to use sets do something like this:
def ordered_set(_list):
    result = set()
    lastitem = None
    for item in _list:
        if item != lastitem:
            result.add(item)
            lastitem = item
    return sorted(tuple(result))
I don't know what performance you will get; you should test it. It is probably about the same because of the method-call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this (it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
