I am trying to solve a question on Pramp:
Implement a function reverseWords that reverses the order of the words in the array in the most efficient manner.
Ex: arr = [ 'p', 'e', 'r', 'f', 'e', 'c', 't', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'r', 'a', 'c', 't', 'i', 'c', 'e' ]
output: [ 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'e', 'r', 'f', 'e', 'c', 't' ]
The Python-like pseudo-code which they have given is as follows:
function reverseWords(arr):
    # reverse all characters:
    n = arr.length
    mirrorReverse(arr, 0, n-1)
    # reverse each word:
    wordStart = null
    for i from 0 to n-1:
        if (arr[i] == ' '):
            if (wordStart != null):
                mirrorReverse(arr, wordStart, i-1)
                wordStart = null
        else if (i == n-1):
            if (wordStart != null):
                mirrorReverse(arr, wordStart, i)
        else:
            if (wordStart == null):
                wordStart = i
    return arr
# helper function - reverses the order of items in arr
# please note that this is language dependent:
# if arrays are sent by value, reversing should be done in place
function mirrorReverse(arr, start, end):
    tmp = null
    while (start < end):
        tmp = arr[start]
        arr[start] = arr[end]
        arr[end] = tmp
        start++
        end--
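For reference, here is one way the pseudo-code might translate into runnable Python. It is my own line-by-line translation, not Pramp's code; the snake_case names `mirror_reverse` and `reverse_words` are my renamings. Python lists are passed by reference, so the helper can swap elements in place.

```python
def mirror_reverse(arr, start, end):
    """Reverse arr[start..end] (inclusive) in place by swapping from both ends."""
    while start < end:
        arr[start], arr[end] = arr[end], arr[start]
        start += 1
        end -= 1


def reverse_words(arr):
    n = len(arr)
    mirror_reverse(arr, 0, n - 1)  # pass 1: reverse all characters
    word_start = None
    for i in range(n):             # pass 2: reverse each word back
        if arr[i] == ' ':
            if word_start is not None:
                mirror_reverse(arr, word_start, i - 1)
                word_start = None
        elif i == n - 1:
            if word_start is not None:
                mirror_reverse(arr, word_start, i)
        elif word_start is None:
            word_start = i
    return arr


print("".join(reverse_words(list("perfect makes practice"))))  # practice makes perfect
```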
They say that the time complexity is O(n), essentially because they traverse the array twice with a constant number of actions for each item. Coincidentally, I came up with exactly the same approach using stringstreams in C++, but thought that it was not efficient!
I think the time complexity of this snippet should be O(mn), where m is the number of words in the string and n is the average number of letters in each word. My reasoning is that we iterate over all the elements in the input and, in the worst case, mirrorReverse() visits all the elements again for a given i.
Which is correct?
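In O(n), n refers to the length of the input (the total number of characters), not the number of words. Your m·n (number of words times average word length) is roughly that same total, so O(mn) and O(n) describe the same bound. I suspect the confusion comes from using n for the average word length, while the code's n = arr.length is the total character count. Also, mirrorReverse is not re-run over the whole array for every i: it is called once per word and touches only that word's characters, so the second pass does O(n) work in total.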
Note the explanation: "traversing the array ...". "The array" consists of individual characters.
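One way to check this is to count the swaps. The sketch below uses `count_swaps`, a hypothetical helper that is not part of the Pramp solution. It runs the same two passes on a copy of the input and tallies every swap. The first pass makes floor(n/2) swaps, and the second makes at most about n/2 more in total because each word is reversed exactly once, so the count never exceeds n.

```python
def count_swaps(arr):
    """Run the two-pass word reversal on a copy of arr and return how many swaps it makes."""
    arr = list(arr)  # work on a copy so the caller's data is untouched
    swaps = 0

    def mirror_reverse(start, end):
        nonlocal swaps
        while start < end:
            arr[start], arr[end] = arr[end], arr[start]
            swaps += 1
            start += 1
            end -= 1

    n = len(arr)
    mirror_reverse(0, n - 1)  # pass 1: floor(n/2) swaps
    word_start = None
    for i in range(n + 1):    # pass 2: i == n acts as a final word boundary
        if i == n or arr[i] == ' ':
            if word_start is not None:
                mirror_reverse(word_start, i - 1)  # each word is reversed exactly once
                word_start = None
        elif word_start is None:
            word_start = i
    return swaps


for words in (10, 100, 1000):
    text = " ".join(["word"] * words)
    print(len(text), count_swaps(text))  # 49 44, then 499 449, then 4999 4499
```

The swap count grows in step with the total length, not with (number of words) × (length).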
The implementation seems a bit silly to me; it's much more readable to:
Join letter groups into words.
Reverse the word order (including spaces) with a trivial list slice.
Expand the words back into characters.
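The three steps above might look like this in Python. This is a minimal sketch; `reverse_words_readable` is my own name. Unlike the two-pass version, it builds new lists and strings, so it uses O(n) extra space rather than working in place.

```python
def reverse_words_readable(arr):
    words = "".join(arr).split(" ")  # 1. join letter groups into words
    words = words[::-1]              # 2. reverse the word order with a slice
    return list(" ".join(words))     # 3. expand the words back into characters


print(reverse_words_readable(list("perfect makes practice")))
```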
I have a list "alphabet" that contains all the letters, and the program should take a word and a number given by the user and produce a new sequence of letters by shifting each letter by that number, e.g.:
Word input = "sun"
Shift_number input = 3
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
The output should be "vxq" because each index moves three places to the right. My problem is when the index moves past the end of the list, e.g.:
Word input = "zero"
Shift_number = 1
The output should be "afsp", but instead I get this error: "list index out of range". I just need the index to wrap around from "z" to "a".
Take the modulus to stay within the array bounds: index % 26 always falls in the range 0-25, which are exactly the valid indices of the 26-element alphabet list:
>>> "".join([alphabet[(alphabet.index(i) + 3) % 26] for i in "sun"])
'vxq'
>>> "".join([alphabet[(alphabet.index(i) + 1) % 26] for i in "zero"])
'afsp'
(alphabet.index(i) + N) % 26 will increment the index by N cyclically in your array.
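Wrapped into a function that takes the shift from the user, the same idea might look like the sketch below. `shift_word` is a hypothetical name, and the list is built from `string.ascii_lowercase`, which holds the same 26 letters as the hand-written list. Because Python's `%` returns a non-negative result for a positive modulus, negative shifts (decoding) also wrap around.

```python
import string

alphabet = list(string.ascii_lowercase)  # same letters as the hand-written list


def shift_word(word, shift):
    # (index + shift) % 26 keeps every index in 0..25, so 'z' + 1 wraps to 'a'
    return "".join(alphabet[(alphabet.index(ch) + shift) % len(alphabet)] for ch in word)


word = "zero"     # e.g. from input()
shift_number = 1  # e.g. int(input())
print(shift_word(word, shift_number))  # afsp
```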
Use itertools.cycle and string.ascii_lowercase:
from itertools import cycle
import string
circular_alphabet = cycle(string.ascii_lowercase)
"".join(next(circular_alphabet ) for _ in range(50))
'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx'
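On its own, `cycle` only repeats the alphabet. One way to apply it to the shift problem is to take 26 letters out of the cycle, starting at the shift, and build a translation table with `str.maketrans`. This combination is my own, not part of the answer above, and `shift_word_cycle` is a hypothetical name.

```python
import string
from itertools import cycle, islice


def shift_word_cycle(word, shift):
    letters = string.ascii_lowercase
    start = shift % 26  # islice needs a non-negative start
    # read 26 consecutive letters from the endless cycle, beginning at the shift
    shifted = "".join(islice(cycle(letters), start, start + 26))
    table = str.maketrans(letters, shifted)  # maps 'a' -> shifted[0], 'b' -> shifted[1], ...
    return word.translate(table)


print(shift_word_cycle("sun", 3))   # vxq
print(shift_word_cycle("zero", 1))  # afsp
```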
I have a function that takes an input array of characters and reverses the order of the words, using the spaces as word boundaries.
For example:
input: arr = [ 'p', 'e', 'r', 'f', 'e', 'c', 't', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'r', 'a', 'c', 't', 'i', 'c', 'e' ]
output: [ 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', ' ',
'm', 'a', 'k', 'e', 's', ' ',
'p', 'e', 'r', 'f', 'e', 'c', 't' ]
To reverse the 'words', I use the following function
def reverse_word(arr):
    i = 0
    j = len(arr) - 1
    while i < j:
        arr[j], arr[i] = arr[i], arr[j]
        i += 1
        j -= 1
    return arr

def reverse_words(arr):
    arr.reverse()
    p1 = 0
    for i, v in enumerate(arr):
        if v == ' ':
            if arr[p1] != ' ':
                arr[p1:i] = reverse_word(arr[p1:i])
            p1 = i + 1
    arr[p1:] = reverse_word(arr[p1:])
    return arr
My question is: is the call to reverse an O(1) or O(N) space operation? I assumed O(N), but someone else said it was O(1). I assumed O(N) because in the worst case, with a single word, the entire array would need to be copied onto the call stack. The space is not constant because the space allocated to the call depends on the input length.
To answer your question first: yes, the reverse function you defined is itself an O(1) space operation. When you pass a list to a function in Python, the list is not copied; the function receives a reference to it (like a pointer, if you are familiar with C). So no matter how long your array is, the call itself uses constant extra space.
Your question is meaningful on its own, but in this program it doesn't decide the answer. The big-O space of the whole algorithm is determined by its most expensive part, and your code has other steps that use O(N) space. For example, every slice such as arr[p1:i] builds a new list, so reverse_word(arr[p1:i]) operates on a copy; with a single word, that copy holds all N characters. (arr.reverse(), by contrast, reverses in place.)
By the way, it's not good practice to give a function the same name as a method you also use (in this case, reverse).
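If the goal is O(1) extra space, one option is to pass start and end indices to the helper instead of slices, so nothing is copied. Below is a sketch under that assumption. `reverse_range` and `reverse_words_in_place` are hypothetical names.

```python
def reverse_range(arr, i, j):
    """Reverse arr[i..j] (inclusive) in place, using only a couple of index variables."""
    while i < j:
        arr[i], arr[j] = arr[j], arr[i]
        i += 1
        j -= 1


def reverse_words_in_place(arr):
    arr.reverse()  # list.reverse() reverses in place and returns None
    start = 0
    for i in range(len(arr) + 1):
        if i == len(arr) or arr[i] == ' ':
            reverse_range(arr, start, i - 1)  # indices, not a slice, so nothing is copied
            start = i + 1
    return arr


chars = list("perfect makes practice")
reverse_words_in_place(chars)
print("".join(chars))  # practice makes perfect
```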
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
            'h', 'i', 'j', 'k', 'l', 'm', 'n',
            'o', 'p', 'q', 'r', 's', 't', 'u',
            'v', 'w', 'x', 'y', 'z']
endlist = []
def loopfunc(n, lis):
    if n == 0:
        endlist.append(lis[0]+lis[1]+lis[2]+lis[3]+lis[4])
    for i in alphabet:
        if n > 0:
            lis.append(i)
            loopfunc(n-1, lis)
loopfunc(5, [])
This program is supposed to make endlist be:
endlist = [aaaaa, aaaab, aaaac, ... zzzzy, zzzzz]
But it makes it:
endlist = [aaaaa, aaaaa, aaaaa, ... , aaaaa]
The length is right, but it doesn't produce different words. Can anyone help me see why?
The only thing you ever add to endlist is the first 5 elements of lis, and since you have a single lis that is shared among all the recursive calls (note that you never create a new list in this code other than the initial values for endlist and lis, so every append to lis is happening to the same list), those first 5 elements are always the a values that you appended in your first 5 recursive calls. The rest of the alphabet goes onto the end of lis and is never reached by any of your other code.
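Another way to fix this while keeping a single shared list is to undo each append after the recursive call returns (backtracking), and to join the whole list at the base case instead of indexing its first five items. Here is a sketch under those assumptions. `all_words` and `backtrack` are hypothetical names.

```python
import string


def all_words(n):
    results = []
    current = []  # one shared list, used as a stack of chosen letters

    def backtrack(remaining):
        if remaining == 0:
            results.append("".join(current))  # snapshot the current letters as a new string
            return
        for letter in string.ascii_lowercase:
            current.append(letter)
            backtrack(remaining - 1)
            current.pop()  # remove this letter so the next one takes its place

    backtrack(n)
    return results


words = all_words(3)
print(words[0], words[1], words[-1], len(words))  # aaa aab zzz 17576
```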
Since you want strings in the end, it's a little easier to just use strings for collecting your items. This avoids the shared mutable references that cause your issue. With that, the recursion becomes pretty concise:
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def loopfunc(n, lis=""):
    if n < 1:
        return [lis]
    res = []
    for a in alphabet:
        res.extend(loopfunc(n-1, lis + a))
    return res

l = loopfunc(5)
print(l[0], l[1], l[-1], l[-2])
# aaaaa aaaab zzzzz zzzzy
Note that with n=5 you'll have almost 12 million combinations. If you plan on having larger n values, it may be worth rewriting this as a generator.
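Here is a sketch of that generator version, assuming Python 3 (it uses `yield from`). `words_of_length` is a hypothetical name. If you'd rather not write the recursion yourself, the standard library's `itertools.product(string.ascii_lowercase, repeat=n)` yields the same letters, as tuples, in the same order.

```python
import string
from itertools import islice, product


def words_of_length(n, prefix=""):
    """Lazily yield every lowercase word of length n in alphabetical order."""
    if n < 1:
        yield prefix
        return
    for letter in string.ascii_lowercase:
        yield from words_of_length(n - 1, prefix + letter)


print(list(islice(words_of_length(5), 3)))  # ['aaaaa', 'aaaab', 'aaaac']
# The same sequence from the standard library, joined back into strings:
print(list(islice(("".join(p) for p in product(string.ascii_lowercase, repeat=5)), 3)))
```

Only the words you actually consume are built, so memory stays small even though 26**5 words exist.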
I am doing a Python exercise: searching for a word in a given sorted word list containing more than 100,000 words.
When I use bisect_left from the Python bisect module, it is very efficient, but my own binary search implementation is very inefficient. Could anyone please clarify why?
This is the searching method using the Python bisect module:
from bisect import bisect_left

def in_bisect(word_list, word):
    """Checks whether a word is in a list using bisection search.
    Precondition: the words in the list are sorted
    word_list: list of strings
    word: string
    """
    i = bisect_left(word_list, word)
    if i != len(word_list) and word_list[i] == word:
        return True
    else:
        return False
My implementation is really very inefficient (don't know why):
def my_bisect(wordlist, word):
    """search the given word in a wordlist using
    bisection search, also known as binary search
    """
    if len(wordlist) == 0:
        return False
    if len(wordlist) == 1:
        if wordlist[0] == word:
            return True
        else:
            return False
    if word in wordlist[len(wordlist)/2:]:
        return True
    return my_bisect(wordlist[len(wordlist)/2:], word)
if word in wordlist[len(wordlist)/2:]
will make Python scan through half of your wordlist element by element (after first copying that half into a new list), which kind of defeats the purpose of writing a binary search in the first place. Also, you are not splitting the list correctly: you always recurse into the second half, even when the word could only be in the first half. The strategy for binary search is to cut the search space in half at each step and then apply the same strategy only to the half your word could be in. To know which half that is, the wordlist must be sorted. Here's a sample implementation that keeps track of the number of calls needed to verify whether a word is in wordlist.
import random

numcalls = 0

def bs(wordlist, word):
    global numcalls
    print('wordlist', wordlist)
    # count how many calls the search needs
    numcalls += 1
    # base cases
    if not wordlist:
        return False
    length = len(wordlist)
    if length == 1:
        return wordlist[0] == word
    # split the list in half
    mid = length // 2  # mid index
    leftlist = wordlist[:mid]
    rightlist = wordlist[mid:]
    print('leftlist', leftlist)
    print('rightlist', rightlist)
    print()
    # recursion
    if word < rightlist[0]:
        return bs(leftlist, word)  # word can only be in left list
    return bs(rightlist, word)  # word can only be in right list

alphabet = 'abcdefghijklmnopqrstuvwxyz'
wl = sorted(random.sample(alphabet, 10))
print(bs(wl, 'm'))
print(numcalls)
I included some print statements so you can see what is going on. Here are two sample outputs. First: word is in the wordlist:
wordlist ['b', 'c', 'g', 'i', 'l', 'm', 'n', 'r', 's', 'v']
leftlist ['b', 'c', 'g', 'i', 'l']
rightlist ['m', 'n', 'r', 's', 'v']
wordlist ['m', 'n', 'r', 's', 'v']
leftlist ['m', 'n']
rightlist ['r', 's', 'v']
wordlist ['m', 'n']
leftlist ['m']
rightlist ['n']
wordlist ['m']
True
4
Second: word is not in the wordlist:
wordlist ['a', 'c', 'd', 'e', 'g', 'l', 'o', 'q', 't', 'x']
leftlist ['a', 'c', 'd', 'e', 'g']
rightlist ['l', 'o', 'q', 't', 'x']
wordlist ['l', 'o', 'q', 't', 'x']
leftlist ['l', 'o']
rightlist ['q', 't', 'x']
wordlist ['l', 'o']
leftlist ['l']
rightlist ['o']
wordlist ['l']
False
4
Note that if you double the size of the wordlist, i.e. use
wl = sorted(random.sample(alphabet, 20))
numcalls on average will be only one higher than for a wordlist of length 10, because wordlist has to be split in half only once more.
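Both versions above still slice, and each slice copies part of the list. For bs that copying adds up to roughly n element copies per search, even though only O(log n) comparisons are made. The sketch below avoids slicing by tracking `lo` and `hi` indices instead. `binary_search` is a hypothetical name, and the loop is the same lo/hi scheme used by the pure-Python version of `bisect.bisect_left`.

```python
def binary_search(wordlist, word):
    """Return True if word is in the sorted list wordlist, without slicing."""
    lo, hi = 0, len(wordlist)  # invariant: the first index where word could be lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if wordlist[mid] < word:
            lo = mid + 1       # word can only be to the right of mid
        else:
            hi = mid           # word is at mid or to its left
    return lo < len(wordlist) and wordlist[lo] == word


wl = sorted(["kiwi", "apple", "mango", "banana", "cherry"])
print(binary_search(wl, "mango"), binary_search(wl, "melon"))  # True False
```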
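To check whether a word is in a wordlist more simply (Python 2.7):
import bisect

def bisect_fun(listfromfile, wordtosearch):
    bi = bisect.bisect_left(listfromfile, wordtosearch)
    # bisect_left can return len(listfromfile), so check the bound before indexing
    if bi < len(listfromfile) and listfromfile[bi] == wordtosearch:
        return listfromfile[bi], bi
    return None  # word not found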
I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
    # wordsUniqeTags holds the set of unique tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w, t) in words:
        if wordsUniqeTags.has_key(w):
            wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
        else:
            wordsUniqeTags[w] = set([t])
    # Starting to count
    res = []
    for w in wordsUniqeTags:
        if len(wordsUniqeTags[w]) >= n:
            res.append((w, wordsUniqeTags[w]))
    return res

MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I don't think they are tags, because who's and what's don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.
It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But really, you should use dict's setdefault method and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
    # wordsUniqeTags maps each word to the set of unique tags observed for it
    wordsUniqeTags = {}
    for (w, t) in words:
        wordsUniqeTags.setdefault(w, set()).add(t)
    # keep only the words with at least n distinct tags
    return [(word, tags) for word, tags in wordsUniqeTags.items() if len(tags) >= n]
You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).
How about making use of the Counter and defaultdict classes in the collections module?
from collections import defaultdict, Counter

def most_ambiguous_words(words, n):
    counts = defaultdict(Counter)
    for (word, tag) in words:
        counts[word][tag] += 1
    return [(w, list(counts[w].keys())) for w in counts if len(counts[w]) >= n]
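Because `brown.tagged_words()` needs the NLTK corpus downloaded, here is a self-contained sketch of the same Counter/defaultdict idea run on a small, made-up list of (word, tag) pairs. The data is hypothetical. Returning each word's tag counts as a dict, rather than just the keys, is my own variation, and it also shows how often each tag occurs.

```python
from collections import Counter, defaultdict


def most_ambiguous_words(words, n):
    """Return (word, {tag: count}) for each word seen with at least n distinct tags."""
    counts = defaultdict(Counter)
    for word, tag in words:
        counts[word][tag] += 1  # the whole tag string is the key; it is never split into characters
    return [(w, dict(tags)) for w, tags in counts.items() if len(tags) >= n]


# Hypothetical toy data standing in for brown.tagged_words()
tagged = [("that", "CS"), ("that", "DT"), ("that", "WPR"), ("that", "CS"),
          ("run", "VB"), ("run", "NN"), ("the", "AT")]

print(most_ambiguous_words(tagged, 3))  # [('that', {'CS': 2, 'DT': 1, 'WPR': 1})]
```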