Analyze the time and space complexity of the following code - python

Problem from leetcode:
https://leetcode.com/problems/text-justification/description/
Given an array of words and a width maxWidth, format the text such that each line has exactly maxWidth characters and is fully (left and right) justified.
You should pack your words in a greedy approach; that is, pack as many words as you can in each line. Pad extra spaces ' ' when necessary so that each line has exactly maxWidth characters.
Extra spaces between words should be distributed as evenly as possible. If the number of spaces on a line do not divide evenly between words, the empty slots on the left will be assigned more spaces than the slots on the right.
For the last line of text, it should be left justified and no extra space is inserted between words.
Original code:
class Solution:
    def fullJustify(self, words, maxWidth):
        ans, curr, word_length = [], [], 0
        words.append(' ' * maxWidth)
        for w in words:
            if word_length + len(w) + len(curr) > maxWidth:
                space = maxWidth-word_length
                if w != words[-1]:
                    for i in range(space):
                        curr[i%(len(curr)-1 or 1)] += ' '
                    ans.append(''.join(curr))
                else:
                    ans.append(' '.join(curr) + ' ' * (space - (len(curr) - 1)))
                curr = []
                word_length = 0
            curr += [w]
            word_length += len(w)
        return ans
So there are two for-loops, one nested inside the other.
The inner for-loop is bounded by space, which changes every time but is always smaller than maxWidth. The outer loop has a time complexity of O(n); what is the overall time complexity?

If you let n = |words| and m = maxWidth, you have an outer loop that does n iterations. Inside it there are different conditions, but when they hold you run another loop that executes at most m times in the worst case.
Therefore the time complexity is: T(n, m) = O(n * m)
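One way to check that bound on a concrete input is to count how often the padding loop body actually runs. The sketch below follows the same greedy packing as the code in the question, but the names justify_with_count and pad_steps are hypothetical, and it assumes a non-empty word list in which every word fits within the width. It also avoids appending a sentinel to the caller's list.

```python
def justify_with_count(words, max_width):
    """Greedy justification plus a counter for the inner padding loop (illustrative only)."""
    lines, current, letters = [], [], 0
    pad_steps = 0  # how many times the inner loop body runs in total
    for word in words:
        # len(current) accounts for the single spaces needed between words
        if letters + len(word) + len(current) > max_width:
            gaps = len(current) - 1 or 1  # a lone word is padded on its right
            for i in range(max_width - letters):
                current[i % gaps] += ' '
                pad_steps += 1
            lines.append(''.join(current))
            current, letters = [], 0
        current.append(word)
        letters += len(word)
    # The last line is left-justified and padded on the right.
    lines.append(' '.join(current).ljust(max_width))
    return lines, pad_steps


words = ["This", "is", "an", "example", "of", "text", "justification."]
lines, pad_steps = justify_with_count(words, 16)
print(lines)
print(pad_steps, "<=", len(words) * 16)
```

On this input the inner loop runs far fewer than n * m times, which is consistent with O(n * m) being an upper bound rather than an exact count.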

Related

How to encode (replace) parts of word from end to beginning for some N value (like abcabcab to cbacbaba for n=3)?

I would like to create a program for encoding and decoding words.
Specifically, the program should take a part of the word (n characters at a time) and reverse it.
This cycle runs until the whole word is encoded.
First I compute the number of groups, which is the number of full groups of n characters plus a possible remainder
(for example, Language with n = 3 has 3 parts: two parts of 3 characters and one remainder of 2 characters). I call this count general.
Then, depending on general, I run a loop that takes a character n times and adds it to the group (a group has n characters).
At the end of the group loop, I add the group (in reverse order) to new_word and reset the group value.
The goal is, for example, to encode the word Language (n = 2) as aLgnaueg,
or Language (n = 3) as naL aug eg, and so on.
Another example is the word abcabcab (n = 3), which should become cba cba ba.
My code does not produce the right output. The output for n = 3 is "naLaugeg".
How can I improve it? Is there a simpler Python function for this?
My code is here:
n = 3
word = "Language"
new_word = ""
group = ""
divisions = (len(word)//n)
residue = (len(word)%n)
general = divisions + residue
for i in range(general):
    j=2
    for l in range(n):
        group += word[i+j]
        print(word[i+j], l)
        j=j-1
    for j in range((len(group)-1),-1,-1):
        new_word += group[j]
        print(word[j])
    group = ""
    print(group)
print(new_word)
import textwrap
n = 3
word = "Language"
chunks = textwrap.wrap(word, n)
reversed_chunks = [chunk[::-1] for chunk in chunks]
print(' '.join(reversed_chunks))
# naL aug eg
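The same result can also be obtained with plain slicing, which avoids any whitespace handling that textwrap applies to its input. This is only a sketch: the function name reverse_in_groups and its sep parameter are made up for illustration.

```python
def reverse_in_groups(word, n, sep=" "):
    """Split word into groups of n characters and reverse each group."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The last group is simply shorter when len(word) is not a multiple of n.
    groups = [word[i:i + n] for i in range(0, len(word), n)]
    return sep.join(group[::-1] for group in groups)


print(reverse_in_groups("Language", 3))          # naL aug eg
print(reverse_in_groups("abcabcab", 3))          # cba cba ba
print(reverse_in_groups("Language", 2, sep=""))  # aLgnaueg
```

With sep="" the operation is its own inverse, so calling it again on "aLgnaueg" with n = 2 gives back "Language".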

Time complexity of a sliding window question

I'm working on the following problem:
Given a string and a list of words, find all the starting indices of substrings in the given string that are a concatenation of all the given words exactly once without any overlapping of words. It is given that all words are of the same length. For example:
Input: String = "catfoxcat", Words = ["cat", "fox"]
Output: [0, 3]
Explanation: The two substrings containing both the words are "catfox" & "foxcat".
My solution is:
def find_word_concatenation(str, words):
    result_indices = []
    period = len(words[0])
    startIndex = 0
    wordCount = {}
    matched = 0
    for w in words:
        if w not in wordCount:
            wordCount[w] = 1
        else:
            wordCount[w] += 1
    for endIndex in range(0, len(str) - period + 1, period):
        rightWord = str[endIndex: endIndex + period]
        if rightWord in wordCount:
            wordCount[rightWord] -= 1
            if wordCount[rightWord] == 0:
                matched += 1
        while matched == len(wordCount):
            if endIndex + period - startIndex == len(words)*period:
                result_indices.append(startIndex)
            leftWord = str[startIndex: startIndex + period]
            if leftWord in wordCount:
                wordCount[leftWord] += 1
                if wordCount[leftWord] > 0:
                    matched -= 1
            startIndex += period
    return result_indices
Can anyone help me figure out its time complexity please?
We should start by drawing a distinction between the time complexity of your code vs what you might actually be looking for.
In your case, you have a set of nested loops (a for and a while). So, worst case, which is what Big O is based on, you would do each of those while loops n times. But you also have that outer loop which would also be done n times.
O(n) * O(n) = O(n^2)
Which is not very good. Now, while not really so bad with this example, imagine if you were looking for "what a piece of work is man" in all of the Library of Congress or even in the collected works of Shakespeare.
On the plus side, you can refactor your code and get it down quite a bit.
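Nested loops give a quadratic upper bound, but how close the code gets to it can be checked by counting iterations. The sketch below is an instrumented copy of the function from the question; the name count_loop_steps and the two counters are hypothetical additions. On the inputs shown, the while body runs no more often than the for body, which is what one would expect given that startIndex only ever moves forward.

```python
from collections import Counter


def count_loop_steps(text, words):
    """Instrumented copy of the sliding-window code (illustrative only)."""
    period = len(words[0])
    remaining = Counter(words)  # how many of each word the window still needs
    matched = start = 0
    outer_steps = inner_steps = 0
    found = []
    for end in range(0, len(text) - period + 1, period):
        outer_steps += 1
        right = text[end:end + period]
        if right in remaining:
            remaining[right] -= 1
            if remaining[right] == 0:
                matched += 1
        while matched == len(remaining):
            inner_steps += 1
            if end + period - start == len(words) * period:
                found.append(start)
            left = text[start:start + period]
            if left in remaining:
                remaining[left] += 1
                if remaining[left] > 0:
                    matched -= 1
            start += period
    return found, outer_steps, inner_steps


print(count_loop_steps("catfoxcat", ["cat", "fox"]))
print(count_loop_steps("catcatfoxcat", ["cat", "fox"]))
```

Each iteration also slices out one word of length period, so the per-iteration cost depends on the word length as well as on the iteration counts.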

How to determine the minimum period of a periodic series

I am doing text mining and trying to clean bullet screen (弹幕) data. (A bullet screen is a kind of comment on video websites.) There are repetitions of expressions in my data ("LOL LOL LOL", "LMAOLMAOLMAOLMAO"), and I want to get "LOL", "LMAO".
In most cases, I want to find the minimum period of a sequence.
CORNER CASE: The tail of the input sequence can be seen as a part of the periodic subsequence.
"eat an apple eat an apple eat an" # input
"eat an apple" # output
There are some other test cases:
cases = [
"abcd", #4 abcd
"ababab", #2 ab
"ababcababc", #5 ababc
"abcdabcdabc", #4 abcd
]
NOTE: As for the last case "abcdabcdabc", "abcd" is better than "abcdabcdabc" because the last three characters "abc" are part of "abcd".
def solve(x):
    n = len(x)
    d = dict()
    T = 0
    k = 0
    while k < n:
        w = x[k]
        if w not in d:
            d[w] = T
            T += 1
        else:
            while k < n and d.get(x[k], None) == k%T:
                k += 1
            if k < n:
                T = k+1
        k += 1
    return T, x[:T]
It outputs correct answers for the first two cases but fails to handle all of them.
There is the efficient Z-algorithm:
Given a string S of length n, the Z Algorithm produces an array Z
where Z[i] is the length of the longest substring starting from S[i]
which is also a prefix of S, i.e. the maximum k such that
S[j] = S[i + j] for all 0 ≤ j < k. Note that Z[i] = 0 means that
S[0] ≠ S[i]. For easier terminology, we will refer to substrings which
are also a prefix as prefix-substrings.
Calculate the Z-array for your string and find the smallest position i such that i + Z[i] == len and len % i == 0 (len is the string length). Then i is the period length.
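A minimal Python sketch of this approach follows. The names z_array and min_period and the allow_partial_tail flag are hypothetical. With the flag set, the len % i == 0 condition is skipped, which is one way to accept the corner case from the question where the tail is only part of the period.

```python
def z_array(s):
    """Z[i] = length of the longest prefix of s that also starts at s[i] (Z[0] is left as 0)."""
    n = len(s)
    z = [0] * n
    left = right = 0  # [left, right) is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def min_period(s, allow_partial_tail=True):
    """Return the shortest prefix whose repetition reproduces s."""
    n = len(s)
    z = z_array(s)
    for i in range(1, n):
        # i + z[i] == n means the suffix starting at i matches a prefix of s
        if i + z[i] == n and (allow_partial_tail or n % i == 0):
            return s[:i]
    return s


for case in ["abcd", "ababab", "ababcababc", "abcdabcdabc"]:
    print(case, "->", min_period(case))
```

For the four test cases in the question this prints abcd, ab, ababc and abcd. Note that separators count as characters, so the period found for "eat an apple eat an apple eat an" includes the trailing space.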
I'm not fluent in Python, but can easily describe the algorithm you need:
found <- false
length <- inputString.length
size <- 1
output <- inputString
while (not found) and (size <= length / 2) do
    if (length % size = 0) then
        chunk <- inputString.substring(0, size)
        found <- true
        for (j <- 1,length/size) do
            if (not inputString.substring(j * size, size).equals(chunk)) then
                found <- false
            if end
        for end
        if found then
            output <- chunk
        if end
    if end
    size <- size + 1
while end
The idea is to take increasingly long substrings starting at the beginning of the string. The substring length starts at 1, and as long as no repeating cycle is found, you increase the length (until it is evidently no longer feasible, that is, until half of the length of the input has been reached). In each iteration you compare the length of the substring with the length of the input string: if the length of the input string is not divisible by the current substring length, then the current substring cannot tile the input string. (An optimization would be to find the divisors of the input string's length and check only those lengths, but I avoided such optimizations for the sake of understandability.) If the length of your string is divisible by the current size, then you take the substring from the start of your input string up to the current size and check whether it is repeated. The first time you find such a pattern you can stop the loop, because you have found the solution. If no such solution is found, then the input string itself is the smallest repetitive substring; it is repeated 0 times, as it occurs in your string only once.
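In Python, the exact-repetition variant described above might look roughly like this. It is a sketch rather than a translation of the pseudocode line by line: the function name smallest_repeating_chunk is hypothetical, and string multiplication replaces the inner comparison loop.

```python
def smallest_repeating_chunk(text):
    """Shortest prefix that, repeated a whole number of times, equals text."""
    length = len(text)
    for size in range(1, length // 2 + 1):
        if length % size != 0:
            continue  # a chunk of this size cannot tile the string exactly
        chunk = text[:size]
        if chunk * (length // size) == text:
            return chunk  # sizes are tried in increasing order, so this is the smallest
    return text  # no shorter repeating chunk exists


print(smallest_repeating_chunk("LMAOLMAOLMAOLMAO"))  # LMAO
print(smallest_repeating_chunk("abcd"))              # abcd
```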
EDIT
If you want to tolerate the last occurrence being only a part of the pattern, cut off by the end of inputString, then the algorithm can be changed like this:
found <- false
length <- inputString.length
size <- 1
output <- inputString
while (not found) and (size <= length / 2) do
    chunk <- inputString.substring(0, size)
    found <- true
    for (j <- 1,length/size) do
        if (not inputString.substring(j * size, size).equals(chunk)) then
            found <- found and (chunk.indexOf(inputString.substring(j * size)) = 0)
        if end
    for end
    if found then
        output <- chunk
    if end
    size <- size + 1
while end
In this case, note the line
found <- found and (chunk.indexOf(inputString.substring(j * size)) = 0)
So, in the case of a mismatch, we check whether our chunk starts with the remaining part of the string. If so, then we are at the end of the input string and the pattern is partially matched up to the end of the string, so found stays true (provided no earlier chunk mismatched). If not, then found will be false.
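A compact Python sketch of the tolerant variant is shown below. The name smallest_chunk_with_partial_tail is hypothetical, and the check is done by repeating the chunk past the end of the string and truncating, instead of comparing piece by piece. It keeps the pseudocode's limit of half the input length, so periods longer than that are not searched for.

```python
def smallest_chunk_with_partial_tail(text):
    """Shortest prefix whose repetition, cut off at len(text), equals text."""
    length = len(text)
    for size in range(1, length // 2 + 1):
        chunk = text[:size]
        repeats = -(-length // size)  # ceiling division: enough copies to cover text
        if (chunk * repeats)[:length] == text:
            return chunk
    return text


print(smallest_chunk_with_partial_tail("abcdabcdabc"))  # abcd
print(smallest_chunk_with_partial_tail("ababcababc"))   # ababc
```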
You could do it this way :
def solve(string):
    foundPeriods = {}
    for x in range(len(string)):
        # Tested substring
        substring = string[0:len(string)-x]
        # Frequency count
        occurence_count = string.count(substring)
        # Make a comparison to the original string
        if substring * occurence_count in string:
            foundPeriods[occurence_count] = substring
    return foundPeriods[max(foundPeriods.keys())]
for x in cases:
    print(x ,'===> ' , solve(x), "#" , len(solve(x)))
    print()
Output
abcd ===> a # 1
ababab ===> ab # 2
ababcababc ===> ababc # 5
abcdabcdabc ===> abcd # 4
EDIT :
Answer edited to consider the following in the question
"abcdabcdabc", "abcd" is better than "abcdabcdabc" because it comes more naturally

Python, print letters and tower of asterisk equal to how many times the letter appears

def muchbetter(x):
    count_list = []
    for char in "abcdefghijklmnopqrstuvwxyz":
        count_list.append(x.lower().count(char))
    return tuple(count_list)
def print_stars(x):
    tup = muchbetter(x)
    stars = [' '*(max(tup) - s) + '*'*s for s in tup if s != 0]
    print('\n'.join([''.join(a) for a in list(zip(*stars))]))
So those are two functions: the first one counts how many times each letter appears in a sample text, and the second one prints a "tower of asterisks" whose height equals each letter's count. However, I need the second one to also print all of the letters at the bottom, so what I want it to output is
   *
*  *
ABCDEFGHIJKLMNOPQRSTUVWXYZ
That should be the result if I input "ADD" as x: it would put two asterisks on top of D, one asterisk on top of A, and no asterisks on top of anything else.
You just need to delete the condition if s != 0 from your list comprehension, so that empty columns are kept for the letters that muchbetter does not find in the input.
def muchbetter(x):
    count_list = []
    for char in "abcdefghijklmnopqrstuvwxyz":
        count_list.append(x.lower().count(char))
    return tuple(count_list)
def print_stars_order(x):
    tup = muchbetter(x)
    stars = [' '*(max(tup) - s) + '*'*s for s in tup]
    print('\n'.join([''.join(a) for a in list(zip(*stars))]))
    print('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
in_str="ADD"
print_stars_order(in_str)
   *
*  *
ABCDEFGHIJKLMNOPQRSTUVWXYZ
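For comparison, here is a row-by-row sketch of the same idea that returns the picture as a string instead of printing it, which makes it easier to test. The function name render_letter_towers is hypothetical, and trailing spaces are stripped from each row as a design choice.

```python
import string


def render_letter_towers(text):
    """Build the asterisk towers top-down, with the alphabet as the last line."""
    lowered = text.lower()
    counts = [lowered.count(letter) for letter in string.ascii_lowercase]
    rows = []
    # Go from the tallest level down to level 1; a column gets a star
    # on a level only if its letter occurs at least that many times.
    for level in range(max(counts), 0, -1):
        row = ''.join('*' if count >= level else ' ' for count in counts)
        rows.append(row.rstrip())
    rows.append(string.ascii_uppercase)
    return '\n'.join(rows)


print(render_letter_towers("ADD"))
```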

Recursive function dies with Memory Error

Say we have a function that translates the morse symbols:
. -> -.
- -> ...-
If we apply this function twice, we get e.g:
. -> -. -> ...--.
Given an input string and a number of repetitions, we want to know the length of the final string. (Problem 1 from the Flemish Programming Contest VPW, taken from these slides which provide a solution in Haskell).
For the given input file
4
. 4
.- 2
-- 2
--... 50
We expect the solution
44
16
20
34028664377246354505728
Since I don't know Haskell, this is my recursive solution in Python that I came up with:
def encode(msg, repetition, morse={'.': '-.', '-': '...-'}):
    if isinstance(repetition, str):
        repetition = eval(repetition)
    while repetition > 0:
        newmsg = ''.join(morse[c] for c in msg)
        return encode(newmsg, repetition-1)
    return len(msg)
def problem1(fn):
    with open(fn) as f:
        f.next()
        for line in f:
            print encode(*line.split())
which works for the first three inputs but dies with a memory error for the last input.
How would you rewrite this in a more efficient way?
Edit
Rewrite based on the comments given:
def encode(p, s, repetition):
    while repetition > 0:
        p,s = p + 3*s, p + s
        return encode(p, s, repetition-1)
    return p + s
def problem1(fn):
    with open(fn) as f:
        f.next()
        for line in f:
            msg, repetition = line.split()
            print encode(msg.count('.'), msg.count('-'), int(repetition))
Comments on style and further improvements still welcome
Consider that you don't actually have to output the resulting string, only its length. Also consider that the order of '.' and '-' in the string does not affect the final length (e.g. ".- 3" and "-. 3" produce the same final length).
Thus, I would give up on storing the entire string and instead store the number of '.' and the number of '-' as integers.
In your starting string, count the number of dots and dashes. Then apply this:
repetitions = 4
dots = 1
dashes = 0
for i in range(repetitions):
    dots, dashes = dots + 3 * dashes, dashes + dots
Think about why this works.
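The reason is visible in the translation table: each '.' becomes one '-' and one '.', and each '-' becomes three '.' and one '-'. A self-contained Python 3 sketch of the counting approach, checked against the sample input from the question, could look like this (the function name final_length is hypothetical):

```python
def final_length(message, repetitions):
    """Length of the message after applying the translation `repetitions` times."""
    dots, dashes = message.count('.'), message.count('-')
    for _ in range(repetitions):
        # '.' -> '-.'   contributes 1 dot and 1 dash
        # '-' -> '...-' contributes 3 dots and 1 dash
        dots, dashes = dots + 3 * dashes, dots + dashes
    return dots + dashes


for message, repetitions in [(".", 4), (".-", 2), ("--", 2), ("--...", 50)]:
    print(final_length(message, repetitions))
```

Only two integers are kept per step, so memory use no longer grows with the length of the translated string.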
Per @Hammar (I had the same idea, but he explained it better than I could have ;-):
from sympy import Matrix
t = Matrix([[1,3],[1,1]])
def encode(dots, dashes, reps):
    # Row vector [dashes, dots]; each multiplication by t applies one translation step.
    res = Matrix([[dashes, dots]]) * t**reps
    return res[0,0] + res[0,1]
you put the count of dots to dashes, and count of dashes to dots in each iteration...
def encode(dots, dashes, repetitions):
    while repetitions > 0:
        dots, dashes = dots + 3 * dashes, dots + dashes
        repetitions -= 1
    return dots + dashes
def problem1(fn):
    with open(fn) as f:
        count = int(next(f))
        for i in xrange(count):
            line = next(f)
            msg, repetition = line.strip().split()
            print encode(msg.count('.'), msg.count('-'), int(repetition))
