Time Complexity for LeetCode 3. Longest Substring Without Repeating Characters - python

Problem: Given a string s, find the length of the longest substring
without repeating characters.
Example: Input: s = "abcabcbb" Output: 3 Explanation: The answer is
"abc", with the length of 3.
My solution:
class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
seen = set()
l = r = curr_len = max_len = 0
n = len(s)
while l < n:
if r < n and s[r] not in seen:
seen.add(s[r])
curr_len += 1
max_len = max(curr_len, max_len)
r += 1
else:
l += 1
r = l
curr_len = 0
seen.clear()
return max_len
I know this is not an efficient solution, but I am having trouble figuring out its time complexity.
I visit every character in the string but, for each one of them, the window expands until it finds a repeated char. So every char ends up being visited multiple times, but not sure if enough times to justify an O(n2) time complexity and, obviously, it's way worse than O(n).

You could claim the algorithm to be O(n) if you know the size of the character set your input can be composed of, because the length your window can expand is limited by the number of different characters you could pass over before encountering a duplicate, and this is capped by the size of the character set you're working with, which itself is some constant independent of the length of the string. For example, if you are only working with lower case alphabetic characters, the algorithm is O(26n) = O(n).
To be more exact you could say that it runs in O(n*(min(m,n)) where n is the length of the string and m is the number of characters in the alphabet of the string. The reason for the min is that even if you're somehow working with an alphabet of unlimited unique characters, at worst you're doing a double for loop to the end of the string. That means however that if the number of possible characters you can encounter in the string exceeds the string's length you have a worst case O(n^2) performance (which occurs when every character of the string is unique).

Related

How can I optimize this for-loop?

I need to check the occurrences of the letter "a" in a string s of size n.
Example:
s = "abcac"
n = 10
String to check for occurrences of letter "a": "abcacabcac".
Occurrences: 4
My code works, but I need it to work faster for larger values of n.
What can I do to optimize this code?
def repeatedString(s, n):
a_count, word_iter = 0, 0
for i in range(n):
if s[word_iter] == "a":
a_count+=1
word_iter += 1
if word_iter == (len(s)):
word_iter = 0
return a_count
You only don't need to assemble the full repeated string to do it. count the number of the specified characted in the whole string and multiple that by the number of times it will be fully repeated (n//len(s) times). Add to that the number of occurrences that will appear in the last (truncated) part at the end of the repetitions (i.e. first n%len(s) characters)
def countChar(s,n,c):
return s.count(c)*n//len(s)+s[:n%len(s)].count(c)
output:
countChar("abcac",10,"a") # 4 times in 'abcacabcac'
countChar("abcac",17,"a") # 7 times in 'abcacabcacabcacab'
Count the number of times a appears in a string, s up to length n
s = "abcac"
n = 10
str(s*(int(n/len(s))))[:n].count('a')
You can use regular expressions:
import re
a_count = len(re.findall(r'a',s))
re.findall returns an array of all matches, and we can just get the length of it. Using a regular expression allows for greater generalization and the ability to search for more complex patterns. Debra's original answer is better for a simple string search though:
a_count = s.count('a')

Time limit exceeded error. Word Ladder leetcode

I am trying to solve leetcode problem(https://leetcode.com/problems/word-ladder/description/):
Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
You may assume no duplicates in the word list.
You may assume beginWord and endWord are non-empty and are not the same.
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
5
Explanation:
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" ->
"cog", return its length 5.
import queue
class Solution:
def isadjacent(self,a, b):
count = 0
n = len(a)
for i in range(n):
if a[i] != b[i]:
count += 1
if count > 1:
return False
if count == 1:
return True
def ladderLength(self,beginWord, endWord, wordList):
word_queue = queue.Queue(maxsize=0)
word_queue.put((beginWord,1))
while word_queue.qsize() > 0:
queue_last = word_queue.get()
index = 0
while index != len(wordList):
if self.isadjacent(queue_last[0],wordList[index]):
new_len = queue_last[1]+1
if wordList[index] == endWord:
return new_len
word_queue.put((wordList[index],new_len))
wordList.pop(index)
index-=1
index+=1
return 0
Can someone suggest how to optimise it and prevent the error!
The basic idea is to find the adjacent words faster. Instead of considering every word in the list (even one that has already been filtered by word length), construct each possible neighbor string and check whether it is in the dictionary. To make those lookups fast, make sure the word list is stored in something like a set that supports fast membership tests.
To go even faster, you could store two sorted word lists, one sorted by the reverse of each word. Then look for possibilities involving changing a letter in the first half in the reversed list and for the latter half in the normal list. All the existing neighbors can then be found without making any non-word strings. This can even be extended to n lists, each sorted by omitting one letter from all the words.

Number of Palindromic Slices in a string with O(N) complexity

def solution(S):
total = 0
i = 1
while i <= len(S):
for j in range(0, len(S) - i + 1):
if is_p(S[ j: j + i]):
total += 1
i += 1
return total
def is_p(S):
if len(S) == 1:
return False
elif S == S[::-1]:
return True
else:
return False
I am writing a function to count the number of Palindromic Slices(with length bigger than 1) in a string. The above code is in poor time complexity. Can someone help me to improve it and make it O(N) complexity?
Edit: It is not duplicate since the other question is about finding the longest Palindromic Slices
Apply Manacher's Algorithm, also described by the multiple answers to this question.
That gives you the length of the longest palindrome centered at every location (centered at a character for odd-length, or centered between characters for even-length). You can use this to easily calculate the number of palindromes. Note that every palindrome must be centered somewhere, so it must be a substring (or equal to) the longest palindrome centered there.
So consider the string ababcdcbaa. By Manacher's Algorithm, you know that the maximal length palindrome centered at the d has length 7: abcdcba. By the properties of palindromes, you immediately know that bcdcb and cdc and d are also palindromes centered at d. In fact there are floor((k+1)/2) palindromes centered at a location, if you know that the longest palindrome centered there has length k.
So you sum the results of Manacher's Algorithm to get your count of all palindromes. If you want to only count palindromes of length > 1, you just need to subtract the number of length-1 palindromes, which is just n, the length of your string.
This can be done in linear time using suffix trees:
1) For constant sized alphabet we can build suffix trees using Ukkonen's Algorithm in O(n).
2) For given string S, build a generalized suffix tree of S#S' where S' is reverse of string S and # is delimiting character.
3) Now in this suffix tree, for every suffix i in S, look for lowest common ancestor of (2n-i+1) suffix is S'.
4) count for all such suffixes in the tree to get total count of all palindromes.

Realizing if there is a pattern in a string (Does not need to start at index 0, could be any length)

Coding a program to detect a n-length pattern in a string, even without knowing where the pattern starts, could be easily done by creating a list of n-length substrings and check if starting at one point there are same items or the rest of the list. Without any piece of information other than the string to check through, is the only way to recognize the pattern is to brute-force through all lengths and check or is there a more efficient algorithm?
(I'm just a beginner in Python, so this may be easy to code... )
Current code that only suits checking for starting at index 0:
def search(s):
match=s[0]+s[1]
while (match != s) and (match[0] != match[-1]):
for matchLen in range(len(match),len(s)-1):
letter = s[matchLen]
if letter == match[-1]:
match += s[len(match)]
break
if match == s:
return None
else:
return match[:-1]
You can use re.findall(r'(.{2,})\1+', string). The parentheses creates a capture group that is later backreferenced by \1. The . matches any character (except for line breaks). The {2,} requires the pattern to be at least two characters long (otherwise strings like ss would be considered a pattern). Finally the + requires that pattern to repeat 1 or more times (in addition to the first time that it occurred inside the capture group). You can see it working in action.
Pattern is a far too vague term, but assuming you mean some string repeating itself, the regexp (?P<pat>.+)(?P=pat) will work.
Given a string what you could do is -
You start with length = 1, and take two pointer variables i and j which you shall use to traverse the string.
Set i = 0 and j = i+length
if str[i]==str[j]:
i++,j++ // till j not equal to length of string
else:
length = length + 1
//increase length by 1 and start the algorithm over from i = 0
Take the example abcdeabcde :
In this we see
Initially i = 0, j = 1 ,
but str[0]!=str[1] i.e. a!=b,
Then we get length = 2 i.e., i = 0,j = 2
but str[0]!=str[2] i.e. a!=c,
Continuing in the same fashion,
We see when length = 5 and i = 0 and j = 5,
str[0]==str[5]
and thus you can see that i and j increment till j is equal to string length.
And you have your answer that is the pattern length. It may not seem obvious but i would suggest you dry-run this algorithm over some of your test cases and let me know the results.
You can use re.findall() to find all matches:
import re
s = "somethingabcdeabcdeabcdeabcdeabcdeelseabcdeabcdeabcde"
li = re.findall(r'abcde',s)
print(li)
Output:
['abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde', 'abcde']

Find the longest substring with contiguous characters, where the string may be jumbled

Given a string, find the longest substring whose characters are contiguous (i.e. they are consecutive letters) but possibly jumbled (i.e. out of order). For example:
Input : "owadcbjkl"
Output: "adcb"
We consider adcb as contiguous as it forms abcd.
(This is an interview question.)
I have an idea of running a while loop with 2 conditions, one that checks for continuous characters using Python's ord and another condition to find the minimum and maximum and check if all the following characters fall in this range.
Is there any way this problem could be solved with low running time complexity? The best I can achieve is O(N^2) where N is the length of the input string and ord() seems to be a slow operation.
If the substring is defined as ''.join(sorted(substr)) in alphabet then:
there is no duplicates in the substring and therefore the size of
the longest substring is less than (or equal to) the size of the alphabet
(ord(max(substr)) - ord(min(substr)) + 1) == len(substr), where
ord() returns position in the alphabet (+/- constant) (builtin
ord() can be used for lowercase ascii letters)
Here's O(n*m*m)-time, O(m)-space solution, where n is len(input_string) and m is len(alphabet):
from itertools import count
def longest_substr(input_string):
maxsubstr = input_string[0:0] # empty slice (to accept subclasses of str)
for start in range(len(input_string)): # O(n)
for end in count(start + len(maxsubstr) + 1): # O(m)
substr = input_string[start:end] # O(m)
if len(set(substr)) != (end - start): # found duplicates or EOS
break
if (ord(max(substr)) - ord(min(substr)) + 1) == len(substr):
maxsubstr = substr
return maxsubstr
Example:
print(longest_substr("owadcbjkl"))
# -> adcb

Categories