Number of Palindromic Slices in a string with O(N) complexity - python

def solution(S):
    total = 0
    i = 1
    while i <= len(S):
        for j in range(0, len(S) - i + 1):
            if is_p(S[j: j + i]):
                total += 1
        i += 1
    return total

def is_p(S):
    if len(S) == 1:
        return False
    elif S == S[::-1]:
        return True
    else:
        return False
I am writing a function to count the number of palindromic slices (with length greater than 1) in a string. The code above has poor time complexity. Can someone help me improve it to O(N) complexity?
Edit: This is not a duplicate, since the other question is about finding the longest palindromic slice.

Apply Manacher's Algorithm, also described by the multiple answers to this question.
That gives you the length of the longest palindrome centered at every location (centered at a character for odd length, or centered between characters for even length). You can use this to easily calculate the number of palindromes. Note that every palindrome must be centered somewhere, so it must be a substring of (or equal to) the longest palindrome centered there.
So consider the string ababcdcbaa. By Manacher's Algorithm, you know that the maximal length palindrome centered at the d has length 7: abcdcba. By the properties of palindromes, you immediately know that bcdcb and cdc and d are also palindromes centered at d. In fact there are floor((k+1)/2) palindromes centered at a location, if you know that the longest palindrome centered there has length k.
So you sum the results of Manacher's Algorithm to get your count of all palindromes. If you want to only count palindromes of length > 1, you just need to subtract the number of length-1 palindromes, which is just n, the length of your string.
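The counting step described above can be sketched as follows. This is a minimal sketch, not the answerer's code: the function name `count_palindromic_slices` and the `#` separator are assumptions made for illustration. It runs Manacher's algorithm on an interleaved copy of the string, so that each entry of `radius` equals the length k of the longest palindrome centered at that location, then sums floor((k+1)/2) and subtracts n.

```python
def count_palindromic_slices(s):
    """Count palindromic substrings of length > 1 in O(n) time."""
    # Interleave a separator so that odd and even centres are handled alike.
    # Separators only ever get compared with separators, so s may contain '#'.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n          # radius[i] == length of the longest palindrome
                              # of s centred at position i of t
    centre = right = 0        # rightmost palindrome found so far
    for i in range(n):
        if i < right:
            # Reuse the mirrored centre's result, clipped to the known range.
            radius[i] = min(right - i, radius[2 * centre - i])
        # Expand as far as the characters keep matching.
        while (i - radius[i] - 1 >= 0 and i + radius[i] + 1 < n
               and t[i - radius[i] - 1] == t[i + radius[i] + 1]):
            radius[i] += 1
        if i + radius[i] > right:
            centre, right = i, i + radius[i]
    # A longest palindrome of length k contains floor((k + 1) / 2) palindromes
    # sharing its centre; subtract the len(s) single-character ones.
    return sum((k + 1) // 2 for k in radius) - len(s)
```

For the example string ababcdcbaa this gives 6: aba, bab, cdc, bcdcb, abcdcba and aa.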

This can be done in linear time using suffix trees:
1) For constant sized alphabet we can build suffix trees using Ukkonen's Algorithm in O(n).
2) For given string S, build a generalized suffix tree of S#S' where S' is reverse of string S and # is delimiting character.
3) Now in this suffix tree, for every suffix i in S, look for the lowest common ancestor with the (2n-i+1)-th suffix in S'.
4) Count over all such suffixes in the tree to get the total count of all palindromes.

Related

Time Complexity for LeetCode 3. Longest Substring Without Repeating Characters

Problem: Given a string s, find the length of the longest substring
without repeating characters.
Example: Input: s = "abcabcbb" Output: 3 Explanation: The answer is
"abc", with the length of 3.
My solution:
class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        seen = set()
        l = r = curr_len = max_len = 0
        n = len(s)
        while l < n:
            if r < n and s[r] not in seen:
                seen.add(s[r])
                curr_len += 1
                max_len = max(curr_len, max_len)
                r += 1
            else:
                l += 1
                r = l
                curr_len = 0
                seen.clear()
        return max_len
I know this is not an efficient solution, but I am having trouble figuring out its time complexity.
I visit every character in the string but, for each one of them, the window expands until it finds a repeated char. So every char ends up being visited multiple times, but I'm not sure whether that is enough to justify an O(n^2) time complexity and, obviously, it's way worse than O(n).
You could claim the algorithm to be O(n) if you know the size of the character set your input can be composed of, because the length your window can expand is limited by the number of different characters you could pass over before encountering a duplicate, and this is capped by the size of the character set you're working with, which itself is some constant independent of the length of the string. For example, if you are only working with lower case alphabetic characters, the algorithm is O(26n) = O(n).
To be more exact, you could say that it runs in O(n*min(m,n)), where n is the length of the string and m is the number of characters in the alphabet of the string. The reason for the min is that even if you're somehow working with an alphabet of unlimited unique characters, at worst you're doing a double for loop to the end of the string. That means, however, that if the number of possible characters you can encounter in the string exceeds the string's length, you have a worst case O(n^2) performance (which occurs when every character of the string is unique).
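To see the bound above in numbers, here is a small instrumented copy of the loop from the question. The function name `longest_unique_with_steps` and the `steps` counter are additions for illustration only; the algorithm itself is unchanged.

```python
def longest_unique_with_steps(s):
    """Same loop as in the question, also returning the iteration count."""
    seen = set()
    l = r = curr_len = max_len = steps = 0
    n = len(s)
    while l < n:
        steps += 1                      # one pass through the while loop
        if r < n and s[r] not in seen:
            seen.add(s[r])
            curr_len += 1
            max_len = max(curr_len, max_len)
            r += 1
        else:
            # Restart the window one position further to the right.
            l += 1
            r = l
            curr_len = 0
            seen.clear()
    return max_len, steps


print(longest_unique_with_steps("abcd"))   # all characters distinct
print(longest_unique_with_steps("aaaa"))   # one-letter alphabet
```

With n distinct characters the loop body runs n(n+1)/2 + n times (14 for "abcd"), which is the quadratic worst case. With a single repeated letter it runs 2n times (8 for "aaaa"), matching the O(n*min(m,n)) estimate.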

Pythonic way of checking if indefinite # of consec elements in list sum to given value

Having trouble figuring out a nice way to get this task done.
Say I have a list of triangular numbers up to 1000 -> [0,1,3,6,10,15,..] etc.
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute force way would iteratively check every possible run of consecutive elements starting at each element (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc. until the sum is greater than the target number). But that's probably O(n^2) or maybe worse. Is there any way to do it better?
UPDATE:
OK, so a friend of mine figured out a solution that works in O(n) (I think) and is intuitively easy to follow. This might be similar (or identical) to Gabriel's answer, but that one was difficult for me to follow, and I like that this solution is understandable even from a basic perspective. This is an interesting question, so I'll share her answer:
from functools import reduce  # a builtin on Python 2, needs importing on Python 3

def findConsec(input1 = 7735):
    list1 = range(1, 1001)
    newlist = [reduce(lambda x,y: x+y,list1[0:i]) for i in list1]
    curr = 0
    end = 2
    num = sum(newlist[curr:end])
    while num != input1:
        if num < input1:
            num += newlist[end]
            end += 1
        elif num > input1:
            num -= newlist[curr]
            curr += 1
            if curr == end:
                return []
    if num == input1:
        return newlist[curr:end]
A 3-iteration max solution
Another solution would be to start close to where your number would be and walk forward from one position behind. For any number in the triangular list vec, its value can be defined by its index as:
vec[i] = sum(range(0,i+1))
The target sum divided by the length of the group is the average of the group and, hence, lies within the group's range, though it may not itself be in the list.
Therefore, you can set the starting point for finding a group of n numbers whose sum matches a value val at the integer part of val/n. As that value may not be in the list, the position is the one that minimizes the difference.
# vec as np.ndarray -> the triangular or whatever-type series
# val as int -> sum of n elements you are looking after
# n as int -> number of elements to be summed
import numpy as np

def seq_index(vec,n,val):
    index0 = np.argmin(abs(vec-(val/n)))-n/2-1 # covers odd and even n values
    intsum = 0 # sum of which to keep track
    count = 0 # counter
    seq = [] # indices of vec that sum up to val
    while count<=2: # walking forward from the initial guess of where the group begins or prior to it
        intsum = sum(vec[(index0+count):(index0+count+n)])
        if intsum == val:
            seq.append(range(index0+count,index0+count+n))
        count += 1
    return seq

# Example
vec = []
for i in range(0,100):
    vec.append(sum(range(0,i))) # build your triangular series from i = 0 (0) to i = 99 (4851)
vec = np.array(vec) # convert to numpy to make it easier to query ranges
# looking for a value that belongs to the interval 0-4851
indices = seq_index(vec,3,4)
# print indices
print indices[0]
print vec[indices]
print sum(vec[indices])
Returns
print indices[0] -> [1, 2, 3]
print vec[indices] -> [0 1 3]
print sum(vec[indices]) -> 4 (which we were looking for)
This seems like an algorithm question rather than a question on how to do it in python.
Thinking backwards I would copy the list and use it in a similar way to the Sieve of Eratosthenes. I would not consider the numbers that are greater than x. Then start from the greatest number and sum backwards. Then if I get greater than x, subtract the greatest number (exclude it from the solution) and continue to sum backward.
This seems the most efficient way to me and actually is O(n) - you never go back (or forward in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need accessing the list again - just a temp var.
To answer Dunes question:
Yes, there is a reason: to subtract the next largest element when the running sum overshoots without a solution. Going from the first element, hitting a no-solution would require accessing the list again, or a temporary solution list, to subtract a set of elements that sum to more than the next element to add. You risk increasing the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence you can search for the smaller and larger pair using binary search. Once a pair of 2 elements, smaller than x is found then you can sum the pair and if it sums larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not what it is in theory and you can do whatever you like :)
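The backward-summing idea described above could be sketched roughly as follows. This is one possible reading of that description, not the answerer's code: the name `find_consecutive_from_top` is hypothetical, and the list is assumed to be sorted ascending and to contain only non-negative numbers.

```python
def find_consecutive_from_top(numbers, target):
    """Two-pointer search that starts at the largest usable element."""
    # Skip every element that is larger than the target.
    hi = len(numbers)
    while hi > 0 and numbers[hi - 1] > target:
        hi -= 1
    lo = hi            # current window is numbers[lo:hi], initially empty
    total = 0          # running sum of the window (the "temp var")
    while True:
        if lo < hi and total == target:
            return numbers[lo:hi]
        if lo > 0 and (total < target or lo == hi):
            lo -= 1                     # sum backwards: take the next smaller element
            total += numbers[lo]
        elif lo < hi and total > target:
            hi -= 1                     # overshoot: drop the largest element
            total -= numbers[hi]
        else:
            return []                   # nothing left to add, no solution


triangular = [i * (i + 1) // 2 for i in range(45)]   # 0, 1, 3, ..., 990
print(find_consecutive_from_top(triangular, 225))
print(find_consecutive_from_top(triangular, 882))
```

Each step moves one of the two indices down by one, so the search is linear in the list length. Because it works from the top, it can return a different window than a forward search when several exist: for 64 both 15 + 21 + 28 and 28 + 36 are valid.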
You should pick the first three elements and sum them; then keep subtracting the first of the three and adding the next element in the list, and check whether the sum equals the number you want. That would be O(n).
# vec as np.ndarray, whatever as the target sum (both defined elsewhere)
import numpy as np
itsum = sum(vec[0:3]) # running sum of the current three elements
sequence = [range(0,3)] if itsum == whatever else [] # index ranges of vec that add up to whatever (creation)
for i in range(3,len(vec)):
    itsum -= vec[i-3]
    itsum += vec[i]
    if itsum == whatever:
        sequence.append(range(i-2,i+1)) # list of sequences that add up to whatever
The solution you provide in the question isn't truly O(n) time complexity: the way you compute your triangle numbers makes the computation O(n^2). The list comprehension throws away the previous work that went into calculating the last triangle number. That is: tn_i = tn_(i-1) + i (where tn is a triangle number). Since you also store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an identical algorithm, but with O(n) time complexity and O(1) space complexity (written for Python 3).
# for python 2, replace things like `highest = next(high)` with `highest = high.next()`
from itertools import count, takewhile, accumulate

def find(to_find):
    # next(low) == lowest number in total
    # next(high) == highest number not in total
    low = accumulate(count(1)) # generator of triangle numbers
    high = accumulate(count(1))
    total = highest = next(high)
    # highest = highest number in the sequence that sums to total
    # definitely can't find solution if the highest number in the sum is greater than to_find
    while highest <= to_find:
        # found a solution
        if total == to_find:
            # keep taking numbers from the low iterator until we find the highest number in the sum
            return list(takewhile(lambda x: x <= highest, low))
        elif total < to_find:
            # add the next highest triangle number not in the sum
            highest = next(high)
            total += highest
        else: # if total > to_find
            # subtract the lowest triangle number in the sum
            total -= next(low)
    return []

Word ranking partial completion [duplicate]

This question already has answers here:
Finding the ranking of a word (permutations) with duplicate letters
(6 answers)
Closed 8 years ago.
I am not sure how to solve this problem within the constraints.
Shortened problem formulation:
"Word" as any sequence of capital letters A-Z (not limited to just "dictionary words").
Consider list of permutations of all characters in a word, sorted lexicographically
Find a position of original word in such a list
Do not generate all possible permutations of a word, since it won't fit in time-memory constraints.
Constraints: word length <= 25 characters; memory limit 1Gb, any answer should fit in 64-bit integer
Original problem formulation:
Consider a "word" as any sequence of capital letters A-Z (not limited to just "dictionary words"). For any word with at least two different letters, there are other words composed of the same letters but in a different order (for instance, STATIONARILY/ANTIROYALIST, which happen to both be dictionary words; for our purposes "AAIILNORSTTY" is also a "word" composed of the same letters as these two). We can then assign a number to every word, based on where it falls in an alphabetically sorted list of all words made up of the same set of letters. One way to do this would be to generate the entire list of words and find the desired one, but this would be slow if the word is long. Write a program which takes a word as a command line argument and prints to standard output its number. Do not use the method above of generating the entire list. Your program should be able to accept any word 25 letters or less in length (possibly with some letters repeated), and should use no more than 1 GB of memory and take no more than 500 milliseconds to run. Any answer we check will fit in a 64-bit integer.
Sample words, with their rank:
ABAB = 2
AAAB = 1
BAAA = 4
QUESTION = 24572
BOOKKEEPER = 10743
examples:
AAAB - 1
AABA - 2
ABAA - 3
BAAA - 4
AABB - 1
ABAB - 2
ABBA - 3
BAAB - 4
BABA - 5
BBAA - 6
I came up with what I think is only a partial solution.
Imagine I have the word JACBZPUC. I sort the word and get ABCCJPUZ. This should be rank 1 in the word rank. From ABCCJPUZ to the first alphabetical word right before the word starting with J, I want to find the number of permutations between the 2 words.
ex:
for `JACBZPUC`
sorted --> `ABCCJPUZ`
permutations that start with A -> 8!/2!
permutations that start with B -> 8!/2!
permutations that start with C -> 8!/2!
Add the 3 values -> 60480
The other C is disregarded as the permutations would have the same values as the previous C (duplicates)
At this point I have the ranks from ABCCJPUZ to the word right before the word that starts with J
ABCCJPUZ rank 1
...
... 60480 values
...
*HERE*
JABCCPUZ rank 60481 LOCATION A
...
...
...
JACBZPUC rank ??? LOCATION B
I'm not sure how to get the values between Locations A and B:
Here is my code to find the 60480 values
import itertools

def perm(word):
    return len(set(itertools.permutations(word)))

def swap(word, i, j):
    word = list(word)
    word[i], word[j] = word[j], word[i]
    print(word)
    return ''.join(word)

def compute(word):
    if ''.join(sorted(word)) == word:
        return 1
    total = 0
    sortedWord = ''.join(sorted(word))
    beforeFirstCharacterSet = set(sortedWord[:sortedWord.index(word[0])])
    print(beforeFirstCharacterSet)
    for i in beforeFirstCharacterSet:
        total += perm(swap(sortedWord,0,sortedWord.index(i)))
    return total
Here is a solution I found online to solve this problem.
Consider the n-letter word { x1, x2, ... , xn }. My solution is based on the idea that the word number will be the sum of two quantities:
1) The number of combinations starting with letters lower in the alphabet than x1, and
2) how far we are into the arrangements that start with x1.
The trick is that the second quantity happens to be the word number of the word { x2, ... , xn }. This suggests a recursive implementation.
Getting the first quantity is a little complicated:
Let uniqLowers = { u1, u2, ... , um } = all the unique letters lower than x1
For each uj, count the number of permutations starting with uj.
Add all those up.
I think I have completed step number 1 but not number 2. I am not sure how to complete this part.
Here is the Haskell solution... I don't know Haskell =/ and I am trying to write this program in Python
https://github.com/david-crespo/WordNum/blob/master/comb.hs
The idea of finding the number of permutations of the letters before the actual first letter is good. But your calculation:
for `JACBZPUC`
sorted --> `ABCCJPUZ`
permutations that start with A -> 8!/2!
permutations that start with B -> 8!/2!
permutations that start with C -> 8!/2!
Add the 3 values -> 60480
is wrong. There are only 8!/2! = 20160 permutations of JACBZPUC, so the starting position can't be as high as 60480. In your method, the first letter is fixed, so you can only permute the seven following letters. So:
permutations that start with A: 7! / 2! == 2520
permutations that start with B: 7! / 2! == 2520
permutations that start with C: 7! / 1! == 5040
-----
10080
You don't divide by 2! to find the permutations beginning with C, because the seven remaining letters are unique; there's only one C left.
Here's a Python implementation:
def fact(n):
    """factorial of n, n!"""
    f = 1
    while n > 1:
        f *= n
        n -= 1
    return f

def rrank(s):
    """Back-end to rank for 0-based rank of a list permutation"""
    # trivial case
    if len(s) < 2: return 0
    order = s[:]
    order.sort()
    denom = 1
    # account for multiple occurrences of letters
    for i, c in enumerate(order):
        n = 1
        while i + n < len(order) and order[i + n] == c:
            n += 1
        denom *= n
    # starting letters alphabetically before current letter
    pos = order.index(s[0])
    # recurse to list without its head
    # (floor division: the quotient is exact and stays an integer on Python 3)
    return fact(len(s) - 1) * pos // denom + rrank(s[1:])

def rank(s):
    """Determine 1-based rank of string permutation"""
    return rrank(list(s)) + 1

strings = [
    "ABC", "CBA",
    "ABCD", "BADC", "DCBA", "DCAB", "FRED",
    "QUESTION", "BOOKKEEPER", "JACBZPUC",
    "AAAB", "AABA", "ABAA", "BAAA"
]
for s in strings:
    print(s, rank(s))
The second part of the solution you have found is also --I think-- what I was about to suggest:
To go from what you call "Location A" to "Location B", you have to find the position of word ACBZPUC among its possible permutations. Consider that a new question to your algorithm, with a new word that just happens to be one position shorter than the original one.
The words in the alphabetical list between JABCCPUZ, which you know the position of, and JACBZPUC, which you want to find the position of, all start with J. Finding the position of JACBZPUC relative to JABCCPUZ, then, is equivalent to finding the relative positions of those two words with the initial J removed, which is the same as the problem you were trying to solve initially but with a word one character shorter.
Repeat that process enough times and you will be left with a word that contains a single character, C. The position of a word with a single character is known to always be 1, so you can then sum that and all of the previous relative positions for an absolute position.
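The "repeat that process" description above can also be written as a loop instead of a recursion. The following is a minimal sketch under that reading; the names `count_arrangements` and `word_rank` are made up for illustration and do not come from the linked Haskell solution.

```python
from collections import Counter
from math import factorial


def count_arrangements(counts):
    """Number of distinct orderings of a multiset given as letter counts."""
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)      # exact: partial quotients stay integers
    return total


def word_rank(word):
    """1-based position of word among the sorted distinct permutations."""
    counts = Counter(word)          # letters still available
    rank = 1
    for ch in word:
        # Count every arrangement that puts a smaller letter at this position.
        for smaller in sorted(counts):
            if smaller >= ch:
                break
            if counts[smaller] == 0:
                continue
            counts[smaller] -= 1
            rank += count_arrangements(counts)
            counts[smaller] += 1
        counts[ch] -= 1             # fix this letter, move to the next position
    return rank


for w in ["ABAB", "AAAB", "BAAA", "QUESTION", "BOOKKEEPER"]:
    print(w, word_rank(w))
```

The sample words from the problem statement come out as 2, 1, 4, 24572 and 10743. No permutation is ever generated, so a 25-letter word needs only a few hundred small factorial computations.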

python recursion with bubble sort

So, I have this problem where I receive 2 strings of the letters ACGT: one with only letters, the other containing letters and dashes "-". Both are the same length. The string with the dashes is compared to the string without them, cell by cell, and for each pairing I have a scoring system. I wrote this code for the scoring system:
for example:
dna1: -ACA
dna2: TACG
The score is -1, because a dash compared to a letter (T) gives -2, a letter compared to the same letter gives +1 (A to A) and +1 (C to C), and non-matching letters give -1 (A to G), so the sum is -1.
def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length, position by position."""
    score = 0
    for index in range(len(dna1)):
        if dna1[index] == dna2[index]:  # compare by value, not identity
            score += match
        elif "-" not in (dna1[index], dna2[index]):
            score += mismatch
        else:
            score += gap
    return score
This is working fine.
Now I have to use recursion to give the best possible score for 2 strings.
I receive 2 strings, which can be of different sizes this time (I can't change the order of the letters).
So I wrote this code that adds "-" as many times as needed to the shorter string to create 2 strings of the same length, and puts the dashes at the start of the list. Now I want to start moving the dashes and record the score for every dash position, and finally get the highest possible score. For moving the dashes around I wrote a little bubble sort, but it doesn't seem to do what I want. I realize it's a long question, but I'd love some help. Let me know if anything I wrote is unclear.
def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,\
               score=[], count=0):
    """"""
    diff = abs(len(dna1) - len(dna2))
    if len(dna1) is len(dna2):
        short = []
    elif len(dna1) < len(dna2):
        short = [base for base in iter(dna1)]
    else:
        short = [base for base in iter(dna2)]
    for i in range(diff):
        short.insert(count, "-")
    for i in range(diff+count, len(short)-1):
        if len(dna1) < len(dna2):
            score.append((get_score(short, dna2),\
                          ''.join(short), dna2))
        else:
            score.append((get_score(dna1, short),\
                          dna1, ''.join(short)))
        short[i+1], short[i] = short[i], short[i+1]
    if count is min(len(dna1), len(dna2)):
        return score[score.index(max(score))]
    return best_score(dna1, dna2, 1, -1, -2, score, count+1)
First, if I correctly deciphered your cost function, your best score value does not depend on gap, as the number of dashes is fixed.
Second, it is linearly dependent on the number of mismatches and so doesn't depend on the exact values of match and mismatch, as long as they are positive and negative respectively.
So your task reduces to finding the longest subsequence of the longer string's letters that strictly matches a subsequence of the letters of the shorter one.
Third, define M(string, substr) as the function returning the length of the best match from above. If the first letter of your shorter string is S, that is substr == 'S<letters>', then
M(string, 'S<letters>') = \
    max(1 + M(string[string.index(S) + 1:], '<letters>'), # found S
        M(string[1:], '<letters>')) # letter S not found, placed at 1st place
The latter is an easy-to-implement recursive expression.
For a pair string, substr with m = M(string, substr), the best score is equal to
m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
It is straightforward, by storing which value was the max in the recursive expression, to find what exactly the best match is.
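For comparison, the gap placement can also be searched directly with a memoised recursion, without going through M. The sketch below is not the function from this answer: `best_alignment_score` is a hypothetical name, and it assumes (as in the question) that dashes may only be inserted into the shorter string and that the letter order is fixed.

```python
from functools import lru_cache


def best_alignment_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Best score over all ways of inserting dashes into the shorter string."""
    long_, short = (dna1, dna2) if len(dna1) >= len(dna2) else (dna2, dna1)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best score for aligning long_[i:] against short[j:].
        if i == len(long_):
            return 0
        remaining_long = len(long_) - i
        remaining_short = len(short) - j
        options = []
        if remaining_short > 0:
            # Pair the two current letters (match or mismatch).
            pair = match if long_[i] == short[j] else mismatch
            options.append(pair + best(i + 1, j + 1))
        if remaining_long > remaining_short:
            # There is still a spare dash: put it opposite long_[i].
            options.append(gap + best(i + 1, j))
        return max(options)

    return best(0, 0)


print(best_alignment_score("TACG", "ACA"))
```

For the example from the question (TACG against ACA) this finds -1, the score of the alignment -ACA. There are at most len(dna1) * len(dna2) distinct (i, j) pairs, so the memoised recursion stays polynomial, although very long strings would run into Python's recursion limit.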

Find the longest substring with contiguous characters, where the string may be jumbled

Given a string, find the longest substring whose characters are contiguous (i.e. they are consecutive letters) but possibly jumbled (i.e. out of order). For example:
Input : "owadcbjkl"
Output: "adcb"
We consider adcb as contiguous as it forms abcd.
(This is an interview question.)
I have an idea of running a while loop with 2 conditions, one that checks for continuous characters using Python's ord and another condition to find the minimum and maximum and check if all the following characters fall in this range.
Is there any way this problem could be solved with low running time complexity? The best I can achieve is O(N^2) where N is the length of the input string and ord() seems to be a slow operation.
If the substring is defined as ''.join(sorted(substr)) in alphabet then:
1) there are no duplicates in the substring and therefore the size of the longest substring is less than (or equal to) the size of the alphabet
2) (ord(max(substr)) - ord(min(substr)) + 1) == len(substr), where ord() returns position in the alphabet (+/- constant) (builtin ord() can be used for lowercase ascii letters)
Here's O(n*m*m)-time, O(m)-space solution, where n is len(input_string) and m is len(alphabet):
from itertools import count

def longest_substr(input_string):
    maxsubstr = input_string[0:0] # empty slice (to accept subclasses of str)
    for start in range(len(input_string)): # O(n)
        for end in count(start + len(maxsubstr) + 1): # O(m)
            substr = input_string[start:end] # O(m)
            if len(set(substr)) != (end - start): # found duplicates or EOS
                break
            if (ord(max(substr)) - ord(min(substr)) + 1) == len(substr):
                maxsubstr = substr
    return maxsubstr
Example:
print(longest_substr("owadcbjkl"))
# -> adcb
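The two properties used above (no duplicates, and max - min + 1 equal to the length) can also be tracked incrementally while the window grows, which avoids re-slicing and re-scanning the substring. This is a sketch of a variant, not the code from this answer; the name `longest_contiguous_substr` is hypothetical, and it does roughly O(n*m) work because a duplicate stops every window after at most m distinct characters.

```python
def longest_contiguous_substr(input_string):
    """Longest substring whose characters form a run of consecutive codes."""
    best = ""
    for start in range(len(input_string)):
        seen = set()
        lo = hi = ord(input_string[start])
        for end in range(start, len(input_string)):
            code = ord(input_string[end])
            if code in seen:            # duplicate: no longer window can work
                break
            seen.add(code)
            lo = min(lo, code)          # running minimum of the window
            hi = max(hi, code)          # running maximum of the window
            length = end - start + 1
            if hi - lo + 1 == length and length > len(best):
                best = input_string[start:end + 1]
    return best


print(longest_contiguous_substr("owadcbjkl"))
```

On the example input this also yields adcb. When several substrings share the maximum length, the leftmost one is kept.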
