How can I find a specific pattern in a list?

Given a list of numbers containing only 0s, 1s, and -1s, how can I find the length of the longest portion of the list that starts with a +1 and ends with a -1?
For example, [0,0,1,1,1,-1,-1,-1,0]: the longest portion is 6, due to the portion of the list [1,1,1,-1,-1,-1].
For example, [1,-1,0,1,-1,-1,-1]: the longest portion is 4, due to the portion of the list [1,-1,-1,-1]. Note that had the original list only been the first 3 elements (i.e., [1,-1,0]), then the correct answer would have been 2, from [1,-1].
Also, the portion cannot be broken by a 0, and it can only switch from +1 to -1 once. In other words, [+1,-1,+1,-1] is still only 2.
Thank you

You need two boolean flags (one recording that the current run has seen a 1, one recording that it has just seen a -1) to keep track of where you are in the pattern.
def getLongestPortion(l):
    maxx = 0
    curMax = 0
    JustHadOne = False
    JustHadNeg = False
    for i in range(len(l)):
        if(l[i]==1):
            if(JustHadNeg):
                curMax = 0
            curMax += 1
            JustHadOne = True
            JustHadNeg = False
        elif(l[i]==-1 and JustHadOne):
            curMax += 1
            maxx = max(maxx,curMax)
            JustHadNeg = True
        else:
            JustHadOne = False
            JustHadNeg = False
            curMax=0
    return maxx
l = [1,-1,-1,0,1,1,-1,-1]
print(getLongestPortion(l))
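As an alternative to tracking flags by hand, here is a minimal sketch that groups the list into runs first. It assumes the rule from the question (a run of 1s immediately followed by a run of -1s); the function name longest_one_then_neg_one is made up for this illustration.

```python
from itertools import groupby

def longest_one_then_neg_one(values):
    """Length of the longest run of 1s immediately followed by a run of -1s."""
    # Collapse the list into (value, run_length) pairs, e.g.
    # [1, 1, -1, 0] -> [(1, 2), (-1, 1), (0, 1)]
    runs = [(value, len(list(group))) for value, group in groupby(values)]
    best = 0
    # A valid portion is a run of 1s directly followed by a run of -1s.
    for (left, left_len), (right, right_len) in zip(runs, runs[1:]):
        if left == 1 and right == -1:
            best = max(best, left_len + right_len)
    return best

print(longest_one_then_neg_one([0, 0, 1, 1, 1, -1, -1, -1, 0]))  # 6
print(longest_one_then_neg_one([1, -1, 0, 1, -1, -1, -1]))       # 4
print(longest_one_then_neg_one([1, -1, 1, -1]))                  # 2
```

Because a 0 forms its own run, it naturally separates a run of 1s from a later run of -1s, so no explicit reset is needed.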

Here's a regex solution. First I change a list like [1, -1, 0, 1, -1, -1, -1] to a string like 'ab abbb', then I search for 'a+b+', then take the maximum length of the matches:
import re
max(map(len, re.findall('a+b+', ''.join(' ab'[i] for i in l))))
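One thing to watch with the one-liner: max() raises ValueError when there is no match at all. A small sketch of the same idea wrapped in a function (the name longest_portion_regex is only illustrative) that returns 0 in that case:

```python
import re

def longest_portion_regex(values):
    # Map 0 -> ' ', 1 -> 'a', -1 -> 'b' (index -1 picks the last character).
    encoded = ''.join(' ab'[v] for v in values)
    # default=0 covers lists with no "1s followed by -1s" portion at all.
    return max(map(len, re.findall('a+b+', encoded)), default=0)

print(longest_portion_regex([1, -1, 0, 1, -1, -1, -1]))  # 4
print(longest_portion_regex([0, 0, 0]))                  # 0
```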

Related

How to count the number of triplets found in a string?

string1 = "abbbcccd"
string2 = "abbbbdccc"
How do I find the number of triplets in a string? A triplet is a character that appears 3 times in a row. Triplets can also overlap, as in string 2 (abbbbdccc), where bbbb counts as two triplets.
The output should be:
2 <-- string 1
3 <-- string 2
I'm new to Python and Stack Overflow, so any help or advice on question writing would be much appreciated.

Try iterating through the string with a while loop, checking whether each character and the two characters after it are the same. This works for overlapping triplets as well.
string1 = "abbbcccd"
string2 = "abbbbdccc"
string3 = "abbbbddddddccc"
def triplet_count(string):
    it = 0 # index into the string
    cnt = 0 # count of number of triplets
    while it < len(string) - 2:
        if string[it] == string[it + 1] == string[it + 2]:
            cnt += 1
        it += 1
    return cnt
print(triplet_count(string1)) # prints 2
print(triplet_count(string2)) # prints 3
print(triplet_count(string3)) # prints 7

This simple script should work...
my_string = "aaabbcccddddd"
# Some required variables
old_char = None
two_in_a_row = False
triplet_count = 0
# Iterates through characters of a given string
for char in my_string:
    # Checks if previous character matches current character
    if old_char == char:
        # Checks if there already has been two in a row (and hence now a triplet)
        if two_in_a_row:
            triplet_count += 1
        two_in_a_row = True
    # Resets the two_in_a_row boolean variable if there's a non-match.
    else:
        two_in_a_row = False
    old_char = char
print(triplet_count) # prints 5 for the example my_string I've given
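The overlapping count can also be sketched with a regular expression, if importing re is acceptable. This is only one possible approach; the function name count_triplets is invented for the example.

```python
import re

def count_triplets(text):
    # (?=...) is a zero-width lookahead, so matches are allowed to overlap;
    # (.)\1\1 means "some character followed by itself twice".
    return len(re.findall(r'(?=(.)\1\1)', text))

print(count_triplets("abbbcccd"))   # 2
print(count_triplets("abbbbdccc"))  # 3
```

The lookahead consumes no characters, so the search restarts one position further along each time, which is what makes "bbbb" count as two triplets.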

Find all the positions where a substring matches a string

I want to get the indexes of all the occurrences of a substring inside a string:
s = "the bewildered tourist was in the mosque"
w = "the"
find(s, w)
[1, 31]
I also want the function to start counting at 1, as in the previous example, instead of 0.
This is what I managed:
import re
def find(sentence, word):
    if word in sentence:
        matches = re.finditer(word, sentence)
        matches_positions = [match.start() for match in matches]
        print(matches_positions)
    else:
        return [-1]
find(s, w)
[0, 30]
This is working, but I would like a way of doing this without using any import, as well as starting at 1 instead of 0.

There is no need to use the regular expression functionality to search for substrings if you don't need any regular expression matching.
To solve with just the standard methods available:
def find_indexes(string_to_search, word_to_find):
    start_index = 0
    positions = []
    while True:
        try:
            start_index = string_to_search.index(word_to_find, start_index)
            positions.append(start_index)
            start_index += len(word_to_find)
        except ValueError:
            return positions
s = "the bewildered tourist was in the mosque"
w = "the"
print(find_indexes(s, w))
index raises a ValueError exception if the word can't be found at or after the given index, so we catch that and return the positions collected so far. index takes the start index as its second argument.
To make it start at 1 instead, add 1 to the index when you store it:
positions.append(start_index + 1)
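Putting both requests together (no import, counting from 1), a minimal sketch using str.find, which returns -1 instead of raising when nothing is found. The name find_positions and the overlapping flag are assumptions made for this example, not part of the original question.

```python
def find_positions(text, word, overlapping=True):
    """Return the 1-based start positions of word in text."""
    if not word:
        return []  # an empty search word would otherwise match everywhere
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        # Step by 1 to allow overlaps, or by len(word) to skip past the match.
        step = 1 if overlapping else len(word)
        start = text.find(word, start + step)
    return positions

s = "the bewildered tourist was in the mosque"
print(find_positions(s, "the"))  # [1, 31]
```

Whether overlapping matches should count (for example "aa" inside "aaaa") depends on what you need, which is why the sketch exposes it as a flag.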

You can split the string into a list of words and find the index of each item whose value equals w. This will give you the position of the word in the list of words (not its character position in the string).
s = "the bewildered tourist was in the mosque"
w = "the"
ss = s.split(' ')
print([i for i,si in enumerate(ss) if si == w])
The output of this will be:
[0, 5]
Similarly, if you want the character position within the string, you can compare every slice of s that is as long as w against w:
print([i+1 for i in range(len(s)-len(w)+1) if s[i:i+len(w)] == w])
This will return:
[1, 31]
The i+1 is to give you a starting position of 1 instead of 0.
The first result gives the positions of the words and the second gives the positions within the string.

How to find the most amount of shared characters in two strings? (Python)

yamxxopd
yndfyamxx
Output: 5
I am not quite sure how to find the length of the longest run of characters shared between two strings. For example, for the strings above, the longest shared run is "yamxx", which is 5 characters long.
"xx" would not be a solution because it is not the longest shared run. In this case the longest is "yamxx", which is 5 characters long, so the output would be 5.
I am quite new to Python and Stack Overflow, so any help would be much appreciated!
Note: the characters should be in the same order in both strings.

Here is a simple, efficient solution using dynamic programming.
def longest_subtring(X, Y):
    m,n = len(X), len(Y)
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    print (result )
longest_subtring("abcd", "arcd") # prints 2
longest_subtring("yammxdj", "nhjdyammx") # prints 5
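If you also want the shared substring itself rather than only its length, the same table idea can be sketched while keeping just one previous row. This is an illustrative variant, not the answer's exact code; the name longest_common_substring is hypothetical.

```python
def longest_common_substring(x, y):
    """Return (length, substring) of the longest common substring of x and y."""
    prev = [0] * (len(y) + 1)   # previous row of the suffix-length table
    best_len, best_end = 0, 0   # best length so far and where it ends in x
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, x[best_end - best_len:best_end]

print(longest_common_substring("yamxxopd", "yndfyamxx"))  # (5, 'yamxx')
```

Returning the value instead of printing it makes the function easier to reuse and test.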

This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1 # Let's start with the longest possible sub-string
while (not found) and sub_length: # Loop, over decreasing lengths of sub-string
    for start in range(l_1 - sub_length + 1): # Loop, over all start-positions of sub-string
        sub_str = s_1[start:(start+sub_length)] # Get the sub-string at that start-position
        if sub_str in s_2: # If found a match for the sub-string, in s_2
            found = True # Stop trying with smaller lengths of sub-string
            break # Stop trying with this length of sub-string
    else: # If no matches found for this length of sub-string
        sub_length -= 1 # Let's try a smaller length for the sub-strings
print (f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5

s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# creating and initializing a string without repetition
s = ""
for x in s1:
    if x not in s:
        s = s + x
for x in s:
    if x in s2:
        counter = counter + 1
# display the number of distinct characters that s1 and s2 share (order is ignored)
print(counter) # display 5

Infinite string

We are given N words, each of length at most 50. All words consist of lowercase letters and digits. We concatenate all N words to form a bigger string A. An infinite string S is built by performing infinitely many steps on A recursively: in the i-th step, A is concatenated with '$' repeated i times, followed by the reverse of A. For example, let N be 3 and the words be '1', '2' and '3'. After concatenating we get A = 123, and the reverse of A is 321. After the first step A = 123$321, after the second step A = 123$321$$123$321, and so on. The infinite string thus obtained is S.
After the i-th step we have to find the character at a given index k. The number of steps can be as large as pow(10,4), and N can also be as large as pow(10,4) with each word at most 50 characters long, so in the worst case the starting string can have a length of 5*(10**5). That is huge, so actually building the string step by step won't work.
What I came up with is that the string is a palindrome after the first step, so if I can calculate the position of the '$'*i block I can calculate any index, since the string before and after it is a palindrome. I came up with a pattern that looks like this:
string='123'
k=len(string)
recursion=100
lis=[]
for i in range(1,recursion+1):
    x=(2**(i-1))
    y=x*(k+1)+(x-i)
    lis.append(y)
print(lis[:10])
Output:
[4, 8, 17, 36, 75, 154, 313, 632, 1271, 2550]
Now I have two problems with it. First, I also want to add the positions of the adjacent '$' characters to the list: at position 8, which comes after the 2nd step, there is (2-1) = 1 more '$' at position 9, and likewise at position 17, which comes after the 3rd step, there are (3-1) = 2 more '$' at positions 18 and 19, and this continues up to the i-th step. For that I would have to add a while loop, and that would make my algorithm exceed the time limit (TLE):
string='123'
k=len(string)
recursion=100
lis=[]
for i in range(1,recursion+1):
    x=(2**(i-1))
    y=x*(k+1)+(x-i)
    lis.append(y)
    count=1
    while(count<i):
        y=y+1
        lis.append(y)
        count+=1
print(lis[:10])
Output: [4, 8, 9, 17, 18, 19, 36, 37, 38, 39]
The idea behind finding the positions of '$' is that the string before and after it is a palindrome: if the index of the '$' is odd, the elements before and after it are the last element of the string, and if it is even, the elements before and after it are the first element of the string.

The number of dollar signs that S will have in each group of them follows the following sequence:
1 2 1 3 1 2 1 4 1 2 1 ...
This corresponds to the number of trailing zeroes that i has in its binary representation, plus one:
bin(i) | dollar signs
--------+-------------
00001 | 1
00010 | 2
00011 | 1
00100 | 3
00101 | 1
00110 | 2
... ...
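The pattern in the table can be checked against a brute-force construction for a small input. This is only a sanity-check sketch; dollars_in_group and build_prefix are names invented here, and the brute-force build is practical for a few steps only.

```python
import re

def dollars_in_group(i):
    # i & -i isolates the lowest set bit of i; its bit length is the
    # number of trailing zeroes of i plus one.
    return (i & -i).bit_length()

def build_prefix(a, steps):
    # Literal construction from the question: A + '$' * step + reverse(A).
    s = a
    for step in range(1, steps + 1):
        s = s + '$' * step + s[::-1]
    return s

prefix = build_prefix('123', 4)
observed = [len(run) for run in re.findall(r'\$+', prefix)]
expected = [dollars_in_group(i) for i in range(1, len(observed) + 1)]
print(observed)
print(observed == expected)  # True
```

The bit trick is an alternative to inspecting bin(i) as a string; both give the number of trailing zeroes plus one.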
With that information you can use a loop that subtracts from k the size of the original words and then subtracts the number of dollars according to the above observation. This way you can detect whether k points at a dollar or within a word.
Once k has been "normalised" to an index within the limits of the original total words length, there only remains a check to see whether the characters are in their normal order or reversed. This depends on the number of iterations done in the above loop, and corresponds to i, i.e. whether it is odd or even.
This leads to this code:
def getCharAt(words, k):
    size = sum([len(word) for word in words]) # sum up the word sizes
    i = 0
    while k >= size:
        i += 1
        # Determine number of dollars: corresponds to one more than the
        # number of trailing zeroes in the binary representation of i
        b = bin(i)
        dollars = len(b) - b.rindex("1")
        k -= size + dollars
    if k < 0:
        return '$'
    if i%2: # if i is odd, then look in reversed order
        k = size - 1 - k
    # Get the character at the k-th index
    for word in words:
        if k < len(word):
            return word[k]
        k -= len(word)
You would call it like so:
print (getCharAt(['1','2','3'], 13)) # outputs 3
Generator Version
When you need to request multiple characters like that, it might be more interesting to create a generator, which just keeps producing the next character as long as you keep iterating:
def getCharacters(words):
    i = 0
    while True:
        i += 1
        if i%2:
            for word in words:
                yield from word
        else:
            for word in reversed(words):
                yield from reversed(word)
        b = bin(i)
        dollars = len(b) - b.rindex("1")
        yield from "$" * dollars
If for instance you want the first 80 characters from the infinite string that would be built from "a", "b" and "cd", then call it like this:
import itertools
print ("".join(itertools.islice(getCharacters(['a', 'b', 'cd']), 80)))
Output:
abcd$dcba$$abcd$dcba$$$abcd$dcba$$abcd$dcba$$$$abcd$dcba$$abcd$dcba$$$abcd$dcba$

Here is my solution to the problem (the index starts at 1 for findIndex). I am basically counting recursively to find the value of the element at findIndex.
def findInd(k,n,findIndex,orientation):
    temp = k # no. of characters covered.
    tempRec = n # no. of dollars to be added
    in_string = True # True when the last block added was a copy of the string (dollars come next), False when it was dollars.
    while temp < findIndex:
        if in_string:
            temp += tempRec
            tempRec += 1
            in_string = not in_string
        else:
            temp += temp - (tempRec - 1)
            in_string = not in_string
    # print(temp,findIndex)
    if in_string:
        if findIndex <= k:
            if orientation: # checks if string must be reversed.
                return A[findIndex - 1]
            else:
                return A[::-1][findIndex - 1] # the string reverses when there is a single dollar so this is necessary
        else:
            # temp + tempRec - 1 is always even, so integer division is exact (and keeps the index an int on Python 3)
            if tempRec-1 == 1:
                return findInd(k,1,findIndex - (temp+tempRec-1)//2,False) # we send a false for orientation as we want a reverse in case we encounter a single dollar sign.
            else:
                return findInd(k,1,findIndex - (temp+tempRec-1)//2,True)
    else:
        return "$"
A = "123" # change to suit your need
findIndex = 24 # the index to be found # change to suit your need
k = len(A) # length of the string.
print(findInd(k,1,findIndex,True))
I think this will also satisfy your time constraint, as I do not go through each element.

How to find elements in a list is consecutive

Hi, I have a list such as [tag1 tag2], and I would like to find out whether the numbers following "tag" are incremental.
match_atr_value = re.search('(.+)\~(\w+)',each_line)
tags = match_atr_value.group(1).split('.')
print(tags)
Input:
[tag1 tag2]
[tag1 tag3]
[tag1 tag2 tag4]
Output:
Incremental
Not
Not
"Consecutive integers are integers that follow each other in order"
Is there a simpler way to do it? All I have to do is check whether the list is incremental: if yes, I should use the tags; otherwise I should throw an exception.
Thanks

You can extract all the digits that follow "tag" via re.findall() and then use enumerate() and all() to check if the numbers are consecutive:
import re
l = [
    "[tag1 tag2]",
    "[tag1 tag3]",
    "[tag1 tag2 tag4]"
]
pattern = re.compile(r"tag(\d+)")
for item in l:
    numbers = list(map(int, pattern.findall(item))) # list() is needed on Python 3, where map() returns an iterator
    result = all(index == number for index, number in enumerate(numbers, start=numbers[0]))
    print(result)
Prints:
True
False
False

tag1val = int(re.search(r'\D*(\d+)', tag1).group(1))
tag2val = int(re.search(r'\D*(\d+)', tag2).group(1))
(tag1val - tag2val) == -1
True
tag3val = int(re.search(r'\D*(\d+)', tag3).group(1))
(tag1val - tag3val) == -1
False
(tag2val - tag3val) == -1
True
If you wanted to stick with regular expressions, you can go this route. If the difference is -1 then the one on the right is 1 greater than the one on the left.
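The same difference test can be applied to every adjacent pair in a line, rather than to individually named variables. A minimal sketch, assuming each line looks like the inputs in the question; tag_numbers and is_incremental are illustrative names.

```python
import re

def tag_numbers(line):
    # Pull the integer that follows each "tag" in a line such as "[tag1 tag2]".
    return [int(n) for n in re.findall(r'tag(\d+)', line)]

def is_incremental(numbers):
    # Every adjacent pair must differ by exactly 1 (left - right == -1).
    return all(a - b == -1 for a, b in zip(numbers, numbers[1:]))

for line in ["[tag1 tag2]", "[tag1 tag3]", "[tag1 tag2 tag4]"]:
    print("Incremental" if is_incremental(tag_numbers(line)) else "Not")
```

Note that a list with zero or one number passes trivially, so add an explicit length check if that should count as an error.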

If I were you, I would avoid a regex and do something like below. I get the number from the string, whether it's beginning, end, etc. and then compare it to the previous value and return False if it's not increasing
def isIncreasing( listy ):
    prev = 0 # the first tag is expected to be numbered 1
    for w in listy:
        digits = ''.join(ch for ch in w if ch.isdigit()) # collect every digit, so "tag12" gives 12
        cur = int(digits)
        if cur != prev+1:
            return False
        prev = cur
    return True
