python - count AABB-like occurrence in a list [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a list as below:
a=[0,0,2,4,4,6,6,9,12,13,13,16,16,21,21,24,26,26,28,28,31,34,34,37,37]
The list satisfies:
1. It is sorted in ascending order.
2. Each number occurs 1-2 times.
How can I count all AABB-like occurrences in the list?
In the above example the answer should be 5:
(4,4,6,6) (13,13,16,16) (16,16,21,21) (26,26,28,28) (34,34,37,37)

There are many ways to do this. The simplest I could think of is using 'enumerate' and list comprehension with a condition to test for 'aabb' patterns.
result = len([x for idx,x in enumerate(a) if idx<len(a)-3 and x == a[idx+1] and x!=a[idx+2] and a[idx+2] == a[idx+3]])
The idx < len(a) - 3 avoids index problems.

Although it may not seem that way at first (because you are processing a list of ints) this is actually an example of a string searching algorithm.
If you look at the Wikipedia article (linked to above) you will see that such algorithms have quite a few uses beyond simply searching strings. One major use is searching DNA sequences for a given pattern, so it is quite an important area of computer science.
As well as multiple uses there are multiple implementations so you could approach this several ways.
The naive approach is to simply iterate through the list and check whether the next element matches the current element, and then whether the following elements also match. The problem here is that you have to go through the whole list and, at each position, iterate through a sublist to check whether it matches the given pattern. In big O notation we say this approach has a complexity of O(nm), where n is the length of the list and m is the length of the pattern you are searching for, so it is not very efficient.
There are many ways to improve on the naive approach, and there may even be some that are unknown. I'll leave that to you to figure out, but hope this gives you some pointers.
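One possible single-pass alternative, as a minimal sketch: it relies on the question's guarantees (sorted input, each number occurring once or twice) and collapses the list into run lengths with itertools.groupby, then counts neighbouring runs that both have length 2. The function name count_aabb is made up for illustration.

```python
from itertools import groupby


def count_aabb(values):
    """Count AABB patterns, assuming equal values are adjacent and occur at most twice."""
    # Length of each run of equal values, e.g. [0, 0, 2, 4, 4] -> [2, 1, 2]
    run_lengths = [len(list(group)) for _, group in groupby(values)]
    # An AABB pattern is two neighbouring runs that both have length 2.
    return sum(
        1
        for left, right in zip(run_lengths, run_lengths[1:])
        if left == 2 and right == 2
    )


a = [0, 0, 2, 4, 4, 6, 6, 9, 12, 13, 13, 16, 16, 21, 21, 24, 26, 26, 28, 28, 31, 34, 34, 37, 37]
print(count_aabb(a))  # 5
```

Each element is visited once, so the work grows linearly with the length of the list.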

Just loop through - and compare the pairs
a=[0,0,2,4,4,6,6,9,12,13,13,16,16,21,21,24,26,26,28,28,31,34,34,37,37]
for i in range(len(a)-3):
    # two equal neighbours followed by two equal neighbours
    if (a[i]==a[i+1] and a[i+2]==a[i+3]):
        print(str(a[i])+str(a[i+1])+str(a[i+2])+str(a[i+3]))
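The loop above prints each match rather than counting them. A small variant, sketched here with the hypothetical name find_aabb, collects the matching windows so that the count is just the length of the result. It also adds an explicit check that the two pairs differ, which the question's "each number occurs 1-2 times" rule would otherwise guarantee.

```python
def find_aabb(values):
    """Return every 4-element window of the form (A, A, B, B) with A != B."""
    matches = []
    for i in range(len(values) - 3):
        window = tuple(values[i:i + 4])
        if window[0] == window[1] and window[2] == window[3] and window[1] != window[2]:
            matches.append(window)
    return matches


a = [0, 0, 2, 4, 4, 6, 6, 9, 12, 13, 13, 16, 16, 21, 21, 24, 26, 26, 28, 28, 31, 34, 34, 37, 37]
found = find_aabb(a)
print(len(found))  # 5
print(found[0])    # (4, 4, 6, 6)
```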

Related

Efficient way to generate all possibilities of string from characters [closed]

I am trying to randomly generate a string of n length from 5 characters ('ATGC '). I am currently using itertools.product, but it is incredibly slow. I switched to itertools.combinations_with_replacement, but it skips some values. Is there a faster way of doing this? For my application order does matter.
for error in itertools.product('ATGC ', repeat=len(errorPos)):
    print(error)
    for ps in error:
        for pos in errorPos:
            if ps == " ":
                fseqL[pos] = ""
            else:
                fseqL[pos] = ps
If you just want a random single sequence:
import random

def generate_DNA(N):
    possible_bases = 'ACGT'
    # pick one base at random for each of the N positions
    return ''.join(random.choice(possible_bases) for i in range(N))

one_hundred_bp_sequence = generate_DNA(100)
That was posted before the question clarified that spaces are needed; you can change possible_bases to include a space if you need spaces allowed.
If you want all combinations that allow a space, too, a solution adapted from this answer, which I learned of from Biostars post 'all possible sequences from consensus':
from itertools import product

def all_possibilities_w_space(seq):
    """return list of all possible sequences given a completely ambiguous DNA input. Allow spaces"""
    d = {"N": "ACGT "}
    # look up the allowed characters for each position, then take their Cartesian product
    return list(map("".join, product(*map(d.get, seq))))

all_possibilities_w_space("N"*2) # example of length two
The idea is that N can be any of "ACGT " and the multiplier specifies the length. According to the answer I adapted it from, using map is meant to make it faster, because the work is done in C.
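The number of sequences grows as 5**n, so building the full list gets expensive quickly. Since itertools.product returns a lazy iterator, one option is to consume the sequences one at a time instead of calling list() on them. A minimal sketch, where iter_sequences is a made-up helper name:

```python
from itertools import islice, product


def iter_sequences(length, alphabet="ACGT "):
    """Lazily yield every string of the given length over the alphabet."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


# Peek at the first few sequences without generating the rest.
first_three = list(islice(iter_sequences(2), 3))
print(first_three)  # ['AA', 'AC', 'AG']

# Count the sequences of length 3 without storing them: 5**3 of them.
total = sum(1 for _ in iter_sequences(3))
print(total)  # 125
```

This does not make the enumeration itself faster, it only avoids holding every sequence in memory at once.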

How do I check if a string contains ANY element in an array [closed]

I am trying to detect if a string contains any element in an array. I want to know if the string(msg) has any element from the array(prefixes) in it.
I want this because I want to make a Discord bot with multiple prefixes. Here's my garbage if statement:
if msg.startswith(prefixes[any]):
The existing answers show two ways of doing a linear search, and this is probably your best choice.
If you need something more scalable (ie, you have a lot of potential prefixes, they're very long, and/or you need to scan very frequently) then you could write a prefix tree. That's the canonical search structure for this problem, but it's obviously a lot more work and you still need to profile to see if it's really worthwhile for your data.
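For illustration only, a prefix tree can be sketched with nested dictionaries. The names END, build_prefix_tree and starts_with_any are invented here, and this is a bare-bones version rather than a tuned implementation.

```python
END = ""  # marker key; it cannot clash with a single-character key


def build_prefix_tree(prefixes):
    """Build a nested-dict prefix tree from an iterable of prefix strings."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[END] = True  # a complete prefix ends at this node
    return root


def starts_with_any(tree, msg):
    """Return True if msg starts with any prefix stored in the tree."""
    node = tree
    for ch in msg:
        if END in node:
            return True
        node = node.get(ch)
        if node is None:
            return False
    return END in node


tree = build_prefix_tree(["!", "?", "bot "])
print(starts_with_any(tree, "!help"))  # True
print(starts_with_any(tree, "hello"))  # False
```

The lookup walks the message one character at a time and stops as soon as a stored prefix ends or no branch matches, so its cost depends on the length of the longest prefix rather than on how many prefixes there are.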
Try something like this:
prefixes = ('a','b','i')
if msg.startswith(prefixes):
    ...  # handle the command here
The prefixes must be a tuple, because the startswith function does not accept a list as its argument.
There are dedicated algorithms for such a search, but a functional implementation in Python may look like this:
prefixes = ['foo', 'bar']
string = 'foobar'
result = any(map(lambda x: string.startswith(x), prefixes))
If you search for x at any position in string, then change string.startswith(x) to x in string.
UPDATE
According to @MisterMiyagi in the comments, the following is a more readable (and possibly more efficient) statement:
result = any(string.startswith(prefix) for prefix in prefixes)
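For a bot it is often useful to know which prefix matched, so that it can be stripped from the message before the command is parsed. A minimal sketch, assuming a helper named match_prefix (not part of any library):

```python
def match_prefix(msg, prefixes):
    """Return the first prefix that msg starts with, or None if there is none."""
    return next((prefix for prefix in prefixes if msg.startswith(prefix)), None)


prefixes = ["!", "?", "bot "]
msg = "bot ping"

prefix = match_prefix(msg, prefixes)
if prefix is not None:
    command = msg[len(prefix):]  # the message without its prefix
    print(command)  # ping
```

If one prefix is the beginning of another, put the longer one first in the list so that it wins.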

Using itertools to get a large number of unique permutations [closed]

I have a question as such: How many ways are there to change order of letters in "avocadojuice" so that the vowel comes up first, for example "ovacadojuice" is a solution? The starting point - "avocadojuice" is also a solution.
I know that itertools.permutations can do things like this, but if the word is too long it pops up a memory error. Is there a way to prevent this, or maybe there is another built in module, which can solve this? Thank you in advance!
P.S. I know how to turn permutation tuples into strings.
This is a case where brute-forcing the problem probably isn't feasible. You need to compute the number of unique permutations of n elements, accounting for the fact that some elements are repeated.
There is a mathematical formula for this, and some excellent answers on other stack exchange sites.
The letters a, o and c each appear twice, so the number of unique permutations of avocadojuice is
(12!) / (2!2!2!)
or 59875200
At roughly 45 bytes per 12-character string (on my machine at least), that's roughly 2.7 gigs of memory just to store the nearly 60 million permutations! You can see why brute forcing this isn't such a great idea.
You have one extra requirement in your problem though, which is that the permutations must start with a vowel. Given that there are only 5 vowels, you should be able to calculate the possible permutations with each of the given vowels as the first character.
(11! / (2!2!)) + # a (o and c are still repeated)
(11! / (2!2!2!)) + # e
(11! / (2!2!2!)) + # i
(11! / (2!2!)) + # o (a and c are still repeated)
(11! / (2!2!2!)) # u
which comes to 34927200.
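The same counting can be done in code without generating a single permutation. A minimal sketch using collections.Counter and math.factorial; the function names are made up for illustration, and "starts with a vowel" is the reading of the question assumed here.

```python
from collections import Counter
from math import factorial


def unique_permutations(letters):
    """Number of distinct orderings of a multiset given as a Counter of letters."""
    count = factorial(sum(letters.values()))
    for repeats in letters.values():
        count //= factorial(repeats)  # divide out orderings of identical letters
    return count


def count_starting_with_vowel(word, vowels="aeiou"):
    """Count distinct permutations of word whose first letter is a vowel."""
    letters = Counter(word)
    total = 0
    for vowel in vowels:
        if letters[vowel] == 0:
            continue  # this vowel does not occur in the word
        remaining = letters.copy()
        remaining[vowel] -= 1  # fix this vowel in the first position
        total += unique_permutations(remaining)
    return total


print(unique_permutations(Counter("avocadojuice")))  # 59875200
print(count_starting_with_vowel("avocadojuice"))     # 34927200
```

As a sanity check, 7 of the 12 letters are vowels, and 59875200 * 7 / 12 is also 34927200.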

Matching two comma separated strings in Python and correct position counts [closed]

I hope to get assistance with the problem below.
Python has many built-in sequence matching functions. Can the following requirement be met with a built-in function, without looping?
x = 'hu1_X','hu2_Y','hu3_H','hu4_H','hu5_H','hu7_H'
y = 'hu1_H','hu2_X','hu3_H','hu4_H','hu5_H','hu7_X'
Comparing the above strings position by position, the final match count is 3.
The matches are: 'hu3_H','hu4_H','hu5_H'.
Any idea which built-in function I can use? Can we go with zip() in Python?
Thanks in advance.
You can use a generator expression and the builtin sum function along with zip, like this
x = 'hu1_X', 'hu2_Y', 'hu3_H', 'hu4_H', 'hu5_H', 'hu7_H'
y = 'hu1_H', 'hu2_X', 'hu3_H', 'hu4_H', 'hu5_H', 'hu7_X'
print(sum(item1 == item2 for item1, item2 in zip(x, y)))
# 3
This works because, in Python, True and False can be treated as 1 and 0 respectively. So, we can simply compare the corresponding elements and the result of that evaluation can be added together to get the total match count.
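If you also want to see which items matched, not just how many, the same zip pairing can feed a list comprehension. A small sketch; the names matches, left and right are chosen here for illustration:

```python
x = 'hu1_X', 'hu2_Y', 'hu3_H', 'hu4_H', 'hu5_H', 'hu7_H'
y = 'hu1_H', 'hu2_X', 'hu3_H', 'hu4_H', 'hu5_H', 'hu7_X'

# Keep the items that are equal at the same position in both tuples.
matches = [left for left, right in zip(x, y) if left == right]
print(matches)       # ['hu3_H', 'hu4_H', 'hu5_H']
print(len(matches))  # 3
```

Note that zip stops at the shorter input, so trailing items of a longer sequence are never compared.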

Change an integer to a string (numbers to words) [closed]

How do I convert a number such as 24 to the two words "two", "four"?
Quick way I thought of.
First you need a way to loop through the integer. You can try doing some weird dividing by 10 and using the modulus of it... or just convert it to a string.
Then you can iterate through each 'number' in the string and use a simple lookup table to print out each number.
numberconverterlookup = {
    '1': 'one',
    '2': 'two',
    '3': 'three',
    '4': 'four',
    '5': 'five',
    '6': 'six',
    '7': 'seven',
    '8': 'eight',
    '9': 'nine',
    '0': 'zero',
}
number = 24
stringnumber = str(number)
# look up the word for each digit character in turn
for eachdigit in stringnumber:
    print(numberconverterlookup[eachdigit])
Note that this only converts one digit at a time, so it can't easily name whole numbers (24 becomes "two", "four", not "twenty-four"). To do that with a lookup table you'd have to write out each number by hand, which is very cumbersome.
Some key concepts are illustrated here:
Dictionary: This maps a 'key' to a 'value'. I.e. '1' maps to 'one'
For loop: This allows us to go through each digit in the number. In the case of 24, it will loop twice, once with eachdigit set to '2', and then again with eachdigit set to '4'. We can't loop through an integer because it is itself a single entity.
Typecasting: This converts the integer type 24 into a string '24'. A string is basically a list of individual characters grouped together, whereas an integer is a single entity.
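Putting the three concepts together, a reusable version could look like the sketch below. The names DIGIT_WORDS and digits_to_words are made up for illustration, and it assumes a non-negative integer as input.

```python
DIGIT_WORDS = {
    '0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
    '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine',
}


def digits_to_words(number):
    """Return the word for each digit of a non-negative integer, in order."""
    # str() turns the integer into a sequence of digit characters we can loop over.
    return [DIGIT_WORDS[digit] for digit in str(number)]


print(digits_to_words(24))   # ['two', 'four']
print(digits_to_words(907))  # ['nine', 'zero', 'seven']
```

A negative number would raise a KeyError on the minus sign, so handle the sign separately if you need it.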
