Efficient way to generate all possibilities of string from characters [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last year.
I am trying to randomly generate a string of n length from 5 characters ('ATGC '). I am currently using itertools.product, but it is incredibly slow. I switched to itertools.combinations_with_replacement, but it skips some values. Is there a faster way of doing this? For my application order does matter.
for error in itertools.product('ATGC ', repeat=len(errorPos)):
    print(error)
    for ps in error:
        for pos in errorPos:
            if ps == " ":
                fseqL[pos] = ""
            else:
                fseqL[pos] = ps

If you just want a single random sequence:
import random
def generate_DNA(N):
    possible_bases = 'ACGT'
    return ''.join(random.choice(possible_bases) for i in range(N))
one_hundred_bp_sequence = generate_DNA(100)
That was posted before the question clarified that spaces are needed; you can add a space to possible_bases if spaces should be allowed.
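As a minimal sketch of that change, the variant below draws from an alphabet that includes the space. The function name generate_sequence and its alphabet and seed parameters are made up for illustration; random.choices (Python 3.6+) picks all n characters in one call.

```python
import random

def generate_sequence(n, alphabet="ACGT ", seed=None):
    """Return one random string of length n drawn from alphabet.

    The default alphabet adds a space to the four bases; seed is an
    optional, hypothetical convenience for reproducible output.
    """
    rng = random.Random(seed)
    return "".join(rng.choices(alphabet, k=n))

one_hundred_chars = generate_sequence(100)
```

Passing the same seed twice gives the same sequence, which can help when debugging.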
If you want all combinations, with spaces allowed too, here is a solution adapted from this answer, which I learned of from the Biostars post 'all possible sequences from consensus':
from itertools import product
def all_possibilities_w_space(seq):
    """Return a list of all possible sequences given a completely ambiguous DNA input. Spaces are allowed."""
    d = {"N": "ACGT "}
    return list(map("".join, product(*map(d.get, seq))))
all_possibilities_w_space("N" * 2)  # example of length two
The idea is that N can be any of "ACGT " and the multiplier sets the length. According to the answer I adapted this from, using map keeps the looping in C, which should make it faster.
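One caveat worth spelling out: with 5 characters there are 5**n strings of length n, so any approach that visits them all slows down quickly as n grows. The sketch below (the helper names iter_sequences and count_sequences are hypothetical) keeps the generation lazy instead of building a list, so memory stays flat and you can stop early.

```python
from itertools import islice, product

ALPHABET = "ACGT "  # four bases plus a space, as in the question

def iter_sequences(n):
    """Lazily yield every length-n string over ALPHABET, in product order."""
    return map("".join, product(ALPHABET, repeat=n))

def count_sequences(n):
    """Number of strings iter_sequences(n) will yield."""
    return len(ALPHABET) ** n

# Peek at the first few results without generating the rest.
first_three = list(islice(iter_sequences(2), 3))
```

If the full enumeration is still too slow, the count itself is the limit; the only real fix is to avoid visiting every possibility.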

Related

Explain the meaning of ' i ' in the following code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 10 months ago.
import random
for i in range(5):
    print(random.randint(1, 10))
Is it the number of integers that we want to print? But we didn't specify that it's the number of integers in the code, so how does python understand?
The Python for construct requires a variable name between for and in. Conventional practice is to use _ (underscore) as the variable in cases where a variable is required but not actually used/relevant. Note that _ is a valid variable name.
for i in range(5):
    do this action
is the Python way of saying
"for each element in range(5),
do this action".
range(5) can be replaced by any iterable collection.
In this example the variable i is not used. We might write
for i in range(5):
    print(i)
which would print out all the values produced by range(5).
As you guessed, the 5 determines how many random integers are printed; i itself is just the loop variable, which takes each value of the range in turn. That is because, in Python, range(5) produces a sequence of integers.
If only one argument is specified, Python starts at zero and increments by one, stopping at the number one below the specified argument.
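A short sketch that puts these points together: range(5) yields the integers 0 through 4, so the loop body runs five times, and the underscore convention signals that the loop variable is unused. The name rolls is only for illustration.

```python
import random

# range(5) produces 0, 1, 2, 3, 4: five values, so five iterations.
values = list(range(5))

# The loop variable is not needed here, so it is named _ by convention.
rolls = []
for _ in range(5):
    rolls.append(random.randint(1, 10))

print(values)
print(rolls)
```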

How do I check if a string contains ANY element in an array [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I am trying to detect if a string contains any element in an array. I want to know if the string (msg) has any element from the array (prefixes) in it.
I want this because I want to make a Discord bot with multiple prefixes. Here's my broken if statement:
if msg.startswith(prefixes[any]):
The existing answers show two ways of doing a linear search, and this is probably your best choice.
If you need something more scalable (i.e., you have a lot of potential prefixes, they're very long, and/or you need to scan very frequently) then you could write a prefix tree. That's the canonical search structure for this problem, but it's obviously a lot more work and you still need to profile to see if it's really worthwhile for your data.
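For reference, a prefix tree can be sketched with nested dictionaries as below. This is only an illustration under simple assumptions (the class name PrefixTrie and its methods are made up here), and for a handful of short prefixes the linear search will almost certainly be faster in practice.

```python
_END = object()  # sentinel key marking "a stored prefix ends at this node"

class PrefixTrie:
    """Minimal prefix tree: one dict per node, keyed by character."""

    def __init__(self, prefixes=()):
        self.root = {}
        for prefix in prefixes:
            self.add(prefix)

    def add(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[_END] = True

    def matching_prefix(self, text):
        """Return the shortest stored prefix that text starts with, or None."""
        node = self.root
        for i, ch in enumerate(text):
            if _END in node:
                return text[:i]
            node = node.get(ch)
            if node is None:
                return None
        return text if _END in node else None

trie = PrefixTrie(["!", "?", "bot "])
print(trie.matching_prefix("!help"))
```

The lookup walks the message one character at a time, so its cost depends on the length of the matched prefix rather than on how many prefixes are stored.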
Try something like this:
prefixes = ('a', 'b', 'i')
if msg.startswith(prefixes):
    ...  # handle the command here
The prefixes must be a tuple because the startswith method does not accept a list as a parameter.
There are dedicated algorithms for such a search; however, a functional implementation in Python may look like this:
prefixes = ['foo', 'bar']
string = 'foobar'
result = any(map(lambda x: string.startswith(x), prefixes))
If you search for x at any position in string, then change string.startswith(x) to x in string.
UPDATE
According to @MisterMiyagi in the comments, the following is a more readable (and possibly more efficient) statement:
result = any(string.startswith(prefix) for prefix in prefixes)
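For a bot you usually also need to know which prefix matched so that it can be stripped from the message. A small sketch along the same lines, where split_prefix is a hypothetical helper name:

```python
def split_prefix(msg, prefixes):
    """Return (prefix, rest) for the first prefix msg starts with.

    Returns (None, msg) when no prefix matches.
    """
    for prefix in prefixes:
        if msg.startswith(prefix):
            return prefix, msg[len(prefix):]
    return None, msg

prefixes = ['?', '!']
print(split_prefix('!ping', prefixes))

# A list can also be converted on the fly for the tuple form of startswith.
has_prefix = '!ping'.startswith(tuple(prefixes))
```

Note that the first matching prefix wins, so if one prefix is the start of another, list the longer one first.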

How could I check for different attributes in a string (such as lower case letters, numbers and so on...) without using for loops? python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have been struggling with a task to check whether a password has different attributes without using a for loop, such as whether it has lower case letters, upper case letters, numbers, symbols and so on. I would really appreciate some help on the matter. I am new to recursive functions, so I would be more than pleased if someone has an idea for a solution in Python that does not involve them.
My attempt so far:
def strength(password):
    if password[1:].isnumeric:
        score = 0
    if password[1:].isalpha():
        score += 3
    if password[1:].islower():
        score += 2
    if password[1:].isupper():
        score += 2
    if password[1:].isalpha():
        score += 3
    return score
Sorry that I didn't put this earlier, I'm still a little new to the site. This code only checks whether the entire password is numeric or lowercase or so on to my understanding. How can I extend this to check for other criteria, such as containing symbols?
You can create functions to check each of those attributes (some of them already exist on Python's str object, but it's a good exercise). The any function will be your friend here:
def contains_upper(string):
    uppers = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    return any(s in uppers for s in string)
def contains_lower(string):
    lowers = 'abcdefghijklmnopqrstuvwxyz'
    return any(s in lowers for s in string)
# etc... you should implement functions to check other attributes
Now create another function to assess whether a given string passes all those tests:
def is_valid_password(string):
    if not contains_upper(string):
        return False
    if not contains_lower(string):
        return False
    # do this for all attributes you want to check
    return True
Using regular expressions:
import re
mystring = "Password1"  # example input
regexp = re.compile(r'[a-z]')
if regexp.search(mystring):
    print("lower case found")
Then do the same with [A-Z] and so on.
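Extending the regular-expression idea, the sketch below checks several character classes in one pass over a table of patterns. The names CHECKS and password_attributes and the exact definition of "symbol" (anything that is not an ASCII letter, digit or whitespace) are assumptions for illustration, and the dict comprehension stands in for an explicit for statement.

```python
import re

# One compiled pattern per attribute; the "symbol" class is an assumption.
CHECKS = {
    "lower": re.compile(r"[a-z]"),
    "upper": re.compile(r"[A-Z]"),
    "digit": re.compile(r"[0-9]"),
    "symbol": re.compile(r"[^A-Za-z0-9\s]"),
}

def password_attributes(password):
    """Map each attribute name to True if the password contains it."""
    return {name: bool(pattern.search(password))
            for name, pattern in CHECKS.items()}

print(password_attributes("Ab1!"))
```

A score could then be computed as sum(password_attributes(pw).values()), with whatever weighting the task requires.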

Newbie need Help python regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
by regex.
I have never had to deal with regex before, but this time I need it for a parsing job.
Right now I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
import re
pattern = re.compile(r'(?<=aid: ").*?(?=",)')
print(pattern.findall(s))
and
pattern = re.compile(r'(?<=cmt_id = ).*?(?=;)')
print(pattern.findall(s))
There are many different approaches to designing a suitable regular expression, depending on the range of possible inputs you are likely to encounter.
The following solves your exact question but could fail on differently styled input. You would need to provide more details, but this is a start.
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
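If the input may contain other numbers and you only want the two values, the two lookbehind patterns from the question can be merged with alternation, or the values can be captured by name. Both are sketches that assume the exact layout of the sample line; the group names aid and cmt_id are chosen here for illustration.

```python
import re

s = 'aid: "1168577519", cmt_id = 1168594403;'

# Option 1: merge the two lookbehinds with | so findall returns both numbers.
merged = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')
ids = merged.findall(s)

# Option 2: named groups make it explicit which number is which.
named = re.compile(r'aid: "(?P<aid>\d+)",\s*cmt_id = (?P<cmt_id>\d+);')
match = named.search(s)
fields = match.groupdict() if match else {}

print(ids)
print(fields)
```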

How to return the most similar word from a list of words? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
How to create a function that returns the most similar word from a list of words, even if the word is not exactly the same?
The function should have two inputs: one for the word and the other for the list. The function should return the word that is most similar to the word.
lst = ['apple','app','banana store','pear','beer']
func('apple inc.',lst)
>>'apple'
func('banana',lst)
>>'banana store'
From doing some research, it seems that I have to use the concepts of Fuzzy String Matching, NLTK, and Levenshtein-distance, which I'm having a hard time trying to implement in creating a function like this.
I should also point out that by similar, I just mean the characters and I'm not concerned for the meaning of the word at all.
Slow solution for debugging:
def func(word, lst):
    items = sorted((dist(word, w), w) for w in lst)
    # Print items here for debugging.
    if not items:
        raise ValueError('List of words is empty.')
    return items[0][1]
Or, this is faster and uses less memory:
def func(word, lst):
    return min((dist(word, w), w) for w in lst)[1]
See https://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison for implementing dist. One of the answers has a link to a Levenshtein-distance implementation.
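As one possible stand-in for dist, the sketch below uses difflib.SequenceMatcher from the standard library. Note that this is a similarity ratio turned into a distance, not Levenshtein distance, so rankings can differ from an edit-distance implementation; it is shown only because it needs no extra packages.

```python
from difflib import SequenceMatcher

def dist(a, b):
    """Dissimilarity in [0, 1]; 0 means the strings are identical.

    Based on SequenceMatcher.ratio(), not on Levenshtein distance.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def func(word, lst):
    if not lst:
        raise ValueError('List of words is empty.')
    return min(lst, key=lambda w: dist(word, w))

lst = ['apple', 'app', 'banana store', 'pear', 'beer']
print(func('apple inc.', lst))
print(func('banana', lst))
```

With this choice of dist, the two examples from the question come out as 'apple' and 'banana store'.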
