This is what I wrote...
def brute(m,pattern=None):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    spec = '##&$%*()+'
    upper = letters.upper()
    number = '1234567890'
    info = {'#':spec,'^':upper,'%':letters,'*':number}
    chars = [info.get(p,letters) for _,p in zip(range(m),pattern or letters)]
    def inner(m):
        if m:
            for l in chars[~m]:
                for j in inner(m-1):
                    yield(l+j)
        else:
            for l in chars[~m]:
                yield l
    for i in inner(m-1):
        print(i)
I want to know how to write a tool similar to crunch in kali...
I would be grateful if you could implement it in Python.
And why is my code so slow even when I write the output to file??
How to make it faster??
Here is an itertools based approach which might do what you want:
import itertools, string
def brute(m,pattern=None):
    if pattern is None:
        pattern = '%'*m
    letters = string.ascii_lowercase
    upper = string.ascii_uppercase
    spec = '##&$%*()+'
    number = '1234567890'
    info = {'#':spec,'^':upper,'%':letters,'*':number}
    chars = [info.get(d,letters) for d in pattern]
    return [''.join(p) for p in itertools.product(*chars)]
For example, words = brute(6,'#%%*#^') takes about 2 seconds to evaluate to a list of 14236560 words.
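If the goal is to write the words to a file rather than keep them all in memory, a lazy variant may help. The following is only a sketch under assumptions: the names `CHARSETS`, `iter_words` and `write_words` are made up for illustration, and the placeholder table simply copies the one from the answer above.

```python
import itertools
import string

# Placeholder-to-alphabet table copied from the answer above
# (the name CHARSETS is hypothetical).
CHARSETS = {
    '#': '##&$%*()+',
    '^': string.ascii_uppercase,
    '%': string.ascii_lowercase,
    '*': '1234567890',
}

def iter_words(pattern):
    """Lazily yield every word matching the pattern, one at a time."""
    pools = [CHARSETS.get(ch, string.ascii_lowercase) for ch in pattern]
    for combo in itertools.product(*pools):
        yield ''.join(combo)

def write_words(pattern, path):
    """Stream the words to a file without building a list first."""
    with open(path, 'w') as fh:
        fh.writelines(word + '\n' for word in iter_words(pattern))
```

Something like `write_words('#%%*#^', 'words.txt')` would then stream the output to disk. Note that `spec` contains `#` twice, so some words are produced twice; the 14236560 figure above includes those duplicates.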
Yes, I know the replace method is not what I need, but I don't know how else to do it. The script has to reveal the letters of a word gradually. Below the code I've put the output I want.
import random
a = "hello"
x = "_" * len(a)
c = x.replace(x[random.randint(0, len(a) -1 )], a[random.randint(0, len(a) - 1)])
print(c)
The output I want is something like:
_____
2seconds later
__ll_
2sl...
h_ll_
2....
hell_
2...
hello
You can do it various ways: by replacing, by indexing, by regex ...
A simple implementation using a regex that reads from a whitelist:
from random import shuffle
from time import sleep
from re import sub
word = "hello"
sleep_time = 2
mask_char = '_'
char_list = list(word)
shuffle(char_list) # Mix the order of the letters.
# Make the letters from the list unique,
# then make it a list again for indexing later:
char_list = list(set(char_list))
whitelist = ""
# Print the full mask:
print(mask_char*len(word))
sleep(sleep_time)
for ch in char_list:
    whitelist += ch
    print(sub(fr"[^{whitelist}]", mask_char, word))
    if ch != char_list[-1]:
        sleep(sleep_time)
You'd need the time module for the sleep function. Then it's just a matter of collecting the distinct letters of the string a, shuffling them, and looping through them; on each iteration you loop over a and reveal that letter in x wherever it occurs.
I made x a list because strings are immutable in Python.
from random import sample
from time import sleep
a = "hello"
x = ["_" for _ in a]
letters = frozenset(a)
# sample() needs a sequence (sets are rejected since Python 3.11), hence sorted().
for letter in sample(sorted(letters), len(letters)):
    print(''.join(x))
    for i, replace in enumerate(a):
        if replace == letter:
            x[i] = letter
    sleep(2)
print(''.join(x))
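To separate the reveal logic from the timing, one option is to compute the successive strings first and print them afterwards. This is only a sketch; `reveal_frames` and its parameters are hypothetical names, not part of either answer.

```python
import random

def reveal_frames(word, mask_char='_', rng=random):
    """Return the successive strings shown while revealing word letter by letter."""
    letters = sorted(set(word))   # distinct letters, as a list
    rng.shuffle(letters)          # random reveal order
    shown = set()
    frames = [mask_char * len(word)]   # start fully masked
    for letter in letters:
        shown.add(letter)
        frames.append(''.join(c if c in shown else mask_char for c in word))
    return frames
```

Printing each frame with `sleep(2)` between them would give the effect asked for in the question.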
So I'm trying to make a python script that takes a pattern (ex: c**l) where it'll return every iteration of the string (* = any character in the alphabet)...
So, we get something like: caal, cbal, ccal and so forth.
I've tried using the itertools library's product but I haven't been able to make it work properly. So after 2 hours I've decided to turn to Stack Overflow.
Here's my current code. It's not complete since I feel stuck
alphabet = list('abcdefghijklmnopqrstuvwxyz')
wildChar = False
tmp_string = ""
combinations = []
if '*' in pattern:
    wildChar = True
    tmp_string = pattern.replace('*', '', pattern.count('*')+1)
if wildChar:
    tmp = []
    for _ in range(pattern.count('*')):
        tmp.append(list(product(tmp_string, alphabet)))
    for array in tmp:
        for instance in array:
            combinations.append("".join(instance))
    tmp = []
print(combinations)
You could try:
from itertools import product
from string import ascii_lowercase
pattern = "c**l"
repeat = pattern.count("*")
pattern = pattern.replace("*", "{}")
for letters in product(ascii_lowercase, repeat=repeat):
    print(pattern.format(*letters))
Result:
caal
cabl
cacl
...
czxl
czyl
czzl
Use itertools.product
import itertools
import string
s = 'c**l'
l = [c if c != '*' else string.ascii_lowercase for c in s]
out = [''.join(c) for c in itertools.product(*l)]
Output:
>>> out
['caal',
'cabl',
'cacl',
'cadl',
'cael',
'cafl',
'cagl',
'cahl',
'cail',
'cajl'
...
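For long patterns the full list can get large (26 to the power of the number of wildcards), so a generator may be preferable. A minimal sketch, assuming a hypothetical helper named `expand` whose `wildcard` and `alphabet` parameters are introduced here for illustration:

```python
import itertools
import string

def expand(pattern, wildcard='*', alphabet=string.ascii_lowercase):
    """Lazily yield every string obtained by filling each wildcard from alphabet."""
    # A literal character becomes a one-element pool; a wildcard becomes the alphabet.
    pools = [alphabet if ch == wildcard else ch for ch in pattern]
    for combo in itertools.product(*pools):
        yield ''.join(combo)
```

`for word in expand('c**l'): print(word)` then prints the words one at a time without holding them all in memory.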
What I want is to generate a string in this specific format: l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l With each l and d being a different string or number.
The issue is when I try to generate, the whole thing is the same value/string. But I want it different.
Here is an example:
What I am getting:
lll9999l9llll9999l9llll9999l9llll9999l9l
What I need:
bfb7491w3anfr4530x2zzbg9891u2rbep8421m9s
def id_gen():
    l = random.choice(string.ascii_lowercase)
    d = random.choice(string.digits)
    id = l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l+l+l+l+d+d+d+d+l+d+l
    print(id)
The result:
lll9999l9llll9999l9llll9999l9llll9999l9l
I need this to generate something different :)
This seems to work for me:
def gen_id():
    pattern = 'lllddddldllllddddldllllddddldllllddddldl'
    digits = [random.choice(string.digits) for i in range(len(pattern))]
    letters = [random.choice(string.ascii_lowercase) for i in range(len(pattern))]
    return ''.join(digits[i] if pattern[i] == 'd' else letters[i] for i in range(len(pattern)))
testing:
>>> gen_id()
'lnx1066k0hnrd5409d1nhgo1254t6rzyw5165f8v'
>>> gen_id()
'sbc7119f4ythd8845i1afay1900f4wjcv0659b4e'
>>> gen_id()
'yan6228r0nebj5097y7jnwh7065s7osra0391j5f'
>>>
seems different enough... please, don't forget to import string, random =)
To avoid drawing more random values than are actually used, IMHO this is the best solution:
def gen_id(pattern):
    l = len(pattern)
    d = pattern.count('d')
    # The sample size must be passed as k=...; the second positional argument of choices() is weights.
    digits = random.choices(string.digits, k=d)
    letters = random.choices(string.ascii_lowercase, k=l-d)
    return ''.join(digits.pop() if pattern[i] == 'd' else letters.pop() for i in range(l))
You can use this to get a random combination of letters and digits in the desired order:
def letter():
    return random.choice(string.ascii_lowercase)
def digit():
    return random.choice(string.digits)
def id_gen():
    return letter() + digit() + letter() + letter() # ldll
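The same idea can be driven directly by a pattern string, drawing a fresh character for every position. This is a sketch under assumptions: the `POOLS` table and the `rng` parameter are hypothetical names introduced here.

```python
import random
import string

# Which pool each pattern symbol draws from (hypothetical table).
POOLS = {'l': string.ascii_lowercase, 'd': string.digits}

def gen_id(pattern, rng=random):
    """Draw one fresh random character per pattern position."""
    return ''.join(rng.choice(POOLS[symbol]) for symbol in pattern)
```

Calling `gen_id('lllddddldl' * 4)` should return a 40-character ID in the format from the question, with a new random choice made at each position.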
I'm having trouble with a script that replaces normal letters with special characters to test a translation system. For example, cha-mate is chá-mate, but it would also be tested with chã-mate, chã-máte and other variations. Instead of creating these variations, the script switches every occurrence of the same character to a single special letter. Here's what it prints:
chá-máte
chã-mãte
Here's what should print in theory:
cha-máte
cha-mãte
chá-mate
chã-mate
etc.
Here's the code and the json utilized:
def translation_tester(word):
    esp_chars = {
        'a': 'áã',
    }
    #words = [word]
    for esp_char in esp_chars:
        if esp_char in word:
            replacement_chars = esp_chars[esp_char]
            for i in range(len(replacement_chars)):
                print(word.replace(esp_char, replacement_chars[i]))
def main():
    words = ['cha-mate']
    for word in words:
        translation_tester(word)
main()
Anyway, any help is appreciated, thanks in advance!
To handle arbitrary number of replacements, you need to use recursion. This is how I did it.
intword = 'cha-mate'
esp_chars = {'a': 'áã'}
def wpermute(word, i=0):
    for idx, c in enumerate(word[i:], i):
        if c in esp_chars:
            for s in esp_chars[c]:
                newword = word[0:idx] + s + word[idx + 1:]
                wpermute(newword, idx + 1)
        if idx == len(word) -1:
            print(word)
wpermute(intword)
which gives the output of 9 different ways the word can be written.
chá-máte
chá-mãte
chá-mate
chã-máte
chã-mãte
chã-mate
cha-máte
cha-mãte
cha-mate
There might be a nicer way to do this, but you can do the following (making sure to include the plain 'a' in the list of replacement chars):
import itertools
import re
def replace_at_indices(word, new_chars, indices):
    new_word = word
    for i, index in enumerate(indices):
        new_word = new_word[:index] + new_chars[i] + new_word[index+1:]
    return new_word
def translation_tester(word):
    esp_chars = {
        'a': 'aáã',
    }
    for esp_char in esp_chars:
        replacement_chars = list(esp_chars[esp_char])
        indices = [m.start() for m in re.finditer(esp_char, word)]
        product = list(itertools.product(replacement_chars, repeat=len(indices)))
        for p in product:
            new_word = replace_at_indices(word, p, indices)
            print(new_word)
def main():
    words = ['cha-mate']
    for word in words:
        translation_tester(word)
main()
For your example, this should give you:
cha-mate
cha-máte
cha-mãte
chá-mate
chá-máte
chá-mãte
chã-mate
chã-máte
chã-mãte
See also:
Find all occurrences of a substring in Python
generating permutations with repetitions in python
Replacing a character from a certain index
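When several different base letters have variants, it may be simpler to build one pool of options per character and take the product of all pools. A minimal sketch, assuming a hypothetical helper `variants` whose `alternatives` mapping lists only the special forms (the plain letter is added automatically):

```python
import itertools

def variants(word, alternatives):
    """Return every spelling of word, choosing per character among itself and its alternatives."""
    # Each position offers the original character plus any listed alternatives.
    pools = [ch + alternatives.get(ch, '') for ch in word]
    return [''.join(combo) for combo in itertools.product(*pools)]
```

For the example word, `variants('cha-mate', {'a': 'áã'})` should produce the same nine spellings as listed above.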
I have string of some length consisting of only 4 characters which are 'A,T,G and C'. I have pattern 'GAATTC' present multiple times in the given string. I have to cut the string at intervals where this pattern is..
For example for a string, 'ATCGAATTCATA', I should get output of
string one - ATCGA
string two - ATTCATA
I am a newbie in Python, but I have come up with the following (incomplete) code:
seq = seq.upper()
str1 = "GAATTC"
seqlen = len(seq)
seq = list(seq)
for i in range(0,seqlen-1):
    site = seq.find(str1)
    print(site[0:(i+2)])
Any help would be really appreciated.
First, let's develop your idea of using find, so you can spot your mistakes.
seq = 'ATCGAATTCATAATCGAATTCATAATCGAATTCATA'
seq = seq.upper()
pattern = "GAATTC"
split_at = 2
seqlen = len(seq)
i = 0
while i < seqlen:
    site = seq.find(pattern, i)
    if site != -1:
        print(seq[i: site + split_at])
        i = site + split_at
    else:
        print(seq[i:])
        break
Python strings also have a replace method that substitutes fragments of a string directly. The snippet below uses it to insert a separator at each cut site and then splits on that separator:
seq = 'ATCGAATTCATAATCGAATTCATAATCGAATTCATA'
seq = seq.upper()
pattern = "GA","ATTC"
pattern1 = ''.join(pattern) # 'GAATTC'
pattern2 = ' '.join(pattern) # 'GA ATTC'
split_seq = seq.replace(pattern1, pattern2) # 'ATCGA ATTCATAATCGA ATTCATAATCGA ATTCATA'
print(split_seq.split())
I believe it is more intuitive and should be faster than a regular expression (which might have lower performance, depending on library and usage).
Here is a simple solution :
seq = 'ATCGAATTCATA'
seq_split = seq.upper().split('GAATTC')
result = [
    (seq_split[i] + 'GA') if i % 2 == 0 else ('ATTC' + seq_split[i])
    for i in range(len(seq_split)) if len(seq_split[i]) > 0
]
Result :
print(result)
['ATCGA', 'ATTCATA']
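The even/odd test above fits a sequence with a single site; with several sites, every middle piece needs both the 'ATTC' prefix and the 'GA' suffix. A sketch of one way to generalise it, assuming a hypothetical function `cut` with `site` and `offset` parameters introduced for illustration:

```python
def cut(seq, site='GAATTC', offset=2):
    """Split seq at every occurrence of site, cutting offset characters into the site."""
    pieces = seq.upper().split(site)
    left, right = site[:offset], site[offset:]   # 'GA' and 'ATTC' by default
    fragments = []
    for i, piece in enumerate(pieces):
        if i > 0:                    # a site preceded this piece
            piece = right + piece
        if i < len(pieces) - 1:      # a site follows this piece
            piece = piece + left
        fragments.append(piece)
    return fragments
```

Joining the returned fragments gives back the original (upper-cased) sequence, which is a convenient sanity check.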
BioPython has a restriction enzyme package to do exactly what you're asking.
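from Bio.Restriction import *
from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA # Biopython < 1.78 only; Bio.Alphabet was removed in 1.78, where Seq(test) is used on its own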
print(EcoRI.site) # You will see that this is the enzyme you listed above
test = 'ATCGAATTCATA'.upper() # This is the sequence you want to search
my_seq = Seq(test, IUPACAmbiguousDNA()) # Create a biopython Seq object with our sequence
cut_sites = EcoRI.search(my_seq)
cut_sites contains a list of exactly where to cut the input sequence (such that GA is in the left sequence and ATTC is in the right sequence).
You can then split the sequence into contigs using:
cut_sites = [0] + cut_sites # We add a leading zero so this works for the first
# contig. This might not always be needed.
contigs = [test[i:j] for i,j in zip(cut_sites, cut_sites[1:]+[None])]
You can see this page for more details about BioPython.
My code is a bit sloppy, but you could try something like this when you want to iterate over multiple occurrences of the string:
def split_strings(seq):
    string1 = seq[:seq.find(str1) +2]
    string2 = seq[seq.find(str1) +2:]
    return string1, string2
test = 'ATCGAATTCATA'.upper()
str1 = 'GAATTC'
seq = test
while str1 in seq:
    string1, seq = split_strings(seq)
    print(string1)
print(seq)
Here's a solution using the regular expression module:
import re
seq = 'ATCGAATTCATA'
restriction_site = re.compile('GAATTC')
subseq_start = 0
for match in restriction_site.finditer(seq):
    print(seq[subseq_start:match.start()+2])
    subseq_start = match.start()+2
print(seq[subseq_start:])
Output:
ATCGA
ATTCATA
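On Python 3.7 or later, `re.split` can also split on a zero-width match, so the cut point can be described with a lookbehind and a lookahead. This is a sketch rather than part of the answer above; the names `CUT_POINT` and `cut_at_sites` are made up here.

```python
import re

# Zero-width position with 'GA' immediately before and 'ATTC' immediately after.
# Splitting on an empty match requires Python 3.7 or later.
CUT_POINT = re.compile(r'(?<=GA)(?=ATTC)')

def cut_at_sites(seq):
    """Return the fragments of seq cut between GA and ATTC at every GAATTC site."""
    return CUT_POINT.split(seq.upper())
```

Because the pattern consumes no characters, nothing is removed from the sequence and the fragments concatenate back to the input.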