Generate wordlist with known characters - python

I'm looking to write a piece of code in Javascript or Python that generates a wordlist file out of a pre-defined combination of characters.
E.g.
input = abc
output =
ABC
abc
Abc
aBc
abC
AbC
ABc
aBC
I have very basic knowledge of either so all help is appreciated.
Thank you

I'll assume that you're able to import Python packages. Therefore, take a look at itertools.product:
This tool computes the cartesian product of input iterables.
For example, product(A, B) returns the same as ((x,y) for x in A for y in B).
It looks quite like what you're looking for, right? That's every possible combination from two different lists.
Since you're new to Python, I'll assume you don't know what a map is. Nothing too hard to understand:
Returns a list of the results after applying the given function to each item of a given iterable (list, tuple etc.)
That's easy! So the first parameter is the function you want to apply and the second one is your iterable.
The function I applied in the map is as follows:
''.join
This way you set '' as your separator (basically no separator at all) and put together every character with .join.
Why would you want to put together the characters? Well, you'll have a list (a lot of them in fact) and you want a string, so you better put those chars together in each list.
Now here comes the hard part, the iterable inside the map:
itertools.product(*((char.upper(), char.lower()) for char in string))
First of all notice that * is the so-called splat operator in this situation. It splits the sequence into separate arguments for the function call.
Now that you know that, let's dive into the code.
Your (A, B) for itertools.product(A, B) are now (char.upper(), char.lower()). That's both versions of char, uppercase and lowercase. And what's char? It's an auxiliary variable that takes the value of each character in the given string, one at a time.
Therefore for input 'abc' char will take values a, b and c while in the loop, but since you're asking for every possible combination of uppercase and lowercase char you'll get exactly what you asked for.
I hope I made everything clear enough. :)
Let me know if you need any further clarification in the comments. Here's a working function based on my previous explanation:
import itertools
def func():
    string = input("Introduce some characters: ")
    output = map(''.join, itertools.product(*((char.upper(), char.lower()) for char in string)))
    print(list(output))
As an additional note, if you printed output directly you would only see a map object rather than the words themselves; you have to turn the map into a list for its contents to be printable.
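Since the question asks for a wordlist file, here is a minimal sketch that builds on the same itertools.product idea and writes one word per line. The names case_variants and write_wordlist are made up for this example, and the de-duplication step is my own addition: for characters without a case (digits, punctuation) upper() and lower() return the same thing, so the plain product would emit repeated words.

```python
import io
import itertools

def case_variants(text):
    """Yield every upper/lower-case combination of text, without duplicates."""
    # dict.fromkeys keeps order and drops the repeat when a character has
    # no case (e.g. a digit, where upper() == lower()).
    options = [tuple(dict.fromkeys((ch.upper(), ch.lower()))) for ch in text]
    for combo in itertools.product(*options):
        yield "".join(combo)

def write_wordlist(text, stream):
    """Write one variant per line to an open text stream; return the count."""
    count = 0
    for word in case_variants(text):
        stream.write(word + "\n")
        count += 1
    return count

# StringIO stands in for a file here; with a real file you would use
# something like: with open("wordlist.txt", "w") as f: write_wordlist("abc", f)
buffer = io.StringIO()
total = write_wordlist("abc", buffer)
```

Because the words are generated lazily and written as they come, the whole list never has to sit in memory, which matters once the input gets longer (the count doubles with every cased character).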

A simple approach using generators, and no library code. It returns a generator (iterator-like object), but can be converted to a list easily.
def lU(s):
    if not s:
        yield ''
    else:
        for sfx in lU(s[1:]):
            yield s[0].upper() + sfx
            yield s[0].lower() + sfx
print(list(lU("abc")))
Note that all the sub-lists of suffixes are not fully expanded, but the number of generator objects (each a constant size) that get generated is proportional to the length of the string.

More efficient way to iterate through n-nested for loops in Python

I'm currently working on a "dehashing" script that lets user type in an input, together with the hashing method, and the script iterates through a list of characters, builds strings of different lengths, and tries to check if any of the character combinations (of lengths 1-8), hashed, are equal to the input provided by the user.
For example, the user provides the hashed version of 'password', and the algorithm takes all the possibilities, starting from length 1:
Length 1: a, b, c, d, ..., z
Length 2: aa, ab, ac, ..., zz
Length 3: aaa, aab, aac, ..., zzz
and so on, until it reaches Length 8 (including it).
It hashes all the possibilities, one by one, and it checks if they are equal to the user's input. If so, the program outputs the unhashed string and stops the searching.
I first thought about using one for() loop for length 1, two nested for() loops for length 2, and so on, but that would mean copying and pasting too much of the same code, so I Googled for other options and found out that I can use itertools.
This is how I'm generating my n-nested for() loops:
import itertools
chars = "abcdefghijklmnopqrstuvwxyz"
ranges = []
for i in range(0, length):
    ranges.append(range(0, len(chars)))
for xs in itertools.product(*ranges):
    # build the string here, hash it and check if it matches the user's input
    pass
I'm not providing the full implementation, because there's more than just checking (writing into files if something is found, outputting stuff, etc.).
The idea is, I realized that this algorithm works pretty well for lengths 1-4. Strings with length 1, 2 or 3 are found in less than a second, while strings with length 4 can also require several minutes.
I also "improved" the searching, using multiprocessing and searching for groups of two lengths per process.
The problem is, the algorithm is still not efficient enough. If I want to search for a string of length 5, for example, I'll have to wait hours, and I'm pretty sure there's a more efficient way of implementing what I did.
Also tested the execution time of n-nested normal for() loops vs this type of itertool implementation, and found out that the for() loops are 2x faster. Shouldn't it have been exactly the reverse?
Do you have any advice on how to improve my algorithm?
You can use chars directly as the iterable for itertools.product. In addition, product accepts an optional argument repeat if you want the product of an iterable with itself. Refer to the documentation.
product generates tuples. To get a string out of a tuple of strings, use ''.join().
from itertools import product
def find_password(hashed, length, chars = "abcdefghijklmnopqrstuvwxyz"):
    for p in product(chars, repeat=length):
        if hash(''.join(p)) == hashed:
            return ''.join(p)
    return None
password = 'aaabc'
print( find_password(hash(password), len(password)) )
# aaabc
Additionally, you could use from string import ascii_lowercase instead of hardcoding your own alphabet:
from string import ascii_lowercase
print(ascii_lowercase)
# abcdefghijklmnopqrstuvwxyz
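Note that find_password above uses Python's built-in hash(), which is not a cryptographic digest and, for strings, is normally randomized per process, so it only works as a stand-in within a single run. As a rough sketch of how the same loop might look with a real digest, here is a version using hashlib that also walks through all lengths from 1 up to a maximum. The name find_preimage and its parameters are made up for this example, and SHA-256 is just an assumed choice of hashing method.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def find_preimage(target_hex, max_length, chars=ascii_lowercase, algorithm="sha256"):
    """Return the first candidate whose hex digest equals target_hex, else None."""
    for length in range(1, max_length + 1):
        # product(chars, repeat=length) replaces `length` nested for loops.
        for candidate in product(chars, repeat=length):
            word = "".join(candidate)
            digest = hashlib.new(algorithm, word.encode("utf-8")).hexdigest()
            if digest == target_hex:
                return word
    return None

# Small demonstration: recover "abc" from its SHA-256 digest.
target = hashlib.sha256(b"abc").hexdigest()
found = find_preimage(target, max_length=3)
```

This tidies the loop, but it does not change the size of the search: with 26 characters there are 26**5 = 11,881,376 candidates of length 5 and 26**8 = 208,827,064,576 of length 8, so the exponential growth dominates whichever loop construct you pick.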

Is there a way to create a list of characters, ord() them to integers, and hex() them to a hex code?

I am a beginner programmer using Python, and I am trying to create encryption software (Beginner).
I am looking for something like:
Input: Apple -> A, P, P, L, E -> ord() -> 97,"","","","" -> hex() -> 0x16, "","" ,"" ,""
However, I cannot find a way to translate my characters to integers while accounting for an unknown amount of characters in input.
Sentence = list(input("Enter"))
print(Sentence)
ord_sentence = []
for each in range(len(Sentence)):
    ord_sentence.append(ord(Sentence[]))
This then doesn't work because the argument at the end of Sentence is empty, but I don't know how to make it fill with each individual character. I could try
...
...
while len(ord_sentence) <= len(Sentence)
    ord_sentence.append(ord(sentence[0]))
    ord_sentence.append(ord(sentence[1]))
    ##Continues on to 1000 to account for unknown input##
But then I run into an IndexError when the input isn't exactly 1000 characters long, and putting in something like:
...
    ord_sentence.append(ord(sentence[0]))
    if IndexError:
        print(ord_sentence)
        break
only results in it printing the first number of the sequence and then breaking.
Any help would be greatly appreciated! Thank you!!!
I think you need to read about how loops work again. When you iterate over something, the value gets assigned to a variable. In your code, that's each. You never use that variable for anything, but I think it's what you're looking for.
for each in range(len(Sentence)):
    ord_sentence.append(ord(Sentence[each]))
Iterating over a range and indexing as you're doing here works, but it's not as direct as just iterating on the list directly. You could instead do:
for each in Sentence: # no range, len, each is a character
    ord_sentence.append(ord(each)) # take its ord() directly
Or you could use a list comprehension to build a new list from an old one directly, without a separate loop and a bunch of append calls:
ord_sentence = [ord(each) for each in Sentence]
While each is the name you've been using in your code, it is better practice to give your variables a more specific name that tells you what the value means. In the first version here, where you're iterating over a range, I'd use index, since that's what the number you get is (an index into the list). For the other two, value or character might make more sense, since the value is a single character from the Sentence list. Speaking of that list, its name is a little misleading, as I'd expect a sentence to be a string, or maybe a list of words, not a list of characters (that might have come from more or less than one sentence, e.g. ['f', 'o', 'o'] or ['F', 'o', 'o', '.', ' ', 'B', 'a', 'r', '.']).
Don't use a while loop for this. If possible, you should avoid using indexes; they are a frequent source of small bugs and can make the code hard to read. Python makes it very easy to loop directly over values. You should have a really good reason to use:
for each in range(len(Sentence)):
instead of:
for a_char in Sentence:
    # use a_char here
which will give you each character in turn.
Or use a comprehension, which will do the same and create a list at the same time. These are central to Python.
[a_char for a_char in s]
Together with join and your hex() and ord() functions, this becomes very succinct. You can compose functions like hex(ord('A')). With that, you can make a comprehension that will handle strings of whatever length you pass:
s = "APPLE"
codedList = [hex(ord(c)) for c in s]
# ['0x41', '0x50', '0x50', '0x4c', '0x45']
# ... or:
codedstring = "".join(hex(ord(c)) for c in s)
# '0x410x500x500x4c0x45'
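If you later need to get the text back, keeping the codes as a list (rather than one joined string) makes the reverse step straightforward. A small sketch, with function names that are just placeholders for this example:

```python
def to_hex_codes(text):
    """Turn each character into its code point, formatted as a hex string."""
    return [hex(ord(ch)) for ch in text]

def from_hex_codes(codes):
    """Reverse to_hex_codes: parse each hex string and map it back to a character."""
    # int(code, 16) accepts the '0x' prefix that hex() produces.
    return "".join(chr(int(code, 16)) for code in codes)

codes = to_hex_codes("APPLE")
restored = from_hex_codes(codes)
```

Keep in mind this is only an encoding, not encryption: anyone can reverse it without a key.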
This does what you asked for:
s = 'APPLE'
l = list(s)
h = list(map(lambda x: '0x%x' % ord(x), l))
print(h)

Comparing and Combining List Items in Python

I'm working on Advent of Code: Day 2, and I'm having trouble working with lists. My code takes a string, for example 2x3x4, and splits it into a list. Then it checks for an 'x' in the list, removes them, and feeds the values to a method that calculates the area needed. The problem is that before it removes the 'x's I need to find out if there are two digits before the 'x' and combine them, to account for double-digit numbers. I've looked into regular expressions but I don't think I've been using them right. Any ideas?
def CalcAreaBox(l, w, h):
    totalArea = (2*(l*w)) + (2*(w*h))+ (2*(h*l))
    extra = l * w
    toOrder = totalArea + extra
    print(toOrder)
def ProcessString(dimStr):
    #separate chars into a list
    dimStrList = list(dimStr)
    #How to deal with double digit nums?
    #remove any x
    for i in dimStrList:
        if i == 'x':
            dimStrList.remove(i)
    #Feed the list to CalcAreaBox
    CalcAreaBox(int(dimStrList[0]), int(dimStrList[1]), int(dimStrList[2]))
dimStr = "2x3x4"
ProcessString(dimStr)
You could use split on your string:
#remove any x and put in list of ints
dims = [int(dim) for dim in dimStr.split('x')]
#Feed the list to CalcAreaBox
CalcAreaBox(dims[0], dims[1], dims[2])
Of course, you will want to consider handling the cases where there are not exactly two x's in the string.
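One possible way to handle malformed input is to validate the number of parts before converting them. This is only a sketch; parse_dims is a name invented for this example, and raising ValueError is just one reasonable choice.

```python
def parse_dims(dim_str):
    """Split a string like '2x3x4' into a tuple of three ints."""
    parts = dim_str.strip().split("x")
    if len(parts) != 3:
        raise ValueError("expected LxWxH, got %r" % dim_str)
    # int() itself raises ValueError for empty or non-numeric parts.
    return tuple(int(part) for part in parts)

length, width, height = parse_dims("10x12x3")
```

Because split works on the whole string rather than on single characters, multi-digit numbers such as 10 or 12 need no special treatment.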
Your question is more likely to fit on Code Review and not Stack Overflow.
As your task is a little challenge, I would not tell you an exact solution, but give you a hint towards the split method of Python strings (see the documentation).
Additionally, you should check the style of your code against the recommendation in PEP8, e.g. Python usually has function/variable names in all lowercase letters, words separated by underscores (like calc_area_box).

Display the number of lower case letters in a string

This is what I have so far:
count=0
mystring=input("enter")
for ch in mystring:
    if mystring.lower():
        count+=1
print(count)
I figured out how to make a program that displays the number of lower case letters in a string, but it requires that I list each letter individually: if ch=='a' or ch=='b' or ch=='c', etc. I am trying to figure out how to use a command to do so.
This sounds like homework! Anyway, this is a fun way of doing it:
#the operator module contains functions that can be used like
#their operator counterparts. The eq function works like the
#'==' operator; it takes two arguments and tests them for equality.
from operator import eq
#I want to give a warning about the input function. In python2
#the equivalent function is called raw_input. python2's input
#function is very different, and in this case would require you
#to add quotes around strings. I mention this in case you have
#been manually adding quotes if you are testing in both 2 and 3.
mystring = input('enter')
#So what this line below does is a little different in python 2 vs 3,
#but comes to the same result in each.
#First, map is a function that takes a function as its first argument,
#and applies that to each element of the rest of the arguments, which
#are all sequences. Since eq is a function of two arguments, you can
#use map to apply it to the corresponding elements in two sequences.
#in python2, map returns a list of the elements. In python3, map
#returns a map object, which uses a 'lazy' evaluation of the function
#you give on the sequence elements. This means that the function isn't
#actually used until each item of the result is needed. The 'sum' function
#takes a sequence of values and adds them up. The results of eq are all
#True or False, which are really just special names for 1 and 0 respectively.
#Adding them up is the same as adding up a sequence of 1s and 0s.
#so, map is using eq to check each element of two strings (i.e. each letter)
#for equality. mystring.lower() is a copy of mystring with all the letters
#lowercase. sum adds up all the Trues to get the answer you want.
sum(map(eq, mystring, mystring.lower()))
or the one-liner:
#What I am doing here is using a generator expression.
#I think reading it is the best way to understand what is happening.
#For every letter in the input string, check if it is lower, and pass
#that result to sum. sum sees this like any other sequence, but this sequence
#is also 'lazy,' each element is generated as you need it, and it isn't
#stored anywhere. The results are just given to sum.
sum(c.islower() for c in input('enter: '))
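One caveat worth checking for yourself: the map/eq version compares each character with its lowercased copy, so anything that lower() leaves unchanged (digits, spaces, punctuation) is counted too, while islower() is only true for cased lowercase characters. A quick sketch to compare the two; the function names are made up for this example:

```python
from operator import eq

def count_unchanged_by_lower(text):
    """The map/eq approach: counts characters equal to their lowercased form."""
    return sum(map(eq, text, text.lower()))

def count_lowercase(text):
    """The islower approach: counts only actual lowercase letters."""
    return sum(ch.islower() for ch in text)

sample = "Hello 1"
by_eq = count_unchanged_by_lower(sample)   # e, l, l, o plus the space and the digit
by_islower = count_lowercase(sample)       # e, l, l, o only
```

For input that consists purely of letters the two give the same answer; they diverge as soon as the string contains anything else.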
You have a typo in your code. Instead of:
if mystring.lower():
It should be:
if ch.islower():
If you have any questions ask below. Good luck!
I'm not sure if this will handle UTF or special characters very nicely but should work for at least ASCII in Python3, using the islower() function.
count=0
mystring=input("enter:")
for ch in mystring:
    if ch.islower():
        count+=1
print(count)
The correct version of your code would be:
count=0
mystring=input("enter")
for ch in mystring:
    if ch.islower():
        count += 1
print(count)
The method lower converts a string/char to lowercase. Here you want to know if it IS lowercase (you want a boolean), so you need islower.
Tip: With a bit of wizardry you can even write this:
mystring= input("enter")
count = sum(map(lambda x: x.islower(), mystring))
or
count = sum([x.islower() for x in mystring])
(True is automatically converted to 1 and False to 0)
:)
I think you can use the following method:
mystring=input("enter:")
[char.islower() for char in mystring].count(True)

Looking for elegant glob-like DNA string expansion

I'm trying to make a glob-like expansion of a set of DNA strings that have multiple possible bases.
The base of my DNA strings contains the letters A, C, G, and T. However, I can have special characters like M which could be an A or a C.
For example, say I have the string:
ATMM
I would like to take this string as input and output the four possible matching strings:
ATAA
ATAC
ATCA
ATCC
Rather than brute force a solution, I feel like there must be some elegant Python/Perl/Regular Expression trick to do this.
Thank you for any advice.
Edit, thanks cortex for the product operator. This is my solution:
Still a Python newbie, so I bet there's a better way to handle each dictionary key than another for loop. Any suggestions would be great.
import sys
from itertools import product
baseDict = dict(M=['A','C'],R=['A','G'],W=['A','T'],S=['C','G'],
                Y=['C','T'],K=['G','T'],V=['A','C','G'],
                H=['A','C','T'],D=['A','G','T'],B=['C','G','T'])
def glob(str):
    strings = [str]
    ## this loop visits every possible base in the dictionary
    ## probably a cleaner way to do it
    for base in baseDict:
        oldstrings = strings
        strings = []
        for string in oldstrings:
            strings += map("".join,product(*[baseDict[base] if x == base
                                             else [x] for x in string]))
    return strings
for line in sys.stdin.readlines():
    line = line.rstrip('\n')
    permutations = glob(line)
    for x in permutations:
        print(x)
Agree with other posters that it seems like a strange thing to want to do. Of course, if you really want to, there is (as always) an elegant way to do it in Python (2.6+):
from itertools import product
map("".join, product(*[['A', 'C'] if x == "M" else [x] for x in "GMTTMCA"]))
Full solution with input handling:
import sys
from itertools import product
base_globs = {"M":['A','C'], "R":['A','G'], "W":['A','T'],
              "S":['C','G'], "Y":['C','T'], "K":['G','T'],
              "V":['A','C','G'], "H":['A','C','T'],
              "D":['A','G','T'], "B":['C','G','T'],
              }
def base_glob(glob_sequence):
    production_sequence = [base_globs.get(base, [base]) for base in glob_sequence]
    return map("".join, product(*production_sequence))
for line in sys.stdin.readlines():
    productions = base_glob(line.strip())
    print("\n".join(productions))
You could probably do something like this in Python using the yield statement:
def glob(str):
    if str=='':
        yield ''
        return
    if str[0]!='M':
        for tail in glob(str[1:]):
            yield str[0] + tail
    else:
        for c in ['A','C']:
            for tail in glob(str[1:]):
                yield c + tail
    return
EDIT: As correctly pointed out I was making a few mistakes. Here is a version which I tried out and works.
This isn't really an "expansion" problem and it's almost certainly not doable with any sensible regular expression.
I believe what you're looking for is "how to generate permutations".
You could for example do this recursively. Pseudo-code:
printSequences(sequence s)
    switch "first special character in sequence"
        case ...
        case M:
            s1 = s, but first M replaced with A
            printSequences(s1)
            s2 = s, but first M replaced with C
            printSequences(s2)
        case none:
            print s;
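For what it's worth, the pseudo-code above could be turned into Python roughly as follows. This is only a sketch: the function name expand is made up, it yields the strings instead of printing them, and the lookup table is copied from the dictionary in the question rather than from any authoritative source.

```python
# Ambiguity codes and the bases they stand for (taken from the question).
AMBIGUOUS = {"M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT",
             "K": "GT", "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT"}

def expand(sequence):
    """Recursively replace the first special character with each option."""
    for i, base in enumerate(sequence):
        if base in AMBIGUOUS:
            for choice in AMBIGUOUS[base]:
                # Substitute one concrete base, then expand what is left.
                yield from expand(sequence[:i] + choice + sequence[i + 1:])
            return
    # "case none": no special characters remain, so the string is final.
    yield sequence

results = list(expand("ATMM"))
```

For ATMM this produces the four strings from the question, in the same order.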
Regexps match strings, they're not intended to be turned into every string they might match.
Also, you're looking at a lot of strings being output from this - for instance:
MMMMMMMMMMMMMMMM (16 M's)
produces 65,536 16 character strings - and I'm guessing that DNA sequences are usually longer than that.
Arguably any solution to this is pretty much 'brute force' from a computer science perspective, because your algorithm is O(2^n) on the original string length. There's actually quite a lot of work to be done.
Why do you want to produce all the combinations? What are you going to do with them? (If you're thinking to produce every string possibility and then look for it in a large DNA sequence, then there are much better ways of doing that.)
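Before generating anything, it may be worth checking how many strings a pattern would expand to; that is just the product of the number of options at each position. A small sketch, assuming the ambiguity codes from the question (the names here are invented for the example):

```python
# Number of concrete bases each ambiguity code stands for (from the question).
AMBIGUITY_SIZES = {"M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
                   "V": 3, "H": 3, "D": 3, "B": 3}

def count_expansions(sequence):
    """Return how many concrete strings the pattern stands for."""
    total = 1
    for base in sequence:
        # Plain bases (A, C, G, T) contribute a factor of 1.
        total *= AMBIGUITY_SIZES.get(base, 1)
    return total

sixteen_ms = count_expansions("M" * 16)
```

This runs in time proportional to the length of the pattern, so you can decide up front whether full expansion is feasible.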
