Remove character from a string and list all possible permutations - python

I'm trying to go through a string and, every time I come across an asterisk (*), replace it with every letter in the alphabet. Once that is done and another asterisk is hit, do the same again in both positions, and so on. If possible, I'd like to save these permutations to a .txt file or just print them out. This is what I have, and I don't know where to go from here:
alphabet = "abcdefghijklmnopqrstuvwxyz"
for i in reversed("h*l*o"):
    if i == "*":
        for j in ("abcdefghijklmnopqrstuvwxyz"):
Right, I have run into some more problems with the solutions below that I'm trying to use.
I cannot write to a file, as I just get errors.

You can:
1. Count the number of asterisks in the string.
2. Create the product of all letters with as many repetitions as counted in (1).
3. Replace each asterisk (in order) with the matching letter:
import string
import itertools

s = "h*l*o"
num_of_asterisks = s.count('*')
for prod in itertools.product(string.ascii_lowercase, repeat=num_of_asterisks):
    it = iter(prod)
    new_s = ''.join(next(it) if c == '*' else c for c in s)
    print(new_s)
Notes:
Instead of creating a string of all letters, just use the string module.
This converts the product's tuples to iterators for easy handling of sequential replacing of each letter.
Uses the join method to create the new string out of the input string.
The above code simply prints each permutation. You can of course replace it with writing to a file or anything else you desire.
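Since the question mentions errors when writing to a file, here is a minimal sketch of that variant. The function names and the output path are placeholders of mine, not something from the question, and I'm assuming one result per line is the desired file layout:

```python
import itertools
import string


def expand_asterisks(s, letters=string.ascii_lowercase):
    """Yield every string obtained by filling each '*' in s with a letter."""
    for prod in itertools.product(letters, repeat=s.count('*')):
        it = iter(prod)
        yield ''.join(next(it) if c == '*' else c for c in s)


def write_expansions(s, path):
    """Write one expansion per line to path and return how many were written."""
    count = 0
    # 'with' closes the file even if an error occurs part-way through
    with open(path, 'w') as f:
        for new_s in expand_asterisks(s):
            f.write(new_s + '\n')
            count += 1
    return count
```

Calling write_expansions("h*l*o", "expansions.txt") should produce a file with 26 * 26 = 676 lines, starting with halao and ending with hzlzo.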

Interesting problem. I assume you mean the Cartesian product and not "permutations".
I would use itertools:
import itertools

alphabet = "abcdefghijklmnopqrstuvwxyz"
string = "h*l*o"

# for every combination of N letters
for letters in itertools.product(alphabet, repeat=string.count('*')):
    # iterate over the letters
    letter_iter = iter(letters)
    # replace every * with the next letter
    print(''.join(i if i != '*' else next(letter_iter) for i in string))

Related

Notepad++ Regex to insert random letter or number every other character-position

I'm hoping to figure out a simple search and replace in Notepad++ to slightly obfuscate text by littering it with random letters and numbers every second ("other") character, and then be able to reverse that again with another macro.
So:
banana
would become:
bma0ndaNn4aR
(b?a?n?a?n?a?)
...And then be able to undo this again by removing every other character with a backspace.
...
I found this method so far:
(?<=.)(?!$)
How to insert spaces between characters using Regex?
But as best I understand, this is not actually capturing anything so I can't use this to replace with expressions I've found for printing random letters and numbers, such as:
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])$
I'm sure a tweak to that would work and then I could reverse it all by replacing the same search with \b.
There are better ways of doing this, but you can use the following Python prototype as a starting point to create your own script:
import string
import random
inputText = 'banana'
#encoding
obfuscatedText = ''.join([x + random.choice(string.ascii_letters+string.digits) for x in inputText])
print(obfuscatedText)
#decoding
originalText = ''.join([x for x in obfuscatedText][0:len(obfuscatedText)-1:2])
print(originalText)
Explanations:
Encoding:
[x for x in inputText] will generate a list of chars from the input string.
random.choice(string.ascii_letters+string.digits) takes one character from the union of string.ascii_letters and string.digits.
x + random.choice(string.ascii_letters+string.digits) creates 2-char strings by concatenating each char of the input with the generated char.
The ''.join() operation creates a string from the list of chars.
Decoding:
[x for x in obfuscatedText][0:len(obfuscatedText)-1:2] keeps only the chars located at index 0, 2, 4, 6, ...
The ''.join() operation regenerates a string from the list of chars.
Execution:
$ python obfuscate.py
biaLncaIn4aE
banana
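The same idea can be wrapped in two small functions. This is only a sketch: the names obfuscate, deobfuscate and FILLER are mine, and the decoder uses the plain slice text[::2], which picks the same even indices as the longer expression in the prototype:

```python
import random
import string

# Pool of filler characters: ASCII letters plus digits
FILLER = string.ascii_letters + string.digits


def obfuscate(text):
    """Follow every character of text with one random letter or digit."""
    return ''.join(ch + random.choice(FILLER) for ch in text)


def deobfuscate(text):
    """Keep the characters at even indices (0, 2, 4, ...), dropping the filler."""
    return text[::2]
```

Because the filler is random, obfuscate('banana') gives a different result on each run, but deobfuscate(obfuscate('banana')) always returns 'banana'.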

Optionally replacing a substring python

My list of replacements is in the following format.
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II')),.....]
What I want to achieve is optionally change the occurrence of the letter by all the possible replacements. The input word should also be a member of the list.
e.g.
input - DArA
Expected output -
['DArA','DaarA','Daaraa','DAraa','DaArA','DAraA','DaAraA','DAarA','DAarAa', 'DArAa','DAArA','DAArAA','DArAA']
My try was
lstrep = [('A',('aa','aA','Aa','AA'))]
def alte(word, lstrep):
    output = [word]
    for (a, b) in lstrep:
        for bb in b:
            output.append(word.replace(a, bb))
    return output
print(alte('DArA', lstrep))
The output I received was ['DArA', 'Daaraa', 'DaAraA', 'DAarAa', 'DAArAA'] i.e. All occurrences of 'A' were replaced by 'aa','aA','Aa' and 'AA' respectively. What I want is that it should give all permutations of optional replacements.
itertools.product will give all of the permutations. You can build up a list of substitutions and then let it handle the permutations.
import itertools

lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II'))]
input_str = 'DArA'

# make substitution list a dict for easy lookup
lstrep_map = dict(lstrep)

# a substitution is an index plus a string to substitute. build
# list of subs [[(index1, sub1), (index1, sub2)], ...] for all
# characters in lstrep_map.
subs = []
for i, c in enumerate(input_str):
    if c in lstrep_map:
        subs.append([(i, sub) for sub in lstrep_map[c]])

# build output by applying each sub recorded
out = [input_str]
for sub in itertools.product(*subs):
    # make input a list for easy substitution
    input_list = list(input_str)
    for i, cc in sub:
        input_list[i] = cc
    out.append(''.join(input_list))

print(out)
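Note that the product above replaces every replaceable character in each result. If "optionally" in the question means each occurrence may also be left untouched, one possible variation is to add the original character to the list of choices for each position. This is a sketch under that reading of the question, and optional_replacements is a name I made up:

```python
import itertools


def optional_replacements(word, lstrep):
    """Return every variant of word where each replaceable character is
    either kept as-is or swapped for one of its replacements."""
    lstrep_map = dict(lstrep)
    # For each character, list the choices: itself plus any replacements
    choices = [(c,) + tuple(lstrep_map.get(c, ())) for c in word]
    return [''.join(combo) for combo in itertools.product(*choices)]


lstrep = [('A', ('aa', 'aA', 'Aa', 'AA')), ('I', ('ii', 'iI', 'Ii', 'II'))]
variants = optional_replacements('DArA', lstrep)
print(len(variants))   # 25: five choices for each of the two 'A's
print(variants[:3])    # ['DArA', 'DAraa', 'DAraA']
```

The unchanged input word comes out as the first element, so it does not need to be added separately.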
Try constructing tuples of all possible permutations based on the replaceable characters that occur. This will have to be achieved using recursion.
The reason recursion is necessary is that you would need a variable number of loops to achieve this.
For your example "DArA" (2 replaceable characters, "A" and "A"):
replaceSet = set()
replacements = {'A': ('aa','aA','Aa','AA'), 'I': ('ii','iI','Ii','II')}  # ..... (further entries elided)
for replacement1 in replacements["A"]:
    for replacement2 in replacements["A"]:
        replaceSet.add((replacement1, replacement2))
You see you need two loops for two replaceables, and n loops for n replaceables.
Think of a way you could use recursion to solve this problem. It will likely involve creating all permutations for a substring that contains n-1 replaceables (if you had n in your original string).
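One way the recursive idea could look is sketched below. It handles the first character and recurses on the rest of the string, so the substring always contains at most n-1 of the remaining replaceables. The function name expand_recursive is hypothetical, and keeping the original character as one of the options is my assumption about what "optionally" means in the question:

```python
def expand_recursive(word, replacements):
    """Recursively build every variant of word, keeping or replacing each
    character that has an entry in replacements."""
    if not word:
        return ['']
    head, rest = word[0], word[1:]
    # Solve the smaller problem first: all variants of the remaining characters
    tails = expand_recursive(rest, replacements)
    # The head may stay as it is or become any of its replacements
    options = (head,) + tuple(replacements.get(head, ()))
    return [option + tail for option in options for tail in tails]


replacements = {'A': ('aa', 'aA', 'Aa', 'AA'), 'I': ('ii', 'iI', 'Ii', 'II')}
print(len(expand_recursive('DArA', replacements)))  # 25
```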

Python: Expanding a string of variables with integers

I'm still new to Python and learning the more basic things in programming.
Right now I'm trying to create a function that will duplicate each letter in a string according to the number that follows it.
Example:
def expand('d3f4e2')
>dddffffee
I'm not sure how to write the function for this.
Basically, as I understand it, you want to multiply each letter by the number beside it.
The key to any solution is splitting things into pairs of strings to be repeated, and repeat counts, and then iterating those pairs in lock-step.
If you only need single-character strings and single-digit repeat counts, this is just breaking the string up into 2-character pairs, which you can do with mshsayem's answer, or with slicing (s[::2] is the strings, s[1::2] is the counts).
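As a quick sketch of the slicing approach for the single-character, single-digit case (it assumes the input strictly alternates letter, digit, letter, digit):

```python
def expand(s):
    """Expand single letters followed by single-digit counts, e.g. 'd3f4e2'."""
    strings = s[::2]   # characters at even indices: the letters
    counts = s[1::2]   # characters at odd indices: the digits
    return ''.join(c * int(d) for c, d in zip(strings, counts))


print(expand('d3f4e2'))  # dddffffee
```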
But what if you want to generalize this to multi-letter strings and multi-digit counts?
Well, somehow we need to group the string into runs of digits and non-digits. If we could do that, we could use pairs of those groups in exactly the same way mshsayem's answer uses pairs of characters.
And it turns out that we can do this very easily. There's a nifty function in the standard library called groupby that lets you group anything into runs according to any function. And there's a function isdigit that distinguishes digits and non-digits.
So, this gets us the runs we want:
>>> import itertools
>>> s = 'd13fx4e2'
>>> [''.join(group) for (key, group) in itertools.groupby(s, str.isdigit)]
['d', '13', 'fx', '4', 'e', '2']
Now we zip this up the same way that mshsayem zipped up the characters:
>>> groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
>>> ''.join(c*int(d) for (c, d) in zip(groups, groups))
'dddddddddddddfxfxfxfxee'
So:
import itertools

def expand(s):
    groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
    return ''.join(c*int(d) for (c, d) in zip(groups, groups))
Naive approach (if the digits are only single, and characters are single too):
>>> def expand(s):
...     s = iter(s)
...     return "".join(c*int(d) for (c,d) in zip(s,s))
...
>>> expand("d3s5")
'dddsssss'
Poor explanation:
Terms/functions:
iter() gives you an iterator object.
zip() makes tuples from iterables.
int() parses an integer from a string.
<expression> for <variable> in <iterable> is a generator expression (a list comprehension without the square brackets).
<string>.join joins an iterable of strings, using <string> as the separator.
Process:
First we make an iterator from the given string.
zip() is used to make tuples of a character and its repeat count, e.g. ('d','3'), ('s','5'). (zip() pulls from its arguments to make each tuple. Since both arguments are the same iterator, building each tuple advances that iterator twice.)
Now for ... in iterates over the tuples. Using two variables (c,d) unpacks each tuple into them.
But d is still a string; int() converts it to an integer.
<string> * <integer> repeats the string that many times.
Finally, join returns the result.
Here is a multi-digit, multi-char version:
import re

def expand(s):
    # raw string so that \d reaches the regex engine unchanged
    s = re.findall(r'([^0-9]+)(\d+)', s)
    return "".join(c*int(d) for (c,d) in s)
By the way, using itertools.groupby is better, as shown by abarnert.
Let's look at how you could do this manually, using only tools that a novice will understand. It's better to actually learn about zip and iterators and comprehensions and so on, but it may also help to see the clunky and verbose way you write the same thing.
So, let's start with just single characters and single digits:
def expand(s):
    result = ''
    repeated_char_next = True
    for char in s:
        if repeated_char_next:
            char_to_repeat = char
            repeated_char_next = False
        else:
            repeat_count = int(char)
            result += char_to_repeat * repeat_count
            repeated_char_next = True
    return result
This is a very simple state machine. There are two states: either the next character is a character to be repeated, or it's a digit that gives a repeat count. After reading the former, we don't have anything to add yet (we know the character, but not how many times to repeat it), so all we do is switch states. After reading the latter, we now know what to add (since we know both the character and the repeat count), so we do that, and also switch states. That's all there is to it.
Now, to expand it to multi-char repeat strings and multi-digit repeat counts:
def expand(s):
    result = ''
    current_repeat_string = ''
    current_repeat_count = ''
    for char in s:
        if char.isdigit():
            current_repeat_count += char
        else:
            if current_repeat_count:
                # We've just switched from a digit back to a non-digit
                count = int(current_repeat_count)
                result += current_repeat_string * count
                current_repeat_count = ''
                current_repeat_string = ''
            current_repeat_string += char
    # Flush the last group; no non-digit follows it to trigger the switch
    if current_repeat_count:
        result += current_repeat_string * int(current_repeat_count)
    return result
The state here is pretty similar—we're either in the middle of reading non-digits, or in the middle of reading digits. But we don't automatically switch states after each character; we only do it when getting a digit after non-digits, or vice-versa. Plus, we have to keep track of all the characters in the current repeat string and in the current repeat count. I've collapsed the state flag into that repeat string, but there's nothing else tricky here.
There is more than one way to do this, but assuming that the sequence of characters in your input is always the same, e.g. a single character followed by a number, the following would work:
def expand(input):
    alphatest = False
    finalexpanded = ""  # Blank string variable to hold final output
    # first part is used for iterating through range of size i
    # this solution assumes you have a numeric character coming after your
    # alphabetic character every time
    for i in input:
        if alphatest == True:
            i = int(i)  # converts the string number to an integer
            for value in range(0, i):  # loops through range of size i
                finalexpanded += alphatemp  # adds your alphabetic character to string
            alphatest = False  # Once loop is finished resets your alphatest variable to False
            i = str(i)  # converts i back to string to avoid error from i.isalpha() test
        if i.isalpha():  # tests i to see if it is an alphabetic character
            alphatemp = i  # sets alphatemp to i for loop above
            alphatest = True  # sets alphatest True for loop above
    print(finalexpanded)  # prints the final result

Reassigning letters in an alphabet to a higher letter in python?

If I am building a basic encryption program in python that reassigns A to C and D to F and so on, what is a simple algorithm I could use to do this?
I have a list named alphabet that holds each letter, then a variable that takes in the user input to change to the encrypted version.
str.translate should be the easiest way:
table = str.maketrans(
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
"cdefghijklmnopqrstuvwxyzabCDEFGHIJKLMNOPQRSTUVWXYZAB"
)
s = "Test String"
print(s.translate(table))
Output:
Vguv Uvtkpi
There are two major parts to this: first, ciphering a single letter; and second, applying that to the whole string. We'll start with the first one.
You said you had a list with the alphabet in it. Suppose, too, that we have a letter.
>>> letter = 'F'
If we want to replace that letter with the letter two spaces down in the alphabet, first we'll probably want to find the numerical value of that letter. To do that, use index:
>>> alphabet.index(letter)
5
Next, you can add the offset to it and access it in the list again:
>>> alphabet[alphabet.index(letter) + 2]
'H'
But wait, this won't work if we try a letter like Z, because when we add the offset, we'll go off the end of the list and get an error. So we'll wrap the value around before getting the new letter:
>>> alphabet[(alphabet.index('Z') + 2) % len(alphabet)]
'B'
So now we know how to change a single letter. Python makes it easy to apply it to the whole string. First, put our single-letter version into a function:
>>> def cipher_letter(letter):
...     return alphabet[(alphabet.index(letter) + 2) % len(alphabet)]
...
We can use map to apply it over a sequence. Then we get an iterable of ciphered characters, which we can join back into a string.
>>> ''.join(map(cipher_letter, 'HELLOWORLD'))
'JGNNQYQTNF'
If you want to leave characters not in alphabet in place, add a test in cipher_letter to check that letter is in alphabet first, and if not, just return letter. Voilà.
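Putting that last step into code, a sketch might look like this. I'm assuming the alphabet list holds the uppercase letters A to Z, which matches the index of 5 for 'F' above, and the offset parameter is an addition of mine:

```python
import string

alphabet = list(string.ascii_uppercase)  # assumed contents of the asker's list


def cipher_letter(letter, offset=2):
    """Shift letter by offset places, wrapping around; leave others alone."""
    if letter not in alphabet:
        return letter
    return alphabet[(alphabet.index(letter) + offset) % len(alphabet)]


print(''.join(map(cipher_letter, 'HELLO, WORLD!')))  # JGNNQ, YQTNF!
```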

Can't convert 'list' object to str implicitly Python

I am trying to import the alphabet and split it so that each character is a separate item in a list rather than all in one string. Splitting it works, but when I try to use it to find how many characters are in an inputted word I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I would go about solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).
There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
import string

year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
    word_sum = 0
    for char in word:
        if char in string.ascii_letters:
            word_sum += 1
    total_sum += word_sum
# Number of characters that are ASCII letters:
# total_sum == 12
# Number of all characters in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = ( 1 - distance / len(correct)) * 10 # mark == 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
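If installing a package is not an option, the distance can also be computed by hand. The following is a minimal pure-Python sketch of the standard dynamic-programming approach. It is not the package's own implementation, and the function name levenshtein is mine:

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


print(levenshtein('because', 'becuase'))  # 2
```

It only keeps two rows of the table at a time, which is enough when just the final distance is needed.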
I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
join is a method of str, so you can do
''.join(splitalphabet)
or
str.join('', splitalphabet)
To convert the list splitalphabet to a string, so you can use it with the find() function you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))
I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])
import string

# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)

def letters_in_word(word):
    return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2
While join creates the string from the split, you would not have to do that, as you can issue the find on the original string (alphabet). However, I do not think that is what you are trying to do. Note that the find you are trying attempts to find the splitalphabet (actually alphabet) within year3wordlist[x], which will always fail (a result of -1).
If what you are trying to do is to get the indices of all the letters of the word list within the alphabet, then you would need to handle it as
for each letter in the word of the word list, determine the index within alphabet:
j = []
for c in word:
    j.append(alphabet.find(c))
print(j)
On the other hand if you are attempting to find the index of each character within the alphabet within the word, then you need to loop over splitalphabet to get an individual character to find within the word. That is
l = []
for c in splitalphabet:
    j = word.find(c)
    if j != -1:
        l.append((c, j))
print(l)
This gives the list of tuples showing those characters found and the index.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine if the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either say word.isalpha() and get True or False or check each character of word to be isalpha()
