I'm trying to write a function in Python that, given a string and an optional character, generates all possible strings from the given string. The big picture is using this function to eventually help with turning a CFG into Chomsky normal form.
For example, given a string 'ASA' and optional character 'A', I want to be able to generate the following array:
['SA', 'AS', 'S']
Since these are all the possible strings that can be generated by omitting one or both of the A's of the original string.
For reference, I've looked at the following question: generating all possible strings given a grammar rule, but the problem seemed to be slightly different since the rules of the grammar were defined in the original string.
Here is my thinking on how to go about solving the problem: Have a recursive function that takes a string and an optional character, loops through the string to find the first optional character, then create a new string that has the first optional character omitted, add this to a return array, and call itself again with the string it just generated and the same optional character.
Then, after all recursions return, go back to the original string and omit the second occurrence of the optional character, and repeat the process.
This would continue on until all occurrences of the optional character were omitted.
I was wondering if there was any better way of doing this than by using the type of logic I just described.
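For concreteness, the recursive idea described above could be sketched roughly as follows. The name `omissions` and the `start` parameter are hypothetical, introduced only for this illustration; `start` keeps each recursive call from revisiting occurrences to the left of the one just removed.

```python
def omissions(s, ch, start=0):
    """Return the set of strings obtained by omitting one or more `ch` from `s`.

    Only occurrences at index >= start are considered for removal.
    """
    results = set()
    for i in range(start, len(s)):
        if s[i] == ch:
            shorter = s[:i] + s[i + 1:]
            results.add(shorter)
            # After removing index i, later occurrences now begin at index i.
            results |= omissions(shorter, ch, i)
    return results


print(sorted(omissions('ASA', 'A')))
```

Returning a set sidesteps the duplicates that appear when the optional character occurs twice in a row.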
As was mentioned in the comments it could also be done with itertools. Here's a quick demonstration:
import itertools
mystr = 'ABCDABCDAABCD'
optional_letter = 'A'
# positions of every occurrence of the optional letter
indices = [i for i, char in enumerate(mystr) if char == optional_letter]
def remover(combination, mystr):
    mylist = list(mystr)
    # delete from the right so the earlier indices stay valid
    for index in combination[::-1]:
        del mylist[index]
    return ''.join(mylist)
all_strings = [remover(combination, mystr)
               for n in range(len(indices) + 1)
               for combination in itertools.combinations(indices, n)]
for string in all_strings:
    print(string)
It first finds the indices of all occurrences of your character, then removes every combination of these indices from your string. If the string has two optional letters in a row, you will get duplicates, which can be removed by using:
set(all_strings)
This is based on a combinations function that returns a list of all possible combinations (without regard to order) of the elements of a list. Pass it a list of the indexes at which your character occurs, and the rest is straightforward:
def indexes(string, char):
    return [i for i in range(len(string)) if string[i] == char]

def combinations(chars, max_length=None):
    if max_length is None:
        max_length = len(chars)
    if len(chars) == 0:
        return [[]]
    nck = []
    for sub_list in combinations(chars[1:], max_length):
        nck.append(sub_list)
        if len(sub_list) < max_length:
            nck.append(chars[:1] + sub_list)
    return nck

def substringsOmitting(string, char):
    subbies = []
    for combo in combinations(indexes(string, char)):
        keepChars = [string[i] for i in range(len(string)) if i not in combo]
        subbies.append(''.join(keepChars))
    return subbies

if __name__ == '__main__':
    print(substringsOmitting('ASA', 'A'))
output: ['ASA', 'SA', 'AS', 'S']
It does contain the string itself, too. But this should be a good starting point.
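If the original string should be left out and duplicates dropped, as in the question's expected ['SA', 'AS', 'S'], one possible variant is sketched below. It uses the standard `itertools.combinations`; the function name `omit_optional` is hypothetical and chosen only for this example.

```python
from itertools import combinations


def omit_optional(s, ch):
    """List every distinct string made by dropping at least one `ch` from `s`."""
    positions = [i for i, c in enumerate(s) if c == ch]
    results = []
    seen = set()
    # n starts at 1 so that the untouched original string is never produced
    for n in range(1, len(positions) + 1):
        for drop in combinations(positions, n):
            dropped = set(drop)
            candidate = ''.join(c for i, c in enumerate(s) if i not in dropped)
            if candidate not in seen:  # skip duplicates, keep first-seen order
                seen.add(candidate)
                results.append(candidate)
    return results


print(omit_optional('ASA', 'A'))
```

Tracking `seen` separately keeps the results in generation order, which a plain `set()` would not guarantee.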
Related
Learning Python, I came across a demanding beginner's exercise.
Let's say you have a string constituted by "blocks" of characters separated by ';'. An example would be:
cdk;2(c)3(i)s;c
And you have to return a new string based on old one but in accordance to a certain pattern (which is also a string), for example:
c?*
This pattern means that each block must start with a 'c', the '?' character must be replaced by some other letter, and finally '*' by an arbitrary number of letters.
So when the pattern is applied you return something like:
cdk;cciiis
Another example:
string: 2(a)bxaxb;ab
pattern: a?*b
result: aabxaxb
My very crude attempt resulted in this:
def switch(string,pattern):
    d = []
    for v in range(0,string):
        r = float("inf")
        for m in range (0,pattern):
            if pattern[m] == string[v]:
                d.append(pattern[m])
            elif string[m]==';':
                d.append(pattern[m])
            elif (pattern[m]=='?' & Character.isLetter(string.charAt(v))):
                d.append(pattern[m])
    return d
Tips?
To split a string you can use split() function.
For pattern detection in strings you can use regular expressions (regex) with the re library.
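A minimal sketch of how those two tips might combine is shown below. It rests on assumptions inferred from the two examples in the question rather than stated there: `n(x)` expands to `x` repeated `n` times, '?' stands for exactly one letter, '*' for zero or more letters, and blocks that do not match the pattern are dropped. The names `expand_block`, `pattern_to_regex` and `apply_pattern` are hypothetical.

```python
import re


def expand_block(block):
    # Assumed notation: "2(c)" expands to "cc"
    return re.sub(r'(\d+)\(([^)]*)\)',
                  lambda m: m.group(2) * int(m.group(1)),
                  block)


def pattern_to_regex(pattern):
    # Assumed meaning: '?' is exactly one letter, '*' is zero or more letters
    parts = []
    for ch in pattern:
        if ch == '?':
            parts.append('[A-Za-z]')
        elif ch == '*':
            parts.append('[A-Za-z]*')
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts))


def apply_pattern(string, pattern):
    regex = pattern_to_regex(pattern)
    blocks = [expand_block(b) for b in string.split(';')]
    # Keep only the blocks that match the whole pattern
    return ';'.join(b for b in blocks if regex.fullmatch(b))


print(apply_pattern('cdk;2(c)3(i)s;c', 'c?*'))
print(apply_pattern('2(a)bxaxb;ab', 'a?*b'))
```

Under these assumptions the two calls reproduce the results given in the question, `cdk;cciiis` and `aabxaxb`.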
My list of replacement is in the following format.
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II')),.....]
What I want to achieve is optionally change the occurrence of the letter by all the possible replacements. The input word should also be a member of the list.
e.g.
input - DArA
Expected output -
['DArA','DaarA','Daaraa','DAraa','DaArA','DAraA','DaAraA','DAarA','DAarAa', 'DArAa','DAArA','DAArAA','DArAA']
My try was
lstrep = [('A',('aa','aA','Aa','AA'))]
def alte(word,lstrep):
    output = [word]
    for (a,b) in lstrep:
        for bb in b:
            output.append(word.replace(a,bb))
    return output
print(alte('DArA',lstrep))
The output I received was ['DArA', 'Daaraa', 'DaAraA', 'DAarAa', 'DAArAA'] i.e. All occurrences of 'A' were replaced by 'aa','aA','Aa' and 'AA' respectively. What I want is that it should give all permutations of optional replacements.
itertools.product will give all of the permutations. You can build up a list of substitutions and then let it handle the permutations.
import itertools
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II'))]
input_str = 'DArA'
# make substitution list a dict for easy lookup
lstrep_map = dict(lstrep)
# a substitution is an index plus a string to substitute. build
# list of subs [[(index1, sub1), (index1, sub2)], ...] for all
# characters in lstrep_map.
subs = []
for i, c in enumerate(input_str):
    if c in lstrep_map:
        subs.append([(i, sub) for sub in lstrep_map[c]])
# build output by applying each sub recorded
out = [input_str]
for sub in itertools.product(*subs):
    # make input a list for easy substitution
    input_list = list(input_str)
    for i, cc in sub:
        input_list[i] = cc
    out.append(''.join(input_list))
print(out)
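Note that the loop above substitutes every replaceable letter in each result, whereas the question also expects results where some occurrences stay unchanged, such as 'DaarA'. One possible way to cover those, sketched here under the assumption that "optional" means each occurrence may independently keep its original letter, is to add the letter itself to its list of options. The function name `optional_replacements` is hypothetical.

```python
import itertools


def optional_replacements(word, replacements):
    """Build every variant of `word`; each letter keeps itself or takes a replacement."""
    # For each position: the original character first, then any replacements
    options = [(ch,) + tuple(replacements.get(ch, ())) for ch in word]
    return [''.join(parts) for parts in itertools.product(*options)]


variants = optional_replacements('DArA', {'A': ('aa', 'aA', 'Aa', 'AA')})
print(len(variants))
print(variants[:5])
```

With five options for each of the two 'A's this gives 25 variants, the first being the unchanged input.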
Try constructing tuples of all possible permutations based on the replaceable characters that occur. One way to achieve this is recursion.
Recursion helps because you would otherwise need a variable number of nested loops.
For your example "DArA" (2 replaceable characters, "A" and "A"):
replaceSet = set()
replacements = {'A': ('aa','aA','Aa','AA'), 'I': ('ii','iI','Ii','II')}  # ..... and so on
for replacement1 in replacements["A"]:
    for replacement2 in replacements["A"]:
        replaceSet.add((replacement1, replacement2))
You see you need two loops for two replaceables, and n loops for n replaceables.
Think of a way you could use recursion to solve this problem. It will likely involve creating all permutations for a substring that contains n-1 replaceables (if you had n in your original string).
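As a rough illustration of that hint, here is one recursive sketch. It assumes each replaceable letter may also stay unchanged, and it recurses on the rest of the word one character at a time; the name `all_variants` is hypothetical.

```python
def all_variants(word, replacements):
    """Recursively combine the options for the first letter with all variants of the rest."""
    if not word:
        return ['']  # base case: one way to build the empty string
    tails = all_variants(word[1:], replacements)
    # The first letter may stay as it is or take any of its replacements
    heads = (word[0],) + tuple(replacements.get(word[0], ()))
    return [head + tail for head in heads for tail in tails]


print(all_variants('rA', {'A': ('aa', 'AA')}))
```

Each level of recursion plays the role of one of the nested loops, so the number of "loops" adapts to the length of the word.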
I'm still new to Python and learning the more basic things in programming.
Right now I'm trying to create a function that duplicates each letter in a string by the number that follows it.
Example:
def expand('d3f4e2')
>dddffffee
I'm not sure how to write the function for this.
Basically, I understand you want to multiply each letter by the number beside it.
The key to any solution is splitting things into pairs of strings to be repeated, and repeat counts, and then iterating those pairs in lock-step.
If you only need single-character strings and single-digit repeat counts, this is just breaking the string up into 2-character pairs, which you can do with mshsayem's answer, or with slicing (s[::2] is the strings, s[1::2] is the counts).
But what if you want to generalize this to multi-letter strings and multi-digit counts?
Well, somehow we need to group the string into runs of digits and non-digits. If we could do that, we could use pairs of those groups in exactly the same way mshsayem's answer uses pairs of characters.
And it turns out that we can do this very easily. There's a nifty function in the standard library called groupby that lets you group anything into runs according to any function. And there's a function isdigit that distinguishes digits and non-digits.
So, this gets us the runs we want:
>>> import itertools
>>> s = 'd13fx4e2'
>>> [''.join(group) for (key, group) in itertools.groupby(s, str.isdigit)]
['d', '13', 'fx', '4', 'e', '2']
Now we zip this up the same way that mshsayem zipped up the characters:
>>> groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
>>> ''.join(c*int(d) for (c, d) in zip(groups, groups))
'dddddddddddddfxfxfxfxee'
So:
def expand(s):
    groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
    return ''.join(c*int(d) for (c, d) in zip(groups, groups))
Naive approach (if the digits are only single, and characters are single too):
>>> def expand(s):
...     s = iter(s)
...     return "".join(c*int(d) for (c,d) in zip(s,s))
...
>>> expand("d3s5")
'dddsssss'
Poor explanation:
Terms/functions:
iter() gives you an iterator object.
zip() makes tuples from iterables.
int() parses an integer from a string
<expression> for <variable> in <iterable> is a generator expression (a list comprehension without the brackets)
<string>.join joins an iterable of strings, using the string as the separator
Process:
First we are making an iterator of the given string
zip() is being used to make tuples of a character and its repeat count, e.g. ('d','3'), ('s','5'). (zip() will call the iterable to make the tuples. Note that for each tuple, it will call the same iterable twice, and because our iterable is an iterator, that means it will advance twice.)
now for ... in will iterate over the tuples; using two variables (c,d) unpacks each tuple into them
but d is still a string; int() converts it to an integer
<string> * integer repeats the string that many times
finally join will return the result
Here is a multi-digit, multi-char version:
import re
def expand(s):
    # raw string so that \d reaches the regex engine unchanged
    s = re.findall(r'([^0-9]+)(\d+)', s)
    return "".join(c*int(d) for (c,d) in s)
By the way, using itertools.groupby is better, as shown by abarnert.
Let's look at how you could do this manually, using only tools that a novice will understand. It's better to actually learn about zip and iterators and comprehensions and so on, but it may also help to see the clunky and verbose way you write the same thing.
So, let's start with just single characters and single digits:
def expand(s):
    result = ''
    repeated_char_next = True
    for char in s:
        if repeated_char_next:
            char_to_repeat = char
            repeated_char_next = False
        else:
            repeat_count = int(char)
            # build up the result, leaving the input string alone
            result += char_to_repeat * repeat_count
            repeated_char_next = True
    return result
This is a very simple state machine. There are two states: either the next character is a character to be repeated, or it's a digit that gives a repeat count. After reading the former, we don't have anything to add yet (we know the character, but not how many times to repeat it), so all we do is switch states. After reading the latter, we now know what to add (since we know both the character and the repeat count), so we do that, and also switch states. That's all there is to it.
Now, to expand it to multi-char repeat strings and multi-digit repeat counts:
def expand(s):
    result = ''
    current_repeat_string = ''
    current_repeat_count = ''
    for char in s:
        if char.isdigit():
            current_repeat_count += char
        else:
            if current_repeat_count:
                # We've just switched from a digit back to a non-digit
                count = int(current_repeat_count)
                result += current_repeat_string * count
                current_repeat_count = ''
                current_repeat_string = ''
            current_repeat_string += char
    # Flush the final group, which has no non-digit after it
    if current_repeat_count:
        result += current_repeat_string * int(current_repeat_count)
    return result
The state here is pretty similar—we're either in the middle of reading non-digits, or in the middle of reading digits. But we don't automatically switch states after each character; we only do it when getting a digit after non-digits, or vice-versa. Plus, we have to keep track of all the characters in the current repeat string and in the current repeat count. I've collapsed the state flag into that repeat string, but there's nothing else tricky here.
There is more than one way to do this, but assuming that the sequence of characters in your input is always the same, eg: a single character followed by a number, the following would work
def expand(input):
    alphatest = False
    finalexpanded = "" #Blank string variable to hold final output
    #first part is used for iterating through range of size i
    #this solution assumes you have a numeric character coming after your
    #alphabetic character every time
    for i in input:
        if alphatest == True:
            i = int(i) #converts the string number to an integer
            for value in range(0,i): #loops through range of size i
                finalexpanded += alphatemp #adds your alphabetic character to string
            alphatest = False #Once loop is finished resets your alphatest variable to False
            i = str(i) #converts i back to string to avoid error from i.isalpha() test
        if i.isalpha(): #tests i to see if it is an alphabetic character
            alphatemp = i #sets alphatemp to i for loop above
            alphatest = True #sets alphatest True for loop above
    print(finalexpanded) #prints the final result
I'm fairly new to programming so forgive me if there's an obvious answer here.
I essentially want to apply a permutation to a string. If the string isn't the same length as the permutation, I want to add a 'Z' to make the length a multiple of the permutation's length. The problem is if say the permutation is length 4 and the string is length 8, how do I get it to apply the permutation to the second half of the string?
here's what I have so far
def encr(k, m):
    if len(m)%len(k)!=0:
        for i in range((int(len(m)%len(k)))):
            m+=str('Z')
    return apply(m, k)
and here's the apply function called in the previous function
def apply(L,P):
    A = list(range(len(L)))
    for i in range(len(L)):
        A[i] = L[P[i]]
    return A
I want to apply the permutation to a string in order to do a simple form of encryption. It would just switch the index of the letters in the string. But if the string is longer than the permutation, how do I get the permutation to apply more than once to the same string? For example: with Permutation = [2,3,0,1,4] and the string "hello", it would make it "llheo", but if the string is "hello_world", how do I get the permutation to start over again after the 4th index? I apologize if I'm confusing, I just don't know how to word it.
Okay I hope I'm understanding you correctly. What I think you're asking is if you give a list of characters and a list of indexes to rearrange those characters how can you:
A. add characters to the end of the characters list to make the length a multiple of the length of the index list
and
B. make the index list repeat on character lists that are longer than the index lists (i.e [4,0,1,3,2] would be used twice on a 10 character long list)
if that is indeed the case then here is my response....
Firstly, if you just pad the word to a multiple of the perm's length and then repeat the perm, your text will ALWAYS be scrambled in groups of the length of your perm. I.e. if your perm was 5 characters long and your text was 'aaaaabbbbbcccccddddd', the result would always be 'aaaaabbbbbcccccddddd' no matter what the order of the permutation was. But since I don't know exactly what you're using it for, here is what I think you've asked for:
This will add a 'Z' to the end of a list of characters until the length is a multiple of the length of the permutation. I think that's what you wanted. I've also left in a commented-out line that adds a random lowercase character instead of 'Z', in case you find that helpful.
import random
def enc(oldword, perm):
    while len(oldword)%len(perm)!=0:
        oldword.append('Z')
        ## oldword.append(chr(random.randrange(97,123)))
    newword=[]
    while len(oldword)>0:
        for i in perm:
            newword.append(oldword[i])
        for _ in range(len(perm)):
            oldword.pop(0)
    return newword
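The function above expects a list of characters and empties it as it goes. For a plain string, a slicing-based sketch of the same idea might look like the following; the name `apply_blockwise` and the `pad` parameter are hypothetical, and padding with 'Z' follows the question.

```python
def apply_blockwise(message, perm, pad='Z'):
    """Pad `message` to a multiple of len(perm), then permute each block."""
    remainder = len(message) % len(perm)
    if remainder:
        # Add only as many pad characters as are needed to fill the last block
        message += pad * (len(perm) - remainder)
    out = []
    for start in range(0, len(message), len(perm)):
        block = message[start:start + len(perm)]
        out.extend(block[i] for i in perm)
    return ''.join(out)


print(apply_blockwise('hello', [2, 3, 0, 1, 4]))
print(apply_blockwise('hello_world', [2, 3, 0, 1, 4]))
```

Stepping through the string in strides of `len(perm)` is what makes the permutation "start over" for each block.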
Now, taking a guess from the name of your function 'encr': if you are trying to scramble a word or sentence using a key (like your permutation list) that you can also use to DECRYPT the message, I would suggest a slight modification. Instead of supplying a perm list, the input could be a NUMBER, used like so:
Count the characters in the string repeatedly until you reach that number, then remove that character from the string and repeat the process from that point. For example, if the word was 'hello' and the number was 7, you would count the 5 characters in the word hello, then go back to the h for number 6, and the e would be 7. You would remove the 'e', making the remaining string 'hllo', and your new string would be 'e' so far. Then count to 7 again starting from the place you left off. Repeat that until there are no more characters in your string.
A more detailed example....
For the text 'hello_world' using a number of 65
def enc(oldword, siferkey):
    #be sure that the oldword is a list of characters i.e. ['a','b','c']
    oldword=[oldword[i] for i in range(len(oldword))]
    newword=[]
    dex=0
    while len(oldword)>0:
        dex+=siferkey-1
        while dex not in range(len(oldword)):
            dex-=len(oldword)
        newword.append(oldword.pop(dex))
    return newword
print(enc(['h','e','l','l','o','_','w','o','r','l','d'],65))
would return ['r', 'l', 'w', 'e', '_', 'l', 'l', 'o', 'o', 'h', 'd']
Then to unscramble it you would use this:
def dec(wrd, sifer):
    dex=0
    newword=['' for _ in range(len(wrd))]
    dex+=sifer-1
    while dex not in range(len(newword)):
        dex-=len(newword)
    newword[dex]=wrd.pop(0)
    counter=0
    while len(wrd)>0:
        counter+=sifer
        while counter>0:
            dex+=1
            if dex not in range(len(newword)):
                dex=0
            if newword[dex]=='':
                counter-=1
        newword[dex]=wrd.pop(0)
    return newword
Please forgive me if I'm completely wrong about the direction you were trying to go here.
I would like to know how I can produce only in sequence combinations from a list of string parts, with use being optional. I need to do this in Python.
For example:
Charol(l)ais (cattle) is my complete string, with the parts in brackets being optional.
From this I would like to produce the following output as an iterable:
Charolais
Charollais
Charolais cattle
Charollais cattle
Was looking at Python's itertools module, since it has combinations; but couldn't figure out how to use this for my scenario.
You will need to convert the string into a more sensible format. For example, a tuple of all of the options for each part:
words = [("Charol",), ("l", ""), ("ais ",), ("cattle", "")]
And you can easily put them back together:
for p in itertools.product(*words):
    print("".join(p))
To create the list, parse the string, e.g.:
base = "Charol(l)ais (cattle)"
words = []
start = 0
for i, c in enumerate(base):
    if c == "(":
        words.append((base[start:i],))
        start = i + 1
    elif c == ")":
        words.append((base[start:i], ""))
        start = i + 1
if start < len(base):
    words.append((base[start:],))
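Putting the parsing and the product together, one possible end-to-end sketch is shown below. The function name `expand_optional` is hypothetical, and the `strip()` call is an added assumption: when "(cattle)" is left out, the space before it would otherwise remain at the end of the result.

```python
import itertools


def expand_optional(base):
    """Expand a string with (optional) parts into every in-sequence variant."""
    words = []
    start = 0
    for i, c in enumerate(base):
        if c == "(":
            words.append((base[start:i],))       # fixed text before the bracket
            start = i + 1
        elif c == ")":
            words.append(("", base[start:i]))    # optional part: absent or present
            start = i + 1
    if start < len(base):
        words.append((base[start:],))            # fixed text after the last bracket
    # strip() removes the space left behind when a trailing optional word is omitted
    return ["".join(p).strip() for p in itertools.product(*words)]


for variant in expand_optional("Charol(l)ais (cattle)"):
    print(variant)
```

Nested brackets are not handled by this simple scan.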
You could use the permutations from itertools and denote your optional strings with a special character. Then, you can replace those either with the correct character or an empty string. Or carry on from this idea depending on the exact semantics of your task at hand.