Python: Expanding a string of variables with integers - python

I'm still new to Python and learning the more basic things in programming.
Right now I'm trying to create a function that duplicates each letter in a string as many times as the number that follows it.
Example:
expand('d3f4e2')
>dddffffee
I'm not sure how to write the function for this.
Basically, I understand that I need to multiply each letter by the number beside it.

The key to any solution is splitting things into pairs of strings to be repeated, and repeat counts, and then iterating those pairs in lock-step.
If you only need single-character strings and single-digit repeat counts, this is just breaking the string up into 2-character pairs, which you can do with mshsayem's answer, or with slicing (s[::2] is the strings, s[1::2] is the counts).
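The slicing variant mentioned above might look like the following minimal sketch. The name expand_by_slicing is hypothetical, and it assumes the input strictly alternates one character and one single digit.

```python
def expand_by_slicing(s):
    # Assumption: s alternates one character and one single-digit count.
    strings = s[::2]    # characters at even positions: 'd', 'f', 'e'
    counts = s[1::2]    # digits at odd positions: '3', '4', '2'
    return "".join(c * int(d) for c, d in zip(strings, counts))


print(expand_by_slicing("d3f4e2"))  # dddffffee
```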
But what if you want to generalize this to multi-letter strings and multi-digit counts?
Well, somehow we need to group the string into runs of digits and non-digits. If we could do that, we could use pairs of those groups in exactly the same way mshsayem's answer uses pairs of characters.
And it turns out that we can do this very easily. There's a nifty function in the standard library called groupby that lets you group anything into runs according to any function. And there's a function isdigit that distinguishes digits and non-digits.
So, this gets us the runs we want:
>>> import itertools
>>> s = 'd13fx4e2'
>>> [''.join(group) for (key, group) in itertools.groupby(s, str.isdigit)]
['d', '13', 'fx', '4', 'e', '2']
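To see which runs are digits and which are not, it can help to keep the groupby key next to each joined group. A small sketch (the variable name runs is only for illustration):

```python
import itertools

s = "d13fx4e2"

# Each key is the result of str.isdigit for every character in that run.
runs = [(is_digit, "".join(group))
        for is_digit, group in itertools.groupby(s, str.isdigit)]

print(runs)
# [(False, 'd'), (True, '13'), (False, 'fx'), (True, '4'), (False, 'e'), (True, '2')]
```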
Now we zip this up the same way that mshsayem zipped up the characters:
>>> groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
>>> ''.join(c*int(d) for (c, d) in zip(groups, groups))
'dddddddddddddfxfxfxfxee'
So:
def expand(s):
    groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
    return ''.join(c*int(d) for (c, d) in zip(groups, groups))

Naive approach (if the digits are only single, and characters are single too):
>>> def expand(s):
...     s = iter(s)
...     return "".join(c*int(d) for (c,d) in zip(s,s))
...
>>> expand("d3s5")
'dddsssss'
Poor explanation:
Terms/functions:
iter() gives you an iterator object.
zip() makes tuples from iterables.
int() parses an integer from a string.
<expression> for <variable> in <iterable> is a generator expression (the same syntax inside square brackets is a list comprehension).
<string>.join() joins an iterable of strings, using the string as the separator.
Process:
First we make an iterator over the given string.
zip() is used to make tuples of a character and its repeat count, e.g. ('d','3'), ('s','5'). (zip() pulls from its iterables to make each tuple. Note that for each tuple it pulls from the same iterable twice, and because our iterable is an iterator, it advances twice.)
Then for ... in iterates over the tuples; using two variables (c,d) unpacks each tuple into them.
But d is still a string, so int() converts it to an integer.
<string> * <integer> repeats the string that many times.
Finally, join() returns the combined result.
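The point about zip() advancing a shared iterator twice per tuple is easy to check side by side. A minimal sketch (the variable names are illustrative):

```python
s = "d3s5"

# Zipping the string with itself pairs each character with itself,
# because each argument gets its own independent iterator.
same_string = list(zip(s, s))
print(same_string)      # [('d', 'd'), ('3', '3'), ('s', 's'), ('5', '5')]

# Zipping one shared iterator with itself consumes two items per tuple.
it = iter(s)
shared_iterator = list(zip(it, it))
print(shared_iterator)  # [('d', '3'), ('s', '5')]
```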
Here is a multi-digit, multi-char version:
import re
def expand(s):
    # raw string so that \d reaches the regex engine unchanged
    s = re.findall(r'([^0-9]+)(\d+)',s)
    return "".join(c*int(d) for (c,d) in s)
By the way, using itertools.groupby is better, as shown by abarnert.
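For reference, this is roughly what re.findall hands back for that pattern: one tuple per match, with one element per capture group. The second call illustrates a caveat worth keeping in mind: text that does not fit the "non-digits then digits" shape is silently skipped rather than reported.

```python
import re

pattern = r"([^0-9]+)(\d+)"

pairs = re.findall(pattern, "d13fx4e2")
print(pairs)    # [('d', '13'), ('fx', '4'), ('e', '2')]

# A leading count and a trailing string without a count are both dropped.
skipped = re.findall(pattern, "3d2x")
print(skipped)  # [('d', '2')]
```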

Let's look at how you could do this manually, using only tools that a novice will understand. It's better to actually learn about zip and iterators and comprehensions and so on, but it may also help to see the clunky and verbose way you write the same thing.
So, let's start with just single characters and single digits:
def expand(s):
    result = ''
    repeated_char_next = True
    for char in s:
        if repeated_char_next:
            char_to_repeat = char
            repeated_char_next = False
        else:
            repeat_count = int(char)
            result += char_to_repeat * repeat_count
            repeated_char_next = True
    return result
This is a very simple state machine. There are two states: either the next character is a character to be repeated, or it's a digit that gives a repeat count. After reading the former, we don't have anything to add yet (we know the character, but not how many times to repeat it), so all we do is switch states. After reading the latter, we now know what to add (since we know both the character and the repeat count), so we do that, and also switch states. That's all there is to it.
Now, to expand it to multi-char repeat strings and multi-digit repeat counts:
def expand(s):
    result = ''
    current_repeat_string = ''
    current_repeat_count = ''
    for char in s:
        if char.isdigit():
            current_repeat_count += char
        else:
            if current_repeat_count:
                # We've just switched from a digit back to a non-digit
                count = int(current_repeat_count)
                result += current_repeat_string * count
                current_repeat_count = ''
                current_repeat_string = ''
            current_repeat_string += char
    # Flush the final string/count pair, which no later non-digit triggers
    if current_repeat_count:
        result += current_repeat_string * int(current_repeat_count)
    return result
The state here is pretty similar—we're either in the middle of reading non-digits, or in the middle of reading digits. But we don't automatically switch states after each character; we only do it when getting a digit after non-digits, or vice-versa. Plus, we have to keep track of all the characters in the current repeat string and in the current repeat count. I've collapsed the state flag into that repeat string, but there's nothing else tricky here.

There is more than one way to do this, but assuming that the sequence of characters in your input is always the same, e.g. a single character followed by a single digit, the following would work:
def expand(input):
    alphatest = False
    finalexpanded = "" #Blank string variable to hold final output
    #first part is used for iterating through range of size i
    #this solution assumes you have a numeric character coming after your
    #alphabetic character every time
    for i in input:
        if alphatest == True:
            i = int(i) #converts the string number to an integer
            for value in range(0,i): #loops through range of size i
                finalexpanded += alphatemp #adds your alphabetic character to string
            alphatest = False #Once loop is finished resets your alphatest variable to False
            i = str(i) #converts i back to string to avoid error from i.isalpha() test
        if i.isalpha(): #tests i to see if it is an alphabetic character
            alphatemp = i #sets alphatemp to i for loop above
            alphatest = True #sets alphatest True for loop above
    print(finalexpanded) #prints the final result


Return only the first instance of a value found in a for loop

I have a list of strings that are split in half like the following;
fhlist = [['BzRmmzZHzVBzgVQmZ'],['efmt']]
shlist = [['LPtqqffPqWqJmPLlL'], ['abcm']]
The first half is stored in a list fhlist whilst the second in shlist.
So the combined string of fhlist[0] and shlist[0] is BzRmmzZHzVBzgVQmZLPtqqffPqWqJmPLlL.
and fhlist[1] and shlist[1] is efmtabcm
I've written some code that iterates through each letter in the first-half and second-half strings, and if a letter appears in both halves it adds that character to another list, found:
found = []
for i in range(len(fhlist)):
    for char in fhlist[i]:
        if char in shlist[i]:
            found.append(char)
However, with the above example, found ends up containing m m m, because it records every instance of the letter: m occurs 3 times in the first half BzRmmzZHzVBzgVQmZ. I only want the code to return m once.
I previously had;
found = []
for i in range(len(fhlist)):
    for char in fhlist[i]:
        if char in shlist[i] and char not in found:
            found.append(char)
but this essentially 'blacklisted' any characters that appeared in other strings, so if another two strings both contained m such as the combined string efmtabcm it would ignore it as this character had already been found.
Thanks for any help!
Expanding my suggestion from the comments since it apparently solves the problem in the desired way:
To dedupe per pairing, you can replace:
found = []
for i in range(len(fhlist)):
    for char in fhlist[i]:
        if char in shlist[i]:
            found.append(char)
with (making some slight idiomatic improvements):
found = []
for fh, sh in zip(fhlist, shlist): # Pair them up directly, don't operate by index
    found.extend(set(fh).intersection(sh))
or as a possibly too complex listcomp:
found = [x for fh, sh in zip(fhlist, shlist) for x in set(fh).intersection(sh)]
This gets the unique overlapping items from each pairing (with set(fh).intersection(sh)) more efficiently (O(m+n) rather than O(m*n) in terms of the lengths of each string), then you add them all to found in bulk (keeping it as a list to avoid deduping across pairings).
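As a quick check of the per-pairing dedupe, here is a minimal runnable sketch. It assumes fhlist and shlist are plain lists of strings, not the nested single-element lists shown in the question.

```python
# Assumption: each half is a plain string, one entry per pairing.
fhlist = ["BzRmmzZHzVBzgVQmZ", "efmt"]
shlist = ["LPtqqffPqWqJmPLlL", "abcm"]

found = []
for fh, sh in zip(fhlist, shlist):
    # set(fh) drops repeats within one first half; intersection keeps
    # only the characters that also occur in the matching second half.
    found.extend(set(fh).intersection(sh))

print(found)  # ['m', 'm']
```

Each pairing contributes its shared letter once, so m appears once per pairing, not once overall and not once per occurrence.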
IIUC, you are trying to find common characters between each of the respective strings in fhlist and shlist
You can use set.intersection for this after using zip on the 2 lists and iterating on them together with a list comprehension, as follows -
[list(set(f[0]).intersection(s[0])) for f,s in zip(fhlist, shlist)]
[['m'], ['m']]
This works as follows -
1. BzRmmzZHzVBzgVQmZ, LPtqqffPqWqJmPLlL -> Common is `m`
2. efmt, abcm -> Common is `m`
...
till end of list
You can try this
fhlist = [['BzRmmzZHzVBzgVQmZ'],['efmt']]
shlist = [['LPtqqffPqWqJmPLlL'], ['abcm']]
found = []
for i in range(len(fhlist)):
    for char in ''.join(fhlist[i]):
        for a in ''.join(shlist[i]):
            if a==char and char not in found:
                found.append(char)
print(found)
Output:
['m']

"Sorted" function doesn't work in Python task

Here's my code:
def Descending_Order(num):
    return int(''.join(sorted(str(num).split(), reverse = True)))
print(Descending_Order(0))
print(Descending_Order(15))
print(Descending_Order(123456789))
The digits of "num" are supposed to be printed in descending order, but the code doesn't work, although I don't get any errors. Any idea why the sort has no effect?
The split is superfluous and is the cause of your problem. Called with no arguments, a string's split method splits on runs of whitespace. Your string contains no whitespace, so the result is a list with the whole number, in string form, as its only element.
>>> str('123456789').split()
['123456789']
Sorting the resulting list changes nothing, because it contains only a single element
>>> sorted(['123456789'])
['123456789']
Finally joining and converting it to an integer restores the original number
>>> int(''.join(sorted(['123456789'])))
123456789
It is worth mentioning that sorted accepts any iterable, so a string can be sorted directly, character by character, without first being split into individual digits
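The difference between sorting the string itself and sorting the result of split() can be seen directly. A small sketch with an arbitrary example number:

```python
text = "3141"

# Sorting the string iterates over its characters.
digits = sorted(text, reverse=True)
print(digits)         # ['4', '3', '1', '1']

# split() with no whitespace present yields a one-element list,
# so there is nothing to reorder.
whole = sorted(text.split(), reverse=True)
print(whole)          # ['3141']
```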
What you probably wanted is
>>> def Descending_Order(num):
...     return int(''.join(sorted(str(num), reverse = True)))
...
>>> print(Descending_Order(123456789))
987654321
You can also split the number into digits using list, then sort the list that way:
def Descending_Order(num):
    digits = list(str(num))
    return int("".join(sorted(digits, reverse = True)))
# Output
>>> Descending_Order(123456789)
987654321

Python help: generating all possible strings given optional character

I'm trying to write a function in Python that, given a string and an optional character, generates all possible strings from the given string. The big picture is using this function to eventually help with turning a CFG into Chomsky normal form.
For example, given a string 'ASA' and optional character 'A', I want to be able to generate the following array:
['SA', 'AS', 'S']
Since these are all the possible strings that can be generated by omitting one or both of the A's of the original string.
For reference, I've looked at the following question: generating all possible strings given a grammar rule, but the problem seemed to be slightly different since the rules of the grammar were defined in the original string.
Here is my thinking on how to go about solving the problem: Have a recursive function that takes a string and an optional character, loops through the string to find the first optional character, then create a new string that has the first optional character omitted, add this to a return array, and call itself again with the string it just generated and the same optional character.
Then, after all recursions return, go back to the original string and omit the second occurrence of the optional character, and repeat the process.
This would continue on until all occurrences of the optional character were omitted.
I was wondering if there was any better way of doing this than by using the type of logic I just described.
As was mentioned in the comments it could also be done with itertools. Here's a quick demonstration:
import itertools
mystr='ABCDABCDAABCD'
optional_letter='A'
indices=[i for i,char in enumerate(mystr) if char==optional_letter]
def remover(combination,mystr):
    mylist=list(mystr)
    # delete from the highest index down so earlier indices stay valid
    for index in combination[::-1]:
        del mylist[index]
    return ''.join(mylist)
all_strings=[remover(combination,mystr)
             for n in range(len(indices)+1)
             for combination in itertools.combinations(indices,n)]
for string in all_strings:
    print(string)
It first finds all indices of occurrences of your character, then removes every combination of these indices from your string. If you have two optional letters in a row in the string you will get duplicates, which can be removed by using:
set(all_strings)
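The duplicate case is easiest to see on a short input. The sketch below uses a hypothetical helper, omit_optional, that follows the same index-combination idea. Since set() discards ordering, it dedupes with dict.fromkeys instead, which keeps first-seen order on Python 3.7+; that choice is an assumption about what is wanted, not part of the answer above.

```python
import itertools


def omit_optional(s, optional):
    # Positions where the optional character occurs.
    indices = [i for i, ch in enumerate(s) if ch == optional]
    results = []
    for n in range(len(indices) + 1):
        for combo in itertools.combinations(indices, n):
            drop = set(combo)
            results.append("".join(ch for i, ch in enumerate(s) if i not in drop))
    return results


variants = omit_optional("AAB", "A")
print(variants)  # ['AAB', 'AB', 'AB', 'B']

# Dropping index 0 or index 1 gives the same string, hence the repeat.
unique = list(dict.fromkeys(variants))
print(unique)    # ['AAB', 'AB', 'B']
```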
This is based on a combinations function that returns a list of all possible combinations (without regard to order) of the elements of a list. Pass it a list of the indexes of the occurrences of your character, and the rest is straightforward:
def indexes(string, char):
    return [i for i in range(len(string)) if string[i] == char]
def combinations(chars, max_length=None):
    if max_length is None:
        max_length = len(chars)
    if len(chars) == 0:
        return [[]]
    nck = []
    for sub_list in combinations(chars[1:], max_length):
        nck.append(sub_list)
        if len(sub_list) < max_length:
            nck.append(chars[:1] + sub_list)
    return nck
def substringsOmitting(string, char):
    subbies = []
    for combo in combinations(indexes(string, char)):
        keepChars = [string[i] for i in range(len(string)) if i not in combo]
        subbies.append(''.join(keepChars))
    return subbies
if __name__ == '__main__':
    print(substringsOmitting('ASA', 'A'))
output: ['ASA', 'SA', 'AS', 'S']
It does contain the string itself, too. But this should be a good starting point.

Optionally replacing a substring python

My list of replacements is in the following format.
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II')),.....]
What I want to achieve is to optionally replace each occurrence of a letter with each of its possible replacements. The input word should also be a member of the output list.
e.g.
input - DArA
Expected output -
['DArA','DaarA','Daaraa','DAraa','DaArA','DAraA','DaAraA','DAarA','DAarAa', 'DArAa','DAArA','DAArAA','DArAA']
My try was
lstrep = [('A',('aa','aA','Aa','AA'))]
def alte(word,lstrep):
    output = [word]
    for (a,b) in lstrep:
        for bb in b:
            output.append(word.replace(a,bb))
    return output
print(alte('DArA',lstrep))
The output I received was ['DArA', 'Daaraa', 'DaAraA', 'DAarAa', 'DAArAA'] i.e. All occurrences of 'A' were replaced by 'aa','aA','Aa' and 'AA' respectively. What I want is that it should give all permutations of optional replacements.
itertools.product will give all of the permutations. You can build up a list of substitutions and then let it handle the permutations.
import itertools
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II'))]
input_str = 'DArA'
# make substitution list a dict for easy lookup
lstrep_map = dict(lstrep)
# a substitution is an index plus a string to substitute. build
# list of subs [[(index1, sub1), (index1, sub2)], ...] for all
# characters in lstrep_map.
subs = []
for i, c in enumerate(input_str):
    if c in lstrep_map:
        subs.append([(i, sub) for sub in lstrep_map[c]])
# build output by applying each sub recorded
out = [input_str]
for sub in itertools.product(*subs):
    # make input a list for easy substitution
    input_list = list(input_str)
    for i, cc in sub:
        input_list[i] = cc
    out.append(''.join(input_list))
print(out)
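Note that the loop above substitutes every replaceable character in each generated string. If each occurrence should also be allowed to stay unchanged, as the expected output in the question suggests, one possible variant is to make the original character one of the options at its position. This is only a sketch under that assumption; optional_replacements is a hypothetical name.

```python
import itertools


def optional_replacements(word, replacements):
    # For each position, the options are the original character
    # followed by any replacements registered for it.
    options = [(ch,) + tuple(replacements.get(ch, ())) for ch in word]
    return ["".join(combo) for combo in itertools.product(*options)]


result = optional_replacements("DArA", {"A": ("aa", "aA", "Aa", "AA")})
print(len(result))  # 25: five choices for each of the two A's
print(result[0])    # DArA
```

Because the unchanged character is listed first at every position, the input word itself is the first item produced.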
Try constructing tuples of all possible permutations based on the replaceable characters that occur. This will have to be achieved using recursion.
The reason recursion is necessary is that you would need a variable number of loops to achieve this.
For your example "DArA" (2 replaceable characters, "A" and "A"):
replaceSet = set()
replacements = {'A':('aa','aA','Aa','AA'),'I':('ii','iI','Ii','II')}  # .....
for replacement1 in replacements["A"]:
    for replacement2 in replacements["A"]:
        replaceSet.add((replacement1, replacement2))
You see you need two loops for two replaceables, and n loops for n replaceables.
Think of a way you could use recursion to solve this problem. It will likely involve creating all permutations for a substring that contains n-1 replaceables (if you had n in your original string).
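One way the recursive idea could be sketched: handle the first character, recurse on the rest of the word, and combine the two. The function name expand_recursive is hypothetical, and treating "leave the character unchanged" as one of the choices is an assumption based on the question's expected output.

```python
def expand_recursive(word, replacements):
    # Base case: an empty word has exactly one expansion, the empty string.
    if not word:
        return [""]
    head, tail = word[0], word[1:]
    # Every expansion of the remaining characters (one level of recursion
    # stands in for one of the nested loops).
    tails = expand_recursive(tail, replacements)
    # Choices for the first character: itself, then its replacements.
    heads = (head,) + tuple(replacements.get(head, ()))
    return [h + t for h in heads for t in tails]


expanded = expand_recursive("DArA", {"A": ("aa", "aA", "Aa", "AA")})
print(len(expanded))  # 25
```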

Python: How to produce only in sequence combinations from a list of string parts, with use being optional

I would like to know how I can produce only in sequence combinations from a list of string parts, with use being optional. I need to do this in Python.
For example:
Charol(l)ais (cattle) is my complete string, with the parts in brackets being optional.
From this I would like to produce the following output as an iterable:
Charolais
Charollais
Charolais cattle
Charollais cattle
Was looking at Python's itertools module, since it has combinations; but couldn't figure out how to use this for my scenario.
You will need to convert the string into a more sensible format. For example, a tuple of all of the options for each part:
words = [("Charol",), ("l", ""), ("ais ",), ("cattle", "")]
And you can easily put them back together:
for p in itertools.product(*words):
    print("".join(p))
To create the list, parse the string, e.g.:
base = "Charol(l)ais (cattle)"
words = []
start = 0
for i, c in enumerate(base):
    if c == "(":
        words.append((base[start:i],))
        start = i + 1
    elif c == ")":
        words.append((base[start:i], ""))
        start = i + 1
if start < len(base):
    words.append((base[start:],))
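Putting the parsing and the product step together might look like the sketch below. The function name optional_variants is hypothetical, and stripping the trailing space left behind by a skipped part is an added assumption, not something the answer above specifies.

```python
import itertools


def optional_variants(base):
    # Split base into required parts and optional (parenthesised) parts.
    parts = []
    start = 0
    for i, c in enumerate(base):
        if c == "(":
            parts.append((base[start:i],))      # required text before "("
            start = i + 1
        elif c == ")":
            parts.append((base[start:i], ""))   # optional text: use it or skip it
            start = i + 1
    if start < len(base):
        parts.append((base[start:],))           # required text after the last ")"
    # Assumption: whitespace left over from a skipped part is unwanted.
    return ["".join(p).strip() for p in itertools.product(*parts)]


variants = optional_variants("Charol(l)ais (cattle)")
for v in variants:
    print(v)
```

This handles flat, non-nested parentheses only, which matches the example in the question.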
You could use the permutations from itertools and denote your optional strings with a special character. Then, you can replace those either with the correct character or an empty string. Or carry on from this idea depending on the exact semantics of your task at hand.
