Generate all substrings of a given string - python

I want to generate all the possible substrings from a given string without redundant values as follows:
input: 'abba'
output: 'a','b','ab','ba','abb','bba'
Here is my code
s = 'abba'
for i in range(0, len(s)):
    for j in range(i+1, len(s)):
        print(s[i:j])
My output is 'a','ab','abb','b','bb','b'
As you can see from the output 'b' is repeated, and 'bba' does not exist.
I want to know and learn the right logic to produce all unique substrings.

Fixing the indexing a bit
s = 'abba'
for i in range(0, len(s)):
    for j in range(i, len(s)):
        print(s[i:(j+1)])
yields the following output
a
ab
abb
abba
b
bb
bba
b
ba
a
Basically, the indexing fix accounts for the fact that the end index of a slice is exclusive:
'abba'[3:3] produces just the zero-length string ''
but
'abba'[3:4] produces the string 'a', which has length one.
You can remove duplicates by collecting the substrings in a set(), as follows:
s = 'abba'
ss = set()
for i in range(0, len(s)):
    for j in range(i, len(s)):
        ss.add(s[i:(j+1)])
print(sorted(ss))
Then you will have the following result ['a', 'ab', 'abb', 'abba', 'b', 'ba', 'bb', 'bba'].
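The same idea can be written more compactly with a set comprehension. This is only a sketch of the approach above, not a different algorithm; the function name unique_substrings is made up for illustration.

```python
def unique_substrings(s):
    # Every slice s[i:j] with i < j <= len(s) is a non-empty substring;
    # building a set drops the duplicates automatically.
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

subs = unique_substrings('abba')
print(sorted(subs))  # ['a', 'ab', 'abb', 'abba', 'b', 'ba', 'bb', 'bba']
```

Note that this still builds every slice, so the work grows roughly quadratically with the number of (i, j) pairs.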

Related

how to concatenate two string from 2 list in one list without getting double quotation doubled?

I have two lists and I want to concatenate their strings without getting additional quotes. These are the lists I want to combine:
a=["a"]
b=["b"]
How can I pass it to a matrix like this?
matrix["ab"]
Or how can I get the result below? I tried append and concatenation, but it doesn't work.
c=["ab"]
Use
d = [a[0] + b[0]]
or use the code below, which also combines strings from longer lists such as ['a', 'c'] and ['b', 'e']:
d = [i+j for i, j in zip(a,b)]
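As a quick check of the zip() version, here is a small sketch using the longer lists mentioned above (the values are just the illustrative ones from the answer):

```python
a = ['a', 'c']
b = ['b', 'e']

# zip() pairs the elements position by position: ('a', 'b'), ('c', 'e').
d = [i + j for i, j in zip(a, b)]
print(d)  # ['ab', 'ce']
```

Keep in mind that zip() stops at the end of the shorter list, so extra elements in the longer one are ignored.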

Generating all possible combinations of characters in a string

Say I have a string list:
li = ['a', 'b', 'c']
I would like to construct a new list such that each entry of the new list is a concatenation of a selection of 3 entries in the original list. Note that each entry can be chosen repeatedly:
new_li=['abc', 'acb', 'bac', 'bca', 'cab', 'cba', 'aab', 'aac',....'aaa', 'bbb', 'ccc']
The brute-force way is to construct a 3-fold nested for loop and insert each 3-combination into the new list. I was wondering if there is any Pythonic way to deal with that? Thanks.
Update:
Later I will convert the new list into a set, so the order does not matter anyway.
This looks like a job for itertools.product.
import itertools

def foo(l):
    yield from itertools.product(*([l] * 3))

for x in foo('abc'):
    print(''.join(x))
aaa
aab
aac
aba
abb
abc
aca
acb
acc
baa
bab
bac
bba
bbb
bbc
bca
bcb
bcc
caa
cab
cac
cba
cbb
cbc
cca
ccb
ccc
yield from is available from Python 3.3 onwards. For older versions, yield within a loop:
def foo(l):
    for i in itertools.product(*([l] * 3)):
        yield i
The best way to get all combinations (also called cartesian product) of a list is to use itertools.product using the len of your iterable as repeat argument (that's where it differs from the other answer):
from itertools import product
li = ['a', 'b', 'c']
for comb in product(li, repeat=len(li)):
    print(''.join(comb))
or if you want the result as list:
>>> combs = [''.join(comb) for comb in product(li, repeat=len(li))]
>>> combs
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa',
'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab',
'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']
It's a bit cleaner to use the repeat argument than to multiply and unpack the list you have manually.
An alternate approach using list comprehension:
li = ['a', 'b', 'c']
new_li = [a+b+c for a in li for b in li for c in li]
import itertools

# length of the strings to generate
repeat = int(input("Enter length: "))

def password():
    def foo(l):
        yield from itertools.product(*([l] * repeat))
    for x in foo('abcdefghijklmnopqrstuvwxyz'):
        # you could also use string.ascii_lowercase or ["a","b","c"]
        print(''.join(x))

password()
I'll show you a way to do this without any libraries so that you can understand the logic behind how to achieve it.
First, we need to understand how to achieve all combinations mathematically.
Let's take a look at the pattern of every possible combination of characters ranging from a-b with a length of '1'.
a
b
Not much to see but from what we can see, there is one set of each character in the list. Let's increase our string length to '2' and see what pattern emerges.
aa
ab
ba
bb
Looking at this pattern, we see that a new column has been added. The far right column is the same as in the first example, with only 1 of each character per set, but the set is repeated this time. The column on the far left has 2 of each character. Could it be that for every new column added, one more of each character is added? Let's find out by increasing the string length to '3'.
aaa
aab
aba
abb
baa
bab
bba
bbb
We can see that the two columns on the right have stayed the same and the new column on the left has 4 of each character! Not what we were expecting. So the number of characters doesn't increase by 1 for each column. Instead, if you notice the pattern, it is actually increasing by powers of 2.
The first column with only '1' set of characters : 2 ^ 0 = 1
The second column with '2' sets of characters : 2 ^ 1 = 2
The third column with '4' sets of characters : 2 ^ 2 = 4
So the answer here is that, with each new column added, the number of each character in the column is determined by a power based on its position, with the first column on the right being x ^ 0, then x ^ 1, then x ^ 2... and so on.
But what is x? In the example I gave, x = 2. But is it always 2? Let's take a look.
I will now give an example of each possible combination of characters from the range a-c with a length of 2:
aa
ab
ac
ba
bb
bc
ca
cb
cc
If we count the characters in the first column on the right, there is still only one of each character per set every time it loops. This is because the first column on the right always corresponds to x ^ 0, and anything to the power of 0 is 1. But if we look at the second column, we see 3 of each character per loop. So if x ^ 1 is for the second column, then x = 3. Going from the first example with a range of a-b (2 characters) to the second example with a range of a-c (3 characters), it seems that x is always the number of characters used in your combinations.
With this first pattern recognised, we can start building a function that works out what each column should contain. If we want to build every combination of characters from the range a-b with a string length of 3, then we need a function that understands that the number of each character per set in each column will be as follows: [4, 2, 1].
Now create a function that finds how many of each character should be in each column by returning a list of numbers that represent the number of each character per set in a column, based on its position. We do this using powers.
Remember, if we use a range of characters from a-b (2), then each column should have x ^ y of each character per set, where x is the number of characters being used and y is the column position, with the first column on the right being column number 0.
Example:
A combination of characters ranging from ['a', 'b'] with a string length of 3 will have a total of 4 a's and b's in the far left column for each set, a total of 2 a's and b's in the next for each set and a total of 1 a's and b's in the last for each set.
To return a list with this total number of characters respective to their columns as so [4, 2, 1] we can do this
def getCharPower(stringLength, charRange):
    charpowers = []
    for x in range(0, stringLength):
        charpowers.append(len(charRange)**(stringLength - x - 1))
    return charpowers
With the above function - if we want to create every possible combination of characters that range from a-b (2) and have a string length of 4, like so
aaaa
aaab
aaba
aabb
abaa
abab
abba
abbb
baaa
baab
baba
babb
bbaa
bbab
bbba
bbbb
which have a total set of (8) a's and b's, (4) a's and b's, (2) a's and b's, and (1) a's and b's, then we want to return a list of [8, 4, 2, 1]. The stringLength is 4 and our charRange is ['a', 'b'] and the result from our function is [8, 4, 2, 1].
So now all we have to do is print out each character the number of times given by its column's value in our returned list.
In order to do this though, we need to find out how many times each set is printed in its column. Take a look at the first column on the right of the previous combination example. Although a and b are only printed once per set, the set loops and prints the same thing 7 more times (8 total). If the string were only 3 characters in length, it would loop a total of 4 times.
The reason is that the length of our strings determines how many combinations there are in total. The formula for working this out is x ^ y = a, where x is the number of characters in our range, y is the length of the string, and a is the total number of combinations possible within those specifications.
So to finalise this problem, our solution is to figure out
How many characters in each set go into each column
How many times to repeat each set in each column
The first point has already been solved with our previously created function.
The second point can be solved by finding out how many combinations there are in total by calculating len(charRange) ^ stringLength. Then, running through a loop, we keep adding sets of characters until a (the total number of possible combinations) has been reached in that column. Run that for each column and you have your result.
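One way to see why the x ^ y pattern works is that it is the same as counting in base x: the character in column y of the i-th string is charRange[(i // x ** y) % x]. The sketch below is an illustration of that reading, not the function this answer builds; the name all_strings and its parameter names are made up for the example.

```python
def all_strings(string_length, char_range):
    base = len(char_range)  # x in the formula x ^ y = a
    results = []
    # There are base ** string_length combinations in total.
    for i in range(base ** string_length):
        chars = []
        # Walk the columns from the far left (highest power) to the far right.
        for column in range(string_length - 1, -1, -1):
            # Each character repeats base ** column times before changing.
            chars.append(char_range[(i // base ** column) % base])
        results.append("".join(chars))
    return results

print(all_strings(2, ['a', 'b']))  # ['aa', 'ab', 'ba', 'bb']
```

For a range of a-b and a length of 3 this yields 2 ^ 3 = 8 strings, matching the table shown earlier.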
Here is the function that solves this
def Generator(stringLength, charRange):
    workbench = []
    results = []
    charpowers = getCharPower(stringLength, charRange)
    for x in range(0, stringLength):
        # fill one column until it holds every combination's character
        while len(workbench) < len(charRange)**stringLength:
            for char in charRange:
                for z in range(0, charpowers[x]):
                    workbench.append(char)
        results.append(workbench)
        workbench = []
    # transpose the columns into rows and join each row into a string
    results = ["".join(result) for result in list(zip(*results))]
    return results
That function will return every possible combination of characters and of string length that you provide.
A much simpler way of approaching this problem would be to nest one for loop per character position.
So, to create every possible combination of characters ranging from a-b with a length of 2:
characters = ['a', 'b']
for charone in characters:
    for chartwo in characters:
        print(charone+chartwo)
Although this is a lot simpler, it is limited. This code only prints every combination with a length of 2. To create longer strings, we would have to manually add another for loop each time we wanted to change the length. The functions I provided before this code, however, will produce the combinations for whatever string length you give them, which makes them adaptable and a good way to solve this problem yourself without any libraries.

How to remove items from a list without the [''] part

I am trying to get the word "Test" by taking each character out of the list using positions within it.
Here is my code:
test1 = ["T", "E", "S", "T"]
one = test1[0:1]
two = test1[1:2]
three = test1[2:3]
four = test1[3:4]
print(one, two, three, four)
At the moment my output from the program is:
['T'] ['E'] ['S'] ['T']
Although that does read "Test" it has [] around each letter which I don't want.
[a:b] returns a list with every value from index a up to, but not including, index b.
If you just want to access a single value from a list, you just need to use the index of that value. E.g.
s = ['T', 'e', 's', 't']
print(s[0]) # T
print(s[0:1]) # ['T']
The problem is that you are using slices of the list, not elements. The syntax l[i1:i2] returns a list with all elements of l from index i1 up to, but not including, index i2. Out-of-range slice bounds are silently clipped, whereas an out-of-range index raises an IndexError. To do what you intended you can do:
one = test1[0]
two = test1[1]
...
You have slicing and indexing confused. You are using slicing where you should use indexing.
Slicing always returns a new object of the same type, with the given selection elements. Slicing a list always gives you a list again:
>>> test1 = ["T","E","S","T"]
>>> test1[1:3]
['E', 'S']
>>> test1[:1]
['T']
while indexing uses individual positions only (no : colons to separate start and end positions), and gives you the individual elements from the list:
>>> test1[0]
'T'
>>> test1[1]
'E'
Not that you need to use indexing at all. Use the str.join() method instead; given a separator string, this joins the string elements of a list together with that delimiter in between. Use the empty string:
>>> ''.join(test1)
'TEST'
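To put the three operations side by side, here is a minimal sketch using the list from the question. The assertions only restate the behaviour described above.

```python
test1 = ["T", "E", "S", "T"]

# Slicing returns a list, even when it covers a single position.
assert test1[0:1] == ["T"]
# Slice bounds past the end do not raise; the result is just empty.
assert test1[10:12] == []
# Indexing returns the element itself.
assert test1[0] == "T"

# Joining with the empty string glues the elements together.
word = "".join(test1)
print(word)  # TEST
```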
Try this:
test1 = ["T","E","S","T"]
final = ""
for i in range(0, len(test1)):
    final = final + str(test1[i])
print(final)  # TEST

Check if a subset of characters in a list of strings is contained in another list of strings

So I have two list of strings. Those strings are formed by a sorted combination of one or more different characters. The characters are not all in the alphabet but are given.
Let's say, all the possible characters are [A, B, C, D, E], then the two lists have a combination of those elements (from 1 up to 5 in this case).
Example:
list1 = [AB, AB, C]
list2 = [ABC, CD, ABCDE, E]
The number of elements in each list is not defined, but can range from 1 to 30, with the general case being around 10.
Now, what I want is to tell if there is at least one combination of unique characters per string in list1 that also exists in list2, regardless order. In the example, [A, A, C] is contained in list2 with [A, C, A, E].
The naive way I found to do this is doing all the possible 1 character combinations from each list and see if exists at least one case where list1 is contained in list2. But this can grow exponentially as all possible combinations of a 10 element list of 5-characters strings can be huge (and that's only the general case).
I have thought of using regular expressions or something like that, but I am really not picturing a more efficient solution.
I am using Python for this. Just in case is relevant because of an existing solution or library.
Thank you for your help!
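This may be a prime candidate for set operations. Let's take your example (notice that we needed to add quotes to make them strings). Sets are unordered, so the order of the elements in the outputs below may differ on your machine.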
list1 = ["AB", "AB", "C"]
list2 = ["ABC", "CD", "ABCDE", "E"]
If we want a set with unique elements from both list1 and list2
print(set(list1) | set(list2))
#OUTPUT: {'C', 'AB', 'ABCDE', 'CD', 'ABC', 'E'}
If we want to check which elements are common to both list1 and list2 (if we added "C" to list2, the output would be {'C'}; as it stands, no elements are shared, which results in an empty set())
print(set(list1) & set(list2))
#OUTPUT: set()
If we want the elements that are in list1 but not in list2
print(set(list1) - set(list2))
#OUTPUT: {'C', 'AB'}
If we want a set with the elements that are in exactly one of list1 or list2 (the symmetric difference)
print(set(list1) ^ set(list2))
#OUTPUT: {'E', 'CD', 'AB', 'ABC', 'C', 'ABCDE'}
For more information you can check out https://docs.python.org/2/library/sets.html
I hope this helped!
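The set operations above compare whole strings. If the question is instead read as "pick one character from each string in list1 so that each pick can be paired with a distinct string in list2 containing that character", one possible approach is bipartite matching. The following is only a sketch under that reading; the function name can_match and the augmenting-path search are assumptions for illustration, not something taken from the question or the answer above.

```python
def can_match(list1, list2):
    sets1 = [set(s) for s in list1]
    sets2 = [set(s) for s in list2]
    # match[j] is the index of the list1 string currently paired with list2[j].
    match = [None] * len(list2)

    def try_assign(i, seen):
        # Try to pair list1[i] with some list2[j] that shares a character,
        # re-pairing earlier assignments if necessary (augmenting path).
        for j, chars in enumerate(sets2):
            if j not in seen and sets1[i] & chars:
                seen.add(j)
                if match[j] is None or try_assign(match[j], seen):
                    match[j] = i
                    return True
        return False

    # Every string in list1 needs its own partner in list2.
    return all(try_assign(i, set()) for i in range(len(list1)))

print(can_match(["AB", "AB", "C"], ["ABC", "CD", "ABCDE", "E"]))  # True
```

With at most 30 strings per list, this avoids enumerating every one-character combination, since each string is only paired or re-paired a bounded number of times.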

Permutations using a multidict

I'm trying to put together a code that replaces unique characters in a given input string with corresponding values in a dictionary in a combinatorial manner while preserving the position of 'non' unique characters.
For example, I have the following dictionary:
d = {'R':['A','G'], 'Y':['C','T']}
How would I go about replacing all instances of 'R' and 'Y' while producing all possible combinations of the string but maintaining the positions of 'A' and 'C'?
For instance, the input 'ARCY' would generate the following output:
'AACC'
'AGCC'
'AACT'
'AGCT'
Hopefully that makes sense. If anyone can point me in the right directions, that would be great!
Given the dictionary, we can state a rule that tells us what letters are possible at a given position in the output. If the original letter from the input is in the dictionary, we use the value; otherwise, there is a single possibility - the original letter itself. We can express that very neatly:
def candidates(letter):
    d = {'R':['A','G'], 'Y':['C','T']}
    return d.get(letter, [letter])
Knowing the candidates for each letter (which we can get by mapping our candidates function onto the letters in the pattern), we can create the Cartesian product of candidates, and collapse each result (which is a tuple of single-letter strings) into a single string by simply ''.joining them.
import itertools

def substitute(pattern):
    return [
        ''.join(result)
        for result in itertools.product(*map(candidates, pattern))
    ]
Let's test it:
>>> substitute('ARCY')
['AACC', 'AACT', 'AGCC', 'AGCT']
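If the dictionary should not be hard-coded inside the helper, the same idea can take it as a parameter and yield results lazily. This is a sketch of a variation, not the answer's own code; the name substitute_lazy and the mapping parameter are made up for illustration.

```python
import itertools

def substitute_lazy(pattern, mapping):
    # For each letter, use its replacements if it is a key of the mapping,
    # otherwise the letter itself is the only candidate.
    pools = [mapping.get(letter, [letter]) for letter in pattern]
    # The Cartesian product of the pools gives one tuple per output string.
    for combo in itertools.product(*pools):
        yield ''.join(combo)

d = {'R': ['A', 'G'], 'Y': ['C', 'T']}
results = list(substitute_lazy('ARCY', d))
print(results)  # ['AACC', 'AACT', 'AGCC', 'AGCT']
```

A pattern with no replaceable letters simply yields itself once, because every pool then has a single candidate.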
The following generator function produces all of your desired strings, using enumerate, zip, itertools.product, a list comprehension and argument list unpacking all of which are very handy Python tools/concepts you should read up on:
from itertools import product

def multi_replace(s, d):
    indexes, replacements = zip(*[(i, d[c]) for i, c in enumerate(s) if c in d])
    # indexes: (1, 3)
    # replacements: (['A', 'G'], ['C', 'T'])
    l = list(s)  # turn s into sth. mutable
    # iterate over cartesian product of all replacement tuples ...
    for p in product(*replacements):
        for index, replacement in zip(indexes, p):
            l[index] = replacement
        yield ''.join(l)

d = {'R': ['A', 'G'], 'Y': ['C', 'T']}
s = 'ARCY'
for perm in multi_replace(s, d):
    print(perm)
AACC
AACT
AGCC
AGCT
s = 'RRY'
AAC
AAT
AGC
AGT
GAC
GAT
GGC
GGT
Convert ARCY into a list of the possible choices at each position and use the code below (the variable is named choices here to avoid shadowing the built-in list):
import itertools as it
choices = [['A'], ['A','G'],['C'],['C','T']]
[''.join(item) for item in it.product(*choices)]
or
import itertools as it
choices = ['A', 'AG','C', 'CT']
[''.join(item) for item in it.product(*choices)]
