Generating specific bit sequences in Python

I have a list of numbers that represent the lengths of consecutive runs of 1s, and an integer that represents the length of the entire sequence. For example, given the list [1, 2, 3] and length 8, the only possible bit sequence is 10110111. But if the length is 9, I should get
010110111, 100110111, 101100111 and 101101110. I wonder if there is a simple, Pythonic way of doing this. My current method is:
from itertools import permutations
def genSeq(seq, length):
    strSeq = [str(i) for i in seq]
    strRep = strSeq + ["x" for i in xrange(length-sum(seq))]
    perm = list(set(permutations(strRep)))
    legalSeq = [seq for seq in perm if isLegal(seq) and [char for char in seq if char.isdigit()] == strSeq]
    return [''.join(["1"*int(i) if i.isdigit() else "0" for i in seq]) for seq in legalSeq]
def isLegal(seq):
    # no two runs of ones may be adjacent
    for i in xrange(len(seq)-1):
        if seq[i].isdigit() and seq[i+1].isdigit(): return False
    return True
print genSeq([1, 2, 3], 9)

My approach is as follows:
Figure out how many zeros will appear in the result sequences: the total length minus the sum of the run lengths. We'll call this value n, and call the length of the input list k.
Now, imagine that we write out the n zeros for a given result sequence in a row. To create a valid sequence, we need to choose k places where we'll insert the appropriate number of ones, and we have n + 1 choices of where to do so - in between any two digits, or at the beginning or end. So we can use itertools.combinations to give us all the possible position-groups, asking it to choose k values from 0 up to n, inclusive.
Given a combination of ones-positions, in ascending order, we can figure out how many zeros appear before the first group of ones (it's the first value from the combination), before the second group (it's the difference between the first two values from the combination), and so on. As we iterate, we need to alternate between groups of zeros (the first and/or last of these might be empty) and groups of ones. In general there is therefore one more group of zeros than of ones, but we can clean that up by adding a "dummy" zero-length group of ones to the end. We also need to get "overlapping" differences from the ones-positions. Fortunately there's a trick for this: zip the positions, with 0 prepended, against the same positions with n appended.
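Here is a small sketch that makes the counting argument concrete, using the question's example (runs [1, 2, 3], length 9; math.comb needs Python 3.8+). It lists the possible gap choices, checks that there are C(n + 1, k) of them, and shows the zip trick for one position tuple:

```python
import itertools
from math import comb

# Example values from the question: runs of ones [1, 2, 3], total length 9.
one_group_sizes = [1, 2, 3]
length = 9
n = length - sum(one_group_sizes)  # number of zeros: 3
k = len(one_group_sizes)           # number of runs of ones: 3

# Choose k distinct gaps out of the n + 1 gaps around the zeros.
positions = list(itertools.combinations(range(n + 1), k))
print(positions)                         # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(len(positions) == comb(n + 1, k))  # True

# The "overlapping differences" trick: pair each position with the next one,
# padding with 0 on the left and n on the right, to get the zero-group sizes.
one_positions = (0, 1, 3)
zero_group_sizes = [end - begin
                    for begin, end in zip((0,) + one_positions, one_positions + (n,))]
print(zero_group_sizes)                  # [0, 1, 2, 0]
```

The four gap choices correspond to the four expected answers for length 9.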
This part is tricky when put all together, so I'll write a helper function for it first:
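def make_sequence(one_positions, zero_count, one_group_sizes):
    # sizes of the zero groups: gaps between consecutive positions, padded at both ends
    zero_group_sizes = [
        end - begin
        for begin, end in zip((0,) + one_positions, one_positions + (zero_count,))
    ]
    # alternate zeros and ones; the trailing [0] is the "dummy" group of ones
    return ''.join(
        '0' * z + '1' * o
        for z, o in zip(zero_group_sizes, one_group_sizes + [0])
    )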
Now we can get back to the iteration over all possible sequences:
import itertools
def generate_sequences(one_group_sizes, length):
    zero_count = length - sum(one_group_sizes)
    # choose which of the zero_count + 1 gaps receive each run of ones
    for one_positions in itertools.combinations(
        range(zero_count + 1), len(one_group_sizes)
    ):
        yield make_sequence(one_positions, zero_count, one_group_sizes)
Which we could of course then fold back into a single function, but maybe you'll agree with me that it's better to keep something this clever in more manageable pieces :)
This is more fragile than I'd like - the one_positions and one_group_sizes sequences need to be "padded" to make things work, which in turn requires assuming their type. one_positions will be a tuple because that's what itertools.combinations produces, but I've hard-coded the assumption that the user-supplied one_group_sizes will be a list. There are ways around that but I'm a little exhausted at the moment :)
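One possible way around the type assumption is to convert both inputs to concrete types up front. The following is a minimal sketch of that idea, not the answer's original code. It accepts any sequence (list, tuple, ...) for one_group_sizes:

```python
import itertools

def make_sequence(one_positions, zero_count, one_group_sizes):
    # Normalize types so the tuple/list padding below always works.
    positions = tuple(one_positions)
    sizes = list(one_group_sizes)
    zero_group_sizes = [
        end - begin
        for begin, end in zip((0,) + positions, positions + (zero_count,))
    ]
    return ''.join('0' * z + '1' * o for z, o in zip(zero_group_sizes, sizes + [0]))

def generate_sequences(one_group_sizes, length):
    sizes = list(one_group_sizes)  # also accepts tuples or other iterables
    zero_count = length - sum(sizes)
    for one_positions in itertools.combinations(range(zero_count + 1), len(sizes)):
        yield make_sequence(one_positions, zero_count, sizes)

print(list(generate_sequences((1, 2, 3), 9)))
```

With a tuple argument this yields the same four strings as the list-based version.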
Anyway, the test:
>>> list(generate_sequences([1,2,3], 9))
['101101110', '101100111', '100110111', '010110111']
>>> list(generate_sequences([1,1,1], 9))
['101010000', '101001000', '101000100', '101000010', '101000001', '100101000',
 '100100100', '100100010', '100100001', '100010100', '100010010', '100010001',
 '100001010', '100001001', '100000101', '010101000', '010100100', '010100010',
 '010100001', '010010100', '010010010', '010010001', '010001010', '010001001',
 '010000101', '001010100', '001010010', '001010001', '001001010', '001001001',
 '001000101', '000101010', '000101001', '000100101', '000010101']

Related

How can I count sequences that meet these constraints?

I am trying to count permutations of a sequence of I and O symbols, representing e.g. people entering (I for "in") and leaving (O for "out") a room. For a given number n of I symbols, there should be exactly as many O symbols, giving a total length of 2*n for the sequence. Also, at any point in a valid permutation, the number of O symbols must be less than or equal to the number of I symbols (since it is not possible for someone to leave the room when it is empty).
Additionally, I have some initial prefix of I and O symbols, representing people who previously entered or left the room. The output should only count sequences starting with that prefix.
For example, for n=1 and an initial state of '', the result should be 1 since the only valid sequence is IO; for n=3 and an initial state of II, the possible permutations are
IIIOOO
IIOIOO
IIOOIO
for a result of 3. (There are five ways for three people to enter and leave the room, but the other two involve the first person leaving immediately.)
I'm guessing the simplest way to solve this is using itertools.permutations. This is my code so far:
import sys
from itertools import permutations
n=int(input()) ##actual length will be 2*n
string=input()
I_COUNT=string.count("I")
O_COUNT=string.count("O")
if string[0]!="I":
    sys.exit()
if O_COUNT>I_COUNT:
    sys.exit()
perms = [''.join(p) for p in permutations(string)]
print(perms)
The goal is to generate the permutations of whatever symbols remain after the given prefix, append each one to the user's input, and count the valid results. How can I append the remaining symbols to the user's input and get that count?
from functools import cache
@cache
def count_permutations(ins: int, outs: int):
    # ins and outs are the remaining number of ins and outs to process
    assert outs >= ins
    if ins == 0:
        # Can do nothing but output "outs"
        return 1
    elif outs == ins:
        # Your next output needs to be an I else you become unbalanced
        return count_permutations(ins - 1, outs)
    else:
        # Your next output can either be an I or an O
        return count_permutations(ins - 1, outs) + count_permutations(ins, outs - 1)
If, say, you have a total of 5 Is and 5 Os, and you've already output one I, then you want count_permutations(4, 5).
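The translation from (n, prefix) to remaining counts can be automated. Below is a sketch with a hypothetical helper, count_from_prefix, that is not part of the answer. It assumes the prefix contains only 'I' and 'O' and rejects prefixes that already empty the room below zero:

```python
from functools import cache

@cache
def count_permutations(ins: int, outs: int) -> int:
    # Same recurrence as above: ins/outs are the symbols still to be emitted.
    assert outs >= ins
    if ins == 0:
        return 1
    elif outs == ins:
        return count_permutations(ins - 1, outs)
    else:
        return count_permutations(ins - 1, outs) + count_permutations(ins, outs - 1)

def count_from_prefix(n: int, prefix: str) -> int:
    # Hypothetical helper: validate the prefix, then count the completions.
    balance = 0
    for symbol in prefix:
        balance += 1 if symbol == 'I' else -1
        if balance < 0:
            return 0  # someone left an empty room, so no valid completion
    ins_used = prefix.count('I')
    if ins_used > n:
        return 0
    return count_permutations(n - ins_used, n - prefix.count('O'))

print(count_from_prefix(3, 'II'))  # 3
```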
I'm guessing the simplest way to solve this is using itertools.permutations
Sadly, this will not be very helpful. The problem is that itertools.permutations does not care about the value of the elements it's permuting; it treats them as all distinct regardless. So if you have 6 input elements, and ask for length-6 permutations, you will get 720 results, even if all the inputs are the same.
itertools.combinations has the opposite issue; it doesn't distinguish any elements. When it selects some elements, it only puts those elements in the order they initially appeared. So if you have 6 input elements and ask for length-6 combinations, you will get 1 result - the original sequence.
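A quick illustration of both points, using three I and three O symbols:

```python
from itertools import combinations, permutations

symbols = 'IIIOOO'
all_perms = list(permutations(symbols))
print(len(all_perms))                  # 720: every position treated as distinct
print(len(set(all_perms)))             # 20: distinct arrangements after deduplication
print(list(combinations(symbols, 6)))  # [('I', 'I', 'I', 'O', 'O', 'O')]
```

Deduplicating with set() does recover the 20 distinct arrangements, but only after generating all 720.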
Presumably what you wanted to do is generate all the distinct ways of arranging the Is and Os, then take out the invalid ones, then count what remains. This is possible, and the itertools library can help with the first step, but it is not straightforward.
It will be simpler to use a recursive algorithm directly. The general approach is as follows:
At any given time, we care about how many people are in the room and how many people must still enter. To handle the prefix, we simply count how many people are in the room right now, and subtract that from the total number of people in order to determine how many must still enter. I leave the input handling as an exercise.
To determine that count, we count up the ways that involve the next action being I (someone comes in), plus the ways that involve the next action being O (someone leaves).
If everyone has entered, there is only one way forward: everyone must leave, one at a time. This is a base case.
Otherwise, it is definitely possible for someone to come in. We recursively count the ways for everyone else to enter after that; in the recursive call, there is one more person in the room, and one fewer person who must still enter.
If there are still people who have to enter, and there is also someone in the room right now, then it is also possible for someone to leave first. We recursively count the ways for others to enter after that; in the recursive call, there is one fewer person in the room, and the same number who must still enter.
This translates into code fairly directly:
def ways_to_enter(currently_in, waiting):
    if waiting == 0:
        return 1
    result = ways_to_enter(currently_in + 1, waiting - 1)
    if currently_in > 0:
        result += ways_to_enter(currently_in - 1, waiting)
    return result
Some testing:
>>> ways_to_enter(0, 1) # n = 1, prefix = ''
1
>>> ways_to_enter(2, 1) # n = 3, prefix = 'II'; OR e.g. n = 4, prefix = 'IIOI'
3
>>> ways_to_enter(0, 3) # n = 3, prefix = ''
5
>>> ways_to_enter(0, 14) # takes less than a second on my machine
2674440
We can improve the performance for larger values by decorating the function with functools.cache (lru_cache prior to 3.9), which will memoize the results of previous recursive calls. The more purpose-built approach is to use dynamic programming. In this case, we would initialize 2-dimensional storage for the results of ways_to_enter(x, y) and compute those values one at a time, in an order where the values needed for the "recursive calls" have already been computed earlier in the process.
That direct approach would look something like:
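def ways_to_enter(currently_in, waiting):
    # The number of people in the room can never exceed currently_in + waiting.
    width = currently_in + waiting + 1
    # initialize storage: results[w][c] holds ways_to_enter(c, w)
    results = [[0] * width for _ in range(waiting + 1)]
    # We will iterate with `waiting` as the major axis.
    for w in range(waiting + 1):
        for c in range(width):
            if w == 0:
                value = 1
            else:
                # Someone enters next. Last-column cells with w > 0 are
                # unreachable from the start state, so their value is irrelevant.
                value = results[w - 1][c + 1] if c + 1 < width else 0
                # Someone leaves next (only if the room is not empty).
                if c > 0:
                    value += results[w][c - 1]
            results[w][c] = value
    return results[waiting][currently_in]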
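The memoization option mentioned in that answer needs only a decorator on the original recursive ways_to_enter. This sketch uses functools.cache, which requires Python 3.9+; lru_cache(maxsize=None) is the older equivalent:

```python
from functools import cache

@cache  # remembers each (currently_in, waiting) result after the first call
def ways_to_enter(currently_in, waiting):
    if waiting == 0:
        return 1
    result = ways_to_enter(currently_in + 1, waiting - 1)
    if currently_in > 0:
        result += ways_to_enter(currently_in - 1, waiting)
    return result

print(ways_to_enter(0, 14))  # 2674440
```

Each distinct argument pair is now computed once, so the running time is polynomial rather than exponential. The recursion depth still grows roughly as 2 * waiting + currently_in, which is where the table-based version has an advantage for very large inputs.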
The product() function from itertools will allow you to generate all the possible sequences of 'I' and 'O' for a given length.
From that list, you can filter by the sequences that start with the user-supplied start_seq.
From that list, you can filter by the sequences that are valid, given your rules of the number and order of the 'I's and 'O's:
from itertools import product
def is_valid(seq):
    '''Evaluates a sequence of I's and O's following the rules that:
    - there cannot be more outs than ins
    - the ins and outs must be balanced
    '''
    _in, _out = 0, 0
    for x in seq:
        if x == 'I':
            _in += 1
        else:
            _out += 1
        if (_out > _in) or (_in > len(seq)/2):
            return False
    return True
# User inputs...
start_seq = 'II'
assert not start_seq.startswith('O'), 'Starting sequence cannot start with an OUT.'
n = 3
total_len = n*2
assert len(start_seq) < total_len, 'Starting sequence is at least as big as total number, nothing to iterate.'
# Calculate all possible sequences that are total_len long, as tuples of 'I' and 'O'
seq_tuples = product('IO', repeat=total_len)
# Convert tuples to strings, e.g., `('I', 'O', 'I')` to `'IOI'`
sequences = [''.join(seq_tpl) for seq_tpl in seq_tuples]
# Filter for sequences that start correctly
sequences = [seq for seq in sequences if seq.startswith(start_seq)]
# Filter for valid sequences
sequences = [seq for seq in sequences if is_valid(seq)]
print(sequences)
and I get:
['IIIOOO', 'IIOIOO', 'IIOOIO']
Not very elegant perhaps but this certainly seems to fulfil the brief:
from itertools import permutations
def isvalid(start, p):
    # p must begin with the given prefix
    for c1, c2 in zip(start, p):
        if c1 != c2:
            return 0
    # the number of people in the room must never go negative
    n = 0
    for c in p:
        if c == 'O':
            if (n := n - 1) < 0:
                return 0
        else:
            n += 1
    return 1
def calc(n, i):
    s = i + 'I' * (n - i.count('I'))
    s += 'O' * (n * 2 - len(s))
    return sum(isvalid(i, p) for p in set(permutations(s)))
print(calc(3, 'II'))
print(calc(3, 'IO'))
print(calc(3, 'I'))
print(calc(3, ''))
Output:
3
2
5
5
def solve(string,n):
    countI =string.count('I')
    if countI==n:
        return 1
    countO=string.count('O')
    if countO > countI:
        return 0
    k= solve(string + 'O',n)
    h= solve(string + 'I',n)
    return k+h
n= int(input())
string=input()
print(solve(string,n))
This is a dynamic programming problem.
Given the number of in and out operations remaining, we do one of the following:
If we're out of either ins or outs, we can only use operations of the other type. There is only one possible assignment.
If we have an equal number of ins and outs remaining, we must use an in operation according to the constraints of the problem.
Finally, if we have more outs remaining than ins, we can perform either operation. The answer, then, is the sum of the number of sequences if we choose to use an in operation plus the number of sequences if we choose to use an out operation.
This runs in O(n^2) time, although in practice the following code snippet can be made faster using a 2D list rather than the cache decorator (I've used @cache in this case to make the recurrence easier to understand).
from functools import cache
@cache
def find_permutation_count(in_remaining, out_remaining):
    if in_remaining == 0 or out_remaining == 0:
        return 1
    elif in_remaining == out_remaining:
        return find_permutation_count(in_remaining - 1, out_remaining)
    else:
        return find_permutation_count(in_remaining - 1, out_remaining) + find_permutation_count(in_remaining, out_remaining - 1)
print(find_permutation_count(3, 3)) # prints 5
With an empty prefix, the number of such permutations of length 2n is given by the nth Catalan number. Wikipedia gives a formula for Catalan numbers in terms of central binomial coefficients:
from math import comb
def count_permutations(n):
    return comb(2*n,n) // (n+1)
for i in range(1,10):
    print(i, count_permutations(i))
# 1 1
# 2 2
# 3 5
# 4 14
# 5 42
# 6 132
# 7 429
# 8 1430
# 9 4862
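The closed form can also be extended to a non-empty prefix. The sketch below uses the reflection principle (ballot numbers), which is a technique not used in the answers above. With c people currently in the room and w still waiting, there are C(2w + c, w) arrangements of the remaining symbols. Of these, C(2w + c, w - 1) dip below an empty room. When c = 0 this reduces to the Catalan formula. The parameters mirror ways_to_enter(currently_in, waiting) from earlier and are assumed to be non-negative; math.comb needs Python 3.8+:

```python
from math import comb

def count_with_prefix(currently_in, waiting):
    # All arrangements of the remaining symbols, minus the "bad" ones that
    # reach -1 people (counted by reflecting the path after that point).
    if waiting == 0:
        return 1  # everyone has entered; they can only leave one by one
    steps = 2 * waiting + currently_in
    return comb(steps, waiting) - comb(steps, waiting - 1)

print(count_with_prefix(2, 1))  # n = 3, prefix = 'II' -> 3
```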

How to loop to generate string in sequence?

I am trying to generate strings using a loop. What I am trying to achieve is a small collection of strings ranging from 1 character up to 5 characters long.
So, starting from the string 1, I want to go up to 55555. With numbers alone this seems easy, since I could just add them, but with alphanumeric characters it gets tricky.
Here is an explanation:
I have a collection of alphanumeric chars as the string s = "123ABC". First I want to create all possible 1-character strings from it, so I will have 1, 2, 3, A, B, C. After that, I want to increase the string length by one, so I get 11, 12, 13 and so on, until I have every possible combination up to CA, CB, CC, and I want to continue all the way up to CCCCCC. I am confused about the loop: I can get it to generate a temporary string, but looping inside it to rotate the characters is tricky.
This is what I have done so far:
i = 0
strr = "123ABC"
while i < len(strr):
    t = strr[0] * (i+1)
    for q in range(0, len(t)):
        # Here I need help to rotate more
        pass
    i += 1
Can anyone explain this to me, or point me to a resource where I can find a solution?
You may want to use itertools.permutations function:
import itertools
chars = '123ABC'
for i in xrange(1, len(chars)+1):
    print list(itertools.permutations(chars, i))
EDIT:
To get a list of strings, try this:
import itertools
chars = '123ABC'
strings = []
for i in xrange(1, len(chars)+1):
    strings.extend(''.join(x) for x in itertools.permutations(chars, i))
This is effectively a nested loop in which each level of recursion adds one more level of nesting, so the different recursion depths produce all possible combinations.
strr = "123ABC"
def prod(items, level):
    if level == 0:
        yield []
    else:
        for first in items:
            for rest in prod(items, level-1):
                yield [first] + rest
for ln in range(1, len(strr)+1):
    print("length:", ln)
    for s in prod(strr, ln):
        print(''.join(s))
This is also called the Cartesian product, and there is a corresponding function in itertools (itertools.product).
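A sketch of the itertools.product version. Unlike itertools.permutations, product(..., repeat=length) allows a character to repeat, so strings such as '11' and 'CC' from the question are included:

```python
from itertools import product

chars = '123ABC'
strings = []
for length in range(1, len(chars) + 1):
    # Every string of this length over chars, in the same order as prod() above.
    strings.extend(''.join(p) for p in product(chars, repeat=length))

print(strings[:8])   # ['1', '2', '3', 'A', 'B', 'C', '11', '12']
print(strings[-1])   # CCCCCC
print(len(strings))  # 55986, i.e. 6 + 6**2 + ... + 6**6
```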

How to turn an input string into a list, change it to a palindrome (if it isn't already), and turn it back into a string

A string is a palindrome if it reads the same forward and backward. Given a string that contains only lowercase English letters, you are required to create a new palindrome string from the given string, following the rules given below:
1. You can reduce (but not increase) any character in a string by one; for example, you can reduce the character h to g, but not g to h.
2. If necessary, you can reduce a character of the string repeatedly until it becomes the letter a; once it becomes a, you cannot reduce it any further.
Each reduction operation counts as one, so you also need to count how many reductions you make. Write a Python program that reads a string from user input (using the raw_input statement), creates a palindrome from the given string with the minimum possible number of operations, and then prints the palindrome created and the number of operations needed to create it.
I tried to convert the string to a list first, then modify the list so that, for any given string that is not a palindrome, it is automatically edited into a palindrome and the result is printed. After modifying the list, I convert it back to a string.
c=raw_input("enter a string ")
x=list(c)
y = ""
i = 0
j = len(x)-1
a = 0
while i < j:
    if x[i] < x[j]:
        a += ord(x[j]) - ord(x[i])
        x[j] = x[i]
        print x
    else:
        a += ord(x[i]) - ord(x[j])
        x [i] = x[j]
        print x
    i = i + 1
    j = (len(x)-1)-1
print "The number of operations is ",a
print "The palindrome created is",( ''.join(x) )
Am I approaching it the right way, or is there something I'm missing?
Since only reduction is allowed, it is clear that the number of reductions for each pair will be the difference between them. For example, consider the string 'abcd'.
Here the pairs to check are (a,d) and (b,c).
The difference between 'a' and 'd' is 3, which is obtained by (ord('d')-ord('a')).
I am using the absolute value to avoid checking which letter has the higher ASCII value.
I hope this approach will help.
s=input()
l=len(s)
count=0
m=0
n=l-1
while m<n:
    count+=abs(ord(s[m])-ord(s[n]))
    m+=1
    n-=1
print(count)
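The question also asks for the palindrome itself, not only the count. Below is a sketch (Python 3, with input handling left out) that extends the same two-pointer walk. Since letters can only be reduced, it lowers the larger letter of each mirrored pair to the smaller one:

```python
def make_palindrome(s):
    # Walk inward from both ends; each pair costs the distance between letters.
    chars = list(s)
    operations = 0
    left, right = 0, len(chars) - 1
    while left < right:
        a, b = chars[left], chars[right]
        operations += abs(ord(a) - ord(b))
        chars[left] = chars[right] = min(a, b)  # reduce the larger letter
        left += 1
        right -= 1
    return ''.join(chars), operations

print(make_palindrome('abcd'))  # ('abba', 4)
```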
This is a common "homework" or competition question. The basic concept here is that you have to find a way to get to minimum values with as few reduction operations as possible. The trick here is to utilize string manipulation to keep that number low. For this particular problem, there are two very simple things to remember: 1) you have to split the string, and 2) you have to apply a bit of symmetry.
First, split the string in half. The following function should do it.
def split_string_to_halves(string):
    half, rem = divmod(len(string), 2)
    a, b, c = '', '', ''
    a, b = string[:half], string[half:]
    if rem > 0:
        # odd length: the middle character is string[half]
        b, c = string[half + 1:], string[half]
    return (a, b, c)
The above should recreate the string if you do a + c + b. Next, convert a and b to lists and map the ord function over each half. Leave the remainder alone, if any.
def convert_to_ord_list(string):
    return list(map(ord, string))
Since you just have to do a one-way operation (only reduction, no need for addition), you can assume that for each pair of elements in the two converted lists, the higher value less the lower value is the number of operations needed. Easier shown than said:
def convert_to_palindrome(string):
    halfone, halftwo, rem = split_string_to_halves(string)
    if halfone == halftwo[::-1]:
        return halfone + rem + halftwo, 0
    halftwo = halftwo[::-1]
    # a list, because the pairs are iterated twice below
    zipped = list(zip(convert_to_ord_list(halfone), convert_to_ord_list(halftwo)))
    counter = sum([max(x) - min(x) for x in zipped])
    floors = [min(x) for x in zipped]
    res = "".join(map(chr, floors))
    res += rem + res[::-1]
    return res, counter
Finally, some tests:
target = 'ideal'
print(convert_to_palindrome(target)) # ('iaeai', 6)
target = 'euler'
print(convert_to_palindrome(target)) # ('eelee', 29)
target = 'ohmygodthisisinsane'
print(convert_to_palindrome(target)) # ('ehasgidihihidigsahe', 84)
I'm not sure whether this is optimal or whether I've covered all the bases, but I think it covers the general concept of the approach needed. Compared to your code, this is clearer and actually works (yours does not). Good luck, and let us know how this works for you.

Printing from 2 lists on one line

What I have so far:
def balance_equation(species,coeff):
    data=zip(coeff,species)
    positive=[]
    negative=[]
    for (mul,el) in data:
        if int(mul)<0:
            negative.append((el,mul))
        if int(mul)>0:
            positive.append((el,mul))
I know this does not print anything. What I have is a function that takes in two lists, species=['H2O','O2','CO2'] and coeff=['1','3','-4']. I need it to print like so:
1H2O+3O2=4CO2
I started by putting the negative coeff and species in one list and the positive ones in the other. I just can't seem to get the two to print correctly.
Try this:
species = ["H2O", "CO2", "O2"]
coeff = ['1', '-4', '3']
pos = [c + s for c, s in zip(coeff, species) if int(c) > 0]
neg = [c[1:] + s for c, s in zip(coeff, species) if int(c) < 0]
print("+".join(pos) + "=" + "+".join(neg))
EDIT: I took out the spaces.
2nd EDIT: coeff is a list of strings.
You should also test whether pos or neg is empty, and replace it with 0 when appropriate. It appears that the coefficients are integers.
Breaking things down into steps is a good way to solve things (you can always recombine the steps later), and you've got 80% of the way there.
You already have positive and negative lists. So, you need to convert each one into a string, then just:
print poshalf, "=", neghalf
So, how do you convert positive into poshalf? Well, it's a representation of each member, separated by '+', so if you had a function stringify that could turn each member into its representation, it's just:
poshalf = '+'.join(stringify(el, mul) for (el, mul) in positive)
neghalf = '+'.join(stringify(el, mul)[1:] for (el, mul) in negative)
The [1:] there is to take out the - sign. If mul is actually an integer rather than a string, it probably makes sense to just negate the value before passing it to stringify:
neghalf = '+'.join(stringify(el, -mul) for (el, mul) in negative)
Now, what does that "stringify" function look like? Well, each member is an (el, mul) pair. If they were both strings, you could just add them. From your previous questions, mul may end up being some kind of number at this point, but that's almost as easy:
def stringify(el, mul):
    return str(mul) + el
Put it all together, and you're done.
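Put together, the pieces might look like the sketch below. It assumes the coefficients arrive as strings like '-4', converts them with int, and negates the negative ones before calling stringify so that no '-' appears on the right-hand side:

```python
def stringify(el, mul):
    return str(mul) + el

def balance_equation(species, coeff):
    # list() so the pairs can be iterated more than once
    data = list(zip(coeff, species))
    positive = [(el, int(mul)) for (mul, el) in data if int(mul) > 0]
    negative = [(el, int(mul)) for (mul, el) in data if int(mul) < 0]
    poshalf = '+'.join(stringify(el, mul) for (el, mul) in positive)
    neghalf = '+'.join(stringify(el, -mul) for (el, mul) in negative)
    return poshalf + '=' + neghalf

print(balance_equation(['H2O', 'O2', 'CO2'], ['1', '3', '-4']))  # 1H2O+3O2=4CO2
```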
One way to make this all simpler: If you never use the (el, mul) for any other purpose except to call stringify on it, just call stringify in the first place and store the result:
def balance_equation(species,coeff):
    data=zip(coeff,species)
    positive=[]
    negative=[]
    for (mul,el) in data:
        if int(mul)<0:
            negative.append(str(mul)[1:] + el)
        if int(mul)>0:
            positive.append(str(mul) + el)
    return positive, negative
Remember that last line, which you've left off both versions of your previous question and this question! If you never return the values, all that effort figuring them out is wasted, and the caller just gets None as an answer.
Obviously either the str or the int is unnecessary, but I've left them both in for safety; you should look at your surrounding code and remove the unnecessary one. (If you're taking mul as an int, you probably want str(-mul) instead of str(mul)[1:], as described earlier.)
Once you've got this, if you understand list comprehensions, you might realize that this is a familiar pattern: start with [], loop over some other collection, and append each value that meets some test. In other words:
def balance_equation(species,coeff):
    data=list(zip(coeff,species))  # a list, so it can be iterated twice
    positive = [str(mul) + el for (mul, el) in data if int(mul) > 0]
    negative = [str(mul)[1:] + el for (mul, el) in data if int(mul) < 0]
    return positive, negative
You might notice that you can simplify this even further—the only thing you use the lists for is to build the strings, so maybe you just want a function that returns a string equation (in which case you can use a generator expression instead of a list comprehension—if you don't know about them yet, ignore that part).
def balance_equation(species,coeff):
    data=list(zip(coeff,species))  # a list, so it can be iterated twice
    positive = '+'.join(str(mul) + el for (mul, el) in data if int(mul) > 0)
    negative = '+'.join(str(mul)[1:] + el for (mul, el) in data if int(mul) < 0)
    return positive + '=' + negative
Of course now you're returning a string instead of a pair of lists, so you have to change your calling code to just print balance_equation(species, coeff) instead of combining the lists.
One last thing: You seem to reverse the order of the coefficients/multipliers and species/elements at each call. For example:
def balance_equation(species,coeff):
    data=zip(coeff,species)
Unless there's a good reason to do otherwise, it's better to pick one order and be consistent throughout, or you're invariably going to run into a bug where you get them backwards.

Print out a large list from file into multiple sublists with overlapping sequences in python

Currently I have a very long sequence in a file, and I wish to split this sequence into smaller subsequences. I would like each subsequence to overlap the previous one, and I want to place them into a list. Here is an example of what I mean:
(Apologies for the cryptic sequence; it is all on one line.)
file1.txt
abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft
list1 = ["abcdefessdfekgheithrfkopeifhght", "fhghtryrhfbcvdfersdwtiyuyrterdhc", "erdhcbgjherytyekdnfiwyt", "nfiwytowihfiwoeirehjiwoqpft"]
I can currently split each sequence into smaller subsequences without the overlaps using the following code:
def chunks(seq, n):
    division = len(seq) / float (n)
    return [ seq[int(round(division * i)): int(round(division * (i + 1)))] for i in xrange(n) ]
In the above code, n specifies how many subsequences the list will be split into.
I was thinking of just grabbing the ends of each subsequence and concatenating them onto the elements in the list by hard-coding it, but this would be inefficient and hard. Is there an easy way to do this?
In reality, the overlap I need would be about 100 characters.
Thanks guys
>>> seq="abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"
>>> n = 4
>>> overlap = 5
>>> division = len(seq)//n
>>> [seq[i*division:(i+1)*division+overlap] for i in range(n)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
It is probably slightly more efficient to do it like this:
>>> [seq[i:i+division+overlap] for i in range(0,n*division,division)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
If you want to split your sequence seq into subsequences of a given length, with overlap characters (or elements) shared between each subsequence and its successor, where length and overlap are the function's parameters:
def split_with_overlap(seq, length, overlap):
    return [seq[i:i+length] for i in range(0, len(seq), length - overlap)]
Then testing it on your original data:
>>> seq = 'abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft'
>>> split_with_overlap(seq, 31, 5)
['abcdefessdfekgheithrfkopeifhght', 'fhghtryrhfbcvdfersdwtiyuyrterdh', 'terdhcbgjherytyekdnfiwytowihfiw', 'ihfiwoeirehjiwoqpft']
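One caveat with split_with_overlap: when the tail of the sequence is shorter than the overlap, the last chunk can be entirely contained in the previous one (for example, a trailing 'g' for 'abcdefg' with length 4 and overlap 2). The variant below is a sketch that stops once a chunk reaches the end of the sequence and rejects a non-positive step. It returns the same four chunks for the data above:

```python
def split_with_overlap(seq, length, overlap):
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    chunks = []
    for start in range(0, len(seq), step):
        chunks.append(seq[start:start + length])
        if start + length >= len(seq):
            break  # this chunk already reaches the end; later ones would be redundant
    return chunks

print(split_with_overlap('abcdefg', 4, 2))  # ['abcd', 'cdef', 'efg']
```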
