Randomly choosing a % of elements in a string and changing the value - python

I have a string and need to randomly pick 5% of its elements and flip them: to 0 if they are 1, and to 1 if they are 0.
The string looks like this:
'10011110110001000111010011010100101100100110111000010001111100001010000011101100011110100110001110010101010000100111000101001100100110110010010100000010111110000011001001011011010111111010001011101011110100000101010110100001001011010000111110101011001101011000100100010010100011100001011011110001010101010101100001111111010101000010011010010110111100011111001011100101001000101011110000010111101111101100010010010011011101101110110000000000101010101010101011111011010111000101010010001010110011101011'
Effectively, 5% of the values in the string will change from a 0 to 1, or vice versa.
I have tried this but it does not seem to work, and isn't guaranteed to only replace 5% of the elements:
for i in range(500):
    if random.random() < 0.05:
        if test[i] == '1':
            test[i] == '0'
        else:
            test[i] == '1'

You need to change three things:
Strings are immutable. So, test = list(test) before your loop, and test = ''.join(test) after it.
test[i] == '0' is a comparison, not an assignment. Use a single = to store the new value.
Choose in advance which elements you want to change.
For the last point, first choose n random indexes. One option is random.sample, which samples without replacement (numpy.random.choice with replace=False does the same if you are using NumPy):
num_elements = int(0.05 * len(test))
indexes = random.sample(range(len(test)), num_elements)
and then modify the values at those indexes as before.
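Put together, the approach could look like the sketch below. The helper name flip_fraction and its fraction and rng parameters are made up for illustration and are not from the question; the short repeated string only stands in for the 500-character one.

```python
import random

def flip_fraction(bits, fraction=0.05, rng=random):
    """Return a copy of `bits` with a fixed fraction of its '0'/'1' characters flipped."""
    chars = list(bits)                      # strings are immutable, so work on a list
    num_elements = int(fraction * len(chars))
    # sample() draws distinct indexes, so no position is flipped twice
    for i in rng.sample(range(len(chars)), num_elements):
        chars[i] = '0' if chars[i] == '1' else '1'
    return ''.join(chars)

original = '1001111011' * 50                # hypothetical 500-character input
flipped = flip_fraction(original)
changed = sum(a != b for a, b in zip(original, flipped))
print(changed, 'of', len(original), 'characters changed')
```

Because the indexes are drawn once and without replacement, the number of changed characters is always int(fraction * len(bits)), unlike the per-character random.random() < 0.05 test.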

This will work on Python 2.7 and 3.6 (tested; probably newer as well).
import random
sting = '10011110110001000111010011010100101100100110111000010001111100001010000011101100011110100110001110010101010000100111000101001100100110110010010100000010111110000011001001011011010111111010001011101011110100000101010110100001001011010000111110101011001101011000100100010010100011100001011011110001010101010101100001111111010101000010011010010110111100011111001011100101001000101011110000010111101111101100010010010011011101101110110000000000101010101010101011111011010111000101010010001010110011101011'
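# Draw the indices once, up front. Calling random.sample inside the comprehension
# would redraw them for every character, so the number of flips would vary.
flip = set(random.sample(range(len(sting)), len(sting) // 20))
sting = ''.join(chr(ord(y) ^ 1) if x in flip else y for x, y in enumerate(sting))
print(sting)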
I have tried this and it works, and it is guaranteed to replace exactly 5% of the elements.
It toggles exactly 5% of the characters (the length of sting divided by 20) from 0 to 1 or from 1 to 0. Toggling is done with the XOR operator on the character code, and random.sample picks unique indices, i.e. without duplicates, as long as it is called once rather than once per character.

Related

IndexError: list index out of range in codewars puzzle

As a beginner in Python I decided to have a go at the Codewars puzzles.
Codewars uses Python 2.7.6.
The second puzzle requires you to:
Write a function that will return the count of distinct case-insensitive alphabetic characters and numeric digits that occur more than once in the input string. The input string can be assumed to contain only alphabets (both uppercase and lowercase) and numeric digits.
For example, if you give the program "abcde" it should give you 0, because there are no duplicates. But, if you give it "indivisibilities" it should give you 2, because there are 2 duplicate letters: i (occurs 7 times) and s (occurs twice).
As a beginner I came up with an approach that I imagine is very crude, but nevertheless it works perfectly on my system:
def duplicate_count(text):
    # the number of duplicates
    dupes = 0
    # convert input string to lower case and split into individual characters
    list_of_chars = list(text.lower())
    # sort list into groups
    sorted_chars = sorted(list_of_chars)
    # get length of list
    n = len(sorted_chars)
    # check whether the first element of the list is the same as the second. If
    # it is, add one to the dupes count
    if sorted_chars[0] == sorted_chars[1]:
        dupes += 1
    else:
        dupes += 0
    # start with the second element (index: 1) and finish with the (n - 1)-th
    # element
    for i in range(1, n - 1):
        # if the ith element of the list is the same as the next one, add one
        # to the dupes count. However, since we only want to count each
        # duplicate once, we must check that the ith element is not the same as
        # the previous one
        if sorted_chars[i] == sorted_chars[i + 1] and sorted_chars[i] != sorted_chars[i - 1]:
            dupes += 1
        else:
            dupes += 0
    return dupes
This passes all of the automated tests, but when I submit this as a solution I get an STDERR:
Traceback:
in <module>
in duplicate_count
IndexError: list index out of range
As I understand it, this error is given if I try and access an element of the list that does not exist. But I cannot see where in my code I am doing that. I calculate the length of my list and store it in n. So let's say I supply the string "ababa" to duplicate_count, it should generate a list sorted_chars: ['a', 'a', 'a', 'b', 'b'] of length 5. So n = 5. Therefore range(1, n - 1) = range(1, 4) which will generate the numbers 1, 2 and 3. Thus for i in range(1, n - 1) is, mathematically speaking, for each i ϵ I = {1, 2, 3}. The largest index I therefore use in this code is 4 (if sorted_chars[i] == sorted_chars[i + 1]), which is fine, because there is an element at index 4 (in this case 'b').
Why, then, is Codewars giving me this error?
In this case, your function requires at least two characters to work. Try running duplicate_count('a') and see the error it throws. Add the following after n = len(sorted_chars):
    if n < 2:
        return 0
That will stop running the rest of the function and return 0 duplicates (because you can't have any if there's only one character).
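As an alternative that sidesteps the index arithmetic entirely, here is a minimal sketch using collections.Counter from the standard library. It is a different technique from the sorted-list approach in the question, not a fix to it, and it handles empty and one-character inputs without a special case.

```python
from collections import Counter

def duplicate_count(text):
    """Count distinct case-insensitive characters that occur more than once."""
    counts = Counter(text.lower())          # maps each character to its frequency
    return sum(1 for n in counts.values() if n > 1)

print(duplicate_count("abcde"))             # no character repeats
print(duplicate_count("indivisibilities"))  # 'i' and 's' repeat
print(duplicate_count("a"))                 # a single character cannot repeat
```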

Calculating the similarity of multiple elements with unequal length of a nested list

I have a nested list in which the second element of each inner list varies in length:
lst = [['a','bcbcbcbcbc'],['e','bbccbbccb'],['i','ccbbccbb'],['o','cbbccbb']]
My desired output is a CSV of a dataframe that looks like this:
comparison similarity_score
a:e *some score
a:i *some score
a:o *some score
e:i *some score
e:o *some score
i:o *some score
my code:
similarity = []
for i in lst:
    name = i[0]
    string = i[1]
    score = 0.0
    length =(len(string))
    for i in range(length):
        if string[i]==string[i+1]:
            score += 1.0
    new_score = (100.0*score)/length
    name_seq = name[i] + ':' + name[i+1]
    similarity.append(name_seq,new_score)
similarity.pdDataFrame(similarity, columns = ['comparison' , 'similarity_score'])
similarity.to_csv('similarity_score.csv')
but I am receiving an error:
if codes[i]==codes[i+1]:
IndexError: string index out of range
any advice? thanks!
According to Python's documentation, range behaves as follows (Python 2 output shown; in Python 3, wrap the call in list() to see the values):
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In your code (assuming variable names have not changed):
...
length =(len(string))           # For an input of 'bcb' length will be 3
for i in range(length):         # For an input of 'bcb' range will be [0, 1, 2]
    if string[i]==string[i+1]:  # When i == 2, i + 1 == 3, which gives you the
                                # IndexError: string index out of range
...
In other words, given an input bcb, your if statement will look at the following indices:
(0, 1)
(1, 2)
(2, 3) <-- The 3 in this case is your issue.
To fix your issue, stop one index earlier: iterate over range(len(string) - 1), so that i + 1 never goes past the last valid index, len(string) - 1.
I think your biggest issue is that at the top level you're just iterating on one name,string pair at a time, not a pair of name,string pairs like you want to see in your output (as shown by the paired names a:e).
You're trying to index the name and string values later on, but doing so is not achieving what you want (comparing two strings to each other to compute a score), since you're only accessing adjacent characters in the same string. The exception you're getting is because i+1 may go off the end of the string. There's further confusion since you're using i for both the index in the inner loop and as the items taken from the outer loop (the name, string pairs).
To get pairs of pairs, I suggest using itertools.combinations:
import itertools
for [name1, string1], [name2, string2] in itertools.combinations(lst, 2):
Now you can use the two name and two string variables in the rest of the loop.
I'm not entirely sure I understand how you want to compare the strings to get your score, since they're not the same length as one another. If you want to compare just the initial parts of the strings (and ignore the trailing bit of the longer one), you could use zip to get pairs of corresponding characters between the two strings. You can then compare them in a generator expression and add up the bool results (True is a special version of the integer 1 and False is a version of 0). You can then divide by the smaller of the string's lengths (or maybe the larger if you want to penalize length differences):
common_letters = sum(c1 == c2 for c1, c2 in zip(string1, string2))
new_score = common_letters * 100.0 / min(len(string1), len(string2))  # 100.0 avoids integer division on Python 2
There's one more obvious issue, where you're calling append with two arguments. If you really want to be appending a 2-tuple, you need an extra set of parentheses:
similarity.append((name_seq, new_score))
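Putting these pieces together, a minimal sketch of the whole computation might look as follows. The function name pairwise_similarity is invented for illustration, the score definition (matching positions divided by the shorter length) is only the assumption discussed above, and the quoted lst is a guess at the asker's data.

```python
import itertools

def pairwise_similarity(pairs):
    """Return (comparison, score) rows for every pair of [name, string] entries."""
    rows = []
    for (name1, string1), (name2, string2) in itertools.combinations(pairs, 2):
        # zip stops at the shorter string, so the longer string's tail is ignored
        common = sum(c1 == c2 for c1, c2 in zip(string1, string2))
        shorter = min(len(string1), len(string2))
        score = 100.0 * common / shorter if shorter else 0.0
        rows.append((name1 + ':' + name2, score))
    return rows

lst = [['a', 'bcbcbcbcbc'], ['e', 'bbccbbccb'], ['i', 'ccbbccbb'], ['o', 'cbbccbb']]
rows = pairwise_similarity(lst)
for comparison, score in rows:
    print(comparison, round(score, 2))
```

The resulting list of 2-tuples can be handed to pandas.DataFrame(rows, columns=['comparison', 'similarity_score']) and written out with to_csv, as the question intended.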

Generating specific bit sequences in Python

I have a list of numbers that represent the number of 1s in a row and I have an integer that represents the length of the entire sequence. So for example, if I get list [1, 2, 3] and length 8, then the only possible bit sequence is 10110111. But if the length was 9, I should get
010110111, 100110111, 101100111 and 101101110. I wonder if there is a simple, Pythonic way of doing this. My current method is:
from itertools import permutations
def genSeq(seq, length):
    strSeq = [str(i) for i in seq]
    strRep = strSeq + ["x" for i in xrange(length-sum(seq))]
    perm = list(set(permutations(strRep)))
    legalSeq = [seq for seq in perm if isLegal(seq) and [char for char in seq if char.isdigit()] == strSeq]
    return [''.join(["1"*int(i) if i.isdigit() else "0" for i in seq]) for seq in legalSeq]
def isLegal(seq):
    for i in xrange(len(seq)-1):
        if seq[i].isdigit() and seq[i+1].isdigit(): return False
    return True
print genSeq([1, 2, 3], 9)
My approach is as follows:
Figure out how many zeros will appear in the result sequences, by doing the appropriate arithmetic. We'll call this value n, and call the length of the input list k.
Now, imagine that we write out the n zeros for a given result sequence in a row. To create a valid sequence, we need to choose k places where we'll insert the appropriate number of ones, and we have n + 1 choices of where to do so - in between any two digits, or at the beginning or end. So we can use itertools.combinations to give us all the possible position-groups, asking it to choose k values from 0 up to n, inclusive.
Given a combination of ones-positions, in ascending order, we can figure out how many zeroes appear before the first group of ones (it's the first value from the combination), the second (it's the difference between the first two values from the combination), etc. As we iterate, we need to alternate between groups of zeros (the first and/or last of these might be empty) and groups of ones; so there is one more group of zeros than of ones, in general - but we can clean that up by adding a "dummy" zero-length group of ones to the end. We also need to get "overlapping" differences from the ones-positions; fortunately there's a trick for this.
This part is tricky when put all together; I'll write a helper function for it, first:
def make_sequence(one_positions, zero_count, one_group_sizes):
    zero_group_sizes = [
        end - begin
        for begin, end in zip((0,) + one_positions, one_positions + (zero_count,))
    ]
    return ''.join(
        '0' * z + '1' * o
        for z, o in zip(zero_group_sizes, one_group_sizes + [0])
    )
Now we can get back to the iteration over all possible sequences:
import itertools
def generate_sequences(one_group_sizes, length):
    zero_count = length - sum(one_group_sizes)
    for one_positions in itertools.combinations(
        range(zero_count + 1), len(one_group_sizes)
    ):
        yield make_sequence(one_positions, zero_count, one_group_sizes)
Which we could of course then fold back into a single function, but maybe you'll agree with me that it's better to keep something this clever in more manageable pieces :)
This is more fragile than I'd like - the one_positions and one_group_sizes sequences need to be "padded" to make things work, which in turn requires assuming their type. one_positions will be a tuple because that's what itertools.combinations produces, but I've hard-coded the assumption that the user-supplied one_group_sizes will be a list. There are ways around that but I'm a little exhausted at the moment :)
Anyway, the test:
>>> list(generate_sequences([1,2,3], 9))
['101101110', '101100111', '100110111', '010110111']
>>> list(generate_sequences([1,1,1], 9))
['101010000', '101001000', '101000100', '101000010', '101000001',
 '100101000', '100100100', '100100010', '100100001', '100010100',
 '100010010', '100010001', '100001010', '100001001', '100000101',
 '010101000', '010100100', '010100010', '010100001', '010010100',
 '010010010', '010010001', '010001010', '010001001', '010000101',
 '001010100', '001010010', '001010001', '001001010', '001001001',
 '001000101', '000101010', '000101001', '000100101', '000010101']
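One possible way around the type assumption mentioned above, offered only as a sketch: convert the group sizes to a tuple up front and build the zero-group boundaries as a single padded tuple. The name generate_sequences_any is made up here, and it folds the helper into one function purely to keep the example self-contained.

```python
import itertools

def generate_sequences_any(one_group_sizes, length):
    """Like generate_sequences, but accepts any iterable of group sizes."""
    sizes = tuple(one_group_sizes)          # normalise so tuple concatenation is safe
    zero_count = length - sum(sizes)
    for one_positions in itertools.combinations(range(zero_count + 1), len(sizes)):
        # pad with 0 and zero_count so adjacent differences give every zero group
        bounds = (0,) + one_positions + (zero_count,)
        zero_groups = [end - begin for begin, end in zip(bounds, bounds[1:])]
        # the trailing 0 is the dummy group of ones that pairs with the last zero group
        yield ''.join('0' * z + '1' * o for z, o in zip(zero_groups, sizes + (0,)))

print(list(generate_sequences_any((1, 2, 3), 9)))
print(list(generate_sequences_any(iter([1, 2, 3]), 8)))
```

If the groups cannot fit in the requested length, the combinations call yields nothing and the generator is simply empty.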

(Binary) Summing the elements of a list

I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
def binary_search(l, low=0,high=-1):
    if not l: return -1
    if(high == -1): high = len(l)-1
    if low == high:
        if l[low] == 1: return low
        else: return -1
    mid = (low + high)//2
    upper = [l[mid:high]]
    lower = [l[0:mid-1]]
    u = sum(int(x) for x in upper)
    lo = sum(int(x) for x in lower)
    if u == 1: return binary_search(upper, mid, high)
    elif lo == 1: return binary_search(lower, low, mid-1)
    return -1
l = [0 for x in range(255)]
l[123] = 1
binary_search(l)
The code I'm using to test
u = sum(int(x) for x in upper)
works fine in the interpreter, but gives me the error
TypeError: int() argument must be a string or a number, not 'list'
I've just started to use python, and can't figure out what's going wrong (the version I've written in c++ doesn't work either).
Does anyone have any pointers?
Also, how would I do the sum so that it is a binary xor, not simply decimal addition?
You don't actually want a sum; you want to know whether upper or lower contains a 1 value. Just take advantage of Python's basic container-type syntax:
if 1 in upper:
    # etc
if 1 in lower:
    # etc
The reason you're getting the error, by the way, is because you're wrapping upper and lower with an extra nested list when you're trying to split l (rename this variable, by the way!!). You just want to split it like this:
upper = the_list[mid:high]
lower = the_list[:mid-1]
Finally, it's worth noting that your logic is pretty weird. This is not a binary search in the classic sense of the term. It looks like you're implementing "find the index of the first occurrence of 1 in this list". Even ignoring the fact that there's a built-in function to do this already, you would be much better served by just iterating through the whole list until you find a 1. Right now, you've got O(nlogn) time complexity (plus a bunch of extra one-off loops), which is pretty silly considering the output can be replicated in O(n) time by:
def first_one(the_list):
    for i in range(len(the_list)):
        if the_list[i] == 1:
            return i
    return -1
Or of course even more simply by using the built-in list method index:
def first_one(the_list):
    try:
        return the_list.index(1)
    except ValueError:
        return -1
I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
What's wrong with
int(1 in l)
I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
No need to sum the whole list; you can stop at the first 1. Simply use any(). It will return True if there is at least one truthy value in the container and False otherwise, and it short-circuits (i.e. if a truthy value is found early in the list, it doesn't scan the rest). Conveniently, 1 is truthy and 0 is not.
True and False work as 1 and 0 in an arithmetic context (Booleans are a subclass of integers), but if you want specifically 1 and 0, just wrap any() in int().
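A small sketch of both reductions, assuming a list of 0/1 integers like the one in the question. The XOR part addresses the asker's follow-up and uses functools.reduce with operator.xor, which is one common way to do it rather than the only one; the variable names are illustrative.

```python
from functools import reduce
from operator import xor

bits = [0] * 255
bits[123] = 1

has_one = int(any(bits))        # logical OR of all elements: 1 if any element is 1
parity = reduce(xor, bits, 0)   # XOR of all elements: 1 if the number of 1s is odd

print(has_one, parity)
```

any() stops at the first truthy element, while the XOR reduction has to visit every element, because each further 1 flips the result.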
Stop making nested lists.
upper = l[mid:high]
lower = l[0:mid-1]

python fixed string size

Suppose I need a string n elements long, with each element separated by a space. Further, I need the ith element to be a 1 and the rest to be 0. What's the easiest Pythonic way to do this?
Thus if n = 10, and i = 2, I would want
0 1 0 0 0 0 0 0 0 0
I want something like a new string[] that you can get in C++ but the Python version is eluding me.
Thanks!
Use a list.
n = 10
i = 2
mylist = ["0"] * n
mylist[i-1] = "1"
print(" ".join(mylist))
The following Python generator expression should produce what you are looking for:
" ".join("1" if i == (x+1) else "0" for x in range(n))
The following example from the Python REPL should help you. The expression " ".join(bin((1<<n)+(1<<n>>i))[3:]) should be a solution: bin() yields '0b1' followed by n binary digits of which only the ith is 1, and [3:] strips the '0b1' prefix.
>>> n=10
>>> i=2
>>> " ".join(bin((1<<n)+(1<<n>>i))[3:])
'0 1 0 0 0 0 0 0 0 0'
Just use "+" to join strings (i is one-based here, as in the question):
("0 "*(i-1) + "1 " + "0 "*(n-i))[:-1]
or you can also use bytearray:
a = bytearray(("0 "*n)[:-1].encode())
a[(i-1)*2] = ord("1")
print(a.decode())
This looks like homework, so I'll describe my answer in English instead of Python.
Use a list comprehension to create a list of n strings, selecting each element as either '1' (if the item's position is i) or '0' (otherwise). Join the elements of the list with a space between them.
There are several ways you can select from two items based on a boolean value. One of them would be to have the two items in a sequence, and index the sequence with the boolean value, thus picking the first element if False, and the second element if True. In this particular case, you can also convert the boolean value to '0' or '1' in several ways (e.g. convert to integer and then to string).
I don't understand the new string[] comment -- that's not C++ syntax.
Edit
Here's the Python:
' '.join([('0', '1')[x == i - 1] for x in range(n)])
or
' '.join([str(int(x == i - 1)) for x in range(n)])
I like the former more -- it's quite clear that it generates zeros and ones, and they can be changed to something else with ease. It's i - 1 instead of i to adjust for i being one-based and x being zero-based. Normally, I wouldn't do that. All things being equal, I prefer to work with zero-based indices throughout, except when input and output formats demand one-based values. In that case, I convert inputs as soon as possible and outputs as late as possible, and definitely avoid mixing the two in the same bit of code.
You can safely drop the square brackets, turning the list comprehension into a generator expression. For most intents and purposes, they work the same.
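Wrapped up as a function, the generator-expression version might look like this sketch. The name one_hot_string and the range check are additions for illustration; i is treated as one-based, as in the question.

```python
def one_hot_string(n, i):
    """Return n space-separated digits with a '1' at one-based position i."""
    if not 1 <= i <= n:
        raise ValueError("i must be between 1 and n")
    # x is zero-based, so compare against i - 1
    return ' '.join('1' if x == i - 1 else '0' for x in range(n))

print(one_hot_string(10, 2))
```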
