Binary string in Python issues

Binary string in Python issues - python

For some reason I'm having a heck of a time figuring out how to do this in Python.
I am trying to represent a binary string in a string variable, and all I want it to have is
0010111010
However, no matter how I try to format it as a string, Python always chops off the leading zeroes, which is giving me a headache in trying to parse it out.
I'd hoped this question would have helped, but it doesn't really...
Is there a way to force Python to stop auto-converting my string to an integer?
I have tried the following:
val = ""
if (random.random() > 0.50):
val = val + "1"
else
val = val + "0"
and
val = ""
if (random.random() > 0.50):
val = val + "%d" % (1)
else:
val = val + "%d" % (0)
I had stuck it into an array previously, but ran into issues inserting that array into another array, so I figured it would just be easier to parse it as a string.
Any thoughts on how to get my leading zeroes back? The string is supposed to be a fixed length of 10 bits if that helps.
Edit:
The code:
def create_string(x):
for i in xrange(10): # 10 random populations
for j in xrange(int(x)): # population size
v = ''.join(choice(('0','1')) for _ in range(10))
arr[i][j] = v
return arr
a = create_string(5)
print a
Hopefully the output I'm seeing will show you why I'm having issues:
[[ 10000100 1100000001 101010110 111011 11010111]
[1001111000 1011011100 1110110111 111011001 10101000]
[ 110010001 1011010111 1100111000 1011100011 1000100001]
[ 10011010 1000011001 1111111010 11100110 110010101]
[1101010000 1010110101 110011000 1100001001 1010100011]
[ 10001010 1100000001 1110010000 10110000 11011010]
[ 111011 1000111010 1100101 1101110001 110110000]
[ 110100100 1100000000 1010101001 11010000 1000011011]
[1110101110 1100010101 1110001110 10011111 101101100]
[ 11100010 1111001010 100011101 1101010 1110001011]]
The issue here isn't only with printing, I also need to be able to manipulate them on a per-element basis. So if I go to play with the first element, then it returns a 1, not a 0 (on the first element).

If I understood you right, you could do it this way:
a = 0b0010111010
'{:010b}'.format(a)
#The output is: '0010111010'
Python 2.7
It uses string format method.
This is the answer if you want to represent the binary string with leading zeros.
If you are just trying to generate a random string with a binary you could do it this way:
from random import choice
''.join(choice(('0','1')) for _ in range(10))
Update
Unswering your update.
I made a code which has a different output if compared to yours:
from random import choice
from pprint import pprint
arr = []
def create_string(x):
for i in xrange(10): # 10 random populations
arr.append([])
for j in xrange(x): # population size
v = ''.join(choice(('0','1')) for _ in range(10))
arr[-1].append(v)
return arr
a = create_string(5)
pprint(a)
The output is:
[['1011010000', '1001000010', '0110101100', '0101110111', '1101001001'],
['0010000011', '1010011101', '1000110001', '0111101011', '1100001111'],
['0011110011', '0010101101', '0000000100', '1000010010', '1101001000'],
['1110101111', '1011111001', '0101100110', '0100100111', '1010010011'],
['0100010100', '0001110110', '1110111110', '0111110000', '0000001010'],
['1011001011', '0011101111', '1100110011', '1100011001', '1010100011'],
['0110011011', '0001001001', '1111010101', '1110010010', '0100011000'],
['1010011000', '0010111110', '0011101100', '1111011010', '1011101110'],
['1110110011', '1110111100', '0011000101', '1100000000', '0100010001'],
['0100001110', '1011000111', '0101110100', '0011100111', '1110110010']]
Is this what you are looking for?

How about the following:
In [30]: ''.join('1' if random.random() > 0.50 else '0' for i in xrange(10))
Out[30]: '0000110111'
This gives a ten-character binary string; there's no chopping off of leading zeroes.
If you don't need to vary digit probability (the 0.50 above), a slightly more concise version is:
In [39]: ''.join(random.choice('01') for i in xrange(10))
Out[39]: '0001101001'

Related

How a create a random string combination of postive, Negative, floats, Upper case, Lower case elements

I'm new to python seeking your help. I would like to create a string combination of postive, Negative, floats, Upper case, Lower case elements
Example: Like random combination.
As-1jP0.7M -->output
Explanation
A - caps A
s- small S
-1 - Negative 1
j- small j
0.7 - float 0.7
M- caps M
My ranges
Caps - A - Z
small- a -z
positive - 0 to 9
Negatives - -1 to -9
float - 0.1 to 0.9
I know I'm asking too much but by doing some basic researcg I got idea how to generate combination of Alphanumeric numbers like.
import random, string
x = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(10))
print(x)
This ok...But, I'm completely clueless how to add Negative & float types along with alphanumerics..Any suggestions how to achieve it. Like we have any some shorcuts like string.floatdigits? or string.negatives? I've searched for similar syntax but, But I 'havent found anything

You can generate separate lists according to your ranges and use accordingly.
Generate separate lists according to your ranges
Merge them - Reference
Generate random item n number of times & Join as str - Reference
Code:
import string
import random
list_small=list(string.ascii_lowercase)
list_cap=list(string.ascii_uppercase)
list_dig = list(range(9,-10,-1)) #both postive & Neg
floats= [ round(x * 0.1, 1) for x in range(1, 10)]
#Merge list
all_list = list_small + list_cap + list_dig +floats
s =''.join(str(x) for x in ([random.choice(all_list) for i in range(10)]))
print(s)
#output
ircZC-1bj0.7k

Try this:
import random, string
x = ''.join(random.choice(list(string.ascii_uppercase) +
list(string.ascii_lowercase) +
['0']+[ch+dig for dig in string.digits[1:] for ch in ['', '-', '0.'] ])
for _ in range(10))
print(x)
Random output:0.3z0.7Q0.52-9Ghe
Another random output: Op-7-1R0.8M7wz

Splitting arrays in unequal chunks using np.array_split: overcomplicating?

I am trying to convert an octal number to decimal.
The inputs are a set of strings as numbers such as "23", or "23 24", or "23 24 25". My code works for inputs like this, but cannot handle say "23 240", or "23 240 1" I.e. when the inputs are of different lengths, the array splits them incorrectly.
I think I've overcomplicated it by using arrays. Is there a way to assess each input individually (i.e. "23" then "240" then "1"), and then put these back into the desired output "19 160 1"?
Code:
import numpy as np
def decode(code):
decimals = []
code = code.split()
for n in code:
p = len(n)
for digit in n:
decimal = int(digit) * (8**(p - 1))
decimals.append(decimal)
p -= 1
split_input = np.array_split(decimals, len(code))
sum_decimals = []
for number in split_input:
sum_decimal = sum(map(int, number))
sum_decimals.append(str(sum_decimal))
separate_outputs = " ".join(sum_decimals)
return str(separate_outputs)

Does this work? Try running it in an instant interpreter such as Google Colab
CODE:
import numpy as np
def decode(code):
if isinstance(code, str):
decimals = []
code = code.split(' ')
for n in code:
p = len(n)
for digit in n:
decimal = int(digit) * (8**(p - 1))
decimals.append(decimal)
p -= 1
split_input = np.array_split(decimals, len(code))
sum_decimals = []
for number in split_input:
sum_decimal = sum(map(int, number))
sum_decimals.append(str(sum_decimal))
separate_outputs = " ".join(sum_decimals)
return str(separate_outputs)
INPUT:
str1 = input('Enter octals:') #23 240 1
decode(str1)
OUTPUT:
19 160 1
EXTRAS:
If you copy code make sure to adjust tabs because that could lead to
multiple errors.
Also, you could put an else statement to raise
error if function does not receive a string array. OR, you could add
multiple if...else statements to handle different data formats
like: str, int, list, tuple, etc.

Given a string of a million numbers, return all repeating 3 digit numbers

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)
I pretty much screwed up on the first interview problem...
Question: Given a string of a million numbers (Pi for example), write
a function/program that returns all repeating 3 digit numbers and number of
repetition greater than 1
For example: if the string was: 123412345123456 then the function/program would return:
123 - 3 times
234 - 3 times
345 - 2 times
They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:
000 --> 999
Now that I'm thinking about it, I don't think it's possible to come up with a constant time algorithm. Is it?

You got off lightly, you probably don't want to be working for a hedge fund where the quants don't understand basic algorithms :-)
There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.
Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that's not usually how people use complexity analysis.
It appears to me you could have impressed them in a number of ways.
First, by informing them that it's not possible to do it in O(1), unless you use the "suspect" reasoning given above.
Second, by showing your elite skills by providing Pythonic code such as:
inpStr = '123412345123456'
# O(1) array creation.
freq = [0] * 1000
# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
freq[val] += 1
# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])
This outputs:
[(123, 3), (234, 3), (345, 2)]
though you could, of course, modify the output format to anything you desire.
And, finally, by telling them there's almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.
And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.
Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):
vv
123412 vv
123451
5123456
You can farm these out to separate workers and combine the results afterwards.
The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of "measure, don't guess" applies here, of course.
This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.
For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:
#include <stdio.h>
#include <string.h>
int main(void) {
static char inpStr[100000000+1];
static int freq[1000];
// Set up test data.
memset(inpStr, '1', sizeof(inpStr));
inpStr[sizeof(inpStr)-1] = '\0';
// Need at least three digits to do anything useful.
if (strlen(inpStr) <= 2) return 0;
// Get initial feed from first two digits, process others.
int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
char *inpPtr = &(inpStr[2]);
while (*inpPtr != '\0') {
// Remove hundreds, add next digit as units, adjust table.
val = (val % 100) * 10 + *inpPtr++ - '0';
freq[val]++;
}
// Output (relevant part of) table.
for (int i = 0; i < 1000; ++i)
if (freq[i] > 1)
printf("%3d -> %d\n", i, freq[i]);
return 0;
}

Constant time isn't possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.
For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.

A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:
from collections import Counter
def triple_counter(s):
c = Counter(s[n-3: n] for n in range(3, len(s)))
for tri, n in c.most_common():
if n > 1:
print('%s - %i times.' % (tri, n))
else:
break
if __name__ == '__main__':
import random
s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
triple_counter(s)
Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.
Parallel execution version
I wrote a blog post on this with more explanation.

The simple O(n) solution would be to count each 3-digit number:
for nr in range(1000):
cnt = text.count('%03d' % nr)
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
This would search through all 1 million digits 1000 times.
Traversing the digits only once:
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
for nr, cnt in enumerate(counts):
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
Timing shows that iterating only once over the index is twice as fast as using count.

Here is a NumPy implementation of the "consensus" O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say "385", adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.
def setup_data(n):
import random
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_np(text):
# Get the data into NumPy
import numpy as np
a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
# Rolling triplets
a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)
bins = np.zeros((10, 10, 10), dtype=int)
# Next line performs O(n) binning
np.add.at(bins, tuple(a3), 1)
# Filtering is left as an exercise
return bins.ravel()
def f_py(text):
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
return counts
import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
data = setup_data(n)
ref = f_np(**data)
print(f'n = {n}')
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
try:
assert np.all(ref == func(**data))
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
except:
print("{:16s} apparently crashed".format(name[2:]))
Unsurprisingly, NumPy is a bit faster than #Daniel's pure Python solution on large data sets. Sample output:
# n = 10
# np 0.03481400 ms
# py 0.00669330 ms
# n = 1000
# np 0.11215360 ms
# py 0.34836530 ms
# n = 1000000
# np 82.46765980 ms
# py 360.51235450 ms

I would solve the problem as follows:
def find_numbers(str_num):
final_dict = {}
buffer = {}
for idx in range(len(str_num) - 3):
num = int(str_num[idx:idx + 3])
if num not in buffer:
buffer[num] = 0
buffer[num] += 1
if buffer[num] > 1:
final_dict[num] = buffer[num]
return final_dict
Applied to your example string, this yields:
>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}
This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.

As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn't exists already in the dictionary.
The code will look something like this:
def calc_repeating_digits(number):
hash = {}
for i in range(len(str(number))-2):
current_three_digits = number[i:i+3]
if current_three_digits in hash.keys():
hash[current_three_digits] += 1
else:
hash[current_three_digits] = 1
return hash
You can filter down to the keys which have item value greater than 1.

As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.
However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.
My guess is that either the interviewer misspoke when they gave you the solution, or you misheard "constant time" when they said "constant space."

Here's my answer:
from timeit import timeit
from collections import Counter
import types
import random
def setup_data(n):
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_counter(text):
c = Counter()
for i in range(len(text)-2):
ss = text[i:i+3]
c.update([ss])
return (i for i in c.items() if i[1] > 1)
def f_dict(text):
d = {}
for i in range(len(text)-2):
ss = text[i:i+3]
if ss not in d:
d[ss] = 0
d[ss] += 1
return ((i, d[i]) for i in d if d[i] > 1)
def f_array(text):
a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
for n in range(len(text)-2):
i, j, k = (int(ss) for ss in text[n:n+3])
a[i][j][k] += 1
for i, b in enumerate(a):
for j, c in enumerate(b):
for k, d in enumerate(c):
if d > 1: yield (f'{i}{j}{k}', d)
for n in (1E1, 1E3, 1E6):
n = int(n)
data = setup_data(n)
print(f'n = {n}')
results = {}
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
for r in results:
print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
The array lookup method is very fast (even faster than #paul-panzer's numpy method!). Of course, it cheats since it isn't technicailly finished after it completes, because it's returning a generator. It also doesn't have to check every iteration if the value already exists, which is likely to help a lot.
n = 10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter : []
f_dict : []
f_array : []
n = 1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

Image as answer:
Looks like a sliding window.

Here is my solution:
from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}
With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point.
Hope it helps :)

-Telling from the perspective of C.
-You can have an int 3-d array results[10][10][10];
-Go from 0th location to n-4th location, where n being the size of the string array.
-On each location, check the current, next and next's next.
-Increment the cntr as resutls[current][next][next's next]++;
-Print the values of
results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]
-It is O(n) time, there is no comparisons involved.
-You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.

inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'
count = {}
for i in range(len(inputStr) - 2):
subNum = int(inputStr[i:i+3])
if subNum not in count:
count[subNum] = 1
else:
count[subNum] += 1
print count

Find length of a string that includes its own length?

I want to get the length of a string including a part of the string that represents its own length without padding or using structs or anything like that that forces fixed lengths.
So for example I want to be able to take this string as input:
"A string|"
And return this:
"A string|11"

On the basis of the OP tolerating such an approach (and to provide an implementation technique for the eventual python answer), here's a solution in Java.
final String s = "A String|";
int n = s.length(); // `length()` returns the length of the string.
String t; // the result
do {
t = s + n; // append the stringified n to the original string
if (n == t.length()){
return t; // string length no longer changing; we're good.
}
n = t.length(); // n must hold the total length
} while (true); // round again
The problem of, course, is that in appending n, the string length changes. But luckily, the length only ever increases or stays the same. So it will converge very quickly: due to the logarithmic nature of the length of n. In this particular case, the attempted values of n are 9, 10, and 11. And that's a pernicious case.

A simple solution is :
def addlength(string):
n1=len(string)
n2=len(str(n1))+n1
n2 += len(str(n2))-len(str(n1)) # a carry can arise
return string+str(n2)
Since a possible carry will increase the length by at most one unit.
Examples :
In [2]: addlength('a'*8)
Out[2]: 'aaaaaaaa9'
In [3]: addlength('a'*9)
Out[3]: 'aaaaaaaaa11'
In [4]: addlength('a'*99)
Out[4]: 'aaaaa...aaa102'
In [5]: addlength('a'*999)
Out[5]: 'aaaa...aaa1003'

Here is a simple python port of Bathsheba's answer :
def str_len(s):
n = len(s)
t = ''
while True:
t = s + str(n)
if n == len(t):
return t
n = len(t)
This is a much more clever and simple way than anything I was thinking of trying!
Suppose you had s = 'abcdefgh|, On the first pass through, t = 'abcdefgh|9
Since n != len(t) ( which is now 10 ) it goes through again : t = 'abcdefgh|' + str(n) and str(n)='10' so you have abcdefgh|10 which is still not quite right! Now n=len(t) which is finally n=11 you get it right then. Pretty clever solution!

It is a tricky one, but I think I've figured it out.
Done in a hurry in Python 2.7, please fully test - this should handle strings up to 998 characters:
import sys
orig = sys.argv[1]
origLen = len(orig)
if (origLen >= 98):
extra = str(origLen + 3)
elif (origLen >= 8):
extra = str(origLen + 2)
else:
extra = str(origLen + 1)
final = orig + extra
print final
Results of very brief testing
C:\Users\PH\Desktop>python test.py "tiny|"
tiny|6
C:\Users\PH\Desktop>python test.py "myString|"
myString|11
C:\Users\PH\Desktop>python test.py "myStringWith98Characters.........................................................................|"
myStringWith98Characters.........................................................................|101

Just find the length of the string. Then iterate through each value of the number of digits the length of the resulting string can possibly have. While iterating, check if the sum of the number of digits to be appended and the initial string length is equal to the length of the resulting string.
def get_length(s):
s = s + "|"
result = ""
len_s = len(s)
i = 1
while True:
candidate = len_s + i
if len(str(candidate)) == i:
result = s + str(len_s + i)
break
i += 1

This code gives the result.
I used a few var, but at the end it shows the output you want:
def len_s(s):
s = s + '|'
b = len(s)
z = s + str(b)
length = len(z)
new_s = s + str(length)
new_len = len(new_s)
return s + str(new_len)
s = "A string"
print len_s(s)

Here's a direct equation for this (so it's not necessary to construct the string). If s is the string, then the length of the string including the length of the appended length will be:
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
The idea here is that a direct calculation is only problematic when the appended length will push the length past a power of ten; that is, at 9, 98, 99, 997, 998, 999, 9996, etc. To work this through, 1 + int(log10(len(s))) is the number of digits in the length of s. If we add that to len(s), then 9->10, 98->100, 99->101, etc, but still 8->9, 97->99, etc, so we can push past the power of ten exactly as needed. That is, adding this produces a number with the correct number of digits after the addition. Then do the log again to find the length of that number and that's the answer.
To test this:
from math import log10
def find_length(s):
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
return L1
# test, just looking at lengths around 10**n
for i in range(9):
for j in range(30):
L = abs(10**i - j + 10) + 1
s = "a"*L
x0 = find_length(s)
new0 = s+`x0`
if len(new0)!=x0:
print "error", len(s), x0, log10(len(s)), log10(x0)

How to make a random but partial shuffle in Python?

Instead of a complete shuffle, I am looking for a partial shuffle function in python.
Example : "string" must give rise to "stnrig", but not "nrsgit"
It would be better if I can define a specific "percentage" of characters that have to be rearranged.
Purpose is to test string comparison algorithms. I want to determine the "percentage of shuffle" beyond which an(my) algorithm will mark two (shuffled) strings as completely different.
Update :
Here is my code. Improvements are welcome !
import random
percent_to_shuffle = int(raw_input("Give the percent value to shuffle : "))
to_shuffle = list(raw_input("Give the string to be shuffled : "))
num_of_chars_to_shuffle = int((len(to_shuffle)*percent_to_shuffle)/100)
for i in range(0,num_of_chars_to_shuffle):
x=random.randint(0,(len(to_shuffle)-1))
y=random.randint(0,(len(to_shuffle)-1))
z=to_shuffle[x]
to_shuffle[x]=to_shuffle[y]
to_shuffle[y]=z
print ''.join(to_shuffle)

This is a problem simpler than it looks. And the language has the right tools not to stay between you and the idea,as usual:
import random
def pashuffle(string, perc=10):
data = list(string)
for index, letter in enumerate(data):
if random.randrange(0, 100) < perc/2:
new_index = random.randrange(0, len(data))
data[index], data[new_index] = data[new_index], data[index]
return "".join(data)

Your problem is tricky, because there are some edge cases to think about:
Strings with repeated characters (i.e. how would you shuffle "aaaab"?)
How do you measure chained character swaps or re arranging blocks?
In any case, the metric defined to shuffle strings up to a certain percentage is likely to be the same you are using in your algorithm to see how close they are.
My code to shuffle n characters:
import random
def shuffle_n(s, n):
idx = range(len(s))
random.shuffle(idx)
idx = idx[:n]
mapping = dict((idx[i], idx[i-1]) for i in range(n))
return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
Basically chooses n positions to swap at random, and then exchanges each of them with the next in the list... This way it ensures that no inverse swaps are generated and exactly n characters are swapped (if there are characters repeated, bad luck).
Explained run with 'string', 3 as input:
idx is [0, 1, 2, 3, 4, 5]
we shuffle it, now it is [5, 3, 1, 4, 0, 2]
we take just the first 3 elements, now it is [5, 3, 1]
those are the characters that we are going to swap
s t r i n g
^ ^ ^
t (1) will be i (3)
i (3) will be g (5)
g (5) will be t (1)
the rest will remain unchanged
so we get 'sirgnt'
The bad thing about this method is that it does not generate all the possible variations, for example, it could not make 'gnrits' from 'string'. This could be fixed by making partitions of the indices to be shuffled, like this:
import random
def randparts(l):
n = len(l)
s = random.randint(0, n-1) + 1
if s >= 2 and n - s >= 2: # the split makes two valid parts
yield l[:s]
for p in randparts(l[s:]):
yield p
else: # the split would make a single cycle
yield l
def shuffle_n(s, n):
idx = range(len(s))
random.shuffle(idx)
mapping = dict((x[i], x[i-1])
for i in range(len(x))
for x in randparts(idx[:n]))
return ''.join(s[mapping.get(x,x)] for x in range(len(s)))

import random
def partial_shuffle(a, part=0.5):
# which characters are to be shuffled:
idx_todo = random.sample(xrange(len(a)), int(len(a) * part))
# what are the new positions of these to-be-shuffled characters:
idx_target = idx_todo[:]
random.shuffle(idx_target)
# map all "normal" character positions {0:0, 1:1, 2:2, ...}
mapper = dict((i, i) for i in xrange(len(a)))
# update with all shuffles in the string: {old_pos:new_pos, old_pos:new_pos, ...}
mapper.update(zip(idx_todo, idx_target))
# use mapper to modify the string:
return ''.join(a[mapper[i]] for i in xrange(len(a)))
for i in xrange(5):
print partial_shuffle('abcdefghijklmnopqrstuvwxyz', 0.2)
prints
abcdefghljkvmnopqrstuxwiyz
ajcdefghitklmnopqrsbuvwxyz
abcdefhwijklmnopqrsguvtxyz
aecdubghijklmnopqrstwvfxyz
abjdefgcitklmnopqrshuvwxyz

Evil and using a deprecated API:
import random
# adjust constant to taste
# 0 -> no effect, 0.5 -> completely shuffled, 1.0 -> reversed
# Of course this assumes your input is already sorted ;)
''.join(sorted(
'abcdefghijklmnopqrstuvwxyz',
cmp = lambda a, b: cmp(a, b) * (-1 if random.random() < 0.2 else 1)
))

maybe like so:
>>> s = 'string'
>>> shufflethis = list(s[2:])
>>> random.shuffle(shufflethis)
>>> s[:2]+''.join(shufflethis)
'stingr'
Taking from fortran's idea, i'm adding this to collection. It's pretty fast:
def partial_shuffle(st, p=20):
p = int(round(p/100.0*len(st)))
idx = range(len(s))
sample = random.sample(idx, p)
res=str()
samptrav = 1
for i in range(len(st)):
if i in sample:
res += st[sample[-samptrav]]
samptrav += 1
continue
res += st[i]
return res

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Binary string in Python issues - python

Related

How a create a random string combination of postive, Negative, floats, Upper case, Lower case elements

Splitting arrays in unequal chunks using np.array_split: overcomplicating?

Given a string of a million numbers, return all repeating 3 digit numbers

Find length of a string that includes its own length?

How to make a random but partial shuffle in Python?

Categories

Resources