I'm looking for help in creating a script to add periods to a string in every place but first and last, using as many periods as needed to create as many combinations as possible:
The output for the string 1234 would be:
["1234", "1.234", "12.34", "123.4", "1.2.34", "1.23.4" etc. ]
And obviously this needs to work for all lengths of string.
You should try to solve this type of problem yourself; these are simple data-manipulation algorithms that you should be able to come up with.
However, here is the solution (long version for more clarity):
my_str = "1234"  # original string

# recursive function for constructing dots
def construct_dot(s, t):
    # s - the string to put dots in
    # t - number of dots to put
    # zero dots will return the original string in a list (stop criterion)
    if t == 0:
        return [s]
    # allocation for results list
    new_list = []
    # iterate over the next dot location, considering the remaining dots.
    for p in range(1, len(s) - t + 1):
        new_str = str(s[:p]) + '.'  # put the dot in the location
        res_str = str(s[p:])  # crop the string from the dot to the end
        sub_list = construct_dot(res_str, t-1)  # make a list with t-1 dots (recursive)
        # append concatenated strings
        for sl in sub_list:
            new_list.append(new_str + sl)
    # we end up with a list of the strings with the dots.
    return new_list

# now we iterate over the number of dots that we want to put in the string.
# 0 dots will return the original string, and we can put at most len(string) - 1 dots.
all_list = []
for n_dots in range(len(my_str)):
    all_list.extend(construct_dot(my_str, n_dots))

# and see the results
print(all_list)
Output is:
['1234', '1.234', '12.34', '123.4', '1.2.34', '1.23.4', '12.3.4', '1.2.3.4']
A concise solution without recursion: using binary combinations (think of 0, 1, 10, 11, etc) to determine where to insert the dots.
Between each letter, put a dot when there's a 1 at this index and an empty string when there's a 0.
your_string = "1234"

def dot_combinations(string):
    i = 0
    combinations = []
    # Iterate while the binary representation length is smaller than the string size
    while i.bit_length() < len(string):
        current_word = []
        for index, letter in enumerate(string):
            current_word.append(letter)
            # Append a dot if there's a 1 in this position
            if (1 << index) & i:
                current_word.append(".")
        i += 1
        combinations.append("".join(current_word))
    return combinations

print(dot_combinations(your_string))
Output:
['1234', '1.234', '12.34', '1.2.34', '123.4', '1.23.4', '12.3.4', '1.2.3.4']
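The "one bit per gap" idea can also be sketched with itertools.product, choosing either an empty string or a dot for each gap between characters. This is only an illustrative variant, not code from either answer; the name dot_combinations_product is made up here, and the results come out in a different order than the listings above.

```python
from itertools import product

def dot_combinations_product(text):
    """Return every way of placing dots in the gaps between characters."""
    if not text:
        return [text]
    results = []
    # One separator choice ("" or ".") per gap between adjacent characters.
    for seps in product(["", "."], repeat=len(text) - 1):
        pieces = [text[0]]
        for sep, ch in zip(seps, text[1:]):
            pieces.append(sep + ch)
        results.append("".join(pieces))
    return results

print(dot_combinations_product("1234"))
```

A string of length n has n - 1 gaps, so the list always holds 2 ** (n - 1) entries.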
I'm trying to find a pattern in a string. Example:
trail = 'AABACCCACCACCACCACCACC'. One can note the "ACC" repetition after a prefix of AAB, so the result should be AAB(ACC).
How can I do this without using regex (import re)? What I did so far:
def get_pattern(trail):
    for j in range(0, len(trail)):
        k = j+1
        while k < len(trail) and trail[j] != trail[k]:
            k += 1
        if k == len(trail)-1:
            continue
        window = ''
        stop = trail[j]
        m = j
        while m < len(trail) and k < len(trail) and trail[m] == trail[k]:
            window += trail[m]
            m += 1
            k += 1
            if trail[m] == stop and len(window) > 1:
                break
        if len(window) > 1:
            prefix = ''
            if j > 0:
                prefix = trail[0:j]
            return prefix+'('+window+')'
    return False
This (almost) does the trick, but in a case like this:
"AAAAAAAAAAAAAAAAAABDBDBDBDBDBDBDBDBDBDBDBDBDBDBDBD"
the result is AA but it should be: AAAAAAAAAAAAAAAAAA(BD)
The issue with your code is that once you find a repetition that is of length 2 or greater, you don't check forward to make sure it's maintained. In your second example, this causes it to grab onto the 'AA' without seeing the 'BD's that follow.
Since we know we're dealing with cases of prefix + window, it makes sense to instead look from the end rather than the beginning.
def get_pattern(string):
    str_len = len(string)
    splits = [[string[i-rep_length: i] for i in range(str_len, 0, -rep_length)] for rep_length in range(1, str_len//2)]
    reps = [[window == split[0] for window in split].index(False) for split in splits]
    prefix_lengths = [str_len - (i+1)*rep for i, rep in enumerate(reps)]
    shortest_prefix_length = min(prefix_lengths)
    indices = [i for i, pre_len in enumerate(prefix_lengths) if pre_len == shortest_prefix_length]
    reps = list(map(reps.__getitem__, indices))
    splits = list(map(splits.__getitem__, indices))
    max_reps = max(reps)
    window = splits[reps.index(max_reps)][0]
    prefix = string[0:shortest_prefix_length]
    return f'{prefix}({window})' if max_reps > 1 else None
splits uses list comprehension to create a list of lists where each sublist splits the string into rep_length sized pieces starting from the end.
For each sublist split, the first split[0] is our proposed pattern and we see how many times that it's repeated. This is easily done by finding the first instance of False when checking window == split[0] using the list.index() function. We also want to calculate the size of the prefix. We want the shortest prefix with the largest number of reps. This is because of nasty edge cases like jeifjeiAABBBBBBBBBBBBBBAABBBBBBBBBBBBBBAABBBBBBBBBBBBBBAABBBBBBBBBBBBBB where the window has B that repeats more than the window itself. Additionally, anything that repeats 4 times can also be seen as a double-sized window repeated twice.
If you want to deal with an additional suffix, we can do a hacky solution by just trimming from the end until get_pattern() returns a pattern and then just append what was trimmed:
def get_pattern_w_suffix(string):
    for i in range(len(string), 0, -1):
        pattern = get_pattern(string[0:i])
        suffix = string[i:]
        if pattern is not None:
            return pattern + suffix
    return None
However, this assumes that the suffix doesn't have a pattern itself.
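For comparison, the "shortest prefix, then a window repeated to the end" idea can be written as a plain brute-force search. This is a minimal sketch, not the code from the answer above: the name find_repeating_suffix is hypothetical, it assumes there is no suffix after the repetitions, and it requires the window to repeat at least twice.

```python
def find_repeating_suffix(text):
    """Return 'prefix(window)' for the shortest prefix that is followed by a
    window repeated at least twice up to the end of text, else None."""
    for prefix_len in range(len(text)):
        rest = text[prefix_len:]
        # Try the shortest window first; it must fit at least twice.
        for window_len in range(1, len(rest) // 2 + 1):
            if len(rest) % window_len:
                continue  # the window must tile the remainder exactly
            window = rest[:window_len]
            if window * (len(rest) // window_len) == rest:
                return f"{text[:prefix_len]}({window})"
    return None

print(find_repeating_suffix("A" * 18 + "BD" * 16))  # AAAAAAAAAAAAAAAAAA(BD)
print(find_repeating_suffix("AABACCACCACC"))        # AAB(ACC)
```

It tries prefixes from shortest to longest and windows from shortest to longest, so it is roughly cubic in the string length in the worst case; that is fine for short strings but not for very long ones.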
I have a CSV file with the following data:
bel.lez.za;bellézza
e.la.bo.ra.re;elaboràre
a.li.an.te;alïante
u.mi.do;ùmido
The first value is the word divided into syllables and the second shows the stress.
I'd like to merge the two pieces of information and obtain the following output:
bel.léz.za
e.la.bo.rà.re
a.lï.an.te
ù.mi.do
I computed the position of the stressed vowel and tried to substitute the same unstressed vowel in the first value, but the full stops make indexing difficult. Is there a way to tell Python to ignore full stops while counting? Or is there an easier way to do it? Thanks.
After splitting the two values for each line I computed the position of the stressed vowels:
char_list = ['ò', 'à', 'ù', 'ì', 'è', 'é', 'ï']
for character in char_list:
    if character in value[1]:
        position_of_stressed_vowel = value[1].index(character)
I'd suggest merging/aligning the two forms in parallel instead of trying to substitute things via indexing. The idea is to iterate through the plain form and take out one character from the accented form for every character from the plain form, keeping dots as they are.
(Or perhaps, the idea is to add the dots to the accented form instead of adding the accented characters to the syllabified form.)
def merge_accents(plain, accented):
    output = ""
    acc_chars = iter(accented)
    for char in plain:
        if char == ".":
            output += char
        else:
            output += next(acc_chars)
    return output
Test:
data = [['bel.lez.za', 'bellézza'],
        ['e.la.bo.ra.re', 'elaboràre'],
        ['a.li.an.te', 'alïante'],
        ['u.mi.do', 'ùmido']]

# Returns
# bel.léz.za
# e.la.bo.rà.re
# a.lï.an.te
# ù.mi.do
for plain, accented in data:
    print(merge_accents(plain, accented))
Is there a way to tell python to ignore full stops while counting?
Yes, by implementing it yourself using an index lookup that tells you which index in the dot-delimited string an index in the word is equivalent to:
i = 0
corrected_index = []
for char in value[0]:
    if char != ".":
        corrected_index.append(i)
    i += 1
Now you can correct the index and replace the character. Strings are immutable, so build a new string instead of assigning to an index:
idx = corrected_index[position_of_stressed_vowel]
value[0] = value[0][:idx] + character + value[0][idx+1:]
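In Python 3 a str is indexed by code point, so the encoding does not matter here; just make sure each "stressed vowel" is a single precomposed character (NFC form) rather than a letter followed by a combining accent, so that it occupies a single index.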
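Put together, the index-lookup approach might look like the sketch below. The function name stress_syllabified and the accents parameter are made up for illustration, and normalizing to NFC is an assumption about how the input may be encoded.

```python
import unicodedata

def stress_syllabified(syllabified, stressed, accents="òàùìèéï"):
    """Copy accented vowels from `stressed` into the dotted form."""
    # Assumption: NFC turns "letter + combining accent" into one code point.
    stressed = unicodedata.normalize("NFC", stressed)
    accents = unicodedata.normalize("NFC", accents)
    # corrected_index[k] = position in `syllabified` of the k-th non-dot character.
    corrected_index = [i for i, ch in enumerate(syllabified) if ch != "."]
    chars = list(syllabified)  # str is immutable, so edit a list instead
    for pos, ch in enumerate(stressed):
        if ch in accents:
            chars[corrected_index[pos]] = ch
    return "".join(chars)

print(stress_syllabified("bel.lez.za", "bellézza"))  # bel.léz.za
```

This assumes both forms contain the same letters in the same order once the dots are removed.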
You can loop over the two halves of the string, keeping track of the index in the first half (excluding the dots), and add the character at the tracked index from the second half of the string to a buffer (modified) string, as in the code below:
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
converted_data = []
# Loop over the data.
for pair in data:
    # Split the pair on ";"
    first_half, second_half = pair.split(';')
    # Create variables to keep track of the current letter and the modified string.
    current_letter = 0
    modified_second_half = ''
    # Loop over the letters of the first half of the string.
    for current_char in first_half:
        # If the current_char is a dot, add it to the modified string.
        if current_char == '.':
            modified_second_half += '.'
        # If the current_char is not a dot, add the current letter from the second half
        # to the modified string, and update the current letter value.
        else:
            modified_second_half += second_half[current_letter]
            current_letter += 1
    converted_data.append(modified_second_half)
print(converted_data)
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']

def slice_same(input, lens):
    # slices the given string into the given lengths.
    res = []
    strt = 0
    for size in lens:
        res.append(input[strt : strt + size])
        strt += size
    return res

# split into two.
data = [x.split(';') for x in data]
# Add third column that's the length of each piece.
data = [[x, y, [len(z) for z in x.split('.')]] for x, y in data]
# Put text and lens through function.
data = ['.'.join(slice_same(y, z)) for x, y, z in data]
print(data)
Output:
['bel.léz.za',
'e.la.bo.rà.re',
'a.lï.an.te',
'ù.mi.do']
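Since the data comes from a semicolon-separated CSV file, the rows could be read with the csv module before merging. The sketch below uses io.StringIO as a stand-in for the real file, whose name is not given in the question; merge_row is a hypothetical helper that re-inserts the dots into the stressed form.

```python
import csv
import io

def merge_row(syllabified, stressed):
    """Walk the dotted form, taking letters from the stressed form."""
    stressed_chars = iter(stressed)
    out = []
    for ch in syllabified:
        out.append(ch if ch == "." else next(stressed_chars))
    return "".join(out)

# Stand-in for the CSV file; with a real file, something like
# open(path, newline="", encoding="utf-8") would take its place.
sample = "bel.lez.za;bellézza\nu.mi.do;ùmido\n"
with io.StringIO(sample) as handle:
    merged = [merge_row(*row) for row in csv.reader(handle, delimiter=";")]

print(merged)
```

Each row is expected to have exactly two fields; a malformed row would raise an error rather than be skipped.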
I was previously working on a problem of String encryption: How to add randomly generated characters in specific locations in a string? (obfuscation to be more specific).
Now I am working on its second part that is to remove the randomly added characters and digits from the obfuscated String.
My code works for removing one random character and digit from the string (when encryption_str is set to 1) but for removing two, three .. nth .. number of characters (when encryption_str is set to 2, 3 or n), I don't understand how to modify it.
My Code:
import string, random

def decrypt():
    encryption_str = 2  # Doesn't produce correct output when set to any other number except 1
    data = "osqlTqlmAe23h"
    content = data[::-1]
    print("Modified String: ", content)
    result = []
    result[:0] = content
    indices = []
    for i in range(0, encryption_str+3):  # I don't understand how to change it
        indices.append(i)
    for i in indices:
        del result[i+1]
    message = "".join(result)
    print("Original String: ", message)

decrypt()
Output for Encryption level 1 (Correct Output)
Output for Encryption level 2 (Incorrect Output)
Appending characters is easy; removing them is a bit more difficult, because that changes the string length and the positions of the characters.
But there is an easy way: retrieve the good ones instead. For that, you just need to iterate with encryption_str+1 as the step (which avoids adding an if on the index).
def decrypt(content, nb_random_chars):
    content = content[::-1]
    result = []
    for i in range(0, len(content), nb_random_chars + 1):
        result.append(content[i])
    # The 3 lines above can be written as 1:
    # result = [content[i] for i in range(0, len(content), nb_random_chars + 1)]
    message = "".join(result)
    print("Modified String: ", content)
    print("Original String: ", message)

Both calls will give hello:
decrypt("osqlTqlmAe23h", 2)
decrypt("osqFlTFqlmFAe2F3h", 3)
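Why not try some modulo arithmetic? Each original character is followed by encryption_str random ones, so keep every index that is a multiple of encryption_str + 1 in the reversed string. Maybe something like:
''.join([x for num, x in enumerate(data[::-1]) if num % (encryption_str + 1) == 0])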
How about a slice (which is really just a slightly more compact notation for @azro's answer)?
result = content[0::(encryption_str+1)]
That is, take every (encryption_str+1)-th character from content, starting with the first.
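A round trip makes the step size easy to check. In the sketch below, obfuscate is a hypothetical stand-in for the first part of the exercise (its real code is not shown here): it is assumed to insert n_random random characters after every character except the last and then reverse the result, which matches the sample string "osqlTqlmAe23h".

```python
import random
import string

def obfuscate(message, n_random, rng=random):
    """Hypothetical encoder: pad each character with noise, then reverse."""
    alphabet = string.ascii_letters + string.digits
    padded = []
    for i, ch in enumerate(message):
        padded.append(ch)
        if i < len(message) - 1:
            padded.extend(rng.choice(alphabet) for _ in range(n_random))
    return "".join(padded)[::-1]

def deobfuscate(data, n_random):
    """Reverse back, then keep every (n_random + 1)-th character."""
    return data[::-1][::n_random + 1]

print(deobfuscate("osqlTqlmAe23h", 2))         # hello
print(deobfuscate(obfuscate("hello", 3), 3))   # hello
```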
I have a string like this in python3:
ab_cdef_ghilm__nop_q__rs
Starting from a specific character, chosen by its index position, I want to slice a window around this character with 5 characters on each side. But if the _ character is found, it has to be skipped, moving on to the next character. For example, considering the character "i" in this string, I want a final string of 11 characters around the "i", skipping the _ character every time it occurs, outputting this:
defghilmnop
Consider that I have long strings and I want to decide the index position where I want to do this.
In this case index=10.
Is there a command that crops a string to a specific size while skipping a specific character?
For the moment, what I'm able to do is remove the _ from the string while counting the number of _ occurrences, use that count to shift the middle index position, and finally crop a window of the desired size. But I want something more incremental, so if I could just jump every time it finds a "_", that would be perfect.
Situation B) index=13
I want to have 5 characters on the left and 5 on the right of this index, getting rid of (and not counting) the _ characters, giving this output:
ghilmnopqrs
So basically, when the index corresponds to a character, start from it; when the index corresponds to a _ character, we have to shift to the right, up to the next character, to end up with a string of 11 characters.
To make a long story short, the output is 11 characters with the index position in the middle. If the index position is a _, we have to skip this character and take the closest character as the middle one.
I don't think there's a specific command for this, but you could build your own.
For example:
s = 'ab_cdef_ghilm__nop_q__rs'

def get_slice(s, idx, n=5, ignored_chars='_'):
    if s[idx] in ignored_chars:
        # adjust idx to first valid on right side:
        idx = next((i for i, ch in enumerate(s[idx:], idx) if ch not in ignored_chars), None)
        if idx is None:
            return ''
    d = {i: ch for i, ch in enumerate(s) if ch not in ignored_chars}
    if idx in d:
        keys = [k for k in d.keys()]
        idx = keys.index(idx)
        return ''.join(d[k] for k in keys[max(0, idx-n):min(idx+n+1, len(s))])

print(get_slice(s, 10, 5, '_'))
print(get_slice(s, 13, 5, '_'))
Prints:
defghilmnop
ghilmnopqrs
In case print(get_slice(s, 1, 5, '_')):
abcdefg
EDIT: Added check for starting index equals ignored char.
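A more compact variant of the same idea keeps only the positions of the characters that count and uses bisect to find the center. This is a sketch under the same rules as the question (shift right when the index lands on a skipped character); the name window_around is made up for illustration.

```python
from bisect import bisect_left

def window_around(text, index, n=5, skip="_"):
    """Return up to n kept characters on each side of the character at index."""
    # Original positions of the characters that are not skipped (sorted).
    kept = [i for i, ch in enumerate(text) if ch not in skip]
    # First kept position at or after index: shifts right past skipped characters.
    center = bisect_left(kept, index)
    if center == len(kept):
        return ""  # nothing but skipped characters from index onward
    start = max(0, center - n)
    return "".join(text[pos] for pos in kept[start:center + n + 1])

s = "ab_cdef_ghilm__nop_q__rs"
print(window_around(s, 10))  # defghilmnop
print(window_around(s, 13))  # ghilmnopqrs
```

Near the ends of the string the window is simply shorter, as in the index=1 case above.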
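You can define a function like the one below, which slices the string so that it has the given number of non-"_" characters on the left and right of the index, and then drops the "_" characters from the result:
st = "ab_cdef_ghilm__nop_q__rs"

def slice_around(st, ind, c_count):
    # cp[i] is True where st[i] is a character that counts
    cp = [char != "_" for char in st]
    # widen to the right until the span starting at ind holds c_count counted characters
    for i in range(len(st)):
        if sum(cp[ind:ind+i]) == c_count:
            break
    right = ind + i
    # widen to the left until c_count counted characters precede ind
    for i in range(len(st)):
        if sum(cp[ind-i:ind]) == c_count:
            break
    left = ind - i
    # the raw slice still contains "_", so remove it before returning
    return st[left:right+1].replace("_", "")

print(slice_around(st, 10, 5))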
I am trying to create a loop that generates strings. What I want is a small collection of strings from 1 character up to 5 characters long.
So, starting from the string 1, I want to go up to 55555. With numbers this seems easy, since I can just add to them, but with alphanumeric characters it gets tricky.
Here is the explanation:
I have a collection of alphanumeric chars as a string, s = "123ABC", and I want to create all possible 1-character strings out of it, so I will have 1, 2, 3, A, B, C. After that I want to increase the string length by one so I get 11, 12, 13 and so on, until I have every possible combination up to CA, CB, CC, and I want to continue up to CCCCCC. I am confused by the loop: I can get it to generate a temp string, but looping inside it to rotate the characters is tricky.
this is what I have done so far,
i = 0
strr = "123ABC"
while i < len(strr):
    t = strr[0] * (i+1)
    for q in range(0, len(t)):
        # Here I need help to rotate more
        pass
    i += 1
Can anyone explain this to me or point me to a resource where I can find a solution?
You may want to use the itertools.permutations function (note that a permutation never reuses a character, so it yields "12" but not "11"):
import itertools
chars = '123ABC'
for i in range(1, len(chars)+1):
    print(list(itertools.permutations(chars, i)))
EDIT:
To get a list of strings, try this:
import itertools
chars = '123ABC'
strings = []
for i in range(1, len(chars)+1):
    strings.extend(''.join(x) for x in itertools.permutations(chars, i))
This is a nested loop. Different depths of recursion produce all possible combinations.
strr = "123ABC"

def prod(items, level):
    if level == 0:
        yield []
    else:
        for first in items:
            for rest in prod(items, level-1):
                yield [first] + rest

for ln in range(1, len(strr)+1):
    print("length:", ln)
    for s in prod(strr, ln):
        print(''.join(s))
This is also called the Cartesian product, and there is a corresponding function in itertools (itertools.product).
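A minimal sketch of that itertools route, using product with the repeat argument so that characters may repeat ("11", "CC"). The generator name all_strings and the max_len parameter are chosen here for illustration.

```python
from itertools import product

def all_strings(chars, max_len):
    """Yield every string over chars with length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        # product(chars, repeat=length) is the Cartesian product chars x ... x chars
        for combo in product(chars, repeat=length):
            yield "".join(combo)

words = list(all_strings("123ABC", 2))
print(words[:8])   # ['1', '2', '3', 'A', 'B', 'C', '11', '12']
print(len(words))  # 6 + 36 = 42
```

Because it is a generator, longer limits can be consumed lazily; the count grows as 6 + 6**2 + ... + 6**max_len, so materializing the full list only makes sense for small max_len.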