How to delete randomly inserted characters at specific locations in a string? - python

I was previously working on a string obfuscation problem: How to add randomly generated characters in specific locations in a string?
Now I am working on the second part: removing the randomly added characters and digits from the obfuscated string.
My code works when one random character or digit has to be removed between the original characters (encryption_str set to 1), but I don't understand how to modify it to remove two, three, ... n characters (encryption_str set to 2, 3 or n).
My Code:
import string, random
def decrypt():
    encryption_str = 2  # Doesn't produce correct output when set to any other number except 1
    data = "osqlTqlmAe23h"
    content = data[::-1]
    print("Modified String: ", content)
    result = []
    result[:0] = content
    indices = []
    for i in range(0, encryption_str+3):  # I don't understand how to change it
        indices.append(i)
    for i in indices:
        del result[i+1]
    message = "".join(result)
    print("Original String: ", message)
decrypt()
Output for Encryption level 1 (Correct Output)
Output for Encryption level 2 (Incorrect Output)

Appending characters is easy; removing them is a bit more difficult, because each deletion changes the string length and shifts the positions of the remaining characters.
But there is an easy way: instead of deleting the bad characters, retrieve the good ones. For that you just need to iterate with encryption_str+1 as the step (which avoids adding an if on the index).
def decrypt(content, nb_random_chars):
    content = content[::-1]
    result = []
    for i in range(0, len(content), nb_random_chars + 1):
        result.append(content[i])
    message = "".join(result)
    print("Modified String: ", content)
    print("Original String: ", message)
# The three lines that build result can be written as one:
result = [content[i] for i in range(0, len(content), nb_random_chars + 1)]
Both calls below give hello as the original string:
decrypt("osqlTqlmAe23h", 2)
decrypt("osqFlTFqlmFAe2F3h", 3)

Why not try some modulo arithmetic? After reversing the string as in your code (content = data[::-1]), keep only the characters whose index is a multiple of encryption_str + 1:
''.join([x for num, x in enumerate(content) if num % (encryption_str + 1) == 0])

How about a slice (which is really just a more compact notation for @azro's answer)?
result = content[0::(encryption_str+1)]
That is, take every (encryption_str+1)-th character from content, starting with the first.
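To check any of the approaches above end to end, here is a minimal round-trip sketch. The obfuscate helper is hypothetical: it assumes the scheme suggested by the sample data (n random letters or digits inserted after every character except the last, then the whole string reversed), which may differ from the code in the linked question. The name deobfuscate is also only illustrative.

```python
import random
import string


def obfuscate(message, n):
    """Hypothetical encoder (assumed scheme): pad with n random chars, then reverse."""
    alphabet = string.ascii_letters + string.digits
    padded = []
    for pos, ch in enumerate(message):
        padded.append(ch)
        if pos < len(message) - 1:
            # n random filler characters between two real characters
            padded.extend(random.choice(alphabet) for _ in range(n))
    return "".join(padded)[::-1]


def deobfuscate(data, n):
    """Undo the reversal, then keep every (n + 1)-th character."""
    return data[::-1][::n + 1]


print(deobfuscate("osqlTqlmAe23h", 2))
print(deobfuscate("osqFlTFqlmFAe2F3h", 3))
print(deobfuscate(obfuscate("hello world", 4), 4))
```

Because the filler characters are skipped rather than deleted, no index ever shifts, which is what made the original del-based loop hard to generalize.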

Related

How to substitute unstressed vowel?

I have a CSV file with the following data:
bel.lez.za;bellézza
e.la.bo.ra.re;elaboràre
a.li.an.te;alïante
u.mi.do;ùmido
The first value is the word divided into syllables and the second marks the stress.
I'd like to merge the two pieces of information and obtain the following output:
bel.léz.za
e.la.bo.rà.re
a.lï.an.te
ù.mi.do
I computed the position of the stressed vowel and tried to substitute it for the corresponding unstressed vowel in the first value, but the full stops make indexing difficult. Is there a way to tell python to ignore full stops while counting? Or is there an easier way to do it? Thanks.
After splitting the two values for each line I computed the position of the stressed vowels:
char_list = ['ò', 'à', 'ù', 'ì', 'è', 'é', 'ï']
for character in char_list:
    if character in value[1]:
        position_of_stressed_vowel = value[1].index(character)
I'd suggest merging/aligning the two forms in parallel instead of trying to substitute things via indexing. The idea is to iterate through the plain form and take out one character from the accented form for every character from the plain form, keeping dots as they are.
(Or perhaps, the idea is to add the dots to the accented form instead of adding the accented characters to the syllabified form.)
def merge_accents(plain, accented):
    output = ""
    acc_chars = iter(accented)
    for char in plain:
        if char == ".":
            output += char
        else:
            output += next(acc_chars)
    return output
Test:
data = [['bel.lez.za', 'bellézza'],
        ['e.la.bo.ra.re', 'elaboràre'],
        ['a.li.an.te', 'alïante'],
        ['u.mi.do', 'ùmido']]
# Prints
# bel.léz.za
# e.la.bo.rà.re
# a.lï.an.te
# ù.mi.do
for plain, accented in data:
    print(merge_accents(plain, accented))
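Since the data comes from a CSV file, the same merge can be applied while reading it. The following is only a sketch: it uses an in-memory io.StringIO as a stand-in for the real file, and assumes the file is semicolon-delimited with exactly two columns per row, as in the sample.

```python
import csv
import io


def merge_accents(plain, accented):
    """Copy the dots from `plain`, take every other character from `accented`."""
    acc_chars = iter(accented)
    output = []
    for char in plain:
        output.append(char if char == "." else next(acc_chars))
    return "".join(output)


# Stand-in for the real file (assumption: two ';'-separated columns per row).
sample = io.StringIO(
    "bel.lez.za;bellézza\n"
    "e.la.bo.ra.re;elaboràre\n"
    "a.li.an.te;alïante\n"
    "u.mi.do;ùmido\n"
)

merged = [merge_accents(plain, accented)
          for plain, accented in csv.reader(sample, delimiter=";")]
print(merged)
```

With a real file, the StringIO object would be replaced by something like open(path, newline="", encoding="utf-8"), where path and the encoding are whatever applies to your data.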
Is there a way to tell python to ignore full stops while counting?
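Yes, by implementing it yourself with an index lookup that tells you which index in the dot-delimited string corresponds to a given index in the plain word:
i = 0
corrected_index = []
for char in value[0]:
    if char != ".":
        corrected_index.append(i)
    i += 1
Now you can correct the index and replace the character. Strings are immutable in Python, so build a new string instead of assigning to an index:
k = corrected_index[position_of_stressed_vowel]
value[0] = value[0][:k] + character + value[0][k+1:]
Make sure each stressed vowel occupies a single index. In Python 3, str is indexed by code point, so use precomposed characters (for example, normalize with unicodedata.normalize('NFC', ...)); in Python 2, decode the bytes to unicode first.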
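Put together, the index-lookup idea could look like the sketch below. The function name place_stress and the STRESSED constant are made up for illustration; the vowel set is the one from the question, and the NFC normalization is an added precaution rather than something the question requires.

```python
import unicodedata

STRESSED = "òàùìèéï"  # vowel set taken from the question's char_list


def place_stress(syllabified, stressed):
    """Copy stressed vowels from `stressed` into the dotted `syllabified` form."""
    # Precomposed form, so that each accented vowel is a single code point.
    stressed = unicodedata.normalize("NFC", stressed)
    # corrected_index[k] = position in the dotted string of the k-th letter.
    corrected_index = [i for i, ch in enumerate(syllabified) if ch != "."]
    chars = list(syllabified)  # a list, because str is immutable
    for pos, ch in enumerate(stressed):
        if ch in STRESSED:
            chars[corrected_index[pos]] = ch
    return "".join(chars)


print(place_stress("bel.lez.za", "bellézza"))
print(place_stress("a.li.an.te", "alïante"))
```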
You can loop over the two halves of each line: iterate over the first half, keep track of the current letter index (excluding the dots), and append the letter at that index from the second half to a buffer (modified) string. Like the code below:
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
converted_data = []
# Loop over the data.
for pair in data:
    # Split the pair on ";".
    first_half, second_half = pair.split(';')
    # Create variables to keep track of the current letter and the modified string.
    current_letter = 0
    modified_second_half = ''
    # Loop over the characters of the first half of the string.
    for current_char in first_half:
        # If the current_char is a dot, add it to the modified string.
        if current_char == '.':
            modified_second_half += '.'
        # If the current_char is not a dot, add the current letter from the second half
        # to the modified string, and update the current letter value.
        else:
            modified_second_half += second_half[current_letter]
            current_letter += 1
    converted_data.append(modified_second_half)
print(converted_data)
data = ['bel.lez.za;bellézza',
'e.la.bo.ra.re;elaboràre',
'a.li.an.te;alïante',
'u.mi.do;ùmido']
def slice_same(input, lens):
    # slices the given string into the given lengths.
    res = []
    strt = 0
    for size in lens:
        res.append(input[strt : strt + size])
        strt += size
    return res
# split into two.
data = [x.split(';') for x in data]
# Add third column that's the length of each piece.
data = [[x, y, [len(z) for z in x.split('.')]] for x, y in data]
# Put text and lens through function.
data = ['.'.join(slice_same(y, z)) for x, y, z in data]
print(data)
Output:
['bel.léz.za',
'e.la.bo.rà.re',
'a.lï.an.te',
'ù.mi.do']

Remove punctuation items from end of string

I have a seemingly simple problem which I cannot seem to solve. Given a string containing a DOI, I need to keep removing the last character while it is a punctuation mark, until the last character is a letter or a number.
For example, if the string was:
sampleDoi = "10.1097/JHM-D-18-00044.',"
I want the following output:
"10.1097/JHM-D-18-00044"
ie. remove .',
I wrote the following script to do this:
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print (a)
        break
However, this produces 10.1097/JHM-D-18-00 but I would like it to produce 10.1097/JHM-D-18-00044. Why is the 44 removed from the end?
The string function rstrip() is designed to do exactly this:
>>> sampleDoi = "10.1097/JHM-D-18-00044.',"
>>> sampleDoi.rstrip(",.'")
'10.1097/JHM-D-18-00044'
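If the set of trailing characters is not known in advance, rstrip can be given the whole punctuation set. The sketch below assumes, like the question's invalidChars, that an underscore should be kept; clean_doi and TRAILING are illustrative names only.

```python
import string

# All ASCII punctuation except "_", mirroring invalidChars in the question.
TRAILING = string.punctuation.replace("_", "")


def clean_doi(doi):
    """Strip any run of trailing punctuation (underscore excluded)."""
    return doi.rstrip(TRAILING)


print(clean_doi("10.1097/JHM-D-18-00044.',"))
```

rstrip treats its argument as a set of characters, not as a suffix, so the order of the characters in TRAILING does not matter.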
Corrected code:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i  # Well, really this line can just be removed altogether.
    else:
        print (a)
        break
This gives the output you want while keeping the original code mostly the same.
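As for why the 44 disappeared in the original version: i was decremented after every cut, so each a[:i] removed one character more than the previous one (1, then 2, then 3), although a had already been shortened. A small trace of the first three iterations, written only as an illustration, makes this visible:

```python
a = "10.1097/JHM-D-18-00044.',"
steps = []
i = -1
for _ in range(3):  # the three trailing punctuation marks: , ' .
    a = a[:i]       # cuts 1, then 2, then 3 characters
    steps.append(a)
    i -= 1

for step in steps:
    print(step)
```

Keeping i fixed at -1 removes exactly one character per punctuation mark, which is what the corrected code does.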
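This is one way using next and str.isalnum with a generator expression utilizing enumerate / reversed.
sampleDoi = "10.1097/JHM-D-18-00044.',"
idx = next((i for i, j in enumerate(reversed(sampleDoi)) if j.isalnum()), len(sampleDoi))
res = sampleDoi[:len(sampleDoi) - idx]
print(res)
10.1097/JHM-D-18-00044
The default len(sampleDoi) is used so that, if no alphanumeric character is found, an empty string is returned. Slicing with len(sampleDoi) - idx instead of -idx also keeps the whole string when it already ends in an alphanumeric character (idx is 0 in that case, and sampleDoi[:-0] would be empty).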
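If you don't want to use a regex:
import string
the_str = "10.1097/JHM-D-18-00044.',"
while the_str and the_str[-1] in string.punctuation:
    the_str = the_str[:-1]
This removes the last character until it is no longer a punctuation character (the extra the_str check stops the loop if the string becomes empty).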

Python script to make every combination of a string with placed characters

I'm looking for help in creating a script to add periods to a string in every place but first and last, using as many periods as needed to create as many combinations as possible:
The output for the string 1234 would be:
["1234", "1.234", "12.34", "123.4", "1.2.34", "1.23.4" etc. ]
And obviously this needs to work for all lengths of string.
You should try to solve this type of problem yourself; these are simple data-manipulation algorithms that you should know how to come up with.
However, here is the solution (long version for more clarity):
my_str = "1234"  # original string
# recursive function for constructing dots
def construct_dot(s, t):
    # s - the string to put dots in
    # t - number of dots to put
    # zero dots will return the original string in a list (stop criterion)
    if t == 0:
        return [s]
    # allocation for results list
    new_list = []
    # iterate over the next dot location, considering the remaining dots.
    for p in range(1, len(s) - t + 1):
        new_str = str(s[:p]) + '.'  # put the dot in the location
        res_str = str(s[p:])  # crop the string from the dot to the end
        sub_list = construct_dot(res_str, t-1)  # make a list with t-1 dots (recursive)
        # append concatenated strings
        for sl in sub_list:
            new_list.append(new_str + sl)
    # we end up with a list of the strings with the dots.
    return new_list
# now we will iterate over the number of dots that we want to put in the string.
# 0 dots will return the original string, and we can put a maximum of len(string) - 1 dots.
all_list = []
for n_dots in range(len(my_str)):
    all_list.extend(construct_dot(my_str, n_dots))
# and see the results
print(all_list)
Output is:
['1234', '1.234', '12.34', '123.4', '1.2.34', '1.23.4', '12.3.4', '1.2.3.4']
A concise solution without recursion: using binary combinations (think of 0, 1, 10, 11, etc) to determine where to insert the dots.
Between each letter, put a dot when there's a 1 at this index and an empty string when there's a 0.
your_string = "1234"
def dot_combinations(string):
    i = 0
    combinations = []
    # Iterate while the binary representation length is smaller than the string size
    while i.bit_length() < len(string):
        current_word = []
        for index, letter in enumerate(string):
            current_word.append(letter)
            # Append a dot if there's a 1 in this position
            if (1 << index) & i:
                current_word.append(".")
        i += 1
        combinations.append("".join(current_word))
    return combinations
print(dot_combinations(your_string))
Output:
['1234', '1.234', '12.34', '1.2.34', '123.4', '1.23.4', '12.3.4', '1.2.3.4']
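The same "dot or no dot in each gap" idea can also be expressed with itertools.product, which enumerates every choice of separators for the len(text) - 1 gaps. This is only a sketch of an alternative; dot_variants is an illustrative name and the order of the results differs from the two answers above.

```python
from itertools import product


def dot_variants(text):
    """Return every way of placing '.' in the gaps between characters."""
    if len(text) < 2:
        return [text]  # no gaps, nothing to choose
    results = []
    # One separator ("" or ".") per gap between neighbouring characters.
    for seps in product(("", "."), repeat=len(text) - 1):
        pieces = [ch + sep for ch, sep in zip(text, seps)]
        results.append("".join(pieces) + text[-1])
    return results


print(dot_variants("1234"))
```

A string of length n has n - 1 gaps, so the list contains 2 ** (n - 1) entries and grows quickly for long inputs.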

How to loop to generate string in sequence?

I am trying to write a loop that generates strings. What I want to achieve is a small collection of strings from 1 character up to 5 characters long.
So, starting from the string 1, I want to go up to 55555. With numbers this seems easy, since I can just add to them, but with alphanumeric characters it gets tricky.
Here is the explanation:
I have a collection of alphanumeric characters as a string, s = "123ABC". I want to create all possible 1-character strings out of it, so I will have 1, 2, 3, A, B, C. After that I want to add one more character to the length of the string, so I get 11, 12, 13 and so on, until I have all possible combinations up to CA, CB, CC, and I want to continue up to CCCCCC. I am confused about the loop: I can get it to generate a temporary string, but looping inside it to rotate the characters is tricky.
This is what I have done so far:
i = 0
strr = "123ABC"
while i < len(strr):
    t = strr[0] * (i+1)
    for q in range(0, len(t)):
        # Here I need help to rotate more
        pass
    i += 1
Can anyone explain this to me or point me to a resource where I can find a solution?
You may want to use the itertools.permutations function (note that it never reuses a character within one string, so results such as 11 or CC are not produced):
import itertools
chars = '123ABC'
for i in range(1, len(chars)+1):
    print(list(itertools.permutations(chars, i)))
EDIT:
To get a list of strings, try this:
import itertools
chars = '123ABC'
strings = []
for i in range(1, len(chars)+1):
    strings.extend(''.join(x) for x in itertools.permutations(chars, i))
This is a nested loop. Different depths of recursion produce all possible combinations.
strr = "123ABC"
def prod(items, level):
    if level == 0:
        yield []
    else:
        for first in items:
            for rest in prod(items, level-1):
                yield [first] + rest
for ln in range(1, len(strr)+1):
    print("length:", ln)
    for s in prod(strr, ln):
        print(''.join(s))
This is also called the Cartesian product, and there is a corresponding function in itertools (itertools.product).
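For reference, a minimal sketch of the itertools.product version. The generator name all_strings and its max_len parameter are illustrative; unlike permutations, product allows a character to repeat, so 11 and CC are included.

```python
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


short = list(all_strings("123ABC", 2))
print(short[:8])
print(len(short))
```

The count grows as 6 + 6**2 + ... + 6**max_len for this alphabet (55986 strings for max_len = 6), so iterating over the generator is usually preferable to building the whole list.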

String manipulation weirdness when incrementing trailing digit

I got this code:
import re
myString = 'blabla123_01_version6688_01_01Long_stringWithNumbers'
versionSplit = re.findall(r'-?\d+|[a-zA-Z!@#$%^&*()_+.,<>{}]+|\W+?', myString)
for i in reversed(versionSplit):
    id = versionSplit.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[id]=str(i)
        break
final = ''
myString = final.join(versionSplit)
print(myString)
This is supposed to increase ONLY the last number in the given string. But if you keep running the script, you will see that when the same digits appear elsewhere in the string, it increments those occurrences one after the other. Can anyone help me find out why?
Thank you in advance for any help.
Is there a reason why you aren't doing something like this instead:
prefix, version = re.match(r"(.*[^\d]+)([\d]+)$", myString).groups()
newstring = prefix + str(int(version)+1).rjust(len(version), '0')
Notes:
This will actually "carry over" the version numbers properly: ("09" -> "10") and ("99" -> "100")
This regex assumes at least one non-numeric character before the final version substring at the end. If this is not matched, it will throw an AttributeError. You could restructure it to throw a more suitable or specific exception (e.g. if re.match(...) returns None; see comments below for more info).
Adjust accordingly.
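One way to restructure this so that a missing trailing number raises a more specific exception is sketched below. The function name bump_trailing_number is made up, and the pattern is slightly looser than the one above: it also accepts a string that consists of digits only.

```python
import re


def bump_trailing_number(text):
    """Increment the number at the very end of `text`, keeping its zero padding."""
    match = re.fullmatch(r"(.*?)(\d+)", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no trailing number in %r" % text)
    prefix, version = match.groups()
    # zfill pads back to the original width; longer results are left as they are.
    return prefix + str(int(version) + 1).zfill(len(version))


print(bump_trailing_number("file_09"))
print(bump_trailing_number("v99"))
```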
The issue is the use of list.index() in the line id = versionSplit.index(i). It returns the index of the first occurrence of a value in the list, searching from left to right, but the code is iterating over the reversed list (right to left). There are lots of ways to straighten this out, but here's one that makes the fewest changes to your existing code: iterate over the indices in reverse (which also avoids reversing the list).
for idx in range(len(versionSplit)-1, -1, -1):
    i = versionSplit[idx]
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[idx]=str(i)
        break
import re
myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
versionSplit = re.findall(r'-?\d+|[^\-\d]+', myString)
for i in range(len(versionSplit) - 1, -1, -1):
    s = versionSplit[i]
    if s.isdigit():
        n = int(s) + 1
        versionSplit[i] = "%0*d" % (len(s), n)
        break
myString = ''.join(versionSplit)
print(myString)
Notes:
It is silly to use the .index() method to try to find the string. Just use a decrementing index to try each part of versionSplit. This was where your problem was, as commented above by @David Robinson.
Don't use id as a variable name; you are shadowing the built-in function id().
This code is using the * in a format template, which will accept an integer and set the width.
I simplified the pattern: either you are matching a digit (with optional leading minus sign) or else you are matching non-digits.
I tested this and it seems to work.
First, three notes:
id is the name of a built-in Python function (not a reserved word, but shadowing it is best avoided);
For joining, a more pythonic idiom is ''.join(), using a literal empty string;
reversed() returns an iterator, not a list. That's why I use list(reversed()), in order to do rev.index(i) later.
Corrected code:
import re
myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
print(myString)
versionSplit = re.findall(r'-?\d+|[a-zA-Z!@#$%^&*()_+.,<>{}]+|\W+?', myString)
rev = list(reversed(versionSplit))  # create a reversed list to work with from now on
for i in rev:
    idd = rev.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        rev[idd]=str(i)
        break
myString = ''.join(reversed(rev))  # reverse again only just before joining
print(myString)
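If the goal is simply "increment the last run of digits, wherever it is", the split/join step can be skipped altogether. The sketch below is an alternative technique rather than a fix of the code above: it uses re.sub with a negative lookahead so that only a digit run with no digits after it is matched. The name bump_last_number is illustrative, and the optional leading minus sign handled by the original pattern is deliberately ignored here.

```python
import re


def bump_last_number(text):
    """Increment the last run of digits in `text`, preserving zero padding."""
    def repl(match):
        digits = match.group()
        return str(int(digits) + 1).zfill(len(digits))

    # \d+(?!.*\d): a digit run that is not followed by any further digit.
    return re.sub(r"\d+(?!.*\d)", repl, text, count=1, flags=re.DOTALL)


print(bump_last_number("blabla123_01_version6688_01_01Long_stringWithNumbers"))
```

A string without any digits is returned unchanged, because the pattern then matches nothing.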
