I need to change 3 random characters in a string using Python. Example strings:
Adriano Celentano
Luca Valentina
I need to replace 3 characters. A replacement must differ from the character (letter or digit) it replaces, and spaces must not be replaced. How can I do this in Python?
I need output like this:
adraano cettntano
lacr vilenntina
I don't know where to start.
My code so far:
for i in xrange(4):
    for n in nume:
        print n.replace('$', random.choice(string.letters)).replace('#', random.choice(string.letters))
If you just want to change characters that are not whitespace, and never pick the same index twice, you can first collect the indexes of the non-whitespace characters:
import random
s = "Adriano Celentano"
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
print(random.sample(inds, 3))
Then use those indexes to replace.
import random
from string import ascii_letters

s = "Adriano Celentano"
# Indexes of every non-whitespace character
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
sam = random.sample(inds, 3)
lst = list(s)
for ind in sam:
    lst[ind] = random.choice(ascii_letters)
print("".join(lst))
If you also want each replacement character to be unique:
import random
from string import ascii_letters

s = "Adriano Celentano"
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
sam = random.sample(inds, 3)
# Three distinct replacement letters, consumed one at a time
letts = iter(random.sample(ascii_letters, 3))
lst = list(s)
for ind in sam:
    lst[ind] = next(letts)
print("".join(lst))
output:
Adoiano lelenhano
You can do this in two stages. In the first stage you pick 3 random positions in your string that meet your search criteria (isalnum):
import random
import string
replacement_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
# replacement_chars = string.ascii_letters + string.digits  (equivalent)
input = 'Adriano Celentano'
input_list = list(input)
input_dic = dict(enumerate(input_list))
valid_positions=[key for key in input_dic if input_dic[key].isalnum()]
random_positions=random.sample(valid_positions,3)
In the second part you generate 3 random characters and replace the characters in the previously selected positions. I have added a while loop that generates a new random character if it matches the existing value.
random_chars = random.sample(replacement_chars, len(random_positions))
char_counter = 0
for position in random_positions:
    # check if the replacement character matches the existing one
    # and generate another one if needed
    while input_list[position] == random_chars[char_counter]:
        random_chars[char_counter] = random.choice(replacement_chars)
    input_list[position] = random_chars[char_counter]
    char_counter = char_counter + 1
print("".join(input_list).lower())
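The two stages can also be folded into one helper. This is a minimal sketch, not the code from the answers above: the name change_random_chars, its parameters, and the choice to draw replacements from lowercase letters and digits are assumptions made for illustration. It lowercases the text before choosing replacements, so a replacement cannot collapse back to the original letter when the result is printed in lowercase.

```python
import random
import string


def change_random_chars(text, count=3, rng=None):
    """Return a lowercased copy of text with `count` non-space characters changed.

    Hypothetical helper: the name, signature and alphabet are illustrative.
    """
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    chars = list(text.lower())
    # Only non-whitespace positions are eligible for replacement.
    positions = [i for i, ch in enumerate(chars) if not ch.isspace()]
    for pos in rng.sample(positions, count):
        # Exclude the current character so the replacement always differs.
        candidates = [c for c in alphabet if c != chars[pos]]
        chars[pos] = rng.choice(candidates)
    return "".join(chars)


print(change_random_chars("Adriano Celentano"))
```

random.sample raises ValueError when the string has fewer than count non-space characters, which seems a reasonable failure mode for this sketch.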
Related
Yes, I know the replace method is not what I need, but I don't know how to do it. It must reveal the letters of a word slowly. Below the code I show the output I want.
import random
a = "hello"
x = "_" * len(a)
c = x.replace(x[random.randint(0, len(a) -1 )], a[random.randint(0, len(a) - 1)])
print(c)
The output I want is something like:
_____
(2 seconds later)
__ll_
(2 seconds later)
h_ll_
(2 seconds later)
hell_
(2 seconds later)
hello
You can do it various ways: by replacing, by indexing, by regex ...
A simple implementation using a regex that reads from a whitelist:
from random import shuffle
from time import sleep
from re import sub
word = "hello"
sleep_time = 2
mask_char = '_'
# Collect the unique letters first, then shuffle them so the reveal order is random.
# (Shuffling before set() would be discarded, since a set does not preserve order.)
char_list = list(set(word))
shuffle(char_list)
whitelist = ""
# Print the full mask:
print(mask_char*len(word))
sleep(sleep_time)
for ch in char_list:
    whitelist += ch
    print(sub(fr"[^{whitelist}]", mask_char, word))
    if ch != char_list[-1]:
        sleep(sleep_time)
You'd need the time module for the sleep function. Then it's just a matter of getting each distinct letter in the string a, shuffling them, and looping through them; on each iteration you loop over a and reveal that letter in x.
I made x a list because strings are immutable in Python.
from random import sample
from time import sleep

a = "hello"
x = ["_" for _ in a]
letters = sorted(set(a))  # random.sample needs a sequence, not a set (Python 3.11+)
for letter in sample(letters, len(letters)):
    print(''.join(x))
    for i, replace in enumerate(a):
        if replace == letter:
            x[i] = letter
    sleep(2)
print(''.join(x))
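If the timing should stay separate from the masking logic, the reveal can be written as a generator that yields each intermediate mask and leaves the sleeping to the caller. A minimal sketch; reveal_steps and its parameters are hypothetical names, not part of the answers above.

```python
import random
import time


def reveal_steps(word, mask_char="_", rng=None):
    """Yield successive masks of word, revealing one distinct letter per step."""
    rng = rng or random.Random()
    letters = sorted(set(word))  # distinct letters as a sequence
    rng.shuffle(letters)         # random reveal order
    shown = set()
    yield mask_char * len(word)  # fully masked first
    for letter in letters:
        shown.add(letter)
        yield "".join(ch if ch in shown else mask_char for ch in word)


delay = 0.1  # seconds; use 2 for the delay asked for in the question
for step in reveal_steps("hello"):
    print(step)
    time.sleep(delay)
```

Keeping the sleep outside the generator makes the masking easy to check without waiting.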
I have a string in Python, and I would like to shift a pattern 1 place earlier.
This is my string:
my_string = [AudioLengthInSecs: 37.4]hello[seconds_silence: 0.65]one[seconds_silence: 0.54]two[seconds_silence: 0.59]three[seconds_silence: 0.48]hello[seconds_silence: 2.32]
I would like to shift the numbers inside the [seconds_silence: XXXX] tags one place earlier, dropping the first number and removing the last tag (since its number has been shifted). The result should look like this:
my_desired_string = [AudioLengthInSecs: 37.4]hello[seconds_silence: 0.54]one[seconds_silence: 0.59]two[seconds_silence: 0.48]three[seconds_silence: 2.32]hello
Here is my code:
import re
my_string = "[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]one[seconds_silence: 0.54]two[seconds_silence: 0.59]three[seconds_silence: 0.48]hello[seconds_silence: 2.32]"
# First, find all the numbers in the string
all_numbers = re.findall(r'\d+', my_string)
# Second, remove the first 4 numbers
all_numbers = all_numbers[4:]
# Combine the numbers into one string
all_numbers
combined_numbers = [i+j for i,j in zip(all_numbers[::2], all_numbers[1::2])]
# Then loop over the string and insert
for word in my_string.split():
    print(word)
    if word == "[seconds_silence":
        print(word)
        # here I wanted to check if [seconds_silence was recognized
        # and replace it with a value from combined_numbers
        # however, this is failing obviously
The idea is to find all pairs:
the string preceding [seconds_silence: ...] fragment (capturing group No 1),
and the above fragment itself (capturing group No 2).
Then:
drop the first [seconds_silence: ...] fragment,
and join both lists,
but as they now have different length, itertools.zip_longest is needed.
So the whole code to do your task is:
import itertools
import re
my_string = '[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]'\
'one[seconds_silence: 0.54]two[seconds_silence: 0.59]'\
'three[seconds_silence: 0.48]hello[seconds_silence: 2.32]'
gr1 = []
gr2 = []
for mtch in re.findall(r'(.+?)(\[seconds_silence: ?[\d.]+\])', my_string):
    g1, g2 = mtch
    gr1.append(g1)
    gr2.append(g2)
gr2.pop(0)
my_desired_string = ''
for g1, g2 in itertools.zip_longest(gr1, gr2, fillvalue=''):
    my_desired_string += g1 + g2
print(my_desired_string)
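A different route to the same result is to collect the numbers once and let re.sub hand each tag the next number in line. This is only a sketch under the assumption that every tag looks like [seconds_silence: N] with an optional space after the colon; shift_silences is a hypothetical name, and the rewritten tags are normalized to a single space.

```python
import re

TAG = re.compile(r'\[seconds_silence: ?([\d.]+)\]')


def shift_silences(text):
    """Move each silence value one tag earlier and drop the final tag."""
    values = TAG.findall(text)
    shifted = iter(values[1:])  # the first value is discarded

    def next_tag(match):
        value = next(shifted, None)
        # The last tag has no successor, so it is removed.
        return '' if value is None else f'[seconds_silence: {value}]'

    return TAG.sub(next_tag, text)


my_string = ('[AudioLengthInSecs: 37.4]hello[seconds_silence:0.65]'
             'one[seconds_silence: 0.54]two[seconds_silence: 0.59]'
             'three[seconds_silence: 0.48]hello[seconds_silence: 2.32]')
print(shift_silences(my_string))
```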
I have a Python list of string names where I would like to remove a common substring from all of the names.
And after reading this similar answer I could almost achieve the desired result using SequenceMatcher.
But only when all items have a common substring:
From List:
string 1 = myKey_apples
string 2 = myKey_appleses
string 3 = myKey_oranges
common substring = "myKey_"
To List:
string 1 = apples
string 2 = appleses
string 3 = oranges
However I have a slightly noisy list that contains a few scattered items that don't fit the same naming convention.
I would like to remove the "most common" substring from the majority:
From List:
string 1 = myKey_apples
string 2 = myKey_appleses
string 3 = myKey_oranges
string 4 = foo
string 5 = myKey_Banannas
common substring = ""
To List:
string 1 = apples
string 2 = appleses
string 3 = oranges
string 4 = foo
string 5 = Banannas
I need a way to match the "myKey_" substring so I can remove it from all names.
But when I use the SequenceMatcher the item "foo" causes the "longest match" to be equal to blank "".
I think the only way to solve this is to find the "most common substring". But how could that be accomplished?
Basic example code:
from difflib import SequenceMatcher
names = ["myKey_apples",
"myKey_appleses",
"myKey_oranges",
#"foo",
"myKey_Banannas"]
string2 = names[0]
for i in range(1, len(names)):
    string1 = string2
    string2 = names[i]
    match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
    print(string1[match.a: match.a + match.size]) # -> myKey_
Given names = ["myKey_apples", "myKey_appleses", "myKey_oranges", "foo", "myKey_Banannas"]
An O(n^2) solution I can think of is to find the longest common substring of every pair of names and store each one in a dictionary with the number of times it occurs:
from difflib import SequenceMatcher

substring_counts = {}
for i in range(0, len(names)):
    for j in range(i+1, len(names)):
        string1 = names[i]
        string2 = names[j]
        match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
        matching_substring = string1[match.a:match.a+match.size]
        if matching_substring not in substring_counts:
            substring_counts[matching_substring] = 1
        else:
            substring_counts[matching_substring] += 1
print(substring_counts) #{'myKey_apples': 1, 'myKey_': 5, '': 3, 'o': 1}
And then pick the most frequent substring:
import operator
max_occurring_substring = max(substring_counts.items(), key=operator.itemgetter(1))[0]
print(max_occurring_substring) #myKey_
Here's an overly verbose solution to your problem:
def find_matching_key(list_in, max_key_only = True):
    """
    returns the longest matching key in the list * with the highest frequency
    """
    keys = {}
    curr_key = ''
    # If n does not exceed max_n, don't bother adding
    max_n = 0
    for word in list(set(list_in)): #get unique values to speed up
        for i in range(len(word)):
            # Look up the whole word, then one less letter, sequentially
            curr_key = word[0:len(word)-i]
            # if not seen yet, count occurrences
            if curr_key not in keys.keys() and curr_key!='':
                n = 0
                for word2 in list_in:
                    if curr_key in word2:
                        n+=1
                # if large n, add to dictionary
                if n > max_n:
                    max_n = n
                    keys[curr_key] = n
        # Finish the word
    # Finish for loop
    if max_key_only:
        return max(keys, key=keys.get)
    else:
        return keys
# Create your "from list"
From_List = [
"myKey_apples",
"myKey_appleses",
"myKey_oranges",
"foo",
"myKey_Banannas"
]
# Use the function
key = find_matching_key(From_List, True)
# Iterate over your list, replacing values
new_From_List = [x.replace(key,'') for x in From_List]
print(new_From_List)
['apples', 'appleses', 'oranges', 'foo', 'Banannas']
Needless to say, this solution would look a lot neater with recursion. Thought I'd sketch out a rough dynamic programming solution for you though.
I would first find the starting letter with the most occurrences. Then I would take each word having that starting letter, and take while all these words have matching letters. Then in the end I would remove the prefix that was found from each starting word:
from collections import Counter

strings = ["myKey_apples", "myKey_appleses", "myKey_oranges", "berries"]

def remove_mc_prefix(words):
    cnt = Counter()
    for word in words:
        cnt[word[0]] += 1
    # Most frequent starting letter (not merely the first one seen)
    first_letter = cnt.most_common(1)[0][0]
    filter_list = [word for word in words if word[0] == first_letter]
    filter_list.sort(key=len)  # shortest first, to avoid an index-out-of-bounds error
    prefix = ""
    length = len(filter_list[0])
    for i in range(length):
        test = filter_list[0][i]
        if all(word[i] == test for word in filter_list):
            prefix += test
        else:
            break
    return [word[len(prefix):] if word.startswith(prefix) else word for word in words]

print(remove_mc_prefix(strings))
Out: ['apples', 'appleses', 'oranges', 'berries']
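When the shared part is known to be a prefix, the counting can also be done over every prefix at once with collections.Counter. The sketch below rests on that prefix assumption; most_common_prefix, strip_prefix and min_count are hypothetical names introduced for illustration, not part of any answer above.

```python
from collections import Counter


def most_common_prefix(names, min_count=2):
    """Return the longest prefix shared by the largest number of names."""
    counts = Counter(
        name[:end] for name in names for end in range(1, len(name) + 1)
    )
    if not counts:
        return ""
    # Rank by how many names share the prefix, then by prefix length.
    prefix, count = max(counts.items(), key=lambda item: (item[1], len(item[0])))
    return prefix if count >= min_count else ""


def strip_prefix(names, prefix):
    """Remove prefix from the names that start with it; leave the rest alone."""
    if not prefix:
        return list(names)
    return [n[len(prefix):] if n.startswith(prefix) else n for n in names]


names = ["myKey_apples", "myKey_appleses", "myKey_oranges", "foo", "myKey_Banannas"]
prefix = most_common_prefix(names)
print(prefix)
print(strip_prefix(names, prefix))
```

One caveat of ranking by count first: if an outlier happens to share a shorter prefix with the majority (say, a name that also starts with "m"), that shorter prefix wins because more names share it.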
To find the most common substring in a list of Python strings:
I tested this on Python 3.10.5; I hope it works for you.
I have a similar use case but a different task: I need to find one common pattern string across a list of hundreds of file names, to use as a regular expression.
Your basic example code does not work in my case, because it compares the 1st name with the 2nd, the 2nd with the 3rd, the 3rd with the 4th, and so on. So I changed it to carry the most common substring found so far and compare it with each remaining name.
The downside of this code is that if any name shares nothing with the running substring, the final result is an empty string.
But in my case, it works.
from difflib import SequenceMatcher

# names is the list of strings from the question
for i in range(1, len(names)):
    if i == 1:
        string1, string2 = names[0], names[i]
    else:
        string1, string2 = most_common_substring, names[i]
    match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
    most_common_substring = string1[match.a: match.a + match.size]
print(f"most_common_substring : {most_common_substring}")
Say I have this:
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
How can I get this output:
["hello-543: hello-454: hello-765", "hello-745: hello-635: hello-321"]
You can split each string based on substring length with a list comprehension using range where the step value is the number of characters each substring should contain. Then use join to convert each list back to a string with the desired separator characters.
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
n = 9
result = [': '.join([s[i:i+n] for i in range(0, len(s), n)]) for s in x]
print(result)
# ['hello-543: hello-454: hello-765', 'hello-745: hello-635: hello-321']
Or with textwrap.wrap:
from textwrap import wrap
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
n = 9
result = [': '.join(wrap(s, n)) for s in x]
print(result)
# ['hello-543: hello-454: hello-765', 'hello-745: hello-635: hello-321']
If you are sure the length of every string is a multiple of n, I would use re.findall for this task.
import re
txt1 = "hello-543hello-454hello-765"
txt2 = "hello-745hello-635hello-321"
out1 = ": ".join(re.findall(r'.{9}',txt1))
out2 = ": ".join(re.findall(r'.{9}',txt2))
print(out1) #hello-543: hello-454: hello-765
print(out2) #hello-745: hello-635: hello-321
.{9} in re.findall means 9 of any character except newline (\n), so this code works properly as long as your strings do not contain \n. If they might, pass re.DOTALL as the third argument of re.findall.
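To see why the flag matters, here is a small illustration with a made-up input in which one chunk contains a newline. The string txt is an assumption chosen for the demonstration, not data from the question.

```python
import re

txt = "hello-543hello\n454hello-765"  # the second chunk contains a newline

# Without DOTALL, "." cannot match "\n", so the chunk boundaries drift.
without_flag = re.findall(r".{9}", txt)
# With DOTALL, "." matches any character and the chunks stay aligned.
with_flag = re.findall(r".{9}", txt, re.DOTALL)

print(without_flag)
print(with_flag)
```

Without the flag the regex skips ahead to the first run of nine newline-free characters, so it silently returns misaligned pieces rather than raising an error.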
Say that I have 10 different tokens, "(TOKEN)" in a string. How do I replace 2 of those tokens, chosen at random, with some other string, leaving the other tokens intact?
>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'
In function form:
>>> def random_replace(text, token, replace, num_replacements):
...     num_tokens = text.count(token)
...     points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
...     return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
...
>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'
Test:
>>> for i in range(0,9):
...     print(random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i))
...
....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....
If you need exactly two, then:
1. Detect the tokens (keep some reference to each, such as its index in the string).
2. Choose two at random (random.sample, so the same token is not picked twice).
3. Replace them.
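The three steps above can be sketched as follows. This is one possible reading of them, not the answerer's code: replace_random_tokens and its parameters are hypothetical names, and re.finditer is used for the detection step as an assumption.

```python
import random
import re


def replace_random_tokens(text, token, replacement, count=2, rng=None):
    """Replace `count` randomly chosen occurrences of token in text."""
    rng = rng or random.Random()
    # 1. Detect the tokens and remember where each one sits.
    spans = [m.span() for m in re.finditer(re.escape(token), text)]
    # 2. Choose `count` distinct occurrences.
    chosen = rng.sample(spans, count)
    # 3. Replace from right to left so earlier offsets stay valid.
    for start, end in sorted(chosen, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


text = "__".join(["(TOKEN)"] * 10)
print(replace_random_tokens(text, "(TOKEN)", "foo"))
```

Working from the right means a replacement of a different length never shifts the positions that are still waiting to be replaced.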
What are you trying to do, exactly? A good answer will depend on that...
That said, a brute-force solution that comes to mind is to:
1. Store the 10 tokens in an array, such that tokens[0] is the first token, tokens[1] is the second, and so on.
2. Create a dictionary to associate each unique "(TOKEN)" with two numbers: start_idx, end_idx.
3. Write a little parser that walks through your string and looks for each of the 10 tokens. Whenever one is found, record the start/end indexes (as start_idx, end_idx) in the string where that token occurs.
4. Once done parsing, generate a random number in the range [0,9]. Let's call this R.
5. Now, your random "(TOKEN)" is tokens[R].
6. Use the dictionary filled in step 3 to find the start_idx, end_idx values in the string; replace the text there with "some other string".
My solution in code:
import random
s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2
def random_replace(s, replace_from, replace_to, amount_to_replace):
    parts = s.split(replace_from)
    indices = random.sample(range(len(parts) - 1), amount_to_replace)
    replaced_s_parts = list()
    for i in range(len(parts)):
        replaced_s_parts.append(parts[i])
        if i < len(parts) - 1:
            if i in indices:
                replaced_s_parts.append(replace_to)
            else:
                replaced_s_parts.append(replace_from)
    return "".join(replaced_s_parts)
#TEST
for i in range(5):
    print(random_replace(s, replace_from, replace_to, 2))
Explanation:
1. Split the string into parts around replace_from.
2. Choose the indexes of the tokens to replace with random.sample; the returned list contains unique numbers.
3. Build a list for reconstructing the string, substituting replace_to for the tokens at the chosen indexes.
4. Concatenate all list elements into a single string.
Try this solution:
import random
def replace_random(tokens, eqv, n):
    random_tokens = list(eqv.keys())  # shuffle needs a mutable sequence
    random.shuffle(random_tokens)
    for i in range(n):
        t = random_tokens[i]
        tokens = tokens.replace(t, eqv[t])
    return tokens
Assuming that a string with tokens exists, and a suitable equivalence table can be constructed with a replacement for each token:
tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'
equivalences = {
'(TOKEN1)' : 'REPLACEMENT1',
'(TOKEN2)' : 'REPLACEMENT2',
'(TOKEN3)' : 'REPLACEMENT3',
'(TOKEN4)' : 'REPLACEMENT4',
'(TOKEN5)' : 'REPLACEMENT5',
'(TOKEN6)' : 'REPLACEMENT6',
'(TOKEN7)' : 'REPLACEMENT7',
'(TOKEN8)' : 'REPLACEMENT8',
'(TOKEN9)' : 'REPLACEMENT9',
'(TOKEN10)' : 'REPLACEMENT10'
}
You can call it like this:
replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'
There are lots of ways to do this. My approach would be to write a function that takes the original string, the token string, and a function that returns the replacement text for an occurrence of the token in the original:
def strByReplacingTokensUsingFunction(original, token, function):
    outputComponents = []
    matchNumber = 0
    unexaminedOffset = 0
    while True:
        matchOffset = original.find(token, unexaminedOffset)
        if matchOffset < 0:
            matchOffset = len(original)
        outputComponents.append(original[unexaminedOffset:matchOffset])
        if matchOffset == len(original):
            break
        unexaminedOffset = matchOffset + len(token)
        replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
        outputComponents.append(replacement)
        matchNumber += 1
    return ''.join(outputComponents)
(You could certainly change this to use shorter identifiers. My style is somewhat more verbose than typical Python style.)
Given that function, it's easy to replace two random occurrences out of ten. Here's some sample input:
sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
The random module has a handy method for picking random items from a population (not picking the same item twice):
import random
replacementIndexes = random.sample(range(10), 2)
Then we can use the function above to replace the randomly-chosen occurrences:
sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
    (lambda matchNumber, token, **keywords:
        'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print(sampleOutput)
And here's some test output:
a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k
Here's another run:
a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k
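The standard library offers a similar hook: re.sub accepts a function as the replacement and calls it once per match. The sketch below uses that in place of the hand-written scanner; replace_occurrences is a hypothetical name, and itertools.count stands in for the matchNumber argument.

```python
import itertools
import random
import re


def replace_occurrences(text, token, replacement, indexes):
    """Replace only the occurrences of token whose match number is in indexes."""
    counter = itertools.count()  # numbers the matches 0, 1, 2, ...

    def pick(match):
        return replacement if next(counter) in indexes else match.group(0)

    return re.sub(re.escape(token), pick, text)


sample_input = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
indexes = set(random.sample(range(sample_input.count('(TOKEN)')), 2))
print(replace_occurrences(sample_input, '(TOKEN)', 'REPLACEMENT', indexes))
```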
from random import sample
mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'
def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=10, index=0):
    choices = sorted(sample(range(tokens), n_repl))
    for i in range(choices[-1]+1):
        index = mystr.index(substr, index) + 1
        if i in choices:
            mystr = mystr[:index-1] + mystr[index-1:].replace(substr, replacement, 1)
    return mystr
print(replace(mystr, '(TOKEN)', 2))