Write a function called remove_duplicates which will take one argument called string.
This string input will only have characters between a-z.
The function should remove all repeated characters in the string and return a tuple with two values:
A new string with only unique, sorted characters.
The total number of duplicates dropped.
For example:
remove_duplicates('aaabbbac') should produce ('abc', 5)
remove_duplicates('a') should produce ('a', 0)
remove_duplicates('thelexash') should produce ('aehlstx', 2)
My Code:
def remove_duplicates(string):
    for string in "abcdefghijklmnopqrstuvwxyz":
        k = set(string)
        x = len(string) - len(set(string))
        return k, x
print(remove_duplicates("aaabbbccc"))
Expected Output:
I expect it to print ({a, b, c}, 6), but instead it prints ({a}, 0).
What is wrong with my code above? Why isn't it producing what I expect?
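You'll get the expected result if you drop the for loop. The loop rebinds string to each letter of the alphabet in turn, so the argument you passed in is never examined, and the return inside the loop exits on the first letter, 'a'.
I've commented your code so you can see the difference between your script and mine.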
Non-working commented code:
def remove_duplicates(string):
    # loop through each char in "abcdefghijklmnopqrstuvwxyz" and call it "string"
    for string in "abcdefghijklmnopqrstuvwxyz":
        # create variable k that holds a set of 1 char because of the loop
        k = set(string)
        # create a variable x that holds the difference between 1 and 1 = 0
        x = len(string) - len(set(string))
        # return these values during the first iteration, which ends the loop early
        return k, x
print(remove_duplicates("aaabbbccc"))
Outputs:
({'a'}, 0)
Working code:
def remove_duplicates(string):
    # create variable k that holds a set of each unique char present in string
    k = set(string)
    # create a variable x that holds the number of duplicates: total length minus the number of unique chars
    x = len(string) - len(set(string))
    # return these values
    return k, x
print(remove_duplicates("aaabbbccc"))
Outputs:
({'b', 'c', 'a'}, 6)
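P.S.: a set is unordered, so the letters may print in any order. If you want your results to be in order, you can change return k, x to return sorted(k), x, but then the output will be a list. Use ''.join(sorted(k)) if you need a string, as the exercise asks.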
(['a', 'b', 'c'], 6)
EDIT: if you want your code to run only when a certain condition is met (for example, only when the string doesn't contain any digits), you can add an if/else clause:
Example Code:
def remove_duplicates(s):
    # any() rejects strings with at least one digit; s.isdigit() alone would only reject all-digit strings
    if not any(c.isdigit() for c in s):
        k = set(s)
        x = len(s) - len(set(s))
        return sorted(k), x
    else:
        msg = "This function only works with strings that don't contain any digits."
        return msg
print(remove_duplicates("aaabbbccc"))
print(remove_duplicates("123123122"))
Outputs:
(['a', 'b', 'c'], 6)
This function only works with strings that don't contain any digits.
In your code, the function returns during the first iteration of the loop.
Inside the loop, string refers to the first letter of the alphabet, 'a', not to the input string. I think you were trying to iterate over the input string character by character.
If you need per-character counts, collections.Counter computes them in a single pass.
However, an alternative solution doesn't need the count of each character in the given string:
def remove_duplicates(s):
    unique_characters = set(s)  # extract the unique characters in the given string
    new_sorted_string = ''.join(sorted(unique_characters))  # create the sorted string with unique characters
    number_of_duplicates = len(s) - len(unique_characters)  # compute the number of duplicates in the original string
    return new_sorted_string, number_of_duplicates
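To illustrate the collections.Counter route mentioned above, here is a minimal sketch. The function name remove_duplicates_with_counter is only an illustrative choice and does not come from the question; the number of dropped characters is taken to be every occurrence beyond the first.

```python
from collections import Counter

def remove_duplicates_with_counter(s):
    # Count how many times each character occurs in s.
    counts = Counter(s)
    # The keys of the counter are the unique characters.
    unique_sorted = ''.join(sorted(counts))
    # Every occurrence after the first one is a dropped duplicate.
    dropped = sum(n - 1 for n in counts.values())
    return unique_sorted, dropped

print(remove_duplicates_with_counter('thelexash'))  # ('aehlstx', 2)
```

For this task the set-based version is shorter; the counter is mainly useful if the per-character counts are needed elsewhere.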
You are returning from the function at the first instance where a character is found. So it returns for the first "a".
Try this instead :
def remove_duplicates(string):
    temp = set(string)
    return temp, len(string) - len(temp)
print(remove_duplicates("aaabbbccc"))
Output :
({'c', 'b', 'a'}, 6)
If you want to remove everything except letters (as you mentioned in the comments), try this:
def remove_duplicates(string):
    a = set()
    for i in string:
        # keep only alphabetic characters; the count returned below also includes the dropped non-letters
        if i.isalpha() and i not in a:
            a.add(i)
    return a, len(string) - len(a)
Say I have a string in alphabetical order, based on the amount of times that a letter repeats.
Example: "BBBAADDC".
There are 3 B's, so they go at the start, 2 A's and 2 D's, so the A's go in front of the D's because they are in alphabetical order, and 1 C. Another example would be CCCCAAABBDDAB.
Note that there can be 4 letters in the middle somewhere (i.e. CCCC), as there could be 2 pairs of 2 letters.
However, let's say I can only have n letters in a row. For example, if n = 3 in the second example, then I would have to omit one "C" from the first substring of 4 C's, because there can only be a maximum of 3 of the same letters in a row.
Another example would be the string "CCCDDDAABC"; if n = 2, I would have to remove one C and one D to get the string CCDDAABC
Example input/output:
n=2: Input: AAABBCCCCDE, Output: AABBCCDE
n=4: Input: EEEEEFFFFGGG, Output: EEEEFFFFGGG
n=1: Input: XXYYZZ, Output: XYZ
How can I do this with Python? Thanks in advance!
This is what I have right now, although I'm not sure if it's on the right track. Here, z is the length of the string.
for k in range(z+1):
    if final_string[k] == final_string[k+1] == final_string[k+2] == final_string[k+3]:
        final_string = final_string.translate({ord(final_string[k]): None})
return final_string
Ok, based on your comment, you're either pre-sorting the string or it doesn't need to be sorted by the function you're trying to create. You can do this more easily with itertools.groupby():
import itertools
def max_seq(text, n=1):
    result = []
    for k, g in itertools.groupby(text):
        result.extend(list(g)[:n])
    return ''.join(result)
max_seq('AAABBCCCCDE', 2)
# 'AABBCCDE'
max_seq('EEEEEFFFFGGG', 4)
# 'EEEEFFFFGGG'
max_seq('XXYYZZ')
# 'XYZ'
max_seq('CCCDDDAABC', 2)
# 'CCDDAABC'
In each group g, it's expanded and then sliced until n elements (the [:n] part) so you get each letter at most n times in a row. If the same letter appears elsewhere, it's treated as an independent sequence when counting n in a row.
Edit: Here's a shorter version, which may also perform better for very long strings. And while we're using itertools, this one additionally utilises itertools.chain.from_iterable() to create the flattened list of letters. And since each of these is a generator, it's only evaluated/expanded at the last line:
import itertools
def max_seq(text, n=1):
    sequences = (list(g)[:n] for _, g in itertools.groupby(text))
    letters = itertools.chain.from_iterable(sequences)
    return ''.join(letters)
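As a possible variation on the generator version above, itertools.islice can cap each group without first building a list from it. This is only a sketch; the name max_seq_lazy is made up for illustration and is not part of the original answer.

```python
from itertools import chain, groupby, islice

def max_seq_lazy(text, n=1):
    # islice takes at most n items from each run; groupby skips the
    # rest of a run automatically when it advances to the next group.
    capped = (islice(group, n) for _, group in groupby(text))
    return ''.join(chain.from_iterable(capped))

print(max_seq_lazy('AAABBCCCCDE', 2))  # AABBCCDE
print(max_seq_lazy('CCCDDDAABC', 2))   # CCDDAABC
```

The groups must be consumed in order, which chain.from_iterable does here, because every group shares the same underlying iterator.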
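hello = "hello frrriend"
def replacing() -> str:
    global hello
    prev = None
    for i in hello:
        # a character equal to its predecessor marks a repeated run
        if i == prev:
            hello = hello.replace(i, "")
        prev = i
    return hello
replacing()
It looks a bit primitive, but it's what I came up with on the go; hope it helps :D Note that str.replace removes every occurrence of the repeated character, so "hello frrriend" becomes "heo fiend" rather than keeping one copy of each run.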
Here's my solution:
def snip_string(string, n):
    list_string = list(string)
    list_string.sort()
    chars = set(string)
    for char in chars:
        while list_string.count(char) > n:
            list_string.remove(char)
    return ''.join(list_string)
Calling the function with various values for n gives the following output:
>>> string = "AAAABBBCCCDDD"
>>> snip_string(string, 1)
'ABCD'
>>> snip_string(string, 2)
'AABBCCDD'
>>> snip_string(string, 3)
'AAABBBCCCDDD'
>>>
Edit
Here is the updated version of my solution, which only removes characters if the group of repeated characters exceeds n.
import itertools
def snip_string(string, n):
    groups = [list(g) for k, g in itertools.groupby(string)]
    string_list = []
    for group in groups:
        while len(group) > n:
            del group[-1]
        string_list.extend(group)
    return ''.join(string_list)
Output:
>>> string = "DDDAABBBBCCABCDE"
>>> snip_string(string, 3)
'DDDAABBBCCABCDE'
from itertools import groupby
n = 2
def rem(string):
    # keep at most n characters from each run of identical characters
    out = "".join(["".join(list(g)[:n]) for _, g in groupby(string)])
    print(out)
This is the entire code for your question. I tested it with the following strings:
s = "AABBCCDDEEE"
s2 = "AAAABBBDDDDDDD"
s3 = "CCCCAAABBDDABBB"
s4 = "AAAAAAAA"
z = "AAABBCCCCDE"
for text in (s, s2, s3, s4, z):
    rem(text)
which prints:
AABBCCDDEE
AABBDD
CCAABBDDABB
AA
AABBCCDE
Given a string containing uppercase alphabets (A-Z), compress the string using Run Length encoding. Repetition of character has to be replaced by storing the length of that run.
I tried the following codes
#Code 1: Tried on my own
def encode(message):
    list1=[]
    for i in range (0,len(message)):
        count = 1
        while(i < len(message)-1 and message[i]==message[i+1]):
            count+=1
            i+=1
        list1=str(count)+message[i]
    return list1
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input:AAAABBBBCCCCCCCC
Expected Output: 4A4B8C
#code 2:I tried this by looking at another code based on run-length encoding
def encode(message):
    list1=[]
    count=1
    for i in range (1,len(message)):
        if(message[i]==message[i-1]):
            count+=1
        else:
            list1.append((count,list1[i-1]))
            count=1
        if i == len(messege) - 1 :
            list1.append((count , data[i]))
    return list1
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input:AAAABBBBCCCCCCCC
Expected Output: 4A4B8C
The first code gives output as 2B
def encode(message):
    pairs = []
    for char in message:
        if len(pairs) > 0:
            if pairs[-1][0] == char:
                pairs[-1] = (char, pairs[-1][1] + 1)
            else:
                pairs.append((char, 1))
        else:
            pairs.append((char, 1))
    strings = []
    for letter, count in pairs:
        strings.append(f"{count}{letter.upper()}")
    return "".join(strings)
print(encode("ABBBBCCCCCCCCAB"))
print(encode("AAAABBBBCCCCCCCC"))
This outputs:
1A4B8C1A1B
4A4B8C
This is a very good use for the groupby function from itertools:
from itertools import groupby
message = 'AAAABBBBCCCCCCCC'
''.join('{}{}'.format(len(list(g)), c) for c, g in groupby(message))
Based on your code #2, I have tweaked it to produce the output you list under Expected Output: 4A4B8C.
Basically, you were appending tuples to a list, so you need to build a string instead. You were also using data, but there is no data variable, and you were indexing list1 where you meant to index message. So the code would be:
def encode2(message):
    encoded_return_message = ""
    count=1
    for i in range (1,len(message)):
        if(message[i]==message[i-1]):
            count+=1
        else:
            encoded_return_message += (f'{count}{message[i-1]}')
            count=1
        if i == len(message) - 1 :
            encoded_return_message +=(f'{count}{message[i]}')
    return encoded_return_message
encoded_message=encode2("ABBBBCCCCCCCCAB")
print(str(encoded_message))
I also did a demo on Repl.it
https://repl.it/repls/RowdyFloralwhiteBlockchain
Personally I would do that task using re module following way:
import re
text = 'AAAABBBBCCCCCCCC'
def sub_function(m):
    span = m.span()
    return f"{span[1]-span[0]}"+m.groups()[0]
out = re.sub(r'(\w)(\1*)',sub_function,text)
print(out)
Output:
4A4B8C
Explanation: the pattern in re.sub looks for a letter followed by 0 or more occurrences of the same letter; every such substring is then fed to sub_function, which calculates the overall length of the substring and returns that value concatenated with the first letter of the substring (which is the same as all the others). Note that I used a so-called f-string in my code, which is not available in Python versions older than 3.6 (I tested my code in Python 3.6.7); if you have to use an older version, you need another string formatting method. Note also that my code as is replaces a single letter with the digit 1 plus that letter, so the input ABC would result in 1A1B1C; if you wish to retain single letters without adding 1, change the 1st argument of re.sub from r'(\w)(\1*)' to r'(\w)(\1+)'.
Though maybe now I am the guy with hammer seeing nails everywhere.
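To make the r'(\w)(\1+)' variant described in the explanation concrete, here is a small sketch. The helper names run_length and encode_keep_singles are assumptions made for this illustration, and the second group is dropped because only the length of the whole match is needed.

```python
import re

def run_length(match):
    # group(0) is the whole run, group(1) its first character.
    return f"{len(match.group(0))}{match.group(1)}"

def encode_keep_singles(text):
    # \1+ requires at least one repeat, so single letters are left untouched.
    return re.sub(r'(\w)\1+', run_length, text)

print(encode_keep_singles('AAAABBBBCCCCCCCC'))  # 4A4B8C
print(encode_keep_singles('ABBBBCCCCCCCCAB'))   # A4B8CAB
```

Single letters stay as they are, which keeps the output shorter at the cost of an encoding that is no longer a strict sequence of count and letter pairs.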
def encode(message):
    count=0
    previous_char=message[0]
    result=''
    length=len(message)
    i=0
    while(i!=length):
        character=message[i]
        if previous_char==character:
            count=count+1
        else:
            # run ended: emit its length and character, then start a new run
            result=result+str(count)+previous_char
            count=1
            previous_char=character
        i=i+1
    return result+str(count)+str(previous_char)
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
Input is:ABBBBCCCCCCCCAB
output is:1A4B8C1A1B
def encodeString(s):
    if not s:  # nothing to encode
        return ""
    encoded = ""
    ctr = 1
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            ctr += 1
        else:
            # run ended: emit its length and character
            encoded = encoded + str(ctr) + s[i]
            ctr = 1
    # emit the final run, which always ends at the last character
    encoded = encoded + str(ctr) + s[-1]
    return encoded
Input: "AAAAABBCCDDAB"
Output: 5A2B2C2D1A1B
def encode(message):
    list1=[]
    count=1
    for i in range (1,len(message)):
        if(message[i].upper()==message[i-1].upper()):
            count+=1
        else:
            list1.append(f"{count}{message[i-1].upper()}")
            count=1
        if i == len(message) - 1 :
            list1.append(f"{count}{message[i].upper()}")
    return "".join(list1)
encoded_message=encode("ABBBBCCCCCCCCAB")
print(encoded_message)
I need to exchange the middle character in a numeric string of 15 numbers with the last number of the string.
I understand that this swaps the first and last characters:
def string(str):
    return str[-1:] + str[1:-1] + str[:1]
print(string('abcd'))
print(string('12345'))
RESULTS:
dbca
52341
But how can I change it so that, for the input string 012345678912345,
the middle character 7 is exchanged with the last character 5?
Consider
def last_to_mid(s):
    if len(s) == 1:
        return s
    if len(s)%2 == 0:
        raise ValueError('expected string of odd length')
    idx = len(s)//2
    return f'{s[:idx]}{s[-1]}{s[idx+1:-1]}{s[idx]}'
operating like this:
>>> last_to_mid('021')
'012'
>>> last_to_mid('0123x4567')
'01237456x'
>>> last_to_mid('1')
'1'
Assuming you have Python 3.6 or newer for f-strings.
You can have a function for this:
In [178]: def swap_index_values(my_string):
     ...:     l = list(my_string)
     ...:     middleIndex = (len(l) - 1) // 2  # integer division: list indices must be ints
     ...:     middle_val = l[middleIndex]
     ...:     l[middleIndex] = l[-1]
     ...:     l[-1] = middle_val
     ...:     return ''.join(l)
     ...:
In [179]:
In [179]: a
Out[179]: '012345678912345'
In [180]: swap_index_values(a)
Out[180]: '012345658912347'
Above, you can see that middle value and last values have been exchanged.
In this very specific context (always the middle and last character of a string of length 15), your initial approach can be extended to:
text[0:7]+text[-1]+text[8:-1]+text[7]
Also try to avoid variable names like str, since they shadow the function of the same name.
s1 = '1243125'
mid = len(s1) // 2
# put the last character in the middle and the old middle character at the end
s2 = s1[:mid] + s1[-1] + s1[mid + 1:-1] + s1[mid]
print(s2)
1245123
I am trying to find all the occurrences of "|" in a string.
def findSectionOffsets(text):
    startingPos = 0
    endPos = len(text)
    for position in text.find("|",startingPos, endPos):
        print(position)
        endPos = position
But I get an error:
for position in text.find("|",startingPos, endPos):
TypeError: 'int' object is not iterable
The function:
def findOccurrences(s, ch):
    return [i for i, letter in enumerate(s) if letter == ch]
findOccurrences(yourString, '|')
will return a list of the indices in yourString at which | occurs.
If you want the index of every occurrence of the | character in a string, you can do this:
import re
text = "aaaaaa|bbbbbb|ccccc|dddd"  # named text rather than str, which would shadow the built-in
indexes = [x.start() for x in re.finditer(r'\|', text)]
print(indexes) # <-- [6, 13, 19]
You can also do:
indexes = [x for x, v in enumerate(text) if v == '|']
print(indexes) # <-- [6, 13, 19]
It is easier to use regular expressions here:
import re
def findSectionOffsets(text):
    for m in re.finditer(r'\|', text):
        print(m.start(0))
import re
def findSectionOffsets(text):
    for i, m in enumerate(re.finditer(r'\|', text)):
        print(i, m.start(), m.end())
text.find returns an integer (the index at which the desired string is found), so you cannot run a for loop over it.
I suggest:
def findSectionOffsets(text):
    indexes = []
    startposition = 0
    while True:
        i = text.find("|", startposition)
        if i == -1: break
        indexes.append(i)
        startposition = i + 1
    return indexes
If text is the string in which you want to count the "|" characters, the following line of code returns the count:
len(text.split("|"))-1
Note: This also works for counting sub-strings, and text.count("|") gives the same result more directly.
text.find() only returns the first result, and then you need to set the new starting position based on that. So like this:
def findSectionOffsets(text):
    startingPos = 0
    position = text.find("|", startingPos)
    while position > -1:
        print(position)
        startingPos = position + 1
        position = text.find("|", startingPos)
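The same str.find loop can be packaged as a generator, so the caller decides whether to print the offsets or collect them. This is just a sketch; iter_offsets is a hypothetical name, and it also accepts multi-character substrings.

```python
def iter_offsets(text, sub):
    # Repeatedly search from just past the previous hit until find() reports -1.
    start = 0
    while True:
        pos = text.find(sub, start)
        if pos == -1:
            return
        yield pos
        start = pos + 1

print(list(iter_offsets("aaaaaa|bbbbbb|ccccc|dddd", "|")))  # [6, 13, 19]
```

Because the search restarts one character after each hit, overlapping matches of a longer substring are reported too.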
Say that I have 10 different tokens, "(TOKEN)" in a string. How do I replace 2 of those tokens, chosen at random, with some other string, leaving the other tokens intact?
>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'
In function form:
>>> def random_replace(text, token, replace, num_replacements):
...     num_tokens = text.count(token)
...     points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
...     return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'
Test:
>>> for i in range(0,9):
...     print(random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i))
....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....
If you need exactly two, then:
Detect the tokens (keep some links to them, like an index into the string)
Choose two at random (random.sample, so the same token is never picked twice)
Replace them
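The three steps above can be sketched as follows. The function name replace_random_tokens and its parameters are assumptions made for this illustration; the token is located with re.finditer, and the chosen occurrences are replaced from right to left so that earlier offsets stay valid.

```python
import random
import re

def replace_random_tokens(text, token, replacement, k=2):
    # Step 1: detect the tokens by recording where each one starts.
    starts = [m.start() for m in re.finditer(re.escape(token), text)]
    # Step 2: choose k distinct occurrences at random.
    chosen = random.sample(starts, k)
    # Step 3: replace them, working backwards so offsets are not shifted.
    for start in sorted(chosen, reverse=True):
        text = text[:start] + replacement + text[start + len(token):]
    return text

print(replace_random_tokens('(TOKEN)__' * 10, '(TOKEN)', 'foo'))
```

random.sample raises ValueError if k is larger than the number of tokens found, which seems a reasonable failure mode for a sketch like this.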
What are you trying to do, exactly? A good answer will depend on that...
That said, a brute-force solution that comes to mind is to:
1. Store the 10 tokens in an array, such that tokens[0] is the first token, tokens[1] is the second, and so on.
2. Create a dictionary to associate each unique "(TOKEN)" with two numbers: start_idx, end_idx.
3. Write a little parser that walks through your string and looks for each of the 10 tokens. Whenever one is found, record the start/end indexes (as start_idx, end_idx) in the string where that token occurs.
4. Once done parsing, generate a random number in the range [0,9]. Let's call this R.
5. Now, your random "(TOKEN)" is tokens[R].
6. Use the dictionary filled in step 3 to find the start_idx, end_idx values in the string; replace the text there with "some other string".
My solution in code:
import random
s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2
def random_replace(s, replace_from, replace_to, amount_to_replace):
    parts = s.split(replace_from)
    # pick which token positions to replace; range replaces Python 2's xrange
    indices = random.sample(range(len(parts) - 1), amount_to_replace)
    replaced_s_parts = list()
    for i in range(len(parts)):
        replaced_s_parts.append(parts[i])
        if i < len(parts) - 1:
            if i in indices:
                replaced_s_parts.append(replace_to)
            else:
                replaced_s_parts.append(replace_from)
    return "".join(replaced_s_parts)
#TEST
for i in range(5):
    print(random_replace(s, replace_from, replace_to, 2))
Explanation:
Splits string into several parts using replace_from
Chooses indexes of tokens to replace using random.sample. This returned list contains unique numbers
Build a list for string reconstruction, replacing tokens with generated index by replace_to.
Concatenate all list elements into single string
Try this solution:
import random
def replace_random(tokens, eqv, n):
    random_tokens = list(eqv.keys())  # a list, so that random.shuffle can reorder it in place
    random.shuffle(random_tokens)
    for i in range(n):
        t = random_tokens[i]
        tokens = tokens.replace(t, eqv[t])
    return tokens
Assuming that a string with tokens exists, and a suitable equivalence table can be constructed with a replacement for each token:
tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'
equivalences = {
'(TOKEN1)' : 'REPLACEMENT1',
'(TOKEN2)' : 'REPLACEMENT2',
'(TOKEN3)' : 'REPLACEMENT3',
'(TOKEN4)' : 'REPLACEMENT4',
'(TOKEN5)' : 'REPLACEMENT5',
'(TOKEN6)' : 'REPLACEMENT6',
'(TOKEN7)' : 'REPLACEMENT7',
'(TOKEN8)' : 'REPLACEMENT8',
'(TOKEN9)' : 'REPLACEMENT9',
'(TOKEN10)' : 'REPLACEMENT10'
}
You can call it like this:
replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'
There are lots of ways to do this. My approach would be to write a function that takes the original string, the token string, and a function that returns the replacement text for an occurrence of the token in the original:
def strByReplacingTokensUsingFunction(original, token, function):
    outputComponents = []
    matchNumber = 0
    unexaminedOffset = 0
    while True:
        matchOffset = original.find(token, unexaminedOffset)
        if matchOffset < 0:
            matchOffset = len(original)
        outputComponents.append(original[unexaminedOffset:matchOffset])
        if matchOffset == len(original):
            break
        unexaminedOffset = matchOffset + len(token)
        replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
        outputComponents.append(replacement)
        matchNumber += 1
    return ''.join(outputComponents)
(You could certainly change this to use shorter identifiers. My style is somewhat more verbose than typical Python style.)
Given that function, it's easy to replace two random occurrences out of ten. Here's some sample input:
sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
The random module has a handy method for picking random items from a population (not picking the same item twice):
import random
replacementIndexes = random.sample(range(10), 2)
Then we can use the function above to replace the randomly-chosen occurrences:
sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
    (lambda matchNumber, token, **keywords:
        'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print(sampleOutput)
And here's some test output:
a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k
Here's another run:
a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k
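A similar effect can probably be had with re.sub and a callback that counts matches, which replaces the hand-written scanning loop. This is a sketch rather than the answer's own method; replace_chosen_matches and its parameters are hypothetical names.

```python
import re
from itertools import count

def replace_chosen_matches(text, token, replacement, chosen):
    # counter yields 0, 1, 2, ... so each match knows its own number.
    counter = count()

    def pick(match):
        # Replace only the occurrences whose number is in chosen.
        return replacement if next(counter) in chosen else match.group(0)

    return re.sub(re.escape(token), pick, text)

print(replace_chosen_matches('a(TOKEN)b(TOKEN)c(TOKEN)d', '(TOKEN)', 'X', {0, 2}))  # aXb(TOKEN)cXd
```

Passing random.sample(range(10), 2) as chosen would reproduce the two-out-of-ten replacement shown above.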
from random import sample
mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'
def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=10, index=0):
    # choose which occurrences (0-based) to replace; range replaces Python 2's xrange
    choices = sorted(sample(range(tokens), n_repl))
    for i in range(choices[-1]+1):
        index = mystr.index(substr, index) + 1
        if i in choices:
            mystr = mystr[:index-1] + mystr[index-1:].replace(substr,replacement,1)
    return mystr
print(replace(mystr,'(TOKEN)',2))