Python - how to multiply characters in string by number after character - python

Title, for example I want to make 'A3G3A' into 'AAAGGGA'.
I have this so far:
if any(i.isdigit() for i in string):
    for i in range(0, len(string)):
        if string[i].isdigit():
            (i am lost after this)

Here's a simplistic approach:
string = 'A3G3A'
expanded = ''
for character in string:
    if character.isdigit():
        expanded += expanded[-1] * (int(character) - 1)
    else:
        expanded += character
print(expanded)
OUTPUT: AAAGGGA
It assumes valid input. Its limitation is that the repetition factor has to be a single digit, e.g. 2 - 9. If we want repetition factors greater than 9, we have to do slightly more parsing of the string:
from itertools import groupby
groups = groupby('DA10G3ABC', str.isdigit)
expanded = []
for is_numeric, characters in groups:
    if is_numeric:
        expanded.append(expanded[-1] * (int(''.join(characters)) - 1))
    else:
        expanded.extend(characters)
print(''.join(expanded))
OUTPUT: DAAAAAAAAAAGGGABC

Assuming that the format is always a letter followed by an integer, with the last integer possibly missing (Python 2 shown; in Python 3, izip_longest is named zip_longest):
>>> from itertools import izip_longest
>>> s = 'A3G3A'
>>> ''.join(c*int(i) for c, i in izip_longest(*[iter(s)]*2, fillvalue=1))
'AAAGGGA'
Assuming that the format can be any substring followed by an integer, with the integer possibly longer than one digit and the last integer possibly missing:
>>> from itertools import izip_longest
>>> import re
>>> s = 'AB10GY3ABC'
>>> sp = re.split(r'(\d+)', s)
>>> ''.join(c*int(i) for c, i in izip_longest(*[iter(sp)]*2, fillvalue=1))
'ABABABABABABABABABABGYGYGYABC'

Minimal pure-Python code that handles all cases:
output = ''
n = ''
c = ''
for x in input + 'a':
    if x.isdigit():
        n += x
    else:
        if n == '':
            n = '1'
        output = output + c*int(n)
        n = ''
        c = x
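With input="WA5OUH2!10", output is WAAAAAOUHH!!!!!!!!!!.
The trailing + 'a' is a sentinel that flushes the last character: each character is only written to output when the next non-digit is seen, so without it the final group would be lost.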
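The delayed-output idea above can also be written without a sentinel character, by flushing the last pending character once after the loop. This is only a sketch of the same technique; the function name expand and its variable names are made up for illustration.

```python
def expand(text):
    """Repeat each non-digit character by the integer that follows it (default 1)."""
    output = []
    pending = ''   # character still waiting for its repeat count
    count = ''     # digits collected so far for the pending character
    for ch in text:
        if ch.isdigit():
            count += ch
        else:
            # A new character starts: emit the previous one first.
            output.append(pending * int(count or '1'))
            pending, count = ch, ''
    # Flush the final character explicitly instead of appending a sentinel.
    output.append(pending * int(count or '1'))
    return ''.join(output)


print(expand('WA5OUH2!10'))  # WAAAAAOUHH!!!!!!!!!!
print(expand('A3G3A'))       # AAAGGGA
```

Digits at the very start of the input have no character to repeat, so this sketch silently drops them.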

Another approach could be -
import re
input_string = 'A3G3A'
alphabets = re.findall('[A-Z]', input_string) # List of all alphabets - ['A', 'G', 'A']
digits = re.findall('[0-9]+', input_string) # List of all numbers - ['3', '3']
final_output = "".join([alphabets[i]*int(digits[i]) for i in range(0, len(alphabets)-1)]) + alphabets[-1]
# This expression repeats each letter by the number next to it ( Except for the last letter ), joins the list of strings into a single string, and appends the last character
# final_output - 'AAAGGGA'
Explanation -
In [31]: alphabets # List of alphabets in the string
Out[31]: ['A', 'G', 'A']
In [32]: digits # List of numbers in the string ( Including numbers more than one digit)
Out[32]: ['3', '3']
In [33]: list_of_strings = [alphabets[i]*int(digits[i]) for i in range(0, len(alphabets)-1)] # List of strings after repetition
In [34]: list_of_strings
Out[34]: ['AAA', 'GGG']
In [35]: joined_string = "".join(list_of_strings) # Joined list of strings
In [36]: joined_string
Out[36]: 'AAAGGG'
In [38]: final_output = joined_string + input_string[-1] # Append last character of the string
In [39]: final_output
Out[39]: 'AAAGGGA'

Using * to repeat the characters, assuming each repeat count is a single digit in the range [1, 9]:
q = 'A3G3A'
try:
    int(q[-1]) # check if it ends with digit
except ValueError:
    q = q+'1' # repeat only once
"".join([list(q)[i]*int(list(q)[i+1]) for i in range(0,len(q),2)])

One-line solution, assuming single-digit counts in the range [1, 9] (a 0 would still leave one copy of the character).
>>> s = 'A3G3A'
>>> s = ''.join(s[i] if not s[i].isdigit() else s[i-1]*(int(s[i])-1) for i in range(0, len(s)))
>>> print(s)
AAAGGGA

Embrace regex! This finds all occurrences of the pattern non-digit character followed by non-negative integer (any number of digits) and replaces that substring with that many of the character.
import re
re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), 'A3G3A')
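To see how the substitution behaves on multi-digit counts and on characters with no count, here is the same re.sub call wrapped in a small function. The name expand_counts is only an illustrative choice.

```python
import re


def expand_counts(text):
    # (\D) captures one non-digit character, (\d+) the run of digits after it.
    # Characters that are not followed by digits do not match and stay as they are.
    return re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), text)


print(expand_counts('A3G3A'))      # AAAGGGA
print(expand_counts('DA10G3ABC'))  # DAAAAAAAAAAGGGABC
print(expand_counts('A0B2'))       # BB
```

Note that a count of 0 removes the character entirely, which differs from the loop-based answers that always emit the character at least once.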

This can be solved by numpy:
import numpy as np
x = 'A3G3A'
if not x[-1].isdigit():
    x += '1'
letters = list(x[::2])
times = list(map(int,x[1::2]))
lst = ''.join(np.repeat(letters, times))
#output
'AAAGGGA'

Related

Python - removing repeated letters in a string

Say I have a string in alphabetical order, based on the amount of times that a letter repeats.
Example: "BBBAADDC".
There are 3 B's, so they go at the start, 2 A's and 2 D's, so the A's go in front of the D's because they are in alphabetical order, and 1 C. Another example would be CCCCAAABBDDAB.
Note that there can be 4 letters in the middle somewhere (i.e. CCCC), as there could be 2 pairs of 2 letters.
However, let's say I can only have n letters in a row. For example, if n = 3 in the second example, then I would have to omit one "C" from the first substring of 4 C's, because there can only be a maximum of 3 of the same letters in a row.
Another example would be the string "CCCDDDAABC"; if n = 2, I would have to remove one C and one D to get the string CCDDAABC
Example input/output:
n=2: Input: AAABBCCCCDE, Output: AABBCCDE
n=4: Input: EEEEEFFFFGGG, Output: EEEEFFFFGGG
n=1: Input: XXYYZZ, Output: XYZ
How can I do this with Python? Thanks in advance!
This is what I have right now, although I'm not sure if it's on the right track. Here, z is the length of the string.
for k in range(z+1):
    if final_string[k] == final_string[k+1] == final_string[k+2] == final_string[k+3]:
        final_string = final_string.translate({ord(final_string[k]): None})
return final_string
Ok, based on your comment, you're either pre-sorting the string or it doesn't need to be sorted by the function you're trying to create. You can do this more easily with itertools.groupby():
import itertools
def max_seq(text, n=1):
    result = []
    for k, g in itertools.groupby(text):
        result.extend(list(g)[:n])
    return ''.join(result)
max_seq('AAABBCCCCDE', 2)
# 'AABBCCDE'
max_seq('EEEEEFFFFGGG', 4)
# 'EEEEFFFFGGG'
max_seq('XXYYZZ')
# 'XYZ'
max_seq('CCCDDDAABC', 2)
# 'CCDDAABC'
Each group g is expanded into a list and then sliced to its first n elements (the [:n] part), so you get each letter at most n times in a row. If the same letter appears elsewhere, it's treated as an independent sequence when counting n in a row.
Edit: Here's a shorter version, which may also perform better for very long strings. And while we're using itertools, this one additionally utilises itertools.chain.from_iterable() to create the flattened list of letters. And since each of these is a generator, it's only evaluated/expanded at the last line:
import itertools
def max_seq(text, n=1):
    sequences = (list(g)[:n] for _, g in itertools.groupby(text))
    letters = itertools.chain.from_iterable(sequences)
    return ''.join(letters)
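A possible variation on the groupby approach, if very long runs are a concern: itertools.islice can take the first n items of each group without first copying the whole group into a list. This is a sketch rather than a measured improvement, and the name max_seq_lazy is invented here.

```python
from itertools import groupby, islice


def max_seq_lazy(text, n=1):
    # islice stops after n items, so a long run is never copied into a list;
    # groupby skips whatever is left of a group when it moves to the next one.
    return ''.join(''.join(islice(group, n)) for _, group in groupby(text))


print(max_seq_lazy('AAABBCCCCDE', 2))   # AABBCCDE
print(max_seq_lazy('EEEEEFFFFGGG', 4))  # EEEEFFFFGGG
print(max_seq_lazy('XXYYZZ'))           # XYZ
```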
hello = "hello frrriend"
def replacing() -> str:
    global hello
    j = 0
    for i in hello:
        if j == 0:
            pass
        else:
            if i == prev:
                hello = hello.replace(i, "")
                prev = i
        prev = i
        j += 1
    return hello
replacing()
Looks a bit primal but I think it works; that's what I came up with on the go anyway, hope it helps :D
Here's my solution:
def snip_string(string, n):
    list_string = list(string)
    list_string.sort()
    chars = set(string)
    for char in chars:
        while list_string.count(char) > n:
            list_string.remove(char)
    return ''.join(list_string)
Calling the function with various values for n gives the following output:
>>> string = "AAAABBBCCCDDD"
>>> snip_string(string, 1)
'ABCD'
>>> snip_string(string, 2)
'AABBCCDD'
>>> snip_string(string, 3)
'AAABBBCCCDDD'
>>>
Edit
Here is the updated version of my solution, which only removes characters if the group of repeated characters exceeds n.
import itertools
def snip_string(string, n):
    groups = [list(g) for k, g in itertools.groupby(string)]
    string_list = []
    for group in groups:
        while len(group) > n:
            del group[-1]
        string_list.extend(group)
    return ''.join(string_list)
Output:
>>> string = "DDDAABBBBCCABCDE"
>>> snip_string(string, 3)
'DDDAABBBCCABCDE'
from itertools import groupby
n = 2
def rem(string):
    out = "".join(["".join(list(g)[:n]) for _, g in groupby(string)])
    print(out)
So this is the entire code for your question.
s = "AABBCCDDEEE"
s2 = "AAAABBBDDDDDDD"
s3 = "CCCCAAABBDDABBB"
s4 = "AAAAAAAA"
z = "AAABBCCCCDE"
Calling rem() on each of these inputs prints:
AABBCCDDEE
AABBDD
CCAABBDDABB
AA
AABBCCDE

Count of sub-strings that contain character X at least once. E.g Input: str = “abcd”, X = ‘b’ Output: 6

This question was asked in an exam but my code (given below) passed just 2 cases out of 7 cases.
Input format: a single line of input separated by a comma
Input: str = “abcd,b”
Output: 6
“ab”, “abc”, “abcd”, “b”, “bc” and “bcd” are the required sub-strings.
def slicing(s, k, n):
    loop_value = n - k + 1
    res = []
    for i in range(loop_value):
        res.append(s[i: i + k])
    return res
x, y = input().split(',')
n = len(x)
res1 = []
for i in range(1, n + 1):
    res1 += slicing(x, i, n)
count = 0
for ele in res1:
    if y in ele:
        count += 1
print(count)
When the target string (ts) is found in the string S, the number of substrings containing that instance is (number of characters before the target + 1) multiplied by (number of characters after the target + 1): a substring can start at any position up to and including the start of the target, and end at any position from the end of the target onward.
This covers all substrings that contain this first instance of the target. What remains are substrings that start after this instance's first character, which are counted recursively on the rest of the string.
def countsubs(S,ts):
    if ts not in S: return 0 # shorter or no match
    before,after = S.split(ts,1) # split on target
    result = (len(before)+1)*(len(after)+1) # count for this instance
    return result + countsubs(ts[1:]+after,ts) # recurse with right side
print(countsubs("abcd","b")) # 6
This will work for single character and multi-character targets and will run much faster than checking all combinations of substrings one by one.
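For the single-character case in the question, a different technique also avoids enumerating substrings: count all substrings, then subtract those that contain no X at all. This is a separate approach from the recursive one above, sketched under the assumption that X is exactly one character; the function name is illustrative.

```python
def count_substrings_with_char(s, x):
    """Count substrings of s that contain the single character x at least once."""
    n = len(s)
    total = n * (n + 1) // 2          # number of all non-empty substrings
    without = 0                       # substrings made only of non-x characters
    run = 0                           # length of the current run without x
    for ch in s:
        if ch == x:
            without += run * (run + 1) // 2
            run = 0
        else:
            run += 1
    without += run * (run + 1) // 2   # account for the final run
    return total - without


print(count_substrings_with_char("abcd", "b"))  # 6
```

It makes one pass over the string, so it runs in linear time.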
Here is a simple solution without recursion:
def my_function(s):
    l, target = s.split(',')
    result = []
    for i in range(len(l)):
        for j in range(i+1, len(l)+1):
            ss = l[i] + l[i+1:j]
            if target in ss:
                result.append(ss)
    return f'count = {len(result)}, substrings = {result}'
print(my_function("abcd,b"))
#count = 6, substrings = ['ab', 'abc', 'abcd', 'b', 'bc', 'bcd']
Here you go, this should help
from itertools import combinations
output = []
initial = input('Enter string and needed letter separated by commas: ') #Asking for input
list1 = initial.split(',') #splitting the input into two parts i.e the actual text and the letter we want common in output
text = list1[0]
target = list1[1] #the required letter, taken from the input instead of being hardcoded
final = [''.join(l) for i in range(len(text)) for l in combinations(text, i+1)] #this is the core part of our code, from this statement we get all the available combinations of the set of letters (all the way from 1 letter combinations to nth letter)
#note: combinations() also yields non-contiguous selections such as 'ac', not only substrings
for i in final:
    if target in i:
        output.append(i) #only outputting the results which have the required letter/phrase in it

Splitting an unspaced string of decimal values - Python

An awful person has given me a string like this
values = '.850000.900000.9500001.000001.50000'
and I need to split it to create the following list:
['.850000', '.900000', '.950000', '1.00000', '1.50000']
I know that if I were dealing only with numbers < 1 I could use the code
dl = '.'
splitvalues = [dl+e for e in values.split(dl) if e != ""]
But in cases like this one, where there are numbers greater than 1 buried in the string, splitvalues would end up being
['.850000', '.900000', '.9500001', '.000001', '.50000']
So is there a way to split a string with multiple delimiters while also splitting the string differently based on which delimiter is encountered?
I think this is somewhat closer to a fixed width format string. Try a regular expression like this:
import re
pattern = r"(\d{1,2}\.\d{5})"
m = re.search(pattern, input_str)
your_first_number = m.group(0)
Try this repeatedly on the remaining string to consume all numbers.
>>> import re
>>> source = '0.850000.900000.9500001.000001.50000'
>>> re.findall(r"(.*?00+(?!0))", source)
['0.850000', '.900000', '.950000', '1.00000', '1.50000']
The split is based on looking for {anything, double zero, a run of zeros (not followed by another zero)}.
Assume that the value before the decimal is less than 10, and then we have,
values = '0.850000.900000.9500001.000001.50000'
result = list()
last_digit = None
for value in values.split('.'):
    if value.endswith('0'):
        result.append(''.join([i for i in [last_digit, '.', value] if i]))
        last_digit = None
    else:
        result.append(''.join([i for i in [last_digit, '.', value[0:-1]] if i]))
        last_digit = value[-1]
if values.startswith('0'):
    result = result[1:]
print(result)
# Output
['.850000', '.900000', '.950000', '1.00000', '1.50000']
How about using re.split():
import re
values = '0.850000.900000.9500001.000001.50000'
print([a + b for a, b in zip(*(lambda x: (x[1::2], x[2::2]))(re.split(r"(\d\.)", values)))])
OUTPUT
['0.85000', '0.90000', '0.950000', '1.00000', '1.50000']
Here the numbers are of fixed width: 6 digits, or 7 characters including the dot. Take the slices from 0 to 7, 7 to 14, and so on. Because we don't need the initial zero, I use the slice values[1:] for extraction.
values = '0.850000.900000.9500001.000001.50000'
[values[1:][start:start+7] for start in range(0,len(values[1:]),7)]
['.850000', '.900000', '.950000', '1.00000', '1.50000']
Test:
''.join([values[1:][start:start+7] for start in range(0,len(values[1:]),7)]) == values[1:]
True
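The fixed-width slicing above can be packaged as a small helper that takes the width as a parameter. This is a sketch that assumes every value really occupies exactly 7 characters, as in the string from the question; the name split_fixed_width is made up for illustration.

```python
def split_fixed_width(values, width=7):
    # Step through the string in blocks of `width` characters.
    return [values[i:i + width] for i in range(0, len(values), width)]


values = '.850000.900000.9500001.000001.50000'
chunks = split_fixed_width(values)
print(chunks)                      # ['.850000', '.900000', '.950000', '1.00000', '1.50000']
print([float(c) for c in chunks])  # [0.85, 0.9, 0.95, 1.0, 1.5]
```

Converting each chunk with float() is a cheap sanity check: if the width assumption is wrong, a chunk such as '.9500001.' would raise a ValueError instead of passing silently.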
With a fixed / variable string, you may try something like:
values = '0.850000.900000.9500001.000001.50000'
str_list = []
first_index = values.find('.')
while first_index > 0:
    last_index = values.find('.', first_index + 1)
    if last_index != -1:
        str_list.append(values[first_index - 1: last_index - 2])
        first_index = last_index
    else:
        str_list.append(values[first_index - 1: len(values) - 1])
        break
print(str_list)
Output:
['0.8500', '0.9000', '0.95000', '1.0000', '1.5000']
Assuming that there will always be a single digit before the decimal.
Please take this as a starting point and not a copy paste solution.

Python replace 3 random characters in a string with no duplicates

I need to change 3 random characters in a string using Python. Example strings:
Adriano Celentano
Luca Valentina
I need to replace 3 characters, without replacing a character with the same character or with a number, and without replacing spaces. How can I do this using Python?
I need output like this:
adraano cettntano
lacr vilenntina
I don't know where to start with this.
My code so far:
for i in xrange(4):
    for n in nume:
        print n.replace('$', random.choice(string.letters)).replace('#', random.choice(string.letters))
If you just want to change chars that are not whitespace and not the same char in regards to index, you can first pull the indexes where the non-whitespace chars are:
import random
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
print(random.sample(inds,3))
Then use those indexes to replace.
s = "Adriano Celentano"
import random
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
sam = random.sample(inds, 3)
from string import ascii_letters
lst = list(s)
for ind in sam:
    lst[ind] = random.choice(ascii_letters)
print("".join(lst))
If you want a unique char each time to replace with also:
s = "Adriano Celentano"
import random
from string import ascii_letters
inds = [i for i, ch in enumerate(s) if not ch.isspace()]
sam = random.sample(inds, 3)
letts = iter(random.sample(ascii_letters, 3))
lst = list(s)
for ind in sam:
    lst[ind] = next(letts)
print("".join(lst))
output:
Adoiano lelenhano
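The snippets above can still pick a replacement that equals the character already at that position. A sketch that rules this out, and also keeps the three replacements distinct from each other, might look like this. The function name, the k parameter, and the choice of lowercase-only replacements are assumptions made for illustration, and "same character" is read case-insensitively here.

```python
import random
from string import ascii_lowercase


def replace_random_chars(text, k=3):
    # Positions that may be replaced: everything except whitespace.
    positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    chosen = random.sample(positions, k)   # k distinct positions
    chars = list(text)
    used = set()                           # replacement letters already handed out
    for i in chosen:
        # Exclude the current character (ignoring case) and earlier replacements.
        candidates = [c for c in ascii_lowercase
                      if c != chars[i].lower() and c not in used]
        new_char = random.choice(candidates)
        used.add(new_char)
        chars[i] = new_char
    return "".join(chars)


print(replace_random_chars("Adriano Celentano"))
print(replace_random_chars("Luca Valentina"))
```

Because every replacement differs from the character it overwrites, exactly k positions change on each call. random.sample raises ValueError if the text has fewer than k non-space characters.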
You can do this in two stages. In the first stage you pick 3 random positions in your string that meet your search criteria (isalnum):
import random
import string
replacement_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
# replacement_chars = string.letters + string.digits
input = 'Adriano Celentano'
input_list = list(input)
input_dic = dict(enumerate(input_list))
valid_positions=[key for key in input_dic if input_dic[key].isalnum()]
random_positions=random.sample(valid_positions,3)
In the second part you generate 3 random characters and replace the characters in the previously selected positions. I have added a while loop to generate a new random character if it matches the existing value:
random_chars = random.sample(replacement_chars,len(random_positions))
char_counter = 0
for position in random_positions:
    #check if the replacement character matches the existing one
    #and generate another one if needed
    while input_list[position]==random_chars[char_counter]:
        random_chars[char_counter] = random.choice(replacement_chars)
    input_list[position]=random_chars[char_counter]
    char_counter = char_counter + 1
print("".join(input_list).lower())

removing non-numeric characters from a string

strings = ["1 asdf 2", "25etrth", "2234342 awefiasd"] #and so on
Which is the easiest way to get [1, 25, 2234342]?
How can this be done without a regex module or expression like (^[0-9]+)?
One could write a helper function to extract the prefix:
def numeric_prefix(s):
    n = 0
    for c in s:
        if not c.isdigit():
            return n
        else:
            n = n * 10 + int(c)
    return n
Example usage:
>>> strings = ["1asdf", "25etrth", "2234342 awefiasd"]
>>> [numeric_prefix(s) for s in strings]
[1, 25, 2234342]
Note that this will produce correct output (zero) when the input string does not have a numeric prefix (as in the case of empty string).
Working from Mikel's solution, one could write a more concise definition of numeric_prefix:
import itertools
def numeric_prefix(s):
    n = ''.join(itertools.takewhile(lambda c: c.isdigit(), s))
    return int(n) if n else 0
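As a quick check of the takewhile version against the list from the question (including the "1 asdf 2" case, where only the leading 1 should be kept), here is a runnable sketch. It passes str.isdigit directly instead of a lambda, which is equivalent for str inputs; the extra "abc" entry is added here only to show the zero fallback.

```python
from itertools import takewhile


def numeric_prefix(s):
    # Collect leading digit characters and stop at the first non-digit.
    digits = ''.join(takewhile(str.isdigit, s))
    return int(digits) if digits else 0


strings = ["1 asdf 2", "25etrth", "2234342 awefiasd", "abc"]
print([numeric_prefix(s) for s in strings])  # [1, 25, 2234342, 0]
```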
new = []
for item in strings:
    new.append(int(''.join(i for i in item if i.isdigit())))
print(new)
[1, 25, 2234342]
Basic usage of regular expressions:
import re
strings = ["1asdf", "25etrth", "2234342 awefiasd"]
regex = re.compile(r'^(\d*)')
for s in strings:
    mo = regex.match(s)
    print(s, '->', mo.group(0))
1asdf -> 1
25etrth -> 25
2234342 awefiasd -> 2234342
Building on sahhhm's answer, you can fix the "1 asdf 1" problem by using takewhile.
from itertools import takewhile
def isdigit(char):
    return char.isdigit()
numbers = []
for string in strings:
    result = takewhile(isdigit, string)
    resultstr = ''.join(result)
    if resultstr:
        number = int(resultstr)
        if number:
            numbers.append(number)
So you only want the leading digits? And you want to avoid regexes? Probably there's something shorter but this is the obvious solution.
nlist = []
for s in strings:
    if not s or s[0].isalpha(): continue
    for i, c in enumerate(s):
        if not c.isdigit():
            nlist.append(int(s[:i]))
            break
    else:
        nlist.append(int(s))
