Say I have a string 'Area51' and an array ['0051']. How would I go about replacing the 51 in the string with the array element so that the output reads 'Area0051'? Assume I have another function that finds my transform_array; it's not relevant to this code.
string = 'Area51'
transform_array = ['0051']
Ideally, this would extend to examples such as:
string = '22Area51'
transform_array = ['0022','0051']
# Outputting -> '0022Area0051'
I know strings are immutable so I have to create a new string and can't use replace.
I was thinking something along the lines of:
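import re
string = '22Area51'
nums = re.findall(r"\d+", string)
transform_array = ['0022', '0051']
for i in range(len(nums)):
    k = string.index(nums[i])
    # each pass slices the original string, so only one group gets replaced
    new_string = string[:k] + transform_array[i] + string[k + len(nums[i]):]
    print(new_string)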
But this outputs:
First iteration:
>>> '0022Area51'
Second iteration:
>>> '22Area0051'
I can't seem to wrap my mind around how to put it together. Any guidance would be greatly appreciated.
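The slicing idea in the attempt above can be completed by keeping track of where the previous match ended and building the output from pieces. This is a minimal sketch using re.finditer; the helper name replace_groups is made up for illustration. Digit groups beyond the end of the replacement list are left unchanged.

```python
import re

def replace_groups(text, replacements):
    """Replace the i-th run of digits in text with replacements[i] (hypothetical helper)."""
    pieces = []
    last_end = 0  # end index of the previous replaced match
    for match, new in zip(re.finditer(r'\d+', text), replacements):
        pieces.append(text[last_end:match.start()])  # text between matches
        pieces.append(new)
        last_end = match.end()
    pieces.append(text[last_end:])  # everything after the last replaced match
    return ''.join(pieces)

print(replace_groups('22Area51', ['0022', '0051']))  # 0022Area0051
```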
You can use itertools.cycle and re.sub with a custom replacement function:
string = '22Area51'
transform_array = ['0022','0051']
import re
from itertools import cycle
c = cycle(transform_array)
print(re.sub(r'\d+', lambda g: next(c), string))
Prints:
0022Area0051
Or, if the number of digit groups matches the length of transform_array, a plain iterator is enough:
import re
c = iter(transform_array)
print(re.sub(r'\d+', lambda g: next(c), string))
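With a plain iterator, a count mismatch fails in one of two ways: too few replacements makes next() raise StopIteration out of re.sub, and extra replacements are silently ignored. A sketch that checks the counts up front (the function name sub_in_order is hypothetical):

```python
import re

def sub_in_order(text, replacements):
    """Substitute digit groups in order, insisting the counts match (hypothetical helper)."""
    found = re.findall(r'\d+', text)
    if len(found) != len(replacements):
        raise ValueError(f"{len(found)} digit groups but {len(replacements)} replacements")
    it = iter(replacements)
    return re.sub(r'\d+', lambda m: next(it), text)

print(sub_in_order('22Area51', ['0022', '0051']))  # 0022Area0051
```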
With the simple built-in iter function:
import re
string = '22Area51'
transform_array = ['0022','0051']
tr_arr_iter = iter(transform_array) # prepare iterator
res = re.sub(r'\d+', lambda n: next(tr_arr_iter), string)
print(res) # 0022Area0051
A loop with str.replace also works:
import re
string = '22Area51'
transform_array = ['0022', '0051']
new_string = string
nums = re.findall(r'\d+', string)
for num in nums:
    for el in transform_array:
        if num in el:
            # note: replace() swaps every occurrence of num, not just the first
            new_string = new_string.replace(num, el)
            break
print(new_string)  # 0022Area0051
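If, as in both examples, each replacement is just the original number left-padded with zeros to four digits (an assumption: the question says transform_array comes from another function), str.zfill can do the padding inside re.sub with no separate array. The name pad_numbers and the width of 4 are illustrative.

```python
import re

def pad_numbers(text, width=4):
    # Assumes every transformation is plain zero-padding to `width` digits;
    # numbers already `width` digits or longer are left as they are
    return re.sub(r'\d+', lambda m: m.group().zfill(width), text)

print(pad_numbers('22Area51'))  # 0022Area0051
print(pad_numbers('Area51'))    # Area0051
```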
I need help writing Python code that returns the output_string shown in the examples below.
Example 1:
input_string = "AAABCCCCDDA"
output_string = "3AB4C2DA"
Example 2:
input_string = "ABBBBCCDDDDAAAAA"
output_string = "A4B2C4D5A"
You can use itertools.groupby.
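groupby only groups consecutive equal items, which is exactly what this encoding needs: the final 'A' in "AAABCCCCDDA" forms its own run. A quick look at what it yields:

```python
from itertools import groupby

# groupby yields (key, iterator) pairs, one per run of equal consecutive items
runs = [(k, len(list(g))) for k, g in groupby("AAABCCCCDDA")]
print(runs)  # [('A', 3), ('B', 1), ('C', 4), ('D', 2), ('A', 1)]
```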
In Python 3.8+, you can use the walrus operator (:=) to write it as a short one-liner:
>>> from itertools import groupby
>>> input_string = "ABBBBCCDDDDAAAAA"
>>> ''.join(f"{len_g}{k}" if (len_g := len(list(g))) > 1 else k for k, g in groupby(input_string))
'A4B2C4D5A'
In Python < 3.8:
from itertools import groupby
input_string = "AAABCCCCDDA"
st = ''
for k, g in groupby(input_string):
    len_g = len(list(g))
    if len_g > 1:
        st += f"{len_g}{k}"
    else:
        st += k
print(st)
Output: 3AB4C2DA
It seems a regex can also do the trick:
from re import sub
dna = "AAABCCCCDDA"
sub(r'(\w)\1+',lambda m: str(len(m[0]))+m[1],dna) # '3AB4C2DA'
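The regex version only encodes runs of word characters, since \w does not match spaces or punctuation. A plain loop with no imports handles any character; this is a minimal sketch, and the name run_length is made up here.

```python
def run_length(s):
    """Encode runs as <count><char>, omitting the count for single characters."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1  # advance to the end of the current run
        count = j - i
        out.append(f"{count}{s[i]}" if count > 1 else s[i])
        i = j
    return ''.join(out)

print(run_length("AAABCCCCDDA"))       # 3AB4C2DA
print(run_length("ABBBBCCDDDDAAAAA"))  # A4B2C4D5A
```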
I would like to know how to remove certain characters from a list of strings.
In this case, I am trying to remove digit characters (numbers of type str) using a list comprehension.
import os
numbers = [str(i) for i in range(10)]
imgs_paths = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.jpg')]
foo_imgs_paths = [[e.replace(c, "") for c in e if c not in numbers] for e in imgs_paths]
The code above does not work, because it returns completely empty lists.
Option 1
If I understand your question correctly, a helper function may be simpler than a nested comprehension.
"doj394no.jpg".replace("0","").replace("1","")... # "dojno.jpg"
If you have your list of files and list of characters to remove:
files = [...]
numbers = "0123456789"
def remove_chars(original, chars_to_remove):
    for char in chars_to_remove:
        original = original.replace(char, "")
    return original
new_files = [remove_chars(file, numbers) for file in files]
Option 2
If you really want to use comprehensions, you can use them to filter the digits out without replace:
numbers = "0123456789"
filename = "log234.txt"
[char for char in filename if char not in numbers] # ["l","o","g",".","t","x","t"]
# To return it to a string:
"".join([char for char in filename if char not in numbers]) # "log.txt"
In your case, it would be like so:
import os
numbers = [str(i) for i in range(10)]
imgs_paths = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.jpg')]
foo_imgs_paths = [
    "".join(char for char in img_path if char not in numbers)
    for img_path in imgs_paths
]
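Filtering over the whole path also strips any digits in input_folder itself. If only the file name should be cleaned, os.path.split can separate the two first. This is a sketch; the helper name and the example folder "photos2024" are hypothetical.

```python
import os
from string import digits

def strip_digits_from_name(path):
    # Only the final path component is cleaned; digits in the folder are kept
    folder, name = os.path.split(path)
    return os.path.join(folder, ''.join(ch for ch in name if ch not in digits))

print(strip_digits_from_name(os.path.join("photos2024", "img01.jpg")))
```

On POSIX systems this prints photos2024/img.jpg; on Windows the separator is a backslash.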
Why not use a regular expression?
>>> import re
>>> re.sub(r'\d+', '', 'lo2g4.jpg')
'log.jpg'
Here is another solution, using str.translate:
old_str = "S11imone22.jpg"
new_str = old_str.translate(str.maketrans("", "", "0123456789"))
print(new_str) # Simone.jpg
I still prefer the re solution, which I find faster.
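When cleaning a whole list, the translate table only needs to be built once. Whether re.sub or str.translate is faster depends on the data, so if it matters, timing both with timeit on your own file names is the reliable way to decide. The sample names below are illustrative.

```python
import string

# Build the deletion table once and reuse it for every file name
DELETE_DIGITS = str.maketrans("", "", string.digits)

names = ["S11imone22.jpg", "lo2g4.jpg", "photo.jpg"]
cleaned = [name.translate(DELETE_DIGITS) for name in names]
print(cleaned)  # ['Simone.jpg', 'log.jpg', 'photo.jpg']
```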
So I'm trying to make a Python script that takes a pattern (e.g. c**l) and returns every possible combination of the string, where each * can be any letter of the alphabet.
So, we get something like: caal, cbal, ccal and so forth.
I've tried using product from the itertools library, but I haven't been able to make it work properly. So after 2 hours I've decided to turn to Stack Overflow.
Here's my current code. It's incomplete because I'm stuck.
from itertools import product
pattern = 'c**l'
alphabet = list('abcdefghijklmnopqrstuvwxyz')
wildChar = False
tmp_string = ""
combinations = []
if '*' in pattern:
    wildChar = True
    tmp_string = pattern.replace('*', '', pattern.count('*')+1)
if wildChar:
    tmp = []
    for _ in range(pattern.count('*')):
        tmp.append(list(product(tmp_string, alphabet)))
    for array in tmp:
        for instance in array:
            combinations.append("".join(instance))
    tmp = []
print(combinations)
You could try:
from itertools import product
from string import ascii_lowercase
pattern = "c**l"
repeat = pattern.count("*")
pattern = pattern.replace("*", "{}")
for letters in product(ascii_lowercase, repeat=repeat):
    print(pattern.format(*letters))
Result:
caal
cabl
cacl
...
czxl
czyl
czzl
Use itertools.product
import itertools
import string
s = 'c**l'
l = [c if c != '*' else string.ascii_lowercase for c in s]
out = [''.join(c) for c in itertools.product(*l)]
Output:
>>> out
['caal',
'cabl',
'cacl',
'cadl',
'cael',
'cafl',
'cagl',
'cahl',
'cail',
'cajl',
...
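The number of results grows as 26 ** (number of wildcards), so for longer patterns a generator avoids building the whole list in memory. A sketch combining the product approach above into a lazy function; the name expand and its wildcard/alphabet parameters are illustrative.

```python
from itertools import product
from string import ascii_lowercase

def expand(pattern, wildcard="*", alphabet=ascii_lowercase):
    """Lazily yield every string matching pattern, one letter per wildcard."""
    # Each position offers either its fixed character or the whole alphabet
    choices = [alphabet if ch == wildcard else ch for ch in pattern]
    for combo in product(*choices):
        yield ''.join(combo)

words = expand("c**l")
print(next(words), next(words))        # caal cabl
print(sum(1 for _ in expand("c**l")))  # 676
```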
Let's assume I have a list like this:
List=["Face123","Body234","Face565"]
I would like to get as output a list with the substrings from another list removed:
NonDesideredPattern = ["Face", "Body"]
Output = [123, 234, 565]
Create a function that returns a string without the undesired patterns.
Then use this function in a list comprehension:
import re
def remove_pattern(string, patterns):
    result = string
    for p in patterns:
        result = re.sub(p, '', result)
    return result
inputs = ["Face123", "Body234", "Face565"]
undesired_patterns = ["Face", "Body"]
outputs = [remove_pattern(e, undesired_patterns) for e in inputs]
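re.sub treats each entry as a regular expression, so an undesired pattern containing characters such as "." or "+" would not be matched literally. If the entries are plain substrings, re.escape keeps them literal. A sketch (the name remove_literal is hypothetical):

```python
import re

def remove_literal(text, substrings):
    # Escape each substring so characters like '.' or '+' match literally
    pattern = re.compile("|".join(re.escape(s) for s in substrings))
    return pattern.sub("", text)

print(remove_literal("Face.123", ["Face."]))  # 123
print(remove_literal("Faceb123", ["Face."]))  # Faceb123
```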
I'm not sure this is the most efficient approach, but you could do something like this:
def eval_list(og_list):
    list_parts = []
    list_nums = []
    for element in og_list:
        part = ""
        num = ""
        for char in element:
            if char.isalpha():
                part += char
            else:
                num += char
        list_parts.append(part)
        list_nums.append(num)
    return list_parts, list_nums
(This works if every element is always letters followed by a number.)
Use re.compile and re.sub
import re
lst = ["Face123", "Body234", "Face565"]
lst_no_desired_pattern = ["Face","Body"]
pattern = re.compile("|".join(lst_no_desired_pattern))
lst_output = [re.sub(pattern, "", word) for word in lst]
Result:
['123', '234', '565']
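The question's expected output [123, 234, 565] holds integers, while the answers return strings. One extra int() call converts them, assuming whatever remains after removing the patterns is a valid integer:

```python
import re

lst = ["Face123", "Body234", "Face565"]
pattern = re.compile("Face|Body")

# int() raises ValueError if the leftover text is not a valid integer
numbers = [int(pattern.sub("", word)) for word in lst]
print(numbers)  # [123, 234, 565]
```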
strings = ["1 asdf 2", "25etrth", "2234342 awefiasd"] #and so on
What is the easiest way to get [1, 25, 2234342]?
How can this be done without the re module or an expression like (^[0-9]+)?
One could write a helper function to extract the prefix:
def numeric_prefix(s):
    n = 0
    for c in s:
        if not c.isdigit():
            return n
        else:
            n = n * 10 + int(c)
    return n
Example usage:
>>> strings = ["1asdf", "25etrth", "2234342 awefiasd"]
>>> [numeric_prefix(s) for s in strings]
[1, 25, 2234342]
Note that this produces the correct output (zero) when the input string does not have a numeric prefix, as in the case of an empty string.
Working from Mikel's solution, one could write a more concise definition of numeric_prefix:
import itertools
def numeric_prefix(s):
    n = ''.join(itertools.takewhile(lambda c: c.isdigit(), s))
    return int(n) if n else 0
new = []
for item in strings:
    new.append(int(''.join(i for i in item if i.isdigit())))
print(new)
[1, 25, 2234342]
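This collects every digit in the string, not just the leading ones, so for the question's first string "1 asdf 2" it produces 12; a string with no digits at all makes int('') raise ValueError. A quick check of both cases (the wrapper name all_digits is made up for illustration):

```python
def all_digits(item):
    # Same expression as the loop above: keeps every digit, wherever it appears
    return int(''.join(i for i in item if i.isdigit()))

print(all_digits("1 asdf 2"))  # 12, not 1
try:
    all_digits("asdf")
except ValueError as exc:
    print("no digits:", exc)
```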
Basic usage of regular expressions:
import re
strings = ["1asdf", "25etrth", "2234342 awefiasd"]
regex = re.compile(r'^(\d*)')
for s in strings:
    mo = regex.match(s)
    print(s, '->', mo.group(0))
1asdf -> 1
25etrth -> 25
2234342 awefiasd -> 2234342
Building on sahhhm's answer, you can fix the "1 asdf 1" problem by using takewhile.
from itertools import takewhile
def isdigit(char):
    return char.isdigit()
numbers = []
for string in strings:
    result = takewhile(isdigit, string)
    resultstr = ''.join(result)
    if resultstr:
        number = int(resultstr)
        if number:
            numbers.append(number)
So you only want the leading digits, and you want to avoid regexes? There's probably something shorter, but this is the obvious solution.
nlist = []
for s in strings:
    if not s or s[0].isalpha(): continue
    for i, c in enumerate(s):
        if not c.isdigit():
            nlist.append(int(s[:i]))
            break
    else:
        nlist.append(int(s))
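Another regex-free option, not taken from the answers above, uses str.lstrip: stripping the leading digits and comparing lengths gives the length of the numeric prefix. A sketch; the name leading_int and its default parameter are hypothetical.

```python
from string import digits

def leading_int(s, default=0):
    """Return the integer formed by the leading digits of s, or default."""
    # lstrip removes every leading character found in `digits`
    prefix_len = len(s) - len(s.lstrip(digits))
    return int(s[:prefix_len]) if prefix_len else default

strings = ["1 asdf 2", "25etrth", "2234342 awefiasd", "no digits"]
print([leading_int(s) for s in strings])  # [1, 25, 2234342, 0]
```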