Remove characters from list of strings using comprehension - python

I would like to know how to remove certain characters from a list of strings.
In this case, I am trying to remove numbers of type str using list comprehension.
numbers = [str(i) for i in range(10)]
imgs_paths = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.jpg')]
foo_imgs_paths = [[e.replace(c, "") for c in e if c not in numbers] for e in imgs_paths]
The code above does not work, because it returns completely empty lists.

Option 1
If I understand your question right, a function might simplify it more than nested comprehension.
"doj394no.jpg".replace("0","").replace("1","")... # "dojno.jpg"
If you have your list of files and list of characters to remove:
files = [...]
numbers = "0123456789"
def remove_chars(original, chars_to_remove):
    # Strip each unwanted character in turn
    for char in chars_to_remove:
        original = original.replace(char, "")
    return original
new_files = [remove_chars(file, numbers) for file in files]
Option 2
If you really want to use comprehensions, you can use them to filter letters out without replace:
numbers = "0123456789"
filename = "log234.txt"
[char for char in filename if char not in numbers] # ["l","o","g",".","t","x","t"]
# To return it to a string:
"".join([char for char in filename if char not in numbers]) # "log.txt"
In your case, it would be like so:
numbers = [str(i) for i in range(10)]
imgs_paths = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.jpg')]
foo_imgs_paths = [
    "".join(char for char in img_path if char not in numbers)
    for img_path in imgs_paths
]
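One thing to watch for: the comprehension above runs over the whole path, so digits in the folder name are removed too. If only the file name should change, a minimal sketch could split the path first. The helper name strip_digits_from_filename and the sample folder "data2023" are made up for illustration.

```python
import os

DIGITS = set("0123456789")

def strip_digits_from_filename(path):
    """Remove digits from the final path component only (hypothetical helper)."""
    folder, name = os.path.split(path)
    cleaned = "".join(ch for ch in name if ch not in DIGITS)
    return os.path.join(folder, cleaned)

# Sample paths invented for the example; the folder name keeps its digits.
imgs_paths = [
    os.path.join("data2023", "img001.jpg"),
    os.path.join("data2023", "cat42.jpg"),
]
foo_imgs_paths = [strip_digits_from_filename(p) for p in imgs_paths]
print(foo_imgs_paths)
```

Note that two different files can collapse to the same name once their digits are gone, so check for collisions before renaming anything on disk.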

Why not use a regular expression?
import re
re.sub(r'\d+', '', 'lo2g4.jpg')
'log.jpg'
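Applied to a whole list, the same substitution fits in a comprehension. This is a small sketch with made-up file names; compiling the pattern once is optional. In Python 3, \d on a str pattern also matches non-ASCII digits, so [0-9] is used here to stay with the ten ASCII digits.

```python
import re

# Compile once, reuse for every element
DIGIT_RUN = re.compile(r"[0-9]+")

paths = ["lo2g4.jpg", "img001.jpg"]  # sample names, not from the question
cleaned = [DIGIT_RUN.sub("", p) for p in paths]
print(cleaned)
```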

Just to provide another solution:
old_str = "S11imone22.jpg"
new_str = old_str.translate(str.maketrans("", "", "0123456789"))
print(new_str) # Simone.jpg
I still prefer the re solution, which is faster
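Which approach is faster depends on the input and the Python version, so it may be worth measuring rather than assuming. Below is a rough sketch using timeit; the function names are made up, and the timings it prints will vary from machine to machine.

```python
import re
import timeit

DIGIT_TABLE = str.maketrans("", "", "0123456789")
DIGIT_RE = re.compile(r"[0-9]+")

def with_translate(s):
    # Delete every character listed in the translation table
    return s.translate(DIGIT_TABLE)

def with_regex(s):
    # Replace each run of digits with the empty string
    return DIGIT_RE.sub("", s)

sample = "S11imone22.jpg"
same_result = with_translate(sample) == with_regex(sample)

t_translate = timeit.timeit(lambda: with_translate(sample), number=10_000)
t_regex = timeit.timeit(lambda: with_regex(sample), number=10_000)
print(f"translate: {t_translate:.4f}s  regex: {t_regex:.4f}s")
```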

Related

How do I convert a text file containing a list of lists to a string?

The .txt file has a string like this:
[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]9.5
My goal is to separate that final number from the list and then turn them into a list of lists and a float respectively. I've managed to separate them, but they are still strings and I can't convert them...
Here's the code I have:
def string_to_list(file):
    for i in os.listdir(path):
        if i == file:
            openfile = open(path5+'/'+ i, 'r')
            values = openfile.read()
            p = ']]'
            print(values)
            print()
            bef, sep, after= values.partition(p)
            string1 = values.replace(after,'')
            print(string1)
            print()
            print(after)
The output, using the previous example, is:
[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]9.5
[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]
9.5
But they are all still strings.
How can I make this work?
Thank you
ast.literal_eval can do this. json.loads could, as well.
import ast
s = "[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]9.5"
i = s.rfind(']')
l = ast.literal_eval(s[:i+1])
o = float(s[i+1:])
print(l, o)
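For the json route mentioned above, one possible sketch uses json.JSONDecoder.raw_decode, which parses the leading JSON document and reports the index where it stopped, so the trailing number does not have to be located by hand. The variable names are illustrative.

```python
import json

s = "[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]9.5"

# raw_decode returns the parsed object and the index where the JSON ended
matrix, end = json.JSONDecoder().raw_decode(s)
trailing = float(s[end:])

print(matrix, trailing)
```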
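Here is a simple way that only uses list append and loops:
a = "[[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]"
x = a[1:-1] # remove the outside brackets
res = []
tmp = []
num = ""
for e in x:
    if e in "0123456789.-":
        num += e # collect the characters of the current number
        continue
    if num:
        tmp.append(float(num)) # a separator ends the number
        num = ""
    if e == ']':
        res.append(tmp)
        tmp = []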
You can also call eval() on string1 and after from your code, but only on trusted input, since eval() executes arbitrary expressions; ast.literal_eval is the safer choice.
myList = eval(string1) #type(myList) will give you list
myFloat = eval(after) #type(myFloat) will give you float

Remove a pattern from list element and return another list in Python

Let's assume I have a list like this
List=["Face123","Body234","Face565"]
I would like to obtain as output a list without character/substring described in another list.
NonDesideredPattern=["Face","Body"]
Output=[123,234,565].
Create a function which returns a string without the undesired patterns.
Then use this function in a list comprehension:
import re
def remove_pattern(string, patterns):
    result = string
    for p in patterns:
        result = re.sub(p, '', result)
    return result
inputs = ["Face123", "Body234", "Face565"]
undesired_patterns = ["Face", "Body"]
outputs = [remove_pattern(e, undesired_patterns) for e in inputs]
I am not sure this is the most efficient way, but you could do something like this:
def eval_list(og_list):
    list_parts = []
    list_nums = []
    for element in og_list:
        part = ""
        num = ""
        for char in element:
            if char.isalpha():
                part += char
            else:
                num += char
        list_parts.append(part)
        list_nums.append(num)
    return list_parts, list_nums
(This assumes every element is alphabetic text followed by a number.)
Use re.compile and re.sub
import re
lst = ["Face123", "Body234", "Face565"]
lst_no_desired_pattern = ["Face","Body"]
pattern = re.compile("|".join(lst_no_desired_pattern))
lst_output = [re.sub(pattern, "", word) for word in lst]
Result:
['123', '234', '565']
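Two small refinements may be worth considering. Joining the patterns with "|" treats each one as a regular expression, so any pattern containing characters such as "." or "(" should be passed through re.escape first. The question also asked for numbers rather than strings, so the result can be converted with int. This is a sketch using the sample data from the question; it assumes every element still contains only digits once the patterns are removed.

```python
import re

items = ["Face123", "Body234", "Face565"]
unwanted = ["Face", "Body"]

# Escape each pattern so it is matched literally, then join as alternatives
pattern = re.compile("|".join(re.escape(p) for p in unwanted))

# int() raises ValueError if anything other than digits is left over
cleaned = [int(pattern.sub("", item)) for item in items]
print(cleaned)
```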

How to create new string by indexing?

Say I have a string 'Area51' and an array ['0051'], how would I go about replacing the 51 in the string with the array so that the output reads 'Area0051'. Assume that I have another function that finds my transform_array but it's not significant to this code.
string = 'Area51'
transform_array = ['0051']
Ideally, this would extend to examples such as:
string = '22Area51'
transform_array = ['0022','0051']
# Outputting -> '0022Area0051'
I know strings are immutable so I have to create a new string and can't use replace.
I was thinking something along the lines of:
import re
string = '22Area51'
nums = re.findall(r"\d+", string)
transform_array = ['0022','0051']
new_string = ''
for i in range(len(nums)):
    k = string.index(nums[i])
    new_string += string[:k] + transform_array[i]
But this would output:
First iteration:
>>> '0022Area51'
Second iteration
>>> '22Area0051'
I can't seem to wrap my mind on how to put it together. Any guidance would be greatly appreciated.
You can use itertools.cycle and re.sub with a custom substitution function:
string = '22Area51'
transform_array = ['0022','0051']
import re
from itertools import cycle
c = cycle(transform_array)
print(re.sub(r'\d+', lambda g: next(c), string))
Prints:
0022Area0051
Or, if the number of digit groups matches the length of transform_array:
import re
c = iter(transform_array)
print(re.sub(r'\d+', lambda g: next(c), string))
Using the built-in iter function:
import re
string = '22Area51'
transform_array = ['0022','0051']
tr_arr_iter = iter(transform_array) # prepare iterator
res = re.sub(r'\d+', lambda n: next(tr_arr_iter), string)
print(res) # 0022Area0051
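With a plain iterator, next raises StopIteration once the string contains more digit groups than there are replacements. One way to guard against that, sketched here with a hypothetical helper name, is to give next a default so that unmatched groups are left unchanged.

```python
import re

def replace_digit_groups(text, replacements):
    """Replace each run of digits with the next replacement, in order.

    Hypothetical helper: digit groups beyond the end of `replacements`
    are kept as they are instead of raising StopIteration.
    """
    it = iter(replacements)
    return re.sub(r"\d+", lambda m: next(it, m.group(0)), text)

print(replace_digit_groups("22Area51", ["0022", "0051"]))
print(replace_digit_groups("22Area51", ["0022"]))
```

Whether leftover groups should be kept, cycled, or treated as an error depends on the use case; keeping them is just one choice.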
import re
string = '22Area51'
transform_array = ['0022', '0051']
new_string = string
nums = re.findall(r'\d+', string)
for num in nums:
    for el in transform_array:
        if num in el:
            new_string = new_string.replace(num, el)
print(new_string) #0022Area0051

Add ": " for every nth character in a list

Say I have this:
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
How can I get this output:
["hello-543: hello-454: hello-765", "hello-745: hello-635: hello-321"]
You can split each string based on substring length with a list comprehension using range where the step value is the number of characters each substring should contain. Then use join to convert each list back to a string with the desired separator characters.
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
n = 9
result = [': '.join([s[i:i+n] for i in range(0, len(s), n)]) for s in x]
print(result)
# ['hello-543: hello-454: hello-765', 'hello-745: hello-635: hello-321']
Or with textwrap.wrap:
from textwrap import wrap
x = ["hello-543hello-454hello-765", "hello-745hello-635hello-321"]
n = 9
result = [': '.join(wrap(s, n)) for s in x]
print(result)
# ['hello-543: hello-454: hello-765', 'hello-745: hello-635: hello-321']
If you are sure the length of every str is a multiple of your n, I would use re.findall for that task.
import re
txt1 = "hello-543hello-454hello-765"
txt2 = "hello-745hello-635hello-321"
out1 = ": ".join(re.findall(r'.{9}',txt1))
out2 = ": ".join(re.findall(r'.{9}',txt2))
print(out1) #hello-543: hello-454: hello-765
print(out2) #hello-745: hello-635: hello-321
.{9} in re.findall means any 9 characters excluding newline (\n), so this code works properly as long as your strs do not contain \n. If they might, pass re.DOTALL as the third argument of re.findall.
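To make the newline caveat concrete, here is a small sketch with a made-up string that has a \n inside the second chunk. Without re.DOTALL the match skips past the newline and the chunks drift out of alignment.

```python
import re

# Sample string invented for the example: 27 characters, newline in the middle chunk
txt = "hello-543hello\n454hello-765"

chunks_default = re.findall(r".{9}", txt)
chunks_dotall = re.findall(r".{9}", txt, re.DOTALL)

print(chunks_default)  # ['hello-543', '454hello-']
print(chunks_dotall)   # ['hello-543', 'hello\n454', 'hello-765']
```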

removing non-numeric characters from a string

strings = ["1 asdf 2", "25etrth", "2234342 awefiasd"] #and so on
Which is the easiest way to get [1, 25, 2234342]?
How can this be done without a regex module or expression like (^[0-9]+)?
One could write a helper function to extract the prefix:
def numeric_prefix(s):
    n = 0
    for c in s:
        if not c.isdigit():
            return n
        else:
            n = n * 10 + int(c)
    return n
Example usage:
>>> strings = ["1asdf", "25etrth", "2234342 awefiasd"]
>>> [numeric_prefix(s) for s in strings]
[1, 25, 2234342]
Note that this will produce correct output (zero) when the input string does not have a numeric prefix (as in the case of empty string).
Working from Mikel's solution, one could write a more concise definition of numeric_prefix:
import itertools
def numeric_prefix(s):
    n = ''.join(itertools.takewhile(lambda c: c.isdigit(), s))
    return int(n) if n else 0
new = []
for item in strings:
    new.append(int(''.join(i for i in item if i.isdigit())))
print(new)
[1, 25, 2234342]
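Keeping every digit in the string is not the same as keeping only the leading digits, and the difference shows up on input such as "1 asdf 2" from the question. A short sketch comparing the two, with illustrative variable names:

```python
from itertools import takewhile

strings = ["1 asdf 2", "25etrth", "2234342 awefiasd"]

# Every digit anywhere in the string is kept and concatenated
all_digits = [int("".join(c for c in s if c.isdigit())) for s in strings]

# Only the run of digits at the start is kept (0 if there is none)
leading_digits = [int("".join(takewhile(str.isdigit, s)) or 0) for s in strings]

print(all_digits)      # [12, 25, 2234342]
print(leading_digits)  # [1, 25, 2234342]
```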
Basic usage of regular expressions:
import re
strings = ["1asdf", "25etrth", "2234342 awefiasd"]
regex = re.compile(r'^(\d*)')
for s in strings:
    mo = regex.match(s)
    print(s, '->', mo.group(0))
1asdf -> 1
25etrth -> 25
2234342 awefiasd -> 2234342
Building on sahhhm's answer, you can fix the "1 asdf 1" problem by using takewhile.
from itertools import takewhile
def isdigit(char):
    return char.isdigit()
numbers = []
for string in strings:
    result = takewhile(isdigit, string)
    resultstr = ''.join(result)
    if resultstr:
        number = int(resultstr)
        if number:
            numbers.append(number)
So you only want the leading digits? And you want to avoid regexes? Probably there's something shorter but this is the obvious solution.
nlist = []
for s in strings:
    # skip empty strings and strings that do not start with a digit
    if not s or not s[0].isdigit(): continue
    for i, c in enumerate(s):
        if not c.isdigit():
            nlist.append(int(s[:i]))
            break
    else:
        nlist.append(int(s))
