Python, context sensitive string substitution - python

Is it possible to do something like this in Python using regular expressions?
Increment every character that is a number in a string by 1
So input "123ab5" would become "234ab6"
I know I could iterate over the string and manually increment each character if it's a number, but this seems unpythonic.
note. This is not homework. I've simplified my problem down to a level that sounds like a homework exercise.

a = "123ab5"
b = ''.join(map(lambda x: str(int(x) + 1) if x.isdigit() else x, a))
or:
b = ''.join(str(int(x) + 1) if x.isdigit() else x for x in a)
or:
# Python 3: str.maketrans (on Python 2, use string.maketrans after "import string")
b = a.translate(str.maketrans('0123456789', '1234567890'))
In any of these cases:
# b == "234ab6"
EDIT - the first two map 9 to a 10, the last one wraps it to 0. To wrap the first two into zero, you will have to replace str(int(x) + 1) with str((int(x) + 1) % 10)
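Since the question asks about regular expressions specifically, here is a minimal sketch of that route. It relies on re.sub accepting a function as the replacement argument. The names increment_digits and wrap are illustrative and do not come from the answers above.

```python
import re

def increment_digits(text, wrap=False):
    """Add 1 to every ASCII digit in text; optionally wrap 9 around to 0."""
    def bump(match):
        value = int(match.group()) + 1
        # Without wrapping, '9' becomes the two-character string '10'.
        return str(value % 10 if wrap else value)
    return re.sub(r"[0-9]", bump, text)

print(increment_digits("123ab5"))          # 234ab6
print(increment_digits("129", wrap=True))  # 230
```

The callback receives one match object per digit, so non-digit characters are never touched.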

>>> test = '123ab5'
>>> def f(x):
...     try:
...         return str(int(x)+1)
...     except ValueError:
...         return x
...
>>> ''.join(map(f,test))
'234ab6'

>>> a = "123ab5"
>>> def foo(n):
...     try: n = int(n)+1
...     except ValueError: pass
...     return str(n)
...
>>> a = ''.join(map(foo, a))
>>> a
'234ab6'
By the way, whether it is written with a simple if or with try/except, eumiro's join + map solution is the most Pythonic one for me too.

Related

How to get portion of string from 2 different strings and concat?

I have two strings, a and b, with - as the delimiter. I want to build a third string from two parts: the substring of a up to the last % (one-two-three-%whatever% in the example below), and what is left of b after dropping as many dash-separated fields as there are dashes in that first part (4 in the example below, which leaves bar-bazz). This is what I have so far; is there a better way?
>>> a='one-two-three-%whatever%-foo-bar'
>>> b='1one-2two-3three-4four-bar-bazz'
>>> k="%".join(a.split('%')[:-1]) + '%-'
>>> k
'one-two-three-%whatever%-'
>>> k.count('-')
4
>>> y=b.split("-",k.count('-'))[-1]
>>> y
'bar-bazz'
>>> k+y
'one-two-three-%whatever%-bar-bazz'
>>>
An alternative approach using Regex:
import re
a = 'one-two-three-%whatever%-foo-bar'
b = '1one-2two-3three-4four-bar-bazz'
part1 = re.findall(r".*%-",a)[0] # one-two-three-%whatever%-
num = part1.count("-") # 4
part2 = re.findall(r"\w+",b) # ['1one', '2two', '3three', '4four', 'bar', 'bazz']
part2 = '-'.join(part2[num:]) # bar-bazz
print(part1+part2) # one-two-three-%whatever%-bar-bazz
For the first substring obtained from a, you can use rsplit():
k = a.rsplit('%', 1)[0] + '%-'
The rest look good to me
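Putting the rsplit() suggestion together with the split-with-maxsplit step from the question gives roughly the following. This is only a sketch: the function name merge_on_last_percent and the sep parameter are made up for illustration, and it assumes a contains at least one %.

```python
def merge_on_last_percent(a, b, sep="-"):
    """Keep a up to its last '%', then append the trailing fields of b."""
    # Everything up to and including the last '%', plus one separator.
    head = a.rsplit("%", 1)[0] + "%" + sep
    # Drop as many leading fields from b as there are separators in head.
    skip = head.count(sep)
    tail = b.split(sep, skip)[-1]
    return head + tail

a = 'one-two-three-%whatever%-foo-bar'
b = '1one-2two-3three-4four-bar-bazz'
print(merge_on_last_percent(a, b))  # one-two-three-%whatever%-bar-bazz
```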
Maybe a little shorter ?
a = 'one-two-three-%whatever%-foo-bar'
b = '1one-2two-3three-4four-bar-bazz'
def merge(a, b):
    res = a[:a.rfind('%') + 1] + '-'
    return res + '-'.join(b.split('-')[res.count('-'):])
print(merge(a, b) == 'one-two-three-%whatever%-bar-bazz')
I personally get nervous when I need to manually increment indexes or concatenate bare strings.
This answer is pretty similar to hingev's, just without the additional concat/addition operators.
t = "-"
ix = list(reversed(a)).index("%")
s = a[:-ix]  # everything in a up to and including the last %
t.join([s] + b.split(t)[len(s.split(t)):])
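Yet another possible answer:
import itertools

def custom_merge(a, b):
    result = []
    idx = 0  # 0 = take the field from a, 1 = take it from b
    for x in itertools.zip_longest(a.split('-'), b.split('-')):
        result.append(x[idx])
        # switch to b once the %...% field of a has been copied
        if x[0][0] == '%' == x[0][-1]:
            idx = 1
    return "-".join(result)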
Your question is specific enough that you might be optimizing the wrong thing (a smaller piece of a bigger problem). That being said, one way that feels easier to follow, and avoids some of the repeated linear traversals (splits and joins and counts) would be this:
def funky_percent_join(a, b):
    split_a = a.split('-')
    split_b = b.split('-')
    breakpoint = 0  # len(split_a) if all of a should be used on no %
    for neg, segment in enumerate(reversed(split_a)):
        if '%' in segment:
            breakpoint = len(split_a) - neg
            break
    return '-'.join(split_a[:breakpoint] + split_b[breakpoint:])
and then
>>> funky_percent_join('one-two-three-%whatever%-foo-bar', '1one-2two-3three-4four-bar-bazz')
'one-two-three-%whatever%-bar-bazz'
print(f"one-two-three-%{a.split('%')[1]}%")
That would work for the first part, and you could do the same for the second. When you're ready to concatenate, you can do:
part1 = f"one-two-three-%{a.split('%')[1]}%"
part2 = f"-{b.split('-')[-2]}-{b.split('-')[-1]}"
result = part1 + part2
This way it will pick up whatever you set a and b to, provided they follow the same format.
But then again, why not do something like:
result = a[:-8] + b[22:]

Exchanging characters in a string

I need to exchange the middle character in a numeric string of 15 numbers with the last number of the string.
I understand that this swaps the first and last characters:
def string(str):
    return str[-1:] + str[1:-1] + str[:1]
print(string('abcd'))
print(string('12345'))
RESULTS:
dbca
52341
But how can I make it so that, in the input string 012345678912345,
the middle 7 is exchanged with the last character of the string, 5?
Consider
def last_to_mid(s):
    if len(s) == 1:
        return s
    if len(s)%2 == 0:
        raise ValueError('expected string of odd length')
    idx = len(s)//2
    return f'{s[:idx]}{s[-1]}{s[idx+1:-1]}{s[idx]}'
operating like this:
>>> last_to_mid('021')
'012'
>>> last_to_mid('0123x4567')
'01237456x'
>>> last_to_mid('1')
'1'
Assuming you have Python 3.6 or newer for f-strings.
You can have a function for this:
In [178]: def swap_index_values(my_string):
     ...:     l = list(my_string)
     ...:     middleIndex = (len(l) - 1) // 2  # integer division, so it can be used as an index
     ...:     middle_val = l[middleIndex]
     ...:     l[middleIndex] = l[-1]
     ...:     l[-1] = middle_val
     ...:     return ''.join(l)
     ...:
In [179]: a
Out[179]: '012345678912345'
In [180]: swap_index_values(a)
Out[180]: '012345658912347'
Above, you can see that middle value and last values have been exchanged.
In this very specific context (always the middle and last character of a string of length 15), your initial approach can be extended to:
text[0:7]+text[-1]+text[8:-1]+text[7]
Also try to avoid variable names like str, since they shadow the function of the same name.
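For strings of any length, the same exchange can be sketched with a small helper that swaps two positions. The names swap_chars and swap_middle_and_last are hypothetical, and the middle is taken to be index len(text) // 2, which matches the 15-character case above.

```python
def swap_chars(text, i, j):
    """Return a copy of text with the characters at positions i and j exchanged."""
    chars = list(text)  # strings are immutable, so work on a list
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

def swap_middle_and_last(text):
    """Exchange the middle character with the last one."""
    if not text:
        return text
    return swap_chars(text, len(text) // 2, len(text) - 1)

print(swap_middle_and_last('012345678912345'))  # 012345658912347
```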
s1 = '1243125'
mid = len(s1) // 2
# put the last character in the middle and the old middle character at the end
s2 = s1[:mid] + s1[-1] + s1[mid + 1:-1] + s1[mid]
print(s2)
1245123

Printing alphabets advanced by n in Python

How can I write a Python program that takes some letters as input and prints each letter advanced by n? Example:
my_string = 'abc'
expected_output = 'cde' # n=2
One way I've thought is by using str.maketrans, and mapping the original input to (alphabets + n). Is there any other way?
PS: xyz should translate to abc
I've tried to write my own code as well for this, (apart from the infinitely better answers mentioned):
number = 2
prim = """abc! fgdf """
final = prim.lower()
for x in final:
    if(x =="y"):
        print("a", end="")
    elif(x=="z"):
        print("b", end="")
    else:
        conv = ord(x)
        x = conv+number
        print(chr(x),end="")
Any comments on how to not convert special chars? thanks
If you don't care about wrapping around, you can just do:
def shiftString(string, number):
    return "".join(map(lambda x: chr(ord(x)+number),string))
If you do want to wrap around (think Caesar cipher), you'll need to specify where the alphabet starts and how many symbols it contains:
def shiftString(string, number, start=97, num_of_symbols=26):
    # the upper bound is exclusive, so only start .. start+num_of_symbols-1 is shifted
    return "".join(map(lambda x: chr(((ord(x)+number-start) %
                                      num_of_symbols)+start)
                       if start <= ord(x) < start+num_of_symbols
                       else x, string))
That would, e.g., convert abcxyz, when given a shift of 2, into cdezab.
If you actually want to use it for "encryption", make sure to exclude non-alphabetic characters (like spaces etc.) from it.
edit: Shameless plug of my Vigenère tool in Python
edit2: Now only converts in its range.
How about something like
>>> my_string = "abc"
>>> n = 2
>>> "".join([ chr(ord(i) + n) for i in my_string])
'cde'
Note: As mentioned in the comments, the question is a bit vague about what to do when edge cases like xyz are encountered.
Edit: To take care of edge cases, you can write something like
>>> from string import ascii_lowercase
>>> lower = ascii_lowercase
>>> input = "xyz"
>>> "".join([ lower[(lower.index(i)+2)%26] for i in input ])
'zab'
>>> input = "abc"
>>> "".join([ lower[(lower.index(i)+2)%26] for i in input ])
'cde'
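The str.maketrans idea mentioned in the question can be sketched as follows. It builds a translation table once and leaves everything that is not a lowercase ASCII letter untouched. The function name shift_letters is an illustrative choice, not something from the answers.

```python
from string import ascii_lowercase

def shift_letters(text, n):
    """Advance each lowercase ASCII letter by n, wrapping from z back to a."""
    n %= 26
    # Rotate the alphabet by n and map each letter to its rotated counterpart.
    shifted = ascii_lowercase[n:] + ascii_lowercase[:n]
    table = str.maketrans(ascii_lowercase, shifted)
    # Characters missing from the table (spaces, punctuation) pass through unchanged.
    return text.translate(table)

print(shift_letters("abc", 2))  # cde
print(shift_letters("xyz", 2))  # zab
```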
I've made the following change to the code:
number = 2
prim = """Special() ops() chars!!"""
final = prim.lower()
for x in final:
    if(x =="y"):
        print("a", end="")
    elif(x=="z"):
        print("b", end="")
    elif (ord(x) in range(97, 123)):  # 97..122 are the letters a..z
        conv = ord(x)
        x = conv+number
        print(chr(x),end="")
    else:
        print(x, end="")
**Output**: urgekcn() qru() ejctu!!
test_data = (('abz', 2), ('abc', 3), ('aek', 26), ('abcd', 25))
# translate every character
def shiftstr(s, k):
    if not (isinstance(s, str) and isinstance(k, int) and k >=0):
        return s
    a = ord('a')
    return ''.join([chr(a+((ord(c)-a+k)%26)) for c in s])
for s, k in test_data:
    print(shiftstr(s, k))
print('----')
# translate at most 26 characters, rest look up dictionary at O(1)
def shiftstr(s, k):
    if not (isinstance(s, str) and isinstance(k, int) and k >=0):
        return s
    a = ord('a')
    d = {}
    l = []
    for c in s:
        v = d.get(c)
        if v is None:
            v = chr(a+((ord(c)-a+k)%26))
            d[c] = v
        l.append(v)
    return ''.join(l)
for s, k in test_data:
    print(shiftstr(s, k))
Testing shiftstr_test.py (above code):
$ python3 shiftstr_test.py
cdb
def
aek
zabc
----
cdb
def
aek
zabc
It covers wrapping.

python - string match only whole words

I have two lists - query and line. My code finds if a query such as:
["president" ,"publicly"]
Is contained in a line (order matters) such as:
["president" ,"publicly", "told"]
And this is the code I'm currently using:
if ' '.join(query) in ' '.join(line):
Problem is, I want to match whole words only. So the query below won't pass the condition statement:
["president" ,"pub"]
How can I do that?
Here is one way:
re.search(r'\b' + re.escape(' '.join(query)) + r'\b', ' '.join(line)) is not None
Just use the "in" operator:
mylist = ['foo', 'bar', 'baz']
'foo' in mylist -> returns True
'bar' in mylist -> returns True
'fo' in mylist -> returns False
'ba' in mylist -> returns False
You could use regexes and the \b word boundaries:
import re
# join the words with whitespace; \b at both ends forces whole-word matches
the_regex = re.compile(r'\b' + r'\s+'.join(map(re.escape, ['president', 'pub'])) + r'\b')
if the_regex.search(' '.join(line)):
    print('matching')
else:
    print('not matching')
As an alternative you can write a function to check if a given list is a sublist of the line. Something like:
def find_sublist(sub, lst):
    # returns the index where sub starts in lst, or -1 if it is not found
    if not sub:
        return 0
    cur_index = 0
    while cur_index < len(lst):
        try:
            cur_index = lst.index(sub[0], cur_index)
        except ValueError:
            return -1
        if lst[cur_index:cur_index + len(sub)] == sub:
            return cur_index
        cur_index += 1
    return -1
Which you can use as:
if find_sublist(query, line) >= 0:
    print('matching')
else:
    print('not matching')
Just for fun you can also do:
a = ["president" ,"publicly", "told"]
b = ["president" ,"publicly"]
c = ["president" ,"pub"]
d = ["publicly", "president"]
e = ["publicly", "told"]
# zip is the Python 3 equivalent of itertools.izip
not [l for l,n in zip(a, b) if l != n] ## True
not [l for l,n in zip(a, c) if l != n] ## False
not [l for l,n in zip(a, d) if l != n] ## False
## to support query in the middle of the line:
try:
    query_list = a[a.index(e[0]):]
    not [l for l,n in zip(query_list, e) if l != n] ## True
except ValueError:
    pass
You can use the issubset method to achieve this. Simply do:
a = ["president" ,"publicly"]
b = ["president" ,"publicly", "told"]
if set(a).issubset(b):
    pass  # every word of a occurs in b
This is true when every word in a also appears in b. Note that it ignores word order and adjacency.
You can use the built-in all function:
if all(word in b for word in a):
    pass  # all words are in the list
Note that this may not be efficient at run time for long lists. It is better to use a set instead of a list for b (the collection of words to search in).
Here is a non-regex way of doing it. I'm sure regex would be much faster than this:
>>> query = ['president', 'publicly']
>>> line = ['president', 'publicly', 'told']
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
True
>>> query = ["president" ,"pub"]
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
False
Explicit is better than implicit. And as ordering matters, I would write it down like this:
query = ['president','publicly']
query_false = ['president','pub']
line = ['president','publicly','told']
query_len = len(query)
blocks = [line[i:i+query_len] for i in range(len(line)-query_len+1)]
blocks holds all relevant combinations to check for:
[['president', 'publicly'], ['publicly', 'told']]
Now you can simply check if your query is in that list:
print(query in blocks) # -> True
print(query_false in blocks) # -> False
The code works the way you would probably explain the straightforward solution in words, which is usually a good sign to me. If you have long lines and performance becomes a problem, you can replace the generated list with a generator.
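A minimal sketch of that generator variant, assuming Python 3. The function name contains_phrase is made up for illustration. The windows are produced lazily, so the search stops at the first match instead of building every block up front.

```python
def contains_phrase(query, line):
    """Return True if the words of query appear consecutively, in order, in line."""
    n = len(query)
    # Generator expression: each window is only built when it is compared.
    windows = (line[i:i + n] for i in range(len(line) - n + 1))
    return any(window == query for window in windows)

line = ['president', 'publicly', 'told']
print(contains_phrase(['president', 'publicly'], line))  # True
print(contains_phrase(['president', 'pub'], line))       # False
```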

Find the index of the first digit in a string

I have a string like
"xdtwkeltjwlkejt7wthwk89lk"
how can I get the index of the first digit in the string?
Use re.search():
>>> import re
>>> s1 = "thishasadigit4here"
>>> m = re.search(r"\d", s1)
>>> if m:
...     print("Digit found at position", m.start())
... else:
...     print("No digit in that string")
...
Digit found at position 13
Here is a better and more flexible way, regex is overkill here.
s = 'xdtwkeltjwlkejt7wthwk89lk'
for i, c in enumerate(s):
    if c.isdigit():
        print(i)
        break
output:
15
To get all digits and their positions, a simple expression will do
>>> [(i, c) for i, c in enumerate('xdtwkeltjwlkejt7wthwk89lk') if c.isdigit()]
[(15, '7'), (21, '8'), (22, '9')]
Or you can create a dict of digit and its last position
>>> {c: i for i, c in enumerate('xdtwkeltjwlkejt7wthwk89lk') if c.isdigit()}
{'9': 22, '8': 21, '7': 15}
Thought I'd toss my method on the pile. I'll do just about anything to avoid regex.
sequence = 'xdtwkeltjwlkejt7wthwk89lk'
i = [x.isdigit() for x in sequence].index(True)
To explain what's going on here:
[x.isdigit() for x in sequence] translates the string into a list of booleans that says, for each character, whether it is a digit.
[...].index(True) returns the index of the first True in that list.
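One thing to keep in mind with this approach is that list.index raises ValueError when the value is absent, so a string without digits needs handling. A small sketch, where the function name first_digit_index and the choice of None as the "not found" value are assumptions:

```python
def first_digit_index(sequence):
    """Index of the first digit in sequence, or None if there is none."""
    # One boolean per character, e.g. 'a1' -> [False, True]
    flags = [ch.isdigit() for ch in sequence]
    try:
        return flags.index(True)
    except ValueError:
        # list.index raises ValueError when True does not occur
        return None

print(first_digit_index('xdtwkeltjwlkejt7wthwk89lk'))  # 15
print(first_digit_index('nodigits'))                   # None
```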
Seems like a good job for a parser:
>>> from simpleparse.parser import Parser
>>> s = 'xdtwkeltjwlkejt7wthwk89lk'
>>> grammar = """
... integer := [0-9]+
... <alpha> := -integer+
... all := (integer/alpha)+
... """
>>> parser = Parser(grammar, 'all')
>>> parser.parse(s)
(1, [('integer', 15, 16, None), ('integer', 21, 23, None)], 25)
>>> [ int(s[x[1]:x[2]]) for x in parser.parse(s)[1] ]
[7, 89]
import re
first_digit = re.search(r'\d', 'xdtwkeltjwlkejt7wthwk89lk')
if first_digit:
    print(first_digit.start())
To get all indexes do:
idxs = [i for i in range(0, len(string)) if string[i].isdigit()]
Then to get the first index do:
if len(idxs):
    print(idxs[0])
else:
    print('No digits exist')
As the other solutions say, to find the index of the first digit in the string we can use regular expressions:
>>> s = 'xdtwkeltjwlkejt7wthwk89lk'
>>> match = re.search(r'\d', s)
>>> print(match.start() if match else 'No digits found')
15
>>> s[15] # To show correctness
'7'
While simple, a regular expression match is going to be overkill for super-long strings. A more efficient way is to iterate through the string like this:
>>> for i, c in enumerate(s):
...     if c.isdigit():
...         print(i)
...         break
...
15
In case we wanted to extend the question to finding the first integer (not digit) and what it was:
>>> s = 'xdtwkeltjwlkejt711wthwk89lk'
>>> for i, c in enumerate(s):
...     if c.isdigit():
...         start = i
...         while i < len(s) and s[i].isdigit():
...             i += 1
...         print('Integer %d found at position %d' % (int(s[start:i]), start))
...         break
...
Integer 711 found at position 15
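For comparison, the same "first integer" extension can be sketched with a regular expression that matches a run of digits. The function name first_integer and its return shape (a value/position tuple, or None) are assumptions made for this example.

```python
import re

def first_integer(text):
    """Return (value, start_index) of the first run of ASCII digits, or None."""
    match = re.search(r"[0-9]+", text)
    if match is None:
        return None
    return int(match.group()), match.start()

print(first_integer('xdtwkeltjwlkejt711wthwk89lk'))  # (711, 15)
```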
In Python 3.8+ you can use re.search to look for the first \d (for digit) character class like this:
import re
my_string = "xdtwkeltjwlkejt7wthwk89lk"
if first_digit := re.search(r"\d", my_string):
    print(first_digit.start())
I'm sure there are multiple solutions, but using regular expressions you can do this:
>>> import re
>>> match = re.search(r"\d", "xdtwkeltjwlkejt7wthwk89lk")
>>> match.start(0)
15
Here is another regex-less way, more in a functional style. This one finds the position of the first occurrence of each digit that exists in the string, then chooses the lowest. A regex is probably going to be more efficient, especially for longer strings (this makes at least 10 full passes through the string and up to 20).
haystack = "xdtwkeltjwlkejt7wthwk89lk"
digits = "0123456789"
found = [haystack.index(dig) for dig in digits if dig in haystack]
firstdig = min(found) if found else None
You can use a regular expression:
import re
y = "xdtwkeltjwlkejt7wthwk89lk"
s = re.search(r"\d", y).start()
def first_digit_index(iterable):
    try:
        return next(i for i, d in enumerate(iterable) if d.isdigit())
    except StopIteration:
        return -1
This does not use regex and will stop iterating as soon as the first digit is found.
import re
result = " Total files:................... 90"
match = re.match(r".*[^\d](\d+)$", result)
if match:
    print(match.group(1))
will output
90
instr = 'nkfnkjbvhbef0njhb h2konoon8ll'
numidx = next((i for i, s in enumerate(instr) if s.isdigit()), None)
print(numidx)
Output:
12
numidx will be the index of the first occurrence of a digit in instr. If there are no digits in instr, numidx will be None.
I didn't see this solution here, and thought it should be.
