Replace space character between two numbers - python

I need to replace space with comma between two numbers
15.30 396.90 => 15.30,396.90
In PHP this is used:
'/(?<=\d)\s+(?=\d)/', ','
How to do it in Python?

There are several ways to do it (sorry, Zen of Python). Which one to use depends on your input:
>>> s = "15.30 396.90"
>>> ",".join(s.split())
'15.30,396.90'
>>> s.replace(" ", ",")
'15.30,396.90'
or, using re, for example, this way:
>>> import re
>>> re.sub(r"(\d+)\s+(\d+)", r"\1,\2", s)
'15.30,396.90'

You can use the same regex with the re module in Python:
import re
s = '15.30 396.90'
s = re.sub(r'(?<=\d)\s+(?=\d)', ',', s)
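The answers above agree on the sample input but diverge on others. The following is a minimal sketch of where they differ, assuming inputs that mix words with numbers or contain more than two numbers; the function names join_numbers and join_numbers_groups are made up for this illustration.

```python
import re

def join_numbers(text):
    """Replace whitespace with a comma only where a digit sits on both sides."""
    return re.sub(r'(?<=\d)\s+(?=\d)', ',', text)

def join_numbers_groups(text):
    """Capturing-group variant: each match consumes the digits around the gap."""
    return re.sub(r'(\d+)\s+(\d+)', r'\1,\2', text)

# Lookarounds leave whitespace next to non-digits alone; split/join does not.
print(join_numbers('temp 15.30 396.90 ok'))      # temp 15.30,396.90 ok
print(','.join('temp 15.30 396.90 ok'.split()))  # temp,15.30,396.90,ok

# With three numbers, the capturing version misses the second gap because
# the "2" was already consumed by the first match.
print(join_numbers('1 2 3'))         # 1,2,3
print(join_numbers_groups('1 2 3'))  # 1,2 3
```

The lookaround form is zero-width, so adjacent gaps can share a digit, which is probably why the PHP original used it.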

How to format a number with comma every four digits in Python?

I have a number 12345 and I want the result '1,2345'. I tried the following code, but failed:
>>> n = 12345
>>> f"{n:,}"
'12,345'
Regex will work for you:
import re
def format_number(n):
    return re.sub(r"(\d)(?=(\d{4})+(?!\d))", r"\1,", str(n))
>>> format_number(123)
'123'
>>> format_number(12345)
'1,2345'
>>> format_number(12345678)
'1234,5678'
>>> format_number(123456789)
'1,2345,6789'
Explanation:
Match:
(\d) Match a digit...
(?=(\d{4})+(?!\d)) ...that is followed by one or more groups of exactly 4 digits.
Replace:
\1, Replace the matched digit with itself and a ,
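The same lookahead idea can be parameterised on the group size. This is only a sketch: the name group_digits and its size and sep parameters are invented here, and it assumes integer input (digits after a decimal point would be grouped as well) and a separator with no backslashes.

```python
import re

def group_digits(n, size=4, sep=","):
    """Insert sep after every digit that is followed by a whole number of
    size-digit groups running to the end of the digit run."""
    pattern = r"(\d)(?=(?:\d{%d})+(?!\d))" % size
    return re.sub(pattern, r"\1" + sep, str(n))

print(group_digits(123456789))        # 1,2345,6789
print(group_digits(1234567, size=3))  # 1,234,567
print(group_digits(-12345))           # -1,2345
```

The (?!\d) at the end is what forces the groups to be counted from the right.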
Sounds like a locale thing(*). This prints 12,3456,7890:
import locale
n = 1234567890
locale._override_localeconv["thousands_sep"] = ","
locale._override_localeconv["grouping"] = [4, 0]
print(locale.format_string('%d', n, grouping=True))
That's a hackish way, I guess (it relies on the private locale._override_localeconv), based on this answer. The other answer there talks about using babel; maybe that's a clean way to achieve it.
(*) Quick googling found this talking about Chinese grouping four digits, and OP's name seems somewhat Chinese, so...
Using babel:
>>> from babel.numbers import format_decimal
>>> format_decimal(1234, format="#,####", locale="en")
'1234'
>>> format_decimal(12345, format="#,####", locale="en")
'1,2345'
>>> format_decimal(1234567890, format="#,####", locale="en")
'12,3456,7890'
This format syntax is specified in UNICODE LOCALE DATA MARKUP LANGUAGE (LDML). Some light bedtime reading there.
Using stdlib only (hackish):
>>> from textwrap import wrap
>>> n = 12345
>>> ",".join(wrap(str(n)[::-1], width=4))[::-1]
'1,2345'
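You can break your number into chunks of 10000 using modulus and integer division, then str.join them using ',' delimiters:
def commas(n):
    s = []
    while n > 0:
        # Peel off the lowest four digits; n keeps the remaining high part
        n, chunk = divmod(n, 10000)
        # Zero-pad every chunk except the most significant one
        s.append(str(chunk).zfill(4) if n else str(chunk))
    return ','.join(reversed(s)) or '0'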
>>> commas(123456789)
'1,2345,6789'
>>> commas(123)
'123'

python regex find characters from and end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
From the string above I need to fetch Rev1223, so I was wondering whether a regular expression can do that. I would like to do something like string.search("Rev" up to the next /).
So far I have split it using split:
s1, s2, s3, s4, s5 = string.split("/", 4)
You don't need a regex to do this. It is as simple as:
s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'  # avoid shadowing the built-in str
s.split('/')[-2]
Here is a quick python example
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile(r'.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'
Find second part from the end using regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search(r'/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'
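Both regex answers call .groups() on the match directly, which raises AttributeError when the path has no Rev directory. A small sketch that guards against that, assuming the revision is always the second-to-last path component (find_revision is a made-up helper name):

```python
import re

def find_revision(path):
    """Return the 'RevNNN' directory just before the file name, or None."""
    m = re.search(r'/(Rev\d+)/[^/]+$', path)
    return m.group(1) if m else None

print(find_revision('svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'))  # Rev1223
print(find_revision('no/revision/here.tgz'))  # None
```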

Is there a single function for matching and replacing?

I wonder if there is a simpler alternative (e.g. a single function call) to the following example of matching and replacing:
>>> import re
>>>
>>> line = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
>>>
>>> match = re.match(r'^file://(.*)$', line)
>>> if match and match.group(1):
...     substitution = re.sub(r'%20', r' ', match.group(1))
...
>>> substitution
'/windows-d/academic discipline/study objects/areas/formal systems/math'
Thanks.
I'm going to dodge your regex question and suggest you use something else for this:
>>> line = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
>>> from urllib.parse import unquote  # Python 3; in Python 2 this was urllib.unquote
>>> unquote(line)
'file:///windows-d/academic discipline/study objects/areas/formal systems/math'
Then just strip off the file:// with a slice or str.replace if necessary.
%20 (space) is not the only escaped character possible here, so it's better to use the right tool for the job than have your regex solution break later when there is another character needing un-escaping.
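A sketch of that suggestion in Python 3, combining the unquoting with the removal of the file:// prefix. It uses urlparse instead of a slice, which is a swapped-in choice rather than what the answer spelled out, and file_url_to_path is a hypothetical helper name.

```python
from urllib.parse import unquote, urlparse

def file_url_to_path(url):
    """Turn a file:// URL into a plain path, decoding every %XX escape."""
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        raise ValueError('not a file URL: %r' % url)
    return unquote(parsed.path)

line = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
print(file_url_to_path(line))
# /windows-d/academic discipline/study objects/areas/formal systems/math
```

Unlike the %20-only replacements, this also decodes escapes such as %2B.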
You could try the below simple python code,
>>> import re
>>> line = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
>>> m = re.sub(r'%20|file://', r' ', line).strip()
>>> m
'/windows-d/academic discipline/study objects/areas/formal systems/math'
re.sub(r'%20|file://', r' ', line) replaces every occurrence of %20 or file:// with a space, and strip() then removes the leading and trailing spaces.
>>> import re
>>> s = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
>>> re.sub(r'^file://(.*)$', lambda m: m.group(1).replace('%20', ' '), s)
'/windows-d/academic discipline/study objects/areas/formal systems/math'
>>> s = 'file:///windows-d/academic%20discipline/study%20objects/areas/formal%20systems/math'
>>> s.replace('file://', '').replace('%20', ' ')
'/windows-d/academic discipline/study objects/areas/formal systems/math'

How to match sequential string in text using regular expression?

I am using python's re module to match sequential string in text, for example:
s = 'habcabcabcj', I try the following code:
import re
re.findall(r'(abc)+', s)
And the result is: ["abc"]
If I want the match result to be ["abcabcabc"], how can I do this?
Use a non-capturing group (?:...):
>>> import re
>>> s = 'habcabcabcj'
>>> re.findall(r'(?:abc)+', s)
['abcabcabc']
>>>
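The reason this works is that re.findall returns the contents of the capturing group whenever the pattern has one, and a repeated group only keeps its last repetition. A short sketch of the three behaviours, including re.finditer as an alternative when the capturing group has to stay:

```python
import re

s = 'habcabcabcj'

# One capturing group: findall returns the group, i.e. its last repetition.
print(re.findall(r'(abc)+', s))    # ['abc']

# No capturing group: findall returns the whole match.
print(re.findall(r'(?:abc)+', s))  # ['abcabcabc']

# finditer yields match objects, so group(0) is always the whole match.
print([m.group(0) for m in re.finditer(r'(abc)+', s)])  # ['abcabcabc']
```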

how to Ignore characters other than [a-z][A-Z]

How can I ignore characters other than [a-z][A-Z] in an input string in Python, and what will the string look like after applying the method?
Do I need to use regular expressions?
If you need to use a regex, use a negative character class ([^...]):
re.sub(r'[^a-zA-Z]', '', inputtext)
A negative character class matches anything not named in the class.
Demo:
>>> import re
>>> inputtext = 'The quick brown fox!'
>>> re.sub(r'[^a-zA-Z]', '', inputtext)
'Thequickbrownfox'
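But using str.translate() is way faster (this is the Python 2 form, where translate() accepts a string of characters to delete):
import string
# Code points of the ASCII letters we want to keep
ascii_letters = set(map(ord, string.ascii_letters))
# Every other byte value goes into the deletion string
non_letters = ''.join(chr(i) for i in range(256) if i not in ascii_letters)
inputtext.translate(None, non_letters)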
Using str.translate() is more than 10 times faster than a regular expression:
>>> import timeit, re
>>> from functools import partial
>>> ascii_only = partial(re.compile(r'[^a-zA-Z]').sub, '')
>>> timeit.timeit('f(t)', 'from __main__ import ascii_only as f, inputtext as t')
7.903045892715454
>>> timeit.timeit('t.translate(None, m)', 'from __main__ import inputtext as t, non_letters as m')
0.5990171432495117
Using Jakub's method is slower still:
>>> timeit.timeit("''.join(c for c in t if c not in l)", 'from __main__ import inputtext as t; import string; l = set(string.letters)')
9.960685968399048
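The str.translate() call above uses the Python 2 signature. In Python 3 the method takes only a translation table, so a rough equivalent might look like the sketch below. It assumes the input contains no characters above code point 255 (anything higher passes through untouched), and letters_only is an invented name. The timings quoted above were not measured with this version.

```python
import string

# Characters below code point 256 that are not ASCII letters.
non_letters = ''.join(chr(i) for i in range(256)
                      if chr(i) not in string.ascii_letters)

# The third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans('', '', non_letters)

def letters_only(text):
    """Drop everything except a-z and A-Z (for input within Latin-1)."""
    return text.translate(delete_table)

print(letters_only('The quick brown fox!'))  # Thequickbrownfox
```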
You can use regex:
re.compile(r'[^a-zA-Z]').sub('', your_string)
You could also manage without regular expressions (e.g., if you had regex phobia):
import string
# string.ascii_letters works in both Python 2 and 3 (string.letters is Python 2 only)
new_string = ''.join(c for c in old_string
                     if c in set(string.ascii_letters))
Although I would use regex, this example has additional educational value: set, comprehension and the string library. Note that set is not strictly needed here.
