How to split the string using python3 - python

How to split the string using regex
input :
result = '1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM'
I want to split this so that I can store the pieces in separate strings for further use, like the following.
Expected output:
a = 1,000.03AM, b = 2,97.23, c = 23,089.30, d = 1,903.23, e = 0.00, f = 34,928.99, g = 11,24.30AM
I have tried the following, but it gives the wrong output:
import re
print(re.findall(r'[0-9.]+|[^0-9.]', result))

You may extract the strings using
re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text)
Details
\d+ - 1+ digits
(?:,\d+)* - 0 or more repetitions of a comma and 1+ digits
(?:\.\d{2})? - an optional occurrence of a dot and 2 digits
[^,\d]* - any 0 or more chars other than a comma and digit.
Python demo:
import re
text = "1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM"
print( re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text) )
# => ['1,000.03AM', '2,97.23', '23,089.30', '1,903.23', '0.00', '34,928.99', '11,24.30AM']
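To store the pieces under separate names, as the question asks, the list returned by findall can be unpacked directly. This is a minimal sketch, assuming the input always yields exactly seven fields; the names a to g come from the question, and the dict variant is only an illustration:

```python
import re

text = "1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM"
parts = re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text)

# Tuple unpacking raises ValueError if the number of fields is not exactly 7.
a, b, c, d, e, f, g = parts

# Alternative: keep the fields in a dict keyed by letter.
# zip stops at the shorter input, so this also tolerates fewer fields.
fields = dict(zip("abcdefg", parts))
print(a, g, fields["e"])
```

The dict form may be easier to work with if the number of fields can vary.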

For your result you need the following regex:
re.findall(r"[\d,]+\.\d{2}(?:AM)?", result)
This produces the following:
['1,000.03AM', '2,97.23', '23,089.30', '1,903.23', '0.00', '34,928.99', '11,24.30AM']
Regex explanation:
[\d,] - match digits and comma
[\d,]+\.\d{2} - match the whole float value (with two digits after the dot)
(?:AM)? - match an optional AM in a non-capturing group; in the examples below I use (?=AM)? instead so that AM is not included in the result
If something other than AM can appear in that position, change (?:AM) to (?:AM|Other|...)
If you need to parse the values as floats, I have two suggestions for you. The first is to remove the commas:
list(map(lambda x: float(x.replace(",", "")), re.findall(r"[\d,]+\.\d{2}(?=AM)?", result)))
Result:
[1000.03, 297.23, 23089.3, 1903.23, 0.0, 34928.99, 1124.3]
Another variant is using locale:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
'en_US.UTF8'
>>> list(map(lambda x: locale.atof(x), re.findall(r"[\d,]+\.\d{2}(?=AM)?", result)))
[1000.03, 297.23, 23089.3, 1903.23, 0.0, 34928.99, 1124.3]

Provided the string length and the field positions always stay the same, the most efficient solution would be slicing:
a = result[0:10]
b = result[10:17]
c = result[17:26]
d = result[26:34]
e = result[34:38]
f = result[38:47]
g = result[47:57]
Hope this helps.
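If the fixed-width assumption holds, the slice boundaries do not have to be written out by hand. A rough sketch that derives them from a list of field widths; the widths list is hypothetical and was simply measured from the sample string:

```python
from itertools import accumulate

result = '1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM'

# Hypothetical field widths, measured from the sample string above.
widths = [10, 7, 9, 8, 4, 9, 10]

# Running totals give the slice boundaries: 0, 10, 17, 26, ...
bounds = [0, *accumulate(widths)]
fields = [result[start:end] for start, end in zip(bounds, bounds[1:])]
print(fields)
```

This only works while every field keeps exactly the same width; for variable-width input a regex is the safer choice.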

Related

Replace string which has dynamic character in python

I am trying to replace part of a string with a regular expression, without success.
The strings are "LIVE_CUS2_PHLR182", "LIVE_CUS2ee_PHLR182" and "PHLR182 - testing recovery". I need to get PHLR182 as the output from every string. In the second string, "ee" is not constant: it can be any two letters or digits. Below is the code I have tried.
For the first and last strings I simply used the replace function, like below.
s = "LIVE_CUS2_PHLR182"
s.replace("LIVE_CUS2_", ""), s.replace(" - testing recovery","")
>>> PHLR182
But for the second string I tried the following.
1. s= "LIVE_CUS2ee_PHLR182"
s.replace(r'LIVE_CUS2(\w+)*_','')
2. batRegex = re.compile(r'LIVE_CUS2(\w+)*_PHLR182')
mo2 = batRegex.search('LIVE_CUS2dd_PHLR182')
mo2.group()
3. re.sub(r'LIVE_CUS2(?is)/s+_PHLR182', '', r)
In all cases I could not get "PHLR182" as the output. Please help me.
I think this is what you need:
import re
texts = """LIVE_CUS2_PHLR182
LIVE_CUS2ee_PHLR182
PHLR182 - testing recovery""".split('\n')
pat = re.compile(r'(LIVE_CUS2\w{,2}_| - testing recovery)')
# 1st alt pattern | 2nd alt pattern
# Look for 'LIVE_CUS2', then up to two word characters, then '_'
# ... or Look for ' - testing recovery'
results = [pat.sub('', text) for text in texts]
# replace the matched pattern with empty string
print(f'Original: {texts}')
print(f'Results: {results}')
Result:
Original: ['LIVE_CUS2_PHLR182', 'LIVE_CUS2ee_PHLR182', 'PHLR182 - testing recovery']
Results: ['PHLR182', 'PHLR182', 'PHLR182']
Python Demo: https://repl.it/repls/ViolentThirdAutomaticvectorization
Regex Demo: https://regex101.com/r/JiEVqn/2
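If the goal is only to pull out the identifier, searching for the identifier itself may be simpler than stripping everything around it. A sketch, assuming (the question does not state this) that the identifier always looks like PHLR followed by digits; extract_id is a hypothetical helper name:

```python
import re

# Assumed identifier shape: the literal 'PHLR' followed by one or more digits.
ID_PATTERN = re.compile(r'PHLR\d+')

def extract_id(text):
    """Return the first PHLR-style identifier in text, or None if absent."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None

texts = ["LIVE_CUS2_PHLR182", "LIVE_CUS2ee_PHLR182", "PHLR182 - testing recovery"]
print([extract_id(t) for t in texts])
```

Unlike the substitution approach, this does not depend on knowing every possible prefix or suffix in advance.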

How to match 2 digits in python with string?

I have the following options, but all of them return the full string. I need to remove the date at the beginning with a regex.
d = re.match('^\d{2}.\d{2}.\d{4}(.*?)$', '01.10.2018Any text..')
d = re.match('^[0-9]{2}.[0-9]{2}.[0-9]{4}(.*?)$', '01.10.2018Any text..')
How can I do that? I am using Python 3.6.
You could use sub to match the date-like pattern ^\d{2}\.\d{2}\.\d{4} at the start of the string and replace it with an empty string (note that this does not validate the date).
And as @UnbearableLightness mentioned, you have to escape the dot as \. if you want to match it literally.
import re
result = re.sub(r'^\d{2}\.\d{2}\.\d{4}', '', '01.10.2018Any text..')
print(result) # Any text..
Grab the first group of the match:
>>> d = re.match(r'^\d{2}\.\d{2}\.\d{4}(.*?)$', '01.10.2018Any text..').group(1)
>>> print(d)
Any text..
If you are not sure whether there will be a match, check for it first:
>>> s = '01.10.2018Any text..'
>>> match = re.match(r'^\d{2}\.\d{2}\.\d{4}(.*?)$', s)
>>> d = match.group(1) if match else s
>>> print(d)
Any text..
Use groups to separate the date part from the rest:
d = re.match(r'^(\d{2}\.\d{2}\.\d{4})(.*?)$', '01.10.2018Any text..')
if d:
    print(d.group(1))
    print(d.group(2))
Group 0 is the whole match. I added a pair of parentheses in the regex around the date; this is group 1. Group 2 is the text after the date, which is what you're after.
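Since the pattern only checks the shape of the date, a string such as 31.02.2018 would be stripped as well. A minimal sketch that also validates the date with datetime.strptime, assuming day.month.year order (the question does not say which order is used); strip_leading_date is a hypothetical helper name:

```python
import re
from datetime import datetime

def strip_leading_date(text):
    """Split 'DD.MM.YYYY<rest>' into (date, rest); return (None, text) if no valid date."""
    match = re.match(r'(\d{2}\.\d{2}\.\d{4})(.*)', text, re.DOTALL)
    if not match:
        return None, text
    try:
        # Assumed order: day, month, year.
        parsed = datetime.strptime(match.group(1), '%d.%m.%Y').date()
    except ValueError:
        # Looks like a date but is not a real one, e.g. 31.02.2018.
        return None, text
    return parsed, match.group(2)

print(strip_leading_date('01.10.2018Any text..'))
print(strip_leading_date('31.02.2018Any text..'))
```

Returning the original text unchanged when validation fails is one possible design; raising an exception would be equally reasonable.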

get all occurence of a regex in string python

From the string TreeModel/Node/Node[1]/Node[4]/Node[1] I am trying to extract the following:
TreeModel/Node
TreeModel/Node/Node[1]
TreeModel/Node/Node[1]/Node[4]
TreeModel/Node/Node[1]/Node[4]/Node[1]
I want to do this with a regular expression in Python. Here is the code I tried:
import re
string = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
pattern = r'.+?Node\[[1-9]\]'
print(re.findall(pattern=pattern, string=string))
#result : ['TreeModel/Node/Node[1]', '/Node[4]', '/Node[1]']
#expected result : ['TreeModel/Node', 'TreeModel/Node/Node[1]', 'TreeModel/Node/Node[1]/Node[4]', 'TreeModel/Node/Node[1]/Node[4]/Node[1]']
You can use split here:
>>> s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
>>> split_s = s.split('/')
>>> ['/'.join(split_s[:i]) for i in range(2, len(split_s)+1)]
['TreeModel/Node',
'TreeModel/Node/Node[1]',
'TreeModel/Node/Node[1]/Node[4]',
'TreeModel/Node/Node[1]/Node[4]/Node[1]']
You can also use regex:
import re
for i in range(2, s.count('/')+2):
    s_ = '[^/]+/*'
    regex = re.search(r'('+s_*i+')', s).group(0)
    print(regex)
This prints the following (note the trailing slash on all but the last line):
TreeModel/Node/
TreeModel/Node/Node[1]/
TreeModel/Node/Node[1]/Node[4]/
TreeModel/Node/Node[1]/Node[4]/Node[1]
I'm not good at Python at all, but for the regex part, given the specific structure of your string, the regex below matches each segment:
/?(?:{[^{}]*})?[^/]+
The braces and the preceding / are optional. It matches a slash (if any), then braces with their content (if any), then the rest up to the next slash.
Python code:
matches = re.findall(r'/?(?:{[^{}]*})?[^/]+', string)
output = ''
for i in range(len(matches)):
    output += matches[i]
    print(output)
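The prefix paths asked for in the question can also be built without a regex by accumulating the split segments. A small sketch using itertools.accumulate; dropping the first element is an assumption based on the expected result in the question, which starts at TreeModel/Node:

```python
from itertools import accumulate

s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'

# accumulate joins each new segment onto the running prefix.
prefixes = list(accumulate(s.split('/'), lambda path, part: path + '/' + part))

# The first prefix is just 'TreeModel', which the question does not want.
print(prefixes[1:])
```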

Need help extracting data from a file

I'm a newbie at python.
So my file has lines that look like this:
-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333
I need help coming up with the correct Python code to extract every float preceded by a colon and followed by a space (e.g. [-0.294118, 0.487437, ...]).
I've tried dataList = re.findall(':(.*) ', str(line)) and dataList = re.split(':(.*) ', str(line)), but these come up with the whole line. I've been researching this problem for a while now, so any help would be appreciated. Thanks!
Try this one:
:(-?\d\.\d+)\s
In your code that will be:
p = re.compile(r':(-?\d\.\d+)\s')
dataList = p.findall(str(line))
This is more specific about what you want.
In your case .* matches as much as it can.
Note that the last element is not captured because no whitespace follows it; if that is a problem, just remove the \s from the regex. Values without a decimal point, such as 5:-1, are not matched either.
This will do it:
import re
line = "-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333"
for match in re.finditer(r"(-?\d\.\d+)", line, re.DOTALL | re.MULTILINE):
    print(match.group(1))
Or, to get only the first match:
match = re.search(r"(-?\d\.\d+)", line, re.DOTALL | re.MULTILINE)
if match:
    datalist = match.group(1)
else:
    datalist = ""
Output of the finditer loop:
-0.294118
0.487437
0.180328
-0.292929
0.00149028
-0.53117
-0.0333333
Live Python Example:
http://ideone.com/DpiOBq
Regex Demo:
https://regex101.com/r/nR4wK9/3
Regex Explanation
(-?\d\.\d+)
Match the regex below and capture its match into backreference number 1 «(-?\d\.\d+)»
   Match the character “-” literally «-?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single character that is a “digit” (ASCII 0–9 only) «\d»
   Match the character “.” literally «\.»
   Match a single character that is a “digit” (ASCII 0–9 only) «\d+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Given:
>>> s='-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333.333'
With your particular data example, you can just grab the parts that would be part of a float with a regex:
>>> re.findall(r':([\d.-]+)', s)
['-0.294118', '0.487437', '0.180328', '-0.292929', '-1', '0.00149028', '-0.53117', '-0.0333.333']
You can also split and partition, which would be substantially faster:
>>> [e.partition(':')[2] for e in s.split() if ':' in e]
['-0.294118', '0.487437', '0.180328', '-0.292929', '-1', '0.00149028', '-0.53117', '-0.0333.333']
Then you can convert those to floats using try/except with map and filter; anything that fails to parse, such as '-0.0333.333', is dropped:
>>> def conv(s):
...     try:
...         return float(s)
...     except ValueError:
...         return None
...
>>> list(filter(None, map(conv, [e.partition(':')[2] for e in s.split() if ':' in e])))
[-0.294118, 0.487437, 0.180328, -0.292929, -1.0, 0.00149028, -0.53117]
A simple one-liner using a list comprehension:
line = "-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333"
[float(s.split()[0]) for s in line.split(':')[1:]]
Note: this is the simplest to understand (and probably the fastest), as no regex evaluation is involved. But it only works for the particular format above; for example, getting the second number out of a less cleanly formatted string would need more work than a one-liner.
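The line in the question resembles the sparse "label index:value" layout used by LIBSVM-style data files. A sketch under that assumption, keeping the index attached to each value; parse_line is a hypothetical helper name:

```python
def parse_line(line):
    """Split 'label idx:value idx:value ...' into (label, {idx: value})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, _, value = pair.partition(':')
        # int() and float() raise ValueError on malformed input.
        features[int(index)] = float(value)
    return float(label), features

line = "-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333"
label, features = parse_line(line)
print(label, list(features.values()))
```

Keeping the indices can matter if some lines omit features, since a plain list of values would then lose track of which column each value belongs to.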

Python - string replacement, re.match and re.sub in a single operation (replace a char if at end of a string)

I have a script where I need to replace some characters that could cause trouble with other ones.
I would like to reduce the number of operations required:
# Replace % at end of string
find_char = re.match(r'.+\%[a-zA-Z0-9]+', line)
if find_char:
    line = re.sub(r'\%', 'PCT', line)
Here I want to replace % but only if it is present at the end of a string. Can I do this in a single operation with re.sub?
Yes, of course, just specify that the match should be at the end of the string, using $:
>>> import re
>>> re.sub("%$", "o", "fo%")
'foo'
>>> re.sub("%$", "o", "f%o")
'f%o'
Use $ to match a character only at the end of the string; the separate re.match check is then not needed:
line = re.sub(r'%$', 'PCT', line)
I think you mean this. It replaces a % symbol at the end of each word (followed by a space or the end of the string) with PCT:
>>> import re
>>> m = re.sub(r'(?<=\S)%(?= |$)', r'PCT', 'foo%bar foo% bar%')
>>> m
'foo%bar fooPCT barPCT'
If you also want to replace a lone % symbol that is preceded and followed by a space, try this:
>>> m = re.sub(r'(?<=[\S\s])%(?= |$)', r'PCT', 'foo%bar % foo% bar%')
>>> m
'foo%bar PCT fooPCT barPCT'
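Or, to also handle a % at the very start of the string, use the third-party regex module, which supports variable-width lookbehind: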
>>> import regex
>>> m = regex.sub(r'(?<=^|[\S\s])%(?= |$)', r'PCT', '% foo%bar % foo% bar%')
>>> m
'PCT foo%bar PCT fooPCT barPCT'
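One detail worth knowing when the lines come from a file: in Python, $ also matches just before a trailing newline, whereas \Z matches only at the very end of the string. A short sketch of the difference, plus a regex-free alternative; the sample line is made up for illustration:

```python
import re

line = 'rate 5%\n'  # made-up sample with a trailing newline, as when reading a file

# '$' also matches just before a final newline, so the '%' is replaced here.
with_dollar = re.sub(r'%$', 'PCT', line)
print(repr(with_dollar))

# '\Z' matches only at the very end, so nothing changes while the newline is there.
with_z = re.sub(r'%\Z', 'PCT', line)
print(repr(with_z))

# Without regex: strip the newline, check the last character, and rebuild.
stripped = line.rstrip('\n')
if stripped.endswith('%'):
    stripped = stripped[:-1] + 'PCT'
print(repr(stripped))
```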
