How to use a variable in a Python regular expression

I would like to use a numeric variable as part of a regular expression.
What should I do if I want to use a variable in this part: (?P<hh>\d)
I want to output the lines that contain the input number.

Using string interpolation:
m = re.compile(r'\d{%d}:\d{%d}' % (var1, var2))
If the vars aren't already integers you may need to convert types like so:
m = re.compile(r'\d{%d}:\d{%d}' % (int(var1), int(var2)))

Your question isn't clear.
If you want to capture some specific part of the regex, you have to create groups (using parentheses):
import re
import sys
hh = sys.argv[1]
m = re.compile(r'(?P<hh>\d):(\d{2})')
match = m.match(hh)
print(match.group(1))
print(match.group(2))
For example, if hh = '1:23', the above code will print:
1
23
Now, if what you need is to replace \d{2} with a variable, you can do:
variable = r'\d{2}'
m = re.compile(r'(?P<hh>\d):%s' % variable)
or if you just want to replace the 2, you can do:
variable = '2'
m = re.compile(r'(?P<hh>\d):\d{%s}' % variable)
Another option could be using:
r'(?P<hh>\d):{0}'.format(variable)
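One caveat with str.format: braces are its own placeholder syntax, so a regex quantifier such as \d{2} has to be written with doubled braces when only the number is substituted. A minimal sketch, assuming a hypothetical helper name build_time_pattern and an added group name mm that are not part of the answer above:

```python
import re

def build_time_pattern(minute_digits):
    # Hypothetical helper: '{{' and '}}' become literal braces,
    # '{0}' is the placeholder filled in by str.format.
    return re.compile(r'(?P<hh>\d):(?P<mm>\d{{{0}}})'.format(minute_digits))

pattern = build_time_pattern(2)
match = pattern.match('1:23')
print(pattern.pattern)    # (?P<hh>\d):(?P<mm>\d{2})
print(match.group('hh'))  # 1
print(match.group('mm'))  # 23
```

Named groups can be read back by name with match.group('hh'), which tends to be easier to follow than positional indexes.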

You can pass it in as a string (I'd escape it first):
m = re.compile(re.escape(hh) + r':\d{2}')
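To tie this back to the goal of printing only the lines that contain the input number, here is a small sketch. It assumes the lines hold times in an h:mm form; the function name lines_with_hour and the sample data are made up for illustration.

```python
import re

def lines_with_hour(lines, hh):
    # hh comes from user input, so escape it before embedding it in the pattern.
    # The \b anchors assume hh is numeric; they stop '1' from matching inside '11:23'.
    pattern = re.compile(r'\b' + re.escape(str(hh)) + r':\d{2}\b')
    return [line for line in lines if pattern.search(line)]

sample = ['start 1:23 ok', 'start 11:23 ok', 'start 2:05 ok']
print(lines_with_hour(sample, '1'))  # ['start 1:23 ok']
```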

Related

In python how to replace a character in a string

import re
n=input("Enter a String:")
# Replace the String based on the pattern
replacedText = re.sub('[%]+','$', n,1)
# Print the replaced string
print("Replaced Text:",replacedText)
The input I gave is:
ro%hi%
Output:
ro$hi%
I want to replace the second % in the string with an empty string (''). Is that possible? What changes should I make to my code?

Here is an arguably dumb solution, but it seems to do what you need. Would be good if you want to avoid regex and don't mind two function calls (i.e., performance in that sense isn't critical).
input_text = "ro%hi%"
output_text = input_text.replace("%", "$", 1).replace("%", "", 1)
print(output_text)
Terminal output:
$ python exp.py
ro$hi

Use the replace() function.
Specify how many occurrences to replace with the last argument.
x = "ro%hi%"
print(x.replace("%", "$", 1))

You can use .replace("%", "$", 1), which replaces the first % with $, and then apply another .replace("%", "", 1) to replace the second % with ''.
n=input("Enter a String:")
# Print the replaced string
# This will replace the first % with $ and replace second % with ''
print("Replaced Text:",n.replace('%','$',1).replace('%', '', 1))
# If there are only two % in your input and the second one is at the end, you can use
# print("Replaced Text:",n.replace('%','$',1).rstrip('%'))
Output:
ro$hi
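If you would rather stay with re.sub, a replacement function can pick a different replacement for each occurrence. This is only a sketch; the helper name replace_by_occurrence and its dict-based interface are assumptions for illustration, not something from the answers above.

```python
import re
from itertools import count

def replace_by_occurrence(text, target, replacements):
    # replacements maps a 1-based occurrence number to its replacement string;
    # occurrences that are not listed are left unchanged.
    counter = count(1)

    def pick(match):
        return replacements.get(next(counter), match.group(0))

    return re.sub(re.escape(target), pick, text)

print(replace_by_occurrence('ro%hi%', '%', {1: '$', 2: ''}))  # ro$hi
```

The chained str.replace calls are simpler for two occurrences; the callback version may be easier to extend when many occurrences need different treatment.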

You can just do
n.replace('%','')

Python replace different character

I have different strings like these :
"/table[1]/tr/td[2]/table[2]/tr/td[2]/p/b/text()"
"/table[1]/tr/td[2]/table[3]/tr/td[2]/p/b/text()"
I'd like to change the substring "/table[" + some number + "]" with "/table[" + the same number + "]/tbody".
For example this string
"/table[1]/tr/td[2]/table[2]/tr/td[2]/p/b/text()"
should change to
"/table[1]/tbody/tr/td[2]/table[2]/tbody/tr/td[2]/p/b/text()"

Use the symbolic group naming, this way:
>>> s
'/table[1]/tr/td[2]/table[2]/tr/td[2]/p/b/text()'
>>>
>>> re.sub(r'(?P<table>/table\[\d+\])', r'\g<table>/tbody', s)
'/table[1]/tbody/tr/td[2]/table[2]/tbody/tr/td[2]/p/b/text()'
>>>
>>> #similarly you can also reference by group number
>>> re.sub(r'(?P<table>/table\[\d+\])', r'\g<1>/tbody', s)
'/table[1]/tbody/tr/td[2]/table[2]/tbody/tr/td[2]/p/b/text()'
Quoting from Python Doc:
(?P<name>...)
Similar to regular parentheses, but the substring
matched by the group is accessible via the symbolic group name name.
Group names must be valid Python identifiers, and each group name must
be defined only once within a regular expression. A symbolic group is
also a numbered group, just as if the group were not named.
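As a variation on the same idea, \g<0> refers to the whole match, so the substitution can be written without any explicit group. The negative lookahead in the second pattern is an addition of mine, meant to keep the substitution from adding a second /tbody if it is run twice.

```python
import re

s = '/table[1]/tr/td[2]/table[2]/tr/td[2]/p/b/text()'

# \g<0> stands for the entire match, so no capturing group is needed.
result = re.sub(r'/table\[\d+\]', r'\g<0>/tbody', s)
print(result)

# Skip tables that are already followed by /tbody, so a second run changes nothing.
idempotent = re.compile(r'/table\[\d+\](?!/tbody)')
print(idempotent.sub(r'\g<0>/tbody', result) == result)  # True
```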

This is the solution:
import re
s = "/table[1]/tr/td[2]/table[2]/tr/td[2]/p/b/text()"
sl = s.split("/")
new_str = []
for n in sl:
    match = re.search(r'table\[(?P<num>\d+)\]$', n)
    if match is not None:
        # if you want to get the num:
        # num = match.group('num')
        new_str.append("{}/tbody".format(n))
    else:
        new_str.append(n)
print("/".join(new_str))

Regex Python / group quantifiers

I want to match a list of variables which look like directories, e.g.:
Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34
The length of the "subdirectories" is variable having an upper bound (above it's 9).
I want to group every subdirectory except the 1st one which I named "Same" above.
The best I could come up with is:
^(?:([^/]+)/){4,8}([^/]+)=(.*)
It already looks for 4-8 subdirectories but only groups the last one. Why's that?
Is there a better solution using group quantifiers?
Edit: Solved. Will use split() instead.
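On the "why": a quantified capturing group is still a single group, and each repetition overwrites what it captured, so only the last repetition survives. A short sketch of that behaviour next to the split-based alternative; the quantifier is relaxed to {2,8} here so the short sample line matches.

```python
import re

line = 'Same/Same2/Battery/Name=SomeString'

# The repeated group ([^/]+) keeps only its final repetition.
m = re.match(r'^(?:([^/]+)/){2,8}([^/]+)=(.*)', line)
print(m.groups())  # ('Battery', 'Name', 'SomeString')

# Splitting keeps every component.
path, _, value = line.partition('=')
parts = path.split('/')[1:]  # drop the leading 'Same'
print(parts, value)  # ['Same2', 'Battery', 'Name'] SomeString
```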

import re
regx = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')
for ss in ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
           'Same/Same2/Battery/Name=SomeString',
           'Same/Same2/Home/Land/Some/More/Stuff=0.34'):
    print(ss)
    print(regx.findall(ss))
    print()
Edit 1
Now that you have given more info on what you want to obtain ("Same/Same2/Battery/Name=SomeString becoming SAME2_BATTERY_NAME=SomeString"), better solutions can be proposed, either with a regex or with split() + replace():
import re
from os import sep
sep2 = r'\\' if sep == '\\' else '/'
pat = '^(?:.+?%s)(.+$)' % sep2
print('pat==%s\n' % pat)
ragx = re.compile(pat)
for ss in (r'Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Home\Land\Some\More\Stuff=0.34'):
    print(ss)
    print(ragx.match(ss).group(1).replace(sep, '_'))
    print(ss.split(sep, 1)[1].replace(sep, '_'))
    print()
result
pat==^(?:.+?\\)(.+$)
Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same\Same2\Battery\Name=SomeString
Same2_Battery_Name=SomeString
Same2_Battery_Name=SomeString
Same\Same2\Home\Land\Some\More\Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Edit 2
Re-reading your comment, I realized that I didn't take into account that you want to uppercase the part of each string that lies before the '=' sign but not the part after it.
Hence this new code, which shows 3 methods that meet this requirement. Choose whichever you prefer:
import re
from os import sep
sep2 = r'\\' if sep == '\\' else '/'
pot = '^(?:.+?%s)(.+?)=([^=]*$)' % sep2
print('pot==%s\n' % pot)
rogx = re.compile(pot)
pet = '^(?:.+?%s)(.+?(?==[^=]*$))' % sep2
print('pet==%s\n' % pet)
regx = re.compile(pet)
for ss in (r'Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Ocean\Atlantic\North=',
           r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh'):
    print(ss + '\n' + len(ss) * '-')
    print('rogx groups '.rjust(32), rogx.match(ss).groups())
    a, b = ss.split(sep, 1)[1].rsplit('=', 1)
    print('split split '.rjust(32), (a, b))
    print('split split join upper replace %s=%s' % (a.replace(sep, '_').upper(), b))
    print('regx split group '.rjust(32), regx.match(ss.split(sep, 1)[1]).group())
    print('regx split sub '.rjust(32),
          regx.sub(lambda x: x.group(1).replace(sep, '_').upper(), ss))
    print()
result, on a Windows platform
pot==^(?:.+?\\)(.+?)=([^=]*$)
pet==^(?:.+?\\)(.+?(?==[^=]*$))
Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123
----------------------------------------------
rogx groups ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split join upper replace SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
regx split group Same2\Foot\Ankle\Joint\Sensor\Value
regx split sub SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
Same\Same2\Battery\Name=SomeString
----------------------------------
rogx groups ('Same2\\Battery\\Name', 'SomeString')
split split ('Same2\\Battery\\Name', 'SomeString')
split split join upper replace SAME2_BATTERY_NAME=SomeString
regx split group Same2\Battery\Name
regx split sub SAME2_BATTERY_NAME=SomeString
Same\Same2\Ocean\Atlantic\North=
--------------------------------
rogx groups ('Same2\\Ocean\\Atlantic\\North', '')
split split ('Same2\\Ocean\\Atlantic\\North', '')
split split join upper replace SAME2_OCEAN_ATLANTIC_NORTH=
regx split group Same2\Ocean\Atlantic\North
regx split sub SAME2_OCEAN_ATLANTIC_NORTH=
Same\Same2\Maths\Addition\2+2=4\Simple=ohoh
-------------------------------------------
rogx groups ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split join upper replace SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
regx split group Same2\Maths\Addition\2+2=4\Simple
regx split sub SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh

I probably misunderstood what exactly you want to do, but here is how you would do it without regex:
for entry in list_of_vars:
    key, value = entry.split('=')
    key_components = key.split('/')
    if 4 <= len(key_components) <= 8:
        # here the actual work is done
        print("%s=%s" % ('_'.join(key_components[1:]).upper(), value))

Just use split?
>>> p='Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'
>>> p.split('/')
['Same', 'Same2', 'Foot', 'Ankle', 'Joint', 'Actuator', 'Sensor', 'Temperature', 'Value=4.123']
Also, if you want that key/val pair you can do something like this...
>>> s = p.split('/')
>>> s[-1].split('=')
['Value', '4.123']

A couple of variations on your theme. For one, I've always found regexen to be cryptic to the point of unmaintainable, so I wrote the pyparsing module. In my mind, I look at your code and think, "oh, it's a list of '/'-delimited strings, an '=' sign, and then some kind of rvalue." And that translates pretty directly into the pyparsing parser definition code. By adding a name here and there in the parser ("key" and "value", similar to named groups in regex), the output is pretty easily processed.
data="""\
Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34""".splitlines()
from pyparsing import Word, alphas, alphanums, nums, QuotedString, delimitedList
wd = Word(alphas, alphanums)
number = Word(nums+'+-', nums+'.').setParseAction(lambda t:float(t[0]))
rvalue = wd | number | QuotedString('"')
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')
for d in data:
    result = defn.parseString(d)
Second, I question your approach of defining all of those variable names. Creating variable names on the fly based on your data is a pretty well-recognized Code Smell (not necessarily bad, but you might really want to rethink this approach). I used a recursive defaultdict to create a navigable structure so that you can easily do operations like "find all the entries that are sub-elements of Same2" (in this case, "Foot", "Battery", and "Home"). This kind of work is more difficult when trying to sift through some collection of variable names as found in locals(); it seems to me you would end up re-parsing these names to reconstruct the key hierarchy.
from collections import defaultdict
class recursivedefaultdict(defaultdict):
    def __init__(self, attrFactory=int):
        self.default_factory = lambda : type(self)(attrFactory)
        self._attrFactory = attrFactory
    def __getattr__(self, attr):
        newval = self._attrFactory()
        setattr(self, attr, newval)
        return newval
table = recursivedefaultdict()
# parse each entry, and accumulate into hierarchical dict
for d in data:
    # use pyparsing parser, gives us key (list of names) and value
    result = defn.parseString(d)
    t = table
    for k in result.key[:-1]:
        t = t[k]
    t[result.key[-1]] = result.value
# recursive method to iterate over hierarchical dict
def showTable(t, indent=''):
    for k,v in t.items():
        if isinstance(v,dict):
            # branch: print the name, then recurse one level deeper
            print(indent+k)
            showTable(v, indent+' ')
        else:
            # leaf: print the name and its value on one line
            print(indent+k, v)
showTable(table)
Prints:
Same
 Same2
  Foot
   Ankle
    Joint
     Actuator
      Sensor
       Temperature
        Value 4.123
  Battery
   Name SomeString
  Home
   Land
    Some
     More
      Stuff 0.34
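For comparison, the same kind of nested structure can be built without pyparsing. The following is a sketch in Python 3 that swaps in plain split/partition for the parser and leaves values as strings; the names tree, parents and leaf are made up for illustration.

```python
from collections import defaultdict

def tree():
    # Each missing key creates another nested tree.
    return defaultdict(tree)

data = [
    'Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
    'Same/Same2/Battery/Name=SomeString',
    'Same/Same2/Home/Land/Some/More/Stuff=0.34',
]

table = tree()
for entry in data:
    path, _, value = entry.partition('=')
    *parents, leaf = path.split('/')
    node = table
    for name in parents:
        node = node[name]
    node[leaf] = value  # values stay strings in this sketch

print(sorted(table['Same']['Same2']))             # ['Battery', 'Foot', 'Home']
print(table['Same']['Same2']['Battery']['Name'])  # SomeString
```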
If you are really set on defining those variable names, then adding some helpful parse actions to pyparsing will reformat the parsed data at parse time, so that it's directly processable afterwards:
wd = Word(alphas, alphanums)
number = Word(nums+'+-', nums+'.').setParseAction(lambda t:float(t[0]))
rvaluewd = wd.copy().setParseAction(lambda t: '"%s"' % t[0])
rvalue = rvaluewd | number | QuotedString('"')
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')
def joinNamesWithAllCaps(tokens):
    tokens["key"] = '_'.join(map(str.upper, tokens.key))
defn.setParseAction(joinNamesWithAllCaps)
for d in data:
    result = defn.parseString(d)
    print(result.key, '=', result.value)
Prints:
SAME_SAME2_FOOT_ANKLE_JOINT_ACTUATOR_SENSOR_TEMPERATURE_VALUE = 4.123
SAME_SAME2_BATTERY_NAME = "SomeString"
SAME_SAME2_HOME_LAND_SOME_MORE_STUFF = 0.34
(Note that this also encloses your SomeString value in quotes, so that the resulting assignment statement is valid Python.)

parsing a line of text to get a specific number

I have a line of text in the form " some spaces variable = 7 = '0x07' some more data"
I want to parse it and get the number 7 from "some variable = 7". How can this be done in python?

I would use a simpler solution, avoiding regular expressions.
Split on '=' and get the value at the position you expect
text = 'some spaces variable = 7 = ...'
if '=' in text:
    chunks = text.split('=')
    assignedval = chunks[1].strip()  # second chunk, '7' once the spaces are stripped
    print('assigned value is', assignedval)
else:
    print('no assignment in line')

Use a regular expression.
Essentially, you create an expression that goes something like "variable = (\d+)", do a match, and then take the first group, which will give you the string 7. You can then convert it to an int.
Read the tutorial in the link above.
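A minimal sketch of that approach, assuming the name before the equals sign is literally "variable" as in the question; \s* is added to tolerate the spaces around '='.

```python
import re

text = " some spaces variable = 7 = '0x07' some more data"

# Capture the digits that follow "variable =".
match = re.search(r'variable\s*=\s*(\d+)', text)
value = int(match.group(1)) if match else None
print(value)  # 7
```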

Basic regex code snippet to find numbers in a string.
>>> import re
>>> input = " some spaces variable = 7 = '0x07' some more data"
>>> nums = re.findall("[0-9]*", input)
>>> nums = [i for i in nums if i] # remove empty strings
>>> nums
['7', '0', '07']
Check out the documentation and How-To on python.org.

Is it possible to use a back reference to specify the number of replications in a regular expression?

Is it possible to use a back reference to specify the number of replications in a regular expression?
foo= 'ADCKAL+2AG.+2AG.+2AG.+2AGGG+.G+3AGGa.'
The substrings that start with '+[0-9]' followed by '[A-z]{n}.' need to be replaced with simply '+' where the variable n is the digit from earlier in the substring. Can that n be back referenced? For example (doesn't work) '+([0-9])[A-z]{/1}.' is the pattern I want replaced with "+" (that last dot can be any character and represents a quality score) so that foo should come out to ADCKAL+++G.G+.
import re
foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG+.+G+3AGGa.'
indelpatt = re.compile('\+([0-9])')
while indelpatt.search(foo):
    indelsize=int(indelpatt.search(foo).group(1))
    new_regex = '\+%s[ACGTNacgtn]{%s}.' % (indelsize,indelsize)
    newpatt=re.compile(new_regex)
    foo = newpatt.sub("+", foo)
I'm probably missing an easier way to parse the string.

No, you cannot use back-references as quantifiers. A workaround is to construct a regular expression that can handle each of the cases in an alternation.
import re
foo = 'ADCKAL+2AG.+2AG.+2AG.+2AGGG^+.+G+3AGGa4.'
pattern = '|'.join(r'\+%s[ACGTNacgtn]{%s}.' % (i, i) for i in range(1, 10))
regex = re.compile(pattern)
foo = regex.sub("+", foo)
print(foo)
Result:
ADCKAL++++G^+.+G+4.
Note also that your code contains an error that causes it to enter an infinite loop on the input you gave.
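Another way to get the same result is to scan the string once and build the follow-up pattern from the digit that was just read. This is only a sketch of that alternative, not the method from the answer above; the function name collapse_indels is made up, and sizes are limited to 1-9 to mirror the alternation.

```python
import re

INDEL_START = re.compile(r'\+([1-9])')

def collapse_indels(seq):
    out = []
    pos = 0
    while True:
        m = INDEL_START.search(seq, pos)
        if m is None:
            out.append(seq[pos:])
            break
        n = int(m.group(1))
        # n bases followed by one arbitrary character (the quality score)
        body = re.compile(r'[ACGTNacgtn]{%d}.' % n).match(seq, m.end())
        if body is None:
            # Not a well-formed indel: keep the text as-is and move on.
            out.append(seq[pos:m.end()])
            pos = m.end()
        else:
            out.append(seq[pos:m.start()] + '+')
            pos = body.end()
    return ''.join(out)

print(collapse_indels('ADCKAL+2AG.+2AG.+2AG.+2AGGG^+.+G+3AGGa4.'))
# ADCKAL++++G^+.+G+4.
```

Because pos always advances, the loop terminates even when a '+digit' is not followed by a valid indel body.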
