Python equivalent of ruby's StringScanner?

Is there a Python class equivalent to Ruby's StringScanner class? I could hack something together, but I don't want to reinvent the wheel if this already exists.

Interestingly there's an undocumented Scanner class in the re module:
import re
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # whitespace is matched but produces no token
])
print(scanner.scan("sum = 3*foo + 312.50 + bar"))
Following the discussion it looks like it was left in as experimental code/a starting point for others.
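For reference, scan() returns a two-item tuple: the list of values produced by the callbacks, and whatever tail of the input could not be matched. Below is a minimal Python 3 sketch of the same lexicon. Because re.Scanner is undocumented, its behaviour may differ between versions; the unmatched-input case ("a ? b") is an added illustration, not part of the original answer.

```python
import re

# Same lexicon as above, with the callbacks written as lambdas.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s, tok: tok),
    (r"\d+\.\d*", lambda s, tok: float(tok)),
    (r"\d+", lambda s, tok: int(tok)),
    (r"=|\+|-|\*|/", lambda s, tok: "op%s" % tok),
    (r"\s+", None),  # matched but skipped
])

# scan() returns (list_of_tokens, unmatched_remainder).
tokens, remainder = scanner.scan("sum = 3*foo + 312.50 + bar")
print(tokens)
print(repr(remainder))

# Scanning stops at the first character no pattern can match.
partial_tokens, partial_rest = scanner.scan("a ? b")
print(partial_tokens, repr(partial_rest))
```

A non-empty remainder is the only signal that the input was not fully tokenized, so callers probably want to check it.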

There is nothing exactly like Ruby's StringScanner in Python. It is of course easy to put something together:
import re
class Scanner(object):
    def __init__(self, s):
        self.s = s
        self.offset = 0
    def eos(self):
        return self.offset == len(self.s)
    def scan(self, pattern, flags=0):
        # Accept either a pattern string or a compiled pattern
        # (on Python 2, test against basestring instead of str).
        if isinstance(pattern, str):
            pattern = re.compile(pattern, flags)
        match = pattern.match(self.s, self.offset)
        if match is not None:
            self.offset = match.end()
            return match.group(0)
        return None
Here is an example of using it interactively:
>>> s = Scanner("Hello there!")
>>> s.scan(r"\w+")
'Hello'
>>> s.scan(r"\s+")
' '
>>> s.scan(r"\w+")
'there'
>>> s.eos()
False
>>> s.scan(r".*")
'!'
>>> s.eos()
True
>>>
However, for the work I do I tend to just write those regular expressions in one go and use groups to extract the needed fields. Or for something more complicated I would write a one-off tokenizer or look to PyParsing or PLY to tokenize for me. I don't see myself using something like StringScanner.
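As a rough illustration of the "one regular expression with groups" approach mentioned above, here is a sketch that pulls the same three pieces out of "Hello there!" in a single match. The group names first, second and rest are made up for this example.

```python
import re

# One pattern, three named groups, instead of three successive scan() calls.
GREETING = re.compile(r"(?P<first>\w+)\s+(?P<second>\w+)(?P<rest>.*)")

match = GREETING.match("Hello there!")
fields = match.groupdict() if match else {}
print(fields)
```

This works well when the overall shape of the input is known up front; a scanner is more natural when the next pattern depends on what was just consumed.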

Looks like a variant on re.split( pattern, string ).
http://docs.python.org/library/re.html
http://docs.python.org/library/re.html#re.split

https://pypi.python.org/pypi/scanner/
This seems to be a more actively maintained and feature-complete solution, but it uses oniguruma directly.

Maybe look into the built in module tokenize. It looks like you can pass a string into it using the StringIO module.
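A small sketch of that idea in Python 3, where the StringIO class lives in the io module. Note that tokenize only understands Python's own lexical grammar, so this is a fit only when the input happens to look like Python source; python_tokens is a hypothetical helper name.

```python
import io
import tokenize

def python_tokens(source):
    """Return (token_type_name, text) pairs for names, numbers and operators."""
    readline = io.StringIO(source).readline
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]

result = python_tokens("sum = 3*foo + 312.50 + bar")
for kind, text in result:
    print(kind, text)
```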

Today there is a project by Mark Watkinson that implements StringScanner in Python:
http://asgaard.co.uk/p/Python-StringScanner
https://github.com/markwatkinson/python-string-scanner
http://code.google.com/p/python-string-scanner/

Are you looking for regular expressions in Python? Check this link from official docs:
http://docs.python.org/library/re.html

Related

the best way to parse and validate YAML configuration file

We have a project which stores its settings in YAML (the settings file is generated by Ansible scripts). We currently use pyyaml to parse the YAML and marshmallow to validate the settings. I'm pretty happy with storing settings in YAML, but I don't think marshmallow is the tool I need (schemas are hard to read, I do not need serialization for settings, and I want something like XSD). So what are the best practices for validating settings in a project? Maybe there is a language-independent way? (We are using Python 2.7.)
YAML settings:
successive:
  worker:
    cds_process_number: 0 # positive integer or zero
    spider_interval: 10 # positive integer
    run_worker_sh: /home/lmakeev/CDS/releases/master/scripts/run_worker.sh # OS path
    allow:
      - "*" # regular expression
    deny:
      - "^[A-Z]{3}_.+$" # regular expression

A schema description is a language of its own, with its own syntax and idiosyncrasies you have to learn. And you have to maintain its "programs" against which your YAML is verified if your requirements change.
If you are already working with YAML and are familiar with Python you can use YAML's tag facility to check objects at parse time.
Assuming you have a file input.yaml:
successive:
  worker:
    cds_process_number: !nonneg 0
    spider_interval: !pos 10
    run_worker_sh: !path /home/lmakeev/CDS/releases/master/scripts/run_worker.sh
    allow:
      - !regex "*"
    deny:
      - !regex "^[A-Z]{3}_.+$"
(your example file with the comments removed and tags inserted), you can create and register four classes that check the values using the following program¹:
import sys
import os
import re
import ruamel.yaml
import pathlib
class NonNeg:
    yaml_tag = u"!nonneg"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = int(node.value)  # this creates/returns an int
        assert val >= 0
        return val

class Pos(int):
    yaml_tag = u"!pos"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = cls(node.value)  # this creates/returns a Pos()
        assert val > 0
        return val

class Path:
    yaml_tag = u"!path"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = pathlib.Path(node.value)
        assert os.path.exists(val)
        return val

class Regex:
    yaml_tag = u"!regex"

    def __init__(self, val, comp):
        # store original string and compile() of that string
        self._val = val
        self._compiled = comp

    @classmethod
    def from_yaml(cls, constructor, node):
        val = str(node.value)
        try:
            comp = re.compile(val)
        except Exception as e:
            comp = None
            print("Incorrect regex", node.start_mark)
            print("  ", node.tag, node.value)
        return cls(val, comp)

yaml = ruamel.yaml.YAML(typ="safe")
yaml.register_class(NonNeg)
yaml.register_class(Pos)
yaml.register_class(Path)
yaml.register_class(Regex)
data = yaml.load(pathlib.Path('input.yaml'))
The actual checks in the individual from_yaml classmethods should be adapted to your needs (I had to remove the assert for the Path, as I don't have that file).
If you run the above you'll note that it prints:
Incorrect regex in "input.yaml", line 7, column 9
!regex *
because "*" is not a valid regular expression. Did you mean: ".*"?
¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. You can achieve the same results with PyYAML, e.g. by subclassing YAMLObject (which is unsafe by default, so make sure you correct that in your code).
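If tagging the YAML file is not an option, the same kinds of checks can also be run on the plain dictionary after parsing. The following is only a sketch of that alternative, not the tag-based approach described above: check_settings is a hypothetical helper, and the dictionary is written out by hand here instead of being loaded from a file.

```python
import re

def check_settings(worker):
    """Return a list of human-readable problems found in one worker mapping."""
    problems = []
    number = worker.get("cds_process_number")
    if not isinstance(number, int) or number < 0:
        problems.append("cds_process_number must be an integer >= 0")
    interval = worker.get("spider_interval")
    if not isinstance(interval, int) or interval <= 0:
        problems.append("spider_interval must be an integer > 0")
    for key in ("allow", "deny"):
        for pattern in worker.get(key, []):
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append("%s: bad regex %r (%s)" % (key, pattern, exc))
    return problems

# Stand-in for the parsed "worker" section of the settings file.
worker = {
    "cds_process_number": 0,
    "spider_interval": 10,
    "allow": ["*"],
    "deny": ["^[A-Z]{3}_.+$"],
}
problems = check_settings(worker)
for problem in problems:
    print(problem)
```

Collecting all problems in a list, instead of asserting on the first one, lets a single run report every invalid setting.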

How to extract functions used in a python code file?

I would like to create a list of all the functions used in a code file. For example if we have following code in a file named 'add_random.py'
import numpy as np
from numpy import linalg
def foo():
    print np.random.rand(4) + np.random.randn(4)
    print linalg.norm(np.random.rand(4))
I would like to extract the following list:
[numpy.random.rand, np.random.randn, np.linalg.norm, np.random.rand]
The list contains the functions used in the code with their actual name in the form of 'module.submodule.function'. Is there something built in python language that can help me do this?

You can extract all call expressions with:
import ast
class CallCollector(ast.NodeVisitor):
    def __init__(self):
        self.calls = []
        self.current = None
    def visit_Call(self, node):
        # new call, trace the function expression
        self.current = ''
        self.visit(node.func)
        self.calls.append(self.current)
        self.current = None
    def generic_visit(self, node):
        if self.current is not None:
            print "warning: {} node in function expression not supported".format(
                node.__class__.__name__)
        super(CallCollector, self).generic_visit(node)
    # record the func expression
    def visit_Name(self, node):
        if self.current is None:
            return
        self.current += node.id
    def visit_Attribute(self, node):
        if self.current is None:
            # not inside a call's function expression: just keep walking
            self.generic_visit(node)
            return
        self.visit(node.value)
        self.current += '.' + node.attr
Use this with an ast parse tree:
tree = ast.parse(yoursource)
cc = CallCollector()
cc.visit(tree)
print cc.calls
Demo:
>>> tree = ast.parse('''\
... def foo():
... print np.random.rand(4) + np.random.randn(4)
... print linalg.norm(np.random.rand(4))
... ''')
>>> cc = CallCollector()
>>> cc.visit(tree)
>>> cc.calls
['np.random.rand', 'np.random.randn', 'linalg.norm']
The above walker only handles names and attributes; if you need more complex expression support, you'll have to extend this.
Note that collecting names like this is not a trivial task. Any indirection would not be handled. You could build a dictionary in your code of functions to call and dynamically swap out function objects, and static analysis like the above won't be able to track it.
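As one possible extension, here is a Python 3 sketch that also descends into call arguments and expands import aliases, so that np.random.rand is reported as numpy.random.rand. QualifiedCalls is a hypothetical class name, and the same caveats about indirection apply. Under Python 3, print is itself a call, so it shows up in the result.

```python
import ast

class QualifiedCalls(ast.NodeVisitor):
    """Collect dotted call names, expanding aliases introduced by imports."""

    def __init__(self):
        self.aliases = {}
        self.calls = []

    def visit_Import(self, node):
        for alias in node.names:
            if alias.asname:
                # "import numpy as np" binds np -> numpy
                self.aliases[alias.asname] = alias.name
            else:
                # "import os.path" binds only the top-level name os
                top = alias.name.split(".")[0]
                self.aliases[top] = top

    def visit_ImportFrom(self, node):
        if node.module is None:  # "from . import x": nothing to expand
            return
        for alias in node.names:
            local = alias.asname or alias.name
            self.aliases[local] = node.module + "." + alias.name

    def visit_Call(self, node):
        parts = []
        func = node.func
        while isinstance(func, ast.Attribute):
            parts.append(func.attr)
            func = func.value
        if isinstance(func, ast.Name):
            head = self.aliases.get(func.id, func.id)
            self.calls.append(".".join([head] + parts[::-1]))
        self.generic_visit(node)  # also look inside the arguments

source = '''\
import numpy as np
from numpy import linalg
def foo():
    print(np.random.rand(4) + np.random.randn(4))
    print(linalg.norm(np.random.rand(4)))
'''
collector = QualifiedCalls()
collector.visit(ast.parse(source))
print(collector.calls)
```

The source is only parsed, never executed, so numpy does not need to be installed for this to run.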

In general, this problem is undecidable; consider for example getattr(random, "random")().
If you want static analysis, the best there is now is jedi.
If you accept dynamic solutions, then coverage is your best friend. It will show all functions that were executed, though, rather than only those directly referenced.
Finally, you can always roll your own dynamic instrumentation along the lines of:
import random
import logging
class Proxy(object):
    def __getattr__(self, name):
        logging.debug("tried to use random.%s", name)
        return getattr(_random, name)
_random = random
random = Proxy()

How to extend a class in python?

In python how can you extend a class? For example if I have
color.py
class Color:
    def __init__(self, color):
        self.color = color
    def getcolor(self):
        return self.color
color_extended.py
import Color
class Color:
    def getcolor(self):
        return self.color + " extended!"
But this doesn't work...
I expect that if I work in color_extended.py, then when I make a color object and use the getcolor function, it will return the color with the string " extended!" at the end. It should also have gotten the __init__ from the import.
Assume python 3.1

Use:
import color
class Color(color.Color):
    ...
If this were Python 2.x, you would also want to derive color.Color from object, to make it a new-style class:
class Color(object):
    ...
This is not necessary in Python 3.x.
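Applied to the question's example, a minimal Python 3 sketch could look like this. Both classes are placed in one file for brevity, and the subclass is named ExtendedColor here (an illustrative name) so it does not shadow the base class.

```python
class Color:
    def __init__(self, color):
        self.color = color

    def getcolor(self):
        return self.color


class ExtendedColor(Color):
    # __init__ is inherited unchanged; only getcolor is overridden.
    def getcolor(self):
        return super().getcolor() + " extended!"


shade = ExtendedColor("red")
print(shade.getcolor())
```

Calling super().getcolor() reuses the parent's behaviour, so the subclass keeps working if Color later changes how it stores the value.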

class MyParent:
    def sayHi(self):
        print('Mamma says hi')
from path.to.MyParent import MyParent
class ChildClass(MyParent):
    pass
An instance of ChildClass will then inherit the sayHi() method.

Another way to extend (specifically meaning, add new methods, not change existing ones) classes, even built-in ones, is to use a preprocessor that adds the ability to extend out of/above the scope of Python itself, converting the extension to normal Python syntax before Python actually gets to see it.
I've done this to extend Python 2's str() class, for instance. str() is a particularly interesting target because of the implicit linkage to quoted data such as 'this' and 'that'.
Here's some extending code, where the only added non-Python syntax is the extend:testDottedQuad bit:
extend:testDottedQuad
def testDottedQuad(strObject):
    if not isinstance(strObject, basestring): return False
    listStrings = strObject.split('.')
    if len(listStrings) != 4: return False
    for strNum in listStrings:
        try: val = int(strNum)
        except: return False
        if val < 0: return False
        if val > 255: return False
    return True
After which I can write in the code fed to the preprocessor:
if '192.168.1.100'.testDottedQuad():
    doSomething()
dq = '216.126.621.5'
if not dq.testDottedQuad():
    throwWarning()
dqt = ''.join(['127','.','0','.','0','.','1']).testDottedQuad()
if dqt:
    print 'well, that was fun'
The preprocessor eats that, spits out normal Python without monkeypatching, and Python does what I intended it to do.
Just as a C preprocessor adds functionality to C, so too can a Python preprocessor add functionality to Python.
My preprocessor implementation is too large for a Stack Overflow answer, but for those who might be interested, it is here on GitHub.

I use it like this:
class menssagem:
    propriedade1 = "Certo!"
    propriedade2 = "Erro!"
    def metodo1(self):
        print(self.propriedade1)
To extend it (assuming the class above lives in menssagem.py):
from menssagem import menssagem
class menssagem2(menssagem):
    menssagem1 = None  # not necessary
    def __init__(self, menssagem):
        self.menssagem1 = menssagem
    # call a method of the first class
    def Menssagem(self):
        self.menssagem1.metodo1()

Regex matching Python function calls

I'd like to create a regular expression in Python that will match against a line in Python source code and return a list of function calls.
The typical line would look like this:
something = a.b.method(time.time(), var=1) + q.y(x.m())
and the result should be:
["a.b.method()", "time.time()", "q.y()", "x.m()"]
I have two problems here:
creating the correct pattern
the catch groups are overlapping
thank you for help

I don't think regular expressions are the best approach here. Consider the ast module instead, for example:
import ast
class ParseCall(ast.NodeVisitor):
    def __init__(self):
        self.ls = []
    def visit_Attribute(self, node):
        ast.NodeVisitor.generic_visit(self, node)
        self.ls.append(node.attr)
    def visit_Name(self, node):
        self.ls.append(node.id)
class FindFuncs(ast.NodeVisitor):
    def visit_Call(self, node):
        p = ParseCall()
        p.visit(node.func)
        print(".".join(p.ls))
        ast.NodeVisitor.generic_visit(self, node)
code = 'something = a.b.method(foo() + xtime.time(), var=1) + q.y(x.m())'
tree = ast.parse(code)
FindFuncs().visit(tree)
Result:
a.b.method
foo
xtime.time
q.y
x.m

$ python3
>>> import re
>>> from itertools import chain
>>> def fun(s, r):
...     t = re.sub(r'\([^()]+\)', '()', s)
...     m = re.findall(r'[\w.]+\(\)', t)
...     t = re.sub(r'[\w.]+\(\)', '', t)
...     if m==r:
...         return
...     for i in chain(m, fun(t, m)):
...         yield i
...
>>> list(fun('something = a.b.method(time.time(), var=1) + q.y(x.m())', []))
['time.time()', 'x.m()', 'a.b.method()', 'q.y()']

/([.a-zA-Z]+)\(/g
should match the method names; you'd have to add the parens after since you have some nested.

I don't really know Python, but I can imagine that making this work properly involves some complications, eg:
strings
comments
expressions that return an object
But for your example, an expression like this works:
(?:\w+\.)+\w+\(
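To make the behaviour of that pattern concrete, here is a quick check against the line from the question. Note that it requires at least one dot, so a bare call such as foo( is not matched, and the trailing parenthesis is part of each match.

```python
import re

# Dotted name followed by an opening parenthesis.
DOTTED_CALL = re.compile(r"(?:\w+\.)+\w+\(")

line = "something = a.b.method(time.time(), var=1) + q.y(x.m())"
matches = DOTTED_CALL.findall(line)
print(matches)

# Turn "a.b.method(" into "a.b.method()" as requested in the question.
as_calls = [m + ")" for m in matches]
print(as_calls)
```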

I have an example for you proving this is doable in Python 3:
import re
def parse_func_with_params(inp):
    func_params_limiter = ","
    func_current_param = func_params_adder = r"\s*([a-z-A-Z]+)\s*"
    try:
        func_name = r"([a-z-A-Z]+)\s*"
        p = re.compile(func_name + r"\(" + func_current_param + r"\)")
        print(p.match(inp).groups())
    except AttributeError:
        # match() returned None: keep adding parameters until the pattern fits
        while 1:
            func_current_param += func_params_limiter + func_params_adder
            try:
                func_name = r"([a-z-A-Z]+)\s*"
                p = re.compile(func_name + r"\(" + func_current_param + r"\)")
                print(p.match(inp).groups())
                break
            except AttributeError:
                pass
Command line input: animalFunc(lion, tiger, giraffe, singe)
Output: ('animalFunc', 'lion', 'tiger', 'giraffe', 'singe')
As you can see, the function name is always the first item in the tuple and the rest are the names of the parameters passed.

how can I combine a switch-case and regex in Python

I want to process a string by matching it with a sequence of regular expression. As I'm trying to avoid nested if-then, I'm thinking of switch-case. How can I write the following structure in Python? Thank you
switch str:
    case match(regex1):
        # do something
    case match(regex2):
        # do sth else
I know Perl allows one to do that. Does Python?

First consider why there is no case statement in Python. So reset your brain and forget them.
You can use an object class, function decorators or use function dictionaries to achieve the same or better results.
Here is a quick trivial example:
#!/usr/bin/env python
import re
def hat(found):
    if found: print("found a hat")
    else: print("no hat")
def cat(found):
    if found: print("found a cat")
    else: print("no cat")
def dog(found):
    if found: print("found a dog")
    else: print("no dog")
st="""
Here is the target string
with a hat and a cat
no d o g
end
"""
patterns=['hat', 'cat', 'dog']
functions=[hat,cat,dog]
for pattern,case in zip(patterns,functions):
    print("pattern=", pattern)
    case(re.search(pattern,st))
C style case / switch statements also "fall through", such as:
switch(c) {
    case 'a':
    case 'b':
    case 'c': do_abc();
              break;
    ... other cases...
}
Using tuples and lists of callables, you can get the similar behavior:
st="rat kitten snake puppy bug child"
def proc1(st): print("cuddle the %s" % st)
def proc2(st): print("kill the %s" % st)
def proc3(st): print("pick-up the %s" % st)
def proc4(st): print("wear the %s" % st)
def proc5(st): print("dispose of the %s" % st)
def default(st): print("%s not found" % st)
dproc={ ('puppy','kitten','child'):
            [proc3, proc1],
        ('hat','gloves'):
            [proc3, proc4],
        ('rat','snake','bug'):
            [proc2, proc3, proc5]}
for patterns,cases in dproc.items():
    for pattern in patterns:
        if re.search(pattern,st):
            for case in cases: case(pattern)
        else: default(pattern)
        print()
This gets the order for the found item correct: 1) pick up child, cuddle the child; 2) kill the rat, pick up the rat... It would be difficult to do the same with a C switch statement in an understandable syntax.
There are many other ways to imitate a C switch statement. Here is one (for integers) using function decorators:
case = {}
def switch_on(*values):
    def case_func(f):
        case.update((v, f) for v in values)
        return f
    return case_func
@switch_on(0, 3, 5)
def case_a(): print("case A")
@switch_on(1,2,4)
def case_b(): print("case B")
def default(): print("default")
for i in (0,2,3,5,22):
    print("Case: %i" % i)
    try:
        case[i]()
    except KeyError:
        default()
To paraphrase Larry Wall, Tom Christiansen, Jon Orwant in Programming Perl regarding understanding context in Perl:
You will be miserable programming Python until you use the idioms that are native to the language...
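For the structure in the question (try each regex in order, run the first one that matches), the dispatch idea can be condensed into an ordered list of (pattern, handler) pairs. This is a sketch in Python 3; the names CASES, switch, on_number and on_word are invented for the example, and the handlers receive the match object.

```python
import re

def on_number(match):
    return "number %s" % match.group(0)

def on_word(match):
    return "word %s" % match.group(0)

# Ordered like the cases of a switch: the first matching pattern wins.
CASES = [
    (re.compile(r"\d+"), on_number),
    (re.compile(r"[A-Za-z]+"), on_word),
]

def switch(text, default=lambda text: "no match"):
    for pattern, handler in CASES:
        match = pattern.match(text)
        if match:
            return handler(match)
    return default(text)

for sample in ("42 apples", "apples", "!?"):
    print(sample, "->", switch(sample))
```

A list is used instead of a dictionary because the order of the cases matters when more than one pattern could match.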

A quick search shows a similar question asked earlier with multiple workarounds. My favorite solution from that one is by Mizard:
import re
class Re(object):
    def __init__(self):
        self.last_match = None
    def match(self,pattern,text):
        self.last_match = re.match(pattern,text)
        return self.last_match
    def search(self,pattern,text):
        self.last_match = re.search(pattern,text)
        return self.last_match
gre = Re()
if gre.match(r'foo',text):
    # do something with gre.last_match
elif gre.match(r'bar',text):
    # do something with gre.last_match
else:
    # do something else

You are looking for pyswitch (disclaimer: I am the author). With it, you can do the following, which is pretty close to the example you gave in your question:
from pyswitch import Switch
mySwitch = Switch()
@mySwitch.caseRegEx(regex1)
def doSomething(matchObj, *args, **kwargs):
    # Do Something
    return 1
@mySwitch.caseRegEx(regex2)
def doSomethingElse(matchObj, *args, **kwargs):
    # Do Something Else
    return 2
rval = mySwitch(stringYouWantToSwitchOn)
There's a much more comprehensive example given at the URL I linked. pyswitch is not restricted to just switching on regular expressions. Internally, pyswitch uses a dispatch system similar to the examples others have given above. I just got tired of having to re-write the same code framework over and over every time I needed that kind of dispatch system, so I wrote pyswitch.

Your question regarding Perl style switch statements is ambiguous. You reference Perl but you are using a C style switch statement in your example. (There is a deprecated module that provides C style switch statements in Perl, but this is not recommended...)
If you mean Perl given / when type switch statements, this would not be trivial to implement in Python. You would need to implement smart matching and other non-trivial Perl idioms. You might as well just write whatever in Perl?
If you mean C style switch statements, these are relatively trivial in comparison. Most recommend using a dictionary dispatch method, such as:
import re
def case_1():
    print("case 1")
    return 1
def case_2():
    print("case 2")
    return 2
def case_3():
    print("case 3")
    return 3
def default():
    print("None")
    return 0
dispatch = {
    'a': case_1,
    'g': case_2,
    'some_other': case_3,
    'default': default
}
text = "abcdefg"  # renamed from str to avoid shadowing the built-in
r = [dispatch[x]() if re.search(x, text) else dispatch['default']()
     for x in ['a', 'g', 'z']]
print("r=", r)

If you're avoiding if-then, you can build on something like this:
import re
# The patterns
r1 = "spam"
r2 = "eggs"
r3 = "fish"
def doSomething1():
    return "Matched spam."
def doSomething2():
    return "Matched eggs."
def doSomething3():
    return "Matched fish."
def default():
    return "No match."
def match(r, s):
    mo = re.match(r, s)
    try:
        return mo.group()
    except AttributeError:
        return None
def delegate(s):
    try:
        action = {
            match(r1, s): doSomething1,
            match(r2, s): doSomething2,
            match(r3, s): doSomething3,
        }[s]()
        return action
    except KeyError:
        return default()
Results
>>> delegate("CantBeFound")
0: 'No match.'
>>> delegate("spam")
1: 'Matched spam.'
>>> delegate("eggs")
2: 'Matched eggs.'
>>> delegate("fish")
3: 'Matched fish.'
