I am currently implementing an ORM that stores data defined in an XSD and handled through a DOM generated by PyXB.
Many of the elements contain sub-elements (and so forth), each of which has minOccurs=0 and may therefore resolve to None in the DOM.
Hence, when accessing an element hierarchy that contains optional elements, I face the question of whether to use:
from contextlib import suppress
with suppress(AttributeError):
    wanted_subelement = root.subelement.sub_subelement.wanted_subelement
or rather
if root.subelement is not None:
    if root.subelement.sub_subelement is not None:
        wanted_subelement = root.subelement.sub_subelement.wanted_subelement
While both styles work perfectly fine, which is preferable? (I am not Dutch, btw.)
This also works:
if root.subelement and root.subelement.sub_subelement:
    wanted_subelement = root.subelement.sub_subelement.wanted_subelement
The condition treats None as False, and an and-expression is evaluated from left to right with short-circuiting: if root.subelement is None, Python never tries to access root.subelement.sub_subelement.
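A minimal sketch comparing the two styles, assuming types.SimpleNamespace objects as stand-ins for the PyXB DOM (the get_via_* helper names are hypothetical). In both styles wanted_subelement stays unbound when a level is missing, so initialising a default first helps. Two trade-offs: suppress(AttributeError) also hides a misspelled attribute name, whereas the explicit check lets it raise; and the and-chain tests truthiness, so an optional element that happens to evaluate as false is skipped too.

```python
from contextlib import suppress
from types import SimpleNamespace


def get_via_suppress(root):
    wanted = None  # default when any optional level is missing
    with suppress(AttributeError):
        wanted = root.subelement.sub_subelement.wanted_subelement
    return wanted


def get_via_and_chain(root):
    wanted = None  # default when any optional level is missing
    # `and` stops at the first falsy operand, so None levels are never dereferenced
    if root.subelement and root.subelement.sub_subelement:
        wanted = root.subelement.sub_subelement.wanted_subelement
    return wanted


# Stand-ins: a DOM where every level is present, and one where a level is None
full = SimpleNamespace(
    subelement=SimpleNamespace(
        sub_subelement=SimpleNamespace(wanted_subelement="leaf")))
partial = SimpleNamespace(subelement=SimpleNamespace(sub_subelement=None))

print(get_via_suppress(full), get_via_and_chain(full))        # leaf leaf
print(get_via_suppress(partial), get_via_and_chain(partial))  # None None
```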
If you have quite a few such lookups to perform, it is better to wrap this up in a more generic lookup function:
# use a sentinel object distinct from None
# in case None is a valid value for an attribute
notfound = object()
# resolve a python attribute path
# - mostly, a `getattr` that supports
# arbitrary sub-attributes lookups
def resolve(element, path):
    for part in path.split("."):
        element = getattr(element, part, notfound)
        if element is notfound:
            break
    return element
# just to test the whole thing
class Element(object):
    def __init__(self, name, **attribs):
        self.name = name
        for k, v in attribs.items():
            setattr(self, k, v)
e = Element(
    "top",
    sub1=Element("sub1"),
    nested1=Element(
        "nested1",
        nested2=Element(
            "nested2",
            nested3=Element("nested3")
        )
    )
)
tests = [
    "notthere",
    "does.not.exists",
    "sub1",
    "sub1.sub2",
    "nested1",
    "nested1.nested2",
    "nested1.nested2.nested3"
]
for path in tests:
    sub = resolve(e, path)
    if sub is notfound:
        print("%s : not found" % path)
    else:
        print("%s : %s" % (path, sub.name))
I need to check every field of an object for empty values, and I'm tired of typing it out.
In this case, I have an object called signal with multiple fields, none of which should be empty.
if self.is_blank(signal.provider_id):
    error_response = "Signal rejected. No signal provider id given."
elif self.is_blank(signal.sequence_id):
    error_response = "Signal rejected. No signal sequence id provided."
....
def is_blank(self, string):
    """Checks for None and empty (or whitespace-only) values"""
    return not (string and string.strip())
Anyhow, what is the fastest way in Python to check all fields for emptiness? How do I loop over them?
You can use getattr to look the fields up by name:
def is_blank(self, field_names):
    for name in field_names:
        value = getattr(self, name)
        if not (value and value.strip()):
            return True, name
    return False, None
...
blank, name = self.is_blank(['provider_id', 'sequence_id', ...])
if blank:
    print(f'Signal rejected. No signal {name} provided.')
You can also implement is_blank with next:
def is_blank(self, field_names):
    return next(
        ((True, name)
         for name in field_names
         if not (getattr(self, name) and getattr(self, name).strip())),
        (False, None),
    )
This prints an error message for the first field that fails the check. All you need to do is provide the complete list of attributes to check.
As rostamn mentioned, you can convert your object into a dictionary, after which you can check all of its values in a single line and inspect the result like so:
any_empty = any(not value for value in your_obj.__dict__.values())
Change the condition in the loop to the type of empty check you need.
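Building on the __dict__ idea, a small sketch that collects the names of all blank fields, treating None and whitespace-only strings as blank. Signal here is a hypothetical stand-in for the asker's class, and vars(obj) returns the same dictionary as obj.__dict__. Non-string values count as blank only when they are None, which is an assumption you may want to change.

```python
def blank_fields(obj):
    """Return the names of attributes that are None or blank strings."""
    return [
        name
        for name, value in vars(obj).items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]


class Signal:
    # Hypothetical stand-in for the real signal object
    def __init__(self, provider_id, sequence_id):
        self.provider_id = provider_id
        self.sequence_id = sequence_id


signal = Signal(provider_id="p-1", sequence_id="   ")
missing = blank_fields(signal)
if missing:
    # prints: Signal rejected. No signal sequence_id provided.
    print(f"Signal rejected. No signal {missing[0]} provided.")
```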
To loop over all instance attributes, use my_instance.__dict__ (or, equivalently, vars(my_instance)).
See this answer for details: Explain __dict__ attribute
I have encountered a very strange behaviour while working with BeautifulSoup today.
Let's have a look at a very simple html snippet:
<html><body><ix:nonfraction>lele</ix:nonfraction></body></html>
I am trying to get the content of the <ix:nonfraction> tag with BeautifulSoup.
Everything works fine when using the find method:
from bs4 import BeautifulSoup
html = "<html><body><ix:nonfraction>lele</ix:nonfraction></body></html>"
soup = BeautifulSoup(html, 'lxml') # The parser used here does not matter
soup.find('ix:nonfraction')
>>> <ix:nonfraction>lele</ix:nonfraction>
However, when I use the find_all method, I expect to get a list containing this single element, which is not the case!
soup.find_all('ix:nonfraction')
>>> []
In fact, find_all seems to return an empty list every time the tag I am searching for contains a colon.
I have been able to reproduce the problem on two different computers.
Does anyone have an explanation, and more importantly, a workaround?
I need to use the find_all method because in my actual case I have to get all of these tags on a whole HTML page.
@yosemite_k's solution works because it avoids a particular branch in the bs4 source code that causes this behavior. In fact, many variations produce the same result, for example:
soup.find_all({"ix:nonfraction"})
soup.find_all('ix:nonfraction', limit=1)
soup.find_all('ix:nonfraction', text=True)
Below is a snippet from the source code of beautifulsoup that shows what happens when you call find or find_all. You will see that find just calls find_all with limit=1. In _find_all it checks for a condition:
if text is None and not limit and not attrs and not kwargs:
If it hits that condition, it may then make it down to this one:
# Optimization to find all tags with a given name.
if name.count(':') == 1:
If it makes it there, then it does a reassignment of name:
# This is a name with a prefix.
prefix, name = name.split(':', 1)
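This is where your behavior differs. With an HTML parser, the element is stored under its full name ix:nonfraction with no namespace prefix, so after the split the fast path compares element.name with nonfraction (and element.prefix with ix) and never matches. As long as find_all doesn't meet any of the prior conditions, the search goes through the general SoupStrainer matching instead, which compares against the full name, and you'll find the element.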
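To make the mismatch concrete, here is a stripped-down model of the name-only fast path from the source below, with no bs4 dependency. FakeTag is a hypothetical stand-in for bs4's Tag; the assumption is that an HTML parser stores the element as name ix:nonfraction with prefix None, while a namespace-aware parser would store name nonfraction with prefix ix. full_name_match is a simplified model of the general matching path, not bs4's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTag:
    name: str
    prefix: Optional[str] = None


def fast_path(tags, name):
    # Mirrors the "find all tags with a given name" optimization in _find_all
    if name.count(':') == 1:
        prefix, name = name.split(':', 1)
    else:
        prefix = None
    return [t for t in tags
            if t.name == name and (prefix is None or t.prefix == prefix)]


def full_name_match(tags, name):
    # Simplified model of the general path: compare against the full tag name
    return [t for t in tags if t.name == name]


html_tags = [FakeTag("ix:nonfraction")]          # no prefix recorded
ns_tags = [FakeTag("nonfraction", prefix="ix")]  # namespace-aware parse

print(fast_path(html_tags, "ix:nonfraction"))        # [] (the find_all symptom)
print(full_name_match(html_tags, "ix:nonfraction"))  # one match
print(fast_path(ns_tags, "ix:nonfraction"))          # one match
```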
Relevant source, from beautifulsoup4==4.6.0:
def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
findChild = find
def find_all(self, name=None, attrs={}, recursive=True, text=None,
             limit=None, **kwargs):
    """Extracts a list of Tag objects that match the given
    criteria. You can specify the name of the Tag and any
    attributes you want the Tag to have.
    The value of a key-value pair in the 'attrs' map can be a
    string, a list of strings, a regular expression object, or a
    callable that takes a string and returns whether or not the
    string matches for some custom definition of 'matches'. The
    same is true of the tag name."""
    generator = self.descendants
    if not recursive:
        generator = self.children
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
def _find_all(self, name, attrs, text, limit, generator, **kwargs):
    "Iterates over a generator looking for things that match."
    if text is None and 'string' in kwargs:
        text = kwargs['string']
        del kwargs['string']
    if isinstance(name, SoupStrainer):
        strainer = name
    else:
        strainer = SoupStrainer(name, attrs, text, **kwargs)
    if text is None and not limit and not attrs and not kwargs:
        if name is True or name is None:
            # Optimization to find all tags.
            result = (element for element in generator
                      if isinstance(element, Tag))
            return ResultSet(strainer, result)
        elif isinstance(name, str):
            # Optimization to find all tags with a given name.
            if name.count(':') == 1:
                # This is a name with a prefix.
                prefix, name = name.split(':', 1)
            else:
                prefix = None
            result = (element for element in generator
                      if isinstance(element, Tag)
                        and element.name == name
                        and (prefix is None or element.prefix == prefix)
            )
            return ResultSet(strainer, result)
    results = ResultSet(strainer)
    while True:
        try:
            i = next(generator)
        except StopIteration:
            break
        if i:
            found = strainer.search(i)
            if found:
                results.append(found)
                if limit and len(results) >= limit:
                    break
    return results
Wrapping the tag name in a set, instead of passing it as a plain string, works:
soup.find_all({"ix:nonfraction"})
works well
EDIT: 'ix:nonfraction' is not a tag name, so soup.find_all("ix:nonfraction") returned an empty list for a nonexistent tag.
First of all, my code works; I just feel I could make it more efficient. But how?
I have a function with optional arguments that have default values:
def read(fpath='C:', fname='text.txt'):
    ...
Later I call this function with different arguments depending on the case:
def get(index, path=None, name=None):
    if path is None:
        if name is None:
            elements = read()
        else:
            elements = read(fname=name)
    else:
        if name is None:
            elements = read(fpath=path)
        else:
            elements = read(fpath=path, fname=name)
How can I write this more concisely without losing clarity?
Thank you.
PS: This is my first question; if I missed a rule, please correct me. I'm learning.
You can change the definition of read to use the same argument names as get; then you can pass the arguments straight through.
def read(path=None, name=None):
    print(path)
    print(name)
    return 'something useful'
def get(index, **kwargs):
    elements = read(**kwargs)
get(0, path='a path', name='a name')
# a path
# a name
Instead of path=None, name=None you can of course provide actual default values in read's definition (unless they are mutable).
You could work with a dictionary and use keyword parameters:
def get(index, path=None, name=None):
    kwargs = {}
    if path is not None:
        kwargs['fpath'] = path
    if name is not None:
        kwargs['fname'] = name
    elements = read(**kwargs)
Or if you simply want to filter out Nones for all parameters:
def get(index, path=None, name=None):
    kwargs = dict(fpath=path, fname=name)
    elements = read(**{k: v for k, v in kwargs.items() if v is not None})
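A runnable sketch of the None-filtering variant, assuming read keeps the signature from the question (fpath='C:', fname='text.txt'); the read body here is a hypothetical stand-in that just returns its arguments so the forwarding can be checked.

```python
def read(fpath='C:', fname='text.txt'):
    # Hypothetical stand-in: report which arguments actually arrived
    return fpath, fname


def get(index, path=None, name=None):
    # Translate get's parameter names to read's and drop anything left as None,
    # so read's own defaults apply for those parameters
    kwargs = {'fpath': path, 'fname': name}
    return read(**{k: v for k, v in kwargs.items() if v is not None})


print(get(0))                              # ('C:', 'text.txt')
print(get(0, name='data.txt'))             # ('C:', 'data.txt')
print(get(0, path='D:', name='data.txt'))  # ('D:', 'data.txt')
```

One consequence of filtering on None is that a caller can never deliberately pass None through to read; if that matters, a dedicated sentinel default (for example _MISSING = object()) is the usual workaround.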
Nevertheless, I wonder whether it would be better to communicate the default values to the user by repeating the same defaults in get, since otherwise a user may run into surprising behavior. So my advice would be to use:
def get(index, fpath='C:', fname='text.txt'):
    elements = read(fpath=fpath, fname=fname)
Now a user can see from the signature what will happen if they do not provide a certain parameter. You can also use **kwargs, in which case the keyword arguments passed to get (except index here) are passed on to read(...):
def get(index, **kwargs):
    elements = read(**kwargs)
I think you can use **kwargs here.
**kwargs allows you to pass a variable number of keyword arguments to a function. Use **kwargs when you want a function to accept named arguments that it does not declare explicitly. Here is an example:
def greet_me(**kwargs):
    # kwargs is always a dict (possibly empty), never None
    for key, value in kwargs.items():
        print("%s == %s" % (key, value))
Output:
>>> greet_me(name="yasoob")
name == yasoob
I've set up a metaclass and base class pair for creating the line specifications of several different file types I have to parse.
I have decided to go with using enumerations because many of the individual parts of the different lines in the same file often have the same name. Enums make it easy to tell them apart. Additionally, the specification is rigid and there will be no need to add more members, or extend the line specifications later.
The specification classes work as expected. However, I am having some trouble dynamically creating them:
>>> C1 = LineMakerMeta('C1', (LineMakerBase,), dict(a = 0))
AttributeError: 'dict' object has no attribute '_member_names'
Is there a way around this? The example below works just fine:
class A1(LineMakerBase):
    Mode = 0, dict(fill=' ', align='>', type='s')
    Level = 8, dict(fill=' ', align='>', type='d')
    Method = 10, dict(fill=' ', align='>', type='d')
    _dummy = 20 # so that Method has a known length
A1.format(**dict(Mode='DESIGN', Level=3, Method=1))
# produces '  DESIGN 3         1'
The metaclass is based on enum.EnumMeta, and looks like this:
import enum
class LineMakerMeta(enum.EnumMeta):
    "Metaclass to produce formattable LineMaker child classes."
    def _iter_format(cls):
        "Iteratively generate formatters for the class members."
        for member in cls:
            yield member.formatter
    def __str__(cls):
        "Returns string line with all default values."
        return cls.format()
    def format(cls, **kwargs):
        "Create formatted version of the line populated by the kwargs members."
        # build resulting string by iterating through members
        result = ''
        for member in cls:
            # determine value to be injected into member
            try:
                try:
                    value = kwargs[member]
                except KeyError:
                    value = kwargs[member.name]
            except KeyError:
                value = member.default
            value_str = member.populate(value)
            result = result + value_str
        return result
And the base class is as follows:
class LineMakerBase(enum.Enum, metaclass=LineMakerMeta):
    """A base class for creating Enum subclasses used for populating lines of a file.
    Usage:
    class LineMaker(LineMakerBase):
        a = 0, dict(align='>', fill=' ', type='f'), 3.14
        b = 10, dict(align='>', fill=' ', type='d'), 1
        c = 15, dict(align='>', fill=' ', type='s'), 'foo'
        # ^-start ^---spec dictionary ^--default
    """
    def __init__(member, start, spec=None, default=None):
        member.start = start
        # avoid a shared mutable default: formatter() later mutates member.spec
        member.spec = {} if spec is None else spec
        if default is not None:
            member.default = default
        else:
            # assume value is numerical for all provided types other than 's' (string)
            default_or_set_type = member.spec.get('type','s')
            default = {'s': ''}.get(default_or_set_type, 0)
            member.default = default
    @property
    def formatter(member):
        """Produces a formatter in form of '{0:<format>}' based on the member.spec
        dictionary. The member.spec dictionary makes use of these keys ONLY (see
        the string.format docs):
        fill align sign width grouping_option precision type"""
        try:
            # get cached value
            return '{{0:{}}}'.format(member._formatter)
        except AttributeError:
            # add width to format spec if not there
            member.spec.setdefault('width', member.length if member.length != 0 else '')
            # build formatter using the available parts in the member.spec dictionary
            # any missing parts will simply not be present in the formatter
            formatter = ''
            for part in 'fill align sign width grouping_option precision type'.split():
                try:
                    spec_value = member.spec[part]
                except KeyError:
                    # missing part
                    continue
                else:
                    # add part
                    sub_formatter = '{!s}'.format(spec_value)
                    formatter = formatter + sub_formatter
            member._formatter = formatter
            return '{{0:{}}}'.format(formatter)
    def populate(member, value=None):
        "Injects the value into the member's formatter and returns the formatted string."
        formatter = member.formatter
        if value is not None:
            value_str = formatter.format(value)
        else:
            value_str = formatter.format(member.default)
        if len(value_str) > len(member) and len(member) != 0:
            raise ValueError(
                'Length of object string {} ({}) exceeds available'
                ' field length for {} ({}).'
                .format(value_str, len(value_str), member.name, len(member)))
        return value_str
    @property
    def length(member):
        return len(member)
    def __len__(member):
        """Returns the length of the member field. The last member has no length.
        Lengths are based on simple subtraction of starting positions."""
        # get cached value
        try:
            return member._length
        # calculate member length
        except AttributeError:
            # compare by member values because member could be an alias
            members = list(type(member))
            try:
                next_index = next(
                    i+1
                    for i,m in enumerate(type(member))
                    if m.value == member.value
                )
            except StopIteration:
                raise TypeError(
                    'The member value {} was not located in the {}.'
                    .format(member.value, type(member).__name__)
                )
            try:
                next_member = members[next_index]
            except IndexError:
                # last member defaults to no length
                length = 0
            else:
                length = next_member.start - member.start
            member._length = length
            return length
This line:
C1 = enum.EnumMeta('C1', (), dict(a = 0))
fails with exactly the same error message. The __new__ method of EnumMeta expects an instance of enum._EnumDict as its last argument. _EnumDict is a subclass of dict and provides an instance variable named _member_names, which of course a regular dict doesn't have. When you go through the standard mechanism of enum creation, this all happens correctly behind the scenes. That's why your other example works just fine.
This line:
C1 = enum.EnumMeta('C1', (), enum._EnumDict())
runs with no error. Unfortunately, the constructor of _EnumDict is defined as taking no arguments, so you can't initialize it with keywords as you apparently want to do.
In the implementation of enum that's backported to Python 3.3, the following block of code appears in the constructor of EnumMeta. You could do something similar in your LineMakerMeta class:
def __new__(metacls, cls, bases, classdict):
    if type(classdict) is dict:
        original_dict = classdict
        classdict = _EnumDict()
        for k, v in original_dict.items():
            classdict[k] = v
In the official implementation in Python 3.5, the if statement and the block after it are gone for some reason, so classdict must be an honest-to-god _EnumDict; I don't see why this was done. In any case, the implementation of Enum is extremely complicated and handles a lot of corner cases.
I realize this is not a cut-and-dried answer to your question but I hope it will point you to a solution.
Create your LineMakerBase class, and then use it like so:
C1 = LineMakerBase('C1', dict(a=0))
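A minimal, self-contained sketch of that functional API with a custom base class, using a hypothetical FieldBase whose __init__ unpacks tuple values the same way LineMakerBase does. When the names are given as a mapping, each value is handed to __init__ exactly as a class-body value would be, and the new class is created by FieldBase's own metaclass.

```python
import enum


class FieldBase(enum.Enum):
    # Tuple values are unpacked into __init__, just like class-body members
    def __init__(self, start, width=0):
        self.start = start
        self.width = width


# Functional API: calling a member-less Enum subclass creates a new subclass
C1 = FieldBase('C1', {'a': (0, 5), 'b': (5, 3)})

print(C1.a, C1.a.start, C1.a.width)  # C1.a 0 5
print(issubclass(C1, FieldBase))     # True
```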
The metaclass was not meant to be used the way you are trying to use it. Check out this answer for advice on when metaclass subclasses are needed.
Some suggestions for your code:
the double try/except in format seems clearer as:
for member in cls:
    if member in kwargs:
        value = kwargs[member]
    elif member.name in kwargs:
        value = kwargs[member.name]
    else:
        value = member.default
this code:
# compare by member values because member could be an alias
members = list(type(member))
would be clearer with list(member.__class__)
has a false comment: listing an Enum class will never include the aliases (unless you have overridden that part of EnumMeta)
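instead of the complicated __len__ code you have now, and since you are subclassing EnumMeta anyway, you should extend __new__ to calculate the lengths once, when the class is created (length then becomes a plain attribute, so the length property and __len__ go away):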
# untested
def __new__(metacls, cls, bases, clsdict, **kwds):
    # let the main EnumMeta code do the heavy lifting
    enum_cls = super(LineMakerMeta, metacls).__new__(metacls, cls, bases, clsdict, **kwds)
    # go through the canonical members (aliases excluded) and calculate the lengths
    canonical_members = [
        member
        for name, member in enum_cls.__members__.items()
        if name == member.name
    ]
    last_member = None
    for next_member in canonical_members:
        next_member.length = 0  # stays 0 for the last member
        if last_member is not None:
            last_member.length = next_member.start - last_member.start
        last_member = next_member
    return enum_cls
The simplest way to create Enum subclasses on the fly is using Enum itself:
>>> from enum import Enum
>>> MyEnum = Enum('MyEnum', {'a': 0})
>>> MyEnum
<enum 'MyEnum'>
>>> MyEnum.a
<MyEnum.a: 0>
>>> type(MyEnum)
<class 'enum.EnumMeta'>
As for your custom methods, it might be simpler to use regular functions, precisely because the Enum implementation is so special.
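As a rough sketch of that suggestion, here is a hypothetical plain-function version of A1.format that keeps the same ideas: each field's width is the distance to the next field's start, the last field has no width, and fill, align and type come from a spec. The field-tuple layout (name, start, align, type_char, default) is an assumption, not part of the original code.

```python
def format_line(fields, **values):
    """Build a fixed-width line from (name, start, align, type_char, default) tuples.

    Fields must be sorted by start; each width is the gap to the next field's
    start, and the last field gets no width.
    """
    parts = []
    for i, (name, start, align, type_char, default) in enumerate(fields):
        width = fields[i + 1][1] - start if i + 1 < len(fields) else ''
        # fill is always a space here, giving specs like ' >8s' or ' >2d'
        text = format(values.get(name, default), f' {align}{width}{type_char}')
        if width != '' and len(text) > width:
            raise ValueError(f'{name}: {text!r} does not fit in {width} characters')
        parts.append(text)
    return ''.join(parts)


A1_FIELDS = [
    ('Mode', 0, '>', 's', ''),
    ('Level', 8, '>', 'd', 0),
    ('Method', 10, '>', 'd', 0),
    ('_dummy', 20, '>', 's', ''),  # so that Method has a known width
]

print(repr(format_line(A1_FIELDS, Mode='DESIGN', Level=3, Method=1)))
```

The trade-off is that you lose attribute-style access such as A1.Mode, but the logic becomes easy to test and reuse without any metaclass machinery.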
Ok, I recently started programming in Python, and I really like it.
However, I have run into a little issue.
I want to be able to define a function to take in some data and assign it to a variable that I designate, rather than have to perform the operation every time I want to submit the value.
Here is a code fragment:
virt = False  # default, so virt is always defined below
try:
    if elem.virtual.tag:
        virt = True
        temp_asset.set_virtual(True)
except AttributeError:
    temp_asset.set_virtual(False)
if virt: #if virtual, get only faction, value, and range for presence
    try:
        fac = elem.presence.faction #an xml tag (objectified)
    except AttributeError:
        fac = "faction tag not found"
        temp_asset.misload = True
    try:
        val = elem.presence.value
    except AttributeError:
        val = "value tag not found"
        temp_asset.misload = True
    try:
        rang = elem.presence.range
    except AttributeError:
        rang = "range tag not found"
        temp_asset.misload = True
    #Set presence values
    temp_asset.set_presence(fac, val, rang)
The functions set the values, but I want to be able to perform the error checking with something like this:
def checkval(self, variable_to_set, tag_to_use):
    try:
        variable_to_set = tag_to_use
    except AttributeError:
        variable_to_set = "tag not found"
        temp_asset.misload = True
Is this doable? Let me know if I need to show more code.
Edit: I don't need pointers per se, just anything that works this way and saves typing.
Edit 2: Alternatively, I need a solution of how to check whether an objectified xml node exists (lxml).
Have you tried/looked into the getattr and setattr functions?
For example, assuming these "variables" are object attributes:
def checkval(self, attr, presence, tagstr):
    tag = getattr(presence, tagstr, None)  # tag = presence.<tagstr>, or None if missing
    setattr(self, attr, tag or 'tag not found')  # self.<attr> = tag, or 'tag not found'
    if tag is None:
        self.temp_asset.misload = True
You call it like this:
your_object.checkval('fac', elem.presence, 'faction')
Alternatively, you can pre-define these variables and set them default values before you attempt to look up the tags. For example:
class YourObject(object):
    _attrmap = {
        'fac': 'faction',
        'val': 'value',
        'rang': 'range',
    }
    def __init__(self):
        # set default values
        for attr, tagstr in self._attrmap.items():
            setattr(self, attr, '%s tag not found' % tagstr)
    def checkval(self, presence):
        # look up every mapped tag; keep the default and flag a misload if missing
        for attr, tagstr in self._attrmap.items():
            tag = getattr(presence, tagstr, None)
            if tag is not None:
                setattr(self, attr, tag)
            else:
                self.temp_asset.misload = True
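A function-based variant that returns the three values instead of setting attributes, so it can stand in for the three try/except blocks in the question. read_presence is a hypothetical name; like the answers above, it assumes that accessing a missing child raises AttributeError, which getattr's default then absorbs. types.SimpleNamespace stands in for the objectified elements and for temp_asset.

```python
from types import SimpleNamespace


def read_presence(presence, asset, tags=('faction', 'value', 'range')):
    """Return one value per tag, flagging asset.misload for any missing tag."""
    results = []
    for tagstr in tags:
        tag = getattr(presence, tagstr, None)  # None if the child is missing
        if tag is None:
            asset.misload = True
            tag = '%s tag not found' % tagstr
        results.append(tag)
    return tuple(results)


# Stand-ins: a presence node lacking <range>, and an asset with a misload flag
asset = SimpleNamespace(misload=False)
presence = SimpleNamespace(faction='Empire', value=100)

fac, val, rang = read_presence(presence, asset)
print(fac, val, rang, asset.misload)  # Empire 100 range tag not found True
```

In the question's code this would become fac, val, rang = read_presence(getattr(elem, 'presence', None), temp_asset) followed by temp_asset.set_presence(fac, val, rang). If the presence element itself is absent, passing None marks all three as not found, because getattr(None, name, None) returns None.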