I have the following Python code (I'm using Python 2.7.X):
my_csv = '{first},{middle},{last}'
print( my_csv.format( first='John', last='Doe' ) )
I get a KeyError exception because 'middle' is not specified (this is expected). However, I want all of those placeholders to be optional. If those named parameters are not specified, I expect the placeholders to be removed. So the string printed above should be:
John,,Doe
Is there built in functionality to make those placeholders optional, or is some more in depth work required? If the latter, if someone could show me the most simple solution I'd appreciate it!
Here is one option:
from collections import defaultdict
my_csv = '{d[first]},{d[middle]},{d[last]}'
print( my_csv.format( d=defaultdict(str, first='John', last='Doe') ) )
"It does{cond} contain the the thing.".format(cond="" if condition else " not")
Thought I'd add this because it's been a feature since the question was asked, the question still pops up early in Google results, and this method is built directly into Python syntax (no imports or custom classes required). It's a simple conditional expression. Conditional expressions are intuitive to read (when kept simple), and it's often helpful that they short-circuit.
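As a sketch of the same idea applied to the question's CSV string (my example; it assumes the missing value arrives as None):
first, middle, last = 'John', None, 'Doe'
# A conditional expression supplies '' when the value is missing.
print('{},{},{}'.format(
    first if first is not None else '',
    middle if middle is not None else '',
    last if last is not None else '',
))  # John,,Doe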
Here's another option that uses the string interpolation operator %:
class DataDict(dict):
    def __missing__(self, key):
        return ''

my_csv = '%(first)s,%(middle)s,%(last)s'
print my_csv % DataDict(first='John', last='Doe')  # John,,Doe
Alternatively, if you prefer the more modern str.format() method, the following also works, but is less automatic in the sense that you'll have to explicitly define every possible placeholder in advance (although you could modify DataDict.placeholders on the fly if desired):
class DataDict(dict):
    placeholders = 'first', 'middle', 'last'
    default_value = ''

    def __init__(self, *args, **kwargs):
        self.update(dict.fromkeys(self.placeholders, self.default_value))
        dict.__init__(self, *args, **kwargs)

my_csv = '{first},{middle},{last}'
print(my_csv.format(**DataDict(first='John', last='Doe')))  # John,,Doe
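As an aside that isn't in the original answers: in Python 3, str.format_map() passes a mapping through without unpacking it, so the first DataDict (the one defining __missing__) works directly and no placeholder list is needed:
class DataDict(dict):
    def __missing__(self, key):
        return ''

my_csv = '{first},{middle},{last}'
print(my_csv.format_map(DataDict(first='John', last='Doe')))  # John,,Doe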
I faced the same problem and decided to create a library to solve it: pyformatting.
Here is the solution to your problem with pyformatting:
>>> from pyformatting import defaultformatter
>>> default_format = defaultformatter(str)
>>> my_csv = '{first},{middle},{last}'
>>> default_format(my_csv, first='John', last='Doe')
'John,,Doe'
The only problem is that pyformatting doesn't support Python 2; it supports Python 3.1+.
If I see any feedback on the need for 2.7 support, I think I will add it.
Related
I would like to understand some Python code that I've been reading:
my_stream = some.library.method(arg1=val, arg2=val)(input_stream)
My guess is that some.library.method() returns an iterator into which input_stream is passed as an argument. Is this correct?
I have searched "python generator functions" to get documentation on this type of syntax but have found nothing other than nested examples such as: sum(mult(input)). Can anyone provide an explanation or link?
UPDATE
Below is a specific example:
tokenized_train_stream = trax.data.Tokenize(vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)(train_stream)
Is this correct?
If you are unsure whether some object in Python is a generator, you can use the built-in inspect module, which provides numerous is-something predicates, among them isgenerator. A simple usage example:
import inspect
lst = [1,2,3]
gen = (i for i in [1,2,3])
print(inspect.isgenerator(lst)) # False
print(inspect.isgenerator(gen)) # True
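As for the pattern in the question itself: a call like some.library.method(arg1=val, arg2=val)(input_stream) simply means the first call returns a callable, which is then immediately called with input_stream. A minimal sketch of the shape (the names here are made up, not from trax):
def make_tokenizer(vocab):
    # Configuration call: returns a function that remembers vocab.
    def tokenize(stream):
        return (vocab.get(token, 0) for token in stream)  # a generator
    return tokenize

tokenized = make_tokenizer({'a': 1})(['a', 'b'])  # same f(config)(stream) shape
print(list(tokenized))  # [1, 0]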
I have a helper function that converts a %Y-%m-%d %H:%M:%S-formatted string to a datetime.datetime:
import datetime

def ymdt_to_datetime(ymdt: str) -> datetime.datetime:
    return datetime.datetime.strptime(ymdt, '%Y-%m-%d %H:%M:%S')
I can validate the ymdt format in the function itself, but it'd be more useful to have a custom object to use as a type hint for the argument, something like
from typing import NewType, Pattern
ymdt_pattern = '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]'
YmdString = NewType('YmdString', Pattern[ymdt_pattern])
def ymdt_to_datetime(ymdt: YmdString)...
Am I going down the wrong rabbit hole? Should this be an issue in mypy or someplace? Or can this be accomplished with the current type-hint implementation (3.6.1)?
There currently is no way for types to statically verify that your string matches a precise format, unfortunately. This is partially because checking at compile time the exact values a given variable can hold is exceedingly difficult to implement (and in fact, is NP-hard in some cases), and partially because the problem becomes impossible in the face of things like user input. As a result, it's unlikely that this feature will be added to either mypy or the Python typing ecosystem in the near future, if at all.
One potential workaround would be to leverage NewType, and carefully control when exactly you construct a string of that format. That is, you could do:
from typing import NewType
import datetime

YmdString = NewType('YmdString', str)

def datetime_to_ymd(d: datetime.datetime) -> YmdString:
    # Do conversion here
    return YmdString(d.strftime('%Y-%m-%d %H:%M:%S'))

def verify_is_ymd(s: str) -> YmdString:
    # Runtime validation checks here
    return YmdString(s)
If you use only functions like these to introduce values of type YmdString and do testing to confirm that your 'constructor functions' are working perfectly, you can more or less safely distinguish between strings and YmdString at compile time. You'd then want to design your program to minimize how frequently you call these functions to avoid incurring unnecessary overhead, but hopefully, that won't be too onerous to do.
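For example, here is a quick sketch (mine) of how the distinction plays out under mypy, given the definitions above:
def takes_ymd(s: YmdString) -> None:
    ...

takes_ymd(verify_is_ymd('2017-01-01 00:00:00'))  # OK: value went through the validator
takes_ymd('2017-01-01 00:00:00')                 # mypy error: plain str is not YmdString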
Type hints do nothing at runtime in Python; they merely annotate types for static checkers to inspect. You can't do any validation with them. All you can do, with type hints and a checker, is make sure the argument passed in is actually of type str.
Okay, here we are five years later and the answer is now yes, at least if you're willing to take a third-party library on board and decorate the functions you want to be checked at runtime:
$ pip install beartype
import re
from typing import Annotated  # Python 3.9+

from beartype import beartype
from beartype.vale import Is

YtdString = Annotated[str, Is[lambda string: re.match(
    '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]',
    string) is not None]]

@beartype
def just_print_it(ytd_string: YtdString) -> None:
    print(ytd_string)
> just_print_it("hey")
BeartypeCallHintParamViolation: #beartyped just_print_it() parameter ytd_string='hey' violates type hint typing.Annotated[str, Is[<lambda>]], as 'hey' violates validator Is[<lambda>]:
False == Is[<lambda>].
> just_print_it("2022-12-23 09:09:23")
2022-12-23 09:09:23
> just_print_it("2022-12-23 09:09:2")
BeartypeCallHintParamViolation: #beartyped just_print_it() parameter ytd_string='2022-12-23 09:09:2' violates type hint typing.Annotated[str, Is[<lambda>]], as '2022-12-23 09:09:2' violates validator Is[<lambda>]:
False == Is[<lambda>].
Please note that I'm using the very imperfect regex pattern I originally included in the question; it is not production-ready.
Then, a hopeful note: The maintainer of beartype is hard at work on an automagical import hook which will eliminate the need for decorating functions in order to achieve the above.
I want to use Python for creating JSON.
Since I found no library that can help me, I want to know if it's possible to inspect the order of the classes defined in a Python file.
Example
# example.py
class Foo:
    pass

class Bar:
    pass
If I import example, I want to know the order of the classes. In this case it is [Foo, Bar] and not [Bar, Foo].
Is this possible? If "yes", how?
Background
I am not happy with yaml/json. I have the vague idea to create config via Python classes (only classes, not instantiation to objects).
Answers which help me to get to my goal (Create JSON with a tool which is easy and fun to use) are welcome.
The inspect module can tell the line numbers of the class declarations:
import inspect

def get_classes(module):
    for name, value in inspect.getmembers(module):
        if inspect.isclass(value):
            _, line = inspect.getsourcelines(value)
            yield line, name
So the following code:
import example

for line, name in sorted(get_classes(example)):
    print line, name
Prints:
2 Foo
5 Bar
First up, as I see it, there are 2 things you can do...
Continue using Python source files as configuration files. (I won't recommend this. It's analogous to using a bulldozer to strike a nail or converting a shotgun into a wheelbarrow.)
Switch to something like TOML, JSON or YAML for configuration files, which are designed for the job.
Nothing in JSON or YAML prevents them from holding "ordered" key-value pairs. Python's dict type is unordered by default (at least until 3.5) and its list type is ordered. These map directly to object and array in JSON respectively when using the default loaders. Just use something like Python's OrderedDict when deserializing, and voila, you preserve order!
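For example (a sketch of mine, not from the original answer), the standard json module accepts an object_pairs_hook for exactly this purpose:
import json
from collections import OrderedDict

text = '{"first": "John", "middle": "", "last": "Doe"}'
data = json.loads(text, object_pairs_hook=OrderedDict)
print(list(data))  # ['first', 'middle', 'last'] -- document order preserved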
With that out of the way, if you really want to use Python source files for the configuration, I suggest trying to process the file using the ast module. Abstract Syntax Trees are a powerful tool for syntax level analysis.
I whipped up a quick script for extracting class line numbers and names from a file.
You (or anyone, really) can use it or extend it with more checks for whatever you want.
import sys
import ast
import json


class ClassNodeVisitor(ast.NodeVisitor):
    def __init__(self):
        super(ClassNodeVisitor, self).__init__()
        self.class_defs = []

    def visit(self, node):
        super(ClassNodeVisitor, self).visit(node)
        return self.class_defs

    def visit_ClassDef(self, node):
        self.class_defs.append(node)


def read_file(fpath):
    with open(fpath) as f:
        return f.read()


def get_classes_from_text(text):
    try:
        tree = ast.parse(text)
    except Exception as e:
        raise e
    class_extractor = ClassNodeVisitor()
    li = []
    for definition in class_extractor.visit(tree):
        li.append([definition.lineno, definition.name])
    return li


def main():
    fpath = "/tmp/input_file.py"
    try:
        text = read_file(fpath)
    except Exception as e:
        print("Could not load file due to " + repr(e))
        return 1
    print(json.dumps(get_classes_from_text(text), indent=4))


if __name__ == '__main__':
    sys.exit(main())
Here's a sample run on the following file:
input_file.py:
class Foo:
    pass


class Bar:
    pass
Output:
$ py_to_json.py input_file.py
[
[
1,
"Foo"
],
[
5,
"Bar"
]
]
If I import example,
If you're going to import the module, the example module needs to be on the import path. Importing means executing any Python code in the example module. This is a pretty big security hole: you're loading a user-editable file in the same context as the rest of the application.
I'm assuming that since you care about preserving class-definition order, you also care about preserving the order of definitions within each class.
It is worth pointing out that this is now the default behavior in Python, since Python 3.6.
Also see PEP 520: Preserving Class Attribute Definition Order.
(Moving my comments to an answer)
That's a great vague idea. You should give Figura a shot! It does exactly that.
(Full disclosure: I'm the author of Figura.)
I should point out that the order of declarations is not preserved in Figura, and also not in JSON.
I'm not sure about order-preservation in YAML, but I did find this on wikipedia:
... according to the specification, mapping keys do not have an order
It might be the case that specific YAML parsers maintain the order, though they aren't required to.
You can use a metaclass to record each class's creation order and later sort the classes by it.
This works in Python 2:
class CreationTimeMetaClass(type):
    creation_index = 0

    def __new__(cls, clsname, bases, dct):
        dct['__creation_index__'] = cls.creation_index
        cls.creation_index += 1
        return type.__new__(cls, clsname, bases, dct)

__metaclass__ = CreationTimeMetaClass

class Foo: pass
class Bar: pass

classes = [cls for cls in globals().values() if hasattr(cls, '__creation_index__')]
print(sorted(classes, key=lambda cls: cls.__creation_index__))
The standard json module is easy to use and works well for reading and writing JSON config files.
Objects are not ordered within JSON structures but lists/arrays are, so put order dependent information into a list.
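A minimal illustration of that advice (my example):
import json

# JSON arrays preserve order, so order-dependent settings go in a list.
config = {"steps": ["download", "extract", "build", "install"]}
loaded = json.loads(json.dumps(config))
print(loaded["steps"])  # ['download', 'extract', 'build', 'install'], order intact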
I have used classes as a configuration tool: I derived them from a base class that was customised by the particular class variables. Using the class like this meant I did not need a factory class. For example:
from .artifact import Application
class TempLogger(Application): partno='03459'; path='c:/apps/templog.exe'; flag=True
class GUIDisplay(Application): partno='03821'; path='c:/apps/displayer.exe'; flag=False
in the installation script
from .install import Installer
import app_configs
installer = Installer(apps=(TempLogger(), GUIDisplay()))
installer.baseline('1.4.3.3475')
print installer.versions()
print installer.bill_of_materials()
One should use the right tools for the job, so perhaps python classes are not the right tool if you need ordering.
Another python tool I have used to create JSON files is Mako templating system. This is very powerful. We used it to populate variables like IP addresses etc into static JSON files that were then read by C++ programs.
I'm not sure if this answers your question, but it might be relevant. Take a look at the excellent attrs module. It's great for creating classes to use as data types.
Here's an example from glyph's blog (creator of Twisted Python):
import attr

@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()
It saves you writing a lot of boilerplate code - you get things like str representation and comparison for free, and the module has a convenient asdict function which you can pass to the json library:
>>> p = Point3D(1, 2, 3)
>>> str(p)
'Point3D(x=1, y=2, z=3)'
>>> p == Point3D(1, 2, 3)
True
>>> json.dumps(attr.asdict(p))
'{"y": 2, "x": 1, "z": 3}'
The module uses a strange naming convention, but read attr.s as "attrs" and attr.ib as "attrib" and you'll be okay.
Just touching on the point about creating JSON from Python: there is an excellent library called jsonpickle which lets you dump Python objects to JSON. (Using it alone or together with other methods mentioned here, you can probably get what you wanted.)
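A minimal sketch of jsonpickle in use (my example; the Job class is made up):
import jsonpickle

class Job(object):
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

encoded = jsonpickle.encode(Job('build', 1))  # a JSON string, with type info
restored = jsonpickle.decode(encoded)         # back to a Job instance
print(restored.name, restored.priority)       # build 1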
I am using HTMLTestRunner to create an HTML report for my unit test. I suppose the code for HTMLTestRunner provided here was written and optimized for Python 2 and earlier, because I got three errors regarding incompatibility with Python 3, such as the use of StringIO instead of io.
Now, line 639 uses the has_key method, in this code snippet:
def sortResult(self, result_list):
    # unittest does not seems to run in any particular order.
    # Here at least we want to group them together by class.
    rmap = {}
    classes = []
    for n, t, o, e in result_list:
        cls = t.__class__
        if not rmap.has_key(cls):
            rmap[cls] = []
            classes.append(cls)
        rmap[cls].append((n, t, o, e))
    r = [(cls, rmap[cls]) for cls in classes]
    return r
Since has_key has been removed in Python 3, I get an error for this. I am not that familiar with Python, so I searched and found that the in operator can be a suitable replacement. But when I tried simply replacing has_key with in, it failed with an invalid syntax error. So how can I replace this has_key call?
Instead of
if not rmap.has_key(cls):
try
if cls not in rmap:
You can see the docs for details.
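Applied to the snippet from the question, the fixed method would look like this (same logic; only the membership test changed):
def sortResult(self, result_list):
    # Group results by test class, preserving first-seen class order.
    rmap = {}
    classes = []
    for n, t, o, e in result_list:
        cls = t.__class__
        if cls not in rmap:  # Python 3 replacement for rmap.has_key(cls)
            rmap[cls] = []
            classes.append(cls)
        rmap[cls].append((n, t, o, e))
    return [(cls, rmap[cls]) for cls in classes]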
When I write a function in Python (v2.7), I very often have a type in mind for one of the arguments. I'm working with the unbelievably brilliant pandas library at the movemement, so my arguments are often 'intended' to be pandas.DataFrames.
In my favorite IDE (Spyder), when you type a period . a list of methods appear. Also, when you type the opening parenthesis of a method, the docstring appears in a little window.
But for these things to work, the IDE has to know what type a variable is. But of course, it never does. Am I missing something obvious about how to write Pythonic code? (I've read Python Is Not Java, but it doesn't mention this IDE autocomplete issue.)
Any thoughts?
I don't know if it works in Spyder, but many completion engines (e.g. Jedi) also support assertions to tell them what type a variable is. For example:
def foo(param):
    assert isinstance(param, str)
    # now param will be considered a str
    param.|capitalize
           center
           count
           decode
           ...
Actually I use IntelliJ IDEA (aka PyCharm), and it offers multiple ways to specify variable types:
1. Specify Simple Variable
Very simple: just add a comment with the type information after the definition. From then on, PyCharm supports autocompletion! e.g.:
def route():
    json = request.get_json()  # type: dict
Source: https://www.jetbrains.com/help/pycharm/type-hinting-in-pycharm.html
2. Specify Parameter:
Add three quote signs after the beginning of a method and the IDE will autocomplete a docstring in which you can specify parameter types; see the example in the source below.
Source: https://www.jetbrains.com/help/pycharm/using-docstrings-to-specify-types.html
If you're using Python 3, you can use function annotations. As an example:
@typechecked  # runtime-checking decorator; e.g. from typeguard (my assumption, not named in the original)
def greet(name: str, age: int) -> str:
    return "Hello {0}, you are {1} years old".format(name, age)
I don't use Spyder, but I would assume there's a way for it to read the annotations and act appropriately.
I don't know whether Spyder reads docstrings, but PyDev does:
http://pydev.org/manual_adv_type_hints.html
So you can document the expected type in the docstring, e.g. as in:
def test(arg):
    ':type arg: str'
    arg.<hit tab>

And you'll get the corresponding string tab completion.
Similarly you can document the return-type of your functions, so that you can get tab-completion on foo for foo = someFunction().
At the same time, docstrings make auto-generated documentation much more helpful.
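For the return type, the same docstring convention would look like this (my sketch, following the PyDev manual linked above):
def someFunction():
    ''':rtype: str'''
    return 'a string'

foo = someFunction()
foo.<hit tab>  # completes with str methods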
The problem is the dynamic nature of Python. I use Spyder and have used many other Python IDEs (PyCharm, IDLE, WingIDE, PyDev, ...), and ALL of them have the problem you describe. When I want code completion to help, I just instantiate the variable with the type I want and then type ".". For example, suppose you know your variable df will be a DataFrame in some piece of code: you can write df = DataFrame(), and from then on code completion should work. Just do not forget to delete (or comment out) the df = DataFrame() line when you finish editing the code.
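A two-line sketch of that workaround (my example):
from pandas import DataFrame

df = DataFrame()  # temporary stub: gives the IDE a concrete type for completion
# ... write your code using df.<completions> here, then delete the stub line ...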