I am curious whether there is a way to work with Avro in Python the same way as in the Java or C++ implementations.
According to the official Avro Python documentation, I have to provide an Avro schema at runtime to encode/decode data. But is there a way to use a code generator, as in Java/C++?
Update: My coworker put together a pretty good library for doing this, avro-to-python. We have been using it in production for over a year now on some pretty complex schemas.
I had to implement something like this for PHP: avro-to-php
pyschema is a pretty good start, but the documentation is poor; you'll need to look at the source code to see how it all works. You can use it to read Avro schemas and generate Python source code. It adds another layer of abstraction and, as such, slows things down a bit more.
I've asked this question a couple of times recently in the Pulsar Slack channel, and my belief is that no tool currently exists that can convert an Avro schema to a Python class that is compatible with the Pulsar Python client library.
The Pulsar Python client library expects the Python class to inherit from the Record class (https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar/schema/definition.py#L57), and for every field in the Python class to inherit from the Field class (https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar/schema/definition.py#L141), both of which are defined in the Pulsar Python client library.
So, an Avro to Python converter would have to import the Record class and Field class from the Python client library, and so if such a converter exists, someone in the Pulsar Slack community really should know about it.
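To give a feel for what such a converter would have to do, here is a minimal sketch (a hypothetical helper, not an existing tool) that turns a flat Avro record schema into source text for a Record subclass. It assumes pulsar.schema exports Record plus the field classes String, Integer, Long, Float, Double, Boolean and Bytes, supports only primitive field types, and ignores doc, namespace and defaults. It only produces text, so the Pulsar client does not need to be installed to run it.

```python
import json

# Assumed mapping from Avro primitive type names to pulsar.schema field classes.
AVRO_TO_PULSAR = {
    "boolean": "Boolean",
    "int": "Integer",
    "long": "Long",
    "float": "Float",
    "double": "Double",
    "bytes": "Bytes",
    "string": "String",
}


def avro_to_pulsar_source(schema_json: str) -> str:
    """Return Python source defining a pulsar.schema Record subclass (hypothetical helper)."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("only top-level Avro records are supported")

    field_classes = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # Unions (lists), nested records and logical types are out of scope for this sketch.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_PULSAR:
            raise ValueError(f"unsupported type for field {field['name']!r}: {avro_type!r}")
        field_classes.append(AVRO_TO_PULSAR[avro_type])

    imports = ", ".join(["Record", *sorted(set(field_classes))])
    lines = [f"from pulsar.schema import {imports}", "", "", f"class {schema['name']}(Record):"]
    for field, cls in zip(schema["fields"], field_classes):
        lines.append(f"    {field['name']} = {cls}()")
    if not schema["fields"]:
        lines.append("    pass")
    return "\n".join(lines) + "\n"
```

Feeding it a record with a string `name` and an int `age` yields a `User(Record)` class whose fields are `String()` and `Integer()`; a union such as `["null", "string"]` is rejected, which is exactly where a real converter would need more work.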
Further, the Pulsar Python client library is missing support for Avro keywords like "doc" and "namespace", and for null default values. So even if an Avro to Python converter for Pulsar exists, the converted Python class likely could not be properly consumed by the Pulsar Python client library.
I don't see any indication of an existing Avro schema -> Python class code generator in the docs (which explicitly mention code generation for the Java case) for arbitrary Python interpreters. If you're using Jython, you could use the Java code generator to make a class that you access in your Jython code.
Unlike Java and C++, the lack of code generation doesn't hurt Python performance much (in the CPython case, anyway), since class instances are implemented in terms of dicts (there are exceptions to this rule in a sense, such as __slots__, but they mostly change memory usage, not the fact that a lookup is always involved). That makes code generation largely "nice to have" syntactic sugar rather than a necessary feature for development; with some effort, you could always implement a converter that writes out a class definition and exec()s it in Python to get a similar effect (older versions of collections.namedtuple built their classes this way).
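A minimal sketch of that idea, assuming a flat Avro record schema given as JSON: the hypothetical make_record_class helper builds class source text and exec()s it. It does no type checking or Avro encoding; it only gives you attribute access plus a to_dict() that returns a plain dict, which is the form Avro's Python writers work with.

```python
import json
import keyword


def make_record_class(schema_json: str) -> type:
    """Generate and exec() a slotted class for a flat Avro record schema (hypothetical helper)."""
    schema = json.loads(schema_json)
    class_name = schema["name"]
    names = [field["name"] for field in schema["fields"]]

    # The generated source is exec()'d, so refuse anything that is not a plain identifier.
    for name in [class_name, *names]:
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"not usable as a Python name: {name!r}")

    params = "".join(f", {n}" for n in names)
    assignments = "\n".join(f"        self.{n} = {n}" for n in names) or "        pass"
    source = (
        f"class {class_name}:\n"
        f"    __slots__ = {tuple(names)!r}\n"
        f"    def __init__(self{params}):\n"
        f"{assignments}\n"
        f"    def to_dict(self):\n"
        f"        return {{n: getattr(self, n) for n in self.__slots__}}\n"
    )
    namespace = {"__name__": f"avro_generated_{class_name}"}
    exec(source, namespace)
    return namespace[class_name]
```

For example, `User = make_record_class(schema)` followed by `User("ada", 36).to_dict()` gives `{"name": "ada", "age": 36}`. Using `__slots__` keeps instances small, but as noted above this is about memory rather than speed.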
And how can I define my own types of variables/objects so that they work with the operations Python already provides?
Can I do this without having to create my own object classes and operators as a separate and distinct entity? I don't mind having to create my own object class (that's a given), but I want it to integrate seamlessly with the existing constructs. So the emphasis is on avoiding "separate and distinct".
I'm trying to create a quaternion class. I want to define a 1i and 1k that are distinct from 1j.
And yes, a package might already exist; this is purely academic, for my own programming practice and understanding. I'm trying to extend what is already there, not build something distinct and separate.
I already write classes, but unfortunately they require redefining the basic operations before I can make use of them, and even then I have to "declare" these objects before I can use them, quite unlike '1j'.
I hope my intent is clear. A quaternion is not the end goal; what I'm trying to figure out are the kinds of methods, objects and generalizations that extend and make use of what is already built into Python.
It seems to me that whoever built numpy and cmath has already achieved this.
Thanks to the comments below, I now have the vocabulary to express my intent better.
I'm trying to add new features to Python's SYNTAX.
If anyone can offer resources on how to do this, I'd appreciate it.
I see two options for you here:
Change the Python syntax (fork CPython); there's a surprising number of articles about how to do that.
Build some kind of preprocessor like Mypy.
Either way it seems like too much trouble just to have a new literal value.
Python does not support custom operators nor custom literals.
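What Python does let you do is hook into its existing operators, which is how numpy's array type (written in C) interoperates with plain numbers. A minimal sketch, with illustrative class and constant names: implement the numeric protocol (`__add__`, `__mul__` and the reflected `__radd__`, `__rmul__`, returning NotImplemented for unsupported types) using the Hamilton product, and predefine unit constants. You then write `2*i` instead of a `2i` literal.

```python
from numbers import Real


class Quaternion:
    """Minimal quaternion w + x*i + y*j + z*k supporting +, -, * and == with real numbers."""

    __slots__ = ("w", "x", "y", "z")

    def __init__(self, w=0.0, x=0.0, y=0.0, z=0.0):
        self.w, self.x, self.y, self.z = w, x, y, z

    @staticmethod
    def _coerce(other):
        # Promote real numbers; signal "not handled" for anything else.
        if isinstance(other, Quaternion):
            return other
        if isinstance(other, Real):
            return Quaternion(other)
        return NotImplemented

    def __add__(self, other):
        o = self._coerce(other)
        if o is NotImplemented:
            return o
        return Quaternion(self.w + o.w, self.x + o.x, self.y + o.y, self.z + o.z)

    __radd__ = __add__  # addition is commutative

    def __neg__(self):
        return Quaternion(-self.w, -self.x, -self.y, -self.z)

    def __sub__(self, other):
        o = self._coerce(other)
        return o if o is NotImplemented else self + (-o)

    def __rsub__(self, other):
        o = self._coerce(other)
        return o if o is NotImplemented else o + (-self)

    def __mul__(self, other):
        o = self._coerce(other)
        if o is NotImplemented:
            return o
        # Hamilton product (not commutative, so __rmul__ must keep operand order).
        return Quaternion(
            self.w * o.w - self.x * o.x - self.y * o.y - self.z * o.z,
            self.w * o.x + self.x * o.w + self.y * o.z - self.z * o.y,
            self.w * o.y - self.x * o.z + self.y * o.w + self.z * o.x,
            self.w * o.z + self.x * o.y - self.y * o.x + self.z * o.w,
        )

    def __rmul__(self, other):
        o = self._coerce(other)
        return o if o is NotImplemented else o * self

    def __eq__(self, other):
        o = self._coerce(other)
        if o is NotImplemented:
            return o
        return (self.w, self.x, self.y, self.z) == (o.w, o.x, o.y, o.z)

    def __repr__(self):
        return f"Quaternion({self.w!r}, {self.x!r}, {self.y!r}, {self.z!r})"


# Module-level unit constants stand in for the 1i / 1j / 1k literals Python lacks.
i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
k = Quaternion(0, 0, 0, 1)
```

With this, `i*j == k`, `j*i == -k`, `i*i == -1` and `1 + 2*i` all behave as expected, because `int.__mul__` returns NotImplemented and Python falls back to `Quaternion.__rmul__`. The built-in `1j` literal is untouched; only the grammar change would give you `1i`.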
A language that supports custom literals is C++ (user-defined literals arrived in C++11), but it does not let you define new operators; you can only overload the existing ones.
A language that supports custom operators is, for example, Haskell.
If you want to add this feature to Python you'll have to take the Python sources, modify its grammar, modify the lexer/parser and more importantly the compiler.
However, at that point you have created a new language, which breaks compatibility with Python.
The easiest solution would simply be to write a simple preprocessor that replaces some simple syntax with an expanded equivalent. For example:
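```shell
# -E enables extended regexes so ( ) group and + repeats; sed has no \d class, so use [0-9].
# GNU sed syntax; BSD/macOS sed needs -i '' instead of a bare -i.
sed -E -i 's/([0-9]+)\+([0-9]+)i/MyComplex(\1, \2)/g' my_file.py
```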
Then you can execute the preprocessor in the build step of your library/application.
This has the advantage of letting you write the code you want, while what you ship and run is translated into normal Python, keeping 100% compatibility with existing installations.
I believe that with import hooks it would be possible to avoid shipping the preprocessed version of your library: the preprocessor could be hooked into the import step and run on the fly. This would avoid having to deal with temporary preprocessed files.
The only requirement would be that people who use your library have to install the import hook somehow.
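A minimal sketch of the import-hook idea using importlib's finder/loader API. Assumptions: preprocessed modules live in files with a made-up .pyq extension, the custom syntax is the N+Mi literal from the sed example, and MyComplex is whatever class your library provides. Only .pyq files are handled; every other import falls through to the standard machinery.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import os
import re
import sys

# Hypothetical literal syntax handled by this sketch: "3+4i" -> MyComplex(3, 4).
_LITERAL = re.compile(r"(\d+)\+(\d+)i\b")


def preprocess(source: str) -> str:
    """Rewrite the custom literal into ordinary Python."""
    return _LITERAL.sub(r"MyComplex(\1, \2)", source)


class PreprocessingLoader(importlib.machinery.SourceFileLoader):
    """A source loader that runs preprocess() on the text before compiling it."""

    def source_to_code(self, data, path, *, _optimize=-1):
        text = preprocess(importlib.util.decode_source(data))
        return compile(text, path, "exec", dont_inherit=True, optimize=_optimize)


class PreprocessingFinder(importlib.abc.MetaPathFinder):
    """Finds modules stored as <name>.pyq on sys.path (or in a package's __path__)."""

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        for entry in path if path is not None else sys.path:
            if not isinstance(entry, str):
                continue
            candidate = os.path.join(entry, name + ".pyq")
            if os.path.isfile(candidate):
                loader = PreprocessingLoader(fullname, candidate)
                return importlib.util.spec_from_file_location(fullname, candidate, loader=loader)
        return None  # let the normal import machinery handle everything else


def install():
    """Register the hook once; users must call this before importing .pyq modules."""
    if not any(isinstance(f, PreprocessingFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, PreprocessingFinder())
```

After `install()`, a file `demo.pyq` containing `value = 3+4i` can be imported with a plain `import demo`, provided `MyComplex` is in scope inside it. The compiled result is cached in `__pycache__` like any other module, so the rewrite only runs when the source changes.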
A post by Ned Batchelder back in 2008 [1] claimed that Python code objects are marshalled into Python bytecode files:
[...] The entire rest of the file is just the output of marshal.dump() of the code object that results from compiling the source file.
This may apply to code objects created at the Python level, but how exactly are these code objects marshalled at the C level?
I've looked over the documentation for Python code objects (version 3.5), but not much is said about them, and it certainly doesn't explain how they're marshalled:
The C structure of the objects used to describe code objects. The fields of this type are subject to change at any time.
I've also tried to take a look at the Python code object's source code, but couldn't really find anything that answered my question.
If Python code objects are implemented using structs at the C level (which I believe they are), how can they be marshalled into bytecode files? As far as my research goes, C provides no built-in marshalling facility, and I can't find any third-party library being used either.
If code objects are marshalled at the C level, how exactly is it done? If not, how are they encoded into the bytecode?
[1] I especially noted the date because this detail may have changed between Python versions.
This may apply to code objects created at the Python level, but how exactly are these code objects marshalled at the C level?
With marshal.dump, or perhaps one of the related interfaces like marshal.dumps.
It's not as simple as "delegate to some C standard library function"; the marshalling and unmarshalling implementation is an 1820-line file (Python/marshal.c) you'd have to read to see every detail. Because its data structures and functionality are so Python-specific, most of the object-tree traversal and serialization has to be written from scratch.
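You can confirm this from Python itself: the marshal module is implemented in that same C file, and a .pyc is a small header followed by marshal.dumps(code). This sketch assumes Python 3.7+, where the header is 16 bytes (PEP 552: magic number, flags, then source mtime and size or a source hash); tiny.py is a throwaway example module.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

# Write a tiny module and byte-compile it the same way the import system would.
src_dir = tempfile.mkdtemp()
src_path = os.path.join(src_dir, "tiny.py")
with open(src_path, "w") as fh:
    fh.write("x = 40 + 2\n")
pyc_path = py_compile.compile(src_path, doraise=True)

with open(pyc_path, "rb") as fh:
    data = fh.read()

# 16-byte header, then the marshalled code object.
header, body = data[:16], data[16:]
assert header[:4] == importlib.util.MAGIC_NUMBER

code = marshal.loads(body)  # rebuild the code object from its serialized form
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 42

# The same round trip without touching the file system.
same = marshal.loads(marshal.dumps(compile("x = 40 + 2\n", src_path, "exec")))
```

Everything after the header is produced and consumed by marshal's C code, which walks the code object's fields (bytecode string, constants, names and so on) and writes each with its own type tag.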
I want to call methods and get/set instance variables on an instance of a given Python class from another process. All of the class methods and variables accept/return simple Python dictionaries or lists (specifically, it is the P4Python API; I can't use the Perforce C++ interop and need the option to call this from another host).
I'd like to do this via SOAP or by passing JSON back and forth. My first target is to have Mono consume the Python class. I am toying with the idea of writing my own bindings generator using Python's inspect module that would spit out C# files for my Python class.
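If you do go the bindings-generator route, the inspect part is small. A rough sketch, assuming every argument and return value can be typed as object on the C# side (since everything is dicts and lists) and that methods take plain positional parameters; Example is a hypothetical stand-in, not the P4Python API, and the emitted interface is only a stub you would still need to wire to a transport.

```python
import inspect


class Example:
    """Hypothetical stand-in for the class you want to expose."""

    def run(self, command, args):
        return {"command": command, "args": args}

    def info(self):
        return {}

    def _private(self):
        pass


def describe_class(cls) -> dict:
    """Map each public method name to its parameter names (excluding self)."""
    methods = {}
    for name, func in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue
        params = list(inspect.signature(func).parameters)
        if not isinstance(inspect.getattr_static(cls, name), staticmethod):
            params = params[1:]  # drop 'self'
        methods[name] = params
    return {"class": cls.__name__, "methods": methods}


def to_csharp_interface(manifest: dict) -> str:
    """Emit a C# interface stub; every type is 'object' by assumption."""
    lines = [f"public interface I{manifest['class']}", "{"]
    for name, params in sorted(manifest["methods"].items()):
        args = ", ".join(f"object {p}" for p in params)
        lines.append(f"    object {name}({args});")
    lines.append("}")
    return "\n".join(lines)
```

The manifest step is deliberately separate from the C# step: the same dict could just as well drive a JSON-RPC or SOAP description instead.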
Have I missed anything out there that already lets me do this? pywebsvcs looks quite close! Could I generate a WSDL file from it?
Does it have to be SOAP or JSON? I think something similar is quite simple with XML-RPC (which comes with Python). I use it a lot.
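A minimal sketch with the standard library's xmlrpc.server (Python 3 names; Python 2 called these modules SimpleXMLRPCServer and xmlrpclib). Inventory is a hypothetical stand-in for the wrapped class; register_instance exposes its public (non-underscore) methods, and any XML-RPC client, including one on the .NET/Mono side, can call them over HTTP. Binding to 127.0.0.1 keeps it local; you'd bind to a reachable interface to call it from another host.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class Inventory:
    """Hypothetical stand-in for the wrapped class; it only returns dicts and lists."""

    def __init__(self):
        self._items = {"depot": ["a.txt", "b.txt"]}

    def list_files(self, depot):
        return self._items.get(depot, [])

    def info(self):
        return {"user": "alice", "files": sum(len(v) for v in self._items.values())}


def serve(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_instance(Inventory())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = serve()
    url = "http://%s:%d/" % srv.server_address
    print(ServerProxy(url).info())  # {'user': 'alice', 'files': 2}
    srv.shutdown()
    srv.server_close()
```

Dicts travel as XML-RPC structs and lists as arrays, which matches an API that only accepts and returns simple dictionaries and lists.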
I was wondering whether objects serialized with CPython's cPickle are readable with IronPython's cPickle; the objects in question do not require any modules beyond the built-ins that both CPython and IronPython include. Thank you!
If you use protocol 0 (the default in Python 2), which is text-based, things should work. I'm not sure what will happen with a higher protocol. It's very easy to test this...
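A quick round-trip check, written for Python 3 (on CPython 2 or IronPython you would import cPickle and skip isascii(), which needs Python 3.7+). Run the dump half under one interpreter and the load half under the other; the payload and the file path are arbitrary.

```python
import os
import pickle
import tempfile

# Built-in types only, as in the question.
data = {"name": "job", "tags": ["a", "b"], "count": 3, "ratio": 0.5}

# Protocol 0 is the original text-based pickle format.
blob = pickle.dumps(data, protocol=0)
print(blob.isascii())  # True for ASCII-only data like this

# Arbitrary shared location: write with one interpreter...
path = os.path.join(tempfile.gettempdir(), "shared.pkl")
with open(path, "wb") as fh:
    fh.write(blob)

# ...and read with the other.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
print(restored == data)  # True
```

Always open the file in binary mode on both sides, even for protocol 0, so newline translation cannot corrupt the stream.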
It will work because when you unpickle objects during load() it will use the current definitions of whatever classes you have defined now, not back when the objects were pickled.
IronPython is simply Python with the standard library implemented in C# so that everything emits IL. Both the CPython and the IronPython pickle modules have the same functionality, except one is implemented in C and the other in C#.
Is it possible to create a class interface in Python, along with various implementations of that interface?
Example: I want to create a class for POP3 access (with all its methods etc.). If I go with a commercial component, I want to wrap it so it adheres to a contract.
In the future, if I want to use another component or code my own, I want to be able to swap things out and not have things very tightly coupled.
Possible? I'm new to python.
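If you come from a statically typed language, note that Python does not need a class interface. You can simulate one using a base class.
```python
class BaseAccess:
    def open(self, arg):
        # Subclasses are expected to override this.
        raise NotImplementedError()


class Pop3Access(BaseAccess):
    def open(self, arg):
        ...  # POP3-specific implementation goes here


class AlternateAccess(BaseAccess):
    def open(self, arg):
        ...  # alternative implementation goes here
```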
But you can easily write the same code without BaseAccess. Statically typed languages need the interface for type checking at compile time. In Python this is not necessary, because everything is looked up dynamically at run time. Search for 'duck typing' to learn about the philosophy behind this.
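For example, reusing the class names from the base-class example but with no base class at all (the host argument and returned strings are placeholders for real connection objects):

```python
class Pop3Access:
    def open(self, host):
        # Placeholder: a real version would connect with poplib or wrap a commercial component.
        return f"pop3:{host}"


class AlternateAccess:
    def open(self, host):
        return f"alternate:{host}"


def check_mail(access, host):
    # Any object with an open(host) method works here; nothing checks its class.
    return access.open(host)


for impl in (Pop3Access(), AlternateAccess()):
    print(check_mail(impl, "mail.example.com"))
```

Swapping in a new component only requires that it provide the same method names and arguments; the calling code never changes.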
There is also an Abstract Base Classes module (abc), added in Python 2.6, but I haven't used it.
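A minimal sketch of what abc adds over a plain base class, written for Python 3 (the ABC helper class arrived in 3.4; on 2.6/2.7 you would set `__metaclass__ = ABCMeta` instead). A subclass that forgets to implement an abstract method cannot even be instantiated.

```python
from abc import ABC, abstractmethod


class BaseAccess(ABC):
    """The 'contract': concrete subclasses must implement open()."""

    @abstractmethod
    def open(self, host):
        ...


class Pop3Access(BaseAccess):
    def open(self, host):
        return f"pop3:{host}"


class Incomplete(BaseAccess):
    pass  # forgot to implement open()


print(Pop3Access().open("mail.example.com"))  # pop3:mail.example.com

try:
    Incomplete()
except TypeError as exc:
    # The error is raised at instantiation time, not when open() is first called.
    print("refused:", exc)
```

Compared with raising NotImplementedError, this moves the failure earlier, which is about as close to a compile-time interface check as plain Python gets.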
Of course. There is no need to create a base class or an interface in this case either, as everything is dynamic.
One option is to use zope interfaces. However, as was stated by Wai Yip Tung, you do not need to use interfaces to achieve the same results.
The zope.interface package is really more a tool for discovering how to interact with objects (generally within large code bases with multiple developers).
Yes, this is possible. There are typically no impediments to doing so: just keep a stable API and change how you implement it.