Creating a customized language using Python


I have started playing with Sage recently, and I've come to suspect that the standard Python int is wrapped in a customized class called Integer in Sage. If I type in type(1) in Python, I get <type 'int'>, however, if I type in the same thing in the sage prompt I get <type 'sage.rings.integer.Integer'>.
If I wanted to replace Python int (or list or dict) with my own custom class, how might it be done? How difficult would it be (e.g. could I do it entirely in Python)?

As an addendum to the other answers: when running any code, Sage has a preprocessing step which converts the Sage-Python to true Python (which is then executed). This is done by the preparse function, e.g.
sage: preparse('a = 1')
'a = Integer(1)'
sage: preparse('2^40')
'Integer(2)**Integer(40)'
sage: preparse('F.<x> = PolynomialRing(ZZ)')
"F = PolynomialRing(ZZ, names=('x',)); (x,) = F._first_ngens(1)"
This step is precisely what allows the transparent use of Integers (in place of ints) and the other non-standard syntax (like the polynomial ring example above and [a..b] etc).
As far as I understand, this is the only way to completely transparently use replacements for the built-in types in Python.

You are able to subclass all of Python's built-in types. For example:
class MyInt(int):
    pass
i = MyInt(2)
# i is now an instance of MyInt, but will still behave entirely like an integer.
However, you need to explicitly construct each integer as a MyInt. type(1) will still be int; only a value created as MyInt(1) reports MyInt from type().
Hopefully that's close to what you're looking for.
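One caveat of subclassing is worth a short sketch: arithmetic on a bare int subclass returns a plain int unless the operators are overridden. The class names PlainInt and MyInt below are only illustrative, and only __add__ is overridden to keep the example small.

```python
class PlainInt(int):
    """Subclass that overrides nothing."""


class MyInt(int):
    """Subclass that keeps its own type across addition."""

    def __add__(self, other):
        result = int.__add__(self, other)
        # int.__add__ returns NotImplemented for operands it cannot handle.
        return result if result is NotImplemented else MyInt(result)


print(type(1).__name__)                # int
print(type(PlainInt(2) + 1).__name__)  # int: the subclass is lost
print(type(MyInt(2) + 1).__name__)     # MyInt
```

A real replacement type would need the same treatment for every operator it cares about (__sub__, __mul__, the reflected variants, and so on).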

In the case of Sage, it's easy. Sage has complete control of its own REPL (read-evaluate-print loop), so it can parse the commands you give it and make the parts of your expression into whatever classes it wants. It is not so easy to have standard Python automatically use your integer type for integer literals, however. Simply reassigning the built-in int() to some other type won't do it. You could probably do it with an import filter, that scans each file imported for (say) integer literals and replaces them with MyInt(42) or whatever.
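The source-rewriting step that such a filter would need can be sketched with the standard ast module. This is not how Sage's preparser is implemented and it is not a complete import hook; MyInt, WrapIntLiterals and run_with_myint are hypothetical names, and the sketch assumes Python 3.8 or later, where literals parse as ast.Constant.

```python
import ast


class MyInt(int):
    """Hypothetical replacement integer type."""

    def __add__(self, other):
        return MyInt(int(self) + int(other))


class WrapIntLiterals(ast.NodeTransformer):
    """Rewrite every integer literal n into the call MyInt(n)."""

    def visit_Constant(self, node):
        # type(...) is int excludes bool, which is a subclass of int.
        if type(node.value) is int:
            call = ast.Call(
                func=ast.Name(id="MyInt", ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def run_with_myint(source):
    """Parse, rewrite, compile and execute source; return its namespace."""
    tree = WrapIntLiterals().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"MyInt": MyInt}
    exec(compile(tree, "<rewritten>", "exec"), namespace)
    return namespace


ns = run_with_myint("a = 1\nb = a + 2")
print(type(ns["a"]).__name__, type(ns["b"]).__name__)  # MyInt MyInt
```

Hooking this into the import system would additionally require a custom loader, which is left out here.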

Related

Casting Constructor in Python

I'm using strictly typed Python, and would like to achieve something similar to the copy/move constructor overloading of C++. That is, I'd like to make my object convertible to another type using an explicit definition.
Here's an example:
class CallerSurface(StringEnum):
    IOS_APPLICATION = "ios_application"
    ANDROID_APPLICATION = "android_application"
    WEB_APPLICATION = "web_application"
which I can use with some function such as:
def getResponse(data: Data, caller_name: CallerSurface) -> str:
    ...
I'd like to add some part of the definition of class CallerSurface to make it possible for a function that takes a param of type CallerSurface, to also take a param of type str, and just "know" how to convert the str to CallerSurface without the programmer needing to explicitly figure out a conversion.
So I want to use in the following way:
caller_name: str = HTTPUtils.extractCallerFromUserAgent(request)
response = getResponse(other_data, caller_name)
caller_name is a str, but getResponse takes a CallerSurface. I'd like the conversion to be implicitly defined in the CallerSurface class.
In C++ you could achieve this by defining a copy and a move constructor that takes in string. Is there something in Python for this?

There is no way to automate the conversion (there's no such thing as implicit type conversion of the sort you're looking for), so your options are:
Expand the allowed argument types (caller_name: CallerSurface becomes caller_name: Union[CallerSurface, str]), manually convert the related types to the intended type, or
Use @functools.singledispatch to make multiple versions of the function, one that accepts each type, where all but one implementation just converts to the intended type and calls itself
In short, this isn't C++, and implicit type conversions aren't a thing in the general case.
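A minimal sketch of the singledispatch option follows. It makes a few assumptions that are not in the question: CallerSurface is written as a plain enum.Enum rather than the asker's StringEnum, the function is renamed get_response, and the caller is moved to the first parameter because singledispatch dispatches on the type of the first argument only.

```python
from enum import Enum
from functools import singledispatch


class CallerSurface(Enum):
    IOS_APPLICATION = "ios_application"
    ANDROID_APPLICATION = "android_application"
    WEB_APPLICATION = "web_application"


@singledispatch
def get_response(caller_name, data):
    # Fallback for types with no registered implementation.
    raise TypeError(f"unsupported caller type: {type(caller_name).__name__}")


@get_response.register(CallerSurface)
def _(caller_name, data):
    # The one "real" implementation.
    return f"{caller_name.value}: {data}"


@get_response.register(str)
def _(caller_name, data):
    # Convert, then call the main implementation.
    return get_response(CallerSurface(caller_name), data)


print(get_response("web_application", "payload"))           # web_application: payload
print(get_response(CallerSurface.IOS_APPLICATION, "payload"))  # ios_application: payload
```

An unknown string such as "desktop" raises ValueError from the Enum lookup, so the conversion still fails loudly rather than silently.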

Python doesn't do type conversion implicitly. It does provide things like the __str__ magic method, but that conversion still requires an explicit call to the str() function.
What you want to do, I think, is leave the typing as CallerSurface, and use a static type checker to force the caller to do (e.g.):
caller_name = HTTPUtils.extractCallerFromUserAgent(request)
response = getResponse(other_data, CallerSurface(caller_name))
Using a type checker (e.g. mypy) is key, since that makes it impossible to forget the CallerSurface call (or whatever other kind of conversion needs to happen).

Returning a function python-c-api

I am creating Python bindings for a C library.
In C the code to use the functions would look like this:
Ihandle *foo;
foo = MethFunc();
SetAttribute(foo, 's');
I am trying to expose this to Python, with MethFunc() and SetAttribute() functions that can be used in my Python code:
import mymodule
foo = mymodule.MethFunc()
mymodule.SetAttribute(foo)
So far my C code to return the function looks like this:
static PyObject * _MethFunc(PyObject *self, PyObject *args) {
    return Py_BuildValue("O", MethFunc());
}
But that crashes without reporting any error.
I have also tried return MethFunc(); but that failed.
How can I return the function foo (or if what I am trying to achieve is completely wrong, how should I go about passing MethFunc() to SetAttribute())?

The problem here is that MethFunc() returns an IHandle *, but you're telling Python to treat it as a PyObject *. Presumably those are completely unrelated types.
A PyObject * (or any struct you or Python defines that starts with an appropriate HEAD macro) begins with a refcount and a pointer to a type, and the first thing Python is going to do with any object you hand it is deal with those fields. So, if you give it an object that instead starts with, say, two ints, Python is going to end up trying to access a type at 0x00020001 or similar, which is almost certain to segfault.
If you need to pass around a pointer to some C object, you have to wrap it up in a Python object. There are three ways to do this, from hackiest to most solid.
First, you can just cast the IHandle * to a size_t, then PyLong_FromSize_t it.
This is dead simple to implement. But it means these objects are going to look exactly like numbers from the Python side, because that's all they are.
Obviously you can't attach a method to this number; instead, your API has to be a free function that takes a number, then casts that number back to an IHandle* and calls a method.
It's more like, e.g., C's stdio, where you have to keep passing stdin or f as an argument to fread, instead of Python's io, where you call methods on sys.stdin or f.
But even worse, because there's no type checking, static or dynamic, to protect you from some Python code accidentally passing you the number 42. Which you'll then cast to an IHandle * and try to dereference, leading to a segfault…
And if you were hoping Python's garbage collector would help you know when the object is still referenced, you're out of luck. You need to make your users manually keep track of the number and call some CloseHandle function when they're done with it.
Really, this isn't that much better than accessing your code from ctypes, so hopefully that inspires you to keep reading.
A better solution is to cast the IHandle * to a void *, then PyCapsule_New it.
If you haven't read about capsules, you need to at least skim the main chapter. But the basic idea is that it wraps up a void* as a Python object.
So, it's almost as simple as passing around numbers, but solves most of the problems. Capsules are opaque values which your Python users can't accidentally do arithmetic on; they can't send you 42 in place of a capsule; you can attach a function that gets called when the last reference to a capsule goes away; you can even give it a nice name to show up in the repr.
But you still can't attach any behavior to capsules.
So, your API will still have to be a MethSetAttribute(mymodule, foo) instead of mymeth.SetAttribute(foo) if mymodule is a capsule, just as if it's an int. (Except now it's type-safe.)
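To get a feel for how capsules behave from the Python side, the two C API calls can be tried through ctypes.pythonapi. This is only an exploration aid for CPython, not how the extension itself would be written (there you would call PyCapsule_New directly in C). The buffer standing in for the handle and the capsule name "mymodule.IHandle" are made up for the example.

```python
import ctypes

# Declare the C API functions used below (CPython only).
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Stand-in for the library's handle: some bytes living at a C address.
fake_handle = ctypes.create_string_buffer(b"pretend this is an IHandle")
# Hypothetical name; the capsule stores the pointer, so keep this object alive.
CAPSULE_NAME = b"mymodule.IHandle"

# Third argument is the destructor; None passes NULL (no destructor).
capsule = PyCapsule_New(ctypes.addressof(fake_handle), CAPSULE_NAME, None)
print(type(capsule).__name__)  # PyCapsule

# The pointer can only be recovered by code that knows the capsule's name.
recovered = PyCapsule_GetPointer(capsule, CAPSULE_NAME)
print(recovered == ctypes.addressof(fake_handle))  # True

# Unlike an integer handle, a capsule does not support arithmetic.
try:
    capsule + 1
except TypeError:
    print("capsules are opaque")
```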
Finally, you can build a new Python extension type for a struct that contains an IHandle *.
This is a lot more work. And if you haven't read the tutorial on Defining Extension Types, you need to go thoroughly read through that whole chapter.
But it means that you have an actual Python type, with everything that goes with it.
You can give it a SetAttribute method, and Python code can just call that method. You can give it whatever __str__ and __repr__ you want. You can give it a __doc__. Python code can do isinstance(mymodule, MyMeth). And so on.
If you're willing to use C++, or D, or Rust instead of C, there are some great libraries (PyCxx, boost::python, Pyd, rust-python, etc.) that can do most of the boilerplate for you. You just declare that you want a Python class and how you want its attributes and methods bound to your C attributes and methods and you get something you can use like a C++ class, except that it's actually a PyObject * under the covers. (And it'll even take care of all the refcounting cruft for you via RAII, which will save you endless weekends debugging segfaults and memory leaks…)
Or you can use Cython, which lets you write C extension modules in a language that's basically Python, but extended to interface with C code. So your wrapper class is just a class, but with a special private cdef attribute that holds the IHandle *, and your SetAttribute(self, s) can just call the C SetAttribute function with that private attribute.
Or, as suggested by user, you can also use SWIG to generate the C bindings for you. For simple cases, it's pretty trivial—just feed it your C API, and it gives you back the code to build your Python .so. For less simple cases, I personally find it a lot more painful than something like PyCxx, but it definitely has a lower learning curve if you don't already know C++.

Is it possible to create a regex-constrained type hint?

I have a helper function that converts a %Y-%m-%d %H:%M:%S-formatted string to a datetime.datetime:
def ymdt_to_datetime(ymdt: str) -> datetime.datetime:
    return datetime.datetime.strptime(ymdt, '%Y-%m-%d %H:%M:%S')
I can validate the ymdt format in the function itself, but it'd be more useful to have a custom object to use as a type hint for the argument, something like
from typing import NewType, Pattern
ymdt_pattern = '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]'
YmdString = NewType('YmdString', Pattern[ymdt_pattern])
def ymdt_to_datetime(ymdt: YmdString)...
Am I going down the wrong rabbit hole? Should this be an issue in mypy or someplace? Or can this be accomplished with the current type hint implementation (3.6.1)?

There currently is no way for types to statically verify that your string matches a precise format, unfortunately. This is partially because checking at compile time the exact values a given variable can hold is exceedingly difficult to implement (and in fact, is NP-hard in some cases), and partially because the problem becomes impossible in the face of things like user input. As a result, it's unlikely that this feature will be added to either mypy or the Python typing ecosystem in the near future, if at all.
One potential workaround would be to leverage NewType, and carefully control when exactly you construct a string of that format. That is, you could do:
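from datetime import datetime
from typing import NewType
YmdString = NewType('YmdString', str)
def datetime_to_ymd(d: datetime) -> YmdString:
    # Do conversion here; strftime always produces the expected format
    return YmdString(d.strftime('%Y-%m-%d %H:%M:%S'))
def verify_is_ymd(s: str) -> YmdString:
    # Runtime validation checks here; strptime raises ValueError on a bad string
    datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
    return YmdString(s)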
If you use only functions like these to introduce values of type YmdString and do testing to confirm that your 'constructor functions' are working perfectly, you can more or less safely distinguish between strings and YmdString at compile time. You'd then want to design your program to minimize how frequently you call these functions to avoid incurring unnecessary overhead, but hopefully, that won't be too onerous to do.
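Put together with the helper from the question, the pattern might look like the sketch below. The regular expression is the question's own (admittedly loose) pattern written with repetition counts, and _YMDT_RE is a name introduced for the example.

```python
import re
from datetime import datetime
from typing import NewType

YmdString = NewType("YmdString", str)

# Same shape as the pattern in the question; it does not reject month 13 etc.
_YMDT_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")


def verify_is_ymd(s: str) -> YmdString:
    """The only place where a plain str is promoted to YmdString."""
    if _YMDT_RE.fullmatch(s) is None:
        raise ValueError(f"not a YYYY-MM-DD HH:MM:SS string: {s!r}")
    return YmdString(s)


def ymdt_to_datetime(ymdt: YmdString) -> datetime:
    return datetime.strptime(ymdt, "%Y-%m-%d %H:%M:%S")


stamp = verify_is_ymd("2022-12-23 09:09:23")
print(ymdt_to_datetime(stamp))  # 2022-12-23 09:09:23

# A static checker such as mypy should reject a plain str here,
# although the call would still run at runtime:
# ymdt_to_datetime("2022-12-23 09:09:23")
```

At runtime YmdString(s) simply returns s, so the safety comes entirely from the checker plus the discipline of constructing values only in verify_is_ymd.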

Type hints do nothing at runtime in Python; they act as an indication of the type for static checkers. They are not meant to perform any actions, merely to annotate a type.
You can't do any validation with them. All you can do, with type hints and a checker, is make sure the argument passed in is actually of type str.

Okay, here we are five years later and the answer is now yes, at least if you're willing to take a third-party library on board and decorate the functions you want to be checked at runtime:
$ pip install beartype
import re
from typing import Annotated # python 3.9+
from beartype import beartype
from beartype.vale import Is
YtdString = Annotated[str, Is[lambda string: re.match('[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]', string) is not None]]
@beartype
def just_print_it(ytd_string: YtdString) -> None:
    print(ytd_string)
> just_print_it("hey")
BeartypeCallHintParamViolation: @beartyped just_print_it() parameter ytd_string='hey' violates type hint typing.Annotated[str, Is[<lambda>]], as 'hey' violates validator Is[<lambda>]:
False == Is[<lambda>].
> just_print_it("2022-12-23 09:09:23")
2022-12-23 09:09:23
> just_print_it("2022-12-23 09:09:2")
BeartypeCallHintParamViolation: @beartyped just_print_it() parameter ytd_string='2022-12-23 09:09:2' violates type hint typing.Annotated[str, Is[<lambda>]], as '2022-12-23 09:09:2' violates validator Is[<lambda>]:
False == Is[<lambda>].
Please note that I'm using the very imperfect regex pattern I originally included in the question, not production-ready.
Then, a hopeful note: The maintainer of beartype is hard at work on an automagical import hook which will eliminate the need for decorating functions in order to achieve the above.

Why to use types in python 3.5 +

I'm trying to understand why I should use type annotations in Python. For example, I can write a function like:
def some_function(a: int, b: int) -> int:
    return a + b
When I use it with ints, all goes well:
some_function(1, 2) # returns 3, type int
But when I run, for example,
some_function(1, 2.0) # returns 3.0, type float
I get a result without any warning that the types are wrong. So what is the reason to use type annotations?

Type hints are there for other tools to check your code; they are not enforced at runtime. The goal is to enable static analysis tools to detect invalid argument use.
Use an IDE like PyCharm, or the commandline code checker mypy to be told that 2.0 is not a valid argument type.
From the Type Hinting PEP (484):
This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and (perhaps, in some contexts) code generation utilizing type information.
Emphasis mine. Runtime type checking is left to third-party tools. Note that such runtime checks would come with a performance cost: your code will likely run slower if types are checked on every call.
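To make the difference concrete, here is a rough sketch of what a runtime check could look like. The decorator enforce_types is hypothetical, not part of the standard library or of any particular third-party tool, and it only understands annotations that are plain classes such as int or str.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check plain-class annotations on every call."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Generic hints such as List[int] are skipped in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


def some_function(a: int, b: int) -> int:
    return a + b


checked_function = enforce_types(some_function)

print(some_function(1, 2.0))  # 3.0: the annotation alone is not enforced
try:
    checked_function(1, 2.0)
except TypeError as error:
    print(error)              # b should be int, got float
```

The extra signature binding and isinstance calls on every invocation are exactly the kind of overhead mentioned above.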

As one can read in the PEP 484 that introduces type hints:
(...)
This PEP aims to provide a standard syntax for type annotations,
opening up Python code to easier static analysis and refactoring,
potential runtime type checking, and (perhaps, in some contexts) code
generation utilizing type information.
Of these goals, static analysis is the most important. This includes
support for off-line type checkers such as mypy, as well as providing
a standard notation that can be used by IDEs for code completion and
refactoring.
IDEs (static analysis)
So the main use is in static analysis: your IDE can detect that something is wrong when you call a function, and can provide a list of functions you can call on the result of the function.
For instance if you write:
some_function(1,2).
your IDE can provide a list with real as a possible option so you can easily write:
some_function(1,2).real
and if you write:
some_function('foo',2).bar
It will hint that 'foo' is not an acceptable parameter nor is .bar a good call on that object.
Dynamic inspection
You can also use it for dynamic inspection with inspect.getfullargspec, like:
>>> import inspect
>>> inspect.getfullargspec(some_function).annotations
{'return': <class 'int'>, 'a': <class 'int'>, 'b': <class 'int'>}
Now we know that some_function returns an int and can be fed two ints. This can be used for randomized tests (which are popular in Haskell): you simply feed some_function random integers and check that it always returns an int (and does not raise an exception, for instance).
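A minimal sketch of such a randomized test, driven only by the annotations, could look like this. It handles just the case where every parameter is annotated as int; the value range and the number of rounds are arbitrary choices.

```python
import inspect
import random


def some_function(a: int, b: int) -> int:
    return a + b


spec = inspect.getfullargspec(some_function)
return_type = spec.annotations["return"]

# This sketch only knows how to generate random ints.
assert all(spec.annotations[name] is int for name in spec.args)

for _ in range(1000):
    args = [random.randint(-10**6, 10**6) for _name in spec.args]
    result = some_function(*args)
    # Fails loudly, with the offending inputs, if the annotation is violated.
    assert isinstance(result, return_type), (args, result)

print("all random calls returned", return_type.__name__)
```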

Accessing Object Memory Address

When you call the object.__repr__() method in Python you get something like this back:
<__main__.Test object at 0x2aba1c0cf890>
Is there any way to get a hold of the memory address if you overload __repr__(), other than calling super(Class, obj).__repr__() and regexing it out?

The Python manual has this to say about id():
Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. (Implementation note: this is the address of the object.)
So in CPython, this will be the address of the object. No such guarantee for any other Python interpreter, though.
Note that if you're writing a C extension, you have full access to the internals of the Python interpreter, including access to the addresses of objects directly.

You could reimplement the default repr this way:
def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )
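Placed in a class, that method can be checked against id() directly. The class name Test is borrowed from the question, and the parsing at the end is only there to show that the printed address and id() agree.

```python
class Test:
    def __repr__(self):
        return '<%s.%s object at %s>' % (
            self.__class__.__module__,
            self.__class__.__name__,
            hex(id(self)),
        )


t = Test()
text = repr(t)
print(text)  # e.g. <__main__.Test object at 0x...>; the address varies per run

# Parse the address back out of the repr and compare it with id().
address = int(text.rsplit(' ', 1)[1].rstrip('>'), 16)
print(address == id(t))  # True
```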

Just use
id(object)

There are a few issues here that aren't covered by any of the other answers.
First, id only returns:
the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
In CPython, this happens to be the pointer to the PyObject that represents the object in the interpreter, which is the same thing that object.__repr__ displays. But this is just an implementation detail of CPython, not something that's true of Python in general. Jython doesn't deal in pointers, it deals in Java references (which the JVM of course probably represents as pointers, but you can't see those—and wouldn't want to, because the GC is allowed to move them around). PyPy lets different types have different kinds of id, but the most general is just an index into a table of objects you've called id on, which is obviously not going to be a pointer. I'm not sure about IronPython, but I'd suspect it's more like Jython than like CPython in this regard. So, in most Python implementations, there's no way to get whatever showed up in that repr, and no use if you did.
But what if you only care about CPython? That's a pretty common case, after all.
Well, first, you may notice that id is an integer;* if you want that 0x2aba1c0cf890 string instead of the number 46978822895760, you're going to have to format it yourself. Under the covers, I believe object.__repr__ is ultimately using printf's %p format, which you don't have from Python… but you can always do this:
format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#018x')
* In 3.x, it's an int. In 2.x, it's an int if that's big enough to hold a pointer (which it may not be, because of signed number issues on some platforms) and a long otherwise.
Is there anything you can do with these pointers besides print them out? Sure (again, assuming you only care about CPython).
All of the C API functions take a pointer to a PyObject or a related type. For those related types, you can just call PyFoo_Check to make sure it really is a Foo object, then cast with (PyFoo *)p. So, if you're writing a C extension, the id is exactly what you need.
What if you're writing pure Python code? You can call the exact same functions with pythonapi from ctypes.
Finally, a few of the other answers have brought up ctypes.addressof. That isn't relevant here. This only works for ctypes objects like c_int32 (and maybe a few memory-buffer-like objects, like those provided by numpy). And, even there, it isn't giving you the address of the c_int32 value, it's giving you the address of the C-level int32 that the c_int32 wraps up.
That being said, more often than not, if you really think you need the address of something, you didn't want a native Python object in the first place, you wanted a ctypes object.
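As a small illustration of the pythonapi route mentioned above, the C API function PyList_Size can be called on the address returned by id(). This assumes CPython, where id() is the address of the object; on other implementations it is likely to fail or crash.

```python
import ctypes

# Declare the C signature: Py_ssize_t PyList_Size(PyObject *list)
PyList_Size = ctypes.pythonapi.PyList_Size
PyList_Size.restype = ctypes.c_ssize_t
PyList_Size.argtypes = [ctypes.c_void_p]

spam = ["a", "b", "c"]

# In CPython, id(spam) is the PyObject * that the C API expects.
size = PyList_Size(id(spam))
print(size)  # 3 on CPython
```

Passing an address that does not belong to a live list object is undefined behaviour at the C level, so this is something to experiment with rather than rely on.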

Just in response to Torsten, I wasn't able to call addressof() on a regular python object. Furthermore, id(a) != addressof(a). This is in CPython, don't know about anything else.
>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392

You can get something suitable for that purpose with:
id(self)

With ctypes, you can achieve the same thing with
>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L
Documentation:
addressof(C instance) -> integer
Return the address of the C instance internal buffer
Note that in CPython, currently id(a) == ctypes.addressof(a), but ctypes.addressof should return the real address for each Python implementation, if
ctypes is supported
memory pointers are a valid notion.

I know this is an old question but if you're still programming, in python 3 these days... I have actually found that if it is a string, then there is a really easy way to do this:
>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296
string conversion does not affect location in memory either:
>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'

You can get the memory address/location of any object by using the 'partition' method of the built-in 'str' type.
Here is an example of using it to get the memory address of an object:
Python 3.8.3 (default, May 27 2020, 02:08:17)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> object.__repr__(1)
'<int object at 0x7ca70923f0>'
>>> hex(int(object.__repr__(1).partition('object at ')[2].strip('>'), 16))
'0x7ca70923f0'
>>>
Here, object.__repr__ is called with an object such as 1 to get the default repr string. partition('object at ') then splits that string into a tuple of three parts: the text before the separator, the separator itself, and the text after it. The memory address comes after 'object at ', so it is the third item of the tuple, at index 2.
That item still ends with a right angle bracket, so strip('>') removes it. The resulting string is then converted to an integer with base 16 and turned back into a hexadecimal string with hex().

While it's true that id(object) gets the object's address in the default CPython implementation, this is generally useless... you can't do anything with the address from pure Python code.
The only time you would actually be able to use the address is from a C extension library... in which case it is trivial to get the object's address since Python objects are always passed around as C pointers.

If __repr__ is overloaded, you may consider __str__ to see the memory address of the variable.
Here are the details of __repr__ versus __str__ by Moshe Zadka on Stack Overflow.

There is a way to recover the object from the value returned by id(); here is the TL;DR.
ctypes.cast(memory_address,ctypes.py_object).value
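A short round trip shows the cast in action. This relies on CPython, where id() is the object's address, and the object must still be alive when the address is cast back; otherwise the result is undefined and may crash the interpreter.

```python
import ctypes

spam = ["eggs"]
memory_address = id(spam)  # CPython: the address of the list object

# Reinterpret the address as a PyObject * and fetch the object behind it.
recovered = ctypes.cast(memory_address, ctypes.py_object).value

print(recovered is spam)  # True
```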
