First of all I want to mention that I know this is a horrible idea and it shouldn't be done. My intention is mainly curiosity and learning the innards of Python, and how to 'hack' them.
I was wondering whether it is at all possible to change what happens when we, for instance, use [] to create a list. Is there a way to modify how the parser behaves in order to, for instance, cause ["hello world"] to call print("hello world") instead of creating a list with one element?
I've tried to find documentation or posts about this, but without success.
Below is an example of replacing the built-in dict with a custom class:
from __future__ import annotations
from typing import List, Any
import builtins
class Dict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self

    def subset(self, keys: List[Any]) -> Dict:
        return Dict({key: self[key] for key in keys})
builtins.dict = Dict
When this module is imported, it replaces the dict built-in with the Dict class. However, this only works when we call dict() directly. If we use {}, we still get the base dict implementation:
import new_dict
a = dict({'a': 5, 'b': 8})
b = {'a': 5, 'b': 8}
print(type(a))
print(type(b))
Yields:
<class 'py_extensions.new_dict.Dict'>
<class 'dict'>
[] and {} compile to dedicated opcodes that directly build a list or a dict, respectively. list() and dict(), on the other hand, compile to bytecode that looks up the names list and dict among the globals and calls them as functions:
import dis
dis.dis(lambda:[])
dis.dis(lambda:{})
dis.dis(lambda:list())
dis.dis(lambda:dict())
returns (with some additional newlines for clarity):
3 0 BUILD_LIST 0
2 RETURN_VALUE
5 0 BUILD_MAP 0
2 RETURN_VALUE
7 0 LOAD_GLOBAL 0 (list)
2 CALL_FUNCTION 0
4 RETURN_VALUE
9 0 LOAD_GLOBAL 0 (dict)
2 CALL_FUNCTION 0
4 RETURN_VALUE
Thus you can overwrite what dict() returns simply by overwriting the global dict, but you can't overwrite what {} returns.
These opcodes are documented in the dis module documentation. If the BUILD_MAP opcode runs, you get a dict; there is no way around it. For example, the implementation of BUILD_MAP in CPython's evaluation loop calls the function _PyDict_FromItems. It doesn't consult any user-defined classes; it directly builds the C struct that represents a Python dict.
It is possible, at least in some cases, to manipulate Python bytecode at runtime. If you really wanted to make {} return a custom class, you could write code that searches for the BUILD_MAP opcode and replaces it with the appropriate opcodes. Those opcode sequences aren't the same size, though, so there are probably quite a few additional fixups (jump offsets, for instance) you'd have to make.
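To make that concrete, here is a rough sketch of such a rewrite. It is illustrative only: it assumes CPython 3.8-era bytecode (two-byte instructions and a CALL_FUNCTION opcode, which 3.11 removed) and a tiny, jump-free function, so no offset fixups or EXTENDED_ARG handling are attempted:
import dis
import types

def patch_empty_map(func):
    # rewrite `BUILD_MAP 0` into `LOAD_GLOBAL dict; CALL_FUNCTION 0`
    # (assumes no jumps and no arguments > 255 in the original code)
    code = func.__code__
    names = code.co_names + ('dict',)
    dict_idx = len(names) - 1
    new_bytecode = bytearray()
    for instr in dis.get_instructions(func):
        if instr.opname == 'BUILD_MAP' and instr.arg == 0:
            new_bytecode += bytes([dis.opmap['LOAD_GLOBAL'], dict_idx,
                                   dis.opmap['CALL_FUNCTION'], 0])
        else:
            new_bytecode += bytes([instr.opcode, instr.arg or 0])
    new_code = code.replace(co_code=bytes(new_bytecode), co_names=names)
    return types.FunctionType(new_code, func.__globals__)

f = patch_empty_map(lambda: {})
print(type(f()))  # with builtins.dict replaced as above, this prints the custom Dict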
The ast module is an interface to Python's abstract syntax tree, which is built after parsing Python source code. It's possible to replace a dict literal ({}) with a call to dict by rewriting this tree:
import ast
import new_dict
a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}
print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))
src = """
a = dict({"a": 5, "b": 8})
b = {"a": 5, "b": 8}
print(type(a))
print(type(b))
print(type({"a": 5, "b": 8}))
"""
class RewriteDict(ast.NodeTransformer):
    def visit_Dict(self, node):
        # don't replace `dict({"a": 1})`
        if (isinstance(node.parent, ast.Call)
                and isinstance(node.parent.func, ast.Name)
                and node.parent.func.id == "dict"):
            return node
        # replace `{"a": 1}` with `dict({"a": 1})`
        new_node = ast.Call(
            func=ast.Name(id="dict", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        return ast.fix_missing_locations(new_node)

tree = ast.parse(src)
# set parent on every node
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        child.parent = node

RewriteDict().visit(tree)
exec(compile(tree, "ast", "exec"))
output:
<class 'new_dict.Dict'>
<class 'dict'>
<class 'dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
<class 'new_dict.Dict'>
Related
I have a nested set of dataclasses that I want to convert to a dictionary; however, some classes should remain as class instances and not be converted to dicts (the full structure is deeper and more complex). In this example:
from dataclasses import dataclass, field, asdict

@dataclass
class C:
    x: int = 1

@dataclass
class B:
    c: C = C()

@dataclass
class A:
    b: B = B()

asdict(A())
# returns
# {'b': {'c': {'x': 1}}}

# I want
custom_asdict(A())
# should return:
# {'b': {'c': C(x=1)}}
Marking the class C as "do not expand" could be done either via a parameter to custom_asdict or via a parameter to the dataclass decorator.
Although dataclasses.asdict accepts a dict_factory parameter, its use is limited: it is only called with the name/value pairs of each field, recursively but depth-first, meaning all nested dataclass values have already been serialized to dicts by the time the custom factory is called.
So it is very hard to write a dict_factory that provides the needed behavior. On the other hand, it is possible to simply wrap asdict (or the inner function it calls) so that it does not serialize the classes you want to keep.
That is far more straightforward and, if needed, can be designed so that it can be turned on or off (for example, using unittest.mock.patch), as sketched below.
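For example, here is a sketch of such an on/off wrapper; it relies on the same private _asdict_inner helper that the patch below replaces, so it is version-dependent and illustrative only:
import dataclasses
from unittest import mock

_original = dataclasses._asdict_inner  # private helper; may change between versions

def _shallow_inner(obj, factory):
    if dataclasses._is_dataclass_instance(obj) and getattr(obj, "_dont_expand", False):
        return obj
    return _original(obj, factory)

def asdict_shallow(obj):
    # honor _dont_expand only for the duration of this call
    with mock.patch.object(dataclasses, "_asdict_inner", _shallow_inner):
        return dataclasses.asdict(obj)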
Otherwise, just settle on an attribute name to mark the classes you don't want serialized as dicts, and call the function below prior to calling asdict(). (This code checks for a _dont_expand attribute.)
def patch():
    import dataclasses
    if getattr(dataclasses, "_patched", False):
        return
    original = dataclasses._asdict_inner

    def new_asdict_inner(obj, factory):
        if dataclasses._is_dataclass_instance(obj) and getattr(obj, "_dont_expand", False):
            return obj
        return original(obj, factory)

    dataclasses._asdict_inner = new_asdict_inner
    dataclasses._patched = True
I tested this in a Python shell with classes like yours and it works like a charm:
In [75]: @dataclasses.dataclass
    ...: class C:
    ...:     _dont_expand = True
    ...:     x: int = 1
    ...:

In [76]: @dataclasses.dataclass
    ...: class B:
    ...:     c: C = dataclasses.field(default_factory=C)
    ...:

In [77]: @dataclasses.dataclass
    ...: class A:
    ...:     b: B = dataclasses.field(default_factory=B)
    ...:
In [78]: a = A()
In [79]: a
Out[79]: A(b=B(c=C(x=1)))
In [80]: patch()
In [81]: dataclasses.asdict(a)
Out[81]: {'b': {'c': C(x=1)}}
(Note: with this code you can also set the _dont_expand attribute directly on instances you don't want to serialize: it will work just for those instances, while their class keeps the normal behavior.)
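A minimal sketch of that per-instance use (assuming C is defined without the class-level _dont_expand):
a = A()
a.b.c._dont_expand = True  # mark just this one C instance
dataclasses.asdict(a)      # -> {'b': {'c': C(x=1)}}

other = A()
dataclasses.asdict(other)  # -> {'b': {'c': {'x': 1}}}; the class is unaffected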
d = {}
d[3] = 0
d[1] = 4
I tried
mask = d > 1 # TypeError: '>' not supported between instances of 'dict' and 'int'
mask = d.values > 1 # TypeError: '>' not supported between instances of 'builtin_function_or_method' and 'int'
Neither is correct. Is it possible to perform the computation without using a dictionary comprehension?
The desired output would be:
{3: False, 1: True}
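For reference, the dictionary comprehension being avoided here is the one-liner:
mask = {k: v > 1 for k, v in d.items()}
print(mask)  # {3: False, 1: True}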
I feel like what you want is the ability to actually write d < 5 and magically get a new dictionary (which I don't think is possible with plain dict()). But on the other hand I thought this was a great idea, so I implemented a first version:
"""Here is our strategy for implementing this:
1) Inherit the abstract Mapping which define a
set of rules — interface — we will have to respect
to be considered a legitimate mapping class.
2) We will implement that by delegating all the hard
work to an inner dict().
3) And we will finally add some methods to be able
to use comparison operators.
"""
import collections.abc
import operator

"Here is step 1)"
class MyDict(collections.abc.MutableMapping):
    "step 2)"
    def __init__(self, *args):
        self.content = dict(*args)

    # All kinds of delegation to the inner dict:
    def __iter__(self): return iter(self.content.items())
    def __len__(self): return len(self.content)
    def __getitem__(self, key): return self.content[key]
    def __setitem__(self, key, value): self.content[key] = value
    def __delitem__(self, key): del self.content[key]
    def __str__(self): return str(self.content)

    "And finally, step 3)"
    # Build a "where" function using the given comparison operator
    def _where_using(comparison):
        def where(self, other):
            # wrapping in MyDict({...}) is optional;
            # you could also return a plain dict: {...}
            return MyDict({k: comparison(v, other) for k, v in self})
        return where

    # map all operators to the corresponding "where" method:
    __lt__ = _where_using(operator.lt)
    __le__ = _where_using(operator.le)
    __eq__ = _where_using(operator.eq)
    __gt__ = _where_using(operator.gt)
    __ge__ = _where_using(operator.ge)
We can use this the way you asked for:
>>> d = MyDict({3:0, 1:4})
>>> print(d)
{3: 0, 1: 4}
>>> print(d > 1)
{3: False, 1: True}
Note that this would also work on other types of (comparable) objects:
>>> d = MyDict({3:"abcd", 1:"abce"})
>>> print(d)
{3: 'abcd', 1: 'abce'}
>>> print(d > "abcd")
{3: False, 1: True}
>>> print(d > "abcc")
{3: True, 1: True}
Here's an easy way for you to use something like d<5. You just need:
import pandas as pd
res = pd.Series(d) < 4
res.to_dict() # returns {3: True, 1: False}
Why does the following code work while the code after it breaks?
I'm not sure how to articulate my question in English, so I've attached the smallest code I could come up with that highlights my problem.
(Context: I'm trying to create a terminal environment for python, but for some reason the namespaces seem to be messed up, and the below code seems to be the essence of my problem)
No errors:
d={}
exec('def a():b',d)
exec('b=None',d)
exec('a()',d)
Errors:
d={}
exec('def a():b',d)
d=d.copy()
exec('b=None',d)
d=d.copy()
exec('a()',d)
It is because the function a does not use the globals provided to the later exec calls; it uses the mapping it captured when it was defined in the first exec. While you set 'b' in the new dictionary, you never set b in the globals of that function.
>>> d={}
>>> exec('def a():b',d)
>>> exec('b=None',d)
>>> d['a'].__globals__ is d
True
>>> 'b' in d['a'].__globals__
True
vs
>>> d={}
>>> exec('def a():b',d)
>>> d = d.copy()
>>> exec('b=None',d)
>>> d['a'].__globals__ is d
False
>>> 'b' in d['a'].__globals__
False
If exec didn't work this way, then this too would fail:
mod.py
b = None
def d():
b
main.py
from mod import d
d()
A function will remember the environment where it was first created.
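Concretely, the failing snippet works if you keep mutating the mapping that a captured instead of swapping it for a copy (a minimal sketch):
d = {}
exec('def a():b', d)
d['b'] = None   # same object that d['a'].__globals__ refers to
exec('a()', d)  # no NameError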
It is not possible to change the dictionary that an existing function points to. You can either modify its globals explicitly, or you can make another function object altogether:
from types import FunctionType

def rebind_globals(func, new_globals):
    f = FunctionType(
        code=func.__code__,
        globals=new_globals,
        name=func.__name__,
        argdefs=func.__defaults__,
        closure=func.__closure__,
    )
    f.__kwdefaults__ = func.__kwdefaults__
    return f

def foo(a, b=1, *, c=2):
    print(a, b, c, d)

# add __builtins__ so that `print` is found...
new_globals = {'d': 3, '__builtins__': __builtins__}
new_foo = rebind_globals(foo, new_globals)
new_foo(a=0)
I'm planning to use PyYAML for a configuration file. Some of the items
in that configuration file are Python tuples of tuples. So, I need a
convenient way to represent them. One can represent Python tuples of
tuples as follows using PyYAML
print yaml.load("!!python/tuple [ !!python/tuple [1, 2], !!python/tuple [3, 4]]")
However, this is not convenient notation for a long sequence of
items. I think it should be possible to define a custom tag, like
python/tuple_of_tuples. I.e. something like
yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]")
See my first attempt to define this below, by mimicking how
python/tuple is defined, and trying to do similar subclassing. It
fails, but gives an idea what I am after, I think. I have a second
attempt that works, but is a cheat, since it just calls eval.
If I can't find anything better I'll just use that. However, YAML is
intended as a replacement for ConfigObj, which uses INI files, and is
considerably less powerful than YAML, and I used the same approach
(namely eval) for tuples of tuples. So in that respect it will be no
worse.
A proper solution would be most welcome.
I have a couple of comments on my first solution.
I'd have thought that the constructor
construct_python_tuple_of_tuples would return the completed
structure, but in fact it seems to return an empty structure as
follows
([], [])
I traced the calls, and there seems to be a lot of complicated stuff
happening after construct_python_tuple_of_tuples is called.
The value that is returned is a tuple of lists of integers, so quite
close to the desired result. So, the structure must be completed
later.
The line with
tuple([tuple(t) for t in x])
was my attempt to coerce the list of tuples to a tuple of tuples, but
if I return that from construct_python_tuple_of_tuples, then the
resulting call to yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]") is just
((),())
Not sure what is with the
yaml.org,2002
Why 2002?
First attempt
import yaml
from yaml.constructor import Constructor

def construct_python_tuple_of_tuples(self, node):
    # Complete content of construct_python_tuple is:
    #     return tuple(self.construct_sequence(node))
    print "node", node
    x = tuple(self.construct_sequence(node))
    print "x", x
    foo = tuple([tuple(t) for t in x])
    print "foo", foo
    return x

Constructor.construct_python_tuple_of_tuples = \
    construct_python_tuple_of_tuples

Constructor.add_constructor(
    u'tag:yaml.org,2002:python/tuple_of_tuples',
    Constructor.construct_python_tuple_of_tuples)

y = yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]")
print "y", y, type(y)
print y[0], type(y[0])
print y[0][0], type(y[0][0])
The results are
node SequenceNode(tag=u'tag:yaml.org,2002:python/tuple_of_tuples',
value=[SequenceNode(tag=u'tag:yaml.org,2002:seq',
value=[ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'1'),
ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'2')]),
SequenceNode(tag=u'tag:yaml.org,2002:seq',
value=[ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'3'),
ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'4')])])
x ([], [])
foo ((), ())
y ([1, 2], [3, 4]) <type 'tuple'>
y[0] [1, 2] <type 'list'>
y[0][0] 1 <type 'int'>
Second attempt
import yaml
from yaml import YAMLObject, Loader, Dumper

class TupleOfTuples(YAMLObject):
    yaml_loader = Loader
    yaml_dumper = Dumper
    yaml_tag = u'!TupleOfTuples'
    #yaml_flow_style = ...

    @classmethod
    def from_yaml(cls, loader, node):
        import ast
        print "node", node
        print "node.value", node.value, type(node.value)
        return ast.literal_eval(node.value)

    @classmethod
    def to_yaml(cls, dumper, data):
        return node  # note: `node` is undefined here; dumping was never exercised

t = yaml.load("!TupleOfTuples ((1, 2), (3, 4))")
print "t", t, type(t)
The results are:
node ScalarNode(tag=u'!TupleOfTuples', value=u'((1, 2), (3, 4))')
node.value ((1, 2), (3, 4)) <type 'unicode'>
t ((1, 2), (3, 4)) <type 'tuple'>
To start with question 2 first: 2002 is the year this kind of tag was introduced, in the Sep 1, 2002 version of the YAML 1.0 draft.
Question 1 is more complicated. If you do:
from __future__ import print_function
import yaml
lol = [[1,2], [3,4]] # list of lists
print(yaml.dump(lol))
you get (A):
[[1, 2], [3, 4]]
But actually this is short for (B):
!!seq [
!!seq [
!!int "1",
!!int "2",
],
!!seq [
!!int "3",
!!int "4",
],
]
which is short for (C):
!<tag:yaml.org,2002:seq> [
!<tag:yaml.org,2002:seq> [
!<tag:yaml.org,2002:int> "1",
!<tag:yaml.org,2002:int> "2",
],
!<tag:yaml.org,2002:seq> [
!<tag:yaml.org,2002:int> "3",
!<tag:yaml.org,2002:int> "4",
],
]
A, B and C all load to the original list of lists, because seq(uence) is a built-in type.
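You can check that equivalence directly (a small illustration using safe_load):
import yaml

short_form = yaml.safe_load("[[1, 2], [3, 4]]")
tagged_form = yaml.safe_load(
    '!!seq [ !!seq [ !!int "1", !!int "2" ], !!seq [ !!int "3", !!int "4" ] ]')
print(short_form == tagged_form == [[1, 2], [3, 4]])  # True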
I don't think that extending the syntax of YAML (with e.g. () indicating a tuple) would be a good idea. To minimize tags, you can reduce your example to:
yaml_in = "!tuple [ !tuple [1, 2], !tuple [3, 4]]"
and add a constructor:
yaml.add_constructor("!tuple", construct_tuple)
but this pushes the problem to creating the construct_tuple function. The one for a sequence (in constructor.py) is:
def construct_yaml_seq(self, node):
    data = []
    yield data
    data.extend(self.construct_sequence(node))
But you cannot just replace the [] in there with (), as extending the tuple after creation would not work (the reason for this two-step creation, with a yield, is e.g. to allow circular references in complex types like sequences and mappings).
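To see why the two-step creation matters, consider a self-referential document (a small illustration):
import yaml

data = yaml.safe_load("&a [1, *a]")
print(data[1] is data)  # True: the list contains itself, which is only
                        # possible because the empty list is yielded first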
You should define a Tuple() class that behaves like a list until it is "locked" (which you would do at the end of construction), and from then on behaves like a tuple (i.e. no more modification). The following does so without subclassing yaml.YAMLObject, so you have to explicitly provide and register the constructor and representer for the class:
class Tuple(list):
    def _lock(self):
        if hasattr(self, '_is_locked'):
            return
        self._is_locked = True
        self.append = self._append
        self.extend = self._extend

    def _append(self, item):
        raise AttributeError("'Tuple' object has no attribute 'append'")

    def _extend(self, items):
        raise AttributeError("'Tuple' object has no attribute 'extend'")

    def __str__(self):
        return '(' + ', '.join((str(e) for e in self)) + ')'

    # new style class cannot assign something to special method
    def __setitem__(self, key, value):
        if getattr(self, '_is_locked', False):
            raise TypeError("'Tuple' object does not support item assignment")
        list.__setitem__(self, key, value)

    def __delitem__(self, key):
        if getattr(self, '_is_locked', False):
            raise TypeError("'Tuple' object does not support item deletion")
        list.__delitem__(self, key)

    @staticmethod
    def _construct_tuple(loader, data):
        result = Tuple()
        yield result
        result.extend(loader.construct_sequence(data))
        result._lock()

    @staticmethod
    def _represent_tuple(dumper, node):
        return dumper.represent_sequence("!tuple", node)

# let yaml know how to handle this
yaml.add_constructor("!tuple", Tuple._construct_tuple)
yaml.add_representer(Tuple, Tuple._represent_tuple)
With that in place you can do:
yaml_in = "!tuple [ !tuple [1, 2], !tuple [3, 4]]"
#yaml_in = "!tuple [1, 2]"
data = yaml.load(yaml_in)
print(data)
print(data[1][0])
print(type(data))
to get:
((1, 2), (3, 4))
3
<class '__main__.Tuple'>
This is not a real tuple, but unlike a list it doesn't allow mutating actions. The following operations all raise the appropriate error:
# test appending to the tuple
try:
    data.append(Tuple([5, 6]))
except AttributeError:
    pass
else:
    raise NotImplementedError

# test extending the tuple
try:
    data.extend([5, 6])
except AttributeError:
    pass
else:
    raise NotImplementedError

# test replacement of an item
try:
    data[0] = Tuple([5, 6])
except TypeError:
    pass
else:
    raise NotImplementedError

# test deletion of an item
try:
    del data[0]
except TypeError:
    pass
else:
    raise NotImplementedError
And finally you can do:
print(yaml.dump(data, default_flow_style=True))
for the following output:
!tuple [!tuple [1, 2], !tuple [3, 4]]
If you really want !tuple [[1, 2], [3, 4]] to create a Tuple of Tuples, you can do so by keeping context state in the BaseLoader class of yaml and overriding the method that constructs Python objects from sequences so that it yields Tuples or lists depending on context. That would probably have to be a stack of context states, to allow for nested as well as non-nested use of !tuple, and some explicit overriding to get lists within tuples when using !!seq as the tag.
I might have not checked Tuple() for completeness, and only implemented the restrictions that tuple has compared to list that immediately came to mind.
I tested this with my enhanced version of PyYAML: ruamel.yaml, but this should work the same in PyYAML itself.
I have data currently structured as following in Matlab
item{i}.attribute1(2,j)
where item is a cell array indexed by i = 1..n, each element containing a structure with multiple attributes, each attribute a 2-by-m matrix indexed by j = 1..m. The number of attributes is not fixed.
I have to translate this data structure to python, but I am new to numpy and python lists. What is the best way of structuring this data in python with numpy/scipy?
Thanks.
I've often seen the following conversion approaches:
matlab array -> python numpy array
matlab cell array -> python list
matlab structure -> python dict
So in your case that would correspond to a python list containing dicts, which themselves contain numpy arrays as entries
item[i]['attribute1'][2,j]
Note
Don't forget the 0-indexing in python!
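A minimal sketch of that structure (the sizes n and m and the attribute names are made up for illustration):
import numpy as np

n, m = 4, 5
item = [{'attribute1': np.zeros((2, m)),
         'attribute2': np.ones((2, m))} for _ in range(n)]

item[0]['attribute1'][1, 2] = 3.14  # Matlab: item{1}.attribute1(2, 3)
print(item[0]['attribute1'][1, 2])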
[Update]
Additional: Use of classes
Further to the simple conversion given above, you could also define a dummy class, e.g.
class structtype():
    pass
This allows the following type of usage:
>> s1 = structtype()
>> print s1.a
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-40-7734865fddd4> in <module>()
----> 1 print s1.a
AttributeError: structtype instance has no attribute 'a'
>> s1.a=10
>> print s1.a
10
Your example in this case becomes, e.g.
>> item = [ structtype() for i in range(10)]
>> item[9].a = numpy.array([1,2,3])
>> item[9].a[1]
2
A simple version of the answer by @dbouz, using the idea by @jmetz:
class structtype():
    def __init__(self, **kwargs):
        self.Set(**kwargs)
    def Set(self, **kwargs):
        self.__dict__.update(kwargs)
    def SetAttr(self, lab, val):
        self.__dict__[lab] = val
then you can do
myst = structtype(a=1,b=2,c=3)
or
myst = structtype()
myst.Set(a=1,b=2,c=3)
and still do
myst.d = 4 # here, myst.a=1, myst.b=2, myst.c=3, myst.d=4
or even
myst = structtype(a=1,b=2,c=3)
lab = 'a'
myst.SetAttr(lab,10) # a=10,b=2,c=3 ... equivalent to myst.(lab)=10 in MATLAB
and you get exactly what you'd expect in matlab for myst=struct('a',1,'b',2,'c',3).
The equivalent of a cell of structs would be a list of structtype
mystarr = [ structtype(a=1,b=2) for n in range(10) ]
which would give you
mystarr[0].a # == 1
mystarr[0].b # == 2
If you are looking for a good example how to create a structured array in Python like it is done in MATLAB, you might want to have a look at the scipy homepage (basics.rec).
Example
import numpy as np

x = np.zeros(1, dtype=[('Table', np.float64, (2, 2)),
                       ('Number', float),
                       ('String', '|S10')])
# Populate the array
x['Table'] = [1, 2]
x['Number'] = 23.5
x['String'] = 'Stringli'
# See what is written to the array
print(x)
The printed output is then:
[([[1.0, 2.0], [1.0, 2.0]], 23.5, 'Stringli')]
Unfortunately, I did not find out how to define a structured array without knowing its size in advance. You can also define the array directly with its contents:
x = np.array(([[1, 2], [1, 2]], 23.5, 'Stringli'),
             dtype=[('Table', np.float64, (2, 2)),
                    ('Number', float),
                    ('String', '|S10')])
# Same result as above but less code (if you know the contents in advance)
print(x)
For some applications a dict or list of dictionaries will suffice. However, if you really want to emulate a MATLAB struct in Python, you have to take advantage of its OOP and form your own struct-like class.
This is a simple example for instance that allows you to store an arbitrary amount of variables as attributes and can be also initialized as empty (Python 3.x only). i is the indexer that shows how many attributes are stored inside the object:
class Struct:
    def __init__(self, *args, prefix='arg'):  # constructor
        self.prefix = prefix
        if len(args) == 0:
            self.i = 0
        else:
            i = 0
            for arg in args:
                i += 1
                arg_str = prefix + str(i)
                # store arguments as attributes
                setattr(self, arg_str, arg)  # self.arg1 = <value>
            self.i = i

    def add(self, arg):
        self.i += 1
        arg_str = self.prefix + str(self.i)
        setattr(self, arg_str, arg)
You can initialise it empty (i=0), or populate it with initial attributes. You can then add attributes at will. Trying the following:
b = Struct(5, -99.99, [1,5,15,20], 'sample', {'key1':5, 'key2':-100})
b.add(150.0001)
print(b.__dict__)
print(type(b.arg3))
print(b.arg3[0:2])
print(b.arg5['key1'])
c = Struct(prefix='foo')
print(c.i) # empty Struct
c.add(500) # add a value as foo1
print(c.__dict__)
will get you these results for object b:
{'prefix': 'arg', 'arg1': 5, 'arg2': -99.99, 'arg3': [1, 5, 15, 20], 'arg4': 'sample', 'arg5': {'key1': 5, 'key2': -100}, 'i': 6, 'arg6': 150.0001}
<class 'list'>
[1, 5]
5
and for object c:
0
{'prefix': 'foo', 'i': 1, 'foo1': 500}
Note that assigning attributes to objects is general - not only limited to scipy/numpy objects but applicable to all data types and custom objects (arrays, dataframes etc.). Of course that's a toy model - you can further develop it to make it able to be indexed, able to be pretty-printed, able to have elements removed, callable etc., based on your project needs. Just define the class at the beginning and then use it for storage-retrieval. That's the beauty of Python - it doesn't really have exactly what you seek especially if you come from MATLAB, but it can do so much more!