I have a url:
https://enterpriseefiling.fcc.gov/dataentry/api/download/dbfile/Current_LMS_Dump.zip
Let
u1 = 'https://enterpriseefiling.fcc.gov/dataentry/api/download/dbfile/Current_LMS_Dump.zip'
I run the following code in my python interpreter
import requests, io
r = requests.get(u1, stream=True)
io.BytesIO(r.content)
I get the following response
<_io.BytesIO object at 0x000002244592F1A8>
My question is: what does this mean? Where is 0x000002244592F1A8? What does 0x000002244592F1A8 refer to?
When Python needs to display an object whose class does not define its own __repr__ method (requests.Response and the built-in list and dict types, for example, do define one), it falls back to this format:
<[objecttype] object at [pointer]>
where [pointer] is the object's location in memory (in CPython). That's what you see here: when you run io.BytesIO(r.content) in your interpreter, you create an io.BytesIO object, and the interpreter echoes its default representation.
A different method tends to get called when, on the interpreter, you do
>>> print(<object>)
rather than just
>>> <object>
and the io.BytesIO class certainly has methods you can use for more useful output, if you look at its documentation. Try assigning it to a variable instead of printing it:
b = io.BytesIO(r.content)
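As a rough sketch of what you could then do with b: the URL points at a zip file, so the bytes can be handed to the standard zipfile module. The archive below is built in memory as a stand-in for r.content (an assumption, so the example runs without a network connection), and the member name example.txt is made up.

```python
import io
import zipfile

# Stand-in for r.content: build a small zip archive in memory.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("example.txt", "hello")  # hypothetical member name
content = payload.getvalue()

b = io.BytesIO(content)

# A BytesIO behaves like a binary file: you can read and seek.
first_bytes = b.read(2)
b.seek(0)

# zipfile accepts any seekable file-like object, including BytesIO.
with zipfile.ZipFile(b) as archive:
    names = archive.namelist()
    text = archive.read("example.txt").decode()

print(first_bytes, names, text)
```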
What does 0x000002244592F1A8 refer to?
It refers to the identity of the object. The number is an implementation detail (in CPython it happens to be the address of the object in memory, the same number returned by the id builtin), but what you can count on is that the number will be different for every BytesIO object currently extant in the process.
That kind of information is included in the __repr__ of many objects because it can come useful when debugging, allowing one to distinguish different objects that might have identical content.
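A small sketch of that debugging use, assuming CPython (where the number in the default repr is the same value id returns). The helper address_in_repr is a made-up name for illustration.

```python
import io

a = io.BytesIO(b"same content")
b = io.BytesIO(b"same content")

def address_in_repr(obj):
    # Hypothetical helper: pull the hex number out of "<... object at 0x...>".
    text = repr(obj)
    return int(text.rsplit(" at ", 1)[1].rstrip(">"), 16)

# Identical content, but two distinct objects with distinct identities.
same_content = a.getvalue() == b.getvalue()
distinct_ids = id(a) != id(b)

# CPython implementation detail: the repr shows id(obj) in hexadecimal.
repr_matches_id = address_in_repr(a) == id(a)

print(repr(a), hex(id(a)))
```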
My understanding is that 0x000002244592F1A8 here is the formatted id of the object in memory. I say "formatted" because, if you call id() on the same object, the representation will be a little different (it will be formatted as an int or a long instead of a pointer address):
In [1]: import io, sys
In [2]: obj = io.BytesIO()
In [3]: obj
Out[3]: <_io.BytesIO at 0x10da1ca70>
In [4]: id(obj)
Out[4]: 4523674224
To convert the id() to the format you see, you can do something like this (stolen from this post):
In [5]: format(id(obj), '#010x' if sys.maxsize.bit_length() <= 32 else '#18x')
Out[5]: ' 0x10da1ca70'
This ^^ is not particularly useful, but it just shows you how id() lines up with what you are seeing.
The reason you are seeing it is just that it is displayed as part of the default __repr__() for the BytesIO object.
Related
I have a C++ library which performs analysis on audio data, and a C API to it. One of the C API functions takes const int16_t* pointers to the data and returns the results of the analysis.
I'm trying to build a Python interface to this API, and most of it is working, but I'm having trouble getting ctypes pointers to use as arguments for this function. Since the pointers on the C side are to const, it feels to me like it ought to be possible to make this work fine with any contiguous data. However, the following does not work:
import ctypes
import wave
_native_lib = ctypes.cdll.LoadLibrary('libsound.so')
_native_function = _native_lib.process_sound_data
_native_function.argtypes = [ctypes.POINTER(ctypes.c_int16),
ctypes.c_size_t]
_native_function.restype = ctypes.c_int
wav_path = 'hello.wav'
with wave.open(wav_path, mode='rb') as wav_file:
    wav_bytes = wav_file.readframes(wav_file.getnframes())
data_start = ctypes.POINTER(ctypes.c_int16).from_buffer(wav_bytes) # ERROR: data is immutable
_native_function(data_start, len(wav_bytes)//2)
Manually copying wav_bytes to a bytearray allows the pointer to be constructed but causes the native code to segfault, indicating that the address it receives is wrong (it passes unit tests with data read in from C++). Fixing this by getting the address right would technically solve the problem but I feel like there's a better way.
Surely it's possible to just get the address of some data and promise that it's the right format and won't be altered? I'd prefer not to have to deep copy all my Pythonically-stored audio data to a ctypes format, since presumably the bytes are in there somewhere if I can just get a pointer to them!
Ideally, I'd like to be able to do something like this
data_start = cast_to(address_of(data[0]), c_int16_pointer)
_native_function(data_start, len(data))
which would then work with anything that has a [0] and a len. Is there a way to do something like this in ctypes? If not, is there a technical reason why it's impossible, and is there something else I should be using instead?
This should work for you. Use array for a writable buffer and create a ctypes array that references the buffer.
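import array
from ctypes import c_short
data = array.array('h', wav_bytes)         # copies the bytes into a writable buffer
addr, size = data.buffer_info()            # (address, number of items)
arr = (c_short * size).from_address(addr)  # ctypes view of the same memory; keep data alive
_native_function(arr, size)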
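A variation on the same idea, as a sketch: ctypes types also have a from_buffer class method, which builds the view straight from a writable buffer and keeps a reference to it, so the array cannot be freed while the view exists. The sample values below are made up and no native library is involved.

```python
import array
import ctypes

samples = array.array('h', [10, -20, 30])  # 16-bit signed items

# View the array's memory as a ctypes array without copying it.
view = (ctypes.c_int16 * len(samples)).from_buffer(samples)

# Writing through the view changes the original array: same memory.
view[0] = 99
shared = samples[0] == 99
values = list(view)

print(values)
```

An object like view can be passed where the function's argtypes expects POINTER(c_int16), just as arr is passed above.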
Alternatively, to skip the copy of wav_bytes into the data array, you could lie about the pointer type in argtypes. ctypes knows how to convert a byte string to a c_char_p. A pointer is just an address, so _native_function will receive the address but use it as an int16_t* internally:
_native_function.argtypes = c_char_p,c_size_t
_native_function(wav_bytes,len(wav_bytes) // 2)
Another way to work around the "underlying buffer is not writable" error is to leverage c_char_p, which allows an immutable byte string to be used, and then explicitly cast it to the pointer type you want:
_native_function.argtypes = POINTER(c_short),c_size_t
p = cast(c_char_p(wav_bytes),POINTER(c_short))
_native_function(p,len(wav_bytes) // 2)
In these latter cases you must ensure you don't actually write to the buffer as it will corrupt the immutable Python object holding the data.
I had a look around at the CPython bug tracker to see if this had come up before, and it seems it was raised as an issue in 2011. I agree with the poster that it's a serious mis-design, but it seems the developers at that time did not.
Eryk Sun's comment on that thread revealed that it's actually possible to just use ctypes.cast directly. Here is part of the comment:
cast calls ctypes._cast(obj, obj, typ). _cast is a ctypes function pointer defined as follows:
_cast = PYFUNCTYPE(py_object,
c_void_p, py_object, py_object)(_cast_addr)
Since cast makes an FFI call that converts the first arg to c_void_p, you can directly cast bytes to a pointer type:
>>> from ctypes import *
>>> data = b'123\x00abc'
>>> ptr = cast(data, c_void_p)
It's a bit unclear to me if this is actually required by the standard or if it's just a CPython implementation detail, but the following works for me in CPython:
import ctypes
data = b'imagine this string is 16-bit sound data'
data_ptr = ctypes.cast(data, ctypes.POINTER(ctypes.c_int16))
The documentation on cast says the following:
ctypes.cast(obj, type)
This function is similar to the cast operator in C. It returns a new instance of type which points to the same memory block as obj. type must be a pointer type, and obj must be an object that can be interpreted as a pointer.
so it seems that CPython is of the opinion that bytes 'can be interpreted as a pointer'. This seems fishy to me, but these modern pointer-hiding languages have a way of messing with my intuition.
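To check that the cast really points at the bytes' own data, here is a minimal sketch that reads the samples back through the pointer. It assumes CPython; the sample values are made up, and struct is used only to produce 16-bit data in native byte order.

```python
import ctypes
import struct

samples = (1, -2, 300, -32768)
data = struct.pack("=4h", *samples)  # four 16-bit signed values, native byte order

# Cast the immutable bytes object directly to an int16 pointer (no copy).
data_ptr = ctypes.cast(data, ctypes.POINTER(ctypes.c_int16))

# Read only: writing through data_ptr would corrupt the bytes object.
read_back = [data_ptr[i] for i in range(len(data) // 2)]

print(read_back)
```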
Request
I was wondering if it's possible to take the default jsons.dump behavior and make it idempotent (return the input string) for Python IP addresses.
This would enable me to use an object while in python and use the same string in all serializations and deserializations. That way when we load the serialized JSON we don't need different control paths for the first program that loads the data and the second + N programs that load it.
Current Behavior
>>> import ipaddress
>>> import jsons
>>> ipaddress.IPv4Address("192.0.0.1")
IPv4Address('192.0.0.1')
>>> jsons.dump(ipaddress.IPv4Address("192.0.0.1"))
{'_ip': 3221225473}
>>> jsons.load(jsons.dump(ipaddress.IPv4Address("192.0.0.1")))
{'_ip': 3221225473}
Desired Behavior
>>> jsons.load(jsons.dump(ipaddress.IPv4Address("192.0.0.1")))
"192.0.0.1"
Desired but Probably Asking too Much
>>> jsons.load(jsons.dump(ipaddress.IPv4Address("192.0.0.1")))
IPv4Address('192.0.0.1')
Current workaround
I've changed the __repr__ method to do type conversions to string for now. But this means I have to do jsons.dump(repr(<variable>)) and this means other developers that work with my code have a potential landmine they need to be aware of.
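For comparison, here is a sketch of the desired round trip using the standard-library json module rather than jsons (so this is not the jsons API, just an illustration of the behavior being asked for). The function name encode_ip and the "host" key are made up.

```python
import ipaddress
import json

def encode_ip(obj):
    # Hypothetical hook: serialize IP address objects as their string form.
    if isinstance(obj, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return str(obj)
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

addr = ipaddress.IPv4Address("192.0.0.1")

dumped = json.dumps({"host": addr}, default=encode_ip)
loaded = json.loads(dumped)

# Every consumer sees the same string and can rebuild the object if needed.
restored = ipaddress.ip_address(loaded["host"])

print(dumped, restored)
```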
I'm trying to test for and fix a bug in pprint++ (edit: the correct link; original link left for posterity) which is coming up because the instancemethod type is not hashable:
In [16]: import pandas as pd
In [17]: type(pd.tslib.NaT).__repr__
Out[17]: <instancemethod __repr__ at 0x1058d2be8>
In [18]: hash(type(pd.tslib.NaT).__repr__)
...
TypeError: unhashable type: 'instancemethod'
But I'm having trouble testing for this issue because I don't know where else I can find an instancemethod in the Python 3 standard library, and I don't want my tests to depend on Pandas.
Specifically, it seems like the "normal" builtin types have "instance methods" that are implemented slightly differently:
In [19]: type(None).__repr__
Out[19]: <slot wrapper '__repr__' of 'NoneType' objects>
In [20]: hash(type(None).__repr__)
Out[20]: -9223372036583849574
So: where can I find an instancemethod in the Python 3 standard library so I can write tests against it? Or is it a special type that doesn't appear there?
(note: this only appears to affect Python 3, as the same method in Python 2 is an unbound method, which is hashable)
This type isn't used in anything that comes with Python, and there's no Python-level API to create objects of this type. However, you can do it with a direct C API call:
import ctypes
PyInstanceMethod_New = ctypes.pythonapi.PyInstanceMethod_New
PyInstanceMethod_New.argtypes = (ctypes.py_object,)
PyInstanceMethod_New.restype = ctypes.py_object
arbitrary_callable = sum
instance_method = PyInstanceMethod_New(arbitrary_callable)
The name instancemethod looks a lot like a bound method object, but it turns out it's something else entirely. It's a weird internal thing that, according to its documentation, is supposed to be the new way for C types to represent their methods, except that the standard C-level API for creating a type doesn't actually use it.
According to conversations on the Python issue tracker, this feature was requested by the developers of Cython and Pyrex. It looks like pandas.tslib.NaT is implemented in Cython, and the Cython implementation actually uses this type, where the standard C API for creating types doesn't.
Note that the situation is completely different on Python 2. On Python 2, this new type didn't exist, and instancemethod was the name of the type of method objects representing ordinary methods written in Python. In Python 3, the new type took that name, and the type of method objects for methods written in Python is now named method.
Python provides it, but basically only as part of their test suite AFAICT (no included batteries use it otherwise). You can make one for testing using the _testcapi module:
>>> import _testcapi
>>> testinstancemethod = _testcapi.instancemethod(str.__repr__)
>>> hash(testinstancemethod)
...
TypeError: unhashable type: 'instancemethod'
I am new to Python and I have been stuck for hours with this problem... I don't know how to convert a variable (type string) to another variable (type instance).
>>> from Crypto.PublicKey import RSA
>>> from Crypto import Random
>>> randomValue = Random.new().read
>>> priv = RSA.generate(512, randomValue)
After these lines of code, "priv" is created, and this has type "instance".
And I had to convert this "priv" to type string using str(priv).
>>> convertedToStr = str(priv)
>>> type(convertedToStr)
<type 'str'>
Now, I need to convert it back to 'instance' and want to get the same thing in value and type as the original "priv". Assume that I cannot use "priv" anymore, and I need to convert "convertedToStr" (type string) into "convertedToStr" (type instance).
Is this ever possible?
Note: The reason I am doing this complex thing is that I have client and server sides, and when one side sends a message to the other using sendall(var), it does not allow me to send a variable of type 'instance'. So I had to convert it to a string before sending it. Now I want to use it on the receiver side as a variable of type 'instance', but I do not know how to convert it back.
The instance type is used for instances of old-style classes in Python 2. You may want to look at priv.__class__ instead of type(priv) to find out what class it actually has. I expect you'll find that its class is Crypto.PublicKey.RSA._RSAObject, since that's what the generate function is documented to return.
I don't have the Crypto package installed, so I don't actually know what string you get when you call str on a private key instance. You might be able to parse the string and then call the function Crypto.PublicKey.RSA.construct with appropriate values to reconstruct the key object.
But I think that is doing more work than necessary. Instead of calling str on the key, you should instead call its exportKey method. Then, after you send the string you get back to the other system, you can pass it to Crypto.PublicKey.RSA.importKey.
Note that sending a private key over a network may expose it to eavesdropping, making it useless! You probably shouldn't do it unless the connection between your two systems is encrypted with some other system. Your system is only as secure as its weakest link.
Type instance is nothing specific. In Python 2 you can make a custom old-style class and instantiate it, and it will have type instance:
>>> class x:
... y=1
...
>>> type(x())
<type 'instance'>
You can't arbitrarily convert things to a string by calling str() and be guaranteed useful results: it merely asks the object to return a string, which could say anything at all. In this case you asked for a 512-bit RSA private key and the str() output is ~45 bytes long; that is not even 10% of the information needed to get the full object state back.
The general problem you're trying to solve is serialization/deserialization, and it's the topic of many modules, libraries and protocols - but luckily RSA keys are easy to convert to useful text and back again (not all objects are).
>>> out = priv.exportKey()
>>> new = RSA.importKey(out)
>>> new == priv
True
NB. When I tried your code, it complained that 512-bit keys are weak and refused to generate them, insisting on 1024 bits or more. You are possibly on an older version, but you should specify a longer key length.
When you call the object.__repr__() method in Python you get something like this back:
<__main__.Test object at 0x2aba1c0cf890>
Is there any way to get hold of the memory address if you overload __repr__(), other than calling super(Class, obj).__repr__() and regexing it out?
The Python manual has this to say about id():
Return the "identity" of an object.
This is an integer (or long integer)
which is guaranteed to be unique and
constant for this object during its
lifetime. Two objects with
non-overlapping lifetimes may have the
same id() value. (Implementation note:
this is the address of the object.)
So in CPython, this will be the address of the object. No such guarantee for any other Python interpreter, though.
Note that if you're writing a C extension, you have full access to the internals of the Python interpreter, including access to the addresses of objects directly.
You could reimplement the default repr this way:
def __repr__(self):
    return '<%s.%s object at %s>' % (
        self.__class__.__module__,
        self.__class__.__name__,
        hex(id(self))
    )
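As a sketch of how this fits into an overloaded __repr__, the class below adds its own field to the output and still reports the address, with no call to super() and no regex. The class name Test comes from the question; the name attribute is made up for illustration.

```python
class Test:
    def __init__(self, name):
        self.name = name  # hypothetical attribute, just to have custom content

    def __repr__(self):
        # Custom content plus the same hex id the default repr would show (CPython).
        return '<%s.%s name=%r at %s>' % (
            self.__class__.__module__,
            self.__class__.__name__,
            self.name,
            hex(id(self)),
        )

t = Test('spam')
text = repr(t)
print(text)
```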
Just use
id(object)
There are a few issues here that aren't covered by any of the other answers.
First, id only returns:
the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
In CPython, this happens to be the pointer to the PyObject that represents the object in the interpreter, which is the same thing that object.__repr__ displays. But this is just an implementation detail of CPython, not something that's true of Python in general. Jython doesn't deal in pointers, it deals in Java references (which the JVM of course probably represents as pointers, but you can't see those—and wouldn't want to, because the GC is allowed to move them around). PyPy lets different types have different kinds of id, but the most general is just an index into a table of objects you've called id on, which is obviously not going to be a pointer. I'm not sure about IronPython, but I'd suspect it's more like Jython than like CPython in this regard. So, in most Python implementations, there's no way to get whatever showed up in that repr, and no use if you did.
But what if you only care about CPython? That's a pretty common case, after all.
Well, first, you may notice that id is an integer;* if you want that 0x2aba1c0cf890 string instead of the number 46978822895760, you're going to have to format it yourself. Under the covers, I believe object.__repr__ is ultimately using printf's %p format, which you don't have from Python… but you can always do this:
format(id(spam), '#010x' if sys.maxsize.bit_length() <= 32 else '#18x')
* In 3.x, it's an int. In 2.x, it's an int if that's big enough to hold a pointer (which it may not be, because of signed number issues on some platforms) and a long otherwise.
Is there anything you can do with these pointers besides print them out? Sure (again, assuming you only care about CPython).
All of the C API functions take a pointer to a PyObject or a related type. For those related types, you can just call PyFoo_Check to make sure it really is a Foo object, then cast with (PyFoo *)p. So, if you're writing a C extension, the id is exactly what you need.
What if you're writing pure Python code? You can call the exact same functions with pythonapi from ctypes.
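A minimal sketch of that, assuming CPython: PyList_Size is a real C API function taking a PyObject pointer, and here it is handed the raw address obtained from id. The list contents are made up.

```python
import ctypes

items = ['a', 'b', 'c']

# Declare the C signature: Py_ssize_t PyList_Size(PyObject *list).
# The argument is declared as a raw pointer so that id(items) can be passed.
PyList_Size = ctypes.pythonapi.PyList_Size
PyList_Size.argtypes = [ctypes.c_void_p]
PyList_Size.restype = ctypes.c_ssize_t

# CPython only: id(items) is the address of the list object.
size = PyList_Size(id(items))
print(size)
```

This only stays safe while items is alive; passing the id of an object that has already been freed is likely to crash the interpreter.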
Finally, a few of the other answers have brought up ctypes.addressof. That isn't relevant here. This only works for ctypes objects like c_int32 (and maybe a few memory-buffer-like objects, like those provided by numpy). And, even there, it isn't giving you the address of the c_int32 value, it's giving you the address of the C-level int32 that the c_int32 wraps up.
That being said, more often than not, if you really think you need the address of something, you didn't want a native Python object in the first place, you wanted a ctypes object.
Just in response to Torsten, I wasn't able to call addressof() on a regular python object. Furthermore, id(a) != addressof(a). This is in CPython, don't know about anything else.
>>> from ctypes import c_int, addressof
>>> a = 69
>>> addressof(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: invalid type
>>> b = c_int(69)
>>> addressof(b)
4300673472
>>> id(b)
4300673392
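The difference can be sketched as follows (CPython assumed): addressof gives the location of the C int that the c_int wraps, so a second ctypes object created at that address aliases the same value, while id refers to the Python wrapper object itself.

```python
from ctypes import c_int, addressof

b = c_int(69)
buffer_address = addressof(b)  # address of the wrapped C int, not of the Python object

# A second c_int created at that address shares the same storage.
alias = c_int.from_address(buffer_address)
alias.value = 70

print(b.value, buffer_address, id(b))
```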
You can get something suitable for that purpose with:
id(self)
With ctypes, you can achieve the same thing with
>>> import ctypes
>>> a = (1,2,3)
>>> ctypes.addressof(a)
3077760748L
Documentation:
addressof(C instance) -> integer
Return the address of the C instance internal buffer
Note that in CPython, currently id(a) == ctypes.addressof(a), but ctypes.addressof should return the real address for each Python implementation, if ctypes is supported and memory pointers are a valid notion.
Edit: added information about interpreter-independence of ctypes
I know this is an old question but if you're still programming, in python 3 these days... I have actually found that if it is a string, then there is a really easy way to do this:
>>> spam.upper
<built-in method upper of str object at 0x1042e4830>
>>> spam.upper()
'YO I NEED HELP!'
>>> id(spam)
4365109296
string conversion does not affect location in memory either:
>>> spam = {437 : 'passphrase'}
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
>>> str(spam)
"{437: 'passphrase'}"
>>> object.__repr__(spam)
'<dict object at 0x1043313f0>'
You can get the memory address/location of any object by using the 'partition' method of the built-in 'str' type.
Here is an example of using it to get the memory address of an object:
Python 3.8.3 (default, May 27 2020, 02:08:17)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> object.__repr__(1)
'<int object at 0x7ca70923f0>'
>>> hex(int(object.__repr__(1).partition('object at ')[2].strip('>'), 16))
'0x7ca70923f0'
>>>
Here, I call the built-in object class's __repr__ method with an object such as 1 as the argument to get the default repr string. I then partition that string on 'object at ', which returns a tuple of the text before the separator, the separator itself, and the text after it. The memory address comes after 'object at ', so it ends up in the last part.
Because the address is the third item in the returned tuple, I access it with index 2. It still has a trailing right angle bracket, so I use strip to remove it. I then convert the resulting string to an integer with base 16 and turn it back into a hex string.
While it's true that id(object) gets the object's address in the default CPython implementation, this is generally useless... you can't do anything with the address from pure Python code.
The only time you would actually be able to use the address is from a C extension library... in which case it is trivial to get the object's address since Python objects are always passed around as C pointers.
If the __repr__ is overloaded, you may consider __str__ to see the memory address of the variable.
Here are the details of __repr__ versus __str__ by Moshe Zadka on Stack Overflow.
There is a way to recover the object from the value returned by id; here is the TL;DR.
ctypes.cast(memory_address,ctypes.py_object).value
source
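Spelled out as a runnable sketch (CPython only, where id is the object's address); the list contents are made up:

```python
import ctypes

original = ['spam', 'eggs']
memory_address = id(original)

# Interpret the integer as a PyObject pointer and fetch the object behind it.
recovered = ctypes.cast(memory_address, ctypes.py_object).value

print(recovered is original)
```

This is only safe while the object is still alive. If it has been garbage collected, the address may be invalid or belong to a different object, and the cast can crash the interpreter.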