Every time I perform an arithmetic operation in Python, new number objects are created. Wouldn't it be more efficient for the interpreter to perform arithmetic operations with primitive data types rather than having the overhead of creating objects (even for numbers) for arithmetic operations?
Thank you in advance
Yes, it would.
Just like contrived benchmarks prove more or less whatever you want them to prove.
What happens when you add 3 + 23431827340987123049712093874192376491287364912873641298374190234691283764? Well, if you're using primitive types you get all kinds of overflows. If you're using Python objects then it falls back to arbitrary-precision ("long") arithmetic, which, yes, is slower than native math, but they're big numbers, so you kind of have to deal with that.
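To see the point concretely, that addition just works in plain Python, with an exact result:

x = 3 + 23431827340987123049712093874192376491287364912873641298374190234691283764
print(x)               # exact result, no overflow
print(x.bit_length())  # well over the 64 bits a primitive type would offer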
You're right: when you use objects, there is a small overhead that reduces efficiency.
But numbers are immutable, and memory allocation is optimized for small numbers: see the article Python integer objects implementation.
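For example, CPython keeps a cache of the small integers (-5 through 256) and reuses them instead of allocating; note this is an implementation detail, not a language guarantee:

a = int("256")   # built at runtime, so no constant folding is involved
b = int("256")
print(a is b)    # True in CPython: both come from the small-int cache
c = int("257")
d = int("257")
print(c is d)    # False in CPython: 257 is outside the cached range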
Using objects also allows the developer to inherit from the number classes int, float, and complex and add new, specialized methods.
Python also defines a type hierarchy for numbers: Number :> Complex :> Real :> Rational :> Integral. That way, you can have features similar to Scheme's numeric tower.
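The hierarchy lives in the standard numbers module, so code can test against whichever abstract level it needs:

import numbers
from fractions import Fraction

for value in (3, Fraction(1, 2), 1.5, 2 + 3j):
    print(type(value).__name__,
          isinstance(value, numbers.Integral),
          isinstance(value, numbers.Real))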
If you need optimization, you can use Cython: it compiles annotated Python-like code to C to remove much of this overhead.
On the other hand, you also have the famous NumPy for scientific computing with Python. Its core is likewise implemented in C.
If you really want to play with primitive types, there is the array module in the standard library: Efficient arrays of numeric values.
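A short sketch: each element is stored as a raw C value of the declared type code:

from array import array

a = array('i', [1, 2, 3])   # 'i' = signed C int
a.append(4)
a[0] += 10
print(a)                    # array('i', [11, 2, 3, 4])
print(a.itemsize)           # bytes per element (typically 4 for 'i')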
In conclusion: we agree that Python numbers are not primitive types, but using objects offers many more possibilities without introducing complexity for the developer, and without too much loss of performance.
Related
I am looking to create a custom scalar type that has a magnitude and a bunch of sign bits. I have defined operations such as arithmetic and comparison. I want to use this scalar type extensively with NumPy. (E.g., normal operations would involve multiplying 1000 × 1000 matrices multiple times.) So I need this to be as efficient as possible. I have an implementation in C++ using Eigen as the matrix library, but I need a Python port to make it easily accessible in Python.
I see three ways as of now:
Create a Python class
Create an extension type as explained here
Use Boost.Python and expose my C++ class
The operations of my type are only bit-level (addition, for example), but they would be done millions of times.
I would like to know what method would create the most efficient scalar type.
Writing a class in Python vs. using Boost.Python: would there be any observable difference in speed?
Writing an extension type in C vs. a class in Python: would the difference be noticeable?
I could carry out experiments, but I am looking for some experience. If this doesn't get answered, I will answer it myself.
Edit:
As suggested by @Parfait, it looks like the best way would be using a Cython cdef extension type. Here is a quick benchmark involving NumPy and Cython.
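For reference, a minimal sketch of such a cdef extension type, compiled from a .pyx file; the name SignedScalar and its fields are illustrative, not from the original question:

# scalar.pyx -- hypothetical sketch of a Cython extension type
cdef class SignedScalar:
    cdef public unsigned long magnitude
    cdef public unsigned char sign_bits

    def __init__(self, unsigned long magnitude, unsigned char sign_bits):
        self.magnitude = magnitude
        self.sign_bits = sign_bits

    def __add__(SignedScalar a, SignedScalar b):
        # Bit-level combination; the real rule depends on your semantics.
        return SignedScalar(a.magnitude + b.magnitude,
                            a.sign_bits ^ b.sign_bits)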
Actually, I am doing a sequential operation over a NumPy array, so I want to know how to access a[i] quickly.
(Because I access a[i-1] in the previous loop iteration; in C++ we could simply access a[i] by adding 1 to the address of a[i-1], but I don't know whether that is possible in NumPy.) Thanks.
I don't think this is possible; a[i] is the fastest way.
Python is a programming language that is easier to learn (and use) than C++. This of course comes at a cost, and one of those costs is that it's slower.
The references you're talking about can be "dangerous", hence Python does not make them (easily) available, to protect people from things they do not understand.
While references are faster, you can't use them in Python (and since Python is slower anyway, the difference between using references or not doesn't matter that much).
It's best not to think of a NumPy ndarray as a C++ array. They're very different. Python also offers its own native list objects and includes an array module in its standard library.
A Python list behaves mostly like a generic array (as found in many programming languages). It's an ordered sequence; elements can be accessed by integer index from 0 up through (but not including) the length of the list (len(mylist)); ranges of elements can be accessed using "slice" notation (mylist[start_offset:end_offset]), returning another list object; negative indexes are treated as offsets from the end of the list (mylist[-1] is the last item of the list); and so on.
Additionally, they support a number of methods such as .count(), .index(), and .append().
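A few of those behaviors in one place:

mylist = [10, 20, 30, 40, 30]
print(mylist[0])         # 10
print(mylist[1:3])       # [20, 30] (slice returns a new list)
print(mylist[-1])        # 30 (negative index counts from the end)
print(mylist.count(30))  # 2
print(mylist.index(40))  # 3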
Unlike the arrays in most programming languages, Python lists are heterogeneous. The elements can be any mixture of object types, including references to nested lists, dictionaries, code, classes, generator objects and other callable first-class objects and, of course, instances of custom classes.
The Python array module allows one to instantiate homogeneous list-like objects. That is, you instantiate them with any of about a dozen primitive data types (character or Unicode, signed or unsigned short or long integers, floating point or double-precision floating point). Indexing and slicing are identical to Python's native lists.
The primary advantage of Python array.array() instances is that they can store large numbers of elements far more compactly than the more generalized list objects. Various operations on these arrays are likely to be somewhat faster than similar operations performed by iterating over or otherwise referencing elements in a native Python list, because there's greater locality of reference in the more compact array layout (in memory) and because the type constraint obviates some of the dispatch overhead incurred when handling generalized objects.
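A rough way to see the compactness difference (exact numbers vary by platform and Python version):

import sys
from array import array

nums = list(range(100_000))
packed = array('l', nums)    # 'l' = signed C long
print(sys.getsizeof(nums))   # list: header plus one pointer per element
print(sys.getsizeof(packed)) # array: header plus one C long per element
# (and the list's int objects live elsewhere, costing extra memory on top)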
NumPy, on the other hand, is far more sophisticated than the Python array module.
For one thing, an ndarray can be multi-dimensional and can be dynamically reshaped. It's common to start with a linear ndarray and reshape it into a matrix or other higher-dimensional structure. Also, the ndarray supports a much richer set of data types than the Python array module. NumPy also implements some rather advanced "fancy indexing" features.
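For example, assuming nothing beyond numpy itself:

import numpy as np

a = np.arange(12)        # linear: [0, 1, ..., 11]
m = a.reshape(3, 4)      # the same data viewed as a 3x4 matrix
print(m[1, 2])           # 6
print(m[:, [0, 3]])      # fancy indexing: select columns 0 and 3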
But the real performance advantages of NumPy relate to how it "vectorizes" most operations and broadcasts them across the data structures (possibly using any SIMD features supported by your CPU, or even your GPU, in the process). At the very least, many common matrix operations, when properly written in Python for NumPy, execute at native machine-code speed. This performance edge goes well beyond the minor effects of locality of reference and obviated dispatch overhead that one gains from the simple array module.
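In practice the difference looks like this; the vectorized line runs its loop in compiled code:

import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

c = a * b + 1.0                          # vectorized: the loop happens in C
d = [x * y + 1.0 for x, y in zip(a, b)]  # pure Python: orders of magnitude slower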
Is there any Python module that allows one to freely reinterpret raw bytes as different data types and perform various arithmetic operations on them? For example, take a look at this C snippet:
char test[9] = "test_data";
int32_t *x = (int32_t *)test;  // cast needed: char * doesn't convert to int32_t * implicitly
(*x)++;
printf("%d %.9s\n", *x, test);
// outputs 1953719669 uest_data on a little-endian system
Here, both test and x point to the same chunk of memory. I can use x to perform arithmetic and test for string-based operations and individual byte access.
I would like to do the same in Python. I know that I can use struct.pack and struct.unpack whenever I feel the need to convert between a list of bytes and an integer, but maybe there's a module that makes such fiddling much easier.
I'd like to have a data type that supports arithmetic and, at the same time, lets me mess with the individual bytes. This doesn't sound like a use case Python was designed for, but nevertheless it's quite a common problem.
I took a look into the pycrypto codebase, since cryptography is one of the most common use cases for such functionality, but pycrypto implements most of its algorithms in plain C, and the much smaller Python part uses two handcrafted functions (long_to_bytes and bytes_to_long) for conversion between the types.
You may be able to use ctypes.Union, which is analogous to unions in C.
I don't have much experience in C, but as I understand it, unions allow you to write to a memory location as one type and read it back as another, which seems to fit your use case.
https://docs.python.org/3/library/ctypes.html#structures-and-unions
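A minimal sketch (the union and field names here are made up for the example):

import ctypes

class IntBytes(ctypes.Union):
    # The two fields overlay the same 4 bytes of memory.
    _fields_ = [("as_int", ctypes.c_int32),
                ("as_bytes", ctypes.c_char * 4)]

v = IntBytes()
v.as_bytes = b"test"
v.as_int += 1
print(v.as_int, v.as_bytes)  # 1953719669 b'uest' on a little-endian system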
Otherwise, if your needs are simpler, you could just convert bytes / bytearray objects from / to integers using the built-in int.to_bytes and int.from_bytes methods.
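For example, mirroring the C snippet above:

n = int.from_bytes(b"test", "little")
n += 1
print(n, n.to_bytes(4, "little"))  # 1953719669 b'uest'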
I am writing a little arbitrary-precision library in C (I know such libraries exist, such as GMP, but I find it more fun to write one myself, just as an exercise), and I would like to know whether arrays are the best way to represent very long integers, or whether there is a better solution (maybe linked lists)? And secondly, how does Python handle big integers? (Does it use arrays or another technique?)
Thanks in advance for any response
Try reading the documentation for libgmp; it already implements bigints. From what I see, integers are implemented as a dynamically allocated array which is realloc'd when the number needs to grow. See http://gmplib.org/manual/Integer-Internals.html#Integer-Internals.
Python long integers are implemented as a simple structure with just an object header and an array of digits (stored as either 16- or 32-bit ints, depending on the platform).
Code is at http://hg.python.org/cpython/file/f8942b8e6774/Include/longintrepr.h
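You can observe the digit array indirectly through object sizes; on a typical 64-bit CPython build, each additional 30-bit digit adds 4 bytes:

import sys

for bits in (1, 30, 60, 90):   # each step needs one more 30-bit digit
    print(bits, sys.getsizeof(1 << bits))
# typical 64-bit CPython output: 28, 32, 36, 40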
In Python, what is the best data structure of n bits (where n is about 10000) on which performing the usual binary operations (&, |, ^) with other such data structures is fast?
"Fast" is always relative :)
The BitVector package seems to do what you need. I have no experience concerning performance with it though.
There is also a BitString implementation. Perhaps you should do some measurements to find out which one performs better for your specific needs?
If you don't need a specific class and don't need things such as slicing or bit counting, you might be fine simply using Python's long values, which are arbitrary-length integers. This might be the most performant implementation.
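A quick sketch of that approach; a plain int carries all 10000 bits, and &, |, ^ work on it directly:

n = 10_000
a = (1 << n) - 1        # all n bits set
b = 1 << 5000           # only bit 5000 set

print((a & b) == b)     # True: AND isolates the bit
c = a ^ b               # XOR flips bit 5000 off
print((c >> 5000) & 1)  # 0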
This question seems to be similar, although the author needs fewer bits and requires a standard-library solution.
In addition to those mentioned by MartinStettner, there's also the bitarray module, which I've used on multiple occasions with great results.
PS: My 100th answer, wohooo!