Counting number of symbols in Python script - python

I have a Telit module which runs Python 1.5.2+ (http://www.roundsolutions.com/techdocs/python/Easy_Script_Python_r13.pdf). There are certain restrictions on the number of variable, module and method names I can use (< 500), the size of each variable (16k) and the amount of RAM (~ 1MB). See pages 113 and 114 for details. I would like to know how to get the number of symbols being generated, the size in RAM of each variable, and the memory usage (stack and heap usage).
I need something similar to a map file that gets generated with gcc after the linking process which shows me each constant / variable, symbol, its address and size allocated.

Python is an interpreted and dynamically-typed language, so generating that kind of output is very difficult, if it's even possible. I'd imagine that the only reasonable way to get this information is to profile your code on the target interpreter.
If you're looking for a true memory map, I doubt such a tool exists since Python doesn't go through the same kind of compilation process as C or C++. Since everything is initialized and allocated at runtime as the program is parsed and interpreted, there's nothing to say that one interpreter will behave the same as another, especially in a case such as this where you're running on such a different architecture. As a result, there's nothing to say that your objects will be created in the same locations or even with the same overall memory structure.
If you're just trying to determine memory footprint, you can do some manual checking with sys.getsizeof(object[, default]), provided that it is supported by Telit's libs. I don't think they're using a straight implementation of CPython. Even then, this doesn't always work and will raise a TypeError when an object's size cannot be determined if you don't specify the default parameter.
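A minimal sketch of that manual check, assuming an interpreter that provides sys.getsizeof. It was added in CPython 2.6, so a 1.5.2-derived interpreter such as Telit's may well lack it. The helper name safe_sizeof is made up for illustration.

```python
import sys

def safe_sizeof(obj):
    """Return the shallow size of obj in bytes, or -1 if it cannot be determined."""
    getsizeof = getattr(sys, "getsizeof", None)  # absent on very old interpreters
    if getsizeof is None:
        return -1
    # Passing a default avoids the TypeError for objects that do not report a size.
    return getsizeof(obj, -1)

for value in ("", "x" * 100, [1, 2, 3], {"a": 1}):
    print(type(value).__name__, safe_sizeof(value))
```

Note that this is a shallow size: a list reports its own storage, not the objects it refers to.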
You might also get some interesting results by studying the output of the dis module's bytecode disassembly, but that assumes that dis works on your interpreter, and that your interpreter is actually implemented as a VM.
If you just want a list of symbols, take a look at this recipe. It uses reflection to dump a list of symbols.
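As a rough stand-in for such a recipe, here is a sketch that counts the distinct names a script refers to by walking its compiled code objects. It assumes you run it with a desktop CPython 3 on your development machine, and the function name collect_names and the sample script are made up for illustration. Telit's interpreter may count symbols differently, so treat the number as an estimate only.

```python
import types

def collect_names(code):
    """Recursively gather the identifier names used by a code object."""
    names = set(code.co_names) | set(code.co_varnames)
    for const in code.co_consts:
        # Nested functions and classes are stored as code objects in co_consts.
        if isinstance(const, types.CodeType):
            names.add(const.co_name)
            names |= collect_names(const)
    return names

# Hypothetical script text; in practice read it from your .py file.
source = """
import sys
counter = 0
def bump(step):
    total = counter + step
    return total
"""

names = collect_names(compile(source, "<script>", "exec"))
print(len(names), sorted(names))
```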
Good manual testing is key here. Your best bet is to set up the module's CMUX (COM port MUXing), and watch the console output. You'll know very quickly if you start running out of memory.

This post makes me recall my pain with Telit GM862-GPS modules. My code was exactly at the point where the number of variables, strings, etc. added up to the limit. Of course, I didn't know this at the time. I added one innocent line and my program did not work any more. It drove me really crazy for two days until I looked at the datasheet and found this fact.
What you are looking for might not have a good answer because the Python interpreter is not a full-fledged version. What I did was to reuse the same local variable names as much as possible. I also deleted the doc strings of functions (those count too) and replaced them with #comments.
In the end, I want to say that this module is good for small applications. The python interpreter does not support threads or interrupts so your program must be a super loop. When your application gets bigger, each iteration will take longer. Eventually, you might want to switch to a faster platform.

Related

What is the best or proper way to allow debugging of generated code?

For various reasons, in one project I generate executable code by means of generating AST from various source files, then compiling that to bytecode (though the question could also work for cases where the bytecode is generated directly I guess).
From some experimentation, it looks like the debugger more or less just uses the lineno information embedded in the AST alongside the filename passed to compile in order to provide a representation for the debugger's purposes, however this assumes the code being executed comes from a single on-disk file.
That is not necessarily the case for my project, the executable code can be pieced together from multiple sources, and some or all of these sources may have been fetched over the network, or been retrieved from non-disk storage (e.g. database).
And so my Y questions, which may be the wrong ones (hence the background):
is it possible to provide a memory buffer of some sort, or is it necessary to generate a singular on-disk representation of the "virtual source"?
how well would the debugger deal with jumping around between the different bits and pieces if the virtual source can't or should not be linearised[0]
and just in case, is the assumption of Python only supporting a single contiguous source file correct or can it actually be fed multiple sources somehow?
[0] for instance a web-style literate program would be debugged in its original form, jumping between the code sections, not in the so-called "tangled" form
Some of this can be handled by the trepan3k debugger. For other things various hooks are in place.
First of all it can debug based on bytecode alone. But of course stepping instructions won't be possible if the line number table doesn't exist. And for that reason if for no other, I would add a "line number" for each logical stopping point, such as at the beginning of statements. The numbers don't have to be line numbers, they could just count from 1 or be indexes into some other table. This is more or less how go's Pos type position works.
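A small sketch of giving generated code its own position numbers, using only the standard ast module. The snippet text, the file name "virtual-source" and the offset of 99 are arbitrary choices for illustration, not anything trepan3k requires.

```python
import ast

snippet = "def handler(x):\n    return x * 2\n"

tree = ast.parse(snippet)
# Shift every position in the tree; these numbers could just as well be
# indexes into some table of your own rather than real line numbers.
ast.increment_lineno(tree, 99)

code = compile(tree, "virtual-source", "exec")
namespace = {}
exec(code, namespace)

first_line = namespace["handler"].__code__.co_firstlineno
print(first_line)
```

Whatever numbers end up in the tree are the ones a debugger or traceback will report for that code.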
The debugger will let you set a breakpoint on a function, but that function has to exist, and when you start any python program most of the functions you define don't exist yet. So the typical way to do this is to modify the source to call the debugger at some point. In trepan3k the lingo for this is:
from trepan.api import debug; debug()
Do that in a place where the other functions you want to break on have been defined.
And the functions can be specified as methods on existing variables, e.g. self.my_function()
One of the advanced features of this debugger is that it will decompile the bytecode to produce source code. There is a command called deparse which will show you the context around where you are currently stopped.
Deparsing bytecode though is a bit difficult so depending on which kind of bytecode you get the results may vary.
As for the virtual source problem, well that situation is somewhat tolerated in the debugger, since that kind of thing has to go on when there is no source. And to facilitate this and remote debugging (where the file locations locally and remotely can be different), we allow for filename remapping.
Another library, pyficache, is used for this remapping; I believe it has the ability to remap contiguous lines of one file into lines in another file. And I think you could use this over and over again. However so far there hasn't been a need for this. And that code is pretty old. So someone would have to beef up trepan3k here.
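On the "memory buffer" part of the question, one route that needs only the standard library (and is not specific to trepan3k) is to register the in-memory text in linecache under the same name you pass to compile. Tracebacks and pdb look source lines up through linecache. This is a sketch under assumptions: the name "virtual://db/greet.py" and the helper compile_virtual are made up, and writing to linecache.cache directly relies on its tuple layout (size, mtime, lines, fullname).

```python
import linecache

def compile_virtual(source, name):
    """Compile source held in memory and make its lines visible to linecache."""
    lines = source.splitlines(True)  # keep the line endings
    # An mtime of None marks the entry as not backed by a file on disk,
    # so linecache.checkcache() leaves it alone.
    linecache.cache[name] = (len(source), None, lines, name)
    return compile(source, name, "exec")

virtual_name = "virtual://db/greet.py"
code = compile_virtual("def greet(who):\n    return 'hello ' + who\n", virtual_name)

namespace = {}
exec(code, namespace)
second_line = linecache.getline(virtual_name, 2)
print(repr(second_line))
```

Each piece of a multi-source program can be registered under its own name this way, since every code object carries its own filename.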
Lastly, related to trepan3k is a trepan-xpy which is a CPython bytecode debugger which can step bytecode instructions even when the line number table is empty.

LinkedList on python and c++ [duplicate]

Why does Python seem slower, on average, than C/C++? I learned Python as my first programming language, but I've only just started with C and already I feel I can see a clear difference.
Python is a higher level language than C, which means it abstracts the details of the computer from you - memory management, pointers, etc, and allows you to write programs in a way which is closer to how humans think.
It is true that C code usually runs 10 to 100 times faster than Python code if you measure only the execution time. However if you also include the development time Python often beats C. For many projects the development time is far more critical than the run time performance. Longer development time converts directly into extra costs, fewer features and slower time to market.
Internally the reason that Python code executes more slowly is because code is interpreted at runtime instead of being compiled to native code at compile time.
Other interpreted languages such as Java bytecode and .NET bytecode run faster than Python because the standard distributions include a JIT compiler that compiles bytecode to native code at runtime. The reason why CPython doesn't have a JIT compiler already is because the dynamic nature of Python makes it difficult to write one. There is work in progress to write a faster Python runtime so you should expect the performance gap to be reduced in the future, but it will probably be a while before the standard Python distribution includes a powerful JIT compiler.
CPython is particularly slow because it has no Just in Time optimizer (since it's the reference implementation and chooses simplicity over performance in certain cases). Unladen Swallow is a project to add an LLVM-backed JIT into CPython, and achieves massive speedups. It's possible that Jython and IronPython are much faster than CPython as well as they are backed by heavily optimized virtual machines (JVM and .NET CLR).
One thing that will arguably leave Python slower however, is that it's dynamically typed, and there is tons of lookup for each attribute access.
For instance calling f on an object A will cause possible lookups in __dict__, calls to __getattr__, etc, then finally call __call__ on the callable object f.
With respect to dynamic typing, there are many optimizations that can be done if you know what type of data you are dealing with. For example in Java or C, if you have a straight array of integers you want to sum, the final assembly code can be as simple as fetching the value at the index i, adding it to the accumulator, and then incrementing i.
In Python, it is very hard to make code this optimal. Say you have a list subclass object containing ints. Before even adding any, Python must call list.__getitem__(i), then add that to the "accumulator" by calling accumulator.__add__(n), then repeat. Tons of alternative lookups can happen here because another thread may have altered, for example, the __getitem__ method, the dict of the list instance, or the dict of the class, between calls to add or getitem. Even finding the accumulator and list (and any variable you're using) in the local namespace causes a dict lookup. This same overhead applies when using any user-defined object, although for some built-in types it's somewhat mitigated.
It's also worth noting, that the primitive types such as bigint (int in Python 3, long in Python 2.x), list, set, dict, etc, etc, are what people use a lot in Python. There are tons of built in operations on these objects that are already optimized enough. For example, for the example above, you'd just call sum(list) instead of using an accumulator and index. Sticking to these, and a bit of number crunching with int/float/complex, you will generally not have speed issues, and if you do, there is probably a small time critical unit (a SHA2 digest function, for example) that you can simply move out to C (or Java code, in Jython). The fact is, that when you code C or C++, you are going to waste lots of time doing things that you can do in a few seconds/lines of Python code. I'd say the tradeoff is always worth it except for cases where you are doing something like embedded or real time programming and can't afford it.
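A quick sketch of the accumulator-versus-sum comparison, assuming nothing beyond the standard timeit module. The function name manual_sum and the data size are arbitrary, and the timings will vary by machine.

```python
import timeit

def manual_sum(values):
    """Sum with an explicit index and accumulator, as described above."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total

data = list(range(10000))

t_manual = timeit.timeit(lambda: manual_sum(data), number=200)
t_builtin = timeit.timeit(lambda: sum(data), number=200)
print("manual: %.4fs  builtin: %.4fs" % (t_manual, t_builtin))
```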
Compilation vs interpretation isn't important here: Python is compiled, and it's a tiny part of the runtime cost for any non-trivial program.
The primary costs are: the lack of an integer type which corresponds to native integers (making all integer operations vastly more expensive), the lack of static typing (which makes resolution of methods more difficult, and means that the types of values must be checked at runtime), and the lack of unboxed values (which reduce memory usage, and can avoid a level of indirection).
Not that any of these things aren't possible or can't be made more efficient in Python, but the choice has been made to favor programmer convenience and flexibility, and language cleanness over runtime speed. Some of these costs may be overcome by clever JIT compilation, but the benefits Python provides will always come at some cost.
The difference between python and C is the usual difference between an interpreted (bytecode) and compiled (to native) language. Personally, I don't really see python as slow, it manages just fine. If you try to use it outside of its realm, of course, it will be slower. But for that, you can write C extensions for python, which puts time-critical algorithms in native code, making it way faster.
Python is typically implemented as a scripting language. That means it goes through an interpreter which means it translates code on the fly to the machine language rather than having the executable all in machine language from the beginning. As a result, it has to pay the cost of translating code in addition to executing it. This is true even of CPython even though it compiles to bytecode which is closer to the machine language and therefore can be translated faster. With Python also comes some very useful runtime features like dynamic typing, but such things typically cannot be implemented even on the most efficient implementations without heavy runtime costs.
If you are doing very processor-intensive work like writing shaders, it's not uncommon for Python to be somewhere around 200 times slower than C++. If you use CPython, that time can be cut in half but it's still nowhere near as fast. With all those runtime goodies comes a price. There are plenty of benchmarks to show this and here's a particularly good one. As admitted on the front page, the benchmarks are flawed. They are all submitted by users trying their best to write efficient code in the language of their choice, but it gives you a good general idea.
I recommend you try mixing the two together if you are concerned about efficiency: then you can get the best of both worlds. I'm primarily a C++ programmer but I think a lot of people tend to code too much of the mundane, high-level code in C++ when it's just a nuisance to do so (compile times as just one example). Mixing a scripting language with an efficient language like C/C++ which is closer to the metal is really the way to go to balance programmer efficiency (productivity) with processing efficiency.
Comparing C/C++ to Python is not a fair comparison. Like comparing a F1 race car with a utility truck.
What is surprising is how fast Python is in comparison to its peers of other dynamic languages. While the methodology is often considered flawed, look at The Computer Language Benchmark Game to see relative language speed on similar algorithms.
Comparisons to Perl, Ruby, and C# are more 'fair'.
Aside from the answers already posted, one thing is Python's ability to change things during runtime, which you can't do in other languages such as C. You can add member functions to classes as you go.
Also, Python's dynamic nature makes it impossible to say what type of parameters will be passed to a function, which in turn makes optimizing a whole lot harder.
RPython seems to be a way of getting around the optimization problem.
Still, it probably won't be near the performance of C for number-crunching and the like.
C and C++ compile to native code- that is, they run directly on the CPU. Python is an interpreted language, which means that the Python code you write must go through many, many stages of abstraction before it can become executable machine code.
Python is a high-level programming language. Here is how a python script runs:
The python source code is first compiled into Byte Code. Yes, you heard me right! Though Python is an interpreted language, it first gets compiled into byte code. This byte code is then interpreted and executed by the Python Virtual Machine(PVM).
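The compile step can be observed directly from Python itself. This is a small sketch using the built-in compile and the dis module; the one-line source is just a placeholder.

```python
import dis

source = "total = 1 + 2\n"

# Step 1: source text is compiled into a code object holding bytecode.
code = compile(source, "<demo>", "exec")
print(type(code).__name__, len(code.co_code), "bytes of bytecode")
dis.dis(code)  # human-readable listing of the bytecode

# Step 2: the virtual machine interprets that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```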
This compilation and execution are what make Python slower than lower-level languages such as C/C++. In languages such as C/C++, the source code is compiled into binary code which can be directly executed by the CPU, thus making their execution more efficient than that of Python.
This answer applies to python3. Most people do not know that a JIT-like compile occurs whenever you use the import statement. CPython will search for the imported source file (.py), take notice of the modification date, then look for a compiled-to-bytecode file (.pyc) in a subfolder named "__pycache__" (dunder pycache dunder). If everything matches then your program will use that bytecode file until something changes (you change the source file or upgrade Python).
But this never happens with the main program which is usually started from a BASH shell, interactively or via. Here is an example:
#!/usr/bin/python3
# title : /var/www/cgi-bin/name2.py
# author: Neil Rieck
# edit : 2019-10-19
# ==================
import name3 # name3.py will be cache-checked and/or compiled
import name4 # name4.py will be cache-checked and/or compiled
import name5 # name5.py will be cache-checked and/or compiled
#
def main():
    #
    # code that uses the imported libraries goes here
    #
    pass # placeholder so the empty function body is valid
#
if __name__ == "__main__":
    main()
#
Once executed, the compiled output code will be discarded. However, your main python program will be compiled if you start up via an import statement like so:
#!/usr/bin/python3
# title : /var/www/cgi-bin/name1
# author: Neil Rieck
# edit : 2019-10-19
# ==================
import name2 # name2.py will be cache-checked and/or compiled
#name2.main() #
And now for the caveats:
if you were testing code interactively in the Apache area, your compiled file might be saved with privs that Apache can't read (or write on a recompile)
some claim that the subfolder "__pycache__" (dunder pycache dunder) needs to be available in the Apache config
will SELinux allow CPython to write to subfolder (this was a problem in CentOS-7.5 but I believe a patch has been made available)
One last point. You can access the compiler yourself, generate the pyc files, then change the protection bits as a workaround to any of the caveats I've listed. Here are two examples:
method #1
=========
python3
import py_compile
py_compile.compile("name1.py")
exit()
method #2
=========
python3 -m py_compile name1.py
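The same workaround can be scripted. The sketch below assumes a throwaway temporary directory and a made-up one-line name1.py, and picks 0o644 as an example permission mode; adjust the path and mode to whatever your web server actually needs.

```python
import os
import py_compile
import tempfile

# Create a stand-in source file in a scratch directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "name1.py")
with open(source_path, "w") as handle:
    handle.write("print('hello from name1')\n")

# Compile it; the return value is the path of the generated .pyc file.
pyc_path = py_compile.compile(source_path, doraise=True)

# Change the protection bits so another user (e.g. the web server) can read it.
os.chmod(pyc_path, 0o644)
print(pyc_path)
```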
Python is an interpreted language: it is not compiled ahead of time to native code that runs directly on the CPU.
But I have some suggestions for making Python programs faster:
1. Use python3 to run your code (on Ubuntu or any other Linux distro, run python3 main.py), and regularly update Python and your modules and libraries; I suggest using pip3.
2. Use Numba, a JIT compiler for Python. It is aimed mainly at numerical code, and it can also use GPU acceleration for parts of your program.
3. Use a profiler to see which functions or statements take the most execution time, then rewrite those parts so they run faster. This is very useful.
4. Use multithreading or multiprocessing so that your program uses more CPU cores and threads. In CPython, CPU-bound work generally needs multiprocessing to gain from extra cores.
5. Write the slow parts in C, C# or C++ and call them from Python, using something like cpython.
6. Debug and test your code so that it has no bugs, which can also make it a little faster. Application logging helps with debugging.
And then some smaller things that make your code faster:
1. Know the basic data structures, so you can choose the right one.
2. Reduce the memory footprint of your code.
3. Use builtin functions and libraries.
4. Move calculations outside the loop.
5. Keep your code base small.
With these techniques your code gets much faster, so Python need not be a slow programming language.

Does Python load in function arguments into registers or does it keep them on the stack?

So I'm writing a function that takes in a tuple as an argument and does a bunch of stuff to it. Here is what that looks like:
def swap(self, location):
    if (location[0] < 0 or location[1] < 0 or
            location[0] >= self.r or location[1] >= self.c):
        return False
    self.board[0][0] = self.board[location[0]][location[1]]
    self.board[location[0]][location[1]] = 0
    self.empty = (location[0],location[1])
I'm trying to make my code as efficient as possible, so since I am not modifying the values of location, does it make sense to load the variables in registers (loc0 = location[0]; loc1 = location[1]) for faster computations (zero-cycle read) or is location already loaded into registers by the Python compiler when it's passed in as a function argument?
Edit: I bit the bullet and ran some tests. Here are the results (in seconds) for this function running 10 million times with the repeating inputs: "up", "down", "left", "right" (respectively)
Code as is:
run#1: 19.39
run#2: 17.18
run#3: 16.85
run#4: 16.90
run#5: 16.74
run#6: 16.76
run#7: 16.94
Code after defining location[0] and location[1] in the beginning of the function:
run#1: 14.83
run#2: 14.79
run#3: 14.88
run#4: 15.033
run#5: 14.77
run#6: 14.94
run#7: 14.67
That's an average of 16% increase in performance. Definitely not insignificant for my case. Of course, this is not scientific as I need to do more tests in more environments with more inputs, but enough for my simple use case!
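For anyone who wants to repeat the measurement, here is a sketch of the comparison using timeit. The Board class is hypothetical scaffolding (the question only shows swap), and the 3x3 size, input tuple and repeat count are arbitrary.

```python
import timeit

class Board:
    """Hypothetical container so that both swap variants can run."""
    def __init__(self, r, c):
        self.r, self.c = r, c
        self.board = [[1] * c for _ in range(r)]
        self.empty = (0, 0)

    def swap_indexed(self, location):
        # Code as is: location[...] is evaluated on every use.
        if (location[0] < 0 or location[1] < 0 or
                location[0] >= self.r or location[1] >= self.c):
            return False
        self.board[0][0] = self.board[location[0]][location[1]]
        self.board[location[0]][location[1]] = 0
        self.empty = (location[0], location[1])

    def swap_cached(self, location):
        # Both elements are looked up once and kept in local names.
        loc0, loc1 = location
        if loc0 < 0 or loc1 < 0 or loc0 >= self.r or loc1 >= self.c:
            return False
        self.board[0][0] = self.board[loc0][loc1]
        self.board[loc0][loc1] = 0
        self.empty = (loc0, loc1)

b1, b2 = Board(3, 3), Board(3, 3)
t_indexed = timeit.timeit(lambda: b1.swap_indexed((1, 2)), number=200000)
t_cached = timeit.timeit(lambda: b2.swap_cached((1, 2)), number=200000)
print("indexed: %.3fs  cached: %.3fs" % (t_indexed, t_cached))
```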
Times measured using Python 2.7 on a Macbook Pro (Early 2015), which has a Broadwell i5-5257U CPU (2c4t max turbo 3.1GHz, sustained 2.7GHz, 3MB L3 cache).
IDE was: PyCharm Edu 3.5.1 JRE: 1.8.0_112-release-408-b6 x86_64 JVM: OpenJDK 64-Bit Server VM .
Unfortunately, this is for a class that grades based on code speed.
If you're using an interpreter, it's unlikely that any Python variables will live in registers between different expressions. You could look at how the Python source compiled to byte-code.
Python bytecode (the kind stored in files outside the interpreter) is stack-based (http://security.coverity.com/blog/2014/Nov/understanding-python-bytecode.html). This byte-code is then interpreted or JIT-compiled to native machine code. Regular python only interprets, so it's not plausible for it to keep python variables in machine registers across multiple statements.
An interpreter written in C might keep the top of the bytecode stack in a local variable inside an interpret loop, and the C compiler might keep that C variable in a register. So repeated use of the same Python variable might end up not having too many store/reload round-trips.
Note that store-forwarding latency on your Broadwell CPU is about 4 or 5 clock cycles, nowhere near the hundreds of cycles for a round-trip to DRAM. A store/reload doesn't even have to wait for the store to retire and commit to L1D cache; it's forwarded directly from the store buffer. (Related: http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/ and http://agner.org/optimize/, and other links in the x86 tag wiki.) Load-use latency is also only 5 clock cycles for an L1D cache hit (latency from address being ready to data being ready; you can measure it by pointer-chasing through a linked list in asm). There's enough interpreter overhead (total number of instructions it runs to figure out what to do next) that this probably isn't even the bottleneck.
Keeping a specific python variable in a register is not plausible at all for an interpreter. Even if you wrote an interpreter in asm, the fundamental problem is that registers aren't addressable. An x86 add r14d, eax instruction has to have both registers hard-coded into the instruction's machine-code. (Every other ISA works the same way: register numbers are part of the machine-code for the instruction, with no indirection based on any data). Even if the interpreter did the work to figure out that it needed to "add reg-var #3 to reg-var #2" (i.e. decoding the bytecode stack operations back into register variables for an internal representation that it interprets), it would have to use a different function than any other combination of registers.
Given an integer, the only ways to get the value of the Nth register are branching to an instruction that uses that register, or storing all the registers to memory and indexing the resulting array. (Or maybe some kind of branchless compare and mask stuff).
Anyway, trying to do anything specific about this is not profitable, which is why people just write the interpreter in C and let the C compiler do a (hopefully) good job of optimizing the machine code that will actually run.
Or you write a JIT-compiler like Sun did for Java (the HotSpot VM). IDK if there are any for Python. See Does the Python 3 interpreter have a JIT feature?.
A JIT-compiler does actually turn the Python code into machine code, where register state mostly holds Python variables rather than interpreter data. Again, without a JIT compiler (or ahead-of-time compiler), "keeping variables in registers" is not a thing.
It's probably faster because it avoids the [] operator and other overhead (see Bren's answer, which you accepted)
Footnote: a couple ISAs have memory-mapped registers. e.g. AVR (8-bit RISC microcontrollers), where the chip also has built-in SRAM containing the low range of memory addresses that includes the registers. So you can do an indexed load and get register contents, but you might as well have done that on memory that wasn't holding architectural register contents.
The Python VM only uses a stack to execute its bytecode, and this stack is completely independent of the hardware stack. You can use dis to disassemble your code to see how your changes affect the generated bytecode.
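A short sketch of that comparison with dis. The two functions are simplified stand-ins for the question's swap, and the exact opcode names in the listings depend on the CPython version.

```python
import dis
import io

def indexed(location):
    return location[0] * location[0] + location[1] * location[1]

def cached(location):
    loc0 = location[0]
    loc1 = location[1]
    return loc0 * loc0 + loc1 * loc1

def listing(func):
    """Return the bytecode disassembly of func as a string."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)
    return buffer.getvalue()

print(listing(indexed))
print(listing(cached))
```

Comparing the two listings shows how often the subscript operation appears in each variant.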
It will be a little faster if you store these two variables:
loc0 = location[0]
loc1 = location[1]
Because there will be only two lookups instead of four.
Btw, if you want to use python, you shouldn't worry about performance at this low level.
Those kinds of details are not part of the specified behavior of Python. As Ignacio's answer says, CPython does it one way, but that is not guaranteed by the language itself. Python's description of what it does is very far removed from low-level notions like registers, and most of the time it's not useful to worry about how what Python does maps onto those details. Python is a high-level language whose behavior is defined in terms of high-level abstractions, akin to an API.
In any case, doing something like loc0 = location[0] in Python code has nothing to do with setting registers. It's just creating a new Python name pointing to an existing Python object.
That said, there is a performance difference, because if you use location[0] everywhere, the actual lookup will (or at least may -- in theory a smart Python implementation could optimize this) happen again and again every time the expression location[0] is evaluated. But if you do loc0 = location[0] and then use loc0 everywhere, you know the lookup only happens once. In typical situations (e.g., location is a Python list or dict, you're not running this code gazillions of times in a tight loop) this difference will be tiny.
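A tiny illustration that the assignment only binds a name. The tuple contents are arbitrary; a list is used as the first element so that the shared identity is observable.

```python
location = ([10, 20], "marker")

loc0 = location[0]                 # one lookup; binds a second name to the same object
same_object = loc0 is location[0]  # no copy was made

loc0.append(30)                    # the change is visible through both names
print(same_object, location[0])
```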

Will Python be faster if I put commonly called code into separate methods or files?

I thought I once read on SO that Python will compile and run slightly more quickly if commonly called code is placed into methods or separate files. Does putting Python code in methods have an advantage over separate files or vice versa? Could someone explain why this is? I'd assume it has to do with memory allocation and garbage collection or something.
It doesn't matter. Don't structure your program around code speed; structure it around coder speed. If you write something in Python and it's too slow, find the bottleneck with cProfile and speed it up. How do you speed it up? You try things and profile them. In general, function call overhead in critical loops is high. Byte compiling your code takes a very small amount of time and only needs to be done once.
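A minimal sketch of finding a bottleneck with cProfile. The functions slow_part and program are invented workloads for the example.

```python
import cProfile
import io
import pstats

def slow_part(n):
    return sum(i * i for i in range(n))

def program():
    return [slow_part(20000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For a whole script, running python -m cProfile -s cumulative yourscript.py gives a similar report without code changes.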
No. Regardless of where you put your code, it has to be parsed once and compiled if necessary. Distinction between putting code in methods or different files might have an insignificant performance difference, but you shouldn't worry about it.
About the only language right now that you have to worry about structuring "right" is Javascript. Because it has to be downloaded from net to client's computer. That's why there are so many compressors and obfuscators for it. Stuff like this isn't done with Python because it's not needed.
Two things:
Code in separate modules is compiled into bytecode at first runtime and saved as a precompiled .pyc file, so it doesn't have to be recompiled at the next run as long as the source hasn't been modified since. This might result in a small performance advantage, but only at program startup.
Also, Python stores variables etc. a bit more efficiently if they are placed inside functions instead of at the top level of a file. But I don't think that's what you're referring to here, is it?
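The second point can be checked with a rough timing sketch. It runs the same loop once as top-level code and once inside a function. The helper timed_exec is made up for this example, and the measured difference varies by interpreter version and machine.

```python
import time

TOP_LEVEL = """
total = 0
for i in range(200000):
    total += i
"""

IN_FUNCTION = """
def run():
    total = 0
    for i in range(200000):
        total += i
    return total
total = run()
"""

def timed_exec(source):
    """Execute source as module-level code and return (seconds, namespace)."""
    code = compile(source, "<demo>", "exec")
    namespace = {}
    start = time.perf_counter()
    exec(code, namespace)
    return time.perf_counter() - start, namespace

t_top, ns_top = timed_exec(TOP_LEVEL)
t_func, ns_func = timed_exec(IN_FUNCTION)
print("top level: %.4fs  in function: %.4fs" % (t_top, t_func))
```

At top level, total and i live in the module's dict, while inside run() they are local variables with indexed slots.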

Accessing Python Objects in a Core Dump

Is there any way to discover the python value of a PyObject* from a corefile in gdb?
It's lots of work, but of course it can be done, especially if you have all the symbols. Look at the header files for the specific version of Python (and compilation options in use to build it): they define PyObject as a struct which includes, first and foremost, a pointer to a type. Lots of macros are used, so you may want to run the compile of that Python from sources again, with exactly the same flags but in addition a -E to stop after preprocessing, so you can refer to the specific C code that made the bits you're seeing in the core dump.
A type object has, among many other things, a string (array of char) that's its name, and from it you can infer what exactly objects of that type contain -- be it content directly, or maybe some content (such as a length, i.e. number of items) and a pointer to the actual data.
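To get a feel for that layout before opening a core dump, you can read the same fields from a live process with ctypes. This is a sketch under strong assumptions: a standard (non-debug, non-free-threaded) CPython build where id() is the object's address, the object header is a refcount word followed by the type pointer, and tp_name directly follows the type object's own variable-size header. Check the headers of your exact Python version before relying on these offsets.

```python
import ctypes

obj = [1, 2, 3]
address = id(obj)  # in CPython, id() is the object's memory address

word = ctypes.sizeof(ctypes.c_ssize_t)

# Assumed object layout: ob_refcnt (one word), then the ob_type pointer.
type_pointer = ctypes.c_void_p.from_address(address + word).value

# Assumed type-object layout: ob_refcnt, ob_type, ob_size, then tp_name (char *).
name_offset = 2 * word + ctypes.sizeof(ctypes.c_void_p)
type_name = ctypes.c_char_p.from_address(type_pointer + name_offset).value

print(hex(type_pointer), type_name)
```

In gdb the equivalent steps are following the type pointer from the PyObject* and printing the name string it leads to.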
I've done such super-advanced post-mortem debugging a couple of times (starting with VERY precise knowledge of the Python versions involved and all the prepared preprocessed sources &c) and each time it took me a day or two (were I still a freelance and charging by the hour, if I had to bid on such a task I'd say at least 20 hours -- at my not-cheap hourly rates!-).
IOW, it's worth it only if it's really truly the only way out of some very costly pickle. On the plus side, it WILL teach you more about Python's internals than you ever thought was there, even after memorizing every line of the sources. Good luck, you'll need some!!!
