Python twisted web server caching and executing outdated code

Background: Working on a web application that allows users to upload python scripts to a server (Twisted web server). The UI provides full CRUD functionality on these python scripts. After uploading a script the user can then select the script and run it on the server and get results back on the UI. Everything works fine...
Problem: ...except when the user edits the Python code inline (via the UI) or updates a script by uploading a new script that overwrites one which already exists. It seems that Twisted caches the code (both old and new) and sometimes runs the new code and sometimes the old.
Example: I upload a script hello.py to the server which has a function called run() which does: print 'hello world'. Someone else comes along and uploads another script named hello.py which does: print 'goodbye world'. Then, I go back and execute the run() function on the script 10 times. Half of the time it will say 'hello world' and half of the time it will say 'goodbye world'.
Tried so far: Several different ways to reload the script into memory before executing it, including:
python's builtin reload():
module = __import__('hello')
reload(module)
module.run()
imp module reload():
import imp
module = __import__('hello')
imp.reload(module)
module.run()
twisted.python.rebuild()
from twisted.python.rebuild import rebuild
module = __import__('hello')
rebuild(module)
module.run()
figured that perhaps if we force python to not write bytecode, that would solve the issue: sys.dont_write_bytecode = True
restart twisted server
a number of other things which I can't remember
And the only way to make sure that the most up-to-date Python code executes is to restart the Twisted server manually. I have been researching this for quite some time and have not found a better way that works 100% of the time, which leads me to believe that bouncing Twisted is the only way.
Question: Is there a better way to accomplish this (i.e. always execute the most recent code) without having to bounce Twisted? Perhaps by preventing Twisted from caching scripts in memory, or by clearing Twisted's cache before importing/reloading modules.
I'm fairly new to the Twisted web server, so it's possible that I have overlooked an obvious way to resolve this issue, or that my whole approach is wrong. Some insight into solving this issue would be greatly appreciated.
Thanks
T

Twisted doesn't cache Python code in memory. Python's module system works by evaluating source files once and then placing a module object into sys.modules. Future imports of the module do not re-evaluate the source files - they just pull the module object from sys.modules.
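For instance, using the hello.py from your example, the second import is just a dictionary lookup (a small sketch to illustrate the point):
import sys
import hello                   # evaluates hello.py and caches the module object
print('hello' in sys.modules)  # True: the cached module lives here now
import hello                   # no re-evaluation; the cached object is reused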
What parts of Twisted will do is keep references to the objects they are using. This is just how Python programs are written: if you don't have a reference to an object, you can't use it. The Twisted web server can't call your run function unless it has a reference to the module that defines that function.
The trouble with reload is that it re-evaluates the source file defining the module but it can't track down and replace all of the references to the old version of the objects that module defined - for example, your run function. The imp.reload function is essentially the same.
twisted.python.rebuild tries to address this problem but using it correctly takes some care (and more likely than not there are edge cases that it still doesn't handle properly).
Whether any of these code reloading tools will work in your application or not is extremely sensitive to the minute, seemingly irrelevant details of how your application is written.
For example,
import somemodule
reload(somemodule)
somemodule.foo()
can be expected to run the newest version of somemodule.foo. But...
from somemodule import foo
import somemodule
reload(somemodule)
foo()
can be expected not to run the newest version of somemodule.foo. There are even more subtle rules for using twisted.python.rebuild successfully.
Since your question doesn't include any of the actual code from your application, there's no way to know which of these cases you've run into (resulting in the inability to reliably update your objects to reflect the latest version of their source code).
There aren't any great solutions here. The solution that works the most reliably is to restart the process. This certainly clears out any old code/objects and lets things run with the newest version (though not 100% of the time - for example, timestamp problems on .py and .pyc files can result in an older .pyc file being used instead of a new .py file - but this is pretty rare).
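If you do go the restart route, one option (just a sketch, not anything built into Twisted) is to have the process replace itself with a fresh interpreter once an upload or edit completes, so nothing imported by the old process survives:
import os
import sys

def restart_server():
    # Hypothetical helper: re-exec the current interpreter with the same
    # arguments. The new process imports every module from scratch.
    os.execv(sys.executable, [sys.executable] + sys.argv)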
Another approach is to use execfile (or exec) instead of import. This bypasses the module system entirely (and therefore its layer of "caching"). It puts the entire burden of managing the lifetime of the objects defined by the source you're loading onto you. It's more work, but it also means there are fewer surprises coming from other levels of the runtime.
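For example, a minimal sketch of the exec approach (the path argument and the default function name are just placeholders for whatever your upload handler stores) might look like this:
def run_user_script(path, func_name='run'):
    # Re-evaluate the source on every call; nothing is cached in sys.modules,
    # so the newest file contents always win.
    namespace = {}
    with open(path) as f:
        source = f.read()
    exec(compile(source, path, 'exec'), namespace)
    return namespace[func_name]()
The trade-off is exactly the one described above: you now own the lifetime of everything defined in that namespace.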
And of course it is possible to do this with reload or twisted.python.rebuild if you're willing to go through all of your code for interacting with user modules and carefully audit it for left-over references to old objects. Oh, and any library code you're using that might have been able to get a reference to those objects, too.

Related

How far can Python go?

I started Python a few months ago for school projects, and I was pretty surprised at the file editing power it had without asking for any permissions. My question is, how far can Python go? Can it delete system files? Can it delete normal files? I also saw a video (which I didn't click) saying that Python malware is really easy to make... So I am just really curious how far it goes, mostly because my IDE didn't even need admin permissions to be installed...
P-S: not sure if this is appropriate to stack overflow, kinda new here :)
Python can go just as far as the user running Python can. If you have the right to delete a file, then if you start Python and run a script or issue a command that deletes a file, Python will be allowed to. Python will be acting under your user account.
Having said that, it's not always obvious what user is running Python exactly. Normally, if you start Python yourself and pass it a script, or run some commands interactively, it'll be you.
But if, for example, you start Python from a scheduled task, the user running Python won't be you by default, but some sort of system account which may have more restricted rights.
On the other hand, if you're not allowed to do something (say access a folder that has restricted access for other users only), you can still write a Python script that tries to perform the actions. If you were to run that script, it would fail, but if one of those other users logs on and runs the same script, it will succeed.
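As a tiny illustration (the path is just an example, and the exception name assumes Python 3), the very same script succeeds or fails depending on who runs it:
import os

try:
    os.remove('/etc/hosts')   # a file that ordinary accounts cannot delete
    print('Deleted - this account had the right to do it.')
except PermissionError:
    print('The OS refused - this account lacks the right to delete that file.')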
Python is restricted in that it doesn't contain libraries for every function imaginable (although you could probably write them yourself in Python, given enough time). To get around that, you typically install third party packages, created by people that have already written that code, further extending what Python can do. But no package should be able to get around the restrictions the OS imposes on the user running Python.
To get a sense of how complete Python is, even without third party packages, have a look at the Python Standard Library. All those things can be done with standard Python, provided the user running it is allowed to.

How to make VS Code detect / auto-reload modules after editing them?

I've seen a few questions asking this, but none of the solutions worked for me.
I am developing a few functions/classes in different modules and have a main.py script that calls everything.
The problem is, when I make a change to a function in another module, i.e. module1.py, VS Code does not detect the changes: when I call the function in main.py after updating, it's still the older version.
I can get around this by doing something like:
from importlib import reload
reload(module1)
but this gets old real quick, especially when I'm importing specific functions or classes from a module.
Simply re-running the imports at the top of my main.py doesn't actually do anything; I can only pick up the changes if I kill the shell and reopen it from the beginning, which is not ideal when I'm developing something incrementally.
I've read on a few questions that I could include this:
"files.useExperimentalFileWatcher" : true
into my settings.json, but it does not seem to be a known configuration setting in my version, 1.45.1.
This is something Spyder handles by default, and it makes it very easy to code incrementally when calling functions and classes from multiple modules in the package you are developing.
How can I achieve this in VS Code? To be clear, I don't want to use the IPython autoreload magic command.
Much appreciated
FYI, here are some of the other questions I saw but did not get a working solution out of, amongst others with similar questions/answers:
link1
link2
There is no support for this in VS Code as Python's reload mechanism is not reliable enough to use outside of the REPL, and even then you should be careful. It isn't a perfect solution and can lead to stale code lying about which can easily trip you up (and I know this because I wrote importlib.reload() 😁).

Case sensitivity with names of modules and files in Python 2.7.15

I have encountered a rather funny situation: I work in a big scientific collaboration whose major software package is based on C++ and Python (still 2.7.15). This collaboration also has multiple servers (SL6) to run the framework on. Since I joined the collaboration recently, I received instructions on how to set up the software and run it. All works perfectly on the server. Now, there are reasons not to connect to the server to do simple tasks or code development; instead it is preferable to do this kind of thing on your local laptop. Thus, I set up a virtual machine (Docker) according to a recipe I received, installed a couple of things (fuse, cvmfs, Docker images, etc.) and in this way managed to connect my MacBook (OSX 10.14.2) to the server where some of the libraries need to be sourced in order for the software to be compiled and run. And after 2h it does compile! So far so good...
Now comes the fun part: you run the software by executing a specific python script which is fed as argument another python script. Not funny yet. But somewhere in this big list of python scripts sourcing one another, there is a very simple task:
import logging
variable = logging.DEBUG
This is written inside a script that is called Logging.py. So the script and the library differ only in the first letter: l or L. On the server, this runs perfectly smoothly. On my local VM setup, I get the error
AttributeError: 'module' object has no attribute 'DEBUG'
I checked the Python versions (which python) and the location of the logging library (print logging.__file__), and in both setups I get the same result for both commands. So the same Python version is run and the same logging library is sourced, but in one case there is a mix-up with the name of the file that sources the library.
So I am wondering, if there is some "convention file" (like a .vimrc for vi) sourced somewhere where this issue could be resolved by setting some tolerance parameter to some other value...?
Thanks a lot for the help!
conni
As others have said, OS X treats filenames as case-insensitive by default, so the Python bundled logging module will be confused with your Logging.py file. I'd suggest the better fix would be to get the Logging.py file renamed, as this would improve the compatibility of the code base. Otherwise, you could create a case-sensitive APFS file system using Disk Utility.
If you go with creating a file system, I'd suggest not changing the root/system partition to case-sensitive, as this will break various programs in subtle ways. You could either repartition your disk and create a case-sensitive filesystem, or create an image (this might be slower, not sure by how much) and work in there. Just make sure you pick the "APFS (Case-sensitive)" format when creating the filesystem!

Debugging IDAPython scripts outside of IDA Pro

I'm kinda new to scripting for IDA - nevertheless, I've written a complex script I need to debug, as it is not working properly.
It is composed of a few different files containing a few different classes.
Writing line-by-line in the command line is not effective, for obvious reasons.
Running a whole script from a file doesn't allow debugging.
Is there a way of using the idc, idautils, idaapi not from within IDA?
I've written the script on PyDev for Eclipse, I'm hoping for a way to run the scripts from within it.
A similar question: can the API modules I have mentioned work on IDB files without IDA having them loaded?
Thanks.
Now, I may be wrong, as I haven't written any IDA scripts for a long time, but as far as I remember the answer to your first question is no. There is a part of IDA that loads the script and prepares the whole environment, so you could re-implement it and create your own environment; however, I would not recommend that.
What I can tell you is to consider running your script from the command line if automation is what you are aiming for. IDAPython (as well as any other IDA plugin) has good support for running scripts from the command line. For performance you can also run the TUI version of IDA.
There is also a hack that enables you to launch a new Python interpreter in the middle of an IDA script. It is useful for inspecting the current state, though you will still need to edit the Python file every time you want to launch the interpreter.
Here is the hack:
import code
# Merge globals and locals into one namespace and drop into an interactive
# interpreter there, so the script's current state can be inspected in place.
namespace = dict(globals())
namespace.update(locals())
code.interact(local=namespace)
Anyway - logs are good and debug prints are OK.
Good luck :)
We've just got a notice from one of our users that the latest version of WingIDE supports debugging of IDAPython scripts. I think there are a couple of other programs using the same approach (import a module to do RPC debugging) that might work.

How do I make PyDev + Jython start up faster when running a script?

I'm working with PyDev + Jython. Great IDE, but it's quite slow when I try to run a Jython program.
This is probably due to library load time.
What can I do to speed it up?
Thanks ,
yaniv
Jython startup time is slow ... there's a lot to bootup!
Every time you run a Jython script from scratch, it incurs that same startup cost.
Hence the reason Jython, Java, and Python are not great for CGI invocations, and hence the reason for mod_python in Apache.
The key is to start-up Jython once and reuse it. But this is not always possible especially during development because your modules are always changing and Jython does not recognize these changes automatically.
Jython needs a way to know which modules have changed for automatic reloads. This is not built into Jython and you'll have to rely on some other third-party library to help with this. The concept is to remove from sys.modules the modules which have changed. A simple solution is to just clear all the modules from sys.modules, which will cause all modules to be reloaded. This is obviously not the most efficient solution.
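A crude version of that idea (just a sketch; the package name is only an assumption about how your own code is laid out) looks like this:
import sys

def forget_modules(prefix):
    # Drop matching entries from sys.modules so the next import
    # re-evaluates the source files from scratch.
    for name in list(sys.modules):
        if name == prefix or name.startswith(prefix + '.'):
            del sys.modules[name]

forget_modules('mypackage')   # hypothetical package name
import mypackage              # re-imported fresh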
Another tip is to only import modules that your module needs at the time it 'really' needs them. If you import every module at the top of your modules, that will increase your module import cost. So, refactor imports into the methods/functions where they are needed and where it 'makes sense'. Of course, if your method/function is computation-heavy and is used frequently, it does not make sense to import modules within that method/function.
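For example (the function and module names here are only illustrative):
def export_report(rows, path):
    # Deferred import: the csv module is only loaded the first time a report
    # is actually exported, not when this module is imported.
    import csv
    with open(path, 'w') as f:
        csv.writer(f).writerows(rows)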
Hopefully, that helps you out!
If you have a machine with more than one processor, you could try starting Eclipse/PyDev with the options -vmargs -XX:+UseParallelGC. You could also try different JVMs to see if any of them give better performance.
