Reuse a Python module already loaded in memory

Is it possible to reuse a Python module that is already loaded in memory?
Say I have two scripts, loader.py and consume.py, and I want to invoke loader.py once and reuse it from consume.py: the first script loads a big file into memory, and the second one will be invoked many times and should use that big file.
Can I achieve this? I am not familiar with Python, but I guess there should be a way to access a loaded module (script) in memory.
My current implementation attempt looks like this:
loader.py
x = 3
print('module loaded')
consume.py
from loader import x
print(x)
Update
I have tried to use importlib as described here and here, but my loader module is loaded again every time. Below is my code for consume.py:
import importlib

module = importlib.import_module('loader')
globals().update(
    {n: getattr(module, n) for n in module.__all__}
    if hasattr(module, '__all__')
    else {k: v for k, v in module.__dict__.items() if not k.startswith('_')}
)
print(x)
Final goal
Invoke the consume script many times from Node.js without loading the big file every time; the data needs to be shared between script executions.

Define a function in consume.py that does the work you want to do. In fact, it should all be functions. You could have three files, one where you define functions that load the data, one where you define functions that consume data, and one where you combine them into some process.
For example, one module loads data:
# loader.py
def load_data():
    # load the data here and return it
    ...
One module where you write functions that consume data:
# consume.py
def consume_data(data):
    # do stuff with the data
    ...

def consume_data_differently(data):
    # do other stuff with the data
    ...
and a script that actually does stuff:
# do_stuff.py
from loader import load_data
from consume import consume_data

data = load_data()
for d in data:  # consume pieces of data in a loop
    consume_data(d)
Setting things up like this gives you much more flexibility than relying on the import mechanism to run code, which isn't what it's designed for.
Addendum based on your update: you're making things much harder than they need to be. You really, really don't need to play around with importlib and globals() in normal code. Those are tools for building libraries, not doing data analysis.
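For what it's worth, within a single Python process the expensive load really does happen only once, however many times the consuming functions run; separate runs of consume.py are separate processes, though, so nothing stays cached between them. A minimal sketch, reusing the loader.py from the question:
import loader            # prints 'module loaded' the first time only
import loader            # no output: Python returns the cached module

from loader import x     # also served from the cache, no reload
print(x)                 # -> 3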

Related

I am using a large module in many of my files that takes some time to import. Will importing it in every file waste time?

I have a module that takes a while to import; let's call it big_module. This module creates several variables that I need in my other files. I use this module in many of my helper files, called helper1, helper2, etc.
I have a main file that imports each helper file, so my files would look like this:
# helper1.py
import big_module

def do_stuff1(input):
    # code that uses big_module
    ...

# helper2.py
import big_module

def do_stuff2(input):
    # code that uses big_module
    ...
and so on for the helper files. Then I have my main file:
# main.py
import helper1
import helper2
# and so on

data1 = [...]  # some data
data2 = helper1.do_stuff1(data1)
data3 = helper2.do_stuff2(data2)
# and so on
When I import each helper, and each helper subsequently imports big_module, does big_module get rerun every time, costing me time, or does Python cache it so that it is only run once? And if importing it in several files does waste time, is there a good way to only have to import it once?
No. The importer checks whether the module has already been imported; if so, you get a reference to the existing module, and it is not reimported. You can test this by adding a print at module level to the .py file: you will only see that print once.
It would be very bad if Python reimported the module on every import. That would mean each importer saw a different namespace per import; if the imported module had global variables, there would be a different set of them for each import, and it would be difficult to hold state that is valid for the entire program.
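A quick way to see the caching in action (a sketch; big_module here stands for any module with a print at module level):
# big_module.py
print("big_module loaded")           # module-level code: runs only on the first import

# main.py
import sys
import big_module                    # prints "big_module loaded"
import big_module                    # prints nothing; served from the module cache
print('big_module' in sys.modules)   # -> True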

PyCharm: Embed code of imported functions automatically

I'm more or less the only one in my office who codes little scripts for the analysis of data in Python. Hence I have a small library of helper functions which I reuse every now and then in different scripts; this library is placed centrally in the Anaconda libraries on my machine.
Sometimes, though, I want to "deploy" a script as a standalone version, so it can be run on a plain Python distribution, i.e. without first having to "install" my library files.
Is there any way in PyCharm to automatically replace certain import statements with the code of the imported functions? I know it can be a complex task to determine whether the function code to be embedded relies on further functions which would then also need to be imported (though somehow PyInstaller manages to resolve such dependencies). But mainly I'm talking about functions which have no further dependencies, or only dependencies within the same module...
To give an example:
Maybe my library looks like this:
def readfile(filepath):
    return ...

def readfileAsList(filepath):
    return readfile(filepath).splitlines()

def writefile(contents, filepath, writemode='w'):
    with open(filepath, writemode):
        ...
And my script for some analysis looks like this:
import re
from myLib import readfileAsList
# ANALYSIS CODE GOES HERE ...
Now I want PyCharm to automatically transform the script code and embed readfileAsList as well as readfile (because of the dependency).
So the code would look like:
import re
def readfile(filepath):
    return ...

def readfileAsList(filepath):
    return readfile(filepath).splitlines()
# ANALYSIS CODE GOES HERE ...
Thanks in advance!
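Outside PyCharm, a rough manual approximation (a sketch; it assumes myLib is importable where this runs, and it does not resolve transitive dependencies automatically) is to dump each helper's source with inspect and paste it into the standalone script:
import inspect
import myLib   # the library from the question

# print the source of each helper, ready to paste into the standalone script
for func in (myLib.readfile, myLib.readfileAsList):
    print(inspect.getsource(func))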

Transferring modules between two processes with python multiprocessing

So I have a problem. I'm trying to make my imports faster, so I started using the multiprocessing module to split a group of imports into two functions and run each on a separate core, thus speeding up the imports. But now the code will not recognize the modules at all. What am I doing wrong?
import multiprocessing

def core1():
    import wikipedia
    import subprocess
    import random
    return wikipedia, subprocess, random

def core2():
    from urllib import request
    import json
    import webbrowser
    return request, json, webbrowser

if __name__ == "__main__":
    start_core_1 = multiprocessing.Process(name='worker 1', target=core1, args=core2())
    start_core_2 = multiprocessing.Process(name='worker 2', target=core2, args=core1())
    start_core_1.start()
    start_core_2.start()

    while True:
        user = input('[!] ')
        with request.urlopen('https://api.wit.ai/message?v=20160511&q=%s&access_token=Z55PIVTSSFOETKSBPWMNPE6YL6HVK4YP' % request.quote(user)) as wit_api:  # call to wit.ai api
            wit_api_html = wit_api.read()
            wit_api_html = wit_api_html.decode()
            wit_api_data = json.loads(wit_api_html)
        intent = wit_api_data['entities']['Intent'][0]['value']
        term = wit_api_data['entities']['search_term'][0]['value']
        if intent == 'info_on':
            with request.urlopen('https://kgsearch.googleapis.com/v1/entities:search?query=%s&key=AIzaSyCvgNV4G7mbnu01xai0f0k9NL2ito8vY6s&limit=1&indent=True' % term.replace(' ', '%20')) as response:
                google_knowledge_base_html = response.read()
                google_knowledge_base_html = google_knowledge_base_html.decode()
                google_knowledge_base_data = json.loads(google_knowledge_base_html)
                print(google_knowledge_base_data['itemListElement'][0]['result']['detailedDescription']['articleBody'])
        else:
            print('Something')
I think you are missing the important parts of the whole picture, i.e. some crucial things you need to know about multiprocessing when using it.
Here are the crucial parts you have to know, and then you will understand why you can't just import modules in a child process to speed things up. Even returning the loaded modules is not a perfect answer either.
First, when you use multiprocessing.Process, a child process is forked (on Linux) or spawned (on Windows). I'll assume you are using Linux. In that case, every child process inherits every loaded module from the parent (the global state). When the child process changes anything, such as global variables or newly imported modules, those changes stay in its own context only, so the parent process is not aware of them. I believe part of this can also be of interest.
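A tiny demonstration of that isolation (a sketch; on Linux the child starts with a copy of the parent's state):
import multiprocessing

def child():
    global x
    x = 99                     # modifies the child's copy only

if __name__ == '__main__':
    x = 1
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    print(x)                   # still 1: the parent never sees the change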
Second, a module can be a set of classes, external lib bindings, functions, etc., and some of them quite probably can't be pickled, at least with pickle. Here is the list of what can be pickled in Python 2.7 and in Python 3.x. There are even libraries that give you 'more pickling power', like dill. However, I'm not sure pickling whole modules is a good idea at all, not to mention that you have slow imports and yet you want to serialize them and send them to the parent process. Even if you manage to do it, it doesn't sound like the best approach.
Some ideas on how to change the perspective:
Try to revise which modules you need and why. Maybe you can use other modules that give you similar functionality. Maybe these modules are heavyweight, bring too much with them, and the cost is high compared to what you get.
If you have slow loading of modules, try to make a script that will always be running, so you do not have to run it multiple times (see the sketch after this list).
If you really need those modules, maybe you can split their use across two processes so that each process does its own thing. For example, one process parses a page, the other processes the data, and so on. That way you speed up the loading, but you have to deal with passing messages between processes.
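A minimal sketch of that always-running idea, which also fits the "invoke from Node.js many times" goal from the first question (the file name, helper functions, and line-based JSON protocol are all illustrative assumptions):
# worker.py -- start once (e.g. spawned once from Node.js), reuse forever
import sys
import json

def load_big_file():
    # hypothetical: stands in for the slow imports / big-file loading
    return {'answer': 42}

def handle(data, request):
    # hypothetical: answer one request from the preloaded data
    return {'result': data.get(request.get('key'))}

data = load_big_file()            # the slow part runs exactly once

for line in sys.stdin:            # one JSON request per line
    result = handle(data, json.loads(line))
    print(json.dumps(result), flush=True)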

Load modules conditionally Python

I wrote a main Python module that needs to load a file parser to work. Initially there was only one text parser module, but I need to add more parsers for different cases:
parser_class1.py
parser_class2.py
parser_class3.py
Only one is required for each running instance, so I'm thinking of selecting it on the command line:
mmain.py -p parser_class1
With this in mind, I wrote this code to select the parser to load when the main module is called:
#!/usr/bin/env python
import argparse

aparser = argparse.ArgumentParser()
aparser.add_argument('-p',
                     action='store',
                     dest='module',
                     help='-p module to import')
results = aparser.parse_args()

if not results.module:
    aparser.error('Error! no module')

try:
    exec("import %s" % (results.module))
    print '%s imported done!' % (results.module)
except ImportError, e:
    print e
But I have read that this approach is dangerous and perhaps not standard.
So, is this approach OK, or should I find another way to do it? Why?
Thanks, any comments are welcome.
You could actually just execute the import statement inside a conditional block:
if x:
    import module1a as module1
else:
    import module1b as module1
You can account for various whitelisted module imports in different ways using this, but effectively the idea is to pre-program the imports and then essentially use a GOTO to make the proper imports... If you do want to let the user import any arbitrary argument, then the __import__ function would be the way to go, rather than eval.
Update:
As @thedox mentioned in the comment, the as module1 part is the idiomatic way to load similar APIs with different underlying code.
In the case where you intend to do completely different things with entirely different APIs, that's not the pattern to follow.
A more reasonable pattern in this case would be to include the code related to a particular import with that import statement:
if ...:
    import module1
    # do some stuff with module1 ...
else:
    import module2
    # do some stuff with module2 ...
As for security: if you allow the user to cause an import of some arbitrary code-set (e.g. their own module, perhaps?), it's not much different from using eval on user input. It's essentially the same vulnerability: the user can get your program to execute their own code.
I don't think there's a truly safe way to let the user import arbitrary modules at all. The exception is if they have no access to the file system and therefore cannot create new code to be imported, in which case you're basically back to the whitelist case, and you may as well implement an explicit whitelist to prevent future vulnerabilities if/when the user does gain file-system access at some point.
Here is how to use __import__():
allowed_modules = ['os', 're', 'your_module', 'parser_class1', 'parser_class2']

if not results.module:
    aparser.error('Error! no module')

try:
    if results.module in allowed_modules:
        module = __import__(results.module)
        print '%s imported as "module"' % (results.module)
    else:
        print 'hey what are you trying to do?'
except ImportError, e:
    print e

module.your_function(your_data)
EVAL vs __IMPORT__()
Using eval allows the user to run any code on your computer. Don't do that. __import__() only allows the user to load modules, apparently without letting them run arbitrary code. But it is only apparently safer.
The proposed function without allowed_modules is still risky, since it can allow loading an arbitrary module that may run malicious code when loaded. Potentially an attacker could place a file somewhere reachable (a shared folder, an FTP folder, an upload folder managed by your web server...) and load it via your argument.
WHITELISTS
Using allowed_modules mitigates the problem but does not solve it completely: to harden things further, you still have to check whether an attacker has written an "os.py", "re.py", "your_module.py", or "parser_class1.py" into your script folder, since Python searches for modules there first (docs).
Eventually you may compare parser_class*.py code against a list of hashes, like sha1sum does.
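A sketch of that hash check (the digest value is a placeholder; compute real ones from known-good files):
import hashlib
import importlib

KNOWN_GOOD = {'parser_class1': '<sha1-of-trusted-file>'}   # placeholder digest

def verified_import(name):
    # hash the module file before importing it (assumes it sits in the script folder)
    with open(name + '.py', 'rb') as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    if digest != KNOWN_GOOD.get(name):
        raise ImportError('checksum mismatch for %s' % name)
    return importlib.import_module(name)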
FINAL REMARKS: In the end, if the user has write access to your script folder, you cannot ensure absolutely safe code.
You should think of all of the possible modules you may import for that parsing function and then use a case statement or dictionary to load the correct one. For example:
import parser_class1, parser_class2, parser_class3

parser_map = {
    'class1': parser_class1,
    'class2': parser_class2,
    'class3': parser_class3,
}

if not args.module:
    # report error
    parser = None
else:
    parser = parser_map[args.module]

# perform work with parser
If loading any of the parser_classN modules in this example is expensive, you can define lambdas or functions that return the module (i.e. def get_class1(): import parser_class1; return parser_class1) and change the lookup line to parser = parser_map[args.module](), as sketched below.
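For example (a sketch following that suggestion):
def get_class1():
    import parser_class1          # import deferred until first use
    return parser_class1

def get_class2():
    import parser_class2
    return parser_class2

parser_map = {
    'class1': get_class1,
    'class2': get_class2,
}

parser = parser_map[args.module]()   # only the selected module is imported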
The exec option can be very dangerous because you're executing unvalidated user input. Imagine your user did something like this:
mmain.py -p "parser_class1; some_function_or_code_that_is_malicious()"

How do I override a Python import?

I'm working on pypreprocessor, which is a preprocessor that takes C-style directives, and I've been able to make it work like a traditional preprocessor (it's self-consuming and executes postprocessed code on-the-fly), except that it breaks library imports.
The problem is: the preprocessor runs through the file, processes it, outputs to a temporary file, and exec()s the temporary file. Libraries that are imported need to be handled a little differently, because they aren't executed; rather, they are loaded and made accessible to the calling module.
What I need to be able to do is: interrupt the import (since the preprocessor is being run in the middle of the import), load the postprocessed code as a tempModule, and replace the original import with the tempModule, to trick the calling script into believing that tempModule is the original module.
I have searched everywhere and so far have no solution.
This Stack Overflow question is the closest I've seen so far to providing an answer:
Override namespace in Python
Here's what I have.
# Remove the bytecode file created by the first import
os.remove(moduleName + '.pyc')
# Remove the first import
del sys.modules[moduleName]
# Import the postprocessed module
tmpModule = __import__(tmpModuleName)
# Set first module's reference to point to the preprocessed module
sys.modules[moduleName] = tmpModule
moduleName is the name of the original module, and tmpModuleName is the name of the postprocessed code file.
The strange part is that this solution still runs completely normally, as if the first module had loaded normally; but if you remove the last line, you get a module-not-found error.
Hopefully someone on Stack Overflow knows a lot more about imports than I do, because this one has me stumped.
Note: I will only award a solution, or, if this is not possible in Python, the best, most detailed explanation of why it is not possible.
Update: For anybody who is interested, here is the working code.
if imp.lock_held() is True:
    del sys.modules[moduleName]
    sys.modules[tmpModuleName] = __import__(tmpModuleName)
    sys.modules[moduleName] = __import__(tmpModuleName)
The imp.lock_held() part detects whether the module is being loaded as a library. The following lines do the rest.
Does this answer your question? The second import does the trick.
Mod_1.py
def test_function():
    print "Test Function -- Mod 1"
Mod_2.py
def test_function():
    print "Test Function -- Mod 2"
Test.py
#!/usr/bin/python
import sys
import Mod_1
Mod_1.test_function()
del sys.modules['Mod_1']
sys.modules['Mod_1'] = __import__('Mod_2')
import Mod_1
Mod_1.test_function()
To define a different import behavior or to totally subvert the import process you will need to write import hooks. See PEP 302.
For example,
import sys

class MyImporter(object):
    def find_module(self, module_name, package_path):
        # Return a loader
        return self

    def load_module(self, module_name):
        # Return a module
        return self

sys.meta_path.append(MyImporter())

import now_you_can_import_any_name
print now_you_can_import_any_name
It outputs:
<__main__.MyImporter object at 0x009F85F0>
So basically it returns a new module (which can be any object), in this case itself. You may use it to alter the import behavior by returning processed_xxx on an import of xxx.
IMO: Python doesn't need a preprocessor. Whatever you are accomplishing can be accomplished in Python itself due to its very dynamic nature; for example, taking the debug case, what is wrong with having at the top of the file
debug = 1
and later
if debug:
    print "wow"
?
In Python 2 there is the imputil module, which seems to provide the functionality you are looking for, but it has been removed in Python 3. It's not very well documented, but it contains an example section that shows how you can replace the standard import functions.
For Python 3 there is the importlib module (introduced in Python 3.1), which contains functions and classes to modify the import functionality in all kinds of ways. It should be suitable for hooking your preprocessor into the import system.
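A minimal sketch of such a hook with importlib (the module name and source string are illustrative; a real preprocessor would substitute its postprocessed source here):
import sys
import importlib.abc
import importlib.util

class PreprocessImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'demo_module':        # hypothetical module name
            return importlib.util.spec_from_loader(fullname, self)
        return None                          # let the normal import machinery run

    def exec_module(self, module):
        source = "value = 42"                # stands in for the postprocessed code
        exec(source, module.__dict__)

sys.meta_path.insert(0, PreprocessImporter())

import demo_module
print(demo_module.value)                     # -> 42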
