When I run a Python script in VS Code, execution is delayed by more than a second if, for example, Pandas or NumPy appear in the script's import statements.
If only libraries from the Python Standard Library are used in the imports, the script will start immediately.
A second doesn't sound like a lot, but it is to me because I've used Spyder so far, where the same script starts immediately without spending any noticeable time on imports. I was wondering if this is normal in VS Code or if there are configuration parameters to speed up import time.
Edit
A minimal example would be a script with content
import collections
import pandas
print("a string")
which in my opinion should only take a few milliseconds (not noticeable) for complete processing after clicking the "run" button. Without the pandas import, it actually does.
I think this is an important aspect, because slow "import speeds" hinder the unit-testing workflow.
pandas imports numpy, and both of those are very large packages with many C DLLs. It takes many seconds for them to load. Once the DLLs have been loaded into the Windows file cache, imports should be quicker until the cached pages age out. It's just a fact of how these packages work.
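You can see the effect directly by timing the import yourself; a minimal sketch (run it twice: the first, cold run pays for loading the C extensions, while a second run with a warm file cache should be much faster):

import time

start = time.perf_counter()
import pandas  # the cold import pays for loading the C extension DLLs
print(f"import pandas took {time.perf_counter() - start:.2f}s")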
Related
I am running a Python script that calls local modules (with import), and for the past few days it has become very slow.
I could not find the reason for this, neither in other GitHub issues nor on Google, which is why I am posting here.
Providing code won't be of much help, but here is the import that poses a problem:
import latplan
where latplan is this library
But again, this import did not pose any problem at all before...
import statements can be very slow in Python because they are allowed to execute arbitrary code as side effects; they don't just scan for class and def statements.
Now, as with any performance issue, it's hard to guess what exactly is taking so long without profiling. Luckily there's a built-in, specialised profiler for import time:
python -X importtime -c "import latplan"
I recommend using tuna to visualise the reports.
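For example (importtime writes its profile to stderr, so redirect that to a file, then open the file with tuna):

python -X importtime -c "import latplan" 2> import.log
tuna import.log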
I apologize in advance if my question is badly formulated, for I don't know if what I need makes any sense.
I'm currently working on a C# project where I need to run the same Python script several times from inside the program, but with different arguments each time.
For this I'm not using IronPython but the ProcessStartInfo class, because I understood that IronPython has problems with certain packages I use. But this can change.
My problem is that although the Python script itself is small and fast, it first needs to import a lot of packages, and that takes a long time. As a result my code is very slow, with 90% of the time spent importing Python packages.
I can't work around the problem by running the Python script a single time with many arguments.
So is there a way to "open a permanent Python console" from C#, where I could import everything once, then run the small script with my first argument, get the result back in C#, then run the script a second time, and so on? Or any other way to optimize this?
Thanks for your help,
Astrosias.
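One way to sketch the "permanent console" idea on the Python side is a small worker script that does all the heavy imports once at startup and then handles one request per line from stdin; the C# side would start it once via ProcessStartInfo (with RedirectStandardInput / RedirectStandardOutput) and write arguments / read results over the pipes. A minimal sketch, assuming a hypothetical worker.py and one JSON object per line as the protocol:

# worker.py - hypothetical long-lived worker: heavy imports happen once, at startup
import sys
import json
import pandas  # stands in for the expensive imports

def handle(args):
    # placeholder for the real computation; args is a list of argument strings
    return {"echo": args, "pandas": pandas.__version__}

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    if line == "quit":  # hypothetical shutdown command sent by the C# side
        break
    # one JSON object per request, flushed so the C# side sees the reply immediately
    print(json.dumps(handle(line.split())), flush=True)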
I've encountered this issue with two separate modules now, one that I attempted to download myself (Quartz; could probably be the way I installed it, but let's ignore this scenario for now) and another that I installed using pip install (Pandas; let's focus on this one).
I wrote a two-line script that includes just import pandas and print('test'), for testing purposes. When I execute this in the terminal, instead of printing test to confirm the script runs correctly, it prints the docstring for another completely unrelated script:
[hidden]~/Python/$ python3 test.py
Usage: python emailResponse.py [situation] - copy situation response
The second line is a docstring I wrote for a simple fetch script for responding to emails, which is unrelated. What's worse, if I just invoke Python3 in the terminal and try import pandas, it prints that same docstring and drops me out of Python3 and back into the terminal shell / bash (sorry if this is not the right verbiage; still learning). The same thing happens with import Quartz, but no other modules are affected (at least, that I'm aware of).
I'm at a complete loss why this might be the case. It was easy enough to avoid using Quartz, but I need Pandas for work purposes and this issue is starting to directly affect my work.
Any idea why this might be the case?
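One hedged way to investigate: output like that usually means Python is resolving the module name to some local file rather than the real package, so you can ask it which file it would load, without actually importing anything. A small diagnostic sketch:

# diagnostic sketch: find out which file Python would load for "pandas"
import importlib.util

spec = importlib.util.find_spec("pandas")
print(spec.origin if spec else "pandas not found")
# if this points somewhere unexpected (e.g. into ~/Python/), a local file is shadowing the package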
I have a problem with the debugger when some modules in my code call each other.
Practical example:
A file dog.py contains the following code:
import cat
print("Dog")
The file cat.py is the following:
import dog
print("Cat")
When I run dog.py (or cat.py) I don't have any problem and the program runs smoothly.
However, when I try to debug it, the whole of Spyder freezes and I have to kill the program.
Do you know how can I fix this? I would like to use this circular importing, as the modules use functions that are in the other modules.
Thank you!
When I run dog.py (or cat.py) I don't have any problem and the program runs smoothly.
AFAICT that's mostly because a script is imported under the special name ("__main__"), while a module is imported under its own name (here "dog" or "cat"). NB: the only difference between a script and a module is how it gets loaded, either passed as an argument to the Python runtime (python dog.py) or imported from a script or another module with an import statement.
(Actually, circular import issues are a bit more complicated than what I describe above, but I'll leave that to someone more knowledgeable.)
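For a concrete illustration with the two files above, running python dog.py prints

Dog
Cat
Dog

because dog.py's body executes twice: once under the module name "dog" (triggered by cat's import dog) and once under "__main__".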
To make a long story short: except for this particular use case (which is actually more of a side effect), Python does not support circular imports. If you have functions (classes, whatever) shared by other scripts or modules, put them in a separate module, as sketched below. Or if you find that two modules really depend on each other, you may just want to regroup them into a single module (or regroup the parts that depend on each other in one module and everything else in one or more other modules).
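A minimal sketch of that refactoring, using a hypothetical shared module pet_sounds.py:

# pet_sounds.py - shared helpers live in their own module, so no circular import is needed
def make_sound(name):
    return name + " says hello"

# dog.py
import pet_sounds
print(pet_sounds.make_sound("Dog"))

# cat.py
import pet_sounds
print(pet_sounds.make_sound("Cat"))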
Also: unless it's a trivial one-shot util or something that only depends on the stdlib, your script's content is often better reduced to a main function that parses command-line arguments / reads config files / whatever, imports the required modules and starts the effective process, as in the sketch below.
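A sketch of that skeleton (the argument name is just an example):

# script.py - keep the body in main(), behind the __main__ guard
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("name")  # hypothetical argument
    args = parser.parse_args()
    # import the required modules and start the effective process here
    print("Hello, " + args.name)

if __name__ == "__main__":
    main()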
I have a fairly complex python application that uses numpy, pandas, PySide, pyqtgraph, and matplotlib, among other packages. When I bundle the application with cx_Freeze on Windows, it comes in at 349MB.
My problem is the resulting executable has a very long startup time of about 15 seconds. When I say startup time, I mean the amount of time before any code gets executed. I have a simple script that prints "Hello" to the console, and even that takes about 15 seconds to run.
Does anyone know of a solution to this problem, or any ways to debug it? Is it slow because there are so many .dll files from so many packages?
EDIT:
Using a great tool called Process Monitor, I have narrowed down my problem to the pytz module. On one particular load, 20 seconds were spent querying library.zip (where cx_Freeze puts all of the compiled bytecode) for pytz zoneinfo! I recently added pandas as a dependency, and pandas uses pytz.
See this picture for a sampling of the Process Monitor output:
The solution I found is to use Process Monitor to check whether cx_Freeze is spending an unreasonable amount of time loading a particular module. Using that tool, I also found that it was taking a long time (maybe 4 seconds) to load a particular matplotlib font. I removed it and my application worked fine.
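If the time really is going into reading pytz's zoneinfo out of library.zip, one hedged workaround (assuming a cx_Freeze version that supports the zip_include_packages / zip_exclude_packages build options) is to keep that package on disk instead of inside the archive:

# setup.py sketch - keep pytz outside library.zip so its data files are read from disk
from cx_Freeze import setup, Executable

build_exe_options = {
    "zip_include_packages": ["*"],     # put everything else in library.zip
    "zip_exclude_packages": ["pytz"],  # ...except pytz and its zoneinfo data
}

setup(
    name="myapp",
    version="0.1",
    options={"build_exe": build_exe_options},
    executables=[Executable("main.py")],
)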