I have a very strange problem that none of my dev coworkers have within the same codebase.
My pytest suite takes about 2 minutes to collect tests; once the first test starts, the rest run at normal speed.
For other devs, this 2-minute wait doesn't exist: pytest collects the tests and runs them in about the same total time mine takes once it actually gets going.
Even stranger: if I'm connected to my phone's hotspot (or to one particular office Wi-Fi network), the 2-minute wait disappears and pytest collects and runs tests at the same speed as for other devs. On any other Wi-Fi network, or with no network connection at all, it hangs.
None of our tests require an internet connection, so I'm at a loss as to what to try. I wasn't able to find similar posts online, so I thought I'd ask here.
I have updated pytest to the latest version and reinstalled my whole dev environment, to no avail. I've also tried running pytest with different flags, but so far nothing changes or reveals the cause.
I'm on a 13" MacBook M1 with 8 GB RAM (other devs on M1 machines don't have this problem, by the way, only me). Any ideas?
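One thing I still plan to check, given the Wi-Fi dependence, is whether resolving my machine's own hostname stalls during collection. That's only a guess, not something I've confirmed is related, but it's a quick test:

```python
# Quick diagnostic: time how long it takes to resolve this machine's own hostname.
# If this takes on the order of a minute on the "bad" networks but is instant on
# the hotspot, the delay is likely DNS/mDNS behaviour rather than pytest itself.
import socket
import time

start = time.time()
try:
    socket.gethostbyname(socket.gethostname())
except socket.gaierror as exc:
    print(f"lookup failed: {exc}")
print(f"hostname lookup took {time.time() - start:.1f}s")
```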
Other team members and I develop, test, and debug our compute-intensive Python code on a cloud-based Linux server, using large datasets and many CPUs/GPUs. During the day there can be one or more users with interactive sessions on this machine (e.g. an SSH console or PyCharm over SSH), specifically so we can debug.
The cloud instance we run on costs $10-$20 per hour, which is fine as long as people are using it. We try to remember to shut it down manually when nobody is using it (which requires checking that others aren't logged in). Sometimes we forget, which can cost ~$300 overnight or $1,000 if left idle over a weekend. Note that user sessions can be set to timeout on the client side by configuring OpenSSH, but that leaves the server running.
How can I set up scripts and configuration that either:
detect that all interactive users have been idle for X hours ("ideal" condition); or
detect that there have been no interactive sessions for X >= 0 hours ("good-enough" condition); and
run sudo shutdown now when the condition is detected?
I'm aware that on AWS (for example) there are hacky/complex/proprietary/unreliable ways to sort of do this by setting up external monitoring services, and I assume GCP and Azure have similar kludges. We may want to do this on any of those cloud platforms (AWS, GCP, Azure), but on all of them we'd likely use Ubuntu 20.04+ as the common environment, so I'm looking for implementations that can be coded at the Ubuntu/Linux level.
I would prefer solutions based on bash or Python. Assume all users are sudoers.
I've already tried proprietary services that are unreliable and not portable.
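To give a sense of the flavor I'm after, here is a rough sketch of the "good-enough" variant as I imagine it: a Python script run from root's cron every few minutes. The threshold, the marker-file path, and the reliance on who are placeholders/assumptions, not a tested solution.

```python
#!/usr/bin/env python3
# Sketch: shut the machine down after IDLE_HOURS with no interactive sessions.
# Intended to be run from root's crontab every few minutes.
import subprocess
import time
from pathlib import Path

IDLE_HOURS = 2                                   # placeholder threshold
STATE_FILE = Path("/var/tmp/last_seen_session")  # records when a session was last observed

def sessions_exist() -> bool:
    """True if `who` reports at least one interactive login (SSH or console)."""
    out = subprocess.run(["who"], capture_output=True, text=True, check=True).stdout
    return bool(out.strip())

def main() -> None:
    now = time.time()
    if sessions_exist():
        STATE_FILE.write_text(str(now))
        return
    try:
        last_seen = float(STATE_FILE.read_text())
    except (FileNotFoundError, ValueError):
        STATE_FILE.write_text(str(now))
        return
    if now - last_seen >= IDLE_HOURS * 3600:
        subprocess.run(["/sbin/shutdown", "now"], check=False)  # full path for cron's minimal PATH

if __name__ == "__main__":
    main()
```

The "ideal" variant would presumably look at per-session idle times (e.g. the IDLE column of w) rather than just whether any session exists.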
I use gRPC for Python RPC on the same machine. It had been working great until yesterday, when all of a sudden it became very slow: the helloworld example now takes about 78 seconds to complete. I tested it on three computers on the same network, all Ubuntu 18.04, with the same results. At home, the same example runs almost instantaneously. I suspect some networking issue, maybe an automatic update on the gateway, but I'm at a loss as to how to troubleshoot the problem. Any suggestions?
EDIT:
I still don't know what happened, but I found a workaround. Replacing localhost with 127.0.0.1 in the grpc.insecure_channel connection string makes gRPC responsive again.
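For reference, here is the workaround applied to the stock helloworld client (the module names and port 50051 are the ones the gRPC example uses; adjust for your own service). My unverified guess is that this sidesteps whatever name-resolution path for localhost became slow on our network.

```python
# helloworld greeter client, with the IP literal in place of "localhost"
import grpc
import helloworld_pb2
import helloworld_pb2_grpc

with grpc.insecure_channel("127.0.0.1:50051") as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello(helloworld_pb2.HelloRequest(name="you"))
    print("Greeter client received: " + response.message)
```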
I'm using Docker (docker-compose) on macOS. When I run the Docker containers and attach Visual Studio Code (VSCode) to the running app container, the hyperkit process can go crazy :( and the MacBook fans have to run at full speed to try to keep the temperature down.
When using VSCode on Python files, I noticed that actions that scan/parse the file, such as pylint runs, push the hyperkit CPU usage to the max and send the MacBook fans to full speed :(. Hyperkit CPU usage goes down again once pylint is finished.
When using VSCode to debug my Django app, the hyperkit CPU usage maxes out again. While I'm actively debugging, hyperkit goes wild, but it does settle down again afterwards.
I'm currently switching from "bind mounts" to "volume mounts". I think I see some improvement, but I haven't done enough testing to say anything conclusive. So far I've only switched my source code from a bind mount to a volume mount; I'll do the same for my static files and database and see whether that results in improvements.
You can check out this stackoverflow post on Docker volumes for some more info on the subject.
Here are some pages I found regarding this issue:
https://code.visualstudio.com/docs/remote/containers?origin_team=TJ8BCJSSG
https://github.com/docker/for-mac/issues/1759
Any other ideas on how to keep the hyperkit process under control❓
[update 27 March] Docker debug mode was set to TRUE; I've changed it to FALSE, but I have not seen any significant improvement.
[update 27 March] I'm now using the "delegated" option for my source code (app) folder, and first impressions are positive. I'm seeing significant performance improvements; we'll have to see if it lasts 😀
FYI, the Docker docs on delegated: the container's view is authoritative (permit delays before updates on the container appear in the host).
[update 27 March] I've also reduced the number of CPU cores Docker Desktop can use (Settings → Advanced). Hopefully this keeps the CPU from getting too hot.
I "solved" this issue by using http://docker-sync.io to create volumes that I can mount without raising my CPU usage at all. I am currently running 8 containers (6 Python and 2 node) with file watchers on and my CPU is at 25% usage.
I created a very simple test, using the Python Nose test framework, that launches and closes the software I was working on, in order to track down a bug in its start-up sequence.
The test was set up to launch and close the software about 1,500 times in a single execution.
A few hours later, I discovered that the test was no longer able to launch the software after around 300 iterations: it was timing out while waiting for the process to start. As soon as I logged back in, the test started launching the process without any problem, and all the tests started passing again.
This is quite puzzling to me. I have never seen this behavior before, and it never happened on Windows.
I am wondering whether there is some sort of power-saving state in which the Mac waits for currently running processes to finish and prevents new processes from starting.
I would really appreciate it if anybody could shed some light on this.
I was running Python 2.7.x on High Sierra.
I am not aware of any state where the system flat out denies new processes while old ones are still running.
However, I can easily imagine a situation in which a process may hang because of some unexpected dependency on, e.g., the window server. For example, I once noticed that rsvg-convert, a command-line SVG-to-image converter, running in an SSH session, had different fonts available to it depending on whether I was also simultaneously logged in on the console. This behavior went away when I recompiled the SVG libraries to exclude all references to macOS-specific libraries...
I am currently trying to run a long-running Python script on Ubuntu 12.04. The machine is a DigitalOcean droplet. The script has no visible memory leaks (top shows constant memory usage). After running without incident (no uncaught exceptions, and the memory used does not increase) for about 12 hours, the script gets killed.
The only syslog messages relating to the script are:
Sep 11 06:35:06 localhost kernel: [13729692.901711] select 19116 (python), adj 0, size 62408, to kill
Sep 11 06:35:06 localhost kernel: [13729692.901713] send sigkill to 19116 (python), adj 0, size 62408
I've encountered similar problems before (with other scripts) on Ubuntu 12.04, but in those cases the logs contained the additional information that the scripts were killed by the oom-killer.
Those scripts, as well as this one, occupy a maximum of 30% of available memory.
Since I can't find any problems with the actual code, could this be an OS problem? If so, how do I go about fixing it?
Your process was indeed killed by the oom-killer; the log message "select … to kill" hints at that.
Your script probably didn't do anything wrong; it was selected to be killed because it was using the most memory at the time.
You have to provide more free memory: add more (virtual) RAM if you can, move other services from this machine to a different one, or try to optimize the memory usage of your script.
See e.g. Debug out-of-memory with /var/log/messages for debugging hints. You could try to spare your script from being killed (see How to set OOM killer adjustments for daemons permanently?), but killing some other process more or less at random can leave the whole machine in an unstable state. In the end you will have to sort out your memory requirements and make sure enough memory is available for peak loads.
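If you do go down the OOM-adjustment route, here is a minimal sketch of the idea, assuming a kernel recent enough to expose oom_score_adj and enough privileges to lower the score (negative values need root or CAP_SYS_RESOURCE):

```python
# Minimal sketch: make the current process a less attractive victim for the OOM killer.
# -1000 would exempt the process entirely, which is usually a bad idea because the
# kernel then has to kill something else instead.
import os

def lower_oom_score(adj: int = -500) -> None:
    with open(f"/proc/{os.getpid()}/oom_score_adj", "w") as f:
        f.write(str(adj))

lower_oom_score()
```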