script has a huge delay at the beginning - python

I have multiple python scripts in various complexity I am running from another application as part of data load steps.
Some are in production for more than 2 years some new.
I have observed lately a script starts and does nothing for 10+ minutes.
Example : The application has the start timestamp as 2022-05-17 15:48:42
The first entry in the log file for this script is 2022-05-17 16:01:04
Script logs the start time as a first step.
The list of libraries imported for the example run above is a short one since this is one of the areas I looked at
import snowflake.connector
import json
import os
import datetime
import threading
from pathlib import Path
Any ideas what could be causing this behavior?
How can I further debug this?
Update : Here is the minimal example of the script
import snowflake.connector
import json
import os
import datetime
import threading
from pathlib import Path
def to_log(msg, include_timestamp=True):
"""Write the log entry to log file"""
# add the current date to the log file name in order to create a file for everyday
log_file = Path(
work_dir
/ (
f"log/kafka_loader_"
+ "{:%Y%m%d}".format(datetime.datetime.now())
+ ".log"
)
)
if include_timestamp:
msg = "{:%Y-%m-%d %H:%M:%S}".format(datetime.datetime.now()) + f" {msg}"
print(msg)
with open(log_file, "a") as l_f:
l_f.write(msg + "\n")
def proces():
...
if __name__ == "__main__":
to_log("Starting a new session")
process()
to_log("Session completed")
The application used to invoke the script is called WhereScape.
In the UI I see the started start timestamp above.

Related

I keep having a error when running the script. It keeps sending "name input_file is not defined"

I am trying to create a python def to unzip a few .gz files within a folder. I know that the main script works if it is not in the def. The script I have created is similar to others I have done but this one give me the error
File "unzip.py", line 24, in
decompressed_files(input_folder)
NameError: name 'input_folder' is not defined
I copied the script below so someone can help me to see where the error is. I haven't done any BioInformatics for the last couple of years and I am a bit rusty.
import glob
import sys
import os
import argparse
import subprocess
import gzip
def decompressed_files(input_folder):
print ('starting decompressed_files')
output_folder=input_folder + '/fasta_files'
if os.path.exists(output_folder):
print ('folder already exists')
else:
os.makedirs(output_folder)
for f in input_folder:
fastqs=glob.glob(input_folder + '/*.fastq.gz')
cmd =[gunzip, -k, fastqs, output_folder]
my_file=subprocess.Popen(cmd)
my_file.wait
print ('The programme has finished doing its job')
decompressed_files(input_folder)
This is done for python 2.7, I know that is old but it is the one that it is installed in my work server.
That's why when you call decompressed_files(input_folder) in the last line, you didn't define input_folder before. you should do it like this :
input_folder = 'C:/Some Address/'
decompressed_files(input_folder)

How to fix delays/lag in python script

I have my python script that executes an mp3 when the current time matches the time specified in a text file. However everything works well but I notice a lag and delay of around 18 seconds before mplayer plays the mp3 file.
Is there anyway of making my python script better in order to get rid of the 18 seconds lag and make the mp3 file play instantaneously?
Here is my python script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# import libraries
import json
import urllib2
from bs4 import BeautifulSoup
import requests
import datetime
import playsound
import os
import subprocess
dateSTR = datetime.datetime.now().strftime('%H:%M')
f = open('/home/pi/test.txt','r')
messagetest = f.read()
newnametest = messagetest.replace("\n","")
f.close()
if (dateSTR) == (newnametest):
os.system("mplayer -ao alsa:device=bluealsa /home/pi/test.mp3")
Try starting mplayer in a subprocess before you actually need it as:
p = subprocess.Popen('mplayer -slave -idle -ao alsa:device=bluealsa', shell=True, stdin=subprocess.PIPE)
That should start up mplayer and have it waiting for when you need it. Then, when you want to play a file, do:
p.communicate(input=b'loadfile /home/pi/test.mp3\n')
I'd create a loop, something like:
from time import sleep
from datetime import datetime
...
done = []
while 1:
dateSTR = datetime.now().strftime('%H:%M')
if (dateSTR) == (newnametest) and not dateSTR in done:
done.append(dateSTR)
os.system("mplayer -ao alsa:device=bluealsa /home/pi/test.mp3")
sleep(1)

Sharing a path string between modules in python

I am trying to make a gui that displays a path to a file, and the user can change it anytime. I have my defaults which are in my first script.The following is a simplified version without any of the gui stuff. But then the user pushes a button and it runs a different script (script2). In this script, the information on the file is read.
script1:
import os
import multiprocessing as mp
import script2
specsfile = mp.Array('c',1000, lock=True)
path_save = mp.Array('c',1000, lock=True)
p = mp.Process(target=script2, args=(specsfile,path_save))
p.start()
specsfile = '//_an_excel_sheet_directory.xlsx'
path_save = '//path/to/my/directory/'
subprocess.call([sys.executable, 'script2.py'])
script2:
import multiprocessing as mp
from script1 import specsfile
from script1 import path_save
print(specsfile)
spec= pd.read_excel(specsfile)
When I run it, it gives me this error: PermissionError: [WinError 5] Access is denied
I'm not sure if I'm initializing this wrong or not. I've never used multiprocessing but I was reading about some recommendations for that when sharing data. so basically I want to initialize a specsfile string and a path_save string but when it changes,I want it to be reflected and sent to specs2 file.

Mounted PATH Dependent import works through CLI, not through script

I have a python module called user_module, which resides in a mounted network location.
In a script I'm using, I need to import that module - but due to NFS issues, sometimes this path isn't available until we actually change directory to the relevant one and\or restarting autofs service.
In order to reproduce the issue and try to WA it - I've manually stopped autofs service, and tried to run my script with my WA - (probably not the most elegant one though):
import os
import sys
from subprocess import call
PATH="/some/network/path"
sys.path.append(PATH)
try:
os.chdir(PATH)
import user_module
except:
print "Unable to load user_module, trying to restart autofs service"
call(['service', 'autofs', 'restart'])
os.chdir(PATH)
import user_module # Throws Import error!
But, I still get import error due to path unavabilable.
Now this is what I find weird - On the same machine, I've tried executing the same operations as in my script, with intentionally pre stopping autofs service, and it works perfect -
[root#machine]: service autofs stop # To reproduce the import error
[root#machine]: python
>>> import os
>>> import sys
>>> from subprocess import call
>>> PATH="/some/network/path"
>>> sys.path.append(PATH)
>>> os.chdir(PATH)
######################################################################
################## exception of no such file or directory ############
######################################################################
>>> call(['service', 'autofs', 'restart'])
>>> os.chdir(PATH) # No exception now after restarting the service
>>> import user_module # NO Import error here
Can someone shed some light on the situation
and explain to me why same methodology works through python CLI, but through a script?
What is it that I don't know or what is it that I'm missing here?
Also - How to overcome this?
Thanks

Python subprocess script failing

Have written the below script to delete files in a folder not matching the dates in the "keep" period. Eg. Delete all except files partly matching this name.
The command works from the shell but fails with the subprocess call.
/bin/rm /home/backups/!(*"20170920"*|*"20170919"*|*"20170918"*|*"20170917"*|*"20170916"*|*"20170915"*|*"20170914"*)
#!/usr/bin/env python
from datetime import datetime
from datetime import timedelta
import subprocess
### Editable Variables
keepdays=7
location="/home/backups"
count=0
date_string=''
for count in range(0,keepdays):
if(date_string!=""):
date_string+="|"
keepdate = (datetime.now() - timedelta(days=count)).strftime("%Y%m%d")
date_string+="*\""+keepdate+"\"*"
full_cmd="/bin/rm "+location+"/!("+date_string+")"
subprocess.call([full_cmd], shell=True)
This is what the script returns:
#./test.py
/bin/rm /home/backups/!(*"20170920"*|*"20170919"*|*"20170918"*|*"20170917"*|*"20170916"*|*"20170915"*|*"20170914"*)
/bin/sh: 1: Syntax error: "(" unexpected
Python version is Python 2.7.12
Just as #hjpotter said, subprocess will use /bin/sh as default shell, which doesn't support the kind of globbing you want to do. See official documentation. You can change that using the executable parameter to subprocess.call() with a more appropriate shell (/bin/bash or /bin/zsh for example): subprocess.call([full_cmd], executable="/bin/bash", shell=True)
BUT you can be a lot better served by Python itself, you don't need to call a subprocess to delete a file:
#!/usr/bin/env python
from datetime import datetime
from datetime import timedelta
import re
import os
import os.path
### Editable Variables
keepdays=7
location="/home/backups"
now = datetime.now()
keeppatterns = set((now - timedelta(days=count)).strftime("%Y%m%d") for count in range(0, keepdays))
for filename in os.listdir(location):
dates = set(re.findall(r"\d{8}", filename))
if not dates or dates.isdisjoint(keeppatterns):
abs_path = os.path.join(location, filename)
print("I am about to remove", abs_path)
# uncomment the line below when you are sure it won't delete any valuable file
#os.path.delete(abs_path)

Categories