celery task fails with dlib cnn face detection - python

I have a Flask app with the following stack:
Python 3.7.2
Flask 1.0.2
Celery 4.3.0rc2
dlib 19.6
Without Celery, running face detection as a regular single-threaded program, everything works fine, but when I start it as a task the following error occurs:
Task handler raised error: WorkerLostError('Worker exited prematurely:
signal 11 (SIGSEGV).')
on this piece of code:
dlib.cnn_face_detection_model_v1(model)
And I don't get why, but with this:
dlib.get_frontal_face_detector()
it works fine!
I know the conflict is caused by dlib (or BLAS) not being thread-safe, but is there any way to disable multiprocessing or otherwise make Celery workers work with this?
UPD:
I have following project structure:
./app/face_detector.py
./tasks.py
In tasks.py I'm using this import at the top of the file:
from app.face_detector import FaceDetector
The trick of importing inside the task:
@app.task
def foo():
    from app.face_detector import FaceDetector
Doesn't work at all and throws this:
no module named...
So either I misunderstood the solution from the GitHub thread, or it isn't working.
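One workaround that is often suggested for non-thread-safe native extensions is to avoid creating the detector at import time and instead build it once per worker process, so the model never crosses process boundaries. This is only a sketch under assumptions, not the solution from the GitHub thread; the broker URL and the model filename are placeholders:

# tasks.py -- sketch: build the dlib CNN detector once per worker process
import dlib
from celery import Celery
from celery.signals import worker_process_init

app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker URL

cnn_detector = None

@worker_process_init.connect
def load_detector(**kwargs):
    # Runs in each worker process after it starts, so the model is never
    # shared between the parent and the forked children.
    global cnn_detector
    cnn_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')  # placeholder model path

@app.task
def detect_faces(image_path):
    img = dlib.load_rgb_image(image_path)  # available in recent dlib releases; any RGB numpy array works here
    return len(cnn_detector(img, 1))

Alternatively, starting the worker with celery -A tasks worker --pool=solo (or with --concurrency=1 on the prefork pool) keeps everything in a single process, which is frequently reported to avoid the SIGSEGV; which variant helps can depend on how dlib/BLAS was built.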

Related

Module not found error when running script as wsgi within Apache

I have developed a server application using pywps on a Flask server and now I'm trying to migrate it to an Apache server.
Tech stack:
OS: Windows Server
Python 3.9.6
Apache 2.4.48 (Win64)
mod-wsgi 4.8.0
I can open the configured URL but receive a status code 500. The error log says the following:
mod_wsgi (pid=7212, process='', application='127.0.0.1:8008|/wps'): Loading Python script file 'C:/Apache24/wps_env/pywps-flask-master/pywps.wsgi'.
mod_wsgi (pid=7212): Failed to exec Python script file 'C:/Apache24/wps_env/pywps-flask-master/pywps.wsgi'.
mod_wsgi (pid=7212): Exception occurred processing WSGI script 'C:/Apache24/wps_env/pywps-flask-master/pywps.wsgi'.
Traceback (most recent call last):
  File "C:/Apache24/wps_env/pywps-flask-master/pywps.wsgi", line 9, in <module>
    import pywps
ModuleNotFoundError: No module named 'pywps'
The wsgi file in question is:
from pathlib import Path
import pywps
from pywps import Service
from processes.sayhello import SayHello
from processes.csv_input import CSVInputs
processes = [
    SayHello(),
    CSVInputs()
]

application = Service(processes, [Path(r'C:\Users\Jdoe\wps_env\pywps-flask-master\pywps.cfg')])
Now comes the strange thing: I am able to execute this exact same script from PowerShell with no errors at all.
I would rule out path or environment related issues, because I have installed all the pip packages I used in the virtual environment into the global namespace as well, so both interpreters know the same packages. I know this is not best practice, but I'm currently working in a sandbox anyway and this is more a POC than anything else.
Since the WSGI application tries to run, I also assume that my Apache conf is correct.
What am I missing here?
Thank you for your help
ModuleNotFoundError means that a module is missing and that you have to install it. In your case it's PyWPS.
So open a terminal and type
pip install pywps
or
pip3 install pywps
Then, if you rerun your code, it should work.
P.S. You can find the package on PyPI
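If pip reports that PyWPS is already installed but the import still fails only under Apache, mod_wsgi is most likely running a different interpreter or module search path than the PowerShell session. A small diagnostic sketch (not part of the original answer) that can be placed at the top of pywps.wsgi to log both to the Apache error log:

# Temporary diagnostic: anything written to stderr ends up in the Apache error log.
import sys
print("python executable:", sys.executable, file=sys.stderr)
print("sys.path:", sys.path, file=sys.stderr)

# If the site-packages directory containing pywps is missing from sys.path,
# it can be added explicitly (the path below is a placeholder):
# sys.path.insert(0, r'C:\Apache24\wps_env\Lib\site-packages')

Comparing that output with python -c "import sys; print(sys.executable, sys.path)" from PowerShell usually shows where the two environments diverge.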

How do I import numpy into an Apache Beam pipeline, running on GCP Dataflow?

I am attempting to write an Apache Beam pipeline using Python (3.7). I am running into issues importing numpy, specifically when attempting to use numpy in a DoFn transformation class I wrote.
When running on GCP Dataflow, I am getting the following error: NameError: name 'numpy' is not defined.
To start, everything works as one would expect when using the DirectRunner. The issue occurs solely when using the Dataflow runner on GCP.
I believe the problem is related to how scope works on GCP Dataflow, and not to the import itself. For example, I can successfully get the import to work if I add it to the process method inside my class, but I am unsuccessful when I add the import at the top of the file.
I tried both using a requirements file and a setup.py file as command options for the pipeline, but nothing changed. Again, I don't believe the problem is bringing in numpy, but more to do with Dataflow having unexpected scoping of classes/functions.
setup.py file
from __future__ import absolute_import
from __future__ import print_function
import setuptools
REQUIRED_PACKAGES = [
    'numpy',
    'Cython',
    'scipy',
    'google-cloud-bigtable'
]

setuptools.setup(
    name='my-pipeline',
    version='0.0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
Overall, I am running into many issues with "scope" that I am hoping someone can help with, as the Apache Beam documentation really doesn't cover this too well.
from __future__ import absolute_import
from __future__ import division

import argparse
import json
import logging

import apache_beam as beam
import numpy
from apache_beam.options.pipeline_options import PipelineOptions


class Preprocess(beam.DoFn):
    def process(self, element, *args, **kwargs):
        # Demonstrating how I want to call numpy in the process function
        if numpy.isnan(numpy.sum(element['signal'])):
            return [MyOject(element['signal'])]  # MyOject is my own class, not shown here


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_subscription')  # Pub/Sub subscription, passed on the command line
    args, pipeline_args = parser.parse_known_args(argv)
    options = PipelineOptions(pipeline_args)

    p = beam.Pipeline(options=options)
    messages = (p | beam.io.ReadFromPubSub(subscription=args.input_subscription).with_output_types(bytes))
    lines = messages | 'Decode' >> beam.Map(lambda x: x.decode('utf-8'))
    json_messages = lines | "Jsonify" >> beam.Map(lambda x: json.loads(x))
    preprocess_messages = json_messages | "Preprocess" >> beam.ParDo(Preprocess())

    result = p.run()
    result.wait_until_finish()


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
I expect the pipeline to work similarly to how it does when running locally with the DirectRunner, but instead the scoping/importing works differently and is causing my pipeline to crash.
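For what it's worth, the Beam documentation describes this kind of NameError on the DataflowRunner as a pickling/scoping issue: names imported at module level are not shipped to the workers unless the main session is saved along with the pipeline. A sketch of that option, slotted into run() above with everything else unchanged:

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions(pipeline_args)  # pipeline_args as parsed in run() above
# Pickle the main module's global state (including the top-level numpy import)
# so it is restored in each Dataflow worker.
options.view_as(SetupOptions).save_main_session = True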
When you launch an Apache Beam DirectRunner Python program from your desktop, the program runs on your desktop, where you have already installed the numpy library locally. However, you have not informed Dataflow to download and install numpy. That is why your program runs with the DirectRunner but fails with the DataflowRunner.
Edit/create a normal Python requirements.txt file and include all dependencies such as numpy. I prefer to use virtualenv, install the required packages, make sure that my program runs under the DirectRunner, and then run pip freeze to create my package list for requirements.txt. Now Dataflow will know what packages to install so that your program runs on the Dataflow cluster.
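A minimal sketch of that workflow, assuming requirements.txt sits next to the pipeline script (the option names are the standard Beam SDK ones; the project id and bucket are placeholders):

from apache_beam.options.pipeline_options import PipelineOptions

# Generate the dependency list locally first, e.g. with: pip freeze > requirements.txt
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',             # placeholder project id
    temp_location='gs://my-bucket/tmp',   # placeholder bucket
    requirements_file='requirements.txt'  # tells Dataflow to install these packages on the workers
)

The same values can be passed on the command line instead (e.g. --requirements_file requirements.txt).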

ModuleNotFoundError - Airflow error while importing Python file

I created a very simple DAG to execute a Python file using PythonOperator. I'm using a Docker image to run Airflow, but it doesn't recognize the module where I have my .py file.
The structure is like this:
main_dag.py
plugins/__init__.py
plugins/njtransit_scrapper.py
plugins/sql_queries.py
plugins/config/config.cfg
Command to run the docker-airflow image:
docker run -p 8080:8080 -v /My/Path/To/Dags:/usr/local/airflow/dags puckel/docker-airflow webserver
I already tried airflow initdb and restarting the web server, but it keeps showing the error ModuleNotFoundError: No module named 'plugins'.
For the import statement I'm using:
from plugins import njtransit_scrapper
This is my PythonOperator:
tweets_load = PythonOperator(
    task_id='Tweets_load',
    python_callable=njtransit_scrapper.main,
    dag=dag
)
My njtransit_scrapper.py file just collects all tweets for a Twitter account and saves the result in a Postgres database.
If I remove the PythonOperator code and the imports, everything works fine. I have already tested almost everything, but I'm not quite sure whether this is a bug or something else.
Is it possible that, because I only created a volume for the dags folder in the Docker image, it imports just the main DAG and stops there, causing it to not import the entire package?
To help others who might land on this page and get this error because of the same mistake I made, I will record it here.
I had an unnecessary __init__.py file in the dags/ folder.
Removing it solved the problem and allowed all the DAGs to find their dependency modules.
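Applied to the structure from the question, that means keeping plugins/ as a package while the dags folder itself stays a plain directory; a sketch of the intended layout (not quoted verbatim from the answer):

main_dag.py (no __init__.py next to this file)
plugins/__init__.py
plugins/njtransit_scrapper.py
plugins/sql_queries.py
plugins/config/config.cfg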

Celery 4.2.1 (windowlicker) not throwing exception in Linux

In my local environment (Windows), Celery seems to work fine when the following is executed:
try:
    from .redis_repo import redis_store
except ImportError:
    from redis_repo import redis_store
This is the point of entry for my Celery task. In the local environment it works because the try/except fallback succeeds, but in the Linux environment it doesn't.
Any ideas on this?

Segmentation fault when reading LTTng events with Python

I use Ubuntu 16.04, LTTng 2.8.1 and Python 3.5.2. I also installed the python3-babeltrace package. As a first step I recorded some logs exactly as in the LTTng documentation, using lttng create, enable-event, start, stop, destroy. As a second step I wrote a very simple Python program to read the LTTng events, something like this:
from collections import Counter
import babeltrace
import sys

print("Start")
trace_path = sys.argv[1]
print("1-Get Path")

col = babeltrace.TraceCollection()
print("2-TraceCollection")

# (LTTng traces always have the 'ctf' format)
if col.add_trace(trace_path, 'ctf') is None:
    raise RuntimeError('Cannot add trace')
print("3-Add trace by ctf")

for event in col.events:
    print(event.name)
print("4-Get all events")
Then I debugged the program with gdb, and after these outputs:
Start
1-Get Path
2-TraceCollection
3-Add trace by ctf
I got the error:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff565d97f in bt_iter_add_trace ()
from /usr/lib/x86_64-linux-gnu/libbabeltrace.so.1
Does anyone have any idea about this?
I uninstalled all of the packages, and even reinstalled Ubuntu, but each time I got the same error.
I also tried installing Ubuntu 16.10, but with that one I got another error during the lttng-modules package installation.
Update:
I found that neither the babeltrace command nor lttng view worked; both caused the same segmentation fault.
