Deferred tasks create new instances that can't access some Python modules

I am using the latest version of GAE with automated scaling, endpoints API, and deferred.defer() tasks.
The problem is that since adding the API, there have been some instances that will spin up automatically that always throw permanent task failures:
Permanent failure attempting to execute task
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 310, in post
self.run_from_request()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 305, in run_from_request
run(self.request.body)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 145, in run
raise PermanentTaskFailure(e)
PermanentTaskFailure: No module named app.Report
The permanent task failures are confined to a single instance, though: every deferred task on that instance fails. These deferred tasks all throw the same error, even though the tasks aren't using the Api.py module. On other instances, the same deferred tasks run just fine, as long as they aren't routed to a failing instance.
The app.yaml handlers look like this:
handlers:
# Api Handler
- url: /_ah/api/.*
  script: main.api
- url: /_ah/spi/.*
  script: main.api
# All other traffic
- url: .*
  script: main.app
builtins:
- deferred: on
The main.py looks like:
import Api, endpoints, webapp2
api = endpoints.api_server([Api.AppApi])
app = webapp2.WSGIApplication(
    [(misc routes)],
    debug=True)
The Api.py looks like :
import endpoints
from protorpc import messages
from protorpc import message_types
from protorpc import remote
from google.appengine.ext import deferred
from app.Report import ETLScheduler
#endpoints.api(...)
class AppApi(remote.Service):
    #endpoints.method(...)
    def reportExtract(self, request):
        deferred.defer(
            ETLScheduler,
            params
        )
I'm not doing any path modification, so I'm curious why the new instance is having trouble finding the python modules for the API, even though the deferred tasks are in another module using other functions. Why would it throw these errors for that instance only?
Edit:
So after looking at some other SO issues, I tried doing path modification in appengine_config.py. I moved all my folders to a lib directory, and added this to the config file:
import os,sys
sys.path.append(os.path.join(os.path.dirname(__file__), 'lib'))
Now the error I get on the failing instance is:
Permanent failure attempting to execute task
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 310, in post
self.run_from_request()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 305, in run_from_request
run(self.request.body)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 145, in run
raise PermanentTaskFailure(e)
PermanentTaskFailure: cannot import name ETLScheduler
So it seems to be finding the module, but same as before, none of the deferred tasks on the instance can import the method.

So I figured out a way to make it work, but I'm not sure why it works.
Importing the entire module, rather than a single name from it, stops the new instances that spin up for deferred tasks from throwing the PermanentTaskFailure: cannot import name ETLScheduler error.
Api.py now looks like this:
import endpoints
from protorpc import messages
from protorpc import message_types
from protorpc import remote
from google.appengine.ext import deferred
# Import the module instead of the method
#from app.Report import ETLScheduler
import app.Report
#endpoints.api(...)
class AppApi(remote.Service):
    #endpoints.method(...)
    def reportExtract(self, request):
        deferred.defer(
            app.Report.ETLScheduler,
            params
        )
Now I no longer get instances that throw PermanentTaskFailure: cannot import name ETLScheduler. It might have been a circular dependency caused by importing Api.py in main.py (I'm not sure), but at least it works now.
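One plausible explanation for the difference, assuming a circular import really was involved (the post does not confirm this), is that `from module import name` needs the name to exist at import time, while `import module` only needs the module object and looks the name up later. The sketch below reproduces that behaviour with two throwaway modules; `demo_report` and `demo_api` are hypothetical names, and it runs on Python 3 rather than the App Engine Python 2.7 runtime.

```python
import importlib
import os
import sys
import tempfile


def import_from_temp_dir(sources, entry):
    """Write `sources` (name -> code) to a temp dir, import `entry`, return the ImportError text or None."""
    with tempfile.TemporaryDirectory() as directory:
        for name, code in sources.items():
            with open(os.path.join(directory, name + ".py"), "w") as handle:
                handle.write(code)
        sys.path.insert(0, directory)
        importlib.invalidate_caches()
        try:
            importlib.import_module(entry)
            return None
        except ImportError as error:
            return str(error)
        finally:
            # Undo the path change and forget the throwaway modules.
            sys.path.remove(directory)
            for name in sources:
                sys.modules.pop(name, None)


# Hypothetical module: it imports demo_api *before* defining ETLScheduler.
REPORT = "import demo_api\n\ndef ETLScheduler():\n    return 'scheduled'\n"

# Variant 1: pull the name out of a module that is still half-initialised.
from_import_error = import_from_temp_dir(
    {"demo_report": REPORT, "demo_api": "from demo_report import ETLScheduler\n"},
    "demo_report",
)

# Variant 2: import the module and resolve the name only when it is called.
module_import_error = import_from_temp_dir(
    {
        "demo_report": REPORT,
        "demo_api": "import demo_report\n\ndef run():\n    return demo_report.ETLScheduler()\n",
    },
    "demo_report",
)

print("from-import:", from_import_error)
print("module import:", module_import_error)
```

The first variant fails with a "cannot import name" error, and the second imports cleanly because the attribute lookup is postponed until `run()` is called.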

You're missing the _target kwarg in your defer invocation if you're trying to run something in a specific module.
deferred.defer(
    app.Report.ETLScheduler,
    params,
    _target="modulename"
)

Related

Apache Airflow giving broken DAG error cannot import __builtin__ for speedtest.py

This is a weird error I'm coming across. In my Python 3.7 environment I have installed Airflow 2, speedtest-cli and a few other things using pip, and I keep seeing this error pop up in the Airflow UI:
Broken DAG: [/env/app/airflow/dags/my_dag.py] Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/speedtest.py", line 156, in <module>
import __builtin__
ModuleNotFoundError: No module named '__builtin__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/speedtest.py", line 179, in <module>
_py3_utf8_stdout = _Py3Utf8Output(sys.stdout)
File "/usr/local/lib/python3.7/site-packages/speedtest.py", line 166, in __init__
buf = FileIO(f.fileno(), 'w')
AttributeError: 'StreamLogWriter' object has no attribute 'fileno'
For sanity checks I did run the following and saw no problems:
~# python airflow/dags/my_dag.py
/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py:94 DeprecationWarning: provide_context is deprecated as of 2.0 and is no longer required
~# airflow dags list
dag_id | filepath | owner | paused
===========+===============+=========+=======
my_dag | my_dag.py | rafay | False
~# airflow tasks list my_dag
[2021-03-08 16:46:26,950] {dagbag.py:448} INFO - Filling up the DagBag from /env/app/airflow/dags
/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py:94 DeprecationWarning: provide_context is deprecated as of 2.0 and is no longer required
Start_backup
get_configs
get_targets
push_targets
So nothing out of the ordinary, and testing each of the tasks does not cause problems either. Furthermore, running the speedtest-cli script independently outside of Airflow does not raise any errors. The script goes something like this:
import speedtest
# Missing from the original snippet: AirflowException lives in airflow.exceptions
from airflow.exceptions import AirflowException
def get_upload_speed():
    """
    Calculates the upload speed of the internet connection using the speedtest API
    Returns:
        Returns upload speed in Mbps
    """
    try:
        s = speedtest.Speedtest()
        upload = s.upload()
    except speedtest.SpeedtestException as e:
        raise AirflowException("Failed to check network bandwidth make sure internet is available.\nException: {}".format(e))
    return round(upload / (1024**2), 2)
I even went to the exact line of speedtest.py mentioned in the Broken DAG error, line 156. It looks fine and runs fine when I put it in the Python interpreter:
try:
    import __builtin__
except ImportError:
    import builtins
    from io import TextIOWrapper, FileIO
So, how do I diagnose this? Seems like a package import problem of some sort
Edit: If it helps, here is my directory and import structure for my_dag.py
- airflow
  - dags
    - tasks
      - get_configs.py
      - get_targets.py
      - push_targets.py (speedtest is imported here)
    - my_dag.py
The import sequence of tasks in the DAG file is as follows:
from datetime import timedelta
# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.python import PythonOperator
from airflow.operators.dummy import DummyOperator
from tasks.get_configs import get_configs
from tasks.get_targets import get_targets
from tasks.push_targets import push_targets
...
The Airflow StreamLogWriter (and other log-related facilities) does not implement the fileno method expected by "standard" Python I/O clients (confirmed by a todo comment). The same problem occurs when enabling the faulthandler standard library module in an Airflow task.
So what to do at this point? Aside from opening an issue or sending a PR to Airflow, it is really case by case. In the speedtest-cli situation, it may be necessary to isolate the function calling fileno and try to "replace" it (e.g. forking the library, changing the function if it can be isolated and injected, or perhaps choosing a configuration that does not use that part of the code).
In my particular case, there was no way to bypass the code, and a fork was the most straightforward method.
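To make the failure mode concrete, here is a minimal sketch that does not use Airflow or speedtest-cli at all. `LogWriterStandIn` is a hypothetical stand-in for a log stream that has `write()` but no `fileno()`, and `open_raw_writer` mimics a library that insists on a file descriptor. The `swapped_stdout` helper is one possible way to "replace" the stream around the offending call; it is an assumption for illustration, not something the answer or Airflow prescribes.

```python
import contextlib
import io
import sys
import tempfile


class LogWriterStandIn:
    """Hypothetical stream with write() and flush() but no fileno(), like a logging proxy."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)

    def flush(self):
        pass


def open_raw_writer(stream):
    """Mimic a library that needs the stream's underlying file descriptor."""
    return io.FileIO(stream.fileno(), "w", closefd=False)


@contextlib.contextmanager
def swapped_stdout(replacement):
    """Temporarily point sys.stdout at `replacement`, restoring the original afterwards."""
    original = sys.stdout
    sys.stdout = replacement
    try:
        yield
    finally:
        sys.stdout = original


# 1. A stream without fileno() triggers the same kind of AttributeError as in the traceback.
try:
    open_raw_writer(LogWriterStandIn())
    failed = False
except AttributeError:
    failed = True

# 2. With a real file swapped in for sys.stdout, the descriptor-based code works.
with tempfile.TemporaryFile(mode="w+") as real_file:
    with swapped_stdout(real_file):
        raw = open_raw_writer(sys.stdout)
        raw.write(b"ok\n")
        raw.close()  # closefd=False, so the temp file's descriptor stays open
    real_file.seek(0)
    captured = real_file.read()

print(failed, repr(captured))
```

Whether swapping the stream is acceptable depends on what the library writes and where the output should end up; forking, as described above, avoids that question.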

PyCharm & console don't let me use local modules

I'm absolutely frustrated that I can't get started on my Python journey. I have a simple service that I use for practice with Python, which is new to me.
I downloaded PyCharm, and as long as I had one file, everything was fine.
Then I decided to add some structure, and suddenly my project stopped working.
I have a structure like:
project/
project/employees
project/employees/__init__.py
project/employees/employees.py
project/server.py
project/venv/
project/venv/(...)
The project is a source root.
And yet I have something like this:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/project/server.py", line 5, in <module>
from employees.employees import Employees, EmployeesName
File "C:\Users\user\PycharmProjects\project\employees\employees.py", line 4, in <module>
from server import db_connect
File "C:\Users\user\PycharmProjects\project\server.py", line 5, in <module>
from employees.employees import Employees, EmployeesName
ImportError: cannot import name 'Employees'
I tested this with VS Code and CMD, and the same thing happened.
I would be grateful for any suggestions!
EDIT:
employees.py:
from flask_jsonpify import jsonify
from flask_restful import Resource
from server import db_connect
class Employees(Resource):
    (...)
class EmployeesName(Resource):
    (...)
The problem here is that you have a circular dependency.
employees.py imports server.py, and server.py imports employees.py.
You have to rearrange your .py files so that neither module needs to import the other one back.
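One common way to rearrange the files, sketched here under assumptions, is to move the shared object into a third module that both sides import. The module names `demo_database`, `demo_employees` and `demo_server` are hypothetical, and `db_connect` is reduced to a placeholder function because its real definition is not shown in the question.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: the shared helper lives in its own module (demo_database),
# so demo_employees and demo_server never have to import each other in a loop.
SOURCES = {
    "demo_database": "def db_connect():\n    return 'connection placeholder'\n",
    "demo_employees": (
        "from demo_database import db_connect\n\n"
        "class Employees:\n"
        "    def get(self):\n"
        "        return db_connect()\n"
    ),
    "demo_server": "from demo_employees import Employees\n",
}

with tempfile.TemporaryDirectory() as directory:
    # Write the three throwaway modules to disk.
    for name, code in SOURCES.items():
        with open(os.path.join(directory, name + ".py"), "w") as handle:
            handle.write(code)
    sys.path.insert(0, directory)
    importlib.invalidate_caches()
    try:
        # The import chain is now one-directional: server -> employees -> database.
        server = importlib.import_module("demo_server")
        result = server.Employees().get()
    finally:
        sys.path.remove(directory)

print(result)
```

In the real project this would mean moving `db_connect` out of server.py into its own file and importing it from there in both server.py and employees.py.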

Error trying to import between python files

I have a basic parser app I'm building in Python. It monitors a folder and imports files when they are dropped there. I have a MongoDB that I'm trying to save the imports to. There's almost nothing to it. The problem happens when I try to include one of my class/mongo-document files. I'm sure it's a simple syntax issue I don't understand. I have all my requirements installed, and I'm running this in a virtual env. This is my first Python app though, so it's likely something I'm not seeing.
My file structure is
application.py
requirements.txt
__init__.py
-services
    parser.py
    __init__.py
-models
    hl7message.py
    __init__.py
Here is application.py
from mongoengine import connect
import os, os.path, time
from services import parser
db = connect('testdb')
dr = 'C:\\Imports\\Processed'
def processimports():
    while True:
        files = os.listdir(dr)
        print(str(len(files)) + ' files found')
        for f in files:
            msg = open(dr + '\\' + f).read().replace('\n', '\r')
            parser.parse_message(msg)
        print('waiting')
        time.sleep(10)
processimports()
requirements.txt
mongoengine
hl7
parser.py
import hl7
from models import hl7message
def parse_message(message):
    m = hl7.parse(str(message))
    h = hl7message()
hl7message.py
from utilities import common
from application import db
import mongoengine
class Hl7message(db.Document):
    message_type = db.StringField(db_field="m_typ")
    created = db.IntField(db_field="cr")
    message = db.StringField(db_field="m")
If I don't include the hl7message class in parser.py it runs fine, but as soon as I include it I get the error, so I'm sure it has something to do with that file. The error message isn't too helpful, though. I don't know if I've got myself into some kind of include loop or something.
Sorry, stack trace is below
Traceback (most recent call last):
File "C:/OneDrive/Dev/3/Importer/application.py", line 3, in <module>
from services import parser
File "C:\OneDrive\Dev\3\Importer\services\parser.py", line 2, in <module>
from models import hl7message
File "C:\OneDrive\Dev\3\Importer\models\hl7message.py", line 2, in <module>
from application import db
File "C:\OneDrive\Dev\3\Importer\application.py", line 23, in <module>
processimports()
File "C:\OneDrive\Dev\3\Importer\application.py", line 17, in processimports
parser.parse_message(msg)
AttributeError: module 'services.parser' has no attribute 'parse_message'
This is a circular import issue. application.py imports parser, which imports hl7 and then hl7message, which imports application, which runs processimports() before the rest of the parser module has been executed.
It seems to me that service modules should not import application. You could create a new module common.py containing the line db = connect('testdb') and import db from common in both application.py and hl7message.py.

Importing a sub class from a sub folder

I'm new to Python and am trying to set up a simple OOP structure of files, folders and classes. Here are the file paths:
C:\Users\Mc_Topaz\Programmering\Python\Terminal\Main.py
C:\Users\Mc_Topaz\Programmering\Python\Terminal\Connections\Connection.py
C:\Users\Mc_Topaz\Programmering\Python\Terminal\Connections\NoConnection.py
Notice that Connection.py and NoConnection.py are located in the subfolder Connections.
Connection.py
class Connection:
    def __init__(self):
        pass
    def ToString(self):
        pass
NoConnection.py
from Connection import Connection
class NoConnection(Connection):
    def __init__(self):
        pass
    def ToString(self):
        print("No connection")
In the Main.py file I would like to call the ToString() method from each class.
Main.py
from Connections.Connection import Connection
from Connections.NoConnection import NoConnection
connection = Connection()
print(connection.ToString())
noConnection = NoConnection()
print(noConnection.ToString())
When I run the Main.py file I get this error:
C:\Users\Mc_Topaz\Programmering\Python\Terminal>Main.py
Traceback (most recent call last):
File "C:\Users\Mc_Topaz\Programmering\Python\Terminal\Main.py", line 2, in <module>
from Connections.NoConnection import NoConnection
File "C:\Users\Mc_Topaz\Programmering\Python\Terminal\Connections\NoConnection.py", line 1, in <module>
from Connection import Connection
ImportError: No module named 'Connection'
It seems that the interpreter cannot import the NoConnection class in my Main.py file because it cannot import the Connection class from the NoConnection.py file.
I can run Connection.py and NoConnection.py separately with no problems.
I don't understand why Main.py doesn't run. I assume it's something super simple that I cannot see because I'm too green with Python.
For Python to recognize a directory as a package (a collection of Python modules), that directory should contain a file named __init__.py (Python 3.3 and later can also import a directory without one, as a namespace package). That file doesn't need to contain any code whatsoever, though it can. If you add this file to your Connections directory, the interpreter should be able to import the contained files.
Just to add on to what mobiusklein mentioned: a common practice with __init__.py is to import the objects you use most frequently to avoid redundancy in your imports. In your example your Connections\__init__.py would likely contain:
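# Explicit relative imports; a plain "import Connection" here only resolves on Python 2
from .Connection import Connection
from .NoConnection import NoConnection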
Then your Main.py could use the following import statements successfully:
from Connections import Connection, NoConnection
I think it's down to relative imports in Python 3.
Change the import line in NoConnection to be explicit...
from Connections.Connection import Connection
and it works in Python 3 (it works either way in Python 2). The path to Terminal may have to be on your PYTHONPATH environment variable.
"Python 3 has disabled implicit relative imports altogether; imports are now always interpreted as absolute, meaning that in the above example import baz will always import the top-level module. You will have to use the explicit import syntax instead (from . import baz)." from here https://softwareengineering.stackexchange.com/questions/159503/whats-wrong-with-relative-imports-in-python
So when you import Connection from NoConnection it will be looking for it at the Terminal level, not at the Terminal/Connections level.
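The explicit relative form mentioned in the quote (`from . import baz`) can be checked with a throwaway package. The sketch below builds a hypothetical package called `DemoConnections` in a temporary directory, mirroring the layout in the question, and imports the subclass through the package. `ToString` returns the text here instead of printing it, purely so the result can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package mirroring the question's layout.
FILES = {
    "__init__.py": "",
    "Connection.py": (
        "class Connection:\n"
        "    def ToString(self):\n"
        "        return 'base connection'\n"
    ),
    # The leading dot makes the import relative to the package, not to sys.path.
    "NoConnection.py": (
        "from .Connection import Connection\n\n"
        "class NoConnection(Connection):\n"
        "    def ToString(self):\n"
        "        return 'No connection'\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, "DemoConnections")
    os.mkdir(package_dir)
    for name, code in FILES.items():
        with open(os.path.join(package_dir, name), "w") as handle:
            handle.write(code)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("DemoConnections.NoConnection")
        message = module.NoConnection().ToString()
    finally:
        sys.path.remove(root)

print(message)
```

One trade-off of the relative form: a file that uses it can no longer be run directly as a script, because it has no parent package in that case. It has to be run as a module from the parent folder, for example `python -m Connections.NoConnection`.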
UPDATE: Read the comments on this post, as they contain the solution.
I still can't get this to work, but I have made some changes:
Changed names of classes
> Class Connection = Foo inside Connection.py
> Class NoConnection = Bar inside NoConnection.py
This is to ensure that the interpreter doesn't get confused about whether Connection is a module or a class.
Dropped __init__.py file
I have dropped the Connections\__init__.py file, as it doesn't seem to be necessary in Python 3.4.2.
Files still run separately
I can still run Connection.py and NoConnection.py separately, so they work.
Main.py
from Connections.NoConnection import Bar
noConnection = Bar()
print(noConnection.ToString())
When running Main.py I get the same error at line 1:
"Cannot find module 'Connection' in NoConnection.py at line 1".
The only logical reason I can see for this error is that the interpreter looks for Connection.py inside my Terminal folder, because Python was started from that folder, even though it is importing NoConnection.py from the Connections folder, where Connection.py is located.
Is this a bug?

web2py db is not defined

I'm trying to run a script at command line that uses the models with the following command:
c:\web2py>python web2py.py -M -N -S automate -R applications/automate/modules/eventserver.py
but I keep getting the error:
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2011
Version 1.99.7 (2012-03-04 22:12:08) stable
Database drivers available: SQLite3, pymysql, pg8000, IMAP
Traceback (most recent call last):
File "c:\web2py\gluon\shell.py", line 206, in run
execfile(startfile, _env)
File "applications/automate/modules/eventserver.py", line 6, in <module>
deviceHandler = devicehandler.DeviceHandler()
File "applications\automate\modules\devicehandler.py", line 10, in __init__
self.devices = self.getActiveDevices()
File "applications\automate\modules\devicehandler.py", line 18, in getActiveDevices
print db
NameError: global name 'db' is not defined
What am I doing wrong?
edit: From my research I have only found the solution "add -M to your command", but I've already done that and it still doesn't work.
edit2: I have db = DAL('sqlite://storage.sqlite') in my db.py so it should get loaded
Assuming db.py is in the /models folder, the db object created there will be available in later executed model files as well as in the controller and view, but it will not be available within modules that you import. Instead, you will have to pass the db object to a function or class in the module. Another option is to add the db object to the current thread local object, which can then be imported and accessed within the module:
In /models/db.py:
from gluon import current
db = DAL('sqlite://storage.sqlite')
current.db = db
In /modules/eventserver.py:
from gluon import current
def somefunction():
    db = current.db
    # do something with db
Note, if you do define the db object in the module, don't define it at the top level -- define it in a function or class.
For more details, see the book section on modules and current.
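The pattern can be tried outside web2py with a stand-in. In the sketch below, `current` is a plain `threading.local()` used in place of `gluon.current`, and `FakeDAL` is a hypothetical placeholder for the real DAL class; both are assumptions made only so the example runs without web2py installed.

```python
import threading

# Stand-in for `gluon.current`, assumed here to behave like a thread-local namespace.
current = threading.local()


class FakeDAL:
    """Hypothetical placeholder for web2py's DAL; it only remembers its URI."""

    def __init__(self, uri):
        self.uri = uri


def get_active_devices():
    """Module-level code reads db from `current` at call time, not at import time."""
    db = current.db
    return db.uri


# What the model file would do on each request: create db and attach it to `current`.
current.db = FakeDAL("sqlite://storage.sqlite")

print(get_active_devices())
```

Reading `current.db` inside the function rather than at module import time means the module picks up whatever object the model file attached for the request being handled.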
