I'm trying to run a scheduled task using Django-Q. I followed the docs, but it's not running.
Here's my config:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'db_cache_table',
    }
}
Q_CLUSTER = {
    'name': 'DjangORM',
    'workers': 4,
    'timeout': 90,
    'retry': 120,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default'
}
Here's my scheduled task:
Nothing is executing, please help.
I also had problems getting scheduled tasks processed in the first place, but finally found a workflow.
I run django-q on a Windows machine, using the Django ORM as a broker.
Before talking about the execution routine I came up with, let's quickly check out my modules first, starting with ..
settings.py:
Q_CLUSTER = {
    "name": "austrian_energy_monthly",
    "workers": 1,
    "timeout": 10,
    "retry": 20,
    "queue_limit": 50,
    "bulk": 10,
    "orm": "default",
    "ack_failures": True,
    "max_attempts": 1,
    "attempt_count": 0,
}
.. and my folder structure:
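Roughly, judging from the import paths used below (only the scheduling-related files are shown):
src/
    austrian_energy_monthly/
        app/
            cron/
                tasks.py
                func.py
                hooks.py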
As you can see, the folder of my Django project is inside the src folder. Further, there's a folder for the app I created for this project, which is simply called "app". Inside the app folder I have another folder called "cron", which includes the following files and functions related to the scheduling:
tasks.py
I do not use the schedule() method provided by django-q; instead I create the Schedule table entries directly (see: django-q official schedule docs).
from django.utils import timezone
from austrian_energy_monthly.app.cron.func import create_text_file
from django_q.models import Schedule

Schedule.objects.create(
    func="austrian_energy_monthly.app.cron.func.create_text_file",
    kwargs={"content": "Insert this into a text file"},
    hooks="austrian_energy_monthly.app.cron.hooks.print_result",
    name="Text file creation process",
    schedule_type=Schedule.ONCE,
    next_run=timezone.now(),
)
Make sure you assign the "right" path to the "func" keyword. Just using "func.create_text_file" didn't work out for me, even though these files lie in the same folder. The same goes for the "hooks" keyword.
(NOTE: I've set up my project as a development package via setup.py, such that I can call it from everywhere inside my src folder.)
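A rough sketch of what such a development install can look like; the package name and src layout are taken from the paths above, the rest is an assumption. It is installed with "pip install -e ." from the repository root:
# hypothetical minimal setup.py sitting next to the src folder
from setuptools import setup, find_packages

setup(
    name="austrian_energy_monthly",
    version="0.1",
    package_dir={"": "src"},
    packages=find_packages("src"),
)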
func.py:
Contains the function called by the schedule table object.
def create_text_file(content: str) -> str:
    file = open("copy.txt", "w")
    file.write(content)
    file.close()
    return "Created a text file"
hooks.py:
Contains the function called after the scheduled process finished.
def print_result(task):
    print(task.result)
Let's now see how I managed to get the executions running with the file examples described above:
First I scheduled the "Text file creation process". For that I used "python manage.py shell" and imported the tasks.py module (you could probably schedule everything via the admin page as well, but I haven't tested that yet):
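The shell step looked roughly like this; importing the module is what executes the Schedule.objects.create(...) call shown above:
# inside "python manage.py shell"
import austrian_energy_monthly.app.cron.tasks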
You could now see the scheduled task, with a question mark in the success column on the admin page (tab "Scheduled tasks", as in your picture):
After that I opened a new terminal and started the cluster with "python manage.py qcluster", resulting in the following output in the terminal:
The successful execution can be inspected by looking at "13:22:17 [Q] INFO Processed [ten-virginia-potato-high]", alongside the hook's print statement "Created a text file" in the terminal. Further, you can check it on the admin page, under the tab "Successful Tasks", where you should see:
Hope that helped!
Django-Q doesn't support Windows. :)
Is it possible to create an Airflow DAG programmatically, using just the REST API?
Background
We have a collection of models; each model consists of a collection of SQL files that need to be run for it.
We also keep a JSON file for each model which defines the dependencies between the SQL files.
The scripts are run through a Python job.py file that takes a script file name as a parameter.
Our models are updated by many individuals, so we need to update our DAG daily. What we have done is create a scheduled Python script that reads all the JSON files and, for each model, builds an in-memory DAG that executes the model's SQL scripts as per the dependencies defined in the JSON config files. What we want is to recreate that DAG within Airflow programmatically (so it is visible in the Airflow UI) and then execute it, rerun failures, etc.
I did some research, and per my understanding Airflow DAGs can only be created by using decorators on top of Python files. Is there another approach I missed, using the REST API?
Here is an example of a JSON we have:
{
    "scripts": {
        "Script 1": {
            "script_task": "job.py",
            "script_params": {
                "param": "script 1.sql"
            },
            "dependencies": [
                "Script 2",
                "Script 3"
            ]
        },
        "Script 2": {
            "script_task": "job.py",
            "script_params": {
                "param": "script 2.sql"
            },
            "dependencies": [
                "Script 3"
            ]
        },
        "Script 3": {
            "script_task": "job.py",
            "script_params": {
                "param": "script 3.sql"
            },
            "dependencies": []
        }
    }
}
Airflow DAGs are Python objects, so you can create a DAG factory and use any external data source (JSON/YAML file, a database, an NFS volume, ...) as the source for your DAGs.
Here are the steps to achieve your goal:
Create a Python script in your dags folder (assume its name is dags_factory.py)
Create a Python class or method which returns a DAG object (assume it is a method defined as create_dag(config_dict))
At module level ("the main"), load your file (or any other external data source), loop over the DAG configs, and for each DAG:
# this step is very important to persist the created dag and add it to the dag bag
globals()[<dag id>] = create_dag(dag_config)
So, without going into the details of your JSON file: if you already have a script which creates the DAGs in memory, try to apply these steps, and you will find the created DAGs in the metadata database and the UI.
Here are some tips:
Airflow runs the DAG file processor every X seconds (configurable), so there is no need to use an API. Instead, you can upload your files to S3/GCS or a git repository, and load them in the main script before calling the create_dag method.
Try to improve your JSON schema; for example, "scripts" could be an array (see the sketch below).
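A sketch of that suggestion (the "name" field is hypothetical): with "scripts" as a list, each entry carries its own name and the factory can simply iterate the list instead of dict items:
# hypothetical reshaped config: "scripts" as an array instead of a mapping
dag_conf = {
    "scripts": [
        {"name": "Script 3", "script_task": "job.py", "script_params": {"param": "script 3.sql"}, "dependencies": []},
        {"name": "Script 2", "script_task": "job.py", "script_params": {"param": "script 2.sql"}, "dependencies": ["Script 3"]},
        {"name": "Script 1", "script_task": "job.py", "script_params": {"param": "script 1.sql"}, "dependencies": ["Script 2", "Script 3"]},
    ]
}

for script_conf in dag_conf["scripts"]:
    print(script_conf["name"], "depends on", script_conf["dependencies"])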
For the method create_dag, I will try to simplify the code (according to what I understood from your JSON file):
from datetime import datetime
from json import loads

from airflow import DAG
from airflow.operators.bash import BashOperator


def create_dag(dag_id, dag_conf) -> DAG:
    scripts = dag_conf["scripts"]
    tasks_dict = {}
    dag = DAG(dag_id=dag_id, start_date=datetime(2022, 1, 1), schedule_interval=None)  # configure your dag
    for script_name, script_conf in scripts.items():
        params = " ".join(f"{k}={v}" for k, v in script_conf["script_params"].items())
        task = BashOperator(
            task_id=script_name.replace(" ", "_"),  # task_id is required and cannot contain spaces
            bash_command=f"python {script_conf['script_task']} {params}",
            dag=dag,
        )
        tasks_dict[script_name] = {
            "task": task,
            "dependencies": script_conf["dependencies"],
        }
    for task_conf in tasks_dict.values():
        for dependency in task_conf["dependencies"]:
            task_conf["task"] << tasks_dict[dependency]["task"]  # if you mean the inverse, you can replace << by >>
    return dag


# this must run at module level (not under "if __name__ == '__main__'"),
# otherwise the scheduler will not register the DAG when it imports this file
# create a loop if you have multiple files
# you can load the files from git or S3, I use local storage for testing
dag_conf_file = open("dag.json", "r")
dag_conf_dict = loads(dag_conf_file.read())
dag_conf_file.close()
dag_id = "test_dag"  # read it from the file
globals()[dag_id] = create_dag(dag_id, dag_conf_dict)
P.S.: if you create a big number of DAGs in the same script (one script processing multiple JSON files), you may run into performance issues, because the Airflow scheduler and workers re-run the script for each task operation; you may then need to improve it using the "magic loop" or the new syntax added in 2.4.
I'm using a server for the first time. It has Ubuntu 18.04.
I've never worked with that OS, but after following some guides I managed to get my code working, except for the environment variable.
At the end of ~/.bashrc I added export KEY="123asd".
Then I reloaded the terminal.
I checked if my environment variable is loaded via printenv KEY and it shows the correct value.
In my main.py there's:
import os
import telebot
API_KEY = os.getenv("KEY")
bot = telebot.TeleBot(API_KEY)
When I run it with pm2 start main.py --interpreter=python3, there's an error in the logs:
raise Exception('Bot token is not defined')
Exception: Bot token is not defined
If I understand correctly, it means that API_KEY is None, so there's a problem with the environment variable.
I tried giving API_KEY an actual value instead of reading the environment variable, and it worked fine.
So what else do I need to do to use an environment variable properly?
I was looking in the wrong place.
If I want to use pm2, then I need to create an ecosystem.config.js file and give it my variable, like this:
module.exports = {
  apps: [{
    name: "main.py",
    script: "main.py",        // the script to run
    interpreter: "python3",   // run it with Python rather than Node
    env: {
      KEY: "123asd"
    }
  }]
}
It works; I'm just not sure if it's correct, since there is more than one process of my main.py (one online, the others errored).
We are using Django as the backend for a website that provides various things, among others answering certain requests using a neural network built with TensorFlow.
For that, we created an AppConfig and added it to INSTALLED_APPS in Django's settings.py. This AppConfig then loads the neural network as soon as it is initialized:
settings.py:
INSTALLED_APPS = [
    ...
    'bert_app.apps.BertAppConfig',
]
.../bert_app/apps.py:
from django.apps import AppConfig
from django.conf import settings

class BertAppConfig(AppConfig):
    name = 'bert_app'
    if 'bert_app.apps.BertAppConfig' in settings.INSTALLED_APPS:
        predictor = BertPredictor()  # loads the ANN
Now, while that works and does what it should, the ANN is loaded for every single command run through manage.py. While we of course want it to be executed when we call manage.py runserver, we don't want it to run for manage.py migrate, manage.py help and all the other commands.
I am generally not sure whether this is the proper way to load an ANN for a Django backend in general, so does anybody have any tips on how to do this properly? I can imagine that loading the model on startup is not quite best practice, and I am very open to suggestions on how to do it properly instead.
However, there is also some other code besides the actual model loading that takes a few seconds and that definitely needs to be executed as soon as the server starts up (so on manage.py runserver), but not on manage.py help (as it takes a few seconds as well). So is there some quick fix for telling Django to execute it only on runserver and not for its other commands?
I had a similar problem and solved it by checking argv.
import sys

from django.apps import AppConfig

class SomeAppConfig(AppConfig):
    def ready(self, *args, **kwargs):
        is_manage_py = any(arg.casefold().endswith("manage.py") for arg in sys.argv)
        is_runserver = any(arg.casefold() == "runserver" for arg in sys.argv)
        if (is_manage_py and is_runserver) or (not is_manage_py):
            init_your_thing_here()
Now a bit more about the "not is_manage_py" part: in production you run your web server with uwsgi/uvicorn/..., which is still a web server, except it's not run with manage.py. Most likely, that's the only thing you will ever run without manage.py.
Use AppConfig.ready() - it's intended for this:
Subclasses can override this method to perform initialization tasks such as registering signals. It is called as soon as the registry is fully populated. - [django documentation]
To get your AppConfig back, use:
from django.apps import apps
apps.get_app_config(app_name)
# apps.get_app_configs() # all
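For the use case in the question, a minimal sketch might look like this (BertPredictor's import path is an assumption, not the actual project layout):
from django.apps import AppConfig

class BertAppConfig(AppConfig):
    name = 'bert_app'
    predictor = None  # populated once the app registry is ready

    def ready(self):
        # runs once per process after all apps are loaded, instead of at import time;
        # combine with the sys.argv check from the answer above to skip migrate/help etc.
        from bert_app.predictor import BertPredictor  # hypothetical module path
        self.predictor = BertPredictor()
Elsewhere in the project the loaded model can then be fetched with apps.get_app_config('bert_app').predictor.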
This is another way: your manage.py will probably have something that looks like this:
import os
import sys


def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'slambook.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc

    # check for runserver before handing control to Django
    # (execute_from_command_line blocks while the dev server runs)
    if 'runserver' in sys.argv:
        pass  # execute your custom function here

    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
You can check sys.argv for runserver; if it's there, execute your script or function.
I am trying to create a DAG that generates tasks dynamically based on a JSON file located in storage. I followed this guide step-by-step:
https://bigdata-etl.com/apache-airflow-create-dynamic-dag/
But the DAG gets stuck with the following message:
Is it possible to read an external file and use it to create tasks dynamically in Composer? I can do this when I read data only from an Airflow Variable, but when I read an external file, the DAG gets stuck in the "isn't available in the web server's DagBag object" state. I need to read from an external file, as the contents of the JSON will change with every execution.
I am using composer-1.8.2-airflow-1.10.2.
I read this answer to a similar question:
Dynamic task definition in Airflow
But I am not trying to create the tasks based on a separate task, only based on the external file.
This is my second approach, which also gets stuck in that error state:
import datetime
import json
import os

import airflow
from airflow.operators import bash_operator
from airflow.operators.dummy_operator import DummyOperator
from airflow.models import Variable

products = json.loads(Variable.get("products"))

default_args = {
    'owner': 'Composer Example',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': datetime.timedelta(minutes=5),
    'start_date': datetime.datetime(2020, 1, 10),
}

with airflow.DAG(
        'json_test2',
        default_args=default_args,
        # Not scheduled, trigger only
        schedule_interval=None) as dag:

    # Print the dag_run's configuration, which includes information about the
    # Cloud Storage object change.
    def read_json_file(file_path):
        if os.path.exists(file_path):
            with open(file_path, 'r') as f:
                return json.load(f)

    def get_run_list(files):
        run_list = []
        # The file is uploaded in the storage bucket used as a volume by Composer
        last_exec_json = read_json_file("/home/airflow/gcs/data/last_execution.json")
        date = last_exec_json["date"]
        hour = last_exec_json["hour"]
        for file in files:
            # Testing by adding just date and hour
            name = file['name'] + f'_{date}_{hour}'
            run_list.append(name)
        return run_list

    rl = get_run_list(products)

    start = DummyOperator(task_id='start', dag=dag)
    end = DummyOperator(task_id='end', dag=dag)

    for name in rl:
        tsk = DummyOperator(task_id=name, dag=dag)
        start >> tsk >> end
It is possible to create a DAG that generates tasks dynamically based on a JSON file located in a Cloud Storage bucket. I followed the guide that you provided, and it works perfectly in my case.
First you need to upload your JSON configuration file to the $AIRFLOW_HOME/dags directory, and then upload the DAG Python file to the same path (you can find the path in the airflow.cfg file, which is located in the bucket).
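A sketch of how the DAG file can then read that JSON at parse time (the file name is borrowed from the question; the exact layout is an assumption):
import json
import os

# the JSON config sits in the same dags/ folder as this DAG file
config_path = os.path.join(os.path.dirname(__file__), "last_execution.json")
with open(config_path) as f:
    dag_config = json.load(f)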
Later on, you will be able to see the DAG in the Airflow UI:
As for the log "DAG isn't available in the web server's DagBag object": the DAG isn't available on the Airflow web server, but it can still be scheduled as active, because the Airflow scheduler works independently of the Airflow web server.
When a lot of DAGs are loaded at once into a Composer environment, it may overload the environment. As the Airflow web server runs in a Google-managed project, only certain types of updates will cause the web server container to be restarted, like adding or upgrading one of the PyPI packages or changing an Airflow setting. The workaround is to add a dummy environment variable:
Open the Composer instance in GCP
Go to the ENVIRONMENT VARIABLES tab
Click Edit, then add the environment variable and Submit
You can also use the following commands to restart it:
gcloud composer environments update ${ENVIRONMENT_NAME} --location=${ENV_LOCATION} --update-airflow-configs=core-dummy=true
gcloud composer environments update ${ENVIRONMENT_NAME} --location=${ENV_LOCATION} --remove-airflow-configs=core-dummy
I hope you find the above pieces of information useful.
I have a small Python web application using the Cherrypy framework. I am by no means an expert in web servers.
I got CherryPy working with Apache using mod_python on our Ubuntu server. This time, however, I have to use Windows 2003 and IIS 6.0 to host my site.
The site runs perfectly as a standalone server - I am just so lost when it comes to getting IIS running. I have spent the past day Googling and blindly trying anything and everything to get this running.
I have all the various tools installed that websites have told me to (Python 2.6, CherryPy 3, ISAPI-WSGI, PyWin32) and have read all the documentation I can. This blog was the most helpful:
http://whatschrisdoing.com/blog/2008/07/10/turbogears-isapi-wsgi-iis/
But I am still lost as to what I need to run my site. I can't find any thorough examples or how-to's to even start with. I hope someone here can help!
Cheers.
I run CherryPy behind my IIS sites. There are several tricks to get it to work.
When running as the IIS Worker Process identity, you won't have the same permissions as you do when you run the site from your user process. Things will break. In particular, anything that wants to write to the file system will probably not work without some tweaking.
If you're using setuptools, you probably want to install your components with the -Z option (unzips all eggs).
Use win32traceutil to track down problems. Be sure that your hook script imports win32traceutil. Then, when you're attempting to access the web site and anything goes wrong, whatever gets printed to standard out will be logged by the trace utility. Use 'python -m win32traceutil' to see the output from the trace.
It's important to understand the basic process to get an ISAPI application running. I suggest first getting a hello-world WSGI application running under ISAPI_WSGI. Here's an early version of a hook script I used to validate that I was getting CherryPy to work with my web server.
#!python
"""
Things to remember:
easy_install munges permissions on zip eggs.
anything that's installed in a user folder (i.e. setup develop) will probably not work.
There may still exist an issue with static files.
"""
import sys
import os

import isapi_wsgi

# change this to '/myapp' to have the site installed to only a virtual
# directory of the site.
site_root = '/'

if hasattr(sys, "isapidllhandle"):
    import win32traceutil

appdir = os.path.dirname(__file__)
egg_cache = os.path.join(appdir, 'egg-tmp')
if not os.path.exists(egg_cache):
    os.makedirs(egg_cache)
os.environ['PYTHON_EGG_CACHE'] = egg_cache
os.chdir(appdir)

import cherrypy
import traceback


class Root(object):
    @cherrypy.expose
    def index(self):
        return 'Hai Werld'


def setup_application():
    print "starting cherrypy application server"
    #app_root = os.path.dirname(__file__)
    #sys.path.append(app_root)
    app = cherrypy.tree.mount(Root(), site_root)
    print "successfully set up the application"
    return app


def __ExtensionFactory__():
    "The entry point for when the ISAPI DLL is triggered"
    try:
        # import the wsgi app creator
        app = setup_application()
        return isapi_wsgi.ISAPISimpleHandler(app)
    except:
        import traceback
        traceback.print_exc()
        f = open(os.path.join(appdir, 'critical error.txt'), 'w')
        traceback.print_exc(file=f)
        f.close()


def install_virtual_dir():
    import isapi.install
    params = isapi.install.ISAPIParameters()
    # Setup the virtual directories - this is a list of directories our
    # extension uses - in this case only 1.
    # Each extension has a "script map" - this is the mapping of ISAPI
    # extensions.
    sm = [
        isapi.install.ScriptMapParams(Extension="*", Flags=0)
    ]
    vd = isapi.install.VirtualDirParameters(
        Server="CherryPy Web Server",
        Name=site_root,
        Description="CherryPy Application",
        ScriptMaps=sm,
        ScriptMapUpdate="end",
    )
    params.VirtualDirs = [vd]
    isapi.install.HandleCommandLine(params)


if __name__ == '__main__':
    # If run from the command-line, install ourselves.
    install_virtual_dir()
This script does several things. It (a) acts as the installer, installing itself into IIS [install_virtual_dir]; (b) contains the entry point for when IIS loads the DLL [__ExtensionFactory__]; and (c) creates the CherryPy WSGI instance consumed by the ISAPI handler [setup_application].
If you place this in your \inetpub\cherrypy directory and run it, it will attempt to install itself to the root of your IIS web site named "CherryPy Web Server".
You're also welcome to take a look at my production web site code, which has refactored all of this into different modules.
OK, I got it working, thanks to Jason and all his help. I needed to call:
cherrypy.config.update({
    'tools.sessions.on': True
})
return cherrypy.tree.mount(Root(), '/', config=path_to_config)
I had this in the config file under [/], but for some reason it did not like that. Now I can get my web app up and running - then I think I will try to work out why it needs that config update and doesn't like the config file I have...
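In case it helps with that debugging: the [/] section of a config file corresponds to an app config dict, which can also be passed to mount or quickstart directly. A minimal standalone sketch (not the poster's actual setup):
import cherrypy

class Root(object):
    @cherrypy.expose
    def index(self):
        return "Hello"

# enable sessions for this app via an in-code config dict (the in-code equivalent of a [/] section)
conf = {'/': {'tools.sessions.on': True}}

cherrypy.quickstart(Root(), '/', config=conf)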