I have a small Python script that connects to a Microsoft SQL Server database, fetches users from it, and syncs them to a MySQL database. Basically I'm just running queries to check whether each user exists and, if not, adding that user to the MySQL database.
The script usually takes around 1 minute to sync. I need the script to do its work every 5 minutes (for example), exactly once per interval (one sync per 5 minutes).
What would be the best way to go about building this?
I have some test data for the users, but the real site has many more users, so I can't guarantee the script takes 1 minute to execute; it might even take 20 minutes. Still, an interval of, say, 15 minutes between executions would be ideal for this problem...
Update:
I have the connection parameters for the SQL Server (Windows) database, and I'm using a small Ubuntu server to sync between the two databases, which live on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (the Python server) with the pymssql and MySQL Python modules to sync.
Regards
I am not sure cron is right for the job. If you have it run every 15 minutes but a sync sometimes takes 20 minutes, you could have multiple processes running at once and possibly colliding.
If the driving force is a constant wait time between the variable execution times then you might need a continuously running process with a wait.
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()  # your existing sync routine
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)  # sleep 5 minutes

main()
(Obviously not continuous, but long-running.) You can change the condition to while True: to make it continuous, and comment out loopInt += 1.
Edited to add: please see the note in the comments about monitoring the process; you don't want the script to hang or crash without you being aware of it.
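If you do prefer cron, overlapping runs can also be prevented with a file lock: a run that finds the previous sync still in progress simply exits. A minimal sketch using the standard library's fcntl (Linux-only; the lock file path is an arbitrary choice):

```python
import fcntl
import sys

def acquire_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.
    Returns the open file on success, or None if another run holds it."""
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None

lock = acquire_lock("/tmp/user_sync.lock")
if lock is None:
    sys.exit(0)  # a previous sync is still running; skip this run
# ... perform the sync here; the lock is released when the process exits
```

With this in place you can schedule the cron entry aggressively (every 5 minutes) and a 20-minute sync just causes the intermediate runs to no-op.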
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the python interface to implement it. With Celery, you can add tasks (like execution of a script) to a queue or run a schedule that'll perform a task after a given interval (just like cron).
Get started: http://celery.readthedocs.org/en/latest/
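As a rough sketch of what that configuration looks like (the module name, task name, and broker URL below are made up; the broker assumes a local RabbitMQ), Celery beat can enqueue the sync on a fixed schedule while a single worker guarantees the runs never overlap:

```python
from celery import Celery

app = Celery("sync", broker="amqp://localhost")

@app.task
def sync_users():
    ...  # run the SQL Server -> MySQL sync here

# Beat enqueues the task every 5 minutes; running one worker with
# concurrency 1 means a slow sync simply delays the next one rather
# than running alongside it.
app.conf.beat_schedule = {
    "sync-every-5-minutes": {
        "task": "sync.sync_users",
        "schedule": 300.0,  # seconds
    },
}
```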
Related
What's the best way to do this?
How do I autostart the script and run it every 5 seconds? (I read data from an RS-232 device.)
I want to write some values to a PostgreSQL database every 5 seconds. Is it OK to open and close the database connection every 5 seconds, or can it stay open?
thanks in advance
I think the best way is to have a constantly running script that reads the value, sends it to the DB, and sleeps for the remainder of the interval, keeping the connection open. This way you can monitor and react if a read, a write, or both take too long, for example. Then have a separate script that just checks whether the main one is alive, and notifies you or restarts it. I had some success with this model when reading from a bitcoin exchange API and inserting into MariaDB every 6 seconds.
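The "sleep for the remainder of the interval" part can be sketched like this (the task callable stands in for your read-and-insert step; the DB connection would be opened once, before the loop):

```python
import time

def run_every(interval, task, iterations=None):
    """Call task() on a fixed schedule: after each call, sleep only for
    the time remaining until the next slot, so a slow call doesn't make
    the schedule drift (and an overrun skips straight to the next slot)."""
    next_run = time.monotonic()
    count = 0
    while iterations is None or count < iterations:
        task()
        count += 1
        next_run += interval
        time.sleep(max(0.0, next_run - time.monotonic()))
```

Calling run_every(5.0, read_and_insert) then gives one write roughly every 5 seconds regardless of how long each individual read and insert takes, as long as it stays under the interval.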
I'm running a Python script remotely from a task machine, and it creates a process that is supposed to run for 3 hours. However, it terminates prematurely at exactly 2 hours. I don't believe it's a problem with the code, because I write to a log file after the while loop ends, and the log file never shows the loop exiting successfully. Is there a specific setting on the machine that I need to look into that's interrupting my Python process?
Is this perhaps a Scheduled Task? If so, have you checked the task's properties?
On my Windows 7 machine under the "Settings" tab is a checkbox for "Stop the task if it runs longer than:" with a box where you can specify the duration.
One of the suggested durations on my machine is "2 hours."
I have a list of 1000 jobs that need to be run periodically, and I'm putting up 5 machines to run them. Each machine checks the list from top to bottom and finds the first job that needs to run (by comparing last_run_time plus the job's interval against the current time) and runs it. When a machine starts a job, it marks the job as running so the other machines won't run it twice.
Job table in DB:
job_id | last_run_time | interval (sec) | running
1      | 11:05         | 60             | False
2      | 07:12         | 100            | True
...
1000   | 12:56         | 3000           | False
Each task runs about 10% faster on the machine that ran it last, because it needs the result of the previous run and doesn't have to download it again, so I want that to happen as often as possible. I'm thinking about naming each machine and adding the machine name as an extra column (last_run_on_machine) in this table, but that's troublesome, and I want to be able to add more machines or shut some down whenever I want. What's a good algorithm, or is there a tool/library that can do this? I'm using Python, the DB is MySQL, and completing each task takes about 10-60 seconds.
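One approach that keeps the machines stateless is to let the claim itself be the lock: a machine updates running only if it is still False, and prefers jobs whose last_run_on_machine matches its own name. A sketch using sqlite3 so it runs standalone; with MySQL the same UPDATE-and-check-rowcount pattern applies (or SELECT ... FOR UPDATE SKIP LOCKED on MySQL 8.0), and last_run_time here is assumed to be stored as epoch seconds:

```python
import sqlite3
import time

def claim_job(conn, machine):
    """Atomically claim the most overdue runnable job, preferring jobs
    last run on this machine so the 10% speedup happens when possible."""
    row = conn.execute(
        """SELECT job_id FROM job
           WHERE running = 0 AND last_run_time + interval <= ?
           ORDER BY (last_run_on_machine = ?) DESC, last_run_time ASC
           LIMIT 1""",
        (time.time(), machine)).fetchone()
    if row is None:
        return None  # nothing due right now
    # The `AND running = 0` guard makes this a compare-and-set: if
    # another machine claimed the job first, rowcount is 0 and we retry.
    cur = conn.execute(
        "UPDATE job SET running = 1, last_run_on_machine = ? "
        "WHERE job_id = ? AND running = 0",
        (machine, row[0]))
    conn.commit()
    return row[0] if cur.rowcount == 1 else claim_job(conn, machine)
```

Because the affinity is only a preference in the ORDER BY, adding or removing machines needs no bookkeeping: a new machine simply never wins the affinity tiebreak until it has run some jobs.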
I am relatively new to programming and thus have limited experience when it comes to the logic involved in getting scripts to work as I want.
In a nutshell, I have an Arduino connected to a Raspberry Pi (Raspbian). The Arduino controls sensors while the Raspberry Pi acts as a web server. I have created a MySQL database with two tables. The first table needs an INSERT every 5 minutes (through script 1.py) and the second table gets an UPDATE every minute. Both tables receive values from the Arduino.
I can run each script separately, and each works fine. I can join the two scripts, and that works too, BUT because I use cron, they both run at the SAME interval (e.g., 5 minutes). If I run both scripts (5-minute interval and 1-minute interval) through cron, only one works. I think it has something to do with the timing of the open and close connections?
Are there any suggestions as to how I can get this to work? Ideally I would like one script to run every 5 minutes and the other every 5 seconds (cron can't do seconds).
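One way around both problems (colliding cron jobs and the sub-minute interval) is to drop cron and let a single long-running process own the connection and both schedules, ticking faster than the shortest interval. A sketch, where the two callables stand in for your INSERT and UPDATE steps:

```python
import time

def run_schedule(tasks, tick, duration):
    """tasks: list of (interval_seconds, callable). One loop wakes every
    `tick` seconds and fires each task whose interval has elapsed, so a
    single process replaces several cron entries and can go below 1 minute."""
    start = time.monotonic()
    last_fired = [start] * len(tasks)
    while time.monotonic() - start < duration:
        now = time.monotonic()
        for i, (interval, fn) in enumerate(tasks):
            if now - last_fired[i] >= interval:
                fn()
                last_fired[i] = now
        time.sleep(tick)
```

For your case that would be roughly run_schedule([(300, insert_reading), (5, update_reading)], tick=1, duration=...), with one shared MySQL connection opened before the loop, so the two tasks can never race each other on open/close.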
I have developed a basic proxy tester in Python. Proxy IPs and ports, as well as their date_of_last_test (e.g. 31/12/2011 10:10:10) and result_of_last_test (OK or KO), are stored in a single SQLite table. (I realize I could store a lot more detail on the test results and keep a history / stats, but this simple model suits my needs.)
Here is the simplified code of the tester main loop, where I loop over the proxies and update their status:
while True:
    # STEP 1: select the least recently tested proxy
    myCursor.execute("SELECT * FROM proxy ORDER BY date_of_last_test ASC;")
    row = myCursor.fetchone()
    # STEP 2: test and update
    if isProxyWorking(row['ip'], row['port']):  # this test can last a few seconds
        updateRow(row['ip'], row['port'], 'OK')
    else:
        updateRow(row['ip'], row['port'], 'KO')
My code works fine when run as a single process. Now, I would like to be able to run many processes of the program, using the same SQLite database file.
The problem with the current code is the lack of a locking mechanism that would prevent several processes from testing the same proxy.
What would be the cleanest way to put a lock on the row at STEP 1 / SELECT time, so that the next process doing a SELECT gets the next row?
In other words, I'd like to avoid the following situation:
Let's say it's 10PM, and the DB contains 2 proxies:
Proxy A tested for the last time at 8PM and proxy B tested at 9PM.
I start two processes of the tester to update their statuses:
10:00 - Process 1 gets the "oldest" proxy to test: A
10:01 - Process 2 gets the "oldest" proxy to test: !!! A !!! (here I'd like Process 2 to get proxy B, because A is already being tested, though not yet updated in the DB)
10:10 - Testing of A by Process 1 is over; its status is updated in the DB
10:11 - Testing of A by Process 2 is over; its status is updated (!!! AGAIN !!!) in the DB
There is no actual error/exception in this case, but a waste of time I want to avoid.
SQLite only allows one process to make changes to the database at a time. From the FAQ:
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time,
and
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
So if there are only a few updates, this will work; otherwise you need to change to a more capable database.
So there is only one lock, and it is on the whole database.
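Even within SQLite's single-writer model, you can get the per-row behavior the question asks for by claiming the row in the same transaction that selects it: BEGIN IMMEDIATE takes the write lock up front, so two processes can't both grab proxy A. A sketch, which assumes a new testing flag column added to the table and a connection opened with isolation_level=None so the explicit BEGIN is honored:

```python
import sqlite3

def claim_next_proxy(conn):
    """Select the least recently tested unclaimed proxy and mark it as
    being tested, in one transaction, so a concurrent process running
    the same query gets the next row instead."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT ip, port FROM proxy WHERE testing = 0 "
        "ORDER BY date_of_last_test ASC LIMIT 1").fetchone()
    if row is not None:
        conn.execute(
            "UPDATE proxy SET testing = 1 WHERE ip = ? AND port = ?", row)
    conn.execute("COMMIT")
    return row  # (ip, port) to test, or None if everything is claimed
```

After testing, the process would write the OK/KO result and set testing back to 0. Setting a busy timeout (conn.execute("PRAGMA busy_timeout = 5000")) makes the second process wait briefly for the lock instead of failing with SQLITE_BUSY.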