I wrote a Python script to send data from a local DB via REST to Kafka.
My goal: I would like this script to run indefinitely, either by restarting it at set intervals (e.g. every 5 minutes) or whenever the DB gets new entries. I assume the set-interval approach would be good enough, easier, and safer.
Someone suggested that I either run it via a cron job and use a monitoring tool, or do it using Jenkins (which he considered better).
My setting: I am not a DevOps engineer and would like to know about the possibilities and risks of setting this script up. It would be no trouble to recreate the script in Java if that improves the situation.
My question: I did try to learn what Jenkins is about, and I think I understood the CI and CD part, but I don't see how this could help me with my goal. Can someone with experience on this topic elaborate?
If you would suggest a cron job, what are common methods or tools to monitor such a case? I think the main risks are failing to send the data due to connection issues between the local machine and the REST endpoint or the local DB, or the script not being started properly at the specified time.
Jobs can be scheduled at regular intervals in Jenkins just like with cron; in fact, it uses the same syntax. What's nice about scheduling the job via Jenkins is that it's very easy to have it send an email if the job exits with a non-zero return code. I've moved all of my cron jobs into Jenkins and it's working well. So by running it via Jenkins you're covering the execution side and the monitoring side at the same time.
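For that to work, the script itself should exit with a non-zero code whenever anything goes wrong, so that cron or Jenkins actually has a failure to report. Here is a minimal sketch of that pattern, assuming a SQLite database and a Kafka REST Proxy endpoint; the table, topic, and URL names are purely illustrative, not from the original script:

```python
import sys
import sqlite3
import requests

DB_PATH = "local.db"                                  # illustrative path
REST_PROXY = "http://localhost:8082/topics/mytopic"   # assumed Kafka REST Proxy URL

def main():
    # Read rows that have not been sent yet (schema is assumed).
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute("SELECT id, payload FROM entries WHERE sent = 0").fetchall()

    for row_id, payload in rows:
        # The Kafka REST Proxy expects a JSON envelope with a "records" list.
        resp = requests.post(
            REST_PROXY,
            json={"records": [{"value": {"id": row_id, "payload": payload}}]},
            headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
            timeout=10,
        )
        resp.raise_for_status()
        conn.execute("UPDATE entries SET sent = 1 WHERE id = ?", (row_id,))
        conn.commit()

    conn.close()

if __name__ == "__main__":
    try:
        main()
    except Exception as exc:
        # Any DB or connection problem ends with a non-zero exit code,
        # which is exactly what cron or Jenkins can alert on.
        print(f"Transfer failed: {exc}", file=sys.stderr)
        sys.exit(1)
```

With that in place, a scheduled Jenkins job (or a plain crontab entry) just runs the script every few minutes and notifies you on a non-zero exit; connection problems to the DB or the REST endpoint then surface as failed runs rather than silent gaps in the data.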
Related
So far when dealing with web scraping projects, I've used GAppsScript, meaning that I can easily trigger the script to be run once a day.
Is there an equivalent service when dealing with Python scripts? I have a RaspberryPi, so I guess I can keep it on 24/7 and use cron jobs to trigger the script daily. But that seems rather wasteful, since I'm talking about a few small scripts that take only a few seconds to run.
Is there any service that allows me to trigger a Python script once a day, without needing to keep a local machine on 24/7? The simpler the solution the better; I wouldn't want to overengineer such a basic use case if a ready-made system already exists.
The only service I've found so far to do this with is WayScript, and here's a Python example running in the cloud. The free tier should be enough for most simple/hobby-tier use cases.
I have a number of very simple Python scripts that are constantly hitting API endpoints 24/7. The amount of data being pulled is very minimal, and they all query the APIs every few seconds. My question is: is it okay to run multiple simple scripts using tmux on a single-core AWS Lightsail instance, or is it better practice to create a new instance for each Python script?
I don't find any limits mentioned for Lightsail that would affect your use case. As long as the endpoints are owned by you, or you don't get blocked for hitting them continuously, all seems good.
https://aws.amazon.com/lightsail/faq/
You can also set some alarms on Lightsail instance usage to know if you've hit any limits.
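For a sense of scale, the kind of script described here usually boils down to a small polling loop like the sketch below (the endpoint URL and interval are placeholders). It spends almost all of its time sleeping, so several of these comfortably share one small instance, whether you keep them in separate tmux panes or not:

```python
import time
import requests

API_URL = "https://example.com/api/status"  # placeholder endpoint
POLL_INTERVAL = 5                           # seconds between requests (illustrative)

while True:
    try:
        resp = requests.get(API_URL, timeout=10)
        resp.raise_for_status()
        # Keep the work minimal: append the payload to a local log file.
        with open("status.log", "a") as fh:
            fh.write(f"{time.time()} {resp.json()}\n")
    except requests.RequestException as exc:
        # Log the problem and keep polling instead of letting the loop die.
        print(f"Poll failed: {exc}")
    time.sleep(POLL_INTERVAL)
```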
I am running a Python script on an Azure VM. This code runs in a continuous while loop and is designed to never stop. If the VM resets or the program randomly stops, I have no way to know that. How can I make a logic app that will tell me if the program stops?
I would like to receive an email notifying me.
So, what you're asking isn't really the right way to do this.
You should develop and deploy your app so that it's enabled/run by Windows, either in a VM as you have now or in an Azure App Service.
Meaning, build and deploy it so it can just restart after a crash rather than worrying about constantly checking it. Of course, it needs to run reliably as well.
And again, Azure services don't just randomly crash, so that's really one of the last edge cases you should be concerned about.
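One low-effort way to get the "just restart after a crash" behaviour is to keep the existing while loop inside a small supervisor wrapper, so an unhandled exception restarts the work instead of killing the process. This is only a sketch and assumes your current loop body can be pulled into a single function (do_work() below is a placeholder):

```python
import time
import traceback

def do_work():
    # Placeholder for one pass of your existing while-loop body.
    pass

def main():
    while True:
        try:
            do_work()
            time.sleep(1)    # pacing; your real loop probably has its own delay
        except Exception:
            # Log the error and carry on after a short pause instead of
            # letting the whole process die silently.
            traceback.print_exc()
            time.sleep(30)

if __name__ == "__main__":
    main()
```

Combined with whatever the platform offers for starting the process automatically after a reboot (for example Task Scheduler on a Windows VM, or an always-on App Service), a VM reset stops being something you have to detect by hand.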
I have a Python script that can really eat some CPU and memory. Thus, I figure the most efficient way to run the script is to containerize it in a Docker container. The script is not meant to run forever. Rather, it gets dependency information from environment variables, does its work, and then terminates. Once the script is over, by default Docker will remove the container from memory.
This is good. I am only paying for computing resource while the script is being run.
My problem is this: I have a number of different types of scripts I can run. What I want to do is create a manager that, given the name of a script type to run, gets the identified container to run in Google Container Engine in such a way that the invocation is configured to use a predefined CPU, disk, and memory allocation environment that is intended to run the script as fast as possible.
Then, once the script finishes, I want the container removed from the environment so that I am no longer paying for the resource. In other words I want to be able to do in an automated manner in Container Engine what I can do manually from my local machine at the command line.
I am trying to learn how to get Container Engine to support my need in an automated manner. It seems to me that using Kubernetes might be a bit of overkill, in that I do not really want to guarantee constant availability. Rather, I just want the container to run and die. If for some reason the script fails or terminates before success, the architecture is designed to detect the unsuccessful attempt.
You could use a Kubernetes Job, a controller object that 'runs to completion'.
A job object such as this can be used to run a single pod.
Once the job (in this case your script) has completed, the pod is terminated and will therefore no longer use any resources. The pod wouldn't be deleted (unless the job is deleted) but will remain in a terminated state. If required and configured correctly, no more pods will be created.
The Job object can also be configured to start a new pod should the job fail for any reason, if you require that functionality.
For more detailed information on this please see this page.
Also just to add, to keep your billing to a minimum, you could reduce the number of nodes in the cluster down to zero when you are not running the job, and then increase it to the required number when the jobs need to be executed. This could be done programmatically by making use of API calls if required. This should ensure your billing is kept as low as possible as you will only be billed for the nodes when they are running.
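If you would rather drive this from your manager component than with kubectl, the official Kubernetes Python client can create such a Job directly. A minimal sketch; the image name, resource figures, and namespace are purely illustrative:

```python
from kubernetes import client, config

def run_script_job(job_name: str, image: str):
    # Assumes your kubeconfig already points at the Container Engine cluster.
    config.load_kube_config()

    container = client.V1Container(
        name=job_name,
        image=image,
        resources=client.V1ResourceRequirements(
            requests={"cpu": "1", "memory": "2Gi"},   # made-up sizing
            limits={"cpu": "2", "memory": "4Gi"},
        ),
    )

    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=job_name),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
            ),
            backoff_limit=2,  # retry the pod a couple of times if the script fails
        ),
    )

    # The Job runs the container to completion; afterwards the pod sits in a
    # terminated state and no longer consumes CPU or memory.
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

run_script_job("my-script-job", "gcr.io/my-project/my-script:latest")  # illustrative names
```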
I'm trying to get some initial bearings on useful processes that a basic working knowledge of Python can assist with or make less tedious, specifically processes that can be executed on the command line in a Linux environment. An example or two of both the tedious process and sample code to use as a starting point would be greatly appreciated.
What you want to automate depends on what you are doing manually and what your role is. If you are a system administrator, say, and you have shell scripts written to automate some of your tasks (like server management, user account creation, etc.), you can port them to Python.
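To make that concrete with the kind of example asked for above: a common tedious chore is sweeping old log files out of a directory by hand. A few lines of Python cover it; the paths and the 30-day threshold below are just examples:

```python
import shutil
import time
from pathlib import Path

LOG_DIR = Path("/var/log/myapp")        # example source directory
ARCHIVE_DIR = LOG_DIR / "archive"
MAX_AGE_DAYS = 30

def archive_old_logs():
    ARCHIVE_DIR.mkdir(exist_ok=True)
    cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600

    for log_file in LOG_DIR.glob("*.log"):
        # Move anything whose last modification time is older than the cutoff.
        if log_file.stat().st_mtime < cutoff:
            shutil.move(str(log_file), str(ARCHIVE_DIR / log_file.name))
            print(f"Archived {log_file.name}")

if __name__ == "__main__":
    archive_old_logs()
```

The same pattern (walk a directory or read a command's output, decide, act) covers much of the day-to-day territory that shell scripts for server management usually occupy.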