I'm operating in a factory setting where speed is important. I store order information in the cloud, and barcodes are printed by querying the database for that information. Operators use a Tkinter app on a Raspberry Pi that runs a Python script to query the cloud.
Currently, printing a barcode takes about 5 seconds: the script makes that query and then uses os.system() to send the barcode to the printer.
Is there a faster way to send jobs to the printer?
I've been looking into storing files locally to speed this process up. Does anyone have any ideas on what to look into? Network-attached storage that downloads the relevant files from the cloud nightly?
Any suggestions for running modern factory automation with python?
Check out subprocess.Popen(), a more flexible alternative to os.system() that returns immediately instead of waiting for the command to finish.
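For example, a minimal sketch of sending the job through CUPS with subprocess, assuming the Pi prints via the lp command; the printer name and file path are placeholders:

    import subprocess

    def send_to_printer(path, printer="label_printer"):   # hypothetical printer name
        # Popen returns immediately instead of blocking the Tkinter UI the way
        # os.system() does; keep the handle if you want to check the result later.
        return subprocess.Popen(["lp", "-d", printer, path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    job = send_to_printer("/tmp/barcode.png")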
I am attempting to download a small image file (e.g. https://cdn4.telesco.pe/file/some_long_string.jpg) in the shortest possible time.
My machine pings 200ms to the server, but I'm unable to achieve better than 650ms.
What's the science behind fast-downloading of a single file? What are the factors? Is a multipart download possible?
I find many resources for parallelizing downloads of multiple files, but nothing on optimizing for download-time on a single file.
Those two response times are not directly comparable.
A command-line ping is a much more low-level and lightweight exchange between two devices, computers, or servers.
A Python script requesting a file from a remote web server involves much more overhead, and every layer adds some milliseconds: the local Python runtime, the operating systems on both ends (Windows/macOS/Linux), the web server and its configuration (Apache/IIS/nginx), and so on.
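Since the question asks whether a multipart download is possible: a minimal sketch using HTTP Range requests, assuming the server reports Content-Length and supports Range; the URL and part count are placeholders. For a small image the extra requests can easily cost more than they save.

    import concurrent.futures
    import requests

    URL = "https://example.com/some_image.jpg"   # placeholder URL
    PARTS = 4

    def fetch_range(start, end):
        # Fetch only the bytes [start, end] of the file.
        resp = requests.get(URL, headers={"Range": "bytes=%d-%d" % (start, end)},
                            timeout=10)
        resp.raise_for_status()
        return start, resp.content

    def multipart_download():
        size = int(requests.head(URL, timeout=10).headers["Content-Length"])
        step = size // PARTS
        ranges = [(i * step, size - 1 if i == PARTS - 1 else (i + 1) * step - 1)
                  for i in range(PARTS)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=PARTS) as pool:
            parts = pool.map(lambda r: fetch_range(*r), ranges)
        # Reassemble the parts in byte order.
        return b"".join(data for _, data in sorted(parts))

    if __name__ == "__main__":
        with open("downloaded.jpg", "wb") as f:
            f.write(multipart_download())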
I'm fairly new to Django. I am creating an application where I post images from a Flutter application to a Django REST API. I need to run a Python script that takes as input the image posted to the API.
Does anyone have any idea about this?
The best way to handle this is a job management system (e.g. Slurm, Torque, or Oracle Grid Engine): you can create and submit a job for every uploaded image and send the response back to the user, while the job management system processes the image independently of the request. Celery can also work if the job won't take much time.
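For the Celery route, a minimal sketch assuming a Redis broker on localhost; process_image() is a placeholder for the real script that consumes the posted image:

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    def process_image(image_path):
        # placeholder for the real processing logic
        print("processing", image_path)

    @app.task
    def handle_upload(image_path):
        # Runs in a Celery worker, so the Django view can return immediately.
        process_image(image_path)

    # In the Django view, after saving the uploaded file:
    #   handle_upload.delay(saved_path)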
A simple implementation that scales well:
Upload the images to a directory named "uploaded"
Have your script run as a daemon (controlled by systemd) watching for new files in the "uploaded" directory
Whenever it finds a new file, it moves (mv) it to a "working" directory (that way, you can run multiple instances of your script in parallel to scale up)
Once your script is done with the image, it moves it to a "finished" directory, or wherever you need the finished images (a sketch of this loop is below).
That setup is very simple and works both on a small one-machine setup with low traffic and on multi-machine setups with dedicated storage and multiple worker machines handling the image-transform jobs.
It also decouples your image processing from your web backend.
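A minimal sketch of that uploaded/working/finished loop, with process() standing in for the real image transform:

    import os
    import shutil
    import time

    UPLOADED, WORKING, FINISHED = "uploaded", "working", "finished"

    def process(path):
        # placeholder for the real image processing
        time.sleep(1)

    def main():
        for d in (UPLOADED, WORKING, FINISHED):
            os.makedirs(d, exist_ok=True)
        while True:
            for name in os.listdir(UPLOADED):
                src = os.path.join(UPLOADED, name)
                dst = os.path.join(WORKING, name)
                try:
                    # os.rename is atomic on the same filesystem, so two workers
                    # can't grab the same file.
                    os.rename(src, dst)
                except OSError:
                    continue          # another worker took it first
                process(dst)
                shutil.move(dst, os.path.join(FINISHED, name))
            time.sleep(2)             # poll interval

    if __name__ == "__main__":
        main()

Because the rename is atomic on a single filesystem, several copies of this script can watch the same "uploaded" directory without picking up the same file twice.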
I have about 100 million json files (10 TB), each with a particular field containing a bunch of text, for which I would like to perform a simple substring search and return the filenames of all the relevant json files. They're all currently stored on Google Cloud Storage. Normally for a smaller number of files I might just spin up a VM with many CPUs and run multiprocessing via Python, but alas this is a bit too much.
I want to avoid spending too much time setting up infrastructure like a Hadoop server, or loading all of that into some MongoDB database. My question is: what would be a quick and dirty way to perform this task? My original thoughts were to set up something on Kubernetes with some parallel processing running Python scripts, but I'm open to suggestions and don't really have a clue how to go about this.
It would be easier to just load the GCS data into BigQuery and run your query from there; a sketch of the query side is below.
Send your data to AWS S3 and use Amazon Athena.
The Kubernetes option would be to set up a cluster in GKE, install Presto in it with a lot of workers, use a Hive metastore backed by GCS, and query from there (Presto doesn't have a direct GCS connector yet, as far as I know). This option seems more elaborate.
Hope it helps!
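If you go the BigQuery route, a minimal sketch of the query side, assuming the JSON has already been loaded into a table my_dataset.documents with filename and text columns (all names are placeholders); requires the google-cloud-bigquery package:

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT filename
        FROM `my_project.my_dataset.documents`
        WHERE STRPOS(text, @needle) > 0
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("needle", "STRING", "search term"),
            ]
        ),
    )
    for row in job:   # iterating the job waits for and returns the result rows
        print(row.filename)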
I would like to run a program on my laptop (Gazebo simulator) and send a stream of image data to a GCE instance, where it will be run through an object-detection network and sent back to my laptop in near real-time. Is such a set-up possible?
My best idea right now is, for each image:
Save the image as a JPEG on my personal machine
Stream the JPEG to a Cloud Storage bucket
Access the storage bucket from my GCE instance and transfer the file to the instance
In my Python script, convert the JPEG image to a NumPy array and run it through the object-detection network
Save the detection results in a text file and transfer to the Cloud Storage bucket
Access the storage bucket from my laptop and download the detection results file
Convert the detection results file to a numpy array for further processing
This seems like a lot of steps, and I am curious if there are ways to speed it up, such as reducing the number of save and load operations or transporting the image in a better format.
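For reference, a minimal sketch of the upload and result-download steps in the pipeline above, using the google-cloud-storage client; the bucket and object names are placeholders:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-detection-bucket")   # hypothetical bucket

    def upload_image(local_path, remote_name):
        # Step: stream the JPEG to the Cloud Storage bucket.
        bucket.blob("images/" + remote_name).upload_from_filename(local_path)

    def download_results(remote_name, local_path):
        # Step: pull the detection results file back to the laptop.
        bucket.blob("results/" + remote_name).download_to_filename(local_path)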
If your question is "is it possible to set up such a system and do those actions in real time?" then I think the answer is yes. If your question is "how can I reduce the number of steps in doing the above?" then I'm not sure I can help and will defer to one of the experts here; I can't wait to hear the answer myself!
I have implemented a system that I think is similar to what you describe for research of Forex trading algorithms (e.g. upload data to storage from my laptop, compute engine workers pull the data and work on it, post results back to storage and I download the compiled results from my laptop).
I used the Google PubSub architecture - apologies if you have already read up on this. It allows near-real-time messaging between programs. For example, you can have code looping on your laptop that watches a folder for new images. When they appear, it automatically uploads the files to a bucket, and once they're in the bucket it can send a message to the instance(s) telling them that there are new files there to process, or you can use the "change notification" feature of Google Storage buckets. The instances can do the work, send the results back to storage, and send a notification to the code running on your laptop that the work is done and the results are available for pick-up.
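As a minimal sketch of that "notify the workers" step, assuming the google-cloud-pubsub client library; the project, topic, and gs:// path are placeholders:

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "new-images")   # hypothetical

    def announce_new_image(gcs_path):
        # Message data must be bytes; subscribed workers pick this up.
        future = publisher.publish(topic_path, gcs_path.encode("utf-8"))
        return future.result()   # blocks until the message is accepted

    announce_new_image("gs://my-bucket/images/frame_0001.jpg")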
Note that I set this up for the project above and encountered problems, to the point that I gave up on PubSub. The reason was that the Python client library for PubSub only supported 'asynchronous' message pulls, which seems to mean that subscribers will pull multiple messages from the queue and process them in parallel. There are features built into the API to help manage 'flow control' of messages, but even with them implemented I couldn't get it to work the way I wanted. For my particular application I wanted to process everything in order, one file at a time, because it was important to me to be clear about what the instance is doing and the order it's doing it in. There are several threads on Stack Overflow and in Google Groups that discuss workarounds for this using queues, classes, allocating specific tasks to specific instances, etc., which I tried, but even these presented problems for me. Some of these links are:
"Run synchronous pull in PubSub using Python client API" and "pubsub problems pulling one message at a time" - and there are plenty more if you would like them!
You may find that if the processing of an image is relatively quick, order isn't too important, and you don't mind an instance working on multiple things in parallel, then my problems don't really apply to your case.
FYI, I ended up just making a simple loop on my 'worker instances' that scans the 'task list' bucket every 30 seconds or whatever to look for new files to process, but obviously this isn't quite the real-time approach that you were originally looking for. Good luck!
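A minimal sketch of that polling loop, using the google-cloud-storage client; the bucket name and handle_task() are placeholders:

    import time
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("task-list-bucket")   # hypothetical bucket

    def handle_task(blob):
        # placeholder for the real per-file work
        print("processing", blob.name)

    seen = set()
    while True:
        for blob in bucket.list_blobs(prefix="tasks/"):
            if blob.name not in seen:
                handle_task(blob)
                seen.add(blob.name)
        time.sleep(30)   # poll every 30 seconds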
The main purpose of my Python script is to parse a website and then save the results as an HTML or TXT file on the server. I also want the script to repeat this operation every 15 minutes without any action on my part.
Google App Engine doesn't allow saving files on the server; instead I should use a database. Is it feasible to save TXT or HTML in a database? And how do I keep the script running without stopping?
Thanks in advance for your help.
You are correct in saying that it is impossible to save files directly to the server. Your only option is the datastore, as you say. The data type best suited to you is probably the "Text string (long)" type; however, you are limited to 1 MB. See https://developers.google.com/appengine/docs/python/datastore/entities for more information.
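A minimal sketch of that datastore option, using the legacy google.appengine.ext.db API from the linked docs; the model and field names are placeholders:

    from google.appengine.ext import db

    class ScrapedPage(db.Model):
        url = db.StringProperty()
        content = db.TextProperty()              # long text/HTML, up to ~1 MB
        fetched_at = db.DateTimeProperty(auto_now_add=True)

    def save_page(url, html):
        # Store one parsed page as a datastore entity instead of a file.
        ScrapedPage(url=url, content=html).put()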
Regarding the scheduling, you are looking for cron jobs. You can set up a cron job to run at any configurable interval. See https://developers.google.com/appengine/docs/python/config/cron for details describing how cron jobs work.