How to minimize download time for a single .jpg file download? - python

I am attempting to download a small image file (e.g. https://cdn4.telesco.pe/file/some_long_string.jpg) in the shortest possible time.
My machine pings 200ms to the server, but I'm unable to achieve better than 650ms.
What's the science behind fast-downloading of a single file? What are the factors? Is a multipart download possible?
I find many resources for parallelizing downloads of multiple files, but nothing on optimizing for download-time on a single file.

Those two response times are not easy to compare.
A command-line ping is a much lower-level and faster kind of exchange between two devices on the network.
A Python script that requests a file from a remote web server carries much more overhead, and every layer costs some milliseconds: the speed of your local Python runtime, the operating systems on your machine and on the remote server (Windows/macOS/Linux), the web server in use and its configuration (Apache/IIS/nginx), and so on.
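As a rough illustration of where that overhead comes from, a fresh HTTPS request needs several network round trips before the first byte of the file arrives: one for the TCP handshake, one or two for the TLS handshake, and one for the HTTP request and response. The sketch below only does that arithmetic. The function name and parameters are made up for illustration, and it ignores transfer time, server processing and local runtime cost, so treat the result as a lower bound rather than a prediction.

```python
def min_fetch_time_ms(rtt_ms, tls_round_trips=1, new_connection=True, dns_ms=0.0):
    """Rough lower bound (ms) on time to first response byte for an HTTPS GET.

    rtt_ms          -- round-trip time to the server (roughly what ping reports)
    tls_round_trips -- 1 for a full TLS 1.3 handshake, 2 for a full TLS 1.2 handshake
    new_connection  -- False if an already-open connection is reused
    dns_ms          -- assumed DNS lookup time (0 if the name is already cached)
    """
    round_trips = 1  # the HTTP request and its response
    if new_connection:
        round_trips += 1                # TCP three-way handshake
        round_trips += tls_round_trips  # TLS handshake
    return dns_ms + round_trips * rtt_ms


if __name__ == "__main__":
    print(min_fetch_time_ms(200))                        # new connection, TLS 1.3
    print(min_fetch_time_ms(200, tls_round_trips=2))     # new connection, TLS 1.2
    print(min_fetch_time_ms(200, new_connection=False))  # reused connection
```

With a 200 ms round trip, a new connection already costs about 600 ms under these assumptions, which is in line with the 650 ms in the question. Reusing an open connection (for example through a requests.Session) avoids repeating the handshakes on later requests.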

Related

Transferring files to a fast computer on the network and queuing the files for rendering

The task I want to accomplish is to send a copy of the opened file, transfer it to a location on the server, and for the fast render farm pc to open it, render the file, then close itself, essentially dumping all hardware intensive tasks onto one computer.
I also want to make sure that only one file is rendered/opened at a time.
What do I need to know to accomplish this? How would you go about it? It's about Maya batch rendering (.ma) as well as Nuke files (.nk).
You can try the socket library (part of the standard library) and the Flask library. With them you can establish a connection between two or more PCs.
For Flask here is a site that can help you
https://pythonbasics.org/flask-upload-file/#:~:text=It%20is%20very%20simple%20to,it%20to%20the%20required%20location.
For Socket here is another site
https://www.thepythoncode.com/article/send-receive-files-using-sockets-python
If you search on Google or YouTube you can also find many tutorials about it.
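For the "only one file at a time" requirement, one possible approach on the render machine is a queue drained by a single worker thread. This is a minimal sketch, not a complete render farm: start_render_worker and the render callable are hypothetical names, and render stands in for whatever actually launches the Maya or Nuke batch render (for example a subprocess call, which is not shown here because the exact command is not given in the question).

```python
import queue
import threading


def start_render_worker(jobs, render, results):
    """Start one thread that renders queued files strictly one at a time.

    jobs    -- queue.Queue of scene paths; put None to stop the worker
    render  -- hypothetical callable that renders one file and returns a status
    results -- list that collects (scene_path, status_or_exception) tuples
    """
    def worker():
        while True:
            scene_path = jobs.get()
            if scene_path is None:  # sentinel: no more work
                jobs.task_done()
                break
            try:
                results.append((scene_path, render(scene_path)))
            except Exception as exc:  # keep the worker alive if one render fails
                results.append((scene_path, exc))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    jobs = queue.Queue()
    results = []
    worker = start_render_worker(jobs, lambda path: "rendered", results)
    for path in ["shot01.ma", "comp01.nk"]:  # placeholder file names
        jobs.put(path)
    jobs.put(None)
    worker.join()
    print(results)
```

The Flask upload handler or socket receiver would save the incoming file and then put its path on the queue. Because there is only one worker, a second file waits until the first render has finished.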

How to stream a very large file to Dropbox using python v2 api

Background
I finally convinced someone to share his full archival node database (5868 GiB) for free. Building it now has to be done in RAM, which requires $100,000 worth of RAM, but once built it can run from an SSD.
However, he wants to send it only as a single tar file over raw TCP, using a rather slow (400 Mbps) connection.
I need to get it onto Dropbox, but he does not want to use https://www.dropbox.com/request/[my upload key here], which allows uploading files through a web browser without a Dropbox account. (It annoyed him so much that I suggested another method, or compressing the database, that he is on the verge of changing his mind about sharing it.)
I need Dropbox because it lets me use 10 TiB of storage for free for 30 days, and I haven't received the required SSD yet (once I have it, I will be able to download the database at a faster speed).
The problem
I'm fully aware of how to upload a file to my Dropbox from a Python script, but in my case the file doesn't fit in a memory buffer, or even on disk.
Previously, in API v1, it wasn't possible to append data to an existing file, and I didn't find the answer for v2.
To upload a large file to the Dropbox API using the Dropbox Python SDK, you would use upload sessions to upload it in pieces. There's a basic example here.
Note that the Dropbox API only supports files up to 350 GB though.
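A minimal sketch of the upload-session approach, reading from a stream so the whole file never has to sit in memory or on disk. It assumes the Dropbox Python SDK is installed, that dbx is an authenticated dropbox.Dropbox client, and that stream is any binary file-like object (for example one obtained from a socket with makefile("rb")). The helper names and the 64 MiB chunk size are illustrative choices, not something taken from the Dropbox example.

```python
import io

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size; tune to your memory budget


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield chunk_size-byte blocks from a binary stream; the last may be shorter."""
    while True:
        buf = bytearray()
        # A socket-backed stream may return short reads, so keep reading
        # until the chunk is full or the stream ends.
        while len(buf) < chunk_size:
            piece = stream.read(chunk_size - len(buf))
            if not piece:
                break
            buf.extend(piece)
        if not buf:
            return
        yield bytes(buf)
        if len(buf) < chunk_size:
            return


def stream_to_dropbox(dbx, stream, dest_path, chunk_size=CHUNK_SIZE):
    """Upload a stream to dest_path with an upload session, one chunk at a time."""
    import dropbox  # imported here so iter_chunks can be used without the SDK

    chunks = iter_chunks(stream, chunk_size)
    first = next(chunks, b"")
    session = dbx.files_upload_session_start(first)
    cursor = dropbox.files.UploadSessionCursor(
        session_id=session.session_id, offset=len(first)
    )
    for chunk in chunks:
        dbx.files_upload_session_append_v2(chunk, cursor)
        cursor.offset += len(chunk)  # the cursor must track bytes sent so far
    commit = dropbox.files.CommitInfo(path=dest_path)
    return dbx.files_upload_session_finish(b"", cursor, commit)


if __name__ == "__main__":
    # Local check of the chunking helper only; no network access needed.
    print(list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4)))
```

Given the 350 GB limit noted above, a 5868 GiB archive would not fit in a single Dropbox file, so the incoming stream would have to be split across several destination files, each uploaded with its own session.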

Is it possible to perform real-time communication with a Google Compute Engine instance?

I would like to run a program on my laptop (Gazebo simulator) and send a stream of image data to a GCE instance, where it will be run through an object-detection network and sent back to my laptop in near real-time. Is such a set-up possible?
My best idea right now is, for each image:
1. Save the image as a JPEG on my personal machine
2. Stream the JPEG to a Cloud Storage bucket
3. Access the storage bucket from my GCE instance and transfer the file to the instance
4. In my python script, convert the JPEG image to numpy array and run through the object detection network
5. Save the detection results in a text file and transfer to the Cloud Storage bucket
6. Access the storage bucket from my laptop and download the detection results file
7. Convert the detection results file to a numpy array for further processing
This seems like a lot of steps, and I am curious if there are ways to speed it up, such as reducing the number of save and load operations or transporting the image in a better format.
If your question is "is it possible to set up such a system and do those actions in real time?" then I think the answer is yes. If your question is "how can I reduce the number of steps in doing the above?" then I am not sure I can help, and I will defer to one of the experts on here. I can't wait to hear the answer!
I have implemented a system that I think is similar to what you describe for research of Forex trading algorithms (e.g. upload data to storage from my laptop, compute engine workers pull the data and work on it, post results back to storage and I download the compiled results from my laptop).
I used the Google PubSub architecture - apologies if you have already read up on this. It allows near-realtime messaging between programs. For example, you can have code looping on your laptop that scans a folder for new images. When they appear, it automatically uploads the files to a bucket, and once they're in the bucket it can send a message to the instance(s) telling them that there are new files to process. Alternatively, you can use the "change notification" feature of Google Storage buckets. The instances can do the work, send the results back to the storage and send a notification to the code running on your laptop that the work is done and results are available for pick-up.
Note that I set this up for my project above and encountered problems to the point that I gave up on PubSub. The reason was that the Python Client Library for PubSub only supports 'asynchronous' message pulls, which seems to mean that the subscribers will pull multiple messages from the queue and process them in parallel. There are some features to help manage 'flow control' of messages built into the API, but even with them implemented I couldn't get it to work the way I wanted. For my particular application I wanted to process everything in order, one file at a time, because it was important to me that I'm clear what the instance is doing and the order it's doing it in. There are several threads on Google search, StackOverflow and Google Groups that discuss workarounds for this using queues, classes, allocating specific tasks to specific instances, etc., which I tried, but even these presented problems for me. Some of these links are:
"Run synchronous pull in PubSub using Python client API" and "pubsub problems pulling one message at a time". There are plenty more if you would like them!
You may find that if the processing of an image is relatively quick, order isn't too important and you don't mind an instance working on multiple things in parallel that my problems don't really apply to your case.
FYI, I ended up just making a simple loop on my 'worker instances' that scans the 'task list' bucket every 30 seconds or whatever to look for new files to process, but obviously this isn't quite the real-time approach that you were originally looking for. Good luck!
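A minimal sketch of that kind of polling loop, kept independent of any particular storage client. The names poll_once and run_worker are hypothetical, as are the two callables: list_tasks is assumed to return the names currently in the 'task list' bucket, and process is assumed to download and handle one of them. Files are handled in sorted name order, one at a time.

```python
import time


def poll_once(list_tasks, process, seen):
    """Process every task name not seen before, in sorted order, one at a time."""
    done = []
    for name in sorted(list_tasks()):
        if name in seen:
            continue
        process(name)   # blocks until this file is finished
        seen.add(name)  # mark as handled only after processing succeeded
        done.append(name)
    return done


def run_worker(list_tasks, process, interval_s=30.0, max_polls=None):
    """Poll for new tasks every interval_s seconds (forever if max_polls is None)."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(list_tasks, process, seen)
        polls += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    bucket = ["img_002.jpg", "img_001.jpg"]  # stand-in for a bucket listing
    run_worker(lambda: bucket, print, interval_s=0.1, max_polls=2)
```

The worst-case delay before a new file is picked up is roughly the polling interval plus the time spent on files ahead of it, which is why this is simpler but less real-time than a notification-based setup.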

Linux program to take newest ftp file and send to other ftp server

I was wondering if it was possible to take the newest files uploaded to an ftp server and send them to another ftp server. BUT, every file can only be sent once. If you can do this in python that would be nice, I know intermediate python. EXAMPLE:
2:14 PM file.txt is uploaded to the server. The program takes the file and sends it to another server.
2:15 PM example.txt is uploaded to the server. the program takes just that file and sends it to another server.
I have searched online for this but can't find anything. Please help!
As you said that you already know Python, I will give you some conceptual hints. Basically, you are looking for a one-way synchronisation. The main problem with this task is making your program detect new files. The simplest way to do this is to create a database (by database I mean any way of storing data, not necessarily a specialized database), for example a text file. Each file that has been sent is recorded in this database. Periodically, compare the database against the files currently on the server (a basic ls or something similar will do). If there are files that are not in the database, they are new: upload them and record them.
This is the basic idea. You can improve it by using multithreading, by checking whether a file has been modified, and so on.
EDIT: This is a programming way. As it has been suggested in comments, there are also some software solutions that will do this for you.
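The idea above can be sketched with the standard ftplib module. This is only an illustration under some assumptions: src and dst are already-connected ftplib.FTP objects, the names returned by nlst() are plain files in the current directory, each file is small enough to buffer in memory, and the text file of sent names (record_path) is the "database". The function names are made up for this example.

```python
import io
from pathlib import Path


def load_record(record_path):
    """Return the set of file names already sent (one name per line)."""
    path = Path(record_path)
    if not path.exists():
        return set()
    return set(path.read_text().splitlines())


def sync_once(src, dst, record_path):
    """Copy files present on src but not yet recorded to dst; return their names."""
    recorded = load_record(record_path)
    sent = []
    for name in sorted(set(src.nlst()) - recorded):
        buf = io.BytesIO()
        src.retrbinary(f"RETR {name}", buf.write)  # download into memory
        buf.seek(0)
        dst.storbinary(f"STOR {name}", buf)        # upload to the other server
        with open(record_path, "a") as record:     # record only after success
            record.write(name + "\n")
        sent.append(name)
    return sent
```

Calling sync_once every minute or so (from a loop with time.sleep, or from cron) gives the behaviour in the example: each file is transferred once, shortly after it first appears.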

Port desktop to web application (bioinformatic)

I want to port a few bioinformatic programs which I wrote for Windows OS to web applications. I'm using a few bioinformatic packages like BLAST, Bowtie or Primer3. These external tools usually take a file which the user provides, processes it and creates an output file which I parse and display. In addition these tools are using specific databases, which are created and reused by the user.
Up to now I was saving the databases created by the tools (the file is also provided by the user) and the output results on the PC where my software is installed. Now, I do not know how to handle such a setup on a web server. I cannot save all the databases created by the users from all over the world, but at the same time it is quite nasty to create a database again every time (e.g. the human genome db is 2.7 GB and takes some time to create it) when the user comes back (I guess one user creates about 5-10 databases per tool; I have 3 tools: 1 MB - 50 GB).
How can this problem be solved with web apps?
Edit
To make things clearer, I actually only want to know whether there is a more sophisticated way to reuse data that the user creates. I was thinking about storing those files temporarily for a session. Charging for the service is not an option because those tools are quite specific and I don't have many users. In addition, most users are close colleagues. After years of fighting with different operating systems, and debugging and maintaining my programs, I am finally giving up on the desktop versions (I do this in my private time); it is simply too time-consuming (in addition, I have some requests for Linux, Android and iOS).
Thanks
