I want to develop a proxy server that, when contacted, first checks the size of what it is about to download (most likely with a HEAD request). If the resource is over a set size, it splits the download into reasonably sized segments (1 megabyte, or possibly configurable via a config file) using pipelining and Range requests. Then, as it downloads and rotates through the pipes, it feeds what it receives back to its client in order, so that if the resource is, say, a media stream, the client can still play it easily. The goal is to split downloads that are too large across pipelines and leave the smaller ones alone. I'm not sure where to start. I did find other proxy servers (Polipo) that can do pipelining/multiplexing as mentioned, but none worked as outlined above. So: A. does anything exist that does this, and B. how would I get started? (I would prefer to work in Python if possible.)
I would look at Twisted (http://twistedmatrix.com/trac/); it is a great event-based networking library for Python. It takes a bit of getting used to, but it does this kind of thing very well.
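If you want to prototype the splitting logic itself before wiring it into an event loop, here is a minimal sketch of the HEAD-then-Range idea using the synchronous requests library (the segment size, threshold and the "leave small files alone" check are just placeholder assumptions; a real Twisted proxy would do the same thing asynchronously and stream each segment back to its client as it arrives):

    import requests

    SEGMENT_SIZE = 1024 * 1024   # 1 MB segments; could come from a config file
    THRESHOLD = 1024 * 1024      # only split downloads larger than this

    def fetch_in_segments(url):
        """Yield the resource's bytes in order, split into Range segments when large."""
        head = requests.head(url, allow_redirects=True)
        size = int(head.headers.get('Content-Length', 0))
        if size <= THRESHOLD or head.headers.get('Accept-Ranges') != 'bytes':
            # small file, or server does not support Range requests: leave it alone
            yield requests.get(url).content
            return
        for start in range(0, size, SEGMENT_SIZE):
            end = min(start + SEGMENT_SIZE, size) - 1
            part = requests.get(url, headers={'Range': f'bytes={start}-{end}'})
            yield part.content   # feed the segments back to the client in order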
Hi, is there anyone who can help me integrate BIRT reports with Django projects? Or any suggestions for connecting third-party reporting tools with Django, like Crystal Reports or Crystal Clear Reports?
Some of the 3rd-party Crystal Reports viewers listed here provide a full command line API, so your Python code can preview/export/print reports via subprocess.call().
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.
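Purely as an illustration of the subprocess.call() idea, the snippet below is hypothetical: the executable path and every flag depend entirely on which viewer you pick, so treat them as placeholders and check that product's command line documentation:

    import subprocess

    # Hypothetical viewer path and flags; substitute the real ones from the
    # command line API of whichever Crystal Reports viewer you choose.
    subprocess.call([
        r"C:\Program Files\SomeCrystalViewer\viewer.exe",
        "/report", r"C:\reports\sales.rpt",
        "/export", "pdf",
        "/out", r"C:\out\sales.pdf",
    ])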
For prototyping, or if you don't mind the performance, you can call BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (e.g. in PDF format). You can specify the output options and the report parameters on the command line.
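From Python, that amounts to a subprocess call along these lines (the flag spellings below are assumptions; run genReport.bat without arguments to see the options your BIRT version actually accepts):

    import subprocess

    subprocess.call([
        r"C:\birt-runtime\ReportEngine\genReport.bat",  # from the POJO runtime download
        "--format", "PDF",                              # output format (assumed flag name)
        "--output", r"C:\out\invoice.pdf",              # target file (assumed flag name)
        "--parameter", "invoiceId=12345",               # report parameter (assumed syntax)
        r"C:\reports\invoice.rptdesign",
    ])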
However, BIRT startup is a heavy overhead (several seconds).
To achieve reasonable performance, it is much better to perform this initialization only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (e.g. no JSON requests), but it should work. However, I have never used this approach.
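Generating a report then boils down to one HTTP request per report. The sketch below assumes the usual default context path and parameter names (birt-viewer, __report, __format); your deployment may use different ones:

    import requests

    resp = requests.get(
        "http://reporthost:8080/birt-viewer/run",          # assumed context path
        params={
            "__report": "invoice.rptdesign",               # report design to run
            "__format": "pdf",                             # output format
            "invoiceId": "12345",                          # ordinary report parameter
        },
    )
    with open("invoice.pdf", "wb") as f:
        f.write(resp.content)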
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
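To make the request/response shape concrete, here is a sketch of what the Django (client) side of such a server could look like; the wire format (length-prefixed JSON over a plain socket), the host, port and field names are all assumptions, since the answer only says "a plain socket listener":

    import json
    import socket
    import struct

    request = {
        "module": "invoice.rptdesign",        # which module to run
        "format": "PDF",                      # which output format
        "parameters": {"invoiceId": 12345},   # report parameters (module-specific)
        "auth": {"token": "example-token"},   # possibly other data/metadata
    }
    payload = json.dumps(request).encode()

    with socket.create_connection(("birt-server", 9000)) as sock:
        # 4-byte big-endian length prefix, then the JSON body
        sock.sendall(struct.pack(">I", len(payload)) + payload)
        resp_len = struct.unpack(">I", sock.recv(4))[0]
        pdf_bytes = b""
        while len(pdf_bytes) < resp_len:
            chunk = sock.recv(65536)
            if not chunk:
                break
            pdf_bytes += chunk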
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for production quality.
So if you just want to build something fast to see if this is the right tool for you, you should start with the command line approach, then the servlet approach, and only then, and only if you find that the servlet approach is not quite good enough, should you go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)
I'm currently working on a Raspberry Pi/Django project slightly more complex than I'm used to. (I either do local Raspberry Pi projects or simple Django websites; never the two combined!)
The idea is to have two Raspberry Pis collecting information by running a local Python script, each taking input from one HDMI feed (I've got all that part figured out - I THINK) using image processing. Now I want these two Raspberry Pis (which don't talk to each other) to connect to a backend server that would combine, store (and process) the information gathered by my two Pis.
I'm expecting each Pi to work on one frame per second, comparing it to the frame from a second earlier (there are only a few different things it is looking out for), isolate any new event, and send it to the server. I'm therefore expecting no more than a dozen binary timestamped data points per second.
Now what is the smart way to do this?
Do I make contact with the backend every second? Every 10 seconds?
How do I make these bulk HTTP requests? Through a POST request? Through a simple text file that I send for the Django backend to process? (I have found some info about "bulk updates" for Django, but I'm not sure that covers it entirely.)
How do I make it robust? How do I make sure that all data was successfully transmitted before deleting the log locally? (If one call fails for some reason, or gets delayed, how do I make sure that the next one compensates for the lost info?)
Basically, I'm asking for advice on designing an IoT-style project, where a sensor gathers bulk information and sends it to a backend server for processing, and on how that archiving process should be designed.
PS: I expect the image processing part (at one fps) to be fast enough on my Pi Zero (as it is VERY simple); backlog at that level shouldn't be an issue.
PPS: I'm using a Django backend (even if it seems a little overkill)
a/ because I already know the framework pretty well
b/ because I'm expecting to build real-time performance indicators from the combined data points gathered, using Django, and displaying them in (almost) real time on a webpage.
Thank you very much!
This partly depends on just how resilient you need it to be. If you really can't afford for a single update to be lost, I would consider using a message queue such as RabbitMQ - the clients would add things directly to the queue and the server would pop them off in turn, with no need to involve HTTP requests at all.
Otherwise it would be much simpler to just POST each frame's data in some serialized format (i.e. JSON), and Django would simply deserialize it and iterate through the list, saving each entry to the db. This should be fast enough for the rate you describe - I'd expect saving a dozen db entries to take significantly less than half a second - but this still leaves the problem of what to do if things get hung up for some reason. Setting a super-short timeout on the server will help, as would keeping the data to be posted until you have confirmation that it has been saved - and creating unique IDs in the client to ensure that the request is idempotent.
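As a concrete sketch of that "keep until confirmed" idea on the Pi side (the endpoint URL and event format are made up here): each batch gets a fixed, client-generated ID so the server can deduplicate a retried POST, and nothing is dropped locally until the server returns a success status:

    import time
    import uuid
    from collections import deque
    import requests

    BACKEND_URL = "http://backend.example.com/api/events/"   # hypothetical endpoint

    def collect_events_for_one_second():
        # placeholder for the real frame-diff logic running on the Pi
        return [{"ts": time.time(), "event": "example"}]

    pending = deque()   # batches not yet confirmed by the server

    while True:
        events = collect_events_for_one_second()
        if events:
            # a fixed client-generated ID per batch lets the server recognise
            # and deduplicate a retried POST (idempotency)
            pending.append({"batch_id": str(uuid.uuid4()), "events": events})
        while pending:
            try:
                resp = requests.post(BACKEND_URL, json=pending[0], timeout=2)
                resp.raise_for_status()
                pending.popleft()      # confirmed saved; safe to drop locally
            except requests.RequestException:
                break                  # network hiccup: retry on the next cycle
        time.sleep(1)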
I would like developers to give a comment about this :) thank you
I am building an application that needs an exact timestamp for multiple devices at the same time. Device time cannot be used, because the devices are not set to the same time.
Asking for the time from the server is OK, but it is slow and depends on connection speed; and if this is made serverless/region-less, then the server-side timestamp cannot be used because of time zones?
In the demo application, this works fine with one backend in a given region, but it still needs to respond faster to clients.
Here is a simple image to see this another way:
In the image there are no IDs etc., but the main idea is there.
I am planning to build this on a Node.js server and later, when there are more timing calculations, translate it to a Python/Django stack...
Thank you
Using server time is perfectly fine and timezone difference due to different regions can be easily adjusted in code. If you are using a cloud provider, they typically group their services by region instead of distributing them worldwide, so adjusting the timezone shouldn't be a problem. Anyway, check the docs for the cloud service you are planning to use, they usually clearly document the geographic distribution.
Alternatively, you could consider implementing (or using) the Network Time Protocol (NTP) on your devices. It's a fairly simple protocol for synchronizing clocks; if you need a source code example, take a look at the source for node-ntp-client, a JavaScript implementation.
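Since you plan to move the backend to Python/Django later, the same NTP query can be done there as well; this sketch uses the third-party ntplib package (pip install ntplib), which I am assuming here as a reasonable Python counterpart to node-ntp-client:

    import ntplib
    from datetime import datetime, timezone

    client = ntplib.NTPClient()
    response = client.request("pool.ntp.org", version=3)
    # tx_time is the NTP server's transmit time, in seconds since the Unix epoch (UTC),
    # so every device derives its timestamps from the same clock regardless of region
    utc_now = datetime.fromtimestamp(response.tx_time, tz=timezone.utc)
    print(utc_now.isoformat())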
I'm trying to get a list of registered virtual machines - by their name - in my vCenter. The problem is that I have many VMs (~5K), and I am doing this a lot (on the order of 1000 times per hour).
The SDKs I'm using cause a lot of traffic (1-2 MB/request):
pysphere: which asks for all VMs and filters on the client side.
pyVmomi: which needs to use recursion to list all VMs (I saw SI.content.searchIndex.FindByDnsName in reboot_vm.py, but my machines' DNS configuration is not accurate).
Looking into the SOAP documentation didn't help (I got to RetrievePropertiesEx.objectSet, but it doesn't look like it filters anything), and the new REST API (v6.5) didn't help either (since I need to get each VM's "datastore path", and all I can get is the name).
Have you tried using a property collector, as in pyVmomi.vmodl.query.PropertyCollector.FilterSpec?
The pyVmomi Community Samples contain examples of using this API, such as
https://github.com/vmware/pyvmomi-community-samples/blob/master/samples/filter_vms.py.
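For reference, a condensed sketch along the lines of that sample: it creates a container view over all VMs and asks the property collector for just the "name" property, so only the names travel over the wire. Connection details such as SSL certificate handling are omitted, and the host/credentials are placeholders:

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim, vmodl

    si = SmartConnect(host="vcenter.example.com", user="user", pwd="pass")
    content = si.RetrieveContent()

    # A container view over every VirtualMachine in the inventory
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)

    # Traverse the view and collect only the "name" property of each VM
    traversal = vmodl.query.PropertyCollector.TraversalSpec(
        name="traverseEntities", path="view", skip=False, type=vim.view.ContainerView)
    obj_spec = vmodl.query.PropertyCollector.ObjectSpec(
        obj=view, skip=True, selectSet=[traversal])
    prop_spec = vmodl.query.PropertyCollector.PropertySpec(
        type=vim.VirtualMachine, pathSet=["name"])
    filter_spec = vmodl.query.PropertyCollector.FilterSpec(
        objectSet=[obj_spec], propSet=[prop_spec])

    result = content.propertyCollector.RetrieveContents([filter_spec])
    names = [prop.val for obj in result for prop in obj.propSet]
    print(len(names), "VMs")

    Disconnect(si)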
vAPI is based on REST; even more, it does a network request every time, so it will be slow. Another way is to use the VCDB, where all VMs are stored (there is such a database, but I cannot help you more here); see https://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.installclassic.doc_41/install/prep_db/c_preparing_the_vmware_infrastructure_databases.html or https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1025914
I am using Python and the Twisted framework to connect to an FTP site to perform various automated tasks. Our FTP server happens to be Pure-FTPd, if that's relevant.
When connecting and calling the list method on an FTPClient, the resulting FTPFileListProtocol's files collection does not contain any directories or file names that contain a space (' ').
Has anyone else seen this? Is the only solution to create a sub-class of FTPFileListProtocol and override its unknownLine method, parsing the file/directory names manually?
Firstly, if you're performing automated tasks on a retrieved FTP listing then you should probably be looking at NLST rather than LIST, as noted in RFC 959 section 4.1.3:
NAME LIST (NLST)
...
This command is intended to return information that
can be used by a program to further process the
files automatically.
The Twisted documentation for LIST says:
It can cope with most common file listing formats.
This makes me suspicious; I do not like solutions that "cope". LIST was intended for human consumption, not machine processing.
If your target server supports them then you should prefer MLST and MLSD as defined in RFC 3659 section 7:
7. Listings for Machine Processing (MLST and MLSD)
The MLST and MLSD commands are intended to standardize the file and
directory information returned by the server-FTP process. These
commands differ from the LIST command in that the format of the
replies is strictly defined although extensible.
However, these newer commands may not be available on your target server and I don't see them in Twisted. Therefore NLST is probably your best bet.
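A rough sketch of fetching an NLST listing with Twisted's FTPClient follows; the host, credentials and the simple buffering protocol are placeholders, and the calls mirror the usual Twisted ftpclient example, so check the FTPClient API docs for your Twisted version:

    from twisted.internet import reactor
    from twisted.internet.protocol import Protocol, ClientCreator
    from twisted.protocols.ftp import FTPClient

    class NameCollector(Protocol):
        """Buffers the raw NLST output and splits it into one name per line."""
        def __init__(self):
            self.data = b""
        def dataReceived(self, data):
            self.data += data
        def names(self):
            return self.data.decode("utf-8", "replace").splitlines()

    def connected(ftp):
        collector = NameCollector()
        d = ftp.nlst(".", collector)
        d.addCallback(lambda _: print(collector.names()))
        d.addErrback(lambda failure: print("NLST failed:", failure))
        d.addBoth(lambda _: reactor.stop())

    creator = ClientCreator(reactor, FTPClient, "user", "password", passive=1)
    creator.connectTCP("ftp.example.com", 21).addCallback(connected)
    reactor.run()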
As to the nub of your problem, there are three likely causes:
The processing of the returned results is incorrect (Twisted may be at fault, as you suggest, or perhaps elsewhere)
The server is buggy and not sending a correct (complete) response
The wrong command is being sent (unlikely with straight NLST/LIST, but some servers react differently if arguments are supplied to these commands)
You can eliminate (2) and (3) and prove that the cause is (1) by looking at what is sent over the wire. If this option is not available to you as part of the Twisted API or the Pure-FTPd server logging configuration, then you may need to break out a network sniffer such as tcpdump, snoop or Wireshark (assuming you're allowed to do this in your environment). Note that you will need to trace not only the control connection (port 21) but also the data connection (since that carries the results of the LIST/NLST command). Wireshark is nice since it will perform the protocol-level analysis for you.
Good luck.
This is somewhat expected. FTPFileListProtocol isn't able to understand every FTP output, because, well, some are wacky. As explained in the docstring:
If you need different evil for a wacky FTP server, you can
override either C{fileLinePattern} or C{parseDirectoryLine()}.
In this case, it may be a bug: maybe you can improve fileLinePattern and make it understand filenames with spaces. If so, you're welcome to open a bug in the Twisted tracker.
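If you go down that road, a rough sketch of the subclass could look like the following; the relaxed filename group and the named groups (filetype, perms, nlinks, owner, group, size, date, filename, linktarget) are assumed to match what parseDirectoryLine() expects, so compare against the original fileLinePattern in twisted/protocols/ftp.py for your Twisted version before relying on it:

    import re
    from twisted.protocols.ftp import FTPFileListProtocol

    class SpaceTolerantFileListProtocol(FTPFileListProtocol):
        # Same general shape as the stock pattern, but the filename group is
        # relaxed to ".*?" so names containing plain spaces still match.
        fileLinePattern = re.compile(
            r'^(?P<filetype>.)(?P<perms>.{9})\s+(?P<nlinks>\d+)\s+'
            r'(?P<owner>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+'
            r'(?P<date>\S+\s+\d+\s+[\d:]+)\s+'
            r'(?P<filename>.*?)( -> (?P<linktarget>.*))?\r?$'
        )

    # Usage: pass an instance to FTPClient.list() in place of FTPFileListProtocol,
    # e.g. ftpClient.list(path, SpaceTolerantFileListProtocol())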