I'm searching for a good way to stress test a web application. Basically I'm looking for something like ab with a scriptable interface. Ideally I want to define some tasks that simulate different actions on the webapp (register an account, log in, search, etc.), and the tool runs a whole bunch of processes that execute these tasks*. As a result I would like something like "average request time", "slowest request (per URI)", etc.
*: To be independent of the client bandwidth I will run these tests from some EC2 instances, so in a perfect world the tool would already support this - otherwise I will script it using boto.
If you're familiar with the Python requests package, Locust makes it very easy to write load tests.
http://locust.io/
I've used it to write all of our perf tests.
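For example, a minimal locustfile sketch (assuming a recent Locust version; the endpoint paths and task weights are just placeholders) could look like this:

```python
# Minimal locustfile sketch - paths and weights are placeholders.
from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    wait_time = between(1, 3)  # simulated think time between tasks

    @task(3)
    def search(self):
        # weight 3: searching happens three times as often as registering
        self.client.get("/search?q=example")

    @task(1)
    def register(self):
        self.client.post("/register", data={"user": "test", "password": "secret"})
```

Run it with `locust -f locustfile.py --host=https://your-app.example.com` and open the web UI (http://localhost:8089 by default) to see per-endpoint averages, percentiles, and slowest requests.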
You might also look at these tools:
palb (Python Apache-Like Benchmark Tool) - an HTTP benchmark tool whose command-line interface resembles ab.
It lacks the advanced features of ab, but it supports multiple URLs (from arguments, files, stdin, and Python code).
Multi-Mechanize - Performance Test Framework in Python
Multi-Mechanize is an open source framework for performance and load testing.
Runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service.
Can be used to generate workload against any remote API accessible from Python.
Test output reports are saved as HTML or JMeter-compatible XML.
Pylot (Python Load Tester) - Web Performance Tool
Pylot is a free open source tool for testing performance and scalability of web services.
It runs HTTP load tests, which are useful for capacity planning, benchmarking, analysis, and system tuning.
Pylot generates concurrent load (HTTP Requests), verifies server responses, and produces reports with metrics.
Test suites are executed and monitored from a GUI or shell/console.
(Pylot on Google Code)
The Grinder
Default script language is Jython.
Pretty compact how-to guide.
Tsung
Maybe a bit unusual on first use, but really good for stress testing.
Step-by-step guide.
+1 for locust.io in the answer above.
I would recommend JMeter.
See: http://jmeter.apache.org/
You can set up JMeter as a proxy for your browser to record actions like login and then stress test your web application. You can also write scripts for it.
Don't forget FunkLoad; it's very easy to use.
Related
Hi, is there anyone who can help me integrate BIRT reports with Django projects? Or any suggestions for connecting third-party reporting tools with Django, like Crystal Reports or Crystal Clear Report.
Some of the 3rd-party Crystal Reports viewers listed here provide a full command-line API, so your Python code can preview/export/print reports via subprocess.call().
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.
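As a rough illustration of the subprocess.call() idea (the viewer executable and its switches below are purely hypothetical - each viewer documents its own command-line API):

```python
# Hedged sketch: invoking a third-party Crystal Reports viewer from Python.
# The executable path and the flags are placeholders, not a real product's API.
import subprocess

result = subprocess.call([
    r"C:\Program Files\SomeCrystalViewer\viewer.exe",  # hypothetical viewer
    "/report", r"C:\reports\sales.rpt",                # hypothetical option
    "/export", r"C:\out\sales.pdf",                    # hypothetical option
])
if result != 0:
    raise RuntimeError("report export failed with exit code %d" % result)
```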
For prototyping, or if you don't mind the performance, you can call BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (e.g. in PDF format). You can specify the output options and the report parameters on the command line.
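A hedged sketch of wrapping that from Python might look like the following; the script location and option names vary between BIRT versions, so treat them as placeholders and check the runtime's own help output:

```python
# Hedged sketch: driving the BIRT POJO runtime's report generator from Python.
# Script path and option names are assumptions - verify against your runtime.
import subprocess

subprocess.check_call([
    r"C:\birt-runtime\ReportEngine\genReport.bat",
    "--format", "pdf",                   # assumed output-format option
    "--output", r"C:\out\invoice.pdf",   # assumed output-file option
    "--parameter", "customerId=42",      # assumed report-parameter option
    r"C:\reports\invoice.rptdesign",
])
```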
However, the BIRT startup is heavy overhead (several seconds).
For achieving reasonable performance, it is much better to perform this only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (e.g. no JSON requests), but it should work. However, I have never used this approach myself.
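If you go that route, a sketch of the client side could be as simple as the following; the /birt/run path and the __report/__format parameter names are what the stock viewer usually expects, but verify them against your deployed WAR:

```python
# Hedged sketch: requesting a rendered report from the BIRT viewer servlet.
import requests

resp = requests.get(
    "http://reportserver:8080/birt/run",   # assumed context path
    params={
        "__report": "invoice.rptdesign",   # report design to run
        "__format": "pdf",                 # desired output format
        "customerId": "42",                # report parameter, passed through
    },
    timeout=120,
)
resp.raise_for_status()
with open("invoice.pdf", "wb") as f:
    f.write(resp.content)
```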
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
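For illustration, the client side of such a home-grown server might look like the sketch below; the JSON wire format, host, and port are invented here, and the Java process would have to implement the matching protocol:

```python
# Hedged sketch: one JSON request per report over a plain socket.
# The protocol shown is made up for illustration only.
import json
import socket

request = {
    "module": "invoice.rptdesign",      # which report design to run
    "format": "pdf",                    # which output format
    "parameters": {"customerId": 42},   # report parameters
    "auth": "some-token",               # possibly authentication metadata
}

with socket.create_connection(("birt-host", 9091), timeout=300) as sock:
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)

with open("invoice.pdf", "wb") as f:
    f.write(b"".join(chunks))
```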
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for production quality.
So if you just want to build something fast to see if this is the right tool for you, you should start with the command-line approach, then the servlet approach, and only then - and only if you find that the servlet approach is not quite good enough - should you go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)
I would like to do a load test of an API I developed. I have tried JMeter, which seems to provide only the elapsed time and the latency.
So is there a way I can do the same test (i.e. sending over 100 POST requests to the API) using Python, so I can get much better control over it?
You cannot get the response times written out just by configuring what goes into the results file, but it can be done by right-clicking the Response Times Over Time graph (which has to be installed via the plugins).
JMeter provides whatever you "tell" it to provide, check out Results File Configuration documentation chapter to see what JMeter can store and what are the default values.
Once you have the results file with the metrics you want, you can either analyze it yourself or generate an HTML Reporting Dashboard from it.
If you're good at Python development you might want to try the Locust tool, which is Python-based; the workload is defined via locustfiles - Python scripts.
More information: JMeter vs. Locust - Which One Should You Choose?
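If you do want the hand-rolled Python route the question asks about, a rough sketch with requests and a thread pool could look like this (the URL and payload are placeholders); it gives you raw timings you can aggregate however you like:

```python
# Fire N POST requests concurrently and compute simple timing statistics.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://api.example.com/endpoint"   # placeholder
PAYLOAD = {"key": "value"}                  # placeholder
N_REQUESTS = 100


def timed_post(_):
    start = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=30)
    return resp.status_code, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(timed_post, range(N_REQUESTS)))

times = [t for _, t in results]
print("average: %.3fs" % statistics.mean(times))
print("slowest: %.3fs" % max(times))
print("95th percentile: %.3fs" % statistics.quantiles(times, n=20)[18])
```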
I have a Python script on my local machine that reads a CSV file and outputs some metrics. The end goal is to create a web interface where the user uploads the CSV file and the metrics are displayed, with everything hosted on Azure.
I want to use a VM on Azure to run this python script.
The script takes the CSV file and outputs metrics which are stored in CosmosDB.
A web interface reads from this DB and displays graphs from the data generated by the script.
Can someone elaborate on the steps I need to follow to achieve this? Detailed steps are not strictly required, but a brief overview with links to relevant learning resources would be helpful.
There's an article that lists the primary options for hosting sites in Azure: https://learn.microsoft.com/en-us/azure/developer/python/quickstarts-app-hosting
As Sadiq mentioned, Functions is probably your best choice, as it will likely be less expensive, require less maintenance, and can handle both the script and the web interface. Here is a Python tutorial for that method: https://learn.microsoft.com/en-us/azure/developer/python/tutorial-vs-code-serverless-python-01
Option 2 would be to run a traditional website on an App Service plan, with background tasks handled either by Functions or a WebJob - they both use the WebJobs SDK, so the code is very similar: https://learn.microsoft.com/en-us/learn/paths/deploy-a-website-with-azure-app-service/
VMs are an option if neither of those two works, but they come with significantly more administration. This learning path has info on how to do this; the website is built on the MEAN stack, but it is applicable to Python as well: https://learn.microsoft.com/en-us/learn/paths/deploy-a-website-with-azure-virtual-machines/
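To make the Functions option above more concrete, here is a hedged sketch of an HTTP-triggered function that accepts a CSV upload, computes a stand-in metric, and writes it to Cosmos DB. The database/container names and the COSMOS_* settings are assumptions, and the binding configuration (function.json) is omitted:

```python
# Hedged sketch: HTTP-triggered Azure Function that stores CSV metrics in Cosmos DB.
import csv
import io
import os
import uuid

import azure.functions as func
from azure.cosmos import CosmosClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Parse the uploaded CSV from the request body.
    rows = list(csv.DictReader(io.StringIO(req.get_body().decode("utf-8"))))

    metrics = {
        "id": str(uuid.uuid4()),
        "row_count": len(rows),   # stand-in for your real metrics
    }

    # Assumed database "metricsdb" and container "metrics".
    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("metricsdb").get_container_client("metrics")
    container.upsert_item(metrics)

    return func.HttpResponse("stored metrics for %d rows" % len(rows), status_code=200)
```

The web interface then only needs to query the same Cosmos DB container to render its graphs.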
I've been using the Scrapy web-scraping framework pretty extensively, but recently I've discovered that there is another framework/system called pyspider, which, according to its GitHub page, is fresh, actively developed and popular.
pyspider's home page lists several things being supported out-of-the-box:
Powerful WebUI with script editor, task monitor, project manager and result viewer
Javascript pages supported!
Task priority, retry, periodical and recrawl by age or marks in index page (like update time)
Distributed architecture
These are the things that Scrapy itself doesn't provide, but they are possible with the help of portia (for the Web UI), scrapyjs (for JS pages) and scrapyd (deploying and distributing through an API).
Is it true that pyspider alone can replace all of these tools? In other words, is pyspider a direct alternative to Scrapy? If not, then which use cases does it cover?
I hope I'm not crossing "too broad" or "opinion-based" line.
pyspider and Scrapy have the same purpose, web scraping, but a different view of how to do it.
A spider should never stop until the WWW is dead. (Information changes and data is updated on websites; a spider should have the ability and responsibility to scrape the latest data. That's why pyspider has a URL database, a powerful scheduler, @every, age, etc.)
pyspider is more of a service than a framework. (Components run in isolated processes; the lite all-in-one version runs as a service too, so you don't need a Python environment, just a browser. Everything about fetching or scheduling is controlled by the script via the API, not by startup parameters or global configs; resources/projects are managed by pyspider, etc.)
pyspider is a spider system. (Any component can be replaced, even reimplemented in C/C++/Java or any other language, for better performance or larger capacity.)
and
on_start vs start_url
token bucket traffic control vs download_delay
return json vs class Item
message queue vs Pipeline
built-in url database vs set
Persistence vs In-memory
PyQuery + any third-party package you like vs built-in CSS/XPath support
In fact, I have not borrowed much from Scrapy. pyspider is really different from Scrapy.
But why not try it yourself? pyspider is also fast, has an easy-to-use API, and you can try it without installing it.
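For a feel of the API, here is pyspider's canonical handler shape (the URL is a placeholder), showing the on_start / index_page / detail_page flow plus the @every and age settings mentioned above:

```python
# Minimal pyspider handler sketch - the start URL is a placeholder.
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {}

    @every(minutes=24 * 60)              # re-run on_start once a day
    def on_start(self):
        self.crawl("http://example.com/", callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)       # results considered fresh for 10 days
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc("title").text(),
        }
```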
Since I use both scrapy and pyspider, I would like to suggest the following:
If the website is really small / simple, try pyspider first since it has almost everything you need
Use webui to setup project
Try the online code editor and view parse result instantly
View the result easily in browser
Run/Pause the project
Set the expiration date so it can re-process the URL
However, if you've tried pyspider and found it can't fit your needs, it's time to use scrapy.
- migrate on_start to start_requests
- migrate index_page to parse
- migrate detail_page to detail_page (Scrapy lets you keep your own callback names)
- change self.crawl to response.follow
Then you are almost done.
Now you can play with scrapy's advanced features like middleware, items, pipelines, etc.
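For reference, the migrated spider might end up looking roughly like this in Scrapy (the domain and selectors are placeholders):

```python
# Hedged sketch of a pyspider handler migrated to Scrapy.
import scrapy


class MigratedSpider(scrapy.Spider):
    name = "migrated"

    def start_requests(self):             # was on_start in pyspider
        yield scrapy.Request("http://example.com/", callback=self.parse)

    def parse(self, response):            # was index_page in pyspider
        for href in response.css('a::attr(href)').getall():
            yield response.follow(href, callback=self.detail_page)   # was self.crawl

    def detail_page(self, response):      # detail callback kept as-is
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
        }
```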
I am an experienced Python developer starting to work on web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get when using the Django ORM; I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web (hopefully in some sort of organized task format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka
request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constant data-feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but since you are about to rip the models out, I do not see much value in using it, and I would take a look at other options (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries, but only talk to the API provider.
As for robustness, I am happier starting separate FastCGI processes with supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.
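For the periodic-task route, a hedged Celery sketch could look like the following; the broker URL, the schedule, and the fetch_flight_feed / store_in_mysql helpers are placeholders standing in for the existing download/parse/insert code (assuming this module is saved as feeder.py):

```python
# Hedged sketch: replacing the big infinite loop with a Celery periodic task.
from celery import Celery
from celery.schedules import crontab

app = Celery("feeder", broker="redis://localhost:6379/0")   # assumed broker

app.conf.beat_schedule = {
    "pull-flight-data-every-5-minutes": {
        "task": "feeder.pull_flight_data",
        "schedule": crontab(minute="*/5"),
    },
}


def fetch_flight_feed():
    """Placeholder for the existing download/parse step."""
    return []


def store_in_mysql(rows):
    """Placeholder for the existing MySQL insert step."""


@app.task(bind=True, max_retries=3, default_retry_delay=60)
def pull_flight_data(self):
    try:
        store_in_mysql(fetch_flight_feed())
    except Exception as exc:
        # retry a few times if the source is temporarily down
        raise self.retry(exc=exc)
```

Run the worker and the beat scheduler (e.g. `celery -A feeder worker` and `celery -A feeder beat`), and the "skip vs retry when the source is down" behavior becomes a matter of the retry settings on the task.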