Testing Adobe Analytics Instrumentation with Python

I'm attempting to automate tests of Adobe Analytics (aka Omniture) instrumentation of a web app by implementing test scripts with the Selenium Python package.
If correctly instrumented, HTTP requests are made from the browser with certain expected query parameters. Is there a Python package that would allow me to capture those outgoing HTTP requests? Right now, we do it manually with the Chrome dev tools in the Network -> Images section.
This application is also available as a native app across nearly twenty other platforms (including Smart TVs and game consoles), and I'll need to perform similar tests across those. Although, unfortunately, I won't be able to automate the script, I'd still like to capture and store the HTTP calls. I'm currently using HTTPScoop to do this manually.
I'm most comfortable with Python, but if there's a simple way of doing this in another language, I'm all ears.

I was recently working on a similar task, so I can share my experience and what I've learnt along the way (rather than give you a complete solution).
First you need to run a proxy on your machine (e.g. BrowserMob Proxy, http://bmp.lightbody.net/). Then I needed to manually run a few commands against its REST API (https://github.com/lightbody/browsermob-proxy#rest-api). Once the proxy was running, I wrote a small script following the example here: https://github.com/lightbody/browsermob-proxy#using-with-selenium. Finally, you simply loop over the HAR entries captured by the proxy and check whether an analytics request is present (you can check for URL params if needed).
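For reference, a minimal sketch of that last step with the browsermob-proxy Python client and Selenium might look like the following (the proxy binary path, page URL, and the "pageName" check are placeholders; "/b/ss/" is the typical Adobe Analytics beacon path, so adjust it to whatever your instrumentation actually sends):

    # Minimal sketch: capture HAR entries through BrowserMob Proxy and look
    # for Adobe Analytics beacons. Paths and URLs below are placeholders.
    from browsermobproxy import Server
    from selenium import webdriver

    server = Server("/path/to/browsermob-proxy")  # placeholder binary path
    server.start()
    proxy = server.create_proxy()

    options = webdriver.ChromeOptions()
    options.add_argument("--proxy-server={}".format(proxy.proxy))
    driver = webdriver.Chrome(options=options)

    proxy.new_har("analytics-check")
    driver.get("https://example.com/page-under-test")  # placeholder URL

    # Adobe/Omniture image requests typically go to a ".../b/ss/..." path.
    hits = [e for e in proxy.har["log"]["entries"]
            if "/b/ss/" in e["request"]["url"]]
    for hit in hits:
        params = {q["name"]: q["value"] for q in hit["request"]["queryString"]}
        assert "pageName" in params  # example check on an expected query param

    driver.quit()
    server.stop()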
I have this ready in the form of a unit test for Firefox and Chrome (for a given URL). To run this test on different devices/OSes/platforms, one would probably need to run the code through Selenium Remote WebDriver (https://code.google.com/p/selenium/wiki/RemoteWebDriver) using a cloud service like https://www.browserstack.com/. I contacted them; they don't have any documentation ready yet, but they suggested I refer to online resources. That's where I am now.
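For what it's worth, the remote variant is mostly a matter of swapping the local driver for webdriver.Remote; a very rough sketch (the user, key, and capability names are placeholders, and the exact capabilities format depends on the service and your Selenium version):

    # Rough sketch of running the same test on a cloud Selenium grid such as
    # BrowserStack. Credentials and capability names below are placeholders;
    # routing the traffic through your capture proxy still needs to be
    # arranged separately (e.g. via proxy capabilities or a local tunnel).
    from selenium import webdriver

    capabilities = {
        "browserName": "Chrome",
        "os": "Windows",
        "os_version": "10",
    }
    driver = webdriver.Remote(
        command_executor="https://YOUR_USER:YOUR_KEY@hub.browserstack.com/wd/hub",
        desired_capabilities=capabilities,
    )
    driver.get("https://example.com/page-under-test")
    driver.quit()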
Hope it helps

Related

Python requests, Kerberos and NTLM

I've recently been working on a project in which I need to access an ASP.NET web API in order to get some data. The way I've been gaining access to this API so far is by manually setting the cookies within the code and then using requests to get the information that I need. My task now is to automate this process. I get the cookies by using the Chrome developer tools, in the Network tab. Now, obviously, the cookies change every once in a while, so I've been trying to make something that will automatically update the cookies inside the code.
I should mention that the network on which this is being done is air-gapped, and getting Python libraries inside is kind of tedious, so I am trying to avoid that. It is also the reason why providing code examples here is very complicated.
The way the log-in process works in this web app is as follows (data from chrome dev tools):
Upon entering the URL there are a bunch of redirects which seem to do nothing.
A request is made to /login.aspx which returns a "set-cookie: 'sessionId=xyz'" header and redirects to /LandingPage.aspx
A request is made to /LandingPage.aspx with said cookie, which returns a "set-cookie" header with a bunch of cookies (ASP.NET, etc.). These are the cookies that I need in order to make the Python script access the API.
What's written above is how the browser does things. When I try to imitate this in Python requests, I get the first cookie from /login.aspx, but when it redirects to /LandingPage.aspx, I get a 401 Unauthorized with the following headers:
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
After having done some reading I understood that these response headers are related to NTLM and Kerberos protocols (side question: if it responds with both headers does it mean that I need to provide both authentications or that either one will suffice?).
A quick Google search suggested that these responses should be followed by a request carrying the Kerberos/NTLM token (which I have no idea how to acquire) in order to get a 200 response. I find this pretty weird, considering the browser doesn't make any of these requests and the web app just gives it the cookies without seemingly transferring any NTLM or Kerberos data.
I've thought of a few ways to overcome this and hopefully you could help me figure out whether this would work.
Trying to get the requests-kerberos or requests-ntlm libraries for Python and using those to overcome this problem. I would like your opinion on whether this would work. I am reluctant to use this method, though, because of what was mentioned above.
Somehow using PowerShell to get these tokens and then somehow using them in Python requests without the above-mentioned libraries. But I have no idea if this would work either.
I would very much appreciate anyone who could maybe further explain the process that's happening here in general, and of course would greatly appreciate any help with solving this.
Thank you very much!
Trying to get the requests-kerberos or requests-ntlm libraries for Python and using those to overcome this problem. I would like your opinion on whether this would work. I am reluctant to use this method, though, because of what was mentioned above.
Yes, requests-kerberos would work. HTTP Negotiate means Kerberos almost 100% of the time.
For Linux I'd slightly prefer requests-gssapi, which is based on a more maintained 'gssapi' backend, but at the moment it's limited to Unix-ish systems only – while requests-kerberos has the advantage of supporting Windows through the 'winkerberos' backend. But it doesn't really matter; both will do the job fine.
Don't use NTLM if you can avoid it. Your domain admins will appreciate being able to turn off NTLM domain-wide as soon as they can.
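As a minimal sketch (assuming the machine already has a valid Kerberos ticket or Windows logon session, and with a placeholder hostname), the whole login flow collapses to something like:

    # Minimal sketch using requests-kerberos for HTTP Negotiate.
    # Assumes an existing Kerberos ticket / Windows logon session.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    session = requests.Session()
    session.auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)

    # The redirect to /LandingPage.aspx should now get a 200 instead of 401,
    # and the session will hold the ASP.NET cookies set along the way.
    resp = session.get("https://yourserver/login.aspx")  # placeholder host
    resp.raise_for_status()
    print(session.cookies.get_dict())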
Somehow using PowerShell to get these tokens and then somehow using them in Python requests without the above-mentioned libraries. But I have no idea if this would work either.
Technically it's possible, but doing this via PowerShell (or .NET in general) is going the long way around. You can achieve exactly the same thing using Python's sspi module, which talks directly to the actual Windows SSPI interface that handles Kerberos ticket acquisition (and NTLM, for that matter).
(The gssapi module is the Linux equivalent, and the spnego module is a cross-platform wrapper around both.)
You can see a few examples here – OP has a .NET example, the answer has Python.
But keep in mind that Kerberos tokens contain not only the service ticket but also a one time use authenticator (to prevent replay attacks), so you need to get a fresh token for every HTTP request.
So don't reinvent the wheel and just use requests-kerberos, which will automatically call SSPI to get a token whenever needed.
It says that in order for requests-kerberos to work, there has to be a TGT already cached on the PC. This program is supposed to run for weeks without being interfered with, and to my understanding these tickets expire after about 10 hours.
That's typical for all Kerberos use, not just requests-kerberos specifically.
If you run the app on Windows, from an interactive session, then Windows will automatically renew Kerberos tickets as needed (it keeps your password cached in LSA memory for that purpose). However, don't run long-term tasks in interactive sessions...
If you run the app on Windows, as a service, then it will use the "machine credentials" aka "computer account" (see details), and again LSA will keep the tickets up-to-date.
If you run the app on Linux, then you can create a keytab that stores the client credentials for the application. (This doesn't need domain admin rights, you only need to know the app account's password.)
On Linux there are at least 4 different ways to use a keytab for long-term jobs: k5start (third-party, but common); KRB5_CLIENT_KTNAME (built-in to MIT Kerberos, but only in recent versions); gss-proxy (from RedHat, might already be part of the OS); or a basic cronjob that just re-runs kinit to acquire new tickets every 4-6 hours.
I find this pretty weird considering the browser doesn't make any of these requests and the web app just gives it the cookies without it seemingly transferring any NTLM or Kerberos data.
It likely does; you might be overlooking it.
Note that some SSO systems use JavaScript to dynamically probe for whether the browser has Kerberos authentication properly set up – if the main page really doesn't send a token, then it might be an iframe or an AJAX/XHR request that does.

Python web browsing automation using a proxy

I'm coding an automation tool for a specific web site and having some problems. The web site has to be used from within a browser (pushing buttons and so on) to get JSON-format responses.
(I'm familiar with Python, but not with network traffic and such things. Sorry for my poor explanation; English is not my first language.)
1. I have to listen for the JSON-format responses from the web site. As far as I know, a local proxy (127.0.0.1) is needed to fetch the traffic. I've found code (http://luugiathuy.com/2011/03/simple-web-proxy-python/) that fetches data from port 80 (HTTP); however, that code requires changing my whole PC's network settings, which slows my PC down. Is there a way to get the traffic data without changing the whole PC's proxy settings? I want to run this code independently. I've tried to emulate an independent web browser to handle this, but I had a hard time setting up an independent local proxy for it.
2. Following on from question #1, the web site needs some button actions triggered by the mouse. As mentioned above (everything should operate independently), it must not interfere with the actual mouse. Is there a library I can use for this purpose? I've tried to create a "virtual mouse" to achieve this goal, but sadly failed.
I have more detailed questions, but I've shortened them down to the most crucial ones.

How can I create a application to login into another website with no API

I need to create an application which logs into a website (username/password) with my credentials, but this website has no API or authentication protocol (it's not been updated since 1998, and I need data from it continuously).
Is there anyway to do this? Preferably in python but can use any language or tools.
I have been searching Google, but most of the results assume there is an API to work with.
As stated in the comments, you can use the Python Selenium bindings to get this set up fairly painlessly.
Another option is the Mechanize family of tools (Python's is http://wwwsearch.sourceforge.net/mechanize/)
If you want a less heavyweight solution (that doesn't require a hefty web browser instance like Selenium or any third-party packages) you can most likely use the curl command-line client to authenticate to the app and send your requests, then put the curl commands into a shell or Python script.
You can get a head start on developing a curl solution using the Chrome dev tools:
With the dev tools open, bring up the Network tab
Select the "Preserve log" checkbox
Manually navigate to your web app in the browser and log in
Perform any other actions you want to automate
You should now have a list of requests in the Network tab
Scan through the requests and determine which are important (e.g., GETs for images can be ignored)
For each request you want to include in your script, right-click the item and choose the "Copy as cURL" option to get the curl equivalent of the request in your clipboard.
The string Chrome places in your clipboard will be very verbose; you can likely remove several bits and still have a working request, if you want to clean it up.
Parameterize the requests as necessary, and you should have the beginnings of a working shell or Python script for your task; a rough sketch of the Python variant follows.
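A rough sketch of the Python variant, assuming a simple form-based login (the URLs and form fields are placeholders you would lift from the copied cURL commands):

    # Rough sketch: replay the captured login and data requests with 'requests'.
    # URLs, form fields, and parameters are placeholders taken from the
    # "Copy as cURL" output, not the real application.
    import requests

    session = requests.Session()  # keeps cookies across requests, like a browser

    # 1. Replay the login request captured in the Network tab
    session.post(
        "https://legacy-app.example.com/login",
        data={"username": "my_user", "password": "my_password"},
    )

    # 2. Replay the request(s) that actually return the data you need
    resp = session.get("https://legacy-app.example.com/report", params={"year": "1998"})
    resp.raise_for_status()
    print(resp.text)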

What tracking solutions are available for server side code?

I'm working on a tracking proxy (for want of a better term) written in Python. It's a simple http (wsgi) application that will run on one (maybe more) server and accepts event data from a desktop client. This service would then forward the tracking data on to some actual tracking platform (DeskMetrics, MixPanel, Google Analytics) so that we don't have to deal with the slicing and dicing of data.
The reason for this implementation is that it's much easier and faster to make changes to a server process that we control rather than having to ensure every client in the wild gets updated if the tracking backend changes in some way.
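(For context, the service is roughly the shape of the minimal WSGI sketch below; the endpoint path, payload format, and the downstream forwarding call are placeholders, not the real code.)

    # Minimal sketch of the forwarder described above. The endpoint path,
    # payload format, and the downstream call are placeholders.
    import json

    def forward_to_backend(event):
        # Placeholder: this is where the tracking-platform-specific call goes
        # (Google Analytics, Mixpanel, DeskMetrics, ...).
        pass

    def application(environ, start_response):
        if environ["REQUEST_METHOD"] == "POST" and environ["PATH_INFO"] == "/track":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            event = json.loads(environ["wsgi.input"].read(length))
            forward_to_backend(event)
            start_response("204 No Content", [])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]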
I've been looking up info on the various options and I was hoping somebody here would have some good advice from their own experiences. Ideally we'd be able to use Google Analytics as it's free for any amount of usage but paid options are fine.
My only real requirement is either a good Python library or a well-documented API that I can write a wrapper for (this seems somewhat lacking in GA when it comes to triggering events through any method other than their JS or other provided libraries).
N.B. We're not really tracking server code so something like NewRelic isn't appropriate, we're just decoupling a desktop application from the specifics of the tracking backend.
We ran into this same problem a bunch of times, so we ended up building a suite of server-side analytics libraries to make this easier.
Segment.io has libraries for Python, Ruby, Java, Node, .NET and PHP that abstract the APIs for Mixpanel, KISSmetrics, Google Analytics and a bunch of other analytics services.
You could integrate the Python library once, and then send your data wherever you want. The data is proxied through Segment.io's hosted service. Hopefully this cleans up the mess of integrating a bunch of libraries, each with slightly different APIs. (The service is free for the first million events.)
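A sketch of what the integration looks like with Segment's Python library (the analytics-python package; the write key, user ID, and event fields below are placeholders):

    # Sketch of forwarding one event through Segment's Python library.
    # The write key, user ID, and event fields are placeholders.
    import analytics

    analytics.write_key = "YOUR_SEGMENT_WRITE_KEY"

    analytics.track(
        user_id="desktop-client-123",
        event="Report Exported",
        properties={"format": "pdf", "app_version": "2.4.1"},
    )
    analytics.flush()  # flush the internal queue before the process exits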
Have you tried any of the resources below?
The Google Data APIs Python Client Library has source specific to analytics
http://code.google.com/p/gdata-python-client/
http://code.google.com/p/gdata-python-client/source/browse/#hg%2Fsamples%2Fanalytics
https://developers.google.com/gdata/articles/python_client_lib
You might be able to borrow from these sources as well;
Google has something they are working on for mobile and source is available in PHP, JSP, ASP.net and Perl: https://developers.google.com/analytics/devguides/collection/other/mobileWebsites
I also came across this in PHP: http://code.google.com/p/php-ga/
As for others:
KissMetrics: http://support.kissmetrics.com/apis/python
MixPanel: https://mixpanel.com/docs/integration-libraries/python
DeskMetrics: don't seem to have python, http://docs.deskmetrics.com/index.html
Sorry I can't provide information based on extensive experience with anything Python-related, other than pointing to a few of these resources. I would be interested to see what you come up with.

TFS Webservice Documentation

We use a lot of Python to do much of our deployment, and it would be handy to connect to our TFS server to get information on iteration paths, tickets, etc. I can see the web service but am unable to find any documentation. Just wondering if anyone knew of anything?
The web services are not documented by Microsoft, as they are not an officially supported route to talk to TFS. The officially supported route is to use their .NET API.
In the case of your sort of application, the course of action I usually recommend is to create your own web service shim that lives on the TFS server (or another server) and uses their API to talk to the server but allows you to present the data in a nice way to your application.
Their object model simplifies the interactions a great deal (depending on what you want to do), so it actually means less code overall, but better-tested and more testable code, and you can also work around things such as the NTLM auth used by the TFS web services.
Hope that helps,
Martin.
So, this question is friggin' old, but let me take a whack at it (since it keeps coming up in my google searches).
There's no officially supported API for on-premise TFS (the MSFT-hosted one has http://www.visualstudio.com/en-us/integrate/api/overview).
That said, you can always use Fiddler (http://www.telerik.com/fiddler) or something like it to inspect the calls that the TFS web client makes to the server, and do your magic to turn those into the Python scripts you want.
You'll need to run your Python scripts under a service account that has TFS privileges appropriate to what it is trying to do (read, update, configure... whatever).
Since it sounds like you are just trying to read from TFS, this might be a really easy way for you to get what you want, since an HTTP GET to
http://yourserver/tfs/yourcollection/yourproject/_workitems#id=yourworkitemid
will hand you back (halfway) sane HTML payloads.
If you want lists of iterations or teams or whatever, then your service account needs to have the appropriate admin privileges and hit things like
http://yourserver/tfs/yourcollection/yourproject/_admin/_iterations
and use that response.
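A rough sketch of scripting one of those GETs from Python under a service account (the answer doesn't prescribe a library; requests plus requests_ntlm is one option, and the host, project, and credentials are placeholders):

    # Rough sketch: fetch the iterations admin page under a service account.
    # Host, collection, project, and credentials are placeholders; requests_ntlm
    # is one way to satisfy the NTLM auth used by the TFS web endpoints.
    import requests
    from requests_ntlm import HttpNtlmAuth

    auth = HttpNtlmAuth("DOMAIN\\svc_tfs_reader", "service-account-password")
    resp = requests.get(
        "http://yourserver/tfs/yourcollection/yourproject/_admin/_iterations",
        auth=auth,
    )
    resp.raise_for_status()
    print(resp.text[:500])  # (halfway) sane HTML to parse from here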
