How do I get the user agent calling a bottlepy API

I am trying to get the user agent of the client calling an API built with the Bottle micro framework. When the API is called directly from a browser, it shows what the user agent is. However, when it's called from another application, written e.g. in PHP or Java, it doesn't show the user agent.
I can, however, get the IP address whether the request comes from a browser or from another application:
client_ip = request.environ.get('REMOTE_ADDR')
logging.info("Source IP Address: %s", client_ip)  # Works
browser_agent = request.environ.get('HTTP_USER_AGENT')
logging.info("Source Browser Type: %s", browser_agent)  # Doesn't work when called from an application
When I call it using a browser or, say, Postman, it gives me a result like the one below:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.3
So, is there a special parameter to use to know what type of agent is calling the API?

Clients are not required to send a User-Agent header. Your browser is sending one (as most do), but your PHP and Java clients are (probably) not.
If you have control over the clients, add a User-Agent header to each request they make. For example, in PHP, see this SO answer.
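Because the header is optional, the server side is easier to reason about if it treats a missing value as a normal case. A minimal sketch, assuming only that request.environ is the usual WSGI environ dict (as in the question); the helper name describe_client and the "unknown" fallback are illustrative and not part of Bottle:

```python
def describe_client(environ):
    """Return (ip, user_agent) from a WSGI environ dict.

    Clients that send no User-Agent header simply have no
    HTTP_USER_AGENT key, so a fallback value is used instead of None.
    """
    client_ip = environ.get("REMOTE_ADDR", "unknown")
    user_agent = environ.get("HTTP_USER_AGENT") or "unknown"
    return client_ip, user_agent


# A browser-like request and a bare client request, as plain dicts.
browser_env = {"REMOTE_ADDR": "203.0.113.7", "HTTP_USER_AGENT": "Mozilla/5.0"}
script_env = {"REMOTE_ADDR": "203.0.113.8"}

print(describe_client(browser_env))
print(describe_client(script_env))
```

Inside a Bottle handler this would be called as describe_client(request.environ).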

Related

How can I download many small files quickly? (Not Bandwidth limited)

I need to download ~50 CSV files in Python. Based on the Google Chrome network stats, the download takes only 0.1 seconds, while the request takes about 7 seconds to process.
I am currently using headless Chrome to make the requests.
I tried multithreading, but from what I can tell, the browser doesn't support that (it can't make another request before the first request finishes processing). I don't think multiprocessing is an option, as this script will be hosted on a virtual server.
My next idea is to use the requests module instead of headless Chrome, but I am having issues connecting to the company network without a browser. Will this work, though? Any other solutions? Could I do something with multiple driver instances or multiple tabs on a single driver? Thanks!
Here's my code:
from multiprocessing.pool import ThreadPool
driver = ChromeDriver()
Login(driver)
def getFile(item):
    # Every worker thread shares the single driver created above.
    driver.get(url.format(item))
updateSet = blah
pool = ThreadPool(len(updateSet))
for item in updateSet:
    pool.apply_async(getFile, (item,))
pool.close()
pool.join()

With requests, try setting the User-Agent string to that of a browser like Chrome, e.g.: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36.
Some example code:
import requests
url = 'SOME URL'
headers = {
    'User-Agent': 'user agent here',
    'From': 'youremail@domain.com'  # This is another valid field
}
response = requests.get(url, headers=headers)
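If plain HTTP requests do get through once the headers are set, the per-request wait can be overlapped with a thread pool, since each request spends most of its time waiting on the server. A rough sketch using only the standard library; download_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real call such as requests.get(url.format(item), headers=headers):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(items, fetch, max_workers=10):
    """Call fetch(item) for every item concurrently; return {item: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with items.
        results = list(pool.map(fetch, items))
    return dict(zip(items, results))


def fake_fetch(item):
    """Stand-in for a slow HTTP request (hypothetical)."""
    time.sleep(0.1)  # simulate server-side processing time
    return "csv data for {}".format(item)


items = ["report{}".format(i) for i in range(20)]
start = time.monotonic()
files = download_all(items, fake_fetch)
elapsed = time.monotonic() - start
print(len(files), "files in", round(elapsed, 2), "seconds")
```

Unlike the shared browser driver in the question, each call here is independent, so the waits overlap instead of queueing behind one another.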

Automating Archer VR600 | AC1600 via Python

I work in an environment where we occasionally have to bulk-configure TP-Link ADSL routers. As one can understand, this causes productivity issues. I solved the issue using Python, and especially requests.session() from the requests library. It worked tremendously well, especially for older TP-Link models such as the TP-Link Archer D5.
Reference: How to control a TPLINK router with a python script
The method I used was to do the configuration via a browser, capture the packets using Wireshark, and replicate them using Python. The Archer VR600 introduces a new method. When starting configuration in the browser, the main page asks for a new password. Once that is done, it generates a long random string (KEY), which is sent to the router. This key is random and unique; based on it, a JSESSIONID is generated and used throughout the session.
AC1600 IP Address: 192.168.1.1
PC IP Address: 192.168.1.100
KEY and SESSIONID when configured via Browser.
KEY and SESSIONID when configured via Python Script.
As you can see, I am trying to replicate the steps via a script but failing, because I am not able to create a unique key that the router will accept; as a result, no SESSIONID is generated and the rest of the configuration cannot proceed.
Code:
import base64
import requests
def configure_tplink_archer_vr600():
    user = 'admin'
    salt = '%3D'
    default_password = 'admin:admin'
    password = "admin"
    base_url = 'http://192.168.1.1'
    setPwd_url = 'http://192.168.1.1/cgi/setPwd?pwd='
    login_url = "http://192.168.1.1/cgi/login?UserName=0f98175e8bd1c9297fc22ec6a47fa4824bfb3c8c73141acd7b46db283557d229c9783f409690c9af5e87055608b358ab4d1dfc45f17e6261daabd3e042d7aee92aa1d8829a8d5a69eb641dcc103b17c4f443a96800c8c523b911589cf7e6164dbc1001194"
    get_busy_url = "http://192.168.1.1/cgi/getBusy"
    # Base64-encode the default credentials for the Authorization cookie.
    authorization = base64.b64encode(default_password.encode()).decode('ascii')
    # Base64-encode the new password and URL-escape the '=' padding.
    salted_password = base64.b64encode(password.encode()).decode('ascii')
    salted_password = salted_password.replace("=", "%3D")
    print("Salted Password" + salted_password)
    setPwd_url = setPwd_url + salted_password
    rs = requests.session()
    rs.headers['Cookie'] = 'Authorization=Basic ' + authorization
    rs.headers['Referer'] = base_url
    rs.headers['User-Agent'] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    print("This is authorization string: " + authorization)
    response = rs.post(setPwd_url)
    print(response)
    print(response.text.encode("utf-8"))
    response = rs.post(get_busy_url)
    print(response)
    print(response.text.encode("utf-8"))
    response = rs.post(login_url)
    print(response)
    print(response.text.encode("utf-8"))

Use the Python requests library to log in to the router; this removes the need for any manual work:
1. Go to the login page, right-click and choose Inspect Element.
2. Open the Network tab; here you can see HTTP requests as they happen.
3. Log in with some username and password; you should see the corresponding GET/POST request in the Network tab.
4. Click on it and find the payload it sends to the router. This is usually in JSON format; you'll need to build it in your Python script and send it as input to the web page. (Luckily, there are many tutorials for this out there.)
Note that sometimes a payload is actually generated by some JavaScript, but in most cases it's just a string crammed into the HTML source. If you see a value in the payload you don't understand, search for it in the page source. Then you'll have to extract it with something like a regex and add it to your payload.
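The last step, pulling a generated value out of the page source and adding it to the payload, might look like the sketch below. The HTML snippet, the token variable name, and the payload field names are all made up for illustration; the real router page has to be inspected to find the actual names.

```python
import re

# Hypothetical login page source; a real page would come from response.text.
page_source = """
<html><head><script>
var token = "a1b2c3d4e5";
</script></head><body>Login</body></html>
"""


def extract_token(html):
    """Return the value assigned to `var token = "..."`, or None if absent."""
    match = re.search(r'var\s+token\s*=\s*"([^"]+)"', html)
    return match.group(1) if match else None


token = extract_token(page_source)
# Hypothetical payload layout; copy the real field names from the browser.
payload = {"username": "admin", "password": "admin", "token": token}
print(payload)
```

Returning None when nothing matches makes it easy to notice that the page layout differs from what the script expects.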

Wordpress xmlrpc page returns non xml data for non browser applications

I am using Python (http://python-wordpress-xmlrpc.readthedocs.io/en/latest/) to connect to WordPress and post content.
I have a few WordPress sites to which I connect using sitename.com/xmlrpc.php
However, one of my sites recently started reporting a problem when connecting, saying the response is not valid XML. When I view the page in a browser I see the usual "XML-RPC server accepts POST requests only.", but when I connect using Python I see the following message:
function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("c299e542498206cd9cff8fd57dfc56df");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; location.href="http://targetDomainNameHere.com/xmlrpc.php?i=1";
This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support
I searched for the file aes.js, no luck.
How do I get this working? How do I remove this? I am using the latest version of WordPress as of 07.NOV.2017

You can try passing a "User-Agent" header in the request. By default, a Java or Python library generally puts its own name and version in the User-Agent, which allows the WordPress server to block it.
Overriding the User-Agent header with a browser-like value can help get data from some WordPress servers. The value can look like: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
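With the standard library's xmlrpc.client, the header can be overridden by subclassing the transport, whose user_agent attribute is sent with each request. This is a sketch, not a guaranteed fix: the class names and the URL are placeholders, and a host that insists on executing JavaScript may still reject the call.

```python
import xmlrpc.client

BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
)


class BrowserTransport(xmlrpc.client.Transport):
    """Plain-HTTP transport that announces a browser-like User-Agent."""
    user_agent = BROWSER_UA


class BrowserSafeTransport(xmlrpc.client.SafeTransport):
    """HTTPS variant of the same idea."""
    user_agent = BROWSER_UA


def make_proxy(url):
    """Build a ServerProxy for url; no network traffic happens here."""
    if url.startswith("https://"):
        transport = BrowserSafeTransport()
    else:
        transport = BrowserTransport()
    return xmlrpc.client.ServerProxy(url, transport=transport)


# Placeholder URL; calling a method on the proxy would contact the server.
proxy = make_proxy("http://example.com/xmlrpc.php")
print(BrowserTransport.user_agent)
```

The python-wordpress-xmlrpc Client appears to accept a transport argument as well, but that should be checked against the documentation for the installed version.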

How to use cookies in python 3?

I want to use cookies that I copied from Chrome, but I get a lot of errors.
import urllib.request
import re
def open_url(url):
    header={"User-Agent":r'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    Cookies={'Cookie':r"xxxxx"}
    Request=urllib.request.Request(url=url,headers=Cookies)
    response=urllib.request.urlopen(Request,timeout=100)
    return response.read().decode("utf-8")
Where does my code go wrong? Is that headers=Cookies ?

The correct way when using urllib.request is to use an OpenerDirector populated with an HTTPCookieProcessor:
cookieProcessor = urllib.request.HTTPCookieProcessor()
opener = urllib.request.build_opener(cookieProcessor)
Then you use the opener, and it will automagically process the cookies:
response = opener.open(request, timeout=100)
By default, the CookieJar (http.cookiejar.CookieJar) used is a simple in-memory store, but you can use a FileCookieJar if you need long-term storage of persistent cookies, or even an http.cookiejar.MozillaCookieJar if you want to use persistent cookies stored in a cookies.txt file in the now-legacy Mozilla format.
If you want to use cookies that exist in your web browser, you must first store them in a cookies.txt-compatible file and load them into a MozillaCookieJar. For Mozilla, you can find a Cookie Exporter add-on. For other browsers, you must manually create a cookies.txt file by reading the content of the cookies you need from your browser. The format can be found in The Unofficial Cookie FAQ. Extracts:
... each line contains one name-value pair. An example cookies.txt file may have an entry that looks like this:
.netscape.com TRUE / FALSE 946684799 NETSCAPE_ID 100103
Each line represents a single piece of stored information. A tab is inserted between each of the fields.
From left-to-right, here is what each field represents:
domain - The domain that created AND that can read the variable.
flag - A TRUE/FALSE value indicating if all machines within a given domain can access the variable. This value is set automatically by the browser, depending on the value you set for domain.
path - The path within the domain that the variable is valid for.
secure - A TRUE/FALSE value indicating if a secure connection with the domain is needed to access the variable.
expiration - The UNIX time that the variable will expire on. UNIX time is defined as the number of seconds since Jan 1, 1970 00:00:00 GMT.
name - The name of the variable.
value - The value of the variable.
But the normal way is to mimic a full session and let the cookies be extracted automatically from the responses.
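For the cookies.txt route described above, loading the file and attaching it to an opener can be sketched as follows. The cookie line is invented sample data (domain, name, value, and a far-future expiry), and build_opener_from_cookie_file is an illustrative helper name; note that MozillaCookieJar expects the file to begin with the Netscape header comment and skips cookies that have already expired.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Sample cookies.txt content: header line plus one tab-separated cookie.
COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tsession_id\tabc123\n"
)


def build_opener_from_cookie_file(path):
    """Load a cookies.txt file and return (jar, opener) that use it."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load()  # raises http.cookiejar.LoadError on a malformed file
    processor = urllib.request.HTTPCookieProcessor(jar)
    return jar, urllib.request.build_opener(processor)


with tempfile.TemporaryDirectory() as tmp:
    cookie_path = os.path.join(tmp, "cookies.txt")
    with open(cookie_path, "w") as fh:
        fh.write(COOKIES_TXT)
    jar, opener = build_opener_from_cookie_file(cookie_path)

loaded = {cookie.name: cookie.value for cookie in jar}
print(loaded)
```

Requests made with opener.open(...) to a matching domain would then carry the loaded cookies.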

"When receiving an HTTP request, a server can send a Set-Cookie header with the response. The cookie is usually stored by the browser and, afterwards, the cookie value is sent along with every request made to the same server as the content of a Cookie HTTP header" (extracted from the Mozilla site).
This link will help give you some knowledge about headers and HTTP requests. Please go through it; it might answer a lot of your questions.

You can use a better library (IMHO) - requests.
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
cookies = dict(c1="cookie_numba_one")
r = requests.get('http://example.com', headers=headers, cookies=cookies)
print(r.text)

How to check Proxy headers to check anonymity?

I'm trying to identify high anonymity proxies, also called private/elite proxies. On a forum I read this:
High anonymity Servers don't send HTTP_X_FORWARDED_FOR, HTTP_VIA and
HTTP_PROXY_CONNECTION variables. Host doesn't even know you are using
proxy server and of course it doesn't know your IP address.
A highly anonymous proxy will display the following information:
REMOTE_ADDR = Proxy's IP address
HTTP_VIA = blank
HTTP_X_FORWARDED_FOR = blank
So, how can I check for these headers in Python, to tell whether a proxy is an HA proxy or should be discarded? I have tried to retrieve the headers for 20-30 proxies using the requests package, and also with urllib, with the built-in http.client, and with urllib2. But I never saw these headers. So I must be doing something wrong...
This is the code I've used to test with requests:
import requests
proxies = {'http': 'http://176.100.108.214:3128'}
header = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.360',}
s = requests.session()
s.proxies = proxies
r = s.get('http://www.python.org', headers=header)
print(r.status_code)
print(r.request.headers)
print(r.headers)

It sounds like the forum post you're referring to is talking about the headers seen by the server on your proxied request, not the headers seen by the client on the proxied response.
Since you're testing with www.python.org as the server, the only way to see the headers it receives would be to have access to their logs. Which you don't.
But there's a simple solution: run your own HTTP server, make requests against that, and then you can see what it receives. (If you're behind a firewall or NAT that the proxy you're testing won't be able to connect to, you may have to get a free hosted server somewhere; if not, you can just run it on your machine.)
If you have no idea how to set up and configure a web server, Python comes with one of its own. Just run this script with Python 3.2+ (on your own machine, or an Amazon EC2 free instance, or whatever):
from http.server import HTTPServer, SimpleHTTPRequestHandler
class HeaderDumper(SimpleHTTPRequestHandler):
    def do_GET(self):
        try:
            return super().do_GET()
        finally:
            print(self.headers)
server = HTTPServer(("", 8123), HeaderDumper)
server.serve_forever()
Save it to a file and run it with python3 from the shell.
Then just run your client script, with http://my.host.ip:8123 instead of http://www.python.org, and look at what the script dumps to the server's shell.
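Once the server prints the headers it received, the forum's rule of thumb can be applied mechanically. A minimal sketch, assuming the three headers named in the question are the only ones of interest and treating a header as revealing if it is present at all; the function name and the sample header sets are illustrative:

```python
# HTTP header names behind the CGI-style variables HTTP_VIA,
# HTTP_X_FORWARDED_FOR and HTTP_PROXY_CONNECTION.
PROXY_HEADERS = ("via", "x-forwarded-for", "proxy-connection")


def revealing_headers(headers):
    """Return the proxy-revealing header names present in a mapping.

    Header names are compared case-insensitively; an empty list means
    the request matches the forum's description of a high anonymity proxy.
    """
    present = {name.lower() for name in headers.keys()}
    return [name for name in PROXY_HEADERS if name in present]


# Invented examples of what the server might receive.
transparent = {
    "Host": "my.host.ip",
    "Via": "1.1 proxy",
    "X-Forwarded-For": "198.51.100.4",
}
elite = {"Host": "my.host.ip", "User-Agent": "Mozilla/5.0"}

print(revealing_headers(transparent))
print(revealing_headers(elite))
```

Since self.headers in the handler above also exposes keys(), the same function could be called from do_GET instead of printing the raw headers.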
