I was trying my hand at implementing a BitTorrent client in Python (I know there are libs out there that can do this for me easily, but I'm just trying to learn new things).
I downloaded a torrent file and managed to successfully decode it, but when I send the GET request to the tracker I get back a 403 response and I have no idea why. This is what I tried (code copied from the Python shell):
>>> import urllib, urllib2, bencoder
>>> from hashlib import sha1
>>> f = open("torrents/test.torrent", "rb")
>>> torrentData = bencoder.decode(f.read())
>>> torrentData["announce"]
'http://reactor.flro.org:8080/announce.php?passkey=d59fc5b5b9e2664895ad1c68a3621caf'
>>> params["info_hash"] = sha1(bencoder.encode(torrentData["info"])).digest()
>>> params["peer_id"] = '-AZ-1234-12345678901'
>>> params["left"] = sum(f["length"] for f in torrentData["info"]["files"])
>>> params["port"] = 6890
>>> params["uploaded"] = 0
>>> params["downloaded"] = 0
>>> params["compact"] = 1
>>> params["event"] = "started"
>>> params
{'uploaded': 0, 'compact': 1, 'info_hash': '\xab}\x19\x0e\xac"\x9d\xcf\xe5g\xd4R\xae\xee\x1e\xd7\
>>> final_url = torrentData["announce"] + "&" + urllib.urlencode(params)
>>> final_url
'http://reactor.flro.org:8080/announce.php?passkey=d59fc5b5b9e2664895ad1c68a3621caf&uploaded=0&co
>>> urllib2.urlopen(final_url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
response = meth(req, response)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
return self._call_chain(*args)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
result = func(*args)
File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Am I missing something from the params dict? I also tried this torrent in my uTorrent client and it worked, so the tracker is working fine. I even tried the bare announce URL (without the params) and got the same thing. From what I read of the BitTorrent specification, there is no mention of a 403 response from the tracker.
I'd be very happy if you guys could help me out with this.
To reduce the number of variables, it is better to test against a tracker you're running locally; opentracker is a good choice since it imposes few requirements.
Errors you only get on specific trackers but not on others are likely due to additional requirements imposed by the tracker administrators, not by the BitTorrent protocol itself.
The major exceptions are that many public trackers may not allow non-compact announces, or may require UDP announces instead of HTTP ones.
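For example, reusing the params dict built in the question, a local announce could look like this (a sketch; the port and path assume opentracker's defaults):
>>> local_url = "http://localhost:6969/announce?" + urllib.urlencode(params)
>>> urllib2.urlopen(local_url).read()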
OK, I managed to figure out the issue. It's kinda silly, but it's actually because the request to the tracker didn't have any headers, and that tracker needed a User-Agent, otherwise it would reject the request. All I had to do was add a User-Agent to the request.
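As a minimal sketch of the fix against the same final_url (the exact User-Agent string here is arbitrary; the tracker just requires that one be present):
>>> req = urllib2.Request(final_url, headers={"User-Agent": "MyBTClient/0.1"})
>>> urllib2.urlopen(req).read()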
I was working on API testing, and I tried everything but I could not print the JSON response as a string. I was wondering if it was the website I was sending the API requests to, as I kept getting a 406 error. I even tried code from online examples that show how to do this, but it still would not print and gave the error listed below. Here is the code I used and the response PyCharm's console gave me.
import json
import requests
res = requests.get("http://dummy.restapiexample.com/api/v1/employees")
data = json.loads(res.text)
data = json.dumps(data)
print(data)
print(type(data))
Traceback (most recent call last):
File "C:/Users/johnc/PycharmProjects/API_testing/api_testing.py", line 8, in <module>
data = json.loads(res.text)
File "D:\Program Files (x86)\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "D:\Program Files (x86)\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Program Files (x86)\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
REST APIs vary widely in the types of requests they will accept. 406 Not Acceptable nominally means the server can't format a response matching your Accept headers, and in practice some servers also return it when the request headers look incomplete. You should include a User-Agent, because APIs are frequently tweaked to deal with the foibles of various HTTP clients, and you should explicitly list the output format you want. Adding acceptable encodings lets the API compress data. A charset is a good idea. You could even add a language request, but most APIs don't care.
import json
import requests
headers={"Accept":"application/json",
"User-agent": "Mozilla/5.0",
"Accept-Charset":"utf-8",
"Accept-Encoding":"gzip, deflate",
"Accept-Language":"en-US"} # or your favorite language
res = requests.get("http://dummy.restapiexample.com/api/v1/employees", headers=headers)
data = json.loads(res.text)
data = json.dumps(data)
print(data)
print(type(data))
The thing about REST APIs is that they may ignore some or all of the headers and return what they please. It's a good idea to form the request properly anyway.
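One way to see what was actually sent and what came back, assuming the res from the snippet above:
print(res.request.headers)               # the headers requests actually sent
print(res.headers.get("Content-Type"))   # the format the server chose to return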
The default Python User-Agent was probably being blocked by the hosting company.
You can set any string, or search for a real device's User-Agent string.
res = requests.get("http://dummy.restapiexample.com/api/v1/employees", headers={"User-Agent": "XY"})
It's you, your connection or a proxy. Things work just fine for me.
>>> import requests
>>> res = requests.get("http://dummy.restapiexample.com/api/v1/employees")
>>> res.raise_for_status() # raises an HTTPError for 4xx/5xx responses
>>> print(res.json()) # `res.json()` is the canonical way to extract JSON from Requests
{'status': 'success', 'data': [{'id': '1', 'employee_name': 'Tiger Nixon', 'employee_salary': '320800', ...
How do I ping a list of URLs (around 80k) using Python? The URLs are given in the format "https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002". I need to remove the number after the comma (,"99000002") and ping the rest of the URL to find which ones return a 404 error code. I was able to remove the last part using the rsplit string method:
df= '"https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002"'
print(df.rsplit(',',1)[0])
I have the URLs in a CSV file, but how do I ping such a huge list of URLs?
Update
I did try a solution, but after some time I get an error.
My code:
import csv
from urllib2 import urlopen
import urllib2
import requests

with open('C:\Users\kanchan.jha\Desktop\pm\performer_metros.csv', "rU") as csvfile:
    reader = csv.reader(csvfile)
    output = csv.writer(open("C:\Users\kanchan.jha\Desktop\pm\pm_quotes.csv", 'w'))
    for row in reader:
        splitlist = [i.split(',', 1)[0] for i in row]
        # output.writerow(splitlist)
        # converting to string and removing the extra quotes and square brackets
        url = str(splitlist)[1:-1]
        urls = str(url.strip('\''))
        content = urllib2.urlopen(urls).read()
        if content.find('404') > -1:
            output.writerow(splitlist)
The code runs for a while and then I get an error (pasted below). An output file is created, but it contains only 10-15 URLs having a 404 error. It seems only a few URLs are checked for errors, not all of them.
Traceback (most recent call last):
File "c:\Users\kanchan.jha\Desktop\file.py", line 27, in <module>
content = urllib2.urlopen(urls, timeout =1000).read()
File "C:\Python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 435, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 473, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 407, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
You can use the requests library to ping all the URLs one by one and collect data on which ones returned a 404. You can probably keep writing this data to disk instead of keeping it in memory if you want to preserve it.
import requests

# raw_string_urls is your list of 80k urls with the string attached
raw_string_urls = ['"https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002"', '"https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002"', '"https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002"', '"https://www.test.com/en/Doors-Down/Buffalo/pm/99000002/3991","99000002"']

not_found_urls = list()

# Iterate over the raw_string_urls; the body below runs for each url.
for raw_string_url in raw_string_urls:
    url = raw_string_url.split(',')[0].strip('"')
    r = requests.get(url)
    print(url)
    print(r.status_code)
    if r.status_code == 404:
        not_found_urls.append(url)
You can then dump the not_found_urls list to a JSON file, or whatever you want. Hope this helps.
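For example, a minimal dump of that list to disk (the filename is arbitrary):
import json

with open("not_found_urls.json", "w") as f:
    json.dump(not_found_urls, f)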
You can ping a URL using the Python requests library:
import requests

url = "https://stackoverflow.com/questions/49634166/how-do-i-have-a-list-of-url-around-80k-using-python"
response = requests.get(url)
print(response.status_code)
# 200
Once you have your URLs, you can easily iterate through the list and send a GET request, saving or printing the result per URL as per your requirements. I'm not sure it will work seamlessly with such a big list, though, and note this assumes that every URL is reachable without authentication and that every URL is valid, which I'm not sure is the case.
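To guard against unreachable or malformed URLs, each request can be wrapped in a try/except with a timeout; a minimal sketch, assuming urls is your cleaned list (the 5-second timeout is arbitrary):
import requests

for url in urls:
    try:
        response = requests.get(url, timeout=5)
        print(url, response.status_code)
    except requests.exceptions.RequestException as e:
        print(url, "failed:", e)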
Here is a snippet of infrastructure code to ping the URLs using multi-threading.
It's a simple worker-queue model: there is a queue of tasks, and every worker (thread) spawned listens on this queue and takes tasks from it.
By using multiple threads you can process 80k requests in a reasonable time.
import threading, Queue, requests

pool = Queue.Queue()
num_worker_threads = 10

def ping(url):
    # do a ping to the url, return True/False or whatever you want...
    response = requests.get(url)
    if response.status_code != 200:
        return False
    return True

def worker():
    while True:
        url = pool.get()
        try:
            response = ping(url)
            # check if response is ok and do stuff (printing to log or smt)
        except Exception as e:
            pass
        pool.task_done()

for i in range(num_worker_threads):
    t = threading.Thread(target=worker, args=())
    t.setDaemon(True)
    t.start()

urls = [...]  # list of urls to check
for url in urls:
    pool.put(url)

pool.join()
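Note that pool.join() only returns once every URL put on the queue has been matched by a task_done() call in a worker, and because the threads are daemons the process is free to exit after that without joining them individually.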
I am using urllib.request to perform a sequence of HTTP calls in Python 3.6. I need to retrieve the value of a 302 HTTP redirect that is returned in response to a urllib.request.urlopen call, like so...
import urllib.request
... many previous http calls ...
post_data = {'foo': 'bar', 'some': 'otherdata'}
encoded = urllib.parse.urlencode(post_data).encode('utf-8')
req = urllib.request.Request('https://some-url', encoded)
redirected_url = urllib.request.urlopen(req).geturl()
I get an error like...
urllib.error.HTTPError: HTTP Error 302: Found - Redirection to url 'gibberish://login_callback?code=ABCD......' is not allowed
What I need is to actually get the URL that is being returned in the 302, as the .geturl() method should provide, but instead I get an error.
Please, no answers like "hey, use this other library that I'm super into right now": we've spent a long time building this script using urllib2 and we have very little Python knowledge.
Thanks for your help.
If you don't want to use the requests library (which is almost part of the core libs at this point), you need to write a custom HTTPRedirectHandler. Since you're on Python 3.6, that class lives in urllib.request (urllib2 is its Python 2 ancestor):
import urllib.parse
import urllib.request

class CustomHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        ### DO YOUR STUFF HERE: the redirect target is in headers["Location"]
        return urllib.request.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib.request.build_opener(CustomHTTPRedirectHandler)

post_data = {'foo': 'bar', 'some': 'otherdata'}
encoded = urllib.parse.urlencode(post_data).encode('utf-8')
req = urllib.request.Request('https://some-url', encoded)
opener.open(req)
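If all you need is the redirect target, you can also skip the custom handler and read the Location header off the exception instead; a minimal sketch against the req from above, which works in this particular case because the default handler refuses to follow the non-http scheme and attaches the original response headers to the HTTPError it raises:
import urllib.error

try:
    urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    redirected_url = e.headers.get("Location")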
I keep getting this error every time I try running my code through a proxy. I have gone through every single link available on how to get my code running behind a proxy and am simply unable to get this done.
import twython
import requests
TWITTER_APP_KEY = 'key' #supply the appropriate value
TWITTER_APP_KEY_SECRET = 'key-secret'
TWITTER_ACCESS_TOKEN = 'token'
TWITTER_ACCESS_TOKEN_SECRET = 'secret'
t = twython.Twython(app_key=TWITTER_APP_KEY,
                    app_secret=TWITTER_APP_KEY_SECRET,
                    oauth_token=TWITTER_ACCESS_TOKEN,
                    oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET,
                    client_args={'proxies': {'http': 'proxy.company.com:10080'}})
now if I do
t = twython.Twython(app_key=TWITTER_APP_KEY,
                    app_secret=TWITTER_APP_KEY_SECRET,
                    oauth_token=TWITTER_ACCESS_TOKEN,
                    oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET,
                    client_args=client_args)
print t.client_args
I get only a {}
and when I try running
t.update_status(status='See how easy this was?')
I get this problem:
Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
t.update_status(status='See how easy this was?')
File "build\bdist.win32\egg\twython\endpoints.py", line 86, in update_status
return self.post('statuses/update', params=params)
File "build\bdist.win32\egg\twython\api.py", line 223, in post
return self.request(endpoint, 'POST', params=params, version=version)
File "build\bdist.win32\egg\twython\api.py", line 213, in request
content = self._request(url, method=method, params=params, api_call=url)
File "build\bdist.win32\egg\twython\api.py", line 134, in _request
response = func(url, **requests_args)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 377, in post
return self.request('POST', url, data=data, **kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 335, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 438, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\adapters.py", line 327, in send
raise ConnectionError(e)
ConnectionError: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/statuses/update.json (Caused by <class 'socket.gaierror'>: [Errno 11004] getaddrinfo failed)
I have searched everywhere and tried everything that I possibly could. The only resources available were:
https://twython.readthedocs.org/en/latest/usage/advanced_usage.html#manipulate-the-request-headers-proxies-etc
https://groups.google.com/forum/#!topic/twython-talk/GLjjVRHqHng
https://github.com/fumieval/twython/commit/7caa68814631203cb63231918e42e54eee4d2273
https://groups.google.com/forum/#!topic/twython-talk/mXVL7XU4jWw
There were no topics I could find here (on Stack Overflow) either.
Please help. Hope someone replies. If you have already done this, please help me with some example code.
Your code isn't using your proxy. Your example shows you specified a proxy only for plain HTTP, but your stack trace shows an HTTPSConnectionPool: because the proxy isn't used for HTTPS, your machine tries to resolve api.twitter.com itself, and apparently it can't resolve external domains.
Try setting your proxy like this:
client_args = {'proxies': {'https': 'http://proxy.company.com:10080'}}
In combination with @t-8ch's answer (which is that you must use a proxy as he has defined it), you should also realize that, as of this moment, requests (the underlying library of Twython) does not support proxying over HTTPS. This is a problem with requests' underlying library, urllib3. It's a long-running issue, as far as I'm aware.
On top of that, reading a bit of Twython's source explains why t.client_args returns an empty dictionary. In short, if you were to instead print t.client.proxies, you'd see that your proxies are indeed being processed as they very well should be.
Finally, complaining about your workplace while on StackOverflow and linking to GitHub commits that have your GitHub username (and real name) associated with them in the comments is not the best idea. StackOverflow is indexed quite thoroughly by Google and there is little doubt that someone else might find this and associate it with you as easily as I have. On top of that, that commit has absolutely no effect on Twython's current behaviour. You're running down a rabbit hole with no end by chasing the author of that commit.
It looks like a domain name lookup failed. Assuming your configured DNS server can resolve Twitter's domain name (and surely it can), I would presume your DNS lookup for proxy.company.com failed. Try using a proxy by IP address instead of by hostname.
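For example (203.0.113.7 is a placeholder documentation address, not a real proxy):
client_args = {'proxies': {'https': 'http://203.0.113.7:10080'}}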
I'm trying to write some Python code which can create multipart MIME HTTP requests on the client, and then appropriately interpret them on the server. I have, I think, partially succeeded on the client end with this:
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
import httplib
h1 = httplib.HTTPConnection('localhost:8080')
msg = MIMEMultipart()
fp = open('myfile.zip', 'rb')
base = MIMEBase("application", "octet-stream")
base.set_payload(fp.read())
msg.attach(base)
h1.request("POST", "http://localhost:8080/server", msg.as_string())
The only problem with this is that the email library also includes the Content-Type and MIME-Version headers, and I'm not sure how they will interact with the HTTP headers included by httplib:
Content-Type: multipart/mixed; boundary="===============2050792481=="
MIME-Version: 1.0
--===============2050792481==
Content-Type: application/octet-stream
MIME-Version: 1.0
This may be the reason that when this request is received by my web.py application, I just get an error message. The web.py POST handler:
class MultipartServer:
    def POST(self, collection):
        print web.input()
Throws this error:
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 242, in process
return self.handle()
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 233, in handle
return self._delegate(fn, self.fvars, args)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 415, in _delegate
return handle_class(cls)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/application.py", line 390, in handle_class
return tocall(*args)
File "/home/richard/Development/server/webservice.py", line 31, in POST
print web.input()
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/webapi.py", line 279, in input
return storify(out, *requireds, **defaults)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 150, in storify
value = getvalue(value)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 139, in getvalue
return unicodify(x)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 130, in unicodify
if _unicode and isinstance(s, str): return safeunicode(s)
File "/usr/local/lib/python2.6/dist-packages/web.py-0.34-py2.6.egg/web/utils.py", line 326, in safeunicode
return obj.decode(encoding)
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 137-138: invalid data
My line of code is represented by the error line about half way down:
File "/home/richard/Development/server/webservice.py", line 31, in POST
print web.input()
It's coming along, but I'm not sure where to go from here. Is this a problem with my client code, or a limitation of web.py (perhaps it just can't support multipart requests)? Any hints or suggestions of alternative code libraries would be gratefully received.
EDIT
The error above was caused by the data not being automatically base64-encoded. Adding
from email import encoders
encoders.encode_base64(base)
gets rid of this error, and now the problem is clear: the HTTP request isn't being interpreted correctly on the server, presumably because the email library is including what should be the HTTP headers in the body instead:
<Storage {'Content-Type: multipart/mixed': u'',
' boundary': u'"===============1342637378=="\n'
'MIME-Version: 1.0\n\n--===============1342637378==\n'
'Content-Type: application/octet-stream\n'
'MIME-Version: 1.0\n'
'Content-Transfer-Encoding: base64\n'
'\n0fINCs PBk1jAAAAAAAAA.... etc
So something is not right there.
Thanks
Richard
I used this package by Will Holcomb http://pypi.python.org/pypi/MultipartPostHandler/0.1.0 to make multipart requests with urllib2; it may help you out.
After a bit of exploration, the answer to this question has become clear. The short answer is that although the Content-Disposition header is optional in a MIME-encoded message, web.py requires it on each MIME part in order to correctly parse out the HTTP request.
Contrary to other comments on this question, the difference between HTTP and email is irrelevant, as they are simply transport mechanisms for the MIME message and nothing more. Multipart/related (not multipart/form-data) messages are common in content-exchanging web services, which is the use case here. The code snippets provided are accurate, though, and led me to a slightly briefer solution to the problem.
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
import httplib

# open an HTTP connection
h1 = httplib.HTTPConnection('localhost:8080')

# create a mime multipart message of type multipart/related
msg = MIMEMultipart("related")

# create a mime-part containing a zip file, with a Content-Disposition header
# on the section
fp = open('file.zip', 'rb')
base = MIMEBase("application", "zip")
base['Content-Disposition'] = 'file; name="package"; filename="file.zip"'
base.set_payload(fp.read())
encoders.encode_base64(base)
msg.attach(base)

# Here's a rubbish bit: chomp through the header rows until hitting a newline
# on its own, reading each line on the way as an HTTP header, then read the
# rest of the message into a new variable
header_mode = True
headers = {}
body = []
for line in msg.as_string().splitlines(True):
    if line == "\n" and header_mode == True:
        header_mode = False
    if header_mode:
        (key, value) = line.split(":", 1)
        headers[key.strip()] = value.strip()
    else:
        body.append(line)
body = "".join(body)

# do the request, with the separated headers and body
h1.request("POST", "http://localhost:8080/server", body, headers)
This is picked up perfectly well by web.py, so it's clear that email.mime.multipart is suitable for creating MIME messages to be transported over HTTP, with the exception of its header handling.
My other overall concern is scalability. Neither this solution nor the others proposed here scales well, as they read the contents of the file into a variable before bundling it up in the MIME message. A better solution would be one which could serialise the content on demand as it is piped out over the HTTP connection. It's not urgent for me to fix this, but I'll come back here with a solution if I get to it.
There are a number of things wrong with your request. As TokenMacGuy suggests, multipart/mixed is unused in HTTP; use multipart/form-data instead. In addition, parts should have a Content-Disposition header. A Python fragment to do that can be found in the Code Recipes.
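A minimal sketch of such a part, reusing the email.mime classes from the question (the field and file names here are illustrative):
from email.mime.base import MIMEBase
from email import encoders

part = MIMEBase("application", "octet-stream")
part["Content-Disposition"] = 'form-data; name="upload"; filename="myfile.zip"'
part.set_payload(open("myfile.zip", "rb").read())
encoders.encode_base64(part)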