Maybe my question title doesn't quite match the question content.
import requests

statuscode = []
statuscode.append(200)
for x in find_from_sublister(hostname):
    x2 = x.strip()
    url = "http://" + x2
    try:
        req = requests.get(url)
        req1 = str(req.status_code) + " " + str(url) + '\n'
        req2 = str(req.status_code)
        req3 = str(url)
        dict = {req2 : req3}
        print " \n " + str(req1)
    except requests.exceptions.RequestException as e:
        print "Can't make the request to this Subdomain " + str(url) + '\n'

for keys, values in dict.iteritems():
    print "finding the url's whose status code is 200"
    if keys == statuscode[0]:
        print values
    else:
        print "po"
I am using this code to do the following.
It finds the subdomains with Sub-lister (locally).
Then, with a for loop, it looks up the status code of each subdomain that the sublister found: for x in find_from_sublister(hostname):
Note: find_from_sublister(hostname) is a function that finds the subdomains with the sublister.
Then it prints the status code with the URL, here: print " \n " + str(req1)
[All goes well, but the problem starts here]
Now what I want is to separate out the URLs which have a 200 status code.
I heard this can be done with a dictionary in Python, so I tried to use one, as you can see: dict = {req2 : req3}
I also made a list whose first entry is the value 200.
Then I compare the keys to that entry, here: keys == statuscode[0]
and if they match, it should print all the URLs which have a 200 status code.
But the result I am getting is below:
finding the url's whose status code is 200
po
You can see the value po, which comes from the else branch:
else:
    print "po"
Now the problem is: why am I getting this else value instead of the URLs which have status code 200?
I hope I explained it clearly, and I'm waiting for someone to talk this through with me.
Thanks
Note: I am using Python 2.7.
The reason you always hit the else branch is that your dictionary keys are strings such as '200' (you built them with str(req.status_code)), while statuscode[0] is the integer 200, and in Python '200' == 200 is False. On top of that, dict = {req2 : req3} creates a brand-new one-entry dictionary on every iteration, so at most one URL survives to the final loop.
In your case, you don't even need the dictionary.
I've tried to clean up the code as much as possible, including more descriptive variable names (also, it's a bad idea to shadow builtin names like dict).
urls_returning_200 = []
for url in find_from_sublister(hostname):
    url = "http://" + url.strip()
    try:
        response = requests.get(url)
        if response.status_code == 200:
            urls_returning_200.append(url)
        print " \n {} {}\n".format(response.status_code, url)
    except requests.exceptions.RequestException as e:
        print "Can't make the request to this Subdomain {}\n".format(url)

print "finding the url's whose status code is 200"
print urls_returning_200
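If you do want a mapping from status code to URLs, a minimal sketch (Python 2.7, assuming the same find_from_sublister(hostname) helper from the question) could group them with a defaultdict; note that the keys stay ints, which is why comparing against 200 works:

from collections import defaultdict
import requests

urls_by_status = defaultdict(list)
for sub in find_from_sublister(hostname):
    url = "http://" + sub.strip()
    try:
        response = requests.get(url)
        urls_by_status[response.status_code].append(url)  # int key, e.g. 200
    except requests.exceptions.RequestException:
        print "Can't make the request to this Subdomain " + url

for found_url in urls_by_status[200]:  # 200 as an int, not the string "200"
    print found_url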
Related
I'm asking the user for an email address and then sending it to an email verification API, from which I get certain bits of info. I'm getting a KeyError: 'username' and I have no idea why. It's also annoying to test, since they rate-limit after ~5 attempts.
import json, requests, sys

emailInput = ""

def printHelp():
    print("Proper usage is: python test.py [email]")

if len(sys.argv) < 2:
    printHelp()
    sys.exit()
elif len(sys.argv) == 2 and sys.argv[1] == "--help":
    printHelp()
    sys.exit()
elif len(sys.argv) == 2 and sys.argv[1] != "--help":
    emailInput = str(sys.argv[1])

url = 'https://api.trumail.io/v2/lookups/json?email=' + str(emailInput)
res = requests.get(url)
res.raise_for_status()
resultText = res.text
emailInfo = json.loads(resultText)

print("\nEmail Analyzer 1.0\n\nInformation for email: " + sys.argv[1])
print("=====================================================")
print("Username: " + str(emailInfo["username"]))
print("Domain: " + str(emailInfo["domain"]))
print("Valid Format: " + str(emailInfo["validFormat"]))
print("Deliverable: " + str(emailInfo["deliverable"]))
print("Full Inbox: " + str(emailInfo["fullInbox"]))
print("Host Exists: " + str(emailInfo["hostExists"]))
print("Catch All: " + str(emailInfo["catchAll"]))
print("Disposable: " + str(emailInfo["disposable"]))
print("Free: " + str(emailInfo["free"]))
The reason is that a user can enter an email that looks valid - i.e. it is a correctly formatted address with an @ symbol etc. - but the mailbox likely does not exist or is not in use.
For example, I ran your script with the following dummy input:
emailInput = 'acdefg@gmail.com'
After I added a print(emailInfo) statement for debugging purposes, this is what I found to be the output from the server:
{'Message': 'No response received from mail server'}
Therefore, your goal here is to validate the server output. In the case of a correctly formatted email that does not exist, you still receive an HTTP 200 (OK) response, but the JSON response object has only a Message field populated. The task is to detect the presence of that key and run separate logic for it, instead of the happy path that is already handled above.
Your error is coming from the fact that emailInfo does not have a key username. Perhaps use emailInfo.get("username", default_value), where default_value is any value you would like if there is no username.
The line with the error is print("Username: " + str(emailInfo["username"]))
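For instance, a minimal sketch of that check, assuming emailInfo has already been parsed as in the question:

if "username" in emailInfo:
    print("Username: " + str(emailInfo["username"]))
    print("Domain: " + str(emailInfo["domain"]))
else:
    # The API answered 200 OK but could not verify the address.
    print("Lookup failed: " + str(emailInfo.get("Message", "unknown error")))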
import os
import urllib2
import json

while True:
    ip = raw_input("what's the ip :")
    url = "http://ip-api.com/json/"
    response = urllib2.urlopen(url + ip)
    data = response.read()
    values = json.loads(data)
    print ("IP:" + values['query'])
    print ("city:" + values['city'])
    print ("ISP:" + values['isp'])
    print ("COUNTRY:" + values['country'])
    print ("region:" + values['region'])
    print ("time zone:" + values['timezone'])
What should I add instead of these two lines?
print ("latitude:" + values['lat'])
print ("longitude:" + values['lon'])
break
$ curl http://ip-api.com/json/1.1.1.1
{"status":"success","country":"Australia","countryCode":"AU","region":"QLD","regionName":"Queensland","city":"South Brisbane","zip":"4101","lat":-27.4766,"lon":153.0166,"timezone":"Australia/Brisbane","isp":"Cloudflare, Inc","org":"APNIC and Cloudflare DNS Resolver project","as":"AS13335 Cloudflare, Inc.","query":"1.1.1.1"}
From this, we can see that lat and lon are numbers in the JSON, not strings:
"lat":-27.4766,"lon":153.0166
So Python is complaining, because you can't add a number to a string.
To make it work, either convert the number to a string first:
print ("latitude:" + str(values['lat']))
print ("longitude:" + str(values['lon']))
Or, if you are on Python 3.6 or later (your script's urllib2 and raw_input are Python 2, where this won't work), a more readable option is an f-string:
print (f"latitude:{values['lat']}")
print (f"longitude:{values['lon']}")
I wrote a hiscore checker for a game that I play: you enter a list of usernames into a .txt file and it writes the results to found.txt.
However, if the page responds with a 404 it throws an error instead of returning "0" and continuing with the list.
Example of the script:
#!/usr/bin/python
import urllib2

def get_total(username):
    try:
        req = urllib2.Request('http://services.runescape.com/m=hiscore/index_lite.ws?player=' + username)
        res = urllib2.urlopen(req).read()
        parts = res.split(',')
        return parts[1]
    except urllib2.HTTPError, e:
        if e.code == 404:
            return "0"
    except:
        return "err"

filename = "check.txt"
accs = []
handler = open(filename)
for entry in handler.read().split('\n'):
    if "No Displayname" not in entry:
        accs.append(entry)
handler.close()

for account in accs:
    display_name = account.split(':')[len(account.split(':')) - 1]
    total = get_total(display_name)
    if "err" not in total:
        rStr = account + ' - ' + total
        handler = open('tried.txt', 'a')
        handler.write(rStr + '\n')
        handler.close()
        if total != "0" and total != "49":
            handler = open('found.txt', 'a')
            handler.write(rStr + '\n')
            handler.close()
        print rStr
    else:
        print "Error searching"
        accs.append(account)

print "Done"
The HTTPError exception handling that doesn't seem to be working:
    except urllib2.HTTPError, e:
        if e.code == 404:
            return "0"
    except:
        return "err"
Error response shown below.
Now, I understand the error shown doesn't seem to be related to a 404 response, but it only occurs for users whose request returns a 404; every other request works fine. So I assume the issue is within the handling of the 404 exception.
I believe the issue may lie in the fact that the 404 is a custom page which you get redirected to?
So the original page is "example.com/index.php" but the 404 page is "example.com/error.php"?
Not sure how to fix this.
For testing purposes, the format to use is
ID:USER:DISPLAY
which is placed into check.txt
It seems that total can end up being None. In that case you can't check that it has 'err' in it. To fix the crash, try changing that line to:
if total is not None and "err" not in total:
To be more specific, get_total is returning None, which means that either parts[1] is None, or except urllib2.HTTPError, e: is executed but e.code is not 404.
In the latter case None is returned, because the exception is caught but you're only dealing with the very specific 404 case and ignoring every other error code.
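One way to avoid that is to make every path in get_total return a string; a sketch based on the code in the question (Python 2):

import urllib2

def get_total(username):
    try:
        req = urllib2.Request('http://services.runescape.com/m=hiscore/index_lite.ws?player=' + username)
        res = urllib2.urlopen(req).read()
        return res.split(',')[1]
    except urllib2.HTTPError as e:
        if e.code == 404:
            return "0"
        return "err"   # any other HTTP error code
    except Exception:
        return "err"   # network problems, unexpected response format, etc.

With that in place, the "err" not in total check can no longer crash on None.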
I am using the Tweepy package in Python to collect tweets. I track several users and collect their latest tweets. For some users I get an error like "Failed to parse JSON payload: ", e.g. "Failed to parse JSON payload: Expecting ',' delimiter or '}': line 1 column 694303 (char 694302)". I took note of the user id and tried to reproduce the error and debug the code. The second time I ran the code for that particular user, I got results (i.e. tweets) with no problem. I adjusted my code so that when I get this error I try once more to extract the tweets. So I might get this error once or twice for a user, but on a second or third attempt the code returns the tweets as usual, without the error. I see similar behaviour for other user ids too.
My question is: why does this error appear randomly? Nothing else has changed. I searched the internet but couldn't find a similar report. A snippet of my code follows.
#initialize a list to hold all the tweepy Tweets
alltweets = []
ntries = 0

#make initial request for most recent tweets (200 is the maximum allowed count)
while True:
    try:  #if process fails due to connection problems, retry.
        if beforeid:
            new_tweets = api.user_timeline(user_id=user, count=200, since_id=sinceid, max_id=beforeid)
        else:
            new_tweets = api.user_timeline(user_id=user, count=200, since_id=sinceid)
        break
    except tweepy.error.RateLimitError:
        print "Rate limit error:", sys.exc_info()[0]
        print("Timeout, retry in 5 minutes...\n")
        time.sleep(60 * 5)
        continue
    except tweepy.error.TweepError as er:
        print('TweepError: ' + er.message)
        if er.message == 'Not authorized.':
            new_tweets = []
            break
        else:
            print(str(ntries))
            ntries += 1
            pass
    except:
        print "Unexpected error:", sys.exc_info()[0]
        new_tweets = []
        break
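If the parse error really is transient, one adjustment (not shown in the snippet above) is to cap the number of retries so a persistent failure cannot loop forever. A sketch of that idea, assuming the same api, user, sinceid and beforeid as above and a hypothetical max_tries limit:

max_tries = 3  # hypothetical cap on retries for the transient parse error
ntries = 0
while True:
    try:
        new_tweets = api.user_timeline(user_id=user, count=200,
                                       since_id=sinceid, max_id=beforeid)
        break
    except tweepy.error.TweepError as er:
        ntries += 1
        if ntries >= max_tries:
            print('Giving up after {} tries: {}'.format(ntries, er))
            new_tweets = []
            break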
I would like to check an infinite number of self-generated URLs for validity and, if a URL is valid, save the body of the response to a file. The URLs look like this: https://mydomain.com/ + random string (e.g. https://mydomain.com/ake3t), and I want to generate them using the alphabet "abcdefghijklmnopqrstuvwxyz0123456789_-" and simply brute-force all possibilities.
I wrote a script in Python, but as I am an absolute beginner it was very slow! Since I need something very, very fast, I tried to use Scrapy, as I thought it was meant for exactly this kind of job.
The problem now is that I cannot find out how to generate the URLs dynamically on the fly; I cannot generate them beforehand because there is not a fixed number of them.
Could somebody please show me how to achieve this, or recommend another tool or library even better suited for this job?
UPDATE:
This is the script I used, but I think it is slow. What worries me the most is that it gets slower if I use more than one thread (specified in threadsNr).
import threading, os
import urllib.request, urllib.parse, urllib.error

threadsNr = 1
dumpFolder = '/tmp/urls/'
charSet = 'abcdefghijklmnopqrstuvwxyz0123456789_-'
Url_pre = 'http://vorratsraum.com/'
Url_post = 'alwaysTheSameTail'

# class that generates the words
class wordGenerator():
    def __init__(self, word, charSet):
        self.currentWord = word
        self.charSet = charSet

    # generate the next word, set it as currentWord and return it
    def nextWord(self):
        self.currentWord = self._incWord(self.currentWord)
        return self.currentWord

    # generate the next word
    def _incWord(self, word):
        word = str(word)                                  # convert to string
        if word == '':                                    # if word is empty
            return self.charSet[0]                        # return first char from the char set
        wordLastChar = word[len(word) - 1]                # get the last char
        wordLeftSide = word[0:len(word) - 1]              # get word without the last char
        lastCharPos = self.charSet.find(wordLastChar)     # get position of last char in the char set
        if (lastCharPos + 1) < len(self.charSet):         # if last char is not at the end of the char set
            wordLastChar = self.charSet[lastCharPos + 1]  # get next char from the char set
        else:                                             # it is the last char
            wordLastChar = self.charSet[0]                # reset last char to the first character of the char set
            wordLeftSide = self._incWord(wordLeftSide)    # send left side to be increased
        return wordLeftSide + wordLastChar                # return the next word

class newThread(threading.Thread):
    def run(self):
        global exitThread
        global wordsTried
        global newWord
        global hashList

        while exitThread == False:
            part = newWord.nextWord()  # generate the next word to try
            url = Url_pre + part + Url_post
            wordsTried = wordsTried + 1
            if wordsTried == 1000:     # just for testing how fast it is
                exitThread = True
            print('trying ' + part)    # display the word
            print('At URL ' + url)
            try:
                req = urllib.request.Request(url)
                req.add_header('User-agent', 'Mozilla/5.0')
                resp = urllib.request.urlopen(req)
                result = resp.read()
                found(part, result)
            except urllib.error.HTTPError as err:
                if err.code == 404:
                    print('Page not found!')
                elif err.code == 403:
                    print('Access denied!')
                else:
                    print('Something happened! Error code', err.code)
            except urllib.error.URLError as err:
                print('Some other error happened:', err.reason)
        resultFile.close()

def found(part, result):
    global exitThread
    global resultFile

    resultFile.write(part + "\n")
    if not os.path.isdir(dumpFolder + part):
        os.makedirs(dumpFolder + part)
    print('Found Part = ' + part)

wordsTried = 0
exitThread = False                    # flag to kill all threads
newWord = wordGenerator('', charSet)  # word generator

if not os.path.isdir(dumpFolder):
    os.makedirs(dumpFolder)
resultFile = open(dumpFolder + 'parts.txt', 'a')  # open file for append

for i in range(threadsNr):
    newThread().start()
You cannot check "an infinite number of URLs" without it being "very slow", beginner or not.
The time your scraper takes is almost certainly dominated by the response time of the server you're accessing, not by the efficiency of your script.
What are you trying to do, exactly?
Do you want brute force or random? Below is a sequential brute force method with repeating characters. The speed of this is going to be largely determined by your server response. Also note that this would likely generate a denial of service condition very quickly.
import itertools
import urllib2

pageChars = 5
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789_-"

# Iterate over the product of the alphabet with <pageChars> elements.
# This assumes repeating characters are allowed.
# Beware: this generates len(alphabet)**pageChars possible strings.
for chars in itertools.product(alphabet, repeat=pageChars):
    pageString = ''.join(chars)
    urlString = 'https://mydomain.com/' + pageString
    try:
        response = urllib2.urlopen(urlString)
    except urllib2.HTTPError:
        print('No page at: %s' % urlString)
        continue
    pageData = response.read()
    # do something with page data
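For example, if the goal is to save the body of each valid response to a file, the loop body could end with something like this sketch (a hypothetical found_pages output directory; assumes import os at the top):

    outDir = 'found_pages'                              # hypothetical output directory
    if not os.path.isdir(outDir):
        os.makedirs(outDir)
    with open(os.path.join(outDir, pageString), 'wb') as outFile:
        outFile.write(pageData)                         # save the raw response body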