I was making a site component scanner with Python. Unfortunately, something goes wrong when I added another value to my script. This is my script:
#!/usr/bin/python
import sys
import urllib2
import re
import time
import httplib
import random
# Color Console
W = '\033[0m' # white (default)
R = '\033[31m' # red
G = '\033[1;32m' # green bold
O = '\033[33m' # orange
B = '\033[34m' # blue
P = '\033[35m' # purple
C = '\033[36m' # cyan
GR = '\033[37m' # gray
#Bad HTTP Responses
BAD_RESP = [400,401,404]
def main(path):
print "[+] Testing:",host.split("/",1)[1]+path
try:
h = httplib.HTTP(host.split("/",1)[0])
h.putrequest("HEAD", "/"+host.split("/",1)[1]+path)
h.putheader("Host", host.split("/",1)[0])
h.endheaders()
resp, reason, headers = h.getreply()
return resp, reason, headers.get("Server")
except(), msg:
print "Error Occurred:",msg
pass
def timer():
now = time.localtime(time.time())
return time.asctime(now)
def slowprint(s):
for c in s + '\n':
sys.stdout.write(c)
sys.stdout.flush() # defeat buffering
time.sleep(8./90)
print G+"\n\t Whats My Site Component Scanner"
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
if len(sys.argv) != 2:
print "\nUsage: python jx.py <site>"
print "Example: python jx.py www.site.com/\n"
sys.exit(1)
host = sys.argv[1].replace("http://","").rsplit("/",1)[0]
if host[-1] != "/":
host = host+"/"
print "\n[+] Site:",host
print "[+] Loaded:",len(coms)
print "\n[+] Scanning Components\n"
for com,nme,expl in coms.items():
resp,reason,server = main(com)
if resp not in BAD_RESP:
print ""
print G+"\t[+] Result:",resp, reason
print G+"\t[+] Com:",nme
print G+"\t[+] Link:",expl
print W
else:
print ""
print R+"\t[-] Result:",resp, reason
print W
print "\n[-] Done\n"
And this is the error message that comes up:
Traceback (most recent call last):
File "jscan.py", line 69, in <module>
for com,nme,expl in xpls.items():
ValueError: need more than 2 values to unpack
I already tried changing the 2 value into 3 or 1, but it doesn't seem to work.
xpls.items returns a tuple of two items, you're trying to unpack it into three. You initialize the dict yourself with two pairs of key:value:
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
besides, the traceback seems to be from another script - the dict is called xpls there, and coms in the code you posted...
you can try
for (xpl, poc) in xpls.items():
...
...
because dict.items will return you tuple with 2 values.
You have all the information you need. As with any bug, the best place to start is the traceback. Let's:
for com,poc,expl in xpls.items():
ValueError: need more than 2 values to unpack
Python throws ValueError when a given object is of correct type but has an incorrect value. In this case, this tells us that xpls.items is an iterable an thus can be unpacked, but the attempt failed.
The description of the exception narrows down the problem: xpls has 2 items, but more were required. By looking at the quoted line, we can see that "more" is 3.
In short: xpls was supposed to have 3 items, but has 2.
Note that I never read the rest of the code. Debugging this was possible using only those 2 lines.
Learning to read tracebacks is vital. When you encounter an error such as this one again, devote at least 10 minutes to try to work with this information. You'll be repayed tenfold for your effort.
As already mentioned, dict.items() returns a tuple with two values. If you use a list of strings as dictionary values instead of a string, which should be split anyways afterwards, you can go with this syntax:
coms = { "index.php?option=com_artforms" : ["com_artforms", "link1"],
"index.php?option=com_fabrik" : ["com_fabrik", "ink"]}
for com, (name, expl) in coms.items():
print com, name, expl
>>> index.php?option=com_artforms com_artforms link1
>>> index.php?option=com_fabrik com_fabrik ink
Related
I am trying to check what files that are present in my full_list_files are also present in required_list.
The thing here is they are not exactly equal to one other , but macthes with filename and last sub directory.
Example :
'C:\Users\Documents\Updated\Build\Output\M\Application_1.bin' matches with "M/Application_1.bin" except the slashes are different.
So I am trying to make both uniform by using the function convert_fslash_2_bslash
But still, I see the output as below ,none of the files are matched.
full_list_files = set(['C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Report.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_2.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Testing.txt', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\masking.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_1.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_1.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\History.zip', 'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Challenge.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_2.bin', 'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\porting.tar.gz', 'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Booting.tar.gz'])
original required_list = set(['N/Application_2.bin', 'M/masking.tar.gz', 'N/Application_1.bin', 'O/Challenge.tar.gz', 'M/Application_1.bin', 'O/Testing.txt', 'M/rooting.tar.gz', 'M/Application_2.bin', 'O/History.zip', 'N/porting.tar.gz', 'O/Report.tar.gz'])
modified required_list = ['N\\Application_2.bin', 'M\\masking.tar.gz', 'N\\Application_1.bin', 'O\\Challenge.tar.gz', 'M\\Application_1.bin', 'O\\Testing.txt', 'M\\rooting.tar.gz', 'M\\Application_2.bin', 'O\\History.zip', 'N\\porting.tar.gz', 'O\\Report.tar.gz']
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Report.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_2.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Testing.txt' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\masking.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\Application_1.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_1.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\History.zip' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\O\\Challenge.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Application_2.bin' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\N\\porting.tar.gz' not present
'C:\\Users\\Documents\\Updated\\Build\\Output\\M\\Booting.tar.gz' not present
How can I get it working correctly.
import os
import sys
import re
full_list_files = {
#These are actually real paths parsed from listdir
#Just for convenience used as strings
'C:\Users\Documents\Updated\Build\Output\M\Application_1.bin',
'C:\Users\Documents\Updated\Build\Output\M\Application_2.bin',
'C:\Users\Documents\Updated\Build\Output\M\masking.tar.gz',
'C:\Users\Documents\Updated\Build\Output\M\Booting.tar.gz',
'C:\Users\Documents\Updated\Build\Output\N\Application_1.bin',
'C:\Users\Documents\Updated\Build\Output\N\Application_2.bin',
'C:\Users\Documents\Updated\Build\Output\N\porting.tar.gz',
'C:\Users\Documents\Updated\Build\Output\O\Challenge.tar.gz',
'C:\Users\Documents\Updated\Build\Output\O\History.zip',
'C:\Users\Documents\Updated\Build\Output\O\Testing.txt',
'C:\Users\Documents\Updated\Build\Output\O\Report.tar.gz'
}
required_list = {
"M/Application_1.bin",
"M/Application_2.bin",
"M/masking.tar.gz",
"M/rooting.tar.gz",
"N/Application_1.bin",
"N/Application_2.bin",
"N/porting.tar.gz",
"O/Challenge.tar.gz",
"O/History.zip",
"O/Testing.txt",
"O/Report.tar.gz"
}
def convert_fslash_2_bslash(required_file_list):
required_config_file_list = []
i = 0
for entry in required_file_list:
entry = entry.strip()
entry = entry.replace('"',"")
entry = entry.replace('/','\\')
required_config_file_list.insert(i, entry)
i = i + 1
return required_config_file_list
if __name__ == "__main__":
print
print "full_list_files = ", full_list_files
print
print "original required_list = ", required_list
print
required_config_file_list = convert_fslash_2_bslash(required_list)
print "modified required_list = ", required_config_file_list
print
for f_entry in full_list_files:
f_entry = repr(f_entry)
#for r_entry in required_config_file_list:
#if ( f_entry.find(r_entry) != -1):
if f_entry in required_config_file_list:
print f_entry ," present"
else:
print f_entry ," not present"
Here is the logic you need at the bottom:
for f_entry in full_list_files:
for r_entry in required_config_file_list:
if f_entry.endswith(r_entry):
print f_entry, " present"
You need to loop over both collections, then check to see if the longer path ends with the shorter path. One of your mistakes was calling repr(), which changes the double backslashes to quadruple ones.
I'll leave it up to you to decide how you'll handle printing paths that are not present at all.
Can somebody help me identify the error here? I am using Python on Visual Studio, and I have copied the code from another source. When I paste the code into a playground like Ideone.com, the program runs without issue. Only when I paste into visual studio do I get an error. The error is in the last two lines of code that begin with "print". This is probably a huge noob question, but I am a beginner. Help!
import hashlib as hasher
import datetime as date
# Define what a Snakecoin block is
class Block:
def __init__(self, index, timestamp, data, previous_hash):
self.index = index
self.timestamp = timestamp
self.data = data
self.previous_hash = previous_hash
self.hash = self.hash_block()
def hash_block(self):
sha = hasher.sha256()
sha.update(str(self.index) + str(self.timestamp) + str(self.data) + str(self.previous_hash))
return sha.hexdigest()
# Generate genesis block
def create_genesis_block():
# Manually construct a block with
# index zero and arbitrary previous hash
return Block(0, date.datetime.now(), "Genesis Block", "0")
# Generate all later blocks in the blockchain
def next_block(last_block):
this_index = last_block.index + 1
this_timestamp = date.datetime.now()
this_data = "Hey! I'm block " + str(this_index)
this_hash = last_block.hash
return Block(this_index, this_timestamp, this_data, this_hash)
# Create the blockchain and add the genesis block
blockchain = [create_genesis_block()]
previous_block = blockchain[0]
# How many blocks should we add to the chain
# after the genesis block
num_of_blocks_to_add = 20
# Add blocks to the chain
for i in range(0, num_of_blocks_to_add):
block_to_add = next_block(previous_block)
blockchain.append(block_to_add)
previous_block = block_to_add
# Tell everyone about it!
print "Block #{} has been added to the blockchain!".format(block_to_add.index)
print "Hash: {}\n".format(block_to_add.hash)
In visual studio and most other editors, they are updated to the most recent python, also meaning that you have to place parentheses after the print statement.
For example:
print("hello")
Visual Studio runs Python 3.x.x, so, you have to write the print sentences on this way:
print('String to print')
Because in Python 3 print is a function, not a reserved word like in Python 2.
First of, i'm sort of new to Python so sorry if this question is obvious. The detect english module appears to be wrong, but it functions perfectly fine when calling it and running it on its own, theres no errors when running it alone and i've rewritten it a couple times to triple check it.
Traceback (most recent call last):
File "H:\Python\Python Cipher Program\transposition hacker.py", line 49, in <module>
main()
File "H:\Python\Python Cipher Program\transposition hacker.py", line 11, in main
hackedMessage = hackTransposition(myMessage)
File "H:\Python\Python Cipher Program\transposition hacker.py", line 34, in hackTransposition
if detectEnglish.isEnglish(decryptedText):
File "H:\Python\Python Cipher Program\detectEnglish.py", line 48, in isEnglish
wordsMatch = getEnglishCount(message) * 100 >= wordPercentage
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
this is the error i am getting when trying to run the Transposition Hacker (copied directly from here
Here is the code for the Detect English Module
# Detect english Module
# to use this code
# import detectEnglish
# detectEnglish.isEnglish(somestring)
# returns true of false
# there must be a dictionary.txt file in the same directory
# all english words
# one per line
UPPERLETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LETTERS_AND_SPACE = UPPERLETTERS + UPPERLETTERS.lower() + ' \t\n'
def loadDictionary()
dictionaryFile = open('Dictionary.txt')
englishWords = {}
for word in dictionaryFile.read().split('\n'):
englishWords[word] = None
dictionaryFile.close()
return englishWords
ENGLISH_WORDS = loadDictionary()
def getEnglishCount(message):
message = message.upper()
message = removeNonLetters(message)
possibleWords = message.split()
if possibleWords == []:
return 0.0
matches = 0
for word in possibleWords:
if word in ENGLISH_WORDS:
matches += 1
return float(matches) / len(possibleWords)
def removeNonLetters(message):
lettersOnly = []
for symbol in message:
if symbol in LETTERS_AND_SPACE:
lettersOnly.append(symbol)
return ''.join(lettersOnly)
def isEnglish(message, wordPercentage=20, letterPercentage=85):
# by default 20% of the words mustr exist in dictionary file
# 85% of charecters in messafe must be spaces or letters
wordsMatch = getEnglishCount(message) * 100 >= wordPercentage
numLetters = len(removeNonLetters(message))
messageLettersPercentage = float(numLetters) / len(message) * 100
lettersMatch = messageLettersPercentage >= letterPercentage
return wordsMatch and lettersMatcht
getEnglishCount looks like it is missing a return statement. If python gets to the end of a function without hitting a return statement it will return None as you're seeing.
try this:
def getEnglishCount(message):
message = message.upper()
message = removeNonLetters(message)
possibleWords = message.split()
# if possibleWords == []: # redundant
# return 0.0
return len(possibleWords)
Edit: #Kevin Yea I think you're right - there was more in that function. Maybe try this:
def getEnglishCount(message):
message = message.upper()
message = removeNonLetters(message)
possibleWords = message.split()
if possibleWords == []:
return 0.0
matches = 0.
for word in possibleWords:
if word in ENGLISH_WORDS:
matches += 1
return matches / len(possibleWords)
I'd guess the indentation somehow got changed when you copy and pasted the code, with the return statement nested under the if.
As the other poster has said, you're missing a return for the getEnglishCount method, so it's returning NoneType, meaning that there is no value to be returned.
You can't do math on NoneTypes, so the NoneType*100 fails, which is what the bottom of your error traceback says.
i write the python code ,in order to extract key from the log.And using the same log,it worked well in one machine.But when i run it in hadoop,it failed.I guess there are some bugs when using regex.Who can give me some comments?Is regex can't support hadoop?
This python code is aim to extract qry and rc ,and count the value of rc ,and then print it as qry query_count rc_count .When run it in hadoop,it report
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1.
I search google,there may some bug in your mapper code.So how can i fix it?
log formats like that,
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
And my python code is that
import sys
import re
for line in sys.stdin:
count_result = 0
line = line.strip()
match=re.search('.*qry=(.*?)qid0.*rc=(.*?)discount',line).groups()
if (len(match)<2):
continue
counts_tmp = match[1].strip()
counts=counts_tmp.split('|')
for count in counts:
if count.isdigit():
count_result += int(count)
key_tmp = match[0].strip()
if key_tmp.strip():
key = key_tmp.split('\t')
key = ' '.join(key)
print '%s\t%s\t%s' %(key,1,count_result)
Most likely is that your regular expression catches more that you expect. I would suggest to split it to some more simple parts like:
(?<= qry=).*(?= quid0)
and
(?<= rc=).*(?= discount)
Taking a lot of assumptions and hazarding an educated guess, you might be able to parse your log like this:
from collections import defaultdict
input = """NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|1|1 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=boats qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|5|2 discount=20 indv_type=0 rep_query=
NOTICE: 01-03 23:57:23: [a.cpp][b][222] show_ver=11 sid=ae1d esid=6WVj uid=D1 a=20 qry=cars qid0=293 loc_src=4 phn=0 mid=0 wvar=c op=0 qry_src=0 op_type=1 src=110|120|111 at=60942 rc=3|somestring|12 discount=20 indv_type=0 rep_query="""
d = defaultdict (lambda: 0)
for line in input.split ("\n"):
tokens = line.split (" ")
count = 0
qry = None
for token in tokens:
pair = token.split ("=")
if len (pair) != 2: continue
key, value = pair
if key == "qry":
qry = value
if key == "rc":
values = value.split ("|")
for value in values:
try: count += int (value)
except: pass
if qry: d [qry] += count
print (d)
Assuming, that (a) key-value pairs are separated by spaces, and (b) there are no spaces inside neither keys nor values.
Here is a scraper I created using Python on ScraperWiki:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
data = {
'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
}
# DEBUG BEGIN
if not type(data["arwu_rank"]) is str:
print type(data["arwu_rank"])
print data["arwu_rank"]
print data["university"]
# DEBUG END
if "-" in data["arwu_rank"]:
arwu_rank_bounds = data["arwu_rank"].split("-")
data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
if not type(data["arwu_rank"]) is int:
data["arwu_rank"] = int(data["arwu_rank"])
scraperwiki.sqlite.save(unique_keys=['university'], data=data)
It works perfectly except when scraping the final data row of the table (the "York University" line), at which point instead of lines 9 through 11 of the code causing the string "401-500" to be retrieved from the table and assigned to data["arwu_rank"], those lines somehow seem instead to be causing the int 450 to be assigned to data["arwu_rank"]. You can see that I've added a few lines of "debugging" code to get a better understanding of what's going on, but also that that debugging code doesn't go very deep.
I have two questions:
What are my options for debugging scrapers run on the ScraperWiki infrastructure, e.g. for troubleshooting issues like this? E.g. is there a way to step through?
Can you tell me why the the int 450, instead of the string "401-500", is being assigned to data["arwu_rank"] for the "York University" line?
EDIT 6 May 2013, 20:07h UTC
The following scraper completes without issue, but I'm still unsure why the first one failed on the "York University" line:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
data = {
'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
}
# DEBUG BEGIN
if not type(data["arwu_rank"]) is str:
print type(data["arwu_rank"])
print data["arwu_rank"]
print data["university"]
# DEBUG END
if "-" in data["arwu_rank"]:
arwu_rank_bounds = data["arwu_rank"].split("-")
data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
if not type(data["arwu_rank"]) is int:
data["arwu_rank"] = int(data["arwu_rank"])
scraperwiki.sqlite.save(unique_keys=['university'], data=data)
There's no easy way to debug your scripts on ScraperWiki, unfortunately it just sends your code in its entirety and gets the results back, there's no way to execute the code interactively.
I added a couple more prints to a copy of your code, and it looks like the if check before the bit that assigns data
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
doesn't trigger for "York University" so it will be keeping the int value (you set it later on) from the previous time around the loop.