Python Skipping my for loop - python

First off note that I am a beginner in Python. Taking a class currently dealing with Python in an ArcGIS environment. My current project is a simple program to create files and copy other files to them. However part of the assignment is to have print statements state what is going on as it happens, for instance the final print statement should look like:
Processing Polygon FeatureClasses....
Processing FeatureClass >> Building
Field Information:
etc.
Here is the small bit of my code that should do that:
pointlist = arcpy.ListFeatureClasses("*", "Point")
print "Processing Point FeatureClasses..."
for pl in pointlist:
arcpy.MakeFeatureLayer_management(pl, "Point" + 1)
pointlayer = arcpy.SelectLayerByLocation_management(pl, "intersect", MapGridID)
pointcount = int(arcpy.GetCount_management(pointlayer).getOutput(0))
if pointcount >= 1:
arcpy.CopyFeatures_management(pointlayer, OutputGDB)
for pl in pointlist:
print "Processing FeatureClass:" + pl
pointfield = arcpy.ListFields()
for pf in pointfield:
print "Field Name:" + pf
The issue arises that it prints the first print statement, "Processing Point FeatureClasses" but doesn't do the rest and then skips to my next part of the code and executes that. Any idea why? Sorry if my formatting or wording is off/sounds weird. Thank you.
EDIT
I emailed my professor as well asking for some guidance and he wrote back a slightly edited version of my above code block. I can now get everything to print out except the pointfield print statements, those are now getting skipped. Here is the code:
pointlist = arcpy.ListFeatureClasses("*", "Point")
print "Processing Point FeatureClasses...\n"
i = 1
for pl in pointlist:
print "Processing FeatureClass: " + pl
featlayernamepoint = "Point" + str(i)
arcpy.MakeFeatureLayer_management(pl, featlayernamepoint)
arcpy.SelectLayerByLocation_management(featlayernamepoint, "intersect", featurelayerMG2)
pointcount = int(arcpy.GetCount_management(featlayernamepoint).getOutput(0))
if pointcount >= 1:
arcpy.CopyFeatures_management(featlayernamepoint, OutputGDB)
pointfield = arcpy.ListFields(featlayernamepoint)
for pf in pointfield:
print "Field Name: " + pf.name
i += 1

You forgot to pass the point to ListFields()
pointfield = arcpy.ListFields(pl)
for pf in pointfield:
print "Field Name:" + pf.name

Related

PYTHON - Itertools.combinations results in memory error

I am creating a brute force script (for my studies) on python which connects to my login.php form and tries a combinaison of every characters of the alphabet to find the password.
The thing is that, after something like 110 000 combinaison, I am getting a memory error.
I already looked up on the net to find solutions which seems to be :
--> gc.collect() & del var
I tried to add the gc.collect() on the blank field of my function bruteforce so everytime i % 100 == 0, I clear the memory space but It dosen't work.
I can't find the way to add it in my script to free memory.
I think the problem is that I can't clear the memory space when my function is running. Maybe I should play with many threads like :
start 1
stop 1
clear space
start 2
stop 2
clear space
etc...
Do you guys have any suggestions ?
Here is my code :
import itertools, mechanize, threading, gc, os, psutil
charset='abcdefghijklmnopqrstuvwxyz'
br=mechanize.Browser()
combi=itertools.combinations(charset, 2)
br.open('http://192.168.10.105/login.php')
def check_memory():
process = psutil.Process(os.getpid())
mem_info = process.memory_info()
return mem_info.rss
def bruteforce():
i = 0
for x in combi:
br.select_form(nr=0)
br.form ['username'] = 'myusername'
tried = br.form ['password'] = ''.join(x)
i = i+1
print "Checking : ", tried, " Number of try : ", i
response=br.submit()
if response.geturl()=="http://192.168.10.105/index.php":
print("The password is : ", ''.join(x))
break
if i % 100 == 0:
mem_before = check_memory():
print "Memory before : ", mem_before
print "Free memory "
mem_after = check_memory()
print "Memory after : ", mem_after
x = threading.Thread(target = bruteforce)
x.start()
Thank you !
Thanks to #chepner, I found that the issue was with mechanize, not with itertools.
A good way to fix it is by clearing the history : "Out of Memory" error with mechanize

Python on Visual Studio Error

Can somebody help me identify the error here? I am using Python on Visual Studio, and I have copied the code from another source. When I paste the code into a playground like Ideone.com, the program runs without issue. Only when I paste into visual studio do I get an error. The error is in the last two lines of code that begin with "print". This is probably a huge noob question, but I am a beginner. Help!
import hashlib as hasher
import datetime as date
# Define what a Snakecoin block is
class Block:
def __init__(self, index, timestamp, data, previous_hash):
self.index = index
self.timestamp = timestamp
self.data = data
self.previous_hash = previous_hash
self.hash = self.hash_block()
def hash_block(self):
sha = hasher.sha256()
sha.update(str(self.index) + str(self.timestamp) + str(self.data) + str(self.previous_hash))
return sha.hexdigest()
# Generate genesis block
def create_genesis_block():
# Manually construct a block with
# index zero and arbitrary previous hash
return Block(0, date.datetime.now(), "Genesis Block", "0")
# Generate all later blocks in the blockchain
def next_block(last_block):
this_index = last_block.index + 1
this_timestamp = date.datetime.now()
this_data = "Hey! I'm block " + str(this_index)
this_hash = last_block.hash
return Block(this_index, this_timestamp, this_data, this_hash)
# Create the blockchain and add the genesis block
blockchain = [create_genesis_block()]
previous_block = blockchain[0]
# How many blocks should we add to the chain
# after the genesis block
num_of_blocks_to_add = 20
# Add blocks to the chain
for i in range(0, num_of_blocks_to_add):
block_to_add = next_block(previous_block)
blockchain.append(block_to_add)
previous_block = block_to_add
# Tell everyone about it!
print "Block #{} has been added to the blockchain!".format(block_to_add.index)
print "Hash: {}\n".format(block_to_add.hash)
In visual studio and most other editors, they are updated to the most recent python, also meaning that you have to place parentheses after the print statement.
For example:
print("hello")
Visual Studio runs Python 3.x.x, so, you have to write the print sentences on this way:
print('String to print')
Because in Python 3 print is a function, not a reserved word like in Python 2.

Unpack ValueError in Python

I was making a site component scanner with Python. Unfortunately, something goes wrong when I added another value to my script. This is my script:
#!/usr/bin/python
import sys
import urllib2
import re
import time
import httplib
import random
# Color Console
W = '\033[0m' # white (default)
R = '\033[31m' # red
G = '\033[1;32m' # green bold
O = '\033[33m' # orange
B = '\033[34m' # blue
P = '\033[35m' # purple
C = '\033[36m' # cyan
GR = '\033[37m' # gray
#Bad HTTP Responses
BAD_RESP = [400,401,404]
def main(path):
print "[+] Testing:",host.split("/",1)[1]+path
try:
h = httplib.HTTP(host.split("/",1)[0])
h.putrequest("HEAD", "/"+host.split("/",1)[1]+path)
h.putheader("Host", host.split("/",1)[0])
h.endheaders()
resp, reason, headers = h.getreply()
return resp, reason, headers.get("Server")
except(), msg:
print "Error Occurred:",msg
pass
def timer():
now = time.localtime(time.time())
return time.asctime(now)
def slowprint(s):
for c in s + '\n':
sys.stdout.write(c)
sys.stdout.flush() # defeat buffering
time.sleep(8./90)
print G+"\n\t Whats My Site Component Scanner"
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
if len(sys.argv) != 2:
print "\nUsage: python jx.py <site>"
print "Example: python jx.py www.site.com/\n"
sys.exit(1)
host = sys.argv[1].replace("http://","").rsplit("/",1)[0]
if host[-1] != "/":
host = host+"/"
print "\n[+] Site:",host
print "[+] Loaded:",len(coms)
print "\n[+] Scanning Components\n"
for com,nme,expl in coms.items():
resp,reason,server = main(com)
if resp not in BAD_RESP:
print ""
print G+"\t[+] Result:",resp, reason
print G+"\t[+] Com:",nme
print G+"\t[+] Link:",expl
print W
else:
print ""
print R+"\t[-] Result:",resp, reason
print W
print "\n[-] Done\n"
And this is the error message that comes up:
Traceback (most recent call last):
File "jscan.py", line 69, in <module>
for com,nme,expl in xpls.items():
ValueError: need more than 2 values to unpack
I already tried changing the 2 value into 3 or 1, but it doesn't seem to work.
xpls.items returns a tuple of two items, you're trying to unpack it into three. You initialize the dict yourself with two pairs of key:value:
coms = { "index.php?option=com_artforms" : "com_artforms" + "link1","index.php?option=com_fabrik" : "com_fabrik" + "ink"}
besides, the traceback seems to be from another script - the dict is called xpls there, and coms in the code you posted...
you can try
for (xpl, poc) in xpls.items():
...
...
because dict.items will return you tuple with 2 values.
You have all the information you need. As with any bug, the best place to start is the traceback. Let's:
for com,poc,expl in xpls.items():
ValueError: need more than 2 values to unpack
Python throws ValueError when a given object is of correct type but has an incorrect value. In this case, this tells us that xpls.items is an iterable an thus can be unpacked, but the attempt failed.
The description of the exception narrows down the problem: xpls has 2 items, but more were required. By looking at the quoted line, we can see that "more" is 3.
In short: xpls was supposed to have 3 items, but has 2.
Note that I never read the rest of the code. Debugging this was possible using only those 2 lines.
Learning to read tracebacks is vital. When you encounter an error such as this one again, devote at least 10 minutes to try to work with this information. You'll be repayed tenfold for your effort.
As already mentioned, dict.items() returns a tuple with two values. If you use a list of strings as dictionary values instead of a string, which should be split anyways afterwards, you can go with this syntax:
coms = { "index.php?option=com_artforms" : ["com_artforms", "link1"],
"index.php?option=com_fabrik" : ["com_fabrik", "ink"]}
for com, (name, expl) in coms.items():
print com, name, expl
>>> index.php?option=com_artforms com_artforms link1
>>> index.php?option=com_fabrik com_fabrik ink

Debugging ScraperWiki scraper (producing spurious integer)

Here is a scraper I created using Python on ScraperWiki:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
data = {
'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
}
# DEBUG BEGIN
if not type(data["arwu_rank"]) is str:
print type(data["arwu_rank"])
print data["arwu_rank"]
print data["university"]
# DEBUG END
if "-" in data["arwu_rank"]:
arwu_rank_bounds = data["arwu_rank"].split("-")
data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
if not type(data["arwu_rank"]) is int:
data["arwu_rank"] = int(data["arwu_rank"])
scraperwiki.sqlite.save(unique_keys=['university'], data=data)
It works perfectly except when scraping the final data row of the table (the "York University" line), at which point instead of lines 9 through 11 of the code causing the string "401-500" to be retrieved from the table and assigned to data["arwu_rank"], those lines somehow seem instead to be causing the int 450 to be assigned to data["arwu_rank"]. You can see that I've added a few lines of "debugging" code to get a better understanding of what's going on, but also that that debugging code doesn't go very deep.
I have two questions:
What are my options for debugging scrapers run on the ScraperWiki infrastructure, e.g. for troubleshooting issues like this? E.g. is there a way to step through?
Can you tell me why the the int 450, instead of the string "401-500", is being assigned to data["arwu_rank"] for the "York University" line?
EDIT 6 May 2013, 20:07h UTC
The following scraper completes without issue, but I'm still unsure why the first one failed on the "York University" line:
import lxml.html
import re
import scraperwiki
pattern = re.compile(r'\s')
html = scraperwiki.scrape("http://www.shanghairanking.com/ARWU2012.html")
root = lxml.html.fromstring(html)
for tr in root.cssselect("#UniversityRanking tr:not(:first-child)"):
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
data = {
'arwu_rank' : str(re.sub(pattern, r'', tr.cssselect("td.ranking")[0].text_content())),
'university' : tr.cssselect("td.rankingname")[0].text_content().strip()
}
# DEBUG BEGIN
if not type(data["arwu_rank"]) is str:
print type(data["arwu_rank"])
print data["arwu_rank"]
print data["university"]
# DEBUG END
if "-" in data["arwu_rank"]:
arwu_rank_bounds = data["arwu_rank"].split("-")
data["arwu_rank"] = int( ( float(arwu_rank_bounds[0]) + float(arwu_rank_bounds[1]) ) * 0.5 )
if not type(data["arwu_rank"]) is int:
data["arwu_rank"] = int(data["arwu_rank"])
scraperwiki.sqlite.save(unique_keys=['university'], data=data)
There's no easy way to debug your scripts on ScraperWiki, unfortunately it just sends your code in its entirety and gets the results back, there's no way to execute the code interactively.
I added a couple more prints to a copy of your code, and it looks like the if check before the bit that assigns data
if len(tr.cssselect("td.ranking")) > 0 and len(tr.cssselect("td.rankingname")) > 0:
doesn't trigger for "York University" so it will be keeping the int value (you set it later on) from the previous time around the loop.

benchmarking python script reveals mysterious time delays

i have two modules: moduleParent and moduleChild
i'm doing something like this in moduleParent:
import moduleChild
#a bunch of code
start = time.time()
moduleChild.childFunction()
finish = time.time()
print "calling child function takes:", finish-start, "total seconds"
#a bunch of code
i'm doing something like this in moduleChild:
def childFunction():
start = time.time()
#a bunch of code
finish = time.time()
print "child function says it takes:", finish-start, "total seconds"
the output looks like this:
calling child function takes: .24 total seconds
child function says it takes: 0.0 total seconds
so my question is, where are these .24 extra seconds coming from?
thank you for your expertise.
#
here is the actual code for "childFuntion". it really shouldn't take .24 seconds.
def getResources(show, resourceName='', resourceType=''):
'''
get a list of resources with the given name
#show: show name
#resourceName: name of resource
#resourceType: type of resource
#return: list of resource dictionaries
'''
t1 = time.time()
cmd = r'C:\tester.exe -cmdFile "C:\%s\info.txt" -user root -pwd root'%show
cmd += " -cmd findResources -machineFormatted "
if resourceName:
cmd += '-name %s'%resourceName
if resourceType:
cmd += '_' + resourceType.replace(".", "_") + "_"
proc=subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output = proc.stdout.read()
output = output.strip()
resourceData = output.split("\r\n")
resourceData = resourceData[1:]
resourceList = []
for data in resourceData:
resourceId, resourceName, resourceType = data.split("|")
rTyp = "_" + resourceType.replace(".", "_") + "_"
shot, assetName = resourceName.split(rTyp)
resourceName = assetName
path = '//projects/%s/scenes/%s/%s/%s'%(show, shot, resourceType.replace(".", "/"), assetName)
resourceDict = {'id':resourceId, 'name':resourceName, 'type':resourceType, 'path':path }
resourceList.append(resourceDict)
t2 = time.time()
print (" ", t2 - t2, "seconds")
return resourceList
Edit 2: I just noticed a typo in child function, you have t2 - t2 in the print statement
ignore below:
Calling the function itself has overhead (setting up stack space, saving local variables, returning, etc). The result suggests that your function is so trivial that setting up for a function call took longer than running the code itself.
Edit: also, calling the timers as well as print ads overhead. Now that I think about it, calling print could account for a lot of that .24 seconds. IO is slow.
You can't measure the time of a function by running it once, especially one which runs so short. There are a myriad of factors which could affect the timing, not the least of which is what other processes you have running on the system.
Something like this I would probably run at least a few hundred times. Check out the timeit module.

Categories