Find_between function with indexed output? - python

I'd like to use the find_between function to retrieve index-able values from a specific web server.
I'm using the requests module to gather the source code from a specific website (the requests.get call in the code below):
response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")
and I'd like to call the find_between function to retrieve all the values (every item on the page, each one indexed by the incrementing value of 'n') with the specified find_between parameters:
x = find_between(response.content,'/></a><a href="/host/','">---')
Anyone know how to pull this off?
import sys
import requests
from time import sleep

# Find between page tags on page.
def find_between( s, tag1, tag2 ):
    try:
        start = s.index( tag1 ) + len( tag1 )
        end = s.index( tag2, start )
        return s[start:end]
    except ValueError:
        return ""

def main():
    # Default value for 'n' index value (item on page) is 0
    n = 0
    # Enter the command 'go' to start
    cmd = raw_input("Enter Command: ")
    if cmd == "go":
        print "go!"
        # Go to this page for page item gathering.
        response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")
        # Initial source output...
        print response.content
        # Find between value of 'x' sources between two tags
        x = find_between(response.content,'/></a><a href="/host/','">---')
        while(True):
            # Wait one second before continuing...
            sleep(1)
            n = n + 1
            # Display find_between data in 'x'
            print "\nindex: %s\n\n%s\n" % (n, x)
    # Enter 'exit' to exit script
    if cmd == "exit":
        sys.exit()

# Recursive function call
while(True):
    main()

A few things in your code appear to need addressing:
The value of x is set outside (before) your while loop, so the loop increments the index n but prints the same text over and over because x never changes.
find_between() returns only a single match, and you want all matches.
Your while loop never ends.
Suggestions:
Put the call to find_between() inside the while loop.
Each successive time you call find_between(), pass it only the portion of the text following the previous match.
Exit the while loop when find_between() finds no match.
Something like this:
text_to_search = response.content
while(True):
    # Find between value of 'x' sources between two tags
    x = find_between(text_to_search, '/></a><a href="/host/', '">---')
    if not x:
        break
    # Wait one second before continuing...
    sleep(1)
    # Increment 'n' for index value of item on page
    n = n + 1
    # Display find_between data in 'x'
    print "\nindex: %s\n\n%s\n" % (n, x)
    # Remove text already searched
    found_text_pos = text_to_search.index(x) + len(x)
    text_to_search = text_to_search[found_text_pos:]
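Alternatively, if you want all the matches collected up front instead of one per loop pass, the re module can do it in a single call. A rough sketch of that idea (find_all_between is just a name I'm using here; re.escape keeps the literal tag text from being treated as regex syntax):

import re

def find_all_between(s, tag1, tag2):
    # Capture everything between tag1 and tag2, non-greedily, for every
    # occurrence on the page.
    pattern = re.escape(tag1) + '(.*?)' + re.escape(tag2)
    return re.findall(pattern, s)

# Usage with the tags from the question:
# matches = find_all_between(response.content, '/></a><a href="/host/', '">---')
# for n, x in enumerate(matches, start=1):
#     print "\nindex: %s\n\n%s\n" % (n, x)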

Related

How to stop adding values to a list with empty line

I'm trying to add values to a list, and stop adding with empty line:
def main():
    signal = []
    i = 0
    print("Enter the data points of the signal. Stop with empty line.\n")
    while i != "":
        value = float(input())
        signal.append(value)
        i += 1
    print(signal)
main()
However, when I press enter (empty line) I get the following error:
File "C:\Users\Omistaja\Downloads\template_median_filter.py", line 30, in main
value = float(input())
ValueError: could not convert string to float: ''
How to proceed?
You almost got it:
def main():
    signal = []
    i = 0
    print("Enter the data points of the signal. Stop with empty line.\n")
    while True:
        i += 1
        data = input("Data point {}: ".format(i))
        if data == "":
            break
        signal.append(float(data))
    print("\nThe signal is: {}".format(signal))
main()
You don't need a counter; just check whether the input is empty:
signal = []
while True:
    points = input("Enter a data point for the signal. Stop with empty line.\n")
    if not points:
        break
    signal.append(float(points))
print(signal)
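If you also want to survive a non-numeric entry (the ValueError you originally hit), wrapping the float() conversion in try/except keeps the loop going; a small sketch:

signal = []
while True:
    entry = input("Enter a data point for the signal. Stop with empty line.\n")
    if not entry:
        break
    try:
        signal.append(float(entry))
    except ValueError:
        # Not a number: report it and ask again instead of crashing.
        print("'{}' is not a number, please try again.".format(entry))
print(signal)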

Trying to produce correct recursion

I didn't know who to ask at this time of night, but I'm trying to implement recursion for the first time without much background knowledge. I'm getting results that are on the right track, but the program now runs into what looks like an infinite loop.
def url_open(url, count, position):
    for i in range(count):
        newURL = 0
        html = urlopen(url, context=ctx).read()
        soup = BeautifulSoup(html, "html.parser")
        tags = soup.find_all("a")
        newURL = dict_populate(tags, position)
        url_open(newURL, count - 1, position)

def dict_populate(tags, position):
    workingCOUNT = 0
    workingDICT = {}
    newURL = 0
    for tag in tags:
        workingCOUNT += 1
        for key, value in tag.attrs.items():
            workingDICT[workingCOUNT] = value
    new = workingDICT[position]
    return new

url = input("Enter - ")
var1 = input("Enter count - ")
var2 = input("Enter position - ")
searchCOUNT = int(var1)
urlPOSI = int(var2)
url_open(url, searchCOUNT, urlPOSI)
print("The last url retrieved: ", url)
print("The last url retrieved: ", url)
It works with low values of count (1, 2, 3), but above that it runs into what looks like an infinite loop.
Any suggestions?
EDIT:
I have posted the whole program.
This program parses a webpage for a URL. The website I'm asked to use contains links to other pages that hold the same links in a different order. I need to find the URL at position n, then repeat the process for n more pages until I reach the last one.
Have a look at the while statement.
A while loop executes a block of statements repeatedly for as long as a given condition holds.
# assign a default value to a variable called "count"
count = 0
# iterate until the test expression is False
while (count < 5):
    count += 1
    # print results
    print("count is :" + str(count))
Output:
count is :1
count is :2
count is :3
count is :4
count is :5
You can, of course, also use a while statement inside a for loop:
for item in my_items_list:
    while some_conditions:
        # do something here
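Applied to your program, the same idea replaces the recursive call with a loop that repeats the fetch until the count runs out. A rough sketch, assuming a plain urlopen (add back the ssl context to match your setup) and a 1-based position; follow_links is just an illustrative name:

from urllib.request import urlopen
from bs4 import BeautifulSoup

def follow_links(url, count, position):
    # Repeat the fetch 'count' times, each time moving on to the link found
    # at the requested position on the current page.
    while count > 0:
        html = urlopen(url).read()
        soup = BeautifulSoup(html, "html.parser")
        tags = soup.find_all("a")
        url = tags[position - 1].get("href")
        count -= 1
    return url

# last_url = follow_links(input("Enter - "), int(input("Enter count - ")),
#                         int(input("Enter position - ")))
# print("The last url retrieved: ", last_url)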

Downloading a webpage using urllib3

I'm trying to write a program for an assignment that uses urllib3 to download a webpage and store it in a dictionary. (I'm using spyder 3.6)
The program is giving me an 'AttributeError' and I have no idea what I'm doing wrong. Here is my code, with the step-by-step notes I wrote for the assignment.
#Downloading a webpage
import urllib3
import sys
#these import statements allow us to use 'modules' aka 'libraries' ....
#code written by others that we can use

urlToRead = 'http://www.google.com'
#This value won't actually get used, because of the way the while loop
#below is set up. But while loops often need a dummy value like this to
#work right the first time

crawledWebLinks = {}
#Initialize an empty dictionary, in which (key, value) pairs will correspond to (short, url) eg
#("Google", "http://www.google.com")

#Ok, there is a while loop coming up
#Here ends the set up

while urlToRead != ' ':
    #This is the condition the while loop keeps checking:
    #as long as it is true the loop will continue, if false it will stop
    try:
        urlToRead = input("Please enter the next URL to crawl")
        #the "try" prevents the program from crashing if there is an error
        #if there is an error the program will be sent to the except block
        if urlToRead == '':
            print("OK, exiting loop")
            break
            #if the user leaves the input blank it will break out of the loop
        shortName = input("Please enter a short name for the URL " + urlToRead)
        webFile = urllib3.urlopen(urlToRead).read()
        #The line above uses a readymade function in the urllib3 module to
        #do something super-cool:
        #it takes a url, goes to the website for the url, downloads the
        #contents (which are in the form of HTML) and returns them to be
        #stored in a string variable (here called webFile)
        crawledWebLinks[shortName] = webFile
        #the line above places a key-value pair (shortName, HTML for that url)
        #in the dictionary
    except:
        #this bit of code - the indented lines following 'except:' - will be
        #executed if the code in the try block (the indented lines following
        #the 'try:' above) throws an error
        #this is an example of something known as exception handling
        print("*************\nUnexpected Error*****", sys.exc_info()[0])
        #The snippet 'sys.exc_info()[0]' returns information about the last
        #error that occurred -
        #this code is made available through the sys library that we imported above
        #Quite Magical :)
        stopOrProceed = input("Hmm..stop or proceed? Enter 1 to stop, enter anything else to continue")
        if stopOrProceed == 1:
            print('OK...Stopping\n')
            break
            #this break will break out of the nearest loop - in this case,
            #the while loop
        else:
            print("Cool! Let's continue\n")
            continue
            #this continue will skip out of the current iteration of this
            #loop and move to the next, i.e. the loop will reset to the start

print(crawledWebLinks.keys())
Your issue is that you are trying to call urllib3.urlopen(), and urllib3 does not have an urlopen member. Here is a working snippet; all I did was replace urllib3 with urllib.request:
import urllib.request
import sys

urlToRead = 'http://www.google.com'
crawledWebLinks = {}

while urlToRead != ' ':
    try:
        urlToRead = input("Please enter the next URL to crawl: ")
        if urlToRead == '':
            print("OK, exiting loop")
            break
        #if the user leaves the input blank it will break out of the loop
        shortName = input("Please enter a short name for the URL " + urlToRead + ": ")
        webFile = urllib.request.urlopen(urlToRead).read()
        crawledWebLinks[shortName] = webFile
    except:
        print("*************\nUnexpected Error*****", sys.exc_info()[0])
        stopOrProceed = input("Hmm..stop or proceed? Enter 1 to stop, enter anything else to continue")
        if stopOrProceed == '1':  # input() returns a string, so compare against '1'
            print('OK...Stopping\n')
            break
        else:
            print("Cool! Let's continue\n")
            continue

print(crawledWebLinks)
Another note: simply printing out the type of the error in your except block is not very useful. I was able to debug your code in 30 seconds once I removed that and viewed the actual traceback.
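For instance, catching only the errors you actually expect and printing the full traceback for them makes debugging much easier; anything unexpected (like the original AttributeError) then surfaces with its own traceback instead of being swallowed. A small standalone sketch of that pattern:

import traceback
import urllib.request
import urllib.error

url = input("Please enter a URL to fetch: ")
try:
    page = urllib.request.urlopen(url).read()
    print("Fetched {} bytes".format(len(page)))
except (ValueError, urllib.error.URLError):
    # Show file, line number and message for the expected failure modes
    # (malformed or unreachable URLs) instead of just the exception type.
    traceback.print_exc()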

String manipulation in Python

I'm trying to make a function that will take a string and remove any bracketed blocks of text from it, for example turning "(example) somestuff" into "somestuff". This is a single function in a larger program that is meant to automatically create directories based on a file's name and move the relevant files into that folder. I think I'm running into an endless loop, but I'm lost as to what my problem is.
startbrackets = '[', '('
endbrackets = ']', ')'
digits = range(0,10)

def striptoname(string):
    startNum = 0
    endNum = 0
    finished = True
    indexBeginList = []
    indexEndList = []
    while (finished):
        try:
            for bracket in startbrackets:
                indexBeginList.append(string.find(bracket, 0, len(string)))
        except:
            print "Search Start Bracket Failed"
            wait()
            exit()
        # Testing Code START
        finished = False
        for i in indexBeginList:
            if i != -1:
                finished = True
                startNum = i
                break
        # Testing Code END
        try:
            for bracket in endbrackets:
                indexEndList.append(string.find(bracket, 0, len(string)))
        except:
            print "Search End Bracket Failed"
            wait()
            exit()
        # Testing Code START
        for i in indexEndList:
            if i != -1:
                endNum = i
                break
        # Testing Code END
        if(finished):
            if(startNum == 0):
                string = string[:(endNum+1)]
            else:
                string = string[0:startNum]
    for i in digits:
        string.replace(str(i),"")
    return string
Here's an approach using re:
import re

def remove_unwanted(s):
    # This will look for a group of any characters inside () or [] and
    # substitute an empty string, "", in place of that entire group.
    # The final strip is to eliminate any empty spaces that can be left
    # over outside of the brackets.
    return re.sub("((\(|\[).*(\)|\]))", "", s).strip()

print(remove_unwanted("[some text] abcdef"))
>>> "abcdef"
print(remove_unwanted("(example) somestuff"))
>>> "somestuff"
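One small follow-up: on newer Python versions the backslashes in that pattern trigger an invalid-escape warning, so writing it as a raw string is safer (the regex itself is unchanged):

import re

def remove_unwanted(s):
    # Same substitution as above, with the pattern as a raw string so the
    # backslashes reach the regex engine untouched.
    return re.sub(r"((\(|\[).*(\)|\]))", "", s).strip()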

Using Split arguments in other functions

The code is supposed to take a string of multiple arguments and split them with Split(). It does do that, but it only passes the first argument to CheckList(). So if I type "1 2 4", it only passes "1" to CheckList. Everything else works as it should.
import re

def CheckList(Start):
    DoIt = 0
    s = int(Start)
    End = s + 1
    End = str(End)
    for PodCheck in F.readlines():
        if re.match('Pod' + End, PodCheck.strip()):
            DoIt = 0
        if re.match('Pod' + Start, PodCheck.strip()):
            DoIt = 1
        if DoIt == 1:
            print PodCheck,
    return

def Split(P):
    Pods = P.split()
    for Pod in Pods:
        CheckList(Pod)
    return

F = open("C:\Users\User\Desktop\IP_List.txt")
Pod = raw_input('What pod number would you like to check?: ')
Split(Pod.strip())
print 'Done'
Your problem is right here:
for PodCheck in F.readlines():
The first call to CheckList uses up all the data in F. Subsequent calls to CheckList skip the for loop because there is nothing left to read.
So after opening F, you should read all of its data once. Without changing too much of your code, I would add this right after you open your file:
F_lines = F.readlines()
And change the loop in CheckList to
for PodCheck in F_lines:
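To see why this happens, note that a file object is consumed as you read it; a small sketch (reusing the path from the question) that shows the difference:

F = open(r"C:\Users\User\Desktop\IP_List.txt")
print len(F.readlines())   # e.g. 42, if the file has 42 lines
print len(F.readlines())   # 0 -- the file position is already at the end

# Reading into a list once fixes this: the list can be looped over repeatedly.
F.seek(0)
F_lines = F.readlines()
print len(F_lines)         # 42
print len(F_lines)         # still 42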
