Why would this section of Python code hang during execution?

try:
    html = urlopen('http://glbse.com/api/asset/' + asset.name)
except:
    print 'error while updating the price of ' + asset.name
    continue
json_txt = html.read()
ticker = json.loads(json_txt)
average_price = int(ticker['t24havg'])
if average_price == 0:
    average_price = int(ticker['t5davg'])
if average_price == 0:
    average_price = int(ticker['t7davg'])
if average_price == 0:
    average_price = int(ticker['latest_trade'])
if average_price == 0:
    print 'could not determine the price of ' + asset.name
    continue
asset.average_price = average_price
I am using mechanize for urlopen.
This code (and the rest of the program) seems to run fine for hours but, after having looped through this section thousands of times, will eventually hang somewhere in this section of code.
It hangs for an indefinite length of time. I've even come back to it to find it had hung there for hours.
Googling the issue, all I come up with are reports of a similar issue, where execution hangs on .read(), which was reportedly fixed years ago.
So what is causing execution to hang and how can I fix or workaround it?

Using mechanize.Browser().open() instead of urlopen reveals a "urllib2.URLError urlopen connection time out" error, which is not raised when using urlopen alone. I strongly suspect this is the problem, and my solution is to use mechanize.Browser().open() in place of urlopen in all cases.
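One plausible cause of an indefinite hang in a loop like this is a blocking network call that has no timeout, so it waits forever when the server stops responding. The following is a minimal sketch of bounding that wait. It assumes Python 3's standard urllib.request rather than the mechanize and Python 2 setup used in the question, and fetch_ticker and its opener parameter are hypothetical names introduced only for illustration.

```python
import json
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch_ticker(url, timeout=10.0, opener=urlopen):
    """Return the decoded JSON document at url, or None on a network failure.

    fetch_ticker and opener are hypothetical; opener exists only so the
    function can be exercised without a real network connection.
    """
    try:
        # The timeout bounds the connection attempt and each blocking read,
        # so a stalled server raises an exception instead of hanging forever.
        response = opener(url, timeout=timeout)
        try:
            raw = response.read()
        finally:
            response.close()
    except (URLError, socket.timeout) as exc:
        print('error while fetching ' + url + ': ' + str(exc))
        return None
    return json.loads(raw.decode('utf-8'))
```

A caller in the style of the question would skip the asset when fetch_ticker returns None and carry on with the next one.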

Related

Catching website changes with python using urlopen read function

Hi, I am a high school student who has not used Python much, and I am having trouble writing code to check when a website has been updated. I have looked at different resources and used them to create what I have, but when I run the code it does not do what I expect. I expect it to tell me whether a site has been updated or has stayed the same since I last checked it. I put some print statements in the code to try to catch the issue, but they have only shown me that the website has changed even though it doesn't look like it has changed.
import time
import hashlib
from urllib.request import urlopen, Request
url = Request('https://www.canada.ca/en/immigration-refugees-citizenship/services/immigrate-canada/express-entry/submit-profile/rounds-invitations.html')
res = urlopen(url).read()
current = hashlib.sha224(res).hexdigest()
print("running")
time.sleep(10)
while True:
    try:
        res = urlopen(url).read()
        current = hashlib.sha224(res).hexdigest()
        print(current)
        print(res)
        time.sleep(30)
        res = urlopen(url).read()
        newHash = hashlib.sha224(res).hexdigest()
        print(newHash)
        print(res)
        if newHash == current:
            print("nothing changed")
            continue
        else:
            print("there was a change")
    except AttributeError as e:
        print("error")
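The check described in the question can be sketched as a function that keeps one baseline digest and compares each new download against it. This is a minimal sketch, not a confirmed fix for the symptom above: page_digest, detect_changes, and the fetch callable are hypothetical names introduced here, and fetch stands in for whatever downloads the page bytes.

```python
import hashlib


def page_digest(body):
    """Return the SHA-224 hex digest of a page body given as bytes."""
    return hashlib.sha224(body).hexdigest()


def detect_changes(fetch, rounds):
    """Call fetch() rounds + 1 times; report per round whether the body changed.

    fetch is a hypothetical zero-argument callable returning the page bytes.
    """
    baseline = page_digest(fetch())
    results = []
    for _ in range(rounds):
        latest = page_digest(fetch())
        results.append(latest != baseline)
        baseline = latest  # compare the next round against the newest copy
    return results
```

In real use, fetch could wrap urlopen(url, timeout=10).read(), with a time.sleep call between rounds. If the page embeds values that differ on every request, the raw bytes will hash differently each time even when the visible content is unchanged. Whether that happens on this particular site is an assumption that comparing the printed res output of two requests would confirm or rule out.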

Having trouble using Beautiful Soup's 'Next Sibling' to extract some information

On auction websites, there is a clock counting down the time remaining. I am trying to extract that piece of information (among others) to print to a CSV file.
For example, I am trying to take the value after 'Time Left:' on this site: https://auctionofchampions.com/Michael_Jordan___Magic_Johnson_Signed_Lmt__Ed__Pho-LOT271177.aspx
I have tried 3 different options, without any success:
1)
time = ''
try:
    time = soup.find(id='tzcd').text.replace('Time Left:','')
    #print("Time: ",time)
except Exception as e:
    print(e)
2)
time = ''
try:
    time = soup.find(id='tzcd').text
    #print("Time: ",time)
except:
    pass
3)
time = ''
try:
    time = soup.find('div', id="BiddingTimeSection").find_next_sibling("div").text
    #print("Time: ",time)
except:
    pass
I am a new user of Python and don't know if it's because of the date/time structure of the pull or because of something else inherently flawed with my code.
Any help would be greatly appreciated!
That information is being pulled into the page via a JavaScript XHR call. You can see that by inspecting the Network tab in your browser's dev tools. The following code will get you the time left in seconds:
import requests
s = requests.Session()
header = {'X-AjaxPro-Method': 'GetTimerText'}
payload = '{"inventoryId":271177}'
r = s.get('https://auctionofchampions.com/Michael_Jordan___Magic_Johnson_Signed_Lmt__Ed__Pho-LOT271177.aspx')
s.headers.update(header)
r = s.post('https://auctionofchampions.com/ajaxpro/LotDetail,App_Web_lotdetail.aspx.cdcab7d2.1voto_yr.ashx', data=payload)
print(r.json()['value']['timeLeft'])
Response:
792309
792309 seconds are a bit over 9 days. There are easy ways to return them in days/hours/minutes, if you want.
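As one way to do that conversion, here is a minimal sketch using only the standard library. format_time_left is a hypothetical helper name, and the input is the 792309 value from the response above.

```python
from datetime import timedelta


def format_time_left(total_seconds):
    """Render a number of seconds as 'Nd Nh Nm Ns' (hypothetical helper)."""
    days, rest = divmod(int(total_seconds), 86400)  # 86400 seconds per day
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return "{}d {}h {}m {}s".format(days, hours, minutes, seconds)


print(format_time_left(792309))   # 9d 4h 5m 9s
print(timedelta(seconds=792309))  # 9 days, 4:05:09
```

timedelta is the shorter route when its default text format is acceptable; the divmod version gives control over the layout.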

Using automated testing to send text and an image (outside) from an Excel list to WhatsApp, but it is not sent to every contact in the list

The loop works when the image import is not scripted.
pre = os.path.dirname(os.path.realpath(__file__))
f_name = 'wpcontacts.xlsx'
path = os.path.join(pre, f_name)
f_name = pandas.read_excel(path)
count = 0
image_url = input("url here")
driver = webdriver.Chrome(executable_path='D:/Old Data/Integration Files/new/chromedriver')
driver.get('https://web.whatsapp.com')
sleep(25)
for column in f_name['Contact'].tolist():
try:
driver.get('https://web.whatsapp.com/send?phone=' + str(f_name['Contact'][count]) + '&text=' + str(
f_name['Messages'][0]))
sent = False
sleep(7)
# It tries 3 times to send a message in case if there any error occurred
click_btn = driver.find_element(By.XPATH,
'/html/body/div[1]/div/div/div[4]/div/footer/div[1]/div/span[2]/div/div[2]/div[2]/button/span')
file_path = 'amazzon.jpg'
driver.find_element(By.XPATH,
'//*[@id="main"]/footer/div[1]/div/span[2]/div/div[1]/div[2]/div/div/span').click()
sendky = driver.find_element(By.XPATH,
'//*[@id="main"]/footer/div[1]/div/span[2]/div/div[1]/div[2]/div/span/div/div/ul/li[1]/button/span')
input_box = driver.find_element(By.TAG_NAME, 'input')
input_box.send_keys(image_url)
sleep(3)
except Exception:
print("Sorry message could not sent to " + str(f_name['Contact'][count]))
else:
sleep(3)
driver.find_element(By.XPATH,
'//*[@id="app"]/div/div/div[2]/div[2]/span/div/span/div/div/div[2]/div/div[2]/div[2]/div/div').click()
sleep(2)
print('Message sent to: ' + str(f_name['Contact'][count]))
count = count + 1
output is
Message sent to: 919891350373
Process finished with exit code 0
How can I convert this code into a loop so that I can send the text to every number listed in the Excel file?
Thanks
Firstly, if what you've written in the question is the code you are using, I am confused about how you aren't getting a syntax error due to the tab spacing, e.g. here:
try:
driver.get('https://web.whatsapp.com/send?phone=' + str(f_name['Contact'][count]) + '&text=' + str(
f_name['Messages'][0]))
I am going to assume this is a mixup related to copy-paste.
Next, I'll just mention the following: I highly doubt you need a 25-second sleep for the page to load. The default timeout for Selenium tests is 30 seconds, so with the other sleeps you've added I'm not sure why it's not simply timing out, unless you've overridden this timeout in some other part of the code that's not included in your question.
What is the point of doing driver.get('https://web.whatsapp.com'), then following it with another driver.get()?
All this aside, it would make sense to me that your problem lies with the spacing for your increment count = count + 1; it is not inside your for loop in the code as I see it. So, the count is not actually incremented in the loop itself but rather after the whole loop is executed. If it does not help to add a tab before the count increment, I'm quite sure that you've made some mistake(s) pasting the code here so please organize it such that we can see what code is actually being executed.
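The loop fix described above can also be sketched without a manual counter at all, by iterating over the contacts directly. This is a minimal sketch under stated assumptions: build_send_urls is a hypothetical helper, and URL-encoding the message with urllib.parse.quote is an addition that the original code does not perform.

```python
from urllib.parse import quote


def build_send_urls(contacts, message):
    """Return one WhatsApp Web 'send' URL per contact, in order.

    Hypothetical helper: the loop variable replaces the manual `count`
    index, so there is no increment that can end up outside the loop.
    """
    urls = []
    for contact in contacts:
        urls.append('https://web.whatsapp.com/send?phone=' + str(contact)
                    + '&text=' + quote(str(message)))
    return urls
```

With the names from the question, the driver loop would then read `for url in build_send_urls(f_name['Contact'].tolist(), f_name['Messages'][0]): driver.get(url)`, followed by the existing send steps inside that loop.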
Finally, another comment I have: the xpaths you've got scare me. You should almost NEVER use an absolute xpath (like '/html/body/div[1]/div/div/div[4]/div/footer/div[1]/div/span[2]/div/div[2]/div[2]/button/span'). Just about any change to the HTML on the page will cause this to break. I haven't the time to find better selectors for you, but I highly recommend you examine these.
Let me know whether any of the above helps or not!

Module urllib.request not getting data

I am trying to test this demo program from Lynda using Python 3. I am using PyCharm as my IDE. I already added and installed the request package, but when I run the program, it runs cleanly and shows the message "Process finished with exit code 0", but does not show any output from the print statements. Where am I going wrong?
import urllib.request # instead of urllib2 like in Python 2.7
import json
def printResults(data):
    # Use the json module to load the string data into a dictionary
    theJSON = json.loads(data)
    # now we can access the contents of the JSON like any other Python object
    if "title" in theJSON["metadata"]:
        print(theJSON["metadata"]["title"])
    # output the number of events, plus the magnitude and each event name
    count = theJSON["metadata"]["count"];
    print(str(count) + " events recorded")
    # for each event, print the place where it occurred
    for i in theJSON["features"]:
        print(i["properties"]["place"])
    # print the events that only have a magnitude greater than 4
    for i in theJSON["features"]:
        if i["properties"]["mag"] >= 4.0:
            print("%2.1f" % i["properties"]["mag"], i["properties"]["place"])
    # print only the events where at least 1 person reported feeling something
    print("Events that were felt:")
    for i in theJSON["features"]:
        feltReports = i["properties"]["felt"]
        if feltReports != None:
            if feltReports > 0:
                print("%2.1f" % i["properties"]["mag"], i["properties"]["place"], " reported " + str(feltReports) + " times")
    # Open the URL and read the data
    urlData = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"
    webUrl = urllib.request.urlopen(urlData)
    print(webUrl.getcode())
    if webUrl.getcode() == 200:
        data = webUrl.read()
        data = data.decode("utf-8") # in Python 3.x we need to explicitly decode the response to a string
        # print out our customized results
        printResults(data)
    else:
        print("Received an error from server, cannot retrieve results " + str(webUrl.getcode()))
Not sure if you left this out on purpose, but this script isn't actually executing any code beyond the imports and function definition. Assuming you didn't leave it out on purpose, you would need the following at the end of your file.
if __name__ == '__main__':
    data = "" # your data
    printResults(data)
The check on __name__ equaling "__main__" is just so your code only executes when the file is explicitly run. To always run your printResults(data) function when the file is accessed (like, say, if it's imported into another module), you could just call it at the bottom of your file like so:
data = "" # your data
printResults(data)
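The two variants above can be put together in one small layout. This is a minimal sketch: summarize, main, and the sample payload are hypothetical stand-ins for the question's printResults and the live USGS feed, used so the example runs without a network connection.

```python
import json


def summarize(data):
    """Return (title, count) from a GeoJSON-style string (hypothetical helper)."""
    doc = json.loads(data)
    return doc["metadata"]["title"], doc["metadata"]["count"]


def main():
    # Hypothetical stand-in payload; the real script would read it from the feed
    sample = '{"metadata": {"title": "Sample feed", "count": 2}}'
    title, count = summarize(sample)
    print(title)
    print(str(count) + " events recorded")
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported
    main()
```

Everything that should happen on a direct run lives in main(), at module level rather than indented under another function, so the guard at the bottom is the single place that triggers it.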
I had to restart the IDE after installing the module. I just realized that and tried it now with "Run as Admin". Strangely, it seems to work now. But I'm not sure if it was a temporary error, since even without a restart it was able to detect the module and its methods.
Your comment about having to restart your IDE makes me think that PyCharm might not automatically detect newly installed Python packages. This SO answer seems to offer a solution.

fetch text from a web site and displaying it back

Currently, there's a game that has different groups, and you can play for a prize, 'gold', every hour. Sometimes there is gold, sometimes there isn't. It is posted on Facebook every hour, e.g. "gold in group2" or "gold in group6", and at other times there isn't a post because no gold is being offered as a prize for that hour. I want to write a small script that will check the site hourly, grab the result (whether there is gold or not, and in what group), and display it back to me. I wanted to write it in Python as I'm learning it. Would this be the best language to use? And how would I go about doing this? All I can really find is information on extracting links. I don't want to extract links, just the text. Thanks for any and all help. I appreciate it.
Check out urllib2 for getting html from a url and BeautifulSoup/HTMLParser/etc to parse the html. Then, you could use something like this as a starting point for the script:
import sys  # needed for sys.exit() at the bottom
import time
import urllib2
import BeautifulSoup
import HTMLParser
def getSource(url, postdata):
    source = ""
    req = urllib2.Request(url, postdata)
    try:
        sock = urllib2.urlopen(req)
    except urllib2.URLError as exc:
        # handle the error..
        pass
    else:
        source = sock.read()
    finally:
        # sock is undefined if urlopen failed, hence the guarded close
        try:
            sock.close()
        except:
            pass
    return source
def parseSource(source):
    pass
    # parse source with BeautifulSoup/HTMLParser, or here...
def main():
    last_run = 0
    while True:
        t1 = time.time()
        # check if 1 hour has passed since last_run
        if t1 - last_run >= 3600:
            source = getSource("someurl.com", "user=me&blah=foo")
            last_run = time.time()
            parseSource(source)
        else:
            # sleep for 60 seconds and check time again.
            time.sleep(60)
    return 0
if __name__ == "__main__":
    sys.exit(main())
Here is a good article about parsing-html-with-python
I have something similar to what you have, but you left out what my main question revolves around. I looked at HTMLParser and BeautifulSoup, but I am unsure how to do something like if($posttext == gold) echo "gold in so and so". It seems like BeautifulSoup deals a lot with tags. Since Facebook posts can use a variety of tags, how would I go about just searching the text and returning the 'post'?
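One way to approach a text-only search is to discard the tags first and then match the phrase with a regular expression. This is a minimal sketch under stated assumptions: it uses Python 3's standard html.parser instead of the urllib2-era libraries named above, TextExtractor and find_gold_group are hypothetical names, and the pattern assumes posts are worded like the "gold in group2" examples in the question.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_gold_group(source):
    """Return the group number from a 'gold in groupN' phrase, or None."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    # Join with spaces so a phrase split across tags still reads as one string
    text = " ".join(parser.chunks)
    match = re.search(r"gold\s+in\s+group\s*(\d+)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Because the match runs on the extracted text, it does not matter which tags the post happens to use. Text inside script or style elements is collected as well, which a fuller version might want to skip.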
