Regarding memory consumption in python with urllib2 module - python

PS: I have a similar question with Requests HTTP library here.
I am using python v2.7 on windows 7 OS. I am using urllib2 module. I have two code snippets. One file is named as myServer.py The server class has 2 methods named as getName(self,code) and getValue(self).
The other script named as testServer.py simply calls the methods from the server class to retrieve the values and prints them. The server class basically retrieves the values from a Server in my local network. So, unfortunately I can't provide you the access for testing the code.
Problem: When I execute my testServer.py file, I observed in the task manager that the memory consumption keeps increasing. Why is it increasing and how to avoid it? If I comment out the following line
print serverObj.getName(1234)
in testServer.py then there is no increase in memory consumption.
I am sure that the problem is with the getName(self,code) of the server class. But unfortunately, I couldn't figure out what the problem is.
Code: Please find the code snippets below:
#This is the myServer.py file
import urllib2
import json
import random
class server():
def __init__(self):
url1 = 'https://10.0.0.1/'
username = 'user'
password = 'passw0rd'
passwrdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwrdmgr.add_password(None, url1, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passwrdmgr)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
def getName(self, code):
code = str(code)
url = 'https://10.0.0.1/' + code
response = urllib2.urlopen(url)
data = response.read()
name = str(data).strip()
return name
def getValue(self):
value = random.randrange(0,11)
return value
The following is the testServer.py snippet
from myServer import server
import time
serverObj = server()
while True:
time.sleep(1)
print serverObj.getName(1234)
print serverObj.getValue()
Thank you for your time!

This is question is quite similar to my other question. So I think the answer is also quite similar. The answer can be found here https://stackoverflow.com/a/23172330/2382792

Related

Constructing a URI in Python?

I'm trying to convert a Java program to Python, and one thing that I am currently stuck on is working with URI's in Python. I found urllib.response in Python, but I'm struggling to figure out how to utilize it.
What I'm trying to do with this URI is obtain user info (particularly username and password), the host and path. In Java, there are associated methods (getUserInfo(), getHost(), and getPath()) for this, but I'm having trouble finding equivalents for this in Python, even after looking up the urllib.response Python documentation.
The equivalent code in Java is:
URI dbUri = new URI(env);
username = dbUri.getUserInfo().split(":")[0];
password = dbUri.getUserInfo().split(":")[1];
dbUrl = "jdbc:postgresql://" + dbUri.getHost() + dbUri.getPath();
and what would be the appropriate methods that could be used to convert this to Python?
Seems like you'd want to use something like urllib.parse.urlparse.
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
from urllib.parse import urlparse
db_url = urlparse(raw_url_string)
username = db_url.username
password = db_url.password
host = db_url.net_loc
path = db_url.path
...
You might need to adjust this a bit. There is a subtle difference between urlparse and urlsplit regarding parameters. Then you can use one of the urlunparse or unsplit.
Code
from urllib import parse
from urllib.parse import urlsplit
url = 'http://localhost:5432/postgres?user=postgres&password=somePassword'
split_url = urlsplit(url)
hostname = split_url.netloc
path = split_url.path
params = dict(parse.parse_qsl(split_url.query))
username = params['user']
password = params['password']
db_url = "jdbc:postgresql://" + hostname + path
print(db_uri)
Output
jdbc:postgresql://localhost:5432/postgres

Loop to check if a variable has changed in Python

I have just learned the basics of Python, and I am trying to make a few projects so that I can increase my knowledge of the programming language.
Since I am rather paranoid, I created a script that uses PycURL to fetch my current IP address every x seconds, for VPN security. Here is my code[EDITED]:
import requests
enterIP = str(input("What is your current IP address?"))
def getIP():
while True:
try:
result = requests.get("http://ipinfo.io/ip")
print(result.text)
except KeyboardInterrupt:
print("\nProccess terminated by user")
return result.text
def checkIP():
while True:
if enterIP == result.text:
pass
else:
print("IP has changed!")
getIP()
checkIP()
Now I would like to expand the idea, so that the script asks the user to enter their current IP, saves that octet as a string, then uses a loop to keep running it against the PycURL function to make sure that their IP hasn't changed? The only problem is that I am completely stumped, I cannot come up with a function that would take the output of PycURL and compare it to a string. How could I achieve that?
As #holdenweb explained, you do not need pycurl for such a simple task, but nevertheless, here is a working example:
import pycurl
import time
from StringIO import StringIO
def get_ip():
buffer = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "http://ipinfo.io/ip")
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()
return buffer.getvalue()
def main():
initial = get_ip()
print 'Initial IP: %s' % initial
try:
while True:
current = get_ip()
if current != initial:
print 'IP has changed to: %s' % current
time.sleep(300)
except KeyboardInterrupt:
print("\nProccess terminated by user")
if __name__ == '__main__':
main()
As you can see I moved the logic of getting the IP to separate function: get_ip and added few missing things, like catching the buffer to a string and returning it. Otherwise it is pretty much the same as the first example in pycurl quickstart
The main function is called below, when the script is accessed directly (not by import).
First off it calls the get_ip to get initial IP and then runs the while loop which checks if the IP has changed and lets you know if so.
EDIT:
Since you changed your question, here is your new code in a working example:
import requests
def getIP():
result = requests.get("http://ipinfo.io/ip")
return result.text
def checkIP():
initial = getIP()
print("Initial IP: {}".format(initial))
while True:
current = getIP()
if initial == current:
pass
else:
print("IP has changed!")
checkIP()
As I mentioned in the comments above, you do not need two loops. One is enough. You don't even need two functions, but better do. One for getting the data and one for the loop. In the later, first get initial value and then run the loop, inside which you check if value has changed or not.
It seems, from reading the pycurl documentation, like you would find it easier to solve this problem using the requests library. Curl is more to do with file transfer, so the library expects you to provide a file-like object into which it writes the contents. This would greatly complicate your logic.
requests allows you to access the text of the server's response directly:
>>> import requests
>>> result = requests.get("http://ipinfo.io/ip")
>>> result.text
'151.231.192.8\n'
As #PeterWood suggested, a function would be more appropriate than a class for this - or if the script is going to run continuously, just a simple loop as the body of the program.

Python URLLib does not work with PyQt + Multiprocessing

A simple code as such:
import urllib2
import requests
from PyQt4 import QtCore
import multiprocessing
import time
data = (
['a', '2'],
)
def mp_worker((inputs, the_time)):
r = requests.get('http://www.gpsbasecamp.com/national-parks')
request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
def mp_handler():
p = multiprocessing.Pool(2)
p.map(mp_worker, data)
if __name__ == '__main__':
mp_handler()
Basically, if i import PyQt4, and i have a urllib request (i believe this is used in almost all web extraction libraries such as BeautifulSoup, Requests or Pyquery. it crashes with a cryptic log on my MAC)
This is exactly True. It always fails on Mac, I have wasted rows of days just to fix this. And honestly there is no fix as of now. The best way is to use Thread instead of Process and it will work like a charm.
By the way -
r = requests.get('http://www.gpsbasecamp.com/national-parks')
and
request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
do one and the same thing. Why are you doing it twice?
This may be due _scproxy.get_proxies() not being fork-safe on Mac.
This is raised here https://bugs.python.org/issue33725#msg329926
_scproxy has been known to be problematic for some time, see for instance Issue31818. That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.
This is something that urllib may be attempting to do causing failure when multiprocessing.
There is a workaround and that is to set the environmental variable no-proxy to *
Eg. export no_proxy=*

How to access header information sent to http.server from an AJAX client using python 3+?

I am trying to read data sent to python's http.server from a local javascript program posted with AJAX. Everything works in python 2.7 as in this example, but now in python 3+ I can't access the header anymore to get the file length.
# python 2.7 (works!)
class handler_class(SimpleHTTPServer.SimpleHTTPRequestHandler):
def do_POST(self):
if self.path == '/data':
length = int(self.headers.getheader('Content-Length'))
NewData = self.rfile.read(length)
I've discovered I could use urllib.request, as I have mocked up below. However, I am running on a localhost and don't have a full url as I've seen in the examples, and I am starting to second guess if this is even the right way to go? Frustratingly, I can see the content-length printed out in the console, but I can't access it.
# python 3+ (url?)
import urllib.request
class handler_class(http.server.SimpleHTTPRequestHandler):
def do_POST(self):
if self.path == '/data':
print(self.headers) # I can see the content length here but cannot access it!
d = urllib.request.urlopen(url) # what url?
length = int(d.getheader('Content-Length'))
NewData = self.rfile.read(length)
Various url's I have tried are:
self.path
http://localhost:8000/data
/data
and I generally get this error:
ValueError: unknown url type: '/data'
So why is 'urllib.request' failing me and more importantly, how does one access 'self.header' in this Python3 world?

GData: How to avoid hardcoding login info

I am working on a reporting system which automatically updates results overnight and puts them in files on a google drive.
The way it is working right now is by hardcoding the login and password information which is by no means ideal even if it works. A search in StackOverflow does not point this question specifically, which surprises me.
A very simplified example with the relevant sections of code looks like:
import gdata.docs.service
class GDrive(object):
def __init__(self, email, password)
self.gd_client = gdata.docs.service.DocService()
self.gd_client.ClientLogin(email, password)
def upload(self):
# Code to Upload file somewhere in GDrive.
gd = GDrive("the#email.address", "ThePassword")
gd.upload()
Can something be done to avoid writing explicitly the username and password?
I would make use of the OAuth2 protocol. It is a secure way to store credentials for a long time (but not forever).
A bit of a short answer from my cell, but check:
https://developers.google.com/drive/about-auth
And this bit makes working with Oauth2 a lot easier:
https://developers.google.com/api-client-library/python/platforms/google_app_engine#Decorators
import gdata.docs.service
import sys
class GDrive(object):
def __init__(self, email, password)
self.gd_client = gdata.docs.service.DocService()
self.gd_client.ClientLogin(email, password)
def upload(self):
# Code to Upload file somewhere in GDrive.
if __name__ == "__main__":
username = sys.argv[1]
password = sys.argv[2]
gd = GDrive(username, password)
gd.upload()
now run from your commandline like script.py the#email.address ThePassword where script.py is the name of your python scripts...

Categories