Is there a way to write a function that creates other functions, each named after an argument passed in, so they can be called later?
For this example, let's pretend that https://example.com/engine_list returns the following XML file when I request it in get_search_engine_xml:
<engines>
<engine address="https://www.google.com/">Google</engine>
<engine address="https://www.bing.com/">Bing</engine>
<engine address="https://duckduckgo.com/">DuckDuckGo</engine>
</engines>
And here's my code:
import re
import requests
import xml.etree.ElementTree as ET

base_url = 'https://example.com'

def make_safe(s):
    s = re.sub(r"[^\w\s]", '', s)
    s = re.sub(r"\s+", '_', s)
    s = str(s)
    return s

# This is what I'm trying to figure out how to do correctly, create a function
# named after the engine returned in get_search_engine_xml(), to be called later
def create_get_engine_function(function_name, address):
    def function_name():
        r = requests.get(address)
    return function_name

def get_search_engine_xml():
    url = base_url + '/engine_list'
    r = requests.get(url)
    engines_list = str(r.content)
    engines_root = ET.fromstring(engines_list)
    for child in engines_root:
        engine_name = child.text.lower()
        engine_name = make_safe(engine_name)
        engine_address = child.attrib['address']
        create_get_engine_function(engine_name, engine_address)

## Runs without error.
get_search_engine_xml()
## But if I try to call one of the functions.
google()
I get the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'google' is not defined
Logging engine_name and engine_address shows that they are set correctly, so I'm fairly sure the problem lies in create_get_engine_function, which I admittedly pieced together from similar questions without fully understanding it.
Can you name a function created by another function with an argument that's passed in? Is there a better way to do this?
You can assign them to globals():
def create_get_engine_function(function_name, address):
    def function():
        r = requests.get(address)
    function.__name__ = function_name
    function.__qualname__ = function_name # for Python 3.3+
    globals()[function_name] = function
Although, depending on what you're actually trying to accomplish, a better design would be to store all the engine names/addresses in a dictionary and access them as needed:
# You should probably rename this to 'parse_engines_from_xml'
def get_search_engine_xml():
    ...
    search_engines = {} # maps names to addresses
    for child in engines_root:
        ...
        search_engines[engine_name] = engine_address
    return search_engines

engines = get_search_engine_xml()
e = requests.get(engines['google'])
<do whatever>
e = requests.get(engines['bing'])
<do whatever>
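If callable objects are still wanted, the dictionary can hold functions instead of addresses. The following is only a minimal sketch: make_fetcher, build_fetchers, and the injected fetch callable are hypothetical names that do not appear in the original, and fetch stands in for requests.get so the example runs without network access.

```python
import re


def make_safe(name):
    """Strip punctuation and turn runs of whitespace into underscores."""
    name = re.sub(r"[^\w\s]", "", name)
    return re.sub(r"\s+", "_", name)


def make_fetcher(address, fetch):
    """Return a zero-argument function that fetches `address` when called.

    `fetch` is a hypothetical stand-in for requests.get, injected so the
    sketch runs without network access.
    """
    def fetcher():
        return fetch(address)
    return fetcher


def build_fetchers(engines, fetch):
    """Map each sanitized, lower-cased engine name to its own fetcher."""
    return {
        make_safe(name.lower()): make_fetcher(address, fetch)
        for name, address in engines.items()
    }


# Stand-in for the names and addresses parsed from the XML in the question.
engines = {
    "Google": "https://www.google.com/",
    "DuckDuckGo": "https://duckduckgo.com/",
}
fetchers = build_fetchers(engines, fetch=lambda url: "GET " + url)
print(fetchers["google"]())
```

Each fetcher is created inside its own make_fetcher call, so it keeps its own address; looking a function up by key avoids writing into globals().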
Related
I'm creating a web scraper that will be used to value stocks. The problem is that my code prints an object reference (I'm not sure what it should be called) instead of the value.
import requests

class Guru():
    MedianPE = 0.0

    def __init__(self, ticket):
        self.ticket = ticket
        try:
            url = ("https://www.gurufocus.com/term/pettm/"+ticket+"/PE-Ratio-TTM/")
            response = requests.get(url)
            htmlText = response.text
            firstSplit = htmlText
            secondSplit = firstSplit.split("And the <strong>median</strong> was <strong>")[1]
            thirdSplit = secondSplit.split("</strong>")[0]
            lastSplit = float(thirdSplit)
            try:
                Guru.MedianPE = lastSplit
            except:
                print(ticket + ": Median PE N/A")
        except:
            print(ticket + ": Median PE N/A")

    def getMedianPE(self):
        return float(Guru.getMedianPE)

g1 = Guru("AAPL")
g1.getMedianPE
print("Median + " + str(g1))
If I print lastSplit inside __init__, I get the value I want (15.53), but when I try to get it through getMedianPE I just get Median + <__main__.Guru object at 0x0000016B0760D288>
Thanks a lot for your time!
It looks like you are trying to convert a method object to a float. Change return float(Guru.getMedianPE) to return float(Guru.MedianPE).
getMedianPE is a function (called a method when it is part of a class), so you need to call it with parentheses. If you refer to it without parentheses, you get the method object itself rather than the result of calling it.
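The difference between referring to a method and calling it can be seen in a small sketch. The class below is a simplified stand-in for the one in the question; get_median_pe and the constructor argument are hypothetical names chosen for illustration.

```python
class Guru(object):
    def __init__(self, median_pe):
        self.median_pe = median_pe

    def get_median_pe(self):
        return self.median_pe


g1 = Guru(15.53)
method = g1.get_median_pe    # no parentheses: the bound method object
value = g1.get_median_pe()   # parentheses: the method's return value
print(callable(method), value)
```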
The other problem is that getMedianPE returns the method Guru.getMedianPE rather than the value Guru.MedianPE. I don't think you want MedianPE to be a class variable; you probably want to set it with a default of 0 in __init__ so that each object has its own median_PE value.
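To illustrate why a class variable is probably not what is wanted here, this sketch compares the two options. SharedPE and OwnPE are hypothetical classes invented for the comparison; they are not part of the original code.

```python
class SharedPE(object):
    median_pe = 0.0  # class attribute: a single value stored on the class

    def __init__(self, value):
        # Assigning through the class overwrites the value seen by every instance.
        SharedPE.median_pe = value


class OwnPE(object):
    def __init__(self, value=0.0):
        # Instance attribute: each object keeps its own value.
        self.median_pe = value


a, b = SharedPE(15.53), SharedPE(22.1)
c, d = OwnPE(15.53), OwnPE(22.1)
print(a.median_pe, b.median_pe)  # both show the value set last
print(c.median_pe, d.median_pe)  # each shows its own value
```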
Also, it is not a good idea to put all of the scraping code in your __init__ method. Move it to a scrape() method (or some other name) that you call after instantiating the object.
Finally, if you are going to print an object, it is useful to have a __str__ method, so I added a basic one here.
So putting all of those comments together, here is a recommended refactor of your code.
import requests

class Guru():
    def __init__(self, ticket, median_PE=0):
        self.ticket = ticket
        self.median_PE = median_PE

    def __str__(self):
        return f'{self.ticket} {self.median_PE}'

    def scrape(self):
        try:
            url = f"https://www.gurufocus.com/term/pettm/{self.ticket}/PE-Ratio-TTM/"
            response = requests.get(url)
            htmlText = response.text
            firstSplit = htmlText
            secondSplit = firstSplit.split("And the <strong>median</strong> was <strong>")[1]
            thirdSplit = secondSplit.split("</strong>")[0]
            lastSplit = float(thirdSplit)
            self.median_PE = lastSplit
        # IndexError: the marker text was not found; ValueError: the text is not a number
        except (IndexError, ValueError):
            print(f"{self.ticket}: Median PE N/A")
Then you run the code:
>>> g1 = Guru("AAPL")
>>> g1.scrape()
>>> print(g1)
AAPL 15.53
I am having a bit of an issue. First, I know that this code could stand alone without a class, but I would prefer to keep it in a class. Second, when I run the code, I get this error: TypeError: set_options() takes exactly 2 arguments (1 given).
Here is my code. If anyone could point me in the right direction, I would appreciate it. I'm assuming that the set_options method isn't getting my jobj instance. Is that correct, and how would I fix it? P.S. I do have the correct imports, and this is the command I run in the terminal: python test.py radar 127.0.0.1 hashNumber testplan:speed
class TransferStuff(object):
    tool = sys.argv[1]
    target = sys.argv[2]
    hash = sys.argv[3]
    options = sys.argv[4]

    def set_options(self, test_options):
        option_arr = test_options.split(',')
        new_arr = [i + ':{}'.format(i) for i in option_arr if ':' not in i]
        for i in option_arr:
            if ':' in i:
                new_arr.append(i)
        d = {}
        for i in new_arr:
            temp = i.split(':')
            d[temp[0]] = temp[1]
        return d

    data = {'target': target, 'test': tool, 'HASH': hash,
            'options': set_options(options)}

    def write_to_json(self):
        """Serialize cli args and tool options in json format.
        Write stream to json file.
        """
        with open('envs.json', 'w') as fi:
            json.dump(TransferStuff.data, fi)

if __name__ == "__main__":
    try:
        jobj = TransferStuff()
        jobj.write_to_json()
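Your method is inside a class, so it has to be called on an instance. In the class body, set_options(options) runs as a plain function call while the class is still being defined, so options is bound to self and test_options is missing, which is what the TypeError reports. Create an instance of the class:
transfer_stuff_instance = TransferStuff()
And call the method on this instance:
transfer_stuff_instance.set_options(options)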
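Because the data dictionary is built in the class body, before any instance exists, one way to apply this is to move that work into __init__. This is a minimal sketch, assuming the command-line values are passed in as constructor arguments; the parameter names and the to_json helper are hypothetical and not part of the original code.

```python
import json


class TransferStuff(object):
    def __init__(self, tool, target, hash_value, options):
        # Built per instance, so self.set_options is a bound method here.
        self.data = {
            'target': target,
            'test': tool,
            'HASH': hash_value,
            'options': self.set_options(options),
        }

    def set_options(self, test_options):
        """Turn 'a:b,c' into {'a': 'b', 'c': 'c'}."""
        d = {}
        for item in test_options.split(','):
            key, sep, value = item.partition(':')
            # An option without ':' maps to itself, as in the original code.
            d[key] = value if sep else key
        return d

    def to_json(self):
        """Serialize the collected arguments as a JSON string."""
        return json.dumps(self.data, sort_keys=True)


jobj = TransferStuff('radar', '127.0.0.1', 'hashNumber', 'testplan:speed,verbose')
print(jobj.to_json())
```

In the real script, the four arguments would come from sys.argv[1:5].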
In my python file, I have created a class called Download. The code where the class is:
import requests, json, os, pytube, threading

class Download:
    def __init__(self, url, json=False, get=False, post=False, put=False, unwanted="", wanted="", unwanted2="", wanted2="", unwanted3="", wanted3=""):
        self.url = url
        self.json = json
        self.get = get
        self.post = post
        self.put = put
        self.unwanted = unwanted
        self.wanted = wanted
        self.unwanted2 = unwanted2
        self.wanted2 = wanted2
        self.unwanted3 = unwanted3
        self.wanted3 = wanted3

    def downloadJson(self):
        if self.get is True:
            downloadJson = requests.get(self.url)
            downloadJson = str(downloadJson.content)
            downloadJsonS = str(downloadJson) # This saves the downloaded JSON file as string
            if self.json is True:
                with open("downloadedJson.json", "w") as writeDownloadedJson:
                    writeDownloadedJson.write(json.dumps(downloadJson))
                    writeDownloadedJson.close()
                with open("downloadedJson.json", "r") as replaceUnwanted:
                    a = replaceUnwanted.read()
                    x = a.replace(self.unwanted, self.wanted)
                    # y = a.replace(self.unwanted2, self.wanted2)
                    # z = a.replace(self.unwanted3, self.wanted3)
                    print(x)
                with open("downloadedJson.json", "w") as writeUnwanted:
                    # writeUnwanted.write(y)
                    # writeUnwanted.write(z)
                    writeUnwanted.write(x)
            else:
                # with open("downloadedJson.json", "w")as j:
                #     j.write(downloadJsonS)
                #     j.close()
                pass
I wrote all of this myself, and I understand how it works. My objective is to remove all the unwanted characters that appear in the JSON file once it is downloaded, such as \\n, \' or \n. The __init__() function takes many arguments for this, such as unwanted="", wanted="", unwanted2="", and so on.
When I pass a character sequence such as \\n as the unwanted parameter, every occurrence should be replaced by the wanted value (a space, for example). This works properly. The commented-out lines are the ones I tried for the additional arguments, but they did not work: only the characters from one argument were replaced.
Is there any way to apply the replacements for every argument, perhaps using threads? If it is not possible with threads, is there an alternative?
By the way, this is the file where I use the class (main.py):
from downloader import Download
with open("url.txt", "r") as url:
    x = Download(url.read(), get=True, json=True, unwanted="\\n")
    x.downloadJson()
Thanks
You could apply the replacements one after another:
x = a.replace(self.unwanted, self.wanted)
x = x.replace(self.unwanted2, self.wanted2)
x = x.replace(self.unwanted3, self.wanted3)
You could also chain the replacements together, but that would quickly become hard to read:
x = a.replace(...).replace(...).replace(...)
By the way, instead of having multiple unwantedN and wantedN parameters, it would probably be a lot easier to use a list of (unwanted, wanted) pairs, something like this:
def __init__(self, url, json=False, get=False, post=False, put=False, replacements=None):
    self.url = url
    self.json = json
    self.get = get
    self.post = post
    self.put = put
    # Use None as the default: a mutable default such as [] would be shared across instances
    self.replacements = replacements if replacements is not None else []
And then you could perform the replacements in a loop:
x = a
for unwanted, wanted in self.replacements:
    x = x.replace(unwanted, wanted)
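As a self-contained illustration of that loop, here is a minimal sketch. The apply_replacements helper and the sample string are hypothetical; threads are not needed, because each replacement simply runs on the result of the previous one.

```python
def apply_replacements(text, replacements):
    """Apply each (unwanted, wanted) pair to the text, in order."""
    for unwanted, wanted in replacements:
        text = text.replace(unwanted, wanted)
    return text


# Sample text containing a literal backslash-n and a literal backslash-quote.
raw = "first line\\nsecond line with \\'quotes\\'"
pairs = [("\\n", " "), ("\\'", "'")]
cleaned = apply_replacements(raw, pairs)
print(cleaned)
```

Order can matter when one unwanted string is contained in another, so the pairs are applied in the order given.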
I'm studying how to use mocking in my unit tests.
I have a SafeConfigParser object, and I want to test that what I write with it is correct.
After googling how to mock SafeConfigParser, I know how to test reading with it, but I still don't know how to verify what it writes.
My idea is:
1. Make an empty buffer.
2. Find a way to make SafeConfigParser write to that buffer.
3. Call the function that includes SafeConfigParser.write().
4. Compare the buffer with the expected output.
The code that needs to be tested looks like this:
def write_tokens_to_config(self):
    """Write token values to the config
    """
    parser = SafeConfigParser()
    with open(self.CONFIG_PATH) as fp:
        parser.readfp(fp)
    if not parser.has_section('Token'):
        parser.add_section('Token')
    parser.set('Token', 'access_token', self._access_token)
    parser.set('Token', 'refresh_token', self._refresh_token)
    with open(self.CONFIG_PATH, 'wb') as fp:
        parser.write(fp)
P.S. You can check the read part from this url: http://www.snip2code.com/Snippet/4347/
I finally found a solution :).
I modified my program (e.g. program.py) as follows:
# Both imports are needed: the test patches 'program.ConfigParser...',
# and the method below uses the bare SafeConfigParser name.
import ConfigParser
from ConfigParser import SafeConfigParser

class Program():
    def __init__(self):
        self._access_token = None
        self._refresh_token = None
        self.CONFIG_PATH = 'test.conf'

    def write_tokens_to_config(self):
        """Write token value to the config
        """
        parser = SafeConfigParser()
        parser.read(self.CONFIG_PATH)
        if not parser.has_section('Token'):
            parser.add_section('Token')
        parser.set('Token', 'access_token', self._access_token)
        parser.set('Token', 'refresh_token', self._refresh_token)
        with open(self.CONFIG_PATH, 'wb') as f:
            parser.write(f)
And my test program looks like this:
import unittest

import mock  # needed for the mock.patch calls below

class TestMyProgram(unittest.TestCase):
    def setUp(self):
        from program import Program
        self.program = Program()

    def test_write_tokens_to_config(self):
        from mock import mock_open
        from mock import call
        self.program._access_token = 'aaa'
        self.program._refresh_token = 'bbb'
        with mock.patch('program.ConfigParser.SafeConfigParser.read'):
            m = mock_open()
            with mock.patch('__builtin__.open', m, create=True):
                self.program.write_tokens_to_config()
        m.assert_called_once_with(self.program.CONFIG_PATH, 'wb')
        handle = m()
        handle.write.assert_has_calls(
            [
                call('[Token]\n'),
                call('access_token = aaa\n'),
                call('refresh_token = bbb\n'),
            ]
        )
Ref: http://docs.python.org/dev/library/unittest.mock#mock-open
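The buffer idea from the question can also be sketched without patching open, by letting the parser write to an in-memory file object. This is only a sketch under assumptions: it uses the Python 3 names (configparser.ConfigParser and io.StringIO) rather than the Python 2 SafeConfigParser above, and write_tokens is a hypothetical function that takes the file object as a parameter.

```python
import io
from configparser import ConfigParser


def write_tokens(fp, access_token, refresh_token):
    """Write the token values to any file-like object."""
    parser = ConfigParser()
    parser.add_section('Token')
    parser.set('Token', 'access_token', access_token)
    parser.set('Token', 'refresh_token', refresh_token)
    parser.write(fp)


# An in-memory text buffer plays the role of the config file.
buffer = io.StringIO()
write_tokens(buffer, 'aaa', 'bbb')

# Read the buffer back with a fresh parser to check what was written.
check = ConfigParser()
check.read_string(buffer.getvalue())
print(check.get('Token', 'access_token'), check.get('Token', 'refresh_token'))
```

Passing the file object in keeps the test independent of how the parser splits its output into individual write() calls.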
I'm a bit new to Python dev -- I'm creating a larger project for some web scraping. I want to approach this as "Pythonically" as possible, and would appreciate some help with the project structure. Here's how I'm doing it now:
Basically, I have a base class for an object whose purpose is to go to a website and parse some specific data on it into its own array, jobs[]
minion.py
class minion:
    # Empty getJobs() function to be defined by object pre-instantiation
    def getJobs(self):
        pass

    # Constructor for a minion that requires site authorization
    # Ex: minCity1 = minion('http://portal.com/somewhere', 'user', 'password')
    # or minCity2 = minion('http://portal.com/somewhere')
    def __init__(self, title, URL, user='', password=''):
        self.title = title
        self.URL = URL
        self.user = user
        self.password = password
        self.jobs = []
        if (user == '' and password == ''):
            self.reqAuth = 0
        else:
            self.reqAuth = 1

    def displayjobs(self):
        for j in self.jobs:
            j.display()
I'm going to have about 100 different data sources. The way I'm doing it now is to just create a separate module for each "Minion", which defines (and binds) a more tailored getJobs() function for that object
Example: minCity1.py
from minion import minion
from BeautifulSoup import BeautifulSoup
import urllib2
from job import job

# MINION CONFIG
minTitle = 'Some city'
minURL = 'http://www.somewebpage.gov/'

# Here we define a function that will be bound to this object's getJobs function
def getJobs(self):
    page = urllib2.urlopen(self.URL)
    soup = BeautifulSoup(page)
    # For each row
    for tr in soup.findAll('tr'):
        tJob = job()
        span = tr.findAll(['span', 'class="content"'])
        # If row has 5 spans, pull data from span 2 and 3 ( [1] and [2] )
        if len(span) == 5:
            tJob.title = span[1].a.renderContents()
            tJob.client = 'Some City'
            tJob.source = minURL
            tJob.due = span[2].div.renderContents().replace('<br />', '')
            self.jobs.append(tJob)

# Don't forget to bind the function to the object!
minion.getJobs = getJobs
# Instantiate the object
mCity1 = minion(minTitle, minURL)
I also have a separate module which simply contains a list of all the instantiated minion objects (which I have to update each time I add one):
minions.py
from minion_City1 import mCity1
from minion_City2 import mCity2
from minion_City3 import mCity3
from minion_City4 import mCity4
minionList = [mCity1,
mCity2,
mCity3,
mCity4]
main.py references minionList for all of its activities for manipulating the aggregated data.
This seems a bit chaotic to me, and I was hoping someone might be able to outline a more Pythonic approach.
Thank you, and sorry for the long post!
Instead of creating functions and assigning them to objects (or whatever minion is, I'm not really sure), you should definitely use classes instead. Then you'll have one class for each of your data sources.
If you want, you can even have these classes inherit from a common base class, but that isn't absolutely necessary.
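A minimal sketch of what that could look like, assuming one subclass per data source. The class names, the dictionary used for a job, and the placeholder get_jobs body are all hypothetical; a real subclass would do its own scraping there.

```python
class Minion(object):
    """Base class: one subclass per data source."""
    title = ''
    url = ''

    def __init__(self):
        self.jobs = []

    def get_jobs(self):
        # Each data source must provide its own implementation.
        raise NotImplementedError


class CityOneMinion(Minion):
    title = 'Some city'
    url = 'http://www.somewebpage.gov/'

    def get_jobs(self):
        # Placeholder for the real scraping code of this source.
        self.jobs.append({'title': 'Example job',
                          'client': self.title,
                          'source': self.url})
        return self.jobs


# Build the list from the direct subclasses instead of maintaining it by hand.
minion_list = [cls() for cls in Minion.__subclasses__()]
for m in minion_list:
    print(m.title, len(m.get_jobs()))
```

Defining get_jobs inside each subclass avoids rebinding minion.getJobs at import time, which in the original layout replaces the method for every instance each time another module is imported.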