Python OOP Project Organization - python

I'm a bit new to Python dev -- I'm creating a larger project for some web scraping. I want to approach this as "Pythonically" as possible, and would appreciate some help with the project structure. Here's how I'm doing it now:
Basically, I have a base class for an object whose purpose is to go to a website and parse some specific data on it into its own array, jobs[]
minion.py
class minion:
# Empty getJobs() function to be defined by object pre-instantiation
def getJobs(self):
pass
# Constructor for a minion that requires site authorization
# Ex: minCity1 = minion('http://portal.com/somewhere', 'user', 'password')
# or minCity2 = minion('http://portal.com/somewhere')
def __init__(self, title, URL, user='', password=''):
self.title = title
self.URL = URL
self.user = user
self.password = password
self.jobs = []
if (user == '' and password == ''):
self.reqAuth = 0
else:
self.reqAuth = 1
def displayjobs(self):
for j in self.jobs:
j.display()
I'm going to have about 100 different data sources. The way I'm doing it now is to just create a separate module for each "Minion", which defines (and binds) a more tailored getJobs() function for that object
Example: minCity1.py
from minion import minion
from BeautifulSoup import BeautifulSoup
import urllib2
from job import job
# MINION CONFIG
minTitle = 'Some city'
minURL = 'http://www.somewebpage.gov/'
# Here we define a function that will be bound to this object's getJobs function
def getJobs(self):
page = urllib2.urlopen(self.URL)
soup = BeautifulSoup(page)
# For each row
for tr in soup.findAll('tr'):
tJob = job()
span = tr.findAll(['span', 'class="content"'])
# If row has 5 spans, pull data from span 2 and 3 ( [1] and [2] )
if len(span) == 5:
tJob.title = span[1].a.renderContents()
tJob.client = 'Some City'
tJob.source = minURL
tJob.due = span[2].div.renderContents().replace('<br />', '')
self.jobs.append(tJob)
# Don't forget to bind the function to the object!
minion.getJobs = getJobs
# Instantiate the object
mCity1 = minion(minTitle, minURL)
I also have a separate module which simply contains a list of all the instantiated minion objects (which I have to update each time I add one):
minions.py
from minion_City1 import mCity1
from minion_City2 import mCity2
from minion_City3 import mCity3
from minion_City4 import mCity4
minionList = [mCity1,
mCity2,
mCity3,
mCity4]
main.py references minionList for all of its activities for manipulating the aggregated data.
This seems a bit chaotic to me, and was hoping someone might be able to outline a more Pythonic approach.
Thank you, and sorry for the long post!

Instead of creating functions and assigning them to objects (or whatever minion is, I'm not really sure), you should definitely use classes instead. Then you'll have one class for each of your data sources.
If you want, you can even have these classes inherit from a common base class, but that isn't absolutely necessary.

Related

What is the py-algorand-sdk equivalent of "./sandbox goal account list"

TLDR; I am running an Algorand PrivateNet in a sandbox. I would like to know how I can use the Python SDK for Algorand to return the address and balance of the accounts on this network.
Longer version;
Until now I have been using "./sandbox goal account list" to find the account addresses and account balances in Bash, this returns:
[offline]
I3CMDHG236HVBDMCKEM22DLY5L2OJ3CBRUDHG4PVVFGEZ4QQR3X3KNHRMU
I3CMDHG236HVBDMCKEM22DLY5L2OJ3CBRUDHG4PVVFGEZ4QQR3X3KNHRMU
8012889074131520 microAlgos
[offline]
3NSWKJTYMW3PSZZDIE6NCHMRTOZ24VY5G3OJNDSTYRVRXMZBZVPCGQHJJI
3NSWKJTYMW3PSZZDIE6NCHMRTOZ24VY5G3OJNDSTYRVRXMZBZVPCGQHJJI
1001611000000000 microAlgos
[online]
5KLSI3AHMBBDALXBEO2HEA3PBBCBAYT4PIHCD3B25557WGWUZGRTQETPHQ
5KLSI3AHMBBDALXBEO2HEA3PBBCBAYT4PIHCD3B25557WGWUZGRTQETPHQ
100000 microAlgos
I have tried to utilise the algod and v2client.indexer modules as outlined below but instead of the 3 accounts listed above 4 are being listed by my code (the original 3 +1). The code:
#imports
from algosdk import algod, v2client
# instatiate AlgodClient
def connect_to_network():
algod_address = "http://localhost:4001"
algod_token = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
algod_client = algod.AlgodClient(algod_token, algod_address)
return algod_client
# instantiate IndexClient
def indexer():
indexer_token = ""
indexer_address = "http://localhost:8980"
indexer_client = v2client.indexer.IndexerClient(indexer_token, indexer_address)
return indexer_client
# returns all account details from indexer
def account_details():
raw_details = indexer().accounts() # <-- I think the error is here because this returns 4 addresses.
account_details = []
for details in raw_details["accounts"]:
account_details.append({"address":details["address"], "amount": details["amount"]})
return account_details
# returns the amount in each account
def account_amounts(account_address):
return connect_to_network().account_info(account_address)
# returns account address and amount for each account
for detail in account_details():
print(account_amounts(detail["address"])["address"])
print(account_amounts(detail["address"])["amount"])
This returns:
I3CMDHG236HVBDMCKEM22DLY5L2OJ3CBRUDHG4PVVFGEZ4QQR3X3KNHRMU
8013089091731545
NTMTJYBQHWUXEGVG3XRRX5CH6FCYJ3HKCSIOYW4DLAOF6V2WHSIIFMWMPY
1001636000000000
3NSWKJTYMW3PSZZDIE6NCHMRTOZ24VY5G3OJNDSTYRVRXMZBZVPCGQHJJI
1001636000000000
5KLSI3AHMBBDALXBEO2HEA3PBBCBAYT4PIHCD3B25557WGWUZGRTQETPHQ
100000
How can I replicate "./sandbox goal account list" with py-algorand-sdk?
(Bonus question) what is this extra mystery "account" that appears when I use the SDK compared to Bash?
The issue above arose because like with ./sandbox goal accounts listI was trying to return the accounts I had keys for locally in the KMD but my call to the indexer raw_details = indexer().accounts() was returning all accounts on the network.
To remedy this, instead of importing and using the v2client.indexer module import the wallet module from the algosdk library. These are the necessary imports:
from algosdk import algod, kmd, wallet
Then set up the connections to algod, kmd, and your wallet:
# instatiate AlgodClient
def connect_to_network():
algod_address = "http://localhost:4001"
algod_token = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
algod_client = algod.AlgodClient(algod_token, algod_address)
return algod_client
# instantiate KMDClient <--- this is used as a parameter when instantiating the Wallet Class.
def key_management():
kmd_address = "http://localhost:4002"
kmd_token = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
kmd_client = kmd.KMDClient(kmd_token, kmd_address)
return kmd_client
#get wallet details
def wallet_details():
wallet_name = 'unencrypted-default-wallet'
password = ""
kmd_client = key_management()
return wallet.Wallet(wallet_name, password, kmd_client)
You can now return the account addresses held locally in the KMD:
keys = wallet_details().list_keys()
Now that you have the addresses you can query use the algod client to return the amount of (Micro) Algo held at each of these addresses.

Django model error with nested dictionaries

I am relatively new to Django, but not to python, My model is trying to use a class (defined in a separate file) in which data is coming from a REST API, the retrieved data is in a nested dictionary. The code will run fine in python, but when I try it in Django (makemigrations), I get an error:
File "c:\blah-blah\Clone_PR.py", line 20, in GetFoundOnSelectItems
values = self._issueEdit["fields"]["customfield_13940"]["allowedValues"]
TypeError: 'NoneType' object is not subscriptable
I tried using type hints, but that does not work either.
models.py
from dal import autocomplete
from django.db import models
from django.contrib import messages
from .Login import jlogin
from .Jira_Constants import ProductionServer, TestServer, StageServer
from .Clone_PR import Issue
jira = None
issue = Issue()
class ClonePrLogin(models.Model):
username = models.CharField(max_length=30)
password = models.CharField(max_length=30)
#classmethod
def LoginToJira(cls):
global jira
jira = jlogin(ProductionServer, cls.username, cls.password)
class PrEntry(models.Model):
prToClone = models.CharField(max_length=20)
#classmethod
def GetIssueAndMeta(cls):
global issue
issue.initialize(jira, cls.prToClone)
class ClonePr(models.Model):
issueKey = issue.issueKey
issue.GetFoundOnSelectItems()
foundOnList = issue.foundOnSelectItems
foundOn = autocomplete.Select2ListChoiceField(choice_list=foundOnList)
Clone_PR.py
from typing import List, Dict
class Issue():
def __init__(self):
self.jiraInst = None
self.issueKey = ''
self._issue = None
self._issueEdit = None
# self._issueEdit = Dict[str, Dict[str, Dict[str, List[Dict[str, str]]]]]
self.foundOnSelectItems = []
def initialize(self, jira, prKey):
self.jiraInst = jira
self.issueKey = prKey
self._issue = jira.issue(prKey)
self._issueEdit = jira.editmeta(prKey)
def GetFoundOnSelectItems(self):
values = self._issueEdit["fields"]["customfield_13940"]["allowedValues"]
items = [x["value"] for x in values]
self.foundOnSelectItems = items
In Django, running makemigrations will load all the modules. You said you're familiar with Python so you should know that the declarations inside the class:
class ClonePr(models.Model):
issueKey = issue.issueKey
issue.GetFoundOnSelectItems()
foundOnList = issue.foundOnSelectItems
foundOn = autocomplete.Select2ListChoiceField(choice_list=foundOnList)
will run when the modules load. You're calling issue.GetFoundOnSelectItems() at that time, which in turn calls values = self._issueEdit["fields"]["customfield_13940"]["allowedValues"], except that self._issueEdit = None upon the initiation of instance Issue above with this line: issue = Issue().
I highly recommend you spend some time to become more familiar with how Django starts up an app. The module-level and nested model declarations here are both antipatterns and may cause data issues in the future.

How do I detect proper nouns in the Google NLP API?

Apologies if this isn't totally clear - I'm a Python copy-the-code-and-try-to-make-it-work developer.
I'm using the Google NLP API in Python 2.7.
When I use analyze_entities(), I can get and print the name, entity type and salience.
Mentions is supposed to contain the noun type: PROPER or COMMON, per this page:
https://cloud.google.com/natural-language/docs/reference/rest/v1beta1/Entity#EntityMention
I can't get mention type from the returned dictionary.
Here's my hideous code:
def entities_text(text, client):
"""Detects entities in the text."""
language_client = client
# Instantiates a plain text document.
document = language_client.document_from_text(text)
# Detects entities in the document. You can also analyze HTML with:
# document.doc_type == language.Document.HTML
entities = document.analyze_entities()
return entities
articles = os.listdir('articles')
for f in articles:
language_client = language.Client()
fname = "articles/" + f
thisfile = open(fname,'r')
content = thisfile.read()
entities = entities_text(content, language_client)
for e in entities:
name = e.name.strip()
type = e.entity_type.strip()
if e.name.strip()[0].isupper() and len(e.name.strip()) > 2:
print name, type, e.salience, e.mentions
That returns this:
RELATED OTHER 0.0019081507 [u'RELATED']
Zoe 3 PERSON 0.0016676666 [u'Zoe 3']
Where the value in [] is the mentions.
If I try to get mentions.type, I get an attribute not found error.
I'd appreciate any input.
1) Do not call the "AnalyzeEntities" function, but call the "AnnotateText" one instead.
2) Check for "Proper". Examine its value, it should be "PROPER" and not "PROPER_UNKNOWN" nor "NOT_PROPER".

Plone: Change default schemata of a content type

I am using the Products.FacultyStaffDirectory product to manage contacts in my website.
One of the content types, i.e. FacultyStaffDirectory, has the description field in the 'categorization' tab when you are in edit mode. Now I need to remove the description field and put it in the default tab.
To do this, I'm using archetypes.schemaextender product to edit the field. In particular, I am using ISchemaModifier.
I have implemented the code but the field is still being shown in the categorization tab. May be something I missed. Below is the code:
This is the class that contains the description field that I want to modify:
# -*- coding: utf-8 -*-
__author__ = """WebLion <support#weblion.psu.edu>"""
__docformat__ = 'plaintext'
from AccessControl import ClassSecurityInfo
from Products.Archetypes.atapi import *
from zope.interface import implements
from Products.FacultyStaffDirectory.interfaces.facultystaffdirectory import IFacultyStaffDirectory
from Products.FacultyStaffDirectory.config import *
from Products.CMFCore.permissions import View, ManageUsers
from Products.CMFCore.utils import getToolByName
from Products.ATContentTypes.content.base import ATCTContent
from Products.ATContentTypes.content.schemata import ATContentTypeSchema, finalizeATCTSchema
from Products.membrane.at.interfaces import IPropertiesProvider
from Products.membrane.utils import getFilteredValidRolesForPortal
from Acquisition import aq_inner, aq_parent
from Products.FacultyStaffDirectory import FSDMessageFactory as _
schema = ATContentTypeSchema.copy() + Schema((
LinesField('roles_',
accessor='getRoles',
mutator='setRoles',
edit_accessor='getRawRoles',
vocabulary='getRoleSet',
default = ['Member'],
multiValued=1,
write_permission=ManageUsers,
widget=MultiSelectionWidget(
label=_(u"FacultyStaffDirectory_label_FacultyStaffDirectoryRoles", default=u"Roles"),
description=_(u"FacultyStaffDirectory_description_FacultyStaffDirectoryRoles", default=u"The roles all people in this directory will be granted site-wide"),
i18n_domain="FacultyStaffDirectory",
),
),
IntegerField('personClassificationViewThumbnailWidth',
accessor='getClassificationViewThumbnailWidth',
mutator='setClassificationViewThumbnailWidth',
schemata='Display',
default=100,
write_permission=ManageUsers,
widget=IntegerWidget(
label=_(u"FacultyStaffDirectory_label_personClassificationViewThumbnailWidth", default=u"Width for thumbnails in classification view"),
description=_(u"FacultyStaffDirectory_description_personClassificationViewThumbnailWidth", default=u"Show all person thumbnails with a fixed width (in pixels) within the classification view"),
i18n_domain="FacultyStaffDirectory",
),
),
))
FacultyStaffDirectory_schema = OrderedBaseFolderSchema.copy() + schema.copy() # + on Schemas does only a shallow copy
finalizeATCTSchema(FacultyStaffDirectory_schema, folderish=True)
class FacultyStaffDirectory(OrderedBaseFolder, ATCTContent):
"""
"""
security = ClassSecurityInfo()
implements(IFacultyStaffDirectory, IPropertiesProvider)
meta_type = portal_type = 'FSDFacultyStaffDirectory'
# Make this permission show up on every containery object in the Zope instance. This is a Good Thing, because it easy to factor up permissions. The Zope Developer's Guide says to put this here, not in the install procedure (http://www.zope.org/Documentation/Books/ZDG/current/Security.stx). This is because it isn't "sticky", in the sense of being persisted through the ZODB. Thus, it has to run every time Zope starts up. Thus, when you uninstall the product, the permission doesn't stop showing up, but when you actually remove it from the Products folder, it does.
security.setPermissionDefault('FacultyStaffDirectory: Add or Remove People', ['Manager', 'Owner'])
# moved schema setting after finalizeATCTSchema, so the order of the fieldsets
# is preserved. Also after updateActions is called since it seems to overwrite the schema changes.
# Move the description field, but not in Plone 2.5 since it's already in the metadata tab. Although,
# decription and relateditems are occasionally showing up in the "default" schemata. Move them
# to "metadata" just to be safe.
if 'categorization' in FacultyStaffDirectory_schema.getSchemataNames():
FacultyStaffDirectory_schema.changeSchemataForField('description', 'categorization')
else:
FacultyStaffDirectory_schema.changeSchemataForField('description', 'metadata')
FacultyStaffDirectory_schema.changeSchemataForField('relatedItems', 'metadata')
_at_rename_after_creation = True
schema = FacultyStaffDirectory_schema
# Methods
security.declarePrivate('at_post_create_script')
def at_post_create_script(self):
"""Actions to perform after a FacultyStaffDirectory is added to a Plone site"""
# Create some default contents
# Create some base classifications
self.invokeFactory('FSDClassification', id='faculty', title='Faculty')
self.invokeFactory('FSDClassification', id='staff', title='Staff')
self.invokeFactory('FSDClassification', id='grad-students', title='Graduate Students')
# Create a committees folder
self.invokeFactory('FSDCommitteesFolder', id='committees', title='Committees')
# Create a specialties folder
self.invokeFactory('FSDSpecialtiesFolder', id='specialties', title='Specialties')
security.declareProtected(View, 'getDirectoryRoot')
def getDirectoryRoot(self):
"""Return the current FSD object through acquisition."""
return self
security.declareProtected(View, 'getClassifications')
def getClassifications(self):
"""Return the classifications (in brains form) within this FacultyStaffDirectory."""
portal_catalog = getToolByName(self, 'portal_catalog')
return portal_catalog(path='/'.join(self.getPhysicalPath()), portal_type='FSDClassification', depth=1, sort_on='getObjPositionInParent')
security.declareProtected(View, 'getSpecialtiesFolder')
def getSpecialtiesFolder(self):
"""Return a random SpecialtiesFolder contained in this FacultyStaffDirectory.
If none exists, return None."""
specialtiesFolders = self.getFolderContents({'portal_type': 'FSDSpecialtiesFolder'})
if specialtiesFolders:
return specialtiesFolders[0].getObject()
else:
return None
security.declareProtected(View, 'getPeople')
def getPeople(self):
"""Return a list of people contained within this FacultyStaffDirectory."""
portal_catalog = getToolByName(self, 'portal_catalog')
results = portal_catalog(path='/'.join(self.getPhysicalPath()), portal_type='FSDPerson', depth=1)
return [brain.getObject() for brain in results]
security.declareProtected(View, 'getSortedPeople')
def getSortedPeople(self):
""" Return a list of people, sorted by SortableName
"""
people = self.getPeople()
return sorted(people, cmp=lambda x,y: cmp(x.getSortableName(), y.getSortableName()))
security.declareProtected(View, 'getDepartments')
def getDepartments(self):
"""Return a list of FSDDepartments contained within this site."""
portal_catalog = getToolByName(self, 'portal_catalog')
results = portal_catalog(portal_type='FSDDepartment')
return [brain.getObject() for brain in results]
security.declareProtected(View, 'getAddableInterfaceSubscribers')
def getAddableInterfaceSubscribers():
"""Return a list of (names of) content types marked as addable using the
IFacultyStaffDirectoryAddable interface."""
return [type['name'] for type in listTypes() if IFacultyStaffDirectoryAddable.implementedBy(type['klass'])]
security.declarePrivate('getRoleSet')
def getRoleSet(self):
"""Get the roles vocabulary to use."""
portal_roles = getFilteredValidRolesForPortal(self)
allowed_roles = [r for r in portal_roles if r not in INVALID_ROLES]
return allowed_roles
#
# Validators
#
security.declarePrivate('validate_id')
def validate_id(self, value):
"""Ensure the id is unique, also among groups globally."""
if value != self.getId():
parent = aq_parent(aq_inner(self))
if value in parent.objectIds():
return _(u"An object with id '%s' already exists in this folder") % value
groups = getToolByName(self, 'portal_groups')
if groups.getGroupById(value) is not None:
return _(u"A group with id '%s' already exists in the portal") % value
registerType(FacultyStaffDirectory, PROJECTNAME)
Below is the portion of code from the class that I have implemented to change the description field's schemata:
from Products.Archetypes.public import ImageField, ImageWidget, StringField, StringWidget, SelectionWidget, TextField, RichWidget
from Products.FacultyStaffDirectory.interfaces.facultystaffdirectory import IFacultyStaffDirectory
from archetypes.schemaextender.field import ExtensionField
from archetypes.schemaextender.interfaces import ISchemaModifier, ISchemaExtender, IBrowserLayerAwareExtender
from apkn.templates.interfaces import ITemplatesLayer
from zope.component import adapts
from zope.interface import implements
class _ExtensionImageField(ExtensionField, ImageField): pass
class _ExtensionStringField(ExtensionField, StringField): pass
class _ExtensionTextField(ExtensionField, TextField): pass
class FacultyStaffDirectoryExtender(object):
"""
Adapter to add description field to a FacultyStaffDirectory.
"""
adapts(IFacultyStaffDirectory)
implements(ISchemaModifier, IBrowserLayerAwareExtender)
layer = ITemplatesLayer
fields = [
]
def __init__(self, context):
self.context = context
def getFields(self):
return self.fields
def fiddle(self, schema):
desc_field = schema['description'].copy()
desc_field.schemata = "default"
schema['description'] = desc_field
And here is the code from my configure.zcml:
<adapter
name="apkn-FacultyStaffDirectoryExtender"
factory="apkn.templates.extender.FacultyStaffDirectoryExtender"
provides="archetypes.schemaextender.interfaces.ISchemaModifier"
/>
Is there something that I'm missing in this?
def fiddle(self, schema):
schema['description'].schemata = 'default'
should be sufficient. The copy() operation does not make any sense here.
In order to check if the fiddle() method is actually used: use pdb or add debug print statements.
It turns out that there was nothing wrong with the code.
I got it to work by doing the following:
Restarted the plone website
Re-installed the product
Clear and rebuilt my catalog
This got it working somehow.

Using memcache to store obj's in google app engine

I'm trying to use memcache to cache data retrevied from the datastore. Storing stings works fine. But can't one store an object? I get an error "TypeError: 'str' object is not callable" when trying to store with this:
pageData = StandardPage(page)
memcache.add(memcacheid, pageData, 60)
I've read in the documentation that it requires "The value type can be any value supported by the Python pickle module for serializing values." But don't really understand what that is. Or how to convert pageData to it.
Any ideas?
..fredrik
EDIT:
I was a bit unclear. PageType is an class that amongst other thing get data from the datastore and manipulate it. The class looks like this:
class PageType():
def __init__(self, page):
self.pageData = PageData(data.Modules.gql('WHERE page = :page', page = page.key()).fetch(100))
self.modules = []
def renderEditPage(self):
self.addModules()
return self.modules
class StandardPage(PageTypes.PageType):
templateName = 'Altan StandardPage'
templateFile = 'templates/page.html'
def __init__(self, page):
self.pageData = PageTypes.PageData(data.Modules.gql('WHERE page = :page', page = page.key()).fetch(100))
self.modules = []
self.childModules = []
for child in page.childPages:
self.childModules.append(PageTypes.PageData(data.Modules.gql('WHERE page = :page', page = child.key()).fetch(100)))
def addModules(self):
self.modules.append(PageTypes.getStandardHeading(self, 'MainHeading'))
self.modules.append(PageTypes.getStandardTextBox(self, 'FirstTextBox'))
self.modules.append(PageTypes.getStandardTextBox(self, 'SecondTextBox'))
self.modules.append(PageTypes.getStandardHeading(self, 'ListHeading'))
self.modules.append(PageTypes.getStandardTextBox(self, 'ListTextBox'))
self.modules.append(PageTypes.getDynamicModules(self))
You can use db.model_to_protobuf to turn your object into something that can be stored in memcache. Similarly, db.model_from_protobuf will get your object back.
Resource:
Datastore Functions

Categories