I'm writing a package that imports audio files, processes them, plots them etc., for research purposes.
At each stage of the pipeline, settings are pulled from a settings module as shown below.
I want to be able to update a global setting like MODEL_NAME and have it update in any dicts containing it too.
settings.py
MODEL_NAME = 'Test1'
DAT_DIR = 'dir1/dir2/'
PROCESSING = {
    "key1": {
        "subkey2": 0,
        "subkey3": 1
    },
    "key2": {
        "subkey3": MODEL_NAME
    }
}
run.py
import settings as s

wavs = import_wavs(s.DAT_DIR)
processed_wavs = proc_wavs(wavs, s.PROCESSING)
I would like some of the settings dicts to contain MODEL_NAME, which works fine. The problem arises when I want to change MODEL_NAME at runtime. So if I do:
import settings as s

wavs = import_wavs(s.DAT_DIR)

s.MODEL_NAME = 'test1'
proc_wavs1 = proc_wavs(wavs, s.PROCESSING)

s.MODEL_NAME = 'test2'
proc_wavs2 = proc_wavs(wavs, s.PROCESSING)
But obviously both references to s.PROCESSING will contain the MODEL_NAME originally assigned in the settings file.
What is the best way to have it update?
Possible solutions I've thought of:
Store the setting as a mutable type in settings.py (e.g. MODEL_NAME = ['Test1']), then update it in place, e.g.:
s.MODEL_NAME[0] = "test1"
# do processing things
s.MODEL_NAME[0] = "test2"
Define each setting category as a function instead, so it is rerun on each call, e.g.:
MODEL_NAME = 'test1'

def PROCESSING():
    return {
        "key1": {
            "subkey2": 0,
            "subkey3": 1
        },
        "key2": {
            "subkey3": MODEL_NAME
        }
    }
Then
s.MODEL_NAME = 'test1'
proc_wavs1 = proc_wavs(wavs, s.PROCESSING())

s.MODEL_NAME = 'test2'
proc_wavs2 = proc_wavs(wavs, s.PROCESSING())
I thought this would work great, but then it's very difficult to change individual entries at runtime, e.g. if I wanted to update the value of subkey2 and run something else (one workaround is sketched below).
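One variation I considered (just a sketch, I'm not sure it's clean) is to let the function accept overrides, so individual entries can still be changed per call:
def PROCESSING(**overrides):
    # rebuilt on each call, so MODEL_NAME is read at call time
    d = {
        "key1": {"subkey2": 0, "subkey3": 1},
        "key2": {"subkey3": MODEL_NAME},
    }
    d.update(overrides)  # note: a shallow merge of top-level keys only
    return d

# e.g. override key1 for one run:
# proc_wavs(wavs, s.PROCESSING(key1={"subkey2": 5, "subkey3": 1}))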
Other thoughts: maybe a class with an update method, or something similar. Does anyone have any better ideas?
You want to configure generic and model-specific settings, structured in dictionaries, for functions that perform wave analysis.
Start by defining a settings class, like:
class Settings:
    data_directory = 'path/to/waves'

    def __init__(self, model):
        self.parameters = {
            "key1": {
                "subkey1": 0,
                "subkey2": 0
            },
            "key2": {
                "subkey1": model
            }
        }

# create a new instance based on model1
s1 = Settings('model1')
# assign values to specific keys
s1.parameters["key1"]["subkey1"] = 3.1415926
s1.parameters["key1"]["subkey2"] = 42

# another instance based on model2
s2 = Settings('model2')
s2.parameters["key1"]["subkey1"] = 360
s2.parameters["key1"]["subkey2"] = 1.618033989

# load the audio
wavs = openWaves(Settings.data_directory)

# process with the given parameters
results1 = processWaves(wavs, s1)
results2 = processWaves(wavs, s2)
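If you also want to switch the model on an existing instance (the "class with an update method" idea from the question), a small addition along these lines could work; this is only a sketch, and set_model is a name I've made up:
class Settings:
    data_directory = 'path/to/waves'

    def __init__(self, model):
        self.parameters = {
            "key1": {"subkey1": 0, "subkey2": 0},
            "key2": {"subkey1": model},
        }

    def set_model(self, model):
        # update every entry that depends on the model name
        self.parameters["key2"]["subkey1"] = model

s = Settings('model1')
s.set_model('model2')  # reuse the same instance with a new model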
Problem
How to group fields together when serialising a flat-structured SQLAlchemy object with Marshmallow without changing the flat data structure in the background?
Example
Suppose a SQLAlchemy model in a Flask app like this:
from app import db  # db using SQLAlchemy

class Data(db.Model):
    x = db.Column(db.Float(), nullable=False)
    y = db.Column(db.Float(), nullable=False)
    d = db.Column(db.Float())
How can I serialise this object so that x and y are nested into coordinates, while maintaining a flat data structure in the background (the model)? The output should look something like this:
{
    "coordinates": {
        "x": 10.56,
        "y": 1
    },
    "d": 42.0
}
The problem arises specifically because I use the Data schema with the many=True option. The initialisation is roughly:
schema_data = DataSchema()
schema_datas = DataSchema(many=True)
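For context, the dump call then looks roughly like this (the query itself is hypothetical):
# hypothetical usage: dump a list of Data rows with the many=True schema
datas = Data.query.all()
result = schema_datas.dump(datas)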
Solution Candidates
So this is what I've tried so far, but none of them seemed to work.
Creating a second Schema
Adding a second schema and modifying the Data schema from before yields:
from marshmallow import Schema, fields

class CoordinatesSchema(Schema):
    x = fields.Float(required=True)
    y = fields.Float(required=True)

class DataSchema(Schema):
    coordinates = fields.Nested(CoordinatesSchema, required=True)
    d = fields.Float()
Having that in place raises the problem of having to go through every Data item and manually building the nested coordinates. My data comes from a SQLAlchemy query returning a list of Data objects, which I would like to keep dumping directly with schema_datas.
Using fields.Dict
Since Marshmallow's fields module offers a dictionary field, I tried that as well.
from marshmallow import Schema, fields

class DataSchema(Schema):
    coordinates = fields.Dict(keys=fields.String(),
                              values=fields.Float(),
                              required=True,
                              default={
                                  "x": Data.x,
                                  "y": Data.y
                              })
    d = fields.Float()
Doesn't seem to work either, because Marshmallow can't find Data.x and Data.y automatically when using schema_datas.dump().
Using Self-Nesting
The most logical solution path would be to self-nest. But (from what I understood reading the documentation) self-nesting only refers to nesting one or more other instances within the object. I want to nest the same instance.
from marshmallow import Schema, fields

class DataSchema(Schema):
    x = fields.Float(required=True, load_only=True)
    y = fields.Float(required=True, load_only=True)
    coordinates = fields.Nested(
        lambda: DataSchema(only=('x', 'y')),
        dump_only=True)
But unfortunately this also didn't work.
Using Decorator @pre_dump
Inspired by this issue on Marshmallow's Github page, I tried to use the @pre_dump decorator to achieve the desired outcome, but failed again.
from marshmallow import Schema, fields, pre_dump

class CoordinatesSchema(Schema):
    x = fields.Float(required=True)
    y = fields.Float(required=True)

class DataSchema(Schema):
    coordinates = fields.Nested(CoordinatesSchema, required=True)
    d = fields.Float()

    @pre_dump
    def group_coordinates(self, data, many, **kwargs):
        return {
            "coordinates": {
                "x": data.x,
                "y": data.y
            },
            "d": data.d
        }
But I can't figure out how to do it properly...
So my question is, what am I doing wrong and how can I solve this problem?
I have a Django project. It has a function in views.py that processes data from the inputs and produces output for another function. However, the processing time of that function is quite long, and I want the processed output to be shown immediately. How can I achieve that? The processing() function below does the processing, and its output user_entries is shown by results(), which follows it.
def processing(request):
    n = []
    for topic in Topic.objects.filter(owner=request.user).order_by("date_added"):
        entries = topic.entries.all()
        m = []
        for p in entries:
            q = p.text
            m.append(q)
        n.append(m)
    list = []
    start(list, n)
    request.session['user_entries'] = list
    return request.session['user_entries']

def results(request):
    data = processing(request)
    return render(request, "project/results.html", {"datas": data})
Inside the start() function called by processing(), there is a part, list.append(), that adds new output to list. But it seems the newly appended list cannot be transferred to show instant results in project/results.html?
What you're doing could likely be done a lot more simply.
def results(request):
    return render(
        request,
        "project/results.html",
        {
            "user_entries": Entry.objects.filter(topic__owner=request.user),
            "start_values": "...",  # Whatever start is appending...
        },
    )
Since you have a foreign key from Topic to User, you could also use request.user.topic_set.all() to get the current user's topics.
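For instance (a sketch, assuming the default related name for the owner foreign key):
# equivalent to Topic.objects.filter(owner=request.user)
topics = request.user.topic_set.all()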
Or, if you actually do need those lists nested...
# ...
"user_entries": (
    topic.entries.all() for topic in
    Topic.objects.filter(owner=request.user)
),
# ...
Just based on what you're showing us, it seems like your ordering -- for both Topic and Entry -- should probably have a sensible default set in, e.g., Topic.Meta.ordering, which in this case would probably look like this:
class Topic(models.Model):
    # ...
    class Meta:
        ordering = ("date_added",)
    # ...
That way, in this and most other cases, you would not have to apply .order_by(...) manually.
I have an enums Python file (enums.py) which has:
class ClassificationType(object):
    CLASSIFICATION_TYPE_UNSPECIFIED = 0
    MULTICLASS = 1
    MULTILABEL = 2
I am writing another Python file to get the value of a variable declared inside the enums class.
import enums

def dataset(model_typ):
    dataset_spec = {
        "classification": enums.ClassificationType.MULTICLASS
    }
As per the above code, I am able to get the value of MULTICLASS as 1.
Now I need to pass MULTICLASS/MULTILABEL/CLASSIFICATION_TYPE_UNSPECIFIED as the argument (model_typ) to dataset_spec.
How do I do that?
Thanks in advance.
NOTE: I don't want to change the enums.py file.
class ClassificationType(object):
    CLASSIFICATION_TYPE_UNSPECIFIED = 0
    MULTICLASS = 1
    MULTILABEL = 2

def dataset(model_typ_multiclass):
    dataset_spec = {
        "classification": model_typ_multiclass
    }

obj = ClassificationType()
model_typ_multiclass = obj.MULTICLASS
dataset(model_typ_multiclass)
Try using the below code in the other file (not the enums.py file):
from enums import ClassificationType as ct
import random

def dataset(model_typ):
    dataset_spec = {
        "classification": model_typ
    }
    print(dataset_spec)

dataset(random.choice([ct.MULTILABEL, ct.MULTICLASS]))
Output (will vary between runs, since the model type is chosen at random):
{'classification': 2}
I simply changed the value of dataset_spec's "classification" key to the argument model_typ. Then, at the end of the code, I call the dataset function, passing in e.g. ct.MULTICLASS to get the MULTICLASS value from enums.py.
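Alternatively, if the model type arrives as a string, you could look it up by name with getattr; a small sketch, again leaving enums.py untouched:
from enums import ClassificationType

def dataset(model_type_name):
    # look up the class attribute by name, e.g. "MULTICLASS" -> 1
    return {"classification": getattr(ClassificationType, model_type_name)}

print(dataset("MULTICLASS"))  # {'classification': 1}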
I am trying to use map/reduce to find duplicated data in CouchDB.
the map function is like this:
function(doc) {
  if (doc.coordinates) {
    emit({
      twitter_id: doc.id_str,
      text: doc.text,
      coordinates: doc.coordinates
    }, 1);
  }
}
and the reduce function is:
function(keys, values, rereduce) {
  return sum(values);
}
I want the values summed per distinct key, but it just adds everything together and I get this result:
<Row key=None, value=1035>
Is this a problem with group? How can I set it to true?
Assuming you're using the couchdb package from PyPI, you'll need to pass the query options you require to the view method as keyword arguments.
for example:
import couchdb

# the design doc and view name of the view you want to use
ddoc = "my_design_document"
view_name = "my_view"

# your server
server = couchdb.Server("http://localhost:5984")
db = server["aCouchDatabase"]

# naming convention when passing a ddoc and view to the view method
view_string = ddoc + "/" + view_name

# query options
view_options = {"reduce": True,
                "group": True,
                "group_level": 2}

# call the view, unpacking the options as keyword arguments
results = db.view(view_string, **view_options)
for row in results:
    # do something with each grouped row
    pass
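Since you're hunting duplicates, the rows whose summed value is greater than 1 are the keys that occur more than once; a small sketch building on the code above:
# collect keys that appear more than once (i.e. duplicated documents)
duplicates = [row.key for row in db.view(view_string, **view_options)
              if row.value > 1]
print(duplicates)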
I'm a bit new to Python dev -- I'm creating a larger project for some web scraping. I want to approach this as "Pythonically" as possible, and would appreciate some help with the project structure. Here's how I'm doing it now:
Basically, I have a base class for an object whose purpose is to go to a website and parse some specific data on it into its own array, jobs[]
minion.py
class minion:
    # Empty getJobs() function to be defined per data source, pre-instantiation
    def getJobs(self):
        pass

    # Constructor for a minion that may require site authorization
    # Ex: minCity1 = minion('Some city', 'http://portal.com/somewhere', 'user', 'password')
    # or  minCity2 = minion('Some city', 'http://portal.com/somewhere')
    def __init__(self, title, URL, user='', password=''):
        self.title = title
        self.URL = URL
        self.user = user
        self.password = password
        self.jobs = []
        if user == '' and password == '':
            self.reqAuth = 0
        else:
            self.reqAuth = 1

    def displayjobs(self):
        for j in self.jobs:
            j.display()
I'm going to have about 100 different data sources. The way I'm doing it now is to just create a separate module for each "Minion", which defines (and binds) a more tailored getJobs() function for that object
Example: minCity1.py
from minion import minion
from BeautifulSoup import BeautifulSoup
import urllib2
from job import job

# MINION CONFIG
minTitle = 'Some city'
minURL = 'http://www.somewebpage.gov/'

# Here we define a function that will be bound to this object's getJobs function
def getJobs(self):
    page = urllib2.urlopen(self.URL)
    soup = BeautifulSoup(page)
    # For each row
    for tr in soup.findAll('tr'):
        tJob = job()
        span = tr.findAll('span', {'class': 'content'})
        # If the row has 5 spans, pull data from spans 2 and 3 ([1] and [2])
        if len(span) == 5:
            tJob.title = span[1].a.renderContents()
            tJob.client = 'Some City'
            tJob.source = minURL
            tJob.due = span[2].div.renderContents().replace('<br />', '')
            self.jobs.append(tJob)

# Don't forget to bind the function to the object!
minion.getJobs = getJobs

# Instantiate the object
mCity1 = minion(minTitle, minURL)
I also have a separate module which simply contains a list of all the instantiated minion objects (which I have to update each time I add one):
minions.py
from minCity1 import mCity1
from minCity2 import mCity2
from minCity3 import mCity3
from minCity4 import mCity4

minionList = [mCity1,
              mCity2,
              mCity3,
              mCity4]
main.py references minionList for all of its activities for manipulating the aggregated data.
This seems a bit chaotic to me, and was hoping someone might be able to outline a more Pythonic approach.
Thank you, and sorry for the long post!
Instead of creating functions and assigning them to objects (or whatever minion is, I'm not really sure), you should definitely use classes. Then you'll have one class for each of your data sources.
If you want, you can even have these classes inherit from a common base class, but that isn't absolutely necessary.
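A minimal sketch of that shape (the names are illustrative, loosely based on the question's minion):
class Minion:
    """Base class: shared state and behaviour for all data sources."""
    def __init__(self, title, url, user='', password=''):
        self.title = title
        self.url = url
        self.user = user
        self.password = password
        self.jobs = []

    def get_jobs(self):
        raise NotImplementedError  # each source implements its own parsing

class SomeCityMinion(Minion):
    def get_jobs(self):
        # site-specific fetching and parsing goes here
        pass

minionList = [SomeCityMinion('Some city', 'http://www.somewebpage.gov/')]
for m in minionList:
    m.get_jobs()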