I have a database of possibly infinitely many products.
Every single product has a Title and tags representing it.
I am trying to build a system where somebody can search for a product primarily by its Title, with the tags used to help rank the Top 100 results.
Some people have suggested Django (Taggit) or just forming an API.
I am not sure how relevant that is and how needed it is for something simple like the task mentioned above.
I am also looking for something efficient.
Problem: I’m a nurse, and part of my job is to pull up a list of “unsigned services”. Then of course, take these charts, and send them to the right person.
The person before me did not do this, leaving me THOUSANDS of charts to pull up by patient name and DOB, select the right document, and send to the right person.
I have figured out how to use Selenium with Python to automate logging in, using input to send keys to search for the correct patient, and even to pull up the correct document that needs to be signed.
How do I have the program do this for every chart? How do I have Python work down the list of names and DOBs without my having to put them in manually?
Anything I look for on my own is just examples of applying a basic function to a list of numbers and that isn’t my goal.
Thanks for your help!
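One way to approach the loop described above is to keep the names and DOBs in a CSV file and call the existing Selenium steps once per row. This is only a sketch: the column names name and dob, and the process_chart function (standing in for the Selenium steps that already work for one patient), are assumptions and not part of the original post.

```python
import csv
import io


def load_patients(csv_file):
    """Yield (name, dob) pairs from a CSV file object with 'name' and 'dob' columns."""
    for row in csv.DictReader(csv_file):
        yield row["name"].strip(), row["dob"].strip()


def process_all(patients, process_chart):
    """Call process_chart(name, dob) for every patient.

    process_chart is a hypothetical function wrapping the Selenium steps
    (search the patient, open the document, send it on). Failures are
    collected and returned so one bad chart does not stop the whole run.
    """
    failures = []
    for name, dob in patients:
        try:
            process_chart(name, dob)
        except Exception as exc:
            failures.append((name, dob, str(exc)))
    return failures


# Example with an in-memory CSV and made-up data; a real run would use
# open("patients.csv", newline="") and the real Selenium function.
sample = io.StringIO("name,dob\nJane Doe,01/02/1980\n")
failed = process_all(load_patients(sample), lambda name, dob: None)
```

Collecting failures in a list, rather than letting the first exception end the script, makes it possible to rerun only the charts that did not go through.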
For my work I need to create a Python program to download all the results for "awards" from SBIR automatically.
There are as of now, 171616 results.
I have two possible options. I can download 1,000 at a time but I need to verify that I am not a robot with the reCAPTCHA, therefore I can not automate the download.
Or I could use their API, which would be great! But it only returns 100 results when searching for everything available. Is there a way I could iterate through chunks and then compile it all into one big JSON file?
This is the documentation.
This is where I say file>save as>filename.json
Any help/advice would really help me out.
Hmm, one way to go is to cycle through the combinations of parameters that you know. E.g., the API accepts the parameters 'year' and 'company', among others. You can start with the earliest year in which an award was given, say 1990, and cycle through the years up to the present.
https://www.sbir.gov/api/awards.json?year=2010
https://www.sbir.gov/api/awards.json?year=2011
https://www.sbir.gov/api/awards.json?year=2012
This way you'll get up to 100 awards per year. That's better; however, you mentioned that there are 171616 possible results, meaning more than 100 per year, so it won't get all of them. You can use another parameter, 'company', in combination.
https://www.sbir.gov/api/awards.json?year=2010&company=luna
https://www.sbir.gov/api/awards.json?year=2011&company=luna
https://www.sbir.gov/api/awards.json?year=2010&company=other_company
https://www.sbir.gov/api/awards.json?year=2011&company=other_company
Now you are getting up to 100 results per company per year. That will give you far more results. You can get the list of companies from another endpoint they provide, which doesn't seem to have a limit on the results displayed - https://www.sbir.gov/api/firm.json . Watch out, though: the JSON that comes out is absolutely massive and may freeze your laptop. You can use the values from that JSON for the 'company' parameter and cycle through those.
Of course, all of that is a workaround and still doesn't guarantee that you get ALL of the results (although it might get them all). My first action would be to contact the website admins and tell them about your problem. A common thing to do for APIs that return a massive list of results is to provide a page parameter in the URL - https://www.sbir.gov/api/awards.json?page=2 - so that you can cycle through pages of results. Maybe you can persuade them to do that.
I wish they had better documentation. It seems we can do pagination via the 'start' parameter:
https://www.sbir.gov/api/awards.json?agency=DOE&start=100
https://www.sbir.gov/api/awards.json?agency=DOE&start=200
https://www.sbir.gov/api/awards.json?agency=DOE&start=300
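A loop over 'start' could then collect everything into one list and write a single JSON file. The sketch below makes several assumptions that are not confirmed by the documentation: that start=0 returns the first page, that each response is a JSON list of at most 100 records, and that a page past the end comes back empty. The function names are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "https://www.sbir.gov/api/awards.json"


def fetch_page_http(start):
    """Fetch one page of awards beginning at offset `start` (assumed API behaviour)."""
    with urllib.request.urlopen(f"{BASE_URL}?start={start}") as resp:
        return json.load(resp)


def fetch_all(fetch_page, page_size=100):
    """Call fetch_page(start) with increasing offsets until the results run out."""
    results = []
    start = 0
    while True:
        page = fetch_page(start)
        if not page:
            break  # assumed: an empty page means we are past the last record
        results.extend(page)
        if len(page) < page_size:
            break  # a short page is taken to be the last one
        start += page_size
    return results


def save_json(records, path):
    """Write all collected records to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


# Intended use (not executed here because it hits the network):
# awards = fetch_all(fetch_page_http)
# save_json(awards, "awards.json")
```

Passing the page-fetching function in as an argument keeps the loop testable without network access, and makes it easy to add a delay or retries between requests.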
EDIT:
I have added the [MVC] and [design-patterns] tags to expand the audience for this question, as it is more of a generic programming question than something that has directly to do with Python or SQLAlchemy. It applies to all applications with business logic and an ORM.
The basic question is if it is better to keep business logic in separate modules, or to add it to the classes that our ORM provides:
We have a Flask/SQLAlchemy project for which we have to set up a structure to work in. There are two valid opinions on how to set things up, and before the project really starts taking off we would like to make up our minds on one of them.
If any of you could give us some insights on which of the two would make more sense and why, and what the advantages/disadvantages would be, it would be greatly appreciated.
My example is an HTML letter that needs to be sent in bulk and/or displayed to a single user. The letter can have sections that display an invoice and/or a list of articles for the user it is addressed to.
Method 1:
Split the code into 3 tiers - 1st tier: web interface, 2nd tier: processing of the letter, 3rd tier: the models from the ORM (sqlalchemy).
The website will call a server side method in a class in the 2nd tier, the 2nd tier will loop through the users that need to get this letter and it will have internal methods that generate the HTML and replace some generic fields in the letter, with information for the current user. It also has internal methods to generate an invoice or a list of articles to be placed in the letter.
In this method, the 3rd tier is only used for fetching data from the database and perhaps some database-related logic, like generating a full name from a user's first name and last name. The 2nd tier performs most of the work.
Method 2:
Split the code into the same three tiers, but only perform the loop through the collection of users in the 2nd tier.
The methods for generating HTML, invoices and lists of articles are all added as methods to the model definitions in tier 3 that the ORM provides. The 2nd tier performs the loop, but the actual functionality is enclosed in the model classes in the 3rd tier.
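To make the difference concrete, here is a minimal sketch of the two layouts using plain classes in place of the real SQLAlchemy models. The class names, the $name placeholder and the function names are all invented for illustration; the real letter generation is certainly more involved.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class User:
    first_name: str
    last_name: str

    def full_name(self):
        # Database-related convenience logic that both methods keep on the model.
        return f"{self.first_name} {self.last_name}"


@dataclass
class Letter:
    body: str  # HTML text containing a $name placeholder

    # Method 2: the rendering logic lives on the model class itself.
    def render(self, user):
        return Template(self.body).substitute(name=user.full_name())


# Method 1: the rendering logic lives in a separate 2nd-tier module
# and only reads data from the model.
def render_letter(letter, user):
    return Template(letter.body).substitute(name=user.full_name())


# In both methods the loop over users stays in the 2nd tier.
def send_bulk(letter, users, render=render_letter):
    return [render(letter, user) for user in users]
```

With Method 1 the caller writes send_bulk(letter, users); with Method 2 it would pass render=Letter.render, or simply call letter.render(user) wherever a letter instance is at hand.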
We concluded that both methods could work, and both have pros and cons:
Method 1:
separates business logic completely from database access
keeps importing an ORM model from also pulling in a lot of methods/functionality that we might not need, and keeps the code for the model classes more compact.
might be easier to use when mocking out ORM models for testing
Method 2:
seems to be in line with the way Django does things in Python
allows simple access to methods: when a model instance is present, any function it performs can be immediately called. (in my example: when I have a letter-instance available, I can directly call a method on it that generates the HTML for that letter)
you can pass instances around, having all appropriate methods at hand.
Normally, you use the MVC pattern for this kind of stuff, but most web frameworks in Python have dropped the "Controller" part since they believe that it is an unnecessary component. In my development I have realized that this is somewhat true: I can live without it. That would leave you with two layers: the view and the model.
The question now is where to put business logic. In a practical sense, there are two ways of doing this, or at least two situations in which I am confronted with the question of where to put logic:
Create special internal view methods that handle logic, that might be needed in more than one view, e.g. _process_list_data
Create functions that are related to a model, but not directly tied to a single instance inside a corresponding model module, e.g. check_login.
To elaborate: I use the first one for strictly display-related methods, i.e. they are somehow concerned with processing data for display purposes. My example above, _process_list_data, lives inside a view class (which groups methods by purpose), but could also be a normal function in a module. It receives some parameters, e.g. the data list, and formats it somehow (for example, it may add additional view parameters so the template can have less logic). It then returns the data set to the original view function, which can either pass it along or process it further.
The second one is used for most other logic, which I like to keep out of my direct view code for easier testing. My example of check_login does this: it is a function that is not directly tied to display output, as its purpose is to check the user's login credentials and decide either to return a user or to report a login failure (by throwing an exception, returning False or returning None). However, this functionality is not directly tied to a model either, so it cannot live inside an ORM class (well, it could be a staticmethod on the User object). Instead it is just a function inside a module (remember, this is Python: you should use the simplest approach available, and functions are there for a reason).
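As a rough illustration of such a module-level function, the sketch below uses a plain dictionary in place of the ORM query and reports failure by raising an exception. The User class, the LoginFailed exception and the hashing helper are assumptions made up for this example, and the unsalted SHA-256 hash only stands in for a proper password-hashing scheme.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class User:
    name: str
    password_hash: str


class LoginFailed(Exception):
    """Raised when the credentials do not match a known user."""


def hash_password(password):
    # Placeholder hashing for the sketch only; not suitable for production use.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_login(users, name, password):
    """Return the matching User or raise LoginFailed.

    `users` is a dict mapping names to User objects; in a real project this
    lookup would be a query through the ORM.
    """
    user = users.get(name)
    if user is None or not hmac.compare_digest(
        user.password_hash, hash_password(password)
    ):
        raise LoginFailed("invalid credentials")
    return user
```

Because check_login takes everything it needs as arguments and touches no request or template objects, it can be tested without starting the web framework.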
To sum this up: display logic in the view, all the other stuff in the model, since most logic is somehow tied to specific models. And if it is not, create a new module or package just for logic of this kind. For example, I often create a util module/package for helper functions that are not directly tied to any view, model or anything else, for example a function to format dates that is called from the template but contains so much Python code that it would be ugly to define it inside a template.
Now we bring this logic to your task: Processing/Creation of letters. Since I don't know exactly what processing needs to be done, I can only give general recommendations based on my assumptions.
Let's say you have some data and want to bring it into a letter. So, for example, you have a list of articles and a customer who bought these articles. In that case, you already have the data. The only thing that may need to be done before passing it to the template is reformatting it in such a way that the template can easily use it. For example, it may be desirable to order the purchased articles by the amount, the price or the article number. This is something that is independent of the model; the order is only display-related (you could have specified the order already in your database query, but let's assume you didn't). In this case, this is an operation your view would do, so your template gets the data already formatted for display.
Now let's say you want to get the data to create a specific letter, for example a list of articles the user bought over time, together with the date when they were bought and other details. This would be the model's job, e.g. create a query, fetch the data and make sure it has all the properties required for this specific task.
Let's say in both cases you wish to retrieve a price for the product, and that price is determined by a base value and some percentages based on other properties: this would make sense as a model method, as it operates on a single product or order instance. You would then pass the model to the template and call the price method inside it. But you might as well restructure it so that the call is already made in the view and the template only gets tuples or dictionaries. This would make it easier to pass the same data out as an API (see below), but it might not necessarily be the easiest/best way.
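A small sketch of that price example, with a plain class standing in for the ORM model. The field names and the way the percentages are applied (each one as a surcharge on the base price) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    name: str
    base_price: Decimal
    # Percentages added on top of the base price (assumed pricing rule).
    surcharges: list = field(default_factory=list)

    def price(self):
        """Model method: operates on a single product instance."""
        total = self.base_price
        for pct in self.surcharges:
            total += self.base_price * pct / Decimal(100)
        return total


def product_to_dict(product):
    """View-side step: call price() here so the template (or a JSON API)
    only receives plain data instead of a model instance."""
    return {"name": product.name, "price": str(product.price())}
```

Either the template calls product.price directly, or the view calls product_to_dict first; the second form is the one that can be reused unchanged for a JSON response.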
A good rule for this decision is to ask yourself: "If I were to provide a JSON API in addition to my standard view, how would I need to modify my code to be as DRY as possible?" If thinking about it in theory is not enough at the start, build some APIs for the templates and see where you need to change things so that the API makes sense next to the views themselves. You may never use this API, so it does not need to be perfect, but it can help you figure out how to structure your code. However, as you saw above, this doesn't necessarily mean that you should preprocess the data in such a way that you only return things that can be turned into JSON; instead you might want to do some JSON-specific formatting for the API view.
So I went on a little longer than I intended, but I wanted to provide some examples to you because that is what I missed when I started and found out those things via trial and error.
I want to generate the list of various milestones to accomplish something, and the deadline for each of them is calculated dynamically from a final date given by the user.
I'm not sure about the best way to handle this. The first idea that came to my mind is to write a template file (not a Django template here) on the server containing the information needed to generate all the steps, which will be fetched once for every new user and used to create a list of milestone objects from a Milestone class (some generic model in Django). Maybe something written in JSON:
{"some_step":
    {
        "start_date": "final_date-10",
        "end_date": "final_date-7"
    }
}
and the corresponding model
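from django.db import models

class Milestone(models.Model):
    name = models.CharField(max_length=100)  # max_length is required; 100 is a placeholder
    start_date = models.DateField()
    end_date = models.DateField()
    # Assumed field: time_to_final reads final_date, which was not defined originally.
    final_date = models.DateField()

    def time_to_final(self, time):
        return self.final_date - time  # time is expected to be a datetime.timedelta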
Strings like "final_date-10" would be converted by some routine and passed to the time_to_final method at registration time, when initializing the data for the new user in the database.
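The conversion routine mentioned above might look roughly like this. It assumes that the number after the minus sign is a count of days and that the template has the shape of the JSON shown earlier; the function names are invented for the sketch.

```python
import re
from datetime import date, timedelta

# Matches "final_date" optionally followed by "-<days>", e.g. "final_date-10".
OFFSET_RE = re.compile(r"^final_date(?:\s*-\s*(\d+))?$")


def resolve(expr, final_date):
    """Turn an expression such as 'final_date-10' into a concrete date."""
    match = OFFSET_RE.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported date expression: {expr!r}")
    days = int(match.group(1) or 0)  # assumption: the offset is in days
    return final_date - timedelta(days=days)


def build_milestones(template, final_date):
    """Resolve every date expression in a template dict for one user."""
    return {
        step: {key: resolve(expr, final_date) for key, expr in spec.items()}
        for step, spec in template.items()
    }
```

For example, with a final date of 31 March 2024, "final_date-10" resolves to 21 March 2024 and "final_date-7" to 24 March 2024. The work is a handful of date subtractions per user, done once at registration.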
However, I'm not sure it's the best approach. Though it won't be used by millions of people, I'm worried about possible negative impacts on server performance. Is there a better, maybe more Pythonic way?
EDIT for more clarification:
A user wants to complete something by date D0.
My app generates the steps like this :
do step 1 from date D1i to date D1f
do step 2 from date D2i to date D2f
-...
until date D0 is reached and all tasks are completed
All the dates are calculated when D0 is provided.
All the steps are generated for every user.
What have templates got to do with this? Design your models first - maybe you need a Steps model with a foreign key to User and a foreign key to Milestone (or maybe not - I'm not clear from your description).
Only when you've got the data clear in your mind start thinking about templates etc.
The great thing about Django is that once you've made your models you can use the admin interface to enter some data. It will quickly become clear whether you've modeled your problem correctly.
Don't worry about performance, get your data structures clear, make it work and if you find it isn't running fast enough (unlikely) optimize it.