I am trying to implement a simple feature: sorting articles by the most views in the last 24 hours.
I did this by saving a timestamp every time the article page gets a request.
The most "trending" article is then the one with the most timestamps saved in the last 24 hours.
However, I don't think this is the right way to do it, as the list keeps growing, and it must be slowing down the pages (which will become noticeable after a critical number of views).
[I'm using Django]
You can use django-hitcount. It is pretty easy and useful; I have used it in many projects.
https://pypi.org/project/django-hitcount/
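If you would rather keep your own timestamp approach instead of adding a package, a minimal sketch of the counting query might look like this (Article and ArticleView are hypothetical model names; the point is to index the timestamp and let the database do the counting instead of loading the whole list into Python):

from datetime import timedelta

from django.db import models
from django.db.models import Count
from django.utils import timezone


class Article(models.Model):
    # hypothetical article model
    title = models.CharField(max_length=200)


class ArticleView(models.Model):
    # one row per page view; the index on timestamp keeps the 24-hour filter fast
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='views')
    timestamp = models.DateTimeField(auto_now_add=True, db_index=True)


def trending_articles(limit=10):
    # articles with the most views recorded in the last 24 hours
    cutoff = timezone.now() - timedelta(hours=24)
    return (
        Article.objects
        .filter(views__timestamp__gte=cutoff)
        .annotate(view_count=Count('views'))
        .order_by('-view_count')[:limit]
    )

Old ArticleView rows can be pruned periodically (for example, a nightly management command deleting anything older than 24 hours), so the table never grows without bound.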
First of all I want to apologize if my question is too broad or generic, but it would really save me a lot of needlessly wasted time to get an answer to guide me in the right direction for the work I want to do. With that out of the way, here it goes.
I am trying to retrieve some publicly available data from a website, to create a dataset to work with for a Data Science project. My big issue is that the website does not have a friendly way to download it, and, from what I gathered, it also has no API. So, getting the data requires skills that I do not possess. I would love to learn how to scrape the website (the languages I am most comfortable with are Python and R), and it would add some value to my project if I did it, but I also am somewhat pressured by time constraints, and don't know if it is even possible to scrape the website, much less to learn how to do it in a few days.
The website is this one: https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos. It has a search box, and the only option I configure is to click the banner that says "Pesquisa Avançada" and then mark the box that says "Menor de 18 anos". I then click the "Pesquisar" button in the lower right, and the results that show up are the ones that I want to extract (either that or, if it's simpler, all the results, without checking the "Menor de 18 anos" box). On my computer, 2 results show up per page, and there are 38 pages in total. Each result has some of its details on the page where the results appear but, to get the full data from each entry, one has to click "Detalhes" in the lower right of each result, which opens a display with all the data from that result. If possible, I would love to download all the data from that "Detalhes" page of each result (the data there already contains the fields that show up in the search results page).
Honestly, I am ready to waste a whole day manually transcribing all the data, but it would be much better to do it computationally, even if it takes me two or three days to learn and do it.
I think that, for someone with experience in web scraping, it is probably very simple to check whether it is possible to download the data I described, and what the best way is to go about it (in general terms; I would research and learn the details myself). But I really am lost when it comes to this, and just kindly want to ask for some help in showing me the way to go about it (even if the answer is "it is too complicated/impossible, just do it manually"). I know that there are some Python packages for web scraping, like BeautifulSoup and Selenium, but I don't really know if either of them would be appropriate.
I am sorry if my request is not exactly a short and simple coding question, but I have to try to gather any help or guidance I can get. Thank you in advance to everyone who reads my question and a special thank you if you are able to give me some pointers.
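To give a rough idea of what the Selenium route looks like, here is a sketch of the overall flow. The XPath selectors and the "Seguinte" label for the next-page control are placeholders I have not verified against the site, and if the "Detalhes" view turns out to be a modal rather than a separate page it would need slightly different handling:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# assumes a matching browser driver (e.g. chromedriver) is installed
driver = webdriver.Chrome()
driver.get("https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos")
time.sleep(3)  # crude wait for the page to render; WebDriverWait is the cleaner option

# open the advanced-search panel, tick "Menor de 18 anos" and run the search
driver.find_element(By.XPATH, "//*[contains(text(), 'Pesquisa Avançada')]").click()
driver.find_element(By.XPATH, "//label[contains(text(), 'Menor de 18 anos')]").click()
driver.find_element(By.XPATH, "//button[contains(text(), 'Pesquisar')]").click()
time.sleep(3)

pages = []
while True:
    pages.append(driver.page_source)  # parse each results page later with BeautifulSoup
    try:
        # placeholder for however the site labels its next-page control
        driver.find_element(By.XPATH, "//a[contains(text(), 'Seguinte')]").click()
    except NoSuchElementException:
        break  # no next-page link once the last of the 38 pages is reached
    time.sleep(2)

driver.quit()

From there, BeautifulSoup can pull the individual fields out of each saved page, and the same click-and-read pattern extends to opening the "Detalhes" view of each result.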
I am trying to create a blog where all the comments get loaded on each blog post page. The issue is that some posts contain only a few comments, which take seconds to load, while others can contain well over 100, which takes a lot longer. I want to load the comments independently, one after another, to decrease the waiting time so the page works seamlessly, but I don't know if this is the best approach. Assuming I can't use pagination (I need it as one continuous list), what would be the best method/approach?
Why don't you use Django Channels to implement asynchronous loading of what could be "big streams" of data (potentially hundreds of comments), instead of going for slow AJAX pagination?
You can try the little Django Channels chat application tutorial; maybe that will give you some ideas for implementing your blog comments section. By using this websockets approach, you could even implement something more dynamic without too much effort, so new comments are added in real time, along with other similar nice features.
Just some ideas.
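To make that more concrete, here is a minimal sketch of a consumer that pushes comments to the page in batches over a websocket. It assumes Channels is installed and routed with a URL pattern that captures post_id, and a hypothetical Comment model with author, body and created fields plus a foreign key to the post:

import json

from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer
from django.core.serializers.json import DjangoJSONEncoder

from blog.models import Comment  # hypothetical app and model


class CommentConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()
        post_id = self.scope["url_route"]["kwargs"]["post_id"]
        for batch in await self.get_comment_batches(post_id):
            # the browser appends each batch as it arrives, so the first
            # comments are visible long before the last ones have loaded
            await self.send(text_data=json.dumps(batch, cls=DjangoJSONEncoder))

    @database_sync_to_async
    def get_comment_batches(self, post_id, batch_size=20):
        comments = list(
            Comment.objects.filter(post_id=post_id)
            .order_by("created")
            .values("author", "body", "created")
        )
        return [comments[i:i + batch_size] for i in range(0, len(comments), batch_size)]

On the page, a small piece of JavaScript opens the websocket and appends each batch to the comment list as it arrives; new comments could later be broadcast over the same connection.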
This is more of an efficiency question. My Django web page is working fine, in the sense that I don't get any errors, but it is very slow. That being said, I don't know where else I would ask this other than here, so here goes:
I am developing a sales dashboard. In doing so, I am accessing the same data over and over and I would like to speed things up.
For example, one of my metrics is number of opportunities won. This accesses my Opportunities model, sorts out the opportunities won within the last X days and reports it.
Another metric is neglected opportunities. That is, opportunities that are still reported as being worked on, but that there has been no activity on them for Y days. This metric also accesses my Opportunities model.
I read here that querysets are lazy, which, if I understand this concept correctly, would mean that my actual database is accessed only at the very end. Normally this would be an ideal situation, as all of the filters are in place and the queryset only accesses a minimal amount of information.
Currently, I have a separate function for each metric. So, for the examples above, I have compile_won_opportunities and compile_neglected_opportunities. Each function starts with something like:
won_opportunities_query = Opportunities.objects.all()
and then I filter it down from there. If I am reading the documentation correctly, this means that I am accessing the same database many, many times.
There is a noticeable lag when my web page loads. In an attempt to find out what is causing the lag, I commented out different sections of code. When I comment out the code that accesses my database for each function, my web page loads immediately. My initial thought was to access my database in my calling function:
opportunities_query = Opportunities.objects.all()
and then pass that query to each function that uses it. My rationale was that the database would only be accessed one time, but apparently Django doesn't work that way, as it made no obvious difference in my page load time. So, after my very long-winded explanation, how can I speed up my page load time?
If I am reading the documentation correctly, this means that I am accessing the same database many, many times.
Install django-debug-toolbar to see exactly which queries each page runs and how long they take: https://pypi.org/project/django-debug-toolbar/
Btw, also have a look at this one: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-related
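As a sketch of how the two metrics could come out of a single query instead of one Opportunities.objects.all() per function (the field names status, closed_on and last_activity are made up; substitute your own):

from datetime import timedelta

from django.db.models import Count, Q
from django.utils import timezone

from .models import Opportunities  # adjust the import path to your app


def compile_opportunity_metrics(days_won=30, days_neglected=14):
    # one database hit returning both dashboard numbers
    now = timezone.now()
    return Opportunities.objects.aggregate(
        won=Count('pk', filter=Q(status='won',
                                 closed_on__gte=now - timedelta(days=days_won))),
        neglected=Count('pk', filter=Q(status='open',
                                       last_activity__lte=now - timedelta(days=days_neglected))),
    )

The debug toolbar linked above will show exactly how many queries each page view runs and how long each one takes, which makes it easy to confirm whether a change like this actually helps.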
I have made an API that parses the GitHub contribution data of each account, arranges it by month, week, or day, and returns it as JSON.
Responding to just one request takes approximately 2 seconds (1800 ms).
Link to my GitHub repository.
contributions.py in the repository is the Python code that does the above.
THE POINT OF THE QUESTION: What makes my API slow?
Is it just too much data to parse (about 365 entries)?
Or the way the API builds the JSON string?
Thank you for answering and helping me in advance.
"Why is my code slow?" is a really hard question to answer. There's basically an unlimited number of possible reasons that could be. I may not be able to answer the question, but I can provide some suggestions to hopefully help you answer it for yourself.
There are dozens of questions to ask... What kind of hardware are you using? What kind of network/internet connection do you have? Is it just slow on the first request, or all requests? Is it just slow on the call to one type of request (daily, weekly, monthly) or all? etc. etc.
You are indicating overall request times of ~1800 ms, but as you pointed out, there are a lot of things happening during the processing of that request. In my experience, one of the best ways to find out is often to add some timing code to narrow down the scope of the slowness.
For example, one quick and dirty way to do this is to use the Python time module. I quickly added some code to the weekly contributions method:

import time

# [...]

@app.route("/contributions/weekly/<uname>")
def contributionsWeekly(uname):
    before = time.time()
    rects = getContributionsElement(uname)
    after = time.time()
    timeToGetContribs = after - before

    # [...]

    print(' timeToGetContribs: ' + str(timeToGetContribs))
    print('timeToIterateRects: ' + str(timeToIterateRects))
    print('   timeToBuildJson: ' + str(timeToBuildJson))
Running this code locally produced the following results:
timeToGetContribs: 0.8678717613220215
timeToIterateRects: 0.011543750762939453
timeToBuildJson: 1.5020370483398438e-05
(Note the e-05 on the end of the last time... very tiny amount of time).
From these results, we know that the time to get the contributions is taking the majority of the full request. Now we can drill down into that method to try to further isolate the most time consuming part. The next set of results shows:
timeToOpenUrl: 0.5734567642211914
timeToInstantiateSoup: 0.3690469264984131
timeToFindRects: 0.0023255348205566406
From this it appears that the majority of the time is spent actually opening the URL and retrieving the HTML (meaning that network latency, internet connection speed, GitHub server response time, etc. are the likely suspects). The next heaviest is the time it takes to instantiate the BeautifulSoup parser.
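The drill-down code itself is not shown above, but it is the same pattern applied inside getContributionsElement. A rough sketch, assuming that method fetches the contribution graph with urllib and parses the rect elements with BeautifulSoup (the exact URL is a guess at what contributions.py actually requests):

import time
from urllib.request import urlopen

from bs4 import BeautifulSoup


def getContributionsElement(uname):
    before = time.time()
    # guess at the page contributions.py actually fetches
    html = urlopen('https://github.com/users/' + uname + '/contributions').read()
    timeToOpenUrl = time.time() - before

    before = time.time()
    soup = BeautifulSoup(html, 'html.parser')
    timeToInstantiateSoup = time.time() - before

    before = time.time()
    rects = soup.find_all('rect')
    timeToFindRects = time.time() - before

    print('        timeToOpenUrl: ' + str(timeToOpenUrl))
    print('timeToInstantiateSoup: ' + str(timeToInstantiateSoup))
    print('      timeToFindRects: ' + str(timeToFindRects))
    return rects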
Take all of these concrete numbers with a grain of salt. These are on my hardware (12 year old PC) and my local internet connection. On your system, the numbers will likely vary, and may even be significantly different. The point is, the best way to track down slowness is to go through some basic troubleshooting steps to identify where the slowness is occurring. After you've identified the problem area(s), you can likely search for more specific answers, or ask more targeted questions.
So I'm new to Python and just finished my first application. (Giving random chords to be played on a MIDI piano and increasing the score if the right notes are hit in a graphical interface; nothing too fancy, but also non-trivial.) Now I'm looking for a new challenge: this time I'm going to try to create a program that monitors a poker table and collects data on all the players. Though this is completely allowed on almost all poker rooms (example of the largest one), there is obviously no ready-made API available. This probably makes the extraction of relevant data the most challenging part of the entire program. In my search for more information, I came across an undergraduate thesis that goes into writing such a program using Java (Internet Poker: Data Collection and Analysis - Haruyoshi Sakai).
In this thesis, the author speaks of 3 data collection methods:
Sniffing packets
Hand history
Scraping the screen
Like the author, I've come to the conclusion that the third option is probably the best route, but unlike him I have no knowledge of how to start this.
What I do know is the following: any table will look like the image below. Note how text, including numbers, is written in the same font on the table. Additionally, all relevant information is also supplied in the chat box situated in the lower-left corner of the window.
In some regards, using the chat box sounds like the best way to go, seeing as all text is predictable and in the same font. The problem I see is computational speed: it will often happen that many actions get executed in rapid succession, and any program will have to be able to keep up with this.
On the other hand, using the table as the reference means having to deal with unpredictable bet locations.
The plan: Taking this into account, I'd start by getting an index of all players' names and stacks from the table view, "initialising" the table that way, and then continue to use their stacks to extrapolate the betting they do.
The method: Of course, the method is the entire reason I made this post. It seems to me that one would need some sort of OCR to achieve all this, but seeing as everything is in a known font, there may be some significant optimisations to be made. I would love some input on resources for learning about solutions to similar problems. Or if you've got a better idea of how to tackle this problem, I'd love to hear that too!
Please do be sure to ask any questions you may have, I will be happy to answer them in as much detail as possible.
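For the screen-scraping side, a minimal sketch using Pillow's ImageGrab for capture and pytesseract for OCR might look like this (both libraries are my assumption, and the bounding boxes are placeholders you would measure once for your table layout):

from PIL import ImageGrab
import pytesseract

# (left, upper, right, lower) pixel coordinates of regions of interest;
# all of these are placeholders for wherever your client draws them
CHAT_BOX = (20, 700, 420, 900)
PLAYER_STACKS = [
    (100, 150, 220, 175),   # seat 1 stack
    (500, 150, 620, 175),   # seat 2 stack
]


def read_region(bbox):
    # grab one region of the screen and run OCR on it;
    # --psm 7 tells tesseract to treat the region as a single line of text
    image = ImageGrab.grab(bbox=bbox)
    return pytesseract.image_to_string(image, config='--psm 7')


def poll_table():
    chat_text = read_region(CHAT_BOX)
    stacks = [read_region(bbox) for bbox in PLAYER_STACKS]
    return chat_text, stacks

Since the font is known and fixed, per-character template matching (for example with OpenCV) is a common way to get both better speed and better accuracy than general-purpose OCR, which matters when many actions fire in rapid succession.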