I'm currently working on my first website using the Django framework. Major parts of my content are fetched from a third-party API, and I need three separate requests to that API to get all the data for a page.
My problem is that this slows things down a lot: my page load time is about 1-2 seconds, which I don't find satisfying at all.
I'm looking for a few alternatives/best practices for this kind of scenario. What would one do to speed up page load times? So far, I've been thinking of running a cronjob in the background that calls the APIs for all users who are currently logged in and stores the data in my local database, which has a much faster response time.
The other alternative would be to load the API data separately and add it to the page once it has arrived; however, I don't know at all how this would work.
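To make that second alternative concrete, here is roughly what I imagine: a normal view that renders the page immediately, plus a JSON endpoint that does the slow API calls and is fetched by the page's JavaScript after load. (Just a sketch; the view names, template and API URLs are made up, not code from my project.)

    # views.py -- sketch only: render the page shell fast, fetch the slow data separately
    import requests
    from django.http import JsonResponse
    from django.shortcuts import render

    def profile_page(request):
        # Render immediately, without waiting for the third-party data.
        return render(request, "profile.html")

    def profile_data(request):
        # Called from the template via JavaScript (fetch/AJAX) after page load,
        # so the three slow API calls no longer block the initial response.
        first = requests.get("https://thirdparty.example.com/endpoint-1").json()
        second = requests.get("https://thirdparty.example.com/endpoint-2").json()
        third = requests.get("https://thirdparty.example.com/endpoint-3").json()
        return JsonResponse({"first": first, "second": second, "third": third})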
Any other ideas or any tips on how I can improve this?
Thank you!
Tobias
A common practice is to build a cache: first look for the data in your local database, and if it doesn't exist there, call the API and save the data.
Without more information it's impossible to write a working example.
You could make a custom method to do it all in one place.
    import requests
    from myapp.models import DataModel  # adjust the import path to your app

    def call_data(api_id):
        try:
            # Return the cached copy if we already have it.
            return DataModel.objects.get(api_id=api_id)
        except DataModel.DoesNotExist:
            # Not cached yet: call the API and store the result locally.
            payload = requests.get("http://api-call/").json()
            return DataModel.objects.create(**payload)
This is only an example, not something to use in production; at the very least it needs some validation that the API call succeeded.
I'm working on a personal project where I need to make multiple requests to scrape keyword and abstract data from different pages (~800 requests). Every time I run my program it takes about 30 minutes to scrape all the data.
I'm thinking of two ways to speed up the runtime:
read the data once, write it to a CSV file, and use pandas to read it back from the CSV file for future reference (see the sketch after this list);
create a MySQL DB and store the data in there.
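For the first option, this is roughly what I have in mind (only a sketch; the file name and the scrape_all_pages() helper are placeholders for my actual scraping code):

    import os
    import pandas as pd

    CACHE_FILE = "scraped_data.csv"

    def load_or_scrape():
        if os.path.exists(CACHE_FILE):
            # Reuse the data scraped on a previous run.
            return pd.read_csv(CACHE_FILE)
        # scrape_all_pages() stands in for the ~800-request scraping routine.
        rows = scrape_all_pages()  # e.g. [{"keyword": ..., "abstract": ...}, ...]
        df = pd.DataFrame(rows)
        df.to_csv(CACHE_FILE, index=False)
        return df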
Are these two approaches feasible? It would be great if I could get some insights.
Thanks
Having some experience with scraping, you have several options, such as using the requests library to do your GET and POST requests (please remember to reuse the session),
or using a framework such as Scrapy.
The main things to do to scrape in an optimal way are:
Split your work [1];
Use lots of try/except handling and save the errors [2];
If you are scraping a lot, rate-limit your requests to avoid being blocked [3];
Save information at each step;
And please, if you are lost, use the Inspect tools in your browser to see the network calls :)
[1] - A timeout is very time consuming and will stall your process until the timeout exception occurs; splitting your work helps with that.
[2] - Several errors may happen and "stop" all your work because of one simple failure. Using try and catching the exception lets you save the errors and work on them later, and saving where you are lets you resume later.
[3] - Some sites will block you if you make too many requests per minute, so be reasonable.
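A minimal sketch that combines those points with requests (the URL list, output files and parsing are placeholders, not a drop-in solution):

    import csv
    import time
    import requests

    session = requests.Session()            # reuse the connection/cookies across requests
    urls = open("urls.txt").read().split()  # [1] work split into one URL per line

    with open("results.csv", "a") as out, open("errors.log", "a") as errlog:
        writer = csv.writer(out)
        for url in urls:
            try:
                resp = session.get(url, timeout=10)
                resp.raise_for_status()
                # Placeholder "parsing": save something on each step so progress survives a crash.
                writer.writerow([url, resp.text[:200]])
            except Exception as exc:
                # [2] log the error and keep going instead of losing all the work
                errlog.write("%s\t%s\n" % (url, exc))
            time.sleep(1)                   # [3] crude rate limit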
Usually I load the data from the SQL database on a server first and then manipulate it later with pandas on my computer.
However, many others preprocess some of the data in SQL first (with CASE expressions etc.) and do the rest with pandas.
So I wonder which is better, and why? Thanks!
This question is quite general. For a more specific answer, we would need to know more about your setup.
I make some assumptions to answer your question: I assume your database is running on a server and your python code is executed on your local machine.
In this case, you have to consider at least two things:
the amount of data transmitted over the network
where the data processing happens
If you make a general SQL request, large amounts of data are transmitted over the network. Next, your machine has to process the data. Your local machine might be less powerful than the server.
On the other hand, if you submit a specific SQL request, the powerful server can process the data and only return the data you are actually interested in.
SQL queries can get long and hard to understand, since you have to pass them as one statement. In Python, you can process the data over multiple lines of code.
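As a small sketch of the difference (the connection string, table and column names are made up):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@server/db")

    # Variant 1: pull everything and filter locally -- the whole table travels
    # over the network and your local machine does the work.
    df = pd.read_sql("SELECT * FROM orders", engine)
    totals = df[df["status"] == "open"].groupby("customer_id")["amount"].sum()

    # Variant 2: let the server filter and aggregate -- only the small result
    # set is transmitted, then pandas takes over for the rest.
    query = """
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE status = 'open'
        GROUP BY customer_id
    """
    small_df = pd.read_sql(query, engine)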
I need to build an app which has the following design pattern.
Results displayed on iOS device.
User data stored online (more than just username/password, also data they themselves put in).
User can sign in with TW/FB etc.
Computation logic code running on the backend, which will need to gather data from online sources and produce results. The server code will be Node.js or Python.
I think some combination of Firebase and Google App Engine will work, but I'm not exactly sure which of the design patterns in the following link is the one I'm looking for:
https://cloud.google.com/solutions/mobile/mobile-app-backend-services#design-pattern
Based on your description, I think the second one will work best.
https://cloud.google.com/solutions/mobile/mobile-app-backend-services#firebase-appengine-standard
You will likely need to use Firebase Queue to do what you're planning to do.
https://firebase.googleblog.com/2015/05/introducing-firebase-queue_97.html
Results displayed on iOS device: using iOS Firebase calls.
User data stored online (more than just username/password, also data they themselves put in): using iOS Firebase calls.
User can sign in with TW/FB etc.: using Firebase Authentication.
Computation logic code running on backend, which will need to gather data from online sources and produce results (the server code will be Node.js or Python): using Firebase Queue running on Google App Engine.
Recently I have been assigned a project at my college: a news aggregator. I found Flipboard to be a very interesting and viral app for news aggregation. To achieve this, I am building a web crawler that will crawl websites to fetch recent news and posts. I was going through a post on Gizmod:
Is the scraper universal/generic, or are there custom scrapers for certain sites?
Doll: It is mostly universal/generic. However, we can limit the amount of content displayed on a site-specific basis. We already try to do this with some sites that publish extremely abbreviated RSS feeds - even though we aren't using RSS directly, we attempt to achieve display parity with their feed.
I am quite familiar with the process of fetching data from a single website, but I'm not sure how I could fetch data from multiple websites and blogs that all have completely different structures.
I am currently using Python 2.7, urllib2 and BeautifulSoup for crawling a single website.
Question:
I want to know how I could achieve the objective of fetching data from thousands of websites with just one generic crawler.
I recommend creating one big Spider class and then subclassing it for individual sites. I wrote a short answer to a similar question here on Stack Overflow.
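For illustration, a stripped-down version of that idea using urllib2 and BeautifulSoup to match your setup (the site class, URL and selectors are invented; every real site needs its own parsing rules):

    import urllib2
    from bs4 import BeautifulSoup

    class Spider(object):
        """Generic crawler: shared fetching, site-specific parsing."""
        start_url = None

        def fetch(self, url):
            return BeautifulSoup(urllib2.urlopen(url).read(), "html.parser")

        def parse(self, soup):
            raise NotImplementedError  # each site subclass overrides this

        def crawl(self):
            return self.parse(self.fetch(self.start_url))

    class ExampleNewsSpider(Spider):
        start_url = "http://news.example.com/"

        def parse(self, soup):
            # Selectors are made up; adjust them per site.
            return [a.get_text() for a in soup.select("h2.headline a")]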
I have done something similar, although with only a basic knowledge of Python; Google-fu taught me how to make a script that more advanced users would scoff at. But hey, it works for my use and doesn't leave too much of a footprint.
I made several functions that used requests to fetch the sites and BeautifulSoup to parse each individual site, based on the structure I reverse-engineered from the sites using the inspector in Chrome.
When the script is run, it runs all of those functions and thus fetches the info I want.
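A sketch of that structure (the site URLs and selectors are placeholders):

    import requests
    from bs4 import BeautifulSoup

    def get_soup(url):
        return BeautifulSoup(requests.get(url).text, "html.parser")

    def parse_site_a():
        soup = get_soup("http://site-a.example.com/")
        return [h.get_text() for h in soup.find_all("h1")]

    def parse_site_b():
        soup = get_soup("http://site-b.example.com/news")
        return [a["href"] for a in soup.select("div.story a")]

    if __name__ == "__main__":
        # Run every site-specific function and collect the results.
        for func in (parse_site_a, parse_site_b):
            print(func())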
I am building a simple Django app that will use Scribd to display documents. I would like to have a page where the administrator can upload documents to Scribd through the website, since I need to know a few things about each document before it gets to Scribd. What is the best/easiest way to do this: display an upload page, then take the uploaded file and send it to Scribd through the docs.upload method of their API? I'm a little new to this Python/Django/REST API thing, so sorry if this is too many questions at once.
That is quite a few questions.
Handling the file upload is pretty straightforward with Django; see the File Uploads documentation for examples. In short, you can access the uploaded file via request.FILES['file'].
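For example, a minimal upload view along those lines (the form, template, URL name and the handle_uploaded_file() helper are placeholders):

    # forms.py
    from django import forms

    class DocumentForm(forms.Form):
        file = forms.FileField()

    # views.py
    from django.shortcuts import render, redirect

    def upload_document(request):
        if request.method == "POST":
            form = DocumentForm(request.POST, request.FILES)
            if form.is_valid():
                uploaded = request.FILES["file"]   # an UploadedFile object
                handle_uploaded_file(uploaded)     # e.g. write to disk, queue for Scribd
                return redirect("upload_done")
        else:
            form = DocumentForm()
        return render(request, "upload.html", {"form": form})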
To call the Scribd API you can use urllib2; see this Hackoarama page for instructions. urllib2 can be a little convoluted, but it works once you get the hang of it.
You can call the Scribd API directly from within your Django view, but it'd be better practice to separate it out: from within your Django view, save the file somewhere on disk and put an "upload this" message on a messaging system (e.g. beanstalkd). Have a separate process pick up the message and upload the file to Scribd. That way you shield your HTTP process and your users from any issues accessing the API and from the associated delays.
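A rough sketch of that hand-off using the beanstalkc client (the tube name and the upload_to_scribd() wrapper are made up):

    import json
    import beanstalkc

    # In the Django view: after saving the file to disk, enqueue a message.
    def queue_for_upload(saved_path, title):
        queue = beanstalkc.Connection(host="localhost", port=11300)
        queue.use("scribd-uploads")
        queue.put(json.dumps({"path": saved_path, "title": title}))

    # In a separate worker process: pick up messages and push files to Scribd.
    def worker():
        queue = beanstalkc.Connection(host="localhost", port=11300)
        queue.watch("scribd-uploads")
        while True:
            job = queue.reserve()
            info = json.loads(job.body)
            upload_to_scribd(info["path"], info["title"])  # wraps the docs.upload call
            job.delete()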
What you want to do (at least from what I read here and on the Django documentation site) is create a custom storage system.
This should give you exactly what you need - it's the motivation they use to start the example, after all!
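Very roughly, a custom storage backend is a Storage subclass that you point your file fields at; a skeleton might look like this (the scribd_upload() call is an invented placeholder for whatever wraps docs.upload):

    from django.core.files.storage import Storage

    class ScribdStorage(Storage):
        """Skeleton storage backend that hands files to Scribd instead of local disk."""

        def _save(self, name, content):
            # scribd_upload() is a placeholder for your docs.upload API wrapper;
            # it should return an identifier to store as the file's "name".
            doc_id = scribd_upload(name, content.read())
            return str(doc_id)

        def _open(self, name, mode="rb"):
            raise NotImplementedError("reading back from Scribd is not implemented here")

        def exists(self, name):
            # Returning False makes Django keep the name returned by _save().
            return False

        def url(self, name):
            # Placeholder: build a viewer URL from the stored document id.
            return "http://www.scribd.com/doc/%s" % name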