When scraping simple sites, a plain GET or POST request with a few parameters is enough to retrieve the data.
However, when scraping an OAuth-secured site, in addition to the URL-specific parameters it's necessary to send the OAuth parameters (oauth_consumer_key, oauth_signature, oauth_nonce, oauth_signature_method, oauth_timestamp, oauth_token).
How could Scrapy be configured so that scrapy.Request automatically includes those parameters (obviously after supplying API keys, etc.)?
If that can't be done, is it possible to extract those parameters from a request created by another library (like oauth2) and put them into a Scrapy request?
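A minimal sketch of that second approach, assuming the python-oauth2 package, a hypothetical endpoint, and placeholder credentials: sign the request with oauth2, then copy the resulting Authorization header into a scrapy.Request.

import oauth2 as oauth
import scrapy

url = 'https://api.example.com/resource'  # hypothetical endpoint

# Placeholder credentials -- substitute your own consumer and token keys
consumer = oauth.Consumer(key='CONSUMER_KEY', secret='CONSUMER_SECRET')
token = oauth.Token(key='ACCESS_TOKEN', secret='ACCESS_TOKEN_SECRET')

# Build and sign an OAuth 1.0 request; signing generates oauth_nonce,
# oauth_timestamp, oauth_signature, and the other oauth_* parameters
oauth_request = oauth.Request.from_consumer_and_token(
    consumer, token=token, http_method='GET', http_url=url)
oauth_request.sign_request(oauth.SignatureMethod_HMAC_SHA1(), consumer, token)

# to_header() returns {'Authorization': 'OAuth oauth_consumer_key="...", ...'},
# which can be passed straight into a Scrapy request
request = scrapy.Request(url, headers=oauth_request.to_header())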
I am trying to use the requests library in Python to fetch data from a traffic API.
This is the link to the API that should return the traffic data:
https://api.tomtom.com/traffic/services/4/flowSegmentData/relative0/10/json?point=52.41072%2C4.84239&openLr=true&jsonp=jsonp&key=3EeqxQCR2DNsYzRCT0RPIxUhlzAM3hQc
but it returns "Developer Inactive". How do I solve that and use the API?
I also want to ask if this will work with Kivy.
The API request you provided contains an API key that no longer exists. I tried it with the API key you provided in the comment and it worked, but note that you copied it incorrectly: there is an extra character at the beginning.
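For reference, a minimal sketch of the call with requests, using a placeholder key (substitute the key from your TomTom developer account):

import requests

url = 'https://api.tomtom.com/traffic/services/4/flowSegmentData/relative0/10/json'
params = {
    'point': '52.41072,4.84239',
    'openLr': 'true',
    'key': 'YOUR_API_KEY',  # placeholder -- use your own key
}
resp = requests.get(url, params=params)
resp.raise_for_status()  # raises an error if the key is inactive or invalid
print(resp.json())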
I'm doing a personal project where I am trying to scrape HTML tables from a financial data website using Python. I can successfully use the requests package to access public websites and extract information (using BeautifulSoup4 afterwards for processing). The code I am using is shown below:
import requests

# Access the website through the university's EZproxy URL
url = 'https://financial-data-url.ezproxy1.library.uniname.edu.com/path/to/financial/data'
headers = example_header  # placeholder for the request headers, e.g. a User-Agent
page = requests.get(url, headers=headers)
However, accessing the website normally requires logging in through my university's library database via an EZproxy server (as shown in the example URL). When I request the URL of the financial data webpage after getting access through the library database, it returns what seems to be the university library's EZproxy webpage, where I need to click "login" before being directed to the financial data webpage.
Is there some credential provision that I may be missing in the request function, or potentially a different way of passing the proxy server to the URL so that the request does not end up on the proxy server login page?
I found that the fastest and most effective workaround for this problem is to use the Selenium browser-automation package (https://selenium-python.readthedocs.io/).
Selenium makes it very easy to replicate a login, as well as in-browser navigation, just as a person would. In my opinion, its simplicity can far outweigh the benefits of calling the webpage directly, depending on the use case: it is not the right tool when speed and efficiency are the primary goals, but if those are not major constraints it works quite well. A sketch of the approach follows.
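A minimal sketch, assuming Chrome and hypothetical form-field IDs (username, password, login) that you would replace with the actual IDs from the EZproxy login page:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://financial-data-url.ezproxy1.library.uniname.edu.com/path/to/financial/data')

# Fill in the EZproxy login form; the element IDs here are hypothetical
driver.find_element(By.ID, 'username').send_keys('your_username')
driver.find_element(By.ID, 'password').send_keys('your_password')
driver.find_element(By.ID, 'login').click()

# After login, the browser is redirected to the target page
html = driver.page_source  # hand this to BeautifulSoup for parsing
driver.quit()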
I'm trying to retrieve emails via a Python script and display them on an HTML inbox page. I know that I could just log into my email and see my inbox, but I still want to see if I can retrieve my emails and render them in readable form.
Assuming you're trying to retrieve your Gmail inbox, Google's Gmail API is perfect for that use case.
First of all, here's the setup instructions for Python:
https://developers.google.com/gmail/api/quickstart/python
Once you've got your Gmail project set up (including signing up and getting an API key), you can retrieve an inbox from the Users.messages resource. The example in the quickstart linked above retrieves Users.labels, so it should be a fairly basic modification to retrieve Users.messages using the same syntax, as sketched below.
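A minimal sketch of that modification, assuming `service` was built as in Google's Python quickstart (build('gmail', 'v1', credentials=creds)):

# List message IDs from the inbox
results = service.users().messages().list(userId='me', labelIds=['INBOX']).execute()
messages = results.get('messages', [])

for msg in messages[:10]:
    # Fetch each message's metadata to pull out its subject line
    detail = service.users().messages().get(
        userId='me', id=msg['id'], format='metadata').execute()
    headers = detail['payload'].get('headers', [])
    subject = next((h['value'] for h in headers if h['name'] == 'Subject'),
                   '(no subject)')
    print(subject)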
Alternatively, Gmail has a REST API which you could use to easily retrieve JSON data using a simple HTTP request. See here:
https://developers.google.com/gmail/api/v1/reference/
Note: if you're going the HTTP-request route, you might as well use jQuery (a JavaScript library) to execute the request with its $.get() and/or $.post() methods.
I need to fetch basic profile data (the complete HTML page) of a LinkedIn profile. I tried Python packages such as beautifulsoup, but I get "access denied".
I have generated the API tokens for LinkedIn, but I am not sure how to incorporate them into the code.
Basically, I want to automate the process of scraping by just providing the company name.
Please help. Thanks!
Beautiful Soup is an HTML parsing library used for web scraping. Typically, people use it to extract data from public websites or websites that don't have APIs; for example, you could use it to scrape the top 10 Google Search results.
Unlike web scrapers, an API lets you retrieve data behind non-public websites. Furthermore, it returns the data in an easily readable XML or JSON format, so you don't have to "scrape" an HTML file for the specific data you care about.
To make an API call to LinkedIn, you need to use a Python HTTP request library. See this Stack Overflow post for examples.
Take a look at Step 4 of the LinkedIn API documentation. It shows a sample HTTP GET call.
GET /v1/people/~ HTTP/1.1
Host: api.linkedin.com
Connection: Keep-Alive
Authorization: Bearer AQXdSP_W41_UPs5ioT_t8HESyODB4FqbkJ8LrV_5mff4gPODzOYR
Note that you also need to send an "Authorization" header along with the HTTP GET call; this is where your token goes. You're probably getting "access denied" right now because you didn't set this header in your request.
Here's a sketch of how you might add that header to a request with the requests library:
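This sketch reuses the sample token from the GET call above as a placeholder; substitute the access token you generated.

import requests

# Placeholder token -- replace with your own OAuth access token
access_token = 'AQXdSP_W41_UPs5ioT_t8HESyODB4FqbkJ8LrV_5mff4gPODzOYR'

headers = {'Authorization': 'Bearer ' + access_token}
response = requests.get('https://api.linkedin.com/v1/people/~?format=json',
                        headers=headers)
print(response.json())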
And that should be it. When you make that request, it should return an XML or JSON response containing the data you want; you can then use an XML or JSON parser to extract the specific fields you need.
I am doing a test project that uses the Facebook Graph API to retrieve data from an events page. I need to use the following URL: https://graph.facebook.com/OffTheGridSF/events and do an HTTP GET from my web app. I created a Facebook app (for testing) and have the APP_ID and APP_SECRET. I was wondering which library (if any) I should use. I have looked at django-facebook and pyfb, but I am not sure how the authentication process works. I don't need a login page for my website; I only need the JSON containing the list of events. Any help as to how I should proceed will be highly appreciated. I just started playing around with Django a few hours ago, so nothing is trivial.
You can try using the Python requests library directly with the URL you want to GET. Check out requests-oauthlib. A sketch follows below.
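A minimal sketch with plain requests, using an app access token built from your credentials (APP_ID and APP_SECRET are placeholders for your app's values):

import requests

# Placeholder credentials -- use your own app's values
APP_ID = 'your_app_id'
APP_SECRET = 'your_app_secret'

# For server-to-server reads, Facebook accepts an app access token
# of the form "app_id|app_secret"
app_token = '{}|{}'.format(APP_ID, APP_SECRET)

resp = requests.get(
    'https://graph.facebook.com/OffTheGridSF/events',
    params={'access_token': app_token},
)
print(resp.json())  # JSON containing the list of events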