Scraping documents from a climate change website with Python

I need to get files from the European Space Agency (ESA) website, which lets you choose latitude, longitude, and years; once those fields are filled in, you click "download" and get the specific file you asked for.
I need to retrieve data for a range of coordinates and a set of years, and I would like to do it in Python. The website's code uses "id" rather than "name" to identify the variables (lat, lon, years).
link: http://www.esa-sst-cci.org/PUG/map.htm
Thank you

You can do this using Selenium WebDriver, which makes direct calls to the browser through the browser's native support for automation. For an introduction, see "Introducing Selenium WebDriver API by Example".
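A minimal sketch of this approach. The element ids used below ("lat", "lon", "years", "download") are assumptions; since the site identifies its inputs by "id" rather than "name", inspect the page source to find the real ones:

```python
# Sketch only: the element ids are guesses -- look them up in the page source.
from itertools import product

def parameter_grid(lats, lons, years):
    """Every (lat, lon, year) combination to download."""
    return list(product(lats, lons, years))

def download_all(driver, lats, lons, years):
    from selenium.webdriver.common.by import By  # requires selenium installed
    for lat, lon, year in parameter_grid(lats, lons, years):
        driver.get("http://www.esa-sst-cci.org/PUG/map.htm")
        driver.find_element(By.ID, "lat").send_keys(str(lat))      # id is a guess
        driver.find_element(By.ID, "lon").send_keys(str(lon))      # id is a guess
        driver.find_element(By.ID, "years").send_keys(str(year))   # id is a guess
        driver.find_element(By.ID, "download").click()
```

Call `download_all(webdriver.Firefox(), [10, 20], [-5, 5], [2010, 2011])` to fetch every combination; the browser's default download directory collects the files.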

Related

How to scrape data from pop-ups (I need to scrape data that is only visible once I click the pop-up, which is not a link)

I'm an absolute beginner in Python.
I need to scrape data from this website, which is a directory of professors.
Some of the data (names, school, etc.) are visible without clicking anything.
However, I also need to scrape the email and department info.
I've been searching the internet all day and I don't know how to do it.
Could anyone please help?
When you check the network activity, you'll see that the data is dynamically loaded from google spreadsheets. You can retrieve the spreadsheet directly without scraping.
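For example, once you have the spreadsheet id from the Network tab, you can pull the sheet directly as CSV. The sheet id below is a placeholder; substitute the one from the request you observe:

```python
# Google Sheets can be exported as CSV without any scraping.
import csv
import io
import urllib.request

def sheet_csv_url(sheet_id, gid=0):
    """Build the CSV export URL for a Google spreadsheet."""
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/export?format=csv&gid={gid}")

def fetch_rows(sheet_id):
    """Download the sheet and return its rows as lists of strings."""
    with urllib.request.urlopen(sheet_csv_url(sheet_id)) as resp:
        return list(csv.reader(io.StringIO(resp.read().decode("utf-8"))))
```

This only works if the sheet is shared publicly (which it must be, for the page to load it client-side).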

How to locate graphic <canvas> html element for scraping?

I'm currently scraping a money-conversion website with Python and the Selenium library. I need to extract the information shown by a chart that plots the currency's value hourly, but I've been unable to find the function in the component, or the part of the HTML/scripts, that would let me extract that information.
This is the link to the website: https://dolar.set-icap.com/

Scraping GIS coordinates from a non-traditional map using selenium?

I'm trying to scrape a website with real estate publications. Each publication looks like this:
https://www.portalinmobiliario.com/venta/casa/providencia-metropolitana/5427357-francisco-bilbao-amapolas-uda#position=5&type=item&tracking_id=cedfbb41-ce47-455d-af9f-825614199c5e
I have been able to extract all the information I need except the coordinates (GIS) of the publications. The maps appear to be embedded images, not links. Does anyone know how to do this?
I'm using Selenium with Python 3 and Chrome.
This is the list of publications:
https://www.portalinmobiliario.com/venta/casa/propiedades-usadas/las-condes-metropolitana
If you click any property in that list, it takes you to the page where the map is displayed. I'm using a loop to go through all of them (one at a time).
The code is a bit long, but so far I have mostly been using find_element_by_class_name and find_element_by_xpath to find and extract the information. I tried using them for the map, but I don't know where to find the coordinates.
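One possible way in, assuming the page embeds the map as a Google static-map image: Static Maps URLs carry the coordinates in a `center=lat,lng` query parameter, so you can grab the image's `src` with the same locator calls and parse the numbers out. The CSS selector below is a guess; inspect the page to find the real one:

```python
import re

def parse_center(src):
    """Pull (lat, lon) out of a static-map URL's center=lat,lng parameter."""
    # The comma may be URL-encoded as %2C, so try both forms.
    m = (re.search(r"center=(-?\d+\.?\d*)%2C(-?\d+\.?\d*)", src)
         or re.search(r"center=(-?\d+\.?\d*),(-?\d+\.?\d*)", src))
    return (float(m.group(1)), float(m.group(2))) if m else None

def listing_coordinates(driver):
    from selenium.webdriver.common.by import By  # requires selenium installed
    # Selector is an assumption -- inspect the listing page for the real one.
    img = driver.find_element(By.CSS_SELECTOR, "img[src*='center=']")
    return parse_center(img.get_attribute("src"))
```

If the map is rendered by JavaScript instead of a static image, the coordinates are usually still in the page somewhere (a data attribute or an inline script); searching the page source for a `lat`/`lng` pair is a good first step.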

How to get coordinates (longitude and latitude) of an object on a Google map with Selenium?

I am using Selenium (Python, Firefox) to test a website's behavior. At a certain point it gets to a web page containing a Google map with one object on it. I need to get that object's coordinates (longitude and latitude) on the map.
Do you have any idea how it could be done? I can't modify the code on the website.

How can I input data into a webpage to scrape the resulting output using Python?

I am familiar with BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to be entered into the page before the result that I want to scrape is returned?
I'm trying to obtain the geographic distance between two addresses using this website: http://www.freemaptools.com/how-far-is-it-between.htm
I want to be able to go to the page, enter two addresses, click "Show", and then extract the "Distance as the Crow Flies" and "Distance by Land Transport" values and save them to a dictionary.
Is there any way to input data into a webpage using Python?
Take a look at tools like mechanize or scrape:
http://pypi.python.org/pypi/mechanize
http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
http://www.ibm.com/developerworks/linux/library/l-python-mechanize-beautiful-soup/
http://zesty.ca/scrape/
Packt Publishing has an article on that matter, too:
http://www.packtpub.com/article/web-scraping-with-python
Yes! Try mechanize for this kind of Web screen-scraping task.
I think you can also use PySide/PyQt, because they ship a QtWebKit browser core; you can control the browser to open pages, simulate human actions (fill, click, ...), and then scrape data from the pages. FMiner, a web-scraping tool I developed with PySide, works this way.
You could also try PhantomJS; it's an easy way to control a browser, but note that it's scripted in JavaScript, not Python.
In addition to the answers already given, you could simply issue the request the page itself makes. Using your browser, inspect the Network tab (under Tools / Web Developer Tools) while you interact with the page. E.g. http://www.freemaptools.com/ajax/getaandb.php?a=Florida_Usa&b=New%20York_Usa&c=6052 is the request that returns the results you are expecting. Request that URL directly and scrape the fields you want. IMHO, direct page requests are much faster than screen scraping (on a case-to-case basis).
But of course, you can always do screen scraping / browser simulation as well (Mechanize, Splinter) and use headless browsers (PhantomJS, etc.) or the browser driver of the browser you want to use.
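A sketch of the direct-request idea above, built around the getaandb.php endpoint observed in the Network tab. The response format is an assumption; inspect an actual response before writing the parsing code:

```python
# Query the site's own AJAX endpoint instead of driving a browser.
import urllib.parse
import urllib.request

def distance_query_url(addr_a, addr_b):
    """Build the request URL observed in the browser's Network tab."""
    # The meaning of the "c" parameter is unknown; 6052 is the value
    # seen in the captured request, reused here as-is.
    params = urllib.parse.urlencode({"a": addr_a, "b": addr_b, "c": "6052"})
    return "http://www.freemaptools.com/ajax/getaandb.php?" + params

def fetch_distance(addr_a, addr_b):
    """Fetch the raw response; parse it after inspecting its format."""
    with urllib.request.urlopen(distance_query_url(addr_a, addr_b)) as resp:
        return resp.read().decode("utf-8")
```

Once you see what the response looks like, extracting the "Distance as the Crow Flies" and "Distance by Land Transport" values into a dictionary is a small parsing step.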
The query may have been resolved already, but: you can use Selenium WebDriver for this purpose. A web page can be driven from a programming language, and all operations can be performed as if a human user were accessing the page.
