Scraping Interactive Line Charts with Python

I would like to know how to extract the data that pops up when the mouse hovers over a certain time frame inside a graph panel.
The website is
https://app.truflation.com/
[Data graph image]
I tried using Selenium to scrape the popup data. There are solutions at https://towardsdatascience.com/scraping-interactive-charts-with-python-2bc20a9c7f5c and https://youtu.be/lTypMlVBFM4, but the difference here is that I cannot capture the XPath of the popup message.
from selenium import webdriver

DRIVER_PATH = '/Users/hudso/OneDrive/Documents/UST course/chromedriver.exe'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
driver.get("https://app.truflation.com/")

# find_elements (plural) is needed to get an iterable list of matches
datas = driver.find_elements_by_class_name('overlay')
for data in datas:
    # XPath attribute selectors use @id, not #id
    Date = data.find_element_by_xpath('.//*[@id="trufrate-timeframe"]/g[2]/text[1]').text
    Inflation = data.find_element_by_xpath('.//*[@id="trufrate-timeframe"]/g[2]').text
    print(Date, Inflation)
I tried many different versions and this one seems to be the closest. I have not yet managed to print Date and Inflation, so I have not applied the results to a dataframe.
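One alternative worth considering: dashboards like this usually render the chart from a JSON endpoint, so instead of scraping the tooltip DOM you can fetch the series directly (find the endpoint in the browser DevTools Network tab). The payload shape below is a hypothetical sketch, not Truflation's real API; the key names are assumptions you would replace after inspecting the actual response.

```python
import json

# Hypothetical payload shape -- inspect the real endpoint in the
# browser's Network tab; the "data"/"date"/"value" keys are assumptions.
raw = '{"data": [{"date": "2023-01-01", "value": 6.5}, {"date": "2023-01-02", "value": 6.4}]}'

def parse_points(payload):
    """Turn a JSON time-series payload into (date, value) pairs."""
    return [(p["date"], p["value"]) for p in json.loads(payload)["data"]]

print(parse_points(raw))  # [('2023-01-01', 6.5), ('2023-01-02', 6.4)]
```

Pairs like these can then be loaded straight into a dataframe, skipping the hover/popup problem entirely.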

Related

How to scrape a map on a website for information stored in popups?

I'm relatively new to web scraping, so I'm not sure which approach to use in a scenario where the information is stored on a map and displayed in popups, such as: https://utils.ocim.fr/cartocim2/
Basically:
the website shows a map,
contact information is displayed in popups,
a popup appears when you click on a geo-tag button,
the targeted information is the lines stored in that popup.
I was thinking of using the Selenium + XPath method, but I'm unsure how to deal:
with the number of buttons that have to be clicked,
with the popups.
Would you have any resources / tips to advise me where to start?
With great difficulty
Here's a start, but it gets a little more complicated as the markers start overlapping, so clicking the elements fails; you might need to add a step to zoom in, etc.
from selenium import webdriver
import time

url_base = r'https://utils.ocim.fr/cartocim2/'
driver = webdriver.Chrome(r'C:\Users\username\Downloads\chromedriver_win32\chromedriver.exe')
driver.get(url_base)  # open page

# find all the icons
links = driver.find_elements_by_css_selector('div.leaflet-pane.leaflet-marker-pane > img')

output = []  # temp table to append into
for i in range(5):  # change to len(links) when done
    links[i].click()  # click on the icon
    # XPath attribute selectors use @id, not #id
    output.append(driver.find_elements_by_xpath('//*[@id="popup-header"]')[0].text)  # get the text of the name
    time.sleep(1)
    # reset the map - needed as without it the next icon might not be on
    # screen due to map relocation or popup overlap
    driver.find_element_by_css_selector('#initmap').click()
    time.sleep(1)
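Since overlapping markers make clicks fail intermittently, one way to harden the loop above is a small retry helper that resets the map between attempts. This is a generic sketch, independent of Selenium itself: `click_fn` and `reset_fn` are placeholders for the marker click and the `#initmap` reset shown above.

```python
def click_with_retries(click_fn, reset_fn, attempts=3):
    """Call click_fn(); on failure, call reset_fn() and retry."""
    last_err = None
    for _ in range(attempts):
        try:
            return click_fn()
        except Exception as err:  # e.g. ElementClickInterceptedException
            last_err = err
            reset_fn()  # reset the map so the marker is visible again
    raise last_err
```

In the loop above you would wire it up as something like `click_with_retries(links[i].click, lambda: driver.find_element_by_css_selector('#initmap').click())`.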

How to "open in a new tab" with selenium and python?

I'm writing a bot with Python/Selenium.
In my process, I want :
to right-click on a picture
open it in a new Chrome tab
I tried the following :
def OpenInNewTab(self):
    # requires: from selenium.webdriver import ActionChains
    #           from selenium.webdriver.common.keys import Keys
    content = self.browser.find_element_by_class_name('ABCD')
    action = ActionChains(self.browser)
    action.move_to_element(content).perform()
    action.context_click().perform()
    action.send_keys(Keys.DOWN).perform()
    action.send_keys(Keys.ENTER).perform()
However, the problem is that my bot:
opens the contextual menu,
scrolls down on the page and not in the contextual menu.
After a lot of research, I tried:
import win32com.client as comclt
wsh = comclt.Dispatch("WScript.Shell")
wsh.SendKeys("{DOWN}")
wsh.SendKeys("{ENTER}")
However, it's not really working.
I saw many other topics, like this one (supposing there is an href associated with the pic).
So I'm a little lost about how to do this simple thing: right-click on an element to open the contextual menu and select "open in a new tab". I'm open to any advice / new road to follow.
In my experience it will be difficult to achieve a perfect "one size fits all" solution involving the (context menu - new tab) combination, and I tend to steer clear of all the headaches it can bring.
My strategy would be a bit different, and, on a case by case basis, I'd use something like:
base_window = driver.current_window_handle # this goes after you called driver.get(<url here>)
my_element=driver.find_element_by_xpath(...) #or whatever identification method
my_element.send_keys(Keys.CONTROL + Keys.ENTER)
driver.switch_to.window(driver.window_handles[1]) #switch to newly opened tab
driver.switch_to.window(base_window) # switch back to the initial tab
An alternative workaround is using hrefs - first open a new tab, then load the fetched href(s). Here's an example:
url='https://www.instagram.com/explore/tags/cars/?hl=en'
driver.get(url)
base_window = driver.current_window_handle
a_tags=driver.find_elements_by_xpath("//div[@class='v1Nh3 kIKUG _bz0w']//a")
hrefs=[a_tag.get_attribute('href') for a_tag in a_tags] #list of hrefs
driver.execute_script("window.open();") #open new tab
driver.switch_to.window(driver.window_handles[1]) #switch to new tab
driver.get(hrefs[0]) #get first href for example
driver.close() #close new tab
driver.switch_to.window(base_window) #back to initial tab
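One caveat: `driver.window_handles[1]` assumes exactly two tabs are open. A more robust approach (sketched here as a plain helper, independent of Selenium) is to record the handle list before opening the tab and diff it afterwards:

```python
def new_handle(before, after):
    """Return the window handle present in `after` but not in `before`."""
    opened = set(after) - set(before)
    if len(opened) != 1:
        raise RuntimeError("expected exactly one new window, got %d" % len(opened))
    return opened.pop()
```

Used with the code above: `before = driver.window_handles`, then `driver.execute_script("window.open();")`, then `driver.switch_to.window(new_handle(before, driver.window_handles))`.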

HTML snapshots using the selenium webdriver?

I am trying to capture all the visible content of a page as text. Let's take that one as an example.
If I store the page source then I won't be capturing the comments section because it's loaded using javascript.
Is there a way to take HTML snapshots with selenium webdriver?
(Preferably expressed using the python wrapper)
Regardless of whether the HTML of the page is generated using JavaScript, you will still be able to capture it using driver.page_source.
I imagine the reason you haven't been able to capture the source of the comments section in your example is that it's contained in an iframe. In order to capture the HTML source for content within a frame/iframe, you'll need to first switch focus to that particular frame and then call driver.page_source.
This code will take a screenshot of the entire page:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://dukescript.com/best/practices/2015/11/23/dynamic-templates.html')
driver.save_screenshot('screenshot.png')
driver.quit()
however, if you just want a screenshot of a specific element, you could use this:
def get_element_screenshot(element: WebElement) -> bytes:
    driver = element._parent
    ActionChains(driver).move_to_element(element).perform()  # scroll element into view
    src_base64 = driver.get_screenshot_as_base64()
    scr_png = b64decode(src_base64)
    scr_img = Image(blob=scr_png)
    x = element.location["x"]
    y = element.location["y"]
    w = element.size["width"]
    h = element.size["height"]
    scr_img.crop(
        left=math.floor(x),
        top=math.floor(y),
        width=math.ceil(w),
        height=math.ceil(h))
    return scr_img.make_blob()
Where the WebElement is the element you're after. Of course, this method requires you to import b64decode from base64 and Image from wand.image to handle the cropping.
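One pitfall with cropping out of a full-page screenshot: on high-DPI displays the screenshot is larger than the CSS-pixel coordinates Selenium reports, so the box must be scaled. A small sketch of that arithmetic follows; in practice the scale factor would come from something like `driver.execute_script("return window.devicePixelRatio;")` (that wiring is an assumption, not part of the answer above).

```python
import math

def crop_box(location, size, dpr=1.0):
    """(left, top, right, bottom) in screenshot pixels for an element,
    scaled by the device pixel ratio."""
    left = math.floor(location["x"] * dpr)
    top = math.floor(location["y"] * dpr)
    right = math.ceil((location["x"] + size["width"]) * dpr)
    bottom = math.ceil((location["y"] + size["height"]) * dpr)
    return left, top, right, bottom

print(crop_box({"x": 10, "y": 20}, {"width": 30, "height": 40}, dpr=2.0))
# (20, 40, 80, 120)
```

Flooring the top-left corner and ceiling the bottom-right keeps the element fully inside the box after scaling.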

How to remove black space from png data pulled from selenium?

I have the following png binary data that I was able to pull from a page utilizing selenium with the following code:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.polyvore.com/cgi/img-thing?.out=jpg&size=l&tid=39713077')
data = driver.get_screenshot_as_png()
However, the image looks like the following and I'd like to remove the black space around it:
The image is located here: http://www.polyvore.com/cgi/img-thing?.out=jpg&size=l&tid=39713077
Is there a way to remove the black space utilizing the binary data or get selenium to pull only the image and not the black background?
I've tried to utilize PIL, but I've only found ways to remove white space and not black space, plus it's difficult to turn the result back into binary data, which I need.
I've also looked into the PNG module, but I couldn't figure out how to turn it back into binary either.
One solution would be to directly get the screenshot of the image:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.polyvore.com/cgi/img-thing?.out=jpg&size=l&tid=39713077')
element = driver.find_element_by_css_selector("img")
image = element.screenshot_as_png  # a property, not a method
But unfortunately Firefox doesn't yet implement this feature.
Another way would be to crop the screenshot to the targeted element:
import io
from selenium import webdriver
from PIL import Image

driver = webdriver.Firefox()
driver.get('http://www.polyvore.com/cgi/img-thing?.out=jpg&size=l&tid=39713077')
element = driver.find_element_by_css_selector("img")
rect = driver.execute_script("return arguments[0].getBoundingClientRect();", element)
screenshot = driver.get_screenshot_as_png()
img = Image.open(io.BytesIO(screenshot))
# PIL's crop box is (left, top, right, bottom), not (left, top, width, height)
img_cropped = img.crop((rect['x'], rect['y'],
                        rect['x'] + rect['width'],
                        rect['y'] + rect['height']))
img_cropped.save('screenshot.png')
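If the black border itself is what you want to strip (rather than cropping to a known element), the trimming is simple arithmetic: find the bounding box of the non-black pixels and crop to it. With PIL the equivalent one-liner is roughly `img.crop(img.getbbox())`; the pure-Python sketch below shows the same idea on a grid of grayscale values so the logic is explicit.

```python
def nonblack_bbox(pixels, threshold=0):
    """Bounding box (left, top, right, bottom) of pixels > threshold.

    `pixels` is a list of rows of grayscale values; returns None if
    the whole image is black. right/bottom are exclusive, matching
    PIL's crop-box convention.
    """
    coords = [(x, y)
              for y, row in enumerate(pixels)
              for x, v in enumerate(row)
              if v > threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```

A nonzero `threshold` tolerates near-black compression artifacts around the image.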

Python Selenium: click() cannot trigger event

I'm trying to use Selenium to mimic my actions on a website that converts PDF files to Excel files. There are three steps to complete the conversion:
Upload the PDF file.
Input email address.
Click the 'convert' button.
I wrote the code below. However, every time I click the button the page just refreshes without actually converting the file.
from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get("https://pdftoexcelonline.com/en/")
# Upload file
el_upload = driver.find_element_by_name("file")
el_upload.send_keys("/path/to/the/file")
# Input email
el_email = driver.find_element_by_name("email")
el_email.clear()
el_email.send_keys("<email address>")
# Convert button
el_button = driver.find_element_by_id("convert_now")
el_button.click()
time.sleep(10)
driver.close()
This page works well when I complete the steps manually. What is the reason my code did not trigger the conversion?
One possible reason is that the actions are not given enough time to execute. You can add a short sleep after each action to verify this; treat it as a workaround if it works.
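A more reliable fix than fixed sleeps is to poll until a condition actually holds, which is what Selenium's WebDriverWait does under the hood. Here is a generic sketch of that polling loop; in practice the predicate would wrap a Selenium check (for example, that a result/download element has appeared after clicking convert - the exact element to wait for depends on the page and is not specified in the question).

```python
import time

def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll predicate() until it returns truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within %.1fs" % timeout)
```

With Selenium itself you would instead use `from selenium.webdriver.support.ui import WebDriverWait` and `WebDriverWait(driver, 10).until(...)`, which additionally swallows intermittent lookup errors between polls.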
