How to extract an Openload link from a website using Selenium in Python

I have been trying to extract the src, which is an Openload link, from a website.
The src is located in an iframe that is loaded dynamically.
The website is "https://www1.fmovies.se/film/daddys-home-2.kk29w".
The problem is that the iframe is loaded dynamically. This is my code:
from selenium import webdriver
# Raw string so the backslashes in the Windows path are taken literally
driver=webdriver.Chrome(r'C:\Users\aman krishna\Desktop\New folder(3)\chromedriver.exe')
driver.get("https://bmovies.to/film/daddys-home-2.kk29w/78vp5j")
# Look for the iframe whose src attribute contains the Openload embed URL
driver.find_element_by_xpath("//iframe[contains(@src,'https://openload.co/embed/qe3n5GZGyGo/?autostart=true')]")

I could spoon-feed you the code, since I wrote my own scraper in Python that is eerily similar to what you've posted, but that wouldn't help you in the long run.
I'll give you a hint, though: use var box and var frames to get what you need.
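
One general way to handle an iframe that only appears after the page's scripts have run is to poll until it exists and then read its src. The sketch below is not the approach hinted at above, since the hint does not spell out what `var box` and `var frames` hold. The function name `wait_for_iframe_src`, its parameters, and the polling interval are assumptions made for illustration; it relies only on `find_elements(by, value)` and `get_attribute(name)`, which Selenium drivers and elements provide.

```python
import time


def wait_for_iframe_src(driver, fragment, timeout=15.0, poll=0.5):
    """Poll the page until an iframe whose src contains `fragment` appears.

    `driver` is expected to be a Selenium WebDriver. Returns the src string,
    or None if no matching iframe shows up before `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the string value of Selenium's By.TAG_NAME locator.
        for frame in driver.find_elements("tag name", "iframe"):
            src = frame.get_attribute("src") or ""
            if fragment in src:
                return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

After `driver.get(...)`, a call such as `wait_for_iframe_src(driver, "openload.co/embed")` would return the embed URL once the iframe has been injected, assuming the iframe sits in the top-level document rather than inside another frame.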

Related

How can I click on 'Show more matches' on the Flashscore website using Selenium library in Python to scrape hidden information?

I am working on scraping data from the Flashscore website.
https://www.flashscore.com/football/albania/superliga-2019-2020/results/
Although I can find the links for most of the matches that are visible once the above page loads, there are many matches that are hidden and can only be accessed by clicking on 'Show more matches'.
Snapshot of the page
I found the class for 'Show more matches' (event__more event__more--static) and used the '.click()' method of the selenium library in Python but the output is null. Also, I tried various other implementations of clicking this link but couldn't get it working.
Is there any other way I can click on the link and extract the information in Python? Any help would be greatly appreciated.
Note: I also haven't found any classes where all of this information is hidden.
You can use the execute_script() driver method to achieve this. It's used for executing JavaScript in the current window/frame.
You can find the code snippet below:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.flashscore.com/football/albania/superliga-2019-2020/results/')
show_more_button = driver.find_element(By.XPATH, '//*[@id="live-table"]/div[1]/div/div/a')  # find the 'Show more matches' link
driver.execute_script("arguments[0].click();", show_more_button)  # click it via JavaScript
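
If the page reveals matches in batches, a single click may not be enough. A minimal sketch of repeating the JavaScript click until the link is gone follows. The helper name `click_all_show_more`, the `max_clicks` cap, the fixed pause, and the assumption that the link disappears once every match is shown are all illustrative and not taken from the answer.

```python
import time


def click_all_show_more(driver, xpath, max_clicks=50, pause=1.0):
    """Click the element matched by `xpath` via JavaScript until it disappears.

    `driver` is expected to be a Selenium WebDriver.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list instead of raising when nothing
        # matches; "xpath" is the string value of Selenium's By.XPATH locator.
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break
        driver.execute_script("arguments[0].click();", buttons[0])
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of rows to load
    return clicks
```

With the class name mentioned in the question, a call could look like `click_all_show_more(driver, '//a[contains(@class, "event__more")]')`. The `max_clicks` cap is there so the loop ends even if the link never disappears.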

Python WebScraping Confusion

I tried to scrape an HTML page, https://streamelements.com/logna/leaderboard, but the HTML that I can see in Inspect Element in Firefox is different from the HTML source code of the page.
Is it possible to scrape pages like this, or is there a way to get the code you can see through Inspect Element?
The HTML shown by the inspect tool may differ from the original source code because the browser runs the page's JavaScript, which changes the document after the initial HTML is loaded. So when scraping, work with the HTML as the browser renders it, not the original source code.
Hope this helps.
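
A minimal sketch of getting the HTML as the browser currently sees it, assuming Selenium with Firefox is available. The helper name `rendered_html` is hypothetical, and content that arrives asynchronously may still be missing if the DOM is read too early.

```python
def rendered_html(driver, url):
    """Load `url` in a Selenium-driven browser and return the live DOM as HTML."""
    driver.get(url)
    # Serialize the document as it currently exists in the browser, i.e. after
    # the scripts that have run so far, not the raw HTTP response body.
    return driver.execute_script("return document.documentElement.outerHTML;")


def main():
    # Requires the selenium package and a Firefox driver; not run on import.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        html = rendered_html(driver, "https://streamelements.com/logna/leaderboard")
        print(len(html))
    finally:
        driver.quit()
```

Calling `main()` opens a browser window, so the helper is kept separate and can be reused with any driver that is already running.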

How to get HTML code from elements which don't appear in the source code with selenium?

I'm working on an app that uses web scraping, but I'm having a hard time figuring out how to get some data from a web page. I can see the info that I'm looking for when I use "Inspect Element" in Firefox:
The thing is that it doesn't appear in the HTML code of the page, which I can get using Selenium. The data I'm looking for is obviously database-driven, and I'm stuck right there. Is there a way to scrape this with Selenium?
This is the URL, by the way: http://2ez.gg/#gg?name=Doombag&server=lan
You should probably scrape http://lan.op.gg/summoner/userName=Doombag instead. http://2ez.gg/#gg?name=Doombag&server=lan contains an iframe, which is why you can't find 55% in the document body.
The data you want is inside the iframe, so Selenium cannot reach it directly from the top-level document.
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
URL = 'http://2ez.gg/#gg?name=Doombag&server=lan'
driver.get(URL)
# Switch into the iframe before searching for elements inside it
driver.switch_to.frame('iframe-content')
elem = driver.find_element(By.CSS_SELECTOR, '.WinRatioGraph div.Text')
print(elem.text)
Output: 55%
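
After `switch_to.frame(...)`, later lookups stay inside that iframe until the driver is switched back. A small sketch that wraps the lookup and always returns to the top-level document is shown here; the helper name `text_inside_frame` is an assumption and not part of the answer.

```python
def text_inside_frame(driver, frame_ref, css_selector):
    """Return the text of the first element matching `css_selector` inside a frame.

    `driver` is expected to be a Selenium WebDriver; `frame_ref` may be the
    frame's name or id, its index, or a frame WebElement.
    """
    driver.switch_to.frame(frame_ref)
    try:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR locator.
        return driver.find_element("css selector", css_selector).text
    finally:
        # Switch back even if the lookup fails, so later code searches the
        # top-level document again.
        driver.switch_to.default_content()
```

For the page above, the call would be `text_inside_frame(driver, 'iframe-content', '.WinRatioGraph div.Text')`.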

How to extract request url using python selenium

I am new to web scraping and need some help extracting a request URL from the online movie-streaming website YIFY. I am familiar with how Selenium works, and I am trying to find the download URL of the movie Revenant.
Using python-selenium I can click on the play icon, and if you open Inspect Element and go to the Network tab you can see the request URL, but you can't find it by inspecting the page elements.
Download link - http://download1282.mediafire.com/3pvv1jx9z23g/crdad7bg0ghjh7r/vid.pdf
I am trying to extract this particular download link using python-selenium. Could anyone tell me if it is possible? I am not trying to download the movie, just checking whether it is possible to extract the links from it. The links are not embedded in the HTML page, and I would highly appreciate any help.

omegle lxml scrape not working

So I'm performing a scrape of omegle trying to scrape the users online.
This is the HTML code:
<div id="onlinecount">
<strong>
30,000+
</strong>
</div>
Now I would presume that using lxml it would be //div[@id="onlinecount"] to scrape any text within the div. I want to get the numbers from the <strong> tags, but when I try to scrape this, I just end up with an empty list.
Here's my relevant code:
import requests
from lxml import html
print("\n Grabbing users online now from", self.website)
site = requests.get(self.website)
tree = html.fromstring(site.text)
users = tree.xpath('//div[@id="onlinecount"]')
Note that the self.website variable is just http://www.omegle.com
Any ideas what I'm doing wrong? Note I can scrape other parts just not the number of online users.
I ended up using a different set of code which I learned from a friend.
Here's my full code for anyone interested.
http://pastebin.com/u1kTLZtJ
When you send a GET request to "http://www.omegle.com" using the requests Python module, there is no "onlinecount" in site.text. The reason is that this part is rendered by JavaScript. You should use a library that can execute the JavaScript and give you the final HTML as rendered in a browser. One such third-party library is Selenium: http://selenium-python.readthedocs.org/. The only downside is that it opens a real web browser.
Below is working code using Selenium:
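from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("http://www.omegle.com")
element = browser.find_element(By.ID, "onlinecount")
onlinecount = element.find_element(By.TAG_NAME, "strong")
print(onlinecount.text)  # text of the <strong> element holding the count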
You can also send a GET request to http://front1.omegle.com/status,
which returns the count of online users and other details as JSON.
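
A minimal sketch of reading that status endpoint with the standard library only. The field name `count` is an assumption about the JSON layout and should be checked against a real response; the function names are hypothetical as well.

```python
import json
from urllib.request import urlopen


def parse_online_count(payload):
    """Return the user count from a status JSON string, or None if absent."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        return None
    # "count" is an assumed field name; inspect the real response to confirm it.
    value = data.get("count")
    return int(value) if value is not None else None


def fetch_online_count(url="http://front1.omegle.com/status", timeout=10):
    """Download the status document and extract the count (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_online_count(response.read().decode("utf-8"))
```

Keeping the parsing apart from the download makes it possible to check the parsing against a saved response without opening a browser or hitting the site.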
I have done a bit of looking at this, and that particular part of the page is not in the HTML markup but is filled in by JavaScript.
Here is the source (this is what the requests library is returning in your program)
<div id="onlinecount"></div>
<script>
if (IS_MOBILE) {
$('sharebuttons').dispose();
$('onlinecount').dispose();
}
</script>
</div>
As you can see, in lxml's eyes there is nothing but a script in the onlinecount div.
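
The difference can be reproduced offline. The sketch below uses the standard-library `html.parser` instead of lxml and runs the same extraction over two hand-written strings: one modeled on the static source quoted above and one on the rendered markup from the question. The class and function names are illustrative, and the depth counting assumes there are no void elements such as `<br>` inside the target.

```python
from html.parser import HTMLParser


class TextByIdCollector(HTMLParser):
    """Collect the text found inside the element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def text_by_id(html_text, target_id):
    parser = TextByIdCollector(target_id)
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)


# What a plain HTTP client receives: the div is empty until a script fills it.
STATIC_SOURCE = '<div id="onlinecount"></div><script>/* fills the div later */</script>'
# What the browser shows after the script has run.
RENDERED_DOM = '<div id="onlinecount"><strong>30,000+</strong></div>'

print(repr(text_by_id(STATIC_SOURCE, "onlinecount")))  # ''
print(repr(text_by_id(RENDERED_DOM, "onlinecount")))   # '30,000+'
```

The same parser finds nothing in the first string and the count in the second, which matches the empty list seen in the question.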
I agree with Praveen.
If you want to avoid launching a visible browser, you could use PhantomJS, which also has a Selenium driver:
http://phantomjs.org/
PhantomJS is a headless WebKit scriptable with a JavaScript API.
Instead of Selenium scripts, you could also write PhantomJS JavaScript scripts (but I assume you prefer to stay in a Python environment ;))
