Scrape data in JS Pop Up Python - python

I want to know that can we scrape data of a specific field from a pop up generated on a page using Python ? If yes, please suggest.
I am trying to scrape it, but it is not getting detected and return me an empty list. I am using Python and Beautiful soup to do the job .

You can not use scrapy for JS
You can also solve it with ScrapyJS :
This library provides Scrapy+JavaScript integration using Splash.
Follow the installation instructions for Splash and ScrapyJS,
READ the answer here
OR
You can use Selenium:
Follow the installation instructions Selenium Doc

Related

Webscraping elements with python

Im currently using beautiful soup to try and webscrape a website for data however the python module is reading the source code of the page. In the source code of the page the information i need isn't there however if i right click on the page in chrome and inspect element it is.
i was wondering if there was any way a python module could scrape the elements from a webpage and not the source code
In beautiful soup ive tried to search for the elements like however they just dont come up or appear because its searching in the source code. Im also not sure why or how it doesnt appear there.
When the contents are loaded by JavaScript, you can not get the data via Beautiful Soup. In this situation, the Selenium library is used as it is more useful and handy to extract the required dynamic contents.

Scraping webpage generated by javascript

I have a problem getting javascript content into HTML to use it for scripting. I used multiple methods as phantomjs or python QT library and they all get most of the content in nicely but the problem is that there are javascript buttons inside the page like this:
Pls see screenshot here
Now when I load this page from a script these buttons won't default to any value so I am getting back 0 for all SELL/NEUTRAL/BUY values below. Is there a way to set these values when you load the page from a script?
Example page with all the values is: https://www.tradingview.com/symbols/NEBLBTC/technicals/
Any help would be greatly appreciated.
If you are trying to achieve this with scrapy or with derivation of cURL or urrlib I am afraid that you can't do this. Python has another external packages such selenium that allow you to interact with the javascript of the page, but the problem with selenium is too slow, if you want something similar to scrapy you could check how the site works (as i can see it works through ajax or websockets) and fetch the info that you want through urllib, like you would do with an API.
Please let me know if you understand me or i misunderstood your question
I used seleneum which was perfect for this job, it is indeed slow but fits my purpose. I also used the seleneum firefox plugin to generate the python script as it was very challenging to find where exactly in the code as the button I had to press.

Python - Scrapy ecommerce website

I'm trying to scrape the price of this product
http://www.asos.com/au/fila/fila-vintage-plus-ringer-t-shirt-with-small-logo-in-green/prd/9065343?clr=green&SearchQuery=&cid=7616&gridcolumn=2&gridrow=1&gridsize=4&pge=1&pgesize=72&totalstyles=4699
With the following code but it returns an empty array
response.xpath('//*[#id="product-price"]/div/span[2]/text()').extract()
Any help is appreciated, Thanks.
Because the site is dynamic(this is what I got when I use view(response) command in scrapy shell:
As you can see, the price info doesn't come out.
Solutions:
1. splash.
2. selenium+phantomJS
It might help also by checking this answer:Empty List From Scrapy When Using Xpath to Extract Values
The price is later added by the browser which renders the page using javascript code found in the html. If you disable javascript in your browser, you would notice that the page would look a bit different. Also, take a look at the page source, usually that's unaltered, to see that the tag you're looking for doesn't exist (yet).
Scrapy doesn't execute any javascript code. It receives the plain html and that's what you have to work with.
If you want to extract data from pages which look the same as in the browser, I recommend using an headless browser like Splash (if you're already using scrapy): https://github.com/scrapinghub/splash
You can programaticaly tell it to download your page, render it and select the data points you're interested in.
The other way is to check for the request made to the Asos API which asks for the product data. In your case, for this product:
http://www.asos.com/api/product/catalogue/v2/stockprice?productIds=9065343&currency=AUD&keyStoreDataversion=0ggz8b-4.1&store=AU
I got this url by taking a look at all the XMLHttpRequest (XHR) requests sent in the Network tab found in Developers Tools (on Google Chrome).
You can try to find JSON inside HTML (using regular expression) and parse it:
json_string = response.xpath('//script[contains(., "function (view) {")]/text()').re_first( r'view\(\'([^\']+)' )
data = json.loads(json_string)
price = data["price"]["current"]

How to extract request url using python selenium

I am new to webscraping and need some help to extract a request-url from the online movie-stream website YIFY. I am familiar with how selenium works and I am trying to find the download url of the movie Revenant.
Using python-selenium I can click on the play icon and if you open your inspect element and go the network tab then you can see the request-url but you can't do inspect element and find it.
Download link - http://download1282.mediafire.com/3pvv1jx9z23g/crdad7bg0ghjh7r/vid.pdf
I am trying to extract this particular download link using python-selenium. Could anyone tell me if it is possible? Well I am not trying to download the movie but checking if it is possible to download the links from it. Here the links are not embedded in the html page, and I will highly appreciate any help.

Browse links recursively using selenium

I'd like to know if is it possible to browse all links in a site (including the parent links and sublinks) using python selenium (example: yahoo.com),
fetch all links in the homepage,
open each one of them
open all the links in the sublinks to three four levels.
I'm using selenium on python.
Thanks
Ala'a
You want "web-scraping" software like Scrapy and possibly Beautifulsoup4 - the first is used to build a program called a "spider" which "crawls" through web pages, extracting structured data from them, and following certain (or all) links in them. BS4 is also for extracting data from web pages, and combined with libraries like requests can be used to build your own spider, though at this point something like Scrapy is probably more relevant to what you need.
There are numerous tutorials and examples out there to help you - just start with the google search I linked above.
Sure it is possible, but you have to instruct selenium to enter these links one by one as you are working within one browser.
In case, the pages are not having the links rendered by JavaScript in the browser, it would be much more efficient to fetch these pages by direct http request and process it this way. In this case I would recommend using requests. However, even with requests it is up to your code to locate all urls in the page and follow up with fetching those pages.
There might be also other Python packages, which are specialized on this kind of task, but here I cannot serve with real experience.

Categories