Python Scrapy: imitate filter click - python

Hi guys. I'm new to Scrapy and still learning how to work with it. I've run into an issue I can't figure out; perhaps someone more experienced can help. I have a basic website with a list of items I want to parse and download. The page has filters, and clicking one narrows the list of items. I want to download the data after applying one filter, but I can't figure out how to do that. Clicking the filter does not change the page URL, but the page is reloaded, so I assume this is not AJAX. The filter is marked up in the HTML as a link with href="javascript:qsn.set('comm','0',1);". How can I imitate this filter click with Scrapy?
Any help will be appreciated.
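One possible starting point, as a minimal sketch rather than a confirmed solution: Scrapy does not run JavaScript, so the link itself cannot be "clicked", but the arguments inside the href can be read out and replayed in a normal request. The helper name parse_js_href is made up for illustration, and the sketch assumes the arguments are plain string or number literals, as in the href quoted above. What qsn.set actually does with them (query string, form POST, cookie) is not known from the question and has to be checked in the browser's network tab.

```python
import ast
import re

# Matches simple links such as: javascript:qsn.set('comm','0',1);
JS_CALL = re.compile(r"^javascript:\s*([\w.]+)\((.*)\)\s*;?\s*$")


def parse_js_href(href):
    """Return (function_name, args) for a simple javascript: link, or None.

    Hypothetical helper: it only handles calls whose arguments are plain
    literals (quoted strings, numbers), which Python's literal parser can read.
    """
    match = JS_CALL.match(href.strip())
    if match is None:
        return None
    name, raw_args = match.groups()
    # Wrap the raw argument list in a tuple literal and evaluate it safely.
    args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    return name, args


name, args = parse_js_href("javascript:qsn.set('comm','0',1);")
# Assumption: the first two arguments are a filter name and its value.
filter_params = {args[0]: args[1]}
print(name, filter_params)
```

If the network tab shows that the reload is a form POST carrying these values, a dict like filter_params could be passed as formdata to scrapy.FormRequest; if they travel as a cookie or query string instead, the request has to be built accordingly.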

Related

Is there a way to get HTML text through python and a web browser?

I have a website for work and I need to go through a list of numbers and determine if the user associated with the number is still active. The website requires a sign in so I can't use requests. Is there a way I can run it through my chrome browser to get the information I require?
If I can get the HTML then I am fine from there onward with the code.
Any help would be greatly appreciated
Can I have the webpage you are trying to access?

Webscraper in python where I provide a webpage that has a list of links which the scraper then visits individually

I am a beginner in programming and I am trying to make a scraper. As of right now I'm using the requests library and BeautifulSoup. I provide the program a link and I am able to extract any information I want from that single web page. What I am trying to accomplish is as follows... I want to provide a web page to the program, the web page that I provide is a search result where there is a list of links that could be clicked. I want the program to be able to get the links of those search results, and then scrape some information from each of those specific pages from the main web page that I provide.
If anyone can give me some sort of guidance on how I could achieve this I would appreciate it greatly! Are there some other libraries I should be using? Is there some reading material you could refer me to, maybe a video?
You can collect all the URLs in a list and then have your request-sending function loop through it. Use the requests or urllib package for this.
To find the links on the search results page, look for the <a> tags and read their href attribute.
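The two steps above could be sketched roughly like this, using only the standard library. The names LinkCollector and extract_links and the example.com URLs are made up for illustration; the same idea works with BeautifulSoup's find_all("a", href=True) if you prefer to stay with the libraries you already use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors and javascript: pseudo-links.
        if href and not href.startswith(("#", "javascript:")):
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical search results page, inlined so the sketch runs offline.
sample_html = '<a href="/item/1">One</a> <a href="#top">Top</a> <a href="/item/2">Two</a>'
for url in extract_links(sample_html, "https://example.com/search"):
    # In a real scraper, this is where each page would be fetched and parsed.
    print(url)
```

In practice you would fetch the search page first, pass its HTML to extract_links, and then request each returned URL in the loop and scrape it the same way you already scrape a single page.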

Scraping webpage generated by javascript

I have a problem getting JavaScript-generated content into the HTML so that I can use it in a script. I tried several approaches, such as PhantomJS and the Python Qt library, and they all load most of the content nicely, but the problem is that there are JavaScript buttons inside the page like this:
Pls see screenshot here
Now when I load this page from a script these buttons won't default to any value so I am getting back 0 for all SELL/NEUTRAL/BUY values below. Is there a way to set these values when you load the page from a script?
Example page with all the values is: https://www.tradingview.com/symbols/NEBLBTC/technicals/
Any help would be greatly appreciated.
If you are trying to do this with Scrapy, or with something derived from cURL or urllib, I am afraid you can't, because they do not execute the page's JavaScript. Python has external packages such as Selenium that let you interact with the JavaScript of the page, but the problem with Selenium is that it is slow. If you want something closer to Scrapy, you could check how the site works (as far as I can see, it loads its data through AJAX or WebSockets) and fetch the data you want with urllib, as you would with an API.
Please let me know if this makes sense or if I misunderstood your question.
I used Selenium, which was perfect for this job; it is indeed slow, but it fits my purpose. I also used the Selenium Firefox plugin to generate the Python script, as it was very challenging to find where exactly in the code the button I had to press was.

Python Selenium -- Searching for a link but finding cards

Using Python + Selenium to create a web crawler/scraper to notify me when new homework is posted. Managed to log into the main website, but you need to click a link to select your course.
After searching through the HTML manually, I found this information about the link I usually click (The blue box is the link).
However, there is no button that seems clickable. So I searched the page for the link I knew it should redirect me to, and I found this:
It looks like a card, which is a new data structure/object for me. How can I use an automated web crawler to click this link?
Try the following:
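from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC, ui
# Wait until the card title is visible, then click the element the wait returns (driver and timeout come from your script).
ui.WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".title.ellipsis"))).click()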
Hope it helps you!

Parsing a Dynamic Web Page using Python

I am trying to parse a web page whose HTML source changes when I press an arrow key to open a drop-down list.
I want to parse the contents of that drop down list. How can I do that?
Example of the Problem: If you go to this site: http://in.bookmyshow.com/hyderabad and select the arrow button on comboBox "Select Movie" a drop-down list of movies appears. I want to get a list of these movies.
Thanks in advance.
The actual URL with the data used to populate the drop-down box is here:
http://in.bookmyshow.com/getJSData/?file=/data/js/GetEvents_MT.js&cmd=GETEVENTSWEB&et=MT&rc=HYD&=1425299159643&=1425299159643
I'd be a bit careful though and double-check with the site terms of use or if there are any APIs that you could use instead.
You may want to have a look at Selenium. It lets you reproduce exactly the same steps you perform by hand, because it also drives a real browser (Firefox, Chrome, etc.).
Of course, it's not as fast as using mechanize, urllib, BeautifulSoup and the like, but it is worth a try.
You will need to dig into the JavaScript to see how that menu gets populated. If it is getting populated via AJAX, then it might be easy to get that content by re-doing a request to the same URL (e.g., do a GET to "http://www.example.com/get_dropdown_entries.php").
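Re-doing such a request could look roughly like the sketch below. It assumes the endpoint answers with JSON, which has to be verified for the real site (the data URL quoted earlier serves a .js file, so its body may need different parsing). The helpers build_url and fetch_json, the "city" parameter, and the example.com address are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(endpoint, params):
    """Append URL-encoded query parameters to an endpoint."""
    return f"{endpoint}?{urlencode(params)}"


def fetch_json(url, opener=urlopen, timeout=10):
    """GET the URL and decode the body as JSON.

    The opener argument exists only so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    # Some sites only answer AJAX-style requests; this header is a common
    # convention, not something the site in question is known to require.
    request = Request(url, headers={"X-Requested-With": "XMLHttpRequest"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameter name, for illustration only.
url = build_url("http://www.example.com/get_dropdown_entries.php", {"city": "HYD"})
print(url)
```

Calling fetch_json(url) would then return the decoded entries, provided the server really responds with JSON. The exact endpoint and parameters are best copied from the request shown in the browser's network tab when the drop-down is opened.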
