English is my second-language, sorry for poor English.
I made a widget that displays crawled data from web. The web has options(saved to localstorage) that changes the value of data.
I am using requests_html package for my widget to render web.
And I want my widget to read localstorage data so that both web and widget shows the same result according to the options.
I do not want to use selenium since it makes widget crawling speed slower and it is also complicate to match users' chrome version.
Is there any way to solve such situation? Anyway to read localstorage with python without selenium?
Related
I have a problem getting javascript content into HTML to use it for scripting. I used multiple methods as phantomjs or python QT library and they all get most of the content in nicely but the problem is that there are javascript buttons inside the page like this:
Pls see screenshot here
Now when I load this page from a script these buttons won't default to any value so I am getting back 0 for all SELL/NEUTRAL/BUY values below. Is there a way to set these values when you load the page from a script?
Example page with all the values is: https://www.tradingview.com/symbols/NEBLBTC/technicals/
Any help would be greatly appreciated.
If you are trying to achieve this with scrapy or with derivation of cURL or urrlib I am afraid that you can't do this. Python has another external packages such selenium that allow you to interact with the javascript of the page, but the problem with selenium is too slow, if you want something similar to scrapy you could check how the site works (as i can see it works through ajax or websockets) and fetch the info that you want through urllib, like you would do with an API.
Please let me know if you understand me or i misunderstood your question
I used seleneum which was perfect for this job, it is indeed slow but fits my purpose. I also used the seleneum firefox plugin to generate the python script as it was very challenging to find where exactly in the code as the button I had to press.
I am working on a project which needs to programmatically access and update a .aspx (ASP.NET) page. Specifically, I need to automatically access this page, use several html and JavaScript elements (click checkboxes, enter text in form fields, "click" buttons), and reload the page. Also, during the time the page is accessed, there is information being sent back and forth between the client and server.
What is the most efficient way to go about this? I am most likely thinking about writing something in bash + python to do this but I am not sure it is the best tool for the job.
Thanks
The optimal solution for your problem is using Selenium with python.
The selenium package is used to automate web browser interaction from Python.
pip install -U selenium
You can read the documentation to get familiar with the Selenium Webdriver API.
You cannot edit the pages that are hosted by others, but you can mimic the requests using selenium.
I'm trying to put together a little collection of plugins that I need in order to interact with html pages. What I need ranges from simple browsing and interacting with buttons or links of a web page (as is "write some text in this textbox and press this button") to parsing a html page and sending custom get/post messages to the server.
I am using Python 3 and up to now I have Requests for simple webpage loading, custom get and post messages,
BeautifulSoup for parsing the HTML tree and I'm thinking of trying out Mechanize for simple web page interactions.
Are there any other libraries out there that are similar to the 3 I am using so far? Is there some sort of gathering place where all Python libraries hang out? Because I sometimes find if difficult to find what I am looking for.
The set of tools/libraries for web-scraping really depends on the multiple factors: purpose, complexity of the page(s) you want to crawl, speed, limitations etc.
Here's a list of tools that are popular in a web-scraping world in Python nowadays:
selenium
Scrapy
splinter
ghost.py
requests (and grequests)
mechanize
There are also HTML parsers out there, these are the most popular:
BeautifuSoup
lxml
Scrapy is probably the best thing that happened to be created for web-scraping in Python. It's really a web-scraping framework that makes it easy and straightforward, Scrapy provides everything you can imagine for a web-crawling.
Note: if there is a lot AJAX and js stuff involved in loading, forming the page you would need a real browser to deal with it. This is where selenium helps - it utilizes a real browser allowing you to interact with it by the help of a WebDriver.
Also see:
Web scraping with Python
Headless Selenium Testing with Python and PhantomJS
HTML Scraping
Python web scraping resource
Parsing HTML using Python
Hope that helps.
Can I have any highlight kind of things using Python 2.7? Say when my script clicking on the submit button,feeding data into the text field or selecting values from the drop-down field, just to highlight on that element to make sure to the script runner that his/her script doing what he/she wants.
EDIT
I am using selenium-webdriver with python to automate some web based work on a third party application.
Thanks
This is something you need to do with javascript, not python.
[NOTE: I'm leaving this answer for historical purposes but readers should note that the original question has changed from concerning itself with Python to concerning itself with Selenium]
Assuming you're talking about a browser based application being served from a Python back-end server (and it's just a guess since there's no information in your post):
If you are constructing a response in your Python back-end, wrap the stuff that you want to highlight in a <span> tag and set a class on the span tag. Then, in your CSS define that class with whatever highlighting properties you want to use.
However, if you want to accomplish this highlighting in an already-loaded browser page without generating new HTML on the back end and returning that to the browser, then Python (on the server) has no knowledge of or ability to affect the web page in browser. You must accomplish this using Javascript or a Javascript library or framework in the browser.
Hello how can i make changes in my web browser with python? Like filling forms and pressing Submit?
What lib's should i use? And maybe someone of you have some examples?
Using urllib does not make any changes in opened browser for me
Urllib is not intended to do anyting with your browser, but rather to get contents from urls.
To fill in forms and this kind of things, have a look into mechanize, to scrap the webpages, consider using pyquery.
Selenium is great for this. It's a browser automation tool that you can use to launch a browser (any major browser or a 'headless' one), navigate to a url, and interact with the page.
It's used primarily for testing web code against multiple browsers, but is also very useful for 'scraping' pages and automating mundane tasks.
Here are the python docs: http://selenium-python.readthedocs.org/en/latest/index.html