Alright, I've written a script that goes into this local site and searches for a specific "Numero de Origem", a simple integer. That required clicking my way through a drop-down menu, typing the number, and reading the results on that same page (no redirects or search-specific URLs).
With all that in mind, I used Selenium and the script works fine.
Now I'm trying to run that script on a server. Unfortunately, the server supports neither Selenium, nor BeautifulSoup, nor even requests.
Is there a way to make this work using only urllib or urllib2? Here is a link to the supported Python modules, just in case.
Thanks, guys! Really appreciate any help!
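Often yes: the drop-down and the search box are just a form, so once you find the request that Selenium was triggering for you (watch the Network tab in your browser's dev tools while you run a search by hand), you can replay it directly. Here is a minimal sketch with urllib2; the URL and both field names are made-up placeholders, not the real ones:

import urllib
import urllib2

# Hypothetical endpoint and field names -- replace them with whatever
# the Network tab shows when you run the search manually.
url = "http://local-site/search.do"
data = urllib.urlencode({
    "numeroDeOrigem": "12345",  # the integer you were typing by hand
    "tipoBusca": "origem",      # whatever the drop-down menu was setting
})

request = urllib2.Request(url, data)  # passing data makes this a POST
response = urllib2.urlopen(request)
html = response.read()  # the same page, now with the results embedded

From there you would pull the value out of html with string methods or the re module, since BeautifulSoup isn't available on your server either.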
I have a website for work, and I need to go through a list of numbers and determine whether the user associated with each number is still active. The website requires a sign-in, so I can't simply use requests. Is there a way I can run it through my Chrome browser to get the information I require?
If I can get the HTML then I am fine from there onward with the code.
Any help would be greatly appreciated
Can I have the webpage you are trying to access?
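For what it's worth, a sign-in alone doesn't necessarily rule out requests: if the site uses a plain login form, a requests.Session will keep the session cookie across calls. A rough sketch, where the login URL and form field names are guesses you would need to confirm in your browser's dev tools:

import requests

session = requests.Session()

# Hypothetical login endpoint and field names -- check the real ones
# in the dev tools Network tab while signing in by hand.
session.post("https://example.com/login", data={
    "username": "me",
    "password": "secret",
})

# The session now carries the auth cookie, so this returns the
# signed-in version of the page.
html = session.get("https://example.com/users/12345").text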
I'm practicing parsing web pages with Python. What I do is
ans = requests.get(link)
Then I use re to extract some information from the HTML, which is stored in
ans.content
What I've found is that some sites use scripts that are executed automatically in a browser, but not when I try to download the page using requests. For example, instead of getting a page with the information, I get something like
scripts_to_get_info.run()
in the HTML code.
A browser is installed on my computer, and so is the program I wrote, which means that, theoretically, I should have a way to run those scripts and get the information from my Python code, then parse it.
Is this possible? Any suggestions?
(The idea that this is doable came from the fact that when I inspected the page in Chrome, I saw the real HTML without any of those scripts.)
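It is possible, but re alone can't do it, because requests never executes JavaScript. The usual route is to drive a real browser engine from Python; here is a minimal sketch with Selenium and headless Chrome (it assumes chromedriver is installed and on your PATH):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # no visible browser window

driver = webdriver.Chrome(options=options)
link = "http://example.com/page"  # the same link you passed to requests.get
driver.get(link)

# page_source holds the DOM *after* the scripts have run, i.e. the
# "real html" you saw in the inspector, not the raw server response.
html = driver.page_source
driver.quit()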
I'm trying to run web searches using a Python script. I know how to make it work for most sites, for example by using the requests library to GET "url + query arguments".
I'm trying to run searches on wappalyzer.com, but when you run a search there, the URL doesn't change. I also tried inspecting the HTML to figure out where the search takes place, so that I could use Beautiful Soup to change the HTML and run it, but to no avail. I'm really new to web scraping, so I would love some help.
The URL does not change because the search works with JavaScript and asynchronous requests. The easiest way to automate such a task is to execute the JavaScript and interact with the page programmatically (often easier than reverse-engineering the requests the client makes, unless a public API is available).
You could use Selenium with Python, which is pretty easy to use, or any automation framework that executes JavaScript by running a web driver (gecko, chrome, phantomjs).
With Selenium, you will be able to program your scraper pretty easily: select the search field (using CSS selectors or XPath, for example), input a value, and validate the search, as in the sketch below. You will then be able to dump the whole page or just the specific parts you need.
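Something along these lines; note that the CSS selectors are placeholders, you would need to inspect the page to find the real ones:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()  # or webdriver.Chrome()
driver.get("https://www.wappalyzer.com/")

# Placeholder selector -- inspect the page for the real search field.
search = driver.find_element(By.CSS_SELECTOR, "input[type=search]")
search.send_keys("example.com")
search.send_keys(Keys.ENTER)  # validate the search

# The results load asynchronously, so wait for them before dumping.
# ".results" is also a placeholder selector.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".results"))
)
html = driver.page_source
driver.quit()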
I have a problem getting JavaScript-generated content into HTML to use it for scraping. I've tried multiple methods, such as PhantomJS and the Python Qt library, and they all pull in most of the content nicely, but the problem is that there are JavaScript buttons inside the page like this:
Please see the screenshot here.
Now, when I load this page from a script, these buttons don't default to any value, so I get back 0 for all the SELL/NEUTRAL/BUY values below. Is there a way to set these values when you load the page from a script?
An example page with all the values is https://www.tradingview.com/symbols/NEBLBTC/technicals/
Any help would be greatly appreciated.
If you are trying to achieve this with Scrapy or some derivative of cURL or urllib, I'm afraid you can't. Python has other external packages, such as Selenium, that let you interact with the JavaScript on the page, but the problem with Selenium is that it's too slow. If you want something with performance closer to Scrapy's, you could check how the site works (as far as I can see, it works through AJAX or WebSockets) and fetch the info you want through urllib, like you would with an API.
Please let me know if that makes sense, or if I misunderstood your question.
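That last approach would look something like the sketch below. The endpoint URL is invented for illustration; the real one is whatever request shows up in the dev tools Network tab when the technicals page loads its values:

import json
import urllib.request

# Made-up endpoint -- watch the Network tab on the technicals page to
# find the actual request that carries the SELL/NEUTRAL/BUY numbers.
url = "https://www.tradingview.com/some-ajax-endpoint/?symbol=NEBLBTC"

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))

print(data)  # the values should be somewhere in this structure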
I used Selenium, which was perfect for this job; it is indeed slow, but it fits my purpose. I also used the Selenium IDE Firefox plugin to generate the Python script, as it was very challenging to find where exactly in the code the button I had to press was.
I tried to link my HTML file with Python code.
I tried this
import webbrowser
webbrowser.open_new_tab("data.HTML")
It opened my HTML page in Firefox.
But I need to return to my Python program to execute the remaining lines, and when I close the browser, it closes my Python script too.
I also tried to link back to my Python program from the HTML with a link,
go to Python
but it opens in my text editor, not in the terminal... I need to get back to the terminal. I need a solution.
As someone described, you need to use a web framework (like Flask, Django, or others) to run Python code from a web page; see the sketch below. The second solution is using CGI or mod_wsgi (http://modwsgi.readthedocs.io/en/develop/).
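A minimal Flask version might look like this (assuming data.HTML sits next to the script). The Python process keeps running and serving pages, independent of any browser window:

from flask import Flask, send_file

app = Flask(__name__)

@app.route("/")
def page():
    # Serve the HTML file; closing the browser tab does not stop
    # this script, it only ends that one request.
    return send_file("data.HTML")

if __name__ == "__main__":
    app.run()  # then open http://127.0.0.1:5000/ in a browser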
For the second problem (wanting to keep running Python code after the browser is closed), I would advise using Selenium.
Cheers, John