I need to extract data from a website but I found that it was rendered with Flutter Canvaskit renderer. It seems everything I wanted is drawn in the canvas. I have to go through each row, trigger click on a row and then trigger info button on top right which shows the file's attributes and get one of the attribute from there. [refer images]
Is this possible? If so, how? I want to do it in python.
The CORS issue.
In my case.
Use a web proxy like https://cors-anywhere.herokuapp.com/$urlTarget
Scrape the webpage in the back-end, then send the data via API.
I chose method 2 because it is easy to fix when the webpage changes.
Related
I work with python and data mine some content which I categorize into different categories.
Then I go to a specific webpage and submit manually the results.
Is there a way to automate the process? I guess this is a "form-submit" thread but I haven't seen any relevant module in Python. Can you suggest me something?
Selenium Webdriver is the most popular way to drive web pages, but Python also has beautifulsoup; Either library will work.
If you want make this automatic yo have to see which params are send in the form and make a request with this params to the endpoint but directly from your python app, or search a package that simulate a browser and fill the form, but I think that the correct way is making the request directly from your app
I am running the ELK stack for the log analysis in which kibana is being used as the data visualization.Now I want to extract the some fields from the kibana webpage.
I want to extract the CU and count field and as you can see I have attached the screenshot of the webpage and corresponding html source code.
Now I have tried to scrap the same webpage using the python and "Beautiful soap" library but there whatever code I am seeing it is different.
Please help.also,
Can you suggest me some other method by which I can extract the required fields?
It's better to make direct request to your elasticsearch for the data you need.
You can see the query executed by visualization if you go to Dashboard and click the arrow in the bottom left corner and select Request tab:
I tried web.seeother("link"), but this does not open it in a new tab. Now I can generate a link with a _blank tag, but then the user has to click on the link separately that is separate button for generating the link and another button to follow that link. I want to perform both with a single click. A server side method to do this would be best.
I am using the web.py framework.
As the document says web.seeother() is used for redirecting a user to another page. So a more clear way for asking your question is: "how to make web.seeother() open a link in a new tab"?
As I have observed the documents, There is no way to do that on server-side.
Not a web.py issue. Cannot be done from server-side by any python or non-python framework, must be done in the Client.
From the client, you can set target="_blank" in the HTML, or use javascript with something like window.open(url). Javascript will allow you to set size and position of second window.
Can I have any highlight kind of things using Python 2.7? Say when my script clicking on the submit button,feeding data into the text field or selecting values from the drop-down field, just to highlight on that element to make sure to the script runner that his/her script doing what he/she wants.
EDIT
I am using selenium-webdriver with python to automate some web based work on a third party application.
Thanks
This is something you need to do with javascript, not python.
[NOTE: I'm leaving this answer for historical purposes but readers should note that the original question has changed from concerning itself with Python to concerning itself with Selenium]
Assuming you're talking about a browser based application being served from a Python back-end server (and it's just a guess since there's no information in your post):
If you are constructing a response in your Python back-end, wrap the stuff that you want to highlight in a <span> tag and set a class on the span tag. Then, in your CSS define that class with whatever highlighting properties you want to use.
However, if you want to accomplish this highlighting in an already-loaded browser page without generating new HTML on the back end and returning that to the browser, then Python (on the server) has no knowledge of or ability to affect the web page in browser. You must accomplish this using Javascript or a Javascript library or framework in the browser.
I'm trying to scrape some information from a web site, but am having trouble reading the relevant pages. The pages seem to first send a basic setup, then more detailed info. My download attempts only seem to capture the basic setup. I've tried urllib and mechanize so far.
Firefox and Chrome have no trouble displaying the pages, although I can't see the parts I want when I view page source.
A sample url is https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT
I'd like, for example, average maturity and average duration from the lower right of the page. The problem isn't extracting that info from the page, it's downloading the page so that I can extract the info.
The page uses JavaScript to load the data. Firefox and Chrome are only working because you have JavaScript enabled - try disabling it and you'll get a mostly empty page.
Python isn't going to be able to do this by itself - your best compromise would be to control a real browser (Internet Explorer is easiest, if you're on Windows) from Python using something like Pamie.
The website loads the data via ajax. Firebug shows the ajax calls. For the given page, the data is loaded from https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542
See the corresponding javascript code on the original page:
<script>populator = new Populator({parentId:
"profileForm:vanguardFundTabBox:tab0",execOnLoad:true,
populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
inline:fals e,type:"once"});
</script>
The reason why is because it's performing AJAX calls after it loads. You will need to account for searching out those URLs to scrape it's content as well.
As RichieHindle mentioned, your best bet on Windows is to use the WebBrowser class to create an instance of an IE rendering engine and then use that to browse the site.
The class gives you full access to the DOM tree, so you can do whatever you want with it.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(loband).aspx