Web scraping of my Kibana server - python

I am running the ELK stack for log analysis, with Kibana used for data visualization. Now I want to extract some fields from the Kibana webpage.
I want to extract the CU and count fields; I have attached a screenshot of the webpage and the corresponding HTML source code.
I have tried to scrape the webpage using Python and the Beautiful Soup library, but the HTML I get back is different from what the browser shows.
Please help. Also, can you suggest some other method by which I can extract the required fields?

It's better to make a direct request to your Elasticsearch for the data you need.
You can see the query executed by a visualization if you go to the Dashboard, click the arrow in the bottom left corner, and select the Request tab.
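For example, a minimal sketch of querying Elasticsearch directly from Python with requests. The host, index pattern, and field name here are assumptions; paste the real query body from the Request tab instead:

import requests

# Minimal sketch: query Elasticsearch directly instead of scraping Kibana.
# The host, index pattern ("logstash-*") and field name ("CU") are
# assumptions -- replace them with the query from the Request tab.
resp = requests.get(
    "http://localhost:9200/logstash-*/_search",
    json={
        "size": 0,  # we only want the aggregation, not the raw hits
        "aggs": {
            "by_cu": {"terms": {"field": "CU.keyword"}}
        },
    },
)
# Each bucket carries the field value and its document count.
for bucket in resp.json()["aggregations"]["by_cu"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])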

Related

Get web page content where source code isn't visible

I want to get all the Advisory IDs and CVE IDs from this page:
https://psirt.global.sonicwall.com/vuln-list
My earlier approach was to extract links and IDs from the source code (I have followed this approach with other vendors, such as Google Chrome and Mozilla updates). But here I cannot see any data in the source code. When I am in inspect mode, though, I can see the data; when I view the source code, I cannot find it.
I tried logging the traffic and then searching for the piece of data. It seems the page requests https://psirtapi.global.sonicwall.com/api/v1/vulnsummary/?srch=&vulnerable_products=&ord=-advisory_id for the data you're looking for, and the response contains it. You can then parse it.
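A minimal sketch of that approach, assuming the endpoint returns JSON; the response keys below are assumptions, so inspect the actual payload to find the right ones:

import requests

# Sketch: call the JSON endpoint the page itself requests (found via the
# browser's network tab) instead of parsing the rendered HTML.
url = ("https://psirtapi.global.sonicwall.com/api/v1/vulnsummary/"
       "?srch=&vulnerable_products=&ord=-advisory_id")
data = requests.get(url).json()
# The keys below are assumptions -- inspect the actual payload to find
# where the advisory and CVE IDs live.
for item in data.get("results", []):
    print(item.get("advisory_id"), item.get("cve_ids"))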

How to scrape a webpage built with the Flutter CanvasKit renderer

I need to extract data from a website, but I found that it was rendered with the Flutter CanvasKit renderer. It seems everything I want is drawn on the canvas. I have to go through each row, trigger a click on the row, then trigger the info button on the top right, which shows the file's attributes, and get one of the attributes from there. [refer to images]
Is this possible? If so, how? I want to do it in Python.
Regarding the CORS issue, in my case there were two options:
1. Use a web proxy like https://cors-anywhere.herokuapp.com/$urlTarget
2. Scrape the webpage in the back end, then send the data via an API.
I chose method 2 because it is easy to fix when the webpage changes. A minimal sketch of that approach follows.
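For illustration, a back-end sketch using Flask; the framework choice, target URL, and selector are all assumptions, not part of the original answer:

import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/data")
def scraped_data():
    # Fetch and parse the page server-side, where CORS does not apply.
    html = requests.get("https://example.com/target-page").text  # placeholder URL
    soup = BeautifulSoup(html, "html.parser")
    rows = [cell.get_text(strip=True) for cell in soup.select("td")]  # placeholder selector
    return jsonify(rows)  # the front end calls this endpoint instead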

Web scraping for dummies (or not)

GOAL
Extract data from a web page, automatically.
The data are on this page... Be careful, it's in French...
MY HARD WAY, manually
I choose the data I want by clicking the desired fields on the left side ('CHOISIR DES INDICATEURS' = choose indicators).
Then I select 'Tableau' (= table) to get a data table.
Then I click 'Action' on the right side, then 'Exporter' (= export).
I choose the format I want (e.g. CSV) and hit 'Executer' (= execute) to download the file.
WHAT I TRIED
I tried to automate this process, but it seems like an impossible task for me. I inspected the page's network exchanges to see if there is an underlying server I could send a simple JSON request to.
I mainly work with Python and frameworks like BS4 or Scrapy.
I have only a little data to extract, so I can easily do it manually. This question is purely for my own knowledge, to see whether it is possible to scrape a page like that.
I would appreciate it if you could share your skills!
Thank you.
It is possible. Check this website for details; it will show you how to scrape a website with a worked example:
https://realpython.com/beautiful-soup-web-scraper-python/#scraping-the-monster-job-site
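In the spirit of that tutorial, a minimal requests + BeautifulSoup sketch; the URL and selectors are placeholders, and for a page like this, finding the underlying CSV/JSON export request in the network tab is usually the better route:

import requests
from bs4 import BeautifulSoup

# Placeholder URL -- point this at the page (or the export endpoint)
# you find in the network tab.
page = requests.get("https://example.com/indicateurs")
soup = BeautifulSoup(page.text, "html.parser")
for row in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:
        print(cells)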

How to approach web-scraping in python

I am new to Python and have just started with web scraping. I have to scrape data from this realtor site.
I need to scrape all the details of real-estate agents according to their real-estate agency. For this, in the web browser I have to follow these steps:
Go to this site.
Click on the agency offices button, enter the 4000 pin in the search box, and then submit.
Then we get a list of the agencies.
Go to the Our Team tab, where we get the agents.
Then we go to each agent's page and record their information.
Can anyone tell me how to approach this? What's the best way to build this type of scraper?
Do I have to use Selenium for the interaction with the pages?
I have worked with requests, BeautifulSoup, and simple form submission using mechanize.
On a search-driven site like this, I would recommend either Selenium or Requests with sessions. The advantage of Selenium is that it will probably work; however, it will be slow. For Selenium you should just use the Selenium IDE (a Firefox add-on) to record what you do, then get the HTML from the webpage and use BeautifulSoup to parse the data.
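A minimal Selenium sketch of that flow; the URL and element IDs are placeholders you'd replace with what the IDE recording shows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get("https://www.example.com.au/find-agent")         # placeholder URL
driver.find_element(By.ID, "search-box").send_keys("4000")  # placeholder IDs
driver.find_element(By.ID, "search-submit").click()

# Hand the rendered HTML to BeautifulSoup for parsing.
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()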
If you want to scrape the data quickly and without using many resources, I usually use Requests with sessions. To scrape a website like this, open a modern web browser (Firefox, Chrome) and use its network tools (usually located in the developer tools, or via right-click > Inspect Element). Once you are recording the network traffic, you can interact with the webpage to see the connections made to the server. In an example search they may use a suggestions endpoint, e.g.
https://suggest.example.com.au/smart-suggest?query=4000&n=7&regions=false
The response will then probably be a JSON list of suggested results. Once you select a suggestion, you can just submit a request with those search parameters, e.g.
https://www.example.com.au/find-agent/agents/petrie-terrace-qld-4000
The URLs for the agents will then be in that HTML page; you just need to send a separate request to each page and extract the information using BeautifulSoup.
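Putting that together, a sketch with a Requests session; the endpoints mirror the examples above, and the selectors are assumptions:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

session = requests.Session()  # keeps cookies across requests

# 1. The suggest endpoint returns JSON for the postcode search.
suggestions = session.get(
    "https://suggest.example.com.au/smart-suggest",
    params={"query": "4000", "n": 7, "regions": "false"},
).json()
# (Inspect `suggestions` to build the listing URL below.)

# 2. Fetch the agent listing page a suggestion points at.
listing = session.get(
    "https://www.example.com.au/find-agent/agents/petrie-terrace-qld-4000"
)
soup = BeautifulSoup(listing.text, "html.parser")

# 3. Follow each agent link and parse the details page.
for link in soup.select("a.agent-profile"):  # selector is a placeholder
    agent_page = session.get(urljoin(listing.url, link["href"]))
    agent = BeautifulSoup(agent_page.text, "html.parser")
    print(agent.select_one("h1").get_text(strip=True))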
You might want to give Node and jQuery a try. I used to use Python all the time, but it gets messy and hard to maintain after a while.
Using Node, you can turn the page HTML into a DOM object and then scrape all the data very easily using jQuery. I have done this for IMDb here: "Using jQuery & NodeJS to scrape the web" https://medium.com/@asimmittal/using-jquery-nodejs-to-scrape-the-web-9bb5d439413b
You can modify this to scrape Yelp.

web scraping a problem site

I'm trying to scrape some information from a web site, but am having trouble reading the relevant pages. The pages seem to first send a basic setup, then more detailed info. My download attempts only seem to capture the basic setup. I've tried urllib and mechanize so far.
Firefox and Chrome have no trouble displaying the pages, although I can't see the parts I want when I view page source.
A sample url is https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT
I'd like, for example, average maturity and average duration from the lower right of the page. The problem isn't extracting that info from the page, it's downloading the page so that I can extract the info.
The page uses JavaScript to load the data. Firefox and Chrome are only working because you have JavaScript enabled - try disabling it and you'll get a mostly empty page.
Python isn't going to be able to do this by itself - your best compromise would be to control a real browser (Internet Explorer is easiest, if you're on Windows) from Python using something like Pamie.
The website loads the data via ajax. Firebug shows the ajax calls. For the given page, the data is loaded from https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542
See the corresponding javascript code on the original page:
<script>
populator = new Populator({
    parentId: "profileForm:vanguardFundTabBox:tab0",
    execOnLoad: true,
    populatorUrl: "/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
    inline: false,
    type: "once"
});
</script>
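Once you have that URL, you can skip the JavaScript entirely and fetch the fragment directly. A sketch; the label matching below assumes the fragment contains plain HTML tables:

import urllib.request
from bs4 import BeautifulSoup

url = ("https://personal.vanguard.com/us/JSP/Funds/VGITab/"
       "VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542")
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
# Look for the "Average maturity" / "Average duration" rows in the fragment.
for cell in soup.find_all("td"):
    if "Average" in cell.get_text():
        print(cell.get_text(strip=True))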
The reason is that the page performs AJAX calls after it loads. You will need to find those URLs and scrape their content as well.
As RichieHindle mentioned, your best bet on Windows is to use the WebBrowser class to create an instance of an IE rendering engine and then use that to browse the site.
The class gives you full access to the DOM tree, so you can do whatever you want with it.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(loband).aspx
