Craigslist selenium diagnose what's happening - python

https://sfbay.craigslist.org/sfc/rts/d/san-francisco-new-mortgages-reverse/6908455741.html
How do I diagnose what's happening when I click the buttons "reply" -> "show phone number"?
There is this popup and loading button. I would like to see what is being sent from the browser to the server when clicking the first and second button. The source code is obviously changing dynamically so it is not practical to track by inspecting the source. I imagine there's a plugin or some tool to intercept or read the requests.

You can capture the requests using Fiddler, it captures http/https requests, and help you understand how to mimic the functionality you're looking for.

In Chrome or Firefox you could open the console and simply see the network traffic generated by a button.
With your example, you can see there is a call to the API at the following address:
https://sfbay.craigslist.org/contactinfo/sfo/rts/6908455741
Once you analyzed it, you can script the behaviour with Selenium and grab the data.

Related

How to talk with alert from page and not browser with Selenium

This is what the alert looks like. It reads: The proxy it3.prmsrvs.com is requesting a username and password I have a program in Selenium that automates a certain amount of stuff on a website.
For some reason (this has been encountered also by my colleagues when using the browser normally without automation) there is the possibility of a pop-up showing in the webpage at a random point.
My program's objective is to load a page, get a list of all the elements corresponding to a specific tag and then click on them one by one. They all open in a new tab.
There is the possibility of a pop-up to show in the page after closing one of the tabs.
This pop-up asks for a login but all I have to do is dismiss it and the page will keep working like always.
Now, I've seen people using driver.switch_to.alert.dismiss() but this doesn't seem to work on this page.
I checked the function on a very basic js alert and the dismiss works perfectly, so I think it's the type of alert that is the issue.
I can't inspect the page so I don't know how to retrieve the iFrame of this alert. (I saw this as a possible solution online)
What should I do?

Source code is not complete because "JS is disabled in your browser"

I'm writing a python code to, at first, get a full source code of a web page to later scrape it. But when I try to get the source code - I see the aforementioned message ("If you're seeing this message, that means JavaScript has been disabled on your browser, please enable JS to make this app work") with partial html code. Also when I click F12 to see 'elements' the entire code appears meanwhile, pressing Cntrl + U to view the source code yields the same result as getting it with the below mentioned py script
source = requests.get(link).text
soup = BeautifulSoup(source, 'lxml').prettify()
I've seen similar questions to mine but none of them had a satisfactory solution, for example, it was recommended to use selenium to open a new web page and then to work with it, but it would take additional time. JS is enabled in my browser
It is as you have seen on the other answers, you have to use selenium (or another browser automation tool) to enable javascript rendering. The web page you are trying to access uses client side rendering, which means that the first thing it sends when you access the url is a bunch of javascript code. Then the browser executes the javascript code to create the DOM of the web page.
You are saying that javascript is enabled in the browser but that has nothing to do with your python code. The library you are using requests is sending a HTTP GET request to the server to fetch the web page, and the server replies as it would to any other request with the javascript that knows how to render the web page. That's why you need something like selenium, that runs a browser instead of doing a simple HTTP request.

How do I hide the fact I'm using a bot?

So for my python selenium script I have to complete a lot of Captcha's. I noticed that when I get the Captcha's on my regular browser they're much easier and quicker. Is there a way for me to hide the fact that I'm using a web automation bot so I get the easier Captcha's?
I already tried randomizing the User Agent but to no success.
You can go to your website and inspect the page. Then go to the network tab and select Network. Reload the page and the select the webpage you are accessing from the list. If you scroll down, you can see the user agent that your browser is using to access the page. Use that user agent in your scraper to exactly mimick your browser.
From a generic perspective there are no proven ways to hide the fact that you are using a Selenium driven web automation bot.
You can find a relevant detailed discussion in Can a website detect when you are using Selenium with chromedriver?
However at certain times modifying the navigator.webdriver flag helps to prevent detection.
References
You can find a couple of relevant detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium Chrome gets detected
How does recaptcha 3 know I'm using selenium/chromedriver?

Pressing a submit button in python using requests

So i want to check if my grades has been uploded, to do that i need to log in a website and then check if it has been changed.
the thing is after you log in you have two buttons: "agree" and "dont agree"
and when i try to press it, the website gives me an error message "wrong use
If you use a Personal Firewall, please deactivate it"
I'll upload my code in few hours
The website's code
Any help would be grateful.
You can use python-selenium to press buttons, you can also hide the browser instance.
Like #e4c5 said, you can really push button by python. As for me, the DevTools of Chrome does help a lot.
Here's the manual of the DevTools.
The 'Network' tab in DevTools will tell you all the requests and resources in the web page.If you want to see the detail request of the button-pushing action, You can firstly push the button and then see the new request shown in 'Network' tab.
You have to put Referer in the headers. This is how to bypass the university's check. Look at the original request (via Chrome dev tools), and copy the referrer into your request.

Selenium: What functions would fire request?

I am new to Selenium and web applications. Please bear with me for a second if my question seems way too obvious. Here is my story.
I have written a scraper in Python that uses Selenium2.0 Webdriver to crawl AJAX web pages. One of the biggest challenge (and ethics) is that I do not want to burn down the website's server. Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed.
I have done some google-searches. It seems like only selenium-RC provides such a functionality. However, I do not want to rewrite my code just for this reason. As a compromise, I decided to limit the rate of method calls that potentially lead to the headless browser firing requests to the server.
In the script, I have the following kind of method calls:
driver.find_element_by_XXXX()
driver.execute_script()
webElement.get_attribute()
webElement.text
I use the second function to scroll to the bottom of the window and get the AJAX content, like the following:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Based on my intuition, only the second function will trigger request firing, since others seem like parsing existing html content.
Is my intuition wrong?
Many many thanks
Perhaps I should elaborate more. I am automating a process of crawling on a website in Python. There is a subtantial amount of work done, and the script is running without large bugs.
My colleagues, however, reminded me that if in the process of crawling a page I made too many requests for the AJAX list within a short time, I may get banned by the server. This is why I started looking for a way to monitor the number of requests I am firing from my headless PhantomJS browswer in script.
Since I cannot find a way to monitor the number of requests in script, I made the compromise I mentioned above.
Therefore I need a way to monitor the number of requests my webdriver
is firing on each page parsed
As far as I know, the number of requests is depending on the webpage's design, i.e. the resources used by the webpage and the requests made by Javascript/AJAX. Webdriver will open a browser and load the webpage just like a normal user.
In Chrome, you can check the requests and responses using Developer Tools panel. You can refer to this post. The current UI design of Developer Tools is different but the basic functions are still the same. Alternatively, you can also use the Firebug plugin in Firefox.
Updated:
Another method to check the requests and responses is by using Wireshark. Please refer to these Wireshark filters.

Categories