Python get content from the loaded webpage - python

In my Python Script, I would like to get content from the page that I launched using this script:
import webbrowser
url = 'https://yahoo.com'
webbrowser.get('Chrome').open_new(url)
I know how to do this with Selenium WebDriver, but I have never done it from the script itself.
Any ideas would be helpful.
Thanks

Related

Why do I keep getting None when doing web scraping in Python

This is the code that I wrote. I watched a lot of tutorials, but they get the expected output with exactly the same code.
import requests
from bs4 import BeautifulSoup as bs
url="https://shop.punamflutes.com/pages/5150194068881408"
page=requests.get(url).text
soup=bs(page,'lxml')
#print(soup)
tag=soup.find('div',class_="flex xs12")
print(tag)
I always get None. The class name also seems strange, and what "View Source" shows is different from what "Inspect Element" shows.
bs4 is weird. Sometimes it returns different code than what is on the page; it alters it depending on the source. Try using Selenium. It works great and has many more uses than bs4. Most of all, it makes finding elements on a site very easy.
It's not a bs4 problem; bs4 is correctly parsing what requests returns. It depends on the webpage itself.
If you inspect the "soup", you will see that the source of the page is a set of links to scripts that render the content on the page. For these scripts to be executed, you need a browser: requests only gets you what the web server returns and won't execute the JavaScript for you. You can verify this yourself by disabling JavaScript in your browser's developer tools.
The solution is to use a web browser (e.g. headless chrome + chromedriver) and Selenium to control it. There are plenty of good tutorials out there on how to do this.
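Before reaching for a browser, it can help to confirm that the element really is missing from what the server sent. The following is a minimal sketch using only the standard library. The `SAMPLE_HTML` string, the `TagAudit` class and the `audit` helper are made up for illustration; `SAMPLE_HTML` stands in for the text that `requests.get(url).text` would return for a JavaScript-rendered page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML a server returns for a
# JavaScript-rendered page: an empty mount point plus script links.
SAMPLE_HTML = """
<html>
  <head><script src="/static/app.js"></script></head>
  <body>
    <div id="app"></div>
    <script src="/static/vendor.js"></script>
  </body>
</html>
"""


class TagAudit(HTMLParser):
    """Counts <script> tags and records whether a div has all wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.script_count = 0
        self.found_div = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        elif tag == "div":
            # The class attribute may be absent, so fall back to "".
            classes = set((dict(attrs).get("class") or "").split())
            if self.wanted <= classes:
                self.found_div = True


def audit(html, wanted_classes):
    """Return (number of script tags, whether the target div is present)."""
    parser = TagAudit(wanted_classes)
    parser.feed(html)
    return parser.script_count, parser.found_div


scripts, found = audit(SAMPLE_HTML, ["flex", "xs12"])
print(f"script tags: {scripts}, target div present: {found}")
```

If the raw HTML contains scripts but not the target div, as in this sample, the content is most likely rendered in the browser, which is the situation the answer above describes.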

requests not returning full html from a page

I am trying to get the full HTML of a webpage with Python for a little project I am doing. The HTML I get when I print the response is different from what I see in Chrome. urllib does the same thing. I have also tried using Selenium WebDriver, but the info on the webpage is always updating and I don't want to have it constantly opening Chrome.

Accessing Indeed through Python

My goal for this Python code is to save job information into a folder. The first step is failing. When I run the code, I want it to print the URL https://www.indeed.com/. Instead, it prints https://secure.indeed.com/account/login. I am open to using urllib or cookielib to resolve this.
import requests
import urllib
data = {
'action':'Login',
'__email':'email#gmail.com',
'__password':'password',
'remember':'1',
'hl':'en',
'continue':'/account/view?hl=en',
}
response = requests.get('https://secure.indeed.com/account/login',data=data)
print(response.url)
If you're trying to scrape information from indeed, you should use the selenium library for python.
https://pypi.python.org/pypi/selenium
You can then write your program within the context of a real user browsing the site normally.
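Separately from the Selenium suggestion, note that the snippet in the question passes the form fields to a GET request, whereas login forms are normally submitted with POST. Below is a minimal sketch, using only the standard library, of how the same fields would be packaged as a POST request. It builds the request without sending it. The field names are copied from the question; the `build_login_request` helper is hypothetical, and nothing in this thread confirms that Indeed accepts a scripted login like this.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "https://secure.indeed.com/account/login"


def build_login_request(email, password):
    """Package the question's form fields as a POST request (not sent)."""
    fields = {
        "action": "Login",
        "__email": email,
        "__password": password,
        "remember": "1",
        "hl": "en",
        "continue": "/account/view?hl=en",
    }
    # Form bodies are URL-encoded and must be bytes.
    body = urlencode(fields).encode("ascii")
    return Request(LOGIN_URL, data=body, method="POST")


req = build_login_request("email#gmail.com", "password")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, ideally through an opener with a cookie jar so the session cookie is kept.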

Downloading a file from a html? url with python 3

I've been searching for hours for how to download a file. The documentation shows me how to do this, but Cygwin is horrible and an annoyance to use, and I'm trying to implement this in Python 3 for a program. I've tried urllib, requests, wget (in Python), httplib and some others, but they only fetched the redirected page (the one you would get by pasting the properly formatted URL into the address bar).
However, when I inspect the page and trigger the download link, which has the same address that I tried, it works properly and gives me a download pop-up. Here is an example page; the link is triggered by clicking "Download data".
I don't understand how no Python package is able to send the proper GET request, so that I would need to implement this program on Linux just to be able to use 'wget'.
Does anyone have a clue how to properly call the URL?
You need to add &submit=Download+Data to the end of your URL to download the data. You can see this with the network tab of inspect element in google chrome. Hope I helped!
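A minimal sketch of adding that parameter programmatically rather than by string concatenation, using only the standard library. The `with_query_param` helper and the example URL are hypothetical, since the real page address is not given here; only the `submit=Download+Data` pair comes from the answer above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with key=value appended to its query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    # urlencode turns the space in "Download Data" into "+".
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL standing in for the real download page.
example = "https://example.com/data?station=42&year=2016"
print(with_query_param(example, "submit", "Download Data"))
# → https://example.com/data?station=42&year=2016&submit=Download+Data
```

The resulting URL can then be passed to whichever HTTP client you were already using.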
I think this should work:
from subprocess import call

def download(url):
    # Hand the URL to the curl command-line tool
    cmd = ['curl', url]
    call(cmd)

To run this:
download('www.download.com/blah/bah/blah')
if you want to use this from the interpreter:
save as module.py
python -i /path/to/module.py
>>>download('www.download.com/blah/bah/blah')
p.s. if this works i'll prob use this in my shell program
EDIT: my comment:
I tried this and got "malformed url" error
from subprocess import call

def download(FILE, URL):
    # FILE = file to save to
    # URL = download from here
    CMD = ['curl', '-o', FILE, URL]
    call(CMD)
This is what I do for all system commands from Python, so it's something to do with curl specifically.

Python : Browser not able to browse URL using selenium

I am writing a Python script that opens a URL and performs some actions on it. When I execute the code below, the Firefox browser starts but does not navigate to the URL. What could be wrong here?
I also tried adding a proxy exception, but that didn't solve the issue.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('WEBSITE_URL')
Please suggest what is wrong here.
