Using the webbrowser module, I want to open a specific page on last.fm.
My script picks a line from a text file and prints it. I want that line added to the end of:
webbrowser.open('http://www.last.fm/music/')
So if, for example, random.choice picks "example artist", I want "example artist" appended to the end of the URL correctly.
Any help is appreciated.
Use the urlparse.urljoin function to build up the full destination URL:
import urlparse
import webbrowser
artist_name = 'virt'
url = urlparse.urljoin('http://www.last.fm/music/', artist_name)
# Will open http://www.last.fm/music/virt in your browser.
webbrowser.open(url)
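On Python 3, the urlparse module became urllib.parse; a minimal sketch of the same idea (the artist name here is just an example), adding quote() for artist names that contain spaces or other characters unsafe in a URL path:

```python
from urllib.parse import urljoin, quote

artist_name = "Guns N' Roses"
# quote() percent-encodes spaces and other characters unsafe in a URL path
url = urljoin('http://www.last.fm/music/', quote(artist_name))
print(url)  # http://www.last.fm/music/Guns%20N%27%20Roses
```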
I am using the Python webbrowser module to try to open an HTML file. I added a short routine that fetches a page's source from a website, letting me store a web page in case I ever need to view it without Wi-Fi, for instance a news article or something else.
The code itself is fairly short so far, so here it is:
import requests as req
from bs4 import BeautifulSoup as bs
import webbrowser
import re
webcheck = re.compile(r'^(https?:\/\/)?(www\.)?([a-z0-9]+\.[a-z]+)([\/a-zA-Z0-9#\-_]+\/?)*$')
#Valid URL Check
while True:
    url = input('URL (MUST HAVE HTTP://): ')
    check = webcheck.search(url)
    if check is not None:
        # only read the groups once we know the regex matched
        groups = [g for g in check.groups() if g]  # unmatched optional groups are None
        for group in groups[:]:  # iterate over a copy, since the list is modified
            if group == 'https://':
                groups.remove(group)
            elif group.count('/') > 0:
                groups.append(group.replace('/', '--'))
                groups.remove(group)
        filename = ''.join(groups) + '.html'
        break
#Getting Website Data
reply = req.get(url)
soup = bs(reply.text, 'html.parser')
#Writing Website
with open(filename, 'w') as file:
    file.write(reply.text)
#Open Website
webbrowser.open(filename)
webbrowser.open('https://www.youtube.com')
I added webbrowser.open('https://www.youtube.com') so that I knew the module was working, which it was, as it did open up youtube.
However, webbrowser.open(filename) doesn't do anything, yet it returns True if I define it as a variable and print it.
The HTML file itself has a period in the name, but I don't think that should matter, as I have also tried a filename without it and it still won't open.
Does webbrowser need special permissions to work?
I'm not sure what to do as I've removed characters from the filename and even showed that the module is working by opening youtube.
What can I do to fix this?
From the webbrowser documentation:
Note that on some platforms, trying to open a filename using this function, may work and start the operating system’s associated program. However, this is neither supported nor portable.
So it seems that webbrowser can't do what you want. Why did you expect that it would?
Adding file:// + the full path name does the trick, for anyone wondering.
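A sketch of that approach using pathlib (the filename here is a placeholder for the saved page):

```python
import pathlib
import webbrowser

# Build a file:// URI from the saved page's path; resolve() makes it absolute.
uri = pathlib.Path('saved_page.html').resolve().as_uri()
webbrowser.open(uri)  # e.g. opens file:///home/user/saved_page.html
```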
I have the following code, which imports a .txt file with a list of URLs that I'd like to open one by one with the for statement. Unfortunately webbrowser won't open the links one by one; instead it opens a new Chrome tab with the URL "https://link", giving me an "about:blank" tab. Do you guys have any idea how to make it work? Thank you very much!
import webbrowser as wb
chrome = "/Users/jamesnorton/applications %s"
file = open('File.txt')
for link in file:
    wb.get("google").open_new('link')
You are passing the literal string 'link' as the parameter, not the variable link.
Try this instead:
import webbrowser as wb
chrome = "/Users/jamesnorton/applications %s"
file = open(file='File.txt', mode='r')
links = file.read()
lines = links.splitlines()
for link in lines:
    wb.get("google").open_new(link)
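Note that webbrowser.get('google') is not a standard browser name and may raise webbrowser.Error on many platforms; get() expects a registered name like 'chrome' or a command line ending in %s (as in the chrome variable above). A hedged sketch that strips blank lines and falls back to the system default browser (the filename follows the question):

```python
import webbrowser

def read_links(path):
    """Read one URL per line, dropping trailing newlines and blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def get_browser(name='chrome'):
    """Return the named browser controller, or the system default."""
    try:
        return webbrowser.get(name)  # may not be registered on this platform
    except webbrowser.Error:
        return webbrowser.get()

# Usage, once File.txt exists:
# for link in read_links('File.txt'):
#     get_browser().open_new(link)
```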
This is my first question so please bear with me (I have googled this and I did not find anything)
I'm making a program which goes to a url, clicks a button, checks if the page gets forwarded and if it does saves that url to a file.
So far I've got the first two steps done but I'm having some issues.
I want Selenium to repeat this process with multiple urls (if possible, multiple at a time).
I have all the urls in a txt called output.txt
At first I did
url_list = "https://example.com"
to see if my program even worked, and it did however I am stuck on how to get it to go to the next URL in the list and I am unable to find anything on the internet which helps me.
This is my code so far
import selenium
from selenium import webdriver

url_list = "C\\user\\python\\output.txt"

def site():
    driver = webdriver.Chrome("C:\\python\\chromedriver")
    driver.get(url_list)
    send = driver.find_element_by_id("NextButton")
    send.click()
    if driver.find_elements_by_css_selector("a[class='Error']"):
        print("Error class found")
I have no idea as to how I'd get selenium to go to the first url in the list then go onto the second one and so forth.
If anyone would be able to help me I'd be very grateful.
I think the problem is that you assumed the name of the file containing the URLs is itself a URL. You need to open the file first and build the URL list.
According to the docs (https://selenium.dev/documentation/en/webdriver/browser_manipulation/), get expects a URL, not a file path.
import selenium
from selenium import webdriver

with open("C\\user\\python\\output.txt") as f:
    url_list = f.read().splitlines()  # splitlines() avoids an empty trailing entry

def site():
    driver = webdriver.Chrome("C:\\python\\chromedriver")
    for url in url_list:
        driver.get(url)
        send = driver.find_element_by_id("NextButton")
        send.click()
        if driver.find_elements_by_css_selector("a[class='Error']"):
            print("Error class found")
I have just started using Spynner to scrape webpages and am not finding any good tutorials out there. I have here a simple example where I type a word into Google and then I want to see the resulting page.
But how do I go from clicking the button to actually getting the new page?
import spynner

def content_ready(browser):
    if 'gbqfba' in browser.html:  # id of search button
        return True
b = spynner.Browser()
b.show()
b.load("http://www.google.com", wait_callback=content_ready)
b.wk_fill('input[name=q]', 'soup')
# b.browse() # Shows the word soup in the input box
with open("test.html", "w") as hf:  # writes the initial page to a file
    hf.write(b.html.encode("utf-8"))
b.wk_click("#gbqfba") # Clicks the google search button (or so I think)
But now what? I'm not even sure that I have clicked the google search button, although it does have id=gbqfba. I have also tried just b.click("#gbqfba"). How do I get the search results?
I have tried just doing:
with open("test.html", "w") as hf:  # writes the initial page to a file
    hf.write(b.html.encode("utf-8"))
but that still prints the initial page.
I solved this by sending Enter to the input and waiting two seconds. Not ideal, but it works:
import spynner
import codecs
from PyQt4.QtCore import Qt
b = spynner.Browser()
b.show()
b.load("http://www.google.com")
b.wk_fill('input[name=q]', 'soup')
# b.browse() # Shows the word soup in the input box
b.sendKeys("input[name=q]",[Qt.Key_Enter])
b.wait(2)
codecs.open("out.html","w","utf-8").write(b.html)
The recommended method is to wait for the new page to load:
b.wait_load()
I'm trying to create a script that takes a .txt file with multiple lines of YouTube usernames, appends it to the YouTube user homepage URL, and crawls through to get profile data.
The code below gives me the info I want for one user, but I have no idea where to start for importing and iterating through multiple URLs.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import re
import urllib2
# download the page
response = urllib2.urlopen("http://youtube.com/user/alxlvt")
html = response.read()
# create a beautiful soup object
soup = BeautifulSoup(html)
# find the profile info & display it
profileinfo = soup.findAll("div", { "class" : "user-profile-item" })
for info in profileinfo:
    print info.get_text()
Does anyone have any recommendations?
Eg., if I had a .txt file that read:
username1
username2
username3
etc.
How could I go about iterating through those, appending them to http://youtube.com/user/%s, and creating a loop to pull all the info?
If you don't want to use an actual scraping module (like scrapy, mechanize, selenium, etc.), you can just keep iterating on what you've written. A few things:
Use the iteration on file objects to read line by line. A neat fact about file objects is that, if they are opened with 'rb', they call readline() as their iterator, so you can just do for line in file_obj to go line by line through a document.
Concatenate the URLs. I used string formatting below, but you can also concatenate with +.
Make a list of URLs first - it will let you stagger your requests, so you can do compassionate screen scraping.
# Goal: make a list of urls
url_list = []

# use a try-finally to make sure you close your file
try:
    f = open('pathtofile.txt', 'rb')
    for line in f:
        # strip() removes the trailing newline before building the url
        url_list.append('http://youtube.com/user/%s' % line.strip())
    # do something with url_list (like call a scraper, or use urllib2)
finally:
    f.close()
EDIT: Andrew G's string format is clearer. :)
You'll need to open the file (preferably with the with open('/path/to/file', 'r') as f: syntax) and then do f.readline() in a loop. Assign the results of readline() to a string like "username" and then run your current code inside the loop, starting with response = urllib2.urlopen("http://youtube.com/user/%s" % username).
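Putting that together, a minimal sketch (Python 3 shown, where urllib2's urlopen lives in urllib.request; 'usernames.txt' is a placeholder filename):

```python
def build_urls(path):
    """Build the per-user URLs from a username file, one name per line."""
    with open(path) as f:
        # strip() drops the trailing newline so it doesn't end up in the URL
        return ['http://youtube.com/user/%s' % line.strip()
                for line in f if line.strip()]

# Each URL can then be fetched and parsed as in the original snippet:
# from urllib.request import urlopen
# html = urlopen(url).read()
```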