Python spynner not loading my page

I installed python spynner and I'm trying to load my page "http://nexo.rf.gd/app.js", but it can't load it. Please help.
I tried mechanize.Browser(), but it can't load the HTML.
Then I tried Selenium remote webdrivers, and finally I tried spynner, because that browser is able to run JS. But I don't know how. Please help me.
## with mechanize.Browser() ##
import mechanize

br = mechanize.Browser()
br.open('http://nexo.rf.gd/app.js')
print(br.response().read())
The response is HTML code which says this site needs a JS-enabled browser.
The response should be 'import sqlite3'.

Finally I found a way to get my page's HTML (which is import sqlite3).
Every browser failed, but with dryscrape my problem was solved. Thanks anyway.
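For reference, a minimal dryscrape sketch along the lines of what worked here (assuming dryscrape and its webkit-server dependency are installed):
import dryscrape

# dryscrape drives a headless WebKit, so client-side JS actually runs
session = dryscrape.Session()
session.visit('http://nexo.rf.gd/app.js')
print(session.body())  # page content after JavaScript has executed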

Related

How to display hidden html elements using BeautifulSoup?

I'm trying to scrape video from a website. I can find the video link using Chrome DevTools, but when I use BeautifulSoup to get the video link, the link is hidden. Please help me modify the code below to get the video link.
There is a screenshot of the Chrome DevTools. Basically, I need the 'src' of the 'video' tag.
import re
import urllib.request
from bs4 import BeautifulSoup as BS
url_video='http://s.weibo.com/video?q=%23%E6%AC%A7%E9%98%B3%E5%A6%AE%E5%A6%AE%23&xsort=hot&hasvideo=1&tw=video&Refer=weibo_video'
#open and read page
page=urllib.request.urlopen(url_video)
html=page.read()
#create BeautifulSoup parse-able "soup"
soup = BS(html, "lxml")
lst_url_video=[]
print(soup.body.find_all('div',class_='thumbnail')[0])
There is a possibility that the site is using some client-side JavaScript to load some of its HTML content. When you make a request using urllib.request, it won't execute any client-side JavaScript. So if the site does load some of its HTML content via client-side JavaScript, you'll need a JavaScript engine in order to run it (i.e. a web browser). You can use a headless browser to execute client-side JavaScript while scraping a web page. Here's a guide to using Chrome headless with Puppeteer:
https://medium.com/@e_mad_ehsan/getting-started-with-puppeteer-and-chrome-headless-for-web-scraping-6bf5979dee3e
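If you'd rather stay in Python than switch to Puppeteer, the same idea works with Selenium driving headless Chrome. A rough sketch (assuming chromedriver is installed and that the page renders a video tag once its JS has run; explicit waits or scrolling may be needed):
from selenium import webdriver
from selenium.webdriver.common.by import By

url_video = 'http://s.weibo.com/video?q=%23%E6%AC%A7%E9%98%B3%E5%A6%AE%E5%A6%AE%23&xsort=hot&hasvideo=1&tw=video&Refer=weibo_video'

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

driver.get(url_video)
# grab the first rendered <video> element and read its src attribute
video = driver.find_element(By.TAG_NAME, 'video')
print(video.get_attribute('src'))
driver.quit()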

python open webbrowser and get html

Now let me start off by saying that I know bs4, scrapy, selenium and so much more can do this, but that isn't what I want, for numerous reasons.
What I would like to do is open a web browser (Chrome, IE, Firefox) with the webbrowser module and then extract the HTML from the page once the site has loaded in that browser.
import webbrowser
import time

class ScreenCapture:
    url = 'https://www.google.com/'
    webbrowser.get("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s").open(url)
    # get html from browser that is open

Download File Online Using lxml

I am trying to download a file from the internet via a Python script. I am using the "mechanize" module to access the web, but when I try to follow links it gives an HTML read error (the page is an FTP directory listing, not HTML). My code is below:
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/")
Next I would normally do the following:
request = br.retrieve('link.zip')[0]
However, this produces the HTML error mentioned above. Could someone help me out, please?
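One workaround: urllib can fetch ftp:// URLs directly, so the download itself doesn't need mechanize. A minimal sketch (the filename below is a hypothetical example; substitute a real file from the directory listing):
import urllib.request

# urllib handles ftp:// URLs natively; 'example.zip' is a placeholder filename
url = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/example.zip'
urllib.request.urlretrieve(url, 'example.zip')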

How to search within a website using python requests module and returning the response page?

I want to retrieve data from a website called myip.ms. I'm using requests to send data to the form, and then I want the resulting page back. When I run the script it returns the same page (the homepage) in response; I want the results page for the query I provide. I'm new to web scraping. Here's the code I'm using:
import requests
from urllib.parse import urlencode, quote_plus

payload = {
    'name': 'educationmaza.com',
    'value': 'educationmaza.com',
}
payload = urlencode(payload)
r = requests.post("http://myip.ms/s.php", data=payload)

infile = open("E://abc.html", 'wb')
infile.write(r.content)
infile.close()
I'm no expert, but it appears that when interacting with the webpage, the POST is processed by jQuery, which requests does not handle well.
As such, you would have to use the Selenium module to interact with it.
The following code will execute as desired:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("https://myip.ms/s.php")
driver.find_element_by_id("home_txt").send_keys('educationmaza.com')
driver.find_element_by_id("home_submit").click()
html = driver.page_source
infile=open("stack.html",'w')
infile.write(html)
infile.close()
You will have to install the Selenium package, as well as PhantomJS.
I have tested this code, and it works fine. Let me know if you need any further help!
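Note that recent Selenium releases have dropped both PhantomJS support and the find_element_by_* helpers, so on a current install the same flow would look roughly like this (a sketch, assuming chromedriver is available):
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

driver.get("https://myip.ms/s.php")
driver.find_element(By.ID, "home_txt").send_keys('educationmaza.com')
driver.find_element(By.ID, "home_submit").click()

# save the rendered page, as in the original answer
with open("stack.html", "w") as infile:
    infile.write(driver.page_source)
driver.quit()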

building Web browser in Python and issue regarding cookies

I know this sounds weird, but I've got no choice; I searched Google and found nothing, so...
I'm following a video tutorial, https://www.youtube.com/watch?v=JEW50aEVi4k, on building a web browser in Python, and I was wondering whether cookies can be saved. Is that possible?
If yes, could you give some suggestions?
Cookies are not a problem - you can use mechanize (https://pypi.python.org/pypi/mechanize/), which saves and sends cookies automatically.
import mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
response = browser.open('http://www.youtube.com')

# Headers (including the Cookie header) are handled automatically. You can access them:
headers = browser.request.header_items()
print(headers)
# [('Host', 'www.youtube.com'), ('Cookie', 'YSC=cNcoiHG71bY; VISITOR_INFO1_LIVE=uLHsDODGalg; PREF=f1=50000000'), ('User-agent', 'Python-urllib/2.7')]
It is very hard to write a browser with JavaScript support. If you need JavaScript, then I suggest you use Selenium with PhantomJS, which acts just like a real browser.
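For example, reading cookies through Selenium looks roughly like this (a sketch; on newer Selenium versions, swap PhantomJS for headless Chrome, since PhantomJS support has been removed):
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('http://www.youtube.com')

# get_cookies() returns a list of dicts with 'name', 'value', 'domain', etc.
for cookie in driver.get_cookies():
    print(cookie['name'], cookie['value'])
driver.quit()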
