How to read remote page content using Python

How to read remote page content using Python - python

I need to read the remote file content using python but here I am facing some challenges. My code is below:
import subprocess
path = 'http://securityxploded.com/remote-file-inclusion.php'
subprocess.Popen(["rsync", host-ip+path],stdout=subprocess.PIPE)
for line in ssh.stdout:
line
Here I am getting the error NameError: name 'host' is not defined. I could not know what should be the host-ip value because I am running my Python file using terminal(python sub.py). Here I need to read the content of the http://securityxploded.com/remote-file-inclusion.php remote file.

You need the urllib library. Also you are using parameters which you don't use.
Try something like this:
import urllib.request
fp = urllib.request.urlopen("http://www.stackoverflow.com")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
Note: this is for python 3
For python 2.7 use this:
import urllib
fp = urllib.urlopen("http://www.stackoverflow.com")
myfile = fp.read()
print myfile

if you want to read remote content via http.
requests or urllib2 are both good choice.
for Python2, use requests.
import requests
resp = requests.get('http://example.com/')
print resp.text
will work.

Related

How to get file from url in python?

I want to download text files using python, how can I do so?
I used requests module's urlopen(url).read() but it gives me the bytes representation of file.

For me, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.

You can use multiple options:
For the simpler solution you can use this
file_url = 'https://someurl.com/text_file.txt'
for line in urllib.request.urlopen(file_url):
print(line.decode('utf-8'))
For an API solution
file_url = 'https://someurl.com/text_file.txt'
response = requests.get(file_url)
if (response.status_code):
data = response.text
for line in enumerate(data.split('\n')):
print(line)

When downloading text files with python I like to use the wget module
import wget
remote_url = 'https://www.google.com/test.txt'
local_file = 'local_copy.txt'
wget.download(remote_url, local_file)
If that doesn't work try using urllib
from urllib import request
remote_url = 'https://www.google.com/test.txt'
file = 'copy.txt'
request.urlretrieve(remote_url, file)
When you are using the request module you are reading the file directly from the internet and it is causing you to see the text in byte format. Try to write the text to a file then view it manually by opening it on your desktop
import requests
remote_url = 'test.com/test.txt'
local_file = 'local_file.txt'
data = requests.get(remote_url)
with open(local_file, 'wb')as file:
file.write(data.content)

How to download a zip file from a URL with authentification? [duplicate]

I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.
Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?
I'm not sure how to use shutil and os modules, either.
The file I want to download is under 500 MB and is an .gz archive file.If someone can explain how to extract the archive and utilise the files in it also, that would be great!
Here's a partial solution, that I wrote from various answers combined:
import requests
import os
import shutil
global dump
def download_file():
global dump
url = "http://randomsite.com/file.gz"
file = requests.get(url, stream=True)
dump = file.raw
def save_file():
global dump
location = os.path.abspath("D:\folder\file.gz")
with open("file.gz", 'wb') as location:
shutil.copyfileobj(dump, location)
del dump
Could someone point out errors (beginner level) and explain any easier methods to do this?

A clean way to download a file is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library, and it will directly retrieve the file form a source.

For Python3+ URLopener is deprecated.
And when used you will get error as below:
url_opener = urllib.URLopener() AttributeError: module 'urllib' has no
attribute 'URLopener'
So, try:
import urllib.request
urllib.request.urlretrieve(url, filename)

As mentioned here:
import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")
EDIT: If you still want to use requests, take a look at this question or this one.

Four methods using wget, urllib and request.
#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget
url = 'https://tinypng.com/images/social/website.jpg'
def testRequest():
image_name = 'test1.jpg'
r = requests.get(url, stream=True)
with open(image_name, 'wb') as f:
for chunk in r.iter_content():
f.write(chunk)
def testRequest2():
image_name = 'test2.jpg'
r = requests.get(url)
i = Image.open(StringIO(r.content))
i.save(image_name)
def testUrllib():
image_name = 'test3.jpg'
testfile = urllib.URLopener()
testfile.retrieve(url, image_name)
def testwget():
image_name = 'test4.jpg'
wget.download(url, image_name)
if __name__ == '__main__':
profile.run('testRequest()')
profile.run('testRequest2()')
profile.run('testUrllib()')
profile.run('testwget()')
testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
testwget - 3489 function calls in 0.020 seconds

I use wget.
Simple and good library if you want to example?
import wget
file_url = 'http://johndoe.com/download.zip'
file_name = wget.download(file_url)
wget module support python 2 and python 3 versions

Exotic Windows Solution
import subprocess
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")
downloads a single raw juypter notebook to file.

For text files, you can use:
import requests
url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"
with open(path, 'wb') as f:
f.write(req.content)

I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.
I had to disable the firewall(lazy)/enable https out by editing the rules(proper)
created the python script:
import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()
dlurl='https://somesite/path/whatever'
with urllib.request.urlopen(durl, context=context) as response:
with open("file.ova", 'wb') as tmp_file:
shutil.copyfileobj(response, tmp_file)
ESXi libraries are kind of paired down but the open source weasel installer seemed to use urllib for https... so it inspired me to go down this path

Another clean way to save the file is this:
import csv
import urllib
urllib.retrieve("your url goes here" , "output.csv")

Send Rust variable to Python?

I am building a Rust program in which the user types in a command, and then the program reads the command and responds accordingly. One of these commands is to download a file from a set site.
I have a .py file with the following code that I made a while ago that downloads files from a set site:
import urllib
import urllib2
import requests
url = 'http://www.blog.pythonlibrary.org/wpcontent/uploads/2012/06/wxDbViewer.zip'
print "downloading with urllib"
urllib.urlretrieve(url, "code.zip")
print "downloading with urllib2"
f = urllib2.urlopen(url)
data = f.read()
with open("code2.zip", "wb") as code:
code.write(data)
print "downloading with requests"
r = requests.get(url)
with open("code3.zip", "wb") as code:
code.write(r.content)
The URLs in the code are not ones that I will be using; they are examples.
If the Rust program sets the site it needs to go to as a variable, is there a way that I could send the variable to my Python program? I know you can send Python to Rust:
Passing a list of strings from Python to Rust
http://www.joesacher.com/blog/2017/08/24/ptr-types/
Is there a way to do this in the other direction?

Basic http file downloading and saving to disk in python?

I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.
Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?
I'm not sure how to use shutil and os modules, either.
The file I want to download is under 500 MB and is an .gz archive file.If someone can explain how to extract the archive and utilise the files in it also, that would be great!
Here's a partial solution, that I wrote from various answers combined:
import requests
import os
import shutil
global dump
def download_file():
global dump
url = "http://randomsite.com/file.gz"
file = requests.get(url, stream=True)
dump = file.raw
def save_file():
global dump
location = os.path.abspath("D:\folder\file.gz")
with open("file.gz", 'wb') as location:
shutil.copyfileobj(dump, location)
del dump
Could someone point out errors (beginner level) and explain any easier methods to do this?

A clean way to download a file is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library, and it will directly retrieve the file form a source.

For Python3+ URLopener is deprecated.
And when used you will get error as below:
url_opener = urllib.URLopener() AttributeError: module 'urllib' has no
attribute 'URLopener'
So, try:
import urllib.request
urllib.request.urlretrieve(url, filename)

As mentioned here:
import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")
EDIT: If you still want to use requests, take a look at this question or this one.

Four methods using wget, urllib and request.
#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget
url = 'https://tinypng.com/images/social/website.jpg'
def testRequest():
image_name = 'test1.jpg'
r = requests.get(url, stream=True)
with open(image_name, 'wb') as f:
for chunk in r.iter_content():
f.write(chunk)
def testRequest2():
image_name = 'test2.jpg'
r = requests.get(url)
i = Image.open(StringIO(r.content))
i.save(image_name)
def testUrllib():
image_name = 'test3.jpg'
testfile = urllib.URLopener()
testfile.retrieve(url, image_name)
def testwget():
image_name = 'test4.jpg'
wget.download(url, image_name)
if __name__ == '__main__':
profile.run('testRequest()')
profile.run('testRequest2()')
profile.run('testUrllib()')
profile.run('testwget()')
testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
testwget - 3489 function calls in 0.020 seconds

I use wget.
Simple and good library if you want to example?
import wget
file_url = 'http://johndoe.com/download.zip'
file_name = wget.download(file_url)
wget module support python 2 and python 3 versions

Exotic Windows Solution
import subprocess
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)

import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")
downloads a single raw juypter notebook to file.

For text files, you can use:
import requests
url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"
with open(path, 'wb') as f:
f.write(req.content)

I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.
I had to disable the firewall(lazy)/enable https out by editing the rules(proper)
created the python script:
import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()
dlurl='https://somesite/path/whatever'
with urllib.request.urlopen(durl, context=context) as response:
with open("file.ova", 'wb') as tmp_file:
shutil.copyfileobj(response, tmp_file)
ESXi libraries are kind of paired down but the open source weasel installer seemed to use urllib for https... so it inspired me to go down this path

Another clean way to save the file is this:
import csv
import urllib
urllib.retrieve("your url goes here" , "output.csv")

Copy text from website to text/excel file

I'm trying to create a simple (hopefully) Python script that copies the text from this address:
http://api.bitcoincharts.com/v1/trades.csv?symbol=mtgoxUSD
to either a simple text file or an excel spreadsheet.
I've tried utilising urllib and resquests libraries, but every time I would try and run a very basic script, the shell wouldn't display anything.
For example,
import requests
data = requests.get('http://api.bitcoincharts.com/v1/trades.csv?symbol=mtgoxUSD')
data.text
Any help would be appreciated. Thank you.

You're almost done;
import requests
symbol = "mtgoxUSD"
url = 'http://api.bitcoincharts.com/v1/trades.csv?symbol={}'.format(symbol)
data = requests.get(url)
# dump resulting text to file
with open("trades_{}.csv".format(symbol), "w") as out_f:
out_f.write(data.text)

Using urllib:
import urllib
f = urllib.urlopen("http://api.bitcoincharts.com/v1/trades.csv?symbol=mtgoxUSD")
print f.read()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to read remote page content using Python - python

if you want to read remote content via http. requests or urllib2 are both good choice. for Python2, use requests. import requests resp = requests.get('http://example.com/') print resp.text will work.

Related

How to get file from url in python?

How to download a zip file from a URL with authentification? [duplicate]

Send Rust variable to Python?

Basic http file downloading and saving to disk in python?

Copy text from website to text/excel file

Categories

Resources