I'm using Python to download memes from the /r/memes subreddit. Here's my code:
import praw
import requests
reddit = praw.Reddit(client_id="",
                     client_secret="",
                     user_agent="",
                     username="",
                     password="")

for submission in reddit.subreddit("memes").stream.submissions(skip_existing=True):
    print(submission.url)
    response = requests.get(submission.url)
    file = open(submission.id, "wb")  # line 15
    file.write(response.content)
    file.close()
My problem comes in at line 15. I'm able to download the image, but can't figure out how to download it as a .png/.jpg. Is there a way I can do this?
Just documenting #jarhill0's response: you would write the image to a file like this:
extension = submission.url.rsplit('.')[-1]
with open(f"{submission.id}.{extension}", "wb") as file:
    file.write(response.content)
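One caveat (my own addition, not part of the original answer): not every submission URL points straight at an image file, so it can help to check the extension, or skip the post, before writing. A minimal sketch, where the allowed-extension set is an assumption you can adjust:
# Sketch: only save the file when the URL looks like a direct image link
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "gif"}

extension = submission.url.rsplit('.', 1)[-1].lower()
if extension in IMAGE_EXTENSIONS:
    response = requests.get(submission.url)
    with open(f"{submission.id}.{extension}", "wb") as file:
        file.write(response.content)
else:
    print(f"Skipping {submission.url}: not a direct image link")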
Related
I want to download text files using Python. How can I do so?
I used urllib's urlopen(url).read(), but it gives me the bytes representation of the file.
For me, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.
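As a side note (not from the original answer): if the requests library is available, it handles the decoding for you via the text attribute, so an equivalent one-liner would be:
import requests

# requests decodes the body using the charset from the response headers
data = requests.get("[your url goes here]").text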
You can use multiple options.
For the simplest solution, you can use this:
import urllib.request

file_url = 'https://someurl.com/text_file.txt'
for line in urllib.request.urlopen(file_url):
    print(line.decode('utf-8'))
For a solution using the requests library:
import requests

file_url = 'https://someurl.com/text_file.txt'
response = requests.get(file_url)
if response.status_code == 200:
    data = response.text
    for line in data.split('\n'):
        print(line)
When downloading text files with Python, I like to use the wget module:
import wget
remote_url = 'https://www.google.com/test.txt'
local_file = 'local_copy.txt'
wget.download(remote_url, local_file)
If that doesn't work, try using urllib:
from urllib import request
remote_url = 'https://www.google.com/test.txt'
file = 'copy.txt'
request.urlretrieve(remote_url, file)
When you use the requests module, you are reading the file directly from the internet, which is why you see the text as bytes. Try writing the content to a file, then open the file manually on your desktop:
import requests

remote_url = 'https://test.com/test.txt'
local_file = 'local_file.txt'
data = requests.get(remote_url)
with open(local_file, 'wb') as file:
    file.write(data.content)
We need the images from https://api.data.gov.sg/v1/transport/traffic-images. But the script below downloads a JSON file, and we want to download the images directly. I am a beginner. Thanks in advance.
from threading import Timer
import time
import requests

startlog = time.time()
image_url = "https://api.data.gov.sg/v1/transport/traffic-images"
tm = 0
while True:
    tm += 1
    r = requests.get(image_url)  # create HTTP response object
    with open(str(tm) + "trafficFile.json", 'wb') as f:
        f.write(r.content)
    print(tm)
    time.sleep(20)
The small piece of code above only saves the raw JSON response; if you check your local directory (the folder where this script resides), you will find JSON files rather than images.
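A possible approach (my sketch, not from the original thread): the JSON returned by that endpoint lists camera entries, each of which carries a direct image URL, so you can parse the response and fetch every image it points to. The 'items', 'cameras', 'image', and 'camera_id' keys below are assumptions about the response layout, so verify them against the actual JSON before relying on this:
import requests

api_url = "https://api.data.gov.sg/v1/transport/traffic-images"

response = requests.get(api_url)
data = response.json()

# Assumed response layout: items -> cameras -> image (a direct URL to a .jpg)
items = data.get("items", [])
cameras = items[0].get("cameras", []) if items else []
for camera in cameras:
    image_url = camera.get("image")
    camera_id = camera.get("camera_id", "unknown")
    if not image_url:
        continue
    image = requests.get(image_url)
    with open("camera_{0}.jpg".format(camera_id), "wb") as f:
        f.write(image.content)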
I am new to Python. I wrote a simple script for uploading a video from a URL to VK. I tested the script with small files and it works, but for large files I run out of memory. I read that with 'requests_toolbelt' it's possible to POST large files. How can I add this to my script?
import vk
import requests
from homura import download
import glob
import os
import json

url = raw_input("Enter URL: ")
download(url)
file_name = glob.glob('*.mp4')[0]

session = vk.Session(access_token='TOKEN')
vkapi = vk.API(session, v='5.80')
params = {'name': file_name, 'privacy_view': 'nobody', 'privacy_comment': 'nobody'}
param = vkapi.video.save(**params)
upload_url = param['upload_url']

print("Uploading ...")
request = requests.post(upload_url, files={'video_file': open(file_name, "rb")})
os.remove(file_name)
requests_toolbelt (https://github.com/requests/toolbelt) has an example that should work for you:
import requests
from requests_toolbelt import MultipartEncoder
...
...
m = MultipartEncoder(fields={'video_file': (file_name, open(file_name, "rb"))})
response = requests.post(upload_url, data=m, headers={'Content-Type': m.content_type})
If you know your video file's MIME type, you can add it as a third item in the tuple, like this:
m = MultipartEncoder(fields={
    'video_file': (file_name, open(file_name, "rb"), "video/mp4")})
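Putting it together with the original script (my own sketch, assuming the VK upload endpoint accepts a streamed multipart body under the same 'video_file' field name), the upload step would become:
from requests_toolbelt import MultipartEncoder

# Stream the file instead of loading it all into memory, and make sure
# the file handle is closed once the upload finishes
with open(file_name, "rb") as video:
    m = MultipartEncoder(fields={'video_file': (file_name, video, "video/mp4")})
    response = requests.post(upload_url, data=m,
                             headers={'Content-Type': m.content_type})
os.remove(file_name)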
I have links of the form:
http://youtubeinmp3.com/fetch/?video=LINK_TO_YOUTUBE_VIDEO_HERE
If you put links of this type in an <a> tag on a webpage, clicking them will download an MP3 of the YouTube video at the end of the link. Source is here.
I'd like to mimic this process from the command line by making POST requests (or something of that sort), but I'm not sure how to do it in Python! Can I get any advice, please, or is this more difficult than I'm making it out to be?
As Mark Ma mentioned, you can get it done without leaving the standard library by utilizing urllib2. I like to use Requests, so I cooked this up:
import os
import requests

dump_directory = os.path.join(os.getcwd(), 'mp3')
os.makedirs(dump_directory, exist_ok=True)

def dump_mp3_for(resource):
    payload = {
        'api': 'advanced',
        'format': 'JSON',
        'video': resource
    }
    initial_request = requests.get('http://youtubeinmp3.com/fetch/', params=payload)
    if initial_request.status_code == 200:  # good to go
        download_mp3_at(initial_request)

def download_mp3_at(initial_request):
    j = initial_request.json()
    filename = '{0}.mp3'.format(j['title'])
    r = requests.get(j['link'], stream=True)
    with open(os.path.join(dump_directory, filename), 'wb') as f:
        print('Dumping "{0}"...'.format(filename))
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
It's then trivial to iterate over a list of YouTube video links and pass them into dump_mp3_for() one-by-one.
for video in ['http://www.youtube.com/watch?v=i62Zjga8JOM']:
    dump_mp3_for(video)
Its API doc provides a URL variant that returns the download link as JSON: http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM
We can then use urllib2 to call the API and fetch the result, deserialize it with json.loads(), and download the MP3 file with urllib2 again.
import urllib2
import json
r = urllib2.urlopen('http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM')
content = r.read()
# extract download link
download_url = json.loads(content)['link']
download_content = urllib2.urlopen(download_url).read()
# save downloaded content to file
f = open('test.mp3', 'wb')
f.write(download_content)
f.close()
Notice the file should be opened using mode 'wb', otherwise the mp3 file cannot be played correctly.
If the file is big, downloading will be a time-consuming process. There is a post that describes how to display download progress in a GUI (PySide); a minimal console version is sketched below.
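For reference, here is a minimal console progress sketch (my own addition, assuming the server sends a Content-Length header), using the same streaming download as in the requests answer above:
import requests

def download_with_progress(url, filename):
    r = requests.get(url, stream=True)
    total = int(r.headers.get('Content-Length', 0))
    done = 0
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                done += len(chunk)
                if total:
                    # overwrite the same console line with the current percentage
                    print('\rDownloaded {0:.1f}%'.format(100.0 * done / total), end='')
    print()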
If you want to download a video, or just the audio, from YouTube, you can use the pytube module; it does all the hard work.
You can also list the audio-only streams:
from pytube import YouTube
# initialize a YouTube object by the url
yt = YouTube("YOUTUBE_URL")
# that will get all audio files available
audio_list = yt.streams.filter(only_audio=True).all()
print(audio_list)
And then download it:
# that will download the file to current working directory
yt.streams.filter(only_audio=True)[0].download()
Complete Code:
from pytube import YouTube
yt = YouTube("YOUTUBE_URL")
audio = yt.streams.filter(only_audio=True).first()
audio.download()
I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.
Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?
I'm not sure how to use shutil and os modules, either.
The file I want to download is under 500 MB and is a .gz archive file. If someone can also explain how to extract the archive and use the files in it, that would be great!
Here's a partial solution that I wrote by combining various answers:
import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump
Could someone point out errors (beginner level) and explain any easier methods to do this?
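To address the errors directly (a sketch of my own, reusing the example URL): the `with open("file.gz", 'wb') as location` line shadows the path you just built, so the file always lands in the working directory rather than in D:\folder; the Windows path also needs a raw string (r"D:\folder\file.gz") so the backslashes aren't treated as escapes; and passing data between functions through a global is fragile. A cleaner version of the same two-step idea:
import requests
import shutil

def download_file(url):
    # stream=True defers reading the body so it can be copied without loading it all into memory
    response = requests.get(url, stream=True)
    response.raw.decode_content = True  # undo any transfer compression before copying
    return response.raw

def save_file(raw, location):
    with open(location, 'wb') as output:
        shutil.copyfileobj(raw, output)

save_file(download_file("http://randomsite.com/file.gz"), r"D:\folder\file.gz")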
A clean way to download a file is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library, and it will directly retrieve the file from a source.
For Python 3+, URLopener is deprecated, and calling it this way raises an error like:
url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'
So, try:
import urllib.request
urllib.request.urlretrieve(url, filename)
As mentioned here:
import urllib
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")
EDIT: If you still want to use requests, take a look at this question or this one.
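Since the question also asked about extracting the .gz archive, here is a hedged sketch using requests plus the standard library gzip and shutil modules (the URL and output names are placeholders):
import gzip
import shutil
import requests

url = "http://randomsite.com/file.gz"

# Download in chunks so a ~500 MB file never sits fully in memory
with requests.get(url, stream=True) as response:
    with open("file.gz", "wb") as archive:
        for chunk in response.iter_content(chunk_size=8192):
            archive.write(chunk)

# Decompress the .gz archive into a plain file
with gzip.open("file.gz", "rb") as compressed, open("file", "wb") as extracted:
    shutil.copyfileobj(compressed, extracted)
If the archive is actually a .tar.gz, the standard library tarfile module can open and extract it in one step instead.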
Four methods using wget, urllib and requests.
#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile as profile
import urllib
import wget

url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')
testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
testwget - 3489 function calls in 0.020 seconds
I use wget.
It is a simple and good library. Want an example?
import wget
file_url = 'http://johndoe.com/download.zip'
file_name = wget.download(file_url)
The wget module supports both Python 2 and Python 3.
Exotic Windows Solution
import subprocess
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)
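A slightly safer variant (my suggestion, not part of the original answer) passes the arguments as a list, so the URL and filename don't need shell quoting:
import subprocess

# List form avoids shell=True, so spaces or '&' in the URL/filename are handled safely
subprocess.run(["powershell", "Invoke-WebRequest", your_url, "-OutFile", filename])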
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")
This downloads a single raw Jupyter notebook to a file.
For text files, you can use:
import requests
url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"
with open(path, 'wb') as f:
    f.write(req.content)
I started down this path because ESXi's wget is not compiled with SSL, and I wanted to download an OVA from a vendor's website directly onto the ESXi host, which is on the other side of the world.
I had to either disable the firewall (lazy) or enable outbound HTTPS by editing the rules (proper).
Then I created the Python script:
import ssl
import shutil
import tempfile
import urllib.request

context = ssl._create_unverified_context()
dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)
ESXi libraries are kind of pared down, but the open-source weasel installer seemed to use urllib for HTTPS, so it inspired me to go down this path.
Another clean way to save the file is this:
import urllib.request

urllib.request.urlretrieve("your url goes here", "output.csv")