Python script not downloading files

I have the code below. It prints the URL to the console, but I can't figure out how to make it download the file instead of just displaying it. I also want to be able to search for the .mov file type. I'd rather have information on how to do this than have it done for me. Any help is appreciated!
import urllib

def is_download_allowed():
    f = urllib.urlopen("http://10.1.1.27/config?action=get&paramid=eParamID_MediaState")
    response = f.read()
    if (response.find('"value":"1"') > -1):
        return True
    f = urllib.urlopen("http://10.1.1.27/config?action=set&paramid=eParamID_MediaState&value=1")

def download_clip():
    url = "http://10.1.1.27/media/SC1ATK26"
    print url

def is_not_download_allowed():
    f = urllib.urlopen("http://10.1.1.27/config?action=get&paramid=eParamID_MediaState")
    response = f.read()
    if (response.find('"value":"-1"') > 1):
        return True
    f = urllib.urlopen("http://10.1.1.27/config?action=set&paramid=eParamID_MediaState&value=1")

is_download_allowed()
download_clip()
is_not_download_allowed()

You say you don't want a full solution, so here are pointers:
Try urllib.urlretrieve, which fetches a URL and saves it straight to a local file.
As already commented, your download function just prints a string; it never actually requests anything.
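A minimal sketch of both pieces, filtering for the .mov extension and saving via urlretrieve. This uses the Python 3 spelling, urllib.request.urlretrieve; the base URL is the one from the question, and the helper names are my own:

```python
from urllib.request import urlretrieve  # in Python 2 this is urllib.urlretrieve

def pick_mov_clips(names):
    """Keep only the names that end in .mov (case-insensitive)."""
    return [n for n in names if n.lower().endswith('.mov')]

def download_clip(name, base="http://10.1.1.27/media/"):
    """Fetch base+name and save it under the same name locally."""
    urlretrieve(base + name, name)

# e.g. for name in pick_mov_clips(["SC1ATK26.mov", "readme.txt"]):
#          download_clip(name)
```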

Related

Problem with not being able to open file after downloading

I have a script which is used to do some scraping on Reddit.
import praw
import requests

def reddit_scrape():
    count = 0
    for submission in subreddit.new(limit = 100):
        if (is_known_id(submission_id = submission.id)):
            print('known')
            continue
        save_id(submission.id)
        save_to_dict(id = submission.id, txt = submission.title)
        img_data = requests.get(submission.url).content
        with open(submission.id, 'wb') as handler:
            handler.write(img_data)
        print(submission.url)
        count += 1
        if count >= 3: break
However, when I try to open the file written through handler, it has no extension and I am not able to open it.
I have no idea what is causing this issue, as it was working perfectly a while ago.
Feel free to let me know if I am missing any info, as this is just part of the entire script.
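No answer is shown for this one, but a likely cause is that open(submission.id, 'wb') writes a file with no extension, so image viewers don't recognize it. A sketch of one possible fix, deriving the extension from the submission URL (the function name is my own):

```python
import os
from urllib.parse import urlsplit

def filename_for(submission_id, url):
    """Build a local filename that keeps the extension from the media URL."""
    path = urlsplit(url).path        # e.g. '/xyz.png'
    ext = os.path.splitext(path)[1]  # '.png' (empty string if the URL has none)
    return submission_id + ext

# then: with open(filename_for(submission.id, submission.url), 'wb') as handler: ...
```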

Regarding file downloading in Python

I wrote this code to download an SRT subtitle file, but it doesn't work. Please review it and help me find the mistake I'm making. Thanks.
from urllib import request

srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"

def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    srt = response.read()
    srt_str = str(srt)
    lines = srt_str.split('\\n')
    dest_url = r'srtfile.srt'
    fx = open('dest_url' , 'w')
    for line in lines:
        fx.write(line)
    fx.close()
    download_srt_file(srt_url)
A number of things are wrong or can be improved.
You are missing the return statement in your function.
You are calling the function from within itself, at function-body indentation, so the top-level call never happens and you never actually enter the function to begin with.
fx = open('dest_url', 'w') passes the literal string 'dest_url' instead of the variable dest_url, so the output is written to a file literally named dest_url rather than srtfile.srt.
To avoid having to close and flush the file yourself, open it with a with statement.
Your split('\\n') is also wrong: the double backslash escapes the backslash, so you are splitting on the two literal characters \ and n instead of on newlines. You want split('\n').
Finally, str(srt) does not decode the bytes that response.read() returns; it just produces a b'...' representation. Decode them with srt.decode('utf-8') instead.
Below is a modified and hopefully functioning version of your code with the above implemented.
from urllib import request

def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    srt = response.read().decode('utf-8')  # read() returns bytes; decode to text
    lines = srt.split('\n')
    dest_url = 'srtfile.srt'
    with open(dest_url, 'w') as fx:
        for line in lines:
            fx.write(line + '\n')  # split() removed the newlines, so put them back
    return

srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"
download_srt_file(srt_url)
Tell me if it works for you.
A final remark: you are not setting a target directory for the file you are writing. Are you sure that's what you want?
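If you do want to control where the file lands, a small sketch (the directory name here is just an example):

```python
import os

def dest_path_for(dest_dir, name='srtfile.srt'):
    """Join a target directory and filename, creating the directory if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    return os.path.join(dest_dir, name)

# e.g. with open(dest_path_for('subtitles'), 'w') as fx: ...
```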

Saving an image from a text file providing image URLs in Python

import urllib2
import urllib
import json
import urlparse

def main():
    f = open("C:\Users\Stern Marketing\Desktop\dumpaday.txt","r")
    if f.mode == 'r':
        item = f.read()
        for x in item:
            urlParts = urlparse.urlsplit(x)
            filename = urlParts.path.split('/')[-1]
            urllib.urlretrieve(item.strip(), filename)

if __name__ == "__main__":
    main()
Looks like the script is still not working properly; I'm really not sure why... :S
Getting lots of errors...
urllib.urlretrieve("x", "0001.jpg")
This will try to download from the (static) URL "x".
The URL you actually want to download from is within the variable x, so you should write your line to reference that variable:
urllib.urlretrieve(x, "0001.jpg")
Also, you probably want to change the target filename for each download, so you don’t keep on overwriting it.
Regarding your filename update:
urlparse.urlsplit is a function that takes a URL and splits it into multiple parts. Those parts are returned from the function, so you need to save the result in a variable.
One part is the path, which is what contains the file name. The path itself is a string, so you can call its split method to separate it on the / character. As you are interested in only the last part—the filename—you can discard everything else:
url = 'http://www.dumpaday.com/wp-content/uploads/2013/12/funny-160.jpg'
urlParts = urlparse.urlsplit(url)
print(urlParts.path) # /wp-content/uploads/2013/12/funny-160.jpg
filename = urlParts.path.split('/')[-1]
print(filename) # funny-160.jpg
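For reference, the same split in Python 3, where the urlparse module became urllib.parse (a sketch of the equivalent, not part of the original answer):

```python
from urllib.parse import urlsplit  # Python 3 home of urlparse.urlsplit

url = 'http://www.dumpaday.com/wp-content/uploads/2013/12/funny-160.jpg'
filename = urlsplit(url).path.split('/')[-1]  # 'funny-160.jpg'
```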
It should work like this:
import urllib2
import urllib
import json
import urlparse

def main():
    with open("C:\Users\Stern Marketing\Desktop\dumpaday.txt","r") as f:
        for x in f:
            urlParts = urlparse.urlsplit(x.strip())
            filename = urlParts.path.split('/')[-1]
            urllib.urlretrieve(x.strip(), filename)

if __name__ == "__main__":
    main()
The readlines method of file objects returns lines with a trailing newline character (\n).
Change your loop to the following:
# By the way, you don't need readlines at all. Iterating over a file yields its lines.
for x in fl:
    urllib.urlretrieve(x.strip(), "0001.jpg")
Here is a solution that loops over images indexed 160 to 169. You can adjust the range as needed. It builds each URL from the base, opens it via urllib2, and saves it as a binary file.
import urllib2

base_url = "http://www.dumpaday.com/wp-content/uploads/2013/12/funny-{}.jpg"

for n in xrange(160, 170):
    url = base_url.format(n)
    f_save = "{}.jpg".format(n)
    req = urllib2.urlopen(url)
    with open(f_save, 'wb') as FOUT:
        FOUT.write(req.read())
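The same loop under Python 3, where urllib2's urlopen moved to urllib.request (a sketch; the URL pattern is the one above and may no longer be live, so the driver loop is left commented out):

```python
from urllib.request import urlopen

BASE_URL = "http://www.dumpaday.com/wp-content/uploads/2013/12/funny-{}.jpg"

def save_image(n):
    """Fetch image number n and write it out as a binary file."""
    with urlopen(BASE_URL.format(n)) as req, open("{}.jpg".format(n), "wb") as fout:
        fout.write(req.read())

# for n in range(160, 170):
#     save_image(n)
```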

How to make download manager more robust?

I have made this simple download manager, but the problem is it won't work on complex URLs, where pages are redirected.
def str(d):
    for i in range(len(d)):
        if d[-i] == '/':
            x=-i
            break
    s=[]
    l=len(d)+x+1
    print d[l],d[len(d)-1]
    s=d[l:]
    return s

import urllib2
url=raw_input()
filename=str(url)
webfile = urllib2.urlopen(url)
data = webfile.read()
fout =open(filename,"w")
fout.write(data)
fout.close()
webfile.close()
it wouldn't work for http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=9&ved=0CG0QFjAI&url=http%3A%2F%2Fwww.iasted.org%2Fconferences%2Fformatting%2FPresentations-Tips.ppt&ei=clfWTpjZEIblrAfC8qWXDg&usg=AFQjCNEIgqx6x4ULHFXzzYDzCITuUJOczA&sig2=0VtKXPvoDnIq-lIR4S9LEQ
while it would work for http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt
and both links are for the same file.
How to solve the problem of redirection?
I think redirection is not the problem here:
urllib2 already follows HTTP redirects automatically; Google only redirects you to an error page when something goes wrong.
Try this script:
from urlparse import urlsplit
from urllib2 import urlopen

url1 = 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=9&ved=0CG0QFjAI&url=http%3A%2F%2Fwww.iasted.org%2Fconferences%2Fformatting%2FPresentations-Tips.ppt&ei=clfWTpjZEIblrAfC8qWXDg&usg=AFQjCNEIgqx6x4ULHFXzzYDzCITuUJOczA&sig2=0VtKXPvoDnIq-lIR4S9LEQ'
url2 = 'http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt'

for url in [url1, url2]:
    split = urlsplit(url)
    filename = split.path[split.path.rfind('/')+1:]
    if not filename:
        filename = split.query[split.query.rfind('/')+1:]
    f = open(filename, 'w')
    f.write(urlopen(url).read())
    f.close()

# Yields 2 files: url and Presentations-Tips.ppt [both are ppt files]
The above script works every time.
In general, you handle redirection by using urllib2.HTTPRedirectHandler, like this:
import urllib2

opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
res = opener.open('http://example.com/some/url/')
However, it doesn't look like this will work for the Google URL you've given in your example, because rather than including a Location header in the response, the Google result looks like this:
<script>window.googleJavaScriptRedirect=1</script><script>var a=parent,b=parent.google,c=location;if(a!=window&&b){if(b.r){b.r=0;a.location.href="http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt";c.replace("about:blank");}}else{c.replace("http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt");};</script><noscript><META http-equiv="refresh" content="0;URL='http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt'"></noscript>
...which is to say, it uses a JavaScript redirect, which substantially complicates your life. You could use Python's re module to extract the correct location from this block.
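A sketch of that extraction with re, targeting the c.replace("…") call in the JavaScript above (the snippet here is abbreviated to the relevant part):

```python
import re

# Abbreviated form of the JavaScript redirect shown above.
snippet = 'c.replace("http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt");'

# Capture whatever URL is passed to c.replace("...").
match = re.search(r'c\.replace\("([^"]+)"\)', snippet)
real_url = match.group(1) if match else None
```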

Downloading videos in FLV format from YouTube

I can’t really understand how YouTube serves videos, but I have been reading through what I can.
It seems like the old method get_video is now obsolete and can't be used any more. Is there another Pythonic and simple method for collecting YouTube videos?
You might have some luck with youtube-dl
http://rg3.github.com/youtube-dl/documentation.html
I'm not sure if there's a good API, but it's written in Python, so theoretically you could do something a little better than Popen :)
Here is a quick Python script which downloads a YouTube video. No bells and whistles, just scrapes out the necessary URLs, hits the generate_204 URL and then streams the data to a file:
import lxml.html
import re
import sys
import urllib
import urllib2

_RE_G204 = re.compile('"(http:.+.youtube.com.*\/generate_204[^"]+")', re.M)
_RE_URLS = re.compile('"fmt_url_map": "(\d*[^"]+)",.*', re.M)

def _fetch_url(url, ref=None, path=None):
    opener = urllib2.build_opener()
    headers = {}
    if ref:
        headers['Referer'] = ref
    request = urllib2.Request(url, headers=headers)
    handle = urllib2.urlopen(request)
    if not path:
        return handle.read()
    sys.stdout.write('saving: ')
    # Write result to file
    with open(path, 'wb') as out:
        while True:
            part = handle.read(65536)
            if not part:
                break
            out.write(part)
            sys.stdout.write('.')
            sys.stdout.flush()
    sys.stdout.write('\nFinished.\n')

def _extract(html):
    tree = lxml.html.fromstring(html)
    res = {'204': _RE_G204.findall(html)[0].replace('\\', '')}
    for script in tree.findall('.//script'):
        text = script.text_content()
        if 'fmt_url_map' not in text:
            continue
        # Found it. Extract the URLs we need
        for tmp in _RE_URLS.findall(text)[0].split(','):
            url_id, url = tmp.split('|')
            res[url_id] = url.replace('\\', '')
        break
    return res

def main():
    target = sys.argv[1]
    dest = sys.argv[2]
    html = _fetch_url(target)
    res = dict(_extract(html))
    # Hit the 'generate_204' URL first and remove it
    _fetch_url(res['204'], ref=target)
    del res['204']
    # Download the video. Now I grab the first 'download' URL and use it.
    first = res.values()[0]
    _fetch_url(first, ref=target, path=dest)

if __name__ == '__main__':
    main()
Running it:
python youdown.py 'http://www.youtube.com/watch?v=Je_iqbgGXFw' stevegadd.flv
saving: ...........................
Finished.
I would recommend writing your own parser using urllib2 or Beautiful Soup. You can look at the source code for DownThemAll to see how that plugin finds the video URL.
