pytube urllib.error.HTTPError: HTTP Error 403: Forbidden - python

I get an error when I try to download a video from YouTube with pytube.
from pytube import YouTube
yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')
stream = yt.streams.first()
stream
stream.download()
The error says "Forbidden", but fetching e.g. with curl from the same URL seems to work just fine. What could be wrong here?
Traceback (most recent call last):
File "C:/Users/Mr. jarvis/Desktop/youtube/youtube.py", line 7, in <module>
stream.download()
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\streams.py", line 217, in download
bytes_remaining = self.filesize
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\streams.py", line 164, in filesize
headers = request.get(self.url, headers=True)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\request.py", line 21, in get
response = urlopen(url)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Use pytube3 instead; it fixes this error (but is Python 3 only): https://github.com/hbmartin/pytube3

I experienced the same problem and found an unfinished download in the project folder. After deleting that file and running stream.download() again, it worked, so this may be worth checking.
pytube version 12.0.0
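The cleanup step described above could be sketched roughly as follows. The helper name remove_leftover is hypothetical, and the sketch assumes you already know the path of the unfinished file; it cannot tell a partial file from a complete one.

```python
from pathlib import Path


def remove_leftover(path):
    """Delete a leftover file from an interrupted download.

    Hypothetical helper: returns True if a file was removed, False otherwise.
    """
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False


# Assumed usage with pytube (not executed here):
#   remove_leftover(stream.default_filename)
#   stream.download()
```

Calling it before stream.download() only mirrors the manual fix reported above; whether it helps depends on the leftover file actually being the cause.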

Related

python: urllib.request.urlopen() HTTP Error 308 Permanent redirect

I was just trying to read a file from the internet when this error came up:
Traceback (most recent call last):
File "E:\Sapphire\Programming\download_file.py", line 46, in <module>
download(arg[1])
File "E:\Sapphire\Programming\download_file.py", line 19, in download
file_info = req.urlopen("http://proget.whirlpool.repl.co/{}/{}.txt".format(arch, name))
File "E:\Programming\Python38-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "E:\Programming\Python38-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "E:\Programming\Python38-32\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "E:\Programming\Python38-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "E:\Programming\Python38-32\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "E:\Programming\Python38-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 308: Permanent Redirect
The line that raised the error (line 20):
file_info = req.urlopen("http://proget.whirlpool.repl.co/{}/{}.txt".format(arch, name))
(The link it should use is correct.)
Any idea what causes this?
SPECS:
OS: Windows 7 SP1
Python: Python 3.8.10
Arch: x86 (32-bit)
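Plain HTTP is outdated, so most sites redirect to HTTPS. As the traceback shows, urllib in this Python version does not follow the 308 redirect itself and raises it as an HTTPError instead.
So request the HTTPS URL directly: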
file_info = req.urlopen("https://proget.whirlpool.repl.co/{}/{}.txt".format(arch, name))
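If the URL cannot simply be edited by hand, the same idea can be sketched in code. This is a minimal sketch, not a definitive fix: force_https and open_following_308 are hypothetical helper names, and the retry loop assumes the server sends a Location header with its 308 response.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit, urlunsplit


def force_https(url):
    """Return the URL with an http:// scheme rewritten to https://."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def open_following_308(url, max_hops=5):
    """Open a URL, following 308 redirects by hand if urlopen raises them."""
    for _ in range(max_hops):
        try:
            return urllib.request.urlopen(url)
        except urllib.error.HTTPError as err:
            location = err.headers.get("Location")
            if err.code != 308 or not location:
                raise  # some other error, or nowhere to redirect to
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

On Python versions where urlopen already follows 308 responses, the except branch is simply never taken.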

downloading a CSV with wget

Hi, I am newish to Python and am working on modelling the CV-19 outbreak in the UK.
Currently I am writing a program to automatically download the UK government's latest death statistics.
Here is my code so far:
import wget
url = "https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv"
wget.download(url, 'C:/Users/Moshe/Downloads/Covid_19_uk_timeseries.csv')
Running this in PyCharm I get:
C:\Users\Moshe\PycharmProjects\HelloWorld\venv\Scripts\python.exe C:/Users/Moshe/.PyCharmCE2019.1/config/scratches/Get_CV19_UK_stats.py
Traceback (most recent call last):
File "C:/Users/Moshe/.PyCharmCE2019.1/config/scratches/Get_CV19_UK_stats.py", line 6, in <module>
wget.download(url, 'C:/Users/Moshe/Downloads/Covid_19_uk_timeseries.csv')
File "C:\Users\Moshe\PycharmProjects\HelloWorld\venv\lib\site-packages\wget.py", line 526, in download
(tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\Moshe\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 308: Permanent Redirect
Process finished with exit code 1
The file I am trying to get can be found on this page:
https://coronavirus.data.gov.uk/?_ga=2.157835066.1251021075.1589887735-1783596499.1585566366
Why won't this work, and what do I need to change?
Can you verify that your output path is valid and writable?
Try:
import wget
import os
output_dir = 'C:/Users/Moshe/Downloads'
output_file = 'Covid_19_uk_timeseries.csv'
assert os.path.exists(output_dir)
url = "https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv"
wget.download(url, out=os.path.join(output_dir, output_file))
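The path check suggested above can be made a little more explicit than a bare assert. A minimal sketch, assuming a hypothetical helper named checked_output_path; note that os.access is only a rough indicator of writability, particularly on Windows.

```python
import os


def checked_output_path(output_dir, output_file):
    """Return the joined output path, or raise if the directory looks unusable."""
    if not os.path.isdir(output_dir):
        raise FileNotFoundError("output directory does not exist: " + output_dir)
    if not os.access(output_dir, os.W_OK):
        raise PermissionError("output directory is not writable: " + output_dir)
    return os.path.join(output_dir, output_file)


# Assumed usage with the wget package (not executed here):
#   wget.download(url, out=checked_output_path(output_dir, output_file))
```

This only rules out a bad output path; it does not address the 308 redirect shown in the traceback.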

urllib request gives 404 error but works fine in browser

When I try this line:
import urllib.request
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
I get the following error:
Traceback (most recent call last):
File "scraper.py", line 26, in <module>
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
But the link works fine in my browser? Why does it work in the browser but not for a request? It works with other pictures from the same site.
The request returns a 404 in the browser as well.
If you check your browser's developer console, you can see the 404 status.
So what you see in the browser is the site's custom 404 "page" (which is an image).
EDIT:
So urlretrieve fails on a 404 status code. If you want to use the contents of the response anyway (even though the status code is 404), you can do the following:
try:
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except urllib.error.HTTPError as e:
    # The HTTPError object still carries the response body
    with open("error_photo.jpg", 'wb') as fp:
        fp.write(e.read())
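Try changing the User-Agent. urlretrieve itself takes no headers argument, so build a Request with the header and pass it to urlopen:
req = urllib.request.Request("https://i.redd.it/53tfh959wnv41.jpg", headers={"User-Agent": "put custom user agent here"})
data = urllib.request.urlopen(req).read()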
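Since urllib.request.urlretrieve has no headers parameter, a custom User-Agent has to go through a Request object. A minimal sketch of saving the result to disk that way; build_request and download are hypothetical helper names, and the default User-Agent string is only a placeholder.

```python
import shutil
import urllib.request


def build_request(url, user_agent):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, user_agent="Mozilla/5.0 (placeholder)"):
    """Fetch url with the given User-Agent and stream the body to filename."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req) as response, open(filename, "wb") as fp:
        shutil.copyfileobj(response, fp)


# Assumed usage (not executed here):
#   download("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
```

A different User-Agent only helps if the server rejects the default one; a genuine 404 will still raise HTTPError.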

How to download .log files from the server and save it to local disk using Python

I am trying to download error log files, such as IIS logs and HTTP API error logs, from a server and save them to my disk. These files have the .log extension.
I tried the code below. It works well for text files, but not for files of type .log:
from urllib.request import urlopen
# Works for errorlog.txt; fails for 'https://servername/path/errorlog.log'
response = urlopen('https://servername/path/errorlog.txt')
data = response.read()  # read() returns bytes
# Write data to file in binary mode, since data is bytes
filename = "test.txt"
with open(filename, 'wb') as file_:
    file_.write(data)
This is the error message I am getting:
Traceback (most recent call last):
File "<pyshell#26>", line 1, in <module>
response = urlopen(url)
File "C:\Python\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Python\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Python\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Python\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Python\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Do you have a solution to this problem?

urllib.error.HTTPError: HTTP Error 400: Bad Request (bottlenose)

After installing bottlenose and getting my API keys and associate tags I tried following the instructions in this guide: https://github.com/lionheart/bottlenose
(I have removed my api keys)
This is the error I am getting:
>>> import bottlenose
>>> amazon = bottlenose.Amazon(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ASSOCIATE_TAG)
>>> response = amazon.ItemLookup(ItemId="B007OZNUCE")
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
response = amazon.ItemLookup(ItemId="B007OZNUCE")
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\site-packages\bottlenose\api.py", line 265, in __call__
{'api_url': api_url, 'cache_url': cache_url})
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\site-packages\bottlenose\api.py", line 226, in _call_api
return urllib2.urlopen(api_request, timeout=self.Timeout)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 471, in open
response = meth(req, response)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 509, in error
return self._call_chain(*args)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Users\Windows\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 589, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Check that your system clock is set correctly.
I had the same problem and fixed it by changing the date and time to the correct values.
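One rough way to check the clock without leaving Python is to compare it against the Date header of any HTTP response. This is only a sketch under that assumption; clock_skew_seconds is a hypothetical helper, and the header value has to be fetched separately.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds.

    date_header is the value of an HTTP Date header,
    for example "Wed, 21 Oct 2015 07:28:00 GMT".
    """
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


# Assumed usage (not executed here):
#   import urllib.request
#   header = urllib.request.urlopen("https://example.com").headers["Date"]
#   print(clock_skew_seconds(header))
```

A result far from zero suggests the local date or time is wrong.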
You have to provide values for these keys:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_ASSOCIATE_TAG
To get the keys, you have to register on AWS.
