Hey guys, I need some help. I am trying to download videos from this site, https://ttdownloader.com/dl.php?v=YTo0OntzOjk6IndhdGVybWFyayI7YjowO3M6NzoidmlkZW9JZCI7czoxOToiNjkxMjEwNzYyNzY1MjY5NzM1MCI7czozOiJ1aWQiO3M6MzI6Ijk0MTdiOWE3NWU2MmE3MDQ1NjZhYzk0MzJjMThlY2VlIjtzOjQ6InRpbWUiO2k6MTYxMTQ5NzE1ODt9 , using Python.
This is the code I have tried:
import requests
url ='''https://ttdownloader.com/dl.php?v=YTo0OntzOjk6IndhdGVybWFyayI7YjowO3M6NzoidmlkZW9JZCI7czoxOToiNjkxMjEwNzYyNzY1MjY5NzM1MCI7czozOiJ1aWQiO3M6MzI6Ijk0MTdiOWE3NWU2MmE3MDQ1NjZhYzk0MzJjMThlY2VlIjtzOjQ6InRpbWUiO2k6MTYxMTQ5NzE1ODt9'''
page = requests.get(url)
with open('output.mp4', 'wb') as file:
    file.write(page.content)
But it doesn't work as expected: when I check page.content, all I see is b''.
❌ The link that you are using is NOT an HTML page.
❌ Therefore it doesn't return any HTML.
✅ Your link is a media link.
✅ Therefore you must stream it and write it to a file. Something like this:
import requests
url = '/your/valid/ttdownloader/url'
# Stream the response so the video is written to disk in chunks.
with requests.get(url, stream=True) as r:
    with open('output.mp4', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
NOTE:
The link that you posted in the question is now invalid.
Please try the above code with a newly generated link.
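If it helps, here is a minimal sketch of the same streaming idea wrapped in a function that also checks the HTTP status and counts the bytes written, so an empty body like the b'' from the question is easy to spot. The function name `download` and the 30-second timeout are my own choices, not anything the site requires.

```python
import requests

def download(url, filename, chunk_size=8192):
    """Stream `url` to `filename` and return the number of bytes written."""
    written = 0
    with requests.get(url, stream=True, timeout=30) as r:
        # Raise an exception on 4xx/5xx instead of silently saving an error page.
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

A return value of 0 would mean the server answered without a body, which usually points at an expired or rejected link rather than a problem with the file writing.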
You should use request.urlretrieve to directly save the URL to a file:
from urllib import request
url ='''https://ttdownloader.com/dl.php?v=YTo0OntzOjk6IndhdGVybWFyayI7YjowO3M6NzoidmlkZW9JZCI7czoxOToiNjkxMjEwNzYyNzY1MjY5NzM1MCI7czozOiJ1aWQiO3M6MzI6Ijk0MTdiOWE3NWU2MmE3MDQ1NjZhYzk0MzJjMThlY2VlIjtzOjQ6InRpbWUiO2k6MTYxMTQ5NzE1ODt9'''
request.urlretrieve(url, 'output.mp4')
However, this code gave me a urllib.error.HTTPError: HTTP Error 403: Forbidden error. It appears that this link is not publicly available without authentication.
Related
I have been given a pre-signed URL. When I try to fetch it with Python's requests library, I get this response:
"response"=>["<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>W9AYDBXYWC3SC3", "9F</RequestId><HostId>RzP63sVii2NOCFRNlPyP4/Iz/7sMn8rXJanhNfB8gnQmrDJmCQYk1igJ7TmcBpPubFH4I5QAAo4=</HostId></Error>"]}
But when I hit the same URL with Postman, it works fine. I don't know the reason behind it and would appreciate any help.
Here's how I'm accessing the URL:
import requests
file_response = requests.get(data.get('file_path'), stream=True)
I'm trying to use requests to download the content of some web pages which are in fact PDFs.
I've tried the following code, but the output that comes back does not seem to be properly decoded:
link= 'http://www.pdf995.com/samples/pdf.pdf'
import requests
r = requests.get(link)
r.text
The output looks like below:
'%PDF-1.3\n%�쏢\n30 0 obj\n<>\nstream\nx��}ݓ%�m���\x15S�%NU���M&O7�㛔]ql�����+Kr�+ْ%���/~\x00��=����{feY�T�\x05��\r�\x00�/���q�8�8�\x7f�\x7f�~����\x1f�ܷ�O�z�7�7�o\x1f����7�\'�{��\x7f<~��\x1e?����C�%\ByLշK����!_b^0o\x083�K\x0b\x0b�\x05z�E�S���?�~ �]rb\x10C�y�>_r�\x10�<�K��<��!>��(�\x17���~�.m��]2\x11��
etc
I was hoping to get the HTML. I also tried BeautifulSoup, but it does not decode it either. I hope someone can help. Thank you, BR
Yes; a PDF file is a binary file, not a text file, so you should use r.content instead of r.text to access the binary data.
PDF files are not easy to deal with programmatically; but you might (for example) save it to a file:
import requests
link = 'http://www.pdf995.com/samples/pdf.pdf'
r = requests.get(link)
with open('pdf.pdf', 'wb') as f:
    f.write(r.content)
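As a rough sketch of the bytes-versus-text point: a PDF starts with the bytes %PDF- (you can see '%PDF-1.3' at the start of the output in the question), so you can check the raw bytes before saving. The helper names `looks_like_pdf` and `save_if_pdf` are made up for this example.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if `data` starts with the PDF header bytes."""
    return data.startswith(b'%PDF-')

def save_if_pdf(data: bytes, filename: str) -> bool:
    """Write `data` to `filename` only if it looks like a PDF."""
    if not looks_like_pdf(data):
        return False
    # Binary mode: the bytes are written exactly as received.
    with open(filename, 'wb') as f:
        f.write(data)
    return True
```

With the code above this would be called as save_if_pdf(r.content, 'pdf.pdf'); passing r.text instead would fail the check, because r.text is a decoded str rather than bytes.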
I am trying to make a simple program that will help with the confusing part of rooting.
I need to download the file from tiny.cc/latestmagisk
I am using this python code
import requests
url = 'http://tiny.cc/latestmagisk'
r = requests.get(url)
r.content
The content it returns is the usual nginx 403 Forbidden page.
I need this to work with the shortened URL. Is there any way to make that happen?
It's not necessary to import the requests library.
All you need to do is import ssl and urllib, and pass ssl._create_unverified_context() as the context when you send the request.
Your code should look like this:
import ssl
import urllib.request
# Unverified context: skips TLS certificate checks, so use it only for hosts you trust.
certcontext = ssl._create_unverified_context()
# Fetch the image from the URL and save it as `image.jpg`.
with urllib.request.urlopen("https://i.stack.imgur.com/IKh7E.png", context=certcontext) as resp:
    with open('image.jpg', 'wb') as f:
        f.write(resp.read())
Note: this will save the image to the file image.jpg.
Contrary to the other answer, you really should use requests for this as requests has better support for redirects.
For getting a page through a redirect from requests:
r=requests.get(url, allow_redirects=True)
For downloading files through redirects:
r = requests.get(url, allow_redirects=True, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
However, in this case, either tiny.cc or XDA does not allow a simple requests.get. The 403 Forbidden is likely due to the User-Agent or another default header, since this method works well with bit.ly and other shortlink generators. You may need to fake headers.
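A minimal sketch of what faking headers could look like with requests, assuming the server only objects to the default User-Agent. The User-Agent string, the names `make_session` and `fetch`, and the timeout are placeholders of mine; whether tiny.cc accepts this is not something I have verified.

```python
import requests

# Hypothetical browser-like User-Agent; the exact value is an assumption.
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'

def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends `user_agent` on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session

def fetch(session, url, filename):
    """Follow redirects from `url` and stream the final response to `filename`."""
    with session.get(url, allow_redirects=True, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)
        return r.url  # final URL after any redirects
```

Using a Session keeps the header on every hop of the redirect chain, which matters here because the 403 could come from either the shortener or the final host.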
I am trying to use the requests library in Python to post the text content of a text file to a website, submit the text for analysis on said website, and pull the results back into Python. I have read through a number of responses here and on other websites, but have not yet figured out how to correctly adapt the code to a new website.
I'm familiar with Beautiful Soup, so pulling in webpage content and removing HTML isn't an issue; it's submitting the data that I don't understand.
My code currently is:
import requests
fileName = "texttoAnalyze.txt"
with open(fileName) as fileHandle:
    url_text = fileHandle.read()
url = "http://www.webpagefx.com/tools/read-able/"
payload = {'value': url_text}
r = requests.post(url, payload)
print(r.text)
This code comes back with the HTML of the website, but it hasn't recognized the fact that I'm trying to submit a form.
Any help is appreciated. Thanks so much.
You need to send the same request the website itself sends. You can usually find it with web debugging tools (like the Chrome or Firefox developer tools).
In this case the url the request is being sent to is: http://www.webpagefx.com/tools/read-able/check.php
With the following params: tab=Test+by+Direct+Link&directInput=SOME_RANDOM_TEXT
So your code should look like this:
url = "http://www.webpagefx.com/tools/read-able/check.php"
payload = {'directInput':url_text, 'tab': 'Test by Direct Link'}
r = requests.post(url, data=payload)
print(r.text)
Good luck!
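To compare what requests will send against what the developer tools show, you can build the request without sending it. This is only a sketch: SOME_RANDOM_TEXT stands in for the file contents, and nothing here contacts the site.

```python
import requests

url = "http://www.webpagefx.com/tools/read-able/check.php"
payload = {'tab': 'Test by Direct Link', 'directInput': 'SOME_RANDOM_TEXT'}

# Prepare the request without sending it, to inspect the encoded form body.
prepared = requests.Request('POST', url, data=payload).prepare()
print(prepared.body)
# tab=Test+by+Direct+Link&directInput=SOME_RANDOM_TEXT
print(prepared.headers['Content-Type'])
# application/x-www-form-urlencoded
```

If the printed body matches the params seen in the browser, the remaining differences are most likely in the headers or cookies rather than in the form data.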
There are two post parameters, tab and directInput:
import requests
post = "http://www.webpagefx.com/tools/read-able/check.php"
with open("in.txt") as f:
    data = {"tab": "Test by Direct Link",
            "directInput": f.read()}
r = requests.post(post, data=data)
print(r.content)
First post here, any help would be greatly appreciated :).
I'm trying to scrape from a website with an embedded pdf viewer. As far as I can tell, there is no way to directly download the PDF file.
The browser displays the PDF as multiple PNG image files. The problem is that the PNG files aren't directly accessible either: they are rendered from the original PDF and then displayed.
And the URL with the heading stripped out is in the codeblock.
The original URL of the PDF viewer (I'm using the second URL) and the link that renders the PDF are included in the code.
My strategy here is to pull the viewstate and eventvalidation values using urllib, then use wget to download all files from the site. This method does work without POST data (page 1). I am getting the rest of the parameters from Fiddler (a sniffing tool).
But when I use post data to specify the page, I get 405 errors like these when trying to download the image files. However, it downloads the actual html page without a problem, just none of the png files that go along with it. Here is an example of the wget errors.
HTTP request sent, awaiting response... 405 Method Not Allowed
2014-03-27 17:09:38 ERROR 405: Method Not Allowed.
Since I can't access the image file link directly, I thought grabbing the entire page with wget would be my best bet. If anyone knows some better alternatives, please let me know :). The post data seems to work at least partially since the downloaded html file is set to the page I specified in parameters.
According to Fiddler, the site automatically makes a GET request for the image file. I'm not quite sure how to emulate this, however.
Any help is appreciated, thanks for your time!
imglink = 'http://201.150.36.178/consultaexpedientes/render/2132495e-863c-4b96-8135-ea7357ff41511.png'
origurl = 'http://201.150.36.178/consultaexpedientes/sistemas/boletines/wfBoletinVisor.aspx?tomo=1&numero=9760&fecha=14/03/2014%2012:40:00'
url = 'http://201.150.36.178/consultaexpedientes/usercontrol/Default.aspx?name=e%3a%5cBoletinesPdf%5c2014%5c3%5cBol_9760.pdf%7c0'
f = urllib2.urlopen(url)
html = f.read()
soup = BeautifulSoup(html)
eventargs = soup.findAll(attrs={'type':'hidden'})
reValue = re.compile(r'value=\"(.*)\"', re.DOTALL)
viewstate = re.findall(reValue, str(eventargs[0]))[0]
validation = re.findall(reValue, str(eventargs[1]))[0]
params = urllib.urlencode({'__VIEWSTATE': viewstate,
                           '__EVENTVALIDATION': validation,
                           'PDFViewer1$PageNumberTextBox': 6,
                           'PDFViewer1_BookmarkPanelScrollX': 0,
                           'PDFViewer1_BookmarkPanelScrollY': 0,
                           'PDFViewer1_ImagePanelScrollX': 0,
                           'PDFViewer1_ImagePanelScrollY': 0,
                           'PDFViewer1$HiddenPageNumber': 6,
                           'PDFViewer1$HiddenAplicaMarcaAgua': 0,
                           'PDFViewer1$HiddenBrowserWidth': 1920,
                           'PDFViewer1$HiddenBrowserHeight': 670,
                           'PDFViewer1$HiddenPageNav': ''})
command = '/usr/bin/wget -E -H -k -K -p --post-data=\"%s' % params + '\" ' + url
print command
os.system(command)
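One way to emulate the POST followed by a separate GET for the image might be a single HTTP session instead of wget. This is only a sketch under assumptions: it guesses that the 405 errors come from wget reusing POST for the image requests, it swaps the regex for the standard-library HTML parser, and it takes the image URL as a parameter because how the rendered PNG name is obtained is not shown in the post. The names `HiddenFields`, `hidden_fields` and `fetch_page_image` are hypothetical, and `session` is expected to be a requests.Session().

```python
from html.parser import HTMLParser

class HiddenFields(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == 'input' and (a.get('type') or '').lower() == 'hidden' and a.get('name'):
            self.fields[a['name']] = a.get('value') or ''

def hidden_fields(html):
    """Return a dict of all hidden form fields found in `html`."""
    parser = HiddenFields()
    parser.feed(html)
    return parser.fields

def fetch_page_image(session, viewer_url, image_url, page, filename):
    """POST the page number to the viewer, then GET the image in the same session."""
    # Start from the hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...) of the form.
    form = hidden_fields(session.get(viewer_url, timeout=30).text)
    form['PDFViewer1$PageNumberTextBox'] = str(page)
    form['PDFViewer1$HiddenPageNumber'] = str(page)
    session.post(viewer_url, data=form, timeout=30).raise_for_status()
    # The image is requested with GET, as Fiddler shows the browser doing.
    img = session.get(image_url, timeout=30)
    img.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(img.content)
```

With the variables from the question this would be called roughly as fetch_page_image(requests.Session(), url, imglink, 6, 'page6.png'). The session carries any cookies from the POST over to the image request, which may or may not be what the server needs.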