Python - Codec Issue with the video file downloaded - python

I am trying to download a video that has been uploaded to the cloud, and I am using APIs to extract the data.
The Python script seems to download the file fine, but when I open the video it throws this error:
I have tried different players (VLC, Windows Media Player, etc.) but have had no luck. Can someone please help?
if res.status_code == 200:
    body = res.json()
    for meeting in body["meetings"]:
        try:
            password = requests.get(
                f"{root}meetings/{meeting['uuid']}/recordings/settings?access_token={token}").json()["password"]
            url = f"https://api.zoom.us/v2/meetings/{meeting['uuid']}/recordings/settings?access_token={token}"
            res = requests.patch(
                url,
                data=json.dumps({"password": ""}),
                headers=sess_headers)
        except:
            pass
        topic = meeting["topic"]
        try:
            os.makedirs("downloads")
        except:
            pass
        for i, recording in enumerate(meeting["recording_files"]):
            #os.makedirs(topic)
            download_url = recording["download_url"]
            name = recording["recording_start"] + \
                "-" + meeting["topic"]
            ext = recording["file_type"]
            filename = f"{name}.{ext}"
            path = f'./downloads/{filename}'.replace(":", ".")
            res = requests.get(download_url, headers=sess_headers)
            with open(Path(path), 'wb') as f:
                f.write(res.content)
else:
    print(res.text)

One possible problem is the following.
After each res = requests.get(...) call, add the line res.raise_for_status().
This checks that the request succeeded.
By default requests does not raise anything when the status code signals an error, so res.content may hold an error response body rather than the video.
With res.raise_for_status(), requests raises an HTTPError for any 4xx or 5xx status code, which saves you from silently writing an error body to disk.
A status code of 200 does not guarantee that there was no error, though. Some servers respond with status code 200 and an HTML page describing the error.
Another possible problem is that the download URL does not carry the authorization token, in which case you need to provide it through the headers. So replace the last requests.get(...) with the following code:
res = requests.get(download_url, headers = {
**sess_headers, 'Authorization': 'Bearer ' + token})
Also check what content type the response has. After the last res = requests.get(...), add:
print('headers:', res.headers)
and check what is in there. Look specifically at the Content-Type field: it should be a binary type such as application/octet-stream or video/mp4, not a text format such as application/json or text/html, because a text response is not a video file. If it is text/html, try renaming the file to test.html and opening it in a browser; the server has probably responded with an error message inside that HTML.
Also compare the two files in some viewer: the one downloaded by the script and one downloaded by some other downloader (e.g. a browser). There may be an obvious difference.
The file size should also be fairly large for a video. If it is only around 50 KB, it probably contains bad data.
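The header and size checks above can be folded into a small helper. This is only a sketch: the name looks_like_video and the 100 KB threshold are assumptions chosen for illustration, not part of requests or of the API being called.

```python
def looks_like_video(headers, body_size, min_size=100_000):
    """Heuristic: True if a response looks like binary video data.

    headers   -- mapping of response headers (e.g. res.headers from requests)
    body_size -- number of bytes received (e.g. len(res.content))
    min_size  -- assumed lower bound for a real recording; tune as needed
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    # Text-like types usually mean the server sent an error page or a JSON error.
    is_text = content_type.startswith("text/") or content_type in (
        "application/json",
        "application/xml",
    )
    return not is_text and body_size >= min_size
```

In the download loop this could be called as looks_like_video(res.headers, len(res.content)) right after res.raise_for_status(), skipping the file write when it returns False.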
UPDATE:
The solution that finally worked was replacing the last requests.get(...) with this line:
res = requests.get(download_url + '?access_token=' + token, headers=sess_headers)

Related

How to make Python go through URLs in a text file, check their status codes, and exclude all ones with 404 error?

I tried the following script, but unfortunately the output file is identical to the input file. I'm not sure what's wrong with it.
import requests
url_lines = open('banana1.txt').read().splitlines()
remove_from_urls = []
for url in url_lines:
    remove_url = requests.get(url)
    print(remove_url.status_code)
    if remove_url.status_code == 404:
        remove_from_urls.append(url)
        continue
url_lines = [url for url in url_lines if url not in remove_from_urls]
print(url_lines)
# Save urls example
with open('banana2.txt', 'w+') as file:
    for item in url_lines:
        file.write(item + '\n')
There seems to be no error in your code, but there are a few things that would make it more readable and consistent. The first step should be to make sure that at least one URL actually returns a 404 status code.
Edit: After providing the actual URL.
The 404 problem
In your case, the problem is that Twitter does not actually return a 404 error for your "Not found" URL. You can test it using curl:
$ curl -o /dev/null -w "%{http_code}" "https://twitter.com/davemeltzerWON/status/1321279214365016064"
200
Or using Python:
import requests
response = requests.get("https://twitter.com/davemeltzerWON/status/1321279214365016064")
print(response.status_code)
The output for both should be 200.
Since Twitter is a JavaScript application that loads its content in the browser after the initial page has been delivered, you cannot find the information you are looking for in the HTML response. You would need something like Selenium to execute the JavaScript for you, and then you could look for actual text like "not found" on the rendered page.
Code review
Please make sure to close the file properly. A file object is also an iterator over its lines, so you can convert it to a list or a set very easily. Using a Python set makes the code more readable, although a set does not preserve the original order of the lines. Each line still ends with its newline character, so strip it while reading:
with open("banana1.txt") as fid:
    url_lines = {line.strip() for line in fid if line.strip()}
Then you simply remove all the links that do not work:
not_working = set()
for url in url_lines:
    if requests.get(url).status_code == 404:
        not_working.add(url)
working = url_lines - not_working
with open("banana2.txt", "w") as fid:
    fid.write("\n".join(working))
Also, if some of the links point to the same server, you should make use of the requests.Session class:
from requests import Session
session = Session()
Then replace requests.get with session.get. You should get a performance boost, since a Session reuses the underlying connection (keep-alive) for requests to the same host.
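The set-based filtering can be sketched as one small function. The name split_by_status and the get_status callable are hypothetical; passing the status lookup in as a parameter keeps the function independent of how the request is made.

```python
def split_by_status(urls, get_status, bad_status=404):
    """Return (working, not_working) sets of URLs.

    get_status is any callable that maps a URL to an HTTP status code.
    """
    # Strip newlines and skip blank lines so file lines can be passed directly.
    cleaned = {u.strip() for u in urls if u.strip()}
    not_working = {u for u in cleaned if get_status(u) == bad_status}
    return cleaned - not_working, not_working
```

With requests, get_status could be lambda url: session.get(url).status_code, using a Session created as shown above.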

(Python) Bittrex API v3 keeps returning invalid content hash

Writing a bot for a personal project, and the Bittrex api refuses to validate my content hash. I've tried everything I can think of and all the suggestions from similar questions, but nothing has worked so far. Tried hashing 'None', tried a blank string, tried the currency symbol, tried the whole uri, tried the command & balance, tried a few other things that also didn't work. Reformatted the request a few times (bytes/string/dict), still nothing.
The documentation says to hash the request body (which seems synonymous with payload in similar questions about making transactions through the API), but this is a simple GET / check-balance request with no payload.
Problem is, I get a 'BITTREX ERROR: INVALID CONTENT HASH' response when I run it.
Any help would be greatly appreciated, this feels like a simple problem but it's been frustrating the hell out of me. I am very new to python, but the rest of the bot went very well, which makes it extra frustrating that I can't hook it up to my account :/
import hashlib
import hmac
import json
import os
import time
import requests
import sys
# Base Variables
Base_Url = 'https://api.bittrex.com/v3'
APIkey = os.environ.get('B_Key')
secret = os.environ.get('S_B_Key')
timestamp = str(int(time.time() * 1000))
command = 'balances'
method = 'GET'
currency = 'USD'
uri = Base_Url + '/' + command + '/' + currency
payload = ''
print(payload) # Payload Check
# Hashes Payload
content = json.dumps(payload, separators=(',', ':'))
content_hash = hashlib.sha512(bytes(json.dumps(content), "utf-8")).hexdigest()
print(content_hash)
# Presign
presign = (timestamp + uri + method + str(content_hash) + '')
print(presign)
# Create Signature
message = f'{timestamp}{uri}{method}{content_hash}'
sign = hmac.new(secret.encode('utf-8'), message.encode('utf-8'),
                hashlib.sha512).hexdigest()
print(sign)
headers = {
    'Api-Key': APIkey,
    'Api-Timestamp': timestamp,
    'Api-Signature': sign,
    'Api-Content-Hash': content_hash
}
print(headers)
req = requests.get(uri, json=payload, headers=headers)
tracker_1 = "Tracker 1: Response =" + str(req)
print(tracker_1)
res = req.json()
if req.ok is False:
    print('bullshit error #1')
    print("Bittex response: %s" % res['code'], file=sys.stderr)
I can see two main problems:
1. You are serialising/encoding the payload separately for the hash (with json.dumps and then bytes) and for the request (with the json=payload parameter to requests.get). You have no way of knowing how the requests library will format your data, and if even one byte is different you will get a different hash. It is better to convert your data to bytes first, and then use the same bytes for both the hash and the request body.
2. GET requests do not normally have a body (see this answer for more details), so it might be that the API is ignoring the payload you are sending. You should check the API docs to see whether you really need to send a request body with GET requests.
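A minimal sketch of the "same bytes for hash and body" idea follows. Note that in the question's code json.dumps('') produces the two-character string '""', so the value being hashed is not empty. The sketch assumes the API expects the SHA-512 of an empty body when there is no payload, which should be confirmed against the API documentation; the function name sign_request is hypothetical.

```python
import hashlib
import hmac
import json


def sign_request(secret, timestamp, uri, method, payload=None):
    """Return (body, content_hash, signature) built from one byte string."""
    # Serialise once; these exact bytes are both hashed and sent.
    if payload is None:
        body = b""  # assumption: no payload means hashing an empty body
    else:
        body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    content_hash = hashlib.sha512(body).hexdigest()
    # Same pre-sign layout as in the question: timestamp + uri + method + hash.
    presign = f"{timestamp}{uri}{method}{content_hash}"
    signature = hmac.new(
        secret.encode("utf-8"), presign.encode("utf-8"), hashlib.sha512
    ).hexdigest()
    return body, content_hash, signature
```

For a GET with no payload the request would then be sent without any body (no json= or data= argument), and for a request with a payload the returned body would be passed as data=body so that the bytes on the wire match the hash.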

Python Request to send a file with a specific part

I'm trying to send a file via the requests library, but the receiver requires a part with a designated name (the receiver's terminology). I have something like this so far:
filePath = os.path.join( GetDownloadFolder(), fileName )
files = {'upload': open( str( filePath ),'rb')}
response = requests.post( url, headers=header, files=files, verify=False )
GetDownloadFolder() simply gets the location where the file is. Header contains the account info and content type. The code above talks to the server and no longer complains that the file cannot be found. I get an error back from the server that a part with a specific name must exist. I tried using the data=values parameters with:
values = {'upload': ''}
That unfortunately didn't solve the issue. Any ideas would be appreciated.
Oh my... After a bit of debugging I figured it out. The receiver was returning a misleading error. I had set the content type myself as Content-Type: multipart/form-data.
When I sent the file I got back an error stating that I was missing a named part. I removed my own Content-Type setting and requests filled out the header like this instead:
Content-Type: multipart/form-data; boundary=3645c8b2b8f74e1a8db8a85c54225964
At that point the receiver accepted the data. So the boundary is important: it is the delimiter string that separates the parts in the request body, and without it in the header the server cannot find the named part.
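To see why the boundary matters, here is a rough illustration of how a single-file multipart body is laid out. requests builds this for you when you pass files=, so this hand-rolled build_multipart helper is hypothetical and for illustration only.

```python
import uuid


def build_multipart(field_name, filename, data, boundary=None):
    """Return (content_type, body) for a one-part multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    # Each part starts with "--<boundary>" followed by its own headers.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # The body ends with "--<boundary>--".
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The header must announce the same boundary that the body uses.
    return f"multipart/form-data; boundary={boundary}", head + data + tail
```

A hand-written Content-Type: multipart/form-data header has no boundary parameter, so the server cannot tell where the part named "upload" starts and ends.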

How to upload a binary/video file using Python http.client PUT method?

I am communicating with an API using http.client in Python 3.6.2.
Uploading a file requires a three-stage process.
I have managed to talk to the API successfully using POST methods, and the server returns data as I expect.
However, the stage that requires the actual file to be uploaded is a PUT method, and I cannot figure out how to write the code so that it includes a pointer to the actual file on my storage. The file is an mp4 video file.
Here is a snippet of the code with my noob annotations :)
#define connection as HTTPS and define URL
uploadstep2 = http.client.HTTPSConnection("grabyo-prod.s3-accelerate.amazonaws.com")
#define headers
headers = {
'accept': "application/json",
'content-type': "application/x-www-form-urlencoded"
}
#define the structure of the request and send it.
#Here it is a PUT request to the unique URL as defined above with the correct file and headers.
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
#get the response from the server
uploadstep2response = uploadstep2.getresponse()
#read the data from the response and put to a usable variable
step2responsedata = uploadstep2response.read()
The response I am getting back at this stage is an
"Error 400 Bad Request - Could not obtain the file information."
I am certain this relates to the body="C:\Test.mp4" section of the code.
Can you please advise how I can correctly reference a file within the PUT method?
Thanks in advance
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
will put the actual string "C:\Test.mp4" in the body of your request, not the content of the file named "C:\Test.mp4" as you expect.
You need to open the file in binary mode and pass its content as the body. Reading the whole file into memory first works, but since your file is a video it is potentially huge and would use plenty of RAM for no good reason, so it is better to stream it. http.client does accept an open file object as body, but you may then have to set headers such as Content-Length yourself.
My suggestion would be to use requests, which is a much more convenient library for this kind of thing:
import requests
with open(r'C:\Test.mp4', 'rb') as finput:
    response = requests.put('https://grabyo-prod.s3-accelerate.amazonaws.com/youruploadpath', data=finput)
print(response.json())
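For completeness: http.client's request() is documented to accept an open file object as body, so a streamed upload may also be possible without requests. The sketch below assumes the server wants an explicit Content-Length and a video/mp4 content type; both are assumptions, and put_file is a hypothetical helper name.

```python
import os


def put_file(conn, url, path, content_type="video/mp4"):
    """Stream a file as the body of a PUT request.

    conn is an http.client.HTTPSConnection (or any object with the same
    request()/getresponse() methods).
    """
    headers = {
        "Content-Type": content_type,
        # Setting the length up front avoids chunked transfer encoding.
        "Content-Length": str(os.path.getsize(path)),
    }
    with open(path, "rb") as f:
        # The file object is read and sent in blocks, not loaded whole.
        conn.request("PUT", url, body=f, headers=headers)
        return conn.getresponse()
```

With the connection from the question this would be called as put_file(uploadstep2, myUniqueUploadUrl, r"C:\Test.mp4").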
I do not know if it is useful for you, but you can try sending a POST request with the requests module:
import requests
url = ""
data = {'title': 'metadata', 'timeDuration': 120}
# Open the file in a with block so it is closed after the upload
with open('/path/your_file.mp3', 'rb') as mp3_f:
    files = {'messageFile': mp3_f}
    # Send the extra fields with data=; json= is not sent together with files=
    req = requests.post(url, files=files, data=data)
print(req.status_code)
print(req.content)
Hope it helps.

Limit return size of GAE url_fetch get method?

I'm trying to grab the ID3 info out of MP3 files stored online without downloading the whole file. From a lot of googling, the best method seems to be to grab the first couple of KB of the file and read the tags from that. Is there a way in Google's App Engine (Python) to get just the start of a file from its URL?
Something like
rpc.size_limit = 4096
rpc = urlfetch.create_rpc(deadline=10.0)
urlfetch.make_fetch_call(rpc, url, method=method, headers=headers,
payload=payload, allow_truncated=True)
return rpc
Thanks for any help in advance.
Found it! If the website accepts it, you can just put a Range header in the request, as follows (byte ranges are inclusive, so this asks for the first 4096 bytes):
headers["Range"] = "bytes=0-4095"
Or you can use something like the following if the website doesn't like the Range header (so far the few I've tried all have):
import urllib2
host = 'http://www.wikipedia.org/somepath/tosome/file.mp3'
req = urllib2.Request(host, headers={'User-Agent' : "Magic Browser"})
response = urllib2.urlopen(req).read(4*1024)
Hopefully this saves some time to someone in the future!
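For readers on Python 3, the same two ideas can be sketched with the standard library's urllib.request rather than App Engine's urlfetch. The function names range_header and fetch_start are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


def range_header(num_bytes):
    """Build a Range header asking for the first num_bytes bytes."""
    # Byte ranges are inclusive, so the last index is num_bytes - 1.
    return {"Range": f"bytes=0-{num_bytes - 1}"}


def fetch_start(url, num_bytes=4096):
    """Fetch at most num_bytes from the start of url."""
    req = urllib.request.Request(url, headers=range_header(num_bytes))
    with urllib.request.urlopen(req, timeout=10) as response:
        # read(n) caps the result even if the server ignores Range.
        return response.read(num_bytes)
```

Combining the header with a capped read covers both cases: servers that honour Range send only the requested bytes, and for the others the read stops after num_bytes.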
