Trying to upload a .wav file to a bucket using python-requests - python

I'm trying to upload a .wav file (let's say test.wav) to my Google Cloud Storage bucket, but I'm running into some problems: a storage object gets uploaded with the appropriate 'test.wav' name, but inside it is just the data from my request. Also, the contentType in the bucket is displayed as application/x-www-form-urlencoded.
My bucket has public read/write/delete permissions, and uploading other file types works fine. Uploading this way also works fine through Postman.
url = "https://www.googleapis.com/upload/storage/v1/b/<bucket_name>/o"
payload = {
    "acl": "public-read",
    "file": open('test.wav'),
    "signature": signature,
    "policy": policy
}
headers = {"contentType": "audio/wav"}
params = {"uploadType": "media", "name": "test.wav"}
response = requests.post(url, data=payload, headers=headers, params=params)
print(response.text)
Currently I get the following error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4420: character maps to <undefined>
I've also tried scipy.io.wavfile.read() and pydub.AudioSegment(), but these don't work for me either.
Ideally, I'd want the file to be uploaded successfully and be usable for transcription through Google's STT.
Thanks and regards.

I found a workaround to this problem. Instead of using the requests module, I've switched over to using the method shown here.
Doing this uploads the file properly, instead of uploading the request data into an object with a .wav extension, which fixes my issue.

You're specifying the uploadType parameter to be media. That means that the body of the request is the body of the object you're uploading. However, you're passing a dictionary with fields like "acl", "signature", and "file" as the data of your POST, so Requests form-encodes it. It looks like you may be attempting a sort of form-style POST, but that's not how media uploads work.
Here's how you'd use Requests to do a media-style upload to GCS:
import requests
url = "https://www.googleapis.com/upload/storage/v1/b/<bucket_name>/o"
headers = {
    "acl": "public-read",
    "Authorization": "Bearer ...",
    "Content-Type": "audio/wav",
}
params = {"uploadType": "media", "name": "test.wav"}
# Open the file in binary mode so the raw bytes become the request body
with open('test.wav', 'rb') as file:
    r = requests.post(url, params=params, headers=headers, data=file)
print(r.text)
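The UnicodeDecodeError in the question is consistent with open('test.wav') opening the file in text mode, so Python tries to decode the audio bytes as text before Requests ever sends them. Here is a minimal sketch of that difference, assuming a Windows-style default encoding of cp1252 (the 'charmap' in the message suggests a codec of this kind). The sample bytes and the temporary file are made up for illustration:

```python
import os
import tempfile

# Hypothetical stand-in for test.wav: a few header-like bytes plus 0x81,
# which has no mapping in cp1252.
sample = b"RIFF\x24\x00\x00\x00WAVE\x81\x00"

with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Text mode: reading decodes the bytes and fails on 0x81.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: the bytes come back untouched, ready to use as a request body.
    with open(path, "rb") as f:
        binary_data = f.read()
finally:
    os.remove(path)

print(type(text_mode_error).__name__)
print(binary_data == sample)
```

Opening with 'rb', as the answer's snippet does, avoids the decoding step entirely.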

Related

Updating Binary File on Github using Contents API

After successfully updating a plain text file using the GitHub Repository Contents API, I tried to do the same thing with an Excel file. I understand that git isn't really designed to store binaries; however, this is what my client needs.
Here are the relevant lines of Python code:
# Get the XLSX file from the repo to get its SHA
g = GitHub(my_admin_token)
repo = g.get_repo("theowner/therepo")
contents = repo.get_contents("myfile.xlsx", ref="main")
sha = contents.sha
# So far, so good. We have the SHA.
# Read the bytes we want to use to replace the contents of the file
data = open('my_new_file.xlsx', 'rb').read()
base64_encoded_data = base64.b64encode(data)
# Update the XLSX file in the repo with the new bytes
result = repo.update_file(contents.path, "auto-committed", base64_encoded_data,
                          sha, branch="main")
print("Result of update_file:")
print(result)
# Result: {'commit': Commit(sha="88f46eb99ce6c1d7d7d287fb8913a7f92f6faeb2"), 'content': ContentFile(path="myfile.xlsx")}
Now, you'd think everything went well; however, when I go to GitHub and look at the file, it's a mass of Base64 encoded data. It somehow "loses the fact that it's an Excel file" in the translation. When I click on the file in the GitHub user interface, and I have the option to Download the file, I get the "big blob" of Base64 text vs. having the XLSX file download.
There doesn't seem to be a way to tell the API what encoding I want to use, e.g., there doesn't seem to be a way to set HTTP headers on the call.
I also tried using the Python requests library to PUT (per doc) to the GitHub API:
result = requests.put('https://api.github.com/repos/myname/myrepo/contents/myfile.xlsx', {
    "headers": {
        "Accept": "application/vnd.github.VERSION.raw",
        "Authorization": "token my_admin_token"
    },
    "committer": {'name':'My Name', 'email':'me#mymail.com'},
    "message": "Did it work?",
    "branch": "main",
    "content": base64_encoded_data})
and I get an HTTP 404.
I tried playing with the Accept header types as well. No dice.
Various other issues trying this with curl.
If you have a working sample of updating/replacing an XLSX file on GitHub using curl, python, etc. I'd love to see it! Thanks.
Uploading a binary file to GitHub is very much possible, both via git and via the GitHub API.
The following Python snippet works as expected and uploads an Excel file to a test repository at https://github.com/recycle-bin/github-playground/tree/main/hello . I'm able to download the Excel file as expected, too.
import base64
import datetime
import os
import requests
github_token = os.environ["GITHUB_API_TOKEN"]
repository = "recycle-bin/github-playground"
xlsx_file_path = "workbook.xlsx"
def upload_file_to_github(source_file_path: str, destination_path: str):
    headers = {
        "content-type": "application/json",
        "authorization": f"token {github_token}",
        "accept": "application/vnd.github+json",
    }
    # Read the file as bytes and Base64-encode it exactly once
    with open(source_file_path, "rb") as source_file:
        encoded_string = base64.b64encode(source_file.read()).decode("utf-8")
    # To replace an existing file, the payload must also carry its current blob "sha"
    payload = {
        "message": f"Uploaded file at {datetime.datetime.utcnow().isoformat()}",
        "content": encoded_string,
    }
    requests.put(
        f"https://api.github.com/repos/{repository}/contents/{destination_path}",
        json=payload,
        headers=headers,
    )
def main():
    upload_file_to_github(xlsx_file_path, "hello/workbook.xlsx")
if __name__ == "__main__":
    main()
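Your 404 could be due to one of the following:
The repository does not exist
The branch does not exist
The request is not authenticated: in your requests.put call the headers dictionary is part of the request body instead of being passed as headers=, and GitHub answers 404 for a private repository when no valid token is sent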
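One likely explanation for the Base64 blob in the question, offered as an assumption rather than a confirmed diagnosis: PyGithub's update_file appears to Base64-encode the content it is given, so content that the caller has already encoded ends up encoded twice, and GitHub stores the inner Base64 text as the file. The sketch below only simulates that with the standard library. simulate_contents_api is a made-up helper, not a PyGithub or GitHub function:

```python
import base64

def simulate_contents_api(content: bytes) -> bytes:
    """Hypothetical stand-in: encode for transport, then decode as the server would."""
    wire = base64.b64encode(content)   # what the client library sends
    return base64.b64decode(wire)      # what ends up stored in the repository

# XLSX files are ZIP archives, so they start with b"PK"; the rest is dummy data.
raw = b"PK\x03\x04" + bytes(range(16))

stored_if_pre_encoded = simulate_contents_api(base64.b64encode(raw))
stored_if_raw = simulate_contents_api(raw)

print(stored_if_pre_encoded == base64.b64encode(raw))  # Base64 text was stored
print(stored_if_raw == raw)                            # original bytes were stored
```

If that assumption holds, passing the raw bytes read from the file to update_file, without calling base64.b64encode first, would be the thing to try.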

S3 object returns octet-stream but was uploaded as png

I have this existing piece of code that is used to upload files to my s3 bucket.
def get_user_upload_url(customer_id, filename, content_type):
    s3_client = boto3.client('s3')
    object_name = "userfiles/uploads/{}/{}".format(customer_id, filename)
    try:
        url = s3_client.generate_presigned_url('put_object',
                                               Params={'Bucket': BUCKET,
                                                       'Key': object_name,
                                                       "ContentType": content_type  # set to "image/png"
                                                       },
                                               ExpiresIn=100)
    except Exception as e:
        print(e)
        return None
    return url
This returns a presigned URL to my client, which I use to upload my files without an issue. I have added a new use of it where I'm uploading a PNG, and I have a behave test that uploads to the presigned URL just fine. The problem is that if I look at the file in S3, I can't preview it. If I download it, it won't open either. The S3 web client shows it has Content-Type image/png. I visually compared the binary of the original file and the downloaded file, and I can see differences. A file type tool detects that it is an octet-stream.
signature_file_name = "signature.png"
with open("features/steps/{}".format(signature_file_name), 'rb') as f:
    files = {'file': (signature_file_name, f)}
    headers = {
        'Content-Type': "image/png"  # without this or with a different value the presigned url will error with a signatureDoesNotMatch
    }
    context.upload_signature_response = requests.put(response, files=files, headers=headers)
I would have expected to get a PNG back instead of an octet-stream, but I'm not sure what I have done wrong. Googling this generally turns up people having a problem with the signature because they're not properly setting or passing the content type, and I feel like I've effectively done that here, as shown by the fact that everything fails if I change the content type. I'm guessing that there's something wrong with the way I'm uploading the file, or maybe with how I'm reading the file for the upload?
So it has to do with how I'm uploading. It works if I upload like this instead:
context.upload_signature_response = requests.put(response, data=open("features/steps/{}".format(signature_file_name), 'rb'), headers=headers)
So this must have to do with the use of put_object. It expects the request body to be the file itself, with the declared content type. Passing data= does exactly that, whereas files= makes requests wrap the file in a multipart/form-data body, so the stored object contains the multipart boundary and part headers around the PNG bytes. So I think it's safe to say that a multipart/form-data upload is not compatible with a presigned URL for put_object.
I'm still piecing it all together, so feel free to fill in the blanks.
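To see the difference between the two calls without touching S3, the request can be prepared locally and inspected. This is only a sketch: the URL is a placeholder, the PNG bytes are fake, and nothing is sent over the network:

```python
import requests

# Placeholder URL and fake PNG bytes; nothing is sent over the network.
url = "https://example.com/presigned-upload"
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
headers = {"Content-Type": "image/png"}

as_files = requests.Request(
    "PUT", url, files={"file": ("signature.png", png_bytes)}, headers=headers
).prepare()
as_data = requests.Request("PUT", url, data=png_bytes, headers=headers).prepare()

# files= wraps the bytes in a multipart/form-data envelope ...
print(as_files.body[:40])
# ... while data= sends exactly the file bytes.
print(as_data.body == png_bytes)
# The explicit header is kept in both cases, which would explain why S3
# still labels the damaged object as image/png.
print(as_files.headers["Content-Type"], as_data.headers["Content-Type"])
```

Because the explicit Content-Type header matches the signature in both cases, the presigned URL accepts either request, and only the stored bytes differ.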

Python sending POST requests/ multipart/form-data

I'm working on an API connection at work. I have already made some GET and PUT requests, but now I have a problem with POST. The API documentation is here. And here is the code I tested, which gets a 400 Bad Request:
import requests
files = {'files': ('fv.pdf', open(r"C:\python\API\fv.pdf", 'rb'))}
data = {"order_documents":[{'file_name':"fv.pdf", 'type_code':'CUSTOMER_INVOICE' }]}
header = {
'Authorization': '###########################',
}
response = requests.post("https://######.com/api/orders/40100476277994-A/documents", headers=header, files = files, data = data)
print(response.status_code)
print(response.url)
Does anyone have an idea how I can handle this?
It looks like you are missing the order_documents parameter. It needs to be an array, and it also needs to be called order_documents.
Try changing your data variable into:
data = {"order_documents": [ {'file_name':"fv.pdf", 'type_code':'CUSTOMER_INVOICE' } ] }
The API expects files as the parameter name, and your dictionary sends file to the server. The parameter name files that you give to session.post is just for the requests library and is not the actual parameter name sent to the server.
The API also expects multiple files in an array, so you need to change your files object.
files = [
    ('files', ('fv.pdf', open(r"C:\python\API\fv.pdf", 'rb'))),
]
Also, I don't think you need to use requests.Session(), just use requests.post(), unless you're planning on using the session object multiple times for subsequent requests.
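It may also help to look at what requests actually puts on the wire for a nested data value. In a multipart body each form field is flattened to a string, so the inner dictionary is sent as its Python repr rather than as JSON. The sketch below prepares the request without sending it. Serializing the field with json.dumps is only an assumption about what this API expects, and the URL and file bytes are placeholders:

```python
import json
import requests

url = "https://example.com/api/orders/123/documents"  # placeholder endpoint
files = [("files", ("fv.pdf", b"%PDF-1.4 dummy", "application/pdf"))]
documents = [{"file_name": "fv.pdf", "type_code": "CUSTOMER_INVOICE"}]

# Nested structure passed directly: each dict is converted with str().
naive = requests.Request(
    "POST", url, files=files, data={"order_documents": documents}
).prepare()

# Same structure serialized explicitly (assumed format for this API).
explicit = requests.Request(
    "POST", url, files=files, data={"order_documents": json.dumps(documents)}
).prepare()

print(b"{'file_name': 'fv.pdf'" in naive.body)
print(b'{"file_name": "fv.pdf"' in explicit.body)
```

Which of the two shapes the server wants depends on the API documentation, so treat the second variant as something to try rather than a confirmed fix.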

How to upload a binary/video file using Python http.client PUT method?

I am communicating with an API using http.client in Python 3.6.2.
Uploading a file requires a three-stage process.
I have managed to talk to the API successfully using POST methods, and the server returns data as I expect.
However, the stage that requires the actual file to be uploaded is a PUT method, and I cannot figure out how to write the code so that it points to the actual file on my storage. The file is an mp4 video file.
Here is a snippet of the code with my noob annotations :)
#define connection as HTTPS and define URL
uploadstep2 = http.client.HTTPSConnection("grabyo-prod.s3-accelerate.amazonaws.com")
#define headers
headers = {
    'accept': "application/json",
    'content-type': "application/x-www-form-urlencoded"
}
#define the structure of the request and send it.
#Here it is a PUT request to the unique URL as defined above with the correct file and headers.
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
#get the response from the server
uploadstep2response = uploadstep2.getresponse()
#read the data from the response and put to a usable variable
step2responsedata = uploadstep2response.read()
The response I am getting back at this stage is an
"Error 400 Bad Request - Could not obtain the file information."
I am certain this relates to the body="C:\Test.mp4" section of the code.
Can you please advise how I can correctly reference a file within the PUT method?
Thanks in advance
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
will put the actual string "C:\Test.mp4" in the body of your request, not the content of the file named "C:\Test.mp4" as you expect.
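You need to open the file in binary mode and pass its content as the body. Reading the whole file into memory first works, but since your file is a video, it is potentially huge and would use plenty of RAM for no good reason. http.client can also take the open file object itself as the body and send it block by block; in that case set Content-Length yourself, because from Python 3.6 on a file body without it is sent chunk-encoded.
My suggestion would be to use requests, which is a much better library for this kind of thing:
import requests
# Passing the open binary file as data= lets requests stream it from disk
with open(r'C:\Test.mp4', 'rb') as finput:
    response = requests.put('https://grabyo-prod.s3-accelerate.amazonaws.com/youruploadpath', data=finput)
# The endpoint may not answer with JSON, so inspect the status and raw text
print(response.status_code)
print(response.text)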
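For completeness, here is a sketch of the same upload done with http.client alone, passing the open file object as the body. It runs against a throwaway local server so it can be tried offline. UploadHandler, the /upload path and the random payload are all made up for illustration, and the real endpoint would need http.client.HTTPSConnection and the unique upload URL instead:

```python
import http.client
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = {}

class UploadHandler(BaseHTTPRequestHandler):
    """Throwaway local server standing in for the real upload endpoint."""

    def do_PUT(self):
        length = int(self.headers["Content-Length"])
        received["body"] = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), UploadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Dummy payload written to a temporary file in place of the real video.
payload = os.urandom(100_000)
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
with open(path, "rb") as f:
    headers = {
        "Content-Type": "video/mp4",
        # Without this, Python 3.6+ sends a file body chunk-encoded.
        "Content-Length": str(os.path.getsize(path)),
    }
    # The file object is read and sent in blocks, not loaded into memory at once.
    conn.request("PUT", "/upload", body=f, headers=headers)
    response = conn.getresponse()
    response.read()
    status = response.status

conn.close()
server.shutdown()
server.server_close()
os.remove(path)
print(status, received["body"] == payload)
```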
I do not know if it is useful for you, but you can try sending a POST request with the requests module:
import requests
url = ""
data = {'title': 'metadata', 'timeDuration': 120}
# Open in binary mode and close the file once the upload is done
with open('/path/your_file.mp3', 'rb') as mp3_f:
    files = {'messageFile': mp3_f}
    # Use data= for the extra fields: requests ignores json= when files= is given
    req = requests.post(url, files=files, data=data)
print(req.status_code)
print(req.content)
Hope it helps.

Python file upload from url using requests library

I want to upload a file to a URL. The file I want to upload is not on my computer, but I have the URL of the file. I want to upload it using the requests library. So, I want to do something like this:
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
The only difference is that the file report.xls comes from some URL and is not on my computer.
The only way to do this is to download the body of the URL so you can upload it.
The problem is that a form that takes a file is expecting the body of the file in the HTTP POST. Someone could write a form that takes a URL instead, and does the fetching on its own… but that would be a different form and request than the one that takes a file (or, maybe, the same form, with an optional file and an optional URL).
You don't have to download it and save it to a file, of course. You can just download it into memory:
urlsrc = 'http://example.com/source'
rsrc = requests.get(urlsrc)
urldst = 'http://example.com/dest'
rdst = requests.post(urldst, files={'file': rsrc.content})
Of course in some cases, you might always want to forward along the filename, or some other headers, like the Content-Type. Or, for huge files, you might want to stream from one server to the other without downloading and then uploading the whole file at once. You'll have to do any such things manually, but almost everything is easy with requests, and explained well in the docs.*
* Well, that last example isn't quite easy… you have to get the raw socket-wrappers off the requests and read and write, and make sure you don't deadlock, and so on…
There is an example in the documentation that may suit you. A file-like object can be used as a stream input for a POST request. Combine this with a stream response for your GET (passing stream=True), or one of the other options documented here.
This allows you to do a POST from another GET without buffering the entire payload locally. In the worst case, you may have to write a file-like class as "glue code", allowing you to pass your glue object to the POST that in turn reads from the GET response.
(This is similar to a documented technique using the Node.js request module.)
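A rough sketch of that streaming idea, assuming the destination accepts a raw, chunk-encoded request body (many form endpoints expect multipart/form-data instead, so this is not a drop-in replacement for files=). forward_stream and build_streaming_post are hypothetical helper names. Passing an iterator of byte chunks as data is what makes requests stream the upload with Transfer-Encoding: chunked:

```python
import requests

def build_streaming_post(dst_url, chunks):
    """Prepare a POST whose body is read lazily from an iterator of byte chunks."""
    return requests.Request("POST", dst_url, data=chunks).prepare()

def forward_stream(src_url, dst_url, chunk_size=64 * 1024):
    """Download src_url and upload it to dst_url without buffering the whole body."""
    with requests.Session() as session:
        with session.get(src_url, stream=True) as src:
            src.raise_for_status()
            prepared = build_streaming_post(dst_url, src.iter_content(chunk_size))
            return session.send(prepared)

# Offline check of the request shape; no network traffic happens here.
demo = build_streaming_post("http://example.com/dest", iter([b"abc", b"def"]))
print(demo.headers.get("Transfer-Encoding"))
```

A call would look like forward_stream('http://example.com/source', 'http://example.com/dest'), reusing the placeholder URLs from the earlier answer.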
import requests
img_url = "http://...."
url = "http://...."  # upload endpoint
# Download the image into memory, then send it on as a multipart file field
res_src = requests.get(img_url)
payload = {}
files = [
    ('files', ('image_name.jpg', res_src.content, 'image/jpeg'))
]
headers = {"token": "******-*****-****-***-******"}
response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.text)
The above code works for me.
