Updating Binary File on Github using Contents API

After successfully updating a plain text file using the GitHub Repository Contents API, I tried to do the same thing with an Excel file. I understand that git isn't really designed to store binaries; however, this is what my client needs.
Here are the relevant lines of Python code:
# Get the XLSX file from the repo to get its SHA
g = Github(my_admin_token)
repo = g.get_repo("theowner/therepo")
contents = repo.get_contents("myfile.xlsx", ref="main")
sha = contents.sha
# So far, so good. We have the SHA.
# Read the bytes we want to use to replace the contents of the file
data = open('my_new_file.xlsx', 'rb').read()
base64_encoded_data = base64.b64encode(data)
# Update the XLSX file in the repo with the new bytes
result = repo.update_file(contents.path, "auto-committed", base64_encoded_data,
sha, branch="main")
print("Result of update_file:")
print(result)
# Result: {'commit': Commit(sha="88f46eb99ce6c1d7d7d287fb8913a7f92f6faeb2"), 'content': ContentFile(path="myfile.xlsx")}
Now, you'd think everything went well; however, when I go to GitHub and look at the file, it's a mass of Base64 encoded data. It somehow "loses the fact that it's an Excel file" in the translation. When I click on the file in the GitHub user interface, and I have the option to Download the file, I get the "big blob" of Base64 text vs. having the XLSX file download.
There doesn't seem to be a way to tell the API what encoding I want to use, e.g., there doesn't seem to be a way to set HTTP headers on the call.
I also tried using the Python requests library to PUT (per doc) to the GitHub API:
result = requests.put('https://api.github.com/repos/myname/myrepo/contents/myfile.xlsx', {
"headers": {
"Accept": "application/vnd.github.VERSION.raw",
"Authorization": "token my_admin_token"
},
"committer": {'name':'My Name', 'email':'me@mymail.com'},
"message": "Did it work?",
"branch": "main",
"content": base64_encoded_data})
and I get an HTTP 404.
I tried playing with the Accept header types as well. No dice.
Various other issues trying this with curl.
If you have a working sample of updating/replacing an XLSX file on GitHub using curl, python, etc. I'd love to see it! Thanks.

Uploading a binary file to GitHub is very much possible, both via git and via the GitHub API.
The following Python snippet works as expected and uploads an Excel file to a test repository at https://github.com/recycle-bin/github-playground/tree/main/hello. I am able to download the Excel file afterwards, too.
import base64
import datetime
import os
import requests
github_token = os.environ["GITHUB_API_TOKEN"]
repository = "recycle-bin/github-playground"
xlsx_file_path = "workbook.xlsx"
def upload_file_to_github(source_file_path: str, destination_path: str):
    headers = {
        "content-type": "application/json",
        "authorization": f"token {github_token}",
        "accept": "application/vnd.github+json",
    }
    with open(source_file_path, "rb") as source_file:
        encoded_string = base64.b64encode(source_file.read()).decode("utf-8")
    payload = {
        "message": f"Uploaded file at {datetime.datetime.utcnow().isoformat()}",
        "content": encoded_string,
    }
    requests.put(
        f"https://api.github.com/repos/{repository}/contents/{destination_path}",
        json=payload,
        headers=headers,
    )
def main():
    upload_file_to_github(xlsx_file_path, "hello/workbook.xlsx")
if __name__ == "__main__":
    main()
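The snippet above creates a new file. When replacing a file that already exists, as in the question, the Contents API also expects the current blob SHA of that file in the request body. A minimal sketch of building such a body follows; `build_update_payload` is a hypothetical helper name, the byte string and SHA are made-up placeholders, and the real SHA would come from a prior GET on the same path.

```python
import base64
import json


def build_update_payload(file_bytes: bytes, message: str, sha: str, branch: str = "main") -> dict:
    """Build the JSON body for a PUT that replaces an existing file."""
    return {
        "message": message,
        # The API expects Base64 text, encoded exactly once.
        "content": base64.b64encode(file_bytes).decode("ascii"),
        # Blob SHA of the file being replaced (placeholder value in this sketch).
        "sha": sha,
        "branch": branch,
    }


payload = build_update_payload(b"PK\x03\x04 fake xlsx bytes", "Replace workbook", "0123abcd")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as `json=payload` to `requests.put`, in the same way as in the snippet above.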
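Your 404 could be due to one of the following:
The repository does not exist
The branch does not exist
The request is not authenticated. In your requests.put call, the headers dictionary is nested inside the second positional argument, which requests treats as the request body, so no Authorization header is sent; GitHub answers unauthenticated requests to private repositories with a 404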
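As for the PyGithub attempt in the question: as far as I can tell, `update_file` Base64-encodes the content it is given before sending it, so passing data that is already Base64-encoded stores the Base64 text itself as the file. The effect can be reproduced with the standard library alone. The byte string below is a made-up stand-in for the workbook, and the "stored" values only simulate the single decode that happens on the server side.

```python
import base64

original = b"PK\x03\x04 stand-in for XLSX bytes"

encoded_once = base64.b64encode(original)       # what the API expects to receive
encoded_twice = base64.b64encode(encoded_once)  # what is sent if the caller pre-encodes

# The payload is decoded once when the file is stored.
stored_when_pre_encoded = base64.b64decode(encoded_twice)
stored_when_raw = base64.b64decode(encoded_once)

print(stored_when_pre_encoded == encoded_once)  # the file content is Base64 text
print(stored_when_raw == original)              # the file content is the real bytes
```

Under that assumption, passing `data` (the raw bytes) instead of `base64_encoded_data` to `repo.update_file` should be enough.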

Related

Possible to request github JSON file without token

I have been having trouble working out how to read a file in my repo, which is in JSON format, using requests (Python).
Basically, I have created something simple like:
r = requests.get('https://raw.githubusercontent.com/Test/testrepo/master/token.json?token=ADAJKFAHFAKNQ3RKVSUQ5T12333777777')
This works; however, every time I commit a change to that file, I get a new token and have to update the code all over again.
So my question is: is it possible to access the JSON file without the token? (I do need to keep the repo private as well.) The point is that I want to be able to change the file without the URL changing.

The easiest solution is probably to use the GitHub API, rather than trying to use the "raw" link you see in the browser.
First, acquire a personal access token.
Now issue an API request to the /repos/{owner}/{repo}/contents/{path} endpoint using that access token:
import requests
token = "MY_SECRET_TOKEN"
owner = 'Test'
repo = 'testrepo'
path = 'token.json'
r = requests.get(
    'https://api.github.com/repos/{owner}/{repo}/contents/{path}'.format(
        owner=owner, repo=repo, path=path),
    headers={
        'accept': 'application/vnd.github.v3.raw',
        'authorization': 'token {}'.format(token),
    }
)
print(r.text)
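Without the raw media type in the `accept` header, the same endpoint returns JSON metadata with the file body Base64-encoded in a `content` field. Here is a small sketch of decoding that form. `decode_contents_response` is a hypothetical helper, and the `sample` dictionary is a hand-made stand-in for `r.json()`, not real API output.

```python
import base64
import json


def decode_contents_response(payload: dict) -> bytes:
    """Return the file bytes from a Contents API JSON response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # b64decode ignores the line breaks embedded in the content field.
    return base64.b64decode(payload["content"])


sample = {
    "name": "token.json",
    "encoding": "base64",
    "content": base64.encodebytes(b'{"key": "value"}').decode("ascii"),
}
data = json.loads(decode_contents_response(sample))
print(data["key"])
```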

You can use the PyGithub library to get any file in your repository. Since you want to keep the repo private, you have to log in to GitHub using one of the methods described here. Here is an example of getting the file using the GitHub username and password. Note that GitHub no longer accepts account passwords for API authentication, so in practice a personal access token takes the password's place.
from github import Github
user_name = "<YOUR_USERNAME>"
password = "<YOUR_PASSWORD>"
g = Github(user_name, password)
file_name = 'test.json'  # path of the required file within the repo
repo_name = 'repo_name'
repo_location = '{}/{}'.format(user_name, repo_name)
repo = g.get_repo(repo_location)
file = repo.get_contents(file_name)
# if you want the download URL for the file (this comes with the token that you mentioned earlier)
download_url = file.download_url
# if you simply want the content of the file
content = file.decoded_content

@larsks's solution is great, and I want to supplement it.
I chose the public repository awesome-python as an example.
Suppose you want to access the contents of docs/CNAME on the master branch:
import requests
token = "MY_SECRET_TOKEN"
owner = 'vinta'
repo = 'awesome-python'
path = 'docs/CNAME'
branch = 'master' # or sha1, for example: 6831740
url = f'https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={branch}'
print(url)
r = requests.get(url,
    headers={
        'accept': 'application/vnd.github.v3.raw',
        # 'authorization': f'token {token}',  # Not needed if you only want to read public repositories.
    }
)
print(r.text)
"""
Type
r.text: str
r.content: bytes
"""
# If you want to save it as a file, then you can try as below.
# f=open('temp.ico','wb')
# f.write(r.content)
Many people, however, will want to access a private repository. In that case:
Go to github.com/settings/tokens
Generate a new token
Select the repo scope (Full control of private repositories)
Uncomment the authorization header line in the code above
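The public and private cases differ only in whether the authorization header is present, so both can be covered by one small helper. This is a sketch only: `build_request` is a hypothetical name, and the URL layout and header values are the ones used in the code above.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_request(owner: str, repo: str, path: str,
                  ref: Optional[str] = None, token: Optional[str] = None):
    """Return (url, headers) for fetching raw file contents."""
    # quote() keeps "/" intact, so nested paths such as docs/CNAME survive.
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{quote(path)}"
    if ref:
        url += "?" + urlencode({"ref": ref})
    headers = {"accept": "application/vnd.github.v3.raw"}
    if token:  # only needed for private repositories
        headers["authorization"] = f"token {token}"
    return url, headers


url, headers = build_request("vinta", "awesome-python", "docs/CNAME", ref="master")
print(url)
print(headers)
```

The returned pair can be passed straight to `requests.get(url, headers=headers)`.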

Trying to upload a .wav file to a bucket using python-requests

I'm trying to upload a .wav file (let's say test.wav) into my Google Storage bucket, but I'm running into some problems: a storage object gets uploaded with the appropriate 'test.wav' name, but inside it is just the data from my request. Also, the contentType in the bucket is displayed as application/x-www-form-urlencoded.
My bucket has public read/write/delete permissions and uploading other file types works fine. Uploading this way also works fine through Postman.
url = "https://www.googleapis.com/upload/storage/v1/b/<bucket_name>/o"
payload ={
"acl":"public-read",
"file":open('test.wav'),
"signature":signature,
"policy":policy
}
headers={"contentType":"audio/wav"}
params={"uploadType":"media","name":"test.wav"}
response = requests.post(url,data=payload,headers=headers,params=params)
print(response.text)
Currently I get the following error :
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position
4420: character maps to <undefined>
I've also tried scipy.io.wavfile.read() and pydub.AudioSegment() but these don't work for me either.
Ideally, I'd want the file to be successfully uploaded and usable for transcription through google's STT.
Thanks and regards.

I found a workaround for this problem. Instead of using the requests module, I switched to the method shown here.
Doing this uploads the file properly, instead of uploading my request data into an object with a .wav extension, which fixed my issue.

You're specifying the uploadType parameter to be media. That means that the body of the request is the body of the object you're uploading. However, you're specifying the body of your POST to be a dictionary with fields like "acl", "signature", and "file". It looks like you may be attempting a form-style POST, but that's not how media uploads look.
Here's how you'd use Requests to do a media-style upload to GCS:
import requests
url = "https://www.googleapis.com/upload/storage/v1/b/<bucket_name>/o"
headers = {
"acl": "public-read",
"Authorization": "Bearer ...",
"Content-Type": "audio/wav",
}
params = {"uploadType": "media", "name": "test.wav"}
with open('test.wav', 'rb') as file:
    r = requests.post(url, params=params, headers=headers, data=file)
print(r.text)
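The `UnicodeDecodeError` in the question is consistent with `open('test.wav')` reading the file in text mode, which tries to decode the audio bytes as characters. The difference between the two modes can be sketched as follows. The file contents are made up, and `cp1252` is assumed as the text encoding because the `'charmap'` message is typical of Windows defaults.

```python
import os
import tempfile

# A few bytes that are not valid text in cp1252 (0x81 is undefined there).
raw = b"RIFF\x81\x00\x00\x00WAVE"

with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Binary mode returns the bytes untouched.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode has to decode them, and fails on the undefined byte.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
finally:
    os.remove(path)

print(binary_data == raw)
print(text_mode_failed)
```

This is why the upload code above opens the file with `'rb'`.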

Importing Qualtrics Responses using Python Requests library

I am trying to import a csv of responses into Qualtrics using the API shown here: https://api.qualtrics.com/docs/import-responses. But, since I'm a noob at Python and (by extension) at Requests, I'm having trouble figuring out why I keep getting a 413. I've gotten this far:
formTest = {
'surveyId': 'my_id',
'file': {
'value': open('dataFiles/myFile.csv', 'rb'),
'options': {
'contentType': 'text/csv'
}
}
}
headersTest = {
"X-API-TOKEN": "my_token",
'content-type': "multipart/form-data"
}
r = requests.request("POST", url, data=formTest, headers=headersTest)
print(r.text)
The format of the formTest variable is something I found when looking through other code bases for an Angular implementation of this, which may not apply to a Python version of the code. I can successfully use cURL, but Python Requests is the way to go in my current situation (for various reasons).
In a fit of desperation, I tried directly translating the cURL request to Python Requests, but that didn't seem to help much either.
Has anyone done something like this before? I took a look at posts for importing contacts and the like, but there was no luck there either (since the data that needs to be sent is formatted differently). Is there something I am missing?

It's best not to mix POST data and files in one dictionary; use two separate dictionaries instead. Pass the files through the files= parameter, because it encodes the request as multipart/form-data and creates the required Content-Type header, including the boundary. For the same reason, do not set the content-type header yourself.
import requests
url = 'Qualtrics API'
file_path = 'path/to/file'
file_name = 'file.name'
data = {'surveyId':'my_id'}
files = {'file' : (file_name, open(file_path, 'rb'), 'text/csv')}
headers = {'X-API-TOKEN': 'my_token'}
r = requests.post(url, data=data, files=files, headers=headers)
print(r.text)
The first value in files['file'] is the file name (optional), followed by the file object, followed by the file content type (optional).
You will find more info in the docs: Requests, POST a Multipart-Encoded File.
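To see what requests generates from `data=` and `files=`, the request can be prepared without sending it. In this sketch the URL and the in-memory CSV are stand-ins, not the real Qualtrics endpoint or file, and nothing goes over the network.

```python
import io

import requests

# Stand-ins for the real endpoint and file; nothing is sent over the network.
url = "https://example.com/import-responses"
data = {"surveyId": "my_id"}
files = {"file": ("file.csv", io.BytesIO(b"a,b\n1,2\n"), "text/csv")}
headers = {"X-API-TOKEN": "my_token"}

prepared = requests.Request("POST", url, data=data, files=files, headers=headers).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)        # multipart/form-data plus a generated boundary
print(len(prepared.body))  # size of the encoded body in bytes
```

A hand-written `'content-type': "multipart/form-data"` header, as in the question, carries no boundary, so the server cannot split the body into its parts.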

How to upload files to slack using file.upload and requests

I've been searching a lot and I haven't found an answer to what I'm looking for.
I'm trying to upload a file from /tmp to slack using python requests but I keep getting {"ok":false,"error":"no_file_data"} returned.
file={'file':('/tmp/myfile.pdf', open('/tmp/myfile.pdf', 'rb'), 'pdf')}
payload={
"filename":"myfile.pdf",
"token":token,
"channels":['#random'],
"media":file
}
r=requests.post("https://slack.com/api/files.upload", params=payload)
Mostly trying to follow the advice posted here

Sending files over HTTP requires a bit more work than sending other data. The content type has to be set and the file contents read and encoded, so you can't just include the file in the payload parameter in requests.
You have to pass your file information to the files parameter of the .post method so that it can add all the file transfer information to the request.
my_file = {
'file' : ('/tmp/myfile.pdf', open('/tmp/myfile.pdf', 'rb'), 'pdf')
}
payload={
"filename":"myfile.pdf",
"token":token,
"channels":['#random'],
}
r = requests.post("https://slack.com/api/files.upload", params=payload, files=my_file)

I am writing this post to potentially save you all the time I wasted. I tried to create a new file and upload it to Slack without actually creating a file on disk (just having its content). Because of various misleading errors from the Slack API, I wasted a few hours before finding out that I had had good code from the beginning and had simply not added the bot to the channel.
This code can also be used to open an existing file, get its content, modify it, and upload it to Slack.
Code:
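import os
import requests
from io import StringIO  # lets us build the CSV content in memory,
# without actually creating a file.
# df (a pandas DataFrame) and comment (a string) are assumed to be defined earlier.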
sio = StringIO()
df.to_csv(sio) # save dataframe to CSV
csv_content = sio.getvalue()
filename = 'some_data.csv'
token=os.environ.get("SLACK_BOT_TOKEN")
url = "https://slack.com/api/files.upload"
request_data = {
'channels': 'C123456', # somehow required if you want to share the file
# it will still be uploaded to the Slack servers and you will get the link back
'content': csv_content, # required
'filename': filename, # required
'filetype': 'csv', # helpful :)
'initial_comment': comment, # optional
'text': 'File uploaded', # optional
'title': filename, # optional
#'token': token, # Don't bother - it won't work. Send a header instead (example below).
}
headers = {
'Authorization': f"Bearer {token}",
}
response = requests.post(
url, data=request_data, headers=headers
)
OFFTOPIC - about the docs
I just had the worst experience (probably of this year) with Slack's files.upload documentation. This might be useful for you in the future.
Things that did not work as documented:
token - it cannot be a parameter of the POST request; it must be a header. This was said in one of the GitHub bug reports by an actual Slack employee.
channel_not_found - I did provide an existing, correct channel ID and got this message. This is somewhat OK for security reasons (obfuscation), but then why does this error message exist: not_in_channel - Authenticated user is not in the channel? After adding the bot to the channel everything worked.
Lack of examples for using the content param (that's why I am sharing my code with you).
Different ways of coding it resulted in different errors regarding form data, and no info in the docs helped to understand what might be wrong or which encoding is required for which upload type.
The main issue is that they do not version their API; they change it and do not update the docs, so many statements in the docs are false or outdated.
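Because Slack reports failures such as `no_file_data` or `not_in_channel` inside the JSON body (an `ok` flag plus an `error` code), it helps to check that body explicitly instead of relying on the HTTP status alone. The sketch below uses a hypothetical `SlackUploadError` and `check_slack_response`, and the dictionary is a hand-written example of the response shape, not captured API output.

```python
class SlackUploadError(Exception):
    """Raised when a Slack response body reports ok: false."""


def check_slack_response(payload: dict) -> dict:
    """Return the payload if the call succeeded, otherwise raise with the error code."""
    if not payload.get("ok"):
        raise SlackUploadError(payload.get("error", "unknown_error"))
    return payload


# Hand-written example of a failure response.
try:
    check_slack_response({"ok": False, "error": "not_in_channel"})
    failure = None
except SlackUploadError as exc:
    failure = str(exc)

print(failure)
```

With the code above, this would be called as `check_slack_response(response.json())` right after the POST.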

Based on the Slack API files.upload documentation, what you need are:
Token: Authentication token bearing the required scopes.
Channel ID: Channel to upload the file to
File: File to upload
Here is the sample code. I am using the WebClient from the @slack/web-api package to upload the file to a Slack channel.
import { createReadStream } from 'fs';
import { WebClient } from '@slack/web-api';
const token = 'token'
const channelId = 'channelID'
const web = new WebClient(token);
const uploadFileToSlack = async () => {
  await web.files.upload({
    filename: 'fileName',
    file: createReadStream('path/file'),
    channels: channelId,
  });
}

Project Oxford Speaker Recognition- Invalid Audio Format

I have been trying hard to use the Project Oxford Speaker Recognition API
(https://dev.projectoxford.ai/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797).
I have successfully recorded sound from my microphone and converted it to the required WAV format (PCM, 16-bit, 16K, mono).
The problem is that when I post this file as a binary stream to the API, it returns an Invalid Audio Format error message.
The same file is accepted by the demo on the website (https://www.projectoxford.ai/demo/SPID).
I am using Python 2.7 with this code:
import httplib
import urllib
import base64
import json
import codecs
headers = {
# Request headers
'Content-Type': 'application/octet-stream',
'Ocp-Apim-Subscription-Key': '{KEY}',
}
params = urllib.urlencode({
})
def enroll(audioId):
    conn = httplib.HTTPSConnection('api.projectoxford.ai')
    file = open('test.wav','rb')
    body = file.read()
    conn.request("POST", "/spid/v1.0/verificationProfiles/" + audioId +"/enroll?%s" % params, str(body), headers)
    response = conn.getresponse()
    data = response.read()
    print data
    conn.close()
    return data
And this is the response that I am getting:
{
"error": {
"code": "BadRequest",
"message": "Invalid Audio Format"
}
}
Can anyone guide me as to what I am missing? I have verified all the properties of the audio file against the requirements of the API, but with no luck.
All answers and comments are appreciated.

I sent this file to Project Oxford with my test program, which is written in Ruby, and it works properly. I think the issue might be in the other parameters you are sending. Try changing your 'Content-Type' header to 'audio/wav; samplerate=1600'; this is the header that I used. I also send a 'Content-Length' header with the size of the file. I'm not sure whether 'Content-Length' is required, but it is good practice to include it.
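Since the API is strict about the audio format, it can also help to confirm the file's header from Python before posting it. This sketch uses the standard `wave` module and is written for Python 3, unlike the question's code. `describe_wav` is a hypothetical helper, and the generated second of silence stands in for `test.wav`.

```python
import io
import wave


def describe_wav(source) -> dict:
    """Read the header of a WAV file (path or file object) and return its basic properties."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "compression": wav.getcomptype(),  # "NONE" means uncompressed PCM
        }


# Build one second of silence in memory as a stand-in for test.wav.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes = 16-bit samples
    out.setframerate(16000)   # 16 kHz
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```

For a real file, `describe_wav('test.wav')` reports the same fields, which can then be compared with the PCM, 16-bit, 16K, mono requirement from the question.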
