Python protobuf decode base64 string

I am trying to get JSON data from a base64-encoded string. I created my proto file as follows:
syntax = "proto2";
message ArtifactList {
repeated Artifact artifacts = 1;
}
message Artifact {
required string id = 1;
required uint64 type_id = 2;
required string uri = 3;
}
After that, I generated the Python files with the protoc compiler and tried to decode the base64 string like this:
import message_pb2
import base64
data = base64.b64decode("AAAAAA8KDQgTEBUgBCjln62lxS6AAAAAD2dycGMtc3RhdHVzOjANCg==")
s = str(data)
message_pb2.ArtifactList.ParseFromString(s)
But I am getting the below error.
Traceback (most recent call last):
File "app.py", line 7, in <module>
message_pb2.ArtifactList.ParseFromString(s)
TypeError: descriptor 'ParseFromString' requires a 'google.protobuf.pyext._message.CMessage' object but received a 'str'
I am new to protobuf and couldn't find a solution to this issue.
Could anyone help me fix this problem?
Thanks in advance.

There are two issues.
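ParseFromString is a method of an ArtifactList instance, so you have to create an instance first instead of calling it on the class.
ParseFromString takes a bytes-like object, not a str, as its parameter. str(data) does not convert the bytes back to text; it produces their printable representation ("b'...'"), so pass the decoded bytes directly.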
>>> import message_pb2
>>> import base64
>>> data = base64.b64decode("AAAAAA8KDQgTEBUgBCjln62lxS6AAAAAD2dycGMtc3RhdHVzOjANCg==")
>>> m = message_pb2.ArtifactList()
>>> m.ParseFromString(data)
>>> m.artifacts
<google.protobuf.pyext._message.RepeatedCompositeContainer object at 0x7fd09a937d68>
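To see why str(data) is the wrong input, here is a small standard-library-only sketch (it does not need the generated message_pb2 module). str() on a bytes object returns its printable representation, starting with b', rather than converting the bytes, and encoding that text again does not give back the original data.

```python
import base64

encoded = "AAAAAA8KDQgTEBUgBCjln62lxS6AAAAAD2dycGMtc3RhdHVzOjANCg=="
data = base64.b64decode(encoded)  # bytes: this is what ParseFromString expects

s = str(data)  # NOT a conversion: this is the repr text of the bytes object
print(type(data).__name__, len(data))  # bytes 40
print(s.startswith("b'"))              # True
print(s.encode() == data)              # False: the original bytes are lost
```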

ParseFromString is a method on a protobuf Message instance, and it expects the decoded bytes rather than str(data).
Try:
message = message_pb2.ArtifactList()
message.ParseFromString(data)
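The decoded bytes in the question also appear to carry a gRPC-Web envelope (this is an observation from the bytes themselves, not something established in the thread; the grpc-web-text content type is base64-encoded). They start with a 0x00 flag byte and a 4-byte big-endian length of 15, and they end with a second frame flagged 0x80 whose body is the text grpc-status:0. Field number 0 is not valid in protobuf, so if ParseFromString still raises a DecodeError on the full bytes, a minimal sketch of stripping that framing, assuming this layout, could look like this (split_grpc_web_frames is a hypothetical helper name):

```python
import base64
import struct

def split_grpc_web_frames(raw):
    """Hypothetical helper: split a gRPC-Web body into (flag, payload) pairs.

    Assumes each frame is 1 flag byte (0x00 = message, 0x80 = trailers)
    followed by a 4-byte big-endian payload length.
    """
    frames = []
    offset = 0
    while offset < len(raw):
        if offset + 5 > len(raw):
            raise ValueError("truncated frame header")
        flag, length = struct.unpack(">BI", raw[offset:offset + 5])
        offset += 5
        payload = raw[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated frame payload")
        frames.append((flag, payload))
        offset += length
    return frames

raw = base64.b64decode("AAAAAA8KDQgTEBUgBCjln62lxS6AAAAAD2dycGMtc3RhdHVzOjANCg==")
for flag, payload in split_grpc_web_frames(raw):
    if flag & 0x80:
        print("trailers:", payload.decode("ascii").strip())  # grpc-status:0
    else:
        print("message payload:", len(payload), "bytes")     # 15 bytes
        # With the generated module this would be:
        #   m = message_pb2.ArtifactList(); m.ParseFromString(payload)
```

Whether the 15-byte payload then parses cleanly still depends on the .proto file matching what the server actually sends.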

Related

Upload file to Databricks DBFS with Python API

I'm following the Databricks example for uploading a file to DBFS (in my case .csv):
import json
import requests
import base64
DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)
def dbfs_rpc(action, body):
    """ A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
    response = requests.post(
        BASE_URL + action,
        headers={'Authorization': 'Bearer %s' % TOKEN },
        json=body
    )
    return response.json()
# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
with open('/a/local/file') as f:
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)
        dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
When using the tutorial as is, I get an error:
Traceback (most recent call last):
File "db_api.py", line 65, in <module>
data = base64.standard_b64encode(block)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 95, in standard_b64encode
return b64encode(s)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
I tried using with open('./sample.csv', 'rb') as f: so that the blocks passed to base64.standard_b64encode are bytes, but then I get another error:
TypeError: Object of type 'bytes' is not JSON serializable
This happens when the encoded block data is being sent into the API call.
I tried skipping encoding entirely and just passing the blocks into the post call. In this case the file gets created in the DBFS but has 0 bytes size.
At this point I'm trying to make sense of it all. It doesn't want a string but it doesn't want bytes either. What am I doing wrong? Appreciate any help.
In Python 3, strings (str) and bytes are two different types, and there is no implicit conversion between them, so you need to know when to use which and how to convert when necessary. This answer provides a nice explanation.
With the code snippet I see two issues:
You already found the first one: by default, open reads the file in text mode, so block is a str, while standard_b64encode expects bytes (and returns bytes). To read bytes from the file, open it in binary mode:
with open('/a/local/file', 'rb') as f:
The second issue is that bytes cannot be serialized to JSON. dbfs_rpc passes body to requests.post(json=body), which serializes it as JSON, and as the TypeError shows, a bytes value is rejected. Since data is bytes, convert it to a str explicitly with decode (base64 output is plain ASCII, so 'utf8' works):
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('utf8')})
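Putting both fixes together, here is a minimal sketch of the block loop that only builds the add-block request bodies, so it runs offline without a Databricks workspace. The helper name add_block_bodies and the handle value are hypothetical, and io.BytesIO stands in for a file opened with 'rb'. Each yielded dict is what you would pass as the body in dbfs_rpc("add-block", body).

```python
import base64
import io
import json

BLOCK_SIZE = 1 << 20  # the Databricks example limits a block to 1 MB

def add_block_bodies(f, handle, block_size=BLOCK_SIZE):
    """Hypothetical helper: yield JSON-serializable bodies for "add-block".

    f must be opened in binary mode ('rb') so that each block is bytes.
    """
    while True:
        block = f.read(block_size)
        if not block:
            break
        # standard_b64encode returns bytes; base64 output is pure ASCII,
        # so decoding it gives a str that json can serialize
        data = base64.standard_b64encode(block).decode('ascii')
        yield {"handle": handle, "data": data}

# io.BytesIO stands in for open('/a/local/file', 'rb'); 42 is a made-up handle
fake_file = io.BytesIO(b"col1,col2\n1,2\n3,4\n")
bodies = list(add_block_bodies(fake_file, handle=42, block_size=8))
print(json.dumps(bodies[0]))
```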

TranslationAPI: a bytes-like object is required, not 'Repeated'

I am trying to translate a PDF document from English to French using the Google Translation API and Python, but I get a TypeError.
Traceback (most recent call last):
File "C:\Users\troberts034\Documents\translate_test\translate.py", line 42, in <module>
translate_document()
File "C:\Users\troberts034\Documents\translate_test\translate.py", line 33, in translate_document
f.write(response.document_translation.byte_stream_outputs)
TypeError: a bytes-like object is required, not 'Repeated'
I have a feeling it has something to do with writing the file as binary, but I open it as binary too, so I am unsure what the issue is. I want the script to take a PDF file with English text and translate that text to French using the API. Any ideas what's wrong?
from google.cloud import translate_v3beta1 as translate
def translate_document():
    client = translate.TranslationServiceClient()
    location = "global"
    project_id = "translatedocument"
    parent = f"projects/{project_id}/locations/{location}"
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    with open("C:/Users/###/Documents/translate_test/test.pdf", "rb") as document:
        document_content = document.read()
    document_input_config = {
        "content": document_content,
        "mime_type": "application/pdf",
    }
    response = client.translate_document(
        request={
            "parent": parent,
            "target_language_code": "fr-FR",
            "document_input_config": document_input_config,
        }
    )
    # To output the translated document, uncomment the code below.
    f = open('test.pdf', 'wb')
    f.write(response.document_translation.byte_stream_outputs)
    f.close()
    # If not provided in the TranslationRequest, the translated file will only be returned through a byte-stream
    # and its output mime type will be the same as the input file's mime type
    print("Response: Detected Language Code - {}".format(
        response.document_translation.detected_language_code))
translate_document()
I think there is a bug in the sample code (I'm assuming you got the sample from the Cloud Translate API documentation).
byte_stream_outputs is a repeated field, i.e. a list-like container of bytes objects, which is why f.write() rejects it as 'Repeated'. To fix your code, write its first element, response.document_translation.byte_stream_outputs[0]. So change this line:
f.write(response.document_translation.byte_stream_outputs)
to:
f.write(response.document_translation.byte_stream_outputs[0])
Then your code will work.
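A minimal sketch of the same fix as two small helpers, with a guard for an empty response and a with block so the file is always closed. The helper names first_output and save_first_output are hypothetical, and a plain list of bytes plus io.BytesIO stand in for the real repeated field and output file so the example runs without the Translation API.

```python
import io

def first_output(outputs):
    """Hypothetical helper: return the first bytes object of a repeated bytes field.

    byte_stream_outputs behaves like a list of bytes, so f.write() needs
    one element of it, not the container itself.
    """
    if len(outputs) == 0:
        raise ValueError("the response contained no byte-stream outputs")
    return outputs[0]

def save_first_output(outputs, f):
    """Hypothetical helper: write the first output to an open binary file."""
    f.write(first_output(outputs))

# Stand-in for response.document_translation.byte_stream_outputs
fake_outputs = [b"%PDF-1.4 translated bytes"]
buffer = io.BytesIO()  # stands in for open('translated.pdf', 'wb')
save_first_output(fake_outputs, buffer)
print(buffer.getvalue())
```

With the real response this would be: with open('test_fr.pdf', 'wb') as f: save_first_output(response.document_translation.byte_stream_outputs, f). The file name test_fr.pdf is only an example; writing to a name other than test.pdf avoids overwriting the source PDF if the script runs from the same folder.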

TypeError when trying to get data from JSON

I would like to print specific data in a JSON but I get the following error:
Traceback (most recent call last):
File "script.py", line 47, in <module>
print(link['data.file.url.short'])
TypeError: 'int' object has no attribute '__getitem__'
Here is the JSON:
{
    "status":true,
    "data":{
        "file":{
            "url":{
                "full":"https://anonfile.com/y000H35fn3/yuh_txt",
                "short":"https://anonfile.com/y000H35fn3"
            },
            "metadata":{
                "id":"y000H35fn3",
                "name":"yuh.txt",
                "size":{
                    "bytes":0,
                    "readable":"0 Bytes"
                }
            }
        }
    }
}
I'm trying to get data.file.url.short which is the short value of the url
Here is the script in question:
post = os.system('curl -F "file=@' + save_file + '" https://anonfile.com/api/upload')
link = json.loads(str(post))
print(link['data.file.url.short'])
Thanks
Besides the os.system() return value issue mentioned by @John Gordon, I think the correct syntax to access data.file.url.short is link['data']['file']['url']['short'], since json.loads returns a dict and nested dicts are indexed one key at a time.
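A short sketch of why link['data.file.url.short'] cannot work once you do have the JSON: json.loads builds nested dicts and does not interpret dotted paths. It uses a trimmed copy of the JSON from the question (the metadata part is left out). The get_path helper is hypothetical, shown only in case you want to keep using dotted strings.

```python
import json
from functools import reduce

raw = '''{"status": true, "data": {"file": {"url": {
    "full": "https://anonfile.com/y000H35fn3/yuh_txt",
    "short": "https://anonfile.com/y000H35fn3"}}}}'''
link = json.loads(raw)  # a dict of nested dicts

# json.loads does not understand dotted paths: this key does not exist
print('data.file.url.short' in link)  # False

# Index one level at a time
print(link['data']['file']['url']['short'])

def get_path(obj, dotted_path):
    """Hypothetical helper: follow 'a.b.c' through nested dicts."""
    return reduce(lambda current, key: current[key], dotted_path.split('.'), obj)

print(get_path(link, 'data.file.url.short'))
```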
os.system() does not return the output of the command; it returns the exit status of the command, which is an integer.
If you want to capture the command's output, see this question.
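If you want the curl output inside the script, one standard-library option is subprocess.run with capture_output=True (available since Python 3.7). A minimal sketch: run_and_parse_json is a hypothetical helper, and a tiny Python child process that prints JSON stands in for the real curl call so the example runs offline.

```python
import json
import subprocess
import sys

def run_and_parse_json(cmd):
    """Hypothetical helper: run cmd (a list of args) and parse its stdout as JSON.

    os.system() only returns the exit status; subprocess.run with
    capture_output=True also gives you the output.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Stand-in for the curl call: a child Python process that prints JSON
fake_cmd = [sys.executable, "-c",
            'import json; print(json.dumps({"data": {"file": {"url": {"short": "https://anonfile.com/y000H35fn3"}}}}))']
link = run_and_parse_json(fake_cmd)
print(link['data']['file']['url']['short'])
```

For the real upload you would pass something like ['curl', '-F', 'file=@' + save_file, 'https://anonfile.com/api/upload']; passing a list instead of one shell string avoids quoting problems with the file name.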
You are capturing the return code of the process created by os.system which is an integer.
Why don't you use the urllib.request module to perform that action within Python?
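import urllib.request
import json
# Caution: urlretrieve() sends a GET request and saves the response body to save_file
# (overwriting it); it does not upload the file the way curl -F does.
urllib.request.urlretrieve('https://anonfile.com/api/upload', save_file)
with open(save_file) as f:  # json.load() needs a file object, not a file name
    json_dict = json.load(f)
print(json_dict['data']['file']['url']['short']) # https://anonfile.com/y000H35fn3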
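Or, if you don't need to save the response to a file, you can use the requests library. The upload has to be a multipart POST (what curl -F sends), not a GET:
import requests
with open(save_file, 'rb') as f:
    json_dict = requests.post('https://anonfile.com/api/upload', files={'file': f}).json()
print(json_dict['data']['file']['url']['short']) # https://anonfile.com/y000H35fn3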

Python 3 and hmac: how to handle a string not being binary

I had a script in Python2 that was working great.
def _generate_signature(data):
    return hmac.new('key', data, hashlib.sha256).hexdigest()
Where data was the output of json.dumps.
Now, if I try to run the same kind of code in Python 3, I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/hmac.py", line 144, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.4/hmac.py", line 42, in __init__
raise TypeError("key: expected bytes or bytearray, but got %r" %type(key).__name__)
TypeError: key: expected bytes or bytearray, but got 'str'
If I try something like transforming the key to bytes like so:
bytes('key')
I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
I'm still struggling to understand the encodings in Python 3.
You can use a bytes literal, b'key':
import hashlib
import hmac
def _generate_signature(data):
    return hmac.new(b'key', data, hashlib.sha256).hexdigest()
In addition, make sure data is also bytes. In your case it is the output of json.dumps, which is a str, so encode it first (for example data.encode('utf-8')). If the data is read from a file, open the file in binary mode ('rb').
Not to resurrect an old question, but I want to add something I feel is missing from this answer, and for which I had trouble finding an explanation or example anywhere else:
Aquiles Carattino was pretty close with his attempt at converting the string to bytes, but was missing the second argument, the encoding of the string to be converted to bytes.
If someone would like to convert a string to bytes through some other means than static assignment (such as reading from a config file or a DB), the following should work:
(Python 3+ only, not compatible with Python 2)
import hmac, hashlib
def _generate_signature(data):
    key = 'key' # Defined as a simple string.
    key_bytes = bytes(key, 'latin-1') # Commonly 'latin-1' or 'ascii'
    data_bytes = bytes(data, 'latin-1') # Assumes `data` is also an ascii string.
    return hmac.new(key_bytes, data_bytes, hashlib.sha256).hexdigest()
print(
    _generate_signature('this is my string of data')
)
Try codecs.encode(), which works in both Python 2.7.12 and 3.5.2:
import hashlib
import codecs
import hmac
a = "aaaaaaa"
b = "bbbbbbb"
hmac.new(codecs.encode(a), msg=codecs.encode(b), digestmod=hashlib.sha256).hexdigest()
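For Python 3, this is how I solved it:
import codecs
import hashlib
import hmac
key = 'key'  # placeholder secret; defined elsewhere in the original code
def _generate_signature(data):
    # digestmod is the hash constructor itself; it must not go through codecs.encode
    return hmac.new(codecs.encode(key), codecs.encode(data), hashlib.sha256).hexdigest()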

Understanding an AttributeError in python and how to solve it

I am getting the following error:
data = cipher.encrypt(data)
File "/usr/lib/python2.7/dist-packages/Crypto/Cipher/PKCS1_OAEP.py", line 133, in encrypt
randFunc = self._key._randfunc
AttributeError: 'str' object has no attribute '_randfunc'
in my console for the following section of code:
cipher = PKCS1_OAEP.new(PK_ID)
data = cipher.encrypt(data)
Both PK_ID and data are of type str.
What does the error message mean, and how can I solve it for this code?
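The PKCS1_OAEP.new() function takes an RSA key object, not a str. The error appears because encrypt() looks up the key's _randfunc attribute and finds a plain string instead. You can get a key object from the Crypto.PublicKey.RSA module, for example with RSA.importKey(PK_ID) if PK_ID holds a PEM-encoded key.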
