I'm working on an application that has an SAP RFC which returns doc files as XSTRINGs. There is also a client application written in Python that sends requests to the SAP RFC to get the doc files. My question is: in Python, how can I convert an XSTRING to a doc file?
The response header's content type is application/msword and the charset is utf-8.
This answer was given by the OP inside his own question, so I have moved it here to fit Stack Overflow principles.
Answer to my own question:
Even though the SAP RFC returns a variable of type XSTRING, Python receives it as a base64 string. To convert it to a document, I first decoded the base64 string, which gave me RTF output, and then wrote the RTF bytes to a .rtf file. RTF files can be opened by most word-processing tools, so I was able to open the resulting file without problems.
Following is the code I wrote for the conversion:
from base64 import b64decode

# the payload arrives as a base64 string in the JSON response
base64_resp = response_json['data']

# decode back to the raw RTF bytes and write them to a .rtf file
bytes_rtf = b64decode(base64_resp, validate=True)
with open(rtf_filename, 'wb') as f:
    f.write(bytes_rtf)
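As a small optional check (my addition, not part of the SAP response contract): RTF documents always start with the literal text {\rtf, so the decoded bytes can be verified before writing them out.

# fail early if the decoded payload is not actually RTF
if not bytes_rtf.startswith(b'{\\rtf'):
    raise ValueError('decoded payload does not look like RTF')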
I would like to generate a PDF report using the pdfme library. I need Polish characters to be in it as well. The example report ends with:
with open('document.pdf', 'wb') as f:
    build_pdf(document, f)
So I cannot add encoding = "utf-8". Is there any way I can still use Polish characters?
I tried:
Changing the file to text write mode and setting the encoding to utf-8. I get: "TypeError: write() argument must be str, not bytes".
Adding .encode("utf-8") where there are Polish characters, e.g. "Paweł".encode("utf-8"). I get: "TypeError: value of . attr must be of type str, list or tuple: b'Pawe\xc5\x82'"
In this case, the part of the code responsible for dealing with the Unicode characters is the PDF library. The build_pdf call, for whatever library it is, has to be able to handle any character in "document". If it fails, it is the code around the PDF library, the owner of the "build_pdf" call, that has to be changed so that it handles all the characters you need.
"utf-8" is just one way of expressing characters as bytes. A PDF file is a binary file, and it has its own internal headers, structures, and settings for character-encoding handling: your text may end up inside the PDF encoded as utf-8 or as some other, legacy encoding, but that will be transparent to you and to anyone using the PDF file.
It may be that the document is text (we don't know whether it is plain text or some object from your library that has already been pre-processed). If it is text, and your library says that build_pdf can accept bytes instead, you can encode the document prior to the call: build_pdf(document.encode('utf-8'), f). But that would be a strange way of working: it is likely that either build_pdf does the encoding itself, or whatever process generated the document has already done so.
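To make the str/bytes distinction behind those TypeError messages concrete, a quick sketch:

text = "Paweł"                  # str: what the library's text handling expects
data = text.encode("utf-8")     # bytes: b'Pawe\xc5\x82', what ends up on disk
print(data.decode("utf-8"))     # back to 'Paweł'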
To get more meaningful help, you should say which library you are using to generate the PDF and include the import lines in your code, including the creation of your document, so that we have a minimal reproducible example: i.e. something I can copy into a .py file, install the lib, run, and see a PDF file with the Polish characters mangled. Then I, and others, will be able to fix it. Otherwise, this answer is as far as I can get.
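For completeness, a minimal sketch of the pattern described above, using pdfme since the question mentions it. The document keys follow pdfme's published examples as I recall them, so treat them as an assumption rather than a verified fix for the Polish-character issue:

from pdfme import build_pdf

# strings stay as plain str; no .encode() anywhere
document = {
    "sections": [
        {"content": ["Paweł pisze po polsku"]}
    ]
}

# the output file stays in binary ('wb') mode; the library handles the encoding
with open('document.pdf', 'wb') as f:
    build_pdf(document, f)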
I am trying to decrypt a JSON message body containing a mix of numeric and non-English characters. The decrypted string is not showing the non-English characters properly.
Details:
1) The input is base64-encoded and GPG-encrypted.
2) I am using the Python base64 and gnupg modules to decode and decrypt the message.
The output is displayed as (only part of the output is shown, due to data sensitivity):
{"id":"123","name":"ååéåæ¥é¡
I am expecting the output to be
{"id":"123","name":"豐國業銀"
Here is the Python code I am using for the above task:
import json
import os
import base64
import gnupg

gpg = gnupg.GPG()

with open('item2.json', 'r') as file:
    json_data = json.load(file)

for value in json_data['items']:
    data = value['payload']
    print(data)
    str_data = base64.b64decode(data)
    print(str_data)
    decrypted_data = gpg.decrypt(str_data, passphrase=output)
    print(decrypted_data)
It's probably caused by a character-encoding mismatch. It will also make a difference whether you're using Python 2 or Python 3.
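To illustrate that kind of mismatch with the expected name from the question: correct UTF-8 bytes that get decoded as Latin-1 (or printed through a terminal using a legacy encoding) come out as exactly the sort of garbled text shown above.

original = '{"id":"123","name":"豐國業銀"}'
wrong = original.encode('utf-8').decode('latin-1')
print(wrong)   # prints å/æ/é-style garbage instead of the CJK characters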
Since Python 2 has finally reached its EOL, I'm going to assume you (and everyone who subsequently reads this) need Python 3 code. There are also a number of possible explanations stemming from the way the python-gnupg module accesses the gpg executable.
For both security and ease-of-use reasons, I recommend using the Python bindings module that ships with the GPGME C API instead.
import json
import os
import base64
import gpg

c = gpg.Context(armor=True)

with open('item2.json', 'r') as file:
    json_data = json.load(file)

for value in json_data['items']:
    data = value['payload']
    print(data)
    str_data = base64.b64decode(data.encode())
    print(str_data)
    decrypted_data, result, verify_result = c.decrypt(str_data)
    print(decrypted_data)
Now decrypted_data only contains the plaintext, result contains information about the key(s) the encrypted data is encrypted to, and verify_result contains information about the key(s) the data was signed with, if any.
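Since the plaintext comes back as bytes, decoding it explicitly avoids depending on the console's default encoding when you go on to print or parse it (a small follow-on sketch, assuming the payload is UTF-8 encoded JSON as in the question):

# decrypted_data is the bytes plaintext returned by c.decrypt() above
text = decrypted_data.decode('utf-8')
record = json.loads(text)      # json is already imported above
print(record['name'])          # the CJK characters, not mojibake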
More details are in the included documentation; a draft copy is available here for convenience.
In all likelihood, however, the cause was the base64.b64decode call, which needs to process bytes, not str. That means you either need to encode data (as I have done above) or read in the JSON data as bytes instead of strings.
There is a .NET API sending byte data as a string in JSON. I am using a Python API to read it and write it to a file.
a = io.BytesIO(b"JVBERi0xLjQNJcjIyMjIyMg...")
with open('test.pdf', 'wb') as g:
    g.write(a.getvalue())
I created a file with this code but was unable to open it. I need another way of doing the same thing.
Instead of using io.BytesIO, use base64:
import base64

data = "string data"                   # the Base64 text received from the API
base64_data = base64.b64decode(data)

with open(filename, 'wb') as f:
    f.write(base64_data)
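A quick way to see why this is the right approach: the string in the question starts with JVBERi0x, which is exactly the Base64 encoding of a PDF header, so decoding it (rather than writing it verbatim) yields real PDF bytes.

import base64

print(base64.b64decode("JVBERi0xLjQ="))   # b'%PDF-1.4'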
I want to automatically download files from a pdf (in which there are links).
I already wrote a script which finds all these links, and that works great; the problem I'm facing is with the files' names.
I want to save them by their default names so it will be easy to understand what each file is, without the need to manually change each name.
The problem is that each name is URL-encoded (percent-encoded UTF-8). This site, https://www.webatic.com/url-convertor, converts the encoded strings just fine, but Python doesn't let me use the decode function to decode them.
For example: this string %D7%97%D7%95%D7%9E%D7%A8%D7%99+%D7%9C%D7%99%D7%9E%D7%95%D7%93 should become חומרי לימוד after decoding.
Python has a URL parser:
>>> import urllib.parse
>>> urllib.parse.unquote_plus('%D7%97%D7%95%D7%9E%D7%A8%D7%99+%D7%9C%D7%99%D7%9E%D7%95%D7%93')
'חומרי לימוד'
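For the use case in the question, something along these lines (the URL and path layout are hypothetical) turns a link into a readable filename before saving the download:

import urllib.parse

url = 'https://example.com/files/%D7%97%D7%95%D7%9E%D7%A8%D7%99+%D7%9C%D7%99%D7%9E%D7%95%D7%93.pdf'  # hypothetical link found in the PDF
encoded_name = url.rsplit('/', 1)[-1]                # last path segment
filename = urllib.parse.unquote_plus(encoded_name)   # 'חומרי לימוד.pdf'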
I have read the file stream of a zip file with the following code:
import base64

file = open(source_url, "rb")
data = file.read()
file.close()
byte_arr = base64.b64encode(data)
Now I am trying to call a web service which accepts data in base64Binary format (a byte array, written in Java). If I send byte_arr to the web service I get a client error:
Fault env:Client: Caught exception while handling request: unexpected element type: expected={http://www.w3.org/2001/XMLSchema}base64Binary, actual={http://www.w3.org/2001/XMLSchema}string
Please suggest why the base64 module is not working for me.
type(byte_arr) is still a string.
With thanks,
Sandhya
I guess there's nothing wrong with your base64 encoding. It seems like it is not embedded in a correct XML document. The error probably occurs when you send your data; maybe you should check that piece of code.
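One quick check to back that up, a round trip (variable names follow the question): if the assertion passes, the Base64 step itself is fine and the fault lies in how the value is placed into the SOAP/XML request.

import base64

with open(source_url, 'rb') as f:      # source_url as in the question
    data = f.read()

byte_arr = base64.b64encode(data)

# if this holds, the encoding is correct; check the request-building code instead
assert base64.b64decode(byte_arr) == data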