How to decompress text in Python that has been compressed with gzip?

How can you decompress a string of text in Python 3 that has been compressed with gzip and then encoded as Base64?
For example, the text:
EgAAAB+LCAAAAAAABAALycgsVgCi4vzcVAWFktSKEgC9n1/fEgAAAA==
Should convert to:
This is some text
The following C# code successfully does this:
var gzBuffer = Convert.FromBase64String(compressedText);
using (var ms = new MemoryStream()) {
    int msgLength = BitConverter.ToInt32(gzBuffer, 0);
    ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
    var buffer = new byte[msgLength];
    ms.Position = 0;
    using (var zip = new GZipStream(ms, CompressionMode.Decompress)) {
        zip.Read(buffer, 0, buffer.Length);
    }
    return Encoding.UTF8.GetString(buffer);
}

You can use the gzip and base64 modules.
>>> import gzip
>>> import base64
>>> s = 'EgAAAB+LCAAAAAAABAALycgsVgCi4vzcVAWFktSKEgC9n1/fEgAAAA=='
>>> gz = base64.b64decode(s)
>>> gz
b'\x12\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\x0b\xc9\xc8,V\x00\xa2\xe2\xfc\xdcT\x05\x85\x92\xd4\x8a\x12\x00\xbd\x9f_\xdf\x12\x00\x00\x00'
>>> # If you need the length, unpack the 4-byte integer prefix.
>>> # The prefix bytes b'\x12\x00\x00\x00' are little-endian, so use '<i'
>>> # rather than relying on the native byte order of the machine.
>>> import struct
>>> struct.unpack('<i', gz[:4])[0]
18
>>> # Skip the length prefix with the [4:] slice
>>> gzip.decompress(gz[4:]).decode('UTF8')
'This is some text'

Related

Crypto++: Encrypt in Python, decipher in C++

I am trying to do the following: in a Python script I use the pycrypto lib to encrypt some text, then I save it to a file. Then I load that file in C++ and decrypt the text using the same key I used in Python. It fails at stfDecryptor.MessageEnd(); with the error:
"CryptoCPP::InvalidCiphertext at memory location [some memory]"
Here is my code:
Python:
from Crypto.Cipher import AES
BLOCK_SIZE = 16
PADDING = '{'
# one-liner to sufficiently pad the text to be encrypted
pad = lambda s: s + (BLOCK_SIZE - len(s) % BLOCK_SIZE) * PADDING
EncodeAES = lambda c, s: c.encrypt(pad(s))
secret = 'MyKey123456789ab'
# create a cipher object using the random secret
cipher = AES.new(secret)
# encode a string
encoded = EncodeAES(cipher, textIn)
#save to file
fileOut = open("enc_shader.vert","w")
fileOut.write(encoded)
fileOut.close()
CPP :
std::string key = "MyKey123456789ab";
std::string iv = "aaaaaaaaaaaaaaaa";
std::ifstream fileIn("enc_shader.vert");
std::stringstream buffer;
buffer << fileIn.rdbuf();
std::string ciphertext1 = buffer.str();
CryptoPP::AES::Decryption aesDecryption((byte*)key.c_str(), CryptoPP::AES::DEFAULT_KEYLENGTH);
CryptoPP::CBC_Mode_ExternalCipher::Decryption cbcDecryption( aesDecryption, (byte*)iv.c_str() );
CryptoPP::StreamTransformationFilter stfDecryptor(cbcDecryption, new CryptoPP::StringSink( decryptedtext ) );
stfDecryptor.Put( reinterpret_cast<const unsigned char*>( ciphertext1.c_str() ), ciphertext1.size() );
stfDecryptor.MessageEnd();//fails here.
From what I read, these two endpoints should work, as pycrypto is just a wrapper for the Crypto++ lib. Maybe I am missing the padding on the C++ side?
UPDATE:
OK, I found that changing the padding scheme:
CryptoPP::StreamTransformationFilter stfDecryptor(cbcDecryption, new CryptoPP::StringSink( decryptedtext ) ,BlockPaddingSchemeDef::NO_PADDING);
decodes the string on the C++ side. But the decoded string contains the padding characters.
So if the original string was "aaaaaaaaaaaaaaaaa"
the decoded string looks like this:
"aaaaaaaaaaaaaaaaa{{{{{{{{{{{{{{{"
15 bytes were added to pad it to 32 bytes.
Why doesn't Crypto++ remove those at decryption?

Your Python encryption code manually adds '{' characters to pad to the block size. This is not a defined padding mode, so the Crypto++ code will not be able to remove the padding using an integrated padding scheme. In other words, you should decrypt using NO_PADDING and then remove the padding yourself.
But it would be better to let the Python code use PKCS#7 padding, so you can use PKCS_PADDING as option within Crypto++.
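For illustration, PKCS#7 padding is simple enough to write by hand on the Python side. The following is a minimal sketch, not taken from the original answer; the helper names pkcs7_pad and pkcs7_unpad are hypothetical, and it operates on bytes rather than on the text strings used in the question.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Append N bytes, each with value N, so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=BLOCK_SIZE):
    """Validate and strip PKCS#7 padding."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid PKCS#7 padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding bytes")
    return padded[:-pad_len]
```

With this scheme a 17-byte message is padded with 15 bytes of value 0x0f, and an input that is already a multiple of 16 bytes gets a full extra block, which is what lets the receiver strip the padding unambiguously.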

How to remove first 4 bytes from a string in python

I have a special packet in string format, which has a 32-byte header and a body containing one or more entries, each consisting of 90 bytes.
I want to process this string using Python. Can I read it like a socket: read the first 32 bytes as the header, take them off the string, and then continue by reading the 90 bytes of the first entry?
Something like:
str.read(32) # => "x01x02..."
str.read(90) # => "x02x05..."

You can use StringIO to read a string like a file (the session below is Python 2; in Python 3, io.BytesIO does the same job for binary data):
>>> import StringIO
>>> s = 'Hello, World!'
>>> sio = StringIO.StringIO(s)
>>> sio.read(6)
'Hello,'
>>> sio.read()
' World!'
I would also suggest you take a look at the struct module for help with parsing binary data
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
You define the format of the data using format strings, so 'hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes in this output). With no prefix, struct uses the machine's native byte order, size and alignment; you can specify the endianness (byte order) explicitly with a prefix such as '<' or '>' in the format string.
For example, if your header format was uint, 4-byte str, uint, uint, ushort, ulong:
>>> import struct
>>> data = ''.join(chr(i) for i in range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, '\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)
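In Python 3 the StringIO module no longer exists, and io.BytesIO plays the same role for binary data. A minimal sketch for the layout described in the question (32-byte header, then 90-byte entries); the function name read_packet and the error handling are illustrative assumptions:

```python
import io

HEADER_SIZE = 32  # bytes, from the question
ENTRY_SIZE = 90   # bytes, from the question


def read_packet(packet):
    """Split a bytes packet into its header and a list of fixed-size entries."""
    stream = io.BytesIO(packet)
    header = stream.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE:
        raise ValueError("packet is shorter than the header")
    entries = []
    while True:
        entry = stream.read(ENTRY_SIZE)
        if not entry:  # end of data
            break
        if len(entry) != ENTRY_SIZE:
            raise ValueError("truncated entry at end of packet")
        entries.append(entry)
    return header, entries
```

The header and each entry can then be handed to struct.unpack() with a format string that matches the real field layout.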

To split the packet into a 32 byte header and body:
header = packet[:32]
body = packet[32:]
To further split the body into one or more entries:
entries = [body[i:i+90] for i in range(0, len(body), 90)]

In Python 2.x you could simply do:
header = s[:32]
body = s[32:32+90]
In Python 3.x, str holds Unicode text, so binary data should be a bytes object (which slices the same way). If you need a mutable copy, convert the bytes to a bytearray first:
s = bytearray(s)
header = s[:32]
body = s[32:32+90]

How to append a list of Hex to one Hex number

I have a list of hex byte strings like this:
['0xe1', '0xd7', '0x7', '0x0']
(as read from a binary file)
I want to flip the list and append the list together to create one hex number,
['0x07D7E1']
How do I format the list to this format?

Concatenate your hex numbers into one string:
'0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
This does include the 00 byte explicitly in the output:
>>> inputlist = ['0xe1', '0xd7', '0x7', '0x0']
>>> '0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
'0x0007D7E1'
However, I'd look into reading your binary file format better; using struct for example to unpack bytes directly from the file into proper integers in the right byte order:
>>> import struct
>>> bytes = ''.join([chr(int(c, 16)) for c in inputlist])
>>> value = struct.unpack('<I', bytes)[0]
>>> print hex(value)
0x7d7e1
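The struct session above is Python 2 (it uses the print statement and builds a str of characters). In Python 3 the same little-endian conversion can be sketched with int.from_bytes(); the helper name hex_list_to_int is hypothetical:

```python
def hex_list_to_int(hex_bytes, byteorder="little"):
    """Combine strings such as '0xe1' into one integer."""
    raw = bytes(int(h, 16) for h in hex_bytes)  # e.g. b'\xe1\xd7\x07\x00'
    return int.from_bytes(raw, byteorder)


inputlist = ['0xe1', '0xd7', '0x7', '0x0']
value = hex_list_to_int(inputlist)
print(hex(value))             # 0x7d7e1
print("0x{:06X}".format(value))  # 0x07D7E1
```

The second print pads to six hex digits, which matches the format asked for in the question.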

Python write string of bytes to file

How do I write a string of bytes to a file, in byte mode, using python?
I have:
['0x28', '0x0', '0x0', '0x0']
How do I write 0x28, 0x0, 0x0, 0x0 to a file? I don't know how to transform this string to a valid byte and write it.

Map to a bytearray() or bytes() object, then write that to the file:
with open(outputfilename, 'wb') as output:
    output.write(bytearray(int(i, 16) for i in yoursequence))
Another option is to use the binascii.unhexlify() function to turn your hex strings into a bytes value:
from binascii import unhexlify
with open(outputfilename, 'wb') as output:
    output.write(unhexlify(''.join(format(i[2:], '>02s') for i in yoursequence)))
Here we have to chop off the 0x part first, then reformat the value to pad it with zeros and join the whole into one string.

In Python 3.X, bytes() will turn an integer sequence into a bytes sequence:
>>> bytes([1,65,2,255])
b'\x01A\x02\xff'
A generator expression can be used to convert your sequence into integers (note that int(x,0) converts a string to an integer according to its prefix. 0x selects hex):
>>> list(int(x,0) for x in ['0x28','0x0','0x0','0x0'])
[40, 0, 0, 0]
Combining them:
>>> bytes(int(x,0) for x in ['0x28','0x0','0x0','0x0'])
b'(\x00\x00\x00'
And writing them out:
>>> L = ['0x28','0x0','0x0','0x0']
>>> with open('out.dat','wb') as f:
...     f.write(bytes(int(x,0) for x in L))
...
4

xor each byte with 0x71

I needed to read a byte from the file, XOR it with 0x71 and write it back to another file. However, when I use the following, it just reads the byte as a string, so XORing creates problems.
f = open('a.out', 'r')
f.read(1)
So I ended up doing the same in C.
#include <stdio.h>
int main() {
    char buffer[1] = {0};
    FILE *fp = fopen("blah", "rb");
    FILE *gp = fopen("a.out", "wb");
    if(fp==NULL) printf("ERROR OPENING FILE\n");
    int rc;
    while((rc = fgetc(fp))!=EOF) {
        printf("%x", rc ^ 0x71);
        fputc(rc ^ 0x71, gp);
    }
    return 0;
}
Could someone tell me how I could convert the string I get on using f.read() over to a hex value so that I could xor it with 0x71 and subsequently write it over to a file?

If you want to treat something as an array of bytes, then usually you want a bytearray, as it behaves as a mutable array of bytes:
b = bytearray(open('a.out', 'rb').read())
for i in range(len(b)):
    b[i] ^= 0x71
open('b.out', 'wb').write(b)
Indexing a bytearray returns an integer between 0x00 and 0xff, and modifying it in place avoids the need to create a list and join everything up again. Note also that the file was opened as binary ('rb'); in your example you use 'r', which isn't a good idea for binary data.
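For large files, the same idea can be applied chunk by chunk instead of reading everything into memory. This is a minimal sketch assuming Python 3; the names make_xor_table, xor_bytes and xor_file and the 64 KiB chunk size are illustrative choices, not part of the original answer.

```python
def make_xor_table(key):
    """Build a 256-byte lookup table mapping each byte b to b ^ key."""
    return bytes(b ^ key for b in range(256))


def xor_bytes(data, key=0x71):
    """Return a copy of data with every byte XORed with key."""
    return data.translate(make_xor_table(key))


def xor_file(src_path, dst_path, key=0x71, chunk_size=64 * 1024):
    """Copy src_path to dst_path, XORing every byte with key."""
    table = make_xor_table(key)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk.translate(table))
```

Because XOR is its own inverse, running xor_file() on the output with the same key restores the original file.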

Try this (with the file opened in binary mode, 'rb'):
my_num = ord(f.read(1))
And then XOR the number stored in my_num.
