I have a binary sequence, for example: 10010111010101. I need to write this sequence to a file and read it back later, and I want it to be compressed as much as possible. What is the easiest way to do this?
I have tried grouping every 8 bits of the sequence into a byte and writing the byte value, then splitting the bytes back into bits when I read the file. Is there an easier way, or a module that does this readily?
The best textual encoding for binary data is either base64 or ascii85.
ASCII85
import base64
import sys
# Length of the binary string in bytes (32 bytes will let you have a 256 digit binary character stream)
# Keep it as low as possible to save space
length = 32
binary_string = input('Enter binary string : ')
integer = int(binary_string, 2)  # parse as base 2; avoids eval() on user input
data = integer.to_bytes(length, sys.byteorder, signed=False)
print(base64.a85encode(data).decode('utf-8'))
Base64
import base64
import sys
# Length of the binary string in bytes (32 bytes will let you have a 256 digit binary character stream)
# Keep it as low as possible to save space
length = 32
binary_string = input('Enter binary string : ')
integer = int(binary_string, 2)  # parse as base 2; avoids eval() on user input
data = integer.to_bytes(length, sys.byteorder, signed=False)
print(base64.b64encode(data).decode('utf-8'))
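WARNING: sys.byteorder depends on the platform (it is typically little-endian), so you might run into problems when you load the file on a machine with a different byte order. Passing an explicit 'big' or 'little' to to_bytes, and the same value when reading, avoids this.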
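If the goal is the smallest file rather than a textual encoding, the bits can also be written as raw bytes. Below is a minimal Python 3 sketch of that idea. The helper names bits_to_bytes and bytes_to_bits and the 4-byte length header are assumptions made for illustration; they are not part of the answer above.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes, prefixed by a 4-byte bit count."""
    n_bits = len(bits)
    n_bytes = (n_bits + 7) // 8          # round up to whole bytes
    value = int(bits, 2) if bits else 0  # int('', 2) would raise ValueError
    # Explicit 'big' byte order so the result does not depend on the machine.
    return n_bits.to_bytes(4, 'big') + value.to_bytes(n_bytes, 'big')


def bytes_to_bits(blob):
    """Inverse of bits_to_bytes: recover the original '0'/'1' string."""
    n_bits = int.from_bytes(blob[:4], 'big')
    value = int.from_bytes(blob[4:], 'big')
    # Zero-pad to the recorded length so leading zeros survive the round trip.
    return format(value, '0{}b'.format(n_bits)) if n_bits else ''


packed = bits_to_bytes('10010111010101')
restored = bytes_to_bits(packed)
print(len(packed), restored)
```

The packed value can be written to a file opened in 'wb' mode and read back in 'rb' mode. The length header is there because the bit count is otherwise lost once the sequence is padded to whole bytes.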
I have the following string that I am receiving from a Python server. I do not have access to that server.
\xa1\x823\xc2\xd5\x823\xc2\xff\x823\xc2\x12\x833\xc2\x1b\x833\xc2\x16\x833\xc2\x1e\x833\xc2 \x833\xc2\x0e\x833\xc2\x03\x833\xc2\x01\x833\xc2\x10\x833\xc2\'\x833\xc2\x17\x833\xc2\x00\x833\xc2\x11\x833\xc2$\x833\xc2$\x833\xc2\x1f\x833\xc2\x02\x833\xc2\xc0\x823\xc2\x94\x823\xc2\x91\x823\xc2\x7f\x823\xc2a\x823\xc2R\x823\xc2N\x823\xc2e\x823\xc2+\x823\xc2\xd3\x813\xc2\xee\x813\xc2\xe9\x813\xc2\xdf\x813\xc2\xfb\x813\xc2(\x823\xc25\x823\xc2\x17\x823\xc2\x1c\x823\xc2;\x823\xc2\xa2\x823\xc2\xe5\x823\xc2\xc2\x823\xc2\xbc\x823\xc2\x9b\x823\xc2\x13\x823\xc2\xbd\x813\xc2\xc0\x813\xc2\xc5\x813\xc2\xf2\x813\xc2(\x823\xc27\x823\xc2;\x823\xc2.\x823\xc2,\x823\xc20\x823\xc2\x11\x823\xc2\x0b\x823\xc2\xdf\x813\xc2\xb0\x813\xc2\xa2\x813\xc2\x7f\x813\xc2v\x813\xc2y\x813\xc2l\x813\xc2m\x813\xc2z\x813\xc2\x8c\x813\xc2\x89\x813\xc2w\x813\xc2Y\x813\xc2Y\x813\xc2c\x813\xc2e\x813\xc2Z\x813\xc2\x10\x813\xc2\xd2\x803\xc2\x8c\x803\xc2G\x803\xc2)\x803\xc2-\x803\xc2\x19\x803\xc2\xef\x7f3\xc2\xc9\x7f3\xc2\xc9\x7f3\xc2\xc8\x7f^C}3\xc2\xe7}3\xc2\xdd}3\xc2\xbc}3\xc2\xa9}3\xc2\xb7}3\xc2\xc1}3\xc2\xb0}3\xc2\x95}3\xc2\x9f}3\xc2\xd8}3\xc2\x05~3\xc2\x12~3\xc2\x15~3\xc2\r~3\xc2\x15~3\xc23~3\xc2/~3\xc2\x1d~3\xc2\x17~3\xc2\x15~3\xc2\x1d~3\xc2\x1e~3\xc2\x1a~3\xc2\x1f~3\xc2E~3\xc2W~3\xc2C~3\xc2o~3\xc2g~3\xc2p~3\xc2\xa3~3\xc2\x9b~3\xc2\x9e~3\xc2\x9e~3\xc2\xce~3\xc2\xe5~3\xc2\xe0~3\xc2\xd2~3\xc2\xc6~3\xc2\xc6~3\xc2\xc1~3\xc2\xca~3\xc2\xd6~3\xc2\xce~3\xc2\xa4~3\xc2\xad~3\xc2\xe1~3\xc2\xf8~3\xc2\xf8~3\xc2\x11\x7f3\xc2;\x7f3\xc2)\x7f3\xc2\xe6~3\xc2\xc4~3\xc2\xcc~3\xc2\xcd~3\xc2\xca~3\xc2\xc4~3\xc2\xbf~3\xc2\xcc~3\xc2\xc8~3\xc2\xc8~3\xc2\xd3~3\xc2\xd5~3\xc2\xa2~3\xc2L~3\xc2\x1c~3\xc2\x11~3\xc2\x14~3\xc2\x0e~3\xc2\x01~3\xc2\xf2}3\xc2\xf8}3\xc2\x05~3\xc2\xe3}3\xc2\xb0}3\xc2\x9c}3\xc2\x9e}3\xc2\x90}3\xc2\xcc}3\xc2\x1b~3\xc2\x05~3\xc2\xfa}3\xc2\x06~3\xc2\xf7}3\xc2\xf6}3\xc2\x15~3\xc2\x1f~3\xc2\x1b~3\xc2#~3\xc23~3\xc2H~3\xc2o~3\xc2\x89~3\xc2\x89~3\xc2\x94~3\xc2\x97~3\xc2\x84~3\xc2m~3\xc2\x8d~3\xc2\xdf~3\xc2\x0e\x7f3\xc2\x10\
x7f3\xc27\x7f3\xc2]\x7f3\xc2i\x7f3\xc2e\x7f3\xc2[\x7f3\xc2k\x7f3\xc2x\x7f3\xc2\x89\x7f3\xc2\x9b\x7f3\xc2\xae\x7f3\xc2\xbd\x7f3\xc2\xb2\x7f3\xc2\xa4\x7f3\xc2\xba\x7f3\xc2\xce\x7f3\xc2\xd1\x7f3\xc2\xd0\x7f3\xc2\xc7\x7f3\xc2\xaa\x7f3\xc2m\x7f3\xc25\x7f3\xc2\x1e\x7f3\xc2\x1f\x7f3\xc2\x1b\x7f3\xc2\x1e\x7f3\xc2\r\x7f3\xc2\xed~3\xc2\xe3~3\xc2\xdd~3\xc2\xe6~3\xc2\x15\x7f3\xc2:\x7f3\xc29\x7f3\xc2B\x7f3\xc2N\x7f3\xc21\x7f3\xc2\x11\x7f3\xc2\x13\x7f3\xc2:\x7f3\xc2k\x7f3\xc2v\x7f3\xc2u\x7f3\xc2\x89\x7f3\xc2\x9f\x7f3\xc2\xa7\x7f3\xc2\xbe\x7f3\xc2\xd1\x7f3\xc2\xec\x7f3\xc2\n\x803\xc2\t\x803\xc2\x1f\x803\xc2Y\x803\xc2{\x803\xc2t\x803\xc2p\x803\xc2i\x803\xc2
After decoding, this should be floating-point numbers.
How can I decode it, and how can I tell what encoding the string uses? Preferably using Python.
I tried chardet, decode('utf8') and so on. Any help is appreciated.
After trying this:
c=a.decode('utf-16-be', errors='ignore').encode('ascii')
I got this:
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-199: ordinal not in range(128)
After trying this:
c=a.decode('utf-16-le').encode('ascii')
I got this:
File "/usr/lib/python2.7/encodings/utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x33 in position
470: truncated data
It looks like this data has been packed using Python's struct module.
I'm not sure what the first two characters in the string represent; they aren't floats but could be chars or a short int. The remainder of the string consists of floats.
Ignoring the first two characters for now, we get:
struct.unpack('!143f', s2[2:]) # s2 is the example string from the question
(9.037722037419371e-08, 9.038631532121144e-08, 9.036266845896535e-08, 9.071190731901879e-08, 9.06009489654025e-08, 9.058094008196349e-08, 9.058094008196349e-08, 9.058457806077058e-08, 9.063005279585923e-08, 9.067370854154433e-08, 9.071736428722943e-08, 9.071190731901879e-08, 9.064278572168405e-08, 9.063005279585923e-08, 9.062277683824504e-08, 9.059003502898122e-08, 9.058821603957767e-08, 9.057912109255994e-08, 9.057366412434931e-08, 9.054456029389257e-08, 9.051727545283939e-08, 9.05099994952252e-08, 9.04863526329791e-08, 9.046270577073301e-08, 9.04463348661011e-08, 9.04663437495401e-08, 9.050090454820747e-08, 9.054274130448903e-08, 9.050090454820747e-08, 9.052273242105002e-08, 9.062277683824504e-08, 9.061368189122732e-08, 9.067916550975497e-08, 9.07519250858968e-08, 9.073009721305425e-08, 9.071918327663298e-08, 9.06882604567727e-08, 9.066097561571951e-08, 9.071372630842234e-08, 9.071190731901879e-08, 9.069735540379043e-08, 9.071918327663298e-08, 9.071918327663298e-08, 9.071918327663298e-08, 9.075738205410744e-08, 9.076283902231808e-08, 9.072645923424716e-08, 9.074101114947553e-08, 9.070281237200106e-08, 9.069189843557979e-08, 9.069371742498333e-08, 9.066825157333369e-08, 9.062277683824504e-08, 9.060276795480604e-08, 9.063732875347341e-08, 9.06409667322805e-08, 9.068644146736915e-08, 9.074646811768616e-08, 9.072645923424716e-08, 9.073009721305425e-08, 9.074464912828262e-08, 9.06882604567727e-08, 9.068644146736915e-08, 9.074283013887907e-08, 9.075010609649325e-08, 9.070463136140461e-08, 9.069189843557979e-08, 9.075920104351098e-08, 9.079012386337126e-08, 9.078830487396772e-08, 9.034265957552634e-08, 9.036630643777244e-08, 9.038813431061499e-08, 9.04245140986859e-08, 9.051910154767029e-08, 9.056457628275894e-08, 9.056639527216248e-08, 9.061550798605822e-08, 9.071737139265679e-08, 9.07337422972887e-08, 9.077557905357025e-08, 9.042452120411326e-08, 9.040087434186717e-08, 9.037540849021752e-08, 9.03626755643927e-08, 9.039541737365653e-08, 9.046453897099127e-08, 
9.043907311934163e-08, 9.042634019351681e-08, 9.046453897099127e-08, 9.050455673786928e-08, 9.052456562130828e-08, 9.050637572727283e-08, 9.046090099218418e-08, 9.041542625709553e-08, 9.036449455379625e-08, 9.080286389462344e-08, 9.034448567035724e-08, 9.077376006416671e-08, 9.070827644563906e-08, 9.066825867876105e-08, 9.068826756220005e-08, 9.070100048802487e-08, 9.065916373174332e-08, 9.065552575293623e-08, 9.073919926549934e-08, 9.038268444783171e-08, 9.039359838425298e-08, 9.044453008755227e-08, 9.059004923983593e-08, 9.06518948795565e-08, 9.06682657841884e-08, 9.079195706362952e-08, 9.044089921417253e-08, 9.044089921417253e-08, 9.042089033073353e-08, 9.041361437311934e-08, 9.041725235192644e-08, 9.039360548968034e-08, 9.042816628834771e-08, 9.048091698105054e-08, 9.053146499127251e-08, -75.2074203491211, -71.7074203491211, -61.10371017456055, -55.85371017456055, -58.35371017456055, -87.2074203491211, -102.7074203491211, -103.7074203491211, -107.2074203491211, -118.2074203491211, -114.7074203491211, -111.2074203491211, -105.7074203491211, -74.7074203491211, -58.10371017456055, -55.35371017456055, -58.55054473876953, 1.054752845871448e+18, 1005890699264.0, 6.59220528669655e+16, -7.216911831845169e-31)
Treating the first two characters as chars:
struct.unpack('!2c', s2[:2])
('5', 'g')
As short ints:
struct.unpack('!h', s2[:2])
(13671,)
You can unpack the whole string at once by combining the formats:
>>> struct.unpack('!h143f', s2)
The format string consists of three parts:
! indicates that we are using network (big-endian) byte order.
h indicates the first 2 bytes are a short (the size of a short int is 2); if the first two bytes were chars (size 1) we would use 2c instead of h.
143f indicates that 143 floats follow (the size of a float is 4).
Added together, the sizes equal the length of the input string:
2 + (143 *4) == len(s2) == 574
True
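The size arithmetic can be checked with struct.calcsize, and the byte order can be sanity-checked by decoding a few bytes both ways. The sketch below (Python 3) uses the first four bytes of the string shown in the question. Which reading is correct depends on how the sender packed the data, which the question does not say, so treat the comparison as a diagnostic only.

```python
import struct

# With '!' there is no alignment padding, so the size is 2 + 143 * 4 bytes.
total_size = struct.calcsize('!h143f')

# First four bytes of the string shown in the question (\xa1 \x82 '3' \xc2).
first_four = b'\xa1\x82\x33\xc2'

# Decode the same bytes as a 32-bit float in both byte orders.
as_big_endian = struct.unpack('>f', first_four)[0]
as_little_endian = struct.unpack('<f', first_four)[0]

print(total_size)
print(as_big_endian, as_little_endian)
```

If one byte order gives values of a plausible magnitude and the other gives values extremely close to zero or astronomically large, the first is the more likely layout. The same check helps decide whether a leading two-byte field really exists.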
I'm trying to write a script that generates random unicode by creating random utf-8 encoded strings and then decoding those to unicode. It works fine for a single byte, but with two bytes it fails.
For instance, if I run the following in a python shell:
>>> a = str()
>>> a += chr(0xc0) + chr(0xaf)
>>> print a.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 0: invalid start byte
According to the utf-8 scheme https://en.wikipedia.org/wiki/UTF-8#Description the byte sequence 0xc0 0xaf should be valid as 0xc0 starts with 110 and 0xaf starts with 10.
Here's my python script:
import random

class GenFuzz(object):
    def unicode(self):
        '''returns a random (astral) utf encoded byte string'''
        num_bytes = random.randint(1,4)
        if num_bytes == 1:
            return self.gen_utf8(num_bytes, 0x00, 0x7F)
        elif num_bytes == 2:
            return self.gen_utf8(num_bytes, 0xC0, 0xDF)
        elif num_bytes == 3:
            return self.gen_utf8(num_bytes, 0xE0, 0xEF)
        elif num_bytes == 4:
            return self.gen_utf8(num_bytes, 0xF0, 0xF7)

    def gen_utf8(self, num_bytes, start_val, end_val):
        byte_str = list()
        byte_str.append(random.randrange(start_val, end_val)) # start byte
        for i in range(0,num_bytes-1):
            byte_str.append(random.randrange(0x80,0xBF)) # trailing bytes
        a = str()
        sum = int()
        for b in byte_str:
            a += chr(b)
        ret = a.decode('utf-8')
        return ret

if __name__ == "__main__":
    g = GenFuzz()
    print g.gen_utf8(2,0xC0,0xDF)
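This is, indeed, invalid UTF-8. In UTF-8, only code points in the range U+0080 to U+07FF, inclusive, can be encoded using two bytes. Read the Wikipedia article more closely, and you will see the same thing. A lead byte of 0xc0 could only introduce a code point below U+0080, which must be encoded as a single byte (C0 AF would be an overlong encoding of U+002F, "/"). As a result, the byte 0xc0 may never appear in UTF-8. The same is true of 0xc1.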
Some UTF-8 decoders have erroneously decoded sequences like C0 AF as valid UTF-8, which has led to security vulnerabilities in the past.
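One way to avoid generating invalid sequences is to pick a random code point and let the encoder build the bytes, rather than assembling lead and continuation bytes by hand. A minimal Python 3 sketch of that approach follows. The function names random_utf8_char and is_valid_utf8 are hypothetical, and this technique replaces the byte-level generator in the question rather than repairing it.

```python
import random


def random_utf8_char(rng=random):
    """Return the UTF-8 encoding of one randomly chosen Unicode code point."""
    while True:
        code_point = rng.randint(0, 0x10FFFF)
        # Surrogates (U+D800..U+DFFF) cannot be encoded in UTF-8, so skip them.
        if not 0xD800 <= code_point <= 0xDFFF:
            return chr(code_point).encode('utf-8')


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


sample = b''.join(random_utf8_char() for _ in range(20))
print(is_valid_utf8(sample))         # output of the encoder itself
print(is_valid_utf8(b'\xc0\xaf'))    # the overlong sequence from the question
```

Because the encoder produces the bytes, overlong forms and stray lead bytes such as 0xc0 and 0xc1 never occur. Note that many of the randomly chosen code points will be unassigned characters.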
One encoding that does accept 0xc0 is ISO-8859-1 (encoding="ISO-8859-1"),
from https://stackoverflow.com/a/27456542/4355695
This only works if the rest of the file doesn't have unicode chars, so it is not an exact answer to the question. It may still be helpful for folks like me whose files had no unicode chars anyway and who just wanted Python to load the file when both the utf-8 and ascii encodings were erroring out.
More on ISO-8859-1 : What is the difference between UTF-8 and ISO-8859-1?
I have the following function to parse a utf-8 string from a sequence of bytes
Note -- 'length_size' is the number of bytes it takes to represent the length of the utf-8 string
def parse_utf8(self, bytes, length_size):
    length = bytes2int(bytes[0:length_size])
    value = ''.join(['%c' % b for b in bytes[length_size:length_size+length]])
    return value

def bytes2int(raw_bytes, signed=False):
    """
    Convert a string of bytes to an integer (assumes little-endian byte order)
    """
    if len(raw_bytes) == 0:
        return None
    fmt = {1:'B', 2:'H', 4:'I', 8:'Q'}[len(raw_bytes)]
    if signed:
        fmt = fmt.lower()
    return struct.unpack('<'+fmt, raw_bytes)[0]
I'd like to write the function in reverse -- i.e. a function that will take a utf-8 encoded string and return its representation as a byte string.
So far, I have the following:
def create_utf8(self, utf8_string):
    return utf8_string.encode('utf-8')
I run into the following error when attempting to test it:
File "writer.py", line 229, in create_utf8
return utf8_string.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x98 in position 0: ordinal not in range(128)
If possible, I'd like to adopt a structure for the code similar to the parse_utf8 example. What am I doing wrong?
Thank you for your help!
UPDATE: test driver, now correct
def random_utf8_seq(self, length):
# from http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html
test_charset = u" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĂ㥹ĆćČčĎďĐđĘęĚěĹ弾ŁłŃńŇňŐőŒœŔŕŘřŚśŞşŠšŢţŤťŮůŰűŸŹźŻżŽžƒˆˇ˘˙˛˜˝–—‘’‚“”„†‡•…‰‹›€™"
utf8_seq = u""
for i in range(length):
utf8_seq += random.choice(test_charset)
return utf8_seq
I get the following error:
input_str = self.random_utf8_seq(200)
File "writer.py", line 226, in random_utf8_seq
print unicode(utf8_seq, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 0: invalid start byte
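If you want to convert a unicode string to a UTF-8 byte string, use its encode method, but first mark the source string in your example as unicode by prefixing the literal with u. In Python 2, calling encode on a plain byte str first decodes it implicitly as ASCII, which is what raises the UnicodeDecodeError in your traceback: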
# coding: utf-8
import random
def random_utf8_seq(length):
# from http://www.w3.org/2001/06/utf-8-test/postscript-utf-8.html
test_charset = u" !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĂ㥹ĆćČčĎďĐđĘęĚěĹ弾ŁłŃńŇňŐőŒœŔŕŘřŚśŞşŠšŢţŤťŮůŰűŸŹźŻżŽžƒˆˇ˘˙˛˜˝–—‘’‚“”„†‡•…‰‹›€™"
utf8_seq = u''
for i in range(length):
utf8_seq += random.choice(test_charset)
print utf8_seq.encode('utf-8')
return utf8_seq.encode('utf-8')
print( type(random_utf8_seq(200)) )
-- output --
õ3×sÔP{Ć.s(Ë°˙ě÷xÓ#bűV—û´ő¢uZÓČn˜0|_"Ðyø`êš·ÏÝhunÍÅ=ä?
óP{tlÇűpb¸7s´ňƒG—čøň\zčłŢXÂYqLĆúěă(ÿî ¥PyÐÔŇnל¦Ì˝+•ì›
ŻÛ°Ñ^ÝC÷ŢŐIñJĹţÒył"MťÆ‹ČČ4þ!»šåŮ#Öhň-
ÈLGĄ¢ß˛Đ¯.ªÆź˘Ř^ĽÛŹËaĂŕ¹#¢éüÜńlÊqš=VřU…‚–MŽÎÉèoÙŹŠ¨Ð
<type 'str'>
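To mirror the structure of parse_utf8, the writer can prepend a little-endian length field to the encoded text. The sketch below is written for Python 3, where str and bytes are distinct types. The length_size parameter of create_utf8 and the _LENGTH_FORMATS table are assumptions modelled on the question's bytes2int helper, not code from the original post.

```python
import struct

# Same size-to-format mapping as the question's bytes2int helper (unsigned only).
_LENGTH_FORMATS = {1: 'B', 2: 'H', 4: 'I', 8: 'Q'}


def create_utf8(text, length_size):
    """Encode text as UTF-8 and prefix it with its byte length (little-endian)."""
    encoded = text.encode('utf-8')
    fmt = '<' + _LENGTH_FORMATS[length_size]
    return struct.pack(fmt, len(encoded)) + encoded


def parse_utf8(raw, length_size):
    """Inverse of create_utf8: read the length prefix, then decode that many bytes."""
    fmt = '<' + _LENGTH_FORMATS[length_size]
    (length,) = struct.unpack(fmt, raw[:length_size])
    return raw[length_size:length_size + length].decode('utf-8')


record = create_utf8('na\u00efve \u2603', 2)
print(record)
print(parse_utf8(record, 2))
```

The length prefix counts encoded bytes, not characters, which matters as soon as the text contains non-ASCII characters.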
I have Base64 encoded data from an experiment. What I am trying to do, step by step, is:
Retrieve bytes from base64 encoding (Decode it)
Convert bytes to little-endian
Decompress bytes from (zlib)
Convert byte array to float array
Example:
Dn3LQ3np4kOyxQJE20kDRBRuFkScZB5ENxEzRFa+O0THMz9EOQRCRFC1QkRYeUNEwXJJROfbSUScvE5EVDtVRK5PV0TLUWNE481lRHX7ZkSBBWpE9FVyRIFdeESkoHhEnid8RI1nfUSy4YBE/C2CRGKQg0RcR4RE54uEROUAhUTBWodErKyMRNsVkkRvUpJEukWURO58lkSqRZ1E2VauRPBTwEQf9cVE9BnKRA==
What I have tried so far
import os
import base64
import struct
s = 'Dn3LQ3np4kOyxQJE20kDRBRuFkScZB5ENxEzRFa+O0THMz9EOQRCRFC1QkRYeUNEwXJJROfbSUScvE5EVDtVRK5PV0TLUWNE481lRHX7ZkSBBWpE9FVyRIFdeESkoHhEnid8RI1nfUSy4YBE/C2CRGKQg0RcR4RE54uEROUAhUTBWodErKyMRNsVkkRvUpJEukWURO58lkSqRZ1E2VauRPBTwEQf9cVE9BnKRA=='
decode=base64.decodestring(s)
tmp_size=len(decode)/4
Now I am trying to convert these bytes to little-endian.
I want to do the remaining steps in Python.
I am trying to figure it out myself, but it is taking too much time.
Thanks!
It appears your data isn't actually compressed. Read the data as floats either in a loop using struct.unpack_from() or as one big structure using struct.unpack().
import base64
import struct
encoded = 'Dn3LQ3np ... 9BnKRA=='
# decode the string
data = base64.standard_b64decode(encoded)
# ensure that there's enough data for 32-bit floats
assert len(data) % 4 == 0
# determine how many floats there are
count = len(data) // 4
# unpack the data as floats
result = struct.unpack('<{0}f'.format(count), # one big structure of `count` floats
data) # results returned as a tuple
If the data is compressed, decompress it.
import zlib
decompressed = zlib.decompress(data)
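Putting the steps together, here is a self-contained Python 3 sketch that runs on the example string from the question. The helper names maybe_decompress and decode_floats are hypothetical, and falling back to the raw bytes when zlib rejects the input is an assumption about how to handle data that may or may not be compressed.

```python
import base64
import struct
import zlib

# Example string from the question, split across lines for readability.
ENCODED = (
    'Dn3LQ3np4kOyxQJE20kDRBRuFkScZB5ENxEzRFa+O0THMz9EOQRCRFC1QkRYeUNE'
    'wXJJROfbSUScvE5EVDtVRK5PV0TLUWNE481lRHX7ZkSBBWpE9FVyRIFdeESkoHhE'
    'nid8RI1nfUSy4YBE/C2CRGKQg0RcR4RE54uEROUAhUTBWodErKyMRNsVkkRvUpJE'
    'ukWURO58lkSqRZ1E2VauRPBTwEQf9cVE9BnKRA=='
)


def maybe_decompress(data):
    """Return zlib-decompressed data, or the input unchanged if it is not zlib data."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return data


def decode_floats(encoded):
    """Base64-decode, optionally decompress, then unpack little-endian 32-bit floats."""
    raw = maybe_decompress(base64.b64decode(encoded))
    if len(raw) % 4 != 0:
        raise ValueError('payload length is not a multiple of 4 bytes')
    count = len(raw) // 4
    return struct.unpack('<{}f'.format(count), raw)


values = decode_floats(ENCODED)
print(len(values), values[:3])
```

The '<' in the format string is what handles the "little-endian" step: no separate byte-swapping pass is needed, because struct interprets each 4-byte group in the requested order.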
Convert bytes to little-endian
Byte ordering only applies to data types that are greater than 1 byte. So you can't just convert a list of bytes to little-endian. You need to understand what is in your list of bytes.
A 32-bit integer is 4 bytes, so if you have 16 bytes of data you could "unpack" them into four 32-bit integers.
If the data is just ASCII text, endianness doesn't matter; that is why you can read the exact same ASCII text file on both big-endian and little-endian machines.
Here is an example demonstrating struct.pack and struct.unpack:
#!/usr/bin/env python2.7
import struct
# 32-bit unsigned integer
# base 10 2,864,434,397
# base 16 0xAABBCCDD
u32 = 0xAABBCCDD
print 'u32 =', u32, '(0x%x)' % u32
# big endian 0xAA 0xBB 0xCC 0xDD
u32be = struct.pack('>I', u32)
bx = [byte for byte in struct.unpack('4B', u32be)]
print 'big endian packed', ['0x%02x' % x for x in bx]
assert bx == [0xaa, 0xbb, 0xcc, 0xdd]
# little endian 0xDD 0xCC 0xBB 0xAA
u32le = struct.pack('<I', u32)
lx = [byte for byte in struct.unpack('4B', u32le)]
print 'little endian packed', ['0x%02x' % x for x in lx]
assert lx == [0xdd, 0xcc, 0xbb, 0xaa]
# 64-bit unsigned integer
# base 10 12,302,652,060,662,169,617
# base 16 0xAABBCCDDEEFF0011
u64 = 0xAABBCCDDEEFF0011L
print 'u64 =', u64, '(0x%x)' % u64
# big endian 0xAA 0xBB 0xCC 0xDD 0xEE 0xFF 0x00 0x11
u64be = struct.pack('>Q', u64)
bx = [byte for byte in struct.unpack('8B', u64be)]
print 'big endian packed', ['0x%02x' % x for x in bx]
assert bx == [0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x00, 0x11]
# little endian 0x11 0x00 0xFF 0xEE 0xDD 0xCC 0xBB 0xAA
u64le = struct.pack('<Q', u64)
lx = [byte for byte in struct.unpack('8B', u64le)]
print 'little endian packed', ['0x%02x' % x for x in lx]
assert lx == [0x11, 0x00, 0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa]
Check out the documentation for more info: http://docs.python.org/library/struct.html#format-strings
Looks like your next step will be to use struct. Something like this:
struct.unpack("<f", decode[0:4])
This example will turn the first four bytes of decode into a float. Check out the struct documentation for more info on format strings, etc.
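To go from the first float to all of them, struct.iter_unpack (Python 3.4 and later) walks the buffer one record at a time. The sketch below uses a small stand-in buffer built with struct.pack rather than the question's data, so the variable decode here is only an illustration.

```python
import struct

# Stand-in for the decoded bytes: three little-endian 32-bit floats.
decode = struct.pack('<3f', 1.0, 2.5, -4.0)

# First float only, as in the one-liner above.
first = struct.unpack('<f', decode[0:4])[0]

# Every float in the buffer; iter_unpack yields one tuple per 4-byte record.
all_values = [value for (value,) in struct.iter_unpack('<f', decode)]

print(first, all_values)
```

iter_unpack requires the buffer length to be an exact multiple of the record size, so a truncated payload raises struct.error instead of silently dropping bytes.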