I need to be able to send encrypted data between a Ruby client and a Python server (and vice versa) and have been having trouble with the ruby-aes gem/library. The library is very easy to use, but we've been having trouble passing data between it and the pyCrypto AES library for Python. Each library seems to be fine on its own, but they don't seem to play well across language boundaries. Any ideas?
Edit: We're doing the communication over SOAP and have also tried converting the binary data to base64, to no avail. Also, it's more that the encryption/decryption is almost but not exactly the same between the two (e.g., the lengths differ by one or there are extra garbage characters on the end of the decrypted string).
(e.g., the lengths differ by one or there are extra garbage characters on the end of the decrypted string)
I missed that bit. There's nothing wrong with your encryption/decryption. It sounds like a padding problem. AES always encrypts data in blocks of 128 bits. If the length of your data isn't a multiple of 128 bits, the data should be padded before encryption, and the padding needs to be removed/ignored after decryption.
Turns out what happened was that ruby-aes automatically pads data out to a multiple of 16 bytes and sticks a null character on the end of the final string as a delimiter. PyCrypto requires you to supply data in multiples of 16 bytes yourself, which is how we figured out what ruby-aes was doing.
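For reference, a minimal sketch of reconciling the two schemes on the Python side, assuming the delimiter-then-zero-pad layout described above (key, IV, and helper names are illustrative placeholders, and the IV must match the Ruby side):

from Crypto.Cipher import AES

BLOCK = 16

def pad_like_ruby_aes(data):
    # Append a NUL delimiter, then zero-fill to a 16-byte boundary,
    # mirroring what ruby-aes was observed to do.
    data += b"\x00"
    return data + b"\x00" * (-len(data) % BLOCK)

def unpad_like_ruby_aes(data):
    # Everything from the first NUL onward is padding.
    return data.split(b"\x00", 1)[0]

key = b"0123456789abcdef"    # placeholder 16-byte key
iv = b"\x00" * BLOCK         # placeholder IV

ciphertext = AES.new(key, AES.MODE_CBC, iv).encrypt(pad_like_ruby_aes(b"hello"))
plaintext = unpad_like_ruby_aes(AES.new(key, AES.MODE_CBC, iv).decrypt(ciphertext))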
It's hard to even guess at what's happening without more information ...
If I were you, I'd check that in your Python and Ruby programs:
The keys are the same (obviously). Dump them as hex and compare each byte.
The initialization vectors are the same. This is the parameter IV in AES.new() in pyCrypto. Dump them as hex too.
The modes are the same. The parameter mode in AES.new() in pyCrypto.
There are defaults for IV and mode in pyCrypto, but don't trust that they are the same as in the Ruby implementation. Use one of the simpler modes, like CBC. I've found that different libraries have different interpretations of how the more complex modes, such as CTR, work.
Wikipedia has a great article about block cipher modes of operation.
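To make that checklist concrete, here is a minimal sketch on the pyCrypto side (the key and IV values are placeholders, not a definitive implementation) that pins down all three parameters explicitly and dumps key and IV as hex for comparison with the Ruby side:

import binascii
from Crypto.Cipher import AES

key = b"0123456789abcdef"   # 16 bytes -> AES-128; must match the Ruby side byte-for-byte
iv = b"fedcba9876543210"    # 16-byte IV; must also match byte-for-byte

print(binascii.hexlify(key))    # compare against the Ruby key dump
print(binascii.hexlify(iv))     # compare against the Ruby IV dump

cipher = AES.new(key, AES.MODE_CBC, iv)   # spell out the mode; never rely on defaults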
Kind of depends on how you are transferring the encrypted data. It is possible that you are writing a file in one language and then trying to read it in from the other. Python (especially on Windows) requires that you specify binary mode for binary files. So in Python, assuming you want to decrypt there, you should open the file like this:
f = open('/path/to/file', 'rb')
The "b" indicates binary. And if you are writing the encrypted data to file from Python:
f = open('/path/to/file', 'wb')
f.write(encrypted_data)
Basically what Hugh said above: check the IVs, key sizes, and chaining modes to make sure everything is identical.
Test both sides independently: encode some information and check that Ruby and Python encoded it identically. You're assuming that the problem has to do with encryption, but it may be something as simple as sending the encrypted data with puts, which throws newlines into the data. Once you're sure they encrypt the data correctly, check that you receive exactly what you think you sent. Keep going step by step until you find the stage that corrupts the data.
Also, I'd suggest using the openssl library that's included in Ruby's standard library instead of an external gem.
Once a binary has been read into memory, is there an efficient way to tell whether it is a FlatBuffer? Preferably a Python solution.
FlatBuffers provides for an optional 4-byte "file identifier" near the beginning of the buffer (search for file_identifier in the FlatBuffers schema language). If your schema declares one, you can just validate those 4 bytes.
Or, if you have the generated interfaces available, you can run the complete FlatBuffers verifier on the binary to see whether it is a valid FlatBuffer.
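As a rough sketch, assuming your schema declares a (hypothetical) file identifier "MONS", the check can be done by hand; the identifier sits in bytes 4-7, immediately after the 4-byte root offset:

EXPECTED_IDENT = b"MONS"   # hypothetical value from your schema's file_identifier

def looks_like_flatbuffer(buf):
    # Bytes 0-3 hold the offset to the root table; bytes 4-7 hold the
    # file identifier when the schema declares one.
    return len(buf) >= 8 and buf[4:8] == EXPECTED_IDENT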
I want to apply an AES 128b encryption (probably CBC + Padding) on a data stream.
In case it matters, I'm sending chunks of around 1500 bits each.
I work in Python, and I did a small test with M2Crypto with AES encryption on one side and decryption on the other. It works perfectly, but probably doesn't really secure anything, since I use the same key, the same IVs, and all that.
So, the question is: what is the best approach for AES encryption of large data streams?
I thought about loading a new 'keys' file from time to time. The application would then use this file to expand and extract AES keys or something like that, but it still sounds awful to build a new AES object for each chunk, so there must be a better way.
I believe I can also use the IVs here, but I'm not quite sure where and how.
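For what it's worth, a minimal sketch of one common approach with M2Crypto (assumed here only because the question uses it): generate a fresh random IV per stream, send it in the clear ahead of the ciphertext, and let a single cipher object carry the CBC chain across chunks rather than rebuilding state per chunk:

import os
from M2Crypto import EVP

key = os.urandom(16)    # 128-bit key, shared out of band
iv = os.urandom(16)     # fresh per stream; prepend it to the stream in the clear

enc = EVP.Cipher(alg="aes_128_cbc", key=key, iv=iv, op=1)   # op=1 means encrypt

def encrypt_chunk(chunk, last=False):
    # The cipher object is stateful, so the CBC chain continues
    # across calls; only the final call flushes the padding.
    out = enc.update(chunk)
    if last:
        out += enc.final()
    return out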
I have to migrate data to OpenERP through XMLRPC by using TerminatOOOR.
I send a name with value "Rotule right Aurélia".
In Python the name will be encoded as the value 'Rotule right Aur\xc3\xa9lia '.
But in TerminatOOOR (the XML-RPC client) the data arrives encoded as the value 'Rotule middle Aur\357\277\275lia'.
So on the server side the data value is not decoded correctly and I get bad data.
TerminatOOOR is a Ruby plugin for Kettle (a Java product), and I guess it should encode data as UTF-8.
I just don't know why it happens like this.
Any help?
This issue comes from Kettle.
My program uses Kettle to get an Excel file, get the active sheet, and transfer the data in that sheet to TerminatOOOR for further handling.
At the phase of reading data from the Excel file, Kettle cannot recognize the encoding, so it passes bad data to TerminatOOOR.
My workaround is to manually export the Excel file to CSV before giving the data to TerminatOOOR. By doing this, I lose the feature that maps Excel column names to variable names (used by Kettle).
First off, whenever you deal with text (and all text is bound to contain some non-US-ASCII character sooner or later), you'll be much happier doing that in Python 3.x instead of the 2.x series. If Py3 is not an option, try to always use from __future__ import unicode_literals (available in Python 2.6 and 2.7).
Basically, when you send text or any other data over the wire, it can only travel in the form of bytes (octets), so it has to be encoded at some point. Try to find out exactly where that encoding takes place in your tool chain; if necessary, use a debugging tool (or deploy print(repr(x)) statements) to look into the relevant variables. The receiving side here is a Ruby plugin driven from Kettle, a Java product, and encoding mismatches can creep in at either layer. You say it 'should encode the data as utf-8', but when the receiving end sees the data of an incoming RPC request, that data should already be in UTF-8; it would have to be decoded to obtain unicode again.
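A minimal sketch of that kind of repr()-based tracing, using the value from the question:

name = u"Rotule right Aur\u00e9lia"

wire = name.encode("utf-8")
print(repr(wire))                 # 'Rotule right Aur\xc3\xa9lia' -- correct UTF-8

# If some hop shows \357\277\275 instead, that hop decoded the bytes with
# the wrong charset and substituted U+FFFD (the Unicode replacement
# character), destroying the accent before it ever reached the server.
print(repr(wire.decode("utf-8")))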
I need to transfer an array of varying length in which each element is a tuple of two integers. As an example:
path = [(1,1),(1,2)]
path = [(1,1),(1,2),(2,2)]
I am trying to use pack and unpack; however, since the array is of varying length, I don't know how to create a format such that both sides know it. I was trying to turn it into a single delimited string, such as:
msg = "1&1~1&2~"
sendMsg = pack("s",msg) or sendMsg = pack("s",str(msg))
on the receiving side:
path = unpack("s",msg)
but that just prints 1 in this case. I was also trying to send 4 integers, which send and receive fine, so long as I don't include the extra string representing the path.
sendMsg = pack("hhhh",p.direction[0],p.direction[1],p.id,p.health)
on the receive side:
x,y,id,health = unpack("hhhh",msg)
The first was for illustration as I was trying to send the format "hhhhs", but either way the path doesn't come through properly.
Thank you for your help. I will also be looking at sending a 2D array of ints, but I can't seem to figure out how to send these more 'complex' structures across the network.
While you can use pack and unpack, I'd recommend using something like YAML or JSON to transfer your data.
Pack and unpack can lead to difficult-to-debug errors and to incompatibilities if you change your interface and have different versions trying to communicate with each other.
Pickle can give security problems and the pickle format might change between Python versions.
JSON has been included in the Python standard library since 2.6. For YAML there is PyYAML.
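A minimal sketch of the JSON route, with a length header so the receiver knows how much to read (socket plumbing omitted):

import json
import struct

path = [(1, 1), (1, 2), (2, 2)]

payload = json.dumps(path).encode("utf-8")
msg = struct.pack("!I", len(payload)) + payload   # 4-byte big-endian length header

# Receiving side: read the header, then exactly that many bytes.
(length,) = struct.unpack("!I", msg[:4])
decoded = [tuple(p) for p in json.loads(msg[4:4 + length].decode("utf-8"))]
assert decoded == path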
You want some sort of serialization protocol. twisted.spread provides one such (see the Banana specification or Perspective Broker documentation). JSON or protocol buffers would be more verbose examples.
See also Comparison of data serialization formats.
If you include the message length as part of the message, then you will know how much data to read, so the entire string can be read across the network.
In any case, perhaps it would help if you posted some of the code you are using to send data across the network, or at least provided more of a description.
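In the meantime, a minimal sketch of how pack/unpack can handle a varying-length path by sending a count first (format strings built at runtime; one way of doing it, not the only one):

import struct
from itertools import chain

path = [(1, 1), (1, 2), (2, 2)]

flat = list(chain.from_iterable(path))                  # [1, 1, 1, 2, 2, 2]
msg = struct.pack("!h%dh" % len(flat), len(flat), *flat)

# Receiving side: read the count, then unpack that many shorts.
(count,) = struct.unpack("!h", msg[:2])
values = struct.unpack("!%dh" % count, msg[2:2 + 2 * count])
pairs = list(zip(values[::2], values[1::2]))            # [(1, 1), (1, 2), (2, 2)]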
Take a look at xdrlib, it might help. It's part of the standard library, and:
The xdrlib module supports the External Data Representation Standard as described in RFC 1014, written by Sun Microsystems, Inc. June 1987. It supports most of the data types described in the RFC.
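A minimal sketch of the same round trip with xdrlib (note that xdrlib was later deprecated and removed from the standard library in Python 3.13):

import xdrlib

path = [(1, 1), (1, 2), (2, 2)]

p = xdrlib.Packer()
p.pack_array([x for pair in path for x in pair], p.pack_int)   # length-prefixed ints
msg = p.get_buffer()

u = xdrlib.Unpacker(msg)
flat = u.unpack_array(u.unpack_int)
pairs = list(zip(flat[::2], flat[1::2]))                       # back to tuples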
Are pack and unpack mandatory?
If not, you could use JSON or YAML.
Don't use pickle, because it is not secure.
This is mainly just a "check my understanding" type of question. Here's my understanding of CLOBs and BLOBs as they work in Oracle:
CLOBs are for text like XML, JSON, etc. You should not assume what encoding the database will store it as (at least in an application), since it will be converted to whatever encoding the database was configured to use.
BLOBs are for binary data. You can be reasonably assured that they will be stored exactly as you send them and that you will get back exactly the same data.
So in other words, say I have some binary data (in this case a pickled python object). I need to be assured that when I send it, it will be stored exactly how I sent it and that when I get it back it will be exactly the same. A BLOB is what I want, correct?
Is it really feasible to use a CLOB for this? Or will character encoding cause enough problems that it's not worth it?
CLOB is encoding and collation sensitive, BLOB is not.
When you write into a CLOB using, say, CL8WIN1251, you write a 0xC0 (which is Cyrillic letter А).
When you read the data back using AL16UTF16, you get back 0x0410, which is the UTF-16 representation of this letter.
If you were reading from a BLOB, you would get the same 0xC0 back.
Your understanding is correct. Since you mention Python, think of the Python 3 distinction between strings and bytes: CLOBs and BLOBs are quite analogous, with the extra issue that the encoding of CLOBs is not under your app's control.
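A small sketch of that distinction in Python 3 terms, mirroring the Cyrillic example above:

raw = b"\xc0"                    # Cyrillic letter A in windows-1251 (CL8WIN1251)

text = raw.decode("cp1251")      # CLOB-style: the database converts between charsets
print(hex(ord(text)))            # 0x410 -- the UTF-16 code point of the same letter

assert raw == b"\xc0"            # BLOB-style: the bytes come back untouched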