I have written an AES cipher in Python and it works well with simple text files.
When viewing a ~200k .txt file before and after encryption/decryption in a hex editor, the bytes are identical. However, there are issues when I try to encrypt/decrypt any other file type (a PNG of similar size, for example). The beginning of the decrypted file is the same as the original, but there are differences: a single byte that was present in the original will be missing from the decrypted file, while the rest is correct.
What is likely to be the cause? If the algorithm were incorrect, would it not affect text files as well?
I worked out what was wrong. I was stupidly removing padding with bytearray.remove() rather than .pop().
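To illustrate the difference, here is a minimal sketch. The sample data and the padding scheme (three trailing bytes of value 0x03) are made up for the example and are not taken from the original code. bytearray.remove(x) deletes the first byte equal to x, wherever it occurs, while pop() always deletes the last byte. Binary files are far more likely than plain text to contain low-valued bytes, which would explain why only they were corrupted.

```python
# Hypothetical data: a binary payload that happens to contain the byte 0x03,
# followed by three padding bytes of value 0x03.
padded = bytearray(b"\x89PNG\x03data\x03\x03\x03")
pad_len = padded[-1]  # the last byte tells us how many padding bytes there are

# Buggy: remove(x) deletes the FIRST occurrence of x, which may be real data.
buggy = bytearray(padded)
for _ in range(pad_len):
    buggy.remove(pad_len)

# Correct: pop() deletes the LAST byte on every call.
fixed = bytearray(padded)
for _ in range(pad_len):
    fixed.pop()

print(bytes(buggy))  # the 0x03 inside the payload is gone, one padding byte remains
print(bytes(fixed))  # the payload is intact
```

Slicing (padded[:-pad_len]) would do the same job as the pop() loop in a single step.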
Related
I've recently tried to create my own Vigenère encryption script for files. It works without issues for both encryption and decryption. But it is REALLY slow when it comes to "big files". For example, a 772 KB file took 13.71 s, and the time doesn't look linear in the file size as I would expect it to be. I suppose it has to do with the fact that I read the entire document into a buffer.
The critical part is a single line: for each byte in the buffer (f.read()) I add the corresponding key value. I think this is not optimal, but I can't think of another method at the moment.
for i, v in enumerate(buf):
    output_buffer += bytes([(v+m*key[i%keylength])%256])
"m" is just a parameter that is set to 1 for encryption and -1 for decryption.
The file buffer "buf" and key are bytes, not strings.
If you have any suggestions on how to work with the buffer (without reading the entire document into RAM) or on the code for the actual encryption, I would be glad to hear them.
Thanks in advance.
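One possible approach, sketched under a few assumptions: the non-linear slowdown most likely comes from `output_buffer += bytes(...)`, since bytes objects are immutable and each `+=` may copy everything accumulated so far. Building the result in one pass and processing the file in chunks avoids both problems. The function names, the `offset` parameter and the 64 KiB chunk size below are illustrative choices, not part of the original script.

```python
def vigenere(buf: bytes, key: bytes, m: int = 1, offset: int = 0) -> bytes:
    """Shift each byte of buf by m * (key byte), modulo 256.

    offset is the position of buf[0] within the whole stream, so the key
    stays aligned when the data is processed chunk by chunk.
    """
    klen = len(key)
    # bytes() consumes the generator in a single pass: no repeated copying.
    return bytes((v + m * key[(offset + i) % klen]) % 256
                 for i, v in enumerate(buf))


def vigenere_file(src_path, dst_path, key, m=1, chunk_size=64 * 1024):
    """Encrypt (m=1) or decrypt (m=-1) a file without loading it whole."""
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(vigenere(chunk, key, m, offset))
            offset += len(chunk)
```

With this layout, memory use is bounded by the chunk size and the running time should grow roughly linearly with the file size.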
Can TensorFlow read a file containing normal images, for example in JPG format, or can it only read a .bin file containing images?
What is the difference between a .mat file and a .bin file?
Also, when I rename a .bin file to .mat, does the data in the file change?
Sorry if my language is not clear; I cannot speak English very well.
A file-name suffix is just a suffix (which sometimes helps to get info about the file; e.g. Windows uses it to decide which tool is launched on a double-click). A suffix does not need to be correct. And of course, changing the suffix will not change the content.
Every format needs its own decoder: JPG, PNG, MAT and co.
To some extent, these are selected automatically by reading out metadata (under some assumptions!). Many image tools have an imread function which works for jpg and png even if there is no suffix (because it checks for common, supported image formats).
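As a rough illustration of how such content-based detection can work, here is a small sketch that looks at the first bytes of the data instead of the file name. The function name and the signature table are made up for this example, and real tools check many more formats. The two signatures themselves are the standard leading bytes of PNG and JPEG files.

```python
# Leading signatures ("magic bytes") of two common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_format(data: bytes):
    """Return a format name based on the first bytes, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


# The result depends only on the content, never on the file suffix.
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + bytes(8)))
print(sniff_image_format(b"just some text"))
```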
I'm not sure what tensorflow does automatically, but:
jpg, png, bmp should be no problem
worst-case: use scipy to read and convert
mat is usually a matrix (with many possible encodings) and often matlab-based
scipy can read many matlab-based formats
bin can be anything (usually stands for binary; no clear mapping like the above)
Don't get me wrong, but I expect someone trying to use tensorflow (not a small, not a simple tool) to know that changing a suffix will never magically transform the content into the new format (especially in the lossless/lossy case like png, jpg). I hope you evaluated this decision and are not running blindly into using a popular tool.
A '.mat' file contains Matlab-formatted data (not Matlab code, as you would expect from a '.m' file). I'm not sure if you're even using Matlab since you didn't include the tag in your question. '.mat' files are associated with the Matlab workspace; if you wanted to save your current workspace in Matlab, you would save it as a '.mat' file.
A '.bin' file is a binary file read by the computer. In general, executable (ready-to-run) programs are often identified as binary files. I think this is what you would want to use. I am unsure what you really want though because the wording of the question is difficult to understand and it seems like you have two questions here.
Changing the suffix of a file just changes what will run the file. For example, if I were to change test.txt to test.py, the data inside the text file remains the same, but the way the file is opened has changed. In this case, the file was a text file usually opened using Notepad (or some variation) then it was opened by python once changed. If you were to change a .jpg file to a txt file, you wouldn't be able to view it as a picture again, but instead, you would open a text file with a bunch of seemingly random characters which describe the picture. The picture data never changed, but the way you see it and are able to use it does.
Take a look at this website which describes the .bin extension pretty well. Also, a quick Google search goes a long way especially with questions like this.
I want to apply AES-128 encryption (probably CBC + padding) to a data stream.
In case it matters, I'm sending chunks of around 1500 bits each.
I work in Python, and I did a small test with M2Crypto, with AES encryption on one side and decryption on the other. It works perfectly, but it probably doesn't really secure anything, since I use the same key, the same IVs and all that.
So, the question is: what is the best approach for AES encryption of large data streams?
I thought about loading a new 'keys' file from time to time. The application would then use this file to expand and extract AES keys or something like that, but it still sounds awful to build a new AES object for each chunk, so there must be a better way.
I believe I can also use the IVs here, but I'm not quite sure where and how.
Is there a way to encrypt files (.zip, .doc, .exe, ... any type of file) with Python?
I've looked at a bunch of crypto libraries for Python including pycrypto and ezpycrypto but as far as I see they only offer string encryption.
In Python versions prior to 3.0, the read method of a file object returns a string. Pass this string to the encryption library of your choice; the resulting string can then be written to a file.
Keep in mind that on Windows-based operating systems, the default mode used when reading files may not accurately provide the contents of the file. I suggest that you be familiar with the nuances of file modes and how they behave on Windows-based OSes.
You can read the complete file into a string, encrypt it, write the encrypted string in a new file. If the file is too large, you can read in chunks.
Every time you .read from a file, you get a string (in Python < 3.0).
I need to be able to send encrypted data between a Ruby client and a Python server (and vice versa) and have been having trouble with the ruby-aes gem/library. The library is very easy to use but we've been having trouble passing data between it and the pyCrypto AES library for Python. These libraries seem to be fine when they're the only one being used, but they don't seem to play well across language boundaries. Any ideas?
Edit: We're doing the communication over SOAP and have also tried converting the binary data to base64, to no avail. Also, it's more that the encryption/decryption is almost but not exactly the same between the two (e.g., the lengths differ by one or there are extra garbage characters on the end of the decrypted string)
(e.g., the lengths differ by one or there are extra garbage characters on the end of the decrypted string)
I missed that bit. There's nothing wrong with your encryption/decryption. It sounds like a padding problem. AES always encrypts data in blocks of 128 bits. If the length of your data isn't a multiple of 128 bits, the data has to be padded before encryption, and the padding needs to be removed/ignored after decryption.
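For illustration, here is a minimal sketch of one common padding scheme, PKCS#7, in plain Python. This is an assumption made for the example: the thread does not say which scheme either library uses, and both sides must agree on the same one. Each padding byte stores the padding length, so the receiver knows exactly how many bytes to strip.

```python
BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def pad(data: bytes) -> bytes:
    """Append n bytes of value n so the length becomes a multiple of 16."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE  # always between 1 and 16
    return data + bytes([n]) * n


def unpad(padded: bytes) -> bytes:
    """Validate and strip the padding added by pad()."""
    if not padded or len(padded) % BLOCK_SIZE:
        raise ValueError("input is not a whole number of blocks")
    n = padded[-1]
    if not 1 <= n <= BLOCK_SIZE or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


message = b"hello"
assert unpad(pad(message)) == message
```

Note that data whose length is already a multiple of 16 still gets a full extra block of padding, which is what makes the removal unambiguous.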
Turns out what happened was that ruby-aes automatically pads data to fill up 16 chars and sticks a null character on the end of the final string as a delimiter. PyCrypto requires you to do multiples of 16 chars so that was how we figured out what ruby-aes was doing.
It's hard to even guess at what's happening without more information ...
If I were you, I'd check that in your Python and Ruby programs:
The keys are the same (obviously). Dump them as hex and compare each byte.
The initialization vectors are the same. This is the parameter IV in AES.new() in pyCrypto. Dump them as hex too.
The modes are the same. The parameter mode in AES.new() in pyCrypto.
There are defaults for IV and mode in pyCrypto, but don't trust that they are the same as in the Ruby implementation. Use one of the simpler modes, like CBC. I've found that different libraries have different interpretations of how the more complex modes, such as CTR, work.
Wikipedia has a great article about how block cipher modes work.
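The hex comparison suggested above could look like this on the Python side. This is only a sketch: it assumes Python 3 (for bytes.hex()), the helper name is made up, and the key and IV are placeholder values rather than anything from the question.

```python
def hex_dump(label: str, value: bytes) -> str:
    """Format a byte string as its length plus lowercase hex, for comparison."""
    return f"{label} ({len(value)} bytes): {value.hex()}"


# Placeholder values for illustration only; use the real key and IV here.
key = bytes(range(16))
iv = bytes(16)

print(hex_dump("key", key))
print(hex_dump("iv", iv))
```

Printing the same two lines from the Ruby program and comparing them character by character quickly shows a wrong length or a differing byte.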
Kind of depends on how you are transferring the encrypted data. It is possible that you are writing a file in one language and then trying to read it in from the other. Python (especially on Windows) requires that you specify binary mode for binary files. So in Python, assuming you want to decrypt there, you should open the file like this:
f = open('/path/to/file', 'rb')
The "b" indicates binary. And if you are writing the encrypted data to file from Python:
f = open('/path/to/file', 'wb')
f.write(encrypted_data)
Basically what Hugh said above: check the IVs, key sizes and the chaining modes to make sure everything is identical.
Test both sides independently: encode some information and check that Ruby and Python encoded it identically. You're assuming that the problem has to do with encryption, but it may just be something as simple as sending the encrypted data with puts, which throws extra newlines into the data. Once you're sure they encrypt the data correctly, check that you receive exactly what you think you sent. Keep going step by step until you find the stage that corrupts the data.
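One simple way to run such a check, sketched here with made-up sample data: log the length and a SHA-256 digest of the ciphertext right before sending and right after receiving. The helper name is hypothetical. Any stage that adds or drops even a single byte (a trailing newline, for instance) shows up as a different fingerprint.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the length and SHA-256 digest of data as a short log line."""
    return f"{len(data)} bytes, sha256={hashlib.sha256(data).hexdigest()}"


# Made-up example: the transport appended a newline to the ciphertext.
sent = bytes([0x10, 0x9F, 0x00, 0x42])
received = sent + b"\n"

print("sent:    ", fingerprint(sent))
print("received:", fingerprint(received))
print("match:   ", fingerprint(sent) == fingerprint(received))
```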
Also, I'd suggest using the openssl library that's included in ruby's standard library instead of using an external gem.