Caeser cipher encryption from file in Python

Caeser cipher encryption from file in Python - python

My Caeser cipher works interactively in the shell with a string, but when I've tried to undertake separate programs to encrypt and decrypt I've run into problems, I don't know whether the input is not being split into a list or not, but the if statement in my encryption function is being bypassed and defaulting to the else statement that fills the list unencrypted. Any suggestions appreciated. I'm using FileUtilities.py from the Goldwasser book. That file is at http://prenhall.com/goldwasser/sourcecode.zip in chapter 11, but I don't think the problem is with that, but who knows. Advance thanks.
#CaeserCipher.py
class CaeserCipher:
def __init__ (self, unencrypted="", encrypted=""):
self._plain = unencrypted
self._cipher = encrypted
self._encoded = ""
def encrypt (self, plaintext):
self._plain = plaintext
plain_list = list(self._plain)
i = 0
final = []
while (i <= len(plain_list)-1):
if plain_list[i] in plainset:
final.append(plainset[plain_list[i]])
else:
final.append(plain_list[i])
i+=1
self._encoded = ''.join(final)
return self._encoded
def decrypt (self, ciphertext):
self._cipher = ciphertext
cipher_list = list(self._cipher)
i = 0
final = []
while (i <= len(cipher_list)-1):
if cipher_list[i] in cipherset:
final.append(cipherset[cipher_list[i]])
else:
final.append(cipher_list[i])
i+=1
self._encoded = ''.join(final)
return self._encoded
def writeEncrypted(self, outfile):
encoded_file = self._encoded
outfile.write('%s' %(encoded_file))
#encrypt.py
from FileUtilities import openFileReadRobust, openFileWriteRobust
from CaeserCipher import CaeserCipher
caeser = CaeserCipher()
source = openFileReadRobust()
destination = openFileWriteRobust('encrypted.txt')
caeser.encrypt(source)
caeser.writeEncrypted(destination)
source.close()
destination.close()
print 'Encryption completed.'

caeser.encrypt(source)
into
caeser.encrypt(source.read())
source is a file object - the fact that this code "works" (by not encrypting anything) is interesting - turns out that you call list() over the source before iterating and that turns it into a list of lines in the file. Instead of the usual result of list(string) which is a list of characters. So when it tries to encrypt each chracter, it finds a whole line that doesn't match any of the replacements you set.
Also like others pointed out, you forgot to include plainset in the code, but that doesn't really matter.
A few random notes about your code (probably nitpicking you didn't ask for, heh)
You typo'd "Caesar"
You're using idioms which are inefficient in python (what's usually called "not pythonic"), some of which might come from experience with other languages like C.
Those while loops could be for item in string: - strings already work as lists of bytes like what you tried to convert.
The line that writes to outfile could be just outfile.write(self._encoded)
Both functions are very similar, almost copy-pasted code. Try to write a third function that shares the functionality of both but has two "modes", encrypt and decrypt. You could just make it work over cipher_list or plain_list depending on the mode, for example
I know you're doing this for practice but the standard library includes these functions for this kind of replacements. Batteries included!
Edit: if anyone is wondering what those file functions do and why they work, they call raw_input() inside a while loop until there's a suitable file to return. openFileWriteRobust() has a parameter that is the default value in case the user doesn't input anything. The code is linked on the OP post.

Some points:
Using a context manager (with) makes sure that files are closed after being read or written.
Since the caesar cipher is a substitution cipher where the shift parameter is the key, there is no need for a separate encrypt and decrypt member function: they are the same but with the "key" negated.
The writeEncrypted method is but a wrapper for a file's write method. So the class has effectively only two methods, one of which is __init__.
That means you could easily replace it with a single function.
With that in mind your code can be replaced with this;
import string
def caesartable(txt, shift):
shift = int(shift)
if shift > 25 or shift < -25:
raise ValueError('illegal shift value')
az = string.ascii_lowercase
AZ = string.ascii_uppercase
eaz = az[-shift:]+az[:-shift]
eAZ = AZ[-shift:]+AZ[:-shift]
tt = string.maketrans(az + AZ, eaz + eAZ)
return tt
enc = caesartable(3) # for example. decrypt would be caesartable(-3)
with open('plain.txt') as inf:
txt = inf.read()
with open('encrypted.txt', 'w+') as outf:
outf.write(txt.translate(enc))
If you are using a shift of 13, you can use the built-in rot13 encoder instead.

It isn't obvious to me that there will be anything in source after the call to openFileReadRobust(). I don't know the specification for openFileReadRobust() but it seems like it won't know what file to open unless there is a filename given as a parameter, and there isn't one.
Thus, I suspect source is empty, thus plain is empty too.
I suggest printing out source, plaintext and plain to ensure that their values are what you expect them to be.
Parenthetically, the openFileReadRobust() function doesn't seem very helpful to me if it can return non-sensical values for non-sensical parameter values. I very much prefer my functions to throw an exception immediately in that sort of circumstance.

Related

Reading CSV into Dict - Works in REPL, Not in Class

I'm using read_csv from pandas to read in quite a large file - some may be familiar with - its the urban dictionary words and definitions. I have no problem doing this in the REPL, but when coding it into a class and an actual file, iterating over the "list of dictionaries" and accessing the key by name ... it returns NaN of type float see debug info
I have to assume its something with putting it in a class? Maybe I should stick to functions? Anyone help a nub amateur out and clue me in as to what th is going on here? P.S. Rather than take a parameter and read that file in every time, I'm going to just return the dictionary, but this would remain an issue. Here's the code:
def slang(self, term):
"""Attempts to define any given slang words or terms"""
urban = {}
doc = self.folder + '\\urbandict.csv'
data = read_csv(doc, on_bad_lines='skip')
recs = data.to_dict('records')
for rec in recs:
urban[rec['word'].lower()] = rec['definition']
if urban.get(term.lower()) is None:
messagebox.showinfo(
title='No Results', message='Search could not find %s' % term)
else:
meaning = urban.get(term.lower())
messagebox.showinfo(title='Definition', message=meaning)
P.P.S. The folder 📂 it's mapped to is correct... the file contents reads in fine. And the .lower() function isn't the cause, but it is referenced in the traceback. Normally, it'd be a string and it'd work fine. Just weird how it works in REPL but not in the file.

I'm doing a hash verification project, and after I store the hash, and want to verify it, it returns false after I check it against the new hash

I create a hash, and store it in a .txt file (will encrypt it later), but when I create a new hash and check against the stored one, It returns false even though the hashes are the exact same. Help?
Relevant code: `
def testHash(expectedHash, testHash):
try:
if expectedHash == testHash:
return True
else:
return False
except Exception as err:
print("Error, unkonwn if valid hash.")
print(f"Error: {err}")
return False
`
Here's the code to store the hash:
def storeHash(hash):
stored_hashes = open("storedHashes.txt", "a")
stored_hashes.write(hash+"\n")
stored_hashes.close()
And here's the code to get the stored hash:
def getHash(index):
stored_hashes = open("storedHashes.txt", "r+")
read_hash = stored_hashes.readlines(index)
clean_read_hash = ' '.join(read_hash) #required to remove the [] and \n
return clean_read_hash
Note: it returns no errors, but does not return True either.
Complete code can be found here: https://pastebin.com/2XpbmXaP
password: hashVerify
I know the file exertion on line 18 is .hs, that was a side track that led noewere, it should be .txt
Fun little fact, Windows and other OS's allow any file extension, and just use the things after the . if they don't know what it is as the type.
Note: the content of the testFile is "test for hashes"

It's hard to say for certain since you only define the functions and don't show the logic of calling them. But it looks like you just need to fix your call to stored_hashes.readlines. Change stored_hashes.readlines(index) to stored_hashes.readlines()[index]; as mentioned in a comment, the parameter accepted by stored_hashes.readlines (which returns a list, by the way) is not the index of the list but rather the total number of characters to allow to be read. That means, with your current code, you are always comparing the "test hash" to a list of every hash in your text file.

Understanding a custom encryption method in python

As part of an assignment I've been given some code written in python that was used to encrypt a message, and I have to try and understand the code and decrypt the ciphertext. I've never used python before and am somewhat out of my depth.
I understand most of it and the overall gist of what the code is trying to accomplish, however there are a few lines near the end tripping me up. Here's the entire thing (the &&& denotes sections of code which are supposed to be "damaged", while testing the code I've set secret to "test" and count to 3):
import string
import random
from base64 import b64encode, b64decode
secret = '&&&&&&&&&&&&&&' # We don't know the original message or length
secret_encoding = ['step1', 'step2', 'step3']
def step1(s):
_step1 = string.maketrans("zyxwvutsrqponZYXWVUTSRQPONmlkjihgfedcbaMLKJIHGFEDCBA","mlkjihgfedcbaMLKJIHGFEDCBAzyxwvutsrqponZYXWVUTSRQPON")
return string.translate(s, _step1)
def step2(s): return b64encode(s)
def step3(plaintext, shift=4):
loweralpha = string.ascii_lowercase
shifted_string = loweralpha[shift:] + loweralpha[:shift]
converted = string.maketrans(loweralpha, shifted_string)
return plaintext.translate(converted)
def make_secret(plain, count):
a = '2{}'.format(b64encode(plain))
for count in xrange(count):
r = random.choice(secret_encoding)
si = secret_encoding.index(r) + 1
_a = globals()[r](a)
a = '{}{}'.format(si, _a)
return a
if __name__ == '__main__':
print make_secret(secret, count=&&&)
Essentially, I assume the code is meant to choose randomly from the three encryption methods step1, step2 and step3, then apply them to the cleartext a number or times as governed by whatever the value of "count" is.
The "make_secret" method is the part that's bothering me, as I'm having difficulty working out how it ties everything together and what the overall purpose of it is. I'll go through it line by line and give my reasons on each part, so someone can correct me if I'm mistaken.
a = '2{}'.format(b64encode(plain))
This takes the base64 encoding of whatever the "plain" variable corresponds to and appends a 2 to the start of it, resulting in something like "2VGhpcyBpcyBhIHNlY3JldA==" using "this is a secret" for plain as a test. I'm not sure what the 2 is for.
r = random.choice(secret_encoding)
si = secret_encoding.index(r) + 1
r is a random selection from the secret_encoding array, while si corresponds to the next array element after r.
_a = globals()[r](a)
This is one of the parts that has me stumped. From researching global() it seems that the intention here is to turn "r" into a global dictionary consisting of the characters found in "a", ie somewhere later in the code a's characters will be used as a limited character set to choose from. Is this correct or am I way off base?
I've tried printing _a, which gives me what appears to be the letters and numbers found in the final output of the code.
a = '{}{}'.format(si, _a)
It seems as if this is creating a string which is a concatenation of the si and _a variables, however I'll admit I don't understand the purpose of doing this.
I realize this is a long question, but I thought it would be best to put the parts that are bothering me into context.

I will refrain from commenting on the readability of the code. I daresay
it was all intentional, anyway, for purposes of obfuscation. Your
professor is an evil bastard and I want to take his or her course :)
r = random.choice(secret_encoding)
...
_a = globals()[r](a)
You're way off base. This is essentially an ugly and hard-to-read way to
randomly choose one of the three functions and run it on a. The
function globals() returns a dict that maps names to identifiers; it
includes the three functions and other things. globals()[r] looks up
one of the three functions based on the name r. Putting (a) after
that runs the function with a as the argument.
a = '{}{}'.format(si, _a)
The idea here is to prepend each interim result with the number of the
function that encrypted it, so you know which function you need to
reverse to decrypt that step. They all accumulate at the beginning, and
get encrypted and re-encrypted with each step, except for the last one.
a = '2{}'.format(b64encode(plain))
Essentially, this is applying step2 first. Each encryption with
step2 prepends a 2.
So, the program applies count encryptions to the plaintext, with each
step using a randomly-chosen transformation, and the choice appears in
plaintext before the ciphertext. Your task is to read each prepended
number and apply the inverse transformation to the rest of the message.
You stop when the first character is not in "123".
One problem I see is that if the plaintext begins with a digit in
"123", it will look like we should perform another decryption step. In
practice, however, I feel sure that the professor's choice of plaintext
does not begin with such a digit (unless they're really evil).

AES Encryption pycrypto and salting

I don't have a problem with my code but I don't understand the different arguments you can use with pycrypto and AES encryption. so where I define my encryptor below, what is mode, and IV? the tutorial I found this on didn't really help me understand it. I have it working properly but I want to understand that the arguments are.
so Question #1: What are the arguments associated with defining a encryptor with pycrpto?
Question #2 is this an appropriate salting method for the encryption. I'm using a very long randomized ascii string, then converting it to a 256bit sha then using that to do AES encryption on the information, then I base64 encode and insert into the database.
def pad(string):
return string + ((16-len(string) % 16) * '{' )
password = hashlib.sha256("").digest()
IV = 16 * '\x00'
mode = AES.MODE_CBC
encryptor = AES.new(password, mode, IV=IV)
encrypted_customer_name = encryptor.encrypt(pad(customer_name))
encoded_ecryption_name = base64.b64encode(encrypted_customer_name)
customer_name = base64.b64decode(customer_name)
decryptor = AES.new(password, mode, IV=IV)
customer_name = decryptor.decrypt(customer_name)
lenofdec = customer_name.count('{')
customer_name = customer_name[:len(customer_name)-lenofdec]
My code isn't in that order but I didn't include all of the code just the relevant parts.

Ok I'm going to do my best here to answer these questions!
Q1:
Ok it looks like the signature is
new(key, *args, **kwargs)
The first argument key is pretty self explanatory, but after that you notice that it can take a number of keyword arguments.
It seems that it can take:
mode: The cypher mode (these are as follows. Look on wikipedia for definitions)
MODE_ECB = 1 Electronic Code Book (ECB). See blockalgo.MODE_ECB.
MODE_CBC = 2 Cipher-Block Chaining (CBC). See blockalgo.MODE_CBC.
MODE_CFB = 3 Cipher FeedBack (CFB). See blockalgo.MODE_CFB.
MODE_PGP = 4 This mode should not be used.
MODE_OFB = 5 Output FeedBack (OFB). See blockalgo.MODE_OFB.
MODE_CTR = 6 CounTer Mode (CTR). See blockalgo.MODE_CTR.
MODE_OPENPGP = 7 OpenPGP Mode. See blockalgo.MODE_OPENPGP.
IV: the salt (you seem to already understand this)
From here on out the options seem to be based on the specific mode you are using
counter: A function that returns the next block of data (not normally used). From the docs:
(Only MODE_CTR). A stateful function that returns the next counter block, which is a byte string of block_size bytes
segment_size is the size of the segment in CFB mode
The Pycrypto docs
Q2: Is this a good method for salting your encryption?
First what is the salt for? I find that this is a very common question that people ask, I mean we already have a password, why else would we need a key?
The answer makes a lot of sense when you talk about passwords. Lets say my password is banana, when we write this to a password file we would send it through a hash algorithm and get 5a814... (sha256).
Now next time someone tries to use the password banana they get the same hash. Any one with permissions to the file can then look and see that the passwords are the same. This is where the salt comes in. If I append a random salt before running through the hash algorithm then the hash will come out different every time, even if the passwords are the same. This makes your system WAY more secure.
Alright now for your code:
First, congrats you are calling the function correctly! But... Your code sets IV = 16 * '\x00' this is not a very good salt at all. I would recommend using os.urandom(16) to generate high quality entropy (uses system entropy) and place the output in your code. It is common practice to write the salt into the beginning of the the encrypted content.
This is tricky to say without knowing what you are attempting to do with code, but let me explain with an example:
# Get User password
MODE = AES.MODE_CBC
def encrypt(msg, password):
salt = os.urandom(16)
password = sha256(password)
crypter = AES.new(password, mode=MODE, IV=salt)
return "{}:{}".format(salt, crypter.encrypt(msg))
def decrypt(enc, password):
salt, content = enc.split(':')
password = sha256(password)
crypter = AES.new(password, mode=MODE, IV=salt)
return crypter.decrypt(content)
I hope this was helpful! Happy Coding!

Python trying to write and read class from file but something went horribly wrong

Considering this is only for my homework I don't expect much help but I just can't figure this out and honestly I can't get my head around what's going wrong. Usually I have an idea where the problem is but now I just don't get it.
Long story short: I'm trying to create a valid looking telephone number within a class and then loading it onto an array or list then later on save all of them as string into a folder. When I start the program again I want it to read the file and re-create my class and load it back into the list. (Basically a very simple repository).
Problem is even though I evaluate the stored phone number in the exact same way I validate it as input data ... I get an error which makes no sens.
Another small problem is the fact that when I re-use the data for some reason it creates white spaces in the file which in turn messes my program up badly.
Here I validate phone numbers:
def validateTel(call_ID):
if isinstance (call_ID, str) == True:
call_ID = call_ID.replace (" ", "")
if (len (call_ID) != 10):
print ("Telephone numbers are 10 digits long")
return False
for item in call_ID:
try:
int(item)
except:
print ("Telephone numbers should contain non-negative digits")
return False
else:
if (int(item) < 0):
print ("Digits are non-negative")
After this I use it and other non-relevant (to this discussion) data to create an object (class instance) and move them to a list.
Inside my class I have a load from string and a load to string. What they do is take everything from my class object so I can write it to a file using "+" as a separator so I can use string.split("+") and write it to a file. This works nicely, but when I read it ... well it's not working.
def load_data():
f = open ("data.txt", "r")
ch = f.read()
contact = agenda.contact () # class object
if ch in (""," ","None"," None"):
f.close()
return [] # if the file is empty or has None in some way I pass an empty stack
else:
stack = [] # the list where I load all my class objects
f.seek(0,0)
for line in f:
contact.loadFromString(line) # explained bellow
stack.append(deepcopy(contact))
f.close()
return stack
In loadFromString(line) all I do is validate the line (see if the data inside it at least looks OK).
Now here is the place where I validate the string I just read from the file:
def validateString (load_string):
string = string.split("+")
if len (string) != 4:
print ("System error in loading from file: Program skipping segment of corrupt data")
return False
if string[0] == "" or string[0] == " " or string[0] == None or string[0] == "None" or string[0] == " None":
print ("System error in loading from file: Name field cannot be empty")
try:
int(string[1])
except:
print("System error in loading from file: ID is not integer")
return False
if (validateTel(str(string[2])) == False):
print ("System error in loading from file: Call ID (telephone number)")
return False
return True
Small recap:
I try to load the data from file using loadFromString(). The only relevant thing that does is it tries to validate my data with validateString(string) in there the only thing that messes me up is the validateTel. But my input data gets validated in the same way my stored data does. They are perfectly identical but it gives a "System error" BUT to give such an error it should have also gave an error in the validate sub-program but it doesn't.
I hope this is enough info because my program is kinda big (for me any way) however the bug should be here somewhere.
I thank anyone brave enough to sift trough this mess.
EDIT:
The class is very simple, it looks like this:
class contact:
def __init__ (self, name = None, ID = None, tel = None, address = None):
self.__name = name
self.__id = ID
self.__tel = tel
self.__address = address
After this I have a series of setters and getters (to modify contacts and to return parts of the abstract data)
Here I also have my loadFromString and loadToString but those work just fine (except maybe they cause a small jump after each line (an empty line) which then kills my program, but that I can deal with)
My problem is somewhere in the validate or a way the repository interacts with it. The point is that even if it gives an error in the loading of the data, first the validate should print an error ... but it doesn't -_-

You said I just can't figure this out and honestly I can't get my head around what's going wrong. I think this is a great quote which sums up a large part of programming and software development in general -- dealing with crazy, weird problems and spending a lot of time trying to wrap your head around them.
Figuring out how to turn ridiculously complicated problems into small, manageable problems is the hardest part of programming, but also arguably the most important and valuable.
Here's some general advice which I think might help you:
use meaningful names for functions and variables (validateString doesn't tell me anything about what the function does; string tells me nothing about the meaning of its contents)
break down problems into small, well-defined pieces
specify your data -- what is a phone number? 10 positive digits, no spaces, no punctuation?
document/comment the input/output from functions if it's not obvious
Specific suggestions:
validateTel could probably be replaced with a simple regular expression match
try using json for serialization
if you're using json, then it's easy to use lists. I would strongly recommend this over using + as a separator -- that looks highly questionable to me
Example: using a regex
import re
def validateTel(call_ID):
phoneNumberRegex = re.compile("^\d{10}$") # match a string of 10 digits
return phoneNumberRegex.match(call_ID)
Example: using json
import json
phoneNumber1, phoneNumber2, phoneNumber3 = ... whatever ...
mylist = [phoneNumber1, phoneNumber2, phoneNumber3]
print json.dumps(mylist)

For starters, don't name your variables after reserved keywords. Instead of calling it string, call it telNumber or s or my_string.
def validateString (my_string):
working_string= my_string.split("+")
if len (working_string) != 4:
print ("System error in loading from file: Program skipping segment of corrupt data")
return False
The next line I don't really get - what is this If chain for? Accounting for bad data or something? Probably better to check for good data; bad data can come in infinite variety.
if working_string[0] == "" or working_string[0] == " " or working_string[0] == None or working_string[0] == "None" or string[0] == " None":
print ("System error in loading from file: Name field cannot be empty")
try:
int(string[1])
except:
print("System error in loading from file: ID is not integer")
return False
if (validateTel(str(working_string[2])) == False):
print ("System error in loading from file: Call ID (telephone number)")
return False
return True
Also, to give you a hint - you may want to look into regular expressions.

wow - many problems maybe connected to your problem, also as I commented - I suspect your problem is with turning the telnumber object to string.
f is is file object it won't be equal to anything. if you want to check if the file exists you should just do try /except around the file creation block. like:
try:
f = open ('data.txt','r') #also could call c=f.read() and check if c equals to none.. not really needed because you can cover an empty file in the next part iterating over f
except:
return
for line in f:
all sorts of stuff
return stack
don't use string reserved word and checking with negative numbers is very strange -is this part of the homeework? and why check by turning to int? this could also break your code - since the rest is a string.
all that said - I still suspect your main problem is with the way you turning the object into string data, It would never remain an instance of unless you used json/pickle/something else to strigfy. an object instance isn't just the class str.
and another thing - keep it simple, python is (also) about elegent and simple coding and you are trying to throw brute force with everything you know at a simple problem. focus, relax and rewrite the program.

I don't perceive all the logic.
For the moment , I can say you that you should correct the code of load_data() as follows:
def load_data():
f = open ("data.txt", "r")
ch = f.read()
contact = agenda.contact () # class object
if ch in (""," ","None"," None"):
f.close()
return [] # if the file is empty or has None in some way I pass an empty stack
else:
stack = [] # the list where I load all my class objects
f.seek(0,0)
for line in f:
contact.loadFromString(line) # explained bellow
stack.append(deepcopy(contact))
f.close()
return stack
I don't see how the file-like handler f could ever have a value None or string, so I think you want to test the content of the file -> f.read()
But then, the file's pointer is at its end and must be moved back to the start -> seek(0,0)
I will progressively add complementing considerations when I will understand more the problem.
edit
To test if a file is empty or not
import os.path
if os.path.isfile(filepath) and os.path.getsize(filepath):
.........
If the file with path filepath doesn't exist, getsize() raises an error. So the preliminary test if os.path.isfile() is necessary to avoid the second condition test to be evaluated.
But if your file can contain the strings "None" or " None" (really ? !), the getsize() will return 4 or 5 in this case.
You should avoid to manage with files containing these kinds of useless data.
edit
In validateTel(), after the instruction if isinstance (call_ID, str) == True: you are sure that call_ID is a string. Then the iteration for item in call_ID: will produce only item being ONE character long, hence it's useless to test if (int(item) < 0): , it will never happen; it could be possible that there is a sign - in the string call_ID but you won't detect it with this last condition.
In fact, as you test each character of callID, it is enough to test if it is one of the digits 0,1,2,3,4,5,6,7,8,9. If the sign - is in calml_ID, it will be detected as not being a digit.
To test if all the character in call_ID, there's an easy way provided by Python: all()
def validateTel(call_ID):
if isinstance(call_ID, str):
call_ID = call_ID.replace (" ", "")
if len(call_ID) != 10:
print ("A telephone number must be 10 digits long")
return False
else:
print ("Every character in a telephone number must be a digit")
return all(c in '0123456789' for c in call_ID)
else:
print ("call_ID must be a string")
return False
If one of the character c in call_ID isn't a digit, c in '0123456789' is False and the function all() stop the exam of the following characters and returns False; otherwise, it returns True

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.