I'm using read_csv from pandas to read in quite a large file - some may be familiar with - its the urban dictionary words and definitions. I have no problem doing this in the REPL, but when coding it into a class and an actual file, iterating over the "list of dictionaries" and accessing the key by name ... it returns NaN of type float see debug info
I have to assume its something with putting it in a class? Maybe I should stick to functions? Anyone help a nub amateur out and clue me in as to what th is going on here? P.S. Rather than take a parameter and read that file in every time, I'm going to just return the dictionary, but this would remain an issue. Here's the code:
def slang(self, term):
"""Attempts to define any given slang words or terms"""
urban = {}
doc = self.folder + '\\urbandict.csv'
data = read_csv(doc, on_bad_lines='skip')
recs = data.to_dict('records')
for rec in recs:
urban[rec['word'].lower()] = rec['definition']
if urban.get(term.lower()) is None:
messagebox.showinfo(
title='No Results', message='Search could not find %s' % term)
else:
meaning = urban.get(term.lower())
messagebox.showinfo(title='Definition', message=meaning)
P.P.S. The folder 📂 it's mapped to is correct... the file contents reads in fine. And the .lower() function isn't the cause, but it is referenced in the traceback. Normally, it'd be a string and it'd work fine. Just weird how it works in REPL but not in the file.
Related
Currently, I am working on a Boot Sequence in Python for a larger project. For this specific part of the sequence, I need to access a .JSON file (specs.json), establish it as a dictionary in the main program. I then need to take a value from the .JSON file, and add 1 to it, using it's key to find the value. Once that's done, I need to push the changes to the .JSON file. Yet, every time I run the code below, I get the error:
bootNum = spcInfDat['boot_num']
KeyError: 'boot_num'`
Here's the code I currently have:
(Note: I'm using the Python json library, and have imported dumps, dump, and load.)
# Opening of the JSON files
spcInf = open('mki/data/json/specs.json',) # .JSON file that contains the current system's specifications. Not quite needed, but it may make a nice reference?
spcInfDat = load(spcInf)
This code is later followed by this, where I attempt to assign the value to a variable by using it's dictionary key (The for statement was a debug statement, so I could visibly see the Key):
for i in spcInfDat['spec']:
print(CBL + str(i) + CEN)
# Loacting and increasing the value of bootNum.
bootNum = spcInfDat['boot_num']
print(str(bootNum))
bootNum = bootNum + 1
(Another Note: CBL and CEN are just variables I use to colour text I send to the terminal.)
This is the interior of specs.json:
{
"spec": [
{
"os":"name",
"os_type":"getwindowsversion",
"lang":"en",
"cpu_amt":"cpu_count",
"storage_amt":"unk",
"boot_num":1
}
]
}
I'm relatively new with .JSON files, as well as using the Python json library; I only have experience with them through some GeeksforGeeks tutorials I found. There is a rather good chance that I just don't know how .JSON files work in conjunction with the library, but I figure that it would still be worth a shot to check here. The GeeksForGeeks tutorial had no documentation about this, as well as there being minimal I know about how this works, so I'm lost. I've tried searching here, and have found nothing.
Issue Number 2
Now, the prior part works. But, when I attempt to run the code on the following lines:
# Changing the values of specDict.
print(CBL + "Changing values of specDict... 50%" + CEN)
specDict ={
"os":name,
"os_type":ost,
"lang":"en",
"cpu_amt":cr,
"storage_amt":"unk",
"boot_num":bootNum
}
# Writing the product of makeSpec to `specs.json`.
print(CBL + "Writing makeSpec() result to `specs.json`... 75%" + CEN)
jsonobj = dumps(specDict, indent = 4)
with open('mki/data/json/specs.json', "w") as outfile:
dump(jsonobj, outfile)
I get the error:
TypeError: Object of type builtin_function_or_method is not JSON serializable.
Is there a chance that I set up my dictionary incorrectly, or am I using the dump function incorrectly?
You can show the data using:
print(spcInfData)
This shows it to be a dictionary, whose single entry 'spec' has an array, whose zero'th element is a sub-dictionary, whose 'boot_num' entry is an integer.
{'spec': [{'os': 'name', 'os_type': 'getwindowsversion', 'lang': 'en', 'cpu_amt': 'cpu_count', 'storage_amt': 'unk', 'boot_num': 1}]}
So what you are looking for is
boot_num = spcInfData['spec'][0]['boot_num']
and note that the value obtained this way is already an integer. str() is not necessary.
It's also good practice to guard against file format errors so the program handles them gracefully.
try:
boot_num = spcInfData['spec'][0]['boot_num']
except (KeyError, IndexError):
print('Database is corrupt')
Issue Number 2
"Not serializable" means there is something somewhere in your data structure that is not an accepted type and can't be converted to a JSON string.
json.dump() only processes certain types such as strings, dictionaries, and integers. That includes all of the objects that are nested within sub-dictionaries, sub-arrays, etc. See documentation for json.JSONEncoder for a complete list of allowable types.
I'm working on a GUI editor for a propriety config format. Basically the editor will parse the config file, display the object properties so that users can edit from GUI and then write the objects back to the file.
I've got the parse - edit - write part done, except for:
The parsed data structure only include object properties information, so comments and whitespaces are lost on write
If there is any syntax error, the rest of the file is skipped
How would you address these issues? What is the usual approach to this problem? I'm using Python and Parsec module https://pythonhosted.org/parsec/documentation.html, however and help and general direction is appreciated.
I've also tried Pylens (https://pythonhosted.org/pylens/), which is really close to what I need except it can not skip syntax errors.
You asked about typical approaches to this problem. Here are two projects which tackle similar challenges to the one you describe:
sketch-n-sketch: "Direct manipulation" interface for vector images, where you can either edit the image-describing source language, or edit the image it represents directly and see those changes reflected in the source code. Check out the video presentation, it's super cool.
Boomerang: Using lenses to "focus" on the abstract meaning of some concrete syntax, alter that abstract model, and then reflect those changes in the original source.
Both projects have yielded several papers describing the approaches their authors took. As far as I can tell, the Lens approach is popular, where parsing and printing become the get and put functions of a Lens which takes a some source code and focuses on the abstract concept which that code describes.
Eventually I ran out of research time and have to settle with a rather manual skipping. Basically each time the parser fail we try to advance the cursor one character and repeat. Any parts skipped by the process, regardless of whitespace/comment/syntax error is dump into a Text structure. The code is quite reusable, except for the part you have to incorporate it to all the places with repeated results and the original parser may fail.
Here's the code, in case it helps anyone. It is written for Parsy.
class Text(object):
'''Structure to contain all the parts that the parser does not understand.
A better name would be Whitespace
'''
def __init__(self, text=''):
self.text = text
def __repr__(self):
return "Text(text='{}')".format(self.text)
def __eq__(self, other):
return self.text.strip() == getattr(other, 'text', '').strip()
def many_skip_error(parser, skip=lambda t, i: i + 1, until=None):
'''Repeat the original `parser`, aggregate result into `values`
and error in `Text`.
'''
#Parser
def _parser(stream, index):
values, result = [], None
while index < len(stream):
result = parser(stream, index)
# Original parser success
if result.status:
values.append(result.value)
index = result.index
# Check for end condition, effectively `manyTill` in Parsec
elif until is not None and until(stream, index).status:
break
# Aggregate skipped text into last `Text` value, or create a new one
else:
if len(values) > 0 and isinstance(values[-1], Text):
values[-1].text += stream[index]
else:
values.append(Text(stream[index]))
index = skip(stream, index)
return Result.success(index, values).aggregate(result)
return _parser
# Example usage
skip_error_parser = many_skip_error(original_parser)
On other note, I guess the real issue here is I'm using a parser combinator library instead of a proper two stages parsing process. In traditional parsing, the tokenizer will handle retaining/skipping any whitespace/comment/syntax error, making them all effectively whitespace and are invisible to the parser.
My python application is a pre-processor; my outputs are the inputs of another application. The other application has various multi-line section headers that have subtle changes dependent on the version. My application fills the sections.
My current "printer" defines several multiline strings with format substitutions. The print statements will be replaced with write ones:
def generate():
header="""
Multi-line header
Symbols that form a logo
changes with version
"""
anotherSection="""
this section header
is shorter
(
{myFiller}
)
"""
print header
print anotherSection.format(myFiller = "{}{}{}".format(myFiller.content))
print anotherSection2.format(myFiller = [x.strip() for x in myFiller.content2])
print anotherSection3.format(myFiller = myFiller3.content)
"myFiller" is a call to a class with raw data that may need to be formatted based on the section.
Something doesn't feel right about having a bunch of multi-line strings clouding the actual "printing". The .format(.format) stuff may get a little difficult to read as well.
Any thoughts on how to structure this?
I've been toying with a dictionary of section strings but I didn't like the print statement interface (not that I'm sold on individual print statements:
format_requirements={'header': """.....""",'section':"""...."""}
print format_requirements['header'].format(....)
The version I"m leaning towards uses functions for each section string definition. This is nice for version changes but I'm not sure it's the way to go (I'm mainly concerned with convoluting my generate function with a bunch of other functions):
def header(version):
header=""
if version == v1:
header="""...."""
if version ==v2:
header="""..."""
return header
def section(type):
section="""...."""
return section
print header(v1).format(....)
print section
I've also played with classes, iterating over list, and a few other things. After reading the style guide (nudged me into the .format() direction) and various posts I've come here for direction on something more pythonic and legible to others.
I'm open to any guidance and ways to structure this.
Thanks!
I would go with a class that keeps the raw input in ._data attributes and has methods to output the correctly formatted strings for each "version"
class MyStringFormatter(object):
def __init__(header=None, content=None):
self._data = {'header': header, 'content': content}
def version_one(self):
# format strings here
return stringv1
My Caeser cipher works interactively in the shell with a string, but when I've tried to undertake separate programs to encrypt and decrypt I've run into problems, I don't know whether the input is not being split into a list or not, but the if statement in my encryption function is being bypassed and defaulting to the else statement that fills the list unencrypted. Any suggestions appreciated. I'm using FileUtilities.py from the Goldwasser book. That file is at http://prenhall.com/goldwasser/sourcecode.zip in chapter 11, but I don't think the problem is with that, but who knows. Advance thanks.
#CaeserCipher.py
class CaeserCipher:
def __init__ (self, unencrypted="", encrypted=""):
self._plain = unencrypted
self._cipher = encrypted
self._encoded = ""
def encrypt (self, plaintext):
self._plain = plaintext
plain_list = list(self._plain)
i = 0
final = []
while (i <= len(plain_list)-1):
if plain_list[i] in plainset:
final.append(plainset[plain_list[i]])
else:
final.append(plain_list[i])
i+=1
self._encoded = ''.join(final)
return self._encoded
def decrypt (self, ciphertext):
self._cipher = ciphertext
cipher_list = list(self._cipher)
i = 0
final = []
while (i <= len(cipher_list)-1):
if cipher_list[i] in cipherset:
final.append(cipherset[cipher_list[i]])
else:
final.append(cipher_list[i])
i+=1
self._encoded = ''.join(final)
return self._encoded
def writeEncrypted(self, outfile):
encoded_file = self._encoded
outfile.write('%s' %(encoded_file))
#encrypt.py
from FileUtilities import openFileReadRobust, openFileWriteRobust
from CaeserCipher import CaeserCipher
caeser = CaeserCipher()
source = openFileReadRobust()
destination = openFileWriteRobust('encrypted.txt')
caeser.encrypt(source)
caeser.writeEncrypted(destination)
source.close()
destination.close()
print 'Encryption completed.'
caeser.encrypt(source)
into
caeser.encrypt(source.read())
source is a file object - the fact that this code "works" (by not encrypting anything) is interesting - turns out that you call list() over the source before iterating and that turns it into a list of lines in the file. Instead of the usual result of list(string) which is a list of characters. So when it tries to encrypt each chracter, it finds a whole line that doesn't match any of the replacements you set.
Also like others pointed out, you forgot to include plainset in the code, but that doesn't really matter.
A few random notes about your code (probably nitpicking you didn't ask for, heh)
You typo'd "Caesar"
You're using idioms which are inefficient in python (what's usually called "not pythonic"), some of which might come from experience with other languages like C.
Those while loops could be for item in string: - strings already work as lists of bytes like what you tried to convert.
The line that writes to outfile could be just outfile.write(self._encoded)
Both functions are very similar, almost copy-pasted code. Try to write a third function that shares the functionality of both but has two "modes", encrypt and decrypt. You could just make it work over cipher_list or plain_list depending on the mode, for example
I know you're doing this for practice but the standard library includes these functions for this kind of replacements. Batteries included!
Edit: if anyone is wondering what those file functions do and why they work, they call raw_input() inside a while loop until there's a suitable file to return. openFileWriteRobust() has a parameter that is the default value in case the user doesn't input anything. The code is linked on the OP post.
Some points:
Using a context manager (with) makes sure that files are closed after being read or written.
Since the caesar cipher is a substitution cipher where the shift parameter is the key, there is no need for a separate encrypt and decrypt member function: they are the same but with the "key" negated.
The writeEncrypted method is but a wrapper for a file's write method. So the class has effectively only two methods, one of which is __init__.
That means you could easily replace it with a single function.
With that in mind your code can be replaced with this;
import string
def caesartable(txt, shift):
shift = int(shift)
if shift > 25 or shift < -25:
raise ValueError('illegal shift value')
az = string.ascii_lowercase
AZ = string.ascii_uppercase
eaz = az[-shift:]+az[:-shift]
eAZ = AZ[-shift:]+AZ[:-shift]
tt = string.maketrans(az + AZ, eaz + eAZ)
return tt
enc = caesartable(3) # for example. decrypt would be caesartable(-3)
with open('plain.txt') as inf:
txt = inf.read()
with open('encrypted.txt', 'w+') as outf:
outf.write(txt.translate(enc))
If you are using a shift of 13, you can use the built-in rot13 encoder instead.
It isn't obvious to me that there will be anything in source after the call to openFileReadRobust(). I don't know the specification for openFileReadRobust() but it seems like it won't know what file to open unless there is a filename given as a parameter, and there isn't one.
Thus, I suspect source is empty, thus plain is empty too.
I suggest printing out source, plaintext and plain to ensure that their values are what you expect them to be.
Parenthetically, the openFileReadRobust() function doesn't seem very helpful to me if it can return non-sensical values for non-sensical parameter values. I very much prefer my functions to throw an exception immediately in that sort of circumstance.
Considering this is only for my homework I don't expect much help but I just can't figure this out and honestly I can't get my head around what's going wrong. Usually I have an idea where the problem is but now I just don't get it.
Long story short: I'm trying to create a valid looking telephone number within a class and then loading it onto an array or list then later on save all of them as string into a folder. When I start the program again I want it to read the file and re-create my class and load it back into the list. (Basically a very simple repository).
Problem is even though I evaluate the stored phone number in the exact same way I validate it as input data ... I get an error which makes no sens.
Another small problem is the fact that when I re-use the data for some reason it creates white spaces in the file which in turn messes my program up badly.
Here I validate phone numbers:
def validateTel(call_ID):
if isinstance (call_ID, str) == True:
call_ID = call_ID.replace (" ", "")
if (len (call_ID) != 10):
print ("Telephone numbers are 10 digits long")
return False
for item in call_ID:
try:
int(item)
except:
print ("Telephone numbers should contain non-negative digits")
return False
else:
if (int(item) < 0):
print ("Digits are non-negative")
After this I use it and other non-relevant (to this discussion) data to create an object (class instance) and move them to a list.
Inside my class I have a load from string and a load to string. What they do is take everything from my class object so I can write it to a file using "+" as a separator so I can use string.split("+") and write it to a file. This works nicely, but when I read it ... well it's not working.
def load_data():
f = open ("data.txt", "r")
ch = f.read()
contact = agenda.contact () # class object
if ch in (""," ","None"," None"):
f.close()
return [] # if the file is empty or has None in some way I pass an empty stack
else:
stack = [] # the list where I load all my class objects
f.seek(0,0)
for line in f:
contact.loadFromString(line) # explained bellow
stack.append(deepcopy(contact))
f.close()
return stack
In loadFromString(line) all I do is validate the line (see if the data inside it at least looks OK).
Now here is the place where I validate the string I just read from the file:
def validateString (load_string):
string = string.split("+")
if len (string) != 4:
print ("System error in loading from file: Program skipping segment of corrupt data")
return False
if string[0] == "" or string[0] == " " or string[0] == None or string[0] == "None" or string[0] == " None":
print ("System error in loading from file: Name field cannot be empty")
try:
int(string[1])
except:
print("System error in loading from file: ID is not integer")
return False
if (validateTel(str(string[2])) == False):
print ("System error in loading from file: Call ID (telephone number)")
return False
return True
Small recap:
I try to load the data from file using loadFromString(). The only relevant thing that does is it tries to validate my data with validateString(string) in there the only thing that messes me up is the validateTel. But my input data gets validated in the same way my stored data does. They are perfectly identical but it gives a "System error" BUT to give such an error it should have also gave an error in the validate sub-program but it doesn't.
I hope this is enough info because my program is kinda big (for me any way) however the bug should be here somewhere.
I thank anyone brave enough to sift trough this mess.
EDIT:
The class is very simple, it looks like this:
class contact:
def __init__ (self, name = None, ID = None, tel = None, address = None):
self.__name = name
self.__id = ID
self.__tel = tel
self.__address = address
After this I have a series of setters and getters (to modify contacts and to return parts of the abstract data)
Here I also have my loadFromString and loadToString but those work just fine (except maybe they cause a small jump after each line (an empty line) which then kills my program, but that I can deal with)
My problem is somewhere in the validate or a way the repository interacts with it. The point is that even if it gives an error in the loading of the data, first the validate should print an error ... but it doesn't -_-
You said I just can't figure this out and honestly I can't get my head around what's going wrong. I think this is a great quote which sums up a large part of programming and software development in general -- dealing with crazy, weird problems and spending a lot of time trying to wrap your head around them.
Figuring out how to turn ridiculously complicated problems into small, manageable problems is the hardest part of programming, but also arguably the most important and valuable.
Here's some general advice which I think might help you:
use meaningful names for functions and variables (validateString doesn't tell me anything about what the function does; string tells me nothing about the meaning of its contents)
break down problems into small, well-defined pieces
specify your data -- what is a phone number? 10 positive digits, no spaces, no punctuation?
document/comment the input/output from functions if it's not obvious
Specific suggestions:
validateTel could probably be replaced with a simple regular expression match
try using json for serialization
if you're using json, then it's easy to use lists. I would strongly recommend this over using + as a separator -- that looks highly questionable to me
Example: using a regex
import re
def validateTel(call_ID):
phoneNumberRegex = re.compile("^\d{10}$") # match a string of 10 digits
return phoneNumberRegex.match(call_ID)
Example: using json
import json
phoneNumber1, phoneNumber2, phoneNumber3 = ... whatever ...
mylist = [phoneNumber1, phoneNumber2, phoneNumber3]
print json.dumps(mylist)
For starters, don't name your variables after reserved keywords. Instead of calling it string, call it telNumber or s or my_string.
def validateString (my_string):
working_string= my_string.split("+")
if len (working_string) != 4:
print ("System error in loading from file: Program skipping segment of corrupt data")
return False
The next line I don't really get - what is this If chain for? Accounting for bad data or something? Probably better to check for good data; bad data can come in infinite variety.
if working_string[0] == "" or working_string[0] == " " or working_string[0] == None or working_string[0] == "None" or string[0] == " None":
print ("System error in loading from file: Name field cannot be empty")
try:
int(string[1])
except:
print("System error in loading from file: ID is not integer")
return False
if (validateTel(str(working_string[2])) == False):
print ("System error in loading from file: Call ID (telephone number)")
return False
return True
Also, to give you a hint - you may want to look into regular expressions.
wow - many problems maybe connected to your problem, also as I commented - I suspect your problem is with turning the telnumber object to string.
f is is file object it won't be equal to anything. if you want to check if the file exists you should just do try /except around the file creation block. like:
try:
f = open ('data.txt','r') #also could call c=f.read() and check if c equals to none.. not really needed because you can cover an empty file in the next part iterating over f
except:
return
for line in f:
all sorts of stuff
return stack
don't use string reserved word and checking with negative numbers is very strange -is this part of the homeework? and why check by turning to int? this could also break your code - since the rest is a string.
all that said - I still suspect your main problem is with the way you turning the object into string data, It would never remain an instance of unless you used json/pickle/something else to strigfy. an object instance isn't just the class str.
and another thing - keep it simple, python is (also) about elegent and simple coding and you are trying to throw brute force with everything you know at a simple problem. focus, relax and rewrite the program.
I don't perceive all the logic.
For the moment , I can say you that you should correct the code of load_data() as follows:
def load_data():
f = open ("data.txt", "r")
ch = f.read()
contact = agenda.contact () # class object
if ch in (""," ","None"," None"):
f.close()
return [] # if the file is empty or has None in some way I pass an empty stack
else:
stack = [] # the list where I load all my class objects
f.seek(0,0)
for line in f:
contact.loadFromString(line) # explained bellow
stack.append(deepcopy(contact))
f.close()
return stack
I don't see how the file-like handler f could ever have a value None or string, so I think you want to test the content of the file -> f.read()
But then, the file's pointer is at its end and must be moved back to the start -> seek(0,0)
I will progressively add complementing considerations when I will understand more the problem.
edit
To test if a file is empty or not
import os.path
if os.path.isfile(filepath) and os.path.getsize(filepath):
.........
If the file with path filepath doesn't exist, getsize() raises an error. So the preliminary test if os.path.isfile() is necessary to avoid the second condition test to be evaluated.
But if your file can contain the strings "None" or " None" (really ? !), the getsize() will return 4 or 5 in this case.
You should avoid to manage with files containing these kinds of useless data.
edit
In validateTel(), after the instruction if isinstance (call_ID, str) == True: you are sure that call_ID is a string. Then the iteration for item in call_ID: will produce only item being ONE character long, hence it's useless to test if (int(item) < 0): , it will never happen; it could be possible that there is a sign - in the string call_ID but you won't detect it with this last condition.
In fact, as you test each character of callID, it is enough to test if it is one of the digits 0,1,2,3,4,5,6,7,8,9. If the sign - is in calml_ID, it will be detected as not being a digit.
To test if all the character in call_ID, there's an easy way provided by Python: all()
def validateTel(call_ID):
if isinstance(call_ID, str):
call_ID = call_ID.replace (" ", "")
if len(call_ID) != 10:
print ("A telephone number must be 10 digits long")
return False
else:
print ("Every character in a telephone number must be a digit")
return all(c in '0123456789' for c in call_ID)
else:
print ("call_ID must be a string")
return False
If one of the character c in call_ID isn't a digit, c in '0123456789' is False and the function all() stop the exam of the following characters and returns False; otherwise, it returns True