I'm trying to parse user input that specifies a value to look up in a dictionary.
For example, this user input may be the string fieldA[5]. I need to then look up the key fieldA in my dictionary (which would be a list) and then grab the element at position 5. This should be generalizable to any depth of lists, e.g. fieldA[5][2][1][8], and properly fail for invalid inputs fieldA[5[ or fieldA[5][7.
I have investigated doing:
import re
key = "fieldA[5]"
re.split(r'\[|\]', key)
which results in:
['fieldA', '5', '']
This works as an output (I can lookup in the dictionary with it), but does not actually enforce or check pairing of brackets, and is unintuitive in how it reads and how this output would be used to look up in the dictionary.
Is there a parser library, or alternative approach available?
If you don't expect it to get much more complicated than what you described, you might be able to get away with some regex. I usually wouldn't recommend it for this purpose, but I want you to be able to start somewhere...
import re
key = input('Enter field and indices (e.g. "fieldA[5][3]"): ')
field = re.match(r'[^[]*', key).group(0)
indices = [int(num) for num in re.findall(r'\[(\d+)]', key)]
This will simply not recognize a missing field or incorrect bracketing and return an empty string or list respectively.
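If you do need malformed input such as fieldA[5[ or fieldA[5][7 to be rejected, one option is to validate the whole string with a full-match pattern before extracting anything. The following is only a minimal sketch, not a full parser: the names parse_key and lookup and the sample data are made up for illustration, and it assumes field names consist of word characters and indices are non-negative integers.

```python
import re

# Whole-string pattern: a field name followed by zero or more [digits] groups.
KEY_PATTERN = re.compile(r'(\w+)((?:\[\d+\])*)')


def parse_key(key):
    """Return (field, indices) or raise ValueError for malformed input."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError('malformed key: {!r}'.format(key))
    field, brackets = match.groups()
    indices = [int(num) for num in re.findall(r'\[(\d+)\]', brackets)]
    return field, indices


def lookup(data, key):
    """Follow the parsed indices into the nested lists stored under the field."""
    field, indices = parse_key(key)
    value = data[field]
    for index in indices:
        value = value[index]
    return value


data = {'fieldA': [[10, 20], [30, 40]]}  # hypothetical sample data
print(lookup(data, 'fieldA[1][0]'))  # 30
```

Because the pattern has to match the entire string, an unbalanced bracket anywhere makes parse_key raise instead of silently returning a partial result.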
Here is a conceptual problem that I have been having regarding the cleaning of data and how to interact with lists and tuples that I'm not sure completely how to explain, but if I can get a fix for it, I can conceptually be better at using python.
Here it is: (using python 3 and sqlite3)
I have an SQLite Database with a date column which has text in it in the format of MM-DD-YY 24:00. when viewed in DB Browser the text looks fine. However, when using a fetchall() in Python the code prints the dates out in the format of 'MM-DD-YY\xa0'. I want to clean out the \xa0 from the code and I tried some code that is a combination of what I think I should do plus another post I read on here. This is the code:
print(dates)
Output (I just typed this in here to show you the shape of it):
[('MM-DD-YY\xa0',), ('MM-DD-YY\xa0',), ...]
dates_clean = []
for i in dates:
    clean = str(i).replace(u'\xa0', u' ')
    dates_clean.append(clean)
Now when I print dates_clean I get:
["('MM-DD-YY\xa0',)", "('MM-DD-YY\xa0',)"etc]
So now, as you can see, when I tried to clean it, it did what I wanted it to do, but the tuple that originally contained the value has become part of the text itself and is contained inside another tuple. Therefore, when I write this list back into SQLite using an UPDATE statement, all of the date values are contained inside a tuple.
It frustrates me because I have been facing issues like this for a while: I want to edit something inside a list or a tuple and have the new value simply replace the old one, instead of keeping all of the characters that mark it as a tuple and turning them into text. Sorry if that is confusing; like I said, it's hard for me to explain. I always end up making my data dirtier when trying to clean it up.
Any insights in how to efficiently clean data inside lists and tuples would be greatly appreciated. I think I am confused about the difference between accessing the tuple or accessing what is inside the tuple. It could also be helpful if you could suggest the name of the conceptual problem I'm dealing with so I can do more research on my own.
Thanks!
You are garbling the output by calling str() on the tuple, either implicitly when printing the whole array at once, or explicitly when trying to “clean” it.
See (python3):
>>> print("MM-DD-YY\xa024:00")
MM-DD-YY 24:00
but:
>>> print(("MM-DD-YY\xa024:00",))
('MM-DD-YY\xa024:00',)
This is because tuple.__str__ calls repr on the contents, and repr escapes non-printable characters such as the non-breaking space (\xa0).
However if you print the tuple elements as separate arguments, the result will be correct. So you want to replace the printing with something like:
for row in dates:
    print(*row)
The * expands the tuple to separate parameters. Since these are strings, they will be printed as is:
>>> row = ("MM-DD-YY\xa023:00", "MM-DD-YY\xa024:00")
>>> print(*row)
MM-DD-YY 23:00 MM-DD-YY 24:00
You can add a separator if you wish:
>>> print(*row, sep=', ')
MM-DD-YY 23:00, MM-DD-YY 24:00
... or you can format it:
>>> print('from {0} to {1}'.format(*row))
from MM-DD-YY 23:00 to MM-DD-YY 24:00
Here I am using the * again to expand the tuple to separate arguments and then simply {0} for zeroth member, {1} for first, {2} for second etc. (you can also use {} for next if you don't need to change the order, but giving the indices is clearer).
Ok, so now if you actually need to get rid of the non-breaking space anyway, replace is the right tool. You just need to apply it to each element of the tuple. There are two ways:
Explicit destructuring; applicable when the number of elements is fixed (should be; it is a row of known query):
Given:
>>> row = ('foo', 2, 5.5)
you can destructure it and construct a new tuple:
>>> (a, b, c) = row
>>> (a.replace('o', '0'), b + 1, c * 2)
('f00', 3, 11.0)
This lets you apply a different transformation to each column.
Mapping; applicable when you want to do the same transformation on all elements:
Given:
>>> row = ('foo', 'boo', 'zoo')
you just wrap a generator comprehension in a tuple constructor:
>>> tuple(x.replace('o', '0') for x in row)
('f00', 'b00', 'z00')
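Applied to the rows from the question, the mapping approach might look like the sketch below. The sample values are made up to mimic what fetchall() returns, and the sketch assumes every column in every row is a string.

```python
# Sample rows shaped like the result of cursor.fetchall() (made-up values).
dates = [('01-02-20\xa013:45',), ('03-04-21\xa009:30',)]

# Rebuild each row tuple with the non-breaking space replaced in every column.
dates_clean = [tuple(col.replace('\xa0', ' ') for col in row) for row in dates]

for row in dates_clean:
    print(*row)
```

Each element of dates_clean is still a tuple, so the list has the same shape as the original query result.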
On a side note, SQLite has some date and time functions, and they expect timestamps to be in strict ISO 8601 format, i.e. %Y-%m-%dT%H:%M:%S (optionally with %z at the end; this is strftime notation; in TR#35 notation it is yyyy-MM-ddTHH:mm:ss(xx)).
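If you decide to store the values in that form, a conversion could be sketched as follows. This assumes the stored text is a two-digit month, day and year followed by a 24-hour HH:MM time, which is a guess based on the question; to_iso is a hypothetical helper name.

```python
from datetime import datetime


def to_iso(text):
    """Convert 'MM-DD-YY HH:MM' (possibly with a non-breaking space) to ISO 8601."""
    cleaned = text.replace('\xa0', ' ').strip()
    # The input format is an assumption; adjust it to match the real data.
    return datetime.strptime(cleaned, '%m-%d-%y %H:%M').isoformat()


print(to_iso('01-02-20\xa013:45'))  # 2020-01-02T13:45:00
```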
In your case, dates is actually a list of tuples, with each tuple containing one string element. The , at the end of the date string is how you identify a single element tuple.
The for loop you have needs to work on each element within the tuples, instead of the tuples themselves. Something along the lines of:
dates_clean = []
for i in dates:
    date = i[0]  # each row is a one-element tuple
    clean = date.replace('\xa0', ' ')
    dates_clean.append(clean)  # append the cleaned string, not the original
I am not sure this is the best solution to your actual problem of manipulating data in the db, but it should answer your question.
EDIT: Also, refer to Jan's reply about unicode strings and Python 2 vs Python 3.
What I have is a dictionary of words and I'm generating objects that contain
(1) Original word (e.g. cats)
(2) Alphabetized word (e.g. acst)
(3) Length of the word
Without knowing the length of the longest word, is it possible to create an array (or, in Python, a list) such that, as I scan through the dictionary, it will append an object with x chars into a list in array[x]?
For example, when I encounter the word "a", it will append the generated object to the list at array[1]. Next, for aardvark, it will append the generated object to the list at array[8], etc.
I thought about creating an array of size 1 and then adding on to it, but I'm not sure how it would work.
For example: for the first word, a, it will append it to the list stored in array[1]. However, for the next word, aardvark, how am I supposed to check/generate more spots in the list until it hits 8? If I append to array, I need to give the append function an arg. But I can't give it just any arg, since I don't want to change previously entered values (e.g. 'a' in array[1]).
I'm trying to optimize my code for an assignment, so the alternative is going through the list a second time after I've determined the longest word. However, I think it would be better to do it as I alphabetize the words and create the objects such that I don't have to go through the lengthy dictionary twice.
Also, quick question about syntax: listOfStuff[x].append(y) will initialize/append to the list within listOfStuff at the value x with the value y, correct?
Store the lengths as keys in a dict rather than as indexes in a list. This is really easy if you use a defaultdict from the collections module - your algorithm will look like this:
from collections import defaultdict
results = defaultdict(list)
for word in words:
    results[len(word)].append(word)
This ties in to your second question: listOfStuff[x].append(y) will append to a list that already exists at listOfStuff[x]. It will not create a new one if that hasn't already been initialised to a (possibly empty) list. If x isn't a valid index into the list (e.g. x=3 into a listOfStuff of length 2), you'll get an IndexError. If it exists but there is something other than another list there, you will probably get an AttributeError.
Using a dict takes care of the first problem for you - assigning to a non-existent dict key is always valid. Using a defaultdict extends this idea to also reading from a non-existent key - it will insert a default value given by calling the function you give the defaultdict when you create it (in this case, we gave it list, so it calls it and gets an empty list) into the dict the first time you use it.
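As a rough illustration with the three fields from the question, the sketch below groups (original, alphabetized, length) records by length in a single pass. The word list is made up, and plain tuples stand in for whatever object type you generate.

```python
from collections import defaultdict

words = ['a', 'cats', 'aardvark', 'acts']  # made-up sample input

by_length = defaultdict(list)
for word in words:
    # (original word, alphabetized word, length)
    record = (word, ''.join(sorted(word)), len(word))
    by_length[len(word)].append(record)

print(by_length[4])    # [('cats', 'acst', 4), ('acts', 'acst', 4)]
print(max(by_length))  # longest length seen: 8
```

The longest length falls out of the keys at the end, so there is no need for a second pass over the dictionary.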
If you can't use collections for some reason, the next best way is still to use dicts - they have a method called setdefault that works similarly to defaultdicts. You can use it like this:
results = {}
for word in words:
    results.setdefault(len(word), []).append(word)
As you can see, setdefault takes two arguments: a key and a default value. If the key already exists in the dict, setdefault just returns its current value, as if you'd done results[key]. If the key is missing, it inserts the second argument into the dictionary at that key and then returns it. This is a little clunkier to use than defaultdict, but when your default value is an empty list it is otherwise the same. (defaultdict is the better choice when your default is expensive to create, since it calls the factory function only when needed, whereas the default passed to setdefault has to be built on every call.)
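The difference in how often the default gets built can be seen in a small experiment. This is only an illustrative sketch; make_list and the counter dict are hypothetical helpers that exist purely to count constructions.

```python
from collections import defaultdict

factory_calls = {'defaultdict': 0, 'setdefault': 0}


def make_list(label):
    """Build the default value and count how often that happens."""
    factory_calls[label] += 1
    return []


words = ['cat', 'dog', 'emu']  # all length 3, so only one key is ever needed

lazy = defaultdict(lambda: make_list('defaultdict'))
eager = {}
for word in words:
    lazy[len(word)].append(word)
    eager.setdefault(len(word), make_list('setdefault')).append(word)

print(factory_calls)  # {'defaultdict': 1, 'setdefault': 3}
```

Both containers end up with the same contents; only the number of throwaway default values differs.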
It is technically possible to do this with nested lists, but it is ugly. You have to:
Detect the case that the list isn't big enough
Figure out how many more elements the list needs
Grow the list to that size
The most Pythonic way to do the first step is to catch the error (something you could also do with dicts if setdefault and defaultdict didn't exist). The whole thing looks like this:
results = []
for word in words:
    try:
        results[len(word)]
    except IndexError:
        # Grow the list so that the new highest index is
        # len(word)
        new_length = len(word) + 1
        difference = new_length - len(results)
        results.extend([] for _ in range(difference))
    finally:
        results[len(word)].append(word)
Stay with dicts to avoid this kind of mess. Lists are specifically optimised for the case where the exact numeric index of any element isn't meaningful outside of the list, which doesn't match your use case. This type of code is really common when there is a mismatch between what your code needs to do and what the data structures you're using are good at, and it is worth learning as early as possible how to avoid it.
Forgive me for the poor title, I really can't come up with a proper title.
Here is my problem. Say I was given a list of strings:
['2010.01.01',
 '1b',
 '`abc',
 '12:20:33.000']
And I want to do a 'type check' so that, given the first string, it returns type date, for the second one boolean, for the third one a symbol, for the fourth one a time, etc. The returned value can be a string or anything, since all I want to do is cast to the correct ctypes.
Is there any way to do it?
ps: my python is 2.5
>>> strings = ['2010.01.01',
...            '1b',
...            '`abc',
...            '12:20:33.000']
>>> [type(x) for x in strings]
[<type 'str'>, <type 'str'>, <type 'str'>, <type 'str'>]
Every element is a plain str, so you could use eval to work out the intended types of the items in this list.
If you are completely certain you can trust the content (that it's not, say, coming from a user who could sneak code into the list somehow), you could map the list onto eval, which will catch native types like numbers. However, there is no simple way to know what those strings should all mean. For example, if you try to eval '2010.01.01', Python will think you're trying to parse a number and then fail because of the extra decimal point.
So you could try a two stage strategy: first cast the list to strings vs numbers using eval:
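def try_cast(input_string):
    # Stage one: let eval recognise native literals such as numbers.
    try:
        val = eval(input_string)
        val_type = type(val)
        return val, val_type
    except Exception:
        # Anything eval cannot handle stays a plain string.
        return input_string, type('')
cast_list = map(try_cast, original_list)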
That would give a list of tuples where the second item is the type and the first is the converted item. For more specialized things like dates, you'd need to use the same strategy on the strings left over after the first pass, using a try/except block to attempt converting them to dates with time.strptime(). You'll need to figure out what time formats to expect and write a format string for each one (you can check the Python docs or something like http://www.tutorialspoint.com/python/time_strptime.htm). You'd have to try all the options and see which ones convert correctly: if one works, the value is a date; if not, it's just a string.
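The second stage could be sketched roughly as below. Note the assumptions: it is written for Python 3 and uses datetime.strptime rather than time.strptime, the two format strings are guesses based on the sample strings in the question, and the labels plus the rules for '1b' and the backtick prefix are hypothetical.

```python
from datetime import datetime

# (label, strptime format) pairs to try in order; both formats are guesses
# based on the sample strings in the question.
FORMATS = [('date', '%Y.%m.%d'), ('time', '%H:%M:%S.%f')]


def classify(text):
    """Return a type label for one input string (labels are hypothetical)."""
    for label, fmt in FORMATS:
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            pass  # not this format; try the next one
    if text in ('0b', '1b'):
        return 'boolean'
    if text.startswith('`'):
        return 'symbol'
    return 'string'


samples = ['2010.01.01', '1b', '`abc', '12:20:33.000']
print([classify(s) for s in samples])
# ['date', 'boolean', 'symbol', 'time']
```

Unlike the eval approach, this never executes the input, so it is safe to run on untrusted strings.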
In short, I have a list of items. Let's say they are named Object1, Object2, ... Object10, ... Object20, and so on.
Depending on user input, this list changes its starting point; for this example, let's say the list holds the names of all objects from Object18 up to Object28.
I'm using the following statement to select these items from the stored list:
for i in nuke.allNodes():
    if i.name() in hiddenLists:
        i.setSelected(True)
    else:
        i.setSelected(False)
This generally works. The trouble is that "in" (for inside the list) doesn't specify that I want it to match an entire entry of the list. Instead of selecting JUST Object18 to Object28, it selects Object1, Object2 and Object18 to Object28 (the reason being, of course, that Object18 and so on begin with Object1, and the 20s begin with Object2).
I can't pad the strings, because these are set names that a program creates and they have to stay the same. My only question is: is there a better operator than in that forces an exact match rather than seeing Object1 within 'Object18'?
Looks like hiddenLists is a string (str) entered by the user. Use the split method on that string to make it a list first. Then the "in" clause will do what you want.
For instance, if the user enters a comma-separated list:
hiddenLists = [x.strip() for x in hiddenLists.split(",")]
if i.name() in hiddenLists:
    ...
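To see why the conversion matters, compare how "in" behaves on a string and on a list. The names below are made-up sample values, not taken from a real Nuke script.

```python
name = 'Object1'
as_string = 'Object18, Object19, Object20'
as_list = [x.strip() for x in as_string.split(',')]

print(name in as_string)      # True: substring match inside 'Object18'
print(name in as_list)        # False: no list entry equals 'Object1'
print('Object18' in as_list)  # True: exact match against a whole entry
```

On a string, "in" tests for a substring; on a list, it tests each entry for equality, which is the exact match the question asks for.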
I have a default dict of dicts whose primary key is a timestamp in the string form 'YYYYMMDD HH:MM:SS.' The keys are entered sequentially. How do I access the last entered key or the key with the latest timestamp?
Use an OrderedDict from the collections module if you simply need to access the last item entered. If, however, you need to maintain continuous sorting, you need to use a different data structure entirely, or at least an auxiliary one for the purposes of indexing.
Edit: I would add that, if accessing the final element is an operation that you have to do very rarely, it may be sufficient simply to sort the dict's keys and select the maximum. If you have to do this frequently, however, repeatedly sorting would become prohibitively expensive. Depending on how your code works, the simplest approach would probably be to simply maintain a single variable that, at any given point, contains the last key added and/or the maximum value added (i.e., is updated with each subsequent addition to the dict). If you want to maintain a record of additions that extends beyond just the last item, however, and don't require continuous sorting, an OrderedDict is ideal.
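The single-variable approach could be sketched like this. It relies on the fact that zero-padded, fixed-width 'YYYYMMDD HH:MM:SS' strings sort lexicographically in chronological order, so plain string comparison is enough; the sample records and the 'price' field are made up.

```python
from collections import defaultdict

records = defaultdict(dict)
latest = None  # newest timestamp key seen so far

samples = [('20120627 21:20:23', 1.5),
           ('20120627 21:20:40', 1.7),
           ('20120627 09:05:00', 1.2)]  # arrives out of order on purpose

for timestamp, price in samples:
    records[timestamp]['price'] = price
    # Fixed-width timestamps compare correctly as plain strings.
    if latest is None or timestamp > latest:
        latest = timestamp

print(latest)           # 20120627 21:20:40
print(records[latest])  # {'price': 1.7}
```

For an occasional lookup, max(records) gives the same key without any bookkeeping, at the cost of scanning all keys.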
Use OrderedDict rather than a built-in dict
You can try something like this:
>>> import time
>>> data ={'20120627 21:20:23':'first','20120627 21:20:40':'last'}
>>> latest = lambda d: time.strftime('%Y%m%d %H:%M:%S',max(map(lambda x: time.strptime(x,'%Y%m%d %H:%M:%S'),d.keys())))
>>> data[latest(data)]
'last'
but it probably would be slow on large data sets.
If you want to know who made the last entry (according to the time of entry), see the example below:
import datetime
format='%Y%m%d %H:%M'
Dict={'20010203 12:00':'Dave',
'20000504 03:00':'Pete',
'20020825 23:00':'kathy',
'20030102 01:00':'Ray'}
myDict={}
for key,val in Dict.iteritems():
    TIME= str(datetime.datetime.strptime(key,format))
    myDict[TIME]= val
myDict=sorted(myDict.iteritems(), key=lambda (TIME,v): (TIME))
print myDict[-1]