Python graceful fail on int() call? - python

I have to make a rudimentary FSM in a class, and am writing it in Python. The assignment requires that we read the transitions for the machine from a text file. So, for example, an FSM with 3 states, each of which has 2 possible transitions, with possible inputs 'a' and 'b', would have a text file that looks like this:
2 # first line lists all final states
0 a 1
0 b 2
1 a 0
1 b 2
2 a 0
2 b 1
I am trying to come up with a more pythonic way to read a line at a time and convert the states to ints, while keeping the input vals as strings. Basically this is the idea:
self.finalStates = f.readline().strip("\n").split(" ")
for line in f:
    current_state, input_val, next_state = [int(x) for x in line.strip("\n").split(" ")]
Of course, when it tries to int("a") it throws a ValueError. I know I could use a traditional loop and just catch the ValueError but I was hoping to have a more Pythonic way of doing this.

You should really only be trying to parse the tokens that you expect to be integers:
for line in f:
    tokens = line.split(" ")
    current_state, input_val, next_state = int(tokens[0]), tokens[1], int(tokens[2])
Arguably more readable:
def parseline(line):
    tokens = line.split(" ")
    return (int(tokens[0]), tokens[1], int(tokens[2]))
for line in f:
    current_state, input_val, next_state = parseline(line)

This is something very functional, but I'm not sure if it's "pythonic"... And it may cause some people to scratch their heads. You should really have a "lazy" zip() to do it this way if you have a large number of values:
types = [int, str, int]
def multi_type(ts, xs):
    return [t(x) for (t, x) in zip(ts, xs.strip().split())]
for line in f:
    current_state, input_val, next_state = multi_type(types, line)
Also the arguments you use for strip and split can be omitted, because the defaults will work here.
Edit: reformatted - I wouldn't use it as one long line in real code.

You got excellent answers that match your problem well. However, in other cases, there may indeed be situations where you want to convert some fields to int if feasible (i.e. if they're all digits) and leave them as str otherwise (as the title of your question suggests) without knowing in advance which fields are ints and which ones are not.
The traditional Python approach is try/except...:
def maybeint(s):
    try:
        return int(s)
    except ValueError:
        return s
...which you need to wrap into a function as there's no way to do a try/except in an expression (e.g. in a list comprehension). So, you'd use it like:
several_fields = [maybeint(x) for x in line.split()]
However, it is possible to do this specific task inline, if you prefer:
several_fields = [(int(x) if x.isdigit() else x) for x in line.split()]
The if/else "ternary operator" looks a bit strange, but one can get used to it ;-). The isdigit method of a string returns True if the string is nonempty and contains only digits.
To repeat, this is not what you should do in your specific case, where you know the specific int-str-int pattern of input types; but it might be appropriate in a more general situation where you don't have such precise information in advance!
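The two approaches above are not quite interchangeable, which a short sketch can illustrate. The sample input line below is made up for illustration; the point is that isdigit() rejects signed numbers, while int() accepts them:

```python
def maybeint(s):
    """Return int(s) when s parses as an integer, otherwise return s unchanged."""
    try:
        return int(s)
    except ValueError:
        return s

# Hypothetical input line containing signed numbers and a letter.
fields = "-5 +3 12 a".split()

# try/except version: anything int() accepts is converted.
via_try = [maybeint(x) for x in fields]

# isdigit version: only unsigned runs of digits are converted.
via_isdigit = [int(x) if x.isdigit() else x for x in fields]

print(via_try)      # [-5, 3, 12, 'a']
print(via_isdigit)  # ['-5', '+3', 12, 'a']
```

If signed values can appear in the data, the try/except helper is probably the safer choice.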

self.finalStates = [int(state) for state in f.readline().split()]
for line in f:
    words = line.split()
    current_state, input_val, next_state = int(words[0]), words[1], int(words[2])
    # now do something with values
Note that you can shorten line.strip("\n").split(" ") down to just line.split(). The default behavior of str.split() is to split on any whitespace, and it returns a list of words that have no leading or trailing whitespace of any sort.
If you are converting the states to int in the loop, I presume you want the finalStates to be int as well.
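Putting these suggestions together, a minimal sketch of reading the whole file into a transition table might look like the following. The helper name load_fsm, the dict keyed by (state, input) pairs, and the stripping of the '#' comment on the first line are assumptions for illustration, not part of the assignment:

```python
import io

def load_fsm(f):
    """Read final states and transitions from an open text file (hypothetical helper)."""
    # First line: final states, optionally followed by a '#' comment.
    first = f.readline().split("#", 1)[0]
    final_states = {int(tok) for tok in first.split()}

    transitions = {}
    for line in f:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Only the first and third tokens are expected to be integers.
        current_state, input_val, next_state = int(tokens[0]), tokens[1], int(tokens[2])
        transitions[(current_state, input_val)] = next_state
    return final_states, transitions

# io.StringIO stands in for the real text file here.
sample = io.StringIO("2 # first line lists all final states\n0 a 1\n0 b 2\n1 a 0\n")
final_states, transitions = load_fsm(sample)
print(final_states)  # {2}
print(transitions)   # {(0, 'a'): 1, (0, 'b'): 2, (1, 'a'): 0}
```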

Related

looping over optional arguments (strings) in python

I have lists of strings, some are hashtags - like #rabbitsarecool others are short pieces of prose like "My rabbits name is fred."
I have written a program to separate them:
def seperate_hashtags_from_prose(*strs):
    props = []
    hashtags = []
    for x in strs:
        if x[0]=="#" and x.find(' ')==-1:
            hashtags += x
        else:
            prose += x
    return hashtags, prose
seperate_hashtags_from_prose(["I like cats","#cats","Rabbits are the best","#Rabbits"])
This program does not work. In the above example, when I debug it, it tells me that on the first loop:
x=["I like cats","#cats","Rabbits are the best","#Rabbits"].
This is not what I would have expected. My intuition is that something about the way the loop over optional arguments is constructed is causing an error, but I can't see why.
There are several issues.
The most obvious is switching between props and prose. The code you posted does not run.
As others have commented, if you use the * in the function call, you should not make the call with a list. You could use seperate_hashtags_from_prose("I like cats","#cats","Rabbits are the best","#Rabbits") instead.
The line hashtags += x does not do what you think it does. When you use += on a list, it extends the list with every element of the right-hand iterable, so adding a string appends each of its characters separately. You probably meant hashtags.append(x) instead.
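With those three issues addressed, a corrected version could look like this sketch. It keeps the original function name; using startswith instead of x[0] is an extra assumption on my part so that an empty string does not raise an IndexError:

```python
def seperate_hashtags_from_prose(*strs):
    hashtags = []
    prose = []
    for x in strs:
        # A hashtag starts with '#' and contains no spaces.
        if x.startswith("#") and " " not in x:
            hashtags.append(x)  # append keeps the string whole
        else:
            prose.append(x)
    return hashtags, prose

items = ["I like cats", "#cats", "Rabbits are the best", "#Rabbits"]
# Unpack the list with * so each string becomes its own argument.
hashtags, prose = seperate_hashtags_from_prose(*items)
print(hashtags)  # ['#cats', '#Rabbits']
print(prose)     # ['I like cats', 'Rabbits are the best']
```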

Error when trying to build logical parser

So I have these strings stored in a database and I want to convert them to Python expressions to use in an if statement. I will store these strings in a list and loop over them.
For example:
string = "#apple and #banana or #grapes"
i am able to convert this string by replacing # with "a==" and # with "b==" to this :
if a == apple and b == banana or b == grapes
hash refers to a
# refers to b
But when I use eval, it throws the error "apple is not defined" because apple is not in quotes. So what I want is this:
if a == "apple" and b == "banana" or b == "grapes"
Is there any way I can do this?
The strings stored in the DB can have any format and can contain multiple and/or conditions.
Few examples:
string[0] = "#apple and #banana or #grapes"
string[1] = "#apple or #banana and #grapes"
string[2] = "#apple and #banana and #grapes"
There will be an else condition for when no condition is fulfilled.
Thanks
If I understand correctly, you are trying to set up something of a logical parser: you want to evaluate whether the expression can possibly be true or not.
#word or #otherword
is always true since it's possible to satisfy this with #=word for example, but
#word and #otherword
is not since it is impossible to satisfy this. The way you were going is using Python's builtin interpreter, but you seem to "make up" variables a and b, which do not exist. Just to give you a starter for such a parser, here is one bad implementation:
from itertools import product
def test(string):
    var_dict = {}
    word_dict = {}
    cur_var = ord('a')
    expression = []
    for i,w in enumerate(string.split()):
        if not i%2:
            if w[0] not in var_dict:
                var_dict[w[0]] = chr(cur_var)
                word_dict[var_dict[w[0]]] = []
                cur_var += 1
            word_dict[var_dict[w[0]]].append(w[1:])
            expression.append('{}=="{}"'.format(var_dict[w[0]],w[1:]))
        else: expression.append(w)
    expression = ' '.join(expression)
    result = {}
    for combination in product(
            *([(v,w) for w in word_dict[v]] for v in word_dict)):
        exec(';'.join('{}="{}"'.format(v,w) for v,w in combination)+';value='+expression,globals(),result)
        if result['value']: return True
    return False
Beyond not checking if the string is valid, this is not great, but a place to start grasping what you're after.
What this does is create your expression in the first loop, while saving a hash mapping the first characters of words (w[0]) to variables named from a to z (if you want more you need to do better than cur_var+=1). It also maps each such variable to all the words it was assigned to in the original expression (word_dict).
The second loop runs a pretty bad algorithm: product gives all the possible pairings of variable and matching word, and I iterate over each combination and assign our fake variables the words in an exec command. There are plenty of reasons to avoid exec, but this is the easiest way to set the variables. If I find a combination that satisfies the expression, I return True, otherwise False. You cannot use eval if you want to assign stuff (or for if, for, while, etc.).
Note that this can be drastically improved by writing your own logical parser to read the string, though it will probably be longer.
# Evaluated as (#apple and #banana) or #grapes by Python - only #=apple #=banana satisfies this.
>>> test("#apple and #banana or #grapes")
True
# Evaluated as #apple or (#banana and #grapes) by Python - all combinations satisfy this as # does not matter.
>>> test("#apple or #banana and #grapes")
True
# Demands both #=banana and #=grapes - impossible.
>>> test("#apple and #banana and #grapes")
False
I am not sure what you are asking here, but you can use the replace and split functions:
string = "#apple and #banana"
fruits = string.replace("#", "").split(" and ")
if a == fruits[0] and b == fruits[1]:
    ...  # both values matched
Hope this helps

How to find the index of multiple sub string in a string by python?

I'd like to find the location in a string for certain characters such as "FC" or "FL". For single case like FC, I used the find() function to return the index of the characters in a string, as below.
for line in myfile:
location = line.find('FC')
But when it comes to adding FL, how do I add it without using an if statement in the for loop? I don't want to add redundant lines so I hope there is an elegant solution.
Each line will include either "FC" or "FL" but not both.
Each line will include either "FC" or "FL" but not both.
That makes the following somewhat hacky trick possible:
location = max(line.find(sub) for sub in ("FC", "FL"))
The idea is that of the two values, one will be -1 and the other will be non-negative (the index where it was found, possibly 0), so the greater value is the location.
Note that if "FC" is found the method will still search for "FL" and not find it, which will reduce performance if the string being searched is long, whereas solutions using a conditional will avoid this redundant calculation. However if the string is short then using the least amount of python code and letting C do all the thinking is probably fastest (though you should test your case if it really matters).
You can also avoid the generator expression for this simple function call:
location = max(map(line.find, ("FC", "FL")))
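To make the behaviour of the max trick concrete, here is a small sketch with a few made-up lines. The wrapper name find_either is hypothetical; note what happens when neither substring is present, or when the stated "not both" assumption is violated:

```python
def find_either(line, subs=("FC", "FL")):
    """Return the largest str.find() result over subs (-1 if none is found)."""
    return max(map(line.find, subs))

print(find_either("abcFCabc"))  # 3
print(find_either("abFLcabc"))  # 2
print(find_either("FCabc"))     # 0  (a match at index 0 still beats -1)
print(find_either("abc"))       # -1 (neither substring present)
print(find_either("FCxxFL"))    # 4  (both present: max picks the later match)
```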
You can do this:
for line in myfile:
    location = [line.find(substring) for substring in ('FC', 'FL') if line.find(substring) != -1][0]
It's similar to the solution suggested by @NathanielFord; the only difference is that I added if line.find(substring) != -1 to the list comprehension to solve the problem I pointed out, and moved the [0] indexing onto the same line to make it shorter. (@NathanielFord, I'm sorry you removed your answer before I suggested this in the comments.)
Though, it's not a very elegant solution because it calls .find() twice, but it is shorter than using for loops.
If you want the most elegant solution, then a conditional is probably your solution. It won't be a "redundant" line, but it will make your code look nice and readable:
for line in myfile:
    location = line.find('FC')
    if location == -1:
        location = line.find('FL')
It is a little unclear what your desired output is, and there are more elegant ways to handle it depending on that, but essentially you're looking for:
def multifind(line):
    for substring in ['FC', 'FL']:
        location = line.find(substring)
        if location != -1:
            return location
    return None
locations = [multifind(line) for line in myfile]
Sample run:
>>> myfile = ["abcFCabc","abFLcabc", "abc", ""]
>>> def multifind(line):
...     for substring in ['FC', 'FL']:
...         location = line.find(substring)
...         if location != -1:
...             return location
...     return None
...
>>> locations = [multifind(line) for line in myfile]
>>> locations
[3, 2, None, None]
Note that this is not quite as elegant as the solution with the if inside the for loop.

Python name variable from string [duplicate]

Is it possible to create a variable name based on the value of a string?
I have a script that will read a file for blocks of information and store them in a dictionary. Each block's dictionary will then be appended to a 'master' dictionary. The number of blocks of information in a file will vary and uses the word 'done' to indicate the end of a block.
I want to do something like this:
master={}
block=0
for lines in file:
    if line != "done":
        $block.append(line)
    elif line == "done":
        master['$block'].append($block)
        block = block + 1
If a file had content like so:
eggs
done
bacon
done
ham
cheese
done
The result would be a dictionary with 3 lists:
master = {'0': ["eggs"], '1': ["bacon"], '2': ["ham", "cheese"]}
How could this be accomplished?
I would actually suggest using a list instead. Is there any specific reason why you need dicts that behave like arrays?
If a list will do, you can use this:
with open("yourFile") as fd:
    arr = [x.strip().split() for x in fd.read().split("done")][:-1]
Output:
[['eggs'], ['bacon'], ['ham', 'cheese']]
In case you wanted number-string indices, you could use this:
with open("yourFile") as fd:
    l = [x.strip().split() for x in fd.read().split("done")][:-1]
print(dict(zip(map(str,range(len(l))),l)))
You seem to be misunderstanding how dictionaries work. They take keys that are objects, so no magic is needed here.
We can however, make your code nicer by using a collections.defaultdict to make the sublists as required.
from collections import defaultdict
master = defaultdict(list)
block = 0
for line in file:
    line = line.strip()  # drop the trailing newline so the comparison can match
    if line == "done":
        block += 1
    else:
        master[block].append(line)
I would, however, suggest that a dictionary is unnecessary if you want continuous, numbered indices - that's what lists are for. In that case, I suggest you follow Thrustmaster's first suggestion, or, as an alternative:
from itertools import takewhile
def repeat_while(predicate, action):
    while True:
        current = action()
        if not predicate(current):
            break
        else:
            yield current
with open("test") as file:
    action = lambda: list(takewhile(lambda line: not line == "done", (line.strip() for line in file)))
    print(list(repeat_while(lambda x: x, action)))
I think that split on "done" is doomed to failure. Consider the list:
eggs
done
bacon
done
rare steak
well done steak
done
Stealing from Thrustmaster (which I gave a +1 for my theft) I'd suggest:
>>> dict(enumerate(l.split() for l in open(file).read().split('\ndone\n') if l))
{0: ['eggs'], 1: ['bacon'], 2: ['ham', 'cheese']}
I know this expects a trailing "\n". If that is in doubt, you could use "open(file).read()+'\n'", or even "+'\n\ndone\n'" if the final done is optional.
Use setattr or globals().
See How do I call setattr() on the current module?
Here's your code again, for juxtaposition:
master={}
block=0
for lines in file:
    if line != "done":
        $block.append(line)
    elif line == "done":
        master['$block'].append($block)
        block = block + 1
As mentioned in the post by Thrustmaster, it makes more sense to use a nested list here. Here's how you would do that; I've changed as little as possible structurally from your original code:
master=[[]] # Start with a list containing only a single list
for line in file: # Note the typo in your code: you wrote "for lines in file"
    if line != "done":
        master[-1].append(line) # Index -1 is the last element of your list
    else: # Since it's not not "done", it must in fact be "done"
        master.append([])
The only thing here is that you'll end up with one extra list at the end of your master list, so you should add a line to delete the last, empty sublist:
del master[-1]
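For reference, the nested-list approach can be sketched as a small self-contained function. The name read_blocks and its terminator parameter are hypothetical; the sketch assumes each line should be stripped of its trailing newline before comparing, and that a line counts as a terminator only when it is exactly "done" (so "well done steak" stays inside its block):

```python
import io

def read_blocks(lines, terminator="done"):
    """Group lines into lists, closing a block at each terminator line."""
    blocks = []
    current = []
    for line in lines:
        line = line.strip()  # remove the trailing newline before comparing
        if line == terminator:
            blocks.append(current)
            current = []
        elif line:
            current.append(line)
    if current:  # keep a final block that has no closing terminator
        blocks.append(current)
    return blocks

# io.StringIO stands in for the real file here.
sample = io.StringIO("eggs\ndone\nbacon\ndone\nham\ncheese\ndone\n")
blocks = read_blocks(sample)
print(blocks)  # [['eggs'], ['bacon'], ['ham', 'cheese']]

# If string keys are really needed, build the dict from the list afterwards.
master = {str(i): block for i, block in enumerate(blocks)}
print(master)  # {'0': ['eggs'], '1': ['bacon'], '2': ['ham', 'cheese']}
```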

Is there a better way to create dynamic functions on the fly, without using string formatting and exec?

I have written a little program that parses log files of anywhere between a few thousand lines to a few hundred thousand lines. For this, I have a function in my code which parses every line, looks for keywords, and returns the keywords with the associated values.
These log files consist of small sections. Each section has some values I'm interested in and want to store as a dictionary.
I have simplified the sample below, but the idea is the same.
My original function looked like this. It gets called between 100 and 10000 times per run, so you can understand why I want to optimize it:
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
        elif 'apples' in line:
            d['apples'] = True
        elif 'bananas' in line:
            d['bananas'] = True
        elif line.startswith('End of section'):
            return d
f = open('fruit.txt','r')
d = parse_txt(f)
print d
The problem I run into, is that I have a lot of conditionals in my program, because it checks for a lot of different things and stores the values for it. And when checking every line for anywhere between 0 and 30 keywords, this gets slow fast. I don't want to do that, because, not every time I run the program I'm interested in everything. I'm only ever interested in 5-6 keywords, but I'm parsing every line for 30 or so keywords.
In order to optimize it, I wrote the following by using exec on a string:
def make_func(args):
    func_str = """
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
"""
    if 'apples' in args:
        func_str += """
        elif 'apples' in line:
            d['apples'] = True
"""
    if 'bananas' in args:
        func_str += """
        elif 'bananas' in line:
            d['bananas'] = True
"""
    func_str += """
        elif line.startswith('End of section'):
            return d"""
    print func_str
    exec(func_str)
    return parse_txt
args = ['apples','bananas']
fun = make_func(args)
f = open('fruit.txt','r')
d = fun(f)
print d
This solution works great, because it speeds up the program by an order of magnitude and it is relatively simple. Depending on the arguments I put in, it will give me the first function, but without checking for all the stuff I don't need.
For example, if I give it args=['bananas'], it will not check for 'apples', which is exactly what I want to do.
This makes it much more efficient.
However, I do not like this solution very much, because it is not very readable, is difficult to change, and is very error prone whenever I modify something. Besides that, it feels a little bit dirty.
I am looking for alternative or better ways to do this. I have tried using a set of functions to call on every line, and while this worked, it did not offer me the speed increase that my current solution gives me, because it adds a few function calls for every line. My current solution doesn't have this problem, because it only has to be called once at the start of the program. I have read about the security issues with exec and eval, but I do not really care about that, because I'm the only one using it.
EDIT:
I should add that, for the sake of clarity, I have greatly simplified my function. From the answers I understand that I didn't make this clear enough.
I do not check for keywords in a consistent way. Sometimes I need to check for 2 or 3 keywords in a single line, sometimes just for 1. I also do not treat the result in the same way. For example, sometimes I extract a single value from the line I'm on, sometimes I need to parse the next 5 lines.
I would try defining a list of keywords you want to look for ("keywords") and doing this:
for word in keywords:
    if word in line:
        d[word] = True
Or, using a list comprehension:
dict([(word,True) for word in keywords if word in line])
Unless I'm mistaken this shouldn't be much slower than your version.
No need to use eval here, in my opinion. You're right in that an eval based solution should raise a red flag most of the time.
Edit: as you have to perform a different action depending on the keyword, I would just define function handlers and then use a dictionary like this:
def keyword_handler_word1(line):
    (...)
(...)
def keyword_handler_wordN(line):
    (...)
keyword_handlers = { 'word1': keyword_handler_word1, (...), 'wordN': keyword_handler_wordN }
Then, in the actual processing code:
for word in keywords:
    # keyword_handlers[word] is a function
    keyword_handlers[word](line)
Use regular expressions. Something like the following:
>>> import re
>>> lookup = {'a': 'apple', 'b': 'banane'} # keyword: characters to look for
>>> pattern = '|'.join('(?P<%s>%s)' % (key, val) for key, val in lookup.items())
>>> re.search(pattern, 'apple aaa').groupdict()
{'a': 'apple', 'b': None}
def create_parser(fruits):
    def parse_txt(f):
        d = {}
        for line in f:
            if not line:
                pass
            elif line.startswith('End of section'):
                return d
            else:
                for testfruit in fruits:
                    if testfruit in line:
                        d[testfruit] = True
    return parse_txt  # hand the specialised parser back to the caller
This is what you want: create a test function dynamically.
Depending on what you really want to do, it is, of course, possible to remove one level of complexity and define
def parse_txt(f, fruits):
[...]
or
def parse_txt(fruits, f):
[...]
and work with functools.partial.
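As a rough sketch of the functools.partial variant mentioned above: the keyword list is bound once, and the resulting callable is used like the original one-argument parse_txt. The sample data, the name parse_bananas, and returning the dict at end of input (rather than None) are assumptions made for illustration:

```python
import functools
import io

def parse_txt(fruits, f):
    """Record which of the requested keywords appear before the section ends."""
    d = {}
    for line in f:
        if line.startswith('End of section'):
            break
        for fruit in fruits:
            if fruit in line:
                d[fruit] = True
    return d

# Bind the keyword list once; only 'bananas' is checked on each line.
parse_bananas = functools.partial(parse_txt, ['bananas'])

# io.StringIO stands in for the real log file here.
sample = io.StringIO("apples 3\nbananas 5\nEnd of section\napples 9\n")
result = parse_bananas(sample)
print(result)  # {'bananas': True}
```

Whether this is fast enough compared with the exec-generated function depends on the real workload, so it is worth timing both on actual log files.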
You can use a set, like this:
fruit = set(['cocos', 'apple', 'lime'])
need = set(['cocos', 'pineapple'])
need.intersection(fruit)
This returns a set containing only 'cocos'.
