Python Disambiguation - python

I am currently building a MUD (Multi-User Domain) for an RPG game. I'm doing this entirely in Python, both to make a game I enjoy and to learn Python. I've run into a problem, and because the question is so specific, I've been unable to find the right answer.
So, here's what I need, in a nutshell. I don't have a good snippet of code that fully shows what I need, as I'd have to paste about 50 lines to make the 5 lines I'm using make sense.
targetOptions = ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill']
A cmd in our game is attack, where we type 'a zombie' and we then proceed to kill the zombie. However, I want to just type 'a z'. We've tried a few different things in our code, but they're all unstable and often just wrong. One of our attempts returned something like ['sword', 'talisman'] as matches for 'get sword'. So, is there a way to search this list and have it return a matched value?
I also need to just return value[0] if there are say, 2 zombies in the room and I type 'a z'. Thanks for all your help ahead of time, and I hope I was clear enough for what I'm looking for. Please let me know if more info is needed. And don't worry about the whole attacking thing, I just need to send 'zo' and get 'zombie' or something similar. Thanks!

Welcome to SO and Python! I suggest you take a look at the official Python documentation and spend some time looking around what's included in the Python Standard Library.
The difflib module contains a function get_close_matches() that can help you with approximate string comparisons. Here's what it looks like:
from difflib import get_close_matches

def get_target_match(target, targets):
    '''
    Approximates a match for a target from a sequence of targets,
    if a match exists.
    '''
    # Compare in lowercase, but return the name as originally spelled.
    # A real list (not a lazy map object) is needed so .index() works on Python 3.
    lowered = [t.lower() for t in targets]
    matches = get_close_matches(target.lower(), lowered, n=1, cutoff=0.25)
    if matches:
        return targets[lowered.index(matches[0])]
    return None

target = 'Z'
targets = ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill']
match = get_target_match(target, targets)
print("Going nom on %s" % match)  # IT'S A ZOMBIE!!!
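To see why a cutoff as low as 0.25 is needed for a one-letter input, it may help to look at the similarity score that difflib computes. The following is only an illustrative sketch; the variable names are mine and the target list is lowercased for simplicity.

```python
from difflib import SequenceMatcher, get_close_matches

targets = ['joe', 'bob', 'zombie', 'susan', 'kobold', 'bill']

# ratio() is 2*M/T: M matching characters, T total characters in both strings.
score = SequenceMatcher(None, 'z', 'zombie').ratio()
print(round(score, 3))  # 0.286 (one shared character out of seven)

# The default cutoff of 0.6 rejects such a short abbreviation...
default_matches = get_close_matches('z', targets, n=1)
# ...while a looser cutoff lets it through.
loose_matches = get_close_matches('z', targets, n=1, cutoff=0.25)
print(default_matches, loose_matches)
```

Keep in mind that this kind of fuzzy matching is not anchored to the start of the word, so a loose cutoff can also accept names that merely share letters with the input.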

>>> filter(lambda x: x.startswith("z"), ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill'])
['zombie']
>>> cmd = "a zom"
>>> cmd.split()
['a', 'zom']
>>> cmd.split()[1]
'zom'
>>> filter(lambda x: x.startswith(cmd.split()[1]), ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill'])
['zombie']
Does that help?
filter filters a list (2nd arg) for things that the 1st arg accepts. cmd is your command and cmd.split()[1] gets the part after the space. lambda x: x.startswith(cmd.split()[1]) is a function (a lambda expression) that asks "does x start with the command after the space?" (On Python 3, filter returns a lazy iterator, so wrap the call in list() to see the results shown here.)
For another test, if cmd is "a B" then there are two matches:
>>> cmd = "a B"
>>> filter(lambda x: x.startswith(cmd.split()[1]), ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill'])
['Bob', 'Bill']
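The same idea can be wrapped in a small helper that ignores case and returns only the first hit, which is what the question asks for when several targets match. This is a minimal sketch; match_target and the sample command are hypothetical names, not part of the answer above.

```python
def match_target(prefix, targets):
    """Return the first target that starts with prefix (case-insensitive), or None."""
    prefix = prefix.lower()
    if not prefix:
        return None  # an empty prefix would otherwise match everything
    for name in targets:
        if name.lower().startswith(prefix):
            return name
    return None

targets = ['Joe', 'Bob', 'zombie', 'Susan', 'kobold', 'Bill']

command = "a zo"
verb, _, arg = command.partition(" ")  # split the verb from its argument
print(match_target(arg, targets))      # zombie
print(match_target("B", targets))      # Bob (first of Bob and Bill)
```

Returning the first match in list order means the order of targets in the room decides ties, so it may be worth sorting or prioritising that list before calling the helper.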

Related

how to get combinations of substrings in python

I want to get combinations of substrings like an example below:
original string: “This is my pen”
expected output list: [“This is my pen”, “This is mypen”, “This ismy pen”, “Thisis my pen”, “This ismypen”, “Thisismy pen”, “Thisismypen”]
As you can see in the example, I’d like to get all substring combinations by removing a white space character(s) while keeping the order of the sequence.
I’ve tried to use strip(idx) function and from itertools import combinations. But it was hard to keep the order of the original sentence and also get all possible cases at the same time.
Any basic ideas will be welcomed! Thank you very much.
I’m a newbie to programming so please let me know if I need to write more details. Thanks a lot.
Try this:
import itertools
s = "This is my pen"
s_list = s.split()
s_len = len(s_list)
Then:
r = ["".join(itertools.chain(*zip(s_list, v+("",))))
     for v in itertools.product([" ", ""], repeat=s_len-1)]
This results in r having the following value:
['This is my pen',
'This is mypen',
'This ismy pen',
'This ismypen',
'Thisis my pen',
'Thisis mypen',
'Thisismy pen',
'Thisismypen']
If you just want to iterate over the values, you can avoid creating the top-level list as follows:
for u in ("".join(itertools.chain(*zip(s_list, v+("",))))
          for v in itertools.product([" ", ""], repeat=s_len-1)):
    print(u)
which produces:
This is my pen
This is mypen
This ismy pen
This ismypen
Thisis my pen
Thisis mypen
Thisismy pen
Thisismypen
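If the one-liner is hard to read, the same technique (choosing either a space or nothing for each gap with itertools.product) can be spelled out as a function. This is just a sketch; space_variants is a name I made up for illustration.

```python
from itertools import product

def space_variants(sentence):
    """Return every way of keeping or dropping the space between adjacent words."""
    words = sentence.split()
    if not words:
        return []
    variants = []
    # One separator choice per gap between words: keep the space or drop it.
    for seps in product([" ", ""], repeat=len(words) - 1):
        pieces = [words[0]]
        for sep, word in zip(seps, words[1:]):
            pieces.append(sep + word)
        variants.append("".join(pieces))
    return variants

for variant in space_variants("This is my pen"):
    print(variant)
```

For a sentence with n words there are 2**(n-1) variants, so the result grows quickly for long inputs.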

scrape data and sort it using Python 2.7 and selenium

I'm trying to scrape data from a website using Selenium and Python 2.7. Here is the markup containing the data that I want to scrape:
<textarea>let, either, and, have, rather, because, your, with, other, that, neither, since, however, its, will, some, own, than, should, wants, they, got, may, what, least, else, cannot, like, whom, which, who, why, his, these, been, had, the, all, likely, their, must, our</textarea>
I need to put all of those words into a list and sort it. This is my progress so far:
wordlist = []
data = browser.find_element_by_tag_name("textarea")
words = data.get_attribute()
wordlist.append(words)
print words
print wordlist.sort()
any help or clue would be useful for me
Note that wordlist.sort() doesn't return a list; it sorts the existing list in place and returns None, so you might need to do
wordlist.sort()
print wordlist
or try the code below to get the required output
data = driver.find_element_by_tag_name("textarea")
words = data.get_attribute('value')
sorted_list = sorted(words.split(', '))
print sorted_list
# ['all', 'and', 'because', 'been', 'cannot', 'either', 'else', 'got', 'had', 'have', 'his', 'however', 'its', 'least', 'let', 'like', 'likely', 'may', 'must', 'neither', 'other', 'our', 'own', 'rather', 'should', 'since', 'some', 'than', 'that', 'the', 'their', 'these', 'they', 'wants', 'what', 'which', 'who', 'whom', 'why', 'will', 'with', 'your']
I was able to recreate your issue using the following code:
words = ["hello", "world", "abc", "def"]
wordlist = []
wordlist.append(words)
print(words)
print(wordlist.sort())
This outputs:
['hello', 'world', 'abc', 'def']
None
Which I believe is the issue you are having.
To fix it I did two things:
1) I replaced wordlist.append(words) with wordlist = words.copy(), which copies the list rather than appending the whole list as a single element (list.copy() needs Python 3.3+; on Python 2.7 use list(words) instead), and 2) I moved wordlist.sort() out of the print call, because sort() sorts in place and returns None.
So, the complete updated example is:
words = ["hello", "world", "abc", "def"]
wordlist = []
wordlist = words.copy()
wordlist.sort()
print(words)
print(wordlist)
Which now outputs the sorted list (as you required):
['hello', 'world', 'abc', 'def']
['abc', 'def', 'hello', 'world']
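The difference between list.sort() and sorted() can be seen side by side in the short sketch below. It assumes the text has already been read from the page into a plain string; raw is a shortened, made-up sample of the textarea contents.

```python
raw = "let, either, and, have, rather"

# Split on commas and trim the surrounding whitespace from each word.
words = [w.strip() for w in raw.split(",")]

in_place = list(words)       # copy, so the original order is kept in words
result = in_place.sort()     # sorts in_place itself and returns None
new_list = sorted(words)     # returns a new sorted list, words is untouched

print(result)    # None
print(in_place)  # ['and', 'either', 'have', 'let', 'rather']
print(new_list)  # ['and', 'either', 'have', 'let', 'rather']
```

Both calls work the same way on Python 2.7 and Python 3.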

How to format a list into a string?

I am trying to format a string in python that takes arguments as items from a list of names.
The catch is, I want to print all the list items wrapped in backslash-escaped double quotes, one after another, in the same string.
The code is:
list_names=['Alex', 'John', 'Joseph J']
String_to_pring='Hi my name is (\\"%s\\")'%(list_names)
The output should look like this:
'Hi my name is (\"Alex\",\"John\",\"Joseph J\")'
But instead, I keep getting like this:
'Hi my names is (\"['Alex','John','Joseph J']\")'
I've even tried using .format() and json.dumps() but still the same result.
Is there any way to print the desired output or can I only print each list item at a time?
Without changing much of your code, you could simply format the repr representation of the list that's converted into a tuple.
# proper way - this is what you actually want
list_names = ['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is %s' % (repr(tuple(list_names)))
 
print(string_to_print)
# Hi my name is ('Alex', 'John', 'Joseph J')
If you want to get your exact output, just do some string replacing:
# improper way
list_names = ['Alex', 'John', 'Joseph J']
string_to_print = 'Hi my name is %s' % (repr(tuple(list_names)).replace("\'", '\\"'))
print(string_to_print)
# Hi my name is (\"Alex\", \"John\", \"Joseph J\")
If you're trying to pass string_to_print somewhere else, try the proper way first; it might actually work for you.
If you look closely, you'll find that the "improper way" above contains a small bug. Try adding "Alex's house" to list_names; the output would look like this:
Hi my name is (\"Alex\", \"John\", \"Joseph J\", "Alex\"s house")
To take care of that bug, you'll need to have a better way of replacing, by using re.sub().
from re import sub
list_names = ['Alex', 'John', 'Joseph J', "Alex's house"]
string_to_print = 'Hi my name is %s' % (sub(r'([\'\"])(.*?)(?!\\\1)(\1)', r'\"\2\"', repr(tuple(list_names))))
print(string_to_print)
But if input like this won't occur in your usage, I would suggest sticking with the "improper way", as it's a lot simpler.
There is no function for formatting lists as human-friendly strings. You have to format lists yourself:
names = ",".join(r'\"{}\"'.format(name) for name in list_names)
print(names)
#\"Alex\",\"John\",\"Joseph J\"
print('Hi my name is ({})'.format(names))
#Hi my name is (\"Alex\",\"John\",\"Joseph J\")
This is one way using format and join:
list_names = ['Alex', 'John', 'Joseph J']
String_to_pring='Hi my name is (\\"{}\\")'.format('\\",\\"'.join(i for i in list_names))
# Hi my name is (\"Alex\",\"John\",\"Joseph J\")
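The join-based approaches above can be packaged as a reusable function. This is a sketch under the assumption that the names should be wrapped in literal backslash-quote pairs exactly as in the question; format_names is a hypothetical helper name.

```python
def format_names(names):
    """Join names as \\"name\\" items, comma-separated, inside parentheses."""
    # '\\"{}\\"' is a backslash, a double quote, the name, a backslash, a double quote.
    quoted = ",".join('\\"{}\\"'.format(name) for name in names)
    return "Hi my name is ({})".format(quoted)

print(format_names(['Alex', 'John', 'Joseph J']))
# Hi my name is (\"Alex\",\"John\",\"Joseph J\")
```

Because each name is inserted as-is rather than going through repr(), an apostrophe inside a name (such as "Alex's house") is left untouched.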

What's a more efficient way of looping with regex?

I have a list of names which I'm using to pull out of a target list of strings. For example:
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
output = ['Chris Smith', 'Kim', 'CHRIS']
So the rules so far are:
Case insensitive
Cannot match partial words (i.e. Christmas/hijacked shouldn't match Chris/Jack)
Other words in string are okay as long as name is found in the string per the above criteria.
To accomplish this, another SO user suggested this code in this thread:
[targ for targ in target_list if any(re.search(r'\b{}\b'.format(name), targ, re.I) for name in first_names)]
This works very accurately so far, but very slowly given the names list is ~5,000 long and the target list ranges from 20-100 lines long with some strings up to 30 characters long.
Any suggestions on how to improve performance here?
SOLUTION: Both of the regex-based solutions suffered from OverflowErrors, so unfortunately I could not test them. The solution that worked (from @mgilson's answer) was:
new_names = set(name.lower() for name in names)
[ t for t in target if any(map(new_names.__contains__,t.lower().split())) ]
This provided a tremendous increase in performance from 15 seconds to under 1 second.
Seems like you could combine them all into 1 super regex:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
regex_string = '|'.join(r"(?:\b"+re.escape(x)+r"\b)" for x in names)
print(regex_string)
regex = re.compile(regex_string, re.I)
print([t for t in target if regex.search(t)])
A non-regex solution which will only work if the names are a single word (no whitespace):
new_names = set(name.lower() for name in names)
[ t for t in target if any(map(new_names.__contains__,t.lower().split())) ]
the any expression could also be written as:
any(x in new_names for x in t.lower().split())
or
any(x.lower() in new_names for x in t.split())
or, another variant which relies on set.intersection (suggested by @DSM below):
[ t for t in target if new_names.intersection(t.lower().split()) ]
You can profile to see which performs best if performance is really critical, otherwise choose the one that you find to be easiest to read/understand.
*If you're using python2.x, you'll probably want to use itertools.imap instead of map if you go that route in the above to get it to evaluate lazily -- It also makes me wonder if python provides a lazy str.split which would have performance on par with the non-lazy version ...
this one is the simplest one i can think of:
[item for item in target if re.search(r'\b(%s)\b' % '|'.join(names), item)]
all together:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
results = [item for item in target if re.search(r'\b(%s)\b' % '|'.join(names), item)]
print(results)
>>>
['Chris Smith', 'Kim']
and to make it more efficient, you can compile the regex first.
regex = re.compile( r'\b(%s)\b' % '|'.join(names) )
[item for item in target if regex.search(item)]
edit
after considering the question and looking at some comments, i have revised the 'solution' to the following:
import re
names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim','Christmas is here', 'CHRIS']
regex = re.compile( r'\b((%s))\b' % ')|('.join([re.escape(name) for name in names]), re.I )
results = [item for item in target if regex.search(item)]
results:
>>>
['Chris Smith', 'Kim', 'CHRIS']
You're currently doing one loop inside another, iterating over two lists. That's always going to give you quadratic performance.
One local optimisation is to compile each name regex (which will make applying each regex faster). However, the big win is going to be to combine all of your regexes into one regex which you apply to each item in your input. See @mgilson's answer for how to do that. After that, your code performance should scale linearly as O(M+N), rather than O(M*N).
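As a rough illustration of the two single-pass approaches discussed in this thread, the sketch below builds one combined regex and, separately, a set lookup over the words of each target. Tokenising with re.findall(r"\w+", ...) is my own assumption (so that trailing punctuation does not block a match); the answers above use str.split() instead.

```python
import re

names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kim', 'Christmas is here', 'CHRIS']

# Approach 1: one compiled, case-insensitive regex with word boundaries.
pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, names)), re.IGNORECASE)
regex_hits = [t for t in target if pattern.search(t)]

# Approach 2: lowercase set membership on whole words (single-word names only).
lowered = {name.lower() for name in names}
set_hits = [t for t in target if lowered.intersection(re.findall(r"\w+", t.lower()))]

print(regex_hits)  # ['Chris Smith', 'Kim', 'CHRIS']
print(set_hits)    # ['Chris Smith', 'Kim', 'CHRIS']
```

With several thousand names the set version avoids building a very large pattern, which may be relevant given the OverflowError mentioned in the question.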

Parse out elements from a pattern

I am trying to parse the result output from a natural language parser (Stanford parser).
Some of the results are as below:
dep(Company-1, rent-5')
conj_or(rent-5, share-10)
amod(information-12, personal-11)
prep_about(rent-5, you-14)
amod(companies-20, non-affiliated-19)
aux(provide-23, to-22)
xcomp(you-14, provide-23)
dobj(provide-23, products-24)
aux(requested-29, 've-28)
The results I am trying to get are:
['dep', 'Company', 'rent']
['conj_or', 'rent', 'share']
['amod', 'information', 'personal']
...
['amod', 'companies', 'non-affiliated']
...
['aux', 'requested', "'ve"]
First I tried to directly get these elements out, but failed.
Then I realized regex should be the right way forward.
However, I am totally unfamiliar with regex. With some exploration, I got:
m = re.search('(?<=())\w+', line)
m2 =re.search('(?<=-)\d', line)
and got stuck.
The first one can correctly get the first elements, e.g. 'dep', 'amod', 'conj_or', but I actually have not totally figured out why it is working...
Second line is trying to get the second elements, e.g. 'Company', 'rent', 'information', but I can only get the number after the word. I cannot figure out how to lookbefore rather than lookbehind...
BTW, I also cannot figure out how to deal with exceptions such as 'non-affiliated' and "'ve".
Could anyone give some hints or help. Highly appreciated.
It is difficult to give an optimal answer without knowing the full range of possible outputs, however, here's a possible solution:
>>> [re.findall(r'[A-Za-z_\'-]+[^-\d\(\)\']', line) for line in s.split('\n')]
[['dep', 'Company', 'rent'],
['conj_or', 'rent', 'share'],
['amod', 'information', 'personal'],
['prep_about', 'rent', 'you'],
['amod', 'companies', 'non-affiliated'],
['aux', 'provide', 'to'],
['xcomp', 'you', 'provide'],
['dobj', 'provide', 'products'],
['aux', 'requested', "'ve"]]
It works by finding all the groups of contiguous letters ([A-Za-z] covers capital A to Z and small a to z) or the characters "_", "'" and "-" in the same line.
Furthermore, it enforces the rule that the matched string must not end with any of a given list of characters ([^...] is the syntax for "any character except these"; replace "..." with the list of characters).
The character \ escapes those characters like "(" or ")" that would otherwise be parsed by the regex engine as instructions.
Finally, s is the example string you gave in the question...
HTH!
Here is something you're looking for:
([\w-]*)\(([\w-]*)-\d*, ([\w-]*)-\d*\)
The parentheses around [\w-]* are for grouping, so that you can access the data as:
ex = r'([\w-]*)\(([\w-]*)-\d*, ([\w-]*)-\d*\)'
m = re.match(ex, line)
# group(0) is the whole match; the three captured fields are groups 1 to 3
print(m.group(1), m.group(2), m.group(3))
Btw, I recommend using the "Kodos" program, written in Python + PyQt, to learn and test regular expressions. It's my favourite tool for testing regexes.
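The sample lines in the question also contain an apostrophe token ('ve) and a trailing prime mark (rent-5'), which a [\w-] character class does not cover. A possible variation is sketched below; DEP_RE and parse_dependency are hypothetical names, and the pattern assumes every line has the form relation(word-index, word-index).

```python
import re

# Assumed shape: relation(governor-index, dependent-index), where an index
# may be followed by prime marks such as the one in rent-5'.
DEP_RE = re.compile(r"^(\w+)\((.+?)-\d+'*, (.+?)-\d+'*\)$")

def parse_dependency(line):
    """Return [relation, governor, dependent], or None if the line does not match."""
    m = DEP_RE.match(line.strip())
    return list(m.groups()) if m else None

samples = [
    "dep(Company-1, rent-5')",
    "amod(companies-20, non-affiliated-19)",
    "aux(requested-29, 've-28)",
]
for line in samples:
    print(parse_dependency(line))
```

Checking for None before using the result avoids an AttributeError on lines that do not fit the assumed shape.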
If the results from the parser are as regular as suggested, regexes may not be necessary:
from pprint import pprint

source = """
dep(Company-1, rent-5')
conj_or(rent-5, share-10)
amod(information-12, personal-11)
prep_about(rent-5, you-14)
amod(companies-20, non-affiliated-19)
aux(provide-23, to-22)
xcomp(you-14, provide-23)
dobj(provide-23, products-24)
aux(requested-29, 've-28)
"""

items = []
for line in source.splitlines():
    # Everything before the first "(" is the relation name.
    head, sep, tail = line.partition('(')
    if head:
        item = [head]
        # Split the two arguments, then drop the "-index" suffix from each.
        head, sep, tail = tail.strip('()').partition(', ')
        item.append(head.rpartition('-')[0])
        item.append(tail.rpartition('-')[0])
        items.append(item)

pprint(items)
Output:
[['dep', 'Company', 'rent'],
['conj_or', 'rent', 'share'],
['amod', 'information', 'personal'],
['prep_about', 'rent', 'you'],
['amod', 'companies', 'non-affiliated'],
['aux', 'provide', 'to'],
['xcomp', 'you', 'provide'],
['dobj', 'provide', 'products'],
['aux', 'requested', "'ve"]]
