Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I'm trying to build a kind of basic AI for a school project using Python's re module, and I wanted to ask whether there is a way to determine if a given pattern exists in a string, like:
string = raw_input()
"""I type in 'Hey can you check the weather?' """
and depending on whether it finds the word 'weather', it returns True or False.
I would then use it to run through a series of if statements to check what the user wants to do (weather, date, time, and other things).
Also, I would be very happy to hear from you if you have a better idea for solving this problem.
You do not need regex for such simple checks. Simply use the in operator to check for a given word in your sentence:
>>> my_string = 'Hey can you check the weather?'
>>> 'weather' in my_string
True
To check against a list of words, you can use any():
>>> words_to_check = ['hello', 'world', 'weather']
>>> any(word in my_string for word in words_to_check)
True
As mentioned by @DYZ, to do a case-insensitive match, run the check on the lowercased string (the entries in words_to_check must be lowercase as well):
# my_string.lower() converts the string to lowercase before the check
>>> any(word in my_string.lower() for word in words_to_check)
True
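For the series of if statements the question describes, a keyword table can replace the chain. This is only a minimal sketch: the INTENT_KEYWORDS table and the detect_intents helper are hypothetical names, and the keyword lists are made up for illustration. It matches whole words rather than substrings, so "today" does not trigger the "day" keyword.

```python
import re

# Hypothetical mapping from an intent name to the words that trigger it.
INTENT_KEYWORDS = {
    "weather": ("weather", "forecast"),
    "date": ("date", "day"),
    "time": ("time", "clock"),
}

def detect_intents(message):
    """Return the intents whose keywords appear as whole words in message."""
    # Lowercase first so the match is case-insensitive, then split into words.
    words = set(re.findall(r"[a-z']+", message.lower()))
    return [
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if words.intersection(keywords)
    ]

print(detect_intents("Hey can you check the weather?"))  # ['weather']
print(detect_intents("What TIME is it today?"))          # ['time']
```

Adding a new command then means adding one entry to the table instead of another if branch.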
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Is there a best practice for removing unusual Unicode whitespace characters from strings in Python?
For example, if a string contains one of the Unicode characters in this table, I would like to remove it.
I was thinking of putting the code points into a list and then looping over it with replace, but I'm sure there is a more Pythonic way of doing so.
You should be able to use this
[''.join(letter for letter in word if not letter.isspace()) for word in word_list]
because if you read the docs for str.isspace it says:
Return True if there are only whitespace characters in the string and there is at least one character, False otherwise.
A character is whitespace if in the Unicode character database (see unicodedata), either its general category is Zs (“Separator, space”), or its bidirectional class is one of WS, B, or S.
See the Unicode character list for category Zs for the characters this covers.
Regex is your friend in cases like this; you can simply iterate over your list, applying a regex substitution:
import re
r = re.compile(r"\s+")
dirty_list = [...]
# iterate over dirty_list, replacing
# every run of whitespace with an empty string
clean_list = [
    r.sub("", s)
    for s in dirty_list
]
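As a rough check of the two approaches from the answers above, the sketch below runs both on a sample string. The sample text and the helper names are assumptions made for illustration. Note that neither approach removes the zero-width space U+200B, because it is a format character and not whitespace.

```python
import re

def strip_with_isspace(text):
    # Keep only the characters that str.isspace() does not treat as whitespace.
    return "".join(ch for ch in text if not ch.isspace())

def strip_with_regex(text):
    # For str patterns in Python 3, \s also matches Unicode whitespace.
    return re.sub(r"\s+", "", text)

# Assumed sample: no-break space, em space, ideographic space, plain space.
sample = "a\u00a0b\u2003c\u3000d e"

print(strip_with_isspace(sample))  # abcde
print(strip_with_regex(sample))    # abcde

# U+200B (zero-width space) is not classified as whitespace.
print("\u200b".isspace())          # False
```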
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
What does .start() do in the following script?
import re
str2 = re.search(r"\((\w+)\)", str1)
return str1[:str2.start()].strip()
If you are more of a reader, the documentation of match.start() would tell you what it does.
If you are more of an experimenter, open an interactive python console, and input the following (feel free to change the input data, after all you are an experimenter):
>>> import re
>>> str1 = 'Hello (python) world'
>>> str2 = re.search(r"\((\w+)\)", str1)
>>> str2.start()
6
>>> str1[:6]
'Hello '
>>>
Short explanation: it tells you the index of the starting position of the match.
Hope this answer will teach you something more than just what match.start() does ;-)
From the Python documentation for the start method
https://docs.python.org/3/library/re.html
It returns the index of the start of the substring that matched.
So, str2.start() is the position in str1 where the regex match begins.
Think of that return as saying,
Return everything in str1 up to where the regex matched, with surrounding whitespace stripped.
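The snippet from the question can be wrapped in a function to see start() in context. This is a minimal sketch: the function name text_before_parens is hypothetical, and the None check is an addition, since re.search returns None when nothing matches and calling .start() on it would raise an AttributeError.

```python
import re

def text_before_parens(text):
    """Return the part of text before the first "(word)" group, stripped."""
    match = re.search(r"\((\w+)\)", text)
    if match is None:
        # No parenthesised word found: fall back to the whole string.
        return text.strip()
    return text[:match.start()].strip()

print(text_before_parens("Hello (python) world"))  # Hello

m = re.search(r"\((\w+)\)", "Hello (python) world")
print(m.start())   # 6, index of "("
print(m.end())     # 14, index just past ")"
print(m.span())    # (6, 14)
print(m.start(1))  # 7, start of the first capture group "python"
```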
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
AKA the correct version of this:
if ['hi', 'hello', 'greetings'] in userMessage:
    print('Hello!')
I tried what's shown above, but it says it cannot use lists; it must use a single string. The same thing happens if I assign the list to a variable. If I use "or", it doesn't seem to work at all.
If the goal is just to say whether any string from the known list appears in userMessage, and you don't care which one it is, use any with a generator expression:
if any(srchstr in userMessage for srchstr in ('hi', 'hello', 'greetings')):
It will short-circuit when it gets a hit, so if hi appears in the input, it doesn't check the rest, and immediately returns True.
If the words must be found as individual words (so userMessage = "This" should be false, even though hi appears in it), then use:
if not {'hi', 'hello', 'greetings'}.isdisjoint(userMessage.split()):
which also short-circuits, but in a different way; it iterates userMessage.split() until it matches one of the keywords, then stops and returns False (which the not flips to True), returning True (flipped to False by not) only if none of the words matches a keyword.
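The difference between the two checks shows up on the "This" example. A small sketch, with the function names chosen here purely for illustration:

```python
keywords = ('hi', 'hello', 'greetings')

def contains_substring(message):
    # True if any keyword occurs anywhere, even inside another word.
    return any(keyword in message for keyword in keywords)

def contains_word(message):
    # True only if a keyword appears as a whole whitespace-separated word.
    return not set(keywords).isdisjoint(message.split())

print(contains_substring("This"))    # True, because "hi" is inside "This"
print(contains_word("This"))         # False
print(contains_word("oh hi there"))  # True
```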
You can do:
found = set(['hi', 'hello', 'greetings']) & set(userMessage.split())
for obj in found:
    print(obj)
if you also want to know which of the words were found.
You can also compare multiple elements using set operations. Note that <= is a subset test, so the condition below is true only when every keyword appears in the message:
if set(['hi', 'hello', 'greetings']) <= set(userMessage.split()):
    print("Hello!")
But be careful with split(), since it does not strip punctuation. So, if your userMessage is something like "hi, hello, greetings." it will compare the words against ["hi,", "hello,", "greetings."]
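One way around the punctuation problem is to extract words with a regular expression instead of split(). The sketch below assumes that lowercasing and a \w+ tokenizer are acceptable for the messages in question; words_in is a hypothetical helper name.

```python
import re

greetings = {'hi', 'hello', 'greetings'}

def words_in(message):
    # \w+ keeps runs of letters, digits and underscores, dropping punctuation.
    return set(re.findall(r"\w+", message.lower()))

message = "Hi, hello, greetings."

# split() leaves the punctuation attached, so nothing matches.
print(set(message.split()) & greetings)         # set()

# The regex tokenizer finds all three keywords.
print(sorted(words_in(message) & greetings))    # ['greetings', 'hello', 'hi']
```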
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
using a regex.
I have never worked with regex before, but this time I need it for a parsing job.
So far I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print pattern.findall(s)
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print pattern.findall(s)
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following would solve your exact question but could fail given differently styled input. You need to provide more details, but this would be a start.
import re
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', input)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
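If the two values need to be told apart later, named groups make the result self-describing. This is a sketch under the assumption that the input always has the exact "aid: ..., cmt_id = ...;" layout from the question; PAIR_PATTERN and extract_ids are hypothetical names.

```python
import re

# Named groups label each captured number; \s* tolerates extra spacing
# between the two fields.
PAIR_PATTERN = re.compile(r'aid: "(?P<aid>\d+)",\s*cmt_id = (?P<cmt_id>\d+);')

def extract_ids(text):
    """Return a dict with 'aid' and 'cmt_id', or None if the layout differs."""
    match = PAIR_PATTERN.search(text)
    return match.groupdict() if match else None

print(extract_ids('aid: "1168577519", cmt_id = 1168594403;'))
# {'aid': '1168577519', 'cmt_id': '1168594403'}
```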
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
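If the input may contain other numbers that should be ignored, the two lookbehind patterns from the question can be merged with the | alternation operator. A minimal sketch; the extra "page 42" text in the sample is an assumption added to show that unrelated digits are skipped.

```python
import re

# Each alternative only matches digits that directly follow its own prefix.
ID_PATTERN = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')

s = 'page 42; aid: "1168577519", cmt_id = 1168594403;'

ids = ID_PATTERN.findall(s)
print(ids)  # ['1168577519', '1168594403']
```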
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a string without spaces, e.g. system-gnome-theme-60.0.2-1.el6.
I have to check which of 100 other such strings (also without spaces) contain any of a few previously specified words, e.g. gnome, samba.
How do I do it in Python?
The word (e.g. samba) can have any prefix or suffix attached to it in the string. I have to detect those cases too; what do I do?
Currently I have done this:
for x in array_actual:
    for y in array_config:
        print x.startswith(y)
print ans
which is completely wrong because it is checking only the first word of the string. That word can be anywhere, between any text.
Instead of using str.startswith(), use the in operator:
if y in x:
or use a regular expression with the | pipe operator:
import re
all_words = re.compile('|'.join([re.escape(line.split(None, 1)[0]) for line in array_config]))
for x in array_actual:
    if all_words.search(x):
        print(x)  # x contains at least one of the configured words
The list comprehension inside '|'.join([...]) takes the first whitespace-separated word of each line in array_config and escapes it (making sure that meta characters are matched literally, and are not interpreted as regular expression patterns). For the list ['gnome', 'samba'] this creates the pattern:
gnome|samba
matching any string that contains either word.
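Both suggestions can be tried side by side on a few sample names. The data below is made up for illustration; only the first string comes from the question.

```python
import re

# Assumed sample data; array_config holds the words to look for.
array_config = ["gnome", "samba"]
array_actual = [
    "system-gnome-theme-60.0.2-1.el6",
    "samba-client-3.6.9-1.el6",
    "kernel-2.6.32-1.el6",
]

# Approach 1: plain substring test with the in operator.
found_with_in = [x for x in array_actual if any(y in x for y in array_config)]

# Approach 2: one compiled alternation pattern, with each word escaped.
all_words = re.compile("|".join(re.escape(word) for word in array_config))
found_with_regex = [x for x in array_actual if all_words.search(x)]

print(all_words.pattern)  # gnome|samba
print(found_with_in)
print(found_with_regex)
```

Both lists contain the gnome and samba entries and omit the kernel one. The regex version scans each string once regardless of how many words are configured, which may matter for long word lists.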