How to extract an interrogative sentence from a string [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string, for example:
"This is a string.Is this a question?What is the Question? I Dont know what the question is. Can you please list out the question?"
I want to extract the questions from this text using a regex.
What I tried:
re.findall(r'(how|can|what|where|describe|who|when)(.*?)\s*\?',message,re.I|re.M)
But it returns other things as well, and when it does find a question, it separates the leading keyword (how, what, which, etc.) from the rest of the question.
For the above example my output is:
[('is', ' is a string.Is this a question'), ('What', ' is the Question'), ('what', ' the question is. Can you please list out the question')]
Whereas I want each entire question kept together.

It's totally impractical to search for key words when determining whether a sentence is a question. Given your list: how|can|what|where|describe|who|when, I can easily write sentences containing one of those words, which are not questions!
There are many ways you could tackle matching a sentence. For example, taking this as a baseline:
^\s*[A-Za-z,;'"\s]+[.?!]$
We could first alter it to match multiple sentences in the same string:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+[.?!]
This uses a look-behind to ensure that a sentence has just finished (unless we're at the start of the string).
And then adjust it to match only sentences which end with ?:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+\?
Here is an online demo of my regex, on your original string.
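For reference, a minimal sketch of how the last pattern could be applied in Python to the string from the question. The capturing group is swapped for a non-capturing one here (my own adjustment, not part of the pattern above) so that the whole match is returned, and the leading whitespace picked up by \s* is stripped afterwards.

```python
import re

message = ("This is a string.Is this a question?What is the Question? "
           "I Dont know what the question is. Can you please list out the question?")

# Same pattern as above, but with a non-capturing group (an adjustment for
# this sketch) so each match object covers the whole sentence.
QUESTION = re.compile(r"""(?:^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+\?""")

# finditer yields non-overlapping matches; strip() drops the leading space
# that \s* consumes after ". ".
questions = [m.group(0).strip() for m in QUESTION.finditer(message)]
print(questions)
# ['Is this a question?', 'What is the Question?', 'Can you please list out the question?']
```

Note that the character class only allows letters and a few punctuation marks, so a question containing digits would not be matched by this sketch.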

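To keep the entire question together, you should just enclose the whole pattern in parentheses.
Here is another, simplified version (the middle character class also excludes ?, so that two adjacent questions are not merged into one match):
\b([A-Z][^.!?]*[?])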

Thank you for helping me out.
The answer was provided by #Fredrik and can be found here: https://regex101.com/r/rT1mQ0/2
\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)
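A quick way to try this pattern in Python, as a sketch only. It assumes the re.I flag from the original attempt is kept, and it uses the sample string from the question. Because the pattern has a single capturing group, re.findall returns each question as one string.

```python
import re

message = ("This is a string.Is this a question?What is the Question? "
           "I Dont know what the question is. Can you please list out the question?")

# The keyword-based pattern from this answer; re.I is carried over from the
# original attempt so that "What" and "Can" match the lowercase keywords.
KEYWORD_QUESTION = re.compile(
    r"\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)",
    re.I,
)

found = KEYWORD_QUESTION.findall(message)
print(found)
# ['What is the Question?', 'Can you please list out the question?']
```

"Is this a question?" is not returned, because it contains none of the listed keywords. That is the limitation of the keyword approach pointed out in the other answer.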

Related

Replace spaces in location names, but not a simple replacement [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I made a service that records data from a train company. Over the past few months, they have changed how they write the names of train stations. For instance, there are now both 'Saint-Charles' and 'Saint - Charles', so queries against my database return incomplete results.
I would like to know if there is a quick (and safe) way to unify the two spellings. I would like to change 'Saint - Charles' to 'Saint-Charles', but I don't really know how to do it safely. I also have other locations such as 'Saint James', and I don't want a rule that replaces the space between words.
Maybe a regular expression would help, but I am not familiar with them.
I use Python for my service.
Thank you for your help.
Regards,
my_str = "Saint - Charles"
converted_string = "-".join([substring.strip() for substring in my_str.split("-")])
print(converted_string)
Saint-Charles
This works by splitting the original string on "-", calling .strip() on each substring to trim the spaces at its start and end, and finally joining the substrings back together with "-". The result is that the spaces to the left and right of each "-" are removed.
Strings without "-" like "Saint James" will be unaffected.
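Since the question mentions regular expressions, here is an alternative sketch using re.sub that does the same thing in one step. The function name normalize_station and the sample list are made up for illustration.

```python
import re

def normalize_station(name):
    """Remove any whitespace directly around hyphens; leave other spaces alone."""
    return re.sub(r"\s*-\s*", "-", name)

# Hypothetical sample data for illustration only.
stations = ["Saint - Charles", "Saint-Charles", "Saint James", "Saint -Jean"]
print([normalize_station(s) for s in stations])
# ['Saint-Charles', 'Saint-Charles', 'Saint James', 'Saint-Jean']
```

Only whitespace touching a hyphen is removed, so names like "Saint James" pass through unchanged.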

Splitting strings in 80%/20% parts [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a list of strings and I only want to keep the first 80% of text of each string. So, if a string has for example 100 words, I only want to keep the first 80 words. The split function is not suitable for this problem.
What function can I use, while iterating over the list, to achieve this?
Why isn't split suitable? It does the job:
sentence = "long string lots of words..."
parts = sentence.split()
newsentence = ' '.join(parts[:len(parts)*4//5])
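Applied to a whole list, the same idea could look like the sketch below. The helper name keep_first_80_percent and the sample strings are my own. Note that split() followed by join() collapses runs of whitespace into single spaces.

```python
def keep_first_80_percent(text):
    """Return the first 80% of the whitespace-separated words in text."""
    words = text.split()
    keep = len(words) * 4 // 5  # integer arithmetic, rounds down
    return " ".join(words[:keep])

# Hypothetical sample data: a 5-word string and a 100-word string.
texts = ["one two three four five", " ".join(str(i) for i in range(100))]
shortened = [keep_first_80_percent(t) for t in texts]

print(shortened[0])               # one two three four
print(len(shortened[1].split()))  # 80
```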

Guidance on basic python assignment [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need to write Python code that produces a list of tuples (searched word, list of occurrences).
The searched words are listed in a thesaurus and need to be searched for in a series of documents in a corpus.
Any suggestions or guidance?
After you read the file, you can split on whitespace to get a list of words. This will include punctuation, however. To remove it, take the punctuation characters from string.punctuation (in the standard string module) and replace each occurrence in the words with the empty string, "". Your words might also contain special symbols such as "/" to represent "or"; in that case you would need regular expressions to extract the words.
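A minimal sketch of that approach follows. The question does not say what an "occurrence" is, so this assumes it means a (document name, word index) pair, and that the corpus is already loaded into a dict of document name to text. The names find_occurrences, thesaurus and corpus are placeholders.

```python
import string

def find_occurrences(thesaurus, corpus):
    """Return [(word, [(doc_name, word_index), ...]), ...] for each thesaurus word."""
    # Translation table that deletes every character in string.punctuation.
    strip_punct = str.maketrans("", "", string.punctuation)
    results = []
    for term in thesaurus:
        hits = []
        for doc_name, text in corpus.items():
            # Remove punctuation, lowercase, then split on whitespace.
            words = text.translate(strip_punct).lower().split()
            hits.extend((doc_name, i) for i, w in enumerate(words)
                        if w == term.lower())
        results.append((term, hits))
    return results

# Hypothetical data for illustration only.
corpus = {"a.txt": "The cat sat. A cat!", "b.txt": "Dogs and cats"}
print(find_occurrences(["cat", "dog"], corpus))
# [('cat', [('a.txt', 1), ('a.txt', 4)]), ('dog', [])]
```

This matches whole words only, so "cats" does not count as an occurrence of "cat". Re-splitting every document for every term is wasteful for a large corpus, where you would tokenize each document once.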

In Python, how do you locate a specific word in a file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I know you can use seek() to find a specific byte and start reading from there. How do you find a specific word in a file and start reading from there? For example, how do I start reading a file from the word 'Origin'? Thanks for any help!
You can implement this efficiently by using the same algorithm that grep uses to find words. This is the Boyer-Moore string search algorithm.
Fundamentally, you compare against the last letter of the target word first. Build a table of the letters in the target word, then inspect letters in the file using seek. If the letter you inspect does not occur in the word, no match can overlap that position, so you can skip ahead by the full length of the word and test again. If the letter does occur in the word, use its possible positions within the word to decide how far to shift. If you find the last letter, move back to the expected start of the word and check that the whole word matches.
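To make the skipping idea concrete, here is a sketch of the Boyer-Moore-Horspool variant, a simplified relative of Boyer-Moore that keeps only the skip table for the letter under the last position. It searches text already read into memory rather than using seek on the file, and the name horspool_find and the sample text are my own. In everyday Python code, text.find("Origin") gives the same index with no extra code.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each letter of the pattern (except the final one), how far the
    # window may shift when that letter is aligned with the window's end.
    # Later (rightmost) occurrences overwrite earlier ones.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Letters that are not in the table allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1

# Hypothetical file content for illustration only.
content = "Header line\nOrigin: somewhere\nrest of file"
start = horspool_find(content, "Origin")
print(content[start:])  # everything from 'Origin' onward
```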

Capitalize letter in the middle of a string using python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have been using the following code to capitalize words:
with open("capitalize.txt") as f:
    for line in f:
        print line.title(),
It works fine, but I want to be able to capitalize letters in the middle of a string, e.g. change javascript to JavaScript. How can I do this using Python?
It seems that you're not describing an algorithmic transformation (e.g. first letter, last letter, word boundaries, etc.) but rather an arbitrary capitalization scheme in the context of known words.
As such, you'll probably want a variation of the following using replace:
with open("capitalize.txt") as f:
    for line in f:
        print line.replace("javascript", "JavaScript")
If you've got a known set of words, then you can make it fancier, such as creating a dict {'javascript': 'JavaScript'} and then looping through the keys replacing each key with its value, but the basic approach will be more manual than you're envisioning.
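The dict-based version could be sketched like this. It is written for Python 3, unlike the Python 2 snippets above, and the function name and the second dictionary entry are made up for illustration.

```python
# Known words mapped to their preferred capitalization (hypothetical entries).
KNOWN_WORDS = {"javascript": "JavaScript", "github": "GitHub"}

def apply_known_capitalizations(line, known_words=KNOWN_WORDS):
    """Replace each known lowercase word with its preferred spelling."""
    for lower, proper in known_words.items():
        line = line.replace(lower, proper)
    return line

print(apply_known_capitalizations("learn javascript on github"))
# learn JavaScript on GitHub
```

Because str.replace works on substrings, a key that occurs inside a longer word will be replaced there too, so choose the keys with that in mind.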
