When to use Groups in Regular Expressions? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I'm a newbie and learning more about regular expressions. I'm still unclear as to why we use groups. I used them in the regular expression below:
(http:)\//(\w)+\.(\w)+\.(\w)+
This extracts URLs from a sentence such as the one below:
This is http://www.google.com, this is http://www.yahoo.com.
I did use groups, but I was very unsure why. I saw this explanation online, but I'm confused about what it means:
By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex.
So any simplified clarification of groups would be great.
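A short sketch, using only Python's standard re module on the sentence from the question, of the three things parentheses do. Apart from the question's own pattern, the patterns are illustrative choices rather than the only way to write them; (?:...) is a non-capturing group, which groups without capturing anything.

```python
import re

text = "This is http://www.google.com, this is http://www.yahoo.com."

# 1. Capturing: findall returns a tuple of the groups' text for each match.
parts = re.findall(r"http://(\w+)\.(\w+)\.(\w+)", text)
print(parts)  # [('www', 'google', 'com'), ('www', 'yahoo', 'com')]

# 2. Quantifier on a group: (?:\w+\.)+ repeats "word characters then a dot"
#    as one unit. With no capturing groups, findall returns whole matches.
urls = re.findall(r"http://(?:\w+\.)+\w+", text)
print(urls)  # ['http://www.google.com', 'http://www.yahoo.com']

# 3. Restricting alternation: inside the group, | only chooses between
#    google and yahoo; without the group, | splits the whole pattern in two.
grouped = re.fullmatch(r"www\.(?:google|yahoo)\.com", "www.yahoo.com")
ungrouped = re.fullmatch(r"www\.google|yahoo\.com", "www.yahoo.com")
print(grouped is not None, ungrouped is not None)  # True False

# Pitfall in the question's pattern: (\w)+ repeats a one-character group,
# so each group keeps only the last character it matched.
print(re.findall(r"(http:)\//(\w)+\.(\w)+\.(\w)+", text))
# [('http:', 'w', 'e', 'm'), ('http:', 'w', 'o', 'm')]
```

Writing (\w+) instead of (\w)+ puts the quantifier inside the group, so each group captures the whole word.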

When I use groups, it is usually because I need to replace some, but not all, of the text matched by a regular expression pattern.
As an example, let's say you have a large text file, and you want to change all hostnames that end in .com to end in .biz instead.
Obviously you can't just blindly replace .com with .biz, because that text could occur somewhere that isn't a hostname. So you need a way to identify just pieces of text that look like hostnames.
I won't go into the full hostname rules here, but for purposes of this example, let's pretend that hostnames are two to four sequences of alphabetic characters separated by periods, such as ibm.com or www.santa.northpole.org.
A regular expression to identify hostnames that end in .com might look like this:
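((?:[a-z]+\.){1,3})com
This means "one or more letters followed by a period, occurring one to three times, followed by com." The inner (?:...) group only lets the {1,3} quantifier apply to "letters plus period" as a unit; the outer parentheses capture all of those repetitions together. (A single capturing group with a quantifier, such as ([a-z]+\.){1,3}, would remember only its last repetition.)
The first part of the expression is inside capturing parentheses, meaning it can be handled separately from the rest. So you could have a replacement pattern like this:
\1biz
Meaning "keep the text matched by the first group unchanged and put biz after it." The captured text already ends with the period, so the result ends in .biz.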

Related

Replace spaces in locations, but not that simple [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I made a service to record some data from a train company. But over the past few months, they have changed how they record the names of train stations. For instance, there is both 'Saint-Charles' and 'Saint - Charles', so my queries against the database miss some records.
Is there a quick (and safe) way to unify both spellings? I would like to change 'Saint - Charles' to 'Saint-Charles', but I don't really know how to do it safely. I also have locations such as 'Saint James', and I don't want a rule that removes every space in a name.
Maybe a regular expression would help, but I am not familiar with them.
I use Python for my service.
Thank you for your help.
Regards,
my_str = "Saint - Charles"
# Split on "-", trim whitespace around each piece, then rejoin with "-".
converted_string = "-".join([substring.strip() for substring in my_str.split("-")])
print(converted_string)
# Output: Saint-Charles
This works by splitting the original string on "-", using .strip() to trim whitespace from the start and end of each substring, and then joining the substrings back together with "-". As a result, any spaces to the left and right of the "-" are removed.
Strings without "-" like "Saint James" will be unaffected.
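Since the question mentions regular expressions, here is a regex-based alternative (a sketch, not the answer's method): re.sub with the pattern \s*-\s* replaces a hyphen and any whitespace around it with a bare hyphen. The helper name normalize_station and the extra sample name "Aix - en - Provence" are hypothetical.

```python
import re

# Any run of whitespace, a hyphen, then any run of whitespace -> bare hyphen.
HYPHEN_WITH_SPACES = re.compile(r"\s*-\s*")

def normalize_station(name: str) -> str:
    """Hypothetical helper: unify 'Saint - Charles' and 'Saint-Charles'."""
    return HYPHEN_WITH_SPACES.sub("-", name)

names = ["Saint - Charles", "Saint-Charles", "Saint James", "Aix - en - Provence"]
print([normalize_station(n) for n in names])
# ['Saint-Charles', 'Saint-Charles', 'Saint James', 'Aix-en-Provence']
```

Unlike the split/strip version, this leaves any leading or trailing whitespace of the whole name alone; call .strip() as well if you need that.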

Combining two regular expressions with different grouping requirements [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I have two different repeated character substitution rules I'd like to combine into one regex.
I can do this in Python 3.x:
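import re

s = r'http://www.google.com/search=ooo-eeee-aa-ii-uuuu'
# Collapse runs of two or more identical a/i/u characters to one.
aiu = re.compile(r'(([aiu])\2{1,})')
# Collapse runs of three or more identical e/o characters to one.
eo = re.compile(r'(([eo])\2{2,})')
print(eo.sub(r'\2', aiu.sub(r'\2', s)))  # http://www.google.com/search=o-e-a-i-u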
This operation will be applied millions of times, so if it gives a major performance gain: is there a single regular expression that achieves what these two do, without nesting calls as I did above?
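You can combine the two substitutions with an alternation pattern. The replacement string can contain both \1 and \2, since for any given match one of them is empty and does not affect the output. (This relies on Python 3.5 or later, where a group that did not take part in the match is replaced with an empty string; earlier versions raise an error.)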
aeiou = re.compile(r'([aiu])\1{1,}|([eo])\2{2,}')
aeiou.sub(r'\1\2', s)
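Whether the single pattern is actually faster depends on the input, so it is worth measuring on your own data. A minimal sketch, assuming Python 3.5 or later: it checks that the nested and combined versions agree on the sample string and times both with timeit. The function names nested and combined and the iteration count are arbitrary choices; no timings are claimed here.

```python
import re
import timeit

s = r'http://www.google.com/search=ooo-eeee-aa-ii-uuuu'

aiu = re.compile(r'(([aiu])\2{1,})')
eo = re.compile(r'(([eo])\2{2,})')
aeiou = re.compile(r'([aiu])\1{1,}|([eo])\2{2,}')

def nested(text: str) -> str:
    """Two passes: a/i/u runs first, then e/o runs."""
    return eo.sub(r'\2', aiu.sub(r'\2', text))

def combined(text: str) -> str:
    """One pass with alternation; the unmatched group substitutes as ''."""
    return aeiou.sub(r'\1\2', text)

# Both approaches should agree on the sample input.
assert nested(s) == combined(s)

for fn in (nested, combined):
    secs = timeit.timeit(lambda: fn(s), number=20_000)
    print(f"{fn.__name__}: {secs:.3f}s for 20,000 calls")
```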

Guidance on basic python assignment [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need to write Python code that produces a list of tuples of the form (searched word, list of occurrences).
The searched words are listed in a Thesaurus, and they need to be searched for in a series of documents in a Corpus.
Any suggestion/guidance?
After you read the file, you could simply split it on whitespace to get a list of words. However, this list would still include punctuation. To remove the punctuation, you could get the punctuation characters from the string module's punctuation attribute and replace each occurrence of them in the word list with the empty string, "". Your words might also contain special symbols, such as "/" to represent "or"; in that case you would need regular expressions to extract the words.
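A minimal sketch of that approach, with loud assumptions: the thesaurus is a plain list of words, the corpus is a dict mapping document names to their text, matching is case-insensitive, and an "occurrence" is recorded as a (document name, word position) pair. Punctuation is deleted with str.translate and a table built by str.maketrans, which is one way of doing the replacement described above. The sample data and the function find_occurrences are hypothetical.

```python
import string
from collections import defaultdict

# Hypothetical inputs: a thesaurus of search words and a corpus mapping
# document names to their text.
thesaurus = ["regex", "group"]
corpus = {
    "doc1.txt": "A regex group, or a capture group.",
    "doc2.txt": "No match here; regex only.",
}

# Translation table that deletes every character in string.punctuation.
strip_punct = str.maketrans("", "", string.punctuation)

def find_occurrences(words, docs):
    """Return [(word, [(doc_name, word_index), ...]), ...] in thesaurus order."""
    targets = {w.lower() for w in words}
    hits = defaultdict(list)
    for name, text in docs.items():
        # Split on whitespace, then drop punctuation and normalise case.
        tokens = [t.translate(strip_punct).lower() for t in text.split()]
        for index, token in enumerate(tokens):
            if token in targets:
                hits[token].append((name, index))
    return [(w, hits[w.lower()]) for w in words]

print(find_occurrences(thesaurus, corpus))
# [('regex', [('doc1.txt', 1), ('doc2.txt', 3)]), ('group', [('doc1.txt', 2), ('doc1.txt', 6)])]
```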

How to extract an interrogative sentence from a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string, for example:
"This is a string.Is this a question?What is the Question? I Dont know what the question is. Can you please list out the question?"
I want to extract the questions from this text using a regex.
What I tried:
re.findall(r'(how|can|what|where|describe|who|when)(.*?)\s*\?', message, re.I | re.M)
But it returns other text as well, and when it does find a question, it separates the keyword (how, what, which, etc.) from the rest of the question.
For the example above, my output is:
[('is', ' is a string.Is this a question'), ('What', ' is the Question'), ('what', ' the question is. Can you please list out the question')]
Instead, I want each entire question returned in one piece.
Searching for keywords is an impractical way to determine whether a sentence is a question. Given your list, how|can|what|where|describe|who|when, I can easily write sentences that contain one of those words but are not questions!
There are many ways you could tackle matching a sentence. For example, taking this as a baseline:
^\s*[A-Za-z,;'"\s]+[.?!]$
We could first alter it to match multiple sentences in the same string:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+[.?!]
This uses a look-behind to ensure that a sentence has just finished (unless we're at the start of the string).
And then adjust it to match only sentences which end with ?:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+\?
Here is an online demo of my regex, on your original string.
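To get each entire question in one piece, you can enclose the whole pattern in parentheses. Note that re.findall returns the contents of the capturing groups instead of the whole match whenever the pattern contains any, so it is simpler to make the (^|...) group non-capturing with (?:^|...) and let findall return the whole match.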
Here is another, simplified version:
\b([A-Z][^.!]*[?])
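A short Python check of both patterns on the question's string with re.findall. Two adjustments are assumptions of this sketch: the first pattern's (^|...) group is made non-capturing, because findall returns the captured groups instead of the whole match when a pattern has any; and the simplified pattern also excludes ? from the middle ([^.!?]*), because with [^.!]* the back-to-back questions "Is this a question?What is the Question?" come back as a single match.

```python
import re

text = ("This is a string.Is this a question?What is the Question? "
        "I Dont know what the question is. Can you please list out the question?")

# Sentence-based pattern from the answer, with the (^|...) group made
# non-capturing (?:...) so findall returns each whole match.
sentence_q = re.compile(r"(?:^|(?<=[.?!]))\s*[A-Za-z,;'\"\s]+\?")
questions_1 = [m.strip() for m in sentence_q.findall(text)]

# Simplified pattern, with ? also excluded from the middle so that
# back-to-back questions are returned separately.
simple_q = re.compile(r"\b[A-Z][^.!?]*\?")
questions_2 = simple_q.findall(text)

print(questions_1)
print(questions_2)
# Both print:
# ['Is this a question?', 'What is the Question?', 'Can you please list out the question?']
```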
Thank you for helping me out. The answer was provided by Fredrik and can be found here: https://regex101.com/r/rT1mQ0/2
\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)
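To try this keyword pattern in Python, pass re.I: the keywords are written in lowercase, but the sample questions start with "What" and "Can". Because the pattern has exactly one capturing group, re.findall returns just that group, without the leading whitespace. As the earlier answer warns, a question containing none of the keywords, such as "Is this a question?", is not returned. The variable names are illustrative.

```python
import re

text = ("This is a string.Is this a question?What is the Question? "
        "I Dont know what the question is. Can you please list out the question?")

# Keyword-based pattern from regex101; re.I makes the keywords match "What"/"Can".
keyword_q = re.compile(
    r"\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)", re.I
)
print(keyword_q.findall(text))
# ['What is the Question?', 'Can you please list out the question?']
```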

Capitalize letter in the middle of a string using python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have been using the following code to capitalize words:
with open("capitalize.txt") as f:
    for line in f:
        # Each line already ends with "\n", so suppress print's own newline.
        print(line.title(), end="")
It works fine, but I want to be able to capitalize letters in the middle of a word, e.g. change javascript to JavaScript. How can I do this using Python?
It seems that you're not describing an algorithmic transformation (e.g. first letter, last letter, word boundaries) but rather an arbitrary capitalization scheme for known words.
As such, you'll probably want some variation of the following, using replace:
with open("capitalize.txt") as f:
    for line in f:
        # Each line already ends with "\n", so suppress print's own newline.
        print(line.replace("javascript", "JavaScript"), end="")
If you've got a known set of words, then you can make it fancier, such as creating a dict {'javascript': 'JavaScript'} and then looping through the keys replacing each key with its value, but the basic approach will be more manual than you're envisioning.
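A minimal sketch of that dict-based approach. The "github" entry and the helper name fix_capitalization are hypothetical. Note that str.replace matches substrings, so any word that merely contains a key would also be changed.

```python
# Hypothetical mapping from lowercase words to their preferred spelling.
CAPITALIZATION = {
    "javascript": "JavaScript",
    "github": "GitHub",
}

def fix_capitalization(line: str) -> str:
    """Apply every known replacement to one line of text."""
    for word, preferred in CAPITALIZATION.items():
        line = line.replace(word, preferred)
    return line

print(fix_capitalization("I host my javascript code on github."))
# I host my JavaScript code on GitHub.
```

Inside the file loop from the question, print(fix_capitalization(line), end="") would apply it line by line.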
