Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
For example:
string = "abcdefghi"
separated = "abc" + x + "ghi"
x = ???
I want to find x, using any string.
x=re.search('(?<=abc).*(?=ghi)','abcdefghi').group(0)
print(x)
output
def
Explanation
Regex
(?<=abc) #Positive look behind. Start match after abc
.* #Collect everything that matches the look behind and look ahead conditions
(?=ghi) #Positive look ahead. Match only chars that come before ghi
re.search documentation here.
A Match Object is returned by re.search. A group(0) call on it would return the full match. Detail on Match Object can be found here.
Note:
The regex is aggressive so would match/return defghixyz in abcdefghixyzghi.
See demo here.
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 5 months ago.
The community is reviewing whether to reopen this question as of 4 months ago.
Improve this question
I am trying to build a check for strings that allows a pattern "xxxx-yyyy-zzzz". The string needs to have three blocks seperated by "-". Each block can contain "a-z", "A-Z", "_" (underscore) and "." (dot).
This is what I got to now:
file_name: str = "actors-zero-This_Is_Fine.exe.yml"
ab = re.compile("^([A-Z][0-9])-([A-Z][0-9])-([A-Z][0-9])+$")
if ab.match(file_name):
pass
else:
print(f"WARNING wrong syntax in {file_name}")
sys.exit()
Output is:
WARNING wrong name in actors-zero-This_Is_Fine.exe.yml
If I understand the question correctly, you want 3 groups of alphanumerical characters (plus .) separated by dashes:
Regex for this would be ^([\w.]+)-([\w.]+)-([\w.]+)$
^ matches the start of a string
Then we have 3 repeating groups:
([\w.]+) will match letters, numbers, underscores (\w) and dots (.) at least one time (+)
We make 3 of these, then separate each with a dash.
We finish off the regex with a $ to match the end of the string, making sure you're matching the whole thing.
What exactly is your question?
This looks alright so far. Your file name returns the warning because you have not specified the underscores in the third "block".
Instead of ([A-Z][0-9]). you could use character classes:
Like so: ^([\w.]+)-([\w.]+)-([\w.]+)$
Generally, I found the chapter in Automate The Boring Stuff on regex Pattern matching very concise and helpful:
https://automatetheboringstuff.com/2e/chapter7/
You will find the above table in this chapter also.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 months ago.
Improve this question
How can we reduce a string like haaaaaaapppppyyyyyy to haappyy
Such that repetition is allowed to a maximum of twice in a row for a character in a string?
including any character ( special characters also )
converting --------------------- to --
We can use a regex replacement:
inp = "haaaaaaapppppyyyyyy"
output = re.sub(r'(\w)\1{2,}', r'\1\1', inp)
print(output) # haappyy
The above logic matches any one character which is followed by itself two or more times. It then replaces with just two of the character.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I want to remove links in the format Reddit uses
comment = "Hello this is my [website](https://www.google.com)"
no_links = RemoveLinks(comment)
# no_links == "Hello this is my website"
I found a similar question about the same thing, but I don't know how to translate it to python.
I am not that familiar with regex so I would appreciate it if you explained what's happening.
You could do the following:
import re
pattern = re.compile('\[(.*?)\]\(.*?\)')
comment = "Hello this is my [website](https://www.google.com)"
print(pattern.sub(r'\1', comment))
The line:
pattern = re.compile('\[(.*?)\]\(.*?\)')
creates a regex pattern that will search for anything surrounded by square brackets, followed by anything surrounded by parenthesis, the '?' indicates that they should match as little text as possible (non-greedy).
The function sub(r'\1', comment) replaces a match by the first capturing group in this case the text inside the brackets.
For more information about regex I suggest you read this.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
How do I match the following pattern using re?
2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490
I used python's split() function to separate the attributes out but as the data is huge, the process is getting killed due to memory errors.
If you put the long version of string it would be better.
So how can you make it ? That is the answer:
import re
str = "2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490"
pattern = re.compile("(.*?),", re.DOTALL) #we use re.DOTALL to continue splitting after endlines.
result = pattern.findall(str) #we can't find the last statement (2584490) because of the pattern so we will apply second process
pattern = re.compile("(.*?)", re.DOTALL)
str2 = str[-50:-1]+str[-1] #we take last partition of string to find out last statement by using split() method
result.append(str2.split(",")[-1])
print result
It works...
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have a content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all number sequence:
1168577519
1168594403
by regex.
I have never meet regex problem, but this time I should use it to do some parse job.
Now I can just get sequence after "aid" and "cmt_id" respectively. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print pattern.findall(s)
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print pattern.findall(s)
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following would solve your exact question but could fail given different styled input. You need to provide more details, but this would be a start.
re_content = re.search("aid\: \"([0-9]*?)\",\W*cmt_id = ([0-9]*?);", input)
print re_content.groups()
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.