Regular Expression for alpha/numeric characters, spaces and dashes [closed] - python

I need some help writing a couple of complex regular expressions that are way over my head.
For the first regex, I want to exclude everything except:
The letters A to Z in both upper and lowercase
Single spaces
Single Dashes (-)
For the second, I want the same as above but also allow:
The numbers 0 to 9
Apostrophes (')
Question Marks (?)
Exclamation Marks (!)
Colons & Semi-Colons (: & ;)
Periods/Fullstops & commas (. & ,)
As a side note, are there any online generators that I can type a list of allowed characters into and that will generate a regex for me?
Many thanks.

To satisfy the "single" requirement, you'll need a lookahead, along the lines of:
r1 = r"""(?xi)
    ^
    (
        [a-z]+
        |
        \x20(?!\x20)
        |
        -(?!-)
    )
    +
    $
"""
\x20(?!\x20) reads "a space, if not followed by another space".
For the second re, just add the extra characters to the first character class: [a-z0-9'?!:;.,].
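As a rough sketch (not part of the original answer), the two patterns might be compiled and checked as below. The names R1, R2 and is_valid are made up for illustration. The inner + on the letter class is dropped here, on the assumption that one character per repetition is enough; this avoids nested quantifiers.

```python
import re

# First pattern: letters, single spaces, single dashes.
# (?xi) turns on verbose and case-insensitive mode. \x20 stands for a space,
# because verbose mode ignores literal whitespace outside character classes.
R1 = re.compile(r"""(?xi)
    ^
    (?:
        [a-z]
      | \x20(?!\x20)
      | -(?!-)
    )+
    $
""")

# Second pattern: same structure, with digits and the extra punctuation
# from the question added to the character class.
R2 = re.compile(r"""(?xi)
    ^
    (?:
        [a-z0-9'?!:;.,]
      | \x20(?!\x20)
      | -(?!-)
    )+
    $
""")


def is_valid(pattern, text):
    """Return True if the whole text is accepted by the compiled pattern."""
    return pattern.match(text) is not None


for sample in ["Hello world-wide", "Hello  world", "a--b", "abc1"]:
    print(repr(sample), is_valid(R1, sample), is_valid(R2, sample))
```

Strings with a doubled space or doubled dash are rejected by both patterns, while digits and punctuation only pass the second one.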

Related

Python check string pattern follows xxxx-yyyy-zzzz [closed]

I am trying to build a check for strings that follow the pattern "xxxx-yyyy-zzzz". The string needs to have three blocks separated by "-". Each block can contain "a-z", "A-Z", "_" (underscore) and "." (dot).
This is what I have so far:
file_name: str = "actors-zero-This_Is_Fine.exe.yml"
ab = re.compile("^([A-Z][0-9])-([A-Z][0-9])-([A-Z][0-9])+$")
if ab.match(file_name):
    pass
else:
    print(f"WARNING wrong syntax in {file_name}")
    sys.exit()
Output is:
WARNING wrong syntax in actors-zero-This_Is_Fine.exe.yml
If I understand the question correctly, you want 3 groups of alphanumerical characters (plus .) separated by dashes:
Regex for this would be ^([\w.]+)-([\w.]+)-([\w.]+)$
^ matches the start of a string
Then we have 3 repeating groups:
([\w.]+) will match letters, numbers, underscores (\w) and dots (.) at least one time (+)
We make 3 of these, then separate each with a dash.
We finish off the regex with a $ to match the end of the string, making sure you're matching the whole thing.
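A minimal sketch of how that pattern could be used to validate a name and pull out the three blocks. The helper name split_blocks is hypothetical. Note that \w also accepts digits (and, for str patterns in Python 3, non-ASCII letters), so a stricter class [A-Za-z_.] is shown as an alternative, under the assumption that only the characters listed in the question should pass.

```python
import re

# Pattern from the answer: three blocks of word characters or dots.
BLOCKS = re.compile(r"^([\w.]+)-([\w.]+)-([\w.]+)$")

# Stricter variant (an assumption, not from the answer): letters, "_" and "." only.
STRICT_BLOCKS = re.compile(r"^([A-Za-z_.]+)-([A-Za-z_.]+)-([A-Za-z_.]+)$")


def split_blocks(name, pattern=BLOCKS):
    """Return the three blocks as a tuple, or None if the name does not fit."""
    m = pattern.match(name)
    return m.groups() if m else None


print(split_blocks("actors-zero-This_Is_Fine.exe.yml"))
print(split_blocks("actors-zero"))
print(split_blocks("act0rs-zero-file.yml", STRICT_BLOCKS))
```

Because "-" is not part of either character class, a name with two or four dash-separated blocks is rejected rather than split in an unexpected place.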
What exactly is your question?
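This looks alright so far. Your file name returns the warning because ([A-Z][0-9]) matches exactly one uppercase letter followed by one digit, so lowercase letters, underscores, dots and longer blocks are all rejected.
Instead of ([A-Z][0-9]), you could use character classes with a + quantifier: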
Like so: ^([\w.]+)-([\w.]+)-([\w.]+)$
Generally, I found the chapter in Automate The Boring Stuff on regex Pattern matching very concise and helpful:
https://automatetheboringstuff.com/2e/chapter7/
That chapter also includes a table of the shorthand character classes such as \w.

Adding . in string whenever it changes from character to number, or number to character in python [closed]

I have a bunch of strings that look like this:
7EE1,
4NF1,
5NF4a,
8F1
They all start with a number, followed by a few letters, then another number, then possibly a few more letters. There is no limit on how many chunks there can be, and no limit on the number of consecutive characters. What I am trying to do is add "." to the string wherever it changes from letters to digits or vice versa. For example, the desired output is:
7.EE.1,
4.NF.1,
5.NF.4.a,
8.F.1
I think it can be solved with regular expression, but I haven't learned it before. I am working on creating a regex for this. Any tips would be appreciated!
Here is a very compact way of doing this using regular expressions:
import re
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
output = [re.sub(r'(\d+(?=\D)|\D+(?=\d))', r'\1.', x) for x in inp]
print(output) # ['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
The regex works by matching (and capturing) a series of either all digit characters, or all non digit characters, which in turn is followed by a character of the opposite class. It then replaces the match with whatever was captured followed by a dot separator. Here is an explanation:
(            match AND capture:
    \d+      one or more digits
    (?=\D)   followed by a non digit character
    |        OR
    \D+      one or more non digits
    (?=\d)   followed by a digit character
)            stop capture
Note that the lookaheads used above are zero width, so nothing is captured from them.
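For comparison, a hedged alternative that is not from the answer above: the same boundaries can be located with zero-width lookarounds only, so nothing needs to be captured and re-inserted. The names BOUNDARY and add_dots are made up for this sketch.

```python
import re

# Matches the empty position between a digit and a non-digit, in either order.
BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")


def add_dots(code):
    """Insert a dot at every digit/non-digit boundary."""
    return BOUNDARY.sub(".", code)


print([add_dots(s) for s in ["7EE1", "4NF1", "5NF4a", "8F1"]])
```

Strings that contain only digits or only letters have no such boundary and come back unchanged.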
One way without using re:
from itertools import groupby
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
def add_dot(string):
    return ".".join(["".join(g)
                     for k, g in groupby(string, key=str.isdigit)])
[add_dot(i) for i in inp]
Output:
['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']

How to write a regex to capture letters separated by punctuation in Python 3? [closed]

I am new to regex and encountered a problem. I need to parse a list of last names and first names to use in a url and fetch an html page. In my last names or first names, if it's something like "John, Jr" then it should only return John but if it's something like "J.T.R", it should return "JTR" to make the url work. Here is the code I wrote but it doesn't capture "JTR".
import re
last_names_parsed=[]
for ln in last_names:
    L_name=re.match(r'\w+', ln)
    last_names_parsed.append(L_name[0])
However, this will not capture J.T.R properly. How should I modify the code to properly handle both?
You can add \. to the regular expression:
import re
final_data = [re.sub(r'\.', '', re.findall(r'(?<=^)[a-zA-Z\.]+', i)[0]) for i in last_names]
Regex explanation:
(?<=^): positive lookbehind; ensures that the ensuing regex only registers a match at the beginning of the string (a plain ^ anchor has the same effect)
[a-zA-Z\.]: matches any alphabetical character, [a-zA-Z], or a period .
+: repeats the preceding class ([a-zA-Z\.]) one or more times, so the match continues as long as a period or alphabetic character is found. For instance, in "John, Jr", only John will be matched, because the comma , is not included in the class [a-zA-Z\.], thus halting the match.
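The same idea can be sketched as a small helper, assuming Python 3. The function name clean_name is hypothetical, and returning an empty string when nothing matches is a choice made for this sketch rather than something the answer specifies.

```python
import re


def clean_name(name):
    """Keep the leading run of letters and periods, then drop the periods."""
    m = re.match(r"[A-Za-z.]+", name)  # re.match already anchors at the start
    if m is None:
        return ""  # assumption: no usable leading name part
    return m.group(0).replace(".", "")


last_names = ["John, Jr", "J.T.R", "Smith"]
print([clean_name(n) for n in last_names])
```

Using re.match removes the need for an explicit start anchor, and the None check avoids the IndexError that indexing an empty findall result would raise.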

finding a word in a string without spaces [closed]

I have a string without spaces, e.g. system-gnome-theme-60.0.2-1.el6.
I have to check 100 other such strings (also without spaces) for any of a few previously specified words, e.g. gnome, samba.
How do I do it in Python?
The word (samba, say) can have any prefix or suffix attached to it in the string. I have to detect such strings; what do I do?
Currently I have done this:
for x in array_actual:
    for y in array_config:
        print x.startswith(y)
print ans
which is completely wrong because it is checking only the first word of the string. That word can be anywhere, between any text.
Instead of using str.startswith(), use the in operator:
if y in x:
    print(x)  # x contains the word y somewhere
or use a regular expression with the | pipe operator:
import re
all_words = re.compile('|'.join([re.escape(line.split(None, 1)[0]) for line in array_config]))
for x in array_actual:
    if all_words.search(x):
        print(x)  # x contains at least one of the configured words
The '|'.join([...]) list comprehension first escapes each word (making sure that meta characters are matched literally, and are not interpreted as regular expression patterns). For the list ['gnome', 'samba'] this creates the pattern:
gnome|samba
matching any string that contains either word.
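Put together, a self-contained sketch might look like the following. The function name find_matches is hypothetical, and the sample package names other than the one from the question are invented for illustration.

```python
import re


def find_matches(names, words):
    """Return the names that contain at least one of the given words."""
    # re.escape makes sure characters like "." or "+" are matched literally.
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return [name for name in names if pattern.search(name)]


array_config = ["gnome", "samba"]
array_actual = [
    "system-gnome-theme-60.0.2-1.el6",  # from the question
    "samba-common-3.6.9-151.el6",       # invented sample
    "kernel-2.6.32-358.el6",            # invented sample
]
print(find_matches(array_actual, array_config))
```

One caveat of this sketch: an empty word list produces an empty pattern, which matches every string, so callers may want to handle that case separately.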

Finding Zip number from String [closed]

I have a string like
x = '''
Anrede:*
Herr
*Name:*
Tobias
*Firma:*
*Strasse/Nr:*
feringerweg
*PLZ/Ort:*
72531
*Mail:*
tovoe#gmeex.de [1]
'''
The string contains a zip code under PLZ/Ort:. I want to extract that zip code from the whole string. Regex seems like the way to do it, but I don't know regex.
Assuming the input in your example is a file with multiple lines, you can try something like this:
import re
matchPattern = r"^(\d{5})$"
with open(filename, 'r') as f:
    for line in f:
        match = re.match(matchPattern, line)
        if match:  # skip lines that are not exactly five digits
            print(match.group(0))  # the whole match
If this is just a long string, you can use the same match pattern but without ^ (line begin) and $ (line end) indicators --> (\d{5})
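A small sketch of the whole-string case, under the assumption that the zip code sits alone on its line: with re.MULTILINE the ^ and $ anchors can be kept, which avoids picking five digits out of a longer number. The SAMPLE text is a shortened version of the string in the question, and find_zip_codes is a hypothetical name.

```python
import re

SAMPLE = """
*Name:*
Tobias
*PLZ/Ort:*
72531
*Mail:*
"""


def find_zip_codes(text):
    """Return every line of the text that consists of exactly five digits."""
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    return re.findall(r"^\d{5}$", text, flags=re.MULTILINE)


print(find_zip_codes(SAMPLE))

# Without the anchors, five digits inside a longer number also match.
print(re.findall(r"\d{5}", "order 1234567"))
```

The anchored version returns nothing for "order 1234567", whereas the unanchored pattern picks up the first five digits of the longer number.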
I'm assuming that the Postleitzahl always follows a line that looks like *PLZ/Ort:*, separated from it only by whitespace, and that it's the only text on its line. If that's the case, then you can use something like:
import re
m = re.search(r'^\*PLZ/Ort:\*\s*(\d{5})', x, re.M)
if m:
    print(m.group(1))
You can try this regex:
(?<=PLZ\/Ort)[\s\S]+?([a-zA-Z0-9\- ]{3,9})
It also supports alphanumeric postal codes.
