Python check string pattern follows xxxx-yyyy-zzzz [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 months ago.
The community is reviewing whether to reopen this question as of 4 months ago.
I am trying to build a check for strings that allows the pattern "xxxx-yyyy-zzzz". The string needs to have three blocks separated by "-". Each block can contain "a-z", "A-Z", "_" (underscore) and "." (dot).
This is what I have so far:
import re
import sys

file_name: str = "actors-zero-This_Is_Fine.exe.yml"
ab = re.compile("^([A-Z][0-9])-([A-Z][0-9])-([A-Z][0-9])+$")
if ab.match(file_name):
    pass
else:
    print(f"WARNING wrong syntax in {file_name}")
    sys.exit()
Output is:
WARNING wrong syntax in actors-zero-This_Is_Fine.exe.yml

If I understand the question correctly, you want 3 groups of alphanumerical characters (plus .) separated by dashes:
Regex for this would be ^([\w.]+)-([\w.]+)-([\w.]+)$
^ matches the start of a string
Then we have 3 repeating groups:
([\w.]+) will match letters, numbers, underscores (\w) and dots (.) at least one time (+)
We make 3 of these, then separate each with a dash.
We finish off the regex with a $ to match the end of the string, making sure you're matching the whole thing.
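A minimal sketch of the pattern above in use. The helper name is_valid_name and the stricter STRICT pattern are illustrative assumptions, not part of the original answer. Because \w also matches digits, the stricter variant limits each block to the characters the question lists.

```python
import re

# Pattern from the answer: \w covers letters, digits and underscore.
LOOSE = re.compile(r"^([\w.]+)-([\w.]+)-([\w.]+)$")
# Hypothetical stricter variant: only a-z, A-Z, underscore and dot.
STRICT = re.compile(r"^([A-Za-z_.]+)-([A-Za-z_.]+)-([A-Za-z_.]+)$")


def is_valid_name(name, pattern=LOOSE):
    """Return True if name consists of three dash-separated blocks."""
    return pattern.match(name) is not None


print(is_valid_name("actors-zero-This_Is_Fine.exe.yml"))  # True
print(is_valid_name("actors-zero"))                        # False
print(is_valid_name("a1-b2-c3", STRICT))                   # False
```

Which variant fits depends on whether digits should be accepted; the question does not list them, but the answer's \w allows them.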

What exactly is your question?
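The overall structure looks alright so far. Your file name triggers the warning because ([A-Z][0-9]) matches exactly one uppercase letter followed by one digit, so lowercase letters, underscores, dots and longer blocks are all rejected.
Instead of ([A-Z][0-9]), you could use a character class with a + quantifier:
^([\w.]+)-([\w.]+)-([\w.]+)$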
Generally, I found the chapter in Automate The Boring Stuff on regex Pattern matching very concise and helpful:
https://automatetheboringstuff.com/2e/chapter7/
You will find the above table in this chapter also.


Adding . in string whenever it changes from character to number, or number to character in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a bunch of strings that look like this:
7EE1,
4NF1,
5NF4a,
8F1
They all start with a number, followed by a few characters, then another number, then another few characters. There is no limit on how many chunks they can have, and no limit on the number of consecutive characters. What I am trying to do is add "." to the string whenever it changes from character to number or vice versa. For example, the desired output is:
7.EE.1,
4.NF.1,
5.NF.4.a,
8.F.1
I think it can be solved with regular expression, but I haven't learned it before. I am working on creating a regex for this. Any tips would be appreciated!
Here is a very compact way of doing this using regular expressions:
import re

inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
output = [re.sub(r'(\d+(?=\D)|\D+(?=\d))', r'\1.', x) for x in inp]
print(output) # ['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
The regex works by matching (and capturing) a run of either all digit characters or all non-digit characters, which in turn is followed by a character of the opposite class. It then replaces the match with whatever was captured, followed by a dot separator. Here is an explanation:
( match AND capture:
\d+ one or more digits
(?=\D) followed by a non digit character
| OR
\D+ one or more non digits
(?=\d) followed by a digit character
) stop capture
Note that the lookaheads used above are zero width, so nothing is captured from them.
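Since lookarounds are zero width, the same result can also be reached by matching only the empty position where the character class changes. This is a sketch of an alternative, not the answer's own pattern; the names BOUNDARY and add_dots are made up for illustration.

```python
import re

# Match the empty position between a digit and a non-digit (either order).
BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")


def add_dots(text):
    """Insert a dot wherever the text switches between digits and non-digits."""
    return BOUNDARY.sub(".", text)


print([add_dots(s) for s in ["7EE1", "4NF1", "5NF4a", "8F1"]])
# ['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
```

Nothing is captured or consumed here, so the replacement string is just the dot itself.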
One way without using re:
from itertools import groupby

inp = ["7EE1", "4NF1", "5NF4a", "8F1"]

def add_dot(string):
    # groupby splits the string into runs of digits and runs of non-digits
    return ".".join("".join(g) for k, g in groupby(string, key=str.isdigit))

print([add_dot(i) for i in inp])
Output:
['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']

Regex- match on a specific string with any two digit integer in it? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a list of filenames that look like this:
red.t<0 padded int>z.white.blue<0 padded int>.ab00.txt2
For example:
red.t01z.white.blue12.ab00.txt2
red.t02z.white.blue45.ab00.txt2
red.t03z.white.blue09.ab00.txt2
I want to match on this sequence, for any two-digit number. The 00 near the end is constant, and it shouldn't match on any other values there; i.e., this wouldn't match: red.t03z.white.blue09.ab01.txt2.
I tried red.t[0-9]*z.white.blue[0-9]*.ab00.txt, but that only works when I have the first [0-9]* in there, the second one makes it no longer match. What is the solution to this?
You could use anchors to assert the start and end of the string, escape the dots to match them literally, and use the quantifier {2} after [0-9] to match exactly 2 digits.
^red\.t[0-9]{2}z\.white\.blue[0-9]{2}\.ab00\.txt2$
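As a rough sketch of how this could be applied in Python: re.fullmatch requires the whole string to match, so the ^ and $ anchors can be left out. The names PATTERN, names and matching are assumptions for illustration; the second and third file names are counterexamples added here.

```python
import re

PATTERN = re.compile(r"red\.t[0-9]{2}z\.white\.blue[0-9]{2}\.ab00\.txt2")

names = [
    "red.t01z.white.blue12.ab00.txt2",
    "red.t03z.white.blue09.ab01.txt2",  # ab01 instead of ab00
    "red.t1z.white.blue12.ab00.txt2",   # only one digit after t
]

# Keep only the names that match the pattern from start to end.
matching = [n for n in names if PATTERN.fullmatch(n)]
print(matching)  # ['red.t01z.white.blue12.ab00.txt2']
```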

Python Best Practice to Remove Whitespace Variations from List of Strings [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Is there a best practice to remove weird whitespace unicode characters from strings in Python?
For example if a string contains one of the following unicodes in this table I would like to remove it.
I was thinking of putting the unicodes into a list then doing a loop using replace but I'm sure there is a more pythonic way of doing so.
You should be able to use this
[''.join(letter for letter in word if not letter.isspace()) for word in word_list]
because if you read the docs for str.isspace it says:
Return True if there are only whitespace characters in the string and there is at least one character, False otherwise.
A character is whitespace if in the Unicode character database (see unicodedata), either its general category is Zs (“Separator, space”), or its bidirectional class is one of WS, B, or S.
See the Unicode character list for category Zs.
Regex is your friend in cases like this: you can simply iterate over your list, applying a regex substitution.
import re
r = re.compile(r"\s+")
dirty_list = [...]
# iterate over dirty_list, substituting
# any run of whitespace with an empty string
clean_list = [
    r.sub("", s)
    for s in dirty_list
]
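A small sketch comparing the two approaches on made-up sample strings (the contents of word_list are assumptions for illustration). For str patterns, \s and str.isspace agree on these inputs. Note that U+200B (zero width space) is not classified as whitespace, so neither approach removes it; characters like that would need to be handled explicitly.

```python
import re

# Sample data (made up for illustration): no-break space, em space,
# tab plus newline, and a zero width space.
word_list = ["foo\u00a0bar", "\u2003baz", "qux\t\n", "zero\u200bwidth"]

# Drop every character that str.isspace() reports as whitespace.
by_isspace = ["".join(ch for ch in w if not ch.isspace()) for w in word_list]

# Drop every run of characters matched by \s.
by_regex = [re.sub(r"\s+", "", w) for w in word_list]

print(by_isspace == by_regex)
print([ascii(w) for w in by_regex])
```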

Find a part of a string using python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
For example:
string = "abcdefghi"
separated = "abc" + x + "ghi"
x = ???
I want to find x, using any string.
import re
x = re.search('(?<=abc).*(?=ghi)', 'abcdefghi').group(0)
print(x)
Output:
def
Explanation
Regex
(?<=abc) #Positive look behind. Start match after abc
.* #Collect everything that matches the look behind and look ahead conditions
(?=ghi) #Positive look ahead. Match only chars that come before ghi
re.search returns a Match object. Calling group(0) on it returns the full match.
Note:
The regex is greedy, so it would match/return defghixyz in abcdefghixyzghi. Use .*? instead of .* to stop at the first ghi.
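To work with any pair of delimiters, the same idea can be wrapped in a helper. This is a sketch under assumptions: the function name between and its greedy flag are hypothetical, and re.escape is used so that delimiters are matched literally. Note that . does not match newlines unless re.DOTALL is passed.

```python
import re


def between(text, left, right, greedy=True):
    """Return the text found between left and right, or None if absent."""
    middle = ".*" if greedy else ".*?"
    # Escaped literals have a fixed width, so they are valid in a lookbehind.
    pattern = f"(?<={re.escape(left)}){middle}(?={re.escape(right)})"
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(between("abcdefghi", "abc", "ghi"))                      # def
print(between("abcdefghixyzghi", "abc", "ghi"))                # defghixyz
print(between("abcdefghixyzghi", "abc", "ghi", greedy=False))  # def
```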

finding a word in a string without spaces [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a string without space. eg system-gnome-theme-60.0.2-1.el6.
I have to check in 100 other such strings (without space) which have a few of the previously specified words; e.g. gnome, samba.
How do I do it in python?
There can be any prefix or suffix in the string attached with samba. I have to detect them, what do I do?
Currently I have done this:
for x in array_actual:
    for y in array_config:
        print x.startswith(y)
print ans
which is completely wrong because it is checking only the first word of the string. That word can be anywhere, between any text.
Instead of using str.startswith(), use the in operator:
if y in x:
or use a regular expression with the | pipe operator:
import re
all_words = re.compile('|'.join([re.escape(line.split(None, 1)[0]) for line in array_config]))
for x in array_actual:
    if all_words.search(x):
        print(x)  # x contains at least one of the configured words
The '|'.join([...]) list comprehension first escapes each word (making sure that meta characters are matched literally, and are not interpreted as regular expression patterns). For the list ['gnome', 'samba'] this creates the pattern:
gnome|samba
matching any string that contains either word.
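Both suggestions can be tried side by side. The sample contents of array_config and array_actual below are hypothetical stand-ins for the question's lists, and with_regex and with_in are names made up for this sketch.

```python
import re

# Hypothetical sample data standing in for the question's two lists.
array_config = ["gnome", "samba"]
array_actual = [
    "system-gnome-theme-60.0.2-1.el6",
    "samba-client-3.6.9-164.el6",
    "kernel-2.6.32-431.el6",
]

# One alternation pattern built from the escaped words: gnome|samba
all_words = re.compile("|".join(re.escape(word) for word in array_config))

with_regex = [x for x in array_actual if all_words.search(x)]
with_in = [x for x in array_actual if any(y in x for y in array_config)]

print(with_regex)
print(with_regex == with_in)  # True
```

For plain substring checks the in operator is enough; the compiled pattern mainly helps when many words are checked against many strings.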
