Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 months ago.
I have a text file called my_file.txt that has the following content:
R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY
VEL R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY
ACC R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY
SOME OTHER STRING
Now, I want to find the string R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY using a regular expression, without matching either of the two other choices that have VEL or ACC in front of it. There is whitespace before those strings.
How do I get the correct match using:
re.search(regex, line)
What should the regular expression look like so that it finds the desired string but not the two other options? I would prefer not to spell out all the leading whitespace in the pattern.
For a fixed-string match like this, you don't need a regular expression.
Use this instead:
if line.strip() == "R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY":
    pass  # do stuff
If you really want to use a regular expression, use re.fullmatch() to match the whole line:
if re.fullmatch(r"\s*R\.A\.O\.S-VARIATION WITH WAVE PERIOD/FREQUENCY\s*", line):
    pass  # do stuff
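To see how both approaches behave on the four sample lines, here is a minimal sketch. The indentation inside `sample_lines` and the names `TARGET`, `TARGET_RE` and `is_target` are assumptions made for illustration; the question only says that whitespace precedes the strings.

```python
import re

TARGET = "R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY"

# re.escape() escapes the dots so they match literally; \s* absorbs any
# leading or trailing whitespace, including the trailing newline.
TARGET_RE = re.compile(r"\s*" + re.escape(TARGET) + r"\s*")

def is_target(line):
    """Return True only when the whole line is the target string."""
    return TARGET_RE.fullmatch(line) is not None

# Hypothetical file content; the amount of leading whitespace is assumed.
sample_lines = [
    "   R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY\n",
    "   VEL R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY\n",
    "   ACC R.A.O.S-VARIATION WITH WAVE PERIOD/FREQUENCY\n",
    "   SOME OTHER STRING\n",
]

regex_hits = [is_target(line) for line in sample_lines]
strip_hits = [line.strip() == TARGET for line in sample_lines]
print(regex_hits)  # [True, False, False, False]
print(strip_hits)  # [True, False, False, False]
```

Both checks reject the VEL and ACC lines because they compare the whole line, not a substring of it.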
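Alternatively, remove the leading whitespace with .lstrip() and search with a pattern anchored at the start, so that the VEL and ACC lines cannot match:
re.search(r'^R\.A\.O\.S-VARIATION WITH WAVE PERIOD/FREQUENCY', line.lstrip())
Or let the pattern skip the whitespace itself (note the escaped dots; an unescaped . matches any character):
re.findall(r'^\s*R\.A\.O\.S-VARIATION WITH WAVE PERIOD/FREQUENCY', line)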
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 months ago.
The community is reviewing whether to reopen this question as of 4 months ago.
I am trying to build a check for strings that allows the pattern "xxxx-yyyy-zzzz". The string needs to have three blocks separated by "-". Each block can contain "a-z", "A-Z", "_" (underscore) and "." (dot).
This is what I have so far:
import re
import sys
file_name: str = "actors-zero-This_Is_Fine.exe.yml"
ab = re.compile("^([A-Z][0-9])-([A-Z][0-9])-([A-Z][0-9])+$")
if ab.match(file_name):
    pass
else:
    print(f"WARNING wrong syntax in {file_name}")
    sys.exit()
Output is:
WARNING wrong syntax in actors-zero-This_Is_Fine.exe.yml
If I understand the question correctly, you want 3 groups of alphanumerical characters (plus .) separated by dashes:
Regex for this would be ^([\w.]+)-([\w.]+)-([\w.]+)$
^ matches the start of a string
Then we have 3 repeating groups:
([\w.]+) will match letters, numbers, underscores (\w) and dots (.) at least one time (+)
We make 3 of these, then separate each with a dash.
We finish off the regex with a $ to match the end of the string, making sure you're matching the whole thing.
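The pattern described above can be tried out like this. It is only a sketch: the names `NAME_RE` and `split_blocks` are made up for the example, and it uses `fullmatch()` in place of the explicit `^` and `$` anchors, which has the same effect here.

```python
import re

# Three blocks of word characters (letters, digits, underscore) or dots,
# separated by single dashes.
NAME_RE = re.compile(r"([\w.]+)-([\w.]+)-([\w.]+)")

def split_blocks(file_name):
    """Return the three blocks as a tuple, or None if the name is invalid."""
    match = NAME_RE.fullmatch(file_name)
    return match.groups() if match else None

print(split_blocks("actors-zero-This_Is_Fine.exe.yml"))
# ('actors', 'zero', 'This_Is_Fine.exe.yml')
print(split_blocks("actors-zero"))  # None (only two blocks)
print(split_blocks("a-b-c-d"))      # None (four blocks)
```

Note that `\w` also accepts digits and non-ASCII letters. If only the characters listed in the question should be allowed, `[A-Za-z_.]+` could be used for each block instead.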
What exactly is your question?
You are on the right track. Your file name triggers the warning because each ([A-Z][0-9]) group matches exactly one uppercase letter followed by one digit, so lowercase letters, underscores, and dots are never accepted.
Instead of ([A-Z][0-9]), you could use character classes:
Like so: ^([\w.]+)-([\w.]+)-([\w.]+)$
Generally, I found the chapter in Automate The Boring Stuff on regex Pattern matching very concise and helpful:
https://automatetheboringstuff.com/2e/chapter7/
That chapter also includes a table of the shorthand character classes.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a bunch of strings that look like this:
7EE1,
4NF1,
5NF4a,
8F1
They all start with a number, followed by a few characters, then another number, then another few characters. There is no limit on how many chunks there can be, and no limit on the number of consecutive characters. What I am trying to do is add "." to the string wherever it changes from letters to digits or vice versa. For example, the desired output is:
7.EE.1,
4.NF.1,
5.NF.4.a,
8.F.1
I think it can be solved with regular expression, but I haven't learned it before. I am working on creating a regex for this. Any tips would be appreciated!
Here is a very compact way of doing this using regular expressions:
import re
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
output = [re.sub(r'(\d+(?=\D)|\D+(?=\d))', r'\1.', x) for x in inp]
print(output) # ['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
The regex works by matching (and capturing) a run of either all digit characters or all non-digit characters that is followed by a character of the opposite class. It then replaces the match with whatever was captured, followed by a dot separator. Here is an explanation:
(           match AND capture:
  \d+       one or more digits
  (?=\D)    followed by a non-digit character
  |         OR
  \D+       one or more non-digits
  (?=\d)    followed by a digit character
)           stop capture
Note that the lookaheads used above are zero width, so nothing is captured from them.
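To make the explanation concrete, the following sketch lists what the group captures for one of the inputs. The helper name `captured_runs` is made up for the example.

```python
import re

pattern = re.compile(r"(\d+(?=\D)|\D+(?=\d))")

def captured_runs(text):
    """Return every run that the pattern captures, in order."""
    return [m.group(1) for m in pattern.finditer(text)]

# The final "a" is not captured because nothing follows it,
# so no dot is appended at the end of the string.
print(captured_runs("5NF4a"))        # ['5', 'NF', '4']
print(pattern.sub(r"\1.", "5NF4a"))  # 5.NF.4.a
```

Because the lookaheads consume nothing, the character that ends one run is still available to start the next one.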
One way without using re:
from itertools import groupby
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
def add_dot(string):
    # groupby splits the string into runs of digits and runs of non-digits
    return ".".join("".join(g) for k, g in groupby(string, key=str.isdigit))
[add_dot(i) for i in inp]
Output:
['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Is there a best practice for removing unusual Unicode whitespace characters from strings in Python?
For example, if a string contains one of the Unicode characters in this table, I would like to remove it.
I was thinking of putting the characters into a list and then looping over it with replace, but I'm sure there is a more Pythonic way of doing so.
You should be able to use this
[''.join(letter for letter in word if not letter.isspace()) for word in word_list]
because if you read the docs for str.isspace it says:
Return True if there are only whitespace characters in the string and there is at least one character, False otherwise.
A character is whitespace if in the Unicode character database (see unicodedata), either its general category is Zs (“Separator, space”), or its bidirectional class is one of WS, B, or S.
See the Unicode character list for category Zs for the characters this covers.
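As a small illustration of the isspace() approach, here is a sketch with a few specific characters. The sample strings and the helper name `strip_whitespace` are assumptions chosen for the example.

```python
def strip_whitespace(text):
    """Drop every character that str.isspace() classifies as whitespace."""
    return "".join(ch for ch in text if not ch.isspace())

# U+00A0 (no-break space), U+2003 (em space) and U+3000 (ideographic space)
# are all in category Zs; the tab counts through its bidirectional class.
sample = "a\u00a0b\u2003c\u3000d\te"
cleaned = strip_whitespace(sample)
print(cleaned)  # abcde

# U+200B (zero width space) is a format character (category Cf), not Zs,
# so isspace() returns False for it and it is left in place.
zero_width = "a\u200bb"
print(strip_whitespace(zero_width) == zero_width)  # True
```

If characters such as U+200B also need to go, they would have to be removed separately, for example with an explicit set of code points.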
Regex is your friend in cases like this: you can iterate over your list and apply a regex substitution.
import re
r = re.compile(r"\s+")
dirty_list = [...]
# iterate over dirty_list, replacing
# every run of whitespace with an empty string
clean_list = [
    r.sub("", s)
    for s in dirty_list
]
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
How do I match the following pattern using re?
2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490
I used Python's split() function to separate the attributes, but because the data is huge, the process gets killed due to memory errors.
It would be better if you posted a longer sample of the data.
In the meantime, here is one way to do it:
import re
line = "2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490"
# capture every field that is followed by a comma; re.DOTALL lets a field span line breaks
pattern = re.compile(r"(.*?),", re.DOTALL)
result = pattern.findall(line)
# the last field (2584490) has no trailing comma, so take it separately
result.append(line.rsplit(",", 1)[-1])
print(result)
It works...
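Since the original problem was memory, it may help to process the data one line at a time instead of holding everything at once. The following is only a sketch under assumptions: the field names (`timestamp`, `item_id`, `url`, `n1`, `n2`, `n3`) are invented because the question does not say what the columns mean, and `io.StringIO` stands in for the real file.

```python
import io
import re

# Hypothetical field names; the layout is inferred from the one sample line.
ROW_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+),"
    r"(?P<item_id>\d+),"
    r"(?P<url>.*),"  # greedy, so a URL containing commas still works
    r"(?P<n1>\d+),(?P<n2>\d+),(?P<n3>\d+)"
)

def iter_rows(lines):
    """Yield one dict per matching line without keeping earlier rows."""
    for line in lines:
        match = ROW_RE.fullmatch(line.rstrip("\n"))
        if match:
            yield match.groupdict()

# io.StringIO stands in for an open file object here.
fake_file = io.StringIO(
    "2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/"
    "my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490\n"
)

rows = list(iter_rows(fake_file))  # list() only for this small demo
print(rows[0]["timestamp"])  # 2016-02-13 02:00:00.0
print(rows[0]["n3"])         # 2584490
```

With a real file, looping over `iter_rows(open_file)` directly, without `list()`, keeps only one row in memory at a time.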
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a string without spaces, e.g. system-gnome-theme-60.0.2-1.el6.
I have to check 100 other such strings (also without spaces) for a few previously specified words, e.g. gnome or samba.
How do I do it in Python?
A word such as samba can have any prefix or suffix attached to it in the string. How do I detect it?
Currently I have done this:
for x in array_actual:
    for y in array_config:
        print(x.startswith(y))
print(ans)
This is completely wrong because it only checks the start of the string. The word can be anywhere, between any text.
Instead of using str.startswith(), use the in operator:
if y in x:
or use a regular expression with the | pipe operator:
all_words = re.compile('|'.join([re.escape(line.split(None, 1)[0]) for line in array_config]))
for x in array_actual:
    if all_words.search(x):
        pass  # x contains at least one of the words
The list comprehension takes the first whitespace-separated token of each entry in array_config and escapes it (so that metacharacters are matched literally rather than interpreted as regular expression syntax); '|'.join([...]) then combines the escaped words into one alternation. For the list ['gnome', 'samba'] this creates the pattern:
gnome|samba
matching any string that contains either word.
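Both suggestions can be compared side by side in a short sketch. The contents of `array_actual` are hypothetical apart from the first entry, which comes from the question, and the `split()` step is left out here because the words are assumed to contain no whitespace.

```python
import re

# Hypothetical sample data; only the first package name is from the question.
array_config = ["gnome", "samba"]
array_actual = [
    "system-gnome-theme-60.0.2-1.el6",
    "samba-client-3.6.9-151.el6",
    "kernel-2.6.32-358.el6",
]

# Plain substring test with the in operator.
hits_in = [x for x in array_actual if any(y in x for y in array_config)]

# Equivalent test with a single alternation pattern: gnome|samba
all_words = re.compile("|".join(re.escape(y) for y in array_config))
hits_re = [x for x in array_actual if all_words.search(x)]

print(hits_in)
# ['system-gnome-theme-60.0.2-1.el6', 'samba-client-3.6.9-151.el6']
print(hits_in == hits_re)  # True
```

For a handful of words the `in` version is the simpler choice; the compiled pattern scans each string once no matter how many words there are.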