My program is given an object with parameters, and I need to get the parameters' values.
The object my program is given will look like this:
Object = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""
(all parameters might or might not exist)
I am trying to get the parameters' values.
For the first three parameters it's pretty easy. A simple regex will find them:
import re
if "parameter1" in Object:
    parameter1 = re.split(r"parameter1=(.*?)[\|\}]", Object)[1]
if "parameter2" in Object:
    parameter2 = re.split(r"parameter2=(.*?)[\|\}]", Object)[1]
and so on.
The problem is with parameter4: the above regex (parameter4=(.*?)[\|\}]) will only return this is some [[problem, since the regex stops at the first vertical bar.
Now here is the thing: a vertical bar will only appear as part of the text inside "[[]]".
For example, parameter1=a[[b|c]]d might appear, but parameter1=a|bc| will never appear.
I need a regex which will stop at a vertical bar unless it is inside double square brackets. So for example, for parameter4, I should get this is some [[problem|problematic text]], Houston, we have a problem!
It worked for me when I removed the "?":
parameter4 = re.split(r"parameter4=(.*)[\|\}]", object_)[1]
I also changed the name of the variable to "object_" because "object" is a built-in name in Python.
Best.
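A quick check of the greedy version against the sample from the question suggests why it works there and where it may not. By default `.` does not match a newline, so `.*` runs to the end of the line and then backtracks to the last `|` or `}` on that line. The sketch below assumes each parameter sits on its own line, as in the sample; the helper name `greedy_value` is made up for illustration.

```python
import re

object_ = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""


def greedy_value(name, text):
    """Hypothetical helper: greedy capture up to the last '|' or '}' on the line."""
    match = re.search(re.escape(name) + r"=(.*)[|}]", text)
    return match.group(1) if match else None


# One parameter per line: the value of parameter4 comes back intact.
p4 = greedy_value("parameter4", object_)

# Last parameter: the capture keeps one of the two closing braces.
last = greedy_value("otherParameters", object_)

# Everything on one line: the greedy capture runs into the next parameter.
one_line = greedy_value("parameter1", "{{objectName|parameter1=a|parameter2=b}}")

print(p4)
print(last)
print(one_line)
```

So the greedy form seems safe only when every parameter ends its own line with a single `|`; the last parameter and single-line input both need extra care.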
Apparently, there is no perfect solution.
For other readers possibly reading this question in the future, the closest solution is, as pointed out by Wiktor Stribiżew in the comments, parameter4=([^[}|]*(?:\[\[.*?]][^[}|]*)*).
This regex will only work if the parameter text does not contain a single [, } or |, but it may contain [[...]] substrings.
If you want to understand this regex better, you might want to have a look here: https://regex101.com/r/bWVvKg/2
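As a minimal sketch, the same pattern can be wrapped in a small function that works for any parameter name. The function name `get_param` is an assumption made for this example, and the square brackets are escaped inside the character classes only to avoid Python's "possible nested set" warning; the pattern is otherwise the one quoted above.

```python
import re

OBJECT = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""

# Plain text without '[', '}' or '|', optionally interleaved with [[...]] blocks.
VALUE = r"([^\[}|]*(?:\[\[.*?\]\][^\[}|]*)*)"


def get_param(text, name):
    """Hypothetical helper: return the value of `name`, or None if it is absent."""
    match = re.search(re.escape(name) + "=" + VALUE, text)
    return match.group(1) if match else None


print(get_param(OBJECT, "parameter1"))
print(get_param(OBJECT, "parameter4"))
print(get_param(OBJECT, "missing"))
```

Returning None for a missing parameter replaces the separate `if "parameterN" in Object` checks from the question.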
I'm really sorry for asking, because there are some questions like this around, but I can't adapt their answers to my problem.
These are the input lines (e.g. from a config file):
profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9
I just want to extract the values behind the = sign with a Python 3.9 regex.
I tried this on regex101.
^profile[0-9]\.name=(.*?)
But this gives me the variable name including the = sign as the result, e.g. profile2.name=. I want exactly the opposite.
The expected results (what Python's re.findall() should return) are
['share2', 'share8', 'shareSSH', 'share9']
Try the pattern profile\d+\.name=(.*) (see the Regex 101 example):
import re
# txt holds the input lines shown in the question
re.findall(r'profile\d+\.name=(.*)', txt)
# output
['share2', 'share8', 'shareSSH', 'share9']
But this problem doesn't necessarily need regex, split should work absolutely fine:
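A minimal sketch of a split-based approach, assuming the input is a single string with one `name=value` pair per line (the variable names are chosen for this example):

```python
txt = """profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9"""

# Split each line once at the first '=' and keep the right-hand side.
values = [line.split("=", 1)[1] for line in txt.splitlines() if "=" in line]

print(values)
```

Passing `1` as the second argument to `split` keeps any further `=` signs inside the value intact.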
Try removing the ? quantifier. It makes your capture group lazy, so the group matches an empty string.
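To see the difference, here is a small comparison of the lazy and greedy capture groups on the input from the question. The `re.MULTILINE` flag is an addition for this sketch so that `^` matches at the start of every line.

```python
import re

txt = """profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9"""

# Lazy group: nothing follows it in the pattern, so it is satisfied with zero characters.
lazy = re.findall(r"^profile[0-9]\.name=(.*?)", txt, re.MULTILINE)

# Greedy group: consumes the rest of the line.
greedy = re.findall(r"^profile[0-9]\.name=(.*)", txt, re.MULTILINE)

print(lazy)
print(greedy)
```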
Currently the string I want to change looks like ABCDEFGHIJKL
I'm looking to change it to AB CDEF GHIJ KL
I checked around, but I was only able to find help on inserting spaces at regular intervals.
With no particular rules defined, I can only see the following approach at the moment:
string_with_spaces = f"{string[:2]} {string[2:6]} {string[6:10]} {string[10:]}"
Based on what you have said in your question, you are not interested in adding spaces at regular intervals.
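If the group sizes change later, the slicing above could be generalized along these lines. This is only a sketch: the function name `insert_spaces` and the idea of passing the sizes as a list are assumptions, not something stated in the question.

```python
def insert_spaces(string, sizes):
    """Hypothetical helper: split `string` into chunks of the given sizes, joined by spaces."""
    parts = []
    pos = 0
    for size in sizes:
        parts.append(string[pos:pos + size])
        pos += size
    # Keep whatever is left over as a final chunk.
    if pos < len(string):
        parts.append(string[pos:])
    return " ".join(part for part in parts if part)


string_with_spaces = insert_spaces("ABCDEFGHIJKL", [2, 4, 4, 2])
print(string_with_spaces)
```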
I am trying to use discord.utils.escape_mentions to get rid of mentions in message.content.
Long story short, I noticed that it's not working as I expect.
var = discord.utils.escape_mentions("test #!334765815435886592 test")
print(var)
This prints the source string unchanged:
test #!334765815435886592 test
However, here is the escape_mentions definition
return re.sub(r'#(everyone|here|[!&]?[0-9]{17,20})', '#\u200b\\1', text)
and if I just copy that and replace '#\u200b\\1' with an empty string, everything works and I get the result I want:
test test
Can someone explain this behavior and how I can get this function to work?
I believe you're misunderstanding what escape_mentions really does. It looks like you expect it to replace #[0-9]+ with an empty string, which it does not do.
re.sub takes 3 arguments -- a pattern, the text to replace it with (in a sense), and the text to operate on. Take a look at this:
As you can see, I get <#\u200b700796664276844612> as my output. The interesting thing is, if I were to print it, I get this instead:
Notice how it looks the same as my original text. The reason is that \u200b is a zero-width space, which is invisible when printed.
So, in reality, escape_mentions inserts a zero width space in between the # and the id. So in discord, instead of it being a mention, it will merely be text.
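The effect can be checked without Discord at all. The sketch below reuses the pattern exactly as quoted in the question rather than importing `discord`, so it only approximates what the library does. The variable names are made up for this example.

```python
import re

# Pattern and replacement copied from the definition quoted in the question.
PATTERN = r'#(everyone|here|[!&]?[0-9]{17,20})'

source = "test #!334765815435886592 test"

# What escape_mentions is described as doing: insert a zero-width space after '#'.
escaped = re.sub(PATTERN, '#\u200b\\1', source)

# What the question expected: remove the mention entirely.
stripped = re.sub(PATTERN, '', source)

print(escaped == source)            # the strings differ even though they look alike
print(repr(escaped))                # repr makes the hidden character visible
print(len(escaped) - len(source))   # exactly one extra character
print(repr(stripped))
```

If the goal is to delete mentions rather than neutralize them, a separate `re.sub` with an empty replacement, as in `stripped`, is closer to what the question asks for.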
I have already read this: Why doesn't Python have multiline comments?
So in IDLE, I wrote a comment:
Hello#World
Anything after the d of World is also part of the comment. In C++, I am aware of a way to close a comment, like this:
/*Mycomment*/
Is there a way to end a comment in Python?
NOTE: I would prefer not to use triple quotes.
You've already read there are no multiline comments, only single line. Comments cause Python to ignore everything until the end of the line. You "close" them with a newline!
I don't particularly like it, but some people use multiline strings as comments. Since you're just throwing away the value, you can approximate a comment this way. The only time it's really doing anything is when it's the first line in a function or class block, in which case it is treated as a docstring.
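A short sketch of that distinction (the function names are made up for this example): a string that is the first statement of a function becomes its docstring, while a string anywhere else is evaluated and thrown away.

```python
def documented():
    """First statement in the body, so Python stores it as the docstring."""
    return 1


def not_documented():
    value = 1
    """Not the first statement: evaluated and discarded, much like a comment."""
    return value


print(documented.__doc__)
print(not_documented.__doc__)
```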
Also, this may be more of a shell scripting convention, but what's so bad about using multiple single line comments?
#####################################################################
# It is perfectly fine and natural to write "multi-line" comments   #
# using multiple single line comments. Some people even draw boxes  #
# with them!                                                        #
#####################################################################
You can't close a comment in Python other than by ending the line.
There are a number of things you can do to provide a comment in the middle of an expression or statement, if that's really what you want to do.
First, with functions you annotate arguments -- an annotation can be anything:
def func(arg0: "arg0 should be a str or int", arg1: (tuple, list)):
    ...
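As a small illustration, annotations like these are stored on the function but not enforced, which is why they can serve as attached notes. The function body here is filled in only so the sketch runs.

```python
def func(arg0: "arg0 should be a str or int", arg1: (tuple, list)):
    return arg0, arg1


# The annotations are kept in a plain dict on the function object.
notes = func.__annotations__
print(notes["arg0"])
print(notes["arg1"])

# Nothing is checked at call time, so "wrong" argument types still go through.
result = func(1.5, "not a tuple or list")
print(result)
```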
If you start an expression with ( the expression continues beyond newlines until a matching ) is encountered. Thus
assert (
    str
    # some comment
    .
    # another comment
    join
) == str.join
You can emulate comments by using strings. They are not exactly comments, since they execute, but they don't return anything.
print("Hello", end = " ");"Comment";print("World!")
If you start a string with triple quotes, end it with triple quotes.
I'm hoping to match the beginning of a string differently based on whether a certain block of characters is present later in the string. A very simplified version of this is:
re.search("""^(?(pie)a|b)c.*(?P<pie>asda)$""", 'acaaasda')
Where, if <pie> is matched, I want to see a at the beginning of the string, and if it isn't then I'd rather see b.
I'd use normal numerical lookahead but there's no guarantee how many groups will or won't be matched between these two.
I'm currently getting error: unknown group name. The sinking feeling in my gut tells me that this is because what I want is impossible (look-ahead to named groups isn't exactly a feature of a regular language parser), but I really really really want this to work -- the alternative is scrapping 4 or 5 hours' worth of regex writing and redoing it all tomorrow as a recursive descent parser or something.
Thanks in advance for any help.
Unfortunately, I don't think there is a way to do what you want to do with named groups. If you don't mind duplication too much, you could duplicate the shared conditions and OR the expressions together:
^(ac.*asda|bc.*)$
If it is a complicated expression you could always use string formatting to share it (rather than copy-pasting the shared part):
common_regex = "c.*"
final_regex = "^(a{common}asda|b{common})$".format(common=common_regex)
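A quick check of the combined expression, as a sketch. The sample strings other than 'acaaasda' are invented for this example.

```python
import re

common_regex = "c.*"
final_regex = "^(a{common}asda|b{common})$".format(common=common_regex)

# 'a' is accepted only when the string ends in 'asda'; 'b' has no such requirement.
samples = ["acaaasda", "bcxyz", "acxyz"]
results = {sample: bool(re.search(final_regex, sample)) for sample in samples}

print(final_regex)
print(results)
```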
You can use something like this:
^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$
or without .*$ if you don't need it.
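Here is a minimal sketch of how the lookahead version behaves. The sample strings other than 'acaaasda' are invented for this example.

```python
import re

# 'a' is allowed only if the lookahead finds 'asda' at the end; otherwise 'b' is required.
pattern = re.compile(r"^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$")

with_pie = pattern.search("acaaasda")
without_pie = pattern.search("bcxyz")
rejected = pattern.search("acxyz")

print(with_pie.group("pie"))
print(without_pie.group("pie"))
print(rejected)
```

Because the named group sits inside the lookahead, it is set only when the `a` branch is taken, which gives the "a if pie, else b" behavior without a conditional reference to a later group.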