I am trying to use discord.utils.escape_mentions to get rid of mentions in message.content.
Long story short, I noticed that it's not working as I expect.
var = discord.utils.escape_mentions("test @!334765815435886592 test")
print(var)
It prints the source string back to me as output:
test @!334765815435886592 test
However, here is the escape_mentions definition:
return re.sub(r'@(everyone|here|[!&]?[0-9]{17,20})', '@\u200b\\1', text)
and if I just copy that and replace '@\u200b\\1' with an empty string, it all works well and I get a nice result:
test test
Can someone explain this behavior to me, and how I can get this function to work?
I believe you're misunderstanding what escape_mentions really does. It looks like you expect it to replace @[0-9]+ with an empty string, which it does not do.
re.sub takes three arguments: a pattern, the replacement text, and the text to operate on. Take a look at this (the ID is just an example; the REPL's repr makes the inserted character visible):
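>>> import discord
>>> text = "<@700796664276844612>"
>>> discord.utils.escape_mentions(text)
'<@\u200b700796664276844612>'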
As you can see, I get <@\u200b700796664276844612> as my output. The interesting thing is, if I were to print it, I get this instead:
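<@700796664276844612>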
Notice how it looks the same as my original text. The reason for this is that \u200b is actually a zero-width space, which is invisible when printed.
So, in reality, escape_mentions inserts a zero-width space between the @ and the ID. So in Discord, instead of being a mention, it will merely be text.
I have the following code (it changes the string/filepath, replacing the numbers at the end of the filename plus the file extension with "#.exr"). I hope I made the problem reproducible below.
I was doing it this way because the filename can be typed in all kinds of ways, for example:
r_frame.003.exr (but also)
r_12_frame.03.exr
etc.
import fileseq
import re

# create render sequence list
selected_file = 'H:/test/r_frame1.exr'
without_extension = selected_file.replace(".exr", "")
my_regex_pattern = r"\d+\b"
sequence_name_with_replaced_number = re.sub(my_regex_pattern, "#.exr", without_extension)
mijn_sequences = fileseq.findSequencesOnDisk(sequence_name_with_replaced_number)
If I print the "sequence_name_with_replaced_number" value, the console shows:
'H:/test/r_frame#.exr'
When I use that variable inside that function like this:
mijn_sequences = fileseq.findSequencesOnDisk(sequence_name_with_replaced_number)
Then it does not work.
But when I manually replace that last line with:
mijn_sequences = fileseq.findSequencesOnDisk('H:/test/r_frame#.exr')
Then it works fine (it seems like the same value/string).
But this is not a viable option; the whole point of the code is to have the computer do this for thousands of frames.
Does anybody have any idea what might be the cause of this?
I already tried re-converting the variable into a string with str().
I tried other ways, like using an f-string; I wasn't sure how to convert it into a raw string since the variable already exists.
After this I will do a simple for loop going through all the files in that sequence. The reason I'm doing this workflow is to delete the numbers before the .exr file extension and replace them with # signs (but ignoring all the numbers that are not at the end of the filename, hence the regex above). Again, the "sequence_name_with_replaced_number" variable seems OK in the console. It spits out: 'H:/test/r_frame#.exr' (that's what I need it to be).
It's fixed!
The diagnosis was right: every time I did a cut and paste of the variable's value from the console and treated it as manual input, it worked.
Then I did a len() of both values, and there was a difference of 2!
What happened?
The console adds the quote marks ('') around the value when it displays it.
But in the generated variable, those quotes were baked in as extra characters.
I fixed it by adding:
cleaned_sequence = sequence_name_with_replaced_number[1:-1]
So 'H:/test/r_frame1.exr' (as the console showed it to me)
was not the same as 'H:/test/r_frame1.exr' (what I typed in manually, because there I added those quote marks myself, whereas the console shows them automatically).
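A quick way to see (and strip) those baked-in quotes is to compare repr() and len(); in this sketch the quotes are put into the string on purpose to reproduce the issue:
import re

selected_file = "'H:/test/r_frame1.exr'"   # quotes deliberately baked into the string
without_extension = selected_file.replace(".exr", "")
sequence_name_with_replaced_number = re.sub(r"\d+\b", "#.exr", without_extension)

print(repr(sequence_name_with_replaced_number))   # "'H:/test/r_frame#.exr'" -- the stray quotes show up
print(len(sequence_name_with_replaced_number))    # two characters longer than the clean path

cleaned_sequence = sequence_name_with_replaced_number.strip("'\"")   # a bit more robust than [1:-1]
print(repr(cleaned_sequence))                     # 'H:/test/r_frame#.exr'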
I'm trying to create a random text generator in Python. I'm using Markovify to produce the required text, plus a filter that doesn't let it start generating text unless the first word is capitalized, and, to prevent it from ending "mid sentence", I want the program to search from the back of the output to the front and remove all text after the last (for instance) period. I want it to ignore all other instances of the selected delimiter(s). I have no idea how many instances of the delimiter will occur in the generated text, nor any way to know in advance.
While looking into this I found rsplit(), and tried using that, but ran into a problem.
tweet = buff.rsplit('.')[-1]
The above is what I tried first, and I thought it was working until I noticed that all of the lines printed with that had only a single sentence in them. Never more than that. The problem seems to be that the text is being dumped into an array of strings, and the [-1] bit is calling just one entry from that array.
tweet = buff.rsplit('.') - buff.rsplit('.')[-1]
Next I tried the above. The thinking was that it would remove the last entry in the array, and then I could just print what remained. It... didn't go to plan. I get an "unsupported operand type" error, specifically tied to the attempt to subtract. Not sure what I'm missing at this point.
.rsplit has a second optional argument, maxsplit, i.e. the maximum number of splits to do. You could use it the following way:
txt = 'some.text.with.dots'
all_but_last = txt.rsplit('.', 1)[0]
print(all_but_last)
Output:
some.text.with
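Applied to your original goal of trimming everything after the last period (buff stands in for the generated text here; the + '.' puts the final period back, since rsplit drops the delimiter):
buff = "First sentence. Second sentence. A trailing fragment with no period"
tweet = buff.rsplit('.', 1)[0] + '.'
print(tweet)
Output:
First sentence. Second sentence.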
I have a simple function which when given an input like (x,y), it will return {{x},{x,y}}.
In the cases that x=y, it naturally returns {{x},{x,x}}.
I can't figure out how to get regex to substitute 'x' in place of 'x,x'. But even if I could figure out how to do this, the expression would go from {{x},{x,x}} to {{x},{x}}, which itself would then need to be reduced to {{x}}.
The closest I have gotten has been:
re.sub(r'([0-9]+),([0-9]+)', r'\1', string)
But this function will also turn {{x},{x,y}} into {{x},{x}}, which is not desired. Also, you may notice that the function searches for numbers only, which is fine because I really only intend to be using numbers in the place of x and y; however, if there is a way to get it to work with any letter as well (lower case or capital), that would be even more ideal.
Note also that if I give my original function (x,y,z) it will read it as ((x,y),z) and thus return {{{{x},{x,y}}},{{{x},{x,y}},z}}, thus in the case that x=y=z, I would want to be able to have a Regex function call itself repeatedly to reduce this to {{{{x}}},{{{x}},x}} instead of {{{{x},{x,x}}},{{{x},{x,x}},x}}.
If it helps at all, this is essentially an attempt at making a translation (into sets) using the Kuratowski definition of an ordered pair.
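For concreteness, a sketch of the kind of function I mean (the exact implementation does not matter; it just builds the braces as a string):
def pair(x, y):
    # Kuratowski pair: (x, y) -> {{x},{x,y}}
    return '{{%s},{%s,%s}}' % (x, x, y)

print(pair(1, 2))   # {{1},{1,2}}
print(pair(3, 3))   # {{3},{3,3}}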
Essentially, to solve this you need recursion or, more simply, to keep applying the regex in a loop until the replacement doesn't change the input string. For example, using your regex from https://regex101.com/r/Yl1IJv/4:
import re

s = '{{ab},{ab,ab}}'
while True:
    news = re.sub(r'(?P<first>.?(\w+|\d+).?),(?P=first)', r'\g<1>', s, 0)
    if news == s:
        break
    s = news
print(s)
Output
{{ab}}
Demo on rextester
With
s = '{{{{x},{x,x}}},{{{x},{x,x}},x}}'
The output is
{{{{x}}},{{{x}},x}}
as required. Demo on rextester
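Wrapped up as a reusable helper (the function name is just mine):
import re

def collapse_duplicates(s):
    # keep collapsing "X,X" into "X" until the string stops changing
    pattern = re.compile(r'(?P<first>.?(\w+|\d+).?),(?P=first)')
    while True:
        news = pattern.sub(r'\g<first>', s)
        if news == s:
            return s
        s = news

print(collapse_duplicates('{{ab},{ab,ab}}'))                      # {{ab}}
print(collapse_duplicates('{{{{x},{x,x}}},{{{x},{x,x}},x}}'))     # {{{{x}}},{{{x}},x}}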
My program is given an object with parameters, and I need to get the parameters' values.
The object my program is given will look like this:
Object = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""
(all parameters might or might not exist)
I am trying to get the parameters' values.
In the first 3 lines, it's pretty easy; a simple regex will find it:
if "parameter1" in Object:
parameter1 = re.split(r"parameter1=(.*?)[\|\}]", Object)[1]
if "parameter2" in Object:
parameter2 = re.split(r"parameter2=(.*?)[\|\}]", Object)[1]
and so on.
The problem is with parameter4: the above regex (parameter4=(.*?)[\|\}]) will only return this is some [[problem, since the regex stops at the first vertical bar.
Now here is the thing: a vertical bar will only appear as part of the text inside "[[...]]".
For example, parameter1=a[[b|c]]d might appear, but parameter1=a|bc| will never appear.
I need a regex which will stop at a vertical bar, unless it is inside double square brackets. So, for example, for parameter4 I will get this is some [[problem|problematic text]], Houston, we have a problem!
It worked here when I removed the "?":
parameter4 = re.split(r"parameter4=(.*)[\|\}]", object_)[1]
I also changed the name of the variable to "object_", because "object" is a built-in name in Python.
Apparently, there is no perfect solution.
For other readers possibly reading this question in the future, the closest solution is, as pointed out by Wiktor Stribiżew in the comments, parameter4=([^[}|]*(?:\[\[.*?]][^[}|]*)*).
This regex will only work if the param text does not contain any single [, } or | characters, but it may contain [[...]] sub-strings.
If you want to understand this regex better, you might want to have a look here: https://regex101.com/r/bWVvKg/2
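For example, applying that pattern with re.search to the sample Object from the question:
import re

Object = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""

match = re.search(r"parameter4=([^[}|]*(?:\[\[.*?]][^[}|]*)*)", Object)
if match:
    print(match.group(1))
# this is some [[problem|problematic text]], Houston, we have a problem!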
I am trying to convert a multiline string to a single list, which should be possible using splitlines(), but for some reason it continues to convert each line into a list instead of processing all the lines at once. I tried to do it outside the for loop, but that doesn't seem to have any effect. I need the lines as a single list to use in another function. Below is how I get the multiline string into a single variable. What am I missing?
multiline_string_final = []
for match_multiline in re.finditer(r'(^(\w+):\sThis particular string\s*|This particular string\s*)\{\s(\w+)\s\{(.*?)\}', string, re.DOTALL):
    multiline_string = match_multiline.group(4)
    print multiline_string
This last print statement prints out the strings like this:
blah=0; blah_blah=1; Foo=3;
blah=4; blah_blah=5; Foo=0;
However I need:
['blah=0; blah_blah=1; Foo=3;''blah=4; blah_blah=5; Foo=0;']
I understand it has to be something with the finditer, but I can't seem to rectify it.
Your new problem also has nothing to do with finditer. (Also, your code is still not an MCVE, you still haven't shown us the sample input data, etc., making it harder to help you.)
From this desired output:
['blah=0; blah_blah=1; Foo=3;''blah=4; blah_blah=5; Foo=0;']
I'm pretty sure what you're looking for is to get a list of the matches, instead of printing out each match on its own. That isn't a valid list, because it's missing the comma between the elements,* but I'll assume that's a typo from you making up data instead of building an MCVE and copying and pasting the real output.
Anyway, to get a list, you have to build a list. Printing things to the screen doesn't build anything. So, try this:
multiline_string_final.append(multiline_string)
Then, at the end—not inside the loop, only after the loop has finished—you can print that out:
print multiline_string_final
And it'll look like this:
['blah=0; blah_blah=1; Foo=3;',
'blah=4; blah_blah=5; Foo=0;']
* Actually, it is a valid list, because adjacent strings get concatenated… but it's not the string you wanted, and not a format Python would ever print out for you.
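Putting the pieces together, with some made-up input shaped to match your regex (the real input string wasn't posted), and keeping the Python 2 print statements used in this thread:
import re

# made-up sample input; the real one wasn't shown in the question
string = """abc: This particular string { section1 {blah=0; blah_blah=1; Foo=3;}
This particular string { section2 {blah=4; blah_blah=5; Foo=0;}"""

multiline_string_final = []
for match_multiline in re.finditer(r'(^(\w+):\sThis particular string\s*|This particular string\s*)\{\s(\w+)\s\{(.*?)\}', string, re.DOTALL):
    multiline_string = match_multiline.group(4)
    multiline_string_final.append(multiline_string)

print multiline_string_final
# ['blah=0; blah_blah=1; Foo=3;', 'blah=4; blah_blah=5; Foo=0;']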
The problem has nothing to do with the finditer; it's that you're doing the wrong thing:
for line in multiline_string:
    print multiline_string.splitlines()
If multiline_string really is a multiline string, then for line in multiline_string will iterate over the characters of that string.
Then, within the loop, you completely ignore line anyway, and instead print multiline_string.splitlines().
So, if multiline_string is this:
abc
def
Then you'll print ['abc', 'def'] 8 times in a row (once for each character in the string). That's not what you want (or what you described).
What you want to do is:
split the string into lines
loop over those lines, not over the original un-split string
print each line, not the whole thing
So:
for line in multiline_string.splitlines():
    print line
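For example, with the abc/def string from above:
multiline_string = 'abc\ndef\n'
for line in multiline_string.splitlines():
    print line
which prints:
abc
def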