Replace sequence of chars in string with its length [duplicate] - python

This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every run of "?" characters with its length divided by 2 (that is, the number of "??" pairs in the run). So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following, but I'm not sure it's a good approach, and anyway, it doesn't work:
regex = re.compile('(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)

You can use a callback function in Python's re.sub. FYI lambda expressions are shorthand to create anonymous functions.
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen for an odd-length run such as ???. The code above turns it into 1, because int() truncates the result of the division. Without the int conversion the result would be a float (1.5 for ???, 2.0 for ????). If you want ??? to become 1? instead, you can use the pattern (?:\?{2})+, which only consumes complete ?? pairs.
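To make the difference between the two patterns concrete, here is a minimal sketch. The helper name halve_question_runs and its pairs_only flag are made up for illustration and are not part of the original answer; integer division (//) stands in for the int(... / 2) conversion.

```python
import re

def halve_question_runs(text, pairs_only=False):
    """Replace each run of '?' with half its length (hypothetical helper)."""
    # pairs_only=True only consumes complete "??" pairs, so a stray "?"
    # at the end of an odd-length run is left in place.
    pattern = r"(?:\?{2})+" if pairs_only else r"\?+"
    return re.sub(pattern, lambda m: str(len(m.group()) // 2), text)

print(halve_question_runs("6374696f6e20????28??????2c??2c????29"))
print(halve_question_runs("ab???cd"))                    # odd run, truncated
print(halve_question_runs("ab???cd", pairs_only=True))   # stray "?" is kept
```

Passing a function as the replacement lets re.sub compute a different replacement for every match, which a fixed replacement string cannot do.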

Related

How do I get part of a string with a regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regexes in Python.
I have a string containing a substring that I would like to extract.
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
Neither approach works; if I print a, I get None.
How can I get hold of the desired flag?
Thanks in advance
import re
# Named text rather than str, to avoid shadowing the built-in str type.
text = "Something has gone horribly wrong\n\nflag{Hi!}"
r = re.compile("(flag{.+[^}]})")
a = re.search(r, text)
print(a.group())
This worked.
Firstly, as mentioned in the comments, your output is not None. You do get a match, the match you were looking for. You actually get a Match object that spans from position 35 -> 44 and matches flag{Hi!}. You can use group() to get the match represented as a string:
>>> a = re.search(r, string)
>>> print(a.group())
flag{Hi!}
You can also shorten your regex a little. There really isn't a need for .+ once you have [^}], which matches any character that isn't a closing curly bracket (}); repeating that class is enough:
"(flag{[^}]+})"
You can replace the +, which matches one or more, with *, which matches zero or more, if you also want to match things like flag{}, where there are no characters inside the curly brackets.
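The effect of choosing + or * can be checked with a short sketch. The function find_flag and its allow_empty parameter are hypothetical names used only for this illustration.

```python
import re

def find_flag(text, allow_empty=False):
    """Return the first flag{...} substring, or None (hypothetical helper)."""
    # "*" also accepts an empty body such as flag{}; "+" needs at least one character.
    quantifier = "*" if allow_empty else "+"
    match = re.search(r"flag{[^}]" + quantifier + "}", text)
    return match.group() if match else None

print(find_flag("Something has gone horribly wrong\n\nflag{Hi!}"))
print(find_flag("empty: flag{}"))
print(find_flag("empty: flag{}", allow_empty=True))
```

Returning None when nothing matches avoids the AttributeError you would get from calling group() on a failed search.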
You can also pass the pattern string directly to re.search, without compiling it first.
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:
flag{Hi!}

Remove certain characters from a string in Python [duplicate]

This question already has answers here:
Remove specific characters from a string in Python
(26 answers)
Closed 2 years ago.
Is there a function in Python that does something like this:
input:
text = "s.om/e br%0oken tex!t".remove(".","/","%","0","!")
print(text)
output:
some broken text
The only thing I know of that can kind of do this is .replace("x", ""), and that takes way too long when there are lots of different characters to get rid of. Thanks in advance.
Use the regex module re to remove them all in one call. The [] is a character class that matches any single character listed inside it:
import re
text = re.sub("[./%0!]", "", "s.om/e br%0oken tex!t")
The re module provides regular expression support. You can use its sub function to substitute characters in a string, like this:
from re import sub
text = sub("[./%0!]","","The string")
print(text)
Regex details: [./%0!] is a character class matching any one of . / % 0 !. Every occurrence found in the string is replaced with an empty string, and the resulting text variable is then printed.
You might use str.maketrans combined with .translate; example:
t = str.maketrans("","","./%0!")
text = "s.om/e br%0oken tex!t"
cleantext = text.translate(t)
print(cleantext) # some broken text
maketrans accepts up to 3 arguments: each n-th character of the first is replaced with the n-th character of the second, and every character present in the third is deleted. In this case we only want to delete, so the first and second arguments are empty strings.
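As a small illustration of all three maketrans arguments working together, the sketch below both replaces and deletes characters in one pass. The sample strings are made up for this example.

```python
# First two arguments: map "a" -> "x" and "b" -> "y".
# Third argument: delete every "!" and ".".
table = str.maketrans("ab", "xy", "!.")

sample = "a!b.c"
translated = sample.translate(table)
print(translated)

# Deletion only, as in the answer above: the first two arguments stay empty.
delete_only = str.maketrans("", "", "./%0!")
cleaned = "s.om/e br%0oken tex!t".translate(delete_only)
print(cleaned)
```

The first two arguments must have the same length, since characters are paired up by position.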
Alternatively, you might use a generator expression with join, as follows:
text = "s.om/e br%0oken tex!t"
cleantext = ''.join(i for i in text if i not in "./%0!")
print(cleantext) # some broken text

My regular expression does not take the second number [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 3 years ago.
I have this string:
field = "1400 x 3524"
I want to take these numbers into two separate variables so I can perform multiplication. This is how I do it:
num1 = re.match(r"(\d{3,4})(?= x)", field).group(1)
num2 = re.match(r"(?<=x )(\d{3,4})", field).group(1)
It works with the first number, but the second number comes out as a NoneType.
What am I doing wrong?
Try this:
>>> import re
>>> a = 'field = 1400 x 3524'
>>> m = re.findall( r'\d+', a )
>>> m
['1400', '3524']
>>>
The re module documentation states:
Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function.
In your case that means you should do:
import re
field = "1400 x 3524"
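num2 = re.search(r"(?<=x )(\d{3,4})", field).group(0)
print(num2) # 3524
Note that besides changing match to search, I also changed group(1) to group(0). Both return the same text here, because the capturing group spans the entire match (the lookbehind itself consumes nothing).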
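The difference between match and search on this input can be seen side by side in the following sketch. The variable names anchored, scanned, width and height are chosen only for this illustration.

```python
import re

field = "1400 x 3524"
pattern = re.compile(r"(?<=x )(\d{3,4})")

# match() only tries position 0, where nothing precedes the digits,
# so the lookbehind for "x " cannot succeed and the result is None.
anchored = pattern.match(field)

# search() scans forward and finds the digits that follow "x ".
scanned = pattern.search(field)

# For the multiplication itself, findall with \d+ gets both numbers at once.
width, height = (int(n) for n in re.findall(r"\d+", field))

print(anchored)
print(scanned.group(1))
print(width * height)
```

Calling .group(1) on the result of a failed match() is what produces the NoneType error from the question.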

Match content between curly braces that can also contain curly braces [duplicate]

This question already has answers here:
Matching Nested Structures With Regular Expressions in Python
(6 answers)
Closed 8 years ago.
If I have a string:
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
Is it possible to find all matches based on the outer curly braces?
m = re.findall(r'\{.+?\}', s, re.DOTALL)
returns
['{bbb}', '{ddd{eee}']
but I need:
['{bbb}', '{ddd{eee}fff}']
Is it possible with python regex?
If you want it to work at any depth, but don't necessarily need to use regex, you can implement a simple stack-based automaton (a depth counter is enough here):
s = "aaa{bbb}ccc{ddd{eee}fff}ggg"
def getStuffInBraces(text):
    stuff=""
    count=0
    for char in text:
        if char=="{":
            count += 1
        if count > 0:
            stuff += char
        if char=="}":
            count -= 1
        if count == 0 and stuff != "":
            yield stuff
            stuff=""
getStuffInBraces is a generator function, so if you want a list of results, you can use print(list(getStuffInBraces(s))).
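A variant of the same depth-counting idea, sketched here with string slicing instead of character-by-character concatenation, shows that the approach is not limited to one level of nesting. The name outer_brace_groups is hypothetical, and ignoring a stray closing brace is an assumption of this sketch rather than something the answer specifies.

```python
def outer_brace_groups(text):
    """Return every outermost {...} group in text (hypothetical helper)."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # remember where the outermost group opens
            depth += 1
        elif ch == "}" and depth > 0:  # a stray "}" at depth 0 is ignored
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

print(outer_brace_groups("aaa{bbb}ccc{ddd{eee}fff}ggg"))
print(outer_brace_groups("a{b{c{d}}}e{f}"))  # three levels deep
```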
{(?:[^{}]*{[^{]*})*[^{}]*}
Try this. See the demo:
https://regex101.com/r/fA6wE2/28
P.S. It only works if the braces are nested no more than 1 level deep.
You could use this regex also.
\{(?:{[^{}]*}|[^{}])*}
>>> s = 'aaa{bbb}ccc{ddd{eee}fff}ggg'
>>> re.findall(r'\{(?:{[^{}]*}|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
Use a recursive regex to handle nesting deeper than 1 level; (?R) recurses into the whole pattern, so any depth is matched.
\{(?:(?R)|[^{}])*}
Code:
>>> import regex
>>> regex.findall(r'\{(?:(?R)|[^{}])*}', s)
['{bbb}', '{ddd{eee}fff}']
Note that (?R) is only supported by the external regex module, not by the built-in re module.

Python parser for Calculator [duplicate]

This question already has answers here:
Evaluating a mathematical expression in a string
(14 answers)
Closed 9 years ago.
I am trying to write a parser which takes expressions as input from a file.
Expressions can be A=B=10 or B=(C-A)-4 etc.
What I have tried so far is below. I am reading a file IP.txt
import re
opert = '+-/*()_='
fileName = "input.txt"
f = open(fileName,'r')
variableDict = {}
lines = f.readlines()
for i in lines:
    for x in re.finditer(r'[A-Z_]\w*', i):
        print(x.group()) # prints each variable name found in the line
    for z in re.finditer(r'[0-9]\d*', i):
        print(z.group()) # prints each number found in the line
    for c in i:
        if c in opert:
            print(c) # prints each operator
    # '_' has special meaning. '_' can only be used before numbers, like _1 or _12 etc
    # And I have parsed this also using
    print(re.findall(r'[_][0-9]\d+',i)) # prints the _digits combinations
Now the problem is that I am stuck on how I should proceed with expression evaluation.
First, some rules which I must mention about the above inputs:
No line should be longer than 50 characters.
The leftmost operator will always be the '=' assignment operator.
'=' is always preceded by a variable [A-Z]; operators are {'+','-','/','*','_'}; digits are {0-9}.
How should I first extract the first variable and push it onto a Python list, then the '=' operator, then either '(' or 'A-Z', push it onto the stack, and so on?
Could someone help me with this problem? I am overwhelmed by it.
If anyone is not able to understand the description, please go to this link
So, you asked about the stack problem, which of course you need for evaluation. I would do something like this:
import re #1
stack = [] #2 FIX: NOT NECESSARY (since fourth line returns a list anyway)
inputstr = "A=B=C+26-(23*_2 )-D" #3
stack = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr) #4
while len(stack): #5
    print(stack.pop()) #6
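The first three lines are only initialization. After that, I build the stack with a regex in the fourth line. (?:[A-Z]) matches a variable, (?:[0-9]+) matches a number (which may have more than one digit) and (?:[/*+_=\(\)-]) matches all the operators. The parentheses are escaped, and - is at the end of the character class, so you don't have to escape it.
The fifth and sixth lines print the stack; pop() removes items from the end, so the tokens come out in reverse order.
I used (?: ...) to make the groups non-capturing, so that findall returns each whole token as a plain string. With capturing groups it would instead return a tuple of the three groups for every token, with empty strings for the alternatives that did not match. Just try to run it without ?: and you will see the effect.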
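The effect of dropping ?: can be sketched as follows. The constant names TOKEN_NC and TOKEN_CAP and the short input expression are made up for this illustration; the patterns are the answer's tokenizer with and without capturing groups.

```python
import re

# Non-capturing groups: findall returns the whole match for each token.
TOKEN_NC = r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=()-])'
# Capturing groups: findall returns one tuple per token, one slot per group.
TOKEN_CAP = r'([A-Z])|([0-9]+)|([/*+_=()-])'

expr = "A=B+26"
plain_tokens = re.findall(TOKEN_NC, expr)
grouped_tokens = re.findall(TOKEN_CAP, expr)

print(plain_tokens)    # a flat list of token strings
print(grouped_tokens)  # tuples; unused alternatives show up as ''
```

The tuple form is not useless: the position of the non-empty slot tells you whether a token is a variable, a number or an operator, which an evaluator may want to know.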
