Python: Use Regular expression to remove something

I've got a string that looks like this:
ABC(a =2,b=3,c=5,d=5,e=Something)
I want the result to be:
ABC(a =2,b=3,c=5)
What's the best way to do this? I'd prefer to use a regular expression in Python.
Sorry, something changed. The raw string is now:
ABC(a =2,b=3,c=5,dddd=5,eeee=Something)

import re
longer = "ABC(a =2,b=3,c=5,d=5,e=Something)"
shorter = re.sub(r',\s*d=\d+,\s*e=[^)]+', '', longer)
# shorter: 'ABC(a =2,b=3,c=5)'
Once the OP knows how many elements the list contains, this also works:
shorter = re.sub(r',\s*d=[^)]+', '', longer)
It cuts ", d=" and everything after it, up to but not including the right parenthesis.
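The patterns above hard-code the key names d and e, so they would not match the updated string with dddd and eeee. Here is a minimal sketch that ignores the key names and drops the last two key=value pairs. The helper name drop_trailing_pairs and its count parameter are made up for illustration, and it assumes that values contain no commas or parentheses:

```python
import re

def drop_trailing_pairs(text, count=2):
    """Remove the last `count` ",key=value" pairs before the closing ")".

    Hypothetical helper; assumes values contain no "," or ")".
    """
    # (?:,\s*\w+\s*=[^,)]*) matches one ",key=value" pair;
    # the lookahead (?=\)$) anchors the run of pairs at the final ")".
    pattern = r'(?:,\s*\w+\s*=[^,)]*){%d}(?=\)$)' % count
    return re.sub(pattern, '', text)

shorter = drop_trailing_pairs("ABC(a =2,b=3,c=5,dddd=5,eeee=Something)")
# shorter: 'ABC(a =2,b=3,c=5)'
```

Because the closing parenthesis is only checked by the lookahead, it is left in place and does not need to be re-added.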

Non-regex:
>>> s="ABC(a =2,b=3,c=5,d=5,e=Something)"
>>> ','.join(s.split(",")[:-2])+")"
'ABC(a =2,b=3,c=5)'
If you want a regex that always gets rid of the last 2:
>>> import re
>>> s="ABC(a =2,b=3,c=5,d=5,e=6,f=7,g=Something)"
>>> re.sub(r"(.*)(,.[^,]*,.[^,]*)\Z",r"\1)",s)
'ABC(a =2,b=3,c=5,d=5,e=6)'
>>> s="ABC(a =2,b=3,c=5,d=5,e=Something)"
>>> re.sub(r"(.*)(,.[^,]*,.[^,]*)\Z",r"\1)",s)
'ABC(a =2,b=3,c=5)'
If it's always the first 3:
>>> s="ABC(a =2,b=3,c=5,d=5,e=Something)"
>>> re.sub("([^,]+,[^,]+,[^,]+)(,.*)","\\1)",s)
'ABC(a =2,b=3,c=5)'
>>> s="ABC(q =2,z=3,d=5,d=5,e=Something)"
>>> re.sub("([^,]+,[^,]+,[^,]+)(,.*)","\\1)",s)
'ABC(q =2,z=3,d=5)'

import re
re.sub(r',d=\d*,e=[^\)]*','', your_string)

Related

How to escape null characters, i.e. [' '], while using regex split function? [duplicate]

I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...
I want to extract the two timestamps in the middle, after the second underscore '_' and before '.txt'. So I used the following Python regex split:
time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two timestamps? I.e., I want:
time_info=['20111007T084734', '20111008T023142']

I'm no Python expert but maybe you could just remove the empty strings from your list?
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))  # list() is needed on Python 3, where filter() returns an iterator

Don't use re.split(), use the groups() method of regex Match/SRE_Match objects.
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
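As a small sketch of the named-group variant mentioned above (the group names groupA and groupB are just the placeholders from that pattern):

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Named groups are retrieved as a dict via groupdict().
match = re.search(r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.', f)
time_info = match.groupdict()
# time_info: {'groupA': '20111007T084734', 'groupB': '20111008T023142'}
```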

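If the timestamps always come after the second _, you can use str.split and str.strip. Note that strip(".txt") removes any of the characters ".", "t" and "x" from both ends rather than the literal suffix; it is safe here only because the file name starts with a digit and the second timestamp ends with one: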
>>> strs = "000014_L_20111007T084734-20111008T023142.txt"
>>> strs.strip(".txt").split("_",2)[-1].split("-")
['20111007T084734', '20111008T023142']

Since this came up on Google, and for completeness: try re.findall as an alternative!
This does require a little rethinking, but it still returns a list of matches, like split does. That makes it a nice drop-in replacement in existing code, and it gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.
Yes, this is a bit of a "you're asking the wrong question" answer, and it doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it, and you don't want that.
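A minimal sketch of what that could look like for the file names in the question. Both patterns are illustrative assumptions about the timestamp format (8 digits, a T, 6 digits), not something the question guarantees:

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Describe what a timestamp looks like instead of what separates them.
time_info = re.findall(r'\d{8}T\d{6}', f)
# time_info: ['20111007T084734', '20111008T023142']

# Lookaround variant: a run of digits/T preceded by "_" or "-"
# and followed by "-" or ".".
time_info_lookaround = re.findall(r'(?<=[_-])[0-9T]+(?=[-.])', f)
```

Since findall only returns what matched, there are no empty strings to filter out afterwards.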

>>> f='000014_L_20111007T084734-20111008T023142.txt'
>>> f[9:-4].split('-')
['20111007T084734', '20111008T023142']
or, somewhat more general:
>>> f[f.rfind('_')+1:-4].split('-')
['20111007T084734', '20111008T023142']

python parsing a string

I have a list with strings.
list_of_strings
They look like this:
'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
I want to cut each string down to:
/folder1/folder2/folder3/folder4/folder5/exp-* and put this into a new list.
I thought of doing something like the following, but I am missing the right snippet to do what I want:
list_of_stringparts = []
for string in sorted(list_of_strings):
    part = string.split('/')[7]  # or whatever returns the first part of my string
    list_of_stringparts.append(part)
Does anyone have an idea? Do I need a regex?

You are using subscripting, which extracts a single element (the eighth). To get the first seven elements, you need a slice [N:M:S], like this:
>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
In our case N is omitted (it defaults to 0) and S is the step, which defaults to 1, so you'll get elements 0-6 from the result of split.
To put the string back together, use join():
>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'

I would do it like this:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Using re.match:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'

Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition():
list_of_stringparts = []
list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)
>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
Or this equivalent list comprehension:
list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]
This will retain any string that does not contain a * - not sure from your question if that is desired.

Regex search and replace substring in Python

I need some help with a regular expression in python.
I have a string like this:
>>> s = '[i1]scale=-2:givenHeight_1[o1];'
How can I replace givenHeight_1 so that the string becomes this?
>>> '[i1]scale=-2:360[o1];'
Is there an efficient one-liner regex for such a job?
UPDATE 1:
My regex so far is something like this, but it is not working:
re.sub('givenHeight_1[o1]', '360[o1]', s)

You can use positive lookarounds with re.sub:
>>> s = '[i1]scale=-2:givenHeight_1[o1];'
>>> re.sub(r'(?<=:).*(?=\[)','360',s)
'[i1]scale=-2:360[o1];'
The preceding regex replaces anything that comes after : and before [ with '360'.
Or, depending on your needs, you can use str.replace directly:
>>> s.replace('givenHeight_1','360')
'[i1]scale=-2:360[o1];'
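As for why the attempt in the question does nothing: in a regex, [o1] is a character class, not the literal text "[o1]". A small sketch of that, using re.escape() as one possible way to keep the original approach (this is an illustration, not the only fix):

```python
import re

s = '[i1]scale=-2:givenHeight_1[o1];'

# Unescaped, [o1] is a character class (one "o" or "1"), so nothing matches.
unchanged = re.sub('givenHeight_1[o1]', '360[o1]', s)

# re.escape() makes every special character in the pattern literal.
fixed = re.sub(re.escape('givenHeight_1[o1]'), '360[o1]', s)
# fixed: '[i1]scale=-2:360[o1];'
```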

python regex find characters from and end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
From the string above I need to fetch Rev1223. Is there a regular expression that can do that? I'd like to search the string for "Rev" up to the next /.
So far I have split it using split:
s1,s2,s3,s4,s5 = string.split("/",4)

You don't need a regex to do this. It is as simple as:
s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'  # avoid shadowing the built-in str
s.split('/')[-2]

Here is a quick Python example:
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile(r'.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'

Find the second part from the end using a regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search(r'/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'

Python: Getting text of a Regex match

I have a regex match object in Python. I want to get the text it matched. Say if the pattern is '1.3', and the search string is 'abc123xyz', I want to get '123'. How can I do that?
I know I can use match.string[match.start():match.end()], but I find that to be quite cumbersome (and in some cases wasteful) for such a basic query.
Is there a simpler way?

You can simply use the match object's group() method, like:
match = re.search(r"1.3", "abc123xyz")
if match:
    doSomethingWith(match.group(0))
to get the entire match. EDIT: as thg435 points out, you can also omit the 0 and just call match.group().
Additional note: if your pattern contains parentheses, you can also get those submatches by passing 1, 2 and so on to group().
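A short sketch of the submatch case; the parenthesized pattern below is only an illustration, chosen so that each group captures one character:

```python
import re

match = re.search(r"(1)(.)(3)", "abc123xyz")
if match:
    whole = match.group()    # same as match.group(0)
    middle = match.group(2)  # text captured by the second pair of parentheses
# whole: '123', middle: '2'
```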

You need to put the regex inside "()" to be able to get that part:
>>> var = 'abc123xyz'
>>> exp = re.compile(".*(1.3).*")
>>> exp.match(var)
<_sre.SRE_Match object at 0x691738>
>>> exp.match(var).groups()
('123',)
>>> exp.match(var).group(0)
'abc123xyz'
>>> exp.match(var).group(1)
'123'
Without the surrounding .* the pattern fails here, because match() only matches at the start of the string (search() would find it):
>>> var = 'abc123xyz'
>>> exp = re.compile("1.3")
>>> print(exp.match(var))
None
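For comparison, a small sketch of match() versus search() with the same unparenthesized pattern. match() only tries the start of the string, while search() scans forward, so neither parentheses nor .* padding is needed to get the matched text:

```python
import re

var = 'abc123xyz'
exp = re.compile("1.3")

at_start = exp.match(var)   # None: "abc123xyz" does not start with 1.3
anywhere = exp.search(var)  # scans the string for the first match
# anywhere.group(): '123'
```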

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
