Python: remove one tab from each line

For example, I've got a file with lines like this:
\tline1\t
\t\tline2\t
\t\tline3
\tline4
I need to remove only the first tab at the beginning of every line (I don't care if there are more tabs in the line).
The result is supposed to look like this:
line1\t
\tline2\t
\tline3
line4
How can I do this?

s = "\thello"
s.replace("\t", "", 1)
Note that this removes the first tab wherever it appears in the string, so a string like "hello\tworld" loses its tab too. It only does what you want when every line starts with a tab.

Regular expressions may help:
>>> import re
>>> pattern = re.compile('^\t') # match a tab in the beginning of the line
>>> pattern.sub('', '\tline1\t')
'line1\t'
>>> pattern.sub('', '\t\tline2\t')
'\tline2\t'
>>> pattern.sub('', 'line3\t')
'line3\t'
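Applied to a whole file, either answer comes down to handling each line on its own. A minimal sketch, assuming the file content is already available as one string; the helper name remove_leading_tab is made up for illustration and does not come from the answers above:

```python
import re

# Hypothetical helper: with re.MULTILINE, ^ matches at the start of every line,
# so at most one leading tab per line is removed.
_LEADING_TAB = re.compile(r'^\t', re.MULTILINE)

def remove_leading_tab(text):
    """Remove at most one tab from the start of each line in text."""
    return _LEADING_TAB.sub('', text)

sample = "\tline1\t\n\t\tline2\t\n\t\tline3\n\tline4\n"
print(remove_leading_tab(sample))
```

Lines that do not start with a tab are left untouched, which is the difference from s.replace("\t", "", 1).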

Related

Split string if separator is not in-between two characters

I want to write a script that reads from a csv file and splits each line by comma except any commas in-between two specific characters.
In the below code snippet I would like to split line by commas except the commas in-between two $s.
line = "$abc,def$,$ghi$,$jkl,mno$"
output = line.split(',')
for o in output:
    print(o)
How do I write output = line.split(',') so that I get the following terminal output?
~$ python script.py
$abc,def$
$ghi$
$jkl,mno$
You can do this with a regular expression:
In re, a lookbehind (?<=\$) asserts that the current position is immediately preceded by a $, without consuming it.
Similarly, a lookahead (?=\$) asserts that the current position is immediately followed by a $.
Putting the comma between the two matches only the commas that have a $ on both sides, so the commas inside a $...$ pair are left alone:
expression = r"(?<=\$),(?=\$)"
Full program:
import re
expression = r"(?<=\$),(?=\$)"
print(re.split(expression, "$abc,def$,$ghi$,$jkl,mno$"))
One solution (maybe not the most elegant, but it works) is to replace the string $,$ with something like $,,$ and then split on ,,. Something like this:
output = line.replace('$,$','$,,$').split(',,')
Using a regex as mousetail suggested is the more elegant and robust solution, but it requires knowing regex (not that anyone KNOWS regex).
Try regular expressions:
import re
line = "$abc,def$,$ghi$,$jkl,mno$"
output = re.findall(r"\$(.*?)\$", line)
for o in output:
    print('$'+o+'$')
$abc,def$
$ghi$
$jkl,mno$
First, you can identify a character that is not used in that line:
c = chr(max(map(ord, line)) + 1)
Then, you can proceed as follows:
line.replace('$,$', f'${c}$').split(c)
Here is your example:
>>> line = '$abc,def$,$ghi$,$jkl,mno$'
>>> c = chr(max(map(ord, line)) + 1)
>>> result = line.replace('$,$', f'${c}$').split(c)
>>> print(*result, sep='\n')
$abc,def$
$ghi$
$jkl,mno$
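Since the lines come from a csv file, the standard csv module may also be worth a look. This is only a sketch, under the assumption that $ behaves like a quote character in the data (always paired and wrapping whole fields); it is an alternative technique, not something proposed in the answers above:

```python
import csv

line = "$abc,def$,$ghi$,$jkl,mno$"

# Treat "$" as the quote character so commas inside $...$ are not separators.
# csv.reader accepts any iterable of lines, so a one-element list works here.
row = next(csv.reader([line], quotechar='$'))

# The reader drops the quote characters; put them back to match the desired output.
fields = ['$' + field + '$' for field in row]
for field in fields:
    print(field)
```

When reading from an actual file, the same quotechar argument can be passed to csv.reader together with the open file object.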

How to strip square braces from string?

I'm trying to strip the [] characters from a string, but it doesn't seem to work.
line = "This contains [Hundreds] of potatoes"
line = line.strip("[]")
print(line)
The actual result is the line, untouched.
How would I do this? With a regex?
You can use regex for replacing the unwanted characters. Here's what you can do:
import re
line = "This contains [Hundreds] of potatoes"
line = re.sub(r"[\[\]]", "", line)
print(line)
I think your problem is that you are using .strip when you should be using .replace.
It works like this:
line = "This contains [Hundreds] of potatoes"
line = line.replace("[", "")
line = line.replace("]", "")
print(line)
The first argument is what you want to replace and the second is what you are replacing it with.
Hope this helps!
You can also use str.translate:
Python 2
>>> line.translate(None, '[]')
'This contains Hundreds of potatoes'
Python 3
>>> line.translate(str.maketrans('', '','[]'))
'This contains Hundreds of potatoes'
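The reason strip leaves the line untouched is that it only removes characters from the two ends of a string. A short sketch contrasting it with translate, written for Python 3 and reusing the line from the question:

```python
line = "This contains [Hundreds] of potatoes"

# strip() only looks at the two ends of the string, so inner brackets survive.
print(line.strip("[]"))          # unchanged: the line starts with "T" and ends with "s"
print("[Hundreds]".strip("[]"))  # brackets at the ends are removed

# Removing the characters everywhere needs replace(), re.sub() or translate().
cleaned = line.translate(str.maketrans('', '', '[]'))
print(cleaned)
```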

how to avoid regex matching "Revert "Revert"

I have the following code, which matches both line1 and line2. I ONLY want to match line2, not line1. Can anyone provide guidance on how to do this?
import re
line1 = '''Revert "Revert <change://problem/47614601> [tech feature][V3 Driver] Last data path activity timestamp update is required for feature""'''
line2 = '''Revert <change://problem/47614601> [tech feature][V3 Driver] Last data path activity timestamp update is required for feature"'''
pattern = r".*?(?:Revert|revert)\s*\S*(?:change:\/\/problem\/)(\d{8})"
match = re.findall(pattern, line2)
if match:
    print("Revert radar match%s" % match)
    revert_radar = True
    print(revert_radar)
Something like this should do what you want (note that re.match only matches at the start of the string, which is why line1 is rejected):
>>> regex = r"(?!:(?:R|r)evert.*)(?:Revert|revert)\s*\S*(?:change:\/\/problem\/)(\d{8})"
>>> re.match(regex, line1) is None
True
>>> re.match(regex, line2).groups()
('47614601',)
Negative lookbehind: the word NOT preceded by sameword-space-doublequote
r'''(?<![Rr]evert ")[Rr]evert\s<change:[/][/]problem[/]\d{8}.*"'''
Negative lookahead: the pattern you want, NOT followed by a second double quote
r'''[Rr]evert\s<change:[/][/]problem[/]\d{8}.*?\w"(?!")'''
This doesn't work if the unwanted line immediately precedes the wanted line without an intervening newline character.
It does work if the unwanted line follows the wanted line.
If you are looking at individual lines, look for the pattern at the start of the string; of the three, this is the least work and the most efficient for the regex engine:
r'''^[Rr]evert\s<change:[/][/]problem[/]\d{8}.*$'''
If the line you are searching for is embedded in a long string, but there is a newline character before that line, you can use the previous pattern with the multiline flag (re.MULTILINE).
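The anchored pattern combined with the multiline flag can be sketched as follows. The two sample lines are shortened versions of line1 and line2 from the question, and both the capture group around the 8 digits and the function name revert_ids are additions made for illustration:

```python
import re

# Shortened versions of the question's lines (the trailing text is abbreviated).
line1 = 'Revert "Revert <change://problem/47614601> [tech feature] ...""'
line2 = 'Revert <change://problem/47614601> [tech feature] ..."'

# With re.MULTILINE, ^ matches at the start of every line, not only the string.
ANCHORED = re.compile(r'^[Rr]evert\s<change://problem/(\d{8})', re.MULTILINE)

def revert_ids(text):
    """Return the ids from lines that begin with a single Revert."""
    return ANCHORED.findall(text)

print(revert_ids(line1))
print(revert_ids(line2))
print(revert_ids(line1 + "\n" + line2))
```

Only line2 yields an id, both on its own and when it follows line1 after a newline.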

Python re match last underscore in a string

I have some strings that look like this
S25m\S25m_16Q_-2dB.png
S25m\S25m_1_16Q_0dB.png
S25m\S25m_2_16Q_2dB.png
I want to get the string between the backslash and the last underscore, and also the string between the last underscore and the extension.
Desired:
[S25m_16Q, S25m_1_16Q, S25m_2_16Q]
[-2dB, 0dB, 2dB]
I was able to get the whole thing between slash and extension by doing
import re
foo = r"S25m\S25m_16Q_-2dB.png"
match = re.search(r'([a-zA-Z0-9_-]*)\.(\w+)', foo)
match.group(1)
But I don't know how to make a pattern so I could split it by the last underscore.
Capture the groups you want to get.
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_16Q_-2dB.png").groups()
('S25m_16Q', '-2dB')
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_1_16Q_0dB.png").groups()
('S25m_1_16Q', '0dB')
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_2_16Q_2dB.png").groups()
('S25m_2_16Q', '2dB')
The * quantifier matches the preceding character set greedily (consuming as many characters as possible), so the first group runs up to the last _, since \w includes letters, digits, and the underscore.
>>> list(zip(*[m.groups() for m in re.finditer(r'([-\w]*)_([-\w]+)\.\w+', r'''
... S25m\S25m_16Q_-2dB.png
... S25m\S25m_1_16Q_0dB.png
... S25m\S25m_2_16Q_2dB.png
... ''')]))
[('S25m_16Q', 'S25m_1_16Q', 'S25m_2_16Q'), ('-2dB', '0dB', '2dB')]
A non-regex solution (albeit rather messy):
>>> import os
>>> s = r"S25m\S25m_16Q_-2dB.png"
>>> first, _, last = s.partition("\\")[2].rpartition('_')
>>> (first, os.path.splitext(last)[0])
('S25m_16Q', '-2dB')
I know it says using re, but why not just use split?
strings = r"""S25m\S25m_16Q_-2dB.png
S25m\S25m_1_16Q_0dB.png
S25m\S25m_2_16Q_2dB.png"""
strings = strings.split("\n")
parts = []
for string in strings:
    string = string.split(".png")[0]  # Get rid of file extension
    string = string.split("\\")
    splitString = string[1].split("_")
    firstPart = "_".join(splitString[:-1])  # string between slash and last underscore
    parts.append([firstPart, splitString[-1]])
for line in parts:
    print(line)
['S25m_16Q', '-2dB']
['S25m_1_16Q', '0dB']
['S25m_2_16Q', '2dB']
Then just transpose the list:
for line in zip(*parts):
    print(line)
('S25m_16Q', 'S25m_1_16Q', 'S25m_2_16Q')
('-2dB', '0dB', '2dB')
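The non-regex answers can be condensed into one small function. This is a sketch that assumes the paths always use a backslash separator, which is why it relies on the standard ntpath module (Windows-style path handling, importable on any platform); the function name split_last_underscore is hypothetical:

```python
import ntpath  # Windows-style path handling, available on every platform

def split_last_underscore(path):
    """Split 'dir\\name_suffix.ext' into (name, suffix) at the last underscore."""
    stem, _ext = ntpath.splitext(ntpath.basename(path))
    name, _sep, suffix = stem.rpartition('_')
    return name, suffix

paths = [r"S25m\S25m_16Q_-2dB.png",
         r"S25m\S25m_1_16Q_0dB.png",
         r"S25m\S25m_2_16Q_2dB.png"]

# Transpose the (name, suffix) pairs into two sequences.
names, suffixes = zip(*(split_last_underscore(p) for p in paths))
print(list(names))
print(list(suffixes))
```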

python regular expression substitute

I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the 'taxid' value is '9606'. I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then a number.
score:0.86|taxid:9606(Human)|intact:EBI-999900
How do I write a regular expression for this in Python?
>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
If there are multiple taxids, use re.findall, which returns a list of all matches:
>>> re.findall(r'taxid:(\d+)', s)
['9606']
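for line in lines:
    # Captures the whole field after "taxid:", e.g. '9606(Human)'
    match = re.match(r".*\|taxid:([^|]+)\|.*", line)
    if match:  # re.match returns None when the line has no taxid field
        print(match.groups())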
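For a large number of strings, the search-based answer can be wrapped in a function that also copes with lines that have no taxid at all. A minimal sketch; the function name extract_taxid and the second sample line are made up for illustration:

```python
import re

# Compile once, since the same pattern is applied to many lines.
TAXID = re.compile(r'taxid:(\d+)')

def extract_taxid(line):
    """Return the digits after 'taxid:', or None when the line has no taxid."""
    match = TAXID.search(line)
    return match.group(1) if match else None

lines = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',  # example from the question
    'score:0.12|intact:EBI-000000',                    # made-up line without a taxid
]
print([extract_taxid(line) for line in lines])
```

Because search scans the whole string, the taxid field does not need to sit between two | characters.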
