Replace part of a matched string in python

I have the following matched strings:
punctacros="Tasla"_TONTA
punctacros="Tasla"_SONTA
punctacros="Tasla"_JONTA
punctacros="Tasla"_BONTA
I want to replace only a part (before the underscore) of the matched strings, and the rest of it should remain the same in each original string.
The result should look like this:
TROGA_TONTA
TROGA_SONTA
TROGA_JONTA
TROGA_BONTA

Edit:
This should work:
from re import sub
with open("/path/to/file") as myfile:
    lines = []
    for line in myfile:
        line = sub('punctacros="Tasla"(_.*)', r'TROGA\1', line)
        lines.append(line)
with open("/path/to/file", "w") as myfile:
    myfile.writelines(lines)
Result:
TROGA_TONTA
TROGA_SONTA
TROGA_JONTA
TROGA_BONTA
Note however, if your file is exactly like the sample given, you can replace the re.sub line with this:
line = "TROGA_"+line.split("_", 1)[1]
eliminating the need of Regex altogether. I didn't do this though because you seem to want a Regex solution.
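
The backreference can be tried without touching any file. This is a minimal sketch that assumes the sample strings from the question are held in a list; the third entry is a made-up line added only to show that non-matching text is left alone.

```python
import re

# Sample strings from the question, plus one hypothetical non-matching line.
lines = [
    'punctacros="Tasla"_TONTA',
    'punctacros="Tasla"_SONTA',
    'unrelated_LINE',
]

# Group 1 captures the underscore and everything after it;
# \1 in the replacement puts that captured text back.
pattern = re.compile(r'punctacros="Tasla"(_.*)')
replaced = [pattern.sub(r'TROGA\1', line) for line in lines]

for line in replaced:
    print(line)
```

Only the part of the match outside the group is replaced, which is why the suffix after the underscore survives.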

mystring.replace('punctacros="Tasla"', 'TROGA')
where mystring is a string containing those four lines. It returns a new string with the values replaced.

If you want to replace everything before the first underscore, try this:
#! /usr/bin/python3
data = ['punctacros="Tasla"_TONTA',
        'punctacros="Tasla"_SONTA',
        'punctacros="Tasla"_JONTA',
        'punctacros="Tasla"_BONTA',
        'somethingelse!="Tucku"_CONTA']
for s in data:
    print('TROGA' + s[s.find('_'):])
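
One caveat with s.find('_'): it returns -1 when there is no underscore, so the slice keeps only the last character. A possible alternative is str.partition, sketched here; replace_prefix is a hypothetical helper name, and leaving strings without an underscore unchanged is an assumption about the desired behaviour.

```python
def replace_prefix(s, new_prefix="TROGA"):
    """Replace everything before the first underscore with new_prefix."""
    head, sep, tail = s.partition("_")
    if not sep:
        # No underscore found: return the string unchanged (assumed behaviour).
        return s
    return new_prefix + sep + tail


for s in ['punctacros="Tasla"_TONTA', 'somethingelse!="Tucku"_CONTA', 'no-underscore']:
    print(replace_prefix(s))
```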

Related

Appending string to a line read from file places appended string on next line

I have a .txt file like the following:
abc
def
ghi
Now, I want to append a string directly to the end of each line. However, my output is:
abc
---testdef
---testghi---test
My code is as follows:
file_read = open("test.txt", "r")
lines = file_read.readlines()
file_read.close()
new_file = open("res.txt", "w")
for line in lines:
    new_file.write(line + "---test")  # I tried to add "\r" in the middle of them, but this didn't work.
new_file.close()

You need to strip the trailing newline using .rstrip():
for line in lines:
    new_file.write(f"{line.rstrip()}---test\n")
Then, res.txt contains:
abc---test
def---test
ghi---test
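
The reason the suffix lands on the next line is that every line read from a file still ends with its newline character. A minimal sketch of the fix follows; io.StringIO stands in for test.txt and res.txt, which is an assumption made so the example runs without any files.

```python
import io

# io.StringIO objects stand in for the input and output files (assumption).
source = io.StringIO("abc\ndef\nghi")
target = io.StringIO()

for line in source:
    # Each line still carries its trailing newline, except possibly the last.
    stripped = line.rstrip("\r\n")
    target.write(stripped + "---test\n")

print(target.getvalue(), end="")
```

Stripping only "\r\n" rather than all whitespace keeps any trailing spaces that belong to the data.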

As I understood it, you want to put a string in front of another string, so that, for example, "abcd" becomes "---testabcd".
The mistake is in new_file.write(line + "---test"): if you want string1 to come before string2, you have to write string1 first and then string2.
So change it to new_file.write("---test" + line)
Tip: instead of the '+' operator, you can use f-strings or .format:
f"---test{line}" is an f-string.
"Hello {friends}".format(friends="myname")
About '\r':
The carriage return \r moves the cursor back to the start of the line. When the string is printed to a terminal, the text after the \r overwrites the characters already shown, one by one, until all of it has been written.
print('Python is fun')
Output: Python is fun
Now see what happens if I use a carriage return here
print('Python is fun\r123456')
Output: 123456 is fun
So, on a terminal, the characters after \r simply overwrite the characters at the start of the line.

python open csv search for pattern and strip everything else

I got a csv file 'svclist.csv' which contains a single column list as follows:
pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs
I need to strip everything from each line except the PL5 directory and the two digits in the last directory,
so that the result looks like this:
PL5,00
PL5,01
I started the code as follow:
clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if line.__contains__('profile'):
            print(line, end='')
and I'm stuck here.
Thanks in advance for the help.

You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2})
For an explanation, copy the regex and paste it into https://regexr.com. It shows how the regex works, so you can make whatever changes you need.
import re
test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
                    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']
regex = re.compile("(PL5)[^/].{0,}([0-9]{2,2})")
result = []
for test_string in test_string_list:
    matchArray = regex.findall(test_string)
    result.append(matchArray[0])
with open('outfile.txt', 'w') as f:
    for row in result:
        f.write(','.join(row) + '\n')
In the code above, I create an empty list to hold the tuples returned by findall, then write them to the file. Each row is a tuple such as ('PL5', '00'), so ','.join(row) produces PL5,00 without the parentheses and quotes that str(row) would include.
The content is written to 'outfile.txt'.

You can use a regex for this (in general, a regex is a good option when you are trying to extract a pattern):
import re
pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")
with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            last_two_numbers = pattern.findall(line)[0]
            print(f'PL5,{last_two_numbers}')
This code goes over each line, checks whether "profile" is in the line (this is the same as calling __contains__), then extracts the last run of two consecutive digits according to the pattern.
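
If the system ID is not always PL5, the pattern can capture it as well. The following is only a sketch under stated assumptions: the ID is taken to be three uppercase letters or digits right after /profile/, and the instance number is taken to be the two digits that end the second underscore-separated field. PROFILE_RE and extract are hypothetical names.

```python
import re

# Assumed layout: .../profile/<SID>_<LETTERS><NN>_<host>
PROFILE_RE = re.compile(r"/profile/([A-Z0-9]{3})_[A-Z]+(\d{2})_")


def extract(line):
    """Return 'SID,NN' for a matching line, or None if the line does not match."""
    match = PROFILE_RE.search(line)
    if match is None:
        return None
    return f"{match.group(1)},{match.group(2)}"


samples = [
    "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1",
    "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs",
]
for sample in samples:
    print(extract(sample))
```

Returning None for non-matching lines avoids the IndexError that findall(line)[0] raises when nothing matches.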

I made the assumption that the number is always between the two underscores. You could run something similar to this within your for-loop.
test_str = "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"
test_list = test_str.split("_")  # splits the string at the underscores
output = test_list[1].strip(
    "abcdefghijklmnopqrstuvwxyz" + str.swapcase("abcdefghijklmnopqrstuvwxyz"))  # removes leading and trailing letters
try:
    int(output)  # raises ValueError if any non-digit characters are left
    print(f"PL5,{output}")
except ValueError:
    print(f'Something went wrong! Output is PL5,{output}')

Get words from first line until first space and without first character in python

I have a textfile where I want to extract the first word, but without the first character and put it into a list. Is there a way in python to do this without using regex?
A text example of what I have looks like:
#blabla sjhdiod jszncoied
I want the first word, in this case blabla, without the #.
If regex is the only choice, what would the regex look like?

This should do the trick:
l = []
for line in open('file'):
    l.append(line.split()[0][1:])
Edit: If you have empty lines, this will throw an error. You will have to check for empty lines. Here is a possible solution:
l = []
for line in open('file'):
    if line.strip():
        l.append(line.split()[0][1:])

Pythonic way:
my_list = [line.split(' ', 1)[0][1:] for line in open('file') if line.startswith('#')]

a textfile where I want to extract the first word, but without the
first character and put it into a list
result = []
with open('file.txt', 'r') as f:
    l = next(f).strip()  # getting the 1st line
    result.append(l[1:l.find(' ')])
print(result)
The output:
['blabla']

Simple enough if your input is so regular:
s = "#blabla sjhdiod jszncoied"
s.split()[0].strip('#')
blabla
split splits on whitespace by default. Take the first token and strip away '#'.
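
Slicing with [1:] and strip('#') are not quite the same thing: the slice drops exactly one leading character, whatever it is, while strip('#') removes every '#' from both ends of the word. A short sketch of the difference, where first_word_without_marker is a hypothetical helper name and "##tag# rest" is made-up input:

```python
def first_word_without_marker(line):
    """Return the first whitespace-delimited word minus its first character.

    Returns None for blank lines instead of raising IndexError.
    """
    words = line.split()
    if not words:
        return None
    return words[0][1:]


print(first_word_without_marker("#blabla sjhdiod jszncoied"))
print(first_word_without_marker("##tag# rest"))   # only one '#' is dropped
print("##tag# rest".split()[0].strip("#"))        # all surrounding '#' are dropped
```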

Python - Replace parenthesis with periods and remove first and last period

I am trying to take an input file with a list of DNS lookups in which the subdomain/domain labels are separated by the label length in parentheses instead of by periods. It looks like this:
(8)subdomain(5)domain(3)com(0)
(8)subdomain(5)domain(3)com(0)
(8)subdomain(5)domain(3)com(0)
I would like to replace the parentheses and numbers with periods and then remove the first and last period. My code currently does this, but leaves the last period. Any help is appreciated. Here is the code:
import re
file = open('test.txt', 'rb')
writer = open('outfile.txt', 'wb')
for line in file:
    newline1 = re.sub(r"\(\d+\)",".",line)
    if newline1.startswith('.'):
        newline1 = newline1[1:-1]
    writer.write(newline1)

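You can split each line on the \(\d+\) regex, join the pieces with ".", and strip the periods at both ends:
for line in file:
    # line.strip() drops the newline first, so the trailing "." can be stripped
    res = ".".join(re.split(r'\(\d+\)', line.strip()))
    writer.write(res.strip('.') + '\n')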

Given that your re.sub call works like this:
> re.sub(r"\(\d+\)",".", "(8)subdomain(5)domain(3)com(0)")
'.subdomain.domain.com.'
the only thing you need to do is strip any leading and trailing . from the resulting string:
> s = re.sub(r"\(\d+\)",".", "(8)subdomain(5)domain(3)com(0)")
> s.strip(".")
'subdomain.domain.com'
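Full drop-in solution (the line ending is removed first, because otherwise the last period sits in front of the newline and strip(".") cannot reach it):
for line in file:
    newline1 = re.sub(r"\(\d+\)", ".", line.strip()).strip(".")
    writer.write(newline1 + "\n")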
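
The same idea as a self-contained sketch that works on plain strings, so it can be tested without files. to_dotted is a hypothetical helper name; the line ending is removed before the periods are stripped, since a line read from a file ends in a newline rather than in the final period.

```python
import re

LENGTH_PREFIX = re.compile(r"\(\d+\)")


def to_dotted(line):
    """Turn '(8)subdomain(5)domain(3)com(0)' into 'subdomain.domain.com'."""
    # Remove surrounding whitespace (including the newline) first, then
    # replace every length marker with a period and trim the outer periods.
    return LENGTH_PREFIX.sub(".", line.strip()).strip(".")


print(to_dotted("(8)subdomain(5)domain(3)com(0)\n"))
```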

import re
def repl(matchobj):
    if matchobj.group(1):
        return "."
    else:
        return ""
x = "(8)subdomain(5)domain(3)com(0)"
print(re.sub(r"^\(\d+\)|((?<!^)\(\d+\))(?!$)|\(\d+\)$", repl, x))
Output: subdomain.domain.com
You can define your own replacement function.

import re
for line in file:
    line = re.sub(r'\(\d+\)', '.', line.strip())  # \d+ also matches lengths of 10 or more
    line = line.strip('.')

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
The problem is that my output looks like this:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
    if '!' in i:
        if counter == 1:
            file_open2.write('\n')
            counter= counter -1
        counter= counter +1
    file_open2.write(i)

You can try something like this:
with open("abc.txt") as f:
    data = f.read().replace("\r\n", "")  # replace the newlines with ""
    # the newline can be "\n" on your system instead of "\r\n"
    ans = filter(None, data.split("!"))  # split the data at '!', then filter out empty strings
    for x in ans:
        print("!" + x)  # or write to some other file
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. This is almost the lines you want to write -- The only difference is that they don't have trailing newlines and they don't have '!' at the start. We can put those in easily with string formatting -- e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output.
A few other points: when opening files, it's nice to use a context manager, which makes sure that the file is closed properly:
with open('inputfile') as fin:
    lines = fin.read().split('!')
with open('outputfile','w') as fout:
    # "if line" skips the empty string that split() returns before the first '!'
    fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)
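
The split-based approach can be checked on a short string before running it on a real file. This is a sketch with shortened, made-up records; split_records is a hypothetical helper name.

```python
def split_records(text, marker="!"):
    """Rejoin wrapped lines, then cut the text into records that start with marker."""
    flat = text.replace("\r", "").replace("\n", "")
    # split() yields an empty string before the first marker; skip empty pieces.
    return [marker + piece for piece in flat.split(marker) if piece]


# Shortened stand-in for the sample data: the second record is wrapped mid-field.
sample = "!,A,1,+0013!,A,2,+0\n013!,A,3"
for record in split_records(sample):
    print(record)
```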

Another option, using replace instead of split, since you know the starting and ending characters of each line:
In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
In [15]: print(data.replace('+0013!', "+0013\n!"))
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
    for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
        outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
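
To see what the pattern matches, here is a small sketch on a made-up string. The lazy .+? grows one character at a time until the lookahead (?=!|$) sees the next '!' or the end of the text, and re.DOTALL lets '.' cross the embedded newlines.

```python
import re

# Made-up, shortened input: the first record is wrapped across two lines.
text = "!a,1\n,2!b,3!c"

matches = re.findall(r"!.+?(?=!|$)", text, re.DOTALL)
records = [match.replace("\n", "") for match in matches]

for record in records:
    print(record)
```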

Let's remove the existing line breaks, add a \n before every "!", and then let Python split the lines :-) :
file_read.replace("\n", "").replace("!", "\n!").strip().splitlines()

I would actually implement this as a generator, so that you can work on the data stream rather than on the entire content of the file. This is quite memory friendly when working with huge files.
>>> def split_on_stream(it, sep="!"):
...     prev = ""
...     for line in it:
...         line = (prev + line.strip()).split(sep)
...         for parts in line[:-1]:
...             yield parts
...         prev = line[-1]
...     yield prev
>>> with open("test.txt") as fin:
...     for parts in split_on_stream(fin):
...         print(parts)
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.
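
A variant of the streaming idea that keeps the leading '!' and skips empty pieces might look like the sketch below. iter_records is a hypothetical name, io.StringIO stands in for the file, and the input is assumed to begin with the separator.

```python
import io


def iter_records(lines, sep="!"):
    """Yield complete records from an iterable of physical lines, keeping sep."""
    buffer = ""
    for line in lines:
        pieces = (buffer + line.rstrip("\r\n")).split(sep)
        # Every piece except the last is complete; the last one may continue
        # on the next physical line, so it is carried over in the buffer.
        for piece in pieces[:-1]:
            if piece:
                yield sep + piece
        buffer = pieces[-1]
    if buffer:
        yield sep + buffer


# Shortened stand-in for the file: the second record is wrapped mid-field.
stream = io.StringIO("!,A,1,+0013!,A,2,+0\n013!,A,3\n")
records = list(iter_records(stream))
for record in records:
    print(record)
```

Only one physical line plus the unfinished tail of the previous one is held in memory at any time.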
