I'm pretty new to Python and I've been trying to work out how to write an if statement that checks whether a particular string is preceded by a certain pattern.
For example, take this line of text:
Sep 09 07:54:28 INFO: line of text here
I have multiple lines like this in a file that I have my script reading from. The dates and times change on every line so I can't specify that text exactly.
I'm trying to replace the word INFO with something else.
However, the word INFO is scattered throughout the text file and I don't want to replace every instance of it.
I only want to replace INFO if it's preceded by number number, colon, number number, colon, number number.
So I've used if statements, string.replace(old, new) and I've been reading about 'positive lookbehind assertions' eg. (?<=abc)def
But I'm unsure how to specify a pattern of text, rather than specifying the exact text.
Just need pointing in the right direction! Thanks
EDIT: I should also point out that there are other instances of INFO that are preceded by a number, so I didn't want to make the rule simply 'be preceded by a number'. It will need to be specifically that pattern (xx:xx:xx)
EDIT2: Putting another example here to clarify further based on a comment
Sep 09 07:54:28 INFO: line of text here that contains many words
line of text that also contains the word INFO in the same line
Sep 09 07:56:30 INFO: line of text here that also contains many words
121334234: line of text here that contains INFO as well
I want to replace the word INFO, but only on lines that have the time in that format (num num, colon, num num, colon, num num)
EDIT 3:
with open(infile) as f:
    f = f.read()
with open(infile, 'r') as IN, open('output.html', 'w') as OUT:
    f = re.sub(r'(?<=\d{2}:\d{2}:\d{2})\s*INFO\b', ' INFO2', f)
This doesn't raise any error, but it doesn't do anything either.
EDIT 4:
OUT.write(re.sub(r'(?<=\d{2}:\d{2}:\d{2})\s*INFO\b', ' INFO2', f))
Now this does replace INFO with INFO2, but it also stops all the code below it from working, and the effect depends on where I place it. If I place it after all of my other code, it doesn't seem to do anything; if I place it straight after where I define IN and OUT, it breaks all the formatting done by the code below it.
You may use the following approach:
import re
s = '''Sep 09 07:54:28 INFO: line of text here that contains many words
line of text that also contains the word INFO in the same line
Sep 09 07:56:30 INFO: line of text here that also contains many words
121334234: line of text here that contains INFO as well'''
repl_str = 'new_info' # sample replacement string
s = re.sub(r'(?<=\d{2}:\d{2}:\d{2})\s*INFO\b', f' {repl_str}', s)
print(s)
The output:
Sep 09 07:54:28 new_info: line of text here that contains many words
line of text that also contains the word INFO in the same line
Sep 09 07:56:30 new_info: line of text here that also contains many words
121334234: line of text here that contains INFO as well
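To apply the same substitution to a file, as attempted in EDIT 3 and EDIT 4, the result of re.sub has to be written out, and only once. Below is a minimal sketch of that idea. The function names, the default replacement word and the file paths are assumptions made for illustration, not part of the original script.

```python
import re

# INFO (plus any whitespace before it) that directly follows an xx:xx:xx time
TIMESTAMPED_INFO = re.compile(r'(?<=\d{2}:\d{2}:\d{2})\s*INFO\b')

def replace_timestamped_info(text, new_word='INFO2'):
    """Return text with INFO replaced only where a time precedes it."""
    # A function replacement avoids backslash handling in new_word
    return TIMESTAMPED_INFO.sub(lambda m: ' ' + new_word, text)

def convert_file(in_path, out_path, new_word='INFO2'):
    """Read in_path, replace, and write the whole result to out_path once."""
    with open(in_path) as src:
        text = src.read()
    with open(out_path, 'w') as dst:
        dst.write(replace_timestamped_info(text, new_word))
```

Any other formatting steps would probably need to operate on the returned string before it is written, rather than writing to the output file a second time.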
A simple regex like
(?<=\d\d:\d\d:\d\d\s)INFO
would find all such INFO strings
You can find the pattern without using positive lookbehind assertions as well. Assuming that your file name is test.txt, you can do it as follows:
import re

# day number, whitespace, hh:mm:ss, whitespace, then INFO
obj = re.compile(r'\d+\s+\d+:\d+:\d+\s+INFO')
with open("test.txt", "r") as reader:
    for line in reader:
        x = obj.search(line)
        if x:
            pass  # do what you want to do
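One way to fill in the "do what you want to do" step is to capture the timestamp in a group and put it back in front of the new word. This is only a sketch: the function name and the replacement word are assumptions, and it replaces at most one INFO per line.

```python
import re

# Group 1 keeps the day number, the time and the whitespace before INFO
STAMP_THEN_INFO = re.compile(r'(\d+\s+\d+:\d+:\d+\s+)INFO\b')

def replace_in_lines(lines, new_word='INFO2'):
    """Yield each line, replacing INFO only when it follows a timestamp."""
    for line in lines:
        yield STAMP_THEN_INFO.sub(lambda m: m.group(1) + new_word, line, count=1)
```

Because it works on any iterable of lines, it can be fed an open file object directly, for example `out.writelines(replace_in_lines(reader))`.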
Related
I started playing with Python and programming in general about 3 weeks ago, so be gentle ;)
What I'm trying to do is convert text files into the form I want. The text files share the same pattern, but the words I want to replace are unknown, so the program must first find them, set a pattern, and then replace them with the words I want.
For example:
xxxxx
xxxxx
Line3 - word - xxxx xxxx
xxxxx xxxx
word
word
xxxx word
Legend:
xxxxx = template words, present in every file
word = random word, our target
I am able to locate the first appearance of the word because it always appears in the same place in the file; after that it appears at random positions.
My code:
f1 = open('test.txt', 'r')
f2 = open('file2.txt', 'w')
pattern = ''
for line in f1.readlines():
    if line.startswith('Seat 1'):
        line = line.split(' ', 3)
        pattern = line[2]
        line = ' '.join(line)
        f2.write(line)
    elif pattern in line.strip():
        f2.write(line.replace(pattern, 'NewWord'))
    else:
        f2.write(line)
f1.close()
f2.close()
This code doesn't work. What's wrong?
Welcome to the world of Python!
I believe you are on the right track and are very close to the correct solution, however I see a couple of potential issues which may cause your program to not run as expected.
1. If you are trying to see if a string equals another, I would use == instead of is (see this answer for more info)
2. When reading a file, lines end with \n which means your variable line might never match your word. To fix this you could use strip, which automatically removes leading and trailing "space" characters (like a space or a new line character)
   elif line.strip() == pattern:
3. This is not really a problem but a recommendation, since you are just starting out. When dealing with files it is highly recommended to use the with statement that Python provides (see question and/or tutorial)
Update:
I saw that the word might be only part of the line, so instead of using == as recommended in point 1, you could use in, but you need to reverse the order, i.e.
elif pattern in line:
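Putting these points together, here is a hedged sketch of the whole loop. It also guards against one more pitfall in the question's code: while pattern is still the empty string, '' in line is always true and line.replace('', 'NewWord') inserts the new word between every character. The function name is an assumption; the 'Seat 1' marker and the third field come from the question's code, and the sketch assumes the word should be replaced on the marker line as well.

```python
def replace_target_word(lines, marker='Seat 1', new_word='NewWord'):
    """Yield lines with the word found on the marker line replaced everywhere."""
    pattern = None  # unknown until the marker line has been seen
    for line in lines:
        if pattern is None and line.startswith(marker):
            # Third space-separated field holds the target word
            pattern = line.split(' ', 3)[2]
        if pattern:  # skip replacing until a non-empty word is known
            line = line.replace(pattern, new_word)
        yield line
```

With the with statement it could be used as `dst.writelines(replace_target_word(src))`, where src and dst are the opened input and output files.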
Let's say I have a file:
This is the first line
Ages = ["young*","old*"] //This was the second line, I put a * on purpose
This is the third line
The scenario is as follows:
I know there is the "Ages" array inside the file, but I don't have any idea about its elements.
I now want to append a specific string, say "test*" after each element, the file would become:
This is the first line
Ages = ["young*test*","old*test*"] //This was the second line, I put a * on purpose
This is the third line
Any help?
First you need to open the file with read and write mode and read it all to a string, then I think your best bet would be to use regular expressions to group what you want to replace and replace it. Then write it back to the file. It would be something similar to below:
import re

pattern = re.compile(r'')  # this would be your pattern
with open("filename.txt", "r+") as file:
    content = file.read()  # all the content as a string
    replaced = pattern.sub('the match group with your addition', content)
    file.seek(0)  # return to the start of the file before writing
    file.write(replaced)  # write the modified content to the file
    file.truncate()  # drop any trailing parts left over from before
I think this is homework, so I won't complete the regular expression pattern here, but I will give a hint:
Match Ages followed by 0 or more spaces, then = followed by 0 or more spaces, then the array-opening character, then a match group for the inner array elements with another inner group that marks the end of each element; you can then replace accordingly.
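For readers who want to check their own attempt, the hint can be sketched roughly as follows. This is one possible reading of it, not the answerer's own pattern: it assumes the elements are double-quoted strings without embedded quotes or closing brackets, and the names are made up for illustration.

```python
import re

# Opening "Ages = [", the raw element list, and the closing bracket
AGES_ARRAY = re.compile(r'(Ages\s*=\s*\[)([^\]]*)(\])')
# A single double-quoted element inside the array
ELEMENT = re.compile(r'"([^"]*)"')

def append_to_ages(text, suffix='test*'):
    """Append suffix to every quoted element of the Ages array."""
    def rewrite_array(match):
        opening, inner, closing = match.groups()
        new_inner = ELEMENT.sub(lambda m: '"' + m.group(1) + suffix + '"', inner)
        return opening + new_inner + closing
    return AGES_ARRAY.sub(rewrite_array, text)
```

The returned string can then be written back with the seek, write and truncate steps described above.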
Can anyone please provide the regex code for printing only the first line of the data in a text file? I am using Spyder.
I have tried many solutions, but they print all my data on every line. The last one helped, but it picked two lines. I just want the first line of my text file only, up to the first line break (that is, until the text starts on the next line).
import re
def getname(s):
    nameregex=re.findall(r'^.*?[\.!\?](?:\s|$)',line)
    if len(nameregex)!=0:
        print(nameregex)
s = open('yesno.txt')
for line in s:
    getname(s)
In the output I am getting the first two lines.
Basically I am trying to print only the company name, which is mostly in the first line.
Read the file into a variable using read() and use re.search to get the match:
import re
def getname(s):
    nameregex=re.search(r'^.*?[.!?](?!\S)', s) # Run search with regex
    if nameregex: # If there is a match
        print(nameregex.group()) # Get Group 0 - whole match - value
s = open('yesno.txt', 'r') # Open file handle to read it
contents = s.read() # Get all file contents
getname(contents) # Run the getname method with the contents
See the Python demo.
The regex is a bit modified to avoid the whitespace at the end. See details:
^ - start of the string
.*? - any 0 or more chars other than line break chars, as few as possible
[.!?] - ., ! or ? char
(?!\S) - there must be a whitespace or end of string here.
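To see what the pattern actually returns, here is a small sketch on an in-memory string. Note that it stops at the first ., ! or ? that is followed by whitespace, which is not necessarily the end of the first line. The second helper, which takes the first line without any regex, is an alternative added for comparison and is not part of the answer; both function names and the sample text are made up.

```python
import re

def first_sentence(text):
    """Return the text up to the first ., ! or ? followed by whitespace."""
    match = re.search(r'^.*?[.!?](?!\S)', text)
    return match.group() if match else None

def first_line(text):
    """Return everything before the first line break (no regex needed)."""
    lines = text.splitlines()
    return lines[0] if lines else ''

sample = 'Acme Corp. makes widgets\nSecond line.'
print(first_sentence(sample))  # stops right after "Corp."
print(first_line(sample))      # the whole first line
```

If the company name can contain a full stop, the line-based helper is probably the safer choice.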
I have data in the following form in a file:
<string1> abc:string2 <http://yago-knowledge.org/resource/wikicategory_Sports_clubs_established</text\u003e\n______<sha1\u003eqwjfowt5my8t6yuszdb88k2ehskjuh0</sha1\u003e\n____</revision\u003e\n__</page\u003e\n__<page\u003e\n____<title\u003ePortal:Tropical_cyclones/Anniversaries/August_22</title\u003e\n____<ns\u003e100</ns\u003e\n____<id\u003e7957689</id\u003e\n____<revision\u003e\n______<id\u003e446349886</id\u003e\n______<timestamp\u003e2011-08-23T17:38:19Z</timestamp\u003e\n______<contributor\u003e\n________<username\u003eLightbot</username\u003e\n________<id\u003e7178666</id\u003e\n______</contributor\u003e\n______<comment\u003eDelink_non-obscure_units._Conversions._Report_bugs_to_[[User_talk:Lightmouse>.
The delimiter in the above file is a tab (\t), i.e. string1 is separated from abc:string2 by \t. Similarly for the rest of the strings.
Now I want to retain just letters, numbers, /, :, . and _ within the strings that are enclosed in <>. I want to delete all characters apart from the specified ones from the strings that are enclosed in <>.
Is there some way by which I may achieve this using linux commands or python? I want to replace all the unwanted characters by an underscore.
<string1> abc:string2 <http://yago-knowledge.org/resource/wikicategory_Sports_clubs_established_text_u003e_n_______sha1_u003eqwjfowt5my8t6yuszdb88k2ehskjuh0_sha1_u003e_n_____revision_u003e_n___/page_u003e_n___page_u003e_n_____title_u003ePortal:Tropical_cyclones/Anniversaries/August_22_/title_u003e_n_____ns_u003e100_/ns_u003e_n_____id_u003e7957689_/id_u003e_n_____revision_u003e_n_______id_u003e446349886_/id_u003e_n_______timestamp_u003e2011-08-23T17:38:19Z_/timestamp_u003e_n_______contributor_u003e_n_________username_u003eLightbot_/username_u003e_n_________id_u003e7178666_/id_u003e_n_______/contributor_u003e_n_______comment_u003eDelink_non-obscure_units._Conversions._Report_bugs_to___User_talk:Lightmouse>.
Is there some way by which I may achieve this?
You can probably achieve this just with UNIX tools and some crazy regular expression, but I would write a small Python script for this:
Open two files (input and output) with open()
Iterate over the input file line by line: for line in input_file:
Split the line at tab: for part in line.split('\t'):
Check if a part is enclosed in <>: if part.startswith('<') and part.endswith('>'):
Filter characters with a regular expression, replacing each unwanted character inside the brackets with an underscore: filtered_part = '<' + re.sub(r'[^a-zA-Z0-9/:._]', '_', part[1:-1]) + '>'
Join the filtered parts back together: filtered_line = '\t'.join(filtered_parts)
Write the filtered line to the output file: output_file.write(filtered_line + '\n')
Following this outline, it should be easy for you to write a working script.
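The outline above could be assembled along these lines. It is a sketch under a few assumptions: unwanted characters become underscores (as the question asks), the enclosing < and > are kept, the trailing . is taken to be a separate tab-delimited field, and the function names are invented for illustration.

```python
import re

# Anything that is not a letter, digit, /, :, . or _
UNWANTED = re.compile(r'[^a-zA-Z0-9/:._]')

def clean_line(line, replacement='_'):
    """Clean the <...> fields of one tab-separated line."""
    cleaned = []
    for part in line.rstrip('\n').split('\t'):
        if part.startswith('<') and part.endswith('>'):
            # Filter only the text between the enclosing brackets
            part = '<' + UNWANTED.sub(replacement, part[1:-1]) + '>'
        cleaned.append(part)
    return '\t'.join(cleaned)

def clean_file(input_path, output_path):
    """Write a cleaned copy of input_path to output_path."""
    with open(input_path) as src, open(output_path, 'w') as dst:
        for line in src:
            dst.write(clean_line(line) + '\n')
```

Fields that are not wrapped in <>, such as abc:string2, pass through unchanged.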
I would like to replace every line that starts with a certain expression (example: <Output>) with what I want the output path to be. I have found and got to work a python script that replaces one string with another, in every occurrence in a file - something like:
text = open( path ).read()
if output_pattern in text:
    open( path, 'w' ).write( text.replace( pattern, replace ) )
However I would like to replace the text.replace( pattern, replace ) with something that replaces the entire line that contains pattern with replace. I have tried some things and failed miserably.
Note: I can read but not quite write python...
One of my failures did replace the pattern with the line. Actually, it replaced the entire file with only the replace pattern, as many times as it was needed... Yeah, not funny since I was doing a recursive search (and the previous attempt, to replace one string with another, worked perfectly, so I was brave and set my target directory as the root of what I want to work with)
There are other great examples that read line by line and write to an output file, and then copy the output file to the input file, but I got an error doing that.
I don't really want to use regex because the patterns that I might want to search for (and especially what I want to replace) (may) contain many special characters including backslashes, but these could be escaped if needed.
To replace lines with replace if they start with pattern:
text = open(path).read()
new_text = '\n'.join(replace if line.startswith(pattern) else line
for line in text.splitlines())
open(path, 'w').write(new_text)
Or optimized for memory usage, and using the with statement, which is a bit more idiomatic:
with open(input_path) as text, open(output_path, 'w') as new_text:
    new_text.write(''.join(replace if line.startswith(pattern) else line
                           for line in text))
You'll want to make sure replace has a newline character (\n) in it for the latter example to work as you'd expect.
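As a small illustration of that last point, the line-by-line variant can add the newline itself when it is missing. This is a sketch with a made-up function name. It uses plain startswith with no regex, so backslashes and other special characters in pattern or replace need no escaping.

```python
def replace_matching_lines(lines, pattern, replace):
    """Yield lines, swapping any line that starts with pattern for replace."""
    if not replace.endswith('\n'):
        replace += '\n'  # keep the replacement on its own line
    for line in lines:
        yield replace if line.startswith(pattern) else line
```

It could be used as `new_text.writelines(replace_matching_lines(text, pattern, replace))` inside the with block above. Lines that have leading whitespace before the pattern would not match.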