Using regex in Python script [duplicate] - python

I am trying to write a Python script to search for this line:
time residue 3 Total
and replace it with an empty line.
This is my script:
import glob
read_files = glob.glob("*.agr")
with open("out.txt", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
with open("out.txt", "r") as file:
    filedata = file.read()
filedata = filedata.replace(r'#time\s+residue\s+[0-9]\s+Total', 'x')
with open("out.txt", "w") as file:
    file.write(filedata)
With this script, no replacement happens. Why could that be? The rest of the code works fine. The output file is unchanged, so I suspect that the pattern cannot be found.
Thank you.

The str.replace method replaces a fixed substring. Use re.sub instead if you're looking to replace a match of a regex pattern:
import re
filedata = re.sub(r'#time\s+residue\s+[0-9]\s+Total', 'x', filedata)
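To see the difference on a small in-memory string, here is a minimal sketch. The sample text is made up for illustration (the real .agr files are not shown in the question), and the replacement is an empty string because the question asks for an empty line rather than 'x'.

```python
import re

# Hypothetical sample data standing in for the merged out.txt contents.
sample = "#time   residue 3   Total\n0.0 1.5\n#time residue 7 Total\n1.0 2.5\n"
pattern = r'#time\s+residue\s+[0-9]\s+Total'

# str.replace searches for the pattern text literally, so nothing matches.
unchanged = sample.replace(pattern, '')

# re.sub interprets the pattern as a regex and blanks each matching line.
blanked = re.sub(pattern, '', sample)

print(unchanged == sample)
print(repr(blanked))
```

Note that [0-9] matches a single digit only; if the residue number can have several digits, [0-9]+ would be needed.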

Related

How to get just the first word of every line of file using python? [duplicate]

As you can see, I'm a newbie and I don't know how to ask this question, so I'm going to explain.
I am writing a Somali dictionary in text format, and I have a lot of words and their meanings. I want to copy only the words, not their meanings, into another text file in order to have a vocabulary list. Is there a way I can do that? Example: "abaabid m.dh eeg abaab². ld ababid. ld abaab¹, abaabis." I have hundreds of these words and their meanings, and I want to pick out only the word "abaabid", and so on. How can I automate this in Python instead of copying and pasting manually all day? Stop saying post the code as text; I don't even know how to write the code, and that's why I'm asking this question. This screenshot is the text file showing words with their meanings.
If you just want a script to read the dictionary entries and then write the words into a separate file, try something like this:
def get_words(filename='Somali Dictionary.txt'):
    # Collect the first whitespace-separated word of every non-blank line.
    with open(filename, 'r') as f:
        lines = [line.split()[0] for line in f if line.strip()]
    return lines

def write_words(lines, filename='Somali Words.txt'):
    # Write one word per line; the with block closes the file automatically.
    with open(filename, 'w') as f:
        for line in lines:
            f.write(line)
            f.write('\n')
Example usage:
words = get_words()
write_words(words)
Or, alternatively:
if __name__ == '__main__':
    words = get_words()
    write_words(words)
To get the first word of every line, follow these steps:
f = open('file.txt', 'r')
for line in f:
    print(line.split(' ')[0])
f.close()
or
with open('convert.txt', 'r') as f:
    for line in f:
        print(line.split(' ')[0])
If the console shows an error such as "UnicodeDecodeError: 'charmap' codec can't decode", you can fix it by adding encoding='utf-8', provided the file is saved as UTF-8 (I'm using a .txt file saved as UTF-8). This is how to add it to your code:
with open('convert.txt', 'r', encoding='utf-8') as f:
    for line in f:
        print(line.split(' ')[0])
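The snippets above differ in one detail: split() with no argument splits on any whitespace and ignores leading whitespace, while split(' ') splits on single spaces only. A minimal sketch on in-memory lines, where the first entry is the example from the question and the second is a made-up, tab-indented entry:

```python
# Sample lines; the second entry is hypothetical and starts with a tab.
entries = [
    "abaabid m.dh eeg abaab². ld ababid. ld abaab¹, abaabis.\n",
    "\n",
    "\tsecondword its meaning\n",
]

def first_words(lines):
    """Return the first whitespace-separated word of each non-blank line."""
    return [line.split()[0] for line in lines if line.strip()]

words = first_words(entries)

# With split(' '), the leading tab stays attached to the word.
with_tab = entries[2].split(' ')[0]

print(words)
print(repr(with_tab))
```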

Read txt file into python [duplicate]

I have a German word list that contains special characters like ä, ö, ü, and words such as "Nährstoffe". But when I read the text file and create a dict from it, I get the wrong word out of it.
Here is my code in python3:
import random
import csv
import os
permanettxtfile='wortliste.txt'
newlines = open(permanettxtfile, "r")
lines=newlines.read().split('\n')
random.shuffle(lines)
linkdict=dict.fromkeys(lines)
print(linkdict)
I get as output:
'Nährstoffe': None
But i want:
'Nährstoffe': None
How can I solve this issue? Is this a UTF-8 issue?
Try opening the file with UTF-8 encoding:
import random
import csv
import os
permanettxtfile='wortliste.txt'
with open(permanettxtfile, 'r', encoding='utf-8') as file:
    lines = file.read().split('\n')
random.shuffle(lines)
linkdict = dict.fromkeys(lines)
print(linkdict)
Also, don't forget to close the file, either with a context manager as in my example or with newlines.close() in your code.
Specify the encoding using
open(permanettxtfile, "r", encoding="UTF-8")
It is most likely an encoding issue. You can try this:
with open("filename.txt", "rb") as f:
    contents = f.read().decode("UTF-8")
or
with open("filename.txt", encoding='utf-8') as f:
    contents = f.read()
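The effect can be reproduced without any file. This is only a sketch: it assumes the platform default codec was cp1252 (common on Western European Windows installations), which the question does not state.

```python
word = "Nährstoffe"

# The bytes as they would be stored in a UTF-8 encoded file.
raw = word.encode("utf-8")

# Decoding with the wrong codec (cp1252 is an assumption) garbles the umlaut.
garbled = raw.decode("cp1252")

# Decoding with the codec the file was written in restores the word.
restored = raw.decode("utf-8")

print(garbled)
print(restored)
```

Passing encoding='utf-8' to open() makes Python perform the second decode instead of the first.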

Python Text file [duplicate]

I want to store the inputs in my text file every time I run it. But when I run the script, the old inputs get deleted! How can I fix this?
Sample code:
name = input('Enter name: ')
f = open('test.txt', 'w')
f.write(name)
f.close()
You should open the file in append mode:
f = open('test.txt', 'at')
Notice the at, meaning append text. You have w mode, meaning write text (text is default), which first truncates the file. This is why your old inputs got deleted.
With 'w' in open(), you create a new empty file every time.
Use open('test.txt', 'a+') instead.
Also, I would suggest you use the with statement:
name = input('Enter name: ')
with open('test.txt', 'a+') as file:
    file.write(name)
Write the additional text in append mode:
f = open('test.txt', 'a')
'w' is write mode, so it deletes any existing data before writing new data.
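A minimal sketch of the difference between the two modes, using a temporary directory so that no real file is touched. The names are made-up inputs, and the added newline is my own choice to keep each entry on its own line.

```python
import os
import tempfile

# A temporary directory keeps the sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.txt')

    # Simulate three runs of the script with hypothetical inputs.
    for name in ['Alice', 'Bob', 'Carol']:
        with open(path, 'a') as f:
            f.write(name + '\n')  # newline keeps the entries on separate lines

    with open(path) as f:
        appended = f.read()

    # Opening with 'w' truncates the file, so only the last input survives.
    with open(path, 'w') as f:
        f.write('Dave\n')

    with open(path) as f:
        overwritten = f.read()

print(repr(appended))
print(repr(overwritten))
```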

Python's equivalent to PHP's file(fn, FILE_IGNORE_NEW_LINES) [duplicate]

I want to read a file in Python and put each line into an array. I know how to do it in PHP, with the file(fn, FILE_IGNORE_NEW_LINES) function and its FILE_IGNORE_NEW_LINES flag, but how do I do it in Python?
When you read a file line by line, the newline character is usually kept at the end of each line as you loop through it. To get rid of it:
with open('filename.txt', 'r') as f:
    for line in f:
        line = line.strip('\n')
        #do things with the stripped line!
The Python equivalent is:
with open("file.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")
        ...
You want this:
with open('filename.txt', 'r') as f:
    data = [line.replace('\n', '') for line in f]
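Another option, not mentioned in the answers above, is str.splitlines(), which drops the line endings in one call. A small sketch comparing it with the rstrip approach; io.StringIO stands in for an open text file so that the example runs without a file on disk.

```python
import io

# io.StringIO behaves like an open text file; the contents are made up.
fake_file = io.StringIO("first\nsecond\n\nlast")

# Option 1: strip the trailing newline from each line while iterating.
stripped = [line.rstrip('\n') for line in fake_file]

# Option 2: read everything and let splitlines() drop the line endings.
fake_file.seek(0)
split = fake_file.read().splitlines()

print(stripped)
print(split)
```

Both keep empty lines as empty strings, which appears to match what PHP does when FILE_SKIP_EMPTY_LINES is not set.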

Replace string within file contents [duplicate]

How can I open a file, Stud.txt, and then replace any occurrences of "A" with "Orange"?
with open("Stud.txt", "rt") as fin:
    with open("out.txt", "wt") as fout:
        for line in fin:
            fout.write(line.replace('A', 'Orange'))
If you'd like to replace the strings in the same file, you probably have to read its contents into a local variable, close it, and re-open it for writing.
I am using the with statement in this example, which closes the file after the with block ends, either normally when the last command finishes executing or because of an exception:
def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return

    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)
It is worth mentioning that if the filenames were different, we could have done this more elegantly with a single with statement.
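For the different-filenames case, a rough sketch could look like this. The helper name copy_with_replacement and the file contents are made up for illustration, and the demonstration runs in a temporary directory.

```python
import os
import tempfile

def copy_with_replacement(src, dst, old, new):
    """Write src to dst line by line, replacing old with new."""
    # One with statement manages both files; both are closed on exit.
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(line.replace(old, new))

# Demonstration in a temporary directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'Stud.txt')
    dst = os.path.join(tmp, 'out.txt')
    with open(src, 'w') as f:
        f.write('A is for Apple\nB is for Banana\n')
    copy_with_replacement(src, dst, 'A', 'Orange')
    with open(dst) as f:
        result = f.read()

print(result)
```

Because str.replace works on substrings, the "A" in "Apple" is replaced as well.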
#!/usr/bin/python
FileName = 'Stud.txt'  # path of the file to modify
with open(FileName) as f:
    newText = f.read().replace('A', 'Orange')
with open(FileName, "w") as f:
    f.write(newText)
Using pathlib (https://docs.python.org/3/library/pathlib.html)
from pathlib import Path
file = Path('Stud.txt')
file.write_text(file.read_text().replace('A', 'Orange'))
If the input and output files were different, you would use two different variables for read_text and write_text.
If you wanted a change more complex than a single replacement, you would assign the result of read_text to a variable, process it, save the new content to another variable, and then write that content with write_text.
If your file was large, you would prefer an approach that does not read the whole file into memory but rather processes it line by line, as shown by Gareth Davidson in another answer (https://stackoverflow.com/a/4128192/3981273), which of course requires using two distinct files for input and output.
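The first two variations described above (separate input and output files, and more than one change) might look like the following sketch. The output name Stud_new.txt, the second replacement, and the file contents are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'Stud.txt'      # input file
    target = Path(tmp) / 'Stud_new.txt'  # hypothetical output name
    source.write_text('A B A\n')         # made-up contents

    # Read once, apply several changes, then write to the other file.
    text = source.read_text()
    new_text = text.replace('A', 'Orange').replace('B', 'Banana')
    target.write_text(new_text)

    saved = target.read_text()
    original = source.read_text()

print(repr(saved))
print(repr(original))
```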
Something like
with open('Stud.txt') as file:
    contents = file.read()
replaced_contents = contents.replace('A', 'Orange')
# <do stuff with the result>
with open('Stud.txt','r') as f:
    newlines = []
    for line in f.readlines():
        newlines.append(line.replace('A', 'Orange'))
with open('Stud.txt', 'w') as f:
    for line in newlines:
        f.write(line)
If you are on Linux and just want to replace the word dog with cat, you can do:
test.txt:
Hi, i am a dog and dog's are awesome, i love dogs! dog dog dogs!
Linux Command:
sed -i 's/dog/cat/g' test.txt
Output:
Hi, i am a cat and cat's are awesome, i love cats! cat cat cats!
Original Post: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands
One way is to do it with regular expressions. Assuming that you want to iterate over each line in the file (where 'A' would be stored), you can do:
import re

# When you open a file in write mode, it clears the file and writes only
# what you tell it to, so we have to save the contents first.
with open(r'C:\full_path\Stud.txt', 'r') as infile:
    saved_input = infile.readlines()

# Now we change entries with 'A' to 'Orange'.
for i in range(len(saved_input)):
    saved_input[i] = re.sub('A', 'Orange', saved_input[i])

# Now we open the file in write mode (clearing it) and write saved_input back to it.
with open(r'C:\full_path\Stud.txt', 'w') as outfile:
    for each in saved_input:
        outfile.write(each)
