I am new to Python. I am looking for the number of occurrences of a text string across the text files in a given folder — that is, the total count of that particular string over all the files.
def errors():
    errors = 0
    file = open("\\d:\\myfolder\\*.txt", "r")
    data = file.read()
    errors = data.count("errors")
    return errors

print("Errors:", errors)
Your code doesn't make any sense, but if I understand what you want to do, then here's some pseudo-code to get you going:
from glob import glob

text_file_paths = glob(r"d:\myfolder\*.txt")
error_counting = 0
for file_path in text_file_paths:
    with open(file_path, 'r') as f:
        all_file_lines = f.readlines()
    error_counting += sum(line.count('errors') for line in all_file_lines)

print(error_counting)
Does that help?
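For reference, the same idea wrapped in a function: glob the folder, read each file, and sum the `str.count` results. This is a minimal sketch; the folder path is the one from the question and is assumed, so adjust it to your own machine.

```python
from glob import glob

def count_occurrences(needle, folder_glob):
    """Count total occurrences of `needle` across all files matching `folder_glob`."""
    total = 0
    for path in glob(folder_glob):
        with open(path, "r") as f:
            total += f.read().count(needle)
    return total

# Hypothetical path taken from the question; change it to your own folder.
print("Errors:", count_occurrences("errors", r"d:\myfolder\*.txt"))
```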
What I'm trying to do is basically read a CSV file (one column only) into a list. Then I need to write 11 elements at a time (each element is a 9-digit number) from the list, comma-separated, as a row ending in a newline. All of this goes into another text file; 11 elements in a row fit an A4 sheet. Then iterate over the remaining elements in the list. I can't figure out how. Below is the code I'm working on:
count = 0
textfile = 'texstf.txt'
filepath = 'testfile2.csv'
with open(filepath, "r") as f:
    lines = [str(line.rstrip()) for line in f]
for key in lines:
    while count < 11:
        with open(textfile, "w") as myfile:
            myfile.write(','.join(lines))
        count += 1
csv sample:
6381473
6381783
6381814
...
expected output to file sample:
6381473,6381783,6381814
I ran your code and it looks like it is working. If you could provide any more context of what specifically is not working with your code such as error messages, that would be helpful. Make sure you have the correct filepath for each file you are trying to read and write.
Here is an alternative way to do this, based on this similar question:
import csv
import os

textfile = os.path.join(os.getcwd(), 'texstf.txt')
filepath = os.path.join(os.getcwd(), 'testfile2.csv')
with open(textfile, "w") as my_output_file:
    with open(filepath, "r") as my_input_file:
        for row in csv.reader(my_input_file):
            my_output_file.write(" ".join(row) + ',')
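Note that neither snippet above actually groups the values 11 per line, which was the stated goal. A minimal sketch of that step, assuming the CSV holds one value per row as in the sample:

```python
def write_chunks(values, out_path, per_line=11):
    """Write `values` to `out_path`, `per_line` comma-separated values per line."""
    with open(out_path, "w") as out:
        for i in range(0, len(values), per_line):
            out.write(",".join(values[i:i + per_line]) + "\n")
```

Call it with the `lines` list read from the CSV, e.g. `write_chunks(lines, 'texstf.txt')`.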
I have 100 text files in a folder that I want to load into a list.
I was only able to load one file. How can I load all of the files?
Here is what I did:
with open('varmodel/varmodel_2.var') as f:
    varmodel_2 = f.read()
    print(varmodel_2)
However, instead of just file 2, I have files numbered from 1 to 100.
You can use the glob module to do just that. It gives you a list of all files/folders in a directory. Here is the code you would use to get a string containing all of the file information:
import glob

string = ""
for filename in glob.glob("*"):
    with open(filename, "r") as f:
        string += f.read()
print(string)
all_files = []
for path in glob.glob('varmodel/*'):
    with open(path) as f:
        varmodel = f.read()
        # not sure about the txt file content;
        # it may need preprocessing before being put in the list
        all_files.append(varmodel)
        print(varmodel)
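One caveat: since the files are numbered `varmodel_1.var` through `varmodel_100.var`, keep in mind that `glob()` returns paths in arbitrary order. Here is a sketch that sorts on the trailing number before reading; the naming pattern is taken from the question, so adjust it if your files differ:

```python
from glob import glob

def load_all(folder="varmodel"):
    """Read every varmodel_N.var file in `folder`, in numeric order of N."""
    paths = glob(folder + "/varmodel_*.var")
    # "varmodel_10.var" -> "10" -> 10, so 10 sorts after 2 (unlike string order)
    paths.sort(key=lambda p: int(p.rsplit("_", 1)[1].split(".")[0]))
    contents = []
    for path in paths:
        with open(path) as f:
            contents.append(f.read())
    return contents
```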
I have several txt files that I would like to search, printing a single line if it starts with certain words (<meta property="og:description" content=). I currently have this code, which should search each file in the specified folder:
import glob
filepath = '/Volumes/hardDrive/Folder/files/*'
corpus = glob.glob(filepath)
for textfile in corpus:
    f = open(textfile, 'r')
    pTxt = []
    for ln in f:
        if ln.startswith(r'<meta property="og:description" content='):
            pTxt.append(ln[2:])
    print(pTxt)
Right now, it’s returning [] (without stopping) when it shouldn’t be, which it’s also returning when I shorten the text to “<meta” (which should return several more results). How can I fix this so that it only prints the target line from each file?
import glob

filepath = '/home/filepath/*.txt'
for textfile in glob.glob(filepath):
    pTxt = []
    with open(textfile, 'r') as f:
        for ln in f:
            if ln.startswith(r'<meta property="og:description" content='):
                pTxt.append(ln)
    print(pTxt)
This should work fine.
I am trying to save my output from x .txt files into a single .txt file.
The .txt file should look like the output you can see in the picture below.
What this program actually does is read a couple of .txt files with tons of data, which I filter out using regex.
My source code:
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                print(values)
Thank you for your time! :)
Then you just need to open a file and write the values to it. Try this. You might need to adjust the formatting (I cannot test it since I don't have your text files). I am assuming the output you have in values is correct; keep in mind that this opens the file in append mode, so if you run it more than once you will get duplicates.
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')
with open("myOutFile.txt", "a") as outF:
    for filename in glob.glob(os.path.join(folder_path, '*.txt')):
        with open(filename) as lines:
            for line in lines:
                match = values_re.search(line)
                if match:
                    values = match.group(0).split('\t')
                    assert values[0] == ''
                    values = values[1:]
                    # write() takes a string, so join the list back together
                    outF.write('\t'.join(values) + '\n')
                    print(values)
I'm trying to open a text file and look for the string Num_row_labels. If the value of Num_row_labels is greater than or equal to 10, then print the name of the file.
In the example below, my text file test.mrk has some text in the format below. P.S.: my text file doesn't have Num_row_labels >= 10; the value is always exactly equal to 10.
Format= { Window_Type="Tabular", Tabular= { Num_row_labels=10 } }
so I created a variable teststring to hold the pattern I will be looking at.
Then I opened the file.
Then using re, I got Num_row_labels=10 in my variable called match.
Using group() on match, I extracted the threshold number I wanted and using int() converted the string to int.
My purpose is to read the text file to find/print the value of Num_row_labels, along with the name of the file, if the file has Num_row_labels = 10 or any number greater than 10.
Here's my test code:
import os
import os.path
import re

teststring = """Format= { Window_Type="Tabular", Tabular= { Num_row_labels=10 } }"""
fname = r"E:\MyUsers\ssbc\test.mrk"
fo = open(fname, "r")
match = re.search(r'Num_row_labels=(\d+)', teststring)
tnum = int(match.group(1))
if tnum >= 10:
    print(fname)
How do I make sure that I'm searching for the match in the content of the opened file, and then checking the condition tnum >= 10? My test code would print the file name based only on the last four lines, which search teststring rather than the file. I want to be sure the search covers the entire content of my text file.
What you want to do is read out the whole file as a string, and search for your pattern in that string:
with open(fname, "r") as fo:
    content_as_string = fo.read()
match = re.search(r'Num_row_labels=(\d+)', content_as_string)
# do what you want with the matches
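Putting the pieces together, here is a sketch that scans every .mrk file in a folder and prints the names of those whose Num_row_labels is at least 10. The folder glob follows the path in the question and is an assumption; adjust it to your setup.

```python
import re
from glob import glob

def files_over_threshold(folder_glob, threshold=10):
    """Return the files whose Num_row_labels value is >= threshold."""
    hits = []
    for fname in glob(folder_glob):
        with open(fname, "r") as fo:
            match = re.search(r"Num_row_labels=(\d+)", fo.read())
        if match and int(match.group(1)) >= threshold:
            hits.append(fname)
    return hits

# Hypothetical folder, following the path in the question; adjust as needed.
for fname in files_over_threshold(r"E:\MyUsers\ssbc\*.mrk"):
    print(fname)
```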
Python code to read file content based on condition
file = '../input/testtxt/kaggle.txt'
output = []
with open(file, 'r') as fp:
    lines = fp.readlines()
    for i in lines:
        if 'Image for' in i:
            output.append(i)
print(output)