I have several txt files that I would like to search, printing a single line from each if it starts with certain words (<meta property="og:description" content=). I currently have this code, which should search each file in the specified folder:
import glob

filepath = '/Volumes/hardDrive/Folder/files/*'
corpus = glob.glob(filepath)
for textfile in corpus:
    f = open(textfile, 'r')
    pTxt = []
    for ln in f:
        if ln.startswith(r'<meta property="og:description" content='):
            pTxt.append(ln[2:])
    print(pTxt)
Right now it’s returning [] (without stopping) when it shouldn’t be, and it does the same when I shorten the target text to “<meta” (which should match several more lines). How can I fix this so that it only prints the target line from each file?
import glob

filepath = '/home/filepath/*.txt'
for textfile in glob.glob(filepath):
    pTxt = []
    with open(textfile, 'r') as f:  # closes the file when the block ends
        for ln in f:
            if ln.startswith('<meta property="og:description" content='):
                pTxt.append(ln)
    print(pTxt)
This should work fine.
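If it still returns [], one likely cause is that meta tags in saved HTML are usually indented, so the line actually starts with whitespace rather than <. A minimal sketch that tolerates leading whitespace (the glob pattern is the same placeholder as above):

import glob

for textfile in glob.glob('/home/filepath/*.txt'):
    with open(textfile, 'r') as f:
        # lstrip() drops leading whitespace so startswith() can match
        matches = [ln for ln in f
                   if ln.lstrip().startswith('<meta property="og:description" content=')]
    print(matches)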
I have a directory of 50 txt files. I want to combine the contents of each file into a Python list.
Each file looks like:
line1
line2
line3
I am putting the file paths into a list with this code; I just need to loop through file_list and append the content of each txt file to another list.
from pathlib import Path

def searching_all_files(dirpath):
    assert dirpath.is_dir()
    file_list = []
    for x in dirpath.iterdir():
        if x.is_file():
            file_list.append(x)
        elif x.is_dir():
            # recurse into subdirectories
            file_list.extend(searching_all_files(x))
    return file_list

file_list = searching_all_files(Path(r'C:\num'))
But I am unsure of the best method.
Maybe loop something close to this?
Note: this is not real code, just a thought pulled from the air. The question isn't how to fix this; I am only showing it as an idea. All methods welcome.
file_path = Path(r'.....')
with open(file_path) as f:
    source_path = f.read().splitlines()
    source_nospaces = [x.strip(' ') for x in source_path]
    return source_nospaces
You could make use of Path.rglob to search for all files in a directory recursively, and readlines() to append the contents to a list:
from pathlib import Path

files = Path('/tmp/text').rglob('*.txt')
res = []
for file in files:
    res += open(file).readlines()
print(res)
Out:
['file_content2\n', 'file_content3\n', 'file_content1\n']
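If, as in the pseudocode in the question, you also want the surrounding whitespace stripped from each line, a small variation of the same idea (the /tmp/text directory is the same assumption as above):

from pathlib import Path

res = []
for file in Path('/tmp/text').rglob('*.txt'):
    with open(file) as f:
        # strip() removes the trailing newline along with surrounding spaces
        res += [line.strip() for line in f]
print(res)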
I have multiple text files containing different text.
They all contain a single appearance of the same 2 lines I am interested in:
================================================================
Result: XX/100
I am trying to write a script to collect all those XX values (numerical values between 0 and 100), and paste them in a CSV file with the text file name in column A and the numerical value in column B.
I have considered using Python or PowerShell for this purpose.
How can I identify the line where "Result" appears directly under the "===..." separator, collect its content up to '\n', and then strip off "Result: " and "/100"?
"Result" and other numerical values can appear elsewhere in the files, but never in the quoted format directly below "=====", like the line I'm interested in.
Thank you!
Edit: I have written this poor naive attempt to collect the numerical values.
import os

dir_path = os.path.dirname(os.path.realpath(__file__))
for filename in os.listdir(dir_path):
    if filename.endswith(".txt"):
        with open(filename, "r") as f:
            lineFound = False
            for index, line in enumerate(f):
                if lineFound:
                    line = line.replace("Result: ", "")
                    line = line.replace("/100", "")
                    grade = line.strip()  # strip() returns a new string
                    lineFound = False
                    print(grade, end='')
                    continue
                if index > 3:
                    if "================================================================" in line:
                        lineFound = True
I'd still be happy to learn if there's a simple way to do this with PowerShell tbh
For the output, I used csv writer to append the results to a file one by one.
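In case it's useful, a minimal sketch of that csv.writer approach, appending one row per result (the file and column values here are placeholders, not from the original script):

import csv

def append_result(csv_path, filename, grade):
    # 'a' appends one row per call; newline='' avoids blank lines on Windows
    with open(csv_path, 'a', newline='') as out:
        csv.writer(out).writerow([filename, grade])

append_result('results.csv', 'example.txt', '87')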
So there are two steps involved here: the first is to get a list of files. There are a ton of answers for that one on Stack Overflow, but this one is stupidly complete.
Once you have the list of files, you can simply just load the files themselves one by one, and then do some simple string.split() to get the value you want.
Finally, write the results into a CSV file. Since the CSV file is a simple one, you don't need to use the CSV library for this.
See the code example below. Note that I copied/pasted the function for generating the list of files from my personal github repo. I reuse that one a lot.
import os

def get_files_from_path(path: str = ".", ext: str or list = None) -> list:
    """Find files in path and return them as a list.

    Gets all files in folders and subfolders.

    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296

    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.

    Returns:
        list: list of file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = f"{subdir}{os.sep}{fname}"
            if ext is None:
                result.append(filepath)
            elif type(ext) == str and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif type(ext) == list:
                for item in ext:
                    if fname.lower().endswith(item.lower()):
                        result.append(filepath)
    return result
filelist = get_files_from_path("path/to/files/", ext=".txt")

split1 = "================================================================\nResult: "
split2 = "/100"

with open("output.csv", "w") as outfile:
    outfile.write('filename, value\n')
    for filename in filelist:
        with open(filename) as infile:
            value = infile.read().split(split1)[1].split(split2)[0]
        print(value)
        outfile.write(f'"{filename}", {value}\n')
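One caveat: split(split1)[1] raises an IndexError for any file that doesn't contain the separator followed by a Result line. A hedged variant of the inner loop that skips such files, reusing filelist, split1, split2, and the open outfile from above:

for filename in filelist:
    with open(filename) as infile:
        parts = infile.read().split(split1)
    if len(parts) < 2:
        continue  # no separator + "Result: " in this file; skip it
    value = parts[1].split(split2)[0]
    outfile.write(f'"{filename}", {value}\n')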
You could try this.
In this example the filename written to the CSV will be its full (absolute) path. You may just want the base filename.
Uses the same, albeit seemingly unnecessary, mechanism for deriving the source directory. It would be unusual to have your Python script in the same directory as your data.
import os
import glob

equals = '=' * 64
dir_path = os.path.dirname(os.path.realpath(__file__))
outfile = os.path.join(dir_path, 'foo.csv')

with open(outfile, 'w') as csv:
    print('A,B', file=csv)
    for file in glob.glob(os.path.join(dir_path, '*.txt')):
        prev = ''  # empty string avoids an AttributeError on the first line
        with open(file) as indata:
            for line in indata:
                t = line.split()
                if len(t) == 2 and t[0] == 'Result:' and prev.startswith(equals):
                    v = t[1].split('/')
                    if len(v) == 2 and v[1] == '100':
                        print(f'{file},{v[0]}', file=csv)
                    break
                prev = line
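For comparison, an alternative sketch that finds the value with a single regular expression; the 64 "=" characters and the "Result: XX/100" format are taken from the question:

import re

SEP = '=' * 64
# capture the digits between "Result: " and "/100" on the line
# immediately after the separator
pattern = re.compile(re.escape(SEP) + r'\nResult: (\d+)/100')

def extract_grade(text):
    m = pattern.search(text)
    return m.group(1) if m else None

print(extract_grade(SEP + '\nResult: 87/100'))  # prints 87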
I have 100 text files in a folder that I want to load into a list.
I was only able to load one file. How can I load all the files?
Here is what I did
with open('varmodel/varmodel_2.var') as f:
    varmodel_2 = f.read()
print(varmodel_2)
However, instead of just file 2, I have files numbered from 1 to 100.
You can use the glob module to do just that. It gives you a list of all files/folders in a directory. Here is the code you would use to get a string containing all of the file information:
import glob

string = ""
for filename in glob.glob("*"):
    with open(filename, "r") as f:
        string += f.read()
print(string)
import glob

all_files = []
for path in glob.glob('varmodel/*'):
    with open(path) as f:
        varmodel = f.read()
        # not sure about the txt file content;
        # it may need preprocessing before being put in the list
        all_files.append(varmodel)
print(all_files)
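Since the files follow a numbered pattern (varmodel_1.var through varmodel_100.var, going by the question), you could also skip glob and build the names directly; a sketch assuming that exact naming:

varmodels = []
for i in range(1, 101):
    # builds varmodel/varmodel_1.var ... varmodel/varmodel_100.var
    with open(f'varmodel/varmodel_{i}.var') as f:
        varmodels.append(f.read())
print(len(varmodels))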
I am new to Python. I am looking for the total number of occurrences of a text string across all the text files in a given folder.
def errors():
    errors = 0
    file = open("\\d:\\myfolder\\*.txt", "r")
    data = file.read()
    errors = data.count("errors")
    return errors

print("Errors:", errors)
Your code doesn't make any sense, but if I understand what you want to do, then here's some pseudo-code to get you going:
from glob import glob

text_file_paths = glob("\\d:\\myfolder\\*.txt")
error_counting = 0
for file_path in text_file_paths:
    with open(file_path, 'r') as f:
        all_file_lines = f.readlines()
    error_counting += sum([line.count('errors') for line in all_file_lines])
print(error_counting)
Does that help?
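If the per-line split isn't needed, str.count on the whole file contents works just as well; a sketch under the same path assumption as above:

from glob import glob

total = 0
for path in glob("\\d:\\myfolder\\*.txt"):
    with open(path, 'r') as f:
        # count() tallies non-overlapping occurrences across the whole file
        total += f.read().count('errors')
print("Errors:", total)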
I have more than 30 text files. I need to do some processing on each text file and save them again in text files with different names.
Example-1: precise_case_words.txt ---- processing ---- precise_case_sentences.txt
Example-2: random_case_words.txt ---- processing ---- random_case_sentences.txt
I need to do this for all the text files.
present code:
new_list = []
with open('precise_case_words.txt') as inputfile:
    for line in inputfile:
        new_list.append(line)

final = open('precise_case_sentences.txt', 'w+')
for item in new_list:
    final.write("%s\n" % item)
At the moment I am manually copying and pasting this code and changing the names every time. Please suggest a solution to avoid this manual work, using Python.
Suppose you have all your *_case_words.txt files in the current directory:
import glob

in_file = glob.glob('*_case_words.txt')
prefix = [i.split('_')[0] for i in in_file]
for i, ifile in enumerate(in_file):
    data = []
    with open(ifile, 'r') as f:
        for line in f:
            data.append(line)
    with open(prefix[i] + '_case_sentences.txt', 'w') as f:
        f.writelines(data)  # write() can't take a list; writelines() can
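Alternatively, you could derive the output name directly with str.replace instead of splitting on '_'; a sketch assuming every input name ends in _words.txt:

import glob

for ifile in glob.glob('*_case_words.txt'):
    out_name = ifile.replace('_words.txt', '_sentences.txt')
    with open(ifile) as src, open(out_name, 'w') as dst:
        dst.write(src.read())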
This should give you an idea about how to handle it:
def rename(name, suffix):
    """Renames a file with one . in it by splitting and inserting suffix before the ."""
    a, b = name.split('.')
    return ''.join([a, suffix, '.', b])  # recombine parts including suffix

def processFn(name):
    """Open file 'name', process it, save it under another name."""
    # scramble data by sorting each line and writing anew to the renamed file
    with open(name, "r") as r, open(rename(name, "_mang"), "w") as w:
        for line in r:
            scrambled = ''.join(sorted(line.strip("\n"))) + "\n"
            w.write(scrambled)

# list of filenames, see link below for how to get them with os.listdir()
names = ['fn1.txt', 'fn2.txt', 'fn3.txt']

# create demo data
for name in names:
    with open(name, "w") as w:
        for i in range(12):
            w.write("someword" + str(i) + "\n")

# process files
for name in names:
    processFn(name)
For file listings: see How do I list all files of a directory?
I chose to read/write line by line; you can also read a file in fully, process it, and write it out again in one block, to your liking (see the sketch after the example output below).
fn1.txt:
someword0
someword1
someword2
someword3
someword4
someword5
someword6
someword7
someword8
someword9
someword10
someword11
into fn1_mang.txt:
0demoorsw
1demoorsw
2demoorsw
3demoorsw
4demoorsw
5demoorsw
6demoorsw
7demoorsw
8demoorsw
9demoorsw
01demoorsw
11demoorsw
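For the block-at-once variant mentioned above, a sketch that reuses the same rename helper (assuming each file is small enough to hold in memory):

def processFnWhole(name):
    """Same scrambling as processFn, but reading the whole file at once."""
    with open(name, "r") as r:
        lines = r.read().splitlines()
    scrambled = [''.join(sorted(line)) for line in lines]
    with open(rename(name, "_mang"), "w") as w:
        w.write('\n'.join(scrambled) + '\n')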
I happened just today to be writing some code that does this.