Comparing .txt files in Python using askdirectory


I have the code below, which runs through a directory, selects all of the files, and compares them to a chosen wordlist file. However, I get the following error: TypeError: invalid file: ['C:/Users/Nathan/Desktop/chats\\(1,).out']. I cannot figure out how to change the os.path.join call so that it produces the correct file location.
self.wordopp = askdirectory(title="Select chat log directory")
path = self.wordopp
files = os.listdir(path)
paths = []
wordlist = self.wordop
for file in files:
    paths.append(os.path.join(path, file))
    f = open(wordlist)
    l = set(w.strip().lower() for w in f)
    with open(paths) as f:
        found = False
        file = open("out.txt", "w")
        for line in paths:
            line = line.lower()
            if any(w in line for w in l):
                found = True
                file.write(line)
                print(line)
            if not found:
                print(line)

Consider this line of code:
with open(paths) as f:
Ask yourself, "what is paths"? It is a list of filenames, not a single file. That's pretty much what the error is telling you: that a list is an invalid file.
Considering that you are looping over a list of filenames, my guess is that your intention is to do:
with open(os.path.join(path, file)) as f:
or, equivalently (since the full path was just appended to paths):
with open(paths[-1]) as f:
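Fixing the open() call is not quite enough on its own: the loop also re-opens out.txt with "w" on every iteration (wiping earlier matches), and `for line in paths` iterates over file names rather than over the lines of the opened file. A corrected version might look like the sketch below. It assumes the directory and the wordlist path are already known (for example the values held in self.wordopp and self.wordop); the function name find_matching_lines is hypothetical, and the tkinter dialog is left out so the function can be tested on its own.

```python
import os


def find_matching_lines(directory, wordlist_path, out_path):
    """Write every line (from every file in `directory`) that contains a word
    from the wordlist to `out_path`, and return the matching lines."""
    # Load the wordlist once, as a set of lowercase words (blank lines skipped)
    with open(wordlist_path) as f:
        words = {w.strip().lower() for w in f if w.strip()}

    matches = []
    # Open the output file once, outside the loop, so matches accumulate
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(directory)):
            file_path = os.path.join(directory, name)  # one file at a time
            if not os.path.isfile(file_path):
                continue  # skip subdirectories
            with open(file_path) as chat:
                for line in chat:  # lines of the file, not entries of `paths`
                    if any(w in line.lower() for w in words):
                        out.write(line)
                        matches.append(line)
    return matches
```

Usage would be along the lines of `find_matching_lines(self.wordopp, self.wordop, "out.txt")`; keep out.txt outside the chat directory so it is not scanned itself.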

Related

Go through files in given directory with python, read each file line by line and remove first and last string in the line and save updated file

I have some .txt files inside a directory. Each .txt file contains some paths like:
'C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module.c'
'C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module2.c'
'C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module3.c'
I need a small function that goes through each line of each file in the directory and removes the ' characters, so that only the bare path is left, like:
C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module.c
C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module2.c
C:\d\folder\project\folder\Folder1\Folder2\Folder3\Module3.c
My code at the moment is:
for filename in files:
    with open(filename, 'r') as file:
        content = file.read().split('\n')
        for line in content:
            if line.startswith('')and line.endswith(''):
                remove('')
Please assist!
SOLUTION:
I managed to find a solution with a slightly different approach:
for filename in files:
    with open(filename, 'rt') as f:
        filedata = f.read()
    filedata = filedata.replace("'", "")
    with open(filename, 'wt') as f:
        f.write(filedata)
Thanks!

Python lets you delimit a string with either ' or ", so you can put a single quote inside double quotes and split on it. Because each line starts with a quote, the first element of the split is the empty string '' and the second element is your path:
line.split("'")[1]
Edit: if I understood you correctly, you want this:
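for filename in files:
    paths = []
    with open(filename, 'r') as file:
        content = file.read().split('\n')
    for line in content:
        if "'" in line:  # skip blank lines (e.g. the one after the final newline)
            paths.append(line.split("'")[1])
    # "with" closes the file automatically, so no file.close() is needed
    with open(filename, 'w') as file:
        # writelines() adds no separators, so join the paths with newlines
        file.write('\n'.join(paths) + '\n')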
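One caveat with both approaches: `files` must hold full paths (os.listdir returns bare names), and replace("'", "") also deletes apostrophes that belong to the path itself. A minimal sketch that strips only the quotes wrapping each line, assuming one path per line; the function name strip_quotes_in_dir and its suffix parameter are hypothetical:

```python
import os


def strip_quotes_in_dir(directory, suffix=".txt"):
    """Rewrite every `suffix` file in `directory`, removing the single quotes
    that wrap each line (quotes inside the path are left alone)."""
    for name in os.listdir(directory):
        if not name.endswith(suffix):
            continue
        path = os.path.join(directory, name)  # listdir returns bare names
        with open(path) as f:
            lines = f.read().splitlines()
        # strip("'") only removes quote characters at the ends of each line
        cleaned = [line.strip().strip("'") for line in lines]
        with open(path, "w") as f:
            f.write("".join(line + "\n" for line in cleaned))
```

For example, a line 'C:\d\O'Brien\Module2.c' becomes C:\d\O'Brien\Module2.c, keeping the inner apostrophe.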


How to add for loop in python?

I'm using Python to create new files from the existing ones in the mdp folder by changing a couple of lines in each file. I need to do this for 1000 files. Can anyone suggest a for loop that reads all the files, changes them, and creates the new ones in one go?
With my current code I have to change the number after 'md_' in the path by hand, which is tedious because there are 1000 files.
I tried using str() but got a 'could not read file' error.
fin = open("/home/abc/xyz/mdp/md_1.mdp", "rt")
fout = open("/home/abc/xyz/middle/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('integrator = md', 'integrator = md-vv'))
fin.close()
fout.close()  # close (flush) the middle file before reading it back in
fin = open("/home/abc/xyz/middle/md_1.mdp", "rt")
fout = open("/home/abc/xyz/mdb/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('dt = 0.001', 'dt = -0.001'))
fin.close()
fout.close()

os.listdir(path) is your friend:
import os
sourcedir = "/home/abc/xyz/mdp"
destdir = "/home/abc/xyz/middle"
for filename in os.listdir(sourcedir):
    if not filename.endswith(".mdp"):
        continue
    source = os.path.join(sourcedir, filename)
    dest = os.path.join(destdir, filename)
    # with open(xxx) as varname makes sure the file(s)
    # will be closed whatever happens in the 'with' block
    # NB text mode is the default, and so is read mode
    with open(source) as fin, open(dest, "w") as fout:
        # python files are iterable... avoids reading
        # the whole file in memory at once
        for line in fin:
            # will only work for those exact strings,
            # you may want to use regexps if number of
            # whitespaces vary etc
            line = line.replace("dt = 0.001", "dt = -0.001")
            line = line.replace(
                'integrator = md',
                'integrator = md-vv'
            )
            fout.write(line)

Assuming you want to edit all files located in the mdp folder in place, you could do something like this:
import os
directory = "/home/abc/xyz/mdp/"
for filename in os.listdir(directory):
    with open(os.path.join(directory, filename), "r+") as file:
        text = file.read()
        text = text.replace("dt = 0.001", "dt = -0.001")
        file.seek(0)
        file.write(text)
        file.truncate()
This goes through every file and changes it in place using str.replace().
If there are other files in the mdp folder that you do not want to edit, you could use an if statement to check the file name. Wrap the with open statement in something like this:
if filename.startswith("md_"):
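If the files really are named md_1.mdp through md_1000.mdp (an assumption based on the path in the question), the file name can also be built from the loop counter, which is probably what the str() attempt was aiming for. Doing both replacements in one pass also removes the need for the intermediate middle folder. A minimal sketch; the function name convert_numbered is hypothetical:

```python
import os


def convert_numbered(src_dir, dest_dir, count):
    """Create dest_dir/md_N.mdp from src_dir/md_N.mdp for N = 1..count,
    applying both edits in a single pass."""
    os.makedirs(dest_dir, exist_ok=True)
    for n in range(1, count + 1):
        name = f"md_{n}.mdp"  # same as "md_" + str(n) + ".mdp"
        src = os.path.join(src_dir, name)
        dest = os.path.join(dest_dir, name)
        with open(src) as fin, open(dest, "w") as fout:
            for line in fin:
                line = line.replace("integrator = md", "integrator = md-vv")
                line = line.replace("dt = 0.001", "dt = -0.001")
                fout.write(line)
```

Called as convert_numbered("/home/abc/xyz/mdp", "/home/abc/xyz/mdb", 1000), it would raise FileNotFoundError at the first missing number, so the os.listdir loop is the safer choice if the numbering has gaps.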

How to loop through each file in a folder, do some action to the file and save output to a file in another folder Python

I have a folder with multiple files like so:
1980
1981
1982
Each of these files contains some text. I want to loop through the files, perform some operation on each one, save the edited file to another folder, and then move on to the next file. The result would be the original folder plus another folder containing the edited version of each file, like so:
1980_filtered
1981_filtered
1982_filtered
Is it possible to do this?
Currently I have some code that loops through the files in a folder, does some filtering to each file and then saves all the edits of each file into one massive file. Here is my code:
import os
input_location = 'C:/Users/User/Desktop/mini_mouse'
output_location = 'C:/Users/User/Desktop/filter_mini_mouse/mouse'
for root, dir, files in os.walk(input_location):
    for file in files:
        os.chdir(input_location)
        with open(file, 'r') as f, open('NLTK-stop-word-list', 'r') as f2:
            mouse_file = f.read().split() # reads file and splits it into a list
            stopwords = f2.read().split()
            x = (' '.join(i for i in mouse_file if i.lower() not in (x.lower() for x in stopwords)))
            with open(output_location, 'a') as output_file:
                output_file.write(x)
Any help would be greatly appreciated!

You need to specify what each new file is called. Python has good string formatting methods for this, and your desired file names are easy to build in a loop:
import os
input_location = 'C:/Users/User/Desktop/mini_mouse'
output_location = 'C:/Users/User/Desktop/filter_mini_mouse/mouse'
for root, dir, files in os.walk(input_location):
    for file in files:
        new_file = "{}_filtered.txt".format(file)
        os.chdir(input_location)
        with open(file, 'r') as f, open('NLTK-stop-word-list', 'r') as f2:
            mouse_file = f.read().split()
            stopwords = f2.read().split()
            x = (' '.join(i for i in mouse_file if i.lower() not in (x.lower() for x in stopwords)))
            with open(output_location+'/'+new_file, 'w') as output_file: # Changed 'append' to 'write'
                output_file.write(x)
If you're using Python 3.6 or later, you can use f-strings:
new_file = f"{file}_filtered.txt"
and
with open(f"{output_location}/{new_file}", 'w') as output_file:
    output_file.write(x)

First, you should read the NLTK-stop-word-list only once, so I moved it outside your loops. Second, os.chdir() is unnecessary: you can use os.path.join() to build the current file path (and the new file path):
import os
input_location = 'C:/Users/User/Desktop/mini_mouse'
output_location = 'C:/Users/User/Desktop/filter_mini_mouse/'
stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
# Read the stop words once, before the loops: a file object can only be
# read() through once, so reading it per file would return '' after the first
with open(stop_words_path, 'r') as stop_words:
    stopwords = set(w.lower() for w in stop_words.read().split())
for root, dirs, files in os.walk(input_location):
    for name in files:
        file_path = os.path.join(root, name)
        with open(file_path, 'r') as f:
            mouse_file = f.read().split()  # reads file and splits it into a list
        x = ' '.join(i for i in mouse_file if i.lower() not in stopwords)
        new_file_path = os.path.join(output_location, name) + '_filtered'
        with open(new_file_path, 'w') as output_file:
            output_file.write(x)
P.S. I took the liberty of renaming some of your variables because they shadowed Python built-in names: dir (and file, which is a built-in in Python 2). Running import builtins; print(dir(builtins)) in Python 3 lists the built-in names.
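The same idea can be written with pathlib, which builds the output name (e.g. 1980_filtered) without any string concatenation. A minimal sketch, assuming the input files sit directly in the input folder (unlike os.walk, it does not recurse into subfolders); the function name filter_folder is hypothetical:

```python
from pathlib import Path


def filter_folder(input_dir, output_dir, stopwords_path):
    """Write <output_dir>/<name>_filtered for every file in input_dir,
    with stop words removed (case-insensitive)."""
    # Build the lowercase stop-word set once; set lookup is a constant-time check
    stopwords = {w.lower() for w in Path(stopwords_path).read_text().split()}
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file():
            continue
        words = src.read_text().split()
        kept = " ".join(w for w in words if w.lower() not in stopwords)
        (out / f"{src.name}_filtered").write_text(kept)
```

Lowercasing the stop words once into a set avoids re-lowering the whole stop-word list for every word, as the generator expression inside the original join does.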

Python, check if multiple files are empty-print results to separate file

I have a list of files, and I need to check whether each one is empty.
If a file is non-empty, print its name and contents; otherwise do nothing.
For example: 1.html contains a, 2.html contains b, and 3.html is empty.
I need to create a resulting file with the contents of both non-empty files:
output.txt:
1.html
a
2.html
b
I have this code:
import os
files = ["1.html", "2.html", "3.html"];
for i in range(len(files)):
    with open(files) as file:
        first = file.read(1)
        if not first:
            print('') #nothing to print
        else:
            print file #print file name
            print file.read() #print file content
and I'm getting:
with open(files) as file:
TypeError: coercing to Unicode: need string or buffer, list found

You're overcomplicating it: just load the file contents up front, print them if there's something there, and ignore the file if not:
files = ["1.html", "2.html", "3.html"]
for filename in files:
    with open(filename, "r") as f:
        contents = f.read()
        if contents:
            print(filename)
            print(contents)

for file in files:
    with open(file) as fin:
        contents = fin.read()  # read once; a second read() would return ''
        if contents:
            print(file)
            print(contents)

Your with statement fails because you are opening the whole list (files) rather than files[i]. A better way to handle this is to iterate over the file names directly:
files = ["1.html", "2.html", "3.html"]
for f in files:
    with open(f) as file:
        first = file.read(1)
        if first:  # an empty file returns '' here, so nothing is printed
            print(f)  # print file name
            # read(1) already consumed the first character, so add it back
            print(first + file.read())  # print file content
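The answers above print to the screen, while the question asks for a resulting output.txt. A minimal sketch that writes each non-empty file's name and contents to an output file, in the order given; the function name write_nonempty is hypothetical, and os.path.getsize(filename) > 0 would be an alternative emptiness check that avoids reading the file:

```python
def write_nonempty(files, out_path):
    """Write the name and contents of every non-empty file in `files` to
    out_path; empty files are skipped. Returns the names that were written."""
    written = []
    with open(out_path, "w") as out:
        for filename in files:
            with open(filename) as f:
                contents = f.read()
            if contents:  # an empty file reads as ""
                out.write(filename + "\n")
                # make sure the next file name starts on its own line
                out.write(contents if contents.endswith("\n") else contents + "\n")
                written.append(filename)
    return written
```

For the example in the question, write_nonempty(["1.html", "2.html", "3.html"], "output.txt") produces the four-line output.txt shown above.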

Write a file to a new file with additional text based on criteria within the source file in Python

I am trying to write a Python script that opens a few text files and reads their contents; every time it finds a word from a list, it should block that word out with new text and then write the result to a new file, one output file per source file.
Right now my script (below) writes all of the source files into a single file, but I am not sure how to get a new file for every source file.
import os
KeyWords=["Magic","harry","wand"]
rootdir = "C:\\books"
fileslist = []
##blanks file and preps for new data
fileout = open(rootdir+"\\output\\newfile.txt","w")
print (fileout)
fileout.write("Start of file\n\nLocation of output: "+rootdir+"\\output \n\nFiles that are being Processed:\n\n")
fileout.close()
def sourcelist(fileslist):
    file=open(fileslist,"r")
    fileout=open(rootdir+"\\output\\newfile.txt", "a")
    for line in file:
        if any(word.lower() in line.lower() for word in KeyWords):
            print("Word Found\n\n" + '\t'+line + "\nEnd\n")
            fileout.write("<<<SEARCH TERM FOUND>>>\n\n" + '\t'+line + "\n<<<END OF BLOCK>>>\n")
        else:
            #print('\t'+line) #No need to print the lines with no Key words in
            fileout.write('\t'+line)
    #return #not sure what return does?
for root, dirs, files in os.walk(rootdir):
    dirs.clear()
    for file in files:
        filepath = root + os.sep + file
        if filepath.endswith(".txt"):
            fileslist.append(filepath)
for path in fileslist:
    sourcelist(path)
print("\n".join(fileslist))
with open(rootdir+"\\output\\newfile.txt","a") as output:
    output.write("\n".join(fileslist)+"\n\n\n")
    output.close()

This is a bit tough to answer as a whole, but here's a general approach.
I have the following file structure:
hp_extracts:  # directory
    hp_parser.py
    -- inps/
        -- harry_1.txt
        -- harry_2.txt
    -- outs/
        <nothing>
Contents of inps/harry_1.txt:
When Harry got his wand it was Magic
something something magic something
something harry something
Contents of inps/harry_2.txt:
magic something something
something
harry something something
This is the contents of hp_parser.py:
import os
all_files = os.listdir('inps/')
keywords=["magic","harry","wand"]
for file in all_files:
    with open('inps/{}'.format(file)) as infile, open('outs/{}'.format(file), 'w') as outfile:
        for line in infile:
            #print(line)
            # set() so a keyword that occurs twice in a line is only wrapped once
            for word in set(line.split()):
                if word.lower() in keywords:
                    line = line.replace(word, '<<<SEARCH TERM FOUND>>> {} <<<END OF BLOCK>>>'.format(word))
            outfile.write(line)
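Splitting on whitespace misses keywords that touch punctuation ("magic," is not equal to "magic"). A regular expression with word boundaries handles that and matches case-insensitively in one pass. A minimal sketch of that swapped-in technique, assuming the same inps/outs layout and markers as above; the function name mark_keywords is hypothetical:

```python
import re
from pathlib import Path


def mark_keywords(in_dir, out_dir, keywords):
    """Copy every .txt file from in_dir to out_dir (same name), wrapping each
    whole-word, case-insensitive keyword match in markers."""
    # \b is a word boundary, so "wand," matches but "wandering" does not
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in Path(in_dir).glob("*.txt"):
        text = src.read_text()
        # \1 re-inserts the matched word with its original capitalisation
        marked = pattern.sub(r"<<<SEARCH TERM FOUND>>> \1 <<<END OF BLOCK>>>", text)
        (out / src.name).write_text(marked)
```

Calling mark_keywords("inps", "outs", ["magic", "harry", "wand"]) from the hp_extracts folder writes one marked-up file per input file.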
