python grep reverse matching

python grep reverse matching - python

I would like to build a small python script that basicaly does the reverse of grep.
I want to match the files in a directory/subdirectory that doesn't have a "searched_string".
So far i've done that:
import os
filefilter = ['java','.jsp']
path= "/home/patate/code/project"
for path, subdirs, files in os.walk(path):
for name in files:
if name[-4:] in filefilter :
print os.path.join(path, name)
This small script will be listing everyfiles with "java" or "jsp" extension inside each subdirectory, and will output them full path.
I'm now wondering how to do the rest, for example i would like to be able if I forgot a session management entry in one file (allowing anyone a direct file access), to search for :
"if (!user.hasPermission" and list the file which does not contain this string.
Any help would be greatly appreciated !
Thanks

To check if a file with a path bound to variable f contains a string bound to name s, simplest (and acceptable for most reasonably-sized files) is something like
with open(f) as fp:
if s in fp.read():
print '%s has the string' % f
else:
print '%s doesn't have the string' % f
In your os.walk loop, you have the root path and filename separately, so
f = os.path.join(path, name)
(what you're unconditionally printing) is the path you want to open and check.

Instead of printing file name call function that will check if file content do not match texts you want to have in source files. In such cases I use check_file() that looks like this:
WARNING_RX = (
(re.compile(r'if\s+\(!\s+user.hasPermission'), 'user.hasPermission'),
(re.compile(r'other regexp you want to have'), 'very important'),
)
def check_file(fn):
f = open(fn, 'r')
content = f.read()
f.close()
for rx, rx_desc in WARNING_RX:
if not rx.search(content):
print('%s: not found: %s' % (fn, rx_desc))

Related

how to search for a pattern in a file on other directory in python on windows?

I am writing a program where in I need to find a specific words in a file of another directory.
For ex: I am in C:\python\ directory, I want to search a pattern in a file which is located in D drive. D:\Sample_file\file.txt.
I have many such files in the sample_file directory,where in which I need to
find specific words in that file.
For example: In D:\Sample_file\file.txt I need to find "hello world" and
in D:\Sample_file\file1.txt I need to find "0 files not found" and in
D:\Sample_another\file3.txt I need to find "oh my god!!"
how do I do it.
I have tried os.relpath but couldn't achieve the desired result
import os
paths = 'D:/'
def dir_list_folder(paths):
for folderName in os.listdir(paths):
if (folderName.find('.') == -1):
folderPath = os.path.join(paths,folderName );
dir_list_folder(folderPath);
else:
print ('Files is :'+ folderName );
I expect to if it matches a pattern it should return true,else false

Here is example of for loop that you're looking for. It is expected that files is a list of full paths to your files.
patterns = ['hello world', '0 files found']
for fileName in files:
with open(fileName) as f:
allLines = {x.strip() for x in f.readlines()}
for pattern in patterns:
if pattern in allLines:
print('`%s` detected in %s' % (pattern, fileName))
# then you can return True
# and here False
Not 100% sure that this is the best approach, but I hope it helps you.

Search for file names using a text list

I am currently trying to search for a list of missing attachments that are scattered throughout our file share server.I have a txt file with a list of filenames I would like to find. Ideally at some point I would like to copy the file to another location when found. At this moment it doesn't seem to be working, because it returns no results and I have verified the files exist.
import os
source = open("Test.txt", "r")
dest = '//somewhere/somefolder'
path = '//somewhere/anotherfolder'
for line in source:
print 'Searching . . .'
print line
for root, dirs, files in os.walk(path):
for name in files:
if name in line:
# shutil.copy(os.path.join(root, name), dest)
print 'Found!'
print path, name
print 'Done';

Question/Comment: is there a way to ignore the path and only search for the filename?
For instance:
import os
fpath = "//somewhere/anotherfolder/filename.pdf"
fname = os.path.basename(fpath)
print('fname=%s' % fname)
Output:
fname=filename.pdf

Processing specific files in a directory in python

I wrote a small python program which processes all the files in a directory. I want to restrict that to include only JSON files in that directory. For example, the line fname in fileList: in the code snipped below should only enumerate files with the extension *.json
#Set the directory you want to start from
rootDir = '/home/jas_parts'
for dirName, subdirList, fileList in os.walk(rootDir):
print('Found directory: %s' % dirName)
for fname in fileList:
print('\t%s' % fname)
fname='jas_parts/'+fname
with open(fname, 'r+') as f:
json_data = json.load(f)
event = json_data['Total']
print(event)

Since your file name is string you can use the str.endswith method to check if it is json file.
if fname.endswith('.json'):
#do_something()

Just filter the names that you are interested in.
if fname[-5:] == '.json':
(of course, you can also use os.path.splitext, or re, doesn't really matter how you get to the extension)

Here's a general solution to the question: "How do I do X to all files with names matching some pattern under directory Y?"
#!python
from __future__ import print_function
import fnmatch, os, os.path
def files_under(directory, pattern):
'''Yield all files matching pattern under some directory
'''
for p, dnames, fnames in os.walk(directory):
for match in fnmatch.filter(fnames, pattern):
yield(os.path.join(p, match))
if __name__ == '__main__':
import sys
if len(sys.argv) < 3:
print('Must supply path and (quoted) pattern', file=sys.stderr)
sys.exit(1)
try:
for each in files_under(sys.argv[1], sys.argv[2]):
print(each)
except EnvironmentError, e:
print ('Error trying to walk tree: %s ' % e, file=sys.stderr)
sys.exit(2)
The function is files_under() and the rest is just a very simplistic wrapper around it to print the matching results.
It's also easy to extend this to handle multiple patterns and even, with a little extra work, to ensure that files with names matching multiple patterns are only yielded once each. But I'll leave these enhancements as an exercise to the student.

How To Include "part#+.rar" In "Extensions

So I'm using a script to split filenames into "name" and "extension" so I can then apply a bunch of rules to and play around with the "name" and have the script put everything back together at the end.
At the moment, I'm using:
import os, shutil, re
def rename_file (original_filename):
name, extension = os.path.splitext(original_filename)
name = re.sub(r"\'", r"", name) # etc...more of these...
new_filename = name + extension
try:
# moves files or directories (recursively)
shutil.move(original_filename, new_filename)
except shutil.Error:
print ("Couldn't rename file %(original_filename)s!" % locals())
[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]
My problem is that os.path.splitext() includes "the .part(s)" of the ".partX.rar" as part of the filename, whereas I'd like it to be included as part of the file extension.
How can I get the the script to do that (without having a list of "extensions" or a completely separate script for rar files)?
Thanks!

os.path.splitext does a reverse search for '.' and returns the first match it finds. So out of the box splitext will not do what you need. If you are just using it to tokenize file names I suggest that you parse the filename yourself by splitting on . taking the left side as the name and then rejoining the right side.
Here is one way to do it:
def split_name(file_name):
'''
Returns root_filename, 'middle tokens', and extension
'''
tokens = file_name.split('.')
return (tokens[0], ".".join(tokens[1:-1]), tokens[-1]) if len(tokens) >1 else file_name
file_name = 'this.is.a.txt'
split_name(file_name)
#result is:
>>>  ('this', 'is.a', 'txt')

Replace strings in files by Python, recursively in given directory and its subdirectories?

How can you replace a string match inside a file with the given replacement, recursively, inside a given directory and its subdirectories?
Pseudo-code:
import os
import re
from os.path import walk
for root, dirs, files in os.walk("/home/noa/Desktop/codes"):
for name in dirs:
re.search("dbname=noa user=noa", "dbname=masi user=masi")
// I am trying to replace here a given match in a file

Put all this code into a file called mass_replace. Under Linux or Mac OS X, you can do chmod +x mass_replace and then just run this. Under Windows, you can run it with python mass_replace followed by the appropriate arguments.
#!/usr/bin/python
import os
import re
import sys
# list of extensions to replace
DEFAULT_REPLACE_EXTENSIONS = None
# example: uncomment next line to only replace *.c, *.h, and/or *.txt
# DEFAULT_REPLACE_EXTENSIONS = (".c", ".h", ".txt")
def try_to_replace(fname, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
if replace_extensions:
return fname.lower().endswith(replace_extensions)
return True
def file_replace(fname, pat, s_after):
# first, see if the pattern is even in the file.
with open(fname) as f:
if not any(re.search(pat, line) for line in f):
return # pattern does not occur in file so we are done.
# pattern is in the file, so perform replace operation.
with open(fname) as f:
out_fname = fname + ".tmp"
out = open(out_fname, "w")
for line in f:
out.write(re.sub(pat, s_after, line))
out.close()
os.rename(out_fname, fname)
def mass_replace(dir_name, s_before, s_after, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
pat = re.compile(s_before)
for dirpath, dirnames, filenames in os.walk(dir_name):
for fname in filenames:
if try_to_replace(fname, replace_extensions):
fullname = os.path.join(dirpath, fname)
file_replace(fullname, pat, s_after)
if len(sys.argv) != 4:
u = "Usage: mass_replace <dir_name> <string_before> <string_after>\n"
sys.stderr.write(u)
sys.exit(1)
mass_replace(sys.argv[1], sys.argv[2], sys.argv[3])
EDIT: I have changed the above code from the original answer. There are several changes. First, mass_replace() now calls re.compile() to pre-compile the search pattern; second, to check what extension the file has, we now pass in a tuple of file extensions to .endswith() rather than calling .endswith() three times; third, it now uses the with statement available in recent versions of Python; and finally, file_replace() now checks to see if the pattern is found within the file, and doesn't rewrite the file if the pattern is not found. (The old version would rewrite every file, changing the timestamps even if the output file was identical to the input file; this was inelegant.)
EDIT: I changed this to default to replacing every file, but with one line you can edit to limit it to particular extensions. I think replacing every file is a more useful out-of-the-box default. This could be extended with a list of extensions or filenames not to touch, options to make it case insensitive, etc.
EDIT: In a comment, #asciimo pointed out a bug. I edited this to fix the bug. str.endswith() is documented to accept a tuple of strings to try, but not a list. Fixed. Also, I made a couple of the functions accept an optional argument to let you pass in a tuple of extensions; it should be pretty easy to modify this to accept a command-line argument to specify which extensions.

Do you really need regular expressions?
import os
def recursive_replace( root, pattern, replace )
for dir, subdirs, names in os.walk( root ):
for name in names:
path = os.path.join( dir, name )
text = open( path ).read()
if pattern in text:
open( path, 'w' ).write( text.replace( pattern, replace ) )

Of course, if you just want to get it done without coding it up, use find and xargs:
find /home/noa/Desktop/codes -type f -print0 | \
xargs -0 sed --in-place "s/dbname=noa user=noa/dbname=masi user=masi"
(And you could likely do this with find's -exec or something as well, but I prefer xargs.)

This is how I would find and replace strings in files using python. This is a simple little function that will recursively search a directories for a string and replace it with a string. You can also limit files with a certain file extension like the example below.
import os, fnmatch
def findReplace(directory, find, replace, filePattern):
for path, dirs, files in os.walk(os.path.abspath(directory)):
for filename in fnmatch.filter(files, filePattern):
filepath = os.path.join(path, filename)
with open(filepath) as f:
s = f.read()
s = s.replace(find, replace)
with open(filepath, "w") as f:
f.write(s)
This allows you to do something like:
findReplace("some_dir", "find this", "replace with this", "*.txt")

this should work:
import re, os
import fnmatch
for path, dirs, files in os.walk(os.path.abspath(directory)):
for filename in fnmatch.filter(files, filePattern):
filepath = os.path.join(path, filename)
with open("namelist.wps", 'a') as out:
with open("namelist.wps", 'r') as readf:
for line in readf:
line = re.sub(r"dbname=noa user=noa", "dbname=masi user=masi", line)
out.write(line)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

python grep reverse matching - python

Related

how to search for a pattern in a file on other directory in python on windows?

Search for file names using a text list

Processing specific files in a directory in python

How To Include "part#+.rar" In "Extensions

Replace strings in files by Python, recursively in given directory and its subdirectories?

Categories

Resources