I would like to search for a word in all the files in a directory.
Eg.:
I have a folder "Test Directory" and in it I have 5 files
TestFile1.txt ... TestFile5.txt
Let's say only one of them contains a specific word Test written inside it. How can I search through all of them until I find the one with the word?
If you need to search repeatedly, consider a full-text search engine such as Elasticsearch and query it from Python. If this is a one-time task, the code below is enough.
import glob

for file in glob.glob("/folder/path/*.txt"):
    # Print the path of each .txt file whose contents include the word
    if 'word' in open(file).read():
        print(file)
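As a hedged variation on the loop above, the sketch below wraps the same idea in a function that closes each file and ignores bytes it cannot decode. The function name `find_files_containing`, the `pattern` parameter, and the UTF-8 encoding are assumptions made for illustration, not part of the original answer.

```python
import glob
import os


def find_files_containing(directory, word, pattern="*.txt"):
    """Return the paths in `directory` matching `pattern` whose text contains `word`."""
    matches = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # errors="ignore" drops bytes that are not valid UTF-8 instead of raising
        with open(path, encoding="utf-8", errors="ignore") as handle:
            if word in handle.read():
                matches.append(path)
    return matches
```

Calling `find_files_containing("Test Directory", "Test")` would return a list of matching paths, which is empty when no file contains the word.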
I am trying to run a search query in Box root folder to find folder names that contain a particular string. However I only want the folders that are 1 level below (similar to a ls command). However get_items() will return folders matching the string even deeper down.
For example if I search for "AA" in the below folder structure it should only return Folder1AA, Folder2AA and Folder3AA and not Folder4AA and Folder5AA :
StartingFolder
Folder1AA
File1B
Folder4AA
Folder1C
File1D
Folder2AA
Folder5AA
File1C
Folder2B
File1D
Folder3AA
File1B
Any ideas on how to do that ?
I have a directory that creates a new subfolder each day, each subfolder's name always starts with the date it was created (i.e. MMDDYY). I need to prompt the user for the date of the file they need (something they'd already have) and search for a subfolder that has a matching prefix in the name. The rest of the folder name can be ignored.
If a folder with the correct prefix is found, there will be a similar prompt to locate files in that folder whose names start with a 5-digit number that the user would also have. Those files just need to be copied to a new location. I'm stuck on how to locate a subfolder when I only have the prefix of the folder name, and likewise on how to locate the files inside that folder once it's found.
For example, I'm looking for a file that generated on 1/10/2019, the file name starts with 42333. The full folder name would be something like 01102019CHA71H2HBMNN. There would be two files that are found, one with a full file name that might be 42333aaabc.xrf and the other would be 42333aaabc with no file extension. These file names could exist in multiple other folders but usually I need them for specific dates.
If I understood correctly, you need an algorithm that takes a prefix (a string) as input.
In Python you can make "membership" tests with strings, for example:
>>> string = "A long string"
>>> "long" in string
True
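Your algorithm would work with something like:
"If {prefix as string} in {directory/file name as string}:
    do something"
Note that "in" matches the prefix anywhere in the name. To match only at the start of the name, use the string method startswith() instead.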
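The prefix idea above could be sketched as follows for the dated-folder case in the question. The helper names `find_by_prefix` and `copy_matching_files`, and all of their parameters, are hypothetical and chosen only for this example. It uses `str.startswith` rather than `in`, so a match elsewhere in the name is not counted.

```python
import os
import shutil


def find_by_prefix(parent, prefix, want_dirs):
    """Return sorted names in `parent` that start with `prefix`.

    want_dirs=True keeps only subfolders; want_dirs=False keeps only files.
    """
    names = []
    for entry in os.scandir(parent):
        is_wanted_kind = entry.is_dir() if want_dirs else entry.is_file()
        if is_wanted_kind and entry.name.startswith(prefix):
            names.append(entry.name)
    return sorted(names)


def copy_matching_files(root, date_prefix, file_prefix, destination):
    """Copy files starting with `file_prefix` from each subfolder starting with `date_prefix`."""
    copied = []
    os.makedirs(destination, exist_ok=True)
    for folder in find_by_prefix(root, date_prefix, want_dirs=True):
        folder_path = os.path.join(root, folder)
        for name in find_by_prefix(folder_path, file_prefix, want_dirs=False):
            # copy2 also preserves timestamps where the platform allows it
            shutil.copy2(os.path.join(folder_path, name), destination)
            copied.append(name)
    return copied
```

The two prefixes could come from `input()` prompts, for example `copy_matching_files(root, input("Date prefix: "), input("File prefix: "), destination)`.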
But if your question is how to list the files inside a directory, you can do this with either of two libraries:
os
subprocess (by calling "ls" on Linux or "dir" on Windows)
You could also use the re library, which handles regular expressions. It's a bit more complex but far more flexible.
Good source for debugging RegEx: https://regexr.com/
For learning RegEx in Python: https://www.w3schools.com/python/python_regex.asp
Best wishes, pal
Hello fellow Pythonistas,
I have a script which searches through all files contained within a single directory for a 'string' keyword. If it finds the 'string' keyword within any of the files, it will print the name of this file to the IDLE command screen. It seems to work quite well. The inputs are gathered by the program using user prompts. Typically I am searching for a single word within a large series of text files.
HOWEVER - Now I want to build on this in two ways
1) I want to modify the code so that it can also search through all the files contained within sub-folders of the specified directory.
2) I would also like to specify that the searches are limited to a certain type of file extension such as .txt.
Can anyone provide some guidance on either of these two enhancements ???
I am using Python 3 and am VERY new to Python (I just started playing with it 2 weeks ago in an attempt to automate some boring searches through my employer's folder structures).
Much appreciated to anyone who can provide some help.
Cheers,
Fraz
# This script will search through all files within a single directory for a single key word
# If the script finds the word within any of the files it will print the name of this file to the command line
import os

print('When answering questions, do not add a space and use forward slash separators on file paths')
print('')

# Variables to be defined by user input
user_input = input('Paste the directory you want to search: ')
searchstring = input('What word are you trying to find within these files? ')

for fname in os.listdir(user_input):
    # Full path
    full_path = os.path.join(user_input, fname)
    if os.path.isfile(full_path):
        # The with block closes the file even if reading fails
        with open(full_path, 'r') as f:
            if searchstring in f.read():
                print('found string in file "%s"' % fname)
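One possible way to cover both requested enhancements is sketched below, using `os.walk` to descend into sub-folders and `str.endswith` to filter by extension. The function name `search_tree` and its parameters are assumptions made for this example, and reading with `errors="ignore"` is a design choice that skips undecodable bytes rather than stopping the search.

```python
import os


def search_tree(top, searchstring, extension=".txt"):
    """Return paths under `top` that end in `extension` and contain `searchstring`."""
    hits = []
    # os.walk yields every folder below `top`, including `top` itself
    for folder, _subfolders, filenames in os.walk(top):
        for fname in filenames:
            # Case-insensitive extension check, so ".TXT" also matches ".txt"
            if not fname.lower().endswith(extension.lower()):
                continue
            path = os.path.join(folder, fname)
            with open(path, "r", errors="ignore") as handle:
                if searchstring in handle.read():
                    hits.append(path)
    return sorted(hits)
```

With the prompts from the script above, this could be called as `for path in search_tree(user_input, searchstring): print(path)`.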
I have a folder full of jar, html, css, and exe files. How can I check each file's type?
I already ran the "file" command on *NIX and tried python-magic, but the results all look like this:
test : Zip archive data, at least v1.0 to extract
How can I get a specific result such as "test : jar" using only the magic number?
How do I do this?
While not required, most JAR files have a META-INF/MANIFEST.MF file contained within them. You could check for the existence of this file, after checking if it's a zip file:
import zipfile

def zipFileContains(zipFileName, pathName):
    # Open the archive and check whether any member name starts with pathName
    with zipfile.ZipFile(zipFileName, "r") as f:
        return any(x.startswith(pathName.rstrip("/")) for x in f.namelist())

print(zipFileContains("test.jar", "META-INF/MANIFEST.MF"))
However, it might be better to just check if it's a zip file that ends in .jar.
Magic alone won't do it for you, since a JAR is literally just a zip file. Read more about the format here.
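The two checks described above (is it a zip file, and does it contain a manifest) could be combined roughly as follows. The function name `looks_like_jar` is hypothetical. This is only a heuristic: as noted above, the manifest is not required, so a valid JAR without one would be reported as not a JAR.

```python
import zipfile


def looks_like_jar(path):
    """Heuristic: True if `path` is a zip archive containing META-INF/MANIFEST.MF."""
    # is_zipfile checks the file's structure, not its extension
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "META-INF/MANIFEST.MF" in archive.namelist()
```

Adding `path.lower().endswith(".jar")` as an extra condition would implement the simpler extension-based check suggested above.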
Using python, I'm trying to find all files in /sys and match a certain file. The problem I'm having is that not all files are being found. It's not a matter of access. I know that python can read and write to the file, which I've tested manually using file.open("file_path","w") and file.write(). I just want to know whether there is some trick to locating files I'm missing here:
import os,re
for roots,dirs,files in os.walk('/sys'):
    match=re.search(r'\S+/rq_affinity',roots)
    if match:
        print(match.group())
I've already tried writing every single file found using os.walk() to a file and then using the shell and grep to see if the file I'm looking for is there, so the problem isn't with matching.
FIXED search:
import os,re
for roots,dirs,files in os.walk('/sys'):
    for file in files:
        match=re.search(r'\S+/rq_affinity',os.path.join(roots,file))
        if match:
            print(match.group())
rq_affinity is a file, isn't it? Why would you get that in roots? os.walk puts directory paths in roots, and the file names of each directory arrive in files.
Also, the entries under /sys/dev/block are symlinks, so you need to tell os.walk to follow them with followlinks=True.
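A minimal sketch of both points, matching on the file name and optionally following symlinked directories, might look like this. The function name `find_named_files` is an assumption for illustration. Be aware that `os.walk` does not track directories it has already visited, so `followlinks=True` can recurse without end when a link points back to a parent directory; starting from a narrower directory than `/sys` may be safer.

```python
import os


def find_named_files(top, filename, followlinks=False):
    """Return sorted paths under `top` whose file name equals `filename`."""
    hits = []
    for root, _dirs, files in os.walk(top, followlinks=followlinks):
        # `files` holds the names of the non-directory entries in `root`
        if filename in files:
            hits.append(os.path.join(root, filename))
    return sorted(hits)
```

For the case in the question, a call such as `find_named_files('/sys/dev/block', 'rq_affinity', followlinks=True)` would be the equivalent search.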