I am trying to recursively loop through a series of directories (about 3 levels deep). In each directory is a series of text files, and I want to replace a line of text with the directory path if the line contains a certain string. So, for example, given:
/path/to/text/file/fName.txt
If a line in fName.txt contains the string 'String1', I want to replace that line with 'some text' + file, where file is the last part of the path.
This seems like it should be easy in python but I can't seem to manage it.
Edit: Apologies for a very badly written question, I had to rush off, shouldn't have hit enter.
Here's what I have so far:

import os

for dirname, dirs, files in os.walk("~/dir1/dir2"):
    print files
    for fName in files:
        fpath = os.path.join(dirname, fName)
        print fpath
        f = open(fpath)
        for line in f:
            # where I'm getting stuck
            line = line.replace("old_txt", "new_txt")
            # create new file and save output
What I'm getting stuck on is how to replace an entire line based on only a section of the line. For example, if the line were
That was a useless question,
I can't seem to make replace do what I want. What I'm trying to do is change the entire line based only on searching for 'useless'. Also, is there a better way of modifying a single line than re-writing the entire file?
Thanks.
os.walk (look at the example in the docs) is all you need.
Parse each file with with open(...) as f:, analyze it, and overwrite it (carefully, after testing) with with open(..., 'w') as f:.
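A minimal sketch of that approach, using the names from the question. Whether "the last part of the path" means the innermost directory or the file name is an assumption here; the comment marks the choice made:

import os

root = os.path.expanduser("~/dir1/dir2")   # os.walk will not expand "~" on its own

for dirname, dirs, files in os.walk(root):
    for fname in files:
        fpath = os.path.join(dirname, fname)
        with open(fpath) as f:
            lines = f.readlines()
        # take 'file' to be the last directory component,
        # as in /path/to/text/file/fName.txt
        tail = os.path.basename(dirname)
        new_lines = ["some text" + tail + "\n" if "String1" in line else line
                     for line in lines]
        if new_lines != lines:   # only rewrite files that actually changed
            with open(fpath, "w") as f:
                f.writelines(new_lines)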
I'm trying to read files and collect the data inside them. I build a path from my current working directory into the folder that holds the file, and read it there.
I would like to read every line in the file.
When I read the file, my output looks like binary. I've tried looking around on Stack Overflow, and I have also made sure the file I am reading is a txt file.
import os

def ratio(filename):
    cwd = str(os.getcwd())
    cwd = cwd[:-8]
    cwd = cwd + "Equities\\" + str(filename) + ".txt"
    file = open(cwd, "r")
    line_1 = str(file.readline(4))
    print(line_1)
The readline(4) should return:
Current assets
My readline(4) function returns:
ÿþA\000
If you are trying to read a specific line try:
lines = file.readlines()
text = lines[4]  # list indexing is zero-based, so this is the fifth line
readline reads one line at a time. You can use readlines instead to read all the lines at once into a list.
As I mentioned in my comment, readline(4) only reads 4 bytes from the first line, not the fourth line.
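As for the odd characters themselves: ÿþ (bytes 0xFF 0xFE) at the start of the output is the UTF-16 byte-order mark, so the file is most likely UTF-16-encoded rather than plain ASCII. A small sketch in Python 3 syntax, with a hypothetical file path and the fifth-line index reused from above:

# hypothetical path; substitute the real one built in the question
path = "Equities\\AAPL.txt"

# open with an explicit encoding so the BOM and NUL bytes are decoded away
with open(path, "r", encoding="utf-16") as f:
    lines = f.readlines()

print(lines[4])  # fifth line; list indexing is zero-based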
I am writing some scripts to process some text files in Python. Locally, the script reads from a single txt file, so I use
index_file = open('index.txt', 'r')
for line in index_file:
    ....
and loop through the file to find a matching string. But when using Amazon EMR, the index.txt file itself is split into multiple txt files in a single folder.
Thus I would like to replicate that locally and search multiple txt files for a certain string, but I'm struggling to find clean code to do that.
What is the best way to go about it while writing minimal code?
import os
from glob import glob

def readindex(path):
    pattern = '*.txt'
    full_path = os.path.join(path, pattern)
    for fname in sorted(glob(full_path)):
        for line in open(fname, 'r'):
            yield line

# read lines to memory list for using multiple times
linelist = list(readindex("directory"))
for line in linelist:
    print line,
This script defines a generator (see this question for details about generators) that iterates, in sorted order, through all the files in the directory "directory" that have the extension "txt". It yields all the lines as one stream, which can then be iterated through as if the lines were coming from one open file, which seems to be what the question author wanted. The comma at the end of print line, makes sure that the newline is not printed twice, although the body of the for loop would be replaced by the question author anyway; in that case one can use line.rstrip() to get rid of the newline.
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.
What I'm trying to do is trawl through a directory of log files whose names begin like this: "filename001.log". There can be hundreds of files in a directory.
The code I want to run against each file checks that the 8th position of each log line always contains a number. I have a suspicion that a non-digit is throwing off our parser. Here's some simple code I'm trying:
# import re
from urlparse import urlparse

a = '/folderA/filename*.log'  # << currently this only does 1 file
b = '/folderB/'  # << I'd like it to write the same file name as it read

with open(b, 'w') as newfile, open(a, 'r') as oldfile:
    data = oldfile.readlines()
    for line in data:
        parts = line.split()
        status = parts[8]  # value of 8th position in the log file
        isDigit = status.isdigit()
        if not isDigit:
            print " Not A Number :", status
            newfile.write(status)
My problem is:
How do I tell it to read all the files in a directory? (The above really only works for 1 file at a time)
If I find something that is not a number, I would like to write that character into a file in a different folder but with the same name as the log file. For example, if I find that filename002.log has a "*" in one of its log lines, I would like folderB/filename002.log to be created and the non-digit character written to it.
Sounds simple enough; I'm just not very good at coding.
To read files in one directory matching a given pattern and write to another, use the glob module and the os.path functions to construct the output files:
import glob
import os

srcpat = '/folderA/filename*.log'
dstdir = '/folderB'

for srcfile in glob.iglob(srcpat):
    if not os.path.isfile(srcfile):
        continue
    dstfile = os.path.join(dstdir, os.path.basename(srcfile))
    with open(srcfile) as src, open(dstfile, 'w') as dst:
        for line in src:
            parts = line.split()
            status = parts[8]  # value of 8th position in the log file
            if not status.isdigit():
                print " Not A Number :", status
                dst.write(status)  # Or print >>dst, status if you want newline
This will create empty files even if no bad entries are found. You can either wait until you're finished processing a file (and the with block is closed), then check the output file's size and delete it if it is empty; or you can take a lazy approach: unconditionally delete the output file before beginning iteration, but don't open it, and only open it when you hit a bad value (for append rather than write, so earlier iterations' output is not discarded), write to it, and let it close.
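A sketch of that lazy approach, under the same assumptions as the snippet above (the path patterns and the index-8 field come from the question):

import glob
import os

srcpat = '/folderA/filename*.log'
dstdir = '/folderB'

for srcfile in glob.iglob(srcpat):
    if not os.path.isfile(srcfile):
        continue
    dstfile = os.path.join(dstdir, os.path.basename(srcfile))
    # remove any stale output up front; the file is only recreated on a bad value
    if os.path.exists(dstfile):
        os.remove(dstfile)
    with open(srcfile) as src:
        for line in src:
            status = line.split()[8]
            if not status.isdigit():
                # open for append so earlier bad values in this file are kept
                with open(dstfile, 'a') as dst:
                    dst.write(status + '\n')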
Import os and use: for filename in os.listdir('path'):. This lists every entry in the directory, including the names of subdirectories (it does not recurse into them).
Simply open a second file with the correct path. Since you already have filename from iterating with the above method, you only have to replace the directory. You can use os.path.join for that.
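A minimal sketch combining those two hints, reusing the folder names and the index-8 check from the question:

import os

srcdir = '/folderA'
dstdir = '/folderB'

for filename in os.listdir(srcdir):
    srcfile = os.path.join(srcdir, filename)
    if not os.path.isfile(srcfile):   # os.listdir also yields subdirectory names
        continue
    dstfile = os.path.join(dstdir, filename)
    with open(srcfile) as src, open(dstfile, 'w') as dst:
        for line in src:
            status = line.split()[8]
            if not status.isdigit():
                dst.write(status + '\n')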
I have a text file (filenames.txt) which contains over 200 file paths:
/home/chethan/purpose1/script1.txt
/home/chethan/purpose2/script2.txt
/home/chethan/purpose3/script3.txt
/home/chethan/purpose4/script4.txt
Among the many lines in each of these files, each contains a line holding a filename like Reference.txt. My objective is to replace the .txt in Reference.txt with .csv in every file. As a beginner in Python, I referred to several Stack Overflow questions on similar cases and wrote the following code.
My code:
#! /usr/bin/python
#filename modify_hob.py
import fileinput

f = open('/home/chethan/filenames.txt', 'r')
for i in f.readlines():
    for line in fileinput.FileInput(i.strip(), inplace=1):
        line = line.replace("txt", "csv"),
    f.close()
f.close()
When I run my code, the contents of the txt files (script1, script2, ...) mentioned above are wiped away, i.e., not a single line of text is left inside them! I am puzzled by this behavior and unable to find a solution.
This should get you going (untested):
#! /usr/bin/python
#filename modify_hob.py

# Open the file with the filenames list.
with open('filenames.txt') as list_f:
    # Iterate over the lines; each line represents a file name.
    for filename in list_f:
        filename = filename.strip()  # drop the trailing newline
        # Rewrite its content.
        with open(filename) as f:
            content = f.read()
        with open(filename, 'w') as f:
            f.write(content.replace('.txt', '.csv'))
In your code, f is set to the open file object of filenames.txt and
nothing else. That is what you are closing in both of the last two lines.
Also, you are not writing anything back to the files, so you can't expect your
changes to be written back to the disk. (Unless the fileinput module does some
dark magic that I'm missing.)
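For completeness: fileinput's inplace mode is not dark magic, but it only writes back what you print while iterating, because inplace mode redirects standard output into the file being processed. A sketch of the original approach fixed along those lines (Python 3 syntax; the paths are the question's):

import fileinput

with open('/home/chethan/filenames.txt') as list_f:
    for name in list_f:
        # while this inner loop runs, print() output goes into the file itself
        for line in fileinput.input(name.strip(), inplace=True):
            print(line.replace('.txt', '.csv'), end='')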
I ran a grep command and found several hundred instances of a string in a large directory of data. The resulting file is 2 MB and has strings that I would like to extract and put into an Excel file for easy access later. The part that I'm extracting is a path to a data file I need to work on later.
I have been reading about Python lately and thought I could somehow do this extraction automatically. But I'm a bit stumped about how to start. I have this so far:
data = open("C:\python27\text.txt").read()
if "string" in data:
But then I'm not sure what to use to get out of the file what I want. Anything for a beginner to chew on?
EDIT
Here is some more info on what I was looking for. I have several hundred lines in a text file. Each line has a path and some strings like this:
/path/to/file:STRING=SOME_STRING, ANOTHER_STRING
What I would like from these lines are the paths from the lines containing a specific "STRING=SOME_STRING". For example, if the line looks like this, I want the path (/path/to/file) extracted to another file:
/path/to/file:STRING=SOME_STRING
All this is quite easily done with standard Python, but for "Excel" (xls or xlsx) files you'd have to install a third-party library. However, if you just need a 2D table that can open up in a spreadsheet, you can use Comma-Separated Values (CSV) files; these are compatible with Excel and other spreadsheet software, and support for them is integrated into Python.
As for searching a string inside a file, it is straightforward. You may not even need regular expressions for most things. What information do you want along with the string?
Also, the "os" module onthse standardlib has some functions to list all files in a directory, or in a directory tree. The most straightforward is os.listdir(path)
String methods like "count" and "find" can be used beyond "in" to locate the string in a file, or count the number of ocurrences.
And finally, the "CSV" module can write a properly formated file to read in ay spreadsheet.
Along the away, you may abuse python's buit-in list objects as an easy way to manipulate data sets around.
Here is a sample programa that counts strings given in the command line found in files in a given directory,, and assembles a .CSV table with them:
# -*- coding: utf-8 -*-
import csv
import sys, os

output_name = "count.csv"

def find_in_file(path, string_list):
    count = []
    file_ = open(path)
    data = file_.read()
    file_.close()
    for string in string_list:
        count.append(data.count(string))
    return count

def main():
    if len(sys.argv) < 3:
        print "Use %s directory_path <string1> [string2 [...]]\n" % sys.argv[0]
        sys.exit(1)
    target_dir = sys.argv[1]
    string_list = sys.argv[2:]
    csv_file = open(output_name, "wt")
    writer = csv.writer(csv_file)
    header = ["Filename"] + string_list
    writer.writerow(header)
    for filename in os.listdir(target_dir):
        path = os.path.join(target_dir, filename)
        if not os.path.isfile(path):
            continue
        line = [filename] + find_in_file(path, string_list)
        writer.writerow(line)
    csv_file.close()

if __name__ == "__main__":
    main()
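If the script above is saved as, say, count_strings.py (a hypothetical name), it would be run as python count_strings.py some_directory STRING1 STRING2, producing a count.csv with one row per file and one column of counts per search string.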
The steps to do this are as follows:
Make a list of all files in the directory (This isn't necessary if you're only interested in a single file)
Extract the names of those files that you're interested in
In a loop, read in those files line by line
See if the line matches your pattern
Extract the part of the line before the first : character
So, the code would look something like this, provided your text files are formatted the way you've shown in the question and that this format is reliably correct:
import sys, os, glob

dir_path = sys.argv[1]
if dir_path[-1] != os.sep:
    dir_path += os.sep
# use standard *NIX wildcards to get your file names, in this case all files with a .txt extension
file_list = glob.glob(dir_path + '*.txt')

with open('out_file.csv', 'w') as out_file:
    for filename in file_list:
        with open(filename, 'r') as in_file:
            for line in in_file:
                if 'STRING=SOME_STRING' in line:
                    out_file.write(line.split(':')[0] + '\n')
This program would be run as python extract_paths.py path/to/directory and would give you a file called out_file.csv in your current directory.
This file can then be imported into Excel as a CSV file. If your input is less reliable than you've suggested, regular expressions might be a better choice.
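For instance, a hedged sketch of that regular-expression variant, assuming the /path/to/file:STRING=SOME_STRING format shown in the question (everything before the first colon is treated as the path):

import re
import sys, os, glob

# a path, a colon, then the exact key/value pair from the question
pattern = re.compile(r'^([^:]+):STRING=SOME_STRING\b')

dir_path = sys.argv[1]
with open('out_file.csv', 'w') as out_file:
    for filename in glob.glob(os.path.join(dir_path, '*.txt')):
        with open(filename, 'r') as in_file:
            for line in in_file:
                match = pattern.match(line)
                if match:
                    out_file.write(match.group(1) + '\n')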