With the code below I have made an htmlfiles.txt that contains the HTML filenames in a directory:
import os

entries = os.listdir('/home/stupidroot/Documents/html.files.test')
count = 0
for line in entries:
    count += 1
    f = open("htmlfiles.txt", "a")
    f.write(line + "\n")
    f.close()
In the second phase I want to make a modification in every file like this:
lines = open('filename.html').readlines()
open('filename.html', 'w').writelines(lines[20:-20])
This code deletes the first and last 20 lines in an HTML file.
I just want to do this for all the files with a simple for loop.
You just need to open the htmlfiles.txt file, read each filename from each line, and do your stuff:
with open("htmlfiles.txt") as fic:
    for filename in fic:
        filename = filename.rstrip()
        lines = open(filename).readlines()
        open(filename, 'w').writelines(lines[20:-20])
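For reference, the two phases can also be collapsed into a single pass, so the intermediate htmlfiles.txt is not needed at all. This is a sketch, assuming the directory path from the question and that every file is longer than 40 lines (shorter files are left untouched):

```python
import glob
import os

def trim_html_files(directory):
    # Remove the first and last 20 lines of every .html file in directory.
    for path in glob.glob(os.path.join(directory, "*.html")):
        with open(path) as f:
            lines = f.readlines()
        # Only rewrite files long enough to lose 40 lines safely.
        if len(lines) > 40:
            with open(path, "w") as f:
                f.writelines(lines[20:-20])

# Hypothetical path from the question; point this at your own directory.
trim_html_files("/home/stupidroot/Documents/html.files.test")
```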
I'm creating new files from originally existing ones in the mdp folder by changing a couple of lines in those files using Python. I need to do this for 1000 files. Can anyone suggest a for loop which reads all the files, changes them, and creates new ones in one go?
This way I have to change the number after 'md_' in the path, and it's tedious because there are 1000 files here.
I tried using str() but there was a 'could not read file' error.
fin = open("/home/abc/xyz/mdp/md_1.mdp", "rt")
fout = open("/home/abc/xyz/middle/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('integrator = md', 'integrator = md-vv'))

fin = open("/home/abc/xyz/middle/md_1.mdp", "rt")
fout = open("/home/abc/xyz/mdb/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('dt = 0.001', 'dt = -0.001'))
fin.close()
fout.close()
os.listdir(path) is your friend:
import os

sourcedir = "/home/abc/xyz/mdp"
destdir = "/home/abc/xyz/middle"

for filename in os.listdir(sourcedir):
    if not filename.endswith(".mdp"):
        continue
    source = os.path.join(sourcedir, filename)
    dest = os.path.join(destdir, filename)
    # with open(xxx) as varname makes sure the file(s)
    # will be closed whatever happens in the 'with' block
    # NB text mode is the default, and so is read mode
    with open(source) as fin, open(dest, "w") as fout:
        # python files are iterable... avoids reading
        # the whole file in memory at once
        for line in fin:
            # will only work for those exact strings,
            # you may want to use regexps if the number of
            # whitespaces varies etc
            line = line.replace("dt = 0.001", "dt = -0.001")
            line = line.replace(
                'integrator = md',
                'integrator = md-vv'
            )
            fout.write(line)
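As the comments in the block above hint, a regular expression makes the replacement tolerant of varying whitespace. A possible sketch (the key names come from the question; the pattern itself is an assumption about how .mdp options are laid out):

```python
import re

def set_option(line, key, value):
    # Rewrite "key = anything" as "key = value", allowing any amount
    # of whitespace around the "=" and preserving the left-hand side.
    pattern = r"^(\s*%s\s*=\s*).*$" % re.escape(key)
    return re.sub(pattern, r"\g<1>%s" % value, line)

print(set_option("integrator   =  md", "integrator", "md-vv"))
```

Lines that do not start with the given key are returned unchanged.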
Assuming you want to edit all files that are located in the mdp folder, you could do something like this.
import os

dir = "/home/abc/xyz/mdp/"
for filename in os.listdir(dir):
    with open(dir + filename, "r+") as file:
        text = file.read()
        text = text.replace("dt = 0.001", "dt = -0.001")
        file.seek(0)
        file.write(text)
        file.truncate()
This will go through every file and change it using str.replace().
If there are other files in the mdp folder that you do not want to edit, you could use an if-statement to check for the correct file name. Add something like this to enclose the with open statement:
if filename.startswith("md_"):
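Put together, the check folds into the loop like this - a sketch using the md_ prefix and the replacement from above (the path is the question's):

```python
import os

def edit_mdp_files(directory):
    for filename in os.listdir(directory):
        # Skip anything that is not one of the md_* files.
        if not filename.startswith("md_"):
            continue
        with open(os.path.join(directory, filename), "r+") as file:
            text = file.read()
            text = text.replace("dt = 0.001", "dt = -0.001")
            file.seek(0)
            file.write(text)
            file.truncate()

# edit_mdp_files("/home/abc/xyz/mdp/")
```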
Essentially the data is temperatures from 4 different states over the course of 12 months, so there are 48 files to be populated into a folder on my desktop. I am not sure how to take the data being pulled from the web and send the files my program creates to a folder on my desktop. That's what I am confused about.
I am copying the data from the web, cleaning it up, saving it into a file, and then wanting to save that file to a folder on my desktop.
Here is the code:
import urllib

def accessData(Id, Month):
    url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=" + str(Id) + "&year=2017&month=" + str(Month) + "&graphspan=month&format=1"
    infile = urllib.urlopen(url)
    readLineByLine = infile.readlines()
    infile.close()
    return readLineByLine

f = open('stations.csv', 'r')
for line in f.readlines():
    vals = line.split(',')
    for j in range(1, 13):  # accessing months here from 1 to 12, b/c 13 exclusive
        data = accessData(line, j)
        filename = "{}-0{}-2017.csv".format(vals[0], j)
        print(str(filename))
        row_count = len(data)
        for i in range(2, row_count):
            if(data[i] != '<br>\n' and data[i] != '\n'):
                writeFile = open(filename, 'w')
                writeFile.write(data[i])
                openfile = open(Desktop, writeFile, 'r')
                file.close()
Have you tried running the script from your desktop? It looks like you haven't specified a directory, so running it from your desktop should output your results to your current working directory.
Alternatively, you could try using the built-in os library.
import os
os.getcwd() # to get the current working directory
os.chdir(pathname) # change your working directory to the path specified.
This would change your working directory to the place you want to save your files.
Also, regarding the last four lines of your code: file was never opened, so you cannot close it, and I do not believe you need the openfile statement.
writeFile = open(filename, 'w')
writeFile.write(data[i])
openfile = open(Desktop, writeFile , 'r')
file.close()
Try this instead.
with open(filename, 'w') as writeFile:
    for i in range(2, row_count):
        if(data[i] != '<br>\n' and data[i] != '\n'):
            writeFile.write(data[i])
Using this approach you shouldn't need to close the file. 'w' is to write as if to a new file; change this to 'a' if you need to append to the file.
You just need to provide open() with the path to your destination file, rather than just the filename (which will otherwise be saved into your current working directory).
Try something like:
f = open('stations.csv', 'r')
target_dir = "/path/to/your/Desktop/folder/"

for line in f.readlines():
    ...
    # We can open the file outside your inner "row" loop
    # using the combination of the path to your Desktop
    # and your filename
    with open(target_dir+filename, 'w') as writeFile:
        for i in range(2, row_count):
            if(data[i] != '<br>\n' and data[i] != '\n'):
                writeFile.write(data[i])
    # The "writeFile" object will close automatically outside the
    # "with ... " block
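One caveat with target_dir+filename: it silently produces a wrong path if the directory string lacks its trailing slash. os.path.join handles the separator for you; a small sketch (the filename here is hypothetical):

```python
import os

target_dir = "/path/to/your/Desktop/folder"   # trailing slash no longer matters
filename = "station-01-2017.csv"              # hypothetical output filename

full_path = os.path.join(target_dir, filename)
print(full_path)  # -> /path/to/your/Desktop/folder/station-01-2017.csv
```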
As others have mentioned, you could approach this two different ways:
1) Run the script directly from the directory to which you would like to save the files. Then you would just need to specify the full path to the .csv file you are reading.
2) You could provide the full path to where you would like to save the files when you write them, however this seems more intensive and unnecessary.
On another note, when opening files for the purpose of reading/writing them, use with to simply open the file for as long as you need it, then when you exit the with statement, the file will automatically be closed.
Here is an example of Option 1 with some clean-up:
import urllib

def accessData(Id, Month):
    url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=" + str(Id) + "&year=2017&month=" + str(Month) + "&graphspan=month&format=1"
    infile = urllib.urlopen(url)
    readLineByLine = infile.readlines()
    infile.close()
    return readLineByLine

with open('Path to File' + 'stations.csv', 'r') as f:
    for line in f.readlines():
        vals = line.split(',')
        for j in range(1, 13):
            data = accessData(line, j)
            filename = "{}-0{}-2017.csv".format(vals[0], j)
            with open(filename, 'w') as myfile:
                for i in range(2, len(data)):
                    if data[i] != '<br>\n' and data[i] != '\n':
                        myfile.write(data[i])
            print(filename + ' - Completed')
I am trying to get a Python script that will open a few text files, read the content, and every time it finds a word from a list, block that word out with new text, then write the result to a new file, for each source file.
Right now I can get it to write all of the source files to a single file, which is my script below, but I am not sure how to proceed to having a new file for every source file.
import os

KeyWords = ["Magic","harry","wand"]
rootdir = "C:\\books"
fileslist = []

##blanks file and preps for new data
fileout = open(rootdir+"\\output\\newfile.txt","w")
print (fileout)
fileout.write("Start of file\n\nLocation of output: "+rootdir+"\\output \n\nFiles that are being Processed:\n\n")
fileout.close()

def sourcelist(fileslist):
    file = open(fileslist,"r")
    fileout = open(rootdir+"\\output\\newfile.txt", "a")
    for line in file:
        if any(word.lower() in line.lower() for word in KeyWords):
            print("Word Found\n\n" + '\t'+line + "\nEnd\n")
            fileout.write("<<<SEARCH TERM FOUND>>>\n\n" + '\t'+line + "\n<<<END OF BLOCK>>>\n")
        else:
            #print('\t'+line) #No need to print the lines with no Key words in
            fileout.write('\t'+line)
    #return #not sure what return does?

for root, dirs, files in os.walk(rootdir):
    dirs.clear()
    for file in files:
        filepath = root + os.sep + file
        if filepath.endswith(".txt"):
            fileslist.append(filepath)

for path in fileslist:
    sourcelist(path)

print("\n".join(fileslist))

with open(rootdir+"\\output\\newfile.txt","a") as output:
    output.write("\n".join(fileslist)+"\n\n\n")
    output.close()
This is a bit tough to answer as a whole, but here's a general approach.
I have the following file structure:
hp_extracts: # directory
    hp_parser.py
    -- inps/
        -- harry_1.txt
        -- harry_2.txt
    -- outs/
        <nothing>
Contents of inps/harry_1.txt:
When Harry got his wand it was Magic
something something magic something
something harry something
Contents of inps/harry_2.txt:
magic something something
something
harry something something
This is the contents of hp_parser.py:
import os

all_files = os.listdir('inps/')
keywords = ["magic","harry","wand"]

for file in all_files:
    with open('inps/{}'.format(file)) as infile, open('outs/{}'.format(file), 'w') as outfile:
        for line in infile:
            #print(line)
            for word in line.split():
                if word.lower() in keywords:
                    line = line.replace(word, '<<<SEARCH TERM FOUND>>> {} <<<END OF BLOCK>>>'.format(word))
            outfile.write(line)
I have n files in the location /root as follows
result1.txt
    abc
    def

result2.txt
    abc
    def

result3.txt
    abc
    def
and so on.
I must create a consolidated file called result.txt with all the values concatenated from all result files looping through the n files in a location /root/samplepath.
It may be easier to use cat, as others have suggested. If you must do it with Python, this should work. It finds all of the text files in the directory and appends their contents to the result file.
import glob, os

os.chdir('/root')
with open('result.txt', 'w+') as result_file:
    for filename in glob.glob('result*.txt'):
        with open(filename) as file:
            result_file.write(file.read())
            # append a line break if you want to separate them
            result_file.write("\n")
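If the result files can be large, shutil.copyfileobj streams each one in chunks instead of loading it fully into memory with read(). A sketch of the same consolidation (the pattern and paths follow the question; the skip of the output file is added so it never appends to itself):

```python
import glob
import os
import shutil

def concatenate(pattern, out_path):
    with open(out_path, "wb") as result_file:
        # sorted() gives a deterministic file order
        for filename in sorted(glob.glob(pattern)):
            if os.path.abspath(filename) == os.path.abspath(out_path):
                continue  # never append the output file to itself
            with open(filename, "rb") as part:
                shutil.copyfileobj(part, result_file)
            result_file.write(b"\n")  # separator between files

# concatenate("/root/result*.txt", "/root/result.txt")
```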
That could be an easy way of doing it.
Let's say, for example, that my file script.py is in a folder, and along with that script there is a folder called testing, containing all the text files, named file_0, file_1....
import os

# read all the files and put everything in data
number_of_files = 3  # set this to the number of file_*.txt files in testing/
data = []
for i in range(number_of_files):
    fn = os.path.join(os.path.dirname(__file__), 'testing/file_%d.txt' % i)
    f = open(fn, 'r')
    for line in f:
        data.append(line)
    f.close()

# write everything to result.txt
fn = os.path.join(os.path.dirname(__file__), 'result.txt')
f = open(fn, 'w')
for element in data:
    f.write(element)
f.close()
I need to find every instance of "translate" in a text file and replace a value 4 lines after finding the text:
"(many lines)
}
}
translateX xtran
{
keys
{
k 0 0.5678
}
}
(many lines)"
The value 0.5678 needs to be 0. It will always be 4 lines below the "translate" string.
The file has up to about 10,000 lines.
example text file name: 01F.pz2.
I'd also like to cycle through the folder and repeat the process for every file with the pz2 extension (up to 40).
Any help would be appreciated!
Thanks.
I'm not quite sure about the logic for replacing 0.5678 in your file, therefore I use a function for that - change it to whatever you need, or explain in more detail what you want. The last number in the line? Only floating-point numbers?
Try:
import os

dirname = "14432826"
lines_distance = 4

def replace_whatever(line):
    # Put your logic for replacing here
    return line.replace("0.5678", "0")

for filename in filter(lambda x: x.endswith(".pz2") and not x.startswith("m_"), os.listdir(dirname)):
    print filename
    with open(os.path.join(dirname, filename), "r") as f_in, open(os.path.join(dirname, "m_%s" % filename), "w") as f_out:
        replace_tasks = []
        for line in f_in:
            # search marker in line
            if line.strip().startswith("translate"):
                print "Found marker in", line,
                replace_tasks.append(lines_distance)
            # replace if necessary
            if len(replace_tasks) > 0 and replace_tasks[0] == 0:
                del replace_tasks[0]
                print "line to change is", line,
                line_to_write = replace_whatever(line)
            else:
                line_to_write = line
            # Write to output
            f_out.write(line_to_write)
            # decrease counters
            for i, task in enumerate(replace_tasks):
                replace_tasks[i] -= 1
The comments within the code should help understanding. The main concept is the list replace_tasks, which keeps track of when the next line to modify will come.
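Since the value always sits exactly four lines below the marker, an alternative to the counter list is to read the file into a list and index directly. A sketch, not the answer's method (it assumes the target line has the "k 0 <number>" shape shown in the question):

```python
import re

def patch_lines(lines, marker="translate", offset=4):
    # Return a copy of lines with the value `offset` lines below
    # each marker line replaced by 0.
    out = list(lines)
    for i, line in enumerate(lines):
        if line.strip().startswith(marker) and i + offset < len(out):
            # Zero the stored value while keeping the indentation;
            # assumes the line looks like "k 0 <number>".
            out[i + offset] = re.sub(r"(k 0 )\S+", r"\g<1>0", out[i + offset])
    return out
```

This keeps the whole file in memory, which is fine for files of about 10,000 lines.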
Remarks: Your code sample suggests that the data in your file is structured. It will definitely be safer to read this structure and work on it instead of using a search-and-replace approach on a plain text file.
Thorsten, I renamed my original files to have the .old extension and the following code works:
import os

target_dir = "."

# cycle through files
for path, dirs, files in os.walk(target_dir):
    # file is the file counter
    for file in files:
        # get the filename and extension
        filename, ext = os.path.splitext(file)
        # see if the file is a pz2
        if ext.endswith('.old'):
            # rename the file to "old"
            oldfilename = filename + ".old"
            newfilename = filename + ".pz2"
            old_filepath = os.path.join(path, oldfilename)
            new_filepath = os.path.join(path, newfilename)
            # open the old file for reading
            oldpz2 = open(old_filepath, "r")
            # open the new file for writing
            newpz2 = open(new_filepath, "w")
            # reset changeline
            changeline = 0
            currentline = 0
            # cycle through old lines
            for line in oldpz2:
                currentline = currentline + 1
                if line.strip().startswith("translate"):
                    changeline = currentline + 4
                if currentline == changeline:
                    print >>newpz2, " k 0 0"
                else:
                    print >>newpz2, line