I'm building a system to sort some .log files into .txt format so that I can later send the data to Excel. There are 70+ files, and in every file I'm scanning for a keyword; I get 1000+ strings that I want to save in a .txt file. I can get every string that I want and see which .log file each one was taken from, but now I want to replace the name of the file each entry came from with a corresponding number 1, 2, 3, 4, 5... (one number for every file instead of its file name). Code:
    import glob

    def LogFile(filename, tester):
        message = []
        data = []
        print(filename)
        with open(filename) as filesearch:   # open search file
            filesearch = filesearch.readlines()   # read file
        i = 1
        d = {}
        for filename in filename:
            if not d.get(filename, False):
                d[filename] = i
                i += 1
        for line in filesearch:
            if tester in line:   # extract ""
                start = '-> '
                end = ':\ '
                number = line[line.find(start)+3: line.find(end)]   #[ord('-> '):ord(' :\ ')]
                data.append(number)   # store all found words in array
                text = line[line.find(end)+3:]
                message.append(text)
        with open('Msg.txt', 'a') as handler:   # create .txt file
            for i in range(len(data)):
                handler.write(f"{i}|{data[i]}|{message[i]}")

    # open with 'w' to "reset" the file.
    with open('Msg.txt', 'w') as file_handler:
        pass
    # ---------------------------------------------------------------------------------
    for filename in glob.glob(r'C:\Users\FKAISER\Desktop\AccessSPA\SPA\*.log'):
        LogFile(filename, 'Sending Request: Tester')
I have tried using this snippet:

    i = 1
    d = {}
    for filename in filename:
        if not d.get(filename, False):
            d[filename] = i
            i += 1

but then it looks like this:
I want each file to have the same number as in the picture: 1 for all 26 log entries from the first file, 2 for the 10 entries from the second file in that folder, and so on.
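A note on the snippet above: `for filename in filename` iterates over the characters of one path string, so the dictionary ends up numbering characters rather than files. A minimal sketch of one alternative follows, assuming the numbering should simply follow the sorted order of the paths. The helpers `number_files`, `tag_matches` and `write_report` are hypothetical names, not part of the original code.

```python
import glob


def number_files(paths):
    """Map each path to 1, 2, 3, ... in sorted order (assumed ordering)."""
    return {path: n for n, path in enumerate(sorted(paths), start=1)}


def tag_matches(file_number, lines, keyword):
    """Return 'number|line' rows for every line that contains the keyword."""
    return [f"{file_number}|{line.rstrip()}" for line in lines if keyword in line]


def write_report(pattern, keyword, out_path):
    """Scan every file matching pattern and write the tagged rows to out_path."""
    numbers = number_files(glob.glob(pattern))
    with open(out_path, "w") as handler:   # 'w' resets the report on each run
        for path, n in numbers.items():
            with open(path) as log:
                for row in tag_matches(n, log, keyword):
                    handler.write(row + "\n")
```

A call might look like `write_report(r'C:\...\*.log', 'Sending Request: Tester', 'Msg.txt')`, with the real folder substituted. Because the number is assigned once per file, every row taken from the same log carries the same number.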
I have this code, with which I am trying to search a directory and its subdirectories for a specified string within .xls and .xlsx files and, for now, return the matching file names. When I run it, it prints the path of every file ending in .xls or .xlsx, each followed by the search string I passed in. The code is not isolating the files that contain the string; it just returns the file path for all results and prints my search string under each one. What could be happening here? And is it possible to pass a list of strings here and copy the discovered files to a folder? That is where I am ultimately trying to get with this. Thank you.
    import os
    import openpyxl

    def findFiles(strings, dir, subDirs, fileContent, fileExtensions):
        filesInDir = []
        foundFiles = []
        filesFound = 0
        if not subDirs:
            for filename in os.listdir(dir):
                if os.path.isfile(os.path.join(dir, filename).replace("\\", "/")):
                    filesInDir.append(os.path.join(dir, filename).replace("\\", "/"))
        else:
            for root, subdirs, files in os.walk(dir):
                for f in files:
                    if not os.path.isdir(os.path.join(root, f).replace("\\", "/")):
                        filesInDir.append(os.path.join(root, f).replace("\\", "/"))
        print(filesInDir)
        if filesInDir:
            for file in filesInDir:
                print("Current file: "+file)
                filename, extension = os.path.splitext(file)
                if fileExtensions:
                    fileText = extension
                else:
                    fileText = os.path.basename(filename).lower()
                if fileContent:
                    fileText += getFileContent(file).lower()
                for string in strings:
                    print(string)
                    if string in fileText:
                        foundFiles.append(file)
                        filesFound += 1
                        break
        return foundFiles

    def getFileContent(filename):
        if filename.partition(".")[2] in supportedTypes:
            if filename.endswith(".xls"):
                content = ""
                with openpyxl.load_workbook(filename) as pdf:
                    for x in range(0, len(pdf.pages)):
                        page = pdf.pages[x]
                        content = content + page.extract_text()
                return content
            elif filename.endswith(".xlsx"):
                with openpyxl.load_workbook(filename, 'r') as f:
                    content = ""
                    lines = f.readlines()
                    for x in lines:
                        content = content + x
                    f.close()
                return content
        else:
            return ""

    supportedTypes = [".xls", ".xlsx"]
    print(findFiles(strings=["55413354"], dir="C:/Users/User/", subDirs=True, fileContent=True, fileExtensions=False))
Expected output sample (reflects a find for the string '55413354'; that is, the string was located only in the file below, out of 3 files):

    Excel File Name 123

Actual output (returns everything; no filtering is happening, and my search string appears under each file name):

    path/Excel File Name 123
    55413354
    path/Excel File Name 321
    55413354
    path/Excel File Name 111
    55413354
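Two things in the code above can be checked in isolation. First, `filename.partition(".")[2]` returns everything after the first dot without the dot itself (for example `xlsx`), which never equals `".xls"` or `".xlsx"`, so `getFileContent` returns an empty string. Second, the paths and the repeated search string in the actual output come from the `print` calls inside the loops, not from the returned list. The sketch below shows an extension check based on `os.path.splitext` and, as an assumption about what "copy the discovered files to a folder" should mean, a small copy helper. `has_supported_extension` and `copy_matches` are hypothetical names.

```python
import os
import shutil

SUPPORTED_TYPES = {".xls", ".xlsx"}


def has_supported_extension(path):
    """True if the path ends in one of SUPPORTED_TYPES (case-insensitive)."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_TYPES


def copy_matches(paths, dest_dir):
    """Copy each found file into dest_dir, creating the folder if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    for path in paths:
        shutil.copy2(path, dest_dir)


path = "C:/Users/User/Excel File Name 123.xlsx"
print(path.partition(".")[2])          # xlsx
print(has_supported_extension(path))   # True
```

Reading the cell values themselves would still need openpyxl's workbook and worksheet API rather than `readlines()` or `pages`; that part is not sketched here.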
With the code below I have made a htmlfiles.txt that contains HTML filenames in a directory:
    import os
    entries = os.listdir('/home/stupidroot/Documents/html.files.test')
    count=0
    for line in entries:
        count += 1
        f = open("htmlfiles.txt", "a")
        f.write(line + "\n")
        f.close()
In the second phase I want to modify every file like this:

    lines = open('filename.html').readlines()
    open('filename.html', 'w').writelines(lines[20:-20])

This code deletes the first and the last 20 lines of an HTML file.
I simply want to do this for all the files with a for loop.
You just need to open the htmlfiles.txt file, read each filename from each line, and do your processing:

    with open("htmlfiles.txt") as fic:
        for filename in fic:
            filename = filename.rstrip()
            lines = open(filename).readlines()
            open(filename, 'w').writelines(lines[20:-20])
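Two details may matter here. htmlfiles.txt stores bare file names, so opening them works only when the script runs from inside the HTML directory. Also, `lines[20:-20]` leaves a file empty if it has 40 lines or fewer. A minimal sketch that joins each name with its directory and closes every file explicitly follows; `trim_lines` and `trim_listed_files` are hypothetical helper names.

```python
import os


def trim_lines(lines, n=20):
    """Drop the first and last n lines; n <= 0 keeps everything."""
    if n <= 0:
        return list(lines)
    return lines[n:-n]


def trim_listed_files(list_path, directory, n=20):
    """Trim every file named in list_path, resolving names against directory."""
    with open(list_path) as listing:
        names = [name.strip() for name in listing if name.strip()]
    for name in names:
        path = os.path.join(directory, name)
        with open(path) as src:
            lines = src.readlines()
        with open(path, "w") as dst:
            dst.writelines(trim_lines(lines, n))
```

With the paths from the question, the call would be `trim_listed_files("htmlfiles.txt", "/home/stupidroot/Documents/html.files.test")`.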
I have many files ('*.pl-pl'). My script has to find each of these files and merge them into one xlsx file using openpyxl.
Now I want to rebuild those files so that they are identical to the originals.
But there is a problem after writing
(the content variable contains the content of one file, read from one Excel cell):

    with open(path,'w') as f:
        f.write(content.encode('utf-8'))

When I check whether the original files are the same as the new files, the text seems to be the same, but there are small differences in size. When I use the WinDiff application to compare them, it finds some tuples that are different, but it says that they differ in blanks only.
Could you give me advice on how to rebuild those files so that they are the same as before?
Or is this approach correct?
Note: I am trying to rebuild them to be sure that they will have the same encoding etc., because the merged Excel file will be used for translation, and the translated files then have to be rebuilt in place of the originals.
Here is the code. It walks the directory and prints all file names and contents into one temporary file. Then it creates an Excel file: the 1st column is the path (to be able to reconstruct the directory) and the 2nd column contains the content of the file, where newlines have been replaced with '*=*'.
    def print_to_file():
        import os
        for root, dirs, files in os.walk("OriginalDir"):
            for file in files:
                text = []
                if file.endswith(".pl-pl"):
                    abs_path = os.path.join(root, file)
                    with open(abs_path) as f:
                        for line in f:
                            text.append(line.strip('\n'))
                    mLib.printToFile('files.mdoc', abs_path + '::' + '*=*'.join(text))  # '*=*' represents '\n'

    def write_it():
        from openpyxl import Workbook
        import xlsxwriter
        file = 'files.mdoc'
        workbook = Workbook()
        worksheet = workbook.worksheets[0]
        worksheet.title = "Translate"
        i = 0
        with open(file) as f:
            classes = set()
            for line in f:
                i += 1
                splitted = line.strip('\n').split('::')
                name = splitted[0]
                text = splitted[1].split('*=*')
                text = [x.encode('string-escape') for x in text]
                worksheet.cell('B{}'.format(i)).style.alignment.wrap_text = True
                worksheet.cell('B{}'.format(i)).value = splitted[1]
                worksheet.cell('A{}'.format(i)).value = splitted[0]
        workbook.save('wrap_text1.xlsx')

    import openpyxl

    def rebuild():
        wb = openpyxl.load_workbook('wrap_text1.xlsx')
        ws = wb.worksheets[0]
        row_count = ws.get_highest_row()
        for i in xrange(1, row_count + 1):
            dir_file = ws.cell('A{}'.format(i)).value
            content = ws.cell('B{}'.format(i)).value
            remake(dir_file, content)

    import os
    import re  # needed by re.sub in remake()

    def remake(path, content):
        content = re.sub(r'\*=\*', '\n', content)
        result = ''
        splt = path.split('\\')
        file = splt[-1]
        for dir in splt[:-1]:
            result += dir + '/'
            # print result
            if not os.path.isdir(result):
                # print result
                os.mkdir(result)
        with open(path, 'w') as f:
            f.write(content.encode('utf-8'))

    # print_to_file()  # print to temp file - paths and contents separated by '::'
    # write_it()       # write it into the excel file
    # rebuild()        # reconstruct directory
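Two properties of the code above could explain differences that a diff tool reports as "blanks only", though neither is confirmed for these particular files. Stripping `'\n'` from each line and joining with `'*=*'` drops the file's final newline, and writing in text mode on Windows translates `'\n'` to `'\r\n'`. Below is a minimal Python 3 sketch (the question's code is Python 2) of a round trip that keeps the line endings by working on bytes. `CRLF_MARK` is a hypothetical second marker, and the sketch assumes neither marker occurs in the file text.

```python
CRLF_MARK = "*=r=*"  # hypothetical marker for '\r\n'
LF_MARK = "*=*"      # the question's marker for '\n'


def pack(raw):
    """Turn a file's bytes into single-line cell text, keeping line endings."""
    text = raw.decode("utf-8")
    return text.replace("\r\n", CRLF_MARK).replace("\n", LF_MARK)


def unpack(cell):
    """Inverse of pack(): cell text back to the original bytes."""
    text = cell.replace(CRLF_MARK, "\r\n").replace(LF_MARK, "\n")
    return text.encode("utf-8")


def read_raw(path):
    """Read a file in binary mode, so no newline translation happens."""
    with open(path, "rb") as f:
        return f.read()


def write_raw(path, raw):
    """Write bytes back unchanged."""
    with open(path, "wb") as f:
        f.write(raw)
```

The idea is that `write_raw(path, unpack(pack(read_raw(path))))` reproduces the file byte for byte, so any remaining difference would have to come from the Excel step or from the translation itself.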
My goal is to get to a .txt file that is within the second layer of zip files. The issue is that the .txt file has the same name in every .zip, so each extraction overwrites the previous .txt and I end up with only one .txt.
    from ftplib import *
    import os, shutil, glob, zipfile, xlsxwriter

    ftps = FTP_TLS()
    ftps.connect(host='8.8.8.8', port=23)
    ftps.login(user='xxxxxxx', passwd='xxxxxxx')
    print ftps.getwelcome()
    print 'Access was granted'
    ftps.prot_p()
    ftps.cwd('DirectoryINeed')
    data = ftps.nlst()            # returns a list of .zip files
    data.sort()                   # sorts the list
    theFile = data[-2]            # the .zip file I need to retrieve
    fileSize = ftps.size(theFile) # gets the size of the file
    print fileSize, 'bytes'       # prints the size

    def grabFile():
        filename = 'the.zip'
        localfile = open(filename, 'wb')
        ftps.retrbinary('RETR ' + theFile, localfile.write)
        ftps.quit()
        localfile.close()

    def unzipping():
        zip_files = glob.glob('*.zip')
        for zip_file in zip_files:
            with zipfile.ZipFile(zip_file, 'r') as Z:
                Z.extractall('anotherdirectory')

    grabFile()
    unzipping()
    lastUnzip()
After this runs, it grabs the .zip that I need and extracts the contents to a folder named anotherdirectory, which holds the second tier of .zips. This is where I get into trouble: when I try to extract the files from each zip, they all share the same name, so I end up with a single .txt when I need one for each zip.
I think you're specifying the same output directory and filename each time. In the unzipping function, change

    Z.extractall('anotherdirectory')

to

    Z.extractall(zip_file)

or

    Z.extractall('anotherdirectory' + zip_file)

If the zip_file names are all the same, give each output folder a unique numbered name. Before the unzipping function:

    count = 1

then replace the other code with this:

    Z.extractall('anotherdirectory/' + str(count))
    count += 1
Thanks to jeremydeanlakey's response, I was able to get this part of my script working. Here is how I did it:

    folderUnzip = 'DirectoryYouNeed'
    zip_files = glob.glob('*.zip')
    count = 1
    for zip_file in zip_files:
        with zipfile.ZipFile(zip_file, 'r') as Z:
            Z.extractall(folderUnzip + '/' + str(count))
            count += 1
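A variation on the numbered folders, sketched under the assumption that the archive names themselves are distinct: name each output folder after its archive, so it stays clear which .txt came from which .zip. `target_dir` and `extract_each` are hypothetical helper names.

```python
import os
import zipfile


def target_dir(base_dir, zip_path):
    """Output folder named after the archive, e.g. out/a for a.zip."""
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    return os.path.join(base_dir, stem)


def extract_each(zip_paths, base_dir):
    """Extract every archive into its own subfolder of base_dir."""
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            # extractall creates the destination folder if it is missing
            archive.extractall(target_dir(base_dir, zip_path))
```

Usage would be along the lines of `extract_each(glob.glob('anotherdirectory/*.zip'), 'DirectoryYouNeed')`. If two archives in different folders share a name, their outputs would land in the same folder, and the counter approach is the safer choice.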
I need to find every instance of "translate" in a text file and replace a value 4 lines after finding the text:

    (many lines)
    }
    }
    translateX xtran
    {
    keys
    {
    k 0 0.5678
    }
    }
    (many lines)

The value 0.5678 needs to be 0. It will always be 4 lines below the "translate" string.
The file has up to about 10,000 lines.
example text file name: 01F.pz2.
I'd also like to cycle through the folder and repeat the process for every file with the pz2 extension (up to 40).
Any help would be appreciated!
Thanks.
I'm not quite sure about the logic for replacing 0.5678 in your file, so I use a function for that; change it to whatever you need, or explain in more detail what you want. The last number in the line? Only floating-point numbers?
Try:
    import os

    dirname = "14432826"
    lines_distance= 4

    def replace_whatever(line):
        # Put your logic for replacing here
        return line.replace("0.5678", "0")

    for filename in filter(lambda x:x.endswith(".pz2") and not x.startswith("m_"), os.listdir(dirname)):
        print filename
        with open(os.path.join(dirname, filename), "r") as f_in, open(os.path.join(dirname,"m_%s" % filename), "w") as f_out:
            replace_tasks = []
            for line in f_in:
                # search marker in line
                if line.strip().startswith("translate"):
                    print "Found marker in", line,
                    replace_tasks.append(lines_distance)
                # replace if necessary
                if len(replace_tasks)>0 and replace_tasks[0] == 0:
                    del replace_tasks[0]
                    print "line to change is", line,
                    line_to_write = replace_whatever(line)
                else:
                    line_to_write = line
                # Write to output
                f_out.write(line_to_write)
                # decrease counters
                for i, task in enumerate(replace_tasks):
                    replace_tasks[i] -= 1
The comments within the code should help with understanding it. The main concept is the list replace_tasks, which keeps track of when the next line to modify will come.
Remarks: Your code sample suggests that the data in your file are structured. It will definitely be safer to read this structure and work on it instead of using a search-and-replace approach on a plain text file.
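The same idea can be written as a pure function over a list of lines, which makes it easy to try on a small sample before touching real files. This is a Python 3 sketch, not the answer's code. It assumes the value to reset is the last space-separated token on the target line, and the names `zero_after_marker` and `replace_last_value` are hypothetical.

```python
def replace_last_value(line):
    """Replace the last space-separated token with '0', keeping the newline."""
    newline = "\n" if line.endswith("\n") else ""
    head, sep, _old = line.rstrip("\n").rpartition(" ")
    return head + sep + "0" + newline


def zero_after_marker(lines, marker="translate", distance=4):
    """Return a copy of lines with the line `distance` below each marker reset."""
    pending = set()   # indices of the lines still to be modified
    out = []
    for idx, line in enumerate(lines):
        if line.strip().startswith(marker):
            pending.add(idx + distance)
        if idx in pending:
            pending.discard(idx)
            line = replace_last_value(line)
        out.append(line)
    return out


sample = ["translateX xtran\n", "{\n", "keys\n", "{\n", "k 0 0.5678\n", "}\n"]
print(zero_after_marker(sample)[4], end="")   # k 0 0
```

Applying it to a file then amounts to reading the lines, calling `zero_after_marker`, and writing the result to a new file.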
Thorsten, I renamed my original files to have the .old extension and the following code works:

    import os

    target_dir = "."
    # cycle through files
    for path, dirs, files in os.walk(target_dir):
        for file in files:
            # get the filename and extension
            filename, ext = os.path.splitext(file)
            # see if the file is a renamed original (.old)
            if ext.endswith('.old'):
                # read from the .old file, write a new .pz2 file
                oldfilename = filename + ".old"
                newfilename = filename + ".pz2"
                old_filepath = os.path.join(path, oldfilename)
                new_filepath = os.path.join(path, newfilename)
                # open the old file for reading
                oldpz2 = open(old_filepath, "r")
                # open the new file for writing
                newpz2 = open(new_filepath, "w")
                # reset changeline
                changeline = 0
                currentline = 0
                # cycle through old lines
                for line in oldpz2:
                    currentline = currentline + 1
                    if line.strip().startswith("translate"):
                        changeline = currentline + 4
                    if currentline == changeline:
                        newpz2.write(" k 0 0\n")
                    else:
                        # line already ends with a newline, so write it unchanged
                        newpz2.write(line)
                oldpz2.close()
                newpz2.close()