I am currently an oper for multiple IRC servers, and I am trying to find a reliable way to log our channels due to a high amount of abuse. For the time being I have been using pierc, but I need all the functionality of ZNC.
My question is: using Python, what would be a simple way to loop through the ZNC log directory and parse the logs into a MySQL database? The directory looks like the following:
username_ircnetwork_channel_20160209.log username2_ircnetwork2_channel_20160209.log
I know I can iterate through each file with something to this effect:
import os

fileOpen = open(os.path.expanduser("~/.znc/moddata/log/username_ircnetwork_channel_20160209.log"))
for line in fileOpen.read().splitlines():
    pass  # do something with line
However, I am at a loss for a clean way to cycle through the log directory and check each file. Is there a decent way in Python to accomplish this?
You could use Python's os module with listdir and loop through the files:
import os

path = '/path/to/logs/'
listing = os.listdir(path)
for infile in listing:
    with open(os.path.join(path, infile), 'rb') as f:
        content = f.read()
        # parse however you need
See the os module documentation: https://docs.python.org/2/library/os.html
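If you also want each database row to carry the metadata encoded in the file name, you can split it out while looping. A minimal sketch, assuming the username_ircnetwork_channel_YYYYMMDD.log pattern shown above; the cursor.execute call is illustrative, not tied to a specific MySQL driver:

import os

log_dir = os.path.expanduser('~/.znc/moddata/log/')
for name in sorted(os.listdir(log_dir)):
    if not name.endswith('.log'):
        continue
    # username_ircnetwork_channel_20160209.log -> metadata fields
    # (assumes exactly three underscores; adjust if names may contain '_')
    user, network, channel, date = name[:-4].split('_')
    with open(os.path.join(log_dir, name)) as f:
        for line in f:
            # e.g. cursor.execute("INSERT INTO logs ...", (user, network, channel, date, line))
            pass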
I am trying to access a file from a Box folder as I am working on two different computers. So the file path is pretty much the same except for the username.
I am trying to load a numpy array from a .npy file. I could easily change the path each time, but it would be nice if I could make it universal.
Here is what the line of code looks like on my one computer:
y_pred_walking = np.load('C:/Users/Eric/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
And here is what the line of code looks like on the other computer:
y_pred_walking = np.load('C:/Users/erapp/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
The only difference is that the username on one computer is Eric and on the other is erapp, but is there a way to make the line universal across computers, given that every computer will have the Box folder?
You could either save the file to a path that doesn't depend on the user: e.g. 'C:/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy'
Or you could do some string formatting. One way would be with an environment or configuration variable that indicates which is the relevant user, and then for your load statement:
import os
import numpy as np

current_user = os.environ.get("USERNAME")  # assumes you're running on the Windows box as the relevant user
# Now load from the formatted string. f-strings are nicer, but str.format is
# shown here since f-strings are still relatively new to Python.
y_pred_walking = np.load('C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy'.format(user=current_user))
Yes, there is a way. At least for the problem as it stands, the solution is pretty simple: use f-strings.
import numpy as np

user = 'Eric'
y_pred_walking = np.load(f'C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
or, more generally:
def pred_walking(user):
    return np.load(f'C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
so on any machine you just do
y_pred_walking = pred_walking(user)
with user defined beforehand, to receive the result.
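If you would rather not define user by hand on each machine, you can look it up at runtime (a sketch; getpass.getuser() reads the login name from the environment):

import getpass

# resolves to 'Eric' on one machine and 'erapp' on the other
y_pred_walking = pred_walking(getpass.getuser())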
Simply search the folders recursively for your file:
import os
import random

filename = 'y_pred_test.npy'

# creates 1000 directories, each with a 1% chance of containing the file
for k in range(20):
    for i in range(10):
        for j in range(5):
            os.makedirs(f"./{k}/{i}/{j}")
            if random.randint(1, 100) == 2:
                with open(f"./{k}/{i}/{j}/{filename}", "w") as f:
                    f.write(" ")
# search the directories for your file
found_in = []

# this starts searching in your current folder - you can give it your C:\Users\ instead
for root, dirs, files in os.walk("./"):
    if filename in files:
        found_in.append(os.path.join(root, filename))

print(*found_in, sep="\n")
File found in:
./17/3/1/y_pred_test.npy
./3/8/1/y_pred_test.npy
./16/3/4/y_pred_test.npy
./16/5/3/y_pred_test.npy
./14/2/3/y_pred_test.npy
./0/5/4/y_pred_test.npy
./11/9/0/y_pred_test.npy
./9/8/1/y_pred_test.npy
If you get read errors because of missing file/directory permissions, you can start directly in the user's folder:
# Source: https://stackoverflow.com/a/4028943/7505395
from pathlib import Path

home = str(Path.home())

found_in = []
for root, dirs, files in os.walk(home):
    if filename in files:
        found_in.append(os.path.join(root, filename))

# use found_in[0], or break as soon as you find the first file
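If only the first hit matters, you can also return as soon as os.walk finds it instead of collecting every match (a sketch):

import os
from pathlib import Path

def find_first(filename, start=str(Path.home())):
    # walk top-down and stop at the first directory that contains the file
    for root, dirs, files in os.walk(start):
        if filename in files:
            return os.path.join(root, filename)
    return None  # not found anywhere under start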
You can use the expanduser function in the os.path module to modify a path so that it starts from the home directory of a user:
https://docs.python.org/3/library/os.path.html#os.path.expanduser
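Applied to the question's path, a minimal sketch (assuming the Box folder always sits directly under each user's home directory):

import os
import numpy as np

# '~' expands to C:/Users/Eric or C:/Users/erapp depending on who is logged in
path = os.path.expanduser(
    '~/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/'
    'bidir_lstm_50_50/predictions/y_pred_test.npy'
)
y_pred_walking = np.load(path)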
I just created a simple python script that goes through a folder with polyline shapefiles and merges them. Through the Windows 8 Task Scheduler I have scheduled the script to run when I want.
All I would like to do now is modify my script so I can slightly change the name of each shapefile output. For example, the output name for Week 1 would be MergedTracks_1, for Week 2 MergedTracks_2, for Week 3 MergedTracks_3, etc.
Does anybody have an idea on how to do this with the current script I have? I am running ArcGIS 10.2. I would appreciate any insight if possible. Below is the script I am currently using in PythonWin. Thanks so much in advance!!!
import arcpy, os
outPut = r"C:\Users\student2\Desktop\WeedTracksMergeScript\Output" # Output
arcpy.env.workspace = r"C:\Users\student2\Desktop\WeedTracksMergeScript"
shplist = arcpy.ListFeatureClasses('*.shp')
print shplist # prints polyline .shp list in current arcpy.env.workspace
arcpy.Merge_management(shplist, os.path.join(outPut, "MergedTracks_1.shp"))
print "Done"
You can use pickle to keep track of the last filename that was written, and then use that as part of your logic to determine what the next filename format should be.
Check out the tutorial here:
https://wiki.python.org/moin/UsingPickle
The main idea, in pseudocode:
if pickle file exists:
    load pickle file
    use logic to increment new filename
else:
    use default file name
do the work and write with the set file name
write the new file name to the pickle file
Here is a quick example of how to use pickle:
import pickle
from os.path import exists

pickle_file = "current_file_name.pickle"

if exists(pickle_file):
    with open(pickle_file, "rb") as pkr:
        last_filename = pickle.load(pkr)
    # increment the trailing number, e.g. current_file_1 -> current_file_2
    base, num = last_filename.rsplit("_", 1)
    current_filename = "{}_{}".format(base, int(num) + 1)
else:
    current_filename = "current_file_1"

with open(pickle_file, "wb") as pkw:
    pickle.dump(current_filename, pkw)
I think datetime stamps (strings) would be a lot less complicated and arguably more useful.
Many formatting options are documented at http://strftime.org
import datetime
import os

dd = datetime.datetime.now().strftime("%Y%m%d")
shpfile = os.path.join(outPut, "MergedTracks_{}.shp".format(dd))
arcpy.Merge_management(shplist, shpfile)
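If you specifically want week numbers like MergedTracks_1, MergedTracks_2 rather than full dates, strftime can also give the week of the year (a sketch reusing outPut from the question; %W counts weeks with Monday as the first day):

import datetime
import os

week = datetime.datetime.now().strftime("%W")  # e.g. "06" in early February
shpfile = os.path.join(outPut, "MergedTracks_{}.shp".format(week))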
I'm putting together my first web2py app, and I've run into a bit of a problem. I have some data stored in static/mydata.json that I'd like to access in a couple places (specifically, in one of my models, as well as a module).
If this were a normal python script, obviously I'd do something like:
import json
with open('/path/to/mydata.json') as f:
mydata = json.load(f)
In the context of web2py, I can get the URL of the file from URL('static', 'mydata.json'), but I'm not sure how to load mydata - can I just do mydata = json.load(URL('static','mydata.json'))? Or is there another step required to open the file?
It's advisable to use os.path.join with request.folder to build paths to files.
import os

filepath = os.path.join(request.folder, 'static', 'mydata.json')
From that point on, you should be able to use that filepath to open the JSON file as usual:
import json
import os

filepath = os.path.join(request.folder, 'static', 'mydata.json')
with open(filepath) as f:
    mydata = json.load(f)
I'm reading a data file (text) and generating a number of reports, each one written to a different output file (also text). I'm opening them the long way:
fP = open('file1','w')
invP = open('inventory','w')
orderP = open('orders','w')
... and so on, with a corresponding group of close() lines at the end.
If I could open them with a for loop, using a list of fP names and file names, I could guarantee closing the same files.
I tried using a dictionary of fp:filename, but that [obviously] didn't work, because either the fP variable is undefined, or a string 'fP' isn't a good file object name.
Since these are output files, I probably don't need to check for open errors - if I can't open one or more, I can't go on anyway.
Is there any way to open a group of files (not more than 10 or so) from a list of names, in a loop?
Good news! Python 3.3 brings a standard, safe way to do this:
contextlib.ExitStack
From the docs:
Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed.
(...)
Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested with statements had been used with the registered set of callbacks.
Here's an example of how to use it:
from contextlib import ExitStack

with ExitStack() as stack:
    files = [
        stack.enter_context(open(filename))
        for filename in filenames
    ]
    # ... use files ...
When the code leaves the with statement, all files that have been opened will be closed.
This way you also know that if two files open successfully and the third fails to open, the two already-opened files will be closed correctly. And if an exception is raised anywhere inside the with block, you still get correct cleanup.
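Applied to the question's output files, a short sketch (file names taken from the question; the write calls are placeholders):

from contextlib import ExitStack

names = ['file1', 'inventory', 'orders']
with ExitStack() as stack:
    fP, invP, orderP = (stack.enter_context(open(n, 'w')) for n in names)
    fP.write('report 1\n')  # placeholder report output
    invP.write('inventory report\n')
    orderP.write('orders report\n')
# all three files are closed here, even if an exception was raised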
Yes, you can use a dict comprehension:
filenames = ['file1.txt', 'file2.txt', 'file3.txt']  # and so on
filedata = {filename: open(filename, 'w') for filename in filenames}
Now all of the open file objects are stored in filedata, keyed by file name.
To close them:
for file in filedata.values():
    file.close()
Since you say there are many data files, instead of entering the filenames into a list manually, you can build the list like this:
from os import listdir
from os.path import isfile, join

files_in_dir = [f for f in listdir('/home/cam/Desktop')
                if isfile(join('/home/cam/Desktop', f))]
Now you can:
for file in files_in_dir:
    with open(join('/home/cam/Desktop', file), 'w') as f:
        pass  # do something with f
Use the with keyword to guarantee that opened files (and other similar resources, known as "context managers") are closed:
with open(file_path, 'w') as output_file:
    output_file.write('whatever')
Upon exiting the with block, the file will be properly closed -- even if an exception occurs.
You could easily loop over a list of paths to the desired files:
files = ['fp1.txt', 'inventory', 'orders']
for file in files:
    with open(file, 'w') as current_file:
        current_file.write('some stuff')  # whatever processing you need
You can open as many files as you want and keep them in a list to close them later:
fds = [open(path, 'w') for path in paths]

# ... do stuff with files ...

# close files
for fd in fds:
    fd.close()
Or you could use a dictionary for better readability:
# map each file name to a file object
files = {path: open(path, 'w') for path in paths}

# use the file objects through the mapping
files['inventory'].write("Something")

# close files
for path in files:
    files[path].close()
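One caveat for both versions: unlike a with block, nothing closes the files if an exception is raised partway through. A try/finally guard restores that guarantee (a sketch):

files = {path: open(path, 'w') for path in paths}
try:
    files['inventory'].write("Something")
finally:
    # runs whether or not an exception occurred above
    for f in files.values():
        f.close()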
Both answers above are good if you know or define the list of files ahead of time. But in case you want a more generic solution, you can build the list just in time: use your OS to create empty files on disk (this is done in different ways depending on your OS), then collect the file names interactively this way:
import os

working_folder = input("Enter full path for the working folder/directory: ")
os.chdir(working_folder)
filenames_list = os.listdir()

# you can filter too, if you need to:
# filenames_list = [filename for filename in os.listdir() if '.txt' in filename]

# then, as Reut Sharabani and A.J. suggest above, make a list of file objects
file_descriptors = [open(filename, 'w') for filename in filenames_list]

# or a dictionary, as Reut Sharabani suggests (I liked that one, Reut :)

# do whatever you need to do with all those files already opened for writing

# then close the files
for fd in file_descriptors:
    fd.close()
It is OK to use with, as some suggest, if you work with only one file at a time (from start to finish), but if you want all the files open at the same time, a list or dictionary of file objects is better.
I need to scan a file system for a list of files, and log those who don't exist. Currently I have an input file with a list of the 13 million files which need to be investigated. This script needs to be run from a remote location, as I do not have access/cannot run scripts directly on the storage server.
My current approach works, but is relatively slow. I'm still fairly new to Python, so I'm looking for tips on speeding things up.
import sys, os
from pz import padZero  # prepends 0's to a string until desired length

output = open('./out.txt', 'w')
input = open('./in.txt', 'r')

rootPath = '\\\\server\\share'  # UNC path to storage

for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8)  # extracts/formats the file name
    dir = padZero(str(ifid)[:-3], 5)   # extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath)  # don't actually need size, better approach?
    except:
        output.write(ifid + '\n')
Thanks.
import collections, glob, os

root = r'\\server\share'

dirs = collections.defaultdict(set)
for file_path in input:
    name = file_path.strip().rjust(8, "0")  # zero-padded ID, no extension
    dirs[name[:-3]].add(name)

# list each server directory once, then set-subtract to find what's missing
for dir, names in dirs.iteritems():
    found = set(os.path.basename(p)[:-4] for p in glob.glob(os.path.join(root, dir, "*.tif")))
    for missing_file in names - found:
        print missing_file
Explanation
First read the input file into a dictionary of directory: filename. Then for each directory, list all the TIFF files in that directory on the server, and (set) subtract this from the collection of filenames you should have. Print anything that's left.
EDIT: Fixed silly things. Too late at night when I wrote this!
That padZero and string concatenation stuff looks to me like it would take a good percentage of the time.
What you want it to do is spend all its time reading the directory, very little else.
Do you have to do it in python? I've done similar stuff in C and C++. Java should be pretty good too.
You're going to be I/O bound, especially on a network, so any changes you can make to your script will result in very minimal speedups, but off the top of my head:
import os

input, output = open("in.txt"), open("out.txt", "w")
root = r'\\server\share'

for fid in input:
    fid = fid.strip().rjust(8, "0")
    dir = fid[:-3]  # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")
I don't really expect that to be any faster, but it is arguably easier to read.
Other approaches may be faster. For example, if you expect to touch most of the files, you could just pull a complete recursive directory listing from the server, convert it to a Python set(), and check for membership in that rather than hitting the server for many small requests. I will leave the code as an exercise...
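For reference, a minimal sketch of that idea (assuming os.walk can traverse the UNC share, and reusing in.txt/out.txt and the 8-digit zero-padded IDs from the question):

import os

root = r'\\server\share'

# one pass over the share: collect every .tif base name found anywhere
present = set()
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.lower().endswith('.tif'):
            present.add(name[:-4])

with open('in.txt') as infile, open('out.txt', 'w') as outfile:
    for line in infile:
        fid = line.strip().rjust(8, '0')
        if fid not in present:
            outfile.write(fid + '\n')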
I would probably use a shell command to get the full listing of files in all directories and subdirectories in one hit. Hopefully this will minimise the amount of requests you need to make to the server.
You can get a listing of the remote server's files by doing something like:
Linux: mount the shared drive as /shared/directory/ and then do ls -R /shared/directory > ~/remote_file_list.txt
Windows: Use Map Network Drive to mount the shared drive as drive letter X:, then do dir /S X:\shared_directory > C:\remote_file_list.txt
Use the same methods to create a listing of your local folder's contents as local_file_list.txt. Your Python script will then reduce to an exercise in text processing.
Note: I did actually have to do this at work.
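To make the text processing concrete, a minimal sketch of the final comparison (assuming the remote listing has been reduced to one bare file name per line, e.g. with a little grep/awk cleanup, and reusing in.txt/out.txt from the question):

with open('remote_file_list.txt') as f:
    remote = set(line.strip() for line in f)

with open('in.txt') as infile, open('out.txt', 'w') as outfile:
    for line in infile:
        fid = line.strip().rjust(8, '0')
        if fid + '.tif' not in remote:
            outfile.write(fid + '\n')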