I have an application running across multiple containers. Most of them are Python and JS services, but a few are PostgreSQL databases.
I wrote a script that collects the logs from every container and stores them in a zip file.
The first version of the script reads the logs of every container into a string variable and then adds it as a file to the zip archive. This version works great.
import subprocess
import zipfile
# Get a list of all running docker containers
containers = subprocess.run(['docker', 'ps', '--format', '{{.Names}}'], capture_output=True, text=True).stdout.strip().split()
# Create a zip file
with zipfile.ZipFile("logs.zip", mode='w') as archive:
    # Iterate over the list of containers
    for container in containers:
        # Retrieve the log files of the current container
        log = subprocess.run(['docker', 'logs', container], capture_output=True, text=True).stdout
        # Add the log files to the archive with container name
        archive.writestr(container + '.log', log)
I created a new version that reads the logs in chunks, so as not to overfill memory.
import subprocess
import zipfile
# Get a list of all running docker containers
containers = subprocess.run(['docker', 'ps', '--format', '{{.Names}}'], capture_output=True, text=True).stdout.strip().split()
# Create a zip file
with zipfile.ZipFile("logs.zip", mode='w') as archive:
    # Iterate over the list of containers
    for container in containers:
        # Retrieve the log files of the current container
        log_process = subprocess.Popen(['docker', 'logs', container], stdout=subprocess.PIPE, text=True)
        # Read the logs in chunks and write to the archive
        with archive.open(container + '.log', mode='w') as log_file:
            while True:
                chunk = log_process.stdout.read(4096)
                if not chunk:
                    break
                log_file.write(chunk.encode())
But in this version, when it iterates over PostgreSQL services, it prints logs to the terminal.
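If it helps to narrow this down: docker logs replays a container's stderr stream as well as its stdout, and PostgreSQL images typically log to stderr, so a Popen that only pipes stdout lets that stream fall through to the terminal. A minimal sketch of the chunked version with stderr merged into the captured stream (assuming both streams should end up in the same .log file per container):
import subprocess
import zipfile

containers = subprocess.run(['docker', 'ps', '--format', '{{.Names}}'],
                            capture_output=True, text=True).stdout.strip().split()

with zipfile.ZipFile("logs.zip", mode='w') as archive:
    for container in containers:
        # Merge stderr into stdout so nothing leaks to the terminal
        log_process = subprocess.Popen(['docker', 'logs', container],
                                       stdout=subprocess.PIPE,
                                       stderr=subprocess.STDOUT)
        with archive.open(container + '.log', mode='w') as log_file:
            while True:
                # Read raw bytes; no decode/encode round-trip needed
                chunk = log_process.stdout.read(4096)
                if not chunk:
                    break
                log_file.write(chunk)
        log_process.wait()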
Related
I have a PySpark script where data is processed and then converted to CSV files. As the end result should be ONE CSV file accessible via WinSCP, I do some additional processing to combine the CSV files from the worker nodes and transfer the result out of HDFS to the FTP server (I think it's called the edge node).
from py4j.java_gateway import java_import
import os
YYMM = date[2:7].replace('-','')
# First, clean out both HDFS and local folder so CSVs do not stack up (data history is stored in DB anyway if update option is enabled)
os.system('hdfs dfs -rm -f -r /hdfs/path/new/*')
os.system('rm -f /ftp/path/new/*')
#timestamp = str(datetime.now()).replace(' ','_').replace(':','-')[0:19]
df.coalesce(1).write.csv('/hdfs/path/new/dataset_temp_' + date, header = "true", sep = "|")
# By default, output CSV has weird name ("part-0000-..."). To give proper name and delete automatically created upper folder, do some more processing
java_import(spark._jvm, 'org.apache.hadoop.fs.Path')
sc = spark.sparkContext
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
file = fs.globStatus(sc._jvm.Path('/hdfs/path/new/dataset_temp_' + date + '/part*'))[0].getPath().getName()
fs.rename(sc._jvm.Path('/hdfs/path/new/dataset_temp_' + date + "/" + file), sc._jvm.Path('/hdfs/path/new/dataset_' + YYMM + '.csv'))
fs.delete(sc._jvm.Path('/hdfs/path/new/dataset_temp_' + date), True)
# Shift CSV file out of HDFS into "regular" SFTP server environment
os.system('hdfs dfs -copyToLocal hdfs://<server>/hdfs/path/new/dataset_' + YYMM + '.csv' + ' /ftp/path/new')
In client mode everything works fine. But when I switch to cluster mode, it gives an error message that the final /ftp/path/new in the copyToLocal command is not found, I suppose because it is looking on the worker nodes and not on the edge node. Is there any way to overcome this? As an alternative, I thought of doing the final copyToLocal command from a batch script outside of the Spark session, but I'd rather have it all in one script...
Instead of running the OS commands in your Spark script, you can write the output directly to the FTP location. You need to provide the path to the FTP location with the save mode set to overwrite. You can then run the code to rename the data after your Spark script has completed.
YYMM = date[2:7].replace('-','')
df.coalesce(1).write \
    .mode("overwrite") \
    .csv('/ftp/path/new/{0}'.format(date), header="true", sep="|")
# Run the command below in a separate step once the above code has executed.
os.system("mv /ftp/path/new/{0}/*.csv /ftp/path/new/{0}/dataset_{1}.csv".format(date, YYMM))
I have made the assumption that the FTP location is accessible from the worker nodes, since you are able to run the copyToLocal command in client mode. If the location is not accessible, you will have to write the file out to the HDFS location as before and run the moving and renaming of the file in a separate process/script outside of the Spark job.
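If the rename should stay in Python rather than a shell mv, a minimal sketch of that separate post-job step (reusing the date and YYMM values from above, and assuming /ftp/path/new/<date> is reachable from wherever this step runs):
import glob
import shutil

# Locate the single part-*.csv file Spark wrote into the dated folder
part_file = glob.glob('/ftp/path/new/{0}/*.csv'.format(date))[0]
# Move it to the final name in the same folder
shutil.move(part_file, '/ftp/path/new/{0}/dataset_{1}.csv'.format(date, YYMM))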
I am trying to run a Python script from a PowerShell script inside SQL Server Agent.
I was able to execute most of the Python script (Task1 and Task2) except the last portion (Task3), where it runs a third-party exe called SQBConverter (from RedGate) that converts files from SQB format to BAK format.
When I manually run the PowerShell script directly, which runs the Python script, there is no issue.
I modified the "Log On As" account from the default ("Local System") to my own (JDoh), and it executes the PowerShell within SQL Server Agent, but it does everything except the step where it converts files from SQB to BAK format (Task3).
Without changing it to my own (JDoh), it would not execute any part of the Python script at all.
I don't think there is any issue on the PowerShell side, because it still triggers the Python script when "Log On As" is set to "Local System". It does not show an error, and the SQL Server Agent job shows as completed, but it does not run any of the tasks within the Python script at all.
So I am guessing it might be something to do with SQL Server Agent not being able to trigger/run the SQBConverter exe file.
Here is the whole Python code (ConvertToBAK.py) to give you the whole idea of the logic. It does everything up to the point where it converts from SQB to BAK (Task3: the last two lines).
import os
from os import path
import datetime
from datetime import timedelta
import glob
import shutil
import re
import time, sys
today = datetime.date.today()
yesterday = today - timedelta(days = 1)
yesterday = str(yesterday)
nonhyphen_yesterday = yesterday.replace('-','')
revised_yesterday = "LOG_us_xxxx_multi_replica_" + nonhyphen_yesterday
src = "Z:\\TestPCC\\FTP"
dst = "Z:\\TestPCC\\Yesterday"
password = "Password"
path = "Z:\\TestPCC\\FTP"
now = time.time()
### Task1: To delete old files (5 days or older)
for f in os.listdir(path):
    f = os.path.join(path, f)
    if os.stat(f).st_mtime < now - 5 * 86400:
        if os.path.isfile(f):
            os.remove(os.path.join(path, f))
filelist = glob.glob(os.path.join(dst, "*"))
for f in filelist:
    os.remove(f)
### Task2: To move all files from one folder to other folder location
src_files = os.listdir(src)
src_files1 = [g for g in os.listdir(src) if re.match(revised_yesterday, g)]
for file_name in src_files1:
    full_file_name = os.path.join(src, file_name)
    if os.path.isfile(full_file_name):
        shutil.copy(full_file_name, dst)
### Task3: Convert from SQB format to BAK format (running SQBConverter.exe)
for f in glob.glob(r'Z:\\TestPCC\\Yesterday\\*.SQB'):
    os.system( f'SQBConverter "{f}" "{f[:-4]}.bak" {password}' )
This is the PowerShell code (Test.ps1):
$path = 'Z:\TestPCC'
$file = 'ConvertToBAK.py'
$cmd = $path+"\\"+$file # Concatenate the path and the file name
Start-Process $cmd # Execute the resulting command
This is a screenshot of the SQL Server Agent step:
I looked at the properties of the SQBConverter exe file itself, and I granted FULL control to all users listed.
I got it working by modifying the last line of my Python code.
From:
os.system( f'SQBConverter "{f}" "{f[:-4]}.bak" {password}' )
To (absolute path):
os.system( f'Z:\\TestPCC\\SQBConverter.exe "{f}" "{f[:-4]}.bak" {password}' )
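As a side note, a minimal sketch of the same Task3 step using subprocess.run with an argument list (reusing the password variable and the absolute SQBConverter.exe path from above); this avoids shell quoting around the file paths and surfaces a non-zero exit code instead of failing silently:
import glob
import subprocess

for f in glob.glob(r'Z:\TestPCC\Yesterday\*.SQB'):
    result = subprocess.run(
        [r'Z:\TestPCC\SQBConverter.exe', f, f[:-4] + '.bak', password],
        capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the converter's output so the Agent job history shows why it failed
        print(result.stdout, result.stderr)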
I am using pysftp library's get_r function (https://pysftp.readthedocs.io/en/release_0.2.9/pysftp.html#pysftp.Connection.get_r) to get a local copy of a directory structure from sftp server.
Is that the correct approach for a situation when the contents of the remote directory have changed and I would like to get only the files that changed since the last time the script was run?
The script should be able to sync the remote directory recursively and mirror the state of the remote directory, e.g. with a parameter controlling whether outdated local files (those no longer present on the remote server) should be removed; any changes to existing files and any new files should be fetched.
My current approach is here.
Example usage:
from sftp_sync import sync_dir
sync_dir('/remote/path/', '/local/path/')
Use pysftp.Connection.listdir_attr to get the file listing with attributes (including the file timestamp).
Then, iterate the list and compare against local files.
import os
import pysftp
import stat
remote_path = "/remote/path"
local_path = "/local/path"
with pysftp.Connection('example.com', username='user', password='pass') as sftp:
    sftp.cwd(remote_path)
    for f in sftp.listdir_attr():
        if not stat.S_ISDIR(f.st_mode):
            print("Checking %s..." % f.filename)
            local_file_path = os.path.join(local_path, f.filename)
            if ((not os.path.isfile(local_file_path)) or
                    (f.st_mtime > os.path.getmtime(local_file_path))):
                print("Downloading %s..." % f.filename)
                sftp.get(f.filename, local_file_path)
Though these days, you should not use pysftp, as it is dead. Use Paramiko directly instead. See pysftp vs. Paramiko. The above code will work with Paramiko too with its SFTPClient.listdir_attr.
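A minimal sketch of the same loop with Paramiko alone (assuming the same host, credentials, and paths as above; host-key handling is left to AutoAddPolicy for brevity and should be tightened in real use):
import os
import stat

import paramiko

remote_path = "/remote/path"
local_path = "/local/path"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('example.com', username='user', password='pass')
sftp = ssh.open_sftp()
sftp.chdir(remote_path)

for f in sftp.listdir_attr():
    if not stat.S_ISDIR(f.st_mode):
        local_file_path = os.path.join(local_path, f.filename)
        # Download only files that are missing locally or newer on the server
        if (not os.path.isfile(local_file_path)
                or f.st_mtime > os.path.getmtime(local_file_path)):
            sftp.get(f.filename, local_file_path)

sftp.close()
ssh.close()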
I am working on a code to copy images from a folder in a local directory to a remote directory. I am trying to use scp.
So in my directory, there is a folder that contains subfolders with images in it. There are also images that are in the main folder that are not in subfolders. I am trying to iterate through the subfolders and individual images and sort them by company, then make corresponding company folders for those images to be organized and copied onto the remote directory.
I am having problems creating the new company folder in the remote directory.
This is what I have:
def imageSync():
    path = os.path.normpath("Z:\Complete")
    folders = os.listdir(path)
    subfolder = []
    #separates subfolders from just images in complete folder
    for folder in folders:
        if folder[len(folder)-3:] == "jpg":
            pass
        else:
            subfolder.append(folder)
    p = dict()
    for x in range(len(subfolder)):
        p[x] = os.path.join(path, subfolder[x])
    sub = []
    for location in p.items():
        sub.append(location[1])
    noFold = []
    for s in sub:
        path1 = os.path.normpath(s)
        images = os.listdir(path1)
        for image in images:
            name = image.split("-")
            comp = name[0]
            pathway = os.path.join(path1, image)
            path2 = "scp " + pathway + " blah#192.168.1.10: /var/files/ImageSync/" + comp
            pathhh = os.system(path2)
            if not os.path.exists(pathhh):
                noFold.append(image)
There's more to the code, but I figured the top part would help explain what I am trying to do.
I have created an SSH key in hopes of making os.system work, but the os.system(path2) call is returning 1 when I would like it to give me the path to the remote server. (I tried this: How to store the return value of os.system that it has printed to stdout in python?)
Also how do I properly check to see if the company folder in the remote directory already exists?
I have looked at Secure Copy File from remote server via scp and os module in Python and How to copy a file to a remote server in Python using SCP or SSH? but I guess I am doing something wrong.
I'm new to Python so thanks for any help!
Try this to copy dirs and nested subdirs from local to remote:
cmd = "sshpass -p {} scp -r {}/* root@{}:{}".format(
    remote_root_pass,
    local_path,
    remote_ip,
    remote_path)
os.system(cmd)
Don't forget to import os.
You may check the exit code returned (0 means success).
Also you might need to "yum install sshpass".
And in /etc/ssh/ssh_config change:
StrictHostKeyChecking ask
to:
StrictHostKeyChecking no
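To the follow-up question about checking whether the company folder already exists on the remote side: scp alone can't easily report that, but an SFTP session can. A minimal sketch with Paramiko (hypothetical host and credentials; comp, pathway, and image are the variables from the question's loop):
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.1.10', username='blah', password='secret')
sftp = ssh.open_sftp()

remote_dir = '/var/files/ImageSync/' + comp
try:
    sftp.stat(remote_dir)      # raises IOError if the folder does not exist
except IOError:
    sftp.mkdir(remote_dir)     # create the company folder first

# Copy one image into the company folder
sftp.put(pathway, remote_dir + '/' + image)

sftp.close()
ssh.close()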
Has anyone got this feedback.py calling an external zip.py script to work? My cgitb output does not show any error messages. It simply does not invoke the external .py script at all; it just skips over it. I should be grateful if you could assist me in making this zip.py callable from feedback.py.
Regards. David
#**********************************************************************
# Description:
# Zips the contents of a folder.
# Parameters:
# 0 - Input folder.
# 1 - Output zip file. It is assumed that the user added the .zip
# extension.
#**********************************************************************
# Import modules and create the geoprocessor
#
import sys, zipfile, arcgisscripting, os, traceback
gp = arcgisscripting.create()
# Function for zipping files. If keep is true, the folder, along with
# all its contents, will be written to the zip file. If false, only
# the contents of the input folder will be written to the zip file -
# the input folder name will not appear in the zip file.
#
def zipws(path, zip, keep):
    path = os.path.normpath(path)
    # os.walk visits every subdirectory, returning a 3-tuple
    # of directory name, subdirectories in it, and filenames
    # in it.
    #
    for (dirpath, dirnames, filenames) in os.walk(path):
        # Iterate over every filename
        #
        for file in filenames:
            # Ignore .lock files
            #
            if not file.endswith('.lock'):
                gp.AddMessage("Adding %s..." % os.path.join(path, dirpath, file))
                try:
                    if keep:
                        zip.write(os.path.join(dirpath, file),
                                  os.path.join(os.path.basename(path),
                                               os.path.join(dirpath, file)[len(path)+len(os.sep):]))
                    else:
                        zip.write(os.path.join(dirpath, file),
                                  os.path.join(dirpath[len(path):], file))
                except Exception, e:
                    gp.AddWarning(" Error adding %s: %s" % (file, e))
    return None
if __name__ == '__main__':
    try:
        # Get the tool parameter values
        #
        infolder = gp.GetParameterAsText(0)
        outfile = gp.GetParameterAsText(1)
        # Create the zip file for writing compressed data. In some rare
        # instances, the ZIP_DEFLATED constant may be unavailable and
        # the ZIP_STORED constant is used instead. When ZIP_STORED is
        # used, the zip file does not contain compressed data, resulting
        # in large zip files.
        #
        try:
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED)
            zipws(infolder, zip, True)
            zip.close()
        except RuntimeError:
            # Delete zip file if exists
            #
            if os.path.exists(outfile):
                os.unlink(outfile)
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_STORED)
            zipws(infolder, zip, True)
            zip.close()
            gp.AddWarning(" Unable to compress zip file contents.")
        gp.AddMessage("Zip file created successfully")
    except:
        # Return any python specific errors as well as any errors from the geoprocessor
        #
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + \
                "\nError Info:\n " + str(sys.exc_type) + \
                ": " + str(sys.exc_value) + "\n"
        gp.AddError(pymsg)
        msgs = "GP ERRORS:\n" + gp.GetMessages(2) + "\n"
        gp.AddError(msgs)
zip() is a built-in function in Python, so it is bad practice to use zip as a variable name; zip_ can be used instead.
The execfile() function reads and executes a Python script.
You probably just need import zip_ in feedback.py instead of execfile().
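For illustration, a minimal sketch of what that import could look like in feedback.py (assuming the script is saved as zip_.py next to feedback.py, so its zipws() function becomes importable; infolder and outfile are placeholders for your own values):
import zipfile
import zip_          # the renamed zip.py; importing it does not run its __main__ block

zf = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED)
zip_.zipws(infolder, zf, True)   # reuse the module's zipws(path, zip, keep) function
zf.close()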
Yay ArcGIS.
Just to clarify, how are you trying to call this script, using Popen? Can you post some code?
If you're invoking this script via another script in the ArcGIS environment, the thing is that when you use Popen the script won't be invoked within the ArcGIS environment; instead it will be invoked within Windows. So you will lose all real control over it.
Also, just another ArcGIS comment: you never initialize a license for the geoprocessor.
My suggestion: refactor your code into a module function that simply attempts to zip the files; if it fails, print the message out to ArcGIS.
If you want, post how you are calling it and how it is being run.