Asynchronous tasks in Plone to query the Python Package Index

I want to periodically (every hour?) query the Python Package Index API from Plone. Something equivalent to:
for i in $(yolk -L 24 | awk '{print $1}')   # get releases made in the last 24 hours
do
    # search for the Plone classifier
    results=$(yolk -M "$i" -f classifiers | grep -i plone)
    if [ -n "$results" ]; then
        echo "$i"
    fi
done
Results:
collective.sendaspdf
gocept.selenium
Products.EnhancedNewsItemImage
adi.workingcopyflag
Products.SimpleCalendarPortlet
Products.SimpleCalendar
Then I want to display this information in a template. I would love to, at least initially, avoid having to persist the results.
How do I display the results in a template without having to wait for the query to finish? I know there are some async packages available e.g.:
plone.app.async
But I'm not sure what the general approach should be. Assuming I can schedule an async task, I may need to store the results somewhere; if I have to store them, I'd prefer to do it "lightweight", e.g. annotations.

How about the low, low tech version?
Use a cron job to run the query, write the result to a temp file, then move the file into a known location with a timestamp in the filename.
Then, when someone requests the page in question (showing new packages), simply read the newest file in that location:
import os

filename = sorted(os.listdir(location))[-1]
with open(os.path.join(location, filename)) as f:
    data = f.read()
By using a move, you guarantee that the newest file in the designated location is always a complete file, avoiding a partial result being read.
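A minimal sketch of the cron side, assuming the query goes through PyPI's XML-RPC interface (which is what yolk used under the hood; PyPI has since deprecated it) and a hypothetical drop directory; the dot-prefixed temp file plus rename is what makes the publish atomic:
import os
import time
import tempfile
import xmlrpclib  # xmlrpc.client on Python 3

location = "/var/plone/pypi-drops"  # hypothetical known location
client = xmlrpclib.ServerProxy("https://pypi.python.org/pypi")

# Packages released in the last 24 hours that carry a Plone trove classifier.
names, seen = [], set()
for name, version, ts, action in client.changelog(int(time.time()) - 86400):
    if name in seen:
        continue
    seen.add(name)
    data = client.release_data(name, version)
    if any("Plone" in c for c in data.get("classifiers", [])):
        names.append(name)

# Write to a dot-prefixed temp file in the same directory, then rename it:
# a rename within one filesystem is atomic, and the dot prefix keeps the
# temp file from ever sorting as "newest" in the reader snippet above.
fd, tmp = tempfile.mkstemp(dir=location, prefix=".")
os.write(fd, "\n".join(names).encode("utf-8"))
os.close(fd)
os.rename(tmp, os.path.join(location, time.strftime("%Y%m%d%H%M%S")))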

Related

Get latest file from samba share using smbclient

My directory contains a file and its different versions, out of which I want to pick the latest version. The versions can be sorted either by date or by the revision number at the end of the file name, something like
Myfile2001.txt
Where 2001 is the revision number.
How can I get the latest file from a Samba share directory using smbclient? I thought of using a mask to list all the names, piping that output through some search logic to find the largest number (latest revision), and then using smbclient's get to fetch the file, but this doesn't seem optimal and is tedious. I wonder if there is another way to do it?
EDIT: I figured out an alternate way in Python (just for ease): capture the output of smbclient's ls in a text file or on STDOUT, then use Python to find the latest file's name. But I cannot figure out how to redirect the output of that command to a text file or STDOUT so I can process it.
Is there any way to do it? Since smbclient does not allow I/O redirection, I am stuck at the same point with the newer approach. I have gone through pysmb but cannot rely on it, as it is an experimental library; still, any solution with pysmb is also accepted to solve the purpose for the moment.
I've solved this using awk in a bash script. Goal: download the most recent CSV file.
${SmbCmd} "ls <mask>" 2>/dev/null\
| awk '{ if ($1 ~ "csv$") print $1 }' | sort | tail -1)
Here ${SmbCmd} holds everything needed to talk to the SMB server: the path to smbclient, the authentication method, the server name, the share directory, and so on, finishing with "--command" (the long form of -c).
Of course, my CSV file names contain the creation date: "name_yyyy-mm-dd.csv".
You can try something like this:
${SmbCmd} "ls <mask>-*" | awk '{ if ($1 ~ "csv$") print $8$5$6";"$1 }'
But note that the month in smbclient's listing isn't numeric, so this doesn't sort chronologically as-is.
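If you'd rather do the capture from Python (as in the question's EDIT), smbclient's -c option runs a single command and prints the listing on stdout, where subprocess (Python 2.7+) can grab it. A minimal sketch, with hypothetical share, credentials, and file mask:
import re
import subprocess

SHARE = "//server/share"   # hypothetical share
AUTH = "user%password"     # hypothetical credentials

# Capture the directory listing; -c runs one command and exits.
out = subprocess.check_output(
    ["smbclient", SHARE, "-U", AUTH, "-c", "ls Myfile*"]).decode("utf-8", "ignore")

# Pick the highest revision number out of names like Myfile2001.txt.
best = None
for m in re.finditer(r"(Myfile(\d+)\.txt)", out):
    rev = int(m.group(2))
    if best is None or rev > best[0]:
        best = (rev, m.group(1))

if best:
    subprocess.check_call(
        ["smbclient", SHARE, "-U", AUTH, "-c", "get %s" % best[1]])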

Serial Numbers from a Storage Controller over SSH

Background
I'm working on a bash script to pull serial numbers and part numbers from all the devices in a server rack. My goal is to be able to run a single script (inventory.sh) and walk away while it generates text files containing the information I need. I'm using bash for maximum compatibility; the RHEL 6.7 systems do have Perl and Python installed, but with minimal libraries. So far I haven't had to use anything other than bash, but I'm not against calling a Perl or Python script from my bash script.
My Problem
I need to retrieve the serial numbers and part numbers from the drives in a Dot Hill Systems AssuredSAN 3824, as well as the serial numbers of the other equipment inside. The only way I have found to get all the information I need is to connect over SSH and run the following three commands, dumping the output to a local file:
show controllers
show frus
show disks
Limitations:
I don't have "sshpass" installed, and would prefer not to install it.
The Controller is not capable of storing SSH keys (no option in its custom shell).
The Controller also cannot write or transfer local files.
The Rack does NOT have access to the Internet.
I looked at paramiko, but while Python is installed I do not have pip.
I also cannot use CPAN.
For what it's worth, the output comes back in XML format. (I've already written the code to parse it in bash.)
Right now I think my best option is to ship a library for Python or Perl in the folder with my other scripts, and write a script that dumps the commands' output to files my bash script can parse. Which language makes it easier to just drop a library in as a file? I'm looking for a library that is as small and simple to use as possible; I just need a way to get the output of those commands into XML files. Right now I am running ssh 3 times in my script and having to enter the password each time.
Have a look at SNMP. There is a reasonable chance that you can use SNMP tools to remotely extract the information you need. The manufacturer should be able to provide you with the MIBs.
I ended up contacting the manufacturer and asking my question. They said that the system isn't set up for connecting without a password, and their SNMP is very basic and won't provide the information I need. They said to connect to the system with FTP and use "get logs " to download an archive of the configuration and logs. Not exactly ideal, as it takes 4 minutes just to run that one command, but it seems to be my only option. Below is the script I wrote to retrieve the file automatically by adding the login credentials to the .netrc file. This works on RHEL 6.7:
#!/bin/bash
# Retrieve the logs and configuration from a Dot Hill Systems AssuredSAN 3824 automatically.
# Modify "LINE" and "HOST" to fit your configuration.
LINE='machine <IP> login manage password <password>'
HOST='<IP>'
AUTOLOGIN="/root/.netrc"
FILE='logfiles.zip'

# Check for and verify the autologin file
if [ -f "$AUTOLOGIN" ]; then
    printf "Found auto-login file, checking for proper entry... \r"
    READLINE=$(grep -F "$LINE" "$AUTOLOGIN")
    # Append the line to the end of .netrc if the file exists but the line doesn't.
    if [ "$LINE" != "$READLINE" ]; then
        printf "Proper entry not found, creating it... \r"
        echo "$LINE" >> "$AUTOLOGIN"
    else
        printf "Proper entry found... \r"
    fi
# Create the autologin file if it doesn't exist
else
    printf "Auto-login file does not exist, creating it and setting permissions...\r"
    echo "$LINE" > "$AUTOLOGIN"
    chmod 600 "$AUTOLOGIN"
fi

# Start getting the information from the controller. (This takes a VERY long time.)
printf "Retrieving Storage Controller data, this will take a while... \r"
ftp $HOST << SCRIPT
get logs $FILE
SCRIPT
exit 0
This gave me a bunch of files in the zip, but all I needed was the "store_....logs" file. It was about 500,000 lines long: the first portion is the entire configuration in XML format, then the configuration in text format, followed by the logs from the system. I parsed the file and stripped off the logs at the end, which cut it down to 15,000 lines. From there I divided it into two files (config.xml and config.txt). I then pulled the XML output of the 3 commands I needed and wrote it to the 3 files my previously written script searches for. Now my inventory script pulls in everything it needs, albeit slowly due to the 4-minute wait for the system to generate the zip file. I hope this helps someone in the future.
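For reference, the split step can be sketched in a few lines of Python; the closing tag used to detect the end of the XML portion below is purely hypothetical, so inspect your own store_*.logs file for the real boundary:
# Split the extracted log into its XML and plain-text portions.
xml_lines, txt_lines = [], []
in_xml = True
with open("store.logs") as src:          # hypothetical input file name
    for line in src:
        if in_xml:
            xml_lines.append(line)
            if line.strip() == "</CONFIG>":  # hypothetical closing root tag
                in_xml = False
        else:
            txt_lines.append(line)

with open("config.xml", "w") as f:
    f.writelines(xml_lines)
with open("config.txt", "w") as f:
    f.writelines(txt_lines)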
Edit:
Waiting 4 minutes for the system to compile the archive was taking too long, so I ended up using paramiko and Python scripts to dump the output of the commands to files that my other code can parse. It accepts the IP of the Controller as a parameter. Here is the script for those interested. Thank you again for all the help.
#!/usr/bin/env python
# Saves the output of "show disks" from the storage controller to an XML file.
import paramiko
import sys
import re

IP = sys.argv[1]
USERNAME = "manage"
PASSWORD = "password"
FILENAME = "./logfiles/disks.xml"
cmd = "show disks"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    client.connect(IP, username=USERNAME, password=PASSWORD)
    stdin, stdout, stderr = client.exec_command(cmd)
except Exception:
    sys.exit(1)

# Collect the XML, skipping the prompt lines the controller echoes back.
data = ""
for line in stdout:
    if re.search('#', line):
        continue
    data += line
client.close()

with open(FILENAME, 'w+') as f:
    f.write(data)
sys.exit(0)
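Run one copy per command, e.g. python show_disks.py 10.0.0.50 (hypothetical script name and IP), adjusting cmd and FILENAME for "show controllers" and "show frus".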

"gsutil rm" command using STDIN

I use gsutil in a Linux environment for managing files in GCS. I enjoy being able to use the command
gsutil -m cp -I gs://...
preceded by some other command to pass the STDIN to gsutil for uploading files; in doing so, I can maintain a local list of files that have been uploaded or generate specific patterns to upload and hand them off.
I would like to be able to do a similar command like
gsutil -m rm -I gs://...
to scrub files similarly. Presently, I build a big list of files to remove and run it with the following code:
while read line
do
    gsutil rm "gs://.../$line"
done < "$myfile.txt"
This is extraordinarily slow compared to the multithreaded "gsutil -m rm..." command, and enabling the -m flag has no effect when you have to process files one at a time from a list. I also experimented with just running
gsutil -m rm gs://.../* # remove everything
<my command> | gsutil -m cp -I gs://.../ # put back the pieces that I want
but this involves recopying a lot of data and wastes a lot of time; the data is already there and just needs to have some removed. Any thoughts would be appreciated. Also, I don't have a lot of flexibility on either end with renaming files; otherwise, a quick rename before uploading would handle all of this.
As an interim solution, since we don't have a -I option for rm right now, how about just creating a string of all the objects you want to delete in your loop and then using gsutil -m rm to delete them all at once? You could also do this with a simple Python script that invokes the gsutil command from within Python as a separate process.
Expanding on your earlier example, maybe something like the following (disclaimer: my bash-fu isn't the greatest, and I haven't tested this):
objects=''
while read line
do
    objects="$objects gs://$line"
done < "$myfile.txt"
gsutil -m rm $objects
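The same idea in Python, invoking gsutil as a separate process as suggested; the list file name and bucket path here are hypothetical:
import subprocess

# Build one argument list from the file of object names, then let a single
# multithreaded "gsutil -m rm" delete them all. For very long lists, chunk
# the arguments to stay under the OS argument-length limit.
with open("myfile.txt") as f:  # hypothetical list, one object name per line
    objects = ["gs://my-bucket/%s" % line.strip() for line in f if line.strip()]

if objects:
    subprocess.check_call(["gsutil", "-m", "rm"] + objects)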
For anyone wondering, I wound up doing as Zach Wilt indicated above. For reference, I was removing on the order of a couple thousand files from a span of 5 directories, so roughly 10,000 files. Doing this without the "-m" switch was taking upwards of 30 minutes; with it, it takes less than 30 seconds. Zoom!
For a robust example: I am using this to update Google Cloud Storage files to match local files. On the current day, I have a program that dumps lots of incremental files, and also a handful that are "rolled up". After a week, the incremental files get scrubbed locally automatically, and the same should happen in GCS to save space. Here's how to do this:
#!/bin/bash
# get the full date strings for touch
start=`date --date='-9 days' +%x`
end=`date --date='-8 days' +%x`
# other vars
mon=`date --date='-9 days' +%b | tr '[:upper:]' '[:lower:]'`
day=`date --date='-9 days' +%d`
# display the start of the range
echo "Cleaning files from $start"
# update start and finish timestamps
touch --date="$start" /tmp/start1
touch --date="$end" /tmp/end1
# repeat for all servers
for dr in "dir1" "dir2" "dir3" ...
do
    # list files in range and build the retention file
    find /local/path/$dr/ -newer /tmp/start1 ! -newer /tmp/end1 > "$dr-local.txt"
    # get the list of all files from the appropriate folder on GCS
    gsutil ls gs://gcs_path/$mon/$dr/$day/ > "$dr-gcs.txt"
    # rewrite the GCS list to use local paths
    sed -i "s|gs://gcs_path/$mon/$dr/$day/|/local/path/$dr/|" "$dr-gcs.txt"
    # build a sed command file that deletes matches
    while read line
    do
        echo "\|$line|d" >> "$dr-del.txt"
    done < "$dr-local.txt"
    # run the command file to strip lines for files that need to remain
    sed -f "$dr-del.txt" <"$dr-gcs.txt" >"$dr-out.txt"
    # convert local names back to GCS names
    sed -i "s|/local/path/$dr/|gs://gcs_path/$mon/$dr/$day/|" "$dr-out.txt"
    # convert the newline-separated file to one long string
    del=""
    while read line
    do
        del="$del$line "
    done < "$dr-out.txt"
    # remove all files matching the final output
    gsutil -m rm $del
    # clean up temp files
    rm "$dr-local.txt" "$dr-gcs.txt" "$dr-del.txt" "$dr-out.txt"
done
You'll need to modify this to fit your needs, but it's a concrete, working method for deleting files locally and then synchronizing the change to Google Cloud Storage. Thanks again to @Zach Wilt.

grep logfile for a specific timeframe

I need to filter messages out of a log file which has the following format:
2013-03-22T11:43:21.817078+01:00 INFO log msg 1...
...
2013-03-22T11:44:32.817114+01:00 WARNING log msg 2...
...
2013-03-22T11:45:45.817777+01:00 INFO log msg 3...
...
2013-03-22T11:46:59.547325+01:00 INFO log msg 4...
...
(where ... means "more messages")
The filtering must be done based on a timeframe.
This is part of a bash script, and at this point in the code the timeframe is stored as $start_time and $end_time. For example:
start_time = "2013-03-22T11:45:20"
end_time = "2013-03-22T11:45:50"
Note that the exact value of $start_time or $end_time may never appear in the log file; yet there will be several messages within the timeframe [$start_time, $end_time], which are the ones I'm looking for.
Now, I'm almost convinced I'll need a Python script to do the filtering, but I'd rather use grep (or awk, or any other tool) since it should run much faster (the log files are big).
Any suggestions?
Based on the log content in your question, an awk one-liner may help:
awk -F'.' -vs="$start_time" -ve="$end_time" '$1>s && $1<e' logfile
Note: this filters content excluding the start and end times themselves. It works because -F'.' makes $1 the timestamp down to whole seconds, and ISO-8601 timestamps compare correctly as plain strings.
$ start_time="2013-03-22T11:45:20"
$ end_time="2013-03-22T11:45:50"
$ awk -F'.' '$1>s&&$1<e' s=$start_time e=$end_time file
2013-03-22T11:45:45.817777+01:00 INFO log msg 3...
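And if you do end up in Python after all, the same lexicographic trick keeps it short; the log file name here is a placeholder:
import sys

start_time = "2013-03-22T11:45:20"
end_time = "2013-03-22T11:45:50"

with open("logfile") as f:   # placeholder file name
    for line in f:
        ts = line.split(".", 1)[0]  # timestamp up to the seconds
        if start_time < ts < end_time:
            sys.stdout.write(line)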

Is there any way to get ps output programmatically?

I've got a webserver that I'm presently benchmarking for CPU usage. What I'm doing is essentially running one process to slam the server with requests, then running the following bash script to determine the CPU usage:
#!/bin/bash
for (( ;; ))
do
    echo "`python -c 'import time; print time.time()'`, `ps -p $1 -o '%cpu' | grep -vi '%CPU'`"
    sleep 5
done
It would be nice to be able to do this in Python so I can run one script instead of two. I can't seem to find any platform-independent way (or at least one covering Linux and OS X) to get the ps output in Python without actually launching another process to run the command. I can do that, but it would be really nice if there were an API for doing this.
Is there a way to do this, or am I going to have to launch the external script?
You could check out this question about parsing ps output using Python.
One of the answers suggests using the PSI python module. It's an extension though, so I don't really know how suitable that is for you.
It also shows in the question how you can call a ps subprocess using python :)
My preference is to do something like this.
collection.sh
for (( ;; ))
do
    date; ps -p $1 -o '%cpu'
done
Then run collection.sh >someFile while you "slam the server with requests".
Then kill this collection.sh operation after the server has been slammed.
At the end, you'll have file with your log of date stamps and CPU values.
analysis.py
import datetime

with open("someFile", "r") as source:
    for line in source:
        line = line.strip()
        if line == "%CPU":
            continue
        try:
            date = datetime.datetime.strptime(line, "%a %b %d %H:%M:%S %Z %Y")
        except ValueError:
            cpu = float(line)
            print date, cpu  # or whatever else you want to do with this data
You could query the CPU usage with PySNMP. This has the added benefit of being able to take measurements from a remote computer. For that matter, you could install a VM of Zenoss or its kin, and let it do the monitoring for you.
If you don't want to invoke ps, why not try the /proc filesystem? You can write a Python program that reads files under /proc and extracts the data you want. I did this in Perl, with inlined C code in the Perl script; I think you can find a similar way in Python. It's doable, but you need to dig through /proc and figure out what you want and how to get it.
http://www.faqs.org/docs/kernel/x716.html
The URL above might give you an initial push.
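To make that concrete, here is a minimal sketch of the /proc approach, with a placeholder PID and interval; note that /proc is Linux-only, so this gives up the OS X side of the question:
import os
import time

CLK_TCK = float(os.sysconf("SC_CLK_TCK"))

def cpu_seconds(pid):
    # Fields 14 and 15 of /proc/<pid>/stat (1-based) are utime and stime,
    # counted in clock ticks. Note: a naive split() misparses process names
    # containing spaces, which is fine for a quick benchmark script.
    fields = open("/proc/%d/stat" % pid).read().split()
    return (int(fields[13]) + int(fields[14])) / CLK_TCK

pid = 1234        # placeholder: PID of the webserver under test
interval = 5.0
prev = cpu_seconds(pid)
while True:
    time.sleep(interval)
    now = cpu_seconds(pid)
    print("%s, %.1f%%" % (time.time(), (now - prev) / interval * 100))
    prev = now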
