Get latest file from samba share using smbclient - python

My directory contains a file and its different versions, out of which I want to pick the latest version. The versions can be sorted either by date or by the revision number at the end of the file name, something like
Myfile2001.txt
Where 2001 is the revision number.
How can I get the latest file from a Samba share directory using smbclient? I thought of using a mask to list all the names, piping that to output, running some search to find the largest number (the latest revision), and then using smbclient's get to fetch the file. But this does not seem like an optimal solution and it is tedious. I wonder if there is any other way to do it?
EDIT: I figured out an alternate way in Python (just for ease): capture the output of smbclient's ls in a text file or on STDOUT, and then use Python to find the latest file's name. But I cannot figure out how to redirect the output of that command to a text file or STDOUT so that I can process it.
Is there any way to do it? As smbclient does not allow I/O redirection, I am stuck at the same point with the newer approach. I have gone through pysmb but cannot rely on it as it is an experimental library; however, a solution using pysmb is also acceptable as a temporary measure.

I've solved this issue using awk in a bash script. Goal: download the most recent csv file.
${SmbCmd} "ls <mask>" 2>/dev/null \
| awk '{ if ($1 ~ "csv$") print $1 }' | sort | tail -1
Here ${SmbCmd} holds everything needed to reach the SMB server: the path to smbclient, the authentication method, the server name, the directory and so on. It ends with "--command" (the long form of -c).
This works because my csv file names contain the creation date: "name_yyyy-mm-dd.csv".
If your names carry no date, you can try something like this, which prefixes each name with the date fields from the listing:
${SmbCmd} "ls <mask>-*" | awk '{ if ($1 ~ "csv$") print $8$5$6";"$1 }'
But the month isn't numeric, so the result does not sort chronologically as-is.
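Since the question asks for Python, here is a minimal sketch of the same idea: run smbclient non-interactively with -c, capture what it prints, and pick the name with the highest revision number. The function names, the service string and the credentials file are hypothetical, and the parser assumes file names without spaces and the usual "name, attributes, size, date" listing layout.

```python
import re
import subprocess


def list_share(service, directory, mask, auth_file):
    """Run smbclient once and return the text of its 'ls' listing.

    service is something like //server/share and auth_file is a
    credentials file passed with -A; both are placeholders here.
    """
    cmd = ["smbclient", service, "-A", auth_file,
           "-D", directory, "-c", "ls " + mask]
    return subprocess.check_output(cmd, universal_newlines=True)


def file_names(listing):
    """Extract the first column (the name) from each entry line."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Entry lines end with "... HH:MM:SS YYYY"; the summary line
        # about blocks does not, so it is skipped by the ':' check.
        if len(fields) >= 7 and ":" in fields[-2] and fields[0] not in (".", ".."):
            names.append(fields[0])
    return names


def latest_by_revision(names):
    """Return the name with the largest number just before the extension."""
    best_name, best_rev = None, -1
    for name in names:
        match = re.search(r"(\d+)\.[^.]+$", name)
        if match and int(match.group(1)) > best_rev:
            best_name, best_rev = name, int(match.group(1))
    return best_name
```

The revisions are compared as integers, so Myfile999.txt does not beat Myfile2001.txt the way it would in a plain string sort. The chosen name can then be passed to a second smbclient call with -c "get <name>".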

Related

Serial Numbers from a Storage Controller over SSH

Background
I'm working on a bash script to pull serial numbers and part numbers from all the devices in a server rack. My goal is to be able to run a single script (inventory.sh) and walk away while it generates text files containing the information I need. I'm using bash for maximum compatibility; the RHEL 6.7 systems do have Perl and Python installed, but with minimal libraries. So far I haven't had to use anything other than bash, but I'm not against calling a Perl or Python script from my bash script.
My Problem
I need to retrieve the Serial Numbers and Part numbers from the drives in a Dot Hill Systems AssuredSAN 3824, as well as the Serial numbers from the equipment inside. The only way I have found to get all the information I need is to connect over SSH and run the following three commands dumping the output to a local file:
show controllers
show frus
show disks
Limitations:
I don't have "sshpass" installed, and would prefer not to install it.
The Controller is not capable of storing SSH keys ( no option in custom shell).
The Controller also cannot write or transfer local files.
The Rack does NOT have access to the Internet.
I looked at paramiko, but while Python is installed I do not have pip.
I also cannot use CPAN.
For what it's worth, the output comes back in XML format. (I've already written the code to parse it in bash.)
Right now I think my best option would be to have a library for Python or Perl in the folder with my other scripts, and write a script to dump the commands' output to files that I can parse with my bash script. Which language is easier to just provide a library in a file? I'm looking for a library that is as small and simple as possible to use. I just need a way to get the output of those commands to XML files. Right now I am just using ssh 3 times in my script and having to enter the password each time.
Have a look at SNMP. There is a reasonable chance that you can use SNMP tools to remotely extract the information you need. The manufacturer should be able to provide you with the MIBs.
I ended up contacting the Manufacturer and asking my question. They said that the system isn't setup for connecting without a password, and their SNMP is very basic and won't provide the information I need. They said to connect to the system with FTP and use "get logs " to download an archive of the configuration and logs. Not exactly ideal as it takes 4 minutes just to run that one command but it seems to be my only option. Below is the script I wrote to retrieve the file automatically by adding the login credentials to the .netrc file. This works on RHEL 6.7:
#!/bin/bash
#Retrieve the logs and configuration from a Dot Hill Systems AssuredSAN 3824 automatically.
#Modify "LINE" and "HOST" to fit your configuration.
LINE='machine <IP> login manage password <password>'
HOST='<IP>'
AUTOLOGIN="/root/.netrc"
FILE='logfiles.zip'
#Check for and verify the autologin file
if [ -f "$AUTOLOGIN" ]; then
    printf "Found auto-login file, checking for proper entry... \r"
    READLINE=$(grep "$LINE" "$AUTOLOGIN")
    #Append the line to the end of .netrc if file exists but not the line.
    if [ "$LINE" != "$READLINE" ]; then
        printf "Proper entry not found, creating it... \r"
        echo "$LINE" >> "$AUTOLOGIN"
    else
        printf "Proper entry found... \r"
    fi
#Create the Autologin file if it doesn't exist
else
    printf "Auto-Login file does not exist, creating it and setting permissions...\r"
    echo "$LINE" > "$AUTOLOGIN"
    chmod 600 "$AUTOLOGIN"
fi
#Start getting the information from the controller. (This takes a VERY long time)
printf "Retrieving Storage Controller data, this will take awhile... \r"
ftp "$HOST" << SCRIPT
get logs $FILE
SCRIPT
exit 0
This gave me a bunch of files in the zip, but all I needed was the "store_....logs" file. It was about 500,000 lines long: the first portion is the entire configuration in XML format, then the configuration in text format, followed by the logs from the system. I parsed the file and stripped off the logs at the end, which cut the file down to 15,000 lines. From there I divided it into two files (config.xml and config.txt). I then pulled the XML output of the 3 commands that I needed and wrote it to the 3 files my previously written script searches for. Now my inventory script pulls in everything it needs, albeit pretty slowly due to waiting 4 minutes for the system to generate the zip file. I hope this helps someone in the future.
Edit:
Waiting 4 minutes for the system to compile was taking too long. So I ended up using paramiko and python scripts to dump output from the commands to files that my other code can parse. It accepts the IP of the Controller as a parameter. Here is the script for those interested. Thank you again for all the help.
#!/usr/bin/env python
# Saves output of "show disks" from the storage Controller to an XML file.
import sys
import paramiko
IP = sys.argv[1]
USERNAME = "manage"
PASSWORD = "password"
FILENAME = "./logfiles/disks.xml"
cmd = "show disks"
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    client.connect(IP, username=USERNAME, password=PASSWORD)
    stdin, stdout, stderr = client.exec_command(cmd)
except Exception:
    sys.exit(1)
# Keep every line of output except those containing '#'.
data = ""
for line in stdout:
    if '#' not in line:
        data += line
client.close()
with open(FILENAME, 'w') as f:
    f.write(data)
sys.exit(0)

Passing individual lines from files into a python script using a bash script

This might be a simple question, but I am new to bash scripting and have spent quite a bit of time on this with no luck; I hope I can get an answer here.
I am trying to write a bash script that reads individual lines from a text file and passes them along as argument for a python script. I have a list of files (which I have saved into a single text file, all on individual lines) that I need to be used as arguments in my python script, and I would like to use a bash script to send them all through. Of course I can take the tedious way and copy/paste the rest of the python command to individual lines in the script, but I would think there is a way to do this with the "read line" command. I have tried all sorts of combinations of commands, but here is the most recent one I have:
#!/bin/bash
# Command Output Test
cat infile.txt << EOF
while read line
do
VALUE = $line
python fits_edit_head.py $line $line NEW_PARA 5
echo VALUE+"huh"
done
EOF
When I do this, all I get returned is the individual lines from the input file. I have the extra VALUE there to see if it will print that, but it does not. Clearly there is something simple about the "read line" command that I do not understand but after messing with it for quite a long time, I do not know what it is. I admit I am still a rookie to this bash scripting game, and not a very good one at that. Any help would certainly be appreciated.
You probably meant:
while read line; do
VALUE=$line ## No spaces allowed
python fits_edit_head.py "$line" "$line" NEW_PARA 5 ## Quote properly to isolate arguments well
echo "$VALUE+huh" ## You don't expand without $
done < infile.txt
Python may also read from STDIN, so it could accidentally consume input from infile.txt. To avoid that, you can use another file descriptor:
while read -u 4 line; do
...
done 4< infile.txt
Better yet, if you're using Bash 4.0 or later, it's safer and cleaner to use readarray:
readarray -t lines < infile.txt
for line in "${lines[@]}"; do
...
done
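If calling Python from a bash loop feels roundabout, the loop itself can also live in Python. The following is only a sketch of that alternative: the helper names are made up, while infile.txt, fits_edit_head.py and the NEW_PARA 5 arguments are taken from the question.

```python
import subprocess


def read_arguments(path):
    """Return the non-empty lines of the file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_command(name):
    """Build the command line from the question for one file name.

    Passing a list (rather than one string) means names with spaces
    reach the script as a single argument, like quoting "$line" in bash.
    """
    return ["python", "fits_edit_head.py", name, name, "NEW_PARA", "5"]


def run_all(path):
    """Run the script once per line of the input file."""
    for name in read_arguments(path):
        subprocess.check_call(build_command(name))
```

Calling run_all("infile.txt") would then do the same work as the while loop above, stopping at the first invocation that exits with a non-zero status.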

Get lines from stdout after timestamp

There is a huge log of errors/warnings/infos printed out on stdout. I am only interested in the lines logged after I start a specific action.
Other information: I am using Python to telnet to a shell environment. I execute the commands on shell and store the time the action is started. I then call a command to view the log which spits it on stdout. I expect to read in the greped lines after that timestamp back to Python. I also store the current time but not sure how to use that (maybe grep on a date range?)
I can redirect to a file and use find but the log is huge and I'd rather not read all of it.
I can grep -n to get line number and then read everything after but I'm not sure how to.
Concept regex to egrep on is something like: {a-timestamp}*
Any suggestions would be appreciated!
awk '/the-timestamp-I-have/,0' the-log-file
This will print the lines from the-log-file, starting at the first line that matches the-timestamp-I-have and continuing through the last line.
Ref:
http://www.catonmat.net/blog/awk-one-liners-explained-part-three/
http://www.catonmat.net/blog/ten-awk-tips-tricks-and-pitfalls/#awk_ranges
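Because the log is being read back into Python anyway, the same "from the first match to the end" logic can be sketched there too. This is an illustration under assumptions: the function name is made up, and it uses a plain substring test where awk uses a regular expression.

```python
def lines_from(lines, marker):
    """Yield every line from the first one containing marker to the end."""
    found = False
    for line in lines:
        if not found and marker in line:
            found = True
        if found:
            yield line
```

It accepts any iterable of lines, so it can be fed a file object or the lines received over the telnet session without loading the whole log into memory first.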

Asynchronous tasks in Plone to query Python Package Index

I want to periodically (every hour?) query the Python Package Index API from Plone. Something equivalent to:
$ for i in `yolk -L 24 | awk '{print $1}'` # get releases made in last 24 hours
do
# search for plone classifier
results=`yolk -M $i -f classifiers | grep -i plone`
if [ $results ]; then
echo $i
fi
done
Results:
collective.sendaspdf
gocept.selenium
Products.EnhancedNewsItemImage
adi.workingcopyflag
Products.SimpleCalendarPortlet
Products.SimpleCalendar
Then I want to display this information in a template. I would love to, at least initially, avoid having to persist the results.
How do I display the results in a template without having to wait for the query to finish? I know there are some async packages available e.g.:
plone.app.async
But I'm not sure what the general approach should be (assuming I can schedule an async task, I may need to store the results somewhere. If I have to store the results, I'd prefer to do it "lightweight" e.g. annotations)
How about the low, low tech version?
Use a cron-job to run the query, put this in a temp file, then move the file into a known location, with a timestamp in the filename.
Then, when someone requests the page in question (showing new packages), simply read the newest file in that location:
filename = sorted(os.listdir(location))[-1]
data = open(os.path.join(location, filename)).read()
By using a move, you guarantee that the newest file in the designated location is always a complete file, avoiding a partial result being read.
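The write side of that scheme can be sketched as follows, assuming the cron job is itself a Python script. The function names and the "packages-" file name prefix are made up for illustration, and os.replace requires Python 3.3 or later.

```python
import os
import tempfile
import time


def publish(location, data):
    """Write data to a temporary file, then move it into place."""
    # The temporary file lives in the same directory, so the final
    # rename stays on one filesystem and is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=location, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    final = os.path.join(location, time.strftime("packages-%Y%m%d-%H%M%S.txt"))
    os.replace(tmp_path, final)
    return final


def newest(location):
    """Return the contents of the most recent published file."""
    names = sorted(n for n in os.listdir(location) if n.startswith("packages-"))
    with open(os.path.join(location, names[-1])) as handle:
        return handle.read()
```

The timestamp format sorts lexicographically in chronological order, which is what makes sorted(...)[-1] pick the newest file. Filtering on the prefix keeps a half-written temporary file from ever being chosen.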

Script to compare a string in two different files

I am brand new to stackoverflow and to scripting. I was looking for help to get started in a script, not necessarily looking for someone to write it.
Here's what I have:
File1.csv - contains some information, I am only interested in MAC addresses.
File2.csv - has some different information, but also contains MAC address.
I need a script that parses the MAC addresses from file1.csv and logs a report if any MAC address shows up in file2.csv.
The questions:
Any tips on the language I use, preferably perl, python or bash?
Can anyone suggest some structure for the logic needed (even if just in pseudo-code)?
update
Using @Adam Wagner's approach, I am really close!
import csv
# Need to strip out NUL values from the .csv file to make Python happy
class FilteredFile(file):
    def next(self):
        return file.next(self).replace('\x00','').replace('\xff\xfe','')
reader = csv.reader(FilteredFile('wifi_clients.csv', 'rb'), delimiter=',', quotechar='|')
s1 = set(rec[0] for rec in reader)
inventory = csv.reader(FilteredFile('inventory.csv','rb'),delimiter=',')
s2 = set(rec[6] for rec in inventory)
shared_items = s1.intersection(s2)
print shared_items
This always outputs the following, even if I doctor the .csv files to have matching MAC addresses:
set([])
Contents of the csv files
wifi_clients.csv
macNames, First time seen, Last time seen,Power, # packets, BSSID, Probed ESSIDs
inventory.csv
Name,Manufacturer,Device Type,Model,Serial Number,IP Address,MAC Address,...
Here's the approach I'd take:
Iterate over each csv file (Python has a handy csv module for this), capturing the MAC address and placing it in a set (one per file). Once again, Python has a great built-in set type. The csv module's documentation includes good usage examples.
Next, you can get the intersection of set1 (file1) and set2 (file2). This will show you mac-addresses that exist in both files one and two.
Example (in python):
s1 = set([1,2,3]) # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])
shared_items = s1.intersection(s2)
print shared_items
Which outputs:
set([2, 3])
Logging these shared items could be done with anything from printing (then redirecting output to a file), to using the logging module, to saving directly to a file.
I'm not sure how in-depth of an answer you were looking for, but this should get you started.
Update: CSV/Set usage example
Assuming you have a file "foo.csv", that looks something like this:
bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2
The simplest way to build the set, would be something like this:
import csv
set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields.
Obviously, you'd need something like this for each file, so you may want to put this in a function to make life easier.
Finally, if you want to go the less-verbose-but-cooler-python-way, you could also build the set like this:
csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile) # Assuming mac-address is the 4th column.
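Wrapping this in a function, as suggested above, might look like the sketch below. It is written for Python 3 (unlike the Python 2 snippets in this thread), and the function name and parameters are made up. The stripping and upper-casing are a guess at why an intersection could come back empty: stray spaces or differing case make otherwise equal addresses compare unequal. The encoding parameter is there because the NUL and \xff\xfe bytes mentioned in the question suggest a UTF-16 file, which is also only an assumption.

```python
import csv


def mac_set(path, column, encoding="utf-8", skip_header=True):
    """Collect the normalised MAC addresses found in one CSV column."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)  # drop the header row, if any
        return {
            row[column].strip().upper()
            for row in reader
            # Skip short or blank rows instead of raising IndexError.
            if len(row) > column and row[column].strip()
        }
```

With the two files from the question this would be called as mac_set('wifi_clients.csv', 0, encoding='utf-16') and mac_set('inventory.csv', 6), and the two results combined with intersection as before.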
I strongly recommend python to do this.
Because you didn't give the structure of the csv file, I can only show a framework:
def get_MAC_from_file1():
    # ... parse the file to get the MACs
    return a_MAC_list
def get_MAC_from_file2():
    # ... parse the file to get the MACs
    return a_MAC_list
def log_MACs():
    MAC_list1, MAC_list2 = get_MAC_from_file1(), get_MAC_from_file2()
    for a_MAC in MAC_list1:
        if a_MAC in MAC_list2:
            pass  # ... write your logs
If the data set is large, use a dict or set instead of the list, together with the intersect operation. But as these are MAC addresses, I guess your data set is not that large, so keeping the script easy to read is the most important thing.
Awk is perfect for this
{
    mac = $1 # assuming the mac addresses are in the first column
    mac_in_other_file = "" # reset first: getline leaves the variable unchanged when grep prints nothing
    do_grep = "grep " mac " otherfilename" # we'll use grep to check if the mac address is in the other file
    do_grep | getline mac_in_other_file # pipe the output of the grep command into a new variable
    close(do_grep) # close the pipe
    if (mac_in_other_file != "") { # if grep found the mac address in the other file
        print mac > "naughty_macs.log" # write the mac address to the log file
    }
}
Then you'd run that on the first file:
awk -f logging_script.awk mac_list.txt
(this code is untested and I'm not the greatest awk hacker, but it should give the general idea)
For the purpose of this example, generate 2 files that look like yours.
File1:
for i in `seq 100`; do
echo -e "user$i\tmachine$i\t192.168.0.$i\tmac$i";
done > file1.csv
File2 (contains random entries of "mac addresses" numbered from 1-200)
for j in `seq 100`; do
i=$(($RANDOM % 200)) ;
echo -e "mac$i\tmachine$i\tuser$i";
done > file2.csv
Simplest approach would be to use join command and do a join on the appropriate field. This approach has the advantage that fields from both files would be available in the output.
Based on the example files above, the command would look like this:
join -1 4 -2 1 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
join needs the input to be sorted by the field you are matching, that's why the sort is there (-k tells which column to use)
The command above matches rows from file1.csv with rows from file2.csv if column 4 in the first file is equal with column 1 from the second file.
If you only need specific fields, you can specify the output format to the join command:
join -1 4 -2 1 -o 1.4,1.2 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
This would print only the mac address and the machine field from the first file.
If you only need a list of matching mac addresses, you can use uniq or sort -u. Since the join output will be sorted by mac, uniq is faster. But if you need a unique list of another field, sort -u is better.
If you only need the mac addresses that match, grep can accept patterns from a file, and you can use cut to extract only the fourth field.
fgrep -f<(cut -f4 file1.csv) file2.csv
The above would list all the lines in file2.csv that contain a mac address from file1
Note that I'm using fgrep, which matches fixed strings rather than patterns. Also, if file1 is big, this may be slower than the first approach. It also assumes that the mac appears only in field 1 of file2 and that the other fields don't contain mac addresses.
If you only need the mac, you can either use the -o option of fgrep (but there are grep variants that don't have it), or you can pipe the output through cut and then sort -u
fgrep -f<(cut -f4 file1.csv) file2.csv | cut -f1 | sort -u
This would be the bash way.
Python and awk hints have been shown above, I will take a stab at perl:
#!/usr/bin/perl -w
use strict;
open F1, $ARGV[0];
my %searched_mac_addresses = map {chomp; (split /\t/)[3] => 1 } <F1>;
close F1;
open F2, $ARGV[1];
while (<F2>) {
print if $searched_mac_addresses{(split "\t")[0]}
}
close F2
First you create a dictionary containing all the mac addresses from the first file:
my %searched_mac_addresses = map {chomp; (split /\t/)[3] => 1 } <F1>;
<F1> reads all the lines from file1
chomp removes the end of line
split splits the line based on tab, you can use a more complex regexp if needed
() around split let its result be indexed as a list
[3] selects the fourth field
map runs a piece of code for all elements of the list
=> pairs each address with the value 1, so assigning the result to a hash (perl's term for a dictionary) gives one entry per address
Then you read line by line the second file, and check if the mac exists in the above dictionary:
while (<F2>) {
print if $searched_mac_addresses{(split "\t")[0]}
}
while (<F2>) will read the file F2 line by line, and put each line in the $_ variable
print without any parameters prints the default variable $_
if can follow a statement as a postfix modifier
dictionary elements can be accessed via {}
split by default splits the $_ default variable
