Output of command to list using Python

I'm writing an automation script that needs to run a command and capture the command's output as a list.
For example:
# ls -l | awk '{print $9}'
test1
test2
I want the output to be captured as a list like var = ["test1", "test2"].
Right now I have tried the following, but it saves the output as a string instead of a list:
# Filter the tungsten services
s = subprocess.Popen(["ls -l | awk '{print $9}'"], shell=True, stdout=subprocess.PIPE).stdout
service_state = s.read()
Please guide me if anyone has an idea of how to achieve this.

You can use
service_states = s.read().splitlines()
but note that this is brittle: file names can contain odd characters (like spaces or newlines).
So you're probably better off using os.listdir(path) which gives you a list of file names.
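A minimal sketch of both approaches (the directory '.' is just an example):
import os
import subprocess

# capture the command output and split it into a list of lines
p = subprocess.Popen("ls -l | awk '{print $9}'", shell=True, stdout=subprocess.PIPE)
service_states = p.communicate()[0].splitlines()

# more robust: let Python list the directory itself
service_states = os.listdir('.')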

You can post-process the string according to your needs.
string.splitlines() (https://docs.python.org/2/library/stdtypes.html#str.splitlines) will break the string into a list of lines.
If you need to split the results further, you can use .split().
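For instance, to break the captured output into lines and then into fields (splitting on whitespace here is just an example):
lines = service_state.splitlines()           # ['test1', 'test2']
fields = [line.split() for line in lines]    # split each line on whitespace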

No need for subprocess:
import os
a, d, c = os.walk('.').next()  # (dirpath, dirnames, filenames) for the current directory
service_state = d + c

Related

Executing awk in Python shell

I have a shell command which parses certain content and gives the required output. I need to implement this in Python, but the shell command has a newline character "\n" which is not getting executed when run through the Python command.
Of the many lines in the output log, the required line looks like - configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml
I would only need the_jvm_name from the above. The syntax will always be the same. The shell command works fine.
Shell Command -
ps -ef | grep 12345 | tr " " "\n" | grep logback.configurationFile | awk -F"/" '{print $NF}'| cut -d. -f1
Python (escaped all the required double quotes) -
import subprocess
pid_arr = "12345"
sh_command = "ps -ef | grep "+pid_arr+" | tr \" \" \"\n\" | grep configurationFile | awk -F \"/\" '{print $NF}' | cut -d. -f1"
outpt = subprocess.Popen(sh_command , shell=True,stdout=subprocess.PIPE).communicate()[0].decode('utf-8').strip()
With Python, I'm not getting the desired output; it just prints configurationFile as it appears in the command.
What am I missing here? Is there a better way to get these details?
You can achieve what you want using a regex substitution in Python:
import re
import subprocess

output = subprocess.check_output(["ps", "-ef"])
for line in output.splitlines():
    if re.search("12345", line):
        output = re.sub(r".*configurationFile=.*/([^.]+).*", r"\1", line)
This captures the part after the last / in the configuration file path, up to the next ..
You could make it slightly more robust by checking only the second column (the PID) for 12345, either by splitting each line on white space:
cols = re.split(r"\s+", line)
if len(cols) > 1 and cols[1] == "12345":
or by using a better regex, like:
if re.match(r"\S+\s+12345\s", line):
Note that you could also shorten your pipe considerably by just doing something like:
ps -ef | sed -nE '/12345/ { s/.*configurationFile=.*\/([^.]*).*/\1/; p }'
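Putting those pieces together, a minimal sketch (assuming Python 2, with the PID 12345 from the question):
import re
import subprocess

pid = "12345"
output = subprocess.check_output(["ps", "-ef"])
for line in output.splitlines():
    # only accept lines whose second column (the PID) matches
    if re.match(r"\S+\s+" + pid + r"\s", line):
        match = re.search(r"configurationFile=.*/([^.]+)", line)
        if match:
            print(match.group(1))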
Your shell command works, but it has to deal with too many lines of output and too many fields per line. An easier solution is to tell the ps command to just give you 1 line and on that line, just one field that you care about. For example, on my system:
ps -o cmd h 979
will output:
/usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
The -o cmd flag outputs only the CMD column, while the h parameter tells ps to omit the header. Finally, 979 is the process ID, which tells ps to output information just for this process.
This output is not exactly what you have in your problem, but it is similar enough. Once we have limited the output, we eliminate the need for other commands such as grep and awk. At this point, we can use a regular expression to extract what we want:
from __future__ import print_function
import re
import subprocess

pid = '979'
command = ['ps', '-o', 'cmd', 'h', pid]
output = subprocess.check_output(command)
pattern = re.compile(r"""
    config-file=    # Literal string search
    .+\/            # Everything up to the last forward slash
    ([^.]+)         # Non-dot chars, this is what we want
    """, re.VERBOSE)
matched = pattern.search(output)
if matched:
    print(matched.group(1))
Notes
For the regular expression, I am using the verbose form, which allows me to annotate the pattern with comments. I like this style, as regular expressions can be difficult to read.
On your system, please adjust the "configuration-file" part to work with your output.

python subprocess awk with -F option and using variable for input file

I have a text file that has data delimited with '|'
E.g.
123 | 456 | 789
I want to print the second column only.
I can use awk in the shell like this: awk -F'|' '{print $2}' file.txt
However, I want to use python subprocess to do this. And also the input file must be a variable.
Right now, this is what I have.
import subprocess
file = "file-03-10-2016.txt"
with open('another_file.txt', 'wb') as output:
    var = subprocess.check_call(['awk', '{print $2}', file])
    print var
This prints the second column but it uses space as a delimiter. I want to change the delimiter to '|' using the -F option for awk.
Try:
var = subprocess.check_call(['awk', '-F|', '{print $2}', file])
However, I feel like I should point out that this task is very easy to do in pure python:
def awk_split(file_name, column, fs=None):
    with open(file_name, 'r') as file_stream:
        for line in file_stream:
            yield line.split(fs)[column]

for val in awk_split(file, 1, fs='|'):
    # do something...
    pass
subprocess.check_call takes a list of strings that are passed directly to the program; no shell is involved, so shell quoting rules do not apply. The single quotes in -F'|' on the command line are consumed by the shell before awk ever sees them, so you should not reproduce them in the argument list:
var = subprocess.check_call(['awk', '-F|', '{print $2}', file])
If you do want a shell to interpret the command (and its quoting), pass a single string with shell=True instead:
var = subprocess.check_call("awk -F'|' '{print $2}' " + file, shell=True)
Hope that helps.
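If the goal is to capture awk's output in Python rather than just its exit status, subprocess.check_output (Python 2.7+) may be closer to what you want; a rough sketch, reusing the file name from the question:
import subprocess

file_name = "file-03-10-2016.txt"
# no shell is involved, so the '|' separator needs no extra quoting
output = subprocess.check_output(['awk', '-F|', '{print $2}', file_name])
second_columns = output.splitlines()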

Split Command - Choose Output Name

I have a text file named myfile.txt. The file contains 50,000 lines and I would like to split it into 50 text files. I know that this is easy with the split command:
split myfile.txt
This will output 50 1000-line files: xaa, xab, xac, and so on.
My question: how do I run split on my text file so that it names the output files:
1.txt
2.txt
3.txt
...
50.txt
Seeking answers in python or bash please. Thank you!
Here is a potential solution using itertools.islice to get the chunks and string formatting for the different file names:
from itertools import islice

with open('myfile.txt') as in_file:
    for i in range(1, 51):
        with open('{0}.txt'.format(i), 'w') as out_file:
            lines = islice(in_file, 1000)
            out_file.writelines(lines)
It's not exactly what you are looking for, but running
split -d myfile.txt
will output
x00
x01
x02
...
To generate test data in an empty directory, you can use
seq 50000 | split -d
To rename in the way that you want, you can use
ls x* | awk '{print $0, (substr($0,2)+1) ".txt"}' | xargs -n2 mv
Here's a funny one: if your split command supports the --filter option, you can use it!
If you call
split --filter=./banana myfile.txt
then the command ./banana will be executed with the environment variable FILE set to the name split would choose for the chunk it is processing. This command will receive the chunk being processed on its standard input. If this command returns a non-zero status code, then split will interrupt its operations.
Together with the -d option, that's exactly what you want: with -d, the names split chooses for the files are x00, x01, etc.
Make a script:
#!/bin/bash
# remove the leading x from FILE
n=${FILE#x}
# check that n is a number
[[ $n = +([[:digit:]]) ]] || exit 1
# remove the leading zeroes from n
n=$((10#$n))
# send stdin to file
cat > "$n.txt"
Call this script banana, chmod +x it and let's go:
split -d --filter=./banana myfile.txt
This --filter option is really funny.
Here's an example of how you could split this file in bash:
split -l 1000 -d --additional-suffix=.txt myfile.txt
The -l argument determines the number of lines included in each split file (1000 in this case, for 50 total files), the -d argument uses numbers instead of letters for the suffixes, and the value we pass to the --additional-suffix argument here gives each file a .txt file extension.
This will create
x00.txt
x01.txt
x02.txt
etc.
If you wanted to change the 'x' portion of the output file names, you'd add a prefix after the input file (e.g. myfile.txt f would create f00.txt, f01.txt, etc.)
Note that without --additional-suffix, your files will all lack filename extensions.
I've looked to see if there's a way to split a file and name them with only the suffix, but I haven't found anything.
A simple approach:
f = open('your_file')
count_line, file_num = 0, 1
for x in f:
    count_line += 1
    if count_line % 1000 == 1:
        # first line of a new chunk: open the next numbered output file
        f1 = open(str(file_num) + '.txt', 'w')
        f1.write(x)
        file_num += 1
    elif count_line % 1000 == 0:
        # last line of the chunk: write it and close the file
        f1.write(x)
        f1.close()
    else:
        f1.write(x)
f.close()

Storing value from a parsed ping

I'm working on some code that performs a ping operation from python and extracts only the latency by using awk. This is currently what I have:
from os import system
l = system("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'")
print l
The system() call works fine, but the output goes to the terminal rather than being stored in l. Basically, an example output I'd get from this particular block of code would be
90.3
0
Why does this happen, and how would I go about actually storing that value into l? This is part of a larger thing I'm working on, so preferably I'd like to keep it in native python.
Use subprocess.check_output if you want to store the output in a variable:
from subprocess import check_output
l = check_output("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'", shell=True)
print l
Related: Extra zero after executing a python script
os.system() returns the return code of the called command, not the output to stdout.
For detail on how to properly get the command's output (including pre-Python 2.7), see this: Running shell command from Python and capturing the output
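For instance, a sketch of the same ping/awk pipeline using subprocess.Popen (also available before Python 2.7):
from subprocess import Popen, PIPE

cmd = "ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'"
# communicate() returns (stdout, stderr); stdout holds the latency printed by awk
latency = Popen(cmd, shell=True, stdout=PIPE).communicate()[0].strip()
print(latency)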
BTW I would use Ping Package https://pypi.python.org/pypi/ping
It looks promising
Here is how I store output to a variable.
test=$(ping -c 1 google.com | awk -F"=| " 'NR==2 {print $11}')
echo "$test"
34.9

Script to compare a string in two different files

I am brand new to stackoverflow and to scripting. I was looking for help to get started in a script, not necessarily looking for someone to write it.
Here's what I have:
File1.csv - contains some information, I am only interested in MAC addresses.
File2.csv - has some different information, but also contains MAC address.
I need a script that parses the MAC addresses from file1.csv and logs a report if any MAC address shows up in file2.csv.
The questions:
Any tips on the language to use, preferably Perl, Python, or Bash?
Can anyone suggest some structure for the logic needed (even if just in pseudo-code)?
update
Using @Adam Wagner's approach, I am really close!
import csv
#Need to strip out NUL values from .csv file to make python happy
class FilteredFile(file):
    def next(self):
        return file.next(self).replace('\x00', '').replace('\xff\xfe', '')
reader = csv.reader(FilteredFile('wifi_clients.csv', 'rb'), delimiter=',', quotechar='|')
s1 = set(rec[0] for rec in reader)
inventory = csv.reader(FilteredFile('inventory.csv','rb'),delimiter=',')
s2 = set(rec[6] for rec in inventory)
shared_items = s1.intersection(s2)
print shared_items
This always outputs the following (even if I doctor the .csv files to have matching MAC addresses):
set([])
Contents of the csv files
wifi_clients.csv
macNames, First time seen, Last time seen,Power, # packets, BSSID, Probed ESSIDs
inventory.csv
Name,Manufacturer,Device Type,Model,Serial Number,IP Address,MAC Address,...
Here's the approach I'd take:
Iterate over each csv file (python has a handy csv module for accomplishing this), capturing the mac-address and placing it in a set (one per file). And once again, python has a great builtin set type. Here's a good example of using the csv module and, of course, the docs.
Next, you can get the intersection of set1 (file1) and set2 (file2). This will show you mac-addresses that exist in both files one and two.
Example (in python):
s1 = set([1,2,3]) # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])
shared_items = s1.intersection(s2)
print shared_items
Which outputs:
set([2, 3])
Logging these shared items could be done with anything from printing (then redirecting output to a file), to using the logging module, to saving directly to a file.
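For instance, a minimal sketch of the logging-module route (the log file name and format are just illustrative):
import logging

logging.basicConfig(filename='shared_macs.log', level=logging.INFO,
                    format='%(asctime)s %(message)s')
for mac in shared_items:
    logging.info('MAC present in both files: %s', mac)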
I'm not sure how in-depth of an answer you were looking for, but this should get you started.
Update: CSV/Set usage example
Assuming you have a file "foo.csv", that looks something like this:
bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2
The simplest way to build the set, would be something like this:
import csv
set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields.
Obviously, you'd need something like this for each file, so you may want to put this in a function to make life easier.
Finally, if you want to go the less-verbose-but-cooler-python-way, you could also build the set like this:
csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile) # Assuming mac-address is the 4th column.
I strongly recommend python to do this.
'Cause you didn't give the structure of the csv file, I can only show a framework:
def get_MAC_from_file1():
    # ... parse the file to get the MAC addresses
    return a_MAC_list

def get_MAC_from_file2():
    # ... parse the file to get the MAC addresses
    return a_MAC_list

def log_MACs():
    MAC_list1, MAC_list2 = get_MAC_from_file1(), get_MAC_from_file2()
    for a_MAC in MAC_list1:
        if a_MAC in MAC_list2:
            # ... write your logs
            pass
If the data set is large, use a dict or set instead of a list, together with an intersection operation. But as it's MAC addresses, I guess your dataset is not that large, so keeping the script easy to read is the most important thing.
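A minimal sketch of that set-based variant, reusing the placeholder parsing functions from the framework above (the log file name is illustrative):
def log_MACs():
    # sets make the membership test O(1) and give a ready-made intersection
    macs1 = set(get_MAC_from_file1())
    macs2 = set(get_MAC_from_file2())
    with open('mac_report.log', 'w') as log:
        for a_MAC in macs1 & macs2:  # MACs that appear in both files
            log.write(a_MAC + '\n')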
Awk is perfect for this
{
    mac = $1                                # assuming the mac addresses are in the first column
    do_grep = "grep " mac " otherfilename"  # we'll use grep to check if the mac address is in the other file
    do_grep | getline mac_in_other_file     # pipe the output of the grep command into a new variable
    close(do_grep)                          # close the pipe
    if (mac_in_other_file != "") {          # if grep found the mac address in the other file
        print mac > "naughty_macs.log"      # append the mac address to the log file
    }
}
Then you'd run that on the first file:
awk -f logging_script.awk mac_list.txt
(this code is untested and I'm not the greatest awk hacker, but it should give the general idea)
For the purpose of the example, generate two files that look like yours.
File1:
for i in `seq 100`; do
    echo -e "user$i\tmachine$i\t192.168.0.$i\tmac$i";
done > file1.csv
File2 (contains random entries of "mac addresses" numbered from 1-200)
for j in `seq 100`; do
    i=$(($RANDOM % 200));
    echo -e "mac$i\tmachine$i\tuser$i";
done > file2.csv
Simplest approach would be to use join command and do a join on the appropriate field. This approach has the advantage that fields from both files would be available in the output.
Based on the example files above, the command would look like this:
join -1 4 -2 1 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
join needs the input to be sorted by the field you are matching, that's why the sort is there (-k tells which column to use)
The command above matches rows from file1.csv with rows from file2.csv if column 4 in the first file is equal with column 1 from the second file.
If you only need specific fields, you can specify the output format to the join command:
join -1 4 -2 1 -o 1.4,1.2 <(sort -k4 file1.csv) <(sort -k1 file2.csv)
This would print only the mac address and the machine field from the first file.
If you only need a list of matching mac addresses, you can use uniq or sort -u. Since the join output will be sorted by mac, uniq is faster. But if you need a unique list of another field, sort -u is better.
If you only need the mac addresses that match, grep can accept patterns from a file, and you can use cut to extract only the forth field.
fgrep -f<(cut -f4 file1.csv) file2.csv
The above lists all the lines in file2.csv that contain a mac address from file1.
Note that I'm using fgrep, which doesn't do pattern matching. Also, if file1 is big, this may be slower than the first approach. It also assumes that the mac is present only in the first field of file2 and that the other fields don't contain mac addresses.
If you only need the mac, you can either use the -o option of fgrep (though there are grep variants that don't have it), or you can pipe the output through cut and then sort -u:
fgrep -f<(cut -f4 file1.csv) file2.csv | cut -f1 | sort -u
This would be the bash way.
Python and awk hints have been shown above, I will take a stab at perl:
#!/usr/bin/perl -w
use strict;

open F1, $ARGV[0];
my %searched_mac_addresses = map {chomp; (split /\t/)[3] => 1 } <F1>;
close F1;

open F2, $ARGV[1];
while (<F2>) {
    print if $searched_mac_addresses{(split "\t")[0]};
}
close F2;
First you create a dictionary containing all the mac addresses from the first file:
my %searched_mac_addresses = map {chomp; (split /\t/)[3] => 1 } <F1>;
reads all the lines from file1
chomp removes the end of line
split splits the line based on tab; you can use a more complex regexp if needed
() around split force list context
[3] selects the fourth field
map runs a piece of code for all elements of the array
=> generates a dictionary (hash in Perl's terminology) element instead of an array
Then you read line by line the second file, and check if the mac exists in the above dictionary:
while (<F2>) {
print if $searched_mac_addresses{(split "\t")[0]}
}
while (<F2>) will read the file F2 and put each line in the $_ variable
print without any parameters prints the default variable $_
if can be used as a postfix modifier on an instruction
dictionary elements can be accessed via {}
split by default splits the $_ default variable
