Wildcard or * for matching a datetime in Python 2.7

I am trying to match the following string and not having any luck. Below you will find my attempt.
LOG FORMAT:
riskserver.2014-04-07-08:45:01.log
I think I will only need the year, month, and day, so I was attempting a wildcard *, which Python 2.7 does not seem to like.
cmd = 'tail -n10000 /opt/rubedo/log/riskserver.'+nowFormat+'*'
Help is very much appreciated here. Thanks, I hope I explained this well and someone can understand.
I am using subprocess with grep involved.
tail: cannot open `/opt/rubedo/log/riskserver.2014-04-08' for reading: No such file or directory
grep: not: No such file or directory
EDIT:
now = datetime.datetime.now().strftime("%H:%M:%S")
nowFormat = datetime.datetime.now().strftime("%Y\-%m\-%d")

The glob is the splat (*): e.g. ls *.txt gets processed by the Linux shell into ls f1.txt f2.txt f3.txt f4.txt ..., so that ls actually receives a list of files that match, not the matching string. That is what they mean in the comments.
nowFormat = "2014-04-07"
cmd = 'tail -n10000 /opt/rubedo/log/riskserver.' + nowFormat + '*'
os.system(cmd)  # executes the command through your Linux shell; you should see the output, although this call will not give you access to the output in Python
or in python
import glob
fnames = glob.glob('/opt/rubedo/log/riskserver.'+nowFormat+'*')
print fnames
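Putting that together with the subprocess-and-grep setup mentioned in the question (a sketch; the grep pattern 'PATTERN' is a placeholder for whatever you are searching for):
import datetime
import glob
import subprocess

nowFormat = datetime.datetime.now().strftime("%Y-%m-%d")  # no backslashes needed in the format
for fname in glob.glob('/opt/rubedo/log/riskserver.' + nowFormat + '*'):
    # tail the last 10000 lines of each matching log and pipe them through grep, no shell needed
    tail_p = subprocess.Popen(['tail', '-n10000', fname], stdout=subprocess.PIPE)
    grep_p = subprocess.Popen(['grep', 'PATTERN'], stdin=tail_p.stdout,
                              stdout=subprocess.PIPE)
    out, _ = grep_p.communicate()
    print out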

Related

Running bash on python to read from S3 Bucket and saving output

I'm trying to run the following bash command in Python and save its output into a variable. I'm new to using bash, so any help will be appreciated.
Here's my use case: I have data stored in an S3 bucket (let's say the path is s3://test-bucket/folder1/subd1/datafiles/)
in the datafiles folder there are multiple data files:
a1_03_27_2020_N.csv
a1_04_05_2021_O.csv
a1_07_16_2021_N.csv
I'm trying to select the latest file (in this case a1_07_16_2021_N) and then read that data file using pandas
Here's what I have so far
The command to select the latest file
ls -t a1*|head -1
but then I'm not sure how to
1- run that command in Python
2- save the output as a variable
(I know this is not correct, but something like
latest_file = os.environ['ls -t a1*|head -1'])
Then read the file:
df = pd.read_csv(latest_file)
Thank you in advance again!
Python replaces most shell functionality. You can do the search and filtering in Python itself; no need for a call-out to the shell.
from pathlib import Path
dir_to_search = Path("test-bucket/folder1/subd1/datafiles/")
try:
    latest = max(dir_to_search.glob("a1*.csv"), key=lambda path: path.stat().st_mtime)
    print(latest)
except ValueError:
    print("no csv here")
But if you want to run the shell, several functions in subprocess will do it. For instance,
import subprocess as subp
result = subp.run("ls -t test-bucket/folder1/subd1/datafiles/a1* | head -1",
                  shell=True,
                  capture_output=True, text=True).stdout.strip()
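Note that both snippets above can only see files that exist on the local filesystem; ls and pathlib cannot look inside an S3 bucket. For the actual S3 use case, a sketch using boto3 (assuming credentials are configured, at least one object matches the prefix, and s3fs is installed so pandas can read s3:// URLs):
import boto3
import pandas as pd

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="test-bucket", Prefix="folder1/subd1/datafiles/a1")
# pick the most recently modified object, mirroring ls -t | head -1
latest = max(resp["Contents"], key=lambda obj: obj["LastModified"])
df = pd.read_csv("s3://test-bucket/" + latest["Key"])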

I am trying to print the last line of every file in a directory using a shell command from a Python script

I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array.
Here is the piece of code I have written.
import os
temp = os.system('ls -l /home/demo/ | wc -l')
no_of_files = temp - 1
command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"
file_list = [os.system(command)]
for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Your shell scripting is orders of magnitude too complex.
import subprocess
output = subprocess.check_output('tail -qn1 *', shell=True)
or if you really prefer,
os.system('tail -qn1 *')
which however does not capture the output in a Python variable.
If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:
output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:
for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)
This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).
for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))
Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.
for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))
If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
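A rough sketch of that idea (assuming non-empty files; the initial window size is a guess that grows until it covers a full line):
import os

def last_line(path, guess=1024):
    # read only a window at the end of the file, doubling it
    # until the window holds at least one complete line
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        window = min(guess, size)
        while True:
            f.seek(size - window)
            chunk = f.read(window)
            if window == size or b'\n' in chunk[:-1]:
                break
            window = min(window * 2, size)
        return chunk.splitlines()[-1].decode()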
os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True.
Example:
>>> import subprocess
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")
Edit (as suggested by @tripleee): you probably don't want to do this, as it will get crazy. Python has great functions for things like this. For example:
>>> import glob
>>> names = glob.glob("/home/demo/*")
will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the count your first command was after.
Another option is:
>>> import os
>>> os.listdir("/home/demo")
Here, glob will give you the whole filepath /home/demo/file.txt, while os.listdir will just give you the filename file.txt.
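For illustration, assuming a hypothetical file /home/demo/file.txt:
>>> glob.glob("/home/demo/*")
['/home/demo/file.txt']
>>> os.listdir("/home/demo")
['file.txt']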
The ls -l /home/demo/ | wc -l command also doesn't give the correct value, since ls -l prints a "total X" summary line at the top (the total refers to disk blocks, not a file count) along with other info.
You could likely use a loop without much issue:
import os

files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    with open(f, 'rb') as fh:
        last = fh.readlines()[-1].decode()
    print('file: {0}\n{1}\n'.format(f, last))
Output:
file: file.txt
Hello, World!
...
If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:
import subprocess

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')
The decode is optional; for text, "utf-8" usually works, and if the files are a mix of binary and text then something such as "iso-8859-1" should usually work.
You are not able to store the file names because os.system does not return the output as you expect it to.
From the docs
On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.
os.system executes Linux shell commands as-is. To capture the output of these shell commands you have to use the Python subprocess module.
Note: in your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory.

Chain of UNIX commands within Python

I'd like to execute the following UNIX command in Python:
cd 2017-02-10; pwd; echo missing > 123.txt
The date directory DATE = 2017-02-10 and OUT = 123.txt are already variables in Python so I have tried variations of
call("cd", DATE, "; pwd; echo missing > ", OUT)
using the subprocess.call function, but I'm struggling to find documentation for running multiple UNIX commands at once, which are normally separated by ; or involve redirection with >.
Doing the commands on separate lines in Python doesn't work either, because it "forgets" what was executed on the previous line and essentially resets.
You can pass a shell script as a single argument, with strings to be substituted as out-of-band arguments, as follows:
import subprocess

date = '2017-02-10'
out = '123.txt'
subprocess.call(
    ['cd "$1"; pwd; echo missing >"$2"',  # shell script to run
     '_',   # $0 for that script
     date,  # $1 for that script
     out,   # $2 for that script
    ], shell=True)
This is much more secure than substituting your date and out values into a string which is evaluated by the shell as code, because these values are treated as literals: A date of $(rm -rf ~) will not in fact try to delete your home directory. :)
Doing the commands on separate lines in Python doesn't work either because it "forgets" what was executed on the previous line and essentially resets.
This is because if you have separate calls to subprocess.call it will run each command in its own shell, and the cd call has no effect on the later shells.
One way around that would be to change the directory in the Python script itself before doing the rest. Whether or not this is a good idea depends on what the rest of the script does. Do you really need to change directory? Why not just write "missing" to 2017-02-10/123.txt from Python directly? Why do you need the pwd call?
Assuming you're looping through a list of directories and want to output the full path of each and also create files with "missing" in them, you could perhaps do this instead:
import os
base = "/path/to/parent"
for DATE, OUT in [["2017-02-10", "123.txt"], ["2017-02-11", "456.txt"]]:
    date_dir = os.path.join(base, DATE)
    print(date_dir)
    out_path = os.path.join(date_dir, OUT)
    out = open(out_path, "w")
    out.write("missing\n")
    out.flush()
    out.close()
The above could use some error handling in case you don't have permission to write to the file or the directory doesn't exist, but your shell commands don't have any error handling either.
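For instance, a minimal sketch of that error handling around the file write:
try:
    with open(out_path, "w") as out:
        out.write("missing\n")
except (IOError, OSError) as exc:
    print("could not write {0}: {1}".format(out_path, exc))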
>>> date = "2017-02-10"
>>> command = "cd " + date + "; pwd; echo missing > 123.txt"
>>> import os
>>> os.system(command)
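One more option worth noting (a sketch; subprocess.run requires Python 3.5+): the cwd argument of subprocess removes the need for cd entirely:
import subprocess

# the shell snippet runs with 2017-02-10 as its working directory
subprocess.run("pwd; echo missing > 123.txt", shell=True, cwd="2017-02-10")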

Escape $ in filename

I have a list of files on my filesystem which I'd like to chmod to 664 via python.
One of the filenames/dirpaths (I am not allowed to change the filename nor the dirpaths!!!) is:
/home/media/Music/Ke$ha/song.mp3 (NOTE $ is a literal, not a variable!)
I receive the files in a list: ['/some/path/file1', '/some/otherpath/file2', etc...]
If I try to run the following code:
files = ['/home/media/Music/Ke$ha/song.mp3']
for file in files:
    os.chmod(file, 0664)
It complains that it cannot find /home/media/Music/Ke$ha/song.mp3. Most likely (I guess) because the called shell tries to expand $ha, which is obviously wrong.
The 'Ke$ha' file is just an example, there are many more files with escape characters in it (e.g. /home/media/Music/Hill's fire/song.mp3)
The question I have is: How can I elegantly convince python and/or the shell to handle these files properly?
Kind regards,
Robert Nagtegaal.
You can do it like this:
import os, re

files = ["/home/media/Music/Ke$ha/song.mp3", "/home/media/Music/Hill's fire/song.mp3"]
for f in files:
    os.system("chmod 777 " + re.escape(f))
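An aside, not from the original answer: re.escape happens to backslash-escape shell metacharacters here, but the purpose-built helper is pipes.quote (shlex.quote on Python 3), and os.chmod avoids the shell entirely. A sketch:
import os
import pipes  # on Python 3: from shlex import quote

files = ["/home/media/Music/Ke$ha/song.mp3", "/home/media/Music/Hill's fire/song.mp3"]
for f in files:
    os.system("chmod 664 " + pipes.quote(f))  # properly shell-quoted
    # or skip the shell altogether:
    # os.chmod(f, 0664)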
How about this, a raw string? Also is your username 'media'?
files = [r'/home/media/Music/Ke$ha/song.mp3']

Capture output of complex shell command in python

I would like to embed a command in a python script and capture the output. In this scenario I'm trying to use "find" to find an indeterminate number of files in an indeterminate number of subdirs, and grep each matching file for a string, something like:
grep "rabbit" `find . -name "*.txt"`
I'm running Python 2.6.6 (yeah, I'm sorry too, but can't budge the entire organization for this right now).
I've tried a bunch of things using subprocess, shlex, etc. that have been suggested on here, but I haven't found a syntax that will either swallow this, or it ends up sucking in the "find ..." part as the search string for grep, etc. Suggestions appreciated.
Ken
import subprocess

# find emits matching file names; xargs turns them into arguments for grep
find_p = subprocess.Popen(["find", ".", "-name", "*.txt"], stdout=subprocess.PIPE)
grep_p = subprocess.Popen(["xargs", "grep", "rabbit"], stdin=find_p.stdout)
find_p.stdout.close()  # let find receive SIGPIPE if grep exits early
grep_p.wait()
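To actually capture the matches in a Python variable rather than letting them print to the terminal, a variant of the same pipeline (still Python 2.6-friendly):
import subprocess

find_p = subprocess.Popen(["find", ".", "-name", "*.txt"], stdout=subprocess.PIPE)
grep_p = subprocess.Popen(["xargs", "grep", "rabbit"],
                          stdin=find_p.stdout, stdout=subprocess.PIPE)
find_p.stdout.close()  # so find sees SIGPIPE if grep exits early
output, _ = grep_p.communicate()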
