Order a sequence of dates as they occur in the calendar year - python

I've got a series of pipes that convert the dates in a text file into unique, human-readable output and pull out MM DD. Now I would like to re-sort the output so that the dates appear in the order in which they occur during the year. Does anybody know a good technique using the standard shell or a readily installable package on *nix?
Feb 4
Feb 5
Feb 6
Feb 7
Feb 8
Jan 1
Jan 10
Jan 11
Jan 12
Jan 13
Jan 2
Jan 25
Jan 26
Jan 27
Jan 28
Jan 29
Jan 3
Jan 30
Jan 31
Jan 4
Jan 5
Jan 6
Jan 7
Jan 8
Jan 9

There is a utility called sort with an option -M for sorting by month. If you have it installed, you could use that. For instance:
sort -k1 -M test.txt
-k1: First column
-M: Sort by month
Edited per twalberg's suggestion, sorting the second column numerically as well:
sort -k1,1M -k2,2n test.txt
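If sort -M is unavailable, the same ordering can be reproduced in Python; a minimal sketch (assuming the dates sit one per line in test.txt, with English month abbreviations):
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Emulate `sort -k1,1M -k2,2n`: sort key = (month index, numeric day)
def month_day(line):
    mon, day = line.split()
    return (months.index(mon), int(day))

with open('test.txt') as f:
    for line in sorted(f, key=month_day):
        print(line.rstrip())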

In two steps:
$ while read line; do date -d "$line" "+%Y%m%d"; done < file | sort -n > temp
$ while read line; do date -d "$line" "+%b %d"; done < temp > file
First, we convert the dates to YYYYMMDD and sort them:
$ while read line; do date -d "$line" "+%Y%m%d"; done < file | sort -n > temp
$ cat temp
20130101
20130102
20130103
20130104
20130105
20130106
20130107
20130108
20130109
20130110
20130111
20130112
20130113
20130125
20130126
20130127
20130128
20130129
20130130
20130131
20130204
20130205
20130206
20130207
20130208
Then we print them back in the previous %b %d format:
$ while read line; do date -d "$line" "+%b %d"; done < temp > file
$ cat file
Jan 01
Jan 02
Jan 03
Jan 04
Jan 05
Jan 06
Jan 07
Jan 08
Jan 09
Jan 10
Jan 11
Jan 12
Jan 13
Jan 25
Jan 26
Jan 27
Jan 28
Jan 29
Jan 30
Jan 31
Feb 04
Feb 05
Feb 06
Feb 07
Feb 08
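For comparison, a minimal sketch of the same convert-sort-convert round trip in Python (like date -d, strptime fills in a default year, which is harmless here since only month and day matter; English month abbreviations assumed):
from datetime import datetime

# Step 1: parse each "%b %d" line into a datetime and sort chronologically.
with open('file') as f:
    dates = [datetime.strptime(line.strip(), '%b %d')
             for line in f if line.strip()]

# Step 2: print back in the original "%b %d" format (days zero-padded).
for d in sorted(dates):
    print(d.strftime('%b %d'))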

An alternative with sed (reading the list on standard input):
sed -n "1 {
H
x
s/.*\(\n\).*/01 Jan\102 Feb\103 Mar\104 Apr\105 May\106 Jun\107 Jul\108 Aug\109 Sep\110 Oct\111 Nov\112 Dec/
x
}
s/^\(.\{3\}\) \([0-9]\) *$/\1 0\2/
H
$ {
x
t subs
: subs
s/^\([0-9]\{2\}\) \([[:alpha:]]\{3\}\)\(\n\)\(.*\)\n\2/\1 \2\3\4\3\1 \2/
t subs
s/^[0-9]\{2\} [[:alpha:]]\{3\}\n//
t subs
p
}
" | sort | sed "s/^[0-9][0-9] //"
This still needs a sort (or a considerably more complex sed to do the sorting itself), but it works when sort -M isn't available.

Related

python compressed 4Gb bz2 EOFError: end of stream was already found nested subfolders

I'm trying to read a specific file from a bz2-compressed tar file using Python.
tar = tarfile.open(filename, "r|bz2", bufsize=57860311)
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
        # read the file
        f = tar.extractfile(tarinfo)
        #print f.read()
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
But at the end I got the error:
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in read(self, size)
577 buf = "".join(t)
578 else:
--> 579 buf = self._read(size)
580 self.pos += len(buf)
581 return buf
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in _read(self, size)
594 break
595 try:
--> 596 buf = self.cmp.decompress(buf)
597 except IOError:
598 raise ReadError("invalid compressed data")
EOFError: end of stream was already found
I also tried to list the files within the tar through tar.list(), and again got the same error after the first few entries:
-rwxr-xr-x lindauer/or3uunp 0 2013-05-21 00:58:36 r3.2/
-rw-r--r-- lindauer/or3uunp 6057 2012-01-05 14:41:00 r3.2/readme.txt
-rw-r--r-- lindauer/or3uunp 44732 2012-01-04 10:08:54 r3.2/psychometric.csv
-rw-r--r-- lindauer/or3uunp 57860309 2012-01-04 09:58:20 r3.2/logon.csv
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in _read(self, size)
594 break
595 try:
--> 596 buf = self.cmp.decompress(buf)
597 except IOError:
598 raise ReadError("invalid compressed data")
EOFError: end of stream was already found
I listed the files inside the archive using the tar command. Here is the result:
tar -tvf r3.2.tar.bz2
drwxr-xr-x 0 lindauer or3uunp 0 May 21 2013 r3.2/
-rw-r--r-- 0 lindauer or3uunp 6057 Jan 5 2012 r3.2/readme.txt
-rw-r--r-- 0 lindauer or3uunp 44732 Jan 4 2012 r3.2/psychometric.csv
-rw-r--r-- 0 lindauer or3uunp 57860309 Jan 4 2012 r3.2/logon.csv
-rw-r--r-- 0 lindauer or3uunp 12494829865 Jan 5 2012 r3.2/http.csv
-rw-r--r-- 0 lindauer or3uunp 1066622500 Jan 5 2012 r3.2/email.csv
-rw-r--r-- 0 lindauer or3uunp 218962503 Jan 5 2012 r3.2/file.csv
-rw-r--r-- 0 lindauer or3uunp 29156988 Jan 4 2012 r3.2/device.csv
drwxr-xr-x 0 lindauer or3uunp 0 May 20 2013 r3.2/LDAP/
-rw-r--r-- 0 lindauer or3uunp 140956 Jan 4 2012 r3.2/LDAP/2011-01.csv
-rw-r--r-- 0 lindauer or3uunp 147370 Jan 4 2012 r3.2/LDAP/2010-05.csv
-rw-r--r-- 0 lindauer or3uunp 149221 Jan 4 2012 r3.2/LDAP/2010-02.csv
-rw-r--r-- 0 lindauer or3uunp 141717 Jan 4 2012 r3.2/LDAP/2010-12.csv
-rw-r--r-- 0 lindauer or3uunp 148931 Jan 4 2012 r3.2/LDAP/2010-03.csv
-rw-r--r-- 0 lindauer or3uunp 147370 Jan 4 2012 r3.2/LDAP/2010-04.csv
-rw-r--r-- 0 lindauer or3uunp 149793 Jan 4 2012 r3.2/LDAP/2009-12.csv
-rw-r--r-- 0 lindauer or3uunp 143979 Jan 4 2012 r3.2/LDAP/2010-09.csv
-rw-r--r-- 0 lindauer or3uunp 145591 Jan 4 2012 r3.2/LDAP/2010-07.csv
-rw-r--r-- 0 lindauer or3uunp 139444 Jan 4 2012 r3.2/LDAP/2011-03.csv
-rw-r--r-- 0 lindauer or3uunp 142347 Jan 4 2012 r3.2/LDAP/2010-11.csv
-rw-r--r-- 0 lindauer or3uunp 138285 Jan 4 2012 r3.2/LDAP/2011-04.csv
-rw-r--r-- 0 lindauer or3uunp 149793 Jan 4 2012 r3.2/LDAP/2010-01.csv
-rw-r--r-- 0 lindauer or3uunp 146008 Jan 4 2012 r3.2/LDAP/2010-06.csv
-rw-r--r-- 0 lindauer or3uunp 144711 Jan 4 2012 r3.2/LDAP/2010-08.csv
-rw-r--r-- 0 lindauer or3uunp 137967 Jan 4 2012 r3.2/LDAP/2011-05.csv
-rw-r--r-- 0 lindauer or3uunp 140085 Jan 4 2012 r3.2/LDAP/2011-02.csv
-rw-r--r-- 0 lindauer or3uunp 143420 Jan 4 2012 r3.2/LDAP/2010-10.csv
-r--r--r-- 0 lindauer or3uunp 3923 Jan 4 2012 r3.2/license.txt
I think this is because the archive has subfolders, and for some reason the Python libraries have problems dealing with extraction from subfolders?
I also tried opening the tar file manually and had no problems, so I don't think the file is corrupted. Any help appreciated.
Comment: I tried debug=3 and I get: ReadError: bad checksum
I found the following related information:
tar: directory checksum error
Cause
This error message from tar(1) indicates that the checksum of the directory and the files it has read from tape does not match the checksum advertised in the header block. Usually this message indicates the wrong blocking factor, although it could indicate corrupt data on tape.
Action
To resolve this problem, make certain that the blocking factor you specify on the command line (after -b) matches the blocking factor originally specified. If in doubt, leave out the block size and let tar(1) determine it automatically. If that remedy does not help, the tape data could be corrupted.
SE:tar-ignore-or-fix-checksum
I'd try the -i switch to see if you can just ignore any messages regarding EOF.
-i, --ignore-zeros ignore zeroed blocks in archive (means EOF)
Example
$ tar xivf backup.tar
bugs.python.org:tarfile-headererror
The comment in tarfile.py reads (Don't know the date of the file!):
# We shouldn't rely on this checksum, because some tar programs
# calculate it differently and it is merely validating the
# header block.
ReadError: unexpected end of data
From the tarfile Documentation
The tarfile module defines the following exceptions:
exception tarfile.ReadError
Is raised when a tar archive is opened, that either cannot be handled by the tarfile module or is somehow invalid.
First, try with another tar archive file to verify your Python environment.
Second, check whether your tar archive file matches the following format:
tarfile.DEFAULT_FORMAT
The default format for creating archives. This is currently GNU_FORMAT.
Third, instead of using tarfile.open(...) to create a TarFile instance, try the following to set debug=3:
tar = tarfile.TarFile(name=filename, debug=3)
...
class tarfile.TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
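TarFile by itself won't decompress bz2, so for this particular archive it is easier to pass the same options through tarfile.open, which forwards extra keyword arguments to the TarFile constructor. A minimal sketch combining debug with the ignore_zeros idea from above:
import tarfile

# ignore_zeros=True is the tarfile counterpart of tar's --ignore-zeros:
# zeroed blocks (normally treated as end-of-archive) are skipped.
# debug=3 prints verbose diagnostics while members are read.
tar = tarfile.open("r3.2.tar.bz2", "r:bz2", debug=3, ignore_zeros=True)
for member in tar:
    print(member.name)
tar.close()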

Python gzip gives null bytes

I'm trying to parse some log files in Python, but reading them always returns only null bytes.
I've confirmed that the file in question does contain data:
$ zcat Events.log.gz | wc -c
188371128
$ zcat Events.log.gz | head
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
But attempting to read it in Python gives only null bytes:
$ python
Python 2.6.9 (unknown, Sep 14 2016, 17:46:59)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> filename = 'Events.log.gz'
>>> import gzip
>>> content = gzip.open(filename).read()
>>> len(content)
188371128
>>> for i in range(10):
... content[i*10000:(i*10000)+10]
...
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've tried explicitly setting 'mode' to either 'r' or 'rb', with no difference in result.
I've also tried subprocess.Popen(['zcat', filename], stdout=subprocess.PIPE).stdout.read(), with the same response.
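For reference, a consolidated sketch of the two read paths just described, sampling the same leading bytes from each:
import gzip
import subprocess

filename = 'Events.log.gz'

# Path 1: the gzip module.
via_gzip = gzip.open(filename, 'rb').read()

# Path 2: an external zcat process.
proc = subprocess.Popen(['zcat', filename], stdout=subprocess.PIPE)
via_zcat = proc.stdout.read()

print(repr(via_gzip[:60]))
print(repr(via_zcat[:60]))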
Perhaps relevantly, when I tried to zcat the file to another file, the output was a binary file:
$ zcat Events.log.gz > /tmp/logoutput
$ less /tmp/logoutput
"/tmp/logoutput" may be a binary file. See it anyway?
[y]
^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#...
$ head /tmp/logoutput
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}

Python regex is not extracting a substring from my log file

I'm using
date = re.findall(r"^(?:\w{3} ){2}\d{2} (?:[\d]{2}:){2}\d{2} \d{4}$", message)
in Python 2.7 to extract the substrings:
Wed Feb 04 13:29:49 2015
Thu Feb 05 13:45:08 2015
from a log file like this:
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29
It is not working, and I'm required to use regex for this task, otherwise I would have split() it. What am I doing wrong?
As your substrings don't begin at the start of the string, you don't need to assert position at the start and end of the string, so you can remove the ^ and $:
>>> s ="""
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29"""
>>> date = re.findall(r"(?:\w{3} ){2}\d{2} (?:[\d]{2}:){2}\d{2} \d{4}", s)
>>> date
['Wed Feb 04 13:29:49 2015', 'Thu Feb 05 13:45:08 2015']
Also, as an alternative, you can just use a positive look-behind:
>>> date = re.findall(r"(?<=\d{4},).*", s)
>>> date
['Wed Feb 04 13:29:49 2015,51', 'Thu Feb 05 13:45:08 2015,29']
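Note that the look-behind version keeps the trailing field (',51' and ',29'). A lazy match up to the next comma (a variation on the answer's pattern, not in the original) trims it:
>>> re.findall(r"(?<=\d{4},).*?(?=,)", s)
['Wed Feb 04 13:29:49 2015', 'Thu Feb 05 13:45:08 2015']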
Or, without using regex, you can use str.split() and str.partition() for such tasks:
>>> s ="""
1424,Wed Feb 04 13:29:49 2015,51
1424,Thu Feb 05 13:45:08 2015,29"""
>>> [i.partition(',')[-1] for i in s.split('\n')]
['Wed Feb 04 13:29:49 2015,51', 'Thu Feb 05 13:45:08 2015,29']
A simple way to do this is to just match between the commas:
message = '1424,Wed Feb 04 13:29:49 2015,51 1424,Thu Feb 05 13:45:08 2015,29'
date = re.findall(r",(.*?),", message)
print date
>>> ['Wed Feb 04 13:29:49 2015', 'Thu Feb 05 13:45:08 2015']
You don't need regex; use split:
line = "1424,Wed Feb 04 13:29:49 2015,51"
date = line.split(",")[1]
print date
>>>Wed Feb 04 13:29:49 2015

Using re.search in python filter function

I am not able to use re.search inside a filter expression.
I am trying to use re.search to extract the href values from a list where each element is a line of HTML.
Here is what I am doing:
>>> filter(lambda html_line: re.search('.*a href=\"([^\"]*).*', html_line), data)
[u'Directory Feb 28 23:57 <b>2014.02.28</b>',
 u'Directory Mar 01 23:59 <b>2014.03.01</b>',
 u'Directory Mar 02 23:50 <b>2014.03.02</b>',
 u'Directory Mar 03 23:59 <b>2014.03.03</b>',
 u'Directory Mar 04 23:50 <b>2014.03.04</b>',
 u'Directory Mar 05 23:50 <b>2014.03.05</b>',
 u'Directory Mar 06 23:50 <b>2014.03.06</b>',
 u'Directory Mar 07 23:50 <b>2014.03.07</b>',
 u'Directory Mar 08 23:50 <b>2014.03.08</b>']
My re.search call seems to be working correctly.
For example, this works:
>>> for html_line in data:
...     print re.search('.*a href=\"([^\"]*).*', html_line).group(1)
/MyApp/LogBrowser?type=crawler/2014.02.28
/MyApp/LogBrowser?type=crawler/2014.03.01
/MyApp/LogBrowser?type=crawler/2014.03.02
/MyApp/LogBrowser?type=crawler/2014.03.03
/MyApp/LogBrowser?type=crawler/2014.03.04
/MyApp/LogBrowser?type=crawler/2014.03.05
/MyApp/LogBrowser?type=crawler/2014.03.06
/MyApp/LogBrowser?type=crawler/2014.03.07
/MyApp/LogBrowser?type=crawler/2014.03.08
filter will only filter the items; it won't return the href value. You can use a list comprehension for this:
r = re.compile(r'.*a href=\"([^\"]*).*')
data = [x.group(1)
        for x in (r.search(html_line) for html_line in data)
        if x is not None]
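To make the difference concrete, a small sketch with a made-up html_line (the data value here is hypothetical):
import re

# Hypothetical input line, shaped like the ones in the question.
data = [u'Directory Feb 28 23:57 <b><a href="/MyApp/LogBrowser?type=crawler/2014.02.28">2014.02.28</a></b>']

r = re.compile(r'.*a href=\"([^\"]*).*')

# filter() keeps the original lines whose predicate is truthy...
print(filter(lambda html_line: r.search(html_line), data))

# ...while the comprehension maps each line to its captured group.
print([m.group(1) for m in (r.search(line) for line in data)
       if m is not None])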

Python time delta

Please let me know if I am doing this part correctly. I am trying to grab ONLY files modified in the past 24 hours, but my output is ALL files in the directory, regardless of modified time:
yesterday = date.today() - timedelta(days=1)
dayToStr = yesterday.strftime('%Y%m%d')
file_list_attr = sftp.listdir_attr()
for file in file_list_attr:
    if file.st_mtime <= dayToStr:
        print file
Output
-rw-r--r-- 1 4012 60 3404961 09 Jan 18:32 2_YEAR_912828UD0_20130109.dat
-rw-r--r-- 1 4012 60 10206411 09 Jan 18:32 3_YEAR_912828UG3_20130109.dat
-rw-r--r-- 1 4012 60 68311760 09 Jan 18:34 5_YEAR_912828UE8_20130109.dat
-rw-r--r-- 1 4012 60 54215712 09 Jan 18:35 7_YEAR_912828UF5_20130109.dat
-rw-r--r-- 1 4012 60 88014103 09 Jan 18:37 10_YEAR_912828TY6_20130109.dat
-rw-r--r-- 1 4012 60 53565072 09 Jan 18:38 30_YEAR_912810QY7_20130109.dat
-rw-r--r-- 1 4012 60 8527412 04 Jan 18:31 2_YEAR_912828UD0_20130104.dat
-rw-r--r-- 1 4012 60 21659138 04 Jan 18:31 3_YEAR_912828UC2_20130104.dat
-rw-r--r-- 1 4012 60 91281894 04 Jan 18:34 5_YEAR_912828UE8_20130104.dat
-rw-r--r-- 1 4012 60 80421507 04 Jan 18:36 7_YEAR_912828UF5_20130104.dat
-rw-r--r-- 1 4012 60 108700356 04 Jan 18:38 10_YEAR_912828TY6_20130104.dat
-rw-r--r-- 1 4012 60 50204292 04 Jan 18:39 30_YEAR_912810QY7_20130104.dat
-rw-r--r-- 1 4012 60 2319656 07 Jan 18:24 2_YEAR_912828UD0_20130107.dat
-rw-r--r-- 1 4012 60 6978760 07 Jan 18:24 3_YEAR_912828UC2_20130107.dat
-rw-r--r-- 1 4012 60 53579177 07 Jan 18:25 5_YEAR_912828UE8_20130107.dat
-rw-r--r-- 1 4012 60 46069381 07 Jan 18:26 7_YEAR_912828UF5_20130107.dat
-rw-r--r-- 1 4012 60 70802355 07 Jan 18:28 10_YEAR_912828TY6_20130107.dat
-rw-r--r-- 1 4012 60 43050822 07 Jan 18:29 30_YEAR_912810QY7_20130107.dat
-rw-r--r-- 1 4012 60 2713906 08 Jan 18:31 2_YEAR_912828UD0_20130108.dat
-rw-r--r-- 1 4012 60 8889264 08 Jan 18:31 3_YEAR_912828UC2_20130108.dat
-rw-r--r-- 1 4012 60 63857903 08 Jan 18:32 5_YEAR_912828UE8_20130108.dat
-rw-r--r-- 1 4012 60 55544096 08 Jan 18:34 7_YEAR_912828UF5_20130108.dat
-rw-r--r-- 1 4012 60 89750161 08 Jan 18:36 10_YEAR_912828TY6_20130108.dat
-rw-r--r-- 1 4012 60 59233399 08 Jan 18:37 30_YEAR_912810QY7_20130108.dat
file.st_mtime is an integer timestamp.
dayToStr is a string.
In Python 2, integers always compare less than strings, for the rather arbitrary reason that the i in int comes before the s in str alphabetically:
In [123]: 1234 < 'foobar'
Out[123]: True
In Python 3, comparing an int to a str raises a TypeError:
>>> 1234 < 'foobar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
Instead, compare datetime objects:
import datetime as DT
import os
yesterday = DT.datetime.now() - DT.timedelta(days=1)
# or, if you want 00:00 AM yesterday:
# yesterday = DT.datetime.now().replace(hour=0, minute=0, second=0, microsecond=0) - DT.timedelta(days=1)
file_list_attr = sftp.listdir_attr()
for pfile in file_list_attr:
    if DT.datetime.fromtimestamp(pfile.st_mtime) > yesterday:
        print pfile
References:
datetime.fromtimestamp: This was used to convert the timestamp to a DT.datetime object.
datetime.replace: This was suggested for setting the hours, minutes, seconds (of yesterday) back to zero.
Comment: this appears to fail when comparing to yesterday (the TypeError below occurs because yesterday here is still the datetime.date from the original date.today() snippet, not the DT.datetime.now() version used in the answer):
for pfile in file_list_attr:
    print DT.datetime.fromtimestamp(pfile.st_mtime)
2013-01-09 18:32:06
2013-01-09 18:32:22
2013-01-09 18:34:07
2013-01-09 18:35:27
2013-01-09 18:37:38
for pfile in file_list_attr:
    print DT.datetime.fromtimestamp(pfile.st_mtime) > yesterday
Traceback (most recent call last):
File "<pyshell#41>", line 2, in <module>
print DT.datetime.fromtimestamp(pfile.st_mtime) > yesterday
TypeError: can't compare datetime.datetime to datetime.date
Here's an example of how you can:
list all the files in a directory
print all the files that meet the condition of having been modified within the past 24 hours
# Task: grab files ONLY modified in the past 24 hours
import os
import datetime

myPath = "/users/george/documents/"

# Adding all the files found in myPath to a collection
fileCollection = os.listdir(myPath)

# Iterating through the files, checking their last modified date
for i in fileCollection:
    # Getting the timestamp in a variable
    fileModTimeStamp = os.path.getmtime(myPath + str(i))
    fileModDateTime = datetime.datetime.fromtimestamp(fileModTimeStamp)

    # Calculating the time delta
    currentTime = datetime.datetime.now()
    timeElapsed = currentTime - fileModDateTime

    # 24h timedelta
    twentyFourHours = datetime.timedelta(hours=24)

    # Print the files that meet the condition
    if timeElapsed <= twentyFourHours:
        print "The File: " + str(i) + " Was Last Modified At: " + str(fileModDateTime) + ", which was about: " \
              + str(timeElapsed) + " ago."
I don't believe the os module will work, as I am using paramiko to SFTP to the remote host and perform actions on the files in that directory:
for filename in file_list_attr:
    mtime = os.path.getmtime(filename)
    print mtime
Traceback (most recent call last):
File "<pyshell#22>", line 2, in <module>
mtime = os.path.getmtime(filename)
File "U:\ActivPy\lib\genericpath.py", line 54, in getmtime
return os.stat(filename).st_mtime
TypeError: coercing to Unicode: need string or buffer, SFTPAttributes found
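Putting the pieces together for the paramiko case: each SFTPAttributes entry from listdir_attr() already carries st_mtime and filename, so no os.path call is needed. A minimal sketch (assuming sftp is an already-connected paramiko SFTPClient):
import datetime as DT

yesterday = DT.datetime.now() - DT.timedelta(days=1)

# Compare the remote mtime timestamps directly; no local stat() involved.
recent = [attr.filename
          for attr in sftp.listdir_attr()
          if DT.datetime.fromtimestamp(attr.st_mtime) > yesterday]

for name in recent:
    print(name)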
