I'm trying to parse some log files in Python, but every read returns only null bytes.
I've confirmed that the file in question does contain data:
$ zcat Events.log.gz | wc -c
188371128
$ zcat Events.log.gz | head
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
But attempting to read it in Python gives only null bytes:
$ python
Python 2.6.9 (unknown, Sep 14 2016, 17:46:59)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> filename = 'Events.log.gz'
>>> import gzip
>>> content = gzip.open(filename).read()
>>> len(content)
188371128
>>> for i in range(10):
...     content[i*10000:(i*10000)+10]
...
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've tried explicitly setting 'mode' to either 'r' or 'rb', with no difference in result.
I've also tried subprocess.Popen(['zcat', filename], stdout=subprocess.PIPE).stdout.read(), with the same result.
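To narrow down where the nulls begin, here is a minimal diagnostic sketch (same Python 2 style as the session above; a check, not a fix) that scans the decompressed stream for the first non-null byte, to see whether any of the real data is reachable from Python at all:

import gzip

# Diagnostic: read the stream in chunks and report where (if anywhere)
# the first non-null byte appears.
f = gzip.open('Events.log.gz', 'rb')
offset = 0
found = False
while True:
    chunk = f.read(65536)
    if not chunk:
        break
    stripped = chunk.lstrip('\x00')
    if stripped:
        found = True
        print 'first non-null byte at offset %d' % (offset + len(chunk) - len(stripped))
        print 'data begins: %r' % stripped[:60]
        break
    offset += len(chunk)
f.close()
if not found:
    print 'decompressed stream is all null bytes'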
Perhaps relevantly, when I tried to zcat the file to another file, the output was a binary file:
$ zcat Events.log.gz > /tmp/logoutput
$ less /tmp/logoutput
"/tmp/logoutput" may be a binary file. See it anyway?
[y]
^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#...
$ head /tmp/logoutput
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
I have a text file that is output from a command I ran with Netmiko to retrieve data from a Cisco WLC about things causing interference on our WiFi network. I stripped out just what I needed, going from the original 600k lines of output down to a couple thousand lines like this:
AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017
WiMax Mobile 11 0 -84 Fri Dec 15 17:09:45 2017
WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven 11 10 -61 Sat Dec 16 11:20:36 2017
WiMax Fixed 11 0 -78 Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven 11 10 -55 Mon Dec 18 07:51:23 2017
WiMax Mobile 11 0 -83 Wed Dec 13 16:16:26 2017
The goal is to end up with a CSV file that strips out the 'AP Name ...' prefix and puts what's left on the same line as the rest of the information from the lines below it. The problem is that some APs have two lines below the name, and some have one or none. I have been at it for 8 hours and cannot find the best way to make this happen.
This is the latest version of the code I was trying; any suggestions for making it work? I just want something I can load into Excel and build a report from:
with open(outfile_name, 'w') as out_file:
    with open('wlc-interference_raw.txt', 'r') as in_file:
        #Variables
        _ap_name = ''
        _temp = ''
        _flag = False
        for i in in_file:
            if 'AP Name' in i:
                #write whatever was put in the temp file to disk because new ap now
                #add another temp variable in case an ap has more than 1 interferer and check if new AP name
                out_file.write(_temp)
                out_file.write('\n')
                #print(_temp)
                _ap_name = i.lstrip('AP Name.......................................... ')
                _ap_name = _ap_name.rstrip('\n')
                _temp = _ap_name
                #print(_temp)
            elif '----' in i:
                pass
            elif 'Class Type' in i:
                pass
            else:
                line_split = i.split()
                for x in line_split:
                    _temp += ','
                    _temp += x
                _temp += '\n'
I think your best option is to read all lines of the file, then split into sections starting with AP Name. Then you can work on parsing each section.
Example
s = """AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017
WiMax Mobile 11 0 -84 Fri Dec 15 17:09:45 2017
WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven 11 10 -61 Sat Dec 16 11:20:36 2017
WiMax Fixed 11 0 -78 Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven 11 10 -55 Mon Dec 18 07:51:23 2017
WiMax Mobile 11 0 -83 Wed Dec 13 16:16:26 2017"""
import re

class AP:
    """
    A class holding each section of the parsed file
    """
    def __init__(self):
        self.header = ""
        self.content = []

sections = []
section = None
for line in s.split('\n'):  # Or 'for line in file:'
    # Starting new section
    if line.startswith('AP Name'):
        # If previously had a section, add to list
        if section is not None:
            sections.append(section)
        section = AP()
        section.header = line
    else:
        if section is not None:
            section.content.append(line)
sections.append(section)  # Add last section outside of loop

for section in sections:
    ap_name = section.header.lstrip("AP Name.")  # lstrip strips a set of characters, not a literal prefix
    for line in section.content:
        print(ap_name + ",", end="")
        # You can extract the date separately, if needed
        # Splitting on more than one space using a regex
        line = ",".join(re.split(r'\s\s+', line))
        print(line.rstrip(','))  # Remove trailing comma from imperfect split
Output
010-HIGH-FL4-AP04,Microwave Oven,11,10,-59,Mon Dec 18 08:21:23 2017
010-HIGH-FL4-AP04,WiMax Mobile,11,0,-84,Fri Dec 15 17:09:45 2017
010-HIGH-FL4-AP04,WiMax Fixed,11,0,-68,Tue Dec 12 09:29:30 2017
010-2nd-AP04,Microwave Oven,11,10,-61,Sat Dec 16 11:20:36 2017
010-2nd-AP04,WiMax Fixed,11,0,-78,Mon Dec 11 12:33:10 2017
139-FL1-AP03,Microwave Oven,6,18,-51,Fri Dec 15 12:26:56 2017
010-HIGH-FL3-AP04,Microwave Oven,11,10,-55,Mon Dec 18 07:51:23 2017
010-HIGH-FL3-AP04,WiMax Mobile,11,0,-83,Wed Dec 13 16:16:26 2017
Tip:
You don't need Python to write the CSV file; you can redirect the script's output on the command line:
python script.py > output.csv
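Alternatively, if you'd rather write the file from Python itself, the standard csv module handles the quoting for you. A sketch reusing the sections list built in the example above (output.csv is just an illustrative name):

import csv
import re

# Sketch: same parsing as the example above, but writing rows with
# csv.writer instead of printing them.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for section in sections:
        ap_name = section.header.lstrip("AP Name.")
        for line in section.content:
            fields = [x for x in re.split(r'\s\s+', line) if x]
            writer.writerow([ap_name] + fields)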
I have the following dataframe:
name Jan Feb Mar Apr May Jun Jul Aug \
0 IBM 156.08 160.01 159.81 165.22 172.25 167.15 164.75 152.77
1 MSFT 45.51 43.08 42.13 43.47 47.53 45.96 45.61 45.51
2 GOOGLE 512.42 537.99 559.72 540.50 535.24 532.92 590.09 636.84
3 APPLE 110.64 125.43 125.97 127.29 128.76 127.81 125.34 113.39
Sep Oct Nov Dec
0 145.36 146.11 137.21 137.96
1 43.56 48.70 53.88 55.40
2 617.93 663.59 735.39 755.35
3 112.80 113.36 118.16 111.73
Which I want to transform into the following:
Month AAPL GOOG IBM
0 Jan 117.160004 534.522445 153.309998
1 Feb 128.460007 558.402511 161.940002
2 Mar 124.430000 548.002468 160.500000
3 Apr 125.150002 537.340027 171.289993
4 May 130.279999 532.109985 169.649994
I've been fiddling around with melt and pivot but have no idea how to get this to work.
Any advice would be appreciated.
Thanks
Let's use set_index, rename_axis, and T (transpose):
df.set_index('name')\
  .rename_axis(None).T\
  .rename_axis('Month')\
  .reset_index()
Output:
Month IBM MSFT GOOGLE APPLE
0 Jan 156.08 45.51 512.42 110.64
1 Feb 160.01 43.08 537.99 125.43
2 Mar 159.81 42.13 559.72 125.97
3 Apr 165.22 43.47 540.50 127.29
4 May 172.25 47.53 535.24 128.76
5 Jun 167.15 45.96 532.92 127.81
6 Jul 164.75 45.61 590.09 125.34
7 Aug 152.77 45.51 636.84 113.39
Creative Way
pd.DataFrame({'Month': df.columns[1:]}).assign(**{c: v for c, *v in df.values})
Month APPLE GOOGLE IBM MSFT
0 Jan 110.64 512.42 156.08 45.51
1 Feb 125.43 537.99 160.01 43.08
2 Mar 125.97 559.72 159.81 42.13
3 Apr 127.29 540.50 165.22 43.47
4 May 128.76 535.24 172.25 47.53
5 Jun 127.81 532.92 167.15 45.96
6 Jul 125.34 590.09 164.75 45.61
7 Aug 113.39 636.84 152.77 45.51
8 Sep 112.80 617.93 145.36 43.56
9 Oct 113.36 663.59 146.11 48.70
10 Nov 118.16 735.39 137.21 53.88
11 Dec 111.73 755.35 137.96 55.40
set_index changes which column acts as the DataFrame's index, so the data is organized around the index you specify. If you set_index to 'name' and transpose, the names become the columns and the months become the rows, which should solve your issue.
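Since you mentioned fiddling with melt and pivot, that route also works; here's a sketch, assuming a pandas recent enough to have DataFrame.melt ('price' is just an illustrative name for the value column):

# Sketch: melt to long form, then pivot back so months are rows and
# names are columns. pivot sorts its index alphabetically, so the
# original month order is restored with reindex.
long_df = df.melt(id_vars='name', var_name='Month', value_name='price')
wide = long_df.pivot(index='Month', columns='name', values='price')
wide = wide.reindex(df.columns[1:]).rename_axis(None, axis=1).reset_index()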
I've got a series of pipes to convert dates in a text file into unique, human readable output and pull out MM DD. Now I would like to resort the output so that the dates display in the order in which they occur during the year. Anybody know a good technique using the standard shell or with a readily installable package on *nix?
Feb 4
Feb 5
Feb 6
Feb 7
Feb 8
Jan 1
Jan 10
Jan 11
Jan 12
Jan 13
Jan 2
Jan 25
Jan 26
Jan 27
Jan 28
Jan 29
Jan 3
Jan 30
Jan 31
Jan 4
Jan 5
Jan 6
Jan 7
Jan 8
Jan 9
There is a utility called sort with an option -M for sorting by month. If you have it installed, you could use that. For instance:
sort -k1 -M test.txt
-k1: First column
-M: Sort by month
Edited per twalberg's suggestion below:
sort -k1,1M -k2,2n test.txt
In two steps:
$ while read line; do date -d "$line" "+%Y%m%d"; done < file | sort -n > temp
$ while read line; do date -d "$line" "+%b %d"; done < temp > file
First we convert the dates to YYYYMMDD and sort them:
$ while read line; do date -d "$line" "+%Y%m%d"; done < file | sort -n > temp
$ cat temp
20130101
20130102
20130103
20130104
20130105
20130106
20130107
20130108
20130109
20130110
20130111
20130112
20130113
20130125
20130126
20130127
20130128
20130129
20130130
20130131
20130204
20130205
20130206
20130207
20130208
Then we print them back in the previous %b %d format:
$ while read line; do date -d "$line" "+%b %d"; done < temp > file
$ cat file
Jan 01
Jan 02
Jan 03
Jan 04
Jan 05
Jan 06
Jan 07
Jan 08
Jan 09
Jan 10
Jan 11
Jan 12
Jan 13
Jan 25
Jan 26
Jan 27
Jan 28
Jan 29
Jan 30
Jan 31
Feb 04
Feb 05
Feb 06
Feb 07
Feb 08
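The same two-step idea (map each line to a sortable key, sort, map back) is also easy to express in Python, if a small script is acceptable; a sketch assuming the lines live in a file named file, as above:

from datetime import datetime

# Sketch: parse each "Mon D" line with a fixed dummy year (1900, the
# strptime default) and use the resulting datetime as the sort key.
# Caveat: "Feb 29" would fail, since 1900 is not a leap year.
with open('file') as f:
    lines = [line.strip() for line in f if line.strip()]

lines.sort(key=lambda s: datetime.strptime(s, '%b %d'))
for line in lines:
    print(line)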
sed -n "1 {
H
x
s/.*\(\n\).*/01 Jan\102 Feb\103 Mar\104 Apr\105 May\106 Jun\107 Jul\108 Aug\109 Sep\110 Oct\111 Nov\112 Dec/
x
}
s/^\(.\{3\}\) \([0-9]\) *$/\1 0\2/
H
$ {
x
t subs
: subs
s/^\([0-9]\{2\}\) \([[:alpha:]]\{3\}\)\(\n\)\(.*\)\n\2/\1 \2\3\4\3\1 \2/
t subs
s/^[0-9]\{2\} [[:alpha:]]\{3\}\n//
t subs
p
}
" | sort | sed "s/^[0-9][0-9] //"
This still needs the external sort (or a far more complex sed to do the sorting itself), but it can help when sort -M isn't available.
I have a complicated python server app, that runs constantly all the time. Below is a very simplified version of it.
When I run the below app using python ("python Main.py"), it uses 8mb of ram straight away and stays at 8mb, as it should.
When I run it using pypy ("pypy Main.py"), it begins by using 22mb of ram, and the usage grows over time: after 30 seconds it's at 50mb, after an hour it's at 60mb.
If I change the "b.something()" to be "pass" it doesn't gobble up memory like that.
I'm using pypy 1.9 on OSX 10.7.4
I'm okay with pypy using more ram than python.
Is there a way to stop pypy from eating up memory over long periods of time?
import sys
import time
import traceback

class Box(object):
    def __init__(self):
        self.counter = 0

    def something(self):
        self.counter += 1
        if self.counter > 100:
            self.counter = 0

try:
    print 'starting...'
    boxes = []
    for i in range(10000):
        boxes.append(Box())
    print 'running!'
    while True:
        for b in boxes:
            b.something()
        time.sleep(0.02)
except KeyboardInterrupt:
    print ''
    print '####################################'
    print 'KeyboardInterrupt Exception'
    sys.exit(1)
except Exception as e:
    print ''
    print '####################################'
    print 'Main Level Exception: %s' % e
    print traceback.format_exc()
    sys.exit(1)
Below is a list of times and the ram usage at that time (I left it running over night).
Wed Sep 5 22:57:54 2012, 22mb ram
Wed Sep 5 22:57:54 2012, 23mb ram
Wed Sep 5 22:57:56 2012, 24mb ram
Wed Sep 5 22:57:56 2012, 25mb ram
Wed Sep 5 22:57:58 2012, 26mb ram
Wed Sep 5 22:57:58 2012, 27mb ram
Wed Sep 5 22:57:59 2012, 29mb ram
Wed Sep 5 22:57:59 2012, 30mb ram
Wed Sep 5 22:58:00 2012, 31mb ram
Wed Sep 5 22:58:02 2012, 32mb ram
Wed Sep 5 22:58:03 2012, 33mb ram
Wed Sep 5 22:58:05 2012, 34mb ram
Wed Sep 5 22:58:08 2012, 35mb ram
Wed Sep 5 22:58:10 2012, 36mb ram
Wed Sep 5 22:58:12 2012, 38mb ram
Wed Sep 5 22:58:13 2012, 39mb ram
Wed Sep 5 22:58:16 2012, 40mb ram
Wed Sep 5 22:58:19 2012, 41mb ram
Wed Sep 5 22:58:21 2012, 42mb ram
Wed Sep 5 22:58:23 2012, 43mb ram
Wed Sep 5 22:58:26 2012, 44mb ram
Wed Sep 5 22:58:28 2012, 45mb ram
Wed Sep 5 22:58:31 2012, 46mb ram
Wed Sep 5 22:58:33 2012, 47mb ram
Wed Sep 5 22:58:35 2012, 49mb ram
Wed Sep 5 22:58:35 2012, 50mb ram
Wed Sep 5 22:58:36 2012, 51mb ram
Wed Sep 5 22:58:36 2012, 52mb ram
Wed Sep 5 22:58:37 2012, 54mb ram
Wed Sep 5 22:59:41 2012, 55mb ram
Wed Sep 5 22:59:45 2012, 56mb ram
Wed Sep 5 22:59:45 2012, 57mb ram
Wed Sep 5 23:00:58 2012, 58mb ram
Wed Sep 5 23:02:20 2012, 59mb ram
Wed Sep 5 23:02:20 2012, 60mb ram
Wed Sep 5 23:02:27 2012, 61mb ram
Thu Sep 6 00:18:00 2012, 62mb ram
http://doc.pypy.org/en/latest/gc_info.html#minimark-environment-variables shows how to tweak the GC.
Compared to cpython, pypy uses different garbage collection strategies.
If the increase in memory is due to something in your program, you could try to run a forced garbage collection every now and then, by using the collect function from the gc module. In this case, it might also help to explicitly del large objects that you don't need anymore and that don't go out of scope.
If it is due to the internal workings of pypy, it might be worth submitting a bug report, as Mark Dickinson suggested.
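As a minimal sketch of the forced-collection idea (reusing the boxes loop and time import from the question; collecting every 500 iterations is an arbitrary choice):

import gc

# Sketch: periodically force a full collection inside the main loop.
# Whether this helps depends on whether the growth is uncollected
# garbage or PyPy's own internal allocations.
iteration = 0
while True:
    for b in boxes:
        b.something()
    time.sleep(0.02)
    iteration += 1
    if iteration % 500 == 0:
        gc.collect()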
I have a script that is parsing out fields within email headers that represent dates and times. Some examples of these strings are as follows:
Fri, 10 Jun 2011 11:04:17 +0200 (CEST)
Tue, 1 Jun 2011 11:04:17 +0200
Wed, 8 Jul 1992 4:23:11 -0200
Wed, 8 Jul 1992 4:23:11 -0200 EST
Before I was confronted with the CEST/EST portions at the ends of some of the strings, I had things working pretty well just using datetime.datetime.strptime, like this:
msg['date'] = 'Wed, 8 Jul 1992 4:23:11 -0200'
mail_date = datetime.datetime.strptime(msg['date'][:-6], '%a, %d %b %Y %H:%M:%S')
I tried to put a regex together to match the date portions of the string while excluding the timezone information at the end, but I was having issues with the regex (I couldn't match a colon).
Is using a regex the best way to parse all of the examples above? If so, could someone share a regex that would match these examples? In the end I am looking to have a datetime object.
From python time to age part 2, timezones:
from email import utils
print utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17 +0200 (CEST)')
print utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17 +0200')
print utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17')
The output is:
(2011, 6, 10, 11, 4, 17, 0, 1, -1, 7200)
(2011, 6, 10, 11, 4, 17, 0, 1, -1, 7200)
(2011, 6, 10, 11, 4, 17, 0, 1, -1, None)
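Since you ultimately want a datetime object, the 10-tuple can be finished off with email.utils.mktime_tz, which folds the UTC offset into an epoch timestamp; a short sketch in the same Python 2 style:

import datetime
from email import utils

tz_tuple = utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17 +0200 (CEST)')
timestamp = utils.mktime_tz(tz_tuple)  # seconds since the epoch, in UTC
print datetime.datetime.utcfromtimestamp(timestamp)  # 2011-06-10 09:04:17 (+0200 folded into UTC)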
Perhaps I misunderstood your question, but won't a simple split suffice?
#!/usr/bin/python
d = ["Fri, 10 Jun 2011 11:04:17 +0200 (CEST)", "Tue, 1 Jun 2011 11:04:17 +0200",
     "Wed, 8 Jul 1992 4:23:11 -0200", "Wed, 8 Jul 1992 4:23:11 -0200 EST"]
for i in d:
    print " ".join(i.split()[0:5])
Fri, 10 Jun 2011 11:04:17
Tue, 1 Jun 2011 11:04:17
Wed, 8 Jul 1992 4:23:11
Wed, 8 Jul 1992 4:23:11