Python - Parsing a text file into a CSV file

I have a text file that is output from a command that I ran with Netmiko to retrieve data from a Cisco WLC of things that are causing interference on our WiFi network. I stripped out just what I needed from the original 600k lines of code down to a couple thousand lines like this:
AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017
WiMax Mobile 11 0 -84 Fri Dec 15 17:09:45 2017
WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven 11 10 -61 Sat Dec 16 11:20:36 2017
WiMax Fixed 11 0 -78 Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven 11 10 -55 Mon Dec 18 07:51:23 2017
WiMax Mobile 11 0 -83 Wed Dec 13 16:16:26 2017
The goal is to end up with a CSV file that strips out the 'AP Name ...' part and puts what's left on the same line as the rest of the information from the following lines. The problem is that some APs have two lines below the AP name and some have one or none. I have been at it for 8 hours and cannot find the best way to make this happen.
This is the latest version of the code I was trying to use; any suggestions for making this work? I just want something I can load up in Excel and create a report with:
with open(outfile_name, 'w') as out_file:
    with open('wlc-interference_raw.txt', 'r') as in_file:
        #Variables
        _ap_name = ''
        _temp = ''
        _flag = False
        for i in in_file:
            if 'AP Name' in i:
                #write whatever was put in the temp file to disk because new ap now
                #add another temp variable in case an ap has more than 1 interferer and check if new AP name
                out_file.write(_temp)
                out_file.write('\n')
                #print(_temp)
                _ap_name = i.lstrip('AP Name.......................................... ')
                _ap_name = _ap_name.rstrip('\n')
                _temp = _ap_name
                #print(_temp)
            elif '----' in i:
                pass
            elif 'Class Type' in i:
                pass
            else:
                line_split = i.split()
                for x in line_split:
                    _temp += ','
                    _temp += x
                _temp += '\n'

I think your best option is to read all lines of the file, then split into sections starting with AP Name. Then you can work on parsing each section.
Example
s = """AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017
WiMax Mobile 11 0 -84 Fri Dec 15 17:09:45 2017
WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven 11 10 -61 Sat Dec 16 11:20:36 2017
WiMax Fixed 11 0 -78 Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven 11 10 -55 Mon Dec 18 07:51:23 2017
WiMax Mobile 11 0 -83 Wed Dec 13 16:16:26 2017"""
import re

class AP:
    """
    A class holding each section of the parsed file
    """
    def __init__(self):
        self.header = ""
        self.content = []

sections = []
section = None
for line in s.split('\n'):  # Or 'for line in file:'
    # Starting new section
    if line.startswith('AP Name'):
        # If previously had a section, add to list
        if section is not None:
            sections.append(section)
        section = AP()
        section.header = line
    else:
        if section is not None:
            section.content.append(line)
sections.append(section)  # Add last section outside of loop

for section in sections:
    ap_name = section.header.lstrip("AP Name.")  # lstrip takes all the characters given, not a literal string
    for line in section.content:
        print(ap_name + ",", end="")
        # You can extract the date separately, if needed
        # Splitting on more than one space using a regex
        line = ",".join(re.split(r'\s\s+', line))
        print(line.rstrip(','))  # Remove trailing comma from imperfect split
Output
010-HIGH-FL4-AP04,Microwave Oven,11,10,-59,Mon Dec 18 08:21:23 2017
010-HIGH-FL4-AP04,WiMax Mobile,11,0,-84,Fri Dec 15 17:09:45 2017
010-HIGH-FL4-AP04,WiMax Fixed,11,0,-68,Tue Dec 12 09:29:30 2017
010-2nd-AP04,Microwave Oven,11,10,-61,Sat Dec 16 11:20:36 2017
010-2nd-AP04,WiMax Fixed,11,0,-78,Mon Dec 11 12:33:10 2017
139-FL1-AP03,Microwave Oven,6,18,-51,Fri Dec 15 12:26:56 2017
010-HIGH-FL3-AP04,Microwave Oven,11,10,-55,Mon Dec 18 07:51:23 2017
010-HIGH-FL3-AP04,WiMax Mobile,11,0,-83,Wed Dec 13 16:16:26 2017
Tip:
You don't need Python to write the CSV, you can output to a file using the command line
python script.py > output.csv
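If you would rather write the CSV directly from Python instead of redirecting stdout, here is a minimal sketch using the standard csv module, assuming the sections list built by the parsing loop above (the header names are only guesses at what the columns in the WLC output represent):
import csv
import re

with open('interference.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    # Header names are assumptions about the WLC columns, adjust as needed
    writer.writerow(['AP Name', 'Class Type', 'Channel', 'Duty Cycle', 'RSSI', 'Last Seen'])
    for section in sections:
        ap_name = section.header.lstrip("AP Name.")
        for line in section.content:
            # Same split on runs of whitespace as in the answer above
            fields = [f for f in re.split(r'\s\s+', line) if f]
            writer.writerow([ap_name] + fields)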

Related

calculate churn based on 21 days interval

I have been calculating churn using a mix of a pandas dataframe of git logs and the git show command for a particular commit, to see exactly where the changes were made in terms of LOC. However, I have not been able to calculate churn based on age, i.e. churn that occurs when an engineer rewrites or deletes their own code that is less than 3 weeks old.
This is how I have done it so far for such a dataframe, per commit:
git logs dataframe
sha timestamp date author message body age insertion deletion filepath churn merges
1 1 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark `act`: Resolve to return value of scope function (#21759) When migrating some internal tests I found it annoying that I couldn't -24 days +12:21:32.839997
2 21 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark `act`: Resolve to return value of scope function (#21759) When migrating some internal tests I found it annoying that I couldn't -24 days +12:21:32.839997 31.0 0.0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js 31.0
3 22 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark `act`: Resolve to return value of scope function (#21759) When migrating some internal tests I found it annoying that I couldn't -24 days +12:21:32.839997 1.0 1.0 packages/react-test-renderer/src/ReactTestRenderer.js 0.0
4 23 cae635054 Sat Jun 26 14:51:23 2021 -0400 2021-06-26 18:51:23+00:00 Andrew Clark `act`: Resolve to return value of scope function (#21759) When migrating some internal tests I found it annoying that I couldn't -24 days +12:21:32.839997 24.0 14.0 packages/react/src/ReactAct.js 10.0
5 25 e2453e200 Fri Jun 25 15:39:46 2021 -0400 2021-06-25 19:39:46+00:00 Andrew Clark act: Add test for bypassing queueMicrotask (#21743) Test for fix added in #21740 -25 days +13:09:55.839997 50.0 0.0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js 50.0
6 27 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 4.0 5.0 packages/react-devtools-shared/src/__tests__/FastRefreshDevToolsIntegration-test.js -1.0
7 28 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 4.0 4.0 packages/react-devtools-shared/src/__tests__/componentStacks-test.js 0.0
8 29 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 12.0 12.0 packages/react-devtools-shared/src/__tests__/console-test.js 0.0
9 30 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 7.0 6.0 packages/react-devtools-shared/src/__tests__/editing-test.js 1.0
10 31 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 47.0 42.0 packages/react-devtools-shared/src/__tests__/inspectedElement-test.js 5.0
11 32 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 7.0 6.0 packages/react-devtools-shared/src/__tests__/ownersListContext-test.js 1.0
12 33 73ffce1b6 Thu Jun 24 22:42:44 2021 -0400 2021-06-25 02:42:44+00:00 Brian Vaughn DevTools: Update tests to fix warnings/errors (#21748) Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18) -26 days +20:12:53.839997 22.0 21.0 packages/react-devtools-shared/src/__tests__/profilerContext-test.js 1.0
churn calculation
commits = df["sha"].unique().tolist()
for commit in commits:
    contribution, churn = await self.calculate_churn(commit)

async def calculate_churn(self, stream):
    PREVIOUS_BASE_DIR = os.path.abspath("")
    try:
        GIT_DIR = os.path.join(PREVIOUS_BASE_DIR, "app/git/react.git")
        os.chdir(GIT_DIR)
    except FileNotFoundError as e:
        raise ValueError(e)
    cmd = f"git show --format= --unified=0 --no-prefix {stream}"
    cmds = [f"{cmd}"]
    results = get_proc_out(cmds)
    [files, contribution, churn] = get_loc(results)
    # need to circle back to previous path
    os.chdir(PREVIOUS_BASE_DIR)
    return contribution, churn

def is_new_file(result, file):
    # search for destination file (+++ ) and update file variable
    if result.startswith("+++"):
        return result[result.rfind(" ") + 1 :]
    else:
        return file
def is_loc_change(result, loc_changes):
    # search for loc changes (@@ ) and update loc_changes variable
    # @@ -1,5 +1,4 @@
    # @@ -l,s +l,s @@
    if result.startswith("@@"):
        # loc_change = result[2+1: ] -> -1,5 +1,4 @@
        loc_change = result[result.find(" ") + 1 :]
        # loc_change = loc_change[:9] -> -1,5 +1,4
        loc_change = loc_change[: loc_change.find(" @@")]
        return loc_change
    else:
        return loc_changes
def get_loc_change(loc_changes):
    # removals
    # -1,5 +1,4 = -1,5
    left = loc_changes[: loc_changes.find(" ")]
    left_dec = 0
    # 2
    if left.find(",") > 0:
        # 2
        comma = left.find(",")
        # 5
        left_dec = int(left[comma + 1 :])
        # 1
        left = int(left[1:comma])
    else:
        left = int(left[1:])
        left_dec = 1
    # additions
    # +1,4
    right = loc_changes[loc_changes.find(" ") + 1 :]
    right_dec = 0
    if right.find(",") > 0:
        comma = right.find(",")
        right_dec = int(right[comma + 1 :])
        right = int(right[1:comma])
    else:
        right = int(right[1:])
        right_dec = 1
    if left == right:
        return {left: (right_dec - left_dec)}
    else:
        return {left: left_dec, right: right_dec}

def get_loc(results):
    files = {}
    contribution = 0
    churn = 0
    file = ""
    loc_changes = ""
    for result in results:
        new_file = is_new_file(result, file)
        if file != new_file:
            file = new_file
            if file not in files:
                files[file] = {}
        else:
            new_loc_changes = is_loc_change(
                result, loc_changes
            )  # returns either empty or -6 +6 or -13,0 +14,2 format
            if loc_changes != new_loc_changes:
                loc_changes = new_loc_changes
                locc = get_loc_change(loc_changes)  # {2: 0} or {8: 0, 9: 1}
                for loc in locc:
                    # files[file] = {2: 0, 8: 0, 9: 1}
                    # print("loc", loc, files[file], locc[loc])
                    if loc in files[file]:
                        # change of lines triggered
                        files[file][loc] += locc[loc]
                        churn += abs(locc[loc])
                    else:
                        files[file][loc] = locc[loc]
                        contribution += abs(locc[loc])
            else:
                continue
    return [files, contribution, churn]
How can I use this same code, but count churn only when the changes are to code that is less than 3 weeks old?
The only practical way to do this is to iterate through the DataFrame, and because that sucks with pandas, it almost always means you have the wrong data structure. If you're not doing numerical analysis, and it looks like you aren't, then just keep a simple list of dicts. Pandas has its shining points, but it's not a universal database.
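If you do go that route, a small sketch (assuming the git-log df from the question) that converts the frame once to a plain list of dicts that you can loop over with ordinary Python:
# Sketch: convert the DataFrame to a list of dicts and work with that instead
rows = df.to_dict('records')
for row in rows:
    print(row['filepath'], row['date'])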
Here's the rough code you'd need, although I'm glossing over details:
# Go through the df row by row.
lastdate = {}
for index, row in df.iterrows():
    if row['filepath'] in lastdate:
        if lastdate[row['filepath']] - row['date'] < timedelta(days=21):
            print("Last change to", row['filepath'], "was within three weeks")
    lastdate[row['filepath']] = row['date']
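Folding the 21-day check into the churn total might look something like the rough sketch below. It is a file-level approximation, not line-level tracking; it assumes the column names 'filepath', 'date', 'author' and 'churn' from the dataframe above, and it skips rows whose churn is NaN:
from datetime import timedelta

import pandas as pd

# Rough sketch: count a row's churn only when the same author changed the
# same file less than 21 days earlier (file-level approximation).
df['date'] = pd.to_datetime(df['date'], utc=True)
df = df.sort_values('date')          # oldest commit first

recent_churn = 0
last_touch = {}                      # (filepath, author) -> date of previous change
for _, row in df.iterrows():
    key = (row['filepath'], row['author'])
    if key in last_touch and row['date'] - last_touch[key] < timedelta(days=21):
        if pd.notna(row['churn']):
            recent_churn += row['churn']
    last_touch[key] = row['date']

print("Churn on code less than 3 weeks old:", recent_churn)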

python compressed 4Gb bz2 EOFError: end of stream was already found nested subfolders

I'm trying to read a specific file from a bz2-compressed tar archive using Python.
import tarfile

tar = tarfile.open(filename, "r|bz2", bufsize=57860311)
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
        # read the file
        f = tar.extractfile(tarinfo)
        #print f.read()
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
But at the end I got the error:
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in read(self, size)
577 buf = "".join(t)
578 else:
--> 579 buf = self._read(size)
580 self.pos += len(buf)
581 return buf
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in _read(self, size)
594 break
595 try:
--> 596 buf = self.cmp.decompress(buf)
597 except IOError:
598 raise ReadError("invalid compressed data")
EOFError: end of stream was already found
I also tried to list the files within the tar through 'tar.list()' and again ...
-rwxr-xr-x lindauer/or3uunp 0 2013-05-21 00:58:36 r3.2/
-rw-r--r-- lindauer/or3uunp 6057 2012-01-05 14:41:00 r3.2/readme.txt
-rw-r--r-- lindauer/or3uunp 44732 2012-01-04 10:08:54 r3.2/psychometric.csv
-rw-r--r-- lindauer/or3uunp 57860309 2012-01-04 09:58:20 r3.2/logon.csv
/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.pyc in _read(self, size)
594 break
595 try:
--> 596 buf = self.cmp.decompress(buf)
597 except IOError:
598 raise ReadError("invalid compressed data")
EOFError: end of stream was already found
I listed the files inside the archive using the tar command. Here is the result:
tar -tvf r3.2.tar.bz2
drwxr-xr-x 0 lindauer or3uunp 0 May 21 2013 r3.2/
-rw-r--r-- 0 lindauer or3uunp 6057 Jan 5 2012 r3.2/readme.txt
-rw-r--r-- 0 lindauer or3uunp 44732 Jan 4 2012 r3.2/psychometric.csv
-rw-r--r-- 0 lindauer or3uunp 57860309 Jan 4 2012 r3.2/logon.csv
-rw-r--r-- 0 lindauer or3uunp 12494829865 Jan 5 2012 r3.2/http.csv
-rw-r--r-- 0 lindauer or3uunp 1066622500 Jan 5 2012 r3.2/email.csv
-rw-r--r-- 0 lindauer or3uunp 218962503 Jan 5 2012 r3.2/file.csv
-rw-r--r-- 0 lindauer or3uunp 29156988 Jan 4 2012 r3.2/device.csv
drwxr-xr-x 0 lindauer or3uunp 0 May 20 2013 r3.2/LDAP/
-rw-r--r-- 0 lindauer or3uunp 140956 Jan 4 2012 r3.2/LDAP/2011-01.csv
-rw-r--r-- 0 lindauer or3uunp 147370 Jan 4 2012 r3.2/LDAP/2010-05.csv
-rw-r--r-- 0 lindauer or3uunp 149221 Jan 4 2012 r3.2/LDAP/2010-02.csv
-rw-r--r-- 0 lindauer or3uunp 141717 Jan 4 2012 r3.2/LDAP/2010-12.csv
-rw-r--r-- 0 lindauer or3uunp 148931 Jan 4 2012 r3.2/LDAP/2010-03.csv
-rw-r--r-- 0 lindauer or3uunp 147370 Jan 4 2012 r3.2/LDAP/2010-04.csv
-rw-r--r-- 0 lindauer or3uunp 149793 Jan 4 2012 r3.2/LDAP/2009-12.csv
-rw-r--r-- 0 lindauer or3uunp 143979 Jan 4 2012 r3.2/LDAP/2010-09.csv
-rw-r--r-- 0 lindauer or3uunp 145591 Jan 4 2012 r3.2/LDAP/2010-07.csv
-rw-r--r-- 0 lindauer or3uunp 139444 Jan 4 2012 r3.2/LDAP/2011-03.csv
-rw-r--r-- 0 lindauer or3uunp 142347 Jan 4 2012 r3.2/LDAP/2010-11.csv
-rw-r--r-- 0 lindauer or3uunp 138285 Jan 4 2012 r3.2/LDAP/2011-04.csv
-rw-r--r-- 0 lindauer or3uunp 149793 Jan 4 2012 r3.2/LDAP/2010-01.csv
-rw-r--r-- 0 lindauer or3uunp 146008 Jan 4 2012 r3.2/LDAP/2010-06.csv
-rw-r--r-- 0 lindauer or3uunp 144711 Jan 4 2012 r3.2/LDAP/2010-08.csv
-rw-r--r-- 0 lindauer or3uunp 137967 Jan 4 2012 r3.2/LDAP/2011-05.csv
-rw-r--r-- 0 lindauer or3uunp 140085 Jan 4 2012 r3.2/LDAP/2011-02.csv
-rw-r--r-- 0 lindauer or3uunp 143420 Jan 4 2012 r3.2/LDAP/2010-10.csv
-r--r--r-- 0 lindauer or3uunp 3923 Jan 4 2012 r3.2/license.txt
I think this is because the archive has subfolders, and for some reason the Python libraries have problems dealing with extracting subfolders?
I also tried to open the tar file manually and had no problems, so I don't think the file is corrupted. Any help appreciated.
Comment: I tried debug=3 and I get: ReadError: bad checksum
I found the following related info:
tar: directory checksum error
Cause
This error message from tar(1) indicates that the checksum of the directory and the files it has read from tape does not match the checksum advertised in the header block. Usually this message indicates the wrong blocking factor, although it could indicate corrupt data on tape.
Action
To resolve this problem, make certain that the blocking factor you specify on the command line (after -b) matches the blocking factor originally specified. If in doubt, leave out the block size and let tar(1) determine it automatically. If that remedy does not help, the tape data could be corrupted.
SE:tar-ignore-or-fix-checksum
I'd try the -i switch to see if you can just ignore any messages regarding EOF.
-i, --ignore-zeros ignore zeroed blocks in archive (means EOF)
Example
$ tar xivf backup.tar
bugs.python.org:tarfile-headererror
The comment in tarfile.py reads (Don't know the date of the file!):
- # We shouldn't rely on this checksum, because some tar programs
- # calculate it differently and it is merely validating the
- # header block.
ReadError: unexpected end of data
From the tarfile Documentation
The tarfile module defines the following exceptions:
exception tarfile.ReadError
Is raised when a tar archive is opened, that either cannot be handled by the tarfile module or is somehow invalid.
First, try with another tar archive file to verify your Python environment.
Second, check whether your tar archive file matches the following format:
tarfile.DEFAULT_FORMAT
The default format for creating archives. This is currently GNU_FORMAT.
Third, instead of using tarfile.open(...) to create a tarfile instance, try the following to set debug=3.
tar = tarfile.TarFile(name=filename, debug=3)
tar.open()
...
class tarfile.TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
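As a rough combined sketch (an assumption-based starting point, not a guaranteed fix): open the archive with ignore_zeros=True, which is tarfile's counterpart to tar's -i switch, turn on a debug level, and pull out just the member you need:
import tarfile

# Sketch: random-access mode ("r:bz2" rather than the stream mode "r|bz2"),
# skip zeroed/invalid blocks, and show debug output while reading.
with tarfile.open(filename, "r:bz2", ignore_zeros=True, debug=3) as tar:
    for tarinfo in tar:
        if tarinfo.name == "r3.2/logon.csv":   # for example, the member you want
            f = tar.extractfile(tarinfo)
            data = f.read()
            break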

How to error check list index out of range

I have a read.log file that will have lines such as...
10.2.177.170 Tue Jun 19 03:30:55 CDT 2018
10.2.177.170 Tue Jun 19 03:31:03 CDT 2018
10.2.177.170 Tue Jun 19 03:31:04 CDT 2018
10.2.177.170 Tue Jun 19 03:32:04 CDT 2018
10.2.177.170 Tue Jun 19 03:33:04 CDT 2018
My code will read the 3rd to last line and combine strings. So the normal output would be:
2018:19:03:32:04
My problem is, if there are only 4 or less lines of data such as
10.1.177.170 Tue Jun 19 03:30:55 CDT 2018
10.1.177.170 Tue Jun 19 03:31:03 CDT 2018
10.1.177.170 Tue Jun 19 03:31:04 CDT 2018
10.1.177.170 Tue Jun 19 03:32:04 CDT 2018
I get an error
x1 = line.split()[0]
IndexError: list index out of range
How can I error check this or keep it from happening? I have been trying to check how many lines there are in the log and if less than 5, print a notice. Are there better options?
def run():
    f = open('read.log', 'r')
    lnumber = dict()
    for num, line in enumerate(f, 1):
        x1 = line.split()[0]
        log_day = line.split()[3]
        log_time = line.split()[4]
        log_year = line.split()[6]
        if x1 in lnumber:
            lnumber[x1].append((log_year + ":" + log_day + ":" + log_time))
        else:
            lnumber[x1] = [(num, log_time)]
        if x1 in lnumber and len(lnumber.get(x1, None)) > 2:
            # if there are less than 3 lines in document, this will fail
            line_time = (lnumber[x1][-3].__str__())
            print(line_time)
        else:
            print('nothing')
    f.close()

run()
f.readlines() gives you a list of lines in a file. So, you could try reading in all the lines in a file:
f = open('firewall.log', 'r')
lines = f.readlines()
And exiting if there are 4 or less lines:
if len(lines) <= 4:
    f.close()
    print("4 or less lines in file")
    exit()
That IndexError you're getting is because you're calling split() on a line with nothing on it. I would suggest doing something like if not line: continue to avoid that case.
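Putting those two suggestions together, a minimal sketch of the whole function (assuming all you ultimately need is the 3rd-to-last line, and leaving out the per-IP grouping from your original code) could look like:
def run():
    with open('read.log', 'r') as f:
        # readlines() plus a filter so blank lines can't cause an IndexError
        lines = [line for line in f if line.strip()]

    if len(lines) < 5:
        print('fewer than 5 lines in file')
        return

    # 3rd-to-last line, e.g. "10.2.177.170 Tue Jun 19 03:32:04 CDT 2018"
    parts = lines[-3].split()
    log_day, log_time, log_year = parts[3], parts[4], parts[6]
    print(log_year + ":" + log_day + ":" + log_time)   # e.g. 2018:19:03:32:04

run()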

Python gzip gives null bytes

I'm trying to parse some log files in Python, but reading them always returns only null bytes.
I've confirmed that the file in question does contain data:
$ zcat Events.log.gz | wc -c
188371128
$ zcat Events.log.gz | head
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
But attempting to read it in Python gives only null bytes:
$ python
Python 2.6.9 (unknown, Sep 14 2016, 17:46:59)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> filename = 'Events.log.gz'
>>> import gzip
>>> content = gzip.open(filename).read()
>>> len(content)
188371128
>>> for i in range(10):
... content[i*10000:(i*10000)+10]
...
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've tried explicitly setting 'mode' to either 'r' or 'rb', with no difference in result.
I've also tried subprocess.Popen(['zcat', filename], stdout=subprocess.PIPE).stdout.read(), with the same response.
Perhaps relevantly, when I tried to zcat the file to another file, the output was a binary file:
$ zcat Events.log.gz > /tmp/logoutput
$ less /tmp/logoutput
"/tmp/logoutput" may be a binary file. See it anyway?
[y]
^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#...
$ head /tmp/logoutput
17 Jan 2018 08:10:35,863: {"deviceType":"A16ZV8BU3SN1N3",[REDACTED]}
17 Jan 2018 08:10:35,878: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:35,886: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:35,911: {"deviceType":"A2CZFJ2RKY7SE2",[REDACTED]}
17 Jan 2018 08:10:35,937: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:35,963: {"appOtaState":"ota",[REDACTED]}
17 Jan 2018 08:10:35,971: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}
17 Jan 2018 08:10:36,006: {"deviceType":"A2JTEGS8GUPDOF",[REDACTED]}
17 Jan 2018 08:10:36,013: {"deviceType":"A1CTGXB4BA274T",[REDACTED]}
17 Jan 2018 08:10:36,041: {"deviceType":"A1DL2DVDQVK3Q",[REDACTED]}

String formatting in a loop

I've got a directory full of JPG photographs. I want to take the file names of those photographs and end up with the following being printed:
<description>Test. <![CDATA[<img src='.
/files/fantaWP.jpg]>]]></description>
The file name is a variable. I've tried my very best below and I'm nearly there, but I end up with the following output:
<description>Test. <![CDATA[<img src='.
/files/['fantaWP.jpg', 'icon', 'p1.JPG', 'p2.JPG', 'p3.jpg', 'p4.jpg']>]]></description>
Here is my code:
photofileName = []
path = 'C:\Users\Simon\Desktop\Dir\pics'
dirList = os.listdir(path)
for fname in dirList:
    photofileName.append(fname)
print photofileName
photoVar = [x for x in photofileName]
itemsInListOne = 3
iterations = itemsInListOne
num = 0
while num < iterations:
    num = num + 1
    print ("\<description>Test. <![CDATA[<img src='./files/{}'>]]></description>\n".format(photoVar))
Thank you in advance.
The following should be enough if I understand you correctly.
for fname in os.listdir(path):
    print("\<description>Test. <![CDATA[<img src='./files/{}'>]]>=</description>\n".format(fname))
Example:
>>> path = "/home/msvalkon/Pictures/Sample Album"
>>> for fname in os.listdir(path):
...     print("\<description>Test. <![CDATA[<img src='./files/{}'>]]>=</description>\n".format(fname))
...
...
\<description>Test. <![CDATA[<img src='./files/Costa Rican Frog.jpg'>]]>=</description>
\<description>Test. <![CDATA[<img src='./files/Pensive Parakeet.jpg'>]]>=</description>
\<description>Test. <![CDATA[<img src='./files/Boston City Flow.jpg'>]]>=</description>
>>>
And the content of the path..
msvalkon@Lunkwill:~/Pictures/Sample Album$ ll
total 1208
drwxrwxr-x 2 msvalkon msvalkon 4096 Apr 19 2012 ./
drwxr-xr-x 7 msvalkon msvalkon 28672 Jan 3 18:27 ../
-rw-rw-r-- 1 msvalkon msvalkon 339773 Dec 13 2009 Boston City Flow.jpg
-rw-rw-r-- 1 msvalkon msvalkon 354633 Dec 13 2009 Costa Rican Frog.jpg
-rw-rw-r-- 1 msvalkon msvalkon 480098 Dec 13 2009 Pensive Parakeet.jpg
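If the directory also contains entries that are not photos (like the 'icon' entry in your output), here is a small sketch that filters on extension before formatting, under the assumption that only .jpg/.jpeg files should be emitted:
import os

path = 'C:\\Users\\Simon\\Desktop\\Dir\\pics'   # escaped backslashes to be safe
for fname in sorted(os.listdir(path)):
    # only format real photo files, skip things like 'icon'
    if fname.lower().endswith(('.jpg', '.jpeg')):
        print("<description>Test. <![CDATA[<img src='./files/{}'>]]></description>\n".format(fname))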
