Parse elements from a text string (in a list)


I have a list element which is text.
print ((temp_list))
Output:
['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
I wish to get this output:
Aug 7 14:23
I have tried to remove the whitespace but that messes up the output, which makes it harder to separate out the elements I want.

You can split the text and get the 5th, 6th and 9th fields:
f = temp_list[0].split()
print(' '.join((f[4], f[5], f[8])))
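A slightly more defensive variant of the same idea, as a sketch only. The function name `login_summary` and the minimum field count of 9 are assumptions made for illustration; they are not part of the original answer.

```python
def login_summary(line):
    """Return 'month day end-time' from a whitespace-separated login record."""
    # split() with no argument collapses any run of whitespace,
    # so extra padding between columns does not shift the indices
    fields = line.split()
    # assumed layout: user, tty, host, weekday, month, day, start, '-', end, ...
    if len(fields) < 9:
        return None
    return ' '.join((fields[4], fields[5], fields[8]))


temp_list = ['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
print(login_summary(temp_list[0]))  # Aug 7 14:23
print(login_summary('root pts/3'))  # None
```

Returning None for short lines avoids an IndexError when a record is missing its trailing columns.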

Using Regex.
import re
temp_list = ['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
for i in temp_list:
    m = re.search(r"(?P<date>(Jun|Jul|Aug|Sep).*?)\(", i)
    if m:
        print(m.group('date'))
Output:
Aug 7 14:22 - 14:23

sample = 'root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) '
# split the string on single space characters
data = sample.split(' ')
# inspect the list in the console: consecutive or trailing spaces leave empty strings in it
print(data)
# an empty string evaluates to False in Python, so filter(None, ...) drops them;
# list() is needed in Python 3, where filter returns an iterator rather than a list
data = list(filter(None, data))
# outputs: ['root', 'pts/3', '100.121.17.73', 'Tue', 'Aug', '7', '14:22', '-', '14:23', '(00:00)']
print(data)
# in the end we collect the relevant data by slicing the list
# from index 3 up to (not including) 6 and join the items into one string separated by single spaces
result = ' '.join(data[3:6])
# outputs: Tue Aug 7
print(result)

If your string always contains a pattern like 'Tue Aug 7 14:22 - 14:23', then I suggest using a regex to match it:
import re
temp_list = ['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
m = re.search(r'\w{3} +(\w{3}) +(\d{1,2}) +\d\d:\d\d +- +(\d\d:\d\d)', temp_list[0])
result = ' '.join([m.group(i) for i in (1,2,3)])
print(result) # Aug 7 14:23
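The snippet above raises AttributeError when a line does not match, because re.search returns None. A minimal sketch that guards against that, using the same pattern with named groups added for readability (the names `LOGIN_RE` and `extract` are hypothetical):

```python
import re

# Same pattern as in the answer above, with named groups added.
LOGIN_RE = re.compile(
    r'\w{3} +(?P<month>\w{3}) +(?P<day>\d{1,2}) +\d\d:\d\d +- +(?P<end>\d\d:\d\d)'
)


def extract(line):
    """Return 'month day end-time', or None if the line does not match."""
    m = LOGIN_RE.search(line)
    if m is None:
        return None
    return '{month} {day} {end}'.format(**m.groupdict())


print(extract('root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) '))  # Aug 7 14:23
print(extract('no timestamp here'))  # None
```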

Or:
l=['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
print(' '.join(l[0].split()[-6:][:-1]))
Output:
Aug 7 14:22 - 14:23

Related

How can i convert this string to integers & other strings in Python

I got this string...
String = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
I want to get the first 4 numbers (-268, 14, 7, 19) in integer-variables and Fri Aug 3 12:32:08 in another string-variable.
Is that possible?

Using basic python
string = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
vals, date = string.strip().split(' - ')
int_vals = [int(v) for v in vals.split()]
print(int_vals) # [-268, 14, 7, 19]
print(date) # Fri Aug 3 12:32:08 2018
Using regex
import re
match = re.search(r'([-\d]+) ([-\d]+) ([-\d]+) ([-\d]+)[ -]*(.*)', string)
date = match.group(5)
int_vals = [int(v) for v in match.groups()[:4]] # same results

Use str.split
Ex:
String = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
first, second = String.split(" - ")
first = tuple(int(i) for i in first.split())
print(first)
print(second)
Output:
(-268, 14, 7, 19)
Fri Aug 3 12:32:08 2018

Use split and map for it:
left, date = String.split(' - ')
numbers = list(map(int, left.split()))
print(numbers, date)
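If the date part is needed as a real datetime rather than a string, it can be parsed after the split. This is a sketch under the assumption that the day and month names are English (the `%a` and `%b` directives are locale-dependent); `parse_record` is a hypothetical helper name.

```python
from datetime import datetime


def parse_record(text):
    """Split '<ints> - <date>' into a list of ints and a datetime."""
    # partition splits at the first ' - '; the leading '-268' is not
    # affected because its minus sign has no space in front of it
    left, _, right = text.strip().partition(' - ')
    numbers = [int(tok) for tok in left.split()]
    when = datetime.strptime(right, '%a %b %d %H:%M:%S %Y')
    return numbers, when


numbers, when = parse_record('-268 14 7 19 - Fri Aug 3 12:32:08 2018\n')
print(numbers)  # [-268, 14, 7, 19]
print(when)     # 2018-08-03 12:32:08
```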

How to error check list index out of range

I have a read.log file that will have lines such as...
10.2.177.170 Tue Jun 19 03:30:55 CDT 2018
10.2.177.170 Tue Jun 19 03:31:03 CDT 2018
10.2.177.170 Tue Jun 19 03:31:04 CDT 2018
10.2.177.170 Tue Jun 19 03:32:04 CDT 2018
10.2.177.170 Tue Jun 19 03:33:04 CDT 2018
My code will read the 3rd to last line and combine strings. So the normal output would be:
2018:19:03:32:04
My problem is, if there are only 4 or less lines of data such as
10.1.177.170 Tue Jun 19 03:30:55 CDT 2018
10.1.177.170 Tue Jun 19 03:31:03 CDT 2018
10.1.177.170 Tue Jun 19 03:31:04 CDT 2018
10.1.177.170 Tue Jun 19 03:32:04 CDT 2018
I get an error
x1 = line.split()[0]
IndexError: list index out of range
How can I error check this or keep it from happening? I have been trying to check how many lines there are in the log and if less than 5, print a notice. Are there better options?
def run():
    f = open('read.log', 'r')
    lnumber = dict()
    for num,line in enumerate(f,1):
        x1 = line.split()[0]
        log_day = line.split()[3]
        log_time = line.split()[4]
        log_year = line.split()[6]
        if x1 in lnumber:
            lnumber[x1].append((log_year + ":" + log_day + ":" + log_time))
        else:
            lnumber[x1] = [(num,log_time)]
    if x1 in lnumber and len(lnumber.get(x1,None)) > 2:
        # if there are less than 3 lines in document, this will fail
        line_time = (lnumber[x1][-3].__str__())
        print(line_time)
    else:
        print('nothing')
    f.close
run()

f.readlines() gives you a list of lines in a file. So, you could try reading in all the lines in a file:
f = open('read.log', 'r')
lines = f.readlines()
And exiting if there are 4 or fewer lines:
if len(lines) <= 4:
    f.close()
    print("4 or fewer lines in file")
    exit()
The IndexError you're getting comes from calling split() on a line with nothing on it: split() returns an empty list, so index [0] does not exist. Note that a blank line read from a file is still '\n', which is truthy, so I would suggest doing something like if not line.strip(): continue to avoid that case.
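Another option is to avoid indexing from the end altogether and keep only the last three parsed lines while reading. The sketch below assumes a single host per log (it ignores the per-IP grouping in the question) and treats any line with fewer than 7 fields as malformed; `third_from_last` is a hypothetical name.

```python
from collections import deque


def third_from_last(lines):
    """Return 'year:day:time' for the third-to-last valid line, or None."""
    tail = deque(maxlen=3)  # holds only the 3 most recent parsed lines
    for line in lines:
        parts = line.split()
        if len(parts) < 7:  # blank or malformed line: skip it instead of raising IndexError
            continue
        tail.append('{}:{}:{}'.format(parts[6], parts[3], parts[4]))
    return tail[0] if len(tail) == 3 else None


demo = [
    '10.2.177.170 Tue Jun 19 03:30:55 CDT 2018\n',
    '\n',
    '10.2.177.170 Tue Jun 19 03:31:03 CDT 2018\n',
    '10.2.177.170 Tue Jun 19 03:31:04 CDT 2018\n',
]
print(third_from_last(demo))      # 2018:19:03:30:55
print(third_from_last(demo[:2]))  # None
```

Because an open file is iterable line by line, the same function can be called as third_from_last(f) inside a with open(...) block.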

Python - Parsing a text file into a csv file

I have a text file that is output from a command that I ran with Netmiko to retrieve data from a Cisco WLC of things that are causing interference on our WiFi network. I stripped out just what I needed from the original 600k lines of code down to a couple thousand lines like this:
AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017
WiMax Mobile 11 0 -84 Fri Dec 15 17:09:45 2017
WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven 11 10 -61 Sat Dec 16 11:20:36 2017
WiMax Fixed 11 0 -78 Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven 11 10 -55 Mon Dec 18 07:51:23 2017
WiMax Mobile 11 0 -83 Wed Dec 13 16:16:26 2017
The goal is to end up with a CSV file that strips out the 'AP Name ...' prefix and puts what's left (the AP name itself) on the same line as the information from each following line. The problem is that some APs have two lines below the AP name and some have one or none. I have been at it for 8 hours and cannot find the best way to make this happen.
This is the latest version of the code I was trying to use. Any suggestions for making this work? I just want something I can load into Excel and create a report with:
with open(outfile_name, 'w') as out_file:
    with open('wlc-interference_raw.txt', 'r')as in_file:
        #Variables
        _ap_name = ''
        _temp = ''
        _flag = False
        for i in in_file:
            if 'AP Name' in i:
                #write whatever was put in the temp file to disk because new ap now
                #add another temp variable in case an ap has more than 1 interferer and check if new AP name
                out_file.write(_temp)
                out_file.write('\n')
                #print(_temp)
                _ap_name = i.lstrip('AP Name.......................................... ')
                _ap_name = _ap_name.rstrip('\n')
                _temp = _ap_name
                #print(_temp)
            elif '----' in i:
                pass
            elif 'Class Type' in i:
                pass
            else:
                line_split = i.split()
                for x in line_split:
                    _temp += ','
                    _temp += x
                _temp += '\n'

I think your best option is to read all lines of the file, then split into sections starting with AP Name. Then you can work on parsing each section.
Example
s = """AP Name.......................................... 010-HIGH-FL4-AP04
Microwave Oven  11  10  -59  Mon Dec 18 08:21:23 2017
WiMax Mobile  11  0  -84  Fri Dec 15 17:09:45 2017
WiMax Fixed  11  0  -68  Tue Dec 12 09:29:30 2017
AP Name.......................................... 010-2nd-AP04
Microwave Oven  11  10  -61  Sat Dec 16 11:20:36 2017
WiMax Fixed  11  0  -78  Mon Dec 11 12:33:10 2017
AP Name.......................................... 139-FL1-AP03
Microwave Oven  6  18  -51  Fri Dec 15 12:26:56 2017
AP Name.......................................... 010-HIGH-FL3-AP04
Microwave Oven  11  10  -55  Mon Dec 18 07:51:23 2017
WiMax Mobile  11  0  -83  Wed Dec 13 16:16:26 2017"""
import re
class AP:
    """
    A class holding each section of the parsed file
    """
    def __init__(self):
        self.header = ""
        self.content = []
sections = []
section = None
for line in s.split('\n'): # Or 'for line in file:'
    # Starting new section
    if line.startswith('AP Name'):
        # If previously had a section, add to list
        if section is not None:
            sections.append(section)
        section = AP()
        section.header = line
    else:
        if section is not None:
            section.content.append(line)
sections.append(section) # Add last section outside of loop
for section in sections:
    ap_name = section.header.lstrip("AP Name.") # lstrip takes all the characters given, not a literal string
    for line in section.content:
        print(ap_name + ",", end="")
        # You can extract the date separately, if needed
        # Splitting on more than one space using a regex
        line = ",".join(re.split(r'\s\s+', line))
        print(line.rstrip(',')) # Remove trailing comma from imperfect split
Output
010-HIGH-FL4-AP04,Microwave Oven,11,10,-59,Mon Dec 18 08:21:23 2017
010-HIGH-FL4-AP04,WiMax Mobile,11,0,-84,Fri Dec 15 17:09:45 2017
010-HIGH-FL4-AP04,WiMax Fixed,11,0,-68,Tue Dec 12 09:29:30 2017
010-2nd-AP04,Microwave Oven,11,10,-61,Sat Dec 16 11:20:36 2017
010-2nd-AP04,WiMax Fixed,11,0,-78,Mon Dec 11 12:33:10 2017
139-FL1-AP03,Microwave Oven,6,18,-51,Fri Dec 15 12:26:56 2017
010-HIGH-FL3-AP04,Microwave Oven,11,10,-55,Mon Dec 18 07:51:23 2017
010-HIGH-FL3-AP04,WiMax Mobile,11,0,-83,Wed Dec 13 16:16:26 2017
Tip:
You don't need Python to write the CSV file itself; you can redirect the script's output to a file from the command line:
python script.py > output.csv
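If you would rather write the file from Python, the standard csv module handles quoting for you. The sketch below does not depend on column spacing: it assumes every data line ends with a five-token date preceded by three numeric columns, and that AP names contain no spaces. The function name `interference_rows` is hypothetical.

```python
import csv
import io


def interference_rows(lines):
    """Yield [ap_name, class_type, n1, n2, n3, date] for each data line."""
    ap_name = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if line.lstrip().startswith('AP Name'):
            ap_name = tokens[-1]  # assumes the AP name has no spaces
            continue
        if ap_name is None or len(tokens) < 9:
            continue  # not a data line in the assumed layout
        date = ' '.join(tokens[-5:])      # e.g. 'Mon Dec 18 08:21:23 2017'
        numbers = tokens[-8:-5]           # the three numeric columns
        class_type = ' '.join(tokens[:-8])
        yield [ap_name, class_type] + numbers + [date]


sample = [
    'AP Name.......................................... 010-HIGH-FL4-AP04',
    'Microwave Oven 11 10 -59 Mon Dec 18 08:21:23 2017',
    'WiMax Fixed 11 0 -68 Tue Dec 12 09:29:30 2017',
    'AP Name.......................................... 139-FL1-AP03',
    'Microwave Oven 6 18 -51 Fri Dec 15 12:26:56 2017',
]
buffer = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
csv.writer(buffer).writerows(interference_rows(sample))
print(buffer.getvalue())
```

Parsing from the right-hand end of each line keeps multi-word class types such as "Microwave Oven" together without relying on double spaces.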

Python: Working out whitespace in date

I have a log file file.txt and it has the date format as '%b %_d %H:%M:%S'.
When the day of the month is between the 1st and 9th, it pads out the field with a space.
I'm just wondering if my code is the best way to check if this includes a space or not as I'm just trying to pull out the date/time from each line
file.txt
Sep  8 16:13:02 blah
Sep  8 16:14:02 blahblah
Sep  8 16:15:02 blablahblah
Code:
with open('file.txt','r') as f:
    for line in f:
        if int(line.split()[1]) < 10:
            d = line.split()[0] + '  ' + line.split()[1] + ' ' + line.split()[2] #double space after [0]
        else:
            d = line.split()[0] + ' ' + line.split()[1] + ' ' + line.split()[2] #single space after [0]
        print(d)

If you want your output field to be padded with spaces, you can use Python's string format specification.
>>> for line in 'Sep 8 16:13:02 blah', 'Sep 12 16:13:02 blah':
...     print('{0} {1:>2} {2}'.format(*line.split()))
...
Sep  8 16:13:02
Sep 12 16:13:02
{1:>2} means that field 1 should be right-aligned and at least 2 characters wide. Missing characters will be padded with spaces.
In Python 3.6+ you can also use f-strings to make it more self-explanatory.
>>> for line in 'Sep 8 16:13:02 blah', 'Sep 12 16:13:02 blah blah blah':
...     month, date, time, *rest = line.split()
...     print(f'date: {month} {date:>2} {time}\ncomment: {" ".join(rest)}')
...
date: Sep  8 16:13:02
comment: blah
date: Sep 12 16:13:02
comment: blah blah blah

Based on the comment by jedwards:
from datetime import datetime
f = '''Sep  8 16:13:02 blah
Sep  8 16:14:02 blahblah
Sep  8 16:15:02 blablahblah'''.splitlines()
for line in f:
    # the first 15 characters hold the space-padded timestamp
    d = datetime.strptime(line[:15], '%b %d %H:%M:%S')
    print(d)
Output:
1900-09-08 16:13:02
1900-09-08 16:14:02
1900-09-08 16:15:02
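The year comes out as 1900 because the log lines carry no year and strptime falls back to its default. A minimal sketch that supplies the year explicitly, assuming the caller knows which year the log belongs to (`parse_stamp` and its `year` parameter are hypothetical). Putting the year into the parsed string, rather than fixing it up afterwards, also lets 'Feb 29' parse in a leap year.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse a 'Mon D HH:MM:SS ...' log line into a datetime in the given year."""
    # split() ignores the padding, so 'Sep  8' and 'Sep 12' both work
    month, day, clock = line.split()[:3]
    text = '{} {} {} {}'.format(year, month, day, clock)
    return datetime.strptime(text, '%Y %b %d %H:%M:%S')


print(parse_stamp('Sep  8 16:13:02 blah', 2018))   # 2018-09-08 16:13:02
print(parse_stamp('Sep 12 16:13:02 blah', 2018))   # 2018-09-12 16:13:02
```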

displaying calendar items closest to today using datetime

I have a dictionary of my calendar items for a month (date as "key", items in the form of a list as "value") that I want to print out a certain way (the dictionary is included in the code, assigned to dct). I only want to display items that are on or after the current date (i.e. today). The display format is:
day: item1, item2
I also want those items to span only 5 lines of stdout with each line 49 characters wide (spaces included). This is necessary because the output will be displayed in conky (app for linux).
Since a day can have multiple agenda items, the output will have to be wrapped and printed on more than one line. I want the code to account for that by selecting only those days whose items fit in 5 or fewer lines, instead of printing 5 days with their items on more than 5 lines. For example:
day1: item1, item2
      item3
day2: item1
day3: item1,
      item2
That's 3 days on/after the current day printed on 5 lines, with each line 49 characters wide. Strings exceeding 49 characters are wrapped onto a new line.
Here is the code I've written to do this:
#!/usr/bin/env python
from datetime import date, timedelta, datetime
import heapq
import re
import textwrap
pattern_string = '(1[012]|[1-9]):[0-5][0-9](\\s)?(am|pm)'
pattern = re.compile(pattern_string, re.IGNORECASE) # an inline (?i) in the middle of a pattern is an error on Python 3.11+
# Explanation of pattern_string:
# ------------------------------
#( #start of group #1
#1[012] # start with 10, 11, 12
#| # or
#[1-9] # start with 1,2,...9
#) #end of group #1
#: # followed by a colon (:)
#[0-5][0-9] # followed by 0..5 and 0..9, which means 00 to 59
#(\\s)? # followed by a white space (optional)
#(am|pm) # followed by am or pm (case-insensitive via re.IGNORECASE)
# The 12-hour clock format starts with 1-12, then a colon (:) followed by 00-59, and ends with am or pm.
# Time formats that match:
# 1. "1:00am", "1:00 am","1:00 AM" ,
# 2. "1:00pm", "1:00 pm", "1:00 PM",
# 3. "12:50 pm"
d = date.today() # datetime.date(2013, 8, 11)
e = datetime.today() # datetime.datetime(2013, 8, 11, 5, 56, 28, 702926)
today = d.strftime('%a %b %d') # 'Sun Aug 11'
dct = {
'Thu Aug 01' : [' Weigh In'],
'Thu Aug 08' : [' 8:00am', 'Serum uric acid test', '12:00pm', 'Make Cheesecake'],
'Sun Aug 11' : [" Awais chotu's birthday", ' Car wash'],
'Mon Aug 12' : ['10:00am', 'Start car for 10 minutes'],
'Thu Aug 15' : [" Hooray! You're Facebook Free!", '10:00am', 'Start car for 10 minutes'],
'Mon Aug 19' : ['10:00am', 'Start car for 10 minutes'],
'Thu Aug 22' : ['10:00am', 'Start car for 10 minutes'],
'Mon Aug 26' : ['10:00am', 'Start car for 10 minutes'],
'Thu Aug 29' : ['10:00am', 'Start car for 10 minutes']
}
def join_time(lst):
    '''Searches for a time format string in supplied list and concatenates it + the event next to it as a single item
    to a list and returns that list'''
    mod_lst = []
    for number, item in enumerate(lst):
        if re.search(pattern, item):
            mod_lst.append(item + ' ' + lst[number+1]) # append the item (i.e time e.g '1:00am') and the item next to it (i.e. event)
            del lst[number+1]
        else:
            mod_lst.append(item)
    return mod_lst
def parse_date(datestring):
    return datetime.strptime(datestring + ' ' + str(date.today().year), "%a %b %d %Y") # returns a datetime obj for the date string in the current year; e.g. in 2013, "Sun Aug 11" = datetime.datetime(2013, 8, 11, 0, 0)
deltas = [] # holds datetime.timedelta() objs; timedelta(days, seconds, microseconds)
val_len = []
key_len = {}
for key in dct:
    num = len(''.join(item for item in dct[key]))
    val_len.append(num) # calculate the combined len of all items in the
                        # list which are the val of a key and add them to val_len
    if num > 37:
        key_len[key] = 2
    else:
        key_len[key] = 1
# val_len = [31, 9, 61, 31, 31, 49, 31, 32, 31]
# key_len = {'Sun Aug 11': 1, 'Mon Aug 12': 1, 'Thu Aug 01': 1, 'Thu Aug 15': 2, 'Thu Aug 22': 1, 'Mon Aug 19': 1, 'Thu Aug 08': 2, 'Mon Aug 26': 1, 'Thu Aug 29': 1}
counter = 0
for eachLen in val_len:
    if eachLen > 37:
        counter = counter + 2
    else:
        counter = counter + 1
# counter = 11
if counter > 5: # because we want only those 5 events in our conky output which are closest to today
    n = counter - 5 # n = 6, these no of event lines should be skipped
    for key in dct:
        deltas.append(e - parse_date(key)) # today - key date (e.g. 'Sun Aug 11') ---> datetime.datetime(2013, 8, 11, 5, 56, 28, 702926) - datetime.datetime(1900, 8, 11, 0, 0)
    # TODO: 'n' no of event lines should be skipped, NOT n no of days!
    for key in sorted(dct, key=parse_date): # sorted() returns ['Thu Aug 01', 'Thu Aug 08', 'Sun Aug 11', 'Mon Aug 12', 'Thu Aug 15', 'Mon Aug 19', 'Thu Aug 22', 'Mon Aug 26', 'Thu Aug 29']
        tdelta = e - parse_date(key)
        if tdelta in heapq.nlargest(n, deltas): # heapq.nlargest(x, iterable[, key]); returns list of 'x' no. of largest items in iterable
            pass # In this case it should return a list of top 6 largest timedeltas; if the tdelta is in
                 # that list, it means its not amongst the 5 events we want to print
        else:
            if key == today:
                value = dct[key]
                val1 = '${color green}' + key + '$color: '
                mod_val = join_time(value)
                val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
                print(val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color')
            else:
                value = dct[key]
                mod_val = join_time(value)
                output = key + ': ' + ', '.join(item for item in mod_val)
                print('\n '.join(textwrap.wrap(output, 49)))
else:
    for key in sorted(dct, key=parse_date):
        if key == today:
            value = dct[key]
            val1 = '${color green}' + key + '$color: '
            mod_val = join_time(value)
            val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
            print(val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color')
        else:
            value = dct[key]
            mod_val = join_time(value)
            output = key + ': ' + ', '.join(item for item in mod_val)
            print('\n '.join(textwrap.wrap(output, 49)))
The result is:
Thu Aug 22: 10:00am Start car for 10 minutes
Mon Aug 26: 10:00am Start car for 10 minutes
Thu Aug 29: 10:00am Start car for 10 minutes
I've commented the code heavily so it shouldn't be difficult to figure out how it works. I'm basically calculating the days farthest away from current day using datetime and skipping those days and their items. The code usually works well but once in a while it doesn't. In this case the output should have been:
Mon Aug 19: 10:00am Start car for 10 minutes
Thu Aug 22: 10:00am Start car for 10 minutes
Mon Aug 26: 10:00am Start car for 10 minutes
Thu Aug 29: 10:00am Start car for 10 minutes
since these are the days after the current day (Fri 16 Aug) whose items fit in 5 lines. How do I fix it to skip n lines of output rather than the n days farthest away from today?
I was thinking of using the key_len dict to somehow filter the output further, by printing the items of only those days whose line counts sum to 5 or fewer...
I'm stuck.

It's very hard to tell what you're asking here, and your code is a huge muddle.
However, the reason you're getting the wrong output in the given example is very obvious, and matches the TODO comment in your code, so I'm going to assume that's the only part you're asking about:
# TODO: 'n' no of event lines should be skipped, NOT n no of days!
I don't understand why you want to skip to the last 5 lines after today instead of the first 5, but I'll assume you have some good reason for that.
The easiest way to solve this is to just do them in reverse, prepend the lines to a string instead of printing them directly, stop when you've reached 5 lines, and then print the string. (This would also save the wasteful re-building of the heap over and over, etc.)
For example, something like this:
outlines = []
for key in sorted(dct, key=parse_date, reverse=True): # latest date first: ['Thu Aug 29', 'Mon Aug 26', 'Thu Aug 22', ...]
    if parse_date(key) < parse_date(today):
        break
    tdelta = e - parse_date(key)
    if key == today:
        value = dct[key]
        val1 = '${color green}' + key + '$color: '
        mod_val = join_time(value)
        val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
        outstr = val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color'
        outlines[:0] = outstr.splitlines() # prepend, so the final order is chronological
    else:
        value = dct[key]
        mod_val = join_time(value)
        output = key + ': ' + ', '.join(item for item in mod_val)
        outstr = '\n '.join(textwrap.wrap(output, 49))
        outlines[:0] = outstr.splitlines()
    if len(outlines) >= 5:
        break
print('\n'.join(outlines))
There are a lot of ways you could simplify this. For example, instead of passing around string representations of dates and using parse_date all over the place, just pass around dates, and format them once at the end. Use string formatting instead of 120-character multiple-concatenation expressions. Build your data structures once and use them, instead of rebuilding them over and over where you need them. And so on. But this should be all you need to get it to work.
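To illustrate the "pass around dates and format once at the end" suggestion, here is a rough Python 3 sketch. It is not the code from the question: it assumes the events are keyed by datetime.date objects, that time strings are already joined to their events, and that the days wanted are the ones nearest to today that fit within the line budget. The name `upcoming_lines` and its parameters are hypothetical, and the conky colour markup is left out.

```python
import textwrap
from datetime import date


def upcoming_lines(events, today, max_lines=5, width=49):
    """Return wrapped output lines for the nearest days that fit in max_lines."""
    out = []
    for day in sorted(d for d in events if d >= today):
        text = '{}: {}'.format(day.strftime('%a %b %d'), ', '.join(events[day]))
        wrapped = textwrap.wrap(text, width)
        if len(out) + len(wrapped) > max_lines:
            break  # this day would overflow the budget, so stop here
        out.extend(wrapped)
    return out


events = {
    date(2013, 8, 15): ["Hooray! You're Facebook Free!", '10:00am Start car for 10 minutes'],
    date(2013, 8, 19): ['10:00am Start car for 10 minutes'],
    date(2013, 8, 22): ['10:00am Start car for 10 minutes'],
    date(2013, 8, 26): ['10:00am Start car for 10 minutes'],
    date(2013, 8, 29): ['10:00am Start car for 10 minutes'],
}
for line in upcoming_lines(events, today=date(2013, 8, 16)):
    print(line)
```

Counting wrapped lines per day, instead of counting days, is what keeps the output within the five-line limit.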
