Python: Working out whitespace in date

I have a log file, file.txt, whose timestamps use the format '%b %_d %H:%M:%S'.
When the day of the month is between the 1st and the 9th, the day field is padded with a space.
I only want to pull the date/time out of each line. Is my code the best way to handle the optional padding space?
file.txt
Sep  8 16:13:02 blah
Sep  8 16:14:02 blahblah
Sep  8 16:15:02 blablahblah
Code:
with open('file.txt','r') as f:
    for line in f:
        if int(line.split()[1]) < 10:
            d = line.split()[0] + '  ' + line.split()[1] + ' ' + line.split()[2] #double space after [0]
        else:
            d = line.split()[0] + ' ' + line.split()[1] + ' ' + line.split()[2] #single space after [0]
        print(d)

If you want your output field to be padded with spaces, you can use Python's string format specification.
>>> for line in 'Sep  8 16:13:02 blah', 'Sep 12 16:13:02 blah':
...     print('{0} {1:>2} {2}'.format(*line.split()))
...
Sep  8 16:13:02
Sep 12 16:13:02
{1:>2} means that field 1 should be right aligned and at least 2 characters wide. Missing characters will be padded with spaces.
In Python 3.6+ you can also use f-strings to make it more self-explanatory.
>>> for line in 'Sep  8 16:13:02 blah', 'Sep 12 16:13:02 blah blah blah':
...     month, date, time, *rest = line.split()
...     print(f'date: {month} {date:>2} {time}\ncomment: {" ".join(rest)}')
...
date: Sep  8 16:13:02
comment: blah
date: Sep 12 16:13:02
comment: blah blah blah

Based on the comment by jedwards:
from datetime import datetime
f = '''Sep  8 16:13:02 blah
Sep  8 16:14:02 blahblah
Sep  8 16:15:02 blablahblah'''.splitlines()
for line in f:
    d = datetime.strptime(line[:15], '%b %d %H:%M:%S')
    print(d)
Output:
1900-09-08 16:13:02
1900-09-08 16:14:02
1900-09-08 16:15:02
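The year in the output above is 1900 because the log lines carry no year and strptime falls back to its default. A minimal sketch of one way around that, assuming you know which year the log belongs to (the helper name parse_log_time and the year value are illustrative, not from the original post), is to prepend the year before parsing:

```python
from datetime import datetime

def parse_log_time(line, year):
    """Parse the fixed-width, 15-character timestamp at the start of a log line.

    `year` is supplied by the caller because the log format has no year field.
    """
    stamp = line[:15]  # e.g. 'Sep  8 16:13:02' (day padded with a space)
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

print(parse_log_time("Sep  8 16:13:02 blah", 2023))
print(parse_log_time("Sep 12 16:13:02 blah blah", 2023))
```

Because the timestamp is fixed width, slicing the first 15 characters works for both padded and two-digit days, so no branch on the day value is needed.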

Related

remove spaces at the beginning from array

I want to remove the whitespace at the beginning of each row when I print an array.
This is my code:
import numpy as np
f = open("demofile.txt", "r")
lines = f.readlines()
for i in list(lines):
    w = i[3:]
    w = ', '.join(w.split())
    #print(w)
    #time.sleep(1)
    y = i[2]
    y=int(y)+1
    #print(y)
    c1=np.array([w])
    c1 = [int(i) for i in c1[0].replace(" ", "").split(",")]
    c1=np.array([c1]*3)
    c1=np.transpose(c1)
    a=str(c1).replace("[",'')
    a=str(a).replace("]",'')
    print(a)
Input: <=1 2011 2021 2031
My Output:
2011 2011 2011
 2021 2021 2021
 2031 2031 2031
I need:
2011 2011 2011
2021 2021 2021
2031 2031 2031
I tried the strip function.
Try adding this line before print(a): a=str(a).replace("\n ",'\n'). \n means new line, so a space that directly follows a line break (that is, a space at the start of a line) is removed.
A cleaner option is as follows:
a = ""
for row in c1:
    a = f'{a}{" ".join(map(str, row))}\n'
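As a rough sketch of the same idea, the rows can be joined directly so that no brackets or leading spaces are ever produced. The input line is the one from the question; the variable names values and text are illustrative:

```python
import numpy as np

line = "<=1 2011 2021 2031"

# Skip the 3-character prefix and convert the remaining tokens to integers.
values = [int(tok) for tok in line[3:].split()]

# Repeat the values three times and transpose, as in the question.
c1 = np.array([values] * 3).T

# Join each row with single spaces and the rows with newlines.
text = "\n".join(" ".join(map(str, row)) for row in c1)
print(text)
```

Building the string from the rows avoids post-processing the output of str(c1), whose alignment spaces depend on NumPy's print formatting.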

Print a calendar starting from today's date

I want to print a calendar of the current month but starting from today's date in python. Is there any way I can do this?
Only thing I thought to try was :
import calendar
y = int(input("Input the year : "))
m = int(input("Input the month : "))
d = int (input("Input the day: "))
print(calendar.month(y, m, d))
which in retrospect is a dumb idea because all it did was :
but considering my 3 day experience in python it seemed dumb enough to work.
I want the end result to look something like this:
Essentially, I want the calendar to show only the remaining days of the month, instead of the whole month.
The calendar.month function returns a string. You can use conventional string manipulation to modify that string before printing it, for example regex replacement:
import calendar
import re
y = 2020
m = 5
d = 15
h = calendar.month(y, m, 3) # 3 is the width of the date column, not the day-date
print(h)
print()
# replace all numbers before your day, take care of either spaces or following \n
# each replacement has the same width as the matched text, so the columns stay aligned
for day in range(d):
    # replace numbers at the start of a line
    pattern = rf"\n{day} "
    h = re.sub(pattern, "\n  " if day < 10 else "\n   ", h)
    # replace numbers in the middle of a line
    pattern = rf" {day} "
    h = re.sub(pattern, "   " if day < 10 else "    ", h)
    # replace numbers at the end of a line
    pattern = rf" {day}\n"
    h = re.sub(pattern, "  \n" if day < 10 else "   \n", h)
print(h)
Output:
          May 2020
Mon Tue Wed Thu Fri Sat Sun
                  1   2   3
  4   5   6   7   8   9  10
 11  12  13  14  15  16  17
 18  19  20  21  22  23  24
 25  26  27  28  29  30  31
# after replacement
          May 2020
Mon Tue Wed Thu Fri Sat Sun
                 15  16  17
 18  19  20  21  22  23  24
 25  26  27  28  29  30  31
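An alternative that avoids regex altogether is to build the text from the week lists that the calendar module provides and blank out the days that have passed. This is only a sketch: the function name remaining_month and its width parameter are made up for illustration, and weeks that lie entirely in the past are dropped rather than printed as empty lines.

```python
import calendar

def remaining_month(year, month, day, width=3):
    """Return the month as text, showing only days on or after `day`."""
    cal = calendar.TextCalendar()  # weeks start on Monday by default
    lines = [
        cal.formatmonthname(year, month, 7 * (width + 1) - 1).rstrip(),
        cal.formatweekheader(width),
    ]
    for week in cal.monthdayscalendar(year, month):
        if max(week) < day:
            continue  # every day of this week is already over
        # Days outside the month are 0, so they are blanked as well.
        cells = [str(n).rjust(width) if n >= day else " " * width for n in week]
        lines.append(" ".join(cells).rstrip())
    return "\n".join(lines)

print(remaining_month(2020, 5, 15))
```

Working on the numbers instead of the rendered string means a day number can never be confused with other digits in the text, such as the year in the header.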

Parse elements from a text string (in a list)

I have a list element which is text.
print ((temp_list))
Output:
['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
I wish to get this output:
Aug 7 14:23
I have tried to remove the whitespace but that messes up the output, which makes it harder to separate out the elements I want.
You can split the text and get the 5th, 6th and 9th fields:
f = temp_list[0].split()
print(' '.join((f[4], f[5], f[8])))
Using Regex.
import re
temp_list = ['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
for i in temp_list:
    m = re.search(r"(?P<date>(Jun|Jul|Aug|Sep).*?)\(", i)
    if m:
        print(m.group('date'))
Output:
Aug 7 14:22 - 14:23
sample = 'root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) '
# split the string on space characters
data = sample.split(' ')
# inspect our list in the console; it should now contain a mix of words and empty strings
print(data)
# since an empty string evaluates to False in Python, we can remove them from our list with the filter function
# (wrapped in list() so that the result can be printed and sliced in Python 3)
data = list(filter(lambda x: x, data))
# outputs: ['root', 'pts/3', '100.121.17.73', 'Tue', 'Aug', '7', '14:22', '-', '14:23', '(00:00)']
print(data)
# in the end we collect the relevant data by slicing the list
# from index 3 up to (but not including) index 6 and joining the items into one string separated by single spaces
result = ' '.join(data[3:6])
# outputs: Tue Aug 7
print(result)
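The empty strings mentioned above come from splitting on a single space explicitly. A short illustration (the sample text is shortened from the question, with a padded day assumed) of how split(' ') differs from split() with no argument:

```python
sample = "Tue Aug  7 14:22"  # two spaces before the day

# Splitting on one space yields an empty string for every extra space.
by_space = sample.split(" ")
print(by_space)

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = sample.split()
print(by_whitespace)
```

With split() the filtering step is not needed at all, which is why the other answers here call it without an argument.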
If your string always contains a pattern like 'Tue Aug 7 14:22 - 14:23', then I suggest using a regex to match it:
import re
temp_list = ['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
m = re.search(r'\w{3} +(\w{3}) +(\d{1,2}) +\d\d:\d\d +- +(\d\d:\d\d)', temp_list[0])
result = ' '.join([m.group(i) for i in (1,2,3)])
print(result) # Aug 7 14:23
Or:
l=['root pts/3 100.121.17.73 Tue Aug 7 14:22 - 14:23 (00:00) ']
print(' '.join(l[0].split()[-6:][:-1]))
Output:
Aug 7 14:22 - 14:23

How can i convert this string to integers & other strings in Python

I got this string...
String = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
I want to get the first 4 numbers (-268, 14, 7, 19) in integer-variables and Fri Aug 3 12:32:08 in another string-variable.
Is that possible?
Using basic python
string = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
vals, date = string.strip().split(' - ')
int_vals = [int(v) for v in vals.split()]
print(int_vals) # [-268, 14, 7, 19]
print(date) # Fri Aug 3 12:32:08 2018
Using regex
import re
match = re.search(r'([-\d]+) ([-\d]+) ([-\d]+) ([-\d]+)[ -]*(.*)', string)
date = match.group(5)
int_vals = [int(v) for v in match.groups()[:4]] # same results
Use str.split
Ex:
String = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'
first, second = String.strip().split(" - ")  # strip() drops the trailing newline
first = tuple(int(i) for i in first.split())
print(first)
print(second)
Output:
(-268, 14, 7, 19)
Fri Aug 3 12:32:08 2018
Use split and map for it:
left, date = String.strip().split(' - ')
numbers = list(map(int, left.split()))
print(numbers, date)
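If the date part is needed as more than a string, it can be parsed in the same step. A minimal sketch, assuming the date always follows the layout shown in the question (the names date_text and when are illustrative):

```python
from datetime import datetime

string = '-268 14 7 19 - Fri Aug 3 12:32:08 2018\n'

# Split once on the ' - ' separator: numbers on the left, date on the right.
vals, date_text = string.strip().split(' - ')
numbers = [int(v) for v in vals.split()]

# Parse the date text; the format string mirrors 'Fri Aug 3 12:32:08 2018'.
when = datetime.strptime(date_text, '%a %b %d %H:%M:%S %Y')

print(numbers)
print(when)
```

Note that %a and %b depend on the current locale, so this assumes English day and month abbreviations.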

displaying calendar items closest to today using datetime

I have a dictionary of my calendar items for a month (the date is the key, a list of items is the value) that I want to print in a certain way. (The dictionary is included in the code, assigned to dct.) I only want to display items that fall on or after the current date (i.e. today). The display format is:
day: item1, item2
I also want the output to span at most 5 lines of stdout, with each line at most 49 characters wide (spaces included). This is necessary because the output will be displayed in conky (an app for Linux).
Since a day can have multiple agenda items, the output has to be wrapped and printed on more than one line. I want the code to account for that by selecting only those days whose items fit in 5 or fewer lines, instead of printing 5 days whose items take up more than 5 lines. For example:
day1: item1, item2
item3
day2: item1
day3: item1,
item2
That's 3 days on or after the current day, printed on 5 lines, each at most 49 characters wide. Strings longer than 49 characters are wrapped onto a new line.
Here is the code I've written to do this:
#!/usr/bin/env python
from datetime import date, timedelta, datetime
import heapq
import re
import textwrap
pattern_string = '(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)'
pattern = re.compile(pattern_string)
# Explanation of pattern_string:
# ------------------------------
#( #start of group #1
#1[012] # start with 10, 11, 12
#| # or
#[1-9] # start with 1,2,...9
#) #end of group #1
#: # followed by a colon (:)
#[0-5][0-9] # followed by 0..5 and 0..9, which means 00 to 59
#(\\s)? # followed by a white space (optional)
#(?i) # matching is case insensitive
#(am|pm) # followed by am or pm
# The 12-hour clock format starts with 1-12, then a colon (:) followed by 00-59, and ends with am or pm.
# Time format that match:
# 1. "1:00am", "1:00 am","1:00 AM" ,
# 2. "1:00pm", "1:00 pm", "1:00 PM",
# 3. "12:50 pm"
d = date.today() # datetime.date(2013, 8, 11)
e = datetime.today() # datetime.datetime(2013, 8, 11, 5, 56, 28, 702926)
today = d.strftime('%a %b %d') # 'Sun Aug 11'
dct = {
    'Thu Aug 01' : [' Weigh In'],
    'Thu Aug 08' : [' 8:00am', 'Serum uric acid test', '12:00pm', 'Make Cheesecake'],
    'Sun Aug 11' : [" Awais chotu's birthday", ' Car wash'],
    'Mon Aug 12' : ['10:00am', 'Start car for 10 minutes'],
    'Thu Aug 15' : [" Hooray! You're Facebook Free!", '10:00am', 'Start car for 10 minutes'],
    'Mon Aug 19' : ['10:00am', 'Start car for 10 minutes'],
    'Thu Aug 22' : ['10:00am', 'Start car for 10 minutes'],
    'Mon Aug 26' : ['10:00am', 'Start car for 10 minutes'],
    'Thu Aug 29' : ['10:00am', 'Start car for 10 minutes']
}
def join_time(lst):
    '''Searches for a time format string in the supplied list and concatenates it + the event next to it as a single item
    to a list and returns that list'''
    mod_lst = []
    for number, item in enumerate(lst):
        if re.search(pattern, item):
            mod_lst.append(item + ' ' + lst[number+1]) # append the item (i.e time e.g '1:00am') and the item next to it (i.e. event)
            del lst[number+1]
        else:
            mod_lst.append(item)
    return mod_lst
def parse_date(datestring):
    return datetime.strptime(datestring + ' ' + str(date.today().year), "%a %b %d %Y") # returns a datetime obj for the date string in the current year; e.g. in 2013, "Sun Aug 11" = datetime.datetime(2013, 8, 11, 0, 0)
deltas = [] # holds datetime.timedelta() objs; timedelta(days, seconds, microseconds)
val_len = []
key_len = {}
for key in dct:
    num = len(''.join(item for item in dct[key]))
    val_len.append(num) # calculate the combined len of all items in the
                        # list which are the val of a key and add them to val_len
    if num > 37:
        key_len[key] = 2
    else:
        key_len[key] = 1
# val_len = [31, 9, 61, 31, 31, 49, 31, 32, 31]
# key_len = {'Sun Aug 11': 1, 'Mon Aug 12': 1, 'Thu Aug 01': 1, 'Thu Aug 15': 2, 'Thu Aug 22': 1, 'Mon Aug 19': 1, 'Thu Aug 08': 2, 'Mon Aug 26': 1, 'Thu Aug 29': 1}
counter = 0
for eachLen in val_len:
    if eachLen > 37:
        counter = counter + 2
    else:
        counter = counter + 1
# counter = 11
if counter > 5: # because we want only those 5 events in our conky output which are closest to today
    n = counter - 5 # n = 6, this number of event lines should be skipped
    for key in dct:
        deltas.append(e - parse_date(key)) # today - key date (e.g. 'Sun Aug 11') ---> datetime.datetime(2013, 8, 11, 5, 56, 28, 702926) - datetime.datetime(2013, 8, 11, 0, 0)
    # TODO: 'n' no of event lines should be skipped, NOT n no of days!
    for key in sorted(dct, key=parse_date): # sorted() returns ['Thu Aug 01', 'Thu Aug 08', 'Sun Aug 11', 'Mon Aug 12', 'Thu Aug 15', 'Mon Aug 19', 'Thu Aug 22', 'Mon Aug 26', 'Thu Aug 29']
        tdelta = e - parse_date(key)
        if tdelta in heapq.nlargest(n, deltas): # heapq.nlargest(x, iterable[, key]); returns list of 'x' no. of largest items in iterable
            pass # In this case it should return a list of top 6 largest timedeltas; if the tdelta is in
                 # that list, it means it's not amongst the 5 events we want to print
        else:
            if key == today:
                value = dct[key]
                val1 = '${color green}' + key + '$color: '
                mod_val = join_time(value)
                val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
                print val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color'
            else:
                value = dct[key]
                mod_val = join_time(value)
                output = key + ': ' + ', '.join(item for item in mod_val)
                print '\n '.join(textwrap.wrap(output, 49))
else:
    for key in sorted(dct, key=parse_date):
        if key == today:
            value = dct[key]
            val1 = '${color green}' + key + '$color: '
            mod_val = join_time(value)
            val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
            print val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color'
        else:
            value = dct[key]
            mod_val = join_time(value)
            output = key + ': ' + ', '.join(item for item in mod_val)
            print '\n '.join(textwrap.wrap(output, 49))
The result is:
Thu Aug 22: 10:00am Start car for 10 minutes
Mon Aug 26: 10:00am Start car for 10 minutes
Thu Aug 29: 10:00am Start car for 10 minutes
I've commented the code heavily so it shouldn't be difficult to figure out how it works. I'm basically calculating the days farthest away from current day using datetime and skipping those days and their items. The code usually works well but once in a while it doesn't. In this case the output should have been:
Mon Aug 19: 10:00am Start car for 10 minutes
Thu Aug 22: 10:00am Start car for 10 minutes
Mon Aug 26: 10:00am Start car for 10 minutes
Thu Aug 29: 10:00am Start car for 10 minutes
since these are the days after the current day (Fri 16 Aug) whose items fit in 5 lines. How do I fix it to skip n lines rather than the n days farthest away from today?
I was thinking of using the key_len dict to filter the output further, by printing the items of only those days whose line counts sum to 5 or fewer...
I'm stuck.
It's very hard to tell what you're asking here, and your code is a huge muddle.
However, the reason you're getting the wrong output in the given example is very obvious, and matches the TODO comment in your code, so I'm going to assume that's the only part you're asking about:
# TODO: 'n' no of event lines should be skipped, NOT n no of days!
I don't understand why you want to skip to the last 5 lines after today instead of the first 5, but I'll assume you have some good reason for that.
The easiest way to solve this is to just do them in reverse, prepend the lines to a string instead of printing them directly, stop when you've reached 5 lines, and then print the string. (This would also save the wasteful re-building of the heap over and over, etc.)
For example, something like this:
outlines = []
for key in sorted(dct, key=parse_date, reverse=True): # with reverse=True the keys come latest first: ['Thu Aug 29', 'Mon Aug 26', 'Thu Aug 22', ..., 'Thu Aug 01']
    if parse_date(key) < parse_date(today):
        break
    tdelta = e - parse_date(key)
    if key == today:
        value = dct[key]
        val1 = '${color green}' + key + '$color: '
        mod_val = join_time(value)
        val2 = textwrap.wrap(', '.join(item for item in mod_val), 37)
        outstr = val1 + '${color 40E0D0}' + '$color\n ${color 40E0D0}'.join(item for item in val2) + '$color'
        outlines[:0] = outstr.splitlines()
    else:
        value = dct[key]
        mod_val = join_time(value)
        output = key + ': ' + ', '.join(item for item in mod_val)
        outstr = '\n '.join(textwrap.wrap(output, 49))
        outlines[:0] = outstr.splitlines()
    if len(outlines) >= 5:
        break
print '\n'.join(outlines)
There are a lot of ways you could simplify this. For example, instead of passing around string representations of dates and using parse_date all over the place, just pass around dates, and format them once at the end. Use string formatting instead of 120-character multiple-concatenation expressions. Build your data structures once and use them, instead of rebuilding them over and over where you need them. And so on. But this should be all you need to get it to work.
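To make the suggestion about passing dates around concrete, here is a rough Python 3 sketch. It is not the code from the question: the function name upcoming_lines, its parameters, and the sample events are assumptions for illustration, the dictionary is keyed by datetime.date objects instead of strings, the conky colour markup is left out, and it keeps the earliest days from today onward rather than the latest ones.

```python
import textwrap
from datetime import date

def upcoming_lines(events, today, max_lines=5, width=49):
    """Return wrapped agenda lines for days on or after `today`.

    Days are added in date order; a day is only included if all of its
    wrapped lines still fit within `max_lines`.
    """
    out = []
    for day in sorted(events):
        if day < today:
            continue  # skip days that are already over
        # Format the date once, at the point of output.
        text = f"{day:%a %b %d}: {', '.join(events[day])}"
        wrapped = textwrap.wrap(text, width)
        if len(out) + len(wrapped) > max_lines:
            break  # this day would overflow the line budget
        out.extend(wrapped)
    return out

events = {
    date(2013, 8, 15): ["Hooray! You're Facebook Free!"],
    date(2013, 8, 19): ["10:00am Start car for 10 minutes"],
    date(2013, 8, 22): ["10:00am Start car for 10 minutes"],
    date(2013, 8, 26): ["10:00am Start car for 10 minutes"],
    date(2013, 8, 29): ["10:00am Start car for 10 minutes"],
}
print("\n".join(upcoming_lines(events, date(2013, 8, 16))))
```

Counting wrapped lines while collecting them keeps the limit in terms of output lines rather than days, which is what the TODO in the question asks for.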
