Transform continuous date string (20190327200000000W) into datetime - Python

I'm writing an application that parses XML from an HTTP request, and one of the attributes is a date.
The problem is that the format is a string without any separators, for example '20190327200000000W', and I need to transform it into a datetime object so I can send it to a database.
All the information I have found assumes some kind of separator character (2019-03-23 ...). Can you help me?
Thanks!!!

Maybe this? (in a Jupyter notebook)
from datetime import datetime
datetime_object = datetime.strptime('20190327200000000W', '%Y%m%d%H%M%S%fW')
datetime_object

Well, I have solved this. At first I did what Xenobiologist said, but I had a format problem, so I decided to delete the last character (the trailing 'W') ... and then I realized I didn't have a string, I had a list, so I converted it to a string and did the operations. My code (I'll include only the inside of the for loop, without the parsing part):
import time

for parse in tree.iter(aaa):
    a = parse.get(m)
    respon = a.split(' ')
    if m == 'Fh':
        x = str(respon[0])
        x2 = len(x)
        x3 = x[:x2 - 1]  # drop the trailing 'W'
        print(x3)
        y = time.strptime(x3, "%Y%m%d%H%M%S%f")
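Note that time.strptime returns a time.struct_time rather than a datetime object. If the database layer expects a datetime, a minimal sketch of the extra conversion step (assuming the same stripped string as x3 above) would be:
from datetime import datetime

x3 = '20190327200000000'  # hypothetical value, trailing 'W' already removed
dt = datetime.strptime(x3, "%Y%m%d%H%M%S%f")  # datetime(2019, 3, 27, 20, 0)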

Related

PySpark / Python Slicing and Indexing Issue

Can someone let me know how to pull out a certain value from some Python output?
I would like to retrieve the value 'ocweeklyreports' from the following output using either indexing or slicing:
'config': '{"hiveView":"ocweeklycur.ocweeklyreports"}
This should be relatively easy; however, I'm having trouble defining the slicing/indexing configuration.
The following will successfully give me 'ocweeklyreports'
myslice = config['hiveView'][12:30]
However, I need the indexing or slicing modified so that I will get any value after 'ocweeklycur.'.
I'm not sure what output you're dealing with or how robust you need this to be, but if it's just a string you can do something similar to this (for a quick-and-dirty solution).
text = '{"hiveView":"ocweeklycur.ocweeklyreports"}'  # assuming the raw string from the question
index_start = text.index('.') + 1  # index just past the '.', where the value you want starts
final_response = text[index_start:-2]  # drop the trailing '"}'
print(final_response)  # Prints ocweeklyreports
Again, not the most elegant solution but hopefully it helps or at least offers a starting point. Another more robust solution would be to use regex but I'm not that skilled in regex at the moment.
You could do almost all of it using regex.
See if this helps:
import re

def search_word(di):
    st = di["config"]["hiveView"]
    # Escape the dot so it matches a literal '.' rather than any character.
    p = re.compile(r'^ocweeklycur\.(?P<word>\w+)')
    m = p.search(st)
    return m.group('word')

if __name__ == "__main__":
    d = {'config': {"hiveView": "ocweeklycur.ocweeklyreports"}}
    print(search_word(d))
The following worked best for me:
# Extract the value of the "hiveView" key
hive_view = config['hiveView']
# Split the string on the '.' character
parts = hive_view.split('.')
# The value you want is the second part of the split string
desired_value = parts[1]
print(desired_value) # Output: "ocweeklyreports"
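Note that in the output shown in the question, the value of 'config' looks like a JSON string rather than a dict. If so, a more robust sketch (assuming the string is valid JSON) is to parse it with the standard json module first and then split:
import json

config = '{"hiveView": "ocweeklycur.ocweeklyreports"}'  # hypothetical raw value from the output
hive_view = json.loads(config)['hiveView']  # "ocweeklycur.ocweeklyreports"
desired_value = hive_view.split('.', 1)[1]  # "ocweeklyreports"
print(desired_value)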

Comparing two lists with different formats

I have two lists:
influx = [u'mphhos-fnwp-010101-2',
u'mphhos-fnwp-010101-1',
u'mphhos-fnwp-010101-7',
u'mphhos-fnwp-010101-10',
u'mphhos-fnwp-010101-9',
u'mphhos-fnwp-010101-4',
u'mphhos-fnwp-010101-3',
u'mphhos-fnwp-010101-8',
u'mphhos-fnwp-010101-6',
u'mphhos-fnwp-010101-5',
u'mphhos-fnwp-010101-11']
etcd =[u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-4',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-9',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-1',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-10',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-3',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-6',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-7',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-8',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-11',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-2',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-5']
etcd is the parent list and I want to compare influx with etcd.
1.) I want to get all elements that are not present in the influx list and return them.
2.) How can I convert the etcd list into the influx list's format by omitting /xymon/fnwp/mphhos/?
Answering either of the above questions will get me my solution.
I have tried lots of methods, but I am not getting a solution because the lists are in different formats.
I would get my answer by doing set(etcd) - set(influx), but since they are in different formats I am getting all the items in the list.
str.rsplit
[x for x in etcd if x.rsplit('/', 1)[1] not in influx]
Per rafaelc's suggestion
infx = set(influx)
[x for x in etcd if x.rsplit('/', 1)[1] not in infx]
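If you need the missing names in both directions, a small variant of the same idea (a sketch reusing the lists above) normalizes etcd into a set once and uses set differences:
normalized = {x.rsplit('/', 1)[1] for x in etcd}

only_in_etcd = normalized - set(influx)    # names present in etcd but not influx
only_in_influx = set(influx) - normalized  # names present in influx but not etcd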
One simple solution would be to remove the prefixes:
for i, char in enumerate(etcd):
    char = char.replace('/xymon/fnwp/mphhos/', '')
    etcd[i] = char
And then you could find the differences using set().
# influx and etcd as defined in the question
etcd = [x.replace('/xymon/fnwp/mphhos/', '') for x in etcd]
# or using regex (requires import re)
# etcd = [re.sub('/xymon/fnwp/mphhos/', '', x) for x in etcd]
diff = set(etcd) - set(influx)
print(diff)

How to trim spaces within timestamps using 'm/d/yy' format

I have a Python script that generates .csv files from other data sources.
Currently, an error happens when the user manually adds a space to a date by accident. Instead of inputting the date as "1/13/17", a space may be added at the front (" 1/13/17") so that there's a space in front of the month.
I've included the relevant part of my Python script below:
def processDateStamp(sourceStamp):
    matchObj = re.match(r'^(\d+)/(\d+)/(\d+)\s', sourceStamp)
    (month, day, year) = (matchObj.group(1), matchObj.group(2), matchObj.group(3))
    return "%s/%s/%s" % (month, day, year)
How do I trim the space in front of the month, and possibly spaces around the other components of the date (the day and year), going forward?
Thanks in advance.
Since you're dealing with dates, it might be more appropriate to use datetime.strptime than regex here. There are two advantages of this approach:
It makes it slightly clearer to anyone reading that you're trying to parse dates.
Your code will be more prone to throw exceptions when trying to parse data that doesn't represent dates, or represents dates in an incorrect format; this is good because it helps you catch and address issues that might otherwise go unnoticed.
Here's the code:
from datetime import datetime

def processDateStamp(sourceStamp):
    # Note: the month directive is %m; %M would parse minutes.
    date = datetime.strptime(sourceStamp.replace(' ', ''), '%m/%d/%y')
    return '{}/{}/{:02d}'.format(date.month, date.day, date.year % 100)

if __name__ == '__main__':
    print(processDateStamp('1/13/17'))     # 1/13/17
    print(processDateStamp(' 1/13/17'))    # 1/13/17
    print(processDateStamp(' 1 /13 /17'))  # 1/13/17
You can also use the parser from the python-dateutil library. The main benefit you get is that it can recognize the datetime format for you (sometimes this is useful):
from dateutil import parser

def processDateTimeStamp(sourceStamp):
    dt = parser.parse(sourceStamp)
    return dt.strftime("%m/%d/%y")

processDateTimeStamp(" 1 /13 / 17")  # returns 01/13/17
processDateTimeStamp(" jan / 13 / 17")
processDateTimeStamp(" 1 - 13 - 17")
processDateTimeStamp(" 1 .13 .17")
Once again, a perfect opportunity to use split, strip, and join:
def remove_spaces(date_string):
    date_list = date_string.split('/')
    result = '/'.join(x.strip() for x in date_list)
    return result
Examples
In [7]: remove_spaces('1/13/17')
Out[7]: '1/13/17'
In [8]: remove_spaces(' 1/13/17')
Out[8]: '1/13/17'
In [9]: remove_spaces(' 1/ 13/17')
Out[9]: '1/13/17'
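If you'd rather keep the regex approach from the question, a minimal sketch (assuming you only ever feed it m/d/yy-style strings) is to strip all whitespace before matching:
import re

def processDateStamp(sourceStamp):
    cleaned = re.sub(r'\s+', '', sourceStamp)  # drop whitespace anywhere in the stamp
    matchObj = re.match(r'^(\d+)/(\d+)/(\d+)$', cleaned)
    return "%s/%s/%s" % matchObj.groups()

print(processDateStamp(' 1 /13 /17'))  # 1/13/17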

Python: Convert time format to unix timestamp

I have a time format like this
t = "2012-03-20T08:31:00-05:00"
I can extract the time contents using RegEx like this.
import re

p = re.compile(r"(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)[-+]\d\d:\d\d")
matches = p.findall(t)
But I was wondering if there is a way to convert this format directly to a unix timestamp without using regex. Is there a calendar library or something similar?
datetime.datetime.strptime is your friend :)
http://docs.python.org/library/datetime.html#datetime.datetime.strptime
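On modern Python (3.7+, where %z accepts a colon in the UTC offset), a minimal sketch going straight to a unix timestamp:
from datetime import datetime

t = "2012-03-20T08:31:00-05:00"
dt = datetime.strptime(t, "%Y-%m-%dT%H:%M:%S%z")  # timezone-aware datetime
print(dt.timestamp())  # 1332250260.0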
Use time.strptime:
time.strptime(t, "%Y-%m-%dT%H:%M:%S-%Z")
Unfortunately, this doesn't appear to work with numeric time zone offsets. python-dateutil should be able to handle your format:
import dateutil.parser
dateutil.parser.parse(t)
Alternatively, you could split your string before the numeric offset:
t, offset_sign, offset = t[:-6], t[-6], t[-5:]
t = time.strptime(t, "%Y-%m-%dT%H:%M:%S")
offset = time.strptime(offset, "%H:%M")
Use strptime from the time module
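A sketch of that route (splitting off the numeric offset first, as shown above, and using calendar.timegm to turn the UTC fields into a unix timestamp):
import calendar
import time

t = "2012-03-20T08:31:00-05:00"
base, sign, offset = t[:-6], t[-6], t[-5:]

st = time.strptime(base, "%Y-%m-%dT%H:%M:%S")
offset_sec = int(offset[:2]) * 3600 + int(offset[3:]) * 60
if sign == '-':
    offset_sec = -offset_sec

# Local time minus the UTC offset gives UTC; timegm interprets the tuple as UTC.
print(calendar.timegm(st) - offset_sec)  # 1332250260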

Fastest way to read comma separated files (including datetimes) in python

I have data stored in comma delimited txt files. One of the columns represents a datetime.
I need to load each column into separate numpy arrays (and decode the date into a python datetime object).
What is the fastest way to do this (in terms of run time)?
NB. the files are several hundred MB of data and currently take several minutes to load in.
e.g. mydata.txt
15,3,0,2003-01-01 00:00:00,12.2
15,4.5,0,2003-01-01 00:00:00,13.7
15,6,0,2003-01-01 00:00:00,18.4
15,7.5,0,2003-01-01 00:00:00,17.9
15,9,0,2003-01-01 00:00:00,17.7
15,10.5,0,2003-01-01 00:00:00,16.3
15,12,0,2003-01-01 00:00:00,17.2
Here is my current code (it works, but is slow):
import csv
import datetime
import time
import numpy
a=[]
b=[]
c=[]
d=[]
timestmp=[]
myfile = open('mydata.txt',"r")
# Read in the data
csv_reader = csv.reader(myfile)
for row in csv_reader:
    a.append(row[0])
    b.append(row[1])
    c.append(row[2])
    timestmp.append(row[3])
    d.append(row[4])
a = numpy.array(a)
b = numpy.array(b)
c = numpy.array(c)
d = numpy.array(d)
# Convert Time string list into list of Python datetime objects
times = []
time_format = "%Y-%m-%d %H:%M:%S"
for i in xrange(len(timestmp)):
    times.append(datetime.datetime.fromtimestamp(time.mktime(time.strptime(timestmp[i], time_format))))
Is there a more efficient way to do this?
Any help is very much appreciated -thanks!
(edit: In the end the bottleneck turned out to be with the datetime conversion, and not reading the file as I originally assumed.)
First, you should run your sample script with Python's built-in profiler to see where the problem actually might be. You can do this from the command-line:
python -m cProfile myscript.py
Secondly, what jumps out at me at least: why is that loop at the bottom necessary? Is there a technical reason it can't be done while reading mydata.txt in the loop you have above the instantiation of the numpy arrays?
Thirdly, you should create the datetime objects directly, since datetime also supports strptime. You don't need to make a time stamp, convert it to a struct_time, and then build a datetime from that.
Your loop at the bottom can just be re-written like this:
times = []
timestamps = []
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

for t in timestmp:
    parsed_time = datetime.datetime.strptime(t, TIME_FORMAT)
    times.append(parsed_time)
    timestamps.append(time.mktime(parsed_time.timetuple()))
I took the liberty of PEP-8ing your code a bit, such as changing your constant to all caps. Also, you can iterate over a list directly by using the in operator.
Try numpy.loadtxt(), the doc string has a good example.
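A minimal sketch of that approach (assuming the mydata.txt layout from the question; it reads the file twice, once for the numeric columns and once for the datetime column):
import numpy
from datetime import datetime

# Numeric columns 0, 1, 2 and 4, each unpacked into its own array.
a, b, c, d = numpy.loadtxt('mydata.txt', delimiter=',', usecols=(0, 1, 2, 4), unpack=True)

# Datetime column 3, read as strings and parsed separately.
raw = numpy.loadtxt('mydata.txt', delimiter=',', usecols=(3,), dtype=str)
times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]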
You can also try passing copy=False when calling numpy.array, since the default behavior is to copy the data; this can speed the script up (especially since you said it processes a lot of data).
npa = numpy.array(ar, copy=False)
If you follow Mahmoud Abdelkader's advice and use the profiler, and find out that the bottleneck is in the csv loader, you could always try replacing your csv_reader with this:
for line in open("mydata.txt"):
    row = line.split(',')
    a.append(int(row[0]))
    b.append(float(row[1]))  # column 1 holds values like 4.5, so parse as float
    c.append(int(row[2]))
    timestmp.append(row[3])
    d.append(float(row[4]))
More probable, I think, is that you have a lot of data conversions. The last loop for time conversion in particular will take a long time if you have millions of conversions! If you succeed in doing it all in one step (read + convert), plus take Terseus's advice on not copying the arrays when creating the numpy ones, you will reduce execution time.
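A sketch of the one-step read-and-convert idea (assuming the mydata.txt layout from the question):
import datetime
import numpy

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
a, b, c, d, times = [], [], [], [], []

# Read and convert in a single pass over the file.
with open('mydata.txt') as myfile:
    for line in myfile:
        row = line.strip().split(',')
        a.append(int(row[0]))
        b.append(float(row[1]))
        c.append(int(row[2]))
        times.append(datetime.datetime.strptime(row[3], TIME_FORMAT))
        d.append(float(row[4]))

a = numpy.array(a)
b = numpy.array(b)
c = numpy.array(c)
d = numpy.array(d)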
I'm not completely sure if this will help, but you may be able to speed up the reading of the file by using ast.literal_eval. For example:
from ast import literal_eval

myfile = open('mydata.txt', "r")
mylist = []
for line in myfile:
    line = line.strip()
    # The datetime field is a fixed 19 characters ending at the last comma;
    # quote it so the whole line parses as a Python list literal.
    e = line.rindex(",")
    row = literal_eval('[%s"%s"%s]' % (line[:e - 19], line[e - 19:e], line[e:]))
    mylist.append(row)

a, b, c, timestamp, d = zip(*mylist)
# a, b, c, timestamp, and d are what they were after your csv_reader loop
