Python strptime ``%z``

EDIT2: This question is assuming a POSIX-ish platform with Python
linked against Glibc.
On my system, a round trip through the %z formatting directive of
Python's time library fails to parse the offset part of ISO 8601
formatted timestamps. This snippet:
import time
time.daylight = 0
fmt = "%Y-%m-%dT%H:%M:%SZ%z"
a = time.gmtime()
b = time.strftime(fmt, a)
c = time.strptime(b, fmt)
d = time.strftime(fmt, c)
print("»»»»", a == c, b == d)
print("»»»»", a.tm_zone, b)
print("»»»»", c.tm_zone, d)
outputs:
»»»» False False
»»»» GMT 2018-02-16T09:26:34Z+0000
»»»» None 2018-02-16T09:26:34Z
whereas the expected output would be
»»»» True True
»»»» GMT 2018-02-16T09:26:34Z+0000
»»»» GMT 2018-02-16T09:26:34Z+0000
How do I get %z to respect that offset?
Python 3.3.2 and 3.6.4
[Glibc 2.17 and 2.25 ⇒ see below!]
EDIT: Glibc can be acquitted as proven by this C analogue:
#define _XOPEN_SOURCE
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
/* 2018-02-16T09:59:21Z+0000 */
#define ISO8601_FMT "%Y-%m-%dT%H:%M:%SZ%z"
int main () {
    const time_t t0 = time (NULL);
    struct tm a;
    char b [27];
    struct tm c;
    char d [27];

    (void)setenv ("TZ", "UTC", 1);
    tzset ();
    daylight = 0;

    (void)gmtime_r (&t0, &a);                        /* a = time.gmtime() */
    (void)strftime (b, sizeof(b), ISO8601_FMT, &a);  /* b = time.strftime(fmt, a) */
    (void)strptime (b, ISO8601_FMT, &c);             /* c = time.strptime(b, fmt) */
    (void)strftime (d, sizeof(d), ISO8601_FMT, &c);  /* d = time.strftime(fmt, c) */

    printf ("»»»» b ?= d %s\n", strcmp (b, d) == 0 ? "yep" : "hell, no");
    printf ("»»»» %d <%s> %s\n", a.tm_isdst, a.tm_zone, b);
    printf ("»»»» %d <%s> %s\n", c.tm_isdst, c.tm_zone, d);
}
Which outputs
»»»» b ?= d yep
»»»» 0 <GMT> 2018-02-16T10:28:18Z+0000
»»»» 0 <(null)> 2018-02-16T10:28:18Z+0000

With time.gmtime() you naturally get UTC, so the offset will always be +0000; an output string of "2018-02-16T09:26:34Z" is therefore correct ISO 8601 as it stands. If you absolutely want the "+0000", append it manually, because it will always be the same:
d = time.strftime(fmt, c) + '+0000'
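A minimal sketch of that workaround: parse without %z and re-append the constant offset afterwards, which round-trips cleanly (variable names follow the question):

```python
import time

fmt = "%Y-%m-%dT%H:%M:%SZ"           # no %z in the parsed format
a = time.gmtime()
b = time.strftime(fmt, a) + '+0000'  # append the constant offset by hand
c = time.strptime(b[:-5], fmt)       # parse everything except '+0000'
d = time.strftime(fmt, c) + '+0000'
print(b == d)  # True
```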

I don't claim to have a solution for generating the proper hour shift according to the time zone, but I can explain what happens here.
As hinted in the answers to Python timezone '%z' directive for datetime.strptime() not available:
strptime is implemented in pure Python, so it behaves the same everywhere;
strftime depends on the platform / C library Python was linked against.
On my system (Windows, Python 3.4), %z returns the same thing as %Z ("Paris, Madrid"). So when strptime tries to parse that back as digits, it fails. Your code gives me:
ValueError: time data '2018-02-16T10:00:49ZParis, Madrid' does not match format '%Y-%m-%dT%H:%M:%SZ%z'
Generation is system dependent; parsing is not.
This asymmetry explains the weird behaviour.
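The asymmetry also suggests a workaround: the datetime version of strptime keeps the offset in Python 3, so the round trip works there (a sketch with a fixed example string):

```python
from datetime import datetime

fmt = '%Y-%m-%dT%H:%M:%SZ%z'
dt = datetime.strptime('2018-02-16T09:26:34Z+0000', fmt)
print(dt.utcoffset())    # 0:00:00 -- the offset survives parsing
print(dt.strftime(fmt))  # 2018-02-16T09:26:34Z+0000 -- round-trips
```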

Related

Data gets corrupted between C and Python

I am trying to use Cython and ctypes to call a c library function using Python.
But the data bytes get corrupted somehow. Could someone please help to locate the issue?
testCRC.c:
#include <stdio.h>

unsigned char GetCalculatedCrc(const unsigned char* stream) {
    printf("Stream is %x %x %x %x %x %x %x\n",
           stream[0], stream[1], stream[2], stream[3], stream[4], stream[5], stream[6]);
    unsigned char dummy = 0;
    return dummy;
}
wrapped.pyx:
# Exposes a C function to Python
def c_GetCalculatedCrc(const unsigned char* stream):
    return GetCalculatedCrc(stream)
test.py:
x_ba=(ctypes.c_ubyte *7)(*[0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41])
x_ca=(ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y=c_GetCalculatedCrc(x_ca.value)
output:
Stream is d3 ff f7 7f 0 0 5f
whereas the expected bytes were:
0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41
Solution:
1. I had to update Cython to 0.29 to get the fix for the bug that prevented typed memoryviews from accepting read-only buffers.
2. Passing x_ca.raw worked, but passing x_ca.value threw an 'out of bound access' error.
After the suggestions from @ead & @DavidW:
`.pyx`:
def c_GetCalculatedCrc(const unsigned char[:] stream):
    # Exposes a C function to Python
    print "received %s\n" % stream[6]
    return GetCalculatedCrc(&stream[0])
`test.py`:
x_ba=(ctypes.c_ubyte *8)(*[0x47,0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41])
x_ca=(ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y=c_GetCalculatedCrc(x_ca.raw)
output:
Stream is 47 d3 ff f7 7f 0 0 41
As pointed out by @DavidW, the problem is your usage of x_ca.value: every time x_ca.value is accessed, a new bytes object is created (see the documentation) and the memory is copied:
x_ca.value is x_ca.value
# False -> a new object is created on every access
However, the copy treats the \0 character as the end of the string (as is typical for C strings), as can be seen in the source code:
static PyObject *
CharArray_get_value(CDataObject *self, void *Py_UNUSED(ignored))
{
    Py_ssize_t i;
    char *ptr = self->b_ptr;
    for (i = 0; i < self->b_size; ++i)
        if (*ptr++ == '\0')
            break;
    return PyBytes_FromStringAndSize(self->b_ptr, i);
}
Thus the result of x_ca.value is a bytes object of length 4 which doesn't share memory with x_ca, so accessing stream[6] is undefined behavior: anything could happen (including a crash).
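The truncation is easy to demonstrate with ctypes alone, using the buffer from the question:

```python
import ctypes

x_ba = (ctypes.c_ubyte * 7)(*[0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41])
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)
print(len(x_ca.value))  # 4 -- .value stops at the first \0
print(len(x_ca.raw))    # 7 -- .raw copies the whole buffer
print(x_ca.raw[6])      # 65 (0x41), still reachable through .raw
```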
So what can be done?
Normally you cannot have a pointer argument in a def function, but char * is an exception: a bytes object can be converted to char * automatically, which happens not via the buffer protocol but via PyBytes_AsStringAndSize.
This is the reason why you cannot pass x_ca to c_GetCalculatedCrc as it is: x_ca implements the buffer protocol, but it is not a bytes object, so there is no PyBytes_AsStringAndSize for it.
An alternative is to use typed memory view, which utilizes the buffer protocol, i.e.
%%cython
def c_GetCalculatedCrc(const unsigned char[:] stream):
    print(stream[6])
and now passing x_ca directly, with original length/content:
c_GetCalculatedCrc(x_ca)
# 65 as expected
Another alternative would be to pass x_ca.raw to a function expecting const unsigned char * as argument, as pointed out by @DavidW in the comments; x_ca.raw shares memory with x_ca. However, I would prefer the typed memoryviews: they are safer than raw pointers, and you will not run into surprising undefined behavior.

datetime.datetime.strptime() using variable input length

datetime.datetime.strptime seems to match all its directives regardless of the actual string length. With a shorter string, the trailing directives cannibalize characters that belong to the earlier ones, so the datetime.datetime object is built from the wrong pieces of the string.
This is the correct behavior with enough input to fill the directives
>>> datetime.datetime.strptime('20180822163014', '%Y%m%d%H%M%S')
datetime.datetime(2018, 8, 22, 16, 30, 14)
These directives, however, change the previous parsing:
>>> datetime.datetime.strptime('20180822163014', '%Y%m%d%H%M%S%f')
datetime.datetime(2018, 8, 22, 16, 30, 1, 400000)
Is there any way to drop rightmost directives if input string is not long enough instead of cannibalizing the left ones?
I've tagged C and ubuntu because documentation says
"The full set of format codes supported varies across platforms,
because Python calls the platform C library’s strftime() function, and
platform variations are common. To see the full set of format codes
supported on your platform, consult the strftime(3) documentation."
EDIT:
man ctime shows the following structure as output. It is interesting that microsecond (%f) precision doesn't seem to be supported.
struct tm {
    int tm_sec;    /* Seconds (0-60) */
    int tm_min;    /* Minutes (0-59) */
    int tm_hour;   /* Hours (0-23) */
    int tm_mday;   /* Day of the month (1-31) */
    int tm_mon;    /* Month (0-11) */
    int tm_year;   /* Year - 1900 */
    int tm_wday;   /* Day of the week (0-6, Sunday = 0) */
    int tm_yday;   /* Day in the year (0-365, 1 Jan = 0) */
    int tm_isdst;  /* Daylight saving time */
};
Well, I guess you have to do it yourself, which doesn't seem too hard because you know the pattern.
Something like this should do the job:
if len(s) == 0:
    raise ValueError("empty time string")
pattern = ""
if len(s) >= 4: pattern += "%Y"
...  # as many ifs as you need here
datetime.datetime.strptime(s, pattern)
Which is very painful to write if you have a long date pattern, but I doubt there is a function doing it already in the datetime module, for the reason that it's just a binding to C.
You could try to write something more generic and ask whether it could be added to the datetime module.
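For the compact %Y%m%d%H%M%S%f pattern from the question, such a prefix parser can be sketched as follows (parse_prefix is a hypothetical helper; input that stops in the middle of a field will still misparse):

```python
from datetime import datetime

# Fixed-width directives and their widths; %f (1-6 digits) is variable,
# so it is only safe as the last, possibly partial, field.
DIRECTIVES = [("%Y", 4), ("%m", 2), ("%d", 2),
              ("%H", 2), ("%M", 2), ("%S", 2), ("%f", 6)]

def parse_prefix(s):
    """Build the format from as many directives as the input can fill."""
    if not s:
        raise ValueError("empty time string")
    fmt, used = "", 0
    for code, width in DIRECTIVES:
        if used >= len(s):
            break  # drop the rightmost directives instead of cannibalizing
        fmt += code
        used += width
    return datetime.strptime(s, fmt)

print(parse_prefix('20180822163014'))  # 2018-08-22 16:30:14
print(parse_prefix('20180822'))        # 2018-08-22 00:00:00
```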

Calling a C function in Python and returning 2 values

I am trying to figure out how to return 2 values from a C function called from Python. I have read through the material online and am using a struct to return the two variables. I can output both values when I call the function from the same C file; however, when I call it from Python, it still returns only one value.
This is my C code:
struct re_val {
    double predict_label;
    double prob_estimates;
};

struct re_val c_func(const char* dir, double a, double b, double c, double d)
{
    double x[] = {a, b, c, d};
    printf("x[0].index: %d \n", 1);
    printf("x[0].value: %f \n", x[0]);
    printf("x[1].index: %d \n", 2);
    printf("x[1].value: %f \n", x[1]);
    printf("x[2].index: %d \n", 3);
    printf("x[2].value: %f \n", x[2]);
    printf("x[3].index: %d \n", 4);
    printf("x[3].value: %f \n", x[3]);
    printf("\nThis is the Directory: %s \n", dir);

    struct re_val r;
    r.predict_label = 5.0;
    r.prob_estimates = 8.0;
    return r;
}
This is my Python code:
calling_function = ctypes.CDLL("/home/ruven/Documents/Sonar/C interface/Interface.so")
calling_function.c_func.argtypes = [ctypes.c_char_p, ctypes.c_double, ctypes.c_double, ctypes.c_double, ctypes.c_double]
calling_function.c_func.restype = ctypes.c_double
q = calling_function.c_func("hello",1.3256, 2.45, 3.1248, 4.215440)
print q
Currently, when I run my python file in the terminal it outputs this:
x[0].index: 1
x[0].value: 1.325600
x[1].index: 2
x[1].value: 2.450000
x[2].index: 3
x[2].value: 3.124800
x[3].index: 4
x[3].value: 4.215440
This is the Directory: hello
5.0
Instead, I would like it to output this:
x[0].index: 1
x[0].value: 1.325600
x[1].index: 2
x[1].value: 2.450000
x[2].index: 3
x[2].value: 3.124800
x[3].index: 4
x[3].value: 4.215440
This is the Directory: hello
5.0
8.0
Your C code is fine; the problem is in how you use Python's ctypes. You told ctypes that the function returns a double and not a struct re_val:
calling_function.c_func.restype = ctypes.c_double
That line makes the function return a single double value in the eyes of ctypes. Instead, tell Python that the function returns a structure:
import ctypes as ct

# Python representation of the C struct re_val
class ReVal(ct.Structure):
    _fields_ = [("predict_label", ct.c_double),
                ("prob_estimates", ct.c_double)]

calling_function = ct.CDLL("/home/ruven/Documents/Sonar/C interface/Interface.so")
calling_function.c_func.argtypes = [ct.c_char_p, ct.c_double,
                                    ct.c_double, ct.c_double, ct.c_double]
# and instead of c_double use:
calling_function.c_func.restype = ReVal
This way you tell Python's ctypes that the function returns an aggregate object, a subclass of ctypes.Structure, that matches the struct re_val from the C library.
NOTE: Be very careful with argtypes and restype; if you use these incorrectly it is easy to crash the Python interpreter, and you get a segfault instead of a nice traceback.
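Even without the shared library from the question, the layout of ReVal can be sanity-checked on its own; a small sketch:

```python
import ctypes as ct

class ReVal(ct.Structure):
    _fields_ = [("predict_label", ct.c_double),
                ("prob_estimates", ct.c_double)]

r = ReVal(5.0, 8.0)                       # what c_func would return
print(r.predict_label, r.prob_estimates)  # 5.0 8.0
print(ct.sizeof(ReVal))                   # 16: two 8-byte doubles, matching the C struct
```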

How to exchange time between C and Python

I'm writing some C code that needs to embed the current time in its (binary) output file. Later, this file will be read by some other C code (possibly compiled for different architecture) and/or some python code. In both cases calculations may be required on the time.
What I'd like to know is:
How do I get the current UTC time in C? Is time() the right call?
What format should I write this to file in? ASN1? ISO?
How do I convert to that format?
How do I read that format in C and Python and convert it into something useful?
You could use rfc 3339 datetime format (a profile of ISO8601). It avoids many pitfalls of unconstrained ISO8601 timestamps.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
    char buf[21];
    time_t ts = time(NULL);
    struct tm *tp = gmtime(&ts);
    if (tp == NULL || tp->tm_year > 8099 || tp->tm_year < 0) {
        perror("gmtime");
        exit(EXIT_FAILURE);
    }
    if (strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", tp) == 0) {
        fprintf(stderr, "strftime returned 0\n");
        exit(EXIT_FAILURE);
    }
    exit(puts(buf) != EOF ? EXIT_SUCCESS : EXIT_FAILURE);
}
Output
2014-12-20T11:08:44Z
To read it in Python:
>>> from datetime import datetime, timezone
>>> dt = datetime.strptime('2014-12-20T11:08:44Z', '%Y-%m-%dT%H:%M:%SZ')
>>> dt = dt.replace(tzinfo=timezone.utc)
>>> print(dt)
2014-12-20 11:08:44+00:00
Use the following C code to get a suitable date output:
time_t rawtime;
struct tm *now;
char timestamp[80];
time(&rawtime);
now = gmtime(&rawtime);
strftime(timestamp, sizeof(timestamp), "%Y%m%d%H%M%S", now);
Then use the following python to read it:
start_time = datetime.datetime.strptime(data, "%Y%m%d%H%M%S")
Variations on the format work, as long as it's consistent.
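A self-contained sketch of that round trip in Python (with a fixed example timestamp instead of the current time):

```python
import datetime

stamp = datetime.datetime(2014, 12, 20, 11, 8, 44)
data = stamp.strftime("%Y%m%d%H%M%S")  # what the C side writes into the file
start_time = datetime.datetime.strptime(data, "%Y%m%d%H%M%S")
print(data)                 # 20141220110844
print(start_time == stamp)  # True -- lossless at one-second resolution
```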

md5 a string multiple times gets different results on different platforms

t.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/md5.h>
static char* unsigned_to_signed_char(const unsigned char* in, int len) {
    char* res = (char*)malloc(len * 2 + 1);
    int i = 0;
    memset(res, 0, len * 2 + 1);
    while (i < len) {
        sprintf(res + i * 2, "%02x", in[i]);
        i++;
    }
    return res;
}

static unsigned char * md5(const unsigned char * in) {
    MD5_CTX ctx;
    unsigned char * result1 = (unsigned char *)malloc(MD5_DIGEST_LENGTH);
    MD5_Init(&ctx);
    printf("len: %lu \n", strlen(in));
    MD5_Update(&ctx, in, strlen(in));
    MD5_Final(result1, &ctx);
    return result1;
}

int main(int argc, char *argv[])
{
    const char * i = "abcdef";
    unsigned char * data = (unsigned char *)malloc(strlen(i) + 1);
    strncpy(data, i, strlen(i));

    unsigned char * result1 = md5(data);
    free(data);
    printf("%s\n", unsigned_to_signed_char(result1, MD5_DIGEST_LENGTH));

    unsigned char * result2 = md5(result1);
    free(result1);
    printf("%s\n", unsigned_to_signed_char(result2, MD5_DIGEST_LENGTH));

    unsigned char * result3 = md5(result2);
    free(result2);
    printf("%s\n", unsigned_to_signed_char(result3, MD5_DIGEST_LENGTH));
    return 0;
}
makefile
all:
	cc t.c -Wall -L/usr/local/lib -lcrypto
and t.py
#!/usr/bin/env python
import hashlib
import binascii
src = 'abcdef'
a = hashlib.md5(src).digest()
b = hashlib.md5(a).digest()
c = hashlib.md5(b).hexdigest().upper()
print binascii.b2a_hex(a)
print binascii.b2a_hex(b)
print c
The results of python script on Debian6 x86 and MacOS 10.6 are the same:
e80b5017098950fc58aad83c8c14978e
b91282813df47352f7fe2c0c1fe9e5bd
85E4FBD1BD400329009162A8023E1E4B
the c version on MacOS is:
len: 6
e80b5017098950fc58aad83c8c14978e
len: 48
eac9eaa9a4e5673c5d3773d7a3108c18
len: 64
73f83fa79e53e9415446c66802a0383f
Why is it different from Debian 6?
Debian environment:
gcc (Debian 4.4.5-8) 4.4.5
Python 2.6.6
Linux shuge-lab 2.6.26-2-686 #1 SMP Thu Nov 25 01:53:57 UTC 2010 i686 GNU/Linux
OpenSSL was installed from testing repository.
MacOS environment:
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Python 2.7.1
Darwin Lees-Box.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
OpenSSL was installed from MacPort.
openssl #1.0.0d (devel, security)
OpenSSL SSL/TLS cryptography library
I think you are allocating exactly MD5_DIGEST_LENGTH bytes for the MD5 result, without a terminating \0. You are then calculating the MD5 of a block of memory that starts with the previous result but is followed by random bytes. You should allocate one more byte for the result and set it to \0.
My proposal:
...
unsigned char * result1 = (unsigned char *)malloc(MD5_DIGEST_LENGTH + 1);
result1[MD5_DIGEST_LENGTH] = 0;
...
The answers so far don't seem to me to have stated the issue clearly enough. Specifically the problem is the line:
MD5_Update(&ctx, in, strlen(in));
The data block you pass in is not '\0' terminated, so the call to update may try to process further bytes beyond the end of the MD5_DIGEST_LENGTH buffer. In short, stop using strlen() to work out the length of an arbitrary buffer of bytes: you know how long the buffers are supposed to be so pass the length around.
You don't '\0'-terminate the string you're passing to md5 (which I suppose takes a '\0'-terminated string, since you don't pass it the length). The code
memset( data, 0, sizeof( strlen( i ) ) );
memcpy( data, i, strlen( i ) );
is completely broken: sizeof( strlen( i ) ) is the same as sizeof( size_t ), 4 or 8 on typical machines. But you don't want the memset anyway. Try replacing these with:
strcpy( data, i );
Or better yet:
std::string i( "abcdef" );
then pass i.c_str() to md5 (and declare md5 to take a char const*). (I'd use a std::vector<unsigned char> in md5() as well, and have it return it. And unsigned_to_signed_char would take the std::vector<unsigned char> and return std::string.)
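For comparison, the fixed chaining in Python 3 syntax, always feeding the full 16 raw digest bytes back in; the first two values match the Debian output above:

```python
import hashlib

a = hashlib.md5(b'abcdef').digest()  # hash the bytes, never strlen() a digest
b = hashlib.md5(a).digest()          # the whole 16 bytes, \0s included
print(a.hex())  # e80b5017098950fc58aad83c8c14978e
print(b.hex())  # b91282813df47352f7fe2c0c1fe9e5bd
```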
