Python: XOR each character in a string

I'm trying to validate a checksum on a string which in this case is calculated by performing an XOR on each of the individual characters.
Given my test string:
check_against = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
I figured it would be as simple as:
result = 0
for char in check_against:
    result = result ^ ord(char)
I know the result should be 28; however, my code gives 40.
I'm not sure what encoding the text is supposed to be in, although I've tried encoding/decoding in utf-8 and ascii, both with the same result.
I implemented this same algorithm in C by simply doing an XOR over the char array with perfect results, so what am I missing?
Edit
It was a little while ago that I implemented (what I thought) was the same thing in C. It was actually in an Objective-C project, and I was wrong about how I had done it. First there was a step where I converted the checksum string value at the end from hex, like so (I'm filling some things in here so that I'm only pasting what is relevant):
unsigned int checksum = 0;
NSScanner *scanner = [NSScanner scannerWithString:@"26"];
[scanner scanHexInt:&checksum];
Then I did the following to compute the checksum:
NSString *sumString = @"GPGLL,5300.97914,N,00259.98174,E,125926,A";
unsigned int sum = 0;
for (int i = 0; i < sumString.length; i++) {
    sum = sum ^ [sumString characterAtIndex:i];
}
Then I would just compare like so:
return sum == checksum;

So, as @metatoaster, @XD573, and some others in the comments have helped figure out, the issue was the difference between the result, which was in base 10, and my expected value, which was in base 16.
The result from the code, 40, is correct in base 10; the value I was trying to match, 28, is given in base 16. Simply converting the expected value from base 16 to base 10, for example like so:
int('28', 16)
I get 40, the computed result.
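Putting the two observations together, here is a short sketch of the comparison the question was after, with the expected checksum written the way it appears in the sentence (hex) and the computed result left in decimal:
check_against = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
expected = "28"  # checksum from the sentence, written in hex

result = 0
for char in check_against:
    result ^= ord(char)

# 40 (decimal) and 0x28 (hex) are the same value
print(result == int(expected, 16))  # True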

# python3
s = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
cks = 0
i = 0
while i < len(s):
    cks ^= ord(s[i])
    i += 1
print("hex:", hex(cks))
print("dec:", cks)

I created the C version as shown here:
#include <stdio.h>
#include <string.h>

int main()
{
    char *str1 = "GPGLL,5300.97914,N,00259.98174,E,125926,A";
    int sum = 0;
    int i;
    for (i = 0; i < strlen(str1); i++) {
        sum ^= str1[i];
    }
    printf("checksum: %d\n", sum);
    return 0;
}
And when I compiled and ran it:
$ gcc -o mytest mytest.c
$ ./mytest
checksum: 40
Which leads me to believe that the assumptions you have from your equivalent C code are incorrect.

If all strings either need a length or null terminator, how do high level languages just construct a string with chars?

So I am writing some C code to play around with mocking up a scripting language.
I ran into a scenario where if I run a function to import a file, say import("file.c") I run into an issue where I can not necessarily use a pointer because it is not null terminated. I would also need to give the length of the string like import("file.c", 5) or use a null terminating character import("file.c\0"). I assume using a buffer is the way to go with a fixed size such as char file_name[256] which probably covers a file name large enough. But that raises some interesting questions regarding 'higher' level programming languages like say Python or Golang. So Golong's imports look like this from a internet search:
import (
    "fmt"
    "math"
)
I would assume the library names are being treated as strings, no? What about Python?
import pandas as pd
import math
import functools
Are those also being treated as strings? At least to me, it seems Go's imports are.
But let's forget imports entirely. What about just strings?
Python's string is:
s = "I like Apple Pie"
I saw here that strings in golang are defined as:
type _string struct {
    elements *byte // underlying bytes
    len      int   // number of bytes
}
Then the next segment of code says:
const World = "world"
where there is no len specified. What gives?
How do Go, or 'higher' level languages in general, make use of strings without having to specify a null terminator or pass the length as a number? Or am I missing something entirely?
I come from a Python background with some C, but this seems to work pretty similarly in most programming languages today.
The fact that you don't write the string length or the null terminating character for a string literal does not mean that it cannot be added automatically: the compiler can do it (because it knows the string length at compile time), and it is very likely doing it.
For example in C:
The null character ('\0', L'\0', char16_t(), etc.) is always appended to the string literal: thus, a string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.
Here is a small C program that shows that the null character is appended to the string literal:
#include <stdio.h>
#include <string.h>

int main()
{
    char *p = "hello";
    int i;

    i = 0;
    while (p[i] != '\0')
    {
        printf("%c ", p[i]);
        i++;
    }
    printf("\nstrlen(p)=%zu\n", strlen(p));
    return 0;
}
Execution:
./hello
h e l l o
strlen(p)=5
You can also compile the program in debug mode with:
gcc -g -o hello -Wall -pedantic -Wextra hello.c
and check with gdb:
gdb hello
...
(gdb) b main
Breakpoint 1 at 0x400585: file hello.c, line 6.
(gdb) r
Starting program: /home/pifor/c/hello
Breakpoint 1, main () at hello.c:6
6 char *p="hello";
(gdb) n
9 i = 0;
(gdb) print *(p+5)
$7 = 0 '\000'
(gdb) print *(p+4)
$8 = 111 'o'
(gdb) print *p
$10 = 104 'h'
A string value in Go is a pointer to bytes and a length. Here's Go's definition of the type:
type StringHeader struct {
    Data uintptr
    Len  int
}
In the case of a string literal like "world", the compiler counts the bytes to set the length. The literal is represented by StringHeader{Data: pointerToBytesWorld, Len: 5} at runtime.
The length of a string produced by a slice expression comes from the slice bounds:
s := "Hello"
s = s[1:4] // length is 4 - 1 = 3
String conversions take the length from the operand:
b := []byte{'F', 'o', 'p'}
s = string(b) // length is 3, same as len(b)
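For reference, a small runnable sketch (standard library only) showing the length being carried by the string value itself rather than by a terminator:
package main

import "fmt"

func main() {
    const World = "world"
    fmt.Println(len(World)) // 5: the compiler recorded the length, no terminator needed

    s := "Hello"
    s = s[1:4]             // new string header: same backing bytes, length 3
    fmt.Println(s, len(s)) // ell 3

    b := []byte{'F', 'o', 'p'}
    fmt.Println(string(b), len(b)) // Fop 3
}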

Incorrect CRC calculation in protocol: one is implemented using zlib and the other is calculated in a function

I am implementing a protocol on an STM32F412 board. It's almost done; I just need to do a CRC check of the received data.
I tried using the internal CRC module for calculating the CRC, but I could not match the result to any online CRC calculator, so I decided to do a simple implementation of the Ethernet CRC.
static const uint32_t crc32_tab[] =
{
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L,
0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L,
0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L,
0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L,
0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL,
0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L,
0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L,
0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL,
0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL,
0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL,
0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L,
0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L,
0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL,
0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L,
0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L,
0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L,
0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L,
0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L,
0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L,
0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL,
0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL,
0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L,
0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL,
0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL,
0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL,
0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L,
0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL,
0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL,
0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L,
0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L,
0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L,
0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L,
0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L,
0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L,
0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L,
0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
0x2d02ef8dL
};
uint32_t calc_crc_calculate(uint8_t *pData, uint32_t uLen)
{
    uint32_t val = 0xFFFFFFFFU;
    int i;
    for (i = 0; i < uLen; i++) {
        val = crc32_tab[(val ^ pData[i]) & 0xFF] ^ ((val >> 8) & 0x00FFFFFF);
    }
    return val ^ 0xFFFFFFFF;
}
I calculated the crc of 0x6F and compared the result to the online calculators and it apparently matches.
When I try to test the protocol with my python code I'm just unable to match the CRCs. On python I'm using the following code:
d = 0x6f
crc = zlib.crc32(bytes(d))&0xFFFFFFFF
I'm now unable to tell which is right. Apparently my algorithm is OK because it matches the online calculator, but those online calculators do not always seem reliable, and I doubt that Python's zlib implementation is wrong. At worst I may be using it wrong.
Actually you can compute the Ethernet CRC32 with the builtin module of the STM32. It took me quite a while to make it match up as well.
This code should match up for sizes divisible by 4 (I also used python zlib on the other end):
#include "stm32l4xx_hal.h"
uint32_t CRC32_Compute(const uint32_t *data, size_t sizeIn32BitWords)
{
CRC_HandleTypeDef hcrc = {
.Instance = CRC,
.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE,
.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE,
.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_WORD,
.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_ENABLE,
.InputDataFormat = CRC_INPUTDATA_FORMAT_WORDS,
};
HAL_StatusTypeDef status = HAL_CRC_Init(&hcrc);
assert (status == HAL_OK)
uint32_t checksum = HAL_CRC_Calculate(&hcrc, data, sizeIn32BitWords);
uint32_t checksumInverted = ~checksum;
return checksumInverted;
}
The challenge with sizes not divisible by 4 is to get the "inversion/reversal" (changing the bit order) right. There is an example of how the hardware handles this in the "RM0394 Reference manual STM32L43xxx STM32L44xxx STM32L45xxx STM32L46xxx advanced ARM®-based 32-bit MCUs Rev 3" on page 333.
The essence is that reversal reverses the bit order. For CRC32 this reversal must happen on the word level, i.e. over 32 bits.
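For comparison on the host side, this is roughly how the zlib reference value can be computed for a buffer whose size is divisible by 4; the little-endian word packing here is an assumption for illustration, not something taken from the original setup:
import struct
import zlib

# Example payload: two 32-bit words packed little-endian, as they might sit in memory.
words = [0x11223344, 0x55667788]
payload = struct.pack('<%dI' % len(words), *words)

reference = zlib.crc32(payload) & 0xFFFFFFFF
print(hex(reference))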
OK, it certainly was a bug on my part, but it was happening in my Python code.
I suddenly realized that I was effectively doing bytes(0x6F), which just creates an array of 111 zero bytes.
What I actually needed to do was
import struct
import zlib

d = struct.pack('B', 0x6F)
crc = zlib.crc32(d) & 0xFFFFFFFF
This question could have been avoided had I just done a little bit of rubber duck debugging. Hopefully this will help someone else.
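To make the difference concrete, a small sketch (no particular CRC value is claimed here; it is whatever zlib computes for the single byte):
import struct
import zlib

print(len(bytes(0x6F)))        # 111: bytes(int) gives that many zero bytes
print(struct.pack('B', 0x6F))  # b'o': the single byte 0x6F that was intended
print(hex(zlib.crc32(struct.pack('B', 0x6F)) & 0xFFFFFFFF))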

Cython print() outputs before C printf(), even when placed afterwards

I'm trying to pick up Cython.
import counter

cdef public void increment():
    counter.increment()

cdef public int get():
    return counter.get()

cdef public void say(int times):
    counter.say(times)
This is the "glue code" I'm using to call functions from counter.py, a pure Python source code file. It's laid out like this:
count = 0

def increment():
    global count
    count += 1

def get():
    global count
    return count

def say(times):
    global count
    print(str(count) * times)
I have successfully compiled and run this program. The functions work fine. However, a very strange thing occurred when I tested this program:
int main(int argc, char *argv[]) {
    Py_Initialize();

    // The following two lines add the current working directory
    // to the environment variable `PYTHONPATH`. This allows us
    // to import Python modules in this directory.
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\".\")");

    PyInit_glue();

    // Tests
    for (int i = 0; i < 10; i++)
    {
        increment();
    }
    int x = get();
    printf("Incremented %d times\n", x);

    printf("The binary representation of the number 42 is");
    say(3);

    Py_Finalize();
    return 0;
}
I would expect the program to produce this output:
Incremented 10 times
The binary representation of the number 42 is
101010
However, it prints this:
Incremented 10 times
101010
The binary representation of the number 42 is
But if I change the line
printf("The binary representation of the number 42 is");
to
printf("The binary representation of the number 42 is\n");
then the output is corrected.
This seems strange to me. I understand that if I want to print the output of a Python function, I might just as well return it to C and store it in a variable, and use C's printf() rather than the native Python print(). But I would be very interested to hear the reason this is happening. After all, the printf() statement is reached before the say() statement (I double checked this in gdb just to make sure). Thanks for reading.
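For context, a plausible explanation is stdio buffering: printf() without a trailing newline leaves its text in C's stdout buffer, while Python's print() writes through Python's own I/O layer and reaches the terminal first; the buffered C text only shows up when the buffer is eventually flushed. Below is a minimal standalone sketch of the same effect, with write() standing in for Python's output path (an illustration of the buffering behaviour, not the original program):
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* No trailing newline, so the text sits in C stdio's buffer for now. */
    printf("The binary representation of the number 42 is");

    /* fflush(stdout);  <- uncommenting this line restores the expected ordering */

    /* A direct write to fd 1 bypasses the stdio buffer and appears first,
       which is roughly what happens when embedded Python prints on its own. */
    write(STDOUT_FILENO, "101010\n", 7);

    return 0;
}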

Equivalent expression in Python

I am a Python n00b and at the risk of asking an elementary question, here I go.
I am porting some code from C to Python for various reasons that I don't want to go into.
In the C code, I have some code that I reproduce below.
float table[3][101][4];
int kx[6] = {0,1,0,2,1,0};
int kz[6] = {0,0,1,0,1,2};
I want an equivalent Python expression for the C code below:
float *px, *pz;
int lx = LX; /* constant defined somewhere else */
int lz = LZ; /* constant defined somewhere else */
px = &(table[kx[i]][0][0])+lx;
pz = &(table[kz[i]][0][0])+lz;
Can someone please help me by giving me the equivalent expression in Python?
Here's the thing... you can't do pointers in python, so what you're showing here is not "portable" in the sense that:
float *px, *pz; <-- this doesn't exist
int lx = LX; /* constant defined somewhere else */
int lz = LZ; /* constant defined somewhere else */
px = &(table[kx[i]][0][0])+lx;
pz = &(table[kz[i]][0][0])+lz;
     ^       ^            ^
     |       |            |
     +-------+------------+---- Therefore none of this makes any sense...
What you're trying to do is have a pointer to some offset in your multidimensional array table; because you can't do that in Python, you don't want to "port" this code verbatim.
Follow the logic beyond this: what are you doing with px and pz? That is the code you need to understand in order to port it.
There is no direct equivalent for your C code, since Python has no pointers or pointer arithmetic. Instead, refactor your code to index into the table with bracket notation.
table[kx[i]][0][lx] = 3
would be a rough equivalent of the C
px = &(table[kx[i]][0][0])+lx;
*px = 3;
Note that in Python, your table would not be contiguous. In particular, while this might work in C:
px[10] = 3; // Bounds violation!
This will IndexError in Python:
table[kx[i]][0][lx + 10] = 3
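For completeness, a minimal nested-list sketch in Python; LX, LZ and i are stand-in values here, since the real constants and loop come from elsewhere in the original C code:
# 3 x 101 x 4 table of floats, initialised to 0.0
table = [[[0.0] * 4 for _ in range(101)] for _ in range(3)]

kx = [0, 1, 0, 2, 1, 0]
kz = [0, 0, 1, 0, 1, 2]

LX, LZ = 1, 2  # stand-ins for the constants defined elsewhere
i = 0          # stand-in loop index

# rough equivalent of *px = 3; and *pz = 3; in the C code
table[kx[i]][0][LX] = 3
table[kz[i]][0][LZ] = 3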

Python Fast Input Output Using Buffer Competitive Programming

I have seen people using buffers in different languages for fast input/output on online judges. For example, this problem http://www.spoj.pl/problems/INTEST/ is done in C like this:
#include <stdio.h>
#define size 50000

int main(void)
{
    unsigned int n = 0, k, t;
    char buff[size];
    unsigned int divisible = 0;
    int block_read = 0;
    int j;

    t = 0;
    scanf("%u %u\n", &t, &k);
    while (t) {
        block_read = fread(buff, 1, size, stdin);
        for (j = 0; j < block_read; j++) {
            if (buff[j] == '\n') {
                t--;
                if (n % k == 0) {
                    divisible++;
                }
                n = 0;
            } else {
                n = n * 10 + (buff[j] - '0');
            }
        }
    }
    printf("%u", divisible);
    return 0;
}
How can this be done with python?
import sys

file = sys.stdin
size = 50000
t = 0
while t != 0:
    block_read = file.read(size)
    ...
    ...
Most probably this will not increase performance, though. Python is an interpreted language, so you basically want to spend as much time in native code (standard-library input/parsing routines in this case) as possible.
TL;DR either use built-in routines to parse integers or get some sort of 3rd party library which is optimized for speed.
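As a concrete example of leaning on the built-ins, here is a minimal sketch for INTEST-style input that reads everything in one go and parses with int(); the exact input layout is assumed from the C code above:
import sys

def main():
    data = sys.stdin.buffer.read().split()  # one bulk read, then split on whitespace
    t, k = int(data[0]), int(data[1])
    divisible = sum(1 for tok in data[2:2 + t] if int(tok) % k == 0)
    sys.stdout.write(str(divisible) + "\n")

main()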
I tried solving this one in Python 3 and couldn't get it to work no matter how I tried reading the input. I then switched to running it under Python 2.5 so I could use
import psyco
psyco.full()
After making that change I was able to get it to work by simply reading input from sys.stdin one line at a time in a for loop. I read the first line using raw_input() and parsed the values of n and k, then used the following loop to read the remainder of the input.
for line in sys.stdin:
    count += not int(line) % k
