How to debug an Access Violation that is thrown from the Windows library ucrtbase?

My app (java based) launches python for Windows, which, in turn, calls os.spawnv to launch another python.
From time to time I get an Access Violation exception:
00 005eedb0 763e68f3 ucrtbase!<lambda_7d9ee38b11181ddfdf5bd66394e53cb7>::operator()+0x1b
01 005eedfc 763e65d9 ucrtbase!construct_environment_block<char>+0xdb
02 005eee14 763e7aba ucrtbase!common_pack_argv_and_envp<char>+0x31
03 005eeebc 763e778a ucrtbase!execute_command<char>+0x62
04 005eeee8 763e8066 ucrtbase!common_spawnv<char>+0x13f
05 005eeef8 65a323d7 ucrtbase!_spawnve+0x16
06 005eef38 65a360c6 python35!os_spawnve_impl(int mode = 0n0, struct _object * path = 0x03adfde0, struct _object * argv = 0x03b258a0, struct _object * env = 0x03b25a80)+0x1a7 [c:\build\cpython\modules\posixmodule.c # 5299]
I've set a breakpoint at c:\build\cpython\modules\posixmodule.c # 5299, and here is what I see in the Python sources:
Py_BEGIN_ALLOW_THREADS
spawnval = _spawnve(mode, path_char, argvlist, envlist);
Py_END_ALLOW_THREADS
I've checked all arguments twice: they are OK. mode is 0, path_char is the path to my interpreter, and argvlist and envlist are both char**: NULL-terminated arrays of NULL-terminated strings.
So, it is not Python's fault.
I know that _spawnve is not thread safe, but there is only one thread.
I do not have sources nor private symbols for MS ucrtbase. What is the right approach to investigate it?
--
What is the difference between ucrtbased.dll and ucrtbase.dll?
Should I compile Python against ucrtbased.dll to find more symbols?

Aha! Mystery solved.
See https://bugs.python.org/issue29908 for details, but basically spawnve() is broken. It relies on these secret current directory environment variables that start with an =, and is likely to crash if the environment doesn't contain them.
cmd.exe and explorer.exe set them when launching processes, but if you launch a process yourself with a constrained environment that doesn't include them, and that process itself then tries to call spawnve(), you are heading for a crash landing.
The workaround is to set at least one environment variable that matches the pattern, e.g. =c:=pants, and you are good.
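A minimal sketch of that workaround, assuming the launcher builds the child's environment as a dictionary before starting the process. The helper name with_drive_variable and the default directory are hypothetical; the original launcher is Java, so this only illustrates the idea.

```python
def with_drive_variable(env, drive="C:", directory="C:\\"):
    """Return a copy of env that contains at least one '=X:' entry.

    Hypothetical helper: it only patches the mapping and does not
    launch anything itself.
    """
    patched = dict(env)
    has_hidden = any(
        len(name) == 3 and name[0] == "=" and name[2] == ":"
        for name in patched
    )
    if not has_hidden:
        # Name is '=C:', value is the current directory on that drive.
        patched["=" + drive] = directory
    return patched


constrained = {"PATH": "C:\\Windows\\system32"}
patched = with_drive_variable(constrained)
print(sorted(patched))
```

The patched mapping would then be handed to whatever API starts the intermediate Python process.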

I have just encountered the same problem while trying to build SciPy; an access violation is thrown as part of setup.py trying to spawnve() a C compiler.
I haven't gotten to the bottom of it yet, but stepping through the disassembly, here's what happens, in my case at least:
It calls get_environment_from_os(), which gives you a pointer to the current process's environment variables.
It iterates through this list of null-terminated strings looking for environment variables whose name starts with an '='.
Apparently this is Windows's way of expressing the set of current directories, e.g. there should be an environment variable '=c:' with a value like "c:\mystuff".
It never finds any, and trundles off into uninitialized memory in its futile quest.
Boom.
Inspecting the memory address returned by get_environment_from_os() shows me a pretty sane looking list of environment variables, but none whose key begins with an '=' character.
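To make the scan described above concrete, here is a rough Python illustration of walking a Windows-style environment block (entries separated by a null byte, the whole block ended by an empty entry) and collecting the entries whose name starts with '='. This is not the CRT's code; the function name and the sample block are made up, and unlike the crashing code it stops at the terminator.

```python
def hidden_drive_entries(block):
    """Collect '=X:=path' style entries from an environment block (bytes)."""
    entries = []
    pos = 0
    # Stop at the end of the data or at the empty entry that ends the block.
    while pos < len(block) and block[pos:pos + 1] != b"\0":
        end = block.index(b"\0", pos)
        entry = block[pos:end]
        if entry.startswith(b"="):
            entries.append(entry.decode("ascii"))
        pos = end + 1
    return entries


sample = b"=C:=C:\\mystuff\0PATH=C:\\Windows\0TEMP=C:\\Temp\0\0"
found = hidden_drive_entries(sample)
print(found)
```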
I'm still digging into exactly why and how things get into this state; it doesn't always seem to happen which makes me suspect threads, but like you, I can't find any evidence to back this up; though I don't have intimate knowledge about how distutils works.

Related

How do I apply the printers.py modification? (Linux OS)

I checked the core file because the process(c++ lang) running on Linux died, and this is what came up when reading the core file:
[Corefile]
File "/usr/lib64/../share/gdb/python/libstdcxx/v6/printers.py", line 558, in to_string
return self.val['_M_dataplus']['_M_p'].lazy_string (length = len)
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
I think that there was a problem with class StdStringPrinter at printers.py.
So I looked up posts on this site that explained the problem I was looking for, modified printers.py, created a .gdbinit in my home directory, and wrote the content below into it.
How to enable gdb pretty printing for C++ STL objects in Eclipse CDT?
Eclipse/CDT Pretty Print Errors
But that method is a little different from the one I'm looking for, because it's done in Eclipse.
My gdb version is 7.6.1-94.el7.
[printer.py]
class StdStringPrinter:
    "Print a std::basic_string of some kind"
    def __init__(self, typename, val):
        self.val = val
    def to_string(self):
        # Make sure &string works, too.
        type = self.val.type
        if type.code == gdb.TYPE_CODE_REF:
            type = type.target ()
        sys.stdout.write("HelloWorld")  # TEST code
        # Calculate the length of the string so that to_string returns
        # the string according to length, not according to first null
        # encountered.
        ptr = self.val ['_M_dataplus']['_M_p']
        realtype = type.unqualified ().strip_typedefs ()
        reptype = gdb.lookup_type (str (realtype) + '::_Rep').pointer ()
        header = ptr.cast(reptype) - 1
        len = header.dereference ()['_M_length']
        if hasattr(ptr, "lazy_string"):
            return ptr.lazy_string (length = len)
        return ptr.string (length = len)
    def display_hint (self):
        return 'string'
[.gdbinit]
python
import sys
sys.path.insert(0, '/home/Hello/gcc-4.8.2/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
How can I print my modified TEST code at Linux Terminal?

I think that there was a problem with class StdStringPrinter at printers.py
I think you are fundamentally confused, and your problem has nothing at all to do with printers.py.
You didn't show us your GDB session, but it appears that you have tried to print some variable of type std::string, and when you did so, GDB produced this error:
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
What this error means is that GDB could not read a value from memory location 0x3b444e45203b290f. On an x86_64 system, such a location indeed cannot be readable, because that address does not have canonical form.
Conclusion: the pointer that you followed (likely a pointer to std::string in your program) does not actually point to std::string. "Fixing" the printers.py is not going to solve that problem.
This conclusion is corroborated by
the process(c++ lang) running on Linux died,
Finally, the pointer that you gave GDB to print: 0x3b444e45203b290f looks suspiciously like an ASCII string. Decoding it, we have: \xf); END;. So it's very likely that your program scribbled ); END; over a location where the pointer was supposed to be, and that you have a buffer overflow of some sort.
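Both observations can be reproduced with a few lines of Python. This is only a sketch: it assumes 48-bit virtual addresses (the common x86_64 case) and a little-endian layout, and the helper name is_canonical is made up for illustration.

```python
import struct

addr = 0x3b444e45203b290f


def is_canonical(address, bits=48):
    """True if bits 63..(bits-1) are all zeros or all ones."""
    top = address >> (bits - 1)
    return top == 0 or top == (1 << (64 - bits + 1)) - 1


# The bytes of the "pointer" as they sit in memory on little-endian x86_64.
raw = struct.pack("<Q", addr)

print(is_canonical(addr))
print(raw)
```

Reading the packed bytes left to right gives one control byte followed by the text "); END;", which is what you would expect to see if string data had overwritten the pointer.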
P.S.
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
This question also shows a fundamental misunderstanding of how printers.py works. It has nothing to do with your program (it's loaded into GDB).
Recompiling anything (either your program or GDB) is not required. Simply restarting GDB should be all that's necessary for it to pick up the new version of printers.py (not that that would fix anything).

Error in using C SDK in python

I'm trying to use a SDK with python.
I have multiple DLL files in the SDK.
My script:
import ctypes
import os
malib = ctypes.WinDLL(os.path.join(r'D:\Downloads\Aura API\sdk\AURA_SDK.dll'))
print(malib.GetClaymoreKeyboardLedCount(1)) #function in the dll
I get the error :
WindowsError: exception: access violation reading 0x00000005
I can use some of the functions normally, but for others I get this issue. There are also other DLLs in the SDK, and I think the problem could come from the communication between them (I only open one of these DLLs in the script), because the functions that fail seem to use the other DLLs and/or communicate with the computer.
Thanks for any advice.

You haven't set the argtypes and restype for the functions you're calling.
This means that, instead of knowing what C types to convert your arguments to, ctypes has to guess based on the Python types you pass it. If it guesses wrong, you will either pass garbage, or, worse, corrupt the stack, leading to an access violation.
For example, imagine this C function:
void func(int64_t n, char *s);
If you do this:
lib = # however you load the library
lib.func(2, 'abc')
… then ctypes will convert that 2 to a 32-bit int, not a 64-bit one. If you're using a 32-bit Python and DLL, that means n will get the 2 and the pointer to 'abc' crammed into one meaningless number, and s will be an uninitialized pointer to some arbitrary location in memory that, if you're lucky, won't be mapped to anything and will raise an access violation.
But if you first do this:
lib = # however you load the library
lib.func.argtypes = [ctypes.c_int64, ctypes.c_char_p]
lib.func.restype = None
lib.func(2, 'abc')
… then ctypes will convert the 2 to a 64-bit int, so n will get 2 and s will get 'abc' and everyone will be happy.
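For a version that can actually be run, here is a small sketch that declares argtypes and restype for two standard C library functions. It assumes a POSIX-like system where the C library can be located with ctypes.util.find_library("c"); the functions llabs and strlen stand in for the SDK functions, whose real signatures are not shown in the question.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system; this lookup does not work on Windows.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# long long llabs(long long);
libc.llabs.argtypes = [ctypes.c_longlong]
libc.llabs.restype = ctypes.c_longlong

# size_t strlen(const char *);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

big = -(2 ** 40)  # does not fit into a 32-bit int
absolute = libc.llabs(big)
length = libc.strlen(b"abc")
print(absolute, length)
```

With the declarations in place, ctypes converts the Python int to a 64-bit value and the bytes object to a char pointer instead of guessing.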

CTypes error in return value

I'm testing a very simple NASM dll (64-bit) called from ctypes. I pass a single int_64 and the function is expected to return a different int_64.
I get the same error every time:
OSError: exception: access violation writing 0x000000000000092A
where hex value translates to the value I am returning (in this case, 2346). If I change that value, the hex value changes to that value, so the problem is in the value I am returning in rax. I get the same error if I assign mov rax,2346.
I have tested this repeatedly, trying different things, and I've done a lot of research, but this seemingly simple problem is still not solved.
Here is the Python code:
import ctypes
def lcm_ctypes():
    input_value = ctypes.c_int64(235)
    hDLL = ctypes.WinDLL(r"C:/NASM_Test_Projects/While_Loop_01/While_loops-01.dll")
    CallTest = hDLL.lcm
    CallTest.argtypes = [ctypes.c_int64]
    CallTest.restype = ctypes.c_int64
    retvar = CallTest(input_value)
Here is the NASM code:
[BITS 64]
export lcm
section .data
return_val: dq 2346
section .text
finit
lcm:
push rdi
push rbp
mov rax,qword[return_val]
pop rbp
pop rdi
Thanks for any information to help solve this problem.

Your function correctly loads 2346 (0x92a) into RAX. Then execution continues into some following bytes because you didn't jmp or ret.
In this case, we can deduce that the following bytes are probably 00 00, which decodes as add byte [rax], al, hence the access violation writing 0x000000000000092A error message. (i.e. it's not a coincidence that the address it's complaining about is your constant).
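The link between the returned constant and the faulting address is easy to confirm from Python. This is just a formatting check, not a disassembly of the DLL:

```python
value = 2346  # the constant loaded into rax

# Format it the way the access violation message prints the address.
formatted = "0x%016X" % value
print(formatted)
```

The result matches the address in the OSError message, which is what you would see if rax were used as a memory operand right after the load.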
As Michael Petch said, using a debugger would have found the problem.
You also don't need to save/restore rdi because you're not touching it. The Windows x86-64 calling convention has many call-clobbered registers, so for a least-common-multiple function you shouldn't need to save/restore anything, just use rax, rcx, rdx, r8, r9, and whatever else Windows lets you clobber (I forget, check the calling convention docs in the x86 tag wiki, especially Agner Fog's guide).
You should definitely use default rel at the top of your file, so the [return_val] load will use a RIP-relative addressing mode instead of absolute.
Also, finit never executes because it's before your function label. But you don't need it either. You had the same finit in your previous asm question: Passing arrays to NASM DLL, pointer value gets reset to zero, where it was also not needed and not executed. The calling convention requires that on function entry (and return), the x87 FPU is already in the state that finit puts it in, more or less. So you don't need it before executing x87 instructions like fmulp and fidivr. But you weren't doing that anyway, you were using SSE FP instructions (which is recommended, especially in 64-bit mode), which don't touch the x87 state at all.
Go read a good tutorial and some docs (some links in the x86 tag wiki) so you understand what's going on well enough to debug a problem like this on your own, or you will have a bad time writing anything more complicated. Guessing what might work doesn't work very well for asm.
From a deleted non-answer: https://www.cs.uaf.edu/2017/fall/cs301/reference/nasm_vs/ shows how to set up Visual Studio to build an executable out of C++ and NASM sources, so you can debug it.

Dll causes Python to crash when using memset

I've been working on a project where I'm trying to use an old CodeBase library, written in C++, in Python. What I want is to use CodeBase to reindex a .dbf-file that has a .cdx-index. But currently, Python is crashing during runtime. A more detailed explanation will follow further down.
As for Python, I'm using ctypes to load the dll and then execute a function I added myself which should cause no problems, since it doesn't use a single line of code that CodeBase itself isn't using.
Python Code:
import ctypes
cb_interface = ctypes.CDLL("C4DLL.DLL")
cb_interface.reindex_file("C:\\temp\\list.dbf")
Here's the CodeBase function I added, but it requires some amount of knowledge that I can't provide right now without blowing this question up quite a bit. If necessary, I will provide further insight, as much as I can:
S4EXPORT int reindex_file(const char* file){
CODE4 S4PTR *code;
DATA4 S4PTR *data;
code4initLow(&code, 0, S4VERSION, sizeof(CODE4));
data = d4open(code, file);
d4reindex(data);
return 1;
}
According to my own debugging, my problem happens in code4initLow. Python crashes with a window saying "python.exe has stopped working", when the dll reaches the following line of code:
memset( (void *)c4, 0, sizeof( CODE4 ) ) ;
c4 here is the same object as code in the previous code-block.
Is there some problem with a dll trying to alter memory during runtime? Could it be a python problem that would go away if I were to create a .exe-file from my python script?
If someone could answer me these questions and/or provide a solution for my python-crashing-problem, I would greatly appreciate it.
And last but not least, this is my first question here. If I have accidentally managed to violate a written or unwritten rule here, I apologize and promise to fix that as soon as possible.

First of all the pointer code doesn't point anywhere as it is uninitialized. Secondly you don't actually try to fill the structure, since you pass memset a pointer to the pointer.
What you should probably do is declare code as a normal structure instance (and not a pointer), and then use &code when passing it to code4initLow and d4open.
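The same idea can be sketched from the Python side with ctypes: zero the structure's own storage, not the storage of a pointer to it. Code4 below is a hypothetical two-field stand-in, since the real CODE4 layout is not shown in the question.

```python
import ctypes


class Code4(ctypes.Structure):
    # Hypothetical stand-in; the real CODE4 has a different layout.
    _fields_ = [
        ("error_code", ctypes.c_int),
        ("name", ctypes.c_char * 32),
    ]


# A real instance owns its memory, unlike an uninitialized pointer.
code = Code4(error_code=7, name=b"junk")

# Clear the structure itself: pass its address and its full size.
ctypes.memset(ctypes.addressof(code), 0, ctypes.sizeof(Code4))

print(code.error_code, code.name)
```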

As Joachim Pileborg says, the problem is that an invalid pointer is passed to code4initLow. An alternative solution is to allocate memory for the struct, CODE4 S4PTR *code = (CODE4*)malloc(sizeof(CODE4));, and then pass the pointer itself: code4initLow(code, 0, S4VERSION, sizeof(CODE4));.

fuse utimensat problem

I am developing a FUSE filesystem in Python (with the fuse-python bindings). Which method do I need to implement so that touch works correctly? At present I get the following output:
$ touch m/My\ files/d3elete1.me
touch: setting times of `m/My files/d3elete1.me': Invalid argument
The file "d3elete1.me" exists:
$ ls -l m/My\ files/d3elete1.me
-rw-rw-rw- 1 root root 0 Jul 28 15:28 m/My files/d3elete1.me
Also I was trying to trace system calls:
$ strace touch m/My\ files/d3elete1.me
...
open("m/My files/d3elete1.me", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_LARGEFILE, 0666) = 3
dup2(3, 0) = 0
close(3) = 0
utimensat(0, NULL, NULL, 0) = -1 EINVAL (Invalid argument)
close(0) = 0
...
As you can see, utimensat failed. I tried implementing empty utimens and utime methods, but they are not even called.

Try launching fuse with the -f option. Fuse will stay in foreground and you can see errors in the console.

You must implement utimens and getattr. Not all the system calls necessarily map directly to the C calls you might be expecting. Many of them are used internally by FUSE to check and navigate your filesystem, depending on which FUSE options are set.
I believe in your case FUSE is preceding its translation of utimensat to utimens with a getattr check, to verify that the requested file is present and has the expected attributes.
Update0
This is a great coincidence. There is a comment below suggesting that the issue lies with the fact that FUSE does not support utimensat. This is not the case. I had the exact same traceback you've provided while using fuse-python on Ubuntu 10.04. I poked around a little, and it would appear that the fuse-python 0.2 bindings are for FUSE 2.6, so it may be that a slight change has introduced this error (FUSE is now at version 2.8). My solution was to stop using fuse-python (the code is an ugly mess), and I found an alternate binding, fusepy. I've not looked back, and have had no trouble since.
I highly recommend you take a look: your initialization code will be cleaner, and minimal changes are required to adapt to the new binding. Best of all, it's only one module, and an easy read.
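As a rough sketch of the two methods mentioned above, here is an in-memory class whose getattr and utimens follow the method signatures I believe fusepy's Operations class uses (getattr(path, fh=None) and utimens(path, times=None)); treat those signatures as an assumption. The class name and the single sample file are made up. In a real filesystem the class would subclass fusepy's Operations and be mounted through fusepy, which is left out here so the snippet runs without FUSE.

```python
import errno
import stat
import time


class TouchableFiles:
    """Stand-in for a fusepy Operations subclass (assumption)."""

    def __init__(self):
        now = time.time()
        self.files = {
            "/d3elete1.me": {
                "st_mode": stat.S_IFREG | 0o666,
                "st_nlink": 1,
                "st_size": 0,
                "st_atime": now,
                "st_mtime": now,
                "st_ctime": now,
            }
        }

    def getattr(self, path, fh=None):
        # fusepy code would raise FuseOSError(errno.ENOENT) here.
        if path not in self.files:
            raise OSError(errno.ENOENT, "No such file", path)
        return self.files[path]

    def utimens(self, path, times=None):
        # times is an (atime, mtime) pair; None means "use the current time".
        now = time.time()
        atime, mtime = times if times else (now, now)
        attrs = self.getattr(path)
        attrs["st_atime"] = atime
        attrs["st_mtime"] = mtime
        return 0


fs = TouchableFiles()
fs.utimens("/d3elete1.me", (1.0, 2.0))
print(fs.getattr("/d3elete1.me")["st_mtime"])
```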
