Three weeks ago I posted a question here to understand how WordPress saves passwords to the db. Mystical suggested I look at the source code. I tried, but I am not very good with PHP, so I am trying to convert the relevant functions to Python. Here is what I have so far:
Python:
import base64
from email.encoders import encode_base64
from hashlib import md5
prefix = '$P$B'
salt = 'KcFRBGXE'
password = '^zVw*wSFshV2' #password i enter to login
real_hashed_pass = '$P$BKcFRBGXEWOVYQShBC1edT7f3e3Nca1' #this is stored in wp db
hashed_pass = md5((salt + password).encode('utf-8')).hexdigest()
for i in range(8193):
    hashed_pass = md5((hashed_pass + password).encode('utf-8')).hexdigest()
# for i in range(17):
#     hashed_pass = base64.standard_b64encode(hashed_pass)
hashed_pass = prefix + salt + hashed_pass
print(hashed_pass == real_hashed_pass)
Relevant PHP (full code):
<?php
class PasswordHash {
var $itoa64;
var $iteration_count_log2;
var $portable_hashes;
var $random_state;
function __construct($iteration_count_log2, $portable_hashes)
{
$this->itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
if ($iteration_count_log2 < 4 || $iteration_count_log2 > 31)
$iteration_count_log2 = 8;
$this->iteration_count_log2 = $iteration_count_log2;
$this->portable_hashes = $portable_hashes;
$this->random_state = microtime();
if (function_exists('getmypid'))
$this->random_state .= getmypid();
}
function encode64($input, $count)
{
$output = '';
$i = 0;
do {
$value = ord($input[$i++]);
$output .= $this->itoa64[$value & 0x3f];
if ($i < $count)
$value |= ord($input[$i]) << 8;
$output .= $this->itoa64[($value >> 6) & 0x3f];
if ($i++ >= $count)
break;
if ($i < $count)
$value |= ord($input[$i]) << 16;
$output .= $this->itoa64[($value >> 12) & 0x3f];
if ($i++ >= $count)
break;
$output .= $this->itoa64[($value >> 18) & 0x3f];
} while ($i < $count);
return $output;
}
function crypt_private($password, $setting)
{
$output = '*0';
if (substr($setting, 0, 2) === $output)
$output = '*1';
$id = substr($setting, 0, 3);
# We use "$P$", phpBB3 uses "$H$" for the same thing
if ($id !== '$P$' && $id !== '$H$')
return $output;
$count_log2 = strpos($this->itoa64, $setting[3]);
if ($count_log2 < 7 || $count_log2 > 30)
return $output;
$count = 1 << $count_log2;
$salt = substr($setting, 4, 8);
if (strlen($salt) !== 8)
return $output;
# We were kind of forced to use MD5 here since it's the only
# cryptographic primitive that was available in all versions
# of PHP in use. To implement our own low-level crypto in PHP
# would have resulted in much worse performance and
# consequently in lower iteration counts and hashes that are
# quicker to crack (by non-PHP code).
$hash = md5($salt . $password, TRUE);
do {
$hash = md5($hash . $password, TRUE);
} while (--$count);
$output = substr($setting, 0, 12);
$output .= $this->encode64($hash, 16);
return $output;
}
function CheckPassword($password, $stored_hash)
{
if ( strlen( $password ) > 4096 ) {
return false;
}
$hash = $this->crypt_private($password, $stored_hash);
if ($hash[0] === '*')
$hash = crypt($password, $stored_hash);
# This is not constant-time. In order to keep the code simple,
# for timing safety we currently rely on the salts being
# unpredictable, which they are at least in the non-fallback
# cases (that is, when we use /dev/urandom and bcrypt).
return $hash === $stored_hash;
}
}
My goal is to have the Python code produce the same hashed password as the WordPress code. I think the error in the Python code is at the commented-out loop, but I am not sure how to fix it.
Thank you for the help!
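For reference, a line-by-line port of crypt_private and encode64 from the PHP above might look like the sketch below. It assumes only the portable "$P$" / "$H$" scheme (no bcrypt fallback) and that the password is encoded as UTF-8 before hashing; the names phpass_encode64, phpass_hash and check_password are made up for this sketch. Two things differ from the attempt above: PHP's md5(..., TRUE) returns the raw 16 bytes rather than a hex string, and the final step is phpass's own base64 variant over the itoa64 alphabet (low bits first), not standard base64. The loop runs 1 << N times after the initial hash, where N is the position of the fourth character of the stored hash in itoa64 (13 for 'B', so 8192 rounds).

```python
import hashlib
import hmac

ITOA64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def phpass_encode64(data: bytes, count: int) -> str:
    """Port of PasswordHash::encode64: 3 input bytes -> 4 chars, low bits first."""
    out = []
    i = 0
    while i < count:
        value = data[i]
        i += 1
        out.append(ITOA64[value & 0x3F])
        if i < count:
            value |= data[i] << 8
        out.append(ITOA64[(value >> 6) & 0x3F])
        if i >= count:
            break
        i += 1
        if i < count:
            value |= data[i] << 16
        out.append(ITOA64[(value >> 12) & 0x3F])
        if i >= count:
            break
        i += 1
        out.append(ITOA64[(value >> 18) & 0x3F])
    return "".join(out)


def phpass_hash(password: str, setting: str) -> str:
    """Port of PasswordHash::crypt_private for '$P$' / '$H$' settings."""
    if setting[:3] not in ("$P$", "$H$"):
        raise ValueError("not a portable phpass hash")
    count_log2 = ITOA64.find(setting[3:4])
    if not 7 <= count_log2 <= 30:
        raise ValueError("iteration count out of range")
    salt = setting[4:12]
    if len(salt) != 8:
        raise ValueError("salt must be 8 characters")
    pw = password.encode("utf-8")  # assumption: UTF-8 password bytes
    digest = hashlib.md5(salt.encode("utf-8") + pw).digest()  # raw bytes, not hex
    for _ in range(1 << count_log2):
        digest = hashlib.md5(digest + pw).digest()
    return setting[:12] + phpass_encode64(digest, 16)


def check_password(password: str, stored_hash: str) -> bool:
    """Recompute the hash using the stored hash as the setting, then compare."""
    return hmac.compare_digest(phpass_hash(password, stored_hash), stored_hash)


stored = "$P$BKcFRBGXEWOVYQShBC1edT7f3e3Nca1"
print(check_password("^zVw*wSFshV2", stored))
```

Whether the last line prints True depends on the password and stored hash in the question really belonging together; the sketch only mirrors the PHP control flow.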
UPDATE
When someone enters their passcode you hash it.
$hash1 = hash('ripemd320', $passcode);
$client = (int) $client; // numeric ID only, so it cannot carry SQL
$sql = "SELECT `hash` FROM `Client` WHERE `Number` = $client LIMIT 1";
$results = mysqli_query($link, $sql);
list($hash2) = mysqli_fetch_array($results, MYSQLI_NUM);
if (hash_equals($hash2, $hash1)) { /* unlock the pearly gates */ }
END OF UPDATE
No one should ever save a password in a db. So when you ask "how WordPress saves passwords to db", the answer is that it does not: it saves a hash.
Are you working on a WordPress add-on, or do you want to "save passwords" in the same manner as WP?
WordPress is not the place to copy any PHP coding techniques from.
When you bring Python into the equation, I have to think you are not working with WP but want to do it the same way as WP. That would be a bad idea.
Passwords are not that complicated. And it only takes a few lines of code.
When the password is created you save the hash in the user's table. When they login you get the hash from the table, hash the password given and compare the two.
I recommend using only numerical usernames. Then when you get the username you convert it to an integer and SQL injection is impossible.
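The store-then-compare flow described above, sketched in Python against an in-memory SQLite table. Several things here are assumptions of the sketch and not from the post: the table and column names, a per-user random salt with PBKDF2-HMAC-SHA256 in place of a single unsalted hash call, and parameterized queries (the ? placeholders) as the injection guard, which also works when usernames are not numeric.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 200_000  # illustrative work factor, tune for your hardware


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash; only this value and the salt are stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_user(db: sqlite3.Connection, number: int, password: str) -> None:
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO client (number, salt, hash) VALUES (?, ?, ?)",
        (number, salt, hash_password(password, salt)),
    )


def check_login(db: sqlite3.Connection, number: int, password: str) -> bool:
    row = db.execute(
        "SELECT salt, hash FROM client WHERE number = ? LIMIT 1", (number,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Hash what the user typed and compare it to the stored hash in constant time.
    return hmac.compare_digest(hash_password(password, salt), stored)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE client (number INTEGER PRIMARY KEY, salt BLOB, hash BLOB)")
create_user(db, 42, "s3cret")
print(check_login(db, 42, "s3cret"), check_login(db, 42, "wrong"))  # True False
```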
My NodeJS & Python scripts don't return the same hash, what could cause this issue?
Node.js
const { createHmac } = require("crypto");
var message = 'v1:1583197109:'
var key = 'Asjei8578FHasdjF85Hfjkasi875AsjdiAas_CwueKL='
const digest = Buffer.from(key, "base64");
const hash = createHmac("sha256", digest)
.update(message)
.digest("hex");
console.log(hash)
> 7655b4f816dc7725fb4507a20f2b97823979ea00b121c84b76924fea167dcaf7
Python3
message = 'v1:1583197109:'
key = 'Asjei8578FHasdjF85Hfjkasi875AsjdiAas_CwueKL=' + '=' #add a "=" to avoid incorrect padding
digest = base64.b64decode(key.encode('utf-8'))
hash_ = hmac.new(digest, message.encode('utf-8'), hashlib.sha256)
hash_result = hash_.hexdigest()
print(hash_result)
> c762b612d7c56d3f9c95052181969b42c604c2d41b7ce5fc7f5a06457e312d5b
I guess it could be the extra = to avoid the incorrect padding but my key ends with a single =.
Node.js Buffer.from(..., 'base64') also accepts the "URL and filename safe" Base64 alphabet (https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings), so _ is a valid Base64 character for Node. Python's default b64decode does not accept it and silently discards it, which is why the padding looked wrong and why the decoded key differs.
Passing altchars that correspond to the "urlsafe" version of Base64 in the Python code yields equal hashes.
Node.js
const { createHmac } = require("crypto");
var message = 'v1:1583197109:'
var key = 'Asjei8578FHasdjF85Hfjkasi875AsjdiAas_CwueKL='
const digest = Buffer.from(key, "base64");
const hash = createHmac("sha256", digest)
.update(message)
.digest("hex");
console.log(hash) // 7655b4f816dc7725fb4507a20f2b97823979ea00b121c84b76924fea167dcaf7
Python 3
import base64
import hashlib
import hmac
message = 'v1:1583197109:'
key = 'Asjei8578FHasdjF85Hfjkasi875AsjdiAas_CwueKL='  # no extra "=" needed once "_" is decoded instead of dropped
digest = base64.b64decode(key.encode('utf-8'), altchars='-_')
hash_ = hmac.new(digest, message.encode('utf-8'), hashlib.sha256)
hash_result = hash_.hexdigest()
print(hash_result) # 7655b4f816dc7725fb4507a20f2b97823979ea00b121c84b76924fea167dcaf7
Also, Python's b64decode has a validate keyword argument, which checks the input string and "fails loud" instead of ignoring characters outside the alphabet.
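A small sketch of both points, using a made-up key rather than the one from the question: validate=True turns a URL-safe character into an immediate binascii.Error, and base64.urlsafe_b64decode is shorthand for altchars='-_'.

```python
import base64
import binascii

# Hypothetical 32-byte key; all 0xff bytes so the URL-safe encoding is full of "_".
raw_key = b"\xff" * 32
key = base64.urlsafe_b64encode(raw_key).decode("ascii")

try:
    # Standard alphabet + strict checking: "_" is rejected instead of dropped.
    base64.b64decode(key, validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc

decoded = base64.urlsafe_b64decode(key)
same_as_altchars = base64.b64decode(key, altchars="-_") == decoded

print(type(strict_error).__name__, decoded == raw_key, same_as_altchars)
```

Validating first makes an alphabet mismatch show up at the decode step, not later as a different HMAC.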
I have the following example where I want to output a variable in the same json format with os.system (the idea is to output with a system command) but the double quotes are ignored in output.
import json
import os
import requests
PAYLOAD_CONF = {
"cluster": {
"ldap": "string",
"processes": 22
}
}
paystr = (str(PAYLOAD_CONF))
paydic = (json.dumps(PAYLOAD_CONF))
os.system("echo "+paystr+"")
os.system("echo "+paydic+"")
Output:
{cluster: {processes: 22, ldap: string}}
{cluster: {processes: 22, ldap: string}}
Can you help me in a workaround where I can output this with double quotes? It's very important to output with system command.
For cases like this, where you embed variables into a shell command you should use shlex.quote. Using this (and with minor cleanup), the code can be written as:
import json
import os
import shlex
PAYLOAD_CONF = {
"cluster": {
"ldap": "string",
"processes": 22
}
}
paydic = json.dumps(PAYLOAD_CONF)
os.system("echo " + shlex.quote(paydic))
Output:
{"cluster": {"ldap": "string", "processes": 22}}
Using subprocess
The subprocess module contains a lot of helper functions for calling external applications. These functions are generally preferable to os.system for various security reasons.
If there is no hard dependency on os.system, you can also use one of the following, depending on your needs:
subprocess.call -- This will return even if the subprocess fails. If this returns a non-zero value, the process exited abnormally.
subprocess.check_call -- This will raise an exception if the process exits abnormally.
subprocess.check_output -- This will return stdout from the subprocess and raise an exception if it exits abnormally.
... The module contains many other helpful functions for interacting with subprocess which you should check out if the above don't suit your needs.
Using check_call, the code becomes:
from subprocess import check_call
import json
PAYLOAD_CONF = {
"cluster": {
"ldap": "string",
"processes": 22
}
}
paydic = json.dumps(PAYLOAD_CONF)
check_call(["echo", paydic])
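If the output is needed back in Python, here is a sketch with subprocess.run, the general-purpose entry point in the same module. Passing the arguments as a list means no shell is involved, so no quoting is needed. It assumes an echo executable is available on PATH.

```python
import json
import subprocess

PAYLOAD_CONF = {"cluster": {"ldap": "string", "processes": 22}}

result = subprocess.run(
    ["echo", json.dumps(PAYLOAD_CONF)],
    check=True,           # raise CalledProcessError on a non-zero exit code
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
)
print(result.stdout, end="")

# The double quotes survive, so the output parses back into the same dict.
round_trip = json.loads(result.stdout)
```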
You're not adding quotes; you're adding an empty string to the end.
Also, echo is going to interpret the first set of double quotes as a wrapper around the argument – not as part of the string itself. In the echo command itself, you need to escape double quotes with a backslash, e.g. echo \"hello\" will output "hello", whereas echo "hello" will output hello.
In a Python string, you're going to have to escape the literal backslash in the echo command with another backslash, e.g. os.system('echo \\"hello\\"') for output "hello".
Applying this to your case and using format to make it easy:
import json
import os
import requests
PAYLOAD_CONF = {
"cluster": {
"ldap": "string",
"processes": 22
}
}
paystr = (str(PAYLOAD_CONF))
paydic = (json.dumps(PAYLOAD_CONF))
os.system('echo \\"{}\\"'.format(paystr))
os.system('echo \\"{}\\"'.format(paydic))
Output:
"{cluster: {ldap: string, processes: 22}}"
"{cluster: {ldap: string, processes: 22}}"
Your paystr variable is also unnecessary, since all objects are automatically converted to strings by print and format via their inherited or overridden __str__ methods.
EDIT:
To output the variable as it appears in Python you just need to iterate through the payload dict and render each key-value pair in a type-sensitive way.
import os
def make_payload_str(payload, nested=1):
    payload_str = "{\n" + "\t" * nested
    for i, k in enumerate(payload.keys()):
        v = payload[k]
        if type(k) is str:
            payload_str += '\\"{}\\"'.format(k)
        else:
            payload_str += str(k)
        payload_str += ": "
        if type(v) is str:
            payload_str += '\\"{}\\"'.format(v)
        elif type(v) is dict:
            payload_str += make_payload_str(v, nested=nested + 1)
        else:
            payload_str += str(v)
        # Only add a comma (and the next line's indent) if not the last element
        if i < len(payload) - 1:
            payload_str += ",\n" + "\t" * nested
    # Close the brace one level further out than the entries
    return payload_str + "\n" + "\t" * (nested - 1) + "}"
PAYLOAD_CONF = {
    "cluster": {
        "ldap": "string",
        "processes": 22
    }
}
paystr = make_payload_str(PAYLOAD_CONF)
os.system('echo "{}"'.format(paystr))
Output:
{
	"cluster": {
		"ldap": "string",
		"processes": 22
	}
}
If the payload contains a dictionary, as it does in the example you provided, the function calls itself to produce the string for that dictionary, indented the right number of tabs using the nested parameter.
If the payload is also allowed to have lists and other more complex types, you'll have to include cases that account for those, but it's just more of the same.
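For lists and deeper nesting, one alternative to extending the hand-written renderer (a different technique from the function above, swapped in here) is to let json.dumps produce the indented text and let shlex.quote protect it from the shell. The "nodes" list is a hypothetical addition to the payload for illustration.

```python
import json
import shlex

PAYLOAD_CONF = {
    "cluster": {
        "ldap": "string",
        "processes": 22,
        "nodes": ["a", "b"],  # hypothetical list value, not in the original payload
    }
}

# json.dumps handles dicts, lists, numbers, booleans and None, with indentation.
pretty = json.dumps(PAYLOAD_CONF, indent=4)

# shlex.quote wraps the text so the shell passes it to echo as one literal argument.
command = "echo " + shlex.quote(pretty)
# os.system(command) would then print `pretty` with its double quotes intact.
```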
I'm working on a project using nlohmann's json C++ implementation.
How can one easily explore nlohmann's JSON keys/vals in GDB ?
I tried to use this STL gdb wrapping since it provides helpers to explore standard C++ library structures that nlohmann's JSON lib is using.
But I don't find it convenient.
Here is a simple use case:
json foo;
foo["flex"] = 0.2;
foo["awesome_str"] = "bleh";
foo["nested"] = {{"bar", "barz"}};
What I would like to have in GDB:
(gdb) p foo
{
"flex" : 0.2,
"awesome_str": "bleh",
"nested": etc.
}
Current behavior
(gdb) p foo
$1 = {
m_type = nlohmann::detail::value_t::object,
m_value = {
object = 0x129ccdd0,
array = 0x129ccdd0,
string = 0x129ccdd0,
boolean = 208,
number_integer = 312266192,
number_unsigned = 312266192,
number_float = 1.5427999782486669e-315
}
}
(gdb) p foo.at("flex")
Cannot evaluate function -- may be inlined // I suppose it depends on my compilation process. But I guess it does not invalidate the question.
(gdb) p *foo.m_value.object
$2 = {
_M_t = {
_M_impl = {
<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer> > > >> = {
<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer> > > >> = {<No data fields>}, <No data fields>},
<std::_Rb_tree_key_compare<std::less<void> >> = {
_M_key_compare = {<No data fields>}
},
<std::_Rb_tree_header> = {
_M_header = {
_M_color = std::_S_red,
_M_parent = 0x4d72d0,
_M_left = 0x4d7210,
_M_right = 0x4d7270
},
_M_node_count = 5
}, <No data fields>}
}
}
I found my own answer reading further the GDB capabilities and stack overflow questions concerning print of std::string.
The short path is the easiest.
The other path was hard, but I'm glad I managed to do this. There is lots of room for improvements.
There is an open issue for this particular matter here: https://github.com/nlohmann/json/issues/1952
Short path v3.1.2
I simply defined a gdb command as follows:
# this is a gdb script
# can be loaded from gdb using
# source my_script.txt (or .gdb or whatever you like)
define pjson
  # use nlohmann's builtin dump method, indent 4 and use space separator
  printf "%s\n", $arg0.dump(4, ' ', true).c_str()
end
# configure command helper (text displayed when typing 'help pjson' in gdb)
document pjson
Prints an nlohmann JSON C++ variable as a human-readable JSON string
end
Using it in gdb:
(gdb) source my_custom_script.gdb
(gdb) pjson foo
{
"flex" : 0.2,
"awesome_str": "bleh",
"nested": {
"bar": "barz"
}
}
Short path v3.7.0 [EDIT] 2019-nov-06
One may also use the new to_string() method, but I could not get it to work within GDB with a live inferior process. The method below still works.
# this is a gdb script
# can be loaded from gdb using
# source my_script.txt (or .gdb or whatever you like)
define pjson
  # use nlohmann's builtin dump method, indent 4 and use space separator
  printf "%s\n", $arg0.dump(4, ' ', true, json::error_handler_t::strict).c_str()
end
# configure command helper (text displayed when typing 'help pjson' in gdb)
document pjson
Prints an nlohmann JSON C++ variable as a human-readable JSON string
end
April 18th 2020: WORKING FULL PYTHON GDB (with live inferior process and debug symbols)
Edit 2020-april-26: the offsets in the code here come out of the blue and are NOT compatible with all platforms/JSON lib compilations. The github project is much more mature regarding this matter (3 platforms tested so far). The code is left here as is since I won't maintain 2 codebases.
versions:
https://github.com/nlohmann/json version 3.7.3
GNU gdb (GDB) 8.3 for GNAT Community 2019 [rev=gdb-8.3-ref-194-g3fc1095]
c++ project built with GPRBUILD/ GNAT Community 2019 (20190517) (x86_64-pc-mingw32)
The following python code shall be loaded within gdb. I use a .gdbinit file sourced in gdb.
Github repo: https://github.com/LoneWanderer-GH/nlohmann-json-gdb
GDB script
Feel free to adopt the loading method of your choice (auto, or not, or IDE plugin, whatever)
set print pretty
# source stl_parser.gdb # if you like the good work done with those STL containers GDB parsers
source printer.py # the python file is given below
python gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
Python script
import gdb
import platform
import sys
import traceback
# adapted from https://github.com/hugsy/gef/blob/dev/gef.py
# their rights are theirs
HORIZONTAL_LINE = "_" # u"\u2500"
LEFT_ARROW = "<-" # "\u2190 "
RIGHT_ARROW = "->" # " \u2192 "
DOWN_ARROW = "|" # "\u21b3"
nlohmann_json_type_namespace = \
r"nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, " \
r"std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer>"
# STD black magic
MAGIC_STD_VECTOR_OFFSET = 16 # win 10 x64 values, beware on your platform
MAGIC_OFFSET_STD_MAP = 32 # win 10 x64 values, beware on your platform
""""""
# GDB black magic
""""""
nlohmann_json_type = gdb.lookup_type(nlohmann_json_type_namespace).pointer()
# for in memory direct jumps. cast to type is still necessary yet to obtain values, but this could be changed by chaning the types to simpler ones ?
std_rb_tree_node_type = gdb.lookup_type("std::_Rb_tree_node_base::_Base_ptr").pointer()
std_rb_tree_size_type = gdb.lookup_type("std::size_t").pointer()
""""""
# nlohmann_json reminder. any interface change should be reflected here
# enum class value_t : std::uint8_t
# {
# null, ///< null value
# object, ///< object (unordered set of name/value pairs)
# array, ///< array (ordered collection of values)
# string, ///< string value
# boolean, ///< boolean value
# number_integer, ///< number value (signed integer)
# number_unsigned, ///< number value (unsigned integer)
# number_float, ///< number value (floating-point)
# discarded ///< discarded by the the parser callback function
# };
""""""
enum_literals_namespace = ["nlohmann::detail::value_t::null",
"nlohmann::detail::value_t::object",
"nlohmann::detail::value_t::array",
"nlohmann::detail::value_t::string",
"nlohmann::detail::value_t::boolean",
"nlohmann::detail::value_t::number_integer",
"nlohmann::detail::value_t::number_unsigned",
"nlohmann::detail::value_t::number_float",
"nlohmann::detail::value_t::discarded"]
enum_literal_namespace_to_literal = dict([(e, e.split("::")[-1]) for e in enum_literals_namespace])
INDENT = 4 # beautiful isn't it ?
def std_stl_item_to_int_address(node):
return int(str(node), 0)
def parse_std_str_from_hexa_address(hexa_str):
# https://stackoverflow.com/questions/6776961/how-to-inspect-stdstring-in-gdb-with-no-source-code
return '"{}"'.format(gdb.parse_and_eval("*(char**){}".format(hexa_str)).string())
class LohmannJSONPrinter(object):
"""Print a nlohmann::json in GDB python
BEWARE :
- Contains shitty string formatting (defining lists and playing with ",".join(...) could be better; ident management is stoneage style)
- Parsing barely tested only with a live inferior process.
- It could possibly work with a core dump + debug symbols. TODO: read that stuff
https://doc.ecoscentric.com/gnutools/doc/gdb/Core-File-Generation.html
- Not idea what happens with no symbols available, lots of fields are retrieved by name and should be changed to offsets if possible
- NO LIB VERSION MANAGEMENT. TODO: determine if there are serious variants in nlohmann data structures that would justify working with strucutres
- PLATFORM DEPENDANT TODO: remove the black magic offsets or handle them in a nicer way
NB: If you are python-kaizer-style-guru, please consider helping or teaching how to improve all that mess
"""
def __init__(self, val, indent_level=0):
self.val = val
self.field_type_full_namespace = None
self.field_type_short = None
self.indent_level = indent_level
self.function_map = {"nlohmann::detail::value_t::null": self.parse_as_leaf,
"nlohmann::detail::value_t::object": self.parse_as_object,
"nlohmann::detail::value_t::array": self.parse_as_array,
"nlohmann::detail::value_t::string": self.parse_as_str,
"nlohmann::detail::value_t::boolean": self.parse_as_leaf,
"nlohmann::detail::value_t::number_integer": self.parse_as_leaf,
"nlohmann::detail::value_t::number_unsigned": self.parse_as_leaf,
"nlohmann::detail::value_t::number_float": self.parse_as_leaf,
"nlohmann::detail::value_t::discarded": self.parse_as_leaf}
def parse_as_object(self):
assert (self.field_type_short == "object")
o = self.val["m_value"][self.field_type_short]
# traversing tree is a an adapted copy pasta from STL gdb parser
# (http://www.yolinux.com/TUTORIALS/src/dbinit_stl_views-1.03.txt and similar links)
# Simple GDB Macros writen by Dan Marinescu (H-PhD) - License GPL
# Inspired by intial work of Tom Malnar,
# Tony Novac (PhD) / Cornell / Stanford,
# Gilad Mishne (PhD) and Many Many Others.
# Contact: dan_c_marinescu#yahoo.com (Subject: STL)
#
# Modified to work with g++ 4.3 by Anders Elton
# Also added _member functions, that instead of printing the entire class in map, prints a member.
node = o["_M_t"]["_M_impl"]["_M_header"]["_M_left"]
# end = o["_M_t"]["_M_impl"]["_M_header"]
tree_size = o["_M_t"]["_M_impl"]["_M_node_count"]
# in memory alternatives:
_M_t = std_stl_item_to_int_address(o.referenced_value().address)
_M_t_M_impl_M_header_M_left = _M_t + 8 + 16 # adding bits
_M_t_M_impl_M_node_count = _M_t + 8 + 16 + 16 # adding bits
node = gdb.Value(long(_M_t_M_impl_M_header_M_left)).cast(std_rb_tree_node_type).referenced_value()
tree_size = gdb.Value(long(_M_t_M_impl_M_node_count)).cast(std_rb_tree_size_type).referenced_value()
i = 0
if tree_size == 0:
return "{}"
else:
s = "{\n"
self.indent_level += 1
while i < tree_size:
# STL GDB scripts write "+1" which in my w10 x64 GDB makes a +32 bits move ...
# may be platform dependant and should be taken with caution
key_address = std_stl_item_to_int_address(node) + MAGIC_OFFSET_STD_MAP
# print(key_object['_M_dataplus']['_M_p'])
k_str = parse_std_str_from_hexa_address(hex(key_address))
# offset = MAGIC_OFFSET_STD_MAP
value_address = key_address + MAGIC_OFFSET_STD_MAP
value_object = gdb.Value(long(value_address)).cast(nlohmann_json_type)
v_str = LohmannJSONPrinter(value_object, self.indent_level + 1).to_string()
k_v_str = "{} : {}".format(k_str, v_str)
end_of_line = "\n" if tree_size <= 1 or i == tree_size else ",\n"
s = s + (" " * (self.indent_level * INDENT)) + k_v_str + end_of_line # ",\n"
if std_stl_item_to_int_address(node["_M_right"]) != 0:
node = node["_M_right"]
while std_stl_item_to_int_address(node["_M_left"]) != 0:
node = node["_M_left"]
else:
tmp_node = node["_M_parent"]
while std_stl_item_to_int_address(node) == std_stl_item_to_int_address(tmp_node["_M_right"]):
node = tmp_node
tmp_node = tmp_node["_M_parent"]
if std_stl_item_to_int_address(node["_M_right"]) != std_stl_item_to_int_address(tmp_node):
node = tmp_node
i += 1
self.indent_level -= 2
s = s + (" " * (self.indent_level * INDENT)) + "}"
return s
def parse_as_str(self):
return parse_std_str_from_hexa_address(str(self.val["m_value"][self.field_type_short]))
def parse_as_leaf(self):
s = "WTFBBQ !"
if self.field_type_short == "null" or self.field_type_short == "discarded":
s = self.field_type_short
elif self.field_type_short == "string":
s = self.parse_as_str()
else:
s = str(self.val["m_value"][self.field_type_short])
return s
def parse_as_array(self):
assert (self.field_type_short == "array")
o = self.val["m_value"][self.field_type_short]
start = o["_M_impl"]["_M_start"]
size = o["_M_impl"]["_M_finish"] - start
# capacity = o["_M_impl"]["_M_end_of_storage"] - start
# size_max = size - 1
i = 0
start_address = std_stl_item_to_int_address(start)
if size == 0:
s = "[]"
else:
self.indent_level += 1
s = "[\n"
while i < size:
# STL GDB scripts write "+1" which in my w10 x64 GDB makes a +16 bits move ...
offset = i * MAGIC_STD_VECTOR_OFFSET
i_address = start_address + offset
value_object = gdb.Value(long(i_address)).cast(nlohmann_json_type)
v_str = LohmannJSONPrinter(value_object, self.indent_level + 1).to_string()
end_of_line = "\n" if size <= 1 or i == size else ",\n"
s = s + (" " * (self.indent_level * INDENT)) + v_str + end_of_line
i += 1
self.indent_level -= 2
s = s + (" " * (self.indent_level * INDENT)) + "]"
return s
def is_leaf(self):
return self.field_type_short != "object" and self.field_type_short != "array"
def parse_as_aggregate(self):
if self.field_type_short == "object":
s = self.parse_as_object()
elif self.field_type_short == "array":
s = self.parse_as_array()
else:
s = "WTFBBQ !"
return s
def parse(self):
# s = "WTFBBQ !"
if self.is_leaf():
s = self.parse_as_leaf()
else:
s = self.parse_as_aggregate()
return s
def to_string(self):
try:
self.field_type_full_namespace = self.val["m_type"]
str_val = str(self.field_type_full_namespace)
if not str_val in enum_literal_namespace_to_literal:
return "TIMMY !"
self.field_type_short = enum_literal_namespace_to_literal[str_val]
return self.function_map[str_val]()
# return self.parse()
except:
show_last_exception()
return "NOT A JSON OBJECT // CORRUPTED ?"
def display_hint(self):
return self.val.type
# adapted from https://github.com/hugsy/gef/blob/dev/gef.py
# inspired by https://stackoverflow.com/questions/44733195/gdb-python-api-getting-the-python-api-of-gdb-to-print-the-offending-line-numbe
def show_last_exception():
"""Display the last Python exception."""
print("")
exc_type, exc_value, exc_traceback = sys.exc_info()
print(" Exception raised ".center(80, HORIZONTAL_LINE))
print("{}: {}".format(exc_type.__name__, exc_value))
print(" Detailed stacktrace ".center(80, HORIZONTAL_LINE))
for (filename, lineno, method, code) in traceback.extract_tb(exc_traceback)[::-1]:
print("""{} File "{}", line {:d}, in {}()""".format(DOWN_ARROW, filename, lineno, method))
print(" {} {}".format(RIGHT_ARROW, code))
print(" Last 10 GDB commands ".center(80, HORIZONTAL_LINE))
gdb.execute("show commands")
print(" Runtime environment ".center(80, HORIZONTAL_LINE))
print("* GDB: {}".format(gdb.VERSION))
print("* Python: {:d}.{:d}.{:d} - {:s}".format(sys.version_info.major, sys.version_info.minor,
sys.version_info.micro, sys.version_info.releaselevel))
print("* OS: {:s} - {:s} ({:s}) on {:s}".format(platform.system(), platform.release(),
platform.architecture()[0],
" ".join(platform.dist())))
print(HORIZONTAL_LINE * 80)
print("")
exit(-6000)
def build_pretty_printer():
pp = gdb.printing.RegexpCollectionPrettyPrinter("nlohmann_json")
pp.add_printer(nlohmann_json_type_namespace, "^{}$".format(nlohmann_json_type_namespace), LohmannJSONPrinter)
return pp
######
# executed at autoload (or to be executed by in GDB)
# gdb.printing.register_pretty_printer(gdb.current_objfile(),build_pretty_printer())
BEWARE:
- Contains shitty string formatting (defining lists and playing with ",".join(...) could be better; indent management is stone-age style)
- Parsing barely tested, and only with a live inferior process.
- It could possibly work with a core dump + debug symbols. TODO: read that stuff
https://doc.ecoscentric.com/gnutools/doc/gdb/Core-File-Generation.html
- No idea what happens with no symbols available; lots of fields are retrieved by name and should be changed to offsets if possible
- NO LIB VERSION MANAGEMENT. TODO: determine if there are serious variants in nlohmann data structures that would justify working with structures
- PLATFORM DEPENDENT. TODO: remove the black magic offsets or handle them in a nicer way
NB: If you are python-kaizer-style-guru, please consider helping or teaching how to improve all that mess
some (light tests):
gpr file:
project Debug_Printer is
for Source_Dirs use ("src", "include");
for Object_Dir use "obj";
for Main use ("main.cpp");
for Languages use ("C++");
package Naming is
for Spec_Suffix ("c++") use ".hpp";
end Naming;
package Compiler is
for Switches ("c++") use ("-O3", "-Wall", "-Woverloaded-virtual", "-g");
end Compiler;
package Linker is
for Switches ("c++") use ("-g");
end Linker;
end Debug_Printer;
main.cpp
#include "json.hpp" // I am using the standalone json.hpp from the repo release
#include <iostream>
using json = nlohmann::json;
int main() {
json fooz;
fooz = 0.7;
json arr = {3, "25", 0.5};
json one;
one["first"] = "second";
json foo;
foo["flex"] = 0.2;
foo["bool"] = true;
foo["int"] = 5;
foo["float"] = 5.22;
foo["trap "] = "you fell";
foo["awesome_str"] = "bleh";
foo["nested"] = {{"bar", "barz"}};
foo["array"] = { 1, 0, 2 };
std::cout << "fooz" << std::endl;
std::cout << fooz.dump(4) << std::endl << std::endl;
std::cout << "arr" << std::endl;
std::cout << arr.dump(4) << std::endl << std::endl;
std::cout << "one" << std::endl;
std::cout << one.dump(4) << std::endl << std::endl;
std::cout << "foo" << std::endl;
std::cout << foo.dump(4) << std::endl << std::endl;
json mixed_nested;
mixed_nested["Jean"] = fooz;
mixed_nested["Baptiste"] = one;
mixed_nested["Emmanuel"] = arr;
mixed_nested["Zorg"] = foo;
std::cout << "5th element" << std::endl;
std::cout << mixed_nested.dump(4) << std::endl << std::endl;
return 0;
}
outputs:
(gdb) source .gdbinit
Breakpoint 1, main () at F:\DEV\Projets\nlohmann.json\src\main.cpp:45
(gdb) p mixed_nested
$1 = {
"Baptiste" : {
"first" : "second"
},
"Emmanuel" : [
3,
"25",
0.5,
],
"Jean" : 0.69999999999999996,
"Zorg" : {
"array" : [
1,
0,
2,
],
"awesome_str" : "bleh",
"bool" : true,
"flex" : 0.20000000000000001,
"float" : 5.2199999999999998,
"int" : 5,
"nested" : {
"bar" : "barz"
},
"trap " : "you fell",
},
}
Edit 2019-march-24: add precision given by Employed Russian.
Edit 2020-april-18 : after a long night of struggling with python/gdb/stl I had something working by the ways of the GDB documentation for python pretty printers. Please forgive any mistakes or misconceptions, I banged my head a whole night on this and everything is flurry-blurry now.
Edit 2020-april-18 (2): rb tree node and tree_size could be traversed in a more "in-memory" way (see above)
Edit 2020-april-26: add warning concerning the GDB python pretty printer.
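For readers not familiar with the node-walking loop in parse_as_object: it is the usual in-order successor step on a binary search tree with parent pointers, started from the header's leftmost node and repeated node_count times. Below is a standalone sketch in plain Python (no gdb needed). The Node class and build_tree helper are hypothetical stand-ins for std::_Rb_tree_node_base and a real std::map; balancing and colours are left out because the traversal does not depend on them.

```python
class Node:
    """Hypothetical stand-in for std::_Rb_tree_node_base (colour omitted)."""

    def __init__(self, key=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def successor(node):
    """In-order successor, same steps as the loop body in parse_as_object."""
    if node.right is not None:
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    parent = node.parent
    while node is parent.right:
        node = parent
        parent = parent.parent
    if node.right is not parent:
        node = parent
    return node


def build_tree(keys):
    """Plain (unbalanced) BST insert plus a libstdc++-style header node."""
    header = Node()
    for key in keys:
        new = Node(key)
        if header.parent is None:  # first key becomes the root
            header.parent, new.parent = new, header
            continue
        cur = header.parent
        while True:
            side = "left" if key < cur.key else "right"
            nxt = getattr(cur, side)
            if nxt is None:
                setattr(cur, side, new)
                new.parent = cur
                break
            cur = nxt
    if header.parent is not None:
        # header.left / header.right point at the smallest / largest node
        lo = hi = header.parent
        while lo.left is not None:
            lo = lo.left
        while hi.right is not None:
            hi = hi.right
        header.left, header.right = lo, hi
    return header, len(keys)


def in_order_keys(header, count):
    """Start at the leftmost node and take `count` successor steps."""
    node = header.left
    keys = []
    for _ in range(count):
        keys.append(node.key)
        node = successor(node)
    return keys


header, count = build_tree(["flex", "awesome_str", "nested", "bool", "int"])
print(in_order_keys(header, count))
```

The keys come out in sorted order, which is why the pretty printer's output is alphabetical and not in insertion order.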
My solution was to edit the ~/.gdbinit file.
define jsontostring
printf "%s\n", $arg0.dump(2, ' ', true, nlohmann::detail::error_handler_t::strict).c_str()
end
This makes the "jsontostring" command available on every gdb session without the need of sourcing any files.
(gdb) jsontostring object
I have an application that tries to read a specific key file and this can happen multiple times during the program's lifespan. Here is the function for reading the file:
__status
_read_key_file(const char * file, char ** buffer)
{
    FILE * pFile = NULL;
    long fsize = 0;
    pFile = fopen(file, "rb");
    if (pFile == NULL) {
        _set_error("Could not open file: ", 1);
        return _ERROR;
    }
    // Get the filesize
    while(fgetc(pFile) != EOF) {
        ++fsize;
    }
    *buffer = (char *) malloc(sizeof(char) * (fsize + 1));
    // Read the file and write it to the buffer
    rewind(pFile);
    size_t result = fread(*buffer, sizeof(char), fsize, pFile);
    if (result != fsize) {
        _set_error("Reading error", 0);
        fclose(pFile);
        return _ERROR;
    }
    fclose(pFile);
    pFile = NULL;
    return _OK;
}
Now the problem is that for a single open/read/close it works just fine, except when I run the function the second time - it will always segfault at this line: while(fgetc(pFile) != EOF)
Tracing with gdb, it shows that the segfault occurs deeper within the fgetc function itself.
I am a bit lost, but obviously am doing something wrong, since if I try to tell the size with fseek/ftell, I always get a 0.
Some context:
Language: C
System: Linux (Ubuntu 16 64bit)
Please ignore functions and names with underscores, as they are defined somewhere else in the code.
Program is designed to run as a dynamic library to load in Python via ctypes
EDIT
Right, it seems there's more than meets the eye. Jean-François Fabre sparked an idea that I tested, and it worked; however, I am still confused as to why.
Some additional context:
Suppose there's a function in C that looks something like this:
_status
init(_conn_params cp) {
    _status status = _NONE;
    if (!cp.pkey_data) {
        _set_error("No data, open the file", 0);
        if(!cp.pkey_file) {
            _set_error("No public key set", 0);
            return _ERROR;
        }
        status = _read_key_file(cp.pkey_file, &cp.pkey_data);
        if (status != _OK) return status;
    }
    /* SOME ADDITIONAL WORK AND CHECKING DONE HERE */
    return status;
}
Now in Python (using 3.5 for testing), we generate those conn_params and then call the init function:
from ctypes import *
libCtest = CDLL('./lib/lib.so')
class _conn_params(Structure):
    _fields_ = [
        # Some params
        ('pkey_file', c_char_p),
        ('pkey_data', c_char_p),
        # Some additional params
    ]
#################### PART START #################
cp = _conn_params()
cp.pkey_file = "public_key.pem".encode('utf-8')
status = libCtest.init(cp)
status = libCtest.init(cp) # Will cause a segfault
##################### PART END ###################
# However if we do
#################### PART START #################
cp = _conn_params()
cp.pkey_file = "public_key.pem".encode('utf-8')
status = libCtest.init(cp)
# And then
cp = _conn_params()
cp.pkey_file = "public_key.pem".encode('utf-8')
status = libCtest.init(cp)
##################### PART END ###################
The second PART START / PART END will not cause the segfault in this context.
Would anyone know the reason why?