Python print environment variable memory address

Is it possible to print the memory address of an environment variable?
With gdb-peda I found a memory address that looks like 0xbffffcd6 using searchmem, and I know it's the right form (0xbfff????), but gdb shifts the stack by adding environment variables of its own.
I would like my Python script to get this address, and then do my trick and include my shellcode.
I tried (with Python):
print hex(id(os.environ["ENVVAR"]))
print memoryview(os.environ["ENVVAR"])
# output :
# 0xb7b205c0L
# <memory at 0xb7b4dd9c>
With Ruby:
puts (ENV['PATH'].object_id << 1).to_s(16)
# output :
# -4836c38c
If anyone has an idea, with Python or Ruby, I'd appreciate it.

The CPython built-in function id() returns a unique id for any object, which is not exactly its memory address but is as close as you can get to one.
For example, if we have a variable x, id(x) does not return the memory address of the variable x; rather, it returns the memory address of the object that x points to.
There's a strict separation between 'variables' and 'memory objects'. In the standard implementation, Python allocates a set of locals and a stack for the virtual machine to operate on. All local slots are disjoint, so if you load an object from local slot x onto the stack and modify that object, the "location" of the x slot doesn't change.
http://docs.python.org/library/functions.html#id
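To see the distinction concretely, here is a minimal sketch (the addresses will differ on every run):

import os

path = os.environ["PATH"]
alias = path                  # a second name bound to the same object
copy = path[:1] + path[1:]    # a new str object with equal content

print(hex(id(path)))   # address of the str object that 'path' points to
print(hex(id(alias)))  # same value: same object
print(hex(id(copy)))   # different value: a different object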

I suppose you could do that using the ctypes module to call the native getenv directly:

import ctypes

libc = ctypes.CDLL("libc.so.6")
getenv = libc.getenv
getenv.restype = ctypes.c_void_p  # treat the result as a pointer, not a C int

print('%08x' % getenv('PATH'))
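Note that on Python 3 the argument must be bytes, since the C getenv expects a char *. A sketch of the same call for Python 3:

import ctypes

libc = ctypes.CDLL("libc.so.6")
getenv = libc.getenv
getenv.restype = ctypes.c_void_p
getenv.argtypes = [ctypes.c_char_p]  # be explicit about the C signature

print('%08x' % getenv(b'PATH'))     # bytes, not str, on Python 3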

This seems an impossible task, at least in Python.
There are a few things to take into consideration from this question:
ASLR would make this completely impossible.
Every binary can have its own overhead and a different argv, so the only reliable option is to execute the binary and trace its memory until we find the environment variable we are looking for. Basically, even if we can find the environment address in the Python process, it would be at a different position in the binary you are trying to exploit.
The best fit for this question is http://python3-pwntools.readthedocs.io/en/latest/elf.html, which takes a core dump file, in which it's easy to find the address.
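For illustration, a sketch of that core-dump approach with pwntools (the ./core path is an assumption; Coredump and search are the pwntools core-file loader and ELF search API):

from pwn import *

core = Coredump('./core')           # load a core dump of the target process
addr = next(core.search(b'PATH='))  # scan the dumped memory for the env string
print(hex(addr))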

Please keep in mind that a system environment variable is not an object you can access by its memory address. Each process, like the Python or Ruby process running your script, receives its own copy of the environment. That's why the results returned by the Python and Ruby interpreters are so different.
If you would like to modify a system environment variable, you should use the API provided by your programming language.
Please see this or that post for a Python solution.
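For example, in Python the supported way to read and modify the environment of the current process is os.environ (a minimal sketch; the variable name is just an example):

import os

os.environ['MYVAR'] = '/tmp'    # visible to this process and its children
print(os.environ.get('MYVAR'))
del os.environ['MYVAR']         # unsets it again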

Thanks to @mickael9, I have written a function to calculate the address of an environment variable in a target program:

import ctypes

def getEnvAddr(envName, ELFfile):
    libc = ctypes.CDLL('libc.so.6')
    getenv = libc.getenv
    getenv.restype = ctypes.c_void_p
    ptr = getenv(envName)
    # The target's argv[0] differs in length from this interpreter's
    # ('/usr/bin/python'), which shifts the environment strings on the
    # stack; the program name is stored twice, hence the factor of 2.
    ptr += (len('/usr/bin/python') - len(ELFfile)) * 2
    return ptr
For example:

user@host:~$ ./getenvaddr.elf PATH /bin/ls
PATH will be at 0xbfffff22 in /bin/ls
user@host:~$ python getenvaddr.py PATH /bin/ls
PATH will be at 0xbfffff22 in /bin/ls
user@host:~$
Note: This function only works on Linux systems.

The getenv() function is inherently not reentrant because it returns a value pointing to static data.
In fact, for higher performance of getenv(), the implementation could also maintain a separate copy of the environment in a data structure that could be searched much more quickly (such as an indexed hash table, or a binary tree), and update both it and the linear list at environ when setenv() or unsetenv() is invoked.
So the address returned by getenv is not necessarily from the environment.
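You can observe this from Python with ctypes (a sketch, assuming glibc): repeated calls for the same variable typically return the same pointer into the environment, rather than a fresh copy:

import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.getenv.restype = ctypes.c_void_p
libc.getenv.argtypes = [ctypes.c_char_p]

p1 = libc.getenv(b"PATH")
p2 = libc.getenv(b"PATH")
print(hex(p1), hex(p2))  # usually identical pointers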
Process memory layout (diagrams from duartes.org, "Anatomy of a Program in Memory").
Memory map

import os

def mem_map():
    path_hex = hex(id(os.getenv('PATH'))).rstrip('L')
    path_address = int(path_hex, 16)
    for line in open('/proc/self/maps'):
        if 'stack' in line:
            line = line.split()
            first, second = line[0].split('-')
            first, second = int(first, 16), int(second, 16)
            # the stack grows towards lower memory addresses
            start, end = max(first, second), min(first, second)
            print('stack:\n\tstart:\t0x{}\n\tend:\t0x{}\n\tsize:\t{}'.format(start, end, start - end))
            if path_address in range(end, start + 1):
                print('\tgetenv("PATH") ({}) is in the stack'.format(path_hex))
            else:
                print('\tgetenv("PATH") ({}) is not in the stack'.format(path_hex))
            if path_address > start:
                print('\tgetenv("PATH") ({}) is above the stack'.format(path_hex))
            else:
                print('\tgetenv("PATH") ({}) is not above the stack'.format(path_hex))
            print('')
            continue
        if 'heap' in line:
            line = line.split()
            first, second = line[0].split('-')
            first, second = int(first, 16), int(second, 16)
            # the heap grows towards higher memory addresses
            start, end = min(first, second), max(first, second)
            print('heap:\n\tstart:\t0x{}\n\tend:\t0x{}\n\tsize:\t{}'.format(start, end, end - start))
            if path_address in range(start, end + 1):
                print('\tgetenv("PATH") ({}) is in the heap'.format(path_hex))
            else:
                print('\tgetenv("PATH") ({}) is not in the heap'.format(path_hex))
            print('')

Output:
heap:
start: 0x170364928
end: 0x170930176
size: 565248
getenv("PATH") (0xb74d2330) is not in the heap
stack:
start: 0x0xbffa8000L
end: 0x0xbff86000L
size: 139264
getenv("PATH") (0xb74d2330) is not in the stack
getenv("PATH") (0xb74d2330) is not above the stack
The environment is above the stack, so its address should be higher than the stack's. But the address id() shows is not in the stack, not in the heap, and not above the stack. Is it really an address, or is my calculation wrong?
Here's the code to check where an object lies in memory:

def where_in_mem(obj):
    maps = {}
    for line in open('/proc/self/maps'):
        line = line.split()
        start, end = line[0].split('-')
        # anonymous mappings have no pathname, so the last field is the inode '0'
        key = line[-1] if line[-1] != '0' else 'anonymous'
        maps.setdefault(key, []).append((int(start, 16), int(end, 16)))
    for key, pair in maps.items():
        for start, end in pair:
            # the stack starts at a higher memory address and grows towards lower ones
            if 'stack' in key:
                if start >= id(obj) >= end:
                    print('Object "{}" ({}) in the range {} - {}, mapped to {}'.format(obj, hex(id(obj)), hex(start), hex(end), key))
                continue
            if start <= id(obj) <= end:
                print('Object "{}" ({}) in the range {} - {}, mapped to {}'.format(obj, hex(id(obj)), hex(start), hex(end), key))

where_in_mem(1)
where_in_mem(os.getenv('PATH'))
where_in_mem(1)
where_in_mem(os.getenv('PATH'))
Output:
Object "1" (0xa17f8b0) in the range 0xa173000 - 0xa1fd000, mapped to [heap]
Object "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games" (0xb74a1330L) in the range 0xb7414000L - 0xb74d6000L, mapped to anonymous
What's "anonymous" in the above output?
It is also possible to create an anonymous memory mapping that does not correspond to any files, being used instead for program data. In Linux, if you request a large block of memory via malloc(), the C library will create such an anonymous mapping instead of using heap memory. 'Large' means larger than MMAP_THRESHOLD bytes, 128 kB by default and adjustable via mallopt().
Anatomy of a Program in Memory
So os.environ['PATH'] is in a malloc'ed, anonymously mapped region.
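Reusing where_in_mem from above, you can watch a large allocation land in such an anonymous mapping (a sketch; the exact behaviour depends on the allocator, and small objects may come from CPython's own object pools rather than the heap proper):

small = 'x' * 64           # small str: typically served from CPython's allocator
large = 'x' * (1 << 20)    # a 1 MiB str: typically backed by an anonymous mmap

where_in_mem(small)
where_in_mem(large)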

In Ruby it's possible - this post covers the general case:
Accessing an object's memory address in Ruby: "You can get the actual pointer value of an object by taking the object id, and doing a bitwise shift to the left"
puts (ENV['RAILS_ENV'].object_id << 1).to_s(16)
> 7f84598a8d58

Related

LLDB Python ReadMemory returns NoneType

I have a script in Python where I check the value of the x0 register, which is a pointer to some memory address, and read memory from there. But in my Python script, ReadMemory returns None, and because of that the bytearray call throws an error. In the LLDB console, memory read <x0 value> works. The code is below:
import random
import lldb

debugger = lldb.SBDebugger.Create()
target = debugger.GetTargetAtIndex(0)

def modify_memory(debugger, command, result, internal_dict):
    thread = debugger.GetThread()
    # if called from the console itself, the debugger argument type is true, but if it comes from a breakpoint
    # debugger type == SBFrame. SBFrame has a GetThread() method, so "debugger" above is actually a frame
    db = lldb.SBDebugger.Create()
    if thread:
        #db.HandleCommand("print \"a\"")
        frame = thread.GetSelectedFrame()
        # Read the value of register x0
        x0 = frame.FindRegister("x0")
        x0_value = x0.GetValue()
        x0_value = int(x0_value, 16)
        print(x0_value)
        # Read memory at the address stored in x0
        memory = process.ReadMemory(x0_value + 16 + 4, 256, lldb.SBError())
        print(memory)
        if memory != None:
            print("finally something!")
            # Modify a random byte in the memory
            random_byte = random.randint(2, 255)
            memory = bytearray(memory)
            memory[random_byte] = random.randint(0, 255)
            # Write the modified memory back to the original location
            process.WriteMemory(x0_value, memory, lldb.SBError())
        process.Continue()
    else:
        db.HandleCommand("print \"thread NOT found\"")
        process.Continue()
I add modify_memory as a command on breakpoints.
I tried to run the command from the interactive console but did not manage to re-create the thread, process, etc. variables. Also, when I add the function modify_memory as a command, the "debugger" variable comes in as an SBDebugger (which is actually true :) ), but if I add it to a breakpoint, the "debugger" variable becomes an SBFrame, which has the GetThread method.
You want to use the newer command definition form:
def command_function(debugger, command, exe_ctx, result, internal_dict):
If you define your function that way, lldb will pass an SBExecutionContext in the exe_ctx parameter that contains the frame/thread/process/target you should act on in the command.
The point is that at any given stop, lldb first queries all the threads and, for each one that stopped for a reason, runs the relevant callbacks; it then decides whether to stop or not, and then computes the selected thread. So at the time breakpoint callbacks are run, the selected thread is still the one from the last time the debuggee stopped.
The original form was really an oversight in the design. We kept the old form around for compatibility reasons, but at this point it's unlikely there are any lldbs around that don't include the more useful form. So that's really the one you want to use.
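A minimal sketch of the newer form, reusing the x0 logic from the question (the byte offset is the question's own):

import lldb

def modify_memory(debugger, command, exe_ctx, result, internal_dict):
    # exe_ctx carries the target/process/thread/frame this command should act on
    process = exe_ctx.GetProcess()
    frame = exe_ctx.GetFrame()
    x0_value = int(frame.FindRegister("x0").GetValue(), 16)
    error = lldb.SBError()
    memory = process.ReadMemory(x0_value + 16 + 4, 256, error)
    if error.Success() and memory is not None:
        print("finally something!")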

How to free up memory in Lambda due to numpy.core._exceptions._ArrayMemoryError using pandas compare()

When I run this code in a Lambda function whose memory allocation setting is set to the max (10240):
df_compare = first_less_dupes[compare_columns].compare(second_less_dupes[compare_columns])
I'm seeing this error:
Unable to allocate 185. MiB for an array with shape (2697080, 9) and data type float64 - Error type:<class 'numpy.core._exceptions._ArrayMemoryError'>
I've run this code many times with smaller dfs without issue. So I began attacking this from a memory capacity/clean-up approach, my assumption being: I need to free up memory. I use two snippets of code to audit my memory usage:
import os

import psutil

def print_current_memory():
    '''
    Gets the current process and checks current memory usage
    '''
    process = psutil.Process(os.getpid())
    mbs = round(process.memory_info().rss / 1024 / 1024, 2)
    print('Current memory usage:', mbs, 'MB')
And
for obj_name in list(locals().keys()):
    size = str(sys.getsizeof(locals()[obj_name]))
    mbs = str(round(int(size) / 1024 / 1024, 2))
    print(f'{obj_name}: {mbs}MB. {size}B.')
The print_current_memory function does just what its docstring says. The loop prints out a list of all local variables and their sizes. Using the loop I identified several objects that I did not need. (Strangely, the summed size of the listed objects should have greatly exceeded the Lambda limit, even before the error.)
So I delete those objects and garbage-collect (I understand gc may not be necessary):
print_current_memory()
print('Deleting first & limited')
del first
del first_limited
print('Deleting second & limited')
del second
del second_limited
print('Deleting both_df')
del both_df
print('Garbage collecting')
gc.collect()
print_current_memory()
After running this, the current memory usage doesn't decrease, so I am clearly doing something wrong. And that is my main concern: how do I decrease memory usage to make space for this new dataframe? But perhaps I'm asking the wrong question and need to question my assumptions: Can I monitor current memory usage in a Lambda the same way I would on Windows? Am I deleting objects the right way? My use of gc probably illustrates how little I know about it, so am I using it correctly?
del does not properly delete objects; it simply drops the reference tied to the variable name being deleted. You must make sure that every other reference is properly dropped too.
Then you might still need to wait for garbage collection to happen. However, with pandas and numpy the actual data is managed in C/C++, so deallocation should be immediate once the last reference is dropped.
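A sketch of the reference point (the names here are illustrative, not from your code):

import gc

import pandas as pd

df = pd.DataFrame({'a': range(1000000)})
view = df          # a second reference keeps the underlying data alive
del df             # the data is NOT freed yet: 'view' still refers to it
del view           # last reference dropped: the memory can be released now
gc.collect()       # usually redundant, since refcounting already freed it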
Since you are working in an AWS Lambda, the data you transform does not need to be kept around, because you just want your result out. So it is certainly safe for you to use inplace operations to replace the original data with your processed data and free up space. Perhaps such a tutorial could get you started.

Spark: Why does Python significantly outperform Scala in my use case?

To compare the performance of Spark when using Python and Scala, I created the same job in both languages and compared the runtimes. I expected both jobs to take roughly the same amount of time, but the Python job took only 27 min, while the Scala job took 37 min (almost 40% longer!). I implemented the same job in Java as well and it took 37 minutes too. How is it possible that Python is so much faster?
Minimal verifiable example:
Python job:
# Configuration
conf = pyspark.SparkConf()
conf.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
conf.set("spark.executor.instances", "4")
conf.set("spark.executor.cores", "8")
sc = pyspark.SparkContext(conf=conf)
# 960 Files from a public dataset in 2 batches
input_files = "s3a://commoncrawl/crawl-data/CC-MAIN-2019-35/segments/1566027312025.20/warc/CC-MAIN-20190817203056-20190817225056-00[0-5]*"
input_files2 = "s3a://commoncrawl/crawl-data/CC-MAIN-2019-35/segments/1566027312128.3/warc/CC-MAIN-20190817102624-20190817124624-00[0-3]*"
# Count occurrences of a certain string
logData = sc.textFile(input_files)
logData2 = sc.textFile(input_files2)
a = logData.filter(lambda value: value.startswith('WARC-Type: response')).count()
b = logData2.filter(lambda value: value.startswith('WARC-Type: response')).count()
print(a, b)
Scala job:
// Configuration
config.set("spark.executor.instances", "4")
config.set("spark.executor.cores", "8")
val sc = new SparkContext(config)
sc.setLogLevel("WARN")
sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
// 960 Files from a public dataset in 2 batches
val input_files = "s3a://commoncrawl/crawl-data/CC-MAIN-2019-35/segments/1566027312025.20/warc/CC-MAIN-20190817203056-20190817225056-00[0-5]*"
val input_files2 = "s3a://commoncrawl/crawl-data/CC-MAIN-2019-35/segments/1566027312128.3/warc/CC-MAIN-20190817102624-20190817124624-00[0-3]*"
// Count occurrences of a certain string
val logData1 = sc.textFile(input_files)
val logData2 = sc.textFile(input_files2)
val num1 = logData1.filter(line => line.startsWith("WARC-Type: response")).count()
val num2 = logData2.filter(line => line.startsWith("WARC-Type: response")).count()
println(s"Lines with a: $num1, Lines with b: $num2")
Just by looking at the code, the two jobs seem to be identical. I looked at the DAGs and they didn't provide any insights (or at least I lack the know-how to come up with an explanation based on them).
I would really appreciate any pointers.
Your basic assumption, that Scala or Java should be faster for this specific task, is just incorrect. You can easily verify it with minimal local applications. The Scala one:
import scala.io.Source
import java.time.{Duration, Instant}

object App {
  def main(args: Array[String]) {
    val Array(filename, string) = args
    val start = Instant.now()
    Source
      .fromFile(filename)
      .getLines
      .filter(line => line.startsWith(string))
      .length
    val stop = Instant.now()
    val duration = Duration.between(start, stop).toMillis
    println(s"${start},${stop},${duration}")
  }
}
The Python one:
import datetime
import sys

if __name__ == "__main__":
    _, filename, string = sys.argv
    start = datetime.datetime.now()
    with open(filename) as fr:
        # Not idiomatic or the most efficient but that's what
        # PySpark will use
        sum(1 for _ in filter(lambda line: line.startswith(string), fr))
    end = datetime.datetime.now()
    duration = round((end - start).total_seconds() * 1000)
    print(f"{start},{end},{duration}")
Results (300 repetitions each, Python 3.7.6, Scala 2.11.12), on Posts.xml from the hermeneutics.stackexchange.com data dump, with a mix of matching and non-matching patterns (times in milliseconds):

Python 273.50 (258.84, 288.16)
Scala 634.13 (533.81, 734.45)

As you can see, Python is not only systematically faster, but also more consistent (lower spread).
The take-away message is ‒ don't believe unsubstantiated FUD ‒ languages can be faster or slower on specific tasks or with specific environments (for example, here Scala can be hit by JVM startup and/or GC and/or JIT), but if you see claims like "XYZ is X4 faster" or "XYZ is slow as compared to ZYX (..) Approximately, 10x slower", it usually means that someone wrote really bad code to test things.
Edit:
To address some concerns raised in the comments:
In the OP's code, data is passed mostly in one direction (JVM -> Python) and no real serialization is required (this specific path just passes the bytestring as-is and decodes it as UTF-8 on the other side). That's as cheap as it gets when it comes to "serialization".
What is passed back is just a single integer per partition, so in that direction the impact is negligible.
Communication is done over local sockets (all communication on the worker beyond the initial connect and auth is performed using the file descriptor returned from local_connect_and_auth, and it's nothing other than a socket-associated file). Again, as cheap as it gets when it comes to communication between processes.
Considering the difference in raw performance shown above (much higher than what you see in your program), there is a lot of margin for the overheads listed above.
This case is completely different from cases where either simple or complex objects have to be passed to and from the Python interpreter in a form that is accessible to both parties as pickle-compatible dumps (most notable examples include old-style UDFs and some parts of old-style MLlib).
Edit 2:
Since jasper-m was concerned about startup cost here, one can easily prove that Python still has a significant advantage over Scala even if the input size is significantly increased.
Here are results for 2003360 lines / 5.6G (the same input, just duplicated multiple times, 30 repetitions), which far exceeds anything you can expect in a single Spark task.
Python 22809.57 (21466.26, 24152.87)
Scala 27315.28 (24367.24, 30263.31)
Please note non-overlapping confidence intervals.
Edit 3:
To address another comment from Jasper-M:
The bulk of all the processing is still happening inside a JVM in the Spark case.
That is simply incorrect in this particular case:
The job in question is a map job with a single global reduce, using PySpark RDDs.
PySpark RDDs (unlike, let's say, a DataFrame) implement the bulk of their functionality natively in Python, with the exception of input, output, and inter-node communication.
Since it is a single-stage job, and the final output is small enough to be ignored, the main responsibility of the JVM (if one were to nitpick, this is implemented mostly in Java, not Scala) is to invoke the Hadoop input format and push the data through the socket file to Python.
The read part is identical for the JVM and Python APIs, so it can be considered constant overhead. It also doesn't qualify as the bulk of the processing, even for such a simple job as this one.
The Scala job takes longer because of a misconfiguration: the Python and Scala jobs had been given unequal resources.
There are two mistakes in the code:
val sc = new SparkContext(config) // LINE #1
sc.setLogLevel("WARN")
sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
sc.hadoopConfiguration.set("spark.executor.instances", "4") // LINE #4
sc.hadoopConfiguration.set("spark.executor.cores", "8") // LINE #5
LINE 1. Once this line has been executed, the resource configuration of the Spark job is already established and fixed. From this point on, there is no way to adjust anything - neither the number of executors nor the number of cores per executor.
LINES 4-5. sc.hadoopConfiguration is the wrong place to set any Spark configuration. It should be set in the config instance you pass to new SparkContext(config).
[ADDED]
Bearing the above in mind, I would propose changing the code of the Scala job to
config.set("spark.executor.instances", "4")
config.set("spark.executor.cores", "8")
val sc = new SparkContext(config) // LINE #1
sc.setLogLevel("WARN")
sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
and re-testing it. I bet the Scala version is going to be X times faster now.

Writing to memory in a single operation

I'm writing a userspace driver in Python 3.5 for accessing FPGA registers. It mmaps the FPGA's PCI address space, obtains a memoryview to provide direct access to the memory-mapped register space, and then uses struct.pack_into("<I", ...) to write a 32-bit value into the selected 32-bit-aligned address.
import mmap
import pathlib
import struct

def write_u32(address, data):
    assert address % 4 == 0, "Address must be 32-bit aligned"
    path = pathlib.Path("/dev/uio0")
    file_size = path.stat().st_size
    with path.open(mode='w+b') as f:
        mv = memoryview(mmap.mmap(f.fileno(), file_size))
        struct.pack_into("<I", mv, address, data)
Unfortunately, it appears that struct.pack_into does a memset(buf, 0, ...) that clears the register before the actual value is written. By examining write operations within the FPGA, I can see that the register is set to 0x00000000 before the true value is set, so there are at least two writes across the PCI bus (in fact, for 32-bit access there are three: two zero writes, then the actual data; 64-bit involves six writes). This causes side effects with registers that count the number of write operations, that "clear on write", or that trigger some event when written.
I'd like to use an alternative method to write the register data in a single write to the memory-mapped register space. I've looked into ctypes.memmove and it looks promising (not yet working), but I'm wondering if there are other ways to do this.
Note that a register read using struct.unpack_from works perfectly.
Note that I've also eliminated the FPGA from this by using a QEMU driver that logs all accesses - I see the same double zero-write access before data is written.
I revisited this in 2022 and the situation hasn't really changed. If you're considering using memoryview to write blocks of data at once, you may find this interesting.
Perhaps this would work as needed?

mv[address:address+4] = struct.pack("<I", data)

Update:
As seen from the comments, the code above does not solve the problem. The following variation of it does, however:

mv_as_int = mv.cast('I')
mv_as_int[address // 4] = data  # the cast view is indexed in 4-byte words, hence integer division
Unfortunately, precise understanding of what happens under the hood and why exactly memoryview behaves this way is beyond the capabilities of modern technology and will thus stay open for the researchers of the future to tackle.
You could try something like this:
import mmap
import os

import numpy as np

# The original snippet shows only the methods; the wrapper class name
# 'Registers' here is illustrative.
class Registers(object):
    def __init__(self, offset, size=0x10000):
        self.offset = offset
        self.size = size
        mmap_file = os.open('/dev/mem', os.O_RDWR | os.O_SYNC)
        mem = mmap.mmap(mmap_file, self.size,
                        mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE,
                        offset=self.offset)
        os.close(mmap_file)
        # view the mapped region as an array of 32-bit words
        self.array = np.frombuffer(mem, np.uint32, self.size >> 2)

    def wread(self, address):
        idx = address >> 2
        return int(self.array[idx])

    def wwrite(self, address, data):
        idx = address >> 2
        self.array[idx] = np.uint32(data)
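A usage sketch of the class above (the base offset is a hypothetical physical address for your FPGA's register window):

regs = Registers(offset=0x43c00000)   # hypothetical register base
regs.wwrite(0x10, 0xdeadbeef)         # one 32-bit store, no read-modify-write
print(hex(regs.wread(0x10)))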

Is there a hidden possible deadlock in ppmap/parallel python?

I am having some trouble using a parallel version of map (the ppmap wrapper, implementation by Kirk Strauser).
The function I am trying to run in parallel runs a simple regular expression search on a large number of strings (protein sequences), which are parsed from the filesystem using BioPython's SeqIO. Each function call uses its own file.
If I run the function using a normal map, everything works as expected. However, when using ppmap, some of the runs simply freeze; there is no CPU usage, and the main program does not even react to KeyboardInterrupt. Also, when I look at the running processes, the workers are still there (but not using any CPU anymore), e.g.:
/usr/bin/python -u /usr/local/lib/python2.7/dist-packages/pp-1.6.1-py2.7.egg/ppworker.py 2>/dev/null
Furthermore, the workers do not seem to freeze on any particular data entry - if I manually kill the process and re-run the execution, it stops at a different point. (So I have temporarily resorted to keeping a list of finished entries and re-starting the program multiple times.)
Is there any way to see where the problem is?
Sample of the code that I am running:
def analyse_repeats(data):
    """
    Loads whole proteome in memory and then looks for repeats in sequences,
    flags both real repeats and sequences not containing particular aminoacid
    """
    (organism, organism_id, filename) = data
    import re
    letters = ['C', 'M', 'F', 'I', 'L', 'V', 'W', 'Y', 'A', 'G', 'T', 'S', 'Q', 'N', 'E', 'D', 'H', 'R', 'K', 'P']
    try:
        handle = open(filename)
        data = Bio.SeqIO.parse(handle, "fasta")
        records = [record for record in data]
        store_records = []
        for record in records:
            sequence = str(record.seq)
            uniprot_id = str(record.name)
            for letter in letters:
                items = set(re.compile("(%s+)" % letter).findall(sequence))
                if items:
                    for item in items:
                        store_records.append((organism_id, len(item), uniprot_id, letter))
                else:
                    # letter not present in the string, "zero" repeat
                    store_records.append((organism_id, 0, uniprot_id, letter))
        handle.close()
        return (organism, store_records)
    except IOError as e:
        print e
        return (organism, [])

res_generator = ppmap.ppmap(
    None,
    analyse_repeats,
    zip(todo_list, organism_ids, filenames)
)
for res in res_generator:
    pass  # process the output
If I use a simple map instead of ppmap, everything works fine:
res_generator = map(
analyse_repeats,
zip(todo_list, organism_ids, filenames)
)
You could try using one of the methods (like map) of the Pool object from the multiprocessing module instead. The advantage is that it's built in and doesn't require external packages; it also works very well.
By default, it uses as many worker processes as your computer has cores, but you can specify a higher number as well.
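A minimal sketch of that replacement, reusing the names from the question:

from multiprocessing import Pool

pool = Pool()                  # defaults to one worker process per CPU core
results = pool.map(
    analyse_repeats,
    zip(todo_list, organism_ids, filenames)
)
pool.close()
pool.join()
for res in results:
    pass  # process the output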
May I suggest using dispy (http://dispy.sourceforge.net)? Disclaimer: I am the author. I understand it doesn't address the question directly, but I hope it helps you.
