Reading memory with Pymem (MultiThreading problem?) - python

Hi, I am currently working on a reinforcement-learning project in Python. The problem is that I need to read a memory address to access game variables, for example the speed of the car or the track progress. Everything works fine, and I also tried creating 2 threads that simultaneously access the memory address. But when my PPO agent worker creates a new environment with the class containing the Speedometer.py class and calls return_speed_mph(), the program gives an error:
raise pymem.exception.MemoryReadError(address, struct.calcsize('i'), e.error_code)
pymem.exception.MemoryReadError: Could not read memory at: 9521188, length: 4 - GetLastError: 6
The Speedometer class looks like this:
from pymem import Pymem
from pymem.process import module_from_name
from multiprocessing import Lock, Value

class Speedometer:
    mem: Pymem
    module: int  # base address of the module (lpBaseOfDll)
    shared_speed: Value
    lock: Lock

    offsets = [0xC, 0xC, 0x38, 0xC, 0x54]

    def __init__(self):
        self.mem = Pymem("speed.exe")  # attach first, so the process handle exists
        self.module = module_from_name(self.mem.process_handle, "speed.exe").lpBaseOfDll
        self.lock = Lock()
        self.shared_speed = Value('i', 0)

    def return_speed_mph(self):
        with self.lock:
            result = self.mem.read_int(self.get_pointer_address(self.module + 0x00514824, self.offsets))
            return result

    def get_pointer_address(self, base, offsets):
        # Walk the pointer chain: dereference every offset except the last,
        # then add the final offset to get the address of the value itself.
        addr = self.mem.read_int(base)
        for offset in offsets[:-1]:
            addr = self.mem.read_int(addr + offset)
        return addr + offsets[-1]
I temporarily fixed the problem by moving the initialization of mem and module directly into the return_speed_mph() function, but I think that is memory-inefficient:
def return_speed_mph(self):
    with self.lock:
        self.mem = Pymem("speed.exe")
        self.module = module_from_name(self.mem.process_handle, "speed.exe").lpBaseOfDll
        result = self.mem.read_int(self.get_pointer_address(self.module + 0x00514824, self.offsets))
        return result
Is there any way to fix it somehow without moving the initialization of the module and mem directly into the return_speed_mph() function?
Thanks in advance. :D
(The Library I am using is called Pymem)

Related

Memory leak while retrieving data from a proxy class

I am multi-processing data from a series of files.
To achieve the purpose, I built a class to distribute the data.
I started 4 processes that will visit the same class and retrieve data.
The problem is, if I use the class method retrieve() to retrieve data, the memory keeps going up. If I don't, the memory is stable, even though the data keeps being refreshed by getData(). How can I keep memory usage stable while retrieving data? Or is there another way to achieve the same goal?
import pandas as pd
from multiprocessing import Process, RLock
from multiprocessing.managers import BaseManager

class myclass():
    def __init__(self, path):
        self.path = path
        self.lock = RLock()
        self.getIter()

    def getIter(self):
        self.iter = pd.read_csv(self.path, chunksize=1000)

    def getData(self):
        with self.lock:
            try:
                self.data = next(self.iter)
            except:
                self.getIter()
                self.data = next(self.iter)

    def retrieve(self):
        return self.data

def worker(c):
    while True:
        c.getData()
        # Uncommenting the following line, memory usage goes up
        data = c.retrieve()

# Generate a testing file
with open('tmp.csv', 'w') as f:
    for i in range(1000000):
        f.write('%f\n' % (i * 1.))

BaseManager.register('myclass', myclass)
bm = BaseManager()
bm.start()
c = bm.myclass('tmp.csv')

for i in range(4):
    p = Process(target=worker, args=(c,))
    p.start()
I wasn't able to find the cause or solve it directly, but after changing the data type of the returned variable from pandas.DataFrame to a str (a JSON string), the problem goes away.
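A minimal sketch of that change, assuming the rest of the class stays the same and the callers are happy to re-parse the JSON on their side:

def retrieve(self):
    # Return a plain JSON string instead of a DataFrame; the manager then
    # only has to ship a string back to the calling process.
    return self.data.to_json()

# In the worker, rebuild the DataFrame only if it is actually needed, e.g.:
#   import io
#   df = pd.read_json(io.StringIO(c.retrieve()))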

Why does my Ubuntu freeze after launching python script?

I wrote a simple script that calculates the bandwidth of my network. I used the library scapy to sniff all incoming traffic and calculate the speed. Here is the code that sniffs the traffic:
from time import sleep
from threading import Thread, Event
from scapy.all import *

class Sniffer(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.count_downloaded_bytes = 0

    def run(self):
        sniff(filter="ip", prn=self.get_packet)

    def get_packet(self, packet):
        self.count_downloaded_bytes += len(packet)  # calculate size of packets

    def get_count_downloaded_bytes(self):
        count_d_bytes = self.count_downloaded_bytes
        self.count_downloaded_bytes = 0
        return count_d_bytes  # returns size of downloaded data in bytes
This code calculates bandwidth in Mb/s every 10 seconds
class NetworkSpeed(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.sniffer = Sniffer()  # create a second thread that sniffs traffic
        self.start()

    def calculate_bandwidth(self, count_downloaded_bytes, duration):
        download_speed = (count_downloaded_bytes / 1000000 * 8) / duration
        print('download_speed = ', download_speed)

    def run(self):
        counter = 0
        self.sniffer.start()
        while True:
            if counter == 10:
                self.calculate_bandwidth(self.sniffer.get_count_downloaded_bytes(), 10)
                counter = 0
            counter += 1
            sleep(1)

network_speed = NetworkSpeed()
I know the code is not really good; it is just a prototype. But I have the following problem: I launched this script with root privileges, and after 5 minutes my computer hung and started to work very, very slowly. It seems that this script took all the RAM. How can I fix this? The script should run for at least a day.
I think the problem may lie in the sniff function; try calling it with
def run(self):
    sniff(filter="ip", prn=self.get_packet, store=False)
so that it doesn't store the packets and fill up the RAM.

Python sharing a deque between multiprocessing processes

I've been looking at the following questions for the past hour without any luck:
Python sharing a dictionary between parallel processes
multiprocessing: sharing a large read-only object between processes?
multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
I've written a very basic test file to illustrate what I'm trying to do:
from collections import deque
from multiprocessing import Process
import numpy as np

class TestClass:
    def __init__(self):
        self.mem = deque(maxlen=4)
        self.process = Process(target=self.run)

    def run(self):
        while True:
            self.mem.append(np.array([0, 1, 2, 3, 4]))

def print_values(x):
    while True:
        print(x)

test = TestClass()
process = Process(target=print_values(test.mem))
test.process.start()
process.start()
Currently this outputs the following:
deque([], maxlen=4)
How can I access the mem values from the main code or from the process that runs print_values?
Unfortunately, multiprocessing.Manager() doesn't support deque, but it does work with list, dict, Queue, Value and Array. A list is fairly close, so I've used it in the example below.
from multiprocessing import Process, Manager, Lock
import numpy as np

class TestClass:
    def __init__(self):
        self.maxlen = 4
        self.manager = Manager()
        self.mem = self.manager.list()
        self.lock = self.manager.Lock()
        self.process = Process(target=self.run, args=(self.mem, self.lock))

    def run(self, mem, lock):
        while True:
            array = np.random.randint(0, high=10, size=5)
            with lock:
                if len(mem) >= self.maxlen:
                    mem.pop(0)
                mem.append(array)

def print_values(mem, lock):
    while True:
        with lock:
            print(mem)

test = TestClass()
print_process = Process(target=print_values, args=(test.mem, test.lock))
test.process.start()
print_process.start()
test.process.join()
print_process.join()
You have to be a little careful using manager objects. You can use them a lot like the objects they reference, but you can't do something like mem = mem[-4:] to truncate the values, because that rebinds the local name to a plain copy instead of modifying the shared, referenced object.
As for coding style, I might move the Manager objects outside the class or move the print_values function inside it, but for an example this works. If you move things around, just note that you can't use self.mem directly in the run method. You need to pass it in when you start the process, or the fork that Python does in the background will create a new instance and it won't be shared.
Hopefully this works for your situation, if not, we can try to adapt it a bit.
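To illustrate that caveat, a minimal sketch using the mem and lock from the example above:

# Don't: this rebinds the name to a plain local list; the shared list is untouched.
# mem = mem[-4:]

# Do: mutate through the proxy so every process sees the change.
with lock:
    while len(mem) > 4:
        mem.pop(0)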
So by combining the code provided by @bivouac0 and the comment @Marijn Pieters posted, I came up with the following solution:
from multiprocessing import Process, Manager, Queue

class testClass:
    def __init__(self, maxlen=4):
        self.mem = Queue(maxsize=maxlen)
        self.process = Process(target=self.run)

    def run(self):
        i = 0
        while True:
            self.mem.empty()
            while not self.mem.full():
                self.mem.put(i)
                i += 1

def print_values(queue):
    while True:
        values = queue.get()
        print(values)

if __name__ == "__main__":
    test = testClass()
    print_process = Process(target=print_values, args=(test.mem,))
    test.process.start()
    print_process.start()
    test.process.join()
    print_process.join()
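One small follow-up on this solution: Queue.empty() only reports (approximately) whether the queue currently looks empty, it does not clear it, and Queue.put() already blocks once a bounded queue is full, so the producer loop could arguably be simplified to something like the sketch below:

def run(self):
    i = 0
    while True:
        # put() blocks while maxsize items are waiting, so no full()/empty() checks are needed.
        self.mem.put(i)
        i += 1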

python, multithreading, safe to use pandas "to_csv" on common file?

I've got some code that works pretty nicely. It's a while-loop that goes through a list of dates, finds files on my HDD that correspond to those dates, does some calculations with those files, and then outputs to a "results.csv" file using the command:
my_df.to_csv("results.csv",mode = 'a')
I'm wondering if it's safe to create a new thread for each date, and call the stuff in the while loop on several dates at a time?
MY CODE:
import datetime, time, os
import sys
import threading
import helperPY  # a python file containing the logic I need

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter, sn, m_date):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
        self.sn = sn
        self.m_date = m_date

    def run(self):
        print "Starting " + self.name
        m_runThis(self.sn, self.m_date)
        print "Exiting " + self.name

def m_runThis(sn, m_date):
    helperPY.helpFn(sn, m_date)  # this is where the "my_df.to_csv()" is called

sn = 'XXXXXX'
today = datetime.datetime(2016, 9, 22)
yesterday = datetime.datetime(2016, 6, 13)
threadList = []
i_threadlist = 0
while today > yesterday:
    threadList.append(myThread(i_threadlist, str(today), i_threadlist, sn, today))
    threadList[i_threadlist].start()
    i_threadlist = i_threadlist + 1
    today = today - datetime.timedelta(1)
Writing the file from multiple threads is not safe, but you can create a lock to protect that one operation while letting the rest run in parallel. Your to_csv call site isn't shown, but you could create the lock
csv_output_lock = threading.Lock()
and pass it to helperPY.helpFn. When you get to the operation, do
with csv_output_lock:
    my_df.to_csv("results.csv", mode='a')
You get parallelism for other operations - subject to the GIL of course - but the file access is protected.
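For concreteness, one possible way to wire the lock through; the signature of helperPY.helpFn isn't shown in the question, so the extra lock argument here is only an assumption:

csv_output_lock = threading.Lock()

def m_runThis(sn, m_date):
    # Hand the shared lock down to the code that actually writes the CSV.
    helperPY.helpFn(sn, m_date, csv_output_lock)

# and inside helperPY.helpFn, around the one unsafe operation:
#     with csv_output_lock:
#         my_df.to_csv("results.csv", mode='a')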

How to properly set up multiprocessing proxy objects for objects that already exist

I'm trying to share an existing object across multiple processing using the proxy methods described here. My multiprocessing idiom is the worker/queue setup, modeled after the 4th example here.
The code needs to do some calculations on data that are stored in rather large files on disk. I have a class that encapsulates all the I/O interactions, and once it has read a file from disk, it saves the data in memory for the next time a task needs to use the same data (which happens often).
I thought I had everything working from reading the examples linked to above. Here is a mock up of the code that just uses numpy random arrays to model the disk I/O:
import numpy
from multiprocessing import Process, Queue, current_process, Lock
from multiprocessing.managers import BaseManager

nfiles = 200
njobs = 1000

class BigFiles:
    def __init__(self, nfiles):
        # Start out with nothing read in.
        self.data = [ None for i in range(nfiles) ]
        # Use a lock to make sure only one process is reading from disk at a time.
        self.lock = Lock()

    def access(self, i):
        # Get the data for a particular file
        # In my real application, this function reads in files from disk.
        # Here I mock it up with random numpy arrays.
        if self.data[i] is None:
            with self.lock:
                self.data[i] = numpy.random.rand(1024,1024)
        return self.data[i]

    def summary(self):
        return 'BigFiles: %d, %d Storing %d of %d files in memory'%(
            id(self), id(self.data),
            (len(self.data) - self.data.count(None)),
            len(self.data) )

# I'm using a worker/queue setup for the multiprocessing:
def worker(input, output):
    proc = current_process().name
    for job in iter(input.get, 'STOP'):
        (big_files, i, ifile) = job
        data = big_files.access(ifile)
        # Do some calculations on the data
        answer = numpy.var(data)
        msg = '%s, job %d'%(proc, i)
        msg += '\n    Answer for file %d = %f'%(ifile, answer)
        msg += '\n    ' + big_files.summary()
        output.put(msg)

# A class that returns an existing file when called.
# This is my attempted workaround for the fact that Manager.register needs a callable.
class ObjectGetter:
    def __init__(self, obj):
        self.obj = obj
    def __call__(self):
        return self.obj

def main():
    # Prior to the place where I want to do the multiprocessing,
    # I already have a BigFiles object, which might have some data already read in.
    # (Here I start it out empty.)
    big_files = BigFiles(nfiles)
    print 'Initial big_files.summary = ',big_files.summary()

    # My attempt at making a proxy class to pass big_files to the workers
    class BigFileManager(BaseManager):
        pass
    getter = ObjectGetter(big_files)
    BigFileManager.register('big_files', callable = getter)
    manager = BigFileManager()
    manager.start()

    # Set up the jobs:
    task_queue = Queue()
    for i in range(njobs):
        ifile = numpy.random.randint(0, nfiles)
        big_files_proxy = manager.big_files()
        task_queue.put( (big_files_proxy, i, ifile) )

    # Set up the workers
    nproc = 12
    done_queue = Queue()
    process_list = []
    for j in range(nproc):
        p = Process(target=worker, args=(task_queue, done_queue))
        p.start()
        process_list.append(p)
        task_queue.put('STOP')

    # Log the results
    for i in range(njobs):
        msg = done_queue.get()
        print msg
    print 'Finished all jobs'
    print 'big_files.summary = ',big_files.summary()

    # Shut down the workers
    for j in range(nproc):
        process_list[j].join()
    task_queue.close()
    done_queue.close()

main()
This works in the sense that it calculates everything correctly, and it is caching the data that is read along the way. The only problem I'm having is that at the end, the big_files object doesn't have any of the files loaded. The final msg returned is:
Process-2, job 999
    Answer for file 198 = 0.083406
    BigFiles: 4303246400, 4314056248 Storing 198 of 200 files in memory
But then after it's all done, we have:
Finished all jobs
big_files.summary = BigFiles: 4303246400, 4314056248 Storing 0 of 200 files in memory
So my question is: What happened to all the stored data? It's claiming to be using the same self.data according to the id(self.data). But it's empty now.
I want the end state of big_files to have all the saved data that it accumulated along the way, since I actually have to repeat this entire process many times, so I don't want to have to redo all the (slow) I/O each time.
I'm assuming it must have something to do with my ObjectGetter class. The examples for using BaseManager only show how to make a new object that will be shared, not how to share an existing one. So am I doing something wrong with the way I get the existing big_files object? Can anyone suggest a better way to do this step?
Thanks much!
