I wrote a Python script that backs up my files at night while I'm sleeping. The program is designed to run whenever the computer is on and to shut the computer down automatically after the backups finish. My code looks like this:
from datetime import datetime
from os import system
from backup import backup
while True:
    today = datetime.now()
    # Perform backups on Monday at 3:00am.
    if today.weekday() == 0 and today.hour == 3:
        print('Starting backups...')
        # Perform backups.
        backup("C:\\Users\\Jeff Moorhead\\Desktop", "E:\\")
        backup("C:\\Users\\Jeff Moorhead\\Desktop", "F:\\")
        backup("C:\\Users\\Jeff Moorhead\\OneDrive\\Documents", "E:\\")
        backup("C:\\Users\\Jeff Moorhead\\OneDrive\\Documents", "F:\\")
        # Shutdown computer after backups finish.
        system('shutdown /s /t 10')
        break
    else:
        del today
        continue
The backup function is from another file that I wrote to perform more customized backups on a case-by-case basis. This code all works perfectly fine, but I'm wondering if the del statement
del today
is really necessary. I put it in thinking that it would prevent my memory from getting filled up by thousands of datetime objects, but then I read that Python uses garbage collection, similar to Java. Further, does the today variable automatically get replaced with each pass through the while loop? I know that the program works as intended with the del statement, but if it is unnecessary, then I would like to get rid of it, if only for the sake of brevity! What are its actual effects on memory?
I put it in thinking that it would prevent my memory from getting filled up by thousands of datetime objects
The del statement is not necessary; you can simply remove that block. Python frees the space from those local variables automatically.
... but then I read that Python uses garbage collection, similar to Java.
The above statement is misguided: this has nothing to do with the garbage collector, which exists only to break up reference cycles. In CPython, memory is released as soon as an object's reference count drops to zero, and that happens even if the garbage collector is disabled.
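For example, a quick sketch of that point (CPython-specific behavior):

import gc

gc.disable()          # turn the cycle collector off entirely
data = [0] * 10**6    # allocate a large list
data = None           # refcount drops to zero; CPython frees the list immediately
gc.enable()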
Further, does the today variable automatically get replaced with each pass through the while loop? I know that the program works as intended with the del statement, but if it is unnecessary, then I would like to get rid of it, if only for the sake of brevity! What are its actual effects on memory?
A new datetime object is created on each iteration of the loop.
The name today in scope is rebound to the newly created datetime instance. The old datetime instance is then deallocated because no references to it remain (the only existing reference is lost once you rebind the name today to a different object). Once again, I stress that this is plain reference counting and has nothing to do with the gc.
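A small sketch that makes the rebinding visible (sys.getrefcount is the standard way to inspect the count):

import sys
from datetime import datetime

today = datetime.now()
print(sys.getrefcount(today))  # typically 2: the name plus getrefcount's own argument
today = datetime.now()         # the first datetime now has no references left,
                               # so CPython reclaims it on the spot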
On an unrelated note, your program will busy-loop and consume an entire CPU core in this while loop. You should consider adding a call to time.sleep inside the loop so the process remains mostly idle. Or, better yet, schedule the task to run periodically using cron.
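For instance, a minimal sketch of the sleep approach, keeping the original check:

import time
from datetime import datetime

while True:
    today = datetime.now()
    if today.weekday() == 0 and today.hour == 3:
        # ... run the backups, then shut down ...
        break
    time.sleep(60)  # wake once a minute instead of spinning at full speed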
Related
I am getting a segmentation fault when initializing an array.
I have a callback function for when an RFID tag gets read:
IDS = []

def readTag(e):
    epc = str(e.epc, 'utf-8')
    if not epc in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call instead, the program runs just fine. I've tried using different types for the array objects (integers), creating an array of the same objects outside of the append function, etc. For some reason, just creating an array inside the readTag function causes the segmentation fault, e.g. row = [1, 2, 3].
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two calls (only ever two), but then it crashes. The Reader object that provides the start_reading() function is from the mercury-api.
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so a segfault occurs when it invokes your callback function readTag(e). I don't think the behavior you want is supported by that library.
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general, pure Python doesn't segfault. Or at least, it shouldn't segfault unless there's a bug in the interpreter or in some extension you're using. That's not to say pure Python won't break; it's just that a genuine segfault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is asynchronous, meaning it spawns a new thread (or process) and returns immediately; the background thread then keeps calling your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable, and it seems the background thread runs the callback with a different context from the main program. So when it attempts to access IDS, it's reading memory it doesn't own, hence the segfault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the part of the developer. If you really need asynchronous reads, it's worth sending them an issue report.
Otherwise, considering you're just waiting for input, you probably don't need the asynchronous reads at all; you could use the synchronous Reader.read() method inside your own loop instead, with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt:  # break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly. And if you're writing more than a quick script, you probably want to use threads to interrupt the loop properly, as catching SIGINT is pretty hacky.
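For instance, a hypothetical adaptation of the question's callback (the membership test here checks the stored EPCs rather than whole rows):

import datetime

def readTags(tags):
    # r.read() hands back a list of tags, so loop over all of them.
    for e in tags:
        epc = str(e.epc, 'utf-8')
        if epc not in (row[0] for row in IDS):
            now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
            IDS.append([epc, now, "name.instrument"])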
I'm using an in-house Python library for scientific computing. I need to consecutively copy an object, modify it, and then delete it. The object is huge, which causes my machine to run out of memory after a few cycles.
The first problem is that I use Python's del to delete the object, which apparently only removes a reference to the object rather than freeing up RAM.
The second problem is that even when I encapsulate the whole process in a function, after the function is invoked, the RAM is still not freed up. Here's a code snippet to better explain the issue.
import gc

import openpnm as op

ws = op.core.Workspace()
net = op.network.Cubic(shape=[100, 100, 100], spacing=1e-6)
proj = net.project

def f():
    for i in range(5):
        clone = ws.copy_project(proj)
        result = do_something_with(clone)
        del clone
f()
gc.collect()
>>> ws
{'sim_01': [<openpnm.network.Cubic object at 0x7fed1c417780>],
'sim_02': [<openpnm.network.Cubic object at 0x7fed1c417888>],
'sim_03': [<openpnm.network.Cubic object at 0x7fed1c417938>],
'sim_04': [<openpnm.network.Cubic object at 0x7fed1c417990>],
'sim_05': [<openpnm.network.Cubic object at 0x7fed1c4179e8>],
'sim_06': [<openpnm.network.Cubic object at 0x7fed1c417a40>]}
My question is how do I completely delete a Python object?
Thanks!
PS. In the code snippet, each time ws.copy_project is called, a copy of proj is stored in the ws dictionary.
There are some really smart Python people on here who may be able to tell you better ways to keep your memory clear, but I have used leaky libraries before and found one (so far) foolproof way to guarantee that your memory gets cleared after use: execute the memory hog in another process.
To do this, you'd need to arrange an easy way to run your long calculation separately. I have done this by adding special flags to an existing Python script that tell it to run just that function; you may find it easier to put the function in a separate .py file, e.g.:
do_something_with.py
import sys

def run_one_cycle(args):
    # Wrapper around one full cycle; named differently from the in-house
    # do_something_with so the call below reaches the library function.
    # Your example is still too vague. Clearly, something differentiates
    # each do_something_with, otherwise you're just taking the
    # same inputs 5 times over.
    # Whatever the difference is, pass it in as an argument to the function.
    ws = op.core.Workspace()
    net = op.network.Cubic(shape=[100, 100, 100], spacing=1e-6)
    proj = net.project
    # You may not even need to clone anymore?
    clone = ws.copy_project(proj)
    result = do_something_with(clone)

# Whatever arg(s) you need to get to the function, just pass them in
# on the command line.
if __name__ == "__main__":
    sys.exit(run_one_cycle(sys.argv[1:]))
You can do this using any of the Python tools that handle subprocesses. In Python 3.5+, the recommended way is subprocess.run. You could change your bigger function to something like this:
import subprocess

def invoke_do_something(i):
    completed_args = subprocess.run(["python", "do_something_with.py", str(i)], check=False)
    return completed_args.returncode

results = list(map(invoke_do_something, range(5)))  # list() forces the lazy map to run
You'll obviously need to tailor this to fit your own situation, but by running in a subprocess you're guaranteed not to have to worry about the memory getting cleaned up. As an added bonus, you could potentially use multiprocessing.Pool.map to use multiple processors at one time. (I deliberately coded this with map to make such a transition simple. You could still use your for loop if you prefer, in which case you don't need the invoke... function.) Multiprocessing could speed up your processing, but since you're already worried about memory it's almost certainly a bad idea: with multiple copies of the big memory hog running at once, your system itself will likely run out of memory and kill your processes.
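For completeness, a hypothetical Pool variant of that map (mind the memory caveat above before using it):

from multiprocessing import Pool

if __name__ == "__main__":
    with Pool(processes=2) as pool:  # each extra worker multiplies memory use
        results = pool.map(invoke_do_something, range(5))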
Your example is fairly vague, so I've written this at a high level. I can answer some questions if you need.
I found that Python closes my file descriptors automatically. Run the following code and use lsof to find the open file. While sleeping inside openAndSleep, I saw that the file "fff" was held by the process. But once execution left the function, "fff" was no longer held.
import time

def openAndSleep():
    f = open("fff", 'w')
    print "opened, sleep 10 sec"
    time.sleep(10)
    print "sleep finish"

openAndSleep()
print "in main...."
time.sleep(10000)
I checked the file class; it has no __del__ method. That seems strange. Does anyone know something about this?
Yes, CPython will.
File objects close automatically when their reference count drops to 0. A local scope being cleaned up means that the refcount drops, and if the local scope held the only reference then the file object's refcount drops to 0 and it is closed.
However, it is better to use the file object as a context manager in a with statement and have it closed automatically that way; don't rely on the specific garbage-handling implementation of CPython:
def openAndSleep():
    with open("fff", 'w') as f:
        print "opened, sleep 10 sec"
        time.sleep(10)
        print "sleep finish"
Note that __del__ is a hook for custom Python classes; file objects are implemented in C and fill the tp_dealloc slot instead. The file_dealloc() function closes the file object.
If you want to hold a file object open for longer, make sure there is still a reference to it: store a reference to it somewhere else, return it from the function and store the return value, or make it a global, etc.
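For example, a tiny sketch of the "return it" option:

def openAndKeep():
    f = open("fff", 'w')
    return f  # the caller's reference keeps the refcount above zero

f = openAndKeep()  # "fff" stays open until f is closed or rebound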
In short: Yes.
Python spares users the need to manage memory by implementing a garbage-collection mechanism.
This basically means that each object in Python is automatically freed and removed once nothing uses it, so the memory and resources can be used again later in the program.
File objects are Pythonic objects, the same as any other object in Python, and they too are managed by the garbage collector. Once you leave the function scope, the garbage collector sees that nothing uses the file (via a reference counter) and disposes of the object, which means closing it as well.
What you can do to avoid this is to open the file without a Python file object, using os.open, which returns a file descriptor (an int) rather than a Python file object. The file descriptor will then not be discarded by the garbage collector, since it's not a Python object but an operating-system handle, and so your code will work.
You should be careful to close the fd later (os.close), though, or you will leak resources, and sooner or later your program will crash (by default a process can typically hold only around 1024 file descriptors, after which no more files can be opened)!
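A short sketch of the os.open approach:

import os

fd = os.open("fff", os.O_WRONLY | os.O_CREAT)  # a plain int, not a file object
# ... "fff" stays open; no garbage collection applies to the descriptor ...
os.close(fd)  # close it yourself, or the descriptor leaks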
Additional information:
http://www.digi.com/wiki/developer/index.php/Python_Garbage_Collection
I am writing some Python code to process huge amounts of data (almost 6 million pieces!).
In the code, I'm using a huge for loop to process each set. In that loop, I use the same variables on every pass and overwrite them. When I ran the program, I noticed that the longer it ran, the slower it got. Furthermore, upon experimenting, I discovered that the speed for values 10,000 to 10,100 was the same as from 0 to 100. Thus I concluded that, since I was not creating more variables and was merely processing existing ones, every time I overwrote a variable it must be getting saved somewhere by Python.
So:
Am I right? Is Python saving my overwritten variables somewhere?
Or am I wrong? Is something else happening?
Python uses reference counting to keep track of objects. When all references to an object are removed, the object is garbage collected. However, collection of reference cycles is done by Python at its own whim, not right away.
It could be that your code is producing garbage faster than Python collects it, or that there is something wrong with your code. Since you didn't post any of your code, there's no real way to know.
Python does not save a copy of a variable's original value when the variable is overwritten.
Possibly you are seeing the effect of various caches causing the program to slow down. Or, if you are creating objects, the garbage collector is being called to delete the objects you created that are no longer referenced.
Do you have example code that shows this behavior you are seeing?
For example:
import hashlib
import random
import time
def test():
    t = []
    for i in xrange(20000):
        if (i == 0) | (i == 100) | (i == 10000) | (i == 10100):
            t.append(time.time())
        for j in range(1, 10):
            a = hashlib.sha512(str(random.random()))
            b = hashlib.sha512(str(random.random()))
            c = hashlib.sha512(str(random.random()))
            d = hashlib.sha512(str(random.random()))
            e = hashlib.sha512(str(random.random()))
            f = hashlib.sha512(str(random.random()))
            g = hashlib.sha512(str(random.random()))
    print t[1]-t[0], t[3]-t[2]
Then running 10 times:
>>> for i in range(10):
...     test()
0.0153768062592 0.0147190093994
0.0148379802704 0.0147860050201
0.0145788192749 0.0147390365601
0.0147459506989 0.0146520137787
0.0147008895874 0.0147621631622
0.0145609378815 0.0146908760071
0.0144789218903 0.014506816864
0.0146539211273 0.0145659446716
0.0145878791809 0.0146989822388
0.0146920681 0.0147240161896
This gives nearly identical times to within standard error (especially if I exclude the very first interval, which was slightly slower because it first had to initialize a, b, c, d, e, f, g).
Hey guys, I need help with win32com in Python:
I have a routine that opens a workbook, creates a sheet, and puts some data on it.
If everything runs fine, the workbook is saved and closed. If not, the Python session is terminated, but the workbook is left open, so the reference is lost. When the code is restarted, Excel prompts you with the message "workbook still open, do you want to re-open?".
So what I want is to suppress this message. I found a solution that works for me when Python terminates before writing to the sheet:
open_copys = self.xlApp.Workbooks.Count
if open_copys > 0:
    # Check if any open copy is the desired one.
    for i in range(0, open_copys):
        if self.xlApp.Workbooks[i].FullName == self.file_path:
            self.xlBook = self.xlApp.Workbooks[i]
else:
    self.xlBook = self.xlApp.Workbooks.Open(self.file_path)
But if any changes were made to the Excel sheet, this method no longer works.
Does anyone have an idea how to get back a reference to an open and changed workbook from a new Python session?
Thanks!
I'm not familiar with Python but have done some Excel/Word COM code in other languages.
Excel's Application.DisplayAlerts property might help. Setting it to False suppresses most messages that Excel might normally show, and auto-chooses a default response, though I think there are some exceptions.
Looking at your existing code, I guess you'd insert this line before opening the workbook:
self.xlApp.DisplayAlerts = False
Have you tried removing all references to your COM objects before terminating the Python interpreter? You can force them to be garbage collected (using gc.collect()) to be really sure they are gone. This way the workbook shouldn't remain open in memory, and you won't get the error message.
Try adding a "close()" method to your class, with something like the following, and call it before the end of your script.
import gc
...

def close(self):
    del self.xlApp
    if hasattr(self, 'xlBook'):
        del self.xlBook
    gc.collect()
You're going about this the wrong way. You do NOT want to let Python terminate and leave orphaned Excel processes behind. This is especially important if you are going to install and run this code on other machines. Instead, find your errors and handle them; then you'll never have orphaned processes to deal with.
That said, there are a few things you can consider. You can choose either to work with an existing Excel process (Dispatch) or to instantiate a dedicated new one (DispatchEx). This lets you do things like see which workbooks are open and close them, or ensure that your process will not interfere with others. Also, as Scott said, the Excel Application object has some interesting properties, like suppressing alerts for unattended running, that are worth learning.
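A minimal sketch of that advice (standard pywin32 calls; the workbook path is hypothetical), cleaning up even when errors occur:

import win32com.client

xlApp = win32com.client.DispatchEx("Excel.Application")  # dedicated Excel process
xlApp.DisplayAlerts = False  # suppress prompts for unattended runs
try:
    xlBook = xlApp.Workbooks.Open(r"C:\path\to\report.xlsx")
    # ... create the sheet and write the data here ...
    xlBook.Close(SaveChanges=True)
finally:
    xlApp.Quit()  # runs even if an exception was raised above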