Caching a script using NUKE API - python

I want to write a script that uses Nuke's built-in Performance Timers to "sanity-check" the current comp.
For this I am clearing all of the viewer cache to start off fresh. Now I need to trigger the caching. It seems the only way to achieve this is by using nuke.activeViewer().play(1). Using this call I get my timeline cached, but I have no indication of when the timeline is fully cached, so I cannot tell when to stop and reset the Performance Timers.
I am aware that I can also use nuke.activeViewer().frameControl(+1) to skip 1 frame at a time till I'm at the last frame but it seems to me that using this call is not causing the comp to cache that frame. Actually the timeline indicates that the frame is cached but nuke.activeViewer().node().frameCached(nuke.frame()) is returning false.
Nevertheless, I have written something that works, but only barely.
Here it is:
import nuke

nuke.clearRAMCache()
vc = nuke.activeViewer()
v = vc.node()
fr = v.playbackRange()
vc.frameControl(-6)  # -6: jump to the start of the range
print(fr.maxFrame())
cached_frames = 0
while cached_frames < fr.maxFrame():
    print("Current Frame: {}".format(nuke.frame()))
    if not v.frameCached(nuke.frame()):
        print("Frame: {} not cached".format(nuke.frame()))
        # keep playing until the current frame reports as cached
        while not v.frameCached(nuke.frame()):
            print("caching...")
            vc.play(1)
        print("Frame: {} cached".format(nuke.frame()))
        print("Incrementing from caching")
        cached_frames += 1
    else:
        vc.frameControl(1)
        print("incrementing from skipping")
        #cached_frames += 1
    print("Cached Frames: {}".format(cached_frames))
print("DONE")
vc.stop()
I know that this is not a really nice piece of code, but sometimes these lines execute really well and at other times it just hangs for a random (at least it seems so) amount of time.
So are there any callbacks available or writable for the Viewer in Nuke or something similar?
Any help is much appreciated!

What specific performance requirement are you trying to meet?
Nuke has this built in:
"Nuke can display accurate performance timing data onscreen or output it to XML file to help you troubleshoot bottlenecks in slow scripts. When performance timing is enabled, timing information is displayed in the Node Graph, and the nodes themselves are colored according to the proportion of the total processing time spent in each one, from green (fast nodes) through to red (slow nodes)."

Mimicking callbacks on the viewer timeline is only achievable using Threads.
Just create a Thread, check for the current frame to be cached and step to the next frame using nuke.activeViewer().frameControl() from that Thread.
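The polling loop described above might look roughly like the sketch below. It is deliberately written against plain callables rather than the Nuke API so it can be tried outside Nuke; the function name step_until_cached and its parameters are hypothetical, not part of Nuke.

```python
import time


def step_until_cached(is_cached, current_frame, step_forward, last_frame,
                      poll=0.05, timeout=30.0):
    """Step through the timeline, waiting for each frame to report as cached.

    Returns the frames that were still uncached after `timeout` seconds.
    """
    missed = []
    while True:
        frame = current_frame()
        deadline = time.time() + timeout
        # Poll instead of blocking: there is no "frame cached" callback to wait on.
        while not is_cached(frame) and time.time() < deadline:
            time.sleep(poll)
        if not is_cached(frame):
            missed.append(frame)
        if frame >= last_frame:
            return missed
        step_forward()
```

Inside Nuke you would presumably pass v.frameCached, nuke.frame, a function that calls vc.frameControl(1), and fr.maxFrame() from the question's script, and run the whole thing as the target of a threading.Thread. Calls that touch the UI from a worker thread may need to go through nuke.executeInMainThread or nuke.executeInMainThreadWithResult; that part is an assumption and is not tested here.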


Combiner Functions Seemingly Not emitting correct results

So I'm working on a test streaming case. Reading from pubsub and for now, sending to stdout for some visuals on the pipeline and transforms.
I believe I'm getting some unusual output, and believe I'm likely missing something so hoping someone can help.
Take my code (stripped back to debug):
import apache_beam as beam
from apache_beam.io import ReadFromPubSub

# opts and topic_name are defined elsewhere in the script
with beam.Pipeline(options=opts) as p:
    (
        p
        | ReadFromPubSub(topic=topic_name,
                         timestamp_attribute='timestamp')
        | beam.WindowInto(beam.window.FixedWindows(beam.window.Duration(5)),
                          trigger=beam.trigger.AfterWatermark(),
                          accumulation_mode=beam.trigger.AccumulationMode.ACCUMULATING)
        | beam.CombineGlobally(beam.combiners.CountCombineFn()).without_defaults()
        | beam.Map(print)
    )
I am generating an arbitrary number of events and pushing those to my topic - currently 40. I can confirm through the generation of the events that they all succeed in reaching the topic. Upon simply printing the results of the topic (using beam), I can see what I would expect.
However, what I wanted to try was some basic window aggregation and using both beam.CombineGlobally(beam.combiners.CountCombineFn()) and beam.combiners.Count.Globally(), I notice 2 things happening (not strictly at the same time).
The first issue:
When I print additional window start/ end timestamps, I am getting more than 1 instance of the same window returned. My expectation on a local runner, would be that there is a single fixed window collecting the number of events and emitting a result.
This is the DoFn I've used to get a picture of the windowing data.
class ShowWindowing(beam.DoFn):
    def process(self, elem, window=beam.DoFn.WindowParam):
        yield f'I am an element: {elem}\nstart window time:{window.start.to_utc_datetime()} and the end window time: {window.end.to_utc_datetime()}'
To reiterate, the issue is not that I am getting 'duplicate' results; rather, I am getting multiple semi-grouped results.
The second issue I have (which I feel is related to the above but I've seen this occur without the semi-grouping of elements):
When I execute my pipeline through the CLI (I use notebooks a lot) and generate events to my topic, I get considerably less output back, and it appears to be just partial results.
Example: I produce 40 events - each event has a lag of half a second. My window is set to 5 seconds, I expect (give or take) a combined result of 10 each 5 seconds over 20 seconds. What I get is a completely partial result. This could be a count of 1 over a window or a count of 8.
I've read and re-read the docs (admittedly skipping over some of it just to seek an answer) but I've referenced the katas and the Google Dataflow quest to look for examples/ alternatives and I cannot identify where I'm going wrong.
Thanks
I think this boils down to a TODO in the Python local runner in handling watermarks for PubSub subscriptions. Essentially, it thinks it has received all the data up until now, but there is still data in PubSub that has a timestamp less than now() which becomes late data once it is actually read.
A real runner such as Dataflow won't have this issue.
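To make the late-data effect concrete, here is a toy model in plain Python. It is not Beam code and not how any runner is implemented; it only assumes, as described above, that the watermark jumps to "now" whenever an element is read, so an element read after its window has already closed is counted as late and dropped. The function name on_time_counts is made up for this illustration.

```python
from collections import Counter

WINDOW = 5.0  # seconds, matching FixedWindows(5) in the question


def on_time_counts(events, window=WINDOW):
    """Count elements per fixed window, dropping those read after the window closed.

    `events` is an iterable of (event_time, read_time) pairs in seconds.
    """
    counts = Counter()
    dropped = 0
    for event_time, read_time in sorted(events, key=lambda e: e[1]):
        window_start = (event_time // window) * window
        if read_time >= window_start + window:
            dropped += 1  # the watermark has already passed the end of this window
        else:
            counts[window_start] += 1
    return dict(counts), dropped
```

With 40 events spaced half a second apart and no read delay, every window in this model counts 10. If each element is read 2 seconds after its timestamp, each window only counts 6 and 16 elements are dropped, which resembles the partial counts in the question.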

How can I measure the coverage (in production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.
I want an answer to this question:
Which lines get executed often (hot spots) and which lines are never used (dead code)?
Of course this must not slow down my production site.
I am not talking about measuring the coverage of tests.
I assume you are not talking about test suite code coverage which the other answer is referring to. That is a job for CI indeed.
If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation / profiling. This will have a cost. You cannot add measurements for free. You can do it cheaply though and typically you would only run it for short amounts of time, long enough until you have your data.
Python has cProfile to do full profiling, measuring call counts per function etc. This will give you the most accurate data but will likely have relatively high impact on performance.
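As a minimal sketch of the cProfile route, the helper below profiles a single call and returns a text report. The helper name profile_call is hypothetical; in production you would probably wrap only a small sample of requests this way because of the overhead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

For example, result, report = profile_call(sum, range(1000)) returns the sum together with a report listing call counts and timings per function.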
Alternatively, you can do statistical profiling which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even with high sampling rate! The downside of course is a loss of precision.
Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic though.
The sampler below was lifted from the Nylas blog with some tweaks. After you start it, it fires an interrupt every millisecond and records the current call stack:
import collections
import signal

class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def start(self):
        # SIGVTALRM is delivered when the ITIMER_VIRTUAL timer expires (Unix only)
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        # walk up from the interrupted frame to build the call stack
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # re-arm the one-shot timer for the next sample
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
You inspect stack_counts to see what your program has been up to. This data can be plotted in a flame-graph which makes it really obvious to see in which code paths your program is spending the most time.
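One way to get from stack_counts to a flame graph is to write the counts out as collapsed-stack lines ("frame;frame;frame count"), which is the text input that tools such as Brendan Gregg's flamegraph.pl read. The helper below is a sketch with a made-up name, to_folded, and it works on any dict shaped like stack_counts.

```python
def to_folded(stack_counts):
    """Format {"a;b;c": n} as sorted "a;b;c n" lines (collapsed-stack text)."""
    return ["{} {}".format(stack, count)
            for stack, count in sorted(stack_counts.items())]
```

Writing "\n".join(to_folded(sampler.stack_counts)) to a file would then give you something to feed to the flame graph tool, where sampler is assumed to be a started Sampler instance.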
If I understand correctly, you want to learn which parts of your application are used most often by users.
TL;DR:
Use one of the metrics frameworks for Python if you do not want to do it by hand. Some of them are listed below:
DataDog
Prometheus
Prometheus Python Client
Splunk
It is usually done at the function level, and the approach depends on the application.
If it is a desktop app with internet access:
You can create a simple database and record how many times your functions are called. To accomplish this, write a simple function and call it inside every function that you want to track. After that, you can define an asynchronous task to upload the data.
If it is a web application:
You can track which functions are called from JS (mostly preferred for user-behaviour tracking) or from the web API. It is good practice to work from the outside in. First detect which endpoints are called frequently (if you are using a proxy like nginx, you can analyze the server logs to gather this information; it is the easiest and cleanest way). After that, insert a logger into every other function that you want to track and analyze your logs every week or month.
But if you want to analyze your production code line by line (which is a very bad idea), you can start your application under a Python profiler. Python already ships with one: cProfile.
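The "call a small function inside every function you want to track" idea from this answer can be sketched as a decorator. The names counted and call_counts are illustrative only, and the in-memory counter stands in for whatever database or metrics client you actually use.

```python
import collections
import functools

# In-memory stand-in for a real metrics store; flush it periodically elsewhere.
call_counts = collections.Counter()


def counted(func):
    """Decorator that records how many times func is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper
```

Decorating a handler with @counted and later reading call_counts.most_common() shows the hot functions; decorated functions that never appear in the counter are dead-code candidates. A plain Counter is not synchronized, so a multi-threaded server might want a lock around the increment.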
Maybe create a text file and have each method of your program append a line to it, such as "Method one executed". Run the web application thoroughly about 10 times, as a user would, and then write a Python program that reads the file, counts the occurrences of each line (or of a pattern), and outputs the counts.

Is it possible to force a 2 second looping callback in Python?

I'm trying to get a looping call to run every 2 seconds. Sometimes I get the desired functionality, but other times I have to wait up to ~30 seconds, which is unacceptable for my application's purposes.
I reviewed this SO post and found that looping call might not be reliable for this by default. Is there a way to fix this?
My usage/reason for needing a consistent ~2 seconds:
The function I am calling scans an image (using CV2) for a dollar value and if it finds that amount it sends a websocket message to my point of sale client. I can't have customers waiting 30 seconds for the POS terminal to ask them to pay.
My source code is very long and not well commented as of yet, so here is a short example of what I'm doing:
from twisted.internet import reactor
from twisted.internet.task import LoopingCall

# scan the image for sales every 2 seconds
def scanForSale():
    print("Now Scanning for sale requests")

# retrieve a new image every 2 seconds
def getImagePreview():
    print("Loading Image From Capture Card")

lc = LoopingCall(scanForSale)
lc.start(2)
lc2 = LoopingCall(getImagePreview)
lc2.start(2)
reactor.run()
I'm using a Raspberry Pi 3 for this application, which is why I suspect it hangs for so long. Can I utilize multithreading to fix this issue?
Raspberry Pi is not a real time computing platform. Python is not a real time computing language. Twisted is not a real time computing library.
Any one of these by itself is enough to eliminate the possibility of a guarantee that you can run anything once every two seconds. You can probably get close but just how close depends on many things.
The program you included in your question doesn't actually do much. If this program can't reliably print each of the two messages once every two seconds then presumably you've overloaded your Raspberry Pi - a Linux-based system with multitasking capabilities. You need to scale back your usage of its resources until there are enough available to satisfy the needs of this (or whatever) program.
It's not clear whether multithreading will help - however, I doubt it. It's not clear because you've only included an over-simplified version of your program. I would have to make a lot of wild guesses about what your real program does in order to think about making any suggestions of how to improve it.

How to measure the sample rate in Python when getting data from an ADC?

I am working on a project using a Raspberry Pi 3 B where I get data from an IR sensor (Sharp GP2Y0A21YK0F) through an MCP3008 ADC and display it in real time using the PyQtGraph library.
However, it seems that I am getting very few samples and the graph is not "smooth" as I expect.
I am using the Adafruit Python MCP3008 Library and the function mcp.read_adc(0) to get the data.
Is there a way to measure the sample rate in Python?
Thank you
Hugo Oliveira
I would suggest setting up another level of buffering, ideally via multiprocessing (see multiprocessing and GUI updating - Qprocess or multiprocessing?), to get a better handle on how fast you can access the data. Currently you're polling with a QTimer, which only gets 3 raw reads every 50 msec, so you're really limiting yourself artificially via the timer.
I haven't used the MCP3008, but from a quick look at some of their code it seems you'll have to set up some sample tests and try things out, or dig further for better documentation. The question is how mcp.read_adc(0) behaves: is it blocking or non-blocking, and if it is non-blocking, does it return stale data when there is no new data? From a timing standpoint it would be ideal if it were blocking: you could just call it in a loop and take the time delta between successive returns to determine how fast you can get new samples. If it's non-blocking, you would want it to return null when there are no new samples and only return actual new samples otherwise. You'll have to play around with it and see how it behaves.
At any rate, once you have a secondary thread set up that just polls mcp.read_adc(0), you can use the update() timer to collect the latest buffer and plot it. I also don't know the implications of multithreading / multiprocessing on the Raspberry Pi (see general discussion here: Multiprocessing vs Threading Python), but anything should be better than the QTimer polling.
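The "loop on it and time the returns" test could look like the sketch below. It assumes the read call blocks until a sample is available; the helper name measure_sample_rate is made up, and read_sample stands for whatever callable wraps mcp.read_adc(0).

```python
import time


def measure_sample_rate(read_sample, n_samples=1000):
    """Call read_sample() n_samples times; return (samples_per_second, samples)."""
    samples = []
    start = time.perf_counter()
    for _ in range(n_samples):
        samples.append(read_sample())
    # Guard against a zero interval if the reads return almost instantly.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_samples / elapsed, samples
```

On the Pi this would be called as rate, data = measure_sample_rate(lambda: mcp.read_adc(0)). If the reported rate is far above what the hardware can deliver, the read is probably returning without waiting for a fresh conversion.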

live plotting in matplotlib while performing a measurement that takes time

I would like to perform a measurement and plot a graph while the measurement is running. This measurement takes quite some time in Python (it has to retrieve data over a slow connection). The problem is that the graph freezes when measuring. The measurement consists of setting a center wavelength, and then measuring some signal.
My program looks something like this:
# this is just some arbitrary library that has the functions set_wavelength and
# perform_measurement
from measurement_module import set_wavelength, perform_measurement
from pylab import *

xdata = np.linspace(600,1000,30) # this will be the x axis
ydata = np.zeros(len(xdata)) # this will be the y data. It will
for i in range(len(xdata)):
    # this call takes approx 1 s
    set_wavelength(xdata[i])
    # this takes approx 10 s
    ydata[i] = perform_measurement(xdata)
    # now I would like to plot the measured data
    plot(xdata,ydata)
    draw()
This will work when it is run in IPython with the -pylab option switched on, but while the measurement is running the figure will freeze. How can I modify the behaviour to have an interactive plot while measuring?
You cannot simply use pylab.ion(), because python is busy while performing the measurements.
regards,
Dirk
You can, though it may be a bit awkward, run the data-gathering as a separate process. I find Popen in the subprocess module quite handy. Then let that data-gathering script save what it does to disk somewhere and use Popen.poll() to check if it has completed.
It ought to work.
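A rough sketch of that Popen approach: start the measurement script as a child process and keep doing other work (for example redrawing the plot from the file on disk) until poll() reports that it has exited. The helper name run_and_poll and the on_tick callback are placeholders for illustration.

```python
import subprocess
import sys
import time


def run_and_poll(cmd, on_tick, interval=0.1):
    """Start cmd as a child process and call on_tick() until it exits."""
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:  # None means the child is still running
        on_tick()
        time.sleep(interval)
    return proc.returncode
```

Here cmd would be something like [sys.executable, "measure.py"], where measure.py is a hypothetical script that performs the measurement and writes its results to disk.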
I recommend buffering the data in large chunks and render/re-render when the buffer fills up. If you want it to be nonblocking look at greenlets.
from gevent.greenlet import Greenlet
import copy

def render(buffer):
    '''
    do rendering stuff
    '''
    pass

buff = ''
# not_finished and connection are placeholders for your own loop condition and data source
while not_finished:
    buff = connection.read()
    g = Greenlet(render, copy.deepcopy(buff))
    g.start()
Slow input and output is the perfect time to use threads and queues in Python. Threads have their limitations, but this is a case where they work easily and effectively.
Outline of how to do this:
Generally the GUI (e.g., the matplotlib window) needs to be in the main thread, so do the data collection in a second thread. In the data thread, check for new data coming in (and if you do this in some type of infinite polling loop, put in a short time.sleep to release the thread occasionally). Then, whenever needed, let the main thread know that there's some new data to be processed/displayed. Exactly how to do this depends on the details of your program and your GUI. You could just use a flag in the data thread that you check from the main thread, or a threading.Event, or, if you have a wx backend for matplotlib, wx.CallAfter is easy. I recommend looking through one of the many Python threading tutorials to get a sense of it. Threading with a GUI usually has a few issues of its own, so do a quick search on threading with your particular backend. This sounds cumbersome when explained so briefly, but it's really pretty easy and powerful, and will be smoother than, e.g., reading and writing to the same file from different processes.
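The outline above can be sketched with the standard library alone: a worker thread performs the slow measurements and puts each result on a queue, while the main thread drains the queue and calls a redraw callback. The names measure_in_background and collect, as well as the on_update callback, are placeholders for this illustration.

```python
import queue
import threading


def measure_in_background(xs, measure, results):
    """Worker thread: run the slow measurement and push (x, y) onto the queue."""
    for x in xs:
        results.put((x, measure(x)))
    results.put(None)  # sentinel: no more data


def collect(xs, measure, on_update, poll=0.1):
    """Main thread: start the worker and pass new points to on_update."""
    results = queue.Queue()
    worker = threading.Thread(target=measure_in_background,
                              args=(xs, measure, results), daemon=True)
    worker.start()
    points = []
    while True:
        try:
            item = results.get(timeout=poll)
        except queue.Empty:
            on_update(points)  # nothing new yet; still give the GUI a chance to run
            continue
        if item is None:
            break
        points.append(item)
        on_update(points)
    worker.join()
    return points
```

With matplotlib, on_update would typically call set_data on an existing line and then plt.pause with a short interval so that the window processes events; that wiring is left out here because it depends on the backend.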
Take a look at Traits and Chaco, Enthought's type system and plotting library. They provide a nice abstraction to solve the problem you're running into. A Chaco plot will update itself whenever any of its dependencies change.
