Regression testing when "test oracle" is an informal output comparison - python

I maintain a Python program that provides advice on certain topics. It does this by applying a complicated algorithm to the input data.
The program code is regularly changed, both to resolve newly found bugs, and to modify the underlying algorithm.
I want to use regression tests. Trouble is, there's no way to tell what the "correct" output is for a certain input - other than by running the program (and even then, only if it has no bugs).
I describe below my current testing process. My question is whether there are tools to help automate this process (and of course, if there is any other feedback on what I'm doing).
The first time the program seemed to run correctly for all my input cases, I saved their outputs in a folder I designated for "validated" outputs. "Validated" means that the output is, to the best of my knowledge, correct for a given version of my program.
If I find a bug, I make whatever changes I think would fix it. I then rerun the program on all the input sets, and manually compare the outputs. Whenever the output changes, I do my best to informally review those changes and figure out whether:
1. the changes are exclusively due to the bug fix, or
2. the changes are due, at least in part, to a new bug I introduced.
In case 1, I increment the internal version counter. I mark the output file with a suffix equal to the version counter and move it to the "validated" folder. I then commit the changes to the Mercurial repository.
If, in the future, I decide to branch off this version once it is no longer current, I'll need these validated outputs as the "correct" ones for that particular version.
In case 2, I of course try to find the newly introduced bug, and fix it. This process continues until I believe the only changes versus the previous validated version are due to the intended bug fixes.
When I modify the code to change the algorithm, I follow a similar process.

Here's the approach I'll probably use.
1. Have Mercurial manage the code, the input files, and the regression test outputs.
2. Start from a certain parent revision.
3. Make and document (preferably as few as possible) modifications.
4. Run the regression tests.
5. Review the differences against the parent revision's regression test output (a small script can automate the run-and-compare steps; see the sketch after this list).
6. If these differences do not match the expectations, try to see whether a new bug was introduced or whether the expectations were incorrect. Either fix the new bug and go to 3, or update the expectations and go to 4.
7. Copy the output of the regression tests to the folder designated for validated outputs.
8. Commit the changes to Mercurial (including the code, the input files, and the output files).
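Steps 4 and 5 lend themselves to automation. A minimal sketch of such a harness, assuming an inputs/outputs/validated folder layout and a hypothetical advisor.py command-line interface (both are placeholders for whatever the real program uses):

    import difflib
    import pathlib
    import subprocess
    import unittest

    INPUT_DIR = pathlib.Path("inputs")         # assumed layout; adjust to the real one
    OUTPUT_DIR = pathlib.Path("outputs")
    VALIDATED_DIR = pathlib.Path("validated")

    class RegressionTests(unittest.TestCase):
        def test_outputs_match_validated(self):
            OUTPUT_DIR.mkdir(exist_ok=True)
            for input_file in sorted(INPUT_DIR.glob("*")):
                out_file = OUTPUT_DIR / input_file.name
                ref_file = VALIDATED_DIR / input_file.name
                # Run the program under test on this input (hypothetical CLI).
                subprocess.check_call(
                    ["python", "advisor.py", str(input_file), "-o", str(out_file)]
                )
                # Any difference from the validated output is reported as a unified diff.
                diff = list(difflib.unified_diff(
                    ref_file.read_text().splitlines(True),
                    out_file.read_text().splitlines(True),
                    fromfile=str(ref_file),
                    tofile=str(out_file),
                ))
                self.assertEqual(diff, [], "".join(diff))

    if __name__ == "__main__":
        unittest.main()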

Related

What is the best or proper way to allow debugging of generated code?

For various reasons, in one project I generate executable code by generating an AST from various source files, then compiling that to bytecode (though the question could also apply to cases where the bytecode is generated directly, I guess).
From some experimentation, it looks like the debugger more or less just uses the lineno information embedded in the AST alongside the filename passed to compile in order to provide a representation for the debugger's purposes, however this assumes the code being executed comes from a single on-disk file.
That is not necessarily the case for my project, the executable code can be pieced together from multiple sources, and some or all of these sources may have been fetched over the network, or been retrieved from non-disk storage (e.g. database).
And so my questions (which may be the wrong ones, hence the background):
is it possible to provide a memory buffer of some sort, or is it necessary to generate a singular on-disk representation of the "virtual source"?
how well would the debugger deal with jumping around between the different bits and pieces if the virtual source can't or should not be linearised[0]?
and just in case, is the assumption of Python only supporting a single contiguous source file correct or can it actually be fed multiple sources somehow?
[0] for instance a web-style literate program would be debugged in its original form, jumping between the code sections, not in the so-called "tangled" form
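For concreteness, here is a minimal sketch of the kind of setup in question: generated source compiled under a synthetic filename, with the text registered in linecache so that line-oriented tooling can display it. The linecache registration is only one possible workaround, not something the question presupposes, and all names here are made up.

    import ast
    import linecache

    virtual_source = "def generated(x):\n    return x * 2\n"
    virtual_name = "<generated:module-1>"

    tree = ast.parse(virtual_source, filename=virtual_name)
    code = compile(tree, virtual_name, "exec")

    # Cache entries are (size, mtime, lines, fullname); mtime=None keeps checkcache()
    # from discarding the entry just because no such file exists on disk.
    linecache.cache[virtual_name] = (
        len(virtual_source),
        None,
        virtual_source.splitlines(True),
        virtual_name,
    )

    namespace = {}
    exec(code, namespace)
    print(namespace["generated"](21))   # 42; tracebacks here can show the virtual source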
Some of this can be handled by the trepan3k debugger. For other things various hooks are in place.
First of all, it can debug based on bytecode alone. But of course stepping by line won't be possible if the line number table doesn't exist. For that reason, if for no other, I would add a "line number" for each logical stopping point, such as the beginning of each statement. The numbers don't have to be line numbers; they could just count from 1 or be indexes into some other table. This is more or less how Go's Pos type works.
The debugger will let you set a breakpoint on a function, but that function has to exist, and when you start any Python program, most of the functions you define don't exist yet. So the typical way to do this is to modify the source to call the debugger at some point. In trepan3k the lingo for this is:
from trepan.api import debug; debug()
Do that at a point where the functions you want to break on have already been defined.
The functions can also be specified as methods on existing variables, e.g. self.my_function().
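A minimal sketch of that placement (the function names are made up):

    from trepan.api import debug

    def prepare(data):
        return sorted(data)

    def analyze(data):
        return float(sum(data)) / len(data)

    # Both functions exist by the time debug() runs, so breakpoints can be set
    # on them from the debugger prompt.
    debug()

    print(analyze(prepare([3, 1, 2])))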
One of the advanced features of this debugger is that it will decompile the bytecode to produce source code. There is a command called deparse which will show you the context around where you are currently stopped.
Deparsing bytecode, though, is a bit difficult, so depending on which kind of bytecode you have, the results may vary.
As for the virtual source problem, that situation is somewhat tolerated in the debugger, since that kind of thing has to go on when there is no source. To facilitate this, and remote debugging (where the local and remote file locations can differ), we allow for filename remapping.
Another library, pyficache, is used for this remapping; it has, I believe, the ability to remap contiguous lines of one file onto lines in another file, and I think you could use this over and over again. However, so far there hasn't been a need for this, and that code is pretty old, so someone would have to beef up trepan3k here.
Lastly, related to trepan3k is trepan-xpy, which is a CPython bytecode debugger that can step bytecode instructions even when the line number table is empty.

Difference between approxCountDistinct and approx_count_distinct in Spark functions

Can anyone tell me the difference between pyspark.sql.functions.approxCountDistinct (I know it is deprecated) and pyspark.sql.functions.approx_count_distinct? I have used both versions in a project and have observed different values.
As you mentioned, pyspark.sql.functions.approxCountDistinct is deprecated. The reason is most likely just a style concern: they probably wanted everything to be in snake case. As you can see in the source code, pyspark.sql.functions.approxCountDistinct simply calls pyspark.sql.functions.approx_count_distinct, doing nothing more except giving you a warning. So regardless of which one you use, the very same code runs in the end.
Also, still according to the source code, approx_count_distinct is based on the HyperLogLog++ algorithm. I am not very familiar with the algorithm, but it is based on repeated set merging. Therefore, the result will most likely depend on the order in which the various results of the executors are merged. Since this order is not deterministic in Spark, this could explain why you witness different results.
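For reference, the non-deprecated function is used like this (a sketch; the toy DataFrame and the rsd value are arbitrary, and a smaller rsd trades memory for accuracy):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(x % 100,) for x in range(10000)], ["value"])

    # The deprecated approxCountDistinct just forwards here (with a warning).
    estimate = df.select(
        F.approx_count_distinct("value", rsd=0.01).alias("distinct_estimate")
    )
    estimate.show()   # ~100, within the requested relative standard deviation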

Execution Code Tracking - How to know which code has been executed in a project?

Let's say that I have an open source project from which I would like to borrow some functionality. Can I get some sort of report generated during execution and/or interaction with this project?
The report should contain, e.g.:
which functions have been called,
in which order,
which classes have been instantiated, etc.
It would be nice to have some graphical output for that... you know, an if/else tree with the executed branch highlighted, etc.
I am mostly interested in Python and C (Perl would be fine too), but if there is any universal tool that covers multiple languages (or one tool per language), that would be very nice.
PS: I am familiar with debuggers, but I do not want to step through every single line of code and check whether it is the correct instruction. I'm assuming that if functions/methods/classes etc. are properly named, then one can get some hints about where to find the desired piece of code. But naming alone is not enough, because you do not know (from a brief overview of the code) whether a promising-looking function foo() requires some data that was generated by an obscure function bar(), etc. For that reason I am looking for something that can visualize the relations between running code.
PS: I do not know whether this is a question for SO or programmers.stackexchange. Feel free to move it if you wish. PS: I've noticed that the tags I've used are not recommended, but "execution flow tracking" is the best phrase to describe this process.
Check out Ned Batchelder's coverage and perhaps the Graphviz/dot-based library called pycallgraph. They may not be exactly what you need, and they are Python-only, but they are in the ballpark.
Pycallgraph is actually likelier to be of interest because it shows the execution path, not just which lines of code got executed. It only renders to PDF normally, but it wasn't too difficult to get it to do SVG instead (dot/graphviz supports SVG and other formats; pycallgraph was hardcoding PDF rendering).
Neither will do exactly what you want but they are a start.
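For what it's worth, pycallgraph usage looks roughly like this (a sketch assuming pycallgraph 1.x and Graphviz installed; myproject.main is a stand-in for whatever entry point you want to trace):

    from pycallgraph import PyCallGraph
    from pycallgraph.output import GraphvizOutput

    from myproject import main   # hypothetical entry point of the code under study

    # Ask Graphviz for SVG instead of the default output format.
    graphviz = GraphvizOutput(output_file="callgraph.svg", output_type="svg")

    with PyCallGraph(output=graphviz):
        main()   # every call made in here becomes a node/edge in callgraph.svg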

Updating files with a Perforce trigger before submit

I understand that this question has, in essence, already been asked, but that question did not have an unequivocal answer, so please bear with me.
Background: In my company, we use Perforce submission numbers as part of our versioning. Regardless of whether this is a correct method or not, that is how things are. Currently, many developers do separate submissions for code and documentation: first the code and then the documentation to update the client-facing docs with what the new version numbers should be. I would like to streamline this process.
My thoughts are as follows: create a Perforce trigger (which runs on the server side) which scans the submitted documentation files (such as .txt) for a unique term (such as #####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####) and then replaces it with the value of what the change list would be when submitted. I already know how to determine this value. What I cannot figure out, is how or where to update the files.
I have already determined that using the change-content trigger (whether possible or not), which
"fire[s] after changelist creation and file transfer, but prior to committing the submit to the database",
is the way to go. At this point the files need to exist somewhere on the server. How do I determine the (temporary?) location of these files from within, say, a Python script, so that I can update them or use sed to replace the placeholder value with the intended value? The online documentation for Perforce which I have found so far has not been very explicit on whether this is possible or how the mechanics of a submission at this stage would work.
EDIT
Basically what I am looking for is RCS-like functionality, but without the unsightly special character sequences which accompany it. After more digging, what I am asking is the same as this question. However, I believe that this must be possible, because the trigger is running on the server side and the files have already been transferred to the server. They must therefore be accessible by the script.
EXAMPLE
Consider the following snippet from a release notes document:
[#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
This is what the user submits. I then want the trigger to intercept this file during the submission process (as mentioned, at the change-content stage) and alter it so that what is eventually stored within Perforce looks like this:
[52738] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
Where 52738 is the final changelist number of what the user submitted. (As mentioned, I can already determine this number, so please do not dwell on this point.) I.e., what the user sees on the Perforce client console is:
Changelist 52733 renamed 52738.
Submitted change 52738.
Are you trying to replace the content of pending changelist files that were edited on a different client workspace (and different user)?
What type of information are you trying to replace in the documentation files? For example,
is it a date or username, like with RCS keyword expansion? http://www.perforce.com/perforce/doc.current/manuals/p4guide/appendix.filetypes.html#DB5-18921
I want to get better clarification on what you are trying to accomplish in case there is another way to do what you want.
Depending on what you are trying to do, you may want to consider shelving ( http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.files.html#d0e5537 )
Also, there is an existing Perforce enhancement request I can add your information to, regarding client side triggers to modify files on the client side prior to submit. If it becomes implemented, you will be notified by email.
99w,
I have also added you to an existing enhancement request for Customizable RCS keywords, along with the example you provided.
Short of using a post-command trigger to edit the archive content directly and then update the checksum in the database, there is currently not a way to update the file content with the custom-edited final changelist number.
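One thing that is possible at the change-content stage is reading the pending revisions, using the @= revision specifier with p4 print. A rough, read-only sketch of such a trigger script (the depot path, placeholder token, and trigger wiring are just examples, and the script assumes the trigger environment is already authenticated):

    # Trigger table entry (example): doc-scan change-content //depot/docs/... "python scan.py %changelist%"
    import subprocess
    import sys

    PLACEHOLDER = "#####PERFORCE##CHANGELIST##NUMBER"   # shortened placeholder token

    change = sys.argv[1]

    # List the files in the in-flight change; "@=<change>" addresses content that has
    # been transferred to the server but not yet committed.
    listing = subprocess.check_output(
        ["p4", "files", "//depot/docs/...@=%s" % change], text=True
    )
    depot_files = [line.split("#")[0] for line in listing.splitlines() if line]

    for depot_file in depot_files:
        content = subprocess.check_output(
            ["p4", "print", "-q", "%s@=%s" % (depot_file, change)], text=True
        )
        if PLACEHOLDER in content:
            # The content can be inspected here, but as noted above it cannot simply
            # be rewritten at this stage.
            pass

    sys.exit(0)   # a zero exit code lets the submit proceed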
One of the things I learned very early on in programming was to keep out of interrupt level as much as possible, and especially don't do stuff in interrupt that requires resources that can hang the system. I totally get that you want to resolve the internal labeling in sequence, but a better way to do it may be to just set up the edit during the trigger so that a post trigger tool can perform the file modification.
Correct me if I'm looking at this wrong, but there seems a bit of irony, or perhaps recursion, if you are trying to make a file change during the course of submitting a file change. It might be better to have a second change list that is reserved for the log. You always know where that file is, in your local file space. That said, ktext files and $ fields may be able to help.

How can unit tests make changes to code quicker?

The following point (in bold) is mentioned in this famous Stackoverflow question:
Unit Tests allows you to make big changes to code quickly. You know it works now because you've run the tests, when you make the changes you need to make, you need to get the tests working again. This saves hours.
In my case, I finished writing a program in Python 2.7. Now I have started writing the test using PyUnit. The test will be another class (derived from "unittest.TestCase") which will exist in a different file. (At the beginning, I did not know that the test should be written before or during development.)
As I was writing the test, I started wondering: if I modify my program code and run my test again, the test should still work without changes, because it was not changed (yet the above point suggests that you need to make changes to the test to get it working again!). It is the program code itself that was changed, not the test.
I do not understand how the last sentence in the above-mentioned point makes sense. I hope I can find somebody who can help me in understanding it.
Thanks
Unit tests verify contracts. They won't change if the contracts are unchanged. A programmer can freely modify the implementation, protected from errors by the unit tests.
The sentence you quote is about changing contracts: a failing unit test indicates a change in a contract, and the programmer should ensure this change is reasonable. In well-designed software this is easier than verifying the correctness of the implementation, hence the speed-up of the process.
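For instance, a test that pins a contract rather than an implementation detail might look like this (the advisor module and compute_advice function are hypothetical):

    import unittest

    # Hypothetical contract: compute_advice() returns a non-empty list of strings
    # for valid input. The test exercises the contract, not the implementation.
    from advisor import compute_advice

    class TestComputeAdvice(unittest.TestCase):
        def test_returns_advice_for_valid_input(self):
            advice = compute_advice({"topic": "savings", "amount": 1000})
            self.assertTrue(advice)
            self.assertTrue(all(isinstance(a, str) for a in advice))

    if __name__ == "__main__":
        unittest.main()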
The test should actually execute the package code, so that breaking the package will show up in tests.
I think the highlighted sentence should have a little more detail, such as whether the original 'contract' or 'requirement' of the module being tested has changed or not.
My quick read is that the original contract has not changed, but you still have to run the tests and make sure everything works. Or, if your modification improved your code's performance, you may have to readjust the test to reflect the improvement. But again, the requirement remained the same, and your code simply performs better.
