json data for/in python unittests - python

I'm writing an ORM-like library for Mongo. I've written some models, and want to make sure that they and the machinery that supports them are correct, so I'd like to write unit tests for them. I figured the best way to do that would be to simply write out some test data as JSON, then pass it to my models and see if the valid data is considered valid and the invalid data invalid.
My question is where to put that data: it seems like a lot of non-TestCase stuff to put in test_models.py, but adding a separate test_data_for_models.json contributes its own headaches (versioning, etc.)
Which is more recommended/idiomatic? Bear in mind I don't really need to generate any data (I'll likely just adapt data that's already in the db); I'm just unsure where to put it.
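One common middle ground is to keep small fixtures as module-level constants in the test file itself, and only move to a separate .json file once the data grows large. A minimal sketch, with invented model and field names (the validate method stands in for whatever validation the real models do):

```python
import unittest

# Hypothetical fixture data kept as module-level constants in test_models.py;
# the field names and validation rules are invented for illustration.
VALID_USER = {"name": "Ada", "email": "ada@example.com"}
INVALID_USER = {"name": "", "email": "not-an-email"}

class UserModelTest(unittest.TestCase):
    def validate(self, doc):
        # Stand-in for the real model's validation logic.
        return bool(doc["name"]) and "@" in doc["email"]

    def test_valid_data_is_accepted(self):
        self.assertTrue(self.validate(VALID_USER))

    def test_invalid_data_is_rejected(self):
        self.assertFalse(self.validate(INVALID_USER))
```

Run with python -m unittest test_models.py. The constants live next to the tests that use them, which avoids the versioning headaches of a separate data file; if the fixtures ever get unwieldy, json.load from a file shipped alongside the tests is the usual next step.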

Related

How do you pythonically reuse a complex data structure without copying or re-reading a file?

I'm writing a Discord bot that, among many other things, must engage in question-answer dialogues with users. There may be multiple of the same kind of dialogue ("script") running at once asynchronously.
On program startup, a YAML file is read which contains scripts (not in the programming sense), which in turn contain lists of questions, each of which contain the question text. Both scripts and questions contain accessory attributes such as regex formulas, rejection responses, and display names.
Currently, ChatBotManager contains a list of Script objects which in turn contain Question objects, all of which can error-check the relevant portion of the YAML they are passed on initialization. Since the YAML is user-editable, this initialization is non-trivial, and should throw critical exceptions when passed bad data.
After all this initialization, I have a very nice hierarchical structure with "has a" rather than "is a" relationships, and each level of data can contain methods relevant to it.
The question is this: which of the below ways of recording user responses is most pythonic, expandable, and readable?
Record user responses in a separate, partially-mirrored data structure and refer to the Script and contained Question objects as read-only
Record user responses in Question objects themselves, then use some way to ensure that data does not contaminate parallel, identical chatbots. Both kinds of objects can have methods relevant to only their function, and other functions need little knowledge of the contents.
Note: When finally processing the responses, I need access to accessory attributes of each question.
I'm not asking this because I am stumped and can't progress: the first option is clearly within my knowledge. I want to know best practice. The second option embraces decoupling and clearly delineated responsibilities, but I don't know how to implement it properly. Perhaps my Googling for a solution lacked the right terminology?
I don't want to re-read the YAML file on every object's creation, nor run the validation code each time. Reading and validation need only happen on bot startup. I'd prefer to keep validation methods in the object that will store the validated data.
One idea I had is to deep-copy a Script object on each chatbot creation, then discard it on completion. I've read this is an expensive operation, and while my program isn't low on resources, it seems bad practice.
Should each Question be able to generate a Response object that contains the response and the data about the question that is needed later? These could then be compiled in a ScriptResponse object generated by the Script which also contains information about the script.
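That last idea can be sketched roughly as follows. All class and field names here mirror the question's description but are otherwise invented; the key point is that Script and Question stay immutable and shared, while per-user state lives only in the Response/ScriptResponse objects:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Question:
    text: str
    display_name: str  # accessory attribute needed when processing answers

    def make_response(self, answer: str) -> "Response":
        # Carry along the accessory attributes that later processing needs,
        # so the caller never has to reach back into the Question.
        return Response(answer=answer, display_name=self.display_name)

@dataclass
class Response:
    answer: str
    display_name: str

@dataclass(frozen=True)
class Script:
    name: str
    questions: tuple

    def collect(self, answers) -> "ScriptResponse":
        responses = [q.make_response(a) for q, a in zip(self.questions, answers)]
        return ScriptResponse(script_name=self.name, responses=responses)

@dataclass
class ScriptResponse:
    script_name: str
    responses: list
```

Because the Script/Question hierarchy is frozen, any number of concurrent chatbots can share one instance without deep-copying, and the validation done at startup is never repeated.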

python-hypothesis: Retrieving or reformatting a falsifying example

Is it possible to retrieve or reformat the falsifying example after a test failure? The point is to show the example data in a different format - data generated by the strategy is easy to work with in the code but not really user friendly, so I'm looking at how to display it in a different form. Even a post-mortem tool working with the example database would be enough, but there does not seem to be any API allowing that, or am I missing something?
You can call note to record additional information during a test, such as your own custom-formatted copy of the generated inputs.
When Hypothesis finds a falsifying example, it will also print out the notes that were recorded during the execution of that particular example.
Even a post-mortem tool working with the example database would be enough, but there does not seem to be any API allowing that, or am I missing something?
The example database uses a private format and only records the choices a strategy made to generate the falsifying example, so there's no way to extract the data of the example short of re-running the test.
Stuart's recommendation of hypothesis.note(...) is a good one.
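A minimal sketch of that approach (the property and the CSV rendering are invented for illustration; note simply records a string that Hypothesis prints alongside the falsifying example if the test fails):

```python
from hypothesis import given, note, strategies as st

@given(st.lists(st.integers()))
def test_sorted_is_idempotent(xs):
    # Record a human-friendly rendering of the generated input; on failure,
    # Hypothesis prints these notes next to the minimal falsifying example.
    note(f"input as CSV: {','.join(map(str, xs))}")
    assert sorted(sorted(xs)) == sorted(xs)
```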

Best way to retrieve database results for further use?

I am working heavily with a database, using python, and I am trying to write code that actually makes my life easier.
Most of the time, I need to run a query and get results to process them; most of the time I get the same fields from the same table, so my idea was to collect the various results in an object, to process it later.
I am using SQLAlchemy for the DB interaction. From what I can read, there is no direct way to just say "dump the result of this query to an object", so I can access the various fields like
print(object.fieldA)
print(object.fieldB)
and so on. I tried dumping the results to JSON, but even that requires parsing and is not as straightforward as I hoped.
So at this point, is there anything else that I can actually try? Or should I write a custom object that mimics the db structure, and parse the result with for loops, to put the data in the right place? I was hoping to find a way to do this automatically, but so far it seems that the only way to get something close to what I am looking for is to use JSON.
EDIT:
Found some info about serialization and SQLAlchemy's ability to read a table and reproduce a sort of 1:1 copy of it in an object, but I am not sure that this will actually work with a query.
Found that the best way is to actually use a custom object.
You can use reflection through SQLAlchemy to infer the structure, but if you are dealing with a small database with few tables, you can simply create the object that will host the data yourself. This gives you control over the object and what you can put in it.
There are obviously other ways, but since nobody posted anything, I assume they are either too easy to be worth mentioning, or too hard and specific to each case.
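One way to sketch that "custom object" without hand-writing a class per table is to build one automatically from the query's column names. The table and field names below are invented, and sqlite3 stands in for the real database; note that with the SQLAlchemy ORM proper, mapped query results already come back as objects with attribute access, so this is mainly useful for raw SQL:

```python
import sqlite3
from collections import namedtuple

# Hypothetical table and fields; sqlite3 stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (fieldA TEXT, fieldB INTEGER)")
conn.execute("INSERT INTO users VALUES ('hello', 42)")

cur = conn.execute("SELECT fieldA, fieldB FROM users")
# Build a row class from the cursor's column names, so every result row
# supports attribute access (row.fieldA, row.fieldB) automatically.
Row = namedtuple("Row", [col[0] for col in cur.description])
rows = [Row(*r) for r in cur.fetchall()]

for obj in rows:
    print(obj.fieldA, obj.fieldB)
```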

Is it possible to inject shell/python commands from a configuration file?

Say you have some metadata for a custom file format that your Python app reads. Something like a csv with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that will allow some arbitrary code execution? The only thing I can imagine is if you made the poor choice to make var1 a shell command that you execute with os.system(data1) somewhere in your own code. Also, if this were C then you would have to worry about buffers being blown, but I don't think you have to worry about that with Python. If you're reading that data in as a string, is it possible to somehow escape it with something like "\n os.sys('rm -r /')"? This SQL-injection-like example totally won't work, but is something similar possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in Python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code.
Do you know the expected names of the vars, or at least their format? Make sure to validate that it is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize and know that the path points to an acceptable location before you read or write.
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker-controlled content
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well understood format (and libraries) helps you get a leg up on proper validation.
OWASP would be my normal go-to for providing a "further reading" link, but their Input Validation page needs help. In lieu, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more python specific one is "Dealing with User Input in Python"
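The validation advice above can be sketched for the csv-style metadata in the question. The key pattern and the expected value type are assumptions; the point is to reject anything unexpected before it is used anywhere:

```python
import csv
import io
import re

# Hypothetical metadata: keys must be simple identifiers, values must be
# plain integers; anything else is rejected before the data is used.
raw = "var1,10\nvar2,20\nvar3,oops; rm -rf /"

KEY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

meta = {}
for key, value in csv.reader(io.StringIO(raw)):
    if not KEY_RE.match(key):
        raise ValueError(f"bad key: {key!r}")
    try:
        meta[key] = int(value)  # enforce the expected type
    except ValueError:
        print(f"rejected {key!r}: value {value!r} is not a number")

print(meta)
```

Because every value is type-checked and every key pattern-checked, a line like the third one never reaches the rest of the program as anything other than a rejected entry.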
Depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.

Converting Python App into Django

I've got a Python program with about a dozen classes, with several classes possessing instances of other classes, e.g. ObjectA has a list of ObjectB's, and a dictionary of (ObjectC, ObjectD) pairs.
My goal is to put the program's functionality on a website.
I've written and tested JSON encode and decode methods for each class. The problem as I see it now is that I need to choose between starting over and writing the models and logic afresh from a database perspective, or simply storing the python objects (encoded as JSON) in the database, and pulling out the saved states for changes.
Can someone confirm that these are both valid approaches, and that I'm not missing any other simple options?
Man, what I think you can do is convert the classes you already have into Django model classes. Of course, only the ones that need to be saved to the database. The other classes, and the rest of the code, I recommend you encapsulate as helper functions. That way you don't have to change your code too much and it's going to work fine. ;D
Another choice, which can be easier to implement, is to put everything in a helper: the classes, the functions and everything else.
Then you'll just need to call the functions in your views and define models to save your data to the database.
Your idea of saving the objects as JSON in the database works, but it's ugly. ;)
Anyway, if you are in a hurry to deliver the website, anything is valid. Just remember that things made this way always give us lots of problems in the future.
I hope this is useful! :D
