I'm getting an OS_Access_Violation finding in several places in source code across different Python projects. It shows up in areas like this:
if __name__ == '__main__':
    main(sys.argv[1:])
often coupled with something like:
os.makedirs(args.output_dir, exist_ok=True)
as well as
elif args.backend == "beefygoodness":
os.environ["MMMMM_TACOS"] = "beefygoodness"
and
'args = parser.parse_args()'
There's no description associated with this finding, so I'm unsure what it means and what the proper remediation is. I'm also not sure if it's referring to an access violation in the developer sense (aka, program crash) or if it's a reference to data that shouldn't be accessible, or what exactly.
Google is no help on this either, unfortunately.
So does anyone know what this cryptic high-priority finding is referring to, and what the proper fix is? Thanks!
I haven't tested the query yet, but the result makes sense.
You are using input that came from the user, so you have no certainty about its integrity, and it can be hostile.
For example:
args[1:]: You might expect 4 arguments, but the user can give you more, and unexpectedly affect the system.
Now, if I understand your question correctly, you said the vulnerable flow starts at the main() call and ends in one of the os calls.
At this point you should understand that unvalidated input from the user is being used as an argument to os methods.
What if the user sets /root/pwd as the directory input?
Or what if the user sets malicious text in the environment variable?
I think a better solution is to save the arguments/environment variables as files [and fetch credentials from a secret store/vault] and consume those at runtime.
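Another common remediation, if the values must come from the command line, is to whitelist and normalize them before they ever reach the os calls. A minimal sketch along those lines (the allowed backend list and the base output directory are assumptions for illustration):

import os
import argparse

ALLOWED_BACKENDS = ["beefygoodness", "other_backend"]  # hypothetical whitelist
BASE_DIR = os.path.abspath("output")                    # hypothetical sandbox root

parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", required=True)
parser.add_argument("--backend", choices=ALLOWED_BACKENDS)  # argparse rejects anything else
args = parser.parse_args()

# Resolve the path and make sure it cannot escape the allowed base directory.
output_dir = os.path.abspath(args.output_dir)
if output_dir != BASE_DIR and not output_dir.startswith(BASE_DIR + os.sep):
    raise SystemExit("output_dir must be inside %s" % BASE_DIR)
os.makedirs(output_dir, exist_ok=True)

if args.backend == "beefygoodness":
    os.environ["MMMMM_TACOS"] = "beefygoodness"  # fixed literal, not raw user input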
Related
Recently I had a problem when a coworker made a change to what a function returns; we have clients that call the function in this way:
example = function()
But then, as I was depending on his changes, he unintentionally changed it to this:
example, other_stuff = function()
I was not aware of this change; I did the merge and everything seemed OK, but then the error happened because I was expecting one value and now it was trying to unpack two.
So my question is: knowing Python is not a statically typed language, is there a way to detect this and prevent this behavior (a tool or something)? Sadly it was not until a runtime error was raised that I noticed it. How should we handle this?
Sounds like a process error. An API shouldn't change its signature without considering its users. Within a project it's easy, just search. Externally, the API version number should be bumped and the change should be in the change notes.
The API should have unit tests which include return value tests. "he unintentionally changed" issues should all be caught there. Since this didn't happen, a bug report against the tests should be written.
Of course, the coworker could just change those tests. But all of that should be in a code reviewed change set in your source repository. The coworker should have to justify the change and how to mitigate breakage. Since this API appears to have external clients, it should be very difficult to get an API signature change as all clients will need to be notified.
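One concrete way to catch this before runtime (the function name and types here are assumptions for illustration) is to annotate the return type and run a static checker such as mypy over the code base; the stale call site then shows up as a type error instead of a runtime unpacking failure:

from typing import Tuple

def function() -> Tuple[int, str]:  # the coworker's new return contract, made explicit
    return 42, "other stuff"

example: int = function()  # mypy flags this: the tuple is not compatible with int

def test_function_contract():
    # A unit test pinning the return shape would also have caught the change.
    result = function()
    assert isinstance(result, tuple) and len(result) == 2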
I'm storing some user raw_input as a variable in Python 2.7; the issue is that this is sensitive, as it is the encryption passphrase for a cryptocurrency wallet.
Therefore I want to ensure that once the Python script has completed, there is no trace of the passphrase left anywhere on the system.
Where passphrase is the variable, is doing this at the end of the program:
del passphrase
enough to utterly remove all traces?
No. del xxx or implicit deletion (leaving the current scope) may not be enough to hide the previously stored value. Note that this may crucially depend on your OS and your Python implementation.
However, I would advise not rolling your own security systems unless you really, really know what you're doing, but rather looking for an already existing solution for whatever it is you want to do and using that. For example, I'm not sure whether either raw_input or input is suitable for cryptographic needs.
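For the narrower problem of reading the passphrase in the first place, the standard library getpass module at least avoids echoing it to the terminal; it does not solve the problem of copies lingering in memory. A minimal sketch (unlock_wallet is a hypothetical stand-in for whatever consumes the passphrase):

import getpass

def unlock_wallet(secret):
    # Hypothetical placeholder for whatever actually uses the passphrase.
    return bool(secret)

passphrase = getpass.getpass("Wallet passphrase: ")  # prompt without echoing
try:
    unlock_wallet(passphrase)
finally:
    del passphrase  # drops the name binding, but copies may still exist in memory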
You may get additional help in Information Security StackExchange.
I understand that this question has, in essence, already been asked, but that question did not have an unequivocal answer, so please bear with me.
Background: In my company, we use Perforce submission numbers as part of our versioning. Regardless of whether this is a correct method or not, that is how things are. Currently, many developers do separate submissions for code and documentation: first the code and then the documentation to update the client-facing docs with what the new version numbers should be. I would like to streamline this process.
My thoughts are as follows: create a Perforce trigger (which runs on the server side) which scans the submitted documentation files (such as .txt) for a unique term (such as #####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####) and then replaces it with the value of what the change list would be when submitted. I already know how to determine this value. What I cannot figure out, is how or where to update the files.
I have already determined that using the change-content trigger (whether possible or not), which
"fire[s] after changelist creation and file transfer, but prior to committing the submit to the database",
is the way to go. At this point the files need to exist somewhere on the server. How do I determine the (temporary?) location of these files from within, say, a Python script, so that I can update them or use sed to replace the placeholder value with the intended value? The online documentation for Perforce which I have found so far has not been very explicit on whether this is possible or how the mechanics of a submission at this stage would work.
EDIT
Basically what I am looking for is RCS-like functionality, but without the unsightly special character sequences which accompany it. After more digging, what I am asking is the same as this question. However I believe that this must be possible, because the trigger is running on the server side and the files had already been transferred to the server. They must therefore be accessible by the script.
EXAMPLE
Consider the following snippet from a release notes document:
[#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
This is what the user submits. I then want the trigger to intercept this file during the submission process (as mentioned, at the change-content stage) and alter it so that what is eventually stored within Perforce looks like this:
[52738] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
Where 52738 is the final change list number of what the user submitted. (As mentioned, I can already determine this number, so please do not dwell on this point.) I.e., what the user sees on the Perforce client console is:
Changelist 52733 renamed 52738.
Submitted change 52738.
Are you trying to replace the content of pending changelist files that were edited on a different client workspace (and different user)?
What type of information are you trying to replace in the documentation files? For example,
is it a date or username, like with RCS keyword expansion? http://www.perforce.com/perforce/doc.current/manuals/p4guide/appendix.filetypes.html#DB5-18921
I want to get better clarification on what you are trying to accomplish in case there is another way to do what you want.
Depending on what you are trying to do, you may want to consider shelving ( http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.files.html#d0e5537 )
Also, there is an existing Perforce enhancement request I can add your information to,
regarding client-side triggers to modify files on the client side prior to submit. If it is implemented, you will be notified by email.
99w,
I have also added you to an existing enhancement request for Customizable RCS keywords, along
with the example you provided.
Short of using a post-command trigger to edit the archive content directly and then update the checksum in the database, there is currently not a way to update the file content with the custom-edited final changelist number.
One of the things I learned very early on in programming was to keep out of interrupt level as much as possible, and especially don't do stuff in interrupt that requires resources that can hang the system. I totally get that you want to resolve the internal labeling in sequence, but a better way to do it may be to just set up the edit during the trigger so that a post trigger tool can perform the file modification.
Correct me if I'm looking at this wrong, but there seems to be a bit of irony, or perhaps recursion, if you are trying to make a file change during the course of submitting a file change. It might be better to have a second changelist that is reserved for the log. You always know where that file is, in your local file space. That said, ktext files and $ fields may be able to help.
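For reference, this is roughly what that built-in keyword expansion looks like, assuming the release notes file is stored with a keyword-expanding filetype such as text+k. A source line like
[$Change$] Added a cool new feature. Early retirement is in sight.
is expanded in synced workspace copies to something like
[$Change: 52738 $] Added a cool new feature. Early retirement is in sight.
which carries exactly the RCS-style delimiters the question is trying to avoid.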
Was looking over a developer's code. He did something that I have never seen before in a Python application. His background is in PHP and is just learning python, so I don't know if this is perhaps a holdover from the different system architectures that he is used to working with.
He told me that the purpose of this code is to prevent the user from attacking the application via code insertion. I'm pretty sure this is unnecessary for our use case since we are never evaluating the data as code, but I just wanted to make sure and ask the community.
# Import library
from cgi import escape
# Get information that the client submitted
fname = GET_request.get('fname', [''] )[0]
# Make sure client did not submit malicious code <- IS THIS NECESSARY?
if fname:
    fname = escape(fname)
Is this typically necessary in a Python application?
In what situations is it necessary?
In what situations is it not necessary?
If user input is going into a database, or anywhere else it might be executed, then code injection could be a problem.
This question asks about ways to prevent code injection in php, but the principle is the same - SQL queries containing malicious code get executed, potentially doing things like deleting all your data.
The escape function converts <, > and & characters into html-safe sequences.
From those two links it doesn't look like escape() is enough on its own, but something does need to be done to stop malicious code. Of course this may well be taken care of elsewhere in your code.
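To make the distinction concrete: HTML escaping belongs at the point of output, while database access should use parameterized queries rather than string building. A minimal sketch using the standard library sqlite3 module (the table name and schema are assumptions for illustration):

import sqlite3
from cgi import escape  # html.escape in Python 3

fname = "<script>alert('hi')</script>"  # pretend this came from the request

# Database side: let the driver bind the value; no escaping or string formatting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (fname TEXT)")  # assumed schema
conn.execute("INSERT INTO users (fname) VALUES (?)", (fname,))

# HTML side: escape only where the value is actually rendered as HTML.
print("<p>Hello, %s</p>" % escape(fname))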
One of my favorite features about python is that you can write configuration files in python that are very simple to read and understand. If you put a few boundaries on yourself, you can be pretty confident that non-pythonistas will know exactly what you mean and will be perfectly capable of reconfiguring your program.
My question is, what exactly are those boundaries? My own personal heuristic was
Avoid flow control. No functions, loops, or conditionals. Those wouldn't be in a text config file, and people aren't expecting to have to understand them. In general, the order in which your statements execute probably shouldn't matter.
Stick to literal assignments. Methods and functions called on objects are harder to think through. Anything implicit is going to be a mess. If there's something complicated that has to happen with your parameters, change how they're interpreted.
Language keywords and error handling are right out.
I guess I ask this because I came across a situation with my Django config file where it seems to be useful to break these rules. I happen to like it, but I feel a little guilty. Basically, my project is deployed through svn checkouts to a couple different servers that won't all be configured the same (some will share a database, some won't, for example). So, I throw a hook at the end:
try:
    from settings_overrides import *
    LOCALIZED = True
except ImportError:
    LOCALIZED = False
where settings_overrides is on the python path but outside the working copy. What do you think, either about this example, or about python config boundaries in general?
There is a Django wiki page, which addresses exactly the thing you're asking.
http://code.djangoproject.com/wiki/SplitSettings
Do not reinvent the wheel. Use configparser and INI files. Python files are too easy to break by someone who doesn't know Python.
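A minimal sketch of that approach (the file name, section, and option names are assumptions for illustration):

import configparser  # the module is named ConfigParser in Python 2

config = configparser.ConfigParser()
# settings.ini might contain:
#   [database]
#   host = localhost
#   port = 5432
config.read("settings.ini")

db_host = config.get("database", "host")
db_port = config.getint("database", "port")  # parsed to int; garbage raises an error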
Your heuristics are good. Rules are made so that boundaries are set and only broken when doing so is obviously a vastly better solution than the alternative.
Still, I can't help but wonder whether the site-checking code should be in the parser, with an additional configuration item that selects which option should be taken.
I don't think that in this case the alternative is so bad that breaking the rules makes sense...
-Adam
I think it's a pain vs pleasure argument.
It's not wrong to put code in a Python config file because it's all valid Python, but it does mean you could confuse a user who comes in to reconfigure an app. If you're that worried about it, rope it off with comments explaining roughly what it does and that the user shouldn't edit it, but should rather edit the settings_overrides.py file.
As for your example, that's nigh on essential for developers to test then deploy their apps. Definitely more pleasure than pain. But you should really do this instead:
LOCALIZED = False
try:
    from settings_overrides import *
except ImportError:
    pass
And in your settings_overrides.py file:
LOCALIZED = True
... if for nothing else than to make it clear what that file does. What you're doing there splits the overrides into two places.
As a general practice, see the other answers on the page; it all depends. Specifically for Django, however, I see nothing fundamentally wrong with writing code in the settings.py file... after all, the settings file IS code :-)
The Django docs on settings themselves say:
A settings file is just a Python module with module-level variables.
And they give an example of assigning settings dynamically using normal Python syntax:
MY_SETTING = [str(i) for i in range(30)]
Settings as code is also a security risk. You import your "config", but in reality you are executing whatever code is in that file. Put config in files that you parse; then you can reject nonsensical or malicious values, even if it is more work for you. I blogged about this in December 2008.
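A small sketch of that "parse, then validate" idea (the option names and the allowed range are assumptions for illustration):

import configparser

config = configparser.ConfigParser()
config.read("settings.ini")

# Reject nonsensical or malicious values before the rest of the program sees them.
port = config.getint("server", "port")
if not 1 <= port <= 65535:
    raise ValueError("server.port out of range: %d" % port)

debug = config.getboolean("server", "debug", fallback=False)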