I have a Discord bot that executes
with open("input.bc", "w") as f:
    f.write(INPUT)
where INPUT is a string limited to 2000 characters. This file later gets deleted. Is this safe if INPUT is whatever the user wishes it to be?
The bot then runs the bc file (google the bc programming language if you are curious).
By the way, this file later gets executed with
execlp("bc", "bc", "-q", "bc_funcs/lib.bc", "bc_funcs/init.bc",
FILE_NAME, "bc_funcs/exit.bc", NULL);
Forgot to mention: if the bc file doesn't finish executing within 5 seconds, the process is stopped.
The question wording is a little misleading, you would not be trusting user input. User input cannot be trusted.
You will be trusting bc though. I don't know bc to this detail, but it appears to not allow malicious operations. The reason you would still probably not want to run arbitrary user input is exactly because of that trust in the bc implementation that you have. Probably it was not supposed to allow arbitrary operations beyond maths, and there is no known vulnerability that I can find, but these implementations might have vulnerabilities that people have just not discovered yet. In case of bc I think the risk of a potential vulnerability is increased by the fact that not a lot of research might have gone into finding such vulnerabilities.
So in short, while there might not be a known vulnerability right now, my take is it would probably be possible to exploit bc in a way that compromises your server.
Another potential vulnerability (depending on your attacker model) is the file operation involved. You are writing user input to a file (which is fine), and then separate from that you read that file and run it in bc. An attacker might be able to add path elements to run something else as bc on the server, or replace the written .bc file with something else before being run, potentially creating incorrect results (and that's only the best case). File operations are tricky to get right if you assume some level of access for an attacker.
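To make the file-handling concern concrete, here is a minimal Python sketch of one way to reduce it. It is not the asker's actual bot code: the helper names, the absolute path /usr/bin/bc, and the use of subprocess instead of execlp are all assumptions made for illustration. The idea is to write each input to a fresh, unpredictable, owner-only file and to call bc by an absolute path so that PATH cannot redirect the call.

```python
import os
import subprocess
import tempfile

MAX_INPUT_CHARS = 2000  # limit stated in the question


def write_private_input(text, directory=None):
    """Write untrusted text to a new file with an unpredictable name.

    tempfile.mkstemp creates the file exclusively and, on POSIX,
    readable and writable only by the current user.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds %d characters" % MAX_INPUT_CHARS)
    fd, path = tempfile.mkstemp(suffix=".bc", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


def build_bc_command(input_path, bc_binary="/usr/bin/bc"):
    """Build the argument list; bc_binary is an assumed install location."""
    return [bc_binary, "-q", "bc_funcs/lib.bc", "bc_funcs/init.bc",
            input_path, "bc_funcs/exit.bc"]


def run_bc(text, timeout_seconds=5):
    """Run bc on the input; raises subprocess.TimeoutExpired after the limit."""
    path = write_private_input(text)
    try:
        result = subprocess.run(build_bc_command(path), capture_output=True,
                                text=True, timeout=timeout_seconds)
        return result.stdout
    finally:
        os.remove(path)  # always clean up, even on timeout
```

This only narrows the file-replacement and PATH issues. It does nothing about possible bugs in bc itself, which is the main point of the answer above.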
If I understand you correctly, you let any user provide input and then execute it. That is not really safe, especially if a lot of users will use it.
First of all, sorry for my bad English.
I'm working on a project and I need to generate a code (ID) that I can verify later.
As my project is very extensive, I will give you an example and then describe what I need to solve.
Example: I have a program that gets the temperature of a place once a day, and the data is stored in a local database (I save the temperature, the date, and the unique ID).
The code is encrypted (No one can see the source code of the program).
Now my problem.
I need to be sure that the data stored in my database has not been modified.
What I think can solve this is: for example, the date is 08-19-2017 and the temperature is 25°C. I can do some math operations (for example, multiply everything together) and get an ID, and later on I can verify whether the ID matches the date and temperature.
Do you think this is a good solution or is there a better one?
Thanks all.
I'm using Python and linux.
The code is encrypted (No one can see the source code of the program).
That's a fallacy. Unless you're using a secure processor that can actually decrypt things into memory that can't be read by the operating system, your program is never truly encrypted. Sure, the original python might be hidden, but from the assembly, a somewhat skilled person can easily gather what is happening.
So, since this is kind of a data security question: Security by obscurity doesn't work on general-purpose hardware. Especially not with relatively high-level things like Python.
Now my problem. I need to be sure that the data stored in my database has not been modified.
That is a hard problem, indeed. The problem is that: if someone's able to fully reconstruct the state of your program, they can also reconstruct what your encryption would have done if the data was different.
There's a few ways around that. But in the end, they all break down to a single principle:
You need some hardware device that can encrypt your data as it comes and proves it hasn't been tampered with, e.g. by keeping a counter of how many things have been encrypted. So, if you have e.g 100 things in the database that have been encrypted by your secure, uncloneable crypto hardware, and it shows it has only been used 100 times, you're fine. The same would apply if that hardware would, for example, always do "encrypt(input bytes + timestamp)".
You can't do that in software on a general purpose OS — software can always be made to run with modified data, and if it's just that you patch the physical memory accessed just in time.
So, what you'll need is specific hardware. It feels like a crypto smart card would be able to do something like that, but I don't know whether that includes the functionality to keep a counter or include the timestamp.
One solution that might work is basically using a stream cipher to ensure the integrity of the whole data "stream". Here, part of the secret is the state that the encryption algorithm is in. Imagine this: you have a smart card holding the secret key of a keypair that was generated on the card itself. You keep the other key in your cellar.
1. Before shipping the device, you encrypt something secret. That puts the smart card in a state that the malicious tamperer can't guess.
2. You encrypt the first value and save the output. That changes the internal state!
3. You encrypt a known word or sequence and save the output.
4. Repeat 2. + 3. for all the other values to be stored.
5. At the end, you decrypt the data in the database using the key you kept in your cellar. Since the internal state necessarily changed with the input data (i.e. encrypting the same data twice doesn't give the same result!!), the data isn't correctly decryptable if something is missing from the records. You can check immediately using the output generated by the known word.
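The "state changes with every record" idea can be illustrated in plain Python with a chained HMAC. This is a swapped-in technique, not the smart-card stream cipher described above, and the function names are hypothetical. It also assumes the key lives somewhere the tamperer cannot read (for example inside a hardware token). If the key sits on the same general-purpose machine, the argument above still applies and this gives no real protection.

```python
import hashlib
import hmac

INITIAL_STATE = b"\x00" * 32  # assumed starting state; could be a secret value


def chain_tag(key, previous_tag, record):
    """Tag one record; the result depends on every earlier record too."""
    message = previous_tag + record.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()


def tag_records(key, records, initial_state=INITIAL_STATE):
    """Return one tag per record, each chained to the previous tag."""
    tags = []
    state = initial_state
    for record in records:
        state = chain_tag(key, state, record)
        tags.append(state)
    return tags


def verify_records(key, records, tags, initial_state=INITIAL_STATE):
    """Recompute the chain and compare it with the stored tags."""
    expected = tag_records(key, records, initial_state)
    if len(expected) != len(tags):
        return False
    return all(hmac.compare_digest(a, b) for a, b in zip(expected, tags))
```

Because each tag feeds into the next, changing, removing, or reordering a stored record makes every later tag fail verification, which mirrors the role of the changing internal state in the steps above.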
takeaway
What you're trying to do is hard, namely:
running software on hardware that you have no control over and having to ensure the authenticity of the data it produced.
Now, the impossible part is actually making sure that data hasn't been tampered with before it enters your software – who says that, for example, the driver for your temperature sensor hasn't been replaced by something that always says "-18 °C"? To avoid the capability of people to tamper with your software, you'll need hardware that enforces the non-tampering. And that's not something you can do on PC-style hardware, unless you disable all debugging possibilities and ensure you have safe booting capability.
I'm storing some user raw_input as a variable in Python 2.7. The issue is that this is sensitive, as it is the encryption passphrase for a cryptocurrency wallet.
Therefore I want to ensure that once the Python script is completed, there is no trace of the passphrase left anywhere on the system.
With passphrase being the variable, is this at the end of the program:
del passphrase
good enough to utterly remove all traces?
No. del xxx or implicit deletion (leaving the current scope) may not be enough to erase the previously stored value. Note that this may crucially depend on your OS and your Python implementation.
However, I would advise against rolling your own security system unless you really, really know what you're doing. Instead, search for an existing solution for whatever it is you want to do and use that. For example, I'm not sure whether raw_input or input is suitable for cryptographic needs.
You may get additional help on Information Security StackExchange.
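A small sketch of why del alone is not enough: del only removes a name, and the underlying object stays alive as long as anything else refers to it. The wipe helper below is hypothetical and best effort only. It overwrites a mutable bytearray in place, but it cannot reach copies made elsewhere (by raw_input, the terminal, the interpreter, or swap), so treat it as an illustration rather than a guarantee.

```python
def wipe(buffer):
    """Overwrite a mutable buffer in place with zeros (best effort)."""
    for index in range(len(buffer)):
        buffer[index] = 0


# del removes the name "passphrase", not the string object itself.
passphrase = "correct horse"
alias = passphrase          # a second reference to the same string
del passphrase
still_there = alias         # the value is still reachable through alias

# A bytearray is mutable, so its contents can at least be overwritten.
secret = bytearray(b"correct horse")
wipe(secret)
```

Python strings are immutable, so there is no equivalent way to overwrite the value that raw_input returned.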
I understand that this question has, in essence, already been asked, but that question did not have an unequivocal answer, so please bear with me.
Background: In my company, we use Perforce submission numbers as part of our versioning. Regardless of whether this is a correct method or not, that is how things are. Currently, many developers do separate submissions for code and documentation: first the code and then the documentation to update the client-facing docs with what the new version numbers should be. I would like to streamline this process.
My thoughts are as follows: create a Perforce trigger (which runs on the server side) which scans the submitted documentation files (such as .txt) for a unique term (such as #####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####) and then replaces it with the value of what the change list would be when submitted. I already know how to determine this value. What I cannot figure out, is how or where to update the files.
I have already determined that using the change-content trigger (whether possible or not), which
"fire[s] after changelist creation and file transfer, but prior to committing the submit to the database",
is the way to go. At this point the files need to exist somewhere on the server. How do I determine the (temporary?) location of these files from within, say, a Python script so that I can update them or use sed to replace the placeholder value with the intended value? The online documentation for Perforce that I have found so far has not been very explicit on whether this is possible or how the mechanics of a submission at this stage would work.
EDIT
Basically what I am looking for is RCS-like functionality, but without the unsightly special character sequences which accompany it. After more digging, what I am asking is the same as this question. However I believe that this must be possible, because the trigger is running on the server side and the files had already been transferred to the server. They must therefore be accessible by the script.
EXAMPLE
Consider the following snippet from a release notes document:
[#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
This is what the user submits. I then want the trigger to intercept this file during the submission process (as mentioned, at the change-content stage) and alter it so that what is eventually stored within Perforce looks like this:
[52738] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
Where 52738 is the final change list number of what the user submitted. (As mentioned, I can already determine this number, so please do not dwell on this point.) I.e., what the user sees on the Perforce client console is:
Changelist 52733 renamed 52738.
Submitted change 52738.
Are you trying to replace the content of pending changelist files that were edited on a different client workspace (and different user)?
What type of information are you trying to replace in the documentation files? For example,
is it a date, username like with RCS keyword expansion? http://www.perforce.com/perforce/doc.current/manuals/p4guide/appendix.filetypes.html#DB5-18921
I want to get better clarification on what you are trying to accomplish in case there is another way to do what you want.
Depending on what you are trying to do, you may want to consider shelving ( http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.files.html#d0e5537 )
Also, there is an existing Perforce enhancement request I can add your information to,
regarding client side triggers to modify files on the client side prior to submit. If it becomes implemented, you will be notified by email.
99w,
I have also added you to an existing enhancement request for Customizable RCS keywords, along
with the example you provided.
Short of using a post-command trigger to edit the archive content directly and then update the checksum in the database, there is currently not a way to update the file content with the custom-edited final changelist number.
One of the things I learned very early on in programming was to keep out of interrupt level as much as possible, and especially don't do stuff in interrupt that requires resources that can hang the system. I totally get that you want to resolve the internal labeling in sequence, but a better way to do it may be to just set up the edit during the trigger so that a post trigger tool can perform the file modification.
Correct me if I'm looking at this wrong, but there seems a bit of irony, or perhaps recursion, if you are trying to make a file change during the course of submitting a file change. It might be better to have a second change list that is reserved for the log. You always know where that file is, in your local file space. That said, ktext files and $ fields may be able to help.
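For the post-trigger tool idea, the text substitution itself is the easy part. Here is a minimal Python sketch of just that step. The function name is hypothetical and no Perforce commands are involved: how the file is fetched and resubmitted is deliberately left open, since the answers above say there is currently no supported way to rewrite the content in flight.

```python
PLACEHOLDER = "#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####"


def stamp_changelist(text, changelist_number):
    """Replace every placeholder occurrence with the final changelist number."""
    if not isinstance(changelist_number, int) or changelist_number <= 0:
        raise ValueError("changelist number must be a positive integer")
    return text.replace(PLACEHOLDER, str(changelist_number))


release_notes = (
    "[" + PLACEHOLDER + "] Added a cool new feature. Early retirement is in sight.\n"
    "[52702] Fixed a really annoying bug. Many lives saved.\n"
    "[52686] Fixed an annoying bug.\n"
)
stamped = stamp_changelist(release_notes, 52738)
```

A tool run after the submit could apply this to the documentation files and submit the result as a follow-up change, in line with the second-changelist suggestion above.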
Say you have some metadata for a custom file format that your Python app reads. Something like a CSV with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that will allow arbitrary code execution? The only thing I can imagine is if you made the poor choice to make var1 be a shell command that you execute with os.system(data1) in your own code somewhere. Also, if this were C then you would have to worry about buffers being blown, but I don't think you have to worry about that with Python. If you're reading in that data as a string, is it possible to somehow escape the string with something like "\n os.system('rm -r /')? This SQL-like example totally won't work, but is something similar possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code.
Do you know the expected names of the vars, or at least their format? Make sure to validate that each one is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize and know that the path points to an acceptable location before you read or write.
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker controlled content (here's why)
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well understood format (and libraries) helps you get a leg up on proper validation.
OWASP would be my normal go-to for providing a "further reading" link, but their Input Validation page needs help. In lieu, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more python specific one is "Dealing with User Input in Python"
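As a rough illustration of the validation advice above, here is a minimal Python sketch that parses the two-column format from the question and rejects anything unexpected. The name pattern, the length limit, and the allowed-name set are assumptions chosen for the example, not requirements from the question.

```python
import csv
import io
import re

# Assumed rules: names are short identifiers, values are short strings.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")
MAX_VALUE_LENGTH = 256


def parse_metadata(text, allowed_names):
    """Parse 'name,value' lines, failing early on anything unexpected."""
    result = {}
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError("line %d: expected 2 fields" % line_number)
        name, value = row
        if NAME_PATTERN.fullmatch(name) is None:
            raise ValueError("line %d: malformed name" % line_number)
        if name not in allowed_names:
            raise ValueError("line %d: unknown name %r" % (line_number, name))
        if name in result:
            raise ValueError("line %d: duplicate name %r" % (line_number, name))
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError("line %d: value too long" % line_number)
        result[name] = value
    return result
```

The values come back as plain strings and are never executed. A payload such as os.system('rm -r /') is just text unless your own code later passes it to one of the functions on the list above.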
Depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.
long-time lurker here, finally emerging from the woodwork.
Essentially, what I'm trying to do is have my logger write data like this to the logfile:
Connecting to database . . . Done.
I'd like the 'Connecting to database . . . ' to be written when the function is called, and the 'Done' written after the function has successfully executed.
I'm using Python 2.6 and the logging module. Also, I'd really like to avoid using decorators for this. Any help would be most appreciated!
Writing to a log is, and must be, an atomic action -- this is crucial, and a key feature of any logging package (including the one in Python's standard library) that distinguishes logging from the simple appending of information to files (where bits of things being written by different processes and threads might well "interleave" -- one of them writing some part of a line but not the line-end, just as you desire, and then maybe another one interposing something right afterwards, before the first task writes what it thinks will be the last part of the line but actually ends up on another line... utter confusion often results;-).
It's not inevitable that the atomic unit be "a line" (logs can be recorded elsewhere than to a text file, of course, and some of the things that are acceptable "sinks" for logs won't even have the concept of "a line"!), but, for logging, atomic units there must be. If you want something entirely non-atomic then you don't really want logging but simple appends to a file or other stream (and, watch out for the likely confusion mentioned in the first paragraph;-).
For transient updates about what your task is doing (in the middle of X, X done, starting Y, etc), you could think of a specialized log-handler that (for example) interprets such streams of updates by taking the first word as a subtask-identifier (incrementally building up and displaying somewhere the composite message about the "current subtask", recognizing when the subtask identifier changes that the previous subtask is finished or taking an explicit "subtask finished" message, and only writing persistent log entries on subtask-finished events).
It's a pretty specialized requirement so you're not likely to find a pre-made tool for this, but rather you'll have to roll your own. To help you with that, it's crucial to understand exactly what you're trying to accomplish (why would you want non-atomic logging entries, if such a concept even made any sense -- what deployment or system administration task are you trying to ameliorate by using such a hypothetical tool?) so that the specialized subsystem can be tailored to your actual needs. So, can you please expand on this?
I don't believe Python's logger supports that.
However, would it not be better to agree on a log format so that the log file can be easily parsed when you want to analyse the data? Here ; stands for any delimiter you want:
DateTime;LogType;string
That could be parsed easily by a script, and then you could do analysis on the logs.
If you use:
Connecting to database . . . Done.
Then you won't be able to analyse how long the transaction took
I don't think you should go down this route. A logging methodology with entry:
Time;functionName()->
And exit logging is more useful for troubleshooting:
Time;functionName()<-
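The entry/exit style can be sketched in Python without decorating the function itself, using a small context-manager class. The names LoggedStep and make_logger are hypothetical, and the format string is one possible rendering of the DateTime;LogType;string layout suggested above. Each record is still written atomically. The pairing of "->" and "<-" lines, plus the elapsed time, carries the information that "Connecting to database . . . Done." was meant to convey.

```python
import logging
import time


class LoggedStep(object):
    """Log 'name()->' on entry and 'name()<-' with timing on exit."""

    def __init__(self, logger, name):
        self.logger = logger
        self.name = name
        self.started = None

    def __enter__(self):
        self.started = time.time()
        self.logger.info("%s()->", self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        elapsed = time.time() - self.started
        status = "failed" if exc_type else "done"
        self.logger.info("%s()<- %s after %.3fs", self.name, status, elapsed)
        return False  # never swallow exceptions


def make_logger(stream):
    """Build a logger that writes DateTime;LogType;string records to stream."""
    logger = logging.getLogger("step_demo")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s;%(levelname)s;%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Typical use would be log = make_logger(sys.stderr), followed by a "with LoggedStep(log, "connect_to_database"):" block wrapped around the call.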