First of all, sorry for my bad English.
I'm working on a project and I need to generate a code (ID) that I can verify later.
As my project is very extensive, I will give you an example first and then explain what I need to solve.
Example: I have a program that gets the temperature of a place once a day, and the data is stored in a local database (I save the temperature, the date, and a unique ID).
The code is encrypted (no one can see the source code of the program).
Now, my problem:
I need to be sure that the data stored in my database has not been modified.
What I think can solve this is: for example, the date is 08-19-2017 and the temperature is 25°C. I can do some math operations (for example, multiply them all) and get an ID, and later on I can verify whether the code matches the date and temperature. (A concrete version of this idea is sketched below.)
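For what it's worth, the standard form of "do some math operations and get an ID" is a keyed hash (HMAC). A minimal sketch using only Python's standard library; SECRET_KEY is a placeholder, and keeping it secret on the deployed machine is exactly the hard part discussed in the answer below:

import hmac
import hashlib

SECRET_KEY = b'replace-with-a-random-secret'  # placeholder; keeping this secret is the hard part

def make_tag(date, temperature):
    # Tag each record with a keyed hash instead of arithmetic on the values.
    message = ('%s|%s' % (date, temperature)).encode('utf-8')
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify_tag(date, temperature, tag):
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(make_tag(date, temperature), tag)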
Do you think this is a good solution, or is there a better one?
Thanks all.
I'm using Python and Linux.
The code is encrypted (no one can see the source code of the program).
That's a fallacy. Unless you're using a secure processor that can decrypt things into memory that can't be read by the operating system, your program is never truly encrypted. Sure, the original Python might be hidden, but from the assembly, a somewhat skilled person can easily gather what is happening.
So, since this is kind of a data security question: Security by obscurity doesn't work on general-purpose hardware. Especially not with relatively high-level things like Python.
Now, my problem: I need to be sure that the data stored in my database has not been modified.
That is a hard problem, indeed. The problem is that if someone is able to fully reconstruct the state of your program, they can also reconstruct what your encryption would have done if the data had been different.
There are a few ways around that, but in the end they all come down to a single principle:
You need a hardware device that can encrypt your data as it comes in and prove it hasn't been tampered with, e.g. by keeping a counter of how many things have been encrypted. So, if you have e.g. 100 things in the database that have been encrypted by your secure, uncloneable crypto hardware, and it shows it has only been used 100 times, you're fine. The same would apply if that hardware would, for example, always do "encrypt(input bytes + timestamp)".
You can't do that in software on a general-purpose OS: software can always be made to run with modified data, even if that just means patching the physical memory it accesses, just in time.
So, you'll need specific hardware. A crypto smart card feels like it should be able to do something like this, but I don't know whether it includes the functionality to keep a counter or include the timestamp.
One solution that might work is basically using a stream cipher to ensure the integrity of the whole data "stream". Here, part of the secret is the state the encryption algorithm is in. Imagine this: you have a smart card holding a secret key from a keypair generated on the card itself. You hold the other key in your cellar.
1. Before shipping the device, you encrypt something secret. That puts the smart card in a state that a malicious tamperer can't guess.
2. You encrypt the first value and save the output. That changes the internal state!
3. You encrypt and save the output of a known word or sequence.
4. Repeat 2 and 3 for all the other values to be stored.
5. At the end, you decrypt the data in the database using the key you kept in your cellar. Since the internal state necessarily changed with the input data (i.e. encrypting the same data twice doesn't give the same result!), the data won't decrypt correctly if something is missing from the records, and you can immediately check via the output generated for the known word.
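The same chaining idea can be imitated in pure software (with the big caveat from above: software alone can't keep the key safe from someone who controls the machine): make each record's tag depend on the previous tag, so removing or reordering records invalidates everything after them. A minimal sketch, assuming an HMAC chain rather than an actual stream cipher:

import hmac
import hashlib

def chain_tags(records, key):
    # Each tag covers the record *and* the previous tag, so the tags form a
    # chain: altering or removing any record invalidates all later tags.
    tag = hmac.new(key, b'chain-start', hashlib.sha256).digest()
    tags = []
    for record in records:
        tag = hmac.new(key, tag + record.encode('utf-8'), hashlib.sha256).digest()
        tags.append(tag.hex())
    return tags

Verification replays the chain with the held key and compares tag by tag; the first mismatch marks where the data was tampered with.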
Takeaway
What you're trying to do is hard, namely: running software on hardware that you have no control over, while having to ensure the authenticity of the data it produces.
Now, the truly impossible part is making sure that data hasn't been tampered with before it even enters your software: who says that, for example, the driver for your temperature sensor hasn't been replaced by something that always reports "-18 °C"? To prevent people from tampering with your software, you'll need hardware that enforces the non-tampering, and that's not something you can do on PC-style hardware unless you disable all debugging possibilities and ensure you have secure booting capability.
Related
I have a Discord bot that executes
with open("input.bc", "w") as f:
f.write(INPUT)
where INPUT is a string limited to 2000 characters. This file later gets deleted. Is this safe if INPUT is whatever the user wishes it to be?
The bot then runs the bc file (Google "bc programming language" if you are curious).
By the way, the file gets executed with
execlp("bc", "bc", "-q", "bc_funcs/lib.bc", "bc_funcs/init.bc",
FILE_NAME, "bc_funcs/exit.bc", NULL);
Forgot to mention: if the bc file doesn't finish executing after 5 seconds, the process is stopped.
The question wording is a little misleading: you would not be trusting user input. User input cannot be trusted.
You will be trusting bc, though. I don't know bc in that much detail, but it appears not to allow malicious operations. The reason you would still probably not want to run arbitrary user input is exactly that trust you place in the bc implementation. It presumably was not meant to allow arbitrary operations beyond maths, and there is no known vulnerability that I can find, but such implementations may have vulnerabilities that people simply have not discovered yet. In the case of bc, I think the risk of a latent vulnerability is increased by the fact that probably not a lot of research has gone into finding one.
So, in short: while there might not be a known vulnerability right now, my take is that it would probably be possible to exploit bc in a way that compromises your server.
Another potential vulnerability (depending on your attacker model) is the file handling involved. You are writing user input to a file (which is fine), and then, separately from that, you read that file and run it in bc. An attacker might be able to add path elements so that something else runs as bc on the server, or replace the written .bc file with something else before it is run, potentially producing incorrect results (and that's only the best case). File operations are tricky to get right if you assume some level of access for an attacker; see the sketch below.
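If you keep this design, writing to a unique temporary file and invoking bc by absolute path with a timeout closes most of those file-handling gaps. A rough sketch in Python (the /usr/bin/bc path and the run_bc name are illustrative assumptions, not the asker's actual code):

import os
import subprocess
import tempfile

def run_bc(user_input, timeout=5):
    # A unique, non-guessable temp file avoids racing or clobbering a fixed name.
    fd, path = tempfile.mkstemp(suffix='.bc')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(user_input)
        # Absolute path to bc avoids PATH tricks; the timeout kills runaways
        # (subprocess.TimeoutExpired is raised if it runs over).
        result = subprocess.run(
            ['/usr/bin/bc', '-q', 'bc_funcs/lib.bc', 'bc_funcs/init.bc',
             path, 'bc_funcs/exit.bc'],
            capture_output=True, text=True, timeout=timeout)
        return result.stdout
    finally:
        os.unlink(path)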
If I understand you correctly, you just let any user provide input and then execute it. That's not really safe if a lot of users will use it.
I need some help. I've been working on a file-searching app as I learn Python, and it's been a very interesting experience so far; I've learned a lot and realized how little that actually is.
So, it's my first app, and it needs to be fast! I am unsatisfied with (among other things) the speed of finding matches for sparse searches.
The app caches file and folder names as dbm keys, and the search is basically running search words past these keys.
The GUI is in Tkinter, and to try to keep it from jamming, I've put my search loop in a thread. The thread receives queries from the GUI via a queue, then passes results back via another queue.
That's how the code looks:
from time import sleep

try:
    from dbm import whichdb       # Python 3
except ImportError:
    from whichdb import whichdb   # Python 2

# DB, ENCODING and Tests are module-level globals defined elsewhere in the app.

def TMakeSearch(fdict, squeue=None, rqueue=None):
    '''Circumventing StopIteration(), did not see speed advantage'''
    RESULTS_PER_BATCH=50

    if whichdb(DB)=='dbhash' or 'dumb' in whichdb(DB):
        # dbhash and dumbdbm implement iteration, so fdict can be used
        # as-is in "for key in fdict:".
        pass
    else:
        # 'dbm.gnu', 'gdbm', 'dbm.ndbm', 'dbm' don't implement iteration,
        # so the keys have to be pulled out in advance.
        fdict=fdict.keys()

    search_list=None
    while True:
        query=None
        while not squeue.empty():
            # more items may get in (or not?) while the condition is checked
            query=squeue.get()
        try:
            search_list=query.lower().encode(ENCODING).split()
            if Tests.is_query_passed:
                print(search_list)
        except AttributeError:
            # No new query (query is still None), or a new database has been
            # created and needs to be synced
            sleep(0.1)
            continue
        else:
            is_new_query=True

        result_batch=[]
        for key in fdict:
            separator='*'.encode(ENCODING)   # Python 3, yaaay
            filename=key.split(separator)[0].lower()
            # Add the key if every token matches
            for token in search_list:
                if not token in filename:
                    break
            else:
                # Loop hasn't ended abruptly
                result_batch.append(key)
            if len(result_batch)>=RESULTS_PER_BATCH:
                # Time to send off a batch
                rqueue.put((result_batch, is_new_query))
                if Tests.is_result_batch:
                    print(result_batch, len(result_batch))
                    print('is_result_batch: results on queue')
                result_batch=[]
                is_new_query=False
                sleep(0.1)
            if not squeue.empty():
                break
        # Loop ended naturally, with some batch<50
        rqueue.put((result_batch, is_new_query))
Once there are only a few matching results, they cease to arrive in real time and instead take a few seconds, and that's on my smallish 120 GB hard disk.
I believe it can be faster, and wish to make the search real-time.
What approaches exist to make the search faster?
My current ideas all involve ramping up the machinery I use: use multiprocessing somehow, use Cython, or perhaps use ctypes to make the searches circumvent the Python runtime.
However, I suspect there are simpler things that can be done to make it work, as I am not savvy with Python and optimization.
Assistance please!
I wish to stay within the standard library if possible, as a proof of concept and for portability (currently I only use scandir as an external library, on Python <3.5), so for example ctypes would be preferable to Cython.
If it's relevant/helpful, the rest of the code is here -
https://github.com/h5rdly/Jiffy
EDIT:
This is the heart of the function, give or take a few pre-arrangements:
for key in fdict:
    for token in search_list:
        if not token in key:
            break
    else:
        result_batch.append(key)
where search_list is a list of strings, and fdict is a dictionary or a dbm (didn't see a speed difference trying both).
This is what I wish to make faster, so that results arrive in real-time, even when there are only few keys containing my search words.
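One cheap thing to try before reaching for C: hoist the per-key work out of the hot loop by precomputing the lowered filenames once per database sync, so each query only does substring tests. A sketch of the idea, reusing the names from the code above (this trades memory for speed, much like the frozenset in EDIT 2 below):

# Precompute once, after each database sync:
separator = '*'.encode(ENCODING)
names = [(key, key.split(separator)[0].lower()) for key in fdict]

# Hot loop, per query:
result_batch = [key for key, filename in names
                if all(token in filename for token in search_list)]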
EDIT 2:
On @hpaulj's advice, I've put the dbm keys in a (frozen) set, gaining a noticeable improvement on Windows/Python 2.7 (dbhash).
I have some caveats though:
For my ~50 GB in use, the frozenset takes 28 MB, as measured by pympler.asizeof. So for the full 1 TB, I suspect it'll take a nice share of RAM.
On Linux, for some reason, the conversion not only doesn't help, but the query itself stops getting updated in real time for the duration of the search, making the GUI look unresponsive.
On Windows, this is almost as fast as I want, but still not warp-immediate.
So this comes around to this addition:
if sys.platform.startswith('win'):   # 'win' in sys.platform would also match 'darwin'
    try:
        fdict=frozenset(fdict)
    except TypeError:
        # the dbm object isn't directly iterable
        fdict=frozenset(fdict.keys())
Since it would take a significant amount of RAM for larger disks, I think I'll add it as an optional faster search for now, "Scorch Mode".
I wonder what to do next. I thought that perhaps, if I could somehow export the keys/filenames to a datatype that ctypes can pass along, I could then call a relevant C function to do the searches.
Also, perhaps learn the Python bytecode and do some lower-level optimization.
I'd like this to be as fast as Python would let me, please advise.
I'm storing some user raw_input as a variable in Python 2.7. The issue is that this is sensitive data: it is the encryption passphrase for a cryptocurrency wallet.
Therefore I want to ensure that once the Python script has completed, there is no trace of the passphrase left anywhere on the system.
Where passphrase is the variable, is doing this at the end of the program:
del passphrase
good enough to utterly remove all traces?
No. del xxx, or implicit deletion (leaving the current scope), may not be enough to hide the previously stored value, and this may crucially depend on your OS and your Python implementation.
However, I would advise you not to roll your own security systems unless you really, really know what you're doing; instead, look for an already existing solution for whatever it is you want to do and use that. For example, I'm not sure whether either raw_input or input is suitable for cryptographic needs.
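If a best-effort, pure-Python mitigation is all you need, keeping the passphrase in a mutable bytearray and overwriting it afterwards at least avoids leaving extra immutable str copies behind. A minimal sketch, assuming Python 2 (note that the str returned by raw_input itself cannot be erased, which is part of why this is mitigation rather than a guarantee; unlock_wallet is a hypothetical call):

passphrase = bytearray(raw_input('Passphrase: '))  # Python 2: raw_input returns a byte string
try:
    unlock_wallet(bytes(passphrase))  # hypothetical wallet call
finally:
    for i in range(len(passphrase)):
        passphrase[i] = 0  # best-effort in-place overwrite; CPython may still hold copies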
You may get additional help in Information Security StackExchange.
I understand that this question has, in essence, already been asked, but that question did not have an unequivocal answer, so please bear with me.
Background: In my company, we use Perforce submission numbers as part of our versioning. Regardless of whether this is a correct method or not, that is how things are. Currently, many developers do separate submissions for code and documentation: first the code and then the documentation to update the client-facing docs with what the new version numbers should be. I would like to streamline this process.
My thoughts are as follows: create a Perforce trigger (which runs on the server side) which scans the submitted documentation files (such as .txt) for a unique term (such as #####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####) and then replaces it with the value of what the change list would be when submitted. I already know how to determine this value. What I cannot figure out, is how or where to update the files.
I have already determined that using the change-content trigger (whether possible or not), which
"fire[s] after changelist creation and file transfer, but prior to committing the submit to the database",
is the way to go. At this point the files need to exist somewhere on the server. How do I determine the (temporary?) location of these files from within, say, a Python script, so that I can update them or use sed to replace the placeholder value with the intended value? The online documentation for Perforce that I have found so far has not been very explicit on whether this is possible, or on how the mechanics of a submission work at this stage.
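For reference, my understanding is that a change-content trigger can at least read the in-flight revisions through the @=change revision specifier, along these lines (the function name and the way the changelist number is passed in are assumptions on my part):

import subprocess

def read_pending_file(depot_path, changelist):
    # '@=change' names the revision sitting in the in-flight changelist.
    return subprocess.check_output(
        ['p4', 'print', '-q', '%s@=%s' % (depot_path, changelist)])

What I cannot find is the equivalent for writing the content back.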
EDIT
Basically what I am looking for is RCS-like functionality, but without the unsightly special character sequences which accompany it. After more digging, what I am asking is the same as this question. However, I believe that this must be possible, because the trigger runs on the server side and the files have already been transferred to the server. They must therefore be accessible to the script.
EXAMPLE
Consider the following snippet from a release notes document:
[#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
This is what the user submits. I then want the trigger to intercept this file during the submission process (as mentioned, at the change-content stage) and alter it so that what is eventually stored within Perforce looks like this:
[52738] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
Where 52738 is the final changelist number of what the user submitted. (As mentioned, I can already determine this number, so please do not dwell on this point.) I.e., what the user sees on the Perforce client console is:
Changelist 52733 renamed 52738.
Submitted change 52738.
Are you trying to replace the content of pending changelist files that were edited on a different client workspace (and different user)?
What type of information are you trying to replace in the documentation files? For example, is it a date or username, as with RCS keyword expansion? http://www.perforce.com/perforce/doc.current/manuals/p4guide/appendix.filetypes.html#DB5-18921
I want to get better clarification on what you are trying to accomplish in case there is another way to do what you want.
Depending on what you are trying to do, you may want to consider shelving ( http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.files.html#d0e5537 )
Also, there is an existing Perforce enhancement request, regarding client-side triggers that modify files on the client side prior to submit, that I can add your information to. If it becomes implemented, you will be notified by email.
99w,
I have also added you to an existing enhancement request for customizable RCS keywords, along with the example you provided.
Short of using a post-command trigger to edit the archive content directly and then update the checksum in the database, there is currently no way to update the file content with the custom-edited final changelist number.
One of the things I learned very early on in programming was to keep out of interrupt level as much as possible, and especially don't do stuff in interrupt that requires resources that can hang the system. I totally get that you want to resolve the internal labeling in sequence, but a better way to do it may be to just set up the edit during the trigger so that a post trigger tool can perform the file modification.
Correct me if I'm looking at this wrong, but there seems to be a bit of irony, or perhaps recursion, in trying to make a file change during the course of submitting a file change. It might be better to have a second changelist that is reserved for the log. You always know where that file is, in your local file space. That said, ktext files and $-keyword fields may be able to help.
Say you have some metadata for a custom file format that your Python app reads, something like a CSV with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that allows arbitrary code execution? The only case I can imagine is if you made the poor choice of letting var1 be a shell command that you execute with os.system(data1) somewhere in your own code. Also, if this were C, you would have to worry about buffer overruns, but I don't think you have to worry about that with Python. If you're reading that data in as a string, is it possible to somehow escape the string, e.g. "\n os.system('rm -r /')"? This SQL-injection-like example totally won't work, but is something similar possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in Python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code. (See the sketch after this list.)
Do you know the expected names of the vars, or at least their format? Make sure you validate that each one is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize them and check that each path points to an acceptable location before you read or write.
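To make that concrete, here is a minimal validation sketch for the var,data format above (the naming rule and size limit are assumptions you'd adjust to your format):

import re

ALLOWED_VAR = re.compile(r'^[A-Za-z_][A-Za-z0-9_]{0,31}$')  # hypothetical naming rule

def parse_meta(path):
    entries = {}
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            var, _, data = line.rstrip('\n').partition(',')
            # Error early on anything that doesn't match the expected shape.
            if not ALLOWED_VAR.match(var):
                raise ValueError('bad variable name on line %d' % lineno)
            if len(data) > 1024:
                raise ValueError('oversized value on line %d' % lineno)
            entries[var] = data
    return entries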
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker controlled content (here's why)
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well-understood format (and its libraries) gives you a leg up on proper validation.
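For instance, with the stdlib parsers the low-level reading is already done for you; a sketch (Python 3 module names, and the file/section names are made up):

import configparser

parser = configparser.ConfigParser()
parser.read('meta.ini')              # hypothetical file name
var1 = parser.get('meta', 'var1')    # raises cleanly if the section or key is absent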
OWASP would be my normal go-to for a "further reading" link, but their Input Validation page needs help. In lieu of that, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more Python-specific one is "Dealing with User Input in Python".
It depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.