How can I check if datastore Indexes as defined in index.yaml are serving in the python code?
I am using the Python App Engine SDK, version 1.3.6.
Attempt to perform a query that requires that index. If it raises a NeedIndexError, it's not uploaded or not yet serving.
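For example, a minimal probe along those lines, assuming a hypothetical Greeting model whose author-plus-date query needs a composite index from index.yaml:

from google.appengine.api.datastore_errors import NeedIndexError
from google.appengine.ext import db

def composite_index_is_serving():
    # Run a query that cannot be satisfied without the composite index;
    # NeedIndexError means it is not uploaded or not yet serving.
    try:
        db.GqlQuery("SELECT * FROM Greeting WHERE author = :1 "
                    "ORDER BY date DESC", "probe").fetch(1)
        return True
    except NeedIndexError:
        return False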
I don't think there's a way to check without adding some logging to the SDK code. If you're using the SQLite stub, __FindIndexForQuery, lines 1114-1140, is the part that looks for indices applicable to a query, and (at line 1140) returns, and I quote:
An entity_pb.CompositeIndex PB, if a
suitable index exists; otherwise None
A little logging at that point (and when it's about to fall off the end having exhausted the loop -- that's how it returns None) will give you a trace of all your indices that are actually used, as part of the logs of course. The protocol buffer it returns is an instance of the class defined in this file, starting at line 2576.
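If you'd rather not edit the SDK file in place, a wrapper patched in at dev-server startup could produce the same trace. The sketch below assumes the method described above; the mangled attribute name and stub class are assumptions about the SDK's internals, not verified API:

import logging

def add_index_logging(stub_class, method_name):
    # method_name is the name-mangled attribute for a __-prefixed method,
    # e.g. "_DatastoreSqliteStub__FindIndexForQuery" (hypothetical name).
    original = getattr(stub_class, method_name)

    def wrapper(self, query):
        index = original(self, query)
        if index is None:
            logging.info("no composite index matched query: %s", query)
        else:
            logging.info("query served by composite index: %s", index)
        return index

    setattr(stub_class, method_name, wrapper)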
If you can explain why you want to know this, it would, I think, be quite reasonable to open a feature request on the App Engine tracker, asking Google to add the logging that I'm suggesting, so you don't have to keep maintaining your edited version of the file!
(If you use the file stub, the relevant file is here, and the part to instrument is around line 824 and following; of course, this part will be used only if you're running the SDK in "require indices" mode, AKA "strict mode", otherwise, indices are created in, not used by, the SDK;-)
Say I create a simple web server using Flask that lets people query certain things I have modularized in different Python files, loaded via the __import__ statement. Would doing this with user-supplied information be considered a security risk?
Example:
from flask import Flask

app = Flask(__name__)

@app.route("/<author>/<book>/<chapter>")
def index(author, book, chapter):
    return getattr(__import__(author), book)(chapter)
    # OR
    return getattr(__import__("books." + author), book)(chapter)
I've seen a case like this recently when reviewing code, however it didn't feel right to me.
It is entirely insecure, and your system is wide open to attack. Your first return line doesn't limit what kind of names can be imported, which means the user can execute any arbitrary callable in any importable Python module.
That includes:
/pickle/loads/<url-encoded pickle data>
A pickle is a stack language that lets you execute arbitrary Python code, and the attacker can take full control of your server.
Even a prefixed __import__ would be insecure if an attacker can also place a file on your file system in the PYTHONPATH; all they need is a books directory earlier in the path. They can then use this route to have the file executed in your Flask process, again letting them take full control.
I would not use __import__ at all here. Just import those modules at the start and use a dictionary mapping author to the already imported module. You can use __import__ still to discover those modules on start-up, but you now remove the option to load arbitrary code from the filesystem.
Allowing untrusted data to direct calls to arbitrary objects in modules should also be avoided (including via getattr()). Again, an attacker with limited access to the system could exploit this path to widen the crack considerably. Always limit the input to a whitelist of possible options (such as the modules you loaded at the start and, per module, which objects may actually be called).
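A minimal sketch of that whitelist approach, assuming hypothetical modules books.alice and books.bob that each expose chapter-returning callables; nothing user-supplied ever reaches __import__ or an unchecked getattr():

from flask import Flask, abort
from books import alice, bob  # hypothetical modules, imported up front

app = Flask(__name__)

# Whitelist: author -> (module, names that may be called on it)
AUTHORS = {
    "alice": (alice, {"wonderland"}),
    "bob": (bob, {"adventures"}),
}

@app.route("/<author>/<book>/<chapter>")
def index(author, book, chapter):
    try:
        module, allowed = AUTHORS[author]
    except KeyError:
        abort(404)
    if book not in allowed:
        abort(404)
    return getattr(module, book)(chapter)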
More than being a security risk, it is just a bad idea; e.g., I could easily crash your web app by visiting the URL:
/sys/exit/anything
translating to:
...
getattr(__import__('sys'), 'exit')('anything')
Don't give your users the ability to import/execute just about anything. Restrict the possibilities by using, say, a dictionary of permissible imports, as @MartijnPieters has clearly pointed out.
I understand that this question has, in essence, already been asked, but that question did not have an unequivocal answer, so please bear with me.
Background: In my company, we use Perforce submission numbers as part of our versioning. Regardless of whether this is a correct method or not, that is how things are. Currently, many developers do separate submissions for code and documentation: first the code and then the documentation to update the client-facing docs with what the new version numbers should be. I would like to streamline this process.
My thoughts are as follows: create a Perforce trigger (which runs on the server side) which scans the submitted documentation files (such as .txt) for a unique term (such as #####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####) and then replaces it with the value of what the change list would be when submitted. I already know how to determine this value. What I cannot figure out, is how or where to update the files.
I have already determined that using the change-content trigger (whether possible or not), which
"fire[s] after changelist creation and file transfer, but prior to committing the submit to the database",
is the way to go. At this point the files need to exist somewhere on the server. How do I determine the (temporary?) location of these files from within, say, a Python script, so that I can update them (or run sed on them) to replace the placeholder with the intended value? The online Perforce documentation I have found so far has not been very explicit on whether this is possible, or on how the mechanics of a submission work at this stage.
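For illustration, here is the kind of read access I believe is possible from the trigger via the @=change revision specifier (Python 2.7+; the depot path is illustrative and p4 connection flags are elided); it is writing the modified content back that I cannot figure out:

import subprocess
import sys

changelist = sys.argv[1]  # %changelist% from the trigger definition
depot_path = "//depot/docs/release_notes.txt"  # illustrative path

# @=change reads a file's content out of a pending (in-flight) submit
content = subprocess.check_output(
    ["p4", "print", "-q", "%s@=%s" % (depot_path, changelist)])

if "#####PERFORCE##CHANGELIST##NUMBER" in content:
    pass  # found the placeholder; updating it is the open question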
EDIT
Basically what I am looking for is RCS-like functionality, but without the unsightly special character sequences which accompany it. After more digging, what I am asking is the same as this question. However I believe that this must be possible, because the trigger is running on the server side and the files had already been transferred to the server. They must therefore be accessible by the script.
EXAMPLE
Consider the following snippet from a release notes document:
[#####PERFORCE##CHANGELIST##NUMBER###ROFL###LOL###WHATEVER#####] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
This is what the user submits. I then want the trigger to intercept this file during the submission process (as mentioned, at the change-content stage) and alter it so that what is eventually stored within Perforce looks like this:
[52738] Added a cool new feature. Early retirement is in sight.
[52702] Fixed a really annoying bug. Many lives saved.
[52686] Fixed an annoying bug.
Where 52738 is the final changelist number of what the user submitted. (As mentioned, I can already determine this number, so please do not dwell on this point.) That is, what the user sees on the Perforce client console is:
Changelist 52733 renamed 52738.
Submitted change 52738.
Are you trying to replace the content of pending changelist files that were edited on a different client workspace (and different user)?
What type of information are you trying to replace in the documentation files? For example, is it a date or username, as with RCS keyword expansion? http://www.perforce.com/perforce/doc.current/manuals/p4guide/appendix.filetypes.html#DB5-18921
I want to get better clarification on what you are trying to accomplish in case there is another way to do what you want.
Depending on what you are trying to do, you may want to consider shelving ( http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.files.html#d0e5537 )
Also, there is an existing Perforce enhancement request I can add your information to, regarding client-side triggers that modify files on the client prior to submit. If it is implemented, you will be notified by email.
99w,
I have also added you to an existing enhancement request for Customizable RCS keywords, along with the example you provided.
Short of using a post-command trigger to edit the archive content directly and then update the checksum in the database, there is currently not a way to update the file content with the custom-edited final changelist number.
One of the things I learned very early on in programming was to keep out of interrupt level as much as possible, and especially don't do stuff in interrupt that requires resources that can hang the system. I totally get that you want to resolve the internal labeling in sequence, but a better way to do it may be to just set up the edit during the trigger so that a post trigger tool can perform the file modification.
Correct me if I'm looking at this wrong, but there seems to be a bit of irony, or perhaps recursion, in trying to make a file change during the course of submitting a file change. It might be better to have a second changelist reserved for the log; you always know where that file is, in your local file space. That said, ktext files and $ fields may be able to help.
What is the best, most universal method for managing application configuration? I want the following properties in order to have "good configuration management":
- A list of all available properties and their default values, in one place.
- A list of the properties which can be changed by an app user, also in one place.
- When I retrieve a specific property, its value is returned from the second list (user-changeable configs) or, if it's not there, from the first list.
So far, what I did was hard-code the first list as an object (more specifically, a dict), write a .conf file read by ConfigParser so an app user can easily change some of the properties (the second list), and write a public method on the config object that retrieves a property by its name or, if it's not there, raises an exception. In the end, one object is responsible for managing all the stuff (parsing the file, raising exceptions, overriding properties, etc.). But I was wondering: is there a built-in library that does more or less the same thing, or an even better way to manage configuration that respects the KISS, DRY and other principles (I'm not always successful at that with this method)?
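For concreteness, a sketch of the approach just described (property names, section name, and file layout are illustrative):

from ConfigParser import SafeConfigParser  # Python 2 module name

DEFAULTS = {
    "host": "localhost",
    "port": "8080",
}

class Config(object):
    def __init__(self, conf_file, section="app"):
        self._section = section
        self._parser = SafeConfigParser()
        self._parser.read(conf_file)

    def get(self, name):
        # User-changeable value wins; otherwise fall back to the defaults.
        if (self._parser.has_section(self._section)
                and self._parser.has_option(self._section, name)):
            return self._parser.get(self._section, name)
        if name in DEFAULTS:
            return DEFAULTS[name]
        raise KeyError("unknown configuration property: %r" % name)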
Thanks in advance.
Create a default settings module which contains your desired default settings. Create a second module intended to be edited by the user, with a from default_settings import * statement at the top, and instruct the user to write any overrides into this module instead.
Python is rather expressive, so in most cases, if you can expect the user to understand it on any level, you can use a Python module itself as the configuration file.
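For example, a minimal sketch of this layout (module and property names are illustrative):

# default_settings.py -- every property and its default, in one place
DATABASE_HOST = "localhost"
DATABASE_PORT = 5432
DEBUG = False

# settings.py -- the module the user edits; it starts from the defaults
# and overrides only what matters to them
from default_settings import *

DEBUG = True

# Application code then reads a single namespace:
#   import settings
#   settings.DATABASE_HOST  # "localhost" unless the user changed it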
I'm currently writing a Python sandbox using sandboxed PyPy. Basically, the sandbox works by providing a "controller" that maps system library calls to a specified function instead. After following the instructions found at codespeak (which walk through the setup process), I realized that the default controller does not include a replacement for os.fstat(), and therefore crashes when I call open(). Specifically, the included pypy/translator/sandbox/sandlib.py does not contain a definition for do_ll_os__ll_os_fstat.
So far, I've implemented it as:
def do_ll_os__ll_os_fstat(self, fd):
    return os.fstat(fd)
which seems to work fine. Is this safe? Will this create a hole in the sandbox?
The fstat call can reveal certain information which you may or may not want to keep secret. Among other things:
Whether two file descriptors are on the same filesystem
The block size of the underlying filesystem
Numeric UID/GIDs of file owners
Modification/access times of files
However, it will not modify anything, so if you don't mind this (relatively minor) information leak, no problem. You could also alter some of the results to mask information you want to hide (set owner UIDs/GIDs to 0, for example).
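For example, a sketch of that masking, reusing the controller method from the question; positions 4 and 5 of a stat result sequence are st_uid and st_gid:

import os

def do_ll_os__ll_os_fstat(self, fd):
    st = os.fstat(fd)
    fields = list(st)
    fields[4] = 0  # st_uid: hide the real owner
    fields[5] = 0  # st_gid: hide the real group
    return os.stat_result(tuple(fields))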
bdonlan's answer is good, but since there is a bounty here, what the heck :-)
You can see for yourself precisely what information fstat provides by reading the POSIX spec for struct stat.
It is definitely a "read-only" operation. And as a rule, Unix file descriptors only provide access to the single object to which they refer. For example, a (readable) file descriptor referencing a directory will allow you to list the files within the directory, but it will not allow you to access files within the directory; for that, you need to open() the file, which will perform a permission check.
Be aware that fstat can be called on non-files like directories or sockets. Here again, though, it will only provide the information you see in struct stat and it will not modify anything. (And for a socket, most of the fields will be meaningless.)
Long-time lurker here, finally emerging from the woodwork.
Essentially, what I'm trying to do is have my logger write data like this to the logfile:
Connecting to database . . . Done.
I'd like the 'Connecting to database . . . ' to be written when the function is called, and the 'Done' written after the function has successfully executed.
I'm using Python 2.6 and the logging module. Also, I'd really like to avoid using decorators for this. Any help would be most appreciated!
Writing to a log is, and must be, an atomic action -- this is crucial, and a key feature of any logging package (including the one in Python's standard library) that distinguishes logging from the simple appending of information to files (where bits of things being written by different processes and threads might well "interleave" -- one of them writing some part of a line but not the line-end, just as you desire, and then maybe another one interposing something right afterwards, before the first task writes what it thinks will be the last part of the line but actually ends up on another line... utter confusion often results;-).
It's not inevitable that the atomic unit be "a line" (logs can be recorded elsewhere than to a text file, of course, and some of the things that are acceptable "sinks" for logs won't even have the concept of "a line"!), but, for logging, atomic units there must be. If you want something entirely non-atomic then you don't really want logging but simple appends to a file or other stream (and, watch out for the likely confusion mentioned in the first paragraph;-).
For transient updates about what your task is doing (in the middle of X, X done, starting Y, etc), you could think of a specialized log-handler that (for example) interprets such streams of updates by taking the first word as a subtask-identifier (incrementally building up and displaying somewhere the composite message about the "current subtask", recognizing when the subtask identifier changes that the previous subtask is finished or taking an explicit "subtask finished" message, and only writing persistent log entries on subtask-finished events).
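Short of building that handler, a simpler variant of the same idea is to accumulate the subtask message and emit one atomic record when the subtask finishes; a minimal sketch, where connect is a stand-in for the real work:

import logging

logger = logging.getLogger(__name__)

def connect_with_logging(connect):
    # Build the transient message, then write a single atomic log record.
    msg = "Connecting to database . . . "
    try:
        connect()
        logger.info(msg + "Done.")
    except Exception:
        logger.exception(msg + "FAILED.")
        raise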
It's a pretty specialized requirement so you're not likely to find a pre-made tool for this, but rather you'll have to roll your own. To help you with that, it's crucial to understand exactly what you're trying to accomplish (why would you want non-atomic logging entries, if such a concept even made any sense -- what deployment or system administration task are you trying to ameliorate by using such a hypothetical tool?) so that the specialized subsystem can be tailored to your actual needs. So, can you please expand on this?
I don't believe Python's logger supports that.
However, would it not be better to agree on a log format so that the log file can be easily parsed when you want to analyse the data, where ; is any delimiter you want:
DateTime;LogType;string
That could be parsed easily by a script, and then you could do analysis on the logs.
If you use:
Connecting to database . . . Done.
then you won't be able to analyse how long the transaction took.
I don't think you should go down this route. A logging methodology with an entry record:
Time;functionName()->
and an exit record is more useful for troubleshooting:
Time;functionName()<-
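A minimal sketch of that entry/exit style (function name and format are illustrative), using a ";"-delimited format so elapsed time per function can later be computed from the log:

import logging

logging.basicConfig(format="%(asctime)s;%(message)s", level=logging.INFO)
log = logging.getLogger(__name__)

def connect_to_database():
    log.info("connect_to_database()->")   # entry record
    # ... actual connection work goes here ...
    log.info("connect_to_database()<-")   # exit record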