I am working on a specman environment (hardware verification language), and I want to automate my tasks.
To do so, I learned Python with the goal of using its file manipulation abilities. The problem is that I only know how to manipulate .txt files. Is there a way to change other kinds of files?
Your question is way too generic. It's possible to change *.e files using string matching, and in some cases this may make sense as a one-time task, but there can't be any general rules for it. Writing an e parser in Python doesn't sound like a feasible task.
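As a minimal sketch of such a one-time string-matching edit: a .e file is plain text, so Python reads and writes it exactly like a .txt file. The helper names (`rename_identifier`, `rewrite_file`) and the e snippet are hypothetical, made up for illustration. Note that a regex like this also rewrites matches inside comments and strings, which is one reason it doesn't generalize.

```python
import re
from pathlib import Path


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new` in source text."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function is used as the replacement so `new` is inserted literally.
    return pattern.sub(lambda match: new, source)


def rewrite_file(path: Path, old: str, new: str) -> None:
    """Read a text file of any extension, rewrite it, and save it in place."""
    text = path.read_text(encoding="utf-8")
    path.write_text(rename_identifier(text, old, new), encoding="utf-8")


# Hypothetical e snippet, used only to show the text-level replacement.
sample = "extend packet_s {\n    keep len < MAX_LEN;\n};\n"
print(rename_identifier(sample, "MAX_LEN", "MAX_PACKET_LEN"))
```

To apply it to a real file, something like `rewrite_file(Path("my_env.e"), "MAX_LEN", "MAX_PACKET_LEN")` would do, where the file name is again only an example.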
The only reasonable way to analyze e code is to load it and use reflection. But you can't always feed the results to Python and have it make meaningful modifications.
It's entirely possible to use Python to generate e code from formally defined specs, in particular for coverage, generation constraints, etc. This can be an efficient and maintainable approach. However, there are other facilities for that, including tables.
Python certainly can be used for all kinds of smart scriptology: define environment, track installations and versions, choose flows, generate stubs, etc.
I am currently starting a kind of larger project in python and I am unsure about how to best structure it. Or to put it in different terms, how to build it in the most "pythonic" way. Let me try to explain the main functionality:
It is supposed to be a tool or toolset by which to extract data from different sources, at the moment mainly SQL-databases, in the future maybe also data from files stored on some network locations. It will probably consist of three main parts:
A data model which will hold all the data extracted from files / SQL. This will be some combination of classes / instances thereof. No big deal here
One or more scripts, which will control everything (Should the data be displayed? Outputted in another file? Which data exactly needs to be fetched? etc) Also pretty straightforward
And some module/class (or multiple modules) which will handle the extraction of data. This is mainly where I struggle.
So for the actual questions:
Should I place the classes of the data model and the "extractor" into one folder/package and access them from outside the package via my "control script"? Or should I place everything together?
How should I build the "extractor"? I already tried three different approaches for a SqlReader module/class: I tried making it just a simple module, not a class, but I didn't really find a clean way on how and where to initialize it. (Sql-connection needs to be set up) I tried making it a class and creating one instance, but then I need to pass around this instance into the different classes of the data model, because each needs to be able to extract data. And I tried making it a static class (defining everything as a @classmethod) but again, I didn't like setting it up and it also kind of felt wrong.
Should the main script "know" about the extractor-module? Or should it just interact with the data model itself? If not, again the question, where, when and how to initialize the SqlReader
And last but not least, how do I make sure I close the SQL-connection whenever my script ends? Meaning, even if it ends through an error. I am using cx_Oracle by the way
I am happy about any hints / suggestions / answers etc. :)
For this project you will need the basic data science toolkit: Pandas, Matplotlib, and maybe NumPy. You will also need sqlite3 (built in) or another SQL module to work with the databases.
Pandas: Used to extract, manipulate, analyze data.
Matplotlib: Visualize data and make human-readable graphs for further analysis.
NumPy: Build fast, compact arrays of data that work much faster than Python's lists.
Now, this is just a guideline, you will need to dig deeper in their documentation, then use what you need in your project.
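As a rough sketch of the built-in sqlite3 route, here is one way an extractor could own its connection and guarantee that it is closed, even when the script ends through an error. The `SqlReader` name comes from the question; its methods (`execute`, `fetch_all`) and the `prices` table are assumptions made up for this example. The same pattern should carry over to other DB-API drivers, since their connection objects also provide `close()`.

```python
import sqlite3


class SqlReader:
    """Hypothetical extractor that owns one connection and always closes it."""

    def __init__(self, database: str) -> None:
        self._database = database
        self._conn = None

    def __enter__(self) -> "SqlReader":
        self._conn = sqlite3.connect(self._database)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when an exception propagates out of `with`.
        self._conn.close()
        self._conn = None

    def execute(self, statement: str, params: tuple = ()) -> None:
        """Run a statement that changes data and commit it."""
        self._conn.execute(statement, params)
        self._conn.commit()

    def fetch_all(self, query: str, params: tuple = ()) -> list:
        """Run a query and return all rows as a list of tuples."""
        return self._conn.execute(query, params).fetchall()


# The control script creates the reader once and hands it to whatever needs it.
with SqlReader(":memory:") as reader:
    reader.execute("CREATE TABLE prices (symbol TEXT, value REAL)")
    reader.execute("INSERT INTO prices VALUES (?, ?)", ("ABC", 1.5))
    rows = reader.fetch_all("SELECT symbol, value FROM prices")

print(rows)
```

With this layout the control script is the only place that knows how the reader is set up; the data model classes just receive the open `reader` as an argument.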
Hope that this is what you were looking for!
Cheers
I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 & osquery for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property which makes it easy to create an autocompleter, but AFAIK it would involve translating the sqlite BNF (and any other BNFs that I might want to use in the future) into Python code, which is a drag.
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?
I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't; it has known vulnerabilities), but because it might be a good starting point for your own restricted-execution framework (or for getting your webserver compromised).
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
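To see why, here is a small sketch of a naive text scan using the blacklist from the question, which does not contain eval. The `naive_scan` helper is hypothetical, and the submitted line is kept as inert text and never executed.

```python
# Names taken from the blacklist in the question; eval is not among them.
BLACKLIST = ("__import__", "open", "file", "input", "raw_input")


def naive_scan(source: str) -> bool:
    """Return True if any blacklisted name appears verbatim in the source."""
    return any(name in source for name in BLACKLIST)


# The line from the example above, held as a string (it is never run here).
submitted = '''eval("__impor" + "t__('whatever').destroy_your_server")'''
print(naive_scan(submitted))  # False: the scan sees nothing wrong

# What eval would actually receive once the two pieces are joined.
payload = "__impor" + "t__('whatever').destroy_your_server"
print(payload)  # __import__('whatever').destroy_your_server
print(naive_scan(payload))  # True, but only after it is too late
```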
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario: "the worst case scenario is that someone gets into the database".
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
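A short illustration of that complication: none of the blacklisted names appear below, yet plain attribute access walks from an empty tuple to `object` and from there to every class derived from it. The `uses_dunder_attribute` check is a hypothetical helper, and it is only a partial defence, since the attribute name could also be assembled at run time and passed to `getattr`.

```python
import ast

# Walk from an ordinary value up to `object`, then down to its subclasses.
base = ().__class__.__base__
reachable = base.__subclasses__()

print(base is object)  # True
print(len(reachable) > 0)  # True


def uses_dunder_attribute(source: str) -> bool:
    """Return True if the source accesses any attribute starting with '__'."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Attribute) and node.attr.startswith("__")
        for node in ast.walk(tree)
    )


print(uses_dunder_attribute("().__class__.__base__"))  # True
print(uses_dunder_attribute("player.position"))  # False
```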
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of python, or something syntactically similar at least, from within a python based web-app.
The obvious answer is to implement a DSL (Domain Specific Language). However, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that maybe there is something already out there which will allow me to essentially generate a secure Python-callable function from a string in a database. The language really only needs to support assignment, basic math, if/else, and case-insensitive string comparisons. Any other features are a bonus, but I think most things can be done with just that; there is no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.
Restricted Python environments are hard to make really safe.
Maybe something like Lua is a better fit for you.
PySandbox might help. I haven't tested it, just found it linked elsewhere.
You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.
You could implement a subset of Python by using the ast module to parse the code into an abstract syntax tree, then walk the tree and check that it only uses the subset of Python that you allow. The ast module ships with the standard library in Python 3 as well as Python 2.6 and later.
However even using this method it will be hard to create something that is 100% secure, since even the most innocuous code could allow the user to write something that could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop using all the CPU.
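The tree walk described above can be sketched roughly as follows, assuming Python 3.8 or later (where literals parse to `ast.Constant`). The whitelist and the `find_disallowed` helper are assumptions chosen to match the stated needs (assignment, basic math, if/else, comparisons), not a vetted security boundary, and they do nothing about resource exhaustion: an expression such as `"x" * 10**9` passes the check and still allocates a huge string.

```python
import ast

# Node types permitted in the mini-language; everything else is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.If, ast.Expr,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.BoolOp, ast.And, ast.Or,
)


def find_disallowed(source: str) -> list:
    """Return the sorted names of node types in `source` not on the whitelist."""
    tree = ast.parse(source)
    return sorted(
        {
            type(node).__name__
            for node in ast.walk(tree)
            if not isinstance(node, ALLOWED_NODES)
        }
    )


good = "x = 1 + 2\nif x > 2:\n    y = x * 3\nelse:\n    y = 0\n"
print(find_disallowed(good))  # []
print(find_disallowed("__import__('os')"))  # ['Call']
```

Because calls, imports, attribute access, and loops are all absent from the whitelist, the check rejects them by default rather than relying on someone remembering to ban them.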
I'm currently working on a component of a trading product that will allow a quant or strategy developer to write their own custom strategies. I obviously can't have them write these strategies in natively compiled languages (or even a language that compiles to a bytecode to run on a vm) since their dev/test cycles have to be on the order of minutes.
I've looked at Lua, Python, and Ruby so far and really enjoyed all of them, but still found them a little "low level" for my target users. Would I need to somehow write my own parser + interpreter to support a language with a minimum of support for looping, simple arithmetic, and logical expression evaluation, or is there another recommendation any of you may have? Thanks in advance.
Mark-Jason Dominus, the author of Perl's Text::Template module, has some insights that might be relevant:
"When people make a template module like this one, they almost always start by inventing a special syntax for substitutions. For example, they build it so that a string like %%VAR%% is replaced with the value of $VAR. Then they realize they need extra formatting, so they put in some special syntax for formatting. Then they need a loop, so they invent a loop syntax. Pretty soon they have a new little template language.
This approach has two problems: First, their little language is crippled. If you need to do something the author hasn't thought of, you lose. Second: Who wants to learn another language?"
If you write your own mini-language, you could end up in the same predicament -- maintaining a grammar and a parser for a tool that's crippled by design.
If a real programming language seems a bit too low-level, the solution may not be to abandon the language but instead to provide your end users with higher-level utility functions, so that they can operate with familiar concepts without getting bogged down in the weeds of the underlying language.
That allows beginning users to operate at a high level; however, you and any end users with a knack for it -- your super-users -- can still leverage the full power of Ruby or Python or whatever.
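A minimal sketch of that idea in Python: the platform ships a few helpers in the users' own vocabulary, and a strategy is then ordinary Python that reads at a higher level. All names here (`moving_average`, `crossed_above`, `my_strategy`) and the price lists are hypothetical, invented for illustration.

```python
from statistics import mean


def moving_average(prices, window):
    """Average of the last `window` prices."""
    return mean(prices[-window:])


def crossed_above(prices, fast, slow):
    """True if the fast average is above the slow one now but was not before."""
    now = moving_average(prices, fast) > moving_average(prices, slow)
    earlier = prices[:-1]
    before = moving_average(earlier, fast) > moving_average(earlier, slow)
    return now and not before


# What an end user writes: plain Python, but in domain terms.
def my_strategy(prices):
    return "buy" if crossed_above(prices, fast=2, slow=4) else "hold"


print(my_strategy([10, 9, 8, 7, 12]))  # buy
print(my_strategy([10, 10, 10, 10, 10]))  # hold
```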
It sounds like you might need to create some sort of Domain Specific Language (DSL) for your users that could be built loosely on top of the target language. Ruby, Python and Lua all have their various quirks regarding syntax, and to a degree some of these can be massaged with clever function definitions.
An example of a fairly robust DSL is Cucumber, which implements an interesting strategy of converting user-specified verbiage to actual executable code through a series of regular expressions applied to the input data.
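The general idea of mapping sentences to code through regular expressions can be sketched in a few lines of Python. This is not how Cucumber itself is implemented; the `step` decorator, the `run` function, and the example sentences are all assumptions made up to show the technique.

```python
import re

STEPS = []  # (compiled pattern, handler) pairs, in registration order


def step(pattern):
    """Register a handler for sentences that match `pattern`."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"buy (\d+) shares of (\w+)")
def buy(state, quantity, symbol):
    state[symbol] = state.get(symbol, 0) + int(quantity)


@step(r"sell (\d+) shares of (\w+)")
def sell(state, quantity, symbol):
    state[symbol] = state.get(symbol, 0) - int(quantity)


def run(script, state=None):
    """Execute each line of `script` with the first step whose regex matches."""
    state = {} if state is None else state
    for line in script.strip().splitlines():
        line = line.strip()
        for pattern, handler in STEPS:
            match = pattern.fullmatch(line)
            if match:
                handler(state, *match.groups())
                break
        else:
            raise ValueError(f"no step matches: {line!r}")
    return state


print(run("buy 10 shares of ABC\nsell 3 shares of ABC"))  # {'ABC': 7}
```

Users only ever write the sentences; anything that does not match a registered pattern is rejected instead of being executed.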
Another candidate might be JavaScript, or some kind of DSL to JavaScript bridge, as that would allow the strategy to run either client-side or server-side. That might help scale your application since client machines often have surplus computing power compared to a heavily loaded server.
Custom-made modules are going to be needed, no matter what you choose, that define your firm's high level constructs.
Here are some of the needs I envision -- you may have some of these covered already: a way to get current positions, current and historical quotes, previous performance data, etc... into the application. Define/backtest/send various kinds of orders (limit/market/stop, what exchange, triggers) or parameters of options, etc... You probably are going to need multiple sandboxes for testing as well as the real thing.
Quants want to be able to do matrix operations, stochastic calculus, PDEs.
If you wanted to do it in python, loading NumPy would be a start.
You could also start with a proprietary system designed to do mathematical financial research such as something built on top of Mathematica or Matlab.
I've been working on a Python Algorithmic Trading Library (actually for backtesting, not for real trading). You may want to take a look at it: http://gbeced.github.com/pyalgotrade/
Check out http://www.tadeveloper.com for a backtesting framework using MATLAB as a scripting language. MATLAB has the advantage that it is very powerful but you do not need to be a programmer to use it.
This might be a bit simplistic, but a lot of quant users are used to working with Excel & VBA macros. Would something like VBScript be usable, since they may have some experience in this area?
Existing languages are "a little "low level" for my target users."
Yet, all you need is "a minimum of support for looping, simple arithmetic, logical expression evaluation"
I don't get the problem. You only want a few features. What's wrong with the list of languages you provided? They actually offer those features.
What's the disconnect? Feel free to update your question to expand on what the problem is.
I would use Common Lisp, which supports rapid development (you have a running image and can compile/recompile individual functions) and tailoring the language to your domain. You would provide functions and macros as building blocks to express strategies, and the whole language would be available to the user for combining these.
Is something along the lines of Processing the complexity level that you're shooting for? Processing is a good example of taking a full-blown language (Java) and reducing/simplifying the available syntax into only a subset applicable to the problem domain (problem domain = visualization in the case of Processing).
Here's a little side-by-side comparison from the Processing docs.
Java:
g.setColor(Color.black);
g.fillRect(0, 0, size.width, size.height);
Processing:
background(0);
As others have suggested, you may be able to simply write enough high-level functions such that most of the complexity is hidden from the user but you still retain the ability to do more low-level things when necessary. The Wiring language for Arduino follows this strategy of using a thin layer of high-level functions on top of C in order to make it more accessible to non-programmers and hobbyists.
Define the language first -- if possible, use the notation called EBNF (Extended Backus-Naur Form); it's very simple (see the Wikipedia entry).
Then once you have that, pick the language. Almost certainly you will want to use a DSL. Ruby and Lua are both really good at that, IMO.
Once you start working on it, you may find that you go back to your definition and tweak it. But that's the right order to do things, I think.
I have been in the same boat, building and trading with my own software. Java is not great because, like you say, you want something higher level. I have had a lot of success using the Eclipse project Xtext (http://www.eclipse.org/Xtext). It does all the plumbing of building parsers etc. for you, and with Eclipse you can quickly generate code with functional editors. I suggest looking into this as you consider other options as well. Combined with the Eclipse Modeling Framework, it is very powerful for quickly building DSLs, which sounds like what you need. - Duncan