General python question-
I have built a script using the numpy and pandas libraries. I have now been told that I cannot use any libraries, only base Python, because apparently open-source libraries are not approved.
Does this restriction make sense? Isn't base Python just as open source as the pandas/numpy libraries are?
Is it possible to convert pandas/numpy code to base Python? Does this sound like a simple exercise, or does it require learning a lot of new functions? The majority of the code reads tables, then uses if/then-type statements and looks up values from other tables to generate and populate new tables.
I'm only going to address the second point. Reimplementing all of numpy/pandas would certainly be a very large and pointless task. But you're not reimplementing all of it, only the parts you need, and if it's only a few functions, then it's certainly possible.
I'd start from the working script, replace the arrays with Python lists, and implement the needed functions one by one. For SO specifically, I suspect you're better off asking specific questions, e.g. how to implement an analog of function X in pure Python.
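For the kind of workflow described (read tables, apply if/then logic, look up values in other tables), the csv module in the standard library usually covers it. Here's a minimal sketch of the pattern, with made-up file and column names:

    import csv

    # Load the lookup table into a dict keyed on one column.
    with open("rates.csv", newline="") as f:
        rates = {row["region"]: float(row["rate"]) for row in csv.DictReader(f)}

    # Stream the main table, apply the if/then logic, collect new rows.
    out_rows = []
    with open("orders.csv", newline="") as f:
        for row in csv.DictReader(f):
            rate = rates.get(row["region"])          # lookup in the other table
            row["tax"] = float(row["amount"]) * rate if rate is not None else 0.0
            out_rows.append(row)

    # Write the generated table back out.
    with open("orders_out.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(out_rows[0]))
        writer.writeheader()
        writer.writerows(out_rows)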
I am starting a somewhat larger project in Python and I am unsure how best to structure it, or to put it differently, how to build it in the most "pythonic" way. Let me try to explain the main functionality:
It is supposed to be a tool or toolset for extracting data from different sources, at the moment mainly SQL databases, in the future maybe also data from files stored in network locations. It will probably consist of three main parts:
A data model which will hold all the data extracted from files / SQL. This will be some combination of classes / instances thereof. No big deal here
One or more scripts which will control everything (should the data be displayed? Written to another file? Which data exactly needs to be fetched? etc.). Also pretty straightforward
And some module/class (or multiple modules) which will handle the data extraction. This is where I mainly struggle
So for the actual questions:
Should I place the classes of the data model and the "extractor" into one folder/package and access them from outside the package via my "control script"? Or should I place everything together?
How should I build the "extractor"? I already tried three different approaches for a SqlReader module/class: I tried making it just a simple module, not a class, but I didn't find a clean way to decide how and where to initialize it (the SQL connection needs to be set up). I tried making it a class and creating one instance, but then I need to pass this instance around to the different classes of the data model, because each needs to be able to extract data. And I tried making it a static class (defining everything as a @classmethod), but again, I didn't like setting it up and it also felt kind of wrong.
Should the main script "know" about the extractor module? Or should it just interact with the data model itself? If the latter, the question again is where, when, and how to initialize the SqlReader.
And last but not least, how do I make sure I close the SQL connection whenever my script ends, even if it ends through an error? I am using cx_Oracle, by the way.
I am happy about any hints / suggestions / answers etc. :)
For this project you will need the basic data science toolkit: Pandas, Matplotlib, and maybe numpy. You will also need sqlite3 (built-in) or another SQL module to work with the databases.
Pandas: used to extract, manipulate, and analyze data.
Matplotlib: visualize data and make human-readable graphs for further analysis.
Numpy: build fast, stable arrays of data that are much faster than Python's lists.
Now, this is just a guideline; you will need to dig deeper into their documentation, then use what you need in your project.
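To make that concrete, here is a minimal sketch of how the pieces could fit together, using the built-in sqlite3 plus pandas. All names are made up, and the try/finally is what guarantees the connection closes even when an error is raised; the same pattern works with a cx_Oracle connection:

    import sqlite3
    import pandas as pd

    class SqlReader:
        """Owns the DB connection; everything else asks it for data."""

        def __init__(self, db_path):
            self.conn = sqlite3.connect(db_path)

        def fetch(self, query, params=()):
            return pd.read_sql_query(query, self.conn, params=params)

        def close(self):
            self.conn.close()

    def main():
        reader = SqlReader("example.db")   # made-up database file
        try:
            orders = reader.fetch("SELECT * FROM orders WHERE qty > ?", (10,))
            print(orders.describe())
        finally:
            # Runs even if an exception was raised above, so the
            # connection is always closed.
            reader.close()

    if __name__ == "__main__":
        main()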
Hope that this is what you were looking for!
Cheers
I am working in a Specman environment (a hardware verification language), and I want to automate my tasks.
In order to do so, I learned Python with the goal of using its file manipulation abilities. The problem is that I only know how to manipulate .txt files. Is there a way to change other kinds of files?
Your question is way too generic. It's possible to change *.e files using string matching, and in some cases this may make sense as a one-time task, but there are no general rules for it. Writing an e parser in Python doesn't sound like a feasible task.
The only reasonable way to analyze e code is to load it and use reflection, but you can't always feed the results back to Python to let it make meaningful modifications.
It's totally possible to use Python to generate e code based on formally defined specs, especially for coverage definitions, generation constraints, etc. It can be an efficient and maintainable approach, though there are other facilities for that as well, including tables.
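As a toy sketch of that idea: a hypothetical spec (just a list of field names here) is expanded into e-style coverage text. The e syntax below is only illustrative:

    # Hypothetical spec: the fields we want coverage items for.
    SPEC_FIELDS = ["len", "kind", "payload_ok"]

    def gen_cover_group(event, fields):
        # Expand the spec into e-style coverage text (illustrative syntax).
        lines = ["cover %s is {" % event]
        lines += ["   item %s;" % field for field in fields]
        lines.append("};")
        return "\n".join(lines)

    with open("packet_cover.e", "w") as f:
        f.write(gen_cover_group("packet_done", SPEC_FIELDS))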
Python can certainly be used for all kinds of smart scriptology: defining the environment, tracking installations and versions, choosing flows, generating stubs, etc.
I'm generating time series data in OCaml, basically long lists of floats, from a few kB to hundreds of MB. I would like to read, analyze, and plot them using the Python numpy and pandas libraries. Right now I'm thinking of writing them to CSV files.
A binary format would probably be more efficient? I'd use HDF5 in a heartbeat, but OCaml does not have a binding. Is there a good binary exchange format that is easily usable from both sides? Is writing a file the best option, or is there a better protocol for data exchange? Potentially even something that can be updated online?
First of all I would like to mention that there actually are HDF5 bindings for OCaml. But when I was faced with the same problem, I didn't find one that suited my purposes and was mature enough, so I wouldn't suggest using it; who knows, though, maybe today there is something more decent.
In my experience, the best way to store numeric data in OCaml is Bigarrays. They are essentially wrappers around a C pointer, which can be allocated outside the OCaml runtime, and they can also be memory-mapped regions. For me this is the most efficient way to share data between different processes (potentially written in different languages). You can share data using memory mapping between OCaml, Python, Matlab, or whatever else with very little pain, especially if you're not trying to modify it from different processes simultaneously.
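On the Python side this is nearly a one-liner with numpy. A minimal sketch, assuming the OCaml side has written a float64 Bigarray in native byte order to a file named series.bin:

    import numpy as np
    import pandas as pd

    # Map the file into memory without reading it all at once; the
    # dtype must match what the OCaml side wrote (float64 assumed here).
    data = np.memmap("series.bin", dtype=np.float64, mode="r")

    series = pd.Series(data)      # hand it to pandas for analysis
    print(len(series), series.mean())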
Other approaches are MPI, ZMQ, or bare sockets. I would prefer the latter, if only because the former two don't support bigarrays. I would also suggest looking at Cap'n Proto; it is very efficient, has bindings for both OCaml and Python, and could work very well for your particular use case.
I have read that Julia has access to the AST of the code it runs. What exactly does this mean? Is it that the runtime can access it, that code itself can access it, or both?
Building on this:
Is this a key difference of Julia with respect to other dynamic languages, specifically Python?
What are the practical benefits of being able to access the AST?
What would be a good example of something that you can't easily do in Python, but that you can do in Julia, because of this?
What distinguishes Julia from languages like Python is that Julia allows you to intercept code before it is evaluated. Macros are just functions, written in Julia, which let you access that code and manipulate it before it runs. Furthermore, rather than treating code as a string (like "f(x)"), it's provided as a Julian object (like Expr(:call, :f, :x)).
There are plenty of things this allows which just aren't possible in Python. The main ones are:
You can do more work at compile time, increasing performance
Two good examples of this are regexes and printf. Both of these take a format specification of some kind and interpret it in some way. Now, these can fairly straightforwardly be implemented as functions, which might look like this:
match(Regex(".*"), str)
printf("%d", num)
The problem with this is that these specifications must be re-interpreted every time the statement is run. Every time the interpreter goes over this block, the regex must be re-compiled into a state machine, and the format must be run through a mini-interpreter. On the other hand, if we implement these as macros:
match(r".*", str)
#printf("%d", num)
Then the r and #printf macros will intercept the code at compile time, and run their respective interpreters then. The regex turns into a fast state machine, and the #printf statement turns into a simple println(num). At run time the minimum of work is done, so the code is blazing fast. Now, other languages are able to provide fast regexes, for example, by providing special syntax for it – but the fact that they're not special-cased in Julia means that developers can use the same techniques in their own code.
You can make mini-compilers for, well, pretty much anything
Languages with macros tend to have more capable embedded DSLs, because you can change the semantics of the language at will. For example, there's the algebraic modelling language JuMP.jl. Clojure has some neat examples of this too, like its embedded logic programming language. Mathematica.jl even embeds Mathematica's semantics in Julia, so that you can write really natural symbolic expressions like @Integrate(log(x), {x,0,2}). You can fake this to a point in Python (SymPy does a good job), but not as cleanly or as efficiently.
If that doesn't convince you, consider that someone managed to implement an interactive Julia debugger in pure Julia using macros. Try that in Python.
Edit: Another great example of something that's difficult in other languages is Cartesian.jl, which lets you write generic algorithms across arrays of any number of dimensions.
I am not familiar with Julia and only first heard of it with your question, but this sounded an awful lot like Lisp (and indeed Julia seems to be a new grandchild/dialect of Lisp from what I'm reading) and its powerful macros. The ability to access the AST at run/compile time brings a whole new dimension to the programmer's code: metaprogramming.
See http://docs.julialang.org/en/latest/manual/metaprogramming/ and especially http://docs.julialang.org/en/latest/manual/metaprogramming/#macros for some of the practical uses. Basically you can 'inject/modify' code in places where it would be impossible for python/R to do the same.
Example: loop unrolling without any copy and paste, where a compile-time argument lets you easily vary how much you want to unroll the loop.
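For contrast, Python can read and rewrite ASTs too, but only by explicitly parsing source text with the ast module; there is no hook that hands your code to a macro before normal evaluation. A rough sketch:

    import ast

    source = "x * 2 + 1"
    tree = ast.parse(source, mode="eval")
    print(ast.dump(tree))   # the expression as an AST object

    class DoubleConstants(ast.NodeTransformer):
        # Rewrite every literal n into 2*n.
        def visit_Constant(self, node):
            return ast.copy_location(ast.Constant(node.value * 2), node)

    new_tree = ast.fix_missing_locations(DoubleConstants().visit(tree))
    print(eval(compile(new_tree, "<ast>", "eval"), {"x": 10}))   # 10*4 + 2 = 42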
Here's an excellent resource on Julia metaprogramming: https://en.wikibooks.org/wiki/Introducing_Julia/Metaprogramming
I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters and return a number or None. I know that eval is evil, so I need a safe way to execute a very basic subset of Python, or at least something syntactically similar, from within a Python-based web app.
The obvious answer is to implement a DSL (domain-specific language); however, I have no experience with that, I have no idea where to begin, and a lot of the available resources seem to go a little over my head. I'm hoping there is already something out there which will allow me to essentially generate a secure Python-callable function from a string in a database. The language really only needs to support assignment, basic math, if/else, and case-insensitive string comparisons. Any other features are a bonus, but I think most things can be done with just that; there's no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.
Restricted Python environments are hard to make really safe.
Maybe something like Lua is a better fit for you.
PySandbox might help. I haven't tested it, just found it linked elsewhere.
You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.
You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree, then walking the tree and checking that it only uses the subset of Python that you allow. The ast module is available in both Python 2.6+ and Python 3.
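A minimal sketch of that approach (Python 3.8+, whitelisting only arithmetic, comparisons, boolean logic, conditional expressions, and names; illustrative, not a vetted sandbox):

    import ast

    ALLOWED = (
        ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
        ast.IfExp, ast.Name, ast.Load, ast.Constant,
        ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
        ast.UAdd, ast.USub, ast.And, ast.Or, ast.Not,
        ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    )

    def safe_eval(expr, variables):
        tree = ast.parse(expr, mode="eval")
        for node in ast.walk(tree):
            if not isinstance(node, ALLOWED):
                raise ValueError("disallowed syntax: " + type(node).__name__)
        # No builtins, so names resolve only to the supplied variables.
        return eval(compile(tree, "<user>", "eval"),
                    {"__builtins__": {}}, dict(variables))

    print(safe_eval("price * qty if qty > 10 else price",
                    {"price": 2.5, "qty": 12}))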
However, even using this method it will be hard to create something that is 100% secure, since even the most innocuous-looking code could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop that uses all the CPU.
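Those two failure modes can at least be blunted with OS-level limits. A crude, POSIX-only sketch, reusing safe_eval from above, that caps memory and wall-clock time before running the user's code:

    import resource
    import signal

    # Cap the address space at 256 MB so a runaway allocation fails
    # with MemoryError instead of exhausting the machine.
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 ** 2, 256 * 1024 ** 2))

    def _timeout(signum, frame):
        raise TimeoutError("user code took too long")

    signal.signal(signal.SIGALRM, _timeout)
    signal.alarm(2)                 # abort the evaluation after 2 seconds
    try:
        result = safe_eval("2 ** 20", {})   # safe_eval from the sketch above
    finally:
        signal.alarm(0)             # always cancel the pending alarm
    print(result)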