All of the above is in the context of an ETL process. I have a git repository full of SQL files. I need to put all of those files (once pulled) into a SQL table with two columns, name and query, so that I can later access each file with a SQL query instead of loading it from its file path. How can I do this? I am free to use any tool I want, but I only know Python and Pentaho.
Maybe my assumption that this method would require less computation time than simply reading the pulled file from the hard drive is wrong. If so, let me know.
You can first define the table you're interested in using something along the lines of (you did not mention the database you are using):
CREATE TABLE queries (
name TEXT PRIMARY KEY,
query TEXT
);
After creating the table, you can use something like os.walk to iterate through the files in your repository, and insert both the name of each file and its contents (e.g. file.read()) into the table you created previously.
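Here is a minimal sketch of that loop. Since the database was not specified, it assumes SQLite through Python's built-in sqlite3 module; the function name load_sql_files, the choice to skip the .git directory, and the use of the repository-relative path as the name are my own assumptions, not something from the question.

```python
import os
import sqlite3


def load_sql_files(repo_dir, conn):
    """Store every .sql file found under repo_dir in the `queries` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queries (name TEXT PRIMARY KEY, query TEXT)"
    )
    for root, dirs, files in os.walk(repo_dir):
        # Assumption: git's own metadata directory should not be scanned.
        dirs[:] = [d for d in dirs if d != ".git"]
        for filename in files:
            if not filename.endswith(".sql"):
                continue
            path = os.path.join(root, filename)
            # Key on the path relative to the repository root so that files
            # with the same name in different folders do not collide.
            name = os.path.relpath(path, repo_dir).replace(os.sep, "/")
            with open(path, encoding="utf-8") as f:
                query = f.read()
            # INSERT OR REPLACE is SQLite syntax; it refreshes rows on re-runs.
            conn.execute(
                "INSERT OR REPLACE INTO queries (name, query) VALUES (?, ?)",
                (name, query),
            )
    conn.commit()
```

You would call it after each pull with something like load_sql_files("path/to/repo", sqlite3.connect("queries.db")). On another database the upsert statement and the parameter placeholder will probably differ.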
It sounds like you're trying to solve a different problem though. It seems like you're interested in speeding up some process, because you asked about whether accessing queries using a table would be faster than opening a file on disk. To investigate that (separate!) question further, see this.
I would recommend that you profile the existing process you are trying to speed up using profiling tools. After that, you can see whether IO is your bottleneck. Otherwise, you may do all of this work without any benefit.
As a side note, if you are looking up queries in this way, it may indicate that you need to rearchitect your application. Please consider that possibility as well.
Is it possible to write a simple Python .py script that updates the records in my Access database, or inserts new ones if there are any, on my behalf? The new records are to be pulled from Excel and pushed into the database.
I am using MS Access 2010.
Thanks,
It's definitely possible. You'll probably want to do it with the comtypes module, which allows communication between Windows processes using the Component Object Model (COM).
Here's an example of a script that does that posted in another question.
Getting the information out of Microsoft Excel can be done with a lot of modules, but one I've had success with is openpyxl. Some examples of reading Excel workbooks with it can be found here.
I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python & Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing them in the new one.
Problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx and I didn't find any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx use?
Is there a python module for directly reading from the WEB-INF/data/*.dbx files?
Is there some external tool for dumping the WEB-INF/data/*.dbx to an ASCII format that will be parsable by python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (assuming that all important data can be scraped)
Thanks!
The ".dbx" suffix has been used by various software packages over the years, so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy Java app (or read the relevant docs, or ask the author, etc).
As for scraping, it's probably going to be a lot of pain for not much result, depending on the app.
Is it possible to set up tables for MySQL in Python?
Here's my problem: I have a bunch of .txt files which I want to load into a MySQL database. Instead of creating tables in phpMyAdmin manually, is it possible to do the following things all in Python?
Create table, including data type definition.
Load many files one by one. I only know this LOAD DATA LOCAL INFILE command to load one file.
Many thanks
Yes, it is possible. You'll need to read the data from the CSV files using the csv module.
http://docs.python.org/library/csv.html
And then inject the data using a Python MySQL binding. Here is a good starter tutorial:
http://zetcode.com/databases/mysqlpythontutorial/
If you already know Python, it will be easy.
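As a rough sketch of both steps (create the table, then load many files one by one), something like the following should work with any DB-API connection. The function name load_text_files, the tab delimiter, and the assumption that every file has the same columns are mine; the question does not say how the .txt files are laid out. The example is exercised with sqlite3 only because it ships with Python. MySQL drivers such as MySQLdb use "%s" as the placeholder instead of "?".

```python
import csv
import glob
import sqlite3


def load_text_files(conn, pattern, table, columns, delimiter="\t", placeholder="?"):
    """Create `table` if needed and load every file matching `pattern` into it.

    columns is a list of (name, sql_type) pairs. The table and column names
    are interpolated into the SQL, so they must come from your own code,
    never from user input.
    """
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")

    marks = ", ".join([placeholder] * len(columns))
    insert = f"INSERT INTO {table} VALUES ({marks})"

    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            # Skip completely empty lines; every other line is one record.
            rows = [row for row in csv.reader(f, delimiter=delimiter) if row]
        cur.executemany(insert, rows)
        total += len(rows)
    conn.commit()
    return total
```

With a MySQL connection the call would look like load_text_files(conn, "data/*.txt", "people", [("id", "INT"), ("name", "VARCHAR(100)")], placeholder="%s"). For very large files LOAD DATA LOCAL INFILE, issued once per file from the same loop, will likely be faster than row inserts.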
It is. Typically what you want to do is use an Object-Relational Mapping library.
Probably the most widely used in the Python ecosystem is SQLAlchemy, but there is a lot of magic going on in it. If you want to keep tighter control over your DB schema, or if you are learning about relational DBs and want to follow what the code does, you might be better off with something lighter like Canonical's storm.
EDIT: Just thought to add. The reason to use ORMs is that they provide a very handy way to manipulate data and interface with the DB. But if all you will ever want to do is write a script to convert textual data to MySQL tables, then you might get along with something even easier. Check the tutorial linked from the official MySQL website, for example.
HTH!
A primary goal of a project I plan to bid on involves creating a Microsoft Access database using python. The main DB backend will be postgres, but the plan is to export an Access image.
This will be a web app that'll take input from the user and go through a black box and output the results as an access db. The web app will be built on a linux server.
I have a few related questions:
Is there a reliable library or module that can be used?
What has your experience been using Access and python?
Any tips, tricks, or must avoids I need to know about?
Thanks :)
Could you use an sqlite database instead?
edit:
If it HAS to be on linux and it HAS to be to MS Access, then I'm pretty sure this is your only choice, but it costs $1,550.
You are either going to have to shell out the money, or convince the client to change one of the other two parameters. Personally, I would push to change the database file to sqlite.
Of course you could always code up your own database driver, but given the time that would take, it would probably be worth shelling out the $1,550. mdbtools has been working on this for years and the project has been pretty much abandoned.
found it, kinda
Ok, so I just couldn't let this go and found that there is a Java library called Jackcess that will write to MS Access mdb files on any platform that can run the JVM. Granted, it's Java and not Python, but maybe you could learn just enough Java to throw an application together and execute it from Python? Or just switch the whole app to Java, whatever.
The various answers to the duplicate question suggest that your "primary goal" of creating an MS Access database on a linux server is not attainable.
Of course, such a goal is not worthwhile in itself. If you tell us what the users/consumers of the Access db are expected to do with it, maybe we can help you. Possibilities: (1) create a script and a (set of) file(s) which the user downloads and runs to create an Access DB; (2) if it's just for casual user examination/manipulation, an Excel file may do.
If you know this well enough:
Python, its database modules, and ODBC configuration
then you should know how to do this:
open a database, read some data, insert it into a different database
If so, then you are very close to your required solution. The trick is, you can open an MDB file as an ODBC datasource. Now: I'm not sure if you can "CREATE TABLES" with ODBC in an MDB file, so let me propose this recipe:
Create an MDB file with name "TARGET.MDB" -- with the necessary tables, forms, reports, etc. (Put some dummy data in and test that it is what the customer would want.)
Set up an ODBC datasource to the file "TARGET.MDB". Test to make sure you can read/write.
Remove all the dummy data -- but leave the table defs intact. Rename the file "TEMPLATE.MDB".
When you need to generate a new MDB file: with Python copy TEMPLATE.MDB to TARGET.MDB.
Open the datasource to write to TARGET.MDB. Create/copy required records.
Close the datasource, rename TARGET.MDB to TODAYS_REPORT.MDB... or whatever makes sense for this particular data export.
Would that work for you?
It would almost certainly be easier to do that all on Windows as the support for ODBC will be most widely available. However, I think in principle you could do this on Linux, provided you find the right ODBC components to access MDB via ODBC.
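To make the recipe concrete, here is a rough sketch of the copy, fill and rename steps. The function names, the results table and its columns are hypothetical. The fill step assumes the third-party pyodbc module and the Access ODBC driver name used on Windows, and it connects with a DSN-less connection string rather than a preconfigured datasource. I have not verified this on Linux, where you would need whatever driver name your ODBC setup provides.

```python
import os
import shutil


def export_from_template(template_path, output_path, fill):
    """Copy the empty template, let `fill` write the records, then rename."""
    work_path = os.path.join(os.path.dirname(output_path) or ".", "TARGET.MDB")
    shutil.copyfile(template_path, work_path)
    try:
        fill(work_path)
    except Exception:
        # Do not leave a half-written working copy behind.
        os.remove(work_path)
        raise
    os.replace(work_path, output_path)
    return output_path


def fill_via_odbc(mdb_path, rows):
    """Hypothetical fill step: insert (label, value) rows through ODBC."""
    import pyodbc  # third-party; imported here so the copy logic runs without it

    conn = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=" + mdb_path
    )
    try:
        cur = conn.cursor()
        # `results`, `label` and `value` stand in for the template's real tables.
        cur.executemany("INSERT INTO results (label, value) VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
```

Usage would be along the lines of export_from_template("TEMPLATE.MDB", "TODAYS_REPORT.MDB", lambda p: fill_via_odbc(p, rows)). In a web app that serves several users at once, a unique working file name per request would be safer than the fixed TARGET.MDB.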
You could export to XML using MS's officedata namespace. Access shouldn't have any trouble consuming that. You can provide a separate xsd schema, or encode types and relationships directly in the document tree. Here is a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns="urn:schemas-microsoft-com:officedata">
<Table1><Foo>0.00</Foo><Bar>2011-05-11T00:00:00.000</Bar></Table1>
<Table1><Foo>3.00</Foo><Bar>2011-05-07T00:00:00.000</Bar></Table1>
<Table2><Baz>Hello</Baz><Quux>Kitty</Quux></Table2>
</dataroot>
Googling "urn:schemas-microsoft-com:officedata" should turn up some useful hits.
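A document like the one above can be produced from Python with the standard library alone, so it runs fine on a Linux server. This is only a sketch: the function name build_officedata_xml and the input layout (a dict mapping table names to lists of row dicts) are my own assumptions, and values are written with str(), so dates should be passed in already formatted the way Access expects.

```python
import xml.etree.ElementTree as ET

OFFICEDATA_NS = "urn:schemas-microsoft-com:officedata"


def build_officedata_xml(tables):
    """Return the XML document as bytes.

    tables maps a table name to a list of {column_name: value} dicts.
    """
    # Serialize the namespace as the default one (no prefix), as in the example.
    ET.register_namespace("", OFFICEDATA_NS)
    root = ET.Element(f"{{{OFFICEDATA_NS}}}dataroot")
    for table_name, rows in tables.items():
        for row in rows:
            # One element per record, named after the table it belongs to.
            record = ET.SubElement(root, f"{{{OFFICEDATA_NS}}}{table_name}")
            for column, value in row.items():
                field = ET.SubElement(record, f"{{{OFFICEDATA_NS}}}{column}")
                field.text = str(value)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

For instance, build_officedata_xml({"Table2": [{"Baz": "Hello", "Quux": "Kitty"}]}) gives a dataroot element with a single Table2 record, which you can write to a file in binary mode and hand to the user for import.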
I would suggest moving the data into a Microsoft SQL database, then linking or importing the data to access.
Could you create a self-extracting file to send to the Windows user who has Microsoft Access installed?
Include a blank .mdb file.
Dynamically build xml documents with tables, schema and data.
Include an import executable that will take all of the xml docs and import them into the Access .mdb file.
It's an extra step for the user, but you get to rely on their existing drivers, software and desktop.
Well, looks to me like you need a copy of vmware server on the linux box running windows, a web service in the vm to write to access, and communications to it from the main linux box. You aren't going to find a means of creating an access db on Linux. Calling it a requirement isn't going to make it technically possible.
http://adodb.sourceforge.net/ - installs on linux, written in php or python, connects to Access and PostgreSQL.
We've been using it for years, and it works beautifully.