I am seeking a way to take SDMX files (like here: http://www12.statcan.gc.ca/datasets/Alternative.cfm?PID=105929&EXT=SDMX) and process them into a Postgresql table.
I can use rsdmx (https://cran.r-project.org/web/packages/rsdmx/index.html) for smaller datasets but for large ones we reach a number of limitations in R.
PandaSDMX (https://pandasdmx.readthedocs.io/en/latest/) appears to resolve some of these issues, but I am not experienced in Python and can't seem to get the syntax to work. I'm able to use Response.get() to load a local file as a response object but am not sure where to go to from there.
I know I need to apply the code tables (structure file) but I'm not sure how to do that or make it so I can use odo (http://odo.pydata.org/en/latest/) to send it to postgresql.
Hoping someone can help out or suggest another method to pursue.
Related
All explained above is in the context of an ETL process. I have a git repository full of sql files. I need to put all those sql files (once pulled) into a sql table with 2 columns: name and query, so that I can access each file later on using a SQL query instead of loading them from the file path. How can I make this? I am free to use the tool I want to, but I just know python and Pentaho.
Maybe the assumption that this method would require less computation time than simply accessing to the pull file located in the hard drive is wrong. In that case let me know.
You can first define the table you're interested in using something along the lines of (you did not mention the database you are using):
CREATE TABLE queries (
name TEXT PRIMARY KEY,
query TEXT
);
After creating the table, you can use perhaps os.walk to iterate through the files in your repository, and insert both the contents (e.g. file.read()) and the name of the file into the table you created previously.
It sounds like you're trying to solve a different problem though. It seems like you're interested in speeding up some process, because you asked about whether accessing queries using a table would be faster than opening a file on disk. To investigate that (separate!) question further, see this.
I would recommend that you profile the existing process you are trying to speed up using profiling tools. After that, you can see whether IO is your bottleneck. Otherwise, you may do all of this work without any benefit.
As a side note, if you are looking up queries in this way, it may indicate that you need to rearchitect your application. Please consider that possibility as well.
I would like to process a large data set of a mechanical testing device with Python. The software of this device only allows to export the data as an Excel file. Therefore, I use the xlrd package which works fine for small *.xlsx files.
The problem I have is, that when I want to open a common data set (3-5 MB) by
xlrd.open_workbook(path_wb)
the access time is about 30s to 60s. Is there any more effecitve and faster way to access Excel files?
You could access the file as a database via PyPyODBC instead, which may (or may not) be faster - you'd have to try it out and compare the results.
This method should work for both .xls and .xlsx files. Unfortunately, it comes with a couple of caveats:
As far as I am aware, this will only work on Windows machines, since you're relying on the Microsoft Jet database driver.
The Microsoft Jet database driver can be rather buggy, especially with dates.
It's not possible to create or modify Excel files (a note in the PyPyODBC exceltests.py file says: I have not been able to successfully create or modify Excel files.). Your question seems to indicate that you're only interested in reading files, though, so hopefully this will not be a problem.
I just figured out that it wasn't actually the problem with the access time but I created an object in the same step. Now, by creating the object separately everything works fast and nice.
I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python & Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing them in the new one.
Problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx and I didn't find any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx use?
Is there a python module for directly reading from the WEB-INF/data/*.dbx files?
Is there some external tool for dumpint the WEB-INF/data/*.dbx to an ascii format that will be parsable by python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (assuming that all important data can be scraped)
Thanks!
The ".dbx" suffix has been used by various softwares over the years so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy java app (or the relevant doc or ask the author etc).
wrt/ scraping, it's probably going to be a lot of a pain for not much results, depending on the app.
A friend of mine has asked me to write a quick Python script.
He has a small SQLite database (3 tables) and has to copy a bunch of data to it from an excel spreadsheet. The spreadsheet only has 2 data fields but alot of rows of data.
He asked if I would write a quick Python script to transfer the ss data to the db to so he doesn't have to spend a crapload of time manually copying it. I told him that I would do my best.
My question is: Where do I start? what do I need to research for this? Does anyone know if there is a pre-existing module to do this? Im trying to research this myself, but haven't come up with anything concrete yet and am not sure of any other search terms to use to narrow down my search.
Im just hoping someone wont mind giving my some guidance in the right direction.
Blessings and thanks
F
Do you really need Python?
SQLite Database Browser is a freeware, public domain, open source visual tool used to create, design and edit database files compatible with SQLite. Controls and wizards are available for users to:
Import and export tables from/to CSV files
If the file is simple enough (no commas in the content), you can import directly from SQLite:
For simple CSV files, you can use the SQLite shell to import the file into your SQLite database.
If you need to import a complex CSV file and the SQLite shell doesn't handle it, you may want to try a different front end, such as SQLite Database Browser.
is it possible to set up tables for Mysql in Python?
Here's my problem, I have bunch of .txt files which I want to load into Mysql database. Instead of creating tables in phpmyadmin manually, is it possible to do the following things all in Python?
Create table, including data type definition.
Load many files one by one. I only know this LOAD DATA LOCAL INFILE command to load one file.
Many thanks
Yes, it is possible, you'll need to read the data from the CSV files using CSV module.
http://docs.python.org/library/csv.html
And the inject the data using Python MySQL binding. Here is a good starter tutorial:
http://zetcode.com/databases/mysqlpythontutorial/
If you already know python it will be easy
It is. Typically what you want to do is use an Object-Retlational Mapping library.
Probably the most widely used in the python ecosystem is SQLAlchemy, but there is a lot of magic going on in it, so if you want to keep a tighter control on your DB schema, or if you are learning about relational DB's and want to follow along what the code does, you might be better off with something lighter like Canonical's storm.
EDIT: Just thought to add. The reason to use ORM's is that they provide a very handy way to manipulate data / interface to the DB. But if all you will ever want to do is to do a script to convert textual data to MySQL tables, than you might get along with something even easier. Check the tutorial linked from the official MySQL website, for example.
HTH!