I have been learning about the AS/400 and IBM iSeries lately, and unfortunately I have extremely limited assistance in this small business setting -- I am the IT department. The AS/400 system is not similar to anything I've used before, and it's quite obscure in my opinion.
Right now, the company I work for uses a Food Distribution Application on the AS/400 to enter orders, invoice, query inventory, maintain inventory, etc. Through the Food Distribution Application one can create custom queries using files whose names carry the "SH." prefix. For example, file "SH.ITEM" returns the inventory items in our warehouse. At the Advanced Sales Menu you type QRY and it lets you proceed to create a query using these files.
Instead of using the Food Distribution Application for queries, I'd like to be able to read these "SH." files directly. Currently, I am using SQL Squirrel (against DB2) to read the non-"SH."-prefixed schemas/tables, but of course I can't read the "SH." files that way. User-defined queries can be queried through DB2, but the "SH." files cannot. The database name is QS36F. The files and user-defined queries can be found via FTP in the folder server_root/QS36F.
How can I read the information being pulled by the "SH." files using a programming language? My language of choice is Python, but pseudocode or recommendations to make this more seamless would be appreciated. The end goal is to use the AS/400 data to update other database systems.
Screenshots: Food Distribution Menu; Creating Query for SH.ITEM; SH.ITEM Report; FTP Listing of server_root/QS36F.
I'm not familiar with SQL Squirrel but it's not clear why you can't read the "SH." prefixed files.
One possibility is that the "." character is the default separator between library (schema) and file (table) names when SQL naming is in effect, so a name like SH.ITEM gets parsed as library SH, file ITEM.
Try using quotes around the name, for example:
SELECT * FROM QS36F."SH.ITEM"
Also if you need access to specific members (partitions) you will need to create an alias to query against:
CREATE ALIAS QTEMP.M131204 FOR QS36F."SH.ITEM" (M131204);
SELECT * FROM QTEMP.M131204;
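If you eventually want to pull this data from Python rather than a SQL client, here is a minimal sketch using pyodbc. It assumes the IBM i Access ODBC driver is installed; the system name, user, and password are placeholders.
import pyodbc

# Assumption: the IBM i Access ODBC driver is installed (older installs may
# register it as "iSeries Access ODBC Driver"). SYSTEM/UID/PWD are placeholders.
conn = pyodbc.connect(
    "DRIVER={IBM i Access ODBC Driver};SYSTEM=my400.example.com;UID=MYUSER;PWD=secret"
)
cur = conn.cursor()

# Quote the file name so the "." in SH.ITEM is not treated as a
# library/file separator.
cur.execute('SELECT * FROM QS36F."SH.ITEM"')
for row in cur.fetchall():
    print(row)

conn.close()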
Due to the use of the QS36F library it's also possible these are not 'externally described files'. That means the tables may have only one big field to hold all of the individual field data and it is broken apart in the programs themselves.
DBVisualizer is a great third party SQL tool that can access a multitude of databases.
QS36F is the library used to hold all the files used by S/36 emulated programs, so they are probably program-defined. The use of a prefix like SH. was a way to group similar files on the S/36, because files resided outside of any library.
There was a query product in the S/36. I know next to nothing about it, but I will bet the query capability of your application is built on it.
Related
My place of work receives sets of pipe-delimited files from many different clients, which we import into tables on our MS SQL 2008 R2 server for later processing using Visual Studio Integration Services projects -- specifically, Data Flow Tasks containing Flat File Source to OLE DB Destination steps. Each data flow task has columns specifically mapped to columns in our tables, but the chance of a column addition in any file from any client is relatively high (and we are rarely warned that there will be changes), which is becoming tedious because I currently need to...
Run a Python script that uses pyodbc to grab the columns contained in the destination tables and compare them to the source files, to find out whether the columns differ (a stripped-down sketch of this comparison appears after this list)
Execute the necessary SQL to add the columns to the destination tables
Open the corresponding VS Solution, refresh the columns in the flat file sources that have new columns and manually map each new column to the newly created columns in the OLE DB Destination
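For context, here is a stripped-down sketch of what that comparison script does; the connection string, table name, and file path below are placeholders rather than our real values.
import pyodbc

# Placeholders -- substitute your own server, database, destination table, and file.
CONN_STR = "DRIVER={SQL Server};SERVER=myserver;DATABASE=Staging;Trusted_Connection=yes"
TABLE = "ClientOrders"
FLAT_FILE = r"C:\incoming\client_orders.txt"

# Columns currently defined on the destination table.
conn = pyodbc.connect(CONN_STR)
cur = conn.cursor()
cur.execute("SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?", TABLE)
table_cols = {row.COLUMN_NAME for row in cur.fetchall()}
conn.close()

# Columns named in the flat file's header row (pipe delimited).
with open(FLAT_FILE) as f:
    file_cols = set(f.readline().rstrip("\n").split("|"))

new_cols = file_cols - table_cols
if new_cols:
    print("Columns in the file but not in the table:", sorted(new_cols))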
We are quickly getting more and more sites that I have to do this with, and I desperately need to find a way to automate this. The VS project can easily be automated if we could depend on the changes being accounted for, but as of now this needs to be a manual process to ensure we load all the data properly. Things I've thought about but have been unable to execute...
Using an XML parser - combined with the output of the python script mentioned above - to append new column mappings to the source/destination objects in the VS Package.dtsx.xml. I hit a dead end when I could not find out more information about creating a valid "DTS:DTSID" for new column mapping, and the file became corrupted whenever I edited it. This also seemed a very unstable option
Finding any built-in event handler in Visual Studio to throw an error if the flat file has a new, un-mapped column - I would be fine with this as a solution because we could confidently schedule the import projects to run automatically and only worry about changing the mapping for projects that failed. I could not find a built-in feature that does this. I'm also aware I could do this with a python script similar to the one mentioned above that fails if there are differences, but this would be extremely tedious to implement due to file-naming conventions and the fact that there are 50+ clients with more on the way.
I am open to any type of solution, even if it's just an idea. As this is my first question on Stack Overflow, I apologize if this was asked poorly and ask for feedback if the question could be improved. Thanks in advance to those that take the time to read!
Edit:
@Larnu stated that SSIS by default throws an error when unrecognized columns are found in the files. This however does not currently happen with our Visual Studio Integration Services projects, and our team would certainly resist a conversion of all packages to SSIS at this point. It would be wonderful if someone could provide insight as to how to ensure the package would fail if there were new columns - in VS. If this isn't possible, I may have to pursue the difficult route as mentioned by @Dave Cullum, though I don't think I get paid enough for that!
Also, talking sense into the clients has proven to be impossible - the addition of columns will always be a crapshoot!
Using a script task you can read the first line of your file and count how many columns it contains (the number of pipe delimiters plus one):
using (System.IO.StreamReader sr = new System.IO.StreamReader(path))
{
    string line = sr.ReadLine();
    // Number of "|" delimiters plus one gives the column count.
    int ColumnCount = line.Length - line.Replace("|", "").Length + 1;
}
I assume you know how to set that to a variable.
Now add an Execute SQL task and store the result as another variable:
Select Count(*)
from INFORMATION_SCHEMA.columns
where TABLE_NAME = [your destination table]
Now, coming out of the Execute SQL task, add a conditional precedence arrow and compare the two numbers. If they are equal, continue your process. If they are not equal, go ahead and send an email (or some other type of notification).
All of the above is in the context of an ETL process. I have a git repository full of SQL files. I need to put all those SQL files (once pulled) into a SQL table with 2 columns, name and query, so that I can access each file later on using a SQL query instead of loading it from its file path. How can I do this? I am free to use whatever tool I want, but I only know Python and Pentaho.
Maybe my assumption that this method would require less computation time than simply reading the pulled file from the hard drive is wrong. In that case, let me know.
You can first define the table you're interested in with something along the lines of the following (you did not mention which database you are using):
CREATE TABLE queries (
name TEXT PRIMARY KEY,
query TEXT
);
After creating the table, you can use perhaps os.walk to iterate through the files in your repository, and insert both the contents (e.g. file.read()) and the name of the file into the table you created previously.
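A minimal sketch of that, using SQLite purely for illustration (any database with a Python driver would work the same way) and assuming the repository has been pulled to ./repo:
import os
import sqlite3

conn = sqlite3.connect("queries.db")
conn.execute("CREATE TABLE IF NOT EXISTS queries (name TEXT PRIMARY KEY, query TEXT)")

# Walk the pulled repository and load every .sql file into the table.
for root, dirs, files in os.walk("./repo"):
    for filename in files:
        if filename.endswith(".sql"):
            with open(os.path.join(root, filename)) as f:
                conn.execute(
                    "INSERT OR REPLACE INTO queries (name, query) VALUES (?, ?)",
                    (filename, f.read()),
                )

conn.commit()
conn.close()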
It sounds like you're trying to solve a different problem though. It seems like you're interested in speeding up some process, because you asked about whether accessing queries using a table would be faster than opening a file on disk. To investigate that (separate!) question further, see this.
I would recommend that you profile the existing process you are trying to speed up using profiling tools. After that, you can see whether IO is your bottleneck. Otherwise, you may do all of this work without any benefit.
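For example, the standard library's cProfile gives a quick read on where the time goes; run_etl here is a placeholder for whatever function drives your current process.
import cProfile

# Sort by cumulative time to see whether file IO or query execution dominates.
cProfile.run("run_etl()", sort="cumtime")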
As a side note, if you are looking up queries in this way, it may indicate that you need to rearchitect your application. Please consider that possibility as well.
What method do you use to version-control your database? I've committed all our database tables as separate .sql scripts to our respository (mercurial). In that way, if any member of the team makes a change to the employee table, say, I will immediately know which particular table has been modified when I updated my repository.
Such a method was described in: What are the best practices for database scripts under code control.
Presently, I'm writing a Python script to execute all the .sql files within the database folder; however, the dependencies created by foreign-key constraints mean we can't just run the .sql files in any arbitrary order.
The Python script will generate a file listing the order in which to execute the .sql files, and will then execute them in the order in which they appear in that tableorder.txt file. A table's script cannot be executed until the scripts for its foreign-key tables have been executed, for example:
tableorder.txt
table3.sql
table1.sql
table7.sql and so on
Already, I have generated the dependency list for each table, from code, by parsing the result of MySQL's SHOW CREATE TABLE command. The dependency list may look like this:
tblstate: tblcountry //tblcountry.sql must be executed before tblstate.sql etc
tblemployee: tbldepartment, tblcountry
To generate the content of tableorder.txt, I will need an algorithm that looks roughly like this (a depth-first walk that writes each table's dependencies before the table itself):
written = set()

def print_table(table):
    if table in written:              # already written to tableorder.txt
        return
    for dependency in table.dependencies:
        print_table(dependency)       # print dependencies first
    print_to_tableorder(table)        # then the table itself
    written.add(table)

for table in database:
    print_table(table)
As you can imagine, this involves a lot of recursion. I'm beginning to wonder whether it's worth the effort, or whether there's a tool out there already. What tool (or algorithm) is there to generate a list of the order in which to execute separate .sql files for tables and views, taking dependencies into consideration? Is it better to version-control a separate .sql file for each table/view, or to version-control the entire database in a single .sql file? I will appreciate any response, as this has taken so many days. Thanks.
I do not use MySQL, but rather SQL Server, however, this is how I version my database:
(This is long, but in the end I hope the reasoning for me abandoning a simple schema dump as the primary way to handle database versioning is made apparent.)
I make a modification to the schema and apply it to a test database.
I generate delta change scripts and a dump of the schema after said scripts. (I use ApexSQL, but there are likely MySQL-specific tools to help.)
The delta change scripts know how to go from the current to target schema version: ALTER TABLE existing, CREATE TABLE new, DROP VIEW old .. Multiple operations can occur within the same .SQL file as the delta is of importance.
The dump of the schema is of the target schema version: CREATE TABLE a, CREATE VIEW b .. there is no "ALTER" or "DROP" here, because it is just a snapshot of the target schema. There is one .SQL file per database object as the schema is of importance.
I use RoundhousE to apply the delta change scripts. (I do not use the RoundhousE "anytime script" feature as this does not correctly handle relationships.)
I learned the hard way that applying database schema changes cannot be done reliably without a comprehensive step-by-step plan and, similarly (as noted in the question), the order of relationship dependencies is important. Just storing the "current" or "end" schema is not sufficient. There are many changes that cannot be retroactively applied A->C without knowing A->B->C, and some intermediate changes (B) might involve migration logic or corrections. SQL schema change scripts can capture these changes and allow them to be "replayed".
However, just saving the delta scripts does not by itself provide a "simple view" of the target schema. This is why I also dump the whole schema as well as the change scripts and version both. The schema dump could, in theory, be used to construct the database, but due to relationship dependencies (the very kind noted in the question) that may take some work, and I do not use it as part of an automated schema-restore approach; still, keeping the schema dump in Hg version control allows quick identification of changes and viewing of the target schema at a particular version.
The change deltas thus move forward through the revisions while the schema dump provides a view at the current revision. Because the change deltas are incremental and forward-only it is important to keep the branch dealing with these changes "clean", which is easy to do with Hg.
In one of my projects I am currently at database change number 70 - and happy and productive! - after switching to this setup. (And these are deployed changes, not just development changes!)
Happy coding.
You can use sqitch. Here is a tutorial for MySql, but it is actually database agnostic.
Changes are implemented as scripts native to your selected database engine... Database changes may declare dependencies on other changes—even on changes from other Sqitch projects. This ensures proper order of execution, even when you’ve committed changes to your VCS out-of-order... Change deployment is managed by maintaining a plan file. As such, there is no need to number your changes, although you can if you want. Sqitch doesn’t much care how you name your changes... Up until you tag and release your application, you can modify your change deployment scripts as often as you like. They’re not locked in just because they’ve been committed to your VCS. This allows you to take an iterative approach to developing your database schema. Or, better, you can do test-driven database development.
I'm not sure how well this answers your question, but I tend to just use mysqldump (part of the standard installation). This gives me the sql to create the tables and populate them, effectively serializing the database. Example:
> mysqldump -u username -p yourdatabase > database_dump.sql
To load a database from a dump sql file:
mysql -u username -p -e "source /path/to/database_dump.sql"
To further answer your question, I would version control each table separately only if there are multiple people working on the database in such a way that conflicts are likely to occur with just a single dump being version controlled. I've never hit a project where this is the case (the database tends to be one of the least volatile portions of the system after the initial phases of the project), so I just version control the database dump as a whole rather than each table individually.
I understand the problem, but you cannot version-control databases with git as if they were static code, because a database does not work the same way; and generating separate files for each programmer is not very useful either since, as you say, they collide or lose traceability. I started with a setup similar to yours, but it became a huge problem when trying to keep control over versions and over collisions between programmers. The solution I arrived at was to build a project organized as follows:
Web login (user / password)
Administration of users and profiles defining what each user can do
Commit area: the plain SQL command is sent to the database
Example: ALTER TABLE users ADD por2 varchar(255);
The commit creates traceability in the control system itself, and the resulting structure is sent to git (starting from the initial structure) for change control
Change control area: a view of the commit itself plus the structure generated after the change
Server configuration area: the server is configured and a GitLab or GitHub repository is attached to it, to carry version control in a more visual way without problems for the developers
Backup restoration area: send a backup and keep track of each version ("the result of the change to the database structure")
This is the best approach I found without leaving the job to one specific person. I hope it helps. I built it in Python with Django, which saves a lot of programming on the administrative side. Greetings.
We have a whole lot of code and queries in a whole bunch of folders on a Linux box. Whenever I have to find a script, I do a fgrep -ircl --include=*.{sql,py,sh} "Keyword" * .
I am planning on creating a simple search interface (web) which lets you search for a keyword, file type and displays the location of the file and an excerpt from the resulting file. Lucene can be a good candidate I guess but I don't want to create a copy of all my files just for this purpose.
I am planning on indexing the files using a Python script every day at off hours. More like Google desktop I guess but for web (cross-platform availability).
What do you guys suggest is the best way of accomplishing this task?
I wrote a perl script waaaaay back when to provide a web interface result, still works on my deprecated blackbeltvb.com website if you want to look. It did a live search though, not indexed, and without excerpts.
I also wrote the search for wugnet.com that did ranked results and excerpts, and architected the search that's now in QB Desktop. In your case, I would take that approach - just have a cron job that adds new or updated scripts/files into a database, one big text field, with other fields containing metadata like filenames and types. Then have a web interface into that DB, searching with:
select * from data where keyword like '%word1%' and keyword like '%word2%' (or, instead of and) etc...
There's a FAQ on blackbeltvb.com that shows how to construct a SQL search for ranked keyword results, e.g. "all keywords found", "some found" etc...
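A rough sketch of that cron-driven indexer in Python, using SQLite purely as an example backend; the root folder, file extensions, and table layout below are assumptions, not requirements.
import os
import sqlite3

EXTENSIONS = (".sql", ".py", ".sh")   # assumed file types of interest
ROOT = "/path/to/scripts"             # placeholder root folder

conn = sqlite3.connect("search_index.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS data (path TEXT PRIMARY KEY, filetype TEXT, keyword TEXT)"
)

# One big text field per file, plus metadata (path and type), as described above.
for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        if name.endswith(EXTENSIONS):
            full = os.path.join(dirpath, name)
            with open(full, errors="replace") as f:
                contents = f.read()
            conn.execute(
                "INSERT OR REPLACE INTO data (path, filetype, keyword) VALUES (?, ?, ?)",
                (full, os.path.splitext(name)[1], contents),
            )

conn.commit()
conn.close()
# The web interface can then run queries like:
#   SELECT path FROM data WHERE keyword LIKE '%word%';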
Using Python, Xapian is the only solution you should consider. It is more "bare metal" than Lucene, but it has first-class support for Python bindings to the native C++ implementation, and it is way smaller and faster than Lucene with real-world data sets -- by that I mean LARGE data sets.
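A tiny indexing-and-search sketch along the lines of Xapian's own Python examples; the index directory and file path are placeholders, and you should check the bindings documentation for your installed version.
import xapian

db = xapian.WritableDatabase("indexdir", xapian.DB_CREATE_OR_OPEN)

# Index one document: the file path is stored as the document data,
# and the file contents are indexed as stemmed terms.
termgen = xapian.TermGenerator()
termgen.set_stemmer(xapian.Stem("en"))
doc = xapian.Document()
termgen.set_document(doc)
termgen.index_text(open("/path/to/script.sql").read())
doc.set_data("/path/to/script.sql")
db.add_document(doc)

# Search the index for a keyword and print the matching file paths.
parser = xapian.QueryParser()
parser.set_stemmer(xapian.Stem("en"))
parser.set_database(db)
query = parser.parse_query("keyword")
enquire = xapian.Enquire(db)
enquire.set_query(query)
for match in enquire.get_mset(0, 10):
    print(match.document.get_data())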
A primary goal of a project I plan to bid on involves creating a Microsoft Access database using python. The main DB backend will be postgres, but the plan is to export an Access image.
This will be a web app that will take input from the user, run it through a black box, and output the results as an Access db. The web app will be built on a Linux server.
I have a few related questions:
Is there a reliable library or module that can be used?
What has your experience been using Access and python?
Any tips, tricks, or must avoids I need to know about?
Thanks :)
Could you use an sqlite database instead?
edit:
If it HAS to be on linux and it HAS to be to MS Access, then I'm pretty sure this is your only choice, but it costs $1,550.
You are either going to have to shell out the money, or convince the client to change one of the other two parameters. Personally, I would push to change the database file to sqlite.
Of course you could always code up your own database driver, but shelling out the $1,550 would probably be worth it compared to the time that would take. mdbtools has been working on this for years and the project has been pretty much abandoned.
found it, kinda
Ok, so I just couldn't let this go and found that there is a java library called Jackcess that will write to MS Access mdb files on any platform that can run the jvm. Granted, it's java and not python, but maybe you could learn just enough java to throw an application together and execute it from python? Or just switch the whole app to java, whatever.
The various answers to the duplicate question suggest that your "primary goal" of creating an MS Access database on a linux server is not attainable.
Of course, such a goal is of itself not worthwhile at all. If you tell us what the users/consumers of the Access db are expected to do with it, maybe we can help you. Possibilities: (1) create a script and a (set of) file(s) which the user downloads and runs to create an Access DB (2) if it's just for casual user examination/manipulation, an Excel file may do.
If you know this well enough:
Python, its database modules, and ODBC configuration
then you should know how to do this:
open a database, read some data, insert it in to a different database
If so, then you are very close to your required solution. The trick is, you can open an MDB file as an ODBC datasource. Now: I'm not sure if you can "CREATE TABLES" with ODBC in an MDB file, so let me propose this recipe:
Create an MDB file with name "TARGET.MDB" -- with the necessary tables, forms, reports, etc. (Put some dummy data in and test that it is what the customer would want.)
Set up an ODBC datasource to the file "TARGET.MDB". Test to make sure you can read/write.
Remove all the dummy data -- but leave the table defs intact. Rename the file "TEMPLATE.MDB".
When you need to generate a new MDB file: with Python copy TEMPLATE.MDB to TARGET.MDB.
Open the datasource to write to TARGET.MDB. Create/copy required records.
Close the datasource, rename TARGET.MDB to TODAYS_REPORT.MDB... or whatever makes sense for this particular data export.
Would that work for you?
It would almost certainly be easier to do that all on Windows as the support for ODBC will be most widely available. However, I think in principle you could do this on Linux, provided you find the right ODBC components to access MDB via ODBC.
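A rough pyodbc sketch of that recipe on Windows; the table, column, and file names are invented for illustration, and the exact Access ODBC driver name varies by version.
import os
import shutil
import pyodbc

# Steps 4-6 of the recipe: copy the template, fill it, then rename it.
shutil.copyfile("TEMPLATE.MDB", "TARGET.MDB")

# "Microsoft Access Driver (*.mdb)" is the classic driver name; newer ACE installs
# register "Microsoft Access Driver (*.mdb, *.accdb)" instead.
conn = pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb)};DBQ=TARGET.MDB")
cur = conn.cursor()
cur.execute("INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)", ("value1", 42))
conn.commit()
conn.close()

os.rename("TARGET.MDB", "TODAYS_REPORT.MDB")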
You could export to XML using MS's officedata namespace. Access shouldn't have any trouble consuming that. You can provide a separate XSD schema, or encode types and relationships directly in the document tree. Here is a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns="urn:schemas-microsoft-com:officedata">
<Table1><Foo>0.00</Foo><Bar>2011-05-11T00:00:00.000</Bar></Table1>
<Table1><Foo>3.00</Foo><Bar>2011-05-07T00:00:00.000</Bar></Table1>
<Table2><Baz>Hello</Baz><Quux>Kitty</Quux></Table2>
</dataroot>
Googling "urn:schemas-microsoft-com:officedata" should turn up some useful hits.
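A small Python sketch that builds a document like the one above using only the standard library; the table and field names follow the example and should be adjusted to your schema.
import xml.etree.ElementTree as ET

rows = [("0.00", "2011-05-11T00:00:00.000"), ("3.00", "2011-05-07T00:00:00.000")]

root = ET.Element("dataroot", {"xmlns": "urn:schemas-microsoft-com:officedata"})
for foo, bar in rows:
    record = ET.SubElement(root, "Table1")
    ET.SubElement(record, "Foo").text = foo
    ET.SubElement(record, "Bar").text = bar

# Writes an XML declaration followed by the dataroot tree shown above.
ET.ElementTree(root).write("export.xml", encoding="UTF-8", xml_declaration=True)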
I would suggest moving the data into a Microsoft SQL Server database, then linking or importing the data into Access.
Could you create a self-extracting file to send to the Windows user who has Microsoft Access installed?
Include a blank .mdb file.
Dynamically build XML documents with tables, schema, and data.
Include an import executable that will take all of the XML docs and import them into the Access .mdb file.
It's an extra step for the user, but you get to rely on their existing drivers, software and desktop.
Well, it looks to me like you need a copy of VMware Server on the Linux box running Windows, a web service in the VM to write to Access, and communications to it from the main Linux box. You aren't going to find a means of creating an Access db on Linux. Calling it a requirement isn't going to make it technically possible.
http://adodb.sourceforge.net/ - installs on linux, written in php or python, connects to Access and PostgreSQL.
We've been using it for years, and it works beautifully.