Parsing specific keywords in Select Statements and formatting - python

I have a sample select statement:
Select D.account_csn, D.account_key, D.industry_id, I.industry_group_nm, I.industry_segment_nm From ecs.DARN_INDUSTRY I JOIN ecs.DARN_ACCOUNT D
ON I.SRC_ID=D.INDUSTRY_ID
WHERE D.ACCOUNT_CSN='5070000240'
I would like to parse the select statement into separate files. The first file is named ecs.DARN_INDUSTRY
and its contents should look like this:
industry_group_nm
industry_segment_nm
Similarly, another file is named ecs.DARN_ACCOUNT and its contents look like this:
account_csn
account_key
industry_id
How do I do this in Bash or Python?

I doubt you will find a truly simple answer (maybe someone can prove otherwise). However, you might find python-sqlparse useful.
Parsing general SQL statements is complicated, and it is difficult to guess exactly what you are trying to accomplish. However, I think you are trying to extract the tables and their corresponding column references via SQL parsing, in which case, look at this question, which asks essentially that very thing.
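For the specific query shape shown above, a minimal pure-Python sketch (no sqlparse, and assuming every selected column is alias-qualified and every table is schema-qualified) might look like this:

import re
from collections import defaultdict

sql = """Select D.account_csn, D.account_key, D.industry_id, I.industry_group_nm, I.industry_segment_nm
From ecs.DARN_INDUSTRY I JOIN ecs.DARN_ACCOUNT D
ON I.SRC_ID=D.INDUSTRY_ID
WHERE D.ACCOUNT_CSN='5070000240'"""

# Map each alias to its schema-qualified table name from the FROM/JOIN clause.
from_clause = re.search(r"\bfrom\s+(.*?)(?:\bwhere\b|$)", sql, re.I | re.S).group(1)
aliases = {alias: table for table, alias in re.findall(r"(\w+\.\w+)\s+(\w+)", from_clause)}

# Group the selected columns by their alias prefix.
select_list = re.search(r"\bselect\s+(.*?)\s+from\b", sql, re.I | re.S).group(1)
columns = defaultdict(list)
for alias, column in re.findall(r"(\w+)\.(\w+)", select_list):
    columns[alias].append(column)

# Write one file per table, one column name per line.
for alias, table in aliases.items():
    with open(table, "w") as out:
        out.write("\n".join(columns[alias]) + "\n")

This produces the two files ecs.DARN_INDUSTRY and ecs.DARN_ACCOUNT with exactly the column lists shown in the question. Anything beyond this shape (subqueries, unqualified columns, quoted identifiers) really does call for a parser like sqlparse.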

Here is a working awk command:
awk 'NR==1{
    gsub(/^.*\./,"",$5); gsub(/^.*\./,"",$6); gsub(/.$/,"",$5)
    printf $5"\n"$6"\n" > "DARN_INDUSTRY"
    gsub(/^.*\./,"",$2); gsub(/^.*\./,"",$3); gsub(/^.*\./,"",$4)
    gsub(/.$/,"",$2); gsub(/.$/,"",$3); gsub(/.$/,"",$4)
    printf $2"\n"$3"\n"$4"\n" > "DARN_ACCOUNT"
}' file
Explanation:
gsub(/^.*\./,"",$5) removes all the characters up to the last . symbol in column number 5.
printf $5"\n"$6"\n" > "DARN_INDUSTRY" redirects the output of the printf command to a file named DARN_INDUSTRY.
gsub(/.$/,"",$4) removes the last character in column 4 (the trailing comma).

Related

What is the equivalent of the Perl DB_File module in Python?

I was asked by my supervisor to convert some Perl scripts into Python. I'm baffled by a few lines of code, and I am also relatively inexperienced with Python. I'm an IT intern, so this is something of a challenge.
Here are the lines of code:
my %sybase;
my $S= tie %sybase, "DB_File", $prismfile, O_RDWR|O_CREAT, 0666, $DB_HASH or die "Cannot open: $!\n";
$DB_HASH->{'cachesize' } = $cache;
I'm not sure what the equivalent of this statement is in Python. DB_File is a Perl module. DB_HASH is a database type that allows arbitrary keys/values to be stored in a data file, at least according to the Perl documentation.
After that, the next lines of code also have me stumped on how to convert them to Python:
$scnt=0;
while(my $row=$msth->fetchrow_arrayref()) {
$scnt++;
$srow++;
#if ($scnt <= 600000) {
$S->put(join('#',@{$row}[0..5]),join('#',@{$row}[6..19]));
perf(500000,'sybase') ;#if $VERBOSE ;
# }
}
I'll probably use fetchall() in Python to store the entire result set, then work through it row by row. But I'm not sure how to implement the join() calls correctly in Python, especially since these lines slice the row by an index range ([0..5]). It also seems to write the output to the data file (see the put() call). I'm not sure what perf() does either. Can anyone help me out here?
I'd appreciate any kind of help here. Thank you very much.
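Without knowing what perf() does (presumably a local progress-reporting helper in the original script, not a standard function), a rough Python analogue of the DB_File tie is the standard-library dbm module, which is also an on-disk key/value store. A minimal sketch, assuming prismfile is the data-file path and msth is a DB-API cursor, as in the Perl:

import dbm  # stdlib on-disk key/value store, the closest analogue to DB_File

# Open (or create) the data file; keys and values must be str or bytes.
with dbm.open(prismfile, "c") as db:
    scnt = 0
    for row in msth.fetchall():  # msth: your DB-API cursor (placeholder)
        scnt += 1
        # Perl's @{$row}[0..5] is an inclusive slice, so row[0:6] in Python.
        key = "#".join(str(v) for v in row[0:6])
        value = "#".join(str(v) for v in row[6:20])  # @{$row}[6..19]
        db[key] = value

Note that Perl's 0..5 range includes both ends, so it maps to Python's row[0:6], not row[0:5].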

Python parsing update statements using regex

I'm trying to find a regex in Python that will be able to handle most of the UPDATE queries that I throw at it from my DB. I can't use sqlparse or any other library that might be useful for this; I can only use Python's built-in modules, or cx_Oracle in case it has a method I'm not aware of that could do something like this.
Most update queries look like this:
UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-202023:59:59','DD-MON-YYYYHH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;
Most update queries I use are a variation of these types of updates. The output has to be a list including each separate SQL keyword (UPDATE, SET, WHERE), each separate update assignment (e.g. COLUMN_NAME=2), and the final identifier (CODE=9999).
Ideally, the result would look something like this:
list = ['UPDATE', 'TABLE_NAME', 'SET', 'COLUMN_NAME=2', 'OTHER_COLUMN=("31-DEC-2020 23:59:59","DD-MON-YYYY HH24:MI:SS")', "COLUMN_STRING='Hello, thanks for your help'", 'UPDATED_BY=-100', 'WHERE', 'CODE=9999']
Initially I tried doing this with string.split(), splitting on spaces, but with slightly more complex queries like the one above, the split method doesn't deal well with string updates such as the one in COLUMN_STRING or those in OTHER_COLUMN, due to the blank spaces in those values.
Let's use the shlex module:
import shlex
test="UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-202023:59:59','DD-MON-YYYYHH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;"
t=shlex.split(test)
Up to here we still haven't gotten rid of the comma delimiters and the trailing semicolon, so maybe we can do this:
# Reassigning the loop variable would not update the list, so rebuild it
# instead; rstrip trims any trailing ',' or ';' from each token.
t = [token.rstrip(',;') for token in t]
If we print every element of that list, we'll get:
UPDATE
TABLE_NAME
SET
COLUMN_NAME=2
OTHER_COLUMN=to_date(31-DEC-202023:59:59,DD-MON-YYYYHH24:MI:SS)
COLUMN_STRING=Hello, thanks for your help
UPDATED_BY=-100
WHERE
CODE=9999
Not a proper generic answer, but it serves the purpose, I hope.
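Since the question asked for a regex specifically, the same effect is possible with re alone, under the assumption that string literals are single-quoted and contain no escaped quotes inside; this is a sketch, not a general SQL tokenizer:

import re

test = "UPDATE TABLE_NAME SET COLUMN_NAME=2, OTHER_COLUMN=to_date('31-DEC-202023:59:59','DD-MON-YYYYHH24:MI:SS'), COLUMN_STRING='Hello, thanks for your help', UPDATED_BY=-100 WHERE CODE=9999;"

# Match runs of non-whitespace, treating 'single-quoted strings' as atomic
# units, then strip the delimiter commas and the trailing semicolon.
tokens = [tok.rstrip(",;") for tok in re.findall(r"(?:[^\s']|'[^']*')+", test)]

Unlike shlex, this keeps the quotes around string values, which is actually closer to the ideal output shown in the question.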

Number of lines added and deleted in files using gitpython

How do I get/extract the number of lines added and deleted?
(Just like we do using git diff --numstat.)
repo_ = Repo('git-repo-path')
git_ = repo_.git
log_ = git_.diff('--numstat','HEAD~1')
print(log_)
prints the entire output (lines added/deleted and file names) as a single string. Can this output format be modified or changed so as to extract the useful information?
Output format: num(added) num(deleted) file-name
For all files modified.
If I understand you correctly, you want to extract data from your log_ variable and then re-format and print it? If that's the case, then I think the simplest way is with a regular expression:
import re
for line in log_.split('\n'):
    m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
    if m:
        print("{}: rows added {}, rows deleted {}".format(m[3], m[1], m[2]))
The exact output you can of course modify any way you want, once you have the data in a match m. Getting the hang of regular expressions may take a while, but they can be very helpful for small scripts.
However, be advised: regexes tend to be write-only code and can be very hard to debug. For extracting small parts like this, though, they work very well.
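If you want the numbers as data rather than printed text, the same match can feed a list of tuples. Note that binary files appear in --numstat output with "-" instead of counts, so the \d+ pattern silently skips them:

import re

stats = []  # (added, deleted, filename) per modified file
for line in log_.split('\n'):
    m = re.match(r"(\d+)\s+(\d+)\s+(.+)", line)
    if m:
        stats.append((int(m[1]), int(m[2]), m[3]))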

Priority order of a lookup table in Python: how is it read?

Folks,
I have this lookup table, and a Python script that looks for these strings at the beginning of the token that sits between the "_" separators in a string.
For example, for Accc_x09vbbb_Bcdddd, the script should identify x09vbbb.
However, I have a question.
Is Python's internal logic going to use first come, first served, i.e. identify "noedoe" before "noe" since it comes earlier in the lookup table, or not? Basically, will it use the shortest string in the lookup table, or some other search order?
['xc0', 'x09', 'xc7', 'xz9', 'xz0', 'xx0', 'xx9', 'xga', 'xgb', 'xg9', 'cn7',
'x8x', 'noedoe', 'noeprf', 'noegtdon', 'noegtrf', 'noegtgrf', 'hollvc',
'holrf', 'holpg', 'holpll', 'holthm', 'holp00f', 'holbp00f', 'holary',
'pgt', 'hol', 'noe']
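Without the script itself this is a guess, but if the lookup is a plain loop over that list, Python checks the entries strictly first come, first served, in list order; there is no implicit shortest- or longest-match rule. A minimal illustration (match_prefix is a hypothetical stand-in for whatever the script does):

prefixes = ['xc0', 'x09', 'xc7', 'xz9', 'xz0', 'xx0', 'xx9', 'xga', 'xgb',
            'xg9', 'cn7', 'x8x', 'noedoe', 'noeprf', 'noegtdon', 'noegtrf',
            'noegtgrf', 'hollvc', 'holrf', 'holpg', 'holpll', 'holthm',
            'holp00f', 'holbp00f', 'holary', 'pgt', 'hol', 'noe']

def match_prefix(token):
    # Entries are tried in the order they appear in the list, so 'noedoe'
    # wins over 'noe' only because it is listed first.
    for p in prefixes:
        if token.startswith(p):
            return p
    return None

print(match_prefix("Accc_x09vbbb_Bcdddd".split("_")[1]))  # -> 'x09'

If you want longest-match regardless of table order, iterate over sorted(prefixes, key=len, reverse=True) instead.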

CSV credential Python one-liner

I have a csv file like this :
name,username
name2,username2
etc...
And I need to extract each column into a list so I can create accounts (it's an admin script).
I am hoping the result would look like this :
NAMES=( name name2 )
MAILS=( username username2 )
LENGTH=3 # the number of lines in the csv file
I would like to do it in Python (because I use it elsewhere in my script and would like to convert my colleagues to the dark side), except that I am not really a Python user...
Something like this would do the trick (I assume) :
NAMES=( $(echo "$csv" | pythonFooMagic) )
MAILS=( $(echo "$csv" | python -c "import sys,csv; pythonFooMagic2") )
LENGTH=$(echo "$csv" | pythonFooMagic3)
I kind of found tutorials that do it across several lines, but glued together it was ugly.
There must be some cool way to do it. Otherwise I will resign myself to using sed... Any ideas?
EDIT : ok bad idea, for future reference, see the comments
You could use a temporary file, like this:
tmpfile=$(mktemp)
# Python script produces the variables you want
pythonFooMagic < csv > $tmpfile
# Here you take the variables somehow. For example...
source $tmpfile
rm $tmpfile
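For completeness, the Python side (here called csv_to_bash.py, a hypothetical name standing in for pythonFooMagic) could be as small as this sketch, assuming the values contain no spaces or shell metacharacters:

# csv_to_bash.py
import csv
import sys

rows = list(csv.reader(sys.stdin))
print("NAMES=( {} )".format(" ".join(r[0] for r in rows)))
print("MAILS=( {} )".format(" ".join(r[1] for r in rows)))
print("LENGTH={}".format(len(rows)))

With bash process substitution you can even skip the temporary file: source <(python csv_to_bash.py < users.csv)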
