How do I automate data extraction from Hive to a Windows desktop? - python

I have to log in to a bastion Linux host, then run kinit and beeline via pbrun, then sftp the CSV file to Windows.
Query sample:
"SELECT * FROM db.table WHERE id > 100"
Is there a Python script or tool to automate this?

You could put your query into a file, for example hive_script.sql,
and run it from the terminal:
hive -f hive_script.sql

I want to post my findings.
The most difficult part was figuring out expect + pbrun.
Because there are two interactive prompts, I had to pause for a second after the first one.
My expect code:
#!/usr/bin/expect -f
set timeout 300
set usr [lindex $argv 0];
set pwd [lindex $argv 1];
set query_file [lindex $argv 2];
spawn -noecho pbrun $usr
expect -re "Password:"
send "$pwd\r"
sleep 1
expect "Enter reason for this privilege access:"
send "test\r"
send "kinit -k -t /opt/Cloudera/keytabs/`whoami`.`hostname -s`.keytab `whoami`/`hostname -f`@YOUR_FQDN_NAME.NET\r"
send "beeline -u 'jdbc:hive2://bigdataplatform-your_dev.net:10000/;principal=hive/bigdataplatform-your_dev.net@YOUR_FQDN_NAME.NET;ssl=true' --outputformat=csv2 --verbose=false --fastConnect=true --silent=true -f $query_file\r"
expect "*]$\ " {send "exit\r"}
expect eof
Query:
select * from gfocnnsg_work.pytest LIMIT 1000000;
The rest is Python and paramiko: I create a Transport object, execute the expect script, and save its standard output on Windows.
Data access path:
Windows desktop ->
SSH ->
Linux login ->
pbrun service login ->
kinit ->
beeline ->
SQL ->
save output on Windows
Here's a Python script with details: hivehoney
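The paramiko part described above can be sketched as follows. This is a minimal sketch, not the author's actual script: the wrapper name hive_expect.sh, the host, and the paths are placeholders.

```python
def build_remote_command(user, password, query_file):
    """Command line for the expect wrapper (hive_expect.sh is a placeholder).
    Note: a password on a command line is visible in `ps`; this mirrors the
    expect script above and is not a security recommendation."""
    return "./hive_expect.sh %s %s %s" % (user, password, query_file)

def fetch_hive_csv(host, user, password, query_file, out_path):
    """SSH to the bastion, run the expect wrapper, save its stdout locally."""
    import paramiko  # third-party: pip install paramiko
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    try:
        _, stdout, _ = client.exec_command(
            build_remote_command(user, password, query_file))
        # Stream beeline's CSV output straight into a local file on Windows
        with open(out_path, "wb") as f:
            for chunk in iter(lambda: stdout.read(32768), b""):
                f.write(chunk)
    finally:
        client.close()
```

Streaming in chunks keeps memory flat even for the 1,000,000-row extract.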

Related

Running PowerShell command via Python gives an error

I'm trying to run this PowerShell command via Python:
sid = utils.execute_powershell(settings.D01_DC1_PORT,
                               settings.D01_USER,
                               settings.PASSWORD,
                               '(Get-ADForest).Domains | '
                               '%{Get-ADDomain -Server $_}| '
                               'select domainsid')
The port, the user and the password are all valid. If I run the same script in PowerShell I see values.
Yet, via Python I get this error:
'Unable to contact the server. This may be because this server does not exist, it is currently down, or it does not have the Active Directory Web Services running.'
What is wrong here?
If I change my script so that only one row "returns" from the query, the code passes:
sid = utils.execute_powershell(settings.D01_DC1_PORT,
                               settings.D01_USER,
                               settings.PASSWORD,
                               '(Get-ADForest).Domains | '
                               '%{Get-ADDomain -Server $_}| '
                               'select domainsid -First 1')

Not able to do SCP in SSH

Here I'm providing the complete details, along with the script:
I'm running my script on HUB1.
First, it asks the user for credentials: username, password, filename, path.
Sample snippet:
try:
    usrname = raw_input('Enter your credentials:')
    regex = r'[x][A-Za-z]{6,6}'
    sigMatch = re.match(regex, usrname)  # username & pattern matching
    sigMatch.group()
except AttributeError:
    print "Kindly enter your name"
    raise AttributeError
[...]
Immediately after that, I SSH from HUB1 as below:
ssh.load_system_host_keys()
ssh.connect('100.103.113.15', username='', password='')
ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("cd /home/dir;ls -lrt")
By doing SSH from HUB1 I can virtually log in to HUB2 and copy the file into the
desired directory (i.e., though I run my script on HUB1, via SSH I can pass
control to HUB2, execute commands on HUB2, and do the desired operations).
Now the challenge:
I need to copy the same file from HUB2 to HUB3 (the one I had copied from
HUB1) by running the same script on HUB1 itself. So I did scp inside the SSH session as below:
ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command("scp %s usrname@112.186.104.7:/home/dir/ABCdir" % (imageName))
Here it should prompt for the password ("Please enter your password"), but the
command above directs the prompt to the ssh_stderr channel. Because of this I'm
not able to copy the file successfully when I execute scp via
ssh.exec_command("").
Points to remember:
I don't want to run two scripts, one on HUB1 and another on HUB2 (for scp) -> ruled out.
My intention is to copy a file from HUB1 -> HUB2 -> HUB3, which are independent of each other.
I followed the above procedure, and in addition I tried a couple of methods:
adding an unknown host key (by doing this the in, out & err channels are open, but the file transfer doesn't occur);
invoking another script on HUB2 (I'm able to invoke scripts which do not require user input, but my purpose is to give user input, so I'm not able to trigger another script here).
I request possible ways to resolve the above problem.
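One possible approach, sketched here under the question's placeholder host and path rather than as a verified fix: request a PTY for the scp command so its password prompt is attached to the channel, then write the HUB3 password to stdin.

```python
def build_scp_command(image_name):
    # Same command as in the question; host and directory are its placeholders.
    return "scp %s usrname@112.186.104.7:/home/dir/ABCdir" % image_name

def scp_with_password(ssh, image_name, hub3_password):
    """Run scp on HUB2 over an existing paramiko SSHClient `ssh`.

    get_pty=True makes scp believe it has a terminal, so its password
    prompt can be answered by writing to the stdin channel.
    """
    stdin, stdout, stderr = ssh.exec_command(build_scp_command(image_name),
                                             get_pty=True)
    stdin.write(hub3_password + "\n")  # answer the password prompt
    stdin.flush()
    return stdout.read().decode()      # blocks until scp finishes
```

This keeps everything in the one script running on HUB1; the password for HUB3 can come from the same raw_input-style prompt as the other credentials.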

Running parameterized python script within bash script

I have a Python script that logs me in to a service. I do:
./login.py user@email.com 'pass'
in order to log in.
When I enter this command directly, I log in successfully. When I run the following script, the server returns 400.
PYAPIROOT="scriptpath/script"
PYLOGIN="./login.py"
LOGIN="user@email.com"
PASS="'pass'"
function login {
    echo -----------------------------
    echo
    cd $PYAPIROOT
    echo "Logging in "$LOGIN
    python "$PYLOGIN" "$LOGIN" "$PASS"
    echo $PYLOGIN $LOGIN $PASS
    echo -----------------------------
}
login
When I copy and run what is echo'ed I get 200.
Why can't I log in using my script?
I suspect it's the double-quoting:
PASS="'pass'"
Inside double quotes, the single quotes are literal characters, so your script sends 'pass' (quotes included) as the password. When you copy and run the echoed command manually, the shell strips the single quotes, which is why that works. Use this instead:
PASS="pass"
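You can see the difference from Python itself: with the double-quoted form, the inner quotes become part of the argument the program actually receives.

```python
import subprocess
import sys

# A tiny child program that just echoes back its first argument.
code = "import sys; print(sys.argv[1])"

for arg in ("pass", "'pass'"):
    out = subprocess.run([sys.executable, "-c", code, arg],
                         capture_output=True, text=True)
    print(repr(out.stdout.strip()))
# prints:
# 'pass'
# "'pass'"
```

The second run shows the quotes arriving inside the argument, which is exactly what login.py received from the bash script.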

Python Program rsync

I am struggling to write a Python program (on Ubuntu) "to get a file from a directly connected Linux machine without being prompted for a password".
Right now I am using the commands below in Python, but I want to supply the password in advance so it will not prompt me for it.
import os
os.system("echo 'hello world'")
os.system("rsync -rav pi@192.168.2.34:python ~/")
The IP address of the other Linux machine is: 192.168.2.34
Password: raspberry
Username: pi
You can achieve this with key-based authentication: you copy your public key to the other machine, after which SSH (and therefore rsync) will not prompt for a password. Here are the steps:
Execute ssh-keygen in your Ubuntu terminal.
Keep pressing Enter until something like this shows up:
The key's randomart image is:
+--[ RSA 2048]----+
| . .*|
| . + +o|
| + * + .|
| o E * * |
| S + + o +|
| o o o |
| . . . |
| |
| |
+-----------------+
After that, execute ssh-copy-id pi@192.168.2.34 and enter the password (raspberry, if that is the password for the other machine).
Now execute the Python script as normal and it won't prompt for a password.
import os
os.system("echo 'hello world'")
os.system("rsync -rav pi@192.168.2.34:python ~/")
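The key-exchange steps above can also be scripted. This is a sketch using the question's example target; ssh-copy-id still asks for the password once, interactively, the first time it runs.

```python
import os
import subprocess

# Default key path; ssh-keygen and ssh-copy-id both use this by convention.
KEY = os.path.expanduser("~/.ssh/id_rsa")

def ensure_passwordless_ssh(target="pi@192.168.2.34"):
    """Generate a key pair if none exists, then copy the public key over."""
    if not os.path.exists(KEY):
        # -N "" -> empty passphrase, so rsync later needs no input at all
        subprocess.run(["ssh-keygen", "-t", "rsa", "-N", "", "-f", KEY],
                       check=True)
    # Prompts once for the remote password, then never again.
    subprocess.run(["ssh-copy-id", target], check=True)
```

After calling ensure_passwordless_ssh() once, the os.system rsync line above runs without any prompt.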
You can try the following using pexpect or subprocess. The pexpect approach should definitely work; the subprocess one probably won't, because ssh reads the password from the controlling terminal rather than from stdin:
cmd = "rsync -rav pi@192.168.2.34:python ~/"
from pexpect import *
run(cmd, events={'(?i)password': 'your_password\r'})
from subprocess import PIPE, Popen
cmd = "rsync -rav pi@192.168.2.34:python ~/"
proc = Popen(cmd.split(), stdin=PIPE)
proc.stdin.write('your_pass\r')
proc.stdin.flush()
If you don't have pexpect installed, use pip install pexpect.
If you are on a private network (it should be, as the addresses are 192.168.x.x), and if you trust all IP addresses on that network (meaning no unauthorized user can spoof an IP), you can also use host-based authentication.
Extract from the man page for ssh (I assume you use it as the underlying protocol for rsync):
Host-based authentication works as follows: If the machine the user logs in from is listed in /etc/hosts.equiv or /etc/shosts.equiv on the remote machine, and the user names are the same on both sides, or if the files ~/.rhosts or ~/.shosts exist in the user's home directory on the remote machine and contain a line containing the name of the client machine and the name of the user on that machine, the user is considered for login.
That is, you put in pi's home directory a file .shosts containing the single line
name_or_ip_address_of_source_machine user_name_on_source_machine
If the file already exists, just add that line.
But... you must understand that, as with BHAT IRSHAD's solution, this implies you are now allowed to run any command on the destination machine as user pi without a password.

transfer files from ssh server directly to client

I have an SSH server which I use to store my files online. I need to make those files available for easy download, so I am using the paramiko library (from my Python script) to connect to the SSH server; it then lists the files and displays them on the web page. The problem is that I don't want to download the files to the web server's disk (it doesn't have enough space) and then send them to the clients. Instead I want to read the file and emit it directly, as can be done in PHP.
Something like this, but in Python:
<?php
// your file to upload
$file = '2007_SalaryReport.pdf';
header("Expires: 0");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
header("Content-type: application/pdf");
// tell file size
header('Content-length: '.filesize($file));
// set file name
header('Content-disposition: attachment; filename='.basename($file));
readfile($file);
// Exit script. So that no useless data is output-ed.
exit;
?>
Not a programmer's answer: instead of paramiko, mount the directory containing the documents with sshfs and serve them as if they were on the local file system.
http://fuse.sourceforge.net/sshfs.html
A CGI script generates output on stdout, which is delivered to the web server. The output starts with header lines, then one blank line (as a separator), then the content.
So you have to change the header() calls to print statements, and readfile($file) should be replaced by
print
print open("2007_SalaryReport.pdf","rb").read()
See http://en.wikipedia.org/wiki/Common_Gateway_Interface
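Putting it together, here is a sketch of the PHP script as a Python CGI handler. The connection setup is omitted; `sftp` is assumed to be an already-connected paramiko SFTPClient, and the file is streamed in chunks so nothing is written to the web server's disk.

```python
import sys

def cgi_headers(filename, size):
    """The CGI equivalent of the PHP header() calls above."""
    return ("Content-Type: application/pdf\r\n"
            "Content-Length: %d\r\n"
            "Content-Disposition: attachment; filename=%s\r\n"
            "\r\n" % (size, filename))

def stream_remote(sftp, remote_path):
    """Replacement for readfile(): copy the remote file to stdout in chunks."""
    name = remote_path.rsplit("/", 1)[-1]
    sys.stdout.write(cgi_headers(name, sftp.stat(remote_path).st_size))
    sys.stdout.flush()
    with sftp.open(remote_path, "rb") as f:
        for chunk in iter(lambda: f.read(32768), b""):
            sys.stdout.buffer.write(chunk)
```

Because the Content-Length comes from sftp.stat(), the browser shows a correct progress bar even though the web server never holds the whole file.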
