Simple SFTP script - Python

I am getting the error "Failed to load HostKeys".
Hello,
I'm having trouble with a script I wrote to push a file to an SFTP server. I'm using Windows, and the code I have so far is below. The actual data manipulation and renaming works fine - it breaks at the SFTP portion. I'm hoping one of you gurus out there can help a newbie out.
import pandas as pd
import datetime
import pysftp
import sys
today = str(datetime.date.today().strftime("%m%d%y"))
report = pd.read_csv('C:\\Users\\nickkeith2\\PycharmProjects\\clt\\041719_clt_Facility_company_Inv.csv')
report.columns = report.columns.str.replace('_', ' ')
report.to_csv('C:\\Users\\nickkeith2\\PycharmProjects\\clt\\' + today + '_clt_Facility_company_Inv2.csv',
              index=False)
remote_file = 'C:\\Users\\nickkeith2\\PycharmProjects\\clt\\' + today + '_clt_Facility_company_Inv2.csv'
cnopts = pysftp.CnOpts()
cnopts.hostkeys.load("C:\\Users\\nickkeith2\\id_rsa.pub")
srv = pysftp.Connection(host="xx.xxx.xxx.xxx", username="sftpuser")
srv.put(remote_file)
srv.close()
print(report.columns)
I have tried various combinations of using a key, not using a key, and using a password instead - but no matter what, it returns the error:
UserWarning: Failed to load HostKeys from C:\Users\nickkeith2\.ssh\known_hosts. You will need to explicitly load HostKeys (cnopts.hostkeys.load(filename)) or disableHostKey checking (cnopts.hostkeys = None).
warnings.warn(wmsg, UserWarning)
I tried to create the folder in Windows where it says to put the key, but it will not let me. Thank you in advance for any insight you may be able to provide.

Transcribing Charles Duffy's comments as an answer:
An id_rsa.pub file is not a host key. To be clear, the host keys file is used by the client to make sure the host it's connecting to is genuine (it's a list of the known public keys identifying the remote servers, not anything to do with the public keys identifying the user who's logging in).
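For reference, here is a minimal sketch of what the pysftp call could look like once a real known_hosts file is in place (the paths, host and username are placeholders, and the known_hosts file would first need an entry for the server, e.g. generated with ssh-keyscan):
import pysftp

# Point CnOpts at a known_hosts file that actually contains the server's
# host key (not the user's id_rsa.pub).
cnopts = pysftp.CnOpts(knownhosts='C:\\Users\\nickkeith2\\.ssh\\known_hosts')

local_file = 'C:\\Users\\nickkeith2\\PycharmProjects\\clt\\report.csv'  # placeholder path

# Pass cnopts (and the *private* key, if using key authentication) to the connection.
with pysftp.Connection(host='xx.xxx.xxx.xxx',
                       username='sftpuser',
                       private_key='C:\\Users\\nickkeith2\\.ssh\\id_rsa',
                       cnopts=cnopts) as srv:
    srv.put(local_file)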

I was able to complete this successfully using paramiko instead of pysftp.
Thank you!
import pandas as pd
import datetime
import paramiko
ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
today = str(datetime.date.today().strftime("%m%d%y"))
report = pd.read_csv('C:\\Users\\nickkeith2\\PycharmProjects\\clt\\041719_marshall_Facility_clt_Inv.csv')
report.columns = report.columns.str.replace('_', ' ')
report.to_csv('C:\\Users\\nickkeith2\\PycharmProjects\\clt\\' + today + '_marshall_Facility_clt_Inv2.csv',
              index=False)
remote_file = 'C:\\Users\\nickkeith2\\PycharmProjects\\clt\\' + today + '_marshall_Facility_clt_Inv2.csv'
ssh_client.connect(hostname="xxxxxxxx", username="xxxxxxxx", password="xxxxxxxxxx")
ftp_client = ssh_client.open_sftp()
ftp_client.put(remote_file, '\\Outbound\\'+ today + '_marshall_Facility_clt_Inv2.csv')
ftp_client.close()
print(report.columns)
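One caveat about this working version: AutoAddPolicy silently trusts whatever host key the server presents on first contact. A slightly stricter sketch (paths and host are placeholders) would load an existing known_hosts file and reject unknown hosts instead:
import paramiko

ssh_client = paramiko.SSHClient()
# Trust only hosts that are already listed in the user's known_hosts file.
ssh_client.load_host_keys('C:\\Users\\nickkeith2\\.ssh\\known_hosts')
ssh_client.set_missing_host_key_policy(paramiko.RejectPolicy())
ssh_client.connect(hostname='xxxxxxxx', username='xxxxxxxx', password='xxxxxxxxxx')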

Related

504 Deadline Exceeded error when downloading BQ query results to Python dataframe

I'm using Python to run a query on a BigQuery dataset and then put the results into a Python dataframe.
The query runs OK; I can see a temporary table being created for the results in the dataset in BQ, but when I use the client's to_dataframe method, it fails with the 504 Deadline Exceeded error.
from google.cloud import bigquery

client = bigquery.Client(credentials=credentials, project=projectID)
dataset = client.dataset('xxx')
table_ref = dataset.table('xxx')
job_config = bigquery.QueryJobConfig(destination=table_ref)
client.delete_table(table_ref, not_found_ok=True)
query_job = client.query(queryString, location='EU', job_config=job_config)
query_job.result()
results = client.list_rows(table_ref, timeout=100).to_dataframe()
It all runs fine until the last line. I've added a timeout argument to the list_rows method, but it hasn't helped.
I'm running this on a Windows virtual machine, with Python 3.8 installed.
(I've also tested the same code on my laptop and it worked just fine - don't know what's different.)
Take a look at:
https://github.com/googleapis/python-bigquery-storage/issues/4
It's a known bug on Windows; the "solution" is to:
import google.cloud.bigquery_storage_v1.client
from functools import partialmethod
# Set a two-hour timeout on read_rows
google.cloud.bigquery_storage_v1.client.BigQueryReadClient.read_rows = partialmethod(
    google.cloud.bigquery_storage_v1.client.BigQueryReadClient.read_rows, timeout=3600 * 2
)
Provided that you then use:
from google.cloud import bigquery, bigquery_storage_v1

bqClient = bigquery.Client(credentials=credentials, project=project_id)
bq_storage_client = bigquery_storage_v1.BigQueryReadClient(credentials=credentials)
raw_training_data = bqClient.query(SOME_QUERY).to_arrow(bqstorage_client=bq_storage_client).to_pandas()
If you can use pandas, try this:
import pandas as pd
df = pd.read_gbq("select * from `xxx.xxx`", dialect='standard', use_bqstorage_api=True)
To be able to use use_bqstorage_api, you have to enable the BigQuery Storage API on GCP. Read more about that in the documentation.
This link helped me: https://googleapis.dev/python/bigquery/latest/usage/pandas.html
My working code is:
import google.auth
from google.cloud import bigquery

credentials, your_project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
bqclient = bigquery.Client(credentials=credentials, project=your_project_id)
query_string = """SELECT..."""
df = bqclient.query(query_string).to_dataframe()
Hope it helps you guys.

Constructing a URI in Python?

I'm trying to convert a Java program to Python, and one thing I'm currently stuck on is working with URIs in Python. I found urllib.response in Python, but I'm struggling to figure out how to utilize it.
What I'm trying to do with this URI is obtain the user info (particularly the username and password), the host, and the path. In Java there are methods for this (getUserInfo(), getHost(), and getPath()), but I'm having trouble finding equivalents in Python, even after looking at the urllib.response documentation.
The equivalent code in Java is:
URI dbUri = new URI(env);
username = dbUri.getUserInfo().split(":")[0];
password = dbUri.getUserInfo().split(":")[1];
dbUrl = "jdbc:postgresql://" + dbUri.getHost() + dbUri.getPath();
What would be the appropriate methods to use to convert this to Python?
Seems like you'd want to use something like urllib.parse.urlparse.
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
from urllib.parse import urlparse

db_url = urlparse(raw_url_string)
username = db_url.username
password = db_url.password
host = db_url.hostname
path = db_url.path
...
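Putting it together with the Java snippet above, a minimal sketch (the example connection string is made up) could look like:
from urllib.parse import urlparse

# Hypothetical example value; in the Java code this comes from `env`.
env = 'postgres://myuser:mypassword@db.example.com:5432/mydb'

db_uri = urlparse(env)
username = db_uri.username                                      # dbUri.getUserInfo().split(":")[0]
password = db_uri.password                                      # dbUri.getUserInfo().split(":")[1]
db_url = 'jdbc:postgresql://' + db_uri.hostname + db_uri.path   # dbUri.getHost() + dbUri.getPath()

print(db_url)  # jdbc:postgresql://db.example.com/mydb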
You might need to adjust this a bit. There is a subtle difference between urlparse and urlsplit regarding parameters. You can then use urlunparse or urlunsplit to put a URL back together.
Code
from urllib import parse
from urllib.parse import urlsplit
url = 'http://localhost:5432/postgres?user=postgres&password=somePassword'
split_url = urlsplit(url)
hostname = split_url.netloc
path = split_url.path
params = dict(parse.parse_qsl(split_url.query))
username = params['user']
password = params['password']
db_url = "jdbc:postgresql://" + hostname + path
print(db_url)
Output
jdbc:postgresql://localhost:5432/postgres
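As a quick illustration of the urlunsplit round trip mentioned above (same example URL, just a sketch):
from urllib.parse import urlsplit, urlunsplit

url = 'http://localhost:5432/postgres?user=postgres&password=somePassword'
parts = urlsplit(url)

# Rebuild the URL while dropping the query string that carries the credentials.
clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
print(clean_url)  # http://localhost:5432/postgres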

How to handle certificates in case of an OpenSSL error in Python (Windows 10)?

I am new here and new to Python scripting, and I have the following problem that I hope somebody can help me with.
I made a script in Python that parses an internal corporate link where a JSON file with internal exchange rates is stored. The data is loaded into a dataframe and saved in an Excel file.
Details to mention:
- The certificate was saved using the web browser;
- I don't have full rights on the PC;
- OS: Windows 10;
- The certificate is in PEM format (I checked it, following the suggestions for other questions).
The problem is that the script fails on certificate verification with the OpenSSL error provided below. The script only works if I use the option verify=False.
I have tried all the solutions I found for other similar questions, without success.
I couldn't figure out what the error exactly means or where the issue could be.
Thank you very much in advance.
OpenSSL error: OpenSSL.SSL.Error: [('PEM routines', 'get_name', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')]
from requests_ntlm import HttpNtlmAuth
import requests
import json
import pandas as pd
import os
certfx = os.getcwd() + '\\rootfxCert.cer'
session = requests.Session()
session.auth = HttpNtlmAuth('domain\\user','password')
session.cert = certfx
from_date = input("Insert start date (YYYY-MM-DD): > ")
to_date = input("Insert end date (YYYY-MM-DD): > ")
df = pd.DataFrame()
currList = ["EUR", "USD"]
for date in pd.date_range(start=from_date, end=to_date):
    for curr in currList:
        strDate = str(date)[:10]
        url = "https://apex.sample.grp/api/exrates/daily-rates/" + strDate + "/" + curr
        source = session.get(url, cert=session.cert)  # , verify=certfx)
        data = source.json()
        new_df = pd.DataFrame.from_dict(data['items'], orient='columns', dtype=None)
        df = df.append(new_df)
df.to_excel("output.xlsx", index=False)
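A possible direction to check: in the requests API, cert= is for a client certificate plus its private key, which is what the SSL_CTX_use_PrivateKey_file part of the error is complaining about, while a CA certificate used to verify the server belongs in verify=. A minimal sketch, assuming rootfxCert.cer really is the corporate root CA in PEM format:
import requests
from requests_ntlm import HttpNtlmAuth

session = requests.Session()
session.auth = HttpNtlmAuth('domain\\user', 'password')

# Pass the corporate root CA through `verify`, not `cert`:
# `cert` expects a client certificate and private key pair.
session.verify = 'rootfxCert.cer'

response = session.get('https://apex.sample.grp/api/exrates/daily-rates/2021-01-04/EUR')
print(response.status_code)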

Get Elastic Load Balancer names with boto3

When I try to print the load balancers from AWS, I get a huge dictionary with a lot of keys, but when I try to print only the 'LoadBalancerName' value I get None. I want to print all the load balancer names in our environment. How can I do it? Thanks!
What I tried:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
Name = elb.get('LoadBalancerName')
print(Name)
The way in which you were handling the response object was incorrect, and you'll need to put it in a loop if you want all the names and not just one. What you need is this:
import boto3
client = boto3.client('elbv2')
elb = client.describe_load_balancers()
for i in elb['LoadBalancers']:
    print(i['LoadBalancerArn'])
    print(i['LoadBalancerName'])
However, if you're still getting None as a value, it would be worth double-checking which region the load balancers are in, as well as whether you need to pass in a profile.
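One more detail that can matter when there are many load balancers: describe_load_balancers returns results in pages, so a paginator (sketch below, the region name is a placeholder) is the safest way to collect every name:
import boto3

client = boto3.client('elbv2', region_name='us-east-1')  # placeholder region

# describe_load_balancers is paginated; walk every page to list all names.
paginator = client.get_paginator('describe_load_balancers')
for page in paginator.paginate():
    for lb in page['LoadBalancers']:
        print(lb['LoadBalancerName'])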

Extract Keyfile JSON from saved connection of type "google_cloud_platform"

I have saved a connection of type "google_cloud_platform" in Airflow, as described here: https://cloud.google.com/composer/docs/how-to/managing/connections
Now, in my DAG, I need to extract the Keyfile JSON from the saved connection.
What is the correct hook to use?
Use airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook to get the stored connection. For example:
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
gcp_hook = GoogleCloudBaseHook(gcp_conn_id="<your-conn-id>")
keyfile_dict = gcp_hook._get_field('keyfile_dict')
You can just use BaseHook as follows:
from airflow.hooks.base_hook import BaseHook
GCP_CONNECTION_ID="my-gcp-connection"
BaseHook.get_connection(GCP_CONNECTION_ID).extra_dejson["extra__google_cloud_platform__keyfile_dict"]
The other solutions no longer work. Here's a way that's working in 2023:
from airflow.models import Connection

conn = Connection.get_connection_from_secrets(conn_id='my-gcp-connection')
json_key = conn.extra_dejson['keyfile_dict']
with open('gcp_svc_acc.json', 'w') as f:
    f.write(json_key)
Mostly because the imports moved around I think.
The question specifically refers to keyfile JSON, but this is a quick addendum for those who configured keyfile path instead: take care to check if it's keyfile_dict OR keyfile_path that the Airflow admin configured, as they're two different ways to set up the connection.
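A small sketch of that check (the bare keyfile_dict / keyfile_path field names follow the newer-style extras shown in the 2023 answer above; the fallback logic is just illustrative):
from airflow.models import Connection

conn = Connection.get_connection_from_secrets(conn_id='my-gcp-connection')
extras = conn.extra_dejson

if 'keyfile_dict' in extras:
    # The key material is stored inline in the connection.
    json_key = extras['keyfile_dict']
elif 'keyfile_path' in extras:
    # Only a path is stored; read the key file from disk.
    with open(extras['keyfile_path']) as f:
        json_key = f.read()
else:
    raise ValueError('Connection defines neither keyfile_dict nor keyfile_path')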
