Best way to copy multiple files to multiple remote servers - python

I'm developing a service which has to copy multiple files from a central node to remote servers.
The problem is that each time the service runs, there are new servers and new files to dispatch to them. In other words, on each execution I get the information about which files have to be copied to each server and into which directory.
Obviously, this information changes frequently, so I would like to automate the task. I have tried to find a solution using Ansible, FTP and SCP from Python.
With Ansible it seems very hard to automate every scp task on each execution.
SCP is OK, but I would need to build each SCP command in Python and launch it.
FTP is overkill for this problem, because there are not many files to dispatch to a single server.
Is there any better solution than the ones I have thought about?

If you are sending the same file (or files) to different destinations that can be organized as sets, you could benefit from tools such as dsh or parallel-scp.
Whether this makes sense depends on your use case.

Parallel-SSH Documentation
from __future__ import print_function
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
output = client.run_command('uname')
for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)
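Since the original question is about copying files rather than running commands, the same client can also push a file to every host in parallel with copy_file. A minimal sketch under the same parallel-ssh version as above; the local and remote paths are placeholders:

from gevent import joinall
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)

# copy_file returns one greenlet per host; joinall waits for every copy
# to finish and raise_error surfaces any failure.
# 'local_file.txt' and the remote path are hypothetical examples.
greenlets = client.copy_file('local_file.txt', '/remote/path/local_file.txt')
joinall(greenlets, raise_error=True)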

How about rsync?
Example: rsync -rav -e ssh source destination
Where source or destination could be:
user@IP:/path/to/dest
It does incremental transfers, and you can cron it or trigger a small script when anything changes.
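If the file-to-server mapping is computed in Python on each run anyway, one possible way to combine the two suggestions is to shell out to rsync per destination. This is only a sketch: the host names, user, paths, and the dispatch structure are invented for illustration.

import subprocess

# Hypothetical mapping produced on each execution:
# server -> list of (local_path, remote_directory) pairs.
dispatch = {
    'server1.example.com': [('reports/a.csv', '/srv/data/')],
    'server2.example.com': [('reports/b.csv', '/srv/other/')],
}

for server, transfers in dispatch.items():
    for local_path, remote_dir in transfers:
        # -a preserves permissions and times, -v is verbose, -e ssh selects SSH transport.
        subprocess.check_call(
            ['rsync', '-av', '-e', 'ssh', local_path,
             'user@{}:{}'.format(server, remote_dir)])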

Related

Learning Python fast: how can I protect some private connections from being exposed?

Hi, I'm new to the community and new to Python (experienced but rusty in other high-level languages), so my question is simple.
I made a simple script to connect to a private ftp server, and retrieve daily information from it.
from ftplib import FTP

#Connect to server to retrieve inventory
#Open ftp connection
def FTPconnection(file_name):
    ftp = FTP('ftp.serveriuse.com')
    ftp.login('mylogin', 'password')
    #List the files in the current directory
    print("Current File List:")
    file = ftp.dir()
    print(file)
    # #Get the latest csv file from server
    # ftp.cwd("/pub")
    gfile = open(file_name, "wb")
    ftp.retrbinary('RETR ' + file_name, gfile.write)
    gfile.close()
    ftp.quit()

FTPconnection('test1.csv')
FTPconnection('test2.csv')
That's the whole script: it passes my credentials and then calls the FTPconnection function on the two different files I'm retrieving.
Then my other script, the one that processes them, has an import statement, as I tried to call this script as a module; all the import does is connect to the FTP server and fetch the information.
import ftpconnect as ftpc
This is on the other Python script, the one that does the processing.
It works, but I want to improve it, so I need some guidance on best practices for doing this, because in Spyder 4.1.5 I get a 'Module ftpconnect called but unused' warning... so I'm probably missing something here. I'm developing on macOS using Anaconda and Python 3.8.5.
I'm trying to build an app to automate some tasks, but I couldn't find anything about modules that guided me towards better code; everything simply says you have to import whatever .py file name you used and that will be considered a module...
And my final question: how can you normally protect private information (FTP credentials) from being exposed? This is not about protecting my code, only the credentials.
There are a few options for storing passwords and other secrets that a Python program needs to use, particularly a program that needs to run in the background where it can't just ask the user to type in the password.
Problems to avoid:
Checking the password in to source control where other developers or even the public can see it.
Other users on the same server reading the password from a configuration file or source code.
Having the password in a source file where others can see it over your shoulder while you are editing it.
Option 1: SSH
This isn't always an option, but it's probably the best. Your private key is never transmitted over the network, SSH just runs mathematical calculations to prove that you have the right key.
In order to make it work, you need the following:
The database or whatever you are accessing needs to be accessible by SSH. Try searching for "SSH" plus whatever service you are accessing. For example, "ssh postgresql". If this isn't a feature on your database, move on to the next option.
Create an account to run the service that will make calls to the database, and generate an SSH key.
Either add the public key to the service you're going to call, or create a local account on that server, and install the public key there.
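As a rough illustration of the key-based idea (not tied to any particular database), a service account could open its SSH connection with paramiko using a private key file instead of a password. The host name, user, and key path below are placeholders:

import paramiko

ssh = paramiko.SSHClient()
# Load the system known_hosts file rather than blindly accepting unknown host keys.
ssh.load_system_host_keys()
ssh.connect(
    'db-host.example.com',                        # hypothetical host
    username='my_service',                        # dedicated service account
    key_filename='/home/my_service/.ssh/id_rsa',  # private key, never sent over the wire
)
stdin, stdout, stderr = ssh.exec_command('uptime')
print(stdout.read().decode())
ssh.close()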
Option 2: Environment Variables
This one is the simplest, so it might be a good place to start. It's described well in the Twelve Factor App. The basic idea is that your source code just pulls the password or other secrets from environment variables, and then you configure those environment variables on each system where you run the program. It might also be a nice touch if you use default values that will work for most developers. You have to balance that against making your software "secure by default".
Here's an example that pulls the server, user name, and password from environment variables.
import os
server = os.getenv('MY_APP_DB_SERVER', 'localhost')
user = os.getenv('MY_APP_DB_USER', 'myapp')
password = os.getenv('MY_APP_DB_PASSWORD', '')
db_connect(server, user, password)
Look up how to set environment variables in your operating system, and consider running the service under its own account. That way you don't have sensitive data in environment variables when you run programs in your own account. When you do set up those environment variables, take extra care that other users can't read them. Check file permissions, for example. Of course any users with root permission will be able to read them, but that can't be helped. If you're using systemd, look at the service unit, and be careful to use EnvironmentFile instead of Environment for any secrets. Environment values can be viewed by any user with systemctl show.
Option 3: Configuration Files
This is very similar to the environment variables, but you read the secrets from a text file. I still find the environment variables more flexible for things like deployment tools and continuous integration servers. If you decide to use a configuration file, Python supports several formats in the standard library, like JSON, INI, netrc, and XML. You can also find external packages like PyYAML and TOML. Personally, I find JSON and YAML the simplest to use, and YAML allows comments.
Three things to consider with configuration files:
Where is the file? Maybe a default location like ~/.my_app, and a command-line option to use a different location.
Make sure other users can't read the file.
Obviously, don't commit the configuration file to source code. You might want to commit a template that users can copy to their home directory.
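As an illustration of the configuration-file approach, here is a minimal sketch that reads secrets from a JSON file; the file location and key names are just examples, and db_connect is the same placeholder as in the snippets above:

import json
import os

# Hypothetical default location; a real app might also accept a --config option.
config_path = os.path.expanduser('~/.my_app')

with open(config_path) as config_file:
    config = json.load(config_file)

server = config.get('db_server', 'localhost')
user = config.get('db_user', 'myapp')
password = config['db_password']  # no default: fail loudly if the secret is missing
db_connect(server, user, password)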
Option 4: Python Module
Some projects just put their secrets right into a Python module.
# settings.py
db_server = 'dbhost1'
db_user = 'my_app'
db_password = 'correcthorsebatterystaple'
Then import that module to get the values.
# my_app.py
from settings import db_server, db_user, db_password
db_connect(db_server, db_user, db_password)
One project that uses this technique is Django. Obviously, you shouldn't commit settings.py to source control, although you might want to commit a file called settings_template.py that users can copy and modify.
I see a few problems with this technique:
Developers might accidentally commit the file to source control. Adding it to .gitignore reduces that risk.
Some of your code is not under source control. If you're disciplined and only put strings and numbers in here, that won't be a problem. If you start writing logging filter classes in here, stop!
If your project already uses this technique, it's easy to transition to environment variables. Just move all the setting values to environment variables, and change the Python module to read from those environment variables.
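For example, the settings module above could become a thin wrapper over environment variables, so the existing import in my_app.py keeps working. This is only a sketch of that transition:

# settings.py (reads from the environment instead of hard-coding secrets)
import os

db_server = os.getenv('MY_APP_DB_SERVER', 'dbhost1')
db_user = os.getenv('MY_APP_DB_USER', 'my_app')
db_password = os.environ['MY_APP_DB_PASSWORD']  # required; no default on purpose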

Using pyftpdlib with file conveyor

I decided to try using fileconveyor in order to write a simple app that will be able to sync a directory (with very small word files) across all my computers.
In order to do that I also installed pyftpdlib so as to write a simple FTP server for fileconveyor to connect to.
pyftpdlib comes with a number of examples, so I used one of them to run a server on 0.0.0.0:2121 and configured file conveyor to connect to it, which it did, reporting back that it is
- Fully up and running now.
The FTP server also logged the connection:
USER 'user' logged in.
FTP session closed (disconnect).
But I am not quite sure what to do now.
1. How can I make the FTP server save uploaded files to a directory of my choosing?
2. Will fileconveyor be able to sync the files both ways?
3. If yes, how is that possible, as it would have to track changes to the files on the remote machine?
4. Is what I am trying to do a good idea, or should I be using fileconveyor differently, possibly not with pyftpdlib but with some other service?
Answer to 1: you can configure a home directory per user and for the anonymous user; see the add_user/add_anonymous example.
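A minimal sketch of that, using pyftpdlib's 1.x module layout (the directories, credentials, and port are placeholders):

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

authorizer = DummyAuthorizer()
# Files uploaded by 'user' land in /srv/ftp/user; 'elradfmw' grants full permissions.
authorizer.add_user('user', '12345', '/srv/ftp/user', perm='elradfmw')
# Anonymous logins get a separate, read-only directory.
authorizer.add_anonymous('/srv/ftp/anon')

handler = FTPHandler
handler.authorizer = authorizer
FTPServer(('0.0.0.0', 2121), handler).serve_forever()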
Answers to 2 and 3: I don't think it's possible to build a reliable two-way sync application using only a standard FTP server on one of the sides. Such apps need more information than the FTP protocol provides.
Answer to 4: why do you need pyftpdlib? I believe it's good for building a customized embedded FTP server; you can also use any popular FTP server like ProFTPD or FileZilla Server. They are well documented and you can find a lot of HOW-TOs.
BTW why don't you want to use Dropbox?

How to copy simultaneously from multiple nodes using Fabric?

I have just started using Fabric; it looks like a very useful tool. I am able to write a tiny script to run some commands in parallel on my Amazon EC2 hosts, something like this:
@parallel
def runs_in_parallel():
    sudo("sudo rm -rf /usr/lib/jvm/j2sdk1.6-oracle")
Also, I have written another script to copy all the Hadoop logs from all EC2 nodes to my local machine. This script creates a folder named with a timestamp, with one folder per node inside it named after that node's IP address, and then copies that node's logs into the IP-address-named folder. E.g.:
2014-04-22-15-52-55
    50.17.94.170
        hadoop-logs
    54.204.157.86
        hadoop-logs
    54.205.86.22
        hadoop-logs
Now I want to do this copy task using Fabric, so that I can copy the logs in parallel and save time. I thought I could easily do it the way I did in my first code snippet, but that won't help, as that runs commands on the remote server. I have no clue as of now how to do this. Any help is much appreciated.
You could likely use the get() command to handle pulling down files. You'd want to make them into tarballs, and have them pulled into unique filenames on your client to keep the gets from clobbering one another.
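A rough sketch of that with Fabric 1.x, assuming env.hosts lists the EC2 nodes; the remote log path is a placeholder, and the %(host)s token in the local path expands per host, which keeps the parallel downloads from clobbering each other:

import time
from fabric.api import env, get, parallel, run

env.hosts = ['50.17.94.170', '54.204.157.86', '54.205.86.22']
timestamp = time.strftime('%Y-%m-%d-%H-%M-%S')

@parallel
def fetch_logs():
    # Tar the logs on the remote node first so each host transfers a single file.
    run('tar czf /tmp/hadoop-logs.tgz /path/to/hadoop/logs')
    # One folder per host under the timestamped directory, as in the layout above.
    get('/tmp/hadoop-logs.tgz', timestamp + '/%(host)s/hadoop-logs.tgz')

You would then run it with something like: fab fetch_logs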

How to get certain types of files from one server to another in Python?

Suppose I have one server called server 1. On this server there's a directory called dir1. dir1 has 3 files in it called neh_iu.dat_1, neh_hj.dat_2 and jen_ak.dat_1.
I need to get ONLY the 'neh' files from server 1 to another server called server 2. Server 2 is where I will be performing certain modifications on these files.
How do I get ONLY the 'neh' files in Python? I'm new to Python. I'm aware of a module called paramiko which allows for file transfers, but assuming that there are millions of 'neh' files in dir1 and that I don't know the full names of all of them, how can I automate this in Python?
If you really need to use Python instead of bash (assuming you're on Unix):
>>> import subprocess
>>> subprocess.call("tar cvzf /path/to/ftp-or-static-http/foo.tgz /path/to/dir/neh*", shell=True)
This will create a tar file with all the neh* files. It is easy to transfer between servers (it's one file instead of millions).
Then use FTP, SFTP, HTTP or any transfer protocol supported by your server, and fetch it from the other server (with curl or ftp).
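If you would rather stay in Python and use paramiko as mentioned in the question, a sketch that lists dir1 and downloads only the files whose names start with 'neh' could look like the following; the host, credentials, and paths are placeholders, and for millions of files you would likely want to batch or tar them as suggested above:

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('server1.example.com', username='user', password='secret')  # hypothetical

sftp = ssh.open_sftp()
remote_dir = '/path/to/dir1'
local_dir = '/path/to/local/dir'

for name in sftp.listdir(remote_dir):
    if name.startswith('neh'):
        # Only the 'neh' files are pulled; full names don't need to be known in advance.
        sftp.get(remote_dir + '/' + name, local_dir + '/' + name)

sftp.close()
ssh.close()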

Python: Verifying NFS authentication

Folks,
I believe there are two questions I have: one python specific and the other NFS.
The basic point is that my program gets the 'username', 'uid', NFS server IP and exported_path as input from the user. It now has to verify that the NFS exported path is readable/writable by this user/uid.
My program is running as root on the local machine. The straightforward approach is to 'useradd' a user with the given username and uid, mount the NFS exported path (mount runs as root) on some temporary mount point, and then execute 'su username -c touch /mnt_pt/tempfile'. If the username and uid input were correct (and the NFS server was set up correctly), this touch will succeed, creating tempfile in the remote NFS directory. This is the goal.
Now the two questions are:
(i) Is there a simpler way to do this than creating a new Unix user, mounting, and touching a file to verify the NFS permissions?
(ii) If this is what needs to be done, are there any Python modules/packages that will help me run the 'useradd' and 'userdel' related commands? I currently intend to use the respective binaries (/usr/sbin/useradd etc.) and invoke them with subprocess.Popen to get the output.
Thank you for any insight.
i) You could do something more arcane, but short of actually touching the file you probably aren't going to be testing exactly what you need to test, so I think I'd probably do it the way you suggest.
ii) You might want to check out the Python pwd module if you want to verify user existence or the like, but you'll probably need to leverage the useradd/userdel programs themselves to do the dirty work (a rough sketch follows below).
You might want to consider leveraging sudo for your program so the entire thing doesn't have to run as root; running it all as root seems like a pretty risky proposition.
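A rough sketch of (ii), using pwd to check whether the user already exists and subprocess for the useradd/userdel calls; the command options are the common Linux ones, so adjust them for your distribution:

import pwd
import subprocess

def ensure_user(username, uid):
    # Create the user with the requested uid only if it doesn't already exist.
    try:
        pwd.getpwnam(username)
        return False  # user already present
    except KeyError:
        subprocess.check_call(['/usr/sbin/useradd', '-u', str(uid), username])
        return True

def remove_user(username):
    subprocess.check_call(['/usr/sbin/userdel', username])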
There is a Python suite for testing NFS server functionality:
git://git.linux-nfs.org/projects/bfields/pynfs.git
While it's written for NFSv4, you can adapt it for v3 as well.
