I have a fabfile that runs several tasks on the hosts. This results in the creation of a file result.txt on each of the hosts.
Now I want to get all these files locally. This is what I tried:
from invoke import task

@task
def getresult(ctx):
    ctx.get('result.txt')
I run with:
fab -H host1,host2,host3 getresult
In the end, I have only one file result.txt on my local machine (it seems to be the copy from the last host on the command line). I would like to get all the files.
Is there a way to do this with fabric v2? I did not find anything in the documentation. It seems that this was possible in fabric v1, but I am not sure about v2.
In Fabric v2, the get API signature is:
get(remote, local=None, preserve_mode=True)
So when you don't specify the name under which the file should be stored locally, it uses the same name it has in the remote location. Hence the local file is overwritten for each host the task executes on, and you end up with the copy from the last one.
One way to fix this is to specify the local file name and add a random suffix or prefix to it. Something like this:
from invoke import task
import random

@task
def getresult(ctx):
    ctx.get('result.txt', 'result%s.txt' % (random.random() * 100))
This way, each time it executes, it stores the file with a unique name.
You can even add the host name to the file name; in Fabric 2 the Connection object passed to the task exposes it, as shown in the sketch below.
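For example, a minimal sketch, assuming the task is defined with Fabric 2's own @task decorator so that ctx is a fabric.Connection (which exposes the current host as ctx.host):

from fabric import task

@task
def getresult(ctx):
    # One local copy per host, e.g. result_host1.txt, result_host2.txt, ...
    ctx.get('result.txt', 'result_%s.txt' % ctx.host)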
I created a Spotipy script that pulls the last 50 songs I listened to and adds them, along with their audio features, to a Google Sheet. However, I'd love to be able to run this script daily without having to run it manually, but I have very little experience with cron scheduling. I'm having trouble wrapping my head around how it can be run, given all of the commands and arguments I need to enter.
The script requires several environment variables to be exported first, such as
export SPOTIPY_REDIRECT_URI="http://google.com"
export SPOTIPY_CLIENT_SECRET='secret'
and similar for the client ID.
Additionally, the first argument after the script name is the username: username = sys.argv[1].
Most importantly, it prompts me to copy and paste a redirect URL into the command line, which is unique each run.
Is it at all possible to pass the redirect URL to the command line each time the script is run using CRON?
I think what you're looking to accomplish can be achieved in one of two ways.
First, you could write a shell script to handle the export commands and passing the redirect URI to your script. Second, with a few tweaks to your script, you should be able to run it headlessly, without the intermediary copy/paste step.
I'm going to explore the second route in my answer.
Regarding the environment variables, you have a couple options. Depending on your distribution, you may be able to set these variables in the crontab before running the job. Alternatively, you can set the exports in the same job you use to run your script, separating the commands by semicolon. See this answer for a detailed explanation.
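For example, with cron implementations that support it (such as Vixie cron/cronie), the first option looks like this at the top of your crontab; all the values below are placeholders:

# crontab -e  (these values are placeholders)
SPOTIPY_CLIENT_ID=your_client_id
SPOTIPY_CLIENT_SECRET=your_client_secret
SPOTIPY_REDIRECT_URI=http://localhost:8080
# ...your job lines follow below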
Then, regarding the username argument: fortunately, script arguments can also be passed via cron. You just list the username after the script's name in the job; cron will pass it as an argument to your script.
Finally, regarding the redirect URI: if you change your redirect URI to something like localhost with a port number, the script will automatically redirect for you. This wasn't actually made clear to me in the spotipy documentation, but rather from the command line when authenticating with localhost: "Using localhost as redirect URI without a port. Specify a port (e.g. localhost:8080) to allow automatic retrieval of authentication code instead of having to copy and paste the URL your browser is redirected to." Following this advice, I was able to run my script without having to copy/paste the redirect URL.
Add something like "http://localhost:8080" to your app's Redirect URIs in the Spotify Dashboard, then export it in your environment and run your script; it should run without your input!
Assuming that works, you can put everything together in a job like this to execute your script daily at 17:30:
30 17 * * * export SPOTIPY_REDIRECT_URI="http://localhost:8080"; export SPOTIPY_CLIENT_SECRET='secret'; python your_script.py "your_username"
I'm developing a service which has to copy multiple files from a central node to remote servers.
The problem is that each time the service is executed, there are new servers and new files to dispatch to these servers. I mean, in each execution, I have the information of which files have to be copied to each server and in which directory.
Obviously, this information changes very dynamically, so I would like to automate this task. I tried to find a solution with Ansible, FTP, and SCP driven from Python.
With Ansible, it seems very difficult to automate every scp task on each execution.
SCP is OK, but I need to build each SCP command in Python in order to launch it.
FTP is too much for this problem, because there are not many files to dispatch to a single server.
Is there any better solution than the ones I have thought of?
If you are sending the same file (or files) to different destinations (which can be organized as sets), you could benefit from solutions such as dsh or parallel-scp.
Whether this makes sense depends on your use case.
Parallel-SSH Documentation
from __future__ import print_function
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
output = client.run_command('uname')
for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)
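Since the goal here is pushing files rather than running commands, here is a minimal sketch of the same client's file-copy call, assuming the 1.x pssh.pssh_client API used above (copy_file returns greenlets that must be joined; the file paths are placeholders):

from gevent import joinall
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
# Copy the local file to the same remote path on every host in parallel
greenlets = client.copy_file('result.txt', '/tmp/result.txt')
joinall(greenlets, raise_error=True)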
How about rsync?
Example: rsync -rav -e ssh SOURCE DESTINATION
where the source or destination can be:
user@IP:/path/to/dest
It does incremental transfers, and you can cron it or trigger a small script whenever anything changes.
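If you need to drive this from Python with a per-run mapping of files to servers, a minimal sketch along these lines (the mapping, user name, and paths are all placeholders):

import subprocess

# Hypothetical mapping computed on each execution:
# {server: [(local_path, remote_dir), ...]}
dispatch = {
    'server1.example.com': [('/data/a.txt', '/srv/incoming/')],
    'server2.example.com': [('/data/b.txt', '/srv/incoming/')],
}

for server, transfers in dispatch.items():
    for local_path, remote_dir in transfers:
        # One rsync invocation per file/server pair
        subprocess.run(
            ['rsync', '-rav', '-e', 'ssh', local_path,
             'user@{0}:{1}'.format(server, remote_dir)],
            check=True)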
I am writing some functions, using paramiko, to execute commands and create files on a remote host. I would like to write some unit tests for them, but I don't know what the simplest way to achieve this would be. This is what I envisage as an example outline of my code:
import os
import paramiko
import pytest

def my_function(hostname, relpath='.', **kwargs):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname, **kwargs)
    sftp = ssh.open_sftp()
    sftp.chdir(relpath)
    stdin, stdout, stderr = ssh.exec_command("echo hallo > test.txt")

@pytest.fixture(scope="module")
def mock_remote_host():
    # start a remote host here with a local test path
    try:
        yield hostname, testpath, {"username": "bob", "password": "1234"}
    finally:
        # delete the test path
        # close the remote host
        pass

def test_my_function(mock_remote_host):
    hostname, dirpath, kwargs = mock_remote_host
    my_function(hostname, **kwargs)
    filepath = os.path.join(dirpath, 'test.txt')
    assert os.path.exists(filepath)
I have had a look at the paramiko test modules, but they seem quite complex for my use case and I'm not sure how to go about simplifying them.
I think what you really need to mock is the paramiko.SSHClient object. You are unit testing your function my_function; you can assume the paramiko module works correctly, and the only thing you need to unit test is whether my_function calls the methods of paramiko.SSHClient in the correct way.
To mock the paramiko.SSHClient class you can use unittest.mock and decorate your test_my_function function with @mock.patch('paramiko.SSHClient', sshclientmock). You have to define sshclientmock as some kind of Mock or MagicMock first.
Also, in Python 2.7 there is an equivalent of unittest.mock, but I don't remember where to find it exactly.
EDIT: As @chepner mentioned in a comment, for Python 2.7 you can find the mock module on PyPI and install it with pip install mock.
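A minimal sketch of that approach against the my_function from the question; the module name mymodule is a placeholder, and exec_command's return value is stubbed so the unpacking inside my_function works:

from unittest import mock

from mymodule import my_function  # placeholder module name

@mock.patch('paramiko.SSHClient')
def test_my_function_calls_paramiko(mock_ssh_client):
    client = mock_ssh_client.return_value
    # exec_command normally returns (stdin, stdout, stderr)
    client.exec_command.return_value = (mock.Mock(), mock.Mock(), mock.Mock())

    my_function('somehost', username='bob', password='1234')

    client.connect.assert_called_once_with('somehost', username='bob', password='1234')
    client.open_sftp.return_value.chdir.assert_called_once_with('.')
    client.exec_command.assert_called_once_with("echo hallo > test.txt")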
To answer my own question, I have created: https://github.com/chrisjsewell/atomic-hpc/tree/master/atomic_hpc/mockssh.
As the readme discusses, it is based on https://github.com/carletes/mock-ssh-server/tree/master/mockssh, with additions made (to implement more sftp functions) based on https://github.com/rspivak/sftpserver
The following changes have also been made:
revised users parameter, such that either a private_path_key or password can be used
added a dirname parameter to the Server context manager, such that this will be set as the root path for the duration of the context.
patched paramiko.sftp_client.SFTPClient.chdir to fix its use with relative paths.
See test_mockssh.py for example uses.
If you want to test remote connectivity, the remote filesystem structure and remote path navigation, you have to set up a mock host server (a VM, maybe). In other words, if you want to test your actions on the host, you have to mock the host.
If you want to test your actions with the data of the host, the easiest way seems to be to proceed as running.t said in the other answer.
I agree with HraBal, because of "Infrastructure as code": you can treat a virtual machine as a block of code.
For example:
You can use Vagrant or Docker to spin up an SSH server, and then modify your DNS configuration (or hosts file) so that the target domain points to 127.0.0.1.
Put your application onto the server, and run paramiko to connect to the target domain and test whatever you want.
I think the benefit is that you can do this for all programming languages and don't need to reinvent the wheel. In addition, you and your successors will know the details of the system.
(My English is not very good)
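To make this concrete, a minimal sketch, assuming you have already started an SSH server in a container published on localhost port 2222 with a test user (all connection details below are placeholders, and my_function is the function from the question):

import paramiko

def test_my_function_against_container():
    # Placeholder details for a locally running SSH server container
    hostname, port = 'localhost', 2222
    creds = {'username': 'testuser', 'password': 'testpass'}

    my_function(hostname, port=port, **creds)

    # Connect back and verify the side effect of the executed command
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname, port=port, **creds)
    _, stdout, _ = ssh.exec_command('cat test.txt')
    assert stdout.read().decode().strip() == 'hallo'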
I want to deploy my scrapy project to an IP that is not listed in the scrapy.cfg file, because the IP can change and I want to automate the deployment process. I tried giving the IP of the server directly in the deploy command, but it did not work. Any suggestions on how to do this?
First, you should consider assigning a domain to the server, so you can always get to it regardless of its dynamic IP. DynDNS comes handy at times.
Second, you probably won't do the first, because you don't have access to the server, or for whatever other reason. In that case, I suggest mimicking the above behavior by using your system's hosts file. As described in the Wikipedia article:
The hosts file is a computer file used by an operating system to map hostnames to IP addresses.
For example, let's say you set your url to remotemachine in your scrapy.cfg. You can write a script that edits the hosts file with the latest IP address and execute it before deploying your spider. This approach has the benefit of having a system-wide effect, so if you are deploying multiple spiders, or using the same server for some other purpose, you don't have to update multiple configuration files.
This script could look something like this:
import fileinput
import sys

def update_hosts(hostname, ip):
    # Pick the hosts file location for the current platform
    if 'linux' in sys.platform:
        hosts_path = '/etc/hosts'
    else:
        hosts_path = r'c:\windows\system32\drivers\etc\hosts'
    # With inplace=True, everything printed inside the loop is written
    # back into the hosts file
    for line in fileinput.input(hosts_path, inplace=True):
        if hostname in line:
            print("{0}\t{1}".format(hostname, ip))
        else:
            print(line.strip())

if __name__ == '__main__':
    hostname = sys.argv[1]
    ip = sys.argv[2]
    update_hosts(hostname, ip)
    print("Done!")
Of course, you should do additional argument checks, etc.; this is just a quick example.
You can then run it prior deploying like this:
python updatehosts.py remotemachine <remote_ip_here>
If you want to take it a step further and add this functionality as a simple argument to scrapyd-deploy, you can go ahead and edit your scrapyd-deploy file (it's just a Python script) to add the additional parameter and update the hosts file from within. But I'm not sure this is the best thing to do, since keeping this implementation separate and more explicit would probably be a better choice.
This is not something you can solve on the scrapyd side.
According to the source code of scrapyd-deploy, it requires the url to be defined in the [deploy] section of the scrapy.cfg.
One possible workaround could be to have a placeholder in scrapy.cfg which you replace with the real IP address of the target server before starting scrapyd-deploy, as in the sketch below.
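For example, a minimal sketch, assuming you keep a template file scrapy.cfg.template containing a literal {SERVER_IP} placeholder (both the template name and the placeholder are assumptions):

# update_cfg.py: substitute the current server IP into scrapy.cfg
# scrapy.cfg.template contains, e.g.:
#   [deploy]
#   url = http://{SERVER_IP}:6800/
#   project = myproject
import sys

ip = sys.argv[1]
with open('scrapy.cfg.template') as f:
    cfg = f.read()
with open('scrapy.cfg', 'w') as f:
    f.write(cfg.replace('{SERVER_IP}', ip))

You would then run it as python update_cfg.py <remote_ip_here> right before calling scrapyd-deploy.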
I have just started using Fabric; it looks like a very useful tool. I am able to write a tiny script to run some commands in parallel on my Amazon EC2 hosts, something like this:
from fabric.api import parallel, sudo

@parallel
def runs_in_parallel():
    sudo("rm -rf /usr/lib/jvm/j2sdk1.6-oracle")
Also, I have written another script to copy all the Hadoop logs from all EC2 nodes to my local machine. This script creates a folder named with a timestamp, and within it one folder per node, named after the node's IP address, into which that node's logs are copied. E.g.:
2014-04-22-15-52-55/
    50.17.94.170/
        hadoop-logs
    54.204.157.86/
        hadoop-logs
    54.205.86.22/
        hadoop-logs
Now I want to do this copy task using Fabric so that I can copy the logs in parallel, to save time. I thought I could easily do it the way I did in my first code snippet, but that won't help, as that only runs commands on the remote server. I have no clue as of now how to do this. Any help is much appreciated.
You could likely use the get() command to handle pulling down files. You'd want to make them into tarballs, and have them pull into unique filenames on your client to keep the gets from clobbering one another.
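Since the question uses the Fabric 1.x API (@parallel, sudo), here is a minimal sketch of the unique-local-path idea; the remote log path is a placeholder, and Fabric 1's get() expands %(host)s and %(basename)s in the local path so each node's logs land in their own folder:

from fabric.api import parallel, get

@parallel
def copy_logs():
    # Pulls each node's logs into logs/<host>/..., so parallel gets
    # never clobber one another
    get('/var/log/hadoop/*', 'logs/%(host)s/%(basename)s')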