How can I access another server with Python? - python

I have two servers, and one updates with a DNSBL of 100k domains every 15 minutes. I want to process these domains through a Python script with information from Safebrowsing, Siteadvisor, and other services. Unfortunately, the server with the DNSBL is rather slow. Is there a way I can transfer the files over from the other server with SSH in Python?

If it's just files (and directories) you are transferring, why not just use rsync over ssh (in a bash script perhaps). A proven, mature method.
Or you could mount the remote filesystem (over ssh) into your own filesystem using sshfs (fuse) and then use something like pyrobocopy (implementing a basic version of rsync functionality in Python) to transfer the files.
If you don't need incremental copying, you could go the simple route: mount the remote filesystem using sshfs (link above) and then use shutil.copytree to copy the correct directory.
Or yet another option: implement it using the paramiko Python ssh module.

There is a module called pexpect which is pretty nice.
This allows you to ssh, telnet, etc. It also supports ftp which might be handy in transferring files.

Related

Pythonic way to compress a remote directory using SSH or FTP

I want to backup a remote directory with a lot of files, and because of that, I need to compress it before downloading it. I can access this folder through SSH or FTP. The host is running on Linux.
I have covered the downloading part with aioftp. I was using paramiko and tar Linux command for compressing the directory in the remote host, but instead, I want to use Python modules (from standard library or not) and avoid using Linux commands. Maybe a combination of paramiko to open the session, urllib to create the remote object and tarfile to compress it can make the job, but I haven't found the way.
In the end, I want a directory-backup.tar.gz in my localhost.
How can I accomplish that?
You have to compress the directory using tools on the server.
Using local Python code makes no sense. To compress the files locally, you would have to download the uncompressed files, compress them (with your local Python code), and upload the compressed archive, only to download it then again. That defies the purpose of the compression, right?
If you want to use Python code for compression, you would have to run the Python code on the server. Either by uploading the script and executing it on the server or by feeding the code to remote python process. I do not see much advantage of doing that over using ready-made tar command.

Airflow: How to download file from Linux to Windows via smbclient

I have a DAG that imports data from a source to a server. From there, I am looking to download that file from the server to the Windows network. I would like to keep this part in Airflow for automation purposes. Does anyone know how to do this in Airflow? I am not sure whether to use the os package, the shutil package, or maybe there is a different approach.
I think you're saying you're looking for a way to get files from a cloud server to a windows shared drive or onto a computer in the windows network, these are some options I've seen used:
Use a service like google drive, dropbox, box, or s3 to simulate a synced folder on the cloud machine and a machine in the windows network.
Call a bash command to SCP the files to the windows server or a worker in the network. This could work in the opposite direction too.
Add the files to a git repository and have a worker in the windows network sync the repository to a shared location. This option is only good in very specific cases. It has the benefit that you can track changes and restore old states (if the data is in CSV or another text format), but it's not great for large files or binary files.
Use rsync to transfer the files to a worker in the windows network which has the shared location mounted and move the files to the synced dir with python or bash.
Mount the network drive to the server and use python or bash to move the files there.
All of these should be possible with Airflow by either using python (shutil) or a bash script to transfer the files to the right directory for some other process to pick up or by calling a bash sub-process to perform the direct transfer by SCP or commit the data via git. You will have to find out what's possible with your firewall and network settings. Some of these would require coordinating tasks on the windows side (the git option for example would require some kind of cron job or task scheduler to pull the repository to keep the files up to date).

Script that can automatically download new data from the server to my local backup

I have an application running on linux server and I need to create a local backup of it's data.
However, new data is being added to the application after every hour and I want to sync my local backup data with server's data.
I want to write a script (shell or python) that can automatically download new added data from the linux server to my local machine backup. But I am newbie to the linux envoirnment and don't know how to write shell script to achieve this.
What is the better way of achieving this ? And what would be the script to do so ?
rsync -r fits in your use case and it's a single line command.
rsync -r source destination
or the options you need according to your specific case.
So, you don't need a python script for that, but you can still write it and let it use the command above.
Moreover, if you want the Python script to do it in an automatic way, you may check the event scheduler module.
This depends on where and how your data is stored on the Linux server, but you could write a network application which pushes the data to a client and the client saves the data on the local machine. You can use sockets for that.
If the data is available via aan http server and you know how to write RESTful APIs, you could use that as well and make a task run on your local machine every hour which calls the REST API and handles its (JSON) data. Keep in mind that you need to secure the API if the server is running online and not in the same LAN.
You could also write a small application which downloads the files every hour from the server over FTP (if you want to backup files stored on the system). You will need to know the exact path of the file(s) to do this though.
All solutions above are meant for Python programming. Using a shell script is possible, but a little more complicated. I would use Python for this kind of tasks, as you have a lot of network related libraries available (ftp, socket, http clients, simple http servers, WSGI libraries, etc.)

Airflow - retrieve files from Windows shared folders?

What is the best method to grab files from a Windows shared folder on the same network?
Typically, I am extracting data from SFTPs, SalesForce, or database tables, but there are a few cases where end-users need to upload a file to a shared folder that I have to retrieve. My process up to now has been to have a script running on a Windows machine which just grabs any new/changed files and loads them to an SFTP, but that is not ideal. I can't monitor it in my Airflow UI, I need to change my password on that machine physically, mapped network drives seem to break, etc.
Is there a better method? I'd rather the ETL server handle all of this stuff.
Airflow is installed on remote Linux server (same network)
Windows folders are just standard UNC paths where people have access based on their NT ID. These users are saving files which I need to retrieve. These users are non-technical and did not want WinSCP installed to share the data through an SFTP instead or even a Sharepoint (where I could use Shareplum, I think).
I would like to avoid mounting these folders and instead use Python scripts to simply copy the files I need as per an Airflow schedule
Best if I can save my NT ID and password within an Airflow connection to access it with a conn_id
If I'm understanding the question correctly, you have a shared folder mounted on your local machine — not the Windows server where your Airflow install is running. Is it possible to access the shared folder on the server instead?
I think a file sensor would work your use case.
If you could auto sync the shared folder to a cloud file store like S3, then you could use the normal S3KeySensor and S3PrefixSensor that are commonly used . I think this would simplify your solution as you wouldn't have to be concerned with whether the machine(s) the tasks are running on has access to the folder.
Here are two examples of software that syncs a local folder on Windows to S3. Note that I haven't used either of them personally.
https://www.cloudberrylab.com/blog/how-to-sync-local-folder-with-amazon-s3-bucket-with-cloudberry-s3-explorer/
https://s3browser.com/amazon-s3-folder-sync.aspx
That said, I do think using FTPHook.retrieve_file is a reasonable solution if you can't have your files in cloud storage.

Using a python webserver

I am looking for a web server, where I can upload files and download files from Ubuntu to Windows and the vice versa. I've builded a web server with Python and I share my folder in Ubuntu and download the files in this folder at Windows. Now I want to look up every millisecond if there is a new file and download this new files automatically. Is there any script or something helpfully for me?
Is a python web server a good solution?
There are many ways to synchronise folders, even remote.
If you need to stick with the python server approach for some reason, look for file system events libraries to trigger your upload code (for example watchdog).
But if not, it may be simpler to use tools like rsync + inotify, or simply lsync.
Good luck!
Edit: I just realized you want linux->windows sync, not the other way around, so since you don't have ssh server on the target (windows), rsync and lsync will not work for you, you probably need smbclient. In python, consider pysmbc or PySmbClient

Categories