I need to copy a file from a distributed storage network to my Linux machine. I only get the UNC path to the file from a database. There are about 20 servers with more than 30 shares on each server, so mounting them locally with Samba is not a good option.
I tried open(r'\\filestore01\share01\00\000001', 'r') on a Windows machine, which works, but it fails on Linux. I also found "Python Linux-copy files to windows shared drive (samba)", which again mounts the share before copying.
Is that the only solution available? Is there no native library to copy from a Windows share?
The best solution to the problem, to avoid mounting things everywhere, is to use libraries like pysmb:
https://pythonhosted.org/pysmb/api/smb_SMBConnection.html
Probably the best idea would be to write a wrapper for such URLs.
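A minimal sketch of such a wrapper, assuming pysmb is installed and the server name in the UNC path resolves from the Linux machine. The function names split_unc and copy_from_share, and the credential parameters, are hypothetical and not part of pysmb:

```python
import socket


def split_unc(unc_path):
    # \\server\share\dir\file -> ("server", "share", "/dir/file")
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: %r" % unc_path)
    parts = unc_path[2:].split("\\", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError("UNC path needs server, share and file: %r" % unc_path)
    server, share, rest = parts
    return server, share, "/" + rest.replace("\\", "/")


def copy_from_share(unc_path, local_path, username, password, domain=""):
    # Imported here so split_unc stays usable without pysmb installed.
    from smb.SMBConnection import SMBConnection

    server, share, remote_path = split_unc(unc_path)
    conn = SMBConnection(username, password, socket.gethostname(), server,
                         domain=domain, use_ntlm_v2=True, is_direct_tcp=True)
    # Port 445 is SMB over direct TCP; assumes the server accepts it.
    if not conn.connect(server, 445):
        raise ConnectionError("could not connect to %s" % server)
    try:
        with open(local_path, "wb") as fh:
            conn.retrieveFile(share, remote_path, fh)
    finally:
        conn.close()
```

With this, copy_from_share(r'\\filestore01\share01\00\000001', '/tmp/000001', user, password) would fetch the file without mounting anything. Older servers may need NetBIOS on port 139 instead of direct TCP.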
Related
I have a DAG that imports data from a source to a server. From there, I am looking to download that file from the server to the Windows network. I would like to keep this part in Airflow for automation purposes. Does anyone know how to do this in Airflow? I am not sure whether to use the os package, the shutil package, or maybe there is a different approach.
I think you're saying you're looking for a way to get files from a cloud server to a Windows shared drive or onto a computer in the Windows network. These are some options I've seen used:
Use a service like google drive, dropbox, box, or s3 to simulate a synced folder on the cloud machine and a machine in the windows network.
Call a bash command to SCP the files to the windows server or a worker in the network. This could work in the opposite direction too.
Add the files to a git repository and have a worker in the windows network sync the repository to a shared location. This option is only good in very specific cases. It has the benefit that you can track changes and restore old states (if the data is in CSV or another text format), but it's not great for large files or binary files.
Use rsync to transfer the files to a worker in the windows network which has the shared location mounted and move the files to the synced dir with python or bash.
Mount the network drive to the server and use python or bash to move the files there.
All of these should be possible with Airflow. You can use Python (shutil) or a bash script to move the files to the right directory for some other process to pick up, or call a bash sub-process to perform the direct transfer by SCP or to commit the data via git. You will have to find out what's possible with your firewall and network settings. Some of these would require coordinating tasks on the Windows side (the git option, for example, would require some kind of cron job or task scheduler to pull the repository and keep the files up to date).
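As a rough sketch of the SCP option, the transfer can be wrapped in a Python callable that an Airflow PythonOperator could run. This assumes an SSH server is reachable on the target and key-based authentication is already set up. The function names, host, and paths are hypothetical:

```python
import subprocess


def build_scp_command(local_path, user, host, remote_dir, port=22):
    # Build the argument list separately so it can be inspected or logged.
    return ["scp", "-P", str(port), local_path, "%s@%s:%s" % (user, host, remote_dir)]


def push_file(local_path, user, host, remote_dir, port=22):
    # check=True raises CalledProcessError if scp exits non-zero,
    # which makes the Airflow task fail visibly.
    subprocess.run(build_scp_command(local_path, user, host, remote_dir, port), check=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.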
What is the best method to grab files from a Windows shared folder on the same network?
Typically, I am extracting data from SFTPs, SalesForce, or database tables, but there are a few cases where end-users need to upload a file to a shared folder that I have to retrieve. My process up to now has been to have a script running on a Windows machine which just grabs any new/changed files and loads them to an SFTP, but that is not ideal. I can't monitor it in my Airflow UI, I need to change my password on that machine physically, mapped network drives seem to break, etc.
Is there a better method? I'd rather the ETL server handle all of this stuff.
Airflow is installed on remote Linux server (same network)
Windows folders are just standard UNC paths where people have access based on their NT ID. These users are saving files which I need to retrieve. These users are non-technical and did not want WinSCP installed to share the data through an SFTP instead or even a Sharepoint (where I could use Shareplum, I think).
I would like to avoid mounting these folders and instead use Python scripts to simply copy the files I need as per an Airflow schedule
Best if I can save my NT ID and password within an Airflow connection to access it with a conn_id
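One way this could look, as a sketch only: it assumes Airflow 2.x (BaseHook.get_connection), pysmb installed on the Airflow server, and a connection whose host is the file server and whose login is stored as DOMAIN\user. The function names and parameters are hypothetical:

```python
import os
import socket


def split_nt_login(login):
    # "CORP\jdoe" -> ("CORP", "jdoe"); "jdoe" -> ("", "jdoe")
    domain, _, user = login.rpartition("\\")
    return domain, user


def fetch_share_files(conn_id, share, remote_dir, local_dir):
    # Imported lazily so the helper above works without Airflow or pysmb.
    from airflow.hooks.base import BaseHook
    from smb.SMBConnection import SMBConnection

    c = BaseHook.get_connection(conn_id)
    domain, user = split_nt_login(c.login)
    smb = SMBConnection(user, c.password, socket.gethostname(), c.host,
                        domain=domain, use_ntlm_v2=True, is_direct_tcp=True)
    if not smb.connect(c.host, c.port or 445):
        raise ConnectionError("could not connect to %s" % c.host)
    try:
        for entry in smb.listPath(share, remote_dir):
            if entry.isDirectory:
                continue  # skips ".", ".." and sub-folders
            remote_file = remote_dir.rstrip("/") + "/" + entry.filename
            with open(os.path.join(local_dir, entry.filename), "wb") as fh:
                smb.retrieveFile(share, remote_file, fh)
    finally:
        smb.close()
```

Called from a PythonOperator, this copies every file in the folder on each run. Filtering on entry.last_write_time would restrict it to new or changed files.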
If I'm understanding the question correctly, you have a shared folder mounted on your local machine, not on the server where your Airflow install is running. Is it possible to access the shared folder on the server instead?
I think a file sensor would work for your use case.
If you could auto-sync the shared folder to a cloud file store like S3, then you could use the normal S3KeySensor and S3PrefixSensor that are commonly used. I think this would simplify your solution, as you wouldn't have to be concerned with whether the machines the tasks are running on have access to the folder.
Here are two examples of software that syncs a local folder on Windows to S3. Note that I haven't used either of them personally.
https://www.cloudberrylab.com/blog/how-to-sync-local-folder-with-amazon-s3-bucket-with-cloudberry-s3-explorer/
https://s3browser.com/amazon-s3-folder-sync.aspx
That said, I do think using FTPHook.retrieve_file is a reasonable solution if you can't have your files in cloud storage.
I am looking for a web server where I can upload and download files from Ubuntu to Windows and vice versa. I've built a web server with Python, and I share my folder in Ubuntu and download the files in this folder on Windows. Now I want to check every millisecond whether there is a new file and download the new files automatically. Is there any script or something helpful for me?
Is a python web server a good solution?
There are many ways to synchronise folders, even remote ones.
If you need to stick with the Python server approach for some reason, look for file system event libraries to trigger your upload code (for example watchdog).
But if not, it may be simpler to use tools like rsync + inotify, or simply lsyncd.
Good luck!
Edit: I just realized you want Linux-to-Windows sync, not the other way around. Since you don't have an SSH server on the target (Windows), rsync and lsyncd will not work for you; you probably need smbclient. In Python, consider pysmbc or PySmbClient.
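To illustrate the "trigger code when a new file appears" idea without extra dependencies, here is a small polling sketch using only the standard library. It is a stand-in for an event library such as watchdog, not its API. The names snapshot, new_or_changed, and watch, and the handle callback, are hypothetical:

```python
import os
import time


def snapshot(folder):
    # Map each regular file in the folder to its modification time.
    return {e.name: e.stat().st_mtime for e in os.scandir(folder) if e.is_file()}


def new_or_changed(before, after):
    # Names that are new in `after` or whose mtime differs from `before`.
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


def watch(folder, handle, interval=1.0):
    # Runs forever, calling handle(path) for every new or modified file.
    seen = snapshot(folder)
    while True:
        time.sleep(interval)
        current = snapshot(folder)
        for name in new_or_changed(seen, current):
            handle(os.path.join(folder, name))
        seen = current
```

The handle callback is where the upload or download would happen. A one-second interval is usually enough; an event-based library avoids polling altogether.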
I do not have access to the admin account in Windows 7. Is there a way to install RabbitMQ and its required Erlang without admin privileges? In some portable way?
I need to use it in my Python Celery project.
Thanks!
It is possible. Here's how I've done it:
You need to create a portable Erlang and acquire RabbitMQ server files.
You can install regular Erlang to another computer, then copy the whole installation directory to the computer with limited account. You can use local documents, or AppData like C:\Users\Limited_Account\AppData\erl5.10.4
(If you don't have any access to another computer, you can extract the setup file with 7-Zip but it'll be troublesome to fix paths.)
Modify the erl.ini in the bin folder with the new path. (By default erl.ini uses Unix line endings, so it might be seen as a single line.)
[erlang]
Bindir=C:\\Users\\Limited_Account\\AppData\\erl5.10.4\\erts-5.10.4\\bin
Progname=erl
Rootdir=C:\\Users\\Limited_Account\\AppData\\erl5.10.4
See if bin\erl.exe opens the Erlang shell. If you see a crash dump, the path might not be correct. If the Visual C++ Redistributable files were not installed before, it will complain about msvcr100.dll and you will need to copy them manually as well, but I don't recommend that.
Download the zip version of RabbitMQ server from https://www.rabbitmq.com/install-windows-manual.html and extract it.
Set the %ERLANG_HOME% variable. You can type set ERLANG_HOME=C:\Users\Limited_Account\AppData\erl5.10.4 on the command line (without quotes, because cmd would store them as part of the value). Alternatively, you can add this line to every .bat in the sbin folder.
Now you can use the management scripts in the sbin folder. For example, you can use rabbitmq_server-3.2.4\sbin\rabbitmq-server.bat to start the RabbitMQ Server. Obviously, starting as a service is not an option because you are not an admin.
For further information, see: https://www.rabbitmq.com/install-windows-manual.html
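The erl.ini step above can also be scripted. This is only a sketch: it assumes Rootdir is the top of the copied Erlang directory, that Bindir is the erts-<version>\bin folder beneath it, and that Python is available on the limited account. The function names are hypothetical:

```python
import os


def erl_ini_text(root_dir, erts_version):
    # erl.ini expects every backslash in the paths to be doubled.
    def esc(path):
        return path.replace("\\", "\\\\")

    bindir = root_dir + "\\erts-" + erts_version + "\\bin"
    return "[erlang]\nBindir=%s\nProgname=erl\nRootdir=%s\n" % (esc(bindir), esc(root_dir))


def write_erl_ini(root_dir, erts_version):
    # Overwrites bin\erl.ini under the copied installation, keeping Unix line endings.
    ini_path = os.path.join(root_dir, "bin", "erl.ini")
    with open(ini_path, "w", newline="\n") as fh:
        fh.write(erl_ini_text(root_dir, erts_version))
    return ini_path
```

For the example account, write_erl_ini(r"C:\Users\Limited_Account\AppData\erl5.10.4", "5.10.4") would regenerate the file after the directory has been copied.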
I'm currently running Python on a Linux machine and have a Windows XP guest running in VirtualBox.
I want to access the shared folder on the XP machine. I tried the following command but always get the same error.
d = os.listdir(r"\\remoteip\share")
OSError: [Errno 2] No such file or directory
The shared folder on XP was created by making a new folder in the Shared Documents folder, and I'm able to ping the machines.
Windows sharing is implemented using the SMB protocol. Windows Explorer and most Linux file managers (like Nautilus) make it transparent to the user, so it is easy to do common file operations on files and folders shared through SMB.
However, Linux (and thus Python running on top of it) does not add this abstraction at the file system level by default, though you can mount an SMB share as part of your file system.
So, in the end, to access those files you can:
mount your share using mount -t cifs (see man or Google for details) and then access it from Python as a normal folder (to my mind this is a rather kludgy solution)
use a library that deals specifically with SMB, like pysmb (see its documentation), and do your file operations with its help.
Hope this will help.
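A minimal sketch of the second option, as a rough equivalent of os.listdir for a share. It assumes pysmb is installed and that the XP guest accepts the given credentials over NetBIOS on port 139. The function names and parameters are hypothetical:

```python
import socket


def list_share(conn, share, path="/"):
    # pysmb's listPath also returns "." and "..", which os.listdir would not.
    return sorted(e.filename for e in conn.listPath(share, path)
                  if e.filename not in (".", ".."))


def open_connection(remote_ip, remote_name, username, password):
    # Imported lazily so list_share can be exercised without pysmb.
    from smb.SMBConnection import SMBConnection

    # remote_name is the Windows machine's NetBIOS name, needed on port 139.
    conn = SMBConnection(username, password, socket.gethostname(), remote_name,
                         use_ntlm_v2=True)
    if not conn.connect(remote_ip, 139):
        raise ConnectionError("could not connect to %s" % remote_ip)
    return conn
```

Usage would be along the lines of conn = open_connection(ip, name, user, password), then list_share(conn, "share"), then conn.close().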