I am getting a Python errno.ESTALE error on Red Hat 5.4 over NFSv3 with caching enabled.
I looked it up and found that:
"A filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host, while your client still holds an active reference to the object. A typical example occurs when the current directory of a process, running on your client, is removed on the server (either by a process running on the server or on another client)."
I found that if you chown or listdir, etc., you can flush the cache and hence the handle won't be stale, but this approach hasn't worked for me.
Anyone have other solutions?
I assume this is NFS and you are running a client on Linux.
You should try remounting your NFS filesystem, like this:
$ mount -o remount [your filesystem]
You could also try flushing the caches, as you mentioned. Note that writing to /proc/sys/vm/drop_caches requires root:
# To free pagecache
$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
# To free dentries and inodes
$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
# To free pagecache, dentries and inodes
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
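If remounting is not an option, you can also make the Python side tolerant of the error. Below is a minimal sketch (not from the original post; the function name is illustrative): it retries a path-based call once after errno.ESTALE, since a fresh lookup by path often succeeds once the stale handle has been dropped. If the object really was removed on the server, the retry fails too and the caller has to handle it.

import errno
import os
import time

def listdir_retrying_estale(path, retries=1, delay=1.0):
    # os.listdir() that retries once if the NFS file handle has gone stale.
    for attempt in range(retries + 1):
        try:
            return os.listdir(path)
        except OSError as exc:
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)  # give the client a moment before looking the path up again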
I am trying to automate backups using a Raspberry Pi with a Python script that rsyncs everything in the backup directories on the EC2 instances CAMSTEST1 and CAMSProd to the on-premise NAS.
Here is the script:
#!/usr/bin/python3
import subprocess
from os import path
# private key for AWS Servers
AWS_PRIVATE_KEY = "ARCS-Key-Pair-01.pem"
AWS_USER ="ubuntu"
# NAS backup directory
NAS_BACKUP_DIR = "192.168.1.128:/test"
NAS_MNT = "/mnt/nasper01/"
# CAMSTest1 Config
CAMSTEST1_USER = "ubuntu"
CAMSTEST1_IP = "52.62.119.203"
CAMSTEST1_DIR = "/mnt/backups/*"
CAMSTEST1_MNT = NAS_MNT + "camstest"
#CAMSProd Config
CAMSPROD_USER = "ubuntu"
CAMSPROD_IP = "54.206.28.116"
CAMSPROD_DIR = "/mnt/backups/*"
CAMSPROD_MNT = NAS_MNT + "camsprod"
# mount NAS
print("Mounting NAS")
subprocess.call(["mount","-t", "nfs", NAS_BACKUP_DIR, NAS_MNT])
print("NAS Mounted Successfully")
# backup CAMSTEST1
print("Backing Up CAMSTest1")
hostnamefs = "{user}#{ip}:{dir}".format(user=CAMSTEST1_USER,ip=CAMSTEST1_IP,dir=CAMSTEST1_DIR)
sshaccess = 'ssh -i {private_key}'.format(private_key=AWS_PRIVATE_KEY)
subprocess.call(["rsync","-P","-v","--rsync-path","sudo rsync","--remove-source-files","--recursive","-z","-e",sshaccess,"--exclude","/backup-script",hostnamefs, CAMSTEST1_MNT ])
print("Backed Up CAMSTest1 Successfully")
#backup CAMSPROD
print("Backing Up CAMSProd")
hostnamefs = "{user}#{ip}:{dir}".format(user=CAMSPROD_USER,ip=CAMSPROD_IP,dir=CAMSPROD_DIR)
sshaccess = 'ssh -i {private_key}'.format(private_key=AWS_PRIVATE_KEY)
subprocess.call(["rsync","-P","-v","--rsync-path", "sudo rsync","--remove-source-files","--recursive","-z","-e",sshaccess,"--exclude","/backup-script","--exclude","/influxdb-backup", "--exclude", "/db-backup-only",hostnamefs, CAMSPROD_MNT ])
print("Backed Up CAMSProd Successfully")
Here is the cronjob:
0 0 * * 0 sudo python3 /home/pi/backup/backupscript.py >> /home/pi/backup/backuplog
The script works perfectly when run manually from the terminal. However, it does not work as a cronjob: it runs without errors, but nothing is copied from the EC2 instances to the NAS. Could anyone explain why it works in the terminal but not from a cronjob?
EDIT
Here is the output of the backup script's log, with no errors:
Last RunTime:
2021-10-31 00:00:02.191447
Mounting NAS
NAS Mounted Successfully
Backing Up ARCSWeb02
Backed up ARCWeb02 Successfully
Backing Up CAMSTest1
Backed Up CAMSTest1 Successfully
Backing Up CAMSProd
Backed Up CAMSProd Successfully
Fetching origin
Last RunTime:
2021-11-07 00:00:02.264703
Mounting NAS
NAS Mounted Successfully
Backing Up ARCSWeb02
Backed up ARCWeb02 Successfully
Backing Up CAMSTest1
Backed Up CAMSTest1 Successfully
Backing Up CAMSProd
Backed Up CAMSProd Successfully
Rather than a problem with the script itself, which seems to work fine, I would suggest running the cronjob directly as root by editing root's crontab with crontab -e, or by adding a file to /etc/cron.d/ that specifies root (or another user that can execute the script directly) in the job definition, without using the sudo keyword.
Here are some references on that matter:
https://askubuntu.com/questions/173924/how-to-run-a-cron-job-using-the-sudo-command
https://serverfault.com/questions/352835/crontab-running-as-a-specific-user
When you run rsync in a subprocess, it picks up a different $PATH from the non-interactive shell, so rsync might not be found. Try using the full path to rsync in your subprocess calls.
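For example, something along these lines (a sketch only: the rsync location and the absolute key path are assumptions, so check them on the Pi, e.g. with "which rsync"; the other values mirror the original script):

import subprocess

RSYNC = "/usr/bin/rsync"  # assumed location; confirm with "which rsync" on the Pi
SSH_CMD = "ssh -i /home/pi/backup/ARCS-Key-Pair-01.pem"  # absolute key path is an assumption

subprocess.call([RSYNC, "-P", "-v", "--rsync-path", "sudo rsync",
                 "--remove-source-files", "--recursive", "-z",
                 "-e", SSH_CMD,
                 "--exclude", "/backup-script",
                 "ubuntu@52.62.119.203:/mnt/backups/*",
                 "/mnt/nasper01/camstest"])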
I have a few suggestions:
As suggested before, run the cronjob as the root user (no sudo).
Remove python3 from the cronjob; instead, make your script executable using `chmod +x` and run it as
0 0 * * 0 /home/pi/backup/backupscript.py >> /home/pi/backup/backuplog
Just a suggestion: I think a shell script would be more appropriate here than Python.
I'm trying to run a script automatically when booting a Raspberry Pi with DietPi.
My script starts a Python 3 program which then, at the end, starts an external program, MP4Box, which merges two video files into an MP4 in a folder served by my lighttpd webserver.
When I start the script manually, everything works. But when the script starts automatically on boot and it comes to the external program MP4Box, I get an error:
Cannot open destination file /var/www/Videos/20201222_151210.mp4: I/O Error
The script starting my Python programs is "startcam", which lies in the folder /var/lib/dietpi/postboot.d:
#!/bin/sh -e
# Autostart RaspiCam
cd /home/dietpi
rm -f trigger/*
python3 -u record_v0.1.py > record.log 2>&1 &
python3 -u motioninterrupt.py > motion.log 2>&1 &
The readme.txt in postboot.d says:
# /var/lib/dietpi/postboot.d is implemented by DietPi and allows to run scripts at the end of the boot process:
# - /etc/systemd/system/dietpi-postboot.service => /boot/dietpi/postboot => /var/lib/dietpi/postboot.d/*
# There are nearly no restrictions about file names and permissions:
# - All files (besides this "readme.txt" and dot files ".filename") are executed as root user.
# - Execute permissions are automatically added.
# NB: This delays the login prompt by the time the script takes, hence it must not be used for long-term processes, but only for oneshot tasks.
So it should also start my script with root privileges. And this is the (part of the) script "record_v0.1.py" that throws the error:
import os
os.system('MP4Box -fps 15 -cat /home/dietpi/b-file001.h264 -cat /home/dietpi/a-file001.h264 -new /var/www/Videos/file001.mp4 -tmp ~ -quiet')
When I start the Python programs manually (logged in as root) with:
/var/lib/dietpi/postboot.d/startcam
everything is OK and instead of the error I get the message:
Appending file /home/dietpi/Videos/b-20201222_153124.h264
No suitable destination track found - creating new one (type vide)
Appending file /home/dietpi/Videos/a-20201222_153124.h264
Saving /var/www/Videos/20201222_153124.mp4: 0.500 secs Interleaving
Thanks for every hint
Contrary to the description, the scripts in postboot.d are not executed as root. So I changed my script to:
#!/bin/sh -e
# Autostart RaspiCam
cd /home/dietpi
rm -f trigger/*
sudo python3 -u record_v0.1.py > record.log 2>&1 &
sudo python3 -u motioninterrupt.py > motion.log 2>&1 &
Now they are running as root and everything works as wanted.
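As a side note, unrelated to the root fix above: calling MP4Box through subprocess.run with check=True and an absolute binary path would make a failure at boot show up as a Python traceback in record.log rather than only the I/O error line. A rough sketch, where /usr/bin/MP4Box is an assumed location (check with "which MP4Box"):

import os
import subprocess

subprocess.run(
    ["/usr/bin/MP4Box", "-fps", "15",
     "-cat", "/home/dietpi/b-file001.h264",
     "-cat", "/home/dietpi/a-file001.h264",
     "-new", "/var/www/Videos/file001.mp4",
     "-tmp", os.path.expanduser("~"),  # os.system expanded ~ via the shell; here we do it explicitly
     "-quiet"],
    check=True,  # raises CalledProcessError on a non-zero exit code
)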
I've got a program that needs to automatically install & manage some Docker containers on Windows with minimal user input.
It needs to automatically set up Docker to mount arbitrary Windows folders. It needs to do this from a clean install, where the Docker VM cannot be assumed to have been created.
Docker by default will allow almost any folder in C:\Users to mount through to its Boot2Docker image, which in turn makes them available for mounting into Docker images themselves.
I'd like a way to automatically modify the default mount script from outside the VM so that I can use other folders, but "VBoxManage.exe run", copyto, etc. commands don't work on Boot2Docker at all, unlike on the other Linux VMs I have.
So, in my quest for a solution, I stumbled upon py-vbox, which lets you easily send keyboard events to the console using the VirtualBox API. It also allows for direct console sessions, but they fail just like VBoxManage.exe does. So, this ended with me sending lots of
echo command >> /c/script.sh
commands over the keyboard in order to set up a script that will mount the extra volumes. Is there a better way?
For anyone who might need it, here's a very simplified version of what goes on. The first two bits are the old .bat files, so that they apply to anyone. First, to create our docker VM:
set PATH=%PATH%;"c:\Program Files (x86)\Git\bin"
docker-machine create --driver virtualbox my-docker-vm
"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" sharedfolder add "my-docker-vm" --name "c/myfolder" --hostpath "c:\myfolder" --automount
"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" setextradata "my-docker-vm" VBoxInternal2/SharedFoldersEnableSymlinksCreate/c/myfolder 1
Then, the docker VM must be started...
"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" startvm --type=headless my-docker-vm
set PATH=%PATH%;"c:\Program Files (x86)\Git\bin"
docker-machine env --shell cmd my-docker-vm > temp.cmd
call temp.cmd
del temp.cmd
Now, a simplified version of the Python script to write a simplified mount script into the VM via the keyboard using py-vbox:
import virtualbox
script = """\n\
echo if [ ! -d /c/myfolder ] > /c/script.sh\n\
echo then >> /c/script.sh\n\
echo mkdir -p /c/myfolder >> /c/script.sh\n\
echo mount -t vboxsf c/myfolder /c/myfolder >> /c/script.sh\n\
echo fi >> /c/script.sh\n\
chmod +x /c/script.sh\n\
/bin/sh /c/script.sh\n\
rm /c/script.sh\n\
"""
my_vm_name = 'my-docker-vm'
def mount_folder():
    vbox = virtualbox.VirtualBox()
    is_there = False
    for vmname in vbox.machines:
        if str(vmname) == my_vm_name:
            is_there = True
            break
    if is_there is False:
        raise RuntimeError("VM not found: " + my_vm_name)
    vm = vbox.find_machine(my_vm_name)
    session = vm.create_session()
    session.console.keyboard.put_keys(script)
as discussed in the comments:
The C:\Users folder is shared with the VM using the sharedfolders feature of VirtualBox. Just add another sharedfolder and you are done. This is possible from the commandline via VBoxManage sharedfolder add <uuid|vmname> --name <name> --hostpath <path> [--transient] [--readonly] [--automount]. You probably need to restart the VM afterwards.
Another option in newer Windows versions is to just mount whatever folder you want somewhere inside the C:\Users folder, e.g. C:\Users\myuser\dockerdata.
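To tie this back to the automation requirement, here is a rough sketch of driving that same VBoxManage sharedfolder command from Python instead of sending keystrokes. The VBoxManage path and the folder names are assumptions taken from the batch files above; as noted, the VM probably needs to be (re)started afterwards for a permanent share to take effect.

import subprocess

VBOXMANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"  # assumed install path

def add_shared_folder(vm_name, share_name, host_path):
    # Registers a permanent shared folder on the VM (add --transient for a running VM).
    subprocess.check_call([VBOXMANAGE, "sharedfolder", "add", vm_name,
                           "--name", share_name,
                           "--hostpath", host_path,
                           "--automount"])

add_shared_folder("my-docker-vm", "c/myfolder", r"c:\myfolder")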
I would like to display "hello world" via MPI on different Google Cloud compute instances with the help of the following code:
from mpi4py import MPI
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
print("Hello, World! I am process/rank {} of {} on {}.\n".format(rank, size, name))
The problem is that, even though I can ssh-connect across all of these instances without problems, I get a permission denied error message when I try to run my script. I use the following command to invoke my script:
mpirun --host localhost,instance_1,instance_2 python hello_world.py
And get the following error message:
Permission denied (publickey).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
Additional information:
I installed Open MPI on all of my nodes
I had Google automatically set up all of my SSH keys by using gcloud to log into each instance from each instance
instance-type: n1-standard-1
instance-OS: Linux Debian (default)
Thank you for your help :-)
New Information:
(thanks @Zulan for pointing out that I should edit my previous post instead of creating a new answer for new information)
So, I tried to do the same with MPICH instead of Open MPI. However, I ran into a similar error message.
Command:
mpirun --host localhost,instance_1,instance_2 python hello_world.py
Error message:
Host key verification failed.
I can ssh-connect between my two instances without problems, and through the gcloud commands the SSH keys should automatically be set up properly.
So, does somebody have an idea what the problem could be? I also checked the PATH, the firewall rules, and my ability to write startup files in the temp folder. Can someone please try to recreate this problem? Also, should I raise this question with Google? (I have never done such a thing before and am quite unsure. :S)
Thanks for helping :)
So I finally found a solution. Wow, this problem was driving me nuts.
It turned out that I needed to generate SSH keys manually for the script to work. I have no idea why, because the Google services already set up the keys when using
gcloud compute ssh, but well, it worked :)
Steps I did:
instance_1 $ ssh-keygen -t rsa
instance_1 $ cd .ssh
instance_1 $ cat id_rsa.pub >> authorized_keys
instance_1 $ gcloud compute copy-files id_rsa.pub
instance_1 $ gcloud compute ssh instance_2
instance_2 $ cd .ssh
instance_2 $ cat id_rsa.pub >> authorized_keys
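Since both error messages ("Permission denied (publickey)" and "Host key verification failed") come from the SSH step that mpirun performs under the hood, a quick per-host check that non-interactive SSH works the way the launcher uses it can save a lot of guessing. A small sketch (hostnames are the ones from the question):

import subprocess

for host in ["instance_1", "instance_2"]:
    # BatchMode=yes makes ssh fail immediately instead of prompting,
    # which is effectively what the MPI launcher does.
    result = subprocess.run(["ssh", "-o", "BatchMode=yes", host, "true"],
                            capture_output=True, text=True)
    if result.returncode == 0:
        print("{}: OK".format(host))
    else:
        print("{}: FAILED: {}".format(host, result.stderr.strip()))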
I will open another topic and ask why I cannot use ssh instance_2, even though gcloud compute ssh instance_2 is working. See: Difference between the commands "gcloud compute ssh" and "ssh"
I have a Python script named sudoserver.py that I start in a Cygwin shell by doing:
python sudoserver.py
I am planning to create a shell script (I don't know yet whether I will use a Windows shell script or a Cygwin script) that needs to know whether this sudoserver.py Python script is running.
But if I do this in Cygwin (while sudoserver.py is running):
$ ps -e | grep "python" -i
11020 10112 11020 7160 cons0 1000 00:09:53 /usr/bin/python2.7
and in Windows shell:
C:\>tasklist | find "python" /i
python2.7.exe 4344 Console 1 13.172 KB
So it seems I have no information about the .py file being executed. All I know is that Python is running something.
The -l (long) option for ps on Cygwin does not show my .py file, and neither does the /v (verbose) switch of tasklist.
What would be the appropriate shell way (Windows or Cygwin shell would be enough; both if possible would be fine) to programmatically find out whether a specific Python script is executing right now?
NOTE: The Python process could be started by another user, even by a user not logged into a GUI session, and even by the privileged "SYSTEM" Windows user.
It is a limitation of the platform.
You probably need to use some low level API to retrieve the process info. You can take a look at this one: Getting the command line arguments of another process in Windows
You can probably use win32api module to access these APIs.
(Sorry, away from a Windows PC so I can't try it out)
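If installing a third-party module is acceptable, another option (not win32api itself, but a wrapper around the same kind of process information) is psutil, which can read other processes' command lines; for processes owned by other users or SYSTEM you will likely need to run the check elevated. A rough sketch:

import psutil  # pip install psutil

def is_script_running(script_name="sudoserver.py"):
    # Scan every visible process and look for the script name in its command line.
    for proc in psutil.process_iter(attrs=["cmdline"]):
        cmdline = proc.info.get("cmdline") or []  # None if access was denied
        if any(script_name in arg for arg in cmdline):
            return True
    return False

print(is_script_running())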
Since sudoserver.py is your script, you could modify it to create a file in an accessible location when it starts and to delete the file when it finishes. Your shell script can then check for the existence of that file to find out if sudoserver.py is running.
(EDIT)
Thanks to the commenters who suggested that while the presence or absence of the file is an unreliable indicator, a file's lock status is not.
I wrote the following Python script testlock.py:
f = open ("lockfile.lck","w")
for i in range(10000000):
print (i)
f.close()
... and ran it in a Cygwin console window on my Windows PC. At the same time, I had another Cygwin console window open in the same directory.
First, after I started testlock.py:
Simon@Simon-PC ~/test/python
$ ls
lockfile.lck testlock.py
Simon@Simon-PC ~/test/python
$ rm lockfile.lck
rm: cannot remove `lockfile.lck': Device or resource busy
... then after I had shut down testlock.py by using Ctrl-C:
Simon@Simon-PC ~/test/python
$ rm lockfile.lck
Simon@Simon-PC ~/test/python
$ ls
testlock.py
Simon@Simon-PC ~/test/python
$
Thus, it appears that Windows is locking the file while the testlock.py script is running but it is unlocked when it is stopped with Ctrl-C. The equivalent test can be carried out in Python with the following script:
import os
try:
os.remove ("lockfile.lck")
except:
print ("lockfile.lck in use")
... which correctly reports:
$ python testaccess.py
lockfile.lck in use
... when testlock.py is running but successfully removes the locked file when testlock.py has been stopped with a Ctrl-C.
Note that this approach works in Windows but it won't work in Unix because, according to the Python documentation:
On Windows, attempting to remove a file that is in use causes
an exception to be raised; on Unix, the directory entry is removed
but the storage allocated to the file is not made available until
the original file is no longer in use.
A platform-independent solution using an additional Python module FileLock is described in Locking a file in Python.
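For illustration, here is a rough sketch of how that could look with the filelock package (pip install filelock); the lock-file name and the structure of sudoserver.py are assumptions:

from filelock import FileLock, Timeout

LOCK_PATH = "sudoserver.lck"

# In sudoserver.py, hold the lock for the lifetime of the process:
#     with FileLock(LOCK_PATH):
#         run_server()
#
# In the checking script, try to grab the lock without waiting:
try:
    with FileLock(LOCK_PATH, timeout=0):
        print("sudoserver.py is not running")
except Timeout:
    print("sudoserver.py is running")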
(FURTHER EDIT)
It appears that the OP didn't necessarily want a solution in Python. An alternative would be to do this in bash. Here is testlock.sh:
#!/bin/bash
flock lockfile.lck sequence.sh
The script sequence.sh just runs a time-consuming operation:
#!/bin/bash
for i in `seq 1 1000000`;
do
echo $i
done
Now, while testlock.sh is running, we can test the lock status using another invocation of flock:
$ flock -n lockfile.lck echo "Lock acquired" || echo "Could not acquire lock"
Could not acquire lock
$ flock -n lockfile.lck echo "Lock acquired" || echo "Could not acquire lock"
Could not acquire lock
$ flock -n lockfile.lck echo "Lock acquired" || echo "Could not acquire lock"
Lock acquired
$
The first two attempts to lock the file failed because testlock.sh was still running and so the file was locked. The last attempt succeeded because testlock.sh had finished running.