We're having an issue on a particular server where a call to os.setreuid hangs if the uid passed is not the root user's.
The code is in this function (please forgive any typos; the code is on a different classified system and I'm having to retype it):
import os
import pwd

def run_as(username):
    # Look up the target user's uid and gid.
    user = pwd.getpwnam(username)
    uid, gid = user.pw_uid, user.pw_gid
    # Set supplementary groups, then drop gid and uid (real and effective).
    os.initgroups(username, gid)
    os.setregid(gid, gid)
    os.setreuid(uid, uid)
It successfully finds the user, gets the uid and gid, calls initgroups, and performs the setregid. The call to setreuid then hangs: it never returns, but it doesn't raise any error either.
This is on a Red Hat Linux server, possibly RHEL7 but I think RHEL6. The same code works on other servers. On the problem server we've checked /etc/passwd and /etc/shadow, and we've deleted the user and remade it. If we pass "root" as the username, it works; pass any other user and it hangs. Has anyone ever dealt with anything like this and have any words of wisdom?
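One way to at least see where it's stuck: Python 3's stdlib faulthandler module (a backport exists on PyPI for Python 2) can dump every thread's stack after a deadline. A minimal debugging sketch, assuming a placeholder username and an arbitrary 30-second cutoff; this diagnoses rather than fixes the hang:

import faulthandler
import sys

# Dump all Python stacks to stderr and exit if we're still running
# after 30 seconds. If the dump shows the process parked inside
# os.setreuid, the hang is below Python (in the syscall or in
# glibc/NSS), and strace -p <pid> is the next step.
faulthandler.dump_traceback_later(30, exit=True, file=sys.stderr)
run_as("someuser")  # hypothetical username
faulthandler.cancel_dump_traceback_later()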
I'm using PyGithub to update my GitHub.com repos from an Ubuntu server using a Python script.
I have noticed there are times when the script just hangs, with no error message indicating what went wrong.
This is my script:
from pathlib import Path
from typing import Optional

import typer
from github import Github, GithubException

app = typer.Typer()

@app.command()
def add_deploy_key(
    token: Optional[str] = None, user_repo: Optional[str] = None, repo_name: Optional[str] = None
):
    typer.echo("Starting to access GitHub.com... ")
    try:
        # using an access token
        g = Github(token)
        # I skipped a bunch of code to save space
        for key in repo.get_keys():
            if str(key.key) == str(pure_public_key):
                typer.echo(
                    "We found an existing public key in " + user_repo + ", so we're NOT adding it"
                )
                return
        rep_key = repo.create_key(
            "DigitalOcean for " + repo_name, current_public_key, read_only=True
        )
        if rep_key:
            typer.echo("Success with adding public key to repo in GitHub.com!")
            typer.echo("")
            typer.echo("The url to the deposited key is: " + rep_key.url)
        else:
            typer.echo("There's some issue when adding public key to repo in GitHub.com")
    except GithubException as e:
        typer.echo("There's some issue")
        typer.echo(str(e))
        return

if __name__ == "__main__":
    app()
The way I trigger it is from inside a bash script:
output=$(python /opt/github-add-deploy-keys.py --token="$token" --user-repo="$user_repo" --repo-name="$repo_name")
It works. But sometimes it just hangs there without any output. And since it happens intermittently rather than consistently, it's hard to debug.
I cannot tell whether it's a typer issue, a network issue, or a GitHub.com issue. There's just nothing.
I want it to fail fast and often. I know the Github object takes a timeout and a retry parameter.
See https://pygithub.readthedocs.io/en/latest/github.html?highlight=retry#github.MainClass.Github
I wonder if I can do anything with these two parameters so that at least I can see that something is being done. I could add a lot of typer.echo statements, but that would be extremely verbose.
I'm also unfamiliar with the retry object. I wish that whenever a retry is attempted, something would be printed to tell me so.
What can I try?
The timeout should prevent the GitHub request from hanging, and retries should make it succeed, but based on the documentation there is already a default timeout. Since this question is about debugging, I would suggest using Python's logging library to record the steps your script runs in a file. You can find a good logging tutorial here.
As for the logging style, since your case has a lot of unknowns I would log when the script starts, before the "create key" step, after it, and maybe on errors. You can go over the log file when the script hangs.
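A sketch of that, combined with the timeout and retry parameters from the question (the Retry settings, log filename, and token variable are illustrative assumptions): urllib3, which PyGithub uses underneath, logs its requests and retry attempts once its logger is enabled, so you get retry visibility without extra echo statements.

import logging

from github import Github
from urllib3.util.retry import Retry

# Write progress to a file; at DEBUG level urllib3 will also log
# each request and each retry attempt.
logging.basicConfig(filename="deploy_keys.log", level=logging.DEBUG,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

log.info("script started")
# Assumed settings: 3 retries with backoff, 15-second socket timeout.
g = Github(token, timeout=15, retry=Retry(total=3, backoff_factor=2))
log.info("before create_key")
# ... repo lookup and create_key as in the question ...
log.info("after create_key")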
You can create a bash script that runs your script under timeout, notifies you when the program exits with a non-zero exit code, and is left running overnight:
timeout 5 python <yourscript>.py || echo "exited non-zero ($?); 124 means timed out"
Ok, so it's possible that the answer to this question is simply "stop using parallel-ssh and write your own code using netmiko/paramiko. Also, upgrade to Python 3 already."
But here's my issue: I'm using parallel-ssh to try to hit as many as 80 devices at a time. These devices are notoriously unreliable, and they occasionally freeze up after giving one or two lines of output. The parallel-ssh code then hangs for hours, leaving the script running until I kill it. I've jumped onto the VM running the scripts after a weekend and found a job that had been stuck for 52 hours.
The relevant pieces of my first code, the one that hangs:
from pssh.pssh2_client import ParallelSSHClient

def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
    client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
    result = client.run_command(cmd, stop_on_errors=False)
    return result
The next thing I tried was the channel_timeout option, because if it takes more than 4 minutes to get the command output, then I know the device froze and I need to move on and cycle it later in the script:
from pssh.pssh_client import ParallelSSHClient

def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
    client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, channel_timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
    result = client.run_command(cmd, stop_on_errors=False)
    return result
This version never actually connects to anything. Any advice? I haven't been able to find anything other than channel_timeout for killing an SSH session after a certain amount of time.
The code is creating a client object inside a function and then returning only the output of run_command, which includes remote channels to the SSH server.
Since the client object is never returned by the function, it goes out of scope and gets garbage collected by Python, which closes the connection.
Trying to use remote channels on a closed connection will never work. If you capture a stack trace of the stuck script, it is most probably hanging on a remote channel or connection.
Change your code to keep the client alive. The client should ideally also be reused:
from pssh.pssh2_client import ParallelSSHClient

def remote_ssh(ip_list, ssh_user, ssh_pass, cmd):
    client = ParallelSSHClient(ip_list, user=ssh_user, password=ssh_pass, timeout=180, retry_delay=60, pool_size=100, allow_agent=False)
    result = client.run_command(cmd, stop_on_errors=False)
    return client, result
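On the calling side that might look like this sketch (hosts, credentials, and command are placeholders; dict-style output is assumed, as returned by the 1.x pssh2_client):

# Keep the returned client referenced while the output is consumed.
client, output = remote_ssh(["10.0.0.1", "10.0.0.2"], "admin", "secret", "uptime")
client.join(output)  # wait for completion; connections stay alive meanwhile
for host, host_output in output.items():
    for line in host_output.stdout:
        print(host, line)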
Make sure you understand where the code is going wrong before jumping to conclusions that will not solve the issue, i.e., capture a stack trace of where it is hanging. The same code doing the same thing will break the same way.
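One low-effort way to capture that stack trace, as a sketch using only the standard library (the signal choice is arbitrary):

import faulthandler
import signal

# At the top of the long-running script: dump every thread's Python
# stack to stderr whenever the process receives SIGUSR1.
faulthandler.register(signal.SIGUSR1)

# Then, when the script hangs, from another shell:
#   kill -USR1 <pid>
# The traceback shows exactly which call is blocked.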
SOLVED: I figured out that the fix for this was adding env.abort_on_prompts = True to my fabfile.
This is a very specific question, but I have a Python function that checks the OS version of a specific list of hosts. It goes line by line through the list, attempts to connect to each host, and outputs the OS information it finds. But that is just background.
My real question is whether I can skip the hosts I cannot access. It does fine with many hosts, but then it hits one where the screen prompts for "Login password for 'yourAdminUser':". I want to know if there is a way for the script to realize when this prompt is being output to the console, terminate that connection attempt, and move on to the next line.
I would paste my code, but it is only a few lines and there is nothing in it that anticipates this password prompt.
Thanks
EDIT: I've pasted my function below.
from fabric.api import put, sudo
from fabric.colors import green

def get_os():
    put(local_path="scripts/check_OS.ksh", remote_path="/tmp/check_OS.ksh")
    sudo('chmod u+x /tmp/check_OS.ksh')
    output = sudo("/tmp/check_OS.ksh")
    print green("OS: {}".format(output))
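For reference, the fix from the SOLVED note sits at the top of the fabfile; a minimal sketch (Fabric 1.x settings; skip_bad_hosts is an additional option worth considering, not something confirmed above):

from fabric.api import env

# Abort instead of blocking whenever Fabric would show an interactive
# prompt, e.g. "Login password for 'yourAdminUser':".
env.abort_on_prompts = True
# Optionally also skip hosts that cannot be connected to at all.
env.skip_bad_hosts = True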
I have two servers, A and B. I'm supposed to send, let's say, an image file from server A to server B. But before server A sends the file over, I would like to check whether a similar file already exists on server B. I tried using os.path.exists() and it does not work:
print os.path.exists('ubuntu@serverB.com:b.jpeg')
The result returns False even though I have put that exact file on server B. I'm not sure whether it's a syntax error on my part or whether there is a better solution to this problem. Thank you.
The os.path functions only work on files on the same computer. They operate on paths, and ubuntu@serverB.com:b.jpeg is not a path.
In order to accomplish this, you will need to remotely execute a script. Something like this will usually work:
import pipes
import subprocess

def exists_remote(host, path):
    """Test if a file exists at path on a host accessible with SSH."""
    status = subprocess.call(
        ['ssh', host, 'test -f {}'.format(pipes.quote(path))])
    if status == 0:
        return True
    if status == 1:
        return False
    raise Exception('SSH failed')
So you can check whether a file exists on another server with:
if exists_remote('ubuntu@serverB.com', 'b.jpeg'):
    # it exists...
Note that this will be slow compared to a local check, since each call opens a fresh SSH connection; expect on the order of 100 ms or more per call.
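If that latency matters, an alternative sketch (using paramiko and assuming key-based login to serverB.com) reuses one SSH connection and stats the path over SFTP:

import paramiko

def exists_remote_sftp(sftp, path):
    # Check existence over an already-open SFTP session.
    try:
        sftp.stat(path)
        return True
    except IOError:  # raised when the remote path does not exist
        return False

client = paramiko.SSHClient()
client.load_system_host_keys()
client.connect('serverB.com', username='ubuntu')  # assumes key auth
sftp = client.open_sftp()
if exists_remote_sftp(sftp, 'b.jpeg'):
    pass  # it exists...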
Root privileges can't be dropped in Python even after seteuid. A bug?
EDIT Summary: I forgot to drop the gid. The accepted answer may help you, though.
I can't drop root privileges in Python 3.2 on my Linux system. In fact, even after seteuid(1000), it can read root-owned 400-mode files. The euid is definitely set to 1000!
I found that after an empty os.fork() call, the privileged access is correctly denied. (But only in the parent; the child can still read illegitimately.) Is this a bug in Python, or is this how Linux behaves?
Try the code below. Uncomment one of the three lines at the bottom, and run it as root.
Thanks in advance.
#!/usr/bin/python3
# Python seteuid pitfall example.
# Run this __as__ the root.
# Here, access to root-owned files /etc/sudoers and /etc/group- is tried.
# Simple access to them *succeeds* even after seteuid(1000), which should fail.
# Three functions, stillRoot(), forkCase() and workAround(), are defined.
# The first two seem wrong. In the last one, access fails, as desired.
# ***Uncomment*** one of the three lines at the bottom before execution.
# If your python is < 3.2, comment out the entire def of forkCase().

import os

def stillRoot():
    """Open succeeds, but it should fail."""
    os.seteuid(1000)
    open('/etc/sudoers').close()

def forkCase():
    """Child can still open it. Wow."""
    # setresuid needs python 3.2
    os.setresuid(1000, 1000, 0)
    pid = os.fork()
    if pid == 0:
        # They're surely 1000, not 0!
        print('uid: ', os.getuid(), 'euid: ', os.geteuid())
        open('/etc/sudoers').close()
        print('open succeeded in child.')
        exit()
    else:
        print('child pid: ', pid)
        open('/etc/group-').close()
        print('parent succeeded to open.')

def workAround():
    """So, a dummy fork after seteuid is necessary?"""
    os.seteuid(1000)
    pid = os.fork()
    if pid == 0:
        exit(0)
    else:
        os.wait()
    open('/etc/group-').close()

## Run one of them.
# stillRoot()
# forkCase()
# workAround()
Manipulating process credentials on Unix systems is tricky. I highly recommend gaining a thorough understanding of how the real, effective, and saved-set user ids are interrelated. It's very easy to screw up "dropping privileges".
As for your specific observations, I wonder if there's a simple cause you may have overlooked. Your code performs inconsistent tests, and you've neglected to specify the exact file permissions on your /etc/sudoers and /etc/group- files. Your code would be expected to behave exactly as you describe if /etc/sudoers has permissions mode=440, uid=root, gid=root (the default permissions on my system) and /etc/group- has mode=400.
You're not modifying the process's gid, so if /etc/sudoers is group-readable, that would explain why it's always readable. fork() does not modify process credentials. However, it could appear to do so in your example code since you're checking different files in the parent and the child. If /etc/group- does not have group read permission where /etc/sudoers does, that would explain the apparent problem.
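A quick way to check that hypothesis (a throwaway sketch; adjust the paths as needed):

import os
import stat

for path in ('/etc/sudoers', '/etc/group-'):
    st = os.stat(path)
    print(path,
          oct(stat.S_IMODE(st.st_mode)),          # e.g. 0o440 vs 0o400
          'group-readable:', bool(st.st_mode & stat.S_IRGRP),
          'uid:', st.st_uid, 'gid:', st.st_gid)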
If all you're trying to do is "drop privileges", use the following code:
os.setgid(NEW_GID)  # drop the group first...
os.setuid(NEW_UID)  # ...then the user; the order matters
Generally speaking, you'll only want to manipulate the effective user id if your process needs to toggle its root permissions on and off over its lifetime. If you just need to perform some setup operations with root permissions but will no longer require them afterwards, use the code above to irrevocably drop them.
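For the toggling case, a minimal sketch (assuming the process starts as root and 1000 is the unprivileged uid):

import os

# Temporarily drop the effective uid; the saved-set uid stays 0,
# so root can be regained later.
os.seteuid(1000)
# ... do unprivileged work ...
os.seteuid(0)  # regain root via the saved-set uid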
Oh, and a useful debugging aid for process credential manipulation on Linux is the output of /proc/self/status: the Uid and Gid lines show the real, effective, saved-set, and filesystem ids held by the current process, in that order. The Python APIs can be used to retrieve the same information, but you can treat the contents of this file as ground truth and avoid any potential complications from Python's cross-platform APIs.
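For example, a throwaway helper to print those lines (Linux only):

def print_creds():
    # Print the real, effective, saved-set, and filesystem ids
    # straight from the kernel, bypassing the Python wrappers.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith(('Uid:', 'Gid:')):
                print(line, end='')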