authGSSServerInit extremely slow - python

I am implementing a single sign-on mechanism for a Flask server running on Ubuntu 16.04 that authenticates users against an Active Directory server in the Windows domain.
When I run the example app from https://github.com/mkomitee/flask-kerberos/tree/master/example on the Flask server, I can access the Flask server from a client computer that's logged in, the server correctly negotiates access and returns the name of the logged in user. However, this is very slow, taking about two minutes.
Following the steps of what happens in flask-kerberos, I found that the process stalls at the authGSSServerInit step. I can reproduce the behaviour using the following minimal program:
import kerberos
rc, state = kerberos.authGSSServerInit("HTTP@flaskserver.mydomain.local")
The initialisation finishes successfully, but it again takes about two minutes.
I have successfully registered the service principal (HTTP/flaskserver.mydomain.local) on the AD server and exported the keytab to the Flask server. I can get a ticket granting ticket on the Flask server using kinit -k HTTP/flaskserver.mydomain.local. I can also verify passwords in Python using the kerberos library:
import kerberos
kerberos.checkPassword('username', 'password', 'HTTP/flaskserver.mydomain.local', 'MYDOMAIN.LOCAL')
This runs correctly and almost instantly.
What could be the cause for the delay in running kerberos.authGSSServerInit? How do I debug this?

The delay was caused by a failing reverse DNS lookup for the hostname. host flaskserver correctly returned the IP, but host <ip-of-flaskserver> returned Host <ip-of-flaskserver>.in-addr.arpa not found: 2(SERVFAIL).
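The failing lookup can also be reproduced from Python with the standard library; a minimal sketch, using the hostname from the question:

import socket
import time

# Reproduce the reverse (PTR) lookup that Kerberos performs; a broken PTR
# record shows up here as an exception or a long delay.
ip = socket.gethostbyname("flaskserver.mydomain.local")
start = time.time()
try:
    print(socket.gethostbyaddr(ip))
except socket.herror as exc:
    print("reverse lookup failed:", exc)
print("took %.1f s" % (time.time() - start))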
As described at https://web.mit.edu/kerberos/krb5-1.13/doc/admin/princ_dns.html, disabling the reverse DNS lookup in the krb5.conf solved the problem:
[libdefaults]
rdns = false
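For quick verification, a sketch that times the init call (service name as in the question); with rdns = false it should return almost instantly:

import time
import kerberos

start = time.time()
rc, state = kerberos.authGSSServerInit("HTTP@flaskserver.mydomain.local")
print(rc, "took %.1f s" % (time.time() - start))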

Related

Streamlit server configuration on remote HTTPS using Azure Compute Instance

I am trying to host a Streamlit app on an Azure Compute Instance resource.
It appears that accessing the instance is possible through https://{instanceName}-{internalPort}.northeurope.instances.azureml.ms (with an Azure-provided security layer in between).
To smoke-test this I created a simple Flask app and verified that I could access it at https://[REDACTED]-5000.northeurope.instances.azureml.ms/.
Attempt 1: Basic Configuration
Now I want to serve my Streamlit app. Initially I wanted to eliminate error sources and simply check that the wires are connected correctly, so my app is just:
import streamlit as st
st.title("Hello, World!")
Running this Streamlit app (streamlit run sl_app.py) gives:
2022-03-28 11:49:38.932 Trying to detect encoding from a tiny portion of (13) byte(s).
2022-03-28 11:49:38.933 ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
2022-03-28 11:49:38.933 ascii is most likely the one. Stopping the process.
You can now view your Streamlit app in your browser.
Network URL: http://[REDACTED]:8501
External URL: http://[REDACTED]:8501
Trying to access this through https://[REDACTED]-8501.northeurope.instances.azureml.ms/ I can reach the app, but the "Please wait..." indicator spins indefinitely.
Attempt 2: Updated Streamlit Config
Inspired by "App is not loading when running remotely", Symptom #2, I created a Streamlit config.toml reconfiguring the server/browser access points, and ended up with the following:
[browser]
serverAddress = "[REDACTED]-8501.northeurope.instances.azureml.ms"
serverPort = 80
gatherUsageStats = false
[server]
port = 8501
headless = true
enableCORS = false
enableXsrfProtection = false
enableWebsocketCompression = false
Running the app now gives:
You can now view your Streamlit app in your browser.
URL: http://[REDACTED]-8501.northeurope.instances.azureml.ms:80
However, I still get the infinite "Please wait" indicator. Diving a little deeper reveals something related to a wss stream, whatever that is.
I suspect that what I'm seeing is due to the fact that Azure automatically pipes my request from http:// to https://, and this for some reason rejects the stream component that Streamlit uses?
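To test the WebSocket leg directly, here is a minimal probe, assuming Streamlit serves its WebSocket at /stream (the hostname is a placeholder; requires the websockets package):

import asyncio
import websockets  # pip install websockets

async def probe(url):
    # A failed handshake here would explain the eternal "Please wait..." screen.
    try:
        async with websockets.connect(url):
            print("handshake OK:", url)
    except Exception as exc:
        print("handshake failed:", exc)

asyncio.run(probe("wss://<instance>-8501.northeurope.instances.azureml.ms/stream"))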
Note: Various IP addresses and hostnames are REDACTED for the sake of security :-)
The major issue here is accessing the SSL certificate. Here is a guide to follow to deploy Streamlit on Azure:
https://towardsdatascience.com/beginner-guide-to-streamlit-deployment-on-azure-f6618eee1ba9
https://towardsdatascience.com/beginner-guide-to-streamlit-deployment-on-azure-part-2-cf14bb201b8e
The first link covers deploying without errors; the second covers activating an SSL certificate to serve the URL over HTTPS rather than HTTP.

Why does this gRPC call from the Google Secret Manager API hang when run by Apache?

In short:
I have a Django application being served up by Apache on a Google Compute Engine VM.
I want to access a secret from Google Secret Manager in my Python code (when the Django app is initialising).
When I do 'python manage.py runserver', the secret is successfully retrieved. However, when I get Apache to run my application, it hangs when it sends a request to the secret manager.
Too much detail:
I followed the answer to this question GCP VM Instance is not able to access secrets from Secret Manager despite of appropriate Roles. I have created a service account (not the default), and have given it the 'cloud-platform' scope. I also gave it the 'Secret Manager Admin' role in the web console.
After initially running into trouble, I downloaded a JSON key for the service account from the web console and set the GOOGLE_APPLICATION_CREDENTIALS env-var to point to it.
When I run the django server directly on the VM, everything works fine. When I let Apache run the application, I can see from the logs that the service account credential json is loaded successfully.
However, when I make my first API call, via google.cloud.secretmanager.SecretManagerServiceClient.list_secret_versions, the application hangs. I don't even get a 500 error in my browser, just an eternal loading icon. I traced the execution as far as:
grpc._channel._UnaryUnaryMultiCallable._blocking, line 926 : 'call = self._channel.segregated_call(...'
It never gets past that line. I couldn't figure out where that call goes, so I couldn't inspect it any further than that.
Thoughts
I don't understand GCP service accounts / API access very well. I can't understand why this difference occurs between the Django dev server and Apache, given that they're both using the same service account credentials from JSON. I'm also surprised that the application just hangs in the Google library rather than throwing an exception. There's even a timeout option when sending a request, but changing it doesn't make any difference.
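For reference, a sketch of the kind of call involved, with a placeholder parent path and an explicit timeout:

from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
# Explicit timeout in seconds; in the hung case it had no visible effect.
versions = client.list_secret_versions(
    request={"parent": "projects/my-project/secrets/my-secret"},
    timeout=5.0,
)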
I wonder if it's somehow related to the fact that I'm running the Django server under my own account, while Apache runs under whatever user account it uses?
Update
I tried changing the user/group that apache runs as to match my own. No change.
I enabled logging for gRPC itself. There is a clear difference between when I run with apache vs the django dev server.
On Django:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x17cfda0, target=secretmanager.googleapis.com:443, args=0x7fe254620f20, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x2299b88: creating client_channel for channel stack 0x2299b18
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
client_channel.cc:1879] chand=0x2299b88 calld=0x229e440: created call
...
call.cc:1980] grpc_call_start_batch(call=0x229daa0, ops=0x20cfe70, nops=6, tag=0x7fe25463c680, reserved=(nil))
call.cc:1573] ops[0]: SEND_INITIAL_METADATA...
call.cc:1573] ops[1]: SEND_MESSAGE ptr=0x21f7a20
...
So, a channel is created, then a call is created, and then we see gRPC start to execute the operations for that call (as far as I read it).
On Apache:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x7fd5bc850f70, target=secretmanager.googleapis.com:443, args=0x7fd583065c50, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x7fd5bca91bb8: creating client_channel for channel stack 0x7fd5bca91b48
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
So, a channel is created... and then nothing. No call, no operations. So the Python code is sitting there waiting for gRPC to make this call, which it never does.
The problem appears to be that the forking behaviour of Apache breaks gRPC somehow. I couldn't nail down the precise cause, but after I began to suspect that forking was the issue, I found this old gRPC issue that indicates that forking is a bit of a tricky area.
I tried to reconfigure Apache to use a different 'Multi-processing Module', but as my experience in this is limited, I couldn't get gRPC to work under any of them.
In the end, I switched to using nginx/uwsgi instead of Apache/mod_wsgi, and I did not have the same issue. If you're trying to solve a problem like this and you have to use Apache, I'd advise investigating Apache's forking behaviour, how gRPC handles forking, and the different MPMs available for Apache.
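If you do have to stay on Apache, one commonly suggested workaround, sketched here under the assumption that the hang comes from a gRPC channel created before Apache forks, is to create the client lazily inside the request path; get_secret_client is a hypothetical helper. gRPC also has experimental fork support behind the GRPC_ENABLE_FORK_SUPPORT environment variable, which may be worth trying.

from google.cloud import secretmanager

_client = None

def get_secret_client():
    # Create the client on first use, i.e. inside the forked worker process,
    # rather than at module import time in the parent.
    global _client
    if _client is None:
        _client = secretmanager.SecretManagerServiceClient()
    return _client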
I'm facing a similar issue when running my Flask application with eventlet==0.33.0 and gunicorn (https://github.com/benoitc/gunicorn/archive/ff58e0c6da83d5520916bc4cc109a529258d76e1.zip#egg=gunicorn==20.1.0). When calling secret_client.access_secret_version, it hangs forever.
It used to work fine with an older eventlet version, but we needed to upgrade to the latest version of eventlet for security reasons.
I experienced a similar issue and I was able to solve with the following:
# gevent's monkey-patching must run before other modules (sockets, ssl)
# are imported, so it comes first.
from gevent import monkey
monkey.patch_all()

# Make gRPC cooperate with gevent's event loop.
import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()

from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

Blacklist connections in WSGIserver

I’m running a simple WSGIServer with the following code:
import bottle
from gevent.pywsgi import WSGIServer  # assuming gevent's WSGIServer is in use

botapp = bottle.app()
server = WSGIServer((myAddress, int(myPort)), botapp)
It’s logging more and more failed login attempts.
I searched for "blacklist" in the uWSGI documentation (https://uwsgi-docs.readthedocs.io/en/latest/Options.html).
How can I implement a dynamically updated blacklist, so that when a new recurring failed attempt is detected, the IP is added to the blacklist and applied to new connections?
I'd much appreciate your input.
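One way to do this without uWSGI is a small piece of WSGI middleware that consults an in-process set; a minimal sketch, assuming gevent's WSGIServer and using the hypothetical names BLACKLIST and BlacklistMiddleware:

import bottle
from gevent.pywsgi import WSGIServer

BLACKLIST = set()  # add an IP here when repeated failed attempts are detected

class BlacklistMiddleware:
    # Rejects requests from blacklisted IPs before they reach the app.
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get("REMOTE_ADDR") in BLACKLIST:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)

myAddress, myPort = "0.0.0.0", 8080
botapp = bottle.app()
server = WSGIServer((myAddress, int(myPort)), BlacklistMiddleware(botapp))
server.serve_forever()

Because the set is consulted on every request, adding an IP takes effect for all subsequent connections without a restart.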

Azure Functions IP addresses out of range

I have an Azure Function which makes calculations and stores and reads data from my own Cosmos DB and one external database via a REST API.
From the Azure Portal, I can see the "outboundIpAddresses" and "possibleOutboundIpAddresses" (subscriptions > {your subscription} > providers > Microsoft.Web > sites), 12 IP addresses in total. When I run the function locally (VS Code), everything goes smoothly. However, when I deploy the function, I get the following error:
Result: Failure Exception: CosmosHttpResponseError: (Forbidden) Request originated from client IP <IP-address> through public internet. This is blocked by your Cosmos DB account firewall settings
This is self-explanatory in itself, but the problem is that the IP address mentioned in the error message belongs to neither "outboundIpAddresses" nor "possibleOutboundIpAddresses". And almost every time the function is triggered, the client IP in the error message changes.
Do you have any ideas why this happens and how to solve the issue?
Is your function app on the Consumption plan? If so, note that when a function app running on the Consumption plan is scaled, a new range of outbound IP addresses may be assigned, so you may need to whitelist the entire data center.
On a further note, if you move to an App Service plan, you have the option of assigning a dedicated IP address.
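To check whether the erroring IP at least falls inside the region's published ranges, here is a sketch using the downloadable "Azure IP Ranges and Service Tags" JSON; the file name and tag name below are assumptions:

import ipaddress
import json

client_ip = ipaddress.ip_address("20.1.2.3")  # the IP from the error message

with open("ServiceTags_Public.json") as fh:  # the weekly Microsoft download
    tags = json.load(fh)

for value in tags["values"]:
    if value["name"] == "AzureCloud.northeurope":  # region of the function app
        for prefix in value["properties"]["addressPrefixes"]:
            if client_ip in ipaddress.ip_network(prefix):
                print("client IP is inside", prefix)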

Configure consul for dynamic health check services

I have a Consul stack with 2 hosts (for testing). One host runs only Consul in bootstrap mode, and the other runs in client mode with Registrator for automatically registering services (both run on Docker). Now, if I start an application container (port 8080, for example), Registrator detects it and registers it with Consul, but it does not have the http-check I want. I found that Registrator has an option to auto-register a health check: adding SERVICE_8080_CHECK_HTTP: '/' to the application container, which works pretty well. At this point I have a problem: if I docker stop the application container, there is no health check left for this app, so I can't get its status to write alerts or replace the failed app. So the question is: how can I get dynamic health-check services but still get a passing, warning, or critical status?
Thanks
Registrator de-registers the service when you stop the container, so if you have multiple instances of that service it shouldn't be a problem.
If this is your use case after all, don't use Registrator for service registration; you can use Consul's HTTP API to register the service, or include a service definition file for the agent.
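A minimal sketch of the HTTP API route, assuming a local agent on the default port 8500 and placeholder service details; unlike a Registrator-managed service, one registered this way stays in the catalog with a critical check when the container dies, so you can alert on it:

import requests

payload = {
    "Name": "my-app",
    "Port": 8080,
    "Check": {
        "HTTP": "http://localhost:8080/",  # the http-check you want
        "Interval": "10s",
    },
}
# Register with the local Consul agent; the check goes critical rather than
# disappearing when the application stops responding.
resp = requests.put("http://localhost:8500/v1/agent/service/register", json=payload)
resp.raise_for_status()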
In any case, you really shouldn't run a single Consul server - https://www.consul.io/intro/index.html
