I have a Django application in which there is a Booking model. The idea is that when someone books,
the model calculates some metrics,
generates a Plotly figure and exports it statically,
and sends a message to the respective user via email.
Previously the email was plain text. In order to test the new changes, I've done something of this sort:
def send_notification(self):
    try:
        mail_msg = new_email()
        mail_msg.send()
    except Exception as e:
        logger.error(f"failed to send new booking mail due to {e}")
        mail_msg = old_mail()
        mail_msg.send()
Working and debugging locally, I had no issues using kaleido as the export engine when attaching the image to new_email, namely:
fig.to_image("png", engine="kaleido")
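For context, attaching the exported bytes looks roughly like this (a minimal stdlib sketch; the real app would use django.core.mail, and the subject and filename here are made up):

```python
from email.message import EmailMessage

# Placeholder for the bytes from fig.to_image(format="png", engine="kaleido").
png_bytes = b"\x89PNG\r\n\x1a\n"

msg = EmailMessage()
msg["Subject"] = "New booking"  # hypothetical subject
msg.set_content("Your booking metrics are attached.")
# Attach the PNG so mail clients render it as an image attachment.
msg.add_attachment(png_bytes, maintype="image", subtype="png",
                   filename="metrics.png")
```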
The pipeline is as follows: FE books -> SQS -> send_notification().
On AWS I did not get any errors when the new code was deployed, but I also did not get any mails: not the new ones, not the old ones. After 1 hour, the 10 attempts that I had made finally went through, both in email and in the logs. But I only got the old (fallback) emails, as I expected.
Logs show
Error initializing NSS with a persistent database (sql:/home/webapp/.pki/nssdb): /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/kaleido/executable/lib/libsqlite3.so.0)
[ERROR] failed to send new booking mail due to 'Transform failed. Error stream'
I understand that upgrading glibc standalone can break your system quite easily, so I am going to try downgrading kaleido.
Is there anything else I could try to solve this? All the options I can think of are:
downgrading kaleido (it would be great to check first which glibc version each release requires)
creating a Lambda just to run fig.to_image()
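As a first diagnostic (my suggestion, not from the original post), you can print which glibc the host actually has and compare it against the GLIBC_2.29 that kaleido's bundled libsqlite3 demands:

```python
import platform

# Report the C library this Python process is linked against; on Amazon
# Linux 2 this is typically an older glibc than the 2.29 kaleido requires.
# On non-Linux platforms libc_ver() returns empty strings.
lib, ver = platform.libc_ver()
print(lib, ver)
```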
Related
Background: I'm trying to write a Python script that creates a task in ToDoist using their REST API Python SDK, based on the charge percentage of my dog's Fi Collar (obtained via Pytryfi). Basically, if the Fi collar's battery falls below a certain threshold, create a new task.
Problem: A 410 error is returned when trying to use the todoist_api_python SDK, copied exactly from Todoist's website.
Following these instructions, I was able to install the SDK:

pip install todoist-api-python

But when I run this code (using a real API key):

from todoist_api_python.api import TodoistAPI

api = TodoistAPI("XXXXXXX")
try:
    projects = api.get_projects()
    print(projects)
except Exception as error:
    print(error)
I receive this error:
410 Client Error: Gone for url: https://api.todoist.com/rest/v1/projects
I do know there has been a recent change from v1 -> v2 of this API, and indeed when I put the URL from the error message into a browser with .../v2/projects, I see a list of my projects.
I don't know how to make the todoist-api-python SDK point to the new URL. I'd really appreciate any help you can offer! Thanks!
Be sure you're using version 2 of todoist-api-python, as this is the one that uses the latest version of the API: https://pypi.org/project/todoist-api-python/
You're probably using an old version that still relies on REST API v1.
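If upgrading the SDK is not an option, you can call REST v2 directly. A stdlib urllib sketch (the endpoint is the one from the error message with v1 swapped for v2; the token handling is an assumption):

```python
import json
import urllib.request

API_BASE = "https://api.todoist.com/rest/v2"  # v1 endpoints now return 410 Gone

def build_projects_request(token):
    # Todoist's REST API expects a Bearer token in the Authorization header.
    return urllib.request.Request(
        f"{API_BASE}/projects",
        headers={"Authorization": f"Bearer {token}"},
    )

# Usage (makes a network call, so commented out here):
# resp = urllib.request.urlopen(build_projects_request("XXXXXXX"))
# projects = json.load(resp)
```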
We have a Python Linux Azure Function connected to a custom OIDC provider and Azure AD to provide authentication for the HTTP-triggered functions using Microsoft's EasyAuth.
After the initial setup, the Azure Function worked and has kept working for the last few months.
In the last 2 days, our application suddenly started to error out on our custom provider (the Azure AD authentication is still working). After checking the EasyAuth logs, we see the error:
System.PlatformNotSupportedException: Windows Cryptography Next Generation (CNG) is not supported on this platform.
No changes were made on either the custom oidc provider or the azure function in the last 2 days.
We suspect that maybe the base easyauth docker image (mcr.microsoft.com/appsvc/middleware:stage2) got updated and that broke the authentication.
Any ideas or suggestions on possible fixes or even related problems?
Could it be due to this: https://github.com/Azure/app-service-announcements/issues/404
Use RSACNG when validating tokens to add PS256 support
EDIT: Also experiencing this issue as of this morning. I'm currently trying to manually downgrade the version using the command az webapp auth update --name xxx --resource-group xxx --runtime-version "1.5.1", but my Azure credentials don't have enough power to run it, so I can't validate whether it works.
EDIT2: Doesn't work if you are using auth v2.
EDIT3: It actually does work if you are using auth v2. You just have to check the help options of the command to realize that for auth v2 you have to install a CLI extension with az extension add --name authV2. After that you can run the commands. I downgraded the version to 1.5.1, but nothing changed. I'm not sure whether that is because we deploy to a slot first, which probably still had the new version. I have also created an Azure support ticket about this.
EDIT4: Got into a support call with Azure yesterday. They fixed the issue during the night; a restart of the application is required. I'm still baffled by the fact that the documentation says you can pin the version of Easy Auth (the Authentication/Authorization middleware), yet when I troubleshoot my App Service and select Easy Auth, it shows the pinned version as 1.5.1 and the running version as 1.6.2. So it just totally ignores the whole configuration. Fun, right?
We have started to see this as well on some of our instances; the worrying thing is that we have multiple running instances and it works on some and not on others.
We "solved" the issue on one production instance by redeploying the function app. It is set up through Terraform, and a destroy of the function app followed by a create made it work again.
Exact same issue here.
Two App Services (one for prod and one for dev, located in the France Central region), using an Azure AD app in another Azure B2C tenant for authentication (https://learn.microsoft.com/en-us/azure/app-service/configure-authentication-provider-aad#-option-2-use-an-existing-registration-created-separately), had been working for about a year.
Then, after the deployment of a new container version of our app to the "dev" App Service, authentication broke in DEV only, and we started receiving ERROR 500 messages when being redirected to the /.auth/login/aad/callback endpoint after authentication completes in Azure B2C.
Inspecting the App Service log, we have these entries:
2022-11-08T08:47:28.449645417Z fail: Microsoft.AspNetCore.Server.Kestrel[13]
2022-11-08T08:47:28.449692217Z Connection id "0HMM1CIPP8I5M", Request id "0HMM1CIPP8I5M:00000004": An unhandled exception was thrown by the application.
2022-11-08T08:47:28.450647224Z System.PlatformNotSupportedException: Windows Cryptography Next Generation (CNG) is not supported on this platform.
2022-11-08T08:47:28.451187128Z at System.Security.Cryptography.RSACng..ctor()
2022-11-08T08:47:28.451205328Z at Microsoft.Azure.AppService.Middleware.JsonWebKey.GetSecurityKeys() in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/JsonWebKey.cs:line 100
2022-11-08T08:47:28.451422129Z at Microsoft.Azure.AppService.Middleware.OpenIdConnectConfiguration.GetJwtValidationParameters(String siteName, String clientId, String authenticationType, String allowedAudiences) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/OpenIdConnectConfiguration.cs:line 114
2022-11-08T08:47:28.457668471Z at Microsoft.Azure.AppService.Middleware.AzureActiveDirectoryProvider.GetOpenIdConnectValidationParameters(ConfigManager oidcConfigManager, Boolean forceRefresh) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/IdentityProviders/AzureActiveDirectoryProvider.cs:line 1131
2022-11-08T08:47:28.457685071Z at Microsoft.Azure.AppService.Middleware.AzureActiveDirectoryProvider.HandleServerDirectedLoginAsync(HttpContextBase context) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/IdentityProviders/AzureActiveDirectoryProvider.cs:line 518
2022-11-08T08:47:28.457689872Z at Microsoft.Azure.AppService.Middleware.IdentityProviderBase.OnCompleteServerDirectedLoginAsync(HttpContextBase context) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/IdentityProviders/IdentityProviderBase.cs:line 655
2022-11-08T08:47:28.457693772Z at Microsoft.Azure.AppService.Middleware.IdentityProviderBase.TryHandleProtocolRequestAsync(HttpContextBase context) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/IdentityProviders/IdentityProviderBase.cs:line 185
2022-11-08T08:47:28.457697572Z at Microsoft.Azure.AppService.Middleware.EasyAuthModule.OnBeginRequestAsync(HttpContextBase context) in /EasyAuth/Microsoft.Azure.AppService.Middleware.Modules/EasyAuthModule.cs:line 220
2022-11-08T08:47:28.457818072Z at Microsoft.Azure.AppService.Middleware.NetCore.AppServiceMiddleware.InvokeAsync(HttpContext context) in /EasyAuth/Microsoft.Azure.AppService.Middleware.NetCore/AppServiceMiddleware.cs:line 102
2022-11-08T08:47:28.457928173Z at Microsoft.Azure.AppService.MiddlewareShim.AutoHealing.AutoHealingMiddleware.Invoke(HttpContext context) in /EasyAuth/Middleware.Host/AutoHealing/AutoHealingMiddleware.cs:line 55
2022-11-08T08:47:28.457939473Z at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
Creating a new app in another App Service plan did not improve the situation, so we have opened a support ticket/case with Microsoft. This issue has nothing to do with our application; it is 100% related to a change that must have happened on Microsoft's side.
Let's keep in touch on this thread to share knowledge about this issue.
The issue was solved after restarting the Azure App Services.
In short:
I have a Django application being served up by Apache on a Google Compute Engine VM.
I want to access a secret from Google Secret Manager in my Python code (when the Django app is initialising).
When I do 'python manage.py runserver', the secret is successfully retrieved. However, when I get Apache to run my application, it hangs when it sends a request to the secret manager.
Too much detail:
I followed the answer to this question: GCP VM Instance is not able to access secrets from Secret Manager despite of appropriate Roles. I created a service account (not the default) and gave it the 'cloud-platform' scope. I also gave it the 'Secret Manager Admin' role in the web console.
After initially running into trouble, I downloaded a JSON key for the service account from the web console and set the GOOGLE_APPLICATION_CREDENTIALS env var to point to it.
When I run the django server directly on the VM, everything works fine. When I let Apache run the application, I can see from the logs that the service account credential json is loaded successfully.
However, when I make my first API call, via google.cloud.secretmanager.SecretManagerServiceClient.list_secret_versions, the application hangs. I don't even get a 500 error in my browser, just an eternal loading icon. I traced the execution as far as:
grpc._channel._UnaryUnaryMultiCallable._blocking, line 926: 'call = self._channel.segregated_call(...'
It never gets past that line. I couldn't figure out where that call goes, so I couldn't inspect it any further than that.
Thoughts
I don't understand GCP service accounts / API access very well. I can't understand why this difference is occurring between the django dev server and apache, given that they're both using the same service account credentials from json. I'm also surprised that the application just hangs in the google library rather than throwing an exception. There's even a timeout option when sending a request, but changing this doesn't make any difference.
I wonder if it's somehow related to the fact that I'm running the django server under my own account, but apache is using whatever user account it uses?
Update
I tried changing the user/group that apache runs as to match my own. No change.
I enabled logging for gRPC itself. There is a clear difference between when I run with apache vs the django dev server.
On Django:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x17cfda0, target=secretmanager.googleapis.com:443, args=0x7fe254620f20, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x2299b88: creating client_channel for channel stack 0x2299b18
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
client_channel.cc:1879] chand=0x2299b88 calld=0x229e440: created call
...
call.cc:1980] grpc_call_start_batch(call=0x229daa0, ops=0x20cfe70, nops=6, tag=0x7fe25463c680, reserved=(nil))
call.cc:1573] ops[0]: SEND_INITIAL_METADATA...
call.cc:1573] ops[1]: SEND_MESSAGE ptr=0x21f7a20
...
So, a channel is created, then a call is created, and then we see gRPC start to execute the operations for that call (as far as I read it).
On Apache:
secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x7fd5bc850f70, target=secretmanager.googleapis.com:443, args=0x7fd583065c50, reserved=(nil))
init.cc:167] grpc_init(void)
client_channel.cc:1099] chand=0x7fd5bca91bb8: creating client_channel for channel stack 0x7fd5bca91b48
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
timer_manager.cc:188] sleep for a 1001 milliseconds
...
So, a channel is created... and then nothing. No call, no operations. The Python code is sitting there waiting for gRPC to make this call, which it never does.
The problem appears to be that the forking behaviour of Apache breaks gRPC somehow. I couldn't nail down the precise cause, but after I began to suspect that forking was the issue, I found this old gRPC issue that indicates that forking is a bit of a tricky area.
I tried to reconfigure Apache to use a different 'Multi-processing Module', but as my experience in this is limited, I couldn't get gRPC to work under any of them.
In the end, I switched to using nginx/uwsgi instead of Apache/mod_wsgi, and I did not have the same issue. If you're trying to solve a problem like this and you have to use Apache, I'd advise investigating Apache's forking further, how gRPC handles forking, and the different MPMs available for Apache.
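If anyone does need to stay on Apache, one thing worth trying (my suggestion, untested in this exact setup) is enabling gRPC's fork support via environment variables before anything imports grpc:

```python
import os

# Must run before grpc is imported anywhere (e.g. at the very top of wsgi.py).
# GRPC_ENABLE_FORK_SUPPORT and GRPC_POLL_STRATEGY are real gRPC runtime
# switches, but whether they cure the mod_wsgi hang is not verified here.
os.environ["GRPC_ENABLE_FORK_SUPPORT"] = "1"
os.environ["GRPC_POLL_STRATEGY"] = "poll"

# ...only then import grpc-backed clients, e.g.:
# from google.cloud import secretmanager
```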
I'm facing a similar issue when running my Flask application with eventlet==0.33.0 and gunicorn https://github.com/benoitc/gunicorn/archive/ff58e0c6da83d5520916bc4cc109a529258d76e1.zip#egg=gunicorn==20.1.0. Calling secret_client.access_secret_version hangs forever.
It used to work fine with an older eventlet version, but we needed to upgrade to the latest version of eventlet for security reasons.
I experienced a similar issue and was able to solve it with the following:

# Patch the standard library before anything imports grpc or google.cloud.
from gevent import monkey
monkey.patch_all()

import grpc.experimental.gevent as grpc_gevent
from google.cloud import secretmanager

# Make gRPC's threads cooperate with gevent's event loop.
grpc_gevent.init_gevent()

client = secretmanager.SecretManagerServiceClient()
I am trying to get the DNS records of a particular domain, and I found the dnspython package, with which it can be done easily. It works fine when I run it from my computer. However, when I call it from Django views it shows the previous (old) records, which means it's not updating.
Is it some kind of caching at the OS level? Note that I am also using Docker. Restarting Docker and clearing the cache in Django didn't help; it still shows old records.
Here's the sample code for checking records:
import dns.resolver
result = dns.resolver.resolve("domain.com", "TXT")[0].to_text()
The code snippet above works and shows any update to the TXT record when I run it from my computer. However, in Django it's stuck on the old records and not updating.
In Django views:
def verify_dns_view(request, domain_id):
    domain = get_object_or_404(models.Domain, id=domain_id)
    mx_record = dns.resolver.resolve(domain.name, "MX")[0].to_text()
    txt_record_spf = dns.resolver.resolve(domain.name, "TXT")[0].to_text()
    ...
Your app and your PC may well be querying different DNS servers. In your case the app server is "further away" from the server where the domain is actually registered, so its resolver has not picked up the updated record yet.
I have a mixed (C#, Python) system communicating asynchronously through Azure Service Bus queues. Everything was working fine, but now I'm getting strange error messages in my Python consumer (which is basically a copy-and-paste from: https://azure.microsoft.com/en-gb/documentation/articles/service-bus-python-how-to-use-queues/). In particular, the line

msg = bus_service.receive_queue_message('myqueue', peek_lock=False)

always results in a "could not convert string to float: max-age=31536000" error. The queue is accessed, though (in fact, I can see in Azure that the message actually comes off the queue), and I have already tried different types of payload (the original JSON-based one and now a simple string). Strangest of all, it was working fine before. Has anybody had a similar experience?
Just answering my own question in case somebody stumbles into the same problem: my requirements.txt file was not up to date with the latest Azure Python module (of course, I checked the wrong Python env, and so I was "sure" it wasn't that :-)). Once I updated the dependencies, things started working again.
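A quick way to check which module version an environment is actually running (a generic stdlib sketch; "azure-servicebus" is the distribution to check in this case):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist):
    # Return the installed version of a distribution, or None if it is
    # absent, so you can verify the env matches your requirements.txt.
    try:
        return version(dist)
    except PackageNotFoundError:
        return None

print(installed_version("azure-servicebus"))
```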