September 2nd, 2024
Fixed
We reduced the latency of all authenticated API endpoints by making fewer calls to Auth0.
We fixed a bug in our SLO performance product that caused issues when there were missing data points.
We improved the stability of the internal data structure containing SLI query information.
We fixed an issue where, in some cases, the compute engine skipped documents in interval-based calculations, resulting in incorrect SLO computation.
One failing SLO can send the dataflow into a crash loop, blocking all non-failing SLOs from being computed. When that happens, the SLO computation timestamp gets stuck at some point in time and makes no progress until we fix the issue. We shouldn't just raise exceptions in those flows: we should catch the exception, log it to Sentry and possibly Slack, and then continue the flow so that the other, non-failing SLOs keep computing. Those issues should be P0.
Sometimes we fail to load indexed metrics. When that happens, we cannot proceed on the FE, because the FE no longer allows entering a metric key manually.
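
For the crash-loop item above (one failing SLO blocking all the others), the catch, log, and continue behaviour could look roughly like the sketch below. This is a minimal illustration, not the actual dataflow code: `compute_all_slos`, `compute_one`, and `report_error` are hypothetical names, and `report_error` stands in for whatever Sentry or Slack notifier the flow would really use.

```python
import logging
from typing import Callable, Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger("slo_dataflow")  # hypothetical logger name


def compute_all_slos(
    slo_ids: Iterable[str],
    compute_one: Callable[[str], float],
    report_error: Optional[Callable[[str, Exception], None]] = None,
) -> Tuple[Dict[str, float], List[str]]:
    """Compute every SLO, isolating failures so one bad SLO cannot block the rest.

    `compute_one` and `report_error` are assumed hooks: the first computes a
    single SLO, the second forwards the error to an external notifier.
    """
    results: Dict[str, float] = {}
    failed: List[str] = []
    for slo_id in slo_ids:
        try:
            results[slo_id] = compute_one(slo_id)
        except Exception as exc:
            # Log with traceback instead of re-raising, so the loop keeps going.
            logger.exception("SLO %s failed to compute", slo_id)
            if report_error is not None:
                try:
                    report_error(slo_id, exc)
                except Exception:
                    # A broken notifier must not take down the flow either.
                    logger.exception("Error reporter failed for SLO %s", slo_id)
            failed.append(slo_id)
    return results, failed
```

Returning the list of failed SLO IDs alongside the results keeps the failures visible, so a stuck computation timestamp for those SLOs can still be tracked and escalated while the healthy ones continue to progress.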