What Your Integration Code Is Missing: Delivery Guarantees for Product Events
Your SaaS sends events to external tools, but "sends events" and "guarantees delivery" are not the same thing. This post breaks down the five delivery gaps — retries, idempotency, dead letters, isolation, and visibility — and what it takes to close them.
Your SaaS sends events to external tools. A signup goes to HubSpot. A payment failure goes to Salesforce. A cancellation triggers a message in Intercom.
The code that sends those events probably looks reasonable. An HTTP call, maybe a try/catch, maybe a timeout. It works in development. It works in staging. It works in production — until a timeout duplicates a signup, creates two CRM contacts, and nobody notices until a customer complains that they're getting double onboarding emails.
The gap between "sends events" and "guarantees delivery" is wider than most teams realize. And it's a gap that only shows up when something goes wrong at the worst possible time.
What "delivery" actually means
Most integration code treats delivery as a binary: the HTTP call succeeded, or it threw an error. But delivery in a distributed system is a spectrum, and production traffic lives in the uncomfortable middle.
Delivery means the destination received the event, processed it, and acknowledged it. That's three things, not one. A 200 response from HubSpot's API means something different from a 200 returned by a webhook endpoint you don't control. And a timeout doesn't mean the event wasn't received — it means you don't know.
When teams build integration code, they tend to optimize for the happy path: the network is fast, the destination is healthy, the payload is valid. Delivery guarantees are about everything else.
The five gaps
Watch teams debug integration failures across dozens of SaaS products and the same five gaps show up repeatedly. Most codebases have at least three of them.
To make these concrete, let's follow one event through a system that has all five. Here's what that looks like in production.
A customer upgrades from a free trial to a paid plan. Your app emits a subscription.started event. It needs to reach HubSpot (update the contact lifecycle stage), Salesforce (create an opportunity), and Slack (notify the sales channel). The HubSpot API is having a slow day — responses are timing out at about a 30% rate. Here's what happens.
1. No retry strategy (or the wrong one)
The subscription.started call to HubSpot times out. Your code retries immediately — three times in quick succession. HubSpot is already slow, and now you've tripled the load. All three retries time out too.
This is worse than no retries at all. Immediate retries hit a destination that's already struggling. If the failure was caused by rate limiting or temporary overload, you're making the problem worse. If it was a network blip, you're competing with the original request that may still be in flight.
A real retry strategy needs exponential backoff (wait longer between each attempt), jitter (randomize the wait so retries from different events don't all hit at the same time), and a maximum attempt count. It also needs to distinguish between retryable failures (503, timeout, connection reset) and permanent ones (400, 401, 404). Retrying a malformed payload forever doesn't help anyone.
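Here's a minimal sketch of what that strategy looks like in practice. The `send` callable, the status-code sets, and the backoff parameters are illustrative assumptions, not a prescription — real code would tune them per destination:

```python
import random
import time

# Status codes worth retrying vs. ones where retrying never helps.
RETRYABLE = {429, 500, 502, 503, 504}
PERMANENT = {400, 401, 403, 404, 422}

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: a random wait in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def deliver_with_retries(send, max_attempts=5):
    """send() returns an HTTP status code, or raises on timeout / connection failure."""
    for attempt in range(max_attempts):
        try:
            status = send()
        except (TimeoutError, ConnectionError):
            status = None  # network failure: retryable, but we don't know if it landed
        if status is not None and status < 300:
            return True
        if status in PERMANENT:
            return False  # a malformed payload or bad credentials won't fix themselves
        if attempt < max_attempts - 1:
            time.sleep(backoff_delay(attempt))
    return False
```

The jitter matters more than it looks: without it, every event that failed during an outage retries on the same schedule and hits the recovering destination in synchronized waves.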
Most hand-rolled retry code gets at least one of these wrong. If you want the full picture on retry implementation, webhook retry logic done right covers the backoff math and failure classification in detail.
2. No idempotency protection
Here's the thing about that HubSpot timeout: the first call might have actually gone through. HubSpot received the request and updated the contact, but the response never made it back before your client gave up. Your retry succeeds too — and now HubSpot has processed the same event twice. The contact's lifecycle stage was updated once (fine), but your custom event log shows two subscription.started entries, and the workflow triggered by that event fires duplicate emails.
Idempotency protection means tagging every event with a unique key and ensuring that the destination only processes it once, even if it arrives multiple times. Some APIs support this natively (Stripe's Idempotency-Key header, for example). For destinations that don't, you need deduplication logic on your side — and that means tracking which events have been delivered, which adds state, storage, and cleanup concerns.
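A sketch of what receiver-side deduplication involves, using local SQLite purely for illustration — a production version needs a shared store, TTL cleanup, and careful handling of the claim-then-send window:

```python
import sqlite3

class Deduplicator:
    """Tracks delivered (event_id, destination) pairs so a redelivered event is a no-op."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS delivered (event_id TEXT, destination TEXT, "
            "PRIMARY KEY (event_id, destination))"
        )

    def deliver_once(self, event_id, destination, send):
        # Atomically claim the key; a second arrival of the same event hits the
        # primary-key constraint and is skipped.
        try:
            with self.db:
                self.db.execute(
                    "INSERT INTO delivered (event_id, destination) VALUES (?, ?)",
                    (event_id, destination),
                )
        except sqlite3.IntegrityError:
            return "duplicate"
        try:
            send()
        except Exception:
            # Release the claim so a later retry can attempt delivery again.
            with self.db:
                self.db.execute(
                    "DELETE FROM delivered WHERE event_id = ? AND destination = ?",
                    (event_id, destination),
                )
            raise
        return "delivered"
```

Note the per-destination key: the same event legitimately goes to HubSpot and Salesforce once each, so deduplicating on the event ID alone would be wrong.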
3. No dead letter handling
Let's say HubSpot's bad day gets worse. With retries exhausted, the subscription.started event to HubSpot is simply dropped. A log line gets written — maybe — and the event is gone. Three weeks later, a sales rep notices the customer's lifecycle stage never updated. There's no way to recover the original event, no way to replay it. The data is wrong now — and fixing it is manual.
Retries have a limit. After three, five, or ten attempts, some events will still fail. What happens to them?
In most codebases, the answer is: they disappear.
A dead letter queue captures events that have exhausted all retry attempts. It gives your team visibility into what failed, why it failed, and the ability to replay those events once the underlying issue is fixed. Without one, event loss is silent and permanent.
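The shape of a dead letter queue is simple; the value is in what it captures. A minimal in-memory sketch (a real one needs durable storage and an inspection UI):

```python
import time
from collections import deque

class DeadLetterQueue:
    """Captures events that exhausted retries, with enough context to inspect and replay."""

    def __init__(self):
        self.entries = deque()

    def capture(self, event, destination, reason, attempts):
        self.entries.append({
            "event": event,
            "destination": destination,
            "reason": reason,          # why the final attempt failed
            "attempts": attempts,      # how many times we tried
            "failed_at": time.time(),
        })

    def replay(self, send):
        """Retry every dead-lettered event; keep the ones that fail again. Returns the remainder."""
        still_dead = deque()
        while self.entries:
            entry = self.entries.popleft()
            try:
                send(entry["event"], entry["destination"])
            except Exception as exc:
                entry["reason"] = str(exc)
                still_dead.append(entry)
        self.entries = still_dead
        return len(still_dead)
```

Replay is the whole point: once HubSpot recovers, the three weeks of dropped lifecycle updates become a one-command fix instead of a manual data-repair project.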
4. No per-destination isolation
Remember, the subscription.started event needed to reach three destinations: HubSpot, Salesforce, and Slack. Your code sends them sequentially. HubSpot is timing out, your retry logic blocks for 30 seconds, and now the Salesforce opportunity and the Slack notification are sitting in a queue behind a destination that can't receive. The sales team doesn't find out about the upgrade for another hour. HubSpot's bad day became everyone's bad day.
When you send the same event to multiple destinations, a failure to one shouldn't affect the others. But in most integration code, it does.
Destination isolation means each delivery path has its own retry budget, its own failure state, and its own dead letter queue. A degraded HubSpot API doesn't slow down Slack notifications. An expired Salesforce token doesn't prevent Intercom messages.
This is the fan-out problem: one event in, many deliveries out, each with independent lifecycle management. It sounds simple. The implementation is not.
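The core move is structural: stop sending sequentially. A minimal fan-out sketch with per-destination outcomes, using threads for illustration (a production system would use a queue per destination so each one also gets its own retry budget and dead letter queue):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(event, destinations):
    """Deliver one event to every destination in parallel.

    destinations maps a name to a send callable. Each delivery gets its own
    outcome; a timeout in one never blocks or fails the others."""
    def attempt(name, send):
        try:
            send(event)
            return name, "delivered", None
        except Exception as exc:
            return name, "failed", str(exc)

    with ThreadPoolExecutor(max_workers=len(destinations)) as pool:
        futures = [pool.submit(attempt, name, send) for name, send in destinations.items()]
        return {name: (status, err) for name, status, err in (f.result() for f in futures)}
```

In the scenario above, HubSpot's timeout would show up as one failed entry in the result map while Slack and Salesforce deliveries complete on time.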
5. No delivery visibility
A week later, someone asks: "Did customer X's upgrade event actually reach Salesforce?" Your team checks application logs. There's a generic INFO: event sent line, but no record of the response code, no record of how many attempts were made, no record of whether Salesforce got it on the first try or the third. The investigation takes two hours and ends with "probably?"

That question — did this specific event reach this specific destination last Tuesday? — should take seconds to answer. For most teams, it can't be answered at all.
Most integration code has no audit trail. Events are fire-and-forget by default. When they succeed, nobody asks questions. When they fail, there's nothing to investigate.
Delivery visibility means logging every attempt — not just successes and final failures, but each retry, each response code, each timeout. It means being able to trace a single event from emission through every delivery attempt to every destination.
Without this, debugging integration issues means reading application logs, correlating timestamps, and guessing. With it, you can answer "what happened to this event?" in seconds.
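The data model behind that kind of visibility is small. A sketch of an attempt log with a trace query — field names are illustrative, and a real version lives in a database with retention policies:

```python
import time

class DeliveryLog:
    """Records every delivery attempt so "what happened to this event?" is a query,
    not a two-hour log archaeology session."""

    def __init__(self):
        self.attempts = []

    def record(self, event_id, destination, attempt, status, error=None):
        self.attempts.append({
            "event_id": event_id,
            "destination": destination,
            "attempt": attempt,        # 1 for the first try, 2 for the first retry, ...
            "status": status,          # HTTP status code, or None on timeout
            "error": error,
            "at": time.time(),
        })

    def trace(self, event_id):
        """Every attempt for one event, across all destinations, in order."""
        return [a for a in self.attempts if a["event_id"] == event_id]
```

With this in place, "did customer X's event reach Salesforce?" is a single `trace` call: you see the timeout on attempt one, the 200 on attempt two, and the timestamps of both.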
Why teams skip these
Nobody sets out to build unreliable integrations. These gaps accumulate because each one feels like a separate concern, and each one is genuinely hard to get right.
Retry logic with proper backoff and jitter is maybe 300 lines. Idempotency tracking adds a data store and TTL management. Dead letter queues add storage, a UI for inspection, and replay logic. Destination isolation means rearchitecting from sequential calls to parallel delivery with independent failure handling. Delivery logging means schema design, retention policies, and a way to query it.
Add it up and you're looking at thousands of lines of infrastructure code that have nothing to do with your product's core value. And that's before you handle edge cases like partial failures, API schema drift, per-destination rate limits, or replay safety. This isn't one-time code — it's a system you now own. It needs to be maintained, tested, monitored, and debugged by the same team that's supposed to be shipping features.
So teams compromise. They ship the happy path. They add "TODO: retry logic" comments. They tell themselves they'll come back to it when integrations are more mature.
They usually don't come back to it until something breaks in production.
The build-vs-buy decision
At some point, every team that takes integrations seriously faces the same question: should we build this infrastructure ourselves, or use something purpose-built?
The answer depends on your situation. If you have one integration with one destination and low event volume, a simple retry wrapper is probably fine. If you have multiple destinations, customer-facing integrations, or events where loss has business impact (revenue events, compliance events, user-facing notifications), the math changes quickly. Even at a 1% failure rate, a product sending 10,000 events a day is dropping 100 events daily — and that's before a destination has a bad day.
The questions that matter are: how many hours per month does your team spend debugging silent delivery failures? How confident are you that every event reaches every destination? And what's the cost when one doesn't?
Most teams don't decide to build this infrastructure. They accidentally end up owning it.
Don't want to build and maintain event delivery infrastructure? Join Meshes and get retries, deduplication, fan-out, dead letter queues, and delivery logging for your product events out of the box.