Build vs Buy - The True Cost of Hand-Rolled SaaS Integrations
Hand-rolled integrations look cheap until retries, dead letters, credential storage, replay tooling, and on-call load show up. A practical build-vs-buy look for SaaS teams.
Most SaaS teams should build their first integration themselves.
That is not a contrarian take. It is usually the right one.
If you need one strategic destination, the fastest way to learn the product surface, the customer workflow, and the data model is to wire it up directly. You learn what the payload needs to look like. You learn what the provider API expects. You learn whether the integration actually matters to customers.
The problem is that teams often keep using the first-build estimate to justify the fifth integration.
That is where the math breaks.
The initial build is often cheap enough to feel obvious. The long tail of reliability, auth, replay, observability, and maintenance is where the real cost hides.
Why the first build always looks cheaper than it is
On day one, the integration usually looks like this:
- a queue job
- a provider SDK or fetch call
- a little field mapping
- maybe a retry wrapper
That is a real amount of work, but it is legible. A small team can estimate it. A founder can believe it fits in a sprint. An engineer can say, correctly, "we can just build this."
The estimate is not wrong. It is incomplete.
What gets missed is everything the first integration quietly establishes as precedent:
- a retry policy
- a credential storage pattern
- a support expectation when deliveries fail
- an event model that other destinations will eventually want too
- an operational promise to the business that "yes, we support integrations now"
The first integration is rarely only one integration. It is the seed of an integration platform, whether you intended to build one or not.
The hidden costs that show up after launch
This is the part teams feel but often do not price.
Retry logic is a product, not a helper
The first time a destination returns a timeout or a 503, you need retry behavior. Not one retry. A real policy.
That means deciding:
- which failures are retryable
- how long to wait between attempts
- whether to use exponential backoff
- how to add jitter
- when to give up
If you implement that once for one destination, fine. If you implement it again for every new destination, you start carrying a lot of duplicated infrastructure. We break down that operational surface in Webhook Retry Logic in Node.js.
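The decisions above can be sketched as one reusable policy. This is an illustrative sketch, not any particular library's API; the names and constants are assumptions:

```typescript
// Sketch of a reusable retry policy: retryable-failure classification,
// exponential backoff with full jitter, a cap, and a hard attempt limit.
type RetryDecision = { retry: boolean; delayMs?: number };

const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 60_000;

// Which failures are worth retrying: timeouts/network errors (no status),
// rate limits (429), and server-side errors (5xx). 4xx payload errors are not.
function isRetryable(status: number | null): boolean {
  if (status === null) return true; // network error or timeout
  return status === 429 || status >= 500;
}

function nextAttempt(status: number | null, attempt: number): RetryDecision {
  if (!isRetryable(status) || attempt >= MAX_ATTEMPTS) return { retry: false };
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (attempt - 1));
  return { retry: true, delayMs: Math.floor(Math.random() * ceiling) }; // full jitter
}
```

The point is not the specific constants. It is that every one of these decisions is policy, and duplicating the policy per destination is where the maintenance cost hides.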
Reliable retries imply duplicate attempts. The destination might process the payload before your timeout fires. You retry. Now the consumer sees the same logical event twice. Handling that requires:
- stable event identifiers
- idempotent consumers
- dedup storage
- support answers for "did this run twice?"
This is not optional extra polish. It is part of reliable delivery.
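A minimal version of that pattern looks like this. The in-memory set is a stand-in for what would be a database or cache with a TTL in production:

```typescript
// Minimal idempotent consumer: a stable event id plus a dedup store.
// In-memory here for illustration; production needs durable storage with a TTL.
const processed = new Set<string>();

function handleEvent(eventId: string, apply: () => void): "applied" | "duplicate" {
  if (processed.has(eventId)) return "duplicate"; // retried delivery of the same logical event
  apply();
  processed.add(eventId);
  return "applied";
}
```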
Dead-letter handling and replay need to exist before the first serious incident
Eventually a destination stays broken long enough that retries stop helping.
Without dead letters, the event disappears. Without replay, recovery is manual. Without visibility, support has no useful answer.
That is why dead-letter queues for webhooks matter so early. They are not only for giant systems. They are for any system that cannot afford "we lost the event."
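The core mechanic is small. This sketch (type and function names are illustrative) parks exhausted deliveries instead of dropping them, and replays them per destination once it recovers:

```typescript
// Sketch: a dead-letter store plus a replay operation.
type DeadLetter = {
  eventId: string;
  destination: string;
  payload: unknown;
  reason: string;
};

const deadLetters: DeadLetter[] = [];

// Called when retries are exhausted: the event survives instead of vanishing.
function parkEvent(letter: DeadLetter): void {
  deadLetters.push(letter);
}

// Replay everything parked for one destination; keep anything that still fails.
function replay(destination: string, deliver: (payload: unknown) => boolean): number {
  let replayed = 0;
  for (let i = deadLetters.length - 1; i >= 0; i--) {
    const letter = deadLetters[i];
    if (letter.destination !== destination) continue;
    if (deliver(letter.payload)) {
      deadLetters.splice(i, 1);
      replayed++;
    }
  }
  return replayed;
}
```

The hard part is not this code. It is having it deployed, monitored, and wired into support tooling before the incident that needs it.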
Credential management expands fast
A direct integration is manageable when you use one internal API key.
The difficulty jumps when customers connect their own accounts:
- OAuth tokens need refresh and re-auth flows
- API keys need secure storage and rotation
- connection state needs to be isolated per customer
- support needs to know whether a failure was auth, rate limit, payload, or destination outage
At that point, you are not only delivering events. You are maintaining customer-specific connectivity infrastructure.
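Even the simplest version of that infrastructure needs per-customer connection state and a refresh-before-use rule. A sketch, with illustrative field names and a hypothetical `refresh` callback standing in for a provider's token endpoint:

```typescript
// Per-customer connection state: enough to tell auth failures apart
// from rate limits and outages without grepping raw logs.
type Connection = {
  customerId: string;
  provider: string;
  accessToken: string;
  expiresAt: number; // epoch ms
  status: "active" | "needs_reauth" | "rate_limited";
};

// Refresh tokens nearing expiry before use; flag dead connections for re-auth.
function ensureFresh(
  conn: Connection,
  now: number,
  refresh: () => string | null
): Connection {
  if (conn.expiresAt - now > 60_000) return conn; // still comfortably valid
  const token = refresh();
  if (token === null) return { ...conn, status: "needs_reauth" }; // refresh revoked
  return { ...conn, accessToken: token, expiresAt: now + 3_600_000, status: "active" };
}
```

Multiply this by every provider's auth mode, refresh quirks, and failure states, and the "just store a key" estimate stops holding.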
Observability turns into a support requirement
The business version of the problem is not "our worker threw an exception." It is questions like:
- "Why did Acme's lead not make it to HubSpot?"
- "Was the webhook retried?"
- "Can we replay just the failed events from yesterday?"
- "Did this fail for one customer or all customers?"
Once integrations are customer-facing, observability is no longer an engineering luxury. It is part of the product.
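Answering those questions means recording every delivery attempt as queryable data, not log lines. A minimal sketch of that record, with illustrative names:

```typescript
// Record each delivery attempt so support can answer customer-scoped
// questions by querying, not by grepping logs.
type Attempt = {
  eventId: string;
  customerId: string;
  destination: string;
  attempt: number;
  outcome: "delivered" | "failed";
  at: number; // epoch ms
};

const attempts: Attempt[] = [];

function recordAttempt(a: Attempt): void {
  attempts.push(a);
}

// "Did this fail for one customer or all customers?"
function failedCustomers(destination: string): Set<string> {
  return new Set(
    attempts
      .filter((a) => a.destination === destination && a.outcome === "failed")
      .map((a) => a.customerId)
  );
}
```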
Maintenance churn compounds quietly
Provider APIs change. Scope requirements shift. Rate limits get hit. Field mappings drift. Customers ask for new object types. Product adds a new event that three destinations now need.
None of these individually feels like a re-architecture. Together they create a permanent maintenance lane in your roadmap.
On-call burden is the real multiplier
This is the cost teams undercount most often.
Every integration creates a new class of production incident:
- a destination outage
- an auth failure
- a mapping regression
- queue backlog
- retry storm
- noisy-neighbor tenant issue
Even if the initial code was cheap, the on-call surface usually is not.
The multiplication problem
The reason build-vs-buy changes over time is not that one integration gets impossibly hard. It is that each new destination multiplies the operational surface area of the previous ones.
Here is the rough pattern:
| Capability | One integration | Five integrations |
|---|---|---|
| API client logic | Annoying but manageable | Constantly diverging per provider |
| Retry policy | One code path | Provider-specific rules and rate limits |
| Auth handling | One token model | Multiple auth modes and failure states |
| Support tooling | Maybe logs are enough | You need search, replay, and customer scoping |
| Observability | Ad hoc | Product-level requirement |
| On-call | Occasional | Persistent operational load |
This is why integration work feels nonlinear.
The second destination is not only "one more API." It also means:
- one more provider to authenticate against
- one more mapping surface
- one more failure mode
- one more customer expectation that the delivery is reliable
If the first integration taught you the domain, the third and fourth usually teach you whether you are building the right thing in-house.
When building it yourself still makes sense
Buying infrastructure is not the right answer every time.
Building is still a strong choice when:
The integration is strategically unique
If the integration is deeply product-specific and unlike anything else you expect to support, custom code can be the cleanest path.
You only need one or two destinations
If you genuinely have a narrow integration surface and do not expect customer-configured connections, the platform overhead of a dedicated delivery layer may be unnecessary.
You need control over unusual behavior
Sometimes you have special ordering rules, strict latency constraints, or domain-specific workflows that are easier to implement directly than to fit into a general platform.
You are still proving demand
In the early stage, building one integration directly can be the fastest way to learn whether the problem deserves more investment at all.
The point is not "never build." The point is to be honest about what you are actually choosing to own long term.
When buying starts to look rational
Buying gets more attractive when the problem is no longer one custom connection but repeatable integration infrastructure. The signals:
- you are adding a third or fourth destination
- customers want to connect their own accounts
- reliability requirements are rising faster than feature velocity
- support needs replay and delivery visibility
- the integration surface is consuming roadmap time that should belong to the core product
At that stage, the question is less "can we build this?" and more "should our team be the one maintaining this forever?"
That is where the category matters: delivery layers, integration platforms, embedded connectivity platforms, or whatever label you prefer. The shared value is not that they write one API call for you. It is that they centralize the plumbing you would otherwise rebuild repeatedly.
The middle ground is usually the best architecture
Teams sometimes frame this as a false binary:
- build everything yourself
- outsource the whole integration experience
There is a more practical middle ground:
- own your event model
- own the product semantics
- buy or centralize the delivery layer
That means your application still decides what `user.signup`, `invoice.paid`, or `subscription.canceled` mean. Your product still owns which events exist and what data they carry. But you stop re-implementing the transport machinery around those events for every destination.
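The boundary can be as small as an event envelope. A sketch of what the product side owns, assuming a generic envelope shape (names are illustrative, not a specific platform's schema):

```typescript
import { randomUUID } from "node:crypto";

// The product owns event semantics: which types exist and what data they carry.
// Everything past this envelope (routing, retries, fan-out) is the delivery layer's job.
type ProductEvent = {
  id: string; // stable id, so downstream consumers can dedupe retried deliveries
  type: "user.signup" | "invoice.paid" | "subscription.canceled";
  occurredAt: string; // ISO timestamp
  data: Record<string, unknown>;
};

function makeEvent(
  type: ProductEvent["type"],
  data: Record<string, unknown>
): ProductEvent {
  return {
    id: `evt_${randomUUID()}`,
    type,
    occurredAt: new Date().toISOString(),
    data,
  };
}
```

Emit once at this boundary and the per-destination plumbing stops living in your application code.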
This is the model behind our DIY comparison, and it is usually the cleanest handoff between product logic and delivery infrastructure.
Meshes sits in that middle ground. You emit product events once, and the platform handles routing, retries, fan-out, and connection management outside your core app. That does not make it the right answer for every team, but it is the type of architecture that becomes attractive when your bottleneck is infrastructure repetition rather than product-specific integration logic.
A practical way to decide
If you are trying to make the call now, a simple decision framework is more useful than a slogan.
Build if most of these are true:
- you have one high-value integration
- your team is comfortable owning auth and retry logic
- the customer workflow is still being discovered
- integration reliability is important, but not yet a dedicated product surface
Buy or centralize if most of these are true:
- you expect multiple destinations
- customers bring their own credentials
- support already needs event-level visibility
- failures need replay, not just logs
- the maintenance lane is starting to crowd out core product work
The goal is not to avoid complexity. It is to put complexity in the place your team is actually willing to maintain.
The real cost is not the first sprint
Hand-rolled integrations are often worth it at the beginning. That is why so many teams start there.
The mistake is assuming the cost curve stays flat.
Retries, dead letters, auth, replay, field mapping, customer isolation, and on-call response all turn a simple integration into a real operating surface. Once you have several destinations or several customer-specific connections, you are not choosing between "one API call" and "a platform." You are choosing whether integration plumbing should be part of your product team's permanent job.
That is the real build-vs-buy question.
Want to keep your event model and stop rebuilding the delivery plumbing? Join Meshes and route product events to multiple destinations without hand-rolling the infrastructure each time.