Reliability
CustomerNode is a multi-tenant SaaS platform that customers run their day-to-day work on. This page describes how we operate the platform for availability, how we respond when something goes wrong, and how we communicate with customers during and after incidents.
Specific availability commitments — including any service-level agreement and associated remedies — are documented in your enterprise agreement and Master Services Agreement. This page describes our operational approach; it does not create or modify contractual SLAs.
Availability approach
Production infrastructure runs in independently audited cloud environments with redundancy at the compute, storage, and network layers. Critical components also have failover mechanisms intended to reduce the likelihood of customer-visible downtime. Component restarts and rolling deploys are routine operational events, not incidents.
Architecture for resilience
- Managed edge: public-facing endpoints sit behind a managed edge layer that performs TLS termination, traffic filtering, and rate limiting.
- Stateless application tier: the application is horizontally scalable; instances can be replaced and rescheduled independently.
- Replicated storage: primary databases use managed services with redundancy and point-in-time recovery.
- Health monitoring: production services are regularly monitored, and failed instances are removed from rotation automatically.
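To illustrate the health monitoring item above, here is a minimal sketch of how failed instances might be removed from rotation. It is not a description of CustomerNode's production tooling. The names Instance, Rotation, and record_check, the instance names, and the threshold of three consecutive failures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    consecutive_failures: int = 0
    in_rotation: bool = True


class Rotation:
    """Tracks which instances are eligible to receive traffic."""

    def __init__(self, instances, failure_threshold=3):
        self.instances = {inst.name: inst for inst in instances}
        # Hypothetical threshold: failed checks in a row before removal.
        self.failure_threshold = failure_threshold

    def record_check(self, name, healthy):
        """Apply one health-check result to the named instance."""
        inst = self.instances[name]
        if healthy:
            # A passing check resets the counter and restores the instance.
            inst.consecutive_failures = 0
            inst.in_rotation = True
        else:
            inst.consecutive_failures += 1
            if inst.consecutive_failures >= self.failure_threshold:
                inst.in_rotation = False

    def eligible(self):
        """Return the names of instances currently in rotation, sorted."""
        return sorted(n for n, i in self.instances.items() if i.in_rotation)


rotation = Rotation([Instance("app-1"), Instance("app-2")])
for _ in range(3):
    rotation.record_check("app-1", healthy=False)
print(rotation.eligible())  # ['app-2']
```

Requiring several consecutive failures, rather than one, is a common way to avoid pulling an instance over a single transient error. Because the application tier is stateless, a removed instance can be replaced without affecting the others.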
Backup & recovery
Production data is backed up on a regular schedule. Backups are encrypted at rest and stored separately from primary infrastructure. We periodically test restoration procedures to validate recovery objectives. Recovery point and recovery time objectives are documented internally and shared with enterprise customers under NDA on request.
Maintenance windows
The platform supports rolling deployments, so most changes ship without customer-visible downtime. When a change does require customer-visible downtime (possible for certain database migrations or infrastructure cutovers), we schedule it during a low-traffic window and notify affected customers in advance by email and in-app notice.
Incident response
We operate a documented incident response process covering detection, triage, containment, eradication, recovery, and post-incident review.
- Detection: automated monitoring of error rates, latency, and authentication anomalies; supplemented by customer reports.
- Triage & severity: incidents are classified by customer impact (SEV-1 platform-wide outage, SEV-2 partial degradation, SEV-3 limited impact).
- Containment: on-call engineers stabilize the platform before pursuing root cause; this may include traffic shedding or feature degradation to preserve core functionality.
- Recovery: we restore service and validate that customer-visible behavior is correct before declaring resolution.
- Post-incident review: material incidents are reviewed internally and the findings drive corrective actions tracked to completion.
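The severity levels in the triage step can be sketched as a small classification function. This is an illustration only. The three levels and their descriptions come from the list above, but the classify function, its inputs, and the 10% cutoff between SEV-2 and SEV-3 are hypothetical and do not reflect documented CustomerNode criteria.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = "SEV-1"  # platform-wide outage
    SEV2 = "SEV-2"  # partial degradation
    SEV3 = "SEV-3"  # limited impact


# Hypothetical cutoff: share of tenants affected that separates
# partial degradation from limited impact.
PARTIAL_DEGRADATION_THRESHOLD = 0.10


def classify(platform_wide_outage: bool, affected_fraction: float) -> Severity:
    """Map customer impact to a severity level (illustrative rules only)."""
    if platform_wide_outage:
        return Severity.SEV1
    if affected_fraction >= PARTIAL_DEGRADATION_THRESHOLD:
        return Severity.SEV2
    return Severity.SEV3


print(classify(platform_wide_outage=False, affected_fraction=0.25).value)  # SEV-2
```

Classifying by customer impact rather than by internal cause lets triage happen before the root cause is known, which matches the order of the steps above.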
Customer notification
During material platform incidents, CustomerNode communicates with affected customers on a commercially reasonable cadence appropriate to the severity and scope of the incident.
- Proactive updates: we initiate communication once incident scope is confirmed, rather than waiting on customer reports.
- Ongoing communication: while an incident is open, we provide updates as the situation evolves.
- Post-resolution summary: following material incidents, we share with affected customers a summary of what happened, customer-visible impact, and the corrective actions we are taking.
- Security incidents involving customer data: handled under the separate notification commitments described in our Data Processing Addendum.
Business continuity & disaster recovery
Our business continuity plan addresses scenarios including cloud infrastructure disruption, loss of a critical subprocessor, and loss of access to office or personnel infrastructure. The plan is reviewed on a recurring schedule. A summary of recovery objectives and validated scenarios is available to current and prospective customers under NDA during security review.
Subprocessor & dependency monitoring
Our reliability depends in part on third-party services (see Subprocessors). We monitor the health of critical subprocessors and have documented fallback or degradation strategies for the dependencies whose loss would otherwise be customer-visible.
Status & transparency
Customers who need real-time visibility into platform health, or notification of in-progress incidents, may request to be added to our incident notification list at [email protected]. A public status page is on our roadmap; until it is live, in-progress SEV-1 and SEV-2 incidents are communicated by email to affected tenants and via in-app notices where possible.