Observability Crosses the IT/OT Line: MQTT, OPC and the New Telemetry Stack
One platform watching servers, Kubernetes clusters, ATM hardware, anaesthesia machines and refrigerated trucks at the same time. That is the proposition ATS Network Management is putting on the table out of Johannesburg, and it represents a meaningful expansion of what the word "observability" has meant for the last decade. The pitch lands at a moment when most engineering teams are still finishing their first generation of metrics, logs and traces work for cloud-native apps, never mind operational technology.
The shift, if real, collapses two telemetry stacks (IT and OT) into one. That changes how platform teams budget storage, design alerting, and structure on-call. It also changes who owns what.
Key Details
The argument, as ITWeb reported on 12 March 2026, is that observability platforms can now ingest telemetry from machines, infrastructure, cloud workloads and physical environments in real time, using IoT protocols such as MQTT (lightweight messaging for sensors and devices) and industrial standards such as OPC (a widely used protocol for machine communication). The target verticals named are banking, healthcare, mining, logistics and utilities.
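To make the ingest path concrete, here is a minimal sketch of the MQTT side using the Eclipse paho-mqtt client (1.x callback API). The broker address, topic layout and JSON payload shape are illustrative assumptions, not anything ATS has published.

```python
# Minimal MQTT telemetry subscriber -- a sketch, assuming paho-mqtt 1.x.
# Broker hostname and topic hierarchy are hypothetical.
import json

import paho.mqtt.client as mqtt


def on_connect(client, userdata, flags, rc):
    # One wildcard subscription covers every sensor on a site.
    client.subscribe("site/+/sensor/+/telemetry")


def on_message(client, userdata, msg):
    sample = json.loads(msg.payload)
    # Hand off to the observability pipeline here (e.g. an OTLP exporter).
    print(msg.topic, sample)


client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.internal", 1883, keepalive=60)
client.loop_forever()
```

Subscribing is the easy part; everything interesting happens in whatever on_message hands off to, which is where the data-model questions further down begin.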
The banking example is the clearest. Each ATM is described as a distributed IoT device combining hardware sensors, an operating system, network connectivity and transaction processing software. A bank running thousands of these endpoints can, in theory, detect hardware degradation, connectivity failures, abnormal temperature conditions and transaction system errors before customers are affected. That is four distinct failure modes per device, multiplied across a fleet, on one pane of glass.
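What "four failure modes on one pane of glass" means in practice is a routing decision: every per-ATM signal gets stamped with a failure class so a single alerting policy can dispatch on it. The metric names below are hypothetical; the shape is the point.

```python
# Hypothetical mapping of per-ATM signals to the four failure classes
# named above, so one alerting policy can route on a single label.
FAILURE_CLASS = {
    "disk.smart.reallocated_sectors": "hardware",
    "cash_dispenser.jam_count": "hardware",
    "net.heartbeat_age_seconds": "connectivity",
    "enclosure.temperature_celsius": "environment",
    "txn.decline_rate": "transactions",
}


def classify(metric_name: str) -> str:
    """Tag a raw metric with its failure class for alert routing."""
    return FAILURE_CLASS.get(metric_name, "unclassified")
```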
Healthcare gets a similar treatment. Operating theatres bundle anaesthesia machines, surgical imaging, patient monitoring, sterilisation systems and environmental controls. Pharmaceutical cold chain extends sensor coverage to refrigeration units, cold storage facilities, transport containers and warehouse environments, with the platform expected to flag temperature deviations, refrigeration failures and power disruptions before inventory is compromised. The article puts a hard number on the stakes: a single prevented refrigeration failure can save millions in spoiled inventory. That is the only explicit financial figure in the source.
The cloud layer is treated as non-negotiable. The argument is that a cloud application slowdown can ripple into ATM transaction processing, a logistics platform failure can break cold-chain temperature monitoring, and a cloud service outage can disrupt manufacturing-floor production monitoring. Coverage therefore has to extend across hybrid and multicloud workloads, containers and Kubernetes environments, application performance and databases, and the network connectivity between cloud and operational sites. Mining adds telemetry from conveyor systems, drilling equipment, ventilation systems and underground safety sensors. Utilities add substations, transformers, renewables and grid sensors, with transformer overheating and abnormal power fluctuations called out as detection targets.
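On the OPC side, pulling a single value such as transformer winding temperature typically looks like the sketch below, shown here with the open-source python-opcua client. The endpoint URL and node identifier are invented for illustration; a real estate exposes thousands of such nodes per site.

```python
# Illustrative OPC UA poll -- a sketch using the python-opcua (FreeOpcUa)
# client. Endpoint and node id are hypothetical.
from opcua import Client

client = Client("opc.tcp://substation-gw.example.internal:4840")
client.connect()
try:
    node = client.get_node("ns=2;s=Transformer1.WindingTemp")
    temp_c = node.get_value()
    print(f"Transformer winding temperature: {temp_c} C")
finally:
    client.disconnect()
```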
Why This Matters for Engineering Teams
The interesting engineering question is not "can you ingest MQTT into an observability backend". You can. It's whether the data model holds up when you do. A traditional APM stack is built around request-scoped traces, application metrics on roughly 15-to-60-second intervals, and structured logs. OT data looks different: high-frequency vibration samples, slow-moving temperature curves, discrete state changes from PLCs over OPC. The cardinality profiles, retention requirements and alerting semantics are a different animal entirely.
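The contrast is easiest to see side by side. The three records below are invented, but representative of what each side of the line actually emits:

```python
# Representative (invented) record shapes from each side of the IT/OT line.
apm_metric = {  # application gauge: 15-60 s interval, modest point volume
    "name": "http.server.duration.p99",
    "value_ms": 142.0,
    "interval_s": 60,
    "labels": {"service": "payments", "pod": "payments-7f9cd"},
}

ot_sample = {  # vibration sensor: sub-second sampling, enormous point volume
    "name": "vibration.rms",
    "value_mm_s": 4.7,
    "interval_s": 0.01,  # 100 Hz
    "labels": {"asset": "conveyor-12", "axis": "x"},
}

plc_event = {  # discrete state change over OPC: sparse, but alert-critical
    "name": "valve.state",
    "value": "CLOSED",
    "labels": {"line": "filler-3"},
}
```

Same pipeline, three different retention and alerting contracts.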
The source claims machines, applications and sensors can together generate millions of data points every minute. That is the load-bearing number in the entire piece, and the source does not disclose what fraction is high-frequency OT versus standard application metrics, which matters because storage cost and query performance scale very differently for the two. If even 20 percent of those points are sub-second sensor samples, the storage tier needs a time-series database tuned for downsampling and rollups, not a generic log store. Teams that skip that distinction will see their ingest bill jump by an order of magnitude.
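A minimal sketch of the rollup that keeps sub-second sensor data affordable, assuming samples arrive as (unix_timestamp, value) pairs; in production this lives inside a time-series database's downsampling rules rather than application code.

```python
# Downsample high-frequency samples into per-window min/mean/max/count.
# A sketch: real TSDBs do this server-side via continuous aggregates
# or recording rules.
from collections import defaultdict


def rollup(samples, window_s=60):
    """Aggregate (unix_timestamp, value) pairs into fixed windows."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // window_s)].append(value)
    return {
        window * window_s: {
            "min": min(vals),
            "max": max(vals),
            "mean": sum(vals) / len(vals),
            "count": len(vals),
        }
        for window, vals in sorted(buckets.items())
    }
```

A 100 Hz vibration stream rolled up this way drops from 6,000 points per minute to one summary row, which is the difference between a survivable ingest bill and the order-of-magnitude jump above.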
There's also a correlation problem. The source flags AI-driven observability for detecting subtle performance degradation in cloud applications, abnormal machine vibration indicating equipment failure, and unusual refrigeration behaviour. Three very different signal types, three very different baselines. I'd argue the realistic near-term win is not unified anomaly detection across all of them; it's correlation across well-defined boundaries: a Kubernetes pod restart correlated with an ATM endpoint going dark, a database latency spike correlated with a cold-chain alert delay. Standards work like OpenTelemetry already gives teams a shared semantic convention for the IT side; extending those conventions to OT signals is the unfinished homework.
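What extending those conventions might look like: OpenTelemetry resource attributes on an OT endpoint. The service.* and host.* keys below are standard OTel semantic conventions; the ot.* keys are hypothetical, because no published convention covers them yet, which is exactly the gap.

```python
# Sketch: tagging an OT endpoint with OpenTelemetry resource attributes.
# "service.*" and "host.*" are real OTel semantic conventions; the "ot.*"
# keys are hypothetical placeholders for conventions that don't exist yet.
from opentelemetry.sdk.resources import Resource

atm_resource = Resource.create({
    "service.name": "atm-agent",            # standard convention
    "service.namespace": "retail-banking",  # standard convention
    "host.id": "atm-jhb-0142",              # standard convention
    "ot.asset.class": "atm",                # hypothetical OT extension
    "ot.protocol": "mqtt",                  # hypothetical OT extension
    "ot.site.id": "branch-sandton-03",      # hypothetical OT extension
})
```

Shared resource attributes are what make the pod-restart-to-ATM-dark correlation a join instead of a guess.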
Testable prediction: platform teams that adopt unified IT/OT observability without separating their hot and cold storage tiers will see ingestion costs at least double within two quarters of rollout, and will quietly start dropping sensor data to compensate.
Industry Impact
For banking platform leads, the ATM-as-IoT framing is overdue. Treating an ATM as a server with peripherals lets you reuse existing SRE practice (SLOs, error budgets, runbooks) on a fleet that has historically been managed by a separate facilities or vendor-ops function. The hard part is organisational, not technical. Whoever owns the cloud transaction processing platform and whoever owns ATM hardware health usually do not share an alerting tool, an on-call rotation, or a postmortem process. Unified observability forces that conversation.
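Reusing SRE practice on a fleet looks like this in its simplest form: a fleet-level availability SLO and its error-budget arithmetic. The target, window and fleet numbers below are invented for illustration.

```python
# Hypothetical fleet SLO: 99.5% of ATM-minutes healthy per 28-day window.
SLO_TARGET = 0.995
WINDOW_MINUTES = 28 * 24 * 60


def error_budget_burn(unhealthy_atm_minutes: int, fleet_size: int) -> float:
    """Fraction of the error budget consumed so far this window."""
    total_atm_minutes = fleet_size * WINDOW_MINUTES
    budget = total_atm_minutes * (1 - SLO_TARGET)
    return unhealthy_atm_minutes / budget


# 4,000 ATMs with 310,000 unhealthy ATM-minutes so far this window:
print(f"{error_budget_burn(310_000, 4_000):.0%} of error budget consumed")
```

Nothing in that arithmetic cares whether the "endpoint" is a pod or a cash machine, which is the whole argument for reuse.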
In healthcare, the prize is theatre readiness, but the constraint is regulatory. The source describes integrating telemetry from anaesthesia machines, imaging, patient monitoring, sterilisation and environmental controls. We don't know from the source how device vendors expose that telemetry, or whether the proposed approach assumes pulling data through OPC bridges or vendor APIs. The binding constraint: if even one major vendor in the theatre stack refuses to expose machine state, the unified view degrades into a partial view, and clinical engineering teams will not trust it for go/no-go decisions.
Logistics and pharma are the cleanest fit. Cold-chain telemetry is already sensor-heavy, the failure modes (temperature deviations, refrigeration failures, power disruptions) are well-characterised, and the financial case is concrete. Retail refrigeration monitoring across warehouses, refrigerated delivery vehicles and retail units follows the same pattern. Mining and utilities have the highest-stakes physical environments and the messiest legacy protocol estate, which means longest integration timelines but largest payoff per detected failure.
What to Watch
Three signals worth tracking over the next year. First, whether observability vendors publish OT-specific semantic conventions or leave it as customer homework. If MQTT and OPC ingest stays in the "bring your own schema" tier, adoption will be limited to teams with dedicated data engineers. Second, whether banks and hospitals start posting job descriptions that combine SRE and operational technology responsibilities. That is the lagging indicator that the org-chart change is real, not just the tooling change. Third, whether AI-driven anomaly detection across mixed signal types produces fewer false positives than per-domain detection. The source is bullish on this; the evidence base is thin.
Prediction: by Q2 2027, at least one major observability vendor will ship a first-class OPC connector with documented semantic conventions, and the first reference architectures pairing Kubernetes monitoring with industrial telemetry will appear, similar in spirit to the patterns documented in the Google Cloud Architecture Framework. If that doesn't happen, the IT/OT convergence story stays a deck slide for another cycle.
Key Takeaways
- ATS Network Management's pitch unifies IT and OT telemetry through MQTT and OPC, targeting banking, healthcare, mining, logistics and utilities.
- The ATM-as-distributed-IoT framing is the most actionable example: four failure classes per device (hardware, connectivity, environment, transactions) on one platform.
- The "millions of data points per minute" claim is real but unsourced as to composition. Mixed IT/OT cardinality will break naive storage tiers.
- The hardest problem is not ingest, it's correlation semantics and organisational ownership across previously separate IT and facilities teams.
- Watch for OT-aware semantic conventions from observability vendors and combined SRE/OT job postings as the leading indicators of real adoption.
Frequently Asked Questions
Q: What's the difference between traditional IT monitoring and the observability approach described here?
Traditional monitoring answers "is the IT system working" by watching servers, networks, applications and databases. The expanded observability approach pulls in operational technology data through protocols like MQTT and OPC, so the same platform can watch cloud workloads alongside ATM hardware, refrigeration units, theatre equipment and industrial machines.
Q: Why are MQTT and OPC the protocols of choice?
MQTT is a lightweight messaging protocol designed for IoT sensors and devices, which makes it efficient for high-volume, low-bandwidth telemetry. OPC is a long-established industrial protocol for machine-to-machine communication, so it gives observability platforms access to existing factory and infrastructure equipment without ripping out the control layer.
Q: What's the realistic risk of pushing OT data into an existing observability platform?
Cost and signal fidelity. Sensor streams have very different cardinality and frequency profiles than application metrics, so storage and query tiers tuned for APM can get expensive fast. Teams that don't separate hot OT data from standard application telemetry tend to either overspend on ingest or quietly drop sensor samples, which defeats the purpose.