Z-Wave: Errant Asynchronous Events

For a long time now, I’ve had a “hunch” that either a) something is wrong with common implementations of the Z-Wave libraries used by hub manufacturers or b) I’m seeing an artifact of a Z-Wave “collision,” where two or more sensors in a Z-Wave system send asynchronous event messages at the same time.

Keep in mind, common battery-operated sensors–sometimes referred to as “sleepy” devices–only wake to send a quick message to the associated hub/device upon occurrence of an event–then go back to sleep to save battery power. They are not “listening” for a command to, say, turn on a light, as a wall switch would. Common examples of “sleepy” devices include battery-operated motion sensors, door open/close sensors, and environmental sensors for temperature and humidity. A “collision” would be when two or more devices awaken at the same time and, thinking the network isn’t busy, send their traffic simultaneously.

Over the course of approximately ten years I have observed this phenomenon four times when I was able to troubleshoot the incident–the most recent last night (after we were in bed, of course :grinning_face_with_smiling_eyes: ). The first three events were on two different SmartThings hubs and last night’s event was while using my Z-Box hub–apparently indicating the problem may not be hub-dependent. There were four different devices involved (two events involved the same device type at different locations), three different device types, and three different device manufacturers–apparently indicating the issue isn’t specific to a manufacturer or particular device firmware release.

The events were as follows:

  • A door contact sensor on an exterior slider in our former Florida condo reported “door opened” one summer while we were at home in New England. After checking motion sensors inside the condo, I saw no reason to suspect a break-in. Our homewatch folks confirmed that the slider door was indeed locked closed and nothing was amiss. The sensor was a Linear/GoControl WADWAZ-1 Z-Wave Door/Window sensor. Resolution: I wrote a modified driver for the device that asked for sensor state during the periodic “wake-up” interval when the device reported battery status. The result was “door was closed.”
  • A motion sensor at the same location indicated motion inside the condo. Again, checking other motion sensors, the three door open/close sensors, and watching the interior on a security camera, I saw no evidence anyone was inside the condo. The sensor was an Aeon Labs Multisensor 6 Z-Wave Plus device. Resolution: I wrote a modified driver for the device that asked for device status when the device reported battery status. The result was “no motion.”
  • A different physical motion sensor of the same model at our primary home in New England reported “motion” while we were away visiting family in Colorado. At this location, we had a commercial alarm system, as well as interior security cameras. I observed no indication of intrusion, the police were not called by the alarm company, and the alarm app indicated all was well at home. Resolution: I again modified the device driver so that it retrieved device status whenever the device reported battery status. The result was “no motion.”
  • Finally, last night our Zooz ZSE50 800LR Siren & Chime happily announced, “Alert! High temperature detected in basement freezer!” We got up from bed, traipsed down to the basement, got out the IR Temperature Gun and found the freezer temperature was well below 32 degrees Fahrenheit. I unplugged the siren and went back to bed. The temperature sensor is wired to a Fibaro FGK10X Door/Window Sensor w/Temperature, a Z-Wave Plus device that also alerts us if we accidentally leave the freezer door open. Ironically, in this case the sensor and the hub were from the same manufacturer, so you’d think there should be no issues, right?
    Below is the temperature graph I retrieved from my Z-Box hub this afternoon, which shows reported temperature over time–where you can see after I’d closed the freezer door, the temperature report gradually returned to normal.
    Notice the sudden jump in temperature to 32 degrees? And how does the Z-Box handle temperature internally? In Celsius! Yes, zero degrees C is 32 degrees F. That kind of suggests a payload containing a zero at the location that should hold the temperature, doesn’t it? In other words, one could suppose that a bogus report was handed by the Z-Wave library to the hub for processing…
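
To make the “zero in the temperature field” idea concrete, here is a minimal Python sketch. It assumes the reading arrives as a Z-Wave Multilevel Sensor Report (sensor type byte, then a byte packing precision/scale/size, then the value bytes); whether the Fibaro sensor and the Z-Box actually exchange temperature this way is my assumption, and the byte values below are made up for illustration rather than captured from the network.

```python
def decode_multilevel_report(payload: bytes) -> tuple[float, str]:
    """Decode a report body laid out as [sensor_type, level, value bytes...]."""
    level = payload[1]
    precision = (level >> 5) & 0x07  # number of decimal places
    scale = (level >> 3) & 0x03      # for air temperature: 0 = Celsius, 1 = Fahrenheit
    size = level & 0x07              # length of the value in bytes
    raw = int.from_bytes(payload[2:2 + size], "big", signed=True)
    return raw / (10 ** precision), "C" if scale == 0 else "F"


def c_to_f(celsius: float) -> float:
    return celsius * 9 / 5 + 32


# Hypothetical healthy report: precision 1, Celsius, 2-byte value 0xFF4C = -180 -> -18.0 C
healthy, _ = decode_multilevel_report(bytes([0x01, 0x22, 0xFF, 0x4C]))

# Same report with the value bytes zeroed out
zeroed, _ = decode_multilevel_report(bytes([0x01, 0x22, 0x00, 0x00]))

print(c_to_f(healthy))  # roughly -0.4 F, a plausible freezer reading
print(c_to_f(zeroed))   # 32.0
```

A zeroed value field decodes to exactly 0 °C, which any Fahrenheit display shows as exactly 32 °F. That matches the graph, but it does not tell us whether the zero came from the sensor, the radio link, or the hub.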

All of the above speculation is simply that: I am no expert on the internals of any Z-Wave library code, nor of either the SmartThings or Z-Box hubs. It is simply my mind’s way of troubleshooting, based on a career in embedded system programming and familiarity with various other communications protocols. In my mind, the long-term evidence certainly points to something. But what? Who “owns” the problem? The common thread is the Z-Wave protocol or its implementation–not the device, not the device firmware, and not even the particular hub software.

So, am I nuts? Anyone out there with similar experience or better knowledge of the protocol and how it should be handled with regard to error conditions?


First of all - I love a good mystery. :slight_smile:

My first thought is - do we know if these devices actually reported the open / high temp conditions on their own, as opposed to an issue with the protocol?

For example - on the temp chart, the temp went high, and came down gradually over 5 additional reporting cycles. If it was a one-off error or ‘collision’, wouldn’t it have been one high reading and then the next would be the actual / low temp? If you only had the door open to zap it with a laser temp probe, would the temp really have been 28 degrees a moment later? (not sure how long the door was open…etc)

I’m thinking these devices may be ‘panicking’ (to use Linux verbiage) and rebooting in some way. During that panic, their values are reported in error or at some default. In the case of the temp sensor, perhaps it takes time to normalize after such a reboot. Just a guess. All of these are older devices, it seems (not sure if related), but maybe there is some long-standing firmware issue (memory leak, etc) of some sort?

Also Z-Wave is CSMA/CA iirc, so the devices could theoretically have a collision. They wait for silence in the air, then try to transmit. But as far as I recall, if they do have an actual collision they can’t detect it; they just fail to get an ACK (since the receiving node couldn’t understand the frame), and then each node waits a random amount of time and retries. The packet would not be understood or received by the recipient (hence no ACK) rather than be received garbled or with bad data. Since the random timer is different for the two nodes, in theory they’d retransmit at different times. This is based on my understanding; if I’m off base, someone please correct me. The best would be to watch this with a Zniffer, but it is ever so hard to capture in the wild, of course.
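
Here is a toy Python sketch of that retry behavior as I understand it. The slot model, the backoff window, and the retry limit are all invented for illustration and are not Z-Wave’s real timing parameters; the point is only that a collided frame gets no ACK and is retried after a random wait, so the receiver sees either an intact frame or nothing.

```python
import random


def simulate(num_nodes=2, max_retries=3, backoff_slots=8, seed=1):
    """Return {node: slot in which its frame was delivered} for nodes that all wake in slot 0."""
    rng = random.Random(seed)
    next_try = {n: 0 for n in range(num_nodes)}  # every node first transmits in slot 0
    attempts = {n: 0 for n in range(num_nodes)}
    delivered = {}
    slot = 0
    while next_try:
        senders = [n for n, t in next_try.items() if t == slot]
        if len(senders) == 1:
            # Only one transmitter: the frame arrives intact and is ACKed.
            delivered[senders[0]] = slot
            del next_try[senders[0]]
        elif len(senders) > 1:
            # Collision: nobody gets an ACK, so each sender backs off randomly.
            for n in senders:
                attempts[n] += 1
                if attempts[n] > max_retries:
                    del next_try[n]  # gives up: the event is lost, not garbled
                else:
                    next_try[n] = slot + 1 + rng.randrange(backoff_slots)
        slot += 1
    return delivered


print(simulate())
```

In this model a collision can delay an event or lose it entirely, but it never produces a wrong value at the hub, which is why I lean toward the devices themselves as the source.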

I’ve seen similar behavior in other places - for example I have some Shelly devices that monitor power (Wi-Fi, not Z-Wave) which also occasionally give me one wildly inaccurate measurement and skew charts in Home Assistant. Sometimes a firmware upgrade fixes it, but it sure messes up my trends when the heat pump pulled negative 5000 W during an overnight. :slight_smile:

Not sure if helpful, but I am curious.


No knowledge of the protocol, but I don’t think you’re nuts. I very much think collisions are happening or devices are sending out packets to other devices that are also asleep (unknowingly). There almost needs to be a handshake process going on here (an acknowledgement at the end that the packet was received).

For quite some time I was having a problem with door and motion sensors getting stuck in states, to the point that we actually added visual indicators for ourselves in a couple of different spots:


I then switched over to LR for all the open/close sensors and motion sensors, and (knock on wood) so far none of them have gotten stuck since, now that they communicate directly with the hub. Everything just works as expected.

The above was just my ZBox experience. As for SmartThings, which I ran for many years (although with far fewer devices), I honestly never had these issues. That being said, I think SmartThings was doing something different, as the devices consumed battery at a much higher rate. So perhaps it was keeping them awake, polling them more often, or just reporting the battery levels incorrectly so I was replacing them unnecessarily.


The freezer door was open while we checked several frozen items in the freezer and I hunted down the IR Temperature gadget. Minutes, not seconds. It’s a chest freezer with the temperature sensor mounted in the door lid, so it hung there in the breeze until we closed the freezer. Also note the typical temperature variance: if this had been an actual failure of the freezer, we would have seen a gradual approach to 32 degrees F, not a sudden jump as shown. As the freezer door cooled down once we gave up and went back to bed, the temperature dropped gradually…
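
A rough Python sketch of that “sudden jump versus gradual approach” reasoning. The readings are made-up numbers, not values read off my graph, and the 10-degree threshold is an arbitrary assumption.

```python
def flag_jumps(readings_f, max_step_f=10.0):
    """Return indices of readings that differ from the previous reading by more than max_step_f."""
    return [
        i
        for i in range(1, len(readings_f))
        if abs(readings_f[i] - readings_f[i - 1]) > max_step_f
    ]


# Hypothetical series: steady, one sudden jump to 32 F, then a slow recovery
suspicious = [-2.0, -1.5, -2.2, 32.0, 24.0, 15.0, 8.0, 2.0, -1.0]

# Hypothetical genuine failure: the freezer warms a few degrees per report
failing = [-2.0, 3.0, 8.0, 13.0, 18.0, 23.0, 28.0, 32.0]

print(flag_jumps(suspicious))  # [3]
print(flag_jumps(failing))     # []
```

A check like this in a scene or driver could hold off the siren until a second report confirms the reading, at the cost of a slower alarm for a real failure.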

Agreed: if I’d left the freezer door closed–or popped it open only long enough to take the temperature reading–we might have seen a sudden jump back into the “normal” range of the unit.

One would think so. The “random” wait before retransmit would most likely be different among devices and a subsequent retransmit would not result in a collision. I do not know (perhaps you do?) if a Z-Wave packet has any kind of checksum or check digit that would cause a packet to be discarded as corrupted. (Or if that logic were implemented & tested correctly on both ends!)
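
For what it’s worth, my understanding (unverified against the spec, so treat it as an assumption) is that the Z-Wave radio layer, ITU-T G.9959, puts an 8-bit XOR checksum on frames at the older 9.6/40 kbps rates and a 16-bit CRC at 100 kbps. Below is a minimal Python sketch of the XOR style of check; the 0xFF seed and the frame bytes are illustrative assumptions, not a real over-the-air frame layout.

```python
def xor_checksum(data: bytes, seed: int = 0xFF) -> int:
    """XOR every byte into a running 8-bit value, starting from the seed."""
    check = seed
    for byte in data:
        check ^= byte
    return check


def accept(frame: bytes) -> bool:
    """Treat the last byte as the checksum; reject the frame on a mismatch."""
    return xor_checksum(frame[:-1]) == frame[-1]


body = bytes([0x31, 0x05, 0x01, 0x22, 0xFF, 0x4C])  # hypothetical payload bytes
frame = body + bytes([xor_checksum(body)])

one_bit = bytearray(frame)
one_bit[4] ^= 0x01  # a single flipped bit

two_bits = bytearray(frame)
two_bits[4] ^= 0x01
two_bits[5] ^= 0x01  # the same bit flipped in two different bytes

print(accept(frame))            # True
print(accept(bytes(one_bit)))   # False: discarded as corrupted
print(accept(bytes(two_bits)))  # True: the two flips cancel out in the XOR
```

So a simple 8-bit check catches most corruption but not all of it, which leaves a small window for a garbled frame to be accepted, assuming the check is implemented correctly in the first place.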

The following clearly could affect things other than missed Async Events. I’ve been chasing this for days and finally found something this evening–but to stay on topic…

What about the case when the Z-Box CPU is simply “tapped out” and sits at 100% utilization? Could that cause the hub to miss one or more Z-Wave events during that interval? I’ve marked several instances of 100% utilization in the screenshot below.

Side note: these instances of CPU utilization occurred at sunset (5:35PM local time), when my “Good Evening” routines ran. Although the profile was changed from “Home” to “Evening” as it should have been, the hub failed to change the color of an LED on a ZEN32 scene controller (which shows no communications errors on the Z-Wave Diagnostic page). The device color is changed by a simple Lua Scene that sends a single message to the device. This is not the same issue as the hub missing an incoming event! (If there were other effects, I haven’t noticed yet.) :thinking: