I’ve coded up a simple test of the peer-to-peer protocol I’ve been discussing recently. Here is a picture of the set-up:

[photo of the test set-up: three light posts on the RS485 bus]

For this test, I have 3 light posts transmitting short 10-byte packets as fast as they possibly can given the constraints of the protocol. These constraints basically reduce to:

  • You can’t send a packet when someone else is sending a packet
  • After someone else finishes, you have to wait at least 1ms to be sure they are done
  • After the 1ms wait, you must wait an additional random amount between 0 and 1ms to reduce the chance of starting at the same time as someone else
Other than these limitations, all 3 devices are competing to send as many packets as they can as quickly as they can.
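For concreteness, here is roughly what each device's send loop looks like under those rules. This is just an illustrative Python sketch, not the actual firmware; bus_is_idle() and send_packet() are hypothetical stand-ins for the real UART code:

    import random, time

    QUIET_TIME = 0.001   # mandatory 1ms wait after the bus goes quiet

    def send_when_clear(packet):
        while True:
            # rule 1: never start while someone else is transmitting
            while not bus_is_idle():              # hypothetical helper
                pass
            # rule 2: wait at least 1ms to be sure the last sender is done
            time.sleep(QUIET_TIME)
            # rule 3: random 0-1ms backoff so two waiters don't start together
            time.sleep(random.uniform(0, 0.001))
            # if nobody grabbed the bus during our backoff, transmit
            if bus_is_idle():
                send_packet(packet)               # hypothetical helper
                return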
The 3 light posts are connected via an RS485 bus, using a twisted pair pulled from CAT5e cable, with about 50cm of wire between them. All devices are connected to the bus at 38400 baud.
Since each packet sent contains the sender’s deviceId and a packet sequenceNumber, the computer I have listening to the bus can determine how many packets were lost from each device. The results of this test show more loss than I expected, but overall not too bad.
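The loss accounting boils down to tracking the last sequenceNumber seen from each deviceId and counting any gap as skipped packets. A minimal sketch of that logic (parse_packet() is a hypothetical helper, the port name is just an example, and I'm assuming an 8-bit sequence number; the real script also checks a checksum):

    import serial                                 # pyserial

    port = serial.Serial('/dev/ttyUSB0', 38400)
    last_seq = {}                                 # deviceId -> last sequenceNumber
    num_packets = num_skips = 0

    while True:
        src, seq = parse_packet(port)             # read and decode one packet
        num_packets += 1
        if src in last_seq:
            # a jump of more than 1 means packets from this device were lost
            num_skips += (seq - last_seq[src] - 1) % 256
        last_seq[src] = seq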
Here is a dump from my Python script listening to the bus:

packetTime = 13.957secs
numPackets = 3525, numSkips = 434, numBadChecksum = 118
skip% = 12.31%, badChksum% = 3.35%
  src=1, numSrcPackets=1072, 30.41% of all packets
  src=2, numSrcPackets=1054, 29.90% of all packets
  src=3, numSrcPackets=1281, 36.34% of all packets
total num device packets received = 3407
total num packets per second = 244.10
effective baudrate = 24410.32
baudrate efficiency vs 38400 line baudrate = 63.57%

The good news is that the three devices seem to be load balancing among themselves properly (due to the random wait time). Every time I run this script, I get almost the same percentages for each device. I suspect that the variance between the devices is due to the slight differences in their clocks. Probably device #3 has a clock that is just slightly faster, meaning that he waits just a little less time than the other two, so he goes first slightly more often. But this variance is well within my requirements, and no one is being shut out to a significant degree. This is good.
Also bear in mind that since each 10-byte packet takes about 2.6ms to transmit, and each device must wait at least 1ms after the last byte is received (but on average closer to 1.5ms due to the random wait) before trying to send, the minimum packet time comes out to around 3.6ms to 4.1ms. This means that the theoretical upper bound given the timeouts inherent in my protocol is about 244 to 278 packets per second. Our results are at the low end of this range, so the collisions don’t seem to be hurting us much at all. And this should be a worst-case scenario, given that all 3 devices are each trying to saturate the bus on their own.
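Running those numbers explicitly (assuming 8N1 framing, so each byte is 10 bits on the wire):

    BITS_PER_BYTE = 10                  # 8 data bits + 1 start + 1 stop bit (8N1)
    tx = 10 * BITS_PER_BYTE / 38400     # 10-byte packet: about 2.6ms on the wire
    worst = 1 / (tx + 0.0010 + 0.0005)  # 1ms wait + average 0.5ms random backoff
    best  = 1 / (tx + 0.0010)           # 1ms wait + zero random backoff
    print(f"{worst:.0f} to {best:.0f} packets/sec")   # roughly 244 to 278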
So how do we explain this, given that we are skipping (i.e. losing) about 12% of the packets sent? Since we only lose packets when two packets collide, we can estimate that we lose them two at a time, so about 6% of the time anyone sends a packet, it collides. If we take the 244.1 packets per second and multiply by 1.06 to account for the 6% packet loss, we get 259 packets per second being sent (but not necessarily received, due to collisions). Since packets always take 2.6ms to transmit, this means that the average delay between packets is 1.25ms, and then they collide 6% of the time. This makes perfect sense: although each device averages a 1.5ms wait, the minimum of the 3 waits wins, and the expected minimum of three uniform draws on [0, 1ms] is 1/(3+1) = 0.25ms, giving 1ms + 0.25ms = 1.25ms.
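A quick Monte Carlo check of that last claim, averaging the minimum of three uniform 0-1ms backoffs on top of the fixed 1ms wait:

    import random

    N = 100_000
    avg = sum(min(random.random(), random.random(), random.random())
              for _ in range(N)) / N
    print(f"average winning wait = {1.0 + avg:.3f}ms")   # converges on ~1.250ms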

For comparison, here is a test where I unplug two of the devices and only have one device on the bus:
packetTime = 13.413secs
numPackets = 3265, numSkips = 0, numBadChecksum = 0
skip% = 0.00%, badChksum% = 0.00%
  src=1, numSrcPackets=3265, 100.00% of all packets
total num device packets received = 3265
total num packets per second = 243.42
effective baudrate = 24342.47
baudrate efficiency vs 38400 line baudrate = 63.39%

You can see that despite not having any collisions, the average packet rate goes down (very slightly). This is because the average delay when one device is on the bus is 1.5ms, while the average delay when 3 devices are on the bus drops to 1.25ms (because the minimum of 3 random waits wins). As counter-intuitive as it may be, 3 devices sending packets as fast as they can and losing 12% of them actually get slightly more through than 1 device sending alone on a bus where none of its packets get lost.
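In general, with n contending devices, the expected winning backoff is 1/(n+1) ms, so the offered packet rate (sent, before any collision losses) creeps up as devices are added:

    TX = 100 / 38400                    # 2.6ms to transmit one 10-byte packet
    for n in (1, 3):
        gap = 0.001 + 0.001 / (n + 1)   # 1ms wait + expected minimum backoff
        print(f"n={n}: {1 / (TX + gap):.0f} packets/sec offered")

That gives about 244 packets/sec for one device and 259 for three, which matches the single-device measurement directly and, after collision losses, the ~244 received in the three-device case.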

Worth contemplating this a bit to get your mind around that one.

The other two devices, when they have the bus to themselves, show quite similar results: device #2 gets 241 packets per second and device #3 gets 248, which confirms the theory that device #3 is running a bit faster and device #2 a bit slower.

The 63% efficiency I’m seeing compares quite well against the 18.4% of Pure ALOHA and the 36.8% of Slotted ALOHA. Furthermore, tightening up the 1ms wait plus the up-to-1ms random wait could push the efficiency even higher. But my conclusion is that for my purposes, 243 packets per second is more than enough. (I don’t expect more than a few packets per second under normal circumstances.) And if I ever did need more bandwidth, upping the baudrate would seem to be the easiest way to do this.
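For reference, the efficiency figure is just received throughput divided by the line rate:

    received = 244.10                           # packets/sec with 3 devices
    bits_per_sec = received * 100               # 100 bits per 10-byte packet (8N1)
    print(f"efficiency = {bits_per_sec / 38400:.1%}")   # ~63.6%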

But for now, this test was very successful, and I believe I have a functioning peer-to-peer RS485 protocol to proceed with on the “smart lights” firmware.