I have recently been working on designing a peer-to-peer RS485 network protocol with collision avoidance (and maybe even collision detection) for the walkway lights on the land. (Link to last two posts: here and here.)

As mentioned in my last post, even though each device does its best to detect whether another device is transmitting before it starts sending data down the bus, there is always a chance of collision. This happens whenever two devices check whether the bus is busy at about the same time (when neither of them is transmitting yet, so the bus looks “free” to both), and then both start transmitting after they finish checking. In programming, this is referred to as a “race condition”.
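The race described above can be sketched as a tiny timing model. This is only an illustration, not the protocol code: the function name `both_transmit`, the parameter `tx_latency_us` (time between checking the bus and actually driving it) and `visibility_delay_us` (how long a transmission takes to become visible to the other device) are all made up for this example.

```python
def both_transmit(check_a_us, check_b_us, tx_latency_us, visibility_delay_us=0.0):
    """Return True if both devices see a free bus and go on to transmit.

    Hypothetical model: each device samples the bus once, and if it looks
    free, starts driving the bus tx_latency_us later. A transmission only
    becomes visible to the other device visibility_delay_us after it starts.
    """
    start_a = check_a_us + tx_latency_us
    start_b = check_b_us + tx_latency_us
    # A device sees the bus as free if it checks before the other
    # device's transmission has become visible to it.
    a_sees_free = check_a_us < start_b + visibility_delay_us
    b_sees_free = check_b_us < start_a + visibility_delay_us
    return a_sees_free and b_sees_free


# Device A checks at t=0, device B checks gap_us later.
for gap_us in (0.5, 1.0, 5.0):
    print(gap_us, both_transmit(0.0, gap_us, tx_latency_us=1.0))
```

In this model a collision only happens when the two checks fall closer together than the check-to-transmit latency (plus whatever visibility delay is assumed), which is the window the rest of this post tries to estimate.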

One unavoidable factor that increases the chance of this happening is that the electrical signal takes a certain amount of time to travel down the wire. (My rough estimate for our 100m CAT5 RS485 bus is about 250ns or 0.25us from end to end.)

There are also delays between when the program executes the “send data” instruction and when the start bit is actually transmitted (pulling the bus low), delays between when the line goes low and when the RS485 chip converts that into a low signal on the TTL-level RX pin of the microcontroller, and delays between the RX pin being pulled low and the microcontroller running the interrupt handler.

The longest delay (I believe) occurs after the microcontroller triggers an interrupt and before it begins to run the code I told it to run when that interrupt occurs. Interrupts can occur at any time, while the microcontroller is in the middle of running other instructions. Before my interrupt service routine (ISR) can do any work, the processor must finish the current instruction and save the return address, and the ISR's entry code must save the registers it is about to use so that the interrupted routine can resume afterwards. This takes time.

In order to measure this time, I came up with an experiment.

Here is the test case:

  • ATmega328p microcontroller running on internal 8MHz RC oscillator clock
  • RS485 bus connected between computer and RS485 driver chip on ATmega328p PCB
  • Computer sends data to microcontroller
  • Microcontroller enables pin-change interrupt on RX pin to trigger ISR
  • ISR changes the state of an output pin (PB9) when it is called
  • Oscilloscope connected to RX pin and PB9

In this test, I found that there was consistently a 6.4us delay between the RX pin going low and the PB9 pin changing state.

To see how long it takes to change the state of PB9, I tried a busy loop pulling it low and high, and found that it takes about 0.25us to change the state of the pin. (This results in a 2 MHz signal: 0.25us high and 0.25us low, for a total of 0.5us per period.)

So out of that 6.4us, about 0.25us comes from the output side. That leaves a delay of about 6.15us between when the RX pin went low and when the ISR began to run. This is much longer than the 0.25us electrical signal propagation delay I mentioned above.
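It can help to express that delay in CPU clock cycles rather than microseconds. The snippet below is just the arithmetic from the measurements above; the helper name `us_to_cycles` is made up for this example.

```python
F_CPU_HZ = 8_000_000  # internal 8 MHz RC oscillator from the test setup


def us_to_cycles(delay_us, f_cpu_hz=F_CPU_HZ):
    """Convert a delay in microseconds to CPU clock cycles."""
    return delay_us * f_cpu_hz / 1_000_000


measured_us = 6.4     # RX pin going low to output pin changing (oscilloscope)
pin_write_us = 0.25   # time to change the output pin (busy-loop test)
isr_entry_us = measured_us - pin_write_us

print(f"ISR entry latency: {isr_entry_us:.2f} us")
print(f"About {us_to_cycles(isr_entry_us):.0f} clock cycles at 8 MHz")
print(f"Pin write: {us_to_cycles(pin_write_us):.0f} clock cycles")
```

So the measured latency works out to roughly 49 clock cycles between the pin changing and the first useful instruction in the ISR, compared with 2 cycles for the pin write itself.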

This is useful to know if you are programming microcontrollers. For example, as a result of this experiment, I realized I need to turn off the pin-change interrupt on the RX pin right after the start bit is detected. In theory, the RX pin can change state on every bit on the bus, which is about every 26us at 38400 baud. Making the microcontroller do 6.2us of ISR setup every 26us comes out to about 24% of my CPU power! Or thought of a different way, this would be 100% of my CPU power at a baud rate of about 160,000.
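Here is that CPU-load estimate as a short calculation. It assumes the worst case described above, where the interrupt fires once per bit time; the function names are made up for this example.

```python
def isr_cpu_load(baud, isr_overhead_us):
    """Fraction of CPU time spent on ISR overhead.

    Assumes the worst case: the RX pin changes on every bit,
    so the interrupt fires once per bit time.
    """
    bit_time_us = 1_000_000 / baud
    return isr_overhead_us / bit_time_us


def saturation_baud(isr_overhead_us):
    """Baud rate at which ISR overhead alone would use 100% of the CPU."""
    return 1_000_000 / isr_overhead_us


load = isr_cpu_load(38400, 6.2)
print(f"Bit time at 38400 baud: {1_000_000 / 38400:.1f} us")
print(f"ISR overhead load: {load:.1%}")
print(f"Saturation baud rate: {saturation_baud(6.2):.0f}")
```

Real traffic will not toggle on every bit, so the actual load would be lower, but the worst case is what matters when deciding whether to leave the interrupt enabled.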

But back to the issue of propagation delays causing an increased chance of collisions. Despite this likely being the longest propagation delay, I believe that it doesn’t significantly increase the chance of a collision. The reason I think so is that from the moment the microcontroller triggers the interrupt and begins storing the current run state away so that it can run the ISR, the CPU is busy doing all of this preparation. Thus, it can’t possibly be preparing to send a transmission during this time. And that is what causes collisions.

Put a different way: even though it takes about 6.2us to detect that the bus has become busy, it can’t begin sending data during that time because the hardware is basically processing the fact that the line has become busy. The only chance for collision then comes if this trigger event occurs right after we tested if the line was busy but right before it runs the “transmit data” operation. Which is exactly the same chance for collision if the ISR took no time to process.

Well, that may not be entirely true. According to what I have read, it does take about 5 clock cycles for the interrupt to actually trigger. I don’t know if these 5 clock cycles are for hardware operations before it halts the main program and begins running stuff, or if it halts the main program in the first clock cycle and uses the CPU to do something for 5 clocks. In the first case, this would give the CPU 5 more clock cycles of operations where the RX pin had already gone low but the main program didn’t know it yet. But 5 clock cycles at 8MHz is only 0.625us, which is about one tenth of the delay we are seeing. And this is still less than one microsecond, which is part of the “one to two microseconds” time window that I mentioned before.

If two devices decide to start sending within one or two microseconds of each other, I don’t think there is much I can do to prevent them from both going ahead and causing a collision.
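To get a feel for how likely that is, here is a rough estimate under a strong simplifying assumption: two devices both want to send, and each independently picks its start time uniformly at random within a 1 ms window. Both that model and the function names below are assumptions for illustration, not measurements.

```python
import random


def p_within_window(window_us, spread_us):
    """P(|X - Y| < window) for X, Y independent and uniform on [0, spread]."""
    r = min(window_us / spread_us, 1.0)
    return 1.0 - (1.0 - r) ** 2


def simulate(window_us, spread_us, trials=200_000, seed=1):
    """Monte Carlo check of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.uniform(0, spread_us)
        b = rng.uniform(0, spread_us)
        if abs(a - b) < window_us:
            hits += 1
    return hits / trials


for window_us in (1.0, 2.0):
    exact = p_within_window(window_us, 1000.0)
    approx = simulate(window_us, 1000.0)
    print(f"window {window_us} us: exact {exact:.4%}, simulated {approx:.4%}")
```

Under these assumptions, a 2us window inside a 1 ms spread gives a collision chance of about 0.4% each time two devices contend, which is the kind of rate that a simple resend can absorb.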

But just remember… we don’t need to prevent collisions entirely. We just need to reduce them to a minimum likelihood. On the rare occasions that collisions do occur, the receiver won’t receive the message, and the sender will simply have to resend it. This isn’t a big deal (as long as it doesn’t happen too often).

My next test will be to code this whole algorithm up and see how often it actually does happen.

Question to answer: according to Wikipedia, Pure ALOHA can only use 18.4% of the available bandwidth. The improved Slotted ALOHA doubles this to get 36.8%. Neither one uses “listen before send”, so how much can I get with this approach?

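For reference, those two ALOHA figures come from the standard throughput formulas, where G is the offered load in frames per frame time: S = G·e^(−2G) for Pure ALOHA and S = G·e^(−G) for Slotted ALOHA. A quick check of the maxima (the function names are just for this example):

```python
import math


def pure_aloha_throughput(G):
    """Pure ALOHA: S = G * exp(-2G), G = offered load in frames per frame time."""
    return G * math.exp(-2 * G)


def slotted_aloha_throughput(G):
    """Slotted ALOHA: S = G * exp(-G)."""
    return G * math.exp(-G)


# The maxima occur at G = 0.5 (pure) and G = 1.0 (slotted).
print(f"Pure ALOHA max:    {pure_aloha_throughput(0.5):.1%}")
print(f"Slotted ALOHA max: {slotted_aloha_throughput(1.0):.1%}")
```

These are the baselines that a "listen before send" scheme should be able to beat.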
Inevitable losses in efficiency result from the 1ms timeout after each send plus up to another 1ms random wait to avoid collisions. This is on top of losses due to whatever collisions still occur. So I definitely will not be getting 100%. How much bandwidth will I actually get?
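As a first guess at the ceiling, here is a back-of-the-envelope calculation of the efficiency lost to the timeout and random wait alone, ignoring collisions entirely. The assumptions are mine and not measured: 38400 baud, 10 bits on the wire per byte (8N1 framing), a random wait that is uniform between 0 and 1ms (so 0.5ms on average), and a few arbitrary message lengths.

```python
def max_efficiency(payload_bytes, baud=38400, bits_per_byte=10,
                   timeout_us=1000.0, mean_random_wait_us=500.0):
    """Upper bound on bus efficiency with no collisions at all.

    Assumptions (not from measurements): 8N1 framing (10 bits per byte)
    and a uniform 0..1 ms random wait, i.e. 0.5 ms on average.
    """
    frame_us = payload_bytes * bits_per_byte * 1_000_000 / baud
    return frame_us / (frame_us + timeout_us + mean_random_wait_us)


for n in (4, 16, 64):
    print(f"{n:3d}-byte message: at most {max_efficiency(n):.1%} efficiency")
```

The shorter the messages, the more the fixed 1ms timeout and random wait dominate, so the answer to the bandwidth question will depend heavily on typical message length.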