RFM69 hanged network

[Figure: SendNode ACK, times to get ACK]

This week I spent a lot of my free time studying the core of the RFM69 library to track down the hung network problem. You can find the full story on Felix Rusu's LowPowerLab forums.

Here is how things went:
After examining the code and drawing all of the RFM69 logic on paper as a diagram, I found that receiveDone() does not work as expected.

The main cause of the network hang was that interrupts were enabled (for a very short time, of course) while waiting for a free network to send the ACK message.

While waiting for a free network, receiveDone() is called (it can enable interrupts), so a new message can arrive while you are still processing the current one.


As you can see, there is also no timeout in this while loop. If canSend() never returns true, it can run forever.
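To illustrate the missing timeout, here is a minimal sketch of a bounded wait. It is not the library code: FakeRadio, waitForFreeChannel() and the simulated clock are hypothetical stand-ins, so the idea can be run on a PC without a radio. Only the names canSend(), receiveDone() and millis() are borrowed from the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the radio; NOT part of the RFM69 library.
struct FakeRadio {
  uint32_t nowMs = 0;        // simulated millis()
  uint32_t channelFreeAtMs;  // simulated time at which the channel becomes free
  uint32_t polls = 0;        // how many times the wait loop polled

  explicit FakeRadio(uint32_t freeAt) : channelFreeAtMs(freeAt) {}

  uint32_t millis() const { return nowMs; }
  bool canSend() const { return nowMs >= channelFreeAtMs; }
  // Stand-in for receiveDone(): here it only advances the simulated clock by 1 ms.
  void receiveDone() { ++nowMs; ++polls; }
};

// Waits for a free channel but gives up after limitMs.
// Returns true if the channel is free, false if the wait timed out.
bool waitForFreeChannel(FakeRadio& radio, uint32_t limitMs) {
  const uint32_t start = radio.millis();
  // Unsigned subtraction keeps the comparison valid across millis() wrap-around.
  while (!radio.canSend() && radio.millis() - start < limitMs) {
    radio.receiveDone();
  }
  return radio.canSend();
}

// Usage: a channel that never frees up costs at most limitMs of waiting.
void exampleBoundedWait() {
  FakeRadio busy(UINT32_MAX);
  const bool ok = waitForFreeChannel(busy, 40);
  (void)ok;  // silence the unused warning when asserts are compiled out
  assert(!ok);
  assert(busy.millis() == 40);
}
```

Without the second condition in the while loop, the same call on a permanently busy channel would never return, which is the hang described above.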

There are at least two ways to solve this problem:
a) improve receiveDone() and rethink how it handles interrupts
b) change/improve sendACK().

I went for b), as I didn't like that sendACK() clears the incoming message and can destroy SENDERID.
With the improved code below, sendACK() can be executed immediately after receiveDone() without destroying the message. This saves a few extra ms, which matters for sleeping low-power nodes waiting for an ACK, because the data no longer has to be read out of the radio before sendACK() is executed, as it does in the current version.

RFM69.cpp

 

RFM69.h (add first line, change second)

I will explain what I did and why:

– in RFM69.h, added a new defined value, ACK_CSMA_LIMIT_MS, which is the time in ms to wait before retrying to send a message if the ACK wasn't received
– used this parameter instead of the hard-coded value in sendWithRetry(), because it is also used later in sendACK()
RFM69.cpp
– setting the mode back to RX (without this, sending does not work). I have not figured out why changing the mode from STANDBY to TX in sendFrame does not allow sending. It seems that it wants RX first, or there is some timing issue. The correct place for this is one line before sendFrame, but there it does not work (it seems some time is needed for the switch); I tried adding “Wait for ModeReady”, without success
– keeping RSSI: we take a new measurement of the current network state, so the RSSI of the received message has to be preserved
– in the while loop we check for a free network for ACK_CSMA_LIMIT_MS at most; there is no need to do it longer, because sendWithRetry only waits for the ACK for this amount of time. (In the current version this wait is also too long: there is no need to wait 1000 ms for a free network to send the ACK if the sender only waited 40 ms for the answer)
– sending the ACK
This version works very well for me: ACK times have improved, and tracing sendACK() showed that in most cases the while loop exits immediately (0 ms). Super!
You can get the full modified code and look at the changes on GitHub: https://github.com/openminihub/RFM69
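The steps listed above can be sketched roughly as follows. This is an illustration only, not the code from the repository: MockRadio, sendAckBounded() and tick() are hypothetical names, the value 40 for ACK_CSMA_LIMIT_MS is assumed from the 40 ms sender wait mentioned above, and the mock's canSend() overwriting the stored RSSI is a simplification. Skipping the ACK after a timeout is also a choice made for this sketch; the real code may send it anyway.

```cpp
#include <cassert>
#include <cstdint>

// Name taken from the post; the value 40 is an assumption for this sketch.
const uint32_t ACK_CSMA_LIMIT_MS = 40;

// Hypothetical mock; NOT the RFM69 class.
struct MockRadio {
  uint32_t nowMs = 0;            // simulated millis()
  uint32_t channelFreeAtMs = 0;  // simulated time at which the channel is free
  uint8_t senderId = 0;          // plays the role of SENDERID
  int16_t rssi = 0;              // plays the role of RSSI
  uint8_t lastAckTarget = 0;
  bool ackSent = false;

  uint32_t millis() const { return nowMs; }
  // Simulated channel check: the measurement overwrites the stored RSSI.
  bool canSend() {
    rssi = -95;  // made-up noise floor reading
    return nowMs >= channelFreeAtMs;
  }
  void tick() { ++nowMs; }  // advance the simulated clock by 1 ms
  void sendFrame(uint8_t to) { lastAckTarget = to; ackSent = true; }
};

// Returns true if the ACK was sent, false if the channel stayed busy too long.
bool sendAckBounded(MockRadio& radio) {
  const uint8_t sender = radio.senderId;  // remember whom to acknowledge
  const int16_t savedRssi = radio.rssi;   // keep the RSSI of the received message
  const uint32_t start = radio.millis();

  bool channelFree = radio.canSend();
  while (!channelFree && radio.millis() - start < ACK_CSMA_LIMIT_MS) {
    radio.tick();
    channelFree = radio.canSend();
  }
  // The sender has stopped listening after the limit, so a late ACK is skipped here.
  if (channelFree) radio.sendFrame(sender);
  radio.rssi = savedRssi;  // restore the value the application expects to read
  return channelFree;
}

// Usage: the ACK goes to the original sender and the message RSSI survives.
void exampleSendAck() {
  MockRadio radio;
  radio.senderId = 7;
  radio.rssi = -60;
  radio.channelFreeAtMs = 3;
  const bool sent = sendAckBounded(radio);
  (void)sent;  // silence the unused warning when asserts are compiled out
  assert(sent && radio.lastAckTarget == 7 && radio.rssi == -60);
}
```

With the limit tied to the sender's wait time, a busy channel costs the receiver at most 40 ms in this sketch instead of 1000 ms.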

 


2 thoughts on “RFM69 hanged network”

  1. Tom says:

    Hi, nice fix for the potential deadlock (hang) in sendACK.

    Does the send function need this too (see line 192 in the cpp file)?

    I compared the latest LowPowerLab RFM69 cpp and h files with yours and see that in your version of interruptHandler (lines 326 327) you removed a call to receiveBegin, which is in the LowPowerLab interruptHandler (just above the commented-out call to digitalWrite(4, 0)).

    Is this call to receiveBegin needed?

    Thanks

  2. Martins says:

    Hi, it took me some time to get back to your question…
    Answering your questions:
    – no, the send function does not need it, it already has a limit;
    – in my modified library receiveBegin is not needed.
