Comm link too unstable for RTK use. #587

Open
tonycanike opened this issue Jan 31, 2024 · 45 comments

@tonycanike

tonycanike commented Jan 31, 2024

My LoRaSerials are unusable for RTK work. The latency (age) of my RTK solution, as reported by the Facets, is very unstable when using the LoRaSerial radios. It will work for a few minutes, then the comm link appears to go down, the latency climbs, and I lose RTK fix. Then the comm link returns, I get an RTK fixed solution, things are good for a while, and then it happens all over again.

I do not have this problem with my Holybros or my RFD900x radios. They work perfectly with my two Facets in base rover RTK mode.

Here is my full problem description:
https://forum.sparkfun.com/viewtopic.php?f=116&t=60898

Multiple other users seem to be reporting this issue also.

viewtopic.php?f=116&t=60609
viewtopic.php?f=117&t=60671
viewtopic.php?f=117&t=60673
viewtopic.php?f=117&t=60872

@nseidle
Member

nseidle commented Jan 31, 2024

Tony's configuration:

Base/Transmitter/Server
ATR
AT-AirSpeed=0
AT-AutoTune=0
AT-Bandwidth=500.00
AT-ClientFindPartnerRetryInterval=3
AT-CodingRate=7
AT-DataScrambling=0
AT-EnableCRC16=1
AT-EncryptData=1
AT-EncryptionKey=xxx
AT-FramesToYield=3
AT-FrequencyHop=1
AT-FrequencyMax=928.000
AT-FrequencyMin=902.000
AT-HeartBeatTimeout=5000
AT-MaxDwellTime=400
AT-MaxResends=0
AT-NetID=192
AT-NumberOfChannels=50
AT-OperatingMode=0
AT-OverHeadtime=10
AT-PreambleLength=8
AT-SelectLedUse=4
AT-Server=1
AT-SpreadFactor=8
AT-SyncWord=18
AT-TrainingKey=xxx
AT-TrainingTimeout=1
AT-TxPower=30
AT-TxToRxUsec=280
AT-VerifyRxNetID=1

ATS
AT-CopySerial=0
AT-Echo=0
AT-FlowControl=0
AT-InvertCts=0
AT-InvertRts=0
AT-RTSOffBytes=32
AT-RTSOnBytes=256
AT-SerialDelay=50
AT-SerialSpeed=57600
AT-UsbSerialWait=0
OK

Receiver/Rover
ATR
AT-AirSpeed=0
AT-AutoTune=0
AT-Bandwidth=500.00
AT-ClientFindPartnerRetryInterval=3
AT-CodingRate=7
AT-DataScrambling=0
AT-EnableCRC16=1
AT-EncryptData=1
AT-EncryptionKey=xxxx
AT-FramesToYield=3
AT-FrequencyHop=1
AT-FrequencyMax=928.000
AT-FrequencyMin=902.000
AT-HeartBeatTimeout=5000
AT-MaxDwellTime=400
AT-MaxResends=0
AT-NetID=192
AT-NumberOfChannels=50
AT-OperatingMode=0
AT-OverHeadtime=10
AT-PreambleLength=8
AT-SelectLedUse=4
AT-Server=0
AT-SpreadFactor=8
AT-SyncWord=18
AT-TrainingKey=xxxx
AT-TrainingTimeout=1
AT-TxPower=30
AT-TxToRxUsec=280
AT-VerifyRxNetID=1
OK

ats
ATS
AT-CopySerial=0
AT-Echo=0
AT-FlowControl=0
AT-InvertCts=0
AT-InvertRts=0
AT-RTSOffBytes=32
AT-RTSOnBytes=256
AT-SerialDelay=50
AT-SerialSpeed=57600
AT-UsbSerialWait=0

My radios were in multipoint mode (this makes the most sense for the continuous stream of updated RTK data).

@nseidle
Member

nseidle commented Jan 31, 2024

I had SerialTest send a 100-character string every 500ms to my "server" LoRaSerial radio, and I monitored the output of the other radio with either TeraTerm or another instance of SerialTest. The 4 green signal strength LEDs were illuminated on the receiving radio, and I think 3 or 4 on the server/transmitting radio.

The yellow LED on the server radio would flash with every string transmitted (every 500ms), and the blue LED on the other radios would flash with every string received (every 500ms). Sometimes, often after 4 or so minutes, the reception would stop, the data wasn't displayed in TeraTerm, the blue LED stopped flashing. Nothing would happen for about 30 seconds. Then the green LEDs would blink on the receiving radio, and everything would start working fine again. Sometimes this would happen again every 3-4 minutes, and sometimes it would not.

It's like there's some bug and the receiving radio loses sync with the transmitting radio, and it has to restart/resync. The transmitting radio shows no apparent anomalous behavior - that yellow LED just keeps blinking every 500ms.

@nseidle
Member

nseidle commented Jan 31, 2024

Thanks for reporting!

> after 4 or so minutes, the reception would stop,

It sounds as if the units are getting out of sync. In MP mode, the server is transmitting a clock sync but if the client misses the clock sync multiple times, it will eventually get off frequency and the link will go down. We have the time delay of the client realizing it's truly out of sync, and the time delay where the server has to come back around through the hop table. If the client misses the clock sync, it has another wait for the server to come around again. We have a few mitigations in place to reduce this time: the client will enter a discover_scan mode where it actively pings the hop table but this can come back negative if the server is actively transmitting when the client is pinging.

P2P doesn't have this time delay. Because they are expecting to regularly hear from each other, the desync and sync times are much shorter.

Point-To-Point and Multipoint are very different beasts. Are you seeing similar issues with Point-To-Point?

@cturvey

cturvey commented Feb 1, 2024

> Point-To-Point and Multipoint are very different beasts.

Indeed. Skimming through the source, there doesn't appear to be a method where one unit establishes itself as a master station, so they all potentially throw DATAGRAM_SYNC_CLOCKS packets at each other, or back and forth.

@cturvey

cturvey commented Feb 1, 2024

In the One-to-Many situation, we really need to establish one unit as the primary station, driving the hop time and pattern. The primary should be the only one broadcasting DATAGRAM_SYNC_CLOCKS, and ideally this should communicate the time to the next hop and where it's going to go. The rest of the stations need to synchronize to this and not send their own DATAGRAM_SYNC_CLOCKS, as this will just result in chaos: there's no indication of who's in charge, some of the stations will not be in range of each other, and they are apt to be synchronized by the periodicity of the messaging from the GPS receivers.

Perhaps there should be some way to identify who is sending sync messages, and some level of precedence. Data from GPS/GNSS should allow for time-domain synchronization across multiple units.

@tonycanike
Author

@nseidle Nathan, thanks for jumping on this.

I don't have a solid answer on your P2P vs. MP question. I'm geographically distant from my equipment right now and hope to work on this more mid-February.

I tried Point-to-Point once when I first got the LoRaSerial radios, but I've been focusing on MP as I believe it makes better sense for the RTK use case. If data is lost, forget about it, as updated data will be sent in the next second.

I experienced the MP issues with the two radios within 5 feet of each other on my workbench.

@cturvey I do configure the one radio at my RTK base to be the "server", and the docs say only the server is transmitting the sync heartbeats. I haven't looked at the code though.
https://docs.sparkfun.com/SparkFun_LoRaSerial/operating_modes/

Tony.

@cturvey

cturvey commented Feb 1, 2024

I mostly just skimmed the source, doing a quick static analysis of what looks to be going on, and where the sync packets are transmitted and received. I unpacked the hopping somewhat, but I suppose if it misses a hop it's going to have to wait until it cycles around. I'd need to get some units to do dynamic analysis and review debug output side by side, or back-port onto the DISCO / Murata platform.

@nseidle
Member

nseidle commented Feb 1, 2024

Hi @cturvey - I welcome the analysis and help. I can send you hardware if desired, just say the word and I'll PM you for an address.

@cturvey

cturvey commented Feb 1, 2024

TBH the logic as I'm unpacking it suggests that both ends schedule transmission, the server doesn't act on reception

getTxTime(xmitDatagramP2PSyncClocks, &txSyncClocksUsec, "SYNC_CLOCKS");

@j-w-bullfrog

Yes, Point-to-Point has the same issues. I haven't seen any performance difference between the two modes. I also tested with the radios 1000 ft, 10 ft, and 3 ft apart so I could watch the LEDs.

@cturvey

cturvey commented Feb 2, 2024

@nseidle Nathan, thanks for the units, which arrived today. I battled the IDE and have it building; I needed to regress from RadioLib 6.4.2 back to 5.1.2, but do have closure now. Need to find the LoRaSerial driver now.
Will dig in.

Made a .INF to pull in USBSER.SYS
https://github.com/cturvey/RandomNinjaChef/blob/main/sparkfun_loraserial.inf

@cturvey

cturvey commented Feb 3, 2024

Not sure how you'd like to do support on this.
I've got the build process working on two boxes, one with IDE 1.8.19 and the other with IDE 2.2.1
Everything builds and downloads using the GitHub code (LoRaSerial v2.0 ?)
When I go into the AT-Server=1, AT-OperatingMode=0
It seems to trap out very quickly, either a power-on-reset or perhaps a watchdog.
Do I need to power these externally? Currently just running off USB on an older laptop / powered hub
The initial firmware they shipped with the unit didn't reset (USB ding-dong) like this.

@cturvey

cturvey commented Feb 3, 2024

Reverting to the original image shared in the repo. So something in the build/library. Least confident in WDTZero
Noting method to push in original using Arduino 2.2.1 IDE / Arduino SAMD Board Package

"C:\Users\xx\AppData\Local\Arduino15\packages\arduino\tools\bossac\1.7.0-arduino3/bossac.exe" -i -d --port=COM8 -U true -i -e -w -v "C:\SparkFun\LoRaSerial\SparkFun_LoRaSerial_v2_0.bin" -R

@cturvey

cturvey commented Feb 3, 2024

Turning off watchdog, last message, then it hangs.
"State: MP: Waiting for TX done"
Using RadioLib 5.1.2
Using SAMD_TimerInterrupt 1.10.1, will try 1.9.0
...
Ok 1.9.0 is happier, so making some progress

@cturvey

cturvey commented Feb 4, 2024

@nseidle Likely not the source of the issue, but the logic here is broken allowing corrupt packets to be processed down-stream. Should be || (OR) not && (AND)

https://github.com/sparkfun/SparkFun_LoRaSerial/blob/main/Firmware/LoRaSerial/Radio.ino#L1902

    if ((incomingBuffer[rxDataBytes - 2] != (crc >> 8))
        && (incomingBuffer[rxDataBytes - 1] != (crc & 0xff)))
    {
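
To see why `&&` lets corrupt packets through, consider a frame where only one of the two trailing CRC bytes is wrong. The following is a minimal sketch, assuming the CRC-16 sits in the last two bytes, high byte first, as in the snippet above. The helper names `crcMismatchBuggy` and `crcMismatch` are hypothetical and are not functions from the firmware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Form quoted above: reports a mismatch only when BOTH CRC bytes differ.
bool crcMismatchBuggy(const uint8_t *buf, size_t len, uint16_t crc)
{
    assert(len >= 2);  // need at least the two CRC bytes
    return (buf[len - 2] != (crc >> 8)) && (buf[len - 1] != (crc & 0xff));
}

// Suggested form: reports a mismatch when EITHER CRC byte differs.
bool crcMismatch(const uint8_t *buf, size_t len, uint16_t crc)
{
    assert(len >= 2);  // need at least the two CRC bytes
    return (buf[len - 2] != (crc >> 8)) || (buf[len - 1] != (crc & 0xff));
}
```

With a frame ending in `0xCA 0x04` and a computed CRC of `0xCABE`, the `&&` version reports no mismatch because the high byte happens to match, while the `||` version rejects the frame.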

I can compile and run code, so walking, adding/enabling instrumentation, and doing some dynamic analysis.
Right now not seeing the link dying, functional 10 Hours to this point. Occasional loss/recovery, with the 10 seconds of Blue LED fast flashing.

HEARTBEAT sent on a consistent/repetitive basis (with server ms time-stamping), SYNC_CLOCKS (with channel, in multi-point) on demand, reporting ACK-1 in response to FIND_PARTNER.

Unpacking the to-and-fro of the protocol in my head.

@tonycanike
Author

> Right now not seeing the link dying, functional 10 Hours to this point. Occasional loss/recovery, with the 10 seconds of Blue LED fast flashing

I wonder why that occasional loss/recovery is happening. And if that's the behavior that's causing me problems.

I'll assume a scenario where we're using the well-chosen defaults of the base transmitting data every second and the F9P dropping its RTK solution when it ages out at 60 seconds.

My experience is that in challenging situations it can take 2-4 minutes to reestablish an RTK solution once lost.

And if the radio link is down for 20-30ish seconds, the F9P often doesn't calculate a new RTK solution before the 60 second age out.

If there's a loss of data to the rover for 20-30ish seconds every 4-5 minutes, the system usually doesn't have an RTK fixed solution and it is not usable.

To get back to my question up top, if "occasional loss/recovery" is every 5-10 minutes and "10 seconds" is closer to 20 seconds, the Facets generally won't have a stable RTK solution and they will not be usable.

It's also been my experience that I can not trust RTK Fixed solutions that are not stable. Bad fixes do happen.

This is all based on my memory of experiences back in November and December, so take it with a grain of salt. I won't be geographically able to retest and confirm this for a month or so.

When I am able to test, I'll be very happy to help and test new LoRaSerial firmware. Testing the whole system (Facets and LoRaSerial radios) end-to-end inside on a workbench is not practical for me, so I need to go out in the field.

@cturvey

cturvey commented Feb 4, 2024

Most of my observations are anecdotal at the moment, but the dropping looks to be precipitated by a packet drop-out (CRC) and HEARTBEAT Timeout. The rapid FIND_PARTNER strobing across channels seems ineffective, and it then recovers via the natural cycling of the channel hopping, and a success for a TX: FIND_PARTNER / RX: ACK-1

Somehow I think it should be possible for the Stations to be more predictive of the Server's Hop Channel in the time-domain. In the one-to-many sense I don't think we want to be strafing the band with FIND_PARTNER requests, and instead "Scanning for servers" in a more passive sense, either listening for data packets or HEARTBEATs which are going to be occurring on a somewhat continuous basis.

The most practical way to do RTK is for the Server to just keep broadcasting, perhaps having two LoRaSerial 1W's with perhaps different bands, channels or spreading strategies. The stations could then be less powerful devices.

Or broadcasting a GPS-only RTCM3 subset once every 30 seconds on a prescribed channel. Knowing GPS ToW, one could perhaps align at both ends, and modulo into the channel hop table if paranoid.
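
The time-of-week idea could look roughly like the following. This is only a sketch of the arithmetic, not anything in the firmware: `hopSlotFromTow`, `msUntilNextHop`, and their parameters are hypothetical, and it assumes both ends see the same GPS time of week in milliseconds and share the same dwell time and hop table length.

```cpp
#include <cassert>
#include <cstdint>

// Index into a shared hop table, derived purely from GPS time of week.
// towMs:       GPS time of week in milliseconds (assumed available at both ends)
// dwellMs:     time spent on each channel, e.g. 400 (AT-MaxDwellTime)
// numChannels: hop table length, e.g. 50 (AT-NumberOfChannels)
uint32_t hopSlotFromTow(uint32_t towMs, uint32_t dwellMs, uint32_t numChannels)
{
    assert(dwellMs > 0 && numChannels > 0);
    return (towMs / dwellMs) % numChannels;
}

// Milliseconds remaining in the current dwell period.
uint32_t msUntilNextHop(uint32_t towMs, uint32_t dwellMs)
{
    assert(dwellMs > 0);
    return dwellMs - (towMs % dwellMs);
}
```

With a 400 ms dwell and 50 channels, one pass through the table takes 20 s, which divides the 604,800 s GPS week evenly, so under these assumptions the slot sequence would not jump at the week rollover.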

@tonycanike in your MULTIPOINT use case are you looking to back-haul position information back to the Server unit?

You can also push the RTK time-out on the ZEDs out. As long as you're not losing carrier lock, some maintained RTK FIXED/FLOAT solution is going to be significantly better than dumping to GNSS/DGNSS.

@cturvey

cturvey commented Feb 4, 2024

Misses HEARTBEAT, recovery via DISCOVERY fails, and waits for HEARTBEAT to cycle

RX: HEARTBEAT
Case #3, 0 Hops, 186 Nxt Hop - 44 (TX + RX) = 142 mSec
State: MP: Wait for TX or RX LinkUptime:     0:02:25
RX: HEARTBEAT
Case #3, 0 Hops, 51 Nxt Hop - 46 (TX + RX) = 5 mSec
State: MP: Wait for TX or RX LinkUptime:     0:02:30
HEARTBEAT Timeout
Lcl: 398, Rmt: 54 - 44 = 10 + 0 = 10 msToNextHop
Lcl: 238, Rmt: 212 - 44 = 168 + 0 = 168 msToNextHop
Lcl: 68, Rmt: 382 - 44 = 338 + 0 = 338 msToNextHop
Lcl: 192, Rmt: 261 - 44 = 217 + 0 = 217 msToNextHop
Lcl: 208, Rmt: 247 - 44 = 203 + 0 = 203 msToNextHop
Lcl: 216, Rmt: 237 - 44 = 193 + 0 = 193 msToNextHop
Lcl: 346, Rmt: 105 - 44 = 61 + 0 = 61 msToNextHop
Lcl: 70, Rmt: 382 - 44 = 338 + 0 = 338 msToNextHop
Lcl: 259, Rmt: 191 - 44 = 147 + 0 = 147 msToNextHop
Lcl: 290, Rmt: 163 - 44 = 119 + 0 = 119 msToNextHop
Lcl: 99, Rmt: 353 - 44 = 309 + 0 = 309 msToNextHop
Lcl: 96, Rmt: 358 - 44 = 314 + 0 = 314 msToNextHop
Lcl: 286, Rmt: 165 - 44 = 121 + 0 = 121 msToNextHop
Lcl: 68, Rmt: 382 - 44 = 338 + 0 = 338 msToNextHop
Lcl: 269, Rmt: 186 - 44 = 142 + 0 = 142 msToNextHop
Lcl: 5, Rmt: 51 - 46 = 5 + 0 = 5 msToNextHop
State: Disc: Setup for scanning
Start scanning
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
...
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
State: Disc: Wait for Server HB
RX: HEARTBEAT
Case #3, 0 Hops, 382 Nxt Hop - 54 (TX + RX) = 328 mSec
    Channel Number: 0
Received HB, leaving DISCOVER standby
State: MP: Wait for TX or RX LinkUptime:     0:00:14
RX: HEARTBEAT
Case #3, 0 Hops, 285 Nxt Hop - 44 (TX + RX) = 241 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:18
RX: HEARTBEAT
Case #3, 0 Hops, 203 Nxt Hop - 44 (TX + RX) = 159 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:21
RX: HEARTBEAT
Case #3, 0 Hops, 131 Nxt Hop - 44 (TX + RX) = 87 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:26
RX: HEARTBEAT

Data CRC, HEARTBEAT timeout, successful quick recovery

Case #3, 0 Hops, 195 Nxt Hop - 44 (TX + RX) = 151 mSec
State: MP: Wait for TX or RX LinkUptime:     0:02:05
RX: HEARTBEAT
Case #2, 1 Hops, 1 Nxt Hop - 44 (TX + RX) + 400 Adj = 357 mSec
State: MP: Wait for TX or RX LinkUptime:     0:02:10
RX: Bad CRC-16, received 0x5204 expected 0xCABE
HEARTBEAT Timeout
Lcl: 338, Rmt: 110 - 44 = 66 + 0 = 66 msToNextHop
Lcl: 373, Rmt: 79 - 44 = 35 + 0 = 35 msToNextHop
Lcl: 350, Rmt: 104 - 44 = 60 + 0 = 60 msToNextHop
Lcl: 114, Rmt: 338 - 44 = 294 + 0 = 294 msToNextHop
Lcl: 95, Rmt: 355 - 44 = 311 + 0 = 311 msToNextHop
Lcl: 66, Rmt: 382 - 44 = 338 + 0 = 338 msToNextHop
Lcl: 276, Rmt: 174 - 44 = 130 + 0 = 130 msToNextHop
Lcl: 321, Rmt: 130 - 44 = 86 + 0 = 86 msToNextHop
Lcl: 55, Rmt: 395 - 44 = 351 + 0 = 351 msToNextHop
Lcl: 276, Rmt: 173 - 44 = 129 + 0 = 129 msToNextHop
Lcl: 295, Rmt: 159 - 44 = 115 + 0 = 115 msToNextHop
Lcl: 70, Rmt: 382 - 44 = 338 + 0 = 338 msToNextHop
Lcl: 189, Rmt: 260 - 44 = 216 + 0 = 216 msToNextHop
Lcl: 18, Rmt: 35 - 44 = -9 + 400 = 391 msToNextHop
Lcl: 258, Rmt: 195 - 44 = 151 + 0 = 151 msToNextHop
Lcl: 54, Rmt: 1 - 44 = -43 + 400 = 357 msToNextHop, timeToHop: 0, Hops: 1
State: Disc: Setup for scanning
Start scanning
State: Disc: Scanning for servers
MP: SYNC_CLOCKS Timeout
TX: FIND_PARTNER
...
State: Disc: Wait for FIND_PARTNER to xmit
State: Disc: Scanning for servers
RX: ACK-1
Case #3, 0 Hops, 290 Nxt Hop - 53 (TX + RX) = 237 mSec
    Channel Number: 26
State: MP: Wait for TX or RX LinkUptime:     0:00:00
RX: HEARTBEAT
Case #3, 0 Hops, 277 Nxt Hop - 44 (TX + RX) = 233 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:01
RX: HEARTBEAT
Case #3, 0 Hops, 207 Nxt Hop - 44 (TX + RX) = 163 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:04
RX: HEARTBEAT
Case #3, 0 Hops, 89 Nxt Hop - 44 (TX + RX) = 45 mSec
State: MP: Wait for TX or RX LinkUptime:     0:00:07
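
The `msToNextHop` lines in the two logs above follow a consistent pattern: the remote's reported time to its next hop, minus the measured TX + RX delay, plus one 400 ms dwell period if the result went negative. Here is a sketch that reproduces those numbers. The function name and signature are hypothetical and inferred only from the log output, not copied from the firmware.

```cpp
#include <cassert>
#include <cstdint>

// Estimate the local time remaining until the next hop.
// remoteMs: time to next hop as reported by the remote ("Rmt" in the logs)
// txRxMs:   transmit + receive delay (44 ms for most log entries)
// dwellMs:  channel dwell time (400 ms here, AT-MaxDwellTime)
int32_t msToNextHop(int32_t remoteMs, int32_t txRxMs, int32_t dwellMs)
{
    assert(dwellMs > 0);
    int32_t ms = remoteMs - txRxMs;
    if (ms < 0)
        ms += dwellMs;  // hop boundary already passed; count into the next dwell
    return ms;
}
```

For example, `msToNextHop(35, 44, 400)` gives 391 and `msToNextHop(1, 44, 400)` gives 357, matching the `-9 + 400 = 391` and `-43 + 400 = 357` entries in the second log.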

@j-w-bullfrog

I can bench test, i.e., see both radios' LEDs with them sitting next to each other, since my base RTK Express unit sits in the basement attached by a 10 m signal line to a fixed antenna outside. Actually, I also have a similar line for the radio antenna at the same point. About 20 ft away I have a daylight window that I can put the RTK Express rover GPS antenna in, and enough cable to make the radios sit side by side so that I can watch the LEDs on both. I also have a sufficient number of radio antennas so that I don't have to use the one mounted 15 ft in the air.

@tonycanike
Author

@cturvey I am not looking to backhaul the rover position. As far as I know, no data is flowing from the rover to the base.

@j-w-bullfrog That bench testing would be awesome. Much harder to do with my Facets and the integrated antenna.

@tonycanike
Author

tonycanike commented Feb 4, 2024

@cturvey Your thought about two radios at the base reminded me of a desire for a repeater mode. I opened a new issue, number #588.

@j-w-bullfrog

If you get me test firmware I can flash, I can do functional testing. If not, I'll probably need some pointers to get it to compile. Both RTK Express units of course have SD cards, and I do have u-center & m-center.

@cturvey

cturvey commented Feb 5, 2024

@j-w-bullfrog I'm currently looking at the dynamic interactions of the stand-alone LoRaSerial units, with a focus on MP (Multi-Point, One-to-Many). I'm bench testing here and getting the debug / telemetry to where I need it to be before connecting it to my existing base infrastructure.

The build process is described here.
https://docs.sparkfun.com/SparkFun_LoRaSerial/firmware_build/
I've had it successfully build on IDE 1.8.x and IDE 2.x.x platforms
Building and Running are two different hurdles, but the Library Versions are important. Some have leeway, others don't.

  • Crypto, v0.4.0
  • FlashStorage_SAMD, v1.3.2
  • JC_Button, v2.1.2
  • RadioLib, v5.1.2 (v6 has api / compile differences)
  • SAMD_TimerInterrupt, v1.9.0 (v1.10.1 watchdogs)
  • WDTZero (cost of kicking is still high)

The timer issue is particularly difficult: if you issue ATO or ATW, the USB will disconnect, or repeatedly connect and disconnect, and I had to open the unit to recover.

Probably going to drop the spreading factor and coding rate to reduce the latency and air-time, and to more closely match my current deployment strategy.

@cturvey

cturvey commented Feb 5, 2024

I don't like how the watchdog is kicked at all.

I really want to make one subroutine that kicks it and checks the channel hop. At the moment it does this same thing in hundreds of places. The hop should be around 400 ms, and the watchdog around 2 seconds. And there are places where it is done fractions of a millisecond apart. We could check if millis() has even advanced. There are several places where the kick is implied to take several milliseconds, which seems longer than the 32 kHz clock should take to sync, but perhaps prescalers also expand that time.

The WDT also has to sync with the 32.768 kHz clock, as the WDT is on a much slower clock domain, or the MCU will introduce a bus stall (basically stuffing wait states, and out to lunch).
https://github.com/javos65/WDTZero/blob/master/src/WDTZero.cpp#L89C20-L89C39

You can pretest the SYNCBUSY so the write will fall straight through

https://hackaday.io/project/20647-mightywatt-r3-70w-electronic-load-for-arduino/log/56143-found-and-fixed-a-bug-in-sketch-for-arduino-zero

@cturvey

cturvey commented Feb 5, 2024

Nevermind, there is some mitigation of this
https://github.com/sparkfun/SparkFun_LoRaSerial/blob/main/Firmware/LoRaSerial/Begin.ino#L74

Could still do something a bit different so it doesn't stall at all. Disappearing for 4-5 ms in this context is far from ideal, it'll break millis() I would think.

Still I think I'm going to make a CheckHopKickWatchdog() function
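
One possible shape for such a function, as a hedged sketch rather than the change actually made in any fork: a single entry point that is cheap to call from many places because it returns immediately unless the millisecond count has advanced, and only hops or kicks the watchdog when the respective interval has elapsed. Everything here is hypothetical, including the `HopWatchdogState` struct, the 500 ms kick interval, and the counters that stand in for the real hop and watchdog-reset calls. The time is passed in as a parameter so the logic can be exercised off-target.

```cpp
#include <cassert>
#include <cstdint>

struct HopWatchdogState
{
    uint32_t hopIntervalMs = 400;   // dwell time discussed in this thread
    uint32_t kickIntervalMs = 500;  // assumed value, well inside a ~2 s watchdog period
    uint32_t lastCheckMs = 0;
    uint32_t lastHopMs = 0;
    uint32_t lastKickMs = 0;
    uint32_t hops = 0;   // stands in for the real channel hop
    uint32_t kicks = 0;  // stands in for the real watchdog reset
};

void checkHopKickWatchdog(HopWatchdogState &s, uint32_t nowMs)
{
    assert(s.hopIntervalMs > 0 && s.kickIntervalMs > 0);
    if (nowMs == s.lastCheckMs)
        return;  // time has not advanced since the last call; nothing to do
    s.lastCheckMs = nowMs;

    // Unsigned subtraction keeps these comparisons valid across counter rollover.
    if (nowMs - s.lastHopMs >= s.hopIntervalMs)
    {
        s.lastHopMs += s.hopIntervalMs;  // advance by one dwell so late calls do not add drift
        s.hops++;
    }
    if (nowMs - s.lastKickMs >= s.kickIntervalMs)
    {
        s.lastKickMs = nowMs;
        s.kicks++;
    }
}
```

On the target, the caller would pass `millis()` as `nowMs`, and the two counters would be replaced by the firmware's existing hop and watchdog routines. Repeated calls within the same millisecond then cost only one comparison.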

@j-w-bullfrog

Interesting. I've seen cases where an ESP / Arduino (1.8?) will ignore a hardware-interrupt-connected subroutine in order to finish the current instruction. In this case it was a character write to an LED screen. However, the interrupt was generated by a zero crossing on the AC power line feeding an up-to-2 kW heater, so you could see the missed cycle. While not the same thing, I've seen quirks like this before.

That said, if other radios can perform using supposedly the same base software with the same chipset, what makes this different?

@cturvey

cturvey commented Feb 6, 2024

I can't speak to the ESP32, I'm working with the SAMD21 based radios, the v1.3 version of this guy in the plastic housing.
https://learn.sparkfun.com/tutorials/loraserial-hookup-guide/all
https://www.sparkfun.com/products/20029
I think Tony and I are looking at this from an RTCM3 Broadcaster, and potentially Re-Broadcaster, variant on MULTI-POINT

@j-w-bullfrog

Yep, my bad on that. Those are the 1 W radios that I have; I just inherently fall into my own system perspective that data comes from the ZED to the ESP and is sent to a rover ESP/ZED, and the radios are just replacing the Wi-Fi or cellular link to transmit that same data. Our application is rural, so Wi-Fi and cellular are not reliable solutions for a 2 mile radius.

@archielowen

Hey guys, I was brought here because I have the same issue, but I'm a newbie and I have no idea how to produce these reports you produce, program or anything you guys were discussing. I do field work regularly and can produce data if you want to test it. I'll keep an eye on this post and if you need a guinea pig just let me know.

@cturvey

cturvey commented Feb 6, 2024

No worries, I've just enabled a bunch of the AT-Debug settings.
Mostly

  • AT-Debug
  • AT-DebugSync
  • AT-DebugDatagrams

These are going to be unhelpful for normal operation, but I want to understand the interplay at failure.

I'm not entirely sold on the END-POINT (AT-Server=0) devices needing/wanting to squawk at the SERVER across all channels. I see that as unnecessarily disruptive to the ecosystem, and you'd need to be on the right band at the right time to get a response.

With LoRa, if you're not listening you're going to miss packets.

@cturvey

cturvey commented Feb 7, 2024

Multi-Point Server that only sends (no RSSI LEDs) and Multi-Point Client that just listens, waits for the Heart Beat to loop back, and doesn't send partner requests
https://github.com/cturvey/SparkFun_LoRaSerial/tree/tinker/Firmware/LoRaSerial

@tonycanike
Author

tonycanike commented Feb 14, 2024

I'll try to bench test it this week. Just the radios, no RTK. My bench is in the basement and external GNSS antennas on Facets are not practical for me.

I imagine there are people with use cases for bi-directional multipoint configurations. If this is successful, perhaps this variant on multipoint could be a new distinct configuration option.

@nseidle
Member

nseidle commented Feb 14, 2024

Same here - I'm away from hardware this week but should be able to pick this up next week.

@j-w-bullfrog

> I'll try to bench test it this week. Just the radios, no RTK. My bench is in the basement and external GNSS antennas on Facets are not practical for me.

How would you bench test just the radios without RTK? I have a bench setup, but other than watching the LEDs and examining the u-blox file from the SD card, I'm clueless, and that doesn't point to what might be the trouble, just that there is a problem.

@tonycanike
Author

@j-w-bullfrog

> How would you bench test just the radios w/o rtk?

I documented my bench test in the forum:

https://forum.sparkfun.com/viewtopic.php?f=116&t=60898

@j-w-bullfrog

Tony, I've re-read your test procedure and it produces the same results that I got from using RTK in the setup, and what you saw weeks ago. I used both point-to-point and multipoint, both allowable speeds, and encryption disabled. It does help to supply the radios with separate power.

@cturvey

cturvey commented Feb 14, 2024

I haven't had much chance to work on this over the last week. I need to perhaps integrate some RTCM3 diagnostic output.

In terms of resyncing / recovery the channels could be reduced so it cycles quicker on 400 ms hops. Not sure of the feelings about cycling vs FCC compliance. LoRa already sweeps.

AT-NumberOfChannels=50

@nseidle
Member

nseidle commented Feb 14, 2024

50 channel min required for FCC compliance. 400ms could be lowered but not increased.
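
For a sense of scale, the worst-case wait for a client that parks on one channel and waits for the server to come back around is roughly one full pass through the hop table. Below is a back-of-the-envelope sketch using the settings posted earlier in this thread (`AT-NumberOfChannels=50`, `AT-MaxDwellTime=400`). The function name is hypothetical, and the model ignores the discovery logic and any clock drift.

```cpp
#include <cassert>
#include <cstdint>

// Time for the server to visit every channel once, in milliseconds.
uint32_t hopCycleMs(uint32_t numChannels, uint32_t dwellMs)
{
    assert(numChannels > 0 && dwellMs > 0);
    return numChannels * dwellMs;
}
```

With 50 channels at 400 ms, `hopCycleMs(50, 400)` is 20,000 ms, or 20 s, which is in the same range as the 20-30 s outages reported above. Since the channel count cannot drop below 50, shortening the dwell time is the remaining lever in this simple model: 200 ms would halve the cycle to 10 s.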

@j-w-bullfrog

I'm going to demonstrate my lack of hardware/software understanding by asking why we can't take the Holybro firmware and run it on the one-watt LoRa radios. My understanding is that it's the same chipset, probably with different pinouts (config), and not inherently paired, but all I need is MORE power. So, how completely wrong am I? BTW, I do get the data I need perfectly from the Holybros.

@HighSpeedLowDrag1

> Multi-Point Server that only sends (no RSSI LEDs) and Multi-Point Client that just listens, waits for the Heart Beat to loop back, and doesn't send partner requests https://github.com/cturvey/SparkFun_LoRaSerial/tree/tinker/Firmware/LoRaSerial

Hi guys! I am very interested in what you are doing here and would like to help if I can. @cturvey Would there be a limit to the number of clients in this case? I have a need that requires more than 32 clients.

@j-w-bullfrog

Any solutions to getting a working radio set? Thanks All.

@nseidle
Member

nseidle commented Apr 2, 2024

Tony's original issue is with Multipoint. We are still working on a more robust solution for multipoint. Use the radios in P2P mode for now.

@j-w-bullfrog

j-w-bullfrog commented Apr 2, 2024 via email

@tonycanike
Author

For RTK work, I really want simplex one-way base-to-rover transmissions. The rover NEVER transmits any RF. The Base just transmits and sends data. The rover just receives. No ACKs. No retries. No handshaking. Keep it simple. Here's why:

  1. POWER - the base radio can transmit at a higher power with an external battery; the rover radio can be powered off the Facet and receive. We know the 1 W radios cannot be fully powered by the Facet radio port. The Facet radio port is (understandably) voltage and current limited. It's not hard to add a battery and make a custom power cable to fully power the base radio. I find it surprisingly challenging to fully power the rover radio and not create a nightmare for myself.

I spent 4 hours on my hands and knees crawling through dense autumn olive the other day with my Facet rover, pole, radio, and a weak JST connector cable. Understory brush, briars, multi-flora, poison ivy, barberry, hawthorns, autumn olive, and all the other crap out there is grabby, pokey, prickly, itchy, and totally annoying. The brush grabs at cables and external batteries and wants to eat them. This is the reality of surveying. Those pretty pictures of nicely-dressed happy smiling people with a rover on a manhole cover are total bs...well at least they don't correspond to my experience of surveying!

Having dangly cables, external batteries, and other choss on the rover pole is simply not a workable solution. The rover setup needs to be robust, clean, simple, and lightweight.

Being able to fully power the base radio with an external battery and minimally power the rover radios off the Facet would be a great solution. I tape up the JST cable to keep it in place. Of course, this means the base radio probably will not be able to receive transmissions from the rover radio, hence this post.

  2. DUTY CYCLE & BANDWIDTH - as others have pointed out above, if you're receiving you can't transmit. If you're transmitting you can't receive. It's just a waste of time for the base radio to receive. It's just a waste of time for the rover radio to transmit, and the rover probably then misses data from the base.

  3. HIGHER PROBABILITY OF WORKING - if the radios aren't relying on handshakes, some data is more likely to get from the base to the rover. If the rover misses a few packets, so what? No one cares. Next second new data comes down the pike. The old data is worthless now, stop trying to resend it. Retries, handshakes, and acknowledgments are a non-value-add "feature" in a base-rover RTK radio link.

I didn't invent the above - this is how the 450MHz band radios that most surveyors use work.
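
As an illustration of the receive-only rover described above, here is a minimal sketch in which the rover accepts whatever frames arrive, counts the ones it missed, and never transmits or asks for a resend. It assumes the base stamps each frame with an 8-bit sequence number that increments by one. That field, the `RoverLinkStats` struct, and `onFrameReceived` are all hypothetical and are not part of the LoRaSerial firmware.

```cpp
#include <cassert>
#include <cstdint>

struct RoverLinkStats
{
    bool haveLast = false;
    uint8_t lastSeq = 0;
    uint32_t received = 0;
    uint32_t missed = 0;
};

// Called for every frame that passes its CRC check. Nothing is sent in response.
void onFrameReceived(RoverLinkStats &st, uint8_t seq)
{
    if (st.haveLast)
    {
        // 8-bit modular distance from the previous frame; 1 means nothing was lost.
        uint8_t gap = static_cast<uint8_t>(seq - st.lastSeq);
        assert(gap != 0);  // sketch assumes no duplicates and fewer than 256 frames lost in a row
        st.missed += gap - 1u;
    }
    st.haveLast = true;
    st.lastSeq = seq;
    st.received++;
}
```

Lost frames are only counted, which is enough for a link-quality indicator. The newest corrections are passed along as they arrive, and stale data is never requested again.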

I, for one, have sadly put the SparkFun LoRaSerial radios into the "stuff that doesn't work" box. I really wanted to like them. I have some of the above power problems with the RFD900x radios I use, and I hate the "hard to see" LEDs on the RFD900x radios. The SparkFun radios have great LEDs.

@Raulricardo23

Any news on keeping the multipoint signal from being lost?
