Comm link too unstable for RTK use. #587
Comments
Tony's configuration: Base/Transmitter/Server ATS Receiver/Rover ats
|
|
Thanks for reporting!
It sounds as if the units are getting out of sync. In MP mode, the server is transmitting a clock sync but if the client misses the clock sync multiple times, it will eventually get off frequency and the link will go down. We have the time delay of the client realizing it's truly out of sync, and the time delay where the server has to come back around through the hop table. If the client misses the clock sync, it has another wait for the server to come around again. We have a few mitigations in place to reduce this time: the client will enter a discover_scan mode where it actively pings the hop table but this can come back negative if the server is actively transmitting when the client is pinging. P2P doesn't have this time delay. Because they are expecting to regularly hear from each other, the desync and sync times are much shorter. Point-To-Point and Multipoint are very different beasts. Are you seeing similar issues with Point-To-Point? |
Indeed, skimming thru this there doesn't look to be a method where one unit establishes itself as a master station, so they all potentially throw DATAGRAM_SYNC_CLOCKS packets at each other, or back-n-forth |
In the One-to-Many situation, really need to establish one as the primary station, driving the hop time and pattern. The primary should be only one broadcasting DATAGRAM_SYNC_CLOCKS, and ideally this should communicate the time to next hop, and where it's going to go. The rest of the stations need to synchronize to this, and not be sending their own DATAGRAM_SYNC_CLOCKS as this will just result in chaos as there's no indication of who's in charge, some of the stations will not be in range of each other, and apt to be synchronized by the periodicity of the messaging from the GPS receivers. Perhaps some way to identify who is sending sync messages, and some level of precedence. Data from GPS/GNSS should allow for time domain synchronization around multiple units |
@nseidle Nathan, thanks for jumping on this. I don't have a solid answer on your P2P vs. MP question. I'm geographically distant from my equipment right now and hope to work on this more mid-February. I tried Point-to-Point once when I first got the LoRaSerial radios, but I've been focusing on MP as I believe it makes better sense for the RTK use case. If data is lost forgetaboutit, as updated data will be sent in the next second. I experienced the MP issues with the two radios within 5 feet of each other on my workbench. @cturvey I do configure the one radio at my RTK base to be the "server", and the docs say only the server is transmitting the sync heartbeats. I haven't looked at the code though. Tony. |
Mostly just skimmed the source doing a quick static-analysis as to what looks to be going on, and where the sync packets are transmitted and where received. Unpacked the hopping somewhat, but suppose if it misses a hop it's going to have to wait until it cycles around. I'd need to get some units to do dynamic analysis and review debug output side-by-side, or back-port onto the DISCO / Murata platform |
Hi @cturvey - I welcome the analysis and help. I can send you hardware if desired, just say the word and I'll PM you for an address. |
TBH the logic as I'm unpacking it suggests that both ends schedule transmission, the server doesn't act on reception
|
Yes, Point-to-Point has the same issues. I haven't seen any performance difference between the two modes. I also tested with the radios 1000 ft, 10 ft, and 3 ft apart so I could watch the LEDs. |
@nseidle Nathan, thanks, the units arrived today. Battled the IDE and have it building; needed to regress from RadioLib 6.4.2 back to 5.1.2, but do have closure now. Need to find the LoRaSerial driver now. Made a .INF to pull in USBSER.SYS |
Not sure how you'd like to do support on this. |
Reverting to the original image shared in the repo. So something in the build/library. Least confident in WDTZero
|
Turning off watchdog, last message, then it hangs. |
@nseidle Likely not the source of the issue, but the logic here is broken allowing corrupt packets to be processed down-stream. Should be || (OR) not && (AND) https://github.com/sparkfun/SparkFun_LoRaSerial/blob/main/Firmware/LoRaSerial/Radio.ino#L1902
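To illustrate the `||` versus `&&` point, here is a minimal sketch. It is not the code at the linked line of Radio.ino; the struct, field names, and the two checks are hypothetical and chosen only to show how the operator choice changes which frames get through.

```cpp
// Hypothetical receive-side checks; names are illustrative, not from Radio.ino.
struct RxFrame {
    bool crcOk;     // assumed: the radio reported a good CRC
    bool lengthOk;  // assumed: declared length matches the bytes received
};

// AND form: rejects only when BOTH checks fail, so a frame with a bad CRC
// but a plausible length is still passed downstream.
constexpr bool rejectWithAnd(const RxFrame& f) {
    return !f.crcOk && !f.lengthOk;
}

// OR form: rejects when EITHER check fails.
constexpr bool rejectWithOr(const RxFrame& f) {
    return !f.crcOk || !f.lengthOk;
}
```

With a frame whose CRC failed but whose length looks fine, `rejectWithAnd` lets it through while `rejectWithOr` drops it, which is the kind of downstream corruption described above.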
I can compile and run code, so walking, adding/enabling instrumentation, and doing some dynamic analysis. HEARTBEAT sent on a consistent/repetitive basis (with server ms time-stamping), SYNC_CLOCKS (with channel, in multi-point) on demand, reporting ACK-1 in response to FIND_PARTNER. Unpacking the to-and-fro of the protocol in my head. |
Right now not seeing the link dying, functional 10 Hours to this point. Occasional loss/recovery, with the 10 seconds of Blue LED fast flashing
I wonder why that occasional loss/recovery is happening, and whether that's the behavior that's causing me problems. I'll assume a scenario where we're using the well-chosen defaults of the base transmitting data every second and the F9P dropping its RTK solution when it ages out at 60 seconds. My experience is that in challenging situations it can take 2-4 minutes to reestablish an RTK solution once lost. And if the radio link is down for 20-30ish seconds, the F9P often doesn't calculate a new RTK solution before the 60 second age out. If there's a loss of data to the rover for 20-30ish seconds every 4-5 minutes, the system usually doesn't have an RTK fixed solution and it is not usable. To get back to my question up top, if "occasional loss/recovery" is every 5-10 minutes and "10 seconds" is closer to 20 seconds, the Facets generally won't have a stable RTK solution and they will not be usable. It's also been my experience that I cannot trust RTK Fixed solutions that are not stable. Bad fixes do happen. This is all based on my memory of experiences back in November and December, so take it with a grain of salt. I won't be geographically able to retest and confirm this for a month or so. When I am able to test, I'll be very happy to help and test new LoRaSerial firmware. Testing the whole system (Facets and LoRaSerial radios) end-to-end inside on a workbench is not practical for me, so I need to go out in the field. |
Most of my observations are anecdotal at the moment, but the dropping looks to be precipitated by a packet drop-out (CRC) and HEARTBEAT Timeout. The rapid FIND_PARTNER strobing across channels seems ineffective, and it then recovers via the natural cycling of the channel hopping, and a success for a TX: FIND_PARTNER / RX: ACK-1. Somehow I think it should be possible for the Stations to be more predictive of the Server's Hop Channel in the time-domain. In the one-to-many sense I don't think we want to be strafing the band with FIND_PARTNER requests, and instead "Scanning for servers" in a more passive sense, either listening for data packets or HEARTBEATs which are going to be occurring on a somewhat continuous basis. The most practical way to do RTK is for the Server to just keep broadcasting, perhaps having two LoRaSerial 1W's with perhaps different bands, channels or spreading strategies. The stations could then be less powerful devices. Or broadcasting a GPS Only RTCM3 subset once every 30 seconds at a prescribed channel. Knowing GPS ToW one could perhaps align at both ends, and modulo into the channel hop table if paranoid. @tonycanike in your MULTIPOINT use case are you looking to back-haul position information back to the Server unit? You can also push the RTK time-out on the ZEDs out. As long as you're not losing carrier lock, some maintained RTK FIXED/FLOAT solution is going to be significantly better than dumping to GNSS/DGNSS |
Misses HEARTBEAT, recovery via DISCOVERY fails, and waits for HEARTBEAT to cycle
Data CRC, HEARTBEAT timeout, successful quick recovery
|
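The idea of aligning both ends on GPS time of week and taking a modulo into the hop table could be sketched roughly as follows. This is only an illustration, not how the LoRaSerial firmware selects channels: the function names are hypothetical, and the 400 ms dwell and 50 channel figures are simply the numbers mentioned elsewhere in this thread.

```cpp
#include <cstdint>

// Assumed values taken from the thread, not read from the firmware.
constexpr uint32_t kDwellMs = 400;     // time spent on each channel
constexpr uint32_t kNumChannels = 50;  // entries in the hop table

// Index into the hop table for a given GPS time of week in milliseconds.
constexpr uint32_t hopIndexFromTow(uint32_t towMs) {
    return (towMs / kDwellMs) % kNumChannels;
}

// Milliseconds remaining on the current channel before the next hop.
constexpr uint32_t msUntilNextHop(uint32_t towMs) {
    return kDwellMs - (towMs % kDwellMs);
}
```

With these numbers one full pass through the table takes 20 s, which divides the 604,800 s GPS week evenly, so the index would stay continuous across the weekly rollover. Any unit with a valid time of week could then compute the expected channel without transmitting.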
I can bench test, i.e. see both radios' LEDs with them sitting next to each other, since my base RTK Express unit sits in the basement attached by a 10 m signal line to a fixed antenna outside. Actually I also have a similar line for the radio antenna at the same point. About 20 ft away I have a daylight window that I can put the RTK Express rover GPS antenna in, and enough cable to make the radios sit side by side so that I can watch the LEDs on both. I also have a sufficient number of radio antennas so that I don't have to use the one mounted 15 ft in the air. |
@cturvey I am not looking to backhaul the rover position. As far as I know, no data is flowing from the rover to the base. @j-w-bullfrog That bench testing would be awesome. Much harder to do with my Facets and the integrated antenna. |
If you get me test firmware I can flash, I can do functional testing. If not, I'll probably need some pointers to get it to compile. Both RTK Express units of course have SD cards, and I do have u-center & m-center. |
@j-w-bullfrog I'm currently looking at the dynamic interactions of the stand-alone LoRaSerial units, with a focus on MP (Multi-Point, One-to-Many). I'm bench testing here and getting the debug / telemetry to where I need it to be before connecting it to my existing base infrastructure. The build process is described here.
The issue with the timer is particularly difficult: if you ATO or ATW, the USB will disconnect, or repeatedly connect and disconnect, and I had to open the unit to recover. Probably going to drop the spreading factor and coding rate to reduce the latency and air-time, and to more closely match my current deployment strategy. |
I don't like how the watchdog is kicked at all. I really want to make one subroutine that kicks it and checks the channel hop. At the moment it does this same thing in hundreds of places. The hop should be around 400 ms, and the watchdog around 2 seconds. And there are places where it is done fractions of a millisecond apart. We could check if millis() has even advanced. Several places imply that the kick takes several milliseconds, which seems longer than the 32 kHz clock should take to sync, but perhaps prescalers also expand that time. The WDT also has to sync with the 32.768 kHz clock, as the WDT is on a much slower clock domain, or the MCU will introduce a bus stall (basically stuffing wait states, and out-to-lunch). You can pretest SYNCBUSY so the write will fall straight through. |
Never mind, there is some mitigation of this. Could still do something a bit different so it doesn't stall at all. Disappearing for 4-5 ms in this context is far from ideal; it'll break millis(), I would think. Still, I think I'm going to make a CheckHopKickWatchdog() function. |
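A rough sketch of what a single `CheckHopKickWatchdog()` entry point might look like is below. It is not taken from the firmware: the state struct, the counters standing in for the real hop and watchdog-clear calls, and the 500 ms kick interval are all assumptions made for illustration. Time is passed in as a parameter so the logic can be exercised off-target; on the radio it would come from `millis()`.

```cpp
#include <cstdint>

// Hypothetical state; not the firmware's actual variables.
struct HopWdtState {
    uint32_t lastCheckMs = 0;
    uint32_t lastHopMs = 0;
    uint32_t lastKickMs = 0;
    uint32_t hops = 0;   // stands in for the real channel hop
    uint32_t kicks = 0;  // stands in for the real watchdog clear
};

constexpr uint32_t kHopIntervalMs = 400;   // hop period mentioned in the thread
constexpr uint32_t kKickIntervalMs = 500;  // assumed: well inside the ~2 s timeout

// One place to call from any long-running loop instead of kicking directly.
inline void CheckHopKickWatchdog(HopWdtState& s, uint32_t nowMs) {
    if (nowMs == s.lastCheckMs) return;  // time has not advanced; nothing to do
    s.lastCheckMs = nowMs;

    // Unsigned subtraction stays correct across millis() rollover.
    if (nowMs - s.lastHopMs >= kHopIntervalMs) {
        s.lastHopMs += kHopIntervalMs;   // step by the interval to avoid drift
        ++s.hops;
    }
    if (nowMs - s.lastKickMs >= kKickIntervalMs) {
        s.lastKickMs = nowMs;
        ++s.kicks;
    }
}
```

Calling it twice within the same millisecond does nothing the second time, which addresses the kicks landing fractions of a millisecond apart. On the SAMD21, the line that increments `kicks` is where a sync-busy pretest ahead of the real watchdog write would sit.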
Interesting. I've seen where an ESP / Arduino (1.8?) will ignore a subroutine attached to a hardware interrupt in order to finish the current instruction. In this case it was a character write to an LED screen. However, the interrupt was generated by a zero crossing on the AC power line feeding an up-to-2 kW heater, so you could see the missed cycle. While not the same thing, I've seen quirks like this before. That said, if other radios can perform using supposedly the same base software with the same chipset, what makes this different? |
I can't speak to the ESP32, I'm working with the SAMD21 based radios, the v1.3 version of this guy in the plastic housing. |
Yep, my bad on that. Those are the 1 W radios that I have; I just inherently fall into my own system perspective, where data comes from the ZED to the ESP and is sent to a rover ESP and ZED, and the radios are just replacing the WiFi or cellular link to transmit that same data. Our application is rural, so WiFi and cellular are not reliable solutions for a 2 mile radius. |
Hey guys, I was brought here because I have the same issue, but I'm a newbie and I have no idea how to produce these reports you produce, program or anything you guys were discussing. I do field work regularly and can produce data if you want to test it. I'll keep an eye on this post and if you need a guinea pig just let me know. |
No worries, I've just enabled a bunch of the AT-Debug settings.
These are going to be unhelpful for normal operation, but I want to understand the interplay at failure. I'm not entirely sold on the END-POINT (AT-Server=0) devices needing to squawk at the SERVER across all channels. I see that as unnecessarily disruptive to the ecosystem, and you'd need to be on the right band at the right time to get a response. With LoRa, if you're not listening you're going to miss packets. |
Multi-Point Server that only sends (no RSSI LEDs) and Multi-Point Client that just listens, waits for the Heart Beat to loop back, and doesn't send partner requests |
I'll try to bench test it this week. Just the radios, no RTK. My bench is in the basement and external GNSS antennas on Facets are not practical for me. I imagine there are people with use cases for bi-directional multipoint configurations. If this is successful, perhaps this variant on multipoint could be a new distinct configuration option. |
Same here - I'm away from hardware this week but should be able to pick this up next week. |
How would you bench test just the radios without RTK? I have a bench setup, but other than watching the LEDs and examining the u-blox file from the SD card, I'm clueless, and that doesn't point to what might be the trouble, just that there is a problem. |
I documented my bench test in the forum: |
Tony, I've re-read your test procedure and it produces the same results that I got from using RTK in the setup, and what you have seen (weeks ago). I used both point to point, multipoint, both allowable speeds, and encryption disabled. It does help the radios to supply them with separate power. |
I haven't had much chance to work on this over the last week. I need to perhaps integrate some RTCM3 diagnostic output. In terms of resyncing / recovery the channels could be reduced so it cycles quicker on 400 ms hops. Not sure of the feelings about cycling vs FCC compliance. LoRa already sweeps.
|
50 channel min required for FCC compliance. 400ms could be lowered but not increased. |
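As a back-of-the-envelope check on what those two numbers imply for recovery time, here is a small sketch. It assumes the simplest possible model, a client parked on one channel waiting for the server to hop back onto it; the function name is hypothetical and nothing here is read from the firmware.

```cpp
#include <cstdint>

// Time for the server to visit every channel once, in milliseconds.
// Under the assumed model this is also the worst-case wait for a client
// that sits on a single channel until the server comes back around.
constexpr uint32_t hopCycleMs(uint32_t channels, uint32_t dwellMs) {
    return channels * dwellMs;
}
```

With 50 channels and a 400 ms dwell, `hopCycleMs(50, 400)` gives 20000 ms, which is in the same range as the 20-30 second outages reported earlier in this thread. Halving the dwell to 200 ms would bring the cycle down to 10 s without reducing the channel count.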
I'm going to demonstrate my lack of hardware/software understanding by asking why we can't take the Holybro firmware and run it on the one watt LoRa radios. My understanding is that it's the same chipset, probably with different pinouts (config), and not inherently paired, but all I need is MORE power. So, how completely wrong am I? BTW, I do get the data I need perfectly from the Holybros. |
Hi guys! I am very interested in what you are doing here and would like to help if I can. @cturvey Would there be a limit to the number of clients in this case? I have a need that requires more than 32 clients. |
Any solutions to getting a working radio set? Thanks All. |
Tony's original issue is with Multipoint. We are still working on a more robust solution for multipoint. Use the radios in P2P mode for now. |
As I stated previously, they don't work any better or differently in the other mode.
|
For RTK work, I really want simplex one-way base-to-rover transmissions. The rover NEVER transmits any RF. The Base just transmits and sends data. The rover just receives. No ACKs. No retries. No handshaking. Keep it simple. Here's why:
I spent 4 hours on my hands and knees crawling through dense autumn olive the other day with my Facet rover, pole, radio, and a weak JST connector cable. Understory brush, briars, multi-flora, poison ivy, barberry, hawthorns, autumn olive, and all the other crap out there is grabby, pokey, prickly, itchy, and totally annoying. The brush grabs at cables and external batteries and wants to eat them. This is the reality of surveying. Those pretty pictures of nicely-dressed happy smiling people with a rover on a manhole cover are total bs...well at least they don't correspond to my experience of surveying! Having dangly cables, external batteries, and other choss on the rover pole is simply not a workable solution. The rover setup needs to be robust, clean, simple, and lightweight. Being able to fully power the base radio with an external battery and minimally power the rover radios off the Facet would be a great solution. I tape up the JST cable to keep it in place. Of course, this means the base radio probably will not be able to receive transmissions from the rover radio, hence this post.
I didn't invent the above - this is how the 450MHz band radios that most surveyors use work. I, for one, have sadly put the SparkFun LoRaSerial radios into the "stuff that doesn't work" box. I really wanted to like them. I have some of the above power problems with the RFD900x radios I use, and I hate the "hard to see" LEDs on the RFD900x radios. The SparkFun radios have great LEDs. |
Is there any news on keeping the multipoint signal from being lost? |
My LoRaSerials are unusable for RTK work. The latency (age) of my RTK solution, as reported by the Facets, is very unstable when using the LoRaSerial radios. It will work for a few minutes, then the comm link appears to go down, the latency climbs, and I lose RTK fix. Then the comm link returns, I get an RTK fixed solution, things are good for a while, and then it happens all over again.
I do not have this problem with my Holybros or my RFD900x radios. They work perfectly with my two Facets in base rover RTK mode.
Here is my full problem description:
https://forum.sparkfun.com/viewtopic.php?f=116&t=60898
Multiple other users seem to be reporting this issue also.
viewtopic.php?f=116&t=60609
viewtopic.php?f=117&t=60671
viewtopic.php?f=117&t=60673
viewtopic.php?f=117&t=60872