Node lagging behind

Ever since I booted my first mirror node, it has not seemed to work properly.
I can see a lot of connection losses and timeouts in the logs, as well as fork issues and PoW issues, and the node is usually not up to date. When I restart it, it fetches everything missing and catches up with the longest branch, but when I leave it running it falls behind again. At the time of writing, it is stuck on block 402199 while the chain is at 402329. Here is an extract of the logs:

2021-02-26T19:47:15+00:00 - info: WS2P: init: bundle of peers 6/7
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer Abpio1ZP using `WS2P g1.bourdon.eu.org 443`!
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer 8iVdpXqF using `WS2P g1.duniter.org 443`!
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer 7F6oyFQy using `WS2P vit.fdn.org 443`!
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer 2ny7YAdm using `WS2P 82.65.206.220 20900`!
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer HnQH8P6n using `WS2P obelix.toutat.is 443`!
2021-02-26T19:47:15+00:00 - info: WS2P: init: bundle of peers 7/7
2021-02-26T19:47:15+00:00 - info: WS2P: connected to peer 3bGqhAEL using `WS2P duniter2.lucho14.website 443`!
2021-02-26T19:47:16+00:00 - info: WS2P: connected to peer 5UhU5aNc using `WS2P g1.fdlibre.eu 443`!
2021-02-26T19:47:30+00:00 - info: WS2P: connection [74RBUM4V `WS2P monit.g1.nordstrom.duniter.org 443`] has been closed
2021-02-26T19:47:30+00:00 - info: WS2P: connection [8iVdpXqF `WS2P g1.duniter.org 443`] has been closed
2021-02-26T19:47:30+00:00 - info: WS2P: connection [Abpio1ZP `WS2P g1.bourdon.eu.org 443`] has been closed
2021-02-26T19:47:45+00:00 - info: WS2P: connection [2ny7YAdm `WS2P 82.65.206.220 20900`] has been closed
2021-02-26T19:47:48+00:00 - info: SIDE Block #402328-00000010 added to the blockchain in 0 ms
2021-02-26T19:47:48+00:00 - info: Block resolution: 0 potential blocks after current#402199...
2021-02-26T19:47:48+00:00 - info: Fork resolution: 24 potential block(s) found...
2021-02-26T19:48:03+00:00 - warn: Security trigger: proof-of-work process seems stuck
2021-02-26T19:48:03+00:00 - warn: Local node is not a member. Waiting to be a member before computing a block.
[...]
2021-02-26T19:49:25+00:00 - info: [AdmpBbkt] ⬇ PEER 74RBUM4V 402298-0
2021-02-26T19:49:25+00:00 - warn: Unknown reference block of peer
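
To put a number on the lag from a log like the one above, here is a rough Python sketch. It only assumes the two message formats visible in the extract ("current#N" and "SIDE Block #N-..."); the function name head_lag is made up for illustration, and other Duniter versions may word these lines differently.

```python
import re

# Patterns copied from the two log lines in the extract; the exact wording in
# other Duniter versions is an assumption.
CURRENT_RE = re.compile(r"current#(\d+)")
SIDE_RE = re.compile(r"SIDE Block #(\d+)-")


def head_lag(log_lines):
    """Return (local_head, highest_side_block, lag), or None if either is missing."""
    local_head = None
    highest_side = None
    for line in log_lines:
        m = CURRENT_RE.search(line)
        if m:
            local_head = int(m.group(1))  # keep the most recent value seen
        m = SIDE_RE.search(line)
        if m:
            n = int(m.group(1))
            highest_side = n if highest_side is None else max(highest_side, n)
    if local_head is None or highest_side is None:
        return None
    return local_head, highest_side, highest_side - local_head


sample = [
    "2021-02-26T19:47:48+00:00 - info: SIDE Block #402328-00000010 added to the blockchain in 0 ms",
    "2021-02-26T19:47:48+00:00 - info: Block resolution: 0 potential blocks after current#402199...",
]
print(head_lag(sample))  # (402199, 402328, 129)
```

On these two lines the node is 129 blocks behind the highest side block it has received.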

My node’s key is AdmpBbktXNwAdLYRtxaxn4PKDNG51gdghKSeD5AjLkLK.

Now, thoughts and questions:

  • The proof-of-work warning might simply be because a mirror node does not compute blocks, so in this case the warning can be ignored? Possibly it shouldn’t be logged at all.
  • The node detects a potential fork that is ahead, yet it doesn’t switch to it. The config has switchOnHeadAdvance set, so I’d expect it to switch. Why doesn’t it?
  • How come I get that many timeouts? Is that normal behaviour?

Yes, the proof-of-work warning can be ignored if you’re not a member. Maybe it’s useful as a reminder to members to add their member key…

The node not switching to the fork is probably a bug. Many people encounter bugs like this (problems with forks, sync, database corruption, valid blocks detected as invalid…) and they seem to be pretty random. Sometimes it works after a reset and a fresh sync, sometimes not.

They involve complex mechanisms, so I think they won’t be fixed until that code is migrated to Rust.

As for the timeouts, I don’t know. It is probably because some nodes are badly configured (so their public address is not reachable) or temporarily down.

AFAIK there is a problem with fork management. As I understand it, a block gets marked as invalid on the side chain (maybe relative to the local chain), and Duniter won’t reconsider this validation when resolving the fork. A workaround is to restart the node (these faulty blocks are kept in RAM). I have a cron task that restarts my node every hour with duniter stop ; duniter start. I still get this kind of problem about once a month, on a Pi with 2 GB of RAM. Before the cron task, it was every 2 days.
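
For anyone who prefers a script over a raw crontab line, the same restart could be sketched in Python roughly like this. It only wraps the two commands mentioned above (duniter stop and duniter start) and assumes duniter is on the PATH of the cron user. The function name restart_node and the injectable run parameter are made up for illustration.

```python
import subprocess


def restart_node(run=subprocess.run):
    """Run `duniter stop` then `duniter start`, like `duniter stop ; duniter start`.

    `run` is injectable so the function can be exercised without a real node.
    """
    # check=False: as with `;` in a shell, start is attempted even if stop fails.
    stop = run(["duniter", "stop"], check=False)
    start = run(["duniter", "start"], check=False)
    return stop.returncode, start.returncode


# When saved as a script for cron, call the function at the bottom:
# restart_node()
```

An hourly crontab entry would then point at wherever the script is saved, e.g. `0 * * * * python3 /path/to/restart_duniter.py` (the path is hypothetical).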

See this post: Désynchro rapide de mon serveur Ğ1 - #2 by elois

My node regularly logs lots of timeouts. That does not prevent it from working properly.

I understand from both your posts that timeouts and disconnects seem to be the norm, thank you.

Regarding the failure to sync up, the link does not describe what I see, as their log clearly shows a warning about an invalid block, and I don’t have that… As far as I understand from the thread, the code is complex and currently not well written, so one should expect random issues, and I might be hitting one for some reason. I’ll try the auto-restart for the time being.