Axiom squid indexer down

:arrow_right: https://gdev-squid.axiom-team.fr/v1/graphql

query LatestBlock {
  block(orderBy: {height: DESC}, limit: 1) {
    timestamp
    height
  }
}

{
  "data": {
    "block": [
      {
        "timestamp": "2024-05-12T14:40:48.001+00:00",
        "height": 1405030
      }
    ]
  }
}
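
(Side note: to reproduce this check from a shell, something along these lines should work; same endpoint and query as above, jq only there for readability:

curl -s https://gdev-squid.axiom-team.fr/v1/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query":"query LatestBlock { block(orderBy: {height: DESC}, limit: 1) { timestamp height } }"}' | jq
)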

So no indexing for 11 days @poka

(mine too, because of an OOM; I'm restarting it)

Ouch.
The problem is that all the resources look fine for this container.
I did a quick grep for "error" in the logs, looked around May 12, and I see this in particular:

duniter-squid-hasura-squid-1  | {"detail":{"http_info":{"content_encoding":null,"http_version":"HTTP/1.1","ip":"127.0.0.1","method":"GET","status":500,"url":"/healthz"},"operation":{"error":{"code":"postgres-error","error":"connection error","internal":"connection to server at \"db\" (172.23.0.2), port 5432 failed: FATAL:  the database system is in recovery mode\n","path":"$"},"query":{"type":null},"request_id":"d9dbf4bd-d954-42f4-ab28-4ca1b4120663","request_mode":"error","response_size":189,"uncompressed_response_size":189},"request_id":"d9dbf4bd-d954-42f4-ab28-4ca1b4120663"},"level":"error","timestamp":"2024-05-12T14:41:12.250+0000","type":"http-log"}
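
(For anyone wanting to do the same, a grep along these lines works, using the container name from the line above:

docker logs duniter-squid-hasura-squid-1 2>&1 | grep -i error | grep 2024-05-12
)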

We'll have to see whether that matches the exact minute indexing stopped.
I really need to understand what caused this; I'm not restarting it without understanding first.
It could have been the automatic LXC backups causing a psql problem when the system freezes, but in this case the time doesn't match at all.

Formatted like this, it's easier to read:

{
  "detail": {
    "http_info": {
      "content_encoding": null,
      "http_version": "HTTP/1.1",
      "ip": "127.0.0.1",
      "method": "GET",
      "status": 500,
      "url": "/healthz"
    },
    "operation": {
      "error": {
        "code": "postgres-error",
        "error": "connection error",
        "internal": "connection to server at \"db\" (172.23.0.2), port 5432 failed: FATAL:  the database system is in recovery mode\n",
        "path": "$"
      },
      "query": { "type": null },
      "request_id": "d9dbf4bd-d954-42f4-ab28-4ca1b4120663",
      "request_mode": "error",
      "response_size": 189,
      "uncompressed_response_size": 189
    },
    "request_id": "d9dbf4bd-d954-42f4-ab28-4ca1b4120663"
  },
  "level": "error",
  "timestamp": "2024-05-12T14:41:12.250+0000",
  "type": "http-log"
}

FATAL: the database system is in recovery mode

You should look at the postgres db logs. Are you using a volume or a bind-mounted directory?
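
(If in doubt, the mounts of the db container can be checked with something like this, container name taken from the logs; "Type": "volume" means a named volume, "bind" a mounted directory:

docker inspect duniter-squid-db-1 --format '{{ json .Mounts }}'
)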

I'm using a standard volume.

It does match the moment indexing stopped:

Here are the psql container logs at that time:

duniter-squid-db-1  | 2024-05-12 14:20:53.208 UTC [62] LOG:  checkpoint complete: wrote 281 buffers (1.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=28.090 s, sync=0.017 s, total=28.156 s; sync files=47, longest=0.010 s, average=0.001 s; distance=1995 kB, estimate=2085 kB
duniter-squid-db-1  | 2024-05-12 14:25:25.306 UTC [62] LOG:  checkpoint starting: time
duniter-squid-db-1  | 2024-05-12 14:25:56.036 UTC [62] LOG:  checkpoint complete: wrote 307 buffers (1.9%); 0 WAL file(s) added, 0 removed, 0 recycled; write=30.673 s, sync=0.017 s, total=30.731 s; sync files=50, longest=0.010 s, average=0.001 s; distance=2039 kB, estimate=2081 kB
duniter-squid-db-1  | 2024-05-12 14:30:25.134 UTC [62] LOG:  checkpoint starting: time
duniter-squid-db-1  | 2024-05-12 14:30:56.301 UTC [62] LOG:  checkpoint complete: wrote 311 buffers (1.9%); 0 WAL file(s) added, 0 removed, 1 recycled; write=31.090 s, sync=0.016 s, total=31.168 s; sync files=51, longest=0.010 s, average=0.001 s; distance=2122 kB, estimate=2122 kB
duniter-squid-db-1  | 2024-05-12 14:35:25.398 UTC [62] LOG:  checkpoint starting: time
duniter-squid-db-1  | 2024-05-12 14:35:56.107 UTC [62] LOG:  checkpoint complete: wrote 297 buffers (1.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=29.673 s, sync=0.980 s, total=30.710 s; sync files=49, longest=0.168 s, average=0.020 s; distance=1962 kB, estimate=2106 kB
duniter-squid-db-1  | 2024-05-12 14:40:25.165 UTC [62] LOG:  checkpoint starting: time
duniter-squid-db-1  | 2024-05-12 14:41:11.036 UTC [1] LOG:  server process (PID 1443547) was terminated by signal 13: Broken pipe
duniter-squid-db-1  | 2024-05-12 14:41:11.036 UTC [1] LOG:  terminating any other active server processes
duniter-squid-db-1  | 2024-05-12 14:41:11.066 UTC [1443549] LOG:  could not send data to client: Broken pipe
duniter-squid-db-1  | 2024-05-12 14:41:11.626 UTC [1443561] FATAL:  the database system is in recovery mode
duniter-squid-db-1  | 2024-05-12 14:41:11.640 UTC [1443559] FATAL:  the database system is in recovery mode
duniter-squid-db-1  | 2024-05-12 14:41:11.640 UTC [1443557] FATAL:  the database system is in recovery mode
duniter-squid-db-1  | 2024-05-12 14:41:11.640 UTC [1443558] FATAL:  the database system is in recovery mode
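
(For reference, that window can also be pulled directly with docker logs, container name and timestamps as in the excerpt above:

docker logs duniter-squid-db-1 --since '2024-05-12T14:20:00' --until '2024-05-12T14:42:00'
)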

The syslogs:

$ sudo journalctl --since "2024-05-12 14:39:00" --until "2024-05-12 14:45:00"
-- Journal begins at Thu 2022-08-25 20:19:48 UTC, ends at Fri 2024-05-24 18:14:51 UTC. --
May 12 14:39:17 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.x7QTvR.mount: Succeeded.
May 12 14:39:20 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-29d439ef4e99f80e64c0d5defe6f5a3f29e33e3f866a647942fbc3ebf43d506c-runc.4K2IqY.mount: Succeeded.
May 12 14:39:20 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.NxRR5c.mount: Succeeded.
May 12 14:39:33 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.9ziqfk.mount: Succeeded.
May 12 14:39:50 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.U29YgC.mount: Succeeded.
May 12 14:39:55 v2s-poka dockerd[281]: time="2024-05-12T14:39:55.260818892Z" level=error msg="Could not send KILL signal to container process" container=31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189 error="process does not exist efe4e16baa284eb86e923ef005b559595>
May 12 14:40:18 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.XASHdc.mount: Succeeded.
May 12 14:40:21 v2s-poka dockerd[281]: time="2024-05-12T14:40:21.606016740Z" level=error msg="Could not send KILL signal to container process" container=31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189 error="process already finished: not found" exec=fdcdace63ea726>
May 12 14:41:04 v2s-poka dockerd[281]: time="2024-05-12T14:41:03.509302212Z" level=error msg="stream copy error: read /proc/self/fd/18: file already closed"
May 12 14:41:04 v2s-poka dockerd[281]: time="2024-05-12T14:41:03.583317582Z" level=error msg="stream copy error: read /proc/self/fd/17: file already closed"
May 12 14:41:07 v2s-poka containerd[111]: time="2024-05-12T14:41:06.604959120Z" level=warning msg="error copying stdout" runtime=io.containerd.runc.v2
May 12 14:41:07 v2s-poka containerd[111]: time="2024-05-12T14:41:06.604996539Z" level=warning msg="error copying stderr" runtime=io.containerd.runc.v2
May 12 14:41:15 v2s-poka systemd[1]: docker-d812b8b86177121f16159b95e656a1fff40e868b5dd9d962703096aaa8857713.scope: Succeeded.
May 12 14:41:16 v2s-poka dockerd[281]: time="2024-05-12T14:41:16.243510882Z" level=info msg="ignoring event" container=d812b8b86177121f16159b95e656a1fff40e868b5dd9d962703096aaa8857713 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 12 14:41:16 v2s-poka containerd[111]: time="2024-05-12T14:41:16.347410944Z" level=info msg="shim disconnected" id=d812b8b86177121f16159b95e656a1fff40e868b5dd9d962703096aaa8857713
May 12 14:41:16 v2s-poka containerd[111]: time="2024-05-12T14:41:16.377237177Z" level=warning msg="cleaning up after shim disconnected" id=d812b8b86177121f16159b95e656a1fff40e868b5dd9d962703096aaa8857713 namespace=moby
May 12 14:41:16 v2s-poka containerd[111]: time="2024-05-12T14:41:16.383903941Z" level=info msg="cleaning up dead shim"
May 12 14:41:16 v2s-poka systemd[1]: docker-d812b8b86177121f16159b95e656a1fff40e868b5dd9d962703096aaa8857713.scope: Consumed 3h 44min 35.292s CPU time.
May 12 14:41:16 v2s-poka containerd[111]: time="2024-05-12T14:41:16.523588274Z" level=warning msg="cleanup warnings time=\"2024-05-12T14:41:16Z\" level=info msg=\"starting signal loop\" namespace=moby pid=315016 runtime=io.containerd.runc.v2\n"
May 12 14:41:17 v2s-poka systemd[1]: run-docker-netns-1b82ccc5c2ef.mount: Succeeded.
May 12 14:41:18 v2s-poka systemd[1]: var-lib-docker-overlay2-704975724fe63734d1f3f5ec6d9e14bcefce8aa40ee3a984268950be5ea8b4e8-merged.mount: Succeeded.
May 12 14:41:29 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.Lc8oXm.mount: Succeeded.
May 12 14:42:04 v2s-poka systemd[1]: run-docker-runtime\x2drunc-moby-31d6c795d1814f56412b942b32b3a1bb9f65b2ad6102989bf56fea193689d189-runc.EZeqiu.mount: Succeeded.

Sorry for the formatting, I can't do better here; it's best to copy these logs into a local text editor for analysis.

Probably a system freeze, caused by abnormally high activity from another process, perhaps in another container on this infrastructure (a mini PC at my place, running Proxmox).
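
(If it happens again, one thing worth checking, as a guess, is whether the kernel logged anything unusual around that moment, e.g. the OOM killer or I/O errors:

sudo journalctl -k --since "2024-05-12 14:35:00" --until "2024-05-12 14:45:00" | grep -iE 'oom|killed process|error'
)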

So the problem doesn't come from the squid stack itself, but from my infrastructure.
I'll try restarting it; otherwise I'll resync. All the relevant logs are kept here for the record, in case it happens again.
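
(For the restart itself, a plain docker compose restart from the stack directory should be enough as a first attempt, since it keeps the volumes and therefore the already indexed data:

docker compose restart
)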


edit: a simple restart of the containers was enough to resume the sync where it had stopped, so all good.
