Fun with the Wot in Neo4j

Hi there,

I recently had fun with Neo4j which is a graph database.
It allows to represent the data as the graph and it seems to me that it could be usefull as Wot is basically a graph.

Data Import

First of all, I imported the data from the duniter’s sqlite database. (test_net currency)
The script is pretty simple and can be found here : http://hastebin.com/duxekagoxi.py

Data Model

My data model is basic :

  • Nodes are identities with three properties : block_number, pseudo, pub_key
  • Relationships are certifications with a single propety : block_number

More complex data models can be used. We could add currency flows for example.

Let’s have fun

  • Display the Wot as with certifications emitted before the block 2000.

MATCH (a)-[r]->(b)
WHERE r.block < 2000
RETURN a,b,r

  • Display the shortest path between a specific nodes and all other nodes

MATCH p=ShortestPath(((person {pseudo:β€œgreyzlii”}) -[*]-> (f2f)))
RETURN DISTINCT person.pseudo, f2f.pseudo, length(p), [ x in nodes(p)[1…] | x.pseudo ]
ORDER BY length(p) DESC

  • List persons that signed the same persons than me

MATCH (n {pseudo:β€œgreyzlii”}) β†’ (cosignee) ← (cosigner)
RETURN cosignee.pseudo, [x in collect(cosigner) | x.pseudo ]

Possible Use Case for Duniter

This sytem could be used for a recommendation system (friend of friend).
For example, if a and b signed you and a and b are both signed by c, you probably know c.
If you sign c, you will shorten paths.

MATCH p=( (n {pseudo : β€œgreyzlii”}) <-[*2]- (f2f) )
WHERE NOT (n) β†’ (f2f) AND n <> f2f
RETURN f2f.pseudo, count(p), collect([ x in nodes(p)[1…-1] | x.pseudo])
ORDER BY count(p) DESC

Do you think it is usefull ?

Please let me know…

8 Likes

Very Nice !
I would like to use it to suggest friend into Cesium !

Awesome ! I’d love to see a duniter module based on this database for the nodes ! That would open tons of possibilities for the clients development!

2 Likes

Other Possible use case

Give details to a potential member when he is outdistanced.
We can display overdistanced sentries from a specific identity

MATCH (i) ← (sentry)
WITH sentry, count(i) as count_i
WHERE count_i >= 10
MATCH p=ShortestPath((me {pseudo: β€œGaluel”}) -[*]-> (sentry))
WITH sentry, length(p) as length_p
WHERE length_p > 2
RETURN sentry.pseudo, length_p

4 Likes

Fraud detection

Frauders have to create sybil identities and sign it with their Β« real Β» accounts.
They probably sign more newcomers (identities which is not member yet) than others members.

First, we can know who signs newcomers the most.
Ex : For each member, check the early signers (first three ones), then display signers who appears the most.

MATCH (n) -[cert]-> (m)
OPTIONAL MATCH (p) -[previous_cert]-> (m)
WHERE NOT previous_cert.block >= cert.block AND n <> p
WITH n, m, count(previous_cert) as count_pcert
WHERE count_pcert < 3
RETURN n.pseudo, count(m) as count_m, COLLECT(m.pseudo)
ORDER BY count_m DESC
LIMIT 10

We can also try to find collaborative frauders.
Ex : For each member, check the early signers (first three ones), group them, and find how many times they sign a newcomer together.

MATCH (i) -[r]-> (n{pseudo : "cgeek"})
WITH i, r, n
ORDER BY r.block
WITH n, collect(r.block)[2] as certif
SET n.thirdsig = certif
RETURN n.pseudo, certif

MATCH (a) -[cert_a]-> (m) <-[cert_b]- (b), (c) -[cert_c]-> (m)
WHERE id(a) > id(b) > id(c) AND cert_a.block <= m.thirdsig AND cert_b.block <= m.thirdsig AND cert_c.block <= m.thirdsig
RETURN a.pseudo, b.pseudo, c.pseudo, count(m) as count_m, collect(m.pseudo)
ORDER BY count_m DESC
LIMIT 10

3 Likes

it impresses me :wink:
I like the research and the work

Looks like tierce, cgeek, Thatoo, myself and yourself, greyzlii, are damn Sybil attackers… :wink: We have to watch these members for the future! :smiley:
Very nice work indeed!

I was wondering how we can distinguish the founders members helping first newcomers to join, from cheating members. It is quite difficult, if a founder get greedy and start cheating already at start.

1 Like

Hi there. Here is outdegree end in-degree distributions of the current Wot.
Sorry, graphs are not great. Will be better next time.

Outdegree distribution

Indegree distribution

2 Likes

In x, this is the indegree.
In y, the outdegree.

2 Likes

For people who don’t know this terms:

3 Likes

I think we could integrate neo4j with the Elasticsearch infrastructure used by duniter4j.

The idea is to reuse ES has search engin, WITH a filter that use neo4j graphs.

This two solutions (neo4j and ElasticSearch) are written in Java, and could be integrated inside duniter4j.

This article can help : http://graphaware.com/neo4j/2016/04/20/graph-aided-search-the-rise-of-personalised-content.html

2 Likes

So I started to play around with this. I especially implemented the F2F request since we are having trouble to expand the WoT.

This is a proof of concept, I was thinking about implementing standard requests under an API that could be added to the nodes.
For example, with the F2F example :
http://gtest.duniter.inso.ovh/neo4j/f2f/inso

[{"f2f":"vincentux",
"certifiers":["JeanFerreira","stanlog"]},
{"f2f":"moul",
"certifiers":["cgeek"]}]

We can see that I should certify vincentux or moul to reduce the WoT diameter. They are suggested to me because they certified JF/Stanlog or cgeek, who certified me.

What do you think ? Should we put this project in main duniter tree, and integrate this module cleanly to the code of duniter ? (I have no idea how to do so :smiley: )
The code is here : https://github.com/Insoleet/duniter-neo4j

1 Like

It can be usefull to give recommandations thru calculations but not the kernel. At the moment we miss information yet in the clients :

Who is sentry or not
What is the distance from 1 member to 90% of sentries ?
Where is the max distance to 90% ?
Where is the formula of that max distance ?
The whole Wot diameter (max distance of all members tio 90% of sentries)

Generally members must have those crucial informations because they are part of the wot code

Who is sentry or not

Indeed its missing

What is the distance from 1 member to 90% of sentries ?

This computation cannot be done by the clients and needs a specific API like this one

Where is the max distance to 90% ?

What do you mean by β€œwhere” ? Who are the members on the border of the wot ?

Where is the formula of that max distance ?

What do you mean by β€œformula” ?

The whole Wot diameter (max distance of all members tio 90% of sentries)

This computation cannot be done by the clients and needs a specific API too

Generally speaking, since the modularization of Duniter, it seems that we could implement such an API which users who would like to provide could install on their node. But I didn’t understand yet how to add/remove module to an installation of Duniter.

Since this module is depending on a graph database engine, it cannot be deployed to desktop servers.

Formula is y(n)=n(1:stepmax) = 21(1:3)=2 actually, will be 3 with 27 members and 4 with 64

F2F ??
EDIT : Display the shortest path between a specific nodes and all other nodes ?

F2F -> friend of friend ! See the original post of greyzlii :slight_smile:

@inso ok, tks ! :slight_smile:

What about to create a new API, named β€œWOT_GRAPH_API” ?
Could you:

  • Add your project into duniter repo, ( as subproject β€œduniter-neo4j” ?)
  • Add an endpoint to your Duniter node. So we could display this extended API into network view.

After that, we will be able to use it in Cesium+, to suggest friend. But maybe we will need to add features, such as sort (first suggestion should be link that could reduce the WoT size").

(sorry for my approximate English again)

1 Like

It seems that @yannlefranco reach 94% of sentries with stepmax at 3.

MATCH (n {sentry : 1} )
WHERE NOT n.uid = "yannlefranco"
WITH count(n) as count_n
MATCH p=ShortestPath((member {uid:"yannlefranco"}) <-[*..3]- (r_sentry {sentry : 1}))
RETURN member.uid,count_n as nb_sentries, count(r_sentry) as reachable_sentries, 100.0 * count(r_sentry) / count_n AS percent

╒══════════════╀═════════════╀════════════════════╀═════════════════╕
β”‚"member.uid"  β”‚"nb_sentries"β”‚"reachable_sentries"β”‚"percent"        β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═════════════β•ͺ════════════════════β•ͺ═════════════════║
β”‚"yannlefranco"β”‚18           β”‚17                  β”‚94.44444444444444β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This number reduces at 55 when 2 steps only are authorized :

╒══════════════╀═════════════╀════════════════════╀═════════════════╕
β”‚"member.uid" β”‚"nb_sentries"β”‚"reachable_sentries"β”‚"percent" β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═════════════β•ͺ════════════════════β•ͺ═════════════════║
β”‚"yannlefranco"β”‚18 β”‚10 β”‚55.55555555555556β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

cgeek reaches 100% of sentries at step2 :

╒════════════╀═════════════╀═══════╀════════════════════╀═════════════════╕
β”‚"member.uid"β”‚"nb_sentries"β”‚"steps"β”‚"reachable_sentries"β”‚"percent"        β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═════════════β•ͺ═══════β•ͺ════════════════════β•ͺ═════════════════║
β”‚"cgeek"     β”‚18           β”‚1      β”‚13                  β”‚72.22222222222223β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"cgeek"     β”‚18           β”‚2      β”‚5                   β”‚27.77777777777778β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
4 Likes