Vertex removal in LightGraphs
Testing abstractions, their limits and leaks
In various graph-related algorithms, a graph is modified through successive operations, merging, creating and deleting vertices. That’s the case for the Blossom algorithm finding a best matching in a graph and using contractions of nodes. In such cases, it can be useful to remove only the vertex being contracted, and maintain the number of all other vertices.
LightGraphs.jl offers a set of abstractions, types and algorithms to get started
with graphs. The claim of the abstraction is simple: whatever the underlying
structure representing your graph, if it implements the AbstractGraph
interface,
it can be used out of the box with all algorithms built on LightGraphs.jl.
The main concrete type presented by LightGraphs.jl is SimpleGraph
and its
directed counterpart SimpleDiGraph
, only storing edges as adjacency lists,
meaning vertices are just the integers from 1 to the length of the list.
This means that in a graph with 6 vertices, deleting vertex 4 will re-label vertex 6
as 4. Hopefully, the interface should allow us to build a graph type on top of another graph,
re-implementing only vertex removal.
A simple vertex-safe implementation
First things first, we will build it as a struct, using LightGraphs:
import LightGraphs
const LG = LightGraphs
struct VSafeGraph{T, G<:LG.AbstractGraph{T}, V<:AbstractVector{Int}} <: LG.AbstractGraph{T}
g::G
deleted_vertices::V
VSafeGraph(g::G, v::V) where {T, G<:LG.AbstractGraph{T}, V<:AbstractVector{Int}} = new{T, G, V}(g, v)
end
VSafeGraph(g::G) where {G<:LG.AbstractGraph} = VSafeGraph(g, Vector{Int}())
VSafeGraph(nv::Integer) = VSafeGraph(LG.SimpleGraph(nv))
We added simple default constructors for convenience. The structure holds two elements:
- An inner abstract graph
g
- A list of vertices already deleted:
deleted_vertices
.
The interface can now be implemented for our type, starting with the trivial parts:
LG.edges(g::VSafeGraph) = LG.edges(g.g)
LG.edgetype(g::VSafeGraph) = LG.edgetype(g.g)
LG.is_directed(g::VSafeGraph) = LG.is_directed(g.g)
LG.is_directed(::Type{<:VSafeGraph{T,G}}) where {T,G} = LG.is_directed(G)
LG.ne(g::VSafeGraph) = LG.ne(g.g)
LG.nv(g::VSafeGraph) = LG.nv(g.g) - length(g.deleted_vertices)
LG.vertices(g::VSafeGraph) = (v for v in LG.vertices(g.g) if !(v in g.deleted_vertices))
LG.outneighbors(g::VSafeGraph, v) = LG.outneighbors(g.g, v)
LG.inneighbors(g::VSafeGraph, v) = LG.inneighbors(g.g, v)
LG.has_vertex(g::VSafeGraph, v) = LG.has_vertex(g.g, v) && !(v in g.deleted_vertices)
LG.has_edge(g::VSafeGraph, e) = LG.has_edge(g.g, e)
LG.add_vertex!(g::VSafeGraph) = LG.add_vertex!(g.g)
LG.rem_edge!(g::VSafeGraph, v1, v2) = LG.rem_edge!(g.g, v1, v2)
Base.copy(g::VSafeGraph) = VSafeGraph(copy(g.g), copy(g.deleteed_vertices))
For most of these, we only re-call the method on the inner graph type.
Only for LG.nv
, which computes the number of vertices in the inner graph,
minus the number of vertices in our removed list. Now the tricky parts,
adding an edge and removing a vertex, which require a bit more verifications:
function LG.add_edge!(g::VSafeGraph, v1, v2)
if !LG.has_vertex(g, v1) || !LG.has_vertex(g, v2)
return false
end
LG.add_edge!(g.g, v1, v2)
end
function LG.rem_vertex!(g::VSafeGraph, v1)
if !LG.has_vertex(g, v1) || v1 in g.deleted_vertices
return false
end
for v2 in LG.outneighbors(g, v1)
LG.rem_edge!(g, v1, v2)
end
for v2 in LG.inneighbors(g, v1)
LG.rem_edge!(g, v2, v1)
end
push!(g.deleted_vertices, v1)
return true
end
Instead of removing the vertex v1
from the inner graph, the function removes
all edges pointing to and from v1
, and then adds it to the removed list.
Specific and generic tests
So far so good, we can add some basic tests to check our type behaves as expected:
@testset "Graph construction and basic interface" begin
nv = 20
g1 = VSafeGraph(nv)
@test LG.nv(g1) == nv
@test LG.nv(g1.g) == nv
g2_inner = LG.CompleteGraph(nv)
g2 = VSafeGraph(g2_inner)
@test LG.nv(g2) == LG.nv(g2_inner)
@test LG.ne(g2) == LG.ne(g2_inner)
@test all(sort(collect(LG.vertices(g2))) .== sort(collect(LG.vertices(g2_inner))))
g3 = VSafeGraph(LG.CompleteDiGraph(30))
@test LG.is_directed(g3)
@test !LG.is_directed(g2)
end
@testset "Vertex deletion" begin
Random.seed!(33)
nv = 45
inner = LG.CompleteGraph(nv)
g = VSafeGraph(inner)
@test LG.ne(inner) == LG.ne(g)
@test LG.nv(inner) == LG.nv(g)
nrm = 0
for _ in 1:15
removed_ok = LG.rem_vertex!(g, rand(1:nv))
if !removed_ok
continue
end
nrm += 1
@test LG.nv(inner) == nv
@test LG.nv(g) == nv - nrm
@test length(g.deleted_vertices) == nrm
@test LG.ne(inner) == LG.ne(g)
end
end
So far so good. Now, with the promise of generic graphs and the AbstractGraph interface, we should be able to use any algorithm in LightGraphs.jl, let us try to compute a page rank and a Kruskal minimum spanning tree:
nv = 45
inner = LG.CompleteGraph(nv)
g = VSafeGraph(inner)
removed_ok = LG.rem_vertex!(g, rand(1:nv))
@test removed_ok
# LG broken here
@test_throws BoundsError LG.pagerank(g)
@test_throws BoundsError LG.kruskal_mst(g)
Yikes, what’s happening here? Many parts of LightGraphs.jl use vertices computed
from vertices(g)
as indices for structures indexed by them. So if you remove
vertex 4 in a 6-vertex graph, vertices will be {1,2,3,5,6}
, and the rank
algorithm will try to access the 6th rank, even though only 5 exist.
Fixes and proposal
It would be too bad to throw the interface altogether, but we need to do
something for the broken behaviour. The underlying assumption here is that
vertices behave like indices for anything vertex-related.
So the way we implement this interface for VSafeGraph
is correct, but the
implicit contract is not, the way it is used in algorithms such as pagerank
and Kruskal leak the underlying implementation for SimpleGraph
:
a contiguous list of integers from 1 to the number of vertices.
It reminds me of this great talk
on paying attention to the contract of an interface in Go, the type is telling
you what to expect in and out, but not how it is supposed or will be used.
The first fix is to make vertices
return 1:nv(g)
for VSafeGraph, but if you
think about it, it means it needs to do such with any graph type, which means
the vertices
function is redundant with other functions of the interface and
should not be mandatory. The other option is to fix breaking code to really
use the interface signalled and documented and not the leaked implementation.
We still have some good news though:
- Changing the code here is strictly non-breaking, since we would just remove the assumption that vertices are indices.
- If we want to keep this assumption for some pieces of code, it means these
pieces are not generic but specialized, something we can handle well using either
dispatch on types or traits, which LightGraphs.jl already does. There is a
IsDirected
trait associated with the fact that a graph is directed or not, there could also be aHasContiguousVertices
trait signalling whether this assumption is validated for a type.
Edit: refined proposal
Following some discussions with fellow LightGraphs.jl developers and users, a softer transition could be:
- Add the functions
vertex_indices(g)
andvertex_values(g)
to the interface,vertex_values
could default tovertex_indices
, which could itself default on1:nv(g)
. - Deprecate
vertices(g)
, with a fallback tovertex_indices
. - Replace all calls to
vertex
with eithervertex_indices
orvertex_values
depending on which makes sense for the use case.
This change is non-breaking and only deprecating vertices
, making the
interface more explicit. By keeping the two functions, we avoid having to use
enumerate(vertices_values(g))
every time we need indices.
Edit 2: Corrections to the functions
I have corrected various functions following Pankaj’s much needed Pull Request on the corresponding repository, thanks!
Edit 3
Seth Bromberger spotted an error in my assumptions, Swap-and-pop is used for vertex removal, so the last vertex will take the place of the removed one in the re-labelling.