This repository contains an object oriented C++ library for representing and manipulating graph data structures. The library is designed to be flexible and extensible, allowing for different implementations of the graph data structure to be used depending on the specific use case. The core interface for the graph data structure is defined in the ESE::Graph class, which is an abstract base class that defines the core methods for accessing and manipulating graph data. The concrete implementations of the ESE::Graph class are ESE::CSRGraph, ESE::ListGraph, and ESE::EdgeListGraph, which use different data structures to store the graph data and provide different trade-offs in terms of performance and flexibility.
A graph is a data structure that consists of a set of nodes and a set of edges that connect the nodes. connections may either be directed (i.e. the two ends of the edge can be distinguished from each other) or undirected ( if A=>B is an edge, then B=>A is also an edge). The ESE graph library is designed to represent directed graphs, but can be used to represent undirected graphs by treating each undirected edge as two directed edges in opposite directions.
The repesentation makes a distinction between "root nodes", which are assumed to have no incoming edges and one outgoing edge, "sink nodes", which are assumed to have no outgoing edges, and "dynamic nodes", which may have any number of incoming and outgoing edges. This specialisation is a common pattern for trees and tree-like graphs, and allows for some optimisations in the graph data structure and algorithms.
The library uses a simple integer-based node id system to represent the nodes in the graph. The node ids are -n_root_nodes to 0 (exclusive) for root nodes, from 0 to n_dynamic_nodes (exclusive) for dynamic nodes and from n_dynamic_nodes to n_dynamic_nodes+n_sink_nodes (exclusive) for sink nodes. In other words node numbering is assumed to be contiguous and ordered, with all root nodes having negative node ids, all dynamic nodes having non-negative node ids, and all sink nodes having non-negative node ids higher than the dynamic nodes. This is primarily a convenience for systems in which the graph data structure is used as a "processing graph" for some kind of pipeline, where the root nodes represent the "input nodes" of the pipeline, the dynamic nodes represent the "processing nodes" of the pipeline, and the sink nodes represent the "output nodes" of the pipeline.
Edges may be globally distinguished (i.e. with a unique edge id) distinguished for each output or destination node (i.e. with an edge index which is unique for each output/destination node), or not distinguished at all. The library does not make any assumptions about the edge representation, but the concrete implementations of the ESE::Graph class provide different trade-offs in terms of performance and flexibility for different edge representations.
Given a graph topology, one unique generic representation is the adacency matrix, which is a 2D matrix where the entry at row i and column j is equal to the edge id if there is an edge from node i to node j, and equal to some default value otherwise. This representation is relative simple and easy to understand, but can be inefficient for sparse graphs (i.e. graphs with few edges relative to the number of nodes) as it requires storing a large number of 0s. The concrete adjacency basedimplementations here use more space and access efficient representations of this matrix.
The list of lists representation stores a list indexed by the output node id, where each entry in the list is a itself a list of the destination node ids of the edges from that output node (and optionally the edge ids).
The Compressed Sparse Row/Yale representation further compresses the adjacency matrix by storing the destination node ids of the edges in a single list, and storing a separate list of pointers to the start of each output node's edges in the destination node id list. This allows for efficient iteration over the edges of a particular output node, as well as efficient iteration over the global edge list, but can be less efficient for adding or removing edges from the graph as it requires shifting the destination node ids in the list.