LINQ: it can grow on trees


The IEnumerable interface underpins LINQ. It’s the contract that makes LINQ queries composable. I think the whole link thing is great (see earlier post) but is constrained to to dealing with flat data. Sure you can group but if you need groups nested three deep then you have to know this in advance. Hierarchical structures are not so convenient because they are arbitrarily deep and different nodes have different depths. So this post is about extending the LINQ approach to handle hierarchical data.

Getting flat list from a node hierarchy

Here’s an extension method which will traverse a hierarchical node structure and return a flat IEnumerable of nodes, filtered by a criteria. The method will appear in intellisense for an enumerable of any type . Because the method returns an IEnumerable it is composable.

The summary is almost as long as the method body so it’s easy to see what’s going on. The first line checks there are nodes to work with the nodes passed in are evaluated. A call to the select lambda is made and if the caller returns true the node is yielded up to be consumed by the caller. If not, the node is not returned. In this implementation it’s children are still processed – though you might like to change this behavior or add a parameter to control this aspect of the method’s behavior.

The extension method does not rely on a fixed type so does not know which property or method returns a list of child nodes. I guess it could use reflection to find a property implementing IEnumerable. However, i this implementation each node is passed to the children lambda callback so the caller (assumed to know how to find the list of child nodes) can return the appropriate list for further processing.

Assuming a simple class like:

public class Node
{
   public int Code = 0;
   public List<Node> Children = null;
}

And tree like:

  Node (Code=1)
    Node (Code=7)
      Node (Code=5)
      Node (Code=5)
        Node (Code=9)
    Node (Code=6)

The following line will recursively allow you to operate on each node:

// Use a Func<> if you need an answer like an 'Any' function for a tree
Action<Node> traverse = null;
traverse = (n) =>
{
    // Do the work here...
    Console.WriteLine(n.Code);

    // Then process children
    // (could be the other way around - process children then work - for example if building a composite string)
    n.Children.ForEach(traverse);
};
traverse(lastPart);

The following line will return a flat list containing 4 node instances:

var nodes = new List<Node> () { Node (Code=1) }.Traverse(n => n.Children, n => n.Code < 7).ToList();

Will return a list of 4 nodes. Here’s the method:

/// <summary>
/// A generic mechanism to traverse a structure of the form node.Children -> node.Children
/// </summary>
/// <typeparam name="T">The type of the nodes to traverse</typeparam>
/// [param name="nodes"]The nodes to traverse</param>
/// [param name="children"]A callback to allow the caller to select the sub-nodes to iterate</param>
/// [param name="select"]Present the current node to the caller so the caller can choose if this is a node to select</param>;
/// <returns>An IEnumerable of nodes which meet the criteria define by the select lambda</returns>
/// <remarks>The children callback must not use an OfType<> iterator</remarks>
public static IEnumerable<T> Traverse<T>(
	this IEnumerable<T> nodes, 
	Func<T, IEnumerable<T>> children, 
	Func<T, bool> select)
{
	if (children == null) throw new ArgumentException("The Traverse parameter 'children' cannot be null", "children");

	// Handle the scenario that the caller might return null to signal no more nodes
	foreach (var childNode in nodes)
	{
		if (select != null && select(childNode))
			yield return childNode;

		var childNodes = children(childNode);

		if (childNodes.IsNullOrEmpty()) continue;

		foreach (var inner in Traverse(childNodes, children, select))
			yield return inner; // Forces the inner yields up the line
	}
}

A handy extension method to check the state of an IEnumerable which is used in the Traverse() method:

public static bool IsNullOrEmpty<TK, TV>(this IEnumerable<T> list)
{
	return list == null || list.Count == 0;
}

Generate a composable node graph from a flat list

It’s often necessary to persist and retrieve hierarchical data. My preference is to use left/right or blue/red attributes on records held in a flat list. These can be generated and saved easily.

Here’s the hierarchy above stored as flat records extended to include left/right properties. These can easily be saved to a table and can have a hierarchy as deep as needed:

  Flat(Code=1, Left=1, Right=12)
    Flat (Code=7, Left=2, Right=9)
    Flat (Code=5, Left=3, Right=4)
    Flat (Code=5, Left=5, Right=8)
    Flat (Code=9, Left=6, Right=7)
    Flat (Code=6, Left=10, Right=11)

The following method converts it to a graph of nodes

/// <summary>
/// Generic method to create a graph of nodes based on a user defined scheme for defining the hierarchy in a flat list
/// </summary>
/// <typeparam name="F">An element in array/list/set/queue holding the '(f)lat' records</typeparam>
/// <typeparam name="H">A node in the (h)ierarchy</typeparam>
/// <param name="list">The list of flat items</param>
/// <param name="transform">A caller defined method which returns the children for the node or null</param>
/// <param name="parent">The parent for the current node.  Will be null for the first level</param>
/// <param name="ileft">Allow the caller to provide the left value of the item</param>
/// <param name="iright">Allow the caller to provide the right value of the item</param>
/// <param name="pleft">Allow the caller to provide the left value of the parent</param>
/// <param name="pright">Allow the caller to provide the right value of the parent</param>
/// <param name="nodeInParent">A caller defined method to determine if the current node is with the
/// parent based on the hierarchy scheme used by the user.</param>
/// <returns></returns>
public static IEnumerable<H> CreateTree<F, H>(
	this Queue<F> list,
	Func<F, IEnumerable<H>, H> transform,
	F parent,
	Func<F,int> ileft,
	Func<F, int> iright,
	Func<F,int> pleft,
	Func<F, int> pright,
	Action<H, IEnumerable<H>> assignParent
)
{
	if (ileft == null) throw new ArgumentNullException("ileft");
	if (iright == null) throw new ArgumentNullException("iright");
	if (pleft == null) throw new ArgumentNullException("pleft");
	if (pright == null) throw new ArgumentNullException("pright");
	if (transform == null) throw new ArgumentNullException("transform");

	while (list.Count > 0)
	{
		// Work out if the next one is in this node group or not
		F item = list.Peek();
		int itemLeft = ileft(item);
		int itemRight = iright(item);
		int parentLeft = 0;
		int parentRight = 0;

		if (parent != null)
		{
			parentLeft = pleft(parent);
			parentRight = pright(parent);
		}

		int n = parent != null &&
				(itemLeft < parentLeft ||
					itemLeft > parentRight ||
					itemRight > parentRight ||
					itemRight < parentLeft)
				? 0
				: itemRight - itemLeft;

		if (n == 0) break;

		list.Dequeue();

		IEnumerable<H> nodes = (n > 1)
			? CreateTree(list, transform, item, ileft, iright, pleft, pright, assignParent).ToList()
			: null;

		H node = transform(item, nodes);
		if (assignParent != null && nodes != null)
			assignParent(node, nodes);

		yield return node;
	}
}

Create a flat list with left/right nodes from a node hierarchy

The last step is to create a method able to generate a flat list from an arbitrary hierarchy of nodes. Again, the extension method will know nothing about the classes used in the hierarchy or about the classes used to hold flattened information or about how the caller will choose to use the flattened information. All it will do is maintain a pair of left/right values for each node in the hierarchy and let the caller figure out what to do with this information.

/// <summary>
/// Flattens a node hierarchy.  The extension method will know nothing about the 
/// classes used in the hierarchy or about the classes used to hold flattened 
/// information or about how the caller will choose to use the flattened information.
/// All it will do is maintain a pair of left/right values for each node in the
/// hierarchy and let the caller figure out what to do with this information.
/// </summary>
/// <typeparam name="F">A element in array/list/set/queue holding the '(f)lat' records</typeparam>
/// <typeparam name="H">A node in the (h)ierarchy</typeparam>
/// [param name="nodes">The nodes to traverse</param>
/// [param name="number">The starting number for the left/right pair</param>
/// [param name="children">A callback to allow the caller the select the sub-nodes to iterate</param>
/// [param name="createFlattenedInstance">A callback to allow the caller to create an instance of 
/// type F and use the left/right information as appropriate</param>
/// <returns>IEnumerable<KeyValuePair<F,int>></returns>
/// <remarks>Ideally the return would only be IEnumerable&lt;F&gt;. 
/// There needs to be a way to return the current 'number' and ref 
/// parameters are not allowed when using IEnumerable return values.
/// So the value property of each KeyValuePair instance is used to
/// record the 'right' number so the parent is able to find out the
/// largest number.</remarks>
public static IEnumerable<KeyValuePair<F,int>> Flatten<F, H>(
	this IEnumerable<H> nodes, 
	int number, 
	Func<H, IEnumerable<H>> children, 
	Func<H,F> createFlattenedInstance, 
	Action<F, int> setLeft, 
	Action<F, int> setRight)
{
	if (children == null)
		throw new ArgumentException("The Flatten parameter 'children' cannot be null", "children");
	if (createFlattenedInstance == null)
		throw new ArgumentException("The Flatten parameter 'createFlattenedInstance' cannot be null", "createFlattenedInstance");
	if (setLeft == null)
		throw new ArgumentException("The Flatten parameter 'setLeft' cannot be null", "setLeft");
	if (setRight == null)
		throw new ArgumentException("The Flatten parameter 'setRight' cannot be null", "setRight");

	// Handle the scenario that the caller might return null to signal no more nodes
	foreach (var node in nodes)
	{
		number++;
		var f = createFlattenedInstance(node);
		setLeft(f, number);

		var childNodes = children(node);

		if (!childNodes.IsNullOrEmpty())
		{
			foreach (var inner in Flatten(childNodes, number, children, createFlattenedInstance, setLeft, setRight))
			{
				// Record the largest number among child nodes
				if (inner.Value > number) number = inner.Value;
				yield return inner; // Forces the inner yields up the line
			}
		}

		number++;
		setRight(f, number);

		yield return new KeyValuePair<F, int>(f, number);
	}
}

Information and Links

Join the fray by commenting, tracking what others have to say, or linking to it from your blog.


Other Posts

Write a Comment

Take a moment to comment and tell us what you think. Some basic HTML is allowed for formatting.

Reader Comments

Be the first to leave a comment!