An Introduction to Generic Programming

Lifting

Douglas Gregor

Lifting
   Lifting Basic Types
   Lifting Containers
   Lifting Iteration
   Review

The lifting process is the first and most important part of the Generic Programming process. Lifting seeks to discover a generic algorithm by answering the following fundamental question:

What are the minimal requirements that my data types need to fulfill for the algorithm to operate correctly and efficiently?

Lifting Basic Types

For trivial algorithms it is occasionally possible to guess the minimal requirements and directly implement a generic algorithm. However, for any real algorithm, lifting is an iterative process that begins with multiple, concrete implementations of the same algorithm. For instance, consider the following two implementations of summation in C++. We will use C++ for the examples in this tutorial, because it is the most readily-available language that supports Generic Programming well.

int sum(int* array, int n) {
  int result = 0;
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}
float sum(float* array, int n) {
  float result = 0;
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

These two functions are nearly identical, except that the left-hand sum applies to integers while the right-hand sum applies to floating point values. We can now lift these two algorithms to create a single, generic algorithm that can provide the functionality of both versions of sum. To do so, we replace int and float with a type parameter T, using C++ templates:

template
T sum(T* array, int n) {
  T result = 0;
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

The template can be interpreted as "for all types T", but this is not strictly true. When we lift from the concrete implementations of sum to this generic implementation, we need to specify which operations the type T must support to be used with sum. To find these requirements, we look for operations in the body of the function that use type T. In the following figure, we have highlighted each of these operations in red:

template
T sum(T* array, int n) {
  T result = 0;
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

So, to use sum to a given type T, we need to be able to initialize it with 0 (first red line), add two values of type T (with operator +), then assign the result to a T (with operator =); finally, we need to be able to return a value of type T from the function, which requires a copy constructor. Our generic sum works with many types, including (of course) int and float, but also long, double, or any other type that meets the requirements that we have spelled out.

To lift this algorithm further, we need to compare it against another concrete implementation of summation. For instance, let's consider a function that concatenates an array of strings together:

template
T sum(T* array, int n) {
  T result = 0;
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}
std::string concatenate(std::string* array, int n) {
  std::string result = "";
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

Again, these two algorithms are nearly identical, except that our generic algorithm expects numeric types, e.g., ones that can be initialize with zero, whereas the concatenation algorithm operates on strings and is initialized with the empty string. How can we lift our generic algorithm to account for strings? We need to abstract away the initialization value, so that numeric types will be initialized with zero whereas strings will be initialized with the empty string. In all of these cases, the default constructor provides the proper value, so we can revise our generic algorithm to and spell out the new requirements:

// Requirements:
//   T must have a default constructor that produces the identity value
//   T must have an additive operator +
//   T must have an assignment operator
//   T must have a copy constructor
template
T sum(T* array, int n) {
  T result = T();
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

Lifting Containers

Our generic algorithm now supports summation of numeric values, concatenation of strings, etc. It could also support addition of arrays of matrices and many other data types. However, it still only supports C++ arrays. What about other data structures, such as vectors or lists? Since vectors support the same indexing scheme as arrays, we can lift our algorithm to support arrays, vectors, and any other "container" data type that supports the indexing operator []:

// Requirements:
//   T must have a default constructor that produces the identity value
//   T must have an additive operator +
//   T must have an assignment operator
//   T must have a copy constructor
//   Container must have an indexing operator [] that returns a T
template<typename Container, typename T>
T sum(const Container& array, int n) {
  T result = T();
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}

Does our new generic algorithm work with a std::list? Unfortunately, it does not, because std::list does not provide an indexing operator. Let's assume that we could change std::list to provide an indexing operator, although doing so is strictly against the principles of Generic Programming: a generic algorithm should not require types to have functionality that cannot be "added on" without changing the definition of the type. Nevertheless, if we modified std::list to provide an indexing operator [], the generic sum algorithm would operate correctly.

Lifting Iteration

There is yet one more problem with using std::list in our sum algorithm. Consider how one would implement the indexing operator for a linked list: to find the ith element, one needs to step through i items in the list. This is a linear-time operation. If we then call sum with our list, summation will take quadratic time! The algorithm is correct, but not efficient, meaning that we have either over-abstracted (and, therefore, are implementing a more general but less efficient algorithm) or we have abstracted poorly. Let's step back and compare our array-based sum against an implementation of sum for a linked list:

template
T sum(T* array, int n) {
  T result = T();
  for (int i = 0; i < n; ++i)
    result = result + array[i];
  return result;
}
template
struct list_node {
  list_node* next;
  T value;
};
struct linked_list {
  list_node* start;
};
template
T sum(list_node list, int n) {
  T result = 0;
  for (list_node* current = list.start; 
       current != NULL; current = current->next)
    result = result + current->value;
  return result;
}

These two implementations have many more differences than other pairs of implementations we have seen previously. For instance, the iteration scheme is completely different, with a simple loop over integers for arrays but a series of pointer jumps for linked lists. Also, the role of the container itself is much smaller in the linked-list version: once the start of the linked-list has been extracted from list, the list object is never referenced. This second difference is the key to finding a better abstraction: we have tried (and failed) at making the container the core abstraction, but what if iteration of a sequence is the core abstraction? Can we re-cast the array-based sum in a way that focuses more on the iteration than on the container? One could write the following:

template
T sum(T* array, int n) {
  T result = T();
  for (T* current = array; current != array + n; ++current)
    result = result + *current;
  return result;
}

The two concrete algorithms now have roughly the same form: there is something that we use to iterate over the values in the container, with an operation to access the current element (e.g., dereferencing with * or retrieving the value member), an operation to step to the next element (e.g., incrementing a pointer with ++, or jumping to the next pointer of a linked list), and an operation to test against the end of the sequence (NULL or array + n). If we pull out these operations and give them names, we can write a generic sum that works equally well on array-based sequences and linked lists:

// Requirements:
//   T must have an additive operator +
//   T must have an assignment operator
//   T must have a copy constructor
//   I must have an inequality operator !=
//   I must have a copy constructor
//   I must have an operation next() that moves to the next value in the sequence
//   I must have an operation get() that returns the current value (of type T).
template<typename I, typename T>
T sum(I start, I end, T init) {
  for (I current = start; current != end; current = next(current))
    init = init + get(current);
  return init;
}

Now, we need only write the functions get() and next() for both pointers and list_node pointers. Then, our sum algorithm will operate efficiently for linked lists, arrays, and any other data structure whose elements can be accessed by traversing them linearly, so long as get(), next(), and != can be implemented efficiently.

template T* next(T* p) { return ++p; }
template list_node* next(list_node<&>* n) { return n->next; } 
template T get(T* p) { return *p; }
template T get(list_node<&>* n) { return n->value; } 

In this last lifting step, we introduced a new template type parameter I. In the terminology of the C++ Standard Template Library (STL), I is an iterator. Iterators are an abstraction over the values in a sequence, and can be used to access the elements stored in a variety of containers, from dynamic arrays and linked lists to balanced binary trees and hash tables. By separating out the storage of values (containers) from access to these values (iterators), one can write algorithms that operate on any container.

Review

Our original implementations of sum supported summation of simple, numeric data types stored in arrays. Through the lifting process, we were able to widen the applicability to other operations with an operator+, such as string concatenation. We were also able to lift away the dependence on arrays, by bringing iteration--not container access--into the requirements. Our final sum algorithm can operate on many different kinds of containers with many different kinds of data, while still retaining the efficiency of a specialized, concrete implementation.

The lifting process of Generic Programming integrates many concrete implementations of the same algorithm, teasing out the minimal requirements that algorithms place on their parameters. Read on to learn how the requirements extracted during lifting can be combined and categorized into concepts, providing descriptions of the core abstractions in a problem domain.

Previous: Introduction Author: Douglas Gregor Next: Concepts