Lifting
Douglas Gregor
Lifting
   Lifting Basic Types
   Lifting Containers
   Lifting Iteration
   Review
The lifting process is the first and most important part of
the Generic Programming process. Lifting seeks to discover a generic
algorithm by answering the following fundamental question:
What are the minimal requirements that my data types need to fulfill
for the algorithm to operate correctly and efficiently?
Lifting Basic Types
For trivial algorithms it is occasionally possible to guess the
minimal requirements and directly implement a generic
algorithm. However, for any real algorithm, lifting is an iterative
process that begins with multiple, concrete implementations of the
same algorithm. For instance, consider the following two
implementations of summation in C++. We will use C++ for the examples
in this tutorial, because it is the most readily-available language
that supports Generic Programming well.
int sum(int* array, int n) {
int result = 0;
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
|
float sum(float* array, int n) {
float result = 0;
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
|
These two functions are nearly identical, except that the
left-hand sum applies to integers while the
right-hand sum applies to floating point values. We can
now lift these two algorithms to create a single, generic
algorithm that can provide the functionality of both versions
of sum. To do so, we replace int and float
with a type parameter T, using C++ templates:
template
T sum(T* array, int n) {
T result = 0;
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
The template can be interpreted as "for
all types T", but this is not strictly true. When we lift from the
concrete implementations of sum to this generic
implementation, we need to specify which operations the
type T must support to be used with sum. To find
these requirements, we look for operations in the body of the
function that use type T. In the following figure, we have
highlighted each of these operations in red:
template
T sum(T* array, int n) {
T result = 0;
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
So, to use sum to a given type T, we need to be
able to initialize it with 0 (first red line), add two values of
type T (with operator +), then assign the result to
a T (with operator =); finally, we need to be able
to return a value of type T from the function, which requires
a copy constructor. Our generic sum works with many types,
including (of course) int and float, but
also long, double, or any other type that meets the
requirements that we have spelled out.
To lift this algorithm further, we need to compare it against
another concrete implementation of summation. For instance, let's
consider a function that concatenates an array of strings
together:
template
T sum(T* array, int n) {
T result = 0;
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
|
std::string concatenate(std::string* array, int n) {
std::string result = "";
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
|
Again, these two algorithms are nearly identical, except that our
generic algorithm expects numeric types, e.g., ones that can be
initialize with zero, whereas the concatenation algorithm operates on
strings and is initialized with the empty string. How can we lift our
generic algorithm to account for strings? We need to abstract away the
initialization value, so that numeric types will be initialized with
zero whereas strings will be initialized with the empty string. In all
of these cases, the default constructor provides the proper value, so
we can revise our generic algorithm to and spell out the new
requirements:
// Requirements:
// T must have a default constructor that produces the identity value
// T must have an additive operator +
// T must have an assignment operator
// T must have a copy constructor
template
T sum(T* array, int n) {
T result = T();
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
Lifting Containers
Our generic algorithm now supports summation of numeric values,
concatenation of strings, etc. It could also support addition of
arrays of matrices and many other data types. However, it still only
supports C++ arrays. What about other data structures, such
as vectors or lists? Since vectors support
the same indexing scheme as arrays, we can lift our algorithm to
support arrays, vectors, and any other "container" data type
that supports the indexing operator []:
// Requirements:
// T must have a default constructor that produces the identity value
// T must have an additive operator +
// T must have an assignment operator
// T must have a copy constructor
// Container must have an indexing operator [] that returns a T
template<typename Container, typename T>
T sum(const Container& array, int n) {
T result = T();
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
Does our new generic algorithm work with a std::list?
Unfortunately, it does not, because std::list does not
provide an indexing operator. Let's assume that we could
change std::list to provide an indexing operator, although
doing so is strictly against the principles of Generic Programming: a
generic algorithm should not require types to have functionality that
cannot be "added on" without changing the definition of the
type. Nevertheless, if we modified std::list to provide an
indexing operator [], the generic sum algorithm
would operate correctly.
Lifting Iteration
There is yet one more problem with using std::list in
our sum algorithm. Consider how one would implement the
indexing operator for a linked list: to find the ith
element, one needs to step through i items in the list. This is
a linear-time operation. If we then call sum with our list,
summation will take quadratic time! The algorithm is correct,
but not efficient, meaning that we have either over-abstracted
(and, therefore, are implementing a more general but less efficient
algorithm) or we have abstracted poorly. Let's step back and compare
our array-based sum against an implementation
of sum for a linked list:
template
T sum(T* array, int n) {
T result = T();
for (int i = 0; i < n; ++i)
result = result + array[i];
return result;
}
|
template
struct list_node {
list_node* next;
T value;
};
struct linked_list {
list_node* start;
};
template
T sum(list_node list, int n) {
T result = 0;
for (list_node* current = list.start;
current != NULL; current = current->next)
result = result + current->value;
return result;
}
|
These two implementations have many more differences than other
pairs of implementations we have seen previously. For instance, the
iteration scheme is completely different, with a simple loop over
integers for arrays but a series of pointer jumps for linked
lists. Also, the role of the container itself is much smaller in the
linked-list version: once the start of the linked-list has been
extracted from list, the list object is never
referenced. This second difference is the key to finding a better
abstraction: we have tried (and failed) at making the container the
core abstraction, but what if iteration of a sequence is the core
abstraction? Can we re-cast the array-based sum in a way
that focuses more on the iteration than on the container? One could
write the following:
template
T sum(T* array, int n) {
T result = T();
for (T* current = array; current != array + n; ++current)
result = result + *current;
return result;
}
The two concrete algorithms now have roughly the same form: there
is something that we use to iterate over the values in the
container, with an operation to access the current element (e.g.,
dereferencing with * or retrieving the value
member), an operation to step to the next element (e.g.,
incrementing a pointer with ++, or jumping to
the next pointer of a linked list), and an operation to
test against the end of the sequence (NULL or array +
n). If we pull out these operations and give them names, we
can write a generic sum that works equally well on
array-based sequences and linked lists:
// Requirements:
// T must have an additive operator +
// T must have an assignment operator
// T must have a copy constructor
// I must have an inequality operator !=
// I must have a copy constructor
// I must have an operation next() that moves to the next value in the sequence
// I must have an operation get() that returns the current value (of type T).
template<typename I, typename T>
T sum(I start, I end, T init) {
for (I current = start; current != end; current = next(current))
init = init + get(current);
return init;
}
Now, we need only write the functions get()
and next() for both pointers and list_node
pointers. Then, our sum algorithm will operate efficiently
for linked lists, arrays, and any other data structure whose
elements can be accessed by traversing them linearly, so long
as get(), next(), and != can be
implemented efficiently.
template T* next(T* p) { return ++p; }
template list_node* next(list_node<&>* n) { return n->next; }
template T get(T* p) { return *p; }
template T get(list_node<&>* n) { return n->value; }
In this last lifting step, we introduced a new template type
parameter I. In the terminology of
the C++
Standard Template Library (STL), I is
an iterator. Iterators are an abstraction over the values in
a sequence, and can be used to access the elements stored in a
variety of containers, from dynamic arrays and linked lists to
balanced binary trees and hash tables. By separating out the
storage of values (containers) from access to these values
(iterators), one can write algorithms that operate on any
container.
Review
Our original implementations of sum supported summation of
simple, numeric data types stored in arrays. Through the lifting
process, we were able to widen the applicability to other operations
with an operator+, such as string concatenation. We were
also able to lift away the dependence on arrays, by bringing
iteration--not container access--into the requirements. Our
final sum algorithm can operate on many different kinds of
containers with many different kinds of data, while still retaining
the efficiency of a specialized, concrete implementation.
The lifting process of Generic Programming integrates many
concrete implementations of the same algorithm, teasing out the
minimal requirements that algorithms place on their parameters. Read
on to learn how the requirements extracted during lifting can be
combined and categorized into concepts,
providing descriptions of the core abstractions in a problem
domain.
|