The following document is a record of useful pointers and tips as I research the C++ Standard Template Library.
A variety of sources are used as I continually update this document. See the Resources section for a full list.
The primary resource is Stephan T. Lavavej’s STL video lectures over at Channel 9, and as such, this document will be arranged as individual video lecture notes.
The C++ Standard Template Library (STL) is a library, as opposed to a framework, for general computation focussed on speed and optimisation of general purpose tasks.
There are three main components to the STL: containers, iterators and algorithms. There are also functors. More on this later. Algorithms and containers are implemented without needing knowledge of each other, and only act on each other via iterators.
The STL extensively uses templates. Hence, when it gets instantiated by the compiler, everything is inline-able, with virtually no call overheads.
Containers, such as
vector, stores a sequence of elements of any valid type. These include your own classes, as long as they are copyable.
Vectors are flexible, expandable arrays which are contiguous in memory. They are also space efficient.
Since vectors are contiguous in memory, there will be times when more memory is required. In this case, a new contiguous memory block is allocated, and the existing elements are copied over. New elements can then be inserted to the new block; the old block is deleted.
Geometric memory allocation is used to do this, i.e. if there are N elements, vector will allocate memory for kN elements, where k > 1. Common implementations of the C++ STL use k = 1.5, or k = 2. This results in amortized constant time for adding elements.
Containers are flexible, so it can store arbitrary number of elements. This prevents security and correctness problems, like buffer and heap overflows. Each container knows its own size.
Undefined behaviour may occur if you try to access an element of a container, outside of the range of the number of elements, e.g.
The C++ standard does not specify what should happen in these cases.
Iterators point to elements within a container, in this case a vector. They are a generalisation of pointers, via operator overloading, abstracting away the implementations for different containters.
begin() returns an iterator to the first element of the vector.
end() returns an iterator to the element one past the end of the container. This is known as an inclusive-exclusive range. This makes empty ranges natural, i.e. an empty vector with have
end() iterators pointing to the same element. It also imitates how a normal array works. (pointer arithmetic)
We can include the algorithm header and use any of its algorithms.
The sort algorithm is templated on iterator type. As long as the container is of random access type, e.g. vector, we can use
sort() on it.
std::list is a doubly-linked list and is not of random access type, i.e. we cannot access the nth element of the list in constant time, so we cannot use
sort() on it.
If we have a vector of strings, by default, they are sorted lexicographically. But we can change this.
To preserve element orders when the sort algorithm cannot determine which element should appear first, e.g if the lengths of strings are equal, then we can use
std::stable_sort(). This is not the default as it may be less efficient than
sort() due to additional checks.
(More containers: deque, forward_list)
The associative containers in STL are
Map is templated on its element type, on both its key and value. Elements are automatically sorted.
If we have, as above, a
map<int, std::string>, each element has type
pair<const int, std::string>. We can also use iterators on maps.
Map is generally implemented as a semi-balanced binary tree, or a red-black tree. This ensures efficient operations, such as
To read an element from a constant reference of a map, we need to use
find(), instead of
Note that map is the only associative container which has
The set container is similar to map in terms of operation run times, but it enforces uniqueness. It also permits logarithmic time loookup, as opposed to linear lookup time with vector or map.
Unordered maps work in the exact same way as map, but elements are unordered upon insertion. However, its data structure is very different to map. It is in fact implemented as a hash table. Each bucket will be associated with a chain of values, to support collisions.
Insertion into an unordered map is expected to be constant time. In the case of a collision, the worst case is O(N), i.e. efficiency degrades.
Generally use maps for performance guarantees. Unordered maps can be used when performance requires it.
To do this, we need to use the erase-remove idiom.
Removing each unwanted element from the vector with a loop will have performance drawbacks. Firstly, vector will maintain contiguous block memory. So when one element is deleted, it will shift the remaining elements back, in linear time. This is for each unwanted element, so it will result in quadratic time.
Instead, we first use
std::remove(), and then
std::vector::erase(), which are part of the STL
<algorithm> header. These are non-member functions working on elements via iterators.
remove() will maintain two iterators to perform a linear pass on the vector to identify and move valid elements.
erase() then removes the unwanted elements at the end of the vector.
Introduced from C++0X, we use
unique_ptr in place of
delete. Shared pointers were originally in the Boost library.
auto_ptr has been officially deprecated.
A unique pointer is the unique owner of a dynamically allocated object. It is a smart pointer: a class which imitates a raw pointer via operator overloading (on
We did not call
unique_ptr will destroy the object for us once it goes out of scope.
Ownership of the object can be transferred with the
Unique pointers are not copy constructable – it is move constructable – but can be returned from functions. C++0X will automatically perform a move construction upon return.
Ownership transfer is implicit when unique pointer is returned from a method/function
Shared pointers share ownership to an object – they are reference counted shared pointers. Many shared pointers may point to the same object.
The pointed object is destroyed when all the shared pointers pointing to the object are destroyed, preventing resource leaks.
The reference counter is incremented when a new shared pointer points to the object. The reference counter is decremented when a shared pointer to the object is destroyed
The pointed object is deleted when the reference counter goes to zero.
make_shared is cleaner and a more efficient helper function.
Shared pointers may be used in STL containers. For example, you could declare a vector of shared pointers.
Solve the Nurikabe puzzle to demonstrate STL. Refer to the source file on the video link page.
class Grid - Stores state of puzzle, top-level member functions, write to file
class Region - keep track of all the contiguous regions
STL algorithms are usually templated on iterator types (to work with arbitrary containers with arbitrary elements) and predicates.
for_each: Iterate over all elements to perform an operation.
count: Counts the elements that match the passed value (
count_if: Counts elements that match a certain criterion via the given predicate.
find: Finds a matching element in linear time
find_if: Find an element that matches a certain criterion
equal: Compares sequences match
Note that lengths are not compared, we can check this manually if needed. The second sequence should contain at least as many elements as the first sequence; it is an error if otherwise.
all_of: Do all of the elements satisfy the criterion?
any_of: Do any of the elements satisfy the criterion?
none_of: Do none of the elements satisfy the criterion?
Functors can either be plain functions (non-member functions and we can get function pointers to them), or function objects (classes which overrides
operator() ); anything that can be called with “()”. Function objects can then either be hand-written or lambdas. Compilers tend to do a better job inlining function objects and lambdas.
copy: Copies from a range of input iterators into an output iterator. The output sequence container should have enough space the elements to be copied.
copy() can also be used to expand a container.
replace_if: Replaces a element that matches a certain criterion.
transform: Transforms the elements in a container
We can also use back inserters to copy elements from one container to another (as another way of performing the copy task in the previous Part)
Back inserters return an
insert_iterator. Dereferencing and incrementing an
insert_iterator adds a new element using
push_back. For example, passing to to a
back_inserter to copy will add elements.
Note that this example can be executed better via a range construct in a constructor
is more efficient than
back_insertor since it only calls
We can then sort this vector via STL
ostream_iterator writes to ostream, but is not needed in C++ 11 as lambdas provide a cleaner design.
std::binary_search is not particuarly useful as it just tells you the element is present or absent (it returns a
On the other hand,
std::lower_bound returns the first position where a value can be inserted and still maintain sorted order. If the element does not exist,
lower_bound returns the first position where the new element could be inserted, and still maintains sorted order.
begin() is returned if the element has to be inserted as the first element.
end() is returned if the element has to be inserted as the last element.
Note that binary search requires that the sequence is sorted beforehand.
std::upper_bound returns the last position where a value can be inserted and still maintain sorted order. (Remember include-exclude ranges.)
std::equal_range returns a pair of iterators, containing the lower bound and upper bound.
As we have seen so far, STL uses “less than” as the default comparators. Custom comparators should behave like “less than” (strict weak ordering), i.e. x compared to x should always return
false, otherwise STL will return invalid results.
The order of “same” elements is not guaranteed. Use
std::stable_sort if the order of “same” elements needs to be preserved.
std::partial_sort allows you to sort a small subset of the full collection.
If this sequence was sorted,
std::nth_element will return the nth_element.
set_intersections can be obtained for sequences.
mincan be used to obtain the least element
maxreturns the largest element
min_elementreturns in iterator to the least element
min_maxreturns the minimum and the maximum. This is more efficient than calling min and max separately.
Most algorithms are in
<algorithm> header file, but some are in
<numeric> (for generic numerical algorithms).
accumulate: Iterate to perform a binary operation value. Initial value can be specified here. The default version uses plus, but accumulate can be changed to use multiply to find the product.
iota: Used to initialize a sequence of increasing numbers. (C++OX only)
std::transform has two overloads:
Regular expressions are declarative string algorithms for flexible string processing. C++ regular expressions have a Perl type expression.
A few examples of regular expression uses include
To use regular expressions in C++, we must include the
<regex> header file, which include templated functions.
std::regex is acutally a typedef for
basic_regex<char> (regular strings), and
std::wregex is a typedef for
There are three main regular expression functions
regex_match: Does the whole regex match the entire string?
match_resultsand is templated on string type.
smatchis a typedef for results for regular strings
cmatchis a typedef for results for const strings
regex_search: Searches for substrings of the large string that match the specified regular expression. Returns the first occurance of the match. Never use
regex_search to iterate through each occurance of the large string; use
regex_iterator instead. This is because
regex_search does not handle zero-length string matches, which then may cause infinite loops.
sub_matchcontains iterators for the start and end of the match
regex_replace: Search and replace strings that match the regular expression. The replacements can be controlled at capture group level.
As mentioned previously,
regex_iterator may be used to look at all instances of the matching regular expression. Use
sregex_iterator for regular strings, and
wregex_iterator for unicode strings.
regex_token_iterator: Use to generate tokens from a string, e.g. if you are only interested in a subset of capture groups. This is the generalised form of
strtok from the C library.
Note that C++ will parse the regular expression string before passing it onto the regular expression engine. Hence, we must use double-backslashes to escape regular expression backslashes. Furthermore, regular expression tree generation from the string definition of a regular expression is time consuming. This generation should be done once and reused multiple times.
Further information: a
regex object is a pointer to a “Non-deterministic Finite Automaton”.
L-values are persistent named objects. R-values values are temporaries: objects which do not have a name (e.g. integer literals, or
x+y). Both apply to expressions, which can refer to objects; ojects themselves are not inherently l-values or r-values.
A key question to distinguish l-values and r-values are: “Can you take its address?”. If you can, it is an l-value. Otherwise, it is an r-value.
An example of an l-value reference is
int& r, and can be used to bind a reference to a temporary object (when passing an argument by reference).
An r-value reference behaves just like an l-value reference, but with subtle differences. It can bind two r-values specifically, and can never bind l-values.
There are two functions within the STL, namely
std::forward<T> (). R-value references support two radically different things
C++ uses the copy constructor whenever an object needs to be copied. This happens in cases like:
The problem here is that a copy constructor might sometimes result in redundant copies. For example:
results in a copy constructor creating an object on return from
MethodReturningValue(), and so another copy constructor is called for the assignment. This occurs in many cases, e.g. when using
std::vector, or your own classes with copy constructors and
operator= which dynamically allocates memory.
C++11 defines a move constructor so that this double copy can be avoided.
In the above example, the method returns a value with a copy constructor. Now the compiler knows that this is a temporary object so the second time it calls the move constructor. The move constructor does not copy the object, instead it just replaces the contents of the object.
A move constructor has a signature of:
Compare this to a copy constructor:
C++11 version of standard template library will use move constructor to reuse a temporary object (
std::vector). When objects of a class are saved by value in STL containers, defining the move constructor will improve performance as unnecessary copying will be avoided.
std::move takes a value or an object, and returns an r-value.
We need to do this since
other – an r-value reference – is an l-value (since it has a name).
As a word of warning, only use r-value references within the move semantics and perfect forward idioms, i.e. move constructors and
operator= move assignments.
We can unify the copy assignment operator and move assignment operator
Type traits enable template meta programming with template argument deduction.
type_traits header allows you to check the type of the typename in a template, and define methods that are appropriate for that type. For example,
false_typeare typedefs for two separate types (they are equivalent of bool in runtime code). Queries about types return
is_integral: is the type an integer? (
long long, etc)
is_floatingpoint: is it a floating point type?
is_pointer: is it a pointer type?
false_type can be used to define different overloads for a method. This way you can control the overload that gets called for different types.