Introduction
The following document is a record of useful pointers and tips as I research the C++ Standard Template Library.
A variety of sources are used as I continually update this document. See the Resources section for a full list.
The primary resource is Stephan T. Lavavej's STL video lectures over at Channel 9, and as such, this document will be arranged as individual video lecture notes.
Part 1: Containers, Iterators and Algorithms
The C++ Standard Template Library (STL) is a library, as opposed to a framework, for general computation focussed on speed and optimisation of general purpose tasks.
There are three main components to the STL: containers, iterators and algorithms. There are also functors. More on this later. Algorithms and containers are implemented without needing knowledge of each other, and only act on each other via iterators.
The STL extensively uses templates. Hence, when it gets instantiated by the compiler, everything is inline-able, with virtually no call overheads.
Sequence containers
Containers, such as vector
, stores a sequence of elements of any valid type. These include your own classes, as long as they are copyable.
Vectors and vector memory management
Vectors are flexible, expandable arrays which are contiguous in memory. They are also space efficient.
std::vector v; // Default constructor
v.push_back(11);
v.push_back(22);
v.push_back(33);
Since vectors are contiguous in memory, there will be times when more memory is required. In this case, a new contiguous memory block is allocated, and the existing elements are copied over. New elements can then be inserted to the new block; the old block is deleted.
Geometric memory allocation is used to do this, i.e. if there are N elements, vector will allocate memory for kN elements, where k > 1. Common implementations of the C++ STL use k = 1.5, or k = 2. This results in amortized constant time for adding elements.
Containers are flexible, so it can store arbitrary number of elements. This prevents security and correctness problems, like buffer and heap overflows. Each container knows its own size.
v.size();
Undefined behaviour may occur if you try to access an element of a container, outside of the range of the number of elements, e.g.
v[5] = 55;
The C++ standard does not specify what should happen in these cases.
Accessing a vector via an iterator
Iterators point to elements within a container, in this case a vector. They are a generalisation of pointers, via operator overloading, abstracting away the implementations for different containters.
vector::iterator i = v.begin() + 1;
std::cout << *i << std::endl; // 22
begin()
returns an iterator to the first element of the vector. end()
returns an iterator to the element one past the end of the container. This is known as an inclusive-exclusive range. This makes empty ranges natural, i.e. an empty vector with have begin()
and end()
iterators pointing to the same element. It also imitates how a normal array works. (pointer arithmetic)
Using algorithms on iterators
We can include the algorithm header and use any of its algorithms.
#include
...
std::sort(v.begin(), v.end());
The sort algorithm is templated on iterator type. As long as the container is of random access type, e.g. vector, we can use sort()
on it. std::list
is a doubly-linked list and is not of random access type, i.e. we cannot access the nth element of the list in constant time, so we cannot use sort()
on it.
If we have a vector of strings, by default, they are sorted lexicographically. But we can change this.
#include
#include
#include
...
std::vector v;
v.push_back("cat");
v.push_back("antelope");
v.push_back("puppy");
v.push_back("bear");
std::sort(v.begin(), v.end()); // Lexicographical order
std::sort(v.begin(), v.end(), std::greater()); // Reverse lexicographical order
// Using lambdas ...
std::sort(v.begin(), v.end(),
[](const std::string& left, const std::string& right) {
return left.size() < right.size(); // Soft by increasing string length
});
To preserve element orders when the sort algorithm cannot determine which element should appear first, e.g if the lengths of strings are equal, then we can use std::stable_sort()
. This is not the default as it may be less efficient than sort()
due to additional checks.
(More containers: deque, forward_list)
Part 2: Associate Containers
The associative containers in STL are map
, multimap
, set
, and multiset
(C++03), unordered_map
, unordered_multimap
, unordered_set
, and unordered_multiset
(C++0X).
Map
Map is templated on its element type, on both its key and value. Elements are automatically sorted.
#include
If we have, as above, a map<int, std::string>
, each element has type pair<const int, std::string>
. We can also use iterators on maps.
// Using const iterators and C++11 auto pointer
// in place of std::map::iterator
for(auto i = m.cbegin(); i != m.cend(); ++i) {
std::cout << "(" << i->first << ", " << i->second << ")" << std::endl;
}
Map is generally implemented as a semi-balanced binary tree, or a red-black tree. This ensures efficient operations, such as
- Insertion: O(log N)
- Deletion: O(log N)
- Iteration: O(N)
- Destroy: O(N)
To read an element from a constant reference of a map, we need to use find()
, instead of operator[]
.
void foo(const map& m) {
auto i = m.find(3);
if(i == m.end()) {
std::cout << "not found" << std::endl;
} else {
std::cout << i->second << std::endl;
}
}
Note that map is the only associative container which has operator[]
.
The set container is similar to map in terms of operation run times, but it enforces uniqueness. It also permits logarithmic time loookup, as opposed to linear lookup time with vector or map.
Unordered Map
Unordered maps work in the exact same way as map, but elements are unordered upon insertion. However, its data structure is very different to map. It is in fact implemented as a hash table. Each bucket will be associated with a chain of values, to support collisions.
Insertion into an unordered map is expected to be constant time. In the case of a collision, the worst case is O(N), i.e. efficiency degrades.
Generally use maps for performance guarantees. Unordered maps can be used when performance requires it.
Deleting multiple entries from a vector
To do this, we need to use the erase-remove idiom.
#include
#include
...
std::vector v;
v.push_back(11);
v.push_back(22);
v.push_back(33);
v.push_back(44);
v.push_back(55);
Removing each unwanted element from the vector with a loop will have performance drawbacks. Firstly, vector will maintain contiguous block memory. So when one element is deleted, it will shift the remaining elements back, in linear time. This is for each unwanted element, so it will result in quadratic time.
Instead, we first use std::remove()
, and then std::vector::erase()
, which are part of the STL <algorithm>
header. These are non-member functions working on elements via iterators.
remove()
will maintain two iterators to perform a linear pass on the vector to identify and move valid elements. erase()
then removes the unwanted elements at the end of the vector.
v.erase(remove_if(v.begin(), v.end(),
[](int e) {
return e % 2 == 1;
}), v.end());
Part 3: Smart Pointers
Introduced from C++0X, we use shared_ptr
and unique_ptr
in place of new
and delete
. Shared pointers were originally in the Boost library.
Note that auto_ptr
has been officially deprecated.
Unique Pointers
A unique pointer is the unique owner of a dynamically allocated object. It is a smart pointer: a class which imitates a raw pointer via operator overloading (on *
and ->
).
#include
...
std::unique_ptr up(new int(1729));
std::cout << *up << std::endl;
We did not call delete
since unique_ptr
will destroy the object for us once it goes out of scope.
Ownership of the object can be transferred with the std::move()
function.
std::unique_ptr up2 = std::move(up);
Unique pointers are not copy constructable -- it is move constructable -- but can be returned from functions. C++0X will automatically perform a move construction upon return.
Ownership transfer is implicit when unique pointer is returned from a method/function
Shared Pointers
Shared pointers share ownership to an object -- they are reference counted shared pointers. Many shared pointers may point to the same object.
std::shared_ptr sp(new int(1234));
std::shared_ptr sp2(sp); // Using copy constructor
std::cout << *sp << std::endl;
std::cout << *sp2 << std::endl;
The pointed object is destroyed when all the shared pointers pointing to the object are destroyed, preventing resource leaks.
The reference counter is incremented when a new shared pointer points to the object. The reference counter is decremented when a shared pointer to the object is destroyed
The pointed object is deleted when the reference counter goes to zero.
auto sp = make_shared(1729);
make_shared
is cleaner and a more efficient helper function.
Shared pointers may be used in STL containers. For example, you could declare a vector of shared pointers.
Part 4 & 5: Nurikabe Solver
Solve the Nurikabe puzzle to demonstrate STL. Refer to the source file on the video link page.
class Grid - Stores state of puzzle, top-level member functions, write to file
class Region - keep track of all the contiguous regions
Part 6: Algorithms and Functors
STL algorithms are usually templated on iterator types (to work with arbitrary containers with arbitrary elements) and predicates.
Non-modifying/non-mutating algorithms
for_each
: Iterate over all elements to perform an operation.count
: Counts the elements that match the passed value (==
).
#include
...
std::vector v;
v.push_back(11);
v.push_back(22);
v.push_back(33);
v.push_back(44);
v.push_back(55);
std::cout << std::count(v.begin(), v.end(), 33) << std::endl; // 1
count_if
: Counts elements that match a certain criterion via the given predicate.
bool even(int n) {
return n % 2 == 0;
}
...
std::cout << std::count_if(v.begin(), v.eng(), even) << std::endl; // 2
// or using lambdas
std::cout << std::count_if(v.begin(), v.end(),
[](int n) { return n % 2 == 0 }) << std::endl; // 2
find
: Finds a matching element in linear timefind_if
: Find an element that matches a certain criterionequal
: Compares sequences match
std::equal(first.begin(), first.end(), second.begin());
Note that lengths are not compared, we can check this manually if needed. The second sequence should contain at least as many elements as the first sequence; it is an error if otherwise.
all_of
: Do all of the elements satisfy the criterion?any_of
: Do any of the elements satisfy the criterion?none_of
: Do none of the elements satisfy the criterion?
Functors
Functors can either be plain functions (non-member functions and we can get function pointers to them), or function objects (classes which overrides operator()
); anything that can be called with "()". Function objects can then either be hand-written or lambdas. Compilers tend to do a better job inlining function objects and lambdas.
Modifying/Mutating algirithms
copy
: Copies from a range of input iterators into an output iterator. The output sequence container should have enough space the elements to be copied.
std::copy(v.begin(), v.end(), u.begin());
copy()
can also be used to expand a container.
// WRONG - vector does not know it needs to expand (it is empty)
std::vector dest;
std::copy(v.begin(), v.end(), dest.begin());
// RIGHT
std::vector dest(5);
std::copy(v.begin(), v.end(), dest.begin());
replace_if
: Replaces a element that matches a certain criterion.transform
: Transforms the elements in a container
Part 7: Algorithms and Sorting
We can also use back inserters to copy elements from one container to another (as another way of performing the copy task in the previous Part)
const int arr[] = (22, 99, 33, 44, 55);
vector v;
v.push_back(11);
v.push_back(77);
// Simple print out loop
for(auto i = v.begin(); i != v.end(); ++i) {
std::cout << *i << " ";
}
std::cout << std::endl;
std::copy(arr, arr + sizeof(arr) / sizeof(arr[0]), std::back_inserter(v));
// Print out via for_each and lambdas
for_each(v.begin(), v.end(), [](int n) { std::cout << n << " "; });
std::cout << std::endl;
Back inserters return an insert_iterator
. Dereferencing and incrementing an insert_iterator
adds a new element using push_back
. For example, passing to to a back_inserter
to copy will add elements.
Note that this example can be executed better via a range construct in a constructor
std::vector v(arr, arr + sizeof(arr) / sizeof(arr[0]));
However,
v.insert(v.end(), arr, arr + sizeof(arr) / sizeof(arr[0]))
is more efficient than back_insertor
since it only calls push_back
once.
We can then sort this vector via STL
sort(v.begin(), v.end());
// Another way to print out
copy(v.begin(), v.end(), ostream_iterator(std::cout, " "));
std::cout << std::endl;
Writing to ostream_iterator
writes to ostream, but is not needed in C++ 11 as lambdas provide a cleaner design.
Binary Search
std::binary_search
is not particuarly useful as it just tells you the element is present or absent (it returns a bool
).
On the other hand, std::lower_bound
returns the first position where a value can be inserted and still maintain sorted order. If the element does not exist, lower_bound
returns the first position where the new element could be inserted, and still maintains sorted order.
auto i = lower_bound(v.begin(), e.end(), 50);
std::cout << i - v.begin() << std::endl; // 4
begin()
is returned if the element has to be inserted as the first element. end()
is returned if the element has to be inserted as the last element.
Note that binary search requires that the sequence is sorted beforehand.
std::upper_bound
returns the last position where a value can be inserted and still maintain sorted order. (Remember include-exclude ranges.)
std::equal_range
returns a pair of iterators, containing the lower bound and upper bound.
As we have seen so far, STL uses "less than" as the default comparators. Custom comparators should behave like "less than" (strict weak ordering), i.e. x compared to x should always return false
, otherwise STL will return invalid results.
The order of "same" elements is not guaranteed. Use std::stable_sort
if the order of "same" elements needs to be preserved. std::partial_sort
allows you to sort a small subset of the full collection.
If this sequence was sorted, std::nth_element
will return the nth_element.
Set Operations
set_union
andset_intersection
s can be obtained for sequences.min
can be used to obtain the least elementmax
returns the largest elementmin_element
returns in iterator to the least elementmin_max
returns the minimum and the maximum. This is more efficient than calling min and max separately.
Most algorithms are in <algorithm>
header file, but some are in <numeric>
(for generic numerical algorithms).
accumulate
: Iterate to perform a binary operation value. Initial value can be specified here. The default version uses plus, but accumulate can be changed to use multiply to find the product.iota
: Used to initialize a sequence of increasing numbers. (C++OX only)
std::vector v(10);
iota(v.begin(), v.end(), 0); // {0,1,...,9}
Transformations
std::transform
has two overloads:
- unary: Perform a unary operation on an input sequence and write to an output sequence. In place transformation can be carried out by passing the same sequence as input as well as output.
std::transform(v.begin(), v.end(), result, unary_operation);
- binary: Takes elements from the first and the second sequences, performs a binary operation and then writes the results to the output sequence. Lets you perform pair wise transformation on two sequences.
std::transform(v.begin(), v.end(), u.begin(), result, binary_operation);
Part 8: Regular Expressions in C++11
Regular expressions are declarative string algorithms for flexible string processing. C++ regular expressions have a Perl type expression.
A few examples of regular expression uses include
- Validation: is this input well-formed? (serial numbers)
- Decision: what set does this string belong to? (
.jpg
or.png
) - Parsing: what is in this input? (year)
- Transformation: format strings for output or further procession. (escape speciall characters)
- Iteration: find each occurrence of a pattern within a string. (iterate through URLs)
- Tokenisation: systematically split apart a string. (break a string into whitespace-separated words)
To use regular expressions in C++, we must include the <regex>
header file, which include templated functions. std::regex
is acutally a typedef for basic_regex<char>
(regular strings), and std::wregex
is a typedef for basic_regex<wchar_t>
(unicode).
#include
...
const std::regex r("ab+c");
for(std::string s; getline(std::cin,s); ) {
std::cout << regex_match(s,r) << std::endl;
}
There are three main regular expression functions
-
regex_match
: Does the whole regex match the entire string?- We can construct an object which represents the currently matched component of the string. It is of class
match_results
and is templated on string type. smatch
is a typedef for results for regular stringscmatch
is a typedef for results for const strings
- We can construct an object which represents the currently matched component of the string. It is of class
const std::regex r("a(b+)(c+)d");
for(std::string s; getline(cin,s); ) {
std::smatch m;
const bool b = std::regex_match(s,m,r);
std::cout << b << std::endl;
if(b) {
std::cout << m[1] << std::endl; // Substring that matches the first capture group
std::cout << m[2] << std::endl; // Substring that mathces the second capture group
}
}
-
regex_search
: Searches for substrings of the large string that match the specified regular expression. Returns the first occurance of the match. Never useregex_search
to iterate through each occurance of the large string; useregex_iterator
instead. This is becauseregex_search
does not handle zero-length string matches, which then may cause infinite loops.sub_match
contains iterators for the start and end of the match- The string can be extracted from the individual matches
const std::regex r("c+d+e+");
for(std::string s; getline(cin,s); ) {
smatch m;
const bool b = std::regex_search(s,m,r);
std::cout << b << std::endl;
if(b) {
std::cout << m[0] << std::endl; // Whole substring that matches
// m[0].first // Iterator to the match
// m[0].first - s.begin() // Starting index of match
// m[0].second - s.begin() // Ending index of match
}
}
regex_replace
: Search and replace strings that match the regular expression. The replacements can be controlled at capture group level.
const std::regex r("c+d+e+");
const string fmt("[$3]-[$2]-[$1]");
for(std::string s; getline(cin,s); ) {
std::cout << std::regex_replace(s,r,fmt) << std::endl;
}
As mentioned previously, regex_iterator
may be used to look at all instances of the matching regular expression. Use sregex_iterator
for regular strings, and wregex_iterator
for unicode strings.
const std::regex r("c+d+e+");
for(std::string s; getline(cin,s); ) {
for(std::sregex_iterator i(s.begin(), s.end(), r), end; i != end; ++i ) {
std::cout << (*i)[2]<< std::endl;
}
}
regex_token_iterator
: Use to generate tokens from a string, e.g. if you are only interested in a subset of capture groups. This is the generalised form of strtok
from the C library.
Note that C++ will parse the regular expression string before passing it onto the regular expression engine. Hence, we must use double-backslashes to escape regular expression backslashes. Furthermore, regular expression tree generation from the string definition of a regular expression is time consuming. This generation should be done once and reused multiple times.
Further information: a regex
object is a pointer to a "Non-deterministic Finite Automaton".
Part 9: R-value References in C++11
L-values are persistent named objects. R-values values are temporaries: objects which do not have a name (e.g. integer literals, or x+y
). Both apply to expressions, which can refer to objects; ojects themselves are not inherently l-values or r-values.
A key question to distinguish l-values and r-values are: "Can you take its address?". If you can, it is an l-value. Otherwise, it is an r-value.
An example of an l-value reference is int& r
, and can be used to bind a reference to a temporary object (when passing an argument by reference).
An r-value reference behaves just like an l-value reference, but with subtle differences. It can bind two r-values specifically, and can never bind l-values.
const string& cr = s+t; // The temporary will persist as long as cr is in scope
string&& rr = s+t; // An r-value reference (not realistic)
There are two functions within the STL, namely std::move()
and std::forward<T> ()
. R-value references support two radically different things
- It supports move semantics, and
- perfect forwarding
C++ uses the copy constructor whenever an object needs to be copied. This happens in cases like:
- Passing a parameter by value
- Returning an object by value
- Assigning an object
The problem here is that a copy constructor might sometimes result in redundant copies. For example:
myvalue = MethodReturingValue()
results in a copy constructor creating an object on return from MethodReturningValue()
, and so another copy constructor is called for the assignment. This occurs in many cases, e.g. when using std::string
, std::vector
, or your own classes with copy constructors and operator=
which dynamically allocates memory.
C++11 defines a move constructor so that this double copy can be avoided.
In the above example, the method returns a value with a copy constructor. Now the compiler knows that this is a temporary object so the second time it calls the move constructor. The move constructor does not copy the object, instead it just replaces the contents of the object.
A move constructor has a signature of:
ClassA(ClassA&& rvalue) // Note the absence of const and double ampersand
Compare this to a copy constructor:
ClassA(const ClassA& lvalue)
C++11 version of standard template library will use move constructor to reuse a temporary object (std::string
, std::vector
). When objects of a class are saved by value in STL containers, defining the move constructor will improve performance as unnecessary copying will be avoided.
std::move
takes a value or an object, and returns an r-value.
ClassA& operator=(ClassA&& other) {
ClassA(std::move(other)).swap(*this);
other.privatePointer = nullptr;
return *this;
}
We need to do this since other
-- an r-value reference -- is an l-value (since it has a name).
As a word of warning, only use r-value references within the move semantics and perfect forward idioms, i.e. move constructors and operator=
move assignments.
We can unify the copy assignment operator and move assignment operator
ClassA& operator=(ClassA other) {
other.swap(*this);
return *this;
}
Part 10: Type Traits for Template Meta-Programming
Type traits enable template meta programming with template argument deduction.
The type_traits
header allows you to check the type of the typename in a template, and define methods that are appropriate for that type. For example,
true_type
andfalse_type
are typedefs for two separate types (they are equivalent of bool in runtime code). Queries about types returntrue_type
orfalse_type
.is_integral
: is the type an integer? (int
,long long
, etc)is_floatingpoint
: is it a floating point type?is_pointer
: is it a pointer type?
true_type
and false_type
can be used to define different overloads for a method. This way you can control the overload that gets called for different types.
#include
template void foo(T t, std::true_type) {
std::cout << t << " integral!" << std::endl;
}
template void foo(T t, std::false_type) {
std:;cout << t << " floating!" << std::endl;
}
template void var(T t) {
foo(t, typename std::is_integral::type());
}
int main() {
bar(1729);
var(3.14);
}
Resources
- C++ Standard Template Library - Stephan T. Lavavej
- EventHelix for video summaries