## Introduction

The following document is a record of useful pointers and tips as I research the C++ Standard Template Library.

A variety of sources are used as I continually update this document. See the Resources section for a full list.

The primary resource is Stephan T. Lavavej’s STL video lectures over at Channel 9, and as such, this document will be arranged as individual video lecture notes.

## Part 1: Containers, Iterators and Algorithms

The C++ Standard Template Library (STL) is a library, as opposed to a framework, for general computation focussed on speed and optimisation of general purpose tasks.

There are three main components to the STL: containers, iterators and algorithms. There are also functors. More on this later. Algorithms and containers are implemented without needing knowledge of each other, and only act on each other via iterators.

The STL extensively uses templates. Hence, when it gets instantiated by the compiler, everything is inline-able, with virtually no call overheads.

### Sequence containers

Containers, such as vector, stores a sequence of elements of any valid type. These include your own classes, as long as they are copyable.

### Vectors and vector memory management

Vectors are flexible, expandable arrays which are contiguous in memory. They are also space efficient.

Since vectors are contiguous in memory, there will be times when more memory is required. In this case, a new contiguous memory block is allocated, and the existing elements are copied over. New elements can then be inserted to the new block; the old block is deleted.

Geometric memory allocation is used to do this, i.e. if there are N elements, vector will allocate memory for kN elements, where k > 1. Common implementations of the C++ STL use k = 1.5, or k = 2. This results in amortized constant time for adding elements.

Containers are flexible, so it can store arbitrary number of elements. This prevents security and correctness problems, like buffer and heap overflows. Each container knows its own size.

Undefined behaviour may occur if you try to access an element of a container, outside of the range of the number of elements, e.g.

The C++ standard does not specify what should happen in these cases.

### Accessing a vector via an iterator

Iterators point to elements within a container, in this case a vector. They are a generalisation of pointers, via operator overloading, abstracting away the implementations for different containters.

begin() returns an iterator to the first element of the vector. end() returns an iterator to the element one past the end of the container. This is known as an inclusive-exclusive range. This makes empty ranges natural, i.e. an empty vector with have begin() and end() iterators pointing to the same element. It also imitates how a normal array works. (pointer arithmetic)

### Using algorithms on iterators

We can include the algorithm header and use any of its algorithms.

The sort algorithm is templated on iterator type. As long as the container is of random access type, e.g. vector, we can use sort() on it. std::list is a doubly-linked list and is not of random access type, i.e. we cannot access the nth element of the list in constant time, so we cannot use sort() on it.

If we have a vector of strings, by default, they are sorted lexicographically. But we can change this.

To preserve element orders when the sort algorithm cannot determine which element should appear first, e.g if the lengths of strings are equal, then we can use std::stable_sort(). This is not the default as it may be less efficient than sort() due to additional checks.

(More containers: deque, forward_list)

## Part 2: Associate Containers

The associative containers in STL are map, multimap, set, and multiset (C++03), unordered_map, unordered_multimap, unordered_set, and unordered_multiset (C++0X).

### Map

Map is templated on its element type, on both its key and value. Elements are automatically sorted.

If we have, as above, a map<int, std::string>, each element has type pair<const int, std::string>. We can also use iterators on maps.

Map is generally implemented as a semi-balanced binary tree, or a red-black tree. This ensures efficient operations, such as

• Insertion: O(log N)
• Deletion: O(log N)
• Iteration: O(N)
• Destroy: O(N)

To read an element from a constant reference of a map, we need to use find(), instead of operator[].

Note that map is the only associative container which has operator[].

The set container is similar to map in terms of operation run times, but it enforces uniqueness. It also permits logarithmic time loookup, as opposed to linear lookup time with vector or map.

### Unordered Map

Unordered maps work in the exact same way as map, but elements are unordered upon insertion. However, its data structure is very different to map. It is in fact implemented as a hash table. Each bucket will be associated with a chain of values, to support collisions.

Insertion into an unordered map is expected to be constant time. In the case of a collision, the worst case is O(N), i.e. efficiency degrades.

Generally use maps for performance guarantees. Unordered maps can be used when performance requires it.

### Deleting multiple entries from a vector

To do this, we need to use the erase-remove idiom.

Removing each unwanted element from the vector with a loop will have performance drawbacks. Firstly, vector will maintain contiguous block memory. So when one element is deleted, it will shift the remaining elements back, in linear time. This is for each unwanted element, so it will result in quadratic time.

Instead, we first use std::remove(), and then std::vector::erase(), which are part of the STL <algorithm> header. These are non-member functions working on elements via iterators.

remove() will maintain two iterators to perform a linear pass on the vector to identify and move valid elements. erase() then removes the unwanted elements at the end of the vector.

## Part 3: Smart Pointers

Introduced from C++0X, we use shared_ptr and unique_ptr in place of new and delete. Shared pointers were originally in the Boost library.

Note that auto_ptr has been officially deprecated.

### Unique Pointers

A unique pointer is the unique owner of a dynamically allocated object. It is a smart pointer: a class which imitates a raw pointer via operator overloading (on * and ->).

We did not call delete since unique_ptr will destroy the object for us once it goes out of scope.

Ownership of the object can be transferred with the std::move() function.

Unique pointers are not copy constructable – it is move constructable – but can be returned from functions. C++0X will automatically perform a move construction upon return.

Ownership transfer is implicit when unique pointer is returned from a method/function

### Shared Pointers

Shared pointers share ownership to an object – they are reference counted shared pointers. Many shared pointers may point to the same object.

The pointed object is destroyed when all the shared pointers pointing to the object are destroyed, preventing resource leaks.

The reference counter is incremented when a new shared pointer points to the object. The reference counter is decremented when a shared pointer to the object is destroyed

The pointed object is deleted when the reference counter goes to zero.

make_shared is cleaner and a more efficient helper function.

Shared pointers may be used in STL containers. For example, you could declare a vector of shared pointers.

## Part 4 & 5: Nurikabe Solver

Solve the Nurikabe puzzle to demonstrate STL. Refer to the source file on the video link page.

class Grid - Stores state of puzzle, top-level member functions, write to file

class Region - keep track of all the contiguous regions

## Part 6: Algorithms and Functors

STL algorithms are usually templated on iterator types (to work with arbitrary containers with arbitrary elements) and predicates.

### Non-modifying/non-mutating algorithms

• for_each: Iterate over all elements to perform an operation.
• count: Counts the elements that match the passed value (==).
• count_if: Counts elements that match a certain criterion via the given predicate.
• find: Finds a matching element in linear time
• find_if: Find an element that matches a certain criterion
• equal: Compares sequences match

Note that lengths are not compared, we can check this manually if needed. The second sequence should contain at least as many elements as the first sequence; it is an error if otherwise.

• all_of: Do all of the elements satisfy the criterion?
• any_of: Do any of the elements satisfy the criterion?
• none_of: Do none of the elements satisfy the criterion?

### Functors

Functors can either be plain functions (non-member functions and we can get function pointers to them), or function objects (classes which overrides operator() ); anything that can be called with “()”. Function objects can then either be hand-written or lambdas. Compilers tend to do a better job inlining function objects and lambdas.

### Modifying/Mutating algirithms

• copy: Copies from a range of input iterators into an output iterator. The output sequence container should have enough space the elements to be copied.

copy() can also be used to expand a container.

• replace_if: Replaces a element that matches a certain criterion.
• transform: Transforms the elements in a container

## Part 7: Algorithms and Sorting

We can also use back inserters to copy elements from one container to another (as another way of performing the copy task in the previous Part)

Back inserters return an insert_iterator. Dereferencing and incrementing an insert_iterator adds a new element using push_back. For example, passing to to a back_inserter to copy will add elements.

Note that this example can be executed better via a range construct in a constructor

However,

is more efficient than back_insertor since it only calls push_back once.

We can then sort this vector via STL

Writing to ostream_iterator writes to ostream, but is not needed in C++ 11 as lambdas provide a cleaner design.

std::binary_search is not particuarly useful as it just tells you the element is present or absent (it returns a bool).

On the other hand, std::lower_bound returns the first position where a value can be inserted and still maintain sorted order. If the element does not exist, lower_bound returns the first position where the new element could be inserted, and still maintains sorted order.

begin() is returned if the element has to be inserted as the first element. end() is returned if the element has to be inserted as the last element.

Note that binary search requires that the sequence is sorted beforehand.

std::upper_bound returns the last position where a value can be inserted and still maintain sorted order. (Remember include-exclude ranges.)

std::equal_range returns a pair of iterators, containing the lower bound and upper bound.

As we have seen so far, STL uses “less than” as the default comparators. Custom comparators should behave like “less than” (strict weak ordering), i.e. x compared to x should always return false, otherwise STL will return invalid results.

The order of “same” elements is not guaranteed. Use std::stable_sort if the order of “same” elements needs to be preserved. std::partial_sort allows you to sort a small subset of the full collection.

If this sequence was sorted, std::nth_element will return the nth_element.

### Set Operations

• set_union and set_intersections can be obtained for sequences.
• min can be used to obtain the least element
• max returns the largest element
• min_element returns in iterator to the least element
• min_max returns the minimum and the maximum. This is more efficient than calling min and max separately.

Most algorithms are in <algorithm> header file, but some are in <numeric> (for generic numerical algorithms).

• accumulate: Iterate to perform a binary operation value. Initial value can be specified here. The default version uses plus, but accumulate can be changed to use multiply to find the product.
• iota: Used to initialize a sequence of increasing numbers. (C++OX only)

### Transformations

std::transform has two overloads:

• unary: Perform a unary operation on an input sequence and write to an output sequence. In place transformation can be carried out by passing the same sequence as input as well as output.
• binary: Takes elements from the first and the second sequences, performs a binary operation and then writes the results to the output sequence. Lets you perform pair wise transformation on two sequences.

## Part 8: Regular Expressions in C++11

Regular expressions are declarative string algorithms for flexible string processing. C++ regular expressions have a Perl type expression.

A few examples of regular expression uses include

• Validation: is this input well-formed? (serial numbers)
• Decision: what set does this string belong to? (.jpg or .png)
• Parsing: what is in this input? (year)
• Transformation: format strings for output or further procession. (escape speciall characters)
• Iteration: find each occurrence of a pattern within a string. (iterate through URLs)
• Tokenisation: systematically split apart a string. (break a string into whitespace-separated words)

To use regular expressions in C++, we must include the <regex> header file, which include templated functions. std::regex is acutally a typedef for basic_regex<char> (regular strings), and std::wregex is a typedef for basic_regex<wchar_t> (unicode).

There are three main regular expression functions

• regex_match: Does the whole regex match the entire string?

• We can construct an object which represents the currently matched component of the string. It is of class match_results and is templated on string type.
• smatch is a typedef for results for regular strings
• cmatch is a typedef for results for const strings
• regex_search: Searches for substrings of the large string that match the specified regular expression. Returns the first occurance of the match. Never use regex_search to iterate through each occurance of the large string; use regex_iterator instead. This is because regex_search does not handle zero-length string matches, which then may cause infinite loops.

• sub_match contains iterators for the start and end of the match
• The string can be extracted from the individual matches
• regex_replace: Search and replace strings that match the regular expression. The replacements can be controlled at capture group level.

As mentioned previously, regex_iterator may be used to look at all instances of the matching regular expression. Use sregex_iterator for regular strings, and wregex_iterator for unicode strings.

regex_token_iterator: Use to generate tokens from a string, e.g. if you are only interested in a subset of capture groups. This is the generalised form of strtok from the C library.

Note that C++ will parse the regular expression string before passing it onto the regular expression engine. Hence, we must use double-backslashes to escape regular expression backslashes. Furthermore, regular expression tree generation from the string definition of a regular expression is time consuming. This generation should be done once and reused multiple times.

Further information: a regex object is a pointer to a “Non-deterministic Finite Automaton”.

## Part 9: R-value References in C++11

L-values are persistent named objects. R-values values are temporaries: objects which do not have a name (e.g. integer literals, or x+y). Both apply to expressions, which can refer to objects; ojects themselves are not inherently l-values or r-values.

A key question to distinguish l-values and r-values are: “Can you take its address?”. If you can, it is an l-value. Otherwise, it is an r-value.

An example of an l-value reference is int& r, and can be used to bind a reference to a temporary object (when passing an argument by reference).

An r-value reference behaves just like an l-value reference, but with subtle differences. It can bind two r-values specifically, and can never bind l-values.

There are two functions within the STL, namely std::move() and std::forward<T> (). R-value references support two radically different things

• It supports move semantics, and
• perfect forwarding

C++ uses the copy constructor whenever an object needs to be copied. This happens in cases like:

• Passing a parameter by value
• Returning an object by value
• Assigning an object

The problem here is that a copy constructor might sometimes result in redundant copies. For example:

results in a copy constructor creating an object on return from MethodReturningValue(), and so another copy constructor is called for the assignment. This occurs in many cases, e.g. when using std::string, std::vector, or your own classes with copy constructors and operator= which dynamically allocates memory.

C++11 defines a move constructor so that this double copy can be avoided.

In the above example, the method returns a value with a copy constructor. Now the compiler knows that this is a temporary object so the second time it calls the move constructor. The move constructor does not copy the object, instead it just replaces the contents of the object.

A move constructor has a signature of:

Compare this to a copy constructor:

C++11 version of standard template library will use move constructor to reuse a temporary object (std::string, std::vector). When objects of a class are saved by value in STL containers, defining the move constructor will improve performance as unnecessary copying will be avoided.

std::move takes a value or an object, and returns an r-value.

We need to do this since other – an r-value reference – is an l-value (since it has a name).

As a word of warning, only use r-value references within the move semantics and perfect forward idioms, i.e. move constructors and operator= move assignments.

We can unify the copy assignment operator and move assignment operator

## Part 10: Type Traits for Template Meta-Programming

The type_traits header allows you to check the type of the typename in a template, and define methods that are appropriate for that type. For example,
• true_type and false_type are typedefs for two separate types (they are equivalent of bool in runtime code). Queries about types return true_type or false_type.
• is_integral: is the type an integer? (int, long long, etc)
• is_floatingpoint: is it a floating point type?
• is_pointer: is it a pointer type?
true_type and false_type can be used to define different overloads for a method. This way you can control the overload that gets called for different types.