Lazy Evaluation
Lazy evaluation delays the evaluation of an expression until its value is needed.1 Adding a mechanism to an stl::map that calculates the value based on the key adds optional lazy evaluation. This can be used to populate the map on demand, spreading the cost of calculating the keys over the time that they are first requested. This can also be a cache to store values that are expensive to calculate and needed multiple times, where you don’t know a priori what values will be needed.
Inheritance
Deriving the new class from std::map is not safe since the base class lacks a virtual destructor. If the derived class was stored as a pointer to the base class, then incorrect destruction and memory leaking could happen. As much as we’d like to inherit from std::map it violates the Liskov Substitution Principle and the No Nasty Surprises mandate. The Liskov Substitution Principle says, given two classes Base and Derived, any place we can use Base& or Base* we can also use Derived& and Derived* respectively. The program is not required to behave identically if the substitution is made (indeed, what would be the point of the substitution?) but it must remain well-formed and behave “correctly” for some version of “correctly.” It is easier to understand the LSP intuitively than it is formally.
Composition to the Rescue
Instead of inheritance, the code uses composition and an implementation of operator[](const key_type&). While it would be wonderful if there was a mechanism in C++ to forward the member functions of std::map to the LazyMap interface, it does not exist. We cannot use the Curiously Recurring Template Pattern (CRTP), also known as Mixins-from-Above, because the base class, std::map, is not parameterized on they equivalent of the derived class. An accessor, getMap() and a mutator map() make the usual std::map functionality available. A little extra work for the client, but they are spared a Nasty Surprise.
Giving the compiler a helping hand
Take a look at the implementation of operator[]() and you see the keyword typename used any place that uses a nested type within Map. Why? Because any type with the required type traits can be used as the parameterizing type, the compiler does not know if Map::key_type refers to a
- Member variable
- Member function
- Nested typedef
To help the compiler out, we give it direction with the typename keyword. Be careful to use it even if your particular compiler doesn’t complain. Sometimes the compiler can figure it out on its own and let’s you slide. This is a problem when moving between compilers. I’ve had a lot of grief over this moving between VS2010 and g++4.4. VS2010 let’s me slide and then I port the code and g++ complains all over the place.
See boost::ref in LazyMapExameple.cpp? That is new but is interesting and powerful enough that it deserves its own post.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
#pragma once //! \file LazyMap.hpp //! \author Jeff Benshetler //! \date 2012-05-08 //! Licensed under \link http://www.boost.org/LICENSE_1_0.txt Boost Software License 1.0 \endlink #include <map> //! \class LazyMap //! \brief Defer populating map entries until the value is requested. //! When a key not in the map is requested, use the provided //! function object that generates the value for a given key. //! \param Gen A function object called with const Key& that returns T. template < typename Key, typename T, typename Gen, class Compare=std::less<Key>,class Allocator = std::allocator< std::pair<const Key,T> > > class LazyMap { public: typedef std::map<Key,T,Compare,Allocator> Map; typedef Gen Generator; private: Generator m_gen; Map m_map; public: //! \param gen Function object that calculates the value when passed a key. //! This must have value semantics. LazyMap(Generator gen) : m_gen(gen) { } // end constructor //! \brief Retrieve the value associated with a key //! from the existing map if possible, otherwise //! use the generator to populate the map //! and return the generated value. //! //! Intentionally overrides map::operator[]. //! \post key is in the map with an associated value typename Map::mapped_type& operator[](const typename Map::key_type& key) { typename Map::iterator iter = m_map.find(key); if ( iter==m_map.end() ) { typename Map::mapped_type mapped( m_gen(key) ); iter = m_map.insert( m_map.begin(), typename Map::value_type(key,mapped) ); } // end if return iter->second; } // end operator[] Map& map() { return m_map; } const Map& getMap() { return m_map; } }; // end class LazyMap |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 |
#pragma once //! \file LazyMapExample.cpp //! \author Jeff Benshetler //! \date 2012-05-09 //! Licensed under \link http://www.boost.org/LICENSE_1_0.txt Boost Software License 1.0 \endlink #include <iostream> #include <cstdint> #include <cstdlib> #include <ctime> #include <fstream> #include <string> #include <vector> #include <set> #include <algorithm> #include <iterator> #include <boost/ref.hpp> #include "LazyMap.hpp" #include "Elapsed.hpp" //! \brief Reports how many times a word is used in a file. //! This class is NOT production code. It is a stand-in for //! an expensive operation, like a database read, //! SOAP query, or complex mathematical computation. class FileWordCount { std::ifstream m_in; public: FileWordCount(const std::string& inputFilePath) : m_in( inputFilePath.c_str(), std::ifstream::in) { using namespace std; // Exceptions must be explicitly enabled in streams m_in.exceptions(ifstream::failbit|ifstream::badbit|ifstream::eofbit); } void goToFileStart() { using namespace std; try { m_in.seekg(0,ifstream::beg); } catch (ifstream::failure&) { m_in.clear(); } // end catch } // end goToFileStart() std::string readWord() { std::string result; try { m_in >> result; } catch (...) { m_in.clear(); } // end catch return result; } // end readWord() // doesn't handle EOF correctly // but that is not important for our // demonstration std::string nthWord(int N) { goToFileStart(); std::string word; while (N-- > 0) word = readWord(); return word; } // end nthWord() // we "assume" the file is too large to hold in memory and therefore we must query // each word. size_t operator()(const std::string& wordToMatch) { goToFileStart(); size_t count=0; std::string wordFromFile; try { do { wordFromFile = readWord(); if (wordToMatch==wordFromFile) { ++count; } // end if } while ( !wordFromFile.empty() ); } catch (...) { m_in.clear(); } // end catch return count; } // end function call operator }; // end class FileWordCount typedef std::vector<std::string> WordVec; class RandomWord { const WordVec& m_wordsInUse; public: #pragma warning(push) #pragma warning(disable:4244) // warning C4244: 'argument' : conversion from 'time_t' to 'unsigned int', possible loss of data RandomWord(const WordVec& wordsInUse) : m_wordsInUse(wordsInUse) { srand( time( reinterpret_cast<time_t*>(0) ) ); } // end constructor #pragma warning(pop) const std::string& operator()() const { int randIndex = rand() % m_wordsInUse.size(); return m_wordsInUse.at( randIndex ); } // end function call operator }; // end class RandomWord // Always use typedefs when instantiating templates. They are free // because they do not create a new symbol, merely a symbol alias. //! typedef LazyMap<std::string,size_t,FileWordCount& > LazyWordCount; int main(int argc,char* argv[]) { using namespace std; const std::string inputFile("sandwiches.txt"); cout << "Using input file \"" << inputFile << "\"\n"; FileWordCount fileWordCount( boost::ref(inputFile) ); static const int WordsUsed = 100; // there are 100 words of interest in the file static const int TimesUsed = 1000; // we make this many random draws from the words in use and get their use count WordVec wordsInUse; for (int i=0; i<WordsUsed; i++) { wordsInUse.push_back( fileWordCount.nthWord(i) ); } // end for RandomWord randomWord( wordsInUse ); cout << "Reading file each time for word count..."; Elapsed elapsed; // first we check looking up the word each time for (int i=0; i<TimesUsed; i++ ) { fileWordCount( randomWord() ); } // end for auto ms = elapsed().total_milliseconds(); cout << "done in " << ms << "ms\n\n"; LazyWordCount lazyWordCount(fileWordCount); cout << "Using LazyMap to cache word count uses after first access..."; elapsed.reset(); for (int i=0; i<TimesUsed; i++ ) { lazyWordCount[ randomWord() ]; } // end for ms = elapsed().total_milliseconds(); cout << "done in " << ms << "ms\n\n"; cout << "Verifying that both methods reached the same answer...\n"; bool ok=true; WordVec::iterator iCurr = wordsInUse.begin(); WordVec::iterator iEnd = wordsInUse.end(); for (; iCurr!=iEnd; ++iCurr) { const std::string& word(*iCurr); if (!( fileWordCount(word)==lazyWordCount[word] ) ) { cerr << "Failed \"" << word << "\" fileWordCount=" << fileWordCount(word) << ", lazyWordCount=" << lazyWordCount[word] << "\n"; ok = false; } // end if } // end for if (ok) cout << "All word counts matched\n"; } // end main() |
Input data file used with LazyMapExample
The speedup shown in the example is better than 12X. You may think this example is artificial, but it is not too different from database access. Using a similar technique on production code that traversed graphs achieved a speedup of better than 50X. The LazyMap.hpp class is designed for re-use and has a friendly license. The driver program and test data are included so you can take LazyMap for a test drive.
