AppPkg/Applications/Python/Python-2.7.10: Initial Checkin part 3/5.

The Objects directory from the cPython 2.7.10 distribution, along with the
LICENSE and README files. These files are unchanged and set the baseline for
subsequent commits.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Daryl McDaniel <edk2-lists@mc2research.org>
git-svn-id: https://svn.code.sf.net/p/edk2/code/trunk/edk2@18739 6f19259b-4bc3-4df7-8a09-765794883524

commit 53b2ba5790, committed by darylm503
parent 7eb75bccb5

AppPkg/Applications/Python/Python-2.7.10/Objects/dictnotes.txt | 270 (new file)

@@ -0,0 +1,270 @@
NOTES ON OPTIMIZING DICTIONARIES
================================


Principal Use Cases for Dictionaries
------------------------------------

Passing keyword arguments
    Typically, one read and one write for 1 to 3 elements.
    Occurs frequently in normal Python code.

Class method lookup
    Dictionaries vary in size, with 8 to 16 elements being common.
    Usually written once with many lookups.
    When base classes are used, there are many failed lookups
        followed by a lookup in a base class.

Instance attribute lookup and Global variables
    Dictionaries vary in size; 4 to 10 elements are common.
    Both reads and writes are common.

Builtins
    Frequent reads. Almost never written.
    Size: 126 interned strings (as of Py2.3b1).
    A few keys are accessed much more frequently than others.

Uniquification
    Dictionaries of any size. The bulk of the work is in creation.
    Repeated writes to a smaller set of keys.
    Single read of each key.
    Some use cases have two consecutive accesses to the same key.

    * Removing duplicates from a sequence.
        dict.fromkeys(seqn).keys()

    * Counting elements in a sequence.
        for e in seqn:
            d[e] = d.get(e, 0) + 1

    * Accumulating references in a dictionary of lists:
        for pagenumber, page in enumerate(pages):
            for word in page:
                d.setdefault(word, []).append(pagenumber)

Note, the second example is a use case characterized by a get and set
to the same key. There are similar use cases with a __contains__
followed by a get, set, or del to the same key. Part of the
justification for d.setdefault is combining the two lookups into one.

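For illustration, a minimal sketch of the __contains__-followed-by-set
pattern mentioned above (d is assumed to be an existing counter
dictionary and k one of its keys):

    if k in d:              # first lookup (__contains__)
        d[k] = d[k] + 1     # two more lookups: a get and a set
    else:
        d[k] = 1
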
Membership Testing
    Dictionaries of any size. Created once and then rarely changed.
    Single write to each key.
    Many calls to __contains__() or has_key().
    Similar access patterns occur with replacement dictionaries
        such as with the % formatting operator.

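A quick example of a replacement dictionary driving the % operator
(the keys and strings here are invented for illustration):

    d = {"user": "guest", "host": "localhost"}
    print "%(user)s logged in from %(host)s" % d

Each %(name)s conversion performs one lookup in the dictionary.
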
Dynamic Mappings
    Characterized by deletions interspersed with adds and replacements.
    Performance benefits greatly from the re-use of dummy entries.


Data Layout (assuming a 32-bit box with 64 bytes per cache line)
----------------------------------------------------------------

Small dicts (8 entries) are attached to the dictobject structure
and the whole group nearly fills two consecutive cache lines.

Larger dicts use the first half of the dictobject structure (one cache
line) and a separate, contiguous block of entries (at 12 bytes each,
for a total of 5.333 entries per cache line).

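As a quick check of those figures (a sketch; the entry size follows the
PyDictEntry layout of a hash plus key and value pointers, each 4 bytes
on the assumed 32-bit box):

    CACHE_LINE = 64                       # bytes per line, as assumed above
    ENTRY = 3 * 4                         # me_hash + me_key + me_value = 12 bytes
    print CACHE_LINE / float(ENTRY)       # 5.333 entries per cache line
    print 8 * ENTRY / float(CACHE_LINE)   # an 8-entry table spans 1.5 lines
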
Tunable Dictionary Parameters
-----------------------------

* PyDict_MINSIZE. Currently set to 8.
  Must be a power of two. New dicts have to zero-out every cell.
  Each additional 8 consumes 1.5 cache lines. Increasing improves
  the sparseness of small dictionaries but costs time to read in
  the additional cache lines if they are not already in cache.
  That case is common when keyword arguments are passed.

* Maximum dictionary load in PyDict_SetItem. Currently set to 2/3.
  Increasing this ratio makes dictionaries more dense, resulting
  in more collisions. Decreasing it improves sparseness at the
  expense of spreading entries over more cache lines and at the
  cost of total memory consumed.

  The load test occurs in highly time-sensitive code. Efforts
  to make the test more complex (for example, varying the load
  for different sizes) have degraded performance.

* Growth rate upon hitting maximum load. Currently set to *2.
  Raising this to *4 results in half the number of resizes,
  less effort to resize, better sparseness for some (but not
  all) dict sizes, and potentially doubles memory consumption
  depending on the size of the dictionary. Setting to *4
  eliminates every other resize step. (A small model of the load
  and growth parameters follows this list.)

* Maximum sparseness (minimum dictionary load). What percentage
  of entries can be unused before the dictionary shrinks to
  free up memory and speed up iteration? (The current CPython
  code does not represent this parameter directly.)

* Shrinkage rate upon exceeding maximum sparseness. The current
  CPython code never even checks sparseness when deleting a
  key. When a new key is added, it resizes based on the number
  of active keys, so that the addition may trigger shrinkage
  rather than growth.

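As promised above, a small model of how the load factor and growth
rate interact (a simplification that only tracks the slot count and
assumes n insertions of distinct keys; not the actual C code):

    def resize_points(n, minsize=8, load=2.0 / 3.0, growth=2):
        """Return which of n distinct-key insertions trigger a resize."""
        size = minsize
        points = []
        for i in range(1, n + 1):
            if i > size * load:     # the 2/3 load test on each insertion
                size *= growth      # grow by *2 when the threshold is crossed
                points.append(i)
        return points

    print resize_points(100)        # -> [6, 11, 22, 43, 86]
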
Tune-ups should be measured across a broad range of applications and
use cases. A change to any parameter will help in some situations and
hurt in others. The key is to find settings that help the most common
cases and do the least damage to the less common cases. Results will
vary dramatically depending on the exact number of keys, whether the
keys are all strings, whether reads or writes dominate, and the exact
hash values of the keys (some sets of values have fewer collisions than
others). Any one test or benchmark is likely to prove misleading.

While making a dictionary more sparse reduces collisions, it impairs
iteration and key listing. Those methods loop over every potential
entry. Doubling the size of a dictionary results in twice as many
non-overlapping memory accesses for keys(), items(), values(),
__iter__(), iterkeys(), iteritems(), itervalues(), and update().
Also, every dictionary iterates at least twice, once for the memset()
when it is created and once by dealloc().

Dictionary operations involving only a single key can be O(1) unless
resizing is possible. By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced. In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which might
not be necessary at all because the dictionary is eventually
discarded entirely.


Results of Cache Locality Experiments
-------------------------------------

When an entry is retrieved from memory, 4.333 adjacent entries are also
retrieved into a cache line. Since accessing items in cache is *much*
cheaper than a cache miss, an enticing idea is to probe the adjacent
entries as a first step in collision resolution. Unfortunately, the
introduction of any regularity into collision searches results in more
collisions than the current random chaining approach.

Exploiting cache locality at the expense of additional collisions fails
to pay off when the entries are already loaded in cache (the expense
is paid with no compensating benefit). This occurs in small dictionaries
where the whole dictionary fits into a pair of cache lines. It also
occurs frequently in large dictionaries which have a common access pattern
where some keys are accessed much more frequently than others. The
more popular entries *and* their collision chains tend to remain in cache.

To exploit cache locality, change the collision resolution section
in lookdict() and lookdict_string(). Set i ^= 1 at the top of the
loop and move the i = (i << 2) + i + perturb + 1 update to an unrolled
version of the loop.

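A sketch of the idea in Python terms (modeling the probe sequences as
generators; mask is the table size minus one, and a non-negative hash
value h is assumed):

    PERTURB_SHIFT = 5

    def probes(h, mask):
        """Current scheme: pseudo-random probing across the whole table."""
        i = h & mask
        perturb = h
        while True:
            yield i & mask
            i = (i << 2) + i + perturb + 1
            perturb >>= PERTURB_SHIFT

    def probes_adjacent_first(h, mask):
        """Variant sketched above: try the neighboring slot (i ^ 1),
        usually in the same cache line, before the long jump."""
        i = h & mask
        perturb = h
        while True:
            yield i & mask
            yield (i ^ 1) & mask
            i = (i << 2) + i + perturb + 1
            perturb >>= PERTURB_SHIFT
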
This optimization strategy can be leveraged in several ways:

* If the dictionary is kept sparse (through the tunable parameters),
  then the occurrence of additional collisions is lessened.

* If lookdict() and lookdict_string() are specialized for small dicts
  and for large dicts, then the versions for large dicts can be given
  an alternate search strategy without increasing collisions in small dicts,
  which already have the maximum benefit of cache locality.

* If the use case for a dictionary is known to have a random key
  access pattern (as opposed to a more common pattern with a Zipf's law
  distribution), then there will be more benefit for large dictionaries
  because any given key is no more likely than another to already be
  in cache.

* In use cases with paired accesses to the same key, the second access
  is always in cache and gets no benefit from efforts to further improve
  cache locality.

Optimizing the Search of Small Dictionaries
-------------------------------------------

If lookdict() and lookdict_string() are specialized for smaller dictionaries,
then a custom search approach can be implemented that exploits the small
search space and cache locality.

* The simplest example is a linear search of contiguous entries. This is
  simple to implement, guaranteed to terminate rapidly, never searches
  the same entry twice, and precludes the need to check for dummy entries
  (a sketch follows this list).

* A more advanced example is a self-organizing search so that the most
  frequently accessed entries get probed first. The organization
  adapts if the access pattern changes over time. Treaps are ideally
  suited for self-organization with the most common entries at the
  top of the heap and a rapid binary search pattern. Most probes and
  results are all located at the top of the tree, allowing them all to
  be located in one or two cache lines.

* Also, small dictionaries may be made more dense, perhaps filling all
  eight cells to take the maximum advantage of two cache lines.

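As referenced in the first bullet, a sketch of a linear search over
contiguous entries (pure Python standing in for the specialized C
routine; entries is assumed to be a list of (hash, key, value) triples):

    def small_dict_lookup(entries, key):
        """Scan a small contiguous table; no dummy-entry checks needed."""
        h = hash(key)
        for entry_hash, entry_key, entry_value in entries:
            if entry_hash == h and (entry_key is key or entry_key == key):
                return entry_value
        raise KeyError(key)

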
Strategy Pattern
----------------

Consider allowing the user to set the tunable parameters or to select a
particular search method. Since some dictionary use cases have known
sizes and access patterns, the user may be able to provide useful hints.

1) For example, if membership testing or lookups dominate runtime and memory
   is not at a premium, the user may benefit from setting the maximum load
   ratio at 5% or 10% instead of the usual 66.7%. This will sharply
   curtail the number of collisions but will increase iteration time.
   The builtin namespace is a prime example of a dictionary that can
   benefit from being highly sparse.

2) Dictionary creation time can be shortened in cases where the ultimate
   size of the dictionary is known in advance. The dictionary can be
   pre-sized so that no resize operations are required during creation.
   Not only does this save resizes, but the key insertion will go
   more quickly because the first half of the keys will be inserted into
   a more sparse environment than before. The preconditions for this
   strategy arise whenever a dictionary is created from a key or item
   sequence and the number of *unique* keys is known.

3) If the key space is large and the access pattern is known to be random,
   then search strategies exploiting cache locality can be fruitful.
   The preconditions for this strategy arise in simulations and
   numerical analysis.

4) If the keys are fixed and the access pattern strongly favors some of
   the keys, then the entries can be stored contiguously and accessed
   with a linear search or treap. This exploits knowledge of the data,
   cache locality, and a simplified search routine. It also eliminates
   the need to test for dummy entries on each probe. The preconditions
   for this strategy arise in symbol tables and in the builtin dictionary.


Readonly Dictionaries
---------------------

Some dictionary use cases pass through a build stage and then move to a
more heavily exercised lookup stage with no further changes to the
dictionary.

An idea that emerged on python-dev is to be able to convert a dictionary
to a read-only state. This can help prevent programming errors and also
provide knowledge that can be exploited for lookup optimization.

The dictionary can be immediately rebuilt (eliminating dummy entries),
resized (to an appropriate level of sparseness), and the keys can be
jostled (to minimize collisions). The lookdict() routine can then
eliminate the test for dummy entries (saving about 1/4 of the time
spent in the collision resolution loop).

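A rough Python-level illustration of the read-only idea (the class and
its name are invented here; the actual proposal concerns a C-level flag
inside the dictobject):

    class readonly_dict(object):
        """Read-only facade over a finished dict; writes raise TypeError."""
        def __init__(self, d):
            self._d = dict(d)       # rebuilding also eliminates dummy entries
        def __getitem__(self, key):
            return self._d[key]
        def __contains__(self, key):
            return key in self._d
        def __iter__(self):
            return iter(self._d)
        def __len__(self):
            return len(self._d)
        def __setitem__(self, key, value):
            raise TypeError("dictionary is read-only")
        def __delitem__(self, key):
            raise TypeError("dictionary is read-only")
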
An additional possibility is to insert links into the empty spaces
so that dictionary iteration can proceed in len(d) steps instead of
(mp->mask + 1) steps. Alternatively, a separate tuple of keys can be
kept just for iteration.


Caching Lookups
---------------

The idea is to exploit key access patterns by anticipating future lookups
based on previous lookups.

The simplest incarnation is to save the most recently accessed entry.
This gives optimal performance for use cases where every get is followed
by a set or del to the same key.

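A minimal sketch of that incarnation (a Python wrapper standing in for
what would be a cached entry pointer in the C implementation):

    class cached_dict(object):
        """Dict wrapper remembering the most recently accessed entry."""
        def __init__(self):
            self._d = {}
            self._last_key = object()   # unique sentinel; matches no real key
            self._last_value = None
        def __getitem__(self, key):
            if key == self._last_key:
                return self._last_value # repeat access: skip the real lookup
            value = self._d[key]
            self._last_key, self._last_value = key, value
            return value
        def __setitem__(self, key, value):
            self._d[key] = value
            self._last_key, self._last_value = key, value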