Dark Medieval Times

THE RELEVANT INFORMATION PRINCIPLE

Programs often operate on multiple types of data. The relationships between these types are encoded in programs' data structures.

If the behavior of a program changes, the relationships between its types might change, as well. Therefore, it is crucial to program these relationships in a way that will not cause great pain and refactoring when they are modified.

That point is often overlooked. Luckily, there is a simple principle which can be followed to avoid this kind of refactoring.

Consider a program which attempts to find statistically significant performance differences between git commits of another program. To start, a map from input parameter to execution time could be used to organize the data.

	var executionTimes map[string]time.Duration

The map lends itself nicely to the problem; any execution time can be accessed with an O(1) map lookup.

	dur, ok := executionTimes["-quality 100"]

Suppose another parameter is to be tested. Then the map seems to work just as well as before.

	dur, ok := executionTimes["-quality 100 -compress jpeg"]

But structural problems are starting to surface. What if a specific subset of trials is desired? Say, all trials whose -quality is 85. Code must be written to filter trials. Since all the information about each trial is stored in the map key, a helper function (isDesired here) must parse that key string, apply a predicate to the parsed values, and return a boolean.

	pred := func(quality, compress string) bool {
		return quality == "85"
	}
	for param, dur := range executionTimes {
		if !isDesired(param, pred) {
			continue
		}
		process(param, dur)
	}

The code is clean enough, though the implementation of the isDesired function is omitted. Still, three key points arise:

1. With each new feature and change in behavior of the program, the implementation of isDesired must be updated.

2. Suppose another attribute is to be tested which is not an input parameter. For example, suppose the environment is to be modified in varying ways before running each trial. It is clearly not a good idea to encode this in the input parameter strings; the code must be refactored to support this change.

3. The constant-time lookups aren't so valuable anymore, since, in many cases, the program iterates through the entirety of executionTimes.
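
To make the first point concrete, here is a minimal sketch of what isDesired might look like. It is an assumption, not the original implementation: it supposes parameters always come in "-name value" pairs and hard-codes the two flag names seen so far. The sample durations are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// isDesired parses a parameter string such as "-quality 100 -compress jpeg"
// and reports whether pred accepts the extracted values.
// Hypothetical implementation: the flag names are hard-coded, so every new
// parameter means changing this function and the predicate signature.
func isDesired(param string, pred func(quality, compress string) bool) bool {
	var quality, compress string
	fields := strings.Fields(param)
	// Assumption: flags always appear as "-name value" pairs.
	for i := 0; i+1 < len(fields); i += 2 {
		switch fields[i] {
		case "-quality":
			quality = fields[i+1]
		case "-compress":
			compress = fields[i+1]
		}
	}
	return pred(quality, compress)
}

func main() {
	// Illustrative data only.
	executionTimes := map[string]time.Duration{
		"-quality 85":                120 * time.Millisecond,
		"-quality 100":               150 * time.Millisecond,
		"-quality 85 -compress jpeg": 90 * time.Millisecond,
	}
	pred := func(quality, compress string) bool {
		return quality == "85"
	}
	// Map iteration order is unspecified, so the print order varies.
	for param, dur := range executionTimes {
		if isDesired(param, pred) {
			fmt.Println(param, dur)
		}
	}
}
```

Even in this small form, the parser, the predicate signature, and the key format all have to change together whenever a parameter is added.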

Enter the relevant information principle. By sacrificing the performance niceties of certain data structures (in this case, the lookup time of maps), programs can be organized in a way much more conducive to future modification.

Consider each type of important data in the program. In this example, the types of important data are trial parameters and trial results. Organize each into a structure containing all relevant information, so that an instance of the type can be processed by itself.

To reiterate: a program should not require any information outside of the instance it's processing in order to process it. If there is a map from k to v, processing v should not require knowing that k was its key in the map. That information should be within v itself. Do this even if the type will only contain a single primitive.

Refactor the example program, adhering to the relevant information principle:

	type TrialParams struct {
		Quality int
		Compress string
	}

	type TrialResult struct {
		TrialParams
		Duration time.Duration
	}

Keep the instances of these types in simple lists (Go slices will do):

	var trialParams []TrialParams
	var trialResults []TrialResult
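
As a rough sketch of how these slices might be filled, the following times one run per TrialParams and appends a TrialResult that carries its own parameters. The runTrial helper and the work callback are hypothetical stand-ins for invoking the program under test; they are not part of the original program.

```go
package main

import (
	"fmt"
	"time"
)

type TrialParams struct {
	Quality  int
	Compress string
}

type TrialResult struct {
	TrialParams
	Duration time.Duration
}

// runTrial times a single call to work and returns a result that carries
// the parameters it was run with.
// Hypothetical helper: work stands in for running the program under test.
func runTrial(p TrialParams, work func(TrialParams)) TrialResult {
	start := time.Now()
	work(p)
	return TrialResult{TrialParams: p, Duration: time.Since(start)}
}

func main() {
	trialParams := []TrialParams{
		{Quality: 85},
		{Quality: 100, Compress: "jpeg"},
	}
	var trialResults []TrialResult
	for _, p := range trialParams {
		// Placeholder workload; a real run would execute the tested program.
		r := runTrial(p, func(TrialParams) { time.Sleep(time.Millisecond) })
		trialResults = append(trialResults, r)
	}
	// Each result can be reported without consulting any other structure.
	for _, r := range trialResults {
		fmt.Printf("%+v took %v\n", r.TrialParams, r.Duration)
	}
}
```

Because each TrialResult embeds its TrialParams, the reporting loop needs nothing but the result itself.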

Now, the internal data structures of the program will rarely, if ever, have to be modified to add features and change behavior.

Adding an attribute to trial parameters involves adding a field to the TrialParams struct. None of the code which operates on the trialParams slice or on an individual TrialParams instance will have to be modified. This enormous benefit is exhibited in the TrialResult code as well.