In this article, I hope to clarify the concept of learn drive by using Minecraft as an analogy.
An analogy can be thought of as a correspondence between concepts. If we think of a concept network, an analogy between two systems represented by two concept networks is like a graph isomorphism between the two networks. To demonstrate an isomorphism (or a partial isomorphism), you have to say which nodes in one graph correspond to which nodes in the other. So to describe an analogy between the learn drive and Minecraft, I need to say which concepts in (my understanding of) the learn drive correspond to which concepts in (my understanding of) Minecraft.
In this analogy, I am going to equate Minecraft blocks with information, Minecraft items with knowledge, and mining with learning. It is important to note the distinction between an item and a block. An item is something which resides in the player’s inventory and can be used by the player to build things, while a block exists in the outside world, and is not usable by the player unless broken into an item. Another important note is that for the purposes of the analogy, I count any means of acquiring items as mining, not only the narrow definition of breaking blocks, so activities such as getting loot from chests, shearing sheep or harvesting farms also count as “mining”. Let me explain why I chose this correspondence.
Items and knowledge can both be considered as resources owned by individuals. Both items and knowledge have value, which is based on what personal goals those resources can help an individual achieve (in Minecraft those goals often take the form of structures to build or tools to create). I stress that the value is subjective. Knowledge valuable for one person may be useless to another, and items valuable for one may be useless to another.
There is a system in the brain - the knowledge valuation network - capable of computing the value of newly acquired knowledge. We equated items with knowledge, so this leads me to hypothesize about the existence of a system which corresponds to the knowledge valuation network, but which is used for estimating the value of items (notice how recognizing analogies allows us to ask questions and hypothesize the existence of things which we can’t directly see). The knowledge valuation network computes value largely based on the goals which a piece of knowledge serves, and so the item valuation network will compute the value of items on the basis of what goals they help the player achieve. I talk more about the item valuation network below.
If knowledge can be thought of as a resource used to solve problems, and items can be thought of as a resource used to build things (e.g. houses or tools), then it should follow that in our analogy we should equate problem solving with building things.
When presented with information, we have to expend effort in order to understand it, in order to convert it to knowledge. With items being equated with knowledge, the question is what must we expend effort to process in order to arrive at items? Blocks come to mind. The player must expend effort to break blocks so that items fall out, and without this effort the items are not usable. This leads me to equate blocks with information in the analogy, in the isomorphism.
When learning, we find information, expend effort to understand it, and the result is knowledge. When mining, we find blocks, expend effort to break them, and the result is items. So in this analogy it seems sensible to equate mining with learning.
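The correspondence built up so far can be written down as a small sketch. The Python below is only an illustration: the two concept graphs, their relation labels, and the helper `is_partial_isomorphism` are hypothetical names chosen for this analogy, not a formal model of either system. It checks that the mapping is one-to-one and that every relation on the learning side lands on a relation with the same label on the Minecraft side.

```python
# Each concept network is a set of (source, relation, target) triples.
# The relation labels are illustrative assumptions, not established terminology.
learning = {
    ("information", "processed_by", "learning"),
    ("learning", "produces", "knowledge"),
    ("knowledge", "used_for", "problem solving"),
}
minecraft = {
    ("block", "processed_by", "mining"),
    ("mining", "produces", "item"),
    ("item", "used_for", "building"),
}

# The analogy: which learning concept corresponds to which Minecraft concept.
analogy = {
    "information": "block",
    "learning": "mining",
    "knowledge": "item",
    "problem solving": "building",
}


def is_partial_isomorphism(source, target, mapping):
    """Return True if the mapping is one-to-one and every source edge between
    mapped concepts corresponds to an edge in the target network."""
    if len(set(mapping.values())) != len(mapping):
        return False  # two concepts were mapped to the same node
    for a, relation, b in source:
        if a in mapping and b in mapping:
            if (mapping[a], relation, mapping[b]) not in target:
                return False
    return True


print(is_partial_isomorphism(learning, minecraft, analogy))  # prints True
```

Swapping two entries of the mapping (say, sending "information" to "item" and "knowledge" to "block") makes the check fail, which is one way to see that the choice of correspondence matters.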
With mining equated with learning, we might try to answer the question of how to learn efficiently by thinking about the question of how to mine efficiently. This approach also illustrates the inspirational value of analogies: we can learn about a system not through direct observation but through the observation of an analogous system. It also shows a value of video games. Video games are rich, they have a lot of systems, and a system in a video game may be analogous to a real world system. Therefore, knowledge attained in the video game can later be applied in the real world. All it takes is for us to say “hey, this real world thing is kind of like that thing in this video game I really like” and thereafter we start reusing what was learned in the game. Back to the question of efficient mining. As players, what decisions must we make and what actions must we perform so that at the end of all of this we have valuable items in our inventory? Thankfully, the brain may already be equipped with a system that makes those decisions for us and which guides us to perform those actions! This means we don’t have to think much about it, but just get out of that system’s way and let it run free!
Let’s call the learn drive analogue in Minecraft the “mine drive”. The mine drive is a hypothetical system that exists in the brains of all players, just as the learn drive exists in the brains of all humans. The mine drive is activated when there is a need for more items, when the current inventory is insufficient to meet the goals of the player, just like the learn drive is activated when our current knowledge is insufficient to achieve our goals. The mine drive guides the player’s decision-making when exploring the Minecraft world in order to find valuable items, just like the learn drive guides our decision-making when exploring the world in order to find valuable knowledge. The mine drive tells the player in which directions to move and in which not to move, which places to visit and which not to visit, what strategies to use and what not to use.
The guidance of the mine drive is based on the subjective attractiveness the player experiences towards certain actions and choices. When the mine drive computes that a certain action has a high probability of helping the player acquire valuable items, the mine drive makes the player experience that action as appealing, and not undertaking it would feel unpleasant. Anything which gets in the way of the action is an annoyance. An action which the mine drive evaluates as not likely enough to help the acquisition of valuable items is experienced as unappealing. The takeaway is that undertaking those actions which seem appealing and ignoring those which seem unappealing will result in the acquisition of valuable items. Put simply, doing what the player feels like doing will result in good things, because that which feels good is that which the mine drive has evaluated as likely to result in good things. This is what the learn drive does as well. The learn drive makes those actions and choices pleasurable which it has computed are likely to lead to valuable knowledge. Watching a video on a topic of interest is an example of such an action. When faced with multiple videos, the learn drive makes the choice of which is best to watch at the moment and we feel drawn to that video. All we know is that the video seems appealing; we are not aware of all of the computations that went into deciding that this video should be watched at the moment.
Let’s say you need diamonds, and while exploring the environment you notice the entrance to a cave. Let’s say you have often seen diamonds in caves in the past, or in the past someone told you that caves have diamonds. When you see the cave, you are induced by the mine drive to go there, because of the high probability the drive assigns to your finding diamonds in the cave. The mine drive will generate a reward upon your noticing the cave, and the prospect of exploring it will seem appealing to you. This appeal is how the mine drive guides you towards taking the action of exploring the cave. To compare this to learning, let’s say you are interested in building some software project and feel that you need to learn more about concurrency. While exploring the web you may come across someone mentioning a free operating systems book. You may know that operating systems make extensive use of concurrency, and so you may feel something pulling you towards opening the book. There is just something appealing about that book. That appeal is how the learn drive will guide you to open the book. If the book does actually talk about concurrency, you will experience a reward of discovery: you discovered valuable information, and that is very pleasurable.
In this example I will analyze this video:
[0:23] Python wants to enchant some of his weapons and tools. That is his goal. To achieve it, he knows he needs an enchanting table and books, items which he is currently lacking. This lack drives him to explore the environment in order to attain those items. Analogously, in real life when we don’t have the knowledge we need in order to achieve our goals, we feel driven to explore the world (e.g. the WWW) in order to attain the knowledge we need to achieve those goals.
[2:24] While exploring with the goal of finding obsidian (which he needs to make an enchanting table and Nether portal), you can notice he is skipping a lot of the iron and coal he encounters on his way, he doesn’t even bother to pick it up. In learning this would be like skipping a lot of information which may be potentially useful but which you don’t need right now. However, well-schooled students often carry the bad habit of reading linearly and trying to memorize everything on their path. I know I used to do that. At school we may become conditioned to learn everything “just in case”.
At [8:16] the player is driven to find a village, with the hope that this will result in the acquisition of bookshelves (or other items of value). Recall that books and bookshelves are necessary for [achieving the goal of] enchanting his tools and weapons.
[9:12] He wants to begin the village search, but encounters angry villagers on his path, which means he has a Bad Omen. This event is important because it boosts his valuation of cow’s milk, as milk will achieve the goal of removing the omen. This is interesting because it shows that a new goal changes valuations. With this new goal in mind, the player will experience a reward if he discovers a cow, whereas without the remove-the-omen goal this reward won’t come. This is also what happens in real life. When we face new problems (e.g. an unexpected health issue), this may make information which was previously uninteresting become interesting (e.g. information on nutrition). These new interests will be taken into account by the learn drive, and information which was previously unpleasant to peruse will now become pleasurable. Some people may worry that if they only learn what is pleasurable they won’t get around to what is important, but the way the learn drive works is that it gives us the greatest pleasure upon learning about what is most important, as pleasure is proportional to value.
[10:55] Here we get a good example of the mine drive reward, the reward of discovering something valuable. While exploring to find a village or milk, he unexpectedly discovers a wandering trader, from whom he could potentially buy valuable items. You can see he is very happy about finding the trader, and also very happy when he sees that the trader does indeed have valuable items. What is interesting is that he found a trader not by actively searching for one, but by exploring with a different set of goals in mind. This type of serendipity is very common in free learning. To give a personal example, I was searching for information on Epistemology, found a book titled “Knowledge: A Very Short Introduction”, and upon opening the book I found books in the same “A Very Short Introduction” series on different topics that I am interested in. By exploring with the goal of finding books on one topic, I found many books on other topics of interest. This also shows the value of being free to choose your own trajectory, and free to suddenly change direction. In school, such diversions into other topics during a class on a given topic are often suppressed, because teachers have to stick to a curriculum and are often faced with time limits. Imagine someone telling PythonMC when he found the trader “you don’t have time for that, stick to finding a village or milk to remove the bad omen”. I think he wouldn’t be too happy, and he would also miss out on an opportunity to get valuable items.
At [12:30] he discovers a cow, and his interjection “Ah, there we go!” hints at the reward he experienced upon the discovery. Maybe not as high a reward as when he found the trader, but still a reward. When learning, we don’t only get rewards upon major discoveries, we also get a steady stream of medium- or micro-rewards upon medium- or micro-discoveries. Upon drinking the milk and removing the omen, the goal was achieved, the value of milk is reduced (due to it no longer serving any goals), and hence the player will not find it worthwhile to change his course to milk any other cows encountered down the road (e.g. at [16:42]). He’ll just ignore them. If you have the goal of making a website, you may find information on programming very interesting, but then lose that interest once the website is done. From then on, you may feel like ignoring programming information until another goal requires it.
At [13:39] the player feels that finding a vantage point might help him find a village. But what generates this feeling? It could be the mine drive, whose guidance system computed that finding a vantage point has a high probability of helping the player, and so generates a feeling in him which prompts him to actually do it. In learning, we often feel that doing something will help us learn, and this feeling is generated by the learn drive, and is how it guides us to do that something. An issue with school is that it may condition us out of relying on feeling in order to learn, because quite often there is a mismatch between what we feel is right and what school says is right. However, conditioning out the learn drive means we will have trouble learning on our own once school ends.
Upon mounting a summit, he notices smoke particles, which signal the presence of a village. Notice the reward and excitement he experiences upon seeing the smoke – “Oh, we have smoke particles over there! This is a village! Ah, cool! Hahaha!”. Since he used the find-a-vantage-point strategy to find this village, and this led to a reward of discovery, the mine drive may guide the player to use that strategy more often in the future. If the strategy had failed, maybe he would have lost a bit of confidence in its effectiveness. What is important is that he was free to choose his strategy, and its outcome calibrated the mine drive with respect to the strategy. This hints at why freedom is so important. Without freedom, we can’t viscerally learn which mining/learning strategies work and which don’t. Freedom is essential for the development of an effective mine/learn drive. In school, where learning choices are not made by the learners themselves, there is no learn drive calibration. After we leave school, even if we still have a learn drive, it may be less effective, at least until we recalibrate it.
At [14:45], the player starts collecting items, and you can tell he finds the process very enjoyable (e.g. at [15:43] you can hear him say “Ooh man, I love villages!”). The process of collecting the items is highly rewarding. In learning, not only is the discovery of valuable information rewarding, but the act of understanding information and gaining knowledge is also very (if not more) rewarding. If it is not, it only means the knowledge isn’t very valuable for you (as determined by the knowledge valuation network).
The item valuation network is the analogue of the knowledge valuation network in the brain. Items are deemed valuable by the item valuation network if they serve the goals of the player. For example, the player has the goal of surviving, so they value armor, weapons and food. Diamonds are an item that can be used to build armor and tools, so diamonds are valued by the player, because they ultimately help the goal of staying alive. If the player has the goal of building some structure, then all items which help build that structure are valuable: wood, stone, whatever. Just as goals are in constant flux, so are valuations. The mine drive relies on the output of the item valuation network in order to determine what is worth exploring for.
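To make the dependence of value on goals concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `USEFULNESS` table, the goal weights, and the rule that value is a weighted sum are hypothetical, and nothing here claims to describe how a real valuation network computes anything. It replays the milk episode from the video: milk is worthless until the Bad Omen creates a goal it serves, and worthless again once that goal is achieved.

```python
# Hypothetical table: which goals each item serves, and how strongly (0 to 1).
# All numbers are made up for illustration.
USEFULNESS = {
    "milk": {"remove bad omen": 1.0},
    "bookshelf": {"enchant tools": 0.8},
    "diamond": {"survive": 0.9, "enchant tools": 0.5},
    "dirt": {},
}


def item_value(item, goals):
    """Assumed rule: value = sum over active goals of (goal weight * usefulness)."""
    serves = USEFULNESS.get(item, {})
    return sum(weight * serves.get(goal, 0.0) for goal, weight in goals.items())


goals = {"survive": 1.0, "enchant tools": 0.7}
before = item_value("milk", goals)   # no active goal is served by milk

goals["remove bad omen"] = 0.9       # Bad Omen acquired: a new goal appears
during = item_value("milk", goals)   # milk now serves an active goal

del goals["remove bad omen"]         # milk drunk, goal achieved
after = item_value("milk", goals)    # milk is back to serving no goal

print(before, during, after)
```

The item itself never changes in this sketch; only the set of active goals does, which is the sense in which valuations are in constant flux.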
The item valuation network is also at work when comparing the value of two items. For instance, when you have a full inventory but have a valuable block in front of you, the item valuation network makes the choice of what, if anything, to leave behind to mine the block and take its item. For example, if we go back to this video, you can see him making such choices at [20:16] (drops lantern to get books), [20:43] (drops iron ingots to get leather) and [20:48] (drops apples to get meat). Interestingly, the last example shows that it may be a bit hard to make the choice.
As with all choices, the player will just feel that it is a good idea to drop something in favor of something else, and they may be unaware of the computations made by the item valuation network in order to generate this feeling. All they know about is the feeling itself.
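The full-inventory decision can be sketched as a simple comparison. The Python below is a hypothetical illustration: the function `choose_drop` and the valuation numbers are invented for this example, and the rule (drop the least valuable item only if the new item is worth more) is just one plausible way such a choice could be made.

```python
def choose_drop(inventory, new_item, value):
    """Return the item to drop to make room for new_item, or None if
    new_item is not worth more than anything already carried.
    `value` is a function mapping an item name to its current valuation."""
    least_valuable = min(inventory, key=value)
    if value(new_item) > value(least_valuable):
        return least_valuable
    return None


# Made-up valuations for a player whose current goal is enchanting.
valuations = {"lantern": 0.2, "iron ingot": 0.4, "apple": 0.5, "book": 0.9, "dirt": 0.0}
inventory = ["lantern", "iron ingot", "apple"]

print(choose_drop(inventory, "book", valuations.get))  # prints lantern
print(choose_drop(inventory, "dirt", valuations.get))  # prints None
```

When two valuations are close together, as they might be for apples and meat, the comparison is much less clear-cut, which may be why that choice looks harder in the video.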
Normally when players fall asleep in Minecraft, their inventory and chests remain the same after waking up. But imagine that instead of this there was an automatic process activated upon sleep which organized all of the items of the player in a way which makes them easy to find, and that garbage items (e.g. dirt collected when digging) are just thrown away (the item valuation network may be used to determine which items are garbage and which should be kept). Something similar may be happening in real life sleep: our memories get optimized (see Neural optimization in sleep).
Let’s think about “Minecraft school”. In this school, instead of being free to play Minecraft at any time you wish, you are coerced into playing it at times chosen by others (times which often conflict with your sleep). Instead of freely exploring the Minecraft world you are just given a long row of blocks chosen by “teachers”, and are then told to mine the row without going out of sequence. Instead of choosing for yourself what structures to build, you are tasked with creating predefined structures, the same structures that everyone else is building, and you are graded based on how well you built these structures. This starts looking less like a fun game and more like a “prison mine”. But this is similar to what happens at real school. Students don’t get to choose when to learn, what and how to learn, and they are often tasked with solving minor problems they don’t care for.
The problem with this “row of blocks” type of mining is not only that it is exceedingly boring. It is also highly inefficient. Players have different things they want to build, but in Minecraft school they are all given the same row of blocks to break. This is like real school, where students have different interests and goals but are all run through the same curriculum. In Minecraft school, the rate at which the players are forced to break the blocks is too fast, so that in order to collect new items they have to throw old items out to make room in their inventory. Something similar happens in school, where excess volume and excess speed result in a leaky vessel approach to learning: new knowledge easily displaces old knowledge in a process that results in minimal learning (see Interference). When the blocks to mine aren’t chosen by the players themselves, too many useless items and too few useful items will be collected. On the other hand, in “free mining” where item collection is guided by the player’s mine drive, the optimum amount of items of each kind will be collected, because the mine drive will stop giving the player pleasure when they have accumulated enough items of a given type. In real school, because what to learn is not under our control, we often learn too much of what is not interesting and important to us, and too little of what is actually useful for us. However, in free learning we accumulate the optimum amount of knowledge about a given subject, because free learning is guided by pleasure, and the pleasure stops when the usefulness of the knowledge stops, and learning about things useless to us is unpleasant.
Because mining in Minecraft school is so boring, players may start believing that mining is an inherently unpleasant activity, even when no longer in the school. To guide players away from bad mining choices (e.g. breaking iron ore when iron is not needed), the mine drive generates an aversion to those choices, but if players are made to believe that mining is inherently unpleasant, they may ignore this aversion, making the whole mining process less efficient. This is what happened to me. I thought that learning is supposed to be unpleasant, and so when the learn drive generated displeasure towards learning something as a guidance against persisting to learn it, I did persist, thinking that this displeasure is an inherent part of learning.
In Minecraft school, players never have to explore in order to collect their items, they just break blocks in a predefined sequence. As a result, they may never learn how to search and traverse the environment in order to collect the items they need. They may even think that certain items can only be collected in Minecraft school. Similarly, some people believe that certain information can only be obtained in school, whereas all it would take to find it is a bit of exploring on the Web. Freedom while exploring is essential to become good at exploring.
Imagine you need some specific type of item, e.g. iron, and you need it fast (e.g. because you expect to face some enemy in the very near future and you need the iron to make weapons and armor). So you rush down into the cave and speed through it until you find iron ores, extract them, then you again speed through until you find more, etc.
You may have gotten your iron fast, but you also missed out on opportunities to collect other valuable resources. There simply wasn’t time to mine non-urgent blocks. Next time you need them, you’ll have to go into the mine again and may even have to traverse the same pathways. Obviously at that point it would have been more efficient for you to extract everything on the first trip, thus amortizing the trip cost, instead of going on two separate trips.
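The amortization argument is simple arithmetic, and a toy calculation may make it clearer. The Python below uses entirely hypothetical durations (the constants `TRAVEL`, `MINE_IRON` and `MINE_OTHER` are made up) and assumes that each trip pays the same fixed travel cost.

```python
def total_time(trips, travel_time, mining_times):
    """Total time = fixed travel cost paid once per trip + time spent mining."""
    return trips * travel_time + sum(mining_times)


TRAVEL = 10      # hypothetical minutes to get down to the ore and back
MINE_IRON = 5    # hypothetical minutes to extract the iron
MINE_OTHER = 8   # hypothetical minutes to extract everything else on the way

# Rushing: one trip for the iron now, a second trip later for the rest.
two_rushed_trips = total_time(2, TRAVEL, [MINE_IRON, MINE_OTHER])
# Taking it slow: everything is collected on a single trip.
one_thorough_trip = total_time(1, TRAVEL, [MINE_IRON, MINE_OTHER])
# How long until the iron is in hand when rushing.
time_until_iron_rushed = TRAVEL + MINE_IRON

print(two_rushed_trips, one_thorough_trip, time_until_iron_rushed)  # prints 33 23 15
```

With these made-up numbers, rushing delivers the iron after 15 minutes instead of 23, but collecting everything ends up costing 33 minutes instead of 23. The urgent goal is reached sooner at the price of more time overall.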
When learning in a state of rush, for example when preparing for a test, we go through information sources (e.g. books, articles, videos) which we feel provide test-relevant information but we ignore all information which isn’t relevant for the concepts we are to be tested on. This entails missing many learning opportunities one could have seized when focused on that page or video extract.
Creative elaborations are not made. Remote connections are not made between what the book talks about and the unique concepts of interest in our minds. Analogies are not recognized. If only we were to slow down now, we would gain so much more in the long run, at the cost of performing the current task more slowly.
Another way to look at it is in terms of brain states. A brain state consists of a set of active concepts with a subset of those being in focus. We change brain states by activating and deactivating concepts, and by inserting concepts into or removing them from the focus set. The set of all possible brain states makes up a space, with each specific brain state being a location in that space. So we can liken a brain state to a physical location in Minecraft, like the cave in our example.
When we are in some brain state, we are in a unique position to create knowledge from it (like make a connection between the focus concepts and some other active concept in the background). But if we are rushing, we move away from a brain state before all knowledge could have been extracted. All we extracted was what was relevant for the current goal, but we could have missed out on what was relevant for other goals which don’t happen to be urgent at the time. To create the knowledge we missed, we’d maybe have to come back to the state later, which, assuming it’s even possible, carries an expense of time.
So in summary, when we rush we achieve a single and specific goal quicker, but when we take it slower, over the long run we achieve more goals overall.