Training Dogs Not to Bark Using Variable Interval Reinforcement
An example of an interval schedule is an animal trainer teaching shelter dogs not to bark when someone enters the kennels. During this training, the trainer walks by at unpredictable times and rewards a dog that is quiet. A given reward might come after 10 seconds of quiet or after 30 seconds or more, but on average the dog is rewarded for being quiet every 30 seconds. This is a variable interval schedule of reinforcement (VI 30 sec).
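The timing of a VI 30 sec schedule can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names (vi_intervals, reward_times) are hypothetical, and drawing the gaps from an exponential distribution is one simple way to make rewards unpredictable, not necessarily how a trainer or a laboratory would program the schedule.

```python
import random


def vi_intervals(mean_seconds, n_rewards, seed=None):
    """Draw n_rewards waiting times (seconds) that average mean_seconds in the long run."""
    rng = random.Random(seed)
    # Assumption: exponentially distributed gaps, one simple way to make
    # the next reward unpredictable while fixing the long-run average.
    return [rng.expovariate(1.0 / mean_seconds) for _ in range(n_rewards)]


def reward_times(intervals):
    """Convert the gaps between rewards into clock times measured from the session start."""
    times, elapsed = [], 0.0
    for gap in intervals:
        elapsed += gap
        times.append(elapsed)
    return times


# Hypothetical session: 10,000 rewards on a VI 30 sec schedule.
intervals = vi_intervals(mean_seconds=30, n_rewards=10_000, seed=1)
average_gap = sum(intervals) / len(intervals)

print(f"shortest gap: {min(intervals):.1f} s")
print(f"longest gap:  {max(intervals):.1f} s")
print(f"average gap:  {average_gap:.1f} s")
```

In a run like this, individual gaps differ widely from one reward to the next, while the average settles close to 30 seconds. That combination is what makes the schedule "variable" for the dog yet still describable as VI 30 sec.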
Trainers often take advantage of intermittent schedules of reinforcement to make behavior more persistent in dog training programs (Hall 2017). For example, detection dogs are trained to search for a target item, such as drugs or explosives, for long periods of time. As soon as the dogs detect the item, they are trained to notify the handler. In other words, detecting the item is a cue to engage in a different behavior (notifying the handler). The dogs are then reinforced for correctly notifying their handler about the found item. Because reinforcers are delivered only after the dog finds an item, and none are delivered during the search itself, it can be tricky to train the dog to persist in searching.
To examine this problem, Thrailkill et al. (2016) demonstrated how an intermittent schedule of reinforcement can be used to increase the persistence of behavior in a rat model of detection dog training. In their experiment, rats were trained to pull a chain, which served as an analog for search behavior. Pulling the chain produced a lever, which was analogous to finding a target item. The lever presentation cued a lever press, which was then reinforced with food. Pressing the lever was analogous to notifying the handler about the found item. All rats were first trained to pull the chain on a continuous reinforcement schedule, meaning that each chain pull gave the rats the opportunity to press the lever. Later, for some rats, the schedule of reinforcement was slowly faded to an intermittent schedule so that pulling the chain produced the lever only one‐third of the time. For other rats, the schedule of reinforcement remained continuous, such that every chain pull produced the lever. To test how the two groups of rats would behave when reinforcement was no longer available, the researchers stopped providing the lever altogether.
The rats that had received intermittent reinforcement persisted in chain‐pulling for much longer than the rats that had received continuous reinforcement. This is good news for dogs working in the field: as long as they find the target item every now and then and are reinforced, their searching behavior should be maintained for long durations.
3.4.1 Conditioned Reinforcement and Conditioned Punishment
When using reinforcement in animal training, we often think of using food, like meat‐flavored treats. Food is a biologically based reinforcer, along with others such as water, shelter, and mating, and these are all called primary or unconditioned reinforcers. The same goes for punishers. Some stimuli are unconditioned and function as punishers because of their inherent aversiveness, such as a painful electric shock.
It is obvious, especially when analyzing human behavior, that most of what influences behavior is not a piece of food or access to a mate. Instead, human behavior is often influenced by stimuli that are more complex. For example, students study to get good grades, employees work for money, and children draw silly cartoons for their parents’ approval. These stimuli (i.e., grades, money, and approval) get their reinforcing efficacy through the individual’s prior learning experience. Without an associative learning history, a good grade or a dollar bill is unlikely to produce any change in behavior. In this respect, they begin as neutral stimuli. Neutral stimuli acquire a reinforcing function by being paired with an already established reinforcer. After repeated pairings of the neutral stimulus and a reinforcer, the neutral stimulus becomes a conditioned reinforcer. This should sound familiar! The classical conditioning process of stimulus‐stimulus pairings results in the capacity for neutral stimuli to become conditioned reinforcers or conditioned punishers (Williams 1994).
Conditioned reinforcement has been thoroughly investigated in the behavioral laboratory. In the laboratory, when pigeons earn food reinforcers by pecking a key, a grain dispenser, also called a food hopper, is made accessible for a certain period of time so that the pigeon can consume the primary reinforcer (grain). When the food hopper activates, it produces a distinct sound. After repeated pairings of the sound and food, the sound itself becomes a conditioned reinforcer and thus can strengthen behavior (Kelleher 1961). This means that the pigeon will peck at the key just to produce the hopper sound!
Conditioned reinforcers have been shown not only to strengthen or maintain behavior but also to establish new behavior (Alferink et al. 1973). A dog is not born wanting to play with toys, but when a toy is paired with primary reinforcers such as social interaction, the toy itself can reinforce a response. The toy can then be used to reinforce behaviors the dog already knows as well as behaviors the dog is still learning.
Although conditioned reinforcers can maintain learned responses and establish new ones, they are at risk of losing their reinforcing value if they aren’t periodically paired with the unconditioned reinforcer. If the pigeon’s key pecks produced only the sound of the hopper but no food, after a while the pigeon would stop pecking. The sound functions as a reinforcer only if it is occasionally paired with food. Similarly, money maintains its reinforcing value because it can be exchanged for goods and services. If someone tried to use Canadian dollars in the United States, the Canadian dollars would quickly lose their reinforcing value because they would no longer be paired with other reinforcers.
The same concepts that apply to conditioned reinforcers also apply to conditioned punishers. For example, some dog owners use invisible fencing systems to keep their dogs within the boundaries of their yard. When the dog approaches a boundary, the dog’s collar emits a tone, followed shortly by a shock. After some experience of hearing the tone and then receiving the shock, the tone alone is aversive to the dog, and the dog refrains from approaching the boundaries. Once a stimulus becomes a conditioned punisher, it can suppress behaviors beyond the context in which the pairing first took place. That same collar tone could be used to suppress barking, jumping, or potentially any other operant behavior. However, the punishing effects of the tone will eventually wear off if the shock no longer accompanies it. For a conditioned punisher to maintain its suppressive effects, it too must occasionally precede the unconditioned punisher.
Readers experienced in animal training may wonder why we don’t discuss clickers in this section. For readers who are not animal trainers, clickers are hand‐held devices that make a clicking sound when pressed. Animal trainers describe clickers and similar devices (such as whistles) as conditioned reinforcers because they are paired with food. However, this function has been questioned (Dorey and Cox 2018), and more research is needed before the claim can be made with confidence.
3.4.2 Extinction and Shaping
Behaviors maintained by consistent and predictable reinforcement are highly sensitive to the discontinuation of reinforcement (Williams 1994). For example, if someone pressed an elevator button, but it didn’t light up to indicate that an elevator was on its way, what would the person do? Most people would press the button again, maybe a few more times in rapid succession, or hold the button down harder and longer than usual. After a few attempts, most people would eventually just take the stairs. The process by which a response stops occurring when reinforcement no longer follows it is termed extinction.
Extinction can be both a process and a procedure. Extinction as a procedure entails withholding the reinforcer that previously maintained a response. Extinction as a process involves the decrease and eventual elimination of a response. It is important to note this difference because for extinction as a process to successfully occur, the reinforcer that is maintaining the response must be identified. Sometimes we assume that a behavior is maintained by a certain reinforcer, but relying on assumptions can lead us astray when trying to implement extinction to decrease behavior.
The discovery