NB This isn't an article so much as an essay I wrote for a university course. I think it shows! Being able to write like this won't necessarily make you a better horse-trainer, but being able to understand the concepts, and to recognise them in practice, will be a big step in the right direction.
It can be argued that the science of operant conditioning began with
Thorndike's Law of Effect (Thorndike 1911). Extension of this law gives
a standard definition of reinforcement: a reinforcer is a stimulus that,
when presented following a behaviour, increases the probability that the
behaviour will reoccur. Thus positive reinforcement (PR) is the addition of a
pleasant stimulus, or reward, and negative reinforcement (NR) is the cessation of an
aversive stimulus. In contrast to reinforcement, punishment (P) can
be defined as a consequence that, if presented immediately following a
behaviour, makes the behaviour less likely to reoccur (e.g. Burch &
Bailey 1999; see Fig. 1).
Figure 1: Contingencies of reinforcement: arrows indicate the likelihood of recurrence of behaviour
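The definitions above can be restated as a small lookup. This is only an illustrative Python sketch, not part of the original essay: the function name and the "pleasant"/"aversive" and "added"/"removed" labels are assumptions made for the example. The essay defines P only by its effect on behaviour, so labelling the removal of a pleasant stimulus as P is a further assumption.

```python
def classify_consequence(stimulus, change):
    """Label a consequence with the essay's terms: PR, NR or P.

    stimulus: "pleasant" or "aversive" (hypothetical labels)
    change:   "added" or "removed", i.e. what happens just after the behaviour
    """
    if stimulus not in ("pleasant", "aversive"):
        raise ValueError("stimulus must be 'pleasant' or 'aversive'")
    if change not in ("added", "removed"):
        raise ValueError("change must be 'added' or 'removed'")

    if stimulus == "pleasant" and change == "added":
        return "PR"  # reward presented: behaviour becomes more likely
    if stimulus == "aversive" and change == "removed":
        return "NR"  # aversive ceases: behaviour becomes more likely
    # Assumption: both remaining cases make the behaviour less likely.
    return "P"


# Food reward after a desired behaviour
print(classify_consequence("pleasant", "added"))
# Reins released as soon as the horse halts
print(classify_consequence("aversive", "removed"))
# Whip applied after a refusal
print(classify_consequence("aversive", "added"))
```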
A positive reinforcer is anything that will motivate the
subject, e.g. a cash bonus, praise or
promotion following hard work or a food reward for a horse which
performs a desired behaviour. The reward must be
presented simultaneously with, or within three seconds of, the behaviour; if the reward is delayed then it will not be clear which behaviour is being rewarded and the desired behaviour
will be less likely to occur. The effect of PR is to
increase motivation and confidence; it encourages creativity and exploration of the environment since there is no fear of a negative
consequence of an "incorrect action".
NR is typically the release of an aversive stimulus; a
negative reinforcer is a stimulus the subject will work to avoid. If the
desired behaviour causes the aversive to be reduced then it is "escape"
NR, e.g. moving away from the fire when it
becomes too hot or a horse learning to stop when pressure is applied to
the bit, provided the reins are released as soon as the horse halts.
Alternatively, "avoidance" NR involves working to
avoid the escalation of the aversive, such as opening a window to allow
the heat to escape or the horse learning to halt because otherwise the rein-pressure will increase.
The timing of the release is critical so that the subject
understands which behaviour is being reinforced. The release must coincide with the desired behaviour or there will be a
punishing effect. However, even a
well-timed release does not necessarily teach the animal the desired
behaviour, merely a means of avoiding the pressure. NR must be used with
care so as not to cause loss of confidence, confusion and resentment.
Studies comparing the effects of positive and negative
reinforcement on horses have suggested that some
horses respond more strongly to NR and others to
PR (e.g. Visser et al. 2003). However, the study does
not offer reasons as to why this might be, nor does it address the possible
welfare issues raised by stating that some horses respond better to
avoidance training.
By definition, a punisher needs to be aversive in order to eliminate an
undesirable behaviour. For example, we may suffer a hangover following excess alcohol, or a rider may whip a horse for refusing a jump. Occurring after the behaviour,
P does not alter the fact that the behaviour has taken place and provides no information as to the desired behaviour. Instead it
just teaches the subject avoidance of the aversive (e.g. taking
pain-killers in anticipation of a hangover or a horse
refusing to be caught if it expects P) and resentment of the trainer.
In order to maximize its effect, P should be intense (as opposed
to escalated from a mild form which can lead to habituation) and delivered
as quickly as possible after the behaviour (see Fig. 2 for effects of
delayed P).
Figure 2: The effects on rats of punishment (electric shocks), delayed punishment and response-independent shocks. Clearly punishment needs to be immediate for it to be effective (Schwartz et al. 1978 and references therein).
When teaching a horse to load into a trailer, PR can be used to reward every step towards the trailer. A
detailed shaping plan would be necessary and the horse should be happy with each step before moving onto
the next. Care should be taken so that the horse's desire for treats does
not lead to it loading too quickly and flooding itself. Once the steps
forwards and loading are well-established they can be placed on a variable schedule of reinforcement (VSR) so
that eventually just the final result is rewarded. NR
could be used by applying pressure to the lead rope and releasing when the
horse takes a step forwards. In this case it is harder to avoid flooding
since it is necessary to keep the pressure on if the horse halts. Care
should be taken not to allow the horse to release the pressure through
incorrect behaviour, e.g. rearing. Loading a horse using P would
require, for example, hitting the horse with a whip every time it halted and/or
walked backwards. Regardless of the method, loading should be practised in
different locations and at different times of day so as to generalize the
response.
When training the horse to be safely shod by the farrier, PR can be used to
reward first shifting the weight and then lifting the foot for
progressively longer periods in response to a cue. Lifting the foot using
NR or P might require squeezing the chestnut, which punishes keeping all four
feet on the ground (P), and releasing as soon as the foot is lifted (NR). If
the horse struggles then the foot could be held tightly and released when
the horse stands still again (NR) or the horse could be smacked/reprimanded (P). In
each case, after gradual shaping the behaviour should be extended to
include the range of leg positions required by the farrier, hitting the
foot with a hammer, various locations (including next to the farrier's
van when another horse is being shod) and various people (including men).
PR, when used incorrectly, is the least likely of the three to cause
serious psychological trauma. Typical problems might include incorrect
timing and the inadvertent rewarding of undesirable behaviours. For
example, a mother might give sweets to a child "just to keep him quiet"
following a tantrum, or a rider may stroke/praise a horse with the
intention of offering reassurance but in reality reward "being
scared". A clicker trainer risks rewarding mugging or establishing behaviour chains, e.g. the horse thinking the desired
behaviour is mugging followed by looking away. PR might not work long-term if
the rewards are insufficiently motivating or if the shaping plan leads to
the desired behaviour too quickly, causing flooding. PR should not be used
to mask pain.
In practice, training with PR is not as simple as
Thorndike's Law of Effect suggests. A behaviour rewarded every time
it is presented will tend to decrease over long periods of time. Instead it is necessary to use a VSR, thus maintaining motivation and a
high response rate, e.g. gamblers winning occasional jackpots
are more susceptible to addiction than those winning a
small amount every time they play (see Fig. 3). Similarly, we do not reward an older horse every time it walks a step under saddle,
although we may do this initially with a youngster.
Figure 3: A variable-ratio schedule of reinforcement (left) will maintain motivation more successfully than a fixed-ratio schedule (right) (Schwartz et al. 1978 and references therein).
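The difference between the two schedules in Fig. 3 can be illustrated with a short simulation. This is a minimal Python sketch, not a model taken from the essay or its references: the function names, and the choice to reward each response with probability 1/mean_ratio, are assumptions made purely for illustration.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reward every `ratio`-th response (ratio=1 rewards every response)."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]


def variable_ratio(n_responses, mean_ratio, seed=None):
    """Reward each response with probability 1/mean_ratio.

    Rewards arrive on average once per `mean_ratio` responses, but the
    subject cannot predict which response will be rewarded.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    return [rng.random() < 1.0 / mean_ratio for _ in range(n_responses)]


# True marks a rewarded response, e.g. a treat for a step under saddle.
print(fixed_ratio(12, 4))
print(variable_ratio(12, 4, seed=1))
```

In the fixed-ratio list the rewards fall at regular intervals, whereas in the variable-ratio list the gaps between rewards vary from run to run, which is the unpredictability the gambling example relies on.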
There is also the
possibility that the subject will do the minimum required to earn the
treat, e.g. inadvertently rewarding progressively smaller foot-falls.
Similarly (s)he might only offer previously-rewarded behaviours, rather
than offering new ones, unless creativity is encouraged (e.g. only ever
using the same bread-making recipe; Schwartz et al. 1978).
PR training can encourage the subject to offer lots of behaviours which may
be undesirable for the owner, particularly since withholding the reward
leads to escalation of the behaviour. Inadvertent rewarding at the peak of
the extinction burst firmly reinforces that unwanted behaviour
(e.g. horse kicking stable door). PR may also create a worried,
"neurotic" animal desperately trying to offer behaviours, particularly if
used in conjunction with P of incorrect behaviours.
Incorrect use of NR and P can be much more serious. At best it can lead to
pain, fear, confusion, worry and resentfulness, e.g. when a poorly timed release of pressure gives no indication of the correct behaviour.
Alternatively the wrong behaviour could be reinforced, such as rearing to
escape a pressure halter. NR and P could mask pain, typically
where the aversive nature of the stimulus outweighs the aversive nature of
the desired behaviour (e.g. using a gum line to prevent bucking under
saddle; Roberts 2002), and lead to flooding (e.g. forcing someone to
confront a phobia). Conversely the P may be considered
reinforcing by the subject (e.g. the expulsion of a child from school).
If the pressure is not released then NR can become punishing, resulting
in a "shut down" attitude and conditioned suppression of behaviours (e.g. electric shock
experiments on rats by Estes & Skinner 1941). Continued P can
lead to learned helplessness since the subject realises that its
behaviour and the outcomes are independent. This
learning produces the motivational, cognitive and emotional effects of
uncontrollability, causing severe stress and depletion of the
neurochemical necessary for mediation of movement (e.g. Maier & Seligman
1976). While the original experiments involved giving electric shocks to dogs, this
effect is apparently closely linked with depression in humans,
although both can be "alleviated" by forcing the subject to experience
success in re-learning to control his environment (Seligman 1968, 1990).
The long-term effects on the subject of such "retraining" were not discussed.
References
Burch M., Bailey J., 1999, "How Dogs Learn", Howell
Estes W.K., Skinner B.F., 1941, Journal of Experimental Psychology, 29, 390
Maier S.F., Seligman M.E.P., 1976, Journal of Experimental Psychology, 105, 3
Roberts M., 2002, "From My Hands To Yours", Monty & Pat Roberts Inc.
Schwartz B., Wasserman E.A., Robbins S.J., 1978, "Psychology of Learning and Behaviour", 5th edition, Norton
Seligman M.E.P., 1968, Journal of Comparative and Physiological Psychology, 66, 402
Seligman M.E.P., 1990, "Learned Optimism", New York: Knopf
Thorndike E.L., 1911, "Animal Intelligence"
Visser E.K. et al., 2003, Applied Animal Behaviour Science, 80, 311
Copyright Catherine Bell 2004