Department of Psychology


Doing ANOVAs using MINITAB

The Minitab statistics package includes an Anova package which will meet most of the needs of most psychologists. This was introduced in Release 5 and was consolidated in Release 7.1. The only serious limitations we have yet found are (i) you have to work out contrasts for yourself and (ii) it cannot handle unequal cell sizes or missing data (note that unequal cell sizes can sometimes be dealt with by using the Minitab GLM command). As Minitab is much simpler to use than the bigger stats packs (SPSS etc), and fully interactive, this is probably the system to learn if you are needing to do anovas by computer for the first time, or the first time recently.

Unfortunately, the commands are not dealt with in the standard introductions to Minitab (Ryan, Joiner & Ryan, 1985, Minitab handbook 2nd edn, 1990 reprint; Monk, 1991, Exploring statistics with Minitab; West, 1991, Computing for Psychologists: Statistical analysis using SPSS and MINITAB); and the on-line help facility has been proved by several people's experience to be inadequate. So I have written these notes to try and stop you having to come and ask me how to do it. They don't cover all features of the commands, just those which I have already had to use, with a concentration on the points that have caused me and others difficulty.

These notes assume that you are using Minitab Release 7 on a computer with a unix operating system, but later releases (and Macintosh or PC versions) should be at least compatible, and may make some things easier. They also assume you know the basics of using Minitab (if you don't, get hold of Ryan et al, Monk, or West), and of the analysis of variance (if you don't you shouldn't be using it anyway, stick to discourse analysis).

The commands
Two commands are used for anovas: ANOVA and ANCOVA. (All Minitab command names are written in capitals in these notes, for clarity, but in actual use they can be typed lower case or as a mixture. The same goes for Minitab variable names.) ANOVA does analysis of variance. ANCOVA does analysis of co-variance, i.e. analyses in which as well as design variables with discrete levels, there are continuous variables which may also be determining the results (co-variates); it's a cross between analysis of variance and multiple regression. ANCOVA will do almost everything that ANOVA will do and take the co-variates into account as well, but if you don't have co-variates, use ANOVA. If you can handle ANOVA, the way ANCOVA works should be obvious by extension, so these notes don't discuss it further. Neither ANOVA nor ANCOVA can cope with missing values, and if you give them data containing missing values, they will produce fairly incomprehensible error messages, usually "Unequal cell counts".
Data storage
The first consideration is that all the values of the dependent variable need to be held in a single column. So if, for example, we have scores for one group of subjects in column C1, and scores for a second group in column C2, we will have to use command STACK to get them into a single column (C11, say). HELP STACK should give you enough information to do this.
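As a rough illustration of what stacking achieves, the following sketch (in Python, not Minitab, and using made-up scores) joins two group columns into a single column and records which group each value came from. The names `stack_columns`, `group1` and `group2` are hypothetical and are not part of Minitab.

```python
def stack_columns(*columns):
    """Join several columns end to end and return a parallel list of group labels."""
    stacked, labels = [], []
    for group_number, column in enumerate(columns, start=1):
        stacked.extend(column)
        labels.extend([group_number] * len(column))
    return stacked, labels

# Made-up scores for two groups of subjects (playing the role of C1 and C2).
group1 = [11, 15, 17, 9]
group2 = [19, 23, 26, 22]

scores, group = stack_columns(group1, group2)
print(scores)  # all scores in one column
print(group)   # the group each score belongs to
```

The second list is the kind of factor-level column described in the next paragraph.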
Having achieved this, we have to set up an additional column for each factor, which will contain the level of that factor corresponding to each value of the dependent variable. Suppose now that we have dependent variable values in C1, and there are two factors, both of them between- rather than within-subjects. Then we need to set up two additional columns (let's say, C2 and C3), to hold the levels of these factors. The columns holding the factor levels must be the same length as the column holding the dependent variable; so a row, taken across all the columns concerned, would give (for a single observation) its dependent variable value and the values of each factor. This is the same system as is used in SPSS and many other statistical packages.
For example, the data for a 2x3 analysis with four observations in each cell of the design might look like this:
```
row C1 C2 C3
  1 11  1  1
  2 15  1  1
  3 17  1  1
  4  9  1  1
  5  9  1  2
  6 11  1  2
  7  7  1  2
  8 10  1  2
  9  7  1  3
 10  8  1  3
 11 12  1  3
 12  6  1  3
 13 19  2  1
 14 23  2  1
 15 26  2  1
 16 22  2  1
 17 18  2  2
 18 15  2  2
 19 22  2  2
 20 17  2  2
 21 24  2  3
 22 29  2  3
 23 21  2  3
 24 31  2  3
```
There is a convenient way of setting up the factor levels by using repeating factors in the SET command. You can find out how by using HELP SET. For example, in the case above, C2 and C3 could be set up by:
```
MTB> SET C2
DTA> (1 2)12
DTA> END
MTB> SET C3
DTA> 2(1:3)4
DTA> END
MTB>
```
It's quite difficult to remember when the repeat factors should come before or after the parentheses, so it is important always to look at what you have done (e.g. with PRINT C2 C3 in the example above), and check that you have got the right pattern of values. This is easy enough to do.
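One way to check the pattern you are aiming for is to generate it independently. The sketch below is Python, not Minitab, and the function `patterned` with its parameters `each` and `times` is hypothetical. It builds the two factor columns needed for the 2x3 table above: in C2 each of the levels 1 and 2 appears 12 times in a row, and in C3 each of the levels 1 to 3 appears 4 times in a row, with that whole run then occurring twice.

```python
def patterned(values, each=1, times=1):
    """Repeat every value `each` times in a row, then repeat that whole run `times` times."""
    one_pass = [v for v in values for _ in range(each)]
    return one_pass * times

c2 = patterned([1, 2], each=12)             # 12 ones, then 12 twos
c3 = patterned([1, 2, 3], each=4, times=2)  # 1 1 1 1 2 2 2 2 3 3 3 3, twice over

# Print the two columns side by side, like PRINT C2 C3 would.
for row, (a, b) in enumerate(zip(c2, c3), start=1):
    print(row, a, b)
```

Comparing this listing with what PRINT C2 C3 shows is a quick way of catching a misplaced repeat factor.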
In this example, it was not necessary to have a column specifying subject numbers. For mixed designs (involving both between- and within-subject factors), however, we will need such a column, for reasons spelled out below. This leads to a tricky point. It would be logical if each subject had a distinct number. So if the above 2x3 design with 4 subjects per cell involved two between-subject factors, subject numbers would run up to 24; while if it involved two within-subject factors, they would only run up to 4. Indeed, with this information, the package would be able to work out which factors were within and which between. Logical and desirable though such an arrangement would be, it isn't what Minitab ANOVA does. Instead, the subjects in a particular design cell are all numbered from 1 up to the size of the cell (even when subject 1 in one cell has nothing to do with subject 1 in another cell). So in the above example, subject numbers would run from 1 to 4 regardless of the nature of the factors, and we have to have another way of telling the command about the experimental design.
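The within-cell numbering convention can be sketched as follows. This is Python rather than Minitab, and `within_cell_subjects` is a hypothetical helper. For the 2x3 design with 4 observations per cell it lists, for every row of the data, the two factor levels and the subject number, which restarts at 1 in each cell.

```python
from itertools import product

def within_cell_subjects(levels_per_factor, cell_size):
    """Return one (factor levels..., subject number) tuple per observation."""
    rows = []
    # Visit the cells in the same order as the example table: last factor varies fastest.
    for cell in product(*(range(1, n + 1) for n in levels_per_factor)):
        for subject in range(1, cell_size + 1):
            rows.append(cell + (subject,))
    return rows

rows = within_cell_subjects([2, 3], cell_size=4)
subjects = [row[-1] for row in rows]  # what a SUBJECTS column would hold
for row in rows:
    print(*row)
```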
Command syntax
- Basic syntax.
  At its simplest, input to ANOVA looks like this:
```
MTB> ANOVA C1=C2 C3 C2*C3
```
  Let's break this down.
  MTB> is the Minitab prompt and is produced by the computer, not typed by the user. (In other situations, Minitab gives a different prompt, but in general, everything up to the first > sign will be the prompt).
  ANOVA is just the name of the Minitab command we are using.
  What comes before the = sign specifies the dependent variable, whose data is to be analysed. So in this case the dependent variable is to be found in C1. If we put several column names or numbers before the = sign, ANOVA will do an anova on each of them, using the same factors for each.
  After the equals sign comes the specification of the independent variables. In the example, C2 and C3 are columns containing the levels of the factors in the experimental design. Since we specify both of them, and the term C2*C3 which stands for their interaction, this ANOVA command would do a factorial anova on data held in column C1, finding the main effects of C2 and C3 plus their interaction. The form of anova done would assume that both factors are "fixed" (rather than "random") and "between subjects" rather than "within subject", i.e. there are no "repeated measures" (we will see later how to relax these assumptions). The within-cell variance would be used as the error term, so there would have to be at least two observations for every combination of factor levels. The analysis would give you the conventional anova table (including significance levels) with tests on both main effects and the interaction. Note that it is not necessary to have a column giving the subject number within each cell.
  As in all Minitab commands, columns can be identified by names rather than C<number>, provided the names have previously been set up by the NAME command. Remember that Minitab names are restricted to 8 alphameric characters, of which the first must be a letter. Unusually, though, in the ANOVA command it is not necessary to put single quotes around names. Correspondingly, it is not permitted to add extra words to the command line in order to make it more readily understood, and this applies whether or not names are used. So
  MTB> ANOVA dep var in C1=factor levels in C2 and C3
  would not be legal, although corresponding text is allowed in all other Minitab commands. In practice, however, with anovas of any complexity, it usually saves typing to specify columns by their numbers. As in all Minitab commands, column names will be used on all output, which is where they are really needed.
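  To make concrete what the factorial analysis requested by ANOVA C1=C2 C3 C2*C3 involves, here is a rough sketch in Python (not Minitab) of the arithmetic for a balanced two-factor between-subjects design, applied to the 2x3 example data from the "Data storage" section. The helper names are hypothetical, Minitab's own algorithm and output layout may differ, and the sketch stops at the F ratios without computing significance levels.

```python
# Data from the 2x3 example table: C1 is the dependent variable, C2 and C3 the factors.
c1 = [11, 15, 17, 9, 9, 11, 7, 10, 7, 8, 12, 6,
      19, 23, 26, 22, 18, 15, 22, 17, 24, 29, 21, 31]
c2 = [1] * 12 + [2] * 12
c3 = ([1] * 4 + [2] * 4 + [3] * 4) * 2

def mean(xs):
    return sum(xs) / len(xs)

def group_by(keys, values):
    """Collect the values that share each key."""
    groups = {}
    for key, y in zip(keys, values):
        groups.setdefault(key, []).append(y)
    return groups

grand = mean(c1)

def ss_between(keys):
    """Sum of squares between the groups defined by `keys`."""
    return sum(len(g) * (mean(g) - grand) ** 2 for g in group_by(keys, c1).values())

cell_keys = list(zip(c2, c3))
cells = group_by(cell_keys, c1)

ss_total = sum((y - grand) ** 2 for y in c1)
ss = {
    "C2": ss_between(c2),
    "C3": ss_between(c3),
    # Valid for balanced designs: cell variation minus both main effects.
    "C2*C3": ss_between(cell_keys) - ss_between(c2) - ss_between(c3),
    # Within-cell variation, used as the error term.
    "Error": sum((y - mean(g)) ** 2 for g in cells.values() for y in g),
}

a, b = len(set(c2)), len(set(c3))
df = {"C2": a - 1, "C3": b - 1, "C2*C3": (a - 1) * (b - 1), "Error": len(c1) - a * b}

ms_error = ss["Error"] / df["Error"]
for term in ("C2", "C3", "C2*C3"):
    ms = ss[term] / df[term]
    print(f"{term:6s} DF={df[term]:2d} SS={ss[term]:9.3f} MS={ms:9.3f} F={ms / ms_error:7.2f}")
print(f"Error  DF={df['Error']:2d} SS={ss['Error']:9.3f} MS={ms_error:9.3f}")
```

  The subtraction used for the interaction sum of squares relies on the cell sizes being equal, which is the same restriction that the ANOVA command imposes.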
- Use of sub-commands
  Additional information is given to the ANOVA command by using sub-commands, a standard Minitab procedure. This procedure is used to specify which variables are within-subject, for example, or to request that tables of means are provided. Both these topics are discussed further below, but here is an example to illustrate the syntax:
```
MTB>ANOVA C1=C2 C3;
SUBC>MEANS C2.
```
  Note the semi-colon (to tell the system that a sub-command is coming on the next line) and the full stop (to tell it that this is the last sub-command). This ANOVA command would analyse the main effects of C2 and C3 as above (no interaction term is requested this time), and in addition give you a table of the mean value of C1 for each level of the factor that is stored in C2.
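  The table that MEANS C2 asks for is simply the mean of the dependent variable at each level of the factor. A minimal sketch in Python (not Minitab), using the 2x3 example data from the "Data storage" section and a hypothetical helper `level_means`:

```python
c1 = [11, 15, 17, 9, 9, 11, 7, 10, 7, 8, 12, 6,
      19, 23, 26, 22, 18, 15, 22, 17, 24, 29, 21, 31]
c2 = [1] * 12 + [2] * 12

def level_means(values, levels):
    """Mean of `values` for each distinct entry in `levels`."""
    totals, counts = {}, {}
    for y, level in zip(values, levels):
        totals[level] = totals.get(level, 0) + y
        counts[level] = counts.get(level, 0) + 1
    return {level: totals[level] / counts[level] for level in totals}

means = level_means(c1, c2)
for level, m in sorted(means.items()):
    print(level, round(m, 3))
```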
- Specifying the design
  - All variables between subjects. This is the simplest case to specify. We have already seen that to include a factor whose levels are held in column C2 in the design, we just include C2 after the = sign in the command line; and to include an interaction between two factors, we give both column names with an asterisk between them. Multi-way interactions can be specified in the obvious way, e.g. C2*C3*C4, or AGE*SEX*INCOMEGP*HANDNESS if AGE, SEX, INCOMEGP and HANDNESS have all been defined as column names.
    We are not bound to calculate all the possible effects. If we just gave the command
```
MTB> ANOVA C1=C2 C3
```
    the interaction would be absorbed into the error term. This is what we would have to do if we only had one observation per cell.
    When there are several factors, it gets laborious to write out all the main and interaction effects required. There is an abbreviated form to help with that: the symbol | between two column numbers/names will tell Minitab to calculate their main effects and all their interactions.
```
MTB> ANOVA C1=C2|C3|C4
```
    will compute the main effects of C2, C3 and C4, the three two-way interactions, and the three-way interaction. So it is exactly equivalent to
```
MTB> ANOVA C1=C2 C3 C4 C2*C3 C2*C4 C3*C4 C2*C3*C4
```
    Note, though, that this form would fail if there was only one observation per cell. There is a general Minitab ANOVA rule that the highest possible interaction term is always included in the analysis, and with only one subject per cell, this is the C2*C3*C4 interaction (which is in fact used as the error term). Including it in the model specification is thus in effect asking for it to be included twice, and it leads to an error message.
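    The expansion performed by the | symbol can be sketched in a few lines of Python (not Minitab). The function `expand_bar` is hypothetical; it only reproduces the list of terms described above and says nothing about how Minitab implements the expansion internally.

```python
from itertools import combinations

def expand_bar(factors):
    """All main effects and interactions of `factors`, lowest order first."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms

print("ANOVA C1=" + " ".join(expand_bar(["C2", "C3", "C4"])))
# prints: ANOVA C1=C2 C3 C4 C2*C3 C2*C4 C3*C4 C2*C3*C4
```

    With one observation per cell, the last term in this list is the one that has to be left out.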
  - Repeated measures (within-subject factors): general principles. Some statistics packages send you into frightful contortions if you want to do an analysis of variance with repeated measures, because they insist on treating them as multivariate analyses. Minitab deals with them using the more usual approach, which accepts a much greater parallelism between between- and within-subjects variables.
    To understand the principles of design specification for Minitab ANOVAs, it is necessary to grasp the difference between a fixed and a random factor.
    A fixed factor is one whose value is known exactly, and for which we can say that we have used all the possible values (within a certain range) in our design. Gender is the archetype. A random factor is one for which we can only make a random sampling from all the possible values it might have. Subjects are the archetypal random factor, but not the only possible one. For example, if we do a psycholinguistics experiment using a set of 100 English words and a set of 100 Dutch words, then English vs. Dutch is a fixed factor, but within it, we have a random factor, words.
    If the only random factor we have in an experiment is the subjects sampled, independently, within each cell of the design, we don't need to tell Minitab about it. It will assume that each cell contains a random sample of observations from some population. But if that isn't the sampling structure, we need to say what it is. And if we have repeated measures, the samples of observations within each cell are not independent; in some groups of cells, the same subjects are being used. We give Minitab this information by putting the subjects factor into the ANOVA command line, in a way that tells the system how it is being sampled in the particular design we are currently analysing. Because we have done that, we have to use the subcommand RANDOM to tell Minitab that this factor is random.
    To use subjects as an explicit factor, we have to have an additional column, which contains the within-cell subject number corresponding to each observation, as discussed above under "Data storage". To make things easier to follow, we give this column the name SUBJECTS. So with a simple analysis involving just one factor (C2), which is varied within subjects, we would write:
```
MTB> ANOVA C1=SUBJECTS C2;
SUBC>RANDOM SUBJECTS.
```
    The SUBJECTS factor is put first to get the anova table laid out properly. Note that this will produce a warning message against the SUBJECTS factor telling us we cannot calculate an exact F-test for this effect--but we don't normally want to anyway (we are usually not interested in whether or not there are significant individual differences).
    If we have multiple within-subjects factors, the procedure is easily extended, but we have to remember the rule that each such factor (and each of their interactions) should be tested against its own interaction with the subjects variable (this is an anova rule, not anything specific to Minitab). Minitab will do this correctly, but if we want the anova table printed out properly we have to be careful how we spell out the terms, and the anova line gets long and complicated. We often have to use Minitab's continuation character (& or ++), which tells the system that a command continues onto the next line. For example, if C2, C3, and C4 are all within-subject variables, we would have to write the following,
```
MTB> ANOVA C1=SUBJECTS C2 C2*SUBJECTS C3 C3*SUBJECTS &
CONT>          C4 C4*SUBJECTS C2*C3 C2*C3*SUBJECTS  &
CONT>          C2*C4 C2*C4*SUBJECTS C3*C4 C3*C4*SUBJECTS  &
CONT>          C2*C3*C4;
SUBC> RANDOM SUBJECTS.
```
    Note that the highest possible level of interaction, C2*C3*C4*SUBJECTS, is omitted from the ANOVA command. This is in obedience to the general Minitab ANOVA rule that the highest possible level of interaction is always automatically included in the analysis. Attempting to include it explicitly will lead to an error message--very annoying when you have just put in such a complicated command!
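    Because such command lines are tedious to type and easy to get wrong, it can help to generate the list of terms mechanically and compare it with what you have typed. The following Python sketch (not Minitab; `within_subject_terms` is a hypothetical helper) applies the rules just described to a design in which every factor is within-subjects: each effect is followed by its interaction with SUBJECTS, except for the highest-order interaction, whose interaction with SUBJECTS is left out.

```python
from itertools import combinations

def within_subject_terms(factors, subjects="SUBJECTS"):
    """Terms for an all-within-subjects design, in anova-table order."""
    terms = [subjects]
    n = len(factors)
    for order in range(1, n + 1):
        for combo in combinations(factors, order):
            effect = "*".join(combo)
            terms.append(effect)
            if order < n:
                # Error term for this effect: its interaction with subjects.
                terms.append(effect + "*" + subjects)
            # For the highest-order effect the interaction with subjects is
            # omitted, following the rule described in the text.
    return terms

terms = within_subject_terms(["C2", "C3", "C4"])
print("ANOVA C1=" + " ".join(terms) + ";")
print("RANDOM SUBJECTS.")
```

    The printed line would still need to be split with the continuation character before being typed in.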
  - Mixed mode analyses (involving both between- and within-subject factors). Predictably, these are the most complicated designs to specify. Consider the simplest case, where we have one between-subjects factor (C2, say) and one within (C3). Then subjects are only sampled randomly within values of the C2 variable. We tell Minitab this by including a term SUBJECTS(C2) (which we can read as "SUBJECTS within C2") on the ANOVA line, and then adding a sub-command to declare this factor as random. The simplest way of doing that is as follows:
```
MTB> ANOVA C1=C2|C3 SUBJECTS(C2);
SUBC>RANDOM SUBJECTS(C2).
```
    This would produce the right anova, and give correct values of F associated with the right numbers of degrees of freedom and the right significance levels. But the table won't be arranged quite as we would like it. To get it the way a 1-between, 1-within anova should look in the textbooks, we need to specify the terms on the ANOVA line in the order in which we want them to appear in the anova table:
```
MTB> ANOVA C1=C2 SUBJECTS(C2) C3 C2*C3;
SUBC>RANDOM SUBJECTS(C2).
```
    Designs involving larger numbers of factors, whether between- or within-subjects, can be dealt with by logical extensions of the rules laid out here, plus the following additional rules:
    * In a mixed-mode analysis (i.e. some between-subject and some within-subject variables), if there is more than one between-subject variable, then subjects are sampled randomly within all of them. This is represented within Minitab by writing (for example) SUBJECTS(C2 C3) where C2 and C3 both vary between subjects.
    * If there is more than one within-subject factor, remember the general rule that such factors are always tested against their own interactions with the random variable (i.e. subjects within the between-subjects variable). This has to be specified exactly, and is one of the chief causes of very long anova command lines.
    * Unfortunately, the | expansion symbol does not work on sub-terms. So the logically correct C2*(C3|C4) or C2*C3|C4 are not expanded, as you might reasonably hope, to C2*C3 C2*C4 C2*C3*C4.
    The biggest problem is that the ANOVA line tends to get very long indeed. It pays to be able to use the copy-key facility on your terminal, and to remember that Minitab has a continuation-character (& or ++) which indicates that a command continues on the next line.
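    The rules for mixed designs can likewise be sketched as a small term generator. This is Python, not Minitab, and `mixed_design_terms` is a hypothetical helper that assumes at least one within-subject factor. It is shown for a design with two between-subjects factors (C3 and C4) and two within-subject factors (C2 and C6): the between-subjects effects come first, then subjects nested within them, then each within-subject effect with its interactions with the between-subjects effects and with subjects. The interaction of the highest-order within-subject effect with subjects is left out.

```python
from itertools import combinations

def effects(factors):
    """Main effects and interactions of `factors`, lowest order first, with their order."""
    return [("*".join(combo), order)
            for order in range(1, len(factors) + 1)
            for combo in combinations(factors, order)]

def mixed_design_terms(between, within, subjects="SUBJECTS"):
    """Terms for a mixed between/within design, in anova-table order."""
    nested = f"{subjects}({' '.join(between)})"   # e.g. SUBJECTS(C3 C4)
    between_effects = [name for name, _ in effects(between)]
    terms = between_effects + [nested]
    for w, order in effects(within):
        terms.append(w)
        terms.extend(f"{w}*{b}" for b in between_effects)
        if order < len(within):
            # Error term for this block of within-subject effects.
            terms.append(f"{w}*{nested}")
    return terms

terms = mixed_design_terms(between=["C3", "C4"], within=["C2", "C6"])
print("ANOVA C1=" + " ".join(terms) + ";")
print("RANDOM SUBJECTS(C3 C4).")
```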

The following example shows the analysis of a two-between, two-within design, which illustrates most of what you need to know. The dependent variable is 'affect' and is in C1; there are two between-subjects variables, 'drugs' in C3 and 'NHS/priv' in C4, and two within-subject factors, 'trials' in C2 and 'am/pm' in C6. The analysis isn't absolutely guaranteed, in the sense that I couldn't find a worked example of this complexity in a textbook to check it against (if anyone knows of one, please inform me; I didn't search very far), though I have frequently met such designs in real research life.

MTB > ANOVA C1=C3 C4 C3*C4 SUBJECTS(C3 C4) C2 C2*C3 C2*C4 C2*C3*C4  &
CONT> C2*SUBJECTS(C3 C4) C6 C6*C3 C6*C4 C6*C3*C4 C6*SUBJECTS(C3 C4) &
CONT> C2*C6 C2*C6*C3 C2*C6*C4 C2*C6*C3*C4;
SUBC> RANDOM SUBJECTS(C3 C4).

Factor                     Type Levels Values
drugs                     fixed      2     1     2
NHS/priv                  fixed      2     1     2
subjects(drugs NHS/priv) random      5     1     2     3     4     5
trials                    fixed      6     1     2     3     4     5     6
am/pm                     fixed      2     1     2

Analysis of Variance for affect
Source                             DF         SS         MS       F      P
drugs                               1      3.554      3.554    1.23  0.284
NHS/priv                            1      0.932      0.932    0.32  0.578
drugs*NHS/priv                      1      0.288      0.288    0.10  0.757
subjects(drugs NHS/priv)           16     46.270      2.892       *
trials                              5     18.914      3.783    0.86  0.512
drugs*trials                        5     17.080      3.416    0.78  0.569
NHS/priv*trials                     5     44.212      8.842    2.01  0.086
drugs*NHS/priv*trials               5     21.215      4.243    0.96  0.445
trials*subjects(drugs NHS/priv)    80    351.912      4.399    2.01  0.000
am/pm                               1     49.115     49.115   29.00  0.000
drugs*am/pm                         1      2.710      2.710    1.60  0.224
NHS/priv*am/pm                      1      0.220      0.220    0.13  0.724
drugs*NHS/priv*am/pm                1      0.240      0.240    0.14  0.712
am/pm*subjects(drugs NHS/priv)     16     27.095      1.693    0.77  0.709
trials*am/pm                        5      8.227      1.645    0.75  0.586
drugs*trials*am/pm                  5      8.380      1.676    0.77  0.576
NHS/priv*trials*am/pm               5     21.933      4.387    2.01  0.086
drugs*NHS/priv*trials*am/pm         5      9.938      1.988    0.91  0.479
Error                              80    174.836      2.185
Total                             239    807.069
* no exact F-test can be calculated
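A quick way to see which error term each effect has been tested against is to recompute the F ratios from the DF and SS columns. The Python sketch below (not Minitab) does this for the table above. The pairing of effects with error terms is an assumption inferred from the rules described earlier (between-subjects effects against subjects within groups, each within-subject effect against its own interaction with subjects, and everything involving both within-subject factors against Error); it is not taken from Minitab documentation.

```python
SUBJ = "subjects(drugs NHS/priv)"

# Source -> (DF, SS), transcribed from the table above.
table = {
    "drugs": (1, 3.554), "NHS/priv": (1, 0.932), "drugs*NHS/priv": (1, 0.288),
    SUBJ: (16, 46.270),
    "trials": (5, 18.914), "drugs*trials": (5, 17.080),
    "NHS/priv*trials": (5, 44.212), "drugs*NHS/priv*trials": (5, 21.215),
    "trials*" + SUBJ: (80, 351.912),
    "am/pm": (1, 49.115), "drugs*am/pm": (1, 2.710),
    "NHS/priv*am/pm": (1, 0.220), "drugs*NHS/priv*am/pm": (1, 0.240),
    "am/pm*" + SUBJ: (16, 27.095),
    "trials*am/pm": (5, 8.227), "drugs*trials*am/pm": (5, 8.380),
    "NHS/priv*trials*am/pm": (5, 21.933), "drugs*NHS/priv*trials*am/pm": (5, 9.938),
    "Error": (80, 174.836),
}

# Effect -> (assumed error term, F as printed in the table).
tests = {
    "drugs": (SUBJ, 1.23), "NHS/priv": (SUBJ, 0.32), "drugs*NHS/priv": (SUBJ, 0.10),
    "trials": ("trials*" + SUBJ, 0.86), "drugs*trials": ("trials*" + SUBJ, 0.78),
    "NHS/priv*trials": ("trials*" + SUBJ, 2.01),
    "drugs*NHS/priv*trials": ("trials*" + SUBJ, 0.96),
    "trials*" + SUBJ: ("Error", 2.01),
    "am/pm": ("am/pm*" + SUBJ, 29.00), "drugs*am/pm": ("am/pm*" + SUBJ, 1.60),
    "NHS/priv*am/pm": ("am/pm*" + SUBJ, 0.13),
    "drugs*NHS/priv*am/pm": ("am/pm*" + SUBJ, 0.14),
    "am/pm*" + SUBJ: ("Error", 0.77),
    "trials*am/pm": ("Error", 0.75), "drugs*trials*am/pm": ("Error", 0.77),
    "NHS/priv*trials*am/pm": ("Error", 2.01),
    "drugs*NHS/priv*trials*am/pm": ("Error", 0.91),
}

def mean_square(source):
    df, ss = table[source]
    return ss / df

def f_ratio(effect, error):
    return mean_square(effect) / mean_square(error)

for effect, (error, printed) in tests.items():
    print(f"{effect:34s} F = {f_ratio(effect, error):6.2f} (printed {printed:.2f})")
```

Under these assumed pairings the recomputed ratios agree with the printed F column to within rounding.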

Stephen Lea
revised 8th February 1996

University of Exeter
Department of Psychology
Washington Singer Laboratories
Exeter EX4 4QG
United Kingdom
Tel +44 1392 264626
Fax +44 1392 264623

Send questions and comments to the departmental administrator (Liz Hewitt) or to the author of this page
