**SPSS** stands for *Statistical Package for the
Social Sciences*, and is the most frequently used
software among psychologists, sociologists and linguists
(and probably in many other fields) to perform statistical
computations.
With statistics software such as SPSS, you get the mathematics for free.
Still, always keep in mind that software can only help you
if you understand what it does and in which situations a given
function is appropriate.

**A Getting familiar with SPSS**

**B1 Entering data by hand**

**B2 Using “Variable View”**

**B3 Creating a frequency table**

**C Creating a histogram**

**D Creating a boxplot**

**E Calculating mean, mode and median**

**F Calculating measures of spread**

**> Launch SPSS from the Start Menu.**

**> Once SPSS is running, you are
offered a menu with choices. Click on “cancel”.**

Now you are in the **Data Editor**, the window of SPSS in which you can
enter data and work with them. (The results will be presented in a separate
window.) At its top you find the name of the data file you are working
with, but at this moment the file has not been given a name yet.

The Data Editor looks like a spreadsheet you might be familiar with from other applications (such as
Excel). There is, however, a big difference: rows and columns cannot be
swapped, because they have different meanings. In
the Data Editor, each (vertical) column represents a **variable**.
Each variable is given a name, which appears at the top of the column. Use
meaningful names, such as LENGTH, rather than something like X24A06.

Each (horizontal) row represents a **case**.
A case is a series of observations that belong together, such as one
respondent's answers to the questions in a questionnaire, or different values measured on
the same experimental subject. For instance, if you have 32 respondents,
then you need 32 rows for the 32 cases. If the questionnaire contained 40
questions, then you most probably need 40 columns, and so you have 40
variables. Additionally, you can calculate new, derived
variables from existing ones.

The Data Editor is composed of two parts: the **Data View** and
the **Variable View**, between which you can switch using the tabs at the
bottom of the window.

The *Variable View* offers an overview of your
variables, and lets you define some of their features. The most
important features are:

1. **Name**: the name of the variable.

2. **Type**: the type of the variable. Some of the types offered by SPSS:

   a. Numeric: the usual way of rendering numbers (e.g., 12345,67).

   b. Comma: comma before each group of three digits, dot before decimal digits (e.g., 12,345.67).

   c. Dot: dot before each group of three digits, comma before decimal digits (e.g., 12.345,67).

   d. String: any textual information (e.g., answers to an open question).

3. **Width**: the number of positions available in the Data View window.

4. **Decimals**: the number of decimal digits after the comma/dot.

5. **Label**: text providing more information about the variable.

6. **Values**: texts providing information about each value of the variable (value labels).

7. **Missing**: the value(s) used to denote missing data (e.g., “no answer”).

8. **Columns**: the width of the column in the Data View window.

9. **Measure**: the measurement scale of the variable: nominal, ordinal or scale, the last covering interval and ratio variables.
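The Comma and Dot types differ only in how the same number is displayed; the stored value does not change. As an illustration outside SPSS, the Python sketch below renders one value in both styles. The function names `format_comma` and `format_dot` are hypothetical, and the sketch ignores locale settings.

```python
def format_comma(x, decimals=2):
    """'Comma' style: comma groups thousands, dot marks the decimals."""
    return f"{x:,.{decimals}f}"


def format_dot(x, decimals=2):
    """'Dot' style: dot groups thousands, comma marks the decimals."""
    # Swap the two separators, using '_' as a temporary placeholder.
    return format_comma(x, decimals).replace(",", "_").replace(".", ",").replace("_", ".")


print(format_comma(12345.67))  # 12,345.67
print(format_dot(12345.67))    # 12.345,67
```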

At the top of the window you find the SPSS menu bar: *FILE,
EDIT, VIEW, DATA*, etc. All statistical calculations are found under
ANALYZE, and all diagrams and charts under GRAPHS. To calculate new variables
based on existing ones, use the commands under TRANSFORM. The HELP menu
provides further assistance.

**> Have a look at the different menus to get a general overview of
them.**

The **MLU** (*Mean Length of Utterance*) measures the average length
of utterances (well-formed sentences or sentence-like
series of words) by counting the number of words each utterance contains
and averaging these counts.
It is an important measure of the linguistic abilities of children
acquiring a language and of patients with language impairments. It
is also useful in identifying the authors of texts, since
authors tend to have a characteristic MLU.
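As a rough illustration of a word-based MLU, the Python sketch below averages the word counts of a few utterances. The utterances and the function name `mean_length_of_utterance` are hypothetical, and splitting on whitespace is a simplifying assumption: real transcripts need rules for fillers, repetitions and punctuation, and MLU is often counted in morphemes rather than words.

```python
def mean_length_of_utterance(utterances):
    """Average number of whitespace-separated words per utterance."""
    if not utterances:
        raise ValueError("need at least one utterance")
    word_counts = [len(u.split()) for u in utterances]
    return sum(word_counts) / len(word_counts)


# Hypothetical transcript of three utterances (3, 2 and 4 words).
sample = ["the dog runs", "big ball", "I want more juice"]
print(mean_length_of_utterance(sample))  # 3.0
```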

Here is the MLU measured on 20 patients:

3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9.

**> Enter these values by hand (in Data View).**

**> In the Variable View, specify that the variable is named MLU.**

**> Then, set the number of decimals to 0. Utterance length always has
an integer value here, so displaying decimals would be misleading.**

When you work with SPSS (as with any other application), it is good practice to save your data files regularly. Output files are usually easy to re-create, but data files are not. Moreover, SPSS is not always stable and may terminate unexpectedly.

**> Therefore, save your data file to your own network drive in a separate
folder that you create specifically for this lab.**

A frequency table is a table that shows how often each value of a variable appears among your data.

**> Create a frequency table from this variable.**

Hint: 'Analyze', 'Descriptive Statistics', 'Frequencies'.
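To see what such a table contains, here is a Python sketch (not SPSS code) that builds a frequency table for the MLU data with frequency, percent and cumulative percent columns. The function name `frequency_table` is hypothetical. SPSS also shows a "Valid Percent" column, which equals "Percent" when there are no missing values; this sketch does not handle missing values.

```python
from collections import Counter

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * cumulative / n))
    return rows


print(f"{'Value':>5} {'Freq':>5} {'Percent':>8} {'Cum %':>6}")
for value, freq, pct, cum in frequency_table(mlu):
    print(f"{value:>5} {freq:>5} {pct:>8.1f} {cum:>6.1f}")
```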

Errors are quite common during data entry, so always check the data you have just entered. Besides re-reading the numbers in Data View, also look for outliers “created” by erroneous data entry: for instance, typing too many zeros or entering two values in a single cell will create values much greater than the others. In the present case, check whether the frequency table contains only values you remember entering (and that make sense). Also compare your frequency table with your neighbours’ in the lab.
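Comparing two frequency tables amounts to comparing value counts. The Python sketch below (not part of SPSS) lists every value whose count differs between two data entries. The neighbour's list is hypothetical, with one "too many zeros" typo (100 instead of 10), and `compare_frequencies` is a hypothetical helper name.

```python
from collections import Counter

mine = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]
# Hypothetical neighbour's entry with one typo: 100 typed instead of 10.
neighbour = [3, 5, 4, 4, 100, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def compare_frequencies(a, b):
    """Return {value: (count in a, count in b)} for every value whose counts differ."""
    ca, cb = Counter(a), Counter(b)
    return {v: (ca[v], cb[v]) for v in sorted(set(ca) | set(cb)) if ca[v] != cb[v]}


print(compare_frequencies(mine, neighbour))
```

An empty result means both tables agree; it does not prove both entries are correct, since the same mistake could have been made twice.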

**> Check the frequency of each value
in your frequency table together with your neighbour.**

**> Copy-paste the table into a Word file.**

**Q: How many measurements (data) do you have?**

**Q: Which MLU is the second most frequent?**

**Q: How often does the highest value of MLU occur?**

A histogram (or frequency diagram) is a graph displaying how frequently the possible values of a variable occur (or how frequently values falling within a certain range occur) among the data entered.

**> Create a histogram based on the variable MLU.**

Hint: 'Graphs', 'Histogram'.
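The idea of a histogram can be sketched in a few lines of Python (not SPSS code). The sketch below uses one bar per integer value from the minimum to the maximum, so values that never occur show up as empty bars. SPSS chooses its own bin widths, which may differ from this assumption; `histogram_rows` is a hypothetical name.

```python
from collections import Counter

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def histogram_rows(values, symbol="#"):
    """One text row per integer value, with one symbol per observation."""
    counts = Counter(values)
    return [f"{v:>3} | {symbol * counts[v]}" for v in range(min(values), max(values) + 1)]


for row in histogram_rows(mlu):
    print(row)
```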

The Normal curve (the bell-shaped curve of the Gaussian distribution) is very important in statistics. Many statistical procedures require that the data (approximately) follow a Normal distribution. Several methods exist to check whether your data meet this requirement. The simplest one is to have SPSS fit a Normal curve to your data when plotting a histogram.

**> Create the histogram again, but now have SPSS also draw a Normal curve.**

Hint: mark the checkbox ‘display normal curve’.
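To compare the bars with the curve numerically, the Python sketch below computes the height of a Normal curve at each integer value. It assumes (our assumption, not taken from SPSS documentation) that the curve uses the sample mean and SD and is scaled by n × bin width so that it is on the same scale as the bar heights. `normal_expected_counts` is a hypothetical name; `statistics.NormalDist` requires Python 3.8 or later.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def normal_expected_counts(values, bin_width=1.0):
    """Height of a fitted Normal curve at each integer value, on the count scale."""
    curve = NormalDist(mean(values), stdev(values))
    n = len(values)
    return {v: n * bin_width * curve.pdf(v) for v in range(min(values), max(values) + 1)}


observed = Counter(mlu)
for value, expected in normal_expected_counts(mlu).items():
    print(f"{value:>3}: observed {observed[value]}, Normal curve {expected:.2f}")
```

Large, systematic differences between the two columns (for example, one tall peak plus a long tail on one side) suggest that the data are not approximately Normal.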

**> Copy-paste this second graph to a Word file.**

**Q: What does the vertical axis
display: numbers or percentages?**

**Q: What is the highest value and what
is the lowest value of the variable?**

**Q: How many peaks are there?**

**Q: There is a gap on the graph. At
what value? What does this observation mean?
Would you expect to find this gap if you had many more data?**

**Q: Is this distribution approximately Normal?**

A boxplot can be seen as a simplified histogram turned to its side, but it will also prove useful for other purposes later on.

**> Create a boxplot of your variable.**

Hint: 'Graphs', 'Boxplot'. Choose “Simple” and “Summaries of separate variables”.
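The numbers behind a boxplot can be sketched in Python (not SPSS code). We assume here that the box spans Tukey's hinges (the medians of the lower and upper halves of the sorted data) and that points more than 1.5 box lengths beyond the box are drawn separately, with the whiskers ending at the most extreme remaining points. SPSS additionally distinguishes points beyond 3 box lengths, which this sketch does not. `boxplot_summary` is a hypothetical name.

```python
from statistics import median

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def boxplot_summary(values):
    """Box edges (Tukey's hinges), median, whisker ends and points beyond the fences."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # for odd n, the median belongs to both halves
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    box_length = upper_hinge - lower_hinge
    low_fence = lower_hinge - 1.5 * box_length
    high_fence = upper_hinge + 1.5 * box_length
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "lower_hinge": lower_hinge,
        "median": median(data),
        "upper_hinge": upper_hinge,
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(boxplot_summary(mlu))
```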

**> Copy-paste this boxplot to a Word file.**

**Q: What are the lowest and highest values according to the boxplot?**

**Q: What is the approximate median
according to the boxplot?**

**Q: What percentage of the data
lies outside the box?**

**Q: Which data lie outside the “whiskers”
of the boxplot?**

We often want to summarize the distribution of a variable with a
few numbers that tell us roughly where the many observed values of that variable are
located. Generally the **mean** (average) is used for that purpose. Another
option is the **mode**, that is, the value that appears most
frequently.
One can also use the **median**, the middle value when the
observations are sorted from lowest to highest (for an even number of
observations, the average of the two middle values).

When a histogram is created, the mean is calculated automatically. The mode, the median and the mean can also be obtained by choosing “Analyze”, “Descriptive Statistics”, and then “Frequencies” in the menu. If you wish, uncheck ‘Display frequency tables’ and ignore the warning. Then select the mean, the mode and the median via the “Statistics…” button.
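For comparison with the SPSS output, the three measures can be computed with Python's standard `statistics` module, as sketched below (`central_tendency` is a hypothetical helper name). `multimode` (Python 3.8+) returns every most frequent value, because a variable can have several modes; when values tie, SPSS shows only one of them, with a footnote.

```python
from statistics import mean, median, multimode

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]


def central_tendency(values):
    """Mean, median and all modes of a list of numbers."""
    return {"mean": mean(values), "median": median(values), "modes": multimode(values)}


print(central_tendency(mlu))
```

Calling `central_tendency` on a copy of the list with one value altered is a quick way to check your answer to the data-entry question below.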

**> Have SPSS calculate the mean, the mode and the median.**

**Q: Suppose you make an error during
data entry: you type 80 instead of 8. Which of these values will change, and
which will not?**

**Q: The median of MLU is lower
than its mean. This is because the histogram is skewed to the … (left or
right?), and it has a longer tail to the … (left or right?).**

In many cases we are interested not only in roughly where
the values of the variable are located, but also in the “width” of the
frequency distribution. There are different measures for describing the “width”
of the histogram. The best known one is the **standard deviation** (SD), but the **range**
and the **interquartile range** are also used. The drawback of the range (the
difference between the maximum and minimum values) is that it depends only on
the two most extreme observed values.
The other two measures are less influenced by outliers: the SD takes all
observations into account, and the interquartile range depends only on the
middle half of the data.

**> Have SPSS calculate for you the SD, the range and the
quartiles.**

Hint: “Analyze”, “Descriptive Statistics”,
“Frequencies”.
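To check the SPSS output by hand, the Python sketch below computes the SD, the range and the quartiles of the MLU data with the standard `statistics` module. `stdev` is the sample SD (dividing by n − 1). `quantiles` uses its default "exclusive" method, which places quartiles at positions (n + 1)p; we assume this matches the SPSS Frequencies default, but other quartile definitions can give slightly different values.

```python
from statistics import quantiles, stdev

mlu = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

sd = stdev(mlu)                   # sample standard deviation (divides by n - 1)
data_range = max(mlu) - min(mlu)  # maximum minus minimum
q1, q2, q3 = quantiles(mlu, n=4)  # quartiles; q2 is the median

print(f"SD = {sd:.3f}")
print(f"range = {data_range}")
print(f"quartiles = {q1}, {q2}, {q3}; IQR = {q3 - q1}")
print(f"range / SD = {data_range / sd:.2f}")
```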

**Q: If the range is seen as the width
of the histogram, then how many SDs wide is this histogram?**
(How many times larger than the SD is the range?)

This material is an adapted version of the assignments of the statistics courses developed by John Nerbonne at the University of Groningen.