Reliability Analysis

It is very common in psychological research to collect multiple measures of the same construct. For example, a questionnaire designed to measure optimism typically contains many items that collectively measure that construct. To have confidence in a measure such as this, we need to test its reliability: the degree to which it is free of random measurement error. The type of reliability we'll be examining here is called internal consistency reliability: the degree to which multiple measures of the same thing agree with one another.

Benevolent Sexism Scale

Peter Glick and Susan Fiske (1996) developed an interesting measure called the Benevolent Sexism Scale (BSS). Its 11 items are given below:

1. No matter how accomplished he is, a man is not truly complete as a person unless he has the love of a woman.
2. In a disaster, women ought not necessarily to be rescued before men.
3. People are often truly happy in life without being romantically involved with a member of the other sex.
4. Many women have a quality of purity that few men possess.
5. Women should be cherished and protected by men.
6. Every man ought to have a woman whom he adores.
7. Men are complete without women.
8. A good woman should be set on a pedestal by her man.
9. Women, compared to men, tend to have a superior moral sensibility.
10. Men should be willing to sacrifice their own well being in order to provide financially for the women in their lives.
11. Women, as compared to men, tend to have a more refined sense of culture and good taste.

Responses to these items from 74 male college students are in this SPSS data file, which you should download and open.

Reverse Scoring

Most of the items are phrased so that strong agreement indicates a belief that men should protect women, that men need women, and that women have positive qualities that men lack. However, three of the items are phrased in the reverse: #2, #3, and #7. In order to make those items comparable to the other items, we will need to reverse score them.

In this questionnaire, participants responded to the items using a 7-point Likert scale ranging from 1 ("Strongly Disagree") to 7 ("Strongly Agree"). When we reverse-score an item, we want 1's to turn into 7's, 7's to turn into 1's, and all the scores in between to become their appropriate opposite (6's into 2's, 5's into 3's, etc.). Fortunately, there is a simple mathematical rule for reverse-scoring:

reverse score(x) = max(x) + 1 - x

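Here max(x) is the maximum possible value for x. In our case, max(x) is 7 because the Likert scale only went up to 7. To reverse score, we take 7 + 1 = 8, and subtract our scores from that: 8 - 7 = 1, 8 - 1 = 7. Voila. Note that this rule assumes the scale starts at 1; for a scale with a different minimum, subtract each score from max(x) + min(x) instead.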
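The rule can be checked by hand with a minimal Python sketch (run outside SPSS). The function name reverse_score and its parameters are made up for this illustration; with the default minimum of 1 it reduces to max(x) + 1 - x.

```python
def reverse_score(x, scale_max=7, scale_min=1):
    """Reverse-score one Likert response (hypothetical helper, not an SPSS function)."""
    if not scale_min <= x <= scale_max:
        raise ValueError(f"response {x} is outside {scale_min}..{scale_max}")
    # With scale_min = 1 this is the rule from the text: max(x) + 1 - x.
    return scale_max + scale_min - x


responses = [1, 2, 3, 4, 5, 6, 7]
reversed_responses = [reverse_score(x) for x in responses]
print(reversed_responses)  # [7, 6, 5, 4, 3, 2, 1]
```

The midpoint (4) maps to itself, which is a quick way to confirm the arithmetic.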

To get SPSS to reverse-score:

Select Transform -> Compute:

You will be creating a new variable for each of the variables you need to reverse-score: #2, #3, and #7. The original variables are called bss02, bss03, and bss07. Let's call the new reverse-scored variables bss02r, bss03r, and bss07r. Name the first variable (the "Target Variable") bss02r, and set it equal to 8-bss02:


Instead of pressing OK, press PASTE. The following syntax appears:

It's easy to modify this syntax to compute all your reverse-scored items at once. Highlight the second line: COMPUTE bss02r = 8-bss02. (don't forget to include the period at the end) and press CTRL+C (or select Edit -> Copy). Move the cursor down one line and press CTRL+V (or select Edit -> Paste). Press RETURN to move the 'EXECUTE .' line down. Your new syntax should look like this:

Repeat that one more time so that you've got three identical lines in a row:

Now, modify the second and third row so that they are appropriate to bss03 and bss07:

Be sure to change both the left and the right side of each COMPUTE statement, so that the second statement sets bss03r equal to 8-bss03 and the third sets bss07r equal to 8-bss07 (changing only one side is the most common mistake people make on this assignment). You can now run this syntax (Run -> All) and it will create three new variables that are reverse-scored versions of bss02, bss03, and bss07.

Note: SPSS syntax is very particular about spelling and punctuation. Make sure you spell all the variables correctly.


Now you're ready to compute the reliability of this scale. Select Analyze -> Scale -> Reliability Analysis. Move the new reverse-scored items (bss02r, bss03r, bss07r) into the 'Items' box, as well as all the other items that didn't need to be reverse-scored (1, 4, 5, 6, 8, 9, 10, and 11). Leave the original bss02, bss03, and bss07 out; each item should appear only once, in its correctly scored form.

Then click on the box labeled Statistics and select Scale if item deleted (you'll see why later):

Press 'Continue' and then 'OK.' You should get the following output:

Look at the top of the output and you will see ".741" under "Cronbach's Alpha." This is the most common statistic used to describe the internal consistency reliability of a set of items. If you are using a questionnaire in your research, your results should include a report of the Cronbach's alpha for your questionnaire.
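To see what goes into this number, here is a minimal Python sketch of the usual Cronbach's alpha formula: k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total score). It is an illustration only, not SPSS's own code, and the three short item columns are made up; they are not the BSS responses.

```python
from statistics import variance


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(columns)
    # Total score for each respondent: add across the items.
    totals = [sum(scores) for scores in zip(*columns)]
    item_variance_sum = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up responses from 5 people to 3 items (illustration only).
items = {
    "item1": [2, 4, 5, 6, 3],
    "item2": [3, 4, 5, 7, 2],
    "item3": [3, 5, 4, 6, 3],
}
alpha = cronbach_alpha(list(items.values()))
print(round(alpha, 3))  # 0.944
```

When items rise and fall together, the variance of the total score is much larger than the sum of the item variances, and alpha approaches 1.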

The first two columns (Scale Mean if Item Deleted and Scale Variance if Item Deleted) of the next table generally aren't all that useful. The third column is the correlation between a particular item and the sum of the rest of the items. This tells you how well a particular item "goes with" the rest of the items. In the output above, the best item appears to be BSS01, with an item-total correlation of r = .598. The item with the lowest item-total correlation is BSS05 (r = .255). If an item's correlation is close to zero, you should consider removing it from your scale because it is not measuring the same thing as the rest of the items. A clearly negative correlation often means the item still needs to be reverse-scored.
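The third column can be reproduced by hand. The following Python sketch correlates each item with the sum of the remaining items; the helper names and the data are made up for illustration, and item "c" was deliberately constructed not to agree with the other two.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def item_rest_correlations(items):
    """Correlate each item with the sum of all the *other* items."""
    result = {}
    for name, scores in items.items():
        rest = [
            sum(other[i] for other_name, other in items.items() if other_name != name)
            for i in range(len(scores))
        ]
        result[name] = pearson_r(scores, rest)
    return result


# Made-up data: "a" and "b" agree with each other, "c" does not.
items = {
    "a": [2, 4, 5, 6, 3],
    "b": [3, 4, 5, 7, 2],
    "c": [5, 2, 6, 3, 4],
}
correlations = item_rest_correlations(items)
for name, r in correlations.items():
    print(name, round(r, 3))
```

In this toy data the odd item out ("c") gets the lowest correlation, which is the pattern to look for in the SPSS table.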

Alpha if Item Deleted

Now look in the last column: "Alpha if item deleted." This is a very important column. It estimates what the Cronbach's alpha would be if you got rid of a particular item. For example, at the very top of this column, the number is .690. That means that the Cronbach's alpha of this scale would drop from .741 to .690 if you got rid of that item. Because a higher alpha indicates more reliability, it would be a bad idea to get rid of the first item. In fact, if you look down the "Alpha if item deleted" column, you will see that none of the values is greater than the current alpha of the whole scale: .741. This means that you don't need to drop any items.
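The logic of this column is simply "recompute alpha with one item left out, once per item." A minimal Python sketch of that idea, using made-up data (not the BSS responses) and hypothetical helper names:

```python
from statistics import variance


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of item columns."""
    k = len(columns)
    totals = [sum(scores) for scores in zip(*columns)]
    return k / (k - 1) * (1 - sum(variance(c) for c in columns) / variance(totals))


def alpha_if_item_deleted(items):
    """For each item, the alpha of the scale made from all the other items."""
    return {
        name: cronbach_alpha([col for other, col in items.items() if other != name])
        for name in items
    }


# Made-up data: item "c" does not agree with "a" and "b".
items = {
    "a": [2, 4, 5, 6, 3],
    "b": [3, 4, 5, 7, 2],
    "c": [5, 2, 6, 3, 4],
}
overall = cronbach_alpha(list(items.values()))
deleted = alpha_if_item_deleted(items)
print("overall:", round(overall, 3))  # overall: 0.43
for name, value in deleted.items():
    print(name, round(value, 3))
```

Here, deleting "c" raises alpha well above the overall value, while deleting "a" or "b" lowers it. In the BSS output the situation is the opposite: no deletion improves on .741.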

Improving Reliability

If you are using an accepted scale obtained from a published source, you do not need to worry about improving reliability. You should use the whole scale, even if it has problems, because if you start changing the scale you will be unable to compare your results to the results of others who have used the scale. You only want to improve the reliability of a scale if it is a scale you are developing.

If one of the "Alpha if item deleted" values is greater than the overall alpha, you should re-run Analyze -> Scale -> Reliability Analysis after moving the offending item from the "Items" box back over to the unused items box. Repeat this process until there are no values in the "Alpha if item deleted" column that are greater than the alpha for the overall scale.
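The repeat-until-done procedure above can be sketched as a loop. This Python version is only an illustration of the rule as stated (drop the item whose removal raises alpha the most, then re-check); the function names, the two-item floor, and the data are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of item columns."""
    k = len(columns)
    totals = [sum(scores) for scores in zip(*columns)]
    return k / (k - 1) * (1 - sum(variance(c) for c in columns) / variance(totals))


def prune_items(items, min_items=2):
    """Drop items one at a time while doing so raises alpha (assumed stopping rule)."""
    kept = list(items)
    while len(kept) > min_items:
        overall = cronbach_alpha([items[n] for n in kept])
        candidates = {
            name: cronbach_alpha([items[n] for n in kept if n != name])
            for name in kept
        }
        best_to_drop = max(candidates, key=candidates.get)
        if candidates[best_to_drop] <= overall:
            break  # no "alpha if item deleted" value beats the overall alpha
        kept.remove(best_to_drop)
    return kept


# Made-up data: item "c" does not agree with "a" and "b".
items = {
    "a": [2, 4, 5, 6, 3],
    "b": [3, 4, 5, 7, 2],
    "c": [5, 2, 6, 3, 4],
}
kept = prune_items(items)
print(kept)  # ['a', 'b']
```

The loop stops at two items because alpha cannot be computed for a single item.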

Computing a Mean Score for a Questionnaire

The goal of this whole procedure is to produce a single score for your questionnaire. Once you've used reliability analysis to identify the items that will produce the most reliable measure, you can use those items to create an average score for your questionnaire, as described below.

Note: Combining items on different scales. If you will be combining items that are on different scales (e.g., one is weight and goes from 100 to 250 [pounds] and another is height and goes from 60 to 81 [inches]) you cannot simply average them together because weight, with its larger numbers and wider spread, will have a much bigger impact on the final average. Instead, you must first standardize them (convert each to z-scores, so that every variable has a mean of 0 and a standard deviation of 1) and then you can average them together.
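As a small illustration of standardizing before averaging, here is a Python sketch with made-up weights and heights for five people. The helper name z_scores is an assumption for this example.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to z-scores: subtract the mean, divide by the SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up measurements for 5 people (illustration only).
weight_lb = [120, 150, 180, 210, 240]
height_in = [64, 62, 70, 78, 76]

z_weight = z_scores(weight_lb)
z_height = z_scores(height_in)

# Once both are on the same scale, a simple average weights them equally.
composite = [(w + h) / 2 for w, h in zip(z_weight, z_height)]
print([round(c, 2) for c in composite])
```

Averaging the raw numbers instead would let weight dominate, because its values vary over a much wider range than height does.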

To compute a mean score, select Transform -> Compute. In the Target Variable box, type in the name of your scale, BSS:

In the Numeric Expression box, type the word MEAN, followed by ( and then a list of the variables you want to average together, separated by commas. Make sure you only put in the variables that you decided were the best for the scale. Note that I used bss02r, bss03r, and bss07r and not their original variables bss02, bss03, and bss07. At the end, close the expression with a ). Press OK to compute the new variable.
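For one respondent, the computation amounts to averaging that person's item scores. The Python sketch below mimics this with a shortened, made-up set of four items; it also skips missing answers (None), on the understanding that SPSS's MEAN function averages over the non-missing values it is given.

```python
def scale_mean(values):
    """Average the non-missing item scores; return None if all are missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# One made-up respondent, using the reverse-scored versions of items 2 and 3.
person = {"bss01": 5, "bss02r": 6, "bss03r": 5, "bss04": 4}
score = scale_mean(person.values())
print(score)  # 5.0

# A respondent who skipped one item still gets a score from the rest.
print(scale_mean([5, None, 3]))  # 4.0
```

Because the result is a mean rather than a sum, it stays on the original 1-to-7 response scale, which makes it easier to interpret.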

Select Graphs -> Legacy Dialogs -> Histogram and put your new BSS variable into the variable box. Press OK. You should get output like this:

A histogram is a plot of how often possible values occurred. It's one way to see if there is anything really strange in your data - any extreme values, or all the scores piled up on one side. If you've done everything correctly, you should find that the values on the right side of the image above correspond to the values in your output: standard deviation of .851, mean of 4.30, and N of 74.
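What the histogram and its summary box report can be sketched in a few lines of Python: count how many scores fall in each bin, then compute the mean, standard deviation, and N. The scores and the 0.5-wide bins below are made up for illustration and will not match the BSS output.

```python
from collections import Counter
from math import floor
from statistics import mean, stdev


def histogram_counts(scores, bin_width=0.5):
    """Count scores per bin; each key is the lower edge of a bin."""
    counts = Counter(floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


# Made-up scale scores (illustration only).
scores = [3.2, 4.1, 4.4, 4.6, 5.0, 5.3, 3.9, 4.2]

counts = histogram_counts(scores)
for lower_edge, n in counts.items():
    print(f"{lower_edge:.1f}  {'#' * n}")

print("N =", len(scores))
print("mean =", round(mean(scores), 2))
print("SD =", round(stdev(scores), 2))
```

A bin far away from the others, or nearly all the counts in the lowest or highest bin, is the kind of oddity the histogram is meant to reveal.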