Purpose: Percentage of syllables stuttered (%SS) and severity rating (SR) scales are commonly used to quantify stuttering severity and its changes in basic and clinical research. However, their reliability has not been assessed with indices that measure both relative and absolute reliability. Relative reliability concerns the rank order of participants in a sample, whereas absolute reliability concerns the closeness of scores to one another and to a hypothetical true score. This study was designed to provide such information. Method: Eighty-seven adults who stutter received a 10-min unscheduled telephone call. Three experienced judges measured %SS and also rated stuttering severity on a 9-point SR scale from recordings of the telephone calls. Results: Relative intrajudge and interjudge reliability were satisfactory for both scales. However, absolute intrajudge and interjudge reliability were not satisfactory. Paired-judge SR and %SS procedures improved absolute reliability compared with single-judge measures. Additionally, the paired-judge procedure improved relative reliability from high to very high levels. Conclusion: Group changes in stuttering severity can be measured in research contexts using either %SS or SR. However, for detecting changes within individuals using such measures, a paired-judge procedure is more reliable.
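
The distinction between relative and absolute reliability can be illustrated with a minimal sketch. The abstract does not name the indices used in the study, so the two statistics below (a Pearson correlation for rank-order agreement and a mean absolute difference for closeness of scores) are stand-ins chosen for illustration only, and the SR scores are hypothetical values, not study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: an illustrative index of relative reliability."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_abs_diff(x, y):
    """Mean absolute difference: an illustrative index of absolute reliability."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical 9-point SR scores from two judges for five speakers.
# Judge B rates every speaker exactly 2 points higher than judge A.
judge_a = [1, 2, 3, 4, 5]
judge_b = [3, 4, 5, 6, 7]

relative = pearson_r(judge_a, judge_b)      # rank order fully preserved
absolute = mean_abs_diff(judge_a, judge_b)  # scores still differ by 2 points

print(f"correlation: {relative:.2f}, mean absolute difference: {absolute:.1f}")
```

In this constructed case the two judges rank the speakers identically, so the correlation is at its maximum, yet every pair of scores differs by 2 scale points. This is the pattern that a rank-order index alone would miss.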