Background: Antibiotics are still prescribed to most patients attending primary care with acute sore throat, despite evidence that antibiotics offer only modest benefit overall. Targeting antibiotics using either clinical scoring methods or rapid antigen detection tests (RADTs) could help reduce this prescribing. However, there is debate about which groups of streptococci are important (particularly Lancefield groups C and G), and uncertainty about which variables most clearly predict the presence of streptococci.
Objective: This study aimed to compare targeting antibiotics by clinical score or by RADT with delayed antibiotic prescribing.
Design: The study comprised an in vitro study of RADTs; two diagnostic cohorts to develop streptococcal scores (score 1 and score 2); and, finally, an open pragmatic randomised controlled trial with nested qualitative and cost-effectiveness studies.
Setting: The setting was UK primary care general practices.
Participants: Participants were patients aged ≥ 3 years with acute sore throat.
Interventions: An internet program randomised patients to (1) delayed antibiotic prescribing (control group) or to targeted antibiotic use according to (2) a clinical score or (3) an RADT, used according to the clinical score.
Main outcome measures: The main outcome measures were self-reported antibiotic use and symptom duration and severity on seven-point Likert scales (primary outcome: mean sore throat/difficulty swallowing score in the first 2–4 days).
Results: The IMI TestPack Plus Strep A (Inverness Medical, Bedford, UK) was sensitive, specific and easy to use. Lancefield group A/C/G streptococci were found in 34% of cohort 1 and 40% of cohort 2.
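The abstract reports that the IMI TestPack Plus Strep A was sensitive and specific but gives no figures. For readers less familiar with the two measures, the sketch below shows how they are computed from a 2×2 table of test results against a reference standard (for example, throat culture). It is a minimal Python illustration; the `TwoByTwo` class and the counts are hypothetical and are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a rapid test with a reference standard (hypothetical)."""
    true_pos: int   # test positive, reference positive
    false_pos: int  # test positive, reference negative
    false_neg: int  # test negative, reference positive
    true_neg: int   # test negative, reference negative

    @property
    def sensitivity(self) -> float:
        # Proportion of reference-positive samples that the test detects.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Proportion of reference-negative samples that the test correctly calls negative.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts for illustration only; not results from this study.
table = TwoByTwo(true_pos=45, false_pos=5, false_neg=5, true_neg=95)
print(f"sensitivity={table.sensitivity:.2f}, specificity={table.specificity:.2f}")
```

Sensitivity answers "how many true infections does the test catch?", while specificity answers "how many uninfected patients does it correctly rule out?"; a test used to withhold antibiotics needs both to be high.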
A five-point score predicting the presence of A/C/G streptococci [FeverPAIN: Fever; Purulence; Attend rapidly (≤ 3 days); severe Inflammation; and No cough or coryza] had moderate predictive value (bootstrapped estimates of the area under the receiver operating characteristic curve: 0.73 in cohort 1, 0.71 in cohort 2) and identified a substantial number of participants at low risk of streptococcal infection. In total, 38% of cohort 1 and 36% of cohort 2 scored ≤ 1 on FeverPAIN; among these low scorers, streptococci were found in 13% and 18%, respectively. In an adaptive trial design, the preliminary score (score 1; n = 1129) was replaced by FeverPAIN (n = 631). For score 1, there were no significant differences between groups. In the FeverPAIN phase, symptom severity was documented for 80% of patients and was lower in the clinical score group than in the delayed prescribing group (-0.33; 95% confidence interval -0.64 to -0.02; p = 0.039; equivalent to one in three patients rating sore throat a slight rather than a moderately bad problem); a similar reduction was observed in the RADT group (-0.30; -0.61 to 0.00; p = 0.053). Moderately bad or worse symptoms resolved 30% faster in the clinical score group (hazard ratio 1.30; 1.03 to 1.63), a significant difference, but not in the RADT group (1.11; 0.88 to 1.40). In the delayed group, 75/164 (46%) patients used antibiotics; relative to this, 29% fewer used antibiotics in the clinical score group (risk ratio 0.71; 0.50 to 0.95; p = 0.018) and 27% fewer in the RADT group (0.73; 0.52 to 0.98; p = 0.033). No significant differences in complications or reconsultations were found. In both the cost per quality-adjusted life-year and the cost per change in symptom severity analyses, the clinical score group dominated both other groups, being both less costly and more effective, and cost-effectiveness acceptability curves indicated that the clinical score was the most likely to be cost-effective from an NHS perspective. Patients were positive about RADTs.
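The FeverPAIN score and its reported discrimination can be illustrated with a short Python sketch. It assumes each of the five criteria contributes one point (giving totals from 0 to 5), computes the area under the ROC curve with the Mann-Whitney formulation, and averages it over simple bootstrap resamples of patients. The abstract does not describe the study's own bootstrap procedure (for example, whether it corrected for optimism), so this is not the authors' method. `Presentation`, `feverpain`, `auc` and `bootstrap_auc` are hypothetical names, and the data are invented.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """The five FeverPAIN criteria for one patient (field names are illustrative)."""
    fever: bool                # Fever
    purulence: bool            # Purulence
    attend_rapidly: bool       # Attended rapidly (within 3 days of onset)
    severe_inflammation: bool  # Severely inflamed tonsils
    no_cough_or_coryza: bool   # No cough or coryza


def feverpain(p: Presentation) -> int:
    """One point per criterion present (assumed equal weights), giving 0 to 5."""
    return sum([p.fever, p.purulence, p.attend_rapidly,
                p.severe_inflammation, p.no_cough_or_coryza])


def auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive case
    outscores a randomly chosen negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Mean AUC over bootstrap resamples of patients (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined when a resample contains only one class
        estimates.append(auc([scores[i] for i in idx], ys))
    return sum(estimates) / len(estimates)


# Toy data (hypothetical, not study data): FeverPAIN totals and whether
# Lancefield A/C/G streptococci were isolated.
scores = [0, 1, 1, 2, 3, 3, 4, 5]
labels = [False, False, True, False, True, False, True, True]
low_risk = [s <= 1 for s in scores]  # the "scored <= 1" low-risk group
print(auc(scores, labels), bootstrap_auc(scores, labels))
```

An AUC of 0.5 means the score ranks infected and uninfected patients no better than chance, and 1.0 means perfect separation; the study's values of about 0.71 to 0.73 sit in the moderate range, as the abstract states.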
Health professionals' concerns about test validity, the time the test took and the medicalisation of self-limiting illness lessened after they had used the tests. For both RADTs and clinical scores, there were tensions with established clinical experience.
Conclusions: Targeting antibiotics using a clinical score (FeverPAIN) efficiently improves symptoms and reduces antibiotic use. RADTs used in combination with FeverPAIN provide no clear advantage over FeverPAIN alone, and RADTs are unlikely to be incorporated into practice until health professionals' concerns are addressed and they have gained experience in using them. Clinical scores also face barriers related to clinicians' perceptions of their utility relative to their own clinical experience. This study has demonstrated the limitations of developing a clinical score from a single data set. FeverPAIN, derived from two data sets, appears to be valid and its use improves outcomes, but diagnostic studies are needed to confirm its validity in other data sets and settings. Further work with experienced clinicians is needed to identify barriers to the use of clinical scoring methods, and implementation studies that address perceived barriers to the use of FeverPAIN are needed.