
SYMPOSIUM
Year : 2007  |  Volume : 41  |  Issue : 1  |  Page : 11-15
The hierarchy of evidence: Levels and grades of recommendation

Division of Orthopedic Surgery, McMaster University, Hamilton, Ontario, Canada


How to cite this article:
Petrisor B A, Bhandari M. The hierarchy of evidence: Levels and grades of recommendation. Indian J Orthop 2007;41:11-5

How to cite this URL:
Petrisor B A, Bhandari M. The hierarchy of evidence: Levels and grades of recommendation. Indian J Orthop [serial online] 2007 [cited 2020 Jan 29];41:11-5. Available from:
Evidence-based medicine requires the integration of clinical judgment, recommendations from the best available evidence and the patient's values.[1] The term "best available evidence" is used frequently, and to understand it fully one needs a clear knowledge of the hierarchy of evidence and of how this evidence can be integrated to formulate a grade of recommendation.[2] Placing the available literature into a hierarchy allows clearer communication when discussing studies, both in day-to-day activities such as teaching rounds or discussions with colleagues and, especially, when conducting a systematic review to establish a recommendation for practice.[2] This requires an understanding of both study design and study quality, as well as of other aspects that can make placing a study within the hierarchy difficult.[3] A further complication is that a number of systems can be used to place a study into a hierarchy, and depending on the system the same study can be placed at a different "level".[4] In general, however, the different systems rate high-quality evidence as "1" or "high" and low-quality evidence as "4 or 5" or "low". Recently, some orthopedic journals have adopted the reporting of levels of evidence with individual studies, and in many cases the grading system has been adopted from the Oxford Centre for Evidence-Based Medicine system.[3] Rather than refer to any particular system, we will speak in general terms of those studies deemed to be high-level evidence and relate these to studies of lesser quality.

Levels of Evidence

Study design

Surgical literature can be broadly classified as those articles with a primary interest in therapy, prognosis, harm, economic analysis or those focusing on overviews to name a few.[5] Within each classification there is a hierarchy of evidence, that is, some studies are better suited than others, to answer a question of therapy, for example, and may more accurately represent the "truth". The ability of a study to do this rests on two main contributing factors, the study design and the study quality.[3] In this context we will focus for the most part on those studies addressing therapy as this is generally the most common study in the orthopedic surgical literature.

Available therapeutic literature can be broadly categorized as those studies of an observational nature and those studies that have a randomized experimental design.[2] The reason that studies are placed into a hierarchy is that those at the top are considered the "best evidence".[5] In the case of therapeutic trials this is the randomized controlled trial (RCT) and meta-analyses of RCTs. RCTs have within them, by the nature of randomization, an ability to help control bias.[6],[7] Bias (of which there are many types) can confound the outcome of a study such that the study may over or underestimate what the true treatment effect is.[8] Randomization is able to achieve this by not only controlling for known prognostic variables but also and more importantly controlling for the unknown prognostic variables within a sample population.[7] That is to say, the act of randomization should be able to create an equal distribution of prognostic variables (both known and unknown) within both the control and treatment groups within a study. This bias-controlling measure helps attain a more accurate estimation of the truth.[6] Those studies of a more observational nature have within their designs areas of bias not present in the randomized trial.
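
The balancing effect of randomization described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the cited literature: the function name `simulate_allocation`, the sample size and the "hidden" prognostic score are all hypothetical, and allocation is by a simple coin flip per patient.

```python
import random
from statistics import mean


def simulate_allocation(n_patients=2000, seed=0):
    """Randomly allocate patients and compare an unmeasured prognostic factor.

    Returns the mean of the hidden factor in the treatment and control groups.
    """
    rng = random.Random(seed)
    # Hypothetical unmeasured prognostic score for each patient.
    hidden_factor = [rng.gauss(50, 10) for _ in range(n_patients)]
    treatment, control = [], []
    for value in hidden_factor:
        # Simple (unrestricted) randomization: a fair coin flip per patient.
        if rng.random() < 0.5:
            treatment.append(value)
        else:
            control.append(value)
    return mean(treatment), mean(control)


if __name__ == "__main__":
    t_mean, c_mean = simulate_allocation()
    print(f"hidden factor, treatment group mean: {t_mean:.2f}")
    print(f"hidden factor, control group mean:   {c_mean:.2f}")
```

With a reasonably large sample the two group means should be close even though the factor was never measured or used in the allocation, which is the sense in which randomization controls for unknown prognostic variables. In small trials chance imbalances remain possible.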

Meta-analyses of randomized controlled trials in effect take the data from individual RCTs and statistically pool them.[5],[9] This increases the number of patients from whom the data were obtained, thereby increasing the effective sample size. The major drawback of pooling is that it depends on the quality of the RCTs that were used.[9] For example, if three RCTs favor a treatment and two do not, or if the estimates of treatment effect vary widely between RCTs and have wide confidence intervals (i.e. the precision of the point estimate of the treatment effect is poor), then some variable (or variables) is causing inconsistent results between studies. One such variable may be differences in study quality. In this situation the quality of usable results from statistical pooling will be poor. However, if five methodologically well-done RCTs are used, all of which favor a treatment and have precise measures of treatment effect (i.e. narrow confidence intervals), then the data obtained from statistical pooling are much more believable. The former studies can be termed heterogeneous and the latter homogeneous.[9]
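
To make the contrast between homogeneous and heterogeneous trials concrete, the sketch below pools five hypothetical effect estimates. It assumes a fixed-effect, inverse-variance model and uses Cochran's Q and the I² statistic to describe heterogeneity; these are common choices but are our assumption here, not something the text above prescribes. The function name `pool_fixed_effect` and all numbers are illustrative only.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-trial effect estimates.

    Returns (pooled_estimate, pooled_se, cochran_q, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each trial from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to heterogeneity rather than chance.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical log relative risks (negative values favor the treatment).
homogeneous = [-0.30, -0.25, -0.35, -0.28, -0.32]
heterogeneous = [-0.60, -0.50, -0.40, 0.30, 0.40]
standard_errors = [0.10] * 5

for label, effects in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    pooled, se, q, i2 = pool_fixed_effect(effects, standard_errors)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"{label}: pooled={pooled:.2f} (95% CI {low:.2f} to {high:.2f}), "
          f"Q={q:.1f}, I2={i2:.0f}%")
```

In the heterogeneous set, three trials favor the treatment and two do not, so Q and I² are large and the pooled value says little about any single underlying effect. In the homogeneous set all five trials agree and the pooled estimate is easier to believe.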

In contrast to this, the lowest level on the hierarchy (aside from expert opinion) is the case report and case series.[3] These are usually retrospective in nature and have no comparison group. They are able to provide outcomes for only one subgroup of the population (those with the intervention). There is the potential for the introduction of bias, especially if data collection or follow-up is incomplete, as may happen with retrospective study designs. Also, these studies are usually based on a single surgeon's or center's experience, which may raise doubts as to the generalizability of the results. Even with these drawbacks, this study design may be useful in many ways. It can be used effectively for hypothesis generation as well as for providing information on rare disease entities or on complications that may be associated with certain procedures or implants. Examples include reporting infection rates following a large series of tibial fractures treated with a reamed intramedullary nail[10] or the rate of hardware failure of a particular implant.

The next level of study is the case-control study. The case-control study starts with a group that has had an outcome of interest and looks back, comparing it with otherwise similar individuals who have not had the outcome, to see which factors were present in the study group and may be associated with the outcome. Let us take a hypothetical example: patients who have a nonunion following a tibial shaft fracture treated with an intramedullary nail. If one wanted to see what prognostic factors may have contributed to this, a comparison group matched for the known prognostic variables such as age, treatment type, fracture pattern etc. could be assembled, and an analysis of other prognostic variables such as smoking, nonsteroidal anti-inflammatory use or fracture pattern could be done to see if there was any association between these and the development of nonunion. The drawback of this design is that there may be unknown or as yet unidentified risk factors that cannot be analyzed. However, for those that are known, the strength of association may be determined and given in the form of odds ratios or sometimes relative risks. Other strengths of this study design are that it is usually less expensive to implement and can allow for a quicker "answer" to a specific question. It also allows for analysis of multiple prognostic factors and of the relationships between these factors to help determine potential associations with the outcome of interest (in this case nonunion).
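
The odds ratio mentioned above can be computed from a 2x2 table of exposure by outcome. The following is a minimal sketch using invented counts for the nonunion example; the function name `odds_ratio` and the use of the log (Woolf) method for the 95% confidence interval are our assumptions, not taken from the cited sources.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio with an approximate 95% confidence interval (log method)."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this simple method")
    ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


# Hypothetical data: 50 patients with nonunion (cases), 30 of them smokers;
# 100 matched patients who united (controls), 20 of them smokers.
ratio, lower, upper = odds_ratio(30, 20, 20, 80)
print(f"odds ratio = {ratio:.1f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented counts the odds of smoking are six times higher among the nonunion cases than among the controls, and the interval excludes 1. An association of this kind does not by itself establish causation, since unknown prognostic factors are not accounted for.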

In contrast to the case-control and slightly higher on the levels of evidence hierarchy,[3] the cohort study is usually done in a prospective fashion (although it can be done retrospectively) and usually follows two groups of patients. One of these groups has a risk factor or prognostic factor of interest and the other does not. The groups are followed to see what the rate of development of a disease or specific outcome is in those with the risk factor as compared to those without. Given that this is usually done prospectively it falls higher within the hierarchy as data collection and follow-up can be more closely monitored and attempts can be made to make them as complete and accurate as possible. This type of study design can be very powerful in some instances. For example, if one wanted to see what the effect of smoking was on nonunion rates, it wouldn't be ethical or generally possible to randomize patients with fractures into those who are going to smoke and those who are not. However, by following two groups of patients, smokers and non-smokers with tibial fractures for instance, one can then document nonunion rates between the two groups. In this case, because of its prospective design, groups can at least be matched to try and limit the bias of at least those prognostic variables that are known, such as age, fracture pattern or treatment type to name a few.
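
Because a cohort study follows groups forward, the rate of the outcome in each group can be measured directly and compared. The sketch below does this for the smoking example with invented counts; the function name `relative_risk` and the numbers are hypothetical and serve only to show the arithmetic.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return (risk_exposed, risk_unexposed, relative_risk, risk_difference)."""
    if n_exposed <= 0 or n_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return (risk_exposed, risk_unexposed,
            risk_exposed / risk_unexposed,
            risk_exposed - risk_unexposed)


# Hypothetical cohort: 120 smokers with 24 nonunions, 120 non-smokers with 12.
risk_s, risk_ns, rr, rd = relative_risk(24, 120, 12, 120)
print(f"nonunion risk: smokers {risk_s:.0%}, non-smokers {risk_ns:.0%}")
print(f"relative risk = {rr:.1f}, risk difference = {rd:.0%}")
```

In this invented example smokers have twice the risk of nonunion. As the text notes, only the known prognostic variables can be matched between the groups, so such an estimate may still be confounded.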

It is important to understand distinctions between study designs. Some investigators argue that well-constructed observational studies lead to similar conclusions as RCTs.[11] However, others suggest that observational studies have a more significant potential to over- or underestimate treatment effects. Indeed, examples are present in both medical and orthopedic surgical specialties showing that discrepant results can be found between randomized and nonrandomized trials.[6],[8],[12] One recent nonsurgical example of this is hormone replacement therapy in postmenopausal women.[13],[14] Previous observational studies suggested that there was a significant effect of hormone replacement therapy on bone density with a favorable risk profile. However, a recent large RCT found an increased incidence of detrimental cardiac and other adverse events in those undergoing hormone replacement therapy, risks which had heretofore been underestimated by observational studies.[13],[14] As a result, the management of postmenopausal osteoporosis has undergone a shift in first-line therapy.[13] In the orthopedic literature, a comparison of randomized and nonrandomized studies of arthroplasty vs. internal fixation suggested that nonrandomized studies overestimated the risk of mortality following arthroplasty and underestimated the risk of revision surgery with arthroplasty.[8] Interestingly, the same study also found that in those nonrandomized studies that had similar results to randomized studies, patient age, gender and fracture displacement were controlled for between groups.[8] This illustrates the importance both of controlling for known variables and of randomization, which will control for potentially important but as yet unknown variables.

Thus the type of study design used places the study broadly into a hierarchy of evidence from the case series up to the randomized controlled trial. There is also, however, an internal hierarchy within the overall levels of evidence and that is usually based on the study methodology and overall quality.

Study Quality

Concepts of study methodology are important to consider when placing a study into the levels of evidence. There are some that advocate dividing the hierarchy levels into sub-levels based in part on study methodology, while others suggest that poor methodology will take a study down a level.[2],[3] For instance, one RCT could be considered a very high-level study while another RCT because of methodological limitations may be considered lower. Do these then fall into separate categories or into sub-categories of the same level? It depends on the level of evidence system being used. The real point is that these systems acknowledge a difference in the quality and thus the "level" of these studies. In many instances however, the methodological limitations that will take a study down a level are not clearly defined and it is left to the individual to attempt to correctly categorize the study based on them.

The rigor with which a study is conducted plays a role in how believable the results may be.[15],[16] Not all case-control, cohort or randomized studies are done to the same standards and thus, if repeated multiple times, they may give different results, whether due to chance or due to confounding variables and biases. Briefly, if we take the example of an RCT, one needs to look at all aspects of the methodology to see how rigorously the study was conducted. We present three examples of how different aspects of methodology may affect the results of a trial. While it is important to look closely at the methods section of a paper to see how the study was conducted, it must be remembered that if something has not been reported as being done (such as the method of randomization) it does not necessarily mean it was not done.[17] This illustrates the importance of tools such as the "Consolidated Standards of Reporting Trials" (CONSORT) statement, which attempts to improve the quality of trial reporting.[18],[19]


Randomization

As randomization is the key to balancing prognostic variables, it is first necessary to determine how it was done. The most important concepts of randomization are that the allocation is truly random and that it is concealed. If it is known in advance to which group a patient will be allocated, it may be possible to influence that allocation. Examples of this include allocating by chart number, birthdate or odd and even days. Such predictability introduces a selection bias that negates the effect of randomization, which makes concealment of allocation a vital component of successful randomization. Allocation can be concealed by using offsite randomization centers or web-based or phone-based randomization.
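
As an illustration of what generating a random allocation sequence can look like, the sketch below produces a permuted-block sequence. Permuted blocks are one common approach and are our choice for this example; the text above does not name a specific method. The function name `permuted_block_sequence`, the arm labels and the block size are hypothetical, and generating the sequence does nothing to conceal it: concealment depends on who holds the list and how it is released.

```python
import random


def permuted_block_sequence(n_blocks, block_size=4, seed=None):
    """Return a 1:1 allocation list built from randomly shuffled blocks.

    Each block contains equal numbers of "treatment" and "control", so group
    sizes stay balanced after every completed block.
    """
    if block_size <= 0 or block_size % 2 != 0:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        half = block_size // 2
        block = ["treatment"] * half + ["control"] * half
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence


if __name__ == "__main__":
    # The seed is fixed only so that the example is reproducible.
    for number, arm in enumerate(permuted_block_sequence(3, seed=2007), start=1):
        print(number, arm)
```

Unlike allocation by chart number or by odd and even days, the next assignment in such a list cannot be worked out from information available to the person enrolling the patient, provided the list itself is held offsite or behind a web-based or phone-based service.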


Blinding

In surgical trials blinding is obviously not possible for some aspects of the trial. It is not possible (or ethical) to blind a surgeon, nor is it usually possible to blind a patient to a particular treatment. However, there are other aspects of a trial where blinding can play a role. For instance, it is possible to blind outcome assessors, data analysts and potentially other outcome adjudicators. Thus it is important to understand who is collecting the data and to ask whether they were independent and whether they were blinded to the treatment received. If not, they may influence (consciously or subconsciously) the patient and the subsequent results.


Loss to follow-up

The number of patients lost to follow-up is very important to know, as this can clearly affect the estimate of treatment effect. While some argue that only a 0% loss to follow-up fully ensures the benefits of randomization,[20] in general the validity of a study may be threatened if more than 20% of patients are lost to follow-up.[5] Calculations of results should include a worst-case scenario, that is, patients lost to follow-up in the treatment group are assumed to have had the worst outcome and those lost to follow-up in the control group are assumed to have had the best outcome. If a treatment effect is still seen between the groups, this makes a more compelling argument that the observed treatment effect is a valid estimate of the truth.[21]
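
The worst-case scenario described above amounts to simple arithmetic on the trial counts. The sketch below is a hypothetical worked example: the function name `worst_case_risks` and all counts are invented, and the outcome is taken to be a binary adverse event such as nonunion.

```python
def worst_case_risks(t_events, t_followed, t_randomized,
                     c_events, c_followed, c_randomized):
    """Compare observed event risks with a worst-case scenario.

    Worst case: every patient lost from the treatment group is counted as
    having the adverse event, and every patient lost from the control group
    is counted as not having it.
    """
    t_lost = t_randomized - t_followed
    c_lost = c_randomized - c_followed
    observed = (t_events / t_followed, c_events / c_followed)
    worst_case = ((t_events + t_lost) / t_randomized, c_events / c_randomized)
    loss_fraction = (t_lost + c_lost) / (t_randomized + c_randomized)
    return observed, worst_case, loss_fraction


# Hypothetical trial: 100 patients randomized per arm.
# Treatment: 90 followed, 9 events. Control: 92 followed, 23 events.
observed, worst_case, loss = worst_case_risks(9, 90, 100, 23, 92, 100)
print(f"lost to follow-up: {loss:.0%}")
print(f"observed risk:   treatment {observed[0]:.0%}, control {observed[1]:.0%}")
print(f"worst-case risk: treatment {worst_case[0]:.0%}, control {worst_case[1]:.0%}")
```

In this invented example the treatment group still has the lower event risk under the worst-case assumption (19% vs. 23%), although the difference is much smaller than the observed one (10% vs. 25%). Whether the remaining difference is statistically or clinically meaningful would need to be assessed separately.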

Scales have been devised that can rate a study based on its methodology and assign a score.[22] This does not always need to be done in daily practice however. Knowledge of the different areas of methodology though may affect interpretation of the results and allow for the recognition of a "strong" study which may then provide more compelling and "believable" results as compared to a "weaker" study.

Grades of recommendation

When does assessing the quality of a study in relation to the levels of evidence truly matter? It matters when a grade of recommendation is being developed. A very important concept is that a single high-level therapeutic study (in our case) does not imply a high grade of recommendation for treatment. A grade of recommendation can only be developed after a thorough systematic review of the literature and in many cases discussions with content experts.[2],[4],[23] When developing grades of recommendation, it becomes important to weight the studies, with more weight given to studies of high quality that sit high on the hierarchy and less to studies of lesser quality.[2]

The GRADE working group suggests a system for grading the quality of evidence obtained from a thorough systematic review [Table - 1]. This should be done for all the outcomes of interest as well as all the potential harms. They suggest that once the total evidence has been graded then recommendations for treatment can be made.

The GRADE working group suggests that when making a recommendation for treatment four areas should be considered: 1) What are the benefits vs. harms? Are there clear benefits to an intervention or does it do more harm than good? 2) What is the quality of the evidence? 3) Are there modifying factors affecting the clinical setting, such as the proximity of qualified persons able to carry out the intervention? and 4) What is the baseline risk for the potential population being treated?[1] Once these factors are taken into consideration, the GRADE working group recommends that a recommendation be placed into one of four categories: "do it", "don't do it", "probably do it" or "probably don't do it". The grades of "do it" or "don't do it" are defined as "a judgment that most well-informed people would make". The grades of "probably do it" or "probably don't do it" are defined as "a judgment that the majority of well-informed people would make but a substantial minority would not".[2]

Thus one can see that a grade of recommendation, in contradistinction to a level of evidence, is made based on the above four criteria. Inherent in these criteria are a thorough review of the literature and a grading of the studies through knowledge of study design and methodology. Evidence-based medicine is touted as decision-making based on the triumvirate of clinical experience, the best available evidence and patient values. Knowledge of the levels of evidence, the pros and cons of different study designs and how study methodology can affect the placement of a study within the hierarchy encompasses one aspect of this. The development of grades of recommendation based on the GRADE working group system gives one the tools to convey the best available evidence to the patient and helps the literature guide the busy clinician. Also, individual patients place different values on the harms and benefits of various treatments. Discussions with patients about what is important to them, combined with surgical experience and "what works in my hands", help round out the decision-making when developing a treatment plan.

References

1. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: What it is and what it isn't. BMJ 1996;312:71-2.
2. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490.
3. Phillips B, Ball C, Sackett DL, Badenoch D, Straus S, Haynes B, et al. Levels of evidence and grades of recommendation. Centre for evidence-based medicine: Oxford-centre for evidence based medicine. GENERIC: 1998.
4. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: Critical appraisal of existing approaches The GRADE Working Group. BMC Health Serv Res 2004;4:38.
5. Sackett DL, Richardson WS, Rosenberg WM, Haynes RB. Evidence Based Medicine: How to practice and teach EBM. Churchill Livingstone: New York; 1997.
6. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: Chance, not choice. Lancet 2002;359:515-9.
7. Thoma A, Farrokhyar F, Bhandari M, Tandan V; Evidence-Based Surgery Working Group. Users' guide to the surgical literature. How to assess a randomized controlled trial in surgery. Can J Surg 2004;47:200-8.
8. Bhandari M, Tornetta P 3rd, Ellis T, Audige L, Sprague S, Kuo JC, et al. Hierarchy of evidence: Differences in results between non-randomized studies and randomized trials in patients with femoral neck fractures. Arch Orthop Trauma Surg 2004;124:10-6.
9. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: A basic science for clinical medicine. Little, Brown and Company: Boston; 1991.
10. Petrisor B, Anderson S, Court-Brown CM. Infection after reamed intramedullary nailing of the tibia: A case series review. J Orthop Trauma 2005;19:437-41.
11. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies and the hierarchy of research designs. N Engl J Med 2000;342:1887-92.
12. Chalmers TC, Celano P, Sacks HS, Smith H Jr. Bias in treatment assignment in controlled clinical trials. N Engl J Med 1983;309:1358-61.
13. Lemay A. The relevance of the Women's Health Initiative results on combined hormone replacement therapy in clinical practice. J Obstet Gynaecol Can 2002;24:711-5.
14. Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results From the Women's Health Initiative randomized controlled trial. JAMA 2002;288:321-33.
15. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, et al. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials 1996;17:1-12.
16. Kunz R, Oxman AD. The unpredictability paradox: Review of empirical comparisons of randomised and non-randomised clinical trials. BMJ 1998;317:1185-90.
17. Devereaux PJ, Manns BJ, Ghali WA, Quan H, Guyatt GH. The reporting of methodological factors in randomized controlled trials and the association with a journal policy to promote adherence to the Consolidated Standards of Reporting Trials (CONSORT) checklist. Control Clin Trials 2002;23:380-8.
18. Kessler KM. The CONSORT statement: Explanation and elaboration. Consolidated Standards of Reporting Trials. Ann Intern Med 2002;136:926-7.
19. Yuasa H. The CONSORT statement: Explanation and elaboration. Consolidated Standards of Reporting Trials. Ann Intern Med 2002;136:926-7.
20. Schulz KF, Grimes DA. Sample size slippages in randomised trials: Exclusions and the lost and wayward. Lancet 2002;359:781-5.
21. Sprague S, Leece P, Bhandari M, Tornetta P 3rd, Schemitsch E, Swiontkowski MF, et al. Limiting loss to follow-up in a multicenter randomized trial in orthopedic surgery. Control Clin Trials 2003;24:719-25.
22. Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbe KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol 1992;45:255-65.
23. Atkins D, Briss PA, Eccles M, Flottorp S, Guyatt GH, Harbour RT, et al. Systems for grading the quality of evidence and the strength of recommendations II: Pilot study of a new system. BMC Health Serv Res 2005;5:25.

Correspondence Address:
B A Petrisor
Brad Petrisor MD 6North Wing, Hamilton Health Sciences, General Hospital, 237 Barton St. E, Hamilton, Ontario, L8L 2X2

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0019-5413.30519



[Table - 1]


