

GRADE: levels of evidence and grades of recommendation


With the explosion of evidence based guidelines has come a large number of ways of describing the quality of evidence behind the recommendations offered. Given this proliferation, a guideline user may find the same recommendation classified as “II-2, B”, “C+, 1”, or “strong evidence, strongly recommended”, depending on which system is used. Is there an easy way of understanding them? What do they mean anyway?

Most systems have the same basic methods at their heart. The guideline developers are first asked to assess the methodological quality of the studies which support a recommendation: this produces the “level of evidence”. With this information about the breadth of evidence supporting a decision or point of action, the developers are then asked to evaluate the whole of the evidence and how it applies to the recommendation at hand: this gives the “grade (or strength) of recommendation”. Where the systems vary is in how the study quality is assigned, which factors are included in assessing a grade or strength of recommendation, and whether different axes are used for different types of question (for example, therapeutic, diagnostic, and prognostic). These differences can give the final judgements different “meanings”. A statement such as “All breast milk donors should be tested for HIV and HTLV” may receive a “D” recommendation because of the paucity of evidence, but this does not mean that such testing should not be undertaken. In another system, the same statement may receive a “1c+” recommendation, which implies a low quality of evidence but overwhelming support for the action suggested.

An international collaborative group (GRADE) is developing a single consensus system to overcome some of these difficulties. The GRADE system is explained in greater detail on their website. In essence, the system asks guideline developers to think about the quality of the evidence by evaluating study design (for example, randomised controlled trial), execution (for example, allocation concealed), consistency, and directness of the evidence (for example, using proteinuria rather than end stage renal failure). These factors should be looked at for each critically important outcome (for example, mortality rates, serious adverse events, and quality of life measures). Making a final judgement about the strength of a recommendation requires explicitly balancing the benefits and harms, taking into account the quality of evidence, the ability of the users to implement the recommendation, and the likely magnitude of impact. Including cost implications is the last step in the process. Each step can be recorded on an appropriate proforma to ensure the process is transparent.
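For readers who find a worked example helpful, the per-outcome assessment described above can be sketched as a small record, one per critically important outcome. This is an illustration only and not the GRADE group's own tool: the class name, the field names, the four quality labels, and the rule of starting randomised trials at "high", other designs at "low", and dropping one level per concern are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical quality labels, ordered from weakest to strongest.
LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class OutcomeAssessment:
    """One row of an illustrative proforma for a single outcome."""
    outcome: str                # e.g. "mortality"
    study_design: str           # e.g. "randomised controlled trial"
    execution_concerns: bool    # e.g. allocation was not concealed
    inconsistent_results: bool  # studies disagree with one another
    indirect_evidence: bool     # e.g. proteinuria used instead of end stage renal failure

    def quality(self) -> str:
        # Assumption: randomised trials start at "high", other designs at "low".
        start = 3 if self.study_design == "randomised controlled trial" else 1
        # Assumption: each recorded concern lowers the rating by one level.
        concerns = sum([
            self.execution_concerns,
            self.inconsistent_results,
            self.indirect_evidence,
        ])
        return LEVELS[max(0, start - concerns)]


# A well-run trial that measured only a surrogate outcome.
trial = OutcomeAssessment(
    outcome="end stage renal failure",
    study_design="randomised controlled trial",
    execution_concerns=False,
    inconsistent_results=False,
    indirect_evidence=True,
)
print(trial.outcome, "->", trial.quality())
```

Recording each judgement as an explicit field, rather than as a single summary letter, is what makes the final rating open to inspection and dispute.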

How a single system will affect the way we read and write guidelines is yet to be seen. In time, it may be that all guidelines have an open and disputable record of how their recommendations were arrived at. Until then, it’s best to double check the system before you mistake your E for your A.



  • Bob Phillips
