Judicial Ghostwriting: Authorship on the Supreme Court

Albert Yoon & Jeffrey Rosenthal


Justice Louis Brandeis once wrote, “The reason the public thinks so much of the Justices of the Supreme Court is that they are almost the only people in Washington who do their own work.”  It is remarkable to think that each year, each justice is responsible for evaluating over seven thousand files, hearing oral argument for approximately sixty to eighty cases, and writing seven to ten lengthy published opinions, all of which will become established law and generate scrutiny by judges, lawyers, academics, students, and the press.  To add to the degree of difficulty, nearly all justices do this well past the standard retirement age of sixty-five, often until they die or are physically or mentally incapable.

Brandeis’ claim notwithstanding, anecdotes abound that justices delegate much of their work to their clerks.  A recent study of Supreme Court clerks concluded, “one can safely conclude that no other set of sitting Supreme Court justices have delegated as much responsibility to their law clerks as those on the Rehnquist Court.” This purported reliance on clerks varies by justice.  Justice Oliver Wendell Holmes was known to write his opinions in longhand, using his clerks primarily for administrative tasks.  More recently, Justice John Paul Stevens and Justice Antonin Scalia reputedly draft their own opinions, while both Chief Justice William Rehnquist and Justice Harry Blackmun were known to allow their clerks to draft their opinions.

Whether we should care if justices delegate the opinion-writing process to their clerks depends on the degree to which it occurs.  The import of an opinion rests less on the identification of the prevailing party than on the reasoning that accompanies the decision.  Chief Justice Rehnquist himself cautioned that each justice “must retain for himself control not merely of the outcome of the case, but of the explanation of the outcome.”  If justices are delegating these substantive responsibilities, then we arguably should be concerned.

This outsourcing of work by justices to clerks raises two concerns.  The first is competence.  Clerks, while typically excellent law students from elite law schools, are also usually recent graduates who have little or no practice experience.  This bimodal staffing structure – well-seasoned justices and nascent lawyers – stands in stark contrast to the executive and legislative branches, both replete with experienced staff.  The second is ideology. The ideological preferences of the justice and her clerks may diverge.  Even when the justice dictates the broad direction of an opinion, i.e., the prevailing party and general reasoning, the clerk may still exercise considerable discretion.

Any serious discussion of judicial discretion, however, first requires a deeper understanding of judicial authorship.  The issue of authorship is nothing new: scholars have examined authorship in literature1 and history2.  More recently, legal scholars have compared drafts of opinions within individual justices’ chambers to discern writing styles, or—among lower court judges—citations to their own earlier opinions.

Our approach differs from these earlier attempts in that we explore judicial authorship based on a comprehensive evaluation of each justice’s writing style.  Our central intuition is that the more participants (i.e., clerks) there are in a justice’s opinion-writing process, the more heterogeneous the writing style of that justice’s opinions.  Justices who write their own opinions (or delegate less of the drafting) would presumptively have less variable writing styles than justices who rely exclusively (or heavily) on their law clerks.

We draw on the Court’s institutional design for our identification strategy.  Because clerkships on the Supreme Court typically last a single term, justices who rely more on their clerks to write opinions would likely have a more variable writing style, both within and across years, than their less reliant colleagues.  Moreover, clerks’ responsibilities on the Court have grown over time, evolving from stenographer in the late nineteenth century to legal assistant in the 1920s to something like a law firm associate beginning with the Warren Court.

In this Article, we analyze the text of the majority opinions of all Supreme Court justices.  Using a parsimonious model based on the justices’ use of sixty-three common function words (e.g., the, also, her), we construct a variability measure (V score) for each justice’s writing style based on the chi-squared statistic.  This measure accounts for both the number of opinions and their length.
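A chi-squared variability score of this kind can be illustrated with a short sketch. This summary does not give the exact formula, so the code below assumes one plausible construction: build a table of function-word counts (one row per opinion, one column per word), compute Pearson's chi-squared statistic for independence between opinion and word choice, and divide by the table's degrees of freedom. The word list `FUNCTION_WORDS` and the functions `word_counts` and `v_score` are hypothetical illustrations, not the Article's code.

```python
import re
from collections import Counter

# Hypothetical word list: the Article uses sixty-three function words,
# of which only "the", "also", and "her" are named in this summary.
FUNCTION_WORDS = ["the", "also", "her", "of", "and", "to", "a", "in"]


def word_counts(text, words=FUNCTION_WORDS):
    """Count how often each function word appears in one opinion."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [tokens[w] for w in words]


def v_score(opinions, words=FUNCTION_WORDS):
    """Pearson chi-squared statistic of the opinion-by-word count table,
    divided by its degrees of freedom (an assumed normalization)."""
    table = [word_counts(op, words) for op in opinions]
    # Drop empty rows and never-used words, whose expected counts would be zero.
    table = [row for row in table if sum(row) > 0]
    used = [j for j in range(len(words)) if any(row[j] for row in table)]
    table = [[row[j] for j in used] for row in table]
    if len(table) < 2 or len(used) < 2:
        raise ValueError("need at least two opinions and two observed words")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every opinion used words in the same proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(used) - 1)
    return chi2 / degrees_of_freedom
```

Because each expected count scales with its opinion's total, longer opinions do not inflate the score on their own, and dividing by the degrees of freedom offsets the growth of the statistic with the number of opinions. Under the null hypothesis of a common word distribution, this normalized statistic should be roughly 1; values well above 1 indicate more heterogeneous word use across opinions.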

We observe a general trend in which justices’ V scores have increased over time.  The V scores are lower and generally stable from 1900 to 1950 and increase steadily beginning in the 1950s.  Among recent and current justices, Justice Antonin Scalia has an overall V score of 3.08, lower than most of his contemporaries on the Court.  Similarly, Justice Stephen Breyer has a V score of 3.06.  By contrast, Justice Sandra Day O’Connor and Justice Anthony Kennedy have two of the highest V scores of all the justices, at 3.85 and 3.73, respectively.  Interestingly, both of these justices are well known as pivotal or “swing” justices, casting the decisive vote in many closely divided cases.

To place these scores in historical perspective, Justice Oliver Wendell Holmes had an aggregate V score of 1.78.  His scores were also remarkably consistent from year to year, with a standard deviation of 0.22 (his V score was below 2.00 in twenty-seven of his thirty years on the Court).  Similarly, Justice Benjamin Cardozo had a V score of 2.08, with a standard deviation of 0.40, while on the Court.

We checked the validity of these V scores by comparing them with those of Judge Richard A. Posner and Judge Frank Easterbrook, two judges known to write their own opinions.  Their V scores were 2.52 (standard deviation of 0.14) and 2.28 (standard deviation of 0.18), respectively, lower than those of any of the contemporary justices on the Court and lower than all but a handful of justices dating back to 1900.  Although Judge Posner’s and Judge Easterbrook’s opinions were shorter on average than those of most justices, they also wrote many more opinions than any justice, lending support to our belief that the V scores are a valid measure of writing variability.

While the justices’ V scores allow us to convincingly reject the null hypothesis that their writing style follows a uniform and random distribution of function words, they do not establish whether the scores themselves meaningfully differ from one another.  There is no straightforward analytic test: because we reject the null hypothesis, the justices’ V scores by definition do not follow a chi-squared distribution, and we cannot determine analytically what distribution they do follow.  It is possible, however, to determine this distribution empirically through a bootstrap test.3

For each justice, we select one hundred of the justice’s authored majority opinions uniformly at random, with replacement.  For each sample of one hundred opinions, we compute the V score in the same manner.  We then repeat this process 1,000 times for each justice, generating 1,000 possible V scores, depending on which one hundred opinions we draw.  Comparing the bootstrap V scores of justice A with those of justice B, we simply count the fraction of pairs in which justice A’s V score is greater than justice B’s, which gives us an estimate of the probability that justice A’s V score exceeds justice B’s for a random selection of opinions.  This process also allows us to estimate the distribution of the difference in V scores (justice A minus justice B) and to compute a 95% confidence interval for this difference.
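The bootstrap comparison can be sketched as follows. The sketch takes any V-score function as a parameter, pairs the i-th bootstrap score of justice A with the i-th score of justice B (one reading of "the fraction of pairs"), and uses a simple percentile interval for the 95% confidence interval. The function names `bootstrap_v_scores` and `compare_justices`, and the fixed random seed, are hypothetical choices for illustration.

```python
import random


def bootstrap_v_scores(opinions, v_score, sample_size=100, reps=1000, seed=0):
    """Recompute a V score on `reps` resamples of `sample_size` opinions,
    each drawn uniformly at random with replacement."""
    rng = random.Random(seed)
    return [v_score(rng.choices(opinions, k=sample_size)) for _ in range(reps)]


def compare_justices(scores_a, scores_b, level=0.95):
    """Estimate P(V_A > V_B) and a percentile interval for V_A - V_B.

    The i-th bootstrap score of A is paired with the i-th score of B; since
    the two sets of resamples are drawn independently, each pair is one draw
    of the difference.
    """
    diffs = sorted(a - b for a, b in zip(scores_a, scores_b))
    n = len(diffs)
    prob_a_greater = sum(d > 0 for d in diffs) / n
    alpha = 1 - level
    lower = diffs[round(alpha / 2 * n)]
    upper = diffs[round((1 - alpha / 2) * n) - 1]
    return prob_a_greater, (lower, upper)
```

Pairing scores index by index keeps the computation linear in the number of replications; comparing all 1,000 × 1,000 cross pairs is an alternative reading that should give a similar probability estimate at greater cost. A difference is statistically distinguishable at the 95% level when the interval excludes zero.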

We find that for the last Rehnquist natural court (1994 to 2005), the justices had V scores statistically distinguishable from one another in 44% of the pairings.  Even when the difference in V scores was not statistically significant, in 72% of the pairings the estimated probability that one justice’s V score exceeded the other’s was either greater than 0.70 or less than 0.30.  We also used these scores to observe differences within justices over their tenure on the Court.  For example, Justice O’Connor had a V score of 3.61 before she reached age 65 and a V score of 4.33 afterward, indicating that her writing became more variable after she reached retirement age.  We also observed that Justice Thurgood Marshall, reputed by some to have relied heavily on his clerks during his early years on the Court, actually had a slightly lower V score during his final five years on the Court (2.84) than during his first five years (3.02), while Justice William Brennan had a markedly lower V score in his first five years (2.46) than in his last five years (3.63).

Having shown that justices have writing styles that are statistically distinguishable from the null hypothesis and, in many instances, from one another, we examine in the final part of our Article whether the authorship of judicial opinions can be predicted accurately.  We again use a bootstrap, pairwise approach.  We consider a particular pair of justices, justice A and justice B, and the universe of majority opinions authored by either of them.  We partition the data into a training set and a testing set to avoid overfitting.  For our test, we use leave-one-out cross-validation: for each opinion written by justice A or justice B, that opinion serves as the test set and all remaining opinions written by either justice serve as the training set.  We use a linear classifier, in which every possible value of the fitted linear function is assigned to one of the two justices.
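The leave-one-out procedure can be sketched in a few lines. This summary does not specify how the linear fit is estimated, so the sketch substitutes a nearest-centroid rule on function-word frequencies, which is one simple linear classifier (its decision boundary is a hyperplane); this choice is an assumption, not the Article's exact model. All function names are hypothetical, and each justice needs at least two opinions so that both centroids exist after the held-out opinion is removed.

```python
def frequencies(counts):
    """Convert raw function-word counts into relative frequencies so that
    opinion length does not drive the classification."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def centroid(rows):
    """Mean frequency vector of a list of opinions."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def linear_score(x, mu_a, mu_b):
    """w . x + b for the nearest-centroid rule; positive means x is closer
    to justice A's centroid than to justice B's."""
    w = [a - b for a, b in zip(mu_a, mu_b)]
    bias = (sum(b * b for b in mu_b) - sum(a * a for a in mu_a)) / 2
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def loo_accuracy(freqs_a, freqs_b):
    """Leave-one-out accuracy for each justice in the pair.

    Each opinion in turn is the test set; all remaining opinions by either
    justice form the training set used to fit the two centroids.
    """
    data = [(x, "A") for x in freqs_a] + [(x, "B") for x in freqs_b]
    correct = {"A": 0, "B": 0}
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        mu_a = centroid([v for v, lab in train if lab == "A"])
        mu_b = centroid([v for v, lab in train if lab == "B"])
        predicted = "A" if linear_score(x, mu_a, mu_b) > 0 else "B"
        correct[label] += predicted == label
    return correct["A"] / len(freqs_a), correct["B"] / len(freqs_b)
```

Returning a separate accuracy for each justice mirrors the way results are reported per justice within a pair (for example, one rate for Justice Breyer's opinions and another for Justice Ginsburg's).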

Again examining the last Rehnquist natural court, we found, for example, that in a pairwise comparison of Justice Breyer and Justice Ginsburg, the linear classifier correctly predicted the author of Justice Breyer’s opinions 94% of the time and of Justice Ginsburg’s opinions 96% of the time.  Overall, our model achieved an accuracy rate of at least 70% in sixty-eight of seventy-two pairwise comparisons; in thirty of the pairings, the accuracy rate exceeded 90%.  Compared with the null hypothesis, under which authorship is assigned at random between the two justices in the pair (an expected accuracy of 50%), our model predicts quite well.
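To see how far these accuracy rates sit from the null hypothesis, one can compute the probability that random assignment, with each attribution correct with probability 1/2, would match or exceed a given number of correct attributions. The number of opinions in each pairing is not given in this summary, so the figure of 100 opinions used below is hypothetical, as is the function name `chance_tail`.

```python
from math import comb


def chance_tail(correct, n, p=0.5):
    """P(X >= correct) for X ~ Binomial(n, p): the probability that random
    assignment, with each attribution correct with probability p, gets at
    least `correct` of n opinions right."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(correct, n + 1))
```

For a hypothetical pairing of 100 opinions, `chance_tail(90, 100)` is far below 1e-15, and even 70 correct out of 100 has a probability below 1 in 1,000 under random assignment.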

Of course, identifying authorship on the Supreme Court is admittedly an academic exercise, since most majority opinions identify their author.  But our model shows that the texts of the justices’ opinions are statistically distinguishable from one another, even when the differences in the justices’ V scores themselves are not statistically significant.

The purpose of our Article was twofold.  Our first goal was to show that it is possible to evaluate justices’ writing statistically.  Our model used only function words, and a different algorithm, perhaps one more tailored to legal writing, might produce even higher predictive accuracy.  Our second, and perhaps more important, goal was to show how textual analysis can increase our understanding of how justices produce opinions and how much they rely on their clerks.  Our findings are consistent with historical and anecdotal accounts of justices’ relationships with their clerks.

We note that our findings provide only circumstantial evidence of collaborative authorship.  While we contend that low V scores suggest that a justice does her own writing, some justices with high V scores might simply have a naturally variable writing style, and justices with low V scores might have a writing style that is easy for their clerks to replicate.  These alternative explanations are plausible, but our separate analysis of Judges Posner and Easterbrook, both known to write their own opinions, yields low V scores with low standard deviations, providing strong support for our interpretation.

The qualitative effect of this apparent increase in delegation to clerks is outside the scope of this Article.  Our results show, however, that statistical analysis can meaningfully contribute to our understanding not merely of how justices vote but also of how they write.


Jeffrey S. Rosenthal, Professor, University of Toronto, Department of Statistics.

Albert H. Yoon, Professor of Law, University of Toronto Faculty of Law.

This Editorial is based on the Article, Judicial Ghostwriting: Authorship on the Supreme Court, 96 CORNELL L. REV. 1307 (2011).

Copyright © 2011 Cornell Law Review.

  1. See generally John Michell, Who Wrote Shakespeare? (1996) (surveying a collection of arguments for alternative authors); S. Schoenbaum, Shakespeare’s Lives (1993) (same); James D.A. Boyle, The Search for an Author: Shakespeare and the Framers, 37 Am. U. L. Rev. 625, 628–29 (1988) (noting debates on authorship of Shakespeare’s works in connection with changing conceptions of authorship throughout history).
  2. See Frederick Mosteller & David L. Wallace, Inference and Disputed Authorship: The Federalist 263 (1964).
  3. A bootstrap is a procedure of repeated sampling with replacement from a given sample.
