USING AI TO SEARCH FOR CASE LAW AND MAKE SUBMISSIONS: IT MAKES CASES UP – IT REALLY DOES

If ever there was a judgment where the clue is in the name it is Harber v Commissioners for His Majesty’s Revenue and Customs (INCOME TAX – penalties for failure to notify liability to CGT – appellant relied on case law which could not be found on any legal website – whether cases generated by artificial intelligence such as ChatGPT) [2023] UKFTT 1007. This is a case that exemplifies the danger of relying on “Artificial Intelligence” to make legal submissions. In this case the appellant cited cases that do not exist.

 

“Having considered all the points set out above, we find as a fact that the cases in the Response are not genuine FTT judgments but have been generated by an AI system such as ChatGPT.”

THE CASE

The appellant appealed to the First-tier Tribunal (Tax Chamber) against a penalty relating to capital gains tax. The procedure involved her filing a Response. That Response set out a number of previous decisions that appeared to assist the appellant. However, no citations were given and, upon close examination, it was clear that the cases did not in fact exist. The Tribunal concluded that this was because the cases in the Response had been generated by an AI system.

THE RESPONSE AND THE CASES CITED

The cases in the Response
    1. We first set out the cases in the Response, followed by the related evidence and submissions, and concluding with our findings and observations.
The text of the cases
    1. The cases in the Response were divided into two categories, those which related to “ignorance of the law” and those which related to mental health conditions.
Ignorance of the law cases
  1. Four of the cases in the Response dealt with ignorance of the law; they are set out below verbatim.

“In the case of ‘David Perrin v HMRC’ (2019), the taxpayer, David Perrin, successfully appealed against a penalty charge for failing to notify HMRC of his liability to pay tax. Mr. Perrin argued that he was unaware of his obligation to notify HMRC and that the penalty charge was therefore unfair. The First-tier Tribunal (Tax Chamber) found in favor of Mr. Perrin, stating that his ignorance of the law constituted a reasonable excuse for the failure to notify HMRC’.

‘Jewell v HMRC’ (2016): The taxpayer successfully appealed against a penalty for late filing of a tax return on the basis of a lack of knowledge of the requirements to file. The taxpayer argued that they had not been aware of the requirement to file a tax return as they had not received any correspondence from HMRC. The First-tier Tribunal (Tax Chamber) found in their favor.

‘McMullen v HMRC’ (2018): The taxpayer successfully appealed against a penalty for late filing of a tax return on the basis of ignorance of the law requirements. The taxpayer argued that they had not been aware of the requirement to file a tax return as they had not received any correspondence from HMRC. The First-tier Tribunal (Tax Chamber) found in their favor.

‘Milner v HMRC’ (2020): The taxpayer successfully appealed against a penalty for late filing of a tax return on the basis of ignorance of the law requirements. The taxpayer argued that they had not been aware of the requirement to file a tax return as they had not received any correspondence from HMRC. The First-tier Tribunal (Tax Chamber) found in their favour.”

The mental health cases
    1. Five of the cases in the Response concerned mental health; they are set out below verbatim.

“‘Smith v HMRC’ (2021): The taxpayer successfully appealed against a penalty for late filing of a tax return on the basis of mental health issues. The taxpayer argued that their mental health condition, combined with other factors, had made it impossible for them to submit the return on time. The First-tier Tribunal (Tax Chamber) found in their favor.’

‘Oyesanya v HMRC’ (2020): In this case, the taxpayer successfully appealed against a penalty for late filing of a tax return. The taxpayer argued that they had a reasonable excuse for the late filing due to their mental health condition, which had prevented them from being able to manage their affairs effectively. The First-tier Tribunal (Tax Chamber) found in their favor.

‘Baker v HMRC’ (2020): The taxpayer successfully appealed against a penalty for late filing of a tax return on the basis of mental health issues. The taxpayer argued that their mental health condition, combined with other factors, had made it impossible for them to submit the return on time. The First-tier Tribunal (Tax Chamber) found in their favor.

‘Acheson v HMRC’ (2021): In this case, the taxpayer successfully appealed against a penalty for late filing of a tax return. The taxpayer argued that they had a reasonable excuse for the late filing due to their mental health condition, which had prevented them from being able to manage their affairs effectively. The First-tier Tribunal (Tax Chamber) found in their favor.

‘Talal v HMRC’ (2019): In this case, the taxpayer successfully appealed against a penalty for late filing of a tax return. The taxpayer argued that they had a reasonable excuse for the late filing due to their mental health condition, which had prevented them from being able to manage their affairs effectively. The First-tier Tribunal (Tax Chamber) found in their favor.”

Evidence and submissions
    1. At the reconvened hearing, Mrs Harber said that the cases in the Response had been provided to her by “a friend in a solicitor’s office” whom she had asked to assist with her appeal. Mrs Harber did not have more details of the cases, in particular, she did not have the full text of the judgments or any FTT reference numbers.
    1. Ms Man told the Tribunal that she had checked each of the cases in the Response to the FTT website, using not only the appellants’ names and the year as provided by Mrs Harber, but where the name was relatively common, she had extended the search to several years on either side. For example, when looking for “Smith v HMRC (2021)”, she had looked at cases between 2019 and 2023 where the appellant was called Smith. Despite that extended search, Ms Man had not identified any FTT decision which matched the cases in the Response.
    1. Ms Man did however note that:
(1) the case of “Baker v HMRC (2020)” had similarities with Richard Baker v HMRC [2018] UKFTT 0763 (TC) (“Richard Baker”), in which a Mr Richard Baker appealed on the basis that his depression constituted a reasonable excuse. However, not only was the year different, but Mr Richard Baker lost his appeal; and
(2) the appellant in “David Perrin (2019)” had the same surname as the appellant in Christine Perrin, but the latter case was heard by the FTT in 2017 and by the Upper Tribunal (“UT”) in 2018, and Mrs Perrin had lost at both the FTT and the UT.
    1. The Tribunal told the parties that we too had looked at the FTT website and other legal websites, and had also been unable to find any of the cases in the Response. We asked Mrs Harber if the cases had been generated by an AI system, such as ChatGPT. Mrs Harber said this was “possible”, but moved quickly on to say that she couldn’t see that it made any difference, as there must have been other FTT cases in which the Tribunal had decided that a person’s ignorance of the law and/or mental health condition provided a reasonable excuse.
  1. Mrs Harber then asked how the Tribunal could be confident that the cases relied on by HMRC and included in the Authorities Bundle were genuine. The Tribunal pointed out that HMRC had provided the full copy of each of those judgments and not simply a summary, and the judgments were also available on publicly accessible websites such as that of the FTT and the British and Irish Legal Information Institute (“BAILLI”). Mrs Harber had been unaware of those websites.

THE CASES DID NOT EXIST

The Tribunal found that the cases cited in the Response did not, in fact, exist.
Findings of fact
    1. In considering whether the cases in the Response were genuine FTT judgments or whether they had been generated by an AI system such as ChatGPT, the Tribunal first carried out a review of other published judgments, and having done so, took into account the following points:
(1) None of the cases in the Response is included in the FTT website or other legal websites.
(2) Mrs Harber accepted that it was “possible” that the cases in the Response had been generated by an AI system, and she had no alternative explanation for the fact that no copy of any of those cases could be located on any publicly available database of FTT judgments.
(3) The Solicitors’ Regulation Authority (“SRA”) recently said[1] this about results obtained from AI systems:

“All computers can make mistakes. AI language models such as ChatGPT, however, can be more prone to this. That is because they work by anticipating the text that should follow the input they are given, but do not have a concept of ‘reality’. The result is known as ‘hallucination’, where a system produces highly plausible but incorrect results.”

(4) The cases in the Response were “plausible but incorrect” because:

(a) The leading authority on the approach the FTT should take in reasonable excuse appeals is the UT judgment in Christine Perrin, commonly referred to simply as Perrin. The cited case of “David Perrin” uses the same surname and also concerns an appeal against a penalty on the grounds of reasonable excuse.

LESSONS FROM THE USA

The Tribunal looked at a case from the United States. In that case the AI system had been asked for further details of the cases cited and “dug in” – making up more detailed case law.

(5) The Tribunal was also assisted by the US case of Mata v Avianca 22-cv-1461(PKC), in which two barristers sought to rely on fake cases generated by ChatGPT. Like Mrs Harber, they placed reliance on summaries of court decisions which had “some traits that are superficially consistent with actual judicial decisions”. When directed by Judge Kastel to provide the full judgments, the barristers went back to ChatGPT and asked “can you show me the whole opinion”, and ChatGPT complied by inventing a much longer text. The barristers filed those documents with the court on the basis that they were “copies…of the cases previously cited”. Judge Kastel reviewed the purported judgments and identified “stylistic and reasoning flaws that do not generally appear in decisions issued by United States Courts of Appeals”.
(6) Unlike the barristers, Mrs Harber did not take the further step of asking ChatGPT for full judgments, so we had only the less detailed summaries. These had fewer identifiable flaws than those which Judge Kastel had identified in the longer full decisions with which he was provided. However, we noted that all but one of the cases in the Response related to penalties for late filing, and not for failures to notify a liability, which was the issue in Mrs Harber’s case. There were also the following stylistic points:

(a) The American spelling of “favor” in the sentence “The First-tier Tribunal (Tax Chamber) found in their favor” which appears in six of the nine cited cases.

(b) The frequent repetition of identical phrases: three of the four “ignorance of the law” cases say that “the taxpayer argued that they had not been aware of the requirement to file a tax return as they had not received any correspondence from HMRC”. Two of the “mental health” cases say that “the taxpayer argued that their mental health condition, combined with other factors, had made it impossible for them to submit the return on time” and the other two both say “the taxpayer argued that they had a reasonable excuse for the late filing due to their mental health condition, which had prevented them from being able to manage their affairs effectively”.

    1. Having considered all the points set out above, we find as a fact that the cases in the Response are not genuine FTT judgments but have been generated by an AI system such as ChatGPT.
  1. We also find as a fact that Mrs Harber was not aware that the cases in the Response were fabricated, and did not know how to locate or check case law authorities by using the FTT website, BAILLI or other legal websites.

 

A HARMFUL PRACTICE

The Tribunal found that the appellant did not know that these cases were not genuine. However, it held that the practice of citing non-existent cases is harmful.

    1. Although we have accepted that Mrs Harber did not know the AI cases were not genuine, we reject her submission that this did not matter because the Tribunal had decided other reasonable excuse cases on the basis of ignorance of the law and/or mental health issues. We instead agree with Judge Kastel, who said on the first page of his judgment (where the term “opinion” is synonymous with “judgment”) that:

“Many harms flow from the submission of fake opinions. The opposing party wastes time and money in exposing the deception. The Court’s time is taken from other important endeavors. The client may be deprived of arguments based on authentic judicial precedents. There is potential harm to the reputation of judges and courts whose names are falsely invoked as authors of the bogus opinions and to the reputation of a party attributed with fictional conduct. It promotes cynicism about the legal profession and the…judicial system. And a future litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.”

  1. We acknowledge that providing fictitious cases in reasonable excuse tax appeals is likely to have less impact on the outcome than in many other types of litigation, both because the law on reasonable excuse is well-settled, and because the task of a Tribunal is to consider how that law applies to the particular facts of each appellant’s case. But that does not mean that citing invented judgments is harmless. It causes the Tribunal and HMRC to waste time and public money, and this reduces the resources available to progress the cases of other court users who are waiting for their appeals to be determined. As Judge Kastel said, the practice also “promotes cynicism” about judicial precedents, and this is important, because the use of precedent is “a cornerstone of our legal system” and “an indispensable foundation upon which to decide what is the law and its application to individual cases”, as Lord Bingham said in Kay v LB of Lambeth [2006] UKHL 10 at [42]. Although FTT judgments are not binding on other Tribunals, they nevertheless “constitute persuasive authorities which would be expected to be followed” by later Tribunals considering similar fact patterns, see Ardmore Construction Limited v HMRC [2014] UKFTT 453 at [19].