Smart Retrieval and Structuring of Legal Documents

Project Summary
Category: Legaltech
Customer: Foqum Analytics / Lefebvre
Period: 2018-07-02 to 2019-07-01

Overview

Natural Language Processing with Deep Learning for retrieval of legal documents

This project was developed in the early days of Deep Learning NLP, before the transformer architecture was built into commercial products.

In 2018, Lefebvre (Spain’s leading provider of legal information) sought to modernize the way it managed judicial content. The goal was to automatically classify, label, and extract relevant information from a corpus of over one million court rulings and legal documents — a repository that continues to grow daily.

Beyond structuring the data, the project aimed to enable advanced search capabilities that would support faster and more informed legal decision-making.

The project began with entirely unlabeled data. Due to the highly specialized nature of legal language — particularly within the Spanish legal system — off-the-shelf pre-trained NLP models were not suitable.

Addressing this required the development of a dedicated annotation pipeline, the design of domain-specific labeling strategies, and the implementation of an active learning framework to efficiently guide expert annotation. At the time (2018), this meant deploying state-of-the-art NLP methodologies adapted specifically to the legal domain.
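To illustrate how an active learning loop can guide expert annotation, here is a minimal sketch of uncertainty sampling, one common selection strategy. It is not the project's implementation: the function names, document ids, and probabilities are hypothetical, and the choice of entropy as the uncertainty measure is an assumption.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, batch_size):
    """Return the ids of the documents whose predictions are most uncertain.

    predictions: dict mapping a document id to a list of class probabilities
    batch_size:  number of documents to send to the expert annotators
    """
    ranked = sorted(
        predictions,
        key=lambda doc_id: entropy(predictions[doc_id]),
        reverse=True,  # highest entropy (least confident) first
    )
    return ranked[:batch_size]


# Hypothetical model outputs over three legal categories (illustrative only).
predictions = {
    "ruling_001": [0.98, 0.01, 0.01],
    "ruling_002": [0.40, 0.35, 0.25],
    "ruling_003": [0.70, 0.20, 0.10],
    "ruling_004": [0.34, 0.33, 0.33],
}

queue = select_for_annotation(predictions, batch_size=2)
print(queue)
```

In a loop of this kind, the selected documents are labeled by experts, added to the training set, and the model is retrained before the next batch is chosen, so annotation effort goes to the documents the model is least sure about.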

The project delivered a hierarchical classification system capable of organizing judgments and legal documents across multiple levels of legal categories, achieving accuracy rates above 90%. In addition, the implementation of semantic search capabilities improved information retrieval performance. Compared to the previous system, the new solution was 22 times more efficient, significantly reducing operational workload and increasing productivity and service responsiveness.
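The idea of classifying across multiple levels can be sketched with a top-down scheme: pick a category at the first level, then choose only among that category's children. The example below is a toy illustration under stated assumptions. The taxonomy, keywords, and keyword-overlap scorer are all hypothetical stand-ins. The project used Deep Learning models, not keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in a hypothetical legal taxonomy."""
    label: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)


def score(node, tokens):
    """Stand-in for a trained classifier: count keyword overlaps."""
    return len(node.keywords & tokens)


def classify(root, text):
    """Walk the taxonomy top-down, keeping the best-scoring child at each level."""
    tokens = set(text.lower().split())
    path = []
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(child, tokens))
        path.append(node.label)
    return path


# Hypothetical two-level taxonomy (illustrative only).
taxonomy = Node("root", children=[
    Node("civil", {"contract", "damages", "custody", "claimant"}, [
        Node("contracts", {"contract", "breach"}),
        Node("family", {"custody", "divorce"}),
    ]),
    Node("criminal", {"convicted", "sentenced", "prison", "offence"}, [
        Node("theft", {"stolen", "theft", "robbery"}),
        Node("fraud", {"fraud", "deception"}),
    ]),
])

path = classify(taxonomy, "The defendant was convicted of fraud and sentenced to prison")
print(path)
```

Routing each document through one decision per level keeps every individual classifier small, at the cost that an error at an upper level cannot be corrected further down.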

Due to confidentiality agreements, the team could not publish a paper on this project, but the results were presented in an industry-session talk at the specialized conference JURIX 2019 - IberLegal.

I was the Principal Investigator of the project, responsible for delivering results under the contract between UCA Datalab and Foqum Analytics.

Contract Art. 83 between Foqum Analytics and Universidad de Cádiz

PI: David Gómez-Ullate (UCA), 2018-07-02 to 2019-07-01. Amount: 72,600 EUR.

Authors
David Gómez-Ullate
Professor of Applied Mathematics — Head of Mathematics, School of Science & Technology, IE University