Smart Retrieval and Structuring of Legal Documents
Project Summary
| Category | Legaltech |
|---|---|
| Customer | Foqum Analytics / Lefebvre |
| Period | 2018-07-02 to 2019-07-01 |

Overview
Natural Language Processing with Deep Learning for retrieval of legal documents
This project was developed in the early days of Deep Learning NLP, before the transformer architecture was built into commercial products.
In 2018, Lefebvre (Spain’s leading provider of legal information) sought to modernize the way it managed judicial content. The goal was to automatically classify, label, and extract relevant information from a corpus of over one million court rulings and legal documents — a repository that continues to grow daily.
Beyond structuring the data, the project aimed to enable advanced search capabilities that would support faster and more informed legal decision-making.
The project began with entirely unlabeled data. Due to the highly specialized nature of legal language — particularly within the Spanish legal system — off-the-shelf pre-trained NLP models were not suitable.
Addressing this required the development of a dedicated annotation pipeline, the design of domain-specific labeling strategies, and the implementation of an active learning framework to efficiently guide expert annotation. At the time (2018), this meant deploying state-of-the-art NLP methodologies adapted specifically to the legal domain.
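As an illustration of how an active learning loop can prioritise expert annotation, the sketch below uses entropy-based uncertainty sampling: the documents on which the current model is least certain are sent to the annotators first. This is a minimal sketch under stated assumptions, not the project's actual implementation. The function names, document ids, and probability values are hypothetical, and the source does not say which query strategy was used.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, batch_size):
    """Return the ids of the documents with the most uncertain predictions.

    predictions: dict mapping a document id to its predicted class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda doc_id: entropy(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled rulings over three categories.
predictions = {
    "ruling_a": [0.98, 0.01, 0.01],
    "ruling_b": [0.34, 0.33, 0.33],
    "ruling_c": [0.70, 0.20, 0.10],
    "ruling_d": [0.50, 0.49, 0.01],
}

# The two rulings the model is least sure about go to the experts next.
queue = select_for_annotation(predictions, batch_size=2)
print(queue)
```

In a full loop, the newly labeled documents would be added to the training set, the model retrained, and the selection repeated until the annotation budget is spent.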
The project delivered a hierarchical classification system capable of organizing judgments and legal documents across multiple levels of legal categories, achieving accuracy rates above 90%. In addition, the implementation of semantic search capabilities improved information retrieval performance. Compared to the previous system, the new solution was 22 times more efficient, significantly reducing operational workload and increasing productivity and service responsiveness.
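The general idea behind semantic search can be sketched as ranking documents by the similarity between a query vector and precomputed document vectors. The example below is only an illustration: the three-dimensional vectors, the document ids, and the choice of cosine similarity are assumptions, and the source does not describe how the project's embeddings or index were built.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query_vector, index, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical document embeddings (real ones would have hundreds of dimensions).
index = {
    "labour_dismissal": [0.9, 0.1, 0.0],
    "tax_appeal": [0.1, 0.9, 0.1],
    "labour_wages": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of a labour-law query.
query = [1.0, 0.0, 0.0]
results = search(query, index)
print(results)
```

Because ranking depends on vector proximity rather than exact keyword overlap, a query can match rulings that use different wording for the same legal concept.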
Due to confidentiality agreements, the team could not publish a paper on this project, but the results were presented in a talk in the industry session of the specialized conference JURIX 2019 - IberLegal.
I was the Principal Investigator of the project and responsible for delivering the results under the contract between UCA Datalab and Foqum Analytics.
Art. 83 contract between Foqum Analytics and Universidad de Cádiz
PI: David Gómez-Ullate (UCA), 02/07/2018 – 01/07/2019, Amount: 72,600 EUR.