Part 1: The (Even More) Final Problem: A Story of NLP-Generated Relationship Graphs

A series of blog posts by Lucas Zurbuchen

Part 1: Brief Overview and Preparation

One day, a reincarnated Sherlock Holmes was sitting in his Baker Street office pondering why he was there. This shouldn’t have happened: his author had tried to kill him off in Meiringen, Switzerland, to pursue more serious literature in his supposed Final Problem. But alas, public outrage over his “death” forced his author to put Holmes back in London.

As much as he enjoyed his job, there were some parts he dreaded doing again. Quickly coming up with the initial web of people and relationships was possible with his intellect, but he always wanted to jump straight to the complex relationships for which he was known. It was even worse for detectives less gifted than Holmes, who had to spend much of their brain power building up the web of a case before getting to the interesting work. Was there any way to give him and his fellow detectives some sort of head start? Fortunately, bits of his spirit from Meiringen found their way to an office on Sihlfeldstrasse in Zurich, which inspired the people there to solve this problem.

Well, that’s obviously not entirely true. But Sherlock Holmes is the overarching model for the legal tech company Herlock. Their goal is to provide lawyers with information and insights on the complex legal documents of a case using the latest Natural Language Processing (NLP) technology. One of the many things they offer is exactly this: a relationship graph (RG) that covers the web of the entire case a lawyer is looking at, so they can get right to deducing the complex relationships. In this article, I’ll describe the three-step process through which Herlock can visually represent an entire case and the role NLP plays in doing so.


The three steps, broadly speaking, are as follows:

  1. Extracting entities: examine an uploaded document to get names of entities and relationships between them.

  2. Refining the graph: double-check that two entities aren’t referring to the same thing (merging entities), and look at how frequently each entity and relationship occurs in the documents (weighting the graph).

  3. Presenting the graph: clean up the complex web of entities so that the user can see all the data in a clearly presented way.

Before going into more detail, it is helpful to clarify the input and output of this entire process. The input, described as an entire case, is a set of legal documents, while the output, described as the RG, is the set of entities and the relationships between them.
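To make that input and output a little more concrete, here is a minimal Python sketch of how a case, its entities, and the three broad steps could be represented. Every name in it (Entity, Relationship, RelationshipGraph, and the step functions) is hypothetical and only mirrors the structure described above, not Herlock’s actual implementation.

```python
# A minimal sketch of the pipeline's input/output types and its three broad steps.
# All names are hypothetical stand-ins for whatever Herlock uses internally.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    description: str
    weight: int = 1  # how often this relationship occurs across the documents


@dataclass
class RelationshipGraph:
    entities: set[Entity] = field(default_factory=set)
    relationships: list[Relationship] = field(default_factory=list)


def extract_entities(documents: list[str]) -> RelationshipGraph:
    """Step 1: pull entity names and the relationships between them out of the text."""
    return RelationshipGraph()  # stub


def refine_graph(graph: RelationshipGraph) -> RelationshipGraph:
    """Step 2: merge entities that refer to the same thing and weight the edges."""
    return graph  # stub


def present_graph(graph: RelationshipGraph) -> RelationshipGraph:
    """Step 3: tidy the web of entities so it can be displayed clearly."""
    return graph  # stub


def build_relationship_graph(case_documents: list[str]) -> RelationshipGraph:
    """Turn an entire case (a set of legal documents) into a relationship graph."""
    graph = extract_entities(case_documents)
    graph = refine_graph(graph)
    return present_graph(graph)
```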

To start extracting entities, we first need to look through all the text in the uploaded document. This can be very tedious, especially if the document is long and we are looking for several pieces of information at once. Fortunately, Herlock already collects data that can be used as the text base for the RG: the extracted events that were gathered for Herlock’s timeline visualization (another one of the Information Graphs they offer). This data is leveraged on the assumption that every exchange between entities is an event, which means the extracted events can be used to determine the details of those exchanges. With that, there is enough base data to execute the entity extraction.
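As a rough illustration of how such pre-extracted events could seed the entity extraction, the sketch below runs an off-the-shelf named entity recognizer (spaCy’s small English model) over the text of each event and treats every pair of people or organisations mentioned in the same event as a candidate relationship. The event structure, the example data, and the choice of spaCy are assumptions made for this example only, not a description of Herlock’s pipeline.

```python
# Sketch: seeding entity extraction from already-extracted timeline events.
# The event schema and the use of spaCy's pretrained NER are illustrative assumptions.
import itertools

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a generic NER component

# Hypothetical shape of an extracted event: a date plus a short text description.
events = [
    {"date": "2021-03-01", "text": "Acme Corp signed a supply contract with Globex Ltd."},
    {"date": "2021-06-15", "text": "Globex Ltd sued Acme Corp for breach of contract."},
]

relationships = []
for event in events:
    doc = nlp(event["text"])
    # Keep the people and organisations mentioned in this event.
    parties = [ent.text for ent in doc.ents if ent.label_ in {"PERSON", "ORG"}]
    # Treat every pair of parties in the same event as a candidate relationship,
    # with the event text itself as evidence of the exchange between them.
    for source, target in itertools.combinations(parties, 2):
        relationships.append({"source": source, "target": target, "evidence": event["text"]})

for rel in relationships:
    print(f'{rel["source"]} -- {rel["target"]}: {rel["evidence"]}')
```

The next post goes into how the extraction is actually done and how the resulting entities are connected into a network.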

Next week, we’ll learn how to extract entities and connect them into a network with NLP, and finally how to present them in a user-friendly way.
