Part 3: The (Even More) Final Problem: A Story of NLP-Generated Relationship Graphs

A series of blog posts by Lucas Zurbuchen

Part 3: Presenting a Complex NLP-Generated Relationship Graph

So far, we have extracted entities from documents, connected them into a network, and produced a presentable graph – but one problem remains: it’s too complicated. With all the entities and relations we have to depict for an entire case, there is no way a lawyer could view and navigate everything efficiently. Luckily, the last step takes care of this.

Like many solutions for managing complex data, the one implemented in this last step is a hierarchy. More specifically, for each entity category we build a tree in which the entities extracted in previous steps are the leaves and every other node is a cluster of those entities. A parent node in this hierarchy can be represented in the RG just like its children, simply inheriting all of their properties (such as outgoing edges and node weights). To navigate the RG, we can implement an interaction where clicking a node in the visualization lets you either zoom in to its children or out to its parent, each shown connected to their closest-related entities.
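To make the parent-inherits-from-children idea concrete, here is a minimal sketch of such a hierarchy node. The field names (`weight`, `edges`, `aggregate`) are illustrative assumptions, not the post's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class HierarchyNode:
    """A node in one entity category's tree: a leaf entity or a cluster."""
    name: str
    children: list["HierarchyNode"] = field(default_factory=list)
    weight: float = 0.0            # leaf weight; rolled up for parents
    edges: set = field(default_factory=set)  # names of connected entities

    def aggregate(self) -> None:
        """Roll up children's weights and edges so a parent can stand in for them in the RG."""
        for child in self.children:
            child.aggregate()
            self.weight += child.weight
            self.edges |= child.edges

# Usage: a cluster of two people inherits both their edges and their combined weight.
alice = HierarchyNode("Alice", weight=2.0, edges={"Case File 1"})
bob = HierarchyNode("Bob", weight=1.0, edges={"Location X"})
cluster = HierarchyNode("Names A-F", children=[alice, bob])
cluster.aggregate()
```

Zooming out then just means swapping a set of child nodes for their (pre-aggregated) parent in the rendered graph.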

Creating the hierarchy consists of recursively applying clustering algorithms until each cluster falls below a certain size threshold. However, two things are important to note: we need to generate different embeddings for clustering depending on the entity category, and we need a good way to name each cluster so it can be represented as a node in the visualization. To address the first problem, we use different embedding techniques for different categories. For example, exchanges might use sentence embedders, people might use simple vectors of characters (so lawyers can look through names in alphabetical order), and locations might use weighted embeddings that consider both coordinates and area (extracted with geospatial APIs). Essentially, the goal is to detect similarities and differences between entities as effectively as possible. To address the second issue, if the name of the cluster isn’t based on alphabetical order (for example “Names A-F” for people), GenAI can be used to find the overall theme of the cluster in the exact same way it was used to merge edge labels. Either way, having a hierarchy for each entity category makes it possible for users to zoom in and out of different sections of the RG.
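A sketch of the recursive step, under simplifying assumptions: real clustering would use a proper algorithm over category-specific embeddings, but splitting a sorted list at its median until groups fall under the threshold shows the shape of the recursion (here using alphabetical order for people, as described above):

```python
def recursive_cluster(items, embed, threshold=3):
    """Return a nested list (tree) whose leaves are clusters of at most `threshold` items.

    `embed` maps an item to a sortable key; a real system would cluster
    in the embedding space for the entity's category instead of splitting
    at the median of a 1-D ordering.
    """
    if len(items) <= threshold:
        return items  # leaf cluster: small enough to display directly
    ordered = sorted(items, key=embed)
    mid = len(ordered) // 2
    return [
        recursive_cluster(ordered[:mid], embed, threshold),
        recursive_cluster(ordered[mid:], embed, threshold),
    ]

# Hypothetical "embedding" for people: the name itself, giving alphabetical order.
people = ["Adler", "Baskerville", "Holmes", "Lestrade", "Moriarty", "Watson"]
tree = recursive_cluster(people, embed=lambda name: name)
```

Each internal list in `tree` corresponds to a cluster node that would then need a name – computed from the alphabetical range here, or by GenAI for semantic categories.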

To accomplish this zooming mechanism, we construct a graph with a limited number of nodes that contains the node(s) of interest along with their closest-related entities. We take a breadth-first approach to this problem, collecting the outgoing edges of each node we add to the graph and deciding which neighbor is the most important to show next (which we can conveniently judge by the node weights we calculated). We also prefer parent nodes wherever possible, since a single parent node can represent the information of multiple nodes at once.
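The expansion described above can be sketched as a best-first traversal: starting from a focus node, repeatedly admit the highest-weight neighbor on the frontier until the node budget is spent. The function and data below are illustrative assumptions, not the post's code:

```python
import heapq

def zoom_subgraph(graph, weights, focus, max_nodes=5):
    """Pick up to `max_nodes` nodes to display around `focus`.

    graph:   adjacency dict, node -> list of neighbor names
    weights: node -> importance score (higher = more important)
    """
    shown = {focus}
    # Max-heap via negated weights: most important neighbor comes out first.
    frontier = [(-weights[n], n) for n in graph.get(focus, [])]
    heapq.heapify(frontier)
    while frontier and len(shown) < max_nodes:
        _, node = heapq.heappop(frontier)
        if node in shown:
            continue
        shown.add(node)
        for nb in graph.get(node, []):  # breadth-first: enqueue this node's neighbors
            if nb not in shown:
                heapq.heappush(frontier, (-weights[nb], nb))
    return shown

# Toy case: which four nodes appear when zoomed in on "Holmes"?
graph = {"Holmes": ["Watson", "Moriarty", "Adler"],
         "Watson": ["Baskerville"], "Moriarty": ["Moran"]}
weights = {"Holmes": 5, "Watson": 4, "Moriarty": 3, "Adler": 2,
           "Baskerville": 1, "Moran": 1}
view = zoom_subgraph(graph, weights, "Holmes", max_nodes=4)
# The low-weight second-hop nodes (Baskerville, Moran) are left out.
```

Preferring parent nodes from the hierarchy would slot in naturally here, e.g. by boosting the weight of any cluster node relative to its individual children.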


Finally, we have our finished relationship graph – a visual representation of an entire legal case. This graph includes several types of entities and the exchanges between them, mechanisms to mitigate common sources of error, and an intuitive way to present this complex web of information. It allows lawyers to get right to the interesting parts, thoroughly investigating a legal case from both a broad and a detailed perspective. They can come closer to Sherlock Holmes’s problem-solving abilities and see what he might have seen when he solved all of his cases. Lastly, this means that Holmes might have gotten his (hypothetical) wish fulfilled 130 years later.
