Document map tutorial

Document Map

What is the Document Map

The Semantic Map shows how close text fragments are to each other in meaning. The Document Map, in contrast, is designed to visualize semantic proximity between whole documents.

For each document, our system measures the weight, or importance, of each topic discussed in it: the more important a phenomenon or subject is for a given document, the higher the weight of the corresponding topic. If the topics discussed in two documents are similar, the documents are considered similar in meaning.
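To make the idea of comparing documents by their topics concrete, here is a minimal sketch. It assumes each document is described by a dictionary of topic weights and uses cosine similarity as the comparison measure. Both are illustrative assumptions: the tutorial does not state which measure the system uses, and the topic names and weights below are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse topic-weight dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no topics is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical topic weights for three documents (illustrative values only).
doc_court = {"lawsuits": 0.7, "legislation": 0.3}
doc_trial = {"lawsuits": 0.6, "crime": 0.4}
doc_birds = {"birds": 0.8, "ecology": 0.2}

sim_legal = cosine_similarity(doc_court, doc_trial)  # share the "lawsuits" topic
sim_mixed = cosine_similarity(doc_court, doc_birds)  # share no topics
```

Under these assumptions, the two legal documents score higher than the unrelated pair, so they would be expected to sit closer together on the map.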

The Document Map consists of dots, where each dot represents one document. Its main feature is the same as that of the Semantic Map: semantically close documents are placed close together on the map.

Below, we explain the functionality of the Document Map using a map built from English Wikipedia. Document maps for other document collections (or domain areas) have similar functionality, but their topics and document classes will be different.

Exploratory Search with the Document Map

Search with the Document Map

Dense areas on the map contain documents with similar content. These well-defined areas can be used to group documents into document classes. The most important document classes are marked with larger dots: red circles mark the larger classes, while white circles mark less important topics.

For a more detailed view, zoom in with the mouse scroll wheel. You can also move the map: left-click on it and drag it in the desired direction. To inspect an area, click on it. A red selection box will appear. This box helps you find out which topics the documents in the selected area of the map are about.
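Conceptually, the selection box picks out the dots that fall inside a rectangle on the map. The sketch below illustrates this with a hypothetical `Dot` type and made-up coordinates; it is not the product's actual API, only an illustration of the selection step.

```python
from dataclasses import dataclass

@dataclass
class Dot:
    """One document on the map (hypothetical structure)."""
    title: str
    x: float
    y: float

def dots_in_box(dots, x_min, y_min, x_max, y_max):
    """Return the dots whose coordinates lie inside the selection box."""
    return [d for d in dots
            if x_min <= d.x <= x_max and y_min <= d.y <= y_max]

# Illustrative map with three documents.
dots = [
    Dot("Class action", 1.0, 1.2),
    Dot("Tort reform", 1.3, 0.9),
    Dot("Bird migration", 7.5, 4.0),
]

selected = dots_in_box(dots, x_min=0.5, y_min=0.5, x_max=2.0, y_max=2.0)
selected_titles = [d.title for d in selected]
```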

On the right is the Area Explorer, which helps you understand which area of knowledge the selected part of the map relates to. At the top, the Top Topics box shows the topics discussed in the selected documents. Many dots (documents) fall into the part of the map you are examining, and each dot is characterized by several topics. The occurrence of each topic is counted across all selected dots, and the most frequent topics are shown in the Top Topics box.
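The counting described above can be sketched in a few lines. The example assumes each selected document comes with a list of its topics; the document titles and topics are invented for illustration.

```python
from collections import Counter

def top_topics(docs_topics, n=3):
    """Count in how many selected documents each topic occurs
    and return the n most frequent topics with their counts."""
    counts = Counter()
    for topics in docs_topics.values():
        counts.update(set(topics))  # count each topic once per document
    return counts.most_common(n)

# Hypothetical documents inside the selection box and their topics.
selected_docs = {
    "Class action": ["lawsuits", "courts"],
    "Tort reform": ["lawsuits", "legislation"],
    "Supreme court": ["lawsuits", "courts"],
}

result = top_topics(selected_docs, n=2)
```

Here "lawsuits" occurs in all three documents and "courts" in two, so these two topics would head the list.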

Top Topics box

Below it, the Documents box shows the names of the documents in the selected area of the map. Each dot corresponds to one document. The list shows first those documents in which the first topic from Top Topics (in our example, lawsuits) prevails. Click on the name of a document in the list to open it in a new browser tab.
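One way to picture this ordering is to sort the documents so that those whose strongest topic is the leading one come first. The sketch below is an assumption about how such a list could be ordered, not a description of the actual implementation; the titles and weights are invented.

```python
def order_documents(doc_weights, leading_topic):
    """Order titles so that documents dominated by leading_topic come first,
    then by descending weight of that topic, then alphabetically."""
    def sort_key(title):
        weights = doc_weights[title]
        dominant = max(weights, key=weights.get)  # topic with the highest weight
        return (dominant != leading_topic,
                -weights.get(leading_topic, 0.0),
                title)
    return sorted(doc_weights, key=sort_key)

# Hypothetical topic weights for the documents in the selected area.
doc_weights = {
    "Supreme court": {"courts": 0.6, "lawsuits": 0.4},
    "Tort reform": {"lawsuits": 0.6, "legislation": 0.4},
    "Class action": {"lawsuits": 0.8, "courts": 0.2},
}

ordered = order_documents(doc_weights, leading_topic="lawsuits")
```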

Documents box

Click on a topic in the Top Topics box. All documents containing this topic will be highlighted on the map. You can select several topics at the same time. This way you can find relationships between document classes and understand the global semantic structure of the map.
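Highlighting by topic amounts to filtering documents by the topics they contain. The sketch below assumes that selecting several topics highlights every document containing at least one of them; this is an assumption, since the tutorial does not say how multiple selections are combined. The data is invented.

```python
def highlighted(docs_topics, selected_topics):
    """Titles of documents that contain at least one selected topic."""
    selected = set(selected_topics)
    return {title for title, topics in docs_topics.items()
            if selected & set(topics)}

# Hypothetical documents and their topics.
docs_topics = {
    "Class action": ["lawsuits", "courts"],
    "Tort reform": ["lawsuits", "legislation"],
    "Bird migration": ["birds", "ecology"],
}

lawsuit_docs = highlighted(docs_topics, ["lawsuits"])
legislation_docs = highlighted(docs_topics, ["legislation"])

# Documents shared by two topics hint at a relationship between classes.
overlap = lawsuit_docs & legislation_docs
```

In this toy data the overlap contains only "Tort reform", which suggests where the two document classes meet.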

On the left is the Map Explorer, a guide to the topics on the document map. At its top is the All Topics box, which lists the topics that are most common in the domain area. Click on a topic to see all documents on the map that contain it.

All Topics box

Summary

The Document Map can be used to visually identify classes of documents in a large collection of texts. It can be created for any collection of documents in any language. The Document Map is used in the Silk Data Semantic Framework.
