Gil Moskowitz

CS725 - Information Visualization

Fall 2013

Course Project

MeSH: Medical Subject Headings

MeSH is a standardized vocabulary created and maintained by the U.S. National Library of Medicine (NLM). It is used for indexing a number of databases of medical and biological information. One of these is PubMed, a bibliographic database for the biomedical field. The MeSH data are freely available in several formats and can be downloaded from the NLM (http://www.nlm.nih.gov/mesh/filelist.html) provided you agree to some simple terms of use.

MeSH terms fall into three categories:

Descriptors
Concepts ranging from general to specific. Each descriptor record includes a main term, synonyms, notes on usage, and a brief textual description. Examples include Ascorbic Acid, Dental Clinics, Lesotho, and Tick Bites. There are about 27,000 MeSH descriptors in the 2014 version of the data.
Qualifiers
Broad terms that apply to a number of descriptors. Each qualifier has a two-letter abbreviation as well as a text description and usage information. Most descriptors have a list of qualifiers. Examples include classification, deficiency, ethics, and virology. There are just over 80 qualifiers.
Supplementary Concept Records
Records that typically refer to specific chemicals. Each SCR has both formal and informal names for the substance. There are over 200,000 SCRs.

Every MeSH descriptor has one or more MeSH numbers. Each number gives the descriptor's position in one of 16 trees of related descriptors, and together these trees form a forest. Descriptors with multiple MeSH numbers may appear in several places on the same tree or on multiple trees. There are approximately twice as many MeSH numbers as descriptors.
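The relationship between descriptors, MeSH numbers, and trees can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each MeSH number is a tree letter followed by dot-separated groups (so a node's parent is found by dropping the last group), and the descriptor names and numbers in SAMPLE are invented placeholders, not real MeSH data. The function names parent_of and build_forest are hypothetical.

```python
from collections import defaultdict


def parent_of(number):
    """Return the parent MeSH number, or None for a top-level node."""
    head, sep, _tail = number.rpartition(".")
    return head if sep else None


def build_forest(descriptors):
    """Index MeSH numbers by tree and by parent.

    descriptors maps a descriptor name to its list of MeSH numbers.
    Returns (trees, children, names):
      trees    - tree letter -> sorted top-level numbers in that tree
      children - parent number -> sorted child numbers
      names    - MeSH number -> descriptor name
    """
    trees = defaultdict(set)
    children = defaultdict(set)
    names = {}
    for name, numbers in descriptors.items():
        for number in numbers:
            names[number] = name
            parent = parent_of(number)
            if parent is None:
                # Assumption: the first character identifies the tree.
                trees[number[0]].add(number)
            else:
                children[parent].add(number)
    return (
        {k: sorted(v) for k, v in trees.items()},
        {k: sorted(v) for k, v in children.items()},
        names,
    )


# Hypothetical descriptors and numbers, invented for illustration only.
SAMPLE = {
    "Term A": ["A01"],
    "Term B": ["A01.100"],
    "Term C": ["A01.100.200", "B02.300"],  # one descriptor on two trees
    "Term D": ["B02"],
}

trees, children, names = build_forest(SAMPLE)
```

Because a descriptor may carry several numbers, the index is keyed by MeSH number rather than by descriptor name; "Term C" above is reachable from both the A tree and the B tree.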

The PubMed search engine converts users' search terms to a specialized form that includes MeSH terms before running queries. For example, searching for progesterone receptor returns over 30,000 results and the results page shows the translated query in the box:

"receptors, progesterone"[MeSH Terms] OR ("receptors"[All Fields] AND "progesterone"[All Fields]) OR "progesterone receptors"[All Fields] OR ("progesterone"[All Fields] AND "receptor"[All Fields]) OR "progesterone receptor"[All Fields]

Searching on "receptors, progesterone"[MeSH Terms] alone returns just under 15,000 results, fewer than half as many. This second search is more selective because it drops the free-text matches on All Fields. Given the care that the NLM takes in indexing, we expect a search using MeSH terms to return fewer results with higher relevance. The better users understand MeSH, the more precise their queries can be.
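A MeSH-restricted query like the one above can also be assembled programmatically. The sketch below only builds the query string and a request URL; it does not contact PubMed. It assumes NCBI's public E-utilities ESearch endpoint, which is not part of this project's data, and the helper names mesh_term and esearch_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def mesh_term(heading):
    """Wrap a MeSH heading in PubMed's field-tag syntax."""
    return f'"{heading}"[MeSH Terms]'


def esearch_url(term, retmax=0):
    """Build an ESearch URL for a PubMed query.

    retmax=0 asks for no record IDs, which is enough when only
    the result count is of interest.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = mesh_term("receptors, progesterone")
url = esearch_url(query)
print(query)
print(url)
```

Fetching the URL and comparing the reported counts for the free-text and MeSH-only queries would be one way to reproduce the comparison described above, though counts will differ from those quoted here as the database grows.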

The NLM provides a text-based MeSH Browser online. Users can enter search terms and get back a full description of the MeSH term, including references to the parent, child, and sibling nodes in the basic MeSH trees as well as links to terms on other trees. Because the browser shows only a small window onto the full set of MeSH terms at any one time, however, it is difficult to get an overall understanding of the data set and thus of how to build effective queries.

Visualization