Link Grammar Parser


January, 2021: link-grammar 5.8.1 released! See below for a description of recent changes.

What is Link Grammar?

The Link Grammar Parser is a syntactic parser of English, Russian, Arabic and Persian (and other languages as well), based on Link Grammar, an original theory of syntax and morphology. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a "constituent" (HPSG style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.). The RelEx extension provides Stanford-parser compatible dependency grammar output.

The theory of Link Grammar parsing, and the original version of the parser, were created in 1991 by Davy Temperley, John Lafferty and Daniel Sleator, at the time professors of linguistics and computer science at Carnegie Mellon University. It is the product of decades of academic research into grammar and morphology, and is discussed in numerous publications.

Although based on the original Carnegie-Mellon code base, the current Link Grammar package has dramatically evolved and is profoundly different from earlier versions. There have been innumerable bug fixes; performance has improved by more than an order of magnitude. The package is fully multi-threaded, fully UTF-8 enabled, and has been scrubbed for security, enabling cloud deployment. Parse coverage of English has been dramatically improved; other languages have been added (most notably, Russian). There is a raft of new features, including support for morphology, log-likelihood semantic selection, and a sophisticated tokenizer that moves far beyond white-space-delimited sentence-splitting.

Quick Overview

The parser includes APIs for several programming languages, as well as a handy command-line tool for playing with it. Here's some typical output:

              linkparser> This is a test!
                 Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=6)
                  +----->WV----->+---Ost--+   |
                  +---Wd---+-Ss*b+  +Ds**c+   |
                  |        |     |  |     |   |
              LEFT-WALL this.p is.v a  test.n !
              (S (NP this.p) (VP is.v (NP a test.n)) !)
                          LEFT-WALL    0.000  Wd+ hWV+ Xp+
                             this.p    0.000  Wd- Ss*b+
                               is.v    0.000  Ss- dWV- O*t+
                                  a    0.000  Ds**c+
                             test.n    0.000  Ds**c- Os-
                                  !    0.000  Xp- RW+
                         RIGHT-WALL    0.000  RW-

This rather busy display illustrates many interesting things. For example, the Ss*b link connects the verb and the subject, and indicates that the subject is singular. Likewise, the Ost link connects the verb and the object, and also indicates that the object is singular. The WV (verb-wall) link points at the head-verb of the sentence, while the Wd link points at the head-noun. The Xp link connects to the trailing punctuation. The Ds**c link connects the noun to the determiner: it again confirms that the noun is singular, and also that the noun starts with a consonant. (The PH link, not required here, is used to force phonetic agreement, distinguishing 'a' from 'an'.) These link types are documented in the English Link Documentation.
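The way two connectors combine into a labelled link can be illustrated with a small Python sketch. It is a simplified model written for this page, not the parser's actual code: it assumes that the uppercase base names must be identical, that lowercase subscripts are compared position by position with '*' acting as a wildcard, and that a leading 'h' or 'd' is only a head/dependent marker. The function names are hypothetical.

```python
import re

# Simplified model of connector matching, for illustration only.
# Assumption: a connector is an optional lowercase head/dependent marker
# ('h' or 'd'), an UPPERCASE base name, an optional subscript of lowercase
# letters and '*' wildcards, and a direction ('+' = right, '-' = left).
CONNECTOR = re.compile(r"^([hd]?)([A-Z]+)([a-z*]*)([+-])$")


def split_connector(text):
    """Return (base, subscript, direction) for a connector such as 'O*t+'."""
    m = CONNECTOR.match(text)
    if m is None:
        raise ValueError(f"not a connector: {text!r}")
    _marker, base, subscript, direction = m.groups()
    return base, subscript, direction


def link_label(left, right):
    """Return the link label if `left` (a '+' connector) can join `right`
    (a '-' connector); return None if they cannot be joined."""
    lbase, lsub, ldir = split_connector(left)
    rbase, rsub, rdir = split_connector(right)
    if (ldir, rdir) != ("+", "-") or lbase != rbase:
        return None
    width = max(len(lsub), len(rsub))
    merged = []
    # Pad the shorter subscript with wildcards, then compare each position.
    for a, b in zip(lsub.ljust(width, "*"), rsub.ljust(width, "*")):
        if a == "*":
            merged.append(b)
        elif b == "*" or a == b:
            merged.append(a)
        else:
            return None  # conflicting subscripts: no link
    return lbase + "".join(merged).rstrip("*")


print(link_label("O*t+", "Os-"))    # connectors on is.v and test.n
print(link_label("Ss*b+", "Ss-"))   # connectors on this.p and is.v
```

Under these assumptions, the O*t+ connector on 'is' and the Os- connector on 'test' merge into the Ost label seen in the diagram, while a singular and a plural subscript (for example Ss+ and Sp-) fail to join.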

The bottom of the display is a listing of the "disjuncts" used for each word. The disjuncts are simply a list of the connectors that were employed to form the links. They are particularly interesting because they serve as an extremely fine-grained form of a "part of speech" or "grammatical category", although they also can be interpreted as "semantic selections". Thus, for example: the disjunct S- O+ indicates a transitive verb: it's a verb that takes both a subject and an object. The additional markup above indicates not only that 'is' is being used as a transitive verb, but also finer details: it took a singular subject, and it was used as (is usable as) the head verb of a sentence. The floating-point value is the "cost" of the disjunct; it very roughly captures the log-likelihood of this particular grammatical (and semantic!) usage. Much as parts-of-speech correlate with word-meanings, so also fine-grained parts-of-speech correlate with much finer distinctions and gradations of meaning.
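As a rough illustration of how the disjunct listing can be read mechanically, the following Python sketch splits one line of that listing into word, cost and connectors, and then checks for the S- O+ pattern described above. It works on the printed text only and does not use the Link Grammar library; the names WordDisjunct, parse_disjunct_line and is_transitive_verb are hypothetical, and the transitivity check is deliberately crude.

```python
from typing import NamedTuple


class WordDisjunct(NamedTuple):
    word: str          # subscripted word, e.g. 'is.v'
    cost: float        # disjunct cost as printed
    connectors: tuple  # connectors in printed order, e.g. ('Ss-', 'dWV-', 'O*t+')


def parse_disjunct_line(line):
    """Split one listing line such as 'is.v  0.000  Ss- dWV- O*t+'."""
    word, cost, *connectors = line.split()
    return WordDisjunct(word, float(cost), tuple(connectors))


def base_name(connector):
    """Keep only the uppercase letters: 'dWV-' -> 'WV', 'O*t+' -> 'O'."""
    return "".join(ch for ch in connector if ch.isupper())


def is_transitive_verb(disjunct):
    """Crude check for the S- O+ pattern: a subject link to the left
    and an object link to the right."""
    has_subject = any(
        base_name(c) == "S" and c.endswith("-") for c in disjunct.connectors
    )
    has_object = any(
        base_name(c) == "O" and c.endswith("+") for c in disjunct.connectors
    )
    return has_subject and has_object


verb = parse_disjunct_line("is.v    0.000  Ss- dWV- O*t+")
noun = parse_disjunct_line("test.n    0.000  Ds**c- Os-")
print(verb.word, is_transitive_verb(verb))
print(noun.word, is_transitive_verb(noun))
```

The noun 'test.n' also carries an O connector, but it points to the left (Os-), so it is the object of a verb rather than a verb taking an object.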

The link-grammar parser also supports morphological analysis. Here is an example in Russian:

              linkparser> это теста
                 Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=4)
                  +---Wd---+       +-LLCAG-+
                  |        |       |       |
              LEFT-WALL это.msi тест.= =а.ndnpi

The LL link connects the stem 'тест' to the suffix 'а'. The MVA link connects only to the suffix, because, in Russian, it is the suffixes that carry all of the syntactic structure, and not the stems. The Russian lexis is documented here.
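To show how the stem and suffix tokens relate to the original words, here is a minimal Python sketch that glues the morphemes from the display back together. It assumes, based only on the output above, that the text after the last '.' is a subscript and that a token whose text begins with '=' is a suffix belonging to the preceding stem; the function name surface_words is hypothetical and this is not how the library itself reassembles words.

```python
def surface_words(tokens):
    """Rejoin stem and suffix tokens from a morphological display.

    Assumptions (taken from the sample output only): the part after the
    last '.' is a subscript, and a token text starting with '=' is a
    suffix that attaches to the word before it.
    """
    words = []
    for token in tokens:
        text, dot, _subscript = token.rpartition(".")
        if not dot or not text:
            # No subscript present (e.g. 'LEFT-WALL' or a lone '.').
            text = token
        if text.startswith("=") and len(text) > 1 and words:
            words[-1] += text[1:]  # attach the suffix to the preceding stem
        else:
            words.append(text)
    return words


print(surface_words(["LEFT-WALL", "это.msi", "тест.=", "=а.ndnpi"]))
```

For the sample linkage this recovers the two input words 'это' and 'теста', with the wall token passed through unchanged.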


An extended overview and summary of Link Grammar can be found on the Link Grammar Wikipedia page, which touches on most of the important, primary aspects of the theory. However, it is no substitute for the original papers published on the topic:

A fairly comprehensive bibliography of papers written before 2004 is here and is mirrored here. A sampling of publications that reference Link Grammar in some way can be found here; some of these may be downloaded here.


There is an extensive set of pages documenting the English dictionary; specifically, the names of links and their meanings, as well as how to write new rules. There is also a short primer for creating dictionaries for new languages.

The documentation for the C/C++ programming API is here. Bindings for other programming languages can be found in the bindings directory in the GitHub Link Grammar Repo.

System Summary

  • Actively maintained! New releases typically happen quarterly.
  • Besides English, there is a comprehensive Russian dictionary, thanks to Sergey Protasov. The Persian and Arabic subsystems were provided by Jon Dehdari. A modest (thousand-word) German dictionary is included. There are proof-of-concept dictionaries for Lithuanian, Indonesian, Kazakh, Vietnamese, Hebrew and Turkish.
  • Several machine-learning projects are attempting to automatically learn LG grammars using unsupervised training methods on bulk text.
  • LG is a full morpho-syntactic parser; morphological disambiguation is handled with a sophisticated tokenization system which tracks alternative candidate word-splits (of words into morphemes) during parsing.
  • Multiple programming language bindings are available, including Ruby, Python, Perl, Lisp, Java, JavaScript, OCaml and AutoIt. Look here.
  • A network (TCP/IP) parse server provides JSON-formatted parse results.
  • Integrated with the OpenCog Atomspace. This allows graph queries and graph tools to be applied to LG output.
  • Fully multi-threaded; a standard build system; pkg-config integration; a CMake config file, dynamic/shared library support; pre-defined Docker containers; support for Linux as well as Windows, MacOSX, FreeBSD.
  • Several security audits have been performed, including fuzzing for mal-formed input. Secure and robust for cloud deployment.
  • Source code hosted at GitHub.
  • LGPL v2.1 license; see endnote for details.

Downloading Link Grammar

The source code to the system can be downloaded as a tarball. The current stable version is Link Grammar 5.8.1 (January, 2021). Older versions are available here.

GitHub hosts the primary link-grammar repository. Issues (bugs) should be reported there. Developers who are not a part of the core development team should not use or deploy the source from GitHub. It is unstable and frequently buggy and broken! All users should use only the tarballs!

Mailing Lists

The mailing list for Link Grammar discussion is at the link-grammar google group.

Ongoing development by OpenCog

Ongoing development of Link Grammar is guided and supported by the Open Cognition project, where the parser plays an important role in the OpenCog natural language processing subsystem. Research and implementation is ongoing; current work includes investigations into unsupervised learning of language.

Stanford Parser Compatibility

A sibling project, RelEx, uses constraint-grammar-like techniques to extract dependency relations that are compatible with the Stanford parser. Its performance is comparable to the Stanford PCFG parsing model, and it is more than three times faster than the Stanford "lexicalized" (factored) model.

The RelEx project is no longer in active development. We learned (the hard way) that the native Link Grammar parses contain much more information than the Stanford dependency markup is capable of supporting. The Stanford-style dependencies are simply not rich or sophisticated enough to produce the kind of data needed for semantic analysis and comprehension, viz. tasks such as predicate-argument extraction, framing, semantic selection, and the like.

Language generation

For sentence generation, i.e. the creation of grammatically correct sentences from a bag of semantic relations, the microplanner and surface realization (sureal) portion of OpenCog is strongly recommended. A short example is here. These "sort-of work", but not very well. The primary issue is that they do not make use of the statistical information available in language to choose likely or reasonable sentence constructions.

We previously recommended two projects that should now be considered obsolete: NLGen and NLGen2. For your entertainment, they're still listed below: The NLGen and NLGen2 projects provide natural language generation modules, based on, and compatible with link-grammar and RelEx. They implement the SegSim ideas for NL generation. See the following YouTube videos of a virtual dog, showing some of NLGen's capabilities (circa 2009): Demo of Virtual Dog Learning to Play Fetch via Imitation and Reinforcement, AI Virtual Dog's Emotions Fluctuate Based on Its Experiences, Demo of Embodied Anaphora Resolution and AI Virtual Dog Answers Simple Questions about Itself and Its Environment.

Linguistic Disclaimer

Link Grammar is a natural language parser, not a human-level artificial general intelligence. This means that there are many sentences that it cannot parse correctly, or at all. There are entire classes of speech and writing that it cannot handle, including Twitter posts, IRC chat logs, Valley-girl basilect, Old and Middle English, stock-market listings and raw HTML dumps.

Link Grammar works best with "newspaper English", as taught to and written by those educated in American colleges: standard-sized sentences, with proper grammar, proper punctuation, and correct capitalization. Link Grammar has difficulties with the following types of textual input:

  • Phrases (that are not a part of a complete sentence). There is some support for incomplete sentences with ellipsis. Many kinds of short phrases that can be interpreted as commands or instructions or exclamations are supported.
  • Twitter posts. These tend to be sentence fragments, often lacking proper grammatical structure. You should strip off hash-tags before sending text into the parser.
  • Any text containing a large number of spelling errors. The parser does have a built-in "spelling-guesser", which explores alternative spellings for words.
  • "Registers", such as newspaper headlines, where determiners are omitted; for example, "Thieves rob bank." Note, however, a "dialect" support system is in development, which can be used to alter ranking favorability for different forms of expression within a single dictionary.
  • Dialog, stage plays and movie scripts. Such dialog tends to consist of interleaved sentences. External software would be needed to disentangle distinct sentence streams.
  • Speech-to-text output. Such systems generate large numbers of mis-heard words that, taken at face value, cannot be a part of valid sentences. Even if such recognition were perfect, spoken English tends not to be as well-constructed or grammatical as written English.
  • Support for British English and Commonwealth English is poor. This includes any English dialects spoken in India, Pakistan, Nigeria, Bangladesh, South Africa, as well as former American protectorates, such as the Philippines. British and regional spelling of words is missing from the dictionaries. The "dialect" support subsystem should be able to alleviate this, provided that the lexis is appropriately curated.
  • Slang and various regional non-middle-class-American dialects. This includes most dialects spoken by anyone living in economically poor or under-educated geographical regions, whether in urban housing projects or the red-state small-town and rural poor. Self-identifying subgroup dialects are also not handled, such as drug-culture, gang-culture and hacker-culture. The "dialect" support subsystem should be able to alleviate this, provided that the lexis is appropriately curated.
  • Long run-on sentences. These can generate thousands of alternative parses in a combinatorial explosion.
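The advice above about hash-tags can be applied with a small pre-processing step before text is handed to the parser. The Python sketch below is one possible way to do it, not part of Link Grammar itself; the function name strip_hashtags and its keep_word option are hypothetical. Whether to drop the whole tag or only the '#' mark is a judgment call that depends on whether the tag is part of the sentence.

```python
import re

# A '#' followed by word characters, not glued to a preceding word
# character (so that something like 'C#' is left alone).
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def strip_hashtags(text, keep_word=False):
    """Remove hash-tags from `text` before parsing.

    keep_word=False drops the whole tag ('#travel' -> '');
    keep_word=True drops only the '#' mark ('#travel' -> 'travel').
    """
    replacement = r"\1" if keep_word else ""
    cleaned = HASHTAG.sub(replacement, text)
    # Collapse any whitespace left behind by removed tags.
    return " ".join(cleaned.split())


print(strip_hashtags("Just landed in Paris! #travel #nofilter"))
print(strip_hashtags("Loving the #weather today", keep_word=True))
```

Trailing tags are usually safe to drop entirely, while a tag used as a word inside the sentence is better kept with only the '#' removed, since deleting it would leave an ungrammatical fragment.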

It is hoped that the unsupervised learning of language proposal will be of sufficient power and ability to handle most of these exceptional cases. Work is currently ongoing.

Natural Language Support

Ranked in order of maturity.

The main English documentation is here.
A set of Russian dictionaries by Sergey Protasov, providing full coverage for the language, has been incorporated into the main distribution as of version 4.7.10 (March 2013). An older version, from which these are derived, can be found at http://slashzone.ru/parser/. It includes link documentation (mirror) and subscript (morphology) documentation (mirror). Russian morpheme dictionaries can be had at http://aot.ru.

Documentation for the links and for the word classes is available as a list of examples.

The Persian dictionaries from Jon Dehdari have been incorporated into the main distribution, as of version 5.0.0 (April 2014). This includes a copy of the Persian stemming engine, as significant morphology analysis needs to be performed to parse Persian.
The Arabic dictionaries from Jon Dehdari have been incorporated into the main distribution, as of version 5.0.0 (April 2014). These are derived from the older, original version. [Mirror] These require the Aramorph stemming package, which is included.
A small German dictionary, consisting of 850 words, is included. A brief description is provided here.
A small Lithuanian prototype dictionary has been created. It contains a few hundred words. A few basic sentences parse just fine; the current version focuses on morphological analysis coupled with grammatical analysis. Documentation is here.

A very rudimentary Lithuanian dictionary has been created; almost nothing works so far. Documentation is here.

A small Vietnamese prototype dictionary has been created. It contains several hundred words.
A small Indonesian prototype dictionary has been created. It contains about one hundred words.
A very small Hebrew prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
A very small Kazakh prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
A very small Turkish prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
French, Luthor project
The Luthor project aims to develop a set of scripts to automatically construct Link Grammar linkage dictionaries by mining Wiktionary data. Current efforts are focusing on French. (This project appears to be defunct).

Adjunct Projects

The default distribution for Link Grammar includes bindings for Java, Python, OCaml, Common Lisp, and AutoIt, as well as a SWIG FFI interface file. Additional language bindings, and some related projects, are listed below:

RelEx Semantic Relation Extractor
RelEx is an English-language semantic relationship extractor, built on the Link Parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It will also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm.
Ruby bindings
Ruby bindings are coordinated at the Ruby-LinkParser website. The code can be found at the ged/link-parser github page.
Perl bindings
Perl bindings, created by Danny Brian, can be found on the Lingua-LinkParser page on CPAN. Caution: those bindings appear to be unmaintained; currently, they include features that were removed more than five years ago. (We encourage a new maintainer to step up!) There is also a tutorial written against a very old version of the bindings; some details may be different.
Psi Toolkit (Perl)
The Psi Toolkit, an NLP toolkit aimed at linguists and NLP engineers, includes bindings for link-grammar, via perl.

Recent Changes

Version 5.8.1 (8 January 2021)

Assorted fixes.

  • Fix macOS/SunOS build break.
  • English dict: fix numerical identifiers used as adjectives.
  • English dict: fix post-posed Latin adjectival modifiers.
  • Merge upstream gentoo patches. #1102
  • Make -O3 default for CFLAGS/CXXFLAGS, but overridable by the user.
  • English dict: fix look_at, listen_to person-action
  • English dict: fix verb "felt" with object-action.
  • English dict: fix why-perform-action questions.
  • Fix race condition in spell-guesser. #1122 #1123

Version 5.8.0 (28 February 2020)

Notable changes include: inclusion of JavaScript node.js bindings; the removal of Python 2 support; improved English dictionaries; and, most interestingly, an experimental interface for dialects. With this interface, one can provide alternative weightings that emphasize the type of speech that might be common in limited geographical areas, but would not be considered commonplace elsewhere. For example, one can provide weightings for Irish-American, urban English, and newspaper-headline English, which might otherwise interfere with ordinary parsing of mainstream English.

  • Java bindings: Remove the obsolete senses API.
  • swig-4.0 compatibility bug fix.
  • English dict: Fixes to support questions ending in WH-words.
  • Copy (merge) Richard van der Dys `node.js` bindings.
  • English dict: Provide head and tail markers for all conjunctions.
  • Remove the Python 2 bindings.
  • Add dialect support to the library.
  • English dict: support for archaic/poetic abbreviations
  • English dict: introduce OH link for vocatives/invocations.
  • English dict: improved parsing of imperatives.
  • Add !!word/ link-parser command for displaying extended word dict info.

Version 5.7.0 (13 September 2019)

This version has one quite remarkable change: the parsing of long sentences has been improved by a factor of 3x or 4x, and thus, the parse speed of many "typical" texts is doubled, or more. Two other important fixes are for broken 32-bit support, and for Windows.

  • Minor efficiency improvements to the SQL-backed dictionary.
  • Incompatible change to the Exp traversal API.
  • Remove the obsolete and unsupported "corpus statistics" code.
  • Major performance improvement (3x-4x) for long sentences. #996
  • Fix broken build on Windows.
  • Convert Windows build to Visual Studio 2019.
  • Fix a bug that causes random results on 32-bit systems. #1000
  • Fix a bug that could cause missing linkages on some systems. #1007

Version 5.6.2 (24 June 2019)

This adds a missing shared-library symbol that broke the opencog build!

  • Bug-fix the SQL-backed dictionary.
  • Add missing public symbol to shlib export list.
  • English dict: additions of paraphrasing verbs.

Version 5.6.1 (27 May 2019)

Important! This is an important update, as it more than doubles the performance across a broad range of different input texts. Kudos to Amir for this amazing work, as he took something that seemed quite fast to begin with, and squeezed out an honest factor of two from it! This is unusual in mature software.

  • Performance improvement (approx 20%) in expressions #882.
  • Performance improvement (approx 10%) by disjunct/connectors pools #896.
  • Performance improvement (4-10% for English) by faster power-pruning #903.
  • Fix a bug in trailing connectors encoding (may cause bad linkages).
  • Fix inability to form linkage when first word is disconnected.
  • English dict: fix use of quotations with paraphrasing verbs.
  • English dict: fix broken usage of "have not".
  • Performance improvement (approx 16%) for long sentences. #931
  • Performance improvement (approx 20%) for long sentences. #939

Version 5.6.0 (4 January 2019)

  • Improve Windows support.
  • Fix dict cost reading under user locales with comma decimal separator.
  • Support using the pcre2 regex package (configured by default if available).
  • Add "-with-regexlib=pcre2|tre|regex|c" to "configure".
  • Fix and document building on FreeBSD.
  • Major documentation update for building with Cygwin.
  • Revise the manpage.
  • Remove the experimental Viterbi code.
  • Revise the SAT parser cost model to align it with the classic parser.
  • Implement a strict check on connector name.
  • Major revision of the SWIG interface file; wrap all the library functions.
  • Fix linkage_get_disjunct_*() when parse-option display_morphology is true.
  • Change library and python-bindings default for display_morphology to true.
  • Drastic speedup for long sentences (hash encoding of trailing connectors).
  • English dict: Support locative replies/declarations.
  • English dict: broaden support for misc paraphrasing verbs.
  • English dict: fix relativized paraphrasing.
  • English dict: fix comparative-style conjunctions.
A list of older changes can be found here.


Issues concerning this website should be addressed to Linas Vepstas - <linasvepstas@gmail.com> or Dom Lachowicz - <domlachowicz@gmail.com>.


Current versions of the Link Grammar parser software, language dictionaries and documentation are available under the LGPL v2.1 license. Versions prior to 5.0.0 are available under a variant of the BSD license.

Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.
Copyright (c) 2003 Peter Szolovits
Copyright (c) 2004,2012,2013 Sergey Protasov
Copyright (c) 2006 Sampo Pyysalo
Copyright (c) 2007 Mike Ross
Copyright (c) 2008,2009,2010 Borislav Iordanov
Copyright (c) 2008-2019 Linas Vepstas
Copyright (c) 2014-2019 Amir Plivatsky