Abstract:
Research in computational linguistics requires the continual
development and redevelopment of data sources and tools. A number
of existing tools are theory-specific and force users to model
within the constraints of a given theory. The Java Computational
Linguistics Environment (JaCLE) is designed to provide linguists
with tools to test and model theoretic assumptions, without forcing
commitments to a particular theory. JaCLE combines annotation
graphs (Bird and Liberman, 1999) with the features and feature
bundles developed for the Text Encoding Initiative (Langendoen and
Simons, 1995). This combination provides flexibility in modeling
linguistic data and formalisms, and allows grammar productions to
be modeled with varying degrees of unification.

The competition between the roles of the grammar and the lexicon
is a major issue in the implementation of linguistic models, and
it accounts for some of the most significant differences between
syntactic theories. By varying how much power is given to the
grammar or to the lexicon, one can assess the efficacy of a given
theory or implementation. Likewise, the degree of overlap between
the grammar and the lexicon indicates how much ambiguity an
implementation permits. JaCLE allows the user to test varying
degrees of unification on specific lexical and grammatical
fragments. This capability was the major impetus for JaCLE's
design: it allowed the authors to test specific theories or
formalisms on small fragments of lexicon and grammar without
building a separate system for each theory.
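
As a rough illustration of what "varying degrees of unification" can mean, the following minimal sketch treats a feature bundle as a flat map from feature names to atomic values and unifies two bundles while enforcing agreement only on a caller-chosen set of features. Features outside that set are carried over without being checked. The class and method names (FeatureBundle, with, unify) are hypothetical and are not JaCLE's actual API; real TEI feature structures can also nest, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical flat feature bundle: feature name -> atomic value. */
class FeatureBundle {
    private final Map<String, String> features = new HashMap<>();

    FeatureBundle with(String name, String value) {
        features.put(name, value);
        return this;
    }

    /**
     * Unifies this bundle with another, enforcing agreement only on the
     * features in {@code checked}. Returns empty if a checked feature clashes.
     * For an unchecked feature that both bundles define, this bundle's value wins.
     */
    Optional<FeatureBundle> unify(FeatureBundle other, Set<String> checked) {
        FeatureBundle result = new FeatureBundle();
        result.features.putAll(other.features);
        for (Map.Entry<String, String> e : features.entrySet()) {
            String name = e.getKey();
            String mine = e.getValue();
            String theirs = other.features.get(name);
            if (theirs != null && !theirs.equals(mine) && checked.contains(name)) {
                return Optional.empty(); // clash on a checked feature: unification fails
            }
            result.features.put(name, mine);
        }
        return Optional.of(result);
    }

    @Override
    public String toString() {
        return new TreeMap<>(features).toString(); // sorted for stable printing
    }

    public static void main(String[] args) {
        FeatureBundle det = new FeatureBundle().with("num", "sg").with("cat", "det");
        FeatureBundle noun = new FeatureBundle().with("num", "pl").with("pers", "3");

        // Strict: number agreement is enforced, so "a dogs" fails to unify.
        System.out.println(det.unify(noun, Set.of("num")).isPresent());
        // Relaxed: number is not checked, so the bundles unify.
        System.out.println(det.unify(noun, Set.of()).isPresent());
    }
}
```

Passing a larger or smaller `checked` set is one simple way to tighten or loosen a grammar fragment without rewriting its lexical entries.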

Written in Java and developed within the open-source model typical
of the Linux environment, JaCLE is designed as a foundation for
testing theories of syntax as applied to parsing; its flexibility
comes from the ability to alter its grammar and lexicon. Because
of its modular design and its framework built from a core set of
primitive data types (segments and features), JaCLE can support
research of varying degrees of sophistication in testing theories
of syntax and parsing.
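
A minimal sketch of how the two primitive types might fit together, loosely following Bird and Liberman's annotation graphs: nodes are ordered anchor points (here plain integer indices, a simplification of the optional time offsets in the original formalism), and each segment is a typed arc between two nodes that carries a bundle of features. Overlapping annotation layers, such as words and a phrase spanning them, are simply arcs over the same nodes. All class and method names (AnnotationGraph, Segment, addSegment, within) are hypothetical illustrations, not JaCLE's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical annotation graph: integer-anchored nodes joined by typed segments. */
class AnnotationGraph {
    /** A segment is an arc from start to end node with a type and a feature bundle. */
    record Segment(int start, int end, String type, Map<String, String> features) {}

    private final List<Segment> segments = new ArrayList<>();

    Segment addSegment(int start, int end, String type, Map<String, String> features) {
        // Requiring forward arcs keeps the graph acyclic.
        if (end <= start) {
            throw new IllegalArgumentException("arcs must point forward in the graph");
        }
        Segment s = new Segment(start, end, type, Map.copyOf(features));
        segments.add(s);
        return s;
    }

    /** Returns the segments of the given type lying entirely within [from, to]. */
    List<Segment> within(int from, int to, String type) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : segments) {
            if (s.type().equals(type) && s.start() >= from && s.end() <= to) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        AnnotationGraph g = new AnnotationGraph();
        // Word layer over nodes 0 -> 1 -> 2.
        g.addSegment(0, 1, "word", Map.of("form", "the", "cat", "det"));
        g.addSegment(1, 2, "word", Map.of("form", "dog", "cat", "n", "num", "sg"));
        // Phrase layer: one arc spanning the same nodes as both words.
        g.addSegment(0, 2, "phrase", Map.of("cat", "NP"));

        // Prints the word forms in insertion order.
        for (AnnotationGraph.Segment w : g.within(0, 2, "word")) {
            System.out.println(w.features().get("form"));
        }
    }
}
```

Keeping the graph to these two primitives means that a new annotation layer or a new theory's categories need only new segment types and feature names, not new data structures.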

Bird, Steven and Mark Liberman. (1999) A Formal Framework for
Linguistic Annotation. Technical Report MS-CIS-99-01, Department of
Computer and Information Science, University of Pennsylvania.
Langendoen, D. Terence and Gary F. Simons. (1995) A Rationale for
the TEI Recommendations for Feature-Structure Markup. Computers and
the Humanities, 29, 191-209.