Text Analysis with GATE – Part 7

JAPE: Regular Expressions over Annotations

JAPE is the Java Annotation Patterns Engine. It provides finite-state transduction over annotations based on regular expressions.
JAPE is a version of CPSL, the Common Pattern Specification Language.

JAPE lets you recognise regular expressions over the annotations on a document. This is less straightforward than it sounds: a regular language can only describe sets of strings, not graphs, and GATE’s model of annotations is based on graphs. Regular expressions are normally applied to character strings, which are simple linear sequences of items, but here they are applied to a much more complex data structure. As a result, the matching process is non-deterministic in certain cases (i.e. the results depend on arbitrary factors such as the addresses at which data is stored in the virtual machine).
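To make the string-versus-graph point concrete, here is a minimal Python sketch. It is not GATE code: the tuple layout, the node numbering and the function name `annotation_paths` are assumptions made purely for illustration. Annotations are treated as arcs between positions in the text, and the function lists every chain of annotations that leads from one position to another. A plain string has exactly one such chain; an annotation graph can have several.

```python
def annotation_paths(annotations, start, end):
    """Return every chain of back-to-back annotations leading from start to end."""
    if start == end:
        return [[]]  # one way to cover nothing: the empty chain
    paths = []
    for name, a_start, a_end in annotations:
        # Follow only arcs that leave the current node and move forward.
        if a_start == start and a_start < a_end <= end:
            for rest in annotation_paths(annotations, a_end, end):
                paths.append([name] + rest)
    return paths


# Toy annotation graph over "New York": nodes 0, 1, 2 are positions in the text.
annotations = [
    ("Token(New)", 0, 1),
    ("Token(York)", 1, 2),
    ("Lookup(location)", 0, 2),
]

paths = annotation_paths(annotations, 0, 2)
for path in paths:
    print(path)
```

The same stretch of text can be read either as two Token annotations or as one Lookup annotation, so a pattern over annotations may have more than one way to proceed from the same position.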

A JAPE grammar consists of a set of phases, each of which consists of a set of pattern/action rules. The phases run sequentially and constitute a cascade of finite state transducers over annotations. The left-hand side (LHS) of a rule consists of an annotation pattern description. The right-hand side (RHS) consists of annotation manipulation statements. Annotations matched on the LHS of a rule may be referred to on the RHS by means of labels that are attached to pattern elements.
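The idea of phases running one after another can be pictured with a small Python sketch. This is only an illustration under our own assumptions, not how GATE is implemented: the two phase functions, the `TitledPerson` and `Person` annotation types and the tuple layout `(type, start, end)` are all hypothetical. The point is simply that each phase sees the annotations produced by the phases before it.

```python
def run_cascade(phases, annotations):
    """Run each phase in order; later phases see what earlier phases added."""
    for phase in phases:
        annotations = annotations + phase(annotations)
    return annotations


def jobtitle_phase(annotations):
    # Hypothetical phase 1: every jobtitle Lookup yields a JobTitle annotation.
    return [
        ("JobTitle", start, end)
        for kind, start, end in annotations
        if kind == "Lookup:jobtitle"
    ]


def titled_person_phase(annotations):
    # Hypothetical phase 2: a JobTitle followed (at most one character later)
    # by a Person yields a TitledPerson spanning both.
    titles = [a for a in annotations if a[0] == "JobTitle"]
    people = [a for a in annotations if a[0] == "Person"]
    return [
        ("TitledPerson", t_start, p_end)
        for _, t_start, t_end in titles
        for _, p_start, p_end in people
        if 0 <= p_start - t_end <= 1
    ]


initial = [("Lookup:jobtitle", 0, 9), ("Person", 10, 20)]
in_order = run_cascade([jobtitle_phase, titled_person_phase], initial)
reversed_order = run_cascade([titled_person_phase, jobtitle_phase], initial)
```

With the phases in the intended order, the second phase finds the JobTitle created by the first. With the order reversed, no TitledPerson is produced, because the JobTitle does not exist yet when the second phase runs.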
An example:

Phase: Jobtitle
Input: Lookup
Options: control = appelt debug = true

Rule: Jobtitle1
(
  {Lookup.majorType == jobtitle}
  (
    {Lookup.majorType == jobtitle}
  )?
)
:jobtitle
-->
  :jobtitle.JobTitle = {rule = "JobTitle1"}

The LHS is the part preceding the ‘-->’ and the RHS is the part following it. The LHS specifies a pattern to be matched against the annotated GATE document, whereas the RHS specifies what is to be done with the matched text. In this example, we have a rule named ‘Jobtitle1’, which matches text annotated with a ‘Lookup’ annotation whose ‘majorType’ feature is ‘jobtitle’, optionally followed by further text annotated as a ‘Lookup’ with a ‘majorType’ of ‘jobtitle’. Once the rule has matched a sequence of text, the entire sequence is given a label by the rule; in this case, the label is ‘jobtitle’. On the RHS, we refer to this span of text using the label given on the LHS, ‘jobtitle’. We say that this text is to be given an annotation of type ‘JobTitle’ with a ‘rule’ feature set to ‘JobTitle1’.
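To see what the rule does to a document, the following Python sketch imitates it on a list of annotations. It is a rough approximation written for this article, not GATE's matcher: the `Annotation` class and both function names are our own inventions, and the sketch assumes that Lookup annotations do not overlap and that the next Lookup in offset order counts as "following" the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def is_jobtitle(ann):
    """Mirror of the pattern element {Lookup.majorType == jobtitle}."""
    return ann.type == "Lookup" and ann.features.get("majorType") == "jobtitle"


def apply_jobtitle1(annotations):
    """Toy version of rule Jobtitle1: one jobtitle Lookup, optionally a second."""
    # "Input: Lookup" means the rule only looks at Lookup annotations.
    lookups = sorted(
        (a for a in annotations if a.type == "Lookup"),
        key=lambda a: (a.start, a.end),
    )
    created = []
    i = 0
    while i < len(lookups):
        if not is_jobtitle(lookups[i]):
            i += 1
            continue
        start, end = lookups[i].start, lookups[i].end
        i += 1
        # The optional group ( ... )? : take a second jobtitle Lookup if present.
        if i < len(lookups) and is_jobtitle(lookups[i]) and lookups[i].start >= end:
            end = lookups[i].end
            i += 1
        # RHS: the labelled span gets a JobTitle annotation with a rule feature.
        created.append(Annotation("JobTitle", start, end, {"rule": "JobTitle1"}))
    return created


# "Managing Director John ..." with a later, separate job title.
doc_annotations = [
    Annotation("Lookup", 0, 8, {"majorType": "jobtitle"}),
    Annotation("Lookup", 9, 17, {"majorType": "jobtitle"}),
    Annotation("Token", 18, 22),
    Annotation("Lookup", 18, 22, {"majorType": "person_first"}),
    Annotation("Lookup", 30, 37, {"majorType": "jobtitle"}),
]
job_titles = apply_jobtitle1(doc_annotations)
```

Here the first two Lookup annotations are merged into a single JobTitle covering offsets 0 to 17, while the later, isolated jobtitle Lookup produces a JobTitle of its own.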

All of this content is drawn from the General Architecture for Text Engineering (GATE) documentation, specifically the User Guide.

We have only tried to extract information from that document in order to understand the software.

