Corpus Create Annotation Schema By Linnaeus Tagger


Select Option

To run a new NER (Named Entity Recognition) process based on Linnaeus Tagger, start by loading a Corpus to the Clipboard.
Select the Corpus, right-click it and choose Corpus -> NER -> Linnaeus Tagger.


Corpus Process NER Linnaeus Tagger.png


New Configuration or Load Configuration

A wizard is presented to configure the process. The first step offers two options: create a new process (New Configuration) or load the configuration of an NER process that was already performed (Load Configuration). To start a new process, select New Configuration and press the Next button.


NER Linnaeus Tagger Wizard1.png


Resources Selection

In the next panel, select the lexical resources. Dictionaries, lookup tables, rule sets and ontologies can be added here for use in the NER process. Each tab lists the existing resources of one type. When all lexical resources are selected, press Next.


NER Linnaeus Tagger Wizard1a.png

NER Linnaeus Tagger Wizard1b.png

NER Linnaeus Tagger Wizard1d.png


Rule: Partial Match with Dictionaries

Using Rules, it is possible to associate Rule annotations with Dictionary Terms (only for the Dictionaries selected in this step). To do so, select the option Partial Match with Dictionaries on the Rules tab, as shown in the following figure.


ER Linnaeus Tagger Wizard1c.png.png


Example

Dictionary Terms:
relA Gene [List Synonyms][List External Ids]
relB Gene [List Synonyms][List External Ids]
ppGpp Compound [List Synonyms][List External Ids]

Rule: 

AA(.*)?\b

If the rule is applied to the following text segment: 

A AArelA gene for some organism interacts with AArelB.

Results
* Without using this option
    relA (Annotated by Partial Match Rule)
    relB (Annotated by Partial Match Rule)

* Using Partial Match with Dictionaries
    relA (Annotated by Partial Match Rule) + associated with dictionary term relA (List Synonyms and External Ids)
    relB (Annotated by Partial Match Rule) + associated with dictionary term relB (List Synonyms and External Ids)
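
The effect of this option can be illustrated with a minimal Python sketch. It is not the tool's implementation: the `DICTIONARY` structure and the `annotate` function are hypothetical, the synonym and external ID lists are left empty because the example does not give them, and the rule is replaced by the single-token stand-in `AA(\w+)\b`, since how the tool tokenizes text before applying a rule is not described here.

```python
import re

# Hypothetical dictionary: term -> class, synonyms, external ids.
# Synonyms and external ids are empty because the example does not list them.
DICTIONARY = {
    "relA": {"class": "Gene", "synonyms": [], "external_ids": []},
    "relB": {"class": "Gene", "synonyms": [], "external_ids": []},
    "ppGpp": {"class": "Compound", "synonyms": [], "external_ids": []},
}

# Stand-in for the rule in the example, limited to one token (assumption).
RULE = re.compile(r"AA(\w+)\b")


def annotate(text, partial_match_with_dictionaries):
    """Annotate the captured group of each rule match.

    When the option is enabled, a captured text that equals a dictionary
    term is also linked to that term.
    """
    annotations = []
    for match in RULE.finditer(text):
        captured = match.group(1)
        annotation = {
            "text": captured,
            "source": "Partial Match Rule",
            "dictionary_term": None,
        }
        if partial_match_with_dictionaries and captured in DICTIONARY:
            annotation["dictionary_term"] = captured
        annotations.append(annotation)
    return annotations


if __name__ == "__main__":
    segment = "A AArelA gene for some organism interacts with AArelB."
    for enabled in (False, True):
        print(enabled, annotate(segment, enabled))
```

With the option disabled, both annotations carry no dictionary link; with it enabled, each one points to the dictionary term of the same name.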

Select Class and Case Sensitivity

In the next panel, you can filter the classes of each lexical resource previously selected, i.e. choose which classes will be associated with each resource. At this step, you can also define whether the process will be case sensitive.


NER Linnaeus Tagger Wizzard3.png
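
To make the two settings in this panel concrete, here is a small Python sketch of how a class filter and a case-sensitivity flag could change which terms are found. The `TERMS` list, the `find_terms` function and the whitespace tokenization are all assumptions made for illustration; the tool's actual matching is not described on this page.

```python
# Hypothetical resource content: (term, class) pairs.
TERMS = [("relA", "Gene"), ("relB", "Gene"), ("ppGpp", "Compound")]


def find_terms(text, selected_classes, case_sensitive):
    """Return the (term, class) pairs found among whitespace-separated tokens."""
    tokens = text.split()
    if not case_sensitive:
        tokens = [token.lower() for token in tokens]
    found = []
    for term, term_class in TERMS:
        # Class filter: skip terms whose class was not selected.
        if term_class not in selected_classes:
            continue
        key = term if case_sensitive else term.lower()
        if key in tokens:
            found.append((term, term_class))
    return found


if __name__ == "__main__":
    sentence = "RELA regulates ppGpp levels"
    print(find_terms(sentence, {"Gene", "Compound"}, case_sensitive=False))
    print(find_terms(sentence, {"Gene", "Compound"}, case_sensitive=True))
    print(find_terms(sentence, {"Gene"}, case_sensitive=False))
```

In this sketch, the upper-case token RELA is only matched to relA when case sensitivity is off, and ppGpp is dropped when only the Gene class is selected.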


Pre-processing

The Pre-processing option allows you to define restrictions on the annotations generated by the NER process, as follows:

If you select No, processing continues without any pre-processing.


NER Linnaeus Tagger Wizzard4a.png


Stop Words

You can select a list of stop words (Lexical Words Set - Lexical Resources) to use in the Linnaeus Tagger algorithm. Stop words prevent the algorithm from annotating common English words as entities and thus reduce false positive annotations. To activate this pre-processing option, select the Stop Words option and, in the panel below, select one previously created Lexical Words resource. To continue with the NER configuration, select Next.


NER Linnaeus Tagger Wizzard4b.png
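
The idea behind the Stop Words option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `remove_stop_word_annotations` is hypothetical, and the case-insensitive comparison on the annotated text is a guess at how such a filter might behave, not a description of the Linnaeus Tagger internals.

```python
def remove_stop_word_annotations(annotations, stop_words):
    """Drop candidate annotations whose text is a stop word.

    The comparison ignores case (assumption for this sketch).
    """
    stop = {word.lower() for word in stop_words}
    return [text for text in annotations if text.lower() not in stop]


if __name__ == "__main__":
    candidates = ["was", "relA", "For", "ppGpp"]
    print(remove_stop_word_annotations(candidates, ["was", "for", "the"]))
```

Here the candidates "was" and "For" are discarded as common English words, while relA and ppGpp are kept.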


Advanced Options

The Linnaeus Tagger Advanced options allow you to configure how annotations are generated by the Linnaeus Tagger algorithm, as follows:

In the Abbreviation option, you can enable the detection of abbreviated bio-entities in the Linnaeus Tagger algorithm.

In the Number of running Threads option, you can select the number of processing threads that the Linnaeus Tagger algorithm will be able to use.

In the Disambiguation option, you can select the type of disambiguation to be performed by the Linnaeus Tagger algorithm.

  • The possible values are:
    • OFF: No disambiguation is performed
    • ON_EARLIER: Disambiguation is performed by looking at earlier contents in the document
    • ON_WHOLE: Disambiguation is performed by looking at the whole document


NER Linnaeus Tagger Wizzard5.png


Normalization

In the last panel, you can select a Normalization option that adds white space between word delimiters, which can increase entity recognition accuracy. Note that this option changes the original offsets of the text.


NER Linnaeus Tagger Wizard6.png
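
The following Python sketch shows why normalization changes offsets. It assumes that normalization inserts a space on each side of a delimiter; the `DELIMITERS` set and the `normalize` function are hypothetical, since the exact delimiters and rules used by the tool are not stated on this page.

```python
import re

# Assumed delimiter set; the set actually used by the tool is not documented here.
DELIMITERS = r"[()\[\],;:/-]"


def normalize(text):
    """Put a space on each side of every delimiter, then collapse repeated spaces."""
    spaced = re.sub(f"({DELIMITERS})", r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()


if __name__ == "__main__":
    original = "relA/relB(ppGpp)"
    normalized = normalize(original)
    print(normalized)
    # The same entity starts at a different offset after normalization.
    print(original.index("relB"), normalized.index("relB"))
```

In this example relB moves from offset 5 to offset 7, so annotations produced on normalized text cannot be mapped onto the original text by offset alone.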



Press Ok to complete the configuration. You can cancel the operation at any time by clicking the corresponding button.

Performing the operation

The NER Linnaeus Tagger operation now starts and a progress window appears, indicating the execution status. The operation can take from a few minutes to several hours, depending on the number of documents, their size and the size of the resources.
When the process ends, a new NER Process object is added to the clipboard and can be viewed through the Corpus Process View.