Difference between revisions of "Corpus Create Annotation Schema By NER Lexical Resources"

(Created page with "Category:HOWTOs When there are one or more Corpus available in a clipboard, it is possible to execute an entity name recognition (NER). '''Named Entity Recognition whit...")
 
(Rule: Partial Match with Dictionaries)
 
(41 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
__TOC__
 
[[Category:HOWTOs]]
 
[[Category:HOWTOs]]
  
 +
== Select Option ==
  
When one or more Corpora are available in the clipboard, it is possible to run named entity recognition (NER).  '''Named Entity Recognition with Lexical Resources''' is a native operation over a Corpus (right-click it in the Clipboard).  
+
To perform a new NER (Named Entity Recognition) process based on Lexical Resources, start by [[Corpora_Load_Corpus|loading a Corpus to the Clipboard]].  
  
Corpus -> NER -> Lexical Resources
+
Select the Corpus, right-click it, and choose '''Corpus -> NER -> Lexical Resources'''.
  
[[Image:Corpus_Process_NER_ANote.png|center]]
 
  
 +
[[Image:Corpus_Process_NER_ANote.png|1500px|center]]
  
A wizard will be presented that allows you to configure the NER process. The first step is to select the Publication Set over which the NER will be performed. Once the desired Publication Set is selected, press the '''Next button'''.
 
  
 +
== New Configuration or Load Configuration==
  
[[Image:ner2.png|450px|center]]
+
A wizard will be presented to configure the process.
 +
The first step offers two options: create a new process (''New Configuration'') or ''Load Configuration'' from an NER process that was already performed.
 +
To start a new process, select '''New Configuration''' and press the '''Next''' button.
  
  
In the next step, a dictionary must be selected for the NER. Here, a new dictionary can be imported (how to import dictionaries will be described later in this document). After the dictionary has been chosen, the list of possible classes will be presented. The user selects the classes to annotate by moving them from the left to the right list.
+
[[Image:NER_ANote_Wizard1.png|800px|center]]
  
  
[[Image:ner3.png|450px|center]]
+
== Resources Selection ==
  
 +
In the next panel, '''select lexical resources'''. Here, dictionaries, lookup tables, rule sets and ontologies can be added for use in the NER process.
 +
Existing resources are organized by type; open the respective tab to select from them.
 +
When all lexical resources are selected, press '''Next'''.
  
In the last step, a set of complementary classes that the user can choose to be annotated are presented. Those are classes which are given by lists of terms manually compiled. The available options are:
 
  
*'''Biology-related Verbs''';
+
[[Image:NER_ANote_Wizard1a.png|800px|center]]
*'''Laboratory Techniques''';
 
*'''Physiological States''';
 
*'''Predefined Expert Hand Rules'''.
 
  
  
In the same window, the user chooses whether to annotate abstracts or full texts.
+
[[Image:NER_ANote_Wizard1b.png|800px|center]]
  
  
After all the configurations have been made, the '''Execute button''' (gear icon) has to be pressed. When the button is pressed, the NER operation will start and a small window will appear, indicating the execution of the operation. The NER operation will take a few minutes.
+
=== Rule: Partial Match with Dictionaries ===
  
 +
Using Rules, it is possible to associate Rule annotations with Dictionary Terms (only terms from the Dictionaries selected in this step are considered).
 +
To do so, select the option '''Partial Match with Dictionaries''' on the Rules tab, as shown in the following figure.
  
[[Image:ner4.png|450px|center]]
 
  
 +
[[File:RulesMatcing.png|center|800px]]
  
When the process is finished, a new '''Ner Box List''' object will be added to the clipboard. This object contains a list of items of the datatype '''ANoteNerBox''', each being the result of a NER operation. The Ner Box List exists because it is possible to create different configurations for NER (e.g. distinct dictionaries), and each configuration yields a distinct '''NerBox'''.
+
 
 +
<pre>
 +
'''Example'''
 +
 
 +
Dictionary Terms:
 +
relA Gene [List Synonyms][List External Ids]
 +
relB Gene [List Synonyms][List External Ids]
 +
ppGpp Compound [List Synonyms][List External Ids]
 +
 
 +
Rule:
 +
 
 +
AA(.*)?\b
 +
 
 +
If the rule is applied to the following text segment:
 +
 
 +
A AArelA gene for some organism interacts with AArelB.
 +
 
 +
'''Results'''
 +
* Without using this option
 +
    relA ( Annotated by Partial Match Rule)
 +
    relB ( Annotated by Partial Match Rule)
 +
 
 +
* Using Partial Match with Dictionaries
 +
    relA ( Annotated by Partial Match Rule) + associated with dictionary term relA (List Synonyms and External Ids)
 +
    relB ( Annotated by Partial Match Rule) + associated with dictionary term relB (List Synonyms and External Ids)
 +
 
 +
</pre>
 +
 
 +
== Select Class and Case Sensitivity ==
 +
 
 +
In the next panel, for each '''lexical resource''' previously selected you can '''filter for classes''', i.e. select which classes will be associated with each resource.
 +
At this step, you can also define whether the process will be case sensitive.
 +
 
 +
 
 +
[[Image:NER_ANote_Wizard2.png|800px|center]]
 +
 
 +
 
 +
== Pre-Processing ==
 +
 
 +
The Pre-Processing option allows you to define restrictions for annotations generated by the NER, as follows:
 +
 
 +
If you select the '''''No''''' option, processing continues without any Pre-Processing.
 +
 
 +
 
 +
[[Image:NER_ANote_Wizard3_Without_Pre_processing.png|800px|center]]
 +
 
 +
 
 +
=== Stop Words ===
 +
 
 +
You can select a list of stop words (Lexical Words Set - Lexical Resources) to use in the NER algorithm.
 +
Stop words are important to prevent the algorithm from annotating common English words as entities and thus reduce false positive annotations.
 +
To activate this pre-processing option, select the '''''Stop Words''''' option and, in the panel below, select a previously created Lexical Words Resource.
 +
To continue with the NER configuration, select '''Next'''.
 +
 
 +
 
 +
[[Image:NER_ANote_Wizard3_StopWords.png|800px|center]]
 +
 
 +
 
 +
=== POS-Tagging ===
 +
 
 +
You can filter NER annotations by restricting them to specific grammatical classes (nouns, adjectives, verbs, pronouns, etc.).
 +
To activate this pre-processing option, select '''''POS-Tagging''''' and, in the panel below, select the tags representing the classes to be considered in the annotation.
 +
 
 +
 
 +
[[Image:NER_ANote_Wizard3_POS_Tagging.png|800px|center]]
 +
 
 +
== Normalization ==
 +
 
 +
In the last panel, you can select a '''Normalization option''' that adds white space around word delimiters in order to increase entity recognition accuracy.
 +
This option changes the original offsets of the text.
 +
 
 +
 
 +
[[Image:NER_ANote_Wizard4.png|800px|center]]
 +
 
 +
 
 +
The configuration is complete once you press '''Ok'''. You can cancel the operation at any time by clicking the respective button.
 +
 
 +
== Performing ==
 +
 
 +
The NER operation will now start and a progress window will appear, indicating that the operation is running.
 +
The NER operation will take a few minutes or hours depending on the number of documents, document size and size of the resources.
 +
 
 +
When the process ends, a new '''NER Process''' object will be added to the clipboard and can be visualized through the  [[Corpora_Load_Corpus|''Corpus Process View'']].

Latest revision as of 15:42, 12 June 2013

Select Option

To perform a new NER (Named Entity Recognition) process based on Lexical Resources, start by loading a Corpus to the Clipboard.

Select the Corpus, right-click it, and choose Corpus -> NER -> Lexical Resources.


Corpus Process NER ANote.png


New Configuration or Load Configuration

A wizard will be presented to configure the process. The first step offers two options: create a new process (New Configuration) or Load Configuration from an NER process that was already performed. To start a new process, select New Configuration and press the Next button.


NER ANote Wizard1.png


Resources Selection

In the next panel, select lexical resources. Here, dictionaries, lookup tables, rule sets and ontologies can be added for use in the NER process. Existing resources are organized by type; open the respective tab to select from them. When all lexical resources are selected, press Next.


NER ANote Wizard1a.png


NER ANote Wizard1b.png


Rule: Partial Match with Dictionaries

Using Rules, it is possible to associate Rule annotations with Dictionary Terms (only terms from the Dictionaries selected in this step are considered). To do so, select the option Partial Match with Dictionaries on the Rules tab, as shown in the following figure.


RulesMatcing.png


'''Example'''

Dictionary Terms:
relA Gene [List Synonyms][List External Ids]
relB Gene [List Synonyms][List External Ids]
ppGpp Compound [List Synonyms][List External Ids]

Rule: 

AA(.*)?\b

If the rule is applied to the following text segment:

A AArelA gene for some organism interacts with AArelB.

'''Results'''
* Without using this option
    relA ( Annotated by Partial Match Rule)
    relB ( Annotated by Partial Match Rule)

* Using Partial Match with Dictionaries
    relA ( Annotated by Partial Match Rule) + associated with dictionary term relA (List Synonyms and External Ids)
    relB ( Annotated by Partial Match Rule) + associated with dictionary term relB (List Synonyms and External Ids)
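
The behaviour described in this example can be sketched in Python. This is only an illustration, not the tool's implementation: the `DICTIONARY` mapping, the `annotate` function and its `partial_match_with_dictionaries` flag are hypothetical names. The sketch also assumes the pattern `AA(\w+)\b` instead of the rule shown above, because under Python's `re` semantics a greedy `.*` would run on to the last word boundary in the line; how the tool itself interprets the original pattern is not stated here.

```python
import re

# Hypothetical in-memory dictionary: term -> class (mirrors the example terms).
DICTIONARY = {"relA": "Gene", "relB": "Gene", "ppGpp": "Compound"}

# Assumed rule: "AA" followed by word characters. The captured group is the
# text that gets annotated and compared against the dictionary terms.
RULE = re.compile(r"AA(\w+)\b")


def annotate(text, partial_match_with_dictionaries):
    """Return one annotation per rule match found in text."""
    annotations = []
    for match in RULE.finditer(text):
        captured = match.group(1)
        annotation = {
            "text": captured,
            "start": match.start(1),
            "end": match.end(1),
            "dictionary_term": None,  # filled in only when the option is on
        }
        if partial_match_with_dictionaries and captured in DICTIONARY:
            annotation["dictionary_term"] = captured
        annotations.append(annotation)
    return annotations


text = "A AArelA gene for some organism interacts with AArelB."
for option in (False, True):
    pairs = [(a["text"], a["dictionary_term"]) for a in annotate(text, option)]
    print(option, pairs)
```

With the option off, both matches carry no dictionary term; with it on, each rule annotation is linked to the dictionary entry of the same name, which is where synonyms and external ids would come from.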

Select Class and Case Sensitivity

In the next panel, for each lexical resource previously selected you can filter for classes, i.e. select which classes will be associated with each resource. At this step, you can also define whether the process will be case sensitive.
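
The effect of the two settings in this panel can be illustrated with a small Python sketch. The `find_terms` function and its arguments are hypothetical and the lookup is a plain substring search on ASCII text, which is an assumption made for brevity rather than a description of the matching the tool performs.

```python
def find_terms(text, dictionary, selected_classes, case_sensitive):
    """Return sorted (offset, term, class) hits for terms of the selected classes."""
    haystack = text if case_sensitive else text.lower()
    hits = []
    for term, term_class in dictionary.items():
        if term_class not in selected_classes:
            continue  # class filtered out for this resource
        needle = term if case_sensitive else term.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((start, term, term_class))
            start = haystack.find(needle, start + 1)
    return sorted(hits)


dictionary = {"relA": "Gene", "ppGpp": "Compound"}
text = "RelA binds ppGpp; relA too"
print(find_terms(text, dictionary, {"Gene"}, case_sensitive=True))
print(find_terms(text, dictionary, {"Gene"}, case_sensitive=False))
```

In this sketch the case-sensitive run misses the capitalized "RelA" at the start of the sentence, while the case-insensitive run finds both mentions; restricting `selected_classes` to Gene leaves ppGpp unannotated in both runs.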


NER ANote Wizard2.png


Pre-Processing

The Pre-Processing option allows you to define restrictions for annotations generated by the NER, as follows:

If you select the No option, processing continues without any Pre-Processing.


NER ANote Wizard3 Without Pre processing.png


Stop Words

You can select a list of stop words (Lexical Words Set - Lexical Resources) to use in the NER algorithm. Stop words prevent the algorithm from annotating common English words as entities and thus reduce false positive annotations. To activate this pre-processing option, select the Stop Words option and, in the panel below, select a previously created Lexical Words Resource. To continue with the NER configuration, select Next.
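
As a rough illustration of why stop words reduce false positives, the following Python sketch drops candidate annotations whose text is a stop word. The function name, the annotation layout and the case-insensitive comparison are assumptions; the tool may well apply the list at a different stage of the algorithm.

```python
def remove_stop_word_annotations(annotations, stop_words):
    """Keep only annotations whose text is not in the stop word list."""
    blocked = {word.lower() for word in stop_words}
    return [a for a in annotations if a["text"].lower() not in blocked]


candidates = [
    {"text": "was", "class": "Gene"},   # a common English word matched as a term
    {"text": "relA", "class": "Gene"},
    {"text": "Not", "class": "Gene"},
]
kept = remove_stop_word_annotations(candidates, ["was", "not", "the"])
print([a["text"] for a in kept])
```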


NER ANote Wizard3 StopWords.png


POS-Tagging

You can filter NER annotations by restricting them to specific grammatical classes (nouns, adjectives, verbs, pronouns, etc.). To activate this pre-processing option, select POS-Tagging and, in the panel below, select the tags representing the classes to be considered in the annotation.
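
A minimal Python sketch of the idea, assuming each candidate annotation already carries a part-of-speech tag assigned by some tagger (not shown). The tag names used here (`NN`, `VB`) and the `filter_by_pos` function are illustrative assumptions, not the tag set or API of the tool.

```python
def filter_by_pos(annotations, allowed_tags):
    """Keep annotations whose part-of-speech tag was selected in the panel."""
    allowed = set(allowed_tags)
    return [a for a in annotations if a["pos"] in allowed]


candidates = [
    {"text": "lead", "pos": "VB"},   # verb reading, unlikely to be an entity
    {"text": "lead", "pos": "NN"},   # noun reading, could be a compound
    {"text": "relA", "pos": "NN"},
]
print(filter_by_pos(candidates, ["NN"]))
```

Restricting annotations to nouns keeps the second and third candidates and discards the verb reading of "lead".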


NER ANote Wizard3 POS Tagging.png

Normalization

In the last panel, you can select a Normalization option that adds white space around word delimiters in order to increase entity recognition accuracy. This option changes the original offsets of the text.
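
The following Python sketch shows why such a normalization shifts offsets, and one way to map positions in the normalized text back to the original. The delimiter set, the `normalize` function and the `origin` list are assumptions made for illustration; which delimiters the tool actually treats this way is not stated here.

```python
# Assumed delimiter set; the real one is not specified in this document.
DELIMITERS = set("()[],;:/")


def normalize(text):
    """Put spaces around delimiters.

    Returns the normalized text and a list in which entry i holds the original
    offset of normalized character i, or None for an inserted space.
    """
    out, origin = [], []
    for offset, char in enumerate(text):
        if char in DELIMITERS:
            if out and out[-1] != " ":
                out.append(" ")
                origin.append(None)
            out.append(char)
            origin.append(offset)
            out.append(" ")
            origin.append(None)
        elif char == " " and out and out[-1] == " " and origin[-1] is None:
            # An original space follows an inserted one: reuse it.
            origin[-1] = offset
        else:
            out.append(char)
            origin.append(offset)
    return "".join(out), origin


normalized, origin = normalize("relA(ppGpp) binds")
start = normalized.find("ppGpp")
print(repr(normalized), start, origin[start])
```

In the normalized text "ppGpp" becomes a separate token, but it starts at a different position than in the original, so annotations made on normalized text need a mapping such as `origin` if they are to be shown on the unmodified document.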


NER ANote Wizard4.png


The configuration is complete once you press Ok. You can cancel the operation at any time by clicking the respective button.

Performing

The NER operation will now start and a progress window will appear, indicating that the operation is running. The NER operation will take a few minutes or hours, depending on the number of documents, the document size and the size of the resources.

When the process ends, a new NER Process object will be added to the clipboard and can be visualized through the Corpus Process View.