Difference between revisions of "Corpus Create Annotation Schema By NER Lexical Resources"

(Select Class ( For each resource))
Line 3: Line 3:
  
 
== Select Option ==
 
== Select Option ==
The user can perform a new NER (Entity recognition) based in Lexical Resources [[Corpora_Load_Corpus|loading Corpus to Clipboard]] based on previous settings (NER already performed - Same Resources and Same Options).  
+
To perform a new NER (Entity Recognition) based in Lexical Resources, start by [[Corpora_Load_Corpus|loading a Corpus to the Clipboard]].  
  
Selecting Corpus, the user must press right mouse button an select '''Corpus -> NER -> Lexical Resources'''
+
Selecting the Corpus, you should right click over it and choose '''Corpus -> NER -> Lexical Resources'''
  
 
[[Image:Corpus_Process_NER_ANote.png|1500px|center]]
 
[[Image:Corpus_Process_NER_ANote.png|1500px|center]]
  
 
== New Configuration or Load Configuration==
 
== New Configuration or Load Configuration==
A wizard will be presented. The first allows to select two options: Create a new process ('' New Configuration'') and ''Load Configuration'' from a process that already performed. The user must select '''New Configuration''' and press '''Next button'''.
+
 
 +
A wizard will be presented to configure the process.  
 +
The first step allows to select two options: Create a new process (''New Configuration'') or ''Load Configuration'' from an NER process that was already performed.  
 +
To start a new process select '''New Configuration''' and press the '''Next''' button.
  
 
[[Image:NER_ANote_Wizard1.png|800px|center]]
 
[[Image:NER_ANote_Wizard1.png|800px|center]]
  
 
== Resources Selection ==
 
== Resources Selection ==
In the next panel, the user must '''select lexical resources'''. Here, dictionaries, lookup tables, Rules set and Ontologies can be added to NER process. Selecting tabs the user can change resources type and select different resources.
+
 
After lexical resources selection, the user must press '''Next button'''.
+
In the next panel, the user must '''select lexical resources'''. Here, dictionaries, lookup tables, rules sets and ontologies can be added to be used in the NER process.  
 +
Selecting the respective tabs the user can select from existing resources organized by their types.
 +
When all lexical resources are selected, press '''Next'''.
  
 
[[Image:NER_ANote_Wizard1a.png|800px|center]]
 
[[Image:NER_ANote_Wizard1a.png|800px|center]]
Line 22: Line 27:
 
[[Image:NER_ANote_Wizard1b.png|800px|center]]
 
[[Image:NER_ANote_Wizard1b.png|800px|center]]
  
== Select Class ( For each resource) and Case Sensitivity ==
+
== Select Class and Case Sensitivity ==
In the next panel, For each '''lexical resource''' previous selected the user can '''filter for classes'''.
+
 
 +
In the next panel, for each '''lexical resource''' previously selected you can '''filter for classes''', i.e. select which classes will be associated to each resource.
 +
At this step, you can also define if the process will be case sensitive or not.
  
 
[[Image:NER_ANote_Wizard2.png|800px|center]]
 
[[Image:NER_ANote_Wizard2.png|800px|center]]
 
the user also
 
  
 
== Pre-Processing ==
 
== Pre-Processing ==
  
Pre-Processing option allows the user to define restrictions for annotations generated by NER
+
The Pre-Processing option allows you to define restrictions for annotations generated by the NER, as follows:
 
 
=== Without Pre-processing ===
 
  
Selecting option '''''No''''' the processing continuing without any Pre_Processing.  
+
Selecting option '''''No''''' the processing continues without any Pre_Processing.  
  
 
[[Image:NER_ANote_Wizard3_Without_Pre_processing.png|800px|center]]
 
[[Image:NER_ANote_Wizard3_Without_Pre_processing.png|800px|center]]
  
 
=== Stop Words ===
 
=== Stop Words ===
The user can select a list of stop words (Lexical Words Set - Lexical Resources) to perform NER algorithm. Stop words are important for algorithm don't annotate common English word as entities ( Remove false positive annotations). For select this pre-processing option select '''''Stop Words''''' and the panel above select one Lexical Words Resource previous created. For continuing whit NER configuration selct '''Next Button'''  
+
The user can select a list of stop words (Lexical Words Set - Lexical Resources) to use in the NER algorithm.  
 +
Stop words are important to prevent the algorithm from annotating common English words as entities and thus reduce false positive annotations.  
 +
To activate this pre-processing option, select the '''''Stop Words''''' option and in the panel below select one Lexical Words Resource previously created.  
 +
To continue with the NER configuration select '''Next'''  
  
 
[[Image:NER_ANote_Wizard3_StopWords.png|800px|center]]
 
[[Image:NER_ANote_Wizard3_StopWords.png|800px|center]]
Line 46: Line 52:
 
=== POS-Tagging ===
 
=== POS-Tagging ===
  
The user can filter NER annotations by restricted the annotations to specific grammatically words (Example noun,adjectives,pronoun). For select this pre-processing option select '''''POS-Tagging''''' an in panel above (Select POS-tags) select POS-Tags which are considered to annotation.
+
You can filter NER annotations by restricting the annotations to specific grammatical classes (nouns,adjectives,verbs,pronouns, etc).  
 +
To select this pre-processing option select '''''POS-Tagging''''' an in the panel below select the tags representing classes to be considered in the annotation.
  
 
[[Image:NER_ANote_Wizard3_POS_Tagging.png|800px|center]]
 
[[Image:NER_ANote_Wizard3_POS_Tagging.png|800px|center]]
Line 52: Line 59:
 
== Normalization ==
 
== Normalization ==
  
The last panel, The user could select a '''Normalization option'''. The Normalization options permits adding white space between words delimiters and increase the entity recognition. This option cahnge the original offsets for text.  
+
In the last panel, you can select a '''Normalization option''' that allow to add white space between words delimiters and increase the entity recognition accuracy.  
 +
This option changes the original offsets of the text.  
  
 
[[Image:NER_ANote_Wizard4.png|800px|center]]
 
[[Image:NER_ANote_Wizard4.png|800px|center]]
  
The configurations have been made, the '''Ok button''' must be pressed. All time the user could cancel operation clinking on cancel button.
+
The configuration is now complete once you press '''Ok'''. At all times you can cancel the operation clicking on the respective button.
  
 
== Performing ==
 
== Performing ==
NER operation will start and a small window will appear, indicating the execution of the operation.
+
NER operation will now start and a progress window will appear, indicating the execution of the operation.
The NER operation will take a few minutes or hours depending number of documents, document size and total resources terms.
+
The NER operation will take a few minutes or hours depending on the number of documents, document size and size of the resources.
  
When process finishing , a new '''NER Process''' object will be added to the [[Corpora_Load_Corpus|''Corpus Process View'']].
+
When the process finishes , a new '''NER Process''' object will be added to the [[Corpora_Load_Corpus|''Corpus Process View'']].

Revision as of 00:05, 16 January 2013

Select Option

To perform a new NER (Named Entity Recognition) based on Lexical Resources, start by loading a Corpus to the Clipboard.

Select the Corpus, right-click it and choose Corpus -> NER -> Lexical Resources.

Corpus Process NER ANote.png

New Configuration or Load Configuration

A wizard will be presented to configure the process. The first step offers two options: create a new process (New Configuration) or Load Configuration from an NER process that was already performed. To start a new process, select New Configuration and press the Next button.

NER ANote Wizard1.png

Resources Selection

In the next panel, select the lexical resources. Dictionaries, lookup tables, rule sets and ontologies can be added for use in the NER process. The tabs organize the existing resources by type; select a tab to choose resources of that type. When all lexical resources are selected, press Next.

NER ANote Wizard1a.png
NER ANote Wizard1b.png

Select Class and Case Sensitivity

In the next panel, for each lexical resource previously selected, you can filter by class, i.e. select which classes will be associated with each resource. At this step, you can also define whether the process will be case sensitive.
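
The effect of the class filter and the case sensitivity setting can be illustrated with a minimal Python sketch. This is not the tool's own code: the function name annotate, the lexicon layout (term mapped to class) and the sample terms are assumptions made only for illustration.

```python
import re

def annotate(text, lexicon, allowed_classes, case_sensitive=False):
    """Return (start, end, matched_text, class) for each lexicon term found in text.

    lexicon is a hypothetical {term: class} mapping standing in for a lexical resource.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    annotations = []
    for term, cls in lexicon.items():
        if cls not in allowed_classes:
            continue  # this class was filtered out for the resource
        # Match the term only at word boundaries, treating it as literal text.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags):
            annotations.append((m.start(), m.end(), m.group(), cls))
    return sorted(annotations)

# Illustrative data only.
lexicon = {"glucose": "compound", "E. coli": "organism", "lacZ": "gene"}
text = "Glucose represses lacZ in E. coli."

insensitive = annotate(text, lexicon, {"compound", "gene"})
sensitive = annotate(text, lexicon, {"compound", "gene"}, case_sensitive=True)
```

In this sketch the organism class is filtered out, so "E. coli" is never annotated. With case sensitivity enabled, the capitalized "Glucose" no longer matches the lowercase lexicon entry, while "lacZ" still does.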

NER ANote Wizard2.png

Pre-Processing

The Pre-Processing option allows you to define restrictions for annotations generated by the NER, as follows:

If you select the option No, processing continues without any pre-processing.

NER ANote Wizard3 Without Pre processing.png

Stop Words

You can select a list of stop words (Lexical Words Set - Lexical Resources) to use in the NER algorithm. Stop words prevent the algorithm from annotating common English words as entities and thus reduce false positive annotations. To activate this pre-processing option, select the Stop Words option and, in the panel below, select a previously created Lexical Words Resource. To continue with the NER configuration, select Next.
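
As a rough illustration of why stop words reduce false positives, the sketch below discards candidate annotations whose text is a stop word. It assumes annotations are (start, end, text, class) tuples and that the comparison ignores case; both are assumptions for this example, not documented behavior of the tool, and the function name drop_stop_words is hypothetical.

```python
def drop_stop_words(annotations, stop_words):
    """Keep only annotations whose matched text is not a stop word.

    annotations: list of (start, end, text, class) tuples (assumed layout).
    stop_words: iterable of words, compared case-insensitively (assumption).
    """
    stop = {w.lower() for w in stop_words}
    return [a for a in annotations if a[2].lower() not in stop]

# Illustrative data: a dictionary that contains short names such as "was" or
# "can" would annotate ordinary English words unless they are excluded.
candidates = [
    (4, 7, "was", "gene"),
    (20, 24, "lacZ", "gene"),
    (30, 33, "Can", "gene"),
]
kept = drop_stop_words(candidates, ["was", "can", "the"])
```

Here only the "lacZ" annotation survives, because the other two candidates coincide with common English words in the stop word list.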

NER ANote Wizard3 StopWords.png

POS-Tagging

You can filter NER annotations by restricting them to specific grammatical classes (nouns, adjectives, verbs, pronouns, etc.). To use this pre-processing option, select POS-Tagging and, in the panel below, select the tags representing the classes to be considered in the annotation.
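
The idea behind the POS-Tagging restriction can be sketched as follows. The example assumes the text has already been tokenized and tagged (the tags shown follow the Penn Treebank convention, which may differ from the tag set the tool offers), and the function name annotate_tagged and the sample lexicon are hypothetical.

```python
def annotate_tagged(tagged_tokens, lexicon, allowed_tags):
    """Annotate tokens found in the lexicon, but only if their POS tag is allowed.

    tagged_tokens: list of (word, tag) pairs from some POS tagger (assumed input).
    lexicon: hypothetical {lowercase term: class} mapping.
    Returns a list of (token_index, word, class).
    """
    hits = []
    for i, (word, tag) in enumerate(tagged_tokens):
        cls = lexicon.get(word.lower())
        if cls is not None and tag in allowed_tags:
            hits.append((i, word, cls))
    return hits

# "lead" appears once as a verb and once as a noun; only the noun is a compound.
tagged = [
    ("Researchers", "NNS"), ("lead", "VBP"), ("studies", "NNS"),
    ("on", "IN"), ("lead", "NN"), ("toxicity", "NN"),
]
lexicon = {"lead": "compound"}

nouns_only = annotate_tagged(tagged, lexicon, {"NN", "NNS"})
unrestricted = annotate_tagged(tagged, lexicon, {"NN", "NNS", "VBP", "IN"})
```

With only noun tags allowed, the verb "lead" is skipped and a single annotation remains; allowing every tag annotates both occurrences, including the false positive.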

NER ANote Wizard3 POS Tagging.png

Normalization

In the last panel, you can select a Normalization option that adds white space around word delimiters, which can increase entity recognition accuracy. Note that this option changes the original offsets of the text.
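
The following sketch shows why inserting white space around delimiters shifts offsets. The delimiter set and the offset map are assumptions made for illustration; the page does not state which delimiters the tool uses or whether it keeps a mapping back to the original text.

```python
# Hypothetical delimiter set; the tool's actual list is not documented here.
DELIMITERS = set("()[],;:/-")

def normalize(text):
    """Surround each delimiter with spaces.

    Returns the normalized text and a list mapping every normalized character
    position to its position in the original text (None for inserted spaces).
    """
    out, offset_map = [], []
    for i, ch in enumerate(text):
        if ch in DELIMITERS:
            out.extend([" ", ch, " "])
            offset_map.extend([None, i, None])
        else:
            out.append(ch)
            offset_map.append(i)
    return "".join(out), offset_map

text = "IL-2(human)"
normalized, offset_map = normalize(text)

# "human" is now a separate token, but it starts at a different offset.
start_normalized = normalized.index("human")
start_original = offset_map[start_normalized]
```

In this example "human" starts at offset 5 in the original text but at offset 9 after normalization, so annotations produced on normalized text do not line up with the original unless the offsets are mapped back.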

NER ANote Wizard4.png

The configuration is complete once you press Ok. You can cancel the operation at any time by clicking the Cancel button.

Performing

The NER operation will now start and a progress window will appear, indicating that the operation is running. The operation can take from a few minutes to several hours, depending on the number of documents, the document size and the size of the resources.

When the process finishes, a new NER Process object will be added to the Corpus Process View.