Difference between revisions of "RE Model Creation"

From Anote2Wiki

Revision as of 14:18, 4 August 2015

Select Option

To create an RE machine learning model using BioTML, first select the corpus with the RE annotations. To do this, start by loading the RE Process to the Clipboard.
Then select the RE Process, right-click it, and choose RE Process -> BioTML Tagger -> Create Model.


Create RE Model By BioTML Tagger.png


New Configuration or Load Configuration

A wizard is presented to configure the model creation. The first step offers two options: Create New BioTML Model Configuration or Load BioTML Model Configuration. To start a new model configuration, select Create New BioTML Model Configuration and press the Next button.


Select New or Load Configuration By BioTML Tagger.png


Select NLP Tokenizer System

A GUI is presented to select among the NLP systems that are integrated in the BioTML framework. These systems are used to perform the tokenization of all documents in order to create a data matrix for the machine learning algorithms.
The available NLP systems are ClearNLP, Stanford CoreNLP and OpenNLP. A description of each system is shown in this GUI.
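
As a rough illustration of what tokenization contributes to the data matrix, the following Python sketch splits a sentence into tokens with character offsets. It does not call ClearNLP, Stanford CoreNLP or OpenNLP; the regular expression and the `tokenize` function are assumptions made only for this example, and the real systems apply much richer rules.

```python
import re

# Hypothetical stand-in for an NLP tokenizer: words, or single
# punctuation marks. Not the behaviour of any BioTML-integrated system.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return (token, start, end) tuples for each word or punctuation mark."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]

# Each token would become one row of the data matrix.
for token, start, end in tokenize("BRCA1 interacts with RAD51."):
    print(token, start, end)
```

The choice of tokenizer matters because every later step (features, training, annotation) works on the token boundaries produced here.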


Select NLP System By BioTML.png


BioTML Features Selection

The features for the machine learning data matrix are selected in this GUI. Depending on the number and type of features, the produced model may fit the data more or less closely. This selection has a great impact on the prediction capability, recall and accuracy of the model during RE annotation.

Attention: The number of features, and some feature types, can dramatically increase memory and CPU usage!
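
The effect of feature selection on the size of the data matrix can be sketched in Python as follows. The feature names, the `token_features` and `count_columns` functions, and the `window` parameter are all hypothetical and are not taken from BioTML; the point is only that each additional feature type, and each wider context window, adds columns to the matrix.

```python
def token_features(tokens, index, window=1):
    """Build a feature dict for tokens[index] (hypothetical feature set)."""
    token = tokens[index]
    features = {
        "word=" + token.lower(): 1,
        "is_capitalized": int(token[0].isupper()),
        "has_digit": int(any(ch.isdigit() for ch in token)),
        "suffix3=" + token[-3:].lower(): 1,
    }
    # Context features: every extra window position adds more columns.
    for offset in range(-window, window + 1):
        neighbour = index + offset
        if offset != 0 and 0 <= neighbour < len(tokens):
            features["word[%+d]=%s" % (offset, tokens[neighbour].lower())] = 1
    return features

def count_columns(tokens, window):
    """Number of distinct feature names, i.e. columns of the data matrix."""
    names = set()
    for i in range(len(tokens)):
        names.update(token_features(tokens, i, window))
    return len(names)

tokens = ["BRCA1", "interacts", "with", "RAD51", "."]
print(count_columns(tokens, window=0), count_columns(tokens, window=2))
```

Even on a five-token sentence the column count grows when context features are enabled; on a full corpus this growth is what drives the memory and CPU cost.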


Select Features For BioTML.png


BioTML Model Algorithm Configuration

The machine learning algorithm is selected in this GUI. Advanced settings may appear, depending on the selected algorithm type.

For further information about these advanced configurations, see the CRF information (http://mallet.cs.umass.edu/api/cc/mallet/fst/CRF.html) or the SVM information (https://github.com/cjlin1/libsvm/blob/master/README).


Select Algorithm Settings For BioTML.png


BioTML RE Approach Type Selection

After the algorithm selection, a GUI is presented to select the RE approaches that will be used to train the model.


Select RE Aproach Type For Model Creation By BioTML Tagger.png


Save BioTML Model File

All configurations for the RE Model creation are now defined. In this GUI, you can define the directory or Zip file that will store the RE Model.


Save Model By BioTML Tagger.png


Result

The RE model creation operation starts and a working window is shown while the operation is executing. The operation can take from a few minutes to several hours, depending on corpus size and model configuration. When the process ends, a new Zip file containing the RE model is added to the directory defined in the last GUI.
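
Once the operation has finished, the resulting Zip file can be checked for readability outside the application. The Python sketch below uses only the standard `zipfile` module; the `check_model_archive` function, the file name `re_model.zip` and the placeholder entry are assumptions for illustration, since the internal layout of a BioTML model archive is not described here.

```python
import os
import tempfile
import zipfile

def check_model_archive(path):
    """Return the entry names if path is a readable Zip file, else raise ValueError."""
    if not zipfile.is_zipfile(path):
        raise ValueError("not a Zip file: %s" % path)
    with zipfile.ZipFile(path) as archive:
        bad_entry = archive.testzip()  # None when every entry passes its CRC check
        if bad_entry is not None:
            raise ValueError("corrupt entry: %s" % bad_entry)
        return archive.namelist()

# Demonstration with a placeholder archive; the entry name is made up
# and does not reflect the real contents of an RE model file.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "re_model.zip")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("placeholder.txt", "stand-in for model data")
    print(check_model_archive(demo_path))
```

In practice, `check_model_archive` would be pointed at the Zip file written to the directory chosen in the Save BioTML Model File step.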