RE Model Creation

From Anote2Wiki
Revision as of 14:38, 4 August 2015 by Anote2Wiki (talk | contribs) (Result)

Select Option

To create an RE machine learning model using BioTML, first select the corpus that contains the RE annotations. To do this, load the RE Process to the Clipboard.
Then right-click the RE Process and choose RE Process -> BioTML Tagger -> Create Model.


Create RE Model By BioTML Tagger.png


New Configuration or Load Configuration

A wizard is presented to configure the model creation process. The first step offers two options: Create New BioTML Model Configuration or Load BioTML Model Configuration. To start a new model configuration, select Create New BioTML Model Configuration and press the Next button.


Select New or Load Configuration By BioTML Tagger.png


Select NLP Tokenizer System

A GUI lists the NLP systems that are integrated in the BioTML framework. The selected system tokenizes all documents to build the data matrix used by the machine learning algorithms.
The available NLP systems are ClearNLP, Stanford CoreNLP and OpenNLP. A description of each system is shown in this GUI.
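
To illustrate what "tokenizing documents to build a data matrix" means, here is a minimal sketch in Python. It is not how BioTML or any of the listed NLP systems work internally: the regex tokenizer, the `tokenize` and `build_matrix` helpers, and the example sentences are all assumptions made for illustration, and a simple token-count matrix stands in for the richer feature matrix a real tagger would produce.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation marks.

    A crude stand-in for a real NLP tokenizer (hypothetical helper).
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_matrix(documents):
    """Return (vocabulary, matrix).

    matrix[i][j] is the number of times vocabulary[j] occurs in document i.
    """
    tokenized = [tokenize(doc) for doc in documents]
    # One column per distinct token, in a stable (sorted) order.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[tok] for tok in vocabulary])
    return vocabulary, matrix


# Made-up example sentences, one per "document".
docs = ["BRCA1 interacts with BARD1.", "TP53 binds MDM2."]
vocabulary, matrix = build_matrix(docs)
for row in matrix:
    print(row)
```

Each row corresponds to one document and each column to one token. Different tokenizers split the same text differently, which changes the columns of the matrix and therefore what the learning algorithm sees.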


Select NLP System By BioTML.png


BioTML Feature Selection

The features that the machine learning algorithm uses to build its training data matrix are selected in the GUI below. Depending on the number and type of features selected, the resulting model fits the data with varying degrees of accuracy. Feature selection can have a great impact on the predictive capability of the model, influencing its precision, recall and accuracy during RE annotation.

Warning: The number of features, and some feature types in particular, can significantly increase memory and CPU usage!
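
For readers unfamiliar with the metrics named above, the following Python sketch shows how precision, recall and accuracy are commonly computed for a set of candidate relations. It is an illustration only and not part of BioTML: the `evaluate` function and the entity pairs are hypothetical, and relations are simplified to unordered pairs of entity names.

```python
def evaluate(gold, predicted, candidates):
    """Compute (precision, recall, accuracy) over a set of candidate relations.

    gold       -- relations annotated as true in the corpus
    predicted  -- relations the model annotated
    candidates -- every entity pair that was considered
    """
    gold, predicted, candidates = set(gold), set(predicted), set(candidates)
    tp = len(gold & predicted)               # correctly predicted relations
    fp = len(predicted - gold)               # predicted but not in the gold set
    fn = len(gold - predicted)               # in the gold set but missed
    tn = len(candidates - gold - predicted)  # correctly left unannotated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Made-up entity pairs for illustration.
candidates = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
gold = [("A", "B"), ("C", "D")]
predicted = [("A", "B"), ("B", "C")]
precision, recall, accuracy = evaluate(gold, predicted, candidates)
print(precision, recall, accuracy)
```

In this toy case one of the two predictions is correct and one of the two gold relations is found, so precision and recall are both 0.5, while accuracy is 0.6 because two of the five candidate pairs are correctly left unannotated.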


Select Features For BioTML.png


BioTML Algorithm Configuration

The machine learning algorithm is selected in this GUI. The currently available options are Conditional Random Fields (CRFs), as implemented by the MALLET software, and Support Vector Machines (SVMs), as implemented by the LIBSVM software. Depending on the selected algorithm, advanced settings may appear.

For further information about these advanced settings, please visit the following links: CRF Information or SVM Information.


Select Algorithm Settings For BioTML.png


BioTML RE Approach Type Selection

After the algorithm is selected, a GUI presents the RE approaches that can be used to train the model. A textual description of each approach is shown in the box below.


Select RE Aproach Type For Model Creation By BioTML Tagger.png


Save BioTML Model File

All settings for RE model creation are now defined. In this GUI, you select the folder or Zip file in which the RE model will be stored.


Save Model By BioTML Tagger.png


Result

The RE model creation operation then starts, and a working window indicates that it is running. The operation can take from a few minutes to several hours, depending on the corpus size and the model configuration. When it finishes, a new Zip file containing the RE model is added to the location chosen in the previous GUI.
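
If you want to confirm outside the application that the Zip file was written completely, a generic check with Python's standard `zipfile` module is one option. This is a sketch under assumptions: the `list_model_archive` helper and the file name `re_model.zip` are hypothetical, and the internal layout of a BioTML model archive is not described here, so the code only lists the entries and verifies their checksums.

```python
import zipfile


def list_model_archive(path):
    """Return (name, size_in_bytes) for each entry in the Zip archive at `path`.

    Raises ValueError if the file is not a Zip archive or an entry is corrupt.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a readable Zip archive")
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt entry,
        # or None when every entry passes its CRC check.
        bad = archive.testzip()
        if bad is not None:
            raise ValueError(f"corrupt entry in archive: {bad}")
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Example call (hypothetical file name; replace with the path you chose):
# for name, size in list_model_archive("re_model.zip"):
#     print(name, size)
```

An empty or truncated archive usually means the creation process was interrupted, in which case the model should be created again.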