Difference between revisions of "Corpora"

From Anote2Wiki
(Created page with "__TOC__ == About PubMed Retrieval Plug-in == This plugin was developed in collaboration with the SING group at University of Vigo - Spain. Information Retrieval plug-in di...")
 
(User HowTOs)
 
(24 intermediate revisions by 2 users not shown)

Latest revision as of 16:12, 27 February 2013

About Corpora Plug-in

This plug-in defines the central data-types for Corpora (a set of Corpus objects). In @Note2, all information extraction processes are applied over a Corpus. A Corpus is a set of documents that can be annotated with entities and events by IEProcesses. The plug-in also provides some standard Views for annotated documents.

User HowTOs

Corpora Load Corpus

Corpora Remove Corpus

Corpus Load Process

Corpus Remove Process

Annotated Document Default View

Process Entity Details View

Process Load Annotated Document

Create Corpus By Publication Manager

Process Relations view

Process Relations Resume Stats

Merging NER Schemas

REProcess Export To XGMML File

Change Class Colors

MVC AIBench Model:

Data-types:

Corpora: Represents Corpora and contains a Corpus Set. Contains methods for Corpus database management.

Corpus: Represents a set of publications. Contains the Corpus properties (name, description, database id) and the list of IEProcesses applied to the corpus.

NERDocumentAnnotation: Contains the entity annotations of a document resulting from NER processes.

REDocumentAnnotation: Contains the entity and event annotations of a document resulting from RE processes.

NERProcess: Represents a NER Process and contains a set of NERDocumentAnnotation objects.

REProcess: Represents a RE Process and contains a set of REDocumentAnnotation objects.
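
The containment relationships among the data-types above can be sketched as follows. This is a minimal Python illustration, not the plug-in's actual classes: the field names (`document_id`, `entities`, `events`, `annotations`, `document_ids`, `processes`, `corpus_set`), the tuple layouts for entities and events, and the in-memory `add_corpus`/`remove_corpus` methods (a stand-in for the database management methods) are all assumptions chosen to mirror the descriptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical record layouts, not taken from the plug-in.
Entity = Tuple[int, int, str]   # (start offset, end offset, class label)
Event = Tuple[int, int, str]    # (left entity index, right entity index, relation label)


@dataclass
class NERDocumentAnnotation:
    """Entity annotations of one document, produced by a NER process."""
    document_id: int
    entities: List[Entity] = field(default_factory=list)


@dataclass
class REDocumentAnnotation:
    """Entity and event annotations of one document, produced by a RE process."""
    document_id: int
    entities: List[Entity] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)


@dataclass
class NERProcess:
    name: str
    annotations: List[NERDocumentAnnotation] = field(default_factory=list)


@dataclass
class REProcess:
    name: str
    annotations: List[REDocumentAnnotation] = field(default_factory=list)


@dataclass
class Corpus:
    """A set of publications plus the IE processes applied to it."""
    name: str
    description: str
    database_id: int
    document_ids: List[int] = field(default_factory=list)
    processes: List[Union[NERProcess, REProcess]] = field(default_factory=list)


@dataclass
class Corpora:
    """The Corpus set; real persistence is replaced by an in-memory list."""
    corpus_set: List[Corpus] = field(default_factory=list)

    def add_corpus(self, corpus: Corpus) -> None:
        self.corpus_set.append(corpus)

    def remove_corpus(self, database_id: int) -> None:
        self.corpus_set = [c for c in self.corpus_set if c.database_id != database_id]
```

In this sketch a Corpus holds its processes directly, and each process holds one annotation object per document, which matches the "contains a set of" wording used for NERProcess and REProcess.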

Operations:

ChangeClassColor: Operation for changing the color assigned to a class. The colors are used to tell classes apart in the multi-color views.

CreateCorpusOperationByPublicationManager: Operation that creates a corpus from Queries of the Publication Manager.

NERAnnotationsMergeOperation: Operation for merging the NER Schemas of a Corpus.

ExitOperation: Plug-in exit Operation.

InitProject: Plug-in start operation.

LoadCorpusStatus: Loads the Corpora session.

SaveCorpusStatus: Saves the Corpora session.
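
What NERAnnotationsMergeOperation does when it merges NER Schemas is not detailed on this page. One plausible reading, shown here purely as an assumption, is that the entity annotations of two NER processes over the same corpus are combined document by document, with exact duplicates dropped. The function name `merge_ner_annotations` and the dictionary representation are hypothetical and do not come from the plug-in.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts, not taken from the plug-in.
Entity = Tuple[int, int, str]            # (start offset, end offset, class label)
Annotations = Dict[int, List[Entity]]    # document id -> entity annotations


def merge_ner_annotations(first: Annotations, second: Annotations) -> Annotations:
    """Union the entities of two annotation sets, per document.

    Entities that are identical in span and class are kept once.
    Overlapping but different spans are both kept; resolving such
    conflicts is outside the scope of this sketch.
    """
    merged: Annotations = {}
    for doc_id in sorted(set(first) | set(second)):
        combined = set(first.get(doc_id, [])) | set(second.get(doc_id, []))
        merged[doc_id] = sorted(combined)  # ordered by start, end, then label
    return merged
```

A real merge would also need a policy for conflicting annotations (same span, different class), which this union-based version deliberately leaves untouched.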

Views:

CorporaView: Allows the visualization of Corpus data-types on the clipboard.

CorpusDocumentsView: Allows the visualization of the documents belonging to each corpus.

CorpusProcessesView: Allows the visualization of the processes applied to each corpus.

NERAnnotatedDocumentView: NERDocumentAnnotation View; allows checking the document entity annotations.

NERProcessAnnotationDocumentsView: NERProcess View that lists all documents and allows the creation of NERDocumentAnnotation objects.

NERStatisticsView: NERProcess View that contains statistics for entities in the corpus.

REAnnotatedDocumentView: REDocumentAnnotation View for document entity and event annotations.

REProcessAnnotationDocumentsView: REProcess View that contains statistics for entities in REProcess.

RERelationsViewer: REProcess View of all Relations present in the process.

REProcessRelationsResumeStats: Main statistics for the Relations of a REProcess.
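
The statistics views listed above (such as NERStatisticsView) do not specify which figures they show. As a minimal sketch under that caveat, the following Python function computes one obvious candidate: the number of entity annotations per class across all documents of a process. The function name `entity_class_counts` and the data layout are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout, not taken from the plug-in.
Entity = Tuple[int, int, str]  # (start offset, end offset, class label)


def entity_class_counts(annotations: Dict[int, List[Entity]]) -> Counter:
    """Count entity annotations per class over every annotated document."""
    counts: Counter = Counter()
    for entities in annotations.values():
        for _start, _end, label in entities:
            counts[label] += 1
    return counts
```

Called with a mapping from document id to entity list, it returns a `Counter` keyed by class label, which could feed a per-class table or chart.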