Difference between revisions of "Create Corpus By PDF Directory"

From Anote2Wiki

Revision as of 17:07, 7 April 2014

Operation

To create a Corpus based on a PDF directory, click the Corpora datatype object in the clipboard and select the option Corpus -> Create -> PDF Directory.


Create Corpus By PDF Directrory.png


Selecting this option launches the Corpus Creation wizard. A brief overview of the wizard steps is shown below.

Create Corpus By PDF Directrory Overview.png

Select Loader Options and Directory

A GUI appears that allows you to select the folder where the PDF files are saved.

Here the user can choose one Loader Option:

  • Default (Without): the PDF meta-information is not automatically loaded from PubMed (Workflow Diagram Step 1).
  • Use File Name as PMID (Other ID): each PDF is associated with a PMID so that, in later steps, the system can find meta-information such as abstract, title, or authors for the PDF in PubMed (Workflow Diagram Step 2).
  • Import Document Meta information from TSV File: each PDF is associated with a PMID through a given TSV file containing file name and PMID combinations. In later steps, the system finds meta-information such as abstract, title, or authors for the PDF in PubMed (Workflow Diagram Step 3).
Create Corpus By PDF Directrory Step1.png
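
As an illustration of the Use File Name as PMID option, the following Python sketch shows one way file names in a folder could be turned into PMID candidates. It is not the tool's actual implementation: the function name `pmids_from_file_names` and the rule that only all-digit file names count as PMIDs are assumptions made for this example.

```python
from pathlib import Path


def pmids_from_file_names(pdf_dir):
    """Map each PDF file name in pdf_dir to a PMID candidate, or None."""
    mapping = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        stem = pdf.stem  # file name without the ".pdf" extension
        # Assumption: only names made entirely of ASCII digits are PMIDs.
        is_pmid = stem.isascii() and stem.isdigit()
        mapping[pdf.name] = stem if is_pmid else None
    return mapping
```

Files mapped to None in this sketch correspond to documents whose PMID would still have to be entered by hand in the Update Document PMID (OtherID) step.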

Select Corpus Name

Select a name for the corpus, e.g. “New Corpus”, and press Next. By default, the corpus name is the full path of the directory.

Create Corpus By PDF Directrory Step2.png

Import Meta-Information

A graphical interface is launched that allows you to select the file, view its first lines, and set the General Delimiter, Text Delimiter, Default Value, and the mapping between file name (or full path) and PMID.


Create Corpus By PDF Directrory Step2b.png
  • General Delimiter: overall file delimiter used to split the contents into columns (in blue)
  • Text Delimiter: delimiter used to enclose a field's text
  • Default Value: default value used to represent empty records (in orange)
  • Column Selection Options: select the column in the file that holds the file name or full file path, and the column that holds the PMID
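
The settings above can be sketched in Python with the standard `csv` module. This is a minimal illustration, not the tool's own parser: the function name `read_pmid_mapping`, its parameter names, and the choice to treat both empty fields and the Default Value as "no PMID" are assumptions made for this example.

```python
import csv
import io


def read_pmid_mapping(tsv_text, general_delimiter="\t", text_delimiter='"',
                      default_value="", name_column=0, pmid_column=1):
    """Return {file name: PMID or None} from delimited text."""
    reader = csv.reader(io.StringIO(tsv_text),
                        delimiter=general_delimiter,
                        quotechar=text_delimiter)
    mapping = {}
    for row in reader:
        # Skip blank lines and rows that lack one of the selected columns.
        if len(row) <= max(name_column, pmid_column):
            continue
        name = row[name_column].strip()
        pmid = row[pmid_column].strip()
        # Assumption: an empty field or the default value means "no PMID".
        mapping[name] = None if pmid in ("", default_value) else pmid
    return mapping
```

For example, with a tab as General Delimiter, a double quote as Text Delimiter, and `NA` as Default Value, a row such as `"my paper.pdf"<TAB>NA` would yield a file name with no PMID attached.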

Update Document PMID (OtherID)

A graphical interface is launched that allows you to edit the PMID information of each file. Pressing the PDF button opens a GUI showing the PDF file.

Create Corpus By PDF Directrory Step3.png

Update Publication Meta Information and Full Text

Result

A new Corpus is now created and is available in the clipboard, where it can be viewed through the Corpora View.