Difference between revisions of "Create Corpus By PDF Directory"
Revision as of 17:02, 7 April 2014
Operation
To create a Corpus based on a PDF directory, click the Corpora datatype object in the clipboard and select the option Corpus -> Create -> PDF Directory.
Selecting this option launches the Corpus Creation wizard. Below is a brief overview of the wizard steps.
Select Loader Options and Directory
A GUI appears that allows you to select the folder where the PDF files are saved.
Here you can choose one Loader Option:
- Default (Without): the PDF meta-information is not loaded automatically from PubMed. (Workflow Diagram Step 1)
- Use File Name as PMID (Other ID): each PDF is associated with a PMID taken from its file name, so that in later steps the system can retrieve meta-information such as the abstract, title, or authors for the PDF from PubMed. (Workflow Diagram Step 2)
- Import Document Meta information from TSV File: each PDF is associated with a PMID through a TSV file that lists file name and PMID combinations. In later steps the system retrieves meta-information such as the abstract, title, or authors for the PDF from PubMed. (Workflow Diagram Step 3)
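As an illustration of the "Use File Name as PMID" option, the following minimal Python sketch shows how a PMID could be derived from each PDF file name in a directory. It is not the tool's own implementation: the function name `pmids_from_file_names` and the rule that only all-digit file names count as PMIDs are assumptions made for this example.

```python
from pathlib import Path


def pmids_from_file_names(directory):
    """Map each PDF file name in `directory` to a PMID candidate (or None)."""
    mapping = {}
    for pdf in sorted(Path(directory).iterdir()):
        # Only regular files with a .pdf extension are considered.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        stem = pdf.stem  # file name without the extension
        # Assumption: a file name is usable as a PMID only if it is all ASCII digits.
        is_pmid = stem.isascii() and stem.isdigit()
        mapping[pdf.name] = stem if is_pmid else None
    return mapping
```

Files whose names do not look like a PMID map to `None` here, which makes them easy to spot before any PubMed lookup is attempted.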
Select Corpus Name
Select a name for the corpus, e.g. “New Corpus”, and press Next. By default, the corpus name is the full path of the directory.
Import Meta-Information
A graphical interface is launched that lets you select the file, preview its first lines, and set the General Delimiter, Text Delimiter, Default Value, and the mapping between the file name (or full path) and the PMID.
- General Delimiter: the overall file delimiter used to split the contents into columns (in blue)
- Text Delimiter: the delimiter used to encapsulate information
- Default Value: the default value used to represent empty records (in orange)
- Column Selection Options: select the column that holds the file name or full file path, and the column that holds the PMID
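The settings above can be sketched with Python's standard `csv` module. This is only an illustration of how the four options interact, not the wizard's actual code: the function `read_pmid_mapping`, its parameter names, and the sample placeholder `NA` are hypothetical.

```python
import csv


def read_pmid_mapping(handle, general_delimiter="\t", text_delimiter='"',
                      default_value="", file_column=0, pmid_column=1):
    """Return {file name or path: PMID or None} read from a delimited text stream."""
    reader = csv.reader(handle, delimiter=general_delimiter,
                        quotechar=text_delimiter)
    mapping = {}
    for row in reader:
        # Skip blank lines and rows that lack one of the selected columns.
        if len(row) <= max(file_column, pmid_column):
            continue
        pmid = row[pmid_column].strip()
        # The default value marks an empty record, so no PMID is stored for it.
        mapping[row[file_column].strip()] = None if pmid == default_value else pmid
    return mapping
```

For example, calling `read_pmid_mapping(open("mapping.tsv", newline=""), default_value="NA")` on a tab-separated file (the file name is a placeholder) would treat `NA` in the PMID column as a missing identifier.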
Update Document PMID (OtherID)
Update Publication Meta Information and Full Text
Result
A new Corpus is now created and available in the clipboard, where it can be viewed through the Corpora View.