Commit 67541ae8 authored by Panagiotis Papadakos

[Comment] It is best to hold the document id also in the posting file

parent 857ffc81
@@ -98,13 +98,13 @@ public class Index {
  * =========================================================================
  * 3) DOCUMENTS FILE => documents.idx (Random Access File)
  *
- * For each entry it stores: | Title (variable bytes / UTF-8) |
- * Author_1,Author_2, ...,Author_k (variable bytes / UTF-8) | AuthorID_1,
- * AuthorID_2, ...,Author_ID_k (variable size /ASCII) | Year (short => 2
- * bytes)| Journal Name (variable bytes / UTF-8) | The weight (norm) of
- * Document (double => 8 bytes)| Length of Document (int => 4 bytes) |
- * PageRank Score (double => 8 bytes => this will be used in the second
- * phase of the project)
+ * For each entry it stores: | DOCUMENT_ID (40 ASCII chars => 40 bytes) |
+ * Title (variable bytes / UTF-8) | Author_1,Author_2, ...,Author_k
+ * (variable bytes / UTF-8) | AuthorID_1, AuthorID_2, ...,Author_ID_k
+ * (variable size /ASCII) | Year (short => 2 bytes)| Journal Name (variable
+ * bytes / UTF-8) | The weight (norm) of Document (double => 8 bytes)|
+ * Length of Document (int => 4 bytes) | PageRank Score (double => 8 bytes
+ * => this will be used in the second phase of the project)
  *
  * ==> IMPORTANT NOTES
  *
......
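
The entry layout in the comment above can be sketched in Java roughly as follows. This is a minimal illustration, not the project's code: the class `DocumentRecordSketch` and its helper methods are hypothetical, and the comment does not say how variable-length fields are delimited, so the sketch assumes a 4-byte length prefix before each one. Only the field order and the fixed sizes (40-byte ASCII DOCUMENT_ID, 2-byte year, 8-byte norm, 4-byte length, 8-byte PageRank) come from the comment.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DocumentRecordSketch {

    // From the comment: DOCUMENT_ID is 40 ASCII chars => 40 bytes.
    static final int DOCUMENT_ID_SIZE = 40;

    // ASSUMPTION: variable-length fields are stored as a 4-byte length plus the bytes.
    static void writeVar(RandomAccessFile f, String s, Charset cs) throws IOException {
        byte[] bytes = s.getBytes(cs);
        f.writeInt(bytes.length);
        f.write(bytes);
    }

    // Appends one entry at the current file pointer and returns its start offset.
    static long writeEntry(RandomAccessFile f, String docId, String title,
                           String authors, String authorIds, short year,
                           String journal, double norm, int length,
                           double pageRank) throws IOException {
        byte[] id = docId.getBytes(StandardCharsets.US_ASCII);
        if (id.length != DOCUMENT_ID_SIZE) {
            throw new IllegalArgumentException("DOCUMENT_ID must be 40 ASCII chars");
        }
        long offset = f.getFilePointer();
        f.write(id);                                        // 40 bytes, fixed
        writeVar(f, title, StandardCharsets.UTF_8);         // variable
        writeVar(f, authors, StandardCharsets.UTF_8);       // "Author_1,Author_2,..."
        writeVar(f, authorIds, StandardCharsets.US_ASCII);  // "AuthorID_1,AuthorID_2,..."
        f.writeShort(year);                                 // 2 bytes
        writeVar(f, journal, StandardCharsets.UTF_8);       // variable
        f.writeDouble(norm);                                // 8 bytes
        f.writeInt(length);                                 // 4 bytes
        f.writeDouble(pageRank);                            // 8 bytes
        return offset;
    }

    // Reads only the fixed-size id at the start of the entry at the given offset.
    static String readDocumentId(RandomAccessFile f, long offset) throws IOException {
        f.seek(offset);
        byte[] id = new byte[DOCUMENT_ID_SIZE];
        f.readFully(id);
        return new String(id, StandardCharsets.US_ASCII);
    }

    // Writes two entries to a temporary file and reads back the id of the second.
    static String demoRoundTrip(String docId) {
        try {
            File tmp = File.createTempFile("documents", ".idx");
            try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
                // The first entry uses the reversed id so the two entries differ.
                String otherId = new StringBuilder(docId).reverse().toString();
                writeEntry(f, otherId, "First title", "A. Author", "a1",
                           (short) 2019, "Some Journal", 1.5, 120, 0.0);
                long second = writeEntry(f, docId, "Second title", "B. Author,C. Author",
                                         "b1,c1", (short) 2020, "Other Journal", 2.5, 340, 0.0);
                return readDocumentId(f, second);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoRoundTrip("0123456789abcdef0123456789abcdef01234567"));
    }
}
```

Because the id sits at a fixed position at the start of each entry, a reader that holds an entry's offset can recover the id with a single seek and a 40-byte read, without decoding any of the variable-length fields.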