Commit 67541ae8 authored by Panagiotis Papadakos

[Comment] It is best to hold the document id also in the posting file

parent 857ffc81
@@ -98,13 +98,13 @@ public class Index {
* =========================================================================
* 3) DOCUMENTS FILE => documents.idx (Random Access File)
*
- * For each entry it stores: | Title (variable bytes / UTF-8) |
- * Author_1,Author_2, ...,Author_k (variable bytes / UTF-8) | AuthorID_1,
- * AuthorID_2, ...,Author_ID_k (variable size /ASCII) | Year (short => 2
- * bytes)| Journal Name (variable bytes / UTF-8) | The weight (norm) of
- * Document (double => 8 bytes)| Length of Document (int => 4 bytes) |
- * PageRank Score (double => 8 bytes => this will be used in the second
- * phase of the project)
+ * For each entry it stores: | DOCUMENT_ID (40 ASCII chars => 40 bytes) |
+ * Title (variable bytes / UTF-8) | Author_1,Author_2, ...,Author_k
+ * (variable bytes / UTF-8) | AuthorID_1, AuthorID_2, ...,Author_ID_k
+ * (variable size /ASCII) | Year (short => 2 bytes)| Journal Name (variable
+ * bytes / UTF-8) | The weight (norm) of Document (double => 8 bytes)|
+ * Length of Document (int => 4 bytes) | PageRank Score (double => 8 bytes
+ * => this will be used in the second phase of the project)
*
* ==> IMPORTANT NOTES
*
......
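
The entry layout described in the updated comment can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class `DocumentEntrySketch` and its methods are hypothetical, and the 4-byte length prefix in front of each variable-size field is an assumption, since the comment does not say how those fields are delimited. Placing the fixed 40-byte DOCUMENT_ID first means it can be read back from an entry's offset without parsing the variable-size fields that follow.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of one documents.idx entry; names are illustrative only.
public class DocumentEntrySketch {

    static final int DOC_ID_BYTES = 40; // from the comment: 40 ASCII chars => 40 bytes

    // Assumption: every variable-size field is stored as [int byte length][bytes].
    static void writeVar(RandomAccessFile f, String s, Charset cs) throws IOException {
        byte[] bytes = s.getBytes(cs);
        f.writeInt(bytes.length);
        f.write(bytes);
    }

    static String readVar(RandomAccessFile f, Charset cs) throws IOException {
        byte[] bytes = new byte[f.readInt()];
        f.readFully(bytes);
        return new String(bytes, cs);
    }

    /** Appends one entry at the end of the file and returns its start offset. */
    static long writeEntry(RandomAccessFile f, String docId, String title,
                           String authors, String authorIds, short year,
                           String journal, double norm, int length,
                           double pageRank) throws IOException {
        byte[] id = docId.getBytes(StandardCharsets.US_ASCII);
        if (id.length != DOC_ID_BYTES) {
            throw new IllegalArgumentException("document id must be 40 ASCII chars");
        }
        long offset = f.length();
        f.seek(offset);
        f.write(id);                                       // DOCUMENT_ID, fixed 40 bytes
        writeVar(f, title, StandardCharsets.UTF_8);        // Title
        writeVar(f, authors, StandardCharsets.UTF_8);      // Author_1,...,Author_k
        writeVar(f, authorIds, StandardCharsets.US_ASCII); // AuthorID_1,...,AuthorID_k
        f.writeShort(year);                                // Year, 2 bytes
        writeVar(f, journal, StandardCharsets.UTF_8);      // Journal name
        f.writeDouble(norm);                               // weight (norm), 8 bytes
        f.writeInt(length);                                // document length, 4 bytes
        f.writeDouble(pageRank);                           // PageRank score, 8 bytes
        return offset;
    }

    /** Reads only the fixed-size document id stored at the start of an entry. */
    static String readDocId(RandomAccessFile f, long offset) throws IOException {
        f.seek(offset);
        byte[] id = new byte[DOC_ID_BYTES];
        f.readFully(id);
        return new String(id, StandardCharsets.US_ASCII);
    }

    /** Reads the title, which directly follows the document id. */
    static String readTitle(RandomAccessFile f, long offset) throws IOException {
        f.seek(offset + DOC_ID_BYTES);
        return readVar(f, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("documents", ".idx");
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "rw")) {
            long offset = writeEntry(f, "0".repeat(40), "A sample title",
                    "A. Author,B. Author", "1,2", (short) 2020,
                    "Sample Journal", 1.5, 1200, 0.0);
            System.out.println(readDocId(f, offset) + " -> " + readTitle(f, offset));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

With this layout, a posting that stores an entry's offset into documents.idx can recover the document id with a single 40-byte read.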