A Study on Quotation of Web Documents Using Reading Annotation and its Applications
Abstract
In this thesis, we propose a new mechanism for quoting Web documents, together with its applications. In our mechanism, we define quotation information as an annotation on documents, described in an XML format. The quotation information includes a pointer to an internal element of the quoted document, a pointer to an internal element of the quoting document, and an attribute describing the purpose of the quotation. We represent the quotation information as a bidirectional hyperlink that connects the internal element of the quoted document with that of the quoting document. Our mechanism benefits document authors who quote online documents, readers, and researchers who perform citation analysis.
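As an illustration only, the quotation information described here could be modeled as a small record holding two pointers and a purpose attribute. The sketch below is not the thesis's actual schema: the class names `Pointer` and `Quotation`, the XML element and attribute names (`quotation`, `quoted`, `quoting`, `href`, `path`, `purpose`), and the example URIs are all assumptions made for this example.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Pointer:
    """Points at an internal element of a document (hypothetical layout)."""
    document_uri: str   # URI of the document
    element_path: str   # locator of the internal element, e.g. an XPath-like string


@dataclass(frozen=True)
class Quotation:
    """One piece of quotation information: two pointers plus a purpose."""
    quoted: Pointer     # element in the document being quoted
    quoting: Pointer    # element in the document that quotes it
    purpose: str        # purpose of the quotation (label values are assumed)

    def to_xml(self) -> ET.Element:
        # Element and attribute names are illustrative, not the thesis's format.
        root = ET.Element("quotation", {"purpose": self.purpose})
        for role, pointer in (("quoted", self.quoted), ("quoting", self.quoting)):
            ET.SubElement(
                root, role,
                {"href": pointer.document_uri, "path": pointer.element_path},
            )
        return root

    def other_end(self, pointer: Pointer) -> Pointer:
        # The link is bidirectional: from either end we can reach the other.
        if pointer == self.quoted:
            return self.quoting
        if pointer == self.quoting:
            return self.quoted
        raise ValueError("pointer is not an end of this quotation")


source = Pointer("http://example.org/original.xml", "/doc/section[2]/p[1]")
target = Pointer("http://example.org/review.xml", "/doc/section[1]/p[3]")
quotation = Quotation(quoted=source, quoting=target, purpose="support")
xml_text = ET.tostring(quotation.to_xml(), encoding="unicode")
```

Because both pointers are stored in the same record, the link can be followed from the quoted element to the quoting element and back, which is the property a bidirectional hyperlink needs.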
We also propose a method that helps users quote documents by means of reading annotations, that is, metadata that associates attributes with arbitrary parts of a document while it is being read. Our proposed system records users' reading annotations and allows a user to retrieve them and easily quote parts of a document when writing a new document. We compared our method with a general retrieval method in several experiments. The results showed that our method was more effective than the general method in terms of retrieval time.
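A minimal sketch of how reading annotations might be recorded and later retrieved for quoting is given below. It is an assumption-laden illustration, not the proposed system: the names `ReadingAnnotation` and `AnnotationStore`, the attribute keys (`importance`, `topic`), and the keyword-plus-attribute matching rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingAnnotation:
    """Attributes attached to one part of a document while reading (hypothetical)."""
    document_uri: str
    element_path: str                      # which part of the document was annotated
    text: str                              # the annotated passage itself
    attributes: dict = field(default_factory=dict)


class AnnotationStore:
    """In-memory store; a real system would presumably persist annotations."""

    def __init__(self):
        self._items = []

    def record(self, annotation: ReadingAnnotation) -> None:
        self._items.append(annotation)

    def retrieve(self, keyword: str = "", **attributes):
        # Keep annotations whose text contains the keyword (case-insensitive)
        # and whose attributes match every requested key/value pair.
        keyword = keyword.lower()
        return [
            a for a in self._items
            if keyword in a.text.lower()
            and all(a.attributes.get(k) == v for k, v in attributes.items())
        ]


store = AnnotationStore()
store.record(ReadingAnnotation(
    "http://example.org/a.xml", "/doc/p[4]",
    "Co-citation counts how often two papers are cited together.",
    {"importance": "high", "topic": "citation analysis"},
))
store.record(ReadingAnnotation(
    "http://example.org/b.xml", "/doc/p[1]",
    "Hyperlinks on the Web are usually unidirectional.",
    {"importance": "low", "topic": "hypertext"},
))
candidates = store.retrieve("co-citation", importance="high")
```

The point of the sketch is that a writer searches only over passages they already marked while reading, which is a much smaller space than the full text of every document.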
In addition, we propose a method for finding similar documents based on co-citations extracted from quotation annotations, that is, the set of quotation information accumulated in our system. Most previous methods proposed in citation analysis treat all citations as contributing equally to similarity; semantic information about the quotation is not reflected in these similarities. We propose a method that takes semantic information about the quotation into account using our quotation annotations. This semantic information includes the distance between quotation parts and the purposes of the quotations. We conducted an experiment to evaluate the effectiveness of our method. Concretely, we classified co-citations based on their semantic information and compared the similarities between documents related by a co-citation. The results showed that our method was more effective than a method that employs conventional measures.
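To make the idea of semantically weighted co-citation concrete, the sketch below scores a pair of co-cited documents higher when the two quotations appear close together in the quoting document and share the same purpose. The weighting formula, the position unit (for example, a paragraph index), the purpose labels, and the function name `weighted_cocitation` are assumptions chosen for illustration; the abstract does not specify the actual measure used in the thesis.

```python
from collections import defaultdict
from itertools import combinations


def weighted_cocitation(quotations, same_purpose_weight=1.0, diff_purpose_weight=0.5):
    """Score co-cited document pairs.

    quotations: iterable of (quoting_doc, quoted_doc, position, purpose), where
    position is where the quotation occurs in the quoting document (assumed unit).
    Returns a dict mapping frozenset({doc_a, doc_b}) to a score.
    """
    by_quoting = defaultdict(list)
    for quoting, quoted, position, purpose in quotations:
        by_quoting[quoting].append((quoted, position, purpose))

    scores = defaultdict(float)
    for items in by_quoting.values():
        for (doc_a, pos_a, pur_a), (doc_b, pos_b, pur_b) in combinations(items, 2):
            if doc_a == doc_b:
                continue  # two quotations of the same document are not a co-citation
            # Assumed weighting: nearer quotations count more ...
            closeness = 1.0 / (1 + abs(pos_a - pos_b))
            # ... and matching purposes count more than differing ones.
            factor = same_purpose_weight if pur_a == pur_b else diff_purpose_weight
            scores[frozenset((doc_a, doc_b))] += closeness * factor
    return dict(scores)


example = [
    ("D", "A", 0, "support"),
    ("D", "B", 1, "support"),
    ("D", "C", 5, "criticism"),
]
scores = weighted_cocitation(example)
```

A conventional co-citation count would give all three pairs in this example the same value of 1, whereas the weighted version ranks the pair quoted side by side for the same purpose above the others.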