A Model for Automatic Abstractive Multidocument Domain-Specific Summarization

Ahmed, Hadia Abbas Mohammed Elsied; Supervisor: Naomie Binti Salim


dc.contributor.author	Ahmed, Hadia Abbas Mohammed Elsied
dc.contributor.author	Supervisor, - Naomie Binti Salim
dc.date.accessioned	2019-05-26T10:13:56Z
dc.date.available	2019-05-26T10:13:56Z
dc.date.issued	2019-03-01
dc.identifier.citation	Ahmed, Hadia Abbas Mohammed Elsied.A Model for Automatic Abstractive Multidocument Domain-Specific Summarization\Hadia Abbas Mohammed Elsied Ahmed;Naomie Binti Salim.-Khartoum:Sudan University of Science & Technology,College of Computer Science and Information Technology,2019.-102p.:ill.;28cm.-Ph.D.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/22661
dc.description	Thesis	en_US
dc.description.abstract	Documents retrieved from the internet through online search often contain a large amount of text. In the context of news, different sources reporting on the same event usually share common components that make up the main story. This study proposes a new multi-document abstractive summarization model based on the SRL-CST technique. The texts are first pre-processed by sentence splitting, tokenization, stop-word elimination, and word stemming. Semantic Role Labeling (SRL) is then applied to each sentence, and the Predicate Argument Structures (PASs) are extracted to serve as the representation of the texts to be summarized. Because the study involves multiple documents, it further investigates the automatic identification of cross-document relations from unannotated text documents, for which a case-based reasoning (CBR) classification model is proposed. Cross-document relations are used to identify highly relevant sentences to include in the summary. Within Cross-document Structure Theory (CST), the researcher suggests merging relations with similar meanings into a single broader relation. Content selection is based on the CST relations that each PAS has with other PASs: each PAS is assigned a score according to the number of relation types it holds, and PASs are then combined according to CST-related rules suggested by the researcher in order to reduce redundancy. Next, the PASs are ranked by document number and by sentence position within that document. Finally, the PASs with scores in the top 20% are selected to form the final summary.
Pyramid evaluation is used to compare the system's summaries with human model summaries, and the results show that, on mean coverage score, the proposed approach (AS-SRL-CST) yields better summarization results.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en	en_US
dc.publisher	Sudan University of Science & Technology	en_US
dc.subject	Automatic Abstractive	en_US
dc.subject	Multidocument	en_US
dc.subject	Domain-Specific	en_US
dc.title	A Model for Automatic Abstractive Multidocument Domain-Specific Summarization	en_US
dc.title.alternative	نموذج للتلخيص التلقائي للوثائق المتعددة في مجال محدد	en_US
dc.type	Thesis	en_US
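
The content-selection step described in the abstract (score each PAS by the number of CST relation types it holds, keep the top 20%, and order the result by document number and sentence position) can be illustrated with a minimal sketch. This is not the thesis implementation: the `PAS` class, the `select_summary` function, the relation names in the usage note, and the exact order of the scoring and ordering steps are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class PAS:
    """Hypothetical container for one predicate argument structure."""
    doc_no: int    # number of the source document
    sent_no: int   # position of the sentence within that document
    text: str      # surface form of the PAS
    relation_types: set = field(default_factory=set)  # CST relation types held


def select_summary(pas_list, ratio=0.2):
    """Keep the top `ratio` of PASs by score, in document/sentence order.

    Assumption: the score is simply the count of distinct CST relation
    types attached to the PAS; the thesis may weight relations differently.
    """
    if not pas_list:
        return []
    # Always keep at least one PAS, rounding the cut-off upwards.
    k = max(1, ceil(len(pas_list) * ratio))
    # Highest score first; ties keep their input order (the sort is stable).
    by_score = sorted(pas_list, key=lambda p: len(p.relation_types), reverse=True)
    # Present the selected PASs in document order, then sentence order.
    return sorted(by_score[:k], key=lambda p: (p.doc_no, p.sent_no))
```

As a usage example, given five PASs of which one holds three relation types (say `Overlap`, `Elaboration`, `Subsumption`) and another holds two, `select_summary(pas_list, ratio=0.4)` would return those two, ordered by document number and sentence position. The redundancy-reducing combination rules mentioned in the abstract are not modelled here.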