dc.contributor.author |
Ahmed, Hadia Abbas Mohammed Elsied |
|
dc.contributor.author |
Supervisor, - Naomie Binti Salim |
|
dc.date.accessioned |
2019-05-26T10:13:56Z |
|
dc.date.available |
2019-05-26T10:13:56Z |
|
dc.date.issued |
2019-03-01 |
|
dc.identifier.citation |
Ahmed, Hadia Abbas Mohammed Elsied. A Model for Automatic Abstractive Multidocument Domain-Specific Summarization / Hadia Abbas Mohammed Elsied Ahmed; Naomie Binti Salim. - Khartoum: Sudan University of Science & Technology, College of Computer Science and Information Technology, 2019. - 102 p.: ill.; 28 cm. - Ph.D. |
en_US |
dc.identifier.uri |
http://repository.sustech.edu/handle/123456789/22661 |
|
dc.description |
Thesis |
en_US |
dc.description.abstract |
Documents retrieved from the internet through online search often contain a large amount of text. In the context of news documents, different news sources reporting on the same event usually share common components that make up the main story. This study aims to provide a new model for multi-document abstractive summarization based on Semantic Role Labeling and Cross-document Structure Theory (SRL-CST). The study first pre-processes the texts, which includes sentence splitting, tokenization, stop-word elimination, and word stemming. It then applies Semantic Role Labeling (SRL) to each sentence and extracts the Predicate Argument Structures (PASs), which serve as the representation of the texts to be summarized.
Since this study involves multiple documents, the research further investigates the automatic identification of cross-document relations from unannotated text documents, for which a case-based reasoning (CBR) classification model is proposed. Cross-document relations are used to identify highly relevant sentences to be included in the summary. In the context of CST, the researcher suggests merging relations with similar meanings into a single broader relation.
Content selection for the summary is made by combining the PASs based on the Cross-document Structure Theory (CST) relations that each PAS has with other PASs. Each PAS is given a score according to the number of relation types it holds, and the PASs are then combined according to CST-related rules suggested by the researcher so as to reduce redundancy. Next, the PASs are ranked using the document number and the sentence position number in that document. Lastly, the PASs with the top 20% of scores are selected to form the final summary. The system summaries are evaluated against human model summaries using Pyramid evaluation, and the results show that, on mean coverage score, the proposed approach (AS-SRL-CST) yields better summarization results. |
en_US |
dc.description.sponsorship |
Sudan University of Science and Technology |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Sudan University of Science & Technology |
en_US |
dc.subject |
Automatic Abstractive |
en_US |
dc.subject |
Multidocument |
en_US |
dc.subject |
Domain-Specific |
en_US |
dc.title |
A Model for Automatic Abstractive Multidocument Domain-Specific Summarization |
en_US |
dc.title.alternative |
نموزج للتلخيص التلقائي للوثائق المتعددة في مجال محدد |
en_US |
dc.type |
Thesis |
en_US |
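
The content-selection step described in the abstract (score each PAS by the number of CST relation types it holds, keep the top 20%, and order by document number and sentence position) can be illustrated with a minimal sketch. This is not the thesis implementation: the `PAS` class, its field names, the `select_summary` function, and the choice to apply the top-20% cut before ordering are all assumptions made for illustration, since the abstract does not give these details. The redundancy-reducing combination rules are not modeled here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PAS:
    """Hypothetical container for one predicate argument structure."""
    text: str
    doc_no: int      # index of the source document
    sent_pos: int    # position of the source sentence in that document
    relations: set = field(default_factory=set)  # CST relation types held


def select_summary(pas_list, ratio=0.2):
    """Keep the top `ratio` of PASs by score, then order them for output.

    Assumption: the score of a PAS is the number of distinct CST relation
    types it holds, and at least one PAS is always kept.
    """
    if not pas_list:
        return []
    k = max(1, math.ceil(len(pas_list) * ratio))
    # Highest-scoring PASs first; sorted() is stable, so ties keep input order.
    top = sorted(pas_list, key=lambda p: len(p.relations), reverse=True)[:k]
    # Present the selected PASs by document number, then sentence position.
    return sorted(top, key=lambda p: (p.doc_no, p.sent_pos))


# Toy input: ten PASs, two of which hold several relation types.
# The relation names below are illustrative labels only.
candidates = [
    PAS("a", doc_no=2, sent_pos=1,
        relations={"Overlap", "Subsumption", "Elaboration"}),
    PAS("b", doc_no=1, sent_pos=3, relations={"Overlap", "Identity"}),
] + [PAS(f"filler{i}", doc_no=3, sent_pos=i) for i in range(8)]

summary = select_summary(candidates)
print([p.text for p in summary])
```

With ten candidates and a ratio of 0.2, two PASs are kept; the one from the earlier document is placed first even though it has the lower score, because ordering is applied after selection in this sketch.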