From Extractive To Abstractive Summarization: A Journey

E-Book Overview

This book describes recent advances in text summarization, identifies remaining gaps and challenges, and proposes ways to overcome them. It begins with one of the most frequently discussed topics in text summarization – ‘sentence extraction’ –, examines the effectiveness of current techniques in domain-specific text summarization, and proposes several improvements. In turn, the book describes the application of summarization in the legal and scientific domains, describing two new corpora that consist of more than 100 thousand court judgments and more than 20 thousand scientific articles, with the corresponding manually written summaries. The availability of these large-scale corpora opens up the possibility of using the now popular data-driven approaches based on deep learning. The book then highlights the effectiveness of neural sentence extraction approaches, which perform just as well as rule-based approaches, but without the need for any manual annotation. As a next step, multiple techniques for creating ensembles of sentence extractors – which deliver better and more robust summaries – are proposed. In closing, the book presents a neural network-based model for sentence compression. Overall the book takes readers on a journey that begins with simple sentence extraction and ends in abstractive summarization, while also covering key topics like ensemble techniques and domain-specific summarization, which have not been explored in detail prior to this.

E-Book Content

Parth Mehta · Prasenjit Majumder From Extractive to Abstractive Summarization: A Journey From Extractive to Abstractive Summarization: A Journey Parth Mehta Prasenjit Majumder • From Extractive to Abstractive Summarization: A Journey 123 Parth Mehta Information Retrieval and Language Processing Lab Dhirubhai Ambani Institute of Information and Communication Technology Gandhinagar, Gujarat, India Prasenjit Majumder Information Retrieval and Language Processing Lab Dhirubhai Ambani Institute of Information and Communication Technology Gandhinagar, Gujarat, India ISBN 978-981-13-8933-7 ISBN 978-981-13-8934-4 https://doi.org/10.1007/978-981-13-8934-4 (eBook) © Springer Nature Singapore Pte Ltd. 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Preface Research in the field of text summarisation has primarily been dominated by investigations of various sentence extraction techniques with a significant focus towa