Inference control in statistical databases, also known as statistical disclosure limitation or statistical confidentiality, seeks to balance the increasing societal need for accurate statistical data against the legal and ethical obligation to protect the privacy of the individuals and enterprises that are the source of the data used to produce statistics. Techniques used by intruders to make inferences compromising privacy increasingly draw on data mining, record linkage, knowledge discovery, and data analysis, and thus statistical inference control becomes an integral part of computer science. This coherent state-of-the-art survey presents some of the most recent work in the field. The papers, presented together with an introduction, are organized in topical sections on tabular data protection, microdata protection, and software and user case studies.
Lecture Notes in Computer Science 2316
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen

Josep Domingo-Ferrer (Ed.)

Inference Control in Statistical Databases
From Theory to Practice

Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editor
Josep Domingo-Ferrer
Universitat Rovira i Virgili
Department of Computer Engineering and Mathematics
Av. Països Catalans 26, 43007 Tarragona, Spain

Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Inference control in statistical databases : from theory to practice / Josep Domingo-Ferrer (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Tokyo : Springer, 2002 (Lecture notes in computer science ; Vol. 2316) ISBN 3-540-43614-6

CR Subject Classification (1998): G.3, H.2.8, K.4.1, I.2.4
ISSN 0302-9743
ISBN 3-540-43614-6 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2002
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Christian Grosche, Hamburg
Printed on acid-free paper

Preface

Inference control in statistical databases (also known as statistical disclosure control, statistical disclosure limitation, or statistical confidentiality) seeks to balance the increasing societal demand for accurate statistical data against the legal and ethical obligation to protect the privacy of the individuals and enterprises that are the source of the data used to produce statistics.
To put it bluntly, statistical agencies cannot expect to collect accurate information from individual or corporate respondents unless those respondents feel that the privacy of their responses is guaranteed. This state-of-the-art survey covers some of the most recent work in the field of inference control in statistical databases. This topic