Principles of Data Integration, Edition 1
By AnHai Doan, Alon Halevy and Zachary Ives

Publication Date: 25 Jun 2012

Principles of Data Integration is the first comprehensive textbook on data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions, enabling you to focus on what is most relevant to the problem at hand. Readers will also learn how to build their own algorithms and implement their own data integration applications.

Written by three of the most respected experts in the field, this book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts.

This text is an ideal resource for database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, and data analysts; students in data analytics and knowledge discovery; and other data professionals working at the R&D and implementation levels.

Key Features

  • Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand
  • Enables you to build your own algorithms and implement your own data integration applications

About the authors
AnHai Doan, Associate Professor in Computer Science at the University of Wisconsin-Madison, who has done consulting work with Microsoft AdCenter Lab and Yahoo Research Lab; Alon Halevy, Head of the Structured Data Group, Google Research, Mountain View, California; and Zachary Ives, Associate Professor at the University of Pennsylvania and a Faculty Member of the Penn Center for Bioinformatics.

Table of Contents



1. Introduction

1.1 What Is Data Integration?

1.2 Why Is It Hard?

1.3 Data Integration Architectures

1.4 Outline of the Book

Bibliographic Notes

Part I: Foundational Data Integration Techniques

2. Manipulating Query Expressions

2.1 Review of Database Concepts

2.2 Query Unfolding

2.3 Query Containment and Equivalence

2.4 Answering Queries Using Views

Bibliographic Notes

3. Describing Data Sources

3.1 Overview and Desiderata

3.2 Schema Mapping Languages

3.3 Access-Pattern Limitations

3.4 Integrity Constraints on the Mediated Schema

3.5 Answer Completeness

3.6 Data-Level Heterogeneity

Bibliographic Notes

4. String Matching

4.1 Problem Description

4.2 Similarity Measures

4.3 Scaling Up String Matching

Bibliographic Notes

5. Schema Matching and Mapping

5.1 Problem Definition

5.2 Challenges of Schema Matching and Mapping

5.3 Overview of Matching and Mapping Systems

5.4 Matchers

5.5 Combining Match Predictions

5.6 Enforcing Domain Integrity Constraints

5.7 Match Selector

5.8 Reusing Previous Matches

5.9 Many-to-Many Matches

5.10 From Matches to Mappings

Bibliographic Notes

6. General Schema Manipulation Operators

6.1 Model Management Operators

6.2 Merge

6.3 ModelGen

6.4 Invert

6.5 Toward Model Management Systems

Bibliographic Notes

7. Data Matching

7.1 Problem Definition

7.2 Rule-Based Matching

7.3 Learning-Based Matching

7.4 Matching by Clustering

7.5 Probabilistic Approaches to Data Matching

7.6 Collective Matching

7.7 Scaling Up Data Matching

Bibliographic Notes

8. Query Processing

8.1 Background: DBMS Query Processing

8.2 Background: Distributed Query Processing

8.3 Query Processing for Data Integration

8.4 Generating Initial Query Plans

8.5 Query Execution for Internet Data

8.6 Overview of Adaptive Query Processing

8.7 Event-Driven Adaptivity

8.8 Performance-Driven Adaptivity

Bibliographic Notes

9. Wrappers

9.1 Introduction

9.2 Manual Wrapper Construction

9.3 Learning-Based Wrapper Construction

9.4 Wrapper Learning without Schema

9.5 Interactive Wrapper Construction

Bibliographic Notes

10. Data Warehousing and Caching

10.1 Data Warehousing

10.2 Data Exchange: Declarative Warehousing

10.3 Caching and Partial Materialization

10.4 Direct Analysis of Local, External Data

Bibliographic Notes

Part II: Integration with Extended Data Representations

11. XML

11.1 Data Model

11.2 XML Structural and Schema Definitions

11.3 Query Language

11.4 Query Processing for XML

11.5 Schema Mapping for XML

Bibliographic Notes

12. Ontologies and Knowledge Representation

12.1 Example: Using KR in Data Integration

12.2 Description Logics

12.3 The Semantic Web

Bibliographic Notes

13. Incorporating Uncertainty into Data Integration

13.1 Representing Uncertainty

13.2 Modeling Uncertain Schema Mappings

13.3 Uncertainty and Data Provenance

Bibliographic Notes

14. Data Provenance

14.1 The Two Views of Provenance

14.2 Applications of Data Provenance

14.3 Provenance Semirings

14.4 Storing Provenance

Bibliographic Notes

Part III: Novel Integration Architectures

15. Data Integration on the Web

15.1 What Can We Do with Web Data?

15.2 The Deep Web

15.3 Topical Portals

15.4 Lightweight Combination of Web Data

15.5 Pay-as-You-Go Data Management

Bibliographic Notes

16. Keyword Search

16.1 Keyword Search over Structured Data

16.2 Computing Ranked Results

16.3 Keyword Search for Data Integration

Bibliographic Notes

17. Peer-to-Peer Integration

17.1 Peers and Mappings

17.2 Semantics of Mappings

17.3 Complexity of Query Answering in PDMS

17.4 Query Reformulation Algorithm

17.5 Composing Mappings

17.6 Peer Data Management with Looser Mappings

Bibliographic Notes

18. Integration in Support of Collaboration

18.1 What Makes Collaboration Different

18.2 Processing Corrections and Feedback

18.3 Collaborative Annotation and Presentation

18.4 Dynamic Data: Collaborative Data Sharing

Bibliographic Notes

19. The Future of Data Integration

19.1 Uncertainty, Provenance, and Cleaning

19.2 Crowdsourcing and “Human Computing”

19.3 Building Large-Scale Structured Web Databases

19.4 Lightweight Integration

19.5 Visualizing Integrated Data

19.6 Integrating Social Media

19.7 Cluster- and Cloud-Based Parallel Processing and Caching



Book details
ISBN: 9780124160446
Page Count: 520
Retail Price: £58.99

Han & Kamber, Data Mining: Concepts and Techniques, 2e (MK 2006). (9781558609013) $74.95

Allemang, Semantic Web for the Working Ontologist (MK 2008). (97801238735560) $69.95/51.95EURO/42.99GBP

Witten/Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2e (MK 2005). (9780120884070) $69.95/51.95EURO/42.99GBP

