BEER: Blocking for Effective Entity Resolution




What is BEER?

Blocking is a key component of Entity Resolution (ER) that improves efficiency by quickly pruning non-matching record pairs. However, depending on the noise in the dataset and the distribution of entity cluster sizes, existing techniques can be either (a) too aggressive, so that they help ER scale but can hurt its effectiveness, or (b) too permissive, which can hurt its efficiency. We propose a new methodology, progressive blocking, that enables ER to be both efficient and effective and that works across different entity cluster size distributions without manual fine-tuning.
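To make the idea of pruning concrete, here is a minimal sketch of plain token blocking in Python. It is a generic illustration, not BEER's own method: the function names `token_blocks` and `candidate_pairs` and the toy records are assumptions made for this example. Records that share a token land in the same block, and only pairs that co-occur in some block are kept as candidates for comparison.

```python
from collections import defaultdict
from itertools import combinations


def token_blocks(records):
    """Map each lowercase token to the sorted ids of the records containing it."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in text.lower().split():
            blocks[token].add(rid)
    return {token: sorted(ids) for token, ids in blocks.items()}


def candidate_pairs(blocks):
    """Collect every pair of record ids that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Hypothetical toy dataset: two entities, two records each.
records = {
    1: "Apple iPhone 12",
    2: "iphone 12 apple",
    3: "Samsung Galaxy S21",
    4: "Galaxy S21 by Samsung",
}

blocks = token_blocks(records)
pairs = candidate_pairs(blocks)
all_pairs = set(combinations(sorted(records), 2))
print(f"{len(pairs)} candidate pairs kept out of {len(all_pairs)}")
```

On clean data like this, blocking keeps only the two true matches. A frequent or noisy token shared by many records would create one large block and bring most pairs back, which is the "too permissive" case described above, while dropping such tokens outright risks the "too aggressive" case.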

BEER (Blocking for Effective Entity Resolution) is an end-to-end system that performs entity resolution in tandem with blocking.

System Architecture

Features

  • BEER presents ways to leverage feedback from partial ER results to fine-tune blocking.
  • BEER constructs refined blocks and re-orders them based on their quality, ensuring ER effectiveness.
  • BEER adapts to the properties of the input dataset with no need for manual configuration.
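The feedback loop described in the features above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not BEER's actual algorithm: the scoring rule (fraction of already-compared pairs in a block that matched, with a neutral prior of 0.5), the comparison `budget`, and the names `progressive_er` and `is_match` are all hypothetical choices made for this example.

```python
from itertools import combinations


def progressive_er(blocks, is_match, budget):
    """Compare pairs block by block, re-ranking blocks after every comparison.

    blocks:   dict mapping a block key (str) to a set of record ids
    is_match: callable(a, b) -> bool, stands in for the pairwise ER step
    budget:   maximum number of pairwise comparisons to spend
    """
    matched, compared = set(), set()

    def remaining(ids):
        # Pairs of this block that have not been compared yet.
        return [p for p in combinations(sorted(ids), 2) if p not in compared]

    def score(ids):
        # Assumed quality estimate: share of compared pairs that matched.
        seen = [p for p in combinations(sorted(ids), 2) if p in compared]
        if not seen:
            return 0.5  # neutral prior for blocks with no feedback yet
        return sum(p in matched for p in seen) / len(seen)

    while len(compared) < budget:
        open_blocks = [key for key, ids in blocks.items() if remaining(ids)]
        if not open_blocks:
            break
        # Prefer high-scoring blocks, then smaller ones; the key breaks ties.
        best = max(
            open_blocks,
            key=lambda key: (score(blocks[key]), -len(blocks[key]), key),
        )
        pair = remaining(blocks[best])[0]
        compared.add(pair)
        if is_match(*pair):
            matched.add(pair)
    return matched, compared


# Hypothetical toy input: "phone" is a noisy block spanning both entities.
blocks = {"apple": {1, 2}, "phone": {1, 2, 3, 4}, "galaxy": {3, 4}}
entity = {1: "a", 2: "a", 3: "b", 4: "b"}
matched, compared = progressive_er(
    blocks, lambda a, b: entity[a] == entity[b], budget=3
)
print(sorted(matched), len(compared))
```

In this toy run both true matches are found within three comparisons, because partial results push productive blocks to the front of the queue. The sketch only re-orders blocks; it does not construct refined blocks as BEER does.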

Video of BEER Demo

Papers

Contributors: Donatella Firmani, Sainyam Galhotra, Barna Saha and Divesh Srivastava

Please reach out to Sainyam (sainyam@cs.umass.edu) if you have any questions.