What is BEER?

Blocking is a key component of Entity Resolution (ER) that aims to improve efficiency by quickly pruning out non-matching record pairs. However, depending on the noise in the dataset and the distribution of entity cluster sizes, existing techniques can be either (a) too aggressive, helping ER scale but hurting its effectiveness, or (b) too permissive, potentially harming ER efficiency. We propose a new methodology of progressive blocking that enables both efficient and effective ER and works across different entity cluster size distributions without manual fine-tuning.
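To make the trade-off concrete, here is a minimal Python sketch of standard token blocking with a block-size cap, where a smaller cap makes blocking more aggressive. This is a common baseline, not BEER's progressive blocking; the records, the ground truth, and the names `token_blocks`, `candidate_pairs`, `recall`, and `max_block_size` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocks(records):
    """Token blocking: map each lower-cased token to the ids of records containing it."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for tok in set(text.lower().split()):
            blocks[tok].add(rid)
    return blocks


def candidate_pairs(blocks, max_block_size=None):
    """Collect all intra-block pairs, skipping blocks larger than max_block_size (if set)."""
    pairs = set()
    for members in blocks.values():
        if max_block_size is not None and len(members) > max_block_size:
            continue  # prune large, less selective blocks
        pairs.update(combinations(sorted(members), 2))
    return pairs


def recall(pairs, true_matches):
    """Fraction of true matching pairs that survive blocking."""
    return len(pairs & true_matches) / len(true_matches)


# Hypothetical records and ground truth
records = {
    1: "Apple iPhone 12",
    2: "Apple iPhone 12 64GB",
    3: "Apple iPhone twelve",  # same entity as 1 and 2, written differently
    4: "Samsung Galaxy S21",
    5: "Samsung Galaxy S21 Ultra",
    6: "Apple Watch",
}
true_matches = {(1, 2), (1, 3), (2, 3), (4, 5)}

blocks = token_blocks(records)
for cap in (None, 3, 2):
    pairs = candidate_pairs(blocks, cap)
    print(cap, len(pairs), recall(pairs, true_matches))
# None 7 1.0
# 3 4 1.0
# 2 2 0.5
```

Of the 15 possible pairs among six records, no cap keeps 7 candidates and every true match; a cap of 2 keeps only 2 candidates but loses half the matches. A cap of 3 works here only because the largest entity cluster has three records, so the best setting depends on the cluster size distribution.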

BEER (Blocking for Effective Entity Resolution) is an end-to-end system that performs blocking and entity resolution in tandem.

System Architecture

Features

BEER leverages feedback from partial ER results to fine-tune blocking.

BEER constructs refined blocks and re-orders them based on their quality, ensuring ER effectiveness.

BEER adapts to the properties of the input dataset with no need for manual configuration.
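The feedback idea behind these features can be illustrated with a small Python sketch. It is not BEER's algorithm: it only re-orders fixed blocks (it does not construct refined ones), and the scoring rule (a Laplace-smoothed match rate), the stand-in matcher `is_match`, the names `block_score` and `progressive_er`, and the data are all hypothetical. After each round of comparisons, blocks are re-ranked by how often their already-resolved pairs matched, so the comparison budget drifts toward productive blocks.

```python
from itertools import combinations


def block_score(members, resolved):
    """Laplace-smoothed match rate among the block's pairs resolved so far (0.5 if none)."""
    seen = [resolved[p] for p in combinations(sorted(members), 2) if p in resolved]
    return (sum(seen) + 1) / (len(seen) + 2)


def progressive_er(blocks, is_match, per_round, rounds):
    """Hypothetical feedback loop: re-rank blocks by block_score before every round,
    then resolve the next per_round unresolved pairs from the best-ranked blocks."""
    resolved = {}  # (id, id) -> bool: the partial ER result used as feedback
    for _ in range(rounds):
        ranked = sorted(blocks, key=lambda b: block_score(blocks[b], resolved), reverse=True)
        # dict.fromkeys de-duplicates pairs shared by several blocks, keeping rank order
        todo = list(dict.fromkeys(
            p for b in ranked for p in combinations(sorted(blocks[b]), 2) if p not in resolved
        ))
        for p in todo[:per_round]:
            resolved[p] = is_match(p)
    return resolved


# Hypothetical data: records 1-3 are one entity, 4-5 another, 6 a singleton.
entity = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C"}
blocks = {
    "phone": {1, 4, 6},  # noisy block mixing three entities
    "apple": {1, 2, 3},
    "galaxy": {4, 5},
}


def is_match(pair):
    """Stand-in for the ER matcher (here, a perfect oracle)."""
    return entity[pair[0]] == entity[pair[1]]


with_feedback = progressive_er(blocks, is_match, per_round=1, rounds=5)
no_feedback = progressive_er(blocks, is_match, per_round=5, rounds=1)
print(sum(with_feedback.values()), sum(no_feedback.values()))  # 4 2
```

With the same budget of 5 comparisons, re-ranking after every comparison finds all 4 matching pairs, while a single up-front ranking spends 3 comparisons in the noisy `phone` block and finds only 2.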

Papers

Efficient and Effective ER with Progressive Blocking.
Sainyam Galhotra, Donatella Firmani, Barna Saha, Divesh Srivastava. Accepted at the VLDB Journal (2021).

Robust Entity Resolution Using a CrowdOracle.
Donatella Firmani, Sainyam Galhotra, Barna Saha, Divesh Srivastava. IEEE Data Eng. Bull. 41(2): 91-103 (2018).

Robust Entity Resolution using Random Graphs.
Sainyam Galhotra, Donatella Firmani, Barna Saha, Divesh Srivastava. SIGMOD Conference 2018: 3-18.

Online Entity Resolution Using an Oracle.
Donatella Firmani, Barna Saha, Divesh Srivastava. PVLDB 9(5): 384-395 (2016).


Contributors: Donatella Firmani, Sainyam Galhotra, Barna Saha and Divesh Srivastava

Please reach out to Sainyam (sainyam@cs.umass.edu) if you have any questions.