Technical Program

Paper Detail

Paper Title Convergence of Chao Unseen Species Estimator
Paper Identifier MO1.R3.3
Authors Nived Rajaraman, Prafulla Chandra, Andrew Thangaraj, Indian Institute of Technology, Madras, India; Ananda Theertha Suresh, Google Research, United States
Session Estimation I
Location Monge, Level 3
Session Time Monday, 08 July, 09:50 - 11:10
Presentation Time Monday, 08 July, 10:30 - 10:50
Abstract Support size estimation and the related problem of unseen species estimation have wide applications in ecology and database analysis. Perhaps the most widely used support size estimator is the Chao estimator. Despite its widespread use, little is known about its theoretical properties. We analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug-in estimator by a factor of $\mathcal{O}((k/n)^2)$. Our main technical contribution is a new method to analyze rational estimators for discrete distribution properties, which may be of independent interest.
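To make the two estimators in the abstract concrete, here is a minimal Python sketch. It assumes the classic Chao1 form $S_{\text{obs}} + f_1^2/(2 f_2)$, where $f_1$ and $f_2$ are the numbers of symbols seen exactly once and exactly twice. It also assumes the common bias-corrected fallback $f_1(f_1-1)/2$ when $f_2 = 0$. The exact variant analyzed in the paper may differ. The plug-in estimator simply counts distinct observed symbols. The helper names (`profile`, `plug_in`, `chao1`, `empirical_mse`) are hypothetical, and the Monte Carlo MSE helper uses a uniform distribution over $k$ symbols purely for illustration.

```python
import random
from collections import Counter


def profile(sample):
    """Return (S_obs, f1, f2): distinct symbols, singletons, doubletons."""
    counts = Counter(sample)                 # symbol -> multiplicity
    freq_of_freq = Counter(counts.values())  # multiplicity -> number of symbols
    return len(counts), freq_of_freq.get(1, 0), freq_of_freq.get(2, 0)


def plug_in(sample):
    """Plug-in support size estimate: number of distinct symbols observed."""
    return len(set(sample))


def chao1(sample):
    """Chao1-style estimate (assumed variant).

    Uses S_obs + f1^2 / (2 f2) when doubletons exist, and the bias-corrected
    term f1 (f1 - 1) / 2 otherwise, to avoid division by zero.
    """
    s_obs, f1, f2 = profile(sample)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def empirical_mse(estimator, k, n, trials, rng):
    """Monte Carlo MSE of `estimator` for n i.i.d. draws, uniform over k symbols."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(k) for _ in range(n)]
        total += (estimator(sample) - k) ** 2
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    k, n = 1000, 500  # undersampled regime: fewer samples than support elements
    print("plug-in MSE:", empirical_mse(plug_in, k, n, 50, rng))
    print("Chao1 MSE:  ", empirical_mse(chao1, k, n, 50, rng))
```

The plug-in count can never exceed $S_{\text{obs}}$, so it underestimates heavily when $n < k$. The Chao correction term extrapolates the number of unseen symbols from the singleton and doubleton counts. This toy simulation only illustrates that intuition and does not reproduce the paper's worst-case bound.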