Privacy is the problem of extracting useful information from data in a way that does not compromise the privacy of individuals who are represented in the data. Algorithmic fairness (i.e., discrimination control) has emerged as an equally important consideration where the goal is to design data-driven algorithms whose performance is agnostic to legally protected attributes such as race or gender. Privacy and fairness have both received extensive attention from multiple communities, resulting in a proliferation of formulations, metrics, designs, and results. This tutorial aims to highlight the key developments on both fronts by taking an information-theoretic lens to the underlying problems: those of quantifying measures for information leakage, formalizing the trade-off problem between guaranteeing privacy/fairness and restricting utility (value of data), and designing mechanisms (mappings) with rigorous privacy/fairness or utility guarantees. At their core, both emerging challenges have a strong information theoretic underpinnings and it is this that will be highlighted throughout the tutorial. The strong connections between information-theoretic methods and theoretical methods in computer science and statistics will also be discussed.
Our proposed tutorial is broadly divided into three parts: The focus in the first part of the tutorial will be on information-theoretic metrics that have been studied with an overarching goal of operationally motivating meaningful measures for leakage in the age of machine learning and data-driven methods. In the second part, the tutorial will first review the privacy-utility trade-off problem; various formulations of privacy/fairness vs. utility tradeoffs will be considered and compared. In particular, In the context of fairness, the tutorial will cover: (i) different mathematical definitions of disparate impact, (ii) an overview of methods for discrimination discovery and control, and (iii) information-theoretic tools useful and relevant in fair machine learning. Finally, in the third and last part, the tutorial will turn its attention to the practical challenge of designing algorithms that ensure privacy and fairness. A particular emphasis will be placed on how learning techniques, in particular generative adversarial models inspired by generative adversarial network designs (GANs), can be used to achieve privacy and fairness. Discussion will also include sample complexity bounds of learning such mechanisms and resilience of privacy mechanisms against arbitrary inferential adversaries. This third part of the tutorial will also contain a period dedicated to discussing industry trends in privacy and fairness, especially recent efforts at Google on implementing local differential privacy and at IBM on achieving fairness.
The tutorial will conclude with a brief breakout session in which participants can discuss their own research problems with the lecturers. Alternatively, if the enrollment permits, there will be a period devoted to collaborative problem formulation involving both the lecturers and the participants. No background in either fairness or privacy or machine learning will be assumed.
Flavio P. Calmon is an Assistant Professor of Electrical Engineering at Harvard's John A. Paulson School of Engineering and Applied Sciences. Before joining Harvard, he was the inaugural Data Science for Social Good Post-Doctoral Fellow at IBM Research in Yorktown Heights, New York. He received his Ph.D. in Electrical Engineering and Computer Science at MIT. His main research interests are information theory and statistics, with applications to fairness, privacy, machine learning, and communications engineering. Prof. Calmon has received the NSF CAREER award, the Google Faculty Research Award, the IBM Open Collaborative Research Award, and Harvard's Lemann Brazil Research Fund Award. He has experience in data science in both industry and academia, and has developed pre-processing algorithms for discrimination control currently implemented in the IBM AI360 platform. His work on fairness in machine learning has been publicized by both industry and Harvard. Prof. Calmon has participated in the semester-long MIT Teaching Certification Program, and his teaching ratings rank among the top of the School of Engineering and Applied Sciences at Harvard.
Lalitha Sankar is an Associate Professor in the School of Electrical, Computer, and Energy Engineering at Arizona State University. Prior to this, she was an Associate Research Scholar at Princeton University and a recipient of a three year Science and Technology Teaching Postdoctoral Fellowship from the Council on Science and Technology at Princeton University. She has also worked at AT&T Labs and Polaroid Engineering R&D Labs. She received the B.Tech degree from the Indian Institute of Technology, Bombay, the M.S. degree from the University of Maryland, and the Ph.D degree from Rutgers University. Her research interests focus on the application of information theory, signal processing, and statistics to study problems of privacy, fairness and learning as well as cybersecurity of cyber-physical systems. She has received the NSF CAREER award, the IEEE Globecom 2011 Best Paper Award, and the 2007-2008 Electrical Engineering Academic Achievement Award from Rutgers University.
Peter Kairouz is a research scientist at Google AI, where he leads efforts on decentralized and privacy-preserving machine learning. Before joining Google, he was a Postdoctoral Research Fellow at Stanford University. He received his Ph.D. in ECE, M.S. in Maths, and M.S. in ECE from the University of Illinois at Urbana-Champaign (UIUC) and his B.E. in ECE from the American University of Beirut (AUB). During his Ph.D., he interned twice at Qualcomm and Google, where he designed privacy-aware utility-optimal unsupervised learning algorithms. His work on data privacy received a lot of media attention and was recently featured on Forbes. He is the recipient of the 2012 Roberto Padovani Scholarship from Qualcomm's Research Center, the 2015 ACM SIGMETRICS Best Paper Award, the 2015 Qualcomm Innovation Fellowship Finalist Award, and the 2016 Harold L. Olesen Award for Excellence in Undergraduate Teaching from UIUC. While at UIUC, he taught several classes on big data and probabilities and was the General Chair for the 10th Annual Coordinated Science Laboratory Student Conference. His research interests are interdisciplinary and span the areas of data and network sciences, privacy-preserving data analysis, machine learning, and information theory.
Aaron Wagner is a professor in the School of Electrical and Computer Engineering at Cornell University. He received the B.S. degree from the University of Michigan, Ann Arbor, and the M.S. and Ph.D. degrees from the University of California, Berkeley. He has received the NSF CAREER award, the David J. Sakrison Memorial Prize from the U.C. Berkeley EECS Dept., the Bernard Friedman Memorial Prize in Applied Mathematics from the U.C. Berkeley Dept. of Mathematics, the James L. Massey Research and Teaching Award for Young Scholars from the IEEE Information Theory Society and teaching awards at the Department, College, and University level at Cornell. He was a plenary speaker at SPAWC 2018.
Non-convex optimization is the core algorithmic workhorse in modern machine learning. Its practical use far exceeds our theoretical understanding of it, yet rigor is key to getting even better performance and algorithms.
The tutorial will have two parts -- part one provides an overview of the exciting recent progress in rigorously understanding non-convex optimization; both first-order methods like gradient descent and its variants, and other approaches. Part two will dig deeper into specific important problem classes, and also describe several directions for future research.
The tutorial should be broadly accessible to the typical ISIT audience; any more specific background will be developed as needed.
Sujay Sanghavi is an Associate Professor of ECE at the University of Texas at Austin, which he joined in 2009. Prior to that he was a postdoc at LIDS in MIT, and has a PhD and two MS degrees (in Math and ECE) from UIUC. Sujay's primary research interest is the development of rigorous new methods and algorithms for machine learning, and their use in solving core problems in industry. Sujay's background is in the fields of optimization, algorithms, stochastic processes and graph theory. Sujay has an NSF Career Award and a DoD Young Investigator Award. He is an Associate Editor for the Journal of Machine Learning Research (JMLR), an Associate Editor of Machine Learning for the Transactions on Information Theory, and a senior TPC member for the NeurIPS and ICML conferences. He has been a Visiting Scientist at Google Research in Mountainview, CA, and a quant and founding member of a portfolio management team in the statistical arbitrage hedge fund Engineers Gate. He is currently also a Principal Research Scientist and Amazon Scholar with Amazon Inc.
Praneeth Netrapalli has been a researcher at Microsoft Research India since 2016. Prior to that he was a postdoc at Microsoft Research New England. He obtained his MS and PhD degrees from The University of Texas at Austin in Electrical and Computer Engineering. His main research interests are in developing provable and efficient algorithms for various problems in machine learning using tools from optimization and probability. He is an area chair of the AISTATS, ICML, COLT and NeurIPS conferences.
One of the major recent advances in theoretical machine learning is the development of efficient learning algorithms for various high-dimensional statistical models. The Achilles heel of these algorithms is the assumption that the samples are precisely generated from the model. This assumption is crucial for the performance of these algorithms: even a very small fraction of outliers can completely compromise the algorithms' behavior.
Recent results in theoretical computer science have led to the development of the first computationally efficient robust estimators for a range of high-dimensional models. The goal of this tutorial is to introduce the broad information theory community to the core insights and techniques in this area of algorithmic robust statistics, and discuss new directions and opportunities for future work.
Ilias Diakonikolas is an Assistant Professor and Andrew and Erna Viterbi Early Career Chair in the Department of Computer Science at USC. He obtained a Diploma in Electrical and Computer Engineering from the National Technical University of Athens and a Ph.D. in Computer Science from Columbia University. Before moving to USC, he was a faculty member at the University of Edinburgh, and prior to that he was the Simons postdoctoral fellow in theoretical computer science at the University of California, Berkeley. His research focus is on the algorithmic and statistical foundations of massive data sets. He is a recipient of a Sloan Faculty Fellowship, an NSF Career Award, a Google Faculty Research Award, a Marie Curie Fellowship, the IBM Research Pat Goldberg Best Paper Award, a Viterbi Junior Faculty Research Award, and an honorable mention in the George Nicholson competition from the INFORMS society.
Private (PIR) protocols make it possible to retrieve a data item from a database without disclosing any information about the identity of the item being retrieved. The notion of PIR was first introduced by Chor, Goldreich, Kushilevitz, and Sudan in 1995 and has attracted considerable attention since in the Theoretical Computer Science community. In information-theoretic k-server PIR, the database is replicated among k non-communicating servers, and each server learns nothing about the item retrieved by the user.
In the original formulation of the PIR problem, the files are assumed to be a single bit and the communication complexity is measured by the total amount of communication, i.e., both upload and download. However, the recent information-theoretic reformulation of the problem assumes the practical scenario in which the files are of arbitrarily large size. Under this setup, the number of uploaded bits can be neglected with respect to the corresponding number of downloaded bits.
This reformulation of the problem introduces the rate of a PIR scheme to be the ratio between the size of the file and the total number of downloaded bits from all servers, and the supremum of achievable rates over all achievable retrieval schemes is defined as the PIR capacity.
Coding and information theory have been extremely successful in characterizing the capacity of PIR as well as proposing codes for capacity-achieving PIR schemes. This topic has been extensively investigated in the past few years for several extensions of the problem such as coded PIR, PIR with colluding servers, robust PIR, byzantine PIR, symmetric PIR, multifile PIR, multi-round PIR, PIR with side information, and several more.
The goal of this tutorial is to furnish the audience with significant results on the topic, both in coding and information theory, and point to fresh problem formulations and open research directions brought by this new paradigm.
Salim El Rouayheb, Rutgers University. Salim El Rouayheb is an assistant professor in the ECE Department at Rutgers University. From 2013 to 2017, he was an assistant professor at the ECE Department at the Illinois Institute of Technology, Chicago. He was a research scholar at the EE Department at Princeton University (2012-2013) and a postdoc at the EECS department at the University of California, Berkeley (2010-2011). He received his Ph.D. degree in electrical engineering from Texas A&M University, College Station, in 2009. He received the Google Faculty Award in 2018 and NSF CAREER award in 2016. His research interests lie in the area of information theoretic security and privacy of data in networks and distributed systems.
Sennur Ulukus, University of Maryland. Sennur Ulukus is a Professor of Electrical and Computer Engineering at the University of Maryland at College Park, where she also holds a joint appointment with the Institute for Systems Research (ISR). Prior to joining UMD, she was a Senior Technical Staff Member at AT&T Labs-Research. She received her Ph.D. degree in Electrical and Computer Engineering from Wireless Information Network Laboratory (WINLAB), Rutgers University, and B.S. and M.S. degrees in Electrical and Electronics Engineering from Bilkent University. She is a fellow of the IEEE and a Distinguished Scholar-Teacher of the University of Maryland. She received the 2003 IEEE Marconi Prize Paper Award in Wireless Communications, an 2005 NSF CAREER Award, the 2010-2011 ISR Outstanding Systems Engineering Faculty Award, and the 2012 George Corcoran Education Award. She is on the Editorial Board of the IEEE Transactions on Green Communications and Networking (2016-). She was an Editor for the IEEE Journal on Selected Areas in Communications-Series on Green Communications and Networking (2015-2016), IEEE Transactions on Information Theory (2007-2010), and IEEE Transactions on Communications (2003-2007). She was a Guest Editor for the IEEE Journal on Selected Areas in Communications (2015 and 2008), Journal of Communications and Networks (2012), and IEEE Transactions on Information Theory (2011). She was a general TPC co-chair of IEEE ISIT 2017, IEEE Globecom 2016, IEEE PIMRC 2014 and IEEE CTW 2011.
Eitan Yaakobi, Technion. Eitan Yaakobi is an Assistant Professor at the Computer Science Department at the Technion — Israel Institute of Technology. He received a PhD. degree in Electrical and Computer Engineering at the University of California, San Diego, under the supervision of Prof. Paul Siegel, Prof. Alexander Vardy, and Prof. Jack Wolf. Between 2011-2014, he was a postdoctoral researcher in the department of Electrical Engineering at the California Institute of Technology, where he worked with Prof. Shuki Bruck and the Center for Memories and Recording Research at the University of California, San Diego, where he worked with Prof. Paul Siegel. He received the Marconi Society Young Scholar Award in 2009 and was a recipient of the Intel Ph.D. fellowship in 2010-2011. His research interests include information and coding theory with applications to non-volatile memories, associative memories, data storage and retrieval, and voting theory.
The high-level goal of this proposal is to provide a coherent introduction to exciting recent developments at the intersection of statistical physics, information theory, and signal processing. Drawing upon our joint expertise in these fields, we will describe the main ideas in the context of three contemporary problems: compressed sensing, matrix factorization, and inference with multilayer (i.e., deep) networks. Specifically, we will focus on: 1) the design of AMP algorithms; 2) connections between AMP algorithms, traditional optimization algorithms, free-energy minimization, and random matrices; 3) the characterization of fundamental problem regimes based on the phase-transition phenomenon; 4) the mapping between canonical high-dimensional inference problems and classical ìspin-glassî models studied in statistical physics; and 5) an overview of recent theoretical advances that provide rigorous and interpretable alternatives to the heuristic replica method from statistical physics.
Philip Schniter is a professor in the Department of Electrical and Computer Engineering at The Ohio State University. He received his PhD in Electrical Engineering from Cornell University in 2000 and held visiting professor positions at Eurecom in 2008, Supelec in 2009, and Duke University in 2016-17. He received the NSF CAREER Award in 2003, the IEEE Signal Processing Society Best Paper Award in 2016 (with Jason Parker and Volkan Cevher), and the Qualcomm Faculty Award in 2017. He was elevated to IEEE Fellow in 2014. Dr. Schniter has published over 50 IEEE journal and conference papers on the theory and application of approximate message-passing algorithms.
Galen Reeves joined the faculty at Duke University in Fall 2013, and is currently an Assistant Professor with a joint appointment in the Department of Electrical Computer Engineering and the Department of Statistical Science. He completed his PhD in Electrical Engineering and Computer Sciences at the University of California, Berkeley in 2011, and he was a postdoctoral associate in the Departments of Statistics at Stanford University from 2011 to 2013. His research interests include information theory and high-dimensional statistics. He received the NSF CAREER award in 2017.
Jean Barbier is an Assistant Professor at the Abdus Salam International Center for Theoretical Physics (ICTP) in Trieste. He received his PhD in 2015 from the Ecole Normale Superieure (ENS) of Paris where he applied statistical physics as well as message-passing algorithms to probabilistic graphical models, with applications ranging from signal processing to combinatorial optimization or coding theory. He then joined the EPFL in Lausanne as a Research Scientist from 2015 until 2018, with a focus on the rigorous aspects of high-dimensional statistical problems. He then occupied the Junior Professor Chair in Data Science at the ENS of Paris before joining the ICTP. His main research interests are in statistical physics and its interdisciplinary connections with statistical inference, information theory and machine learning.
In this tutorial, we first present an information-theoretic framework for cooperative interference management, along with a discussion of practically implementable schemes that approach provable capacity bounds and are applicable in large infrastructural networks. We then go on to discuss three recent advances and their potential impact on next generation wireless networks; namely, exploiting user caching to further empower cooperative interference management, using blockchain-enabled monetary mechanisms to enable distributed coordination in interference networks, and using machine learning techniques for interference management and uncoordinated spectrum access. Our information theoretic framework considers infrastructure networks that are supported by a backhaul network that allows for cooperation between the infrastructure nodes. Further, we focus on the case with limited backhaul rate constraints and strict delay requirements. We show that cooperation among infrastructural nodes, along with flexible message assignments and zero-forcing, can deliver scalable gains in throughput. The schemes that we present rely on local cooperation, and they are shown to be optimal in locally connected networks, where the path loss effect allows us to neglect connections between transmitters and receivers that are far away from each other. We further show that significant gains due to cooperation could be achieved even with no or minimal load on the supporting backhaul network. We also extend these insights to dynamic interference networks that capture the effect of deep fading. We conclude with a discussion of related open problems and research opportunities.
Aly El Gamal received the M.S. degree in Mathematics and the Ph.D. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2013 and 2014, respectively. Currently, he is an assistant professor at Purdue University, where he teaches Machine Learning, Signals and Systems, Data Structures and Digital System Design. He is also leading the Purdue and Texas A&M team (BAM! Wireless) in the second DARPA Spectrum Collaboration Challenge (SC2). BAM! Wireless was among the top 10 teams and the top 5 teams in the first and second preliminary events held in Johns Hopkins Applied Physics Laboratory in December 2017 and 2018, respectively, and received prize awards in both. His current research interests include information theory and machine learning. He worked as an intern at the Office of the Chief Scientist of Qualcomm Inc. in 2012, and a postdoctoral research associate at the University of Southern California in 2014. He was also a Simons postdoctoral fellow at the University of Texas, Austin in 2015.
Venugopal V. Veeravalli received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1992, the M.S. degree from Carnegie-Mellon University in 1987, and the B.Tech degree from Indian Institute of Technology, Bombay (Silver Medal Honors) in 1985. He is currently the Henry Magnuski Professor in the ECE Department at the University of Illinois at Urbana-Champaign, where he also holds appointments with the Department of Statistics, the Coordinated Science Laboratory and the Information Trust Institute. His research interests span the theoretical areas of statistical inference, machine learning, and information theory, with applications to data science, wireless communications, and cyberphysical systems. He is a Fellow of the IEEE, and the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE).