The following exchange is part of an ongoing CS News series in which Cornell Computer Science faculty share some details about their research initiatives and teaching practice. For this session, we speak with assistant professor Immanuel Trummer.
IMMANUEL TRUMMER: At Work Making Database Systems More Efficient and More User-Friendly
Immanuel, can you please give readers a sense of your current research projects and also share a few representative publications on these topics?
My research is about efficiency in data processing. That can be efficiency on the system's side, when trying to reduce computational overheads while processing large data sets. It can also be efficiency on the user's side, trying to save the user's time when working with data via smarter interfaces. In my research, I cover both variants and apply techniques such as optimization, planning, or machine learning to make data processing more efficient.
For instance, in recent work we reduce computational overheads by automatically discovering efficient processing plans via reinforcement learning, or we trade time for precision by working with bit subsets. To save the user's time, we automatically translate text documents to SQL queries for fact checking (e.g., for verifying COVID-19 claims, an application recently covered in the press) or summarize query results via concise natural language descriptions.
To go a little more in-depth on one project, what should we know about your work on reinforcement learning for query processing?
We are currently developing a new database system, SkinnerDB (described in a recent paper selected for the "Best of SIGMOD" edition), that uses reinforcement learning to select better data processing plans. Picking a good plan is crucial in query processing. For the same query, processing time can range from seconds to hours depending on that choice. Unfortunately, it is very difficult to predict processing time for plan candidates before executing them. Current database systems use hand-crafted rules and simplifying models to choose between plans, a brittle approach. In SkinnerDB, we discover promising plans via reinforcement learning instead.
Using machine learning for database tuning is currently a hot topic in the database community. Most work in that space applies learning-based tuning on top of traditional execution engines. SkinnerDB is based on the hypothesis that exploiting the full potential of learning methods requires a redesign of the database system as a whole. For instance, our execution engine allows switching processing plans during execution at a very high frequency. This caters to a reinforcement-learning-based optimizer that learns optimal plans by trying out different options for short periods of time while analyzing their performance.
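To make the idea of "trying out different options for short periods of time" concrete, here is a minimal sketch in Python. It is not SkinnerDB's implementation: it assumes a toy setting in which each candidate plan has a fixed, hypothetical cost per tuple, execution is divided into fixed-size time slices, and progress made under one plan carries over when switching to another. The standard UCB1 bandit rule stands in for the learning component, and the names `PlanStats`, `ucb1_pick`, and `run_query` are invented for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanStats:
    """Running statistics for one candidate plan (hypothetical helper)."""
    pulls: int = 0             # number of time slices given to this plan
    total_reward: float = 0.0  # sum of per-slice rewards, each in [0, 1]


def ucb1_pick(stats, total_pulls, explore=math.sqrt(2)):
    """Return the index of the plan with the highest UCB1 score."""
    # Give every plan one slice before comparing scores.
    for i, s in enumerate(stats):
        if s.pulls == 0:
            return i

    def score(s):
        mean = s.total_reward / s.pulls
        bonus = explore * math.sqrt(math.log(total_pulls) / s.pulls)
        return mean + bonus

    return max(range(len(stats)), key=lambda i: score(stats[i]))


def run_query(cost_per_tuple, total_tuples, slice_budget=100):
    """Simulate slice-by-slice execution with plan switching.

    cost_per_tuple[i] is the assumed number of work units plan i needs
    per tuple. Each slice grants `slice_budget` work units. Returns the
    number of slices used and how many slices each plan received.
    """
    if any(c < 1 or c > slice_budget for c in cost_per_tuple):
        raise ValueError("each cost must be between 1 and slice_budget")

    stats = [PlanStats() for _ in cost_per_tuple]
    processed = 0
    slices = 0
    while processed < total_tuples:
        plan = ucb1_pick(stats, slices)
        # Progress is shared: switching plans does not discard work.
        done = min(slice_budget // cost_per_tuple[plan],
                   total_tuples - processed)
        processed += done
        # Reward: tuples processed per work unit granted in this slice.
        stats[plan].pulls += 1
        stats[plan].total_reward += done / slice_budget
        slices += 1
    return slices, [s.pulls for s in stats]


if __name__ == "__main__":
    used, pulls = run_query([1, 10, 50], total_tuples=10_000)
    print("slices used:", used, "slices per plan:", pulls)
```

In this toy run, the cheapest plan quickly receives most of the slices, while the slower plans are sampled only occasionally. The sketch also hints at why the execution engine matters: the approach only pays off if switching plans is cheap and partial results are preserved across switches.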
One aspect I like about this work is that it adopts a holistic perspective on query planning and execution. Typically, analytical database systems are designed for maximal raw performance, neglecting the impact of execution engine choices on planning complexity. In SkinnerDB, we show that we can strike interesting tradeoffs between performance and ease of optimization, ultimately resulting in more robust query processing performance.
Looking ahead, can you articulate what you foresee as the long-term positive impact of your work on database systems?
Database systems have long promised to shield users from the intricate details of data processing via declarative interfaces. Currently, they make good on that promise only partially, regularly requiring intervention by skilled users to achieve acceptable performance. Systems such as SkinnerDB remove the need for users to know about data processing internals. In parallel projects on novel query interfaces, we remove the need for users to know formal query languages. Ultimately, our work will contribute towards making data analysis more accessible to broader shares of the population via systems that are self-managing and intuitive to use.
Thank you, Immanuel.
For more on Trummer’s work, see his website and related CS News coverage, such as the announcement of his Google Faculty Research Award. For work that specifically addresses the coronavirus pandemic, read more about his new database, CoronaCheck.