Key-Value Stores (2017-now)
Managing large and frequently updated application state efficiently and reliably is a hard problem in the cloud and at the edge. Drawing on lessons from the state management challenges in Trill, I designed and built a new high-performance key-value store called FASTER. FASTER bridges the gap between larger-than-memory and pure in-memory data structures using a novel hybrid log organization. FASTER is now open source, available in C# and C++, with more than 3300 stars on GitHub. Read about the technology in our research paper and on the project website.
FASTER employs a new scalable recovery model called Concurrent Prefix Recovery (CPR), which is also applicable to traditional databases, and avoids the overhead of a separate write-ahead log. Learn about CPR in our new research paper.
Streaming Analytics (2008-now)
Starting in 2012, I led the creation of Trill, a high-performance incremental analytics engine built as a C# .NET library. Trill employs a new “one-size-fits-many” system architecture that provides best-of-breed or better performance across a diverse range of analytics styles and latency needs. You can learn more about Trill from the research paper or from my VLDB slides. Trill is now open source (usage samples). Visit the project website for more information on Trill.
Prior to Trill, I worked on stream processing research in areas such as out-of-order processing, recursive streaming, pattern detection, latency estimation, unifying real-time and offline analytics, and progressive offline analytics and visualization. My work on streams shipped commercially as part of Microsoft SQL Server, as the StreamInsight engine.
Distributed Processing (2014-now)
I am interested in distributed data processing and resilient microservice architectures. Quill is a distributed system that leverages Trill for temporal analytics. CRA is my open-source framework that makes it easy to author resilient distributed platforms and applications such as Quill. Learn about CRA from our technical report and from our short paper to appear at ICDE 2019. Recently, we used CRA to build Ambrosia, a microservices framework based on robust exactly-once message delivery. Read about Ambrosia in our technical report.
Raw Data Processing (2016-now)
I am interested in data processing over raw data such as CSV and JSON, where the schema may not always be known a priori. A core building block for raw data processing is parsing. My colleague Yinan led the creation of a SIMD-based parser for raw data, called Mison. You can read about Mison in our research paper. We also recently designed a new technique that makes it possible to parse large CSV files in parallel. This work will appear at SIGMOD 2019 (details will be posted soon).
Recently, we built a storage layer for flexible-schema data called FishStore. This work will appear at SIGMOD 2019, and I will post more details soon.