Aditya Parameswaran

I am an assistant professor at the University of California, Berkeley, with a joint appointment at the I School and EECS. I am part of the Data Systems & Foundations group and the Human-Computer Interaction group, and I am affiliated with the RISELab and the Berkeley Institute of Design.

My research interests are broadly in building tools for simplifying data analytics, i.e., empowering individuals and teams to leverage and make sense of their datasets more easily, efficiently, and effectively.



We are always looking for postdocs, PhD, MS, and UG students or research/development staff to join our efforts! If you are a postdoc or staff applicant, feel free to email me directly with your CV and qualifications. If you are an aspiring PhD student, please apply to the EECS or I School PhD programs. If you are an MS or UG student, feel free to fill out this form. Note that we rarely work with students outside UC Berkeley, except in cases of unusually good fit.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in the School of Information (I School) and Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley. Until June 2019, Aditya was an Assistant Professor in Computer Science at the University of Illinois, Urbana-Champaign. He spent a year as a postdoc at MIT CSAIL following his PhD at Stanford University. He develops systems and algorithms for "human-in-the-loop" data analytics, synthesizing techniques from data management and human-computer interaction.

Click here for a longer bio.

News

  • August 13, 2021: Doris Lee and Devin Petersohn wrap up their theses on visualization recommendation and dataframe systems, respectively. Congratulations Dr. Lee and Dr. Petersohn! Exciting times ahead!
  • August 9, 2021: Devin Petersohn gives his dissertation talk on dataframe systems. Devin's work has laid the groundwork for how to reason about and optimize dataframe computation. Congrats Devin!
  • August 3, 2021: Stephen Macke wraps up his dissertation! Stephen's work on minimizing error while ensuring interactivity in a range of settings has been a joy to be a part of.
  • July 26, 2021: Tana Wattanawaroon defends his dissertation on supporting efficient computation in spreadsheets, titled "Generalizing Spreadsheet Computation for Evolving Spreadsheets at Scale". Congratulations Tana!
  • July 12, 2021: Our new PhD student, Shreya Shankar, is off to the races with MLTrace, a lightweight approach for instrumenting ML code to allow for end-to-end debugging, reproducibility, and introspection. Super exciting stuff; watch this space!
  • July 1, 2021: Our scalable dataframe system, Modin, has over 6000 GitHub stars and over 1M downloads at this point. We're excited that there is so much organic traction and interest!
  • May 14, 2021: Our technical report on Lux is out! Lux has users across a range of industries at this point, including insurance, retail, and education, and thousands of GitHub stars. Our paper introduces our always-on visualization framework, our lightweight intent language, and how we made Lux interactive when operating on large dataframes.
  • Apr 19, 2021: New Lux release: v0.3.0 supports a relational database backend, geovis, JupyterLab, matplotlib as an alternative to altair, and more! Dashboard export (to streamlit, data pane, HTML, etc.) is forthcoming. More here.
  • Apr 16, 2021: Doris Xin presents her dissertation talk on "Usable and Efficient Systems for Machine Learning"! Congratulations Dr. Xin! Doris is off to be a CEO at a new startup!
  • Apr 13, 2021: Doris Lee's paper in collaboration with folks at Tableau Research (Vidya Setlur, Melanie Tory) on developing a taxonomy for visualization recommendation was accepted at VIS 2021 via TVCG!
  • March 26, 2021: Doris Xin's paper in collaboration with Google folks on understanding production machine learning pipelines in TFX (along with my longtime collaborator Alkis Polyzotis and Hui Miao) was accepted as an industry paper at SIGMOD'21!
  • March 15, 2021: Our paper on leveraging think-time for opportunistic dataframe query evaluation was published in the IEEE Data Engineering Bulletin and showcased at ML and Data Projects to know.
  • February 15, 2021: Thrilled to see continued Lux adoption and interest. We had another industry blog post, yet another one, and hit over 700 stars on GitHub.
  • January 15, 2021: Stephen Macke won the CIDR gong show for presenting his take on the next generation of notebooks! Devin Petersohn also presented his work on scalable dataframes.
  • January 10, 2021: Dual VLDB'21 accepts! Our paper on a general-purpose spreadsheet exploration tool, enabling zoom in/out, called NOAH, led by Sajjadur Rahman, was one. Our paper on NBSafety, a Jupyter kernel for safe notebook interactions, led by Stephen Macke, was another.
  • January 6, 2021: A virtual welcome to our new postdoc, Dixin Tang, who is joining us from UChicago, where he worked with Aaron Elmore and Mike Franklin.
  • January 1, 2021: I took on the role of the Faculty Equity Advisor at the School of Information. See more here.
        Click here for more news.

Synergistic Activities

I serve as the Faculty Equity Advisor at the School of Information. I am a Co-Chair of Workshops for SIGMOD 2020 and 2021. Please send us your exciting and engaging community-building ideas! Interdisciplinary/novel workshop ideas encouraged. Here is the call for proposals. I am the US Sponsor Chair for VLDB 2021. Please reach out if you'd like to sponsor VLDB 2021!

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI - join us!

In the recent past, I served as an Area/Associate Chair for HCOMP 2020, VLDB 2020, and SIGMOD 2020, and as a Program Committee member for VLDB Demo 2019 and HILDA 2019 (phew!). I've also served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

Past: I stepped down as Associate Editor for SIGMOD Record after nearly half a decade. I co-organized HILDA 2017. I was the SIGMOD 2016 Undergraduate Research Chair.

Selected Projects

Lux: An always-on visualization recommendation system

Lux is a tool for effortlessly visualizing insights from very large data sets in dataframe workflows. Lux builds on half a decade of work on visualization recommendation systems.

Project page: here


Helix: An Accelerated Human-in-the-loop Machine Learning System

Helix accelerates the iterative development of machine learning pipelines with a human developer "in the loop" via intelligent assistance and reuse.

Project page: here


DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page: here


Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page: here


Populace: A Suite of Crowd-Powered Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing: an understudied and challenging class of problems.

Project page: here