Hello! I am an AI researcher and PhD student working on deep learning model explainability at École Polytechnique Fédérale de Lausanne (EPFL). I'm co-advised by Prof. Tanja Käser at the ML4Ed Lab and Prof. Martin Jaggi at the MLO Lab.
Before moving to Switzerland, I worked for two years at Microsoft AI as a lead engineer for the Open Neural Network Exchange (ONNX) project.
My claim to fame (haha) is that I graduated at 20 as the youngest recipient of an M.S. in Computer Science in UC Berkeley's history. Since then, I've served as a machine learning lecturer for the Berkeley Division of Data Sciences and the University of Washington CSE Department.
I love people, data, and working on exciting problems at the intersection of the two.
Thank you for taking time out of your day to find out what I do with mine!
AIED Program Committee 2023
AIED 2021*, 2022* (Subreviewer for Tanja Käser)
EMNLP BlackBoxNLP 2021, 2022, 2023
Journal of Educational Data Mining (JEDM) 2022
LAK 2022*, 2023* (Subreviewer for Tanja Käser)
Editor for Springer Series on Big Data Management (Educational Data Science)
Fairness Working Group @ EDM 2022
WiML Workshop Team @ NeurIPS 2021
Lead of the 2020 ONNX SIG for Models and Tutorials
We implement five state-of-the-art methodologies for explaining black-box machine learning models (LIME, PermutationSHAP, KernelSHAP, DiCE, CEM) on the downstream task of student performance prediction for five massive open online courses. Our experiments demonstrate that the families of explainers do not agree with each other on feature importance for the same Bidirectional LSTM models with the same representative set of students. We use Principal Component Analysis, Jensen-Shannon distance, and Spearman's rank-order correlation to quantitatively cross-examine explanations across methods and courses. Our results lead to a concerning conclusion: the choice of explainer is paramount to the interpretation of the predictive results, even more so than the course the model is trained on.
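To illustrate the kind of cross-examination described above, here is a minimal sketch, using hypothetical feature-importance scores rather than the paper's data: two explainers' importance vectors are normalized and compared with Jensen-Shannon distance and Spearman's rank-order correlation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import spearmanr

# Hypothetical absolute feature-importance scores from two explainers,
# for the same model and the same set of features.
lime_scores = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
shap_scores = np.array([0.10, 0.35, 0.15, 0.30, 0.10])

# Normalize to probability distributions before computing the distance.
p = lime_scores / lime_scores.sum()
q = shap_scores / shap_scores.sum()

# Jensen-Shannon distance in base 2: 0 = identical, 1 = maximally different.
js_distance = jensenshannon(p, q, base=2)

# Spearman's rank-order correlation compares the feature *rankings*.
rank_corr, _ = spearmanr(lime_scores, shap_scores)

print(f"Jensen-Shannon distance: {js_distance:.3f}")
print(f"Spearman rank correlation: {rank_corr:.3f}")
```

A low rank correlation with a high JS distance signals the disagreement between explainer families that the paper reports.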
Authors: Vinitra Swamy, Bahar Radhmehr, Natasa Krco, Professor Mirko Marras, and Professor Tanja Käser. Talk and paper featured at Educational Data Mining (EDM) 2022 in Durham, UK. [Paper] [Pre-Print] [Slides] [Code]
Our goal is to validate explainers for student success prediction across controlled differences in online and blended learning course design. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods, then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that quantitatively, explainers significantly disagree with each other about what is important, and qualitatively, experts themselves do not agree on which explanations are most trustworthy.
Authors: Vinitra Swamy, Sijia Du, Professor Mirko Marras, and Professor Tanja Käser. Nominated for best full paper at Learning Analytics and Knowledge (LAK) 2023 in Arlington, Texas. [Paper] [Pre-Print] [Code]
RIPPLE utilizes irregular multivariate time series modeling with graph neural networks to achieve comparable or better accuracy with raw time series clickstreams in comparison to hand-crafted features. Furthermore, we extend concept activation vectors for interpretability in raw time series models. Our experimental analysis on 23 MOOCs with millions of combined interactions over six behavioral dimensions shows that models designed with our approach can (i) beat state-of-the-art educational time series baselines with no feature extraction and (ii) provide interpretable insights for personalized interventions.
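A hedged sketch of the concept activation vector (CAV) idea referenced above, on synthetic activations rather than RIPPLE's models: fit a linear classifier to separate hidden activations of "concept" examples from random examples; the normal of the decision boundary is the CAV, and projecting a new activation onto it gives a concept-alignment score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic hidden-layer activations (stand-ins for a trained model's):
# "concept" examples are shifted along every dimension, random ones are not.
concept_acts = rng.normal(loc=1.0, size=(100, 16))
random_acts = rng.normal(loc=0.0, size=(100, 16))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)

# The CAV is the (unit-normalized) normal of the linear decision boundary.
clf = LogisticRegression().fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Concept sensitivity of a new activation: its projection onto the CAV.
new_act = rng.normal(loc=0.5, size=16)
score = float(new_act @ cav)
print(f"concept alignment score: {score:.3f}")
```

In the time-series setting, the activations would come from the graph neural network's hidden states, and the concepts from the six behavioral dimensions.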
Authors: Mohammad Asadi (intern), Vinitra Swamy, Jibril Frej, Julien Vignoud, Mirko Marras, Tanja Käser. Talk and paper featured at AAAI 2023 in Washington, DC. [Paper] [Pre-Print] [Slides] [Code]
We tackle the problem of transferability across MOOCs from different domains and topics, focusing on models for early success prediction. In this paper, we present and analyze three novel strategies for creating generalizable models: 1) pre-training a model on a large set of diverse courses, 2) leveraging the pre-trained model by including meta features about courses to orient downstream tasks, and 3) fine-tuning the meta transfer learning model on previous course iterations. Our experiments on 26 MOOCs with over 145,000 combined enrollments and millions of interactions show that models combining interaction data and course information have comparable or better performance than models which have access to previous iterations of the course. With these models, we aim to effectively enable educators to warm-start their predictions for new and ongoing courses.
Authors: Vinitra Swamy, Professor Mirko Marras, and Professor Tanja Käser. Talk and paper published at ACM Learning @ Scale 2022 in NYC. [Paper] [Pre-Print] [Slides] [Code]
While transformer-based language models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. In this paper, we compare BERT-based language models (DistilBERT, BERT, RoBERTa) through snapshots of acquired knowledge at sequential stages of the training process. We contribute a quantitative framework to compare language models through knowledge graph extraction and showcase a part-of-speech analysis to identify the linguistic strengths of each model variant. Using these metrics, machine learning practitioners can compare models, diagnose their models' behavioral strengths and weaknesses, and identify new targeted datasets to improve model performance.
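To illustrate the comparison idea (not the paper's extraction pipeline), here is a toy sketch that assumes each model's knowledge snapshot has already been reduced to a set of (subject, relation, object) triples; the Jaccard overlap between two snapshots gives one simple similarity score. The triples below are invented for illustration.

```python
# Toy knowledge snapshots as (subject, relation, object) triples;
# in practice these would be extracted from each model's predictions.
bert_triples = {
    ("paris", "capital_of", "france"),
    ("water", "composed_of", "hydrogen"),
    ("dante", "born_in", "florence"),
}
distilbert_triples = {
    ("paris", "capital_of", "france"),
    ("dante", "born_in", "rome"),  # a divergent "belief"
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two triple sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

similarity = jaccard(bert_triples, distilbert_triples)
print(f"snapshot overlap: {similarity:.2f}")  # -> 0.25
```

Tracking this overlap across training snapshots, or restricted to triples whose objects share a part of speech, gives the kind of behavioral comparison the framework supports.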
Authors: Vinitra Swamy, Angelika Romanou, and Professor Martin Jaggi. Talk, poster, and paper published at eXplainable AI for Debugging and Diagnosis at NeurIPS 2021. [Paper] [Poster] [Code]
Open Neural Network Exchange (ONNX) is an open standard for machine learning interoperability. Founded by Microsoft and Facebook, and now supported by over 30 other companies, ONNX defines a common set of operators (the building blocks of machine learning and deep learning models) and a common file format, enabling AI developers to use models with a variety of frameworks, tools, runtimes, and compilers. [ONNX + Azure ML Tutorials]
Gave a talk to data scientists and engineers at MLADS Spring 2019 on model operationalization and acceleration with ONNX alongside Emma Ning, Spandan Tiwari, Nathan Yan, and Lara Haidar-Ahmad.
Overview of AI model interoperability with ONNX and ONNX Runtime for data scientists and researchers at University of Washington, Seattle.
A detailed research report on autograding, analytics, and scaling JupyterHub infrastructure highlighted in use for thousands of students taking Data 8 at UC Berkeley. Presented as a graduate student affiliated with RISELab.
Knowledge Tracing is a body of learning science literature that seeks to model student knowledge acquisition through their interaction with coursework. This project uses a recurrent neural network (LSTM) to optimize prediction of student performance in large scale computer science classes.
Authors: Vinitra Swamy, Samuel Lau, Allen Guo, Madeline Wu, Wilton Wu, Professor Zachary Pardos, and Professor David Culler. Short paper published at Artificial Intelligence in Education / International Festival of Learning 2018 in London, England.
Helped develop UC Berkeley data science's software infrastructure stack, including JupyterHub, autograding with OkPy, Gradescope, and authentication for thousands of students.
Collaborated with Yuvi Panda, Ryan Lovett, Chris Holdgraf, and Gunjan Baid on a talk detailing the infrastructure stack at JupyterCon 2017.
Partner of the Campus Fund, a student venture arm of Wingman Ventures. We evaluate and invest pre-seed in startups and learn about VC from talented mentors and entrepreneurs. The Campus Fund operates in Switzerland through ETH Zurich, EPFL and HSG.
Worked on ONNX, a standard for deep learning / ML framework interoperability, alongside an ecosystem of converters, containers, and inference engines.
Lead of the inter-company ONNX Special Interest Group (SIG) for Model Zoo and Tutorials with Microsoft, Intel, Facebook, IBM, NVIDIA, Red Hat, and other academic and industry collaborators.
Attended several conferences as a representative of Microsoft AI: WIDS 2020, Microsoft //Build 2019, KDD 2019, Microsoft Research Faculty Summit 2019, UC Berkeley AI for Social Impact Conference 2018, Women in Cloud Summit 2018, RISECamp 2018
Worked on projects in AI + Systems with an application area of data science education. Project areas include JupyterHub architecture, custom deployments, OkPy autograding integration, Jupyter notebook extensions, and D3.js / Plotly visualizations for data science explorations of funding and enrollment data.
Worked on the CSi2 project as a Machine Learning Research Scientist intern on the Hybrid Cloud team. Presented an exit talk and filed two patents.
Interned at LinkedIn headquarters with the Growth Division's Search Engine Optimization (SEO) Team the summer before entering UC Berkeley. Worked on full-stack testing infrastructure for the public profile pages as well as a Hadoop project; outside of assigned work, helped plan LinkedIn's DevelopHER Hackathon and worked on several Market Research / User Experience Design initiatives.
Spent a summer learning computer science fundamentals and shadowing engineers through the CAPE high school internship program at Google Headquarters in Mountain View, CA. Chosen as a Google Ambassador for Computer Science following the experience. Worked with Google, Salesforce, and AT&T to introduce coding to over 15,000 girls across California with the Made w/ Code Initiative.