HCI · Accessibility · Input Devices
A mouse-shaped input device with three independently rotatable wheels for faster GUI navigation.
Computer Vision · VLMs · Evaluation
A heatmap-based tool for evaluating VLM reliability without ground truth across video frames.
HCI · Accessibility · Mobile
An automatic space compactor for low-vision smartphone users, reducing whitespace under magnification.
HCI · Accessibility · Desktop
A probabilistic framework for estimating perceived accessibility of desktop apps in non-visual interaction.
Accessibility · Computer Vision
A taxonomy and benchmark of 90 objects crucial for blind and low-vision individuals' navigation.
LLMs · Education · Agentic AI
A teacher-guided classroom LLM assistant that adapts per assignment with instructor-defined guardrails.
Computer Vision · Video Segmentation
A model-agnostic temporal consistency loss improving VOS robustness under occlusion and reappearance.
HCI · Accessibility · VR/XR
An open-source accessibility API for 3D and VR apps that enables accessibility features to be retrofitted into existing Unity games.
Computer Vision · 3D Scenes · LLMs
A structured motion grammar replacing pixel inputs with engine telemetry for 3D scene reasoning.
Computer Vision · Video Analysis · Systems
A local-first video investigation pipeline for grounded text-to-timestamp retrieval and event detection.
LLMs · PEFT · Machine Unlearning
A parameter-efficient unlearning pipeline using LoRA to remove targeted behaviors from LLMs.
LLMs · Clinical Drafting · Accessibility
An LLM-based analyzer for generating PAALSS-style reports from Spanish aided AAC transcripts via Ollama.
Computer Vision · Vision-Language Models · Evaluation
Problem
Large multimodal models (LMMs) like GPT-4V can process text, images, and video — but unlike humans, their outputs often lack common sense and can be inconsistent across modalities. In multi-label video object recognition, a model might identify an object in one frame but miss it in the next, even when frames are nearly identical.
Automated metrics like F1 or average precision cannot capture such inconsistencies, making human evaluation essential. But there was no simple tool that let non-experts assess whether a model was actually performing reliably.
Solution
IKIWISI (I-Know-It-When-I-See-It) is a lightweight, intuitive tool that visualizes LMM prediction inconsistencies across video frames via binary heatmaps (Figure 1):
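The core visualization can be sketched in a few lines. This is an illustrative toy, not IKIWISI's actual implementation: per-object binary predictions across frames are rendered as a text heatmap, and a simple flip-rate score quantifies how often a prediction changes between adjacent frames (the names `flip_rate` and `render_heatmap` are assumptions).

```python
# Toy sketch of a binary prediction heatmap with a flip-rate
# inconsistency score. '#' = object detected, '.' = object missed.

def flip_rate(row):
    """Fraction of adjacent frame pairs where the prediction flips."""
    if len(row) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(row, row[1:]) if a != b)
    return flips / (len(row) - 1)

def render_heatmap(predictions):
    """Render per-object binary predictions as rows of a text heatmap."""
    lines = []
    for obj, row in predictions.items():
        cells = "".join("#" if v else "." for v in row)
        lines.append(f"{obj:>8} {cells}  flip_rate={flip_rate(row):.2f}")
    return "\n".join(lines)

preds = {
    "person": [1, 1, 1, 1, 1, 1],
    "dog":    [1, 0, 1, 0, 1, 0],   # flickers on near-identical frames
    "car":    [0, 0, 1, 1, 1, 1],
}
print(render_heatmap(preds))
```

A high flip rate on visually stable frames is exactly the kind of inconsistency that F1 or average precision averages away but a human spots instantly in the heatmap.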
Outcomes
Impact
IKIWISI is among the first tools to enable intuitive inconsistency diagnosis across LMM outputs without requiring code or model access. It supports faster and more inclusive evaluation workflows for researchers, QA teams, and designers, and can be integrated into model audit pipelines and prompt refinement tools.
HCI · Accessibility · Assistive Technology
Problem
Blind users often face major challenges when navigating modern graphical user interfaces, especially when interface elements are deeply nested, unlabeled, or visually grouped without accessible structure.
Traditional screen readers and keyboard shortcuts are often linear and slow, making it difficult to work efficiently in complex software or perform tasks such as remote technical support. These barriers can increase reliance on sighted assistance and reduce digital autonomy.
Solution
Wheeler is a mouse-shaped device with three independently rotatable wheels, where each wheel controls a different axis or level of navigation (Figure 1).
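The interaction model can be illustrated with a small sketch. This is a toy, not Wheeler's firmware: one wheel cycles among siblings at the current level, a second wheel descends into or ascends out of the focused element, so hierarchy traversal becomes parallel rotary motion rather than long linear keypress sequences (class and method names are assumptions).

```python
# Toy three-wheel navigator over a nested menu tree. In the real device,
# rotations of physical wheels would drive these calls.

class WheelNav:
    def __init__(self, root):
        self.stack = [(root, 0)]   # (parent node, focused child index) per level

    def focus(self):
        node, i = self.stack[-1]
        return node["children"][i]["name"]

    def wheel_siblings(self, delta):
        """Wheel 1: rotate among siblings at the current level."""
        node, i = self.stack[-1]
        self.stack[-1] = (node, (i + delta) % len(node["children"]))

    def wheel_descend(self):
        """Wheel 2 inward: descend into the focused element."""
        node, i = self.stack[-1]
        child = node["children"][i]
        if child.get("children"):
            self.stack.append((child, 0))

    def wheel_ascend(self):
        """Wheel 2 outward: ascend back to the parent level."""
        if len(self.stack) > 1:
            self.stack.pop()

menu = {"name": "app", "children": [
    {"name": "File", "children": [{"name": "Open"}, {"name": "Save"}]},
    {"name": "Edit", "children": [{"name": "Undo"}]},
]}
nav = WheelNav(menu)
nav.wheel_siblings(+1)   # File -> Edit
nav.wheel_descend()      # enter Edit
print(nav.focus())       # Undo
```

Because each wheel owns one axis of motion, reaching a deeply nested item takes a few rotations instead of tabbing through every intermediate element.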
Evaluation
Outcomes
Impact
Wheeler introduces a new rotary, parallel input paradigm for non-visual interaction. It enables faster and more structured access to complex application hierarchies, supports interaction with otherwise inaccessible interface components, and offers a low-cost augmentative tool that works alongside screen readers.
HCI · Accessibility · Mobile Interfaces
Problem
Low-vision users often rely on screen magnifiers to interact with smartphone interfaces. But conventional magnifiers enlarge both useful content and redundant whitespace, which forces excessive panning and frequently causes loss of context.
Because these tools do not reason about layout structure, they can make everyday mobile interaction slower, more effortful, and more cognitively demanding.
Solution
SpaceXMag is an optimization-based system that automatically compacts whitespace in smartphone UIs while preserving structural relationships between interface elements.
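A one-dimensional sketch conveys the core idea, under the simplifying assumption that elements sit on a single axis (the real system optimizes 2-D DOM layouts): gaps between consecutive elements are shrunk to at most `max_gap` pixels while element order and sizes are preserved.

```python
# Toy 1-D whitespace compaction: clamp inter-element gaps while
# preserving order and element widths. Illustrative only.

def compact(elements, max_gap=8):
    """elements: list of (start, width) sorted by start. Returns new starts."""
    out = []
    cursor = 0
    prev_end = None
    for start, width in elements:
        gap = start if prev_end is None else start - prev_end
        cursor += min(gap, max_gap)     # keep a small gap, drop the excess
        out.append(cursor)
        cursor += width
        prev_end = start + width
    return out

rows = [(0, 40), (100, 40), (300, 40)]   # large whitespace gaps
print(compact(rows))                      # starts move much closer together
```

Under magnification, shrinking the 60- and 160-pixel gaps above to 8 pixels means far less panning to move between the same three elements.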
Evaluation
Outcomes
Impact
SpaceXMag demonstrates that DOM-aware, optimization-driven layout restructuring can make magnification substantially more usable in real-world mobile interfaces. It points toward smarter adaptive accessibility tools that preserve context instead of simply enlarging pixels.
LLMs · Education · Classroom Systems
Problem
In classroom settings, general-purpose LLMs can easily over-help, provide direct solutions too quickly, or ignore the structure and expectations of a specific assignment. This makes it hard for instructors to use AI support without undermining students' own reasoning process.
Instructors also need a practical way to adapt AI behavior across assignments, manage student access, and review interactions in a structured classroom environment.
Solution
TeachPilot (shown in the demo above) is a teacher-guided LLM assistant that changes behavior per assignment and applies instructor-defined guardrails before sending requests to the model.
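The guardrail flow can be sketched as a gate in front of the model. The field names and blocked-phrase list below are illustrative assumptions, not TeachPilot's actual schema: an instructor-authored assignment config shapes the system prompt, and requests that ask for full solutions are stopped before they ever reach the LLM.

```python
# Toy per-assignment guardrail gate. Config fields and phrase list
# are hypothetical; a real system would use richer policies.

BLOCKED = ("full solution", "complete answer", "write the whole")

def build_system_prompt(assignment):
    rules = "\n".join(f"- {r}" for r in assignment["guardrails"])
    return (f"You are a tutor for '{assignment['title']}'.\n"
            f"Follow these instructor rules strictly:\n{rules}")

def gate_request(assignment, student_msg):
    """Return (allowed, payload). Blocked requests never reach the model."""
    low = student_msg.lower()
    if any(phrase in low for phrase in BLOCKED):
        return False, "Ask for a hint or a concept explanation instead."
    return True, {"system": build_system_prompt(assignment),
                  "user": student_msg}

hw = {"title": "Recursion HW2",
      "guardrails": ["Give hints, never full code", "Ask guiding questions"]}
ok, payload = gate_request(hw, "Please write the whole solution for me")
print(ok)   # False: the request is blocked client-side
```

Because the gate and prompt are derived from the assignment config, the same assistant behaves differently on each assignment without any model changes.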
System Design
Outcomes
Impact
TeachPilot shows how classroom AI can be structured as an instructor-shaped system rather than a generic chatbot. By combining assignment-aware prompting, role-based controls, and persistent conversation management, it supports more accountable and pedagogically aligned AI use in education.
HCI · Accessibility · Computer Vision
Problem
Blind and low-vision (BLV) individuals face major challenges when navigating urban and indoor environments, but existing computer vision datasets often miss many objects that are critical for safe movement.
Common datasets such as ImageNet and MS-COCO do not adequately represent hazards and navigation-relevant objects like overhanging branches, sidewalk pits, or bus stops, limiting the usefulness of current vision systems for accessibility.
Solution
This project created an accessibility-centered object taxonomy and benchmark specifically for BLV navigation.
Outcomes
Impact
This work highlights a major gap between state-of-the-art vision systems and the real object awareness needed for safe BLV navigation. By releasing a benchmark dataset and object taxonomy grounded in accessibility needs, it lays the foundation for more inclusive navigation aids and future accessibility-focused vision models.
HCI · Accessibility · Non-Visual Interaction
Problem
Blind screen reader users rely on keyboard navigation to interact with desktop applications, but navigating menus, toolbars, dialogs, and nested interface structures is often slow, inconsistent, and frustrating.
Existing accessibility evaluation methods usually focus on compliance checks or labor-intensive user studies. They do not provide a scalable way to estimate perceived accessibility — how accessible an application actually feels to blind users during real non-visual interaction.
Solution
This project introduces a probabilistic interaction framework for estimating perceived accessibility from the structure of desktop interfaces and keystroke-based navigation behavior.
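A minimal sketch of the underlying intuition, not the paper's actual model: if screen-reader traversal is approximated as linear Tab order over the interface tree, the expected number of keystrokes to reach a uniformly random element is a rough proxy for how accessible the interface feels (function names are assumptions).

```python
# Toy keystroke-cost estimate over a UI tree, assuming linear
# depth-first traversal as in basic screen-reader navigation.

def flatten(node, depth=0):
    """Depth-first order approximates linear screen-reader traversal."""
    yield (node["name"], depth)
    for child in node.get("children", []):
        yield from flatten(child, depth + 1)

def expected_keystrokes(root):
    order = list(flatten(root))
    # reaching the i-th element in traversal order costs i keypresses
    return sum(i for i, _ in enumerate(order)) / len(order)

shallow = {"name": "app",
           "children": [{"name": f"btn{i}"} for i in range(4)]}
print(expected_keystrokes(shallow))   # average position in the tab order
```

A probabilistic framework generalizes this by weighting targets by how often users actually visit them, so rarely used but keystroke-expensive elements do not dominate the estimate.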
Outcomes
Impact
This work provides an automated, user-centered way to estimate perceived accessibility for non-visual desktop interaction. It helps developers and auditors compare applications, identify concrete navigation bottlenecks, and move accessibility evaluation toward scalable, interface-aware assessment rather than compliance alone.
HCI · Accessibility · XR Systems
Problem
Blind and low-vision users face major barriers in Unity-based 3D and VR applications because these apps often lack screen reader support, accessible UI components, and effective keyboard navigation.
Existing accessibility plugins are fragmented and visually oriented, and they usually must be built into the game during development, which makes them hard to retrofit and inadequate for non-visual interaction in existing immersive applications.
Solution
OpenAccAPI is an extensible accessibility API for 3D and VR applications that exposes UI and scene elements to assistive technologies and supports retrofitting accessibility into existing Unity apps.
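The retrofit idea can be sketched as a runtime registry, here in Python for illustration (the real API targets Unity/C#, and all class and method names below are assumptions): existing scene objects are wrapped in accessibility nodes that assistive technologies can enumerate, describe, and query spatially without modifying the objects themselves.

```python
# Toy accessibility registry for a 3-D scene: wraps scene objects in
# labeled nodes that a screen reader could announce or query.

class AccessibilityNode:
    def __init__(self, obj_id, role, label, position):
        self.obj_id = obj_id
        self.role = role            # e.g. "button", "door"
        self.label = label          # human-readable name to announce
        self.position = position    # (x, y, z) in scene coordinates

class AccessibilityRegistry:
    def __init__(self):
        self._nodes = {}

    def register(self, node):
        self._nodes[node.obj_id] = node

    def describe(self, obj_id):
        n = self._nodes[obj_id]
        x, y, z = n.position
        return f"{n.role} '{n.label}' at ({x}, {y}, {z})"

    def nearest(self, origin):
        """Non-visual navigation aid: closest labeled element to origin."""
        def dist2(n):
            return sum((a - b) ** 2 for a, b in zip(n.position, origin))
        return min(self._nodes.values(), key=dist2).label

reg = AccessibilityRegistry()
reg.register(AccessibilityNode("door1", "door", "Exit door", (2, 0, 5)))
reg.register(AccessibilityNode("btn1", "button", "Start", (0, 1, 1)))
print(reg.nearest((0, 0, 0)))   # Start
```

Because registration happens at runtime, accessibility can be layered onto a shipped game rather than requiring changes during development.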
Evaluation
Impact
OpenAccAPI introduces a comprehensive accessibility API for 3D and VR apps aimed at non-visual interaction. It creates a practical path toward more accessible immersive software and lays groundwork for future research in accessible XR design and agent-based non-visual navigation.
Computer Vision · Video Retrieval · Investigation Systems
Problem
Video investigations are often slow, repetitive, and difficult to standardize. Analysts need to search long recordings, identify meaningful incidents, compare events across cameras, and document evidence in a repeatable way.
Many practical workflows also need to remain lightweight and explainable, without requiring expensive model training or opaque end-to-end systems.
Solution
ForenSight (shown in the demo above) is a local-first, no-training investigation pipeline that combines pretrained vision embeddings with lightweight, explainable rules for search, review, and incident analysis.
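The retrieval core can be stripped down to a few lines, under the assumption that a pretrained model (e.g. a CLIP-style encoder) has already produced frame and query embeddings; the vectors below are hard-coded toy values and the function names are illustrative.

```python
# Toy text-to-timestamp retrieval: cosine similarity between a query
# embedding and per-frame embeddings, with an explicit threshold so
# every hit is auditable by its score.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, frames, top_k=2, threshold=0.5):
    """frames: list of (timestamp_sec, embedding). Returns scored hits."""
    scored = [(t, cosine(query_vec, v)) for t, v in frames]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda ts: -ts[1])[:top_k]

frames = [(0.0, [1.0, 0.0]), (1.0, [0.9, 0.4]), (2.0, [0.0, 1.0])]
print(retrieve([1.0, 0.1], frames))   # timestamps with similarity scores
```

Keeping the threshold and scores explicit is what makes the pipeline explainable: an analyst can see exactly why a timestamp was or was not returned.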
Impact
ForenSight demonstrates a practical middle ground between manual video review and heavyweight learned surveillance systems. It emphasizes grounded retrieval, explainable event logic, reproducible case documentation, and local-first deployment for more transparent investigation workflows.
LLMs · PEFT · Unlearning Systems
Problem
Large language models can retain undesirable behaviors, biased patterns, or targeted knowledge that developers may later want to remove. Full retraining is expensive, slow, and often impractical for iterative experimentation.
Solution
This project builds a parameter-efficient unlearning workflow using synthetic data generation and LoRA-based fine-tuning.
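The LoRA mechanics behind the pipeline reduce to a low-rank update on a frozen weight. A toy pure-Python illustration (real systems use tensor libraries; the matrices here are tiny hand-picked values): the adapted layer computes y = Wx + α·B(Ax), so an unlearning update trains only the r·(d_in + d_out) adapter parameters in A and B, never the full W.

```python
# Toy LoRA forward pass: frozen base path W plus a rank-1 update B @ A.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x): base weights stay frozen,
    only the small A and B matrices change during unlearning."""
    base = matvec(W, x)
    low = matvec(B, matvec(A, x))
    return [b + alpha * l for b, l in zip(base, low)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # rank-1 adapter "trained" to cancel dim 0
B = [[-0.5], [0.0]]
print(lora_forward(W, A, B, [2.0, 2.0]))   # [0.0, 2.0]
```

Setting `alpha=0.0` recovers the original model exactly, which is why LoRA-style unlearning is easy to audit and roll back.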
Impact
This project explores a practical path toward controllable, low-cost machine unlearning for large language models. By combining synthetic data with parameter-efficient fine-tuning, it supports faster experimentation on selective forgetting, safety adaptation, and post-hoc model editing.
Computer Vision · 3D Worlds · Language Reasoning
Problem
Current video-language models are built around rendered pixels or estimated features such as flow and depth. In complex 3D and virtual environments, that makes scene understanding expensive and often brittle, because the models do not directly observe the object-level dynamics that actually drive the scene.
Interactive worlds already contain rich internal state — object identities, positions, scales, visibilities, hierarchies, colliders, and activation flags — but that information is rarely exposed in a form that language models can reason over cleanly.
Solution
OTN converts raw engine telemetry into a structured, language-ready representation of scene dynamics. Instead of asking a model to infer motion from pixels, it provides an object-centric motion grammar that explicitly summarizes how entities appear, move, and interact over time.
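The telemetry-to-grammar step can be sketched by diffing object states between engine ticks and emitting compact motion tokens. The event names and state schema below are illustrative assumptions, not OTN's actual grammar.

```python
# Toy telemetry diff: compare object states across two engine ticks and
# emit language-ready motion events (appear / move / disappear).

def diff_ticks(prev, curr, eps=0.01):
    events = []
    for oid, state in curr.items():
        if oid not in prev:
            events.append(f"APPEAR({oid})")
            continue
        dx = state["pos"][0] - prev[oid]["pos"][0]
        dy = state["pos"][1] - prev[oid]["pos"][1]
        if abs(dx) > eps or abs(dy) > eps:
            events.append(f"MOVE({oid}, dx={dx:+.1f}, dy={dy:+.1f})")
    for oid in prev:
        if oid not in curr:
            events.append(f"DISAPPEAR({oid})")
    return events

t0 = {"ball": {"pos": (0.0, 0.0)}, "cube": {"pos": (1.0, 1.0)}}
t1 = {"ball": {"pos": (0.5, 0.0)}, "door": {"pos": (3.0, 0.0)}}
print(diff_ticks(t0, t1))
```

A language model receiving these tokens reasons over exact object-level dynamics instead of reconstructing them from pixels, which is what makes the representation cheap and auditable.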
Outcomes
Impact
OTN offers a different interface between 3D simulation and language understanding: instead of scaling bigger pixel-based models, it leverages engine-native structure to make reasoning cheaper, more explicit, and easier to audit.
Computer Vision · Video Object Segmentation · Learning Objectives
Problem
Modern video object segmentation models use memory modules and propagation mechanisms to maintain temporal correspondence, but they are still trained mostly with frame-level losses such as cross-entropy and Tversky loss.
That creates a mismatch: the architecture is temporal, but the training signal is largely static. As a result, models often fail in scenarios where temporal consistency matters most, especially with small objects, heavy occlusion, and objects that disappear and later re-emerge.
Solution
This project introduces an explicit temporal consistency loss that can wrap around existing semi-supervised VOS training pipelines without requiring architectural changes (Figure 2).
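A minimal sketch of the idea, not the paper's exact formulation: penalize the change in predictions between consecutive frames, but relax the penalty wherever the ground truth itself changes, so genuine motion and occlusion are not punished (function names and the L1 change measure are assumptions).

```python
# Toy temporal consistency loss over flattened soft masks: only
# prediction change NOT explained by ground-truth change is penalized.

def frame_l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def temporal_consistency_loss(preds, gts, weight=1.0):
    """preds/gts: list of flattened per-frame masks, same length."""
    loss = 0.0
    for t in range(1, len(preds)):
        pred_change = frame_l1(preds[t], preds[t - 1])
        gt_change = frame_l1(gts[t], gts[t - 1])
        loss += max(0.0, pred_change - gt_change)
    return weight * loss / (len(preds) - 1)

static_gt = [[1, 1, 0, 0]] * 3
flicker = [[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]   # object "blinks"
stable = [[1, 1, 0, 0]] * 3
print(temporal_consistency_loss(flicker, static_gt))    # penalized
print(temporal_consistency_loss(stable, static_gt))     # 0.0
```

Because the term is computed purely on predictions and labels, it can be added to an existing frame-level loss without touching the model architecture, which is exactly the model-agnostic property the project targets.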
Outcomes
Impact
This work shifts part of temporal reasoning from architecture design into the training objective itself. By making temporal consistency an explicit optimization target, it offers a simple and reusable way to improve VOS robustness in challenging real-world videos without redesigning the underlying model.
LLMs · Clinical Drafting · AAC Analysis
Problem
PAALSS-style analysis of aided AAC transcripts is detail-heavy and formatting-sensitive. Researchers and clinicians often need to clean transcripts, structure utterances consistently, prompt the model carefully, and turn the output into a usable draft report.
Solution
PAALSS Analyzer (shown in the demo above) is a lightweight Streamlit app that turns transcript analysis into a controlled workflow rather than an ad hoc chat interaction.
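The transcript-structuring step can be sketched as follows. The "SPEAKER: text" line format and function name are assumptions for illustration, not the app's actual transcript schema: raw Spanish transcript lines are normalized into (speaker, utterance) records before any prompt is built, so malformed lines never reach the model.

```python
# Toy transcript normalizer: parse "SPEAKER: utterance" lines into
# structured records, skipping blank or malformed lines.

import re

LINE = re.compile(r"^\s*(?P<speaker>[A-ZÁÉÍÓÚÑ]+)\s*:\s*(?P<text>.+?)\s*$")

def parse_transcript(raw):
    records = []
    for line in raw.splitlines():
        m = LINE.match(line)
        if m:
            records.append((m.group("speaker"), m.group("text")))
    return records

raw = """NIÑO: quiero agua
ADULTO: ¿quieres agua?

NIÑO: sí agua"""
print(parse_transcript(raw))
```

Structuring utterances up front is what turns the analysis into a controlled workflow: the prompt always receives the same clean record format regardless of how messy the uploaded transcript is.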
Impact
PAALSS Analyzer packages transcript parsing, prompt control, bilingual interface support, model selection, and document export into a focused research workflow. It is best framed as a drafting-support tool for structured AAC analysis, not as a diagnostic system.