As part of the AI+X Global Talent Community series, BlendED hosted a workshop exploring one of today’s most transformative technologies: computer vision. Students from the UK, Korea, Singapore, Europe, and the U.S. joined to learn how modern AI systems interpret the world, generate realistic imagery, and reconstruct 3D environments. This session offered a comprehensive introduction to the foundations, evolution, and future directions of visual intelligence.
About the Speaker
Sid, Researcher at MIT Media Lab; NSF Graduate Research Fellow
Sid specializes in vision systems for extreme environments, including low-light, high-speed, and non-line-of-sight imaging. His research spans:
Single-photon sensing
Physics-based and neural vision
Generative modeling
3D reconstruction
He holds an MS from MIT and a BS in Electrical Engineering from UCLA. Sid also collaborates with MIT Media Lab researchers and contributes to next-generation vision systems that extend beyond human capabilities.
“Vision is the process of discovering from an image what is present in the world and where it is.” — Sid, referencing David Marr’s classic definition
Why This Topic?
Computer vision sits at the intersection of data, algorithms, and sensing, powering everything from robotics and autonomous vehicles to AR/VR, medical imaging, and industrial inspection.
With rapid advances in deep learning, 3D modeling, and generative AI, understanding how machines “see” is becoming a foundational skill for the next wave of innovators.
This workshop helps students build intuition for:
How AI interprets images
Why 3D reconstruction has become mainstream
The evolution from classical vision → deep learning → generative modeling
How modern sensors go beyond human limits
Key Insights from the Session
1. What Does It Mean to See?
Sid explained that human and machine vision share the same goal: understanding what is in the world and where. Vision allows systems to interpret meaning without touching or interacting physically — a core requirement for robotics and autonomous systems.
2. Three Levels of Vision
Sid introduced the classical pipeline (a minimal edge-detection sketch follows the list):
Low-level: depth, materials, edges
Mid-level: regions, boundaries, motion
High-level: objects, semantics, pose estimation
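To make the low-level stage concrete, here is a minimal sketch (not shown in the session) that computes a Sobel edge map with plain NumPy. The toy square image and the explicit filter loop are purely illustrative:

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map for a 2D grayscale image (low-level vision)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    H, W = img.shape
    gx = np.zeros((H - 2, W - 2))
    gy = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # large where intensity changes sharply

# Toy image: a bright square on a dark background -> edges light up on its border.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edges = sobel_edges(img)
print(edges.max(), edges.mean())
```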
3. Why Computer Vision Is Hard
Humans see meaning; computers see numbers. Translating pixel values into semantic understanding is the central challenge of the field.
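The gap is easy to see in code: to a machine, an image is nothing but a grid of intensity values. A tiny illustration, assuming an 8-bit grayscale image:

```python
import numpy as np

# A 4x4 "image" exactly as the computer sees it: nothing but numbers.
img = np.array([
    [  0,   0, 255, 255],
    [  0,   0, 255, 255],
    [255, 255,   0,   0],
    [255, 255,   0,   0],
], dtype=np.uint8)

print(img.shape)  # (4, 4) -- height x width
print(img[0, 2])  # 255 -- one pixel's brightness
# To a human this grid reads as a checkerboard; to the machine it stays
# an anonymous array until an algorithm assigns it meaning.
```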
4. The History of AI Comes in Waves
From perceptrons in the 1950s → the AI Winter → CNN breakthroughs (AlexNet) → today’s foundation models, Sid walked through how progress has repeatedly surged after periods of stagnation.
5. Generative Modeling Is Transforming Vision
Diffusion models, GANs, and VAEs now generate imagery with realism once thought impossible; a short sketch of the core diffusion step follows the examples below.
Sid showed examples of:
Realistic cityscapes
Scene simulation for autonomous driving
Multi-scenario predictions from a single frame
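To ground the diffusion idea, here is a minimal sketch (not from the session) of the forward noising process that DDPM-style diffusion models are trained to reverse. The schedule constants are standard illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule over T steps (illustrative DDPM-style values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def noise_image(x0: np.ndarray, t: int) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0): interpolate between the image and Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((8, 8))  # stand-in for a normalized image
for t in (10, 500, 999):
    xt = noise_image(x0, t)
    # Correlation with the clean image decays as noise takes over.
    print(t, round(np.corrcoef(x0.ravel(), xt.ravel())[0, 1], 3))
# A trained network learns to predict eps from x_t; generation runs this in reverse.
```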
6. From 2D → 3D: The Rise of NeRF
The breakthrough of Neural Radiance Fields (NeRF) enables AI to construct 3D worlds from ordinary 2D images; a compact sketch of its volume-rendering step follows the list. This technique now powers:
Google Maps 3D views
AR/VR content
Medical imaging
Automotive visualization
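For the mechanically curious, NeRF's core operation is volume rendering along camera rays: sample 3D points, query a learned field for density and color, and alpha-composite the result. A compact sketch with a hand-written stand-in field (a real NeRF trains an MLP for this):

```python
import numpy as np

def field(points: np.ndarray):
    """Stand-in for NeRF's MLP: (density, RGB) at 3D points.
    Here: a fuzzy red sphere of radius 0.5 at the origin."""
    r = np.linalg.norm(points, axis=-1)
    sigma = 20.0 * (r < 0.5)                        # density inside the sphere
    rgb = np.tile([1.0, 0.0, 0.0], (len(points), 1))
    return sigma, rgb

def render_ray(origin, direction, n_samples=64, near=0.0, far=2.0):
    """Classic volume-rendering quadrature used by NeRF."""
    t = np.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction
    sigma, rgb = field(pts)
    delta = np.diff(t, append=far)                  # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)            # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # light surviving so far
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)     # composited pixel color

# A ray through the sphere composites to red; one that misses stays black.
print(render_ray(np.array([0.0, 0.0, -1.5]), np.array([0.0, 0.0, 1.0])))
print(render_ray(np.array([0.0, 1.5, -1.5]), np.array([0.0, 0.0, 1.0])))
```

Training a NeRF amounts to adjusting the field until rays rendered this way match the captured photos.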
7. Inverse Graphics Explained
Computer vision is fundamentally about reversing the rendering process—inferring real-world shape, material, and structure from 2D projections.
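One way to make this "inverse" framing concrete is analysis-by-synthesis: propose scene parameters, render them, compare with the observation, and update. A deliberately toy sketch in which the entire "scene" is a single brightness parameter:

```python
import numpy as np

rng = np.random.default_rng(1)

def render(brightness: float) -> np.ndarray:
    """Toy forward model ("graphics"): scene parameter -> image."""
    return brightness * np.ones((8, 8))

observed = render(0.7) + 0.01 * rng.standard_normal((8, 8))  # the "photo", slightly noisy

# Inverse graphics as optimization: find the parameter whose rendering
# best explains the observed image (gradient descent on MSE).
theta = 0.0
for _ in range(200):
    residual = render(theta) - observed
    grad = 2.0 * residual.mean()   # d(MSE)/d(theta) for this linear renderer
    theta -= 0.1 * grad
print(round(theta, 3))  # ~0.7: the recovered scene parameter
```

Real systems replace the toy renderer with a differentiable graphics pipeline and recover shape, material, and lighting the same way.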
8. Discriminative vs. Generative Intelligence
A complete vision system must both recognize (discriminative) and predict/simulate (generative). Foundation models now unify both capabilities within a single framework.
9. Beyond Human Sensing
AI vision isn’t limited to RGB cameras.
Sid demonstrated advanced sensing that lets machines:
See in total darkness
See around corners
See through fog
Measure heat signatures
Track light propagation inside materials
10. The Future of Vision: AI + Sensing + Simulation
Modern computer vision sits at the intersection of physics-based sensing, neural learning, and simulation—opening the door to robotics, digital twins, AR, and science applications.
Q&A Highlights
Participants asked:
“Is computer vision overhyped?”
Sid explained that the opposite is true—progress is so strong that researchers sometimes worry the field is “solved,” but robotics and real-world deployment continue to reveal open challenges.
“What tools power the demos?”
Depending on the task, the demos draw on optical flow networks (RAFT), keypoint tracking, thermal imaging, depth sensors, and high-speed cameras; a short loading sketch for RAFT follows.
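As an example of how accessible these tools have become, a pretrained RAFT model ships with torchvision (0.12+). A minimal sketch, with random tensors standing in for real video frames:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# Two consecutive "frames" (random stand-ins; use real video frames in practice).
frame1 = torch.rand(1, 3, 256, 256)
frame2 = torch.rand(1, 3, 256, 256)
frame1, frame2 = weights.transforms()(frame1, frame2)  # normalize per the weights

with torch.no_grad():
    flows = model(frame1, frame2)  # list of iteratively refined flow fields
flow = flows[-1]                   # final estimate: (1, 2, H, W) pixel displacements
print(flow.shape)
```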
“How can students enter this field?”
Graduate study, research internships, and project-based learning—such as BlendED’s AI+X Learning Plan—are strong pathways.
Related Learning Opportunity: AI in Visual Computing (2026)
Sid will be leading a hands-on, project-based AI in Visual Computing course within the AI+X Learning Plan in early 2026. Tracks include:
3D Modeling & Neural Radiance Fields
Object Detection & Classification
Image Synthesis & Generative AI
Extreme Sensing & Physics-based Vision
Students work on real datasets and build portfolio-ready projects.
Looking Ahead
Computer vision continues to evolve rapidly, with breakthroughs emerging in 3D reconstruction, generative simulation, and sensing technologies. As Sid emphasized, innovation comes in cycles—but the field is far from saturated. The next wave will shape robotics, biomedical imaging, AR/VR, and beyond.
Join the AI+X Community
Become part of a global network of learners exploring AI in biology, engineering, business, hardware, and more.
Join our future AI+X workshops
Create your free GTC account to stay updated on global events
Explore upcoming PBL projects, including AI & Cybersecurity
Visit us in Boston for the 2026 Winter or Summer AI+X On-Campus Experience
📺 Watch the Replay
Couldn’t join live? Don’t miss this in-depth discussion and Q&A.