Advanced Computer Vision with Deep Learning
This three-day workshop is designed to give participants the skills needed to build cutting-edge computer vision models with deep learning.
What You'll Learn
- ✓ Building networks such as ResNets and Inception networks
- ✓ How to build segmentation models with U-Nets
- ✓ How facial recognition models and their network architectures work
- ✓ How object detection networks work
- ✓ Image-to-image translation with Pix2Pix networks
- ✓ Advanced image augmentation strategies
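As a taste of the augmentation topic above: many augmentation strategies boil down to simple, randomized image transforms applied during training. Here is a minimal, illustrative sketch of a random horizontal flip on an image represented as nested lists of pixels (a real pipeline would use a library transform such as those in TorchVision, covered in the course):

```python
import random

def random_hflip(image, p=0.5, rng=None):
    """Randomly flip an image (a list of pixel rows) left-to-right with probability p."""
    rng = rng or random.Random()
    if rng.random() < p:
        # Reverse each row to mirror the image horizontally
        return [list(reversed(row)) for row in image]
    # Otherwise return an unmodified copy
    return [list(row) for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
flipped = random_hflip(img, p=1.0)  # p=1.0 forces the flip for demonstration
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```

The same pattern (apply a transform with some probability) underlies crops, rotations, and color jitter in production augmentation pipelines.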
In the second course in the series, we venture beyond the basic skills to delve into advanced usage of Convolutional Neural Networks and modern image network architectures. This journey takes us into the future of image segmentation with Meta's Segment Anything Model (SAM), a groundbreaking project that democratizes segmentation, allowing for adaptation to specific tasks without additional training. This makes SAM versatile for a broad set of use cases, from underwater photography to cell microscopy.
To understand the current state of the art, we will review the history of ImageNet-winning models, the impact of Inception, residual architectures, NASNet, and EfficientNet, and how they enabled the field to move beyond hand-engineered models. Alongside these foundational concepts, we will explore the fascinating world of Stable Diffusion, where randomness meets structure and chaos turns into order through neural networks like VAEs and GANs.
Our exploration continues with a deep dive into advanced skills and techniques such as object detection with models like YOLO, person tracking with Deep Sort, and pixel-level image segmentation with U-Nets and DenseNets. We will also examine top-performing models built on Transformers, and the principles of Generative Adversarial Networks (GANs), including how StyleGAN architectures have evolved over time.
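A small building block behind all object detection models mentioned above is intersection-over-union (IoU), which scores how well a predicted bounding box matches a ground-truth box. A minimal, self-contained sketch (boxes here are assumed to be `(x1, y1, x2, y2)` corner coordinates):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp width/height to zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap -> 25/175 ≈ 0.143
```

Detectors like YOLO use IoU both to match predictions to ground truth during training and to suppress duplicate boxes at inference time.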
One of the exciting new areas we'll touch on is Diffusion Models, guiding random noise into meaningful structures for tasks like image denoising and inpainting. We'll also introduce BLIP-2, a vision-language pre-training paradigm that bridges the gap between vision and language models, enabling state-of-the-art results on tasks like visual Q&A.
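The "random noise into meaningful structures" idea runs in two directions: the forward diffusion process gradually corrupts an image with Gaussian noise, and the learned model reverses it. The forward step has a simple closed form, sketched below with NumPy on a toy image (the linear beta schedule is an illustrative assumption, not the course's exact setup):

```python
import numpy as np

def add_noise(x0, t, betas, rng):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]  # cumulative product up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # common linear schedule (assumption)
x0 = np.ones((4, 4))                   # toy "image"
xt = add_noise(x0, t=999, betas=betas, rng=rng)
# At large t, x_t is almost pure noise; at t=0 it stays close to x_0.
```

Training a denoising network to invert this process, step by step, is what lets diffusion models generate or inpaint images from noise.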
Participants will have the opportunity to build with tools like TensorFlow, Keras, PyTorch, and TorchVision, which are often used for cutting-edge computer vision research. Hands-on experience will be an essential part of the course, allowing participants to build models and apply new skills to projects in their field.
The course runs 3 days, encompassing 28 hours including online sessions. It offers an introduction to PyTorch and TorchVision, advanced classification, object detection, and the skills to create applications like image search and similarity comparisons.
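Image search and similarity comparison typically reduce to comparing feature vectors (embeddings) extracted by a network. A minimal sketch with toy vectors (in practice the embeddings would come from a model such as a TorchVision ResNet; the tiny 3-dimensional vectors here are purely illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query, database):
    """Rank database entries (name, vector) by similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy embeddings; a real system would use high-dimensional CNN features.
db = [("cat.jpg", [0.9, 0.1, 0.0]),
      ("dog.jpg", [0.1, 0.9, 0.0]),
      ("car.jpg", [0.0, 0.1, 0.9])]
print(search([1.0, 0.0, 0.0], db)[0][0])  # most similar: cat.jpg
```

Swapping the toy vectors for real model embeddings turns this ranking loop into a basic image search engine.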
Join us in this comprehensive exploration of computer vision's newest frontiers, arming yourself with skills to apply in your own area of work. This workshop brings together the past, present, and future of computer vision, offering a thrilling opportunity to learn and grow in this dynamic field.
In this course, participants will learn:
- Advanced usage of Convolutional Neural Networks and modern image network architectures
- Understanding and working with the Segment Anything Model (SAM) for versatile image segmentation
- Exploration of Stable Diffusion, where randomness is transformed into meaningful structures
- Techniques for object detection, person tracking, and pixel-level image segmentation with models like YOLO, Deep Sort, U-Nets, and DenseNets
- Examination of Transformers in computer vision and principles of Generative Adversarial Networks (GANs), including StyleGAN architectures
- Introduction to cutting-edge concepts like Diffusion Models and BLIP-2
- Building experience with tools like TensorFlow, Keras, PyTorch, and TorchVision
- Hands-on practice creating applications for advanced classification, object detection, image search, and similarity comparisons
- Building various types of deep learning computer vision models
- Image segmentation and pixel-level classification with architectures like U-Nets and DenseNets, and how they are used across a variety of segmentation tasks
- Tracking people and objects through video
- Person re-identification and object tracking with Deep Sort
- ViT (Vision Transformer)
- StyleGANs and visual GANs for image augmentation
Technologies we teach include TensorFlow, Keras, PyTorch, and TorchVision.
Earn a certificate upon completion
Time to Complete
Approx. 28 hours to complete
Why Study Artificial Intelligence?
1. Demand for AI/DL jobs is at an all-time high.
2. Developers increasingly need these skills.
3. AI/DL jobs pay more than standard developer jobs.
Request More Info
Sign up to receive additional information about this course. Find out what other learners are doing with the skills they gained, and evaluate whether this course is the right fit for you.
Frequently Asked Questions
Do I have to log in at a set time? How does the 360-degree assessment work? At this point, you probably have a few questions, and we’ve got answers.