Large Language Models: Understanding Transformer Architectures

  • Online

In this course, learners will study the fundamental components of transformer architectures, the backbone of modern LLMs. The course provides an in-depth analysis of the original transformer model, covering its key components: the scaled dot-product self-attention mechanism, multi-head attention, positional encoding, feedforward neural networks, layer normalization, and residual connections. By the end of the course, learners will understand each component and how these parts work together to enable parallelization and capture long-range dependencies in sequences, addressing the limitations of earlier RNN-based models.

To reinforce learning, the course includes a programming exercise that introduces learners to a basic implementation of the self-attention mechanism from the original transformer model using Python and NumPy.
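
As a preview, the snippet below is a minimal sketch of what such a self-attention implementation might look like in NumPy. The function names, weight matrices, and dimensions here are illustrative assumptions, not the course's actual starter code.

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input sequence into queries, keys, and values
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Scale by sqrt(d_k) so large dot products don't saturate the softmax
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # one attention distribution per token
    return weights @ V                  # weighted sum of value vectors

# Toy example (illustrative): 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q = rng.normal(size=(8, 8))
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))
out = self_attention(X, W_q, W_k, W_v)  # shape (4, 8)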

What you will learn:

  • Foundational principles of transformers, from core components to a fully integrated system
  • Precise mathematical explanations, clear visuals, and practical coding exercises
  • Core architecture: embeddings, positional encoding, and the key mathematical principles driving transformer models
  • Self-attention: trace the derivation of queries, keys, and values, and why scaling is crucial
  • Core operations: understand linear transformations, softmax, and ReLU
  • Multi-head attention: how multiple heads enrich feature representations (see the sketch after this list)
  • Integration: the role of feed-forward networks, residual connections, and layer normalization in stabilizing training and improving model performance
  • Hands-on practice: apply the theoretical concepts through a code implementation of transformers
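
To illustrate the multi-head idea mentioned above, here is a minimal NumPy sketch of splitting one attention step across several heads and recombining them. The shapes, weight matrices, and variable names are illustrative assumptions, not material from the course itself.

import numpy as np

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # X: (T, d_model) token embeddings; d_model must divide evenly by num_heads
    T, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    def split(M):
        # Reshape (T, d_model) -> (num_heads, T, d_head) so heads attend independently
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)    # (num_heads, T, T)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights = e / e.sum(axis=-1, keepdims=True)
    heads = weights @ Vh                                     # (num_heads, T, d_head)
    # Concatenate heads back to (T, d_model) and mix them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)
    return concat @ W_o

# Toy example (illustrative): 4 tokens, model width 8, split across 2 heads
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2)  # shape (4, 8)

Each head operates on a lower-dimensional slice of the model width, which lets different heads specialize in different relationships between tokens before the output projection mixes their results back together.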

This course is part of the following course program:

Large Language Models Demystified

Instructors


Sai Chand Boyapati

Mr. Boyapati is an internationally recognized expert in software quality assurance (QA), whose groundbreaking work has had a transformative impact on industries worldwide. His influence and contributions to testing span major developments in software products that have redefined their markets.

Mr. Boyapati holds a critical leadership role as Director of Software Quality Assurance at a globally distinguished organization.

In addition to his technical achievements, Mr. Boyapati has served as a peer reviewer and judge in authoritative capacities. He has evaluated numerous research papers for prestigious conferences and hackathons on AI and LLMs.

He has written extensively on QA, cybersecurity, and artificial intelligence/LLMs, with articles published in the media. His book, Focus on QA: Redefining Software Testing in the AI-Driven Era, became a bestseller upon release, providing invaluable insights into applying AI to QA processes.


Hamza Mohammed, Course Editor

Hamza Mohammed is a Machine Learning Engineer with Samsung Research America. He is an industry expert in deep learning and reinforcement learning, specializing in large language models. Mr. Mohammed has a proven research and industry track record applying, optimizing, and accelerating deep learning and reinforcement learning across disciplines including computer vision, natural language processing (including multi-modal modeling), robotics and automation, software engineering and testing, autonomous navigation and ADAS, digital twin simulation, and biomedical imaging.

He has designed and optimized ML models and algorithms for edge-compute deployments and is an authority on securing AI applications for on-device and on-premise environments. He is a contributor to several open-source projects, an author and peer reviewer of multiple publications in top-tier ML venues, and an inventor on key patents. Mr. Mohammed holds a B.S. in Electrical Engineering and Computer Sciences from the University of California, Berkeley, USA.

Who Should Attend:

AI Researchers and Academics, Data Scientists, Educators and Instructors, Enthusiasts with a Strong Programming Background, Machine Learning Engineers, Software Developers and Programmers, Students (experience in linear algebra and calculus required), Technical Professionals in Industry, Technical Product Managers, Technical Leaders

Prerequisites: Basic programming knowledge in Python; no prior experience with language models is required

Course Level: Intermediate

Publication Year: 2025

ISBN: 978-1-7281-7894-3


Large Language Models: Understanding Transformer Architectures
  • Course Provider: Educational Activities
  • Course Number: EDP814
  • Duration (Hours): 1
  • Credits: 0.1 CEU / 1 PDH