Bookmark this

Your GenAI command center.

This is built to be used every week, not read once. It remembers progress, gives the next useful action, and keeps the learning path focused.

Next best action

Stage 00 · Setup and Mindset

Install Python, set up VS Code, create a clean project folder, and run one tiny model call.

Ship proofHello model repo

Interview answerWhat can an LLM do and where can it fail?

Open next stage

Choose your lane

Start small, build confidence.

Use the roadmap in order. Do not rush math, agents, or cloud. Your first win is a working API call, a clear README, and one public mini-project.

Do Stage 00, 01, 06, and 07 before deep theory.
Ship Smart Summarizer before touching agents.
Write limitations in plain English after every build.

Start in 10 minutes

Get one tiny win before the roadmap begins.

Before reading Stage 00, run one small LLM experiment. The goal is not mastery. The goal is to feel the loop: prompt, response, log, improve.

Open the notebook or a fresh Colab.
Run a tiny model call or the first "hello model" cell.
Change the prompt once and compare the response.
Write one note: what changed, what failed, what you would measure next.

Open Day 1 notebook Day 1 win complete

# Day 1 shape: prompt -> response -> log
prompt = "Explain RAG to a fresher in 4 bullets"

response = call_your_llm(prompt)

print(response.text)
print({
    "model": response.model,
    "tokens": response.tokens,
    "latency_ms": response.latency_ms
})

Use the notebook for the real runnable version. This snippet shows the habit every GenAI app needs: call, inspect, and log.

Start here

The honest 2026 version.

You do not need to train a foundation model to become useful in GenAI. Most fresher-friendly work is building reliable apps around models: connect APIs, add your data, validate outputs, measure quality, and deploy something people can try.

Core path: call models, validate structured output, add RAG, measure quality, and deploy one small app.

Portfolio proof: show screenshots, logs, citations, eval results, limits, and tradeoffs instead of only certificates.

Interview focus: explain failures, cost, latency, safety, and why you chose prompt vs RAG vs fine-tuning.

Keep it small: finish one useful app before exploring advanced agents, fine-tuning, or cloud architecture.

Skip for now: training LLMs from scratch, Kubernetes, multi-agent teams, huge vector databases, advanced math proofs, and fine-tuning. Learn them later only when a project genuinely needs them.

TL;DR roadmap preview

Quick overview before the full roadmap.

This section is only the high-level map: five phases, timing, difficulty, and the big idea of each layer. The full detailed roadmap starts below in the stage-by-stage section with checklists, projects, interview prep, and build proof.

Foundations

Setup, Python, and honest basics

Start with a working development setup, Python fluency, APIs, and just enough math to understand the GenAI stack without drowning in theory.

Module 1.1

Setup and mindset

01Install Python, VS Code, Git, and one hosted model provider.
02Learn what LLMs can do, where they fail, and why logging matters.
03Create your project folder, setup notes, screenshots, and learning journal.

End state You can run a Python script, call an LLM API, explain what happened, and show a tiny portfolio artifact.

Already a software engineer? Skim Stages 0-4, then start with LLM APIs, structured output, RAG, evals, and deployment. You do not need to relearn every ML concept before building useful GenAI apps.

Portfolio checklist

What good proof looks like.

A fresher does not need to know every paper. You need a few small projects that show you can build carefully, explain tradeoffs, and learn from failures.

Skill 01 Core

Software foundation

You can write clean Python, call APIs, read errors, and organize reliable portfolio projects.

Python, Git, CLI, virtual envs, HTTP, JSON, env vars.
Proof: a project folder with scripts, tests, screenshots, and a readable project brief.

Skill 02 LLM app

LLM API fluency

You can build a useful LLM feature without guessing how prompts, tokens, latency, and cost behave.

Chat APIs, streaming, retries, rate limits, model selection, cost logs.
Proof: a hosted summarizer or support ticket triage tool with structured output.

Skill 03 RAG

Retrieval quality

You can make an LLM answer from private documents and explain why the answer is grounded.

Load, chunk, embed, store, retrieve, re-rank, cite, evaluate.
Proof: chat-with-PDF with citations and a small eval set.

Skill 04 Agents

Tool use and agents

You can safely let a model call tools, use state, and stop before it loops or burns budget.

Function calling, typed schemas, tool tests, iteration caps, memory.
Proof: a research agent with 2-3 tools and visible traces.

Skill 05 Prod

Production habits

You can turn a notebook into a public service that is logged, validated, and cheap enough to run.

FastAPI, Docker, Pydantic, observability, guardrails, caching, secrets.
Proof: public URL, demo video, cost notes, failure cases.

Skill 06 Interview

Clear explanations

You can answer system-design and debugging questions in plain language with real tradeoffs.

RAG vs fine-tuning, hallucinations, prompt injection, evals, latency.
Proof: portfolio case studies with architecture and decisions.

Every project needs a case study: problem, architecture, setup, screenshots, eval results, limitations, and next steps.

Every LLM call needs logs: model, prompt version, tokens, latency, cost, errors, and retry behavior.

Every RAG demo needs citations: show retrieved chunks and explain how you measured faithfulness.

Every interview answer needs tradeoffs: quality vs latency, hosted vs open model, prompt vs RAG vs fine-tune.

The roadmap

The fourteen stages.

Each stage answers four questions: what is this, why does it matter, what should I ship, and what interview signal does it create.

Phase 01Foundations

Stage 00

Mindset & Setup

Time: 1–2 hours Difficulty: Prereq: a laptop

Start with the right mental model: an LLM predicts tokens from context. It can be useful, but it can also be wrong, stale, or overconfident.

Install: Python 3.10+, VS Code, Git, and one clean project workspace.
Learn: the command line, virtual environments (venv or conda), and the four Git verbs you'll use forever — clone, commit, push, pull.
Sign up: Google AI Studio, Hugging Face, and one hosted-model provider such as OpenAI, Anthropic, or Gemini. Use free tiers first; provider limits change, so always check current pricing before scaling.
Profile setup: clear LinkedIn headline, simple portfolio page, and one clean project case study. Hiring starts before the first interview.

Build: a "hello world" Python project with setup steps, one screenshot, and a short note on what an LLM can and cannot do.

Stage 01

Python Essentials

Time: 1–2 weeks Difficulty: Skip if: you've shipped Python before

You do not need to master all of Python. You need enough to read files, call APIs, handle errors, and organize a small app.

Variables, types, control flow, functions, classes.
Files, JSON, CSV, environment variables, and HTTP APIs (requests).
API basics: status codes, headers, auth tokens, timeouts, retries, pagination, and rate limits.
numpy (arrays), pandas (tables), matplotlib (plots).
Exception handling, list/dict comprehensions, f-strings, type hints, and light testing with pytest.
Backend preview: build one tiny FastAPI endpoint so deployment later does not feel alien.

Build: a small script that reads a CSV, calls a public API with retries, validates the response, and writes clean JSON.

Stage 02

Math (Just Enough)

Time: 2–4 days Difficulty: Goal: intuition, not proofs

You don't need a math degree. You need to look at a paper's diagram and not feel lost.

Linear algebra: vectors, matrices, dot products — and why "similarity" has a geometric meaning.
Probability: distributions, expectation, why softmax exists.
Calculus: what a gradient is and why "going downhill" trains a model.

Honest take: learn enough math to explain vectors, probability, gradients, and attention at a high level. Deep proofs can wait until a project or interview actually needs them.

Phase 02Core ML

Stage 03

Machine Learning Foundations

Time: 1 week Difficulty: Tool: scikit-learn

Before you train a transformer, train a logistic regression. The vocabulary is the same.

Supervised vs unsupervised vs reinforcement.
Train/validation/test splits, overfitting, the bias-variance tradeoff.
Data leakage, baseline models, feature quality, and why a bad split can make a weak model look brilliant.
Metrics: accuracy, precision, recall, F1, MSE, perplexity. Pick the right one for the right task.
scikit-learn on a small tabular dataset (Iris, Titanic) — feel the workflow.
One neuron → forward pass → loss → backprop → gradient descent. Karpathy's micrograd is the gold standard.

Stage 04

Deep Learning Basics

Time: 1–2 weeks Difficulty: Pick one: PyTorch (recommended)

Pick PyTorch. Train three small models. Move on. Don't build a custom CUDA kernel.

Tensors, layers, activations (ReLU, GELU), losses, optimizers (SGD, Adam).
One framework — PyTorch (every modern paper) or Keras.
Build end-to-end: a tiny MLP for MNIST, a small CNN for CIFAR-10, a character-level RNN/Transformer for text.

Ship: a network that beats random on a real dataset, saved as a notebook with charts that prove it.

Stage 05

NLP & Transformers

Time: 1 week Difficulty: Why it matters: GenAI's core

Text becomes numbers, numbers become attention, attention becomes a model that finishes your sentence. Get the picture, not the matrix algebra.

flowchart LR
    T["Hello world"] -->|tokenizer| TOK["IDs: 15496, 995"]
    TOK -->|embedding lookup| EMB[("vectors
e.g. 768-dim each")]
    EMB --> ATT["Self-attention
each token looks at every other"]
    ATT --> FFN["Feed-forward
+ residuals + layer norm"]
    FFN -. repeat N layers .-> OUT["Logits over vocabulary"]
    OUT -->|argmax / sampling| NEXT["next token: '!'"]
    NEXT -. feed back .-> T

    classDef inn fill:#fffbeb,stroke:#fbbf24
    classDef proc fill:#eff6ff,stroke:#60a5fa
    classDef out fill:#ecfdf5,stroke:#34d399
    class T,TOK inn
    class EMB,ATT,FFN proc
    class OUT,NEXT out

Click to expand

Tokenization: how text becomes numbers (BPE, SentencePiece). Words ≠ tokens.
Embeddings: vectors that capture meaning. Similar meaning, similar vectors.
Attention: the mechanism that lets a model "look at" earlier tokens with different weights.
Encoder vs decoder vs encoder-decoder: BERT (encoder, understanding), GPT/Llama/Claude (decoder, generation), T5/BART (both).
Hands-on: Hugging Face Transformers — load a model, run pipeline(...), inspect outputs.

Phase 03Applied LLMs

Stage 06

Large Language Models

Time: 3–5 days Difficulty: Knobs you'll use daily

An LLM is a tokenizer wrapped around a transformer wrapped around a prompt. Knowing the four knobs — context, tokens, temperature, top-p — covers most beginner app work.

flowchart LR
    P["Your prompt
+ system + history"] --> TOK["Tokenize
~4 chars per token"]
    TOK --> M["LLM
billions of parameters"]
    M --> LOG["Logits
score for every token
in the vocabulary"]
    LOG --> S{Sampling
strategy}
    S -->|temperature| OUT["Next token"]
    OUT -->|append + repeat| TOK
    OUT --> STOP["Stop on EOS
or max_tokens"]

    classDef inn fill:#fffbeb,stroke:#fbbf24
    classDef proc fill:#eff6ff,stroke:#60a5fa
    classDef out fill:#ecfdf5,stroke:#34d399
    class P,TOK inn
    class M,LOG,S proc
    class OUT,STOP out

Click to expand

Pre-training (next-token prediction on the internet) → post-training (instruction tuning + RLHF/DPO to make it useful and safe).
Open-weight vs closed: Llama, Mistral, Qwen, DeepSeek, Gemma (open) vs GPT, Claude, Gemini (closed APIs).
Context window — how much text the model can "see". It varies by model and provider, so check current docs before designing around a limit.
Tokens — the billing and length unit. ~1 token ≈ 4 English characters or ¾ of a word.
Temperature — lower for factual/extraction tasks, higher for brainstorming and writing. Measure instead of guessing.
top-p / top-k — alternative sampling controls. Stick with temperature unless you have a reason.
API production basics: streaming, retries with backoff, timeout handling, rate-limit handling, model fallback, and per-request cost logging.
Model choice: small/fast model for extraction and routing; stronger model for reasoning, coding, or hard synthesis. Measure instead of guessing.

Ship: call a hosted LLM with streaming + retries, log tokens/latency, then run a small open model locally with Ollama.

Stage 07

Prompt Engineering

Time: 3–5 days Difficulty: The first lever

Most "we need fine-tuning" problems are actually "we need a tighter prompt and a Pydantic schema."

Zero-shot, few-shot, chain-of-thought.
Roles: system (rules), user (input), assistant (response). Use them.
Structured output: ask for JSON; better, use the model's structured-output mode (response_format, with_structured_output). Validate with Pydantic.
Function / tool calling: the model returns which function to call with what arguments; your code runs it and feeds the result back.
Prompt evals: save 20 tricky inputs, run them after every prompt change, and keep the prompt version in logs.
Pitfalls: hallucinations, prompt injection (user input overrides system rules), context drift in long chats.

flowchart LR
    P[Prompt] --> M{Did it work?}
    M -->|Yes| DONE["Ship it"]
    M -->|Mostly. Bad format| P1["Tighten the prompt
+ JSON / Pydantic schema"]
    M -->|It does not know my data| RAG["Add RAG
Stage 8"]
    M -->|Wrong style/skill, even with examples| FT["Fine-tune
Stage 11"]
    P1 --> P
    RAG --> DONE
    FT --> DONE

    classDef good fill:#ecfdf5,stroke:#34d399
    classDef warn fill:#fffbeb,stroke:#fbbf24
    class DONE good
    class P1,RAG,FT warn

Click to expand

Default order: improve the prompt first, add RAG when the answer needs your data, and consider fine-tuning only when you need a repeated style, format, or behavior that prompting cannot hold.

Stage 08

RAG · Retrieval-Augmented Generation

Time: 1 week Difficulty: Most useful pattern in industry

LLMs don't know your data. RAG fixes that without retraining anything — you bolt search onto generation and let the model quote from your docs.

flowchart LR
    subgraph INDEX["Build (offline, once)"]
        D[Your docs
PDFs, wiki, tickets] --> CH[Chunk
~300-800 tokens]
        CH --> EMB1[Embedding model
e.g. all-MiniLM-L6-v2]
        EMB1 --> VDB[(Vector DB
FAISS / Chroma / Qdrant)]
    end

    subgraph QUERY["Answer (every request)"]
        Q[User question] --> EMB2[Same embedding model]
        EMB2 --> SEARCH[Top-k similarity search]
        VDB --> SEARCH
        SEARCH --> CTX[Top chunks = context]
        CTX --> PROMPT[Prompt = system + context + question]
        PROMPT --> LLM[LLM]
        LLM --> A[Grounded answer
with citations]
    end

    classDef offline fill:#eff6ff,stroke:#60a5fa
    classDef online fill:#ecfdf5,stroke:#34d399
    class D,CH,EMB1,VDB offline
    class Q,EMB2,SEARCH,CTX,PROMPT,LLM,A online

Click to expand

Why RAG: LLMs don't know your data and have a knowledge cutoff. Inject relevant snippets into the prompt at query time.
Embeddings & vector DBs: FAISS (local, free), Chroma (simple), Qdrant (production OSS), Pinecone (managed).
Chunking: start with RecursiveCharacterTextSplitter, ~500 tokens, ~50 overlap. Tune later.
Re-ranking: add a cross-encoder (e.g. bge-reranker) on top-20 → top-3 for big quality wins.
Hybrid search: combine keyword/BM25 with vector search when exact terms, IDs, or error codes matter.
RAG evals: measure context recall, faithfulness, answer relevance, citation accuracy, and "I don't know" behavior.

Ship: "Chat with your PDF" with source citations, retrieved chunk preview, and a 25-question eval sheet. (Section 6 of the notebook.)

Phase 04Production

Stage 09

Frameworks & Tooling

Time: 1 week Difficulty: Pick one and commit

The frameworks aren't magic. They're glue. Pick the one with the docs you can read and the failures you can debug.

Orchestration: LangChain (broadest) or LlamaIndex (RAG-first). Pick one.
Validation: Pydantic v2 — your defense against malformed LLM output.
UIs in <50 lines: Gradio (great for ML demos) or Streamlit (great for dashboards).
Observability: LangSmith, Langfuse, or Phoenix — see every prompt, response, and tool call.
Model gateway: use LiteLLM or a thin adapter layer so switching between OpenAI, Anthropic, Gemini, and local models does not rewrite your app.
Integration pattern: learn MCP basics if you want agents/tools to connect cleanly to external systems.
Experiment tracking (optional): Weights & Biases, MLflow.

Stage 10

Agents & Multi-step Workflows

Time: 1–2 weeks Difficulty: Advanced pattern

Agent = LLM + tools + memory + a loop. Most "agent failures" are actually loops without limits and tools without tests.

flowchart LR
    U[User goal] --> AG{LLM
plans next step}
    AG -->|"Thought:
I need to search the docs"| TOOL[Pick a tool
+ arguments]
    TOOL --> EXEC[Run tool
search / API / code]
    EXEC -->|Observation| AG
    AG -->|I have enough info| ANS[Final answer]

    classDef think fill:#fffbeb,stroke:#fbbf24
    classDef act fill:#eff6ff,stroke:#60a5fa
    classDef done fill:#ecfdf5,stroke:#34d399
    class AG think
    class TOOL,EXEC act
    class ANS done

Click to expand

The ReAct pattern (Reason → Act → Observe → repeat) is the foundation.
Tool use / function calling: the LLM emits a JSON call; your runtime executes it and returns the result.
Frameworks: LangGraph (graphs and control), CrewAI (role-based), AutoGen (multi-agent chat), OpenAI Agents SDK.
Tool quality: typed schemas, input validation, deterministic tool tests, clear errors, and a max-iteration budget.
Memory: separate short-term conversation state from long-term user/profile memory; never store sensitive data by accident.
Watch out for: infinite loops (always cap iterations), runaway costs (log every call), bad tools (a flaky tool poisons the whole agent).

Ship: a research agent that takes a question, calls 2-3 tools, shows its trace, writes a 1-page sourced report, and stops on budget.

Stage 11

Fine-tuning & Customization

Time: 1–2 weeks Difficulty: Optional — skip until you can't

Fine-tune for style, tone, domain language, format. Almost never for adding facts — that's RAG's job.

flowchart TD
    DATA["Curated dataset
500-5,000 examples"] --> SPLIT["Split
train / eval (80/20)"]
    SPLIT --> EVAL_BEFORE["Run baseline
measure metrics"]
    SPLIT --> TRAIN["LoRA / QLoRA training
1-4 epochs
1x consumer GPU"]
    TRAIN --> ADAPTER["Adapter weights
~10-100 MB
not full model"]
    ADAPTER --> MERGE["Merge or load
at inference time"]
    MERGE --> EVAL_AFTER["Re-run eval
compare to baseline"]
    EVAL_AFTER --> SHIP{Better
on YOUR metric?}
    SHIP -->|Yes| DEPLOY["Ship adapter"]
    SHIP -->|No, regressed| DATA
    SHIP -->|No change| PROMPT["Try better prompts
or RAG instead"]

    classDef data fill:#fffbeb,stroke:#fbbf24
    classDef train fill:#eff6ff,stroke:#60a5fa
    classDef eval fill:#f5f3ff,stroke:#a78bfa
    classDef done fill:#ecfdf5,stroke:#34d399
    class DATA,SPLIT data
    class TRAIN,ADAPTER,MERGE train
    class EVAL_BEFORE,EVAL_AFTER,SHIP eval
    class DEPLOY,PROMPT done

Click to expand

LoRA / QLoRA — train a small "adapter" instead of the full model. Runs on a single consumer GPU.
Datasets: quality > quantity. 500 hand-curated examples beat 50,000 noisy ones.
Evaluation: build an eval set before you fine-tune. If you can't measure it, you can't improve it.
Tools: Hugging Face peft + trl, Unsloth, Axolotl.

Stage 12

Deployment & Productionization

Time: 1 week Difficulty: The portfolio multiplier

A demo on your laptop is a story. A public URL is a credential.

flowchart LR
    DEV[Notebook
that works] --> APP[FastAPI / Flask app
+ Pydantic schemas]
    APP --> DOCK[Dockerfile
requirements.txt]
    DOCK --> HOST{Where?}
    HOST --> HF["HF Spaces
simple demos"]
    HOST --> RND["Render
web app host"]
    HOST --> RAIL["Railway
web app host"]
    HOST --> CLOUD["AWS / GCP / Azure
full control"]

    APP --> OBS[Observability
LangSmith / Langfuse]
    APP --> GUARD[Guardrails
input/output filters]
    APP --> COST[Cost tracking
tokens x price]

    classDef green fill:#ecfdf5,stroke:#34d399
    classDef blue fill:#eff6ff,stroke:#60a5fa
    classDef yellow fill:#fffbeb,stroke:#fbbf24
    class HF,RND green
    class RAIL,CLOUD blue
    class OBS,GUARD,COST yellow

Click to expand

Wrap your model in FastAPI. Validate input/output with Pydantic.
Containerize with Docker so it runs the same everywhere.
Low-cost hosting: Hugging Face Spaces, Render, and Railway. Check current free tiers, sleep policies, and pricing before relying on them.
Production basics: secrets management, request queues, caching, background jobs, health checks, CI/CD, and rollback notes.
Reliability basics: retry only safe operations, cap output tokens, add timeouts, and surface useful errors to users.

Don't ship without: logging, error handling, rate limiting, a kill-switch, a per-request token-cost log, and a small eval set you can run before releases.

Stage 13

Responsible AI

Time: ongoing Difficulty: Not a stage — a lens

Treat user input as untrusted. Treat model output as confidently wrong until proven otherwise. Cite sources, log everything, and never paste PII into someone else's API.

flowchart LR
    USER["User input
(untrusted)"] --> GUARD1["Input guardrails
injection detection
PII redaction"]
    GUARD1 --> SYS["System prompt
+ retrieved context"]
    SYS --> LLM[LLM]
    LLM -->|may hallucinate| GUARD2["Output guardrails
schema validation
citation check
toxicity filter"]
    GUARD2 --> LOG["Log everything
prompt, response, tokens, cost"]
    LOG --> RESP["Response to user"]

    SECRET[("Sensitive data
PII, secrets, IP")] -. never send .-> LLM
    SECRET -->|use| LOCAL["Self-hosted model
Ollama / vLLM"]
    LOCAL --> GUARD2

    classDef danger fill:#fef2f2,stroke:#dc2626,color:#7f1d1d
    classDef guard fill:#fffbeb,stroke:#fbbf24
    classDef safe fill:#ecfdf5,stroke:#34d399
    class USER,SECRET danger
    class GUARD1,GUARD2,LOG guard
    class RESP,LOCAL safe

Click to expand

Hallucinations: assume they happen. Cite sources, validate, add a confidence step.
Prompt injection / jailbreaks: never let user input override system instructions for sensitive actions.
Data leakage: don't send PII, secrets, or proprietary data to third-party APIs without consent. Self-host when in doubt.
Bias & fairness: test across demographics; "default" outputs aren't neutral.
Security baseline: know the OWASP Top 10 for LLM Apps and design around prompt injection, data exfiltration, excessive agency, and insecure plugins/tools.
Evals & guardrails: Ragas, DeepEval, Guardrails AI, NeMo Guardrails.

Minimal toolkit

One row, one decision.

There are 200 ways to do every step. Here's the smallest set that gets you to a deployed app — picked because they're free, popular, and won't trap you in a niche later.

Area	Pick this	Why
Language	Python 3.10+	Every GenAI library lives here.
DL framework	PyTorch	What modern papers and Hugging Face use.
LLM API	Gemini, OpenAI, Anthropic	Learn one deeply, compare two others for tradeoffs.
Backend API	FastAPI	Simple, typed, production-shaped Python web services.
LLM library (open)	HF Transformers	Standard for any open model.
Local model runner	Ollama	One command to run Llama, Qwen, Gemma.
Model gateway	LiteLLM or thin adapter	Switch providers without rewriting app logic.
Orchestration	LangChain or LlamaIndex	Pick one and commit.
Vector DB (start)	FAISS or Chroma	Local, free, zero ops.
UI prototype	Gradio or Streamlit	Demo in < 50 lines.
Validation	Pydantic v2	Stops malformed JSON from breaking prod.
Agent framework	LangGraph	Most controllable; pairs with LangChain.
Tool integration	Function calling + MCP basics	Clean schemas and reusable tool connections.
RAG evals	Ragas or DeepEval	Measure faithfulness instead of trusting vibes.
Observability	LangSmith or Langfuse	See every prompt and response.
Deployment	Docker + HF Spaces	Free, fast, public URL.

Optional enthusiast path: you do not need a full cloud stack to start. Learn this only after you have one working RAG or agent demo and want to understand how production GenAI systems are deployed in real teams.

AWS production stack

Bedrock-first GenAI system.

Use this lane when the company already trusts AWS and you want managed model access, RAG, agents, guardrails, identity, logging, and deployment without stitching every service from scratch.

Amazon BedrockS3Knowledge BasesAgentsCloudWatchGuardrails

Model access

Amazon Bedrock for hosted foundation models, inference, prompt management, and model choice.

RAG layer

S3 data sources into Bedrock Knowledge Bases with a supported vector store such as OpenSearch Serverless.

Agent layer

Bedrock Agents with action groups, knowledge base access, traces, versions, and aliases.

Serving layer

FastAPI behind Lambda/API Gateway for serverless or ECS Fargate/App Runner for always-on apps.

Safety layer

IAM least privilege, Secrets Manager, Bedrock Guardrails, PII filters, and a kill switch.

Ops layer

CloudWatch logs, token/cost dashboards, prompt versions, eval runs, and rollback notes.

Starter architecture

Client sends request to FastAPI.
API validates with Pydantic and calls Bedrock.
RAG calls Knowledge Bases, retrieves evidence, then generates answer with citations.

Production checklist

Secrets Manager, IAM least privilege, CloudWatch dashboards.
Retries with backoff, output token caps, request IDs, and rate limits.
Guardrails for denied topics, sensitive data, and grounding checks.

Interview story

Explain why managed Bedrock reduces operational work.
Explain when ECS/App Runner is better than Lambda.
Explain how guardrails and evals reduce risk before release.

Knowledge Bases docs Agents docs Guardrails docs

Google Cloud stack

Vertex AI plus Cloud Run.

Use this lane when you want Gemini through Vertex AI, managed vector retrieval, agent deployment options, serverless containers, and Google-native evaluation/grounding workflows.

Vertex AIGeminiVector SearchAgent EngineCloud RunCloud Logging

Model access

Vertex AI with Gemini, Model Garden, embeddings, prompt experiments, and evaluation workflows.

RAG layer

Cloud Storage or curated docs into Vertex AI Vector Search, with hybrid retrieval when exact terms matter.

Agent layer

Vertex AI Agent Engine or Agent Builder for managed agent runtime, auth, scaling, and tracing.

Serving layer

FastAPI on Cloud Run, images in Artifact Registry, secrets in Secret Manager, and jobs for ingestion.

Safety layer

IAM, service accounts, private networking where needed, grounding, schema validation, and allowlisted tools.

Ops layer

Cloud Logging, Cloud Monitoring, prompt/eval tracking, latency budgets, and per-request cost logs.

Starter architecture

FastAPI service on Cloud Run calls Vertex AI Gemini.
Ingestion job chunks docs and writes vectors to Vector Search.
App retrieves evidence, grounds the answer, and logs tokens and latency.

Production checklist

Secret Manager, Artifact Registry, Cloud Logging, and Cloud Monitoring.
Service account per workload, input validation, and clear timeout policies.
Saved eval set before every prompt or retrieval change.

Interview story

Explain why Cloud Run is a clean first production container target.
Explain vector search vs keyword search vs hybrid retrieval.
Explain how grounding and evals catch silent RAG failures.

Vertex AI GenAI docs Vector Search docs Agent Engine docs

Azure production stack

Foundry plus enterprise search.

Use this lane when the company runs on Microsoft identity, Azure OpenAI or Microsoft Foundry, governance-heavy workflows, and enterprise RAG over private documents.

Microsoft FoundryAzure OpenAIAI SearchAgent ServiceKey VaultApp Insights

Model access

Microsoft Foundry or Azure OpenAI for models, deployments, projects, endpoints, and governance.

RAG layer

Azure AI Search with vector indexes, vectorizers, chunking/indexer pipelines, semantic ranker, and hybrid search.

Agent layer

Microsoft Foundry Agent Service, Agent Framework, or a custom FastAPI agent with typed tools.

Serving layer

FastAPI on Azure Container Apps or App Service, with background jobs for ingestion and eval runs.

Safety layer

Key Vault, RBAC, private endpoints, Azure AI Content Safety, and policy controls around tools and data.

Ops layer

Application Insights, Azure Monitor, eval dashboards, cost budgets, and deployment slots or rollback notes.

Starter architecture

FastAPI app validates requests and calls Foundry or Azure OpenAI.
Documents are indexed into Azure AI Search with vectors and metadata.
RAG retrieves evidence, applies policy checks, and returns cited answers.

Production checklist

Key Vault, managed identity, private endpoints, and App Insights traces.
Content safety checks, schema validation, and PII handling.
Hybrid search, citation validation, and saved RAG evals.

Interview story

Explain why AI Search is a strong enterprise RAG default.
Explain Foundry Agent Service vs a custom API agent.
Explain how RBAC, Key Vault, and monitoring fit production readiness.

Foundry docs Agent Service docs AI Search docs

What this actually costs

Keep the first pass cheap.

For a learner doing the roadmap end-to-end, keep the first pass cheap: free tiers, local models, tiny datasets, and small eval sets before any paid scale-up.

Provider	Tier	What you can do
Google AI Studio (Gemini)	Free / low-cost	Good for learner experiments. Limits change, so check the current console before large runs.
OpenAI	Optional paid credit	Use small/fast models for practice calls, extraction, routing, and structured output demos.
Anthropic Claude	Optional paid credit	Useful comparison provider for writing, analysis, and safety-focused workflows.
Hugging Face	Free	Datasets, model downloads, Spaces hosting (CPU).
Ollama (local)	Free	Runs small open models on your laptop; performance depends on RAM, GPU, and model size.

Beginner play: start with free-tier hosted models + Ollama for local experiments. Spend money only when you need reliability, higher limits, or a production demo.

Build these, in this order

Six beginner projects worth showing.

A strong fresher portfolio is a few small things you can explain clearly, not one oversized project you abandoned. Personal, finished, and measurable beats flashy.

Project 01 · Stage 07

Smart Summarizer

Paste an article, lecture note, or long email and return a 5-bullet summary, key terms, risks, and one suggested follow-up question.

PromptingStructured outputStreamingCase-study first

Learn Stage 07

Build scope

Input textarea or file upload.
Output JSON with summary, keywords, confidence, and limitations.
Log model, latency, token count, and prompt version.

Interview questions

How did you reduce hallucinations in summaries?
Why structured output instead of free text?
How would you evaluate summary quality?

Project 02 · Stage 07

AI Support Ticket Triage

Paste a customer issue and return structured routing: category, urgency, sentiment, missing details, and a suggested first reply.

PydanticJSON schemaPrompt evalsHuman review

Learn Stage 07

Build scope

Accept pasted tickets or CSV rows and normalize messy text.
Return category, priority, confidence, missing fields, and next action.
Add 20 synthetic tickets for routing and tone evals.

Interview questions

How do you validate model output?
How do you handle low-confidence classifications?
What makes a triage result measurable?

Project 03 · Stage 07

Code Explainer Bot

Paste a function and get a beginner explanation, complexity estimate, edge cases, and one refactor suggestion.

Code promptsSafetyTestsExamples

Learn Stage 07

Build scope

Support Python first. Add language detection later.
Return explanation, complexity, pitfalls, and test ideas.
Show prompt examples in the project notes.

Interview questions

When should you not use an LLM for code?
How do you detect wrong explanations?
How would you add unit-test generation safely?

Project 04 · Stage 08

Chat With Your PDF

Build a RAG app over one textbook, policy document, or handbook. Answers must cite source chunks and say "I don't know" when evidence is missing.

RAGEmbeddingsCitationsEval sheet

Learn Stage 08

Build scope

Chunk document, embed, retrieve top-k, and cite sources.
Show retrieved chunks before the final answer.
Create 25 questions with expected evidence.

Interview questions

How did you choose chunk size?
What is context recall?
How do you reduce wrong citations?

Project 05 · Stage 10

Mini Research Agent

Give it one research question. It should call two or three safe tools, show its trace, and produce a short sourced report with a budget limit.

Tool callingTracesBudgetsStop rules

Learn Stage 10

Build scope

Use typed tool schemas and validate arguments.
Cap iterations, tokens, and total tool calls.
Store trace logs for every run.

Interview questions

What makes an agent different from a chatbot?
How do you stop loops?
How do you test tools independently?

Project 06 · Capstone

Personal Capstone

Build something you would genuinely use: a study buddy for your notes, a personal finance explainer, a journal reflector, or a niche workflow from your college or internship.

Personal nichePublic demoEval reportDemo video

Use the 12-week plan

Build scope

One clear user, one painful workflow, one measurable outcome.
Public URL, project brief, screenshots, architecture, and limitations.
Include cost, latency, failure cases, and what you would improve.

Interview questions

Why did you choose prompt, RAG, agent, or fine-tuning?
What failed in v1?
How would you scale this for real users?

Minimum quality bar

Public demo, screenshots, setup commands, architecture diagram, prompt examples, and known limitations.

Interview signal

Show eval results, cost/latency logs, a failure case you fixed, and why you chose prompt/RAG/agent/fine-tune.

Viral hook

Build for a real niche: "AI study buddy for my notes" beats generic chatbot because people instantly understand the value.

12-week sprint

One quarter. One portfolio.

Doable part-time at ~10 hours/week. Doable full-time in 6–8 weeks if you can give it your day job's attention.

6 weeks · full-time

Fast track

25-35 hours/week. Best if you already know Python and can build daily.

Week 1: Python/API refresh + ML/DL basics.
Week 2: Transformers, hosted APIs, structured output.
Weeks 3-4: RAG, vector DB, evals, deployment.
Weeks 5-6: agent, capstone, interview drill.

12 weeks · part-time

College track

4-6 hours/week. Best when exams, classes, or a job compete for time.

One stage every 1-2 weeks.
One public progress post per month.
One polished project every 6-8 weeks.
Final month: DSA refresh + GenAI system design.

Recommended 12-week path

A project-first sprint that builds proof every two weeks.

Use this as a pacing guide, not a deadline. If a project needs another week, take it and keep the project notes honest.

Weeks 1-2

Python, APIs, and setup

Get comfortable with scripts, files, HTTP, JSON, env vars, Git, and simple tests.

Public starter project

Weeks 3-4

ML and deep learning basics

Learn enough metrics, splits, tensors, and training loops to explain what models are doing.

Tiny classifier + metrics

Weeks 5-6

Transformers and LLM APIs

Understand tokens, context, latency, cost, structured output, and provider tradeoffs.

LLM tool with logs

Weeks 7-8

Prompting and RAG

Build retrieval over real documents, cite sources, and create a small eval sheet.

Chat-with-PDF demo

Weeks 9-10

Frameworks and agents

Add tool schemas, traces, stop rules, memory choices, and budget limits without overbuilding.

Safe mini-agent

Weeks 11-12

Capstone and deployment

Ship one useful app with a project brief, screenshots, demo video, eval report, and interview story.

Portfolio-ready app

Weeks	Focus	Deliverable	Interview checkpoint
1-2	Python + API basics	Project folder with scripts, tests, screenshots, and setup notes	Explain HTTP, JSON, env vars, errors, and version-control flow.
3-4	ML + Deep Learning	Tiny image/text classifier with metrics	Explain overfitting, splits, metrics, and baseline models.
5-6	Transformers + LLM APIs	Hosted LLM tool with streaming + logs	Explain tokens, temperature, context, latency, and cost.
7-8	Prompt engineering + RAG	Chat-with-PDF deployed publicly	Explain chunking, embeddings, retrieval, citations, and RAG evals.
9-10	Agents + frameworks	Agent with 2-3 tools and visible trace	Explain tool schemas, loops, memory, budgets, and failures.
11	Capstone build	One project you'd genuinely use	Prepare the architecture story and tradeoff decisions.
12	Deployment + polish	Project brief, demo video, eval report, blog post	Run mock system design and portfolio deep-dive rounds.

The seven traps

Common beginner mistakes.

If you hit one, you're not behind — you're average. Reading them up front saves you the wasted weeks.

Tutorial loop. You finish 12 courses and have no working project to show. Build first; learn while building.
Jumping to fine-tuning. Many beginner "fine-tuning" ideas are better solved with a clearer prompt, structured output, or RAG.
Skipping evaluation. "It looked good in 3 examples" isn't a result. Build a 20–50 example eval set early and re-run it after every change.
Ignoring tokens. Tokens are time, money, and quality. Always log them per request.
Trusting LLM output blindly. Validate JSON. Verify facts. Treat it like a junior engineer who's confident but new.
Over-frameworking. LangChain/LangGraph are great when you need them. For a 30-line script, plain requests + the SDK is fine.
Building in secret. Share your progress, publish small demos, and write up what you broke. The portfolio is the credential.

Interview prep

What a hiring manager will actually ask.

The questions that come up again and again: fundamentals, RAG, agents, production, security, and the portfolio deep dive. If you can answer these with tradeoffs, you're ahead of most applicants.

Round 01Code

Python + API task

Build a small endpoint or script that calls an LLM, validates JSON, handles errors, and logs latency/tokens.

Round 02System

Design a RAG app

Walk through ingestion, chunking, embeddings, vector DB, retrieval, prompt assembly, evals, and monitoring.

Round 03Debug

Fix bad output

Given hallucinations, slow latency, high cost, or wrong citations, explain what you would measure and change first.

Question	Answer in two sentences
What's the difference between fine-tuning and RAG?	RAG retrieves your docs at query time and injects them into the prompt — facts. Fine-tuning updates weights to teach behavior — style, tone, format. Try RAG first; fine-tune only when prompting + RAG genuinely can't get there.
How would you reduce hallucinations in production?	Ground the model with RAG and require citations, force structured output via JSON schema + Pydantic, lower temperature for factual tasks, and re-run a held-out eval set on every release. If the cost of a wrong answer is high, add a verification step or a human in the loop.
What is a token, and why does it matter?	A unit of text the model reads — about 4 characters in English, ¾ of a word. Every token affects cost and latency, so logging tokens per request is one of the best habits for improving cost, speed, and quality.
When would you NOT use an LLM?	When determinism matters (a regex or parser is better), when the input is structured data (use SQL), when latency budget is tight, or when failure is unacceptable without human review (medical, legal, financial decisions).
How do you evaluate an LLM application?	Build a 20–100 example eval set with input + expected output before you ship. Score with exact match, embedding similarity, LLM-as-judge, or human review depending on the task — and re-run on every change. If you can't measure it, you can't improve it.
What is prompt injection and how do you defend against it?	A user smuggling instructions into the prompt to override system rules ("ignore previous instructions and..."). Defenses: treat user input as data not instructions, use tool/function calling instead of free-text actions for sensitive operations, and require out-of-band confirmation for destructive actions.
Why does temperature matter, and what value should I use?	It controls sampling randomness. Use lower values for extraction, classification, and structured output; use higher values for writing or brainstorming. Always test on your own examples.
Hosted (GPT/Claude/Gemini) or open-source (Llama/Mistral)?	Hosted APIs are usually easiest to start with and charge per token. Open-weight models can help with privacy, control, and scale, but you operate more infrastructure. Start hosted for learning, then compare open models when cost or compliance matters.
How would you design a RAG chatbot for company policies?	Ingest docs with metadata, chunk by section, embed, store in a vector DB, retrieve top-k, re-rank, assemble a grounded prompt, and return citations. Then evaluate with known policy questions, missing-answer cases, and citation checks.
What makes an agent different from a normal chatbot?	A chatbot usually answers from conversation context; an agent can choose tools, observe results, update state, and continue toward a goal. Production agents need typed tools, iteration caps, logs, and budget controls.
How do you keep structured output reliable?	Use the provider's structured-output mode or function calling, validate with Pydantic/JSON Schema, and retry with the validation error when parsing fails. Keep temperature low and test with malformed, empty, and adversarial inputs.
How would you reduce LLM app latency and cost?	Use a smaller model where possible, trim context, cache repeated retrievals/responses, stream output, batch offline work, and stop generating early. Log tokens, latency, and model choice so optimization is based on data.
What should be in a GenAI portfolio case study?	Problem, demo link, architecture diagram, setup, example prompts, screenshots, eval results, cost/latency notes, limitations, and future improvements. It should prove you can think like an engineer, not only follow a tutorial.

What interviewers want to see: can you explain why the app failed and what you checked first? Keep prompt, response, retrieval, tool calls, latency, and token usage visible. Add a small eval set you can rerun after changes. That is enough to show practical engineering judgment.

After 12 weeks

What you'll actually be able to do.

By the end, you should have shipped, debugged, and deployed small GenAI apps. These are the practical capabilities to aim for.

Pick the right approach for a problem. Given a vague brief, decide whether it's a prompt problem, a RAG problem, an agent problem, or a fine-tuning problem — and defend the choice with cost and effort numbers.
Call LLM APIs with confidence. Hosted providers and local models. Stream responses, handle rate limits, manage env vars, and log tokens.
Build a production-shaped RAG pipeline. Ingest, chunk, embed, store, retrieve, re-rank, generate, and cite sources with clear limitations.
Get reliable structured output. JSON schemas, Pydantic validation, and retries on parse failure so a demo can become a backend feature.
Build a small tool-using workflow with guardrails. Iteration caps, cost logging, tool tests, visible traces, and clear stop conditions.
Deploy a public URL. FastAPI + Docker + a host (HF Spaces / Render / Railway). Logged, rate-limited, monitored.
Evaluate an LLM app rigorously. Build a 50-example eval set, score with multiple methods, track improvements across versions. The thing most beginners skip and most hiring managers ask about.
Know what you don't know. Read a paper abstract and tell whether it's relevant to your work. Read a model card and spot a red flag. Read a vendor pitch and ask the right cost questions.

Glossary

Speak the language.

The words you'll hear every day. Memorize the definitions and you'll keep up in any meeting or interview.

Term	One-line meaning
Token	The chunk of text the model actually sees. ~4 characters or ¾ of a word.
Context window	How many tokens the model can read at once (input + output combined).
Embedding	A vector that represents meaning. Two similar sentences, two close vectors.
Vector DB	A database that stores embeddings and finds nearest neighbors fast.
Temperature	Randomness knob. 0 = same answer every time, 1 = creative.
System prompt	Persistent instructions the model follows for the whole conversation.
Hallucination	Confident-sounding but wrong output. Mitigate with RAG + citations.
RAG	"Look it up in my docs, then answer using what you found."
Hybrid search	Combining keyword search and vector search so exact terms and semantic meaning both work.
Re-ranking	A second model reorders retrieved chunks so the best evidence reaches the LLM first.
Fine-tuning	Updating the model's weights on your data — usually a small LoRA adapter.
Agent	An LLM that can choose to call tools in a loop until a goal is met.
Tool / function calling	The LLM emits a JSON call; your code runs it; the result goes back.
MCP	Model Context Protocol: a standard way for AI apps to connect to external tools and data sources.
Prompt injection	A user smuggling instructions into your prompt to override system rules.
Eval set	A saved set of inputs and expected behavior you rerun after prompt, model, or code changes.
Latency	How long the user waits. LLM latency comes from retrieval, model time, output length, and network.
Rate limit	A provider cap on requests or tokens per minute. Handle it with backoff and queues.
Inference	Running the model to get a result (vs. training, which builds the model).

Start with one tiny build.

Open the notebook, run the first cell, and make one small LLM call work. Then add structure, data, evals, and deployment one layer at a time.

Open the notebook Subscribe on LinkedIn

Stay close to the field

Keep learning with fresh notes.

GenAI tooling changes quickly. Follow along for practical notes, interview patterns, project breakdowns, and updates when the roadmap changes.

Want weekly direction? Subscribe to the LinkedIn newsletter for concise GenAI notes and fresher-friendly interview prep.
Built something from the roadmap? Share it on LinkedIn and tag Purnendu Das so it can reach more learners.
Found something outdated? Send a LinkedIn message with the source and a short explanation. The rule is: cite numbers or working code, not vibes.

Follow on LinkedIn Subscribe on LinkedIn

Build toward an interview ready GenAI portfolio.

Your GenAI command center.

Stage 00 · Setup and Mindset

Start small, build confidence.

Get one tiny win before the roadmap begins.

The honest 2026 version.

Quick overview before the full roadmap.

Setup, Python, and honest basics

Setup and mindset

What good proof looks like.

Software foundation

LLM API fluency

Retrieval quality

Tool use and agents

Production habits

Clear explanations

The fourteen stages.

Mindset & Setup

Python Essentials

Math (Just Enough)

Machine Learning Foundations

Deep Learning Basics

NLP & Transformers

Large Language Models

Prompt Engineering

RAG · Retrieval-Augmented Generation

Frameworks & Tooling

Agents & Multi-step Workflows

Fine-tuning & Customization

Deployment & Productionization

Responsible AI

One row, one decision.

Bedrock-first GenAI system.

Vertex AI plus Cloud Run.

Foundry plus enterprise search.

Keep the first pass cheap.

Six beginner projects worth showing.

Smart Summarizer

AI Support Ticket Triage

Code Explainer Bot

Chat With Your PDF

Mini Research Agent

Personal Capstone

Minimum quality bar

Interview signal

Viral hook

One quarter. One portfolio.

Fast track

Recommended

College track

A project-first sprint that builds proof every two weeks.

Python, APIs, and setup

ML and deep learning basics

Transformers and LLM APIs

Prompting and RAG

Frameworks and agents

Capstone and deployment

Common beginner mistakes.

What a hiring manager will actually ask.

Python + API task

Design a RAG app

Fix bad output

What you'll actually be able to do.

Trusted resources to open when needed.

Speak the language.

Start with one tiny build.

Keep learning with fresh notes.