0% 0 of 14 stages complete
Resume at Stage 00
Full stack genai roadmap - 2026 edition

Build toward an interview ready GenAI portfolio.

A calm, practical roadmap for freshers: learn the small set of skills modern GenAI apps actually use, build proof one project at a time, and save advanced topics until they matter.

Purnendu Das ~25 min read 14 stages · 3 timelines Beginner-friendly path
Python + HTTP APIs LLM APIs + structured output RAG with citations Tools + safe agents Evals + deployment basics
Bookmark this

Your GenAI command center.

This is built to be used every week, not read once. It remembers progress, gives the next useful action, and keeps the learning path focused.

Next best action

Stage 00 · Setup and Mindset

0%

Install Python, set up VS Code, create a clean project folder, and run one tiny model call.

Ship proofHello model repo
Interview answerWhat can an LLM do and where can it fail?
Open next stage
Choose your lane

Start small, build confidence.

Use the roadmap in order. Do not rush math, agents, or cloud. Your first win is a working API call, a clear README, and one public mini-project.

  • Do Stage 00, 01, 06, and 07 before deep theory.
  • Ship Smart Summarizer before touching agents.
  • Write limitations in plain English after every build.
Start in 10 minutes

Get one tiny win before the roadmap begins.

Before reading Stage 00, run one small LLM experiment. The goal is not mastery. The goal is to feel the loop: prompt, response, log, improve.

  1. Open the notebook or a fresh Colab.
  2. Run a tiny model call or the first "hello model" cell.
  3. Change the prompt once and compare the response.
  4. Write one note: what changed, what failed, what you would measure next.
Open Day 1 notebook
# Day 1 shape: prompt -> response -> log
prompt = "Explain RAG to a fresher in 4 bullets"

response = call_your_llm(prompt)

print(response.text)
print({
    "model": response.model,
    "tokens": response.tokens,
    "latency_ms": response.latency_ms
})

Use the notebook for the real runnable version. This snippet shows the habit every GenAI app needs: call, inspect, and log.

Start here

The honest 2026 version.

You do not need to train a foundation model to become useful in GenAI. Most fresher-friendly work is building reliable apps around models: connect APIs, add your data, validate outputs, measure quality, and deploy something people can try.

Core path: call models, validate structured output, add RAG, measure quality, and deploy one small app.
Portfolio proof: show screenshots, logs, citations, eval results, limits, and tradeoffs instead of only certificates.
Interview focus: explain failures, cost, latency, safety, and why you chose prompt vs RAG vs fine-tuning.
Keep it small: finish one useful app before exploring advanced agents, fine-tuning, or cloud architecture.
Skip for now: training LLMs from scratch, Kubernetes, multi-agent teams, huge vector databases, advanced math proofs, and fine-tuning. Learn them later only when a project genuinely needs them.
TL;DR roadmap preview

Quick overview before the full roadmap.

This section is only the high-level map: five phases, timing, difficulty, and the big idea of each layer. The full detailed roadmap starts below in the stage-by-stage section with checklists, projects, interview prep, and build proof.

Foundations

Setup, Python, and honest basics

Start with a working development setup, Python fluency, APIs, and just enough math to understand the GenAI stack without drowning in theory.

Module 1.1

Setup and mindset

  • 01Install Python, VS Code, Git, and one hosted model provider.
  • 02Learn what LLMs can do, where they fail, and why logging matters.
  • 03Create your project folder, setup notes, screenshots, and learning journal.
End state You can run a Python script, call an LLM API, explain what happened, and show a tiny portfolio artifact.
Already a software engineer? Skim Stages 0-4, then start with LLM APIs, structured output, RAG, evals, and deployment. You do not need to relearn every ML concept before building useful GenAI apps.
Portfolio checklist

What good proof looks like.

A fresher does not need to know every paper. You need a few small projects that show you can build carefully, explain tradeoffs, and learn from failures.

Skill 01 Core

Software foundation

You can write clean Python, call APIs, read errors, and organize reliable portfolio projects.

  • Python, Git, CLI, virtual envs, HTTP, JSON, env vars.
  • Proof: a project folder with scripts, tests, screenshots, and a readable project brief.
Skill 02 LLM app

LLM API fluency

You can build a useful LLM feature without guessing how prompts, tokens, latency, and cost behave.

  • Chat APIs, streaming, retries, rate limits, model selection, cost logs.
  • Proof: a hosted summarizer or support ticket triage tool with structured output.
Skill 03 RAG

Retrieval quality

You can make an LLM answer from private documents and explain why the answer is grounded.

  • Load, chunk, embed, store, retrieve, re-rank, cite, evaluate.
  • Proof: chat-with-PDF with citations and a small eval set.
Skill 04 Agents

Tool use and agents

You can safely let a model call tools, use state, and stop before it loops or burns budget.

  • Function calling, typed schemas, tool tests, iteration caps, memory.
  • Proof: a research agent with 2-3 tools and visible traces.
Skill 05 Prod

Production habits

You can turn a notebook into a public service that is logged, validated, and cheap enough to run.

  • FastAPI, Docker, Pydantic, observability, guardrails, caching, secrets.
  • Proof: public URL, demo video, cost notes, failure cases.
Skill 06 Interview

Clear explanations

You can answer system-design and debugging questions in plain language with real tradeoffs.

  • RAG vs fine-tuning, hallucinations, prompt injection, evals, latency.
  • Proof: portfolio case studies with architecture and decisions.
Every project needs a case study: problem, architecture, setup, screenshots, eval results, limitations, and next steps.
Every LLM call needs logs: model, prompt version, tokens, latency, cost, errors, and retry behavior.
Every RAG demo needs citations: show retrieved chunks and explain how you measured faithfulness.
Every interview answer needs tradeoffs: quality vs latency, hosted vs open model, prompt vs RAG vs fine-tune.
The roadmap

The fourteen stages.

Each stage answers four questions: what is this, why does it matter, what should I ship, and what interview signal does it create.

Phase 01Foundations
Stage 00

Mindset & Setup

Time: 1–2 hours Difficulty: Prereq: a laptop

Start with the right mental model: an LLM predicts tokens from context. It can be useful, but it can also be wrong, stale, or overconfident.

  • Install: Python 3.10+, VS Code, Git, and one clean project workspace.
  • Learn: the command line, virtual environments (venv or conda), and the four Git verbs you'll use forever — clone, commit, push, pull.
  • Sign up: Google AI Studio, Hugging Face, and one hosted-model provider such as OpenAI, Anthropic, or Gemini. Use free tiers first; provider limits change, so always check current pricing before scaling.
  • Profile setup: clear LinkedIn headline, simple portfolio page, and one clean project case study. Hiring starts before the first interview.
Build: a "hello world" Python project with setup steps, one screenshot, and a short note on what an LLM can and cannot do.
Stage 01

Python Essentials

Time: 1–2 weeks Difficulty: Skip if: you've shipped Python before

You do not need to master all of Python. You need enough to read files, call APIs, handle errors, and organize a small app.

  • Variables, types, control flow, functions, classes.
  • Files, JSON, CSV, environment variables, and HTTP APIs (requests).
  • API basics: status codes, headers, auth tokens, timeouts, retries, pagination, and rate limits.
  • numpy (arrays), pandas (tables), matplotlib (plots).
  • Exception handling, list/dict comprehensions, f-strings, type hints, and light testing with pytest.
  • Backend preview: build one tiny FastAPI endpoint so deployment later does not feel alien.
Build: a small script that reads a CSV, calls a public API with retries, validates the response, and writes clean JSON.
Stage 02

Math (Just Enough)

Time: 2–4 days Difficulty: Goal: intuition, not proofs

You don't need a math degree. You need to look at a paper's diagram and not feel lost.

  • Linear algebra: vectors, matrices, dot products — and why "similarity" has a geometric meaning.
  • Probability: distributions, expectation, why softmax exists.
  • Calculus: what a gradient is and why "going downhill" trains a model.
Honest take: learn enough math to explain vectors, probability, gradients, and attention at a high level. Deep proofs can wait until a project or interview actually needs them.
Phase 02Core ML
Stage 03

Machine Learning Foundations

Time: 1 week Difficulty: Tool: scikit-learn

Before you train a transformer, train a logistic regression. The vocabulary is the same.

  • Supervised vs unsupervised vs reinforcement.
  • Train/validation/test splits, overfitting, the bias-variance tradeoff.
  • Data leakage, baseline models, feature quality, and why a bad split can make a weak model look brilliant.
  • Metrics: accuracy, precision, recall, F1, MSE, perplexity. Pick the right one for the right task.
  • scikit-learn on a small tabular dataset (Iris, Titanic) — feel the workflow.
  • One neuron → forward pass → loss → backprop → gradient descent. Karpathy's micrograd is the gold standard.
Stage 04

Deep Learning Basics

Time: 1–2 weeks Difficulty: Pick one: PyTorch (recommended)

Pick PyTorch. Train three small models. Move on. Don't build a custom CUDA kernel.

  • Tensors, layers, activations (ReLU, GELU), losses, optimizers (SGD, Adam).
  • One framework — PyTorch (every modern paper) or Keras.
  • Build end-to-end: a tiny MLP for MNIST, a small CNN for CIFAR-10, a character-level RNN/Transformer for text.
Ship: a network that beats random on a real dataset, saved as a notebook with charts that prove it.
Stage 05

NLP & Transformers

Time: 1 week Difficulty: Why it matters: GenAI's core

Text becomes numbers, numbers become attention, attention becomes a model that finishes your sentence. Get the picture, not the matrix algebra.

flowchart LR
    T["Hello world"] -->|tokenizer| TOK["IDs: 15496, 995"]
    TOK -->|embedding lookup| EMB[("vectors
e.g. 768-dim each")] EMB --> ATT["Self-attention
each token looks at every other"] ATT --> FFN["Feed-forward
+ residuals + layer norm"] FFN -. repeat N layers .-> OUT["Logits over vocabulary"] OUT -->|argmax / sampling| NEXT["next token: '!'"] NEXT -. feed back .-> T classDef inn fill:#fffbeb,stroke:#fbbf24 classDef proc fill:#eff6ff,stroke:#60a5fa classDef out fill:#ecfdf5,stroke:#34d399 class T,TOK inn class EMB,ATT,FFN proc class OUT,NEXT out
Click to expand
  • Tokenization: how text becomes numbers (BPE, SentencePiece). Words ≠ tokens.
  • Embeddings: vectors that capture meaning. Similar meaning, similar vectors.
  • Attention: the mechanism that lets a model "look at" earlier tokens with different weights.
  • Encoder vs decoder vs encoder-decoder: BERT (encoder, understanding), GPT/Llama/Claude (decoder, generation), T5/BART (both).
  • Hands-on: Hugging Face Transformers — load a model, run pipeline(...), inspect outputs.
Phase 03Applied LLMs
Stage 06

Large Language Models

Time: 3–5 days Difficulty: Knobs you'll use daily

An LLM is a tokenizer wrapped around a transformer wrapped around a prompt. Knowing the four knobs — context, tokens, temperature, top-p — covers most beginner app work.

flowchart LR
    P["Your prompt
+ system + history"] --> TOK["Tokenize
~4 chars per token"] TOK --> M["LLM
billions of parameters"] M --> LOG["Logits
score for every token
in the vocabulary"] LOG --> S{Sampling
strategy} S -->|temperature| OUT["Next token"] OUT -->|append + repeat| TOK OUT --> STOP["Stop on EOS
or max_tokens"] classDef inn fill:#fffbeb,stroke:#fbbf24 classDef proc fill:#eff6ff,stroke:#60a5fa classDef out fill:#ecfdf5,stroke:#34d399 class P,TOK inn class M,LOG,S proc class OUT,STOP out
Click to expand
  • Pre-training (next-token prediction on the internet) → post-training (instruction tuning + RLHF/DPO to make it useful and safe).
  • Open-weight vs closed: Llama, Mistral, Qwen, DeepSeek, Gemma (open) vs GPT, Claude, Gemini (closed APIs).
  • Context window — how much text the model can "see". It varies by model and provider, so check current docs before designing around a limit.
  • Tokens — the billing and length unit. ~1 token ≈ 4 English characters or ¾ of a word.
  • Temperature — lower for factual/extraction tasks, higher for brainstorming and writing. Measure instead of guessing.
  • top-p / top-k — alternative sampling controls. Stick with temperature unless you have a reason.
  • API production basics: streaming, retries with backoff, timeout handling, rate-limit handling, model fallback, and per-request cost logging.
  • Model choice: small/fast model for extraction and routing; stronger model for reasoning, coding, or hard synthesis. Measure instead of guessing.
Ship: call a hosted LLM with streaming + retries, log tokens/latency, then run a small open model locally with Ollama.
Stage 07

Prompt Engineering

Time: 3–5 days Difficulty: The first lever

Most "we need fine-tuning" problems are actually "we need a tighter prompt and a Pydantic schema."

  • Zero-shot, few-shot, chain-of-thought.
  • Roles: system (rules), user (input), assistant (response). Use them.
  • Structured output: ask for JSON; better, use the model's structured-output mode (response_format, with_structured_output). Validate with Pydantic.
  • Function / tool calling: the model returns which function to call with what arguments; your code runs it and feeds the result back.
  • Prompt evals: save 20 tricky inputs, run them after every prompt change, and keep the prompt version in logs.
  • Pitfalls: hallucinations, prompt injection (user input overrides system rules), context drift in long chats.
flowchart LR
    P[Prompt] --> M{Did it work?}
    M -->|Yes| DONE["Ship it"]
    M -->|Mostly. Bad format| P1["Tighten the prompt
+ JSON / Pydantic schema"] M -->|It does not know my data| RAG["Add RAG
Stage 8"] M -->|Wrong style/skill, even with examples| FT["Fine-tune
Stage 11"] P1 --> P RAG --> DONE FT --> DONE classDef good fill:#ecfdf5,stroke:#34d399 classDef warn fill:#fffbeb,stroke:#fbbf24 class DONE good class P1,RAG,FT warn
Click to expand
Default order: improve the prompt first, add RAG when the answer needs your data, and consider fine-tuning only when you need a repeated style, format, or behavior that prompting cannot hold.
Stage 08

RAG · Retrieval-Augmented Generation

Time: 1 week Difficulty: Most useful pattern in industry

LLMs don't know your data. RAG fixes that without retraining anything — you bolt search onto generation and let the model quote from your docs.

flowchart LR
    subgraph INDEX["Build (offline, once)"]
        D[Your docs
PDFs, wiki, tickets] --> CH[Chunk
~300-800 tokens] CH --> EMB1[Embedding model
e.g. all-MiniLM-L6-v2] EMB1 --> VDB[(Vector DB
FAISS / Chroma / Qdrant)] end subgraph QUERY["Answer (every request)"] Q[User question] --> EMB2[Same embedding model] EMB2 --> SEARCH[Top-k similarity search] VDB --> SEARCH SEARCH --> CTX[Top chunks = context] CTX --> PROMPT[Prompt = system + context + question] PROMPT --> LLM[LLM] LLM --> A[Grounded answer
with citations] end classDef offline fill:#eff6ff,stroke:#60a5fa classDef online fill:#ecfdf5,stroke:#34d399 class D,CH,EMB1,VDB offline class Q,EMB2,SEARCH,CTX,PROMPT,LLM,A online
Click to expand
  • Why RAG: LLMs don't know your data and have a knowledge cutoff. Inject relevant snippets into the prompt at query time.
  • Embeddings & vector DBs: FAISS (local, free), Chroma (simple), Qdrant (production OSS), Pinecone (managed).
  • Chunking: start with RecursiveCharacterTextSplitter, ~500 tokens, ~50 overlap. Tune later.
  • Re-ranking: add a cross-encoder (e.g. bge-reranker) on top-20 → top-3 for big quality wins.
  • Hybrid search: combine keyword/BM25 with vector search when exact terms, IDs, or error codes matter.
  • RAG evals: measure context recall, faithfulness, answer relevance, citation accuracy, and "I don't know" behavior.
Ship: "Chat with your PDF" with source citations, retrieved chunk preview, and a 25-question eval sheet. (Section 6 of the notebook.)
Phase 04Production
Stage 09

Frameworks & Tooling

Time: 1 week Difficulty: Pick one and commit

The frameworks aren't magic. They're glue. Pick the one with the docs you can read and the failures you can debug.

  • Orchestration: LangChain (broadest) or LlamaIndex (RAG-first). Pick one.
  • Validation: Pydantic v2 — your defense against malformed LLM output.
  • UIs in <50 lines: Gradio (great for ML demos) or Streamlit (great for dashboards).
  • Observability: LangSmith, Langfuse, or Phoenix — see every prompt, response, and tool call.
  • Model gateway: use LiteLLM or a thin adapter layer so switching between OpenAI, Anthropic, Gemini, and local models does not rewrite your app.
  • Integration pattern: learn MCP basics if you want agents/tools to connect cleanly to external systems.
  • Experiment tracking (optional): Weights & Biases, MLflow.
Stage 10

Agents & Multi-step Workflows

Time: 1–2 weeks Difficulty: Advanced pattern

Agent = LLM + tools + memory + a loop. Most "agent failures" are actually loops without limits and tools without tests.

flowchart LR
    U[User goal] --> AG{LLM
plans next step} AG -->|"Thought:
I need to search the docs"| TOOL[Pick a tool
+ arguments] TOOL --> EXEC[Run tool
search / API / code] EXEC -->|Observation| AG AG -->|I have enough info| ANS[Final answer] classDef think fill:#fffbeb,stroke:#fbbf24 classDef act fill:#eff6ff,stroke:#60a5fa classDef done fill:#ecfdf5,stroke:#34d399 class AG think class TOOL,EXEC act class ANS done
Click to expand
  • The ReAct pattern (Reason → Act → Observe → repeat) is the foundation.
  • Tool use / function calling: the LLM emits a JSON call; your runtime executes it and returns the result.
  • Frameworks: LangGraph (graphs and control), CrewAI (role-based), AutoGen (multi-agent chat), OpenAI Agents SDK.
  • Tool quality: typed schemas, input validation, deterministic tool tests, clear errors, and a max-iteration budget.
  • Memory: separate short-term conversation state from long-term user/profile memory; never store sensitive data by accident.
  • Watch out for: infinite loops (always cap iterations), runaway costs (log every call), bad tools (a flaky tool poisons the whole agent).
Ship: a research agent that takes a question, calls 2-3 tools, shows its trace, writes a 1-page sourced report, and stops on budget.
Stage 11

Fine-tuning & Customization

Time: 1–2 weeks Difficulty: Optional — skip until you can't

Fine-tune for style, tone, domain language, format. Almost never for adding facts — that's RAG's job.

flowchart TD
    DATA["Curated dataset
500-5,000 examples"] --> SPLIT["Split
train / eval (80/20)"] SPLIT --> EVAL_BEFORE["Run baseline
measure metrics"] SPLIT --> TRAIN["LoRA / QLoRA training
1-4 epochs
1x consumer GPU"] TRAIN --> ADAPTER["Adapter weights
~10-100 MB
not full model"] ADAPTER --> MERGE["Merge or load
at inference time"] MERGE --> EVAL_AFTER["Re-run eval
compare to baseline"] EVAL_AFTER --> SHIP{Better
on YOUR metric?} SHIP -->|Yes| DEPLOY["Ship adapter"] SHIP -->|No, regressed| DATA SHIP -->|No change| PROMPT["Try better prompts
or RAG instead"] classDef data fill:#fffbeb,stroke:#fbbf24 classDef train fill:#eff6ff,stroke:#60a5fa classDef eval fill:#f5f3ff,stroke:#a78bfa classDef done fill:#ecfdf5,stroke:#34d399 class DATA,SPLIT data class TRAIN,ADAPTER,MERGE train class EVAL_BEFORE,EVAL_AFTER,SHIP eval class DEPLOY,PROMPT done
Click to expand
  • LoRA / QLoRA — train a small "adapter" instead of the full model. Runs on a single consumer GPU.
  • Datasets: quality > quantity. 500 hand-curated examples beat 50,000 noisy ones.
  • Evaluation: build an eval set before you fine-tune. If you can't measure it, you can't improve it.
  • Tools: Hugging Face peft + trl, Unsloth, Axolotl.
Stage 12

Deployment & Productionization

Time: 1 week Difficulty: The portfolio multiplier

A demo on your laptop is a story. A public URL is a credential.

flowchart LR
    DEV[Notebook
that works] --> APP[FastAPI / Flask app
+ Pydantic schemas] APP --> DOCK[Dockerfile
requirements.txt] DOCK --> HOST{Where?} HOST --> HF["HF Spaces
simple demos"] HOST --> RND["Render
web app host"] HOST --> RAIL["Railway
web app host"] HOST --> CLOUD["AWS / GCP / Azure
full control"] APP --> OBS[Observability
LangSmith / Langfuse] APP --> GUARD[Guardrails
input/output filters] APP --> COST[Cost tracking
tokens x price] classDef green fill:#ecfdf5,stroke:#34d399 classDef blue fill:#eff6ff,stroke:#60a5fa classDef yellow fill:#fffbeb,stroke:#fbbf24 class HF,RND green class RAIL,CLOUD blue class OBS,GUARD,COST yellow
Click to expand
  • Wrap your model in FastAPI. Validate input/output with Pydantic.
  • Containerize with Docker so it runs the same everywhere.
  • Low-cost hosting: Hugging Face Spaces, Render, and Railway. Check current free tiers, sleep policies, and pricing before relying on them.
  • Production basics: secrets management, request queues, caching, background jobs, health checks, CI/CD, and rollback notes.
  • Reliability basics: retry only safe operations, cap output tokens, add timeouts, and surface useful errors to users.
Don't ship without: logging, error handling, rate limiting, a kill-switch, a per-request token-cost log, and a small eval set you can run before releases.
Stage 13

Responsible AI

Time: ongoing Difficulty: Not a stage — a lens

Treat user input as untrusted. Treat model output as confidently wrong until proven otherwise. Cite sources, log everything, and never paste PII into someone else's API.

flowchart LR
    USER["User input
(untrusted)"] --> GUARD1["Input guardrails
injection detection
PII redaction"] GUARD1 --> SYS["System prompt
+ retrieved context"] SYS --> LLM[LLM] LLM -->|may hallucinate| GUARD2["Output guardrails
schema validation
citation check
toxicity filter"] GUARD2 --> LOG["Log everything
prompt, response, tokens, cost"] LOG --> RESP["Response to user"] SECRET[("Sensitive data
PII, secrets, IP")] -. never send .-> LLM SECRET -->|use| LOCAL["Self-hosted model
Ollama / vLLM"] LOCAL --> GUARD2 classDef danger fill:#fef2f2,stroke:#dc2626,color:#7f1d1d classDef guard fill:#fffbeb,stroke:#fbbf24 classDef safe fill:#ecfdf5,stroke:#34d399 class USER,SECRET danger class GUARD1,GUARD2,LOG guard class RESP,LOCAL safe
Click to expand
  • Hallucinations: assume they happen. Cite sources, validate, add a confidence step.
  • Prompt injection / jailbreaks: never let user input override system instructions for sensitive actions.
  • Data leakage: don't send PII, secrets, or proprietary data to third-party APIs without consent. Self-host when in doubt.
  • Bias & fairness: test across demographics; "default" outputs aren't neutral.
  • Security baseline: know the OWASP Top 10 for LLM Apps and design around prompt injection, data exfiltration, excessive agency, and insecure plugins/tools.
  • Evals & guardrails: Ragas, DeepEval, Guardrails AI, NeMo Guardrails.
Minimal toolkit

One row, one decision.

There are 200 ways to do every step. Here's the smallest set that gets you to a deployed app — picked because they're free, popular, and won't trap you in a niche later.

AreaPick thisWhy
LanguagePython 3.10+Every GenAI library lives here.
DL frameworkPyTorchWhat modern papers and Hugging Face use.
LLM APIGemini, OpenAI, AnthropicLearn one deeply, compare two others for tradeoffs.
Backend APIFastAPISimple, typed, production-shaped Python web services.
LLM library (open)HF TransformersStandard for any open model.
Local model runnerOllamaOne command to run Llama, Qwen, Gemma.
Model gatewayLiteLLM or thin adapterSwitch providers without rewriting app logic.
OrchestrationLangChain or LlamaIndexPick one and commit.
Vector DB (start)FAISS or ChromaLocal, free, zero ops.
UI prototypeGradio or StreamlitDemo in < 50 lines.
ValidationPydantic v2Stops malformed JSON from breaking prod.
Agent frameworkLangGraphMost controllable; pairs with LangChain.
Tool integrationFunction calling + MCP basicsClean schemas and reusable tool connections.
RAG evalsRagas or DeepEvalMeasure faithfulness instead of trusting vibes.
ObservabilityLangSmith or LangfuseSee every prompt and response.
DeploymentDocker + HF SpacesFree, fast, public URL.
Optional enthusiast path: you do not need a full cloud stack to start. Learn this only after you have one working RAG or agent demo and want to understand how production GenAI systems are deployed in real teams.
AWS production stack

Bedrock-first GenAI system.

Use this lane when the company already trusts AWS and you want managed model access, RAG, agents, guardrails, identity, logging, and deployment without stitching every service from scratch.

AWS logo
Amazon BedrockS3Knowledge BasesAgentsCloudWatchGuardrails
Model access

Amazon Bedrock for hosted foundation models, inference, prompt management, and model choice.

RAG layer

S3 data sources into Bedrock Knowledge Bases with a supported vector store such as OpenSearch Serverless.

Agent layer

Bedrock Agents with action groups, knowledge base access, traces, versions, and aliases.

Serving layer

FastAPI behind Lambda/API Gateway for serverless or ECS Fargate/App Runner for always-on apps.

Safety layer

IAM least privilege, Secrets Manager, Bedrock Guardrails, PII filters, and a kill switch.

Ops layer

CloudWatch logs, token/cost dashboards, prompt versions, eval runs, and rollback notes.

Starter architecture
  • Client sends request to FastAPI.
  • API validates with Pydantic and calls Bedrock.
  • RAG calls Knowledge Bases, retrieves evidence, then generates answer with citations.
Production checklist
  • Secrets Manager, IAM least privilege, CloudWatch dashboards.
  • Retries with backoff, output token caps, request IDs, and rate limits.
  • Guardrails for denied topics, sensitive data, and grounding checks.
Interview story
  • Explain why managed Bedrock reduces operational work.
  • Explain when ECS/App Runner is better than Lambda.
  • Explain how guardrails and evals reduce risk before release.
Google Cloud stack

Vertex AI plus Cloud Run.

Use this lane when you want Gemini through Vertex AI, managed vector retrieval, agent deployment options, serverless containers, and Google-native evaluation/grounding workflows.

Google Cloud logo
Vertex AIGeminiVector SearchAgent EngineCloud RunCloud Logging
Model access

Vertex AI with Gemini, Model Garden, embeddings, prompt experiments, and evaluation workflows.

RAG layer

Cloud Storage or curated docs into Vertex AI Vector Search, with hybrid retrieval when exact terms matter.

Agent layer

Vertex AI Agent Engine or Agent Builder for managed agent runtime, auth, scaling, and tracing.

Serving layer

FastAPI on Cloud Run, images in Artifact Registry, secrets in Secret Manager, and jobs for ingestion.

Safety layer

IAM, service accounts, private networking where needed, grounding, schema validation, and allowlisted tools.

Ops layer

Cloud Logging, Cloud Monitoring, prompt/eval tracking, latency budgets, and per-request cost logs.

Starter architecture
  • FastAPI service on Cloud Run calls Vertex AI Gemini.
  • Ingestion job chunks docs and writes vectors to Vector Search.
  • App retrieves evidence, grounds the answer, and logs tokens and latency.
Production checklist
  • Secret Manager, Artifact Registry, Cloud Logging, and Cloud Monitoring.
  • Service account per workload, input validation, and clear timeout policies.
  • Saved eval set before every prompt or retrieval change.
Interview story
  • Explain why Cloud Run is a clean first production container target.
  • Explain vector search vs keyword search vs hybrid retrieval.
  • Explain how grounding and evals catch silent RAG failures.
Azure production stack

Foundry plus enterprise search.

Use this lane when the company runs on Microsoft identity, Azure OpenAI or Microsoft Foundry, governance-heavy workflows, and enterprise RAG over private documents.

Azure logo
Microsoft FoundryAzure OpenAIAI SearchAgent ServiceKey VaultApp Insights
Model access

Microsoft Foundry or Azure OpenAI for models, deployments, projects, endpoints, and governance.

RAG layer

Azure AI Search with vector indexes, vectorizers, chunking/indexer pipelines, semantic ranker, and hybrid search.

Agent layer

Microsoft Foundry Agent Service, Agent Framework, or a custom FastAPI agent with typed tools.

Serving layer

FastAPI on Azure Container Apps or App Service, with background jobs for ingestion and eval runs.

Safety layer

Key Vault, RBAC, private endpoints, Azure AI Content Safety, and policy controls around tools and data.

Ops layer

Application Insights, Azure Monitor, eval dashboards, cost budgets, and deployment slots or rollback notes.

Starter architecture
  • FastAPI app validates requests and calls Foundry or Azure OpenAI.
  • Documents are indexed into Azure AI Search with vectors and metadata.
  • RAG retrieves evidence, applies policy checks, and returns cited answers.
Production checklist
  • Key Vault, managed identity, private endpoints, and App Insights traces.
  • Content safety checks, schema validation, and PII handling.
  • Hybrid search, citation validation, and saved RAG evals.
Interview story
  • Explain why AI Search is a strong enterprise RAG default.
  • Explain Foundry Agent Service vs a custom API agent.
  • Explain how RBAC, Key Vault, and monitoring fit production readiness.
What this actually costs

Keep the first pass cheap.

For a learner doing the roadmap end-to-end, keep the first pass cheap: free tiers, local models, tiny datasets, and small eval sets before any paid scale-up.

ProviderTierWhat you can do
Google AI Studio (Gemini)Free / low-costGood for learner experiments. Limits change, so check the current console before large runs.
OpenAIOptional paid creditUse small/fast models for practice calls, extraction, routing, and structured output demos.
Anthropic ClaudeOptional paid creditUseful comparison provider for writing, analysis, and safety-focused workflows.
Hugging FaceFreeDatasets, model downloads, Spaces hosting (CPU).
Ollama (local)FreeRuns small open models on your laptop; performance depends on RAM, GPU, and model size.
Beginner play: start with free-tier hosted models + Ollama for local experiments. Spend money only when you need reliability, higher limits, or a production demo.
Build these, in this order

Six beginner projects worth showing.

A strong fresher portfolio is a few small things you can explain clearly, not one oversized project you abandoned. Personal, finished, and measurable beats flashy.

Project 01 · Stage 07

Smart Summarizer

Paste an article, lecture note, or long email and return a 5-bullet summary, key terms, risks, and one suggested follow-up question.

PromptingStructured outputStreamingCase-study first
Build scope
  • Input textarea or file upload.
  • Output JSON with summary, keywords, confidence, and limitations.
  • Log model, latency, token count, and prompt version.
Interview questions
  • How did you reduce hallucinations in summaries?
  • Why structured output instead of free text?
  • How would you evaluate summary quality?
Project 02 · Stage 07

AI Support Ticket Triage

Paste a customer issue and return structured routing: category, urgency, sentiment, missing details, and a suggested first reply.

PydanticJSON schemaPrompt evalsHuman review
Build scope
  • Accept pasted tickets or CSV rows and normalize messy text.
  • Return category, priority, confidence, missing fields, and next action.
  • Add 20 synthetic tickets for routing and tone evals.
Interview questions
  • How do you validate model output?
  • How do you handle low-confidence classifications?
  • What makes a triage result measurable?
Project 03 · Stage 07

Code Explainer Bot

Paste a function and get a beginner explanation, complexity estimate, edge cases, and one refactor suggestion.

Code promptsSafetyTestsExamples
Build scope
  • Support Python first. Add language detection later.
  • Return explanation, complexity, pitfalls, and test ideas.
  • Show prompt examples in the project notes.
Interview questions
  • When should you not use an LLM for code?
  • How do you detect wrong explanations?
  • How would you add unit-test generation safely?
Project 04 · Stage 08

Chat With Your PDF

Build a RAG app over one textbook, policy document, or handbook. Answers must cite source chunks and say "I don't know" when evidence is missing.

RAGEmbeddingsCitationsEval sheet
Build scope
  • Chunk document, embed, retrieve top-k, and cite sources.
  • Show retrieved chunks before the final answer.
  • Create 25 questions with expected evidence.
Interview questions
  • How did you choose chunk size?
  • What is context recall?
  • How do you reduce wrong citations?
Project 05 · Stage 10

Mini Research Agent

Give it one research question. It should call two or three safe tools, show its trace, and produce a short sourced report with a budget limit.

Tool callingTracesBudgetsStop rules
Build scope
  • Use typed tool schemas and validate arguments.
  • Cap iterations, tokens, and total tool calls.
  • Store trace logs for every run.
Interview questions
  • What makes an agent different from a chatbot?
  • How do you stop loops?
  • How do you test tools independently?
Project 06 · Capstone

Personal Capstone

Build something you would genuinely use: a study buddy for your notes, a personal finance explainer, a journal reflector, or a niche workflow from your college or internship.

Personal nichePublic demoEval reportDemo video
Build scope
  • One clear user, one painful workflow, one measurable outcome.
  • Public URL, project brief, screenshots, architecture, and limitations.
  • Include cost, latency, failure cases, and what you would improve.
Interview questions
  • Why did you choose prompt, RAG, agent, or fine-tuning?
  • What failed in v1?
  • How would you scale this for real users?

Minimum quality bar

Public demo, screenshots, setup commands, architecture diagram, prompt examples, and known limitations.

Interview signal

Show eval results, cost/latency logs, a failure case you fixed, and why you chose prompt/RAG/agent/fine-tune.

Viral hook

Build for a real niche: "AI study buddy for my notes" beats generic chatbot because people instantly understand the value.

12-week sprint

One quarter. One portfolio.

Doable part-time at ~10 hours/week. Doable full-time in 6–8 weeks if you can give it your day job's attention.

6 weeks · full-time

Fast track

25-35 hours/week. Best if you already know Python and can build daily.

  1. Week 1: Python/API refresh + ML/DL basics.
  2. Week 2: Transformers, hosted APIs, structured output.
  3. Weeks 3-4: RAG, vector DB, evals, deployment.
  4. Weeks 5-6: agent, capstone, interview drill.
12 weeks · part-time

Recommended

8-12 hours/week. Best for freshers, students, and working professionals.

  1. Mon-Wed: learn + notes.
  2. Thu-Fri: code the mini-build.
  3. Weekend: polish docs, record demo, add evals.
  4. Every second Sunday: mock interview + refactor.
24 weeks · slow burn

College track

4-6 hours/week. Best when exams, classes, or a job compete for time.

  1. One stage every 1-2 weeks.
  2. One public progress post per month.
  3. One polished project every 6-8 weeks.
  4. Final month: DSA refresh + GenAI system design.
Recommended 12-week path

A project-first sprint that builds proof every two weeks.

Use this as a pacing guide, not a deadline. If a project needs another week, take it and keep the project notes honest.

Weeks 1-2

Python, APIs, and setup

Get comfortable with scripts, files, HTTP, JSON, env vars, Git, and simple tests.

Public starter project
Weeks 3-4

ML and deep learning basics

Learn enough metrics, splits, tensors, and training loops to explain what models are doing.

Tiny classifier + metrics
Weeks 5-6

Transformers and LLM APIs

Understand tokens, context, latency, cost, structured output, and provider tradeoffs.

LLM tool with logs
Weeks 7-8

Prompting and RAG

Build retrieval over real documents, cite sources, and create a small eval sheet.

Chat-with-PDF demo
Weeks 9-10

Frameworks and agents

Add tool schemas, traces, stop rules, memory choices, and budget limits without overbuilding.

Safe mini-agent
Weeks 11-12

Capstone and deployment

Ship one useful app with a project brief, screenshots, demo video, eval report, and interview story.

Portfolio-ready app
WeeksFocusDeliverableInterview checkpoint
1-2Python + API basicsProject folder with scripts, tests, screenshots, and setup notesExplain HTTP, JSON, env vars, errors, and version-control flow.
3-4ML + Deep LearningTiny image/text classifier with metricsExplain overfitting, splits, metrics, and baseline models.
5-6Transformers + LLM APIsHosted LLM tool with streaming + logsExplain tokens, temperature, context, latency, and cost.
7-8Prompt engineering + RAGChat-with-PDF deployed publiclyExplain chunking, embeddings, retrieval, citations, and RAG evals.
9-10Agents + frameworksAgent with 2-3 tools and visible traceExplain tool schemas, loops, memory, budgets, and failures.
11Capstone buildOne project you'd genuinely usePrepare the architecture story and tradeoff decisions.
12Deployment + polishProject brief, demo video, eval report, blog postRun mock system design and portfolio deep-dive rounds.
The seven traps

Common beginner mistakes.

If you hit one, you're not behind — you're average. Reading them up front saves you the wasted weeks.

  1. Tutorial loop. You finish 12 courses and have no working project to show. Build first; learn while building.
  2. Jumping to fine-tuning. Many beginner "fine-tuning" ideas are better solved with a clearer prompt, structured output, or RAG.
  3. Skipping evaluation. "It looked good in 3 examples" isn't a result. Build a 20–50 example eval set early and re-run it after every change.
  4. Ignoring tokens. Tokens are time, money, and quality. Always log them per request.
  5. Trusting LLM output blindly. Validate JSON. Verify facts. Treat it like a junior engineer who's confident but new.
  6. Over-frameworking. LangChain/LangGraph are great when you need them. For a 30-line script, plain requests + the SDK is fine.
  7. Building in secret. Share your progress, publish small demos, and write up what you broke. The portfolio is the credential.
Interview prep

What a hiring manager will actually ask.

The questions that come up again and again: fundamentals, RAG, agents, production, security, and the portfolio deep dive. If you can answer these with tradeoffs, you're ahead of most applicants.

Round 01Code

Python + API task

Build a small endpoint or script that calls an LLM, validates JSON, handles errors, and logs latency/tokens.

Round 02System

Design a RAG app

Walk through ingestion, chunking, embeddings, vector DB, retrieval, prompt assembly, evals, and monitoring.

Round 03Debug

Fix bad output

Given hallucinations, slow latency, high cost, or wrong citations, explain what you would measure and change first.

QuestionAnswer in two sentences
What's the difference between fine-tuning and RAG? RAG retrieves your docs at query time and injects them into the prompt — facts. Fine-tuning updates weights to teach behavior — style, tone, format. Try RAG first; fine-tune only when prompting + RAG genuinely can't get there.
How would you reduce hallucinations in production? Ground the model with RAG and require citations, force structured output via JSON schema + Pydantic, lower temperature for factual tasks, and re-run a held-out eval set on every release. If the cost of a wrong answer is high, add a verification step or a human in the loop.
What is a token, and why does it matter? A unit of text the model reads — about 4 characters in English, ¾ of a word. Every token affects cost and latency, so logging tokens per request is one of the best habits for improving cost, speed, and quality.
When would you NOT use an LLM? When determinism matters (a regex or parser is better), when the input is structured data (use SQL), when latency budget is tight, or when failure is unacceptable without human review (medical, legal, financial decisions).
How do you evaluate an LLM application? Build a 20–100 example eval set with input + expected output before you ship. Score with exact match, embedding similarity, LLM-as-judge, or human review depending on the task — and re-run on every change. If you can't measure it, you can't improve it.
What is prompt injection and how do you defend against it? A user smuggling instructions into the prompt to override system rules ("ignore previous instructions and..."). Defenses: treat user input as data not instructions, use tool/function calling instead of free-text actions for sensitive operations, and require out-of-band confirmation for destructive actions.
Why does temperature matter, and what value should I use? It controls sampling randomness. Use lower values for extraction, classification, and structured output; use higher values for writing or brainstorming. Always test on your own examples.
Hosted (GPT/Claude/Gemini) or open-source (Llama/Mistral)? Hosted APIs are usually easiest to start with and charge per token. Open-weight models can help with privacy, control, and scale, but you operate more infrastructure. Start hosted for learning, then compare open models when cost or compliance matters.
How would you design a RAG chatbot for company policies? Ingest docs with metadata, chunk by section, embed, store in a vector DB, retrieve top-k, re-rank, assemble a grounded prompt, and return citations. Then evaluate with known policy questions, missing-answer cases, and citation checks.
What makes an agent different from a normal chatbot? A chatbot usually answers from conversation context; an agent can choose tools, observe results, update state, and continue toward a goal. Production agents need typed tools, iteration caps, logs, and budget controls.
How do you keep structured output reliable? Use the provider's structured-output mode or function calling, validate with Pydantic/JSON Schema, and retry with the validation error when parsing fails. Keep temperature low and test with malformed, empty, and adversarial inputs.
How would you reduce LLM app latency and cost? Use a smaller model where possible, trim context, cache repeated retrievals/responses, stream output, batch offline work, and stop generating early. Log tokens, latency, and model choice so optimization is based on data.
What should be in a GenAI portfolio case study? Problem, demo link, architecture diagram, setup, example prompts, screenshots, eval results, cost/latency notes, limitations, and future improvements. It should prove you can think like an engineer, not only follow a tutorial.
What interviewers want to see: can you explain why the app failed and what you checked first? Keep prompt, response, retrieval, tool calls, latency, and token usage visible. Add a small eval set you can rerun after changes. That is enough to show practical engineering judgment.
After 12 weeks

What you'll actually be able to do.

By the end, you should have shipped, debugged, and deployed small GenAI apps. These are the practical capabilities to aim for.

  1. Pick the right approach for a problem. Given a vague brief, decide whether it's a prompt problem, a RAG problem, an agent problem, or a fine-tuning problem — and defend the choice with cost and effort numbers.
  2. Call LLM APIs with confidence. Hosted providers and local models. Stream responses, handle rate limits, manage env vars, and log tokens.
  3. Build a production-shaped RAG pipeline. Ingest, chunk, embed, store, retrieve, re-rank, generate, and cite sources with clear limitations.
  4. Get reliable structured output. JSON schemas, Pydantic validation, and retries on parse failure so a demo can become a backend feature.
  5. Build a small tool-using workflow with guardrails. Iteration caps, cost logging, tool tests, visible traces, and clear stop conditions.
  6. Deploy a public URL. FastAPI + Docker + a host (HF Spaces / Render / Railway). Logged, rate-limited, monitored.
  7. Evaluate an LLM app rigorously. Build a 50-example eval set, score with multiple methods, track improvements across versions. The thing most beginners skip and most hiring managers ask about.
  8. Know what you don't know. Read a paper abstract and tell whether it's relevant to your work. Read a model card and spot a red flag. Read a vendor pitch and ask the right cost questions.
Free, ranked

Trusted resources to open when needed.

Use this as a ranked shortlist: watch one video for intuition, read one official doc for implementation, then build. Do not turn learning into 40 open tabs.

YouTubeStages 03-05
Karpathy — Neural Networks: Zero to Hero
A clear deep-learning series that builds neural networks step by step, including GPT-style models.
CourseStages 05-06
Hugging Face NLP + LLM Course
Practical, free, and useful for becoming productive with the open-source stack.
CourseLLM apps
DeepLearning.AI short courses
One-hour dives on Prompting, RAG, LangChain, Agents. High signal-to-noise.
CourseStage 04
fast.ai — Practical Deep Learning
Top-down. You ship a model in lecture one and only later learn what you built.
BuildStage 12
Full Stack Deep Learning
Production-grade ML systems — what tutorials skip and your team will ask about in week one.
DocsStage 07
Anthropic prompt engineering
Official Claude prompting guidance for examples, formatting, structure, and reducing vague model behavior.
DocsStage 06
OpenAI API quickstart
Use this when you are ready for your first real model call, streaming output, API keys, and a tiny working app.
VideoStage 02
3Blue1Brown neural networks
The best visual intuition for weights, activations, gradients, and backprop before you touch heavy formulas.
CourseStages 02-03
Google Machine Learning Crash Course
A friendly foundation for labels, features, loss, train/test splits, metrics, overfitting, and embeddings.
BuildStage 08
LangChain RAG tutorial
Use this when building the first Chat with PDF project: load, split, embed, retrieve, and answer from context.
BuildStage 08
LlamaIndex starter example
A clean alternative when the project is mostly document ingestion, indexing, retrieval, and query workflows.
CourseStage 10
Hugging Face Agents Course
A beginner-friendly route to tools, actions, planning, and why agents need limits instead of hype.
GuideStage 10
Anthropic building effective agents
Read this before overbuilding agents. It explains when simple workflows are better than autonomous loops.
DocsStage 12
OpenAI evaluation best practices
The resource most beginners skip: how to judge whether your LLM app improved after a prompt, model, or RAG change.
Foundation videos for beginners
LLM app builder path
RAG and document AI path
Agents, evals, and production path
Glossary

Speak the language.

The words you'll hear every day. Memorize the definitions and you'll keep up in any meeting or interview.

TermOne-line meaning
TokenThe chunk of text the model actually sees. ~4 characters or ¾ of a word.
Context windowHow many tokens the model can read at once (input + output combined).
EmbeddingA vector that represents meaning. Two similar sentences, two close vectors.
Vector DBA database that stores embeddings and finds nearest neighbors fast.
TemperatureRandomness knob. 0 = same answer every time, 1 = creative.
System promptPersistent instructions the model follows for the whole conversation.
HallucinationConfident-sounding but wrong output. Mitigate with RAG + citations.
RAG"Look it up in my docs, then answer using what you found."
Hybrid searchCombining keyword search and vector search so exact terms and semantic meaning both work.
Re-rankingA second model reorders retrieved chunks so the best evidence reaches the LLM first.
Fine-tuningUpdating the model's weights on your data — usually a small LoRA adapter.
AgentAn LLM that can choose to call tools in a loop until a goal is met.
Tool / function callingThe LLM emits a JSON call; your code runs it; the result goes back.
MCPModel Context Protocol: a standard way for AI apps to connect to external tools and data sources.
Prompt injectionA user smuggling instructions into your prompt to override system rules.
Eval setA saved set of inputs and expected behavior you rerun after prompt, model, or code changes.
LatencyHow long the user waits. LLM latency comes from retrieval, model time, output length, and network.
Rate limitA provider cap on requests or tokens per minute. Handle it with backoff and queues.
InferenceRunning the model to get a result (vs. training, which builds the model).

Start with one tiny build.

Open the notebook, run the first cell, and make one small LLM call work. Then add structure, data, evals, and deployment one layer at a time.

Stay close to the field

Keep learning with fresh notes.

GenAI tooling changes quickly. Follow along for practical notes, interview patterns, project breakdowns, and updates when the roadmap changes.

  1. Want weekly direction? Subscribe to the LinkedIn newsletter for concise GenAI notes and fresher-friendly interview prep.
  2. Built something from the roadmap? Share it on LinkedIn and tag Purnendu Das so it can reach more learners.
  3. Found something outdated? Send a LinkedIn message with the source and a short explanation. The rule is: cite numbers or working code, not vibes.