🚀 AI Infrastructure Engineer: The Engineering Backbone of the LLM Era, and the Skills to Learn at Each Stage

AI infrastructure engineers design and operate the backbone that keeps machine learning and LLM workloads running reliably across GPUs, clusters, and the cloud. They do not train the models themselves.

💡 In practice, AI infrastructure engineers do the following:

📋 Manage GPU clusters, resource scheduling, and the scaling of training and inference
📋 Build data ingestion and feature pipelines for ML workloads
📋 Optimize deployments with tools such as Triton, vLLM, and Ray Serve
📋 Automate observability and failure recovery using an AIOps stack
📋 Support hybrid and multi-cloud workflows for model portability

Level 1: AI Fundamentals

→ Programming: Python, Bash, and a systems language (Go or Rust)
→ Operating systems/networking: TCP/IP, DNS, ports, SSH, security groups
→ Cloud fundamentals: AWS, GCP, or Azure (VMs, storage, IAM, cost management)
→ DevOps fundamentals: version control (Git), CI/CD concepts, Docker

Level 2: Data & Machine Learning Fundamentals

→ Data modeling & databases: SQL, NoSQL, distributed file storage
→ ML & DL fundamentals: core ML concepts, scikit-learn, TensorFlow, PyTorch
→ Experiment management: Jupyter notebooks, metrics, reproducibility
→ Statistics & metrics: basic statistics, precision/recall, ROC, data profiling
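
The precision/recall item above can be made concrete in a few lines of plain Python. This is a minimal sketch, assuming binary labels where 1 is the positive class; the function name `precision_recall` and the sample labels are illustrative and not part of the roadmap.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0]
    p, r = precision_recall(y_true, y_pred)
    print(f"precision={p:.3f} recall={r:.3f}")
```

In day-to-day work, scikit-learn's `precision_score` and `recall_score` cover the same ground; writing it out once by hand mainly helps when reading model-quality dashboards later.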

Level 3: Core AI Infrastructure & Engineering

→ Containers & orchestration: Docker, Kubernetes, Helm
→ Storage & data workflows: object storage (S3/GCS), ETL pipelines
→ Distributed training & serving: multi-GPU systems, NCCL, CUDA, Triton
→ Workflows & monitoring: MLflow, Kubeflow, Airflow, Prometheus, Grafana

Level 4: Advanced AI Infrastructure & DevOps

→ Security & compliance: secrets management, Policy as Code, audit logs
→ AI networking: Istio/Linkerd, API gateways, load balancing
→ Cloud-native AI platforms: Vertex AI, SageMaker, Databricks
→ Infrastructure as Code: Terraform, CloudFormation, Ansible

Level 5: Hands-On Practice & Projects

  1. Multi-GPU Training Setup
    Simulate distributed training on a public dataset with PyTorch DDP + Kubernetes + Prometheus
  2. RAG Deployment Demo
    Build a minimal RAG pipeline with LangChain + FastAPI + Triton inference → containerize it and deploy to Render or Hugging Face Spaces
  3. AI Infra Observability
    Build a Grafana dashboard that monitors GPU utilization, latency, and request throughput
  4. Cost-Aware Scaling
    Implement GPU autoscaling with KEDA or an autoscaler → present cost/performance graphs
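
For the Cost-Aware Scaling project, it can help to prototype the scaling decision offline before wiring up KEDA or an autoscaler. The sketch below follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric). The function names, the replica bounds, the 10% tolerance, and the $2.50/hour GPU price are assumptions chosen for illustration, not values from the roadmap.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=8, tolerance=0.1):
    """HPA-style replica count: ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close to the target,
    # which avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def hourly_cost(replicas, gpu_hourly_usd=2.50):
    """Hourly cost, assuming one GPU per replica at a hypothetical price."""
    return replicas * gpu_hourly_usd


if __name__ == "__main__":
    # Sweep average GPU utilization (%) against a 60% target, starting from 4 replicas.
    for utilization in (30, 60, 90, 240):
        n = desired_replicas(4, utilization, 60)
        print(f"util={utilization:>3}% -> replicas={n}, cost=${hourly_cost(n):.2f}/h")
```

Rows like these, collected across a load test, are one way to produce the cost/performance graph the project asks for.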

Level 6: Career Growth & Community

→ Open-source contributions: contribute to ML infra repositories, report issues, develop features
→ Networking: attend KubeCon and PlatformCon, join online ML/infra communities

[Source] AI Infra Engineer Learning Roadmap
