๐Ÿ“ [Wiki] ์ž ๋“œ๊ฐ€์žโ†’ AI-Driven Kubernetes RCA and Automated Remediation

[잠드가자] Official Wiki

KR: ์ด ํŽ˜์ด์ง€๋Š” Dr.Kube์˜ ๋น„์ „, ๊ธฐ์ˆ ์  ๋ฐฉํ–ฅ์„ฑ, ๊ทธ๋ฆฌ๊ณ  ํ˜‘์—… ๋ฐฉ์‹์„ ์ •์˜ํ•˜๋Š” ํ†ตํ•ฉ ๋ฌธ์„œ์ž…๋‹ˆ๋‹ค. ํŒ€์›๊ณผ ์™ธ๋ถ€ ๊ธฐ์—ฌ์ž๋“ค์ด ์กฐํ™”๋กญ๊ฒŒ ํ˜‘์—…ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•๋Š” ๊ณต์‹ ๊ฐ€์ด๋“œ๋ผ์ธ์ž…๋‹ˆ๋‹ค.

EN: This page serves as the comprehensive documentation defining the vision, technical direction, and collaboration methods for ์ž ๋“œ๊ฐ€์ž. It is an official guideline to ensure seamless collaboration between the team and external contributors.

1. Project Overview

  • Purpose: An AI-powered tool for Kubernetes incident alert analysis, Root Cause Analysis (RCA), and automated remediation guidance.

  • Background / Introduction:

    • Rather than pursuing full automation, we focus on the accuracy of the diagnostic phase that comes before remediation. When an incident occurs, the system automatically collects data from Prometheus, logs, and metrics. Using LLMs and a vector database, it retrieves and compares the Top 5 most similar past incidents to provide a rapid, data-backed response guide. Our primary goal is to standardize operational quality and reduce reliance on individual experience, especially for junior engineers.
  • Core Values:

    • e.g., Knowledge Sovereignty, Open-Source Spirit, Technological Innovation
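The retrieval step described in the Background (compare a new incident against past cases and surface the Top 5) can be sketched as follows. This is a minimal illustration under stated assumptions, not Dr.Kube's actual implementation: a toy bag-of-words vector stands in for an LLM embedding model, an in-memory dict stands in for the vector database, and the alert is assumed to follow the Prometheus Alertmanager webhook shape (`labels` and `annotations` maps). All function names, incident IDs, and summaries are hypothetical.

```python
# Hypothetical sketch of "Top 5 similar past incidents" retrieval.
# Assumptions: Counter-based bag-of-words replaces a real embedding model,
# and a plain dict replaces the vector DB. Names are illustrative only.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alert_to_text(alert: dict) -> str:
    """Flatten an Alertmanager-style alert's labels and annotations into one query string."""
    labels = " ".join(f"{k} {v}" for k, v in alert.get("labels", {}).items())
    annotations = " ".join(alert.get("annotations", {}).values())
    return f"{labels} {annotations}"


def top_k_similar(alert: dict, incidents: dict, k: int = 5) -> list:
    """Score every past incident against the alert and return the top k (id, score) pairs."""
    query = embed(alert_to_text(alert))
    scored = [(incident_id, cosine(query, embed(summary)))
              for incident_id, summary in incidents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical incident history; IDs and summaries are illustrative only.
    past_incidents = {
        "INC-001": "Pod crash looping in namespace payments after bad config",
        "INC-002": "Node disk pressure on worker node",
        "INC-003": "Ingress returning 502 errors from nginx",
        "INC-004": "PVC stuck pending in namespace storage",
        "INC-005": "High API server latency",
        "INC-006": "Expired certificate for admission webhook",
    }
    firing_alert = {
        "labels": {"alertname": "KubePodCrashLooping", "namespace": "payments"},
        "annotations": {"summary": "Pod is crash looping"},
    }
    for incident_id, score in top_k_similar(firing_alert, past_incidents):
        print(f"{incident_id}: {score:.3f}")
```

In a real deployment, `embed()` would likely call an embedding model and `top_k_similar()` would become a nearest-neighbour query against the vector DB; the ranking logic (score each candidate, sort descending, keep k = 5) stays the same.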

2. The Team

Roles and responsibilities of each team member.

| Name | ID | Role | SNS | Responsibilities |
| Member A | @id_a | Team Leader | Link | Roadmap & final decision-making |
| Member B | @id_b | Tech Lead | Link | Architecture & code reviews |
| Member C | @id_c | Core Dev | Link | Core logic & API implementation |
| Member D | @id_d | DevOps | Link | Infrastructure & CI/CD management |
| Member E | @id_e | Writer | Link | Documentation & community management |

3. Tech Stack

  • Language: (e.g., TypeScript, Go, Python)
  • Infra: (e.g., Kubernetes, Docker, Terraform)
  • Communication: Discord, GitHub Issues

4. Roadmap

  • Phase 1: MVP Requirements Definition
  • Phase 2: Core Module Development & Alpha Testing
  • Phase 3: Global Community Launch

5. How to Contribute

  • Issues: Please use GitHub Issues for bug reports and feature requests.
  • PRs: All pull requests are merged only after review by the Tech Lead.
  • Guide: Please refer to the [CONTRIBUTING.md] file.
  • Discord (Official): [잠드가자 Invite Link]
    • Official channel for real-time communication and technical support.

6. Resources & Links

  • GitHub Repository: [Link]
  • Docs: [Architecture / API Specs]

| This is a space where knowledge is not merely consumed but respected, sovereign, and connected, shared together with cloud industry professionals (Bros). |