[KubeRCA] Official Wiki
This page is the consolidated documentation that defines the vision, technical direction, and collaboration practices for KubeRCA. It is the official guideline that helps the team and external contributors collaborate smoothly.
1. Project Overview
- Purpose: Use an AI agent to analyze incident alarms raised in Kubernetes environments, interpret the state of the applications and nodes in the live cluster as a whole, and deliver the analysis in a standardized RCA template.
- Background / Introduction:
  - Rather than pursuing fully automated incident remediation, the project uses an AI agent to analyze the many alerts raised during incident response, so that engineers can identify root causes faster and more clearly and plan post-incident recurrence prevention systematically.
  - When an incident occurs, the system automatically collects observability data such as Prometheus alerts, logs, and metrics. Using an LLM and a vector database, it compares the current incident against the Top 3 most similar historical incidents and provides a fast, consistent response guide suited to the current situation.
  - The aim is to reduce reliance on individual experience and to create an operational environment in which junior engineers in particular can make sound, well-informed decisions.
- Core Values: Open Source, Human-in-the-Loop by Design, AI-Forward Architecture
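The Background above describes comparing the current incident against the Top 3 most similar historical incidents. The sketch below illustrates one common way to do that kind of lookup: rank past incidents by cosine similarity between embedding vectors and keep the best three. It is only an illustration under stated assumptions. The `PastIncident` schema, the function names, and the toy two-dimensional vectors are hypothetical, the embeddings are assumed to be produced elsewhere, and a real deployment would more likely delegate the search to a vector database than scan a list in memory.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PastIncident:
    """A historical incident with a precomputed embedding (hypothetical schema)."""
    incident_id: str
    summary: str
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_similar_incidents(
    query: List[float], history: List[PastIncident], k: int = 3
) -> List[Tuple[float, PastIncident]]:
    """Return up to k (score, incident) pairs, most similar first."""
    scored = [(cosine_similarity(query, inc.embedding), inc) for inc in history]
    # Sort on the score only, so incidents themselves never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy data: real embeddings would have hundreds of dimensions.
    history = [
        PastIncident("INC-1", "Pod OOMKilled after deploy", [0.9, 0.1]),
        PastIncident("INC-2", "Node disk pressure", [0.1, 0.9]),
        PastIncident("INC-3", "Memory leak in API service", [0.8, 0.3]),
        PastIncident("INC-4", "Certificate expired", [-0.5, 0.2]),
    ]
    current_alert_embedding = [1.0, 0.2]
    for score, incident in top_similar_incidents(current_alert_embedding, history):
        print(f"{incident.incident_id}  {score:.3f}  {incident.summary}")
```

The ranked incidents, together with the collected alerts, logs, and metrics, would then form the context handed to the LLM when it drafts the response guide.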
2. The Team
Roles and responsibilities of the team members.
| Name | ID | Role | SNS | Responsibilities |
|---|---|---|---|---|
| 김태지 | @Taeji_Kim | Team Leader | Link | Roadmap and final decision-making |
| ๊นํ์ | @user116 | DevOps | Link | Infrastructure and CI/CD management |
| 황우빈 | @Binoo | BE/FE | Link | Core logic and API implementation |
| ์ต๋ณดํ | @brilly | BE/FE | Link | Core logic and API implementation |
3. Tech Stack
- Infra: Kubernetes, Terraform, AWS
- Languages and frameworks: Go, Python, React
- Database: PostgreSQL
- Chaos Engineering: k6, Istio, Chaos Mesh
- Observability: Loki, Grafana, Tempo, Mimir, Kiali
- Communication: Discord, Slack
4. Roadmap
- Phase 1: MVP requirements definition
- Phase 2: Core module development and alpha testing
- Phase 3: Global community launch
5. How to Contribute
- Issues: Please use GitHub Issues for bug reports and feature requests.
- PRs: All pull requests are merged after review by the Tech Lead.
- Guide: Please refer to the [CONTRIBUTING.md] file.
- Discord (Official): [Invite Link]
  - Official channel for real-time communication and technical support.
6. Resources & Links
- GitHub Repository: [Link]
- Docs: [Architecture / API Specs]