📈 RAG Performance Evaluation Metrics: Rank-Unaware Metrics - Precision, Recall, F1-Score

Evaluating the retrieval step is a core part of assessing a RAG system. The most basic metrics ignore the order of the retrieved results and simply ask "how well did the system find the relevant items?": Precision, Recall, and their harmonic mean, the F1-Score, are widely used for this. This note looks at each metric's definition, meaning, and calculation, along with its strengths and weaknesses. 🤔

1. Definition - Explanation - Example

Precision 🎯

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}}$$

(Here TP: True Positive, a relevant item that was predicted as relevant, i.e. retrieved; FP: False Positive, an irrelevant item that was predicted as relevant.)
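To make the definition concrete, here is a minimal Python sketch that computes Precision for a single query. The function name `precision`, the document IDs, and the choice to return 0.0 when nothing is retrieved are illustrative assumptions, not the API of any particular RAG library.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved items that are relevant: TP / (TP + FP)."""
    retrieved_set = set(retrieved)  # order is ignored; duplicates collapse
    if not retrieved_set:
        return 0.0  # assumed convention: an empty result list scores 0
    true_positives = len(retrieved_set & relevant)
    return true_positives / len(retrieved_set)


# Hypothetical example: 4 chunks retrieved, 2 of them relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d7"}
print(precision(retrieved_docs, relevant_docs))  # 0.5
```

Here TP = 2 (`d1`, `d3`) and FP = 2 (`d2`, `d4`), so Precision is 2 / 4.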

Recall 🔍

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{\text{number of relevant items retrieved}}{\text{total number of relevant items}}$$

(Here FN: False Negative, a relevant item that was predicted as irrelevant and therefore missed.)
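The Recall formula can be sketched the same way. As before, the function name `recall`, the sample IDs, and the 0.0 fallback for a query with no relevant items are assumptions made only for illustration.

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant items that were retrieved: TP / (TP + FN)."""
    if not relevant:
        return 0.0  # assumed convention: undefined case is reported as 0
    true_positives = len(set(retrieved) & relevant)
    return true_positives / len(relevant)


# Hypothetical example: 3 relevant chunks exist, 2 were retrieved.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d7"}
print(round(recall(retrieved_docs, relevant_docs), 3))  # 0.667
```

Here TP = 2 and FN = 1 (`d7` was never retrieved), so Recall is 2 / 3.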

F1-Score ⚖️

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$
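The two forms of the F1 formula give the same value, which a short Python sketch can check numerically. The function names and the counts (TP = 2, FP = 2, FN = 1) are hypothetical, and returning 0.0 when the denominator is zero is an assumed convention.

```python
import math


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_from_precision_recall(p: float, r: float) -> float:
    """F1 as the harmonic mean of Precision and Recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


tp, fp, fn = 2, 2, 1          # hypothetical counts for one query
p = tp / (tp + fp)            # Precision = 0.5
r = tp / (tp + fn)            # Recall = 2/3
a = f1_from_counts(tp, fp, fn)
b = f1_from_precision_recall(p, r)
assert math.isclose(a, b)     # both forms agree
print(f"{a:.3f}")             # 0.571
```

Because F1 is a harmonic mean, it stays close to the lower of the two values: with Precision 0.5 and Recall 0.667, F1 is 0.571 rather than the arithmetic mean 0.583.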

2. Compare and Contrast: Precision vs Recall 🤔

| Aspect | Precision | Recall |
|---|---|---|
| Focus | Quality of the retrieved results | Quantity / coverage of the retrieved results |
| Question | "Of the results that came back, how many are correct?" | "Of the items that should have been found, how many were found?" |
| Goal | Minimize False Positives (FP): filter out irrelevant items | Minimize False Negatives (FN): do not miss relevant items |
| When a high value matters | Spam filtering (legitimate mail must not be classified as spam); search advertising (minimize showing ads to users they are not relevant to); recommender systems (minimize recommending items the user would dislike) | Cancer diagnosis (missing an actual patient is critical); legal document search (not a single relevant precedent may be missed); security systems (intrusion attempts must not be missed) |
| Trade-off | Often falls as Recall is pushed up | Often falls as Precision is pushed up |

Precision and Recall are complementary metrics, and which one matters more depends on the purpose and context of the application. If it is important to give the user only a small number of highly accurate results, prioritize Precision (for example, the first page of Google search results). If it is important to find every potentially relevant item without omission, prioritize Recall (for example, patent search or medical diagnosis support). Use the F1-Score when you want to evaluate the two in balance.
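The trade-off is easy to see by changing how many results the retriever returns. The sketch below is a toy illustration under stated assumptions: the retriever output `ranked`, the ground-truth set `relevant`, and the helper `precision_recall_at_cutoff` are all hypothetical, and the cutoff `k` is used only to vary the size of the retrieved set (order inside the cutoff is still ignored).

```python
def precision_recall_at_cutoff(
    ranked: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision and Recall of the set formed by the first k results."""
    top_k = set(ranked[:k])
    tp = len(top_k & relevant)
    p = tp / len(top_k) if top_k else 0.0
    r = tp / len(relevant) if relevant else 0.0
    return p, r


ranked = ["d1", "d5", "d3", "d9", "d2", "d7"]  # hypothetical retriever output
relevant = {"d1", "d3", "d7"}                  # hypothetical ground truth

for k in (1, 3, 6):
    p, r = precision_recall_at_cutoff(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
# k=1: precision=1.00 recall=0.33
# k=3: precision=0.67 recall=0.67
# k=6: precision=0.50 recall=1.00
```

In this example a small `k` favors Precision, while a large `k` favors Recall at the cost of passing more irrelevant context downstream.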

3. Q&A ❓

References:

Related notes:
Information Retrieval Evaluation: Rank-Aware Metrics (MAP, NDCG)
Precision-Recall Curve and Average Precision (AP)
Evaluation Metrics for Machine Learning Classification Models (Accuracy, Confusion Matrix)
ROC Curve and AUC: Visualizing and Evaluating Binary Classification Performance
F-beta Score: Adjusting the Weighting of Precision and Recall
Introduction to Information Retrieval Systems
Evaluation Metrics for Recommender Systems
Comparing the Importance of False Positives and False Negatives
Search Engine Optimization (SEO) and Search Quality
Evaluating Large Language Model (LLM)-Based Retrieval Systems

🏷️: Information Retrieval, Evaluation Metrics, Precision, Recall, F1-Score, Machine Learning, Data Science, Performance Evaluation, Rank-Unaware Metrics