13 KiB
13 KiB
Synonym Expansion System v3.1
Purpose: Comprehensive synonym and natural language expansion library for 98%+ skill activation reliability.
🎯 Problem Solved: Natural Language Gap
Issue: Skills fail to activate because users use natural language variations, synonyms, and conversational phrasing that traditional keyword systems don't cover.
Example Problem:
- User says: "I need to get information from this website"
- Skill keywords: ["extract data", "analyze data"]
- Result: ❌ Skill doesn't activate, Claude ignores it
Enhanced Solution:
- Expanded keywords: ["extract data", "analyze data", "get information", "scrape content", "pull details", "harvest data", "collect metrics"]
- Result: ✅ Skill activates reliably
📚 Synonym Library by Category
1. Data & Information Synonyms
1.1 Core Data Synonyms
{
"data": ["information", "content", "details", "records", "dataset", "metrics", "figures", "statistics", "values", "numbers"],
"information": ["data", "content", "details", "facts", "insights", "knowledge", "records", "metrics"],
"content": ["data", "information", "material", "text", "details", "content", "substance"],
"details": ["data", "information", "specifics", "particulars", "facts", "records", "data points"],
"records": ["data", "information", "entries", "logs", "files", "documents", "records"],
"dataset": ["data", "information", "collection", "records", "files", "database", "records"],
"metrics": ["data", "measurements", "statistics", "figures", "indicators", "numbers", "values"],
"statistics": ["data", "metrics", "figures", "numbers", "measurements", "analytics", "data"]
}
1.2 Technical Data Synonyms
{
"extract": ["scrape", "get", "pull", "retrieve", "collect", "harvest", "obtain", "gather", "acquire", "fetch"],
"scrape": ["extract", "get", "pull", "harvest", "collect", "gather", "acquire", "mine", "pull"],
"retrieve": ["extract", "get", "pull", "fetch", "obtain", "collect", "gather", "acquire", "harvest"],
"collect": ["extract", "gather", "harvest", "acquire", "obtain", "pull", "get", "scrape", "fetch"],
"harvest": ["extract", "collect", "gather", "acquire", "obtain", "pull", "get", "scrape", "mine"]
}
2. Action & Processing Synonyms
2.1 Analysis & Processing Synonyms
{
"analyze": ["process", "handle", "work with", "examine", "study", "evaluate", "review", "assess", "explore", "investigate", "scrutinize"],
"process": ["analyze", "handle", "work with", "manage", "deal with", "work through", "examine", "study"],
"handle": ["process", "manage", "deal with", "work with", "work on", "handle", "address", "process"],
"work with": ["process", "handle", "manage", "deal with", "work on", "process", "handle", "address"],
"examine": ["analyze", "study", "review", "inspect", "check", "look at", "evaluate", "assess"],
"study": ["analyze", "examine", "review", "investigate", "research", "explore", "evaluate", "assess"]
}
2.2 Transformation & Normalization Synonyms
{
"normalize": ["clean", "format", "standardize", "structure", "organize", "regularize", "standardize", "clean", "format"],
"clean": ["normalize", "format", "structure", "organize", "standardize", "regularize", "tidy", "format"],
"format": ["normalize", "clean", "structure", "organize", "standardize", "regularize", "arrange", "organize"],
"structure": ["normalize", "organize", "format", "clean", "standardize", "regularize", "arrange", "organize"],
"organize": ["normalize", "structure", "format", "clean", "standardize", "regularize", "arrange", "structure"]
}
3. Source & Location Synonyms
3.1 Website & Source Synonyms
{
"website": ["site", "webpage", "web site", "online site", "digital platform", "internet site", "url"],
"site": ["website", "webpage", "web site", "online site", "digital platform", "internet page", "url"],
"webpage": ["website", "site", "web page", "online page", "internet page", "digital page"],
"source": ["origin", "location", "place", "point", "spot", "area", "region", "position"],
"api": ["application programming interface", "web service", "service", "endpoint", "interface"],
"database": ["db", "data store", "data repository", "information base", "record system"]
}
4. Workflow & Business Synonyms
4.1 Repetitive Task Synonyms
{
"every day": ["daily", "each day", "per day", "daily routine", "day to day"],
"daily": ["every day", "each day", "per day", "day to day", "daily routine", "regularly"],
"have to": ["need to", "must", "should", "got to", "required to", "obligated to"],
"need to": ["have to", "must", "should", "got to", "required to", "obligated to"],
"regularly": ["every day", "daily", "consistently", "frequently", "often", "routinely"],
"repeatedly": ["regularly", "frequently", "often", "consistently", "day after day"]
}
4.2 Business Process Synonyms
{
"reports": ["analytics", "analysis", "metrics", "statistics", "findings", "results", "outcomes"],
"metrics": ["reports", "analytics", "statistics", "figures", "measurements", "data", "indicators"],
"analytics": ["reports", "metrics", "statistics", "analysis", "insights", "findings", "intelligence"],
"dashboard": ["reports", "analytics", "overview", "summary", "display", "panel", "interface"],
"meetings": ["discussions", "reviews", "presentations", "briefings", "sessions", "gatherings"]
}
🔄 Synonym Expansion Algorithm
Core Expansion Function
def expand_with_synonyms(base_keywords, domain):
"""
Expand keywords with comprehensive synonym coverage
"""
expanded_keywords = set(base_keywords)
# 1. Core synonym expansion
for keyword in base_keywords:
if keyword in SYNONYM_LIBRARY:
expanded_keywords.update(SYNONYM_LIBRARY[keyword])
# 2. Reverse lookup (find synonyms that match)
expanded_keywords.update(find_synonym_matches(base_keywords))
# 3. Domain-specific expansion
if domain in DOMAIN_SYNONYMS:
expanded_keywords.update(DOMAIN_SYNONYMS[domain])
# 4. Combination generation
expanded_keywords.update(generate_combinations(base_keywords))
# 5. Natural language variations
expanded_keywords.update(generate_natural_variations(base_keywords))
return list(expanded_keywords)
Combination Generator
def generate_combinations(keywords):
"""
Generate natural combinations of keywords
"""
combinations = set()
# Action + Data combinations
actions = ["extract", "get", "pull", "scrape", "harvest", "collect"]
data_types = ["data", "information", "content", "records", "metrics"]
sources = ["from website", "from site", "from API", "from database", "from file"]
for action in actions:
for data_type in data_types:
for source in sources:
combinations.add(f"{action} {data_type} {source}")
return combinations
Natural Language Generator
def generate_natural_variations(keywords):
"""
Generate conversational and informal variations
"""
variations = set()
# Question forms
prefixes = ["how to", "what can I", "can you", "help me", "I need to"]
for keyword in keywords:
for prefix in prefixes:
variations.add(f"{prefix} {keyword}")
# Command forms
for keyword in keywords:
variations.add(f"{keyword} from this site")
variations.add(f"{keyword} from the website")
variations.add(f"{keyword} from that source")
return variations
📊 Domain-Specific Synonym Libraries
Finance Domain
{
"stock": ["equity", "share", "security", "ticker", "instrument", "investment"],
"analyze": ["research", "evaluate", "assess", "review", "examine", "study", "investigate"],
"technical": ["chart", "graph", "indicator", "signal", "pattern", "trend", "analysis"],
"investment": ["portfolio", "trading", "investing", "asset", "holding", "position"]
}
E-commerce Domain
{
"product": ["item", "goods", "merchandise", "inventory", "stock", "offering"],
"customer": ["client", "buyer", "shopper", "user", "consumer", "purchaser"],
"order": ["purchase", "transaction", "sale", "buy", "acquisition", "booking"],
"inventory": ["stock", "goods", "items", "products", "merchandise", "supply"]
}
Healthcare Domain
{
"patient": ["client", "individual", "person", "case", "member"],
"treatment": ["care", "therapy", "procedure", "intervention", "service"],
"medical": ["health", "clinical", "therapeutic", "diagnostic", "healing"],
"records": ["files", "documents", "charts", "history", "profile", "information"]
}
Technology Domain
{
"system": ["platform", "software", "application", "tool", "solution", "program"],
"user": ["person", "individual", "customer", "client", "member", "participant"],
"feature": ["capability", "function", "ability", "functionality", "option"],
"performance": ["speed", "efficiency", "optimization", "throughput", "capacity"]
}
🎯 Implementation Examples
Example 1: Data Extraction Skill
# Input:
base_keywords = ["extract data", "normalize data", "analyze data"]
domain = "data_extraction"
# Output (68 keywords total):
expanded_keywords = [
# Base (3)
"extract data", "normalize data", "analyze data",
# Synonym expansions (15)
"scrape data", "get data", "pull data", "harvest data", "collect data",
"clean data", "format data", "structure data", "organize data",
"process data", "handle data", "work with data", "examine data",
# Domain-specific (8)
"web scraping", "data mining", "API integration", "ETL process",
"content parsing", "information retrieval", "data processing",
# Combinations (20)
"extract and analyze data", "get and process information",
"scrape and normalize content", "pull and structure records",
"harvest and format metrics", "collect and organize dataset",
# Natural language (22)
"how to extract data", "what can I scrape from this site",
"can you process information", "help me handle records",
"I need to normalize information", "pull data from website"
]
Example 2: Finance Analysis Skill
# Input:
base_keywords = ["analyze stock", "technical analysis", "RSI indicator"]
domain = "finance"
# Output (45 keywords total):
expanded_keywords = [
# Base (3)
"analyze stock", "technical analysis", "RSI indicator",
# Synonym expansions (12)
"evaluate equity", "research security", "review ticker",
"chart analysis", "graph indicator", "signal pattern",
"trend analysis", "pattern detection", "investment analysis",
# Domain-specific (10)
"portfolio analysis", "trading signals", "asset evaluation",
"market analysis", "equity research", "investment research",
"performance metrics", "risk assessment", "return analysis",
# Combinations (10)
"analyze stock performance", "evaluate equity risk",
"research technical indicators", "review market trends",
# Natural language (10)
"how to analyze this stock", "can you evaluate the security",
"help me research the ticker", "I need technical analysis"
]
✅ Quality Assurance Checklist
Synonym Coverage:
- Each core keyword has 5-8 synonyms
- Technical terminology included
- Business language covered
- Conversational variations present
- Domain-specific terms added
Natural Language:
- Question forms included ("how to", "what can I")
- Command forms included ("extract from")
- Informal variations included ("get data")
- Workflow language included ("daily I have to")
Domain Specificity:
- Industry-specific terminology included
- Technical jargon covered
- [] Business language present
- Contextual variations added
Testing Requirements:
- 50+ keywords generated per skill
- 20+ natural language variations
- 98%+ activation reliability
- False negatives < 5%
🚀 Usage in Agent-Skill-Creator
Phase 4 Integration:
- Generate base keywords (traditional method)
- Apply synonym expansion (enhanced method)
- Add domain-specific terms (specialized coverage)
- Generate combinations (pattern-based)
- Include natural language (conversational)
Template Integration:
- Enhanced keyword generation in phase4-detection.md
- Synonym libraries in activation-patterns-guide.md
- Domain examples in marketplace-robust-template.json
Result:
- 50+ keywords per skill (vs 10-15 traditional)
- 98%+ activation reliability (vs 70% traditional)
- Natural language support (vs formal only)
- Domain-specific coverage (vs generic only)