description: GitLab CI/CD test job reliability analysis for OSDU projects. Tracks test job (unit/integration/acceptance) pass/fail status across pipeline runs. Use for test job status, flaky test job detection, test reliability/quality metrics, and cloud provider analytics. Wraps the osdu-quality CLI.
version: 2.0.0
name: OSDU GitLab CI/CD test reliability analysis
keywords: osdu, gitlab, quality, ci, cd, pipeline, test, job, reliability, flaky, acceptance, integration, unit, azure, aws, gcp, cloud, provider
verbs: analyze, track, monitor, test, check
patterns: test.*(?:reliability|status|job), pipeline.*(?:analysis|status), flaky.*test, ci.*cd, gitlab.*(?:pipeline|job)
tools: Bash
Analyze GitLab CI/CD test job reliability for OSDU platform projects, tracking test job pass/fail status across pipeline runs to identify flaky tests and provide quality metrics.
<use-when>
<condition>OSDU project test status queries ("how is {project} looking", "partition test quality")</condition>
<condition>Flaky test detection ("are there flaky tests in {project}")</condition>
<condition>Pipeline health monitoring ("recent pipeline failures")</condition>
<condition>Cloud provider comparison ("azure vs aws test reliability")</condition>
<condition>Stage-specific analysis ("unit test status", "integration test failures")</condition>
</use-when>
<skip-when>
<condition>Individual test case tracking (we track job-level, not test-level)</condition>
<condition>Non-test jobs (build, deploy, lint, security scans)</condition>
<condition>Non-OSDU projects or non-GitLab CI systems</condition>
<condition>Real-time monitoring (data is from completed pipelines only)</condition>
</skip-when>
Data hierarchy: Pipeline Run → Test Stage (unit/integration/acceptance) → Test Job → Test Suite (many tests). A sketch of this model follows.
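To make the hierarchy concrete, here is a minimal Python sketch of the data model. The class and field names are illustrative assumptions, not the CLI's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    UNIT = "unit"
    INTEGRATION = "integration"
    ACCEPTANCE = "acceptance"

@dataclass
class TestJob:
    name: str        # e.g. "unit-tests-azure"
    stage: Stage
    provider: str    # azure, aws, gcp, ibm, cimpl
    passed: bool     # job-level result only; individual tests are not tracked

@dataclass
class PipelineRun:
    pipeline_id: int
    jobs: list[TestJob] = field(default_factory=list)
```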
<capabilities>
<supported>Test job pass/fail status across multiple pipeline runs</supported>
<supported>Flaky test job detection (jobs that intermittently fail)</supported>
<supported>Stage-level metrics (unit/integration/acceptance)</supported>
<supported>Cloud provider breakdown (azure, aws, gcp, ibm, cimpl)</supported>
<unsupported>Individual test results (not tracked)</unsupported>
<unsupported>Non-test jobs like build, deploy, lint</unsupported>
</capabilities>
<example>
Pipeline #1: job "unit-tests-azure" → PASS (100/100 tests passed)
Pipeline #2: job "unit-tests-azure" → FAIL (99/100 tests passed)
Pipeline #3: job "unit-tests-azure" → PASS (100/100 tests passed)
Result: This job is FLAKY (unreliable across runs)
</example>
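The rule the example illustrates reduces to: a job is flaky when its sampled results are mixed. A minimal detection sketch, assuming job history is available as per-run pass/fail booleans (the CLI's actual classification may be more nuanced):

```python
def find_flaky_jobs(history: dict[str, list[bool]]) -> list[str]:
    """Flag jobs whose pass/fail outcome varies across the sampled runs."""
    return [job for job, results in history.items() if len(set(results)) > 1]

# The job from the example above: pass, fail, pass -> flaky.
history = {
    "unit-tests-azure": [True, False, True],
    "unit-tests-aws": [True, True, True],
}
print(find_flaky_jobs(history))  # ['unit-tests-azure']
```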
<progressive-approach mandatory="true">
<step number="1" name="start-light">
<action>Use status.py for quick overview</action>
<command>script_run osdu status.py --format json --pipelines 3 --project {name}</command>
<rationale>Lightweight, fast, safe token usage</rationale>
</step>
<step number="2" name="deep-dive" condition="only-if-needed">
<action>Use analyze.py with strict filters</action>
<command>script_run osdu analyze.py --format json --pipelines 5 --project {name} --stage unit</command>
<rationale>Heavy query; use only when the status.py overview is insufficient</rationale>
</step>
<step number="3" name="never-query-all">
<action>ALWAYS specify --project to avoid 30-project scan</action>
<rationale>Prevents token-limit-exceeded errors</rationale>
</step>
</progressive-approach>
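The progression above could be scripted roughly as follows. The direct `python status.py` invocation and the `overall_status` field are assumptions made for illustration; only the `script_run` wrapper and the flags shown above appear in this document:

```python
import json
import subprocess

def query_project(project: str) -> dict:
    # Step 1: lightweight overview -- small pipeline count, JSON for parsing.
    out = subprocess.run(
        ["python", "status.py", "--format", "json",
         "--pipelines", "3", "--project", project],
        capture_output=True, text=True, check=True,
    ).stdout
    status = json.loads(out)

    # Step 2: deep-dive only when the overview is insufficient, always with
    # strict filters; markdown output keeps the token cost down.
    if status.get("overall_status") != "passing":  # hypothetical field name
        subprocess.run(
            ["python", "analyze.py", "--format", "markdown",
             "--pipelines", "5", "--project", project, "--stage", "unit"],
            check=True,
        )
    return status
```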
<format-selection>
<format type="json">
<use-when>Extracting specific metrics or calculating statistics</use-when>
<use-when>Building summaries or comparisons</use-when>
<use-when>Parsing structured data programmatically</use-when>
<use-when importance="critical">ALWAYS for status.py (lightweight, parseable)</use-when>
</format>
<format type="markdown">
<use-when>analyze.py queries (output roughly 10x smaller than JSON, still readable)</use-when>
<use-when>Creating reports for sharing</use-when>
<use-when>Need human-readable tables without parsing</use-when>
<use-when>Token budget is tight</use-when>
</format>
<format type="terminal" status="never-use">
<avoid-because>Includes ANSI codes and colors, hard to parse</avoid-because>
<avoid-because>Only for direct human terminal viewing</avoid-because>
</format>
</format-selection>
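To illustrate why JSON is the right choice for metric extraction, a hedged parsing sketch. The payload shape (`jobs`, `name`, `pass_rate`) is hypothetical, since status.py's actual schema is not shown in this document:

```python
import json

# stdout captured from a status.py run (sample payload, invented here).
raw = """{"project": "partition",
          "jobs": [{"name": "unit-tests-azure", "pass_rate": 0.67},
                   {"name": "unit-tests-aws", "pass_rate": 1.0}]}"""

data = json.loads(raw)
# Flag any job below a chosen reliability threshold.
unreliable = [j["name"] for j in data["jobs"] if j["pass_rate"] < 0.9]
print(unreliable)  # ['unit-tests-azure']
```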
<dont-do description="Use high pipeline counts without project filter">
<bad-example>script_run osdu analyze.py --format json --pipelines 20</bad-example>
<reason>Takes 3+ minutes, huge output</reason>
</dont-do>
<dont-do description="Use terminal format in agent context">
<bad-example>script_run osdu status.py --format terminal --project partition</bad-example>
<reason>Includes ANSI codes, hard to parse</reason>
</dont-do>
<dont-do description="Jump straight to analyze.py">
<bad-example>script_run osdu analyze.py --format json --pipelines 10 --project partition</bad-example>
<reason>Heavy query when status.py would suffice</reason>
</dont-do>
Best practices:
- Always start with status.py; use analyze.py only if it proves necessary
- Always specify --format json or --format markdown
- Always include --project {name} to avoid scanning all 30 projects
- Start with --pipelines 3-5; increase only if necessary
- Use --stage or --provider to narrow scope
- Prefer --format markdown for analyze.py (roughly 10x token savings)
Format summary:
- json: structured parsing, metrics extraction
- markdown: reports, sharing, analyze.py queries
- terminal: human viewing in a terminal with colors (never in agent context)
Error handling:
- Missing osdu-quality CLI: clear message with installation command
- Not authenticated: message about authentication requirements
- GitLab API failure: API error details
- Invalid --project/--stage/--provider value: list of valid options
- Output too large: suggestion to reduce --pipelines or add filters (sketched below)
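The oversized-output case can be handled mechanically on the caller's side. A sketch, assuming the same direct-invocation convention as above and an arbitrary output budget:

```python
import subprocess

OUTPUT_BUDGET = 20_000  # rough character budget; the threshold is arbitrary

def analyze_within_budget(project: str) -> str:
    # Mirror the "reduce --pipelines or add filters" advice: retry with a
    # progressively smaller pipeline window until the output fits the budget.
    out = ""
    for pipelines in (5, 3, 1):
        out = subprocess.run(
            ["python", "analyze.py", "--format", "markdown",
             "--pipelines", str(pipelines), "--project", project],
            capture_output=True, text=True, check=True,
        ).stdout
        if len(out) <= OUTPUT_BUDGET:
            return out
    return out  # still large at the smallest window; add --stage/--provider
```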
Rules:
- Always follow the progressive query approach
- Never query without a --project filter
- Start with minimal pipeline counts
- Use markdown format for analyze.py to save tokens
- Apply stage or provider filters when possible