Hypothesis Quality Criteria
Framework for Evaluating Scientific Hypotheses
Use these criteria to assess the quality and rigor of generated hypotheses. A robust hypothesis should score well across multiple dimensions.
Note on Report Structure: When generating hypothesis reports, put a brief quality assessment summary (a comparative table with ratings) in the main text, and put the detailed evaluation, covering strengths, weaknesses, and comprehensive analysis, in Appendix C: Quality Assessment.
Core Criteria
1. Testability
Definition: The hypothesis can be empirically tested through observation or experimentation.
Evaluation questions:
- Can specific experiments or observations test this hypothesis?
- Are the predicted outcomes measurable?
- Can the hypothesis be tested with current or near-future methods?
- Are there multiple independent ways to test it?
Strong testability examples:
- "Increased expression of protein X will reduce cell proliferation rate by >30%"
- "Patients receiving treatment Y will show a 50% reduction in symptom Z within 4 weeks"
Weak testability examples:
- "This process is influenced by complex interactions" (vague, no specific prediction)
- "The mechanism involves quantum effects" (if no method to test quantum effects exists)
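A measurable prediction such as the ">30%" example above reduces to a simple arithmetic check once data are in hand. The following is a minimal sketch, not a prescribed method: the function names, the 30% default, and the sample values are assumptions made for illustration.

```python
def percent_reduction(control: float, treated: float) -> float:
    """Return how far `treated` falls below `control`, in percent."""
    if control <= 0:
        raise ValueError("control must be positive")
    return (control - treated) / control * 100.0


def prediction_met(control: float, treated: float,
                   threshold_pct: float = 30.0) -> bool:
    """True if the observed reduction exceeds the predicted threshold."""
    return percent_reduction(control, treated) > threshold_pct


# Hypothetical proliferation rates (arbitrary units), for illustration only.
print(prediction_met(control=1.0, treated=0.6))  # treated is well below control
print(prediction_met(control=1.0, treated=0.8))  # treated is only slightly below
```

A real evaluation would also account for measurement noise and replicate variability; this sketch only shows that the prediction names a quantity that can be computed and compared.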
2. Falsifiability
Definition: Clear conditions or observations would disprove the hypothesis (Popperian criterion).
Evaluation questions:
- What specific observations would prove this hypothesis wrong?
- Are the falsifying conditions realistic to observe?
- Is the hypothesis stated clearly enough to be disproven?
- Can null results meaningfully falsify the hypothesis?
Strong falsifiability examples:
- "If we knock out gene X, phenotype Y will disappear" (can be falsified if phenotype persists)
- "Drug A will outperform placebo in at least 80% of patients" (clear falsification threshold)
Weak falsifiability examples:
- "Multiple factors contribute to the outcome" (too vague to falsify)
- "The effect may vary depending on context" (built-in escape clauses)
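One way to make a falsification threshold operational is to decide in advance which results would count against it. The sketch below treats the 80% figure from the Drug A example as a claimed response rate and applies a one-sided exact binomial test. The significance level, function names, and patient counts are assumptions for illustration, and other statistical designs would be equally legitimate.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_falsified(responders: int, patients: int,
                 claimed_rate: float = 0.80, alpha: float = 0.05) -> bool:
    """One-sided exact binomial test against the claimed response rate.

    Returns True when a result this low would occur with probability
    below `alpha` if the true rate were `claimed_rate`.
    """
    return binomial_cdf(responders, patients, claimed_rate) < alpha


# Hypothetical trial outcomes, for illustration only.
print(is_falsified(responders=30, patients=50))  # far below the claimed rate
print(is_falsified(responders=39, patients=50))  # close to the claimed rate
```

Fixing `claimed_rate` and `alpha` before looking at the data is what keeps the hypothesis from acquiring an escape clause after the fact.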
3. Parsimony (Occam's Razor)
Definition: Among competing hypotheses with equal explanatory power, prefer the simpler explanation.
Evaluation questions:
- Does the hypothesis invoke the minimum number of entities/mechanisms needed?
- Are all proposed elements necessary to explain the phenomenon?
- Could a simpler mechanism account for the observations?
- Does it avoid unnecessary assumptions?
Parsimony considerations:
- Simple ≠ simplistic; complexity is justified when evidence demands it
- Established mechanisms are "simpler" than novel, unproven ones
- Direct mechanisms are simpler than elaborate multi-step pathways
- One well-supported mechanism beats multiple speculative ones
4. Explanatory Power
Definition: The hypothesis accounts for a substantial portion of the observed phenomenon.
Evaluation questions:
- How much of the observed data does this hypothesis explain?
- Does it account for both typical and atypical observations?
- Can it explain related phenomena beyond the immediate observation?
- Does it resolve apparent contradictions in existing data?
Strong explanatory power indicators:
- Explains multiple independent observations
- Accounts for quantitative relationships, not just qualitative patterns
- Resolves previously puzzling findings
- Makes sense of seemingly contradictory results
Limited explanatory power indicators:
- Only explains part of the phenomenon
- Requires additional hypotheses for complete explanation
- Leaves major observations unexplained
5. Scope
Definition: The range of phenomena and contexts the hypothesis can address.
Evaluation questions:
- Does it apply only to the specific case or to broader situations?
- Can it generalize across conditions, species, or systems?
- Does it connect to larger theoretical frameworks?
- What are its boundaries and limitations?
Broader scope (generally preferable):
- Applies across multiple experimental conditions
- Generalizes to related systems or species
- Connects phenomenon to established principles
Narrower scope (acceptable if explicitly defined):
- Limited to specific conditions or contexts
- Requires different mechanisms in different settings
- Context-dependent with clear boundaries
6. Consistency with Established Knowledge
Definition: Alignment with well-supported theories, principles, and empirical findings.
Evaluation questions:
- Is it consistent with established physical, chemical, or biological principles?
- Does it align with or reasonably extend current theories?
- If contradicting established knowledge, is there strong justification?
- Does it require violating well-supported laws or findings?
Levels of consistency:
- Fully consistent: Applies established mechanisms in new context
- Mostly consistent: Extends current understanding in plausible ways
- Partially inconsistent: Contradicts some findings but has explanatory value
- Highly inconsistent: Requires rejecting well-established principles (demands exceptional evidence)
7. Novelty and Insight
Definition: The hypothesis offers new understanding beyond merely restating known facts.
Evaluation questions:
- Does it provide new mechanistic insight?
- Does it challenge assumptions or conventional wisdom?
- Does it suggest unexpected connections or relationships?
- Does it open new research directions?
Novel contributions:
- Proposes previously unconsidered mechanisms
- Reframes the problem in a productive way
- Connects disparate observations
- Suggests non-obvious testable predictions
Note: Novelty alone doesn't make a hypothesis valuable; it must also be testable, parsimonious, and explanatory.
Comparative Evaluation
When evaluating multiple competing hypotheses, weigh the trade-offs among criteria and check whether experiments can tell the hypotheses apart.
Trade-offs and Balancing
Hypotheses often involve trade-offs:
- Greater parsimony but less explanatory power
- Broader scope but lower testability with current methods
- More novel insight but less consistency with current knowledge
Evaluation approach:
- No hypothesis needs to be perfect on all dimensions
- Identify each hypothesis's strengths and weaknesses
- Consider which criteria are most important for the specific phenomenon
- Note which hypotheses are most immediately testable
- Identify which would be most informative if supported
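The evaluation approach above can be sketched as a small comparison across the seven core criteria. Nothing in this framework prescribes a numeric scale: the 1 to 5 ratings, the weights, the cut-offs for "strength" and "weakness", and the hypotheses H1 and H2 are all assumptions made for illustration.

```python
from dataclasses import dataclass

CRITERIA = (
    "testability", "falsifiability", "parsimony", "explanatory_power",
    "scope", "consistency", "novelty",
)


@dataclass
class Hypothesis:
    name: str
    ratings: dict[str, int]  # criterion -> rating on an assumed 1-5 scale


def weighted_score(hypothesis: Hypothesis, weights: dict[str, float]) -> float:
    """Weighted mean rating; weights express which criteria matter most."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(hypothesis.ratings[c] * weights[c] for c in CRITERIA) / total_weight


def strengths_and_weaknesses(hypothesis: Hypothesis, high: int = 4, low: int = 2):
    """Split criteria into strengths (>= high) and weaknesses (<= low)."""
    strengths = [c for c in CRITERIA if hypothesis.ratings[c] >= high]
    weaknesses = [c for c in CRITERIA if hypothesis.ratings[c] <= low]
    return strengths, weaknesses


# Hypothetical ratings, listed in the same order as CRITERIA.
equal_weights = {c: 1.0 for c in CRITERIA}
h1 = Hypothesis("H1", dict(zip(CRITERIA, (5, 4, 4, 3, 2, 5, 2))))
h2 = Hypothesis("H2", dict(zip(CRITERIA, (2, 3, 2, 5, 4, 3, 5))))

for h in (h1, h2):
    strong, weak = strengths_and_weaknesses(h)
    print(f"{h.name}: score={weighted_score(h, equal_weights):.2f} "
          f"strengths={strong} weaknesses={weak}")
```

A single score hides the trade-offs, so the per-criterion strengths and weaknesses are reported alongside it. Changing the weights (for example, emphasizing novelty for an exploratory study) can reverse the ranking, which is why the weights should be chosen for the specific phenomenon.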
Distinguishability
Key question: Can experiments distinguish between competing hypotheses?
- Identify predictions that differ between hypotheses
- Prioritize hypotheses that make distinct predictions
- Note which experiments would most efficiently narrow the field
- Consider whether hypotheses could all be partially correct
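The distinguishability check can be sketched by tabulating what each hypothesis predicts for each candidate experiment and counting how many pairs of hypotheses an experiment would separate. The hypotheses H1 to H3, the experiments (including the hypothetical "pathway P"), and the predicted outcomes are invented for illustration.

```python
from itertools import combinations

# predictions[experiment][hypothesis] = predicted outcome (hypothetical labels)
predictions = {
    "knock out gene X": {
        "H1": "phenotype lost", "H2": "phenotype persists", "H3": "phenotype lost",
    },
    "overexpress protein X": {
        "H1": "proliferation down", "H2": "proliferation down", "H3": "proliferation down",
    },
    "inhibit pathway P": {
        "H1": "no change", "H2": "phenotype lost", "H3": "phenotype lost",
    },
}


def separated_pairs(outcomes: dict[str, str]) -> set[frozenset[str]]:
    """Pairs of hypotheses whose predictions differ for one experiment."""
    return {
        frozenset((a, b))
        for a, b in combinations(outcomes, 2)
        if outcomes[a] != outcomes[b]
    }


def rank_experiments(table: dict[str, dict[str, str]]) -> list[str]:
    """Experiments ordered by how many hypothesis pairs they separate."""
    return sorted(table, key=lambda name: -len(separated_pairs(table[name])))


def undistinguished_pairs(table: dict[str, dict[str, str]]) -> set[frozenset[str]]:
    """Pairs of hypotheses that no listed experiment can tell apart."""
    hypotheses = next(iter(table.values())).keys()
    all_pairs = {frozenset(pair) for pair in combinations(hypotheses, 2)}
    covered = set().union(*(separated_pairs(o) for o in table.values()))
    return all_pairs - covered


print(rank_experiments(predictions))
print(undistinguished_pairs(predictions))
```

An experiment on which every hypothesis agrees ranks last because its result cannot narrow the field. Any pair returned by `undistinguished_pairs` signals that a new experiment is needed or that the two hypotheses may not be genuinely distinct.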
Common Pitfalls
Untestable Hypotheses
- Too vague to generate specific predictions
- Invoke unobservable or unmeasurable entities
- Require technology that doesn't exist
Unfalsifiable Hypotheses
- Built-in escape clauses ("may or may not occur")
- Post-hoc explanations that fit any outcome
- No specification of what would disprove them
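Escape clauses can sometimes be caught by a crude text screen before a hypothesis reaches detailed review. The sketch below flags a few hedging phrases drawn from the weak examples in this document. The phrase list is an assumption and far from complete, and a match is only a prompt for closer reading, not a verdict.

```python
import re

# Hedging phrases taken from the weak examples in this document (not exhaustive).
ESCAPE_PATTERNS = [
    r"\bmay or may not\b",
    r"\bmay vary\b",
    r"\bdepending on context\b",
    r"\bmultiple factors\b",
]


def flag_escape_clauses(statement: str) -> list[str]:
    """Return the hedging phrases found in a hypothesis statement."""
    found = []
    for pattern in ESCAPE_PATTERNS:
        match = re.search(pattern, statement, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0).lower())
    return found


print(flag_escape_clauses("The effect may vary depending on context"))
print(flag_escape_clauses("If we knock out gene X, phenotype Y will disappear"))
```

A statement that passes this screen can still be unfalsifiable, for instance a post-hoc explanation that fits any outcome, so the check complements the evaluation questions rather than replacing them.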
Overly Complex Hypotheses
- Invoke multiple unproven mechanisms
- Add unnecessary steps or entities
- Complexity not justified by explanatory gains
Just-So Stories
- Plausible narratives without testable predictions
- Explain observations but don't predict new ones
- Impossible to distinguish from alternative stories
Practical Application
When generating hypotheses:
- Draft initial hypotheses focusing on mechanistic explanations
- Apply quality criteria to identify weaknesses
- Refine hypotheses to improve testability and clarity
- Develop specific predictions to enhance testability and falsifiability
- Compare systematically across all criteria
- Prioritize for testing based on distinguishability and feasibility
Remember: The goal is not a perfect hypothesis, but a set of testable, falsifiable, informative hypotheses that advance understanding of the phenomenon.