# Hypothesis Quality Criteria

## Framework for Evaluating Scientific Hypotheses

Use these criteria to assess the quality and rigor of generated hypotheses. A robust hypothesis should score well across multiple dimensions.

**Note on Report Structure:** When generating hypothesis reports, provide a brief quality assessment summary in the main text (comparative table with ratings), and include detailed evaluation with strengths, weaknesses, and comprehensive analysis in **Appendix C: Quality Assessment**.

## Core Criteria

### 1. Testability

**Definition:** The hypothesis can be empirically tested through observation or experimentation.

**Evaluation questions:**
- Can specific experiments or observations test this hypothesis?
- Are the predicted outcomes measurable?
- Can the hypothesis be tested with current or near-future methods?
- Are there multiple independent ways to test it?

**Strong testability examples:**
- "Increased expression of protein X will reduce cell proliferation rate by >30%"
- "Patients receiving treatment Y will show 50% reduction in symptom Z within 4 weeks"

**Weak testability examples:**
- "This process is influenced by complex interactions" (vague, no specific prediction)
- "The mechanism involves quantum effects" (if no method to test quantum effects exists)

### 2. Falsifiability

**Definition:** Clear conditions or observations would disprove the hypothesis (Popperian criterion).

**Evaluation questions:**
- What specific observations would prove this hypothesis wrong?
- Are the falsifying conditions realistic to observe?
- Is the hypothesis stated clearly enough to be disproven?
- Can null results meaningfully falsify the hypothesis?

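
The questions above are easier to answer when a prediction is written down together with the result that would refute it. The following is a minimal sketch of that idea, using the "protein X" example from the Testability section. The names `Prediction`, `min_reduction`, and `is_falsified` are hypothetical, and the measurement values are made up for illustration. The comparison deliberately ignores measurement noise; a real analysis would replace it with an appropriate statistical test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A quantitative prediction with an explicit falsification threshold."""

    statement: str
    min_reduction: float  # the hypothesis claims a relative reduction above this

    def is_falsified(self, control: float, treated: float) -> bool:
        """Return True if the observed reduction does not exceed the threshold."""
        if control <= 0:
            raise ValueError("control measurement must be positive")
        observed_reduction = (control - treated) / control
        return observed_reduction <= self.min_reduction


# Hypothetical example: proliferation rates in arbitrary units.
prediction = Prediction(
    statement="Increased expression of protein X reduces proliferation by >30%",
    min_reduction=0.30,
)

print(prediction.is_falsified(control=1.0, treated=0.8))  # 20% reduction
print(prediction.is_falsified(control=1.0, treated=0.6))  # 40% reduction
```

Fixing the threshold before looking at the data is the point of the exercise: the first call reports a refuted prediction, the second does not, and neither outcome can be reinterpreted after the fact.
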

**Strong falsifiability examples:**
- "If we knock out gene X, phenotype Y will disappear" (can be falsified if phenotype persists)
- "Drug A will outperform placebo in 80% of patients" (clear falsification threshold)

**Weak falsifiability examples:**
- "Multiple factors contribute to the outcome" (too vague to falsify)
- "The effect may vary depending on context" (built-in escape clauses)

### 3. Parsimony (Occam's Razor)

**Definition:** Among competing hypotheses with equal explanatory power, prefer the simpler explanation.

**Evaluation questions:**
- Does the hypothesis invoke the minimum number of entities/mechanisms needed?
- Are all proposed elements necessary to explain the phenomenon?
- Could a simpler mechanism account for the observations?
- Does it avoid unnecessary assumptions?

**Parsimony considerations:**
- Simple ≠ simplistic; complexity is justified when evidence demands it
- Established mechanisms are "simpler" than novel, unproven ones
- Direct mechanisms are simpler than elaborate multi-step pathways
- One well-supported mechanism beats multiple speculative ones

### 4. Explanatory Power

**Definition:** The hypothesis accounts for a substantial portion of the observed phenomenon.

**Evaluation questions:**
- How much of the observed data does this hypothesis explain?
- Does it account for both typical and atypical observations?
- Can it explain related phenomena beyond the immediate observation?
- Does it resolve apparent contradictions in existing data?

**Strong explanatory power indicators:**
- Explains multiple independent observations
- Accounts for quantitative relationships, not just qualitative patterns
- Resolves previously puzzling findings
- Makes sense of seemingly contradictory results

**Limited explanatory power indicators:**
- Only explains part of the phenomenon
- Requires additional hypotheses for complete explanation
- Leaves major observations unexplained

### 5. Scope

**Definition:** The range of phenomena and contexts the hypothesis can address.

**Evaluation questions:**
- Does it apply only to the specific case or to broader situations?
- Can it generalize across conditions, species, or systems?
- Does it connect to larger theoretical frameworks?
- What are its boundaries and limitations?

**Broader scope (generally preferable):**
- Applies across multiple experimental conditions
- Generalizes to related systems or species
- Connects phenomenon to established principles

**Narrower scope (acceptable if explicitly defined):**
- Limited to specific conditions or contexts
- Requires different mechanisms in different settings
- Context-dependent with clear boundaries

### 6. Consistency with Established Knowledge

**Definition:** Alignment with well-supported theories, principles, and empirical findings.

**Evaluation questions:**
- Is it consistent with established physical, chemical, or biological principles?
- Does it align with or reasonably extend current theories?
- If contradicting established knowledge, is there strong justification?
- Does it require violating well-supported laws or findings?

**Levels of consistency:**
- **Fully consistent:** Applies established mechanisms in new context
- **Mostly consistent:** Extends current understanding in plausible ways
- **Partially inconsistent:** Contradicts some findings but has explanatory value
- **Highly inconsistent:** Requires rejecting well-established principles (requires exceptional evidence)

### 7. Novelty and Insight

**Definition:** The hypothesis offers new understanding beyond merely restating known facts.

**Evaluation questions:**
- Does it provide new mechanistic insight?
- Does it challenge assumptions or conventional wisdom?
- Does it suggest unexpected connections or relationships?
- Does it open new research directions?


**Novel contributions:**
- Proposes previously unconsidered mechanisms
- Reframes the problem in a productive way
- Connects disparate observations
- Suggests non-obvious testable predictions

**Note:** Novelty alone doesn't make a hypothesis valuable; it must also be testable, parsimonious, and explanatory.

## Comparative Evaluation

When evaluating multiple competing hypotheses:

### Trade-offs and Balancing

Hypotheses often involve trade-offs:
- More parsimonious but less explanatory power
- Broader scope but less testable with current methods
- Novel insights but less consistent with current knowledge

**Evaluation approach:**
- No hypothesis needs to be perfect on all dimensions
- Identify each hypothesis's strengths and weaknesses
- Consider which criteria are most important for the specific phenomenon
- Note which hypotheses are most immediately testable
- Identify which would be most informative if supported

### Distinguishability

**Key question:** Can experiments distinguish between competing hypotheses?

- Identify predictions that differ between hypotheses
- Prioritize hypotheses that make distinct predictions
- Note which experiments would most efficiently narrow the field
- Consider whether hypotheses could all be partially correct

## Common Pitfalls

### Untestable Hypotheses
- Too vague to generate specific predictions
- Invoke unobservable or unmeasurable entities
- Require technology that doesn't exist

### Unfalsifiable Hypotheses
- Built-in escape clauses ("may or may not occur")
- Post-hoc explanations that fit any outcome
- No specification of what would disprove them

### Overly Complex Hypotheses
- Invoke multiple unproven mechanisms
- Add unnecessary steps or entities
- Complexity not justified by explanatory gains

### Just-So Stories
- Plausible narratives without testable predictions
- Explain observations but don't predict new ones
- Impossible to distinguish from alternative stories

## Practical Application

When generating hypotheses:

1. **Draft initial hypotheses** focusing on mechanistic explanations
2. **Apply quality criteria** to identify weaknesses
3. **Refine hypotheses** to improve testability and clarity
4. **Develop specific predictions** to enhance testability and falsifiability
5. **Compare systematically** across all criteria
6. **Prioritize for testing** based on distinguishability and feasibility

Remember: The goal is not a perfect hypothesis, but a set of testable, falsifiable, informative hypotheses that advance understanding of the phenomenon.
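
The "compare systematically" step and the comparative table with ratings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-5 rating scale, the optional per-criterion weights, the weakness threshold, and all function names are hypothetical choices that this document does not prescribe, and the ratings for `H1` and `H2` are invented placeholder values.

```python
from __future__ import annotations

# The seven core criteria, in the order they appear in this document.
CRITERIA = (
    "testability", "falsifiability", "parsimony", "explanatory_power",
    "scope", "consistency", "novelty",
)


def weighted_score(ratings: dict[str, int], weights: dict[str, float] | None = None) -> float:
    """Weighted mean rating; criteria without an explicit weight count as 1.0."""
    weights = weights or {}
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    total_weight = sum(weights.get(c, 1.0) for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[c] * weights.get(c, 1.0) for c in CRITERIA) / total_weight


def weaknesses(ratings: dict[str, int], threshold: int = 2) -> list[str]:
    """Criteria rated at or below the (assumed) weakness threshold."""
    return [c for c in CRITERIA if ratings[c] <= threshold]


def rank(hypotheses: dict[str, dict[str, int]], weights: dict[str, float] | None = None) -> list[str]:
    """Hypothesis names ordered from highest to lowest weighted score."""
    return sorted(hypotheses, key=lambda n: weighted_score(hypotheses[n], weights), reverse=True)


def comparison_table(hypotheses: dict[str, dict[str, int]], weights: dict[str, float] | None = None) -> str:
    """Render the ratings as a markdown table, best-scoring hypothesis first."""
    rows = [
        "| Hypothesis | " + " | ".join(CRITERIA) + " | weighted |",
        "|" + " --- |" * (len(CRITERIA) + 2),
    ]
    for name in rank(hypotheses, weights):
        ratings = hypotheses[name]
        cells = " | ".join(str(ratings[c]) for c in CRITERIA)
        rows.append(f"| {name} | {cells} | {weighted_score(ratings, weights):.2f} |")
    return "\n".join(rows)


# Placeholder ratings on an assumed 1-5 scale, for illustration only.
hypotheses = {
    "H1": {"testability": 5, "falsifiability": 4, "parsimony": 4,
           "explanatory_power": 3, "scope": 2, "consistency": 5, "novelty": 2},
    "H2": {"testability": 2, "falsifiability": 3, "parsimony": 2,
           "explanatory_power": 5, "scope": 4, "consistency": 3, "novelty": 5},
}

print(comparison_table(hypotheses))
for name, ratings in hypotheses.items():
    print(name, "weaknesses:", weaknesses(ratings))
```

A single aggregate score is a convenience for sorting the table, not a verdict. The per-criterion weaknesses and the distinguishability of predictions still need to be read alongside it, since no hypothesis needs to be perfect on every dimension.
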