Free Depression Test Accuracy Checklist: How to Know If You Can Trust Your Results

Robert Williams, Consumer Finance Writer · Updated March 28, 2026

The PHQ-9 and a random mood quiz are both free, both ask about how you have been feeling, and both produce a score at the end. That is where the similarity ends. One has 88% sensitivity, decades of peer-reviewed validation, and standardized scoring used in clinical settings worldwide. The other is editorial content with no scientific basis. Knowing which one you are looking at changes what you should do with the result.

This checklist gives you five concrete checks to run on any free depression test before you act on its results. Some tests pass every item. Many fail on the first.

According to the National Institute of Mental Health (NIMH), only validated screeners - those with published sensitivity and specificity data - should be used to inform mental health decisions. This page shows you exactly how to tell them apart.

The Accuracy Checklist: 5 Things to Verify Before Trusting Any Free Depression Test

Checklist Item 1: Does It Use the PHQ-9 - and Is It the Real Version?

The PHQ-9 (Patient Health Questionnaire-9) is the most widely studied depression screener in clinical medicine. Developed by Spitzer, Kroenke, and Williams, it is freely licensed for clinical and research use by Pfizer, the original publisher.

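The authentic PHQ-9 has exactly 9 symptom questions, each scored on a 0-3 scale, with a maximum total of 27. It also includes a 10th question - the functional impairment question - asking how much symptoms have interfered with work, daily tasks, and relationships. Item 10 is answered on its own scale (from "Not difficult at all" to "Extremely difficult") and is not added to the 27-point total.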

Many online implementations skip that 10th question. Clinicians rely on it because the 0-27 score measures how often symptoms occur, not how much they disrupt your life. A score without item 10 leaves out real-world impact.

The PHQ-9 has documented sensitivity and specificity of 88% for major depression at a cutoff score of 10 or higher - but only when the 9 scored questions and the standard scoring are kept intact. Reword the questions or change the scoring, and those numbers no longer apply.

What to verify:

Exactly 9 scored symptom questions plus 1 functional impairment question
Each question scored 0 (Not at all) to 3 (Nearly every day)
Maximum score of 27
The site cites Spitzer, Kroenke, and Williams as original authors
The site credits Pfizer or PHQ Screeners as the source
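
For readers who build or audit these pages, here is a minimal Python sketch of what correct PHQ-9 scoring looks like. The names score_phq9 and PHQ9Result are hypothetical, not part of any official PHQ release, and the sketch assumes answers arrive as integers 0-3 for items 1-9 and as text for item 10. It rejects anything that departs from the standard format and keeps item 10 separate from the total.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Item 10 uses its own difficulty scale and is never added to the 0-27 total.
ITEM10_OPTIONS = (
    "Not difficult at all",
    "Somewhat difficult",
    "Very difficult",
    "Extremely difficult",
)


@dataclass
class PHQ9Result:
    total: int                            # sum of items 1-9 only, range 0-27
    functional_impairment: Optional[str]  # item 10 answer, or None if the page skipped it


def score_phq9(items: Sequence[int], item10: Optional[str] = None) -> PHQ9Result:
    """Score a PHQ-9 response set, raising ValueError if it departs from the standard format."""
    if len(items) != 9:
        raise ValueError(f"expected 9 scored items, got {len(items)}")
    for number, value in enumerate(items, start=1):
        # 0 = Not at all, 1 = Several days, 2 = More than half the days, 3 = Nearly every day
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {number} must be 0-3, got {value!r}")
    if item10 is not None and item10 not in ITEM10_OPTIONS:
        raise ValueError(f"unrecognized item 10 answer: {item10!r}")
    return PHQ9Result(total=sum(items), functional_impairment=item10)


result = score_phq9([1, 2, 1, 0, 2, 1, 1, 0, 0], item10="Somewhat difficult")
print(result.total)  # 8
```

A page that reports a total above 27, uses a different answer scale, or folds item 10 into the score is not scoring the standard PHQ-9.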

Checklist Item 2: Does It Distinguish Screening from Diagnosis?

Screening accuracy and diagnostic accuracy are not the same thing. A well-built screener can be highly accurate while being completely inappropriate for self-diagnosis. Confusing the two is one of the most common mistakes people make with free tests online.

A screener tells you whether you are likely to benefit from a professional evaluation. Diagnosis is a different process entirely - it requires a clinician who can weigh your full history, rule out medical causes, and apply DSM-5 diagnostic criteria from the American Psychiatric Association. One gives you a signal. The other gives you an answer.

What to verify:

The test explicitly says it is a screener, not a diagnostic tool
Results recommend professional follow-up rather than a self-diagnosis label
The site does not claim the test replaces clinical evaluation

Checklist Item 3: Is the BDI-II Being Used - and Is It the Official Version?

The Beck Depression Inventory-II (BDI-II) is longer and more detailed than the PHQ-9, with 21 items each scored 0-3 for a maximum of 63, and it is widely used in research trials and clinical settings. It is often treated as a benchmark for measuring depression severity.

There is a catch. The BDI-II is copyrighted, which means truly "free" versions online are often unofficial adaptations with modified wording. Even small changes to a question can reduce psychometric validity - the scoring norms from published research no longer apply to the version you are taking.

If a site offers a "free BDI" without disclosing licensing, treat the results with caution. You may not be taking the same instrument that clinical studies validated.

What to verify:

The site discloses whether it is using the official BDI-II or an adapted version
If adapted, the site explains what was changed and why
Results include a note about the limitations of unofficial adaptations
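
The reason licensing matters shows up at the scoring step. The official BDI-II has 21 items scored 0-3, and its published severity ranges are 0-13 minimal, 14-19 mild, 20-28 moderate, and 29-63 severe. Those ranges were established for the official wording only. The minimal Python sketch below uses a hypothetical bdi2_severity function and an is_official flag that the site would have to set honestly; it declines to attach a published range to an adapted version.

```python
from typing import Optional, Sequence

# Published BDI-II severity ranges: (highest total in the range, label).
BDI2_RANGES = ((13, "minimal"), (19, "mild"), (28, "moderate"), (63, "severe"))


def bdi2_severity(items: Sequence[int], is_official: bool) -> Optional[str]:
    """Return the published severity label, or None when the published norms do not apply."""
    if len(items) != 21 or any(value not in (0, 1, 2, 3) for value in items):
        raise ValueError("the BDI-II expects 21 items, each scored 0-3")
    if not is_official:
        # A reworded adaptation was never validated against these ranges.
        return None
    total = sum(items)
    for upper, label in BDI2_RANGES:
        if total <= upper:
            return label
    raise AssertionError("unreachable: the maximum total is 63")
```

With every answer set to 1, bdi2_severity returns "moderate" (a total of 21) when is_official is True, and None for the same answers on an adapted version.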

Checklist Item 4: Does It Screen for Comorbidity - or Just Depression in Isolation?

Anxiety and depression co-occur in roughly 50% of cases, so a depression-only screener misses a major part of the picture for many of the people who take it.

The GAD-7 (Generalized Anxiety Disorder 7-item scale) is frequently paired with depression screeners for exactly this reason. Both were developed by the same research team, use the same scoring format, and together they produce a far more complete picture.

A depression-only screener is not wrong. But you should know what it is not measuring before you act on the results.

What to verify:

Does the test include a GAD-7 or equivalent anxiety screener?
If not, does the results page acknowledge the limitation?
Does the test flag when anxiety symptoms are also elevated?
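
Because both instruments use the same 0-3 answer scale, a combined page can score them side by side. The minimal Python sketch below assumes a cutpoint of 10 on each scale, the commonly used threshold for both the PHQ-9 and the GAD-7 (the GAD-7 has 7 items and a maximum of 21). The function name combined_screen is hypothetical.

```python
from typing import Dict, Sequence

# Assumed cutpoint: a total of 10 or higher on either scale counts as elevated.
CUTPOINT = 10


def _total(items: Sequence[int], expected: int, name: str) -> int:
    """Check the item count and 0-3 range, then sum the answers."""
    if len(items) != expected or any(value not in (0, 1, 2, 3) for value in items):
        raise ValueError(f"{name} expects {expected} items scored 0-3")
    return sum(items)


def combined_screen(phq9: Sequence[int], gad7: Sequence[int]) -> Dict[str, object]:
    """Score the PHQ-9 and GAD-7 together and flag which ones are elevated."""
    phq_total = _total(phq9, 9, "PHQ-9")
    gad_total = _total(gad7, 7, "GAD-7")
    depression = phq_total >= CUTPOINT
    anxiety = gad_total >= CUTPOINT
    return {
        "phq9_total": phq_total,
        "gad7_total": gad_total,
        "depression_elevated": depression,
        "anxiety_elevated": anxiety,
        "both_elevated": depression and anxiety,
    }
```

A depression-only page would report just the first total; the anxiety_elevated flag is exactly the information it leaves out.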

Checklist Item 5: Were You in the Right Conditions to Take It?

A validated instrument only works if you use it right. The conditions under which you complete a test matter as much as the test itself.

Accuracy drops when you rush through responses, take tests in public, or answer based on how you feel right now rather than reflecting on a stable two-week period. The PHQ-9 asks specifically about the past two weeks. Racing through 9 questions in 60 seconds does not give your memory time to actually cover that span.

According to NIMH, self-report reliability is affected by cognitive load, emotional state at the time of testing, and whether the environment allowed honest, private reflection.

Conditions checklist before you start:

You are in a private space where you feel comfortable being honest
You are not in crisis, in the middle of a panic, or acutely upset right now
You have 5-10 minutes to reflect - not 60 seconds
You are reading each question fully before answering
You are thinking about the past two weeks, not just today

Summary: Accuracy Checklist at a Glance

Check	What to Look For	Red Flag
PHQ-9 authenticity	9 questions + item 10, 0-3 scale, 27-point max, cites Spitzer/Kroenke/Williams	Missing item 10, no author credit, modified wording
Screening vs. diagnosis	Explicitly labeled as a screener, recommends professional follow-up	Claims to diagnose depression, skips follow-up guidance
BDI-II version	Discloses whether official or adapted, explains any modifications	Free BDI with no licensing disclosure
Comorbidity screening	Includes GAD-7 or flags what is not being screened	Depression-only with no mention of anxiety
Testing conditions	Private, calm, 5-10 minutes, reflecting on past two weeks	Rushed, public, taken during acute distress

Next Steps After Using This Checklist

Passing this checklist does not mean you have diagnosed anything. It means you have a credible score worth bringing to a professional.

A PHQ-9 score in the moderate to severe range (a total of 10 or higher) is clinically meaningful. Bring the actual number - and the name of the test - to your next doctor's appointment or therapy intake. Clinicians know these instruments and can use your result as a baseline for tracking progress.
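
The labels on a PHQ-9 results page come from standard score bands: 0-4 minimal, 5-9 mild, 10-14 moderate, 15-19 moderately severe, and 20-27 severe. A minimal Python lookup makes the boundaries explicit; phq9_band is a hypothetical helper name.

```python
# Standard PHQ-9 severity bands, highest first: (lowest total in the band, label).
PHQ9_BANDS = (
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
)


def phq9_band(total: int) -> str:
    """Map a PHQ-9 total (0-27) to its standard severity label."""
    if not 0 <= total <= 27:
        raise ValueError("a PHQ-9 total must be between 0 and 27")
    for lower, label in PHQ9_BANDS:
        if total >= lower:
            return label
    raise AssertionError("unreachable: every valid total is >= 0")


print(phq9_band(9))   # mild
print(phq9_band(10))  # moderate
```

One point separates a 9 from a 10, which is why a score just under a band boundary still deserves attention.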

If you scored mild or minimal but something still feels off, trust that. Screening cutpoints are set from population-level studies, not tailored to any one person. Some people fall just below a cutpoint while experiencing real, significant distress.

Learn about the different types of depression tests to understand what each one measures
Take the PHQ-9 depression screener with proper scoring and item 10 included
Find out when your score means you should see a doctor
Take the combined PHQ-9 + GAD-7 screener for both depression and anxiety

The most accurate test is one you can actually trust - because it uses a validated instrument, tells you what it measures and what it does not, and you took it under the right conditions. This checklist covers all three. Run it every time.

Frequently Asked Questions

How do I know if a free online depression test is using the real PHQ-9 or a modified knockoff?

The authentic PHQ-9 has exactly 9 symptom questions plus one functional impairment question (item 10). Each of the 9 symptom questions uses a 0-3 scale - "Not at all," "Several days," "More than half the days," and "Nearly every day." The maximum score is 27, and item 10 is not counted toward it. The site should credit Spitzer, Kroenke, and Williams as authors and note that the instrument is freely licensed by Pfizer. If the wording feels different from what you find on official PHQ screener sites, treat it as a modified version; modified versions may not perform at the documented 88% sensitivity level. If item 10 is missing entirely, the total may still be valid, but the test is incomplete and tells you nothing about how much symptoms affect your daily life.

Can a free depression test be as accurate as one given by a therapist?

An official version of the instrument - the PHQ-9 or the licensed BDI-II - is identical whether delivered online or by a clinician. The test questions do not change. What changes is the context around it. A therapist can ask follow-up questions, observe non-verbal cues, rule out medical causes, and apply DSM-5 diagnostic criteria from the American Psychiatric Association. The online version gives you a score. The clinical version gives you a score plus interpretation, context, and next steps. The numbers from a validated online test are real and meaningful - but what you do with them requires human judgment.

What makes a depression test clinically validated versus just a quiz?

A clinically validated instrument has three things a quiz does not. First, peer-reviewed published studies with data on sensitivity and specificity - meaning researchers tested it against real diagnoses and measured how often it got them right. Second, defined normative populations - the test was validated on specific groups, and you can see who. Third, standardized scoring with clinical cutpoints that correspond to real-world severity levels. An editorial quiz has none of these. To tell the difference instantly: look for published sensitivity and specificity percentages, a citation to original authors, and acknowledgment that the tool is a screener rather than a diagnostic instrument.
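
The two percentages come from simple arithmetic on a validation study's results. In the minimal Python sketch below, the counts are invented for illustration and do not come from any real study; they were chosen to reproduce 88% for both measures. The function name sensitivity_specificity is hypothetical.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Compute sensitivity and specificity from a comparison against clinical diagnoses.

    tp: diagnosed and flagged by the test      fn: diagnosed but missed by the test
    tn: not diagnosed and not flagged          fp: not diagnosed but flagged anyway
    """
    sensitivity = tp / (tp + fn)  # share of people with the condition the test catches
    specificity = tn / (tn + fp)  # share of people without the condition the test clears
    return sensitivity, specificity


# Hypothetical study: 100 people with major depression, 500 without.
sens, spec = sensitivity_specificity(tp=88, fn=12, tn=440, fp=60)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 88%, specificity 88%
```

A quiz with no published study behind it has no counts to plug in, which is why it cannot report either number.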

Is a high score on a free test enough reason to seek help?

Yes. A score in the moderate to severe range on a validated screener like the PHQ-9 is clinically meaningful. According to NIMH, validated screeners are specifically designed to identify people who are likely to benefit from professional evaluation. You do not need a formal diagnosis to reach out to a doctor, therapist, or mental health helpline. A score is a signal. Acting on that signal is the right response - not waiting for certainty.

Should I retake the test if I think my results were affected by bad conditions?

Yes - and it is encouraged. Self-report accuracy is directly tied to the conditions under which you complete a test. If you rushed, were distracted, or took the test during a moment of acute distress, the results reflect that moment rather than your typical experience over the past two weeks. Wait until you are calm, private, and have enough time to reflect. Then retake it. Clinicians routinely re-administer the PHQ-9 at follow-up appointments to see how scores change over time. One score is data. Two scores over time start to show a pattern.
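
Comparing two scores is simple subtraction once you pick a threshold for meaningful change. In the minimal Python sketch below, describe_change is a hypothetical helper, and min_change=5 is an assumption: a change of about 5 points is often cited as clinically meaningful on the PHQ-9, but your clinician may use a different rule.

```python
def describe_change(first: int, second: int, min_change: int = 5) -> str:
    """Describe the change between two PHQ-9 totals taken at different times.

    min_change is an assumed threshold, not a fixed clinical rule.
    """
    for score in (first, second):
        if not 0 <= score <= 27:
            raise ValueError("PHQ-9 totals must be between 0 and 27")
    diff = second - first  # negative means the score went down (fewer symptoms)
    if diff <= -min_change:
        return f"improved by {-diff} points"
    if diff >= min_change:
        return f"worsened by {diff} points"
    return f"no meaningful change ({diff:+d} points)"


print(describe_change(16, 9))  # improved by 7 points
```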

About this article

Researched and written by Robert Williams at depression tests. Our editorial team reviews depression tests to help readers make informed decisions.