Data Limitations | ConflictClarifier

No dataset on this site is complete

Every dashboard, counter, and verification label is a best-effort summary of what our pipeline could collect from publicly available sources at last refresh. The limits below are real and material.

Fog of war

During active conflict the same incident is described differently within hours by different sides, and the descriptions change over the following days. First-day casualty counts are routinely revised upward or downward by 50% or more. Treat any live counter on this site as a snapshot of the reporting environment at that moment, not as ground truth.

Source-availability bias

We can only see what is published. Areas with internet shutdowns, press access limited to embedded journalists, expelled press corps, or active military operations are systematically under-reported. When a region shows no events on a given day, it usually means we could not collect anything, not that nothing happened.

Language bias

Our pipeline draws most heavily on English-language sources, with secondary coverage in Arabic, Persian, Hebrew, Russian, Chinese, and Turkish via NewsData.io.
Stories that break first in less-covered languages reach our system later than English-language stories, sometimes by a day or more.
Translation introduces drift. Where we quote a foreign-language source, the original wording may carry connotations the translation loses.

Platform bias

X (Twitter) coverage is biased by what X surfaces to its API. Suppressed, shadow-banned, or geo-restricted posts are invisible to us.
Other platforms (Telegram, VK, WeChat, Facebook, TikTok) are not currently in the ingestion pipeline. Conflict reporting that lives primarily on those platforms is under-represented.
Algorithmic feed bias affects what we see: virality is not the same as accuracy.

State censorship risk

State-owned outlets rarely publish material that contradicts their government. Their silence is a data point.
Some governments require domestic press to use specific framings. Stories that originate in those press environments inherit those constraints.
Independent outlets operating inside authoritarian environments may self-censor.

API and pipeline outage risk

Our data depends on third-party APIs (CurrentsAPI, NewsData.io, GDELT, ACLED, ScrapingBee, X, oil-price feeds). Any of these can fail.
When a feed is down, the corresponding section of the site falls behind. We try to surface this honestly rather than show stale data as fresh.
Pipeline health is monitored, but readers should treat a sudden change in event volume as a possible pipeline issue rather than a real-world signal.
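As a rough illustration of the two checks described above (a feed that has fallen behind, and a sudden change in event volume), here is a minimal Python sketch. The function names, the 6-hour staleness window, and the 50% deviation ratio are assumptions made for this example; they are not the site's actual monitoring rules.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

# Hypothetical threshold: a feed with no successful refresh in this window counts as stale.
MAX_FEED_AGE = timedelta(hours=6)


def feed_is_stale(last_success: datetime, now: datetime,
                  max_age: timedelta = MAX_FEED_AGE) -> bool:
    """True if the feed's last successful refresh is older than max_age."""
    return now - last_success > max_age


def volume_change_is_suspect(today_count: int, baseline_counts: list[int],
                             ratio: float = 0.5) -> bool:
    """True if today's event count deviates from the baseline median by more
    than `ratio` (an assumed cut-off), which may indicate a pipeline problem."""
    if not baseline_counts:
        return False  # nothing to compare against
    typical = median(baseline_counts)
    if typical == 0:
        return today_count != 0
    return abs(today_count - typical) > ratio * typical


# Example: a feed last refreshed seven hours ago, and a sharp drop in volume.
now = datetime(2024, 1, 1, 7, 0, tzinfo=timezone.utc)
last_ok = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(feed_is_stale(last_ok, now))
print(volume_change_is_suspect(10, [40, 42, 38]))
```

A flag from either check says only that the data deserves a second look. It does not say whether the cause is an outage or a real-world change.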

Delayed verification risk

Verifying a claim independently takes time. New events sit at Unverified or Partial by default. The label is not a value judgement about the story — it is a status. Some unverified items will turn out to be true; some will turn out to be wrong.

Historical revision risk

Records on the site are not frozen. If new evidence resolves a contradiction, the verification status of the original event changes and a correction note is added. If you cite this site, please:

Include the URL.
Include the date you accessed it.
Re-check if the figure or status is load-bearing for your work.

Casualty figures are uncertain

Casualty counts come from named sources. They are not the site's independently produced numbers.
Different sources use different methodologies (confirmed vs. reported, civilian vs. combatant definitions, hospital intake vs. on-scene counts).
We do not sum incompatible methodologies into a single figure.
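One way to picture the rule above is to keep figures grouped by methodology and report a range per group, never a cross-methodology total. The sketch below is illustrative only: the `CasualtyReport` record, its field names, and the methodology labels are hypothetical, not the site's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CasualtyReport:
    source: str       # the named source the figure is attributed to
    methodology: str  # hypothetical label, e.g. "hospital_intake" or "on_scene"
    count: int


def range_per_methodology(reports: list[CasualtyReport]) -> dict[str, tuple[int, int]]:
    """Return (lowest, highest) reported count for each methodology.

    Figures are never added together: counts from different methodologies
    are not comparable, and counts from different sources using the same
    methodology may describe the same people.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for report in reports:
        grouped[report.methodology].append(report.count)
    return {method: (min(counts), max(counts)) for method, counts in grouped.items()}


# Example with placeholder sources and numbers.
reports = [
    CasualtyReport("Source A", "hospital_intake", 40),
    CasualtyReport("Source B", "hospital_intake", 55),
    CasualtyReport("Source C", "on_scene", 30),
]
for method, (low, high) in range_per_methodology(reports).items():
    print(f"{method}: {low} to {high}")
```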

Prediction and simulator outputs are not forecasts

The Predictions and Simulator features present scenarios and probability ranges. They are not forecasts and they are not advice — see the disclaimers on those pages. They are most useful as a way of structuring "what would need to happen for this to escalate / de-escalate", not as an answer.

Geolocation precision

Event locations are taken from source reporting. Where multiple sources disagree on the exact location, we use the most specific level on which they all agree (city, then province, then country) rather than picking one source's location.
Coordinates, where shown, are approximate unless explicitly geolocated against satellite imagery.
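The "most specific common element" rule can be sketched as a longest-common-prefix over a country, province, city hierarchy. This is a minimal illustration assuming each source's location is a hypothetical `(country, province, city)` tuple with `None` where the source gives no detail; the site's real data model may differ.

```python
from typing import Optional

# Hypothetical record: (country, province, city), None where a source is silent.
Location = tuple[Optional[str], Optional[str], Optional[str]]


def most_specific_common(locations: list[Location]) -> Location:
    """Return the deepest country/province/city prefix on which all sources agree."""
    if not locations:
        return (None, None, None)
    agreed: list[Optional[str]] = []
    # zip(*locations) yields all country values, then all province values, then all city values.
    for level_values in zip(*locations):
        distinct = set(level_values)
        if len(distinct) == 1 and None not in distinct:
            agreed.append(level_values[0])
        else:
            break  # sources disagree or lack detail here; stop at the coarser level
    agreed += [None] * (3 - len(agreed))
    return (agreed[0], agreed[1], agreed[2])


# Example with placeholder names: the sources agree on province but not on city.
reports = [
    ("Country A", "Province X", "City 1"),
    ("Country A", "Province X", "City 2"),
]
print(most_specific_common(reports))
```

Stopping at the first level of disagreement means a record never shows a city that only one source named.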

What we are working on

Improved non-English language coverage.
Tighter event clustering across syndicated wires.
More structured casualty attribution.
Better surfacing of pipeline health on the front end.

What this page is for

This is the page we point readers to when a number on a dashboard surprises them. The most truthful thing we can say about any specific figure is: here are the sources we drew on, here are the rules we applied, here is what could still be wrong. That is what these methodology pages exist to provide.