wwwChecker

Download

What is this?

Web update checker.

Features

Method

  1. Flatten the HTML DOM tree into a sequence of paragraphs.
  2. Apply a diff algorithm to detect inserted/deleted paragraphs.
  3. Filter out irrelevant changes using a linear combination of the standard scores (z-scores) of (#[anchored text] / #[whole text]) and (log #[whole text]), computed per page ("#[X]" means "the length of X").
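
The three steps above can be sketched roughly as follows. This is an illustration only, not the actual wwwChecker code: the names (ParagraphFlattener, flatten, diff_paragraphs, relevance), the set of block-level tags, the weights w_link and w_len and their signs, and the choice to standardize over the inserted paragraphs of one page are all assumptions made for the example.

```python
import difflib
import math
from html.parser import HTMLParser
from statistics import mean, pstdev

# Assumption: these tags are treated as paragraph boundaries.
BLOCK_TAGS = {"p", "div", "li", "tr", "td", "th", "br", "pre",
              "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphFlattener(HTMLParser):
    """Step 1: flatten HTML into (text, anchored_length) paragraphs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = []
        self._anchored = 0   # length of text seen inside <a> elements
        self._in_anchor = 0

    def flush(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append((text, self._anchored))
        self._buf, self._anchored = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_anchor += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_anchor:
            self._in_anchor -= 1

    def handle_data(self, data):
        self._buf.append(data)
        if self._in_anchor:
            self._anchored += len(" ".join(data.split()))


def flatten(html):
    parser = ParagraphFlattener()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.paragraphs


def diff_paragraphs(old, new):
    """Step 2: return (deleted, inserted) paragraphs between two versions."""
    matcher = difflib.SequenceMatcher(
        a=[t for t, _ in old], b=[t for t, _ in new], autojunk=False)
    deleted, inserted = [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            deleted.extend(old[i1:i2])
        if op in ("insert", "replace"):
            inserted.extend(new[j1:j2])
    return deleted, inserted


def zscores(values):
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def relevance(paragraphs, w_link=-1.0, w_len=1.0):
    """Step 3: linear combination of z-scored link ratio and log length.

    The weights are hypothetical: link-heavy text is penalized and
    longer text is rewarded.
    """
    if not paragraphs:
        return []
    ratios = [anchored / len(text) for text, anchored in paragraphs]
    logs = [math.log(len(text)) for text, _ in paragraphs]
    return [w_link * zr + w_len * zl
            for zr, zl in zip(zscores(ratios), zscores(logs))]
```

With weights like these, a paragraph that consists mostly of link text (a navigation bar, for instance) scores lower than a long paragraph of plain text, so a threshold on the score could be used to drop it.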

Usage

  1. Write the URIs to check, one per line, in the ~/.www-list file.
  2. Run python /path/to/wwwChecker
  3. A Web browser starts automatically when the check finishes. If it does not, open ~/.www-check.html manually.
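
For illustration, reading the list file and opening the report could look like the sketch below. It is not taken from wwwChecker itself: the function names read_uri_list and open_report are hypothetical, and skipping blank lines is an assumption.

```python
import webbrowser
from pathlib import Path


def read_uri_list(path):
    """Return the URIs in the list file, one per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def open_report(path):
    """Open the generated HTML report in the default Web browser."""
    return webbrowser.open(Path(path).resolve().as_uri())


if __name__ == "__main__":
    home = Path.home()
    for uri in read_uri_list(home / ".www-list"):
        print(uri)
    open_report(home / ".www-check.html")
```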


Yasuhiro Fujii <y-fujii at mimosa-pudica.net>