Project author: erikfox

Project description:
A comprehensive module used to calculate the high bound, low bound, and center of a Wilson score interval.
Language: JavaScript
Repository: git://github.com/erikfox/wilson-interval.git
Created: 2015-01-18T05:36:56Z
Project community: https://github.com/erikfox/wilson-interval

License: MIT License



Wilson Interval

Twitter @erkfox


Pull Requests Welcome

A comprehensive module used to calculate the high bound, low bound, and center of a Wilson score interval. Features support for known populations (i.e. Singleton’s adjustment).

Popularized by Reddit’s Comment/Best Sort and similar voting algorithms.

Install

  npm install wilson-interval

Include

  import wilson from 'wilson-interval';

Usage

wilson(observed, sample[, population][, options]);

  • observed - Number of observed positive outcomes.

  • sample - Size of sample.

Optional arguments:

  • population - Default false. Total population from which sample was taken (to use Singleton’s adjustment[1]).

  • options - Default {}. Options object. Available parameters:

    • confidence - Default 0.95. Desired confidence level of interval.
    • precision - Default 20. Number of significant figures to use in calculations and output.
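To illustrate what the confidence option controls, here is a minimal sketch of how a confidence level can be turned into the two-sided critical value z that a Wilson interval depends on. The names erfApprox, normalCdf, and zForConfidence are hypothetical helpers written for this illustration, and the numerical method (an Abramowitz and Stegun erf approximation plus bisection) is an assumption; how the module itself performs this conversion is not described here.

```javascript
// Hypothetical helpers for illustration only; not part of wilson-interval.

// Abramowitz & Stegun formula 7.1.26 (absolute error around 1.5e-7).
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erfApprox(z / Math.SQRT2));
}

// Two-sided critical value: the z with P(-z <= Z <= z) = confidence.
function zForConfidence(confidence) {
  if (!(confidence > 0 && confidence < 1)) {
    throw new RangeError('confidence must be strictly between 0 and 1');
  }
  const target = 1 - (1 - confidence) / 2;
  let lo = 0;
  let hi = 10;
  // Bisection: normalCdf is increasing, so halve the bracket each step.
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < target) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

console.log(zForConfidence(0.95)); // about 1.96
console.log(zForConfidence(0.99)); // about 2.576
```

A higher confidence level gives a larger z and therefore a wider interval.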

Example

  return wilson(5, 100);

returns

  {
    "center": "0.066647073981204927863",
    "high": "0.11175046869375655694",
    "low": "0.021543679268653298792"
  }
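As a cross-check on the numbers above, the following sketch evaluates the standard Wilson score interval formula directly with plain floating-point arithmetic. The function name wilsonSketch is hypothetical, and z is hard-coded to the usual 95% critical value; this is not the module's implementation, which returns high-precision strings rather than numbers.

```javascript
// Hypothetical helper for illustration only; not part of wilson-interval.
// Standard Wilson score interval for a proportion p = observed / sample.
function wilsonSketch(observed, sample, z = 1.959964) {
  const p = observed / sample;
  const z2 = z * z;
  const denom = 1 + z2 / sample;
  // Center is the observed proportion pulled toward 0.5.
  const center = (p + z2 / (2 * sample)) / denom;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / sample + z2 / (4 * sample * sample))) / denom;
  return { low: center - halfWidth, center, high: center + halfWidth };
}

console.log(wilsonSketch(5, 100));
// roughly { low: 0.02154, center: 0.06665, high: 0.11175 }
```

Note that the center (about 0.0666) is not the raw proportion 0.05: the Wilson interval is asymmetric around the observed ratio, which is part of why it behaves well for small samples.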

Use cases

Low bound sorting

Most often, the low bound of the interval is used as the sorting key (e.g. Reddit’s Comment/Best Sort). This weights confidence more heavily than raw score.

Even if a ranked item has 100% positive responses, it won’t reach the top until enough data has been gathered for the algorithm to be confident that the ratio reflects what the item really deserves.
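The ranking behaviour described above can be sketched as follows. The helper wilsonLowBound and the sample items are hypothetical and use plain floating-point arithmetic at a fixed 95% level; in real use the low value returned by the module would take the helper's place (the module returns it as a string, so it would need converting to a number before comparing).

```javascript
// Hypothetical helper for illustration only; not part of wilson-interval.
function wilsonLowBound(positive, total, z = 1.959964) {
  if (total === 0) return 0; // no votes yet: rank at the bottom
  const p = positive / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denom;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denom;
  return center - halfWidth;
}

// Made-up vote counts.
const items = [
  { id: 'new', positive: 1, total: 1 },           // 100% positive, one vote
  { id: 'established', positive: 90, total: 100 }, // 90% positive
  { id: 'mixed', positive: 40, total: 100 },       // 40% positive
];

// Sort descending by low bound, leaving the original array untouched.
const ranked = [...items].sort(
  (a, b) => wilsonLowBound(b.positive, b.total) - wilsonLowBound(a.positive, a.total)
);

console.log(ranked.map((item) => item.id));
// [ 'established', 'mixed', 'new' ]
```

The single-vote item has a perfect ratio but a low bound of only about 0.21, so it ranks below even the 40% item (low bound about 0.31) until more votes arrive.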

Singleton’s adjustment

Uses a known, finite population size to inform the degree of uncertainty of the prediction.

[Figure: Singleton’s adjustment]

Descriptive statistics summarises the sample as if it were the entire population (left), whereas inferential statistics assumes the sample is a tiny subset of the population (right). If the sample is a large part of the population the confidence interval on observations is reduced (middle).[1]

USE WHEN:

  1. Your sample size represents a significant portion of the population.
  2. You have an imperfect original sample, from which you can only verify a subsample. The original sample can serve as a “population” to produce a verification interval to be combined with the first.[2]
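The effect of a known population size can be sketched as below. This rests on two assumptions that are not confirmed by this README: that the correction factor is the textbook finite population correction ν = sqrt((N − n) / (N − 1)), and that it is applied to the Wilson interval by replacing the sample size n with n / ν². The function names are hypothetical, and the module's exact formula may differ.

```javascript
// Hypothetical helpers for illustration only; not part of wilson-interval.
function wilsonBounds(p, n, z) {
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const halfWidth = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return { low: center - halfWidth, center, high: center + halfWidth };
}

function wilsonFiniteSketch(observed, sample, population, z = 1.959964) {
  const p = observed / sample;
  // No population given: ordinary (infinite-population) interval.
  if (!population) return wilsonBounds(p, sample, z);
  // Sample covers the whole population: nothing left to infer.
  if (sample >= population) return { low: p, center: p, high: p };
  // ASSUMPTION: textbook finite population correction, squared.
  const nu2 = (population - sample) / (population - 1);
  // ASSUMPTION: applied by inflating the effective sample size.
  return wilsonBounds(p, sample / nu2, z);
}

const unadjusted = wilsonFiniteSketch(5, 100);
const adjusted = wilsonFiniteSketch(5, 100, 200);
console.log(unadjusted.high - unadjusted.low); // wider
console.log(adjusted.high - adjusted.low);     // narrower: half the population was sampled
```

Under these assumptions the interval shrinks as the sample approaches the population size and collapses to the observed proportion when the two are equal, which matches the left, middle, and right cases described in the figure text.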

Sources

[1] Wallis, Sean 2012. Inferential Statistics — and other animals. London: Survey of English Usage, UCL.

[2] Wallis, Sean 2014. Coping with imperfect data. London: Survey of English Usage, UCL.


Special thanks to Sean Wallis—Senior Research Fellow, Survey of English Usage—for his aid in transcribing equations, and for his blog posts which inspired many of the features of this module.