Project author: niklas88

Project description:
A bare-bones, text-only local Wikipedia server that works directly on the .bz2 compressed dumps
Language: Go
Repository: git://github.com/niklas88/tinypedia.git
Created: 2018-02-21T19:11:47Z
Project community: https://github.com/niklas88/tinypedia

License: Apache License 2.0


tinypedia

A very bare-bones, text-only local Wikipedia server. It works directly on the
.bz2 compressed dumps without creating any additional files.

Current State

Retrieving articles by title using the path #<URL-encoded-article-name> works,
and the text can be viewed as extracted by wtf_wikipedia.js plus some formatting
for sections. Sadly, this fails to extract the text from special markup such as
IPA pronunciations. The raw MediaWiki markup can also be retrieved using the
path /wiki/<URL-encoded-article-name>.

Since we currently use URL encoding directly, article paths are not compatible
with the title encoding used by Wikipedia (e.g. Ada%20Lovelace instead of
Ada_Lovelace).
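
To make the difference concrete, here is a minimal Go sketch of the two encodings. It is not taken from tinypedia's source: plainEscape and wikipediaStyle are hypothetical helper names, and using url.PathEscape is an assumption about how such a path could be produced.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// plainEscape (hypothetical helper) percent-encodes the title as-is,
// which yields the form described above: spaces become %20.
func plainEscape(title string) string {
	return url.PathEscape(title)
}

// wikipediaStyle (hypothetical helper) first replaces spaces with
// underscores, as Wikipedia URLs do, and then percent-encodes the rest.
func wikipediaStyle(title string) string {
	return url.PathEscape(strings.ReplaceAll(title, " ", "_"))
}

func main() {
	fmt.Println(plainEscape("Ada Lovelace"))    // Ada%20Lovelace
	fmt.Println(wikipediaStyle("Ada Lovelace")) // Ada_Lovelace
}
```

A link copied from Wikipedia therefore needs its underscores turned back into spaces before it matches the path form described here.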

Building and Installing

First make sure you have Go and the go command installed and that
$GOPATH/bin is in your path. Then install with a simple go get:

  go get github.com/ad-freiburg/tinypedia

Running

Change to the directory containing the dump files:

  • enwiki-latest-pages-articles-multistream-index.txt.bz2
  • enwiki-latest-pages-articles-multistream.xml.bz2

And simply run the tinypedia executable:

  tinypedia

If you have named the files differently, use the -i and -d command line
switches to point tinypedia to the index and data files respectively.
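
For background on how the two files relate: in Wikipedia's multistream dumps, each line of the index file has the form offset:page-id:title, where offset is the byte position in the data file of the bz2 stream that contains the page. The following Go sketch parses one such line. It is an illustration only and not tinypedia's actual code: indexEntry and parseIndexLine are hypothetical names, and the sample line is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexEntry (hypothetical type) holds one line of a multistream index.
type indexEntry struct {
	Offset int64  // byte offset of the bz2 stream in the data file
	PageID int64  // numeric page ID
	Title  string // article title, which may itself contain colons
}

// parseIndexLine (hypothetical helper) splits "offset:page-id:title".
// SplitN with a limit of 3 keeps colons inside the title intact.
func parseIndexLine(line string) (indexEntry, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return indexEntry{}, fmt.Errorf("malformed index line: %q", line)
	}
	offset, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return indexEntry{}, fmt.Errorf("bad offset in %q: %w", line, err)
	}
	pageID, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return indexEntry{}, fmt.Errorf("bad page ID in %q: %w", line, err)
	}
	return indexEntry{Offset: offset, PageID: pageID, Title: parts[2]}, nil
}

func main() {
	// Made-up sample line; real offsets and IDs come from the index dump.
	entry, err := parseIndexLine("1024:42:Wikipedia:About")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", entry)
}
```

With an offset in hand, a reader can seek to that position in the data file and decompress only the one bz2 stream, which is what makes it possible to serve articles without unpacking the whole dump.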