Project author: attwad

Project description:
Scraper for College de France audio pages
Language: Python
Repository: git://github.com/attwad/cdf-scraper.git
Created: 2017-07-19T05:07:04Z
Project community: https://github.com/attwad/cdf-scraper

License: MIT License

College de France audio scraper

This is a scraper for pages containing audio material from the College de France website.

Purpose

It doesn’t download any audio files; instead it stores metadata about them (lesson title, lecturer, date, etc.)
in Google’s Datastore so that statistics and further data extraction can be run on it.
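
As a rough illustration of the kind of record described above, the sketch below models one lesson's metadata and flattens it into a plain dictionary of properties. It is not taken from the project: the class name, the field names, and the `audio_url` field are all hypothetical, and the actual write to Datastore is left out.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class LessonMetadata:
    """Hypothetical shape of one scraped lesson; field names are illustrative."""

    title: str
    lecturer: str
    lesson_date: date
    audio_url: str  # assumed field: a link to the audio, which is never downloaded


def to_entity_properties(lesson: LessonMetadata) -> dict:
    """Flatten a lesson into a plain dict of properties, with the date as ISO text."""
    props = asdict(lesson)
    props["lesson_date"] = lesson.lesson_date.isoformat()
    return props
```

A dictionary like this could then be copied onto a Datastore entity by whichever client library the scraper uses.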

How to run

For devs, follow the gist.

For an actual production run, a Dockerfile is provided; you can build and run it with:

  1. docker build -t scraper .
  2. docker run scraper

You’ll have to create your own project in Google Compute Engine and pass in your project ID and valid
JSON service account credentials via environment variables.
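
The exact variable names are not stated here, so the following is only a sketch of how a startup check for that configuration might look. It assumes the names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_APPLICATION_CREDENTIALS`, which Google's client libraries conventionally read; the scraper itself may expect different ones. `ScraperConfig` and `load_config` are hypothetical helpers.

```python
import os
from typing import Mapping, NamedTuple


class ScraperConfig(NamedTuple):
    project_id: str
    credentials_path: str  # path to the JSON service account key file


def load_config(env: Mapping[str, str] = os.environ) -> ScraperConfig:
    """Read the project ID and credentials path, failing early if either is missing."""
    # Assumed variable names; the project's actual names may differ.
    required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ScraperConfig(
        project_id=env["GOOGLE_CLOUD_PROJECT"],
        credentials_path=env["GOOGLE_APPLICATION_CREDENTIALS"],
    )
```

With Docker, variables like these are typically supplied through `docker run -e NAME=value`, with the key file mounted into the container.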

Output

Once you run it, you’ll see a few thousand entities in your dashboard.