项目作者: ivanovishado

项目描述 :
Article scraper for Mexican news websites. My terminal project at Universidad de Guadalajara - CUCEI 2018.
高级语言: Python
项目地址: git://github.com/ivanovishado/NewsScraper.git
创建时间: 2018-04-16T18:02:57Z
项目社区:https://github.com/ivanovishado/NewsScraper

开源协议:

下载


NewsScraper - Scrape any newspaper automatically

This is a simple python script for automatically scraping the most recent articles from any news-site.
Just add the websites you want to scrape to NewsPapers.json and the script will go through
and scrape each site listed in the file.

For more info read comments in NewsScraper.py.

Usage

To run this project, you must create a virtual environment,
then install the packages in requirements.txt with:

  1. pip install -r requirements.txt

Open a terminal, set the environment variable ‘FLASK_APP’ to test.py

  1. set FLASK_APP=test.py

In Linux:

  1. export FLASK_APP=test.py

Then execute:

  1. flask run

The app should be running on localhost on port 5000

You also need to have a MongoDB server running in order to save the scraped articles.

Libraries

This script uses the following libraries:

https://github.com/codelucas/newspaper

https://github.com/kurtmckee/feedparser