GitHub Scraper (beta, active development paused)

GitHub Scraper automatically builds issue and PR tables in Google Sheets documents and periodically updates them, making task management and status sharing between teammates easier.

  • Multirepo: track several repositories within one sheet and several sheets within one spreadsheet
  • Constructible: tweak table structure, coloring and filling functions
  • Adaptive: change your preferences and add new repos/sheets without restarting Scraper
  • Ready to go: skip tweaking Scraper and use the completely workable examples as-is


Setup
To build your tables and start tracking repositories, set your preferences in config.py (and, if needed, the filling functions in fill_funcs.py, both described below), then launch Scraper.

Scraper will build the tables and start tracking the specified repositories. The first fill can take some time, but subsequent updates are roughly 80% faster, as Scraper processes only recently updated PRs and issues. You can check filling progress in logs.txt; if any error occurs, its traceback will be shown in logs.txt as well.

Structure, auto and manual filling
You can tweak table filling in fill_funcs.py in any way you like: leave some columns for manual-only use (for example, "Comment"), set ignoring and cleanup rules, sorting, coloring, etc.

Scraper uses config.py as its source of preferences. Before every update it reloads the config.py module, so you can change preferences without stopping Scraper: add new sheets, repositories, rules, etc.

PR autodetection
To let Scraper detect PRs, use the keywords "Towards", "Closes", and "Fixes" to link a PR's body to the original issue. Scraper will use these links to fill the "Public PR" field of the related issues.

Archive
By default, Scraper moves issues with Done status to the Archive sheet and stops tracking them, which keeps sheets from being overwhelmed with inactive data. The behavior can be changed via the ARCHIVE_SHEET configuration and the to_be_archived filling function.

Credentials
On the first launch you'll have to authenticate with the Google Sheets API (a pop-up window will appear) and with GitHub (via the console). On subsequent launches Scraper reuses the saved credentials without asking for them again.

Beta version disclaimer
Scraper is still under active development. Please use Releases as the most stable versions, and feel free to open an issue in case of any problems.
