Commit graph

9 commits

Author SHA1 Message Date
Dima Gerasimov
049820c827 my.github.gdpr: support uncompressed .tar.gz files
related to https://github.com/karlicoss/HPI/issues/20
2022-05-31 22:16:05 +01:00
Dima Gerasimov
1b4ca6ad1b github.gdpr: prepare for using .tar.gz 2022-05-31 22:16:05 +01:00
Maxim Efremov
80c5be7293 Adding bots file type to reduce parsing issues 2022-05-02 08:53:46 +01:00
Dima Gerasimov
5ef2775265 my.github: some work in progress on generating consistent ids
Sadly, there seem to be several issues:

- GDPR has less detailed data, so it's hard to generate a proper ID at times
- sometimes there is a small (1s?) discrepancy in created_at between the same event in GDPR and the API
- some API events can have a duplicate payload but a different id, which violates uniqueness
2021-04-02 20:09:53 +01:00
Sean Breckenridge
5ecd4b4810 cleanup; remove unused imports 2021-04-02 08:38:06 +01:00
Sean Breckenridge
02a9fb5e8f github.gdpr: parse project files
also fixed a typo in commit_comments
2021-03-15 12:40:22 +00:00
Dima Gerasimov
1cc4eb5d8d core: add helper for computing stats; use it in modules 2020-06-04 22:19:34 +01:00
Dima Gerasimov
a267aeec5b github: add config templates + docs
- ghexport: use export_path (export_dir is still supported)
2020-06-01 23:33:34 +01:00
Dima Gerasimov
ca39187c63 github: DEPRECATE my.coding.github
Instead, my.github.all should be used (still backward compatible)

The reasons are:
a) I don't feel that grouping (i.e. my.coding.*) makes much sense
b) using the .all pattern (same way as twitter) allows for more composable and cleaner separation of GDPR and API data
2020-06-01 22:49:31 +01:00
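
To illustrate the ".all pattern" mentioned in the last commit above, here is a minimal sketch of how two independent sources might be combined behind a single entry point. Everything in it is hypothetical: the Event shape, gdpr_events, api_events, merge_events and all_events are names made up for this example and are not taken from the actual my.github modules. It also assumes both sources already agree on event ids, which the work-in-progress commit above suggests is not always the case in practice.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event shape, not the project's actual data model.
    eid: str
    dt: datetime
    summary: str


def gdpr_events() -> Iterator[Event]:
    # Stand-in for events parsed from a GDPR export.
    yield Event("issue_1", datetime(2020, 1, 1, 12, 0, 0), "opened issue 1")
    yield Event("issue_2", datetime(2020, 1, 2, 9, 30, 0), "opened issue 2")


def api_events() -> Iterator[Event]:
    # Stand-in for events fetched through the API; overlaps with the GDPR data.
    yield Event("issue_2", datetime(2020, 1, 2, 9, 30, 0), "opened issue 2")
    yield Event("issue_3", datetime(2020, 1, 3, 18, 45, 0), "opened issue 3")


def merge_events(*sources: Iterable[Event]) -> Iterator[Event]:
    # Chain all sources in order and keep the first event seen for each id.
    seen = set()
    for event in chain(*sources):
        if event.eid in seen:
            continue
        seen.add(event.eid)
        yield event


def all_events() -> Iterator[Event]:
    # ".all"-style entry point: callers do not need to know which source
    # supplied a given event.
    yield from merge_events(gdpr_events(), api_events())
```

Calling list(all_events()) on this sample data yields three events, with the overlapping issue_2 kept only once. Each source stays usable on its own, which is the kind of separation the commit message describes.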