Commit graph

36 commits

Author SHA1 Message Date
Dima Gerasimov
6a18f47c37 my.github.gdpr/my.zulip.organization: use kompress support for tar.gz if it's available;
otherwise fall back to unpacking into a tmp dir via my.core.structure
2024-09-18 23:35:03 +01:00
Dima Gerasimov
d0df8e8f2d ruff: enable PLR rules and fix bug in my.github.gdpr._is_bot 2024-08-28 04:06:32 +01:00
Dima Gerasimov
d244c7cc4e ruff: enable and fix C4 ruleset 2024-08-28 04:06:32 +01:00
Dima Gerasimov
c64d7f5b67 core: cleanup itertool style helpers
- deprecate group_by_key, should use itertool.bucket instead
- move make_dict and ensure_unique to my.core.utils.itertools
2024-08-16 10:22:29 +01:00
Dima Gerasimov
973c4205df core: cleanup deprecations, exclude from type checking and show runtime warnings
among affected things:

- core.common.assert_never
- core.common.cproperty
- core.common.isoparse
- core.common.mcachew
- core.common.the
- core.common.tzdatetime
- core.compat.sqlite_backup
2024-08-16 10:22:29 +01:00
karlicoss
f28f68b14b general: enhance logging for various modules 2023-10-29 22:32:07 +00:00
Dima Gerasimov
4f7c9b4a71 core: move split compat/legacy modules into hpi_compat and compat 2023-10-28 20:06:54 +01:00
Dima Gerasimov
642e3b14d5 my.github.gdpr: some minor enhancements
- better error context
- handle some unknown files
- handle user=None in some cases
- cleanup imports
2023-08-24 23:46:23 +01:00
Dima Gerasimov
dff31455f1 general: switch to make_logger in a few modules, use a bit more consistent logging, rely on default INFO level 2023-06-21 18:42:15 +01:00
Dima Gerasimov
c12224af74 misc: replace uses of pytz.utc with timezone.utc where it makes sense 2023-06-09 03:31:13 +01:00
Dima Gerasimov
5fe21240b4 core: move mcachew into my.core.cachew; use better typing annotations (copied from cachew) 2023-06-08 01:29:49 +01:00
Dima Gerasimov
c34656e8fb general: update mypy config, seems that lots of type: ignore comments aren't necessary anymore 2023-05-16 01:18:30 +01:00
Kian-Meng Ang
d2ef23fcb4 docs: fix typos
found via `codespell -L copie,datas,pres,fo,tooks,noo,ue,ket,frop`
2023-03-27 03:02:35 +01:00
Dima Gerasimov
049820c827 my.github.gdpr: support uncompressed .tar.gz files
related to https://github.com/karlicoss/HPI/issues/20
2022-05-31 22:16:05 +01:00
Dima Gerasimov
1b4ca6ad1b github.gdpr: prepare for using .tar.gz 2022-05-31 22:16:05 +01:00
Maxim Efremov
80c5be7293 Adding bots file type to reduce parsing issues 2022-05-02 08:53:46 +01:00
Dima Gerasimov
5ef2775265 my.github: some work in progress on generating consistent ids
sadly it seems that there are several issues:

- gdpr has less detailed data so it's hard to generate a proper ID at times
- sometimes there is a small (1s?) discrepancy in created_at for the same event between GDPR and API
- some API events can have duplicate payloads but different ids, which violates uniqueness
2021-04-02 20:09:53 +01:00
Dima Gerasimov
386234970b my.github.ghexport: handle more event types, more consistent body handling 2021-04-02 20:09:53 +01:00
Sean Breckenridge
5ecd4b4810 cleanup; remove unused imports 2021-04-02 08:38:06 +01:00
Sean Breckenridge
02a9fb5e8f github.gdpr: parse project files
also fixed a typo in commit_comments
2021-03-15 12:40:22 +00:00
Dima Gerasimov
3e821ca7fd my.github.ghexport: get rid of custom cache_dir 2021-02-21 19:51:58 +00:00
Dima Gerasimov
571cb48aea core: add modules_ast for more robust module collection 2020-12-11 07:02:16 +01:00
Dima Gerasimov
cc127f1876 kython.klogging
- move to core
- add a proper description why it's useful
- make default level INFO
- use HPI_LOGS variable for easier log level control (abdc6df1ea)
2020-10-29 03:13:18 +01:00
Dima Gerasimov
e8e4994c02 google.takeout.paths: return Optional if there are no takeouts 2020-10-12 21:48:04 +02:00
Dima Gerasimov
0682919449 general: use module dependencies as proper PIP packages + fallback 2020-09-30 23:33:06 +02:00
Dima Gerasimov
abbaa47aaf core.warnings: handle stacklevel properly
add more warnings about deprecated config arguments
2020-09-29 19:44:45 +02:00
Dima Gerasimov
109edd9da3 general: add compat module and helper for easy backwards compatibility for pre-PIP dependencies
my.hypothesis: use hypexport as a proper PIP package + fallback
2020-09-29 19:44:45 +02:00
Sean Breckenridge
78489157a1 fix spelling mistakes 2020-09-06 20:44:28 +01:00
Dima Gerasimov
092aef88ce core: detect compression, wrap in CPath if necessary 2020-07-26 21:31:26 +01:00
Dima Gerasimov
1cc4eb5d8d core: add helper for computing stats; use it in modules 2020-06-04 22:19:34 +01:00
Dima Gerasimov
a267aeec5b github: add config templates + docs
- ghexport: use export_path (export_dir is still supported)
2020-06-01 23:33:34 +01:00
Dima Gerasimov
ca39187c63 github: DEPRECATE my.coding.github
Instead my.github.all should be used (still backward compatible)

The reasons are
a) I don't feel that grouping (i.e. my.coding.*) makes much sense
b) using .all pattern (same way as twitter) allows for more composable and cleaner separation of GDPR and API data
2020-06-01 22:49:31 +01:00
Dima Gerasimov
d7aff1be3f github: start moving to a proper arbitrated module 2020-06-01 22:49:31 +01:00
Dima Gerasimov
18638d60dd move github to my.coding 2019-09-20 08:04:07 +01:00
Dima Gerasimov
b9a06d7a7d adjust github provider to use exporter model 2019-09-19 23:53:31 +01:00
Dima Gerasimov
a99661892a import github 2019-09-19 23:19:27 +01:00