karlicoss
65c617ed94
my.emfit: add missing properties to fake data generator
2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b
my.jawbone: get rid of matplotlib import on top level
2023-11-10 02:10:30 +00:00
karlicoss
e547acfa59
general: update minimal cachew version
...
had quite a few useful fixes/performance optimizations since
2023-11-07 21:24:56 +00:00
karlicoss
33f8d867e2
my.browser.export: cleanup
...
- make logging INFO (default) -- otherwise it's too quiet when processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d
my.hackernews.harmonic: use orjson + add __hash__ for Saved object
...
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101
my.bumble.android: fix message deduplication
2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1
my.pinboard: some cleanup
2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4
monzo.monzoexport: initial module
2023-11-02 00:47:13 +00:00
karlicoss
105928238f
vk_messages_backup: some cleanup + switch to get_files
2023-11-02 00:43:10 +00:00
Dima Gerasimov
24da04f142
ci: fix wrong release command
2023-11-01 01:54:16 +00:00
karlicoss
71cb66df5f
core: add helper for more_itertools to check that all types involved are hashable
...
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
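
A minimal sketch of the idea, not the actual HPI helper. `unique_everseen` from more_itertools falls back to a list-based membership check when an item is unhashable, which is where the quadratic behaviour comes from. The names `check_all_hashable` and `maybe_checked`, and the way the flag is read, are assumptions for illustration only.

```python
import os
from typing import Any, Iterable, Iterator


def check_all_hashable(items: Iterable[Any]) -> Iterator[Any]:
    """Yield items unchanged, raising as soon as an unhashable one is seen."""
    for item in items:
        try:
            hash(item)
        except TypeError as e:
            raise TypeError(f"unhashable item {item!r}: deduplication would be slow") from e
        yield item


def maybe_checked(items: Iterable[Any]) -> Iterable[Any]:
    # Assumption: the check is opt-in via the HPI_CHECK_UNIQUE_EVERSEEN env variable.
    if 'HPI_CHECK_UNIQUE_EVERSEEN' in os.environ:
        return check_all_hashable(items)
    return items
```

A module could then wrap its iterator with `maybe_checked(...)` before passing it to the deduplication step, so the check costs nothing unless the flag is set.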
2023-10-31 01:02:17 +00:00
Dima Gerasimov
d6786084ca
general: deprecate some old methods by hiding behind TYPE_CHECKING
2023-10-30 22:51:31 +00:00
karlicoss
79ce8e84ec
fbmessenger.android: support processing msys database
...
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b
general: enhance logging for various modules
2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17
general: improve logging during file processing in various modules
2023-10-29 01:01:30 +01:00
karlicoss
bd27bd4c24
docs: add documentation on logging during HPI module development
2023-10-29 00:50:22 +01:00
karlicoss
f668208bce
my.stackexchange.stexport: small cleanup & stat improvements
2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe
core/config: implement a warning if config is imported from a dir other than MY_CONFIG
...
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75
my.kobo: add highlights method to return Highlight objects iteratively
...
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933
my.hypothesis: expose data as iterators instead of lists
...
also add an adapter to support migrating in a backwards compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71
core: split compat/legacy modules into hpi_compat and compat
2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125
core/stats: exclude contextmanagers from guess_stats
2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de
my.emfit: cleanup and pass cpu pool
2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec
doctor: make compileall check a bit more defensive
2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c
my.hackernews.dogsheep: use utc datetime + minor cleanup
2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f
my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup
2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d
core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
...
otherwise fails at os.get_terminal_size
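
A rough sketch of the kind of guard this implies, not the actual core.logging code. The function name `terminal_width_or_none` is hypothetical. A handler that collapses log lines would only be installed when it returns a width.

```python
import os
import sys
from typing import Optional


def terminal_width_or_none() -> Optional[int]:
    """Return the terminal width, or None when stderr is not attached to a terminal."""
    if not sys.stderr.isatty():
        # e.g. output is piped or redirected to a file
        return None
    try:
        return os.get_terminal_size(sys.stderr.fileno()).columns
    except OSError:
        # os.get_terminal_size raises OSError if the size cannot be queried
        return None
```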
2023-10-25 02:42:52 +01:00
Dima Gerasimov
1f61e853c9
reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
...
Enabled by the env variable, specifying how many cores to dedicate, e.g.
HPI_CPU_POOL=4 hpi query ...
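
One way such an optional pool could be wired up, as a sketch only. The helper names `cpu_pool_size` and `get_cpu_pool` are assumptions, not necessarily what HPI uses. Only the `HPI_CPU_POOL` variable comes from the commit message.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def cpu_pool_size() -> Optional[int]:
    """Number of workers requested via HPI_CPU_POOL, or None if unset."""
    raw = os.environ.get('HPI_CPU_POOL')
    if raw is None or raw.strip() == '':
        return None
    return int(raw)


def get_cpu_pool() -> Optional[ProcessPoolExecutor]:
    # Hypothetical helper: callers fall back to serial processing on None.
    size = cpu_pool_size()
    if size is None or size <= 0:
        return None
    return ProcessPoolExecutor(max_workers=size)
```

Returning `None` when the variable is unset keeps the default behaviour unchanged, which fits the "experiment" framing of the commit.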
2023-10-25 02:06:45 +01:00
Dima Gerasimov
a5c04e789a
twitter.archive: deduplicate results via json.dumps
...
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs
interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
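
A self-contained sketch of deduplicating by a serialized key, not the actual twitter.archive code. Raw dicts are unhashable, so a set-based "seen" check cannot use them directly, whereas a JSON string can be hashed. The helper names `unique_by` and `json_key`, the sample data, and the use of `sort_keys=True` are assumptions for illustration.

```python
import json
from typing import Any, Callable, Hashable, Iterable, Iterator


def unique_by(items: Iterable[Any], key: Callable[[Any], Hashable]) -> Iterator[Any]:
    """Yield items whose key has not been seen before, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


def json_key(obj: Any) -> str:
    # sort_keys makes dicts with the same content but different key order produce the same string
    return json.dumps(obj, sort_keys=True)


tweets = [
    {'id': '1', 'text': 'hello'},
    {'text': 'hello', 'id': '1'},  # same content, different key order
    {'id': '2', 'text': 'world'},
]
deduplicated = list(unique_by(tweets, key=json_key))
```

With more_itertools installed, `unique_everseen(tweets, key=json_key)` would play the same role as `unique_by` here.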
2023-10-24 01:54:30 +01:00
Dima Gerasimov
0e94e0a9ea
whatsapp.android: handle most message types properly
2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5
my.whatsapp.android: exclude some dummy messages, minor cleanup
2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f
tinder.android: infer user's own name automatically
2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06
my.instagram.gdpr: process all historic archives + better normalising
2023-10-23 18:42:50 +01:00
Dima Gerasimov
f9a1050ceb
my.instagram.android: more defensive error handling
2023-10-23 18:42:50 +01:00
karlicoss
86ea605aec
core/stats: enable processing input files, report first and last filename
...
can be useful for quick investigation/testing setup
2023-10-22 00:47:36 +01:00
karlicoss
c335c0c9d8
core/stats: report datetime of first item in addition to last
...
quite useful for quickly determining time span of a data source
2023-10-22 00:47:36 +01:00
karlicoss
a60d69fb30
core/stats: get rid of duplicated keys for 'auto stats'
...
previously:
```
{'iter_data': {'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}}
```
after:
```
{'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}
```
2023-10-22 00:47:36 +01:00
karlicoss
c5fe2e9412
core.stats: fix is_data_provider when from __future__ import annotations is used
2023-10-21 23:46:40 +01:00
karlicoss
872053a3c3
my.hackernews.harmonic: fix issue with crashing due to html escaping
...
also add proper logging
2023-10-21 23:46:40 +01:00
karlicoss
37bb33cdbc
experimental: add a hacky helper to import "original/shadowed" modules from within overlays
2023-10-21 22:46:16 +01:00
karlicoss
8c2d1c9463
general: use less explicit kompress boilerplate in modules
...
now get_files/kompress library can handle it transparently
2023-10-20 21:13:59 +01:00
karlicoss
c63e80ce94
core: more consistent handling of zip archives in get_files + tests
2023-10-20 21:13:59 +01:00
Dima Gerasimov
9ffce1b696
reddit.rexport: add accessors for subreddits, multireddits and profile
2023-10-19 02:26:28 +01:00
Dima Gerasimov
29832a9f75
core: fix test_get_files after updating kompress
2023-10-19 02:26:28 +01:00
Dima Gerasimov
28d2450a21
reddit.rexport: some cleanup, move get_events stuff into personal overlay
2023-10-19 02:26:28 +01:00
karlicoss
fe26efaea8
core/kompress: move vendorized to _deprecated, use kompress library directly
2023-10-12 23:47:05 +01:00
karlicoss
bb478f369d
core/logging: no need for super call in Filter
2023-10-12 23:47:05 +01:00
karlicoss
68289c1be3
general: fix ignores after mypy version update
2023-10-12 23:47:05 +01:00
Dima Gerasimov
0512488241
ci: sync configs to pymplate
...
- add python3.12
- add ruff
2023-10-06 02:24:01 +01:00
Dima Gerasimov
fabcbab751
fix mypy errors after version update
2023-10-02 01:27:49 +01:00