1103 commits

Author SHA1 Message Date
Dima Gerasimov
adbc0e73a2 docs: add note about directly checking overlays with mypy 2023-12-22 02:54:36 +00:00
Dima Gerasimov
84d835962d docs: some documentation/thoughts on properly implementing overlay packages 2023-12-20 02:51:27 +00:00
Sean Breckenridge
224ba521e3 gpslogger: catch broken xml file error 2023-12-20 02:41:52 +00:00
Dima Gerasimov
a843407e40 core/compat: move fromisoformat to .core.compat module 2023-11-19 23:45:08 +00:00
karlicoss
09e0f66892 tox: disable --parallel flag in hpi module install
It's been so flaky it ends up taking more time to merge stuff. See https://github.com/karlicoss/HPI/issues/306
2023-11-19 19:18:19 +00:00
Dima Gerasimov
bde43d6a7a my.body.sleep: massive speedup for average temperature calculation 2023-11-11 00:42:49 +00:00
karlicoss
37643c098f tox: remove cat coverage index from tox, it's not very useful anyway 2023-11-10 23:11:54 +00:00
karlicoss
7b1cec9326 codeforces/topcoder: move to top level and check in ci 2023-11-10 23:11:54 +00:00
karlicoss
657ce08ac8 fix mypy issues after mypy/libraries updates 2023-11-10 22:59:09 +00:00
karlicoss
996169aa29 time.tz.via_location: more consistent behaviour wrt caching
previously it was possible for cachew to never properly initialize the cache if you only queried some dates in the past,
because we never made it to the end of _iter_tzs

also some minor cleanup
2023-11-10 22:59:09 +00:00
karlicoss
70bb9ed0c5 location.google_takeout_semantic: handle None visitConfidence 2023-11-10 02:10:30 +00:00
karlicoss
65c617ed94 my.emfit: add missing properties to fake data generator 2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b my.jawbone: get rid of matplotlib import on top level 2023-11-10 02:10:30 +00:00
karlicoss
e547acfa59 general: update minimal cachew version
had quite a few useful fixes/performance optimizations since
2023-11-07 21:24:56 +00:00
karlicoss
33f8d867e2 my.browser.export: cleanup
- make logging INFO (default) -- otherwise it's too quiet when processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d my.hackernews.harmonic: use orjson + add __hash__ for Saved object
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101 my.bumble.android: fix message deduplication 2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1 my.pinboard: some cleanup 2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4 monzo.monzoexport: initial module 2023-11-02 00:47:13 +00:00
karlicoss
105928238f vk_messages_backup: some cleanup + switch to get_files 2023-11-02 00:43:10 +00:00
Dima Gerasimov
24da04f142 ci: fix wrong release command 2023-11-01 01:54:16 +00:00
karlicoss
71cb66df5f core: add helper for more_itertools to check that all types involved are hashable
Otherwise unique_everseen performance may degrade to quadratic rather than linear

For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag

also switch some modules to use it
2023-10-31 01:02:17 +00:00
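
The note above on quadratic behaviour follows from how `unique_everseen` in more_itertools tracks seen items: hashable items go into a set, while unhashable ones fall back to a list, which is scanned linearly on every lookup. A minimal sketch of a hashability check, using only the standard library; `check_all_hashable` is a hypothetical name and not necessarily the helper added in this commit:

```python
from typing import Any, Iterable, Iterator


def check_all_hashable(items: Iterable[Any]) -> Iterator[Any]:
    """Yield items unchanged, raising TypeError on the first unhashable one."""
    for item in items:
        try:
            hash(item)
        except TypeError as e:
            raise TypeError(
                f"unhashable item of type {type(item).__name__}: {item!r}"
            ) from e
        yield item


# tuples of hashable values pass through untouched
hashable = list(check_all_hashable([(1, 'a'), (2, 'b')]))
```

Wrapping an iterable this way before deduplication turns a silent slowdown into an immediate error, which is presumably why the real check is kept behind a flag.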
Dima Gerasimov
d6786084ca general: deprecate some old methods by hiding behind TYPE_CHECKING 2023-10-30 22:51:31 +00:00
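
One way to read "hiding behind TYPE_CHECKING" is sketched below: the old name is defined only when `TYPE_CHECKING` is false, so it still exists at runtime but is invisible to type checkers. The function names are hypothetical and the exact pattern used in the commit is an assumption:

```python
import warnings
from typing import TYPE_CHECKING, List


def get_items() -> List[int]:
    """Current API (hypothetical name)."""
    return [1, 2, 3]


if not TYPE_CHECKING:
    # Type checkers treat TYPE_CHECKING as True and skip this branch,
    # so they report any remaining use of the old name as undefined.
    # At runtime the branch executes, so existing callers keep working.
    def get_items_old() -> List[int]:
        warnings.warn(
            'get_items_old is deprecated, use get_items',
            DeprecationWarning,
            stacklevel=2,
        )
        return get_items()
```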
karlicoss
79ce8e84ec fbmessenger.android: support processing msys database
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b general: enhance logging for various modules 2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17 general: improve logging during file processing in various modules 2023-10-29 01:01:30 +01:00
karlicoss
bd27bd4c24 docs: add documentation on logging during HPI module development 2023-10-29 00:50:22 +01:00
karlicoss
f668208bce my.stackexchange.stexport: small cleanup & stat improvements 2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe core/config: implement a warning if config is imported from a dir other than MY_CONFIG
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75 my.kobo: add highlights method to return Highlight objects iteratively
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933 my.hypothesis: expose data as iterators instead of lists
also add an adapter to support migrating in a backwards-compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71 core: split compat/legacy modules into hpi_compat and compat 2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125 core/stats: exclude contextmanagers from guess_stats 2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de my.emfit: cleanup and pass cpu pool 2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec doctor: make compileall check a bit more defensive 2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c my.hackernews.dogsheep: use utc datetime + minor cleanup 2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup 2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
otherwise fails at os.get_terminal_size
2023-10-25 02:42:52 +01:00
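
The failure mentioned above arises because `os.get_terminal_size` raises `OSError` when the file descriptor is not a terminal (for example when output is piped). A minimal sketch of the kind of guard this implies; `attached_to_terminal` is a hypothetical helper, not the code from this commit:

```python
import io
import sys
from typing import IO


def attached_to_terminal(stream: IO[str]) -> bool:
    """Return True only if the stream is connected to a tty."""
    try:
        return stream.isatty()
    except (AttributeError, ValueError):
        # e.g. an object without isatty(), or a closed stream
        return False


# only enable terminal-dependent log handling when stderr is a real terminal
use_collapsing = attached_to_terminal(sys.stderr)
in_memory = attached_to_terminal(io.StringIO())  # an in-memory stream is never a tty
```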
Dima Gerasimov
1f61e853c9 reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
Enabled by the env variable, specifying how many cores to dedicate, e.g.

HPI_CPU_POOL=4 hpi query ...
2023-10-25 02:06:45 +01:00
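
A rough sketch of how an optional pool driven by that variable could be set up. Only the `HPI_CPU_POOL` name comes from the commit message; the function name, the use of `ProcessPoolExecutor`, and the handling of non-positive values are assumptions:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def get_cpu_pool() -> Optional[ProcessPoolExecutor]:
    """Return a process pool if HPI_CPU_POOL is set, otherwise None."""
    raw = os.environ.get('HPI_CPU_POOL')
    if raw is None:
        return None  # pool disabled: callers fall back to single-process code
    workers = int(raw)
    if workers <= 0:
        return None  # assumption: treat non-positive values as "disabled"
    return ProcessPoolExecutor(max_workers=workers)
```

Returning `None` rather than a one-worker pool keeps the default code path free of multiprocessing overhead.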
Dima Gerasimov
a5c04e789a twitter.archive: deduplicate results via json.dumps
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs

interestingly enough, using the raw object without json.dumps as key slows unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
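
The effect described above can be sketched with the standard library alone: serialising each record to a string gives a hashable key, so membership checks use a set instead of comparing dicts one by one. `unique_by_json` is a hypothetical name, and `sort_keys=True` is an added assumption so that key order does not affect equality:

```python
import json
from typing import Any, Dict, Iterable, Iterator, Set

Json = Dict[str, Any]


def unique_by_json(items: Iterable[Json]) -> Iterator[Json]:
    """Yield each distinct record once, keyed by its JSON serialisation."""
    seen: Set[str] = set()
    for item in items:
        key = json.dumps(item, sort_keys=True)  # a str is hashable, a dict is not
        if key in seen:
            continue
        seen.add(key)
        yield item


tweets = [
    {'id': 1, 'text': 'hi'},
    {'text': 'hi', 'id': 1},  # same record, different key order
    {'id': 2, 'text': 'yo'},
]
deduped = list(unique_by_json(tweets))
```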
Dima Gerasimov
0e94e0a9ea whatsapp.android: handle most message types properly 2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5 my.whatsapp.android: exclude some dummy messages, minor cleanup 2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f tinder.android: infer user's own name automatically 2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06 my.instagram.gdpr: process all historic archives + better normalising 2023-10-23 18:42:50 +01:00
Dima Gerasimov
f9a1050ceb my.instagram.android: more defensive error handling 2023-10-23 18:42:50 +01:00
karlicoss
86ea605aec core/stats: enable processing input files, report first and last filename
can be useful for quick investigation/testing setup
2023-10-22 00:47:36 +01:00
karlicoss
c335c0c9d8 core/stats: report datetime of first item in addition to last
quite useful for quickly determining time span of a data source
2023-10-22 00:47:36 +01:00
karlicoss
a60d69fb30 core/stats: get rid of duplicated keys for 'auto stats'
previously:
```
{'iter_data': {'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}}
```

after:
```
{'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}
```
2023-10-22 00:47:36 +01:00
karlicoss
c5fe2e9412 core.stats: fix is_data_provider when from __future__ import annotations is used 2023-10-21 23:46:40 +01:00
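
For context on this class of bug: with `from __future__ import annotations`, annotations are stored as strings rather than evaluated objects, so code that inspects a return annotation directly sees a `str`. The sketch below emulates that with an explicit string annotation and resolves it via `typing.get_type_hints`; `iter_data` is an illustrative function and this is not necessarily how the fix was implemented:

```python
import typing
from typing import Iterator


def iter_data() -> 'Iterator[int]':
    # the quoted annotation mimics what postponed evaluation produces
    yield from [1, 2, 3]


raw = iter_data.__annotations__['return']            # the unevaluated string
resolved = typing.get_type_hints(iter_data)['return']  # the actual typing object
```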
karlicoss
872053a3c3 my.hackernews.harmonic: fix issue with crashing due to html escaping
also add proper logging
2023-10-21 23:46:40 +01:00