Dima Gerasimov
bde43d6a7a
my.body.sleep: massive speedup for average temperature calculation
2023-11-11 00:42:49 +00:00
karlicoss
7b1cec9326
codeforces/topcode: move to top level and check in ci
2023-11-10 23:11:54 +00:00
karlicoss
657ce08ac8
fix mypy issues after mypy/libraries updates
2023-11-10 22:59:09 +00:00
karlicoss
996169aa29
time.tz.via_location: more consistent behaviour wrt caching
...
previously it was possible for cachew to never properly initialize the cache if you only queried some dates in the past,
because we never made it to the end of _iter_tzs
also some minor cleanup
2023-11-10 22:59:09 +00:00
karlicoss
70bb9ed0c5
location.google_takeout_semantic: handle None visitConfidence
2023-11-10 02:10:30 +00:00
karlicoss
65c617ed94
my.emfit: add missing properties to fake data generator
2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b
my.jawbone: get rid of matplotlib import on top level
2023-11-10 02:10:30 +00:00
karlicoss
33f8d867e2
my.browser.export: cleanup
...
- make logging INFO (default) -- otherwise it's too quiet when processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d
my.hackernews.harmonic: use orjson + add __hash__ for Saved object
...
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101
my.bumble.android: fix message deduplication
2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1
my.pinboard: some cleanup
2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4
monzo.monzoexport: initial module
2023-11-02 00:47:13 +00:00
karlicoss
105928238f
vk_messages_backup: some cleanup + switch to get_files
2023-11-02 00:43:10 +00:00
karlicoss
71cb66df5f
core: add helper for more_itertools to check that all types involved are hashable
...
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
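A minimal sketch of the idea, not the actual HPI helper: `unique_everseen` from more_itertools keeps seen items in a set and falls back to a list for unhashable ones, which is where the quadratic behaviour comes from. The names `is_hashable` and `checked_unique` below are hypothetical and only illustrate failing fast instead of silently degrading.

```python
from typing import Any, Iterable, Iterator


def is_hashable(obj: Any) -> bool:
    """Return True if hash(obj) succeeds (hypothetical helper, not HPI's own)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def checked_unique(items: Iterable[Any]) -> Iterator[Any]:
    """Yield first occurrences only, rejecting items that cannot live in a set."""
    seen = set()
    for item in items:
        if not is_hashable(item):
            # An unhashable item would force a linear scan per lookup,
            # so raise instead of quietly going quadratic.
            raise TypeError(f"unhashable item of type {type(item).__name__}: {item!r}")
        if item not in seen:
            seen.add(item)
            yield item
```

Note that a tuple containing a list is also unhashable, which is why the check calls `hash` rather than inspecting the outer type.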
2023-10-31 01:02:17 +00:00
Dima Gerasimov
d6786084ca
general: deprecate some old methods by hiding behind TYPE_CHECKING
2023-10-30 22:51:31 +00:00
karlicoss
79ce8e84ec
fbmessenger.android: support processing msys database
...
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b
general: enhance logging for various modules
2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17
general: improve logging during file processing in various modules
2023-10-29 01:01:30 +01:00
karlicoss
f668208bce
my.stackexchange.stexport: small cleanup & stat improvements
2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe
core/config: implement a warning if config is imported from a dir other than MY_CONFIG
...
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75
my.kobo: add highlights method to return Highlight objects iteratively
...
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933
my.hypothesis: expose data as iterators instead of lists
...
also add an adapter to support migrating in a backwards-compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71
core: move split compat/legacy modules into hpi_compat and compat
2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125
core/stats: exclude contextmanagers from guess_stats
2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de
my.emfit: cleanup and pass cpu pool
2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec
doctor: make compileall check a bit more defensive
2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c
my.hackernews.dogsheep: use utc datetime + minor cleanup
2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f
my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup
2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d
core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
...
otherwise fails at os.get_terminal_size
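A rough sketch of the kind of guard this implies, assuming the check is done against stderr; `terminal_width` and its default are hypothetical and not taken from the HPI code.

```python
import os
import sys


def terminal_width(default: int = 80) -> int:
    """Return the terminal width, or `default` when stderr is not a terminal."""
    if not sys.stderr.isatty():
        # Redirected to a file or pipe: os.get_terminal_size would raise OSError.
        return default
    try:
        return os.get_terminal_size(sys.stderr.fileno()).columns
    except OSError:
        # Still be defensive, e.g. for unusual pseudo-terminals.
        return default
```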
2023-10-25 02:42:52 +01:00
Dima Gerasimov
1f61e853c9
reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
...
Enabled by the env variable, specifying how many cores to dedicate, e.g.
HPI_CPU_POOL=4 hpi query ...
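A minimal sketch of how such an opt-in pool could be wired up, assuming a `ProcessPoolExecutor`; the function name `get_cpu_pool` and the handling of non-positive values are assumptions for illustration, not the actual implementation.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Mapping, Optional


def get_cpu_pool(env: Mapping[str, str] = os.environ) -> Optional[ProcessPoolExecutor]:
    """Return a process pool sized by HPI_CPU_POOL, or None if it is unset."""
    raw = env.get("HPI_CPU_POOL")
    if raw is None:
        return None  # feature is off by default
    workers = int(raw)
    if workers <= 0:
        return None  # assumption: treat zero or negative as "disabled"
    return ProcessPoolExecutor(max_workers=workers)
```

Callers would then pass the result along and fall back to single-process parsing when it is `None`.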
2023-10-25 02:06:45 +01:00
Dima Gerasimov
a5c04e789a
twitter.archive: deduplicate results via json.dumps
...
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs
interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
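The approach can be sketched with a stdlib-only stand-in for `unique_everseen`: serialising each dict with `json.dumps` gives a hashable string key, so membership checks stay set-based. The names `dedupe` and `json_key` are hypothetical, and `sort_keys=True` is an added assumption so that key order does not affect equality.

```python
import json
from typing import Any, Callable, Hashable, Iterable, Iterator


def dedupe(items: Iterable[Any], key: Callable[[Any], Hashable]) -> Iterator[Any]:
    """Yield items whose key has not been seen before, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k in seen:
            continue
        seen.add(k)
        yield item


def json_key(obj: Any) -> str:
    """Hashable key for a JSON-like object (dicts and lists are not hashable)."""
    return json.dumps(obj, sort_keys=True)
```

For example, `list(dedupe(raw_tweets, key=json_key))` would keep only the first copy of each identical JSON object, where `raw_tweets` is any iterable of parsed dicts.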
2023-10-24 01:54:30 +01:00
Dima Gerasimov
0e94e0a9ea
whatsapp.android: handle most message types properly
2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5
my.whatsapp.android: exclude some dummy messages, minor cleanup
2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f
tinder.android: infer user's own name automatically
2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06
my.instagram.gdpr: process all historic archives + better normalising
2023-10-23 18:42:50 +01:00
Dima Gerasimov
f9a1050ceb
my.instagram.android: more defensive error handling
2023-10-23 18:42:50 +01:00
karlicoss
86ea605aec
core/stats: enable processing input files, report first and last filename
...
can be useful for quick investigation/testing setup
2023-10-22 00:47:36 +01:00
karlicoss
c335c0c9d8
core/stats: report datetime of first item in addition to last
...
quite useful for quickly determining time span of a data source
2023-10-22 00:47:36 +01:00
karlicoss
a60d69fb30
core/stats: get rid of duplicated keys for 'auto stats'
...
previously:
```
{'iter_data': {'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}}
```
after
```
{'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}
```
2023-10-22 00:47:36 +01:00
karlicoss
c5fe2e9412
core.stats: fix is_data_provider when from __future__ import annotations is used
2023-10-21 23:46:40 +01:00
karlicoss
872053a3c3
my.hackernews.harmonic: fix issue with crashing due to html escaping
...
also add proper logging
2023-10-21 23:46:40 +01:00
karlicoss
37bb33cdbc
experimental: add a hacky helper to import "original/shadowed" modules from within overlays
2023-10-21 22:46:16 +01:00
karlicoss
8c2d1c9463
general: use less explicit kompress boilerplate in modules
...
now get_files/kompress library can handle it transparently
2023-10-20 21:13:59 +01:00
karlicoss
c63e80ce94
core: more consistent handling of zip archives in get_files + tests
2023-10-20 21:13:59 +01:00
Dima Gerasimov
9ffce1b696
reddit.rexport: add accessors for subreddits, multireddits and profile
2023-10-19 02:26:28 +01:00
Dima Gerasimov
29832a9f75
core: fix test_get_files after updating kompress
2023-10-19 02:26:28 +01:00
Dima Gerasimov
28d2450a21
reddit.rexport: some cleanup, move get_events stuff into personal overlay
2023-10-19 02:26:28 +01:00
karlicoss
fe26efaea8
core/kompress: move vendorized to _deprecated, use kompress library directly
2023-10-12 23:47:05 +01:00
karlicoss
bb478f369d
core/logging: no need for super call in Filter
2023-10-12 23:47:05 +01:00
karlicoss
68289c1be3
general: fix ignores after mypy version update
2023-10-12 23:47:05 +01:00