karlicoss
1c452b12d4
twitter.android: extract likes and own tweets as well
2023-12-28 00:12:39 +00:00
karlicoss
51209c547e
my.twitter.android: refactor into a proper module
...
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
karlicoss
a4a7bc41b9
my.twitter.android: extract entities
2023-12-24 00:49:07 +00:00
karlicoss
3d75abafe9
my.twitter.android: some initial work on parsing sqlite databases from the official Android app
2023-12-24 00:49:07 +00:00
Dima Gerasimov
a8f8858cb1
docs: document more experiments with overlays in docs
2023-12-22 02:54:36 +00:00
Dima Gerasimov
adbc0e73a2
docs: add note about directly checking overlays with mypy
2023-12-22 02:54:36 +00:00
Dima Gerasimov
84d835962d
docs: some documentation/thoughts on properly implementing overlay packages
2023-12-20 02:51:27 +00:00
Sean Breckenridge
224ba521e3
gpslogger: catch broken xml file error
2023-12-20 02:41:52 +00:00
Dima Gerasimov
a843407e40
core/compat: move fromisoformat to .core.compat module
2023-11-19 23:45:08 +00:00
karlicoss
09e0f66892
tox: disable --parallel flag in hpi module install
...
It's been so flaky it ends up taking more time to merge stuff. See https://github.com/karlicoss/HPI/issues/306
2023-11-19 19:18:19 +00:00
Dima Gerasimov
bde43d6a7a
my.body.sleep: massive speedup for average temperature calculation
2023-11-11 00:42:49 +00:00
karlicoss
37643c098f
tox: remove cat coverage index from tox, it's not very useful anyway
2023-11-10 23:11:54 +00:00
karlicoss
7b1cec9326
codeforces/topcoder: move to top level and check in ci
2023-11-10 23:11:54 +00:00
karlicoss
657ce08ac8
fix mypy issues after mypy/libraries updates
2023-11-10 22:59:09 +00:00
karlicoss
996169aa29
time.tz.via_location: more consistent behaviour wrt caching
...
previously it was possible for cachew to never properly initialize the cache if you only queried some dates in the past,
because we never made it to the end of _iter_tzs
also some minor cleanup
2023-11-10 22:59:09 +00:00
karlicoss
70bb9ed0c5
location.google_takeout_semantic: handle None visitConfidence
2023-11-10 02:10:30 +00:00
karlicoss
65c617ed94
my.emfit: add missing properties to fake data generator
2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b
my.jawbone: get rid of matplotlib import on top level
2023-11-10 02:10:30 +00:00
karlicoss
e547acfa59
general: update minimal cachew version
...
had quite a few useful fixes/performance optimizations since
2023-11-07 21:24:56 +00:00
karlicoss
33f8d867e2
my.browser.export: cleanup
...
- make logging INFO (default) -- otherwise it's too quiet during processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d
my.hackernews.harmonic: use orjson + add __hash__ for Saved object
...
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101
my.bumble.android: fix message deduplication
2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1
my.pinboard: some cleanup
2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4
monzo.monzoexport: initial module
2023-11-02 00:47:13 +00:00
karlicoss
105928238f
vk_messages_backup: some cleanup + switch to get_files
2023-11-02 00:43:10 +00:00
Dima Gerasimov
24da04f142
ci: fix wrong release command
2023-11-01 01:54:16 +00:00
karlicoss
71cb66df5f
core: add helper for more_itertools to check that all types involved are hashable
...
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
2023-10-31 01:02:17 +00:00
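The hashability concern in the commit above can be illustrated with a minimal sketch. It is a simplified stand-in for unique_everseen written with the standard library only, not the library's code and not the helper added in this commit; the names unique_everseen_sketch and is_hashable are hypothetical.

```python
from typing import Any, Iterable, Iterator, List, Set


def is_hashable(obj: Any) -> bool:
    """Return True if hash(obj) succeeds (hypothetical helper for illustration)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def unique_everseen_sketch(items: Iterable[Any]) -> Iterator[Any]:
    """Yield each distinct item once, preserving order.

    Hashable items are tracked in a set (constant-time lookup on average).
    Unhashable items fall back to a list, where every lookup is a linear
    scan, so the total cost grows quadratically with the number of items.
    """
    seen_set: Set[Any] = set()
    seen_list: List[Any] = []
    for item in items:
        try:
            if item not in seen_set:
                seen_set.add(item)
                yield item
        except TypeError:
            # Unhashable item (e.g. a dict): slow path with a linear scan.
            if item not in seen_list:
                seen_list.append(item)
                yield item
```

Checking is_hashable on the values before deduplicating is one way to notice early that the slow path would be taken.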
Dima Gerasimov
d6786084ca
general: deprecate some old methods by hiding behind TYPE_CHECKING
2023-10-30 22:51:31 +00:00
karlicoss
79ce8e84ec
fbmessenger.android: support processing msys database
...
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b
general: enhance logging for various modules
2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17
general: improve logging during file processing in various modules
2023-10-29 01:01:30 +01:00
karlicoss
bd27bd4c24
docs: add documentation on logging during HPI module development
2023-10-29 00:50:22 +01:00
karlicoss
f668208bce
my.stackexchange.stexport: small cleanup & stat improvements
2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe
core/config: implement a warning if config is imported from a dir other than MY_CONFIG
...
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75
my.kobo: add highlights method to return Highlight objects iteratively
...
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933
my.hypothesis: expose data as iterators instead of lists
...
also add an adapter to support migrating in backwards compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71
core: split compat/legacy modules into hpi_compat and compat
2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125
core/stats: exclude contextmanagers from guess_stats
2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de
my.emfit: cleanup and pass cpu pool
2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec
doctor: make compileall check a bit more defensive
2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c
my.hackernews.dogsheep: use utc datetime + minor cleanup
2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f
my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup
2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d
core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
...
otherwise fails at os.get_terminal_size
2023-10-25 02:42:52 +01:00
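The failure mentioned in the commit above comes from os.get_terminal_size raising OSError when the stream is not a terminal (for example, when output is piped or run under cron). A minimal sketch of a defensive check follows; the function name terminal_width and the default of 80 columns are assumptions for illustration, not the code from this commit.

```python
import io
import os
import sys
from typing import IO, Any


def terminal_width(stream: IO[Any] = sys.stderr, default: int = 80) -> int:
    """Return the terminal width for `stream`, or `default` if it is not a tty."""
    if not stream.isatty():
        # Not attached to a terminal: skip the size query entirely.
        return default
    try:
        return os.get_terminal_size(stream.fileno()).columns
    except (OSError, io.UnsupportedOperation):
        # The query can still fail on unusual streams; fall back quietly.
        return default
```

A handler that collapses log lines could use a check like this to disable itself when no terminal is present.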
Dima Gerasimov
1f61e853c9
reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
...
Enabled by the env variable, specifying how many cores to dedicate, e.g.
HPI_CPU_POOL=4 hpi query ...
2023-10-25 02:06:45 +01:00
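One way an optional pool like the one described above could be wired up is sketched below. Only the HPI_CPU_POOL variable name comes from the commit message; the function name cpu_pool_from_env and the choice of ProcessPoolExecutor are assumptions for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def cpu_pool_from_env(var: str = "HPI_CPU_POOL") -> Optional[ProcessPoolExecutor]:
    """Create a process pool sized by an env variable, or return None if unset.

    Hypothetical helper: returning None lets callers fall back to plain
    single-process execution when the variable is not set.
    """
    raw = os.environ.get(var)
    if raw is None:
        return None
    workers = int(raw)
    if workers <= 0:
        return None
    return ProcessPoolExecutor(max_workers=workers)
```

Callers would then pass the pool (or None) down to the code doing the heavy parsing, so the feature stays opt-in.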
Dima Gerasimov
a5c04e789a
twitter.archive: deduplicate results via json.dumps
...
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs
interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
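The deduplication idea in the commit above, using a serialized string as the key so that lookups go through a set, can be sketched as follows. The function name unique_by_json and the use of sort_keys=True are assumptions for illustration; the commit only states that json.dumps is used as the key.

```python
import json
from typing import Any, Iterable, Iterator, Set


def unique_by_json(items: Iterable[Any]) -> Iterator[Any]:
    """Yield each JSON-serializable item once, preserving first-seen order.

    Raw dicts are unhashable, so they cannot be stored in a set directly.
    Serializing them to a string gives a hashable key with fast lookups.
    """
    seen: Set[str] = set()
    for item in items:
        # sort_keys (an assumption here) makes key order irrelevant.
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            yield item
```

With more_itertools installed, a similar effect should be achievable by passing a key function to unique_everseen.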
Dima Gerasimov
0e94e0a9ea
whatsapp.android: handle most message types properly
2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5
my.whatsapp.android: exclude some dummy messages, minor cleanup
2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f
tinder.android: infer user's own name automatically
2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06
my.instagram.gdpr: process all historic archives + better normalising
2023-10-23 18:42:50 +01:00
Dima Gerasimov
f9a1050ceb
my.instagram.android: more defensive error handling
2023-10-23 18:42:50 +01:00