Commit graph

699 commits

Author SHA1 Message Date
seanbreckenridge
35dd5d82a0 smscalls: parse mms from smscalls export (#370)
* initial mms exploration
2024-06-05 22:03:03 +01:00
Dima Gerasimov
8a8a1ebb0e my.tinder.android: better error handling and fix case with empty db 2024-04-03 20:13:40 +01:00
Dima Gerasimov
103ea2096e my.coding.commits: fix for git repo discovery after fdfind v9 2024-03-13 00:46:18 +00:00
Dima Gerasimov
7236024c7a my.twitter.android: better detection of own user id 2024-03-13 00:46:18 +00:00
Dima Gerasimov
87a8a7781b my.google.maps: initial module for extracting placed data from Android app 2024-01-01 23:46:02 +00:00
Sean Breckenridge
93e475795d google takeout: support multiple locales
uses the known locales in google_takeout_parser
to determine the expected paths for each locale,
and performs a partial match on the paths to
detect and use match_structure
2023-12-31 18:57:30 +00:00
Dima Gerasimov
1b187b2c1b whatsapp.android: expose all entities extracted from the db 2023-12-29 00:57:49 +00:00
Dima Gerasimov
3ec362fce9 fbmessenger.android: expose contacts 2023-12-28 18:13:16 +00:00
karlicoss
a0ce666024 my.youtube.takeout: fix exception handling 2023-12-28 00:25:05 +00:00
karlicoss
1c452b12d4 twitter.android: extract likes and own tweets as well 2023-12-28 00:12:39 +00:00
karlicoss
51209c547e my.twitter.android: refactor into a proper module
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
karlicoss
a4a7bc41b9 my.twitter.android: extract entities 2023-12-24 00:49:07 +00:00
karlicoss
3d75abafe9 my.twitter.android: some initial work on parsing sqlite databases from official Android app 2023-12-24 00:49:07 +00:00
Sean Breckenridge
224ba521e3 gpslogger: catch broken xml file error 2023-12-20 02:41:52 +00:00
Dima Gerasimov
a843407e40 core/compat: move fromisoformat to .core.compat module 2023-11-19 23:45:08 +00:00
Dima Gerasimov
bde43d6a7a my.body.sleep: massive speedup for average temperature calculation 2023-11-11 00:42:49 +00:00
karlicoss
7b1cec9326 codeforces/topcoder: move to top level and check in ci 2023-11-10 23:11:54 +00:00
karlicoss
657ce08ac8 fix mypy issues after mypy/libraries updates 2023-11-10 22:59:09 +00:00
karlicoss
996169aa29 time.tz.via_location: more consistent behaviour wrt caching
previously it was possible for cachew to never properly initialize the cache if you only queried some dates in the past,
because we never made it to the end of _iter_tzs

also some minor cleanup
2023-11-10 22:59:09 +00:00
karlicoss
70bb9ed0c5 location.google_takeout_semantic: handle None visitConfidence 2023-11-10 02:10:30 +00:00
karlicoss
65c617ed94 my.emfit: add missing properties to fake data generator 2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b my.jawbone: get rid of matplotlib import on top level 2023-11-10 02:10:30 +00:00
karlicoss
33f8d867e2 my.browser.export: cleanup
- make logging INFO (default) -- otherwise it's too quiet during processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d my.hackernews.harmonic: use orjson + add __hash__ for Saved object
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101 my.bumble.android: fix message deduplication 2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1 my.pinboard: some cleanup 2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4 monzo.monzoexport: initial module 2023-11-02 00:47:13 +00:00
karlicoss
105928238f vk_messages_backup: some cleanup + switch to get_files 2023-11-02 00:43:10 +00:00
karlicoss
71cb66df5f core: add helper for more_iterable to check that all types involved are hashable
Otherwise unique_everseen performance may degrade to quadratic rather than linear

For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag

also switch some modules to use it
2023-10-31 01:02:17 +00:00
Dima Gerasimov
d6786084ca general: deprecate some old methods by hiding behind TYPE_CHECKING 2023-10-30 22:51:31 +00:00
karlicoss
79ce8e84ec fbmessenger.android: support processing msys database
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b general: enhance logging for various modules 2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17 general: improve logging during file processing in various modules 2023-10-29 01:01:30 +01:00
karlicoss
f668208bce my.stackexchange.stexport: small cleanup & stat improvements 2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe core/config: implement a warning if config is imported from the dir other than MY_CONFIG
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75 my.kobo: add highlights method to return Highlight objects iteratively
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933 my.hypothesis: expose data as iterators instead of lists
also add an adapter to support migrating in backwards compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71 core: split compat/legacy modules into hpi_compat and compat 2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125 core/stats: exclude contextmanagers from guess_stats 2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de my.emfit: cleanup and pass cpu pool 2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec doctor: make compileall check a bit more defensive 2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c my.hackernews.dogsheep: use utc datetime + minor cleanup 2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup 2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
otherwise fails at os.get_terminal_size
2023-10-25 02:42:52 +01:00
Dima Gerasimov
1f61e853c9 reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
Enabled by the env variable, specifying how many cores to dedicate, e.g.

HPI_CPU_POOL=4 hpi query ...
2023-10-25 02:06:45 +01:00
Dima Gerasimov
a5c04e789a twitter.archive: deduplicate results via json.dumps
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs

interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
Dima Gerasimov
0e94e0a9ea whatsapp.android: handle most message types properly 2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5 my.whatsapp.android: exclude some dummy messages, minor cleanup 2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f tinder.android: infer user's own name automatically 2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06 my.instagram.gdpr: process all historic archives + better normalising 2023-10-23 18:42:50 +01:00
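
Note on commits a5c04e789a and 71cb66df5f: both concern deduplication with unique_everseen, which keeps its fast set-based path only when the items (or their keys) are hashable and otherwise falls back to a linear scan per item. The sketch below is a standard-library illustration of the "deduplicate via json.dumps" idea, not the code from those commits; the helper name unique_by_json and the sample records are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def unique_by_json(items: Iterable[Any]) -> Iterator[Any]:
    """Yield each item once, keyed by its JSON serialisation.

    Hypothetical helper: dicts are unhashable, so they cannot go into a set
    directly; their JSON string can, which keeps each lookup O(1) on average.
    """
    seen: set[str] = set()
    for item in items:
        # sort_keys makes the key independent of dict insertion order
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            yield item


# Made-up sample records, standing in for parsed archive entries
tweets = [
    {"id": 1, "text": "hello"},
    {"text": "hello", "id": 1},  # same content, different key order
    {"id": 2, "text": "world"},
]
deduplicated = list(unique_by_json(tweets))
```

With more_itertools installed, passing key=json.dumps to unique_everseen should give comparable behaviour, since the string keys are hashable.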