seanbreckenridge
35dd5d82a0
smscalls: parse mms from smscalls export (#370)
...
* initial mms exploration
2024-06-05 22:03:03 +01:00
Dima Gerasimov
8a8a1ebb0e
my.tinder.android: better error handling and fix case with empty db
2024-04-03 20:13:40 +01:00
Dima Gerasimov
103ea2096e
my.coding.commits: fix for git repo discovery after fdfind v9
2024-03-13 00:46:18 +00:00
Dima Gerasimov
7236024c7a
my.twitter.android: better detection of own user id
2024-03-13 00:46:18 +00:00
Dima Gerasimov
87a8a7781b
my.google.maps: initial module for extracting place data from Android app
2024-01-01 23:46:02 +00:00
Sean Breckenridge
93e475795d
google takeout: support multiple locales
...
uses the known locales in google_takeout_parser
to determine the expected paths for each locale,
and performs a partial match on the paths to
detect and use match_structure
2023-12-31 18:57:30 +00:00
Dima Gerasimov
1b187b2c1b
whatsapp.android: expose all entities extracted from the db
2023-12-29 00:57:49 +00:00
Dima Gerasimov
3ec362fce9
fbmessenger.android: expose contacts
2023-12-28 18:13:16 +00:00
karlicoss
a0ce666024
my.youtube.takeout: fix exception handling
2023-12-28 00:25:05 +00:00
karlicoss
1c452b12d4
twitter.android: extract likes and own tweets as well
2023-12-28 00:12:39 +00:00
karlicoss
51209c547e
my.twitter.android: refactor into a proper module
...
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
karlicoss
a4a7bc41b9
my.twitter.android: extract entities
2023-12-24 00:49:07 +00:00
karlicoss
3d75abafe9
my.twitter.android: some initial work on parsing sqlite databases from the official Android app
2023-12-24 00:49:07 +00:00
Sean Breckenridge
224ba521e3
gpslogger: catch broken xml file error
2023-12-20 02:41:52 +00:00
Dima Gerasimov
a843407e40
core/compat: move fromisoformat to .core.compat module
2023-11-19 23:45:08 +00:00
Dima Gerasimov
bde43d6a7a
my.body.sleep: massive speedup for average temperature calculation
2023-11-11 00:42:49 +00:00
karlicoss
7b1cec9326
codeforces/topcoder: move to top level and check in ci
2023-11-10 23:11:54 +00:00
karlicoss
657ce08ac8
fix mypy issues after mypy/libraries updates
2023-11-10 22:59:09 +00:00
karlicoss
996169aa29
time.tz.via_location: more consistent behaviour wrt caching
...
previously it was possible for cachew to never properly initialize the cache if you only queried some dates in the past,
because we never made it to the end of _iter_tzs
also some minor cleanup
2023-11-10 22:59:09 +00:00
karlicoss
70bb9ed0c5
location.google_takeout_semantic: handle None visitConfidence
2023-11-10 02:10:30 +00:00
karlicoss
65c617ed94
my.emfit: add missing properties to fake data generator
2023-11-10 02:10:30 +00:00
karlicoss
ac5f71c68b
my.jawbone: get rid of matplotlib import on top level
2023-11-10 02:10:30 +00:00
karlicoss
33f8d867e2
my.browser.export: cleanup
...
- make logging INFO (default) -- otherwise it's too quiet when processing lots of databases
- can pass inputs to cachew directly now
2023-11-07 21:24:56 +00:00
karlicoss
19353e996d
my.hackernews.harmonic: use orjson + add __hash__ for Saved object
...
plus some minor cleanup
2023-11-07 01:03:57 +00:00
karlicoss
4ac3bbb101
my.bumble.android: fix message deduplication
2023-11-07 01:03:57 +00:00
karlicoss
5630621ec1
my.pinboard: some cleanup
2023-11-06 23:10:00 +00:00
karlicoss
7631f1f2e4
monzo.monzoexport: initial module
2023-11-02 00:47:13 +00:00
karlicoss
105928238f
vk_messages_backup: some cleanup + switch to get_files
2023-11-02 00:43:10 +00:00
karlicoss
71cb66df5f
core: add helper for more_iterable to check that all types involved are hashable
...
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
2023-10-31 01:02:17 +00:00
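
The hashability concern in the entry above can be illustrated with a minimal sketch. This is not HPI's actual helper: `unique_everseen_sketch` and its `check_hashable` parameter are hypothetical names. The fallback logic follows the documented behaviour of more_itertools' unique_everseen, which tracks hashable items in a set and falls back to a list (a linear scan per item, hence quadratic overall) for unhashable ones.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Set, TypeVar

T = TypeVar("T")


def unique_everseen_sketch(
    items: Iterable[T],
    key: Optional[Callable[[T], Any]] = None,
    check_hashable: bool = False,  # hypothetical flag, stands in for an env-based switch
) -> Iterator[T]:
    seen_set: Set[Any] = set()   # O(1) membership for hashable keys
    seen_list: List[Any] = []    # O(n) membership fallback for unhashable keys
    for item in items:
        k = item if key is None else key(item)
        try:
            if k in seen_set:
                continue
            seen_set.add(k)
        except TypeError:
            # k is unhashable: either fail loudly or fall back to the slow path
            if check_hashable:
                raise TypeError(f"unhashable key of type {type(k).__name__}") from None
            if k in seen_list:
                continue
            seen_list.append(k)
        yield item
```

With `check_hashable=True`, the first unhashable key raises instead of silently taking the slow path, which is the kind of early warning the commit describes.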
Dima Gerasimov
d6786084ca
general: deprecate some old methods by hiding behind TYPE_CHECKING
2023-10-30 22:51:31 +00:00
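
One way to read "hiding behind TYPE_CHECKING" is sketched below: the deprecated name is defined only when `TYPE_CHECKING` is false, so it still works at runtime but type checkers no longer see it and flag remaining uses. The function names `new_items` and `get_items` are hypothetical, and this is an assumption about the pattern rather than a copy of the actual change.

```python
import warnings
from typing import TYPE_CHECKING, List


def new_items() -> List[int]:
    # the replacement API that callers should migrate to
    return [1, 2, 3]


if not TYPE_CHECKING:
    # Invisible to static type checkers, still present at runtime
    def get_items():
        warnings.warn(
            "get_items is deprecated, use new_items instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_items()
```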
karlicoss
79ce8e84ec
fbmessenger.android: support processing msys database
...
seems that threads_db2 stopped updating some time ago, and msys contains all new data now
2023-10-30 02:54:22 +00:00
karlicoss
f28f68b14b
general: enhance logging for various modules
2023-10-29 22:32:07 +00:00
karlicoss
ea195e3d17
general: improve logging during file processing in various modules
2023-10-29 01:01:30 +01:00
karlicoss
f668208bce
my.stackexchange.stexport: small cleanup & stat improvements
2023-10-28 21:33:36 +01:00
Dima Gerasimov
6821fbc2fe
core/config: implement a warning if config is imported from a dir other than MY_CONFIG
...
this should help with identifying setup issues
2023-10-28 20:56:07 +01:00
Dima Gerasimov
edea2c2e75
my.kobo: add highlights method to return Highlight objects iteratively
...
also minor cleanup
2023-10-28 20:06:54 +01:00
Dima Gerasimov
d88a1b9933
my.hypothesis: expose data as iterators instead of lists
...
also add an adapter to support migrating in a backwards-compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71
core: split compat/legacy modules into hpi_compat and compat
2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125
core/stats: exclude contextmanagers from guess_stats
2023-10-28 00:08:32 +01:00
karlicoss
fb2b3e07de
my.emfit: cleanup and pass cpu pool
2023-10-27 23:52:03 +01:00
Dima Gerasimov
32aa87b3ec
doctor: make compileall check a bit more defensive
2023-10-27 02:38:22 +01:00
karlicoss
3a25c9042c
my.hackernews.dogsheep: use utc datetime + minor cleanup
2023-10-27 02:38:03 +01:00
karlicoss
bef0423b4f
my.zulip.organization: use UTC timestamps, support custom archive names + some cleanup
2023-10-27 02:38:03 +01:00
karlicoss
a0910e798d
core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
...
otherwise fails at os.get_terminal_size
2023-10-25 02:42:52 +01:00
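
The failure mode mentioned above (os.get_terminal_size raising when there is no terminal) can be guarded against along these lines. This is a minimal sketch, not the actual logging code; `terminal_width_or_none` is a hypothetical helper.

```python
import os
import sys
from typing import Optional, TextIO


def terminal_width_or_none(stream: TextIO = sys.stderr) -> Optional[int]:
    # Not a terminal (e.g. piped to a file or run under cron): skip terminal-only features
    if not stream.isatty():
        return None
    try:
        return os.get_terminal_size(stream.fileno()).columns
    except OSError:
        # get_terminal_size raises OSError if the size cannot be queried
        return None
```

A handler that collapses log lines could be installed only when this returns a width.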
Dima Gerasimov
1f61e853c9
reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
...
Enabled by an env variable that specifies how many cores to dedicate, e.g.
HPI_CPU_POOL=4 hpi query ...
2023-10-25 02:06:45 +01:00
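
A minimal sketch of how an optional pool could be driven by the HPI_CPU_POOL variable named above. Only the variable name comes from the commit message; `cpu_pool_size` and `make_cpu_pool` are hypothetical helpers, and treating unset or non-positive values as "no pool" is an assumption.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Mapping, Optional


def cpu_pool_size(env: Mapping[str, str] = os.environ) -> Optional[int]:
    # Unset variable means the feature is off
    raw = env.get("HPI_CPU_POOL")
    if raw is None:
        return None
    workers = int(raw)
    # Assumption: zero or negative values also disable the pool
    return workers if workers > 0 else None


def make_cpu_pool(env: Mapping[str, str] = os.environ) -> Optional[ProcessPoolExecutor]:
    workers = cpu_pool_size(env)
    if workers is None:
        return None
    return ProcessPoolExecutor(max_workers=workers)
```

Callers would fall back to single-process parsing whenever `make_cpu_pool()` returns None.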
Dima Gerasimov
a5c04e789a
twitter.archive: deduplicate results via json.dumps
...
this speeds up processing quite a bit, from 40s to 20s for me, and removes tons of identical outputs
interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
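
The idea of deduplicating via json.dumps can be sketched as follows. Raw dicts are unhashable, so a serialized string is used as a hashable key for set membership. `dedupe_by_json` is a hypothetical name, and `sort_keys=True` is an added assumption so that key order does not affect equality.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Set


def dedupe_by_json(items: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    seen: Set[str] = set()
    for item in items:
        # Strings are hashable, so membership checks stay O(1) on average
        key = json.dumps(item, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        yield item
```

The same effect can be had by passing `key=json.dumps` to more_itertools' unique_everseen, which is what the commit title suggests.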
Dima Gerasimov
0e94e0a9ea
whatsapp.android: handle most message types properly
2023-10-24 00:31:34 +01:00
Dima Gerasimov
72ab2603d5
my.whatsapp.android: exclude some dummy messages, minor cleanup
2023-10-24 00:31:34 +01:00
Dima Gerasimov
414b88178f
tinder.android: infer user's own name automatically
2023-10-24 00:31:34 +01:00
Dima Gerasimov
f355a55e06
my.instagram.gdpr: process all historic archives + better normalising
2023-10-23 18:42:50 +01:00