69 commits

Author SHA1 Message Date
Dima Gerasimov
02dabe9f2b my.twitter.archive: cleanup linting and use proper configuration via abstract class 2024-09-22 02:13:10 +01:00
Dima Gerasimov
239e6617fe my.twitter.archive: deduplicate tweets based on id_str/created_at and raw tweet text 2024-09-22 02:13:10 +01:00
Dima Gerasimov
e036cc9e85 my.twitter.android: get own user id as string, consistent with rest of module 2024-09-22 02:13:10 +01:00
Dima Gerasimov
72cc8ff3ac ruff: enable B warnings (mainly suppressed exceptions and unused variables) 2024-08-28 04:06:32 +01:00
Dima Gerasimov
b594377a59 ruff: enable RUF ruleset 2024-08-28 04:06:32 +01:00
Dima Gerasimov
d244c7cc4e ruff: enable and fix C4 ruleset 2024-08-28 04:06:32 +01:00
Dima Gerasimov
b1fe23b8d0 my.rss.feedly/my.twitter.talon -- migrate to use lazy user configs 2024-08-26 04:00:58 +01:00
Dima Gerasimov
2c63fe25c0 my.twitter.android: get data from statuses table rather than timeline_view 2024-08-05 23:35:24 +01:00
karlicoss
d5fccf1874 twitter.android: more comments on timeline types 2024-08-03 16:50:09 +01:00
Dima Gerasimov
7236024c7a my.twitter.android: better detection of own user id 2024-03-13 00:46:18 +00:00
karlicoss
1c452b12d4 twitter.android: extract likes and own tweets as well 2023-12-28 00:12:39 +00:00
karlicoss
51209c547e my.twitter.android: refactor into a proper module
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
karlicoss
a4a7bc41b9 my.twitter.android: extract entities 2023-12-24 00:49:07 +00:00
karlicoss
3d75abafe9 my.twitter.android: some initial work on parsing sqlite databases from the official Android app 2023-12-24 00:49:07 +00:00
karlicoss
71cb66df5f core: add helper for more_itertools to check that all types involved are hashable
Otherwise unique_everseen performance may degrade to quadratic rather than linear

For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag

also switch some modules to use it
2023-10-31 01:02:17 +00:00
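To illustrate why hashability matters, here is a stdlib-only sketch that mirrors the strategy of more_itertools.unique_everseen: hashable items are tracked in a set, while unhashable ones fall back to a list scan, which makes the whole pass quadratic. It is not HPI's actual helper (which, per the commit message, checks the types involved rather than individual values); the function name is hypothetical, and treating HPI_CHECK_UNIQUE_EVERSEEN as enabled whenever it is present in the environment is an assumption.

```python
import os
from typing import Any, Iterable, Iterator, List, Set


def unique_everseen_checked(items: Iterable[Any]) -> Iterator[Any]:
    """Yield items in order, skipping ones already seen (hypothetical helper)."""
    # Assumption: the flag counts as enabled whenever the variable is set at all
    strict = 'HPI_CHECK_UNIQUE_EVERSEEN' in os.environ
    seen_set: Set[Any] = set()  # O(1) membership test for hashable items
    seen_list: List[Any] = []   # O(n) membership test for unhashable items
    for x in items:
        try:
            hash(x)
            hashable = True
        except TypeError:
            hashable = False
        if hashable:
            if x not in seen_set:
                seen_set.add(x)
                yield x
        elif strict:
            raise TypeError(f'unhashable {type(x).__name__}: deduplication would be quadratic')
        elif x not in seen_list:
            # Fallback path: a linear scan per item, so O(n^2) over the whole input
            seen_list.append(x)
            yield x
```

With the flag unset, a stream of dicts or lists still deduplicates correctly, just slowly; with it set, the first unhashable item raises instead of silently degrading.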
Dima Gerasimov
a5c04e789a twitter.archive: deduplicate results via json.dumps
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs

interestingly enough, using the raw object without json.dumps as the key brings unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
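Raw JSON objects are dicts, which are unhashable, so deduplicating them directly falls into the slow list-based path. A minimal sketch of the json.dumps trick, assuming the tweets are plain JSON-compatible dicts; sort_keys=True is an addition here so that dicts with the same content but different key order map to the same key. With more_itertools installed, unique_everseen(objs, key=json.dumps) expresses the same idea.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Set


def dedup_json(objs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each distinct JSON object once, preserving first-seen order."""
    seen: Set[str] = set()
    for o in objs:
        # A canonical string is hashable, so set membership stays O(1)
        key = json.dumps(o, sort_keys=True)
        if key not in seen:
            seen.add(key)
            yield o


tweets = [
    {'id_str': '1', 'full_text': 'hi'},
    {'full_text': 'hi', 'id_str': '1'},  # same content, different key order
    {'id_str': '2', 'full_text': 'bye'},
]
print([t['id_str'] for t in dedup_json(tweets)])  # ['1', '2']
```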
karlicoss
8c2d1c9463 general: use less explicit kompress boilerplate in modules
now get_files/kompress library can handle it transparently
2023-10-20 21:13:59 +01:00
Dima Gerasimov
0512488241 ci: sync configs to pymplate
- add python3.12
- add ruff
2023-10-06 02:24:01 +01:00
Dima Gerasimov
dff31455f1 general: switch to make_logger in a few modules, use a bit more consistent logging, rely on default INFO level 2023-06-21 18:42:15 +01:00
Dima Gerasimov
fe88380499 general: switch to using native 3.8 versions for cached_property/Literal/Protocol instead of compat 2023-05-16 01:18:30 +01:00
Dima Gerasimov
c34656e8fb general: update mypy config, seems that lots of type: ignore aren't necessary anymore 2023-05-16 01:18:30 +01:00
Dima Gerasimov
9db5f318fb my.twitter.twint: use dict row factory instead of sqlite Row
otherwise it's not json serializable
2023-03-17 00:33:22 +00:00
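sqlite3.Row allows access by column name, but json.dumps rejects it with a TypeError. A minimal sketch of a dict row factory, using an in-memory database with a hypothetical tweets table rather than twint's real schema:

```python
import json
import sqlite3
from typing import Any, Dict, Tuple


def dict_factory(cursor: sqlite3.Cursor, row: Tuple[Any, ...]) -> Dict[str, Any]:
    """Build a plain dict per row so results are JSON serializable."""
    # cursor.description has one 7-tuple per column; element 0 is the column name
    return {col[0]: value for col, value in zip(cursor.description, row)}


conn = sqlite3.connect(':memory:')
conn.row_factory = dict_factory
conn.execute('CREATE TABLE tweets (id TEXT, text TEXT)')
conn.execute("INSERT INTO tweets VALUES ('1', 'hello')")
row = conn.execute('SELECT id, text FROM tweets').fetchone()
print(json.dumps(row))  # {"id": "1", "text": "hello"}
```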
Dima Gerasimov
c63177e186 general/ci: clean up mypy-misc pipeline, only exclude specific files instead
marked some module configs which aren't really ready for public use as type: ignore
2023-02-21 00:20:58 +00:00
Dima Gerasimov
5c82d0faa9 switch from using dataset to raw sqlite3 module
dataset is kinda unmaintained and currently broken due to sqlalchemy 2.0 changes

resolves https://github.com/karlicoss/HPI/issues/264
2023-02-07 01:57:00 +00:00
Dima Gerasimov
5f1d41fa52 my.twitter.archive: fix for newer format (tweets filename changed to tweets.js) 2022-10-19 00:06:23 +01:00
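Archive files such as tweets.js are JavaScript rather than pure JSON: in commonly seen exports each file starts with an assignment like window.YTD.tweets.part0 = [ ... ]. A minimal sketch, assuming that prefix shape, strips everything up to the first '=' and parses the rest as JSON; the function name is hypothetical.

```python
import json
from typing import Any, List


def parse_archive_js(text: str) -> List[Any]:
    """Parse a Twitter archive .js file into a list of JSON records (sketch)."""
    # Assumed shape: 'window.YTD.tweets.part0 = [ ... ]'
    prefix, sep, payload = text.partition('=')
    if not sep or not prefix.strip().startswith('window.'):
        raise ValueError('unexpected archive file format')
    return json.loads(payload)


sample = 'window.YTD.tweets.part0 = [{"tweet": {"id_str": "1"}}]'
print(parse_archive_js(sample))  # [{'tweet': {'id_str': '1'}}]
```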
Dima Gerasimov
ca91be8154 twitter.archive: fix legacy config detection
apparently .name contains the parent module so previously it was throwing the exception instead
2022-10-19 00:06:23 +01:00
Dima Gerasimov
4e59a65f9a core/general: move cached_property into compat, use standard implementation from python3.8 2022-05-31 14:08:50 +01:00
Dima Gerasimov
711157e0f5 my.twitter.archive: switch to zippath, add config section, better mypy coverage 2022-05-31 14:08:50 +01:00
Dima Gerasimov
d092608002 twitter.talon: make retweets more compatible with twitter archive 2022-05-31 01:28:11 +01:00
Dima Gerasimov
ef120bc643 twitter.talon: expand URLs 2022-05-31 01:28:11 +01:00
Dima Gerasimov
946daf40d0 twitter: prefer archive data over twidump for tweets
also add a script to check twitter data
2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb4c77612b twitter.twint: fix missing mentions in tweet text 2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb6201bf2d my.twitter.archive: expand entities in tweet text 2022-05-31 01:28:11 +01:00
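Tweet text in the archive contains shortened t.co links, while the full targets live in the entities section. A minimal sketch, assuming archive-style fields (full_text, plus entities.urls entries with url and expanded_url); the real module may rely on the indices field or handle more entity types.

```python
from typing import Any, Dict


def expand_urls(tweet: Dict[str, Any]) -> str:
    """Replace t.co short links in the tweet text with their expanded targets."""
    text: str = tweet['full_text']
    for u in tweet.get('entities', {}).get('urls', []):
        text = text.replace(u['url'], u['expanded_url'])
    return text


tweet = {
    'full_text': 'new post https://t.co/abc123',
    'entities': {'urls': [{'url': 'https://t.co/abc123', 'expanded_url': 'https://example.com/post'}]},
}
print(expand_urls(tweet))  # new post https://example.com/post
```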
Dima Gerasimov
1e2fc3bec7 twitter.archive: unescape stuff like &lt/&gt 2022-05-31 01:28:11 +01:00
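The archive keeps characters such as < and > as HTML entities (&lt;, &gt;, &amp;), and the standard library's html.unescape reverses this. A minimal sketch with a hypothetical wrapper name:

```python
import html


def clean_text(raw: str) -> str:
    """Turn HTML entities from the archive back into literal characters."""
    return html.unescape(raw)


print(clean_text('1 &lt; 2 &amp;&amp; 3 &gt; 2'))  # 1 < 2 && 3 > 2
```

Apply it exactly once: a second pass would turn a literal &amp;lt; into <.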
Dima Gerasimov
44a6b17ec3 twitter: use created_at as an extra key for merging 2022-05-31 01:28:11 +01:00
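Merging tweets from several sources comes down to deduplicating on a shared key while letting the preferred source win, for example archive data over twidump. A minimal sketch, assuming a hypothetical frozen Tweet dataclass and (id_str, created_at) as the merge key; the actual modules' types and merge helpers differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal tweet type for illustration only
    id_str: str
    created_at: datetime
    text: str
    source: str


def merge_tweets(*sources: Iterable[Tweet]) -> Iterator[Tweet]:
    """Merge tweet streams; earlier sources win when the same tweet appears twice."""
    seen: Set[Tuple[str, datetime]] = set()
    for src in sources:
        for t in src:
            key = (t.id_str, t.created_at)
            if key in seen:
                continue  # already emitted from a higher-priority source
            seen.add(key)
            yield t
```

Passing sources in priority order (archive first) is what encodes the preference; the key itself stays source-agnostic.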
Dima Gerasimov
4104f821fa twitter.twint: actually need to treat created_at as UTC 2022-05-31 01:28:11 +01:00
Dima Gerasimov
d65e1b5245 twitter.twint: localize timestamps correctly
same issue as discussed here https://memex.zulipchat.com/#narrow/stream/279610-data/topic/google.20takeout.20timestamps

also see corresponding changes for google_takeout_parser

- https://github.com/seanbreckenridge/google_takeout_parser/pull/28/files
- https://github.com/seanbreckenridge/google_takeout_parser/pull/30/files
2022-05-31 01:28:11 +01:00
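The two twint timestamp fixes come down to one rule: a naive timestamp that is known to be UTC must have UTC attached explicitly, because datetime.astimezone() on a naive value assumes the machine's local time zone. A minimal sketch, assuming a hypothetical 'YYYY-MM-DD HH:MM:SS' input format:

```python
from datetime import datetime, timezone


def parse_created_at(s: str) -> datetime:
    """Parse a naive UTC timestamp string into an aware datetime (sketch)."""
    naive = datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
    # Attach UTC explicitly; naive.astimezone() would treat the value as local time
    return naive.replace(tzinfo=timezone.utc)


print(parse_created_at('2022-05-31 01:28:11').isoformat())  # 2022-05-31T01:28:11+00:00
```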
Dima Gerasimov
de7972be05 twitter: add permalink to Talon objects; extract shared method 2022-05-31 01:28:11 +01:00
Sean Breckenridge
62832a6756 twitter/archive: set default logger to warning 2022-02-09 23:18:24 +00:00
Sean Breckenridge
b6fa26b899 twitter/archive: update deprecated imports 2022-02-09 23:18:24 +00:00
Dima Gerasimov
b9852f45cf twitter: use import_source and proper merging for tweets from different sources
+ use proper datetime_aware for created_at
2022-02-08 20:45:10 +00:00
Dima Gerasimov
afdf9d4334 twitter: initial talon module, processing data from Talon android app 2022-02-08 20:45:10 +00:00
Sean Breckenridge
5ecd4b4810 cleanup; remove unused imports 2021-04-02 08:38:06 +01:00
Dima Gerasimov
5ef638694e minor requirements updates 2021-03-08 00:40:19 +00:00
Dima Gerasimov
571cb48aea core: add modules_ast for more robust module collection 2020-12-11 07:02:16 +01:00
Dima Gerasimov
15789a4149 kython.kompress: move to core (with a fallback, used in promnesia) 2020-10-29 03:13:18 +01:00
Dima Gerasimov
fd41caa640 core: add __NOT_HPI_MODULE__ flag to mark utility files etc
(more of an intermediate solution perhaps)
2020-09-30 21:54:09 +02:00
Dima Gerasimov
fbaa8e0b44 core: add warnings helper to highlight warnings so they are more visible in the output 2020-09-27 17:47:30 +02:00
Dima Gerasimov
626ee994bf twint: open database in read only mode 2020-07-31 12:22:13 +01:00
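SQLite opens a database read-only when given a URI filename with mode=ro, and Python's sqlite3.connect accepts such URIs when uri=True is passed. A minimal sketch with a hypothetical helper name:

```python
import sqlite3
from pathlib import Path


def connect_readonly(db: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write raises an error."""
    # as_uri() needs an absolute path and percent-encodes unusual characters
    uri = db.resolve().as_uri() + '?mode=ro'
    return sqlite3.connect(uri, uri=True)
```

Besides protecting the original export, mode=ro makes connecting fail when the path is wrong, instead of silently creating an empty database.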
Dima Gerasimov
1cc4eb5d8d core: add helper for computing stats; use it in modules 2020-06-04 22:19:34 +01:00