Dima Gerasimov
02dabe9f2b
my.twitter.archive: cleanup linting and use proper configuration via abstract class
2024-09-22 02:13:10 +01:00
Dima Gerasimov
239e6617fe
my.twitter.archive: deduplicate tweets based on id_str/created_at and raw tweet text
2024-09-22 02:13:10 +01:00
Dima Gerasimov
e036cc9e85
my.twitter.android: get own user id as string, consistent with rest of module
2024-09-22 02:13:10 +01:00
Dima Gerasimov
72cc8ff3ac
ruff: enable B warnings (mainly suppressed exceptions and unused variables)
2024-08-28 04:06:32 +01:00
Dima Gerasimov
b594377a59
ruff: enable RUF ruleset
2024-08-28 04:06:32 +01:00
Dima Gerasimov
d244c7cc4e
ruff: enable and fix C4 ruleset
2024-08-28 04:06:32 +01:00
Dima Gerasimov
b1fe23b8d0
my.rss.feedly/my.twitter.talon -- migrate to use lazy user configs
2024-08-26 04:00:58 +01:00
Dima Gerasimov
2c63fe25c0
my.twitter.android: get data from statuses table rather than timeline_view
2024-08-05 23:35:24 +01:00
karlicoss
d5fccf1874
twitter.android: more comments on timeline types
2024-08-03 16:50:09 +01:00
Dima Gerasimov
7236024c7a
my.twitter.android: better detection of own user id
2024-03-13 00:46:18 +00:00
karlicoss
1c452b12d4
twitter.android: extract likes and own tweets as well
2023-12-28 00:12:39 +00:00
karlicoss
51209c547e
my.twitter.android: refactor into a proper module
...
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
karlicoss
a4a7bc41b9
my.twitter.android: extract entities
2023-12-24 00:49:07 +00:00
karlicoss
3d75abafe9
my.twitter.android: some initial work on parsing sqlite databases from official Android app
2023-12-24 00:49:07 +00:00
karlicoss
71cb66df5f
core: add helper for more_itertools to check that all types involved are hashable
...
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
2023-10-31 01:02:17 +00:00
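A minimal sketch of the kind of check this commit describes, not the actual HPI helper: more_itertools.unique_everseen falls back to a list of seen items when an item is unhashable, so each lookup becomes a linear scan. The function name check_hashable is hypothetical and only illustrates the idea of failing early.

```python
from typing import Any, Iterable, Iterator


def check_hashable(items: Iterable[Any]) -> Iterator[Any]:
    """Pass items through unchanged, raising on the first unhashable one.

    Hypothetical helper: wrap an iterable with it before deduplicating,
    so unhashable items are reported instead of silently slowing things down.
    """
    for item in items:
        try:
            hash(item)
        except TypeError as e:
            raise TypeError(
                f"unhashable item {item!r}: deduplication would need a linear scan"
            ) from e
        yield item


# tuples and strings are hashable, so they pass through untouched
checked = list(check_hashable([1, "a", (1, 2)]))
```

Passing a dict or a list through the same wrapper raises TypeError, which is the signal to pick a hashable key instead.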
Dima Gerasimov
a5c04e789a
twitter.archive: deduplicate results via json.dumps
...
this speeds up processing quite a bit, from 40s to 20s for me, plus removes tons of identical outputs
interestingly enough, using the raw object without json.dumps as the key slows unique_everseen to a crawl...
2023-10-24 01:54:30 +01:00
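The deduplication idea above can be sketched with the standard library only. This is an illustration under assumptions, not the module's code: unique_by_key stands in for unique_everseen, and the tweet dicts and field names are made up. Serializing each raw object to a string gives a hashable key, so membership checks use a set.

```python
import json
from typing import Any, Callable, Hashable, Iterable, Iterator


def unique_by_key(items: Iterable[Any], key: Callable[[Any], Hashable]) -> Iterator[Any]:
    """Yield items whose key has not been seen before, preserving order."""
    seen = set()  # hashable keys only, so lookups stay O(1) on average
    for item in items:
        k = key(item)
        if k in seen:
            continue
        seen.add(k)
        yield item


def json_key(obj: Any) -> str:
    # sort_keys makes the key independent of dict insertion order
    return json.dumps(obj, sort_keys=True)


# hypothetical raw tweet objects; dicts themselves are not hashable
raw_tweets = [
    {"id_str": "1", "full_text": "hello"},
    {"full_text": "hello", "id_str": "1"},  # same content, different key order
    {"id_str": "2", "full_text": "world"},
]
deduped = list(unique_by_key(raw_tweets, key=json_key))
```

Here deduped keeps the first and third entries; the second one serializes to the same string as the first and is dropped.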
karlicoss
8c2d1c9463
general: use less explicit kompress boilerplate in modules
...
now get_files/kompress library can handle it transparently
2023-10-20 21:13:59 +01:00
Dima Gerasimov
0512488241
ci: sync configs to pymplate
...
- add python3.12
- add ruff
2023-10-06 02:24:01 +01:00
Dima Gerasimov
dff31455f1
general: switch to make_logger in a few modules, use a bit more consistent logging, rely on default INFO level
2023-06-21 18:42:15 +01:00
Dima Gerasimov
fe88380499
general: switch to using native 3.8 versions for cached_property/Literal/Protocol instead of compat
2023-05-16 01:18:30 +01:00
Dima Gerasimov
c34656e8fb
general: update mypy config, seems that lots of type: ignore aren't necessary anymore
2023-05-16 01:18:30 +01:00
Dima Gerasimov
9db5f318fb
my.twitter.twint: use dict row factory instead of sqlite Row
...
otherwise it's not json serializable
2023-03-17 00:33:22 +00:00
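A short sketch of the row factory approach, assuming an in-memory database with a made-up tweets table (the table and column names are not taken from twint). sqlite3.Row objects cannot be passed to json.dumps directly, while plain dicts can.

```python
import json
import sqlite3


def dict_factory(cursor: sqlite3.Cursor, row: tuple) -> dict:
    # cursor.description has one entry per column; its first element is the column name
    return {col[0]: value for col, value in zip(cursor.description, row)}


conn = sqlite3.connect(":memory:")
conn.row_factory = dict_factory  # every fetched row is now a plain dict
conn.execute("CREATE TABLE tweets (id_str TEXT, tweet TEXT)")
conn.execute("INSERT INTO tweets VALUES (?, ?)", ("1", "hello"))
rows = conn.execute("SELECT * FROM tweets").fetchall()
serialized = json.dumps(rows)  # works because rows are dicts, not sqlite3.Row
conn.close()
```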
Dima Gerasimov
c63177e186
general/ci: clean up mypy-misc pipeline, only exclude specific files instead
...
marked some module configs which aren't really ready for public use as type: ignore
2023-02-21 00:20:58 +00:00
Dima Gerasimov
5c82d0faa9
switch from using dataset to raw sqlite3 module
...
dataset is kinda unmaintained and currently broken due to sqlalchemy 2.0 changes
resolves https://github.com/karlicoss/HPI/issues/264
2023-02-07 01:57:00 +00:00
Dima Gerasimov
5f1d41fa52
my.twitter.archive: fix for newer format (tweets filename changed to tweets.js)
2022-10-19 00:06:23 +01:00
Dima Gerasimov
ca91be8154
twitter.archive: fix legacy config detection
...
apparently .name contains the parent module so previously it was throwing the exception instead
2022-10-19 00:06:23 +01:00
Dima Gerasimov
4e59a65f9a
core/general: move cached_property into compat, use standard implementation from python3.8
2022-05-31 14:08:50 +01:00
Dima Gerasimov
711157e0f5
my.twitter.archive: switch to zippath, add config section, better mypy coverage
2022-05-31 14:08:50 +01:00
Dima Gerasimov
d092608002
twitter.talon: make retweets more compatible with twitter archive
2022-05-31 01:28:11 +01:00
Dima Gerasimov
ef120bc643
twitter.talon: expand URLs
2022-05-31 01:28:11 +01:00
Dima Gerasimov
946daf40d0
twitter: prefer archive data over twidump for tweets
...
also add a script to check twitter data
2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb4c77612b
twitter.twint: fix missing mentions in tweet text
2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb6201bf2d
my.twitter.archive: expand entities in tweet text
2022-05-31 01:28:11 +01:00
Dima Gerasimov
1e2fc3bec7
twitter.archive: unescape stuff like &lt;/&gt;
2022-05-31 01:28:11 +01:00
Dima Gerasimov
44a6b17ec3
twitter: use created_at as an extra key for merging
2022-05-31 01:28:11 +01:00
Dima Gerasimov
4104f821fa
twitter.twint: actually need to treat created_at as UTC
2022-05-31 01:28:11 +01:00
Dima Gerasimov
d65e1b5245
twitter.twint: localize timestamps correctly
...
same issue as discussed here https://memex.zulipchat.com/#narrow/stream/279610-data/topic/google.20takeout.20timestamps
also see corresponding changes for google_takeout_parser
- https://github.com/seanbreckenridge/google_takeout_parser/pull/28/files
- https://github.com/seanbreckenridge/google_takeout_parser/pull/30/files
2022-05-31 01:28:11 +01:00
Dima Gerasimov
de7972be05
twitter: add permalink to Talon objects; extract shared method
2022-05-31 01:28:11 +01:00
Sean Breckenridge
62832a6756
twitter/archive: set default logger to warning
2022-02-09 23:18:24 +00:00
Sean Breckenridge
b6fa26b899
twitter/archive: update deprecated imports
2022-02-09 23:18:24 +00:00
Dima Gerasimov
b9852f45cf
twitter: use import_source and proper merging for tweets from different sources
...
+ use proper datetime_aware for created_at
2022-02-08 20:45:10 +00:00
Dima Gerasimov
afdf9d4334
twitter: initial talon module, processing data from Talon android app
2022-02-08 20:45:10 +00:00
Sean Breckenridge
5ecd4b4810
cleanup; remove unused imports
2021-04-02 08:38:06 +01:00
Dima Gerasimov
5ef638694e
minor requirements updates
2021-03-08 00:40:19 +00:00
Dima Gerasimov
571cb48aea
core: add modules_ast for more robust module collection
2020-12-11 07:02:16 +01:00
Dima Gerasimov
15789a4149
kython.kompress: move to core (with a fallback, used in promnesia)
2020-10-29 03:13:18 +01:00
Dima Gerasimov
fd41caa640
core: add __NOT_HPI_MODULE__ flag to mark utility files etc
...
(more of an intermediate solution perhaps)
2020-09-30 21:54:09 +02:00
Dima Gerasimov
fbaa8e0b44
core: add warnings helper to highlight warnings so they are more visible in the output
2020-09-27 17:47:30 +02:00
Dima Gerasimov
626ee994bf
twint: open database in read only mode
2020-07-31 12:22:13 +01:00
Dima Gerasimov
1cc4eb5d8d
core: add helper for computing stats; use it in modules
2020-06-04 22:19:34 +01:00