46 commits

Author SHA1 Message Date
Dima Gerasimov
5c82d0faa9 switch from using dataset to raw sqlite3 module
dataset is kinda unmaintained and currently broken due to sqlalchemy 2.0 changes

resolves https://github.com/karlicoss/HPI/issues/264
2023-02-07 01:57:00 +00:00
Dima Gerasimov
5f1d41fa52 my.twitter.archive: fix for newer format (tweets filename changed to tweets.js) 2022-10-19 00:06:23 +01:00
Dima Gerasimov
ca91be8154 twitter.archive: fix legacy config detection
apparently .name contains the parent module so previously it was throwing the exception instead
2022-10-19 00:06:23 +01:00
Dima Gerasimov
4e59a65f9a core/general: move cached_property into compat, use standard implementation from python3.8 2022-05-31 14:08:50 +01:00
Dima Gerasimov
711157e0f5 my.twitter.archive: switch to zippath, add config section, better mypy coverage 2022-05-31 14:08:50 +01:00
Dima Gerasimov
d092608002 twitter.talon: make retweets more compatible with twitter archive 2022-05-31 01:28:11 +01:00
Dima Gerasimov
ef120bc643 twitter.talon: expand URLs 2022-05-31 01:28:11 +01:00
Dima Gerasimov
946daf40d0 twitter: prefer archive data over twidump for tweets
also add a script to check twitter data
2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb4c77612b twitter.twint: fix missing mentions in tweet text 2022-05-31 01:28:11 +01:00
Dima Gerasimov
bb6201bf2d my.twitter.archive: expand entities in tweet text 2022-05-31 01:28:11 +01:00
Dima Gerasimov
1e2fc3bec7 twitter.archive: unescape stuff like &lt/&gt 2022-05-31 01:28:11 +01:00
Dima Gerasimov
44a6b17ec3 twitter: use created_at as an extra key for merging 2022-05-31 01:28:11 +01:00
Dima Gerasimov
4104f821fa twitter.twint: actually need to treat created_at as UTC 2022-05-31 01:28:11 +01:00
Dima Gerasimov
d65e1b5245 twitter.twint: localize timestamps correctly
same issue as discussed here https://memex.zulipchat.com/#narrow/stream/279610-data/topic/google.20takeout.20timestamps

also see corresponding changes for google_takeout_parser

- https://github.com/seanbreckenridge/google_takeout_parser/pull/28/files
- https://github.com/seanbreckenridge/google_takeout_parser/pull/30/files
2022-05-31 01:28:11 +01:00
Dima Gerasimov
de7972be05 twitter: add permalink to Talon objects; extract shared method 2022-05-31 01:28:11 +01:00
Sean Breckenridge
62832a6756 twitter/archive: set default logger to warning 2022-02-09 23:18:24 +00:00
Sean Breckenridge
b6fa26b899 twitter/archive: update deprecated imports 2022-02-09 23:18:24 +00:00
Dima Gerasimov
b9852f45cf twitter: use import_source and proper merging for tweets from different sources
+ use proper datetime_aware for created_at
2022-02-08 20:45:10 +00:00
Dima Gerasimov
afdf9d4334 twitter: initial talon module, processing data from Talon android app 2022-02-08 20:45:10 +00:00
Sean Breckenridge
5ecd4b4810 cleanup; remove unused imports 2021-04-02 08:38:06 +01:00
Dima Gerasimov
5ef638694e minor requirements updates 2021-03-08 00:40:19 +00:00
Dima Gerasimov
571cb48aea core: add modules_ast for more robust module collection 2020-12-11 07:02:16 +01:00
Dima Gerasimov
15789a4149 kython.kompress: move to core (with a fallback, used in promnesia) 2020-10-29 03:13:18 +01:00
Dima Gerasimov
fd41caa640 core: add __NOT_HPI_MODULE__ flag to mark utility files etc
(more of an intermediate solution perhaps)
2020-09-30 21:54:09 +02:00
Dima Gerasimov
fbaa8e0b44 core: add warnings helper to highlight warnings so they are more visible in the output 2020-09-27 17:47:30 +02:00
Dima Gerasimov
626ee994bf twint: open database in read only mode 2020-07-31 12:22:13 +01:00
Dima Gerasimov
1cc4eb5d8d core: add helper for computing stats; use it in modules 2020-06-04 22:19:34 +01:00
Dima Gerasimov
a267aeec5b github: add config templates + docs
- ghexport: use export_path (export_dir is still supported)
2020-06-01 23:33:34 +01:00
Dima Gerasimov
ca39187c63 github: DEPRECATE my.coding.github
Instead my.github.all should be used (still backward compatible)

The reasons are
a) I don't feel that grouping (i.e. my.coding.*) makes much sense
b) using .all pattern (same way as twitter) allows for more composable and cleaner separation of GDPR and API data
2020-06-01 22:49:31 +01:00
Dima Gerasimov
216944b3cd core: improvements for warnings, twitter/rss: try using @warn_if_empty 2020-05-25 00:56:03 +01:00
Dima Gerasimov
f5267d05d7 my.twitter.archive: rename config (preserving backward compatibility for now) 2020-05-24 13:06:52 +01:00
Dima Gerasimov
b99b2f3cfa core: add warning when get_files returns no files, my.twitter.archive: make more defensive in case of no archives 2020-05-24 12:51:23 +01:00
Dima Gerasimov
b7662378a2 docs: minor updates 2020-05-22 19:38:14 +01:00
Dima Gerasimov
03773a7b2c twitter module: prettify top level twitter.all 2020-05-22 19:00:02 +01:00
Dima Gerasimov
63d4198fd9 rss module: prettify & reorganize to allow for easily adding extra modules 2020-05-13 22:58:09 +01:00
Dima Gerasimov
976b3da6f4 Autoextract documentation for some modules, improve docs 2020-05-10 18:09:12 +01:00
Dima Gerasimov
9cb39103c6 start autogenerating documentation on modules 2020-05-10 16:42:40 +01:00
Dima Gerasimov
e92ca215e3 Adapt takeout and twitter configs to the new pattern
Works fairly well so far?
2020-05-10 15:56:57 +01:00
Dima Gerasimov
8b8a85e8c3 kompress.kopen improvements
- tests
- uniform handling for bytes/str, always return utf8 str by default
2020-05-04 08:37:36 +01:00
Dima Gerasimov
51ae8601b4 Update docstrings and add links 2020-04-26 16:50:06 +01:00
Dima Gerasimov
96a850faf9 remove unnecessary methods from twitter provider 2020-04-20 08:38:01 +01:00
Dima Gerasimov
1d681eb802 typo fix. that was embarrassing! 2020-04-14 23:27:00 +01:00
Dima Gerasimov
81986b0624 support likes from twint 2020-04-14 23:01:44 +01:00
Dima Gerasimov
69a1624f8f use more-itertools; merge tweets 2020-04-14 22:15:35 +01:00
Dima Gerasimov
30b6918a8d unified view for twitter data 2020-04-14 22:05:47 +01:00
Dima Gerasimov
56b6ab9aaf move twitter stuff to twitter subdir 2020-04-14 21:38:21 +01:00