seanbreckenridge
9e5cd60ff2
browser: parse browser history using browserexport (#216)
...
* browser: parse browser history using browserexport
from seanbreckenridge/HPI module:
1fba8ccf2f/my/browser/export.py
2022-02-13 23:56:05 +00:00
Sean Breckenridge
059c4ae791
docs: add link to template
2022-02-11 09:33:03 +00:00
Sean Breckenridge
a791b25650
core/cli: add --debug flag, add HPI_LOGS to docs
2022-02-11 09:31:10 +00:00
seanbreckenridge
7bf316eb9a
core/source: use import error (#211)
...
core/source: use import error
uses the broader ImportError
instead of ModuleNotFoundError
reasoning: if some submodule
(the one I'm configuring currently is
my.twitter.twint) doesn't have additional
imports from another parser/DAL but
still has a config block, the user would
otherwise have to create a stub config block
in their config to use the all.py file
2022-02-10 08:57:52 +00:00
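The reasoning in the entry above relies on ModuleNotFoundError being a subclass of ImportError, so catching ImportError also covers imports that fail for reasons other than a missing module. The following is a minimal sketch of that behaviour only. The helper `try_import` and the missing names are hypothetical, and this is not HPI's actual `import_source` implementation.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Hypothetical helper: return the module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Also catches ModuleNotFoundError, which subclasses ImportError.
        return None


# A module that does not exist raises ModuleNotFoundError, which is still caught.
missing = try_import("hypothetical_module_that_does_not_exist")
# A module that exists is returned as usual.
present = try_import("json")

# A module that exists but lacks the requested name raises plain ImportError,
# which an `except ModuleNotFoundError` clause alone would not catch.
try:
    from json import hypothetical_missing_name  # noqa: F401
except ModuleNotFoundError:
    kind = "ModuleNotFoundError"
except ImportError:
    kind = "ImportError"
```

The last case is the one that matters for the commit: a handler that only catches ModuleNotFoundError lets this error propagate.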
seanbreckenridge
bea2c6a201
core/structure: add partial matching (#212)
...
* core/structure: add partial matching
2022-02-10 08:49:13 +00:00
Sean Breckenridge
62832a6756
twitter/archive: set default logger to warning
2022-02-09 23:18:24 +00:00
Sean Breckenridge
b6fa26b899
twitter/archive: update deprecated imports
2022-02-09 23:18:24 +00:00
Dima Gerasimov
b9852f45cf
twitter: use import_source and proper merging for tweets from different sources
...
+ use proper datetime_aware for created_at
2022-02-08 20:45:10 +00:00
Dima Gerasimov
afdf9d4334
twitter: initial talon module, processing data from Talon android app
2022-02-08 20:45:10 +00:00
Dima Gerasimov
f8e73134b3
fbmessenger: add all.py, merge messages from different sources
...
followup for https://github.com/karlicoss/HPI/pull/179
2022-02-08 19:21:44 +00:00
Dima Gerasimov
4626c1bba6
fbmessenger: support config migration for fbmessengerexport source
...
for now kinda copied from reddit... still thinking about a more generic way
2022-02-05 14:49:12 +00:00
Dima Gerasimov
403ca9c111
fbmessenger: process Android app data
...
for now, no merging, will figure it out later
2022-02-05 14:49:12 +00:00
Dima Gerasimov
fcd7ca6480
fbmessenger: only import from .export in legacy mode
2022-02-05 14:49:12 +00:00
Dima Gerasimov
f78b12f005
ci: fix pytest.warns type error
...
use warnings.catch_warnings to suppress instead
https://docs.pytest.org/en/7.0.x/how-to/capture-warnings.html?highlight=warnings#additional-use-cases-of-warnings-in-tests
likely due to pytest update to version 7
2022-02-04 23:38:50 +00:00
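The entry above mentions switching to warnings.catch_warnings to suppress warnings. A minimal sketch of that standard-library pattern follows. The functions `noisy` and `call_quietly` are hypothetical and are not taken from the HPI test suite.

```python
import warnings


def noisy() -> int:
    """Hypothetical function that emits a warning as a side effect."""
    warnings.warn("this module is deprecated", DeprecationWarning)
    return 42


def call_quietly() -> int:
    # Filter changes made inside the context manager are undone on exit.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return noisy()


result = call_quietly()

# For comparison, record=True collects warnings instead of suppressing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()
```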
Dima Gerasimov
c4ad84ad95
move materialistic module inside hackernews package
...
followup for https://github.com/seanbreckenridge/HPI/pull/18
2022-02-04 23:38:50 +00:00
Dima Gerasimov
590e09f80b
hackernews: add initial dogsheep database importer
2022-02-04 23:38:50 +00:00
Dima Gerasimov
1e635502a2
instagram: initial module for GDPR export
...
still somewhat WIP, unclear how to correlate it with android data
2022-02-04 00:18:33 +00:00
Dima Gerasimov
0e891a267f
doctor: suggest config documentation in case of ImportError from config
...
doesn't help in all cases but perhaps helpful anyway
relevant: https://github.com/karlicoss/HPI/issues/109
2022-02-02 23:46:46 +00:00
Dima Gerasimov
d1f791dee8
my.fbmessenger: move fbmessenger.py into fbmessenger/export.py
...
keeping it backwards compatible + conditional warning similar to https://github.com/karlicoss/HPI/pull/179
follow up for https://github.com/seanbreckenridge/HPI/pull/18
for now without the __path__ hacking, will do it in bulk later
too lazy to run test_import_warnings.sh on CI for now, but figured I'd commit it for reference anyway
2022-02-02 23:22:45 +00:00
Dima Gerasimov
e30953195c
instagram: initial module for android app data (direct messages)
2022-02-02 21:50:43 +00:00
Sean Breckenridge
823668ca5c
make reddit.rexport logs info by default
...
can always be configured with HPI_LOGS
having this on debug makes hpi doctor
quite verbose
2022-02-02 00:35:54 +00:00
Dima Gerasimov
7ead8eb4c9
bumble: add initial module for android database
2022-01-30 23:56:24 +00:00
Dima Gerasimov
673ee53a49
my.zulip: add message permalink
2022-01-30 23:33:05 +00:00
Dima Gerasimov
a39b5605ae
my.zulip: extract Server/Sender objects, experiment with normalised and denormalised objects
2022-01-30 23:33:05 +00:00
Dima Gerasimov
a1f03f9c02
my.zulip: initial zulip module, parsing full public organization export archive
2022-01-27 22:58:33 +00:00
Dima Gerasimov
73c9e46c4c
core: better support for compressed stuff, add .tar.gz
2022-01-27 22:58:33 +00:00
Sean Breckenridge
7493770d4d
core: remove vendorized py37 isoformat code
2022-01-27 19:25:42 +00:00
Sean Breckenridge
03dd1271f4
cli/query: add short flags, stream affects pprint
...
adds some short flags as CLI flags for convenience
the --stream flag previously only affected json, but
I can imagine '-o pprint -s -l 5' to print the first
5 items from some function could be useful as well
2022-01-27 08:50:57 +00:00
Sean Breckenridge
3f4fb64d56
core: drop py36 support, update docs for reddit (#193)
...
* docs: update references to my.reddit
* ci: remove 3.6, add 3.9
2022-01-27 08:26:15 +00:00
Dima Gerasimov
be21606075
my.reddit: better handling for legacy reddit config
...
prior to this change it would error with
@dataclass
> class pushshift_config(uconfig.pushshift):
E AttributeError: type object 'test_config' has no attribute 'pushshift'
2021-12-24 18:02:37 +00:00
Dima Gerasimov
5e9cc2a6a0
my.reddit: enable CI tests
2021-12-24 18:02:37 +00:00
Sean Breckenridge
01dfbbd58e
use default for getattr instead of catching error
2021-12-19 19:33:31 +00:00
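The change named in the entry above, passing a default to getattr rather than catching the error, can be illustrated as follows. This is a sketch with a hypothetical `Config` class and attribute names. The code the commit actually touched is not shown in this log.

```python
from typing import Any


class Config:
    """Hypothetical config object with a single attribute set."""
    export_path = "/data/export"


def lookup_with_try(obj: object, name: str) -> Any:
    # Longer form: catch the AttributeError raised for a missing attribute.
    try:
        return getattr(obj, name)
    except AttributeError:
        return None


def lookup_with_default(obj: object, name: str) -> Any:
    # Shorter form: the third argument is returned when the attribute is missing.
    return getattr(obj, name, None)
```

Both helpers return the same value for present and missing attributes. The second avoids the try/except block.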
Sean Breckenridge
83725e49dd
cli/query: allow querying dynamic functions
2021-12-19 19:33:31 +00:00
Dima Gerasimov
dd928964e6
general: fix mypy errors after mypy and pytz stubs updates
...
see 968fd6d01d/stubs/pytz/pytz/tzinfo.pyi (L6)
it says all concrete instances should not be None
2021-12-19 18:53:29 +00:00
Dima Gerasimov
9578b13fca
my.pdf: handle update to pdfannots 0.2
...
undoes f5b47dd695, tests work properly now
resolves https://github.com/karlicoss/HPI/issues/180
2021-12-19 18:53:29 +00:00
Sean Breckenridge
074b8685d6
reddit: pass logger to cachew
...
so that HPI_LOGS can be used to interact
with this module, to check if cachew
is working properly
2021-12-19 18:25:50 +00:00
Sean Breckenridge
8033b5cdbd
docs/reddit: fix code block
2021-12-19 17:20:12 +00:00
Sean Breckenridge
4364484192
docs: add hpi query example, links to other repos
...
also updated the MODULE_DESIGN docs to mention the
current workaround for converting single file
modules to namespace packages through the
deprecation warning
2021-12-19 17:20:12 +00:00
Sean Breckenridge
d006339ab4
reddit: fix spelling mistakes
2021-11-03 20:18:10 +00:00
Sean Breckenridge
d6c484f321
reddit: ensure rexport isn't pointing to repo
2021-10-31 21:47:10 +00:00
Sean Breckenridge
5d2eadbcc6
reddit: swap inheritance order for Protocol (#183)
2021-10-31 21:24:16 +00:00
Sean Breckenridge
8422c6e420
my.reddit: refactor into module that supports pushshift/gdpr (#179)
...
* initial pushshift/rexport merge implementation, using id for merging
* smarter module deprecation warning using regex
* add `RedditBase` from promnesia
* `import_source` helper for gracefully handling mixin data sources
2021-10-31 20:39:04 +00:00
Dima Gerasimov
b54ec0d7f1
ci: fix minor mypy complaints from gitpython
2021-10-29 01:41:44 +01:00
Dima Gerasimov
f5b47dd695
ci: temporarily suppress pdfs tests so we can pass CI
...
see https://github.com/karlicoss/HPI/issues/180
2021-10-29 01:41:44 +01:00
Dima Gerasimov
68d77981db
ci: update python stuff, exclude 3.6 from osx
2021-10-29 01:41:44 +01:00
Sean Breckenridge
4a04c09f31
docs: fix copy-paste errors/spelling mistakes
2021-07-10 10:56:23 +01:00
Sean Breckenridge
46198a6447
my.core.serialize: simplejson support, more types (#176)
...
* my.core.serialize: simplejson support, more types
I added a couple of extra checks to the default function,
serializing datetimes, dates and dataclasses (in case
orjson isn't installed)
(copied from below)
if orjson couldn't be imported, try simplejson
This is included for compatibility reasons because orjson
is rust-based and compiling on rarer architectures may not work
out of the box
as an example, I've been having issues getting it to install
on my phone (termux/android)
unlike the builtin json module, which serializes NamedTuples as lists
(even if you provide a default function), simplejson correctly
serializes namedtuples to dictionaries
this just gives another option to people, simplejson is pure python
so no one should have issues with that. orjson is still way faster,
so still preferable if it's easy and there's a precompiled build
for your architecture (which there typically is)
If you're ever running this with simplejson installed and not orjson,
it's pretty easy to tell as the JSON styling is different; orjson has
no spaces between tokens, simplejson puts spaces between tokens. e.g.
simplejson: {"a": 5, "b": 10}
orjson: {"a":5,"b":10}
2021-07-08 23:02:56 +01:00
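The entry above notes that the builtin json module serializes NamedTuples as lists even when a default function is supplied. A minimal sketch of that behaviour follows, using only the standard library. Neither simplejson nor orjson is imported here, and `Point` and `default` are hypothetical names. The last two lines reproduce the spacing difference from the message with the stdlib encoder's `separators` argument.

```python
import json
from typing import Any, List, NamedTuple


class Point(NamedTuple):
    x: int
    y: int


calls: List[Any] = []


def default(obj: Any) -> Any:
    # Only invoked for objects the encoder cannot serialize natively.
    calls.append(obj)
    raise TypeError(f"cannot serialize {type(obj)}")


# A NamedTuple is a tuple subclass, so json encodes it as a list
# and never consults the default function.
as_list = json.dumps(Point(1, 2), default=default)

# Converting explicitly keeps the field names.
as_dict = json.dumps(Point(1, 2)._asdict())

# Spaced versus compact token styling.
spaced = json.dumps({"a": 5, "b": 10})
compact = json.dumps({"a": 5, "b": 10}, separators=(",", ":"))
```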
Sean Breckenridge
821bc08a23
core/structure: help locate/extract gdpr exports (#175)
...
* core/structure: help locate/extract gdpr exports
* ci: add install-types to install stub packages
2021-07-08 00:44:55 +01:00
Dima Gerasimov
8ca88bde2e
polar: backward compatibility for my.reading.polar
2021-05-29 13:26:01 +01:00
Dima Gerasimov
2a4bddea79
polar: move to top level, add page support
2021-05-29 13:26:01 +01:00