adapted from https://github.com/seanbreckenridge/HPI/blob/master/my/google_takeout.py
additions:
- pass my.core.time.user_forced() to google_takeout_parser
without it, BST gets weird results for me, e.g. US/Aleutian
- support ZipPath via a config switch
- flexible error handling via a config switch
previously it would crash with:
SyntaxError: Forward reference must be an expression -- got 'yield'
(reproducible via python3 -c 'from typing import Union; Union[int, "yield"]' )
* core: add ZipPath encapsulating compressed zip files
this way you don't have to unpack the archive first and can work with it as if it were a 'virtual' directory
related: https://github.com/karlicoss/HPI/issues/20
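
a rough sketch of the 'virtual directory' idea, using the stdlib zipfile.Path rather than HPI's own ZipPath (whose exact interface isn't shown here). The archive layout and the helper names build_demo_archive/list_json_files are made up for illustration:

```python
import io
import zipfile


def build_demo_archive() -> io.BytesIO:
    """Create a small in-memory zip with a made-up takeout-like layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("Takeout/My Activity/Search/MyActivity.json", '[{"title": "demo"}]')
        zf.writestr("Takeout/archive_browser.html", "<html></html>")
    buf.seek(0)
    return buf


def list_json_files(root: zipfile.Path) -> list[str]:
    """Walk the archive like a directory tree, without unpacking it."""
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        for child in current.iterdir():
            if child.is_dir():
                stack.append(child)
            elif child.name.endswith(".json"):
                found.append(child.at)  # path of the entry inside the archive
    return sorted(found)


root = zipfile.Path(zipfile.ZipFile(build_demo_archive()))
for name in list_json_files(root):
    print(name)
```

files are read lazily through the same object, e.g. (root / "Takeout" / "archive_browser.html").read_text(), so nothing is extracted to disk.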
core/source: use ImportError
uses the broader ImportError instead of ModuleNotFoundError
reasoning: if some submodule (the one I'm configuring currently
is my.twitter.twint) doesn't have additional imports from another
parser/DAL but still has a config block, the user would otherwise
have to create a stub config block in their config to use the
all.py file
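
to illustrate why the broader exception matters: ModuleNotFoundError is a subclass of ImportError, so catching ImportError covers both a missing module and a name that can't be imported from an existing module. This is only a sketch; iter_or_skip and the three loader functions are hypothetical and not the actual core/source code:

```python
import importlib
from typing import Any, Callable, Iterable, Iterator


def iter_or_skip(load: Callable[[], Iterable[Any]]) -> Iterator[Any]:
    """Yield items from a source, skipping it on any ImportError."""
    try:
        yield from load()
    except ImportError as e:
        print(f"skipping source: {e}")


def missing_dependency() -> Iterable[Any]:
    # raises ModuleNotFoundError (the module name is made up)
    importlib.import_module("hypothetical_missing_dal_module_xyz")
    return ["never reached"]


def missing_config_block() -> Iterable[Any]:
    # raises a plain ImportError: the module exists, the name does not
    from json import hypothetical_config_block  # noqa: F401
    return ["never reached"]


def working_source() -> Iterable[Any]:
    return ["tweet"]


for source in (missing_dependency, missing_config_block, working_source):
    print(list(iter_or_skip(source)))
```

catching only ModuleNotFoundError would let the second case propagate and break the combined all.py-style iteration.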
adds short versions of some CLI flags for convenience
the --stream flag previously only affected json output, but
I can imagine '-o pprint -s -l 5', to print the first
5 items from some function, being useful as well
* initial pushshift/rexport merge implementation, using id for merging
* smarter module deprecation warning using regex
* add `RedditBase` from promnesia
* `import_source` helper for gracefully handling mixin data sources
* my.core.serialize: simplejson support, more types
I added a couple of extra checks to the default function,
serializing datetimes, dates and dataclasses (in case
orjson isn't installed)
(copied from below)
if orjson couldn't be imported, try simplejson
This is included for compatibility reasons because orjson
is rust-based and compiling on rarer architectures may not work
out of the box
as an example, I've been having issues getting it to install
on my phone (termux/android)
unlike the builtin json module, which serializes NamedTuples as lists
(even if you provide a default function), simplejson correctly
serializes namedtuples to dictionaries
this just gives people another option; simplejson is pure python,
so no one should have issues with that. orjson is still way faster,
so it's still preferable if it's easy to install and there's a
precompiled build for your architecture (which there typically is)
If you're ever running this with simplejson installed and not orjson,
it's pretty easy to tell, as the JSON styling is different: orjson has
no spaces between tokens, simplejson puts spaces between tokens. e.g.
simplejson: {"a": 5, "b": 10}
orjson: {"a":5,"b":10}
allows you to do something like
hpi query --stream my.reddit.comments
to stream the JSON objects one per line, makes
it nicer to pipe into 'jq'/'fzf' instead
of having to process the giant list
at the end
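
roughly, the difference between the two output modes looks like this. It's a sketch using the builtin json module; dump_list and dump_stream are hypothetical names, not the functions the CLI actually uses:

```python
import json
import sys
from typing import Any, Iterable, TextIO


def dump_list(items: Iterable[Any], out: TextIO) -> None:
    """Non-streaming: everything is collected first, then written as one list."""
    out.write(json.dumps(list(items)))
    out.write("\n")


def dump_stream(items: Iterable[Any], out: TextIO) -> None:
    """Streaming: one JSON object per line, written as soon as it is produced."""
    for item in items:
        out.write(json.dumps(item))
        out.write("\n")


def comments() -> Iterable[dict]:
    # stand-in for a function like my.reddit.comments
    for i in range(3):
        yield {"id": i, "text": f"comment {i}"}


dump_stream(comments(), sys.stdout)
```

since each line is a complete JSON document, line-oriented tools can start working before the source function has finished.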
- restructure query code for cli, some test fixes
- initial query_range implementation
refactored functions in query some more
to allow re-use in query_range; select()
pretty much just calls out to a bunch
of handlers now
everything is backwards-compatible with the previous
interface; the only minor changes were to the doctor cmd,
which can now accept more than one item to run,
and the --skip-config-check flag, which skips the
config_ok check when the user passes it
added a test using click.testing.CliRunner (tests
the CLI in an isolated environment), though
additional tests which aren't testing the CLI
itself (parsing arguments or decorator behaviour)
can just call the functions themselves, as they
no longer accept an argparse.Namespace and instead
accept the direct arguments
* core: discovery_pure; allow multiple package roots
iterates over my.__path__._path if possible
to discover additional paths to iterate over
else defaults to the path relative to
the current file
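
a sketch of the lookup, assuming CPython, where a namespace package's __path__ is a _NamespacePath object with a private _path list. package_roots is a hypothetical helper, and falling back to the package's own __file__ is my simplification of 'the path relative to the current file':

```python
import json
from pathlib import Path
from types import ModuleType


def package_roots(pkg: ModuleType) -> list[Path]:
    """Return every directory that contributes to a package."""
    # namespace packages: __path__ wraps a list of directories in _path
    paths = getattr(getattr(pkg, "__path__", None), "_path", None)
    if paths:
        return [Path(p) for p in paths]
    # regular packages/modules: fall back to the directory containing the file
    return [Path(pkg.__file__).parent]


# json is a regular package, so this takes the fallback branch
print(package_roots(json))
```

_path is a private attribute, so this can break between interpreter versions, hence the 'if possible' with a fallback.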
- modernize:
- add REQUIRES spec for pdfannots library
- config dataclass/config stub
- stats function
- absolute my.core imports in anticipation of splitting core
- use 'paths' instead of 'roots' (better reflects the semantics), use get_files
backward compatible via config migration
- properly run tests/mypy