prior to this change it would error with
@dataclass
> class pushshift_config(uconfig.pushshift):
E AttributeError: type object 'test_config' has no attribute 'pushshift'
also updated the MODULE_DESIGN docs to mention the
current workaround for converting single file
modules to namespace packages through the
deprecation warning
* initial pushshift/rexport merge implementation, using id for merging
* smarter module deprecation warning using regex
* add `RedditBase` from promnesia
* `import_source` helper for gracefully handing mixin data sources
* my.core.serialize: simplejson support, more types
I added a couple extra checks to the default function,
serializing datetime, dates and dataclasses (incase
orjson isn't installed)
(copied from below)
if orjson couldn't be imported, try simplejson
This is included for compatibility reasons because orjson
is rust-based and compiling on rarer architectures may not work
out of the box
as an example, I've been having issues getting it to install
on my phone (termux/android)
unlike the builtin JSON modue which serializes NamedTuples as lists
(even if you provide a default function), simplejson correctly
serializes namedtuples to dictionaries
this just gives another option to people, simplejson is pure python
so no one should have issues with that. orjson is still way faster,
so still preferable if its easy and theres a precompiled build
for your architecture (which there typically is)
If you're ever running this with simplejson installed and not orjson,
its pretty easy to tell as the JSON styling is different; orjson has
no spaces between tokens, simplejson puts spaces between tokens. e.g.
simplejson: {"a": 5, "b": 10}
orjson: {"a":5,"b":10}
allows you to do something like
hpi query --stream my.reddit.comments
to stream the JSON objects one per line, makes
it nicer to pipe into 'jq'/'fzf' instead
of having to process the giant list
at the end
- restructure query code for cli, some test fixes
- initial query_range implementation
refactored functions in query some more
to allow re-use in range_range, select()
pretty much just calls out to a bunch
of handlers now
everything is backwards-compatible with the previous
interface, the only minor changes were to the doctor cmd
which can now accept more than one item to run,
and the --skip-config-check to skip the config_ok
check if the user specifies to
added a test using click.testing.CliRunner (tests
the CLI in an isolated environment), though
additional tests which aren't testing the CLI
itself (parsing arguments or decorator behaviour)
can just call the functions themselves, as they
no longer accept a argparser.Namespace and instead
accept the direct arguments
sadly it seems that there are at several issues:
- gdpr has less detailed data so it's hard to generate a proper ID at times
- sometimes there is a small (1s?) discrepancy between created_at between same event in GDPR an API
- some API events can have duplicate payload, but different id, which violates uniqueness