fix fairly uncontroversial stuff in my.core like
- line spacing, which isn't too annoying (e.g. unlike many inline whitespace checks that break vertical formatting)
- unused imports/variables
- overly broad except clauses
- use portable separators
- path literals should be prefixed with r' (so a backslash isn't treated as an escape character)
- sqlite connections should be closed (otherwise Windows fails to remove the underlying db file)
- work around emoji handling via PYTHONUTF8=1 in tests for now
- make ZipPath portable
- properly use tox python environment everywhere
this was causing issues on Windows
e.g.
WARNING: test command found but not installed in testenv
cmd: C:\hostedtoolcache\windows\Python\3.9.12\x64\python3.EXE
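To illustrate the sqlite point from the list above: on Windows an open connection keeps a handle on the database file, so the file can't be removed until the connection is closed. A minimal sketch using only the standard library; the count_rows helper and the items table are hypothetical and not part of my.core:

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def count_rows(db: Path) -> int:
    # hypothetical helper, for illustration only.
    # closing() guarantees conn.close() is called; 'with conn:' on its own
    # only manages the transaction and does NOT close the connection
    with closing(sqlite3.connect(db)) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS items (x INTEGER)')
        conn.execute('INSERT INTO items VALUES (1)')
        conn.commit()
        (count,) = conn.execute('SELECT COUNT(*) FROM items').fetchone()
        return count


with tempfile.TemporaryDirectory() as td:
    db = Path(td) / 'test.db'
    rows = count_rows(db)
    # the connection is closed by now, so removing the file is safe
    db.unlink()
    removed = not db.exists()
```

The same pattern applies to any code that opens a database and later expects to delete or replace the file.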
doctor: better quick option propagation for stats
* use contextmanager for quick stats instead of editing global state
directly
* pass quick to many stats-related functions, so they
could be used without doctor, if someone wanted to
* if a stats function has a 'quick' kwarg, send the value
there as well
* add an option to sort locations in my.time.tz.via_location
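A rough sketch of the quick-stats idea described above: a context manager that sets the flag and always restores it, plus passing 'quick' only to functions that declare it. All names here (_QUICK, quick_stats, call_stats) are assumptions for illustration and may not match what my.core actually uses:

```python
import inspect
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# hypothetical module-level default
_QUICK = False


@contextmanager
def quick_stats(value: bool = True) -> Iterator[None]:
    # set the flag for the duration of the block, then restore the old value,
    # even if the body raises
    global _QUICK
    previous = _QUICK
    _QUICK = value
    try:
        yield
    finally:
        _QUICK = previous


def call_stats(func: Callable[..., Any]) -> Any:
    # pass 'quick' only if the stats function has such a parameter
    if 'quick' in inspect.signature(func).parameters:
        return func(quick=_QUICK)
    return func()


def stats_with_quick(quick: bool = False) -> dict:
    return {'quick': quick}


def stats_plain() -> dict:
    return {'plain': True}
```

Inside 'with quick_stats():' a call like call_stats(stats_with_quick) sees quick=True, and the previous value is back once the block exits, so callers never have to reset the flag by hand.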
- fallback on the old logic if google_takeout_parser isn't available
- move to my.youtube.takeout (possibly mixing in other sources later)
- keep my.media.youtube, but issue deprecation warning
currently used in orger etc, so doesn't hurt to keep
- also fixes https://github.com/karlicoss/HPI/issues/113
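For the "keep the old module, but warn" item above, a minimal sketch of issuing a deprecation warning with the standard warnings module. The warn_deprecated helper is hypothetical; the actual HPI code may do this differently:

```python
import warnings


def warn_deprecated(old: str, new: str) -> None:
    # hypothetical helper: stacklevel=2 points the warning at the caller
    # rather than at this function
    warnings.warn(
        f'{old} is deprecated, please use {new} instead',
        DeprecationWarning,
        stacklevel=2,
    )


# an old module could call this at import time, e.g.
# warn_deprecated('my.media.youtube', 'my.youtube.takeout')
```

Note that Python hides DeprecationWarning by default outside of __main__ and test runners, so users may need -W default to see it.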
adapted from https://github.com/seanbreckenridge/HPI/blob/master/my/google_takeout.py
additions:
- pass my.core.time.user_forced() to google_takeout_parser
without it, BST gets weird results for me, e.g. US/Aleutian
- support ZipPath via a config switch
- flexible error handling via a config switch
previously it would crash with:
SyntaxError: Forward reference must be an expression -- got 'yield'
(reproducible via python3 -c 'from typing import Union; Union[int, "yield"]' )
* core: add ZipPath encapsulating compressed zip files
this way you don't have to unpack it first and can work as if it's a 'virtual' directory
related: https://github.com/karlicoss/HPI/issues/20
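To give a feel for the 'virtual directory' idea, here is a sketch using the standard library's zipfile.Path. This is only an illustration of the concept, not the ZipPath class from my.core, and the archive layout is made up:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as td:
    archive = Path(td) / 'export.zip'
    # build a small archive to work with (made-up contents)
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('Takeout/history.json', '[1, 2, 3]')

    # open the archive explicitly so it's closed before the temp dir is removed
    with zipfile.ZipFile(archive) as zf:
        # zipfile.Path gives a pathlib-like view over the archive members
        root = zipfile.Path(zf)
        member = root / 'Takeout' / 'history.json'
        is_file = member.is_file()
        content = member.read_text(encoding='utf-8')
        names = sorted(p.name for p in (root / 'Takeout').iterdir())
```

Nothing is unpacked to disk here: members are read straight out of the archive.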
core/source: use import error
uses the broader ImportError
instead of ModuleNotFoundError
reasoning: if some submodule
(the one I'm currently configuring is
my.twitter.twint) has no additional
imports from another parser/DAL but
still has a config block, then catching
only ModuleNotFoundError means the user
has to create a stub config block in
their config to use the all.py file
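A small demonstration of the difference, as a sketch: a missing module raises ModuleNotFoundError, while a name missing from an existing module (like a missing config block) raises plain ImportError. The 'fake_config' module and the helper names are stand-ins invented for this example:

```python
import sys
import types
from typing import Callable, Optional

# a stand-in config module that exists but has no 'twint' block
sys.modules['fake_config'] = types.ModuleType('fake_config')


def import_error_kind(import_fn: Callable[[], None]) -> Optional[str]:
    # ModuleNotFoundError is a subclass of ImportError, so check it first
    try:
        import_fn()
    except ModuleNotFoundError:
        return 'ModuleNotFoundError'
    except ImportError:
        return 'ImportError'
    return None


def import_missing_module() -> None:
    import surely_not_installed_module_xyz  # noqa: F401


def import_missing_block() -> None:
    from fake_config import twint  # noqa: F401


missing_module = import_error_kind(import_missing_module)
missing_block = import_error_kind(import_missing_block)
```

Catching ImportError covers both cases, which is why the broader exception lets a source be skipped when its config block is absent.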
adds short CLI flags for convenience
the --stream flag previously only affected json, but
I can imagine '-o pprint -s -l 5' to print the first
5 items from some function could be useful as well
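Why streaming and a limit go well together, as a sketch: with a lazy iterator only the first N items are ever computed. This is not the actual hpi query implementation; take and numbers are hypothetical names:

```python
from itertools import islice
from pprint import pformat
from typing import Any, Iterable, Iterator, Optional


def take(items: Iterable[Any], limit: Optional[int] = None) -> Iterator[Any]:
    # islice stops after 'limit' items without consuming the rest;
    # limit=None means no limit
    return islice(items, limit)


def numbers() -> Iterator[int]:
    # infinite on purpose: this only terminates if the consumer is lazy
    n = 0
    while True:
        yield n
        n += 1


first_five = [pformat(x) for x in take(numbers(), 5)]
```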
* initial pushshift/rexport merge implementation, using id for merging
* smarter module deprecation warning using regex
* add `RedditBase` from promnesia
* `import_source` helper for gracefully handling mixin data sources
* my.core.serialize: simplejson support, more types
I added a couple of extra checks to the default function,
serializing datetimes, dates and dataclasses (in case
orjson isn't installed)
(copied from below)
if orjson couldn't be imported, try simplejson
This is included for compatibility reasons because orjson
is Rust-based, and compiling it on rarer architectures may not work
out of the box
as an example, I've been having issues getting it to install
on my phone (termux/android)
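The fallback order described above could look roughly like this. It's a sketch, not the code in my.core.serialize; the final fallback to the builtin json module and the _pick_dumps name are assumptions made so the example runs anywhere:

```python
import json
from typing import Any, Callable


def _pick_dumps() -> Callable[[Any], str]:
    # prefer orjson (fast, but needs a compiled build)
    try:
        import orjson
        # orjson.dumps returns bytes, so decode to get a str
        return lambda obj: orjson.dumps(obj).decode('utf-8')
    except ImportError:
        pass
    # then simplejson
    try:
        import simplejson
        return lambda obj: simplejson.dumps(obj)
    except ImportError:
        pass
    # assumption for this sketch: fall back to the standard library
    return lambda obj: json.dumps(obj)


dumps = _pick_dumps()
```

Whichever library gets picked, dumps returns a str, so callers don't need to care which one is installed.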
unlike the builtin JSON module, which serializes NamedTuples as lists
(even if you provide a default function), simplejson correctly
serializes namedtuples to dictionaries
this just gives people another option; simplejson is pure Python,
so no one should have issues installing it. orjson is still way faster,
so it's still preferable if it's easy to install and there's a precompiled build
for your architecture (which there typically is)
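The NamedTuple behaviour mentioned above is easy to reproduce with the builtin json module alone. The sketch below also includes a default function handling datetimes, dates and dataclasses; it is a simplified stand-in, not the one in my.core.serialize, and Visit and Point are made-up types:

```python
import dataclasses
import json
from datetime import date, datetime
from typing import Any, NamedTuple


def default(obj: Any) -> Any:
    # only called for objects json can't serialize natively
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f'cannot serialize {type(obj)}')


@dataclasses.dataclass
class Visit:
    url: str
    when: datetime


class Point(NamedTuple):
    x: int
    y: int


visit = Visit('https://example.com', datetime(2020, 1, 2, 3, 4, 5))
visit_json = json.dumps(visit, default=default)
# a namedtuple is a tuple subclass, so json emits a list and never calls default()
point_json = json.dumps(Point(1, 2), default=default)
# converting explicitly keeps the field names
point_dict_json = json.dumps(Point(1, 2)._asdict(), default=default)
```

With the builtin module the field names are lost unless the namedtuple is converted with _asdict() before serializing.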
If you're ever running this with simplejson installed and not orjson,
it's pretty easy to tell, as the JSON styling is different: orjson puts
no spaces between tokens, while simplejson does. e.g.
simplejson: {"a": 5, "b": 10}
orjson: {"a":5,"b":10}
allows you to do something like
hpi query --stream my.reddit.comments
to stream the JSON objects one per line, makes
it nicer to pipe into 'jq'/'fzf' instead
of having to process the giant list
at the end
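The streaming output described above amounts to writing one JSON document per line as items arrive. A minimal sketch with the builtin json module; stream_json is a hypothetical name and this is not the actual hpi query code:

```python
import io
import json
from typing import Any, Iterable, TextIO


def stream_json(items: Iterable[Any], out: TextIO) -> None:
    # write each item as soon as it's produced, one JSON object per line,
    # instead of collecting everything into one big list first
    for item in items:
        out.write(json.dumps(item))
        out.write('\n')


buf = io.StringIO()
stream_json(({'id': i} for i in range(3)), buf)
lines = buf.getvalue().splitlines()
```

Each line is a complete JSON value, so line-oriented tools can start processing before the producer has finished.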