This file is an overview of documented modules (which I'm progressively expanding).
There are many more, see:
- "What's inside" for the full list of modules.
- you can also run hpi modules to list what's available on your system
- source code is always the primary source of truth
If you have some issues with the setup, see "Troubleshooting".
Intro
See SETUP to find out how to set up your own config.
Some explanations:
- MY_CONFIG is the path where you are keeping your private configuration (usually ~/.config/my/)
- Path is a standard Python object to represent paths
- PathIsh is a helper type to allow using either str, or a Path
- Paths is another helper type for paths.
  It's 'smart', allows you to be flexible about your config:
  - simple str or a Path
  - /a/path/to/directory/, so the module will consume all files from this directory
  - a list of files/directories (it will be flattened)
  - a glob string, so you can be flexible about the format of your data on disk (e.g. if you want to keep it compressed)
  - empty string (e.g. export_path = ''), this will prevent the module from consuming any data.
    This can be useful for modules that merge multiple data sources (for example, my.twitter or my.github)
  Typically, such variable will be passed to get_files to actually extract the list of real files to use. You can see usage examples here.
- if the field has a default value, you can omit it from your private config altogether
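To make the Paths rules above more concrete, here is a minimal sketch of how such a value could be expanded into a list of files. It is not the actual get_files implementation: resolve_paths is a hypothetical stand-in written only for illustration, and the real helper may differ in signature and behaviour.

```python
from glob import glob
from pathlib import Path
from typing import List, Sequence, Union

PathIsh = Union[str, Path]
Paths = Union[PathIsh, Sequence[PathIsh]]


def resolve_paths(paths: Paths) -> List[Path]:
    """Hypothetical stand-in for get_files: expand a Paths value into files."""
    if paths == '':
        # empty string: the module should consume no data
        return []
    # a single str/Path is treated as a one-element list
    items = [paths] if isinstance(paths, (str, Path)) else list(paths)
    result: List[Path] = []
    for item in items:
        text = str(item)
        if any(ch in text for ch in '*?['):
            # glob string: expand the pattern
            result.extend(Path(p) for p in sorted(glob(text)))
        elif Path(text).is_dir():
            # directory: consume every file directly inside it
            result.extend(sorted(p for p in Path(text).iterdir() if p.is_file()))
        else:
            # plain file path: keep as is
            result.append(Path(text))
    return result
```

A list such as [directory, file] is flattened into one list of files, which is what lets the config stay flexible about how the data is laid out on disk.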
For more thoughts on modules and their structure, see MODULE_DESIGN
Configs
The config snippets below are meant to be modified accordingly and pasted into your private configuration, e.g. $MY_CONFIG/my/config.py.
You don't have to set up all modules at once; it's recommended to do it gradually, to get a feel for how HPI works.
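As a rough illustration of starting small, a first private config might contain a single module and grow from there. The class and field names are taken from the my.hypothesis section of this file; the path is a placeholder, not a real location.

```python
# $MY_CONFIG/my/config.py
# minimal starting point: one module only, add more classes as you go

class hypothesis:
    # placeholder glob, point it at wherever you keep your hypexport output
    export_path = '/path/to/hypothesis/data/*.json'
```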
# TODO ugh, pkgutil.walk_packages doesn't recurse and find packages like my.twitter.archive??
# yep.. https://stackoverflow.com/q/41203765/706389
import importlib
# from lint import all_modules # meh
# TODO figure out how to discover configs automatically...
modules = [
    ('google'         , 'my.google.takeout.paths'),
    ('hypothesis'     , 'my.hypothesis'          ),
    ('pocket'         , 'my.pocket'              ),
    ('twint'          , 'my.twitter.twint'       ),
    ('twitter_archive', 'my.twitter.archive'     ),
    ('lastfm'         , 'my.lastfm'              ),
    ('polar'          , 'my.polar'               ),
    ('instapaper'     , 'my.instapaper'          ),
    ('github'         , 'my.github.gdpr'         ),
    ('github'         , 'my.github.ghexport'     ),
    ('kobo'           , 'my.kobo'                ),
]

def indent(s, spaces=4):
    return ''.join(' ' * spaces + l for l in s.splitlines(keepends=True))

from pathlib import Path
import inspect
from dataclasses import fields
import re
print('\n') # ugh. hack for org-ruby drawers bug
for cls, p in modules:
    m = importlib.import_module(p)
    C = getattr(m, cls)
    src = inspect.getsource(C)
    # only keep the config fields, cut everything from the first @property on
    i = src.find('@property')
    if i != -1:
        src = src[:i]
    src = src.strip()
    # drop the base classes from the class header
    src = re.sub(r'(class \w+)\(.*', r'\1:', src)
    mpath = p.replace('.', '/')
    # the module is either a single file or a package with __init__.py
    # (the separator is needed, otherwise the package path never matches)
    for x in ['.py', '/__init__.py']:
        if Path(mpath + x).exists():
            mpath = mpath + x
    print(f'** [[file:../{mpath}][{p}]]')
    mdoc = m.__doc__
    if mdoc is not None:
        print(indent(mdoc))
    print(f' #+begin_src python')
    print(indent(src))
    print(f' #+end_src')
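The source-trimming step of the script above can be tried in isolation. The sketch below wraps the same two operations in a helper; trim_config_source, user_config and dal_module are hypothetical names used only for this illustration.

```python
import re


def trim_config_source(src: str) -> str:
    # drop everything from the first @property onwards
    i = src.find('@property')
    if i != -1:
        src = src[:i]
    src = src.strip()
    # replace the base class list with a bare colon
    return re.sub(r'(class \w+)\(.*', r'\1:', src)


# made-up config class, only for demonstration
example = '''
class pocket(user_config):
    export_path: Paths

    @property
    def dal_module(self):
        ...
'''

trimmed = trim_config_source(example)
print(trimmed)
```

Only the class header and the plain fields survive, which is the shape of the snippets listed in the rest of this file.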
my.google.takeout.paths
Module for locating and accessing Google Takeout data
class google:
    takeout_path: Paths # path/paths/glob for the takeout zips
my.hypothesis
Hypothes.is highlights and annotations
class hypothesis:
    '''
    Uses [[https://github.com/karlicoss/hypexport][hypexport]] outputs
    '''
    # path[s]/glob to the exported JSON data
    export_path: Paths
my.reddit
Reddit data: saved items/comments/upvotes/etc.
class reddit:
    class rexport:
        '''
        Uses [[https://github.com/karlicoss/rexport][rexport]] output.
        '''
        # path[s]/glob to the exported JSON data
        export_path: Paths
    class pushshift:
        '''
        Uses [[https://github.com/seanbreckenridge/pushshift_comment_export][pushshift]] to get access to old comments
        '''
        # path[s]/glob to the exported JSON data
        export_path: Paths
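Since this config is nested, a filled-in version might look roughly like the sketch below. Both paths are placeholders. Per the explanation of Paths in the intro, an empty string should keep a source from consuming any data, which is how pushshift is switched off here; treat that as an assumption about this particular module rather than documented behaviour.

```python
class reddit:
    class rexport:
        # placeholder glob for the rexport JSON files
        export_path = '/path/to/rexport/data/*.json'

    class pushshift:
        # empty string: assumed to disable this data source
        export_path = ''
```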
my.pocket
Pocket bookmarks and highlights
class pocket:
    '''
    Uses [[https://github.com/karlicoss/pockexport][pockexport]] outputs
    '''
    # path[s]/glob to the exported JSON data
    export_path: Paths
my.twitter.twint
Twitter data (tweets and favorites).
Uses Twint data export.
Requirements: pip3 install --user dataset
class twint:
    export_path: Paths # path[s]/glob to the twint Sqlite database
my.twitter.archive
Twitter data (uses official twitter archive export)
class twitter_archive:
    export_path: Paths # path[s]/glob to the twitter archive takeout
my.lastfm
Last.fm scrobbles
class lastfm:
    """
    Uses [[https://github.com/karlicoss/lastfm-backup][lastfm-backup]] outputs
    """
    export_path: Paths
my.polar
Polar articles and highlights
class polar:
    '''
    Polar config is optional, you only need it if you want to specify a custom 'polar_dir'
    '''
    polar_dir: PathIsh = Path('~/.polar').expanduser()
    defensive: bool = True # pass False if you want it to fail faster on errors (useful for debugging)
my.instapaper
Instapaper bookmarks, highlights and annotations
class instapaper:
    '''
    Uses [[https://github.com/karlicoss/instapexport][instapexport]] outputs.
    '''
    # path[s]/glob to the exported JSON data
    export_path: Paths
my.github.gdpr
Github data (uses official GDPR export)
class github:
    gdpr_dir: PathIsh # path to unpacked GDPR archive
my.github.ghexport
Github data: events, comments, etc. (API data)
class github:
    '''
    Uses [[https://github.com/karlicoss/ghexport][ghexport]] outputs.
    '''
    # path[s]/glob to the exported JSON data
    export_path: Paths
    # path to a cache directory
    # if omitted, will use /tmp
    cache_dir: Optional[PathIsh] = None
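Both my.github.gdpr and my.github.ghexport describe their settings under a class named github, so a private config would presumably declare the fields of both in that one class. The sketch below assumes that reading; all paths are placeholders.

```python
class github:
    # used by my.github.gdpr: path to the unpacked GDPR archive
    gdpr_dir = '/path/to/unpacked/github-gdpr/'

    # used by my.github.ghexport: path[s]/glob to the exported JSON data
    export_path = '/path/to/ghexport/data/*.json'

    # optional; when left as None the module is documented to fall back to /tmp
    cache_dir = None
```

Because cache_dir has a default value, it could also be left out of the config altogether.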