HPI/doc/MODULES.org at 8d998146e29cd9dcbd424b6835bc2acd2b0937df

fz0x1/HPI

Dima Gerasimov eba2d26b31 Update lastfm order/tests/docs

2020-05-13 22:52:23 +01:00

4.2 KiB


This file is an overview of the documented modules. There are many more; see "What's inside" for the full list.

See SETUP to find out how to set up your own config.

Some explanations:

- Path is the standard Python class (pathlib.Path) for representing filesystem paths.
- PathIsh is a helper type that accepts either a str or a Path.
- Paths is another helper type for paths. It's 'smart' and lets you be flexible about your config. It accepts:
  - a simple str or a Path
  - /a/path/to/directory/, so the module will consume all files from this directory
  - a list of files/directories (it will be flattened)
  - a glob string, so you can be flexible about the format of your data on disk (e.g. if you want to keep it compressed)

  Typically, such a variable is passed to get_files to extract the actual list of files to use. You can see usage examples here.
- If a field has a default value, you can omit it from your private config.
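
To make the accepted forms of Paths concrete, here is a minimal sketch of how such a value could be expanded into a list of files. The function name resolve_paths, its pattern parameter, and the type aliases as written here are assumptions for illustration; this is not the actual get_files implementation, which may behave differently.

```python
from glob import glob
from pathlib import Path
from typing import List, Sequence, Union

# Illustrative aliases; the real definitions may differ.
PathIsh = Union[str, Path]
Paths = Union[PathIsh, Sequence[PathIsh]]

def resolve_paths(pp: Paths, pattern: str = '*') -> List[Path]:
    """Hypothetical stand-in for get_files: expand a Paths value into a sorted list."""
    # A single str/Path is treated as a one-element list.
    items = [pp] if isinstance(pp, (str, Path)) else list(pp)
    found: List[Path] = []
    for item in items:
        s = str(item)
        if Path(s).is_dir():
            # Directory: take the files directly inside it that match `pattern`.
            found.extend(p for p in Path(s).glob(pattern) if p.is_file())
        elif any(c in s for c in '*?['):
            # Glob string: let the glob module expand it.
            found.extend(Path(g) for g in glob(s))
        else:
            # Plain file path: keep it as given, without checking that it exists.
            found.append(Path(s))
    return sorted(found)
```

With a helper like this, a module does not need to care whether the config holds a directory, a list, or a glob such as '/data/reddit/*.json*' that also matches compressed files.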

Modules:

# TODO ugh, pkgutil.walk_packages doesn't recurse and find packages like my.twitter.archive??
import importlib
import inspect
import re
from pathlib import Path
# from lint import all_modules # meh
# TODO figure out how to discover configs automatically...

# (config class name, module that defines it)
modules = [
    ('google' , 'my.google.takeout.paths'),
    ('reddit' , 'my.reddit'              ),
    ('twint'  , 'my.twitter.twint'       ),
    ('twitter', 'my.twitter.archive'     ),
    ('lastfm' , 'my.lastfm'              ),
]

def indent(s, spaces=4):
    # Prefix every line of s with the given number of spaces.
    return ''.join(' ' * spaces + line for line in s.splitlines(keepends=True))

print('\n') # ugh. hack for org-ruby drawers bug
for cls, p in modules:
    m = importlib.import_module(p)
    C = getattr(m, cls)
    src = inspect.getsource(C)
    # Keep only the field declarations: cut everything from the first @property on.
    i = src.find('@property')
    if i != -1:
        src = src[:i]
    src = src.strip()
    # Drop the base class list so the header reads 'class name:'.
    src = re.sub(r'(class \w+)\(.*', r'\1:', src)
    # Link to the module source, whether it is a single file or a package.
    mpath = p.replace('.', '/')
    for x in ['.py', '/__init__.py']:
        if Path(mpath + x).exists():
            mpath = mpath + x
            break
    print(f'- [[file:../{mpath}][{p}]]')
    mdoc = m.__doc__
    if mdoc is not None:
        print(indent(mdoc))
    print('    #+begin_src python')
    print(indent(src))
    print('    #+end_src')

my.google.takeout.paths

Module for locating and accessing Google Takeout data

class google:
    takeout_path: Paths # path/paths/glob for the takeout zips

my.reddit

Reddit data: saved items/comments/upvotes/etc.

Uses rexport output.

class reddit:
    export_path: Paths                     # path[s]/glob to the exported data
    rexport    : Optional[PathIsh] = None  # path to a local clone of rexport
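
The rexport field above has a default of None, so a private config can leave it out. The sketch below illustrates that behaviour with a plain dataclass. The user_reddit class, the build_config helper, and the path are hypothetical and stand in for whatever the real config loading does.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical private config: only the required field is set.
class user_reddit:
    export_path = '/data/reddit/*.json'

# Simplified schema mirroring the fields listed above (types reduced to str).
@dataclass
class reddit:
    export_path: str
    rexport: Optional[str] = None

def build_config(schema, user):
    """Copy the attributes the user defined; dataclass defaults cover the rest."""
    names = [f.name for f in fields(schema)]
    provided = {n: getattr(user, n) for n in names if hasattr(user, n)}
    return schema(**provided)

config = build_config(reddit, user_reddit)
```

Here config.rexport falls back to None, while a missing export_path would raise a TypeError because that field has no default.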

my.twitter.twint

Twitter data (tweets and favorites).

Uses Twint data export.

class twint:
    export_path: Paths # path[s]/glob to the twint Sqlite database

my.twitter.archive

Twitter data (uses official twitter archive export)

class twitter:
    export_path: Paths # path[s]/glob to the twitter archive takeout

my.lastfm

Last.fm scrobbles

class lastfm:
    """
    Uses [[https://github.com/karlicoss/lastfm-backup][lastfm-backup]] outputs
    """
    export_path: Paths
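
Putting the classes above together, a private config could look roughly like the sketch below. Every path is made up for illustration, and where the file has to live is described in SETUP, not here. Each class uses a different form of Paths to show the flexibility.

```python
from pathlib import Path

class google:
    # glob string: matches every takeout zip in the directory
    takeout_path = '/data/takeout/*.zip'

class reddit:
    # a list of files; rexport is omitted because it has a default value
    export_path = ['/data/reddit/2020-01.json', '/data/reddit/2020-05.json']

class twint:
    # a single Path object
    export_path = Path('/data/twint/db.sqlite')

class twitter:
    # a directory: all files inside it are consumed
    export_path = '/data/twitter-archive/'

class lastfm:
    # a glob that would also match compressed exports such as .json.xz
    export_path = '/data/lastfm/*.json*'
```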
