This file is an overview of the *documented* modules. There are many more; see [[file:../README.org::#whats-inside]["What's inside"]] for the full list of modules.

See [[file:SETUP.org][SETUP]] to find out how to set up your own config.

Some explanations:

- [[https://docs.python.org/3/library/pathlib.html#pathlib.Path][Path]] is a standard Python object to represent paths
- [[https://github.com/karlicoss/HPI/blob/5f4acfddeeeba18237e8b039c8f62bcaa62a4ac2/my/core/common.py#L9][PathIsh]] is a helper type that accepts either a =str= or a =Path=
- [[https://github.com/karlicoss/HPI/blob/5f4acfddeeeba18237e8b039c8f62bcaa62a4ac2/my/core/common.py#L108][Paths]] is another helper type for paths.

  It's 'smart' in that it accepts any of the following, so you can be flexible about your config:

  - a simple =str= or a =Path=
  - a directory such as =/a/path/to/directory/=, in which case the module will consume all files from this directory
  - a list of files/directories (it will be flattened)
  - a [[https://docs.python.org/3/library/glob.html?highlight=glob#glob.glob][glob]] string, so you can be flexible about the format of your data on disk (e.g. if you want to keep it compressed)

  Typically, such a variable is passed to =get_files=, which extracts the actual list of files to use. You can see usage examples [[https://github.com/karlicoss/HPI/blob/master/tests/get_files.py][here]].

- if the field has a default value, you can omit it from your private config.
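
To make the =Paths= forms listed above more concrete, here is a minimal sketch of how such a value could be expanded into real files. It is *not* the actual =get_files= implementation: =resolve_paths= is a hypothetical name, the local =PathIsh= and =Paths= aliases are stand-ins for the helper types linked above, and details such as sorting and the glob detection rule are assumptions.

#+begin_src python
import tempfile
from glob import glob
from pathlib import Path
from typing import Sequence, Tuple, Union

# Stand-ins for the helper types described above (assumed shapes)
PathIsh = Union[str, Path]
Paths = Union[Sequence[PathIsh], PathIsh]


def resolve_paths(pp: Paths) -> Tuple[Path, ...]:
    """Hypothetical helper: expand a Paths-like value into a sorted tuple of files."""
    # a single str/Path is treated as a one-element list
    sources = [pp] if isinstance(pp, (str, Path)) else list(pp)
    found = []
    for src in sources:
        s = str(src)
        if any(ch in s for ch in '*?['):
            # glob string: expand the pattern
            found.extend(Path(g) for g in glob(s))
        elif Path(s).is_dir():
            # directory: take every file directly inside it
            found.extend(f for f in Path(s).iterdir() if f.is_file())
        else:
            # plain file path: use it as is
            found.append(Path(s))
    return tuple(sorted(found))


# Usage: the same data can be addressed in several ways
with tempfile.TemporaryDirectory() as td:
    root = Path(td)
    (root / 'export-1.json').write_text('{}')
    (root / 'export-2.json.xz').write_text('')
    print(resolve_paths(td))                  # whole directory
    print(resolve_paths(str(root / '*.xz')))  # glob
    print(resolve_paths([root / 'export-1.json', str(root / '*.xz')]))  # mixed list
#+end_src

For the real behaviour (including any edge cases this sketch ignores), refer to =get_files= and its tests linked above.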


Modules:

#+begin_src python :dir .. :results output drawer :exports result
# TODO ugh, pkgutil.walk_packages doesn't recurse and find packages like my.twitter.archive??
import importlib
# from lint import all_modules # meh
# TODO figure out how to discover configs automatically...
modules = [
    ('google' , 'my.google.takeout.paths'),
    ('reddit' , 'my.reddit'              ),
    ('twint'  , 'my.twitter.twint'       ),
    ('twitter', 'my.twitter.archive'     ),
]

def indent(s, spaces=4):
    return ''.join(' ' * spaces + line for line in s.splitlines(keepends=True))

from pathlib import Path
import inspect
from dataclasses import fields
import re
print('\n') # ugh. hack for org-ruby drawers bug
for cls, p in modules:
    m = importlib.import_module(p)
    C = getattr(m, cls)
    src = inspect.getsource(C)
    i = src.find('@property')
    if i != -1:
        src = src[:i]
    src = src.strip()
    src = re.sub(r'(class \w+)\(.*', r'\1:', src)
    mpath = p.replace('.', '/')
    # resolve to a plain module (foo.py) or a package (foo/__init__.py)
    for x in ['.py', '/__init__.py']:
        if Path(mpath + x).exists():
            mpath = mpath + x
            break
    print(f'- [[file:../{mpath}][{p}]]')
    mdoc = m.__doc__
    if mdoc is not None:
        print(indent(mdoc))
    print('    #+begin_src python')
    print(indent(src))
    print('    #+end_src')
#+end_src

#+RESULTS:
:results:


- [[file:../my/google/takeout/paths.py][my.google.takeout.paths]]

    Module for locating and accessing [[https://takeout.google.com][Google Takeout]] data

    #+begin_src python
    class google:
        takeout_path: Paths # path/paths/glob for the takeout zips
    #+end_src
- [[file:../my/reddit.py][my.reddit]]

    Reddit data: saved items/comments/upvotes/etc.

    Uses [[https://github.com/karlicoss/rexport][rexport]] output.

    #+begin_src python
    class reddit:
        export_path: Paths                     # path[s]/glob to the exported data
        rexport    : Optional[PathIsh] = None  # path to a local clone of rexport
    #+end_src
- [[file:../my/twitter/twint.py][my.twitter.twint]]

    Twitter data (tweets and favorites).

    Uses [[https://github.com/twintproject/twint][Twint]] data export.

    #+begin_src python
    class twint:
        export_path: Paths # path[s]/glob to the twint Sqlite database
    #+end_src
- [[file:../my/twitter/archive.py][my.twitter.archive]]

    Twitter data (uses [[https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive][official twitter archive export]])

    #+begin_src python
    class twitter:
        export_path: Paths # path[s]/glob to the twitter archive takeout
    #+end_src
:end:
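
As an illustration of how the config classes listed above might be filled in, here is a minimal sketch of a private config. All paths are placeholders, and the local =PathIsh= / =Paths= aliases are stand-ins so that the snippet is self-contained; where the config file actually lives is covered in [[file:SETUP.org][SETUP]].

#+begin_src python
from pathlib import Path
from typing import Sequence, Union

# Stand-ins for the helper types described at the top of this file (assumed shapes)
PathIsh = Union[str, Path]
Paths = Union[Sequence[PathIsh], PathIsh]


class google:
    # a glob, so the takeout zips are picked up as they are
    takeout_path: Paths = '/data/takeout/*.zip'  # placeholder path


class reddit:
    # a directory: every file in it is consumed
    export_path: Paths = '/data/reddit/'  # placeholder path
    # rexport has a default value (None), so it is omitted here


class twint:
    # a single file given as a Path
    export_path: Paths = Path('/data/twint/db.sqlite')  # placeholder path


class twitter:
    # a list mixing a glob and a single file
    export_path: Paths = [
        '/data/twitter/*.zip',            # placeholder path
        '/data/twitter-old/archive.zip',  # placeholder path
    ]
#+end_src

Each class uses a different =Paths= form only to show the options; in practice you would pick whichever matches how your data is laid out on disk.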