Merge pull request #45 from karlicoss/better-configs

Better configs: safer and self documented
2020-05-10 18:11:52 +01:00 · 2020-05-10 18:11:52 +01:00 · d6f071e3b1
commit d6f071e3b1
parent 5f4acfddee 976b3da6f4
13 changed files with 560 additions and 38 deletions
--- a/doc/CONFIGURING.org
+++ b/doc/CONFIGURING.org
@ -0,0 +1,266 @@
+I feel like it's good to keep the rationales in the documentation,
+but happy to [[https://github.com/karlicoss/HPI/issues/46][discuss]] it here.
+
+Before discussing the abstract matters, let's consider a specific situation.
+Say, we want to let the user configure the [[https://github.com/karlicoss/HPI/blob/master/my/bluemaestro/__init__.py][bluemaestro]] module.
+At the moment, it uses the following config attributes:
+
+- ~export_path~
+
+  Path to the data, this is obviously a *required* attribute
+
+- ~cache_path~
+
+  Cache is extremely useful to speed up some queries. But it's *optional*, everything should work without it.
+
+
+
+I'll refer to this config as *specific* further in the doc, and give examples for each point. Note that they only illustrate the specific requirement, potentially ignoring the other ones.
+Now, the requirements as I see it:
+
+1. configuration should be *extremely* flexible
+
+   We need to make sure it's very easy to combine/filter/extend data without having to modify and rewrite the module code.
+   This means using a powerful language for config, and realistically, a Turing-complete one.
+
+   General: that means that you should be able to use powerful syntax, potentially running arbitrary code if
+   this is something you need (for whatever mad reason). It should be possible to override config attributes at runtime, if necessary.
+
+   Specific: we've got Python already, so it makes a lot of sense to use it!
+
+   #+begin_src python
+   class bluemaestro:
+       export_path = '/path/to/bluemaestro/data'
+       cache_path  = '/tmp/bluemaestro.cache'
+   #+end_src
+
+   Downsides:
+
+   - keeping it overly flexible and powerful means it's potentially less accessible to people less familiar with programming
+
+     But see the further point about keeping it simple. I claim that simple programs look as easy as simple json.
+
+   - Python is 'less safe' than a plain json/yaml config
+
+     But at the moment the whole thing is running potentially untrusted Python code anyway.
+     It's not a tool you're going to install across your organization, run under root privileges, and let the employees tweak.
+
+     Ultimately, you set it up for yourself, and the config has exactly the same permissions as the code you're installing.
+     Thinking that a plain config would give you more security is deceptive: it's a false sense of security (at this stage of the project).
+
+   # TODO  I don't mind having  json/toml/whatever, but only as an additional interface
+
+   I also write more about all this [[https://beepb00p.xyz/configs-suck.html][here]].
+
+2. configuration should be *backwards compatible*
+
+   General: the whole system is pretty chaotic, it's hard to control the versioning of different modules and their compatibility.
+   It's important to allow changing attribute names and adding new functionality, while making sure the module works against an older version of the config.
+   Ideally warn the user that they'd better migrate to a newer version if the fallbacks are triggered.
+   Potentially: use individual versions for modules? Although it makes things a bit complicated.
+
+   Specific: say the module is using a new config attribute, ~timezone~.
+   We would need to adapt the module to support the old configs without timezone. For example, in ~bluemaestro.py~ (pseudo code):
+
+   #+begin_src python
+   user_config = load_user_config()
+   if not hasattr(user_config, 'timezone'):
+       warnings.warn("Please specify 'timezone' in the config! Falling back to the system timezone.")
+       user_config.timezone = get_system_timezone()
+   #+end_src
+
+   This is possible to achieve with pretty much any config format, just important to keep in mind.
+
+   Downsides: none, hopefully no one argues against backwards compatibility being important.
+
+3. configuration should be as *easy to write* as possible
+
+   General: as lean and non-verbose as possible. No extra imports, no extra inheritance, annotations, etc. Loose coupling.
+
+   Specific: the user *only* has to specify ~export_path~ to make the module function and that's it. For example:
+
+   #+begin_src js
+   {
+        'export_path': '/path/to/bluemaestro/'
+   }
+   #+end_src
+
+   It's possible to achieve with any configuration format (aided by some helpers to fill in optional attributes etc), so it's more of a guiding principle.
+
+   Downsides:
+
+   - no (mandatory) annotations means more potential to break, but I'd rather leave this decision to the users
+
+4. configuration should be as *easy to use and extend* as possible
+
+   General: enable the users to add new config attributes and *immediately* use them without any hassle and boilerplate.
+   It's easy to achieve on its own, but harder to achieve simultaneously with (2).
+
+   Specific: if you keep the config as Python, simply importing the config in the module satisfies this property:
+
+   #+begin_src python
+   from my.config import bluemaestro as user_config
+   #+end_src
+
+   If the config is in JSON or something, it's possible to load it dynamically too without the boilerplate.
+
+   Downsides: none, hopefully no one is against extensibility
+
+5. configuration should have checks
+
+   General: make sure it's easy to track down configuration errors. At least runtime checks for required attributes, their types, warnings, that sort of thing. But a biggie for me is using *mypy* to statically typecheck the modules.
+   To some extent it gets in the way of (2) and (4).
+
+   Specific: ~NamedTuple~ or ~dataclass~ can verify the config with no extra boilerplate on the user side.
+
+   #+begin_src python
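+   import json
+   from pathlib import Path
+   from typing import NamedTuple, Optional
+
+   class bluemaestro(NamedTuple):
+        export_path: str
+        cache_path : Optional[str] = None
+
+   # json.load expects a file object, so read the file and parse its text instead
+   raw_config = json.loads(Path('configs/bluemaestro.json').read_text())
+   config = bluemaestro(**raw_config)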
+   #+end_src
+
+   This will fail if required =export_path= is missing, and fill optional =cache_path= with None. In addition, it's ~mypy~ friendly.
+
+   Downsides: none, especially if it's possible to turn checks on/off.
+
+6. configuration should be easy to document
+
+   General: ideally, it should be autogenerated, be self-descriptive and have some sort of schema, to make sure the documentation (which no one likes to write) doesn't diverge.
+
+   Specific: mypy annotations seem like the way to go. See the example from (5), it's pretty clear from the code what needs to be in the config.
+
+   Downsides: none, self-documented code is good.
+
+* Solution?
+
+Now I'll consider potential solutions to the configuration, taking the different requirements into account.
+
+Like I already mentioned, plain configs (JSON/YAML/TOML) are very inflexible and go against (1), which in my opinion makes them a no-go.
+
+So: my suggestion is to write the *configs as Python code*.
+It's hard to satisfy all requirements *at the same time*, but I want to argue, it's possible to satisfy most of them, depending on the maturity of the module which we're configuring.
+
+Let's say you want to write a new module. You start with a plain config class:
+
+#+begin_src python
+class bluemaestro:
+    export_path = '/path/to/bluemaestro/data'
+    cache_path  = '/tmp/bluemaestro.cache'
+#+end_src
+
+And to use it:
+
+#+begin_src python
+from my.config import bluemaestro as user_config
+#+end_src
+
+Let's go through requirements:
+
+- (1): *yes*, simply importing Python code is the most flexible you can get
+- (2): *no*, but backwards compatibility is not necessary in the first version of the module
+- (3): *mostly*, although optional fields require extra work
+- (4): *yes*, whatever is in the config can immediately be used by the code
+- (5): *mostly*, imports are transparent to ~mypy~, although runtime type checks would be nice too
+- (6): *no*, you have to guess the config from the usage.
+
+This approach is extremely simple, and already *good enough for initial prototyping* or *private modules*.
+
+The main downside so far is the lack of documentation (6), which I'll try to solve next.
+I see mypy annotations as the only sane way to support it, because we also get (5) for free. So we could use:
+
+- potentially [[https://github.com/karlicoss/HPI/issues/12#issuecomment-610038961][file-config]]
+
+  However, it's using plain files and doesn't satisfy (1).
+
+  Also not sure about (5). =file-config= allows using mypy annotations, but I'm not convinced they would be correctly typed with mypy, I think you need a plugin for that.
+ 
+- [[https://mypy.readthedocs.io/en/stable/protocols.html#simple-user-defined-protocols][Protocol]]
+
+  I experimented with ~Protocol~ [[https://github.com/karlicoss/HPI/pull/45/commits/90b9d1d9c15abe3944913add5eaa5785cc3bffbc][here]].
+  It's pretty cool, very flexible, and doesn't impose any runtime modifications, which makes it good for (4).
+
+  The downsides are:
+
+  - it doesn't support optional attributes (optional as in non-required, not as ~typing.Optional~), so it goes against (3)
+  - prior to python 3.8, it's a part of =typing_extensions= rather than standard =typing=, so using it requires guarding the code with =if typing.TYPE_CHECKING=, which is a bit confusing and bloating.
+
+- =NamedTuple=
+
+  [[https://github.com/karlicoss/HPI/pull/45/commits/c877104b90c9d168eaec96e0e770e59048ce4465][Here]] I experimented with using ~NamedTuple~.
+
+  Similarly to Protocol, it's self-descriptive, and in addition allows for non-required fields.
+  # TODO something about helper methods? can't use them with Protocol
+
+  Downsides:
+  - it goes against (4), because NamedTuple (being a =tuple= in runtime) can only contain the attributes declared in the schema.
+
+- =dataclass=
+
+  Similar to =NamedTuple=, but it's possible to add extra attributes to a =dataclass= with ~setattr~ to implement (4).
+
+  Downsides:
+  - we partially lost (5), because dynamic attributes are not transparent to mypy.
+   
+
+My conclusion was using a *combined approach*:
+
+- Use =@dataclass= base for documentation and default attributes, achieving (6) and (3)
+- Inherit the original config class to bring in the extra attributes, achieving (4)
+
+Inheritance is a standard mechanism, which doesn't require any extra frameworks and plays well with other Python concepts. As a specific example:
+
+#+begin_src python
+import dataclasses
+from dataclasses import dataclass
+from typing import Optional
+
+from my.config import bluemaestro as user_config
+
+@dataclass
+class bluemaestro(user_config):
+    '''
+    The header of this file contributes towards the documentation
+    '''
+    export_path: str
+    cache_path : Optional[str] = None
+
+    @classmethod
+    def make_config(cls) -> 'bluemaestro':
+        params = {
+            k: v
+            for k, v in vars(cls.__base__).items()
+            if k in {f.name for f in dataclasses.fields(cls)}
+        }
+        return cls(**params)
+
+config = bluemaestro.make_config()
+#+end_src
+
+I claim this solves pretty much everything:
+- *(1)*: yes, the config attributes are preserved and can be anything that's allowed in Python
+- *(2)*: collaterally, we also solved it, because we can adapt for renames and other legacy config adaptations in ~make_config~
+- *(3)*: supports default attributes, at no extra cost
+- *(4)*: the user config's attributes are available through the base class
+- *(5)*: everything is transparent to mypy. However, it still lacks runtime checks.
+- *(6)*: the dataclass header is easily readable, and it's possible to generate the docs automatically
+
+Downsides:
+- the =make_config= bit is a little scary and manual, however, it can be extracted in a generic helper method
+
+My conclusion is that I'm going with this approach for now.
+Note that at no stage did this require any changes to the user configs, so if I missed something, it would be reversible.
+
+* Side modules :noexport:
+
+Some of TODO rexport?
+
+To some extent, this is an experiment. I'm not sure how much value is in .
+
+
+One thing are TODO software? libraries that have fairly well defined APIs and you can reasonably version them.
+
+Another thing is the modules for accessing data, where you'd hopefully have everything backwards compatible.
+Maybe in the future
+
+I'm just not sure, happy to hear people's opinions on this.
+
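The =NamedTuple= versus =dataclass= trade-off described in CONFIGURING.org above can be checked with a short sketch. The class names `nt_config` and `dc_config` and the `timezone` attribute are made up for illustration; they are not part of HPI.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class nt_config(NamedTuple):  # hypothetical schema, NamedTuple flavour
    export_path: str
    cache_path: Optional[str] = None


@dataclass
class dc_config:  # hypothetical schema, dataclass flavour
    export_path: str
    cache_path: Optional[str] = None


nt = nt_config(export_path='/path/to/bluemaestro/data')
dc = dc_config(export_path='/path/to/bluemaestro/data')

try:
    # a NamedTuple instance is a tuple at runtime, so it has no room for extra attributes
    setattr(nt, 'timezone', 'UTC')
    nt_accepts_extra = True
except AttributeError:
    nt_accepts_extra = False

# a dataclass instance accepts the extra attribute, but mypy does not know about it
setattr(dc, 'timezone', 'UTC')

print(nt_accepts_extra)
print(getattr(dc, 'timezone'))
```

Both flavours fill in the optional `cache_path` with `None`; only the dataclass lets undeclared attributes through, which is the reason the document gives for preferring it for requirement (4).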
+
--- a/doc/MODULES.org
+++ b/doc/MODULES.org
@ -0,0 +1,108 @@
+This file is an overview of *documented* modules. There are many more, see [[file:../README.org::#whats-inside]["What's inside"]] for the full list of modules.
+
+See [[file:SETUP.org][SETUP]] to find out how to set up your own config.
+
+Some explanations:
+
+- [[https://docs.python.org/3/library/pathlib.html#pathlib.Path][Path]] is a standard Python object to represent paths
+- [[https://github.com/karlicoss/HPI/blob/5f4acfddeeeba18237e8b039c8f62bcaa62a4ac2/my/core/common.py#L9][PathIsh]] is a helper type to allow using either =str=, or a =Path=
+- [[https://github.com/karlicoss/HPI/blob/5f4acfddeeeba18237e8b039c8f62bcaa62a4ac2/my/core/common.py#L108][Paths]] is another helper type for paths.
+
+  It's 'smart', allows you to be flexible about your config:
+
+  - simple =str= or a =Path=
+  - =/a/path/to/directory/=, so the module will consume all files from this directory
+  - a list of files/directories (it will be flattened)
+  - a [[https://docs.python.org/3/library/glob.html?highlight=glob#glob.glob][glob]] string, so you can be flexible about the format of your data on disk (e.g. if you want to keep it compressed)
+
+  Typically, such a variable will be passed to =get_files= to actually extract the list of real files to use. You can see usage examples [[https://github.com/karlicoss/HPI/blob/master/tests/get_files.py][here]].
+
+- if the field has a default value, you can omit it from your private config.
+
+
+Modules:
+
+#+begin_src python :dir .. :results output drawer :exports result
+# TODO ugh, pkgutil.walk_packages doesn't recurse and find packages like my.twitter.archive??
+import importlib
+# from lint import all_modules # meh
+# TODO figure out how to discover configs automatically...
+modules = [
+    ('google' , 'my.google.takeout.paths'),
+    ('reddit' , 'my.reddit'              ),
+    ('twint'  , 'my.twitter.twint'       ),
+    ('twitter', 'my.twitter.archive'     ),
+]
+
+def indent(s, spaces=4):
+    return ''.join(' ' * spaces + l for l in s.splitlines(keepends=True))
+
+from pathlib import Path
+import inspect
+from dataclasses import fields
+import re
+print('\n') # ugh. hack for org-ruby drawers bug
+for cls, p in modules:
+    m = importlib.import_module(p)
+    C = getattr(m, cls)
+    src = inspect.getsource(C)
+    i = src.find('@property')
+    if i != -1:
+        src = src[:i]
+    src = src.strip()
+    src = re.sub(r'(class \w+)\(.*', r'\1:', src)
+    mpath = p.replace('.', '/')
+    for x in ['.py', '__init__.py']:
+        if Path(mpath + x).exists():
+            mpath = mpath + x
+    print(f'- [[file:../{mpath}][{p}]]')
+    mdoc = m.__doc__
+    if mdoc is not None:
+        print(indent(mdoc))
+    print(f'    #+begin_src python')
+    print(indent(src))
+    print(f'    #+end_src')
+#+end_src
+
+#+RESULTS:
+:results:
+
+
+- [[file:../my/google/takeout/paths.py][my.google.takeout.paths]]
+
+    Module for locating and accessing [[https://takeout.google.com][Google Takeout]] data
+
+    #+begin_src python
+    class google:
+        takeout_path: Paths # path/paths/glob for the takeout zips
+    #+end_src
+- [[file:../my/reddit.py][my.reddit]]
+
+    Reddit data: saved items/comments/upvotes/etc.
+
+    Uses [[https://github.com/karlicoss/rexport][rexport]] output.
+
+    #+begin_src python
+    class reddit:
+        export_path: Paths                     # path[s]/glob to the exported data
+        rexport    : Optional[PathIsh] = None  # path to a local clone of rexport
+    #+end_src
+- [[file:../my/twitter/twint.py][my.twitter.twint]]
+
+    Twitter data (tweets and favorites).
+
+    Uses [[https://github.com/twintproject/twint][Twint]] data export.
+
+    #+begin_src python
+    class twint:
+        export_path: Paths # path[s]/glob to the twint Sqlite database
+    #+end_src
+- [[file:../my/twitter/archive.py][my.twitter.archive]]
+
+    Twitter data (uses [[https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive][official twitter archive export]])
+
+    #+begin_src python
+    class twitter:
+        export_path: Paths # path[s]/glob to the twitter archive takeout
+    #+end_src
+:end:
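To make the accepted forms of =Paths= listed in MODULES.org more concrete, here is a rough sketch of how such a value could be expanded into a list of files. `resolve_paths` is a hypothetical helper written for this illustration; it is not HPI's `get_files`, whose actual behaviour may differ.

```python
import glob
import tempfile
from pathlib import Path
from typing import List, Sequence, Union

PathIsh = Union[str, Path]
Paths = Union[PathIsh, Sequence[PathIsh]]


def resolve_paths(pp: Paths) -> List[Path]:
    """Hypothetical helper (not HPI's get_files): expand a Paths value into files."""
    items = [pp] if isinstance(pp, (str, Path)) else list(pp)
    res: List[Path] = []
    for item in items:
        p = Path(item)
        if p.is_dir():
            # a directory: consume all files directly inside it
            res.extend(f for f in p.iterdir() if f.is_file())
        elif any(c in str(p) for c in '*?['):
            # looks like a glob pattern: expand it
            res.extend(Path(f) for f in glob.glob(str(p)))
        else:
            # a single file path
            res.append(p)
    return sorted(res)


with tempfile.TemporaryDirectory() as td:
    root = Path(td)
    for name in ['a.json', 'b.json', 'c.zip']:
        (root / name).touch()
    from_dir = resolve_paths(root)
    from_glob = resolve_paths(str(root / '*.json'))
    from_list = resolve_paths([root / 'a.json', root / 'c.zip'])

print([p.name for p in from_dir])
print([p.name for p in from_glob])
print([p.name for p in from_list])
```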
--- a/doc/SETUP.org
+++ b/doc/SETUP.org
@ -73,6 +73,9 @@ They aren't necessary, but improve your experience. At the moment these are:
 This is an *optional step* as some modules might work without extra setup.
 But it depends on the specific module.

+You might also find it interesting to read [[file:CONFIGURING.org][CONFIGURING]], where I
+elaborate on the rationale behind the current configuration system.
+
 ** private configuration (=my.config=)
 # TODO write about dynamic configuration
 # TODO add a command to edit config?? e.g. HPI config edit
@ -103,12 +106,15 @@ Since it's a Python package, generally it's very *flexible* and there are many w
      username    = 'karlicoss'

  #+end_src
- 
-  I'm [[https://github.com/karlicoss/HPI/issues/12][working]] on improving the documentation for configuring the individual modules,
-  but in the meantime the easiest is perhaps to skim through the code of the module and see what config attributes it's using.

-  For example, if you search for =config.= in [[file:../my/emfit/__init__.py][emfit module]], you'll see that it's using =export_path=, =tz=, =excluded_sids= and =cache_path=.
-  Or you can just try running them and fill in the attributes Python complains about.
+  To find out which attributes you need to specify:
+
+  - check in [[file:MODULES.org][MODULES]]
+  - if there is nothing there, the easiest is perhaps to skim through the code of the module and to search for =config.= uses.
+   
+    For example, if you search for =config.= in [[file:../my/emfit/__init__.py][emfit module]], you'll see that it's using =export_path=, =tz=, =excluded_sids= and =cache_path=.
+
+  - or you can just try running them and fill in the attributes Python complains about!

 - My config layout is a bit more complicated:

--- a/my/cfg.py
+++ b/my/cfg.py
@ -16,11 +16,14 @@ After that, you can set config attributes:
 import my.config as config


-def set_repo(name: str, repo):
+from pathlib import Path
+from typing import Union
+def set_repo(name: str, repo: Union[Path, str]) -> None:
    from .core.init import assign_module
    from . common import import_from

-    module = import_from(repo, name)
+    r = Path(repo)
+    module = import_from(r.parent, name)
    assign_module('my.config.repos', name, module)


--- a/my/core/cfg.py
+++ b/my/core/cfg.py
@ -0,0 +1,18 @@
+from typing import TypeVar, Type, Callable, Dict, Any
+
+Attrs = Dict[str, Any]
+
+C = TypeVar('C')
+
+# todo not sure about it, could be overthinking...
+# but short enough to change later
+def make_config(cls: Type[C], migration: Callable[[Attrs], Attrs]=lambda x: x) -> C:
+    props = dict(vars(cls.__base__))
+    props = migration(props)
+    from dataclasses import fields
+    params = {
+        k: v
+        for k, v in props.items()
+        if k in {f.name for f in fields(cls)}
+    }
+    return cls(**params) # type: ignore[call-arg]
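A minimal sketch of how the `make_config` helper above could be exercised without a real `my.config`. The helper is repeated (slightly condensed) so the example is self-contained; `user_reddit` and its values are stand-ins invented for illustration, mirroring the `export_dir` migration that `my/reddit.py` uses in this PR.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional, Type, TypeVar

Attrs = Dict[str, Any]
C = TypeVar('C')


def make_config(cls: Type[C], migration: Callable[[Attrs], Attrs] = lambda x: x) -> C:
    # same logic as the helper in the diff, condensed for this sketch
    props = migration(dict(vars(cls.__base__)))
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in props.items() if k in names})


class user_reddit:
    # stand-in for the class a user would keep in my.config (hypothetical values)
    export_dir = '/path/to/reddit/exports'  # legacy attribute name
    passthrough = 'not declared in the schema'


@dataclass
class reddit(user_reddit):
    export_path: str
    rexport: Optional[str] = None


def migration(attrs: Attrs) -> Attrs:
    if 'export_dir' in attrs:  # legacy name
        attrs['export_path'] = attrs['export_dir']
    return attrs


config = make_config(reddit, migration=migration)
print(config.export_path)  # filled in from the legacy export_dir
print(config.rexport)      # optional field, falls back to its default
print(config.passthrough)  # undeclared attribute, still reachable via the base class
```

The migration only touches a copy of the base class attributes, so the user's config class itself is left unmodified.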
--- a/my/core/common.py
+++ b/my/core/common.py
@ -195,3 +195,27 @@ def fastermime(path: PathIsh) -> str:


 Json = Dict[str, Any]
+
+
+from typing import TypeVar, Callable, Generic
+
+_C = TypeVar('_C')
+_R = TypeVar('_R')
+
+# https://stackoverflow.com/a/5192374/706389
+class classproperty(Generic[_R]):
+    def __init__(self, f: Callable[[_C], _R]) -> None:
+        self.f = f
+
+    def __get__(self, obj: None, cls: _C) -> _R:
+        return self.f(cls)
+
+
+# hmm, this doesn't really work with mypy well..
+# https://github.com/python/mypy/issues/6244
+# class staticproperty(Generic[_R]):
+#     def __init__(self, f: Callable[[], _R]) -> None:
+#         self.f = f
+#
+#     def __get__(self) -> _R:
+#         return self.f()
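For reference, a short usage sketch of the `classproperty` descriptor added above. The descriptor is copied verbatim so the example runs on its own; `demo_config` and its attributes are hypothetical and only illustrate the idea of a value computed from class attributes.

```python
from typing import Callable, Generic, TypeVar

_C = TypeVar('_C')
_R = TypeVar('_R')


class classproperty(Generic[_R]):
    # same descriptor as in the diff, repeated to keep the sketch self-contained
    def __init__(self, f: Callable[[_C], _R]) -> None:
        self.f = f

    def __get__(self, obj: None, cls: _C) -> _R:
        return self.f(cls)


class demo_config:  # hypothetical class, for illustration only
    export_path = '/path/to/data'

    @classproperty
    def cache_path(cls) -> str:
        # recomputed from the class attribute on every access
        return cls.export_path + '/cache'


print(demo_config.cache_path)
demo_config.export_path = '/other'
print(demo_config.cache_path)
```

Because the descriptor only defines `__get__`, the value is read-only in spirit: assigning to `demo_config.cache_path` would simply replace the descriptor on the class.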
--- a/my/core/time.py
+++ b/my/core/time.py
@ -1,5 +1,5 @@
 from functools import lru_cache
-from datetime import datetime
+from datetime import datetime, tzinfo

 import pytz # type: ignore

@ -11,6 +11,7 @@ tz_lookup = {
 tz_lookup['UTC'] = pytz.utc # ugh. otherwise it'z Zulu...


+# TODO dammit, lru_cache interferes with mypy?
@lru_cache(None)
-def abbr_to_timezone(abbr: str):
+def abbr_to_timezone(abbr: str) -> tzinfo:
    return tz_lookup[abbr]
--- a/my/google/takeout/paths.py
+++ b/my/google/takeout/paths.py
@ -1,10 +1,27 @@
+'''
+Module for locating and accessing [[https://takeout.google.com][Google Takeout]] data
+'''
+
+from dataclasses import dataclass
+from ...core.common import Paths
+
+from my.config import google as user_config
+@dataclass
+class google(user_config):
+    takeout_path: Paths # path/paths/glob for the takeout zips
+###
+
+# TODO rename 'google' to 'takeout'? not sure
+
+from ...core.cfg import make_config
+config = make_config(google)
+
 from pathlib import Path
 from typing import Optional, Iterable

 from ...common import get_files
 from ...kython.kompress import kopen, kexists

-from my.config import google as config

 def get_takeouts(*, path: Optional[str]=None) -> Iterable[Path]:
    """
--- a/my/reddit.py
+++ b/my/reddit.py
@ -1,26 +1,74 @@
 """
 Reddit data: saved items/comments/upvotes/etc.
+
+Uses [[https://github.com/karlicoss/rexport][rexport]] output.
 """
-from pathlib import Path
+
+from typing import Optional
+from .core.common import Paths, PathIsh
+
+from types import ModuleType
+from my.config import reddit as uconfig
+from dataclasses import dataclass
+
+@dataclass
+class reddit(uconfig):
+    export_path: Paths                     # path[s]/glob to the exported data
+    rexport    : Optional[PathIsh] = None  # path to a local clone of rexport
+
+    @property
+    def rexport_module(self) -> ModuleType:
+        # todo return Type[rexport]??
+        # todo ModuleIsh?
+        rpath = self.rexport
+        if rpath is not None:
+            from my.cfg import set_repo
+            set_repo('rexport', rpath)
+
+        import my.config.repos.rexport.dal as m
+        return m
+
+
+from .core.cfg import make_config, Attrs
+# hmm, also nice thing about this is that migration is possible to test without the rest of the config?
+def migration(attrs: Attrs) -> Attrs:
+    if 'export_dir' in attrs: # legacy name
+        attrs['export_path'] = attrs['export_dir']
+    return attrs
+config = make_config(reddit, migration=migration)
+
+###
+# TODO not sure about the laziness...
+
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    # TODO not sure what is the right way to handle this..
+    import my.config.repos.rexport.dal as rexport
+else:
+    # TODO ugh. this would import too early
+    # but on the other hand we do want to bring the objects into the scope for easier imports, etc. ugh!
+    # ok, fair enough I suppose. It makes sense to configure something before using it. can always figure it out later..
+    # maybe, the config could dynamically detect change and reimport itself? dunno.
+    rexport = config.rexport_module
+###
+
+
 from typing import List, Sequence, Mapping, Iterator
+from .core.common import mcachew, get_files, LazyLogger, make_dict

+
+logger = LazyLogger(__name__, level='debug')
+
+
+from pathlib import Path
 from .kython.kompress import CPath
-from .common import mcachew, get_files, LazyLogger, make_dict
-
-from my.config import reddit as config
-import my.config.repos.rexport.dal as rexport
-
-
 def inputs() -> Sequence[Path]:
-    # TODO rename to export_path?
-    files = get_files(config.export_dir)
+    files = get_files(config.export_path)
    # TODO Cpath better be automatic by get_files...
    res = list(map(CPath, files)); assert len(res) > 0
    # todo move the assert to get_files?
    return tuple(res)

-logger = LazyLogger(__name__, level='debug')
-

 Sid        = rexport.Sid
 Save       = rexport.Save
@ -64,10 +112,6 @@ from multiprocessing import Pool

 # TODO hmm. apparently decompressing takes quite a bit of time...

-def reddit(suffix: str) -> str:
-    return 'https://reddit.com' + suffix
-
-
 class SaveWithDt(NamedTuple):
    save: Save
    backup_dt: datetime
--- a/my/twitter/archive.py
+++ b/my/twitter/archive.py
@ -1,6 +1,22 @@
 """
 Twitter data (uses [[https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive][official twitter archive export]])
 """
+from dataclasses import dataclass
+from ..core.common import Paths
+
+from my.config import twitter as user_config
+
+@dataclass
+class twitter(user_config):
+    export_path: Paths # path[s]/glob to the twitter archive takeout
+
+
+###
+
+from ..core.cfg import make_config
+config = make_config(twitter)
+
+
 from datetime import datetime
 from typing import Union, List, Dict, Set, Optional, Iterator, Any, NamedTuple
 from pathlib import Path
@ -13,14 +29,13 @@ import pytz
 from ..common import PathIsh, get_files, LazyLogger, Json
 from ..kython import kompress

-from my.config import twitter as config


 logger = LazyLogger(__name__)


 def _get_export() -> Path:
-    return max(get_files(config.export_path, '*.zip'))
+    return max(get_files(config.export_path))


 Tid = str
--- a/my/twitter/twint.py
+++ b/my/twitter/twint.py
@ -1,24 +1,34 @@
 """
-Twitter data (tweets and favorites). Uses [[https://github.com/twintproject/twint][Twint]] data export.
+Twitter data (tweets and favorites).
+
+Uses [[https://github.com/twintproject/twint][Twint]] data export.
 """

+from ..core.common import Paths
+from dataclasses import dataclass
+from my.config import twint as user_config
+
+@dataclass
+class twint(user_config):
+    export_path: Paths # path[s]/glob to the twint Sqlite database
+
+
+from ..core.cfg import make_config
+config = make_config(twint)
+
+
 from datetime import datetime
 from typing import NamedTuple, Iterable, List
 from pathlib import Path

-from ..common import PathIsh, get_files, LazyLogger, Json
+from ..core.common import get_files, LazyLogger, Json
 from ..core.time import abbr_to_timezone

-from my.config import twint as config
-
-
 log = LazyLogger(__name__)


 def get_db_path() -> Path:
-    # TODO don't like the hardcoded extension. maybe, config should decide?
-    # or, glob only applies to directories?
-    return max(get_files(config.export_path, glob='*.db'))
+    return max(get_files(config.export_path))


 class Tweet(NamedTuple):
--- a/tests/config.py
+++ b/tests/config.py
@ -55,8 +55,7 @@ DAL = None
    ''')

    from my.cfg import set_repo
-    # FIXME meh. hot sure about setting the parent??
-    set_repo('hypexport', tmp_path)
+    set_repo('hypexport', fake_hypexport)

    # should succeed now!
    import my.hypothesis
--- a/tests/reddit.py
+++ b/tests/reddit.py
@ -1,16 +1,17 @@
 from datetime import datetime
 import pytz

-from my.reddit import events, inputs, saved
 from my.common import make_dict


 def test() -> None:
+    from my.reddit import events, inputs, saved
    list(events())
    list(saved())


 def test_unfav() -> None:
+    from my.reddit import events, inputs, saved
    ev = events()
    url = 'https://reddit.com/r/QuantifiedSelf/comments/acxy1v/personal_dashboard/'
    uev = [e for e in ev if e.url == url]
@ -23,6 +24,7 @@ def test_unfav() -> None:


 def test_saves() -> None:
+    from my.reddit import events, inputs, saved
    # TODO not sure if this is necesasry anymore?
    saves = list(saved())
    # just check that they are unique..
@ -30,6 +32,7 @@ def test_saves() -> None:


 def test_disappearing() -> None:
+    from my.reddit import events, inputs, saved
    # eh. so for instance, 'metro line colors' is missing from reddit-20190402005024.json for no reason
    # but I guess it was just a short glitch... so whatever
    saves = events()
@ -39,12 +42,18 @@ def test_disappearing() -> None:


 def test_unfavorite() -> None:
+    from my.reddit import events, inputs, saved
    evs = events()
    unfavs = [s for s in evs if s.text == 'unfavorited']
    [xxx] = [u for u in unfavs if u.eid == 'unf-19ifop']
    assert xxx.dt == datetime(2019, 1, 28, 8, 10, 20, tzinfo=pytz.utc)


+def test_extra_attr() -> None:
+    from my.reddit import config
+    assert isinstance(getattr(config, 'passthrough'), str)
+
+
 import pytest # type: ignore
@pytest.fixture(autouse=True, scope='module')
 def prepare():
@ -55,3 +64,5 @@ def prepare():
    # first bit is for 'test_unfavorite, the second is for test_disappearing
    files = files[300:330] + files[500:520]
    config.export_dir = files # type: ignore
+
+    setattr(config, 'passthrough', "isn't handled, but available dynamically nevertheless")