diff --git a/README.org b/README.org index 8b3268a..8b73c4e 100644 --- a/README.org +++ b/README.org @@ -10,8 +10,9 @@ If you're in a hurry, feel free to jump straight to the [[#usecases][demos]]. - see [[https://github.com/karlicoss/HPI/tree/master/doc/SETUP.org][SETUP]] for the *installation/configuration guide* -- see [[https://github.com/karlicoss/HPI/tree/master/doc/DEVELOPMENT.org][DEVELOPMENT]] for the *development/extension guide* +- see [[https://github.com/karlicoss/HPI/tree/master/doc/DEVELOPMENT.org][DEVELOPMENT]] for the *development guide* - see [[https://github.com/karlicoss/HPI/tree/master/doc/DESIGN.org][DESIGN]] for the *design goals* +- see [[https://github.com/karlicoss/HPI/tree/master/doc/MODULE_DESIGN.org][MODULE_DESIGN]] for some thoughts on structuring modules, and possibly *extending HPI* - see [[https://beepb00p.xyz/exobrain/projects/hpi.html][exobrain/HPI]] for some of my raw thoughts and todos on the project *TLDR*: I'm using [[https://github.com/karlicoss/HPI][HPI]] (Human Programming Interface) package as a means of unifying, accessing and interacting with all of my personal data. diff --git a/doc/CONFIGURING.org b/doc/CONFIGURING.org index ef060fb..eadb263 100644 --- a/doc/CONFIGURING.org +++ b/doc/CONFIGURING.org @@ -17,8 +17,6 @@ At the moment, it uses the following config attributes: Cache is extremely useful to speed up some queries. But it's *optional*, everything should work without it. - - I'll refer to this config as *specific* further in the doc, and give examples. to each point. Note that they are only illustrating the specific requirement, potentially ignoring the other ones. Now, the requirements as I see it: @@ -42,9 +40,9 @@ Now, the requirements as I see it: - keeping it overly flexible and powerful means it's potentially less accessible to people less familiar with programming - But see the further point about keeping it simple. I claim that simple programs look as easy as simple json. 
+ But see the further point about keeping it simple. I claim that simple programs look as easy as simple JSON. - - Python is 'less safe' than a plain json/yaml config + - Python is 'less safe' than a plain JSON/YAML config But at the moment the whole thing is running potentially untrusted Python code anyway. It's not a tool you're going to install it across your organization, run under root privileges, and let the employers tweak it. @@ -52,7 +50,7 @@ Now, the requirements as I see it: Ultimately, you set it up for yourself, and the config has exactly the same permissions as the code you're installing. Thinking that plain config would give you more security is deceptive, and it's a false sense of security (at this stage of the project). - # TODO I don't mind having json/toml/whatever, but only as an additional interface + # TODO I don't mind having JSON/TOML/whatever, but only as an additional interface I also write more about all this [[https://beepb00p.xyz/configs-suck.html][here]]. @@ -295,12 +293,9 @@ Some of TODO rexport? To some extent, this is an experiment. I'm not sure how much value is in . - One thing are TODO software? libraries that have fairly well defined APIs and you can reasonably version them. Another thing is the modules for accessing data, where you'd hopefully have everything backwards compatible. Maybe in the future I'm just not sure, happy to hear people's opinions on this. - - diff --git a/doc/CONTRIBUTING.org b/doc/CONTRIBUTING.org index e7f1e59..6dc69b1 100644 --- a/doc/CONTRIBUTING.org +++ b/doc/CONTRIBUTING.org @@ -8,3 +8,5 @@ doc in progress Of course reasonable formatting improvements (like obvious typos, missing spaces or too dense code) are welcome. And of course, if we end up collaborating a lot on the project I'm open to discussion if automatic code style is really important to you. 
+ +- See [[file:MODULE_DESIGN.org][MODULE_DESIGN.org]] for common practices in HPI diff --git a/doc/DESIGN.org b/doc/DESIGN.org index 0e4ed61..b8d40f9 100644 --- a/doc/DESIGN.org +++ b/doc/DESIGN.org @@ -4,7 +4,8 @@ note: this doc is in progress - interoperable - This is the main motivation and [[file::README.org::#why][why]] I created HPI in the first place. + # note: this link doesn't work in org, but does for the GitHub preview + This is the main motivation and [[file:../README.org#why][why]] I created HPI in the first place. Ideally it should be possible to hook into anything you can imagine -- regardless the database/programming language/etc. @@ -31,12 +32,12 @@ note: this doc is in progress Data is inherently messy, and it's inevitable to get parsing errors and missing fields now and then. - I'm trying to combat this with [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handlign]], + I'm trying to combat this with [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handling]], so you are aware of errors, but still can work with the 'good' subset of data. - robust - The code is extensively covered with tests & mypy to make sure it doesn't rot. + The code is extensively covered with tests & ~mypy~ to make sure it doesn't rot. I also try to keep everything as backwards compatible as possible.
- (almost) no magic @@ -49,7 +50,6 @@ note: this doc is in progress - use mature tools like =pip= or =mypy= - - * other docs - [[file:CONFIGURING.org][some decisions around HPI configuration 'system']] +- [[file:MODULE_DESIGN.org][some thoughts on the modules, their design, and adding new ones]] diff --git a/doc/MODULES.org b/doc/MODULES.org index a1e91c0..2a1bed8 100644 --- a/doc/MODULES.org +++ b/doc/MODULES.org @@ -52,9 +52,11 @@ Some explanations: This can be useful for modules that merge multiple data sources (for example, =my.twitter= or =my.github=) Typically, such variable will be passed to =get_files= to actually extract the list of real files to use. You can see usage examples [[https://github.com/karlicoss/HPI/blob/master/tests/get_files.py][here]]. - + - if the field has a default value, you can omit it from your private config altogether +For more thoughts on modules and their structure, see [[file:MODULE_DESIGN.org][MODULE_DESIGN]] + * Configs The config snippets below are meant to be modified accordingly and *pasted into your private configuration*, e.g =$MY_CONFIG/my/config.py=. diff --git a/doc/MODULE_DESIGN.org b/doc/MODULE_DESIGN.org new file mode 100644 index 0000000..a238342 --- /dev/null +++ b/doc/MODULE_DESIGN.org @@ -0,0 +1,148 @@ +Some thoughts on modules, how to structure them, and adding your own/extending HPI + +This is slightly more advanced, and would be useful if you're trying to extend HPI by developing your own modules, or contributing back to HPI + +* module count + + Having way too many modules could end up being an issue. For now, I'm basically happy to merge new modules. With the current module count, things don't seem to break much, and most of them are modules I use myself, so they get tested with my own data. + + For services I don't use, I would prefer if they had tests/example data somewhere, else I can't guarantee they're still working...
+ + It's great if, when you start using HPI, you get a few modules 'for free' (perhaps ~github~ and ~reddit~), but it's likely that not everyone uses the same services. + + This shouldn't end up becoming a monorepo (a la [[https://www.spacemacs.org/][Spacemacs]]) with hundreds of modules supporting every use case. It's hard to know what the common use case is for everyone, and new services/companies which silo your data appear all the time... + + It's also not obvious how people want to access their data. This problem is often mitigated by the output of HPI being python functions -- one can always write a small script to take the output data from a module and wrangle it into whatever format you want. + + This is why HPI aims to be as extendable as possible. If you have some programming know-how, hopefully you're able to create some basic modules for yourself: plug in your own data and gain the benefits of using the functions in ~my.core~, the configuration layer, and possibly libraries like [[https://github.com/karlicoss/cachew][cachew]] to 'automatically' cache your data. + + In some ways it may make sense to think of HPI as akin to Emacs or one's 'dotfiles'. It provides a configuration layer and structure for you to access your data, and you can extend it to your own use case. + +* single file modules + +... or, the question 'should we split code from individual HPI files into setuptools packages?' + +It's possible for a single HPI module or file to handle *everything*.
Most of the python files in ~my/~ are 'single file' modules. + +By everything, I mean: + + - Exporting data from an API/locating data on your disk/maybe saving data so you don't lose it + - Parsing data from some raw (JSON/SQLite/HTML) format + - Merging different data sources into some common =NamedTuple=-like schema + - Caching expensive computation/merge results + - Configuration through ~my.config~ + +For short modules which aren't that complex, while developing your own personal modules, or while bootstrapping modules, this is actually fine. + +From a user's perspective, the ability to clone and install HPI as editable, add a new python file into ~my/~, and have it immediately be accessible as ~my.modulename~ is a pattern that should always be supported. + +However, as modules get more and more complex (especially if they include backing up/locating data from some location on your filesystem or interacting with a live API), ideally they should be split off into their own repositories. There are trade-offs to doing this, but they are typically worth it. + +As an example of this, take a look at [[https://github.com/karlicoss/HPI/tree/5ef277526577daaa115223e79a07a064ffa9bc85/my/github][my.github]] and the corresponding [[https://github.com/karlicoss/ghexport][ghexport]] data exporter, which saves GitHub data. + +- Pros: + - This allows someone to install and use ~ghexport~ without having to set up HPI at all -- it's a standalone tool, which means there's less of a barrier to entry + - It being a separate repository means issues relating to exporting data and the [[https://beepb00p.xyz/exports.html#dal][DAL]] (loading the data) can be handled there, instead of in HPI + - This reduces complexity for someone looking at the ~my.github~ files trying to debug issues related to HPI.
The functionality for ~ghexport~ can be tested on its own, independently of whatever configuration issue someone new to HPI might be debugging + - It's easier to combine additional data sources, like ~my.github.gdpr~, which includes additional data from the GDPR export + +- Cons: + - Leads to some code duplication, as you can no longer use helper functions from ~my.core~ in the new repository + - Additional boilerplate: instructions, installation scripts, testing. It's not required, but typically you want to leverage ~setuptools~ to allow ~pip install git+https...~ type installs, which are used in ~hpi module install~ + +Not all HPI modules are currently at that level of complexity -- some are simple enough that one can understand the file by just reading it top to bottom. Some wouldn't make sense to split off into separate modules for one reason or another. + +A related concern is how to structure namespace packages to allow users to easily extend them, and how this conflicts with single file modules. Keep reading below for more information on namespace packages/extension. If a module is converted from a single file module to a namespace with multiple files, it seems this is a breaking change; see [[https://github.com/karlicoss/HPI/issues/89][#89]] for an example of this. + +#+html:
+ +* Adding new modules + + As always, if the changes you wish to make are small, or you just want to add a few modules, you can clone and edit an editable install of HPI. See [[file:SETUP.org][SETUP]] for more information + + The "proper way" (unless you want to contribute to the upstream) is to create a separate file hierarchy and add your module to =PYTHONPATH=. + +# TODO link to 'overlays' documentation? + You can check my own [[https://github.com/karlicoss/hpi-personal-overlay][personal overlay]] as a reference. + + For example, if you want to add an =awesomedatasource=, it could be: + + : custom_module + : └── my + : └──awesomedatasource.py + + You can use all existing HPI modules in =awesomedatasource.py=, including =my.config= and everything from =my.core=. + =hpi modules= or =hpi doctor= commands should also detect your extra modules. + +- In addition, you can *override* the builtin HPI modules too: + + : custom_reddit_overlay + : └── my + : └──reddit.py + + Now if you add =custom_reddit_overlay= *in front* of ~PYTHONPATH~, all the downstream scripts using =my.reddit= will load it from =custom_reddit_overlay= instead. + + This could be useful to monkey patch some behaviours, or dynamically add some extra data sources -- anything that comes to your mind. + You can check [[https://github.com/karlicoss/hpi-personal-overlay/blob/7fca8b1b6031bf418078da2d8be70fd81d2d8fa0/src/my/calendar/holidays.py#L1-L14][my.calendar.holidays]] in my personal overlay as a reference. + +** Namespace Packages + +Note: this section covers some of the complexities and benefits with this being a namespace package and/or editable install, so it assumes some familiarity with python/imports + +HPI is installed as a namespace package, which allows an additional way to add your own modules. 
For the details on namespace packages, see [[https://www.python.org/dev/peps/pep-0420/][PEP420]], or the [[https://packaging.python.org/guides/packaging-namespace-packages][packaging docs for a summary]], but for our use case, a sufficient description might be: Namespace packages let you split a package across multiple directories on disk. + +Without adding a bulky/boilerplate-y plugin framework to HPI (which would increase the barrier to entry), [[https://packaging.python.org/guides/creating-and-discovering-plugins/#using-namespace-packages][namespace packages offer an alternative]] with few downsides. + +Creating a separate file hierarchy still allows you to keep up to date with any changes from this repository by periodically running ~git pull~ on your local clone of HPI (assuming you've installed it as an editable package (~pip install -e .~)), while creating your own modules, and possibly overwriting any files you wish to override/overlay. + +In order to do that, as stated above, you could edit the ~PYTHONPATH~ variable, which in turn modifies your computed ~sys.path~, which is how python [[https://docs.python.org/3/library/sys.html?highlight=pythonpath#sys.path][determines the search path for modules]]. This is sort of what [[file:../with_my][with_my]] allows you to do. + +In the context of HPI, it being a namespace package means you can have a local clone of this repository, and your own 'HPI' modules in a separate folder, which then get combined into the ~my~ package. + +As an example, say you were trying to override the ~my.reddit~ file, to include some new feature. You could create a new file hierarchy like: + +: . +: ├── my +: │   ├── reddit.py +: │   └── some_new_module.py +: └── setup.py + +Where ~reddit.py~ is your version of ~my.reddit~, which you've copied from this repository and applied your changes to.
The ~setup.py~ would be something like: + + #+begin_src python + from setuptools import setup, find_namespace_packages + + # should use a different name, + # so it's possible to differentiate between HPI installs + setup( + name="my-HPI-overlay", + zip_safe=False, + packages=find_namespace_packages(".", include=("my*",)), + ) + #+end_src + +Then, running ~pip3 install -e .~ in that directory would install it as part of the namespace package, and, assuming (see below for possible issues) it appears on ~sys.path~ before the upstream repository, your ~reddit.py~ file overrides the upstream one. Adding more files, like ~my.some_new_module~, into that directory immediately updates the global ~my~ package -- allowing you to quickly add new modules without having to re-install. + +If you install both directories as editable packages (which has the benefit of any changes you make in either repository immediately updating the globally installed ~my~ package), there are some concerns with which editable install appears on your ~sys.path~ first. If you want your modules to override the upstream modules, yours have to appear on the ~sys.path~ first (this is the same reason that =custom_reddit_overlay= must be at the front of your ~PYTHONPATH~). For more details and examples on dealing with editable namespace packages in the context of HPI, see the [[https://github.com/seanbreckenridge/reorder_editable][reorder_editable]] repository. + +There is no limit to how many directories you could install into a single namespace package, which could be a way for people to install additional HPI modules without worrying about the module count here becoming too large to manage. + +There are some other users [[https://github.com/hpi/hpi][who have begun publishing their own modules]] as namespace packages, which you could potentially install and use in addition to this repository, if any of those interest you.
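The override-by-order behaviour described above can be checked without HPI at all. The sketch below is illustrative only: the =upstream=/=overlay= directory names, the =SOURCE= attribute and the =resolve_overlay_demo= helper are made up for this example. It builds two temporary directories that each contain a =my/= folder with no =__init__.py=, puts the overlay in front on ~sys.path~ (the same effect as putting it first on ~PYTHONPATH~), and reports which file each ~my.*~ import resolved to.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def make_portion(root: Path, modules: Dict[str, str]) -> str:
    """Create root/my/<name>.py files, each recording where it came from.

    There is deliberately no my/__init__.py: that is what makes `my`
    a PEP 420 namespace package that can span several directories.
    """
    pkg = root / "my"
    pkg.mkdir(parents=True)
    for name, label in modules.items():
        (pkg / f"{name}.py").write_text(f"SOURCE = {label!r}\n")
    return str(root)


def resolve_overlay_demo() -> Dict[str, Union[str, List[str]]]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # Hypothetical stand-ins for a local HPI clone and a personal overlay
        upstream = make_portion(base / "upstream", {"reddit": "upstream", "github": "upstream"})
        overlay = make_portion(base / "overlay", {"reddit": "overlay", "some_new_module": "overlay"})

        # Overlay goes *in front*, like putting it first on PYTHONPATH
        sys.path[:0] = [overlay, upstream]
        importlib.invalidate_caches()
        try:
            result: Dict[str, Union[str, List[str]]] = {
                name: importlib.import_module(f"my.{name}").SOURCE
                for name in ("reddit", "github", "some_new_module")
            }
            # my.__path__ lists every directory that contributed to the namespace
            result["portions"] = [Path(p).parent.name for p in sys.modules["my"].__path__]
            return result
        finally:
            # Undo the sys.path change and forget the imported demo modules
            for entry in (overlay, upstream):
                sys.path.remove(entry)
            for mod in [m for m in sys.modules if m == "my" or m.startswith("my.")]:
                del sys.modules[mod]


if __name__ == "__main__":
    print(resolve_overlay_demo())
```

Run on a machine without another ~my~ package on ~sys.path~, this should show ~my.reddit~ coming from the overlay, ~my.github~ falling through to the upstream directory, and the overlay listed first in ~my.__path__~. Swapping the two entries flips which ~reddit.py~ wins, which is the same ordering concern that =reorder_editable= deals with for editable installs.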
+However, enabling this many modules may make ~hpi doctor~ look pretty busy. You can explicitly choose to enable/disable modules with a list of modules/regexes in your [[https://github.com/karlicoss/HPI/blob/f559e7cb899107538e6c6bbcf7576780604697ef/my/core/core_config.py#L24-L55][core config]]; see [[https://github.com/seanbreckenridge/dotfiles/blob/a1a77c581de31bd55a6af3d11b8af588614a207e/.config/my/my/config/__init__.py#L42-L72][here]] for an example. + +You may use the other modules or [[https://github.com/karlicoss/hpi-personal-overlay][my overlay]] as reference, but python packaging is already a complicated issue, before adding complexities like namespace packages and editable installs on top of it... If you're having trouble extending HPI in this fashion, you can open an issue here, preferably with a link to your code/repository and/or the ~setup.py~ you're trying to use. + +* An Extendable module structure + +In this context, 'overlay'/'override' means you create your own namespace package/file structure as described above, and since your files are in front of the upstream repository files in the computed ~sys.path~ (either by using namespace packages, the ~PYTHONPATH~ or ~with_my~), your file overrides the upstream one. + +This isn't set in stone, and is currently being discussed in multiple issues: [[https://github.com/karlicoss/HPI/issues/102][#102]], [[https://github.com/karlicoss/HPI/issues/89][#89]], [[https://github.com/karlicoss/HPI/issues/154][#154]] + +The main goals are: + +- low effort: ideally it should be a matter of a few lines of code to override something. +- good interop: e.g. ability to keep up with the upstream, use modules coming from separate repositories, etc. +- ideally mypy friendly. This kind of means 'not too dynamic and magical', which is ultimately a good thing even if you don't care about mypy.
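As a purely hypothetical illustration of the 'low effort' goal (none of these names are an existing HPI API), a merged =all=-style module could keep its list of sources in one place, so an overlay only needs to redefine that list to add or drop a source. The sketch uses made-up =from_export= and =from_gdpr= sources and de-duplicates comments by id:

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List, NamedTuple, Set


class Comment(NamedTuple):
    id: str
    text: str


# Hypothetical per-source functions, e.g. living in my/reddit/export.py and my/reddit/gdpr.py
def from_export() -> Iterator[Comment]:
    yield Comment("c1", "only in the API export")
    yield Comment("c2", "in both sources")


def from_gdpr() -> Iterator[Comment]:
    yield Comment("c2", "in both sources")
    yield Comment("c3", "only in the GDPR export")


Source = Callable[[], Iterable[Comment]]

# The one line an overlay's copy of this 'all' module would need to change
SOURCES: List[Source] = [from_export, from_gdpr]


def merge(sources: Iterable[Source]) -> Iterator[Comment]:
    """Chain every source, keeping only the first comment seen for each id."""
    seen: Set[str] = set()
    for comment in chain.from_iterable(src() for src in sources):
        if comment.id in seen:
            continue
        seen.add(comment.id)
        yield comment


def comments() -> Iterator[Comment]:
    return merge(SOURCES)


if __name__ == "__main__":
    print([c.id for c in comments()])
```

In this shape, an overlay that only wants the GDPR data would ship its own copy of the module with ~SOURCES = [from_gdpr]~. Since it's all plain typed functions and a module-level list, mypy can still check it.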
+ +# TODO: add example with overriding 'all' diff --git a/doc/SETUP.org b/doc/SETUP.org index df5eefc..149843f 100644 --- a/doc/SETUP.org +++ b/doc/SETUP.org @@ -40,13 +40,15 @@ You'd be really helping me, I want to make the setup as straightforward as possi * Few notes -I understand that people who'd like to use this may not be super familiar with Python, PIP or generally unix, so here are some useful notes: +I understand that people who'd like to use this may not be super familiar with Python, pip or generally unix, so here are some useful notes: - only ~python >= 3.6~ is supported - I'm using ~pip3~ command, but on your system you might only have ~pip~. If your ~pip --version~ says python 3, feel free to use ~pip~. +- If you have issues getting ~pip~ or ~pip3~ to work, it may be worth invoking pip as a module of your Python interpreter instead, like ~python3 -m pip~ (e.g. ~python3 -m pip install --user ..~) + - similarly, I'm using =python3= in the documentation, but if your =python --version= says python3, it's okay to use =python= - when you are using ~pip install~, [[https://stackoverflow.com/a/42989020/706389][always pass]] =--user=, and *never install third party packages with sudo* (unless you know what you are doing) @@ -97,7 +99,7 @@ This is less convenient, but gives you more control. The benefit of this way is that you get a bit more control, explicitly allowing your scripts to use your data. ** appendix: optional packages -You can also install some opional packages +You can also install some optional packages : pip3 install 'HPI[optional]' @@ -105,6 +107,7 @@ They aren't necessary, but will improve your experience.
At the moment these are - [[https://github.com/karlicoss/cachew][cachew]]: automatic caching library, which can greatly speedup data access - [[https://github.com/metachris/logzero][logzero]]: a nice logging library, supporting colors +- [[https://github.com/ijl/orjson][orjson]]: a library for serializing data to JSON, used in ~my.core.serialize~ and the ~hpi query~ interface - [[https://github.com/python/mypy][mypy]]: mypy is used for checking configs and troubleshooting * Setting up modules @@ -123,6 +126,7 @@ If you're not planning to use private configuration (some modules don't need it) The configuration usually contains paths to the data on your disk, and some modules have extra settings. The config is simply a *python package* (named =my.config=), expected to be in =~/.config/my=. +If you'd like to change the location of the =my.config= directory, you can set the =MY_CONFIG= environment variable. For example, in your =.bashrc= add: ~export MY_CONFIG=$HOME/.my/~ Since it's a Python package, generally it's very *flexible* and there are many ways to set it up. @@ -196,7 +200,7 @@ If you experience issues, feel free to report, but please attach your: - OS version - python version: =python3 --version= - HPI version: =pip3 show HPI= -- if you see some exception, attach a full log (just make suer there is not private information in it) +- if you see some exception, attach a full log (just make sure there is no private information in it) - if you think it can help, attach screenshots ** common issues @@ -453,34 +457,14 @@ Also check out [[https://beepb00p.xyz/myinfra.html#hpi][my personal infrastructu - The easiest is just to clone HPI repository and run an editable PIP install (=pip3 install --user -e .=), or via [[#use-without-installing][with_my]] wrapper. - After theat you can just edit the code directly, your changes will be reflected immediately, and you will be able to quickly iterate/fix bugs/add new methods.
+ After that you can just edit the code directly, your changes will be reflected immediately, and you will be able to quickly iterate/fix bugs/add new methods. + + This is great if you just want to add a few of your own personal modules, or make minimal changes to a few files. If you do much more than that, you may run into possible merge conflicts if/when you update (~git pull~) HPI # TODO eh. doesn't even have to be in 'my' namespace?? need to check it - The "proper way" (unless you want to contribute to the upstream) is to create a separate file hierarchy and add your module to =PYTHONPATH=. - You can check my own [[https://github.com/karlicoss/hpi-personal-overlay][personal overlay]] as a reference. - - For example, if you want to add an =awesomedatasource=, it could be: - - : custom_module - : └── my - : └──awesomedatasource.py - - You can use all existing HPI modules in =awesomedatasource.py=, including =my.config= and everything from =my.core=. - =hpi modules= or =hpi doctor= commands should also detect your extra modules. - -- In addition, you can *override* the builtin HPI modules too: - - : custom_reddit_overlay - : └── my - : └──reddit.py - - Now if you add =custom_reddit_overlay= *in front* of ~PYTHONPATH~, all the downstream scripts using =my.reddit= will load it from =custom_reddit_overlay= instead. - - This could be useful to monkey patch some behaviours, or dynamically add some extra data sources -- anything that comes to your mind. - You can check [[https://github.com/karlicoss/hpi-personal-overlay/blob/7fca8b1b6031bf418078da2d8be70fd81d2d8fa0/src/my/calendar/holidays.py#L1-L14][my.calendar.holidays]] in my personal overlay as a reference. - -I'll put up a better guide on this, in the meantime see [[https://packaging.python.org/guides/packaging-namespace-packages]["namespace packages"]] for more info. 
- -# TODO add example with overriding 'all' - + # hmmm seems to be no obvious way to link to a header in a separate file, + # if you want this in both emacs and how github renders org mode + # https://github.com/karlicoss/HPI/pull/160#issuecomment-817318076 + See [[file:MODULE_DESIGN.org#addingmodules][MODULE_DESIGN/adding modules]] for more information
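For a sense of what a small single-file module like the =awesomedatasource= example might contain, here is a minimal sketch. Everything in it is hypothetical: the =Entry= schema, the JSON layout with =dt=/=text= keys, and the =entries= and =parse_demo_export= functions. It only illustrates parsing a raw JSON export into a =NamedTuple= and, in the spirit of the mypy assisted error handling mentioned in DESIGN, yielding errors instead of raising them so one bad record doesn't hide the good ones. In a real module the export directory would more likely come from your =my.config= than from a function argument.

```python
# Hypothetical single-file module, e.g. custom_module/my/awesomedatasource.py
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Iterator, List, NamedTuple, Union


class Entry(NamedTuple):
    dt: datetime
    text: str


# Errors are yielded next to the good values instead of being raised
Result = Union[Entry, Exception]


def _parse_file(path: Path) -> Iterator[Result]:
    for raw in json.loads(path.read_text()):
        try:
            dt = datetime.strptime(raw["dt"], "%Y-%m-%dT%H:%M:%S")
            yield Entry(dt=dt, text=raw["text"])
        except (KeyError, TypeError, ValueError) as e:
            # missing field, wrong type or unparseable date: report it and keep going
            yield e


def entries(export_dir: Path) -> Iterator[Result]:
    # Assumed layout: a directory of *.json files, each holding a list of records
    for path in sorted(export_dir.glob("*.json")):
        yield from _parse_file(path)


def parse_demo_export() -> List[Result]:
    """Write a tiny made-up export to a temporary directory and parse it."""
    records = [
        {"dt": "2021-04-10T12:00:00", "text": "hello"},
        {"text": "no date"},
        {"dt": "not a date", "text": "bad date"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "export.json").write_text(json.dumps(records))
        return list(entries(Path(tmp)))


if __name__ == "__main__":
    for res in parse_demo_export():
        print(res)
```

Callers can then decide what to do with the errors, e.g. filter them out with ~isinstance(r, Exception)~ or surface them in ~hpi doctor~-style checks.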