diff --git a/doc/DENYLIST.md b/doc/DENYLIST.md
new file mode 100644
index 0000000..b701a0a
--- /dev/null
+++ b/doc/DENYLIST.md
@@ -0,0 +1,128 @@
+For code reference, see: [`my/core/denylist.py`](../my/core/denylist.py)
+
+A helper module for defining denylists for sources programmatically (in layman's terms, this lets you remove particular output you don't want from a module).
+
+It lets you specify a class, an attribute to match on,
+and a JSON file containing a list of values to deny (filter out).
+
+As an example, this document uses the `my.ip` module, since filtering out incorrect IPs was the original use case for this module. Its items look like:
+
+```python
+from datetime import datetime
+from typing import NamedTuple
+
+class IP(NamedTuple):
+    addr: str
+    dt: datetime
+```
+
+A possible denylist file would contain:
+
+```json
+[
+    {
+        "addr": "192.168.1.1"
+    },
+    {
+        "dt": "2020-06-02T03:12:00+00:00"
+    }
+]
+```
+
+Note that if the item's value being compared is not a single JSON primitive (`str`, `int`, `float`, `bool` or `None`, i.e. not an array or object),
+it will be converted to a string before comparison.
+
+To use this in code:
+
+```python
+from my.ip.all import ips
+from my.core.denylist import DenyList
+
+filtered = DenyList("~/data/ip_denylist.json").filter(ips())
+```
+
+To add items to the denylist in Python (e.g. in a one-off script):
+
+```python
+from my.ip.all import ips
+from my.core.denylist import DenyList
+
+d = DenyList("~/data/ip_denylist.json")
+
+for ip in ips():
+    # some custom code you define
+    if ip.addr == ...:
+        d.deny(key="addr", value=ip.addr)
+
+# persist the updated denylist once, after the loop
+d.write()
+```
+
+... or interactively, which requires [`fzf`](https://github.com/junegunn/fzf) and [`pyfzf-iter`](https://pypi.org/project/pyfzf-iter/) (`python3 -m pip install pyfzf-iter`) to be installed:
+
+```python
+from my.ip.all import ips
+from my.core.denylist import DenyList
+
+d = DenyList("~/data/ip_denylist.json")
+d.deny_cli(ips())
+d.write()
+```
+
+That will open an interactive `fzf` prompt, where you can select items to add to the denylist.
+
+This is meant for relatively simple filters, where you want to filter items out
+based on a single attribute of a namedtuple/dataclass. If you want to do something
+more complex, I would recommend overriding the `all.py` file for that source and
+writing your own filter function there.
+
+For more info on `all.py`, see:
+
+https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy
+
+This would typically be used in an overridden `all.py` file, or in a one-off script
+in which you want to filter out some items from a source, progressively adding more
+items to the denylist as you go.
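+
+To make the matching rules concrete, here is a minimal, self-contained sketch of how such a filter could work. It is a simplified illustration, not the code in `my.core.denylist`. It makes these assumptions:
+
+- The helpers `load_deny_map`, `normalize`, `is_allowed` and `filter_items` are hypothetical.
+- Items that lack a denied attribute are kept.
+- Datetimes are compared via `isoformat()`, so that they match strings like the `dt` entry in the example denylist file.
+
+```python
+import json
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Any, Dict, Iterable, Iterator, Set
+
+# Hypothetical sketch of denylist matching; NOT the code in my.core.denylist.
+
+# denylist contents with the same shape as the JSON file shown earlier
+DENYLIST_JSON = """
+[
+    {"addr": "192.168.1.1"},
+    {"dt": "2020-06-02T03:12:00+00:00"}
+]
+"""
+
+
+@dataclass
+class IP:
+    # a dataclass stand-in for the IP type from my.ip
+    addr: str
+    dt: datetime
+
+
+def load_deny_map(raw: str) -> Dict[str, Set[Any]]:
+    # group denied values by attribute name, e.g. {"addr": {...}, "dt": {...}}
+    # (assumes every value in the file is a hashable JSON primitive)
+    deny_map: Dict[str, Set[Any]] = {}
+    for entry in json.loads(raw):
+        for key, value in entry.items():
+            deny_map.setdefault(key, set()).add(value)
+    return deny_map
+
+
+def normalize(value: Any) -> Any:
+    # JSON primitives are compared as-is
+    if value is None or isinstance(value, (str, int, float, bool)):
+        return value
+    # assumption: datetimes are stringified as ISO 8601
+    if isinstance(value, datetime):
+        return value.isoformat()
+    # anything else is compared via str()
+    return str(value)
+
+
+def is_allowed(item: Any, deny_map: Dict[str, Set[Any]]) -> bool:
+    # drop the item if any one of its attributes matches a denied value;
+    # assumption: items lacking a denied attribute are kept
+    for key, denied in deny_map.items():
+        if hasattr(item, key) and normalize(getattr(item, key)) in denied:
+            return False
+    return True
+
+
+def filter_items(items: Iterable[Any], deny_map: Dict[str, Set[Any]]) -> Iterator[Any]:
+    # lazily yield only the allowed items, like a generator-based source
+    return (item for item in items if is_allowed(item, deny_map))
+
+
+deny_map = load_deny_map(DENYLIST_JSON)
+items = [
+    IP("192.168.1.1", datetime(2021, 1, 1, tzinfo=timezone.utc)),  # denied by addr
+    IP("10.0.0.5", datetime(2020, 6, 2, 3, 12, tzinfo=timezone.utc)),  # denied by dt
+    IP("10.0.0.6", datetime(2021, 1, 1, tzinfo=timezone.utc)),  # kept
+]
+kept = list(filter_items(items, deny_map))
+print([ip.addr for ip in kept])  # ['10.0.0.6']
+```
+
+Only the third item survives here: the first matches on `addr` and the second on `dt`. Grouping the denied values per attribute turns each check into a set lookup, so filtering stays cheap even with a long denylist.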
+
+A potential `my/ip/all.py` file might look like:
+
+```python
+from typing import Iterator
+
+from my.ip.common import IP
+from my.core.denylist import DenyList
+
+deny = DenyList("~/data/ip_denylist.json")
+
+def _ips() -> Iterator[IP]:
+    from my.ip import discord
+
+    yield from discord.ips()
+
+
+def ips() -> Iterator[IP]:
+    yield from deny.filter(_ips())
+```
+
+To add items to the denylist, you could create a `__main__.py` in your namespace package (in this case, `my/ip/__main__.py`), with contents like:
+
+```python
+from my.ip import all
+
+if __name__ == "__main__":
+    all.deny.deny_cli(all.ips())
+```
+
+It could then be run with `python3 -m my.ip`.
+
+Or, you could run it directly from the command line:
+
+```
+python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'
+```
+
+To edit the `all.py` file, you could either:
+
+- install HPI as editable (`python3 -m pip install --user -e ./HPI`), and then edit the file directly
+- or, create a namespace package, which splits the package across multiple directories. For info on that, see [`MODULE_DESIGN`](https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#namespace-packages), [`reorder_editable`](https://github.com/seanbreckenridge/reorder_editable), and possibly the [`HPI-template`](https://github.com/seanbreckenridge/HPI-template), which you can use to set up your own HPI namespace package containing your own `all.py` file.
+
+TODO: link to seanbreckenridge/HPI-personal for an example of this once this is merged/settled
+
+Sidenote: the reason why we want to specifically override
+`all.py`, and not just create a script that filters out the items you're
+not interested in, is that we want to be able to import from `my.ip.all`
+or `my.location.all` in other modules and get the filtered results, without
+having to mix data filtering logic with parsing/loading/caching (the stuff HPI does).
diff --git a/doc/MODULE_DESIGN.org b/doc/MODULE_DESIGN.org
index b17526d..691dd1c 100644
--- a/doc/MODULE_DESIGN.org
+++ b/doc/MODULE_DESIGN.org
@@ -226,8 +226,7 @@ The main goals are:
 
 - doesn't require you to maintain a fork of this repository, though you can maintain a separate HPI repository (so no patching/merge conflicts)
 - allows you to easily add/remove sources to the ~all.py~ module, either by:
   - overriding an ~all.py~ in your own repository
-  - just commenting out the source/adding 2 lines to import and ~yield
-    from~ your new source
+  - just commenting out the source/adding 2 lines to import and ~yield from~ your new source
   - doing nothing! (~import_source~ will catch the error and just warn you
     and continue to work without changing any code)
diff --git a/my/core/denylist.py b/my/core/denylist.py
index 4cf960d..fcf3e2b 100644
--- a/my/core/denylist.py
+++ b/my/core/denylist.py
@@ -1,105 +1,8 @@
 """
-TODO: move this to doc/DENYLIST ?
-
 A helper module for defining denylists for sources programatically
 (in lamens terms, this lets you remove some output from a module you don't want)
 
-Lets you specify a class, an attribute to match on,
-and a json file containing a list of values to deny/filter out
-
-As an example, for a class like this:
-
-class IP(NamedTuple):
-    ip: str
-    dt: datetime
-
-A possible denylist file would contain:
-
-[
-    {
-        "ip": "192.168.1.1",
-    },
-    {
-        "dt": "2020-06-02T03:12:00+00:00",
-    }
-]
-
-Note that if the value being compared to is not a single (non-array/object) JSON primitive
-(str, int, float, bool, None), it will be converted to a string before comparison
-
-To use this in code:
-
-```
-from my.ip.all import ips
-filtered = DenyList("~/data/ip_denylist.json").filter(ips())
-```
-
-To add items to the denylist, in python (in a one-off script):
-
-```
-from my.ip.all import ips
-from my.core.denylist import DenyList
-
-d = DenyList("~/data/ip_denylist.json")
-
-for ip in ips():
-    # some custom code you define
-    if ip.ip == ...:
-        d.deny(key="ip", value=ip.ip)
-    d.write()
-```
-
-... or interactively, which requires `fzf` to be installed, after running
-
-```
-from my.ip.all import ips
-from my.core.denylist import DenyList
-
-d = DenyList("~/data/ip_denylist.json")
-d.deny_cli(ips())
-d.write()
-```
-
-This is meant for relatively simple filters, where you want to filter out
-based on a single attribute of a namedtuple/dataclass. If you want to do something
-more complex, I would recommend overriding the all.py file for that source and
-writing your own filter function there.
-
-For more info on all.py:
-https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy
-
-This would typically be used in an overriden all.py file, or in a one-off script
-which you may want to filter out some items from a source, progressively adding more
-items to the denylist as you go.
-
-A potential my/ip/all.py file might look like:
-
-```
-from typing import Iterator
-
-from my.ip.common import IP  # type: ignore[import]
-from my.core.denylist import DenyList
-
-deny = DenyList("~/data/ip_denylist.json")
-
-def ips() -> Iterator[IP]:
-    from my.ip import discord
-
-    yield from deny.filter(discord.ips())
-```
-
-
-To add items to the denylist, you could create a __main__.py file, or:
-
-```
-python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'
-```
-
-Sidenote: the reason why we want to specifically override
-the all.py and not just create a script that filters out the items you're
-not interested in is because we want to be able to import from `my.ip.all`
-or `my.location.all` from other modules and get the filtered results, without
-having to mix data filtering logic with parsing/loading/caching (the stuff HPI does)
+For docs, see doc/DENYLIST.md
 """
 
 import sys