denylist docs/installation instructions
This commit is contained in:
parent
8971a3915e
commit
6d4724ffbc
3 changed files with 130 additions and 100 deletions
128
doc/DENYLIST.md
Normal file
128
doc/DENYLIST.md
Normal file
|
@ -0,0 +1,128 @@
|
||||||
|
For code reference, see: [`my.core.denylist.py`](../my/core/denylist.py)
|
||||||
|
|
||||||
|
A helper module for defining denylists for sources programmatically (in layman's terms, this lets you remove some particular output from a module you don't want)
|
||||||
|
|
||||||
|
Lets you specify a class, an attribute to match on,
|
||||||
|
and a JSON file containing a list of values to deny/filter out
|
||||||
|
|
||||||
|
As an example, this will use the `my.ip` module, as filtering incorrect IPs was the original use case for this module:
|
||||||
|
|
||||||
|
```python
|
||||||
|
class IP(NamedTuple):
|
||||||
|
addr: str
|
||||||
|
dt: datetime
|
||||||
|
```
|
||||||
|
|
||||||
|
A possible denylist file would contain:
|
||||||
|
|
||||||
|
```json
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"addr": "192.168.1.1",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"dt": "2020-06-02T03:12:00+00:00",
|
||||||
|
}
|
||||||
|
]
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that if the value being compared to is not a single (non-array/object) JSON primitive
|
||||||
|
(str, int, float, bool, None), it will be converted to a string before comparison
|
||||||
|
|
||||||
|
To use this in code:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from my.ip.all import ips
|
||||||
|
filtered = DenyList("~/data/ip_denylist.json").filter(ips())
|
||||||
|
```
|
||||||
|
|
||||||
|
To add items to the denylist, in python (in a one-off script):
|
||||||
|
|
||||||
|
```python
|
||||||
|
from my.ip.all import ips
|
||||||
|
from my.core.denylist import DenyList
|
||||||
|
|
||||||
|
d = DenyList("~/data/ip_denylist.json")
|
||||||
|
|
||||||
|
for ip in ips():
|
||||||
|
# some custom code you define
|
||||||
|
if ip.addr == ...:
|
||||||
|
d.deny(key="ip", value=ip.ip)
|
||||||
|
d.write()
|
||||||
|
```
|
||||||
|
|
||||||
|
... or interactively, which requires [`fzf`](https://github.com/junegunn/fzf) and [`pyfzf-iter`](https://pypi.org/project/pyfzf-iter/) (`python3 -m pip install pyfzf-iter`) to be installed:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from my.ip.all import ips
|
||||||
|
from my.core.denylist import DenyList
|
||||||
|
|
||||||
|
d = DenyList("~/data/ip_denylist.json")
|
||||||
|
d.deny_cli(ips())
|
||||||
|
d.write()
|
||||||
|
```
|
||||||
|
|
||||||
|
That will open up an interactive `fzf` prompt, where you can select items to add to the denylist
|
||||||
|
|
||||||
|
This is meant for relatively simple filters, where you want to filter items out
|
||||||
|
based on a single attribute of a namedtuple/dataclass. If you want to do something
|
||||||
|
more complex, I would recommend overriding the `all.py` file for that source and
|
||||||
|
writing your own filter function there.
|
||||||
|
|
||||||
|
For more info on all.py:
|
||||||
|
|
||||||
|
https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy
|
||||||
|
|
||||||
|
This would typically be used in an overriden `all.py` file, or in a one-off script
|
||||||
|
which you may want to filter out some items from a source, progressively adding more
|
||||||
|
items to the denylist as you go.
|
||||||
|
|
||||||
|
A potential `my/ip/all.py` file might look like:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from typing import Iterator
|
||||||
|
|
||||||
|
from my.ip.common import IP
|
||||||
|
from my.core.denylist import DenyList
|
||||||
|
|
||||||
|
deny = DenyList("~/data/ip_denylist.json")
|
||||||
|
|
||||||
|
def _ips() -> Iterator[IP]:
|
||||||
|
from my.ip import discord
|
||||||
|
|
||||||
|
yield from discord.ips()
|
||||||
|
|
||||||
|
|
||||||
|
def ips() -> Iterator[IP]:
|
||||||
|
yield from deny.filter(_ips())
|
||||||
|
```
|
||||||
|
|
||||||
|
To add items to the denylist, you could create a `__main__.py` in your namespace package (in this case, `my/ip/__main__.py`), with contents like:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from my.ip import all
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
all.deny.deny_cli(all.ips())
|
||||||
|
```
|
||||||
|
|
||||||
|
Which could then be called like: `python3 -m my.ip`
|
||||||
|
|
||||||
|
Or, you could just run it from the command line:
|
||||||
|
|
||||||
|
```
|
||||||
|
python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'
|
||||||
|
```
|
||||||
|
|
||||||
|
To edit the `all.py`, you could either:
|
||||||
|
|
||||||
|
- install it as editable (`python3 -m pip install --user -e ./HPI`), and then edit the file directly
|
||||||
|
- or, create a namespace package, which splits the package across multiple directories. For info on that see [`MODULE_DESIGN`](https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#namespace-packages), [`reorder_editable`](https://github.com/seanbreckenridge/reorder_editable), and possibly the [`HPI-template`](https://github.com/seanbreckenridge/HPI-template) to create your own HPI namespace package to create your own `all.py` file.
|
||||||
|
|
||||||
|
TODO: link to seanbreckenridge/HPI-personal for an example of this once this is merged/settled
|
||||||
|
|
||||||
|
Sidenote: the reason why we want to specifically override
|
||||||
|
the all.py and not just create a script that filters out the items you're
|
||||||
|
not interested in is because we want to be able to import from `my.ip.all`
|
||||||
|
or `my.location.all` from other modules and get the filtered results, without
|
||||||
|
having to mix data filtering logic with parsing/loading/caching (the stuff HPI does)
|
|
@ -226,8 +226,7 @@ The main goals are:
|
||||||
- doesn't require you to maintain a fork of this repository, though you can maintain a separate HPI repository (so no patching/merge conflicts)
|
- doesn't require you to maintain a fork of this repository, though you can maintain a separate HPI repository (so no patching/merge conflicts)
|
||||||
- allows you to easily add/remove sources to the ~all.py~ module, either by:
|
- allows you to easily add/remove sources to the ~all.py~ module, either by:
|
||||||
- overriding an ~all.py~ in your own repository
|
- overriding an ~all.py~ in your own repository
|
||||||
- just commenting out the source/adding 2 lines to import and ~yield
|
- just commenting out the source/adding 2 lines to import and ~yield from~ your new source
|
||||||
from~ your new source
|
|
||||||
- doing nothing! (~import_source~ will catch the error and just warn you
|
- doing nothing! (~import_source~ will catch the error and just warn you
|
||||||
and continue to work without changing any code)
|
and continue to work without changing any code)
|
||||||
|
|
||||||
|
|
|
@ -1,105 +1,8 @@
|
||||||
"""
|
"""
|
||||||
TODO: move this to doc/DENYLIST ?
|
|
||||||
|
|
||||||
A helper module for defining denylists for sources programatically
|
A helper module for defining denylists for sources programatically
|
||||||
(in lamens terms, this lets you remove some output from a module you don't want)
|
(in lamens terms, this lets you remove some output from a module you don't want)
|
||||||
|
|
||||||
Lets you specify a class, an attribute to match on,
|
For docs, see doc/DENYLIST.md
|
||||||
and a json file containing a list of values to deny/filter out
|
|
||||||
|
|
||||||
As an example, for a class like this:
|
|
||||||
|
|
||||||
class IP(NamedTuple):
|
|
||||||
ip: str
|
|
||||||
dt: datetime
|
|
||||||
|
|
||||||
A possible denylist file would contain:
|
|
||||||
|
|
||||||
[
|
|
||||||
{
|
|
||||||
"ip": "192.168.1.1",
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"dt": "2020-06-02T03:12:00+00:00",
|
|
||||||
}
|
|
||||||
]
|
|
||||||
|
|
||||||
Note that if the value being compared to is not a single (non-array/object) JSON primitive
|
|
||||||
(str, int, float, bool, None), it will be converted to a string before comparison
|
|
||||||
|
|
||||||
To use this in code:
|
|
||||||
|
|
||||||
```
|
|
||||||
from my.ip.all import ips
|
|
||||||
filtered = DenyList("~/data/ip_denylist.json").filter(ips())
|
|
||||||
```
|
|
||||||
|
|
||||||
To add items to the denylist, in python (in a one-off script):
|
|
||||||
|
|
||||||
```
|
|
||||||
from my.ip.all import ips
|
|
||||||
from my.core.denylist import DenyList
|
|
||||||
|
|
||||||
d = DenyList("~/data/ip_denylist.json")
|
|
||||||
|
|
||||||
for ip in ips():
|
|
||||||
# some custom code you define
|
|
||||||
if ip.ip == ...:
|
|
||||||
d.deny(key="ip", value=ip.ip)
|
|
||||||
d.write()
|
|
||||||
```
|
|
||||||
|
|
||||||
... or interactively, which requires `fzf` to be installed, after running
|
|
||||||
|
|
||||||
```
|
|
||||||
from my.ip.all import ips
|
|
||||||
from my.core.denylist import DenyList
|
|
||||||
|
|
||||||
d = DenyList("~/data/ip_denylist.json")
|
|
||||||
d.deny_cli(ips())
|
|
||||||
d.write()
|
|
||||||
```
|
|
||||||
|
|
||||||
This is meant for relatively simple filters, where you want to filter out
|
|
||||||
based on a single attribute of a namedtuple/dataclass. If you want to do something
|
|
||||||
more complex, I would recommend overriding the all.py file for that source and
|
|
||||||
writing your own filter function there.
|
|
||||||
|
|
||||||
For more info on all.py:
|
|
||||||
https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy
|
|
||||||
|
|
||||||
This would typically be used in an overriden all.py file, or in a one-off script
|
|
||||||
which you may want to filter out some items from a source, progressively adding more
|
|
||||||
items to the denylist as you go.
|
|
||||||
|
|
||||||
A potential my/ip/all.py file might look like:
|
|
||||||
|
|
||||||
```
|
|
||||||
from typing import Iterator
|
|
||||||
|
|
||||||
from my.ip.common import IP # type: ignore[import]
|
|
||||||
from my.core.denylist import DenyList
|
|
||||||
|
|
||||||
deny = DenyList("~/data/ip_denylist.json")
|
|
||||||
|
|
||||||
def ips() -> Iterator[IP]:
|
|
||||||
from my.ip import discord
|
|
||||||
|
|
||||||
yield from deny.filter(discord.ips())
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
To add items to the denylist, you could create a __main__.py file, or:
|
|
||||||
|
|
||||||
```
|
|
||||||
python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'
|
|
||||||
```
|
|
||||||
|
|
||||||
Sidenote: the reason why we want to specifically override
|
|
||||||
the all.py and not just create a script that filters out the items you're
|
|
||||||
not interested in is because we want to be able to import from `my.ip.all`
|
|
||||||
or `my.location.all` from other modules and get the filtered results, without
|
|
||||||
having to mix data filtering logic with parsing/loading/caching (the stuff HPI does)
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import sys
|
import sys
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue