4.3 KiB
For code reference, see: my.core.denylist.py
A helper module for defining denylists for sources programmatically (in layman's terms, this lets you remove some particular output from a module you don't want)
Lets you specify a class, an attribute to match on, and a JSON file containing a list of values to deny/filter out
As an example, this will use the my.ip
module, as filtering incorrect IPs was the original use case for this module:
class IP(NamedTuple):
addr: str
dt: datetime
A possible denylist file would contain:
[
{
"addr": "192.168.1.1",
},
{
"dt": "2020-06-02T03:12:00+00:00",
}
]
Note that if the value being compared to is not a single (non-array/object) JSON primitive (str, int, float, bool, None), it will be converted to a string before comparison
To use this in code:
from my.ip.all import ips
filtered = DenyList("~/data/ip_denylist.json").filter(ips())
To add items to the denylist, in python (in a one-off script):
from my.ip.all import ips
from my.core.denylist import DenyList
d = DenyList("~/data/ip_denylist.json")
for ip in ips():
# some custom code you define
if ip.addr == ...:
d.deny(key="ip", value=ip.ip)
d.write()
... or interactively, which requires fzf
and pyfzf-iter
(python3 -m pip install pyfzf-iter
) to be installed:
from my.ip.all import ips
from my.core.denylist import DenyList
d = DenyList("~/data/ip_denylist.json")
d.deny_cli(ips()) # automatically writes after each selection
That will open up an interactive fzf
prompt, where you can select an item to add to the denylist
This is meant for relatively simple filters, where you want to filter items out
based on a single attribute of a namedtuple/dataclass. If you want to do something
more complex, I would recommend overriding the all.py
file for that source and
writing your own filter function there.
For more info on all.py:
https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy
This would typically be used in an overridden all.py
file, or in a one-off script
which you may want to filter out some items from a source, progressively adding more
items to the denylist as you go.
A potential my/ip/all.py
file might look like (Sidenote: discord
module from here):
from typing import Iterator
from my.ip.common import IP
from my.core.denylist import DenyList
deny = DenyList("~/data/ip_denylist.json")
# all possible data from the source
def _ips() -> Iterator[IP]:
from my.ip import discord
# could add other imports here
yield from discord.ips()
# filtered data
def ips() -> Iterator[IP]:
yield from deny.filter(_ips())
To add items to the denylist, you could create a __main__.py
in your namespace package (in this case, my/ip/__main__.py
), with contents like:
from my.ip import all
if __name__ == "__main__":
all.deny.deny_cli(all.ips())
Which could then be called like: python3 -m my.ip
Or, you could just run it from the command line:
python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'
To edit the all.py
, you could either:
- install it as editable (
python3 -m pip install --user -e ./HPI
), and then edit the file directly - or, create a namespace package, which splits the package across multiple directories. For info on that see
MODULE_DESIGN
,reorder_editable
, and possibly theHPI-template
to create your own HPI namespace package to create your ownall.py
file.
For a real example of this see, purarue/HPI-personal
Sidenote: the reason why we want to specifically override
the all.py and not just create a script that filters out the items you're
not interested in is because we want to be able to import from my.ip.all
or my.location.all
from other modules and get the filtered results, without
having to mix data filtering logic with parsing/loading/caching (the stuff HPI does)