HPI/doc/DENYLIST.md
seanbreckenridge 98b086f746
location fallback (#263)
see https://github.com/karlicoss/HPI/issues/262

* move home to fallback/via_home.py
* move via_ip to fallback
* add fallback model
* add stub via_ip file
* add fallback_locations for via_ip
* use protocol for locations
* estimate_from helper, via_home estimator, all.py
* via_home: add accuracy, cache history
* add datasources to gpslogger/google_takeout
* tz/via_location.py: update import to fallback
* denylist docs/installation instructions
* tz.via_location: let user customize cachew refresh time
* add via_ip.estimate_location using binary search
* use estimate_location in via_home.get_location
* tests: add gpslogger to location config stub
* tests: install tz related libs in test env
* tz: add regression test for broken windows dates

* vendorize bisect_left from python src
doesnt have a 'key' parameter till python3.10
2023-02-28 04:30:06 +00:00

4.3 KiB

For code reference, see: my.core.denylist.py

A helper module for defining denylists for sources programmatically (in layman's terms, this lets you remove some particular output from a module you don't want)

Lets you specify a class, an attribute to match on, and a JSON file containing a list of values to deny/filter out

As an example, this will use the my.ip module, as filtering incorrect IPs was the original use case for this module:

class IP(NamedTuple):
    addr: str
    dt: datetime

A possible denylist file would contain:

[
    {
        "addr": "192.168.1.1",
    },
    {
        "dt": "2020-06-02T03:12:00+00:00",
    }
]

Note that if the value being compared to is not a single (non-array/object) JSON primitive (str, int, float, bool, None), it will be converted to a string before comparison

To use this in code:

from my.ip.all import ips
filtered = DenyList("~/data/ip_denylist.json").filter(ips())

To add items to the denylist, in python (in a one-off script):

from my.ip.all import ips
from my.core.denylist import DenyList

d = DenyList("~/data/ip_denylist.json")

for ip in ips():
    # some custom code you define
    if ip.addr == ...:
        d.deny(key="ip", value=ip.ip)
    d.write()

... or interactively, which requires fzf and pyfzf-iter (python3 -m pip install pyfzf-iter) to be installed:

from my.ip.all import ips
from my.core.denylist import DenyList

d = DenyList("~/data/ip_denylist.json")
d.deny_cli(ips())  # automatically writes after each selection

That will open up an interactive fzf prompt, where you can select an item to add to the denylist

This is meant for relatively simple filters, where you want to filter items out based on a single attribute of a namedtuple/dataclass. If you want to do something more complex, I would recommend overriding the all.py file for that source and writing your own filter function there.

For more info on all.py:

https://github.com/karlicoss/HPI/blob/master/doc/MODULE_DESIGN.org#allpy

This would typically be used in an overridden all.py file, or in a one-off script which you may want to filter out some items from a source, progressively adding more items to the denylist as you go.

A potential my/ip/all.py file might look like (Sidenote: discord module from here):

from typing import Iterator

from my.ip.common import IP
from my.core.denylist import DenyList

deny = DenyList("~/data/ip_denylist.json")

# all possible data from the source
def _ips() -> Iterator[IP]:
    from my.ip import discord
    # could add other imports here

    yield from discord.ips()


# filtered data
def ips() -> Iterator[IP]:
    yield from deny.filter(_ips())

To add items to the denylist, you could create a __main__.py in your namespace package (in this case, my/ip/__main__.py), with contents like:

from my.ip import all

if __name__ == "__main__":
    all.deny.deny_cli(all.ips())

Which could then be called like: python3 -m my.ip

Or, you could just run it from the command line:

python3 -c 'from my.ip import all; all.deny.deny_cli(all.ips())'

To edit the all.py, you could either:

  • install it as editable (python3 -m pip install --user -e ./HPI), and then edit the file directly
  • or, create a namespace package, which splits the package across multiple directories. For info on that see MODULE_DESIGN, reorder_editable, and possibly the HPI-template to create your own HPI namespace package to create your own all.py file.

TODO: link to seanbreckenridge/HPI-personal for an example of this once this is merged/settled

Sidenote: the reason why we want to specifically override the all.py and not just create a script that filters out the items you're not interested in is because we want to be able to import from my.ip.all or my.location.all from other modules and get the filtered results, without having to mix data filtering logic with parsing/loading/caching (the stuff HPI does)