From bd27bd4c24a1a726cdd9ac87b5e3b9ac76502acc Mon Sep 17 00:00:00 2001 From: karlicoss Date: Sat, 28 Oct 2023 22:01:50 +0100 Subject: [PATCH] docs: add documentation on logging during HPI module development --- doc/MODULE_DESIGN.org | 100 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 91 insertions(+), 9 deletions(-) diff --git a/doc/MODULE_DESIGN.org b/doc/MODULE_DESIGN.org index c0ab4f6..7aedf2f 100644 --- a/doc/MODULE_DESIGN.org +++ b/doc/MODULE_DESIGN.org @@ -2,6 +2,19 @@ Some thoughts on modules, how to structure them, and adding your own/extending H This is slightly more advanced, and would be useful if you're trying to extend HPI by developing your own modules, or contributing back to HPI +* TOC +:PROPERTIES: +:TOC: :include all :depth 1 :force (nothing) :ignore (this) :local (nothing) +:END: +:CONTENTS: +- [[#allpy][all.py]] +- [[#module-count][module count]] +- [[#single-file-modules][single file modules]] +- [[#adding-new-modules][Adding new modules]] +- [[#an-extendable-module-structure][An Extendable module structure]] +- [[#logging-guidelines][Logging guidelines]] +:END: + * all.py Some modules have lots of different sources for data. For example, ~my.location~ (location data) has lots of possible sources -- from ~my.google.takeout.parser~, using the ~gpslogger~ android app, or through geo locating ~my.ip~ addresses. For a module with multiple possible sources, its common to split it into files like: @@ -234,16 +247,85 @@ It could be argued that namespace packages and editable installs are a bit compl There's no requirement to use this for individual modules, it just seems to be the best solution we've arrived at so far -* Logging +* Logging guidelines +HPI doesn't enforce any specific logging mechanism, you're free to use whatever you prefer in your modules. -The ~my.core~ module exports a ~make_logger~ function which works nicely with -~cachew~ and gives you colored logs. You can use it like this: +However there are some general guidelines for developing modules that can make them more pleasant to use. -#+begin_src python - from my.core import make_logger +- each module should have its unique logger, the easiest way to ensure that is simply use module's ~__name__~ attribute as the logger name - logger = make_logger(__name__) + In addition, this ensures the logger hierarchy reflect the package hierarchy. + For instance, if you initialize the logger for =my.module= with specific settings, the logger for =my.module.helper= would inherit these settings. See more on that [[ https://docs.python.org/3/library/logging.html?highlight=logging#logger-objects][in python docs]]. - # or to set a custom level - logger = make_logger(__name__, level='warning') -#+end_src + As a bonus, if you use the module ~__name__~, this logger will be automatically be picked up and used by ~cachew~. + +- often modules are processing multiple files, extracting data from each one ([[https://beepb00p.xyz/exports.html#types][incremental/synthetic exports]]) + + It's nice to log each file name you're processing as =logger.info= so the user of module gets a sense of progress. + If possible, add the index of file you're processing and the total count. + + #+begin_src python + def process_all_data(): + paths = inputs() + total = len(paths) + width = len(str(total)) + for idx, path in enumerate(paths): + # :>{width} to align the logs vertically + logger.info(f'processing [{idx:>{width}}/{total:>{width}}] {path}') + yield from process_path(path) + #+end_src + + If there is a lot of logging happening related to a specific path, instead of adding path to each logging message manually, consider using [[https://docs.python.org/3/library/logging.html?highlight=loggeradapter#logging.LoggerAdapter][LoggerAdapter]]. + +- log exceptions, but sparingly + + Generally it's a good practice to call ~logging.exception~ from the ~except~ clause, so it's immediately visible where the errors are happening. + + However, in HPI, instead of crashing on exceptions we often behave defensively and ~yield~ them instead (see [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handling]]). + + In this case logging every time may become a bit spammy, so use exception logging sparingly in this case. + Typically it's best to rely on the downstream data consumer to handle the exceptions properly. + +- instead of =logging.getLogger=, it's best to use =my.core.make_logger= + + #+begin_src python + from my.core import make_logger + + logger = make_logger(__name__) + + # or to set a custom level + logger = make_logger(__name__, level='warning') + #+end_src + + This sets up some nicer defaults over standard =logging= module: + + - colored logs (via =colorlog= library) + - =INFO= as the initial logging level (instead of default =ERROR=) + - logging full exception trace when even when logging outside of the exception handler + + This is particularly useful for [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handling]]. + + By default, =logging= only logs the exception message (without the trace) in this case, which makes errors harder to debug. + - control logging level from the shell via ~LOGGING_LEVEL_*~ env variable + + This can be useful to suppress logging output if it's too spammy, or showing more output for debugging. + + E.g. ~LOGGING_LEVEL_my_instagram_gdpr=DEBUG hpi query my.instagram.gdpr.messages~ + + - experimental: passing env variable ~LOGGING_COLLAPSE=~ will "collapse" logging with the same level + + Instead of printing new logging line each time, it will 'redraw' the last logged line with a new logging message. + + This can be convenient if there are too many logs, you just need logging to get a sense of progress. + + - experimental: passing env variable ~ENLIGHTEN_ENABLE=yes~ will display TUI progress bars in some cases + + See [[https://github.com/Rockhopper-Technologies/enlighten#readme][https://github.com/Rockhopper-Technologies/enlighten#readme]] + + This can be convenient for showing the progress of parallel processing of different files from HPI: + + #+BEGIN_EXAMPLE + ghexport.dal[111] 29%|████████████████████ | 29/100 [00:03<00:07, 10.03 files/s] + rexport.dal[comments] 17%|████████ | 115/682 [00:03<00:14, 39.15 files/s] + my.instagram.android 0%|▎ | 3/2631 [00:02<34:50, 1.26 files/s] + #+END_EXAMPLE