docs: add documentation on logging during HPI module development

karlicoss 2023-10-28 22:01:50 +01:00
parent f668208bce
commit bd27bd4c24


@@ -2,6 +2,19 @@ Some thoughts on modules, how to structure them, and adding your own/extending H
This is slightly more advanced, and would be useful if you're trying to extend HPI by developing your own modules, or contributing back to HPI
* TOC
:PROPERTIES:
:TOC: :include all :depth 1 :force (nothing) :ignore (this) :local (nothing)
:END:
:CONTENTS:
- [[#allpy][all.py]]
- [[#module-count][module count]]
- [[#single-file-modules][single file modules]]
- [[#adding-new-modules][Adding new modules]]
- [[#an-extendable-module-structure][An Extendable module structure]]
- [[#logging-guidelines][Logging guidelines]]
:END:
* all.py
Some modules have lots of different sources for data. For example, ~my.location~ (location data) has lots of possible sources -- from ~my.google.takeout.parser~, using the ~gpslogger~ android app, or through geolocating ~my.ip~ addresses. For a module with multiple possible sources, it's common to split it into files like:
@@ -234,16 +247,85 @@ It could be argued that namespace packages and editable installs are a bit compl
There's no requirement to use this for individual modules, it just seems to be the best solution we've arrived at so far
* Logging guidelines
HPI doesn't enforce any specific logging mechanism; you're free to use whatever you prefer in your modules.
However, there are some general guidelines for developing modules that can make them more pleasant to use.
- each module should have its unique logger; the easiest way to ensure that is to simply use the module's ~__name__~ attribute as the logger name
In addition, this ensures the logger hierarchy reflects the package hierarchy.
For instance, if you initialize the logger for =my.module= with specific settings, the logger for =my.module.helper= would inherit these settings. See more on that [[https://docs.python.org/3/library/logging.html?highlight=logging#logger-objects][in the python docs]] (there's also a small sketch of this at the end of this section).
As a bonus, if you use the module's ~__name__~, this logger will automatically be picked up and used by ~cachew~.
- often a module processes multiple files, extracting data from each one ([[https://beepb00p.xyz/exports.html#types][incremental/synthetic exports]])
It's nice to log each file name you're processing at the =logger.info= level, so the user of the module gets a sense of progress.
If possible, also include the index of the file you're processing and the total count.
#+begin_src python
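# note: 'inputs' and 'process_path' stand for the module's own helpers for locating and parsing files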
def process_all_data():
paths = inputs()
total = len(paths)
width = len(str(total))
for idx, path in enumerate(paths):
# :>{width} to align the logs vertically
logger.info(f'processing [{idx:>{width}}/{total:>{width}}] {path}')
yield from process_path(path)
#+end_src
If there is a lot of logging happening related to a specific path, instead of adding the path to each logging message manually, consider using [[https://docs.python.org/3/library/logging.html?highlight=loggeradapter#logging.LoggerAdapter][LoggerAdapter]] (see the sketch at the end of this section).
- log exceptions, but sparingly
Generally it's a good practice to call ~logging.exception~ from the ~except~ clause, so it's immediately visible where the errors are happening.
However, in HPI, instead of crashing on exceptions we often behave defensively and ~yield~ them (see [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handling]] and the sketch at the end of this section).
In that case, logging every exception may become a bit spammy, so use exception logging sparingly.
Typically it's best to rely on the downstream data consumer to handle the exceptions properly.
- instead of =logging.getLogger=, it's best to use =my.core.make_logger=
#+begin_src python
from my.core import make_logger
logger = make_logger(__name__)
# or to set a custom level
logger = make_logger(__name__, level='warning')
#+end_src
This sets up some nicer defaults over the standard =logging= module:
- colored logs (via =colorlog= library)
- =INFO= as the initial logging level (instead of the =WARNING= default)
- logging the full exception trace even when logging outside of the exception handler
This is particularly useful for [[https://beepb00p.xyz/mypy-error-handling.html][mypy assisted error handling]].
By default, =logging= only logs the exception message (without the trace) in this case, which makes errors harder to debug.
- control the logging level from the shell via the ~LOGGING_LEVEL_*~ environment variable
This can be useful to suppress logging output if it's too spammy, or to show more output for debugging.
E.g. ~LOGGING_LEVEL_my_instagram_gdpr=DEBUG hpi query my.instagram.gdpr.messages~
- experimental: passing env variable ~LOGGING_COLLAPSE=<loglevel>~ will "collapse" logging with the same level
Instead of printing a new logging line each time, it will 'redraw' the last logged line with the new logging message.
This can be convenient if there are too many logs and you just need them to get a sense of progress.
- experimental: passing env variable ~ENLIGHTEN_ENABLE=yes~ will display TUI progress bars in some cases
See [[https://github.com/Rockhopper-Technologies/enlighten#readme][enlighten's README]]
This can be convenient for showing the progress of parallel processing of different files from HPI:
#+BEGIN_EXAMPLE
ghexport.dal[111] 29%|████████████████████ | 29/100 [00:03<00:07, 10.03 files/s]
rexport.dal[comments] 17%|████████ | 115/682 [00:03<00:14, 39.15 files/s]
my.instagram.android 0%|▎ | 3/2631 [00:02<34:50, 1.26 files/s]
#+END_EXAMPLE
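To illustrate the logger hierarchy point above, here's a minimal sketch using only the standard =logging= module (the module names are made up):
#+begin_src python
import logging

# configuring the 'parent' logger...
logging.getLogger('my.module').setLevel(logging.DEBUG)

# ...also affects its 'children', because dots in logger names form a hierarchy
helper_logger = logging.getLogger('my.module.helper')
assert helper_logger.getEffectiveLevel() == logging.DEBUG
#+end_src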
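Similarly, here's a minimal sketch of the ~LoggerAdapter~ approach mentioned above; the ~process_path~ function and the log messages are made up for illustration:
#+begin_src python
import logging
from pathlib import Path

from my.core import make_logger

logger = make_logger(__name__)


class PathAdapter(logging.LoggerAdapter):
    def process(self, msg, kwargs):
        # prepend the path (passed via the 'extra' dict) to every message
        return f"[{self.extra['path']}] {msg}", kwargs


def process_path(path: Path):
    plogger = PathAdapter(logger, {'path': path})
    plogger.info('parsing')  # logged as "[<path>] parsing"
    # ... actual extraction would happen here ...
    plogger.info('done')
#+end_src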
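Finally, a sketch of what yielding exceptions instead of raising them could look like in a module; the ~Item~ type and the parsing logic are hypothetical, the error handling pattern is the point here:
#+begin_src python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Sequence, Union

from my.core import make_logger

logger = make_logger(__name__)


@dataclass
class Item:
    # stand-in for whatever data the module actually extracts
    value: str


# each yielded element is either a parsed item or the exception we ran into
Result = Union[Item, Exception]


def items(paths: Sequence[Path]) -> Iterator[Result]:
    for path in paths:
        try:
            for line in path.read_text().splitlines():
                yield Item(value=line)
        except Exception as e:
            # log so the error is visible, but don't crash --
            # let the downstream consumer decide how to handle it
            logger.exception(e)
            yield e
#+end_src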