14 KiB
- Installing (editable install)
- Testing runtime behaviour (editable install)
- Mypy support (editable install)
- What if we don't install at all?
- Modifying (monkey patching) original module in the overlay
NOTE this kinda overlaps with the module design doc, should be unified in the future.
Relevant discussion about overlays: https://github.com/karlicoss/HPI/issues/102
Consider a toy package/module structure with minimal code, without any actual data parsing, just for demonstration purposes.
-
main
package structuremy/twitter/gdpr.py
Extracts Twitter data from GDPR archive.my/twitter/all.py
Merges twitter data from multiple sources (onlygdpr
in this case), so data consumers are agnostic of specific data sources used. This will be overridden byoverlay
.my/twitter/common.py
Contains helper function to merge data, so they can be reused by overlay'sall.py
.my/reddit.py
Extracts Reddit data – this won't be overridden by the overlay, we just keep it for demonstration purposes.
-
overlay
package structuremy/twitter/talon.py
Extracts Twitter data from Talon android app.my/twitter/all.py
Override forall.py
frommain
package – it merges together data fromgpdr
andtalon
modules.
Installing (editable install)
NOTE: this was tested with python 3.10
and pip 23.3.2
.
To install, we run:
pip3 install --user -e overlay/ pip3 install --user -e main/
As a result, we get:
pip3 list | grep hpi hpi-main 0.0.0 /project/main/src hpi-overlay 0.0.0 /project/overlay/src
cat ~/.local/lib/python3.10/site-packages/easy-install.pth /project/overlay/src /project/main/src
(the order above is important, so overlay
takes precedence over main
TODO link)
Verify the setup:
$ python3 -c 'import my; print(my.__path__)' _NamespacePath(['/project/overlay/src/my', '/project/main/src/my'])
This basically means that modules will be searched in both paths, with overlay taking precedence.
Installing with --use-pep517
See here for discussion https://github.com/purarue/reorder_editable/issues/2, but TLDR it should work similarly.
Testing runtime behaviour (editable install)
$ python3 -c 'import my.reddit as R; print(R.upvotes())' [main] my.reddit hello ['reddit upvote1', 'reddit upvote2']
Just as expected here, my.reddit
is imported from the main
package, since it doesn't exist in overlay
.
Let's theck twitter now:
$ python3 -c 'import my.twitter.all as T; print(T.tweets())' [overlay] my.twitter.all hello [main] my.twitter.common hello [main] my.twitter.gdpr hello [overlay] my.twitter.talon hello ['gdpr tweet 1', 'gdpr tweet 2', 'talon tweet 1', 'talon tweet 2']
As expected, my.twitter.all
was imported from the overlay
.
As you can see it's merged data from gdpr
(from main
package) and talon
(from overlay
package).
So far so good, let's see how it works with mypy.
Mypy support (editable install)
To check that mypy works as expected I injected some statements in modules that have no impact on runtime,
but should trigger mypy, like this trigger_mypy_error: str = 123
:
Let's run it:
$ mypy --namespace-packages --strict -p my overlay/src/my/twitter/talon.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^ Found 1 error in 1 file (checked 4 source files)
Hmm, this did find the statement in the overlay
, but missed everything from main
(e.g. reddit.py
and gdpr.py
should have also triggered the check).
First, let's check which sources mypy is processing:
$ mypy --namespace-packages --strict -p my -v 2>&1 | grep BuildSource LOG: Found source: BuildSource(path='/project/overlay/src/my', module='my', has_text=False, base_dir=None) LOG: Found source: BuildSource(path='/project/overlay/src/my/twitter', module='my.twitter', has_text=False, base_dir=None) LOG: Found source: BuildSource(path='/project/overlay/src/my/twitter/all.py', module='my.twitter.all', has_text=False, base_dir=None) LOG: Found source: BuildSource(path='/project/overlay/src/my/twitter/talon.py', module='my.twitter.talon', has_text=False, base_dir=None)
So seems like mypy is not processing anything from main
package at all?
At this point I cloned mypy, put a breakpoint, and found out this is the culprit: https://github.com/python/mypy/blob/1dd8e7fe654991b01bd80ef7f1f675d9e3910c3a/mypy/modulefinder.py#L288
This basically returns the first path where it finds my
package, which happens to be the overlay in this case.
So everything else is ignored?
It even seems to have a test for a similar usecase, which is quite sad. https://github.com/python/mypy/blob/1dd8e7fe654991b01bd80ef7f1f675d9e3910c3a/mypy/test/testmodulefinder.py#L64-L71
For now, I opened an issue in mypy repository https://github.com/python/mypy/issues/16683
But ok, maybe mypy treats main
as an external package somehow but still type checks it properly?
Let's see what's going on with imports:
$ mypy --namespace-packages --strict -p my --follow-imports=error overlay/src/my/twitter/talon.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^ overlay/src/my/twitter/all.py:3: error: Import of "my.twitter.common" ignored [misc] from .common import merge ^ overlay/src/my/twitter/all.py:6: error: Import of "my.twitter.gdpr" ignored [misc] from . import gdpr ^ overlay/src/my/twitter/all.py:6: note: (Using --follow-imports=error, module not passed on command line) overlay/src/my/twitter/all.py: note: In function "tweets": overlay/src/my/twitter/all.py:8: error: Returning Any from function declared to return "List[str]" [no-any-return] return merge(gdpr, talon) ^ Found 4 errors in 2 files (checked 4 source files)
Nope – looks like it's completely unawareof main
, and what's worst, by default (without tweaking --follow-imports
), these errors would be suppressed.
What if we check my.twitter
directly?
$ mypy --namespace-packages --strict -p my.twitter --follow-imports=error overlay/src/my/twitter/talon.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^~~ overlay/src/my/twitter: error: Ancestor package "my" ignored [misc] overlay/src/my/twitter: note: (Using --follow-imports=error, submodule passed on command line) overlay/src/my/twitter/all.py:3: error: Import of "my.twitter.common" ignored [misc] from .common import merge ^ overlay/src/my/twitter/all.py:3: note: (Using --follow-imports=error, module not passed on command line) overlay/src/my/twitter/all.py:6: error: Import of "my.twitter.gdpr" ignored [misc] from . import gdpr ^ overlay/src/my/twitter/all.py: note: In function "tweets": overlay/src/my/twitter/all.py:8: error: Returning Any from function declared to return "list[str]" [no-any-return] return merge(gdpr, talon) ^~~~~~~~~~~~~~~~~~~~~~~~~ Found 5 errors in 3 files (checked 3 source files)
Now we're also getting error: Ancestor package "my" ignored [misc]
.. not ideal.
What if we don't install at all?
Instead of editable install let's try running mypy directly over source files
First let's only check main
package:
$ MYPYPATH=main/src mypy --namespace-packages --strict -p my main/src/my/twitter/gdpr.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^~~ main/src/my/reddit.py:11: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^~~ Found 2 errors in 2 files (checked 6 source files)
As expected, it found both errors.
Now with overlay as well:
$ MYPYPATH=overlay/src:main/src mypy --namespace-packages --strict -p my overlay/src/my/twitter/all.py:6: note: In module imported here: main/src/my/twitter/gdpr.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^~~ overlay/src/my/twitter/talon.py:9: error: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment] trigger_mypy_error: str = 123 ^~~ Found 2 errors in 2 files (checked 4 source files)
Interesting enough, this is slightly better than the editable install (it detected error in gdpr.py
as well).
But still no reddit.py
error.
TODO possibly worth submitting to mypy issue tracker as well…
Overall it seems that properly type checking HPI setup as a whole is kinda problematic, especially if the modules actually override/extend base modules.
Modifying (monkey patching) original module in the overlay
Let's say we want to modify/monkey patch my.twitter.talon
module from main
, for example, convert "gdpr" to uppercase, i.e. tweet.replace('gdpr', 'GDPR')
.
I think our options are:
-
symlink to the 'parent' packages, e.g.
main
in the caseAlternatively, somehow install
main
under a different name/alias (managed by pip).This is discussed here: https://github.com/karlicoss/HPI/issues/102
The main upside is that it's relatively simple and (sort of works with mypy).
There are a few big downsides:
- creates a parallel package hierarchy (to the one maintained by pip), symlinks will need to be carefully managed manually This may not be such a huge deal if you don't have too many overlays. However this results in problems if you're trying to switch between two different HPI checkouts (e.g. stable and development). If you have symlinks into "stable" from the overlay then stable modules will sometimes be picked up when you're expecting "development" package.
- symlinks pointing outside of the source tree might cause pip install to go into infinite loop
- it modifies the package name This may potentially result in some confusing behaviours. One thing I noticed for example is that cachew caches might get duplicated.
- it might not work in all cases or might result in recursive imports
-
do not shadow the original module
Basically instead of shadowing via namespace package mechanism and creating identically named module, create some sort of hook that would patch the original
my.twitter.talon
module frommain
.The downside is that it's a bit unclear where to do that, we need some sort of entry point?
- it could be some global dynamic hook defined in the overlay, and then executed from
my.core
However, it's a bit intrusive, and unclear how to handle errors. E.g. what if we're monkey patching a module that we weren't intending to use, don't have dependencies installed and it's crashing? Perhaps core could support something like_hook
in each of HPI's modules? Note that it can't bemy.twitter.all
, since we might want to override.all
itself. The downside is is this probably not going to work well withtmp_config
and such – we'll need to somehow execute the hook again on reloading the module? -
ideally we'd have something that integrates with
importlib
and executed automatically when module is imported?TODO explore these:
- https://stackoverflow.com/questions/43571737/how-to-implement-an-import-hook-that-can-modify-the-source-code-on-the-fly-using
-
https://github.com/brettlangdon/importhook
This one is pretty intrusive, and has some issues, e.g. https://github.com/brettlangdon/importhook/issues/4
Let's try it:
$ PYTHONPATH=overlay3/src:main/src python3 -c 'import my.twitter._hook; import my.twitter.all as M; print(M.tweets())' [main] my.twitter.all hello [main] my.twitter.common hello [main] my.twitter.gdpr hello EXECUTING IMPORT HOOK! ['GDPR tweet 1', 'GDPR tweet 2']
Ok it worked, and seems pretty neat. However sadly it doesn't work with
tmp_config
(TODO add a proper demo?) Not sure if it's more of an issue withtmp_config
implementation (which is very hacky), orimporthook
itself?
In addition, still the question is where to put the hook itself, but in that case even a global one could be fine.
-
define hook in
my/twitter/__init__.py
Basically, use
extend_path
to make it behave like a namespace package, but in addition, patch originalmy.twitter.talon
?$ cat overlay2/src/my/twitter/__init__.py print(f'[overlay2] {__name__} hello') from pkgutil import extend_path __path__ = extend_path(__path__, __name__) def hack_gdpr_module() -> None: from . import gdpr tweets_orig = gdpr.tweets def tweets_patched(): return [t.replace('gdpr', 'GDPR') for t in tweets_orig()] gdpr.tweets = tweets_patched hack_gdpr_module()
This actually seems to work??
PYTHONPATH=overlay2/src:main/src python3 -c 'import my.twitter.all as M; print(M.tweets())' [overlay2] my.twitter hello [main] my.twitter.gdpr hello [main] my.twitter.all hello [main] my.twitter.common hello ['GDPR tweet 1', 'GDPR tweet 2']
However, this doesn't stack, i.e. if the 'parent' overlay had its own
__init__.py
, it won't get called.
- it could be some global dynamic hook defined in the overlay, and then executed from
- shadow the original module and temporarily modify
__path__
before importing the same module from the parent overlay This approach is implemented inmy.core.experimental.import_original_module
TODO demonstrate it properly, but I think that also works in a 'chain' of overlays Seems like that option is the most promising so far, albeit very hacky.
Note that none of these options work well with mypy (since it's all dynamic hackery), even if you disregard the issues described in the previous sections.