update references to my.reddit across the docs

Sean Breckenridge 2021-10-29 00:12:42 -07:00
parent 6a170b4c10
commit f6c6bed42d
4 changed files with 12 additions and 16 deletions

View file

@@ -35,9 +35,9 @@ You simply 'import' your data and get to work with familiar Python types and dat
 - Here's a short example to give you an idea: "which subreddits I find the most interesting?"
 #+begin_src python
-import my.reddit
+import my.reddit.all
 from collections import Counter
-return Counter(s.subreddit for s in my.reddit.saved()).most_common(4)
+return Counter(s.subreddit for s in my.reddit.all.saved()).most_common(4)
 #+end_src
 | orgmode | 62 |

View file

@@ -74,7 +74,6 @@ import importlib
 modules = [
 ('google' , 'my.google.takeout.paths'),
 ('hypothesis' , 'my.hypothesis' ),
-('reddit' , 'my.reddit' ),
 ('pocket' , 'my.pocket' ),
 ('twint' , 'my.twitter.twint' ),
 ('twitter_archive', 'my.twitter.archive' ),

View file

@@ -76,11 +76,11 @@ A related concern is how to structure namespace packages to allow users to easil
 - In addition, you can *override* the builtin HPI modules too:
-: custom_reddit_overlay
+: custom_lastfm_overlay
 : └── my
-: └──reddit.py
-Now if you add =custom_reddit_overlay= *in front* of ~PYTHONPATH~, all the downstream scripts using =my.reddit= will load it from =custom_reddit_overlay= instead.
+: └──lastfm.py
+Now if you add =custom_lastfm_overlay= [[https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH][*in front* of ~PYTHONPATH~]], all the downstream scripts using =my.lastfm= will load it from =custom_lastfm_overlay= instead.
 This could be useful to monkey patch some behaviours, or dynamically add some extra data sources -- anything that comes to your mind.
 You can check [[https://github.com/karlicoss/hpi-personal-overlay/blob/7fca8b1b6031bf418078da2d8be70fd81d2d8fa0/src/my/calendar/holidays.py#L1-L14][my.calendar.holidays]] in my personal overlay as a reference.
@@ -99,15 +99,15 @@ In order to do that, like stated above, you could edit the ~PYTHONPATH~ variable
 In the context of HPI, it being a namespace package means you can have a local clone of this repository, and your own 'HPI' modules in a separate folder, which then get combined into the ~my~ package.
-As an example, say you were trying to override the ~my.reddit~ file, to include some new feature. You could create a new file hierarchy like:
+As an example, say you were trying to override the ~my.lastfm~ file, to include some new feature. You could create a new file hierarchy like:
 : .
 : ├── my
-: │   ├── reddit.py
+: │   ├── lastfm.py
 : │   └── some_new_module.py
 : └── setup.py
-Where ~reddit.py~ is your version of ~my.reddit~, which you've copied from this repository and applied your changes to. The ~setup.py~ would be something like:
+Where ~lastfm.py~ is your version of ~my.lastfm~, which you've copied from this repository and applied your changes to. The ~setup.py~ would be something like:
 #+begin_src python
 from setuptools import setup, find_namespace_packages
@@ -121,9 +121,9 @@ Where ~reddit.py~ is your version of ~my.reddit~, which you've copied from this
 )
 #+end_src
-Then, running ~pip3 install -e .~ in that directory would install that as part of the namespace package, and assuming (see below for possible issues) this appears on ~sys.path~ before the upstream repository, your ~reddit.py~ file overrides the upstream. Adding more files, like ~my.some_new_module~ into that directory immediately updates the global ~my~ package -- allowing you to quickly add new modules without having to re-install.
-If you install both directories as editable packages (which has the benefit of any changes you making in either repository immediately updating the globally installed ~my~ package), there are some concerns with which editable install appears on your ~sys.path~ first. If you wanted your modules to override the upstream modules, yours would have to appear on the ~sys.path~ first (this is the same reason that =custom_reddit_overlay= must be at the front of your ~PYTHONPATH~). For more details and examples on dealing with editable namespace packages in the context of HPI, see the [[https://github.com/seanbreckenridge/reorder_editable][reorder_editable]] repository.
+Then, running ~python3 -m pip install -e .~ in that directory would install that as part of the namespace package, and assuming (see below for possible issues) this appears on ~sys.path~ before the upstream repository, your ~lastfm.py~ file overrides the upstream. Adding more files, like ~my.some_new_module~ into that directory immediately updates the global ~my~ package -- allowing you to quickly add new modules without having to re-install.
+If you install both directories as editable packages (which has the benefit of any changes you making in either repository immediately updating the globally installed ~my~ package), there are some concerns with which editable install appears on your ~sys.path~ first. If you wanted your modules to override the upstream modules, yours would have to appear on the ~sys.path~ first (this is the same reason that =custom_lastfm_overlay= must be at the front of your ~PYTHONPATH~). For more details and examples on dealing with editable namespace packages in the context of HPI, see the [[https://github.com/seanbreckenridge/reorder_editable][reorder_editable]] repository.
 There is no limit to how many directories you could install into a single namespace package, which could be a possible way for people to install additional HPI modules, without worrying about the module count here becoming too large to manage.
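Reviewer note: the ordering rule described in this hunk (whichever directory comes first on ~sys.path~ supplies ~my.lastfm~) can be checked in isolation. The sketch below is only an illustration and not part of the commit. It builds two throwaway directories, each containing a namespace-package portion ~my/lastfm.py~, and imports the module with the directories in both orders. The names ~upstream_hpi~ and ~ORIGIN~ and the helper functions are made up for the demonstration, and it assumes no regular (non-namespace) ~my~ package is installed in the environment.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def make_package_dir(root: Path, name: str, source: str) -> Path:
    """Create <root>/<name>/my/lastfm.py; no __init__.py, so 'my' is a namespace package."""
    base = root / name
    (base / "my").mkdir(parents=True)
    (base / "my" / "lastfm.py").write_text(source)
    return base


def _forget_my_modules() -> None:
    """Drop any cached 'my' modules so the next import consults sys.path again."""
    for key in [k for k in sys.modules if k == "my" or k.startswith("my.")]:
        del sys.modules[key]


def load_lastfm_origin(search_path: List[str]) -> str:
    """Import my.lastfm with search_path placed in front of sys.path; return its ORIGIN."""
    saved_path = list(sys.path)
    saved_modules = {k: v for k, v in sys.modules.items() if k == "my" or k.startswith("my.")}
    _forget_my_modules()
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        return importlib.import_module("my.lastfm").ORIGIN
    finally:
        # Restore the interpreter state so the demonstration has no side effects.
        sys.path[:] = saved_path
        _forget_my_modules()
        sys.modules.update(saved_modules)


def demo() -> Tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        upstream = make_package_dir(root, "upstream_hpi", "ORIGIN = 'upstream'\n")
        overlay = make_package_dir(root, "custom_lastfm_overlay", "ORIGIN = 'overlay'\n")
        overlay_first = load_lastfm_origin([str(overlay), str(upstream)])
        upstream_first = load_lastfm_origin([str(upstream), str(overlay)])
        return overlay_first, upstream_first


if __name__ == "__main__":
    print(demo())
```

The same precedence applies whether the directories reach ~sys.path~ through ~PYTHONPATH~ or through editable installs, which is why the order of editable installs matters in the paragraph above.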

View file

@@ -355,7 +355,7 @@ The only thing you need to do is to tell it where to find the files on your disk
 Reddit has a proper API, so in theory HPI could talk directly to Reddit and retrieve the latest data. But that's not what it doing!
 - first, there are excellent programmatic APIs for Reddit out there already, for example, [[https://github.com/praw-dev/praw][praw]]
-- more importantly, this is the [[https://beepb00p.xyz/exports.html#design][design decision]] of HP
+- more importantly, this is the [[https://beepb00p.xyz/exports.html#design][design decision]] of HPI
 It doesn't deal with all with the complexities of API interactions.
 Instead, it relies on other tools to put *intermediate, raw data*, on your disk and then transforms this data into something nice.
@@ -368,16 +368,13 @@ As an example, for [[file:../my/reddit.py][Reddit]], HPI is relying on data fetc
 : ⇓⇓⇓
 : |💾 /backups/reddit/*.json |
 : ⇓⇓⇓
-: HPI (my.reddit)
+: HPI (my.reddit.rexport)
 : ⇓⇓⇓
 : < python interface >
 So, in your [[file:MODULES.org::#myreddit][reddit config]], similarly to Takeout, you need =export_path=, so HPI knows how to find your Reddit data on the disk.
 But there is an extra caveat: rexport is already coming with nice [[https://github.com/karlicoss/rexport/blob/master/dal.py][data bindings]] to parse its outputs.
-Another *design decision* of HPI is to use existing code and libraries as much as possible, so we also specify a path to =rexport= repository in the config.
-(note: in the future it's possible that rexport will be installed via PIP, I just haven't had time for it so far).
 Several other HPI modules are following a similar pattern: hypothesis, instapaper, pinboard, kobo, etc.
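Reviewer note: the pattern this hunk documents (an export tool writes raw JSON to disk, and a module turns the files under =export_path= into Python objects) might be sketched roughly as follows. This is an illustration under stated assumptions, not the ~my.reddit.rexport~ implementation, which delegates parsing to rexport's own data bindings. The JSON layout, the ~Save~ class and the ~saved~ function are all hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List


@dataclass(frozen=True)
class Save:
    """Hypothetical typed record; not the class rexport or HPI actually exposes."""
    subreddit: str
    title: str


def saved(export_path: Path) -> Iterator[Save]:
    """Yield each saved item once, reading every *.json export in filename order."""
    seen = set()
    for path in sorted(export_path.glob("*.json")):
        raw = json.loads(path.read_text())
        # Assumed layout: {"saved": [{"id": ..., "subreddit": ..., "title": ...}, ...]}
        for item in raw.get("saved", []):
            if item["id"] in seen:
                continue  # successive exports overlap, so skip items already emitted
            seen.add(item["id"])
            yield Save(subreddit=item["subreddit"], title=item["title"])


def demo() -> List[Save]:
    """Write two overlapping fake exports to a temporary directory and parse them."""
    with tempfile.TemporaryDirectory() as tmp:
        export_path = Path(tmp)
        first = {"saved": [{"id": "a", "subreddit": "orgmode", "title": "first"}]}
        second = {"saved": [
            {"id": "a", "subreddit": "orgmode", "title": "first"},
            {"id": "b", "subreddit": "emacs", "title": "second"},
        ]}
        (export_path / "export-1.json").write_text(json.dumps(first))
        (export_path / "export-2.json").write_text(json.dumps(second))
        return list(saved(export_path))


if __name__ == "__main__":
    for item in demo():
        print(item)
```

The module never talks to the network: it only needs to know where the files are, which is the role =export_path= plays in the config.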