added --warn-exceptions (like --raise-exceptions/--drop-exceptions, but lets you pass a warn_func if you want to customize how the exceptions are handled. By default this creates a logger in main and logs the exception added dateparser as a fallback if its installed (it's not a strong dependency, but I mentioned in the docs that it's useful for parsing dates/times) added docs for query, and a few examples --output gpx respects the --{drop,warn,raise}--exceptions flags, have an example of that in the docs as well
301 lines
15 KiB
Markdown
301 lines
15 KiB
Markdown
`hpi query` is a command line tool for querying the output of any `hpi` function.
|
|
|
|
```
|
|
Usage: hpi query [OPTIONS] FUNCTION_NAME...
|
|
|
|
This allows you to query the results from one or more functions in HPI
|
|
|
|
By default this runs with '-o json', converting the results to JSON and
|
|
printing them to STDOUT
|
|
|
|
You can specify '-o pprint' to just print the objects using their repr, or
|
|
'-o repl' to drop into a ipython shell with access to the results
|
|
|
|
While filtering using --order-key datetime, the --after, --before and
|
|
--within flags parse the input to their datetime and timedelta equivalents.
|
|
datetimes can be epoch time, the string 'now', or an date formatted in the
|
|
ISO format. timedelta (durations) are parsed from a similar format to the
|
|
GNU 'sleep' command, e.g. 1w2d8h5m20s -> 1 week, 2 days, 8 hours, 5 minutes,
|
|
20 seconds
|
|
|
|
As an example, to query reddit comments I've made in the last month
|
|
|
|
hpi query --order-type datetime --before now --within 4w my.reddit.all.comments
|
|
or...
|
|
hpi query --recent 4w my.reddit.all.comments
|
|
|
|
Can also query within a range. To filter comments between 2016 and 2018:
|
|
hpi query --order-type datetime --after '2016-01-01' --before '2019-01-01' my.reddit.all.comments
|
|
|
|
Options:
|
|
-o, --output [json|pprint|repl|gpx]
|
|
what to do with the result [default: json]
|
|
-s, --stream stream objects from the data source instead
|
|
of printing a list at the end
|
|
-k, --order-key TEXT order by an object attribute or dict key on
|
|
the individual objects returned by the HPI
|
|
function
|
|
-t, --order-type [datetime|date|int|float]
|
|
order by searching for some type on the
|
|
iterable
|
|
-a, --after TEXT while ordering, filter items for the key or
|
|
type larger than or equal to this
|
|
-b, --before TEXT while ordering, filter items for the key or
|
|
type smaller than this
|
|
-w, --within TEXT a range 'after' or 'before' to filter items
|
|
by. see above for further explanation
|
|
-r, --recent TEXT a shorthand for '--order-type datetime
|
|
--reverse --before now --within'. e.g.
|
|
--recent 5d
|
|
--reverse / --no-reverse reverse the results returned from the
|
|
functions
|
|
-l, --limit INTEGER limit the number of items returned from the
|
|
(functions)
|
|
--drop-unsorted if the order of an item can't be determined
|
|
while ordering, drop those items from the
|
|
results
|
|
--wrap-unsorted if the order of an item can't be determined
|
|
while ordering, wrap them into an
|
|
'Unsortable' object
|
|
--warn-exceptions if any errors are returned, print them as
|
|
errors on STDERR
|
|
--raise-exceptions if any errors are returned (as objects, not
|
|
raised) from the functions, raise them
|
|
--drop-exceptions ignore any errors returned as objects from
|
|
the functions
|
|
```
|
|
|
|
This works with any function which returns an iterable, for example `my.coding.commits`, which searches for `git commit`s on your computer:
|
|
|
|
```bash
|
|
hpi query my.coding.commits
|
|
```
|
|
|
|
When run with a module, this does some analysis of the functions in that module and tries to find ones that look like data sources. If it can't figure out which, it prompts you like:
|
|
|
|
```
|
|
Which function should be used from 'my.coding.commits'?
|
|
|
|
1. commits
|
|
2. repos
|
|
```
|
|
|
|
You select the one you want by clicking `1` or `2` on your keyboard. Otherwise, you can provide a fully qualified path, like:
|
|
|
|
```
|
|
hpi query my.coding.commits.repos
|
|
```
|
|
|
|
The corresponding `repos` function this queries is defined in [`my/coding/commits.py`](../my/coding/commits.py)
|
|
|
|
### Ordering/Filtering/Streaming
|
|
|
|
By default, this just returns the items in the order they were returned by the function. This allows you to filter by specifying a `--order-key`, or `--order-type`. For example, to get the 10 most recent commits. `--order-type datetime` will try to automatically figure out which attribute to use. If it chooses the wrong one (since `Commit`s have both a `committed_dt` and `authored_dt`), you could tell it which to use. For example, to scan my computer and find the most recent commit I made:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --order-key committed_dt --limit 1 --reverse --output pprint --stream
|
|
Commit(committed_dt=datetime.datetime(2023, 4, 14, 23, 9, 1, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))),
|
|
authored_dt=datetime.datetime(2023, 4, 14, 23, 4, 1, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))),
|
|
message='sources.smscalls: propogate errors if there are breaking '
|
|
'schema changes',
|
|
repo='/home/sean/Repos/promnesia-fork',
|
|
sha='22a434fca9a28df9b0915ccf16368df129d2c9ce',
|
|
ref='refs/heads/smscalls-handle-result')
|
|
```
|
|
|
|
To instead limit in some range, you can use `--before` and `--within` to filter by a range. For example, to get all the commits I committed in the last day:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --order-type datetime --before now --within 1d
|
|
```
|
|
|
|
That prints a a list of `Commit` as JSON objects. You could also use `--output pprint` to pretty-print the objects or `--output repl` drop into a REPL.
|
|
|
|
To process the JSON, you can pipe it to [`jq`](https://github.com/stedolan/jq). I often use `jq length` to get the count of some output:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --order-type datetime --before now --within 1d | jq length
|
|
6
|
|
```
|
|
|
|
Because that is such a common use case, the `--recent` flag is a shorthand for `--order-type datetime --reverse --before now --within`. The same as above:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --recent 1d | jq length
|
|
6
|
|
```
|
|
|
|
To select a range of commits, you can use `--after` and `--before`, passing ISO or epoch timestamps. Those can be full `datetimes` (`2021-01-01T00:05:30`) or just dates (`2021-01-01`). For example, to get all the commits I made on January 1st, 2021:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --order-type datetime --after 2021-01-01 --before 2021-01-02 | jq length
|
|
1
|
|
```
|
|
|
|
If you have [`dateparser`](https://github.com/scrapinghub/dateparser#how-to-use) installed, this supports dozens more natural language formats:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --order-type datetime --after 'last week' --before 'day before yesterday' | jq length
|
|
28
|
|
```
|
|
|
|
If you're having issues ordering because there are exceptions in your results not all data is sortable (may have `None` for some attributes), you can use `--drop-unsorted` to drop those items from the results, or `--drop-exceptions` to remove the exceptions
|
|
|
|
You can also stream the results, which is useful for functions that take a while to process or have a lot of data. For example, if you wanted to pick a sha hash from a particular repo, you could combine `jq` to `select` and pick that attribute from the JSON:
|
|
|
|
```
|
|
hpi query my.coding.commits.commits --recent 30d --stream | jq 'select(.repo | contains("HPI"))' | jq '.sha' -r
|
|
4afa899c8b365b3c10e468f6279c02e316d3b650
|
|
40de162fab741df594b4d9651348ee46ee021e9b
|
|
e1cb229913482074dc5523e57ef0acf6e9ec2bb2
|
|
87c13defd131e39292b93dcea661d3191222dace
|
|
02c738594f2cae36ca4fab43cf9533fe6aa89396
|
|
0b3a2a6ef3a9e4992771aaea0252fb28217b814a
|
|
84817ce72d208038b66f634d4ceb6e3a4c7ec5e9
|
|
47992b8e046d27fc5141839179f06f925c159510
|
|
425615614bd508e28ccceb56f43c692240e429ab
|
|
eed8f949460d768fb1f1c4801e9abab58a5f9021
|
|
d26ad7d9ce6a4718f96346b994c3c1cd0d74380c
|
|
aec517e53c6ac022f2b4cc91261daab5651cebf0
|
|
44b75a88fdfc7af132f61905232877031ce32fcb
|
|
b0ff6f29dd2846e97f8aa85a2ca73736b03254a8
|
|
```
|
|
|
|
`select` acts on a stream of JSON objects, not a list, so it filters as the objects are generated. The alternative would be to print the entire JSON list at the end, like:
|
|
|
|
`hpi query my.coding.commits.commits --recent 30d | jq '.[] | select(.repo | contains("Repos/HPI"))' | jq '.sha' -r`, using `jq '.[]'` to convert the JSON list into a stream of JSON objects.
|
|
|
|
## Usage on non-HPI code
|
|
|
|
The command can accept any qualified function name, so this could for example be used to check the output of [`promnesia`](https://github.com/karlicoss/promnesia) commands:
|
|
|
|
```
|
|
hpi query promnesia.sources.smscalls | jq length
|
|
371
|
|
```
|
|
|
|
## GPX
|
|
|
|
The `hpi query` command can also be used with the `--output gpx` flag to generate GPX files from a list of locations, like the ones defined in the `my.location` package. This could be used to extract some date range and create a `gpx` file which can then be visualized by a GUI application.
|
|
|
|
This prints the contents for the `GPX` file to STDOUT, and prints warnings for any objects it could not convert to locations to STDERR, so pipe STDOUT to a output file, like `>out.gpx`
|
|
|
|
```
|
|
hpi query my.location.all --after '2021-07-01T00:00:00' --before '2021-07-05T00:00:00' --order-type datetime --output gpx >out.gpx
|
|
```
|
|
|
|
If you want to ignore any errors, you can use `--drop-exceptions`.
|
|
|
|
To preview, you can use something like [`qgis`](https://qgis.org/en/site/) or for something easier more lightweight, [`gpxsee`](https://github.com/tumic0/GPXSee):
|
|
|
|
`gpxsee out.gpx`:
|
|
|
|
TODO: add image here
|
|
|
|
(Sidenote: this is [`@seanbreckenridge`](https://github.com/seanbreckenridge/)s locations, on a trip to Chicago)
|
|
|
|
## Python reference
|
|
|
|
The `hpi query` command is a CLI wrapper around the code in [`query.py`](../my/core/query.py) and [`query_range.py`](../my/core/query_range.py). The `select` function is the core of this, and `select_range` lets you specify dates, timedelta, start-end ranges, and other CLI-specific code.
|
|
|
|
`select`:
|
|
|
|
```
|
|
A function to query, order, sort and filter items from one or more sources
|
|
This supports iterables and lists of mixed types (including handling errors),
|
|
by allowing you to provide custom predicates (functions) which can sort
|
|
by a function, an attribute, dict key, or by the attributes values.
|
|
|
|
Since this supports mixed types, there's always a possibility
|
|
of KeyErrors or AttributeErrors while trying to find some value to order by,
|
|
so this provides multiple mechanisms to deal with that
|
|
|
|
'where' lets you filter items before ordering, to remove possible errors
|
|
or filter the iterator by some condition
|
|
|
|
There are multiple ways to instruct select on how to order items. The most
|
|
flexible is to provide an 'order_by' function, which takes an item in the
|
|
iterator, does any custom checks you may want and then returns the value to sort by
|
|
|
|
'order_key' is best used on items which have a similar structure, or have
|
|
the same attribute name for every item in the iterator. If you have a
|
|
iterator of objects whose datetime is accessed by the 'timestamp' attribute,
|
|
supplying order_key='timestamp' would sort by that (dictionary or attribute) key
|
|
|
|
'order_value' is the most confusing, but often the most useful. Instead of
|
|
testing against the keys of an item, this allows you to write a predicate
|
|
(function) to test against its values (dictionary, NamedTuple, dataclass, object).
|
|
If you had an iterator of mixed types and wanted to sort by the datetime,
|
|
but the attribute to access the datetime is different on each type, you can
|
|
provide `order_value=lambda v: isinstance(v, datetime)`, and this will
|
|
try to find that value for each type in the iterator, to sort it by
|
|
the value which is received when the predicate is true
|
|
|
|
'order_value' is often used in the 'hpi query' interface, because of its brevity.
|
|
Just given the input function, this can typically sort it by timestamp with
|
|
no human intervention. It can sort of be thought as an educated guess,
|
|
but it can always be improved by providing a more complete guess function
|
|
|
|
Note that 'order_value' is also the most computationally expensive, as it has
|
|
to copy the iterator in memory (using itertools.tee) to determine how to order it
|
|
in memory
|
|
|
|
The 'drop_exceptions', 'raise_exceptions', 'warn_exceptions' let you ignore or raise
|
|
when the src contains exceptions. The 'warn_func' lets you provide a custom function
|
|
to call when an exception is encountered instead of using the 'warnings' module
|
|
|
|
src: an iterable of mixed types, or a function to be called,
|
|
as the input to this function
|
|
|
|
where: a predicate which filters the results before sorting
|
|
|
|
order_by: a function which when given an item in the src,
|
|
returns the value to sort by. Similar to the 'key' value
|
|
typically passed directly to 'sorted'
|
|
|
|
order_key: a string which represents a dict key or attribute name
|
|
to use as they key to sort by
|
|
|
|
order_value: predicate which determines which attribute on an ADT-like item to sort by,
|
|
when given its value. lambda o: isinstance(o, datetime) is commonly passed to sort
|
|
by datetime, without knowing the attributes or interface for the items in the src
|
|
|
|
default: while ordering, if the order for an object cannot be determined,
|
|
use this as the default value
|
|
|
|
reverse: reverse the order of the resulting iterable
|
|
|
|
limit: limit the results to this many items
|
|
|
|
drop_unsorted: before ordering, drop any items from the iterable for which a
|
|
order could not be determined. False by default
|
|
|
|
wrap_unsorted: before ordering, wrap any items into an 'Unsortable' object. Place
|
|
them at the front of the list. True by default
|
|
|
|
drop_exceptions: ignore any exceptions from the src
|
|
|
|
raise_exceptions: raise exceptions when received from the input src
|
|
```
|
|
|
|
`select_range`:
|
|
|
|
```
|
|
A specialized select function which offers generating functions
|
|
to filter/query ranges from an iterable
|
|
|
|
order_key and order_value are used in the same way they are in select
|
|
|
|
If you specify order_by_value_type, it tries to search for an attribute
|
|
on each object/type which has that type, ordering the iterable by that value
|
|
|
|
unparsed_range is a tuple of length 3, specifying 'after', 'before', 'duration',
|
|
i.e. some start point to allow the computed value we're ordering by, some
|
|
end point and a duration (can use the RangeTuple NamedTuple to construct one)
|
|
|
|
(this is typically parsed/created in my.core.__main__, from CLI flags
|
|
|
|
If you specify a range, drop_unsorted is forced to be True
|
|
```
|
|
|
|
Those can be imported and accept any sort of iterator, `hpi query` just defaults to the output of functions here. As an example, see [`listens`](https://github.com/seanbreckenridge/HPI-personal/blob/master/scripts/listens) which just passes an generator (iterator) as the first argument to `query_range`
|