Dima Gerasimov
c64d7f5b67
core: cleanup itertool style helpers
- deprecate group_by_key, should use more_itertools.bucket instead
- move make_dict and ensure_unique to my.core.utils.itertools
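A minimal sketch of what helpers like make_dict and ensure_unique are for, assuming they key items by a function and fail loudly on duplicate keys instead of silently overwriting. The names make_dict_sketch/ensure_unique_sketch and their signatures are hypothetical, not the my.core.utils.itertools API.

```python
from typing import Callable, Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")
K = TypeVar("K")


def ensure_unique_sketch(it: Iterable[T], *, key: Callable[[T], K]) -> Iterator[T]:
    """Yield items unchanged, raising as soon as two items share a key (hypothetical helper)."""
    seen: Dict[K, T] = {}
    for item in it:
        k = key(item)
        if k in seen:
            raise ValueError(f"duplicate key {k!r}: {seen[k]!r} and {item!r}")
        seen[k] = item
        yield item


def make_dict_sketch(it: Iterable[T], *, key: Callable[[T], K]) -> Dict[K, T]:
    """Build a key -> item dict, refusing to overwrite duplicates (hypothetical helper)."""
    return {key(item): item for item in ensure_unique_sketch(it, key=key)}
```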
2024-08-16 10:22:29 +01:00
Dima Gerasimov
973c4205df
core: cleanup deprecations, exclude from type checking and show runtime warnings
among affected things:
- core.common.assert_never
- core.common.cproperty
- core.common.isoparse
- core.common.mcachew
- core.common.the
- core.common.tzdatetime
- core.compat.sqlite_backup
2024-08-16 10:22:29 +01:00
Dima Gerasimov
a7439c7846
general: move assert_never to my.core.compat as it's in stdlib from 3.11
rely on typing-extensions for fallback
introducing typing-extensions dependency without fallback, should be ok since it's in the top 10 of popular packages
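A sketch of the version-gated import described above, plus a typical exhaustiveness check. The Source enum and describe function are made up for illustration, and this is not necessarily how my.core.compat itself is laid out.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from typing import assert_never
else:
    # fallback for older Pythons, via the typing-extensions dependency
    from typing_extensions import assert_never


class Source(Enum):  # hypothetical enum, for illustration only
    TAKEOUT = "takeout"
    ANDROID = "android"


def describe(source: Source) -> str:
    if source is Source.TAKEOUT:
        return "Google Takeout export"
    elif source is Source.ANDROID:
        return "Android database"
    else:
        # a type checker flags this call if some Source member is left unhandled;
        # at runtime assert_never raises AssertionError
        assert_never(source)
```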
2024-08-16 10:22:29 +01:00
Dima Gerasimov
1317914bff
general: add 'destructive parsing' (kinda what we were doing in my.core.konsume) to my.experimental
also some cleanup for my.codeforces and my.topcoder
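A minimal sketch of the "destructive parsing" idea. It assumes this means popping each field from the raw JSON dict as it is parsed and failing if anything is left over, so that new upstream fields don't go unnoticed. consume_all and parse_contest are hypothetical names and the field names are illustrative only. This is not the my.experimental API.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Tuple


@contextmanager
def consume_all(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Hand out a copy of `data`; parsing code pops the keys it handles, leftovers are an error."""
    remaining = dict(data)  # copy, so the caller's dict isn't mutated
    yield remaining
    # only reached if the with-block finished without raising
    if remaining:
        raise RuntimeError(f"unhandled keys: {sorted(remaining)}")


def parse_contest(raw: Dict[str, Any]) -> Tuple[int, str]:
    with consume_all(raw) as d:
        contest_id = d.pop("id")
        name = d.pop("name")
        d.pop("phase", None)  # explicitly ignored field
    return contest_id, name
```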
2024-08-12 13:24:28 +01:00
Dima Gerasimov
1e1e8d8494
my.topcoder: get rid of kjson in favor of using builtin dict methods
2024-08-12 13:24:28 +01:00
Dima Gerasimov
069264ce52
core.common: get rid of deprecated utcfromtimestamp
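For reference, this is the usual stdlib replacement for the call, which is deprecated since Python 3.12. It shows the general pattern, not the exact diff in core.common.

```python
from datetime import datetime, timezone

ts = 1_700_000_000

# deprecated, returns a *naive* datetime that happens to be in UTC:
#   datetime.utcfromtimestamp(ts)

# timezone-aware replacement
aware = datetime.fromtimestamp(ts, tz=timezone.utc)

# if a naive UTC datetime is still needed, drop the tzinfo explicitly
naive = aware.replace(tzinfo=None)
```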
2024-08-10 17:46:30 +01:00
Dima Gerasimov
34593c032d
tests: move more tests into core, more consistent tests running in tox
2024-08-07 01:08:39 +01:00
Dima Gerasimov
074e24c309
general: deprecate my.core.dataset and simplify tox file
2024-08-07 01:08:39 +01:00
Dima Gerasimov
fb8e9909a4
tests: simplify tests for my.core.serialize a bit and simplify tox file
2024-08-07 01:08:39 +01:00
Dima Gerasimov
3aebc573e8
tests: use updated conftest from pymplate, this allows running individual test modules properly
e.g. pytest --pyargs my.core.tests.test_get_files
2024-08-06 20:55:16 +01:00
Dima Gerasimov
b615ba10b1
ci: temporarily suppress pandas mypy error in check_dateish
2024-08-05 23:35:24 +01:00
Dima Gerasimov
0e6dd32afe
ci: minor fixes after mypy update
2024-08-03 16:18:32 +01:00
karlicoss
51209c547e
my.twitter.android: refactor into a proper module
for now only extracting bookmarks, will use it for some time and see how it goes
2023-12-24 00:49:07 +00:00
Dima Gerasimov
a843407e40
core/compat: move fromisoformat to .core.compat module
2023-11-19 23:45:08 +00:00
karlicoss
657ce08ac8
fix mypy issues after mypy/libraries updates
2023-11-10 22:59:09 +00:00
karlicoss
71cb66df5f
core: add helper for more_itertools to check that all types involved are hashable
Otherwise unique_everseen performance may degrade to quadratic rather than linear
For now hidden behind HPI_CHECK_UNIQUE_EVERSEEN flag
also switch some modules to use it
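A stdlib-only sketch of the idea. It assumes the check simply turns the silent slow path into an error. Hashable items are tracked in a set (O(1) lookups), while unhashable ones can only be compared against a list (O(n) each, so O(n^2) overall). The function name, the check parameter, and treating any non-empty value of HPI_CHECK_UNIQUE_EVERSEEN as "on" are assumptions, not the actual my.core helper.

```python
import os
from typing import Iterable, Iterator, List, Optional, Set, TypeVar

T = TypeVar("T")


def unique_everseen_checked(iterable: Iterable[T], *, check: Optional[bool] = None) -> Iterator[T]:
    if check is None:
        # assumption: any non-empty value of the env variable enables the check
        check = bool(os.environ.get("HPI_CHECK_UNIQUE_EVERSEEN"))
    seen_set: Set[T] = set()
    seen_list: List[T] = []  # slow path, only used for unhashable items
    for item in iterable:
        try:
            if item in seen_set:
                continue
            seen_set.add(item)
        except TypeError:  # hashing an unhashable item (e.g. a list) raises TypeError
            if check:
                raise TypeError(f"unhashable item {item!r}: deduplication would degrade to quadratic time") from None
            if item in seen_list:
                continue
            seen_list.append(item)
        yield item
```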
2023-10-31 01:02:17 +00:00
Dima Gerasimov
6821fbc2fe
core/config: implement a warning if config is imported from a directory other than MY_CONFIG
this should help with identifying setup issues
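A minimal sketch of such a check. It assumes the config file path (e.g. the imported module's `__file__`) and the MY_CONFIG directory are already known. warn_if_config_outside is a hypothetical name, not the core/config implementation.

```python
import warnings
from pathlib import Path


def warn_if_config_outside(config_file: Path, my_config_dir: Path) -> bool:
    """Return True (and emit a warning) if the imported config lives outside MY_CONFIG."""
    resolved = config_file.resolve()
    expected = my_config_dir.resolve()
    # Path.parents check instead of is_relative_to, which needs Python 3.9+
    if resolved == expected or expected in resolved.parents:
        return False
    warnings.warn(f"config imported from {resolved}, which is not under MY_CONFIG={expected}")
    return True
```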
2023-10-28 20:56:07 +01:00
Dima Gerasimov
d88a1b9933
my.hypothesis: expose data as iterators instead of lists
also add an adapter to support migrating in a backwards compatible manner
2023-10-28 20:06:54 +01:00
Dima Gerasimov
4f7c9b4a71
core: split compat/legacy modules into hpi_compat and compat
2023-10-28 20:06:54 +01:00
karlicoss
70bf51a125
core/stats: exclude contextmanagers from guess_stats
2023-10-28 00:08:32 +01:00
Dima Gerasimov
32aa87b3ec
doctor: make compileall check a bit more defensive
2023-10-27 02:38:22 +01:00
karlicoss
a0910e798d
core.logging: ignore CollapseLogsHandler if we're not attached to a terminal
otherwise fails at os.get_terminal_size
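A sketch of the guard, assuming the handler writes to a stream such as sys.stderr. os.get_terminal_size raises OSError when the file descriptor isn't a terminal, so the check runs before it. can_collapse is a hypothetical name.

```python
import io
import os
from typing import TextIO


def can_collapse(stream: TextIO) -> bool:
    """Only collapse log lines when writing to a real terminal (hypothetical helper)."""
    if not stream.isatty():
        return False  # e.g. output redirected to a file/pipe, or a StringIO in tests
    try:
        os.get_terminal_size(stream.fileno())
    except OSError:
        return False
    return True


print(can_collapse(io.StringIO()))  # False
```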
2023-10-25 02:42:52 +01:00
Dima Gerasimov
1f61e853c9
reddit.rexport: experiment with using optional cpu pool (used by all of HPI)
Enabled by an env variable specifying how many cores to dedicate, e.g.
HPI_CPU_POOL=4 hpi query ...
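A minimal sketch of how such an opt-in pool could be wired up. It assumes the variable holds a positive worker count and that an unset variable means "stay single-process". cpu_pool_size and get_cpu_pool are hypothetical names, not HPI's actual helpers.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Mapping, Optional


def cpu_pool_size(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Worker count requested via HPI_CPU_POOL, or None if unset (parallelism disabled)."""
    value = environ.get("HPI_CPU_POOL")
    if not value:
        return None
    size = int(value)
    if size <= 0:
        raise ValueError(f"HPI_CPU_POOL must be positive, got {value!r}")
    return size


def get_cpu_pool(environ: Mapping[str, str] = os.environ) -> Optional[ProcessPoolExecutor]:
    size = cpu_pool_size(environ)
    return None if size is None else ProcessPoolExecutor(max_workers=size)
```

Callers would then use pool.map(...) over input files when a pool is available, and fall back to a plain loop otherwise.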
2023-10-25 02:06:45 +01:00
karlicoss
86ea605aec
core/stats: enable processing input files, report first and last filename
can be useful for quick investigation/testing setup
2023-10-22 00:47:36 +01:00
karlicoss
c335c0c9d8
core/stats: report datetime of first item in addition to last
quite useful for quickly determining time span of a data source
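A sketch of the idea, assuming items arrive in chronological order (common for HPI data providers), so the first and last items give the time span without sorting. stat_datetimes is a hypothetical, simplified stand-in that takes bare datetimes rather than the real core/stats logic.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional


def stat_datetimes(items: Iterable[datetime]) -> Dict[str, Any]:
    """Count items and record the first and last datetime seen (hypothetical helper)."""
    count = 0
    first: Optional[datetime] = None
    last: Optional[datetime] = None
    for dt in items:
        if first is None:
            first = dt
        last = dt
        count += 1
    res: Dict[str, Any] = {"count": count}
    if first is not None:
        res["first"] = first
        res["last"] = last
    return res
```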
2023-10-22 00:47:36 +01:00
karlicoss
a60d69fb30
core/stats: get rid of duplicated keys for 'auto stats'
previously:
```
{'iter_data': {'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}}
```
after:
```
{'iter_data': {'count': 9, 'last': datetime.datetime(2020, 1, 3, 1, 1, 1)}}
```
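The fix boils down to not wrapping a result in its own name twice. Here is a minimal sketch, with wrap_stats as a hypothetical name rather than the actual function in core/stats:

```python
from typing import Any, Dict


def wrap_stats(name: str, stats: Any) -> Dict[str, Any]:
    """Key stats by `name`, without nesting again if they're already keyed by it (hypothetical helper)."""
    if isinstance(stats, dict) and list(stats) == [name]:
        return stats
    return {name: stats}
```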
2023-10-22 00:47:36 +01:00
karlicoss
c5fe2e9412
core.stats: fix is_data_provider when from __future__ import annotations is used
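A sketch of why this breaks and one way to handle it, assuming the check inspects return annotations. Under `from __future__ import annotations` the annotation is stored as a string, so it has to be resolved with typing.get_type_hints before comparing. returns_iterator and iter_data are hypothetical names, not the actual is_data_provider code.

```python
import collections.abc
import typing
from typing import Callable, Iterator


def returns_iterator(func: Callable[..., object]) -> bool:
    """Check the return annotation, resolving string (postponed) annotations first."""
    try:
        hints = typing.get_type_hints(func)  # evaluates string annotations
    except NameError:
        return False  # annotation refers to a name that can't be resolved
    ret = hints.get("return")
    return typing.get_origin(ret) in (collections.abc.Iterator, collections.abc.Iterable)


# with `from __future__ import annotations`, every annotation is stored as a string,
# equivalent to writing it in quotes as below
def iter_data() -> "Iterator[int]":
    yield 1
```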
2023-10-21 23:46:40 +01:00
karlicoss
37bb33cdbc
experimental: add a hacky helper to import "original/shadowed" modules from within overlays
2023-10-21 22:46:16 +01:00
karlicoss
8c2d1c9463
general: use less explicit kompress boilerplate in modules
now get_files/kompress library can handle it transparently
2023-10-20 21:13:59 +01:00
karlicoss
c63e80ce94
core: more consistent handling of zip archives in get_files + tests
2023-10-20 21:13:59 +01:00
Dima Gerasimov
29832a9f75
core: fix test_get_files after updating kompress
2023-10-19 02:26:28 +01:00
karlicoss
fe26efaea8
core/kompress: move vendorized to _deprecated, use kompress library directly
2023-10-12 23:47:05 +01:00
karlicoss
bb478f369d
core/logging: no need for super call in Filter
2023-10-12 23:47:05 +01:00
karlicoss
68289c1be3
general: fix ignores after mypy version update
2023-10-12 23:47:05 +01:00
Dima Gerasimov
0512488241
ci: sync configs to pymplate
- add python3.12
- add ruff
2023-10-06 02:24:01 +01:00
Dima Gerasimov
01480ec8eb
core/logging: fix issue with logger setup called multiple times when called with different levels
should resolve https://github.com/karlicoss/HPI/issues/308
2023-09-19 22:39:52 +01:00
Sean Breckenridge
2a46341ce2
my.core.logging: compatibility with HPI_LOGS
re-add a removed check for HPI_LOGS, add some docs
fix the checks for browserexport/takeout logs to
use the computed level from my.core.logging
2023-09-07 02:36:26 +01:00
Dima Gerasimov
c283e542e3
general: fix some issues after mypy update
2023-08-24 23:46:23 +01:00
Sean Breckenridge
fcaa7c1561
core/cli: allow user to bypass PEP 668
when installing dependencies with 'hpi module install',
this now lets a user pass '--break-system-packages' (or '-B'),
which passes the same option down to pip, to allow the user
to bypass PEP 668 and install packages that could possibly
conflict with system packages.
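Roughly, the option just gets appended to the pip invocation. This sketch uses a hypothetical pip_install_cmd helper, and the real `hpi module install` code may build the command differently. `--break-system-packages` itself is pip's own option for overriding the PEP 668 check.

```python
import sys
from typing import List, Sequence


def pip_install_cmd(requirements: Sequence[str], *, break_system_packages: bool = False) -> List[str]:
    cmd = [sys.executable, "-m", "pip", "install", *requirements]
    if break_system_packages:
        # forwarded as-is; pip itself implements the PEP 668 override
        cmd.append("--break-system-packages")
    return cmd


# the resulting list would then be run, e.g. with subprocess.check_call(cmd)
```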
2023-08-10 01:41:43 +01:00
Dima Gerasimov
c25ab51664
core: some tweaks for better colour handling when we're redirecting stdout/stderr
2023-06-21 20:42:10 +01:00
Dima Gerasimov
661714f1d9
core/logging: overhaul and many improvements -- mainly to deprecate abandoned logzero
- generally saner/cleaner logger initialization
In particular now it doesn't override logging level specified by the user code prior to instantiating the logger.
Also remove the `LazyLogger` hack, doesn't seem like it's necessary when the above is implemented.
- get rid of `logzero` which is archived and abandoned now, use `colorlog` for coloured logging formatter
- allow configuring log level from the shell via `LOGGING_LEVEL_module_name=<level>`
E.g. `LOGGING_LEVEL_rescuexport_dal=WARNING LOGGING_LEVEL_my_rescuetime=debug ./script.py`
- port `AddExceptionTraceback` from HPI/promnesia
- port `CollapseLogsHandler` from HPI/promnesia
Also allow configuring from the shell, e.g. `LOGGING_COLLAPSE=<level>`
- add support for `enlighten` progress bar, so it can be shared between different projects
See https://github.com/Rockhopper-Technologies/enlighten#readme
This allows nice CLI progressbars, e.g. for parallel processing of different files from HPI:
ghexport.dal[111] 29%|████████████████████████████████████████████████████████████████▏ | 29/100 [00:03<00:07, 10.03 files/s]
rexport.dal[comments] 17%|████████████████████████████████████▋ | 115/682 [00:03<00:14, 39.15 files/s]
my.instagram.android 0%|▎ | 3/2631 [00:02<34:50, 1.26 files/s]
Currently off by default, and hidden behind an env variable (`ENLIGHTEN_ENABLE=true`)
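A sketch of the level handling described above: not overriding a level already set by user code, and `LOGGING_LEVEL_module_name`. It assumes logger names map to variable names by replacing dots with underscores, as the `rescuexport_dal` example suggests. It also assumes that an explicit environment variable takes precedence over a level set in code. level_from_env and setup_logger are hypothetical names, not the my.core.logging API.

```python
import logging
import os
from typing import Mapping, Optional


def level_from_env(logger_name: str, environ: Mapping[str, str]) -> Optional[int]:
    """Look up LOGGING_LEVEL_<name>, with dots in the logger name replaced by underscores."""
    value = environ.get("LOGGING_LEVEL_" + logger_name.replace(".", "_"))
    if value is None:
        return None
    level = logging.getLevelName(value.upper())  # e.g. 'debug' -> 10
    if not isinstance(level, int):
        raise ValueError(f"unknown logging level {value!r}")
    return level


def setup_logger(name: str, default_level: int = logging.INFO, environ: Optional[Mapping[str, str]] = None) -> logging.Logger:
    environ = os.environ if environ is None else environ
    logger = logging.getLogger(name)
    env_level = level_from_env(name, environ)
    if env_level is not None:
        logger.setLevel(env_level)  # assumption: explicit shell override wins
    elif logger.level == logging.NOTSET:
        logger.setLevel(default_level)  # only set a default if user code hasn't set one already
    return logger
```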
2023-06-21 18:42:15 +01:00
Dima Gerasimov
6aa3d4225e
sort out mypy after its update
2023-06-21 03:32:46 +01:00
Dima Gerasimov
ab7135d42f
core: experimental import of my._init_hook to configure logging/warnings/env variables
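A sketch of an optional hook import. It assumes the hook is simply an importable module whose side effects (configuring logging, warnings filters, env variables) run on import. run_init_hook is a hypothetical name, and the real my.core code may differ. Telling "hook missing" apart from "hook broken" avoids silently hiding errors in the user's own hook.

```python
import importlib


def run_init_hook(name: str = "my._init_hook") -> bool:
    """Import the hook module if it exists; return whether it was found."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError as e:
        # only swallow the error if the hook itself (or a parent package) is missing,
        # not if the hook exists but fails to import something else
        if e.name is not None and (name == e.name or name.startswith(e.name + ".")):
            return False
        raise
    return True
```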
2023-06-21 03:32:46 +01:00
Dima Gerasimov
c91534b966
set json files to empty dicts so they are at least valid JSON
(promnesia was stumbling over these, seems like the easiest fix :) )
2023-06-09 03:31:13 +01:00
Dima Gerasimov
5fe21240b4
core: move mcachew into my.core.cachew; use better typing annotations (copied from cachew)
2023-06-08 01:29:49 +01:00
Dima Gerasimov
f8cd31044e
general: move reddit tests into my/tests + tweak my.core.cfg to be more reliable
2023-05-26 00:58:23 +01:00
Dima Gerasimov
9594caa1cd
general: move most core tests inside my.core.tests package
- distributes tests alongside the package, might be convenient for package users
- removes some weird indirection (e.g. dummy test files importing tests from modules)
- makes the command line for tests cleaner (e.g. no need to remember to manually add files to tox.ini)
- tests automatically covered by mypy (so makes mypy runs cleaner and ultimately better coverage)
The (vague) convention is:
- tests/somemodule.py -- testing my.core.somemodule, contains tests directly re
- tests/test_something.py -- testing a specific feature, e.g. test_get_files.py tests get_files method only
2023-05-25 00:25:13 +01:00
Dima Gerasimov
04d976f937
my/core/pandas tests: fix weird pytest error when constructing dataclass inside a def
can quickly reproduce by running pytest tests/tz.py tests/core/test_pandas.py
possibly will be resolved after a fix in pytest?
see https://github.com/pytest-dev/pytest/issues/7856
2023-05-24 22:32:44 +01:00
Dima Gerasimov
a98bc6daca
my.core.pandas: rely on typing annotations from types-pandas
2023-05-24 22:32:44 +01:00
Dima Gerasimov
fe88380499
general: switch to using native 3.8 versions for cached_property/Literal/Protocol instead of compat
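For reference, these are the stdlib spellings that replace the compat shims from Python 3.8 on. The classes and names here are purely illustrative.

```python
from functools import cached_property
from typing import Literal, Protocol

Mode = Literal["fast", "slow"]  # only these two strings type-check


class HasName(Protocol):
    def name(self) -> str: ...


class Export:
    def __init__(self, path: str) -> None:
        self.path = path
        self.parse_count = 0

    @cached_property
    def parsed(self) -> dict:
        # computed on first access, then stored on the instance
        self.parse_count += 1
        return {"path": self.path}

    def name(self) -> str:
        return self.path


def describe(x: HasName, mode: Mode = "fast") -> str:
    # Export satisfies HasName structurally, without inheriting from it
    return f"{x.name()} ({mode})"
```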
2023-05-16 01:18:30 +01:00