my.github: some work in progress on generating consistent ids

sadly it seems that there are at least several issues:

- gdpr has less detailed data so it's hard to generate a proper ID at times
- sometimes there is a small (1s?) discrepancy in created_at between the same event in GDPR and API
- some API events can have duplicate payload, but different id, which violates uniqueness
Dima Gerasimov 2021-04-02 19:42:12 +01:00 committed by karlicoss
parent 386234970b
commit 5ef2775265
3 changed files with 38 additions and 10 deletions


@@ -1,6 +1,9 @@
"""
Github events and their metadata: comments/issues/pull requests
"""
from ..core import __NOT_HPI_MODULE__
from datetime import datetime
from typing import Optional, NamedTuple, Iterable, Set, Tuple
@@ -48,4 +51,12 @@ def parse_dt(s: str) -> datetime:
    return pytz.utc.localize(datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ'))
from ..core import __NOT_HPI_MODULE__
# experimental way of supporting event ids... not sure
class EventIds:
    @staticmethod
    def repo_created(*, dts: str, name: str, ref_type: str, ref: Optional[str]) -> str:
        return f'{dts}_repocreated_{name}_{ref_type}_{ref}'
    @staticmethod
    def pr(*, dts: str, action: str, url: str) -> str:
        return f'{dts}_pr{action}_{url}'
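
A minimal sketch of how the `EventIds` helpers might be used to match the same event across GDPR and API sources. The class is copied from the diff so the snippet runs on its own. The timestamps, repository name and URL are hypothetical sample values, and the assumption that `dts` is the raw `created_at` string is mine, not something this commit confirms. It also illustrates the one-second discrepancy mentioned in the commit message: the timestamp is part of the id, so a one-second shift produces a different id.

```python
from typing import Optional


class EventIds:
    # Copied from the diff above so that this sketch is self-contained.
    @staticmethod
    def repo_created(*, dts: str, name: str, ref_type: str, ref: Optional[str]) -> str:
        return f'{dts}_repocreated_{name}_{ref_type}_{ref}'

    @staticmethod
    def pr(*, dts: str, action: str, url: str) -> str:
        return f'{dts}_pr{action}_{url}'


# Hypothetical sample values, not taken from real GDPR or API data.
URL = 'https://github.com/user/repo/pull/1'

# The same event as seen by two sources with identical fields.
gdpr_id = EventIds.pr(dts='2021-04-02T18:00:00Z', action='opened', url=URL)
api_id = EventIds.pr(dts='2021-04-02T18:00:00Z', action='opened', url=URL)

# The same event again, but with created_at shifted by one second.
late_id = EventIds.pr(dts='2021-04-02T18:00:01Z', action='opened', url=URL)

# Identical fields give identical ids, so a set keeps a single copy.
unique_ids = {gdpr_id, api_id}
print(len(unique_ids))

# The shifted timestamp yields a different id, so naive deduplication
# by id would treat it as a separate event.
print(late_id == gdpr_id)

# A missing ref is rendered as the literal string 'None' by the f-string.
print(EventIds.repo_created(dts='2021-04-02T18:00:00Z', name='user/repo', ref_type='branch', ref=None))
```

Because every field is interpolated verbatim, the ids are only as consistent as the least detailed source, which matches the caveats listed in the commit message.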