- we can't just look at the len of the string (~ number of bytes) - that breaks
down for tables containing multi-byte characters. This handles more (still not
all) cases and is good enough for now
- add _ to allowed tag chars - also require space between headline and tags
- links (link itself, not the description) spanning multiple lines are not
supported - otherwise we would have to take care of splitting the link and adding
indentation for org pretty printing - and that sounds like such an edge case
that it seems cleaner to forbid them
- drawer entries without value were printed as FOO rather than :FOO:
- account for differences between raw & non-raw blocks:
  raw blocks are not wrapped in a further element, just raw text & line breaks
  -> the first line has to be indented manually
  non-raw blocks do not end in a line break (rather they end with a manual
  newline from another element)
  -> the END_BLOCK line has to be indented
List items only contain content that is indented to their respective
level, except when that content is inside a block. To allow for this we have to
ignore the parentStop when parsing a block and just include everything until
the end of that block.
Can't think of any problems with this right now. Let's see if this comes
back to bite me.
see https://orgmode.org/manual/Checkboxes.html
We're deviating from Org mode regarding the assigned css classes but the chosen
classes feel better than (on, off, trans) and it's not like the export matches
1:1 otherwise.
Org mode does not care where those tokens are when it comes to the
export (afaict). We'll do the same.
(They should only be in the first line of a list item or a headline)
Org mode separates kvs not as initially assumed by whitespace (~ csv) but
rather at keywords (~ :\w+).
This still does not replicate Org mode behaviour though, as I decided against
ignoring repeated definitions of an attribute. Instead we stack their
values (and those already existing on the element) for certain attributes
(class and style for now).
e.g.
[[foo]]
would become <foo class="a"> in Org mode but becomes <foo class="a b"> with
go-org.
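
A rough sketch of splitting an attribute string at keywords rather than at whitespace. The regexp and splitAttributes are assumptions made up for illustration, not the go-org implementation; the point is only that a value may contain spaces and runs until the next :keyword.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keywordRe matches a keyword such as ":class" at the start of the input or
// after whitespace. Hypothetical pattern, loosely following "~ :\w+".
var keywordRe = regexp.MustCompile(`(?:^|\s)(:\w+)`)

// splitAttributes returns (keyword, value) pairs. Each value is everything
// between its keyword and the next keyword, with surrounding space trimmed.
func splitAttributes(s string) [][2]string {
	var pairs [][2]string
	locs := keywordRe.FindAllStringSubmatchIndex(s, -1)
	for i, loc := range locs {
		end := len(s)
		if i+1 < len(locs) {
			end = locs[i+1][0] // value stops where the next match begins
		}
		key := s[loc[2]:loc[3]]
		value := strings.TrimSpace(s[loc[3]:end])
		pairs = append(pairs, [2]string{key, value})
	}
	return pairs
}

func main() {
	for _, p := range splitAttributes(":class a :style color: red; :class b") {
		fmt.Printf("%s = %q\n", p[0], p[1])
	}
}
```

Note that repeated keywords are kept as separate pairs here; how they are combined is a separate decision.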
While adding another test case from the goorgeous issues it became clear that
inline markup and html entity replacement were erroneously applied to raw text
elements like inline code =foo=, src/example/export blocks, example lines,
etc.
To correctly handle those cases in both org and html exports a new
parseRawInline method had to be added.
Also some misc html export whitespace fixes and stuff
Until now the footnotes section was parsed but not included in the resulting
AST - this required rebuilding it in the OrgWriter. It feels cleaner to include
it in the AST and only exclude it in the export
including org files is more complex - e.g. footnotes need to be namespaced to
their source file. org does this by prefixing each included file's footnotes
with a number - but even that is not enough as it doesn't guarantee
uniqueness.
As I don't have a use case for it, I'll avoid the additional complexity for
now.
Also dismissed implementing colgroups for now - had it but didn't like the
added complexity for a very questionable benefit - I've actually never used
that feature of org tables...
- was missing spaces between attributes when rendering to org
- was duplicating attributes when rendering to html - now we join / replace
attributes depending on the name - for now only class & style are appended
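
The join / replace rule could look roughly like this. The attr type, mergeAttributes and the choice of separators are assumptions for illustration, not the actual writer code: class and style values are appended, any other repeated name is replaced by the later value.

```go
package main

import (
	"fmt"
	"strings"
)

// attr is a hypothetical, simplified HTML attribute.
type attr struct{ key, value string }

// mergeAttributes collapses repeated attribute names while keeping the order
// of first appearance. class and style are joined, everything else replaced.
func mergeAttributes(attrs []attr) []attr {
	var out []attr
	index := map[string]int{} // attribute name -> position in out
	for _, a := range attrs {
		i, seen := index[a.key]
		switch {
		case !seen:
			index[a.key] = len(out)
			out = append(out, a)
		case a.key == "class":
			out[i].value += " " + a.value
		case a.key == "style":
			out[i].value = strings.TrimRight(out[i].value, "; ") + "; " + a.value
		default:
			out[i].value = a.value
		}
	}
	return out
}

func main() {
	merged := mergeAttributes([]attr{
		{"class", "a"}, {"id", "x"}, {"class", "b"}, {"id", "y"},
	})
	for _, a := range merged {
		fmt.Printf("%s=%q\n", a.key, a.value)
	}
}
```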
To more faithfully handle inline images we need to know whether the original
link included a description - being more explicit about that will make it
easier.
see org.el/org-display-inline-images
> An inline image is a link which follows either of these
> conventions:
>
> 1. Its path is a file with an extension matching return value
> from `image-file-name-regexp' and it has no contents.
>
> 2. Its description consists in a single link of the previous
> type.
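
The two conventions quoted above can be sketched as follows. The node, text and link types and the extension list are simplified assumptions (the list stands in for `image-file-name-regexp'); a nil Description means the original link had no description.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// node is a hypothetical stand-in for an AST node.
type node interface{}

// text is a hypothetical plain text node.
type text string

// link is a hypothetical, simplified link node.
type link struct {
	URL         string
	Description []node // nil: the link was written without a description
}

// isImagePath reports whether p ends in a known image extension.
// The extension list is an assumption for this sketch.
func isImagePath(p string) bool {
	switch strings.ToLower(path.Ext(p)) {
	case ".png", ".jpg", ".jpeg", ".gif", ".svg":
		return true
	}
	return false
}

// isPlainImage covers convention 1: image path and no description.
func isPlainImage(l link) bool {
	return l.Description == nil && isImagePath(l.URL)
}

// isInlineImage covers conventions 1 and 2: either a plain image link or a
// link whose description is exactly one plain image link.
func isInlineImage(l link) bool {
	if isPlainImage(l) {
		return true
	}
	if len(l.Description) == 1 {
		if inner, ok := l.Description[0].(link); ok {
			return isPlainImage(inner)
		}
	}
	return false
}

func main() {
	fmt.Println(isInlineImage(link{URL: "cat.png"}))                                  // true
	fmt.Println(isInlineImage(link{URL: "cat.png", Description: []node{text("cat")}})) // false
}
```
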
I went through the issues of goorgeous and picked a few that seemed easy enough
to add (and added some more as todos for later). That helped a lot and showed
some bugs / edge cases that required changes.
- the org writer wrote a lot of eol spaces and just removed them whenever
String() was actually called. That worked until now but did not bode well for
rendering an empty headline - by removing ALL eol space we would render "* "
back as just "*" -> not a headline anymore.
- the html writer had some special handling for line spacing inside paragraphs
and list items - with the introduction of more blocks we need that handling
everywhere.
As browsers / html renderers are nice enough to collapse whitespace (and
especially collapse "\s*\n" into " ") we can just write out the newlines and
let the renderer take care of the rest.
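
To illustrate the eol space problem from the first item: stripping every trailing space turns an empty headline into a bare "*". The second function shows one possible way to keep that space; it is an assumption for illustration and not necessarily how the writer was changed. All names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	eolSpace  = regexp.MustCompile(`(?m)[ \t]+$`) // trailing space on any line
	bareStars = regexp.MustCompile(`^\*+$`)       // a line of only stars
)

// stripAll removes ALL trailing whitespace, including the space that makes
// "* " a headline.
func stripAll(s string) string {
	return eolSpace.ReplaceAllString(s, "")
}

// trimEOL trims trailing whitespace per line but keeps a single space after
// a line that would otherwise consist only of stars.
func trimEOL(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		trimmed := strings.TrimRight(line, " \t")
		if trimmed != line && bareStars.MatchString(trimmed) {
			trimmed += " "
		}
		lines[i] = trimmed
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "* \ntext  "
	fmt.Printf("%q\n", stripAll(in)) // "*\ntext"
	fmt.Printf("%q\n", trimEOL(in))  // "* \ntext"
}
```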