[bug #57448] groff will not break a line at a hard hyphen following some letter combinations
Dave
2019-12-19 13:16:38 UTC
URL:
<https://savannah.gnu.org/bugs/?57448>

Summary: groff will not break a line at a hard hyphen
following some letter combinations
Project: GNU troff
Submitted by: barx
Submitted on: Thu 19 Dec 2019 07:16:37 AM CST
Category: Core
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Details:

I see this behavior in a build of the latest groff from git sources:

$ groff --version | head -1
GNU groff version 1.22.4.74-b400-dirty

as well as in earlier release versions, going back to at least 1.22.3.

$ groff -a <<EOF
.ll 1i
One has the AB-style switch.
EOF

This produces the output:

<beginning of page>
One has the AB-
style switch.

The same formatting is output by using any two capital letters in place of AB
-- *except* in 19 of the possible 676 combinations. If the two letters are
AT, AV, AW, AY, DV, DW, DY, LT, LV, LW, LY, OT, OV, OW, OY, RT, RV, RW, or RY,
groff will refuse to break the line at the hyphen, carrying over the entire
"xx-style" phrase to the next line (or putting it all on the first line, if
you increase .ll a bit).

This script generates all 676 combinations and shows which ones won't break at
the hyphen:

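#!/bin/bash

# Emit one test line per two-letter combination (AA..ZZ), then list the
# combinations where groff kept "XY-style" together on one output line.
(echo .ll 1i
 for first in $(seq 65 90)
 do
   for second in $(seq 65 90)
   do
     echo -e One has the \
       "\x$(printf %x $first)\x$(printf %x $second)"-style \
       switch.
     echo .br
   done
 done
) | groff -a | grep -o ..-style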




_______________________________________________________

Reply to this item at:

<https://savannah.gnu.org/bugs/?57448>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
Bjarni Ingi Gislason
2019-12-19 23:13:40 UTC
Follow-up Comment #1, bug #57448 (project groff):

The reason for this is the kerning,
which makes some of the strings "XY-" narrower than others.

Test by adding the ".kern 0" request.

So I see no bug there.
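
For reference, a minimal sketch of that test. The function name kern_case and its "nokern" argument are made up for illustration; the input text is the one from the original report, and ".kern 0" is the request that disables pairwise kerning.

```shell
# Print the report's test input for a given letter pair.
# With "nokern" as the second argument, pairwise kerning is switched off first.
kern_case() {
  printf '.ll 1i\n'
  if [ "${2:-}" = nokern ]; then
    printf '.kern 0\n'
  fi
  printf 'One has the %s-style switch.\n' "$1"
}

# Example (requires groff):
#   kern_case DW        | groff -a
#   kern_case DW nokern | groff -a
```

If kerning is indeed the trigger, the two runs should differ in where the line is broken.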


Dave
2019-12-20 04:08:56 UTC
Follow-up Comment #2, bug #57448 (project groff):

How it kerns characters elsewhere on the line should have no effect on whether
it is willing to break a line at a hard hyphen. If it does, this is a bug.

groff will not break the line at a hyphen after these particular letter
combinations. You can verify this by varying the line length from 1 to 1.99
inches, a hundredth of an inch at a time, to see the various places groff
breaks the line:

seq -w 0 99 | xargs -I FRACTION printf '.br\n.ll 1.FRACTIONi\nOne has the DW-style switch.\n' | groff -a | fgrep One | uniq

(The above is all one line, though this bug tracker will probably break it.)

Dave
2019-12-20 10:36:27 UTC
Follow-up Comment #3, bug #57448 (project groff):

But you are correct that kerning appears to be what triggers the bug.

In the default font (TR), only four uppercase letters have kerning between
them and a following hyphen:

$ sed -n '/^kernpairs/,/^$/p' /usr/share/groff/current/font/devps/TR | grep '^[[:upper:]] - '
T - -92
V - -100
W - -65
Y - -111

(Side note: these four letters' glyphs are basically symmetrical, so it's
unclear why this file lacks the same four combinations in reverse order; there
are no kern pairs beginning with a hyphen.)

Now look at all kern pairs beginning with any uppercase letter and ending with
one of the above four:

$ sed -n '/^kernpairs/,/^$/p' /usr/share/groff/current/font/devps/TR | grep '^[[:upper:]] [TVWY]'

This generates my list of the 19 offending combinations. Thus a more succinct
bug description would be: groff fails to break a line at a hyphen following
two uppercase letters when the font file specifies kerning both between those
letters and between the second letter and the hyphen.
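
A rough sketch of that rule as a script, generalizing the two grep commands above so the letters are not hardcoded. It prints every combination XY of two capital letters for which the font file has both an "X Y" kern pair and a "Y -" kern pair. The function name is made up for illustration, and the script assumes the kernpairs section uses the "c1 c2 amount" layout shown above and ends at a blank line or at the charset keyword.

```shell
# List two-capital-letter combinations XY where both "X Y" and "Y -"
# appear as kern pairs. Reads font description file(s) or standard input.
kerned_before_hyphen() {
  awk '
    /^kernpairs/       { inside = 1; next }
    /^charset/ || /^$/ { inside = 0 }
    inside && NF == 3 && $1 ~ /^[A-Z]$/ {
      if ($2 == "-")
        hyph[$1] = 1            # letter kerned against a following hyphen
      else if ($2 ~ /^[A-Z]$/)
        pair[$1 $2] = $2        # remember the second letter of the pair
    }
    END {
      for (p in pair)
        if (pair[p] in hyph)
          print p
    }
  ' "$@" | sort
}

# Example:
#   kerned_before_hyphen /usr/share/groff/current/font/devps/TR
```

If the description above is right, running this on the TR file should reproduce the list of offending combinations.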

Curiously, the place where the line break should occur is never kerned: the
break should always be after the hyphen, and no kern pairs have a hyphen as
the first element. It is kerning that occurs before (what should be) a
breakpoint that somehow inhibits breaking at that point.

Bjarni Ingi Gislason
2019-12-22 23:37:45 UTC
Follow-up Comment #4, bug #57448 (project groff):

If the kerning in front of the hyphen is removed (with '\&'),
the expected result is there.
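
For illustration, a small sketch of where the '\&' goes, using the test input from the original report. The function name amp_case is invented here.

```shell
# Print the report's test input with \& between the letters and the hyphen,
# so that no kern pair applies directly in front of the hyphen.
amp_case() {
  printf '.ll 1i\nOne has the %s\\&-style switch.\n' "$1"
}

# Example (requires groff):
#   amp_case DW | groff -a
```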

Whoever fixes this bug should add (a lot of) explanation,
especially of the reasons.


Dave
2019-12-23 12:50:39 UTC
Post by Bjarni Ingi Gislason
> If the kerning in front of the hyphen is removed (with '\&'),
> the expected result is there.

Well, not if the expected result is for kerning and line-breaking to
simultaneously work. :)

If one needs a workaround, a better one is to add a zero-width breakpoint (\:)
after the hyphen.
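
A blunt sketch of applying that workaround mechanically, assuming plain text lines and a made-up function name. It adds \: after every hyphen that directly follows a letter and skips lines that begin with a dot. It does not try to recognize control lines beginning with an apostrophe, macro arguments, or hyphens inside escape sequences, so treat it as a starting point only.

```shell
# Insert a zero-width break point (\:) after each letter-hyphen sequence
# on text lines; lines starting with "." are left alone.
add_break_points() {
  sed '/^\./!s/\([[:alpha:]]\)-/\1-\\:/g'
}

# Example (requires groff):
#   printf '.ll 1i\nOne has the DW-style switch.\n' | add_break_points | groff -a
```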

But the greater trick is figuring out all the places such a \: is needed. In
groff's default font set -- with relatively few kern pairs that have a hyphen
as the second character -- the problem is limited in scope. But for installed
fonts with a larger set of kern pairs, the bug could crop up in unexpected
places. (And there is nothing magical about two characters: any number of
kerned characters before a hyphen triggers the bug.)

Dave
2020-05-14 06:41:00 UTC
Follow-up Comment #6, bug #57448 (project groff):

And it turns out that kern pairs need not even exist between every letter of a
hyphen-containing phrase to trigger the bug. For instance, the string
"major-Y-axis" has only two kern pairs in the Times typeface: between "r" and
"-", and "Y" and "-". Yet groff will break at neither hyphen:


seq -w 0 99 | xargs -I FRACTION printf '.br\n.ll 3.FRACTIONc\nIt can show the major-Y-axis line\n' | groff -a | fgrep It | uniq


The problem disappears if the Y is turned into an X.
