Discussion:
[bug #53413] [PATCH] Add hyphenation patterns to use with US and hy=48
Werner LEMBERG
2018-03-22 05:18:31 UTC
Follow-up Comment #1, bug #53413 (project groff):

Thanks. However, I really wonder whether we should add this to groff.

Compare this to a red traffic light in the US. You tell all pedestrians that
crossing the street is forbidden while the light is red. At the same time,
however, you give them armor as protection, so that they can still cross the
street at any time without getting hurt. As soon as they visit a different
country like Germany, they think the armor still works, but it no longer
does...

This doesn't make sense to me.

I'm leaving the decision to others.

_______________________________________________________

Reply to this item at:

<http://savannah.gnu.org/bugs/?53413>

_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
Bjarni Ingi Gislason
2018-03-22 10:50:25 UTC
Follow-up Comment #2, bug #53413 (project groff):

The number 4 is not high enough to forbid the hyphenation, as there are
patterns of the form "5x$" and "^x5"; it must be changed to 6.

The fundamental flaw of the algorithm in use is that it compares patterns
against the beginning and end of words without the anchor point '.'.
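The pattern mechanism this thread keeps returning to can be illustrated with a minimal Python sketch of Liang-style matching as used by TeX. It assumes TeX notation, where '.' marks a word boundary, so Bjarni's "5x$" corresponds to "5x." and "^x5" to ".x5". At each gap between letters the highest digit from any matching pattern wins; an odd value allows a break and an even value forbids it. The function names and the example patterns are hypothetical, not taken from groff's hyphen.us or from the proposed patch.

```python
# Minimal sketch of Liang-style hyphenation pattern matching (TeX notation:
# '.' marks a word boundary, digits sit in the gaps between letters).
# The patterns used below are hypothetical, not taken from hyphen.us.

def parse_pattern(pattern):
    """Split e.g. '5x.' into letters 'x.' and gap values [5, 0, 0]."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)      # value for the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def break_points(word, patterns, left_min=1, right_min=1):
    """Return every m for which a break after the first m letters is allowed."""
    padded = "." + word + "."
    score = [0] * (len(padded) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                # the highest value seen at a gap wins
                score[start + offset] = max(score[start + offset], value)
            start = padded.find(letters, start + 1)
    # odd = break allowed, even = break forbidden; the gap after the
    # first m letters of `word` is index m + 1 of `padded`
    return [m for m in range(1, len(word))
            if score[m + 1] % 2 == 1
            and m >= left_min and len(word) - m >= right_min]


if __name__ == "__main__":
    print(break_points("abcx", ["5x."]))          # [3], i.e. abc-x
    print(break_points("abcx", ["5x.", "4x."]))   # [3], 4 cannot beat 5
    print(break_points("abcx", ["5x.", "6x."]))   # [], 6 outranks 5
```

Under these assumptions, an added pattern with value 4 cannot cancel a matching 5, while a 6 can, which is the arithmetic behind the "must be changed to 6" remark.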

"*roff" has the 'hy=1' which TeX does not(?).

The new additional pattern files are simply a choice for those who
generally want to use hy=48 (current development state of groff) or
hy=1 (current stable version) with the current implementation of the
hyphenation algorithm.


Werner LEMBERG
2018-03-22 11:08:04 UTC
Follow-up Comment #3, bug #53413 (project groff):

German patterns use numbers up to value 8 – this means that such a `safety
addition' as you suggest isn't possible, because values can't be larger
than 9.
Post by Bjarni Ingi Gislason
> "*roff" has the 'hy=1' which TeX does not(?).
It's not clear what you want to say. If you mean that TeX doesn't support
\{left,right}hyphenmin=1, this is not correct. You can set those two
parameters to any value. However, if you select too low a value, you also get
invalid hyphenation points.

And `generally' using `.hy 48' is a bad idea. Values of `.hy' are always
bound to the currently selected hyphenation patterns.
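To relate `.hy' values to TeX's \lefthyphenmin and \righthyphenmin, here is a hedged Python sketch. It assumes the flag meanings documented for groff 1.23 (any nonzero value forbids a break after the first or before the last character; 8 and 4 extend that to the first and last two characters; 32 and 16 allow a break after the first or before the last character) and the English TeX values of 2 and 3. Both function names are hypothetical, and how groff resolves 8 combined with 32 (or 4 with 16) is not modeled faithfully.

```python
# Sketch: map a groff `.hy` value to (left, right) minimum fragment lengths.
# Flag meanings are an assumption based on the groff 1.23 documentation;
# other groff versions may behave differently.

def hy_minima(n):
    """Return (left_min, right_min), or None when hyphenation is disabled."""
    if n == 0:
        return None
    left, right = 2, 2      # baseline: no break after the first or before the last character
    if n & 8:
        left = 3            # 8: no break after the first two characters
    if n & 4:
        right = 3           # 4: no break before the last two characters
    # assumption: if 32/16 are combined with 8/4, this sketch lets 32/16 win
    if n & 32:
        left = 1            # 32: allow a break after the first character
    if n & 16:
        right = 1           # 16: allow a break before the last character
    return left, right


def too_permissive(n, pattern_left=2, pattern_right=3):
    """True if `.hy n` allows breaks the patterns were not built for.
    The defaults assume the English TeX values left=2, right=3."""
    minima = hy_minima(n)
    if minima is None:
        return False
    left, right = minima
    return left < pattern_left or right < pattern_right


if __name__ == "__main__":
    for n in (1, 4, 48):
        print(n, hy_minima(n), too_permissive(n))
    # 1 (2, 2) True
    # 4 (2, 3) False
    # 48 (1, 1) True
```

Under these assumptions `.hy 1' leaves only two characters at the end of a word, one fewer than the English patterns expect, whereas `.hy 4' matches them.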

G. Branden Robinson
2018-04-23 12:02:09 UTC
Update of bug #53413 (project groff):

Status: None => Need Info

_______________________________________________________

Follow-up Comment #4:

Bjarni,

Can you address Werner's last comment, please? This bug is stuck in limbo.

Bjarni Ingi Gislason
2018-04-24 01:37:07 UTC
Follow-up Comment #5, bug #53413 (project groff):

The number 8 is high enough to prohibit wrong hyphenations caused by
odd numbers less than 9.

It is more important to avoid wrong hyphenations than to find
patterns for rare ones.

And the last number used should be even, to cancel wrong hyphenation
points allowed by patterns found earlier.

[If 9 is used, maybe '0' could be used as a counterbalance?]
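The arithmetic behind Bjarni's "safety" override (comment #2) and Werner's objection (comment #3) can be sketched as follows: to suppress every odd (break-allowing) value in a pattern set, an added pattern needs an even value above the highest odd one, and pattern values are single digits. The helper names are hypothetical; the sketch only scans digits and says nothing about whether such an override is linguistically right for a given language.

```python
# Sketch: smallest even (break-forbidding) value that outranks every odd
# (break-allowing) value in a pattern set, if it still fits in one digit.

def odd_levels(patterns):
    """Collect the odd digits that occur anywhere in the patterns."""
    return {int(ch) for p in patterns for ch in p
            if ch.isdigit() and int(ch) % 2 == 1}


def safe_inhibit_level(patterns):
    """Lowest even value above every odd value, or None if it would exceed 9."""
    highest_odd = max(odd_levels(patterns), default=1)
    level = highest_odd + 1          # odd + 1 is always even
    return level if level <= 9 else None


if __name__ == "__main__":
    print(safe_inhibit_level(["5x.", ".x5"]))   # 6
    print(safe_inhibit_level(["a7b", "c8d"]))   # 8
    print(safe_inhibit_level(["a9b"]))          # None
```

With odd values up to 7, an 8 suffices, as comment #5 argues; once a 9 occurs there is no single-digit even value left to override it, which is the limit Werner points to.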

I am not saying that hy=48 should be the default. Users may (they are free
to; this is "free" software) choose for themselves any defined value
(irrespective of the main hyphenation file in use).

I use hy=48 with the additional files for my own writing; for reading
man pages, I use no hyphenation (a waste of resources).
I have not noticed any irregularities.

The additional files merely lessen the chance of seeing wrong
hyphenations, for as long as the algorithm itself does not handle the issue.

N.B.
The files need a prologue explaining their purpose, such as

% These patterns avoid false hyphenations for US English with the
% current (April 2018) algorithm in groff and values of ".hy" that allow
% splitting off a single character.


Dave
2018-05-14 22:38:05 UTC
> And `generally' using `.hy 48' is a bad idea. Values of
> `.hy' are always bound to the currently selected hyphenation
> patterns.
Yes, but the entire reason .hy 48 produces erroneous output is that the
hyphenation patterns were not crafted with this setting in mind. This strikes
me as a failure of the hyphenation patterns, not of the user's .hy selection.

The proposed patch (haphazardly) addresses that, but for English only.
Werner's point, in comment #1, is that it's imprudent to fix a problem for one
language that ships with groff while leaving the same problem in the others.

I suppose it's a philosophical question whether it's better to partially fix a
shortcoming but create an inconsistency between languages (what this patch
does), or maintain consistency and address the shortcoming in the
documentation (the status quo). Werner also declined to make this call in
comment #1. And I'm not a groff developer, so it's not really my place to
have an opinion, either. But I hope this clarifies the issue.

Werner LEMBERG
2018-05-15 05:43:14 UTC
Follow-up Comment #7, bug #53413 (project groff):

I can only repeat that there is no `failure' in hyphenation patterns. It's
outside of groff's scope to decide that since it is meta information bound to
a given language.

What's missing is a mechanism for groff to get the correct minmax hyphenation
values. Fortunately, there are efforts to improve that: Look at
the `yaml-headers' branch of the central repository of TeX hyphenation patterns

https://github.com/hyphenation/tex-hyphen/tree/yaml-headers

which adds the necessary information to all available hyphenation patterns!
AFAIK, this will eventually be merged into `master' (as soon as the maintainer
has time to do that).

I suggest adding code to groff to make use of this information, for example,
by replacing the current `hyphen.xxx' files with the corresponding files from
the `tex-hyphen' repository, then parsing the YAML headers for the necessary
information.
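Werner's suggestion could look roughly like the Python sketch below. It assumes the header is a block of '%'-prefixed YAML-style lines containing a `hyphenmins' mapping with `typesetting' (and `generation') entries that hold `left' and `right' values; these key names are assumptions to check against the actual tex-hyphen files, and read_hyphenmins is a hypothetical name. Python's standard library has no YAML parser, so the sketch tracks indentation by hand; a real implementation would use a proper parser.

```python
# Sketch: read hyphenation minima from a '%'-prefixed YAML-style header.
# The key names (hyphenmins / typesetting / left / right) are assumptions
# about the tex-hyphen `yaml-headers' format, not a confirmed schema.

def read_hyphenmins(text, section="typesetting"):
    """Return (left, right) from hyphenmins/<section>, or None if absent."""
    stack = []   # (indent, key) pairs for the enclosing mappings
    found = {}
    for line in text.splitlines():
        if not line.startswith("%"):
            break                        # header ends at the first non-comment line
        body = line[1:]
        if body.startswith(" "):
            body = body[1:]              # drop the single space after '%'
        stripped = body.strip()
        if ":" not in stripped:
            continue                     # blank or free-text line
        indent = len(body) - len(body.lstrip(" "))
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        while stack and stack[-1][0] >= indent:
            stack.pop()                  # leave mappings we are no longer inside
        path = [k for _, k in stack]
        if value:
            if path == ["hyphenmins", section] and key in ("left", "right"):
                found[key] = int(value)
        else:
            stack.append((indent, key))  # a bare key opens a nested mapping
    if "left" in found and "right" in found:
        return found["left"], found["right"]
    return None


if __name__ == "__main__":
    header = "\n".join([
        "% title: hypothetical sample header",
        "% hyphenmins:",
        "%     generation:",
        "%         left: 2",
        "%         right: 2",
        "%     typesetting:",
        "%         left: 2",
        "%         right: 3",
        "\\patterns{",
    ])
    print(read_hyphenmins(header))                # (2, 3)
    print(read_hyphenmins(header, "generation"))  # (2, 2)
```

groff could then derive its default hyphenation limits from the returned pair instead of hardcoding `.hy' values per language.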


Dave
2018-05-16 21:57:54 UTC
> It's outside of groff's scope to decide that since it is
> meta information bound to a given language.
Well, yes and no. Looking at the bigger picture:

In English, a single letter is not a valid hyphenation breakpoint, even if it
is a syllable breakpoint. This is a longstanding typographic convention and a
fairly ironclad rule in modern publishing.

However, what if the user asks the typesetter to override this rule? Should
the typesetter acquiesce, because the user is ultimately in charge, and it's
not software's job to protect users from themselves? This appears to be
Bjarni's position.

Or should the typesetting software refuse, because knowing typesetting best
practices is its job, despite what the user asks for? This seems to be
Werner's position.

Still not taking sides, just trying to clarify what the sides are.

Werner LEMBERG
2018-05-17 04:58:50 UTC
Follow-up Comment #9, bug #53413 (project groff):

If the user asks to hyphenate after the first letter, groff should indeed
refuse if the current language's hyphenation parameters don't allow it.

Right now, these parameters are hardcoded using `.hy', that is, they are part
of groff's language setup files. My suggestion is to make groff look into the
meta information of the patterns themselves while they are loaded, to set those
parameters. The `.hy' command could then become a user-only request
restricted by the meta information; for example, setting hy=48 would have
no effect because the patterns don't allow it.
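The restriction described here amounts to treating the pattern metadata as a floor that a user request can raise but never lower. A minimal Python sketch, with hypothetical names and the English values 2 and 3 assumed as the limits supplied by the patterns:

```python
from typing import List, NamedTuple


class Minima(NamedTuple):
    left: int   # fewest characters that must precede a break
    right: int  # fewest characters that must follow a break


def effective_minima(requested: Minima, from_patterns: Minima) -> Minima:
    """A user request may tighten the pattern limits but never loosen them."""
    return Minima(max(requested.left, from_patterns.left),
                  max(requested.right, from_patterns.right))


def candidate_breaks(word_length: int, limits: Minima) -> List[int]:
    """Positions m (break after the first m characters) the limits leave open;
    the patterns still decide which of these are real break points."""
    return list(range(limits.left, word_length - limits.right + 1))


if __name__ == "__main__":
    english = Minima(2, 3)        # assumed values from the pattern metadata
    hy48 = Minima(1, 1)           # what `.hy 48` would request
    limits = effective_minima(hy48, english)
    print(limits)                        # Minima(left=2, right=3)
    print(candidate_breaks(6, limits))   # [2, 3]
```

Under these assumptions a `.hy 48' request is clamped back to 2 and 3, so it has no effect, while a stricter request such as 3 and 4 would still be honored.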

Dave
2020-07-12 23:42:23 UTC
Follow-up Comment #10, bug #53413 (project groff):

[comment #9:]
> My suggestion is to make groff look into the meta information
> of the patterns themselves while they are loaded to set those
> parameters. The `.hy' command could then become a user-only
> request restricted by the meta information; for example,
> setting hy=48 would have no effect because the patterns don't
> allow it.
I've opened bug #57556 with this suggestion.
> Right now, these parameters are hardcoded using `.hy', that
> is, they are part of groff's language setup files.
I take it you're referring here to the tmac/{cs|de|fr|ja|sv|zh}.tmac files?


$ shopt -s extglob
$ grep '\. *hy' tmac/@(cs|de|fr|ja|sv|zh).tmac
tmac/cs.tmac:.hy 1
tmac/de.tmac:.hy 1
tmac/fr.tmac:.hy 4
tmac/sv.tmac:.hy 32
$


Curiously, there is no such setup file for English, even though the groff
default (.hy 1) is invalid for the English hyphenation patterns. Until a fix
for bug #57556 offers a more robust solution, perhaps there should be a
tmac/en.tmac that is loaded by default in English environments?

G. Branden Robinson
2020-08-06 15:15:52 UTC
Follow-up Comment #11, bug #53413 (project groff):

[comment #10:]
Post by Dave
> Curiously, there is no such setup file for English, even though the groff
> default (.hy 1) is invalid for the English hyphenation patterns. Until a fix
> for bug 57556 offers a more robust solution to this, perhaps there should be a
> tmac/en.tmac that is called by default in English environments?

Probably, or at least en_US.tmac to keep the Commonwealthers from screaming
too loudly.

The setup for English is handled by troffrc and hyphen{,ex}.us.

Unless I'm misunderstanding you.



Dave
2020-08-07 23:32:55 UTC
Follow-up Comment #12, bug #53413 (project groff):

[comment #11:]
Post by G. Branden Robinson
> Probably, or at least en_US.tmac to keep the Commonwealthers from screaming
> too loudly.

Groff already makes no distinction between US and UK English in its
hyphenation handling, so whatever screaming they want to do, it's over
something more fundamental than the name of the .tmac file.
Post by G. Branden Robinson
> The setup for English is handled by troffrc and hyphen{,ex}.us.
> Unless I'm misunderstanding you.
I don't know, because I'm not sure I'm following you.

hyphen{,ex}.us, despite living in tmac/, aren't tmac files, so the default .hy
can't be set there.

troffrc looks like it unconditionally sets the language to "us" and loads
those two hyphenation files. So this might be a good place to go ahead and
set .hy as well, rather than creating a new .tmac file. But I don't know how
non-English groffs are set up (Do they skip troffrc altogether? Does the
installation process modify this file to load different hyphenation
patterns?), so I'm probably not the best person to say.
