[bug #58206] [PATCH] fix PDFPIC issue with determining size of pdfs containing images
anonymous
2020-04-19 18:53:52 UTC
URL:
<https://savannah.gnu.org/bugs/?58206>

Summary: [PATCH] fix PDFPIC issue with determining size of
pdfs containing images
Project: GNU troff
Submitted by: None
Submitted on: Sun 19 Apr 2020 06:53:50 PM UTC
Category: Device gropdf
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Details:

When .PDFPIC is used, it pipes the output of pdfinfo to grep to determine the
dimensions of the PDF. However, when the PDF contains an image, grep fails to
parse the data and gives the following error:


Binary file (standard input) matches


Allowing grep to process binary files seems to fix the issue. The patch is
below.


diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index 0400c1cf..4bc6f03b 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -84,7 +84,7 @@
.\" get image dimensions
. ec @
. sy pdfinfo @$1 | \
-grep "Page *size" | \
+grep -a "Page *size" | \
sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
.nr pdf-wid (p;\\1)\\n\
.nr pdf-ht (p;\\2)/' \


Should be a relatively easy fix.




_______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Sun 19 Apr 2020 06:53:50 PM UTC Name: pdfpic.diff Size: 375B By:
None

<http://savannah.gnu.org/bugs/download.php?file_id=48873>

_______________________________________________________

Reply to this item at:

<https://savannah.gnu.org/bugs/?58206>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
G. Branden Robinson
2020-04-19 22:51:25 UTC
Follow-up Comment #1, bug #58206 (project groff):

Unfortunately, the '-a' option to grep is not part of the POSIX standard for
the utility so we can't rely on it being available in the runtime
environment.

Of course, 'pdfinfo' isn't standard at all, so there's probably some room to
argue here.

Ingo Schwarze
2020-04-20 03:18:54 UTC
Update of bug #58206 (project groff):

Category: Device gropdf => Macro - others
Status: None => Need Info

_______________________________________________________

Follow-up Comment #2:

While nowadays, grep(1) -a tends to be universally available on all Linux and
BSD systems, notable systems that do not support it include Illumos and Oracle
Solaris (even the newest version 11.3). So Branden's remark about portability
is not a theoretical concern. While Illumos and Solaris don't appear to ship
groff by default, using groff there doesn't seem uncommon, so i'd consider
them relevant target platforms, and the patch seems likely to totally break
pdfpic.tmac on these platforms.

The Solaris versions of grep seem to simply fail on binary files; when called
with -F, they seem to simply pass through non-printable characters, but we
can't use -F here because '*' is needed. The Solaris manual page does not
mention any way to handle binary files.

The normal way to include images into roff(7) files is to convert them to eps
format and then use the .PSPIC macro.

Using the .PDFPIC macro looks like a bad idea in the first place. It is
highly insecure because it makes copious use of the .sy request. Also, it is
almost undocumented: all i managed to find so far is a passing mention in
groff_tmac(7), which is riddled with very confusing typos - it talks about
defining PSPIC, and about PSDIF options, neither of which makes sense. The
info(1) documentation doesn't seem to mention .PDFPIC at all.

I'm wondering though how it can happen that the *output* from pdfinfo(1)
contains non-printable characters... Maybe this is a bug in pdfinfo(1), not
in groff? Can you show the PDF file that triggered the problem for you?

anonymous
2020-04-21 02:03:20 UTC
Additional Item Attachment, bug #58206 (project groff):

File name: angular-1280-800.pdf Size:279 KB
<https://savannah.gnu.org/file/angular-1280-800.pdf?file_id=48889>



anonymous
2020-04-21 02:07:04 UTC
Follow-up Comment #3, bug #58206 (project groff):

I just attached the pdf i was trying to embed. This was mostly just to see if
pdfpic was working.

anonymous
2020-04-21 02:34:03 UTC
Follow-up Comment #4, bug #58206 (project groff):

OK, so a workaround I used was tr's -d flag to remove the NUL bytes from the
output of pdfinfo. As far as I know, this should be available on all
systems. Another option is to use cat -v, but I don't think that is POSIX
compliant.

Here is the diff

diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index 0400c1cf..3dae30be 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -84,6 +84,7 @@
.\" get image dimensions
. ec @
. sy pdfinfo @$1 | \
+tr -d '\000' | \
grep "Page *size" | \
sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
.nr pdf-wid (p;\\1)\\n\


(file #48890)
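
To illustrate the tr-based pipeline outside of groff, here is a minimal
sketch. It does not call pdfinfo; fake_pdfinfo and page_size are hypothetical
helper names, and the sample output (a Title field with a stray NUL byte and a
960 x 600 pts page) is an assumption made for illustration, not the actual
output for the attached file.

```shell
#!/bin/sh
# Simulated pdfinfo output (assumption): a metadata field carrying a
# stray NUL byte, followed by the "Page size" line that pdfpic.tmac needs.
fake_pdfinfo() {
    printf 'Title:          angular\000\nPage size:      960 x 600 pts\n'
}

# Strip NUL bytes, keep the "Page size" line, and print "width height".
# The sed expression is the one from pdfpic.tmac, without the roff escaping.
page_size() {
    tr -d '\000' |
    grep "Page *size" |
    sed -e 's/Page *size: *\([[:digit:].]*\) *x *\([[:digit:].]*\).*$/\1 \2/'
}

fake_pdfinfo | page_size
```

Because tr removes the NUL byte before grep sees the stream, grep treats its
input as text and passes the matching line on to sed.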
_______________________________________________________

Additional Item Attachment:

File name: trversion.diff Size:0 KB
<https://savannah.gnu.org/file/trversion.diff?file_id=48890>



G. Branden Robinson
2020-04-21 13:01:34 UTC
anonymous
2020-04-21 20:12:22 UTC
Follow-up Comment #6, bug #58206 (project groff):

I found a similar issue in the poppler issue tracker; sadly, it is still
unresolved.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/776

The argument for supporting PDFPIC is that it allows files that use pdfmark's
pdfhref feature to also contain pictures. If a file that contains pdfhref
links is converted to PostScript, only the text remains; the link is lost,
even after converting the result to a PDF using ps2pdf.


Deri James
2020-04-21 23:00:23 UTC
Follow-up Comment #7, bug #58206 (project groff):

PDFPIC is not written by me, but I understand why it was written! The gropdf
driver supports this:-

\X'pdf: pdfpic file alignment width height line-length'

Place an image of the specified width containing the PDF drawing from file
file of desired width and height (if height is missing or zero then it is
scaled proportionally). If alignment is -L the drawing is left aligned. If it
is -C or -R a linelength greater than the width of the drawing is required as
well. If width is specified as zero then the width is scaled in proportion to
the height.

The problem with this low-level command is that the position where groff will
render next is not altered; gropdf has no way of signalling back to groff the
vertical space occupied by the picture. So it is down to the author to move
down the required distance, or to set an indent if the text is to flow to the
right of the picture, adding a trap to remove the indent when the bottom of
the picture is reached.

Here's a simple example using your bad pdf. It requires that you know the x/y
proportions; in your case, at 1200x800 pixels, it is 3/2:-

Here it is:-
\#.PDFPIC Bad3..pdf
\# Alternatively ...
.nr len 4i
\X'pdf: pdfpic Bad.pdf -L \n[len]z'
\# Back to top of picture
.sp -1l
\# Pic is 1200x800, so if width is 4i length would be 4i * 2/3
.sp \n[len]u*2u/3u
And here we are.

You don't need to run groff with -U, so it is perfectly safe.
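
The 2/3 factor used in the .sp request above follows from the pixel
dimensions. As a rough sketch in shell arithmetic (scaled_height_pt is a
hypothetical helper name, and the calculation assumes 72 points per inch, so
that 4i is 288 points):

```shell
#!/bin/sh
# scaled_height_pt WIDTH_PT PIXEL_WIDTH PIXEL_HEIGHT  (hypothetical helper)
# Scale the picture height in proportion to the requested width.
# Shell arithmetic is integer-only, so fractional results are truncated.
scaled_height_pt() {
    echo $(( $1 * $3 / $2 ))
}

# A 1200x800 picture set 4i (288 points) wide is 192 points high.
scaled_height_pt 288 1200 800
```

That is the same proportion as the 4i * 2/3 in the roff comment.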


