OK, I found mutool (from the mupdf-tools package), which seems to help with parsing via mutool show /path/to/file.pdf grep.
The good file has, among other things, clearly defined references to the embedded file.
example_041.pdf:7: <</Border[0 0 0]/Contents<FEFF0074006500780074002000660069006C0065>/CreationDate(D:20150802122217+00'00')/F 4/FS 8 0 R/M(D:20150802122217+00'00')/NM(0001-0000)/Name/PushPin/P 11 0 R/Rect[240.94489 751.18136 255.1181 765.35458]/Subtype/FileAttachment/Type/Annot>>
example_041.pdf:8: <</EF<</F 9 0 R>>/F(utf8test.txt)/Type/Filespec>>
example_041.pdf:9: <</Filter/FlateDecode/Length 2649/Params<</Size 4384>>/Type/EmbeddedFile>>
...
example_041.pdf:16: <</Lang<FEFF0065006E>/Metadata 15 0 R/Names<</EmbeddedFiles<</Names[(utf8test.txt)8 0 R]>>>>/OpenAction[11 0 R/FitH null]/PageLayout/SinglePage/PageMode/UseNone/Pages 1 0 R/Type/Catalog/Version/1.7/ViewerPreferences<</Direction/L2R>>>>
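To avoid eyeballing these dictionaries, here is a minimal Python sketch that pulls the Filespec entries out of grep-style lines like the ones above. The function name find_filespecs and the regular expressions are my own. They assume the "file:object: dictionary" layout shown here, a generation number of 0, and file names without escaped parentheses, so treat it as a rough filter rather than a real PDF parser.

```python
import re

# Grep-style line as printed above: "<file>:<object number>: <dictionary>"
LINE_RE = re.compile(r"^(?P<file>.+?):(?P<obj>\d+): (?P<body>.*)$")
# /F(name) literal string; escaped or nested parentheses are not handled
NAME_RE = re.compile(r"/F\(([^)]*)\)")
# /EF<</F 9 0 R>> gives the object number of the embedded file stream
EF_RE = re.compile(r"/EF<</F (\d+) 0 R>>")


def find_filespecs(lines):
    """Return (pdf, object number, file name, stream object number) tuples."""
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        # Only file specification dictionaries are of interest here
        if not m or "/Type/Filespec" not in m["body"]:
            continue
        name = NAME_RE.search(m["body"])
        ef = EF_RE.search(m["body"])
        found.append((
            m["file"],
            int(m["obj"]),
            name.group(1) if name else None,
            int(ef.group(1)) if ef else None,
        ))
    return found
```

Feeding it the saved output of mutool show for both files should give one tuple per attachment, which makes it easier to compare how the two PDFs reference their embedded streams.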
Looking at our problem child, it doesn't look so problematic after all. The definitions are different and there's JavaScript, which is odd, but the script seems to be the thing that warns about the embedded files (e.g. in okular):
ITA-560-2005.pdf:97: <</Desc<>/EF<</F 29 0 R>>/F(001. INDEX SHEET.pdf)/Thumb 34 0 R/Type/Filespec/UF(001. INDEX SHEET.pdf)>>
ITA-560-2005.pdf:98: <</Desc<>/EF<</F 30 0 R>>/F(002. PLEADING.pdf)/Thumb 33 0 R/Type/Filespec/UF(002. PLEADING.pdf)>>
...
ITA-560-2005.pdf:136: <</EmbeddedFiles 68 0 R/JavaScript 137 0 R>>
ITA-560-2005.pdf:137: <</Names[(ADBE::FileAttachmentsCompatibility\000)138 0 R]>>
ITA-560-2005.pdf:138: <</JS(var v = app.viewerVersion;\nif \(v < 7\)\n{\n\tvar n = 0;\n\tif \(this.dataObjects != null\)\n\t\tn = this.dataObjects.length;\n\tif \(v >= 5 && v < 6 && n > 0 && \(app.viewerVariation == "Full" || app.viewerVariation == "Fill-In"\)\)\n\t{\n\t\tif \(this.external\)\n\t\t\tapp.alert\("This document has file attachments. To view the attachments, click the Save button to save a copy of the document, open the copy in Acrobat, and use the File > Document Properties > Embedded Data Objects menu.", 3, 0\);\n\t\telse\n\t\t\tapp.alert\("This document has file attachments. Use the File > Document Properties > Embedded Data Objects menu to view the attachments.", 3, 0\);\n\t}\n\telse if \(v >= 6 && v < 7\)\n\t{\n\t\tif \(n == 0\)\n\t\t{\n\t\t\tvar np = this.numPages;\n\t\t\tsyncAnnotScan\(\);\n\t\t\tfor \(var p = 0; p < np && n == 0; ++p\)\n\t\t\t{\n\t\t\t\tvar annots = this.getAnnots\(p\);\n\t\t\t\tif \(annots != null\)\n\t\t\t\t{\n\t\t\t\t\tfor \(var i = 0; i < annots.length; ++i\)\n\t\t\t\t\t{\n\t\t\t\t\t\tif \(annots[i].type == "FileAttachment"\)\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tn = 1;\n\t\t\t\t\t\t\tbreak;\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t\tif \(n > 0\)\n\t\t{\n\t\t\tif \(this.external\)\n\t\t\t\tapp.alert\("This document has file attachments. To view the attachments, click the black triangle at the top of the document window's vertical scrollbar and choose File Attachments.", 3, 0\);\n\t\t\telse\n\t\t\t\tapp.alert\("This document has file attachments. Use the Document > File Attachments menu to view the attachments.", 3, 0\);\n\t\t}\n\t}\n}\n)/S/JavaScript>>
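The /JS value above is a PDF literal string, so the script is hard to read with all the \n, \t and \( escapes in place. A small Python sketch for undoing them follows. The helper name unescape_pdf_string is made up for illustration, and it only covers the single-character and octal escapes, not backslash-newline line continuations.

```python
import re

# Single-character escapes used in PDF literal strings
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}
# A backslash followed by up to three octal digits, or by any other character
_ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|.)")


def unescape_pdf_string(raw):
    """Decode backslash escapes in the body of a PDF literal string."""
    def repl(match):
        token = match.group(1)
        if token[0] in "01234567":
            # Octal character code, e.g. \000
            return chr(int(token, 8) & 0xFF)
        # For unknown escapes the backslash is dropped and the character kept
        return _SIMPLE.get(token, token)
    return _ESCAPE_RE.sub(repl, raw)
```

Pasting the text between /JS( and the closing )/S/JavaScript into this function and printing the result should show the script as ordinary indented JavaScript: a check of app.viewerVersion followed by a few app.alert calls about file attachments.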
Using pdfinfo from the poppler-utils package (qpdfview uses the Poppler libraries for PDF processing), it seems that there aren't any other glaring differences, and both files are indeed using the same PDF version (1.7). Finally, running pdfdetach -list on both files shows the correct filenames as being attached to the PDF.
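For anyone who wants to repeat the pdfinfo comparison without reading two outputs side by side, here is a rough Python sketch. The helper names (parse_pdfinfo, diff_info, pdfinfo) are my own, and it assumes pdfinfo is on the PATH and prints its usual "Key: value" lines.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so dates with colons stay intact
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_info(a, b):
    """Return {key: (value_a, value_b)} for fields that differ or are missing."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


def pdfinfo(path):
    """Run pdfinfo (from poppler-utils) on a file and parse its output."""
    result = subprocess.run(["pdfinfo", path],
                            capture_output=True, text=True, check=True)
    return parse_pdfinfo(result.stdout)
```

Calling diff_info(pdfinfo("example_041.pdf"), pdfinfo("ITA-560-2005.pdf")) then lists only the fields where the two files disagree.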