Standard font information
[Editors Note: This piece originally appeared in Mark Stephens Java PDF blog which I recommend you visit if you’re interested in the technical side of PDF. Mark is the CEO of IDRsolutions, the company behind JPedal, a 100% Java PDF library. Visit his LinkedIn page for more information.]
One of the reasons that the PDF file format is so popular is that it embeds a large amount of font information in the PDF file, so that it can accurately reproduce the display as intended on any machine. It will not turn your beautifully crafted 12 page document into a horribly mis-formatted 14 page version, as Microsoft word does, if it cannot find all the fonts.
To help reduce the size of PDF files, the PDF file format specifies a set of font families (originally 14 and now reduced to 8…) which all PDF readers should know about. No information is included in the PDF beyond the font name (i.e. Arial).
This is fantastic for PDF file size, but where do you get information about these fonts if you need it for some reason. With an embedded font, you have the width and bounding box of each glyph. It must be somewhere, because Adobe has access to it, but it is not in the PDF file. Where is the information stored?
It is actually stored inside a set of separate text files which are built into PDF readers. It contains a header with information about the font followed by a set of lines describing each glyph. This is part of what one looks like…
Comment Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated. All Rights Reserved.
Comment Creation Date: Thu May 1 17:27:09 1997
Comment UniqueID 43050
Comment VMusage 39754 50779
FontBBox -23 -250 715 805
Notice Copyright (c) 1989, 1990, 1991, 1992, 1993, 1997 Adobe Systems Incorporated. All Rights Reserved.
C 32 ; WX 600 ; N space ; B 0 0 0 0 ;
C 33 ; WX 600 ; N exclam ; B 236 -15 364 572 ;
C 34 ; WX 600 ; N quotedbl ; B 187 328 413 562 ;
C 35 ; WX 600 ; N numbersign ; B 93 -32 507 639 ;
For each glyph, there is a line giving the index number (33), the width of the font (600 units out of 1000), the name of the glyph (exclam) and the Bounding Box which can be drawn around it.
The afm files themselves have become rather elusive on the web, so here is a link to my copy.