Posted May 4, 2024 (author: )

Lesson plan:

Lecturer: Wang Hang

Topic: Chapter 3 “Syntax”; 3.1 “Lexical Conventions”; 3.2 “Objects”

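Chapter 3 Syntax

Definition: Syntax

[Computer Science] Syntax: the rules governing the construction of a machine language.

In the PDF Reference, syntax means the rules for constructing a PDF file: the PDF syntax rules.

As the Chinese proverb says, without a compass and a square you cannot draw circles and squares. Every valid PDF document must follow these rules.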

This chapter covers everything about the syntax of PDF at the object, file, and document level. It sets the stage for subsequent chapters, which describe how the contents of a PDF file are interpreted as page descriptions, interactive navigational aids, and application-level logical structure.

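In short, Chapter 3 covers PDF syntax at three levels (object, file, and document) and lays the groundwork for the later chapters, which explain how a PDF file’s contents are interpreted as page descriptions, interactive navigation aids, and application-level logical structure.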

PDF syntax is best understood by thinking of it in four parts, as shown in Figure 3.1:

•Objects. A PDF document is a data structure composed from a small set of basic types of data objects.

Section 3.1, “Lexical Conventions,” describes the character set used to write objects and other syntactic elements.

Section 3.2, “Objects,” describes the syntax and essential properties of the objects.

Section 3.2.7, “Stream Objects,” provides complete details of the most complex data type, the stream object.

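PDF syntax is best understood in the following four parts.

Objects: a PDF document is a data structure built from a small set of basic data object types.

3.1 Lexical Conventions: the character set used to write objects and other syntactic elements.

3.2 Objects: the syntax and essential properties of objects.

3.2.7 Stream Objects: full details of the stream object, the most complex data type.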

•File structure. The PDF file structure determines how objects are stored in a PDF file, how they are accessed, and how they are updated. This structure is independent of the semantics of the objects.

Section 3.4, “File Structure,” describes the file structure.

Section 3.5, “Encryption,” describes a file-level mechanism for protecting a document’s contents from unauthorized access.

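File structure: the PDF file structure determines how objects are stored, accessed, and updated, independently of what the objects mean.

3.4 File Structure

3.5 Encryption: a file-level mechanism that protects a document’s contents from unauthorized access.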

•Document structure. The PDF document structure specifies how the basic object types are used to represent components of a PDF document: pages, fonts, annotations, and so forth.

Section 3.6, “Document Structure,” describes the overall document structure; later chapters address the detailed semantics of the components.

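Document structure: specifies how the basic object types represent the components of a PDF document (pages, fonts, annotations, and so on).

3.6 Document Structure: an overview of the document structure; later chapters give the detailed semantics of each component.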

•Content streams. A PDF content stream contains a sequence of instructions describing the appearance of a page or other graphical entity. These instructions, while also represented as objects, are conceptually distinct from the objects that represent the document structure and are described separately.

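Content streams: a content stream is a sequence of instructions describing the appearance of a page or other graphical entity. These instructions are also represented as objects, but they are conceptually distinct from the objects that make up the document structure, and the two are described separately.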

Section 3.7, “Content Streams and Resources,” discusses PDF content streams and their associated resources.
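To make the Objects and Content streams layers of the overview concrete, here is a minimal Python sketch that writes Python values as PDF object syntax. It is a teaching assumption, not code from the PDF Reference: the helpers `Name`, `to_pdf`, and `stream_object` are hypothetical, while the object types it maps (null, booleans, integers, reals, strings, names, arrays, dictionaries, streams) are those of Section 3.2, and the text operators in the sample content stream (`BT`, `Tf`, `Td`, `Tj`, `ET`) are covered in later chapters.

```python
# Minimal sketch: serialize Python values to PDF object syntax.
# Hypothetical helpers (Name, to_pdf, stream_object); not a real library API.

class Name(str):
    """A PDF name object, e.g. Name("Type") is written as /Type."""

DELIMITERS = b"()<>[]{}/%"

def name_bytes(name: str) -> bytes:
    # Regular visible ASCII bytes are kept; anything else is written as #xx.
    out = bytearray(b"/")
    for b in name.encode("utf-8"):
        if 0x21 <= b <= 0x7E and b not in DELIMITERS and b != ord("#"):
            out.append(b)
        else:
            out += b"#%02X" % b
    return bytes(out)

def literal_string(data: bytes) -> bytes:
    # Escape the backslash first, then parentheses and carriage return.
    escaped = (data.replace(b"\\", b"\\\\").replace(b"(", b"\\(")
                   .replace(b")", b"\\)").replace(b"\r", b"\\r"))
    return b"(" + escaped + b")"

def to_pdf(obj) -> bytes:
    if obj is None:
        return b"null"
    if isinstance(obj, bool):          # test bool before int: bool subclasses int
        return b"true" if obj else b"false"
    if isinstance(obj, Name):          # names are a str subclass
        return name_bytes(obj)
    if isinstance(obj, int):
        return b"%d" % obj
    if isinstance(obj, float):         # PDF reals have no exponent; 6 decimals here
        return f"{obj:.6f}".rstrip("0").encode("ascii")
    if isinstance(obj, bytes):
        return literal_string(obj)
    if isinstance(obj, (list, tuple)):
        return b"[" + b" ".join(to_pdf(x) for x in obj) + b"]"
    if isinstance(obj, dict):          # dictionary keys are written as names
        pairs = [name_bytes(k) + b" " + to_pdf(v) for k, v in obj.items()]
        return b" ".join([b"<<", *pairs, b">>"])
    raise TypeError(f"no PDF mapping for {type(obj).__name__}")

def stream_object(data: bytes, **entries) -> bytes:
    # /Length counts only the data bytes between "stream\n" and "\nendstream".
    header = to_pdf({**entries, "Length": len(data)})
    return header + b"\nstream\n" + data + b"\nendstream"
```

For example, `to_pdf({"Type": Name("Page"), "MediaBox": [0, 0, 612, 792]})` yields `<< /Type /Page /MediaBox [0 0 612 792] >>`, and `stream_object(b"BT /F1 24 Tf 72 700 Td (Hello) Tj ET")` wraps a one-line content stream in a stream whose dictionary is `<< /Length 36 >>`. Plain `str` values are rejected on purpose, so that names and strings cannot be confused.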

In addition, this chapter describes some data structures, built from basic objects, that are so widely used that they can almost be considered basic object types in their own right. These objects are covered in Sections 3.8, “Common Data Structures”; 3.9, “Functions”; and 3.10, “File Specifications.”

PDF’s object and file syntax is also used as the basis for other file formats. These include the Forms Data Format (FDF), described in Section 8.6.6, “Forms Data Format,” and the Portable Job Ticket Format (PJTF), described in Adobe Technical Note #5620, Portable Job Ticket Format.

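3.7 Content Streams and Resources: PDF content streams and their associated resources. The chapter also introduces some data structures, built from basic objects, that are so widely used they can be treated almost as basic types; these are covered in 3.8 Common Data Structures, 3.9 Functions, and 3.10 File Specifications. PDF’s object and file syntax also serves as the foundation for other file formats, including the Forms Data Format (FDF), described in 8.6.6 Forms Data Format, and the Portable Job Ticket Format (PJTF), described in Adobe Technical Note #5620, Portable Job Ticket Format.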
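The file-structure layer from the overview (Section 3.4) can be made concrete with a small Python sketch that assembles a one-page PDF: a header, numbered indirect objects, a cross-reference table recording each object’s byte offset, and a trailer. The function name `build_minimal_pdf` is hypothetical, and the page, font, and text-operator details belong to later chapters; treat it as a teaching sketch, not a production writer.

```python
# Minimal sketch of PDF file structure: header, body, xref table, trailer.
# build_minimal_pdf is a hypothetical teaching helper, not a library API.

def build_minimal_pdf(text: str = "Hello") -> bytes:
    # Escape characters that are special inside a literal string.
    safe = (text.encode("ascii").replace(b"\\", b"\\\\")
            .replace(b"(", b"\\(").replace(b")", b"\\)"))
    content = b"BT /F1 24 Tf 72 700 Td (" + safe + b") Tj ET"
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                        # object 1
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",                # object 2
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "     # object 3
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content)                 # object 4
        + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",   # object 5
    ]
    out = bytearray(b"%PDF-1.4\n")            # header
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))              # byte offset where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"          # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off      # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result with `open("hello.pdf", "wb").write(build_minimal_pdf())` should produce a file that common viewers open, since Helvetica is one of the standard Type 1 fonts. The point for Section 3.4 is the cross-reference table: a reader follows `startxref` to the table and can jump straight to any object by its byte offset instead of scanning the whole file.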

Section 1 Lexical Conventions

At the most fundamental level, a PDF file is a sequence of 8-bit bytes. These bytes can be grouped into tokens according to the syntax rules described below. One or more tokens are assembled to form higher-level syntactic entities, principally objects, which are the basic data values from which a PDF document is constructed.

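At the lowest level, a PDF file is a string of 8-bit bytes. The syntax rules below group these bytes into tokens, and one or more tokens combine into higher-level syntactic entities, chiefly objects, which are the basic data from which a PDF document is built.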
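The byte-to-token step can be sketched in Python. The white-space set (NUL, tab, line feed, form feed, carriage return, space) and the delimiter set (`( ) < > [ ] { } / %`) follow the PDF Reference’s white-space and delimiter tables in Section 3.1; the function name `tokenize` is hypothetical, and the sketch deliberately leaves out literal strings, hexadecimal strings, and stream data.

```python
# Simplified PDF tokenizer: splits raw bytes into tokens on white space and delimiters.
# tokenize() is a hypothetical teaching helper; strings and streams are not handled.

WHITESPACE = b"\x00\t\n\x0c\r "   # NUL, HT, LF, FF, CR, SP
DELIMITERS = b"()<>[]{}/%"

def _regular_run(data: bytes, j: int) -> int:
    # Advance past regular characters (neither white space nor a delimiter).
    while j < len(data) and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
        j += 1
    return j

def tokenize(data: bytes) -> list[bytes]:
    tokens = []
    i = 0
    while i < len(data):
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):                      # comment: skip to end of line
            while i < len(data) and data[i] not in b"\r\n":
                i += 1
        elif data[i:i + 2] in (b"<<", b">>"):    # dictionary brackets are two bytes
            tokens.append(data[i:i + 2])
            i += 2
        elif c in b"()<>":                       # string syntax, outside this sketch
            raise ValueError(f"string at offset {i} is not handled in this sketch")
        elif c in b"[]{}":                       # single-byte delimiters
            tokens.append(data[i:i + 1])
            i += 1
        elif c == ord("/"):                      # name: slash plus regular characters
            j = _regular_run(data, i + 1)
            tokens.append(data[i:j])
            i = j
        else:                                    # number or keyword such as true, null, obj
            j = _regular_run(data, i)
            tokens.append(data[i:j])
            i = j
    return tokens
```

For instance, `tokenize(b"/Type/Page")` returns `[b"/Type", b"/Page"]`: no white space is needed between the two names because the slash that starts the second name is itself a delimiter.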

PDF can be entirely represented using byte values corresponding to the visible printable subset of the ASCII character set, plus white space characters such as space, tab, carriage return, and line feed characters. ASCII is the American Standard Code for Information Interchange, a widely used convention for encoding a specific set of 128 characters as binary numbers. However, a PDF file is not restricted to the ASCII character set; it can contain arbitrary 8-bit bytes, subject to the following considerations:

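A PDF file can be written entirely with bytes from the visible printable part of ASCII plus white-space characters such as space, tab, carriage return, and line feed. ASCII (the American Standard Code for Information Interchange) is a widely used encoding of 128 characters as binary numbers. However, a PDF file is not limited to ASCII: subject to the considerations below, it may contain arbitrary 8-bit bytes.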
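Whether a file stays within the ASCII subset described above can be checked mechanically. This Python sketch (hypothetical helper names `is_ascii_text` and `to_hex_string`) tests whether data uses only visible ASCII plus the four white-space characters named above, and shows one way to carry arbitrary binary data in pure ASCII: a PDF hexadecimal string `<...>` (described in Section 3.2), at the cost of two characters per byte.

```python
# Check whether bytes stay within visible ASCII plus space, tab, CR, LF,
# and convert arbitrary binary data into an ASCII-only PDF hexadecimal string.

VISIBLE = range(0x21, 0x7F)             # '!' through '~'
WHITE = frozenset(b" \t\r\n")           # space, tab, carriage return, line feed

def is_ascii_text(data: bytes) -> bool:
    """True if every byte is visible ASCII or one of the listed white-space bytes."""
    return all(b in VISIBLE or b in WHITE for b in data)

def to_hex_string(data: bytes) -> bytes:
    """Encode arbitrary bytes as a PDF hexadecimal string such as <00FF>."""
    return b"<" + data.hex().upper().encode("ascii") + b">"

binary = bytes([0x00, 0xFF, 0x80, 0x41])        # sample bytes, e.g. from image data
print(is_ascii_text(binary))                     # False
print(to_hex_string(binary))                     # b'<00FF8041>'
```

The hex form is ASCII-safe but doubles the data size, which is why, as the text goes on to note, naturally binary data such as sampled images is usually stored directly in binary.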

•The tokens that delimit objects and that describe the structure of a PDF file are all written in the ASCII character set, as are all the reserved words and the names used as keys in standard dictionaries.

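All tokens that delimit objects or describe the structure of a PDF file, all reserved words, and all names used as keys in standard dictionaries are written in ASCII.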

•The data values of certain types of objects (strings and streams) can be, but need not be, written entirely in ASCII. For the purpose of exposition (as in this book), ASCII representation is preferred. However, in actual practice, data that is naturally binary, such as sampled images, is represented directly in binary for compactness and efficiency.

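The data values of certain object types, such as strings and streams, may be written entirely in ASCII but need not be. For exposition (as in this book), ASCII is the preferred representation. However, in actual practice, for compactness and efficiency, data such as a sampled image needs to be represented directly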