
PDF/A is a subset of the PDF format that contains a limited set of presentation options. This format is an ISO standard and is intended for long-term storage of electronic documents. Long-term usability is achieved by embedding in the document itself all the information necessary to display it. In particular, the fonts used in the document are included in it. This affects its size: a document in PDF/A format is no smaller, and probably even larger, than a document with similar content saved as ordinary PDF.

Use of PDF/A

It is believed that a document stored in PDF/A format, because it does not depend on volatile things such as hyperlinks and multimedia content, can be opened in any operating system after any length of time using an application that supports the format. Since PDF/A has the status of an international standard, long-term support from software developers can reasonably be expected, and its use is advisable compared with other available storage formats that can change at any time. Recall the recent abandonment of support for old-style Word documents (up to version 2003) in Google Drive, when it began killing off the Word 97-2003 format.

Integrity and Immutability

At the same time, the integrity and immutability of an unsigned document in PDF/A format cannot be guaranteed and is not claimed as a feature of the format. In other words, although this format is positioned as providing long-term storage, changing the contents of a document is possible and is not a deviation from the norm if it is not encrypted. There is one more nuance: for any specific document whose format is declared as PDF/A, it cannot be asserted that this is actually so. Each document must be verified for compliance with the format requirements, and if this is not done when the document is placed in the archive or after its next change, the goal of long-term storage is potentially compromised (with some reservations, but still).
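The gap between a declared format and an actual one can be illustrated with a minimal sketch. PDF/A files normally announce their conformance level in the XMP metadata through the `pdfaid:part` and `pdfaid:conformance` fields. The helper below (the name `declared_pdfa_level` is hypothetical) only reads that declaration from the raw bytes, assuming the XMP packet is stored uncompressed. It does not prove compliance; actual verification requires a dedicated validator such as veraPDF.

```python
import re
from typing import Optional

# Match both the element form (<pdfaid:part>1</pdfaid:part>) and the
# attribute form (pdfaid:part="1") of the XMP identification fields.
_PART = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def declared_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level the file *claims*, e.g. 'PDF/A-1b', or None.

    This is only the declaration found in the metadata, not a validation.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A declaration found in the raw bytes
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level


def declared_pdfa_level_of_file(path: str) -> Optional[str]:
    """Convenience wrapper: read a file from disk and inspect its bytes."""
    with open(path, "rb") as f:
        return declared_pdfa_level(f.read())
```

A document for which this returns `None` makes no PDF/A claim at all. A document for which it returns a level still has to pass a full validator before being accepted into the archive.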


Based on the differences described above between the PDF format and its descendant PDF/A, it can be assumed that the former is more suitable for online exchange and short-term storage of electronic documents. A single PDF/A document is potentially large, because all the fonts it uses are embedded in it, and for short-term use this is excessive and tangible ballast. But since PDF/A has the status of an international standard, it ensures that even after a long time, regardless of environment and operating system, any user with a viewer application will be able to open a document in this format. This fact fits into the concept of an archive of electronic documents and should be taken into account when saving each document in it.


Now we need to define what a scanned image of a document is. In the vast majority of cases, this is a bitmap image. It is assumed that there is no text on top of it; that is, the document stores only a scanned raster image whose text is understandable to humans but not to the computer. In exceptional cases, a text layer may be placed on top of the bitmap, filled in partially or entirely, either manually by a person or by a text recognition system. It can also be assumed that the document carries metadata related to the type of document and its contents. For example, if it is an invoice, the metadata may contain the supplier, date of issue, amount, and so on.
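The description above can be sketched as a small data structure: a raster image that is always present, an optional text layer, and free-form metadata. The class and field names here are hypothetical and chosen only for illustration, and the invoice values are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScannedDocument:
    """A scanned document: a bitmap plus an optional text layer and metadata."""

    raster: bytes                      # the scanned bitmap, always present
    text_layer: Optional[str] = None   # typed in manually or produced by OCR
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def is_searchable(self) -> bool:
        # The text is usable by the computer only if a non-empty layer exists.
        return bool(self.text_layer and self.text_layer.strip())


# Typical case: raster only, the text is readable by humans but not by software.
plain_scan = ScannedDocument(raster=b"...bitmap bytes...")

# Exceptional case: an invoice with a recognized text layer and metadata.
invoice = ScannedDocument(
    raster=b"...bitmap bytes...",
    text_layer="Invoice No. 17, total 120.00",
    metadata={"type": "invoice", "supplier": "Example LLC", "amount": "120.00"},
)
```

Keeping the text layer optional reflects the assumption in the text that most scans arrive without one.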