The paper develops Editor, a language for manipulating semistructured documents, such as those typically available on the Web. Editor programs are based on two simple ideas, taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” instructions to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in Editor. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java and is currently used in the project as a basis for a wrapper-generation toolkit.
Cut and Paste
MECCA, Giansalvatore;
1999-01-01
Abstract
The paper develops Editor, a language for manipulating semistructured documents, such as those typically available on the Web. Editor programs are based on two simple ideas, taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” instructions to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in Editor. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java and is currently used in the project as a basis for a wrapper-generation toolkit.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.