libxml2 cheat sheet
Contents of the page http://hamburgsteak.sandwich.net/writ/libxml2.txt
Developing XML-enabled C programs with libxml2
A beginner's guide
By David Turover
This document is in the public domain.
For brevity's sake, the code in this document contains no error checking.
In real life, you will want to check for NULL pointers and function returns.
Introduction
libxml2 is a library of functions for handling XML data.
A simple example:
#include <stdio.h>
#include <libxml/parser.h>   /* declares xmlParseFile() */
#include <libxml/tree.h>

int main(void){
    xmlDocPtr doc;
    xmlNodePtr nodeLevel1;
    xmlNodePtr nodeLevel2;

    doc = xmlParseFile("xmlfile.xml");

    /* Walk the top-level nodes of the document */
    for( nodeLevel1 = doc->children;
         nodeLevel1 != NULL;
         nodeLevel1 = nodeLevel1->next)
    {
        /* name is an xmlChar*, so cast it for printf */
        printf("%s\n", (const char *)nodeLevel1->name);

        /* Walk the children of each top-level node */
        for( nodeLevel2 = nodeLevel1->children;
             nodeLevel2 != NULL;
             nodeLevel2 = nodeLevel2->next)
        {
            printf("\t%s\n", (const char *)nodeLevel2->name);
        }
    }

    xmlSaveFile("xmlfile_copy.xml", doc);
    xmlFreeDoc(doc);
    return 0;
}
The above code, compiled against libxml2 (for example with the flags
printed by xml2-config --cflags --libs, which include -lxml2), should print
the names of the nodes in the top two levels of an XML file, and save a copy
of the file.
Explanation of Introduction
The xmlDocPtr is a pointer to an xmlDoc structure.
It represents an XML data source.
You load an XML file with the xmlParseFile() function, which takes
as a parameter the name of an XML file and returns a pointer to a
new xmlDoc structure (or NULL on failure). When done, you release
this memory with the xmlFreeDoc() function. You can export an xmlDocPtr's
data as an XML file with the xmlSaveFile() function.
The xmlNodePtr points to a single element or node of an XML document.
Each xmlNode has a .children member which is an xmlNodePtr to the first
of this node's children. Each xmlNode has a .name member which is a
string containing the name of the element it represents, or the word "text"
for a text node.
The xmlNodePtr is the basic structure used to traverse an XML document
with libxml2. It contains several xmlNodePtrs which can be used to move
around the document. If there is no other node in a particular direction,
the pointer is NULL.
xmlNodePtr->children    The first child of the node
xmlNodePtr->last        The node's last child
xmlNodePtr->parent      The current node's parent node
xmlNodePtr->next        The next sibling node
xmlNodePtr->prev        The previous sibling node
xmlNodePtr->doc         The xmlDocPtr for the document containing this node
<node>
    <node>                      ->parent
        <node>                  ->prev
            <node>
            </node>
            <node>
            </node>
        </node>
        <node>                  You Are Here
            <node>              ->children
            </node>
            <node>
            </node>
            <node>              ->last
            </node>
        </node>
        <node>                  ->next
            <node>
            </node>
        </node>
    </node>
</node>
Checking for text nodes
You can easily check to see what type of xmlNode you have by looking
at the xmlNodePtr->type member, which holds one of the following values
of the xmlElementType enumeration:
XML_ELEMENT_NODE
XML_ATTRIBUTE_NODE
XML_TEXT_NODE
XML_CDATA_SECTION_NODE
XML_ENTITY_REF_NODE
XML_ENTITY_NODE
XML_PI_NODE
XML_COMMENT_NODE
XML_DOCUMENT_NODE
XML_DOCUMENT_TYPE_NODE
XML_DOCUMENT_FRAG_NODE
XML_NOTATION_NODE
XML_HTML_DOCUMENT_NODE
XML_DTD_NODE
XML_ELEMENT_DECL
XML_ATTRIBUTE_DECL
XML_ENTITY_DECL
XML_NAMESPACE_DECL
XML_XINCLUDE_START
XML_XINCLUDE_END
XML_DOCB_DOCUMENT_NODE
The only ones you need to care about right now are XML_TEXT_NODE
and XML_ELEMENT_NODE.
Handling a Node
An XML node generally looks like this:
<this_is_a_node attribute1="abcdefg" attribute2="12345">
<this_is_a_child_node>Hello World</this_is_a_child_node>
</this_is_a_node>
The things you can manipulate are the node itself, the node's attributes,
and the node's contents.
Attributes
Working with attributes of a node is fairly straightforward: You use the
xmlGetProp() function to get an attribute's value and the xmlSetProp()
function to change an attribute's value. If you want to know if an
attribute exists, you use the xmlHasProp() function. If you want to
completely remove an attribute, use xmlUnsetProp().
xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);
xmlGetProp(xmlNodePtr node, const xmlChar *name);
xmlHasProp(xmlNodePtr node, const xmlChar *name);
xmlUnsetProp(xmlNodePtr node, const xmlChar *name);
xmlGetProp returns a string that must be freed with the xmlFree() function
when you are done with it, or else your program will have a memory leak.
Content
Working with content is less intuitive. The content of a node is not simply
what a node contains, but is the text of a node and its children with the
element tags stripped out. Thus the content of <this_is_a_node> from the
above example would be "Hello World", with the child element
<this_is_a_child_node> nowhere to be seen. If you try adding element tags
to a node's content, libxml2 will escape their < and > characters as
&lt; and &gt; when the document is written out.
To work with content, then, you use the xmlNodeSetContent() and
xmlNodeGetContent() functions to set or retrieve a node's content,
or the xmlNodeAddContent() function to append to a node's content.
xmlNodeSetContent(xmlNodePtr node, const xmlChar *content);
xmlNodeAddContent(xmlNodePtr node, const xmlChar *content);
xmlNodeGetContent(xmlNodePtr node);
As with xmlGetProp(), you must use xmlFree() on the result
of xmlNodeGetContent() or else you will have a memory leak.
To print everything an element contains, including the tags of its child
elements and not simply its content, use xmlElemDump():
xmlElemDump(FILE * output, xmlDocPtr doc, xmlNodePtr node);
Strings: xmlChar* versus char*
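xmlChar* is the string type used by libxml2. An xmlChar is an unsigned char,
and libxml2 strings hold UTF-8 encoded text.
You can easily cast between char* and xmlChar*; the BAD_CAST macro is
shorthand for a cast to xmlChar*.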
Creating a New Node
To create a node from scratch and add it to a document:
xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "name");
xmlNodePtr nodeParent = doc->children;
xmlAddChild(nodeParent, node);
The xmlNewNode() function allocates memory for a new node. When you are done,
you must free the node with xmlFreeNode() unless the node has been added to
another structure (as it has here) which will be freed. The NULL
in xmlNewNode() is where an xmlNsPtr namespace pointer would be if the node
was going to be assigned to a particular namespace; we are not using namespaces
right now, so it is left as NULL. BAD_CAST is libxml2's macro for casting
a string to xmlChar*.
To add the node to the document, use a function such as xmlAddChild(),
xmlAddSibling(), xmlAddNextSibling(), or xmlAddPrevSibling(). These functions
also set the new node's doc pointer to the document of the node it is
attached to, so no separate step is needed for that.
The xmlDocCopyNode() function is not needed here. It returns a new copy of
a node, with the copy's doc pointer set to the target document, but it does
not insert the copy into the document tree, and the original node is still
yours to free. It is useful for copying a node from one document into
another; you then attach the copy with one of the functions above.
Summary of xmlNode Members and Simple Interface Functions
type        Node type (usually XML_ELEMENT_NODE or XML_TEXT_NODE)
name        String containing element's name, or "text" if a text node
children    First child of node
last        Last child of node
parent      Parent node
next        Next sibling node
prev        Previous sibling node
doc         The document containing this node
xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);
xmlGetProp(xmlNodePtr node, const xmlChar *name);
xmlHasProp(xmlNodePtr node, const xmlChar *name);
xmlUnsetProp(xmlNodePtr node, const xmlChar *name);
xmlNodeSetContent(xmlNodePtr cur, const xmlChar *content);
xmlNodeAddContent(xmlNodePtr cur, const xmlChar *content);
xmlNodeGetContent(xmlNodePtr cur);
xmlElemDump(FILE * output, xmlDocPtr doc, xmlNodePtr node);
For more information, read the API docs at:
http://xmlsoft.org/html/libxml-tree.html