read_xml {xml2} | R Documentation |
Read HTML or XML.
read_xml(x, encoding = "", ..., as_html = FALSE, options = "NOBLANKS")

read_html(x, encoding = "", ..., options = c("RECOVER", "NOERROR", "NOBLANKS"))

## S3 method for class 'character'
read_xml(x, encoding = "", ..., as_html = FALSE, options = "NOBLANKS")

## S3 method for class 'raw'
read_xml(x, encoding = "", base_url = "", ..., as_html = FALSE,
  options = "NOBLANKS")

## S3 method for class 'connection'
read_xml(x, encoding = "", n = 64 * 1024, verbose = FALSE, ...,
  base_url = "", as_html = FALSE, options = "NOBLANKS")
x |
A string, a connection, or a raw vector. A string can be either a path, a URL, or literal XML. URLs are converted into connections. If x is a connection, the complete connection is read into a raw vector before being parsed. |
encoding |
Specify a default encoding for the document. Unless otherwise specified XML documents are assumed to be in UTF-8 or UTF-16. If the document is not UTF-8/16, and lacks an explicit encoding directive, this allows you to supply a default. |
... |
Additional arguments passed on to methods. |
as_html |
Optionally parse an xml file as if it's html. |
options |
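Set parsing options for the libxml2 parser. Zero or more option names, for example "RECOVER", "NOERROR", or "NOBLANKS" (the defaults shown in the usage above).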
|
base_url |
When loading from a connection, raw vector or literal html/xml, this allows you to specify a base url for the document. Base urls are used to turn relative urls into absolute urls. |
n |
If x is a connection, the number of bytes to read per iteration. Defaults to 64 * 1024 bytes. |
verbose |
When reading from a slow connection, this prints some output on every iteration so you know it's working. |
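The encoding and base_url arguments described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: the byte vector and the http://example.com/docs/ URL are made up for the purpose, and the encoding name "ISO-8859-1" is assumed to be available to the underlying libxml2 build.

```r
library(xml2)

# Latin-1 bytes for "<p>caf\u00e9</p>" with no XML declaration, so the
# document carries no encoding directive of its own (illustrative input).
latin1 <- as.raw(c(0x3c, 0x70, 0x3e, 0x63, 0x61, 0x66, 0xe9,
                   0x3c, 0x2f, 0x70, 0x3e))

# Supply the default encoding explicitly, since the bytes are not UTF-8.
doc <- read_xml(latin1, encoding = "ISO-8859-1")
txt <- xml_text(doc)

# Literal HTML has no location of its own; base_url gives it one
# (hypothetical URL), which can then be used to resolve relative links.
page <- read_html(
  '<html><body><a href="about.html">About</a></body></html>',
  base_url = "http://example.com/docs/"
)
href <- xml_attr(xml_find_first(page, "//a"), "href")
abs_link <- url_absolute(href, xml_url(page))
```

Here url_absolute() resolves the relative href against the document URL reported by xml_url(), which is where the supplied base_url ends up.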
An XML document. HTML is normalised to valid XML - this may not be exactly the same transformation performed by the browser, but it's a reasonable approximation.
# Literal xml/html is useful for small examples
read_xml("<foo><bar /></foo>")
read_html("<html><title>Hi<title></html>")
read_html("<html><title>Hi")

# From a local path
read_html(system.file("extdata", "r-project.html", package = "xml2"))

# From a url
cd <- read_xml(xml2_example("cd_catalog.xml"))
me <- read_html("http://had.co.nz")
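
Reading from a connection, as covered by the connection method in the usage, might look like the sketch below. It reuses the cd_catalog.xml file bundled with xml2; the chunk size of 16 * 1024 is an arbitrary value chosen for illustration, not a recommendation.

```r
library(xml2)

# Path to an example file shipped with the package
path <- xml2_example("cd_catalog.xml")

# Pass an unopened file connection; the whole connection is read into a
# raw vector, n bytes per iteration, before parsing.
con <- file(path)
doc <- read_xml(con, n = 16 * 1024)

# Inspect the parsed document
root_name <- xml_name(xml_root(doc))
```

A smaller n only changes how many bytes are pulled per iteration; the parsed document should be the same as when reading the path directly.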