A more efficient way to affect all string nodes is to replace the Text node prototype in the PrototypicalNodeFactory with a custom TextNode that performs the required operation.
For example, if you were using:

    StringNodeFactory factory = new StringNodeFactory();
    factory.setDecode(true);

to decode all text issued from Text.toPlainTextString(), you would instead create a subclass of TextNode and set it as the prototype for text node generation:

    PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
    factory.setTextPrototype(new TextNode() {
        public String toPlainTextString() {
            return (org.htmlparser.util.Translate.decode(super.toPlainTextString()));
        }
    });

Similar constructs apply to removing escapes and converting non-breaking spaces, which were the examples previously provided.
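As a minimal sketch of what those two operations might look like, the helper below works on plain strings so it can be tried without the library. The class name TextCleanup and both method names are hypothetical and not part of HTML Parser. It assumes that "escape characters" means carriage return, line feed and tab, and that a non-breaking space appears in already-decoded text as the character U+00A0.

```java
// Hypothetical helper, not part of HTML Parser: plain-string versions of the
// two clean-up steps, written so they could be called from an overridden
// toPlainTextString() in a TextNode subclass.
final class TextCleanup {
    private TextCleanup() {}

    /** Removes carriage returns, line feeds and tabs (the assumed escape set). */
    static String removeEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\r' && c != '\n' && c != '\t') {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces each non-breaking space (U+00A0) with an ordinary space. */
    static String convertNonBreakingSpaces(String text) {
        return text.replace('\u00a0', ' ');
    }

    public static void main(String[] args) {
        String raw = "Hello,\n\tworld\u00a0again";
        // Apply both steps in sequence and print the cleaned text.
        System.out.println(convertNonBreakingSpaces(removeEscapes(raw)));
    }
}
```

In a TextNode subclass, the override would wrap super.toPlainTextString() in one of these calls, in the same position where the decoding example uses Translate.decode.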
Using a subclass avoids the wrapping and delegation inherent in the decorator pattern, with subsequent improvements in processing speed and memory usage.
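The difference between the two styles can be illustrated with a self-contained miniature. All types below (PlainText, BasicText, TrimmingDecorator, TrimmingText) are hypothetical stand-ins that only mimic the shape of the real classes, and trimming is used as a placeholder operation. This is a sketch of the general idea, not the library's implementation.

```java
// Hypothetical miniature types; they only mimic the shape of the real classes.
interface PlainText {
    String toPlainTextString();
}

class BasicText implements PlainText {
    private final String raw;

    BasicText(String raw) {
        this.raw = raw;
    }

    public String toPlainTextString() {
        return raw;
    }
}

// Decorator style: a second object wraps the node and delegates to it.
class TrimmingDecorator implements PlainText {
    private final PlainText delegate;

    TrimmingDecorator(PlainText delegate) {
        this.delegate = delegate;
    }

    public String toPlainTextString() {
        return delegate.toPlainTextString().trim();
    }
}

// Subclass style: a single object applies the operation in its override.
class TrimmingText extends BasicText {
    TrimmingText(String raw) {
        super(raw);
    }

    @Override
    public String toPlainTextString() {
        return super.toPlainTextString().trim();
    }
}

class DecoratorVersusSubclass {
    public static void main(String[] args) {
        PlainText decorated = new TrimmingDecorator(new BasicText("  text  ")); // two objects per node
        PlainText subclassed = new TrimmingText("  text  ");                    // one object per node
        // Both styles produce the same text; only the object structure differs.
        System.out.println(decorated.toPlainTextString().equals(subclassed.toPlainTextString()));
    }
}
```

Both variants return the same string, but the decorator allocates an extra wrapper and adds one level of delegation for every node, which is the overhead the subclass approach avoids.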
public class StringNodeFactory extends PrototypicalNodeFactory implements java.io.Serializable
Modifier and Type | Field and Description |
---|---|
protected boolean | mConvertNonBreakingSpaces Deprecated. Flag to tell the parser to convert non-breaking spaces (from &nbsp; to a space " "). |
protected boolean | mDecode Deprecated. Flag to tell the parser to decode strings returned by StringNode's toPlainTextString. |
protected boolean | mRemoveEscapes Deprecated. Flag to tell the parser to remove escape characters, like \n and \t, returned by StringNode's toPlainTextString. |
Fields inherited from class PrototypicalNodeFactory: mBlastocyst, mRemark, mTag, mText
Constructor and Description |
---|
StringNodeFactory() Deprecated. |
Modifier and Type | Method and Description |
---|---|
Text | createStringNode(Page page, int start, int end) Deprecated. Create a new string node. |
boolean | getConvertNonBreakingSpaces() Deprecated. Get the non-breaking space replacing state. |
boolean | getDecode() Deprecated. Get the decoding state. |
boolean | getRemoveEscapes() Deprecated. Get the escape removing state. |
void | setConvertNonBreakingSpaces(boolean convert) Deprecated. Set the non-breaking space replacing state. |
void | setDecode(boolean decode) Deprecated. Set the decoding state. |
void | setRemoveEscapes(boolean remove) Deprecated. Set the escape removing state. |
Methods inherited from class PrototypicalNodeFactory: clear, createRemarkNode, createTagNode, get, getRemarkPrototype, getTagNames, getTagPrototype, getTextPrototype, put, registerTag, registerTags, remove, setRemarkPrototype, setTagPrototype, setTextPrototype, unregisterTag
protected boolean mDecode
protected boolean mRemoveEscapes
protected boolean mConvertNonBreakingSpaces
public Text createStringNode(Page page, int start, int end)
Specified by: createStringNode in interface NodeFactory
Overrides: createStringNode in class PrototypicalNodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.

public void setDecode(boolean decode)
Parameters:
decode - If true, string nodes decode text using Translate.decode(java.lang.String).

public boolean getDecode()
Returns: true if string nodes decode text.

public void setRemoveEscapes(boolean remove)
Parameters:
remove - If true, string nodes remove escape characters.

public boolean getRemoveEscapes()

public void setConvertNonBreakingSpaces(boolean convert)
Parameters:
convert - If true, string nodes replace &nbsp; characters with spaces.

public boolean getConvertNonBreakingSpaces()
HTML Parser is an open source library released under LGPL.