
BeautifulSoup.BeautifulStoneSoup Class Reference
Inheritance diagram for BeautifulSoup.BeautifulStoneSoup (image not reproduced). Classes shown in the diagram: BeautifulSoup.Tag, BeautifulSoup.PageElement, BeautifulSoup.BeautifulSOAP, BeautifulSoup.BeautifulSoup, BeautifulSoup.RobustXMLParser, BeautifulSoup.SimplifyingSOAPParser, BeautifulSoup.ICantBelieveItsBeautifulSoup, BeautifulSoup.MinimalSoup, BeautifulSoup.RobustHTMLParser, BeautifulSoup.RobustWackAssHTMLParser, BeautifulSoup.RobustInsanelyWackAssHTMLParser.

Public Member Functions

def __getattr__ (self, methodName)
 
def __init__ (self, markup="", parseOnlyThese=None, fromEncoding=None, markupMassage=True, smartQuotesTo=XML_ENTITIES, convertEntities=None, selfClosingTags=None, isHTML=False)
 
def convert_charref (self, name)
 
def endData (self, containerClass=NavigableString)
 
def handle_charref (self, ref)
 
def handle_comment (self, text)
 
def handle_data (self, data)
 
def handle_decl (self, data)
 
def handle_entityref (self, ref)
 
def handle_pi (self, text)
 
def isSelfClosingTag (self, name)
 
def parse_declaration (self, i)
 
def popTag (self)
 
def pushTag (self, tag)
 
def reset (self)
 
def unknown_endtag (self, name)
 
def unknown_starttag (self, name, attrs, selfClosing=0)
 
- Public Member Functions inherited from BeautifulSoup.Tag
def __call__ (self, *args, **kwargs)
 
def __contains__ (self, x)
 
def __delitem__ (self, key)
 
def __eq__ (self, other)
 
def __getattr__ (self, tag)
 
def __getitem__ (self, key)
 
def __init__ (self, parser, name, attrs=None, parent=None, previous=None)
 
def __iter__ (self)
 
def __len__ (self)
 
def __ne__ (self, other)
 
def __nonzero__ (self)
 
def __repr__ (self, encoding=DEFAULT_OUTPUT_ENCODING)
 
def __setitem__ (self, key, value)
 
def __str__ (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)
 
def __unicode__ (self)
 
def childGenerator (self)
 
def clear (self)
 
def decompose (self)
 
def fetchText (self, text=None, recursive=True, limit=None)
 
def find (self, name=None, attrs={}, recursive=True, text=None, **kwargs)
 
def findAll (self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
 
def firstText (self, text=None, recursive=True)
 
def get (self, key, default=None)
 
def getString (self)
 
def getText (self, separator=u"")
 
def has_key (self, key)
 
def index (self, element)
 
def prettify (self, encoding=DEFAULT_OUTPUT_ENCODING)
 
def recursiveChildGenerator (self)
 
def renderContents (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)
 
def setString (self, string)
 
- Public Member Functions inherited from BeautifulSoup.PageElement
def append (self, tag)
 
def extract (self)
 
def findAllNext (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findAllPrevious (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findNext (self, name=None, attrs={}, text=None, **kwargs)
 
def findNextSibling (self, name=None, attrs={}, text=None, **kwargs)
 
def findNextSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findParent (self, name=None, attrs={}, **kwargs)
 
def findParents (self, name=None, attrs={}, limit=None, **kwargs)
 
def findPrevious (self, name=None, attrs={}, text=None, **kwargs)
 
def findPreviousSibling (self, name=None, attrs={}, text=None, **kwargs)
 
def findPreviousSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def insert (self, position, newChild)
 
def nextGenerator (self)
 
def nextSiblingGenerator (self)
 
def parentGenerator (self)
 
def previousGenerator (self)
 
def previousSiblingGenerator (self)
 
def replaceWith (self, replaceWith)
 
def replaceWithChildren (self)
 
def setup (self, parent=None, previous=None)
 
def substituteEncoding (self, str, encoding=None)
 
def toEncoding (self, s, encoding=None)
 

Public Attributes

 convertEntities
 
 convertHTMLEntities
 
 convertXMLEntities
 
 currentData
 
 currentTag
 
 declaredHTMLEncoding
 
 escapeUnrecognizedEntities
 
 fromEncoding
 
 hidden
 
 instanceSelfClosingTags
 
 literal
 
 markup
 
 markupMassage
 
 originalEncoding
 
 parseOnlyThese
 
 previous
 
 quoteStack
 
 smartQuotesTo
 
 tagStack
 
- Public Attributes inherited from BeautifulSoup.Tag
 attrMap
 
 attrs
 
 containsSubstitutions
 
 contents
 
 convertHTMLEntities
 
 convertXMLEntities
 
 escapeUnrecognizedEntities
 
 hidden
 
 isSelfClosing
 
 name
 
 parserClass
 
- Public Attributes inherited from BeautifulSoup.PageElement
 next
 
 nextSibling
 
 parent
 
 previous
 
 previousSibling
 

Private Member Functions

def _feed (self, inDocumentEncoding=None, isHTML=False)
 
def _popToTag (self, name, inclusivePop=True)
 
def _smartPop (self, name)
 
def _toStringSubclass (self, text, subclass)
 

Additional Inherited Members

- Properties inherited from BeautifulSoup.Tag
 string = property(getString, setString)
 
 text = property(getText)
 

Detailed Description

This class contains the basic parser and search code. It defines
a parser that knows nothing about tag behavior except for the
following:

  You can't close a tag without closing all the tags it encloses.
  That is, "<foo><bar></foo>" actually means
  "<foo><bar></bar></foo>".

[Another possible explanation is "<foo><bar /></foo>", but since
this class defines no SELF_CLOSING_TAGS, it will never use that
explanation.]

This class is useful for parsing XML or made-up markup languages,
or when BeautifulSoup makes an assumption counter to what you were
expecting.
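
The nesting rule above can be illustrated with a small stack. This is only a sketch in Python 3, not the library's code: `close_implicitly` and the `("start", name)` / `("end", name)` event tuples are hypothetical names chosen for the illustration.

```python
def close_implicitly(events):
    """Return the events with every implied end tag made explicit.

    `events` is a list of ("start", name) or ("end", name) tuples.
    """
    stack = []   # names of the tags that are currently open
    out = []
    for kind, name in events:
        if kind == "start":
            stack.append(name)
            out.append((kind, name))
        elif name in stack:
            # Closing `name` first closes every tag opened inside it.
            while stack:
                top = stack.pop()
                out.append(("end", top))
                if top == name:
                    break
        # An end tag that matches no open tag is ignored in this sketch.
    # At end of input, close whatever is still open.
    while stack:
        out.append(("end", stack.pop()))
    return out


# "<foo><bar></foo>" is treated as "<foo><bar></bar></foo>".
result = close_implicitly([("start", "foo"), ("start", "bar"), ("end", "foo")])
print(result)
```

Because the sketch has no notion of self-closing tags, it never produces the alternative reading "<foo><bar /></foo>", which matches the behavior described for this class.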

Definition at line 1039 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.BeautifulStoneSoup.__init__ (   self,
  markup = "",
  parseOnlyThese = None,
  fromEncoding = None,
  markupMassage = True,
  smartQuotesTo = XML_ENTITIES,
  convertEntities = None,
  selfClosingTags = None,
  isHTML = False 
)
The Soup object is initialized as the 'root tag', and the
provided markup (which can be a string or a file-like object)
is fed into the underlying parser.

sgmllib will process most bad HTML, and the BeautifulSoup
class has some tricks for dealing with some HTML that kills
sgmllib, but Beautiful Soup can nonetheless choke or lose data
if your data uses self-closing tags or declarations
incorrectly.

By default, Beautiful Soup uses regexes to sanitize input,
avoiding the vast majority of these problems. If the problems
don't apply to you, pass in False for markupMassage, and
you'll get better performance.

The default parser massage techniques fix the two most common
instances of invalid HTML that choke sgmllib:

 <br/> (No space between name of closing tag and tag close)
 <! --Comment--> (Extraneous whitespace in declaration)

You can pass in a custom list of (RE object, replace method)
tuples to get Beautiful Soup to scrub your input the way you
want.
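
As a rough illustration of the massage step, the sketch below applies a list of (RE object, replace method) tuples in order, the same shape that markupMassage accepts. The two patterns are assumptions written to match the two fixes described above; they are not guaranteed to be identical to the library's defaults, and `apply_massage` is a hypothetical helper (Python 3).

```python
import re

# Assumed patterns mirroring the two documented fixes.
massage = [
    # "<br/>" -> "<br />": insert a space before the closing "/>".
    (re.compile(r"(<[^<>]*)/>"), lambda m: m.group(1) + " />"),
    # "<! --Comment-->" -> "<!--Comment-->": drop whitespace after "<!".
    (re.compile(r"<!\s+([^<>]*)>"), lambda m: "<!" + m.group(1) + ">"),
]


def apply_massage(markup, pairs):
    """Run every (pattern, replacement) pair over the markup, in order."""
    for pattern, replacement in pairs:
        markup = pattern.sub(replacement, markup)
    return markup


cleaned = apply_massage("<br/><! --Comment-->", massage)
print(cleaned)
```

A list of this shape would be passed as the markupMassage argument; passing False skips the step entirely.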

Definition at line 1085 of file BeautifulSoup.py.

    def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
                 markupMassage=True, smartQuotesTo=XML_ENTITIES,
                 convertEntities=None, selfClosingTags=None, isHTML=False):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser.

        sgmllib will process most bad HTML, and the BeautifulSoup
        class has some tricks for dealing with some HTML that kills
        sgmllib, but Beautiful Soup can nonetheless choke or lose data
        if your data uses self-closing tags or declarations
        incorrectly.

        By default, Beautiful Soup uses regexes to sanitize input,
        avoiding the vast majority of these problems. If the problems
        don't apply to you, pass in False for markupMassage, and
        you'll get better performance.

        The default parser massage techniques fix the two most common
        instances of invalid HTML that choke sgmllib:

         <br/> (No space between name of closing tag and tag close)
         <! --Comment--> (Extraneous whitespace in declaration)

        You can pass in a custom list of (RE object, replace method)
        tuples to get Beautiful Soup to scrub your input the way you
        want."""

        self.parseOnlyThese = parseOnlyThese
        self.fromEncoding = fromEncoding
        self.smartQuotesTo = smartQuotesTo
        self.convertEntities = convertEntities
        # Set the rules for how we'll deal with the entities we
        # encounter
        if self.convertEntities:
            # It doesn't make sense to convert encoded characters to
            # entities even while you're converting entities to Unicode.
            # Just convert it all to Unicode.
            self.smartQuotesTo = None
            if convertEntities == self.HTML_ENTITIES:
                self.convertXMLEntities = False
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = True
            elif convertEntities == self.XHTML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = False
            elif convertEntities == self.XML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = False
                self.escapeUnrecognizedEntities = False
        else:
            self.convertXMLEntities = False
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False

        self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
        SGMLParser.__init__(self)

        if hasattr(markup, 'read'):        # It's a file-type object.
            markup = markup.read()
        self.markup = markup
        self.markupMassage = markupMassage
        try:
            self._feed(isHTML=isHTML)
        except StopParsing:
            pass
        self.markup = None                 # The markup can now be GCed


Member Function Documentation

def BeautifulSoup.BeautifulStoneSoup.__getattr__ (   self,
  methodName 
)
This method routes method call requests to either the SGMLParser
superclass or the Tag superclass, depending on the method name.

Definition at line 1195 of file BeautifulSoup.py.

Referenced by VarParsing.VarParsing.setType().

    def __getattr__(self, methodName):
        """This method routes method call requests to either the SGMLParser
        superclass or the Tag superclass, depending on the method name."""
        #print "__getattr__ called on %s.%s" % (self.__class__, methodName)

        if methodName.startswith('start_') or methodName.startswith('end_') \
               or methodName.startswith('do_'):
            return SGMLParser.__getattr__(self, methodName)
        elif not methodName.startswith('__'):
            return Tag.__getattr__(self, methodName)
        else:
            raise AttributeError

def BeautifulSoup.BeautifulStoneSoup._feed (   self,
  inDocumentEncoding = None,
  isHTML = False 
)
private

Definition at line 1162 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.markup.

    def _feed(self, inDocumentEncoding=None, isHTML=False):
        # Convert the document to Unicode.
        markup = self.markup
        if isinstance(markup, unicode):
            if not hasattr(self, 'originalEncoding'):
                self.originalEncoding = None
        else:
            dammit = UnicodeDammit\
                     (markup, [self.fromEncoding, inDocumentEncoding],
                      smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
            markup = dammit.unicode
            self.originalEncoding = dammit.originalEncoding
            self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
        if markup:
            if self.markupMassage:
                if not hasattr(self.markupMassage, "__iter__"):
                    self.markupMassage = self.MARKUP_MASSAGE
                for fix, m in self.markupMassage:
                    markup = fix.sub(m, markup)
                # TODO: We get rid of markupMassage so that the
                # soup object can be deepcopied later on. Some
                # Python installations can't copy regexes. If anyone
                # was relying on the existence of markupMassage, this
                # might cause problems.
                del(self.markupMassage)
        self.reset()

        SGMLParser.feed(self, markup)
        # Close out any unfinished strings and close all the open tags.
        self.endData()
        while self.currentTag.name != self.ROOT_TAG_NAME:
            self.popTag()

def BeautifulSoup.BeautifulStoneSoup._popToTag (   self,
  name,
  inclusivePop = True 
)
private
Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
the given tag.
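
The inclusive and non-inclusive cases can be sketched on a plain list of tag names. This is a simplified illustration in Python 3, not the method itself: `pop_to_tag` is a hypothetical helper, the stack holds strings rather than Tag objects, and index 0 is assumed to hold the root entry, which is never popped.

```python
def pop_to_tag(stack, name, inclusive=True):
    """Pop `stack` in place back to the most recent entry equal to `name`.

    stack[0] stands for the root and is never searched or popped.
    """
    num_pops = 0
    # Search from the top of the stack down to index 1.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive:
        num_pops -= 1          # leave the matching entry on the stack
    for _ in range(num_pops):  # a non-positive count pops nothing
        stack.pop()


inclusive_stack = ["[document]", "a", "b", "c"]
pop_to_tag(inclusive_stack, "b")
print(inclusive_stack)

exclusive_stack = ["[document]", "a", "b", "c"]
pop_to_tag(exclusive_stack, "b", inclusive=False)
print(exclusive_stack)
```

If the name is not on the stack, nothing is popped in either mode.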

Definition at line 1262 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.popTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._smartPop(), and BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

    def _popToTag(self, name, inclusivePop=True):
        """Pops the tag stack up to and including the most recent
        instance of the given tag. If inclusivePop is false, pops the tag
        stack up to but *not* including the most recent instance of
        the given tag."""
        #print "Popping to %s" % name
        if name == self.ROOT_TAG_NAME:
            return

        numPops = 0
        mostRecentTag = None
        for i in range(len(self.tagStack)-1, 0, -1):
            if name == self.tagStack[i].name:
                numPops = len(self.tagStack)-i
                break
        if not inclusivePop:
            numPops = numPops - 1

        for i in range(0, numPops):
            mostRecentTag = self.popTag()
        return mostRecentTag

def BeautifulSoup.BeautifulStoneSoup._smartPop (   self,
  name 
)
private
We need to pop up to the previous tag of this type, unless
one of this tag's nesting reset triggers comes between this
tag and the previous tag of this type, OR unless this tag is a
generic nesting trigger and another generic nesting trigger
comes between this tag and the previous tag of this type.

Examples:
 <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
 <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
 <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
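
The examples above can be reproduced with a reduced model of the rule. The sketch below is an assumption-laden illustration in Python 3: the two tables are small hypothetical subsets standing in for NESTABLE_TAGS and RESET_NESTING_TAGS, the stack holds tag names instead of Tag objects, and `smart_pop` is not the library's method.

```python
ROOT = "[document]"

# Hypothetical subsets: tag name -> tags that reset its nesting.
NESTABLE_TAGS = {"li": ["ul", "ol"], "ul": [], "table": [],
                 "tr": ["table"], "td": ["tr"]}
RESET_NESTING_TAGS = {"p", "li", "ul", "table", "tr", "td"}


def smart_pop(stack, name):
    """Pop `stack` in place as if a start tag `name` had just been seen."""
    triggers = NESTABLE_TAGS.get(name)
    is_nestable = triggers is not None
    is_reset_nesting = name in RESET_NESTING_TAGS
    pop_to, inclusive = None, True
    for i in range(len(stack) - 1, 0, -1):
        open_name = stack[i]
        if open_name == name and not is_nestable:
            # Non-nestable tags pop back to their last occurrence.
            pop_to = name
            break
        if (triggers is not None and open_name in triggers) or (
                triggers is None and is_reset_nesting
                and open_name in RESET_NESTING_TAGS):
            # A reset trigger stops the search; it stays on the stack.
            pop_to, inclusive = open_name, False
            break
    if pop_to is None:
        return
    last = len(stack) - 1 - stack[::-1].index(pop_to)
    del stack[last if inclusive else last + 1:]


s1 = [ROOT, "p", "b"]
smart_pop(s1, "p")      # <p>Foo<b>Bar *<p>*
s2 = [ROOT, "p", "table"]
smart_pop(s2, "p")      # <p>Foo<table>Bar *<p>*
s3 = [ROOT, "li", "ul", "li"]
smart_pop(s3, "li")     # <li><ul><li> *<li>*
print(s1, s2, s3)
```

In the first case the open 'p' and everything inside it are closed; in the second the 'table' shields the outer 'p', so nothing is popped; in the third only the inner 'li' is closed.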

Definition at line 1284 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def _smartPop(self, name):

        """We need to pop up to the previous tag of this type, unless
        one of this tag's nesting reset triggers comes between this
        tag and the previous tag of this type, OR unless this tag is a
        generic nesting trigger and another generic nesting trigger
        comes between this tag and the previous tag of this type.

        Examples:
         <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
         <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
         <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

         <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
         <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
         <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
        """

        nestingResetTriggers = self.NESTABLE_TAGS.get(name)
        isNestable = nestingResetTriggers != None
        isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
        popTo = None
        inclusive = True
        for i in range(len(self.tagStack)-1, 0, -1):
            p = self.tagStack[i]
            if (not p or p.name == name) and not isNestable:
                #Non-nestable tags get popped to the top or to their
                #last occurrence.
                popTo = name
                break
            if (nestingResetTriggers is not None
                and p.name in nestingResetTriggers) \
                or (nestingResetTriggers is None and isResetNesting
                    and self.RESET_NESTING_TAGS.has_key(p.name)):

                #If we encounter one of the nesting reset triggers
                #peculiar to this tag, or we encounter another tag
                #that causes nesting to reset, pop up to but not
                #including that tag.
                popTo = p.name
                inclusive = False
                break
            p = p.parent
        if popTo:
            self._popToTag(popTo, inclusive)

def BeautifulSoup.BeautifulStoneSoup._toStringSubclass (   self,
  text,
  subclass 
)
private
Adds a certain piece of text to the tree as a NavigableString
subclass.

Definition at line 1376 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.handle_data().

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_comment(), BeautifulSoup.BeautifulStoneSoup.handle_decl(), BeautifulSoup.BeautifulStoneSoup.handle_pi(), and BeautifulSoup.BeautifulStoneSoup.parse_declaration().

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.endData()
        self.handle_data(text)
        self.endData(subclass)

def BeautifulSoup.BeautifulStoneSoup.convert_charref (   self,
  name 
)
This method fixes a bug in Python's SGMLParser.

Definition at line 1152 of file BeautifulSoup.py.

References createfilelist.int.

    def convert_charref(self, name):
        """This method fixes a bug in Python's SGMLParser."""
        try:
            n = int(name)
        except ValueError:
            return
        if not 0 <= n <= 127 : # ASCII ends at 127, not 255
            return
        return self.convert_codepoint(n)

def BeautifulSoup.BeautifulStoneSoup.endData (   self,
  containerClass = NavigableString 
)

Definition at line 1239 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentData, BeautifulSoup.BeautifulStoneSoup.currentTag, reco::helper::VirtualJetProducerHelper.intersection(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.PageElement.previous, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def endData(self, containerClass=NavigableString):
        if self.currentData:
            currentData = u''.join(self.currentData)
            if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
                not set([tag.name for tag in self.tagStack]).intersection(
                    self.PRESERVE_WHITESPACE_TAGS)):
                if '\n' in currentData:
                    currentData = '\n'
                else:
                    currentData = ' '
            self.currentData = []
            if self.parseOnlyThese and len(self.tagStack) <= 1 and \
                   (not self.parseOnlyThese.text or \
                    not self.parseOnlyThese.search(currentData)):
                return
            o = containerClass(currentData)
            o.setup(self.currentTag, self.previous)
            if self.previous:
                self.previous.next = o
            self.previous = o
            self.currentTag.contents.append(o)

def BeautifulSoup.BeautifulStoneSoup.handle_charref (   self,
  ref 
)

Definition at line 1395 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.convertEntities, BeautifulSoup.BeautifulStoneSoup.handle_data(), and createfilelist.int.

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.convertEntities:
            data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)

def BeautifulSoup.BeautifulStoneSoup.handle_comment (   self,
  text 
)

Definition at line 1391 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)

def BeautifulSoup.BeautifulStoneSoup.handle_data (   self,
  data 
)
def BeautifulSoup.BeautifulStoneSoup.handle_decl (   self,
  data 
)

Definition at line 1446 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)

def BeautifulSoup.BeautifulStoneSoup.handle_entityref (   self,
  ref 
)
Handle entity references as data, possibly converting known
HTML and/or XML entity references to the corresponding Unicode
characters.
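
The fallback order described above can be sketched as a standalone function. This is a Python 3 illustration under stated assumptions: `resolve_entity` and its two flags are hypothetical stand-ins for the method and for convertHTMLEntities / convertXMLEntities, `XML_SPECIAL` is an assumed five-entry table, and `html.entities.name2codepoint` is the Python 3 standard-library mapping.

```python
from html.entities import name2codepoint

# Assumed table of the five predefined XML entities.
XML_SPECIAL = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def resolve_entity(ref, convert_html=False, convert_xml=False):
    """Return the text to emit for the entity reference named `ref`."""
    data = None
    if convert_html and ref in name2codepoint:
        data = chr(name2codepoint[ref])
    if not data and convert_xml:
        data = XML_SPECIAL.get(ref)
    if not data and convert_html and ref not in XML_SPECIAL:
        # Unknown name after an HTML lookup: most likely a stray
        # ampersand, so escape it and drop the semicolon.
        data = "&amp;%s" % ref
    if not data:
        # No comprehensive lookup was done: pass the reference through.
        data = "&%s;" % ref
    return data


print(resolve_entity("eacute", convert_html=True))
print(resolve_entity("T", convert_html=True))   # the "AT&T" case
print(resolve_entity("carol"))
```

The second call shows why "AT&T" comes out with an escaped ampersand when HTML conversion is on, while the third leaves an unknown reference untouched when no conversion was requested.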

Definition at line 1403 of file BeautifulSoup.py.

References BeautifulSoup.Tag.convertHTMLEntities, BeautifulSoup.Tag.convertXMLEntities, and BeautifulSoup.BeautifulStoneSoup.handle_data().

    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.convertXMLEntities:
            data = self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.convertHTMLEntities and \
            not self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
            # TODO: We've got a problem here. We're told this is
            # an entity reference, but it's not an XML entity
            # reference or an HTML entity reference. Nonetheless,
            # the logical thing to do is to pass it through as an
            # unrecognized entity reference.
            #
            # Except: when the input is "&carol;" this function
            # will be called with input "carol". When the input is
            # "AT&T", this function will be called with input
            # "T". We have no way of knowing whether a semicolon
            # was present originally, so we don't know whether
            # this is an unknown entity or just a misplaced
            # ampersand.
            #
            # The more common case is a misplaced ampersand, so I
            # escape the ampersand and omit the trailing semicolon.
            data = "&amp;%s" % ref
        if not data:
            # This case is different from the one above, because we
            # haven't already gone through a supposedly comprehensive
            # mapping of entities to Unicode characters. We might not
            # have gone through any mapping at all. So the chances are
            # very high that this is a real entity, and not a
            # misplaced ampersand.
            data = "&%s;" % ref
        self.handle_data(data)

def BeautifulSoup.BeautifulStoneSoup.handle_pi (   self,
  text 
)
Handle a processing instruction as a ProcessingInstruction
object, possibly one with a %SOUP-ENCODING% slot into which an
encoding will be plugged later.
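
One way to picture the %SOUP-ENCODING% slot is as a placeholder that is filled in when the instruction is rendered. The sketch below is a Python 3 illustration only: `normalize_pi` and `render_pi` are hypothetical helpers, and the "<?...?>" wrapping and the "utf-8" default are assumptions about how the stored text is later serialized.

```python
SLOT = "%SOUP-ENCODING%"


def normalize_pi(text):
    """Replace an XML declaration by one that carries an encoding slot."""
    if text[:3] == "xml":
        return "xml version='1.0' encoding='%s'" % SLOT
    return text


def render_pi(text, encoding="utf-8"):
    """Serialize a processing instruction, filling the slot if present."""
    return "<?" + normalize_pi(text).replace(SLOT, encoding) + "?>"


declaration = render_pi('xml version="1.0" encoding="ISO-8859-1"', "utf-8")
other = render_pi("php echo 1")
print(declaration)
print(other)
```

The encoding named in the input declaration is discarded, so the rendered declaration always agrees with the encoding actually used for output.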

Definition at line 1383 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)

def BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag (   self,
  name 
)
Returns true iff the given string is the name of a
self-closing tag according to this parser.

Definition at line 1208 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)

def BeautifulSoup.BeautifulStoneSoup.parse_declaration (   self,
  i 
)
Treat a bogus SGML declaration as raw data. Treat a CDATA
declaration as a CData object.
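
The CDATA branch amounts to slicing the text between "<![CDATA[" and the next "]]>". The sketch below shows that slicing in Python 3; `read_cdata` is a hypothetical helper that returns the section text and the index at which parsing would resume, and it treats an unterminated section as running to the end of the input.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def read_cdata(rawdata, i):
    """Return (data, j) for a CDATA section starting at index i.

    Returns (None, i) if rawdata does not start a CDATA section at i.
    """
    if rawdata[i:i + len(CDATA_OPEN)] != CDATA_OPEN:
        return None, i
    k = rawdata.find(CDATA_CLOSE, i)
    if k == -1:
        k = len(rawdata)   # unterminated: consume the rest of the input
    return rawdata[i + len(CDATA_OPEN):k], k + len(CDATA_CLOSE)


data, j = read_cdata("a<![CDATA[x < y]]>b", 1)
print(data, j)
```

For an unterminated section the returned index lies past the end of the string, which simply means there is nothing left to parse.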

Definition at line 1450 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.handle_data(), and DQMNet::Object.rawdata.

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j

def BeautifulSoup.BeautifulStoneSoup.popTag (   self)

Definition at line 1224 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def popTag(self):
        tag = self.tagStack.pop()

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

def BeautifulSoup.BeautifulStoneSoup.pushTag (   self,
  tag 
)

Definition at line 1232 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

def BeautifulSoup.BeautifulStoneSoup.reset (   self)

Definition at line 1214 of file BeautifulSoup.py.

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        SGMLParser.reset(self)
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

def BeautifulSoup.BeautifulStoneSoup.unknown_endtag (   self,
  name 
)

Definition at line 1360 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.literal, and BeautifulSoup.BeautifulStoneSoup.quoteStack.

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

def BeautifulSoup.BeautifulStoneSoup.unknown_starttag (   self,
  name,
  attrs,
  selfClosing = 0 
)

Definition at line 1330 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.PageElement.previous, BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.quoteStack, and BeautifulSoup.BeautifulStoneSoup.tagStack.

    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join([' %s="%s"' % (x, y) for x, y in attrs])
            self.handle_data('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag


Member Data Documentation

BeautifulSoup.BeautifulStoneSoup.convertEntities

Definition at line 1114 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_charref().

BeautifulSoup.BeautifulStoneSoup.convertHTMLEntities

Definition at line 1124 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.convertXMLEntities

Definition at line 1123 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.currentData

Definition at line 1218 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.currentTag
BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding
BeautifulSoup.BeautifulStoneSoup.escapeUnrecognizedEntities

Definition at line 1125 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.fromEncoding

Definition at line 1112 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.hidden

Definition at line 1216 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags

Definition at line 1139 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.literal

Definition at line 1357 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

BeautifulSoup.BeautifulStoneSoup.markup
BeautifulSoup.BeautifulStoneSoup.markupMassage

Definition at line 1145 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.originalEncoding

Definition at line 1167 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.parseOnlyThese
BeautifulSoup.BeautifulStoneSoup.previous

Definition at line 1258 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.quoteStack
BeautifulSoup.BeautifulStoneSoup.smartQuotesTo
BeautifulSoup.BeautifulStoneSoup.tagStack