BeautifulSoup.BeautifulStoneSoup Class Reference
Inheritance diagram for BeautifulSoup.BeautifulStoneSoup:
[Inheritance diagram. Base classes: BeautifulSoup.Tag, BeautifulSoup.PageElement. Derived classes: BeautifulSoup.BeautifulSOAP, BeautifulSoup.BeautifulSoup, BeautifulSoup.RobustXMLParser, BeautifulSoup.SimplifyingSOAPParser, BeautifulSoup.ICantBelieveItsBeautifulSoup, BeautifulSoup.MinimalSoup, BeautifulSoup.RobustHTMLParser, BeautifulSoup.RobustWackAssHTMLParser, BeautifulSoup.RobustInsanelyWackAssHTMLParser.]

Public Member Functions

def __getattr__ (self, methodName)
 
def __init__ (self, markup="", parseOnlyThese=None, fromEncoding=None, markupMassage=True, smartQuotesTo=XML_ENTITIES, convertEntities=None, selfClosingTags=None, isHTML=False)
 
def convert_charref (self, name)
 
def endData (self, containerClass=NavigableString)
 
def handle_charref (self, ref)
 
def handle_comment (self, text)
 
def handle_data (self, data)
 
def handle_decl (self, data)
 
def handle_entityref (self, ref)
 
def handle_pi (self, text)
 
def isSelfClosingTag (self, name)
 
def parse_declaration (self, i)
 
def popTag (self)
 
def pushTag (self, tag)
 
def reset (self)
 
def unknown_endtag (self, name)
 
def unknown_starttag (self, name, attrs, selfClosing=0)
 
- Public Member Functions inherited from BeautifulSoup.Tag
def __call__ (self, *args, **kwargs)
 
def __contains__ (self, x)
 
def __delitem__ (self, key)
 
def __eq__ (self, other)
 
def __getattr__ (self, tag)
 
def __getitem__ (self, key)
 
def __init__ (self, parser, name, attrs=None, parent=None, previous=None)
 
def __iter__ (self)
 
def __len__ (self)
 
def __ne__ (self, other)
 
def __nonzero__ (self)
 
def __repr__ (self, encoding=DEFAULT_OUTPUT_ENCODING)
 
def __setitem__ (self, key, value)
 
def __str__ (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)
 
def __unicode__ (self)
 
def childGenerator (self)
 
def clear (self)
 
def decompose (self)
 
def fetchText (self, text=None, recursive=True, limit=None)
 
def find (self, name=None, attrs={}, recursive=True, text=None, **kwargs)
 
def findAll (self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
 
def firstText (self, text=None, recursive=True)
 
def get (self, key, default=None)
 
def getString (self)
 
def getText (self, separator=u"")
 
def has_key (self, key)
 
def index (self, element)
 
def prettify (self, encoding=DEFAULT_OUTPUT_ENCODING)
 
def recursiveChildGenerator (self)
 
def renderContents (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)
 
def setString (self, string)
 
- Public Member Functions inherited from BeautifulSoup.PageElement
def append (self, tag)
 
def extract (self)
 
def findAllNext (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findAllPrevious (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findNext (self, name=None, attrs={}, text=None, **kwargs)
 
def findNextSibling (self, name=None, attrs={}, text=None, **kwargs)
 
def findNextSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def findParent (self, name=None, attrs={}, **kwargs)
 
def findParents (self, name=None, attrs={}, limit=None, **kwargs)
 
def findPrevious (self, name=None, attrs={}, text=None, **kwargs)
 
def findPreviousSibling (self, name=None, attrs={}, text=None, **kwargs)
 
def findPreviousSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)
 
def insert (self, position, newChild)
 
def nextGenerator (self)
 
def nextSiblingGenerator (self)
 
def parentGenerator (self)
 
def previousGenerator (self)
 
def previousSiblingGenerator (self)
 
def replaceWith (self, replaceWith)
 
def replaceWithChildren (self)
 
def setup (self, parent=None, previous=None)
 
def substituteEncoding (self, str, encoding=None)
 
def toEncoding (self, s, encoding=None)
 

Public Attributes

 convertEntities
 
 convertHTMLEntities
 
 convertXMLEntities
 
 currentData
 
 currentTag
 
 declaredHTMLEncoding
 
 escapeUnrecognizedEntities
 
 fromEncoding
 
 hidden
 
 instanceSelfClosingTags
 
 literal
 
 markup
 
 markupMassage
 
 originalEncoding
 
 parseOnlyThese
 
 previous
 
 quoteStack
 
 smartQuotesTo
 
 tagStack
 
- Public Attributes inherited from BeautifulSoup.Tag
 attrMap
 
 attrs
 
 containsSubstitutions
 
 contents
 
 convertHTMLEntities
 
 convertXMLEntities
 
 escapeUnrecognizedEntities
 
 hidden
 
 isSelfClosing
 
 name
 
 parserClass
 
- Public Attributes inherited from BeautifulSoup.PageElement
 next
 
 nextSibling
 
 parent
 
 previous
 
 previousSibling
 

Private Member Functions

def _feed (self, inDocumentEncoding=None, isHTML=False)
 
def _popToTag (self, name, inclusivePop=True)
 
def _smartPop (self, name)
 
def _toStringSubclass (self, text, subclass)
 

Additional Inherited Members

- Properties inherited from BeautifulSoup.Tag
 string = property(getString, setString)
 
 text = property(getText)
 

Detailed Description

This class contains the basic parser and search code. It defines
a parser that knows nothing about tag behavior except for the
following:

  You can't close a tag without closing all the tags it encloses.
  That is, "<foo><bar></foo>" actually means
  "<foo><bar></bar></foo>".

[Another possible explanation is "<foo><bar /></foo>", but since
this class defines no SELF_CLOSING_TAGS, it will never use that
explanation.]

This class is useful for parsing XML or made-up markup languages,
or when BeautifulSoup makes an assumption counter to what you were
expecting.
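
The nesting rule described above can be illustrated with a small standalone sketch. It is written for Python 3 and does not use BeautifulStoneSoup or sgmllib; `close_implied` and `TAG_RE` are hypothetical names introduced here, and the regex is a deliberate simplification (no attributes with ">" inside, no self-closing tags).

```python
import re

# Hypothetical helper, not part of BeautifulSoup: rewrites markup so that an
# end tag also closes every tag that is still open inside it.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^<>]*>")


def close_implied(markup):
    out = []
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for match in TAG_RE.finditer(markup):
        out.append(markup[pos:match.start()])  # text between tags
        pos = match.end()
        is_end = match.group(1) == "/"
        name = match.group(2)
        if not is_end:
            stack.append(name)
            out.append(match.group(0))
        elif name in stack:
            # Close the enclosed tags first, then the named tag itself.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # An end tag with no matching open tag is simply dropped here.
    out.append(markup[pos:])
    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)


print(close_implied("<foo><bar></foo>"))
```

Because no tag is treated as self-closing in this sketch, "<bar>" is always closed explicitly, which mirrors the interpretation the description says this class will choose.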

Definition at line 1040 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.BeautifulStoneSoup.__init__ (   self,
  markup = "",
  parseOnlyThese = None,
  fromEncoding = None,
  markupMassage = True,
  smartQuotesTo = XML_ENTITIES,
  convertEntities = None,
  selfClosingTags = None,
  isHTML = False 
)
The Soup object is initialized as the 'root tag', and the
provided markup (which can be a string or a file-like object)
is fed into the underlying parser.

sgmllib will process most bad HTML, and the BeautifulSoup
class has some tricks for dealing with some HTML that kills
sgmllib, but Beautiful Soup can nonetheless choke or lose data
if your data uses self-closing tags or declarations
incorrectly.

By default, Beautiful Soup uses regexes to sanitize input,
avoiding the vast majority of these problems. If the problems
don't apply to you, pass in False for markupMassage, and
you'll get better performance.

The default parser massage techniques fix the two most common
instances of invalid HTML that choke sgmllib:

 <br/> (No space between name of closing tag and tag close)
 <! --Comment--> (Extraneous whitespace in declaration)

You can pass in a custom list of (RE object, replace method)
tuples to get Beautiful Soup to scrub your input the way you
want.
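
As a rough illustration of the massage step, the sketch below applies a list of (compiled regex, replacement) pairs in order. It is standalone Python 3; `DEFAULT_MASSAGE` and `massage` are hypothetical names, and the two patterns are assumptions modeled on the two cases listed above, so the library's own default list may differ in detail.

```python
import re

# Assumed patterns, modeled on the two cases named in the description.
DEFAULT_MASSAGE = [
    # "<br/>" -> "<br />": put a space before the closing "/>".
    (re.compile(r"(<[^<>]*)/>"), lambda m: m.group(1) + " />"),
    # "<! --Comment-->" -> "<!--Comment-->": drop whitespace after "<!".
    (re.compile(r"<!\s+([^<>]*)>"), lambda m: "<!" + m.group(1) + ">"),
]


def massage(markup, rules=DEFAULT_MASSAGE):
    """Apply each (compiled regex, replacement) pair to the markup in order."""
    for pattern, replacement in rules:
        markup = pattern.sub(replacement, markup)
    return markup


print(massage("<br/>"))
print(massage("<! --Comment-->"))
```

A custom list has the same shape, for example `[(re.compile("&nbsp;"), " ")]`; the replacement can be either a string or a callable, since both are accepted by `re.sub`.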

Definition at line 1086 of file BeautifulSoup.py.

1086                   convertEntities=None, selfClosingTags=None, isHTML=False):
1087          """The Soup object is initialized as the 'root tag', and the
1088          provided markup (which can be a string or a file-like object)
1089          is fed into the underlying parser.
1090
1091          sgmllib will process most bad HTML, and the BeautifulSoup
1092          class has some tricks for dealing with some HTML that kills
1093          sgmllib, but Beautiful Soup can nonetheless choke or lose data
1094          if your data uses self-closing tags or declarations
1095          incorrectly.
1096
1097          By default, Beautiful Soup uses regexes to sanitize input,
1098          avoiding the vast majority of these problems. If the problems
1099          don't apply to you, pass in False for markupMassage, and
1100          you'll get better performance.
1101
1102          The default parser massage techniques fix the two most common
1103          instances of invalid HTML that choke sgmllib:
1104
1105           <br/> (No space between name of closing tag and tag close)
1106           <! --Comment--> (Extraneous whitespace in declaration)
1107
1108          You can pass in a custom list of (RE object, replace method)
1109          tuples to get Beautiful Soup to scrub your input the way you
1110          want."""
1111
1112          self.parseOnlyThese = parseOnlyThese
1113          self.fromEncoding = fromEncoding
1114          self.smartQuotesTo = smartQuotesTo
1115          self.convertEntities = convertEntities
1116          # Set the rules for how we'll deal with the entities we
1117          # encounter
1118          if self.convertEntities:
1119              # It doesn't make sense to convert encoded characters to
1120              # entities even while you're converting entities to Unicode.
1121              # Just convert it all to Unicode.
1122              self.smartQuotesTo = None
1123              if convertEntities == self.HTML_ENTITIES:
1124                  self.convertXMLEntities = False
1125                  self.convertHTMLEntities = True
1126                  self.escapeUnrecognizedEntities = True
1127              elif convertEntities == self.XHTML_ENTITIES:
1128                  self.convertXMLEntities = True
1129                  self.convertHTMLEntities = True
1130                  self.escapeUnrecognizedEntities = False
1131              elif convertEntities == self.XML_ENTITIES:
1132                  self.convertXMLEntities = True
1133                  self.convertHTMLEntities = False
1134                  self.escapeUnrecognizedEntities = False
1135          else:
1136              self.convertXMLEntities = False
1137              self.convertHTMLEntities = False
1138              self.escapeUnrecognizedEntities = False
1139
1140          self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
1141          SGMLParser.__init__(self)
1142
1143          if hasattr(markup, 'read'):        # It's a file-type object.
1144              markup = markup.read()
1145          self.markup = markup
1146          self.markupMassage = markupMassage
1147          try:
1148              self._feed(isHTML=isHTML)
1149          except StopParsing:
1150              pass
1151          self.markup = None                 # The markup can now be GCed
1152

Member Function Documentation

def BeautifulSoup.BeautifulStoneSoup.__getattr__ (   self,
  methodName 
)
This method routes method call requests to either the SGMLParser
superclass or the Tag superclass, depending on the method name.
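
The routing idea can be sketched in isolation. The Python 3 example below uses two hypothetical stand-in base classes (`ParserSide`, `TagSide`) rather than SGMLParser and Tag; only the prefix test follows the description above.

```python
class ParserSide:
    """Hypothetical stand-in for the parser superclass."""

    def __getattr__(self, name):
        raise AttributeError("no parser handler %r" % name)


class TagSide:
    """Hypothetical stand-in for the tag superclass: any other
    attribute name is treated as a child-tag lookup."""

    def __getattr__(self, name):
        return "child lookup: " + name


class Soup(ParserSide, TagSide):
    def __getattr__(self, name):
        # Handler-style names go to the parser side.
        if name.startswith(("start_", "end_", "do_")):
            return ParserSide.__getattr__(self, name)
        # Everything else, except dunder names, goes to the tag side.
        elif not name.startswith("__"):
            return TagSide.__getattr__(self, name)
        raise AttributeError(name)


soup = Soup()
print(soup.title)
print(hasattr(soup, "start_p"))
print(hasattr(soup, "__deepcopy__"))
```

Rejecting dunder names means protocol probes such as `__deepcopy__` fail with AttributeError instead of being mistaken for tag lookups.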

Definition at line 1196 of file BeautifulSoup.py.

Referenced by VarParsing.VarParsing.setType().

1196      def __getattr__(self, methodName):
1197          """This method routes method call requests to either the SGMLParser
1198          superclass or the Tag superclass, depending on the method name."""
1199          #print "__getattr__ called on %s.%s" % (self.__class__, methodName)
1200
1201          if methodName.startswith('start_') or methodName.startswith('end_') \
1202                 or methodName.startswith('do_'):
1203              return SGMLParser.__getattr__(self, methodName)
1204          elif not methodName.startswith('__'):
1205              return Tag.__getattr__(self, methodName)
1206          else:
1207              raise AttributeError
1208
def BeautifulSoup.BeautifulStoneSoup._feed (   self,
  inDocumentEncoding = None,
  isHTML = False 
)
private

Definition at line 1163 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.markup.

1163      def _feed(self, inDocumentEncoding=None, isHTML=False):
1164          # Convert the document to Unicode.
1165          markup = self.markup
1166          if isinstance(markup, unicode):
1167              if not hasattr(self, 'originalEncoding'):
1168                  self.originalEncoding = None
1169          else:
1170              dammit = UnicodeDammit\
1171                       (markup, [self.fromEncoding, inDocumentEncoding],
1172                        smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
1173              markup = dammit.unicode
1174              self.originalEncoding = dammit.originalEncoding
1175              self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
1176          if markup:
1177              if self.markupMassage:
1178                  if not hasattr(self.markupMassage, "__iter__"):
1179                      self.markupMassage = self.MARKUP_MASSAGE
1180                  for fix, m in self.markupMassage:
1181                      markup = fix.sub(m, markup)
1182                  # TODO: We get rid of markupMassage so that the
1183                  # soup object can be deepcopied later on. Some
1184                  # Python installations can't copy regexes. If anyone
1185                  # was relying on the existence of markupMassage, this
1186                  # might cause problems.
1187                  del(self.markupMassage)
1188          self.reset()
1189
1190          SGMLParser.feed(self, markup)
1191          # Close out any unfinished strings and close all the open tags.
1192          self.endData()
1193          while self.currentTag.name != self.ROOT_TAG_NAME:
1194              self.popTag()
1195
def BeautifulSoup.BeautifulStoneSoup._popToTag (   self,
  name,
  inclusivePop = True 
)
private
Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
the given tag.
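
A minimal sketch of the same popping rule on a plain list of tag names (Python 3). `pop_to_tag` is a hypothetical helper, and the root name "[document]" is an assumption used only to mark the element that must never be popped.

```python
def pop_to_tag(stack, name, inclusive=True, root="[document]"):
    """Pop names off `stack` (innermost last) back to the most recent `name`.

    Returns the popped names, innermost first. Index 0 holds the root and
    is never popped.
    """
    if name == root:
        return []
    num_pops = 0
    # Search from the innermost tag outwards, skipping the root at index 0.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive:
        num_pops -= 1
    # range() of a non-positive count is empty, so a missing tag pops nothing.
    return [stack.pop() for _ in range(num_pops)]


stack = ["[document]", "a", "b", "c"]
print(pop_to_tag(stack, "b"), stack)
```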

Definition at line 1263 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.popTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._smartPop(), and BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

1263      def _popToTag(self, name, inclusivePop=True):
1264          """Pops the tag stack up to and including the most recent
1265          instance of the given tag. If inclusivePop is false, pops the tag
1266          stack up to but *not* including the most recent instance of
1267          the given tag."""
1268          #print "Popping to %s" % name
1269          if name == self.ROOT_TAG_NAME:
1270              return
1271
1272          numPops = 0
1273          mostRecentTag = None
1274          for i in range(len(self.tagStack)-1, 0, -1):
1275              if name == self.tagStack[i].name:
1276                  numPops = len(self.tagStack)-i
1277                  break
1278          if not inclusivePop:
1279              numPops = numPops - 1
1280
1281          for i in range(0, numPops):
1282              mostRecentTag = self.popTag()
1283          return mostRecentTag
1284
def BeautifulSoup.BeautifulStoneSoup._smartPop (   self,
  name 
)
private
We need to pop up to the previous tag of this type, unless
one of this tag's nesting reset triggers comes between this
tag and the previous tag of this type, OR unless this tag is a
generic nesting trigger and another generic nesting trigger
comes between this tag and the previous tag of this type.

Examples:
 <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
 <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
 <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
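
The six examples above can be checked against a small standalone sketch (Python 3). The `NESTABLE` and `RESET_NESTING` tables here are hypothetical, heavily trimmed stand-ins chosen only to cover these examples; `smart_pop_target` reports which tag would be popped to and whether that tag itself is popped.

```python
# Hypothetical, trimmed tables for illustration only.
NESTABLE = {  # tag -> tags that reset its nesting
    "table": [],
    "tr": ["table"],
    "td": ["tr"],
    "ul": [],
    "li": ["ul"],
}
RESET_NESTING = {"p", "table", "tr", "td", "ul", "li"}


def smart_pop_target(stack, name):
    """Return (tag_to_pop_to, inclusive), or None if nothing is popped.

    `stack` holds open tag names, innermost last; index 0 is the root.
    """
    triggers = NESTABLE.get(name)
    is_nestable = triggers is not None
    is_reset = name in RESET_NESTING
    for i in range(len(stack) - 1, 0, -1):
        open_name = stack[i]
        if open_name == name and not is_nestable:
            # Non-nestable tag: pop back to its last occurrence, inclusive.
            return name, True
        if (triggers is not None and open_name in triggers) or (
            triggers is None and is_reset and open_name in RESET_NESTING
        ):
            # A nesting reset trigger sits in between: pop up to it,
            # but leave the trigger itself open.
            return open_name, False
    return None


print(smart_pop_target(["[document]", "p", "b"], "p"))
print(smart_pop_target(["[document]", "li", "ul", "li"], "li"))
```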

Definition at line 1285 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

1285      def _smartPop(self, name):
1286
1287          """We need to pop up to the previous tag of this type, unless
1288          one of this tag's nesting reset triggers comes between this
1289          tag and the previous tag of this type, OR unless this tag is a
1290          generic nesting trigger and another generic nesting trigger
1291          comes between this tag and the previous tag of this type.
1292
1293          Examples:
1294           <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
1295           <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
1296           <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.
1297
1298           <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
1299           <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
1300           <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
1301          """
1302
1303          nestingResetTriggers = self.NESTABLE_TAGS.get(name)
1304          isNestable = nestingResetTriggers != None
1305          isResetNesting = name in self.RESET_NESTING_TAGS
1306          popTo = None
1307          inclusive = True
1308          for i in range(len(self.tagStack)-1, 0, -1):
1309              p = self.tagStack[i]
1310              if (not p or p.name == name) and not isNestable:
1311                  #Non-nestable tags get popped to the top or to their
1312                  #last occurrence.
1313                  popTo = name
1314                  break
1315              if (nestingResetTriggers is not None
1316                  and p.name in nestingResetTriggers) \
1317                  or (nestingResetTriggers is None and isResetNesting
1318                      and p.name in self.RESET_NESTING_TAGS):
1319
1320                  #If we encounter one of the nesting reset triggers
1321                  #peculiar to this tag, or we encounter another tag
1322                  #that causes nesting to reset, pop up to but not
1323                  #including that tag.
1324                  popTo = p.name
1325                  inclusive = False
1326                  break
1327              p = p.parent
1328          if popTo:
1329              self._popToTag(popTo, inclusive)
1330
def BeautifulSoup.BeautifulStoneSoup._toStringSubclass (   self,
  text,
  subclass 
)
private
Adds a certain piece of text to the tree as a NavigableString
subclass.

Definition at line 1377 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.handle_data().

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_comment(), BeautifulSoup.BeautifulStoneSoup.handle_decl(), BeautifulSoup.BeautifulStoneSoup.handle_pi(), and BeautifulSoup.BeautifulStoneSoup.parse_declaration().

1377      def _toStringSubclass(self, text, subclass):
1378          """Adds a certain piece of text to the tree as a NavigableString
1379          subclass."""
1380          self.endData()
1381          self.handle_data(text)
1382          self.endData(subclass)
1383
def BeautifulSoup.BeautifulStoneSoup.convert_charref (   self,
  name 
)
This method fixes a bug in Python's SGMLParser.
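
The description does not say what the bug is. Judging only from the range check in the listing for this method, the override limits conversion to decimal references in the ASCII range. A standalone Python 3 sketch of that check follows; the function name is reused for readability, and the final `chr` call is an assumption standing in for the parser's own code point conversion.

```python
def convert_charref(name):
    """Return the character for a decimal ASCII reference, else None."""
    try:
        n = int(name)
    except ValueError:
        return None  # not a decimal number, e.g. a hex reference
    if not 0 <= n <= 127:
        return None  # outside ASCII: leave it to the caller
    return chr(n)


print(convert_charref("65"))
print(convert_charref("233"))
```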

Definition at line 1153 of file BeautifulSoup.py.

References createfilelist.int.

1153      def convert_charref(self, name):
1154          """This method fixes a bug in Python's SGMLParser."""
1155          try:
1156              n = int(name)
1157          except ValueError:
1158              return
1159          if not 0 <= n <= 127 : # ASCII ends at 127, not 255
1160              return
1161          return self.convert_codepoint(n)
1162
def BeautifulSoup.BeautifulStoneSoup.endData (   self,
  containerClass = NavigableString 
)

Definition at line 1240 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentData, BeautifulSoup.BeautifulStoneSoup.currentTag, reco::helper::VirtualJetProducerHelper.intersection(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.PageElement.previous, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

1240      def endData(self, containerClass=NavigableString):
1241          if self.currentData:
1242              currentData = u''.join(self.currentData)
1243              if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
1244                  not set([tag.name for tag in self.tagStack]).intersection(
1245                      self.PRESERVE_WHITESPACE_TAGS)):
1246                  if '\n' in currentData:
1247                      currentData = '\n'
1248                  else:
1249                      currentData = ' '
1250              self.currentData = []
1251              if self.parseOnlyThese and len(self.tagStack) <= 1 and \
1252                     (not self.parseOnlyThese.text or \
1253                      not self.parseOnlyThese.search(currentData)):
1254                  return
1255              o = containerClass(currentData)
1256              o.setup(self.currentTag, self.previous)
1257              if self.previous:
1258                  self.previous.next = o
1259              self.previous = o
1260              self.currentTag.contents.append(o)
1261
1262
def BeautifulSoup.BeautifulStoneSoup.handle_charref (   self,
  ref 
)

Definition at line 1396 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.convertEntities, BeautifulSoup.BeautifulStoneSoup.handle_data(), and createfilelist.int.

1396      def handle_charref(self, ref):
1397          "Handle character references as data."
1398          if self.convertEntities:
1399              data = unichr(int(ref))
1400          else:
1401              data = '&#%s;' % ref
1402          self.handle_data(data)
1403
def BeautifulSoup.BeautifulStoneSoup.handle_comment (   self,
  text 
)

Definition at line 1392 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

1392      def handle_comment(self, text):
1393          "Handle comments as Comment objects."
1394          self._toStringSubclass(text, Comment)
1395
def BeautifulSoup.BeautifulStoneSoup.handle_data (   self,
  data 
)
def BeautifulSoup.BeautifulStoneSoup.handle_decl (   self,
  data 
)

Definition at line 1447 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

1447      def handle_decl(self, data):
1448          "Handle DOCTYPEs and the like as Declaration objects."
1449          self._toStringSubclass(data, Declaration)
1450
def BeautifulSoup.BeautifulStoneSoup.handle_entityref (   self,
  ref 
)
Handle entity references as data, possibly converting known
HTML and/or XML entity references to the corresponding Unicode
characters.
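
The decision order can be sketched as a pure function (Python 3). `resolve_entity` and the two boolean flags are hypothetical stand-ins for the method and the instance attributes convertHTMLEntities and convertXMLEntities; `XML_ENTITIES` is an assumed table of the five predefined XML entities, and `name2codepoint` comes from the standard `html.entities` module.

```python
from html.entities import name2codepoint

# Assumed mapping of the five predefined XML entities.
XML_ENTITIES = {"apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">"}


def resolve_entity(ref, convert_html=False, convert_xml=False):
    """Return the text that should be emitted for the entity name `ref`."""
    data = None
    if convert_html and ref in name2codepoint:
        data = chr(name2codepoint[ref])
    if not data and convert_xml:
        data = XML_ENTITIES.get(ref)
    if not data and convert_html and ref not in XML_ENTITIES:
        # Unknown name while converting HTML entities: most likely a stray
        # ampersand ("AT&T"), so escape it and omit the semicolon.
        data = "&amp;%s" % ref
    if not data:
        # No applicable conversion: pass the reference through unchanged.
        data = "&%s;" % ref
    return data


print(resolve_entity("eacute", convert_html=True))
print(resolve_entity("T", convert_html=True))
print(resolve_entity("carol"))
```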

Definition at line 1404 of file BeautifulSoup.py.

References BeautifulSoup.Tag.convertHTMLEntities, BeautifulSoup.Tag.convertXMLEntities, and BeautifulSoup.BeautifulStoneSoup.handle_data().

1404      def handle_entityref(self, ref):
1405          """Handle entity references as data, possibly converting known
1406          HTML and/or XML entity references to the corresponding Unicode
1407          characters."""
1408          data = None
1409          if self.convertHTMLEntities:
1410              try:
1411                  data = unichr(name2codepoint[ref])
1412              except KeyError:
1413                  pass
1414
1415          if not data and self.convertXMLEntities:
1416              data = self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)
1417
1418          if not data and self.convertHTMLEntities and \
1419              not self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
1420                  # TODO: We've got a problem here. We're told this is
1421                  # an entity reference, but it's not an XML entity
1422                  # reference or an HTML entity reference. Nonetheless,
1423                  # the logical thing to do is to pass it through as an
1424                  # unrecognized entity reference.
1425                  #
1426                  # Except: when the input is "&carol;" this function
1427                  # will be called with input "carol". When the input is
1428                  # "AT&T", this function will be called with input
1429                  # "T". We have no way of knowing whether a semicolon
1430                  # was present originally, so we don't know whether
1431                  # this is an unknown entity or just a misplaced
1432                  # ampersand.
1433                  #
1434                  # The more common case is a misplaced ampersand, so I
1435                  # escape the ampersand and omit the trailing semicolon.
1436                  data = "&amp;%s" % ref
1437          if not data:
1438              # This case is different from the one above, because we
1439              # haven't already gone through a supposedly comprehensive
1440              # mapping of entities to Unicode characters. We might not
1441              # have gone through any mapping at all. So the chances are
1442              # very high that this is a real entity, and not a
1443              # misplaced ampersand.
1444              data = "&%s;" % ref
1445          self.handle_data(data)
1446
def BeautifulSoup.BeautifulStoneSoup.handle_pi (   self,
  text 
)
Handle a processing instruction as a ProcessingInstruction
object, possibly one with a %SOUP-ENCODING% slot into which an
encoding will be plugged later.

Definition at line 1384 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

1384      def handle_pi(self, text):
1385          """Handle a processing instruction as a ProcessingInstruction
1386          object, possibly one with a %SOUP-ENCODING% slot into which an
1387          encoding will be plugged later."""
1388          if text[:3] == "xml":
1389              text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
1390          self._toStringSubclass(text, ProcessingInstruction)
1391
def BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag (   self,
  name 
)
Returns true iff the given string is the name of a
self-closing tag according to this parser.

Definition at line 1209 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

1209      def isSelfClosingTag(self, name):
1210          """Returns true iff the given string is the name of a
1211          self-closing tag according to this parser."""
1212          return name in self.SELF_CLOSING_TAGS \
1213                 or name in self.instanceSelfClosingTags
1214
def BeautifulSoup.BeautifulStoneSoup.parse_declaration (   self,
  i 
)
Treat a bogus SGML declaration as raw data. Treat a CDATA
declaration as a CData object.
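
The CDATA branch amounts to slicing between "<![CDATA[" and the next "]]>". A standalone Python 3 sketch, with `split_cdata` as a hypothetical helper that returns the section text and the index at which parsing would resume:

```python
def split_cdata(rawdata, i):
    """Return (cdata_text, next_index) for a CDATA section starting at i,
    or (None, i) if rawdata[i:] does not start with '<![CDATA['."""
    opener = "<![CDATA["
    if not rawdata.startswith(opener, i):
        return None, i
    end = rawdata.find("]]>", i)
    if end == -1:
        end = len(rawdata)  # unterminated: take the rest of the input
    # Skip the three characters of the "]]>" terminator.
    return rawdata[i + len(opener):end], end + 3


print(split_cdata("a<![CDATA[x<y]]>b", 1))
```

For an unterminated section the returned index points past the end of the input, which is the same arithmetic the method's listing uses (`j = k+3`).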

Definition at line 1451 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.handle_data(), and DQMNet::Object.rawdata.

1451      def parse_declaration(self, i):
1452          """Treat a bogus SGML declaration as raw data. Treat a CDATA
1453          declaration as a CData object."""
1454          j = None
1455          if self.rawdata[i:i+9] == '<![CDATA[':
1456              k = self.rawdata.find(']]>', i)
1457              if k == -1:
1458                  k = len(self.rawdata)
1459              data = self.rawdata[i+9:k]
1460              j = k+3
1461              self._toStringSubclass(data, CData)
1462          else:
1463              try:
1464                  j = SGMLParser.parse_declaration(self, i)
1465              except SGMLParseError:
1466                  toHandle = self.rawdata[i:]
1467                  self.handle_data(toHandle)
1468                  j = i + len(toHandle)
1469          return j
1470
def BeautifulSoup.BeautifulStoneSoup.popTag (   self)

Definition at line 1225 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

1225      def popTag(self):
1226          tag = self.tagStack.pop()
1227
1228          #print "Pop", tag.name
1229          if self.tagStack:
1230              self.currentTag = self.tagStack[-1]
1231          return self.currentTag
1232
def BeautifulSoup.BeautifulStoneSoup.pushTag (   self,
  tag 
)

Definition at line 1233 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

1233      def pushTag(self, tag):
1234          #print "Push", tag.name
1235          if self.currentTag:
1236              self.currentTag.contents.append(tag)
1237          self.tagStack.append(tag)
1238          self.currentTag = self.tagStack[-1]
1239
def BeautifulSoup.BeautifulStoneSoup.reset (   self)

Definition at line 1215 of file BeautifulSoup.py.

1215      def reset(self):
1216          Tag.__init__(self, self, self.ROOT_TAG_NAME)
1217          self.hidden = 1
1218          SGMLParser.reset(self)
1219          self.currentData = []
1220          self.currentTag = None
1221          self.tagStack = []
1222          self.quoteStack = []
1223          self.pushTag(self)
1224
def BeautifulSoup.BeautifulStoneSoup.unknown_endtag (   self,
  name 
)

Definition at line 1361 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.literal, and BeautifulSoup.BeautifulStoneSoup.quoteStack.

1361      def unknown_endtag(self, name):
1362          #print "End tag %s" % name
1363          if self.quoteStack and self.quoteStack[-1] != name:
1364              #This is not a real end tag.
1365              #print "</%s> is not real!" % name
1366              self.handle_data('</%s>' % name)
1367              return
1368          self.endData()
1369          self._popToTag(name)
1370          if self.quoteStack and self.quoteStack[-1] == name:
1371              self.quoteStack.pop()
1372              self.literal = (len(self.quoteStack) > 0)
1373
def BeautifulSoup.BeautifulStoneSoup.unknown_starttag (   self,
  name,
  attrs,
  selfClosing = 0 
)

Definition at line 1331 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.PageElement.previous, BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.quoteStack, and BeautifulSoup.BeautifulStoneSoup.tagStack.

1331      def unknown_starttag(self, name, attrs, selfClosing=0):
1332          #print "Start tag %s: %s" % (name, attrs)
1333          if self.quoteStack:
1334              #This is not a real tag.
1335              #print "<%s> is not real!" % name
1336              attrs = ''.join([' %s="%s"' % (x, y) for x, y in attrs])
1337              self.handle_data('<%s%s>' % (name, attrs))
1338              return
1339          self.endData()
1340
1341          if not self.isSelfClosingTag(name) and not selfClosing:
1342              self._smartPop(name)
1343
1344          if self.parseOnlyThese and len(self.tagStack) <= 1 \
1345                 and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
1346              return
1347
1348          tag = Tag(self, name, attrs, self.currentTag, self.previous)
1349          if self.previous:
1350              self.previous.next = tag
1351          self.previous = tag
1352          self.pushTag(tag)
1353          if selfClosing or self.isSelfClosingTag(name):
1354              self.popTag()
1355          if name in self.QUOTE_TAGS:
1356              #print "Beginning quote (%s)" % name
1357              self.quoteStack.append(name)
1358              self.literal = 1
1359          return tag
1360

Member Data Documentation

BeautifulSoup.BeautifulStoneSoup.convertEntities

Definition at line 1115 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_charref().

BeautifulSoup.BeautifulStoneSoup.convertHTMLEntities

Definition at line 1125 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.convertXMLEntities

Definition at line 1124 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.currentData

Definition at line 1219 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.currentTag
BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding
BeautifulSoup.BeautifulStoneSoup.escapeUnrecognizedEntities

Definition at line 1126 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.fromEncoding

Definition at line 1113 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.hidden

Definition at line 1217 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags
BeautifulSoup.BeautifulStoneSoup.literal

Definition at line 1358 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

BeautifulSoup.BeautifulStoneSoup.markup
BeautifulSoup.BeautifulStoneSoup.markupMassage

Definition at line 1146 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.originalEncoding

Definition at line 1168 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.parseOnlyThese
BeautifulSoup.BeautifulStoneSoup.previous

Definition at line 1259 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.quoteStack
BeautifulSoup.BeautifulStoneSoup.smartQuotesTo
BeautifulSoup.BeautifulStoneSoup.tagStack