BeautifulSoup.BeautifulStoneSoup Class Reference
Inheritance diagram for BeautifulSoup.BeautifulStoneSoup:
[Diagram not reproduced. Classes shown: BeautifulSoup.Tag, BeautifulSoup.PageElement, BeautifulSoup.BeautifulSOAP, BeautifulSoup.BeautifulSoup, BeautifulSoup.RobustXMLParser, BeautifulSoup.SimplifyingSOAPParser, BeautifulSoup.ICantBelieveItsBeautifulSoup, BeautifulSoup.MinimalSoup, BeautifulSoup.RobustHTMLParser, BeautifulSoup.RobustWackAssHTMLParser, BeautifulSoup.RobustInsanelyWackAssHTMLParser]

Public Member Functions

def __getattr__
 
def __init__
 
def convert_charref
 
def endData
 
def handle_charref
 
def handle_comment
 
def handle_data
 
def handle_decl
 
def handle_entityref
 
def handle_pi
 
def isSelfClosingTag
 
def parse_declaration
 
def popTag
 
def pushTag
 
def reset
 
def unknown_endtag
 
def unknown_starttag
 
- Public Member Functions inherited from BeautifulSoup.Tag
def __call__
 
def __contains__
 
def __delitem__
 
def __eq__
 
def __getattr__
 
def __getitem__
 
def __init__
 
def __iter__
 
def __len__
 
def __ne__
 
def __nonzero__
 
def __repr__
 
def __setitem__
 
def __str__
 
def __unicode__
 
def childGenerator
 
def clear
 
def decompose
 
def fetchText
 
def find
 
def findAll
 
def firstText
 
def get
 
def getString
 
def getText
 
def has_key
 
def index
 
def prettify
 
def recursiveChildGenerator
 
def renderContents
 
def setString
 

Public Attributes

 convertEntities
 
 convertHTMLEntities
 
 convertXMLEntities
 
 currentData
 
 currentTag
 
 declaredHTMLEncoding
 
 escapeUnrecognizedEntities
 
 fromEncoding
 
 hidden
 
 instanceSelfClosingTags
 
 literal
 
 markup
 
 markupMassage
 
 originalEncoding
 
 parseOnlyThese
 
 previous
 
 quoteStack
 
 smartQuotesTo
 
 tagStack
 
- Public Attributes inherited from BeautifulSoup.Tag
 attrMap
 
 attrs
 
 containsSubstitutions
 
 contents
 
 convertHTMLEntities
 
 convertXMLEntities
 
 escapeUnrecognizedEntities
 
 hidden
 
 isSelfClosing
 
 name
 
 parserClass
 

Static Public Attributes

 ALL_ENTITIES = XHTML_ENTITIES
 
string HTML_ENTITIES = "html"
 
list MARKUP_MASSAGE
 
dictionary NESTABLE_TAGS = {}
 
list PRESERVE_WHITESPACE_TAGS = []
 
dictionary QUOTE_TAGS = {}
 
dictionary RESET_NESTING_TAGS = {}
 
string ROOT_TAG_NAME = u'[document]'
 
dictionary SELF_CLOSING_TAGS = {}
 
dictionary STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }
 
string XHTML_ENTITIES = "xhtml"
 
string XML_ENTITIES = "xml"
 
- Static Public Attributes inherited from BeautifulSoup.Tag
 fetch = findAll
 
 findChild = find
 
 findChildren = findAll
 
 first = find
 

Private Member Functions

def _feed
 
def _popToTag
 
def _smartPop
 
def _toStringSubclass
 

Additional Inherited Members

- Properties inherited from BeautifulSoup.Tag
 string = property(getString, setString)
 
 text = property(getText)
 

Detailed Description

This class contains the basic parser and search code. It defines
a parser that knows nothing about tag behavior except for the
following:

  You can't close a tag without closing all the tags it encloses.
  That is, "<foo><bar></foo>" actually means
  "<foo><bar></bar></foo>".

[Another possible explanation is "<foo><bar /></foo>", but since
this class defines no SELF_CLOSING_TAGS, it will never use that
explanation.]

This class is useful for parsing XML or made-up markup languages,
or when BeautifulSoup makes an assumption counter to what you were
expecting.
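
The nesting rule described above can be illustrated with a small stack-based sketch. This is not the parser's own code: `close_enclosed` and its token pattern are hypothetical, and the sketch only understands bare `<name>` and `</name>` tokens without attributes.

```python
import re

# Hypothetical token pattern: bare start and end tags only, no attributes.
TOKEN = re.compile(r'<(/?)([A-Za-z][\w-]*)>')

def close_enclosed(markup):
    """Close every open tag that an end tag encloses (illustrative only)."""
    out = []
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for match in TOKEN.finditer(markup):
        out.append(markup[pos:match.start()])  # text between tags
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)
            out.append('<%s>' % name)
        elif name in stack:
            # Pop up to and including the most recent matching tag.
            while True:
                top = stack.pop()
                out.append('</%s>' % top)
                if top == name:
                    break
        # An end tag with no matching open tag is dropped in this sketch.
    out.append(markup[pos:])
    # Close whatever is still open at the end of the input.
    while stack:
        out.append('</%s>' % stack.pop())
    return ''.join(out)

print(close_enclosed('<foo><bar></foo>'))
```

Because the sketch defines no self-closing tags, `<bar>` is always closed explicitly, which mirrors the reading the description says this class will choose.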

Definition at line 1039 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.BeautifulStoneSoup.__init__ (   self,
  markup = "",
  parseOnlyThese = None,
  fromEncoding = None,
  markupMassage = True,
  smartQuotesTo = XML_ENTITIES,
  convertEntities = None,
  selfClosingTags = None,
  isHTML = False 
)
The Soup object is initialized as the 'root tag', and the
provided markup (which can be a string or a file-like object)
is fed into the underlying parser.

sgmllib will process most bad HTML, and the BeautifulSoup
class has some tricks for dealing with some HTML that kills
sgmllib, but Beautiful Soup can nonetheless choke or lose data
if your data uses self-closing tags or declarations
incorrectly.

By default, Beautiful Soup uses regexes to sanitize input,
avoiding the vast majority of these problems. If the problems
don't apply to you, pass in False for markupMassage, and
you'll get better performance.

The default parser massage techniques fix the two most common
instances of invalid HTML that choke sgmllib:

 <br/> (No space between name of closing tag and tag close)
 <! --Comment--> (Extraneous whitespace in declaration)

You can pass in a custom list of (RE object, replace method)
tuples to get Beautiful Soup to scrub your input the way you
want.
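
As a rough illustration of the massage step, the two default rules (copied from the `MARKUP_MASSAGE` initial value documented further down this page) can be applied with the standard `re` module. The helper name `massage` is hypothetical, and the patterns are written as raw strings here.

```python
import re

# The two default rules, as listed under MARKUP_MASSAGE on this page.
MARKUP_MASSAGE = [
    (re.compile(r'(<[^<>]*)/>'), lambda m: m.group(1) + ' />'),
    (re.compile(r'<!\s+([^<>]*)>'), lambda m: '<!' + m.group(1) + '>'),
]

def massage(markup, rules=MARKUP_MASSAGE):
    """Apply each (compiled regex, replacement function) pair in order."""
    for pattern, replacement in rules:
        markup = pattern.sub(replacement, markup)
    return markup

print(massage('<br/>'))
print(massage('<! --Comment-->'))
```

A custom `markupMassage` list has the same shape: each entry pairs a compiled pattern with a callable that receives the match object.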

Definition at line 1085 of file BeautifulSoup.py.

    def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
                 markupMassage=True, smartQuotesTo=XML_ENTITIES,
                 convertEntities=None, selfClosingTags=None, isHTML=False):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser.

        sgmllib will process most bad HTML, and the BeautifulSoup
        class has some tricks for dealing with some HTML that kills
        sgmllib, but Beautiful Soup can nonetheless choke or lose data
        if your data uses self-closing tags or declarations
        incorrectly.

        By default, Beautiful Soup uses regexes to sanitize input,
        avoiding the vast majority of these problems. If the problems
        don't apply to you, pass in False for markupMassage, and
        you'll get better performance.

        The default parser massage techniques fix the two most common
        instances of invalid HTML that choke sgmllib:

         <br/> (No space between name of closing tag and tag close)
         <! --Comment--> (Extraneous whitespace in declaration)

        You can pass in a custom list of (RE object, replace method)
        tuples to get Beautiful Soup to scrub your input the way you
        want."""

        self.parseOnlyThese = parseOnlyThese
        self.fromEncoding = fromEncoding
        self.smartQuotesTo = smartQuotesTo
        self.convertEntities = convertEntities
        # Set the rules for how we'll deal with the entities we
        # encounter
        if self.convertEntities:
            # It doesn't make sense to convert encoded characters to
            # entities even while you're converting entities to Unicode.
            # Just convert it all to Unicode.
            self.smartQuotesTo = None
            if convertEntities == self.HTML_ENTITIES:
                self.convertXMLEntities = False
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = True
            elif convertEntities == self.XHTML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = False
            elif convertEntities == self.XML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = False
                self.escapeUnrecognizedEntities = False
        else:
            self.convertXMLEntities = False
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False

        self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
        SGMLParser.__init__(self)

        if hasattr(markup, 'read'):        # It's a file-type object.
            markup = markup.read()
        self.markup = markup
        self.markupMassage = markupMassage
        try:
            self._feed(isHTML=isHTML)
        except StopParsing:
            pass
        self.markup = None                 # The markup can now be GCed
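
The entity-handling branches in the constructor amount to a lookup from the `convertEntities` argument to three boolean flags. The sketch below restates that mapping as a table. `entity_flags` is a hypothetical helper, not part of the class, and the `html` row assumes that branch sets `convertHTMLEntities` to True.

```python
HTML_ENTITIES = "html"
XHTML_ENTITIES = "xhtml"
XML_ENTITIES = "xml"

# Tuple order: (convertXMLEntities, convertHTMLEntities,
#               escapeUnrecognizedEntities)
_FLAGS = {
    HTML_ENTITIES: (False, True, True),
    XHTML_ENTITIES: (True, True, False),
    XML_ENTITIES: (True, False, False),
}

def entity_flags(convert_entities=None):
    """Return the three flags the constructor would set (sketch only)."""
    if not convert_entities:
        return (False, False, False)
    # Values other than the three constants are outside this sketch and
    # raise KeyError here.
    return _FLAGS[convert_entities]

print(entity_flags(XHTML_ENTITIES))
```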

Member Function Documentation

def BeautifulSoup.BeautifulStoneSoup.__getattr__ (   self,
  methodName 
)
This method routes method call requests to either the SGMLParser
superclass or the Tag superclass, depending on the method name.

Definition at line 1195 of file BeautifulSoup.py.

Referenced by VarParsing.VarParsing.setType().

    def __getattr__(self, methodName):
        """This method routes method call requests to either the SGMLParser
        superclass or the Tag superclass, depending on the method name."""
        #print "__getattr__ called on %s.%s" % (self.__class__, methodName)

        if methodName.startswith('start_') or methodName.startswith('end_') \
               or methodName.startswith('do_'):
            return SGMLParser.__getattr__(self, methodName)
        elif not methodName.startswith('__'):
            return Tag.__getattr__(self, methodName)
        else:
            raise AttributeError

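
The routing rule in `__getattr__` depends only on the prefix of the requested name. A minimal sketch of that decision follows; `route` is a hypothetical stand-in that returns the name of the superclass that would handle the lookup rather than performing it.

```python
def route(method_name):
    """Name the superclass that would receive the lookup (sketch only)."""
    # Parser callbacks go to SGMLParser.
    if method_name.startswith(('start_', 'end_', 'do_')):
        return 'SGMLParser'
    # Anything else that is not a dunder name goes to Tag.
    if not method_name.startswith('__'):
        return 'Tag'
    # Dunder lookups are refused.
    raise AttributeError(method_name)

print(route('start_meta'))
print(route('findAll'))
```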
def BeautifulSoup.BeautifulStoneSoup._feed (   self,
  inDocumentEncoding = None,
  isHTML = False 
)
private

Definition at line 1162 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.markup.

    def _feed(self, inDocumentEncoding=None, isHTML=False):
        # Convert the document to Unicode.
        markup = self.markup
        if isinstance(markup, unicode):
            if not hasattr(self, 'originalEncoding'):
                self.originalEncoding = None
        else:
            dammit = UnicodeDammit\
                     (markup, [self.fromEncoding, inDocumentEncoding],
                      smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
            markup = dammit.unicode
            self.originalEncoding = dammit.originalEncoding
            self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
        if markup:
            if self.markupMassage:
                if not hasattr(self.markupMassage, "__iter__"):
                    self.markupMassage = self.MARKUP_MASSAGE
                for fix, m in self.markupMassage:
                    markup = fix.sub(m, markup)
                # TODO: We get rid of markupMassage so that the
                # soup object can be deepcopied later on. Some
                # Python installations can't copy regexes. If anyone
                # was relying on the existence of markupMassage, this
                # might cause problems.
                del(self.markupMassage)
        self.reset()

        SGMLParser.feed(self, markup)
        # Close out any unfinished strings and close all the open tags.
        self.endData()
        while self.currentTag.name != self.ROOT_TAG_NAME:
            self.popTag()

def BeautifulSoup.BeautifulStoneSoup._popToTag (   self,
  name,
  inclusivePop = True 
)
private
Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
the given tag.

Definition at line 1262 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._smartPop(), and BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

    def _popToTag(self, name, inclusivePop=True):
        """Pops the tag stack up to and including the most recent
        instance of the given tag. If inclusivePop is false, pops the tag
        stack up to but *not* including the most recent instance of
        the given tag."""
        #print "Popping to %s" % name
        if name == self.ROOT_TAG_NAME:
            return

        numPops = 0
        mostRecentTag = None
        for i in range(len(self.tagStack)-1, 0, -1):
            if name == self.tagStack[i].name:
                numPops = len(self.tagStack)-i
                break
        if not inclusivePop:
            numPops = numPops - 1

        for i in range(0, numPops):
            mostRecentTag = self.popTag()
        return mostRecentTag

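
The pop-count arithmetic in `_popToTag` can be tried on a plain list of tag names. In this sketch `pop_to_tag` is a hypothetical helper that operates on strings rather than `Tag` objects, and it returns the popped names instead of the most recent tag.

```python
ROOT_TAG_NAME = '[document]'

def pop_to_tag(stack, name, inclusive=True):
    """Pop names off `stack` (root at index 0); return the popped names."""
    if name == ROOT_TAG_NAME:
        return []
    num_pops = 0
    # Search from the innermost open tag outwards, never touching the root.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive:
        num_pops -= 1
    # A zero or negative count pops nothing.
    return [stack.pop() for _ in range(num_pops)]

stack = ['[document]', 'a', 'b', 'c']
print(pop_to_tag(stack, 'a'), stack)
```

If the name is not on the stack the count stays at zero, so nothing is popped in either mode.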
def BeautifulSoup.BeautifulStoneSoup._smartPop (   self,
  name 
)
private
We need to pop up to the previous tag of this type, unless
one of this tag's nesting reset triggers comes between this
tag and the previous tag of this type, OR unless this tag is a
generic nesting trigger and another generic nesting trigger
comes between this tag and the previous tag of this type.

Examples:
 <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
 <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
 <td><tr><td> *<td>* should pop to 'tr', not the first 'td'

Definition at line 1284 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def _smartPop(self, name):

        """We need to pop up to the previous tag of this type, unless
        one of this tag's nesting reset triggers comes between this
        tag and the previous tag of this type, OR unless this tag is a
        generic nesting trigger and another generic nesting trigger
        comes between this tag and the previous tag of this type.

        Examples:
         <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
         <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
         <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

         <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
         <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
         <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
        """

        nestingResetTriggers = self.NESTABLE_TAGS.get(name)
        isNestable = nestingResetTriggers != None
        isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
        popTo = None
        inclusive = True
        for i in range(len(self.tagStack)-1, 0, -1):
            p = self.tagStack[i]
            if (not p or p.name == name) and not isNestable:
                #Non-nestable tags get popped to the top or to their
                #last occurrence.
                popTo = name
                break
            if (nestingResetTriggers is not None
                and p.name in nestingResetTriggers) \
                or (nestingResetTriggers is None and isResetNesting
                    and self.RESET_NESTING_TAGS.has_key(p.name)):

                #If we encounter one of the nesting reset triggers
                #peculiar to this tag, or we encounter another tag
                #that causes nesting to reset, pop up to but not
                #including that tag.
                popTo = p.name
                inclusive = False
                break
            p = p.parent
        if popTo:
            self._popToTag(popTo, inclusive)

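
The docstring examples for `_smartPop` can be reproduced with a small sketch that works on tag names only. The two tables below are hypothetical and loosely modelled on HTML; `BeautifulStoneSoup` itself leaves `NESTABLE_TAGS` and `RESET_NESTING_TAGS` empty, so subclasses would supply the real values. `smart_pop_target` is likewise an illustrative name.

```python
# Hypothetical nesting tables for illustration only.
NESTABLE_TAGS = {
    'table': [],
    'tr': ['table'],
    'td': ['tr'],
    'ul': [],
    'li': ['ul'],
}
RESET_NESTING_TAGS = set(['p', 'table', 'tr', 'td', 'ul', 'li'])

def smart_pop_target(open_names, name):
    """Return (tag_to_pop_to, inclusive), or None if nothing is popped.

    `open_names` lists the open tag names, root first.
    """
    triggers = NESTABLE_TAGS.get(name)
    is_nestable = triggers is not None
    is_reset = name in RESET_NESTING_TAGS
    # Walk from the innermost open tag outwards, skipping the root.
    for open_name in reversed(open_names[1:]):
        if open_name == name and not is_nestable:
            # Non-nestable tag: pop to its last occurrence, inclusive.
            return (name, True)
        if (triggers is not None and open_name in triggers) or \
           (triggers is None and is_reset
                and open_name in RESET_NESTING_TAGS):
            # A reset trigger intervenes: pop up to but not including it.
            return (open_name, False)
    return None

print(smart_pop_target(['[document]', 'p', 'table', 'tr'], 'p'))
```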
def BeautifulSoup.BeautifulStoneSoup._toStringSubclass (   self,
  text,
  subclass 
)
private
Adds a certain piece of text to the tree as a NavigableString
subclass.

Definition at line 1376 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.handle_data().

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_comment(), BeautifulSoup.BeautifulStoneSoup.handle_decl(), BeautifulSoup.BeautifulStoneSoup.handle_pi(), and BeautifulSoup.BeautifulStoneSoup.parse_declaration().

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.endData()
        self.handle_data(text)
        self.endData(subclass)

def BeautifulSoup.BeautifulStoneSoup.convert_charref (   self,
  name 
)
This method fixes a bug in Python's SGMLParser.

Definition at line 1152 of file BeautifulSoup.py.

    def convert_charref(self, name):
        """This method fixes a bug in Python's SGMLParser."""
        try:
            n = int(name)
        except ValueError:
            return
        if not 0 <= n <= 127 : # ASCII ends at 127, not 255
            return
        return self.convert_codepoint(n)

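
A standalone sketch of the range check in `convert_charref` follows. It is written for Python 3 and substitutes `chr(n)` for the `self.convert_codepoint(n)` call in the listing, which is an assumption about what that call produces.

```python
def convert_charref(name):
    """Convert a decimal character reference, ASCII range only (sketch)."""
    try:
        n = int(name)
    except ValueError:
        return None  # not a decimal number
    if not 0 <= n <= 127:
        return None  # outside ASCII: leave it for other handling
    return chr(n)    # assumption: stands in for convert_codepoint(n)

print(convert_charref('65'))
```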
def BeautifulSoup.BeautifulStoneSoup.endData (   self,
  containerClass = NavigableString 
)

Definition at line 1239 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentData, BeautifulSoup.BeautifulStoneSoup.currentTag, reco::helper::VirtualJetProducerHelper.intersection(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.PRESERVE_WHITESPACE_TAGS, BeautifulSoup.BeautifulStoneSoup.previous, BeautifulSoup.BeautifulStoneSoup.STRIP_ASCII_SPACES, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def endData(self, containerClass=NavigableString):
        if self.currentData:
            currentData = u''.join(self.currentData)
            if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
                not set([tag.name for tag in self.tagStack]).intersection(
                    self.PRESERVE_WHITESPACE_TAGS)):
                if '\n' in currentData:
                    currentData = '\n'
                else:
                    currentData = ' '
            self.currentData = []
            if self.parseOnlyThese and len(self.tagStack) <= 1 and \
                   (not self.parseOnlyThese.text or \
                    not self.parseOnlyThese.search(currentData)):
                return
            o = containerClass(currentData)
            o.setup(self.currentTag, self.previous)
            if self.previous:
                self.previous.next = o
            self.previous = o
            self.currentTag.contents.append(o)

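
The whitespace handling at the top of `endData` can be isolated as a small function. This is a sketch for Python 3 strings: `collapse_whitespace` and its parameters are hypothetical, while `STRIP_ASCII_SPACES` is copied from the static attribute documented on this page.

```python
# Translation table that deletes tab, LF, FF, CR and space.
STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None}

def collapse_whitespace(data, open_tags=(), preserve=()):
    """Collapse an all-whitespace string to one newline or one space."""
    only_spaces = data.translate(STRIP_ASCII_SPACES) == ''
    # Leave the string alone inside any whitespace-preserving tag.
    if only_spaces and not set(open_tags).intersection(preserve):
        return '\n' if '\n' in data else ' '
    return data

print(repr(collapse_whitespace('  \n\t ')))
```

Strings that contain any non-whitespace character pass through unchanged, including their leading and trailing spaces.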
def BeautifulSoup.BeautifulStoneSoup.handle_charref (   self,
  ref 
)

Definition at line 1395 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.convertEntities, and BeautifulSoup.BeautifulStoneSoup.handle_data().

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.convertEntities:
            data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)

def BeautifulSoup.BeautifulStoneSoup.handle_comment (   self,
  text 
)

Definition at line 1391 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)

def BeautifulSoup.BeautifulStoneSoup.handle_data (   self,
  data 
)

Definition at line 1373 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.handle_charref(), BeautifulSoup.BeautifulStoneSoup.handle_entityref(), BeautifulSoup.BeautifulStoneSoup.parse_declaration(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def handle_data(self, data):
        self.currentData.append(data)

def BeautifulSoup.BeautifulStoneSoup.handle_decl (   self,
  data 
)

Definition at line 1446 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)

def BeautifulSoup.BeautifulStoneSoup.handle_entityref (   self,
  ref 
)
Handle entity references as data, possibly converting known
HTML and/or XML entity references to the corresponding Unicode
characters.

Definition at line 1403 of file BeautifulSoup.py.

References BeautifulSoup.Tag.convertHTMLEntities, BeautifulSoup.Tag.convertXMLEntities, and BeautifulSoup.BeautifulStoneSoup.handle_data().

    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.convertXMLEntities:
            data = self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.convertHTMLEntities and \
            not self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
            # TODO: We've got a problem here. We're told this is
            # an entity reference, but it's not an XML entity
            # reference or an HTML entity reference. Nonetheless,
            # the logical thing to do is to pass it through as an
            # unrecognized entity reference.
            #
            # Except: when the input is "&carol;" this function
            # will be called with input "carol". When the input is
            # "AT&T", this function will be called with input
            # "T". We have no way of knowing whether a semicolon
            # was present originally, so we don't know whether
            # this is an unknown entity or just a misplaced
            # ampersand.
            #
            # The more common case is a misplaced ampersand, so I
            # escape the ampersand and omit the trailing semicolon.
            data = "&amp;%s" % ref
        if not data:
            # This case is different from the one above, because we
            # haven't already gone through a supposedly comprehensive
            # mapping of entities to Unicode characters. We might not
            # have gone through any mapping at all. So the chances are
            # very high that this is a real entity, and not a
            # misplaced ampersand.
            data = "&%s;" % ref
        self.handle_data(data)

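
The decision cascade in `handle_entityref` can be exercised on its own. The sketch below targets Python 3, where `name2codepoint` lives in `html.entities`. `resolve_entity` is a hypothetical name, and the contents of `XML_ENTITIES_TO_SPECIAL_CHARS` are an assumption, since that table is defined elsewhere and not shown on this page.

```python
from html.entities import name2codepoint

# Assumed contents; the real table is inherited and not listed here.
XML_ENTITIES_TO_SPECIAL_CHARS = {
    "apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">",
}

def resolve_entity(ref, convert_html=False, convert_xml=False):
    """Return the text that would be appended for `&ref;` (sketch only)."""
    data = None
    if convert_html:
        codepoint = name2codepoint.get(ref)
        if codepoint is not None:
            data = chr(codepoint)
    if not data and convert_xml:
        data = XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)
    if not data and convert_html and \
            not XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
        # Unknown after a full HTML lookup: treat as a stray ampersand.
        data = "&amp;%s" % ref
    if not data:
        # No comprehensive lookup was done: assume a real entity.
        data = "&%s;" % ref
    return data

print(resolve_entity('T', convert_html=True))
```

The `AT&T` case from the source comments corresponds to the call shown: with HTML conversion on, the unknown name `T` comes back with its ampersand escaped and no trailing semicolon.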
def BeautifulSoup.BeautifulStoneSoup.handle_pi (   self,
  text 
)
Handle a processing instruction as a ProcessingInstruction
object, possibly one with a %SOUP-ENCODING% slot into which an
encoding will be plugged later.

Definition at line 1383 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass().

    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)

def BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag (   self,
  name 
)
Returns true iff the given string is the name of a
self-closing tag according to this parser.

Definition at line 1208 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)

def BeautifulSoup.BeautifulStoneSoup.parse_declaration (   self,
  i 
)
Treat a bogus SGML declaration as raw data. Treat a CDATA
declaration as a CData object.

Definition at line 1450 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._toStringSubclass(), BeautifulSoup.BeautifulStoneSoup.handle_data(), and DQMNet::Object.rawdata.

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j

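
The CDATA branch of `parse_declaration` is plain index arithmetic and can be checked in isolation. `split_cdata` is a hypothetical helper that returns the section body together with the index at which parsing would resume.

```python
def split_cdata(rawdata, i):
    """Return (cdata_body, resume_index) for a CDATA section at `i`."""
    if rawdata[i:i + 9] != '<![CDATA[':
        raise ValueError('no CDATA section starts at index %d' % i)
    k = rawdata.find(']]>', i)
    if k == -1:
        # Unterminated section: take everything up to the end of input.
        k = len(rawdata)
    return rawdata[i + 9:k], k + 3

print(split_cdata('x<![CDATA[a<b]]>y', 1))
```

For an unterminated section the returned index points past the end of the input, as it does in the listing.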
def BeautifulSoup.BeautifulStoneSoup.popTag (   self)

Definition at line 1224 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def popTag(self):
        tag = self.tagStack.pop()

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

def BeautifulSoup.BeautifulStoneSoup.pushTag (   self,
  tag 
)

Definition at line 1232 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

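
How `pushTag` and `popTag` keep `tagStack` and `currentTag` in step can be seen with a toy model. `Node` and `TagStack` are hypothetical stand-ins for `Tag` and the parser, and the two methods follow the listings on this page.

```python
class Node(object):
    """Hypothetical stand-in for Tag: a name and a list of children."""
    def __init__(self, name):
        self.name = name
        self.contents = []

class TagStack(object):
    def __init__(self, root):
        self.tagStack = []
        self.currentTag = None
        self.pushTag(root)  # the root is pushed first, as in reset()

    def pushTag(self, tag):
        # Attach the new tag to the innermost open tag, then open it.
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def popTag(self):
        self.tagStack.pop()
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        # As in the listing, the new current tag is returned.
        return self.currentTag

root = Node('[document]')
stack = TagStack(root)
stack.pushTag(Node('a'))
print(stack.popTag().name)
```

Note that, as listed, `popTag` returns the tag that becomes current after the pop rather than the tag that was removed.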
def BeautifulSoup.BeautifulStoneSoup.reset (   self)

Definition at line 1214 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME.

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        SGMLParser.reset(self)
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

def BeautifulSoup.BeautifulStoneSoup.unknown_endtag (   self,
  name 
)

Definition at line 1360 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.literal, and BeautifulSoup.BeautifulStoneSoup.quoteStack.

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

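
The quote-stack check in `unknown_endtag` decides whether an end tag is real markup or literal text inside a quoting tag. A minimal sketch of that decision follows. `classify_end_tag` is a hypothetical helper that reports the outcome as a tuple instead of mutating a parse tree, and the `script` tag in the usage line is only an example of a tag a subclass might list in `QUOTE_TAGS`.

```python
def classify_end_tag(quote_stack, name):
    """Return ('data', text) or ('close', name); may pop `quote_stack`."""
    if quote_stack and quote_stack[-1] != name:
        # Inside a quoting tag, any other end tag is kept as text.
        return ('data', '</%s>' % name)
    if quote_stack and quote_stack[-1] == name:
        # The quoting tag itself is closing: leave literal mode.
        quote_stack.pop()
    return ('close', name)

quotes = ['script']
print(classify_end_tag(quotes, 'b'))
print(classify_end_tag(quotes, 'script'), quotes)
```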
def BeautifulSoup.BeautifulStoneSoup.unknown_starttag (   self,
  name,
  attrs,
  selfClosing = 0 
)

Definition at line 1330 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag(), join(), BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.previous, BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.QUOTE_TAGS, BeautifulSoup.BeautifulStoneSoup.quoteStack, and BeautifulSoup.BeautifulStoneSoup.tagStack.

    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join([' %s="%s"' % (x, y) for x, y in attrs])
            self.handle_data('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag

Member Data Documentation

BeautifulSoup.BeautifulStoneSoup.ALL_ENTITIES = XHTML_ENTITIES
static

Definition at line 1075 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.convertEntities

Definition at line 1114 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.handle_charref().

BeautifulSoup.BeautifulStoneSoup.convertHTMLEntities

Definition at line 1124 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.convertXMLEntities

Definition at line 1123 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.currentData

Definition at line 1218 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.currentTag

Definition at line 1219 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.pushTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding

Definition at line 1174 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit._detectEncoding(), and BeautifulSoup.BeautifulSoup.start_meta().

BeautifulSoup.BeautifulStoneSoup.escapeUnrecognizedEntities

Definition at line 1125 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.fromEncoding

Definition at line 1112 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.hidden

Definition at line 1216 of file BeautifulSoup.py.

string BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES = "html"
static

Definition at line 1071 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulSoup.__init__().

BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags

Definition at line 1139 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.literal

Definition at line 1357 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

BeautifulSoup.BeautifulStoneSoup.markup

Definition at line 1144 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit._convertFrom(), and BeautifulSoup.BeautifulStoneSoup._feed().

list BeautifulSoup.BeautifulStoneSoup.MARKUP_MASSAGE
static
Initial value:
    MARKUP_MASSAGE = [(re.compile('(<[^<>]*)/>'),
                       lambda x: x.group(1) + ' />'),
                      (re.compile('<!\s+([^<>]*)>'),
                       lambda x: '<!' + x.group(1) + '>')
                      ]

Definition at line 1063 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.markupMassage

Definition at line 1145 of file BeautifulSoup.py.

dictionary BeautifulSoup.BeautifulStoneSoup.NESTABLE_TAGS = {}
static

Definition at line 1058 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.originalEncoding

Definition at line 1167 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.parseOnlyThese

Definition at line 1111 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

list BeautifulSoup.BeautifulStoneSoup.PRESERVE_WHITESPACE_TAGS = []
static

Definition at line 1061 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.previous

Definition at line 1258 of file BeautifulSoup.py.

Referenced by BeautifulSoup.PageElement._invert(), BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

dictionary BeautifulSoup.BeautifulStoneSoup.QUOTE_TAGS = {}
static

Definition at line 1060 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

BeautifulSoup.BeautifulStoneSoup.quoteStack

Definition at line 1221 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

dictionary BeautifulSoup.BeautifulStoneSoup.RESET_NESTING_TAGS = {}
static

Definition at line 1059 of file BeautifulSoup.py.

string BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME = u'[document]'
static

Definition at line 1069 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.reset().

dictionary BeautifulSoup.BeautifulStoneSoup.SELF_CLOSING_TAGS = {}
static

Definition at line 1057 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.smartQuotesTo

Definition at line 1113 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit._convertFrom(), and BeautifulSoup.UnicodeDammit._subMSChar().

dictionary BeautifulSoup.BeautifulStoneSoup.STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }
static

Definition at line 1081 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.tagStack

Definition at line 1220 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulSOAP.popTag(), BeautifulSoup.BeautifulStoneSoup.pushTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

string BeautifulSoup.BeautifulStoneSoup.XHTML_ENTITIES = "xhtml"
static

Definition at line 1073 of file BeautifulSoup.py.

string BeautifulSoup.BeautifulStoneSoup.XML_ENTITIES = "xml"
static

Definition at line 1072 of file BeautifulSoup.py.