CMS 3D CMS Logo

BeautifulSoup.BeautifulStoneSoup Class Reference
Inheritance diagram for BeautifulSoup.BeautifulStoneSoup:
Base classes: BeautifulSoup.Tag, BeautifulSoup.PageElement.
Derived classes (direct and indirect): BeautifulSoup.BeautifulSOAP, BeautifulSoup.BeautifulSoup, BeautifulSoup.RobustXMLParser, BeautifulSoup.SimplifyingSOAPParser, BeautifulSoup.ICantBelieveItsBeautifulSoup, BeautifulSoup.MinimalSoup, BeautifulSoup.RobustHTMLParser.

Public Member Functions

def __init__
def endData
def extractCharsetFromMeta
def handle_data
def isSelfClosingTag
def popTag
def pushTag
def reset
def unknown_endtag
def unknown_starttag

- Public Member Functions inherited from BeautifulSoup.PageElement
def append
def extract
def findAllNext
def findAllPrevious
def findNext
def findNextSibling
def findNextSiblings
def findParent
def findParents
def findPrevious
def findPreviousSibling
def findPreviousSiblings
def insert
def nextGenerator
def nextSiblingGenerator
def parentGenerator
def previousGenerator
def previousSiblingGenerator
def replaceWith
def setup
def substituteEncoding
def toEncoding

Public Attributes

 builder
 
 convertEntities
 
 convertHTMLEntities
 
 convertXMLEntities
 
 currentData
 
 currentTag
 
 declaredHTMLEncoding
 
 escapeUnrecognizedEntities
 
 fromEncoding
 
 hidden
 
 instanceSelfClosingTags
 
 literal
 
 markup
 
 markupMassage
 
 originalEncoding
 
 parseOnlyThese
 
 previous
 
 quoteStack
 
 smartQuotesTo
 
 tagStack
 
- Public Attributes inherited from BeautifulSoup.PageElement
 next
 
 nextSibling
 
 parent
 
 previous
 
 previousSibling
 

Static Public Attributes

 ALL_ENTITIES = XHTML_ENTITIES
 
string HTML_ENTITIES = "html"
 
list MARKUP_MASSAGE
 
dictionary NESTABLE_TAGS = {}
 
list PRESERVE_WHITESPACE_TAGS = []
 
dictionary QUOTE_TAGS = {}
 
dictionary RESET_NESTING_TAGS = {}
 
string ROOT_TAG_NAME = u'[document]'
 
dictionary SELF_CLOSING_TAGS = {}
 
dictionary STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }
 
string XHTML_ENTITIES = "xhtml"
 
string XML_ENTITIES = "xml"
 
- Static Public Attributes inherited from BeautifulSoup.PageElement
 fetchNextSiblings = findNextSiblings
 
 fetchParents = findParents
 
 fetchPrevious = findAllPrevious
 
 fetchPreviousSiblings = findPreviousSiblings
 

Private Member Functions

def _feed
def _popToTag
def _smartPop

Detailed Description

This class contains the basic parser and search code. It defines
a parser that knows nothing about tag behavior except for the
following:

  You can't close a tag without closing all the tags it encloses.
  That is, "<foo><bar></foo>" actually means
  "<foo><bar></bar></foo>".

[Another possible explanation is "<foo><bar /></foo>", but since
this class defines no SELF_CLOSING_TAGS, it will never use that
explanation.]

This class is useful for parsing XML or made-up markup languages,
or when BeautifulSoup makes an assumption counter to what you were
expecting.
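The closing rule above ("you can't close a tag without closing all the tags it encloses") can be illustrated with a small stack-based tree builder. This is a minimal sketch on top of Python 3's standard `html.parser`, not Beautiful Soup's own code. The names `StackClosingParser`, `serialize` and `parse` are hypothetical. An end tag closes the most recent open tag of that name together with everything opened after it. An end tag with no matching open tag is simply ignored.

```python
from html.parser import HTMLParser


class StackClosingParser(HTMLParser):
    """Hypothetical sketch: an end tag also closes every tag it encloses."""

    def __init__(self):
        super().__init__()
        self.root = ["[document]", []]  # a node is [name, children]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Search the open tags from the top down, never touching the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]  # close it and everything inside it
                return
        # No matching open tag: ignore the stray end tag.

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def serialize(node):
    """Write a node back out with explicit end tags."""
    if isinstance(node, str):
        return node
    name, children = node
    return "<%s>%s</%s>" % (name, "".join(serialize(c) for c in children), name)


def parse(markup):
    parser = StackClosingParser()
    parser.feed(markup)
    parser.close()
    return "".join(serialize(c) for c in parser.root[1])
```

With this sketch, `parse("<foo><bar></foo>")` yields `<foo><bar></bar></foo>`, the interpretation described above. Like BeautifulStoneSoup, it has no notion of self-closing tags, so it never produces `<foo><bar /></foo>`.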

Definition at line 1120 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.BeautifulStoneSoup.__init__ (   self,
  markup = "",
  parseOnlyThese = None,
  fromEncoding = None,
  markupMassage = True,
  smartQuotesTo = XML_ENTITIES,
  convertEntities = None,
  selfClosingTags = None,
  isHTML = False,
  builder = HTMLParserBuilder 
)
The Soup object is initialized as the 'root tag', and the
provided markup (which can be a string or a file-like object)
is fed into the underlying parser.

HTMLParser will process most bad HTML, and the BeautifulSoup
class has some tricks for dealing with some HTML that kills
HTMLParser, but Beautiful Soup can nonetheless choke or lose data
if your data uses self-closing tags or declarations
incorrectly.

By default, Beautiful Soup uses regexes to sanitize input,
avoiding the vast majority of these problems. If the problems
don't apply to you, pass in False for markupMassage, and
you'll get better performance.

The default parser massage techniques fix the two most common
instances of invalid HTML that choke HTMLParser:

 <br/> (No space between name of closing tag and tag close)
 <! --Comment--> (Extraneous whitespace in declaration)

You can pass in a custom list of (RE object, replace method)
tuples to get Beautiful Soup to scrub your input the way you
want.

Definition at line 1167 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._feed(), BeautifulSoup.BeautifulStoneSoup.builder, BeautifulSoup.buildTagMap(), BeautifulSoup.BeautifulStoneSoup.convertEntities, BeautifulSoup.BeautifulStoneSoup.convertHTMLEntities, BeautifulSoup.BeautifulStoneSoup.convertXMLEntities, BeautifulSoup.BeautifulStoneSoup.escapeUnrecognizedEntities, BeautifulSoup.BeautifulStoneSoup.fromEncoding, BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES, BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags, BeautifulSoup.BeautifulStoneSoup.markup, BeautifulSoup.BeautifulStoneSoup.markupMassage, BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.reset(), BeautifulSoup.BeautifulStoneSoup.smartQuotesTo, BeautifulSoup.BeautifulStoneSoup.XHTML_ENTITIES, and BeautifulSoup.BeautifulStoneSoup.XML_ENTITIES.

def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
             markupMassage=True, smartQuotesTo=XML_ENTITIES,
             convertEntities=None, selfClosingTags=None, isHTML=False,
             builder=HTMLParserBuilder):
    """The Soup object is initialized as the 'root tag', and the
    provided markup (which can be a string or a file-like object)
    is fed into the underlying parser.

    HTMLParser will process most bad HTML, and the BeautifulSoup
    class has some tricks for dealing with some HTML that kills
    HTMLParser, but Beautiful Soup can nonetheless choke or lose data
    if your data uses self-closing tags or declarations
    incorrectly.

    By default, Beautiful Soup uses regexes to sanitize input,
    avoiding the vast majority of these problems. If the problems
    don't apply to you, pass in False for markupMassage, and
    you'll get better performance.

    The default parser massage techniques fix the two most common
    instances of invalid HTML that choke HTMLParser:

     <br/> (No space between name of closing tag and tag close)
     <! --Comment--> (Extraneous whitespace in declaration)

    You can pass in a custom list of (RE object, replace method)
    tuples to get Beautiful Soup to scrub your input the way you
    want."""

    self.parseOnlyThese = parseOnlyThese
    self.fromEncoding = fromEncoding
    self.smartQuotesTo = smartQuotesTo
    self.convertEntities = convertEntities
    # Set the rules for how we'll deal with the entities we
    # encounter
    if self.convertEntities:
        # It doesn't make sense to convert encoded characters to
        # entities even while you're converting entities to Unicode.
        # Just convert it all to Unicode.
        self.smartQuotesTo = None
        if convertEntities == self.HTML_ENTITIES:
            self.convertXMLEntities = False
            self.convertHTMLEntities = True
            self.escapeUnrecognizedEntities = True
        elif convertEntities == self.XHTML_ENTITIES:
            self.convertXMLEntities = True
            self.convertHTMLEntities = True
            self.escapeUnrecognizedEntities = False
        elif convertEntities == self.XML_ENTITIES:
            self.convertXMLEntities = True
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False
    else:
        self.convertXMLEntities = False
        self.convertHTMLEntities = False
        self.escapeUnrecognizedEntities = False

    self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
    self.builder = builder(self)
    self.reset()

    if hasattr(markup, 'read'):        # It's a file-type object.
        markup = markup.read()
    self.markup = markup
    self.markupMassage = markupMassage
    try:
        self._feed(isHTML=isHTML)
    except StopParsing:
        pass
    self.markup = None                 # The markup can now be GCed.
    self.builder = None                # So can the builder.
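The constructor's `convertEntities` branch amounts to a lookup table from the requested mode to three flags. The standalone function below restates that table so the combinations are easy to compare. It is a sketch: the name `entity_rules` is hypothetical, and nothing here calls Beautiful Soup. The three mode strings are the static attributes `HTML_ENTITIES`, `XHTML_ENTITIES` and `XML_ENTITIES` listed above.

```python
HTML_ENTITIES = "html"
XHTML_ENTITIES = "xhtml"
XML_ENTITIES = "xml"


def entity_rules(convert_entities, smart_quotes_to=XML_ENTITIES):
    """Hypothetical restatement of the flags __init__ sets for convertEntities."""
    if convert_entities:
        # Entities are being converted to Unicode, so smart quotes are
        # not turned back into entities.
        smart_quotes_to = None
        # (convertXMLEntities, convertHTMLEntities, escapeUnrecognizedEntities)
        flags = {
            HTML_ENTITIES: (False, True, True),
            XHTML_ENTITIES: (True, True, False),
            XML_ENTITIES: (True, False, False),
        }.get(convert_entities)
    else:
        flags = (False, False, False)
    if flags is None:
        # An unrecognized truthy value: the constructor sets none of the flags.
        return {"smartQuotesTo": smart_quotes_to}
    xml, html, escape = flags
    return {
        "smartQuotesTo": smart_quotes_to,
        "convertXMLEntities": xml,
        "convertHTMLEntities": html,
        "escapeUnrecognizedEntities": escape,
    }
```

For example, `entity_rules("html")` turns on HTML entity conversion and escaping of unrecognized entities, and clears `smartQuotesTo`. The last case shows a gap in the original: an unknown `convertEntities` value leaves the three attributes undefined on the instance.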

Member Function Documentation

def BeautifulSoup.BeautifulStoneSoup._feed (   self,
  inDocumentEncoding = None,
  isHTML = False 
)
private

Definition at line 1236 of file BeautifulSoup.py.

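References BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding, BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.fromEncoding, BeautifulSoup.isList(), BeautifulSoup.BeautifulStoneSoup.markup, BeautifulSoup.BeautifulStoneSoup.MARKUP_MASSAGE, BeautifulSoup.BeautifulStoneSoup.markupMassage, BeautifulSoup.BeautifulStoneSoup.originalEncoding, BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME, and BeautifulSoup.BeautifulStoneSoup.smartQuotesTo.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.BeautifulSoup.extractCharsetFromMeta().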

def _feed(self, inDocumentEncoding=None, isHTML=False):
    # Convert the document to Unicode.
    markup = self.markup
    if isinstance(markup, unicode):
        if not hasattr(self, 'originalEncoding'):
            self.originalEncoding = None
    else:
        dammit = UnicodeDammit\
                 (markup, [self.fromEncoding, inDocumentEncoding],
                  smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
        markup = dammit.unicode
        self.originalEncoding = dammit.originalEncoding
        self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
    if markup:
        if self.markupMassage:
            if not isList(self.markupMassage):
                self.markupMassage = self.MARKUP_MASSAGE
            for fix, m in self.markupMassage:
                markup = fix.sub(m, markup)
            # TODO: We get rid of markupMassage so that the
            # soup object can be deepcopied later on. Some
            # Python installations can't copy regexes. If anyone
            # was relying on the existence of markupMassage, this
            # might cause problems.
            del(self.markupMassage)
    self.builder.reset()

    self.builder.feed(markup)
    # Close out any unfinished strings and close all the open tags.
    self.endData()
    while self.currentTag.name != self.ROOT_TAG_NAME:
        self.popTag()
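The massage step in `_feed` applies each `(regex, replacement)` pair in turn with `fix.sub(m, markup)`, so any replacement that `re.sub` accepts (a string or a function of the match) works. The sketch below shows that loop with two hypothetical rules modeled on the two fixes the constructor docstring describes. The patterns are illustrative and are not guaranteed to match the library's `MARKUP_MASSAGE` exactly.

```python
import re

# Hypothetical rules modeled on the constructor docstring; not copied
# from MARKUP_MASSAGE.
MASSAGE = [
    # "<br/>" -> "<br />": put a space before the self-closing slash.
    (re.compile(r"(<[^<>]*)/>"), lambda m: m.group(1) + " />"),
    # "<! --Comment-->" -> "<!--Comment-->": drop whitespace after "<!".
    (re.compile(r"<!\s+([^<>]*)>"), lambda m: "<!" + m.group(1) + ">"),
]


def massage(markup, fixes=MASSAGE):
    """Apply each (regex, replacement) pair in order, like the loop in _feed."""
    for fix, replacement in fixes:
        markup = fix.sub(replacement, markup)
    return markup
```

A custom list passed as `markupMassage` replaces these defaults. For example, `massage(text, [(re.compile(r"&nbsp;"), " ")])` would apply only that one rule. Passing `markupMassage=False` skips the step entirely.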
def BeautifulSoup.BeautifulStoneSoup._popToTag (   self,
  name,
  inclusivePop = True 
)
private
Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
the given tag.

Definition at line 1329 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._smartPop(), and BeautifulSoup.BeautifulStoneSoup.unknown_endtag().

def _popToTag(self, name, inclusivePop=True):
    """Pops the tag stack up to and including the most recent
    instance of the given tag. If inclusivePop is false, pops the tag
    stack up to but *not* including the most recent instance of
    the given tag."""
    #print "Popping to %s" % name
    if name == self.ROOT_TAG_NAME:
        return

    numPops = 0
    mostRecentTag = None
    for i in range(len(self.tagStack)-1, 0, -1):
        if name == self.tagStack[i].name:
            numPops = len(self.tagStack)-i
            break
    if not inclusivePop:
        numPops = numPops - 1

    for i in range(0, numPops):
        mostRecentTag = self.popTag()
    return mostRecentTag
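The counting logic of `_popToTag` can be checked on a plain list of tag names with the root at index 0. The sketch below is hypothetical (`pop_to_tag` is not a library function). It mirrors the search from the top of the stack down to, but excluding, the root. It also mirrors the "one fewer pop" rule for `inclusivePop=False`. When the tag is not open, nothing is popped, because `range()` of 0 or -1 is empty. One deliberate difference: the sketch returns the names it popped, whereas `_popToTag` returns whatever `popTag()` returned last.

```python
ROOT_TAG_NAME = "[document]"


def pop_to_tag(stack, name, inclusive_pop=True):
    """Hypothetical sketch of _popToTag on a list of tag names (root first)."""
    if name == ROOT_TAG_NAME:
        return []
    num_pops = 0
    # Find the most recent open tag with this name; index 0 is the root.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive_pop:
        num_pops -= 1  # leave the named tag itself open
    popped = []
    for _ in range(num_pops):  # empty when num_pops is 0 or -1
        popped.append(stack.pop())
    return popped
```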
def BeautifulSoup.BeautifulStoneSoup._smartPop (   self,
  name 
)
private
We need to pop up to the previous tag of this type, unless
one of this tag's nesting reset triggers comes between this
tag and the previous tag of this type, OR unless this tag is a
generic nesting trigger and another generic nesting trigger
comes between this tag and the previous tag of this type.

Examples:
 <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
 <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
 <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

 <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
 <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'.
 <td><tr><td> *<td>* should pop to 'tr', not the first 'td'.

Definition at line 1351 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

def _smartPop(self, name):

    """We need to pop up to the previous tag of this type, unless
    one of this tag's nesting reset triggers comes between this
    tag and the previous tag of this type, OR unless this tag is a
    generic nesting trigger and another generic nesting trigger
    comes between this tag and the previous tag of this type.

    Examples:
     <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
     <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
     <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

     <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
     <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'.
     <td><tr><td> *<td>* should pop to 'tr', not the first 'td'.
    """

    nestingResetTriggers = self.NESTABLE_TAGS.get(name)
    isNestable = nestingResetTriggers != None
    isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
    popTo = None
    inclusive = True
    for i in range(len(self.tagStack)-1, 0, -1):
        p = self.tagStack[i]
        if (not p or p.name == name) and not isNestable:
            #Non-nestable tags get popped to the top or to their
            #last occurrence.
            popTo = name
            break
        if (nestingResetTriggers != None
            and p.name in nestingResetTriggers) \
            or (nestingResetTriggers == None and isResetNesting
                and self.RESET_NESTING_TAGS.has_key(p.name)):

            #If we encounter one of the nesting reset triggers
            #peculiar to this tag, or we encounter another tag
            #that causes nesting to reset, pop up to but not
            #including that tag.
            popTo = p.name
            inclusive = False
            break
        p = p.parent
    if popTo:
        self._popToTag(popTo, inclusive)
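In BeautifulStoneSoup itself, `NESTABLE_TAGS` and `RESET_NESTING_TAGS` are empty (see Static Public Attributes), so `_smartPop` only does something once a subclass fills those tables. The sketch below replays the docstring examples on a list of tag names using small hypothetical tables loosely shaped like HTML rules. These tables are far smaller than anything a real subclass defines. Because the first matching tag found from the top is also the most recent tag of that name, the sketch truncates the stack at that index directly instead of calling a separate pop-to-tag helper.

```python
# Hypothetical, illustrative tables: tag -> tags that reset its nesting.
NESTABLE_TAGS = {
    "li": ["ul", "ol"],
    "tr": ["table", "tbody", "tfoot", "thead"],
    "td": ["tr"],
    "table": [],
    "ul": [],
    "ol": [],
}
RESET_NESTING_TAGS = {"p", "table", "tr", "td", "ul", "ol", "li"}


def smart_pop(stack, name):
    """Hypothetical sketch of _smartPop: close tags before opening `name`."""
    triggers = NESTABLE_TAGS.get(name)
    is_nestable = triggers is not None
    is_reset_nesting = name in RESET_NESTING_TAGS
    for i in range(len(stack) - 1, 0, -1):  # index 0 is the root
        p = stack[i]
        if p == name and not is_nestable:
            # Non-nestable tag: close its previous occurrence (inclusive).
            del stack[i:]
            return
        if (is_nestable and p in triggers) or (
            not is_nestable and is_reset_nesting and p in RESET_NESTING_TAGS
        ):
            # A nesting reset trigger: pop up to, but not including, it.
            del stack[i + 1:]
            return
```

For `<p>Foo<b>Bar *<p>*`, the stack `["[document]", "p", "b"]` is popped back to the root. For `<li><ul><li> *<li>*`, only the inner `li` is closed, leaving `["[document]", "li", "ul"]`.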
def BeautifulSoup.BeautifulStoneSoup.endData (   self,
  containerClass = NavigableString 
)

Definition at line 1306 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentData, BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.PRESERVE_WHITESPACE_TAGS, BeautifulSoup.PageElement.previous, BeautifulSoup.BeautifulStoneSoup.STRIP_ASCII_SPACES, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._feed(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

def endData(self, containerClass=NavigableString):
    if self.currentData:
        currentData = u''.join(self.currentData)
        if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
            not set([tag.name for tag in self.tagStack]).intersection(
                self.PRESERVE_WHITESPACE_TAGS)):
            if '\n' in currentData:
                currentData = '\n'
            else:
                currentData = ' '
        self.currentData = []
        if self.parseOnlyThese and len(self.tagStack) <= 1 and \
               (not self.parseOnlyThese.text or \
                not self.parseOnlyThese.search(currentData)):
            return
        o = containerClass(currentData)
        o.setup(self.currentTag, self.previous)
        if self.previous:
            self.previous.next = o
        self.previous = o
        self.currentTag.contents.append(o)
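`handle_data` only appends text to `currentData`. `endData` later joins the buffered chunks and, when the run is pure ASCII whitespace and no open tag preserves whitespace, collapses it to a single newline or space. The sketch below isolates that step on plain lists. `flush_text` is a hypothetical name. The `parseOnlyThese` filtering and the linking of the new string node are omitted. The default empty `preserve_whitespace_tags` matches BeautifulStoneSoup's empty `PRESERVE_WHITESPACE_TAGS`. Passing `("pre",)` in the last example is only an illustration of what a subclass might use.

```python
# Same table as STRIP_ASCII_SPACES: tab, LF, FF, CR and space map to None,
# so str.translate deletes them.
STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None}


def flush_text(chunks, open_tags, preserve_whitespace_tags=()):
    """Hypothetical sketch of the text-joining step in endData."""
    if not chunks:
        return None  # endData does nothing when no text was buffered
    text = "".join(chunks)
    if (text.translate(STRIP_ASCII_SPACES) == ""
            and not set(open_tags).intersection(preserve_whitespace_tags)):
        # Whitespace-only run outside a preserving tag: collapse it.
        text = "\n" if "\n" in text else " "
    return text
```

For example, `flush_text(["  ", "\n", "\t"], ["[document]", "p"])` collapses to `"\n"`. Text that contains anything other than those five whitespace characters is returned unchanged.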
def BeautifulSoup.BeautifulStoneSoup.extractCharsetFromMeta(self, attrs)

Definition at line 1443 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def extractCharsetFromMeta(self, attrs):
        self.unknown_starttag('meta', attrs)
def BeautifulSoup.BeautifulStoneSoup.handle_data(self, data)

Definition at line 1440 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def handle_data(self, data):
        self.currentData.append(data)
def BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag(self, name)

Returns true iff the given string is the name of a
self-closing tag according to this parser.

Definition at line 1269 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)
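isSelfClosingTag() relies on dict.has_key(), which exists only in Python 2 and was removed in Python 3. The following is a minimal Python 3 sketch of the same two-level lookup (class-wide table first, then per-instance additions) using the in operator. TagTables and its constructor argument are hypothetical; in the real class, instanceSelfClosingTags is set up in __init__() (presumably from a constructor argument), and BeautifulStoneSoup leaves SELF_CLOSING_TAGS empty.

```python
class TagTables:
    """Hypothetical stand-in for the two lookup tables used by isSelfClosingTag()."""

    # Class-wide table; BeautifulStoneSoup itself leaves this empty.
    SELF_CLOSING_TAGS = {}

    def __init__(self, extra_self_closing=()):
        # Per-instance additions keyed by tag name; the values are unused.
        self.instanceSelfClosingTags = dict.fromkeys(extra_self_closing)

    def isSelfClosingTag(self, name):
        # Python 3 replacement for d.has_key(name) is `name in d`.
        return name in self.SELF_CLOSING_TAGS or name in self.instanceSelfClosingTags


tables = TagTables(extra_self_closing=["br", "hr"])
print(tables.isSelfClosingTag("br"))  # True
print(tables.isSelfClosingTag("p"))   # False
```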
def BeautifulSoup.BeautifulStoneSoup.popTag(self)

Definition at line 1285 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup._feed(), BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def popTag(self):
        tag = self.tagStack.pop()
        # Tags with just one string-owning child get the child as a
        # 'string' property, so that soup.tag.string is shorthand for
        # soup.tag.contents[0]
        if len(self.currentTag.contents) == 1 and \
           isinstance(self.currentTag.contents[0], NavigableString):
            self.currentTag.string = self.currentTag.contents[0]

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag
def BeautifulSoup.BeautifulStoneSoup.pushTag(self, tag)

Definition at line 1299 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentTag, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.reset(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]
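pushTag(), popTag() and reset() together keep tagStack equal to the chain of currently open elements, with currentTag always its top. The Python 3 sketch below reproduces that bookkeeping, including the one-string-child .string shortcut from popTag(). It is an illustration of the technique, not the library's code: Node is a hypothetical stand-in for Tag, and a plain str stands in for NavigableString.

```python
class Node:
    """Hypothetical stand-in for Tag: a name, a child list and an optional .string."""

    def __init__(self, name):
        self.name = name
        self.contents = []
        self.string = None


class TreeBuilder:
    ROOT_TAG_NAME = "[document]"

    def __init__(self):
        # Like reset(): start from an empty stack, then push the root node.
        self.root = Node(self.ROOT_TAG_NAME)
        self.currentTag = None
        self.tagStack = []
        self.pushTag(self.root)

    def pushTag(self, tag):
        # Attach the new tag to its parent (if any) and make it current.
        if self.currentTag is not None:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def popTag(self):
        # currentTag is still the tag being closed at this point.
        self.tagStack.pop()
        if len(self.currentTag.contents) == 1 and isinstance(self.currentTag.contents[0], str):
            self.currentTag.string = self.currentTag.contents[0]
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag


builder = TreeBuilder()
builder.pushTag(Node("b"))
builder.currentTag.contents.append("bold")  # what endData() does for text
builder.popTag()
print(builder.root.contents[0].string)      # bold
print(builder.currentTag is builder.root)   # True
```

Pushing the root in the constructor, as reset() does with the soup object itself, means every later tag always has a parent to attach to.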
def BeautifulSoup.BeautifulStoneSoup.reset(self)

Definition at line 1275 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.currentData, BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.hidden, BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.quoteStack, BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        self.builder.reset()
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)
def BeautifulSoup.BeautifulStoneSoup.unknown_endtag(self, name)

Definition at line 1427 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.HTMLParserBuilder.handle_data(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.literal, and BeautifulSoup.BeautifulStoneSoup.quoteStack.

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)
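The quoteStack checks keep markup inside a quote tag (a name listed in QUOTE_TAGS; the HTML subclass uses it for elements such as script) from being parsed as tags. The sketch below isolates only that decision in Python 3. QuoteTracker and its start()/end() methods are hypothetical, and the real calls to endData(), _popToTag() and pushTag() are replaced by recorded events so the behaviour can be inspected.

```python
class QuoteTracker:
    """Hypothetical model of the quoteStack logic in unknown_starttag/unknown_endtag."""

    QUOTE_TAGS = {"script": None}

    def __init__(self):
        self.quoteStack = []
        self.literal = False
        self.events = []  # what the real parser would do, recorded for inspection

    def start(self, name):
        if self.quoteStack:
            # Inside a quote tag, a start tag is just character data.
            self.events.append(("text", "<%s>" % name))
            return
        self.events.append(("open", name))
        if name in self.QUOTE_TAGS:
            self.quoteStack.append(name)
            self.literal = True

    def end(self, name):
        if self.quoteStack and self.quoteStack[-1] != name:
            # Only the end tag matching the innermost quote tag is real.
            self.events.append(("text", "</%s>" % name))
            return
        self.events.append(("close", name))  # stands in for endData() + _popToTag()
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = len(self.quoteStack) > 0


t = QuoteTracker()
t.start("script")
t.start("b")
t.end("b")
t.end("script")
print(t.events)
# [('open', 'script'), ('text', '<b>'), ('text', '</b>'), ('close', 'script')]
print(t.literal)  # False
```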
def BeautifulSoup.BeautifulStoneSoup.unknown_starttag(self, name, attrs, selfClosing = 0)

Definition at line 1397 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.currentTag, BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.HTMLParserBuilder.handle_data(), BeautifulSoup.BeautifulStoneSoup.handle_data(), BeautifulSoup.BeautifulStoneSoup.isSelfClosingTag(), BeautifulSoup.BeautifulStoneSoup.literal, BeautifulSoup.BeautifulStoneSoup.parseOnlyThese, BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.PageElement.previous, BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.QUOTE_TAGS, BeautifulSoup.BeautifulStoneSoup.quoteStack, and BeautifulSoup.BeautifulStoneSoup.tagStack.

Referenced by BeautifulSoup.BeautifulStoneSoup.extractCharsetFromMeta(), and BeautifulSoup.BeautifulSoup.extractCharsetFromMeta().

    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join(map(lambda(x, y): ' %s="%s"' % (x, y), attrs))
            self.handle_data('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag
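When a start tag turns up inside a quote tag, unknown_starttag() rebuilds its source text from the parsed (name, value) attribute pairs and hands it to handle_data(). The listing uses a Python 2 tuple-parameter lambda, lambda(x, y): ..., a syntax removed in Python 3 by PEP 3113. A Python 3 sketch of the same re-serialization follows; the function name reserialize_start_tag is hypothetical.

```python
def reserialize_start_tag(name, attrs):
    """Rebuild '<name a="1" b="2">' from a list of (key, value) pairs.

    As in the original, every value is wrapped in double quotes and nothing
    is escaped, so the result need not match the input byte for byte.
    """
    attr_text = "".join(' %s="%s"' % (key, value) for key, value in attrs)
    return "<%s%s>" % (name, attr_text)


print(reserialize_start_tag("a", [("href", "x.html"), ("id", "top")]))
# <a href="x.html" id="top">
```

Because the quoting style is normalized, input such as <a href='x.html'> comes back as <a href="x.html">.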

Member Data Documentation

BeautifulSoup.BeautifulStoneSoup.ALL_ENTITIES = XHTML_ENTITIES
static

Definition at line 1156 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.builder

Definition at line 1222 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().

BeautifulSoup.BeautifulStoneSoup.convertEntities

Definition at line 1196 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().

BeautifulSoup.BeautifulStoneSoup.convertHTMLEntities

Definition at line 1206 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.Tag._invert().

BeautifulSoup.BeautifulStoneSoup.convertXMLEntities

Definition at line 1205 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.Tag._invert().

BeautifulSoup.BeautifulStoneSoup.currentData

Definition at line 1279 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.reset().

BeautifulSoup.BeautifulStoneSoup.currentTag

Definition at line 1280 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.reset(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding

Definition at line 1248 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._detectEncoding(), BeautifulSoup.BeautifulStoneSoup._feed(), and BeautifulSoup.BeautifulSoup.extractCharsetFromMeta().

BeautifulSoup.BeautifulStoneSoup.escapeUnrecognizedEntities

Definition at line 1207 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.Tag._invert().

BeautifulSoup.BeautifulStoneSoup.fromEncoding

Definition at line 1194 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), BeautifulSoup.BeautifulStoneSoup._feed(), and BeautifulSoup.BeautifulSoup.extractCharsetFromMeta().

BeautifulSoup.BeautifulStoneSoup.hidden

Definition at line 1277 of file BeautifulSoup.py.

Referenced by BeautifulSoup.Tag._invert(), and BeautifulSoup.BeautifulStoneSoup.reset().

string BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES = "html"
static

Definition at line 1152 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.BeautifulSoup.__init__().

BeautifulSoup.BeautifulStoneSoup.instanceSelfClosingTags

Definition at line 1221 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().

BeautifulSoup.BeautifulStoneSoup.literal

Definition at line 1424 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

BeautifulSoup.BeautifulStoneSoup.markup

Definition at line 1227 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._convertFrom(), and BeautifulSoup.BeautifulStoneSoup._feed().

list BeautifulSoup.BeautifulStoneSoup.MARKUP_MASSAGE
static
Initial value:
= [(re.compile('(<[^<>]*)/>'),
    lambda x: x.group(1) + ' />'),
   (re.compile('<!\s+([^<>]*)>'),
    lambda x: '<!' + x.group(1) + '>')
  ]

Definition at line 1144 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._feed().
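MARKUP_MASSAGE is a list of (compiled regex, replacement function) pairs used by _feed() to patch the raw markup before parsing. The first pair inserts a space before the '/>' of an empty-element tag; the second removes whitespace between '<!' and a declaration. The Python 3 sketch below applies each pair with Pattern.sub(); that _feed() applies them the same way is an assumption here, and massage() is a hypothetical name. Raw strings are used because a plain '\s' literal triggers an invalid-escape warning in recent Python 3 versions.

```python
import re

# Same (pattern, replacement) pairs as MARKUP_MASSAGE, written as raw strings.
MARKUP_MASSAGE = [
    (re.compile(r'(<[^<>]*)/>'), lambda m: m.group(1) + ' />'),
    (re.compile(r'<!\s+([^<>]*)>'), lambda m: '<!' + m.group(1) + '>'),
]


def massage(markup, fixes=MARKUP_MASSAGE):
    """Apply each regex fix, in order, to the whole markup string."""
    for pattern, replacement in fixes:
        markup = pattern.sub(replacement, markup)
    return markup


print(massage('<br/><! DOCTYPE html>'))  # <br /><!DOCTYPE html>
```

The first pattern does not check for an existing space, so '<br />' becomes '<br  />' (two spaces).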

BeautifulSoup.BeautifulStoneSoup.markupMassage

Definition at line 1228 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), and BeautifulSoup.BeautifulStoneSoup._feed().

dictionary BeautifulSoup.BeautifulStoneSoup.NESTABLE_TAGS = {}
static

Definition at line 1139 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.originalEncoding

Definition at line 1241 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._convertFrom(), BeautifulSoup.BeautifulStoneSoup._feed(), and BeautifulSoup.BeautifulSoup.extractCharsetFromMeta().

BeautifulSoup.BeautifulStoneSoup.parseOnlyThese

Definition at line 1193 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), BeautifulSoup.BeautifulStoneSoup.endData(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

list BeautifulSoup.BeautifulStoneSoup.PRESERVE_WHITESPACE_TAGS = []
static

Definition at line 1142 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().

BeautifulSoup.BeautifulStoneSoup.previous

Definition at line 1325 of file BeautifulSoup.py.

dictionary BeautifulSoup.BeautifulStoneSoup.QUOTE_TAGS = {}
static

Definition at line 1141 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

BeautifulSoup.BeautifulStoneSoup.quoteStack

Definition at line 1282 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.reset(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

dictionary BeautifulSoup.BeautifulStoneSoup.RESET_NESTING_TAGS = {}
static

Definition at line 1140 of file BeautifulSoup.py.

string BeautifulSoup.BeautifulStoneSoup.ROOT_TAG_NAME = u'[document]'
static

Definition at line 1150 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._feed(), BeautifulSoup.BeautifulStoneSoup._popToTag(), and BeautifulSoup.BeautifulStoneSoup.reset().

dictionary BeautifulSoup.BeautifulStoneSoup.SELF_CLOSING_TAGS = {}
static

Definition at line 1138 of file BeautifulSoup.py.

BeautifulSoup.BeautifulStoneSoup.smartQuotesTo

Definition at line 1195 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__(), BeautifulSoup.UnicodeDammit.__init__(), BeautifulSoup.UnicodeDammit._convertFrom(), BeautifulSoup.BeautifulStoneSoup._feed(), and BeautifulSoup.UnicodeDammit._subMSChar().

dictionary BeautifulSoup.BeautifulStoneSoup.STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }
static

Definition at line 1162 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.endData().
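STRIP_ASCII_SPACES maps the code points of tab (9), line feed (10), form feed (12), carriage return (13) and space (32) to None. Passed to unicode.translate() (str.translate() in Python 3), such a table deletes those characters, which is how endData() tests whether a text run is pure ASCII whitespace. Vertical tab (11) and non-ASCII spaces such as U+00A0 are not in the table, so text made of them is not collapsed even though str.isspace() is true for them. A short Python 3 check, with the hypothetical helper name is_ascii_blank:

```python
STRIP_ASCII_SPACES = {9: None, 10: None, 12: None, 13: None, 32: None}


def is_ascii_blank(text):
    """True when text consists only of the five characters in the table."""
    return text.translate(STRIP_ASCII_SPACES) == ""


print(is_ascii_blank(" \t\r\n\f"))  # True
print(is_ascii_blank("\x0b"))       # False: vertical tab is not in the table
print(is_ascii_blank("\xa0"))       # False, although "\xa0".isspace() is True
```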

BeautifulSoup.BeautifulStoneSoup.tagStack

Definition at line 1281 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup._popToTag(), BeautifulSoup.BeautifulStoneSoup._smartPop(), BeautifulSoup.BeautifulStoneSoup.endData(), BeautifulSoup.BeautifulStoneSoup.popTag(), BeautifulSoup.BeautifulSOAP.popTag(), BeautifulSoup.BeautifulStoneSoup.pushTag(), BeautifulSoup.BeautifulStoneSoup.reset(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

string BeautifulSoup.BeautifulStoneSoup.XHTML_ENTITIES = "xhtml"
static

Definition at line 1154 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().

string BeautifulSoup.BeautifulStoneSoup.XML_ENTITIES = "xml"
static

Definition at line 1153 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulStoneSoup.__init__().