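Inheritance diagram for BeautifulSoup.BeautifulSoup: (diagram not reproduced; per the member lists on this page, the class inherits members from BeautifulSoup.BeautifulStoneSoup, BeautifulSoup.Tag, and BeautifulSoup.PageElement)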

Public Member Functions
def	__init__ (self, *args, **kwargs)

def	start_meta (self, attrs)

Public Member Functions inherited from BeautifulSoup.BeautifulStoneSoup
def	__getattr__ (self, methodName)

def	__init__ (self, markup="", parseOnlyThese=None, fromEncoding=None, markupMassage=True, smartQuotesTo=XML_ENTITIES, convertEntities=None, selfClosingTags=None, isHTML=False)

def	convert_charref (self, name)

def	endData (self, containerClass=NavigableString)

def	handle_charref (self, ref)

def	handle_comment (self, text)

def	handle_data (self, data)

def	handle_decl (self, data)

def	handle_entityref (self, ref)

def	handle_pi (self, text)

def	isSelfClosingTag (self, name)

def	parse_declaration (self, i)

def	popTag (self)

def	pushTag (self, tag)

def	reset (self)

def	unknown_endtag (self, name)

def	unknown_starttag (self, name, attrs, selfClosing=0)

Public Member Functions inherited from BeautifulSoup.Tag
def	__call__ (self, *args, **kwargs)

def	__contains__ (self, x)

def	__delitem__ (self, key)

def	__eq__ (self, other)

def	__getattr__ (self, tag)

def	__getitem__ (self, key)

def	__init__ (self, parser, name, attrs=None, parent=None, previous=None)

def	__iter__ (self)

def	__len__ (self)

def	__ne__ (self, other)

def	__nonzero__ (self)

def	__repr__ (self, encoding=DEFAULT_OUTPUT_ENCODING)

def	__setitem__ (self, key, value)

def	__str__ (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)

def	__unicode__ (self)

def	childGenerator (self)

def	clear (self)

def	decompose (self)

def	fetchText (self, text=None, recursive=True, limit=None)

def	find (self, name=None, attrs={}, recursive=True, text=None, **kwargs)

def	findAll (self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

def	firstText (self, text=None, recursive=True)

def	get (self, key, default=None)

def	getString (self)

def	getText (self, separator=u"")

def	has_key (self, key)

def	index (self, element)

def	prettify (self, encoding=DEFAULT_OUTPUT_ENCODING)

def	recursiveChildGenerator (self)

def	renderContents (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)

def	setString (self, string)

Public Member Functions inherited from BeautifulSoup.PageElement
def	append (self, tag)

def	extract (self)

def	findAllNext (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findAllPrevious (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findNext (self, name=None, attrs={}, text=None, **kwargs)

def	findNextSibling (self, name=None, attrs={}, text=None, **kwargs)

def	findNextSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findParent (self, name=None, attrs={}, **kwargs)

def	findParents (self, name=None, attrs={}, limit=None, **kwargs)

def	findPrevious (self, name=None, attrs={}, text=None, **kwargs)

def	findPreviousSibling (self, name=None, attrs={}, text=None, **kwargs)

def	findPreviousSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	insert (self, position, newChild)

def	nextGenerator (self)

def	nextSiblingGenerator (self)

def	parentGenerator (self)

def	previousGenerator (self)

def	previousSiblingGenerator (self)

def	replaceWith (self, replaceWith)

def	replaceWithChildren (self)

def	setup (self, parent=None, previous=None)

def	substituteEncoding (self, str, encoding=None)

def	toEncoding (self, s, encoding=None)

Public Attributes
	declaredHTMLEncoding

	originalEncoding

Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup
	convertEntities

	convertHTMLEntities

	convertXMLEntities

	currentData

	currentTag

	declaredHTMLEncoding

	escapeUnrecognizedEntities

	fromEncoding

	hidden

	instanceSelfClosingTags

	literal

	markup

	markupMassage

	originalEncoding

	parseOnlyThese

	previous

	quoteStack

	smartQuotesTo

	tagStack

Public Attributes inherited from BeautifulSoup.Tag
	attrMap

	attrs

	containsSubstitutions

	contents

	convertHTMLEntities

	convertXMLEntities

	escapeUnrecognizedEntities

	hidden

	isSelfClosing

	name

	parserClass

Public Attributes inherited from BeautifulSoup.PageElement
	next

	nextSibling

	parent

	previous

	previousSibling

Static Public Attributes
	CHARSET_RE

	NESTABLE_BLOCK_TAGS

	NESTABLE_INLINE_TAGS

	NESTABLE_LIST_TAGS

	NESTABLE_TABLE_TAGS

	NESTABLE_TAGS

	NON_NESTABLE_BLOCK_TAGS

	PRESERVE_WHITESPACE_TAGS

	QUOTE_TAGS

	RESET_NESTING_TAGS

	SELF_CLOSING_TAGS

Static Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup
	ALL_ENTITIES

	HTML_ENTITIES

	MARKUP_MASSAGE

	NESTABLE_TAGS

	PRESERVE_WHITESPACE_TAGS

	QUOTE_TAGS

	RESET_NESTING_TAGS

	ROOT_TAG_NAME

	SELF_CLOSING_TAGS

	STRIP_ASCII_SPACES

	XHTML_ENTITIES

	XML_ENTITIES

Static Public Attributes inherited from BeautifulSoup.Tag
	fetch

	findChild

	findChildren

	first

Static Public Attributes inherited from BeautifulSoup.PageElement
	BARE_AMPERSAND_OR_BRACKET

	fetchNextSiblings

	fetchParents

	fetchPrevious

	fetchPreviousSiblings

	XML_ENTITIES_TO_SPECIAL_CHARS

	XML_SPECIAL_CHARS_TO_ENTITIES

Additional Inherited Members
Properties inherited from BeautifulSoup.Tag
	string = property(getString, setString)

	text = property(getText)

Detailed Description

This parser knows the following facts about HTML:

* Some tags have no closing tag and should be interpreted as being
  closed as soon as they are encountered.

* The text inside some tags (e.g. 'script') may contain tags which
  are not really part of the document and which should be parsed
  as text, not tags. If you want to parse the text as tags, you can
  always fetch it and parse it explicitly.

* Tag nesting rules:

  Most tags can't be nested at all. For instance, the occurrence of
  a <p> tag should implicitly close the previous <p> tag.

   <p>Para1<p>Para2
    should be transformed into:
   <p>Para1</p><p>Para2

  Some tags can be nested arbitrarily. For instance, the occurrence
  of a <blockquote> tag should _not_ implicitly close the previous
  <blockquote> tag.

   Alice said: <blockquote>Bob said: <blockquote>Blah
    should NOT be transformed into:
   Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

  Some tags can be nested, but the nesting is reset by the
  interposition of other tags. For instance, a <tr> tag should
  implicitly close the previous <tr> tag within the same <table>,
  but not close a <tr> tag in another table.

   <table><tr>Blah<tr>Blah
    should be transformed into:
   <table><tr>Blah</tr><tr>Blah
    but,
   <tr>Blah<table><tr>Blah
    should NOT be transformed into:
   <tr>Blah<table></tr><tr>Blah
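
The three nesting rules above can be illustrated with a small stack model. This is a minimal sketch and not BeautifulSoup's implementation: the names NESTABLE, RESET_BY, and open_tags_after are hypothetical, the rule tables cover only the tags used in the examples, and only start tags are modeled.

```python
# Toy rule tables for this sketch only; the real class uses the
# NESTABLE_TAGS / RESET_NESTING_TAGS attributes listed on this page.
NESTABLE = {"blockquote", "table"}   # may be nested arbitrarily
RESET_BY = {"tr": "table"}           # an open <table> resets <tr> nesting


def open_tags_after(start_tags):
    """Return the stack of still-open tags after a sequence of start tags."""
    stack = []
    for name in start_tags:
        if name not in NESTABLE:
            boundary = RESET_BY.get(name)
            # Walk from the innermost open tag outwards.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == boundary:
                    break            # nesting was reset; close nothing
                if stack[i] == name:
                    del stack[i:]    # implicitly close it and its children
                    break
        stack.append(name)
    return stack


print(open_tags_after(["p", "p"]))                    # ['p']
print(open_tags_after(["blockquote", "blockquote"]))  # ['blockquote', 'blockquote']
print(open_tags_after(["table", "tr", "tr"]))         # ['table', 'tr']
print(open_tags_after(["tr", "table", "tr"]))         # ['tr', 'table', 'tr']
```

Each call mirrors one of the examples above: the second <p> closes the first, the nested <blockquote> stays open, and a <tr> closes an earlier <tr> only when no <table> was opened in between.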

Differing assumptions about tag nesting rules are a major source
of problems with the BeautifulSoup class. If BeautifulSoup is not
treating as nestable a tag your page author treats as nestable,
try ICantBelieveItsBeautifulSoup, MinimalSoup, or
BeautifulStoneSoup before writing your own subclass.

Definition at line 1470 of file BeautifulSoup.py.

Constructor & Destructor Documentation

◆ __init__()

def BeautifulSoup.BeautifulSoup.__init__	(	self,
		*args,
		**kwargs
	)

Definition at line 1518 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES.

     def __init__(self, *args, **kwargs):
         if not kwargs.has_key('smartQuotesTo'):
             kwargs['smartQuotesTo'] = self.HTML_ENTITIES
         kwargs['isHTML'] = True
         BeautifulStoneSoup.__init__(self, *args, **kwargs)
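
The listing above is Python 2 code: dict.has_key() was removed in Python 3. The same "default one keyword, force another, then delegate" pattern can be sketched for Python 3 as follows. The classes StoneSoupSketch and SoupSketch are hypothetical stand-ins, and the entity constants are placeholder values; nothing here is the library's own code.

```python
class StoneSoupSketch:
    """Stand-in for the base class; the real one does far more."""
    XML_ENTITIES = "xml"    # placeholder value for this sketch
    HTML_ENTITIES = "html"  # placeholder value for this sketch

    def __init__(self, markup="", smartQuotesTo=XML_ENTITIES, isHTML=False):
        self.markup = markup
        self.smartQuotesTo = smartQuotesTo
        self.isHTML = isHTML


class SoupSketch(StoneSoupSketch):
    def __init__(self, *args, **kwargs):
        # setdefault only fills the key in when the caller did not pass it,
        # which is what the has_key() check in the listing achieves.
        kwargs.setdefault("smartQuotesTo", self.HTML_ENTITIES)
        # isHTML is always overwritten, even if the caller passed False.
        kwargs["isHTML"] = True
        super().__init__(*args, **kwargs)


soup = SoupSketch("<p>text")
print(soup.smartQuotesTo, soup.isHTML)  # html True
```

A caller can still override smartQuotesTo explicitly (for example with None), while isHTML cannot be turned off through this constructor.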
 

Member Function Documentation

◆ start_meta()

def BeautifulSoup.BeautifulSoup.start_meta	(	self,
		attrs
	)

Beautiful Soup can detect a charset included in a META tag,
try to convert the document to that charset, and re-parse the
document from the beginning.

Definition at line 1576 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulSoup.CHARSET_RE and BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding.

     def start_meta(self, attrs):
         """Beautiful Soup can detect a charset included in a META tag,
         try to convert the document to that charset, and re-parse the
         document from the beginning."""
         httpEquiv = None
         contentType = None
         contentTypeIndex = None
         tagNeedsEncodingSubstitution = False
 
         for i in range(0, len(attrs)):
             key, value = attrs[i]
             key = key.lower()
             if key == 'http-equiv':
                 httpEquiv = value
             elif key == 'content':
                 contentType = value
                 contentTypeIndex = i
 
         if httpEquiv and contentType: # It's an interesting meta tag.
             match = self.CHARSET_RE.search(contentType)
             if match:
                 if (self.declaredHTMLEncoding is not None or
                     self.originalEncoding == self.fromEncoding):
                     # An HTML encoding was sniffed while converting
                     # the document to Unicode, or an HTML encoding was
                     # sniffed during a previous pass through the
                     # document, or an encoding was specified
                     # explicitly and it worked. Rewrite the meta tag.
                     def rewrite(match):
                         return match.group(1) + "%SOUP-ENCODING%"
                     newAttr = self.CHARSET_RE.sub(rewrite, contentType)
                     attrs[contentTypeIndex] = (attrs[contentTypeIndex][0],
                                                newAttr)
                     tagNeedsEncodingSubstitution = True
                 else:
                     # This is our first pass through the document.
                     # Go through it again with the encoding information.
                     newCharset = match.group(3)
                     if newCharset and newCharset != self.originalEncoding:
                         self.declaredHTMLEncoding = newCharset
                         self._feed(self.declaredHTMLEncoding)
                         raise StopParsing
                     pass
         tag = self.unknown_starttag("meta", attrs)
         if tag and tagNeedsEncodingSubstitution:
             tag.containsSubstitutions = True
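
The two uses of CHARSET_RE in start_meta (reading the declared charset from group 3, and rewriting the attribute by keeping group 1 and appending a placeholder) can be tried in isolation. The value of CHARSET_RE is not shown on this page, so the pattern below is an assumption chosen to match the group numbering used in the listing; the helper names declared_charset and with_placeholder are hypothetical.

```python
import re

# Assumed pattern: group 1 is everything up to and including "charset=",
# group 3 is the charset name. The library's actual CHARSET_RE may differ.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)


def declared_charset(content):
    """Return the charset named in a META content value, or None."""
    match = CHARSET_RE.search(content)
    return match.group(3) if match else None


def with_placeholder(content):
    """Replace the charset name with the substitution marker."""
    return CHARSET_RE.sub(lambda m: m.group(1) + "%SOUP-ENCODING%", content)


content = "text/html; charset=ISO-8859-1"
print(declared_charset(content))  # ISO-8859-1
print(with_placeholder(content))  # text/html; charset=%SOUP-ENCODING%
```

The placeholder is what allows the output encoding to be substituted into the tag later, which is why the listing sets containsSubstitutions on the resulting tag.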
 

Member Data Documentation

◆ CHARSET_RE

BeautifulSoup.BeautifulSoup.CHARSET_RE

static

Definition at line 1574 of file BeautifulSoup.py.

Referenced by BeautifulSoup.BeautifulSoup.start_meta().

◆ declaredHTMLEncoding

BeautifulSoup.BeautifulSoup.declaredHTMLEncoding

Definition at line 1615 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit._detectEncoding().

◆ NESTABLE_BLOCK_TAGS

BeautifulSoup.BeautifulSoup.NESTABLE_BLOCK_TAGS

static

Definition at line 1541 of file BeautifulSoup.py.

◆ NESTABLE_INLINE_TAGS

BeautifulSoup.BeautifulSoup.NESTABLE_INLINE_TAGS

static

Definition at line 1535 of file BeautifulSoup.py.

◆ NESTABLE_LIST_TAGS

BeautifulSoup.BeautifulSoup.NESTABLE_LIST_TAGS

static

Definition at line 1544 of file BeautifulSoup.py.

◆ NESTABLE_TABLE_TAGS

BeautifulSoup.BeautifulSoup.NESTABLE_TABLE_TAGS

static

Definition at line 1552 of file BeautifulSoup.py.

◆ NESTABLE_TAGS

BeautifulSoup.BeautifulSoup.NESTABLE_TAGS

static

Definition at line 1570 of file BeautifulSoup.py.

◆ NON_NESTABLE_BLOCK_TAGS

BeautifulSoup.BeautifulSoup.NON_NESTABLE_BLOCK_TAGS

static

Definition at line 1561 of file BeautifulSoup.py.

◆ originalEncoding

BeautifulSoup.BeautifulSoup.originalEncoding

Definition at line 1598 of file BeautifulSoup.py.

◆ PRESERVE_WHITESPACE_TAGS

BeautifulSoup.BeautifulSoup.PRESERVE_WHITESPACE_TAGS

static

Definition at line 1528 of file BeautifulSoup.py.

◆ QUOTE_TAGS

BeautifulSoup.BeautifulSoup.QUOTE_TAGS

static

Definition at line 1530 of file BeautifulSoup.py.

◆ RESET_NESTING_TAGS

BeautifulSoup.BeautifulSoup.RESET_NESTING_TAGS

static

Definition at line 1565 of file BeautifulSoup.py.

◆ SELF_CLOSING_TAGS

BeautifulSoup.BeautifulSoup.SELF_CLOSING_TAGS

static

Definition at line 1524 of file BeautifulSoup.py.
