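Inheritance diagram for BeautifulSoup.BeautifulSoup (image omitted): BeautifulSoup inherits from BeautifulStoneSoup, Tag, and PageElement (see the inherited member lists below).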

Public Member Functions
def	__init__ (self, *args, **kwargs)

def	start_meta (self, attrs)

Public Member Functions inherited from BeautifulSoup.BeautifulStoneSoup
def	__getattr__ (self, methodName)

def	__init__ (self, markup="", parseOnlyThese=None, fromEncoding=None, markupMassage=True, smartQuotesTo=XML_ENTITIES, convertEntities=None, selfClosingTags=None, isHTML=False)

def	convert_charref (self, name)

def	endData (self, containerClass=NavigableString)

def	handle_charref (self, ref)

def	handle_comment (self, text)

def	handle_data (self, data)

def	handle_decl (self, data)

def	handle_entityref (self, ref)

def	handle_pi (self, text)

def	isSelfClosingTag (self, name)

def	parse_declaration (self, i)

def	popTag (self)

def	pushTag (self, tag)

def	reset (self)

def	unknown_endtag (self, name)

def	unknown_starttag (self, name, attrs, selfClosing=0)

Public Member Functions inherited from BeautifulSoup.Tag
def	__call__ (self, *args, **kwargs)

def	__contains__ (self, x)

def	__delitem__ (self, key)

def	__eq__ (self, other)

def	__getattr__ (self, tag)

def	__getitem__ (self, key)

def	__init__ (self, parser, name, attrs=None, parent=None, previous=None)

def	__iter__ (self)

def	__len__ (self)

def	__ne__ (self, other)

def	__nonzero__ (self)

def	__repr__ (self, encoding=DEFAULT_OUTPUT_ENCODING)

def	__setitem__ (self, key, value)

def	__str__ (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)

def	__unicode__ (self)

def	childGenerator (self)

def	clear (self)

def	decompose (self)

def	fetchText (self, text=None, recursive=True, limit=None)

def	find (self, name=None, attrs={}, recursive=True, text=None, **kwargs)

def	findAll (self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

def	firstText (self, text=None, recursive=True)

def	get (self, key, default=None)

def	getString (self)

def	getText (self, separator=u"")

def	has_key (self, key)

def	index (self, element)

def	prettify (self, encoding=DEFAULT_OUTPUT_ENCODING)

def	recursiveChildGenerator (self)

def	renderContents (self, encoding=DEFAULT_OUTPUT_ENCODING, prettyPrint=False, indentLevel=0)

def	setString (self, string)

Public Member Functions inherited from BeautifulSoup.PageElement
def	append (self, tag)

def	extract (self)

def	findAllNext (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findAllPrevious (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findNext (self, name=None, attrs={}, text=None, **kwargs)

def	findNextSibling (self, name=None, attrs={}, text=None, **kwargs)

def	findNextSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	findParent (self, name=None, attrs={}, **kwargs)

def	findParents (self, name=None, attrs={}, limit=None, **kwargs)

def	findPrevious (self, name=None, attrs={}, text=None, **kwargs)

def	findPreviousSibling (self, name=None, attrs={}, text=None, **kwargs)

def	findPreviousSiblings (self, name=None, attrs={}, text=None, limit=None, **kwargs)

def	insert (self, position, newChild)

def	nextGenerator (self)

def	nextSiblingGenerator (self)

def	parentGenerator (self)

def	previousGenerator (self)

def	previousSiblingGenerator (self)

def	replaceWith (self, replaceWith)

def	replaceWithChildren (self)

def	setup (self, parent=None, previous=None)

def	substituteEncoding (self, str, encoding=None)

def	toEncoding (self, s, encoding=None)

Public Attributes
	declaredHTMLEncoding

	originalEncoding

Public Attributes inherited from BeautifulSoup.BeautifulStoneSoup
	convertEntities

	convertHTMLEntities

	convertXMLEntities

	currentData

	currentTag

	declaredHTMLEncoding

	escapeUnrecognizedEntities

	fromEncoding

	hidden

	instanceSelfClosingTags

	literal

	markup

	markupMassage

	originalEncoding

	parseOnlyThese

	previous

	quoteStack

	smartQuotesTo

	tagStack

Public Attributes inherited from BeautifulSoup.Tag
	attrMap

	attrs

	containsSubstitutions

	contents

	convertHTMLEntities

	convertXMLEntities

	escapeUnrecognizedEntities

	hidden

	isSelfClosing

	name

	parserClass

Public Attributes inherited from BeautifulSoup.PageElement
	next

	nextSibling

	parent

	previous

	previousSibling

Additional Inherited Members
Properties inherited from BeautifulSoup.Tag
	string = property(getString, setString)

	text = property(getText)

Detailed Description

This parser knows the following facts about HTML:

* Some tags have no closing tag and should be interpreted as being
  closed as soon as they are encountered.

* The text inside some tags (e.g. 'script') may contain tags which
  are not really part of the document and which should be parsed
  as text, not tags. If you want to parse the text as tags, you can
  always fetch it and parse it explicitly.

* Tag nesting rules:

  Most tags can't be nested at all. For instance, the occurrence of
  a <p> tag should implicitly close the previous <p> tag.

   <p>Para1<p>Para2
    should be transformed into:
   <p>Para1</p><p>Para2

  Some tags can be nested arbitrarily. For instance, the occurrence
  of a <blockquote> tag should _not_ implicitly close the previous
  <blockquote> tag.

   Alice said: <blockquote>Bob said: <blockquote>Blah
    should NOT be transformed into:
   Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

  Some tags can be nested, but the nesting is reset by the
  interposition of other tags. For instance, a <tr> tag should
  implicitly close the previous <tr> tag within the same <table>,
  but not close a <tr> tag in another table.

   <table><tr>Blah<tr>Blah
    should be transformed into:
   <table><tr>Blah</tr><tr>Blah
    but,
   <tr>Blah<table><tr>Blah
    should NOT be transformed into:
   <tr>Blah<table></tr><tr>Blah

Differing assumptions about tag nesting rules are a major source
of problems with the BeautifulSoup class. If BeautifulSoup does not
treat a tag as nestable that your page author does, try
ICantBelieveItsBeautifulSoup, MinimalSoup, or BeautifulStoneSoup
before writing your own subclass.
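
The three nesting rules above can be illustrated with a small stack-based sketch. It is not the library's implementation: the rule table NESTABLE, the tokenizer TOKEN_RE, and the function close_implicitly are hypothetical names introduced here, the table covers only the tags used in the examples, and attributes are not handled.

```python
import re

# Hypothetical, heavily reduced rule table (not the library's own).
# A tag mapped to [] may nest freely. A tag mapped to a list of "reset"
# tags may nest only when one of those tags was opened in between.
# A tag missing from the table cannot nest at all.
NESTABLE = {"blockquote": [], "table": [], "tr": ["table"]}

# Matches a bare start/end tag (no attributes) or a run of text.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def close_implicitly(markup):
    """Return markup with implicit closing tags written out."""
    out = []
    stack = []  # names of the currently open tags, innermost last
    for closing, name, text in TOKEN_RE.findall(markup):
        if text:
            out.append(text)
            continue
        if closing:
            # Close everything up to and including the matching open tag.
            # An end tag with no matching open tag is dropped.
            if name in stack:
                while stack:
                    top = stack.pop()
                    out.append("</%s>" % top)
                    if top == name:
                        break
            continue
        resets = NESTABLE.get(name)
        if resets != []:  # the tag is not freely nestable
            # Walk outwards looking for an open tag of the same name,
            # giving up as soon as a reset tag is crossed.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == name:
                    while len(stack) > i:
                        out.append("</%s>" % stack.pop())
                    break
                if resets is not None and stack[i] in resets:
                    break
        stack.append(name)
        out.append("<%s>" % name)
    return "".join(out)
```

Applied to the four inputs shown above, this sketch inserts </p> and </tr> in the first and third cases and leaves the blockquote and nested-table inputs unchanged.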

Definition at line 1471 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.BeautifulSoup.__init__	(	self,
		*args,
		**kwargs
	)

Definition at line 1519 of file BeautifulSoup.py.

References BeautifulSoup.buildTagMap().

     def __init__(self, *args, **kwargs):
         if 'smartQuotesTo' not in kwargs:
             kwargs['smartQuotesTo'] = self.HTML_ENTITIES
         kwargs['isHTML'] = True
         BeautifulStoneSoup.__init__(self, *args, **kwargs)
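
The effect of the constructor on its keyword arguments can be seen in isolation with stand-in classes. This is a sketch only: StoneSoupStandIn and SoupStandIn are hypothetical names, the base class merely records its options, and the "xml" and "html" constant values are placeholders assumed for illustration.

```python
class StoneSoupStandIn(object):
    """Hypothetical stand-in for the base parser; it only stores options."""
    XML_ENTITIES = "xml"    # placeholder value, assumed for illustration
    HTML_ENTITIES = "html"  # placeholder value, assumed for illustration

    def __init__(self, markup="", smartQuotesTo=XML_ENTITIES, isHTML=False):
        self.markup = markup
        self.smartQuotesTo = smartQuotesTo
        self.isHTML = isHTML


class SoupStandIn(StoneSoupStandIn):
    def __init__(self, *args, **kwargs):
        # Supply the HTML default only when the caller gave no value.
        if 'smartQuotesTo' not in kwargs:
            kwargs['smartQuotesTo'] = self.HTML_ENTITIES
        # Always forced, overriding anything the caller passed.
        kwargs['isHTML'] = True
        StoneSoupStandIn.__init__(self, *args, **kwargs)
```

With these stand-ins, SoupStandIn("<p>") ends up with the HTML default, while SoupStandIn("<p>", smartQuotesTo=None, isHTML=False) keeps the caller's smartQuotesTo but still has isHTML set to True.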
 

Member Function Documentation

def BeautifulSoup.BeautifulSoup.start_meta	(	self,
		attrs
	)

Beautiful Soup can detect a charset included in a META tag,
try to convert the document to that charset, and re-parse the
document from the beginning.

Definition at line 1577 of file BeautifulSoup.py.

References BeautifulSoup.BeautifulStoneSoup.declaredHTMLEncoding.

     def start_meta(self, attrs):
         """Beautiful Soup can detect a charset included in a META tag,
         try to convert the document to that charset, and re-parse the
         document from the beginning."""
         httpEquiv = None
         contentType = None
         contentTypeIndex = None
         tagNeedsEncodingSubstitution = False
 
         for i in range(0, len(attrs)):
             key, value = attrs[i]
             key = key.lower()
             if key == 'http-equiv':
                 httpEquiv = value
             elif key == 'content':
                 contentType = value
                 contentTypeIndex = i
 
         if httpEquiv and contentType: # It's an interesting meta tag.
             match = self.CHARSET_RE.search(contentType)
             if match:
                 if (self.declaredHTMLEncoding is not None or
                     self.originalEncoding == self.fromEncoding):
                     # An HTML encoding was sniffed while converting
                     # the document to Unicode, or an HTML encoding was
                     # sniffed during a previous pass through the
                     # document, or an encoding was specified
                     # explicitly and it worked. Rewrite the meta tag.
                     def rewrite(match):
                         return match.group(1) + "%SOUP-ENCODING%"
                     newAttr = self.CHARSET_RE.sub(rewrite, contentType)
                     attrs[contentTypeIndex] = (attrs[contentTypeIndex][0],
                                                newAttr)
                     tagNeedsEncodingSubstitution = True
                 else:
                     # This is our first pass through the document.
                     # Go through it again with the encoding information.
                     newCharset = match.group(3)
                     if newCharset and newCharset != self.originalEncoding:
                         self.declaredHTMLEncoding = newCharset
                         self._feed(self.declaredHTMLEncoding)
                         raise StopParsing
                     pass
         tag = self.unknown_starttag("meta", attrs)
         if tag and tagNeedsEncodingSubstitution:
             tag.containsSubstitutions = True
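
The two uses of CHARSET_RE in the listing (reading the declared charset from group 3, and rewriting the attribute by keeping group 1) can be tried on their own. The pattern below is an assumption chosen to be consistent with those group numbers; the definition of CHARSET_RE is not shown on this page. The helper names declared_charset and with_placeholder are hypothetical.

```python
import re

# Assumed pattern: group 1 is the "charset=" prefix (including any
# leading ";" and whitespace), group 3 is the charset name itself.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)


def declared_charset(content):
    """Return the charset named in a meta 'content' value, or None."""
    match = CHARSET_RE.search(content)
    return match.group(3) if match else None


def with_placeholder(content):
    """Swap the charset name for the substitution marker."""
    def rewrite(match):
        return match.group(1) + "%SOUP-ENCODING%"
    return CHARSET_RE.sub(rewrite, content)
```

For the value "text/html; charset=ISO-8859-1", declared_charset returns "ISO-8859-1" and with_placeholder returns "text/html; charset=%SOUP-ENCODING%". A value with no charset parameter gives None and is returned unchanged, respectively.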
 

Member Data Documentation

BeautifulSoup.BeautifulSoup.declaredHTMLEncoding

Definition at line 1616 of file BeautifulSoup.py.

Referenced by BeautifulSoup.UnicodeDammit._detectEncoding().

BeautifulSoup.BeautifulSoup.originalEncoding

Definition at line 1599 of file BeautifulSoup.py.
