BeautifulSoup.HTMLParserBuilder Class Reference
Public Member Functions

def __init__

def handle_charref

def handle_comment

def handle_data

def handle_decl

def handle_endtag

def handle_entityref

def handle_pi

def handle_starttag

def parse_declaration

Public Attributes

 soup
 

Private Member Functions

def _toStringSubclass
 

Detailed Description

Definition at line 1005 of file BeautifulSoup.py.

Constructor & Destructor Documentation

def BeautifulSoup.HTMLParserBuilder.__init__ (   self,
  soup 
)

Definition at line 1007 of file BeautifulSoup.py.

Referenced by BeautifulSoup.HTMLParserBuilder.__init__(), and BeautifulSoup.HTMLParserBuilder.parse_declaration().

    def __init__(self, soup):
        HTMLParser.__init__(self)
        self.soup = soup
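
The constructor only initializes the underlying HTMLParser and stores the soup object, so that each handler can forward parser events to it. The following is a minimal Python 3 sketch of that delegation pattern, not the library's own code. It assumes the standard-library html.parser.HTMLParser in place of the Python 2 HTMLParser module, and RecordingSoup and SketchBuilder are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser


class RecordingSoup:
    """Hypothetical stand-in for the soup object; it only records events."""

    def __init__(self):
        self.events = []

    def unknown_starttag(self, name, attrs):
        self.events.append(("start", name, attrs))

    def unknown_endtag(self, name):
        self.events.append(("end", name))

    def handle_data(self, content):
        self.events.append(("data", content))


class SketchBuilder(HTMLParser):
    """Illustrative builder: forwards parser events to the stored soup."""

    def __init__(self, soup):
        # Initialize the base parser first, then remember the target object.
        super().__init__(convert_charrefs=False)
        self.soup = soup

    def handle_starttag(self, name, attrs):
        self.soup.unknown_starttag(name, attrs)

    def handle_endtag(self, name):
        self.soup.unknown_endtag(name)

    def handle_data(self, content):
        self.soup.handle_data(content)


soup = RecordingSoup()
builder = SketchBuilder(soup)
builder.feed("<p>hi</p>")
builder.close()
print(soup.events)
```

Keeping the parser and the tree object separate means the tree-building logic never has to know which parser produced the events.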

Member Function Documentation

def BeautifulSoup.HTMLParserBuilder._toStringSubclass (   self,
  text,
  subclass 
)
private
Adds a certain piece of text to the tree as a NavigableString
subclass.

Definition at line 1025 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder.handle_data().

Referenced by BeautifulSoup.HTMLParserBuilder._toStringSubclass(), BeautifulSoup.HTMLParserBuilder.handle_comment(), BeautifulSoup.HTMLParserBuilder.handle_decl(), BeautifulSoup.HTMLParserBuilder.handle_pi(), and BeautifulSoup.HTMLParserBuilder.parse_declaration().

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.soup.endData()
        self.handle_data(text)
        self.soup.endData(subclass)
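
The three calls above form a flush, buffer, flush sequence: pending ordinary text is closed off, the special text is buffered alone, and it is then flushed as an instance of the requested subclass. The sketch below illustrates that sequence in Python 3. The behavior of endData is not shown on this page, so BufferingSoup is a hypothetical stand-in that assumes endData joins the buffered fragments into one node, and Comment here is a plain str subclass rather than the library's class.

```python
class Comment(str):
    """Hypothetical string subclass used to tag comment text."""


class BufferingSoup:
    """Hypothetical stand-in for the soup object's text buffering."""

    def __init__(self):
        self.current_data = []  # pending text fragments
        self.nodes = []         # finished string nodes

    def handle_data(self, content):
        self.current_data.append(content)

    def endData(self, container_class=str):
        # Assumed behavior: flush pending fragments into a single node
        # of the requested class, then clear the buffer.
        if self.current_data:
            self.nodes.append(container_class("".join(self.current_data)))
            self.current_data = []


def to_string_subclass(soup, text, subclass):
    soup.endData()           # flush any ordinary text collected so far
    soup.handle_data(text)   # buffer only the special text
    soup.endData(subclass)   # flush it as an instance of the subclass


soup = BufferingSoup()
soup.handle_data("before")
to_string_subclass(soup, " a comment ", Comment)
print([(type(node).__name__, str(node)) for node in soup.nodes])
```

The first flush is what keeps the comment text from being merged into the ordinary text that preceded it.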
def BeautifulSoup.HTMLParserBuilder.handle_charref (   self,
  ref 
)

Definition at line 1044 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder.handle_charref(), and BeautifulSoup.HTMLParserBuilder.handle_data().

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.soup.convertEntities:
            data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)
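
Note that the listing converts the reference with int(ref) in base 10, so a hexadecimal reference, which the parser passes in as a string such as "x41", would raise ValueError there. The Python 3 sketch below shows the same decision as a standalone function, with chr standing in for the Python 2 unichr. The hexadecimal branch is an addition made for illustration and is not part of the documented method; charref_to_data and its convert_entities parameter are hypothetical names.

```python
def charref_to_data(ref, convert_entities=True):
    """Return the text for a numeric character reference such as '65' or 'x41'."""
    if not convert_entities:
        # Pass the reference through unchanged, restoring its markup.
        return "&#%s;" % ref
    if ref[:1] in ("x", "X"):
        # Hexadecimal form (added in this sketch, absent from the listing).
        return chr(int(ref[1:], 16))
    return chr(int(ref))


print(charref_to_data("65"))
print(charref_to_data("x41"))
print(charref_to_data("65", convert_entities=False))
```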
def BeautifulSoup.HTMLParserBuilder.handle_comment (   self,
  text 
)

Definition at line 1040 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder._toStringSubclass(), and BeautifulSoup.HTMLParserBuilder.handle_comment().

    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)
def BeautifulSoup.HTMLParserBuilder.handle_data (   self,
  content 
)

Definition at line 1022 of file BeautifulSoup.py.

Referenced by BeautifulSoup.HTMLParserBuilder._toStringSubclass(), BeautifulSoup.HTMLParserBuilder.handle_charref(), BeautifulSoup.HTMLParserBuilder.handle_data(), BeautifulSoup.HTMLParserBuilder.handle_entityref(), BeautifulSoup.HTMLParserBuilder.parse_declaration(), BeautifulSoup.BeautifulStoneSoup.unknown_endtag(), and BeautifulSoup.BeautifulStoneSoup.unknown_starttag().

    def handle_data(self, content):
        self.soup.handle_data(content)
def BeautifulSoup.HTMLParserBuilder.handle_decl (   self,
  data 
)

Definition at line 1095 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder._toStringSubclass().

Referenced by BeautifulSoup.HTMLParserBuilder.handle_decl().

    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)
def BeautifulSoup.HTMLParserBuilder.handle_endtag (   self,
  name 
)

Definition at line 1019 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder.handle_endtag().

    def handle_endtag(self, name):
        self.soup.unknown_endtag(name)
def BeautifulSoup.HTMLParserBuilder.handle_entityref (   self,
  ref 
)
Handle entity references as data, possibly converting known
HTML and/or XML entity references to the corresponding Unicode
characters.

Definition at line 1052 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder.handle_data().

Referenced by BeautifulSoup.HTMLParserBuilder.handle_entityref().

    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.soup.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.soup.convertXMLEntities:
            data = self.soup.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.soup.convertHTMLEntities and \
            not self.soup.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
            # TODO: We've got a problem here. We're told this is
            # an entity reference, but it's not an XML entity
            # reference or an HTML entity reference. Nonetheless,
            # the logical thing to do is to pass it through as an
            # unrecognized entity reference.
            #
            # Except: when the input is "&carol;" this function
            # will be called with input "carol". When the input is
            # "AT&T", this function will be called with input
            # "T". We have no way of knowing whether a semicolon
            # was present originally, so we don't know whether
            # this is an unknown entity or just a misplaced
            # ampersand.
            #
            # The more common case is a misplaced ampersand, so I
            # escape the ampersand and omit the trailing semicolon.
            data = "&%s" % ref
        if not data:
            # This case is different from the one above, because we
            # haven't already gone through a supposedly comprehensive
            # mapping of entities to Unicode characters. We might not
            # have gone through any mapping at all. So the chances are
            # very high that this is a real entity, and not a
            # misplaced ampersand.
            data = "&%s;" % ref
        self.handle_data(data)
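
The four-way fallback above is easier to follow as a pure function. The Python 3 sketch below mirrors the order of the checks, using html.entities.name2codepoint from the standard library and chr in place of unichr. The contents of XML_ENTITIES_TO_SPECIAL_CHARS are not shown on this page, so the mapping below is an assumption, and entityref_to_data with its two flags is a hypothetical name for illustration.

```python
from html.entities import name2codepoint

# Assumed contents; the real mapping on the soup object is not shown here.
XML_ENTITIES_TO_SPECIAL_CHARS = {
    "apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">",
}


def entityref_to_data(ref, convert_html=False, convert_xml=False):
    """Return the text to emit for the entity name `ref` (without & and ;)."""
    data = None
    if convert_html:
        codepoint = name2codepoint.get(ref)
        if codepoint is not None:
            data = chr(codepoint)
    if not data and convert_xml:
        data = XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)
    if not data and convert_html and not XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
        # Unknown after a full HTML lookup: most likely a stray ampersand
        # (as in "AT&T"), so no trailing semicolon is added.
        data = "&%s" % ref
    if not data:
        # No comprehensive lookup was done: assume a real entity and
        # restore its original form.
        data = "&%s;" % ref
    return data


print(entityref_to_data("eacute", convert_html=True))
print(entityref_to_data("T", convert_html=True))
print(entityref_to_data("carol"))
```

With HTML conversion enabled, "AT&T" yields "&T" for the stray ampersand, while with no conversion enabled "&carol;" is passed through intact.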
def BeautifulSoup.HTMLParserBuilder.handle_pi (   self,
  text 
)
Handle a processing instruction as a ProcessingInstruction
object, possibly one with a %SOUP-ENCODING% slot into which an
encoding will be plugged later.

Definition at line 1032 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder._toStringSubclass(), and BeautifulSoup.HTMLParserBuilder.handle_pi().

    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)
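
The placeholder approach can be sketched in two steps: normalize an XML declaration so that it carries the %SOUP-ENCODING% slot, then fill the slot when the output encoding is known. Only the first step appears in the listing above. The second step, render_pi, is an assumed illustration of how such a slot could be filled later and is not taken from the library; both function names are hypothetical.

```python
PLACEHOLDER = "%SOUP-ENCODING%"


def normalize_pi(text):
    """Replace an XML declaration with one that carries an encoding slot."""
    if text[:3] == "xml":
        return "xml version='1.0' encoding='" + PLACEHOLDER + "'"
    return text


def render_pi(text, encoding):
    """Assumed later step: fill the slot and wrap the instruction in <? ?>."""
    return "<?" + text.replace(PLACEHOLDER, encoding) + "?>"


stored = normalize_pi("xml version='1.0' encoding='latin-1'")
print(stored)
print(render_pi(stored, "utf-8"))
print(normalize_pi("php echo 1"))
```

Deferring the encoding this way lets the same parsed tree be serialized in different encodings without carrying a stale declaration.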
def BeautifulSoup.HTMLParserBuilder.handle_starttag (   self,
  name,
  attrs 
)

Definition at line 1013 of file BeautifulSoup.py.

Referenced by BeautifulSoup.HTMLParserBuilder.handle_starttag().

    def handle_starttag(self, name, attrs):
        if name == 'meta':
            self.soup.extractCharsetFromMeta(attrs)
        else:
            self.soup.unknown_starttag(name, attrs)
def BeautifulSoup.HTMLParserBuilder.parse_declaration (   self,
  i 
)
Treat a bogus SGML declaration as raw data. Treat a CDATA
declaration as a CData object.

Definition at line 1099 of file BeautifulSoup.py.

References BeautifulSoup.HTMLParserBuilder.__init__(), BeautifulSoup.HTMLParserBuilder._toStringSubclass(), BeautifulSoup.HTMLParserBuilder.handle_data(), BeautifulSoup.HTMLParserBuilder.parse_declaration(), DTROMonitorFilter.rawdata, and DQMNet::Object.rawdata.

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = HTMLParser.parse_declaration(self, i)
            except HTMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j
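
The CDATA branch is plain index arithmetic on the raw buffer, which the following Python 3 sketch isolates as a standalone function. It returns the section's contents together with the index at which parsing should resume, and it mirrors the listing in treating an unterminated section as running to the end of the input, in which case the returned index lies past the end of the buffer. The name scan_cdata and its return convention are hypothetical.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def scan_cdata(rawdata, i):
    """Return (data, j) for a CDATA section starting at index i, else None."""
    if rawdata[i:i + len(CDATA_OPEN)] != CDATA_OPEN:
        return None
    k = rawdata.find(CDATA_CLOSE, i)
    if k == -1:
        # Unterminated section: consume everything up to the end of input.
        k = len(rawdata)
    data = rawdata[i + len(CDATA_OPEN):k]
    return data, k + len(CDATA_CLOSE)


print(scan_cdata("a<![CDATA[x < y]]>b", 1))
print(scan_cdata("<![CDATA[open", 0))
print(scan_cdata("<!DOCTYPE html>", 0))
```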

Member Data Documentation

BeautifulSoup.HTMLParserBuilder.soup

Definition at line 1009 of file BeautifulSoup.py.

Referenced by BeautifulSoup.HTMLParserBuilder.__init__().