
Public Member Functions | Public Attributes | Private Member Functions | Private Attributes | Static Private Attributes

parserPerfsuiteMetadata::parserPerfsuiteMetadata Class Reference

List of all members.

Public Member Functions

def __init__
def doQuery
def findFirstIndex_ofStartsWith
def findLineAfter
def findLineBefore
def firstTimeStampAfter
def firstTimeStampBefore
def get_tarball_fromlog
def getIgSummary
def getMachineInfo
def getSummaryInfo
def handleParsingError
def isTimeStamp
def parseAll
def parseAllOtherTests
def parseGeneralInfo
def parseTheCompletion
def parseTimeSize
def readCmsScimark
def readCmsScimarkTest
def readInput
def validateSteps

Public Attributes

 lines_general
 lines_other
 lines_timesize
 missing_fields
 reCmsScimarkTest

Private Member Functions

def _applyParsingRules

Private Attributes

 _DEBUG
 _MAX_STEPS
 _otherStart
 _path
 _timeSizeEnd
 _timeSizeStart

Static Private Attributes

string _LINE_SEPARATOR = "|"

Detailed Description

        The whole parsing works as follows. We split the file into 3 parts (we keep 3 lists of lines: self.lines_general, self.lines_timesize, self.lines_other):

                * General info
        As most of the info consists of simple one-line strings, we define regular expressions matching each of those lines. Each regular expression is associated with the data we can extract from it, e.g. ^Suite started at (.+) on (.+) by user (.+)$ matches only the line stating when the suite started and on which machine, and is associated with the tuple of general-info field names to be filled in. This way we get info = {'start_time': start-taken-from-regexp, 'host': host, 'user': user}. This is done by calling the helper function _applyParsingRules, which checks each line against each rule and, on a match, fills in the result dictionary.
        Additionally we get the cpu and memory info from /proc/cpuinfo and /proc/meminfo.

                * TimeSize test
        We use largely the same technique, but first we divide the timesize lines by job (an individual run of cmssw - per candle, with and without pileup). Then for each job we apply our parsing rules and also find the starting and ending times (i.e. we know that the start timestamp is somewhere after a certain line containing "Written out cmsRelvalreport.py input file at:").

                * All other tests
        We find the statement that the test is being launched (containing the test name, core and number of events). The thread number appears above it, and the starting time below it.
        The ending time can ONLY be connected to the starting time via the Thread-ID, because the log names the same test instance differently, e.g. <Launching "PILE UP Memcheck"> versus <"Memcheck" stopped>.
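The three-way split described above is driven by two separator regexes found in the constructor. A simplified, self-contained sketch of that split (modern-Python form; the sample log lines are made up):

```python
import re

# Separator regexes as defined in __init__ below.
_timeSizeStart = re.compile(r"^Launching the TimeSize tests \(TimingReport, TimeReport, SimpleMemoryCheck, EdmSize\) with (\d+) events each$")
_timeSizeEnd = re.compile(r"^Stopping all cmsScimark jobs now$")

def split_log(lines):
    """Split the log into (general, timesize, other) line lists."""
    start = next((i for i, l in enumerate(lines) if _timeSizeStart.match(l)), 0)
    end = next((i for i, l in enumerate(lines) if _timeSizeEnd.match(l)), 0)
    return lines[:start], lines[start:end + 1], lines[end:]

log = [
    "Performance Suite started running at Fri Aug 14 01:16:03 2009 ...",
    "Launching the TimeSize tests (TimingReport, TimeReport, SimpleMemoryCheck, EdmSize) with 100 events each",
    "... timesize output ...",
    "Stopping all cmsScimark jobs now",
    "Launching the Memcheck tests on cpu 3 with 5 events each",
]
general, timesize, other = split_log(log)
```

Note that, as in the constructor, the line matching the end separator is kept in both the timesize and the other slices, since it also marks where the remaining tests begin.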

Definition at line 8 of file parserPerfsuiteMetadata.py.


Constructor & Destructor Documentation

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::__init__ (   self,
  path 
)

Definition at line 28 of file parserPerfsuiteMetadata.py.

00029                                 :
00030                 
00031                 self._MAX_STEPS  = 5 # MAXIMUM NUMBER OF STEPS PER RUN (taskset relvalreport.py...)
00032                 self._DEBUG = False
00033 
00034 
00035                 self._path = path
00036                 
00037                 """ some initialisation to speedup the other functions """
00038                 #for cmsscimark
00039                 self.reCmsScimarkTest = re.compile(r"""^Composite Score:(\s*)([^\s]+)$""")
00040 
00041                 #TimeSize
00042                 """ the separator for beginning of timeSize / end of general statistics """
00043                 self._timeSizeStart = re.compile(r"""^Launching the TimeSize tests \(TimingReport, TimeReport, SimpleMemoryCheck, EdmSize\) with (\d+) events each$""")
00044                 """ (the first timestamp is the start of TimeSize) """
00045 
00046 
00047                 """ the separator for end of timeSize / beginning of IgProf_Perf, IgProf_Mem,  Memcheck, Callgrind tests """
00048                 self._timeSizeEnd = re.compile(r"""^Stopping all cmsScimark jobs now$""")
00049 
00050                 #Other tests:
00051                 self._otherStart = re.compile(r"^Preparing")
00052 
00053                 """ 
00054                 ----- READ THE DATA -----
00055                 """
00056                 lines = self.readInput(path)
00057                 """ split the whole file  into parts """
00058                 #Let's not assume there are ALWAYS TimeSize tests in the runs of the Performance Suite!:
00059                 #Check first:  
00060                 #FIXME: Vidmantas did not think to this case... will need to implement protectionb against it for all the IB tests...
00061                 #To do as soon as possible...
00062                 #Maybe revisit the strategy if it can be done quickly.
00063                 timesize_end= [lines.index(line)  for line in lines if self._timeSizeEnd.match(line)]
00064                 if timesize_end:
00065                         timesize_end_index = timesize_end[0]
00066                 else:
00067                         timesize_end_index=0
00068                 timesize_start=[lines.index(line) for line in lines if self._timeSizeStart.match(line)]
00069                 general_stop=[lines.index(line) for line in lines if self._otherStart.match(line)]
00070                 if timesize_start:
00071                         timesize_start_index = timesize_start[0]
00072                         general_stop_index=timesize_start_index
00073                 elif general_stop:
00074                         timesize_start_index=0
00075                         general_stop_index=general_stop[0]
00076                 else:
00077                         timesize_start_index=0
00078                         general_stop_index=-1
00079 
00080                 """ we split the structure:
00081                         * general
00082                         * timesize
00083                         * all others [igprof etc]
00084                 """
00085         
00086                 """ we get the indexes of spliting """
00087                 #Not OK to use timsize_start_index for the general lines... want to be general, also to cases of no TimeSize tests...
00088                 #self.lines_general = lines[:timesize_start_index]
00089                 self.lines_general = lines[:general_stop_index]
00090                 self.lines_timesize = lines[timesize_start_index:timesize_end_index+1]
00091                 self.lines_other = lines[timesize_end_index:]           
00092         
00093                 """ a list of missing fields """
00094                 self.missing_fields = []


Member Function Documentation

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::_applyParsingRules (   self,
  parsing_rules,
  lines 
) [private]
        Applies the provided regular expression rules (rule[1] for rule in parsing_rules)
        to each line; if a rule matches a line,
        the matched information is put into the dictionary under the specified keys (rule[0]), which is later returned.
        The optional third element of a rule states whether the field is required to be found; if so and it isn't found, an exception is raised.
        rules = [
          ( (field_name_1_to_match, field_name_2), regular expression, /optionally: is the field required? if so "req"/ )
        ]
 
we call a shared parsing helper 

Definition at line 235 of file parserPerfsuiteMetadata.py.

00236                                                           :
00237                 """ 
00238                         Applies the (provided) regular expression rules (=rule[1] for rule in parsing_rules)
00239                         to each line and if it matches the line,
00240                         puts the mached information to the dictionary as the specified keys (=rule[0]) which is later returned
00241                         Rule[3] contains whether the field is required to be found. If so and it isn't found the exception would be raised.
00242                         rules = [
00243                           ( (field_name_1_to_match, field_name_2), regular expression, /optionaly: is the field required? if so "req"/ )
00244                         ]
00245                  """
00246                 """ we call a shared parsing helper """
00247                 #parsing_rules = map(parsingRulesHelper.rulesRegexpCompileFunction, parsing_rules)
00248                 #print parsing_rules
00249                 (info, missing_fields) = parsingRulesHelper.rulesParser(parsing_rules, lines, compileRules = True)
00250 
00251                 self.missing_fields.extend(missing_fields)
00252 
00253                 return info
00254 
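The real work is delegated to parsingRulesHelper.rulesParser, which is defined elsewhere; the sketch below reimplements its contract in a simplified, self-contained form to illustrate the rule format (the sample rule and line are made up):

```python
import re

def apply_rules(parsing_rules, lines):
    """Return (info dict, list of required fields that never matched)."""
    info, missing = {}, []
    for rule in parsing_rules:
        fields, regexp = rule[0], re.compile(rule[1])
        required = len(rule) > 2 and rule[2] == "req"
        matched = False
        for line in lines:
            m = regexp.match(line)
            if m:
                # map each captured group onto its field name
                info.update(dict(zip(fields, m.groups())))
                matched = True
                break
        if required and not matched:
            missing.extend(fields)
    return info, missing

rules = ((("start_time", "host", "user"),
          r"^Suite started at (.+) on (.+) by user (.+)$", "req"),)
info, missing = apply_rules(rules, ["Suite started at Fri Aug 14 on lxbuild by user joe"])
```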

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::doQuery (   self,
  query,
  database 
)

Definition at line 631 of file parserPerfsuiteMetadata.py.

00632                                           :
00633                 if os.path.exists("/usr/bin/sqlite3"):
00634                         sqlite="/usr/bin/sqlite3"
00635                 else:
00636                         sqlite="/afs/cern.ch/user/e/eulisse/www/bin/sqlite"
00637                 return getstatusoutput("echo '%s' | %s -separator @@@ %s" % (query, sqlite, database))
                    
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findFirstIndex_ofStartsWith (   job_lines,
  start_of_line 
)

Definition at line 113 of file parserPerfsuiteMetadata.py.

00114                                                                  :
00115                 return [job_lines.index(line) 
00116                         for line in job_lines 
00117                         if line.startswith(start_of_line)][0]
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineAfter (   self,
  line_index,
  lines,
  test_condition,
  return_index = False 
)
finds a line satisfying the `test_condition` coming after the `line_index` 

Definition at line 129 of file parserPerfsuiteMetadata.py.

00130                                                                                         :
00131                 """ finds a line satisfying the `test_condition` comming after the `line_index` """
00132                 # we're going forward the lines list
00133                 for line_index in xrange(line_index + 1, len(lines)):
00134                         line = lines[line_index]
00135 
00136                         if test_condition(line):        
00137                                 if return_index:
00138                                         return line_index
00139                                 return line
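findLineAfter is what firstTimeStampAfter builds on: it walks forward from a given index until a predicate holds. A self-contained sketch of the same search, paired with a timestamp predicate like isTimeStamp (the sample lines are made up):

```python
import time

def is_timestamp(line):
    """True if the line is a date like 'Fri Aug 14 01:16:03 2009', else None."""
    try:
        time.strptime(line, "%a %b %d %H:%M:%S %Y")
        return True
    except ValueError:
        return None

def find_line_after(line_index, lines, test_condition, return_index=False):
    # walk forward from line_index + 1 until the condition holds
    for i in range(line_index + 1, len(lines)):
        if test_condition(lines[i]):
            return i if return_index else lines[i]

lines = ["Launching the Memcheck tests on cpu 3 with 5 events each",
         "some other output",
         "Fri Aug 14 01:16:03 2009"]
stamp = find_line_after(0, lines, test_condition=is_timestamp)
```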

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineBefore (   self,
  line_index,
  lines,
  test_condition 
)
finds a line satisfying the `test_condition` coming before the `line_index` 

Definition at line 118 of file parserPerfsuiteMetadata.py.

00119                                                                    :
00120                 """ finds a line satisfying the `test_condition` comming before the `line_index` """
00121                 # we're going backwards the lines list
00122                 for line_index in  xrange(line_index -1, -1, -1):
00123                         line = lines[line_index]
00124 
00125                         if test_condition(line):
00126                                 return line
00127                 raise ValueError
00128 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampAfter (   self,
  line_index,
  lines 
)
returns the first timestamp AFTER the line with given index 

Definition at line 145 of file parserPerfsuiteMetadata.py.

00146                                                         :
00147                 """ returns the first timestamp AFTER the line with given index """
00148 
00149                 return self.findLineAfter(line_index, lines, test_condition = self.isTimeStamp)

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampBefore (   self,
  line_index,
  lines 
)
returns the first timestamp BEFORE the line with given index 

Definition at line 140 of file parserPerfsuiteMetadata.py.

00141                                                          :
00142                 """ returns the first timestamp BEFORE the line with given index """
00143 
00144                 return self.findLineBefore(line_index, lines, test_condition = self.isTimeStamp)

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::get_tarball_fromlog (   self)
Return the tarball castor location by parsing the cmsPerfSuite.log file

Definition at line 690 of file parserPerfsuiteMetadata.py.

00691                                      :
00692                 '''Return the tarball castor location by parsing the cmsPerfSuite.log file'''
00693                 print "Getting the url from the cmsPerfSuite.log"
00694                 log=open("cmsPerfSuite.log","r")
00695                 castor_dir="UNKNOWN_CASTOR_DIR"
00696                 tarball="UNKNOWN_TARBALL"
00697                 for line in log.readlines():
00698                         if 'castordir' in line:
00699                                 castor_dir=line.split()[1]
00700                         if 'tgz' in line and tarball=="UNKNOWN_TARBALL": #Pick the first line that contains the tar command...
00701                                 if 'tar' in line:
00702                                         tarball=os.path.basename(line.split()[2])
00703                 castor_tarball=os.path.join(castor_dir,tarball)
00704                 return castor_tarball

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getIgSummary (   self)

Definition at line 602 of file parserPerfsuiteMetadata.py.

00603                               :
00604                 igresult = []
00605                 globbed = glob.glob(os.path.join(self._path, "../*/IgProfData/*/*/*.sql3"))
00606 
00607                 for f in globbed:
00608                         #print f
00609                         profileInfo = self.getSummaryInfo(f)
00610                         if not profileInfo:
00611                                 continue
00612                         cumCounts, cumCalls = profileInfo
00613                         dump, architecture, release, rest = f.rsplit("/", 3)
00614                         candle, sequence, pileup, conditions, process, counterType, events = rest.split("___")
00615                         events = events.replace(".sql3", "")
00616                         igresult.append({"counter_type": counterType, "event": events, "cumcounts": cumCounts, "cumcalls": cumCalls})
00617 
00618                 return igresult 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getMachineInfo (   self)
Returns the cpu and memory info  
cpu info 
we assume that:
 * num_cores = max(processor id + 1) [it's counted from 0]
 * 'model name' is the processor type [we return only the first one - we assume the others are the same]
 * cpu MHz is the speed of the CPU
e.g. 
        model name	: Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
        cpu MHz		: 800.000
        cache size	: 6144 KB

Definition at line 175 of file parserPerfsuiteMetadata.py.

00176                                 :
00177                 """ Returns the cpu and memory info  """
00178 
00179                 """ cpu info """
00180 
00181                 """
00182                 we assume that:
00183                  * num_cores = max(core id+1) [it's counted from 0]
00184                  * 'model name' is processor type [we will return only the first one - we assume others to be same!!??
00185                  * cpu MHz - is the speed of CPU
00186                 """
00187                 #TODO: BUT cpu MHz show not the maximum speed but current, 
00188                 """
00189                 for 
00190                         model name      : Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
00191                         cpu MHz         : 800.000
00192                         cache size      : 6144 KB
00193                 """
00194                 cpu_result = {}
00195                 try:
00196                         f= open(os.path.join(self._path, "cpuinfo"), "r")
00197 
00198                         #we split data into a list of tuples = [(attr_name, attr_value), ...]
00199                         cpu_attributes = [l.strip().split(":") for l in f.readlines()]
00200                         #print cpu_attributes
00201                         f.close()
00202                         cpu_result = {
00203                                 "num_cores": max ([int(attr[1].strip())+1 for attr in cpu_attributes if attr[0].strip() == "processor"]), #Bug... Vidmantas used "core id"
00204                                 "cpu_speed_MHZ": max ([attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cpu MHz"]),
00205                                 "cpu_cache_size": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cache size"][0],
00206                                 "cpu_model_name": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "model name"][0]
00207                         }
00208                 except IOError,e:
00209                         print e
00210 
00211                 
00212                 
00213 
00214 
00215                 """ memory info """
00216                 mem_result = {}
00217 
00218                 try:
00219                         f= open(os.path.join(self._path, "meminfo"), "r")
00220 
00221                         #we split data into a list of tuples = [(attr_name, attr_value), ...]
00222                         mem_attributes = [l.strip().split(":") for l in f.readlines()]
00223 
00224                         mem_result = {
00225                                 "memory_total_ram": [attr[1].strip() for attr in mem_attributes if attr[0].strip() == "MemTotal"][0]
00226                         }
00227 
00228                 except IOError,e:
00229                         print e
00230         
00231                 cpu_result.update(mem_result)
00232                 return cpu_result
00233 
00234 
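The cpuinfo parsing above splits each "key : value" line into an (attr_name, attr_value) pair and then aggregates over the pairs. The same logic run on an in-memory sample instead of the cpuinfo file copied under self._path (sample values are made up):

```python
# Two-core excerpt in /proc/cpuinfo format.
sample = """processor\t: 0
model name\t: Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
cpu MHz\t\t: 800.000
processor\t: 1
model name\t: Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
cpu MHz\t\t: 800.000"""

# split into (attr_name, attr_value) pairs, as in getMachineInfo
attrs = [l.strip().split(":") for l in sample.splitlines()]
cpu = {
    # processor ids count from 0, so max(id) + 1 is the core count
    "num_cores": max(int(a[1].strip()) + 1 for a in attrs if a[0].strip() == "processor"),
    "cpu_model_name": [a[1].strip() for a in attrs if a[0].strip() == "model name"][0],
}
```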
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getSummaryInfo (   self,
  database 
)

Definition at line 619 of file parserPerfsuiteMetadata.py.

00620                                           :
00621                 summary_query="""SELECT counter, total_count, total_freq, tick_period
00622                                  FROM summary;"""
00623                 error, output = self.doQuery(summary_query, database)
00624                 if error or not output or output.count("\n") > 1:
00625                         return None
00626                 counter, total_count, total_freq, tick_period = output.split("@@@")
00627                 if counter == "PERF_TICKS":
00628                         return float(tick_period) * float(total_count), int(total_freq)
00629                 else:
00630                         return int(total_count), int(total_freq)
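Because doQuery invokes sqlite3 with "-separator @@@", one summary row comes back as a single @@@-delimited line. A sketch of the post-processing getSummaryInfo applies to it (the sample output values are made up; 0.25 is exactly representable, so the arithmetic is exact):

```python
# Hypothetical one-row output from: SELECT counter, total_count, total_freq, tick_period FROM summary;
output = "PERF_TICKS@@@163840@@@4096@@@0.25"

counter, total_count, total_freq, tick_period = output.split("@@@")
if counter == "PERF_TICKS":
    # sampling profile: scale tick counts by the tick period
    summary = (float(tick_period) * float(total_count), int(total_freq))
else:
    # counting profile: totals are used as-is
    summary = (int(total_count), int(total_freq))
```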

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::handleParsingError (   self,
  message 
)

Definition at line 150 of file parserPerfsuiteMetadata.py.

00151                                              :
00152                 if self._DEBUG:
00153                         raise ValueError, message
00154                 print " ======== AN ERROR WHILE PARSING METADATA ===="
00155                 print message
00156                 print " =============== end ========================= "

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::isTimeStamp (   line)
Returns whether the string is a timestamp (if not returns None)

>>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
True
>>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")

Definition at line 96 of file parserPerfsuiteMetadata.py.

00097                              :
00098                 """
00099                 Returns whether the string is a timestamp (if not returns None)
00100 
00101                 >>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
00102                 True
00103                 >>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")
00104 
00105                 """
00106                 datetime_format = "%a %b %d %H:%M:%S %Y" # we use default date format
00107                 try:
00108                         time.strptime(line, datetime_format)
00109                         return True
00110                 except ValueError:
00111                         return None
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAll (   self)

Definition at line 705 of file parserPerfsuiteMetadata.py.

00706                           :
00707                 result = {"General": {}, "TestResults":{}, "cmsSciMark":{}, "IgSummary":{}, 'unrecognized_jobs': []}
00708 
00709                 """ all the general info - start, arguments, host etc """
00710                 result["General"].update(self.parseGeneralInfo())
00711 
00712                 """ machine info - cpu, memmory """
00713                 result["General"].update(self.getMachineInfo())
00714 
00715                 """ we add info about how successfull was the run, when it finished and final castor url to the file! """
00716                 result["General"].update(self.parseTheCompletion())
00717 
00718                 try:
00719                         result["TestResults"].update(self.parseTimeSize())
00720                 except Exception, e:
00721                         print "BAD BAD BAD UNHANDLED ERROR" + str(e)
00722 
00723 
00724                 #TODO:
00725                 #Check what Vidmantas was doing in the parseAllOtherTests, de facto it is not used now, so commenting it for now (to avoid the "BAD BAD BAD...."
00726                 #try:
00727                 #       result["TestResults"].update(self.parseAllOtherTests())
00728                 #except Exception, e:
00729                 #       print "BAD BAD BAD UNHANDLED ERROR" + str(e)
00730 
00731 
00732                 main_cores = [result["General"]["run_on_cpus"]]
00733                 num_cores = result["General"].get("num_cores", 0)
00734                 #DEBUG
00735                 #print "Number of cores was: %s"%num_cores
00736                 #TODO: temporarly - search for cores, use regexp
00737                 main_cores = [1]
00738 
00739                 # THE MAHCINE SCIMARKS
00740                 result["cmsSciMark"] = self.readCmsScimark(main_cores = main_cores)
00741                 result["IgSummary"] = self.getIgSummary()
00742                 
00743 
00744 
00745                 if self.missing_fields:
00746                         self.handleParsingError("========== SOME REQUIRED FIELDS WERE NOT FOUND DURING PARSING ======= "+ str(self.missing_fields))
00747 
00748                 return result
00749                 
00750                 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAllOtherTests (   self)

Definition at line 360 of file parserPerfsuiteMetadata.py.

00361                                     :
00362                 threads = {}
00363                 tests = {
00364                         #"IgProf_Perf": {}, "IgProf_Mem": {}, "Memcheck": {},   "Callgrind": {},
00365                 }
00366 
00367                 lines = self.lines_other
00368                 """
00369 
00370                 for each of IgProf_Perf, IgProf_Mem,  Memcheck, Callgrind tests we have such a structure of input file:
00371                 * beginning ->> and start timestamp- the firstone:
00372                         Adding thread <simpleGenReportThread(Thread-1, started)> to the list of active threads
00373                         Launching the Memcheck tests on cpu 3 with 5 events each
00374                         Fri Aug 14 01:16:03 2009
00375 
00376                         <... whatever might be here, might overlap with other test start/end messages ..>
00377 
00378                         Fri Aug 14 02:13:18 2009
00379                         Memcheck test, in thread <simpleGenReportThread(Thread-1, stopped)> is done running on core 3
00380                 * ending - the last timestamp "before is done running ...."
00381                 """
00382                 # we take the first TimeStamp after the starting message and the first before the finishing message
00383 
00384         
00385                 #TODO: if threads would be changed it would stop working!!!
00386 
00387                 # i.e. Memcheck, cpu, events
00388                 reStart = re.compile(r"""^Launching the (.*) tests on cpu (\d+) with (\d+) events each$""")
00389                 # i.e. Memcheck, thread name,core number
00390                 reEnd = re.compile(r"""^(.*) test, in thread <simpleGenReportThread\((.+), stopped\)> is done running on core (\d+)$""")
00391                 
00392                 #i.e. thread = Thread-1
00393                 reAddThread =  re.compile(r"""^Adding thread <simpleGenReportThread\((.+), started\)> to the list of active threads$""")
00394 
00395                 reExitCode = re.compile(r"""Individual cmsRelvalreport.py ExitCode (\d+)""")
00396                 """ we search for lines being either: (it's a little pascal'ish but we need the index!) """
00397                 for line_index in xrange(0, len(lines)):
00398                         line = lines[line_index]
00399 
00400                         # * starting of test
00401                         if reStart.match(line):
00402                                 #print reStart.match(line).groups()
00403                                 testName, testCore, testEventsNum = reStart.match(line).groups()
00404 
00405                                 time = self.firstTimeStampAfter(line_index, lines)
00406 
00407                                 #find the name of Thread: it's one of the lines before
00408                                 line_thread = self.findLineBefore(line_index, lines, test_condition=lambda l: reAddThread.match(l))
00409                                 (thread_id, ) =  reAddThread.match(line_thread).groups()
00410                                 
00411                                 #we add it to the list of threads as we DO NOT KNOW EXACT NAME OF TEST
00412                                 if not threads.has_key(thread_id):
00413                                         threads[thread_id] = {}
00414                                 # this way we would get an Exception in case of unknown test name! 
00415                                 threads[thread_id].update({"name": testName, "events_num": testEventsNum, "core": testCore, "start": time, "thread_id": thread_id})
00416 
00417                         # * or end of test
00418                         if reEnd.match(line):
00419                                 testName, thread_id, testCore = reEnd.match(line).groups()
00420                         if not threads.has_key(thread_id):
00421                                         threads[thread_id] = {}
00422                                 #TODO: we get an exception if we found non existing
00423 
00424                                 time = self.firstTimeStampBefore(line_index, lines)
00425                                 try:
00426                                         exit_code = ""
00427                                         #we search for the exit code
00428                                         line_exitcode = self.findLineBefore(line_index, lines, test_condition=lambda l: reExitCode.match(l))
00429                                         exit_code, = reExitCode.match(line_exitcode).groups()
00430                                 except Exception, e:
00431                                         print "Error while getting exit code (Other test): " + str(e)
00432                                         
00433 
00434                                 # this way we would get an Exception in case of unknown test name! So we would be warned if the format have changed
00435                                 threads[thread_id].update({"end": time, "exit_code":exit_code})
00436                         for key, thread in threads.items():
00437                                 tests[thread["name"]] = thread
00438                 return tests
00439 
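The key point of parseAllOtherTests is that the start and end records of a test can only be joined through the Thread-ID, not through the test name. A minimal self-contained demonstration using the same three regexes on a made-up log excerpt:

```python
import re

# Regexes as used by parseAllOtherTests.
reStart = re.compile(r"^Launching the (.*) tests on cpu (\d+) with (\d+) events each$")
reEnd = re.compile(r"^(.*) test, in thread <simpleGenReportThread\((.+), stopped\)> is done running on core (\d+)$")
reAddThread = re.compile(r"^Adding thread <simpleGenReportThread\((.+), started\)> to the list of active threads$")

lines = [
    "Adding thread <simpleGenReportThread(Thread-1, started)> to the list of active threads",
    "Launching the Memcheck tests on cpu 3 with 5 events each",
    "Memcheck test, in thread <simpleGenReportThread(Thread-1, stopped)> is done running on core 3",
]

threads = {}
for i, line in enumerate(lines):
    m = reStart.match(line)
    if m:
        name, core, events = m.groups()
        # the thread id comes from the "Adding thread" line just above
        thread_id = reAddThread.match(lines[i - 1]).group(1)
        threads[thread_id] = {"name": name, "core": core, "events_num": events}
    m = reEnd.match(line)
    if m:
        name, thread_id, core = m.groups()
        # join end record to start record via the Thread-ID
        threads[thread_id]["done"] = True
```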

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseGeneralInfo (   self)

Definition at line 255 of file parserPerfsuiteMetadata.py.

00256                                   :
00257                 lines = self.lines_general
00258                 """ we define a simple list (tuple) of rules for parsing, the first part tuple defines the parameters to be fetched from the
00259                         regexp while the second one is the regexp itself """
00260                 #TIP: don't forget that tuple of one ends with ,
00261                 parsing_rules = (
00262                         (("", "num_cores", "run_on_cpus"), r"""^This machine \((.+)\) is assumed to have (\d+) cores, and the suite will be run on cpu \[(.+)\]$"""),
00263                         (("start_time", "host", "local_workdir", "user"), r"""^Performance Suite started running at (.+) on (.+) in directory (.+), run by user (.+)$""", "req"),
00264                         (("architecture",) ,r"""^Current Architecture is (.+)$"""),
00265                         (("test_release_based_on",), r"""^Test Release based on: (.+)$""", "req"),
00266                         (("base_release_path",) , r"""^Base Release in: (.+)$"""),
00267                         (("test_release_local_path",) , r"""^Your Test release in: (.+)$"""),
00268 
00269                         (("castor_dir",) , r"""^The performance suite results tarball will be stored in CASTOR at (.+)$"""),
00270                         
00271                         (("TimeSize_events",) , r"""^(\d+) TimeSize events$"""),
00272                         (("IgProf_events",) , r"""^(\d+) IgProf events$"""),
00273                         (("CallGrind_events",) , r"""^(\d+) Callgrind events$"""),
00274                         (("Memcheck_events",) , r"""^(\d+) Memcheck events$"""), 
00275 
00276                         (("candles_TimeSize",) , r"""^TimeSizeCandles \[(.*)\]$"""),
00277                         (("candles_TimeSizePU",) , r"""^TimeSizePUCandles \[(.*)\]$"""),
00278                         
00279                         (("candles_Memcheck",) , r"""^MemcheckCandles \[(.*)\]$"""),
00280                         (("candles_MemcheckPU",) , r"""^MemcheckPUCandles \[(.*)\]$"""),
00281 
00282                         (("candles_Callgrind",) , r"""^CallgrindCandles \[(.*)\]$"""),
00283                         (("candles_CallgrindPU",) , r"""^CallgrindPUCandles \[(.*)\]$"""),
00284 
00285                         (("candles_IgProfPU",) , r"""^IgProfPUCandles \[(.*)\]$"""),
00286                         (("candles_IgProf",) , r"""^IgProfCandles \[(.*)\]$"""),
00287 
00288 
00289                         (("cmsScimark_before",) , r"""^(\d+) cmsScimark benchmarks before starting the tests$"""),
00290                         (("cmsScimark_after",) , r"""^(\d+) cmsScimarkLarge benchmarks before starting the tests$"""),
00291                         (("cmsDriverOptions",) , r"""^Running cmsDriver.py with user defined options: --cmsdriver="(.+)"$"""),
00292 
00293                         (("HEPSPEC06_SCORE",) ,r"""^This machine's HEPSPEC06 score is: (.+)$"""),
00294 
00295 
00296                 )
00297                 """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00298                 info = self._applyParsingRules(parsing_rules, lines)
00299 
00300 
00301                 """ postprocess the candles list """
00302                 candles = {}
00303                 for field, value in info.items():
00304                         if field.startswith("candles_"):
00305                                 test = field.replace("candles_", "")
00306                                 value = [v.strip(" '") for v in value.split(",")]
00307                                 #if value:
00308                                 candles[test]=value
00309                                 del info[field]
00310                 #print candles
00311                 info["candles"] = self._LINE_SEPARATOR.join([k+":"+",".join(v) for (k, v) in candles.items()])
00312 
00313 
00314                 """ TAGS """
00315                 """ 
00316                 --- Tag ---    --- RelTag --- -------- Package --------                        
00317                 HEAD           V05-03-06      IgTools/IgProf                                   
00318                 V01-06-05      V01-06-04      Validation/Performance                           
00319                 ---------------------------------------
00320                 total packages: 2 (2 displayed)
00321                 """
00322                 tags_start_index = -1 # set some default
00323                 try:
00324                         tags_start_index = [i for i in xrange(0, len(lines)) if lines[i].startswith("--- Tag ---")][0]
00325                 except:
00326                         pass
00327                 if tags_start_index > -1:
00328                         tags_end_index = [i for i in xrange(tags_start_index + 1, len(lines)) if lines[i].startswith("---------------------------------------")][0]
00329                         # print "tags start index: %s, end index: %s" % (tags_start_index, tags_end_index)
00330                         tags = lines[tags_start_index:tags_end_index+2]
00331                         # print [tag.split("  ") for tag in tags]
00332                         # print "\n".join(tags)
00333                 else: # no tags found, make an empty list ...
00334                         tags = []
00335                 """ we join the tags with separator to store as simple string """
00336                 info["tags"] = self._LINE_SEPARATOR.join(tags)
00337                 #FILES/PATHS
00338         
00339 
00340                 """ get the command line """
00341                 try:
00342                         cmd_index = self.findFirstIndex_ofStartsWith(lines, "Performance suite invoked with command line:") + 1 #that's the next line
00343                         info["command_line"] =  lines[cmd_index]
00344                 except IndexError, e:
00345                         if self._DEBUG:
00346                                 print e
00347                         info["command_line"] =  ""
00348                 
00349                 try:
00350                         cmd_parsed_start = self.findFirstIndex_ofStartsWith(lines, "Initial PerfSuite Arguments:") + 1
00351                         cmd_parsed_end = self.findFirstIndex_ofStartsWith(lines, "Running cmsDriver.py")
00352                         info["command_line_parsed"] = self._LINE_SEPARATOR.join(lines[cmd_parsed_start:cmd_parsed_end])
00353                 except IndexError, e:
00354                         if self._DEBUG:
00355                                 print e
00356                         info["command_line_parsed"] =  ""
00357 
00358                 return  info
00359 
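The listing above hands its rule tuples to _applyParsingRules, whose own listing appears elsewhere in this file. A minimal standalone sketch of the rule-application idea, in modern Python (the helper name apply_parsing_rules is hypothetical; the rule format and sample regexps are taken from the rules above):

```python
import re

def apply_parsing_rules(parsing_rules, lines):
    """For each rule, find the first matching line and store the captured
    groups under the rule's field names; an empty field name ("") means
    the group is matched but discarded."""
    info = {}
    for rule in parsing_rules:
        fields, regexp = rule[0], rule[1]
        pattern = re.compile(regexp)
        for line in lines:
            match = pattern.match(line)
            if match:
                for field, value in zip(fields, match.groups()):
                    if field:
                        info[field] = value
                break
    return info

rules = (
    (("IgProf_events",), r"^(\d+) IgProf events$"),
    (("candles_TimeSize",), r"^TimeSizeCandles \[(.*)\]$"),
)
log_lines = ["1000 IgProf events", "TimeSizeCandles ['MinBias', 'TTbar']"]
info = apply_parsing_rules(rules, log_lines)
print(info["IgProf_events"])     # "1000"
print(info["candles_TimeSize"])  # "'MinBias', 'TTbar'"
```

This is also why the candles postprocessing above can key off the `candles_` prefix: every rule deposits its captures under the exact field names declared in the tuple.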
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTheCompletion (   self)
 checks if the suite has successfully finished  
        and if the tarball was successfully archived and uploaded to the castor 

Definition at line 638 of file parserPerfsuiteMetadata.py.

00639                                     :
00640                 """
00641                  checks if the suite has successfully finished  
00642                         and if the tarball was successfully archived and uploaded to the castor """
00643 
00644                 parsing_rules = (
00645                         (("finishing_time", "", ""), r"""^Performance Suite finished running at (.+) on (.+) in directory (.+)$"""),
00646                         (("castor_md5",) , r"""^The md5 checksum of the tarball: (.+)$"""),     
00647                         (("successfully_archived_tarball", ), r"""^Successfully archived the tarball (.+) in CASTOR!$"""),
00648                         #TODO: WE MUST HAVE THE CASTOR URL, but for some of files it's not included [probably crashed]
00649                         (("castor_file_url",), r"""^The tarball can be found: (.+)$"""),                        
00650                         (("castor_logfile_url",), r"""^The logfile can be found: (.+)$"""),
00651                 )
00652 
00653                 
00654                 """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00655                 info = self._applyParsingRules(parsing_rules, self.lines_other)
00656 
00657                 """ did we detect any errors in log files ? """
00658                 info["no_errors_detected"] = [line for line in self.lines_other if line == "There were no errors detected in any of the log files!"] and "1" or "0"
00659                 if not info["successfully_archived_tarball"]:
00660                         info["castor_file_url"] = ""
00661 
00662                 if not info["castor_file_url"]:
00663                         #TODO: get the castor file url or abort
00664                         self.handleParsingError( "Castor tarball URL not found. Trying to get from environment")
00665                         lmdb_castor_url_is_valid = lambda url: url.startswith("/castor/")
00666 
00667                         url = ""
00668                         try:
00669                                 print "HERE!"
00670                                 url=self.get_tarball_fromlog()
00671                                 print "Extracted castor tarball full path by re-parsing cmsPerfSuite.log: %s"%url
00672                                 
00673                         except:
00674                                 if os.environ.has_key("PERFDB_CASTOR_FILE_URL"):
00675                                         url = os.environ["PERFDB_CASTOR_FILE_URL"]
00676                                         
00677                                 else: #FIXME: add the possibility to get it directly from the cmsPerfSuite.log file (make sure it is dumped there before doing the tarball itself...)
00678                                         print "Failed to get the tarball location from environment variable PERFDB_CASTOR_FILE_URL" 
00679                                         self.handleParsingError( "Castor tarball URL not found. Provide interactively")
00680 
00681                         while True:
00682                                 
00683                                 if lmdb_castor_url_is_valid(url):
00684                                         info["castor_file_url"] = url
00685                                         break
00686                                 print "Please enter a valid CASTOR url: has to start with /castor/ and should point to the tarball"
00687                                 url = sys.stdin.readline()
00688 
00689 
                return info
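The `[...] and "1" or "0"` construct on line 00658 is the pre-2.5 substitute for Python's conditional expression. An equivalent modern form (the helper name flag_no_errors is hypothetical; the sentinel string is the one the source matches):

```python
def flag_no_errors(lines):
    """Return "1" when the success sentinel appears in the log, else "0"
    (equivalent to the old `matching_lines and "1" or "0"` idiom)."""
    sentinel = "There were no errors detected in any of the log files!"
    return "1" if any(line == sentinel for line in lines) else "0"

print(flag_no_errors(["There were no errors detected in any of the log files!"]))  # "1"
print(flag_no_errors(["Some error occurred"]))                                     # "0"
```

The `and/or` idiom only works because "1" is truthy; the conditional expression carries no such trap.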
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTimeSize (   self)
parses the timeSize 

Definition at line 440 of file parserPerfsuiteMetadata.py.

00441                                :
00442                 """ parses the timeSize """
00443                 timesize_result = []
00444 
00445                 # TODO: we will use the first timestamp after the "or these tests will use user input file..."
00446                 #TODO: do we have to save the name of input file somewhere?
00447                 """
00448                 the structure of input file:
00449                 * beginning ->> and start timestamp - the first one:
00450                         >>> [optional:For these tests will use user input file /build/RAWReference/MinBias_RAW_320_IDEAL.root]
00451                         <...>
00452                         Using user-specified cmsDriver.py options: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
00453                         Candle MinBias will be PROCESSED
00454                         You defined your own steps to run:
00455                         RAW2DIGI-RECO
00456                         *Candle MinBias
00457                         Written out cmsRelvalreport.py input file at:
00458                         /build/relval/CMSSW_3_2_4/workStep2/MinBias_TimeSize/SimulationCandles_CMSSW_3_2_4.txt
00459                         Thu Aug 13 14:53:37 2009 [start]
00460                         <....>
00461                         Thu Aug 13 16:04:48 2009 [end]
00462                         Individual cmsRelvalreport.py ExitCode 0
00463                 * ending - the last timestamp "... ExitCode ...."
00464                 """
00465                 #TODO: do we need the cmsDriver --conditions? I suppose it would be global per work directory = 1 perfsuite run (so the same for all candles in one work dir)
00466                 # TODO: which candle definition to use?
00467                 """ divide into separate jobs """
00468                 lines = self.lines_timesize
00469                 jobs = []
00470                 start = False
00471                 timesize_start_indicator = re.compile(r"""^taskset -c (\d+) cmsRelvalreportInput.py""")
00472                 for line_index in xrange(0, len(lines)):
00473                         line = lines[line_index]
00474                         # search for start of each TimeSize job (with a certain candle and step)
00475                         if timesize_start_indicator.match(line):
00476                                 if start:
00477                                         jobs.append(lines[start:line_index])
00478                                 start = line_index
00479                 #add the last one
00480                 jobs.append(lines[start:len(lines)])
00481                 #print "\n".join(str(i) for i in jobs)
00482 
00483                 parsing_rules = (
00484                         (("", "candle", ), r"""^(Candle|ONLY) (.+) will be PROCESSED$""", "req"),
00485                         #e.g.: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
00486                         (("cms_driver_options", ), r"""^Using user-specified cmsDriver.py options: (.+)$"""),
00487                         (("", "conditions", ""), r"""^Using user-specified cmsDriver.py options: (.*)--conditions ([^\s]+)(.*)$""", "req"),
00488                         # for this we cannot guarantee that it has been found, TODO: we might count the number of pileup candles and compare with arguments
00489                         (("",  "pileup_type", ""), r"""^Using user-specified cmsDriver.py options:(.*)--pileup=([^\s]+)(.*)$"""),
00490                         #not sure if event content is required
00491                         (("",  "event_content", ""), r"""^Using user-specified cmsDriver.py options:(.*)--eventcontent ([^\s]+)(.*)$""", "req"),
00492                         #TODO: after changing the splitter to "taskset -c ..." this is no longer included into the part of correct job
00493                         #(("input_user_root_file", ), r"""^For these tests will use user input file (.+)$"""),
00494                 )
00495 
00496                 #parse each of the TimeSize jobs: find candles, etc and start-end times
00497 
00498                 reExit_code = re.compile(r"""Individual ([^\s]+) ExitCode (\d+)""")
00499 
00500                 if self._DEBUG:
00501                         print "TimeSize (%d) jobs: %s" % (len(jobs), str(jobs))
00502 
00503                 for job_lines in jobs:
00504                         """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00505                         info = self._applyParsingRules(parsing_rules, job_lines)
00506                         #Fixing here the compatibility with new cmsdriver.py --conditions option (for which now we have autoconditions and FrontierConditions_GlobalTag is optional):
00507                         if 'auto:' in info['conditions']:
00508                                 from Configuration.PyReleaseValidation.autoCond import autoCond
00509                                 info['conditions'] = autoCond[ info['conditions'].split(':')[1] ].split("::")[0]
00510                         else:
00511                                 if 'FrontierConditions_GlobalTag' in info['conditions']:
00512                                         info['conditions']=info['conditions'].split(",")[1]
00513                                                                                                                                 
00514                         #DEBUG:
00515                         #print "CONDITIONS are: %s"%info['conditions']
00516                         #start time - the index after which comes the time stamp
00517                         """ the following is not available on one of the releases, instead
00518                         use the first timestamp available on our job - that's the starting time :) """ 
00519                         
00520                         #start_time_after = self.findFirstIndex_ofStartsWith(job_lines, "Written out cmsRelvalreport.py input file at:")
00521                         #print start_time_after
00522                         info["start"] = self.firstTimeStampAfter(0, job_lines)
00523 
00524                         #TODO: improve in future (in case of some changes) we could use findBefore instead which uses the regexp as parameter for searching 
00525                         #end time - the index before which comes the time stamp
00526 
00527                         # On older files we have - "Individual Relvalreport.py ExitCode 0" instead of "Individual cmsRelvalreport.py ExitCode"
00528                         end_time_before = self.findLineAfter(0, job_lines, test_condition = reExit_code.match, return_index = True)
00529 
00530                         # on the same line we have the exit Code - so let's get it
00531                         nothing, exit_code = reExit_code.match(job_lines[end_time_before]).groups()
00532 
00533                         info["end"] = self.firstTimeStampBefore(end_time_before, job_lines)
00534                         info["exit_code"] = exit_code
00535 
00536                         steps_start = self.findFirstIndex_ofStartsWith(job_lines, "You defined your own steps to run:")
00537                         steps_end = self.findFirstIndex_ofStartsWith(job_lines, "*Candle ")
00538                         #probably it includes steps until we found *Candle... ?
00539                         steps = job_lines[steps_start + 1:steps_end]
00540                         if not self.validateSteps(steps):
00541                         self.handleParsingError( "Steps were not found correctly: %s for current job: %s" % (str(steps), str(job_lines)))
00542                                 
00543                                 """ quite nasty - just a workaround """
00544                                 print "Trying to recover from this error in case of old cmssw"
00545                                 
00546                                 """ we assume that steps are between the following sentence and a TimeStamp """
00547                                 steps_start = self.findFirstIndex_ofStartsWith(job_lines, "Steps passed to writeCommands")
00548                                 steps_end = self.findLineAfter(steps_start, job_lines, test_condition = self.isTimeStamp, return_index = True)
00549                                 
00550                                 steps = job_lines[steps_start + 1:steps_end]
00551                                 if not self.validateSteps(steps):
00552                                         self.handleParsingError( "EVEN AFTER RECOVERY Steps were not found correctly! : %s for current job: %s" % (str(steps), str(job_lines)))
00553                                 else:
00554                                         print "RECOVERY SEEMS to be successful: %s" % str(steps)
00555 
00556                         info["steps"] = self._LINE_SEPARATOR.join(steps) #!!!! STEPS MIGHT CONTAIN COMMA: ","
00557                         
00558 
00559                         timesize_result.append(info)
                return {"TimeSize": timesize_result}
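The job-splitting loop at the top of parseTimeSize can be sketched in isolation (the function name split_into_jobs is hypothetical). One detail worth noting: initialising the marker to None rather than False distinguishes "no job seen yet" from "job starting at index 0", since 0 is falsy and the original `if start:` test would silently drop a job whose start indicator sits on the very first line:

```python
import re

def split_into_jobs(lines):
    """Cut a flat log into per-job chunks, each chunk beginning at a
    `taskset -c <n> cmsRelvalreportInput.py` start indicator."""
    indicator = re.compile(r"^taskset -c (\d+) cmsRelvalreportInput\.py")
    jobs, start = [], None
    for i, line in enumerate(lines):
        if indicator.match(line):
            if start is not None:
                jobs.append(lines[start:i])
            start = i
    if start is not None:
        jobs.append(lines[start:])  # the last job runs to the end of input
    return jobs

log = [
    "taskset -c 0 cmsRelvalreportInput.py ...",
    "Candle MinBias will be PROCESSED",
    "taskset -c 1 cmsRelvalreportInput.py ...",
    "Candle TTbar will be PROCESSED",
]
jobs = split_into_jobs(log)
print(len(jobs))  # 2
```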
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimark (   self,
  main_cores = [1] 
)

Definition at line 576 of file parserPerfsuiteMetadata.py.

00577                                                   :
00578                 main_core = main_cores[0]
00579                 #TODO: WE DO NOT ALWAYS REALLY KNOW THE MAIN CORE NUMBER! but we don't care too much
00580                 #we parse each of the SciMark files and the Composite scores
00581                 csimark = []
00582                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2", testType = "mainCore", core = main_core))
00583                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2_large", testType = "mainCore_Large", core = main_core))
00584 
00585 
00586                 #we don't always know the number of cores available so we will just search the directory to find out core numbers
00587                 reIsCsiMark_notusedcore = re.compile(r"^cmsScimark_(\d+)\.log$")
00588                 scimark_files = [reIsCsiMark_notusedcore.match(f).groups()[0]
00589                                 for f in os.listdir(self._path)
00590                                  if reIsCsiMark_notusedcore.match(f) 
00591                                         and os.path.isfile(os.path.join(self._path, f)) ]
00592 
00593                 for core_number in scimark_files:
00594                         try:
00595                                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark_%s" % str(core_number), testType = "NotUsedCore_%s" %str(core_number), core = core_number))
00596                         except IOError, e:
00597                                 if self._DEBUG:
00598                                         print e
00599                 return csimark
00600                 #print csimark
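The directory scan in readCmsScimark can be exercised standalone (the helper name find_unused_core_numbers is hypothetical; the regexp here escapes the dot before `log`, which the source leaves as a match-any-character):

```python
import os
import re
import tempfile

def find_unused_core_numbers(path):
    """Return the core numbers of all cmsScimark_<N>.log files in `path`."""
    pattern = re.compile(r"^cmsScimark_(\d+)\.log$")
    cores = []
    for name in os.listdir(path):
        match = pattern.match(name)
        if match and os.path.isfile(os.path.join(path, name)):
            cores.append(match.group(1))
    return sorted(cores)

with tempfile.TemporaryDirectory() as tmp:
    # cmsScimark2.log (the main-core file) must not be picked up
    for name in ("cmsScimark_2.log", "cmsScimark_3.log", "cmsScimark2.log"):
        open(os.path.join(tmp, name), "w").close()
    cores = find_unused_core_numbers(tmp)
print(cores)  # ['2', '3']
```

Requiring the underscore in the pattern is what keeps the main-core files (cmsScimark2.log, cmsScimark2_large.log) out of the "not used core" set.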

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimarkTest (   self,
  testName,
  testType,
  core 
)

Definition at line 564 of file parserPerfsuiteMetadata.py.

00565                                                               :
00566                 lines  = self.readInput(self._path, fileName = testName + ".log")
00567                 scores = [{"score": self.reCmsScimarkTest.match(line).groups()[1], "type": testType, "core": core}
00568                                 for line in lines 
00569                                 if self.reCmsScimarkTest.match(line)]
00570                 #add the measurement number
00571                 i = 0
00572                 for score in scores:
00573                         i += 1
00574                         score.update({"messurement_number": i})
00575                 return scores
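The manual counter in lines 00571–00574 is exactly what `enumerate` expresses directly. A sketch with the corrected key spelling (the source actually stores the key as messurement_number, so a drop-in replacement would have to keep that name):

```python
def number_measurements(scores):
    """Attach a 1-based measurement number to each score dict."""
    for i, score in enumerate(scores, start=1):
        score["measurement_number"] = i
    return scores

scores = number_measurements([{"score": "123.4"}, {"score": "125.1"}])
print(scores[1]["measurement_number"])  # 2
```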
                
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readInput (   self,
  path,
  fileName = "cmsPerfSuite.log" 
)

Definition at line 161 of file parserPerfsuiteMetadata.py.

00162                                                                 :
00163                 try:
00164                         f = open(os.path.join(path, fileName), "r")
00165                         lines =  [s.strip() for s in f.readlines()]
00166                         f.close()
00167                 except IOError:
00168                         lines = []
00169 
00170                 #print self._lines
00171                 return lines
00172 
00173 
00174 
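readInput's contract (strip every line; an unreadable file yields an empty list) written with a context manager, which closes the file even if the list comprehension raises (the standalone name read_input is hypothetical):

```python
import os
import tempfile

def read_input(path, file_name="cmsPerfSuite.log"):
    """Read the log and strip each line; return [] if the file is unreadable."""
    try:
        with open(os.path.join(path, file_name)) as f:
            return [line.strip() for line in f]
    except IOError:
        return []

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "cmsPerfSuite.log"), "w") as f:
        f.write("  first line  \nsecond line\n")
    log_lines = read_input(tmp)
print(log_lines)                         # ['first line', 'second line']
print(read_input("/nonexistent/path"))   # []
```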

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::validateSteps (   self,
  steps 
)
Simple function for error detection. TODO: we could use a list of possible steps also 

Definition at line 24 of file parserPerfsuiteMetadata.py.

00025                                       :
00026                 """ Simple function for error detection. TODO: we could use a list of possible steps also """
00027                 return not (not steps or len(steps) > self._MAX_STEPS)
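The double negation in validateSteps unfolds by De Morgan's law: `not (not steps or len(steps) > MAX)` is `steps and len(steps) <= MAX`. A standalone equivalent (the MAX_STEPS value here is illustrative; the real limit lives on the class as self._MAX_STEPS):

```python
MAX_STEPS = 5  # illustrative; the class reads self._MAX_STEPS

def validate_steps(steps):
    """Steps are valid when the list is non-empty and within the limit."""
    return bool(steps) and len(steps) <= MAX_STEPS

print(validate_steps([]))                    # False
print(validate_steps(["RAW2DIGI", "RECO"]))  # True
print(validate_steps(["s"] * 6))             # False
```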


Member Data Documentation

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_DEBUG [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

string parserPerfsuiteMetadata::parserPerfsuiteMetadata::_LINE_SEPARATOR = "|" [static, private]

Definition at line 23 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_MAX_STEPS [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_otherStart [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_path [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_timeSizeEnd [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::_timeSizeStart [private]

Definition at line 28 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::lines_general

Definition at line 34 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::lines_other

Definition at line 34 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::lines_timesize

Definition at line 34 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::missing_fields

Definition at line 34 of file parserPerfsuiteMetadata.py.

parserPerfsuiteMetadata::parserPerfsuiteMetadata::reCmsScimarkTest

Definition at line 28 of file parserPerfsuiteMetadata.py.