
Public Member Functions | Public Attributes | Private Member Functions | Private Attributes | Static Private Attributes

parserPerfsuiteMetadata::parserPerfsuiteMetadata Class Reference

List of all members.

Public Member Functions

def __init__
def findFirstIndex_ofStartsWith
def findLineAfter
def findLineBefore
def firstTimeStampAfter
def firstTimeStampBefore
def get_tarball_fromlog
def getMachineInfo
def handleParsingError
def isTimeStamp
def parseAll
def parseAllOtherTests
def parseGeneralInfo
def parseTheCompletion
def parseTimeSize
def readCmsScimark
def readCmsScimarkTest
def readInput
def validateSteps

Public Attributes

 lines_general
 lines_other
 lines_timesize
 missing_fields
 reCmsScimarkTest

Private Member Functions

def _applyParsingRules

Private Attributes

 _DEBUG
 _MAX_STEPS
 _otherStart
 _path
 _timeSizeEnd
 _timeSizeStart

Static Private Attributes

string _LINE_SEPARATOR = "|"

Detailed Description

        The whole parsing works as follows. We split the file into 3 parts (we keep 3 lists of lines: self.lines_general, self.lines_timesize, self.lines_other):

                * General info
        As most of the info items are simple one-line strings, we define regular expressions matching each of those lines. Each regular expression is associated with the data fields it extracts, e.g. ^Suite started at (.+) on (.+) by user (.+)$ matches only the line giving the time the suite started and the machine it ran on, and is associated with a tuple of field names to fill in. In this way we get info = {'start_time': start-taken-from-regexp, 'host': host, 'user': user}. This is done by the helper function _applyParsingRules, which checks each line against each rule and, on a match, fills the result dictionary with the captured values.
        Additionally, we get the CPU and memory info from /proc/cpuinfo and /proc/meminfo.
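The rule-matching idea described above can be sketched as follows. This is a hypothetical, self-contained re-implementation for illustration (the real work is delegated to parsingRulesHelper.rulesParser, whose exact behaviour may differ): each rule pairs a tuple of field names with a regexp, and the groups of a matching line fill the result dictionary under those names.

```python
import re

def apply_rules_sketch(rules, lines):
    # hypothetical helper, not the actual parsingRulesHelper.rulesParser
    info = {}
    for fields, pattern in rules:
        regexp = re.compile(pattern)
        for line in lines:
            match = regexp.match(line)
            if match:
                # empty field names would mean "ignore this group"
                for name, value in zip(fields, match.groups()):
                    if name:
                        info[name] = value
                break
    return info

rules = [(("start_time", "host", "user"),
          r"^Suite started at (.+) on (.+) by user (.+)$")]
lines = ["Suite started at Fri Aug 14 01:16:03 2009 on lxbuild150 by user relval"]
info = apply_rules_sketch(rules, lines)
# info == {'start_time': 'Fri Aug 14 01:16:03 2009',
#          'host': 'lxbuild150', 'user': 'relval'}
```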

                * TimeSize test
        We use much the same technique, but first we divide the TimeSize lines by job (one individual run of cmssw, per candle and pileup/no-pileup). Then for each job we apply our parsing rules and find the starting and ending times (e.g. we know that the start timestamp comes somewhere after the line containing "Written out cmsRelvalreport.py input file at:").

                * All other tests
        We find the statement that the test is being launched (containing the test name, core and number of events). Above it is the thread number, and below it the starting time.
        The ending time can ONLY be connected to the starting time via the thread ID, because the log names the same test instance differently at start and end, e.g. <Launching "PILE UP Memcheck"> versus <"Memcheck" stopped>.
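The thread-ID pairing described above can be sketched like this. The regexps are taken from the listing of parseAllOtherTests below; the pairing function itself is a simplified illustration, not the class's actual bookkeeping:

```python
import re

# start and end messages name the test differently, so the only safe
# join key is the thread identifier that appears in both lines
re_add = re.compile(r"^Adding thread <simpleGenReportThread\((.+), started -(\d+)\)>")
re_end = re.compile(r"^(.+) test, in thread <simpleGenReportThread\((.+), stopped -(\d+)\)>")

def pair_by_thread(lines):
    starts, pairs = {}, []
    for line in lines:
        m = re_add.match(line)
        if m:
            starts[m.group(1)] = line  # remember the start line per thread ID
            continue
        m = re_end.match(line)
        if m:
            test_name, thread_id = m.group(1), m.group(2)
            if thread_id in starts:
                pairs.append((thread_id, test_name))
    return pairs

lines = [
    "Adding thread <simpleGenReportThread(Thread-1, started -176235632)> to the list of active threads",
    "IgProf_Mem test, in thread <simpleGenReportThread(Thread-1, stopped -176235632)> is done running on core 4",
]
# pair_by_thread(lines) -> [('Thread-1', 'IgProf_Mem')]
```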

Definition at line 8 of file parserPerfsuiteMetadata.py.


Constructor & Destructor Documentation

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::__init__ (   self,
  path 
)

Definition at line 28 of file parserPerfsuiteMetadata.py.

00029                                 :
00030                 
00031                 self._MAX_STEPS  = 5 # MAXIMUM NUMBER OF STEPS PER RUN (taskset relvalreport.py...)
00032                 self._DEBUG = False
00033 
00034 
00035                 self._path = path
00036                 
00037                 """ some initialisation to speed up the other functions """
00038                 #for cmsscimark
00039                 self.reCmsScimarkTest = re.compile(r"""^Composite Score:(\s*)([^\s]+)$""")
00040 
00041                 #TimeSize
00042                 """ the separator for beginning of timeSize / end of general statistics """
00043                 self._timeSizeStart = re.compile(r"""^Launching the TimeSize tests \(TimingReport, TimeReport, SimpleMemoryCheck, EdmSize\) with (\d+) events each$""")
00044                 """ (the first timestamp is the start of TimeSize) """
00045 
00046 
00047                 """ the separator for end of timeSize / beginning of IgProf_Perf, IgProf_Mem,  Memcheck, Callgrind tests """
00048                 self._timeSizeEnd = re.compile(r"""^Stopping all cmsScimark jobs now$""")
00049 
00050                 #Other tests:
00051                 self._otherStart = re.compile(r"^Preparing")
00052 
00053                 """ 
00054                 ----- READ THE DATA -----
00055                 """
00056                 lines = self.readInput(path)
00057                 """ split the whole file  into parts """
00058                 #Let's not assume there are ALWAYS TimeSize tests in the runs of the Performance Suite!:
00059                 #Check first:  
00060                 #FIXME: Vidmantas did not consider this case... will need to implement protection against it for all the IB tests...
00061                 #To do as soon as possible...
00062                 #Maybe revisit the strategy if it can be done quickly.
00063                 timesize_end= [lines.index(line)  for line in lines if self._timeSizeEnd.match(line)]
00064                 if timesize_end:
00065                         timesize_end_index = timesize_end[0]
00066                 else:
00067                         timesize_end_index=0
00068                 timesize_start=[lines.index(line) for line in lines if self._timeSizeStart.match(line)]
00069                 general_stop=[lines.index(line) for line in lines if self._otherStart.match(line)]
00070                 if timesize_start:
00071                         timesize_start_index = timesize_start[0]
00072                         general_stop_index = timesize_start_index
00073                 elif general_stop:
00074                         timesize_start_index=timesize_end_index+1
00075                         general_stop_index=general_stop[0]
00076                 else:
00077                         timesize_start_index=0
00078                         general_stop_index=-1
00079 
00080                 """ we split the structure:
00081                         * general
00082                         * timesize
00083                         * all others [igprof etc]
00084                 """
00085         
00086                 """ we get the indexes of spliting """
00087                 #Not OK to use timesize_start_index for the general lines... want to be general, also for cases with no TimeSize tests...
00088                 #self.lines_general = lines[:timesize_start_index]
00089                 self.lines_general = lines[:general_stop_index]
00090                 self.lines_timesize = lines[timesize_start_index:timesize_end_index+1]
00091                 self.lines_other = lines[timesize_end_index:]           
00092         
00093                 """ a list of missing fields """
00094                 self.missing_fields = []


Member Function Documentation

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::_applyParsingRules (   self,
  parsing_rules,
  lines 
) [private]
        Applies the (provided) regular expression rules (=rule[1] for rule in parsing_rules)
        to each line and, if a rule matches the line,
        puts the matched information into the dictionary under the specified keys (=rule[0]), which is later returned.
        rule[2] states whether the field is required to be found; if it is required and not found, an exception is raised.
        rules = [
          ( (field_name_1_to_match, field_name_2), regular expression, /optionaly: is the field required? if so "req"/ )
        ]
 
we call a shared parsing helper 

Definition at line 235 of file parserPerfsuiteMetadata.py.

00236                                                           :
00237                 """ 
00238                         Applies the (provided) regular expression rules (=rule[1] for rule in parsing_rules)
00239                         to each line and, if a rule matches the line,
00240                         puts the matched information into the dictionary under the specified keys (=rule[0]), which is later returned.
00241                         rule[2] states whether the field is required to be found; if it is required and not found, an exception is raised.
00242                         rules = [
00243                           ( (field_name_1_to_match, field_name_2), regular expression, /optionally: is the field required? if so "req"/ )
00244                         ]
00245                  """
00246                 """ we call a shared parsing helper """
00247                 #parsing_rules = map(parsingRulesHelper.rulesRegexpCompileFunction, parsing_rules)
00248                 #print parsing_rules
00249                 (info, missing_fields) = parsingRulesHelper.rulesParser(parsing_rules, lines, compileRules = True)
00250 
00251                 self.missing_fields.extend(missing_fields)
00252 
00253                 return info
00254 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findFirstIndex_ofStartsWith (   job_lines,
  start_of_line 
)

Definition at line 113 of file parserPerfsuiteMetadata.py.

00114                                                                  :
00115                 return [job_lines.index(line) 
00116                         for line in job_lines 
00117                         if line.startswith(start_of_line)][0]
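A self-contained sketch of the lookup above, for illustration. Note that the original uses list.index(line), which returns the first occurrence of an *equal* line; with duplicate lines in a job, a plain enumerate (as below) is the safer equivalent:

```python
def find_first_index_of_startswith(job_lines, start_of_line):
    # return the index of the first line starting with the given prefix
    for i, line in enumerate(job_lines):
        if line.startswith(start_of_line):
            return i
    # the original raises IndexError via [...][0] on an empty list
    raise IndexError("no line starts with %r" % start_of_line)

lines = ["foo", "You defined your own steps to run:", "GEN,SIM", "*Candle X"]
# find_first_index_of_startswith(lines, "You defined") -> 1
# find_first_index_of_startswith(lines, "*Candle")     -> 3
```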
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineAfter (   self,
  line_index,
  lines,
  test_condition,
  return_index = False 
)
finds a line satisfying the `test_condition` coming after the `line_index` 

Definition at line 129 of file parserPerfsuiteMetadata.py.

00130                                                                                         :
00131                 """ finds a line satisfying the `test_condition` coming after the `line_index` """
00132                 # we're going forward the lines list
00133                 for line_index in xrange(line_index + 1, len(lines)):
00134                         line = lines[line_index]
00135 
00136                         if test_condition(line):        
00137                                 if return_index:
00138                                         return line_index
00139                                 return line

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineBefore (   self,
  line_index,
  lines,
  test_condition 
)
finds a line satisfying the `test_condition` coming before the `line_index` 

Definition at line 118 of file parserPerfsuiteMetadata.py.

00119                                                                    :
00120                 """ finds a line satisfying the `test_condition` coming before the `line_index` """
00121                 # we're going backwards the lines list
00122                 for line_index in  xrange(line_index -1, -1, -1):
00123                         line = lines[line_index]
00124 
00125                         if test_condition(line):
00126                                 return line
00127                 raise ValueError
00128 
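The two scans above can be condensed into the following sketch. It mirrors the listings (forward from line_index+1, backward from line_index-1; only the backward scan raises), written here as standalone functions for illustration:

```python
def find_line_after(line_index, lines, test_condition):
    # walk forward; return the first line for which the predicate holds
    for i in range(line_index + 1, len(lines)):
        if test_condition(lines[i]):
            return lines[i]
    return None  # like the original, falls through without raising

def find_line_before(line_index, lines, test_condition):
    # walk backward down to index 0
    for i in range(line_index - 1, -1, -1):
        if test_condition(lines[i]):
            return lines[i]
    raise ValueError("no matching line before index %d" % line_index)

lines = ["x", "Mon Jun 14 20:06:54 2010", "payload", "Mon Jun 14 21:59:33 2010", "y"]
is_stamp = lambda l: l.startswith("Mon")
# find_line_after(1, lines, is_stamp)  -> "Mon Jun 14 21:59:33 2010"
# find_line_before(3, lines, is_stamp) -> "Mon Jun 14 20:06:54 2010"
```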

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampAfter (   self,
  line_index,
  lines 
)
returns the first timestamp AFTER the line with given index 

Definition at line 145 of file parserPerfsuiteMetadata.py.

00146                                                         :
00147                 """ returns the first timestamp AFTER the line with given index """
00148 
00149                 return self.findLineAfter(line_index, lines, test_condition = self.isTimeStamp)

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampBefore (   self,
  line_index,
  lines 
)
returns the first timestamp BEFORE the line with given index 

Definition at line 140 of file parserPerfsuiteMetadata.py.

00141                                                          :
00142                 """ returns the first timestamp BEFORE the line with given index """
00143 
00144                 return self.findLineBefore(line_index, lines, test_condition = self.isTimeStamp)

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::get_tarball_fromlog (   self)
Return the tarball castor location by parsing the cmsPerfSuite.log file

Definition at line 707 of file parserPerfsuiteMetadata.py.

00708                                      :
00709                 '''Return the tarball castor location by parsing the cmsPerfSuite.log file'''
00710                 print "Getting the url from the cmsPerfSuite.log"
00711                 log=open("cmsPerfSuite.log","r")
00712                 castor_dir="UNKNOWN_CASTOR_DIR"
00713                 tarball="UNKNOWN_TARBALL"
00714                 for line in log.readlines():
00715                         if 'castordir' in line:
00716                                 castor_dir=line.split()[1]
00717                         if 'tgz' in line and tarball=="UNKNOWN_TARBALL": #Pick the first line that contains the tar command...
00718                                 if 'tar' in line:
00719                                         tarball=os.path.basename(line.split()[2])
00720                 castor_tarball=os.path.join(castor_dir,tarball)
00721                 return castor_tarball
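The log parsing above can be sketched as a function over a list of log lines (rather than opening cmsPerfSuite.log directly). The line formats shown in the sample below are assumptions for illustration; only the split()/basename logic is taken from the listing:

```python
import os

def tarball_from_log_lines(lines):
    castor_dir = "UNKNOWN_CASTOR_DIR"
    tarball = "UNKNOWN_TARBALL"
    for line in lines:
        if "castordir" in line:
            castor_dir = line.split()[1]
        # pick the first line that contains the tar command
        if "tgz" in line and "tar" in line and tarball == "UNKNOWN_TARBALL":
            tarball = os.path.basename(line.split()[2])
    return os.path.join(castor_dir, tarball)

log = [
    "castordir /castor/cern.ch/cms/store/relval/performance/",
    "tar -czf /tmp/perf_suite.tgz results/",
]
# -> '/castor/cern.ch/cms/store/relval/performance/perf_suite.tgz'
```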

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getMachineInfo (   self)
Returns the cpu and memory info  
cpu info 
we assume that:
 * num_cores = max(core id+1) [it's counted from 0]
 * 'model name' is the processor type [we will return only the first one - we assume the others are the same]
 * cpu MHz - is the speed of CPU
for 
        model name	: Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
        cpu MHz		: 800.000
        cache size	: 6144 KB

Definition at line 175 of file parserPerfsuiteMetadata.py.

00176                                 :
00177                 """ Returns the cpu and memory info  """
00178 
00179                 """ cpu info """
00180 
00181                 """
00182                 we assume that:
00183                  * num_cores = max(core id+1) [it's counted from 0]
00184                  * 'model name' is the processor type [we will return only the first one - we assume the others are the same]
00185                  * cpu MHz - is the speed of CPU
00186                 """
00187                 #TODO: BUT cpu MHz shows the current speed, not the maximum, 
00188                 """
00189                 for 
00190                         model name      : Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
00191                         cpu MHz         : 800.000
00192                         cache size      : 6144 KB
00193                 """
00194                 cpu_result = {}
00195                 try:
00196                         f= open(os.path.join(self._path, "cpuinfo"), "r")
00197 
00198                         #we split data into a list of tuples = [(attr_name, attr_value), ...]
00199                         cpu_attributes = [l.strip().split(":") for l in f.readlines()]
00200                         #print cpu_attributes
00201                         f.close()
00202                         cpu_result = {
00203                                 "num_cores": max ([int(attr[1].strip())+1 for attr in cpu_attributes if attr[0].strip() == "processor"]), #Bug... Vidmantas used "core id"
00204                                 "cpu_speed_MHZ": max ([attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cpu MHz"]),
00205                                 "cpu_cache_size": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cache size"][0],
00206                                 "cpu_model_name": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "model name"][0]
00207                         }
00208                 except IOError,e:
00209                         print e
00210 
00211                 
00212                 
00213 
00214 
00215                 """ memory info """
00216                 mem_result = {}
00217 
00218                 try:
00219                         f= open(os.path.join(self._path, "meminfo"), "r")
00220 
00221                         #we split data into a list of tuples = [(attr_name, attr_value), ...]
00222                         mem_attributes = [l.strip().split(":") for l in f.readlines()]
00223 
00224                         mem_result = {
00225                                 "memory_total_ram": [attr[1].strip() for attr in mem_attributes if attr[0].strip() == "MemTotal"][0]
00226                         }
00227 
00228                 except IOError,e:
00229                         print e
00230         
00231                 cpu_result.update(mem_result)
00232                 return cpu_result
00233 
00234 
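The /proc/cpuinfo parsing above can be sketched against an in-memory string. The sample text below is an assumed excerpt in the format the listing expects ("name : value" pairs, 0-based "processor" entries); the split/strip logic follows the listing:

```python
def parse_cpuinfo(text):
    # split each "name : value" line into a (name, value) pair
    attrs = [l.split(":", 1) for l in text.splitlines() if ":" in l]
    return {
        # num_cores = max(processor id) + 1, since ids count from 0
        "num_cores": max(int(v.strip()) + 1 for k, v in attrs if k.strip() == "processor"),
        "cpu_speed_MHZ": max(v.strip() for k, v in attrs if k.strip() == "cpu MHz"),
        "cpu_model_name": [v.strip() for k, v in attrs if k.strip() == "model name"][0],
    }

sample = """processor\t: 0
model name\t: Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
cpu MHz\t\t: 800.000
processor\t: 1
model name\t: Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
cpu MHz\t\t: 800.000
"""
# parse_cpuinfo(sample)["num_cores"] -> 2
```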
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::handleParsingError (   self,
  message 
)

Definition at line 150 of file parserPerfsuiteMetadata.py.

00151                                              :
00152                 if self._DEBUG:
00153                         raise ValueError, message
00154                 print " ======== AN ERROR WHILE PARSING METADATA ===="
00155                 print message
00156                 print " =============== end ========================= "

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::isTimeStamp (   line)
Returns whether the string is a timestamp (if not returns None)

>>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
True
>>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")

Definition at line 96 of file parserPerfsuiteMetadata.py.

00097                              :
00098                 """
00099                 Returns whether the string is a timestamp (if not returns None)
00100 
00101                 >>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
00102                 True
00103                 >>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")
00104 
00105                 """
00106                 datetime_format = "%a %b %d %H:%M:%S %Y" # we use default date format
00107                 try:
00108                         time.strptime(line, datetime_format)
00109                         return True
00110                 except ValueError:
00111                         return None
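The timestamp check shown above, in self-contained form: the log's timestamps use ctime-style formatting, parsed with the same time.strptime format string as the listing.

```python
import time

def is_timestamp(line):
    # ctime-style default date format, e.g. "Fri Aug 14 01:16:03 2009"
    try:
        time.strptime(line, "%a %b %d %H:%M:%S %Y")
        return True
    except ValueError:
        return None

# is_timestamp("Fri Aug 14 01:16:03 2009")  -> True
# is_timestamp("Fri Augx 14 01:16:03 2009") -> None
```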
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAll (   self)

Definition at line 722 of file parserPerfsuiteMetadata.py.

00723                           :
00724                 result = {"General": {}, "TestResults":{}, "cmsSciMark":{}, 'unrecognized_jobs': []}
00725 
00726                 """ all the general info - start, arguments, host etc """
00727                 result["General"].update(self.parseGeneralInfo())
00728 
00729                 """ machine info - cpu, memory """
00730                 result["General"].update(self.getMachineInfo())
00731 
00732                 """ we add info about how successful the run was, when it finished and the final castor url of the file! """
00733                 result["General"].update(self.parseTheCompletion())
00734 
00735                 print "Parsing TimeSize runs..."
00736                 if len(self.lines_timesize) > 0:
00737                         try:
00738                                 result["TestResults"].update(self.parseTimeSize())
00739                         except Exception, e:
00740                                 print "BAD BAD BAD UNHANDLED ERROR in parseTimeSize: " + str(e)
00741 
00742                 print "Parsing Other(IgProf, Memcheck, ...) runs..."
00743                 try:
00744                         result["TestResults"].update(self.parseAllOtherTests())
00745                 except Exception, e:
00746                         print "BAD BAD BAD UNHANDLED ERROR in parseAllOtherTests: " + str(e)
00747 
00748                 #print result["TestResults"]
00749 
00750 
00751                 main_cores = [result["General"]["run_on_cpus"]]
00752                 num_cores = result["General"].get("num_cores", 0)
00753                 #DEBUG
00754                 #print "Number of cores was: %s"%num_cores
00755                 #TODO: temporarily - search for cores, use regexp
00756                 main_cores = [1]
00757 
00758                 # THE MACHINE SCIMARKS
00759                 result["cmsSciMark"] = self.readCmsScimark(main_cores = main_cores)
00760 
00761                 if self.missing_fields:
00762                         self.handleParsingError("========== SOME REQUIRED FIELDS WERE NOT FOUND DURING PARSING ======= "+ str(self.missing_fields))
00763 
00764                 return result
00765                 
00766                 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAllOtherTests (   self)

Definition at line 360 of file parserPerfsuiteMetadata.py.

00361                                     :
00362                 #make it general, for whatever test comes...
00363                 test = {}
00364 
00365                 parsing_rules = (
00366                         (("", "candle", ), r"""^(Candle|ONLY) (.+) will be PROCESSED$""", "req"),
00367                         #e.g.: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
00368                         (("cms_driver_options", ), r"""^Using user-specified cmsDriver.py options: (.+)$"""),
00369                         (("", "conditions", ""), r"""^Using user-specified cmsDriver.py options: (.*)--conditions ([^\s]+)(.*)$""", "req"),
00370                         # for this we cannot guarantee that it has been found; TODO: we might count the number of pileup candles and compare with the arguments
00371                         (("",  "pileup_type", ""), r"""^Using user-specified cmsDriver.py options:(.*)--pileup=([^\s]+)(.*)$"""),
00372                         #not sure if event content is required
00373                         (("",  "event_content", ""), r"""^Using user-specified cmsDriver.py options:(.*)--eventcontent ([^\s]+)(.*)$""", "req"),
00374                         #TODO: after changing the splitter to "taskset -c ..." this is no longer included in the part of the correct job
00375                         #(("input_user_root_file", ), r"""^For these tests will use user input file (.+)$"""),
00376                 )
00377 
00378 
00379                 lines = self.lines_other
00380                 """
00381 
00382                 for each of IgProf_Perf, IgProf_Mem,  Memcheck, Callgrind tests we have such a structure of input file:
00383                 * beginning ->> and start timestamp- the firstone:
00384                         Launching the PILE UP IgProf_Mem tests on cpu 4 with 201 events each
00385                         Adding thread <simpleGenReportThread(Thread-1, started -176235632)> to the list of active threads
00386                         Mon Jun 14 20:06:54 2010
00387 
00388                         <... whatever might be here, might overlap with other test start/end messages ..>
00389 
00390                         Mon Jun 14 21:59:33 2010
00391                         IgProf_Mem test, in thread <simpleGenReportThread(Thread-1, stopped -176235632)> is done running on core 4
00392 
00393                 * ending - the last timestamp "before is done running ...."
00394                 """
00395                 # we take the first TimeStamp after the starting message and the first before the finishing message in 2 rounds..
00396         
00397                 #TODO: if threads would be changed it would stop working!!!
00398 
00399                 # i.e. Memcheck, cpu, events
00400                 reSubmit = re.compile(r"""^Let's submit (.+) test on core (\d+)$""")
00401                 
00402                 reStart = re.compile(r"""^Launching the (PILE UP |)(.*) tests on cpu (\d+) with (\d+) events each$""")
00403 
00404                 # i.e. Memcheck, thread name,id,core number
00405                 reEnd = re.compile(r"""^(.*) test, in thread <simpleGenReportThread\((.+), stopped -(\d+)\)> is done running on core (\d+)$""")
00406                 
00407                 reAddThread =  re.compile(r"""^Adding thread <simpleGenReportThread\((.+), started -(\d+)\)> to the list of active threads$""")
00408 
00409                 reWaiting = re.compile(r"""^Waiting for tests to be done...$""")
00410 
00411                 reExitCode = re.compile(r"""Individual cmsRelvalreport.py ExitCode (\d+)""")
00412                 """ we search for lines being either: (it's a little pascal'ish but we need the index!) """
00413 
00414                 jobs = []
00415 
00416                 #can split it into jobs ! just have to reparse it for the exit codes later....
00417                 for line_index in xrange(0, len(lines)):
00418                         line = lines[line_index]
00419                         if reSubmit.match(line):
00420                                 end_index = self.findLineAfter(line_index, lines, test_condition=lambda l: reWaiting.match(l), return_index = True)
00421                                 jobs.append(lines[line_index:end_index])
00422 
00423                 for job_lines in jobs:
00424                         #print job_lines
00425                         info = self._applyParsingRules(parsing_rules, job_lines)
00426                         #Fixing here the compatibility with new cmsdriver.py --conditions option
00427                         #(for which now we have autoconditions and FrontierConditions_GlobalTag is optional):
00428                         if 'auto:' in info['conditions']:
00429                                 from Configuration.AlCa.autoCond import autoCond
00430                                 info['conditions'] = autoCond[ info['conditions'].split(':')[1] ].split("::")[0]
00431                         else:
00432                                 if 'FrontierConditions_GlobalTag' in info['conditions']:
00433                                         info['conditions']=info['conditions'].split(",")[1]
00434 
00435                         steps_start = self.findFirstIndex_ofStartsWith(job_lines, "You defined your own steps to run:")
00436                         steps_end = self.findFirstIndex_ofStartsWith(job_lines, "*Candle ")
00437                         #probably it includes steps until we found *Candle... ?
00438                         steps = job_lines[steps_start + 1:steps_end]
00439                         if not self.validateSteps(steps):
00440                                 self.handleParsingError( "Steps were not found correctly: %s for current job: %s" % (str(steps), str(job_lines)))
00441                                 
00442                                 """ quite nasty - just a work around """
00443                                 print "Trying to recover from this error in case of old cmssw"
00444                                 
00445                                 """ we assume that the steps are between the following sentence and a TimeStamp """
00446                                 steps_start = self.findFirstIndex_ofStartsWith(job_lines, "Steps passed to writeCommands")
00447                                 steps_end = self.findLineAfter(steps_start, job_lines, test_condition = self.isTimeStamp, return_index = True)
00448                                 
00449                                 steps = job_lines[steps_start + 1:steps_end]
00450                                 if not self.validateSteps(steps):
00451                                         self.handleParsingError( "EVEN AFTER RECOVERY Steps were not found correctly! : %s for current job: %s" % (str(steps), str(job_lines)))
00452                                 else:
00453                                         print "RECOVERY SEEMS to be successful: %s" % str(steps)
00454 
00455                         info["steps"] = self._LINE_SEPARATOR.join(steps) #!!!! STEPS MIGHT CONTAIN COMMA: ","
00456 
00457                         start_id_index = self.findLineAfter(0, job_lines, test_condition = reStart.match, return_index = True)
00458                         pileUp, testName, testCore, testEventsNum = reStart.match(job_lines[start_id_index]).groups()                   
00459                         info["testname"] = testName
00460 
00461                         thread_id_index = self.findLineAfter(0, job_lines, test_condition = reAddThread.match, return_index = True)
00462                         info["start"] = self.firstTimeStampAfter(thread_id_index, job_lines)
00463 
00464                         thread_id, thread_number = reAddThread.match(job_lines[thread_id_index]).groups()
00465                         info["thread_id"] = thread_id
00466                         
00467                         if not test.has_key(testName):
00468                                 test[testName] = []
00469                         test[testName].append(info)
00470                 
00471                 for line_index in xrange(0, len(lines)):
00472                         line = lines[line_index]
00473 
00474                         if reEnd.match(line):
00475                                 testName, thread_id, thread_num, testCore = reEnd.match(line).groups()
00476                                 time = self.firstTimeStampBefore(line_index, lines)
00477                                 try:
00478                                         exit_code = ""
00479                                         #we search for the exit code
00480                                         line_exitcode = self.findLineBefore(line_index, lines, test_condition=lambda l: reExitCode.match(l))
00481                                         exit_code, = reExitCode.match(line_exitcode).groups()
00482                                 except Exception, e:
00483                                         print "Error while getting exit code (Other test): " + str(e)
00484                                         
00485                                 for key, thread in test.items():
00486                                         for i in range(0, len(thread)):
00487                                                 if thread[i]["thread_id"] == thread_id:
00488                                                         thread[i].update({"end": time, "exit_code": exit_code})
00489                                                         break
00490                                 
00491                 return test
00492                                                 
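The start-line parsing used above can be shown in isolation. The regexp is the reStart pattern from the listing; the launch message yields the pileup flag, test name, core and event count in a single match:

```python
import re

re_start = re.compile(
    r"^Launching the (PILE UP |)(.*) tests on cpu (\d+) with (\d+) events each$")

line = "Launching the PILE UP IgProf_Mem tests on cpu 4 with 201 events each"
pileup, name, core, events = re_start.match(line).groups()
# -> ('PILE UP ', 'IgProf_Mem', '4', '201')
```

For a non-pileup run the first group matches the empty alternative, so pileup is simply ''.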

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseGeneralInfo (   self)

Definition at line 255 of file parserPerfsuiteMetadata.py.

00256                                   :
00257                 lines = self.lines_general
00258                 """ we define a simple list (tuple) of parsing rules; the first part of each tuple defines the parameters to be fetched from the
00259                         regexp, while the second one is the regexp itself """
00260                 #TIP: don't forget that a one-element tuple ends with ,
00261                 parsing_rules = (
00262                         (("", "num_cores", "run_on_cpus"), r"""^This machine \((.+)\) is assumed to have (\d+) cores, and the suite will be run on cpu \[(.+)\]$"""),
00263                         (("start_time", "host", "local_workdir", "user"), r"""^Performance Suite started running at (.+) on (.+) in directory (.+), run by user (.+)$""", "req"),
00264                         (("architecture",) ,r"""^Current Architecture is (.+)$"""),
00265                         (("test_release_based_on",), r"""^Test Release based on: (.+)$""", "req"),
00266                         (("base_release_path",) , r"""^Base Release in: (.+)$"""),
00267                         (("test_release_local_path",) , r"""^Your Test release in: (.+)$"""),
00268 
00269                         (("castor_dir",) , r"""^The performance suite results tarball will be stored in CASTOR at (.+)$"""),
00270                         
00271                         (("TimeSize_events",) , r"""^(\d+) TimeSize events$"""),
00272                         (("IgProf_events",) , r"""^(\d+) IgProf events$"""),
00273                         (("CallGrind_events",) , r"""^(\d+) Callgrind events$"""),
00274                         (("Memcheck_events",) , r"""^(\d+) Memcheck events$"""), 
00275 
00276                         (("candles_TimeSize",) , r"""^TimeSizeCandles \[(.*)\]$"""),
00277                         (("candles_TimeSizePU",) , r"""^TimeSizePUCandles \[(.*)\]$"""),
00278                         
00279                         (("candles_Memcheck",) , r"""^MemcheckCandles \[(.*)\]$"""),
00280                         (("candles_MemcheckPU",) , r"""^MemcheckPUCandles \[(.*)\]$"""),
00281 
00282                         (("candles_Callgrind",) , r"""^CallgrindCandles \[(.*)\]$"""),
00283                         (("candles_CallgrindPU",) , r"""^CallgrindPUCandles \[(.*)\]$"""),
00284 
00285                         (("candles_IgProfPU",) , r"""^IgProfPUCandles \[(.*)\]$"""),
00286                         (("candles_IgProf",) , r"""^IgProfCandles \[(.*)\]$"""),
00287 
00288 
00289                         (("cmsScimark_before",) , r"""^(\d+) cmsScimark benchmarks before starting the tests$"""),
00290                         (("cmsScimark_after",) , r"""^(\d+) cmsScimarkLarge benchmarks before starting the tests$"""), #NOTE: the field name says "after", but the matched log line says "before"
00291                         (("cmsDriverOptions",) , r"""^Running cmsDriver.py with user defined options: --cmsdriver="(.+)"$"""),
00292 
00293                         (("HEPSPEC06_SCORE",) ,r"""^This machine's HEPSPEC06 score is: (.+)$"""),
00294 
00295 
00296                 )
00297                 """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00298                 info = self._applyParsingRules(parsing_rules, lines)
00299 
00300 
00301                 """ postprocess the candles list """
00302                 candles = {}
00303                 for field, value in info.items():
00304                         if field.startswith("candles_"):
00305                                 test = field.replace("candles_", "")
00306                                 value = [v.strip(" '") for v in value.split(",")]
00307                                 #if value:
00308                                 candles[test]=value
00309                                 del info[field]
00310                 #print candles
00311                 info["candles"] = self._LINE_SEPARATOR.join([k+":"+",".join(v) for (k, v) in candles.items()])
00312 
00313 
00314                 """ TAGS """
00315                 """ 
00316                 --- Tag ---    --- RelTag --- -------- Package --------                        
00317                 HEAD           V05-03-06      IgTools/IgProf                                   
00318                 V01-06-05      V01-06-04      Validation/Performance                           
00319                 ---------------------------------------
00320                 total packages: 2 (2 displayed)
00321                 """
00322                 tags_start_index = -1 # set some default
00323                 try:
00324                         tags_start_index = [i for i in xrange(0, len(lines)) if lines[i].startswith("--- Tag ---")][0]
00325                 except:
00326                         pass
00327                 if tags_start_index > -1:
00328                         tags_end_index = [i for i in xrange(tags_start_index + 1, len(lines)) if lines[i].startswith("---------------------------------------")][0]
00329                         # print "tags start index: %s, end index: %s" % (tags_start_index, tags_end_index)
00330                         tags = lines[tags_start_index:tags_end_index+2]
00331                         # print [tag.split("  ") for tag in tags]
00332                         # print "\n".join(tags)
00333                 else: # no tags found, make an empty list ...
00334                         tags = []
00335                 """ we join the tags with separator to store as simple string """
00336                 info["tags"] = self._LINE_SEPARATOR.join(tags)
00337                 #FILES/PATHS
00338         
00339 
00340                 """ get the command line """
00341                 try:
00342                         cmd_index = self.findFirstIndex_ofStartsWith(lines, "Performance suite invoked with command line:") + 1 #that's the next line
00343                         info["command_line"] =  lines[cmd_index]
00344                 except IndexError, e:
00345                         if self._DEBUG:
00346                                 print e
00347                         info["command_line"] =  ""
00348                 
00349                 try:
00350                         cmd_parsed_start = self.findFirstIndex_ofStartsWith(lines, "Initial PerfSuite Arguments:") + 1
00351                         cmd_parsed_end = self.findFirstIndex_ofStartsWith(lines, "Running cmsDriver.py")
00352                         info["command_line_parsed"] = self._LINE_SEPARATOR.join(lines[cmd_parsed_start:cmd_parsed_end])
00353                 except IndexError, e:
00354                         if self._DEBUG:
00355                                 print e
00356                         info["command_line_parsed"] =  ""
00357 
00358                 return  info
00359 
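The rule-driven extraction described above can be sketched as a small self-contained helper (`apply_parsing_rules` is a hypothetical stand-in for the class's `_applyParsingRules`; the real method additionally honours the optional `"req"` flag and reports missing required fields):

```python
import re

def apply_parsing_rules(parsing_rules, lines):
    """For each rule, the first matching line fills the named fields
    from the regexp groups; empty field names ("") discard a group."""
    info = {}
    for rule in parsing_rules:
        fields, regexp = rule[0], rule[1]
        pattern = re.compile(regexp)
        for line in lines:
            m = pattern.match(line)
            if m:
                for field, value in zip(fields, m.groups()):
                    if field:  # "" means: group matched but not stored
                        info[field] = value
                break
    return info

rules = (
    (("start_time", "host", "local_workdir", "user"),
     r"^Performance Suite started running at (.+) on (.+) in directory (.+), run by user (.+)$"),
)
log = ["Performance Suite started running at Thu Aug 13 14:53:37 2009"
       " on lxbuild01 in directory /build/relval, run by user relval"]
```

With this input, `apply_parsing_rules(rules, log)` fills in `start_time`, `host`, `local_workdir` and `user` from the four regexp groups of the single matching line.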
        
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTheCompletion (   self)
 Checks whether the suite finished successfully  
        and whether the tarball was archived and uploaded to CASTOR. 

Definition at line 654 of file parserPerfsuiteMetadata.py.

00655                                     :
00656                 """
00657                  checks whether the suite finished successfully
00658                         and whether the tarball was archived and uploaded to CASTOR """
00659 
00660                 parsing_rules = (
00661                         (("finishing_time", "", ""), r"""^Performance Suite finished running at (.+) on (.+) in directory (.+)$"""),
00662                         (("castor_md5",) , r"""^The md5 checksum of the tarball: (.+)$"""),     
00663                         (("successfully_archived_tarball", ), r"""^Successfully archived the tarball (.+) in CASTOR!$"""),
00664                         #TODO: WE MUST HAVE THE CASTOR URL, but for some of files it's not included [probably crashed]
00665                         (("castor_file_url",), r"""^The tarball can be found: (.+)$"""),                        
00666                         (("castor_logfile_url",), r"""^The logfile can be found: (.+)$"""),
00667                 )
00668 
00669                 
00670                 """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00671                 info = self._applyParsingRules(parsing_rules, self.lines_other)
00672 
00673                 """ did we detect any errors in log files ? """
00674                 info["no_errors_detected"] = [line for line in self.lines_other if line == "There were no errors detected in any of the log files!"] and "1" or "0"
00675                 if not info["successfully_archived_tarball"]:
00676                         info["castor_file_url"] = ""
00677 
00678                 if not info["castor_file_url"]:
00679                         #TODO: get the castor file url or abort
00680                         self.handleParsingError( "Castor tarball URL not found. Trying to get from environment")
00681                         lmdb_castor_url_is_valid = lambda url: url.startswith("/castor/")
00682 
00683                         url = ""
00684                         try:
00685                                 #print "HERE!"
00686                                 url=self.get_tarball_fromlog()
00687                                 print "Extracted castor tarball full path by re-parsing cmsPerfSuite.log: %s"%url
00688                                 
00689                         except:
00690                                 if os.environ.has_key("PERFDB_CASTOR_FILE_URL"):
00691                                         url = os.environ["PERFDB_CASTOR_FILE_URL"]
00692                                         
00693                                 else: #FIXME: add the possibility to get it directly from the cmsPerfSuite.log file (make sure it is dumped there before doing the tarball itself...)
00694                                         print "Failed to get the tarball location from environment variable PERFDB_CASTOR_FILE_URL" 
00695                                         self.handleParsingError( "Castor tarball URL not found. Provide interactively")
00696 
00697                         while True:
00698                                 
00699                                 if lmdb_castor_url_is_valid(url):
00700                                         info["castor_file_url"] = url
00701                                         break
00702                                 print "Please enter a valid CASTOR url: has to start with /castor/ and should point to the tarball"
00703                                 if os.isatty(0): url = sys.stdin.readline()
00704                                 else: raise IOError("stdin is closed.")
00705 
00706 
                return info
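The completion checks translate directly into a small sketch (`completion_flags` is a hypothetical helper; the listing encodes the first flag with the Python 2 `[...] and "1" or "0"` idiom, to which the explicit conditional below is equivalent):

```python
def completion_flags(lines, castor_url=""):
    """Mirror of the completion checks above: a "1"/"0" string flag for
    the no-errors marker line, and a prefix check on the CASTOR URL."""
    no_errors = "1" if any(
        line == "There were no errors detected in any of the log files!"
        for line in lines) else "0"
    # the listing considers a CASTOR tarball URL valid if it starts with /castor/
    url_is_valid = castor_url.startswith("/castor/")
    return {"no_errors_detected": no_errors, "castor_url_ok": url_is_valid}
```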
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTimeSize (   self)
parses the timeSize 

Definition at line 493 of file parserPerfsuiteMetadata.py.

00494                                :
00495                 """ parses the timeSize """
00496                 timesize_result = []
00497 
00498                 # TODO: we will use the first timestamp after the "For these tests will use user input file..."
00499                 #TODO: do we have to save the name of input file somewhere?
00500                 """
00501                 the structure of input file:
00502                 * beginning ->> and start timestamp - the first one:              
00503                         >>> [optional:For these tests will use user input file /build/RAWReference/MinBias_RAW_320_IDEAL.root]
00504                         <...>
00505                         Using user-specified cmsDriver.py options: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
00506                         Candle MinBias will be PROCESSED
00507                         You defined your own steps to run:
00508                         RAW2DIGI-RECO
00509                         *Candle MinBias
00510                         Written out cmsRelvalreport.py input file at:
00511                         /build/relval/CMSSW_3_2_4/workStep2/MinBias_TimeSize/SimulationCandles_CMSSW_3_2_4.txt
00512                         Thu Aug 13 14:53:37 2009 [start]
00513                         <....>
00514                         Thu Aug 13 16:04:48 2009 [end]
00515                         Individual cmsRelvalreport.py ExitCode 0
00516                 * ending - the last timestamp "... ExitCode ...."
00517                 """
00518                 #TODO: do we need the cmsDriver --conditions? I suppose it would be global per work directory = 1 perfsuite run (so the same for all candles in one work dir)
00519                 # TODO: which candle definition to use?
00520                 """ divide into separate jobs """
00521                 lines = self.lines_timesize
00522                 jobs = []
00523                 start = False
00524                 timesize_start_indicator = re.compile(r"""^taskset -c (\d+) cmsRelvalreportInput.py""")
00525                 for line_index in xrange(0, len(lines)):
00526                         line = lines[line_index]
00527                         # search for start of each TimeSize job (with a certain candle and step)
00528                         if timesize_start_indicator.match(line):
00529                                 if start:
00530                                         jobs.append(lines[start:line_index])
00531                                 start = line_index
00532                 #add the last one
00533                 jobs.append(lines[start:len(lines)])
00534                 #print "\n".join(str(i) for i in jobs)
00535 
00536                 parsing_rules = (
00537                         (("", "candle", ), r"""^(Candle|ONLY) (.+) will be PROCESSED$""", "req"),
00538                         #e.g.: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
00539                         (("cms_driver_options", ), r"""^Using user-specified cmsDriver.py options: (.+)$"""),
00540                         (("", "conditions", ""), r"""^Using user-specified cmsDriver.py options: (.*)--conditions ([^\s]+)(.*)$""", "req"),
00541                         # for this we cannot guarantee that it has been found; TODO: we might count the number of pileup candles and compare with arguments
00542                         (("",  "pileup_type", ""), r"""^Using user-specified cmsDriver.py options:(.*)--pileup=([^\s]+)(.*)$"""),
00543                         #not sure whether event content is required
00544                         (("",  "event_content", ""), r"""^Using user-specified cmsDriver.py options:(.*)--eventcontent ([^\s]+)(.*)$""", "req"),
00545                         #TODO: after changing the splitter to "taskset -c ..." this is no longer included in the correct job's part
00546                         #(("input_user_root_file", ), r"""^For these tests will use user input file (.+)$"""),
00547                 )
00548 
00549                 #parse each of the TimeSize jobs: find candles, etc and start-end times
00550 
00551                 reExit_code = re.compile(r"""Individual ([^\s]+) ExitCode (\d+)""")
00552 
00553                 if self._DEBUG:
00554                         print "TimeSize (%d) jobs: %s" % (len(jobs), str(jobs))
00555 
00556                 for job_lines in jobs:
00557                         """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
00558                         info = self._applyParsingRules(parsing_rules, job_lines)
00559                         #Fixing here the compatibility with the new cmsDriver.py --conditions option (for which we now have autoconditions and FrontierConditions_GlobalTag is optional):
00560                         if 'auto:' in info['conditions']:
00561                                 from Configuration.AlCa.autoCond import autoCond
00562                                 info['conditions'] = autoCond[ info['conditions'].split(':')[1] ].split("::")[0]
00563                         else:
00564                                 if 'FrontierConditions_GlobalTag' in info['conditions']:
00565                                         info['conditions']=info['conditions'].split(",")[1]
00566                                                                                                                                 
00567                         #DEBUG:
00568                         #print "CONDITIONS are: %s"%info['conditions']
00569                         #start time - the index after which comes the time stamp
00570                         """ the following is not available on one of the releases, instead
00571                         use the first timestamp available on our job - that's the starting time :) """ 
00572                         
00573                         #start_time_after = self.findFirstIndex_ofStartsWith(job_lines, "Written out cmsRelvalreport.py input file at:")
00574                         #print start_time_after
00575                         info["start"] = self.firstTimeStampAfter(0, job_lines)
00576 
00577                         #TODO: improve in the future (in case of changes) - we could use findLineBefore instead, which takes a regexp as the search parameter
00578                         #end time - the index before which comes the time stamp
00579 
00580                         # On older files we have - "Individual Relvalreport.py ExitCode 0" instead of "Individual cmsRelvalreport.py ExitCode"
00581                         end_time_before = self.findLineAfter(0, job_lines, test_condition = reExit_code.match, return_index = True)
00582 
00583                         # on the same line we have the exit Code - so let's get it
00584                         nothing, exit_code = reExit_code.match(job_lines[end_time_before]).groups()
00585 
00586                         info["end"] = self.firstTimeStampBefore(end_time_before, job_lines)
00587                         info["exit_code"] = exit_code
00588 
00589                         steps_start = self.findFirstIndex_ofStartsWith(job_lines, "You defined your own steps to run:")
00590                         steps_end = self.findFirstIndex_ofStartsWith(job_lines, "*Candle ")
00591                         #probably it includes steps until we found *Candle... ?
00592                         steps = job_lines[steps_start + 1:steps_end]
00593                         if not self.validateSteps(steps):
00594                                 self.handleParsingError( "Steps were not found correctly: %s for current job: %s" % (str(steps), str(job_lines)))
00595                                 
00596                                 """ quite nasty - just a work around """
00597                                 print "Trying to recover from this error in case of old cmssw"
00598                                 
00599                         """ we assume that the steps are between the following sentence and a TimeStamp """
00600                                 steps_start = self.findFirstIndex_ofStartsWith(job_lines, "Steps passed to writeCommands")
00601                                 steps_end = self.findLineAfter(steps_start, job_lines, test_condition = self.isTimeStamp, return_index = True)
00602                                 
00603                                 steps = job_lines[steps_start + 1:steps_end]
00604                                 if not self.validateSteps(steps):
00605                                         self.handleParsingError( "EVEN AFTER RECOVERY steps were not found correctly! : %s for current job: %s" % (str(steps), str(job_lines)))
00606                                 else:
00607                                         print "RECOVERY SEEMS to be successful: %s" % str(steps)
00608 
00609                         info["steps"] = self._LINE_SEPARATOR.join(steps) #!!!! STEPS MIGHT CONTAIN COMMA: ","
00610                         
00611 
00612                         timesize_result.append(info)
                return {"TimeSize": timesize_result}
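The job-splitting loop at the top of this method can be sketched on its own (`split_jobs` is a hypothetical helper; note that using `None` as the sentinel, rather than the listing's `start = False`, also handles a marker on the very first line, where index 0 would test false):

```python
import re

def split_jobs(lines):
    """Each `taskset -c N cmsRelvalreportInput.py` line opens a new
    TimeSize job; everything up to the next such line (or end of file)
    belongs to that job."""
    start_re = re.compile(r"^taskset -c (\d+) cmsRelvalreportInput\.py")
    jobs, start = [], None
    for i, line in enumerate(lines):
        if start_re.match(line):
            if start is not None:
                jobs.append(lines[start:i])
            start = i
    if start is not None:  # the last job runs to the end of the file
        jobs.append(lines[start:])
    return jobs
```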
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimark (   self,
  main_cores = [1] 
)

Definition at line 629 of file parserPerfsuiteMetadata.py.

00630                                                   :
00631                 main_core = main_cores[0]
00632                 #TODO: WE DO NOT ALWAYS REALLY KNOW THE MAIN CORE NUMBER! but we don't care too much
00633                 #we parse each of the SciMark files and the Composite scores
00634                 csimark = []
00635                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2", testType = "mainCore", core = main_core))
00636                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2_large", testType = "mainCore_Large", core = main_core))
00637 
00638 
00639                 #we do not always know the number of cores available, so we search the directory to find the core numbers
00640                 reIsCsiMark_notusedcore = re.compile(r"^cmsScimark_(\d+)\.log$")
00641                 scimark_files = [reIsCsiMark_notusedcore.match(f).groups()[0]
00642                                 for f in os.listdir(self._path)
00643                                  if reIsCsiMark_notusedcore.match(f) 
00644                                         and os.path.isfile(os.path.join(self._path, f)) ]
00645 
00646                 for core_number in scimark_files:
00647                         try:
00648                                 csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark_%s" % str(core_number), testType = "NotUsedCore_%s" %str(core_number), core = core_number))
00649                         except IOError, e:
00650                                 if self._DEBUG:
00651                                         print e
00652                 return csimark
00653                 #print csimark
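The directory scan for per-core log files can be sketched as a standalone helper (`find_unused_core_logs` is a hypothetical name; it assumes the escaped `\.log` form of the pattern so the dot is literal):

```python
import os
import re

def find_unused_core_logs(path):
    """Collect the core numbers of all cmsScimark_<N>.log files in `path`,
    mirroring the list comprehension in readCmsScimark above."""
    re_core_log = re.compile(r"^cmsScimark_(\d+)\.log$")
    cores = []
    for f in os.listdir(path):
        m = re_core_log.match(f)
        if m and os.path.isfile(os.path.join(path, f)):
            cores.append(m.group(1))
    return sorted(cores)
```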

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimarkTest (   self,
  testName,
  testType,
  core 
)

Definition at line 617 of file parserPerfsuiteMetadata.py.

00618                                                               :
00619                 lines  = self.readInput(self._path, fileName = testName + ".log")
00620                 scores = [{"score": self.reCmsScimarkTest.match(line).groups()[1], "type": testType, "core": core}
00621                                 for line in lines 
00622                                 if self.reCmsScimarkTest.match(line)]
00623                 #add the measurement number
00624                 i = 0
00625                 for score in scores:
00626                         i += 1
00627                         score.update({"messurement_number": i})
00628                 return scores
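The manual counter above can be expressed with `enumerate` (a sketch of a hypothetical helper; it keeps the original, misspelled `messurement_number` key so downstream consumers of the score dictionaries are unaffected):

```python
def number_measurements(scores):
    """Attach a 1-based measurement number to each score dictionary,
    as the counter loop in readCmsScimarkTest does."""
    for i, score in enumerate(scores, start=1):
        score["messurement_number"] = i
    return scores
```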
                
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readInput (   self,
  path,
  fileName = "cmsPerfSuite.log" 
)

Definition at line 161 of file parserPerfsuiteMetadata.py.

00162                                                                 :
00163                 try:
00164                         f = open(os.path.join(path, fileName), "r")
00165                         lines =  [s.strip() for s in f.readlines()]
00166                         f.close()
00167                 except IOError:
00168                         lines = []
00169 
00170                 #print self._lines
00171                 return lines
00172 
00173 
00174 

def parserPerfsuiteMetadata::parserPerfsuiteMetadata::validateSteps (   self,
  steps 
)
Simple function for error detection. TODO: we could use a list of possible steps also 

Definition at line 24 of file parserPerfsuiteMetadata.py.

00025                                       :
00026                 """ Simple function for error detection. TODO: we could use a list of possible steps also """
00027                 return not (not steps or len(steps) > self._MAX_STEPS)
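The validity condition can be stated positively instead of through the double negation (a sketch; the bound of 5 steps is an assumption for illustration - the class keeps the real limit in `self._MAX_STEPS`):

```python
MAX_STEPS = 5  # assumed bound; the class stores the real value in _MAX_STEPS

def validate_steps(steps, max_steps=MAX_STEPS):
    """A step list is valid iff it is non-empty and no longer than the
    allowed maximum - equivalent to `not (not steps or len(steps) > max)`."""
    return bool(steps) and len(steps) <= max_steps
```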


Member Data Documentation

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 23 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.

Definition at line 34 of file parserPerfsuiteMetadata.py.

Definition at line 34 of file parserPerfsuiteMetadata.py.

Definition at line 34 of file parserPerfsuiteMetadata.py.

Definition at line 34 of file parserPerfsuiteMetadata.py.

Definition at line 28 of file parserPerfsuiteMetadata.py.