Public Member Functions

    def __init__
    def doQuery
    def findFirstIndex_ofStartsWith
    def findLineAfter
    def findLineBefore
    def firstTimeStampAfter
    def firstTimeStampBefore
    def get_tarball_fromlog
    def getIgSummary
    def getMachineInfo
    def getSummaryInfo
    def handleParsingError
    def isTimeStamp
    def parseAll
    def parseAllOtherTests
    def parseGeneralInfo
    def parseTheCompletion
    def parseTimeSize
    def readCmsScimark
    def readCmsScimarkTest
    def readInput
    def validateSteps

Public Attributes

    lines_general
    lines_other
    lines_timesize
    missing_fields
    reCmsScimarkTest

Private Member Functions

    def _applyParsingRules

Private Attributes

    _DEBUG
    _MAX_STEPS
    _otherStart
    _path
    _timeSizeEnd
    _timeSizeStart

Static Private Attributes

    string _LINE_SEPARATOR = "|"
The whole parsing works as follows. We split the file into 3 parts (we keep 3 lists of lines: self.lines_general, self.lines_timesize, self.lines_other):

* General info: As most of the info are simple one-line strings, we define regular expressions matching each of those lines, each associated with the data we can extract from it. E.g. `^Suite started at (.+) on (.+) by user (.+)$` matches only the line stating when the suite started and on which machine. It is associated with a tuple of general-info field names to be filled in; this way we get info = {'start_time': start-taken-from-regexp, 'host': host, 'user': user}. This is done by the simple helper _applyParsingRules, which checks each line against each rule and, on a match, fills the result dictionary. Additionally we get the CPU and memory info from /proc/cpuinfo and /proc/meminfo.

* TimeSize test: We use much the same technique, but first we divide the TimeSize lines by job (an individual run of cmssw: per candle, and pileup/no pileup). For each job we then apply our parsing rules and also find the starting and ending times (i.e. we know that the start timestamp comes somewhere after a certain line containing "Written out cmsRelvalreport.py input file at:").

* All other tests: We find the statement that the test is being launched (containing the test name, core, and number of events). Above it is the thread number, and below it the starting time. The ending time can ONLY be connected to the starting time via the thread ID. The problem is that the file names the same test instance differently, e.g. <Launching "PILE UP Memcheck"> versus <"Memcheck" stopped>.
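The rule-driven general-info parsing described above can be sketched as follows. This is a minimal, self-contained illustration in Python 3, not the module's actual code; the rule tuple and the sample log line are modeled on the example in the text:

```python
import re

# Rules modeled on the description above: a tuple of field names paired with
# a regular expression whose groups fill those fields.
rules = [
    (("start_time", "host", "user"),
     r"^Suite started at (.+) on (.+) by user (.+)$"),
]

def apply_rules(rules, lines):
    """For each rule, find the first matching line and copy its regexp
    groups into the result dictionary under the rule's field names."""
    info = {}
    for fields, pattern in rules:
        regexp = re.compile(pattern)
        for line in lines:
            match = regexp.match(line)
            if match:
                for field, value in zip(fields, match.groups()):
                    if field:  # an empty field name means "discard this group"
                        info[field] = value
                break
    return info

log_lines = ["Suite started at Fri Aug 14 01:16:03 2009 on lxbuild01 by user relval"]
print(apply_rules(log_lines and rules, log_lines))
```

In the real class this matching is delegated to `parsingRulesHelper.rulesParser`, which additionally tracks required ("req") fields that were not found.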
Definition at line 8 of file parserPerfsuiteMetadata.py.
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::__init__(self, path)
Definition at line 28 of file parserPerfsuiteMetadata.py.
    def __init__(self, path):

        self._MAX_STEPS = 5 # MAXIMUM NUMBER OF STEPS PER RUN (taskset relvalreport.py...)
        self._DEBUG = False

        self._path = path

        """ some initialisation to speedup the other functions """
        #for cmsscimark
        self.reCmsScimarkTest = re.compile(r"""^Composite Score:(\s*)([^\s]+)$""")

        #TimeSize
        """ the separator for beginning of timeSize / end of general statistics """
        self._timeSizeStart = re.compile(r"""^Launching the TimeSize tests \(TimingReport, TimeReport, SimpleMemoryCheck, EdmSize\) with (\d+) events each$""")
        """ (the first timestamp is the start of TimeSize) """

        """ the separator for end of timeSize / beginning of IgProf_Perf, IgProf_Mem, Memcheck, Callgrind tests """
        self._timeSizeEnd = re.compile(r"""^Stopping all cmsScimark jobs now$""")

        #Other tests:
        self._otherStart = re.compile(r"^Preparing")

        """
        ----- READ THE DATA -----
        """
        lines = self.readInput(path)
        """ split the whole file into parts """
        #Let's not assume there are ALWAYS TimeSize tests in the runs of the Performance Suite!:
        #Check first:
        #FIXME: Vidmantas did not think of this case... will need to implement protection against it for all the IB tests...
        #To do as soon as possible...
        #Maybe revisit the strategy if it can be done quickly.
        timesize_end = [lines.index(line) for line in lines if self._timeSizeEnd.match(line)]
        if timesize_end:
            timesize_end_index = timesize_end[0]
        else:
            timesize_end_index = 0
        timesize_start = [lines.index(line) for line in lines if self._timeSizeStart.match(line)]
        general_stop = [lines.index(line) for line in lines if self._otherStart.match(line)]
        if timesize_start:
            timesize_start_index = timesize_start[0]
            general_stop_index = timesize_start_index
        elif general_stop:
            timesize_start_index = 0
            general_stop_index = general_stop[0]
        else:
            timesize_start_index = 0
            general_stop_index = -1

        """ we split the structure:
            * general
            * timesize
            * all others [igprof etc]
        """

        """ we get the indexes of spliting """
        #Not OK to use timesize_start_index for the general lines... want to be general, also for cases of no TimeSize tests...
        #self.lines_general = lines[:timesize_start_index]
        self.lines_general = lines[:general_stop_index]
        self.lines_timesize = lines[timesize_start_index:timesize_end_index+1]
        self.lines_other = lines[timesize_end_index:]

        """ a list of missing fields """
        self.missing_fields = []
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::_applyParsingRules(self, parsing_rules, lines) [private]
Applies the (provided) regular expression rules (=rule[1] for rule in parsing_rules) to each line; if a rule matches the line, puts the matched information into the dictionary under the specified keys (=rule[0]), which is later returned. rule[2] states whether the field is required to be found; if it is required and is not found, the exception would be raised. rules = [ ( (field_name_1_to_match, field_name_2), regular expression, /optionally: is the field required? if so "req"/ ) ]
we call a shared parsing helper
Definition at line 235 of file parserPerfsuiteMetadata.py.
    def _applyParsingRules(self, parsing_rules, lines):
        """
        Applies the (provided) regular expression rules (=rule[1] for rule in parsing_rules)
        to each line; if a rule matches the line,
        puts the matched information into the dictionary under the specified keys (=rule[0]), which is later returned.
        rule[2] states whether the field is required to be found. If it is required and is not found, the exception would be raised.
        rules = [
          ( (field_name_1_to_match, field_name_2), regular expression, /optionally: is the field required? if so "req"/ )
        ]
        """
        """ we call a shared parsing helper """
        #parsing_rules = map(parsingRulesHelper.rulesRegexpCompileFunction, parsing_rules)
        #print parsing_rules
        (info, missing_fields) = parsingRulesHelper.rulesParser(parsing_rules, lines, compileRules = True)

        self.missing_fields.extend(missing_fields)

        return info
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::doQuery(self, query, database)
Definition at line 631 of file parserPerfsuiteMetadata.py.
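The body of doQuery is not reproduced on this page. From its use in getSummaryInfo, which treats the first return value as an error flag and splits the output on "@@@", a plausible stand-in looks like the sketch below (Python 3 and the stdlib sqlite3 module; only the (error, output) shape and the "@@@" separator are inferred from this page, everything else is an assumption):

```python
import sqlite3

def do_query(query, database):
    """Hypothetical stand-in for doQuery: run `query` against the SQLite
    file `database` and return (error, output), joining columns with the
    "@@@" separator that getSummaryInfo splits on."""
    try:
        conn = sqlite3.connect(database)
        rows = conn.execute(query).fetchall()
        conn.close()
    except sqlite3.Error as e:
        return str(e), ""
    # one line per row, columns joined by the separator
    output = "\n".join("@@@".join(str(col) for col in row) for row in rows)
    return None, output
```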
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findFirstIndex_ofStartsWith(job_lines, start_of_line)
Definition at line 113 of file parserPerfsuiteMetadata.py.
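The body is not shown here. Given the name and the IndexError handlers around its call sites in parseGeneralInfo, it presumably reduces to something like this sketch (an assumption, written in Python 3):

```python
def find_first_index_of_starts_with(job_lines, start_of_line):
    """Return the index of the first line starting with the given prefix;
    when nothing matches, the [0] on an empty list raises IndexError,
    which the callers catch."""
    return [index for index, line in enumerate(job_lines)
            if line.startswith(start_of_line)][0]
```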
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineAfter(self, line_index, lines, test_condition, return_index = False)
finds a line satisfying the `test_condition` coming after the `line_index`
Definition at line 129 of file parserPerfsuiteMetadata.py.
    def findLineAfter(self, line_index, lines, test_condition, return_index = False):
        """ finds a line satisfying the `test_condition` coming after the `line_index` """
        # we're going forward through the lines list
        for line_index in xrange(line_index + 1, len(lines)):
            line = lines[line_index]

            if test_condition(line):
                if return_index:
                    return line_index
                return line
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::findLineBefore(self, line_index, lines, test_condition)
finds a line satisfying the `test_condition` coming before the `line_index`
Definition at line 118 of file parserPerfsuiteMetadata.py.
    def findLineBefore(self, line_index, lines, test_condition):
        """ finds a line satisfying the `test_condition` coming before the `line_index` """
        # we're going backwards through the lines list
        for line_index in xrange(line_index - 1, -1, -1):
            line = lines[line_index]

            if test_condition(line):
                return line
        raise ValueError
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampAfter(self, line_index, lines)
returns the first timestamp AFTER the line with given index
Definition at line 145 of file parserPerfsuiteMetadata.py.
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::firstTimeStampBefore(self, line_index, lines)
returns the first timestamp BEFORE the line with given index
Definition at line 140 of file parserPerfsuiteMetadata.py.
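Neither timestamp helper's body is reproduced on this page. Combined with isTimeStamp (documented below) and the findLineAfter/findLineBefore helpers, their documented behaviour suggests sketches like these (Python 3; the exact implementations are assumptions):

```python
import time

DATETIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # the default date format used by isTimeStamp

def is_timestamp(line):
    """Mirror of isTimeStamp: True for lines such as
    'Fri Aug 14 01:16:03 2009', None otherwise."""
    try:
        time.strptime(line, DATETIME_FORMAT)
        return True
    except ValueError:
        return None

def first_timestamp_after(line_index, lines):
    # scan forward, presumably built on findLineAfter(..., is_timestamp)
    for i in range(line_index + 1, len(lines)):
        if is_timestamp(lines[i]):
            return lines[i]

def first_timestamp_before(line_index, lines):
    # scan backward, presumably built on findLineBefore(..., is_timestamp)
    for i in range(line_index - 1, -1, -1):
        if is_timestamp(lines[i]):
            return lines[i]
    raise ValueError("no timestamp before line %d" % line_index)
```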
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::get_tarball_fromlog(self)
Return the tarball castor location by parsing the cmsPerfSuite.log file
Definition at line 690 of file parserPerfsuiteMetadata.py.
    def get_tarball_fromlog(self):
        '''Return the tarball castor location by parsing the cmsPerfSuite.log file'''
        print "Getting the url from the cmsPerfSuite.log"
        log = open("cmsPerfSuite.log", "r")
        castor_dir = "UNKNOWN_CASTOR_DIR"
        tarball = "UNKNOWN_TARBALL"
        for line in log.readlines():
            if 'castordir' in line:
                castor_dir = line.split()[1]
            if 'tgz' in line and tarball == "UNKNOWN_TARBALL": #Pick the first line that contains the tar command...
                if 'tar' in line:
                    tarball = os.path.basename(line.split()[2])
        castor_tarball = os.path.join(castor_dir, tarball)
        return castor_tarball
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getIgSummary(self)
Definition at line 602 of file parserPerfsuiteMetadata.py.
    def getIgSummary(self):
        igresult = []
        globbed = glob.glob(os.path.join(self._path, "../*/IgProfData/*/*/*.sql3"))

        for f in globbed:
            #print f
            profileInfo = self.getSummaryInfo(f)
            if not profileInfo:
                continue
            cumCounts, cumCalls = profileInfo
            dump, architecture, release, rest = f.rsplit("/", 3)
            candle, sequence, pileup, conditions, process, counterType, events = rest.split("___")
            events = events.replace(".sql3", "")
            igresult.append({"counter_type": counterType, "event": events, "cumcounts": cumCounts, "cumcalls": cumCalls})

        return igresult
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getMachineInfo(self)
Returns the cpu and memory info
cpu info
we assume that: * num_cores = max(core id + 1) [it is counted from 0] * 'model name' is the processor type [we return only the first one, assuming the others are the same] * 'cpu MHz' is the speed of the CPU

For example:

    model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
    cpu MHz : 800.000
    cache size : 6144 KB
Definition at line 175 of file parserPerfsuiteMetadata.py.
    def getMachineInfo(self):
        """ Returns the cpu and memory info """

        """ cpu info """

        """
        we assume that:
         * num_cores = max(core id+1) [it's counted from 0]
         * 'model name' is processor type [we will return only the first one - we assume others to be same!!??
         * cpu MHz - is the speed of CPU
        """
        #TODO: BUT cpu MHz shows not the maximum speed but the current one,
        """
        for
            model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
            cpu MHz : 800.000
            cache size : 6144 KB
        """
        cpu_result = {}
        try:
            f = open(os.path.join(self._path, "cpuinfo"), "r")

            #we split data into a list of tuples = [(attr_name, attr_value), ...]
            cpu_attributes = [l.strip().split(":") for l in f.readlines()]
            #print cpu_attributes
            f.close()
            cpu_result = {
                "num_cores": max([int(attr[1].strip())+1 for attr in cpu_attributes if attr[0].strip() == "processor"]), #Bug... Vidmantas used "core id"
                "cpu_speed_MHZ": max([attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cpu MHz"]),
                "cpu_cache_size": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "cache size"][0],
                "cpu_model_name": [attr[1].strip() for attr in cpu_attributes if attr[0].strip() == "model name"][0]
            }
        except IOError, e:
            print e

        """ memory info """
        mem_result = {}

        try:
            f = open(os.path.join(self._path, "meminfo"), "r")

            #we split data into a list of tuples = [(attr_name, attr_value), ...]
            mem_attributes = [l.strip().split(":") for l in f.readlines()]

            mem_result = {
                "memory_total_ram": [attr[1].strip() for attr in mem_attributes if attr[0].strip() == "MemTotal"][0]
            }

        except IOError, e:
            print e

        cpu_result.update(mem_result)
        return cpu_result
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::getSummaryInfo(self, database)
Definition at line 619 of file parserPerfsuiteMetadata.py.
    def getSummaryInfo(self, database):
        summary_query = """SELECT counter, total_count, total_freq, tick_period
                           FROM summary;"""
        error, output = self.doQuery(summary_query, database)
        if error or not output or output.count("\n") > 1:
            return None
        counter, total_count, total_freq, tick_period = output.split("@@@")
        if counter == "PERF_TICKS":
            return float(tick_period) * float(total_count), int(total_freq)
        else:
            return int(total_count), int(total_freq)
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::handleParsingError(self, message)
Definition at line 150 of file parserPerfsuiteMetadata.py.
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::isTimeStamp(line)
Returns whether the string is a timestamp (if not returns None)

    >>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
    True
    >>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")
Definition at line 96 of file parserPerfsuiteMetadata.py.
    def isTimeStamp(line):
        """
        Returns whether the string is a timestamp (if not returns None)

        >>> parserPerfsuiteMetadata.isTimeStamp("Fri Aug 14 01:16:03 2009")
        True
        >>> parserPerfsuiteMetadata.isTimeStamp("Fri Augx 14 01:16:03 2009")

        """
        datetime_format = "%a %b %d %H:%M:%S %Y" # we use default date format
        try:
            time.strptime(line, datetime_format)
            return True
        except ValueError:
            return None
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAll(self)
Definition at line 705 of file parserPerfsuiteMetadata.py.
    def parseAll(self):
        result = {"General": {}, "TestResults": {}, "cmsSciMark": {}, "IgSummary": {}, 'unrecognized_jobs': []}

        """ all the general info - start, arguments, host etc """
        result["General"].update(self.parseGeneralInfo())

        """ machine info - cpu, memory """
        result["General"].update(self.getMachineInfo())

        """ we add info about how successful was the run, when it finished and the final castor url to the file! """
        result["General"].update(self.parseTheCompletion())

        try:
            result["TestResults"].update(self.parseTimeSize())
        except Exception, e:
            print "BAD BAD BAD UNHANDLED ERROR" + str(e)

        #TODO:
        #Check what Vidmantas was doing in the parseAllOtherTests, de facto it is not used now, so commenting it for now (to avoid the "BAD BAD BAD....")
        #try:
        #    result["TestResults"].update(self.parseAllOtherTests())
        #except Exception, e:
        #    print "BAD BAD BAD UNHANDLED ERROR" + str(e)

        main_cores = [result["General"]["run_on_cpus"]]
        num_cores = result["General"].get("num_cores", 0)
        #DEBUG
        #print "Number of cores was: %s"%num_cores
        #TODO: temporarily - search for cores, use regexp
        main_cores = [1]

        # THE MACHINE SCIMARKS
        result["cmsSciMark"] = self.readCmsScimark(main_cores = main_cores)
        result["IgSummary"] = self.getIgSummary()

        if self.missing_fields:
            self.handleParsingError("========== SOME REQUIRED FIELDS WERE NOT FOUND DURING PARSING ======= " + str(self.missing_fields))

        return result
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseAllOtherTests(self)
Definition at line 360 of file parserPerfsuiteMetadata.py.
    def parseAllOtherTests(self):
        threads = {}
        tests = {
            #"IgProf_Perf": {}, "IgProf_Mem": {}, "Memcheck": {}, "Callgrind": {},
        }

        lines = self.lines_other
        """
        for each of IgProf_Perf, IgProf_Mem, Memcheck, Callgrind tests we have such a structure of input file:
        * beginning ->> and start timestamp - the first one:
            Adding thread <simpleGenReportThread(Thread-1, started)> to the list of active threads
            Launching the Memcheck tests on cpu 3 with 5 events each
            Fri Aug 14 01:16:03 2009

            <... whatever might be here, might overlap with other test start/end messages ...>

            Fri Aug 14 02:13:18 2009
            Memcheck test, in thread <simpleGenReportThread(Thread-1, stopped)> is done running on core 3
        * ending - the last timestamp "before is done running ...."
        """
        # we take the first TimeStamp after the starting message and the first before the finishing message

        #TODO: if threads would be changed it would stop working!!!

        # i.e. Memcheck, cpu, events
        reStart = re.compile(r"""^Launching the (.*) tests on cpu (\d+) with (\d+) events each$""")
        # i.e. Memcheck, thread name, core number
        reEnd = re.compile(r"""^(.*) test, in thread <simpleGenReportThread\((.+), stopped\)> is done running on core (\d+)$""")

        #i.e. thread = Thread-1
        reAddThread = re.compile(r"""^Adding thread <simpleGenReportThread\((.+), started\)> to the list of active threads$""")

        reExitCode = re.compile(r"""Individual cmsRelvalreport.py ExitCode (\d+)""")
        """ we search for lines being either: (it's a little pascal'ish but we need the index!) """
        for line_index in xrange(0, len(lines)):
            line = lines[line_index]

            # * starting of test
            if reStart.match(line):
                #print reStart.match(line).groups()
                testName, testCore, testEventsNum = reStart.match(line).groups()

                time = self.firstTimeStampAfter(line_index, lines)

                #find the name of Thread: it's one of the lines before
                line_thread = self.findLineBefore(line_index, lines, test_condition=lambda l: reAddThread.match(l))
                (thread_id, ) = reAddThread.match(line_thread).groups()

                #we add it to the list of threads as we DO NOT KNOW THE EXACT NAME OF THE TEST
                if not threads.has_key(thread_id):
                    threads[thread_id] = {}
                # this way we would get an Exception in case of unknown test name!
                threads[thread_id].update({"name": testName, "events_num": testEventsNum, "core": testCore, "start": time, "thread_id": thread_id})

            # * or end of test
            if reEnd.match(line):
                testName, thread_id, testCore = reEnd.match(line).groups()
                if not threads.has_key(testName):
                    threads[thread_id] = {}
                #TODO: we get an exception if we found non existing

                time = self.firstTimeStampBefore(line_index, lines)
                try:
                    exit_code = ""
                    #we search for the exit code
                    line_exitcode = self.findLineBefore(line_index, lines, test_condition=lambda l: reExitCode.match(l))
                    exit_code, = reExitCode.match(line_exitcode).groups()
                except Exception, e:
                    print "Error while getting exit code (Other test): %s" + str(e)

                # this way we would get an Exception in case of unknown test name! So we would be warned if the format has changed
                threads[thread_id].update({"end": time, "exit_code": exit_code})
        for key, thread in threads.items():
            tests[thread["name"]] = thread
        return tests
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseGeneralInfo(self)
Definition at line 255 of file parserPerfsuiteMetadata.py.
    def parseGeneralInfo(self):
        lines = self.lines_general
        """ we define a simple list (tuple) of rules for parsing: the first part of each tuple defines the parameters to be fetched from the
            regexp while the second one is the regexp itself """
        #TIP: don't forget that a tuple of one ends with ,
        parsing_rules = (
            (("", "num_cores", "run_on_cpus"), r"""^This machine \((.+)\) is assumed to have (\d+) cores, and the suite will be run on cpu \[(.+)\]$"""),
            (("start_time", "host", "local_workdir", "user"), r"""^Performance Suite started running at (.+) on (.+) in directory (.+), run by user (.+)$""", "req"),
            (("architecture",), r"""^Current Architecture is (.+)$"""),
            (("test_release_based_on",), r"""^Test Release based on: (.+)$""", "req"),
            (("base_release_path",), r"""^Base Release in: (.+)$"""),
            (("test_release_local_path",), r"""^Your Test release in: (.+)$"""),

            (("castor_dir",), r"""^The performance suite results tarball will be stored in CASTOR at (.+)$"""),

            (("TimeSize_events",), r"""^(\d+) TimeSize events$"""),
            (("IgProf_events",), r"""^(\d+) IgProf events$"""),
            (("CallGrind_events",), r"""^(\d+) Callgrind events$"""),
            (("Memcheck_events",), r"""^(\d+) Memcheck events$"""),

            (("candles_TimeSize",), r"""^TimeSizeCandles \[(.*)\]$"""),
            (("candles_TimeSizePU",), r"""^TimeSizePUCandles \[(.*)\]$"""),

            (("candles_Memcheck",), r"""^MemcheckCandles \[(.*)\]$"""),
            (("candles_MemcheckPU",), r"""^MemcheckPUCandles \[(.*)\]$"""),

            (("candles_Callgrind",), r"""^CallgrindCandles \[(.*)\]$"""),
            (("candles_CallgrindPU",), r"""^CallgrindPUCandles \[(.*)\]$"""),

            (("candles_IgProfPU",), r"""^IgProfPUCandles \[(.*)\]$"""),
            (("candles_IgProf",), r"""^IgProfCandles \[(.*)\]$"""),

            (("cmsScimark_before",), r"""^(\d+) cmsScimark benchmarks before starting the tests$"""),
            (("cmsScimark_after",), r"""^(\d+) cmsScimarkLarge benchmarks before starting the tests$"""),
            (("cmsDriverOptions",), r"""^Running cmsDriver.py with user defined options: --cmsdriver="(.+)"$"""),

            (("HEPSPEC06_SCORE",), r"""^This machine's HEPSPEC06 score is: (.+)$"""),
        )
        """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
        info = self._applyParsingRules(parsing_rules, lines)

        """ postprocess the candles list """
        candles = {}
        for field, value in info.items():
            if field.startswith("candles_"):
                test = field.replace("candles_", "")
                value = [v.strip(" '") for v in value.split(",")]
                #if value:
                candles[test] = value
                del info[field]
        #print candles
        info["candles"] = self._LINE_SEPARATOR.join([k + ":" + ",".join(v) for (k, v) in candles.items()])

        """ TAGS """
        """
        --- Tag ---    --- RelTag ---    -------- Package --------
        HEAD           V05-03-06         IgTools/IgProf
        V01-06-05      V01-06-04         Validation/Performance
        ---------------------------------------
        total packages: 2 (2 displayed)
        """
        tags_start_index = -1 # set some default
        try:
            tags_start_index = [i for i in xrange(0, len(lines)) if lines[i].startswith("--- Tag ---")][0]
        except:
            pass
        if tags_start_index > -1:
            tags_end_index = [i for i in xrange(tags_start_index + 1, len(lines)) if lines[i].startswith("---------------------------------------")][0]
            # print "tags start index: %s, end index: %s" % (tags_start_index, tags_end_index)
            tags = lines[tags_start_index:tags_end_index+2]
            # print [tag.split(" ") for tag in tags]
            # print "\n".join(tags)
        else: # no tags found, make an empty list ...
            tags = []
        """ we join the tags with separator to store as simple string """
        info["tags"] = self._LINE_SEPARATOR.join(tags)
        #FILES/PATHS

        """ get the command line """
        try:
            cmd_index = self.findFirstIndex_ofStartsWith(lines, "Performance suite invoked with command line:") + 1 #that's the next line
            info["command_line"] = lines[cmd_index]
        except IndexError, e:
            if self._DEBUG:
                print e
            info["command_line"] = ""

        try:
            cmd_parsed_start = self.findFirstIndex_ofStartsWith(lines, "Initial PerfSuite Arguments:") + 1
            cmd_parsed_end = self.findFirstIndex_ofStartsWith(lines, "Running cmsDriver.py")
            info["command_line_parsed"] = self._LINE_SEPARATOR.join(lines[cmd_parsed_start:cmd_parsed_end])
        except IndexError, e:
            if self._DEBUG:
                print e
            info["command_line"] = ""

        return info
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTheCompletion(self)
checks if the suite has successfully finished and if the tarball was successfully archived and uploaded to the castor
Definition at line 638 of file parserPerfsuiteMetadata.py.
    def parseTheCompletion(self):
        """
        checks if the suite has successfully finished
        and if the tarball was successfully archived and uploaded to the castor """

        parsing_rules = (
            (("finishing_time", "", ""), r"""^Performance Suite finished running at (.+) on (.+) in directory (.+)$"""),
            (("castor_md5",), r"""^The md5 checksum of the tarball: (.+)$"""),
            (("successfully_archived_tarball",), r"""^Successfully archived the tarball (.+) in CASTOR!$"""),
            #TODO: WE MUST HAVE THE CASTOR URL, but for some of the files it's not included [probably crashed]
            (("castor_file_url",), r"""^The tarball can be found: (.+)$"""),
            (("castor_logfile_url",), r"""^The logfile can be found: (.+)$"""),
        )

        """ we apply the defined parsing rules to extract the required fields of information into the dictionary (as defined in parsing rules) """
        info = self._applyParsingRules(parsing_rules, self.lines_other)

        """ did we detect any errors in log files ? """
        info["no_errors_detected"] = [line for line in self.lines_other if line == "There were no errors detected in any of the log files!"] and "1" or "0"
        if not info["successfully_archived_tarball"]:
            info["castor_file_url"] = ""

        if not info["castor_file_url"]:
            #TODO: get the castor file url or abort
            self.handleParsingError("Castor tarball URL not found. Trying to get from environment")
            lmdb_castor_url_is_valid = lambda url: url.startswith("/castor/")

            url = ""
            try:
                print "HERE!"
                url = self.get_tarball_fromlog()
                print "Extracted castor tarball full path by re-parsing cmsPerfSuite.log: %s" % url

            except:
                if os.environ.has_key("PERFDB_CASTOR_FILE_URL"):
                    url = os.environ["PERFDB_CASTOR_FILE_URL"]

                else: #FIXME: add the possibility to get it directly from the cmsPerfSuite.log file (make sure it is dumped there before doing the tarball itself...)
                    print "Failed to get the tarball location from environment variable PERFDB_CASTOR_FILE_URL"
                    self.handleParsingError("Castor tarball URL not found. Provide interactively")

            while True:
                if lmdb_castor_url_is_valid(url):
                    info["castor_file_url"] = url
                    break
                print "Please enter a valid CASTOR url: has to start with /castor/ and should point to the tarball"
                url = sys.stdin.readline()

        return info
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::parseTimeSize(self)
parses the timeSize
Definition at line 440 of file parserPerfsuiteMetadata.py.
00441 : 00442 """ parses the timeSize """ 00443 timesize_result = [] 00444 00445 # TODO: we will use the first timestamp after the "or these tests will use user input file..." 00446 #TODO: do we have to save the name of input file somewhere? 00447 """ 00448 the structure of input file: 00449 * beginning ->> and start timestamp- the firstone: 00450 >>> [optional:For these tests will use user input file /build/RAWReference/MinBias_RAW_320_IDEAL.root] 00451 <...> 00452 Using user-specified cmsDriver.py options: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM 00453 Candle MinBias will be PROCESSED 00454 You defined your own steps to run: 00455 RAW2DIGI-RECO 00456 *Candle MinBias 00457 Written out cmsRelvalreport.py input file at: 00458 /build/relval/CMSSW_3_2_4/workStep2/MinBias_TimeSize/SimulationCandles_CMSSW_3_2_4.txt 00459 Thu Aug 13 14:53:37 2009 [start] 00460 <....> 00461 Thu Aug 13 16:04:48 2009 [end] 00462 Individual cmsRelvalreport.py ExitCode 0 00463 * ending - the last timestamp "... ExitCode ...." 00464 """ 00465 #TODO: do we need the cmsDriver --conditions? I suppose it would the global per work directory = 1 perfsuite run (so samefor all candles in one work dir) 00466 # TODO: which candle definition to use? 
""" divide into separate jobs """
lines = self.lines_timesize
jobs = []
start = None
timesize_start_indicator = re.compile(r"""^taskset -c (\d+) cmsRelvalreportInput.py""")
for line_index in xrange(0, len(lines)):
    line = lines[line_index]
    # search for the start of each TimeSize job (with a certain candle and step)
    if timesize_start_indicator.match(line):
        if start is not None:
            jobs.append(lines[start:line_index])
        start = line_index
# add the last one
jobs.append(lines[start:len(lines)])
#print "\n".join(str(i) for i in jobs)

parsing_rules = (
    (("", "candle", ), r"""^(Candle|ONLY) (.+) will be PROCESSED$""", "req"),
    # e.g.: --conditions FrontierConditions_GlobalTag,MC_31X_V4::All --eventcontent RECOSIM
    (("cms_driver_options", ), r"""^Using user-specified cmsDriver.py options: (.+)$"""),
    (("", "conditions", ""), r"""^Using user-specified cmsDriver.py options: (.*)--conditions ([^\s]+)(.*)$""", "req"),
    # for this we cannot guarantee that it has been found; TODO: we might count the number of pileup candles and compare with the arguments
    (("", "pileup_type", ""), r"""^Using user-specified cmsDriver.py options:(.*)--pileup=([^\s]+)(.*)$"""),
    # not sure whether event content is required
    (("", "event_content", ""), r"""^Using user-specified cmsDriver.py options:(.*)--eventcontent ([^\s]+)(.*)$""", "req"),
    # TODO: after changing the splitter to "taskset -c ...", this is no longer included in the part of the correct job
    #(("input_user_root_file", ), r"""^For these tests will use user input file (.+)$"""),
)

# parse each of the TimeSize jobs: find candles, etc. and start/end times
reExit_code = re.compile(r"""Individual ([^\s]+) ExitCode (\d+)""")

if self._DEBUG:
    print "TimeSize (%d) jobs: %s" % (len(jobs), str(jobs))

for job_lines in jobs:
    """ apply the defined parsing rules to extract the required fields
    of information into the dictionary (as defined in the parsing rules) """
    info = self._applyParsingRules(parsing_rules, job_lines)
    # fix compatibility with the new cmsDriver.py --conditions option
    # (autoconditions are now supported and FrontierConditions_GlobalTag is optional):
    if 'auto:' in info['conditions']:
        from Configuration.PyReleaseValidation.autoCond import autoCond
        info['conditions'] = autoCond[ info['conditions'].split(':')[1] ].split("::")[0]
    else:
        if 'FrontierConditions_GlobalTag' in info['conditions']:
            info['conditions'] = info['conditions'].split(",")[1]

    #DEBUG:
    #print "CONDITIONS are: %s" % info['conditions']

    """ start time: the marker line ("Written out cmsRelvalreport.py input file at:")
    is not available in one of the releases, so instead we use the first timestamp
    available in our job; that is the starting time """
    #start_time_after = self.findFirstIndex_ofStartsWith(job_lines, "Written out cmsRelvalreport.py input file at:")
    info["start"] = self.firstTimeStampAfter(0, job_lines)

    # TODO: in case of future changes we could use findBefore instead, which takes a regexp as its search parameter
    # end time: the index before which the time stamp appears
    # on older files we have "Individual Relvalreport.py ExitCode 0" instead of "Individual cmsRelvalreport.py ExitCode"
    end_time_before = self.findLineAfter(0, job_lines, test_condition = reExit_code.match, return_index = True)

    # the same line carries the exit code, so extract it as well
    nothing, exit_code = reExit_code.match(job_lines[end_time_before]).groups()

    info["end"] = self.firstTimeStampBefore(end_time_before, job_lines)
    info["exit_code"] = exit_code

    steps_start = self.findFirstIndex_ofStartsWith(job_lines, "You defined your own steps to run:")
    steps_end = self.findFirstIndex_ofStartsWith(job_lines, "*Candle ")
    # the steps run from the line after the marker up to the "*Candle ..." line
    steps = job_lines[steps_start + 1:steps_end]
    if not self.validateSteps(steps):
        self.handleParsingError("Steps were not found correctly: %s for current job: %s" % (str(steps), str(job_lines)))

        """ quite nasty, just a workaround """
        print "Trying to recover from this error in case of old cmssw"

        """ we assume that the steps are between the following sentence and a timestamp """
        steps_start = self.findFirstIndex_ofStartsWith(job_lines, "Steps passed to writeCommands")
        steps_end = self.findLineAfter(steps_start, job_lines, test_condition = self.isTimeStamp, return_index = True)

        steps = job_lines[steps_start + 1:steps_end]
        if not self.validateSteps(steps):
            self.handleParsingError("EVEN AFTER RECOVERY steps were not found correctly! %s for current job: %s" % (str(steps), str(job_lines)))
        else:
            print "RECOVERY SEEMS to be successful: %s" % str(steps)

    info["steps"] = self._LINE_SEPARATOR.join(steps)  # NOTE: steps might contain a comma: ","

    timesize_result.append(info)

return {"TimeSize": timesize_result}
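The job-splitting loop at the top of the listing is self-contained enough to sketch in isolation. The following is a minimal, hypothetical re-implementation (written for Python 3, unlike the Python 2 source above; `split_jobs` is an illustrative name, not a method of the class). Tracking the start index with `None` rather than `False` avoids the corner case where the first job begins at line index 0, which is falsy:

```python
import re

def split_jobs(lines, start_pattern=r"^taskset -c (\d+) cmsRelvalreportInput\.py"):
    """Cut a flat log into per-job chunks at every line matching start_pattern."""
    start_indicator = re.compile(start_pattern)
    jobs = []
    start = None
    for index, line in enumerate(lines):
        if start_indicator.match(line):
            if start is not None:
                jobs.append(lines[start:index])
            start = index
    if start is not None:
        jobs.append(lines[start:])  # the last job runs to the end of the log
    return jobs
```

Each chunk then gets `_applyParsingRules` applied to it independently, so a regexp only ever matches lines belonging to one candle/pileup run.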
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimark ( self, main_cores = [1] )
Definition at line 576 of file parserPerfsuiteMetadata.py.
main_core = main_cores[0]
# TODO: we do not always really know the main core number, but we don't care too much
# we parse each of the SciMark files and the Composite scores
csimark = []
csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2", testType = "mainCore", core = main_core))
csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark2_large", testType = "mainCore_Large", core = main_core))

# we do not always know the number of available cores, so we search the directory to find the core numbers
reIsCsiMark_notusedcore = re.compile("^cmsScimark_(\d+).log$")
scimark_files = [reIsCsiMark_notusedcore.match(f).groups()[0]
                 for f in os.listdir(self._path)
                 if reIsCsiMark_notusedcore.match(f)
                     and os.path.isfile(os.path.join(self._path, f))]

for core_number in scimark_files:
    try:
        csimark.extend(self.readCmsScimarkTest(testName = "cmsScimark_%s" % str(core_number), testType = "NotUsedCore_%s" % str(core_number), core = core_number))
    except IOError, e:
        if self._DEBUG:
            print e
return csimark
#print csimark
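The directory scan above, which discovers core numbers from file names, can be sketched as a standalone helper (a hypothetical Python 3 version; `find_core_numbers` is an illustrative name, not part of the class):

```python
import os
import re

def find_core_numbers(path):
    """Return the core-number strings from files named cmsScimark_<N>.log in path."""
    pattern = re.compile(r"^cmsScimark_(\d+)\.log$")
    cores = []
    for name in os.listdir(path):
        match = pattern.match(name)
        if match and os.path.isfile(os.path.join(path, name)):
            cores.append(match.group(1))
    return sorted(cores)
```

Note that, like the original list comprehension, this keeps the core numbers as strings taken straight from the regexp group, so the sort order is lexicographic rather than numeric.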
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readCmsScimarkTest ( self, testName, testType, core )
Definition at line 564 of file parserPerfsuiteMetadata.py.
lines = self.readInput(self._path, fileName = testName + ".log")
scores = [{"score": self.reCmsScimarkTest.match(line).groups()[1], "type": testType, "core": core}
          for line in lines
          if self.reCmsScimarkTest.match(line)]
# add the measurement number
i = 0
for score in scores:
    i += 1
    score.update({"messurement_number": i})
return scores
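The same extract-and-number pattern can be written more idiomatically with `enumerate`. This is a hypothetical sketch, not the class method: the score-line regexp is assumed for illustration (the real format is whatever `self.reCmsScimarkTest` matches), and the key is spelled `measurement_number` here, whereas the class stores it under the misspelled key `messurement_number`:

```python
import re

# hypothetical score-line format, assumed for illustration only
RE_SCORE = re.compile(r"^Composite Score:\s*([\d.]+)")

def read_scores(lines, test_type, core):
    """Extract score dicts from log lines and number them 1..N in order."""
    scores = [
        {"score": m.group(1), "type": test_type, "core": core}
        for m in (RE_SCORE.match(line) for line in lines)
        if m
    ]
    for i, score in enumerate(scores, start=1):
        score["measurement_number"] = i
    return scores
```

Matching each line once and reusing the match object also avoids the double `match()` call the original comprehension performs per scoring line.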
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::readInput ( self, path, fileName = "cmsPerfSuite.log" )
Definition at line 161 of file parserPerfsuiteMetadata.py.
def parserPerfsuiteMetadata::parserPerfsuiteMetadata::validateSteps ( self, steps )
A simple function for error detection. TODO: we could also validate against a list of known steps.
Definition at line 24 of file parserPerfsuiteMetadata.py.
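Since the source of `validateSteps` is not listed on this page, the following is only a plausible sketch of such a check, under the assumption that a step list is valid when it is non-empty, within a maximum length (the class keeps its bound in `self._MAX_STEPS`; the value 5 here is assumed), and contains no blank entries:

```python
MAX_STEPS = 5  # assumed bound; the real class stores this in self._MAX_STEPS

def validate_steps(steps):
    """Plausibility check for an extracted step list (hypothetical sketch)."""
    return 0 < len(steps) <= MAX_STEPS and all(s.strip() for s in steps)
```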
Definition at line 28 of file parserPerfsuiteMetadata.py.
string parserPerfsuiteMetadata::parserPerfsuiteMetadata::_LINE_SEPARATOR = "|" [static, private] |
Definition at line 23 of file parserPerfsuiteMetadata.py.
Definition at line 34 of file parserPerfsuiteMetadata.py.