crabFunctions.CrabTask Class Reference

Class for a single CrabRequest. This class represents one crab3 task/request.

Public Member Functions

def __init__ (self, taskname="", crab_config="", crabController=None, initUpdate=True, debuglevel="ERROR", datasetpath="", localDir="", outlfn="")
 The object constructor.

def crab_folder (self)

def crabConfig (self)
 Function to access the crab config object, reading it first if it is uninitialized.

def crabFolder (self)

def datasetpath (self)

def handleNoState (self)
 Function to handle a task which received NOSTATE status.

def isData (self)
 Property function to find out if the task runs on data.

def readLogArch (self, logArchName)
 Function to read log info from log.tar.gz.

def resubmit_failed (self)
 Function to resubmit failed jobs in tasks.

def test_print (self)

def update (self)
 Function to update the task and its associated jobs.

def updateJobStats (self, dCacheFileList=None)
 Function to update the job statistics.
 

Public Attributes

 debug
 
 failureReason
 
 finalFiles
 
 isUpdating
 
 jobs
 
 lastUpdate
 
 localDir
 
 log
 
 maxjobnumber
 
 name
 
 nComplete
 
 nCooloff
 
 nFailed
 
 nFinished
 
 nIdle
 
 nJobs
 
 nRunning
 
 nTransferring
 
 nUnsubmitted
 
 outlfn
 
 resubmitCount
 
 state
 
 taskId
 
 totalEvents
 
 uuid
 

Private Attributes

 _crabConfig
 
 _crabFolder
 
 _datasetpath_default
 
 _isData
 

Detailed Description

Class for a single CrabRequest. This class represents one crab3 task/request.

Definition at line 371 of file crabFunctions.py.

Constructor & Destructor Documentation

◆ __init__()

def crabFunctions.CrabTask.__init__ (   self,
  taskname = "",
  crab_config = "",
  crabController = None,
  initUpdate = True,
  debuglevel = "ERROR",
  datasetpath = "",
  localDir = "",
  outlfn = "" 
)

The object constructor.

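Parameters
    self: The object pointer.
    taskname: The name of the task. If it is empty, crab_config must point to an existing crab config file, and the name is read from General.requestName in that config.
    initUpdate: Flag indicating whether crab status should be called when an instance is created.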

Definition at line 386 of file crabFunctions.py.

    def __init__( self, taskname="", crab_config="", crabController=None,
                  initUpdate=True, debuglevel="ERROR", datasetpath="",
                  localDir="", outlfn="" ):

        # crab config as a python object should only be used via .config
        self._crabConfig = None

        self._crabFolder = None

        if taskname:
            self.name = taskname
        else:
            if not crab_config:
                raise ValueError("Either taskname or crab_config needs to be set")
            if not os.path.exists( crab_config):
                raise IOError("File %s not found" % crab_config )
            self.name = crab_config
            self.name = self.crabConfig.General.requestName
        self.uuid = uuid.uuid4()
        #~ self.lock = multiprocessing.Lock()
        #setup logging
        self.log = logging.getLogger( 'crabTask' )
        self.log.setLevel(logging.getLevelName(debuglevel))
        self.jobs = {}
        self.localDir = localDir
        self.outlfn = outlfn
        self.isUpdating = False
        self.taskId = -1
        #variables for statistics
        self.nJobs = 0
        self.state = "NOSTATE"
        self.maxjobnumber = 0
        self.nUnsubmitted = 0
        self.nIdle = 0
        self.nRunning = 0
        self.nTransferring = 0
        self.nCooloff = 0
        self.nFailed = 0
        self.nFinished = 0
        self.nComplete = 0
        self.failureReason = None
        self.lastUpdate = datetime.datetime.now().strftime( "%Y-%m-%d_%H.%M.%S" )

        self._isData = None
        self.resubmitCount = 0

        self.debug = False

        self.finalFiles = []
        self.totalEvents = 0

        self._datasetpath_default = datasetpath

        #start with first updates
        if initUpdate:
            self.update()
            self.updateJobStats()
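
The name handling at the top of the constructor can be sketched in isolation. The helper `resolve_task_name` and its `read_request_name` callback are hypothetical stand-ins introduced for illustration; the real class reads the name from `crabConfig.General.requestName`. Only the order of the checks follows the listing above.

```python
import os
import tempfile

def resolve_task_name(taskname="", crab_config="", read_request_name=None):
    """Hypothetical helper mirroring the constructor's name handling."""
    if taskname:
        return taskname
    if not crab_config:
        raise ValueError("Either taskname or crab_config needs to be set")
    if not os.path.exists(crab_config):
        raise IOError("File %s not found" % crab_config)
    # Assumption: the caller supplies a function that extracts the request
    # name from the config file (CrabTask uses crabConfig.General.requestName).
    return read_request_name(crab_config)

# An explicit task name wins; otherwise the config file must exist.
name_direct = resolve_task_name(taskname="Data_Run2016B")
with tempfile.NamedTemporaryFile(suffix="_cfg.py") as cfg:
    name_from_cfg = resolve_task_name(
        crab_config=cfg.name,
        read_request_name=lambda path: "MyRequest",  # made-up request name
    )
```

Passing neither `taskname` nor `crab_config` raises `ValueError`, which matches the first check in the constructor.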

Member Function Documentation

◆ crab_folder()

def crabFunctions.CrabTask.crab_folder (   self)

Definition at line 506 of file crabFunctions.py.

References crabFunctions.CrabTask.crabConfig().

Referenced by crabFunctions.CrabTask.update().

    def crab_folder(self):
        return os.path.join( self.crabConfig.General.workArea,
                             "crab_" + self.crabConfig.General.requestName)
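
As a rough illustration of the path this method builds, the sketch below uses a `SimpleNamespace` object as a stand-in for the crab config. The stand-in and the values `crab_projects` and `MyRequest` are made up for the example.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for a crab config object; the values are invented.
config = SimpleNamespace(
    General=SimpleNamespace(workArea="crab_projects", requestName="MyRequest")
)

def crab_folder(cfg):
    """Join the work area with the 'crab_'-prefixed request name."""
    return os.path.join(cfg.General.workArea, "crab_" + cfg.General.requestName)

folder = crab_folder(config)
```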

◆ crabConfig()

def crabFunctions.CrabTask.crabConfig (   self)

Function to access the crab config object, reading it first if it is uninitialized.

Parameters
    self: The CrabTask object pointer.

Definition at line 464 of file crabFunctions.py.

References crabFunctions.CrabTask._crabConfig, AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, and hTMaxCell.name.

Referenced by crabFunctions.CrabTask.crab_folder(), crabFunctions.CrabTask.crabFolder(), crabFunctions.CrabTask.datasetpath(), and crabFunctions.CrabTask.isData().

    def crabConfig( self ):
        if self._crabConfig is None:
            crab = CrabController()
            self._crabConfig = crab.readCrabConfig( self.name )
        return self._crabConfig
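
The read-on-first-access pattern used here can be sketched without CMSSW. `LazyConfigHolder` and its `loader` argument are hypothetical; `loader` stands in for `CrabController.readCrabConfig`, and the dictionary it returns is invented for the example.

```python
class LazyConfigHolder:
    """Minimal sketch of read-on-first-access caching (not the CMSSW class)."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader   # hypothetical callable replacing readCrabConfig
        self._config = None
        self.load_count = 0     # bookkeeping for the example only

    @property
    def config(self):
        # Load once, then keep returning the cached object.
        if self._config is None:
            self.load_count += 1
            self._config = self._loader(self.name)
        return self._config

holder = LazyConfigHolder("MyRequest", loader=lambda name: {"requestName": name})
first = holder.config
second = holder.config
```

Both accesses return the same object and the loader runs only once. One caveat of this pattern is that a loader returning `None` would be called again on every access.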

◆ crabFolder()

def crabFunctions.CrabTask.crabFolder (   self)

Definition at line 479 of file crabFunctions.py.

References crabFunctions.CrabTask._crabFolder, crabFunctions.CrabTask.crabConfig(), relativeConstraints.error, crabFunctions.CrabTask.log, AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, and hTMaxCell.name.

    def crabFolder( self ):
        if self._crabFolder is not None: return self._crabFolder
        crab = CrabController()
        if os.path.exists( os.path.join( self.crabConfig.General.workArea, crab._prepareFoldername( self.name ) ) ):
            self._crabFolder = os.path.join( self.crabConfig.General.workArea, crab._prepareFoldername( self.name ) )
            return self._crabFolder
        # fall back to the current working directory (os.path has no cwd(); use os.getcwd())
        alternative_path = os.path.join(os.getcwd(), crab._prepareFoldername( self.name ) )
        if os.path.exists( alternative_path ):
            self._crabFolder = alternative_path
            return self._crabFolder
        self.log.error( "Unable to find folder for Task")
        return ""

◆ datasetpath()

def crabFunctions.CrabTask.datasetpath (   self)

Definition at line 471 of file crabFunctions.py.

References crabFunctions.CrabTask._datasetpath_default, and crabFunctions.CrabTask.crabConfig().

    def datasetpath( self ):
        try:
            return self.crabConfig.Data.inputDataset
        except:
            pass
        return self._datasetpath_default
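
The fallback to the constructor's `datasetpath` argument can be illustrated with a small sketch. It assumes that a missing `inputDataset` surfaces as an `AttributeError`, which is true for the `SimpleNamespace` stand-ins used here but is only an assumption for a real crab config. The dataset names are invented.

```python
from types import SimpleNamespace

def dataset_path(cfg, default=""):
    """Return cfg.Data.inputDataset, or the default if it is not defined."""
    try:
        return cfg.Data.inputDataset
    except AttributeError:
        # Narrower than the bare except in the listing: other errors propagate.
        return default

with_dataset = SimpleNamespace(
    Data=SimpleNamespace(inputDataset="/Primary/Processed/TIER")
)
without_dataset = SimpleNamespace(Data=SimpleNamespace())

from_config = dataset_path(with_dataset, default="/fallback/path")
from_default = dataset_path(without_dataset, default="/fallback/path")
```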

◆ handleNoState()

def crabFunctions.CrabTask.handleNoState (   self)

Function to handle Task which received NOSTATE status.

Parameters
    self: The CrabTask object pointer.

Definition at line 541 of file crabFunctions.py.

References AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, hTMaxCell.name, crabFunctions.CrabTask.resubmitCount, CastorLedAnalysis.state, HcalLedAnalysis.state, HcalPedestalAnalysis.state, CastorPedestalAnalysis.state, and crabFunctions.CrabTask.state.

Referenced by crabFunctions.CrabTask.update().

    def handleNoState( self ):
        crab = CrabController()
        # use the task's own failure reason ("task" is not defined in this scope)
        # and guard against None before the substring test
        if self.failureReason is not None and "The CRAB3 server backend could not resubmit your task because the Grid scheduler answered with an error." in self.failureReason:
            # move folder and try it again
            cmd = 'mv %s bak_%s' %(crab._prepareFoldername( self.name ),crab._prepareFoldername( self.name ))
            p = subprocess.Popen(cmd,stdout=subprocess.PIPE, shell=True)#,shell=True,universal_newlines=True)
            (out,err) = p.communicate()
            self.state = "SHEDERR"
            configName = '%s_cfg.py' %(crab._prepareFoldername( self.name ))
            crab.submit( configName )

        elif self.failureReason is not None:
            self.state = "ERRHANDLE"
            crab.resubmit( self.name )
        self.resubmitCount += 1

◆ isData()

def crabFunctions.CrabTask.isData (   self)

Property function to find out if task runs on data.

Parameters
    self: The CrabTask object pointer.

Definition at line 447 of file crabFunctions.py.

References crabFunctions.CrabTask._isData, crabFunctions.CrabTask.crabConfig(), AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, and hTMaxCell.name.

    def isData( self ):
        if self._isData is None:
            try:
                test = self.crabConfig.Data.lumiMask
                self._isData = True
            except:
                if self.name.startswith( "Data_" ):
                    self._isData = True
                else:
                    self._isData = False
        return self._isData
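
The decision rule in this listing (a lumi mask in the config, or a task name starting with `Data_`) can be sketched as a plain function. The `SimpleNamespace` configs and the task names are invented for the example, and `hasattr` is used in place of the try/except under the assumption that a missing `lumiMask` shows up as a missing attribute.

```python
from types import SimpleNamespace

def is_data(cfg, task_name):
    """True if the config defines a lumi mask or the name starts with 'Data_'."""
    return hasattr(cfg.Data, "lumiMask") or task_name.startswith("Data_")

masked = SimpleNamespace(Data=SimpleNamespace(lumiMask="lumis.json"))
unmasked = SimpleNamespace(Data=SimpleNamespace())

by_mask = is_data(masked, "SomeTask")
by_name = is_data(unmasked, "Data_Run2016B")
simulated = is_data(unmasked, "MC_DrellYan")
```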

◆ readLogArch()

def crabFunctions.CrabTask.readLogArch (   self,
  logArchName 
)

Function to read log info from log.tar.gz.

Parameters
    self: The object pointer.
    logArchName: Path to the compressed log file.
Returns
    A dictionary with the parsed info.

Definition at line 598 of file crabFunctions.py.

References createfilelist.int, print(), and submitPVValidationJobs.split().

    def readLogArch(self, logArchName):
        JobNumber = logArchName.split("/")[-1].split("_")[1].split(".")[0]
        log = {'readEvents' : 0}
        with tarfile.open( logArchName, "r") as tar:
            try:
                JobXmlFile = tar.extractfile('FrameworkJobReport-%s.xml' % JobNumber)
                root = ET.fromstring( JobXmlFile.read() )
                for child in root:
                    if child.tag == 'InputFile':
                        for subchild in child:
                            if subchild.tag == 'EventsRead':
                                nEvents = int(subchild.text)
                                log.update({'readEvents' : nEvents})
                                break
                        break
            except:
                print("Can not parse / read %s" % logArchName)
        return log
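
To see the parsing step work end to end, the following sketch builds a tiny archive and reads the event count back. The archive name `cmsRun_7.log.tar.gz`, the XML content, and the helper name `read_events` are assumptions made for the example; only the member name pattern `FrameworkJobReport-<n>.xml` and the `InputFile`/`EventsRead` tags come from the listing above.

```python
import io
import os
import tarfile
import tempfile
import xml.etree.ElementTree as ET

def read_events(log_arch_name):
    """Return the first InputFile/EventsRead value in the job report, or 0."""
    job_number = os.path.basename(log_arch_name).split("_")[1].split(".")[0]
    with tarfile.open(log_arch_name, "r") as tar:
        report = tar.extractfile("FrameworkJobReport-%s.xml" % job_number)
        root = ET.fromstring(report.read())
    node = root.find("InputFile/EventsRead")
    return int(node.text) if node is not None else 0

# Build a small archive with made-up content to exercise the parser.
xml = (b"<FrameworkJobReport><InputFile>"
       b"<EventsRead>1234</EventsRead>"
       b"</InputFile></FrameworkJobReport>")
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "cmsRun_7.log.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo("FrameworkJobReport-7.xml")
    info.size = len(xml)
    tar.addfile(info, io.BytesIO(xml))

events = read_events(archive)
```

Unlike the method in the listing, this sketch lets exceptions propagate instead of printing a message, which makes a missing or malformed report easier to notice.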

◆ resubmit_failed()

def crabFunctions.CrabTask.resubmit_failed (   self)

Function to resubmit failed jobs in tasks.

Parameters
    self: The CrabTask object pointer.

Definition at line 495 of file crabFunctions.py.

References crabFunctions.CrabTask.jobs, relativeConstraints.keys, crabFunctions.CrabTask.lastUpdate, AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, and hTMaxCell.name.

    def resubmit_failed( self ):
        failedJobIds = []
        controller = CrabController()
        for jobkey in self.jobs.keys():
            job = self.jobs[jobkey]
            if job['State'] == 'failed':
                failedJobIds.append( job['JobIds'][-1] )
        controller.resubmit( self.name, joblist = failedJobIds )
        self.lastUpdate = datetime.datetime.now().strftime( "%Y-%m-%d_%H.%M.%S" )
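
The selection of job IDs passed to the resubmit call can be sketched on its own. The `jobs` dictionary below is invented; it only assumes the `State` and `JobIds` keys that the listing above reads, and the ID strings are placeholders.

```python
def failed_job_ids(jobs):
    """Collect the most recent job ID of every job whose state is 'failed'."""
    return [job["JobIds"][-1] for job in jobs.values() if job["State"] == "failed"]

# Made-up job records with only the keys used by resubmit_failed.
jobs = {
    "1": {"State": "finished", "JobIds": ["100.0"]},
    "2": {"State": "failed", "JobIds": ["101.0", "101.1"]},
    "3": {"State": "failed", "JobIds": ["102.0"]},
}

to_resubmit = failed_job_ids(jobs)
```

For each failed job only the last entry of `JobIds` is kept, so a job that already went through one resubmission contributes its latest ID.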

◆ test_print()

def crabFunctions.CrabTask.test_print (   self)

Definition at line 557 of file crabFunctions.py.

References crabFunctions.CrabTask.uuid.

    def test_print(self):
        return self.uuid

◆ update()

def crabFunctions.CrabTask.update (   self)

Function to update the task and its associated jobs.

Parameters
    self: The CrabTask object pointer.

Definition at line 512 of file crabFunctions.py.

References crabFunctions.CrabTask.crab_folder(), crabFunctions.CrabTask.debug, crabFunctions.CrabTask.failureReason, crabFunctions.CrabTask.handleNoState(), crabFunctions.CrabTask.isUpdating, crabFunctions.CrabTask.jobs, crabFunctions.CrabTask.lastUpdate, crabFunctions.CrabTask.log, AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, hTMaxCell.name, Mpslibclass.jobdatabase.nJobs, crabFunctions.CrabTask.nJobs, crabFunctions.CrabTask.resubmitCount, CosmicNavigationSchool::CosmicNavigationSchoolConfiguration.self, DDLSAX2FileHandler.self(), CastorLedAnalysis.state, HcalLedAnalysis.state, HcalPedestalAnalysis.state, CastorPedestalAnalysis.state, crabFunctions.CrabTask.state, and crabFunctions.CrabTask.updateJobStats().

Referenced by progressbar.ProgressBar.__next__(), MatrixUtil.Matrix.__setitem__(), MatrixUtil.Steps.__setitem__(), dqm-mbProfile.Profile.finish(), progressbar.ProgressBar.finish(), and MatrixUtil.Steps.overwrite().

    def update(self):
        #~ self.lock.acquire()
        self.log.debug( "Start update for task %s" % self.name )
        self.isUpdating = True
        controller = CrabController()
        self.state = "UPDATING"
        # check if we should drop this sample due to missing info

        self.log.debug( "Try to get status for task" )
        self.state , self.jobs,self.failureReason = controller.status(self.crab_folder)
        self.log.debug( "Found state: %s" % self.state )
        if self.state=="FAILED":
            #try it once more
            time.sleep(2)
            self.state , self.jobs,self.failureReason = controller.status(self.crab_folder)
        self.nJobs = len(self.jobs)
        self.updateJobStats()
        if self.state == "NOSTATE":
            self.log.debug( "Trying to resubmit because of NOSTATE" )
            # call the method on self directly ("self.self" does not exist)
            if self.resubmitCount < 3: self.handleNoState()
        # add to db if not
        # Final solution if state not yet found
        self.isUpdating = False
        self.lastUpdate = datetime.datetime.now().strftime( "%Y-%m-%d_%H.%M.%S" )
        #~ self.lock.release()
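
The "query once more if the task reports FAILED" step of this method can be sketched as a standalone function. `status_with_retry` and the `get_status` callback are hypothetical; `get_status` stands in for `controller.status(...)` and is assumed to return a `(state, jobs, failureReason)` tuple as in the listing. The scripted responses are invented.

```python
import time

def status_with_retry(get_status, delay=2.0):
    """Query the status and repeat the query once if the state is FAILED."""
    state, jobs, reason = get_status()
    if state == "FAILED":
        time.sleep(delay)
        state, jobs, reason = get_status()
    return state, jobs, reason

# Scripted responses: the first query fails, the second succeeds.
responses = iter([
    ("FAILED", {}, "transient error"),
    ("SUBMITTED", {"1": {"State": "running"}}, None),
])
state, jobs, reason = status_with_retry(lambda: next(responses), delay=0.0)
```

The retry is attempted exactly once, so a task that is still FAILED after the second query keeps that state.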

◆ updateJobStats()

def crabFunctions.CrabTask.updateJobStats (   self,
  dCacheFileList = None 
)

Function to update JobStatistics.

Parameters
    self: The object pointer.
    dCacheFileList: A list of files on the dCache.

Definition at line 563 of file crabFunctions.py.

References any(), createfilelist.int, crabFunctions.CrabTask.jobs, relativeConstraints.keys, AlignableObjectId::entry.name, XMLProcessor::_loaderBaseConfig.name, h4DSegm.name, TrackerSectorStruct.name, MuonGeometrySanityCheckPoint.name, classes.MonitorData.name, classes.OutputData.name, h2DSegm.name, geometry.Structure.name, plotscripts.SawTeethFunction.name, crabFunctions.CrabTask.name, hTMaxCell.name, crabFunctions.CrabTask.nComplete, and print().

Referenced by crabFunctions.CrabTask.update().

    def updateJobStats(self,dCacheFileList = None):
        jobKeys = sorted(self.jobs.keys())
        try:
            intJobkeys = [int(x) for x in jobKeys]
        except:
            print("error parsing job numbers to int")

        #maxjobnumber = max(intJobkeys)

        stateDict = {'unsubmitted':0,'idle':0,'running':0,'transferring':0,'cooloff':0,'failed':0,'finished':0}
        nComplete = 0

        # loop through jobs
        for key in jobKeys:
            job = self.jobs[key]
            # check if all completed files are on dCache
            for statekey in stateDict.keys():
                if statekey in job['State']:
                    stateDict[statekey]+=1
                    # check if finished files are found on dCache if dCacheFileList is given
                    if dCacheFileList is not None:
                        outputFilename = "%s_%s"%( self.name, key)
                        if 'finished' in statekey and any(outputFilename in s for s in dCacheFileList):
                            nComplete +=1

        for state in stateDict:
            attrname = "n" + state.capitalize()
            setattr(self, attrname, stateDict[state])
        self.nComplete = nComplete
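
The counting logic of this method can be sketched as a pure function that returns the attribute values instead of setting them. The function name `tally_job_states`, the job records, and the file path are invented; the state names and the `"n" + state.capitalize()` naming follow the listing above.

```python
STATES = ("unsubmitted", "idle", "running", "transferring",
          "cooloff", "failed", "finished")

def tally_job_states(jobs, task_name, dcache_files=None):
    """Count jobs per state and finished jobs whose output is on the dCache."""
    counts = {state: 0 for state in STATES}
    n_complete = 0
    for key, job in jobs.items():
        for state in STATES:
            if state in job["State"]:
                counts[state] += 1
                # A finished job counts as complete if a listed file name
                # contains "<task name>_<job key>".
                if (state == "finished" and dcache_files is not None
                        and any("%s_%s" % (task_name, key) in f
                                for f in dcache_files)):
                    n_complete += 1
    # Attribute names as built in the listing: "n" + state.capitalize()
    attrs = {"n" + state.capitalize(): n for state, n in counts.items()}
    attrs["nComplete"] = n_complete
    return attrs

jobs = {"1": {"State": "finished"},
        "2": {"State": "running"},
        "3": {"State": "finished"}}
stats = tally_job_states(jobs, "MyTask",
                         dcache_files=["/store/user/MyTask_1.root"])
```

Because the file match is a substring test, a name such as `MyTask_1` would also match a file for job 10; the sketch keeps that behaviour from the listing rather than correcting it.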

Member Data Documentation

◆ _crabConfig

crabFunctions.CrabTask._crabConfig
private

Definition at line 389 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.crabConfig().

◆ _crabFolder

crabFunctions.CrabTask._crabFolder
private

Definition at line 391 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.crabFolder().

◆ _datasetpath_default

crabFunctions.CrabTask._datasetpath_default
private

Definition at line 436 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.datasetpath().

◆ _isData

crabFunctions.CrabTask._isData
private

Definition at line 427 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.isData().

◆ debug

crabFunctions.CrabTask.debug

◆ failureReason

crabFunctions.CrabTask.failureReason

Definition at line 424 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.update().

◆ finalFiles

crabFunctions.CrabTask.finalFiles

Definition at line 432 of file crabFunctions.py.

◆ isUpdating

crabFunctions.CrabTask.isUpdating

Definition at line 410 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.update().

◆ jobs

crabFunctions.CrabTask.jobs

◆ lastUpdate

crabFunctions.CrabTask.lastUpdate

◆ localDir

crabFunctions.CrabTask.localDir

Definition at line 408 of file crabFunctions.py.

◆ log

crabFunctions.CrabTask.log

◆ maxjobnumber

crabFunctions.CrabTask.maxjobnumber

Definition at line 415 of file crabFunctions.py.

◆ name

crabFunctions.CrabTask.name

Definition at line 394 of file crabFunctions.py.

Referenced by ElectronMVAID.ElectronMVAID.__call__(), FWLite.ElectronMVAID.__call__(), dirstructure.Directory.__create_pie_image(), DisplayManager.DisplayManager.__del__(), dqm_interfaces.DirID.__eq__(), dirstructure.Directory.__get_full_path(), dirstructure.Comparison.__get_img_name(), dirstructure.Comparison.__make_image(), core.autovars.NTupleVariable.__repr__(), core.autovars.NTupleObjectType.__repr__(), core.autovars.NTupleObject.__repr__(), core.autovars.NTupleCollection.__repr__(), dirstructure.Directory.__repr__(), dqm_interfaces.DirID.__repr__(), dirstructure.Comparison.__repr__(), config.Service.__setattr__(), config.CFG.__str__(), counter.Counter.__str__(), average.Average.__str__(), FWLite.WorkingPoints._reformat_cut_definitions(), core.autovars.NTupleObjectType.addSubObjects(), core.autovars.NTupleObjectType.addVariables(), core.autovars.NTupleObjectType.allVars(), dataset.CMSDataset.buildListOfFiles(), dataset.LocalDataset.buildListOfFiles(), dataset.CMSDataset.buildListOfFilesDBS(), dirstructure.Directory.calcStats(), crabFunctions.CrabTask.crabConfig(), crabFunctions.CrabTask.crabFolder(), validation.Sample.digest(), python.rootplot.utilities.Hist.divide(), python.rootplot.utilities.Hist.divide_wilson(), DisplayManager.DisplayManager.Draw(), TreeCrawler.Package.dump(), core.autovars.NTupleVariable.fillBranch(), core.autovars.NTupleObject.fillBranches(), core.autovars.NTupleCollection.fillBranchesScalar(), core.autovars.NTupleCollection.fillBranchesVector(), core.autovars.NTupleCollection.get_cpp_declaration(), core.autovars.NTupleCollection.get_cpp_wrapper_class(), core.autovars.NTupleCollection.get_py_wrapper_class(), utils.StatisticalTest.get_status(), production_tasks.Task.getname(), dataset.CMSDataset.getPrimaryDatasetEntries(), dataset.PrivateDataset.getPrimaryDatasetEntries(), crabFunctions.CrabTask.handleNoState(), VIDSelectorBase.VIDSelectorBase.initialize(), crabFunctions.CrabTask.isData(), personalPlayback.Applet.log(), 
core.autovars.NTupleVariable.makeBranch(), core.autovars.NTupleObject.makeBranches(), core.autovars.NTupleCollection.makeBranchesScalar(), core.autovars.NTupleCollection.makeBranchesVector(), dirstructure.Directory.print_report(), dataset.BaseDataset.printInfo(), dataset.Dataset.printInfo(), crabFunctions.CrabTask.resubmit_failed(), production_tasks.MonitorJobs.run(), python.rootplot.utilities.Hist.TGraph(), python.rootplot.utilities.Hist.TH1F(), crabFunctions.CrabTask.update(), crabFunctions.CrabTask.updateJobStats(), counter.Counter.write(), and average.Average.write().

◆ nComplete

crabFunctions.CrabTask.nComplete

Definition at line 423 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.updateJobStats().

◆ nCooloff

crabFunctions.CrabTask.nCooloff

Definition at line 420 of file crabFunctions.py.

◆ nFailed

crabFunctions.CrabTask.nFailed

Definition at line 421 of file crabFunctions.py.

◆ nFinished

crabFunctions.CrabTask.nFinished

Definition at line 422 of file crabFunctions.py.

◆ nIdle

crabFunctions.CrabTask.nIdle

Definition at line 417 of file crabFunctions.py.

◆ nJobs

crabFunctions.CrabTask.nJobs

Definition at line 413 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.update().

◆ nRunning

crabFunctions.CrabTask.nRunning

Definition at line 418 of file crabFunctions.py.

◆ nTransferring

crabFunctions.CrabTask.nTransferring

Definition at line 419 of file crabFunctions.py.

◆ nUnsubmitted

crabFunctions.CrabTask.nUnsubmitted

Definition at line 416 of file crabFunctions.py.

◆ outlfn

crabFunctions.CrabTask.outlfn

Definition at line 409 of file crabFunctions.py.

◆ resubmitCount

crabFunctions.CrabTask.resubmitCount

◆ state

crabFunctions.CrabTask.state

◆ taskId

crabFunctions.CrabTask.taskId

Definition at line 411 of file crabFunctions.py.

◆ totalEvents

crabFunctions.CrabTask.totalEvents

Definition at line 433 of file crabFunctions.py.

◆ uuid

crabFunctions.CrabTask.uuid

Definition at line 402 of file crabFunctions.py.

Referenced by crabFunctions.CrabTask.test_print().