dataset.BaseDataset Class Reference
Inherited by dataset.CMSDataset, dataset.Dataset, dataset.EOSDataset, dataset.LocalDataset, and dataset.PrivateDataset.

Public Member Functions

def __init__ (self, name, user, pattern='.*root', run_range=None, dbsInstance=None)
 
def buildListOfBadFiles (self)
 
def buildListOfFiles (self, pattern)
 
def extractFileSizes (self)
 
def getPrimaryDatasetEntries (self)
 
def listOfFiles (self)
 
def listOfGoodFiles (self)
 
def listOfGoodFilesWithPrescale (self, prescale)
 
def printFiles (self, abspath=True, info=True)
 
def printInfo (self)
 

Public Attributes

 bad_files
 
 dbsInstance
 MM.
 
 files
 
 filesAndSizes
 
 good_files
 
 name
 
 pattern
 
 primaryDatasetEntries
 MM.
 
 report
 
 run_range
 
 user
 

Detailed Description

Definition at line 22 of file dataset.py.

Constructor & Destructor Documentation

◆ __init__()

def dataset.BaseDataset.__init__ (   self,
  name,
  user,
  pattern = '.*root',
  run_range = None,
  dbsInstance = None 
)

def __init__(self, name, user, pattern='.*root', run_range=None):

Definition at line 25 of file dataset.py.

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range

        self.dbsInstance = dbsInstance

        self.primaryDatasetEntries = -1
        self.report = None
        self.buildListOfFiles( self.pattern )
        self.extractFileSizes()
        self.buildListOfBadFiles()
        self.primaryDatasetEntries = self.getPrimaryDatasetEntries()
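
The constructor works as a template method: it stores its arguments and then calls buildListOfFiles(), extractFileSizes() and buildListOfBadFiles(), which the base class implements as empty defaults. The sketch below illustrates how a derived class might fill in those hooks. It is not taken from dataset.py: `MiniBaseDataset` is a self-contained stand-in that mirrors the listing above so the example runs without CMSSW, and `InMemoryDataset` with its `catalog` attribute is hypothetical.

```python
import re


class MiniBaseDataset(object):
    """Stand-in mirroring the BaseDataset constructor (not the real class)."""

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range
        self.dbsInstance = dbsInstance
        self.primaryDatasetEntries = -1
        self.report = None
        # Hooks, called in the same order as in the listing above.
        self.buildListOfFiles(self.pattern)
        self.extractFileSizes()
        self.buildListOfBadFiles()
        self.primaryDatasetEntries = self.getPrimaryDatasetEntries()

    def buildListOfFiles(self, pattern):
        self.files = []

    def extractFileSizes(self):
        self.filesAndSizes = {}

    def buildListOfBadFiles(self):
        self.good_files = []
        self.bad_files = {}

    def getPrimaryDatasetEntries(self):
        return self.primaryDatasetEntries


class InMemoryDataset(MiniBaseDataset):
    """Hypothetical subclass: files come from a class-level catalog."""

    # Hypothetical catalog mapping file name -> size in bytes.
    catalog = {'a.root': 100, 'b.root': 200, 'notes.txt': 5}

    def buildListOfFiles(self, pattern):
        # Keep only the catalog entries whose name matches the pattern.
        regex = re.compile(pattern)
        self.files = sorted(f for f in self.catalog if regex.match(f))

    def extractFileSizes(self):
        self.filesAndSizes = dict((f, self.catalog[f]) for f in self.files)


ds = InMemoryDataset('/My/Sample', 'someuser')
print(ds.files)
```

Because the hooks run inside the base constructor, anything they depend on must exist before `__init__` is called, which is why the catalog in this sketch is a class attribute rather than an instance attribute.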

Member Function Documentation

◆ buildListOfBadFiles()

def dataset.BaseDataset.buildListOfBadFiles (   self)

Definition at line 48 of file dataset.py.

    def buildListOfBadFiles(self):
        self.good_files = []
        self.bad_files = {}

◆ buildListOfFiles()

def dataset.BaseDataset.buildListOfFiles (   self,
  pattern 
)

Definition at line 40 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

    def buildListOfFiles( self, pattern ):
        self.files = []

◆ extractFileSizes()

def dataset.BaseDataset.extractFileSizes (   self)
Get the file size for each file, 
from the eos ls -l command.

Definition at line 43 of file dataset.py.

    def extractFileSizes(self):
        '''Get the file size for each file,
        from the eos ls -l command.'''
        self.filesAndSizes = {}
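
The base implementation only creates an empty dictionary; the docstring indicates that derived classes are expected to fill it from an `eos ls -l` listing. As a rough illustration, the sketch below parses text in the usual `ls -l` long format into a name-to-size mapping. The helper `parse_ls_l`, the sample listing, and the assumption that the listing uses the standard nine-column layout are all hypothetical and not taken from dataset.py.

```python
def parse_ls_l(listing):
    """Map file name -> size (as a string) from `ls -l`-style text."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Assumed layout: mode links owner group size month day time name
        if len(fields) < 9 or not fields[4].isdigit():
            continue  # skip the "total N" header and malformed lines
        # Rejoin the tail so names containing single spaces survive.
        name = ' '.join(fields[8:])
        # Kept as a string because printFiles() calls rjust() on the value.
        sizes[name] = fields[4]
    return sizes


# Hypothetical listing used only for demonstration.
sample = """total 2
-rw-r--r--   1 user group      1048576 Jan 01 12:00 tree_1.root
-rw-r--r--   1 user group          512 Jan 01 12:01 tree_2.root
"""
print(parse_ls_l(sample))
```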

◆ getPrimaryDatasetEntries()

def dataset.BaseDataset.getPrimaryDatasetEntries (   self)

Definition at line 56 of file dataset.py.

References dataset.BaseDataset.primaryDatasetEntries.

    def getPrimaryDatasetEntries(self):
        return self.primaryDatasetEntries

◆ listOfFiles()

def dataset.BaseDataset.listOfFiles (   self)
Returns all files, even the bad ones.

Definition at line 82 of file dataset.py.

References dataset.BaseDataset.files.

    def listOfFiles(self):
        '''Returns all files, even the bad ones.'''
        return self.files

◆ listOfGoodFiles()

def dataset.BaseDataset.listOfGoodFiles (   self)
Returns all good files. Files flagged as good in the integrity 
check text output, or not present in that output, are 
considered good.

Definition at line 86 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.files, and dataset.BaseDataset.good_files.

Referenced by dataset.BaseDataset.listOfGoodFilesWithPrescale().

    def listOfGoodFiles(self):
        '''Returns all files flagged as good in the integrity
        check text output, or not present in this file, are
        considered as good.'''
        self.good_files = []
        for file in self.files:
            if file not in self.bad_files:
                self.good_files.append( file )
        return self.good_files

◆ listOfGoodFilesWithPrescale()

def dataset.BaseDataset.listOfGoodFilesWithPrescale (   self,
  prescale 
)
Takes the list of good files and selects a random sample 
from them according to the prescale factor. 
E.g. a prescale of 10 will select 1 in 10 files.

Definition at line 96 of file dataset.py.

References dataset.BaseDataset.good_files, dataset.int, and dataset.BaseDataset.listOfGoodFiles().

    def listOfGoodFilesWithPrescale(self, prescale):
        """Takes the list of good files and selects a random sample
        from them according to the prescale factor.
        E.g. a prescale of 10 will select 1 in 10 files."""

        good_files = self.listOfGoodFiles()
        if prescale < 2:
            return self.good_files

        #the number of files to select from the dataset
        num_files = int( (len(good_files)/(1.0*prescale)) + 0.5)
        if num_files < 1:
            num_files = 1
        if num_files > len(good_files):
            num_files = len(good_files)

        #pick unique good files randomly
        import random
        subset = set()
        while len(subset) < num_files:
            #pick a random file from the list
            choice = random.choice(good_files)
            slen = len(subset)
            #add to the set
            subset.add(choice)
            #if this was a unique file remove so we don't get
            #very slow corner cases where prescale is small
            if len(subset) > slen:
                good_files.remove(choice)
        assert len(subset)==num_files,'The number of files does not match'

        return [f for f in subset]
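
The selection rule can be illustrated in isolation. The sketch below uses the same rounding and clamping as the listing, but draws the subset with `random.sample` instead of the set-and-remove loop; that substitution is a choice made for this example, not what dataset.py does. The function name `prescaled_sample` and the `rng` parameter are hypothetical.

```python
import random


def prescaled_sample(good_files, prescale, rng=random):
    """Return roughly len(good_files)/prescale distinct files."""
    if prescale < 2:
        return list(good_files)
    # Round to the nearest integer, as in the listing.
    num_files = int(len(good_files) / (1.0 * prescale) + 0.5)
    # At least one file, but never more than are available.
    num_files = min(max(num_files, 1), len(good_files))
    # random.sample draws without replacement and leaves its input untouched.
    return rng.sample(list(good_files), num_files)


files = ['file_%d.root' % i for i in range(25)]
# A seeded generator makes the draw reproducible for the demonstration.
subset = prescaled_sample(files, 10, random.Random(1234))
print(len(subset), len(files))
```

One difference worth noting: in the listing, `good_files` is the same list object as `self.good_files`, so each `good_files.remove(choice)` also shortens `self.good_files` until listOfGoodFiles() rebuilds it. Sampling from a copy avoids that side effect.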

◆ printFiles()

def dataset.BaseDataset.printFiles (   self,
  abspath = True,
  info = True 
)

Definition at line 59 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.buildListOfFiles(), dataset.BaseDataset.files, dataset.BaseDataset.filesAndSizes, dataset.BaseDataset.good_files, dataset.BaseDataset.pattern, and dataset.BaseDataset.primaryDatasetEntries.

    def printFiles(self, abspath=True, info=True):
        # import pdb; pdb.set_trace()
        if self.files == None:
            self.buildListOfFiles(self.pattern)
        for file in self.files:
            status = 'OK'
            if file in self.bad_files:
                status = self.bad_files[file]
            elif file not in self.good_files:
                status = 'UNKNOWN'
            fileNameToPrint = file
            if abspath == False:
                fileNameToPrint = os.path.basename(file)
            if info:
                size=self.filesAndSizes.get(file,'UNKNOWN').rjust(10)
                # if size is not None:
                #     size = size.rjust(10)
                print(status.ljust(10), size, \
                      '\t', fileNameToPrint)
            else:
                print(fileNameToPrint)
        print('PrimaryDatasetEntries: %d' % self.primaryDatasetEntries)
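
The per-file logic of printFiles() reduces to a status lookup followed by fixed-width formatting. The standalone sketch below reproduces that logic for a single file so it can be tried outside the class. The helper name `format_file_line` and the sample paths are hypothetical, and the `str()` conversion is an added guard in case sizes are stored as integers, which the listing does not do.

```python
import os


def format_file_line(path, bad_files, good_files, sizes, abspath=True):
    """Build one report line the way the printFiles() listing does."""
    if path in bad_files:
        status = bad_files[path]   # reason recorded for the bad file
    elif path not in good_files:
        status = 'UNKNOWN'         # neither flagged bad nor known good
    else:
        status = 'OK'
    shown = path if abspath else os.path.basename(path)
    # str() guards against sizes stored as integers (an assumption).
    size = str(sizes.get(path, 'UNKNOWN')).rjust(10)
    # print(a, b, '\t', c) separates its arguments with single spaces.
    return '%s %s \t %s' % (status.ljust(10), size, shown)


# Hypothetical inputs used only for demonstration.
good = ['/store/x/a.root']
bad = {'/store/x/b.root': 'BAD'}
sizes = {'/store/x/a.root': '42'}
line_ok = format_file_line('/store/x/a.root', bad, good, sizes, abspath=False)
line_bad = format_file_line('/store/x/b.root', bad, good, sizes)
line_unknown = format_file_line('/store/x/c.root', bad, good, sizes)
print(line_ok)
print(line_bad)
print(line_unknown)
```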

◆ printInfo()

def dataset.BaseDataset.printInfo (   self)

Definition at line 52 of file dataset.py.

References dataset.BaseDataset.name and dataset.BaseDataset.user.

    def printInfo(self):
        print('sample : ' + self.name)
        print('user : ' + self.user)

Member Data Documentation

◆ bad_files

dataset.BaseDataset.bad_files

◆ dbsInstance

dataset.BaseDataset.dbsInstance

MM.

Definition at line 31 of file dataset.py.

Referenced by dataset.PrivateDataset.getPrimaryDatasetEntries().

◆ files

dataset.BaseDataset.files

◆ filesAndSizes

dataset.BaseDataset.filesAndSizes

Definition at line 46 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

◆ good_files

dataset.BaseDataset.good_files

◆ name

dataset.BaseDataset.name

◆ pattern

dataset.BaseDataset.pattern

Definition at line 28 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

◆ primaryDatasetEntries

dataset.BaseDataset.primaryDatasetEntries

◆ report

dataset.BaseDataset.report

Definition at line 34 of file dataset.py.

Referenced by dataset.Dataset.getPrimaryDatasetEntries(), and addOnTests.testit.run().

◆ run_range

dataset.BaseDataset.run_range

◆ user

dataset.BaseDataset.user