dataset.BaseDataset Class Reference
Inheritance diagram for dataset.BaseDataset (derived classes): dataset.CMSDataset, dataset.Dataset, dataset.EOSDataset, dataset.LocalDataset, dataset.PrivateDataset

Public Member Functions

def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None)
def buildListOfBadFiles(self)
def buildListOfFiles(self, pattern)
def extractFileSizes(self)
def getPrimaryDatasetEntries(self)
def listOfFiles(self)
def listOfGoodFiles(self)
def listOfGoodFilesWithPrescale(self, prescale)
def printFiles(self, abspath=True, info=True)
def printInfo(self)

Public Attributes

bad_files
dbsInstance (brief: MM.)
files
filesAndSizes
good_files
name
pattern
primaryDatasetEntries (brief: MM.)
report
run_range
user

Detailed Description

Definition at line 19 of file dataset.py.

Constructor & Destructor Documentation

def dataset.BaseDataset.__init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None)

Brief description (from a source comment): def init(self, name, user, pattern='.*root', run_range=None):

Definition at line 22 of file dataset.py.

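def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
    self.name = name
    self.user = user
    self.pattern = pattern
    self.run_range = run_range
    ### MM
    self.dbsInstance = dbsInstance
    ### MM
    # (the generated listing skips one source line here; the member
    # documentation places the definition of primaryDatasetEntries
    # in this constructor)
    self.report = None
    # In BaseDataset the three calls below only initialise empty containers
    self.buildListOfFiles( self.pattern )
    self.extractFileSizes()
    self.buildListOfBadFiles()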
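The constructor fixes the order in which three hook methods run, while the base-class hooks only create empty containers. The sketch below mirrors that structure with a hypothetical MiniDataset class and a hypothetical FixedListDataset subclass that overrides a single hook; the calls list exists only to make the call order visible and is not part of BaseDataset.

```python
class MiniDataset(object):
    """Hypothetical stand-in mirroring the BaseDataset constructor."""

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range
        self.dbsInstance = dbsInstance
        self.report = None
        self.calls = []  # illustration only: records the hook order
        self.buildListOfFiles(self.pattern)
        self.extractFileSizes()
        self.buildListOfBadFiles()

    # Base-class hooks: each only creates an empty container.
    def buildListOfFiles(self, pattern):
        self.calls.append('buildListOfFiles')
        self.files = []

    def extractFileSizes(self):
        self.calls.append('extractFileSizes')
        self.filesAndSizes = {}

    def buildListOfBadFiles(self):
        self.calls.append('buildListOfBadFiles')
        self.good_files = []
        self.bad_files = {}


class FixedListDataset(MiniDataset):
    """Hypothetical subclass that overrides only the file-listing hook."""

    def buildListOfFiles(self, pattern):
        self.calls.append('buildListOfFiles')
        self.files = ['/tmp/a_1.root', '/tmp/a_2.root']


ds = FixedListDataset('sample_a', 'someuser')
print(ds.calls)  # hooks run in constructor order
print(ds.files)
```

Because the hooks are called from __init__, a subclass gets a fully populated object simply by overriding them; no extra setup call is needed after construction.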

Member Function Documentation

def dataset.BaseDataset.buildListOfBadFiles(self)

Definition at line 45 of file dataset.py.

def buildListOfBadFiles(self):
    # Base-class version: only initialises empty containers
    self.good_files = []
    self.bad_files = {}
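The two containers have different shapes: good_files is a list, while bad_files is a dictionary from file name to a status string (printFiles() prints that string). As a minimal sketch, assuming integrity-check results are available as a hypothetical mapping from file name to True/False, a derived class could fill them like this; classify_files and the 'MARKED_BAD' label are illustrative names, not part of dataset.py.

```python
def classify_files(files, integrity):
    """Hypothetical helper that fills the two containers which
    BaseDataset.buildListOfBadFiles() initialises.

    integrity: assumed mapping file name -> True (passed) / False (failed).
    Files missing from it are left out of both containers.
    """
    good_files = []  # list, as in BaseDataset
    bad_files = {}   # dict: file name -> status string
    for name in files:
        if name not in integrity:
            continue  # unchecked: printFiles() would label it UNKNOWN
        if integrity[name]:
            good_files.append(name)
        else:
            bad_files[name] = 'MARKED_BAD'  # hypothetical status label
    return good_files, bad_files


good, bad = classify_files(
    ['a.root', 'b.root', 'c.root'],
    {'a.root': True, 'b.root': False},
)
print(good)
print(bad)
```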
def dataset.BaseDataset.buildListOfFiles(self, pattern)

Definition at line 37 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

def buildListOfFiles( self, pattern ):
    self.files = []
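The base-class buildListOfFiles() ignores its pattern argument and only creates an empty list, so matching files is left to the derived classes. The default '.*root' looks like a regular expression; assuming it is applied with re.match (this page does not say how subclasses use it), a filter could be sketched as follows. filter_files is a hypothetical name.

```python
import re


def filter_files(candidates, pattern='.*root'):
    """Keep names that match pattern at the start of the string.

    Assumes pattern is a regular expression used with re.match.
    """
    regex = re.compile(pattern)
    return [name for name in candidates if regex.match(name)]


names = ['tree_1.root', 'tree_2.root', 'log.txt', 'rootfile.tgz']
print(filter_files(names))                 # default pattern
print(filter_files(names, r'.*\.root$'))   # stricter, end-anchored pattern
```

Note that re.match anchors only at the start of the string, so '.*root' also accepts 'rootfile.tgz'; an end-anchored pattern such as r'.*\.root$' restricts the match to names ending in .root.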
def dataset.BaseDataset.extractFileSizes(self)
Get the file size for each file, from the eos ls -l command.

Definition at line 40 of file dataset.py.

def extractFileSizes(self):
    '''Get the file size for each file,
    from the eos ls -l command.'''
    self.filesAndSizes = {}
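The docstring says sizes come from eos ls -l, but the base class only creates an empty dictionary. The sketch below parses ls -l style text into the filesAndSizes shape. The column layout (permissions, links, owner, group, size, month, day, time, name) is an assumption borrowed from the usual ls -l format and should be checked against real eos output; the sample listing and path are made up. Sizes are kept as strings because printFiles() calls .rjust(10) on the stored value.

```python
def parse_ls_long(output, directory):
    """Build {path: size_string} from ls -l style listing text.

    Assumed layout (verify against real `eos ls -l` output):
        permissions links owner group size month day time name
    """
    sizes = {}
    for line in output.splitlines():
        fields = line.split(None, 8)  # at most 9 fields; name may contain spaces
        if len(fields) < 9:
            continue  # skip blank lines and summary lines such as "total 8"
        size, name = fields[4], fields[8]
        sizes[directory.rstrip('/') + '/' + name] = size
    return sizes


# Made-up sample text, for illustration only
listing = """\
total 2
-rw-r--r--   1 someuser zh   1048576 Jan 10 12:00 tree_1.root
-rw-r--r--   1 someuser zh    524288 Jan 10 12:05 tree_2.root
"""
print(parse_ls_long(listing, '/eos/hypothetical/path/'))
```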
def dataset.BaseDataset.getPrimaryDatasetEntries(self)

Definition at line 53 of file dataset.py.

References dataset.BaseDataset.primaryDatasetEntries.

def getPrimaryDatasetEntries(self):
    return self.primaryDatasetEntries
def dataset.BaseDataset.listOfFiles(self)
Returns all files, even the bad ones.

Definition at line 79 of file dataset.py.

References dataset.BaseDataset.files.

def listOfFiles(self):
    '''Returns all files, even the bad ones.'''
    return self.files
def dataset.BaseDataset.listOfGoodFiles(self)
Returns all good files: files flagged as good in the integrity check text output, as well as files not present in that output, are considered good.

Definition at line 83 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.files, and dataset.BaseDataset.good_files.

Referenced by dataset.BaseDataset.listOfGoodFilesWithPrescale().

def listOfGoodFiles(self):
    '''Returns all good files: files flagged as good in the integrity
    check text output, as well as files not present in that output,
    are considered good.'''
    self.good_files = []
    for file in self.files:
        if file not in self.bad_files:
            self.good_files.append( file )
    return self.good_files
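The rule in listOfGoodFiles() is "good unless listed as bad", so files that were never checked count as good here, even though printFiles() labels files missing from good_files as UNKNOWN. A standalone sketch of the same rule (list_of_good_files is a hypothetical name; 'MARKED_BAD' is an illustrative status string):

```python
def list_of_good_files(files, bad_files):
    """Same rule as BaseDataset.listOfGoodFiles(): every file that is not
    a key of bad_files counts as good, including unchecked files.
    The input order of files is preserved."""
    return [f for f in files if f not in bad_files]


files = ['a.root', 'b.root', 'c.root']
bad_files = {'b.root': 'MARKED_BAD'}  # hypothetical status label
print(list_of_good_files(files, bad_files))
```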
def dataset.BaseDataset.listOfGoodFilesWithPrescale(self, prescale)
Takes the list of good files and selects a random sample from them according to the prescale factor. E.g. a prescale of 10 will select 1 in 10 files.

Definition at line 93 of file dataset.py.

References dataset.BaseDataset.good_files, and dataset.BaseDataset.listOfGoodFiles().

def listOfGoodFilesWithPrescale(self, prescale):
    """Takes the list of good files and selects a random sample
    from them according to the prescale factor.
    E.g. a prescale of 10 will select 1 in 10 files."""
    import random

    # Work on a copy: listOfGoodFiles() returns self.good_files itself,
    # so removing entries from it would silently change the stored list.
    good_files = list(self.listOfGoodFiles())
    if prescale < 2:
        return good_files

    # the number of files to select from the dataset: rounded to the
    # nearest integer, at least 1, at most the number of good files
    num_files = int( (len(good_files)/(1.0*prescale)) + 0.5)
    num_files = min(max(num_files, 1), len(good_files))

    # pick unique good files uniformly at random (without replacement)
    return random.sample(good_files, num_files)
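The number of selected files is len(good_files)/prescale rounded half up, then clamped to at least 1 (and to 0 only when there are no good files). For example, 25 good files with a prescale of 10 give 25/10 = 2.5, plus 0.5 is 3.0, so 3 files are selected. The hypothetical helper below isolates that arithmetic so it can be checked without any random choice.

```python
def prescaled_count(n_good, prescale):
    """Number of files listOfGoodFilesWithPrescale() returns for
    n_good good files (hypothetical helper, same arithmetic)."""
    if prescale < 2:
        return n_good  # no prescaling: all good files
    num_files = int(n_good / (1.0 * prescale) + 0.5)  # round half up
    return min(max(num_files, 1), n_good)  # at least 1, at most n_good


print(prescaled_count(25, 10))
print(prescaled_count(4, 10))
```

The lower clamp means a small dataset is never prescaled away entirely: 4 files with a prescale of 10 still yield one file.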
def dataset.BaseDataset.printFiles(self, abspath=True, info=True)

Definition at line 56 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.buildListOfFiles(), dataset.BaseDataset.files, dataset.BaseDataset.filesAndSizes, dataset.BaseDataset.good_files, dataset.BaseDataset.pattern, and dataset.BaseDataset.primaryDatasetEntries.

def printFiles(self, abspath=True, info=True):
    # requires 'import os' at module level (os.path.basename below)
    if self.files is None:
        self.buildListOfFiles(self.pattern)
    for file in self.files:
        status = 'OK'
        if file in self.bad_files:
            status = self.bad_files[file]
        elif file not in self.good_files:
            status = 'UNKNOWN'
        fileNameToPrint = file
        if not abspath:
            fileNameToPrint = os.path.basename(file)
        if info:
            size = self.filesAndSizes.get(file, 'UNKNOWN').rjust(10)
            print status.ljust(10), size, \
                  '\t', fileNameToPrint
        else:
            print fileNameToPrint
    print 'PrimaryDatasetEntries: %d' % self.primaryDatasetEntries
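Each printed line combines a status lookup with fixed-width padding. The Python 3 sketch below separates the two steps; file_status and format_file_line are hypothetical helpers, the layout approximates the Python 2 print statement above rather than reproducing it byte for byte, and 'MARKED_BAD' is an illustrative status string.

```python
import os


def file_status(name, good_files, bad_files):
    """Status column of printFiles(): the stored status for bad files,
    'UNKNOWN' for files in neither container, otherwise 'OK'."""
    if name in bad_files:
        return bad_files[name]
    if name not in good_files:
        return 'UNKNOWN'
    return 'OK'


def format_file_line(name, status, size='UNKNOWN', abspath=True, info=True):
    """One output line: status left-justified to 10 characters, size
    right-justified to 10 characters, then a tab and the file name."""
    shown = name if abspath else os.path.basename(name)
    if not info:
        return shown
    return '{} {}\t{}'.format(status.ljust(10), size.rjust(10), shown)


good = ['/data/a.root']
bad = {'/data/b.root': 'MARKED_BAD'}  # hypothetical status label
for f in ['/data/a.root', '/data/b.root', '/data/c.root']:
    print(format_file_line(f, file_status(f, good, bad), abspath=False))
```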
def dataset.BaseDataset.printInfo(self)

Definition at line 49 of file dataset.py.

References dataset.BaseDataset.name and dataset.BaseDataset.user.

def printInfo(self):
    print 'sample : ' + self.name
    print 'user : ' + self.user

Member Data Documentation

dataset.BaseDataset.bad_files

Definition at line 47 of file dataset.py.

Referenced by dataset.BaseDataset.listOfGoodFiles(), and dataset.BaseDataset.printFiles().

dataset.BaseDataset.dbsInstance

MM.

Definition at line 28 of file dataset.py.

Referenced by dataset.PrivateDataset.getPrimaryDatasetEntries().

dataset.BaseDataset.files

Definition at line 38 of file dataset.py.

Referenced by dataset.BaseDataset.listOfFiles(), dataset.BaseDataset.listOfGoodFiles(), and dataset.BaseDataset.printFiles().

dataset.BaseDataset.filesAndSizes

Definition at line 43 of file dataset.py.

dataset.BaseDataset.good_files

Definition at line 46 of file dataset.py.

Referenced by dataset.BaseDataset.listOfGoodFiles(), dataset.BaseDataset.listOfGoodFilesWithPrescale(), and dataset.BaseDataset.printFiles().

dataset.BaseDataset.name

Definition at line 23 of file dataset.py.

Referenced by dirstructure.Directory.__create_pie_image(), dqm_interfaces.DirID.__eq__(), dirstructure.Directory.__get_full_path(), dirstructure.Comparison.__get_img_name(), dataset.Dataset.__getDataType(), dataset.Dataset.__getFileInfoList(), cuy.divideElement.__init__(), cuy.plotElement.__init__(), cuy.additionElement.__init__(), cuy.superimposeElement.__init__(), cuy.graphElement.__init__(), dirstructure.Comparison.__make_image(), dirstructure.Directory.__repr__(), dqm_interfaces.DirID.__repr__(), dirstructure.Comparison.__repr__(), config.CFG.__str__(), dirstructure.Directory.calcStats(), validation.Sample.digest(), python.rootplot.utilities.Hist.divide(), python.rootplot.utilities.Hist.divide_wilson(), utils.StatisticalTest.get_status(), production_tasks.Task.getname(), dataset.CMSDataset.getPrimaryDatasetEntries(), dataset.PrivateDataset.getPrimaryDatasetEntries(), VIDSelectorBase.VIDSelectorBase.initialize(), dirstructure.Directory.print_report(), dataset.BaseDataset.printInfo(), dataset.Dataset.printInfo(), production_tasks.MonitorJobs.run(), python.rootplot.utilities.Hist.TGraph(), python.rootplot.utilities.Hist.TH1F(), and Vispa.Views.PropertyView.Property.valueChanged().

dataset.BaseDataset.pattern

Definition at line 25 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

dataset.BaseDataset.primaryDatasetEntries

MM.

Definition at line 30 of file dataset.py.

Referenced by dataset.BaseDataset.getPrimaryDatasetEntries(), and dataset.BaseDataset.printFiles().

dataset.BaseDataset.report

Definition at line 31 of file dataset.py.

Referenced by dataset.Dataset.getPrimaryDatasetEntries(), and addOnTests.testit.run().

dataset.BaseDataset.run_range

Definition at line 26 of file dataset.py.

Referenced by dataset.CMSDataset.buildListOfFiles(), dataset.CMSDataset.buildListOfFilesDBS(), dataset.CMSDataset.getPrimaryDatasetEntries(), and dataset.PrivateDataset.getPrimaryDatasetEntries().

dataset.BaseDataset.user

Definition at line 24 of file dataset.py.

Referenced by cmsPerfSuite.PerfSuite.optionParse(), dataset.BaseDataset.printInfo(), production_tasks.CheckDatasetExists.run(), production_tasks.GenerateMask.run(), production_tasks.SourceCFG.run(), production_tasks.FullCFG.run(), production_tasks.MonitorJobs.run(), and production_tasks.CleanJobFiles.run().