dataset.BaseDataset Class Reference
Inherited by dataset.CMSDataset, dataset.Dataset, dataset.EOSDataset, dataset.LocalDataset, and dataset.PrivateDataset.

Public Member Functions

def __init__ (self, name, user, pattern='.*root', run_range=None, dbsInstance=None)
 
def buildListOfBadFiles (self)
 
def buildListOfFiles (self, pattern)
 
def extractFileSizes (self)
 
def getPrimaryDatasetEntries (self)
 
def listOfFiles (self)
 
def listOfGoodFiles (self)
 
def listOfGoodFilesWithPrescale (self, prescale)
 
def printFiles (self, abspath=True, info=True)
 
def printInfo (self)
 

Public Attributes

 bad_files
 
 dbsInstance
 MM.
 
 files
 
 filesAndSizes
 
 good_files
 
 name
 
 pattern
 
 primaryDatasetEntries
 MM.
 
 report
 
 run_range
 
 user
 

Detailed Description

Definition at line 21 of file dataset.py.

Constructor & Destructor Documentation

def dataset.BaseDataset.__init__ (   self,
  name,
  user,
  pattern = '.*root',
  run_range = None,
  dbsInstance = None 
)

Earlier signature (without dbsInstance): def __init__(self, name, user, pattern='.*root', run_range=None)

Definition at line 24 of file dataset.py.

24  def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
25      self.name = name
26      self.user = user
27      self.pattern = pattern
28      self.run_range = run_range
29      ### MM
30      self.dbsInstance = dbsInstance
31      ### MM
33      self.report = None
34      self.buildListOfFiles( self.pattern )
35      self.extractFileSizes()
36      self.buildListOfBadFiles()
38 
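The constructor calls three hooks in a fixed order (buildListOfFiles, extractFileSizes, buildListOfBadFiles), which the derived classes are expected to override. The following is a minimal sketch of that pattern, not the CMSSW code: `MiniBase` is a hypothetical stand-in that mirrors the listing above so the example runs without the `dataset` module, and `InMemoryDataset`, its `CANDIDATES` list, and the sample name and user are invented for illustration.

```python
import re


class MiniBase(object):
    """Hypothetical stand-in mirroring the constructor call order shown above."""

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range
        self.dbsInstance = dbsInstance
        self.report = None
        # The three hooks run in this order; subclasses override them.
        self.buildListOfFiles(self.pattern)
        self.extractFileSizes()
        self.buildListOfBadFiles()

    def buildListOfFiles(self, pattern):
        self.files = []

    def extractFileSizes(self):
        self.filesAndSizes = {}

    def buildListOfBadFiles(self):
        self.good_files = []
        self.bad_files = {}


class InMemoryDataset(MiniBase):
    """Hypothetical subclass: filters a fixed list of names by the regex pattern."""

    CANDIDATES = ['a_1.root', 'a_2.root', 'notes.txt']

    def buildListOfFiles(self, pattern):
        regex = re.compile(pattern)
        self.files = [f for f in self.CANDIDATES if regex.match(f)]


ds = InMemoryDataset('/demo/sample', 'someuser')
print(ds.files)
```

Because the hooks run inside `__init__`, a subclass only needs to override the hook it cares about; the attributes set by the other hooks keep their empty defaults.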

Member Function Documentation

def dataset.BaseDataset.buildListOfBadFiles (   self)

Definition at line 47 of file dataset.py.

47  def buildListOfBadFiles(self):
48      self.good_files = []
49      self.bad_files = {}
50 
def dataset.BaseDataset.buildListOfFiles (   self,
  pattern 
)

Definition at line 39 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

39  def buildListOfFiles( self, pattern ):
40      self.files = []
41 
def dataset.BaseDataset.extractFileSizes (   self)
Get the file size for each file from the eos ls -l command.

Definition at line 42 of file dataset.py.

42  def extractFileSizes(self):
43      '''Get the file size for each file,
44      from the eos ls -l command.'''
45      self.filesAndSizes = {}
46 
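The base implementation only initialises an empty dictionary; filling it is left to the derived classes. As a rough sketch of what such parsing could look like, the helper below assumes the listing follows the familiar `ls -l` column layout (size in the fifth column, file name last). That layout, the `parse_sizes` name, and the sample text are all assumptions made for illustration; the real eos output may differ. Sizes are kept as strings because printFiles() calls `.rjust(10)` on the stored value.

```python
def parse_sizes(listing):
    """Map file name -> size string, assuming an `ls -l`-like column layout."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Assumed layout: mode, links, owner, group, size, month, day, time, name.
        if len(fields) < 9:
            continue  # skip blank or unexpected lines
        sizes[fields[-1]] = fields[4]
    return sizes


# Synthetic listing, invented for illustration only.
sample = (
    "-rw-r--r--   1 someuser zh   1048576 Jan 10 12:00 tree_1.root\n"
    "-rw-r--r--   1 someuser zh   2097152 Jan 10 12:05 tree_2.root\n"
)
sizes = parse_sizes(sample)
print(sizes)
```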
def dataset.BaseDataset.getPrimaryDatasetEntries (   self)

Definition at line 55 of file dataset.py.

References dataset.BaseDataset.primaryDatasetEntries.

55  def getPrimaryDatasetEntries(self):
56      return self.primaryDatasetEntries
57 
def dataset.BaseDataset.listOfFiles (   self)
Returns all files, even the bad ones.

Definition at line 81 of file dataset.py.

References dataset.BaseDataset.files.

81  def listOfFiles(self):
82      '''Returns all files, even the bad ones.'''
83      return self.files
84 
def dataset.BaseDataset.listOfGoodFiles (   self)
Returns all files flagged as good in the integrity check text output. 
Files that are not present in that output are also considered good.

Definition at line 85 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.files, and dataset.BaseDataset.good_files.

Referenced by dataset.BaseDataset.listOfGoodFilesWithPrescale().

85  def listOfGoodFiles(self):
86      '''Returns all files flagged as good in the integrity
87      check text output. Files that are not present in that
88      output are also considered good.'''
89      self.good_files = []
90      for file in self.files:
91          if file not in self.bad_files:
92              self.good_files.append( file )
93      return self.good_files
94 
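The selection rule is simply "every file that is not a key of bad_files". A standalone sketch of that rule follows; the function name `good_files_from` and the sample data are hypothetical, and the status strings are invented placeholders (printFiles() shows that the values of bad_files are used as status text).

```python
def good_files_from(files, bad_files):
    """Return the files that do not appear as keys of the bad_files mapping."""
    return [f for f in files if f not in bad_files]


# Invented sample data: bad_files maps a file name to a status string.
files = ['tree_1.root', 'tree_2.root', 'tree_3.root']
bad_files = {'tree_2.root': 'MarkedBad'}

print(good_files_from(files, bad_files))
```

The order of `files` is preserved, and a file with no entry in `bad_files` counts as good, matching the docstring above.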
def dataset.BaseDataset.listOfGoodFilesWithPrescale (   self,
  prescale 
)
Takes the list of good files and selects a random sample 
from them according to the prescale factor. 
E.g. a prescale of 10 will select 1 in 10 files.

Definition at line 95 of file dataset.py.

References dataset.BaseDataset.good_files, dataset.int, and dataset.BaseDataset.listOfGoodFiles().

 95  def listOfGoodFilesWithPrescale(self, prescale):
 96      """Takes the list of good files and selects a random sample
 97      from them according to the prescale factor.
 98      E.g. a prescale of 10 will select 1 in 10 files."""
 99 
100      good_files = self.listOfGoodFiles()
101      if prescale < 2:
102          return self.good_files
103 
104      # the number of files to select from the dataset
105      num_files = int( (len(good_files)/(1.0*prescale)) + 0.5)
106      if num_files < 1:
107          num_files = 1
108      if num_files > len(good_files):
109          num_files = len(good_files)
110 
111      # pick unique good files randomly
112      import random
113      subset = set()
114      while len(subset) < num_files:
115          # pick a random file from the list
116          choice = random.choice(good_files)
117          slen = len(subset)
118          # add to the set
119          subset.add(choice)
120          # if this was a unique file remove so we don't get
121          # very slow corner cases where prescale is small
122          if len(subset) > slen:
123              good_files.remove(choice)
124      assert len(subset)==num_files,'The number of files does not match'
125 
126      return [f for f in subset]
127 
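In the listing above, `good_files` is the same list object as `self.good_files`, so each `good_files.remove(choice)` also shrinks the attribute until listOfGoodFiles() is called again. The sketch below shows an alternative way to express the same sampling rule with `random.sample`, which draws unique elements without modifying its input. It is not the CMSSW implementation: the function name `prescale_sample` and the optional `rng` parameter are assumptions added for illustration.

```python
import random


def prescale_sample(good_files, prescale, rng=random):
    """Pick roughly len(good_files)/prescale unique files without mutating the input."""
    if prescale < 2:
        return list(good_files)
    # Same rounding and clamping as the listing: at least 1, at most all files.
    num_files = int(len(good_files) / (1.0 * prescale) + 0.5)
    num_files = min(max(num_files, 1), len(good_files))
    return rng.sample(good_files, num_files)


# Invented sample data; a seeded generator makes the draw repeatable.
files = ['tree_%d.root' % i for i in range(25)]
picked = prescale_sample(files, 10, random.Random(0))
print(len(picked), 'of', len(files), 'files selected')
```

With 25 files and a prescale of 10, the rounding gives int(2.5 + 0.5) = 3 files. Passing a seeded `random.Random` instance is optional and only serves to make a selection reproducible.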
def dataset.BaseDataset.printFiles (   self,
  abspath = True,
  info = True 
)

Definition at line 58 of file dataset.py.

References dataset.BaseDataset.bad_files, dataset.BaseDataset.buildListOfFiles(), dataset.BaseDataset.files, dataset.BaseDataset.good_files, dataset.BaseDataset.pattern, dataset.BaseDataset.primaryDatasetEntries, and edm.print().

58  def printFiles(self, abspath=True, info=True):
59      # import pdb; pdb.set_trace()
60      if self.files is None:
61          self.buildListOfFiles(self.pattern)
62      for file in self.files:
63          status = 'OK'
64          if file in self.bad_files:
65              status = self.bad_files[file]
66          elif file not in self.good_files:
67              status = 'UNKNOWN'
68          fileNameToPrint = file
69          if not abspath:
70              fileNameToPrint = os.path.basename(file)
71          if info:
72              size = self.filesAndSizes.get(file, 'UNKNOWN').rjust(10)
73              # if size is not None:
74              #     size = size.rjust(10)
75              print(status.ljust(10), size, \
76                    '\t', fileNameToPrint)
77          else:
78              print(fileNameToPrint)
79      print('PrimaryDatasetEntries: %d' % self.primaryDatasetEntries)
80 
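The per-file logic of printFiles() has two parts: resolving a status (the bad-file entry if there is one, UNKNOWN if the file is in neither list, OK otherwise) and padding the columns. The sketch below isolates both so they can be checked without printing. The helper names `file_status` and `format_row` and the sample data are hypothetical; the padding widths follow the listing.

```python
import os


def file_status(name, good_files, bad_files):
    """Resolve the status string the same way the listing does."""
    if name in bad_files:
        return bad_files[name]
    if name not in good_files:
        return 'UNKNOWN'
    return 'OK'


def format_row(name, good_files, bad_files, sizes, abspath=True):
    """Build one output row: status, size, tab, file name (space separated)."""
    status = file_status(name, good_files, bad_files)
    shown = name if abspath else os.path.basename(name)
    size = sizes.get(name, 'UNKNOWN').rjust(10)
    # print() with several arguments joins them with single spaces.
    return ' '.join([status.ljust(10), size, '\t', shown])


# Invented sample data for illustration.
good = ['/store/demo/tree_1.root']
bad = {'/store/demo/tree_2.root': 'MarkedBad'}
sizes = {'/store/demo/tree_1.root': '1048576'}

for path in ['/store/demo/tree_1.root', '/store/demo/tree_2.root', '/store/demo/tree_3.root']:
    print(format_row(path, good, bad, sizes, abspath=False))
```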
def dataset.BaseDataset.printInfo (   self)

Definition at line 51 of file dataset.py.

References dataset.BaseDataset.name, dataset.BaseDataset.user, and edm.print().

51  def printInfo(self):
52      print('sample : ' + self.name)
53      print('user : ' + self.user)
54 

Member Data Documentation

dataset.BaseDataset.bad_files

dataset.BaseDataset.dbsInstance

MM.

Definition at line 30 of file dataset.py.

Referenced by dataset.PrivateDataset.getPrimaryDatasetEntries().

dataset.BaseDataset.files

dataset.BaseDataset.filesAndSizes

Definition at line 45 of file dataset.py.

dataset.BaseDataset.good_files

dataset.BaseDataset.name

dataset.BaseDataset.pattern

Definition at line 27 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

dataset.BaseDataset.primaryDatasetEntries

dataset.BaseDataset.report

Definition at line 33 of file dataset.py.

Referenced by dataset.Dataset.getPrimaryDatasetEntries(), and addOnTests.testit.run().

dataset.BaseDataset.run_range

dataset.BaseDataset.user