dataset.BaseDataset Class Reference
Inheritance diagram for dataset.BaseDataset (classes shown): dataset.CMSDataset, dataset.Dataset, dataset.EOSDataset, dataset.LocalDataset, dataset.PrivateDataset

Public Member Functions

def __init__ (self, name, user, pattern='.*root', run_range=None, dbsInstance=None)
 def __init__(self, name, user, pattern='.*root', run_range=None):
 
def buildListOfBadFiles (self)
 
def buildListOfFiles (self, pattern)
 
def extractFileSizes (self)
 
def getPrimaryDatasetEntries (self)
 
def listOfFiles (self)
 
def listOfGoodFiles (self)
 
def listOfGoodFilesWithPrescale (self, prescale)
 
def printFiles (self, abspath=True, info=True)
 
def printInfo (self)
 

Public Attributes

 bad_files
 
 dbsInstance
 MM.
 
 files
 
 filesAndSizes
 
 good_files
 
 name
 
 pattern
 
 primaryDatasetEntries
 MM.
 
 report
 
 run_range
 
 user
 

Detailed Description

Definition at line 23 of file dataset.py.

Constructor & Destructor Documentation

◆ __init__()

def dataset.BaseDataset.__init__ (self, name, user, pattern='.*root', run_range=None, dbsInstance=None)

def __init__(self, name, user, pattern='.*root', run_range=None):

Definition at line 26 of file dataset.py.

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range

        self.dbsInstance = dbsInstance

        self.primaryDatasetEntries = -1
        self.report = None
        self.buildListOfFiles( self.pattern )
        self.extractFileSizes()
        self.buildListOfBadFiles()
        self.primaryDatasetEntries = self.getPrimaryDatasetEntries()
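
The constructor stores its arguments and then calls four hooks in a fixed order (buildListOfFiles, extractFileSizes, buildListOfBadFiles, getPrimaryDatasetEntries), so a derived class mainly has to override the hooks. The sketch below illustrates that pattern. It uses a trimmed stand-in for BaseDataset that mirrors the listing, so it runs without the CMSSW dataset module. The InMemoryDataset subclass, its catalog argument and the file names are hypothetical and not part of dataset.py.

```python
import re


class BaseDataset(object):
    """Trimmed stand-in mirroring the constructor in the listing."""

    def __init__(self, name, user, pattern='.*root', run_range=None, dbsInstance=None):
        self.name = name
        self.user = user
        self.pattern = pattern
        self.run_range = run_range
        self.dbsInstance = dbsInstance
        self.primaryDatasetEntries = -1
        self.report = None
        # Hooks, called in this order; subclasses override them.
        self.buildListOfFiles(self.pattern)
        self.extractFileSizes()
        self.buildListOfBadFiles()
        self.primaryDatasetEntries = self.getPrimaryDatasetEntries()

    def buildListOfFiles(self, pattern):
        self.files = []

    def extractFileSizes(self):
        self.filesAndSizes = {}

    def buildListOfBadFiles(self):
        self.good_files = []
        self.bad_files = {}

    def getPrimaryDatasetEntries(self):
        return self.primaryDatasetEntries


class InMemoryDataset(BaseDataset):
    """Hypothetical subclass that reads its file catalog from a dict."""

    def __init__(self, name, user, catalog, pattern='.*root'):
        # The catalog must exist before the base constructor runs the hooks.
        self._catalog = catalog  # hypothetical: {file name: size as a string}
        super(InMemoryDataset, self).__init__(name, user, pattern)

    def buildListOfFiles(self, pattern):
        regex = re.compile(pattern)
        self.files = sorted(f for f in self._catalog if regex.match(f))

    def extractFileSizes(self):
        self.filesAndSizes = {f: self._catalog[f] for f in self.files}


ds = InMemoryDataset('/my/sample', 'someuser',
                     {'a_1.root': '1024', 'a_2.root': '2048', 'log.txt': '12'})
print(ds.files)                  # ['a_1.root', 'a_2.root']
print(ds.primaryDatasetEntries)  # -1, the base-class default
```

Because the hooks run inside the base constructor, any state they need (here the catalog) has to be set before the base constructor is called.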

Member Function Documentation

◆ buildListOfBadFiles()

def dataset.BaseDataset.buildListOfBadFiles (   self)

Reimplemented in dataset.Dataset.

Definition at line 49 of file dataset.py.

    def buildListOfBadFiles(self):
        self.good_files = []
        self.bad_files = {}

◆ buildListOfFiles()

def dataset.BaseDataset.buildListOfFiles (self, pattern)

Reimplemented in dataset.PrivateDataset, dataset.Dataset, dataset.EOSDataset, dataset.LocalDataset, and dataset.CMSDataset.

Definition at line 41 of file dataset.py.

    def buildListOfFiles( self, pattern ):
        self.files = []

Referenced by dataset.BaseDataset.printFiles().

◆ extractFileSizes()

def dataset.BaseDataset.extractFileSizes (   self)
Get the file size for each file, 
from the eos ls -l command.

Reimplemented in dataset.Dataset.

Definition at line 44 of file dataset.py.

    def extractFileSizes(self):
        '''Get the file size for each file,
        from the eos ls -l command.'''
        self.filesAndSizes = {}

◆ getPrimaryDatasetEntries()

def dataset.BaseDataset.getPrimaryDatasetEntries (   self)

Reimplemented in dataset.PrivateDataset, dataset.Dataset, and dataset.CMSDataset.

Definition at line 57 of file dataset.py.

    def getPrimaryDatasetEntries(self):
        return self.primaryDatasetEntries

References dataset.BaseDataset.primaryDatasetEntries.

◆ listOfFiles()

def dataset.BaseDataset.listOfFiles (   self)
Returns all files, even the bad ones.

Definition at line 83 of file dataset.py.

    def listOfFiles(self):
        '''Returns all files, even the bad ones.'''
        return self.files

References dataset.BaseDataset.files.

◆ listOfGoodFiles()

def dataset.BaseDataset.listOfGoodFiles (   self)
Returns the files considered good: those flagged as good in the integrity
check text output, and those not present in that output.

Definition at line 87 of file dataset.py.

    def listOfGoodFiles(self):
        '''Returns all files flagged as good in the integrity
        check text output, or not present in this file, are
        considered as good.'''
        self.good_files = []
        for file in self.files:
            if file not in self.bad_files:
                self.good_files.append( file )
        return self.good_files

References dataset.BaseDataset.bad_files, dataset.BaseDataset.files, and dataset.BaseDataset.good_files.

Referenced by dataset.BaseDataset.listOfGoodFilesWithPrescale().
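
In the listing, a file counts as good simply because it is not a key of bad_files. A minimal sketch of the same filter as a free function follows. The function name, the file names and the status string are made up for illustration, and bad_files is assumed to map a file name to a status string, as printFiles() uses it.

```python
def list_of_good_files(files, bad_files):
    """Return the files that are not keys of bad_files, preserving order."""
    return [f for f in files if f not in bad_files]


files = ['f1.root', 'f2.root', 'f3.root']
bad_files = {'f2.root': 'MarkedBad'}  # hypothetical status string

good = list_of_good_files(files, bad_files)
print(good)  # ['f1.root', 'f3.root']
```

Unlike this sketch, the method in the listing also rebuilds self.good_files on every call and returns that same list object.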

◆ listOfGoodFilesWithPrescale()

def dataset.BaseDataset.listOfGoodFilesWithPrescale (self, prescale)
Takes the list of good files and selects a random sample 
from them according to the prescale factor. 
E.g. a prescale of 10 will select 1 in 10 files.

Definition at line 97 of file dataset.py.

    def listOfGoodFilesWithPrescale(self, prescale):
        """Takes the list of good files and selects a random sample
        from them according to the prescale factor.
        E.g. a prescale of 10 will select 1 in 10 files."""

        good_files = self.listOfGoodFiles()
        if prescale < 2:
            return self.good_files

        #the number of files to select from the dataset
        num_files = int( (len(good_files)/(1.0*prescale)) + 0.5)
        if num_files < 1:
            num_files = 1
        if num_files > len(good_files):
            num_files = len(good_files)

        #pick unique good files randomly
        import random
        subset = set()
        while len(subset) < num_files:
            #pick a random file from the list
            choice = random.choice(good_files)
            slen = len(subset)
            #add to the set
            subset.add(choice)
            #if this was a unique file remove so we don't get
            #very slow corner cases where prescale is small
            if len(subset) > slen:
                good_files.remove(choice)
        assert len(subset)==num_files,'The number of files does not match'

        return [f for f in subset]

References dataset.BaseDataset.good_files, dataset.int, and dataset.BaseDataset.listOfGoodFiles().
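
The number of files selected is len(good_files)/prescale rounded to the nearest integer, then clamped to at least 1 and at most len(good_files). Note that in the listing good_files is the same list object as self.good_files, so each good_files.remove(choice) also shrinks that attribute. The sketch below reproduces the count and draws the sample with the standard-library random.sample on a copy instead. That is a swapped-in technique, not what dataset.py does, and the function names are hypothetical.

```python
import random


def number_of_files_to_select(n_good, prescale):
    """Round n_good/prescale to nearest, then clamp as in the listing."""
    num_files = int(n_good / (1.0 * prescale) + 0.5)
    # Same order as the listing: raise to 1 first, then cap at n_good.
    return min(max(num_files, 1), n_good)


def prescaled_sample(good_files, prescale, rng=random):
    """Return a random subset of good_files without modifying the input."""
    if prescale < 2:
        return list(good_files)
    k = number_of_files_to_select(len(good_files), prescale)
    # random.sample draws k unique elements (assumption: file names are unique).
    return rng.sample(list(good_files), k)


good_files = ['file_%d.root' % i for i in range(25)]
print(number_of_files_to_select(len(good_files), 10))  # 3
subset = prescaled_sample(good_files, 10, rng=random.Random(1))
print(len(subset))       # 3
print(len(good_files))   # 25, the input list is left intact
```

Passing a seeded random.Random instance as rng makes the selection reproducible, which the listing does not offer.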

◆ printFiles()

def dataset.BaseDataset.printFiles (self, abspath=True, info=True)

Definition at line 60 of file dataset.py.

    def printFiles(self, abspath=True, info=True):
        # import pdb; pdb.set_trace()
        if self.files == None:
            self.buildListOfFiles(self.pattern)
        for file in self.files:
            status = 'OK'
            if file in self.bad_files:
                status = self.bad_files[file]
            elif file not in self.good_files:
                status = 'UNKNOWN'
            fileNameToPrint = file
            if abspath == False:
                fileNameToPrint = os.path.basename(file)
            if info:
                size=self.filesAndSizes.get(file,'UNKNOWN').rjust(10)
                # if size is not None:
                #     size = size.rjust(10)
                print(status.ljust(10), size, \
                      '\t', fileNameToPrint)
            else:
                print(fileNameToPrint)
        print('PrimaryDatasetEntries: %d' % self.primaryDatasetEntries)

References dataset.BaseDataset.bad_files, dataset.BaseDataset.buildListOfFiles(), dataset.BaseDataset.files, dataset.BaseDataset.filesAndSizes, dataset.BaseDataset.good_files, dataset.BaseDataset.pattern, and dataset.BaseDataset.primaryDatasetEntries.
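
For each file, printFiles() reports 'OK' unless the file is a key of bad_files (then the stored status is shown) or is missing from good_files (then 'UNKNOWN'). The sketch below rebuilds the same line layout as a free function that returns strings instead of printing. The function name and sample data are hypothetical. As in the listing, sizes are assumed to be stored as strings, since rjust() is called on them.

```python
import os


def format_file_lines(files, good_files, bad_files, files_and_sizes,
                      abspath=True, info=True):
    """Build one output line per file, mirroring the layout of printFiles()."""
    lines = []
    for name in files:
        status = 'OK'
        if name in bad_files:
            status = bad_files[name]
        elif name not in good_files:
            status = 'UNKNOWN'
        shown = name if abspath else os.path.basename(name)
        if info:
            size = files_and_sizes.get(name, 'UNKNOWN').rjust(10)
            # print(a, b, '\t', c) separates its arguments with single spaces.
            lines.append('%s %s \t %s' % (status.ljust(10), size, shown))
        else:
            lines.append(shown)
    return lines


lines = format_file_lines(
    files=['/store/a/f1.root', '/store/a/f2.root', '/store/a/f3.root'],
    good_files=['/store/a/f1.root'],
    bad_files={'/store/a/f2.root': 'BAD'},          # hypothetical status
    files_and_sizes={'/store/a/f1.root': '1024'},   # sizes kept as strings
    abspath=False,
)
for line in lines:
    print(line)
```

Here f3.root is reported as 'UNKNOWN' because it appears in neither bad_files nor good_files.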

◆ printInfo()

def dataset.BaseDataset.printInfo (   self)

Reimplemented in dataset.Dataset.

Definition at line 53 of file dataset.py.

    def printInfo(self):
        print('sample : ' + self.name)
        print('user : ' + self.user)

References dataset.BaseDataset.name and dataset.BaseDataset.user.

Member Data Documentation

◆ bad_files

dataset.BaseDataset.bad_files

◆ dbsInstance

dataset.BaseDataset.dbsInstance

MM.

Definition at line 32 of file dataset.py.

Referenced by dataset.PrivateDataset.getPrimaryDatasetEntries().

◆ files

dataset.BaseDataset.files

◆ filesAndSizes

dataset.BaseDataset.filesAndSizes

Definition at line 47 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

◆ good_files

dataset.BaseDataset.good_files

◆ name

dataset.BaseDataset.name

◆ pattern

dataset.BaseDataset.pattern

Definition at line 29 of file dataset.py.

Referenced by dataset.BaseDataset.printFiles().

◆ primaryDatasetEntries

dataset.BaseDataset.primaryDatasetEntries

◆ report

dataset.BaseDataset.report

Definition at line 35 of file dataset.py.

Referenced by dataset.Dataset.getPrimaryDatasetEntries(), and addOnTests.testit.run().

◆ run_range

dataset.BaseDataset.run_range

◆ user

dataset.BaseDataset.user