

cmsswVersionTools::PickRelValInputFiles Class Reference

Automatic pick-up of RelVal input files.

Inherits FWCore::GuiBrowsers::ConfigToolBase::ConfigToolBase.


Public Member Functions

def __call__
def __init__
def apply
def getDefaultParameters
def messageEmptyList

Private Attributes

 _comment
 _parameters

Static Private Attributes

tuple _defaultParameters = dicttypes.SortedKeysDict()
string _label = 'pickRelValInputFiles'

Detailed Description

Automatic pick-up of RelVal input files

Picks up RelVal input files automatically and
  returns a list of strings with the paths to be used in [PoolSource].fileNames
PickRelValInputFiles( useDAS, cmsswVersion, formerVersion, relVal, dataTier, condition, globalTag, maxVersions, skipFiles, numberOfFiles, debug )
- useDAS       : switch to perform query in DAS rather than in DBS
                 optional; default: False
- cmsswVersion : CMSSW release to pick up the RelVal files from
                 optional; default: the current release (determined automatically from environment)
- formerVersion: use the second-to-last valid CMSSW release to pick up the RelVal files from
                 also applies if 'cmsswVersion' is set explicitly
                 optional; default: False
- relVal       : RelVal sample to be used
                 optional; default: 'RelValTTbar'
- dataTier     : data tier to be used
                 optional; default: 'GEN-SIM-RECO'
- condition    : identifier of GlobalTag as defined in Configurations/PyReleaseValidation/python/autoCond.py
                 overridden if 'globalTag' is set explicitly
                 optional; default: 'startup'
- globalTag    : name of GlobalTag as it is used in the data path of the RelVals
                 optional; default: determined automatically as defined by 'condition' in Configurations/PyReleaseValidation/python/autoCond.py
  !!!            Determination is done for the release one runs in, not for the release the RelVals have been produced in.
  !!!            Example of deviation: data RelVals (CMSSW_4_1_X) might not only have the pure name of the GlobalTag 'GR_R_311_V2' in the full path,
                 but also an extension identifying the data: 'GR_R_311_V2_RelVal_wzMu2010B'
- maxVersions  : maximum version number of the RelVal dataset to check
                 optional; default: 3
- skipFiles    : number of files to skip for a found RelVal sample
                 optional; default: 0
- numberOfFiles: number of files to pick up
                 negative values return all files found ('skipFiles' remains active though)
                 optional; default: -1
- debug        : switch to enable enhanced messages in 'stdout'
                 optional; default: False
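
The parameters above combine into dataset names of the form /<relVal>/<cmsswVersion>-<globalTag>-v<N>/<dataTier>, tried from v<maxVersions> down to v1, and 'skipFiles' and 'numberOfFiles' then trim the list of files found. The following is a minimal standalone sketch of that bookkeeping, written in Python 3 and not taken from cmsswVersionTools.py. The helper names dataset_candidates and select_files and the release name 'CMSSW_4_1_2' are illustrative assumptions; the trimming follows the DBS branch of apply().

```python
def dataset_candidates(relVal, cmsswVersion, globalTag, dataTier, maxVersions):
    """Return dataset names from the highest version number down to v1."""
    return [
        '/%s/%s-%s-v%i/%s' % (relVal, cmsswVersion, globalTag, version, dataTier)
        for version in range(maxVersions, 0, -1)
    ]


def select_files(found, skipFiles=0, numberOfFiles=-1):
    """Drop the first skipFiles entries, then keep numberOfFiles of the rest.

    A negative numberOfFiles keeps everything that is left after skipping.
    """
    remaining = found[skipFiles:]
    if numberOfFiles < 0:
        return remaining
    return remaining[:numberOfFiles]


# Illustrative values only: the release name is an assumption.
candidates = dataset_candidates('RelValTTbar', 'CMSSW_4_1_2', 'GR_R_311_V2', 'GEN-SIM-RECO', 3)
picked = select_files(['a.root', 'b.root', 'c.root', 'd.root'], skipFiles=1, numberOfFiles=2)
```

In this sketch a 'numberOfFiles' of 0 yields an empty list, which matches the "No files requested" case in the listing of apply().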

Definition at line 766 of file cmsswVersionTools.py.


Constructor & Destructor Documentation

def cmsswVersionTools::PickRelValInputFiles::__init__ (   self)

Definition at line 806 of file cmsswVersionTools.py.

00807                         :
00808         ConfigToolBase.__init__( self )
00809         self.addParameter( self._defaultParameters, 'useDAS'       , False                                                               , '' )
00810         self.addParameter( self._defaultParameters, 'cmsswVersion' , os.getenv( "CMSSW_VERSION" )                                        , 'auto from environment' )
00811         self.addParameter( self._defaultParameters, 'formerVersion', False                                                               , '' )
00812         self.addParameter( self._defaultParameters, 'relVal'       , 'RelValTTbar'                                                       , '' )
00813         self.addParameter( self._defaultParameters, 'dataTier'     , 'GEN-SIM-RECO'                                                      , '' )
00814         self.addParameter( self._defaultParameters, 'condition'    , 'startup'                                                           , '' )
00815         self.addParameter( self._defaultParameters, 'globalTag'    , autoCond[ self.getDefaultParameters()[ 'condition' ].value ][ : -5 ], 'auto from \'condition\'' )
00816         self.addParameter( self._defaultParameters, 'maxVersions'  , 3                                                                   , '' )
00817         self.addParameter( self._defaultParameters, 'skipFiles'    , 0                                                                   , '' )
00818         self.addParameter( self._defaultParameters, 'numberOfFiles', -1                                                                  , 'all' )
00819         self.addParameter( self._defaultParameters, 'debug'        , False                                                               , '' )
00820         self._parameters = copy.deepcopy( self._defaultParameters )
00821         self._comment = ""
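
The default for 'globalTag' is the autoCond entry for 'condition' with its last five characters cut off by the slice [ : -5 ]. The sketch below illustrates that slice in Python 3, assuming the entry ends in a five-character suffix such as '::All'. The dictionary is a hypothetical stand-in for autoCond, and the tag name 'EXAMPLE_TAG_V1' is invented for illustration.

```python
def default_global_tag(condition, auto_cond):
    """Return the entry for 'condition' without its last five characters."""
    return auto_cond[condition][:-5]


# Hypothetical stand-in for autoCond; the tag name is not a real GlobalTag.
auto_cond = {'startup': 'EXAMPLE_TAG_V1::All'}
tag = default_global_tag('startup', auto_cond)
```

The slice is unconditional: if an entry did not carry such a suffix, the last five characters of the tag name itself would be lost.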


Member Function Documentation

def cmsswVersionTools::PickRelValInputFiles::__call__ (   self,
  useDAS = None,
  cmsswVersion = None,
  formerVersion = None,
  relVal = None,
  dataTier = None,
  condition = None,
  globalTag = None,
  maxVersions = None,
  skipFiles = None,
  numberOfFiles = None,
  debug = None 
)

Definition at line 822 of file cmsswVersionTools.py.

00835                  :
00836         if useDAS is None:
00837             useDAS = self.getDefaultParameters()[ 'useDAS' ].value
00838         if cmsswVersion is None:
00839             cmsswVersion = self.getDefaultParameters()[ 'cmsswVersion' ].value
00840         if formerVersion is None:
00841             formerVersion = self.getDefaultParameters()[ 'formerVersion' ].value
00842         if relVal is None:
00843             relVal = self.getDefaultParameters()[ 'relVal' ].value
00844         if dataTier is None:
00845             dataTier = self.getDefaultParameters()[ 'dataTier' ].value
00846         if condition is None:
00847             condition = self.getDefaultParameters()[ 'condition' ].value
00848         if globalTag is None:
00849             globalTag = autoCond[ condition ][ : -5 ] # auto from 'condition'
00850         if maxVersions is None:
00851             maxVersions = self.getDefaultParameters()[ 'maxVersions' ].value
00852         if skipFiles is None:
00853             skipFiles = self.getDefaultParameters()[ 'skipFiles' ].value
00854         if numberOfFiles is None:
00855             numberOfFiles = self.getDefaultParameters()[ 'numberOfFiles' ].value
00856         if debug is None:
00857             debug = self.getDefaultParameters()[ 'debug' ].value
00858         self.setParameter( 'useDAS'       , useDAS )
00859         self.setParameter( 'cmsswVersion' , cmsswVersion )
00860         self.setParameter( 'formerVersion', formerVersion )
00861         self.setParameter( 'relVal'       , relVal )
00862         self.setParameter( 'dataTier'     , dataTier )
00863         self.setParameter( 'condition'    , condition )
00864         self.setParameter( 'globalTag'    , globalTag )
00865         self.setParameter( 'maxVersions'  , maxVersions )
00866         self.setParameter( 'skipFiles'    , skipFiles )
00867         self.setParameter( 'numberOfFiles', numberOfFiles )
00868         self.setParameter( 'debug'        , debug )
00869         return self.apply()
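
__call__ treats None as "not given" and falls back to the stored default. Testing with 'is None' rather than for truthiness lets a caller pass False or 0 explicitly. The following simplified Python 3 sketch shows the pattern; the function name resolve and the plain dictionaries are assumptions made for illustration. In the listing above, 'globalTag' is the one exception: when not given, it is recomputed from the effective 'condition' instead of being read from the stored default.

```python
def resolve(explicit, defaults):
    """Return a copy of 'defaults' in which every non-None explicit value wins."""
    resolved = dict(defaults)
    for key, value in explicit.items():
        if value is not None:
            resolved[key] = value
    return resolved


defaults = {'relVal': 'RelValTTbar', 'numberOfFiles': -1, 'debug': False}
# relVal is left at its default; 0 is an explicit value and must survive.
params = resolve({'relVal': None, 'numberOfFiles': 0, 'debug': True}, defaults)
```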

def cmsswVersionTools::PickRelValInputFiles::apply (   self)

Definition at line 874 of file cmsswVersionTools.py.

00875                      :
00876         useDAS        = self._parameters[ 'useDAS'        ].value
00877         cmsswVersion  = self._parameters[ 'cmsswVersion'  ].value
00878         formerVersion = self._parameters[ 'formerVersion' ].value
00879         relVal        = self._parameters[ 'relVal'        ].value
00880         dataTier      = self._parameters[ 'dataTier'      ].value
00881         condition     = self._parameters[ 'condition'     ].value # only used for GT determination in initialization, if GT not explicitly given
00882         globalTag     = self._parameters[ 'globalTag'     ].value
00883         maxVersions   = self._parameters[ 'maxVersions'   ].value
00884         skipFiles     = self._parameters[ 'skipFiles'     ].value
00885         numberOfFiles = self._parameters[ 'numberOfFiles' ].value
00886         debug         = self._parameters[ 'debug'         ].value
00887 
00888         filePaths = []
00889 
00890         # Determine corresponding CMSSW version for RelVals
00891         preId      = '_pre'
00892         patchId    = '_patch'    # patch releases
00893         hltPatchId = '_hltpatch' # HLT patch releases
00894         dqmPatchId = '_dqmpatch' # DQM patch releases
00895         slhcId     = '_SLHC'     # SLHC releases
00896         rootId     = '_root'     # ROOT test releases
00897         ibId       = '_X_'       # IBs
00898         if patchId in cmsswVersion:
00899             cmsswVersion = cmsswVersion.split( patchId )[ 0 ]
00900         elif hltPatchId in cmsswVersion:
00901             cmsswVersion = cmsswVersion.split( hltPatchId )[ 0 ]
00902         elif dqmPatchId in cmsswVersion:
00903             cmsswVersion = cmsswVersion.split( dqmPatchId )[ 0 ]
00904         elif rootId in cmsswVersion:
00905             cmsswVersion = cmsswVersion.split( rootId )[ 0 ]
00906         elif slhcId in cmsswVersion:
00907             cmsswVersion = cmsswVersion.split( slhcId )[ 0 ]
00908         elif ibId in cmsswVersion or formerVersion:
00909             outputTuple = Popen( [ 'scram', 'l -c CMSSW' ], stdout = PIPE, stderr = PIPE ).communicate()
00910             if len( outputTuple[ 1 ] ) != 0:
00911                 print '%s INFO : SCRAM error'%( self._label )
00912                 if debug:
00913                     print '    from trying to determine last valid releases before \'%s\''%( cmsswVersion )
00914                     print
00915                     print outputTuple[ 1 ]
00916                     print
00917                     self.messageEmptyList()
00918                 return filePaths
00919             versions = { 'last'      :''
00920                        , 'lastToLast':''
00921                        }
00922             for line in outputTuple[ 0 ].splitlines():
00923                 version = line.split()[ 1 ]
00924                 if cmsswVersion.split( ibId )[ 0 ] in version or cmsswVersion.rpartition( '_' )[ 0 ] in version:
00925                     if not ( patchId in version or hltPatchId in version or dqmPatchId in version or slhcId in version or ibId in version or rootId in version ):
00926                         versions[ 'lastToLast' ] = versions[ 'last' ]
00927                         versions[ 'last' ]       = version
00928                         if version == cmsswVersion:
00929                             break
00930             # FIXME: ordering of output problematic ('XYZ_pre10' before 'XYZ_pre2', no "formerVersion" for 'XYZ_pre1')
00931             if formerVersion:
00932                 # Don't use pre-releases as "former version" for other releases than CMSSW_X_Y_0
00933                 if preId in versions[ 'lastToLast' ] and not preId in versions[ 'last' ] and not versions[ 'last' ].endswith( '_0' ):
00934                     versions[ 'lastToLast' ] = versions[ 'lastToLast' ].split( preId )[ 0 ] # works only, if 'CMSSW_X_Y_0' exists ;-)
00935                 # Use pre-release as "former version" for CMSSW_X_Y_0
00936                 elif versions[ 'last' ].endswith( '_0' ) and not ( preId in versions[ 'lastToLast' ] and versions[ 'lastToLast' ].startswith( versions[ 'last' ] ) ):
00937                     versions[ 'lastToLast' ] = ''
00938                     for line in outputTuple[ 0 ].splitlines():
00939                         version      = line.split()[ 1 ]
00940                         versionParts = version.partition( preId )
00941                         if versionParts[ 0 ] == versions[ 'last' ] and versionParts[ 1 ] == preId:
00942                             versions[ 'lastToLast' ] = version
00943                         elif versions[ 'lastToLast' ] != '':
00944                             break
00945                 # Don't use CMSSW_X_Y_0 as "former version" for pre-releases
00946                 elif preId in versions[ 'last' ] and not preId in versions[ 'lastToLast' ] and versions[ 'lastToLast' ].endswith( '_0' ):
00947                     versions[ 'lastToLast' ] = '' # no alternative :-(
00948                 cmsswVersion = versions[ 'lastToLast' ]
00949             else:
00950                 cmsswVersion = versions[ 'last' ]
00951 
00952         # Debugging output
00953         if debug:
00954             print '%s DEBUG: Called with...'%( self._label )
00955             for key in self._parameters.keys():
00956                print '    %s:\t'%( key ),
00957                print self._parameters[ key ].value,
00958                if self._parameters[ key ].value is self.getDefaultParameters()[ key ].value:
00959                    print ' (default)'
00960                else:
00961                    print
00962                if key == 'cmsswVersion' and cmsswVersion != self._parameters[ key ].value:
00963                    if formerVersion:
00964                        print '    ==> modified to last to last valid release %s (s. \'formerVersion\' parameter)'%( cmsswVersion )
00965                    else:
00966                        print '    ==> modified to last valid release %s'%( cmsswVersion )
00967 
00968         # Check domain
00969         domain = socket.getfqdn().split( '.' )
00970         domainSE = ''
00971         if len( domain ) == 0:
00972             print '%s INFO : Cannot determine domain of this computer'%( self._label )
00973             if debug:
00974                 self.messageEmptyList()
00975             return filePaths
00976         elif os.uname()[0] == "Darwin":
00977             print '%s INFO : Running on MacOSX without direct access to RelVal files.'%( self._label )
00978             if debug:
00979                 self.messageEmptyList()
00980             return filePaths
00981         elif len( domain ) == 1:
00982             print '%s INFO : Running on local host \'%s\' without direct access to RelVal files'%( self._label, domain[ 0 ] )
00983             if debug:
00984                 self.messageEmptyList()
00985             return filePaths
00986         if not ( ( domain[ -2 ] == 'cern' and domain[ -1 ] == 'ch' ) or ( domain[ -2 ] == 'fnal' and domain[ -1 ] == 'gov' ) ):
00987             print '%s INFO : Running on site \'%s.%s\' without direct access to RelVal files'%( self._label, domain[ -2 ], domain[ -1 ] )
00988             if debug:
00989                 self.messageEmptyList()
00990             return filePaths
00991         if domain[ -2 ] == 'cern':
00992             domainSE = 'T2_CH_CERN'
00993         elif domain[ -2 ] == 'fnal':
00994             domainSE = 'T1_US_FNAL_MSS'
00995         if debug:
00996             print '%s DEBUG: Running at site \'%s.%s\''%( self._label, domain[ -2 ], domain[ -1 ] )
00997             print '%s DEBUG: Looking for SE \'%s\''%( self._label, domainSE )
00998 
00999         # Find files
01000         validVersion = 0
01001         dataset    = ''
01002         datasetAll = '/%s/%s-%s-v*/%s'%( relVal, cmsswVersion, globalTag, dataTier )
01003         if useDAS:
01004             if debug:
01005                 print '%s DEBUG: Using DAS query'%( self._label )
01006             dasLimit = numberOfFiles
01007             if dasLimit <= 0:
01008                 dasLimit += 1
01009             for version in range( maxVersions, 0, -1 ):
01010                 filePaths    = []
01011                 filePathsTmp = []
01012                 fileCount    = 0
01013                 dataset = '/%s/%s-%s-v%i/%s'%( relVal, cmsswVersion, globalTag, version, dataTier )
01014                 dasQuery = 'file dataset=%s | grep file.name'%( dataset )
01015                 if debug:
01016                     print '%s DEBUG: Querying dataset \'%s\' with'%( self._label, dataset )
01017                     print '    \'%s\''%( dasQuery )
01018                 # partially stolen from das_client.py for option '--format=plain', needs filter ("grep") in the query
01019                 dasData     = das_client.get_data( 'https://cmsweb.cern.ch', dasQuery, 0, dasLimit, False )
01020                 jsondict    = json.loads( dasData )
01021                 if debug:
01022                     print '%s DEBUG: Received DAS data:'%( self._label )
01023                     print '    \'%s\''%( dasData )
01024                     print '%s DEBUG: Determined JSON dictionary:'%( self._label )
01025                     print '    \'%s\''%( jsondict )
01026                 if jsondict[ 'status' ] != 'ok':
01027                     print 'There was a problem while querying DAS with query \'%s\'. Server reply was:\n %s' % (dasQuery, dasData)
01028                     exit( 1 )
01029                 mongo_query = jsondict[ 'mongo_query' ]
01030                 filters     = mongo_query[ 'filters' ]
01031                 data        = jsondict[ 'data' ]
01032                 if debug:
01033                     print '%s DEBUG: Query in JSON dictionary:'%( self._label )
01034                     print '    \'%s\''%( mongo_query )
01035                     print '%s DEBUG: Filters in query:'%( self._label )
01036                     print '    \'%s\''%( filters )
01037                     print '%s DEBUG: Data in JSON dictionary:'%( self._label )
01038                     print '    \'%s\''%( data )
01039                 for row in data:
01040                     filePath = [ r for r in das_client.get_value( row, filters ) ][ 0 ]
01041                     if debug:
01042                         print '%s DEBUG: Testing file entry \'%s\''%( self._label, filePath )
01043                     if len( filePath ) > 0:
01044                         if validVersion != version:
01045                             dasTest         = das_client.get_data( 'https://cmsweb.cern.ch', 'site dataset=%s | grep site.name'%( dataset ), 0, 999, False )
01046                             jsontestdict    = json.loads( dasTest )
01047                             mongo_testquery = jsontestdict[ 'mongo_query' ]
01048                             testfilters = mongo_testquery[ 'filters' ]
01049                             testdata    = jsontestdict[ 'data' ]
01050                             if debug:
01051                                 print '%s DEBUG: Received DAS data (site test):'%( self._label )
01052                                 print '    \'%s\''%( dasTest )
01053                                 print '%s DEBUG: Determined JSON dictionary (site test):'%( self._label )
01054                                 print '    \'%s\''%( jsontestdict )
01055                                 print '%s DEBUG: Query in JSON dictionary (site test):'%( self._label )
01056                                 print '    \'%s\''%( mongo_testquery )
01057                                 print '%s DEBUG: Filters in query (site test):'%( self._label )
01058                                 print '    \'%s\''%( testfilters )
01059                                 print '%s DEBUG: Data in JSON dictionary (site test):'%( self._label )
01060                                 print '    \'%s\''%( testdata )
01061                             foundSE = False
01062                             for testrow in testdata:
01063                                 siteName = [ tr for tr in das_client.get_value( testrow, testfilters ) ][ 0 ]
01064                                 if siteName == domainSE:
01065                                     foundSE = True
01066                                     break
01067                             if not foundSE:
01068                                 if debug:
01069                                     print '%s DEBUG: Possible version \'v%s\' not available on SE \'%s\''%( self._label, version, domainSE )
01070                                 break
01071                             validVersion = version
01072                             if debug:
01073                                 print '%s DEBUG: Valid version set to \'v%i\''%( self._label, validVersion )
01074                         if numberOfFiles == 0:
01075                             break
01076                         # protect from double entries ( 'unique' flag in query does not work here)
01077                         if not filePath in filePathsTmp:
01078                             filePathsTmp.append( filePath )
01079                             if debug:
01080                                 print '%s DEBUG: File \'%s\' found'%( self._label, filePath )
01081                             fileCount += 1
01082                             # needed, since "limit" overrides "idx" in 'get_data' (==> "idx" set to '0' rather than "skipFiles")
01083                             if fileCount > skipFiles:
01084                                 filePaths.append( filePath )
01085                         elif debug:
01086                             print '%s DEBUG: File \'%s\' found again'%( self._label, filePath )
01087                 if validVersion > 0:
01088                     if numberOfFiles == 0 and debug:
01089                         print '%s DEBUG: No files requested'%( self._label )
01090                     break
01091         else:
01092             if debug:
01093                 print '%s DEBUG: Using DBS query'%( self._label )
01094             for version in range( maxVersions, 0, -1 ):
01095                 filePaths = []
01096                 fileCount = 0
01097                 dataset = '/%s/%s-%s-v%i/%s'%( relVal, cmsswVersion, globalTag, version, dataTier )
01098                 dbsQuery = 'find file where dataset = %s'%( dataset )
01099                 if debug:
01100                     print '%s DEBUG: Querying dataset \'%s\' with'%( self._label, dataset )
01101                     print '    \'%s\''%( dbsQuery )
01102                 foundSE = False
01103                 for line in os.popen( 'dbs search --query="%s"'%( dbsQuery ) ):
01104                     if line.find( '.root' ) != -1:
01105                         if validVersion != version:
01106                             if not foundSE:
01107                                 dbsSiteQuery = 'find dataset where dataset = %s and site = %s'%( dataset, domainSE )
01108                                 if debug:
01109                                     print '%s DEBUG: Querying site \'%s\' with'%( self._label, domainSE )
01110                                     print '    \'%s\''%( dbsSiteQuery )
01111                                 for lineSite in os.popen( 'dbs search --query="%s"'%( dbsSiteQuery ) ):
01112                                     if lineSite.find( dataset ) != -1:
01113                                         foundSE = True
01114                                         break
01115                             if not foundSE:
01116                                 if debug:
01117                                     print '%s DEBUG: Possible version \'v%s\' not available on SE \'%s\''%( self._label, version, domainSE )
01118                                 break
01119                             validVersion = version
01120                             if debug:
01121                                 print '%s DEBUG: Valid version set to \'v%i\''%( self._label, validVersion )
01122                         if numberOfFiles == 0:
01123                             break
01124                         filePath = line.replace( '\n', '' )
01125                         if debug:
01126                             print '%s DEBUG: File \'%s\' found'%( self._label, filePath )
01127                         fileCount += 1
01128                         if fileCount > skipFiles:
01129                             filePaths.append( filePath )
01130                         if not numberOfFiles < 0:
01131                             if numberOfFiles <= len( filePaths ):
01132                                 break
01133                 if validVersion > 0:
01134                     if numberOfFiles == 0 and debug:
01135                         print '%s DEBUG: No files requested'%( self._label )
01136                     break
01137 
01138         # Check output and return
01139         if validVersion == 0:
01140             print '%s INFO : No RelVal file(s) found at all in datasets \'%s*\' on SE \'%s\''%( self._label, datasetAll, domainSE )
01141             if debug:
01142                 self.messageEmptyList()
01143         elif len( filePaths ) == 0:
01144             print '%s INFO : No RelVal file(s) picked up in dataset \'%s\''%( self._label, dataset )
01145             if debug:
01146                 self.messageEmptyList()
01147         elif len( filePaths ) < numberOfFiles:
01148             print '%s INFO : Only %i RelVal file(s) instead of %i picked up in dataset \'%s\''%( self._label, len( filePaths ), numberOfFiles, dataset )
01149 
01150         if debug:
01151             print '%s DEBUG: returning %i file(s):\n%s'%( self._label, len( filePaths ), filePaths )
01152         return filePaths
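
Two of the steps in apply() can be tried in isolation: reducing a patch-like release name to its base release, and mapping the host's domain to a storage element. The following Python 3 sketch covers only those two steps under simplifying assumptions. The function names and the example host and release names are illustrative. The IB and 'formerVersion' branches (which need 'scram') and the MacOSX check are left out.

```python
# Suffix markers in the order in which apply() tests them.
RELEASE_SUFFIXES = ('_patch', '_hltpatch', '_dqmpatch', '_root', '_SLHC')


def base_release(cmsswVersion):
    """Cut the release name at the first matching suffix marker, if any."""
    for marker in RELEASE_SUFFIXES:
        if marker in cmsswVersion:
            return cmsswVersion.split(marker)[0]
    return cmsswVersion


def storage_element(fqdn):
    """Map a fully qualified host name to its SE label, or None if unsupported."""
    domain = fqdn.split('.')
    if len(domain) < 2:
        return None  # local host without a domain part
    site = (domain[-2], domain[-1])
    if site == ('cern', 'ch'):
        return 'T2_CH_CERN'
    if site == ('fnal', 'gov'):
        return 'T1_US_FNAL_MSS'
    return None


release = base_release('CMSSW_4_2_3_patch2')   # illustrative release name
se = storage_element('somehost.cern.ch')       # illustrative host name
```

Returning None for unsupported sites stands in for the early "return filePaths" exits of the original, which hand back an empty list.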

def cmsswVersionTools::PickRelValInputFiles::getDefaultParameters (   self)

Definition at line 803 of file cmsswVersionTools.py.

00804                                     :
00805         return self._defaultParameters
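
getDefaultParameters() returns the class-level defaults, while __init__ stores a deep copy of them in the instance's '_parameters'. The following is a reduced Python 3 sketch of why the copy matters. The class Tool and the plain nested dictionaries are assumptions for illustration; the real class uses dicttypes.SortedKeysDict and parameter objects with a 'value' attribute.

```python
import copy


class Tool:
    # Shared, class-level defaults (stand-in for _defaultParameters).
    _defaults = {'maxVersions': {'value': 3}}

    def __init__(self):
        # Per-instance working copy; a deep copy also duplicates the nested entries.
        self._parameters = copy.deepcopy(self._defaults)

    def getDefaultParameters(self):
        return self._defaults


tool = Tool()
tool._parameters['maxVersions']['value'] = 5   # change the working copy only
default_value = tool.getDefaultParameters()['maxVersions']['value']
```

With a shallow copy the nested entry would be shared, and the assignment would change the default as well.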

def cmsswVersionTools::PickRelValInputFiles::messageEmptyList (   self)

Definition at line 870 of file cmsswVersionTools.py.

00871                                 :
00872         print '%s DEBUG: Empty file list returned'%( self._label )
00873         print '    This might be overwritten by providing input files explicitly to the source module in the main configuration file.'


Member Data Documentation

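cmsswVersionTools::PickRelValInputFiles::_comment [private]

Definition at line 806 of file cmsswVersionTools.py.

tuple cmsswVersionTools::PickRelValInputFiles::_defaultParameters = dicttypes.SortedKeysDict() [static, private]

Definition at line 801 of file cmsswVersionTools.py.

string cmsswVersionTools::PickRelValInputFiles::_label = 'pickRelValInputFiles' [static, private]

Definition at line 800 of file cmsswVersionTools.py.

cmsswVersionTools::PickRelValInputFiles::_parameters [private]

Definition at line 806 of file cmsswVersionTools.py.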