Classes | |
class | CMSHarvester |
CMSHarvester class. More... | |
class | CMSHarvesterHelpFormatter |
Helper class: CMSHarvesterHelpFormatter. More... | |
class | DBSXMLHandler |
Helper class: DBSXMLHandler. More... | |
class | Error |
Helper class: Error exception. More... | |
class | Usage |
Helper class: Usage exception. More... | |
Variables | |
__author__ | |
__version__ | |
all_file_names | |
all_sites_found | |
CAF More... | |
caf_access | |
castor_base_dir | |
castor_path_checks_cache | |
self.logger.debug("Path is now `%s'" % \ path) More... | |
castor_path_common | |
DEBUG DEBUG DEBUG This is probably only useful to make sure we don't muck things up, right? Figure out across how many sites this sample has been spread. More... | |
castor_paths | |
cmd | |
cmd_line_opts | |
cmssw_version | |
complete_sites | |
site_names_ref = set(files_info[run_number].values()[0][1]) for site_names_tmp in files_info[run_number].values()[1:]: if set(site_names_tmp[1]) != site_names_ref: mirrored = False break More... | |
config_contents | |
In case this file is the second step (the real harvesting step) of the two-step harvesting we have to tell it to use our local files. More... | |
config_file_name | |
Only add the alarming piece to the file name if this is a spread-out dataset. More... | |
crab_submission | |
dataset_names_after_checks | |
dataset_names_after_checks_tmp | |
datasets_information | |
datasets_to_ignore | |
datasets_to_use | |
dbs_api | |
empty_runs | |
exit_code | |
file_name | |
files_at_site | |
files_info | |
files_without_sites | |
globaltag | |
harvesting_info | |
harvesting_mode | |
harvesting_type | |
Jsonfilename | |
Jsonlumi | |
CRAB More... | |
mirrored | |
msg | |
class Handler(xml.sax.handler.ContentHandler): def startElement(self, name, attrs): if name == "result": site_name = str(attrs["STORAGEELEMENT_SENAME"]) TODO TODO TODO Ugly hack to get around cases like this: $ dbs search --query="find dataset, site, file.count where dataset=/RelValQCD_Pt_3000_3500/CMSSW_3_3_0_pre1-STARTUP31X_V4-v1/GEN-SIM-RECO" Using DBS instance at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet Processing ... More... | |
nevents | |
non_t1access | |
nr_max_sites | |
num_events_catalog | |
num_events_dataset | |
num_sites | |
if self.datasets_information[dataset_name]["num_events"][run_number] != 0: pdb.set_trace() DEBUG DEBUG DEBUG end More... | |
option_parser | |
output | |
path | |
else: Piece not in the list, fine. More... | |
permissions | |
permissions_new | |
permissions_target | |
preferred_site | |
ref_hist_mappings_file_name | |
run_number | |
runs_to_ignore | |
runs_to_use | |
saveByLumiSection | |
site_names | |
sites_with_complete_copies | |
skip_this_path_piece | |
self.logger.debug("Checking CASTOR path piece `%s'" % \ piece) More... | |
status | |
tmp | |
TODO TODO TODO Need to think about where this should go, but somewhere we have to move over the fact that we want to process all runs for each dataset that we're considering. More... | |
traceback_string | |
twiki_url | |
def cmsHarvester.build_dataset_ignore_list | ( | self | ) |
Build a list of datasets to ignore. NOTE: We should always have a list of datasets to process, but it may be that we don't have a list of datasets to ignore.
Definition at line 3443 of file cmsHarvester.py.
References info().
def cmsHarvester.build_dataset_list | ( | self, | |
input_method, | |||
input_name | |||
) |
Build a list of all datasets to be processed.
Definition at line 3357 of file cmsHarvester.py.
References dbs_resolve_dataset_name(), info(), and print().
def cmsHarvester.build_dataset_use_list | ( | self | ) |
Build a list of datasets to process.
Definition at line 3420 of file cmsHarvester.py.
References info(), and ComparisonHelper.zip().
def cmsHarvester.build_datasets_information | ( | self | ) |
Obtain all information on the datasets that we need to run. Use DBS to figure out all required information on our datasets, like the run numbers and the GlobalTag. All information is stored in the datasets_information member variable.
Definition at line 5285 of file cmsHarvester.py.
def cmsHarvester.build_runs_ignore_list | ( | self | ) |
Build a list of runs to ignore. NOTE: We should always have a list of runs to process, but it may be that we don't have a list of runs to ignore.
Definition at line 3541 of file cmsHarvester.py.
References info().
def cmsHarvester.build_runs_list | ( | self, | |
input_method, | |||
input_name | |||
) |
Definition at line 3469 of file cmsHarvester.py.
References info(), and createfilelist.int.
def cmsHarvester.build_runs_use_list | ( | self | ) |
Build a list of runs to process.
Definition at line 3520 of file cmsHarvester.py.
References info().
def cmsHarvester.check_cmssw | ( | self | ) |
Check if CMSSW is setup.
Definition at line 2333 of file cmsHarvester.py.
def cmsHarvester.check_dataset_list | ( | self | ) |
Check list of dataset names for impossible ones.
Two kinds of checks are done:
- Checks for things that do not make sense. These lead to errors and skipped datasets.
- Sanity checks. For these warnings are issued but the user is considered to be the authoritative expert.
Checks performed:
- The CMSSW version encoded in the dataset name should match self.cmssw_version. This is critical.
- There should be some events in the dataset/run. This is critical in the sense that CRAB refuses to create jobs for zero events. And yes, this does happen in practice. E.g. the reprocessed CRAFT08 datasets contain runs with zero events.
- A cursory check is performed to see if the harvesting type makes sense for the data type. This should prevent the user from inadvertently running RelVal for data.
- It is not possible to run single-step harvesting jobs on samples that are not fully contained at a single site.
- Each dataset/run has to be available at at least one site.
Definition at line 3794 of file cmsHarvester.py.
References info(), relativeConstraints.keys, and MessageLogger_cfi.warning.
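Two of the critical checks listed above (runs without events and runs not hosted at any site) can be illustrated with a minimal sketch. The function name `filter_runs` and the two input dictionaries are assumptions made for this example; the real method works on the datasets_information member and also logs warnings, which is omitted here.

```python
def filter_runs(num_events, sites):
    """Return the run numbers that survive the two critical checks.

    num_events: hypothetical dict {run_number: number of events}
    sites:      hypothetical dict {run_number: list of site names}
    """
    kept = []
    for run_number in sorted(num_events):
        # CRAB refuses to create jobs for zero events, so skip empty runs.
        if num_events[run_number] == 0:
            continue
        # Each run has to be available at at least one site.
        if not sites.get(run_number):
            continue
        kept.append(run_number)
    return kept
```

A run is dropped as soon as one of the two conditions fails, so the order of the checks only matters for which message a real implementation would log.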
def cmsHarvester.check_dbs | ( | self | ) |
def cmsHarvester.check_globaltag | ( | self, | |
globaltag = None |
|||
) |
Check if globaltag exists. Check if globaltag exists as GlobalTag in the database given by self.frontier_connection_name['globaltag']. If globaltag is None, self.globaltag is used instead. If we're going to use reference histograms this method also checks for the existence of the required key in the GlobalTag.
Definition at line 4500 of file cmsHarvester.py.
def cmsHarvester.check_globaltag_contains_ref_hist_key | ( | self, | |
globaltag, | |||
connect_name | |||
) |
Check if globaltag contains the required RefHistos key.
Definition at line 4597 of file cmsHarvester.py.
def cmsHarvester.check_globaltag_exists | ( | self, | |
globaltag, | |||
connect_name | |||
) |
Check if globaltag exists.
Definition at line 4555 of file cmsHarvester.py.
References debug, info(), and submitPVValidationJobs.split().
def cmsHarvester.check_input_status | ( | self | ) |
Check completeness and correctness of input information. Check that all required information has been specified and that, at least as far as can be easily checked, it makes sense. NOTE: This is also where any default values are applied.
Definition at line 2192 of file cmsHarvester.py.
References info(), and join().
def cmsHarvester.check_ref_hist_mappings | ( | self | ) |
Make sure all necessary reference histograms exist. Check that for each of the datasets to be processed a reference histogram is specified and that that histogram exists in the database. NOTE: There's a little complication here. Since this whole thing was designed to allow (in principle) harvesting of both data and MC datasets in one go, we need to be careful to check the availability of reference mappings only for those datasets that need it.
Definition at line 5245 of file cmsHarvester.py.
References info().
def cmsHarvester.check_ref_hist_tag | ( | self, | |
tag_name | |||
) |
Check the existence of tag_name in database connect_name. Check if tag_name exists as a reference histogram tag in the database given by self.frontier_connection_name['refhists'].
Definition at line 4642 of file cmsHarvester.py.
def cmsHarvester.create_and_check_castor_dir | ( | self, | |
castor_dir | |||
) |
Check existence of the given CASTOR dir, if necessary create it. Some special care has to be taken with several things like setting the correct permissions such that CRAB can store the output results. Of course this means that things like /castor/cern.ch/ and user/j/ have to be recognised and treated properly. NOTE: Only CERN CASTOR area (/castor/cern.ch/) supported for the moment. NOTE: This method uses some slightly tricky caching to make sure we don't keep over and over checking the same base paths.
Definition at line 1490 of file cmsHarvester.py.
References debug, and spr.find().
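The caching mentioned in the NOTE above might look roughly like the following sketch. It assumes the cache is a plain set of already-checked path prefixes and that `check_one` is a hypothetical callback standing in for the actual existence/permission check; neither detail is confirmed by this page.

```python
def check_castor_dir(castor_dir, check_one, cache):
    """Check every prefix of castor_dir at most once.

    check_one: hypothetical callable doing the real check for one path
    cache:     set of path prefixes that were already checked
    """
    pieces = [piece for piece in castor_dir.split("/") if piece]
    path = ""
    for piece in pieces:
        path = path + "/" + piece
        if path in cache:
            # This base path was handled by an earlier call.
            continue
        check_one(path)
        cache.add(path)
```

With a shared cache, checking `/castor/cern.ch/a/b` followed by `/castor/cern.ch/a/c` triggers only one additional check for the second directory.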
def cmsHarvester.create_and_check_castor_dirs | ( | self | ) |
Make sure all required CASTOR output dirs exist. This checks the CASTOR base dir specified by the user as well as all the subdirs required by the current set of jobs.
Definition at line 1431 of file cmsHarvester.py.
References debug, info(), mps_monitormerge.items, SiStripPI.max, and MessageLogger_cfi.warning.
def cmsHarvester.create_castor_path_name_common | ( | self, | |
dataset_name | |||
) |
Build the common part of the output path to be used on CASTOR. This consists of the CASTOR area base path specified by the user and a piece depending on the data type (data vs. MC), the harvesting type and the dataset name followed by a piece containing the run number and event count. (See comments in create_castor_path_name_special for details.) This method creates the common part, without run number and event count.
Definition at line 1327 of file cmsHarvester.py.
References create_castor_path_name_special(), python.rootplot.root2matplotlib.replace(), and nano_mu_digi_cff.strip.
def cmsHarvester.create_castor_path_name_special | ( | self, | |
dataset_name, | |||
run_number, | |||
castor_path_common | |||
) |
Create the specialised part of the CASTOR output dir name. NOTE: To avoid clashes with `incremental harvesting' (re-harvesting when a dataset grows) we have to include the event count in the path name. The underlying `problem' is that CRAB does not overwrite existing output files so if the output file already exists CRAB will fail to copy back the output. NOTE: It's not possible to create different kinds of harvesting jobs in a single call to this tool. However, in principle it could be possible to create both data and MC jobs in a single go. NOTE: The number of events used in the path name is the _total_ number of events in the dataset/run at the time of harvesting. If we're doing partial harvesting the final results will reflect lower statistics. This is a) the easiest to code and b) the least likely to lead to confusion if someone ever decides to swap/copy around file blocks between sites.
Definition at line 1383 of file cmsHarvester.py.
Referenced by create_castor_path_name_common().
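To illustrate why the event count ends up in the path, here is a small sketch. The `run_<N>/nevents_<M>` layout and the function name are assumptions for illustration only; the actual directory naming is defined in cmsHarvester.py.

```python
import posixpath


def castor_path_special(castor_path_common, run_number, num_events):
    """Append a run-specific and an event-count-specific piece to the common path."""
    # Hypothetical layout: <common>/run_<N>/nevents_<M>.
    return posixpath.join(castor_path_common,
                          "run_%d" % run_number,
                          "nevents_%d" % num_events)
```

Because the event count is part of the path, re-harvesting a run after the dataset has grown writes to a different directory, so CRAB never has to overwrite an existing output file.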
def cmsHarvester.create_config_file_name | ( | self, | |
dataset_name, | |||
run_number | |||
) |
Generate the name of the configuration file to be run by CRAB. Depending on the harvesting mode (single-step or two-step) this is the name of the real harvesting configuration or the name of the first-step ME summary extraction configuration.
Definition at line 4064 of file cmsHarvester.py.
Referenced by create_multicrab_config().
def cmsHarvester.create_crab_config | ( | self | ) |
Create a CRAB configuration for a given job. NOTE: This is _not_ a complete (as in: submittable) CRAB configuration. It is used to store the common settings for the multicrab configuration. NOTE: Only CERN CASTOR area (/castor/cern.ch/) is supported. NOTE: According to CRAB, you `Must define exactly two of total_number_of_events, events_per_job, or number_of_jobs.'. For single-step harvesting we force one job, for the rest we don't really care. # BUG BUG BUG # With the current version of CRAB (2.6.1), in which Daniele # fixed the behaviour of no_block_boundary for me, one _has to # specify_ the total_number_of_events and one single site in # the se_white_list. # BUG BUG BUG end
Definition at line 4232 of file cmsHarvester.py.
References spr.find(), and join().
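The CRAB rule quoted above (exactly two of the three job-splitting parameters must be defined) can be checked with a few lines. This is a sketch with a hypothetical helper name and a plain dictionary of settings, not code taken from cmsHarvester.py or from CRAB.

```python
SPLITTING_PARAMETERS = ("total_number_of_events",
                        "events_per_job",
                        "number_of_jobs")


def has_exactly_two_splitting_params(settings):
    """Return True if exactly two of the three splitting parameters are set."""
    num_defined = sum(1 for name in SPLITTING_PARAMETERS
                      if settings.get(name) is not None)
    return num_defined == 2
```

For single-step harvesting, where one job is forced, a settings dictionary with `number_of_jobs` and `total_number_of_events` would pass this check.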
def cmsHarvester.create_harvesting_config | ( | self, | |
dataset_name | |||
) |
Create the Python configuration for harvesting. The basic configuration is created by Configuration.PyReleaseValidation.ConfigBuilder. (This mimics what cmsDriver.py does.) After that we add some specials ourselves. NOTE: On one hand it may not be nice to circumvent cmsDriver.py, on the other hand cmsDriver.py does not really do anything itself. All the real work is done by the ConfigBuilder so there is not much risk that we miss out on essential developments of cmsDriver in the future.
Definition at line 4688 of file cmsHarvester.py.
References join().
def cmsHarvester.create_harvesting_config_file_name | ( | self, | |
dataset_name | |||
) |
Definition at line 4096 of file cmsHarvester.py.
Referenced by write_harvesting_config().
def cmsHarvester.create_harvesting_output_file_name | ( | self, | |
dataset_name, | |||
run_number | |||
) |
Generate the name to be used for the harvesting output file. This harvesting output file is the _final_ ROOT output file containing the harvesting results. In case of two-step harvesting there is an intermediate ME output file as well.
Definition at line 4168 of file cmsHarvester.py.
References spr.find().
def cmsHarvester.create_me_extraction_config | ( | self, | |
dataset_name | |||
) |
Definition at line 4914 of file cmsHarvester.py.
References create_output_file_name(), and join().
def cmsHarvester.create_me_summary_config_file_name | ( | self, | |
dataset_name | |||
) |
Definition at line 4110 of file cmsHarvester.py.
Referenced by write_me_extraction_config().
def cmsHarvester.create_me_summary_output_file_name | ( | self, | |
dataset_name | |||
) |
Generate the name of the intermediate ME file to be used in two-step harvesting.
Definition at line 4200 of file cmsHarvester.py.
def cmsHarvester.create_multicrab_block_name | ( | self, | |
dataset_name, | |||
run_number, | |||
index | |||
) |
Create the block name to use for this dataset/run number. This is what appears in the brackets `[]' in multicrab.cfg. It is used as the name of the job and to create output directories.
Definition at line 4215 of file cmsHarvester.py.
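As an illustration of how such a block name could be assembled and how it shows up in multicrab.cfg, consider the sketch below. The naming scheme (escaped dataset name, zero-padded run number, index) and both helper names are assumptions for this example.

```python
def multicrab_block_name(escaped_dataset_name, run_number, index):
    """Build a block name from a file-system-safe dataset name, a run and an index."""
    # Hypothetical scheme: <dataset>_<9-digit run number>_<index>.
    return "%s_%09d_%s" % (escaped_dataset_name, run_number, index)


def multicrab_block_header(block_name):
    """Return the section header line as it would appear in multicrab.cfg."""
    return "[%s]" % block_name
```

Since the block name doubles as a directory name, it should only be built from an already escaped dataset name, i.e. one without `/` characters.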
def cmsHarvester.create_multicrab_config | ( | self | ) |
Create a multicrab.cfg file for all samples. This creates the contents for a multicrab.cfg file that uses the crab.cfg file (generated elsewhere) for the basic settings and contains blocks for each run of each dataset. # BUG BUG BUG # The fact that it's necessary to specify the se_white_list # and the total_number_of_events is due to our use of CRAB # version 2.6.1. This should no longer be necessary in the # future. # BUG BUG BUG end
Definition at line 4312 of file cmsHarvester.py.
References create_config_file_name(), create_output_file_name(), info(), join(), relativeConstraints.keys, print(), and FastTimerService_cff.range.
def cmsHarvester.create_output_file_name | ( | self, | |
dataset_name, | |||
run_number = None |
|||
) |
Create the name of the output file to be used. This is the name of the output file of the `first step'. In the case of single-step harvesting this is already the final harvesting output ROOT file. In the case of two-step harvesting it is the name of the intermediary ME summary file.
Definition at line 4124 of file cmsHarvester.py.
Referenced by create_me_extraction_config(), and create_multicrab_config().
def cmsHarvester.dbs_check_dataset_spread | ( | self, | |
dataset_name | |||
) |
Figure out the number of events in each run of this dataset. This is a more efficient way of doing this than calling dbs_resolve_number_of_events for each run.
Definition at line 3076 of file cmsHarvester.py.
References cms::cuda.assert(), and debug.
def cmsHarvester.dbs_resolve_cmssw_version | ( | self, | |
dataset_name | |||
) |
Ask DBS for the CMSSW version used to create this dataset.
Definition at line 2475 of file cmsHarvester.py.
References cms::cuda.assert().
def cmsHarvester.dbs_resolve_dataset_name | ( | self, | |
dataset_name | |||
) |
Use DBS to resolve a wildcarded dataset name.
Definition at line 2419 of file cmsHarvester.py.
References cms::cuda.assert(), and MessageLogger_cfi.warning.
Referenced by build_dataset_list().
def cmsHarvester.dbs_resolve_datatype | ( | self, | |
dataset_name | |||
) |
Ask DBS for the data type (data or mc) of a given dataset.
Definition at line 2682 of file cmsHarvester.py.
References cms::cuda.assert().
def cmsHarvester.dbs_resolve_globaltag | ( | self, | |
dataset_name | |||
) |
Ask DBS for the globaltag corresponding to a given dataset. # BUG BUG BUG # This does not seem to work for data datasets? E.g. for # /Cosmics/Commissioning08_CRAFT0831X_V1_311_ReReco_FromSuperPointing_v1/RAW-RECO # Probably due to the fact that the GlobalTag changed during # data taking... # BUG BUG BUG end
Definition at line 2626 of file cmsHarvester.py.
References cms::cuda.assert().
def cmsHarvester.dbs_resolve_number_of_events | ( | self, | |
dataset_name, | |||
run_number = None |
|||
) |
Determine the number of events in a given dataset (and run). Ask DBS for the number of events in a dataset. If a run number is specified the number of events returned is that in that run of that dataset. If problems occur we throw an exception. # BUG BUG BUG # Since DBS does not return the number of events correctly, # neither for runs nor for whole datasets, we have to work # around that a bit... # BUG BUG BUG end
Definition at line 2735 of file cmsHarvester.py.
References cms::cuda.assert().
def cmsHarvester.dbs_resolve_runs | ( | self, | |
dataset_name | |||
) |
Ask DBS for the list of runs in a given dataset. # NOTE: This does not (yet?) skip/remove empty runs. There is # a bug in the DBS entry run.numevents (i.e. it always returns # zero) which should be fixed in the `next DBS release'. # See also: # https://savannah.cern.ch/bugs/?53452 # https://savannah.cern.ch/bugs/?53711
Definition at line 2569 of file cmsHarvester.py.
References cms::cuda.assert(), and createfilelist.int.
def cmsHarvester.escape_dataset_name | ( | self, | |
dataset_name | |||
) |
Escape a DBS dataset name. Escape a DBS dataset name such that it does not cause trouble with the file system. This means turning each `/' into `__', except for the first one which is just removed.
Definition at line 4045 of file cmsHarvester.py.
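The escaping rule described above is simple enough to sketch directly. The version below is written as a standalone function (the documented one is a method taking self) and follows only the rule stated in the description.

```python
def escape_dataset_name(dataset_name):
    """Turn a DBS dataset name into a string that is safe to use in file names."""
    # The first slash is simply removed...
    escaped = dataset_name.replace("/", "", 1)
    # ...and every remaining slash becomes a double underscore.
    return escaped.replace("/", "__")
```

Applied to the dataset name quoted elsewhere on this page, `/RelValQCD_Pt_3000_3500/CMSSW_3_3_0_pre1-STARTUP31X_V4-v1/GEN-SIM-RECO`, the three path components end up joined by `__` with no leading separator.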
def cmsHarvester.load_ref_hist_mappings | ( | self | ) |
Load the reference histogram mappings from file. The dataset name to reference histogram name mappings are read from a text file specified in self.ref_hist_mappings_file_name.
Definition at line 5169 of file cmsHarvester.py.
References geometryDiff.file, info(), mps_monitormerge.items, relativeConstraints.keys, SiStripPI.max, and nano_mu_digi_cff.strip.
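A reader for such a mappings file could be sketched as follows. The file format is not described on this page, so the sketch assumes one whitespace-separated `dataset_name ref_hist_name` pair per line with `#` starting a comment; the function name is hypothetical as well.

```python
def parse_ref_hist_mappings(lines):
    """Parse dataset-name to reference-histogram-name mappings.

    Assumed format: 'dataset_name ref_hist_name' per line, '#' starts a comment.
    """
    mappings = {}
    for line in lines:
        # Strip comments and surrounding whitespace, skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Unpacking raises ValueError unless the line has exactly two fields.
        dataset_name, ref_hist_name = line.split()
        if dataset_name in mappings:
            raise ValueError("Duplicate mapping for dataset %s" % dataset_name)
        mappings[dataset_name] = ref_hist_name
    return mappings
```

Taking an iterable of lines rather than a file name keeps the parser easy to test; an open file object can be passed in directly.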
def cmsHarvester.option_handler_caf_access | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Set the self.caf_access flag to try and create jobs that run on the CAF.
Definition at line 1103 of file cmsHarvester.py.
def cmsHarvester.option_handler_castor_dir | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Specify where on CASTOR the output should go. At the moment only output to CERN CASTOR is supported. Eventually the harvested results should go into the central place for DQM on CASTOR anyway.
Definition at line 1061 of file cmsHarvester.py.
def cmsHarvester.option_handler_crab_submission | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Crab jobs are not created and "submitted automatically",
Definition at line 1131 of file cmsHarvester.py.
def cmsHarvester.option_handler_list_types | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
List all harvesting types and their mappings. This lists all implemented harvesting types with their corresponding mappings to sequence names. This had to be separated out from the help since it depends on the CMSSW version and was making things a bit of a mess. NOTE: There is no way (at least not that I could come up with) to code this in a neat generic way that can be read both by this method and by setup_harvesting_info(). Please try hard to keep these two methods in sync!
Definition at line 1153 of file cmsHarvester.py.
References print().
def cmsHarvester.option_handler_no_t1access | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Set the self.no_t1access flag to try and create jobs that run without the special `t1access' role.
Definition at line 1086 of file cmsHarvester.py.
def cmsHarvester.option_handler_preferred_site | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Definition at line 1147 of file cmsHarvester.py.
def cmsHarvester.option_handler_saveByLumiSection | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Set process.dqmSaver.saveByLumiSection=1 in the harvesting cfg file.
Definition at line 1119 of file cmsHarvester.py.
def cmsHarvester.option_handler_sites | ( | self, | |
option, | |||
opt_str, | |||
value, | |||
parser | |||
) |
Definition at line 1141 of file cmsHarvester.py.
def cmsHarvester.parse_cmd_line_options | ( | self | ) |
Definition at line 1870 of file cmsHarvester.py.
def cmsHarvester.pick_a_site | ( | self, | |
sites, | |||
cmssw_version | |||
) |
Definition at line 1706 of file cmsHarvester.py.
References debug, relativeConstraints.error, and info().
def cmsHarvester.process_dataset_ignore_list | ( | self | ) |
Update the list of datasets taking into account the ones to ignore. Both lists have been generated before from DBS and both are assumed to be unique. NOTE: The advantage of creating the ignore list from DBS (in case a regexp is given) and matching that instead of directly matching the ignore criterion against the list of datasets (to consider) built from DBS is that in the former case we're sure that all regexps are treated exactly as DBS would have done without the cmsHarvester. NOTE: This only removes complete samples. Exclusion of single runs is done by the book keeping. So the assumption is that a user never wants to harvest just part (i.e. n out of N runs) of a sample.
Definition at line 3565 of file cmsHarvester.py.
References debug, info(), and relativeConstraints.keys.
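Since both lists are assumed to hold unique, fully resolved dataset names, the update reduces to removing complete entries. A minimal sketch, assuming the use list is a dictionary keyed by dataset name and the ignore list is any container of names (the function name is hypothetical):

```python
def apply_ignore_list(datasets_to_use, datasets_to_ignore):
    """Return the datasets to use, minus every dataset that should be ignored."""
    # Only complete samples are removed; individual runs are left untouched.
    return {name: runs
            for name, runs in datasets_to_use.items()
            if name not in datasets_to_ignore}
```

Matching exact names here is what makes the approach robust: any wildcard has already been expanded by DBS when the ignore list was built.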
def cmsHarvester.process_runs_use_and_ignore_lists | ( | self | ) |
Definition at line 3612 of file cmsHarvester.py.
References info(), print(), and MessageLogger_cfi.warning.
def cmsHarvester.ref_hist_mappings_needed | ( | self, | |
dataset_name = None |
|||
) |
Check if we need to load and check the reference mappings. For data the reference histograms should be taken automatically from the GlobalTag, so we don't need any mappings. For RelVals we need to know a mapping to be used in the es_prefer code snippet (different references for each of the datasets.) WARNING: This implementation is a bit convoluted.
Definition at line 5135 of file cmsHarvester.py.
References relativeConstraints.keys.
def cmsHarvester.run | ( | self | ) |
Definition at line 5486 of file cmsHarvester.py.
References info(), relativeConstraints.keys, update, and contentValuesCheck.values.
def cmsHarvester.setup_dbs | ( | self | ) |
Setup the Python side of DBS. For more information see the DBS Python API documentation: https://twiki.cern.ch/twiki/bin/view/CMS/DBSApiDocumentation
Definition at line 2393 of file cmsHarvester.py.
def cmsHarvester.setup_harvesting_info | ( | self | ) |
Fill our dictionary with all info needed to understand harvesting. This depends on the CMSSW version since at some point the names and sequences were modified. NOTE: There is no way (at least not that I could come up with) to code this in a neat generic way that can be read both by this method and by option_handler_list_types(). Please try hard to keep these two methods in sync!
Definition at line 1208 of file cmsHarvester.py.
def cmsHarvester.show_exit_message | ( | self | ) |
Tell the user what to do now, after this part is done. This should provide the user with some (preferably copy-pasteable) instructions on what to do now with the setups and files that have been created.
Definition at line 5433 of file cmsHarvester.py.
References info(), and MessageLogger_cfi.warning.
def cmsHarvester.singlify_datasets | ( | self | ) |
Remove all but the largest part of all datasets. This allows us to harvest at least part of these datasets using single-step harvesting until the two-step approach works.
Definition at line 3741 of file cmsHarvester.py.
References mps_monitormerge.items, SiStripPI.max, contentValuesCheck.values, and MessageLogger_cfi.warning.
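Keeping only the largest part of a spread-out dataset amounts to picking the site that holds the most events. The sketch below assumes a per-site event count is available as a dictionary and uses a hypothetical function name; how the real method derives these counts is not shown on this page.

```python
def pick_largest_site(events_per_site):
    """Return the site holding the most events.

    events_per_site: hypothetical dict {site_name: number of events at that site}
    """
    # Iterating over sorted names makes ties resolve alphabetically.
    return max(sorted(events_per_site), key=lambda site: events_per_site[site])
```

All files hosted elsewhere would then be dropped for that run, which is why the harvested result reflects lower statistics than the full dataset.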
def cmsHarvester.write_crab_config | ( | self | ) |
Write a CRAB job configuration Python file.
Definition at line 5011 of file cmsHarvester.py.
References geometryDiff.file, and info().
def cmsHarvester.write_harvesting_config | ( | self, | |
dataset_name | |||
) |
Write a harvesting job configuration Python file. NOTE: This knows nothing about single-step or two-step harvesting. That's all taken care of by create_harvesting_config.
Definition at line 5069 of file cmsHarvester.py.
References create_harvesting_config_file_name(), debug, and geometryDiff.file.
def cmsHarvester.write_me_extraction_config | ( | self, | |
dataset_name | |||
) |
Write an ME-extraction configuration Python file. This `ME-extraction' (ME = Monitoring Element) is the first step of the two-step harvesting.
Definition at line 5102 of file cmsHarvester.py.
References create_me_summary_config_file_name(), debug, and geometryDiff.file.
def cmsHarvester.write_multicrab_config | ( | self | ) |
Write a multi-CRAB job configuration Python file.
Definition at line 5040 of file cmsHarvester.py.
References geometryDiff.file, and info().
|
private |
Definition at line 40 of file cmsHarvester.py.
|
private |
Definition at line 39 of file cmsHarvester.py.
cmsHarvester.all_file_names |
Definition at line 3231 of file cmsHarvester.py.
cmsHarvester.all_sites_found |
Definition at line 1863 of file cmsHarvester.py.
cmsHarvester.caf_access |
Definition at line 1108 of file cmsHarvester.py.
cmsHarvester.castor_base_dir |
Definition at line 1077 of file cmsHarvester.py.
cmsHarvester.castor_path_checks_cache |
self.logger.debug("Path is now `%s'" % \ path)
Definition at line 1603 of file cmsHarvester.py.
cmsHarvester.castor_path_common |
if num_sites == 1: self.logger.info(" sample is contained at a single site") else: self.logger.info(" sample is spread across %d sites" % \ num_sites) if num_sites < 1:
self.logger.warning(" --> skipping dataset which is not " \ "hosted anywhere")
Definition at line 5417 of file cmsHarvester.py.
cmsHarvester.castor_paths |
Definition at line 5421 of file cmsHarvester.py.
cmsHarvester.cmd |
Definition at line 1632 of file cmsHarvester.py.
cmsHarvester.cmd_line_opts |
Definition at line 2169 of file cmsHarvester.py.
cmsHarvester.cmssw_version |
Definition at line 2347 of file cmsHarvester.py.
cmsHarvester.complete_sites |
site_names_ref = set(files_info[run_number].values()[0][1]) for site_names_tmp in files_info[run_number].values()[1:]: if set(site_names_tmp[1]) != site_names_ref: mirrored = False break
Definition at line 3276 of file cmsHarvester.py.
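The fragment quoted above indexes the result of `dict.values()`, which works in Python 2 but not in Python 3, where `values()` returns a view. A Python 3 sketch of the same "is this run mirrored?" test, assuming (as the handler code elsewhere on this page suggests) that each file maps to a `(nevents, [site_names])` tuple:

```python
def is_mirrored(files_of_run):
    """Return True if every file of a run is hosted at the same set of sites.

    files_of_run: dict {file_name: (nevents, [site_names])}
    """
    site_sets = [set(site_names)
                 for (_nevents, site_names) in files_of_run.values()]
    # Compare every file's site set against that of the first file.
    return all(sites == site_sets[0] for sites in site_sets[1:])
```

An empty run is reported as mirrored by this sketch, since there is nothing to contradict it; a caller may want to treat that case separately.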
cmsHarvester.config_contents |
if self.harvesting_mode == "two-step": castor_dir = self.datasets_information[dataset_name] \ ["castor_path"][run] customisations.append("") customisations.append("# This is the second step (the real") customisations.append("# harvesting step) of a two-step") customisations.append("# harvesting procedure.")
customisations.append("import pdb")
customisations.append("import subprocess") customisations.append("import os") customisations.append("castor_dir = \"s"" % castor_dir) customisations.append("cmd = "rfdir s" % castor_dir") customisations.append("(status, output) = subprocess.getstatusoutput(cmd)") customisations.append("if status != 0:") customisations.append(" print "ERROR"") customisations.append(" raise Exception, "ERROR"") customisations.append("file_names = [os.path.join("rfio:s" % path, i) for i in output.split() if i.startswith("EDM_summary") and i.endswith(".root")]") #customisations.append("pdb.set_trace()") customisations.append("process.source.fileNames = cms.untracked.vstring(*file_names)") customisations.append("")
Definition at line 4890 of file cmsHarvester.py.
cmsHarvester.config_file_name |
pdb.set_trace() if self.datasets_information[dataset_name] \ ["mirrored"][run_number] == False: config_file_name = config_file_name.replace(".py", "_partial.py")
Definition at line 4085 of file cmsHarvester.py.
cmsHarvester.crab_submission |
Definition at line 1135 of file cmsHarvester.py.
cmsHarvester.dataset_names_after_checks |
Definition at line 4030 of file cmsHarvester.py.
cmsHarvester.dataset_names_after_checks_tmp |
Definition at line 4023 of file cmsHarvester.py.
cmsHarvester.datasets_information |
Definition at line 5305 of file cmsHarvester.py.
cmsHarvester.datasets_to_ignore |
Definition at line 3457 of file cmsHarvester.py.
cmsHarvester.datasets_to_use |
Definition at line 4039 of file cmsHarvester.py.
cmsHarvester.dbs_api |
Definition at line 2406 of file cmsHarvester.py.
cmsHarvester.empty_runs |
Definition at line 4007 of file cmsHarvester.py.
cmsHarvester.exit_code |
Definition at line 5654 of file cmsHarvester.py.
cmsHarvester.file_name |
Definition at line 3175 of file cmsHarvester.py.
cmsHarvester.files_at_site |
Definition at line 3235 of file cmsHarvester.py.
cmsHarvester.files_info |
Definition at line 3161 of file cmsHarvester.py.
cmsHarvester.files_without_sites |
Definition at line 3201 of file cmsHarvester.py.
cmsHarvester.globaltag |
Definition at line 2307 of file cmsHarvester.py.
cmsHarvester.harvesting_info |
Definition at line 1314 of file cmsHarvester.py.
cmsHarvester.harvesting_mode |
Definition at line 2216 of file cmsHarvester.py.
cmsHarvester.harvesting_type |
Definition at line 3858 of file cmsHarvester.py.
cmsHarvester.Jsonfilename |
Definition at line 3707 of file cmsHarvester.py.
cmsHarvester.Jsonlumi |
cmsHarvester.mirrored |
Definition at line 3222 of file cmsHarvester.py.
cmsHarvester.msg |
class Handler(xml.sax.handler.ContentHandler): def startElement(self, name, attrs): if name == "result": site_name = str(attrs["STORAGEELEMENT_SENAME"])
\
if len(site_name) < 1: return
run_number = int(attrs["RUNS_RUNNUMBER"]) file_name = str(attrs["FILES_LOGICALFILENAME"]) nevents = int(attrs["FILES_NUMBEROFEVENTS"])
if not files_info.has_key(run_number):
files_info[run_number] = {} files_info[run_number][file_name] = (nevents, [site_name]) elif not files_info[run_number].has_key(file_name):
files_info[run_number][file_name] = (nevents, [site_name]) else:
assert nevents == files_info[run_number][file_name][0]
files_info[run_number][file_name][1].append(site_name) OBSOLETE OBSOLETE OBSOLETE end
Definition at line 1640 of file cmsHarvester.py.
cmsHarvester.nevents |
Definition at line 3176 of file cmsHarvester.py.
cmsHarvester.non_t1access |
Definition at line 1092 of file cmsHarvester.py.
cmsHarvester.nr_max_sites |
Definition at line 1143 of file cmsHarvester.py.
cmsHarvester.num_events_catalog |
Definition at line 3215 of file cmsHarvester.py.
cmsHarvester.num_events_dataset |
Definition at line 3985 of file cmsHarvester.py.
cmsHarvester.num_sites |
if self.datasets_information[dataset_name]["num_events"][run_number] != 0: pdb.set_trace() DEBUG DEBUG DEBUG end
Definition at line 3955 of file cmsHarvester.py.
cmsHarvester.option_parser |
Definition at line 1879 of file cmsHarvester.py.
cmsHarvester.output |
Definition at line 1633 of file cmsHarvester.py.
cmsHarvester.path |
else:
self.logger.debug(" accepting") Add piece to the path we're building. self.logger.debug("!!! Skip path piece `%s'? %s" % \ (piece, str(skip_this_path_piece))) self.logger.debug("Adding piece to path...")
Definition at line 1592 of file cmsHarvester.py.
cmsHarvester.permissions |
Definition at line 1649 of file cmsHarvester.py.
Referenced by cond::CredentialStore.updatePrincipal().
cmsHarvester.permissions_new |
Definition at line 1679 of file cmsHarvester.py.
cmsHarvester.permissions_target |
Definition at line 1673 of file cmsHarvester.py.
cmsHarvester.preferred_site |
Definition at line 1149 of file cmsHarvester.py.
cmsHarvester.ref_hist_mappings_file_name |
Definition at line 2258 of file cmsHarvester.py.
cmsHarvester.run_number |
Definition at line 3174 of file cmsHarvester.py.
cmsHarvester.runs_to_ignore |
Definition at line 3554 of file cmsHarvester.py.
cmsHarvester.runs_to_use |
Definition at line 3530 of file cmsHarvester.py.
cmsHarvester.saveByLumiSection |
Definition at line 1122 of file cmsHarvester.py.
cmsHarvester.site_names |
Definition at line 3217 of file cmsHarvester.py.
cmsHarvester.sites_with_complete_copies |
Definition at line 3233 of file cmsHarvester.py.
cmsHarvester.skip_this_path_piece |
self.logger.debug("Checking CASTOR path piece `%s'" % \ piece)
self.logger.debug("Checking `%s' against `%s'" % \ (castor_path_pieces[piece_index + check_size], castor_paths_dont_touch[check_size])) self.logger.debug(" skipping")
Definition at line 1584 of file cmsHarvester.py.
cmsHarvester.status |
Definition at line 1633 of file cmsHarvester.py.
cmsHarvester.tmp |
This basically means copying over the
for dataset_name in self.datasets_to_use.keys(): self.datasets_to_use[dataset_name] = self.datasets_information[dataset_name]["runs"]
OBSOLETE OBSOLETE OBSOLETE end tmp = self.datasets_information[dataset_name] \ ["num_events"]
Definition at line 3982 of file cmsHarvester.py.
cmsHarvester.traceback_string |
Definition at line 5679 of file cmsHarvester.py.
cmsHarvester.twiki_url |
Definition at line 43 of file cmsHarvester.py.