mergeLHE.DefaultLHEMerger Class Reference
Inheritance diagram for mergeLHE.DefaultLHEMerger:
mergeLHE.BaseLHEMerger

Public Member Functions

def __init__ (self, input_files, output_file, **kwargs)
 
def check_header_compatibility (self)
 
def file_iterator (self, path)
 
def merge (self)
 
def merge_headers (self)
 
def merge_init_blocks (self)
 
- Public Member Functions inherited from mergeLHE.BaseLHEMerger
def __init__ (self, input_files, output_file)
 
def merge (self)
 

Public Attributes

 bypass_check
 
- Public Attributes inherited from mergeLHE.BaseLHEMerger
 input_files
 
 output_file
 

Private Attributes

 _f
 
 _header_lines
 
 _header_str
 
 _init_str
 
 _is_mglo
 
 _merged_init_str
 
 _nevent
 
 _uwgt
 
 _xsec_combined
 

Detailed Description

Default LHE merge scheme that copies the header of the first LHE file,
merges and outputs the init block, then concatenates all event blocks.

Definition at line 26 of file mergeLHE.py.

Constructor & Destructor Documentation

◆ __init__()

def mergeLHE.DefaultLHEMerger.__init__ (   self,
  input_files,
  output_file,
  **kwargs 
)

Definition at line 30 of file mergeLHE.py.

30      def __init__(self, input_files, output_file, **kwargs):
31          super(DefaultLHEMerger, self).__init__(input_files, output_file)
32
33          self.bypass_check = kwargs.get('bypass_check', False)
34          # line-by-line iterator for each input file
35          self._f = [self.file_iterator(name) for name in self.input_files]
36          self._header_str = []
37          self._is_mglo = False
38          self._xsec_combined = 0.
39          self._uwgt = 0.
40          self._init_str = [] # initiated blocks for each input file
41          self._nevent = [] # number of events for each input file
42 

Member Function Documentation

◆ check_header_compatibility()

def mergeLHE.DefaultLHEMerger.check_header_compatibility (   self)
Check if all headers for input files are consistent.

Definition at line 49 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger.bypass_check.

Referenced by mergeLHE.DefaultLHEMerger.merge().

49      def check_header_compatibility(self):
50          """Check if all headers for input files are consistent."""
51
52          if self.bypass_check:
53              return
54
55          inconsistent_error_info = ("Incompatibility found in LHE headers: %s. "
56                                     "Use -b/--bypass-check to bypass the check.")
57          allow_diff_keys = [
58              'nevent', 'numevts', 'iseed', 'Seed', 'Random', '.log', '.dat', '.lhe',
59              'Number of Events', 'Integrated weight'
60          ]
61          self._header_lines = [header.split('\n') for header in self._header_str]
62
63          # Iterate over header lines for all input files and check consistency
64          logging.debug('header line number: %s' \
65              % ', '.join([str(len(lines)) for lines in self._header_lines]))
66          assert all([
67              len(self._header_lines[0]) == len(lines) for lines in self._header_lines]
68          ), inconsistent_error_info % "line number does not match"
69          inconsistent_lines_set = [set() for _ in self._header_lines]
70          for line_zip in zip(*self._header_lines):
71              if any([k in line_zip[0] for k in allow_diff_keys]):
72                  logging.debug('Captured \'%s\', we allow difference in this line' % line_zip[0])
73                  continue
74              if not all([line_zip[0] == line for line in line_zip]):
75                  # Ok so meet inconsistency in some lines, then temporarily store them
76                  for i, line in enumerate(line_zip):
77                      inconsistent_lines_set[i].add(line)
78          # Those inconsistent lines still match, meaning that it is only a change of order
79          assert all([inconsistent_lines_set[0] == lset for lset in inconsistent_lines_set]), \
80              inconsistent_error_info % ('{' + ', '.join(inconsistent_lines_set[0]) + '}')
81 
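
The check above can be sketched as a standalone function that returns a boolean instead of asserting. This is only an illustration: the name `headers_compatible` and the shortened `allow_diff_keys` default are assumptions made for the example, not part of mergeLHE.py. As in the listing, a line is skipped when the first file's copy of it contains an allowed key, and mismatching lines are tolerated when every file ends up with the same set of them (a pure change of order).

```python
def headers_compatible(headers, allow_diff_keys=('nevent', 'iseed', 'Seed')):
    """Return True if headers differ only in allowed lines or in line order.

    Hypothetical helper mirroring the logic of check_header_compatibility().
    """
    header_lines = [h.split('\n') for h in headers]
    # All headers must have the same number of lines
    if any(len(lines) != len(header_lines[0]) for lines in header_lines):
        return False

    mismatched = [set() for _ in header_lines]
    for line_zip in zip(*header_lines):
        # Lines carrying per-file information (seeds, event counts) may differ
        if any(k in line_zip[0] for k in allow_diff_keys):
            continue
        if not all(line == line_zip[0] for line in line_zip):
            # Remember the mismatching line of each file
            for i, line in enumerate(line_zip):
                mismatched[i].add(line)
    # Identical sets mean the same lines appear in every file, only reordered
    return all(s == mismatched[0] for s in mismatched)


first = "a = 1\nb = 2\n123 = iseed"
reordered = "b = 2\na = 1\n456 = iseed"
changed = "a = 1\nb = 3\n789 = iseed"

same_content = headers_compatible([first, reordered])
different_content = headers_compatible([first, changed])
```

Here `same_content` is true because the two headers differ only in line order and in the seed line, while `different_content` is false because `b` takes different values.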

◆ file_iterator()

def mergeLHE.DefaultLHEMerger.file_iterator (   self,
  path 
)
Line-by-line iterator of a txt file

Definition at line 43 of file mergeLHE.py.

43      def file_iterator(self, path):
44          """Line-by-line iterator of a txt file"""
45          with open(path, 'r') as f:
46              for line in f:
47                  yield line
48 
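
A minimal sketch of how such a generator can be consumed with `next()`, in the way merge() reads the header and the `<init>` block. The free function `file_iterator`, the helper `read_until`, and the tiny sample file are assumptions made for this example; they are not part of mergeLHE.py.

```python
import os
import re
import tempfile


def file_iterator(path):
    """Yield the lines of a text file one at a time (lazy, constant memory)."""
    with open(path, 'r') as f:
        for line in f:
            yield line


def read_until(lines, pattern):
    """Collect lines from the iterator until one matches `pattern`.

    Hypothetical helper; the matching line itself is consumed and dropped.
    """
    collected = []
    line = next(lines)
    while not re.search(pattern, line):
        collected.append(line)
        line = next(lines)
    return ''.join(collected)


# Write a tiny stand-in for an LHE file
sample = ('<LesHouchesEvents version="3.0">\n<header>\n</header>\n'
          '<init>\n2212 2212 1\n</init>\n')
with tempfile.NamedTemporaryFile('w', suffix='.lhe', delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

it = file_iterator(tmp_path)
header = read_until(it, r'\s*<init(>|\s)')  # everything before <init>
init = read_until(it, r'\s*</init>')        # everything inside <init>...</init>
it.close()  # closing the generator also closes the underlying file
os.remove(tmp_path)
```

Because the iterator keeps its position between calls, the second `read_until` call resumes right after the `<init>` line.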

◆ merge()

def mergeLHE.DefaultLHEMerger.merge (   self)

Definition at line 196 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._f, edmStreamStallGrapher.StallMonitorParser._f, edmTracerCompactLogViewer.TracerCompactFileParser._f, mergeLHE.DefaultLHEMerger._header_str, mergeLHE.DefaultLHEMerger._init_str, mergeLHE.DefaultLHEMerger._is_mglo, mergeLHE.DefaultLHEMerger._nevent, mergeLHE.DefaultLHEMerger._uwgt, mergeLHE.DefaultLHEMerger._xsec_combined, mps_setup.append, mergeLHE.DefaultLHEMerger.bypass_check, mergeLHE.DefaultLHEMerger.check_header_compatibility(), ALCARECOEcalPhiSym_cff.float, watchdog.group, join(), mergeLHE.DefaultLHEMerger.merge_headers(), mergeLHE.DefaultLHEMerger.merge_init_blocks(), GetRecoTauVFromDQM_MC_cff.next, DTT0WireWorkflow.DTT0WireWorkflow.output_file, mergeLHE.BaseLHEMerger.output_file, DTVdriftWorkflow.DTvdriftWorkflow.output_file, DTTtrigWorkflow.DTttrigWorkflow.output_file, isotrackApplyRegressor.range, and Validation_hcalonly_cfi.sign.

196      def merge(self):
197          with open(self.output_file, 'w') as fw:
198              # Read the header for the all input files
199              for i in range(len(self._f)):
200                  header = []
201                  line = next(self._f[i])
202                  while not re.search('\s*<init(>|\s)', line):
203                      header.append(line)
204                      line = next(self._f[i])
205                  # 'header' includes all contents before reaches <init>
206                  self._header_str.append(''.join(header))
207              self.check_header_compatibility()
208
209              # Read <init> blocks for all input_files
210              for i in range(len(self._f)):
211                  init = []
212                  line = next(self._f[i])
213                  while not re.search('\s*</init>', line):
214                      init.append(line)
215                      line = next(self._f[i])
216                  # 'init_str' includes all contents inside <init>...</init>
217                  self._init_str.append(''.join(init))
218
219              # Iterate over all events file-by-file and write events temporarily
220              # to .tmp.lhe
221              with open('.tmp.lhe', 'w') as _fwtmp:
222                  for i in range(len(self._f)):
223                      nevent = 0
224                      while True:
225                          line = next(self._f[i])
226                          if re.search('\s*</event>', line):
227                              nevent += 1
228                          if re.search('\s*</LesHouchesEvents>', line):
229                              break
230                          _fwtmp.write(line)
231                      self._nevent.append(nevent)
232                      self._f[i].close()
233
234              # Merge the header and init blocks and write to the output
235              fw.write(self.merge_headers())
236              fw.write('<init>\n' + self.merge_init_blocks() + '</init>\n')
237
238              # Write event blocks in .tmp.lhe back to the output
239              # If is MG5 LO LHE, will recalculate the weights based on combined xsec
240              # and nevent read from <MGGenerationInfo>, and the 'event_norm' mode
241              if self._is_mglo and not self.bypass_check:
242                  event_norm = re.search(
243                      r'\s(\w+)\s*=\s*event_norm\s',
244                      self._header_str[0]).group(1)
245                  if event_norm == 'sum':
246                      self._uwgt = self._xsec_combined / sum(self._nevent)
247                  elif event_norm == 'average':
248                      self._uwgt = self._xsec_combined
249                  logging.info(("MG5 LO LHE with event_norm = %s detected. Will "
250                                "recalculate weights in each event block.\n"
251                                "Unit weight: %+.7E") % (event_norm, self._uwgt))
252
253                  # Modify event wgt when transfering .tmp.lhe to the output file
254                  event_line = -999
255                  with open('.tmp.lhe', 'r') as ftmp:
256                      sign = lambda x: -1 if x < 0 else 1
257                      for line in ftmp:
258                          event_line += 1
259                          if re.search('\s*<event.*>', line) and not re.search('\s*<event_num.*>', line):
260                              event_line = 0
261                          if event_line == 1:
262                              # modify the XWGTUP appeared in the first line of the
263                              # <event> block
264                              orig_wgt = float(line.split()[2])
265                              fw.write(re.sub(r'(^\s*\S+\s+\S+\s+)\S+(.+)', r'\g<1>%+.7E\g<2>' \
266                                  % (sign(orig_wgt) * self._uwgt), line))
267                          elif re.search('\s*<wgt.*>.*</wgt>', line):
268                              addi_wgt_str = re.search(r'<wgt.*>\s*(\S+)\s*<\/wgt>', line).group(1)
269                              fw.write(line.replace(
270                                  addi_wgt_str, '%+.7E' % (float(addi_wgt_str) / orig_wgt * self._uwgt)))
271                          else:
272                              fw.write(line)
273              else:
274                  # Simply transfer all lines
275                  with open('.tmp.lhe', 'r') as ftmp:
276                      for line in ftmp:
277                          fw.write(line)
278              fw.write('</LesHouchesEvents>\n')
279          os.remove('.tmp.lhe')
281 
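
The weight recalculation for the MG5 LO case can be illustrated in isolation. The sketch below assumes two hypothetical helpers, `unit_weight` and `rescale_event_line`, and an invented event line. It reuses the regular expression from the listing to overwrite the third field (XWGTUP) of the first line of an `<event>` block while keeping the sign of the original weight. Unlike the listing, which leaves the unit weight at its initial value for any other `event_norm`, the sketch raises an error in that case.

```python
import re


def unit_weight(xsec_combined, nevents, event_norm):
    """Unit event weight for the merged file (hypothetical helper)."""
    if event_norm == 'sum':
        # Weights sum to the cross section
        return xsec_combined / sum(nevents)
    if event_norm == 'average':
        # Weights average to the cross section
        return xsec_combined
    # Assumption of this sketch only: reject other modes explicitly
    raise ValueError('unsupported event_norm: %s' % event_norm)


def rescale_event_line(line, uwgt):
    """Replace XWGTUP (third field) by +/- uwgt, keeping the original sign."""
    orig_wgt = float(line.split()[2])
    sign = -1 if orig_wgt < 0 else 1
    return re.sub(r'(^\s*\S+\s+\S+\s+)\S+(.+)',
                  r'\g<1>%+.7E\g<2>' % (sign * uwgt), line)


# Two inputs with 1000 and 3000 events and a combined cross section of 17.5 pb
uwgt = unit_weight(17.5, [1000, 3000], 'sum')

# Invented first line of an <event> block: NUP IDPRUP XWGTUP SCALUP AQEDUP AQCDUP
old_line = ' 5 1 -2.5000000E-03 9.1E+01 7.5E-03 1.2E-01\n'
new_line = rescale_event_line(old_line, uwgt)
```

With these numbers the unit weight is 17.5 / 4000 = 4.375E-03, so the third field of `new_line` becomes `-4.3750000E-03` and the other fields are unchanged.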

◆ merge_headers()

def mergeLHE.DefaultLHEMerger.merge_headers (   self)
Merge the headers of input LHEs. Need special handle for the MG5 LO case.

Definition at line 82 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._header_str, mergeLHE.DefaultLHEMerger._is_mglo, mergeLHE.DefaultLHEMerger._nevent, mergeLHE.DefaultLHEMerger._xsec_combined, python.cmstools.all(), mergeLHE.DefaultLHEMerger.bypass_check, ALCARECOEcalPhiSym_cff.float, watchdog.group, python.rootplot.root2matplotlib.replace(), and reco.zip().

Referenced by mergeLHE.DefaultLHEMerger.merge().

82      def merge_headers(self):
83          """Merge the headers of input LHEs. Need special handle for the MG5 LO case."""
84
85          self._is_mglo = all(['MGGenerationInfo' in header for header in self._header_str])
86          if self._is_mglo and not self.bypass_check:
87              # Special handling of MadGraph5 LO LHEs
88              match_geninfo = [
89                  re.search(
90                      (r"<MGGenerationInfo>\s+#\s*Number of Events\s*\:\s*(\S+)\s+"
91                       r"#\s*Integrated weight \(pb\)\s*\:\s*(\S+)\s+<\/MGGenerationInfo>"),
92                      header
93                  ) for header in self._header_str
94              ]
95              self._xsec_combined = sum(
96                  [float(info.group(2)) * nevt for info, nevt in zip(match_geninfo, self._nevent)]
97              ) / sum(self._nevent)
98              geninfo_combined = ("<MGGenerationInfo>\n"
99                                  "# Number of Events : %d\n"
100                                  "# Integrated weight (pb) : %.10f\n</MGGenerationInfo>") \
101                  % (sum(self._nevent), self._xsec_combined)
102              logging.info('Detected: MG5 LO LHEs. Input <MGGenerationInfo>:\n\tnevt\txsec')
103              for info, nevt in zip(match_geninfo, self._nevent):
104                  logging.info('\t%d\t%.10f' % (nevt, float(info.group(2))))
105              logging.info('Combined <MGGenerationInfo>:\n\t%d\t%.10f' \
106                  % (sum(self._nevent), self._xsec_combined))
107
108              header_combined = self._header_str[0].replace(match_geninfo[0].group(), geninfo_combined)
109              return header_combined
110
111          else:
112              # No need to merge the headers
113              return self._header_str[0]
114 
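
As a rough illustration of the MG5 LO branch, the combined cross section is the event-weighted average of the per-file integrated weights. The sketch below reuses the regular expression from the listing. The helpers `make_header` and `combined_xsec` and the numbers are assumptions made for the example.

```python
import re

# Same pattern as in merge_headers(): captures the event count and the
# integrated weight (in pb) from the <MGGenerationInfo> block
GENINFO_RE = re.compile(
    r"<MGGenerationInfo>\s+#\s*Number of Events\s*\:\s*(\S+)\s+"
    r"#\s*Integrated weight \(pb\)\s*\:\s*(\S+)\s+<\/MGGenerationInfo>")


def make_header(nevt, xsec):
    """Build a minimal stand-in header (hypothetical helper)."""
    return ("<MGGenerationInfo>\n"
            "#  Number of Events        :  %d\n"
            "#  Integrated weight (pb)  :  %.4f\n"
            "</MGGenerationInfo>\n") % (nevt, xsec)


def combined_xsec(headers, nevents):
    """Event-weighted average of the integrated weights found in the headers."""
    xsecs = [float(GENINFO_RE.search(h).group(2)) for h in headers]
    return sum(x * n for x, n in zip(xsecs, nevents)) / sum(nevents)


# File 1: 1000 events at 10 pb; file 2: 3000 events at 20 pb
headers = [make_header(1000, 10.0), make_header(3000, 20.0)]
xsec = combined_xsec(headers, [1000, 3000])
```

For these inputs the result is (10 * 1000 + 20 * 3000) / 4000 = 17.5 pb. Note that the listing takes the event counts from the events actually read (`_nevent`), not from the header field, and the sketch follows the same convention by passing them separately.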

◆ merge_init_blocks()

def mergeLHE.DefaultLHEMerger.merge_init_blocks (   self)
If all <init> blocks are identical, return the same <init> block
(in the case of Powheg LHEs); otherwise, calculate the output <init>
blocks by merging the input blocks info using formula (same with the
MG5LOLHEMerger scheme):
    XSECUP = sum(xsecup * no.events) / tot.events
    XERRUP = sqrt( sum(sigma^2 * no.events^2) ) / tot.events
    XMAXUP = max(xmaxup)

Definition at line 115 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._f, edmStreamStallGrapher.StallMonitorParser._f, edmTracerCompactLogViewer.TracerCompactFileParser._f, mergeLHE.DefaultLHEMerger._init_str, mergeLHE.DefaultLHEMerger._nevent, python.cmstools.all(), mergeLHE.DefaultLHEMerger.bypass_check, ALCARECOEcalPhiSym_cff.float, mergeLHE.BaseLHEMerger.input_files, DTWorkflow.DTWorkflow.input_files, createfilelist.int, relativeConstraints.keys, WZElectronSkims53X_cff.max, isotrackApplyRegressor.range, submitPVValidationJobs.split(), nano_mu_digi_cff.strip, and mkLumiAveragedPlots.tuple.

Referenced by mergeLHE.DefaultLHEMerger.merge().

115      def merge_init_blocks(self):
116          """If all <init> blocks are identical, return the same <init> block
117          (in the case of Powheg LHEs); otherwise, calculate the output <init>
118          blocks by merging the input blocks info using formula (same with the
119          MG5LOLHEMerger scheme):
120              XSECUP = sum(xsecup * no.events) / tot.events
121              XERRUP = sqrt( sum(sigma^2 * no.events^2) ) / tot.events
122              XMAXUP = max(xmaxup)
123          """
124
125          if self.bypass_check:
126              # If bypass the consistency check, simply use the first LHE <init>
127              # block as the output
128              return self._init_str[0]
129
130          # Initiate collected init block info. Will be in format of
131          # {iprocess: [xsecup, xerrup, xmaxup]}
132          new_init_block = {}
133          old_init_block = [{} for _ in self._init_str]
134
135          # Read the xsecup, xerrup, and xmaxup info from the <init> block for
136          # all input LHEs
137          for i, bl in enumerate(self._init_str): # loop over files
138              nline = int(bl.split('\n')[0].strip().split()[-1])
139
140              # loop over lines in <init> block
141              for bl_line in bl.split('\n')[1:nline + 1]:
142                  bl_line_sp = bl_line.split()
143                  old_init_block[i][int(bl_line_sp[3])] = [
144                      float(bl_line_sp[0]), float(bl_line_sp[1]), float(bl_line_sp[2])]
145
146              # After reading all subprocesses info, store the rest content in
147              # <init> block for the first file
148              if i == 0:
149                  info_after_subprocess = bl.strip().split('\n')[nline + 1:]
150
151              logging.info('Input file: %s' % self.input_files[i])
152              for ipr in sorted(list(old_init_block[i].keys()), reverse=True):
153                  # reverse order: follow the MG5 custom
154                  logging.info(' xsecup, xerrup, xmaxup, lprup: %.6E, %.6E, %.6E, %d' \
155                      % tuple(old_init_block[i][ipr] + [ipr]))
156
157          # Adopt smarter <init> block merging method
158          # If all <init> blocks from input files are identical, return the same block;
159          # otherwise combine them based on MG5LOLHEMerger scheme
160          if all([old_init_block[i] == old_init_block[0] for i in range(len(self._f))]):
161              # All <init> blocks are identical
162              logging.info(
163                  'All input <init> blocks are identical. Output the same "<init> block.')
164              return self._init_str[0]
165
166          # Otherwise, calculate merged init block
167          for i in range(len(self._f)):
168              for ipr in old_init_block[i]:
169                  # Initiate the subprocess for the new block if it is found for the
170                  # first time in one input file
171                  if ipr not in new_init_block:
172                      new_init_block[ipr] = [0., 0., 0.]
173                  new_init_block[ipr][0] += old_init_block[i][ipr][0] * self._nevent[i] # xsecup
174                  new_init_block[ipr][1] += old_init_block[i][ipr][1]**2 * self._nevent[i]**2 # xerrup
175                  new_init_block[ipr][2] = max(new_init_block[ipr][2], old_init_block[i][ipr][2]) # xmaxup
176          tot_nevent = sum([self._nevent[i] for i in range(len(self._f))])
177
178          # Write first line of the <init> block (modify the nprocess at the last)
179          self._merged_init_str = self._init_str[0].split('\n')[0].strip().rsplit(' ', 1)[0] \
180              + ' ' + str(len(new_init_block)) + '\n'
181          # Form the merged init block
182          logging.info('Output file: %s' % self.output_file)
183          for ipr in sorted(list(new_init_block.keys()), reverse=True):
184              # reverse order: follow the MG5 custom
185              new_init_block[ipr][0] /= tot_nevent
186              new_init_block[ipr][1] = math.sqrt(new_init_block[ipr][1]) / tot_nevent
187              logging.info(' xsecup, xerrup, xmaxup, lprup: %.6E, %.6E, %.6E, %d' \
188                  % tuple(new_init_block[ipr] + [ipr]))
189              self._merged_init_str += '%.6E %.6E %.6E %d\n' % tuple(new_init_block[ipr] + [ipr])
190          self._merged_init_str += '\n'.join(info_after_subprocess)
191          if len(info_after_subprocess):
192              self._merged_init_str += '\n'
193
194          return self._merged_init_str
195 
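
A small worked example of the merging formula for a single subprocess may help. The function `merge_process` and its input tuples are assumptions made for this sketch. It also assumes that the subprocess appears in every input file, so that the total event count equals the sum over the listed entries; the listing always divides by the total over all files.

```python
import math


def merge_process(entries):
    """Merge (xsecup, xerrup, xmaxup, nevent) tuples of one subprocess.

    Hypothetical helper applying
        XSECUP = sum(xsecup * nevent) / tot_nevent
        XERRUP = sqrt(sum(xerrup^2 * nevent^2)) / tot_nevent
        XMAXUP = max(xmaxup)
    """
    tot_nevent = sum(n for _, _, _, n in entries)
    xsecup = sum(x * n for x, _, _, n in entries) / tot_nevent
    xerrup = math.sqrt(sum((e * n) ** 2 for _, e, _, n in entries)) / tot_nevent
    xmaxup = max(m for _, _, m, _ in entries)
    return xsecup, xerrup, xmaxup


# Two files with 1000 events each
xsecup, xerrup, xmaxup = merge_process([
    (10.0, 0.3, 0.01, 1000),
    (20.0, 0.4, 0.02, 1000),
])
```

For these inputs XSECUP = (10000 + 20000) / 2000 = 15, XERRUP = sqrt(300^2 + 400^2) / 2000 = 0.25, and XMAXUP = 0.02.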

Member Data Documentation

◆ _f

◆ _header_lines

mergeLHE.DefaultLHEMerger._header_lines
private

Definition at line 61 of file mergeLHE.py.

◆ _header_str

mergeLHE.DefaultLHEMerger._header_str
private

◆ _init_str

mergeLHE.DefaultLHEMerger._init_str
private

◆ _is_mglo

mergeLHE.DefaultLHEMerger._is_mglo
private

◆ _merged_init_str

mergeLHE.DefaultLHEMerger._merged_init_str
private

Definition at line 179 of file mergeLHE.py.

◆ _nevent

mergeLHE.DefaultLHEMerger._nevent
private

◆ _uwgt

mergeLHE.DefaultLHEMerger._uwgt
private

Definition at line 39 of file mergeLHE.py.

Referenced by mergeLHE.DefaultLHEMerger.merge().

◆ _xsec_combined

mergeLHE.DefaultLHEMerger._xsec_combined
private

◆ bypass_check