mergeLHE.DefaultLHEMerger Class Reference
Inheritance diagram for mergeLHE.DefaultLHEMerger:
mergeLHE.BaseLHEMerger

Public Member Functions

def __init__ (self, input_files, output_file, **kwargs)
 
def check_header_compatibility (self)
 
def file_iterator (self, path)
 
def merge (self)
 
def merge_headers (self)
 
def merge_init_blocks (self)
 
- Public Member Functions inherited from mergeLHE.BaseLHEMerger
def __init__ (self, input_files, output_file)
 
def merge (self)
 

Public Attributes

 bypass_check
 
- Public Attributes inherited from mergeLHE.BaseLHEMerger
 input_files
 
 output_file
 

Private Attributes

 _f
 
 _header_lines
 
 _header_str
 
 _init_str
 
 _is_mglo
 
 _merged_init_str
 
 _nevent
 
 _uwgt
 
 _xsec_combined
 

Detailed Description

Default LHE merge scheme that copies the header of the first LHE file,
merges and outputs the init block, then concatenates all event blocks.

Definition at line 23 of file mergeLHE.py.
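
The scheme treats each LHE file as three consecutive parts: a header (everything before <init>), the <init> block, and the event blocks up to </LesHouchesEvents>. The following is a minimal standalone sketch of that split, not the class's own code: split_lhe and SAMPLE are hypothetical names introduced here for illustration, and only the three regular expressions are borrowed from the merge() listing on this page.

```python
import re


def split_lhe(text):
    """Split LHE text into (header, init, events) strings (illustrative helper)."""
    lines = iter(text.splitlines(keepends=True))
    header, init, events = [], [], []
    # Everything before the <init> tag is the header; the tag itself is dropped.
    for line in lines:
        if re.search(r'\s*<init(>|\s)', line):
            break
        header.append(line)
    # Lines between <init> and </init> form the init block.
    for line in lines:
        if re.search(r'\s*</init>', line):
            break
        init.append(line)
    # The remaining lines up to the closing root tag are event blocks.
    for line in lines:
        if re.search(r'\s*</LesHouchesEvents>', line):
            break
        events.append(line)
    return ''.join(header), ''.join(init), ''.join(events)


# Made-up, heavily shortened LHE content used only to exercise the helper.
SAMPLE = (
    '<LesHouchesEvents version="3.0">\n'
    '<header>\n'
    '</header>\n'
    '<init>\n'
    '2212 2212 6.5e3 6.5e3 0 0 247000 247000 -4 1\n'
    '1.0e+01 1.0e-01 1.0e+01 1\n'
    '</init>\n'
    '<event>\n'
    ' 1 1 +1.0e+01 9.1e+01 7.5e-03 1.2e-01\n'
    '</event>\n'
    '</LesHouchesEvents>\n'
)

header, init, events = split_lhe(SAMPLE)
print(events.count('</event>'), 'event block(s) found')
```

Because all three loops share one iterator, each part resumes where the previous one stopped, which mirrors how merge() advances the per-file iterators stored in _f.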

Constructor & Destructor Documentation

◆ __init__()

def mergeLHE.DefaultLHEMerger.__init__ (   self,
  input_files,
  output_file,
  **kwargs 
)

Definition at line 27 of file mergeLHE.py.

    def __init__(self, input_files, output_file, **kwargs):
        super(DefaultLHEMerger, self).__init__(input_files, output_file)

        self.bypass_check = kwargs.get('bypass_check', False)
        # line-by-line iterator for each input file
        self._f = [self.file_iterator(name) for name in self.input_files]
        self._header_str = []
        self._is_mglo = False
        self._xsec_combined = 0.
        self._uwgt = 0.
        self._init_str = []  # initiated blocks for each input file
        self._nevent = []  # number of events for each input file


Member Function Documentation

◆ check_header_compatibility()

def mergeLHE.DefaultLHEMerger.check_header_compatibility (   self)
Check if all headers for input files are consistent.

Definition at line 46 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger.bypass_check.

Referenced by mergeLHE.DefaultLHEMerger.merge().

    def check_header_compatibility(self):
        """Check if all headers for input files are consistent."""

        if self.bypass_check:
            return

        inconsistent_error_info = ("Incompatibility found in LHE headers: %s. "
                                   "Use -b/--bypass-check to bypass the check.")
        allow_diff_keys = [
            'nevent', 'numevts', 'iseed', 'Seed', 'Random', '.log', '.dat', '.lhe',
            'Number of Events', 'Integrated weight'
        ]
        self._header_lines = [header.split('\n') for header in self._header_str]

        # Iterate over header lines for all input files and check consistency
        logging.debug('header line number: %s' \
            % ', '.join([str(len(lines)) for lines in self._header_lines]))
        assert all([
            len(self._header_lines[0]) == len(lines) for lines in self._header_lines]
            ), inconsistent_error_info % "line number not matches"
        inconsistent_lines_set = [set() for _ in self._header_lines]
        for line_zip in zip(*self._header_lines):
            if any([k in line_zip[0] for k in allow_diff_keys]):
                logging.debug('Captured \'%s\', we allow difference in this line' % line_zip[0])
                continue
            if not all([line_zip[0] == line for line in line_zip]):
                # Ok so meet inconsistency in some lines, then temporarily store them
                for i, line in enumerate(line_zip):
                    inconsistent_lines_set[i].add(line)
        # Those inconsistent lines still match, meaning that it is only a change of order
        assert all([inconsistent_lines_set[0] == lset for lset in inconsistent_lines_set]), \
            inconsistent_error_info % ('{' + ', '.join(inconsistent_lines_set[0]) + '}')

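
The check above accepts headers that differ only in whitelisted lines or in line order. A minimal sketch of that rule as a pure function follows; headers_compatible, ALLOW and the three toy headers are hypothetical and exist only for this illustration. As in the listing, the whitelist is tested against the first file's line only.

```python
def headers_compatible(headers, allow_diff_keys):
    """Return True if the headers agree line by line, ignoring whitelisted
    lines and pure reordering (illustrative re-statement of the check)."""
    header_lines = [h.split('\n') for h in headers]
    # A different number of lines is always treated as incompatible.
    if any(len(lines) != len(header_lines[0]) for lines in header_lines):
        return False
    mismatched = [set() for _ in header_lines]
    for line_zip in zip(*header_lines):
        # Lines containing an allowed key (seeds, event counts, ...) may differ.
        if any(key in line_zip[0] for key in allow_diff_keys):
            continue
        if any(line != line_zip[0] for line in line_zip):
            # Remember the differing lines separately for each file.
            for i, line in enumerate(line_zip):
                mismatched[i].add(line)
    # Equal sets mean the same lines appear in every file, only reordered.
    return all(s == mismatched[0] for s in mismatched)


ALLOW = ['iseed', 'nevent']
a = "beam = 6500\npdf = 1\niseed = 11\n"
b = "pdf = 1\nbeam = 6500\niseed = 22\n"  # same content, different order and seed
c = "beam = 7000\npdf = 1\niseed = 33\n"  # genuinely different beam line

print(headers_compatible([a, b], ALLOW))
print(headers_compatible([a, c], ALLOW))
```

Returning a boolean instead of asserting is a choice made for this sketch so the rule can be tried in isolation; the class itself raises an AssertionError with the offending lines.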

◆ file_iterator()

def mergeLHE.DefaultLHEMerger.file_iterator (   self,
  path 
)
Line-by-line iterator of a txt file

Definition at line 40 of file mergeLHE.py.

    def file_iterator(self, path):
        """Line-by-line iterator of a txt file"""
        with open(path, 'r') as f:
            for line in f:
                yield line


◆ merge()

def mergeLHE.DefaultLHEMerger.merge (   self)

Definition at line 193 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._f, mergeLHE.DefaultLHEMerger._header_str, mergeLHE.DefaultLHEMerger._init_str, mergeLHE.DefaultLHEMerger._is_mglo, mergeLHE.DefaultLHEMerger._nevent, mergeLHE.DefaultLHEMerger._uwgt, mergeLHE.DefaultLHEMerger._xsec_combined, mergeLHE.DefaultLHEMerger.bypass_check, mergeLHE.DefaultLHEMerger.check_header_compatibility(), mergeLHE.DefaultLHEMerger.merge_headers(), mergeLHE.DefaultLHEMerger.merge_init_blocks(), and mergeLHE.BaseLHEMerger.output_file.

    def merge(self):
        with open(self.output_file, 'w') as fw:
            # Read the header for the all input files
            for i in range(len(self._f)):
                header = []
                line = next(self._f[i])
                while not re.search('\s*<init(>|\s)', line):
                    header.append(line)
                    line = next(self._f[i])
                # 'header' includes all contents before reaches <init>
                self._header_str.append(''.join(header))
            self.check_header_compatibility()

            # Read <init> blocks for all input_files
            for i in range(len(self._f)):
                init = []
                line = next(self._f[i])
                while not re.search('\s*</init>', line):
                    init.append(line)
                    line = next(self._f[i])
                # 'init_str' includes all contents inside <init>...</init>
                self._init_str.append(''.join(init))

            # Iterate over all events file-by-file and write events temporarily
            # to .tmp.lhe
            with open('.tmp.lhe', 'w') as _fwtmp:
                for i in range(len(self._f)):
                    nevent = 0
                    while True:
                        line = next(self._f[i])
                        if re.search('\s*</event>', line):
                            nevent += 1
                        if re.search('\s*</LesHouchesEvents>', line):
                            break
                        _fwtmp.write(line)
                    self._nevent.append(nevent)
                    self._f[i].close()

            # Merge the header and init blocks and write to the output
            fw.write(self.merge_headers())
            fw.write('<init>\n' + self.merge_init_blocks() + '</init>\n')

            # Write event blocks in .tmp.lhe back to the output
            # If is MG5 LO LHE, will recalculate the weights based on combined xsec
            # and nevent read from <MGGenerationInfo>, and the 'event_norm' mode
            if self._is_mglo and not self.bypass_check:
                event_norm = re.search(
                    r'\s(\w+)\s*=\s*event_norm\s',
                    self._header_str[0]).group(1)
                if event_norm == 'sum':
                    self._uwgt = self._xsec_combined / sum(self._nevent)
                elif event_norm == 'average':
                    self._uwgt = self._xsec_combined
                logging.info(("MG5 LO LHE with event_norm = %s detected. Will "
                              "recalculate weights in each event block.\n"
                              "Unit weight: %+.7E") % (event_norm, self._uwgt))

                # Modify event wgt when transfering .tmp.lhe to the output file
                event_line = -999
                with open('.tmp.lhe', 'r') as ftmp:
                    sign = lambda x: -1 if x < 0 else 1
                    for line in ftmp:
                        event_line += 1
                        if re.search('\s*<event.*>', line):
                            event_line = 0
                        if event_line == 1:
                            # modify the XWGTUP appeared in the first line of the
                            # <event> block
                            orig_wgt = float(line.split()[2])
                            fw.write(re.sub(r'(^\s*\S+\s+\S+\s+)\S+(.+)', r'\g<1>%+.7E\g<2>' \
                                % (sign(orig_wgt) * self._uwgt), line))
                        elif re.search('\s*<wgt.*>.*</wgt>', line):
                            addi_wgt_str = re.search(r'<wgt.*>\s*(\S+)\s*<\/wgt>', line).group(1)
                            fw.write(line.replace(
                                addi_wgt_str, '%+.7E' % (float(addi_wgt_str) / orig_wgt * self._uwgt)))
                        else:
                            fw.write(line)
            else:
                # Simply transfer all lines
                with open('.tmp.lhe', 'r') as ftmp:
                    for line in ftmp:
                        fw.write(line)
            fw.write('</LesHouchesEvents>\n')
            os.remove('.tmp.lhe')

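
For MG5 LO inputs, merge() replaces the XWGTUP field of every event with a common unit weight that depends on the event_norm mode. The sketch below restates that arithmetic in isolation; unit_weight and rewrite_xwgtup are hypothetical helper names, and the numbers are made up. Unlike the listing, which leaves the weight at its initial 0 for any other mode, this sketch raises an error, which is an assumption made here for clarity.

```python
import re


def unit_weight(xsec_combined, nevents, event_norm):
    """Unit weight per event for the two event_norm modes handled in merge()."""
    if event_norm == 'sum':
        # Weights sum to the cross section: divide by the total event count.
        return xsec_combined / sum(nevents)
    if event_norm == 'average':
        # Weights average to the cross section: every event carries it in full.
        return xsec_combined
    # Assumption of this sketch only; the listing does not raise here.
    raise ValueError('unsupported event_norm: %s' % event_norm)


def rewrite_xwgtup(line, uwgt):
    """Replace the third field (XWGTUP) of an event's first line, keeping its sign."""
    orig_wgt = float(line.split()[2])
    sign = -1 if orig_wgt < 0 else 1
    return re.sub(r'(^\s*\S+\s+\S+\s+)\S+(.+)',
                  r'\g<1>%+.7E\g<2>' % (sign * uwgt), line)


# Two files with 100 and 300 events and a combined cross section of 10 pb.
w_sum = unit_weight(10.0, [100, 300], 'sum')
w_avg = unit_weight(10.0, [100, 300], 'average')
new_line = rewrite_xwgtup(' 5 1 -3.0e-01 9.1e+01 7.5e-03 1.2e-01\n', 0.2)
print(w_sum, w_avg)
print(new_line, end='')
```

Only the sign of the original weight survives the rewrite, so negatively weighted events stay negative while all magnitudes become equal.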

◆ merge_headers()

def mergeLHE.DefaultLHEMerger.merge_headers (   self)
Merge the headers of the input LHEs. The MG5 LO case needs special handling.

Definition at line 79 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._header_str, mergeLHE.DefaultLHEMerger._is_mglo, mergeLHE.DefaultLHEMerger._nevent, mergeLHE.DefaultLHEMerger._xsec_combined, and mergeLHE.DefaultLHEMerger.bypass_check.

Referenced by mergeLHE.DefaultLHEMerger.merge().

    def merge_headers(self):
        """Merge the headers of input LHEs. Need special handle for the MG5 LO case."""

        self._is_mglo = all(['MGGenerationInfo' in header for header in self._header_str])
        if self._is_mglo and not self.bypass_check:
            # Special handling of MadGraph5 LO LHEs
            match_geninfo = [
                re.search(
                    (r"<MGGenerationInfo>\s+#\s*Number of Events\s*\:\s*(\S+)\s+"
                     r"#\s*Integrated weight \(pb\)\s*\:\s*(\S+)\s+<\/MGGenerationInfo>"),
                    header
                ) for header in self._header_str
            ]
            self._xsec_combined = sum(
                [float(info.group(2)) * nevt for info, nevt in zip(match_geninfo, self._nevent)]
            ) / sum(self._nevent)
            geninfo_combined = ("<MGGenerationInfo>\n"
                                "# Number of Events : %d\n"
                                "# Integrated weight (pb) : %.10f\n</MGGenerationInfo>") \
                % (sum(self._nevent), self._xsec_combined)
            logging.info('Detected: MG5 LO LHEs. Input <MGGenerationInfo>:\n\tnevt\txsec')
            for info, nevt in zip(match_geninfo, self._nevent):
                logging.info('\t%d\t%.10f' % (nevt, float(info.group(2))))
            logging.info('Combined <MGGenerationInfo>:\n\t%d\t%.10f' \
                % (sum(self._nevent), self._xsec_combined))

            header_combined = self._header_str[0].replace(match_geninfo[0].group(), geninfo_combined)
            return header_combined

        else:
            # No need to merge the headers
            return self._header_str[0]

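
The combined cross section written into <MGGenerationInfo> is an event-count-weighted mean of the per-file values. A minimal sketch of that calculation follows, reusing the regular expression from the listing; combined_xsec, make_header and the sample numbers are hypothetical. As in merge_headers(), the event counts are passed in separately (the class uses the counts of </event> tags), not taken from the header text.

```python
import re

# Same pattern as in the listing: group(1) is the event count,
# group(2) the integrated weight in pb.
GENINFO_RE = re.compile(
    r"<MGGenerationInfo>\s+#\s*Number of Events\s*\:\s*(\S+)\s+"
    r"#\s*Integrated weight \(pb\)\s*\:\s*(\S+)\s+<\/MGGenerationInfo>")


def combined_xsec(headers, nevents):
    """Event-count-weighted mean of the per-file integrated weights."""
    xsecs = [float(GENINFO_RE.search(h).group(2)) for h in headers]
    return sum(x * n for x, n in zip(xsecs, nevents)) / sum(nevents)


def make_header(nevt, xsec):
    """Build a toy header fragment in the MG5 LO layout (illustrative only)."""
    return ("<MGGenerationInfo>\n"
            "#  Number of Events        :       %d\n"
            "#  Integrated weight (pb)  :       %s\n"
            "</MGGenerationInfo>\n") % (nevt, xsec)


headers = [make_header(100, '10.0'), make_header(300, '14.0')]
xsec = combined_xsec(headers, [100, 300])
print('%.10f' % xsec)
```

With 100 events at 10 pb and 300 events at 14 pb, the larger file dominates and the result is (10 * 100 + 14 * 300) / 400 = 13 pb.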

◆ merge_init_blocks()

def mergeLHE.DefaultLHEMerger.merge_init_blocks (   self)
If all <init> blocks are identical, return that <init> block unchanged
(as is the case for Powheg LHEs); otherwise, compute the output <init>
block by merging the input blocks with the following formulas (the same
as in the MG5LOLHEMerger scheme), applied per subprocess:
    XSECUP = sum(xsecup * no.events) / tot.events
    XERRUP = sqrt( sum(xerrup^2 * no.events^2) ) / tot.events
    XMAXUP = max(xmaxup)

Definition at line 112 of file mergeLHE.py.

References mergeLHE.DefaultLHEMerger._f, mergeLHE.DefaultLHEMerger._init_str, mergeLHE.DefaultLHEMerger._nevent, mergeLHE.DefaultLHEMerger.bypass_check, and mergeLHE.BaseLHEMerger.input_files.

Referenced by mergeLHE.DefaultLHEMerger.merge().

    def merge_init_blocks(self):
        """If all <init> blocks are identical, return the same <init> block
        (in the case of Powheg LHEs); otherwise, calculate the output <init>
        blocks by merging the input blocks info using formula (same with the
        MG5LOLHEMerger scheme):
            XSECUP = sum(xsecup * no.events) / tot.events
            XERRUP = sqrt( sum(sigma^2 * no.events^2) ) / tot.events
            XMAXUP = max(xmaxup)
        """

        if self.bypass_check:
            # If bypass the consistency check, simply use the first LHE <init>
            # block as the output
            return self._init_str[0]

        # Initiate collected init block info. Will be in format of
        # {iprocess: [xsecup, xerrup, xmaxup]}
        new_init_block = {}
        old_init_block = [{} for _ in self._init_str]

        # Read the xsecup, xerrup, and xmaxup info from the <init> block for
        # all input LHEs
        for i, bl in enumerate(self._init_str):  # loop over files
            nline = int(bl.split('\n')[0].strip().split()[-1])

            # loop over lines in <init> block
            for bl_line in bl.split('\n')[1:nline + 1]:
                bl_line_sp = bl_line.split()
                old_init_block[i][int(bl_line_sp[3])] = [
                    float(bl_line_sp[0]), float(bl_line_sp[1]), float(bl_line_sp[2])]

            # After reading all subprocesses info, store the rest content in
            # <init> block for the first file
            if i == 0:
                info_after_subprocess = bl.strip().split('\n')[nline + 1:]

            logging.info('Input file: %s' % self.input_files[i])
            for ipr in sorted(list(old_init_block[i].keys()), reverse=True):
                # reverse order: follow the MG5 custom
                logging.info(' xsecup, xerrup, xmaxup, lprup: %.6E, %.6E, %.6E, %d' \
                    % tuple(old_init_block[i][ipr] + [ipr]))

        # Adopt smarter <init> block merging method
        # If all <init> blocks from input files are identical, return the same block;
        # otherwise combine them based on MG5LOLHEMerger scheme
        if all([old_init_block[i] == old_init_block[0] for i in range(len(self._f))]):
            # All <init> blocks are identical
            logging.info(
                'All input <init> blocks are identical. Output the same "<init> block.')
            return self._init_str[0]

        # Otherwise, calculate merged init block
        for i in range(len(self._f)):
            for ipr in old_init_block[i]:
                # Initiate the subprocess for the new block if it is found for the
                # first time in one input file
                if ipr not in new_init_block:
                    new_init_block[ipr] = [0., 0., 0.]
                new_init_block[ipr][0] += old_init_block[i][ipr][0] * self._nevent[i]  # xsecup
                new_init_block[ipr][1] += old_init_block[i][ipr][1]**2 * self._nevent[i]**2  # xerrup
                new_init_block[ipr][2] = max(new_init_block[ipr][2], old_init_block[i][ipr][2])  # xmaxup
        tot_nevent = sum([self._nevent[i] for i in range(len(self._f))])

        # Write first line of the <init> block (modify the nprocess at the last)
        self._merged_init_str = self._init_str[0].split('\n')[0].strip().rsplit(' ', 1)[0] \
            + ' ' + str(len(new_init_block)) + '\n'
        # Form the merged init block
        logging.info('Output file: %s' % self.output_file)
        for ipr in sorted(list(new_init_block.keys()), reverse=True):
            # reverse order: follow the MG5 custom
            new_init_block[ipr][0] /= tot_nevent
            new_init_block[ipr][1] = math.sqrt(new_init_block[ipr][1]) / tot_nevent
            logging.info(' xsecup, xerrup, xmaxup, lprup: %.6E, %.6E, %.6E, %d' \
                % tuple(new_init_block[ipr] + [ipr]))
            self._merged_init_str += '%.6E %.6E %.6E %d\n' % tuple(new_init_block[ipr] + [ipr])
        self._merged_init_str += '\n'.join(info_after_subprocess)
        if len(info_after_subprocess):
            self._merged_init_str += '\n'

        return self._merged_init_str

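
The XSECUP, XERRUP and XMAXUP formulas from the description can be checked on a small example. The sketch below is a hypothetical standalone helper (merge_init is not part of mergeLHE.py) that takes already parsed blocks in the {lprup: [xsecup, xerrup, xmaxup]} layout used in the listing; the input numbers are made up.

```python
import math


def merge_init(blocks, nevents):
    """Merge per-file init info given as {lprup: [xsecup, xerrup, xmaxup]}."""
    tot = sum(nevents)
    merged = {}
    for block, n in zip(blocks, nevents):
        for ipr, (xsecup, xerrup, xmaxup) in block.items():
            acc = merged.setdefault(ipr, [0., 0., 0.])
            acc[0] += xsecup * n              # numerator of XSECUP
            acc[1] += xerrup ** 2 * n ** 2    # sum under the square root of XERRUP
            acc[2] = max(acc[2], xmaxup)      # XMAXUP is the largest input value
    for acc in merged.values():
        # Normalise by the total event count over all files, as in the listing.
        acc[0] /= tot
        acc[1] = math.sqrt(acc[1]) / tot
    return merged


# One subprocess (lprup = 1) present in two files with 100 and 300 events.
blocks = [{1: [10.0, 0.3, 0.02]}, {1: [14.0, 0.4, 0.05]}]
merged = merge_init(blocks, [100, 300])
print('%.6E %.6E %.6E' % tuple(merged[1]))
```

Here XSECUP = (10 * 100 + 14 * 300) / 400 = 13, XERRUP = sqrt(0.3^2 * 100^2 + 0.4^2 * 300^2) / 400 = sqrt(15300) / 400, and XMAXUP = 0.05.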

Member Data Documentation

◆ _f

mergeLHE.DefaultLHEMerger._f
private

◆ _header_lines

mergeLHE.DefaultLHEMerger._header_lines
private

Definition at line 58 of file mergeLHE.py.

◆ _header_str

mergeLHE.DefaultLHEMerger._header_str
private

◆ _init_str

mergeLHE.DefaultLHEMerger._init_str
private

◆ _is_mglo

mergeLHE.DefaultLHEMerger._is_mglo
private

◆ _merged_init_str

mergeLHE.DefaultLHEMerger._merged_init_str
private

Definition at line 176 of file mergeLHE.py.

◆ _nevent

mergeLHE.DefaultLHEMerger._nevent
private

◆ _uwgt

mergeLHE.DefaultLHEMerger._uwgt
private

Definition at line 36 of file mergeLHE.py.

Referenced by mergeLHE.DefaultLHEMerger.merge().

◆ _xsec_combined

mergeLHE.DefaultLHEMerger._xsec_combined
private

◆ bypass_check