Bug Summary

File: scratch/gluex/scan-build-work/hdgeant4/src/AdaptiveSampler.cc
Location: line 257, column 11
Description: Access to field 'divAxis' results in a dereference of a null pointer (loaded from variable 'sel')

Annotated Source Code

1//
2// AdaptiveSampler - provide an adaptive algorithm for improved
3// importance sampling in Monte Carlo integration
4//
5// file: AdaptiveSampler.cc (see AdaptiveSampler.hh for header, usage)
6// author: richard.t.jones at uconn.edu
7// original release: february 23, 2016
8//
9// new feature release: may 30, 2020
10// - see new notes 7,8 below
11// - richard.t.jones at uconn.edu
12//
13// implementation notes:
14// 1) See the description at the top of AdaptiveSampler.hh for an overview
15// of the problem that this class is designed to help solve, and how to
16// use it.
17//
18// 2) If an AdaptiveSampler object is being shared by more than one thread,
19// only one of the threads can be calling the feedback method in the
20// sample loop at a time. Simultaneous entry from more than one thread
21// to any of the mutating methods (feedback, adapt, reset_stats, etc.)
22// produces undefined behavior. Either you can share a single instance
23// of AdaptiveSampler among many threads with the calls to mutating
24// methods protected by a mutex, or you can construct an AdaptiveSampler
25// instance for each thread and then pause at regular intervals to
26// combine their statistics using save/merge/restore.
27//
28// 3) Users must provide their uniform random number generator function to
29// the AdaptiveSampler constructor. If the sample method is to be called
30// from multiple threads, this user-written function must be reentrant.
31// The user is responsible for initializing the state of the generator
32// before beginning the Monte Carlo loop.
33//
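The generator contract described in note 3 can be sketched as follows. The exact Uniform01 type is defined in AdaptiveSampler.hh; judging from the call `(*fRandom)(fNdim - fNfixed + 1, uu + fNfixed)` in sample() below, it takes a count and an output buffer. This is a hypothetical, reentrant implementation using a thread-local engine, not the project's actual generator:

```cpp
#include <random>

// Hypothetical reentrant generator matching the call pattern seen in
// sample(): (*fRandom)(count, buffer). A thread_local engine gives each
// thread its own state, so concurrent sample() calls need no locking.
// Seeding from std::random_device stands in for whatever initialization
// the user performs before the Monte Carlo loop.
void my_uniform01(int n, double *u)
{
    thread_local std::mt19937_64 engine(std::random_device{}());
    thread_local std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < n; ++i)
        u[i] = dist(engine);   // each value lies in [0,1)
}
```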
34// 4) Successful use of AdaptiveSampler requires that the user understand
35// some basic things about opportunistic importance sampling. The basic
36// strategy it uses is to look for hot spots in the integration domain
37// and zoom in to oversample in those regions. By implication, there
38// are regions in the integration domain that are undersampled, and
39// this can lead to systematic underbias in the result. Because of this
40// bias, importance sampling is really a variance-reduction technique:
41// only by repeated application under a variety of conditions does one
42// reach any level of confidence that the result is correct. Tuning
43// parameters are provided to allow the user to vary the conditions
44// for adaptation and check the robustness of their result.
45// *) Adaptation_sampling_threshold - minimum fraction of the total
46// sample sum of wI**2 that must be coming from a given cell in
47// order for that cell to be considered for splitting.
48// *) Adaptation_efficiency_target - this is the goal that is set
49// for the adaptation algorithm to shoot for. Adaptations are
50// allowed if and only if the improvements to the efficiency
51// that they entail have a reasonable chance of achieving the
52// target value before Adaptation_maximum_cells (see below) is
53// reached.
54// *) Adaptation_maximum_depth - this is the maximum depth of the
55// adaptation tree. The finest subdivision of the domain of
56// integration will be a cell of volume 3**-maximum_depth. This
57// prevents the algorithm from tuning on noise if the inherent
58// resolution in the integrand function has a finite limit.
59// Because of the intrinsic limitations of double precision
60// floats, this parameter should normally not exceed 35.
61// *) Adaptation_maximum_cells - maximum number of cells that
62// should be allowed by the adaptation algorithm in building
63// out the sampling tree. The sampling tree is a hierarchical
64// subdivision of the integration domain that selects certain
65// regions in the ndim-dimensional hypercube for oversampling.
66// Each cell in the tree requires memory resources to hold the
67// statistics that it collects for the hits that are generated
68// within that cell. All cells in the tree have the same size,
69// approximately 40 + (8 * 3^ndim) for ndim dimensions. The
70// number of cells that are needed to achieve a certain target
71// efficiency depends on the problem. Generally one should
72// plan to accommodate as many cells as memory resources can
73// support. The algorithm attempts to achieve the efficiency
74// target using as few cells as it can.
75//
76// 5) Generally, one can expect to achieve a factor 3 improvement in
77// sampling efficiency each time the maximum depth of the tree
78// advances by ndim. Generically it is the maximum depth of the
79// tree and not the number of cells it contains that indicates how
80// much improvement the sampler has achieved over unbiased sampling.
81// The number of cells required to advance the depth another ndim
82// levels may increase or decrease with depth depending on the
83// fractal dimension of the integrand. It really depends on the
84// problem; there is no way in advance to predict how this will
85// behave. Obviously integrands of low fractal dimension will be
86// more susceptible to the generic subdivision strategy employed
87// by the AdaptiveSampler algorithm than those with high dimension.
88//
89// 6) Here are rules of thumb that will keep you from common blunders.
90// a) Do not be too aggressive in adapting, especially at first
91// when the AdaptiveSampler is new and undifferentiated. Keep
92// the sampling_threshold value as high as you can (1-10 is a
93// good range to start off with), only lower it if you cannot
94// get the adaptation to progress. Keep in mind that the lower
95// this threshold is set, the more susceptible you are to
96// tuning on noise and introducing uncontrolled underbias.
97// b) Bias is generically on the low side of the true result.
98// This means that if you repeat a measurement 25 times and
99// one repetition gives an answer that is many sigma above
100// the rest, there is a low-visibility region in integration
101// space that this repetition has found. Based on this, you
102// should modify your adaptation strategy to increase the
103// visibility of that region early on during development. See
104// point (c) for how that can be done.
105// c) If you know there are low-visibility regions of the domain
106// of integration that tend to get excluded by the adaptation
107// algorithm, you can "train" the AdaptiveSampler by running it
108// initially for several thousand events on an integrand that is
109// biased in favor of these regions. Mixing in these events with
110// a larger sample of the true integrand over several rounds of
111// adapt/reset_stats/repeat will build sensitivity to these
112// regions into the tree that should persist throughout the
113// remainder of its evolution.
114// d) Run large batches of MC jobs in parallel during adaptation
115// and use saveState/mergeState, then adapt on their combined
116// results. Do not adapt separately in each individual thread
117// unless your goal is to generate as diverse a set of trees
118// as possible. The adaptation algorithm in AdaptiveSampler is
119// designed to deterministically produce the optimum sampling
120// tree in the large-N limit, so the best trees are those that
121// are based on the largest statistical sample. Do not try to
122// generate a super-high performance tree by producing many of
123// them and choosing the one with the highest efficiency -- it
124// may have high efficiency, but by doing this you have almost
125// guaranteed that your result will have a large bias.
126//
127// 7) The user can now set up the problem with higher dimension than
128// the number of random numbers needed per event, with the extra
129// dimensions allocated to user-generated random variables. These
130// must be transformed by the user to fall within the domain [0,1]
131// and passed to sample() and feedback() in the first nfixed
132// elements of the u vector (first argument). If nfixed > 0 then
133// it needs to be passed as the last argument to the constructor
134// of AdaptiveSampler.
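The [0,1] transformation required by note 7 for the nfixed leading elements of u might look like this. The variable names (a photon energy E on a known range [Emin, Emax]) are purely illustrative assumptions, not part of the library:

```cpp
#include <cmath>

// Hypothetical example of preparing one of the first nfixed elements of
// the u vector before calling sample(): a user-generated variable E drawn
// on [Emin, Emax] is mapped linearly onto [0,1), with a clamp against
// roundoff so the value never reaches 1 (sample() rejects u[i] >= 1).
double map_to_unit(double E, double Emin, double Emax)
{
    double u = (E - Emin) / (Emax - Emin);
    if (u < 0) u = 0;
    if (u >= 1) u = std::nextafter(1.0, 0.0);  // largest double below 1
    return u;
}
```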
135
136#include <assert.h>
137#include <cmath>
138#include <fstream>
139#include <sstream>
140#include <iostream>
141#include <exception>
142#include "AdaptiveSampler.hh"
143
144int AdaptiveSampler::verbosity = 3;
145
146AdaptiveSampler::AdaptiveSampler(int dim, Uniform01 user_generator, int nfixed)
147 : fNdim(dim),
148 fNfixed(nfixed),
149 fMaximum_depth(30),
150 fMaximum_cells(1000000),
151 fSampling_threshold(0.01),
152 fEfficiency_target(0.9),
153 fMinimum_sum_wI2_delta(0)
154{
155 fRandom = user_generator;
156 fTopCell = new Cell(dim, nfixed);
157 fTopCell->subset = 1;
158 reset_stats();
159}
160
161AdaptiveSampler::AdaptiveSampler(const AdaptiveSampler &src)
162 : fNdim(src.fNdim),
163 fNfixed(src.fNfixed),
164 fMaximum_depth(src.fMaximum_depth),
165 fMaximum_cells(src.fMaximum_cells),
166 fSampling_threshold(src.fSampling_threshold),
167 fEfficiency_target(src.fEfficiency_target),
168 fMinimum_sum_wI2_delta(src.fMinimum_sum_wI2_delta)
169{
170 fRandom = src.fRandom;
171 fTopCell = new Cell(*src.fTopCell);
172}
173
174AdaptiveSampler::~AdaptiveSampler()
175{
176 delete fTopCell;
177}
178
179AdaptiveSampler AdaptiveSampler::operator=(const AdaptiveSampler &src)
180{
181 return AdaptiveSampler(src);
182}
183
184double AdaptiveSampler::sample(double *u)
185{
186 for (int i=0; i < fNfixed; ++i) {
187 if (u[i] < 0 || u[i] >= 1) {
188 std::cerr << "AdaptiveSampler::sample error - "
189 << "fixed parameter " << i << " is "
190 << "outside the allowed interval [0,1), "
191 << "value is " << u[i] << std::endl;
192 u[i] = 0;
193 }
194 }
195 int depth;
196 double *u0 = new double[fNdim];
197 double *u1 = new double[fNdim];
198 double *uu = new double[fNdim + 1];
199 (*fRandom)(fNdim - fNfixed + 1, uu + fNfixed);
200 Cell *cell = findCell(uu[fNdim], depth, u0, u1, u);
201 for (int i=fNfixed; i < fNdim; ++i) {
202 u[i] = uu[i] * u0[i] + (1 - uu[i]) * u1[i];
203 }
204 delete [] u0;
205 delete [] u1;
206 delete [] uu;
207 // WARNING: strong condition is assumed here!
208 // All splits along axes 0..fNfixed must be each
209 // assigned the full subset of the parent cell.
210 // You can use check_subsets() to verify this.
211 double dNu = (depth > 0)? pow(1/3., depth) : 1;
212 return dNu / cell->subset;
213}
214
215AdaptiveSampler::Cell *AdaptiveSampler::findCell(double ucell,
216 int &depth,
217 double *u0,
218 double *u1,
219 const double *u)
220{
221 std::fill(u0, u0 + fNdim, 0);
222 std::fill(u1, u1 + fNdim, 1);
223 depth = 0;
224 Cell *sel = fTopCell;
225 while (sel->divAxis > -1) {
226 int i;
227 int j = sel->divAxis;
228 double du = (u1[j] - u0[j]) / 3;
229 if (j < fNfixed) {
230 i = int((u[j] - u0[j]) / du);
231 }
232 else {
233 for (i=0; i < 2; i++) {
234 if (ucell >= sel->subcell[i]->subset)
235 ucell -= sel->subcell[i]->subset;
236 else
237 break;
238 }
239 ++depth;
240 }
241 u0[j] = u0[j] + du * i;
242 u1[j] = u0[j] + du;
243 sel = sel->subcell[i];
244 }
245 return sel;
246}
247
248AdaptiveSampler::Cell *AdaptiveSampler::findCell(const double *u,
249 int &depth,
250 double *u0,
251 double *u1) const
252{
253 std::fill(u0, u0 + fNdim, 0);
254 std::fill(u1, u1 + fNdim, 1);
255 depth = 0;
256 Cell *sel = fTopCell;
257 while (sel->divAxis > -1) {
  [2] Loop condition is true. Entering loop body
  [9] Loop condition is true. Entering loop body
  [15] Access to field 'divAxis' results in a dereference of a null pointer (loaded from variable 'sel')
258 int j = sel->divAxis;
259 double du = (u1[j] - u0[j]) / 3;
260 int i = floor((u[j] - u0[j]) / du);
261 i = (i < 0)? 0 : (i < 3)? i : 2;
  [3] Assuming 'i' is >= 0
  [4] '?' condition is false
  [5] Assuming 'i' is >= 3
  [6] '?' condition is false
  [10] Assuming 'i' is < 0
  [11] '?' condition is true
262 u0[j] = u0[j] + du * i;
263 u1[j] = u0[j] + du;
264 if (sel->subcell[i] == 0) {
  [7] Taking false branch
  [12] Taking true branch
265 std::cerr << "bad romans!" << std::endl;
266 }
267 sel = sel->subcell[i];
  [13] Null pointer value stored to 'sel'
268 if (j >= fNfixed)
  [8] Taking false branch
  [14] Taking false branch
269 ++depth;
270 }
271 return sel;
272}
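The analyzer's path notes show the defect: when `sel->subcell[i]` is null, the loop prints "bad romans!" but still stores the null pointer into `sel` and re-tests `sel->divAxis` on the next iteration. A minimal standalone sketch of the descent with a guard, using a stand-in Node type rather than the real Cell (and a fixed child index instead of the real index computation), would be:

```cpp
#include <cstddef>

// Stand-in for AdaptiveSampler::Cell: divAxis < 0 marks a leaf node.
struct Node {
    int divAxis = -1;
    Node *subcell[3] = {nullptr, nullptr, nullptr};
};

// Guarded version of the descent loop in findCell: instead of storing a
// null child and dereferencing it on the next pass through the loop,
// stop at the last valid node when the tree is malformed.
Node *descend(Node *sel, int i)
{
    while (sel->divAxis > -1) {
        if (sel->subcell[i] == nullptr)
            break;                 // malformed tree: bail out safely
        sel = sel->subcell[i];
    }
    return sel;
}
```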
273
274void AdaptiveSampler::feedback(const double *u, double wI)
275{
276 int depth;
277 double *u0 = new double[fNdim];
278 double *u1 = new double[fNdim];
279 Cell *cell = findCell(u, depth, u0, u1);
  [1] Calling 'AdaptiveSampler::findCell'
280 cell->nhit += 1;
281 cell->sum_wI += wI;
282 double wI2 = wI*wI;
283 cell->sum_wI2 += wI2;
284 cell->sum_wI4 += wI2*wI2;
285 for (int n=0; n < fNdim; ++n) {
286 double du = (u1[n] - u0[n]) / 3;
287 int s = (u[n] - u0[n]) / du;
288 assert (s >= 0 && s < 3);
289 cell->sum_wI2d[3*n+s] += wI2;
290 cell->sum_wI4d[3*n+s] += wI2*wI2;
291 }
292 delete [] u0;
293 delete [] u1;
294}
295
296int AdaptiveSampler::adapt()
297{
298 int ncells = fTopCell->sum_stats();
299 std::cout << "starting sum_wI2 is " << fTopCell->sum_wI2 << std::endl;
300 double sum_wI2_target = pow(fTopCell->sum_wI, 2) /
301 (fTopCell->nhit * fEfficiency_target);
302 double sum_wI2_error = sqrt(fTopCell->sum_wI4) + 1e-99;
303 double gain_sig = (fTopCell->sum_wI2 - sum_wI2_target) / sum_wI2_error;
304 if (gain_sig > 3) {
305 if (verbosity > 0)
306 std::cout << "adapt - sample statistics indicate that significant"
307 << " gains (" << gain_sig << " sigma)" << std::endl
308 << "in sampling efficiency can be achieved through"
309 << " further optimization, beginning optimization pass."
310 << std::endl;
311 }
312 else {
313 if (verbosity > 0)
314 std::cout << "adapt - sample statistics indicate that significant"
315 << " gains in sampling efficiency cannot be achieved through"
316 << " further optimization "
317 << "(" << gain_sig << " sigma),"
318 << " a larger sample size is needed."
319 << std::endl;
320 }
321 std::cout << "Looking for cells containing at least "
322 << fSampling_threshold * 100 << "% of the total "
323 << "sample sum_wI2 = " << fTopCell->sum_wI2
324 << std::endl;
325
326 fMinimum_sum_wI2_delta = (fTopCell->sum_wI2 - sum_wI2_target) /
327 (fMaximum_cells - ncells);
328 std::vector<int> cellIndex;
329 int newcells = recursively_update(cellIndex);
330 if (verbosity > 0) {
331 if (newcells) {
332 if (verbosity > 0)
333 std::cout << "adapt - " << newcells << " added this pass,"
334 << " new count is " << getNcells()
335 << std::endl;
336 }
337 else {
338 if (verbosity > 0)
339 std::cout << "adapt - no changes this pass,"
340 << " count remains unchanged at " << getNcells()
341 << std::endl;
342 }
343 }
344 if (verbosity > 1) {
345 std::cout << "present efficiency " << getEfficiency()
346 << ", new value predicted to be " << getEfficiency(true)
347 << std::endl;
348 }
349 std::cout << "optimized sum_wI2 is " << fTopCell->opt_wI2 << std::endl;
350 return newcells;
351}
352
353int AdaptiveSampler::recursively_update(std::vector<int> index)
354{
355 Cell *cell = fTopCell;
356 unsigned int depth;
357 std::stringstream cellpath;
358 for (depth=0; depth < index.size(); ++depth) {
359 cellpath << "/" << cell->divAxis << ":" << index[depth];
360 cell = cell->subcell[index[depth]];
361 }
362
363 int count = 0;
364 double efficiency = 0;
365 if (cell->nhit > 0 && cell->sum_wI2 > 0) {
366 efficiency = pow(cell->sum_wI, 2) / (cell->nhit * cell->sum_wI2);
367 }
368 if (cell->divAxis < 0) {
369
370 // This is the adaptation algorithm: all of the power
371 // of AdaptiveSampler lies in this little bit of code.
372
373 if (verbosity > 3) {
374 std::cout << "checking if we should partition " << cellpath.str()
375 << " with sum_wI2=" << cell->sum_wI2
376 << " +/- " << sqrt(cell->sum_wI4)
377 << std::endl;
378 }
379 if (cell->sum_wI2 > fSampling_threshold * fTopCell->sum_wI2 &&
380 efficiency < fEfficiency_target &&
381 depth < fMaximum_depth)
382 {
383 if (verbosity > 3) {
384 std::cout << "checking if to split at cell " << depth
385 << " with sum_wI2 = " << cell->sum_wI2
386 << " +/- " << sqrt(cell->sum_wI4)
387 << " with fMinimum_sum_wI2_delta = "
388 << fMinimum_sum_wI2_delta
389 << std::endl;
390 }
391 int best_axis = -1;
392 double best_sum_wI2 = 1e99;
393 for (int n=0; n < fNdim; ++n) {
394 double sum_wI2 = pow(sqrt(cell->sum_wI2d[3*n]) +
395 sqrt(cell->sum_wI2d[3*n+1]) +
396 sqrt(cell->sum_wI2d[3*n+2]), 2) / 3;
397 if (sum_wI2 < best_sum_wI2) {
398 best_sum_wI2 = sum_wI2;
399 best_axis = n;
400 }
401 }
402 if (best_sum_wI2 > cell->sum_wI2) {
403 std::cerr << "AdaptiveSampler::recursive_update error - "
404 << "best_sum_wI2 > sum_wI2, this cannot be!"
405 << std::endl;
406 exit(7);
407 }
408 else if (cell->sum_wI2 - best_sum_wI2 > fMinimum_sum_wI2_delta) {
409 if (best_sum_wI2 > 0) {
410 cell->divAxis = best_axis;
411 for (int s=0; s < 3; ++s) {
412 cell->subcell[s] = new Cell(fNdim, fNfixed);
413 cell->subcell[s]->nhit = cell->nhit / 3;
414 cell->subcell[s]->sum_wI = cell->sum_wI / 3;
415 cell->subcell[s]->sum_wI2 = cell->sum_wI2d[3*best_axis + s];
416 cell->subcell[s]->sum_wI4 = cell->sum_wI4d[3*best_axis + s];
417 cell->subcell[s]->subset = (best_axis < fNfixed)?
418 cell->subset : cell->subset / 3;
419 }
420 if (verbosity > 2) {
421 std::cout << "splitting this cell[";
422 for (unsigned int i=0; i < depth; i++) {
423 std::cout << ((i > 0)? "," : "") << index[i];
424 }
425 std::cout << "] along axis " << best_axis << std::endl;
426 }
427 count += 1;
428 }
429 else {
430 if (verbosity > 3) {
431 std::cout << "nope, missing statistics to find best axis!"
432 << std::endl;
433 }
434 }
435 }
436 else {
437 if (verbosity > 3) {
438 std::cout << "nope, fails to make the threshold!" << std::endl;
439 }
440 }
441 }
442 else if (cell->sum_wI2 < fSampling_threshold * fTopCell->sum_wI2) {
443 if (verbosity > 3) {
444 std::cout << "nope, fails the statistical significance test!"
445 << std::endl;
446 }
447 }
448 else if (efficiency >= fEfficiency_target) {
449 if (verbosity > 3) {
450 std::cout << "nope, efficiency is " << efficiency
451 << ", already meets the target value!"
452 << std::endl;
453 }
454 }
455 else {
456 if (verbosity > 3) {
457 std::cout << "nope, this cell cannot be split any further!"
458 << std::endl;
459 }
460 }
461 }
462 else {
463 index.push_back(0);
464 for (int i=0; i < 3; ++i) {
465 index[depth] = i;
466 count += recursively_update(index);
467 }
468 index.pop_back();
469 }
470 return count;
471}
472
473void AdaptiveSampler::reset_stats()
474{
475 fTopCell->reset_stats();
476}
477
478long int AdaptiveSampler::getNsample() const
479{
480 fTopCell->sum_stats();
481 return fTopCell->nhit;
482}
483
484double AdaptiveSampler::getWItotal() const
485{
486 fTopCell->sum_stats();
487 return fTopCell->sum_wI;
488}
489
490double AdaptiveSampler::getWI2total(bool optimized)
491{
492 fTopCell->sum_stats();
493 if (optimized) {
494 optimize_tree();
495 return fTopCell->opt_wI2;
496 }
497 return fTopCell->sum_wI2;
498}
499
500double AdaptiveSampler::getEfficiency(bool optimized)
501{
502 fTopCell->sum_stats();
503 if (fTopCell->nhit == 0)
504 return 0;
505 else if (optimized) {
506 optimize_tree();
507 return pow(fTopCell->sum_wI, 2) /
508 (fTopCell->opt_wI2 * fTopCell->opt_nhit + 1e-99);
509 }
510 return pow(fTopCell->sum_wI, 2) /
511 (fTopCell->sum_wI2 * fTopCell->nhit);
512}
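The unoptimized branch of getEfficiency() computes eff = (sum wI)^2 / (sum wI2 * nhit). As a standalone numerical illustration (mirroring the formula, not calling the class itself): equal weights give eff = 1, and concentrating the same total weight in few samples drives eff down.

```cpp
// Standalone sketch of the efficiency formula used in getEfficiency():
//   eff = (sum wI)^2 / (N * sum wI^2)
// It equals 1 for perfectly uniform weights and decreases as the weight
// distribution becomes more skewed.
double efficiency(const double *wI, int N)
{
    double sum = 0, sum2 = 0;
    for (int i = 0; i < N; ++i) {
        sum  += wI[i];
        sum2 += wI[i] * wI[i];
    }
    return (sum2 > 0) ? sum * sum / (N * sum2) : 0;
}
```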
513
514double AdaptiveSampler::getResult(double *error,
515 double *error_uncertainty)
516{
517 fTopCell->sum_stats();
518 double eff = pow(fTopCell->sum_wI, 2) /
519 (fTopCell->sum_wI2 * fTopCell->nhit);
520 double result = (eff > 0)? fTopCell->sum_wI / fTopCell->nhit : 0;
521 if (error) {
522 if (eff > 0)
523 *error = sqrt((1 - eff) * fTopCell->sum_wI2) / fTopCell->nhit;
524 else
525 *error = 0;
526 }
527
528 if (error_uncertainty) {
529 if (eff > 0)
530 *error_uncertainty = sqrt((1 - eff) / fTopCell->sum_wI2) / 2 *
531 sqrt(fTopCell->sum_wI4) / fTopCell->nhit;
532 else
533 *error_uncertainty = 0;
534 }
535 return result;
536}
537
538double AdaptiveSampler::getReweighted(double *error,
539 double *error_uncertainty)
540{
541 fTopCell->sum_stats();
542 optimize_tree();
543 double eff = pow(fTopCell->sum_wI, 2) /
544 (fTopCell->opt_wI2 * fTopCell->opt_nhit + 1e-99);
545 double result = (eff > 0)? fTopCell->sum_wI / fTopCell->opt_nhit : 0;
546 if (error) {
547 if (eff > 0)
548 *error = sqrt((1 - eff) * fTopCell->opt_wI2) /
549 (fTopCell->opt_nhit + 1e-99);
550 else
551 *error = 0;
552 }
553 if (error_uncertainty) {
554 if (eff > 0)
555 *error_uncertainty = sqrt((1 - eff) / fTopCell->opt_wI2) / 2 *
556 sqrt(fTopCell->opt_wI4) /
557 (fTopCell->opt_nhit + 1e-99);
558 else
559 *error_uncertainty = 0;
560 }
561 return result;
562}
563
564int AdaptiveSampler::getNdim() const
565{
566 return fNdim;
567}
568
569int AdaptiveSampler::getNfixed() const
570{
571 return fNfixed;
572}
573
574int AdaptiveSampler::getNcells() const
575{
576 return fTopCell->sum_stats();
577}
578
579void AdaptiveSampler::setAdaptation_sampling_threshold(double threshold)
580{
581 fSampling_threshold = threshold;
582}
583
584double AdaptiveSampler::getAdaptation_sampling_threshold() const
585{
586 return fSampling_threshold;
587}
588
589void AdaptiveSampler::setAdaptation_efficiency_target(double target)
590{
591 fEfficiency_target = target;
592}
593
594double AdaptiveSampler::getAdaptation_efficiency_target() const
595{
596 return fEfficiency_target;
597}
598
599void AdaptiveSampler::setAdaptation_maximum_depth(int depth)
600{
601 fMaximum_depth = depth;
602}
603
604int AdaptiveSampler::getAdaptation_maximum_depth() const
605{
606 return fMaximum_depth;
607}
608
609void AdaptiveSampler::setAdaptation_maximum_cells(int ncells)
610{
611 fMaximum_cells = ncells;
612}
613
614int AdaptiveSampler::getAdaptation_maximum_cells() const
615{
616 return fMaximum_cells;
617}
618
619void AdaptiveSampler::optimize_tree()
620{
621 fTopCell->sum_stats();
622 if (fTopCell->sum_wI > 0) {
623 fTopCell->opt_nhit = fTopCell->nhit;
624 fTopCell->opt_subset = 1;
625 fTopCell->optimize();
626 }
627}
628
629void AdaptiveSampler::display_tree(bool optimized)
630{
631 double *u0 = new double[fNdim];
632 double *u1 = new double[fNdim];
633 std::fill(u0, u0 + fNdim, 0);
634 std::fill(u1, u1 + fNdim, 1);
635 display_tree(fTopCell, 1, "o", u0, u1, optimized);
636 delete [] u0;
637 delete [] u1;
638}
639
640double AdaptiveSampler::display_tree(Cell *cell, double subset, std::string id,
641 double *u0, double *u1, bool optimized)
642{
643 char numeric[80];
644 std::cout << id << ": ";
645 double cell_subset = (optimized)? cell->opt_subset : cell->subset;
646 snprintf(numeric, 80, "%9.5e", cell_subset);
647 std::cout << numeric;
648 if (cell_subset > subset * 0.9995) {
649 std::cout << "(100%) ";
650 }
651 else {
652 snprintf(numeric, 30, "(%.1f%%) ", 100 * cell_subset / subset);
653 std::cout << numeric;
654 }
655
656 if (cell->divAxis > -1) {
657 double ssum = 0;
658 double u0m = u0[cell->divAxis];
659 double u1m = u1[cell->divAxis];
660 double du = (u1m - u0m) / 3.;
661 snprintf(numeric, 80, " [%15.13f,%15.13f] ", u0m, u1m);
662 std::cout << " split along axis " << cell->divAxis << numeric;
663 double dlev = -log(du) / log(3.);
664 snprintf(numeric, 80, " (1/3^%d) ", int(dlev));
665 std::cout << numeric;
666 std::cout << ((optimized)? cell->opt_nhit : cell->nhit)
667 << " " << cell->sum_wI
668 << " " << ((optimized)? cell->opt_wI2 : cell->sum_wI2)
669 << " " << ((optimized)? cell->opt_wI4 : cell->sum_wI4)
670 << std::endl;
671 for (int n=0; n < 3; ++n) {
672 u0[cell->divAxis] = u0m + du * n;
673 u1[cell->divAxis] = u0m + du * (n + 1);
674 ssum += display_tree(cell->subcell[n], cell_subset,
675 id + std::to_string(n), u0, u1, optimized);
676 }
677 ssum /= (cell->divAxis < fNfixed)? 3 : 1;
678 if (fabs(ssum - cell_subset) > 1e-15 * cell_subset) {
679 std::cerr << "Error in AdaptiveSampler::display_tree - "
680 << "subcell subsets fail to obey the sum rule, "
681 << "tree is invalid !!!" << std::endl
682 << " cell subset = " << cell_subset << std::endl
683 << " summed subsets = " << ssum << std::endl
684 << " difference = " << ssum - cell_subset
685 << std::endl;
686 }
687 u0[cell->divAxis] = u0m;
688 u1[cell->divAxis] = u1m;
689 }
690 else {
691 std::cout << ((optimized)? cell->opt_nhit : cell->nhit )
692 << " " << cell->sum_wI
693 << " " << ((optimized)? cell->opt_wI2 : cell->sum_wI2)
694 << " " << ((optimized)? cell->opt_wI4 : cell->sum_wI4)
695 << std::endl;
696 }
697 return cell_subset;
698}
699
700int AdaptiveSampler::saveState(const std::string filename, bool optimized) const
701{
702 std::ofstream fout(filename);
703 fout << "fNdim=" << fNdim << std::endl;
704 fout << "fNfixed=" << fNfixed << std::endl;
705 fout << "fSampling_threshold=" << fSampling_threshold << std::endl;
706 fout << "fMaximum_depth=" << fMaximum_depth << std::endl;
707 fout << "fMaximum_cells=" << fMaximum_cells << std::endl;
708 fout << "fEfficiency_target=" << fEfficiency_target << std::endl;
709 fout << "=" << std::endl;
710 int ncells = fTopCell->serialize(fout, optimized);
711 return (ncells > 0);
712}
713
714int AdaptiveSampler::mergeState(const std::string filename)
715{
716 std::ifstream fin(filename);
717 if (!fin.is_open()) {
718#ifdef EXTRA_ADAPTIVE_SAMPLER_WARNINGS
719 std::cerr << "AdaptiveSampler::mergeState error - "
720 << "unable to open " << filename << " for input."
721 << std::endl;
722#endif
723 return 0;
724 }
725 std::map<std::string,double> keyval;
726 while (true) {
727 std::string key;
728 std::getline(fin, key, '=');
729 if (key.size() == 0) {
730 std::getline(fin, key);
731 break;
732 }
733 fin >> keyval[key];
734 std::getline(fin, key);
735 }
736 if (fNdim != keyval["fNdim"]) {
737 std::cerr << "AdaptiveSampler::mergeState error - "
738 << "cannot merge with state saved in " << filename
739 << " because they have different dimensions,"
740 << " this Ndim=" << fNdim << ", file Ndim=" << keyval["fNdim"]
741 << std::endl;
742 return 0;
743 }
744 try {
745 fNfixed = keyval.at("fNfixed");
746 fSampling_threshold = keyval.at("fSampling_threshold");
747 fMaximum_depth = keyval.at("fMaximum_depth");
748 fMaximum_cells = keyval.at("fMaximum_cells");
749 fEfficiency_target = keyval.at("fEfficiency_target");
750 }
751 catch (std::exception e) {
752 std::cerr << "AdaptiveSampler::mergeState warning - "
753 << "required keyword missing in " << filename
754 << std::endl;
755 }
756 int ncells = fTopCell->deserialize(fin);
757 return (ncells > 0);
758}
759
760int AdaptiveSampler::restoreState(const std::string filename)
761{
762 if (fTopCell->divAxis > -1) {
763 for (int i=0; i < 3; ++i) {
764 delete fTopCell->subcell[i];
765 fTopCell->subcell[i] = 0;
766 }
767 }
768 fTopCell->divAxis = -1;
769 reset_stats();
770 return mergeState(filename);
771}
772
773int AdaptiveSampler::getVerbosity()
774{
775 return verbosity;
776}
777
778void AdaptiveSampler::setVerbosity(int verbose)
779{
780 verbosity = verbose;
781}
782
783int AdaptiveSampler::check_subsets(bool optimized)
784{
785 return fTopCell->check_subsets(optimized);
786}
787
788void AdaptiveSampler::Cell::optimize()
789{
790 // Assume opt_nhit and opt_subset already set upon entry,
791 // task is to assign opt_wI2, opt_wI4 for this cell, and
792 // all opt_* parameters for child nodes in the tree.
793 if (divAxis > -1) {
794 if (divAxis < nfixed) {
795 opt_wI2 = 0;
796 opt_wI4 = 0;
797 double r = (opt_nhit + 1e-99) / nhit;
798 for (int n=0; n < 3; ++n) {
799 subcell[n]->opt_nhit = subcell[n]->nhit * r;
800 subcell[n]->opt_subset = opt_subset;
801 subcell[n]->optimize();
802 opt_wI2 += subcell[n]->opt_wI2;
803 opt_wI4 += subcell[n]->opt_wI4;
804 }
805 }
806 else {
807 opt_wI2 = 0;
808 opt_wI4 = 0;
809 for (int n=0; n < 3; ++n) {
810 double r = subcell[n]->sum_wI2s / sum_wI2s;
811 // double r0 = 5e-3; // minimum subset fraction
812 // r = (r + r0) / (1 + 3*r0);
813 // r = (nhit > 100)? r : 1;
814 subcell[n]->opt_nhit = opt_nhit * r;
815 if (n == 2) {
816 double opt_ntot = subcell[0]->opt_nhit +
817 subcell[1]->opt_nhit +
818 subcell[2]->opt_nhit;
819 assert (abs(opt_ntot - opt_nhit) < 2);
820 }
821 subcell[n]->opt_subset = opt_subset * r;
822 subcell[n]->optimize();
823 opt_wI2 += subcell[n]->opt_wI2;
824 opt_wI4 += subcell[n]->opt_wI4;
825 }
826 }
827 }
828 else if (nhit > 0) {
829 double r = (opt_nhit + 1e-99) / nhit;
830 opt_wI2 = sum_wI2 / r;
831 opt_wI4 = sum_wI4 / pow(r,3);
832 }
833}
834
835int AdaptiveSampler::Cell::serialize(std::ofstream &ofs, bool optimized)
836{
837 ofs << "divAxis=" << divAxis << std::endl;
838 if (nhit != 0)
839 ofs << "nhit=" << nhit << std::endl;
840 if (sum_wI != 0)
841 ofs << "sum_wI=" << (std::isfinite(sum_wI)? sum_wI : 0) << std::endl;
842 if (sum_wI2 != 0)
843 ofs << "sum_wI2=" << (std::isfinite(sum_wI2)? sum_wI2 : 0) << std::endl;
844 if (sum_wI4 != 0)
845 ofs << "sum_wI4=" << (std::isfinite(sum_wI4)? sum_wI4 : 0) << std::endl;
846 for (int i=0; i < 3*ndim; ++i) {
847 if (sum_wI2d[i] != 0)
848 ofs << "sum_wI2d[" << i << "]="
849 << (std::isfinite(sum_wI2d[i])? sum_wI2d[i] : 0) << std::endl;
850 }
851 for (int i=0; i < 3*ndim; ++i) {
852 if (sum_wI4d[i] != 0)
853 ofs << "sum_wI4d[" << i << "]="
854 << (std::isfinite(sum_wI4d[i])? sum_wI4d[i] : 0) << std::endl;
855 }
856 if (optimized)
857 ofs << "subset=" << std::setprecision(20) << opt_subset << std::endl;
858 else
859 ofs << "subset=" << std::setprecision(20) << subset << std::endl;
860 ofs << "=" << std::endl;
861 int count = 1;
862 if (divAxis > -1) {
863 count += subcell[0]->serialize(ofs, optimized);
864 count += subcell[1]->serialize(ofs, optimized);
865 count += subcell[2]->serialize(ofs, optimized);
866 }
867 return count;
868}
869
870int AdaptiveSampler::Cell::deserialize(std::ifstream &ifs, double subset_multiplier)
871{
872 std::map<std::string,double> keyval;
873 while (true) {
874 std::string key;
875 std::getline(ifs, key, '=');
876 if (key.size() == 0) {
877 std::getline(ifs, key);
878 break;
879 }
880 ifs >> keyval[key];
881 std::getline(ifs, key);
882 }
883 divAxis = keyval.at("divAxis");
884 nhit += keyval["nhit"];
885 sum_wI += keyval["sum_wI"];
886 sum_wI2 += keyval["sum_wI2"];
887 sum_wI4 += keyval["sum_wI4"];
888 subset = keyval.at("subset") * subset_multiplier;
889 std::map<std::string,double>::iterator it;
890 for (it = keyval.begin(); it != keyval.end(); ++it) {
891 if (it->first.substr(0,8) == "sum_wI2u") {
892 int n3dim=1;
893 for (int n=0; n < ndim; ++n)
894 n3dim *= 3;
895 int j;
896 if (sscanf(it->first.c_str(), "sum_wI2u[%d]=", &j) == 1) {
897 assert (j < n3dim);
898 for (int n=0; n < ndim; ++n) {
899 double wI2 = it->second;
900 sum_wI2d[n*3+(j%3)] += wI2;
901 sum_wI4d[n*3+(j%3)] += wI2*wI2 * n3dim/nhit;
902 j /= 3;
903 }
904 }
905 }
906 else if (it->first.substr(0,8) == "sum_wI2d") {
907 int j;
908 if (sscanf(it->first.c_str(), "sum_wI2d[%d]=", &j) == 1) {
909 assert (j < 3*ndim);
910 sum_wI2d[j] += it->second;
911 }
912 }
913 else if (it->first.substr(0,8) == "sum_wI4d") {
914 int j;
915 if (sscanf(it->first.c_str(), "sum_wI4d[%d]=", &j) == 1) {
916 assert (j < 3*ndim);
917 sum_wI4d[j] += it->second;
918 }
919 }
920 }
921 int count = 1;
922 if (divAxis > -1) {
923 //subset_multiplier *= (divAxis < 1)? 3 : 1;
924 for (int i=0; i < 3; ++i) {
925 if (subcell[i] == 0) {
926 subcell[i] = new Cell(ndim, nfixed);
927 }
928 count += subcell[i]->deserialize(ifs, subset_multiplier);
929 }
930 }
931 return count;
932}
933
934int AdaptiveSampler::Cell::check_subsets(bool optimized, std::string id)
935{
936 if (divAxis < 0) {
937 return 0;
938 }
939
940 // test trinomial statistics at each node
941 int warnings = 0;
942 long int Ntot = subcell[0]->nhit + subcell[1]->nhit + subcell[2]->nhit;
943 if (Ntot != nhit) {
944 std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - "
945 << "nhit consistency check #1 failed for cell " << id
946 << std::endl
947 << " split cell nhit=" << nhit
948 << ", sum of subcell nhit=" << Ntot
949 << std::endl;
950 return 999;
951 }
952 if (divAxis < nfixed) {
953 if (subcell[0]->subset != subset ||
954 subcell[1]->subset != subset ||
955 subcell[2]->subset != subset)
956 {
957 std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - "
958 << "subset consistency check #1 failed for cell " << id
959 << std::endl
960 << " split fixed cell subset=" << subset
961 << ", subcell subsets="
962 << subcell[0]->subset << ","
963 << subcell[1]->subset << ","
964 << subcell[2]->subset
965 << std::endl;
966 return 999;
967 }
968 }
969 else {
970 double subsetsum = subcell[0]->subset +
971 subcell[1]->subset +
972 subcell[2]->subset;
973 if (fabs(subsetsum - subset) > subset * 1e-12) {
974 std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - "
975 << "subset consistency check #2 failed for cell " << id
976 << std::endl
977 << " split cell subset=" << subset
978 << ", sum of subcell subsets=" << subsetsum
979 << std::endl;
980 return 999;
981 }
982 for (int i=0; i < 3; ++i) {
983 double p = subcell[i]->subset / (subset + 1e-99);
984 double mu = nhit * p;
985 double sigma = sqrt(nhit * p * (1-p));
986 double p1 = subcell[(i+1)%3]->subset / (subset + 1e-99);
987 double mu1 = nhit * p1;
988 double sigma1 = sqrt(nhit * p1 * (1-p1));
989 double p2 = subcell[(i+2)%3]->subset / (subset + 1e-99);
990 double mu2 = nhit * p2;
991 double sigma2 = sqrt(nhit * p2 * (1-p2));
992 if (subcell[i]->nhit < 30) {
993 double prob = 1;
994 for (int k=1; k <= nhit; ++k) {
 995               if (k <= subcell[i]->nhit)
 996                  prob *= p * double(nhit - k + 1) / k;
997 else
998 prob *= 1 - p;
999 }
1000 if (prob < 1e-6) {
1001 std::cerr << "Warning in AdaptiveSampler::Cell::check_subsets - "
1002 << "nhit - subset probability < 1e-6, cell "
1003 << id << ", subcell " << i
1004 << " has nhit=" << subcell[i]->nhit
1005 << ", expected " << mu << " +/- " << sigma
1006 << ", P-value " << prob
1007 << std::endl;
1008                std::cerr << "  This cell splits " << nhit
1009                          << " events along axis " << divAxis
1010 << std::endl;
1011 std::cerr << " Other branch of this cell:"
1012 << " subcell " << (i+1)%3
1013 << " with nhit=" << subcell[(i+1)%3]->nhit
1014 << ", expected " << mu1 << " +/- " << sigma1
1015 << ", " << (subcell[(i+1)%3]->nhit - mu1) / sigma1
1016 << " sigma." << std::endl;
1017 std::cerr << " Other branch of this cell:"
1018 << " subcell " << (i+2)%3
1019 << " with nhit=" << subcell[(i+2)%3]->nhit
1020 << ", expected " << mu2 << " +/- " << sigma2
1021 << ", " << (subcell[(i+2)%3]->nhit - mu2) / sigma2
1022 << " sigma." << std::endl;
1023 warnings++;
1024 }
1025 }
1026 else if (fabs(subcell[i]->nhit - mu) > 5 * sigma) {
1027 std::cerr << "Warning in AdaptiveSampler::Cell::check_subsets - "
1028 << "nhit - subset mismatch > 5 sigma, cell "
1029 << id << ", subcell " << i
1030 << " has nhit=" << subcell[i]->nhit
1031 << ", expected " << mu << " +/- " << sigma
1032 << ", " << (subcell[i]->nhit - mu) / sigma
1033 << " sigma!" << std::endl;
1034                std::cerr << "  This cell splits " << nhit
1035                          << " events along axis " << divAxis
1036 << std::endl;
1037 std::cerr << " Other branch of this cell:"
1038 << " subcell " << (i+1)%3
1039 << " with nhit=" << subcell[(i+1)%3]->nhit
1040 << ", expected " << mu1 << " +/- " << sigma1
1041 << ", " << (subcell[(i+1)%3]->nhit - mu1) / sigma1
1042 << " sigma." << std::endl;
1043 std::cerr << " Other branch of this cell:"
1044 << " subcell " << (i+2)%3
1045 << " with nhit=" << subcell[(i+2)%3]->nhit
1046 << ", expected " << mu2 << " +/- " << sigma2
1047 << ", " << (subcell[(i+2)%3]->nhit - mu2) / sigma2
1048 << " sigma." << std::endl;
1049 warnings++;
1050 }
1051 }
1052 }
1053 for (int i=0; i < 3; ++i) {
1054 warnings += subcell[i]->check_subsets(optimized,
1055 id + std::to_string(i));
1056 }
1057 return warnings;
1058}
1059
1060std::string AdaptiveSampler::Cell::address()
1061{
1062 // assumes sum_stats has been called, so super is filled in
1063 std::string a;
1064 Cell *here = this;
1065 while (here != 0 && here->super != 0) {
1066 std::string sn = "?";
1067 for (int n=0; n < 3; ++n) {
1068 if (here->super->subcell[n] == here) {
1069 sn = std::to_string(n);
1070 here = here->super;
1071 break;
1072 }
1073 }
1074 a.insert(0, sn);
1075 }
1076 a.insert(0, "o");
1077 return a;
1078}