File: | scratch/gluex/scan-build-work/hdgeant4/src/AdaptiveSampler.cc |
Location: | line 257, column 11 |
Description: | Access to field 'divAxis' results in a dereference of a null pointer (loaded from variable 'sel') |
1 | // | |||
2 | // AdaptiveSampler - provide an adaptive algorithm for improved | |||
3 | // importance sampling in Monte Carlo integration | |||
4 | // | |||
5 | // file: AdaptiveSampler.cc (see AdaptiveSampler.hh for header, usage) | |||
6 | // author: richard.t.jones at uconn.edu | |||
7 | // original release: february 23, 2016 | |||
8 | // | |||
// new feature release: may 30, 2020
10 | // - see new notes 7,8 below | |||
11 | // - richard.t.jones at uconn.edu | |||
12 | // | |||
13 | // implementation notes: | |||
14 | // 1) See the description at the top of AdaptiveSampler.hh for an overview | |||
15 | // of the problem that this class is designed to help solve, and how to | |||
16 | // use it. | |||
17 | // | |||
18 | // 2) If an AdaptiveSampler object is being shared by more than one thread, | |||
19 | // only one of the threads can be calling the feedback method in the | |||
20 | // sample loop at a time. Simultaneous entry from more than one thread | |||
21 | // to any of the mutating methods (feedback, adapt, reset_stats, etc.) | |||
22 | // produces undefined behavior. Either you can share a single instance | |||
23 | // of AdaptiveSampler among many threads with the calls to mutating | |||
//    methods protected by a mutex, or you can construct an AdaptiveSampler
25 | // instance for each thread and then pause at regular intervals to | |||
26 | // combine their statistics using save/merge/restore. | |||
27 | // | |||
28 | // 3) Users must provide their uniform random number generator function to | |||
29 | // the AdaptiveSampler constructor. If the sample method is to be called | |||
30 | // from multiple threads, this user-written function must be reentrant. | |||
31 | // The user is responsible for initializing the state of the generator | |||
32 | // before beginning the Monte Carlo loop. | |||
33 | // | |||
34 | // 4) Successful use of AdaptiveSampler requires that the user understand | |||
35 | // some basic things about opportunistic importance sampling. The basic | |||
36 | // strategy it uses is to look for hot spots in the integration domain | |||
37 | // and zoom in to oversample in those regions. By implication, there | |||
38 | // are regions in the integration domain that are undersampled, and | |||
39 | // this can lead to systematic underbias in the result. Because of this | |||
40 | // bias, importance sampling is really a variance-reduction technique: | |||
41 | // only by repeated application under a variety of conditions does one | |||
42 | // reach any level of confidence that the result is correct. Tuning | |||
43 | // parameters are provided to allow the user to vary the conditions | |||
44 | // for adaptation and check the robustness of their result. | |||
45 | // *) Adaptation_sampling_threshold - minimum fraction of the total | |||
46 | // sample sum of wI**2 that must be coming from a given cell in | |||
47 | // order for that cell to be considered for splitting. | |||
48 | // *) Adaptation_efficiency_target - this is the goal that is set | |||
49 | // for the adaptation algorithm to shoot for. Adaptations are | |||
50 | // allowed if and only if the improvements to the efficiency | |||
51 | // that they entail have a reasonable chance of achieving the | |||
52 | // target value before Adaptation_maximum_cells (see below) is | |||
53 | // reached. | |||
54 | // *) Adaptation_maximum_depth - this is the maximum depth of the | |||
55 | // adaptation tree. The finest subdivision of the domain of | |||
56 | // integration will be a cell of volume 3**maximum_depth. This | |||
57 | // prevents the algorithm from tuning on noise if the inherent | |||
//       resolution in the integrand function has a finite limit.
59 | // Because of the intrinsic limitations of double precision | |||
60 | // floats, this parameter should normally not exceed 35. | |||
61 | // *) Adaptation_maximum_cells - maximum number of cells that | |||
62 | // should be allowed by the adaptation algorithm in building | |||
63 | // out the sampling tree. The sampling tree is a hierarchical | |||
64 | // subdivision of the integration domain that selects certain | |||
65 | // regions in the ndim-dimensional hypercube for oversampling. | |||
66 | // Each cell in the tree requires memory resources to hold the | |||
67 | // statistics that it collects for the hits that are generated | |||
68 | // within that cell. All cells in the tree have the same size, | |||
69 | // approximately 40 + (8 * 3^ndim) for ndim dimensions. The | |||
70 | // number of cells that are needed to achieve a certain target | |||
71 | // efficiency depends on the problem. Generally one should | |||
72 | // plan to accommodate as many cells as memory resources can | |||
73 | // support. The algorithm attempts to achieve the efficiency | |||
74 | // target using as few cells as it can. | |||
75 | // | |||
76 | // 5) Generally, one can expect to achieve a factor 3 improvement in | |||
77 | // sampling efficiency each time the maximum depth of the tree | |||
78 | // advances by ndim. Generically it is the maximum depth of the | |||
79 | // tree and not the number of cells it contains that indicates how | |||
80 | // much improvement the sampler has achieved over unbiased sampling. | |||
81 | // The number of cells required to advance the depth another ndim | |||
82 | // levels may increase or decrease with depth depending on the | |||
83 | // fractal dimension of the integrand. It really depends on the | |||
84 | // problem; there is no way in advance to predict how this will | |||
85 | // behave. Obviously integrands of low fractal dimension will be | |||
86 | // more susceptible to the generic subdivision strategy employed | |||
87 | // by the AdaptiveSampling algorithm than those with high dimension. | |||
88 | // | |||
89 | // 6) Here are rules of thumb that will keep you from common blunders. | |||
90 | // a) Do not be too aggressive in adapting, especially at first | |||
91 | // when the AdaptiveSampler is new and undifferentiated. Keep | |||
92 | // the sampling_threshold value as high as you can (1-10 is a | |||
93 | // good range to start off with), only lower it if you cannot | |||
94 | // get the adaptation to progress. Keep in mind that the lower | |||
95 | // this threshold is set, the more susceptible you are to | |||
96 | // tuning on noise and introducing uncontrolled underbias. | |||
97 | // b) Bias is generically on the low side of the true result. | |||
98 | // This means that if you repeat a measurement 25 times and | |||
99 | // one repetition gives an answer that is many sigma above | |||
100 | // the rest, there is a low-visibility region in integration | |||
101 | // space that this repetition has found. Based on this, you | |||
102 | // should modify your adaptation strategy to increase the | |||
//       visibility of that region early on during development. See
104 | // point (c) for how that can be done. | |||
//    c) If you know there are low-visibility regions of the domain
106 | // of integration that tend to get excluded by the adaptation | |||
107 | // algorithm, you can "train" the AdaptiveSampler by running it | |||
//       initially for several thousand events on an integrand that is
109 | // biased in favor of these regions. Mixing in these events with | |||
110 | // a larger sample of the true integrand over several rounds of | |||
111 | // adapt/reset_stats/repeat will build sensitivity to these | |||
112 | // regions into the tree that should persist throughout the | |||
113 | // remainder of its evolution. | |||
114 | // d) Run large batches of MC jobs in parallel during adaptation | |||
115 | // and use saveState/mergeState, then adapt on their combined | |||
116 | // results. Do not adapt separately in each individual thread | |||
117 | // unless your goal is to generate as diverse a set of trees | |||
118 | // as possible. The adaptation algorithm in AdaptiveSampler is | |||
119 | // designed to deterministically produce the optimum sampling | |||
120 | // tree in the large-N limit, so the best trees are those that | |||
121 | // are based on the largest statistical sample. Do not try to | |||
122 | // generate a super-high performance tree by producing many of | |||
//       them and choosing the one with the highest efficiency -- it
124 | // may have high efficiency, but by doing this you have almost | |||
125 | // guaranteed that your result will have a large bias. | |||
126 | // | |||
127 | // 7) The user can now set up the problem with higher dimension than | |||
128 | // the number of random numbers needed per event, with the extra | |||
129 | // dimensions allocated to user-generated random variables. These | |||
130 | // must be transformed by the user to fall within the domain [0,1] | |||
131 | // and passed to sample() and feedback() in the first nfixed | |||
132 | // elements of the u vector (first argument). If nfixed > 0 then | |||
133 | // it needs to be passed as the last argument to the constructor | |||
134 | // of AdaptiveSampler. | |||
135 | ||||
#include <assert.h>

#include <algorithm>
#include <cmath>
#include <exception>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

#include "AdaptiveSampler.hh"
143 | ||||
144 | int AdaptiveSampler::verbosity = 3; | |||
145 | ||||
// Construct a sampler over a dim-dimensional unit hypercube.  The first
// nfixed dimensions are supplied by the caller at sample() time; the
// rest are drawn from user_generator.  The tree starts as a single
// undivided cell holding the full subset weight 1.
AdaptiveSampler::AdaptiveSampler(int dim, Uniform01 user_generator, int nfixed)
 : fNdim(dim),
   fNfixed(nfixed),
   fMaximum_depth(30),
   fMaximum_cells(1000000),
   fSampling_threshold(0.01),
   fEfficiency_target(0.9),
   fMinimum_sum_wI2_delta(0)
{
   fRandom = user_generator;
   fTopCell = new Cell(dim, nfixed);
   fTopCell->subset = 1;
   reset_stats();
}
160 | ||||
// Copy constructor: deep-copies the entire cell tree so the new sampler
// evolves independently of src.
AdaptiveSampler::AdaptiveSampler(const AdaptiveSampler &src)
 : fNdim(src.fNdim),
   fNfixed(src.fNfixed),
   fMaximum_depth(src.fMaximum_depth),
   fMaximum_cells(src.fMaximum_cells),
   fSampling_threshold(src.fSampling_threshold),
   fEfficiency_target(src.fEfficiency_target),
   fMinimum_sum_wI2_delta(src.fMinimum_sum_wI2_delta)
{
   fRandom = src.fRandom;
   fTopCell = new Cell(*src.fTopCell);
}
173 | ||||
// Destructor: deleting the top cell releases the whole tree
// (assumes Cell's destructor recursively deletes its subcells --
// confirm in AdaptiveSampler.hh).
AdaptiveSampler::~AdaptiveSampler()
{
   delete fTopCell;
}
178 | ||||
// NOTE(review): this "assignment" operator never modifies *this -- it
// only returns a by-value copy of src, so after `a = b;` the object `a`
// is unchanged.  Callers almost certainly expect copy-assignment
// semantics here; a proper fix needs the header (member constness and
// the declared return type) -- confirm before relying on this operator.
AdaptiveSampler AdaptiveSampler::operator=(const AdaptiveSampler &src)
{
   return AdaptiveSampler(src);
}
183 | ||||
184 | double AdaptiveSampler::sample(double *u) | |||
185 | { | |||
186 | for (int i=0; i < fNfixed; ++i) { | |||
187 | if (u[i] < 0 || u[i] >= 1) { | |||
188 | std::cerr << "AdaptiveSampler::sample error - " | |||
189 | << "fixed parameter " << i << " is " | |||
190 | << "outside the allowed interval [0,1), " | |||
191 | << "value is " << u[i] << std::endl; | |||
192 | u[i] = 0; | |||
193 | } | |||
194 | } | |||
195 | int depth; | |||
196 | double *u0 = new double[fNdim]; | |||
197 | double *u1 = new double[fNdim]; | |||
198 | double *uu = new double[fNdim + 1]; | |||
199 | (*fRandom)(fNdim - fNfixed + 1, uu + fNfixed); | |||
200 | Cell *cell = findCell(uu[fNdim], depth, u0, u1, u); | |||
201 | for (int i=fNfixed; i < fNdim; ++i) { | |||
202 | u[i] = uu[i] * u0[i] + (1 - uu[i]) * u1[i]; | |||
203 | } | |||
204 | delete [] u0; | |||
205 | delete [] u1; | |||
206 | delete [] uu; | |||
207 | // WARNING: strong condition is assumed here! | |||
208 | // All splits along axes 0..fNfixed must be each | |||
209 | // assigned the full subset of the parent cell. | |||
210 | // You can use check_subsets() to verify this. | |||
211 | double dNu = (depth > 0)? pow(1/3., depth) : 1; | |||
212 | return dNu / cell->subset; | |||
213 | } | |||
214 | ||||
// Select a leaf cell at random, with probability proportional to each
// leaf's subset weight, by descending the tree with the uniform deviate
// ucell.  Axes below fNfixed are navigated deterministically using the
// caller's coordinates u[].  On return, depth counts the splits taken
// along non-fixed axes, and u0[]/u1[] bound the selected cell.
AdaptiveSampler::Cell *AdaptiveSampler::findCell(double ucell,
                                                 int &depth,
                                                 double *u0,
                                                 double *u1,
                                                 const double *u)
{
   std::fill(u0, u0 + fNdim, 0);
   std::fill(u1, u1 + fNdim, 1);
   depth = 0;
   Cell *sel = fTopCell;
   while (sel->divAxis > -1) {
      int i;
      int j = sel->divAxis;
      double du = (u1[j] - u0[j]) / 3;
      if (j < fNfixed) {
         // fixed axis: follow the daughter containing the user's u[j]
         i = int((u[j] - u0[j]) / du);
      }
      else {
         // random axis: walk daughters 0 and 1 subtracting their subset
         // weights from ucell; if ucell survives both, i falls through
         // to 2 and daughter 2 takes the remainder.
         for (i=0; i < 2; i++) {
            if (ucell >= sel->subcell[i]->subset)
               ucell -= sel->subcell[i]->subset;
            else
               break;
         }
         ++depth;
      }
      // shrink the bounding box to the chosen daughter
      u0[j] = u0[j] + du * i;
      u1[j] = u0[j] + du;
      sel = sel->subcell[i];
   }
   return sel;
}
247 | ||||
248 | AdaptiveSampler::Cell *AdaptiveSampler::findCell(const double *u, | |||
249 | int &depth, | |||
250 | double *u0, | |||
251 | double *u1) const | |||
252 | { | |||
253 | std::fill(u0, u0 + fNdim, 0); | |||
254 | std::fill(u1, u1 + fNdim, 1); | |||
255 | depth = 0; | |||
256 | Cell *sel = fTopCell; | |||
257 | while (sel->divAxis > -1) { | |||
| ||||
258 | int j = sel->divAxis; | |||
259 | double du = (u1[j] - u0[j]) / 3; | |||
260 | int i = floor((u[j] - u0[j]) / du); | |||
261 | i = (i < 0)? 0 : (i < 3)? i : 2; | |||
262 | u0[j] = u0[j] + du * i; | |||
263 | u1[j] = u0[j] + du; | |||
264 | if (sel->subcell[i] == 0) { | |||
265 | std::cerr << "bad romans!" << std::endl; | |||
266 | } | |||
267 | sel = sel->subcell[i]; | |||
268 | if (j >= fNfixed) | |||
269 | ++depth; | |||
270 | } | |||
271 | return sel; | |||
272 | } | |||
273 | ||||
274 | void AdaptiveSampler::feedback(const double *u, double wI) | |||
275 | { | |||
276 | int depth; | |||
277 | double *u0 = new double[fNdim]; | |||
278 | double *u1 = new double[fNdim]; | |||
279 | Cell *cell = findCell(u, depth, u0, u1); | |||
| ||||
280 | cell->nhit += 1; | |||
281 | cell->sum_wI += wI; | |||
282 | double wI2 = wI*wI; | |||
283 | cell->sum_wI2 += wI2; | |||
284 | cell->sum_wI4 += wI2*wI2; | |||
285 | for (int n=0; n < fNdim; ++n) { | |||
286 | double du = (u1[n] - u0[n]) / 3; | |||
287 | int s = (u[n] - u0[n]) / du; | |||
288 | assert (s >= 0 && s < 3)((s >= 0 && s < 3) ? static_cast<void> (0 ) : __assert_fail ("s >= 0 && s < 3", "src/AdaptiveSampler.cc" , 288, __PRETTY_FUNCTION__)); | |||
289 | cell->sum_wI2d[3*n+s] += wI2; | |||
290 | cell->sum_wI4d[3*n+s] += wI2*wI2; | |||
291 | } | |||
292 | delete [] u0; | |||
293 | delete [] u1; | |||
294 | } | |||
295 | ||||
// One optimization pass over the sampling tree: report whether further
// gains are statistically significant, then recursively split the cells
// that dominate the variance (sum of wI^2).  Returns the number of new
// splits made this pass.
int AdaptiveSampler::adapt()
{
   int ncells = fTopCell->sum_stats();
   std::cout << "starting sum_wI2 is " << fTopCell->sum_wI2 << std::endl;
   // sum_wI2 the tree would have if the efficiency target were reached
   double sum_wI2_target = pow(fTopCell->sum_wI, 2) /
                           (fTopCell->nhit * fEfficiency_target);
   double sum_wI2_error = sqrt(fTopCell->sum_wI4) + 1e-99;
   // significance (in sigma) of the available improvement
   double gain_sig = (fTopCell->sum_wI2 - sum_wI2_target) / sum_wI2_error;
   if (gain_sig > 3) {
      if (verbosity > 0)
         std::cout << "adapt - sample statistics indicate that significant"
                   << " gains (" << gain_sig << " sigma)" << std::endl
                   << "in sampling efficiency can be achieved through"
                   << " further optimization, beginning optimization pass."
                   << std::endl;
   }
   else {
      if (verbosity > 0)
         std::cout << "adapt - sample statistics indicate that significant"
                   << " gains in sampling efficiency cannot be achieved through"
                   << " further optimization "
                   << "(" << gain_sig << " sigma),"
                   << " a larger sample size is needed."
                   << std::endl;
   }
   std::cout << "Looking for cells containing at least "
             << fSampling_threshold * 100 << "% of the total "
             << "sample sum_wI2 = " << fTopCell->sum_wI2
             << std::endl;

   // minimum variance reduction a split must promise, amortized over the
   // remaining cell budget.
   // NOTE(review): divides by (fMaximum_cells - ncells); if the tree has
   // already reached fMaximum_cells this is a division by zero -- confirm
   // callers stop adapting before the budget is exhausted.
   fMinimum_sum_wI2_delta = (fTopCell->sum_wI2 - sum_wI2_target) /
                            (fMaximum_cells - ncells);
   std::vector<int> cellIndex;
   int newcells = recursively_update(cellIndex);
   if (verbosity > 0) {
      if (newcells) {
         if (verbosity > 0)
            std::cout << "adapt - " << newcells << " added this pass,"
                      << " new count is " << getNcells()
                      << std::endl;
      }
      else {
         if (verbosity > 0)
            std::cout << "adapt - no changes this pass,"
                      << " count remains unchanged at " << getNcells()
                      << std::endl;
      }
   }
   if (verbosity > 1) {
      std::cout << "present efficiency " << getEfficiency()
                << ", new value predicted to be " << getEfficiency(true)
                << std::endl;
   }
   std::cout << "optimized sum_wI2 is " << fTopCell->opt_wI2 << std::endl;
   return newcells;
}
352 | ||||
// Visit the cell addressed by index (daughter indices from the root):
// if it is a leaf, decide whether to split it into 3 along the most
// promising axis; if it is interior, recurse into its daughters.
// Returns the number of splits performed in this subtree.
// index is deliberately taken by value -- each recursion owns its copy,
// so the push_back/pop_back bookkeeping cannot disturb the caller.
int AdaptiveSampler::recursively_update(std::vector<int> index)
{
   Cell *cell = fTopCell;
   unsigned int depth;
   std::stringstream cellpath;
   for (depth=0; depth < index.size(); ++depth) {
      cellpath << "/" << cell->divAxis << ":" << index[depth];
      cell = cell->subcell[index[depth]];
   }

   int count = 0;
   double efficiency = 0;
   if (cell->nhit > 0 && cell->sum_wI2 > 0) {
      efficiency = pow(cell->sum_wI, 2) / (cell->nhit * cell->sum_wI2);
   }
   if (cell->divAxis < 0) {

      // This is the adaptation algorithm: all of the power
      // of AdaptiveSampler lies in this little bit of code.

      if (verbosity > 3) {
         std::cout << "checking if we should partition " << cellpath.str()
                   << " with sum_wI2=" << cell->sum_wI2
                   << " +/- " << sqrt(cell->sum_wI4)
                   << std::endl;
      }
      // a leaf is a split candidate only if it carries a significant
      // share of the total variance, is below the efficiency target,
      // and has not reached the maximum tree depth
      if (cell->sum_wI2 > fSampling_threshold * fTopCell->sum_wI2 &&
          efficiency < fEfficiency_target &&
          depth < fMaximum_depth)
      {
         if (verbosity > 3) {
            std::cout << "checking if to split at cell " << depth
                      << " with sum_wI2 = " << cell->sum_wI2
                      << " +/- " << sqrt(cell->sum_wI4)
                      << " with fMinimum_sum_wI2_delta = "
                      << fMinimum_sum_wI2_delta
                      << std::endl;
         }
         // choose the axis that minimizes the predicted post-split
         // sum_wI2, estimated from the stored 3-bin projections
         int best_axis = -1;
         double best_sum_wI2 = 1e99;
         for (int n=0; n < fNdim; ++n) {
            double sum_wI2 = pow(sqrt(cell->sum_wI2d[3*n]) +
                                 sqrt(cell->sum_wI2d[3*n+1]) +
                                 sqrt(cell->sum_wI2d[3*n+2]), 2) / 3;
            if (sum_wI2 < best_sum_wI2) {
               best_sum_wI2 = sum_wI2;
               best_axis = n;
            }
         }
         if (best_sum_wI2 > cell->sum_wI2) {
            // mathematically impossible unless the stats are corrupt
            std::cerr << "AdaptiveSampler::recursive_update error - "
                      << "best_sum_wI2 > sum_wI2, this cannot be!"
                      << std::endl;
            exit(7);
         }
         else if (cell->sum_wI2 - best_sum_wI2 > fMinimum_sum_wI2_delta) {
            if (best_sum_wI2 > 0) {
               // split: create 3 daughters along best_axis, seeding
               // their statistics from this cell's projected moments
               cell->divAxis = best_axis;
               for (int s=0; s < 3; ++s) {
                  cell->subcell[s] = new Cell(fNdim, fNfixed);
                  cell->subcell[s]->nhit = cell->nhit / 3;
                  cell->subcell[s]->sum_wI = cell->sum_wI / 3;
                  cell->subcell[s]->sum_wI2 = cell->sum_wI2d[3*best_axis + s];
                  cell->subcell[s]->sum_wI4 = cell->sum_wI4d[3*best_axis + s];
                  // fixed axes inherit the parent's full subset (the
                  // strong condition assumed by sample())
                  cell->subcell[s]->subset = (best_axis < fNfixed)?
                                             cell->subset : cell->subset / 3;
               }
               if (verbosity > 2) {
                  std::cout << "splitting this cell[";
                  for (unsigned int i=0; i < depth; i++) {
                     std::cout << ((i > 0)? "," : "") << index[i];
                  }
                  std::cout << "] along axis " << best_axis << std::endl;
               }
               count += 1;
            }
            else {
               if (verbosity > 3) {
                  std::cout << "nope, missing statistics to find best axis!"
                            << std::endl;
               }
            }
         }
         else {
            if (verbosity > 3) {
               std::cout << "nope, fails to make the threshold!" << std::endl;
            }
         }
      }
      else if (cell->sum_wI2 < fSampling_threshold * fTopCell->sum_wI2) {
         if (verbosity > 3) {
            std::cout << "nope, fails the statistical significance test!"
                      << std::endl;
         }
      }
      else if (efficiency >= fEfficiency_target) {
         if (verbosity > 3) {
            std::cout << "nope, efficiency is " << efficiency
                      << ", already meets the target value!"
                      << std::endl;
         }
      }
      else {
         if (verbosity > 3) {
            std::cout << "nope, this cell cannot be split any further!"
                      << std::endl;
         }
      }
   }
   else {
      // interior cell: extend the path and recurse into each daughter
      index.push_back(0);
      for (int i=0; i < 3; ++i) {
         index[depth] = i;
         count += recursively_update(index);
      }
      index.pop_back();
   }
   return count;
}
472 | ||||
// Zero the accumulated hit statistics in every cell of the tree.
void AdaptiveSampler::reset_stats()
{
   fTopCell->reset_stats();
}
477 | ||||
// Refresh the tree-wide statistics, then report the total number of
// samples recorded at the root.
long int AdaptiveSampler::getNsample() const
{
   fTopCell->sum_stats();
   return fTopCell->nhit;
}
483 | ||||
// Refresh the tree-wide statistics, then report the total sum of
// sample weights wI.
double AdaptiveSampler::getWItotal() const
{
   fTopCell->sum_stats();
   return fTopCell->sum_wI;
}
489 | ||||
// Report the total sum of wI^2, either as measured (default) or as
// predicted after re-optimizing the subset weights (optimized=true).
double AdaptiveSampler::getWI2total(bool optimized)
{
   fTopCell->sum_stats();
   if (optimized) {
      optimize_tree();
      return fTopCell->opt_wI2;
   }
   return fTopCell->sum_wI2;
}
499 | ||||
// Sampling efficiency = (sum wI)^2 / (N * sum wI^2), i.e. 1 for
// perfectly uniform weights.  With optimized=true, the prediction
// after re-optimizing the subset weights is returned instead.
// NOTE(review): the non-optimized branch has no +1e-99 guard, so
// sum_wI2 == 0 with nhit > 0 would divide by zero -- confirm whether
// that state is reachable.
double AdaptiveSampler::getEfficiency(bool optimized)
{
   fTopCell->sum_stats();
   if (fTopCell->nhit == 0)
      return 0;
   else if (optimized) {
      optimize_tree();
      return pow(fTopCell->sum_wI, 2) /
             (fTopCell->opt_wI2 * fTopCell->opt_nhit + 1e-99);
   }
   return pow(fTopCell->sum_wI, 2) /
          (fTopCell->sum_wI2 * fTopCell->nhit);
}
513 | ||||
// Monte Carlo estimate of the integral (mean sample weight).  If the
// optional output pointers are given, *error receives the statistical
// error of the result and *error_uncertainty the uncertainty on that
// error estimate; both are 0 when no valid statistics exist (eff <= 0).
double AdaptiveSampler::getResult(double *error,
                                  double *error_uncertainty)
{
   fTopCell->sum_stats();
   double eff = pow(fTopCell->sum_wI, 2) /
                (fTopCell->sum_wI2 * fTopCell->nhit);
   double result = (eff > 0)? fTopCell->sum_wI / fTopCell->nhit : 0;
   if (error) {
      if (eff > 0)
         *error = sqrt((1 - eff) * fTopCell->sum_wI2) / fTopCell->nhit;
      else
         *error = 0;
   }

   if (error_uncertainty) {
      if (eff > 0)
         *error_uncertainty = sqrt((1 - eff) / fTopCell->sum_wI2) / 2 *
                              sqrt(fTopCell->sum_wI4) / fTopCell->nhit;
      else
         *error_uncertainty = 0;
   }
   return result;
}
537 | ||||
// Same as getResult, but computed from the re-optimized (opt_*) tree
// statistics produced by optimize_tree(); the +1e-99 terms protect the
// divisions when opt_nhit or opt_wI2 is zero.
double AdaptiveSampler::getReweighted(double *error,
                                      double *error_uncertainty)
{
   fTopCell->sum_stats();
   optimize_tree();
   double eff = pow(fTopCell->sum_wI, 2) /
                (fTopCell->opt_wI2 * fTopCell->opt_nhit + 1e-99);
   double result = (eff > 0)? fTopCell->sum_wI / fTopCell->opt_nhit : 0;
   if (error) {
      if (eff > 0)
         *error = sqrt((1 - eff) * fTopCell->opt_wI2) /
                  (fTopCell->opt_nhit + 1e-99);
      else
         *error = 0;
   }
   if (error_uncertainty) {
      if (eff > 0)
         *error_uncertainty = sqrt((1 - eff) / fTopCell->opt_wI2) / 2 *
                              sqrt(fTopCell->opt_wI4) /
                              (fTopCell->opt_nhit + 1e-99);
      else
         *error_uncertainty = 0;
   }
   return result;
}
563 | ||||
// Dimension of the integration domain.
int AdaptiveSampler::getNdim() const
{
   return fNdim;
}
568 | ||||
// Number of leading dimensions supplied by the caller (not generated).
int AdaptiveSampler::getNfixed() const
{
   return fNfixed;
}
573 | ||||
// Total number of cells currently in the tree (sum_stats returns it).
int AdaptiveSampler::getNcells() const
{
   return fTopCell->sum_stats();
}
578 | ||||
// Set the minimum fraction of the total sum_wI2 a cell must carry to
// be considered for splitting (see header note 4).
void AdaptiveSampler::setAdaptation_sampling_threshold(double threshold)
{
   fSampling_threshold = threshold;
}
583 | ||||
// Current splitting threshold (fraction of total sum_wI2).
double AdaptiveSampler::getAdaptation_sampling_threshold() const
{
   return fSampling_threshold;
}
588 | ||||
// Set the sampling efficiency the adaptation algorithm aims for.
void AdaptiveSampler::setAdaptation_efficiency_target(double target)
{
   fEfficiency_target = target;
}
593 | ||||
// Current efficiency target for adaptation.
double AdaptiveSampler::getAdaptation_efficiency_target() const
{
   return fEfficiency_target;
}
598 | ||||
// Set the maximum depth of the adaptation tree (header suggests <= 35
// because of double precision limits).
void AdaptiveSampler::setAdaptation_maximum_depth(int depth)
{
   fMaximum_depth = depth;
}
603 | ||||
// Current maximum adaptation tree depth.
int AdaptiveSampler::getAdaptation_maximum_depth() const
{
   return fMaximum_depth;
}
608 | ||||
// Set the maximum number of cells the adaptation algorithm may create.
void AdaptiveSampler::setAdaptation_maximum_cells(int ncells)
{
   fMaximum_cells = ncells;
}
613 | ||||
// Current cell-count budget for adaptation.
int AdaptiveSampler::getAdaptation_maximum_cells() const
{
   return fMaximum_cells;
}
618 | ||||
// Recompute the optimal (opt_*) subset weights and predicted statistics
// for the whole tree; a no-op unless some weight has been accumulated.
void AdaptiveSampler::optimize_tree()
{
   fTopCell->sum_stats();
   if (fTopCell->sum_wI > 0) {
      fTopCell->opt_nhit = fTopCell->nhit;
      fTopCell->opt_subset = 1;
      fTopCell->optimize();
   }
}
628 | ||||
629 | void AdaptiveSampler::display_tree(bool optimized) | |||
630 | { | |||
631 | double *u0 = new double[fNdim]; | |||
632 | double *u1 = new double[fNdim]; | |||
633 | std::fill(u0, u0 + fNdim, 0); | |||
634 | std::fill(u1, u1 + fNdim, 1); | |||
635 | display_tree(fTopCell, 1, "o", u0, u1, optimized); | |||
636 | delete [] u0; | |||
637 | delete [] u1; | |||
638 | } | |||
639 | ||||
// Recursively print one cell and its daughters: the cell's subset
// weight (absolute and as a percentage of its parent's), the division
// axis and bounds if split, and the accumulated statistics.  Returns
// the cell's subset so the caller can verify the daughters' sum rule.
double AdaptiveSampler::display_tree(Cell *cell, double subset, std::string id,
                                     double *u0, double *u1, bool optimized)
{
   char numeric[80];
   std::cout << id << ": ";
   double cell_subset = (optimized)? cell->opt_subset : cell->subset;
   snprintf(numeric, 80, "%9.5e", cell_subset);
   std::cout << numeric;
   if (cell_subset > subset * 0.9995) {
      std::cout << "(100%) ";
   }
   else {
      snprintf(numeric, 30, "(%.1f%%) ", 100 * cell_subset / subset);
      std::cout << numeric;
   }

   if (cell->divAxis > -1) {
      // interior cell: show the split axis, bounds, and depth, then
      // recurse into the three daughters with narrowed bounds
      double ssum = 0;
      double u0m = u0[cell->divAxis];
      double u1m = u1[cell->divAxis];
      double du = (u1m - u0m) / 3.;
      snprintf(numeric, 80, " [%15.13f,%15.13f] ", u0m, u1m);
      std::cout << " split along axis " << cell->divAxis << numeric;
      double dlev = -log(du) / log(3.);
      snprintf(numeric, 80, " (1/3^%d) ", int(dlev));
      std::cout << numeric;
      std::cout << ((optimized)? cell->opt_nhit : cell->nhit)
                << " " << cell->sum_wI
                << " " << ((optimized)? cell->opt_wI2 : cell->sum_wI2)
                << " " << ((optimized)? cell->opt_wI4 : cell->sum_wI4)
                << std::endl;
      for (int n=0; n < 3; ++n) {
         u0[cell->divAxis] = u0m + du * n;
         u1[cell->divAxis] = u0m + du * (n + 1);
         ssum += display_tree(cell->subcell[n], cell_subset,
                              id + std::to_string(n), u0, u1, optimized);
      }
      // daughters along a fixed axis each carry the full parent subset,
      // so their sum must be divided by 3 before the sum-rule check
      ssum /= (cell->divAxis < fNfixed)? 3 : 1;
      if (fabs(ssum - cell_subset) > 1e-15 * cell_subset) {
         std::cerr << "Error in AdaptiveSampler::display_tree - "
                   << "subcell subsets fail to obey the sum rule, "
                   << "tree is invalid !!!" << std::endl
                   << "   cell subset = " << cell_subset << std::endl
                   << "   summed subsets = " << ssum << std::endl
                   << "   difference = " << ssum - cell_subset
                   << std::endl;
      }
      // restore the caller's bounds before returning
      u0[cell->divAxis] = u0m;
      u1[cell->divAxis] = u1m;
   }
   else {
      std::cout << ((optimized)? cell->opt_nhit : cell->nhit )
                << " " << cell->sum_wI
                << " " << ((optimized)? cell->opt_wI2 : cell->sum_wI2)
                << " " << ((optimized)? cell->opt_wI4 : cell->sum_wI4)
                << std::endl;
   }
   return cell_subset;
}
699 | ||||
// Write the sampler configuration and the full cell tree to filename in
// the text format understood by mergeState/restoreState.  Returns 1 on
// success, 0 on failure.
int AdaptiveSampler::saveState(const std::string filename, bool optimized) const
{
   std::ofstream fout(filename);
   fout << "fNdim=" << fNdim << std::endl;
   fout << "fNfixed=" << fNfixed << std::endl;
   fout << "fSampling_threshold=" << fSampling_threshold << std::endl;
   fout << "fMaximum_depth=" << fMaximum_depth << std::endl;
   fout << "fMaximum_cells=" << fMaximum_cells << std::endl;
   fout << "fEfficiency_target=" << fEfficiency_target << std::endl;
   // a bare '=' line terminates the key=value header block
   fout << "=" << std::endl;
   int ncells = fTopCell->serialize(fout, optimized);
   return (ncells > 0);
}
713 | ||||
// Merge the statistics tree saved in filename into this sampler.
// The file's header parameters overwrite the current settings, except
// fNdim which must match or the merge is refused.  Returns 1 on
// success, 0 on failure.
int AdaptiveSampler::mergeState(const std::string filename)
{
   std::ifstream fin(filename);
   if (!fin.is_open()) {
#ifdef EXTRA_ADAPTIVE_SAMPLER_WARNINGS
      std::cerr << "AdaptiveSampler::mergeState error - "
                << "unable to open " << filename << " for input."
                << std::endl;
#endif
      return 0;
   }
   // parse "key=value" header lines; an empty key (the bare '=' line
   // written by saveState) marks the end of the header
   std::map<std::string,double> keyval;
   while (true) {
      std::string key;
      std::getline(fin, key, '=');
      if (key.size() == 0) {
         std::getline(fin, key);
         break;
      }
      fin >> keyval[key];
      std::getline(fin, key);
   }
   if (fNdim != keyval["fNdim"]) {
      std::cerr << "AdaptiveSampler::mergeState error - "
                << "cannot merge with state saved in " << filename
                << " because they have different dimensions,"
                << " this Ndim=" << fNdim << ", file Ndim=" << keyval["fNdim"]
                << std::endl;
      return 0;
   }
   try {
      fNfixed = keyval.at("fNfixed");
      fSampling_threshold = keyval.at("fSampling_threshold");
      fMaximum_depth = keyval.at("fMaximum_depth");
      fMaximum_cells = keyval.at("fMaximum_cells");
      fEfficiency_target = keyval.at("fEfficiency_target");
   }
   catch (std::exception &e) {
      // missing keywords only warn; the merge proceeds with whatever
      // settings were successfully read
      std::cerr << "AdaptiveSampler::mergeState warning - "
                << "required keyword missing in " << filename
                << std::endl;
   }
   int ncells = fTopCell->deserialize(fin);
   return (ncells > 0);
}
759 | ||||
760 | int AdaptiveSampler::restoreState(const std::string filename) | |||
761 | { | |||
762 | if (fTopCell->divAxis > -1) { | |||
763 | for (int i=0; i < 3; ++i) { | |||
764 | delete fTopCell->subcell[i]; | |||
765 | fTopCell->subcell[i] = 0; | |||
766 | } | |||
767 | } | |||
768 | fTopCell->divAxis = -1; | |||
769 | reset_stats(); | |||
770 | return mergeState(filename); | |||
771 | } | |||
772 | ||||
int AdaptiveSampler::getVerbosity()
{
   // Return the current diagnostic verbosity level.
   return verbosity;
}
777 | ||||
void AdaptiveSampler::setVerbosity(int verbose)
{
   // Set the diagnostic verbosity level; higher values produce more
   // console output from the sampler.
   verbosity = verbose;
}
782 | ||||
int AdaptiveSampler::check_subsets(bool optimized)
{
   // Run consistency checks over the entire cell tree starting at the
   // root; returns the number of warnings found (see Cell::check_subsets).
   return fTopCell->check_subsets(optimized);
}
787 | ||||
788 | void AdaptiveSampler::Cell::optimize() | |||
789 | { | |||
790 | // Assume opt_nhit and opt_subset already set upon entry, | |||
791 | // task is to assign opt_wI2, opt_wI4 for this cell, and | |||
792 | // all opt_* parameters for child nodes in the tree. | |||
793 | if (divAxis > -1) { | |||
794 | if (divAxis < nfixed) { | |||
795 | opt_wI2 = 0; | |||
796 | opt_wI4 = 0; | |||
797 | double r = (opt_nhit + 1e-99) / nhit; | |||
798 | for (int n=0; n < 3; ++n) { | |||
799 | subcell[n]->opt_nhit = subcell[n]->nhit * r; | |||
800 | subcell[n]->opt_subset = opt_subset; | |||
801 | subcell[n]->optimize(); | |||
802 | opt_wI2 += subcell[n]->opt_wI2; | |||
803 | opt_wI4 += subcell[n]->opt_wI4; | |||
804 | } | |||
805 | } | |||
806 | else { | |||
807 | opt_wI2 = 0; | |||
808 | opt_wI4 = 0; | |||
809 | for (int n=0; n < 3; ++n) { | |||
810 | double r = subcell[n]->sum_wI2s / sum_wI2s; | |||
811 | // double r0 = 5e-3; // minimum subset fraction | |||
812 | // r = (r + r0) / (1 + 3*r0); | |||
813 | // r = (nhit > 100)? r : 1; | |||
814 | subcell[n]->opt_nhit = opt_nhit * r; | |||
815 | if (n == 2) { | |||
816 | double opt_ntot = subcell[0]->opt_nhit + | |||
817 | subcell[1]->opt_nhit + | |||
818 | subcell[2]->opt_nhit; | |||
819 | assert (abs(opt_ntot - opt_nhit) < 2)((abs(opt_ntot - opt_nhit) < 2) ? static_cast<void> ( 0) : __assert_fail ("abs(opt_ntot - opt_nhit) < 2", "src/AdaptiveSampler.cc" , 819, __PRETTY_FUNCTION__)); | |||
820 | } | |||
821 | subcell[n]->opt_subset = opt_subset * r; | |||
822 | subcell[n]->optimize(); | |||
823 | opt_wI2 += subcell[n]->opt_wI2; | |||
824 | opt_wI4 += subcell[n]->opt_wI4; | |||
825 | } | |||
826 | } | |||
827 | } | |||
828 | else if (nhit > 0) { | |||
829 | double r = (opt_nhit + 1e-99) / nhit; | |||
830 | opt_wI2 = sum_wI2 / r; | |||
831 | opt_wI4 = sum_wI4 / pow(r,3); | |||
832 | } | |||
833 | } | |||
834 | ||||
835 | int AdaptiveSampler::Cell::serialize(std::ofstream &ofs, bool optimized) | |||
836 | { | |||
837 | ofs << "divAxis=" << divAxis << std::endl; | |||
838 | if (nhit != 0) | |||
839 | ofs << "nhit=" << nhit << std::endl; | |||
840 | if (sum_wI != 0) | |||
841 | ofs << "sum_wI=" << (std::isfinite(sum_wI)? sum_wI : 0) << std::endl; | |||
842 | if (sum_wI2 != 0) | |||
843 | ofs << "sum_wI2=" << (std::isfinite(sum_wI2)? sum_wI2 : 0) << std::endl; | |||
844 | if (sum_wI4 != 0) | |||
845 | ofs << "sum_wI4=" << (std::isfinite(sum_wI4)? sum_wI4 : 0) << std::endl; | |||
846 | for (int i=0; i < 3*ndim; ++i) { | |||
847 | if (sum_wI2d[i] != 0) | |||
848 | ofs << "sum_wI2d[" << i << "]=" | |||
849 | << (std::isfinite(sum_wI2d[i])? sum_wI2d[i] : 0) << std::endl; | |||
850 | } | |||
851 | for (int i=0; i < 3*ndim; ++i) { | |||
852 | if (sum_wI4d[i] != 0) | |||
853 | ofs << "sum_wI4d[" << i << "]=" | |||
854 | << (std::isfinite(sum_wI4d[i])? sum_wI4d[i] : 0) << std::endl; | |||
855 | } | |||
856 | if (optimized) | |||
857 | ofs << "subset=" << std::setprecision(20) << opt_subset << std::endl; | |||
858 | else | |||
859 | ofs << "subset=" << std::setprecision(20) << subset << std::endl; | |||
860 | ofs << "=" << std::endl; | |||
861 | int count = 1; | |||
862 | if (divAxis > -1) { | |||
863 | count += subcell[0]->serialize(ofs, optimized); | |||
864 | count += subcell[1]->serialize(ofs, optimized); | |||
865 | count += subcell[2]->serialize(ofs, optimized); | |||
866 | } | |||
867 | return count; | |||
868 | } | |||
869 | ||||
870 | int AdaptiveSampler::Cell::deserialize(std::ifstream &ifs, double subset_multiplier) | |||
871 | { | |||
872 | std::map<std::string,double> keyval; | |||
873 | while (true) { | |||
874 | std::string key; | |||
875 | std::getline(ifs, key, '='); | |||
876 | if (key.size() == 0) { | |||
877 | std::getline(ifs, key); | |||
878 | break; | |||
879 | } | |||
880 | ifs >> keyval[key]; | |||
881 | std::getline(ifs, key); | |||
882 | } | |||
883 | divAxis = keyval.at("divAxis"); | |||
884 | nhit += keyval["nhit"]; | |||
885 | sum_wI += keyval["sum_wI"]; | |||
886 | sum_wI2 += keyval["sum_wI2"]; | |||
887 | sum_wI4 += keyval["sum_wI4"]; | |||
888 | subset = keyval.at("subset") * subset_multiplier; | |||
889 | std::map<std::string,double>::iterator it; | |||
890 | for (it = keyval.begin(); it != keyval.end(); ++it) { | |||
891 | if (it->first.substr(0,8) == "sum_wI2u") { | |||
892 | int n3dim=1; | |||
893 | for (int n=0; n < ndim; ++n) | |||
894 | n3dim *= 3; | |||
895 | int j; | |||
896 | if (sscanf(it->first.c_str(), "sum_wI2u[%d]=", &j) == 1) { | |||
897 | assert (j < n3dim)((j < n3dim) ? static_cast<void> (0) : __assert_fail ("j < n3dim", "src/AdaptiveSampler.cc", 897, __PRETTY_FUNCTION__ )); | |||
898 | for (int n=0; n < ndim; ++n) { | |||
899 | double wI2 = it->second; | |||
900 | sum_wI2d[n*3+(j%3)] += wI2; | |||
901 | sum_wI4d[n*3+(j%3)] += wI2*wI2 * n3dim/nhit; | |||
902 | j /= 3; | |||
903 | } | |||
904 | } | |||
905 | } | |||
906 | else if (it->first.substr(0,8) == "sum_wI2d") { | |||
907 | int j; | |||
908 | if (sscanf(it->first.c_str(), "sum_wI2d[%d]=", &j) == 1) { | |||
909 | assert (j < 3*ndim)((j < 3*ndim) ? static_cast<void> (0) : __assert_fail ("j < 3*ndim", "src/AdaptiveSampler.cc", 909, __PRETTY_FUNCTION__ )); | |||
910 | sum_wI2d[j] += it->second; | |||
911 | } | |||
912 | } | |||
913 | else if (it->first.substr(0,8) == "sum_wI4d") { | |||
914 | int j; | |||
915 | if (sscanf(it->first.c_str(), "sum_wI4d[%d]=", &j) == 1) { | |||
916 | assert (j < 3*ndim)((j < 3*ndim) ? static_cast<void> (0) : __assert_fail ("j < 3*ndim", "src/AdaptiveSampler.cc", 916, __PRETTY_FUNCTION__ )); | |||
917 | sum_wI4d[j] += it->second; | |||
918 | } | |||
919 | } | |||
920 | } | |||
921 | int count = 1; | |||
922 | if (divAxis > -1) { | |||
923 | //subset_multiplier *= (divAxis < 1)? 3 : 1; | |||
924 | for (int i=0; i < 3; ++i) { | |||
925 | if (subcell[i] == 0) { | |||
926 | subcell[i] = new Cell(ndim, nfixed); | |||
927 | } | |||
928 | count += subcell[i]->deserialize(ifs, subset_multiplier); | |||
929 | } | |||
930 | } | |||
931 | return count; | |||
932 | } | |||
933 | ||||
934 | int AdaptiveSampler::Cell::check_subsets(bool optimized, std::string id) | |||
935 | { | |||
936 | if (divAxis < 0) { | |||
937 | return 0; | |||
938 | } | |||
939 | ||||
940 | // test trinomial statistics at each node | |||
941 | int warnings = 0; | |||
942 | long int Ntot = subcell[0]->nhit + subcell[1]->nhit + subcell[2]->nhit; | |||
943 | if (Ntot != nhit) { | |||
944 | std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - " | |||
945 | << "nhit consistency check #1 failed for cell " << id | |||
946 | << std::endl | |||
947 | << " split cell nhit=" << nhit | |||
948 | << ", sum of subcell nhit=" << Ntot | |||
949 | << std::endl; | |||
950 | return 999; | |||
951 | } | |||
952 | if (divAxis < nfixed) { | |||
953 | if (subcell[0]->subset != subset || | |||
954 | subcell[1]->subset != subset || | |||
955 | subcell[2]->subset != subset) | |||
956 | { | |||
957 | std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - " | |||
958 | << "subset consistency check #1 failed for cell " << id | |||
959 | << std::endl | |||
960 | << " split fixed cell subset=" << subset | |||
961 | << ", subcell subsets=" | |||
962 | << subcell[0]->subset << "," | |||
963 | << subcell[1]->subset << "," | |||
964 | << subcell[2]->subset | |||
965 | << std::endl; | |||
966 | return 999; | |||
967 | } | |||
968 | } | |||
969 | else { | |||
970 | double subsetsum = subcell[0]->subset + | |||
971 | subcell[1]->subset + | |||
972 | subcell[2]->subset; | |||
973 | if (fabs(subsetsum - subset) > subset * 1e-12) { | |||
974 | std::cerr << "Error in AdaptiveSampler::Cell::check_subsets - " | |||
975 | << "subset consistency check #2 failed for cell " << id | |||
976 | << std::endl | |||
977 | << " split cell subset=" << subset | |||
978 | << ", sum of subcell subsets=" << subsetsum | |||
979 | << std::endl; | |||
980 | return 999; | |||
981 | } | |||
982 | for (int i=0; i < 3; ++i) { | |||
983 | double p = subcell[i]->subset / (subset + 1e-99); | |||
984 | double mu = nhit * p; | |||
985 | double sigma = sqrt(nhit * p * (1-p)); | |||
986 | double p1 = subcell[(i+1)%3]->subset / (subset + 1e-99); | |||
987 | double mu1 = nhit * p1; | |||
988 | double sigma1 = sqrt(nhit * p1 * (1-p1)); | |||
989 | double p2 = subcell[(i+2)%3]->subset / (subset + 1e-99); | |||
990 | double mu2 = nhit * p2; | |||
991 | double sigma2 = sqrt(nhit * p2 * (1-p2)); | |||
992 | if (subcell[i]->nhit < 30) { | |||
993 | double prob = 1; | |||
994 | for (int k=1; k <= nhit; ++k) { | |||
995 | if (k <=subcell[i]->nhit) | |||
996 | prob *= p * double(nhit - k + 1) / k; | |||
997 | else | |||
998 | prob *= 1 - p; | |||
999 | } | |||
1000 | if (prob < 1e-6) { | |||
1001 | std::cerr << "Warning in AdaptiveSampler::Cell::check_subsets - " | |||
1002 | << "nhit - subset probability < 1e-6, cell " | |||
1003 | << id << ", subcell " << i | |||
1004 | << " has nhit=" << subcell[i]->nhit | |||
1005 | << ", expected " << mu << " +/- " << sigma | |||
1006 | << ", P-value " << prob | |||
1007 | << std::endl; | |||
1008 | std::cerr << " This cell is splits " << nhit | |||
1009 | << " events along axis " << divAxis | |||
1010 | << std::endl; | |||
1011 | std::cerr << " Other branch of this cell:" | |||
1012 | << " subcell " << (i+1)%3 | |||
1013 | << " with nhit=" << subcell[(i+1)%3]->nhit | |||
1014 | << ", expected " << mu1 << " +/- " << sigma1 | |||
1015 | << ", " << (subcell[(i+1)%3]->nhit - mu1) / sigma1 | |||
1016 | << " sigma." << std::endl; | |||
1017 | std::cerr << " Other branch of this cell:" | |||
1018 | << " subcell " << (i+2)%3 | |||
1019 | << " with nhit=" << subcell[(i+2)%3]->nhit | |||
1020 | << ", expected " << mu2 << " +/- " << sigma2 | |||
1021 | << ", " << (subcell[(i+2)%3]->nhit - mu2) / sigma2 | |||
1022 | << " sigma." << std::endl; | |||
1023 | warnings++; | |||
1024 | } | |||
1025 | } | |||
1026 | else if (fabs(subcell[i]->nhit - mu) > 5 * sigma) { | |||
1027 | std::cerr << "Warning in AdaptiveSampler::Cell::check_subsets - " | |||
1028 | << "nhit - subset mismatch > 5 sigma, cell " | |||
1029 | << id << ", subcell " << i | |||
1030 | << " has nhit=" << subcell[i]->nhit | |||
1031 | << ", expected " << mu << " +/- " << sigma | |||
1032 | << ", " << (subcell[i]->nhit - mu) / sigma | |||
1033 | << " sigma!" << std::endl; | |||
1034 | std::cerr << " This cell is splits " << nhit | |||
1035 | << " events along axis " << divAxis | |||
1036 | << std::endl; | |||
1037 | std::cerr << " Other branch of this cell:" | |||
1038 | << " subcell " << (i+1)%3 | |||
1039 | << " with nhit=" << subcell[(i+1)%3]->nhit | |||
1040 | << ", expected " << mu1 << " +/- " << sigma1 | |||
1041 | << ", " << (subcell[(i+1)%3]->nhit - mu1) / sigma1 | |||
1042 | << " sigma." << std::endl; | |||
1043 | std::cerr << " Other branch of this cell:" | |||
1044 | << " subcell " << (i+2)%3 | |||
1045 | << " with nhit=" << subcell[(i+2)%3]->nhit | |||
1046 | << ", expected " << mu2 << " +/- " << sigma2 | |||
1047 | << ", " << (subcell[(i+2)%3]->nhit - mu2) / sigma2 | |||
1048 | << " sigma." << std::endl; | |||
1049 | warnings++; | |||
1050 | } | |||
1051 | } | |||
1052 | } | |||
1053 | for (int i=0; i < 3; ++i) { | |||
1054 | warnings += subcell[i]->check_subsets(optimized, | |||
1055 | id + std::to_string(i)); | |||
1056 | } | |||
1057 | return warnings; | |||
1058 | } | |||
1059 | ||||
1060 | std::string AdaptiveSampler::Cell::address() | |||
1061 | { | |||
1062 | // assumes sum_stats has been called, so super is filled in | |||
1063 | std::string a; | |||
1064 | Cell *here = this; | |||
1065 | while (here != 0 && here->super != 0) { | |||
1066 | std::string sn = "?"; | |||
1067 | for (int n=0; n < 3; ++n) { | |||
1068 | if (here->super->subcell[n] == here) { | |||
1069 | sn = std::to_string(n); | |||
1070 | here = here->super; | |||
1071 | break; | |||
1072 | } | |||
1073 | } | |||
1074 | a.insert(0, sn); | |||
1075 | } | |||
1076 | a.insert(0, "o"); | |||
1077 | return a; | |||
1078 | } |