CDC run 1280 bad events

From GlueXWiki
Run 1280 Evt 14992 roc 26 slot 3 chan 56
[Figures: original, upsampled, both]


Below are the original sample values, each followed by the upsampled value for the same time and then the upsampled values at 4 later time-steps of 1.6 ns each (calculated in C++ with floating-point arithmetic). The first upsampled value should match the original value to within +/- 1. Clearly something is wrong here: first with the original data, which contains discontinuities (for example, the drop from 375 at sample 52 to 257 at sample 53), and then with the upsampled data.

sample: 45  original:   57  upsampled:    57    54    50    48    46
sample: 46  original:   44  upsampled:    45    44    45    46    49
sample: 47  original:   51  upsampled:    51    55    59    63    67
sample: 48  original:   71  upsampled:    72    76    81    86    92
sample: 49  original:   94  upsampled:    98   103   107   110   110
sample: 50  original:  117  upsampled:   108   107   113   130   162
sample: 51  original:  174  upsampled:   207   260   311   346   353
sample: 52  original:  375  upsampled:   331   287   241   220   245
sample: 53  original:  257  upsampled:   334   485   680   889  1075
sample: 54  original: 1287  upsampled:  1212  1289  1316  1316  1323
sample: 55  original: 1294  upsampled:  1365  1458  1602  1780  1962
sample: 56  original: 2196  upsampled:  2126  2254  2342  2399  2436
sample: 57  original: 2423  upsampled:  2469  2502  2534  2552  2542
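
The +/- 1 consistency check described above can be sketched as follows. This is only an illustration: the `Row` struct and the `mismatched_samples` function are hypothetical names, not part of the analysis code used for this page.

```cpp
#include <cstdlib>
#include <vector>

// One row of the table above (hypothetical layout): the raw sample and the
// upsampled values starting at the same time.
struct Row {
    int sample;                  // sample index
    int original;                // raw ADC value
    std::vector<int> upsampled;  // upsampled values, the first at the sample time
};

// Returns the indices of samples whose first upsampled value differs from
// the raw value by more than `tolerance` ADC counts.
std::vector<int> mismatched_samples(const std::vector<Row>& rows, int tolerance = 1) {
    std::vector<int> bad;
    for (const Row& r : rows) {
        if (r.upsampled.empty()) continue;  // nothing to compare
        if (std::abs(r.upsampled.front() - r.original) > tolerance) {
            bad.push_back(r.sample);
        }
    }
    return bad;
}
```

Applied to the rows above, samples 45 to 48 pass the check, while samples 49 onwards (for example 94 vs 98 and 117 vs 108) fail it.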

The output below comes from a C++ version of the integer arithmetic that is used in the VHDL code.

C++ leading edge algo output: Searching for ADC value >= 300, ADC values surrounding the threshold crossing are

  ADC[41] 108
  ADC[42] 98
  ADC[43] 89
  ADC[44] 77
  ADC[45] 57
  ADC[46] 44
  ADC[47] 51
  ADC[48] 71
  ADC[49] 94
  ADC[50] 117
  ADC[51] 174
  ADC[52] 375

Subset of ADC values sent to the upsampling routine:
adc_subset[0] 89
adc_subset[1] 77
adc_subset[2] 57
... 
adc_subset[9] 375
adc_subset[10] 257
adc_subset[11] 1287
... up to 
adc_subset[14] 2423

thresholds: hi 231 lo 111
threshold 231 met or exceeded by sample 9 value 375

adc[8] 174
adc[7] 117
adc[6] 94

threshold 111 met or preceded by sample 6 value 94
itime1 60 x 0.1 samples
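
The coarse search seen in this log can be sketched as below. This is inferred from the log output only: the function name `coarse_time_tenths`, the choice to start the search at the first element of the subset, and the use of -1 as a failure value are all assumptions, not the actual firmware or C++ code.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the coarse leading-edge search suggested by the log:
// find the first sample >= hi, then step back to a sample <= lo.
// Returns the time in units of 0.1 samples, or -1 if either search fails.
int coarse_time_tenths(const std::vector<int>& adc, int hi, int lo) {
    std::size_t i = 0;
    while (i < adc.size() && adc[i] < hi) ++i;  // first sample at or above hi
    if (i == adc.size()) return -1;             // never went over the high threshold
    while (i > 0 && adc[i] > lo) --i;           // step back towards the pedestal
    if (adc[i] > lo) return -1;                 // never dropped to the low threshold
    return static_cast<int>(i) * 10;            // whole samples -> 0.1-sample units
}
```

With the subset above and thresholds hi 231, lo 111, the search stops at sample 9 and then steps back to sample 6, which corresponds to the logged itime1 of 60 x 0.1 samples.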

The low timing threshold crossing is now known to lie between samples 6 and 7, so 8 upsampled points are calculated from sample 5.8 to 7.2 in steps of 0.2 samples. These are searched from the last point back to the first, looking for a value that matches or is lower than the low timing threshold (111).

upsampled 0 91
upsampled 1 97
upsampled 2 102
upsampled 3 107
upsampled 4 110
upsampled 5 109
upsampled 6 107
upsampled 7 107

threshold 111 met or preceded by upsampled point 7 value 107
itime2 14 x 0.1 samples
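
The backward search over the upsampled points can be sketched as follows. The function name `find_low_crossing` and the -1 return value for "no point found" are assumptions made for illustration.

```cpp
#include <vector>

// Search from the last upsampled point back to the first and return the
// index of the first point found whose value is <= lo_threshold.
// Returns -1 if no upsampled point is at or below the threshold.
int find_low_crossing(const std::vector<int>& upsampled, int lo_threshold) {
    for (int i = static_cast<int>(upsampled.size()) - 1; i >= 0; --i) {
        if (upsampled[i] <= lo_threshold) return i;
    }
    return -1;
}
```

For the values above with threshold 111, the search stops immediately at point 7 (value 107), which is the symptom of the bad event: the very first point examined is already below the threshold.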

Upsampled values 1 (97) and 6 (107) should match the raw values for samples 6 (94) and 7 (117), respectively. They do not. The next stage in the calculation is to interpolate between the two points surrounding the threshold crossing, but in this case the first point examined (upsampled 7, the last in the buffer) was already below the threshold, and it all goes horribly wrong.

denom -107 limit 8


VHDL simulation output:

at 308 ns(1): Note:                                            upsampled: iubuf(0) 91 (/findtime_tb/uut/).
at 460 ns(1): Note:                                            upsampled: iubuf(1) 97 (/findtime_tb/uut/).
at 612 ns(1): Note:                                            upsampled: iubuf(2) 102 (/findtime_tb/uut/).
at 748 ns(1): Note:                                            upsampled: iubuf(3) 107 (/findtime_tb/uut/).
at 884 ns(1): Note:                                            upsampled: iubuf(4) 110 (/findtime_tb/uut/).
at 1036 ns(1): Note:                                            upsampled: iubuf(5) 109 (/findtime_tb/uut/).
at 1188 ns(1): Note:                                            upsampled: iubuf(6) 107 (/findtime_tb/uut/).
at 1340 ns(1): Note:                                            upsampled: iubuf(7) 107 (/findtime_tb/uut/).

at 1348 ns(1): Note: i 7 adc val 107 lo thres 111 (/findtime_tb/uut/).
at 1348 ns(1): Note: adc_sample_lo2 7 (/findtime_tb/uut/).
at 1348 ns(1): Note: itime2 14 x 0.1samples (/findtime_tb/uut/).
at 1356 ns(1): Note: denom -107 (/findtime_tb/uut/).
at 1356 ns(1): Note: limit 8 (/findtime_tb/uut/).
at 1364 ns(1): Note:  sum -107 ifrac 1 (/findtime_tb/uut/).
at 1372 ns(1): Note:  sum -214 ifrac 2 (/findtime_tb/uut/).
at 1380 ns(1): Note:  sum -321 ifrac 3 (/findtime_tb/uut/).
at 1388 ns(1): Note:  sum -428 ifrac 4 (/findtime_tb/uut/).

The good news is that the VHDL and C++ codes give the same results before VHDL descends into a spiral of doom.

The fix for the bug would be to test whether upsampled[7] is <= the low timing threshold and, if so, bail out with a rough time and an error code.
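
A minimal sketch of such a guard is shown below. The struct `TimeResult`, the function name, and the idea of passing in the rough time are all hypothetical; the error code 3 is taken from the q_code list further down this page ("Last upsampled point is <= low timing threshold").

```cpp
#include <vector>

// Hypothetical result type: time in units of 0.1 samples plus a quality code.
struct TimeResult {
    int time_tenths;
    int q_code;  // 0 = good, non-zero = error
};

// q_code 3: last upsampled point is <= low timing threshold (from the q_code list).
const int kQLastPointTooLow = 3;

// Returns true and fills `out` when the last upsampled point is already at or
// below the low threshold, so the caller should skip the interpolation stage.
bool bail_out_if_last_point_low(const std::vector<int>& upsampled,
                                int lo_threshold,
                                int rough_time_tenths,
                                TimeResult& out) {
    if (!upsampled.empty() && upsampled.back() <= lo_threshold) {
        out.time_tenths = rough_time_tenths;  // fall back to the coarse time
        out.q_code = kQLastPointTooLow;
        return true;
    }
    return false;  // safe to carry on with the interpolation
}
```

For the bad event above (last upsampled value 107, threshold 111) the guard fires, so the negative denominator and the runaway sum in the simulation output would never be reached.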


Run 1280 Evt 14992 roc 26 slot 3 chan 56
[Figures: original, both]
sample: 35  original:  115  upsampled:   116   116   116   118   121
sample: 36  original:  122  upsampled:   126   131   136   141   145
sample: 37  original:  157  upsampled:   149   154   165   184   214
sample: 38  original:  239  upsampled:   255   302   349   388   410
sample: 39  original:  420  upsampled:   410   386   343   290   237
sample: 40  original:  189  upsampled:   198   183   201   255   345
sample: 41  original:  456  upsampled:   464   602   746   882   996
sample: 42  original: 1109  upsampled:  1080  1131  1154  1157  1151
threshold 275 met or exceeded by sample 9 value 420
adc[8] 239
adc[7] 157
adc[6] 122
threshold 155 met or preceded by sample 6 value 122
itime1 60 x 0.1 samples
upsampled 0 121 
upsampled 1 125 
upsampled 2 130 
upsampled 3 136 
upsampled 4 141 
upsampled 5 145 
upsampled 6 148 
upsampled 7 154 
threshold 155 met or preceded by upsampled point 7 value 154

Same problem as previous, same fix.


Here is a normal event. (8 out of 2.3 million CDC hits in the first part of this run had the problem shown above.)

Run 1280 Evt 4 roc 27 slot 8 chan 49
original
both
sample: 35  original:   80  upsampled:    80    79    78    76    75
sample: 36  original:   73  upsampled:    73    72    70    68    67
sample: 37  original:   65  upsampled:    65    64    63    62    62
sample: 38  original:   62  upsampled:    62    64    66    68    72
sample: 39  original:   75  upsampled:    75    79    83    86    89
sample: 40  original:   91  upsampled:    91    92    93    93    92
sample: 41  original:   92  upsampled:    92    92    92    92    92
sample: 42  original:   93  upsampled:    93    94    94    95    95
sample: 43  original:   96  upsampled:    96    96    96    96    95
sample: 44  original:   95  upsampled:    95    95    94    94    94
sample: 45  original:   93  upsampled:    93    92    91    89    87
sample: 46  original:   85  upsampled:    85    83    81    78    77
sample: 47  original:   75  upsampled:    75    75    75    75    77
sample: 48  original:   79  upsampled:    79    82    85    89    94
sample: 49  original:   98  upsampled:    98   102   106   109   111
sample: 50  original:  113  upsampled:   113   113   112   110   107
sample: 51  original:  103  upsampled:   103    98    92    86    81
sample: 52  original:   75  upsampled:    75    71    67    65    64
sample: 53  original:   64  upsampled:    64    66    69    74    79
sample: 54  original:   84  upsampled:    85    91    96   102   106
sample: 55  original:  108  upsampled:   110   113   115   116   117
sample: 56  original:  115  upsampled:   118   121   125   134   146
sample: 57  original:  159  upsampled:   164   187   218   257   302
sample: 58  original:  351  upsampled:   354   413   477   547   619
sample: 59  original:  693  upsampled:   693   767   840   911   978
sample: 60  original: 1044  upsampled:  1040  1095  1144  1185  1218
sample: 61  original: 1250  upsampled:  1243  1259  1268  1269  1262
sample: 62  original: 1255  upsampled:  1248  1229  1204  1175  1142
sample: 63  original: 1110  upsampled:  1106  1068  1029   988   948
sample: 64  original:  908  upsampled:   907   867   827   788   750
sample: 65  original:  712  upsampled:   713   678   645   613   583
sample: 66  original:  554  upsampled:   556   531   509   489   471
sample: 67  original:  454  upsampled:   456   443   432   423   415
sample: 68  original:  408  upsampled:   409   405   401   397   394
sample: 69  original:  390  upsampled:   390   386   382   376   370
sample: 70  original:  363  upsampled:   363   355   347   338   329
sample: 71  original:  320  upsampled:   320   310   301   291   281

The upsampled values are spot on until the ADC value rises sharply. From then on there are small discrepancies (0 to 5), with larger ones (6 to 8) when the value is above 1000. We are most interested in the region just above pedestal (i.e. 100-150).


Entry 83 eventnum 4 channelnum 3 rocid 25 slot 8 channel 49 nsamples 180

Searching for ADC value >= 288
  ADC[36] 73
...
  ADC[57] 159
  ADC[58] 351

adc_subset[0] 98
adc_subset[1] 113
adc_subset[2] 103
adc_subset[3] 75
adc_subset[4] 64
adc_subset[5] 84
adc_subset[6] 108
adc_subset[7] 115
adc_subset[8] 159
adc_subset[9] 351
adc_subset[10] 693
adc_subset[11] 1044
adc_subset[12] 1250
adc_subset[13] 1255
adc_subset[14] 1110
thresholds: hi 244 lo 124
threshold 244 met or exceeded by sample 9 value 351
adc[8] 159
adc[7] 115
threshold 124 met or preceded by sample 7 value 115
itime1 70 x 0.1 samples
upsampled 0 116 
upsampled 1 118 
upsampled 2 120 
upsampled 3 125 
upsampled 4 133 
upsampled 5 145 
upsampled 6 163 
upsampled 7 187 
threshold 124 met or preceded by upsampled point 2 value 120

Upsampling error check

After excluding the known CDC_bad_channels, I compared the upsampled data with the original samples for the first 10000 events of run 1602. This is a high-intensity run, so some of the pedestals are lower than they should be.
ups_err1 is the upsampled value iubuf[1] minus the original sample adc[adc_sample_lo].
ups_err2 is the upsampled value iubuf[6] minus the original sample adc[adc_sample_lo+1].
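
The two error quantities can be computed as in the sketch below. The struct `UpsErr` and the function name are hypothetical; `iubuf`, `adc` and `adc_sample_lo` follow the names used above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the two upsampling errors.
struct UpsErr {
    int err1;  // iubuf[1] - adc[adc_sample_lo]
    int err2;  // iubuf[6] - adc[adc_sample_lo + 1]
};

// Compares the two upsampled points that coincide with raw samples against
// those raw samples. at() throws std::out_of_range on a short buffer.
UpsErr upsampling_errors(const std::vector<int>& iubuf,
                         const std::vector<int>& adc,
                         std::size_t adc_sample_lo) {
    UpsErr e;
    e.err1 = iubuf.at(1) - adc.at(adc_sample_lo);
    e.err2 = iubuf.at(6) - adc.at(adc_sample_lo + 1);
    return e;
}
```

For the first bad event of run 1280 shown earlier (adc_sample_lo = 6), this gives 97 - 94 = 3 and 107 - 117 = -10.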

q_code list:
   0: Good
   1: ADC data did not go over threshold adc_thres_hi 
   2: Leading edge time is outside the upsampled region (cross adc_thres_lo too late in the buffer subset ) 
   3: Last upsampled point is <= low timing threshold
   4: Upsampled points did not go below low timing threshold
Run 1602
ups err 1 (adc_sample_lo)
ups err 2 (adc_sample_lo + 1)
err 1 vs err 2
ups err 1 vs iubuf[1]
ups err 1 vs iubuf[6]
ups err 1 vs iubuf[6] - iubuf[1]

Extremely steep gradients are likely to cause a problem so check what happens when the adc saturates:

Run 1602
ups err 1 vs maxamp + pedestal
final sample in adc subset for upsampler, when maxamp=4095
ups err 1 (adc_sample_lo) vs pedestal
ups err 1 vs last value in adc subset
ups err 1 vs min value in adc subset
ups err 1 vs min value in adc subset (zoomed)


Improved code
An upsampled value can be negative following a dip between two small sampled values. The improved code catches this and bails out of the algorithm, setting q_code=8 when any upsampled value is < 0.
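
A minimal sketch of this check, assuming the upsampled points are available as a vector of signed integers (the function and constant names are hypothetical):

```cpp
#include <algorithm>
#include <vector>

// q_code 8: at least one upsampled value is negative.
const int kQNegativeUpsampled = 8;

// Small helper so the check below reads clearly.
static bool is_negative(int v) { return v < 0; }

// Returns kQNegativeUpsampled if any upsampled value is below zero, else 0.
int negative_upsample_code(const std::vector<int>& upsampled) {
    bool any_negative = std::any_of(upsampled.begin(), upsampled.end(), is_negative);
    return any_negative ? kQNegativeUpsampled : 0;
}
```

A non-zero return would be the signal to bail out before the threshold search and interpolation stages.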

upsampled values when one is <0 vs min value in adc subset
upsampled values vs min value in subset, q_code=0


Total entries in file: 1366697
Entries with q_code > 0 are removed from processing in this order:

q_code  reason                                                                      entries  percent of total entries
5       ADC value = 0                                                                 53828  4%
6       ADC value = 4095                                                              99891  7%
7       Ped > 511                                                                     47052  3%
1       ADC data did not go over threshold adc_thres_hi                               10179  1%
2       Leading edge time is too late in the buffer (outside the upsampled region)      820  0.06%
8       Upsampled value is negative                                                     396  0.03%
3       Last upsampled point is too low (below or equal to low timing threshold)          5  0.0004%
4       Upsampled points are too high (did not go below low timing threshold)          7903  0.6%
0       Good                                                                        1146623  84%
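
The percentages in the table can be cross-checked with a few lines of C++. This is just a bookkeeping sketch; the `QCount` struct and the function names are hypothetical.

```cpp
#include <vector>

// One row of the table: a quality code and the number of entries with it.
struct QCount {
    int q_code;
    long entries;
};

// Sum of all entries across the q_code categories.
long total_entries(const std::vector<QCount>& counts) {
    long total = 0;
    for (const QCount& c : counts) total += c.entries;
    return total;
}

// Fraction of the total, in percent (0 if the total is not positive).
double percent_of_total(long entries, long total) {
    if (total <= 0) return 0.0;
    return 100.0 * static_cast<double>(entries) / static_cast<double>(total);
}
```

The nine category counts listed above sum to the quoted total of 1366697, so each entry is assigned exactly one q_code.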

Looking at the upsampling errors: they occur when the gradient is very large.

ADC sample subset for q code 0
ADC sample subset for q code 4
ADC sample subset for q code 4, normalised to min value
Error in upsampled value vs original sample value adc_sample_lo, q code 4
Error in upsampled value vs original sample value adc_sample_lo+1, q code 4
Error in upsampled values, q code 0
Error in upsampled values, q code 4
Difference in error in upsampled values, q code 0 (blue) and 4 (pink)