MantisBT - Hall D Online
View Issue Details
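ID: 0000440
Project: Hall D Online
Category: DAQ
View Status: public
Date Submitted: 2014-12-16 16:49
Last Update: 2014-12-30 15:41
Reporter: davidl
Priority: normal
Severity: major
Reproducibility: sometimes
Status: new
Resolution: open
Work by outside group for Hall D: No
Percentage complete: 0
Actual man-weeks: 0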
0000440: Files on tape not matching ones on RAID
After discussing with David how to verify that data is being
written to tape correctly, I decided to compare MD5 checksums.
According to David, this checksum is already computed and checked
when files are written to tape, so it seemed like a duplicated
effort, but I went ahead anyway.

The numbers I am comparing are
1. the line that starts with "md5=" within each file for
/mss/halld/RunPeriod-2014-10/rawdata/Run002104/*.evio
2. the output of md5sum run on each corresponding evio file on the gluonraids

Running md5sum is rather slow, since the value has to be computed
from every single byte in the file. This can take 1 to 3 minutes
per file, so to save time I restricted myself to files ending
in _000.evio.
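
A minimal sketch of the comparison described above, assuming each /mss stub file is plain text containing a line of the form md5=<hex digest> (as in item 1). The helper names stub_md5, file_md5 and checksums_agree are hypothetical. The data file is hashed in fixed-size chunks so that a multi-gigabyte evio file never has to fit in memory; every byte is still read, which is why the check is slow.

```python
import hashlib


def stub_md5(stub_path):
    """Return the digest from the first line starting with 'md5=' in a stub file."""
    with open(stub_path, "r", errors="replace") as stub:
        for line in stub:
            if line.startswith("md5="):
                return line.strip()[len("md5="):].lower()
    return None  # no md5 line found (assumed stub layout did not match)


def file_md5(data_path, chunk_size=1 << 20):
    """Compute the MD5 of a file by reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(data_path, "rb") as data:
        for chunk in iter(lambda: data.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_agree(stub_path, data_path):
    """True when the md5 recorded in the stub equals the md5 of the data file."""
    return stub_md5(stub_path) == file_md5(data_path)
```

The stub path and the gluonraid path for a given file would be passed as the two arguments of checksums_agree; how the two directory trees map onto each other is not specified here.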

The result is that I find 5 pairs where the md5sums do not agree.
The files and md5sums are below. The "<" files are from /mss,
the ">" files are from running on the gluonraid files.

< hd_rawdata_000990_000.evio c68868a8d28e04f9a9c3e1da69b16f08
> hd_rawdata_000990_000.evio 289c814000df3c24c9b21dd8b9db224b

< hd_rawdata_001291_000.evio 6f6bf352b2549a0eb5fc2b95b56fafe3
> hd_rawdata_001291_000.evio 46d356ff5033b0565bf0a2d361f72745

< hd_rawdata_001398_000.evio 4d940f220dba37209f3a2749fa8c6e40
> hd_rawdata_001398_000.evio d082eb4b27035e9034f0671011ede4f4

< hd_rawdata_001642_000.evio 5b3283c1a361e4a41ff16c690faa8849
> hd_rawdata_001642_000.evio 49035579053149f1d6879a98a7c0c21c

< hd_rawdata_002104_000.evio 69c7515b40359444b6c0afd82ba3908d
> hd_rawdata_002104_000.evio a75a12de0f8fa4b17cc8729dc1fd81b5

I ran over the 000 files for runs 940 through 2245, but note that
for some runs file 000 is missing for one reason or another
(DAQ crash etc.). So out of roughly 1300 files, I find 5 files
(~0.4%) with mismatched md5 checksums.

I checked the size of each file, and the two copies differ in size
for every pair. The list below shows the file name, the size shown
in /mss, and the size on the gluonraid. The gluonraid file is
always bigger.

000990_000.evio 16132156448 16132463696
001291_000.evio 239303640 397829944
001398_000.evio 5645101228 5645842860
001642_000.evio 5707567564 5709949640
002104_000.evio 715201656 1014051452
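
As a small worked example, the amount by which each /mss copy falls short of the gluonraid copy can be computed directly from the sizes listed above. The function name missing_bytes is hypothetical; the numbers are copied from the list.

```python
# (file, size shown in /mss, size on gluonraid), copied from the list above
sizes = [
    ("000990_000.evio", 16132156448, 16132463696),
    ("001291_000.evio", 239303640, 397829944),
    ("001398_000.evio", 5645101228, 5645842860),
    ("001642_000.evio", 5707567564, 5709949640),
    ("002104_000.evio", 715201656, 1014051452),
]


def missing_bytes(rows):
    """Map each file name to (gluonraid size - /mss size)."""
    return {name: raid - mss for name, mss, raid in rows}


for name, shortfall in missing_bytes(sizes).items():
    print(f"{name} {shortfall}")
```

The shortfall ranges from a few hundred kilobytes to a few hundred megabytes, so it does not look like a fixed-size effect.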

The runs that show a difference don't seem to be clustered
around any particular run number. Since this suggests that
something may be going wrong in the copying process, we may
need to think about how to track down and eliminate the bug.

    Kei
I checked the file sizes of all evio files on the gluonraid disks
and in the /mss system.

Below are the 17 files whose sizes do not agree ("<" is /mss,
">" is gluonraid):

< hd_raw_000691_000.evio 797324740
> hd_raw_000691_000.evio 1734616380

< hd_raw_000692_000.evio 797095824
> hd_raw_000692_000.evio 5411961244

< hd_raw_000696_000.evio 818140800
> hd_raw_000696_000.evio 1964263484

< hd_raw_000697_000.evio 938227060
> hd_raw_000697_000.evio 949686472

< hd_raw_000700_000.evio 818631544
> hd_raw_000700_000.evio 4117328924

< hd_raw_000702_000.evio 820366980
> hd_raw_000702_000.evio 3363271288

< hd_raw_000707_000.evio 1292643512
> hd_raw_000707_000.evio 2594874824

< hd_raw_000710_000.evio 1290320092
> hd_raw_000710_000.evio 2498216628

< hd_raw_000718_000.evio 1291617672
> hd_raw_000718_000.evio 10043266080

< hd_raw_000731_000.evio 1101749368
> hd_raw_000731_000.evio 1312180216

< hd_raw_000803_000.evio 1232781032
> hd_raw_000803_000.evio 1565069964

< hd_rawdata_000990_000.evio 16132156448
> hd_rawdata_000990_000.evio 16132463696

< hd_rawdata_001291_000.evio 239303640
> hd_rawdata_001291_000.evio 397829944

< hd_rawdata_001398_000.evio 5645101228
> hd_rawdata_001398_000.evio 5645842860

< hd_rawdata_001642_000.evio 5707567564
> hd_rawdata_001642_000.evio 5709949640

< hd_rawdata_001656_036.evio 5229012312
> hd_rawdata_001656_036.evio 5229903160

< hd_rawdata_002104_000.evio 715201656
> hd_rawdata_002104_000.evio 1014051452
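
A rough sketch of how a size comparison like the one above could be produced, assuming the sizes have already been collected into two name-to-size mappings (one for /mss, one for the gluonraid disks). How the /mss size is read is not specified in this report, so that step is left out; size_mismatches and format_report are hypothetical names.

```python
def size_mismatches(mss_sizes, raid_sizes):
    """Return (name, mss_size, raid_size) for files in both mappings whose sizes differ."""
    common = sorted(set(mss_sizes) & set(raid_sizes))
    return [
        (name, mss_sizes[name], raid_sizes[name])
        for name in common
        if mss_sizes[name] != raid_sizes[name]
    ]


def format_report(mismatches):
    """Format mismatches as '<' (/mss) and '>' (gluonraid) line pairs."""
    lines = []
    for name, mss, raid in mismatches:
        lines.append(f"< {name} {mss}")
        lines.append(f"> {name} {raid}")
        lines.append("")  # blank line between pairs
    return "\n".join(lines)


if __name__ == "__main__":
    # One real pair from the list above plus a made-up matching file.
    mss = {"hd_raw_000697_000.evio": 938227060, "example_matching.evio": 100}
    raid = {"hd_raw_000697_000.evio": 949686472, "example_matching.evio": 100}
    print(format_report(size_mismatches(mss, raid)))
```

Files present in only one of the two mappings are ignored here; they would need a separate check.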
No tags attached.
Issue History
2014-12-16 16:49  davidl   New Issue
2014-12-16 16:49  davidl   Work by outside group for Hall D => No
2014-12-16 16:49  davidl   Percentage complete => 0
2014-12-16 16:49  davidl   Actual man-weeks => 0
2014-12-17 05:17  kmoriya  Description Updated
2014-12-17 05:17  kmoriya  Additional Information Updated
2014-12-30 15:41  sdobbs   Note Added: 0000624
2014-12-30 15:42  sdobbs   Note Edited: 0000624

Notes
(0000624) sdobbs, 2014-12-30 15:41 (edited on: 2014-12-30 15:42)
I compared the md5 checksums of all the EVIO files on the RAID disk against those reported by the tape system in the stub files in /mss. (About 20 files were not found on the RAID disk by my script.) Of the analyzed files, the set whose checksums do not match overlaps almost perfectly with Kei's list. There is one extra file (hd_rawdata_002361_000.evio), which presumably was created after his analysis.

This is consistent with the hypothesis that the files are being copied to tape without error, and the mismatches we see are only due to files on tape being truncated.

                filename RAID checksum tape checksum

        hd_raw_000691_000.evio 39f677ec9417a8b32e07773f780e2b4e 9f2a3f3289e77dc1240c1794d6b9b262
        hd_raw_000692_000.evio 012b6a53f7d0ef1854884a92bd1e4048 17e1afb6359c7582ed7c680224d42522
        hd_raw_000696_000.evio 4ad2e8b77ef8eb5c9746bddc0e7f76af 03e005874f048917d5f71235ccfd6ada
        hd_raw_000697_000.evio 7bc86001ca2b561c29640f9024626e6e d4b548fb4db62096d0114f34594f3641
        hd_raw_000700_000.evio e58acfc4cea2e4ab8d8a7a43c77af08a 798db9c2a0e359384062123f41166820
        hd_raw_000702_000.evio 3c71b0cd9642165368a11ab24cfd81f4 90e84753f2558ee1c871c5c849ce42de
        hd_raw_000707_000.evio 885c811f69c957b9fd31bf8d9619f928 e555119d6731a1bef043fc9197186d72
        hd_raw_000710_000.evio 23be1029c2194bd4f4aec9f4b851ba98 26666c83f2ef6f4c2926571475d003c3
        hd_raw_000718_000.evio b67fb9900b75de21bb6d7d7395f69193 084007a8ea31915c7f421eaac97a351c
        hd_raw_000731_000.evio b4ad2166a83f2ce25e383614f09d0c68 024772c69d13a21d708f766eb4d032f8
        hd_raw_000803_000.evio 44bbb75ed7f0240e2e8b8421eff3a85c f4da62d69ddfe8ec0f1fd1852150fdc5
    hd_rawdata_000990_000.evio 289c814000df3c24c9b21dd8b9db224b c68868a8d28e04f9a9c3e1da69b16f08
    hd_rawdata_001291_000.evio 46d356ff5033b0565bf0a2d361f72745 6f6bf352b2549a0eb5fc2b95b56fafe3
    hd_rawdata_001398_000.evio d082eb4b27035e9034f0671011ede4f4 4d940f220dba37209f3a2749fa8c6e40
    hd_rawdata_001642_000.evio 49035579053149f1d6879a98a7c0c21c 5b3283c1a361e4a41ff16c690faa8849
    hd_rawdata_001656_036.evio c17fc4b1c1a81d0b66bf69a54295a5fd 0450214ed7d71c09be5dfeec6e11a595
    hd_rawdata_002104_000.evio a75a12de0f8fa4b17cc8729dc1fd81b5 69c7515b40359444b6c0afd82ba3908d
    hd_rawdata_002361_000.evio 744558a07d81abcbe81523befd74e8bb 14189d779a6c62c4cdb16b0aeb9c1b96
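
One way the truncation hypothesis in the note above could be tested directly: if the tape copy is simply the first N bytes of the RAID file, then the md5 of the first N bytes of the RAID file (N being the size shown in /mss) should equal the tape checksum. This is only a sketch under that assumption; md5_of_prefix and looks_truncated are hypothetical names. A mismatch would not rule out truncation if the RAID file was modified after the copy started.

```python
import hashlib
import os


def md5_of_prefix(path, n_bytes, chunk_size=1 << 20):
    """MD5 of the first n_bytes of a file, read in chunks."""
    digest = hashlib.md5()
    remaining = n_bytes
    with open(path, "rb") as data:
        while remaining > 0:
            chunk = data.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than n_bytes
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def looks_truncated(raid_path, tape_size, tape_md5):
    """True if the RAID file is longer than the tape copy and its first
    tape_size bytes hash to the tape checksum."""
    if os.path.getsize(raid_path) <= tape_size:
        return False
    return md5_of_prefix(raid_path, tape_size) == tape_md5.lower()
```

For example, for hd_rawdata_001291_000.evio the call would use tape_size=239303640 and tape_md5="6f6bf352b2549a0eb5fc2b95b56fafe3", with raid_path pointing at the copy on the gluonraid.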