Investigations and Repairs

From GlueXWiki
Revision as of 19:24, 11 April 2016 by Jzarling (Talk | contribs)


This page is intended to be a reference for past and ongoing issues regarding base failures, investigations, and repairs. Dates are provided on some bullet points for extra clarity. An inventory of bases can be found at https://halldweb.jlab.org/JInventory/htdocs/list.php#. Contact Jon Zarling or Adesh Subedi for instructions on navigating JInventory.


HV Issues

  • See https://logbooks.jlab.org/book/hdfcal for a full list of base replacements.
  • Bases continue to experience HV failure at a rate of ~1 new base per day (4/11/16).
  • High temperature (roughly 35-40 C) is correlated with base HV failure. Which is the cause and which is the effect is still not entirely clear.
  • High current, with HV sagging by ~30 V, is another common observation. These bases can continue to operate for months, though it is believed (at least by Adesh and Jon) that they will fail at some point. Unfortunately, to the best of Jon's awareness, no monitoring data exists that captures the moment of HV failure.
    • ONGOING: A setup in CEBAF can externally heat as many as 12 bases to ~35 C at present. Currently it can only run during daytime hours. Over three eight-hour periods, no HV failures were seen among 12 bases recently removed from the FCAL. Jon intends to investigate for a few days at 40 C before drawing conclusions.

Communications Issues

  • From February to April 8, 2016, nine bases on the FCAL had severe communications issues with both EPICS and the IU software. Adesh attempted to address them both individually and together with all other bases on the strand. They gave no response for a matter of days, and power cycling via software controls did not help. However, once removed from the FCAL, these bases appear to respond just fine (no long-term monitoring has been performed yet).
    • TO DO: Check whether these bases can still receive messages before removing them from the FCAL. This is most easily done by seeing if the base in question responds to an LED ON command.

Incorrect SOCK/tran pin Issues

  • It was discovered that plugging the bottom row of the tran board into the top row of the sock board (leaving the top row of tran pins unconnected) led to significant issues (4/11/16).
    • This caused the back plate to become charged to beyond (-)60 V, presenting a nasty shock hazard.
    • Bases that build up charge can also discharge to a neighboring base's back plate. This in turn can cause some nearby bases to reset (presumably due to RF noise).
    • These resetting bases caused base monitoring to become very difficult, particularly from late February to April 11 at JLab.

Bootloader Issues

  • There were a number of issues when attempting to reprogram bases via the bootloader. No issues have been observed while reprogramming via SWIM cables. The bootloader scheme of reprogramming bases was devised by Dan Bennett, with some advice from one of the seller companies.
  • Sketch of bootloader checking procedure: the normal operating firmware is restricted to a certain address range. Other firmware exists for bootloader operation, but (in theory) it can only be altered when reprogramming via SWIM cables.
    • The firmware is sent in CAN messages. Each message contains the data, the address to reprogram, and a checksum of the message. The base sums the message locally and sends a response byte to indicate whether the local and server checksums match.
  • In practice, the checksum scheme is insufficient: a large number of issues occur. In some cases, one base can corrupt ALL other bases on a strand. Studies done last summer determined that any base will cause a failure if the upload procedure is performed enough times, and that this can lead to corruption of other bases. Somehow addresses outside the correct range are altered, and somehow bases mimic the server ID. Neither is well understood; one would think both should be astronomically unlikely if the errors were truly random.
  • Jon estimates two weeks of labor to upload a new firmware to the FCAL.
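
The checksum check sketched above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the actual bootloader code: the one-byte additive checksum, the 16-bit address, the address window (APP_START, APP_END), the ACK/NACK values, and all function names are hypothetical. It shows one generic weakness of a simple sum: some corruptions leave the sum unchanged, so a message whose address has moved outside the allowed range can still pass the checksum. Whether this is what actually happens on the bases is not known.

```python
from dataclasses import dataclass

# Hypothetical application-firmware address window (illustrative values only).
APP_START = 0x9000
APP_END = 0xFFFF

# Hypothetical response bytes returned by the base.
ACK = 0x01
NACK = 0x00


@dataclass
class FirmwareMessage:
    address: int   # target address to reprogram (assumed 16-bit)
    data: bytes    # firmware payload bytes
    checksum: int  # checksum computed by the sender (server)


def additive_checksum(address: int, data: bytes) -> int:
    """Sum the two address bytes and all data bytes, modulo 256."""
    total = ((address >> 8) & 0xFF) + (address & 0xFF) + sum(data)
    return total & 0xFF


def make_message(address: int, data: bytes) -> FirmwareMessage:
    """Build a message the way the server would, with a matching checksum."""
    return FirmwareMessage(address, data, additive_checksum(address, data))


def base_response(msg: FirmwareMessage, check_range: bool = True) -> int:
    """Model of the base's reply: ACK only if the local checksum matches.

    With check_range=True the base also rejects writes that fall outside
    the application window (an extra safeguard assumed for this sketch).
    """
    if additive_checksum(msg.address, msg.data) != msg.checksum:
        return NACK
    last_address = msg.address + len(msg.data) - 1
    if check_range and not (APP_START <= msg.address and last_address <= APP_END):
        return NACK
    return ACK


if __name__ == "__main__":
    good = make_message(0x9010, bytes([1, 2, 3, 4]))
    # Swapping the two address bytes leaves the additive sum unchanged.
    swapped = FirmwareMessage(0x1090, good.data, good.checksum)
    print(base_response(good))                        # accepted
    print(base_response(swapped, check_range=False))  # accepted by checksum alone
    print(base_response(swapped, check_range=True))   # rejected by the range check
```

A stronger check such as a CRC would catch byte swaps like the one above, but that would be a change to the scheme described here, not a description of it.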