Joe was able to reconfigure his datafind server at CIT to return only frame files from the /frame0 archive, resolving the problem I reported yesterday. As of now, the k1sum0, k1sum1, k1det0, and k1det1 machines are equipped with a Conda environment that includes gwdatafind (and an automatic pointer to Joe’s server by default). As a result, the summary pages will be able to read data directly from frames.
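For reference, frame files returned by a datafind query follow the standard LIGO-T010150 naming convention, so basic metadata can be recovered from a URL alone. A minimal sketch (the path and frametype below are invented for illustration, not necessarily what Joe's server returns):

```python
import os.path

def parse_frame_url(url):
    """Split a frame file URL into (observatory, frametype, gps_start, duration).

    Frame files follow the <obs>-<frametype>-<gps_start>-<duration>.gwf
    naming convention (LIGO-T010150).
    """
    name, _ = os.path.splitext(os.path.basename(url))
    obs, frametype, start, dur = name.rsplit("-", 3)
    return obs, frametype, int(start), int(dur)

# A hypothetical URL of the kind gwdatafind's find_urls() might return:
print(parse_frame_url("file:///frame0/full/K-K1_C-1250000000-32.gwf"))
# → ('K', 'K1_C', 1250000000, 32)
```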
Summary page failures
I diagnosed these by looking at the Condor log files, and resolved all but one of them by cleaning up various configuration files. In most cases, the issues stemmed either from typos (e.g. duplicate variables) or from channels that are no longer available (which I simply removed).
I was also able to test a complete working example of these changes, the output of which is linked below:
http://10.68.10.87/~controls/summary/testing/day/20190827/ (requires access from the KAGRA control site)
You’ll notice this landing page now includes a percentile spectrum and full spectrograms of the DARM displacement, as well as the LSC_LOCK_SIMPLE guardian, which demonstrates FPMI lock. For those not on site, a screenshot of this is attached below.
A few things to note about this:

(1) There is a problem with the way the LSC_LOCK_SIMPLE guardian is written to frames; more on this later.

(2) There is now a “Home” button linking back to the top-level daily KAGRA summary.

(3) I was able to compute spectra without issue, so I don’t think there are any fundamental memory limitations. I am cautiously optimistic that the actual problem was caused by things like missing channels, and that it has been resolved by the work reported above. I suppose time will (quickly) tell.

(4) It would be much easier to be notified of, diagnose, and fix all of this if it were connected to Nagios. We should think carefully about this and, if possible, work out whether to allow the Nagios server to access k1sum0 (or set one up ourselves?).
I haven’t changed any of the production configurations yet, since I prefer to give you a chance to look this over first. Tomorrow I will continue to dig into this and try to resolve one remaining configuration issue and add more automated tools, like the check for software saturations. We can touch base about it then as Keiko will be back in the office.
Channel list configuration
In LIGO we maintain a master (or “standard”) list of channels for various DetChar analyses. We find that this is enormously useful — even critical — in keeping track of which channels couple into DARM, which ones can be analyzed at ultra low frequency (< 1 Hz), which ones are available to analyze in the first place, etc. This standard list is then used to produce the production run configurations for Omicron, Omega scans, Hveto, and other tools.
Siddharth painstakingly went through your commissioning frames today using FrameCPP, a frame reading library that I’ll want to show you how to use. Sidd was able to collect all sufficiently fast channels (sampled at > 64 Hz) and organize them into a first draft of a KAGRA standard channel list. He’s opened a merge request in our repository here:
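The selection criterion itself is simple. Here is a toy sketch of the filtering step, assuming (name, sample rate) pairs have already been read out of a frame's table of contents; the channel names below are made up for illustration:

```python
# Hypothetical (channel, sample_rate_hz) pairs, as might be read from a
# frame file's table of contents; these names are invented.
channels = [
    ("K1:LSC-DARM_OUT", 16384.0),
    ("K1:PEM-MIC_BOOTH", 2048.0),
    ("K1:SLOW-MON_TEMP", 16.0),
]

# Keep only the "sufficiently fast" channels (sampled above 64 Hz).
fast = sorted(name for name, rate in channels if rate > 64)
print(fast)  # → ['K1:LSC-DARM_OUT', 'K1:PEM-MIC_BOOTH']
```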
Once this goes in, we’ll be able to organize and control the Omicron, Omega scan, and Hveto run configurations with greater precision using the tools we’ve developed on the LIGO side.
Guardian channel scaling issue

Finally, I promised to explain the issue I found with the LSC_LOCK_SIMPLE guardian channel, which records the status of FPMI lock for the interferometer. In general, guardian channels are 16 Hz timeseries records whose value at each sample is an integer indicating a state of the interferometer, e.g., 16 corresponding to FPMI lock, 15 for transitioning to REFL17, etc. This particular channel, however, is stored as small floats on the order of 1e-5. I noticed that if I divide the channel by 6.103e-5 (a number suspiciously close to 1/16384), I can convert it back to the normal integer representation.
I suspect something has gone wrong in the frame writer for this channel, which will need to be corrected. DataViewer in the control room shows the guardian state as normal (i.e., with integers), so the problem exists only in frame files. For the time being, I can work around it in the summary pages by applying a lambda function that rescales the channel to the correct value.
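The workaround amounts to a one-line rescaling per sample. A sketch in plain Python, standing in for however the summary-page configuration expresses the lambda:

```python
# The mis-scaled frame values appear to be state / 16384, so dividing by
# 1/16384 (= 6.103515625e-5) recovers the integer guardian state.
SCALE = 1.0 / 16384.0

def restore_guardian_state(sample):
    """Map a mis-scaled frame sample back to its integer guardian state."""
    return round(sample / SCALE)

# e.g. a frame value of ~9.77e-4 decodes back to state 16 (FPMI lock)
print(restore_guardian_state(16 * SCALE))  # → 16
```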
Comments from Joe
One minor correction to Alex's report.
My tweaks were to the datafind server at KAGRA, not CIT. This fixed a BIG problem with LDVW: LDVW, datafind, and NDS (all on the same machine) had been configured with the wrong frame file servers.