,-----.  ,---.   ,----.   ,--.   ,--.                            ,--.             ,--.        
'  .--./ /  O  \ '  .-./   |   `.'   | ,---. ,--,--,      ,---. ,-'  '-.,--.,--. ,-|  | ,---.  
|  |    |  .-.  ||  | .---.|  |'.'|  || .-. ||      \    | .-. :'-.  .-'|  ||  |' .-. || .-. : 
'  '--'\|  | |  |'  '--'  ||  |   |  |' '-' '|  ||  |    \   --.  |  |  '  ''  '\ `-' |\   --. 
 `-----'`--' `--' `------' `--'   `--' `---' `--''--'     `----'  `--'   `----'  `---'  `----'                                                                                             

Description

The CAGMon etude is a study version of CAGMon that evaluates the dependence between the primary and auxiliary channels.

Project goal

The goal of this project is to find a systematic way of identifying abnormal glitches in gravitational-wave data using several methods of correlation analysis. Collaborations such as LIGO, Virgo, and KAGRA conventionally find glitches in the auxiliary channels of the detector with tools such as Kleine-Welle, Omicron, and Ordered Veto Lists. However, other approaches may be able to find and monitor glitches in (quasi-)real time and also point out which channel is responsible for a given glitch. In this project, we study whether three correlation methods (the maximal information coefficient, Pearson's correlation coefficient, and Kendall's tau coefficient) can be applied to gravitational-wave data from the KAGRA detector.

Participants

Methods and Frameworks

Maximal Information Coefficient (MIC)

The Maximal Information Coefficient (MIC) of a set D of two-variable data with sample size n and grid size less than B(n) is given by

\[ MIC(D)=\underset{xy<B(n)}{\max}{\left\{ \frac{I^{*}(D,x,y)}{\log \min \left\{x,y \right\}} \right \}} \],

where \[ I^{*}(D,x,y) \] is the maximum mutual information achievable by any x-by-y grid on D, and \[ \omega(1)<B(n)\le O(n^{1-\epsilon}) \] for some \[ 0<\epsilon<1 \].
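As a rough illustration of the definition above, the following Python sketch scans all grid resolutions with xy < B(n) and keeps the largest normalized mutual information. It is a simplification, not the CAGMon implementation: it only tries equal-frequency grids instead of searching for the optimal grid placement, so it yields a lower bound on MIC. The function names and the choice B(n) = n^0.6 are assumptions made for this example.

```python
import numpy as np


def _equal_frequency_bins(values, n_bins):
    """Assign each sample to one of n_bins bins holding roughly equal counts.

    Ties are broken by sample order, which is adequate for continuous data.
    """
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)


def _mutual_information(x_bins, y_bins, nx, ny):
    """Mutual information (in nats) of the joint histogram on an nx-by-ny grid."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal over columns, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal over rows, shape (1, ny)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def approx_mic(x, y, exponent=0.6):
    """Lower-bound MIC estimate using equal-frequency grids only.

    exponent sets B(n) = n**exponent (an assumed choice for this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = len(x) ** exponent
    best = 0.0
    for nx in range(2, int(b // 2) + 1):
        x_bins = _equal_frequency_bins(x, nx)
        for ny in range(2, int(b // nx) + 1):
            if nx * ny >= b:  # enforce the strict bound xy < B(n)
                continue
            y_bins = _equal_frequency_bins(y, ny)
            mi = _mutual_information(x_bins, y_bins, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
    return best
```

For a noiseless monotonic relationship the 2-by-2 grid already reaches the maximum, so the estimate is 1; for independent noise it stays close to 0. A dedicated MIC library would be needed for the full grid optimization.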

Pearson's Correlation Coefficient (PCC)

Pearson's Correlation Coefficient (PCC) is a statistic that measures the strength of the linear relationship between two variables by \[ R=\frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \overline{X})^{2} \sum_{i=1}^{n} (Y_i - \overline{Y})^{2}}} \],

where \[ \overline{X} \] and \[ \overline{Y} \] are the means of X and Y, respectively.
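A minimal Python sketch of this coefficient, using the standard definition with squared deviations in the denominator, is given below. The function name pearson_r is chosen for this example only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))
```

The result should agree with the off-diagonal element of np.corrcoef(x, y), which can serve as a cross-check.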

Kendall's tau Coefficient

For a random sample of n observations from two variables, Kendall's tau measures the strength of the relationship between two ordinal-level variables by

\[ \tau =\frac{c-d}{\binom{n}{2}} \],

where c is the number of concordant pairs, and d is the number of discordant pairs.
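The pair counting in this definition can be sketched directly in Python. The version below is a straightforward O(n^2) illustration with a name chosen for this example (kendall_tau_a); tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations
from math import comb


def kendall_tau_a(x, y):
    """Kendall's tau as (c - d) / C(n, 2), counting pairs explicitly."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both variables order it the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / comb(n, 2)
```

For example, x = [1, 2, 3, 4] and y = [1, 3, 2, 4] give five concordant pairs and one discordant pair, so tau = 4/6. For long time series a library routine with better scaling would be preferable to this quadratic loop.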

Flow chart

Code development

GitHub

TBA

Code versions

  1. CAGMon Etude Alpha
    • for the basic test and evaluation of the LASSO regression method developed by LIGO
    • reproduced original CAGMon methods and idea
  2. CAGMon Etude Beta
    • added coefficient trend plots with LASSO beta, coherence, MIC, PCC, and Kendall's tau
  3. CAGMon Etude Delta
    • fixed a critical problem that consumed an enormous amount of memory when the matplotlib module was used
  4. CAGMon Etude Eta
    • fixed minor issues
    • added the range limitation of stride
  5. CAGMon Etude Flat
    • fixed minor issues and optimized scripts
    • added the script of HTML summary page
    • added coefficient distribution plots
  6. CAGMon Etude Octave (current version)
    • removed the processes that made time-series and scatter plots, which required tremendous memory but provided little useful information
    • adjusted the HTML code
    • fixed minor issues and optimized scripts
    • added an analysis option to choose whether the algorithm runs on active segments only
    • improved script efficiency
    • added a process that makes scatter and OmegaScan plots in the detail boxes of the summary page

  7. CAGMon Etude Rhapsody (development version)
    • utilize MICe with the auto-selection method of Alpha
    • improve script efficiency and completeness
    • publish the script on GitHub

Series of scripts

User guide

Needs of code development

Empirical study (No free lunch)

  1. Apply to glitch data on KAGRA during O3GK
    • Glitch information
    • Purpose
      • To decide on appropriate parameters for running CAGMon to search for glitches and correlations
      • To recommend parameters for short-range analysis
    • Result
    • Appropriate parameters of CAGMon for the glitch search
      • Data-size: 8,192 or 16,384
      • Stride: 0.5 or 1.0 seconds (the glitch duration users want to find)
  2. Statistical power test
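To illustrate how a stride parameter of this kind shapes the analysis, the sketch below splits a primary and an auxiliary time series into consecutive stride-long segments and computes one coefficient per segment, giving a coefficient trend. This is only an illustration under stated assumptions: the function name coefficient_trend and its signature are hypothetical, Pearson's coefficient stands in for any of the three methods, and the example assumes that the data-size is the number of samples per segment (for instance 8,192 samples for a 0.5-second stride at 16,384 Hz).

```python
import numpy as np


def coefficient_trend(primary, auxiliary, sample_rate, stride):
    """Correlation coefficient of each consecutive stride-long segment.

    sample_rate is in Hz and stride in seconds (hypothetical interface).
    Samples left over after the last full segment are ignored.
    """
    primary = np.asarray(primary, dtype=float)
    auxiliary = np.asarray(auxiliary, dtype=float)
    samples_per_stride = int(round(sample_rate * stride))
    n_segments = min(len(primary), len(auxiliary)) // samples_per_stride
    trend = np.empty(n_segments)
    for k in range(n_segments):
        seg = slice(k * samples_per_stride, (k + 1) * samples_per_stride)
        # Pearson's coefficient is used here; MIC or Kendall's tau could be
        # substituted at this point.
        trend[k] = np.corrcoef(primary[seg], auxiliary[seg])[0, 1]
    return trend
```

A segment in which the auxiliary channel couples into the primary channel then stands out as a peak in the trend, while uncorrelated segments stay near zero. A stride much longer than the glitch dilutes that peak, which is one reason to match the stride to the glitch duration of interest.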

Exemplary results

1. Earthquake effects during O3GK (with CAGMon Etude Flat)

2. With iKAGRA hardware injection data (with CAGMon Etude Flat)

3. Skim through some obs-segments of O3GK (with CAGMon Etude Octave)

4. Glitch analysis and channel correlation study during O3GK (with CAGMon Etude Octave)

5. Glitch analysis and channel correlation study during O3GK (with CAGMon Etude Rhapsody)

Cross-validation

Beyond

References

Presentation materials

JGW-G2112481-v1

Papers

D. N. Reshef et al., "Detecting Novel Associations in Large Data Sets," Science 334, 1518 (2011)