Differences between revisions 1 and 34 (spanning 33 versions)
Revision 1 as of 2021-01-20 13:22:49
Size: 1355
Editor: PJJung
Revision 34 as of 2021-01-26 13:54:36
Size: 11542
Editor: PJJung
= CAGMon etude =
{{{
 ,-----. ,---. ,----. ,--. ,--. ,--. ,--.
' .--./ / O \ ' .-./ | `.' | ,---. ,--,--, ,---. ,-' '-.,--.,--. ,-| | ,---.
| | | .-. || | .---.| |'.'| || .-. || \ | .-. :'-. .-'| || |' .-. || .-. :
' '--'\| | | |' '--' || | | |' '-' '| || | \ --. | | ' '' '\ `-' |\ --.
 `-----'`--' `--' `------' `--' `--' `---' `--''--' `----' `--' `----' `---' `----'
}}}


== Description ==
== Project goal ==
The Maximal Information Coefficient (MIC) of a set D of two-variable data with sample size n and grid size less than B(n) is given by

\[
MIC(D)=\underset{xy<B(n)}{\max}{\left\{ \frac{I^{*}(D,x,y)}{\log \min \left\{x,y \right\}} \right \}}
\],

where \[ I^{*}(D,x,y) \] is the maximum mutual information achieved by any x-by-y grid on D, and \[\omega(1)<B(n)\le O(n^{1-\epsilon}) \] for some \[ 0<\epsilon<1 \].
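
The definition above can be illustrated with a deliberately simplified sketch. It is not the MIC algorithm of the referenced paper: instead of searching for the optimal grid that defines I*, it only tries equal-width x-by-y grids, so the value it returns is a lower bound on the true MIC. The function names and the choice of epsilon = 0.4 (that is, B(n) = n^0.6) are assumptions made for this example only.

```python
import math
from collections import Counter


def grid_mutual_information(xs, ys, nx, ny):
    """Mutual information (in nats) of the data binned on an equal-width nx-by-ny grid."""

    def bin_index(value, low, high, nbins):
        # Constant data falls into a single bin.
        if high == low:
            return 0
        return min(int((value - low) / (high - low) * nbins), nbins - 1)

    n = len(xs)
    x_low, x_high = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)
    cells = [
        (bin_index(x, x_low, x_high, nx), bin_index(y, y_low, y_high, ny))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(cells)
    count_x = Counter(i for i, _ in cells)
    count_y = Counter(j for _, j in cells)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) p(j)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (count_x[i] * count_y[j]))
    return mi


def mic_equal_width(xs, ys, epsilon=0.4):
    """Lower bound on MIC using equal-width grids only (assumption: B(n) = n**(1 - epsilon))."""
    b = len(xs) ** (1 - epsilon)
    best = 0.0
    for nx in range(2, int(b) + 1):
        for ny in range(2, int(b) + 1):
            if nx * ny >= b:  # enforce the grid-size limit xy < B(n)
                break
            score = grid_mutual_information(xs, ys, nx, ny) / math.log(min(nx, ny))
            best = max(best, score)
    return best


xs = list(range(200))
mic_linear = mic_equal_width(xs, [2 * x + 1 for x in xs])
mic_constant = mic_equal_width(xs, [5] * 200)
```

For a noiseless linear relationship the sketch reaches the maximum normalized score, while a constant second variable carries no information about the first. A production analysis would rely on a dedicated MIC implementation rather than this equal-width approximation.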

Pearson's Correlation Coefficient (PCC) measures the strength of the linear relationship between two variables X and Y, each with n samples:
\[
R={{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})} \over {\sqrt{\sum_{i=1}^{n} (X_i - \overline{X})^2 \sum_{i=1}^{n} (Y_i - \overline{Y})^2}}}
\],

where \[ \overline{X} \] and \[ \overline{Y} \] are the means of X and Y, respectively.
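
As a minimal sketch, PCC can be computed directly from its textbook definition: the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` is chosen for this example and is not taken from the CAGMon scripts.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
```

A perfectly increasing linear relationship gives R close to 1 and a perfectly decreasing one gives R close to -1.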
Kendall's tau measures the strength of the ordinal (rank) association between two variables. For a random sample of n paired observations,

\[
\tau =\frac{c-d}{{n \choose 2}}
\],

where c is the number of concordant pairs and d is the number of discordant pairs.
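
The formula above can be sketched by counting pairs directly. This is the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant; the quadratic pair loop is meant for illustration, not for long time series. The function name is an assumption of this example.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau (tau-a): (concordant - discordant) / (n choose 2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 is a tie and is counted in neither group
    return (concordant - discordant) / (n * (n - 1) / 2)


tau_same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
tau_mixed = kendall_tau([1, 2, 3], [1, 3, 2])
```

In the last call two of the three pairs are concordant and one is discordant, so tau is (2 - 1) / 3.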

==== Flow chart ====
== Code development ==

==== GitHub ====
[[TBA]]

==== Code versions ====
 1. CAGMon Etude Alpha
  * for the basic test and evaluation of the LASSO regression method developed by LIGO
  * reproduced the original CAGMon methods and ideas
 2. CAGMon Etude Beta
  * added coefficient trend plots with LASSO beta, coherence, MIC, PCC, and Kendall's tau
 3. CAGMon Etude Delta
  * fixed a critical problem that caused excessive memory consumption when using the matplotlib module
 4. CAGMon Etude Eta
  * fixed minor issues
  * added a range limit on the stride
 5. CAGMon Etude Flat (current version)
  * fixed minor issues and optimized scripts
  * added the script that generates the HTML summary page
  * added coefficient distribution plots
 6. CAGMon Etude Octave (development version)
  * removed the processes that make time-series and scatter plots, which required a large amount of memory but provided little useful information
  * adjusted the HTML code
  * fixed minor issues and optimized scripts
 
==== Series of scripts ====
 * Agrement.py
  * the script that collects the functions the model requires
 * Melody.py
  * the script to calculate each coefficient and to save trend data as csv
 * Conchord.py
  * the script to make plots
 * Echo.py
  * the script to save the result as an HTML web page
 * CAGMonEtude{Version}.py
  * the script to run each script
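
To make the idea of a coefficient trend concrete, the following is a hypothetical sketch, not the actual content of Melody.py: two equally sampled channels are cut into consecutive, non-overlapping strides, one coefficient (here PCC) is computed per stride, and the trend is serialized as csv. All function names, the column names, and the non-overlapping windowing are assumptions of this example.

```python
import csv
import io
import math


def pearson_r(xs, ys):
    """PCC of two equally long sequences; NaN if either one is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return covariance / math.sqrt(spread_x * spread_y)


def coefficient_trend(primary, auxiliary, sample_rate_hz, stride_seconds):
    """One (offset in seconds, PCC) row per complete, non-overlapping stride."""
    window = sample_rate_hz * stride_seconds  # samples per stride
    rows = []
    for start in range(0, len(primary) - window + 1, window):
        segment_primary = primary[start:start + window]
        segment_auxiliary = auxiliary[start:start + window]
        rows.append((start / sample_rate_hz, pearson_r(segment_primary, segment_auxiliary)))
    return rows


def trend_to_csv(rows):
    """Serialize the trend rows as csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["offset_seconds", "pcc"])
    writer.writerows(rows)
    return buffer.getvalue()


# Toy data: 8 samples at 2 Hz, analysed with a 2-second stride (4 samples per stride)
primary = [0, 1, 2, 3, 4, 5, 6, 7]
auxiliary = [0, 1, 2, 3, 3, 2, 1, 0]
trend = coefficient_trend(primary, auxiliary, sample_rate_hz=2, stride_seconds=2)
csv_text = trend_to_csv(trend)
```

In the toy data the two channels rise together during the first stride and move in opposite directions during the second, so the trend flips from 1 to -1.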

==== User guide ====
 * [[/userguide | How to use CAGMon]]

==== Needs of code development ====
 * Fundamental criterion or guideline for choosing the stride and its data size
 * Daily running on KAGRA


== Exemplary results ==

1. Earthquake effects during O3GK
 * Datetime: 19 April 2020 20:39 UTC
 * Purpose
  * To test the CAGMon algorithm on a notable event
  * To figure out the cause of lock-loss in KAGRA
 * Results
  * stride 5 seconds [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078(5)/ | Summary page]]
  * stride 20 seconds [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078/ | Summary page]]
  * stride 30 seconds [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078(30)/ | Summary page]]

2. Skim through all obs-segments of O3GK
 * Purpose
  * To test the calculation time and required resources with all observation segments during O3GK
  * To identify trigger events or abnormal behavior
 * Results
  || Date || GPS time || Stride || Sample rate || Data size || Summary page link || Remarks ||
  || April 7 || 1270287158 - 1270328032 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 8 || 1270339218 - 1270425618 || || || || || Full Data is unavailable in the KISTI cluster ||
  || April 9 || 1270425618 - 1270510167 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 10 || 1270513160 - 1270596544 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 11 || 1270598418 - 1270683904 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 12 || 1270684818 - 1270762046 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 14 || 1270909686 - 1270937768 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 15 || 1270945288 - 1271017582 || || || || || GRB200415 (08:48:05 UTC) / Full Data is unavailable in the KISTI cluster ||
  || April 16 || 1271030433 - 1271112809 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 17 || 1271119833 - 1271186507 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 18 || 1271227441 - 1271288128 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 19 || 1271289618 - 1271364033 || 600s || 16Hz || about 10,000 || [[ | summary page]] || ||
  || || || 300s || 32Hz || about 10,000 || [[ | summary page]] || ||
  || April 20 || 1271377409 - 1271460608 || || || || || GRB200420A (2:32:58 UTC) / Full Data is unavailable in the KISTI cluster ||

3. With iKAGRA hardware injection data
 * Event
  * Phenomenon: the strain channel and seismometer channels in iKAGRA had a high correlation during the hardware injection test
  * Cause: still unknown
  * Hypothesis: the glitches behave in much the same way as the vacuum rotary pump
  * More detailed analysis: [[https://www.dropbox.com/s/950vjc807sgz24u/hveto%20brief%20Report%20for%20K1.pdf?dl=0 | hveto brief Report for K1]] and [[https://www.dropbox.com/s/hb7rx93an8yluiq/PilJong%2C%20KGWG%20Face-to-Face%20Meeting.pdf?dl=0 | KGWG Face-to-Face Meeting]]
 * Purpose
  * To verify whether this model senses injected signals and abnormal glitches
  * To test noise resistance and data-size limitation
 * Results
  || Stride || Sample rate || Data size || Data length || Summary page link ||
  || 10s || 512Hz || about 5,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b5000%5d/ | summary page]] ||
  || 10s || 1024Hz || about 10,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b10000%5d/ | summary page]] ||
  || 10s || 2048Hz || about 20,000 || about 12m || [[ https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b20000%5d/| summary page]] ||
  || 10s || 3072Hz || about 30,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b30000%5d/ | summary page]] ||
  || 10s || 4096Hz || about 40,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b40000%5d/ | summary page]] ||
  || 2s || 4096Hz || about 8,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b2s%5d/| summary page]] ||
  || 5s || 4096Hz || about 20,000 || about 12m || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b5s%5d/| summary page]] ||
  || 60s || 128Hz || about 7,500 || whole iKAGRA data || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b60s%5d/ | summary page]] ||
  || 150s || 64Hz || about 10,000 || whole iKAGRA data || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b150s%5d/ | summary page]] ||
  || 300s || 64Hz || about 20,000 || whole iKAGRA data || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b300s%5d/ | summary page]] ||
  || 600s || 16Hz || about 10,000 || whole iKAGRA data || [[https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b600s%5d/ | summary page]] ||
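
The tables do not state how the "Data size" column is obtained, but the listed figures are consistent with the number of samples in one stride, that is, stride multiplied by sample rate. The short sketch below rests on that assumption; the function name is hypothetical.

```python
def samples_per_stride(stride_seconds, sample_rate_hz):
    """Number of samples in one stride (assumed meaning of the 'Data size' column)."""
    return stride_seconds * sample_rate_hz


# (stride in s, sample rate in Hz) pairs taken from the tables above
settings = [(600, 16), (300, 32), (10, 512), (2, 4096), (60, 128)]
sizes = [samples_per_stride(stride, rate) for stride, rate in settings]
```

Under this reading, 600 s at 16 Hz and 300 s at 32 Hz both give 9,600 samples ("about 10,000"), 10 s at 512 Hz gives 5,120 ("about 5,000"), and 60 s at 128 Hz gives 7,680 ("about 7,500").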
==== Presentation materials ====
[[https://gwdoc.icrr.u-tokyo.ac.jp/cgi-bin/private/DocDB/ShowDocument?docid=12481|JGW-G2112481-v1]]
[[https://science.sciencemag.org/content/334/6062/1518 | Science.1518; Detecting Novel Associations in Large Data Sets]]

 ,-----.  ,---.   ,----.   ,--.   ,--.                            ,--.             ,--.        
'  .--./ /  O  \ '  .-./   |   `.'   | ,---. ,--,--,      ,---. ,-'  '-.,--.,--. ,-|  | ,---.  
|  |    |  .-.  ||  | .---.|  |'.'|  || .-. ||      \    | .-. :'-.  .-'|  ||  |' .-. || .-. : 
'  '--'\|  | |  |'  '--'  ||  |   |  |' '-' '|  ||  |    \   --.  |  |  '  ''  '\ `-' |\   --. 
 `-----'`--' `--' `------' `--'   `--' `---' `--''--'     `----'  `--'   `----'  `---'  `----'                                                                                             

Description

The CAGMon etude is a study version of CAGMon that evaluates the dependence between the primary and auxiliary channels.

Project goal

The goal of this project is to find a systematic way of identifying abnormal glitches in gravitational-wave data using various methods of correlation analysis. Collaborations such as LIGO, Virgo, and KAGRA usually rely on conventional tools to find glitches in the auxiliary channels of the detector: KleineWelle, Omicron, Ordered Veto Lists, etc. However, other approaches may make it possible to find and monitor glitches in (quasi-)real time and to point out which channel is responsible for a given glitch. In this project, we study whether it is possible to apply three different correlation methods (maximal information coefficient, Pearson's correlation coefficient, and Kendall's tau coefficient) to the gravitational-wave data from the KAGRA detector.

Participants

  • John J. Oh (NIMS)
  • Young-Min Kim (UNIST)
  • Pil-Jong Jung (NIMS)

Methods and Frameworks

Maximal Information Coefficient (MIC)

The Maximal Information Coefficient (MIC) of a set D of two-variable data with sample size n and grid size less than B(n) is given by

\[ MIC(D)=\underset{xy<B(n)}{\max}{\left\{ \frac{I^{*}(D,x,y)}{\log \min \left\{x,y \right\}} \right \}} \],

where \[ I^{*}(D,x,y) \] is the maximum mutual information achieved by any x-by-y grid on D, and \[\omega(1)<B(n)\le O(n^{1-\epsilon}) \] for some \[ 0<\epsilon<1 \].

Pearson's Correlation Coefficient (PCC)

Pearson's Correlation Coefficient (PCC) measures the strength of the linear relationship between two variables X and Y, each with n samples:

\[ R={{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})} \over {\sqrt{\sum_{i=1}^{n} (X_i - \overline{X})^2 \sum_{i=1}^{n} (Y_i - \overline{Y})^2}}} \],

where \[ \overline{X} \] and \[ \overline{Y} \] are the means of X and Y, respectively.

Kendall's tau Coefficient

Kendall's tau measures the strength of the ordinal (rank) association between two variables. For a random sample of n paired observations,

\[ \tau =\frac{c-d}{{n \choose 2}} \],

where c is the number of concordant pairs and d is the number of discordant pairs.

Flow chart

Code development

GitHub

TBA

Code versions

  1. CAGMon Etude Alpha
    • for the basic test and evaluation of the LASSO regression method developed by LIGO
    • reproduced the original CAGMon methods and ideas
  2. CAGMon Etude Beta
    • added coefficient trend plots with LASSO beta, coherence, MIC, PCC, and Kendall's tau
  3. CAGMon Etude Delta
    • fixed a critical problem that caused excessive memory consumption when using the matplotlib module
  4. CAGMon Etude Eta
    • fixed minor issues
    • added a range limit on the stride
  5. CAGMon Etude Flat (current version)
    • fixed minor issues and optimized scripts
    • added the script that generates the HTML summary page
    • added coefficient distribution plots
  6. CAGMon Etude Octave (development version)
    • removed the processes that make time-series and scatter plots, which required a large amount of memory but provided little useful information
    • adjusted the HTML code
    • fixed minor issues and optimized scripts

Series of scripts

  • Agrement.py
    • the script that collects the functions the model requires
  • Melody.py
    • the script to calculate each coefficient and to save trend data as csv
  • Conchord.py
    • the script to make plots
  • Echo.py
    • the script to save the result as HTML web page
  • CAGMonEtude{Version}.py
    • the script to run each script

User guide

Needs of code development

  • Fundamental criterion or guideline for choosing the stride and its data size
  • Daily running on KAGRA

Exemplary results

1. Earthquake effects during O3GK

  • Datetime: 19 April 2020 20:39 UTC
  • Purpose
    • To test the CAGMon algorithm on a notable event
    • To figure out the cause of lock-loss in KAGRA
  • Results

2. Skim through all obs-segments of O3GK

  • Purpose
    • To test the calculation time and required resources with all observation segments during O3GK
    • To identify trigger events or abnormal behavior
  • Results
    || Date || GPS time || Stride || Sample rate || Data size || Summary page link || Remarks ||
    || April 7 || 1270287158 - 1270328032 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 8 || 1270339218 - 1270425618 || || || || || Full Data is unavailable in the KISTI cluster ||
    || April 9 || 1270425618 - 1270510167 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 10 || 1270513160 - 1270596544 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 11 || 1270598418 - 1270683904 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 12 || 1270684818 - 1270762046 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 14 || 1270909686 - 1270937768 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 15 || 1270945288 - 1271017582 || || || || || GRB200415 (08:48:05 UTC) / Full Data is unavailable in the KISTI cluster ||
    || April 16 || 1271030433 - 1271112809 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 17 || 1271119833 - 1271186507 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 18 || 1271227441 - 1271288128 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 19 || 1271289618 - 1271364033 || 600s || 16Hz || about 10,000 || summary page || ||
    || || || 300s || 32Hz || about 10,000 || summary page || ||
    || April 20 || 1271377409 - 1271460608 || || || || || GRB200420A (2:32:58 UTC) / Full Data is unavailable in the KISTI cluster ||

3. With iKAGRA hardware injection data

  • Event
    • Phenomenon: the strain channel and seismometer channels in iKAGRA had a high correlation during the hardware injection test
    • Cause: still unknown
    • Hypothesis: the glitches behave in much the same way as the vacuum rotary pump
    • More detailed analysis: hveto brief Report for K1 and KGWG Face-to-Face Meeting

  • Purpose
    • To verify whether this model senses injected signals and abnormal glitches
    • To test noise resistance and data-size limitation
  • Results

Beyond

References

Presentation materials

JGW-G2112481-v1

Papers

Science.1518; Detecting Novel Associations in Large Data Sets

PJJung/CAGMonEtude (last edited 2021-07-28 08:43:57 by PJJung)