CAGMon etude
Description
The CAGMon etude is a study version of CAGMon that evaluates the dependence between the primary and auxiliary channels.
Project goal
The goal of this project is to find a systematic way of identifying abnormal glitches in gravitational-wave data using several methods of correlation analysis. Collaborations such as LIGO, Virgo, and KAGRA usually find glitches in the auxiliary channels of the detector with conventional tools - KleineWelle, Omicron, Ordered Veto Lists, etc. However, other approaches may be able to find and monitor glitches in (quasi-)real time and also point out which channel is responsible for a given glitch. In this project, we study whether it is possible to apply three different correlation methods - the maximal information coefficient, Pearson's correlation coefficient, and Kendall's tau coefficient - to gravitational-wave data from the KAGRA detector.
Participants
- John J. Oh (NIMS)
- Young-Min Kim (UNIST)
- Pil-Jong Jung (NIMS)
Methods and Frameworks
Maximal Information Coefficient (MIC)
The maximal information coefficient (MIC) of a set D of two-variable data with sample size n and grid size less than B(n) is given by
\[ MIC(D)=\underset{xy<B(n)}{\max}{\left\{ \frac{I^{*}(D,x,y)}{\log \min \left\{x,y \right\}} \right \}} \],
where \[ I^{*}(D,x,y) \] is the maximum mutual information achieved over all x-by-y grids on D, and \[\omega(1)<B(n)\le O(n^{1-\epsilon}) \] for some \[ 0<\epsilon<1 \].
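The definition above can be illustrated with a simplified sketch. It is not the MINE algorithm of the reference paper and not the CAGMon implementation: instead of searching all x-by-y grids for \[ I^{*} \], it only tries equal-frequency grids, so the value it returns is a lower bound on MIC. The function names and the choice B(n) = n^0.6 are assumptions made for this example.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins rank-based bins (ties share a bin)."""
    n = len(values)
    first_rank = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)
    return [first_rank[v] * n_bins // n for v in values]


def _mutual_information(bx, by):
    """Mutual information (in nats) of two equally long lists of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def mic_lower_bound(xs, ys, alpha=0.6):
    """Max over equal-frequency x-by-y grids with x*y < B(n) = n**alpha.

    Assumption: only equipartition grids are tried, so this is a lower
    bound on the MIC defined in the text, not the MIC itself.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    budget = len(xs) ** alpha
    best = 0.0
    a = 2
    while a * 2 < budget:
        bx = _equal_frequency_bins(xs, a)
        b = 2
        while a * b < budget:
            by = _equal_frequency_bins(ys, b)
            score = _mutual_information(bx, by) / math.log(min(a, b))
            best = max(best, score)
            b += 1
        a += 1
    return best
```

For a noiseless monotonic relation the 2-by-2 grid already reaches the maximum value of 1, while a constant channel gives 0. For very small n no grid satisfies xy < B(n) and the function returns 0.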
Pearson's Correlation Coefficient (PCC)
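Pearson's correlation coefficient (PCC) is a statistic that measures the strength of the linear relationship between two variables X and Y by
\[ R=\frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \overline{X})^2 \sum_{i=1}^{n} (Y_i - \overline{Y})^2}} \],
where \[ \overline{X} \] and \[ \overline{Y} \] are the means of X and Y, respectively. Its square, \[ R^2 \], is the fraction of the variance accounted for by a linear fit.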
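As a minimal sketch of the PCC formula (the function name `pearson_r` is chosen for this example and is not taken from the CAGMon scripts), the coefficient can be computed directly from its definition:

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("R is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)
```

A perfectly linear increasing pair gives R = 1 and a perfectly linear decreasing pair gives R = -1.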
Kendall's tau Coefficient
For a random sample of n observations from two variables, Kendall's tau measures the strength of the relationship between two ordinal-level variables by
\[ \tau =\frac{c-d}{\binom{n}{2}} \],
where c is the number of concordant pairs and d is the number of discordant pairs.
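A minimal sketch of this formula follows; the function name is an assumption for this example. It counts concordant and discordant pairs by brute force, which takes O(n^2) time, and tied pairs count as neither (the tau-a variant).

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau: (c - d) divided by the number of pairs, n choose 2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `kendall_tau([1, 2, 3], [1, 3, 2])` has two concordant pairs and one discordant pair out of three, giving 1/3.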
Flow chart
Code development
GitHub
Code versions
- CAGMon Etude Alpha
- for the basic test and evaluation of the LASSO regression method developed by LIGO
- reproduced original CAGMon methods and idea
- CAGMon Etude Beta
- added coefficient trend plots with LASSO beta, coherence, MIC, PCC, and Kendall's tau
- CAGMon Etude Delta
- fixed a critical problem that consumed an enormous amount of memory when the matplotlib module was used
- CAGMon Etude Eta
- fixed minor issues
- added the range limitation of stride
- CAGMon Etude Flat (current version)
- fixed minor issues and optimized scripts
- added the script of HTML summary page
- added coefficient distribution plots
- CAGMon Etude Octave (development version)
- removed the processes that make time-series and scatter plots, which required a tremendous amount of memory but provided little useful information
- adjusted the HTML code
Series of scripts
- Agrement.py
- the script that gathers the functions the model requires
- Melody.py
- the script to calculate each coefficient and to save trend data as csv
- Conchord.py
- the script to make plots
- Echo.py
- the script to save the result as HTML web page
- CAGMonEtude{Version}.py
- the script to run each script
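The per-stride trend calculation described for Melody.py can be sketched as follows. This is an illustration under stated assumptions, not the code of the CAGMon scripts: the function names, the non-overlapping windows, and the CSV column names are all hypothetical.

```python
import csv
import io
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, used here as the example coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def coefficient_trend(primary, auxiliary, sample_rate, stride, coefficient):
    """Apply `coefficient` to consecutive, non-overlapping windows of `stride` seconds.

    Returns a list of (offset_in_seconds, value) pairs. A trailing partial
    window is dropped (an assumption of this sketch).
    """
    if len(primary) != len(auxiliary):
        raise ValueError("channels must have the same number of samples")
    window = int(sample_rate * stride)  # data size per stride
    if window < 2:
        raise ValueError("stride too short for this sample rate")
    trend = []
    for start in range(0, len(primary) - window + 1, window):
        stop = start + window
        value = coefficient(primary[start:stop], auxiliary[start:stop])
        trend.append((start / sample_rate, value))
    return trend


def trend_to_csv(trend):
    """Serialize the trend as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["offset_s", "coefficient"])
    writer.writerows(trend)
    return buffer.getvalue()
```

The data size per stride is the sample rate multiplied by the stride, so a longer stride at a fixed sample rate means each coefficient is computed from more samples.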
User guide
- How to use CAGMon (/userguide)
Needs of code development
- A fundamental criterion or guideline for choosing the stride and its data size
- Daily running on KAGRA
Exemplary results
1. Earthquake effects during O3GK
- Datetime: 19 April 2020 20:39 UTC
- Purpose
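- To test the CAGMon algorithm on a remarkable event
- To figure out the cause of lock-loss in KAGRA
- Results
- stride 5 seconds, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078(5)/
- stride 20 seconds, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078/
- stride 30 seconds, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/2020-04-19_K1:CAL-CS_PROC_C00_STRAIN_DBL_DQ_1271363358-1271364078(30)/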
2. Skim through all obs-segments of O3GK
- Purpose
- To test the calculation time and required resources over all observation segments during O3GK
- To identify trigger events or abnormal behaviors
- Results
- April 7, 1270287158 - 1270328032
- April 8, 1270339218 - 1270425618
- Full Data is unavailable in the KISTI cluster
- April 9, 1270425618 - 1270510167
- April 10, 1270513160 - 1270596544
- April 11, 1270598418 - 1270683904
- April 12, 1270684818 - 1270762046
- April 14, 1270909686 - 1270937768
- April 15, 1270945288 - 1271017582
- Event: GRB200415 (08:48:05 UTC)
- Full Data is unavailable in the KISTI cluster
- April 16, 1271030433 - 1271112809
- April 17, 1271119833 - 1271186507
- April 18, 1271227441 - 1271288128
- April 19, 1271289618 - 1271364033
- April 20, 1271377409 - 1271460608
- Event: GRB200420A (2:32:58 UTC)
- Full Data is unavailable in the KISTI cluster
3. With iKAGRA hardware injection data
- Event
- Phenomenon: the strain channel and seismometer channels in iKAGRA had a high correlation during the hardware injection test
- Cause: still unknown
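- Hypothesis: the glitches show roughly the same behavior as the vacuum rotary pump
- More detailed analysis: hveto brief Report for K1 (https://www.dropbox.com/s/950vjc807sgz24u/hveto%20brief%20Report%20for%20K1.pdf?dl=0) and KGWG Face-to-Face Meeting (https://www.dropbox.com/s/hb7rx93an8yluiq/PilJong%2C%20KGWG%20Face-to-Face%20Meeting.pdf?dl=0)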
- Purpose
- To verify whether this model senses injected signals and abnormal glitches
- To test noise resistance and data-size limitation
- Results
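- stride: 10 seconds with a data size of about 5000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b5000%5d/
- stride: 10 seconds with a data size of about 10000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b10000%5d/
- stride: 10 seconds with a data size of about 20000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b20000%5d/
- stride: 10 seconds with a data size of about 30000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b30000%5d/
- stride: 10 seconds with a data size of about 40000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b40000%5d/
- stride: 2 seconds with a data size of about 8000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b2s%5d/
- stride: 5 seconds with a data size of about 20000 over 12 minutes, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145624200-1145624936%5b5s%5d/
- stride: 60 seconds with a data size of about 7500 over the whole iKAGRA data, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b60s%5d/
- stride: 150 seconds with a data size of about 10000 over the whole iKAGRA data, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b150s%5d/
- stride: 300 seconds with a data size of about 20000 over the whole iKAGRA data, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b300s%5d/
- stride: 600 seconds over the whole iKAGRA data, summary page: https://ldas-jobs.ligo.caltech.edu/~pil-jong.jung/CAGMon/iKAGRA/2016-04-25_K1:LSC-MICH_CTRL_CAL_OUT_DQ_1145621548-1145670954%5b600s%5d/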
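The O3GK segments above are given as GPS second ranges. As a small helper sketch (the function names are hypothetical, and the fixed 18-second GPS-UTC offset is an assumption that holds for dates from 2017 onward, so it suits the 2020 segments but not the 2016 iKAGRA data), a GPS time can be converted to UTC and the number of full strides in a segment can be counted like this:

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS is ahead of UTC by 18 s, valid from 2017-01-01 onward.
LEAP_SECONDS = 18


def gps_to_utc(gps_seconds):
    """Convert a GPS time in seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - LEAP_SECONDS)


def full_strides(gps_start, gps_end, stride):
    """Number of complete strides of `stride` seconds in [gps_start, gps_end)."""
    return (gps_end - gps_start) // stride


# The April 19 segment listed above starts at midnight UTC.
print(gps_to_utc(1271289618))                       # 2020-04-19 00:00:00+00:00
print(full_strides(1271289618, 1271364033, 30))     # 2480
```

The April 19 segment spans 74415 seconds, so a 30-second stride yields 2480 complete windows with 15 seconds left over.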
Beyond
References
Presentation materials
- JGW-G2112481-v1: https://gwdoc.icrr.u-tokyo.ac.jp/cgi-bin/private/DocDB/ShowDocument?docid=12481
Papers
- Detecting Novel Associations in Large Data Sets, Science 334 (6062), 1518: https://science.sciencemag.org/content/334/6062/1518