NAME

     clust3d  - Modified K-means Algorithm

SYNOPSIS

     clust3d [ -NR ntapr ] [ -NI ntapi ] [  -OR  otapr  ]  [  -OI
     otapi  ] [ -OC otapc ] [ -tstart tstart ] [ -tskip tskip ] [
     -tend tend ] [ -ildm ildm ] [ -cldm cldm ] [ -nclust  nclust
     ]  [  -sclustini  sclustini  ]  [ -sdelthresl sdelthresl ] [
     -sdelthresu sdelthresu ] [ -sdelini sdelini  ]  [  -maxcycle
     maxcycle  ]  [ -kmthres kmthres ] [ -seed seed ] [ -fout ] [
     -V ] [ -? ]

DESCRIPTION

     The clust3d routine implements a modified K-means clustering
     algorithm (see J.T. Tou and R.C. Gonzalez, Pattern
     Recognition Principles, Massachusetts, Addison-Wesley,
     1974). The performance index minimized by the K-means
     algorithm is defined as the sum of the squared distances
     from all points in a cluster domain to the cluster center
     (which can be thought of as minimizing the overall noise
     power contained in the whole cluster domain). Initial
     cluster centers can be specified. The user can also delete
     clusters, relocate cluster centers, or delete feature
     vectors.
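
     As an illustration only, the performance index described
     above can be sketched in a few lines of Python. This is not
     the clust3d source; the function name performance_index and
     the list-based data layout are assumptions made for this
     example. Feature vectors may hold complex values (real and
     imaginary parts), which is why abs() is used.

```python
def performance_index(features, centers, labels):
    """Sum of squared distances from each feature vector to the
    center of the cluster it is assigned to (hypothetical helper)."""
    total = 0.0
    for vec, k in zip(features, labels):
        # abs() also works for complex entries (real + imaginary part)
        total += sum(abs(x - c) ** 2 for x, c in zip(vec, centers[k]))
    return total


# Four 2-component feature vectors assigned to two clusters
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
centers = [[1.0, 0.0], [11.0, 0.0]]
labels = [0, 0, 1, 1]
J = performance_index(features, centers, labels)
```

     Each vector in this toy data set lies one unit from its
     cluster center, so every vector contributes 1 to the index.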

USAGE

     clust3d gets all its parameters from command line arguments.
     These  arguments  specify the real and imaginary part of the
     initial input data set (-NR and -NI), the real and imaginary
     part  of the output ("initial subset") feature data set (-OR
     and -OI), the  output  file  containing  the  final  cluster
     assignments  (-OC),  the sample points of each initial trace
     you want to use for processing (-tstart, -tskip and  -tend),
     the  in-line/cross-line  spacing  of  the dataset (-ildm and
     -cldm), the number of initial cluster centers (-nclust), the
     cluster center initialization method (-sclustini), the
     thresholds and modification method used for cluster center
     relocation, cluster center removal and feature vector
     removal (-sdelthresl, -sdelthresu and -sdelini), the maximum
     number of iterations that can occur (-maxcycle), the
     convergence threshold used within the K-means subroutine
     (-kmthres), a seed parameter for a random generator (-seed),
     as well as a feature output file flag (-fout).

     In the IKP environment, input #0 and input #3 are  used  for
     the  real part (-NR) and imaginary part (-NI) of the initial
     data set, respectively.   Output  #1  contains  the  cluster
     assignments (-OC). Output #4 and output #5 are the real part
     (-OR) and imaginary part (-OI) of the "initial subset".
     Output #2 is the usual process printfile output.


  Command line arguments
     -NR ntapr
          Enter the input file name immediately after typing -NR
          (corresponding to the real part of the initial input
          data set). The input file name should include the
          complete path if the file resides in a different
          directory. For this program, the data must be stored as
          a rectangular grid of regularly binned data. The number
          of samples denoted by lineheader word 'NumSmp' defines
          the number of entries used for each initial feature
          vector. The number of traces denoted by lineheader word
          'NumTrc' defines the number of traces in the
          x-direction. The number of records (seismic lines)
          denoted by lineheader word 'NumRec' defines the number
          of traces in the y-direction. Dead traces have to be
          flagged by the dead traceheader flag 'StaCor'=30000 and
          the number of padded traces has to be given by the
          lineheader word 'Nx_Pad'. This argument has to be
          specified; otherwise the program will abort.

     -NI ntapi
          Enter the input file name immediately after typing  -NI
          (corresponding  to  the  imaginary  part of the initial
          input data set). For this program,  the  data  must  be
          stored  as a rectangular grid of regularly binned data.
          The  number  of  samples  denoted  by  lineheader  word
          'NumSmp'  defines  the  number of entries used for each
          initial feature vector. The number of traces denoted by
          lineheader  word  'NumTrc' defines the number of traces
          in the x-direction.  The  number  of  records  (seismic
          lines)  denoted by lineheader word 'NumRec' defines the
          number of traces in the y-direction. Dead traces have
          to be flagged by the dead traceheader flag
          'StaCor'=30000 and the number of padded traces has to
          be given by the lineheader word 'Nx_Pad'. This argument
          is optional. If it is not specified, the imaginary part
          is set internally to zero.

     -OC otapc
          Enter the output file name immediately after typing -OC
          (contains the cluster assignments for each and every
          spatial location). Note that cluster assignments for
          dead traces and deleted traces are set to zero and
          "number of clusters + 1", respectively. Has to be
          specified if in IKP; otherwise the program will abort.
          Will be set to outc.usp if not in IKP and not
          specified.

          Note: The number of clusters used is stored in the
          lineheader keyword 'MnDpIn'.

     -OR otapr
          Enter the output file name immediately after typing -OR
          (contains  the  real  part  of  the "initial subset" of
          feature vectors). The "initial subset" of feature  vec-
          tors  includes  filled traces (StaCor=30000) which will
          be taken out for cluster processing. Has to  be  speci-
          fied if in IKP and the feature output file flag (-fout)
          is specified, otherwise program will abort. Will be set
          to outr.usp if not in IKP and not specified.

     -OI otapi
          Enter the output file name immediately after typing -OI
          (contains the imaginary part of the "initial subset" of
          feature vectors). The "initial subset" of feature  vec-
          tors  includes  filled traces (StaCor=30000) which will
          be taken out for cluster processing. Has to be
          specified if in IKP and both the feature output flag
          (-fout) and the imaginary input file (-NI) are
          specified; otherwise the program will abort. Will be
          set to outi.usp if not in IKP and not specified.

     The arguments -tstart, -tskip and -tend can be used to
     select a reduced set of sample points (per trace) from the
     input data set. Note that the times you specify will be
     adjusted internally to fit the time grid of the existing
     data. All times have to be given in milliseconds. Note that
     tend has to be greater than tstart; otherwise an error
     message is issued.

     -tstart tstart
          Specifies the time of the first sample point in each
          trace that will be used for processing. Will be
          adjusted internally according to the following rules:

          default => TmMsFs (time of occurrence of the first
          sample point in the trace)

          if the specified tstart is less than or equal to TmMsFs
          => set tstart internally to TmMsFs

          if the specified tstart is greater than TmMsFs => set
          tstart internally to the time of the next available
          sample whose time is greater than or equal to the
          specified tstart

          if the specified tstart is greater than the time of the
          last sample in the trace => set tstart internally to
          the time of the last sample in the trace

     -tskip tskip
          Specifies the time increment for the selection of trace
          sample points used for processing. Will be adjusted
          internally according to the following rules:

          default => one input sample interval

          if the specified tskip is less than or equal to zero =>
          set tskip internally to one input sample interval

          if the specified tskip is greater than zero => set
          tskip internally to the multiple of input sample
          intervals it encloses

          if the specified tskip is greater than the difference
          between the times of the last and the first sample
          point in a trace => set tskip internally to this time
          difference

     -tend tend
          Specifies the time of the last sample point in each
          trace that will be used for processing. Will be
          adjusted internally according to the following rules:

          default => time of occurrence of the last sample point
          in the trace

          if the specified tend is less than or equal to TmMsFs
          => set tend internally to the time of the last sample
          in the trace

          if the specified tend is greater than TmMsFs => set
          tend internally to the time of the last available
          sample whose time is less than or equal to the
          specified tend

          if the specified tend is greater than the time of the
          last sample in the trace => set tend internally to the
          time of the last sample in the trace
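
     The adjustment rules for -tstart, -tskip and -tend can be
     summarized in a small Python sketch. It is an illustration
     under stated assumptions, not the clust3d implementation:
     the function name select_samples and its arguments are
     hypothetical, times are integer milliseconds, t_first stands
     for TmMsFs, dt is the input sample interval, and a tskip
     smaller than one sample interval is assumed to be rounded up
     to one interval (the rules above do not cover that case).

```python
def select_samples(t_first, dt, n_samples, tstart=None, tskip=None, tend=None):
    """Snap tstart/tskip/tend (integer ms) onto the sample grid.

    Hypothetical helper: t_first plays the role of TmMsFs, dt is the
    input sample interval, n_samples the number of samples per trace.
    """
    if tstart is not None and tend is not None and tend <= tstart:
        raise ValueError("tend has to be greater than tstart")
    t_last = t_first + (n_samples - 1) * dt

    # tstart: default/too small -> first sample; too large -> last sample;
    # otherwise the next sample at or after the requested time
    if tstart is None or tstart <= t_first:
        start = t_first
    elif tstart > t_last:
        start = t_last
    else:
        steps = -(-(tstart - t_first) // dt)  # integer ceiling
        start = t_first + steps * dt

    # tend: default/out of range -> last sample;
    # otherwise the last sample at or before the requested time
    if tend is None or tend <= t_first or tend > t_last:
        end = t_last
    else:
        end = t_first + ((tend - t_first) // dt) * dt

    # tskip: default/non-positive -> one interval; larger than the trace
    # length -> the trace length; otherwise the enclosed whole intervals
    span = t_last - t_first
    if tskip is None or tskip <= 0:
        skip = dt
    elif tskip > span:
        skip = span
    else:
        skip = max(1, tskip // dt) * dt  # assumption: at least one interval

    times = list(range(start, end + 1, skip)) if skip > 0 else [start]
    return start, skip, end, times


# 251 samples at 4 ms starting at 0 ms (last sample at 1000 ms)
window = select_samples(0, 4, 251, tstart=10, tskip=10, tend=503)
```

     In this example the requested start of 10 ms moves forward
     to the next sample, the requested end of 503 ms moves back
     to the previous sample, and the 10 ms increment is reduced
     to the two whole sample intervals it encloses.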

     -ildm ildm
          If not found in the line header, you must  specify  the
          in-line spacing (In-line cell increment)

     -cldm cldm
          If not found in the line header, you must  specify  the
          crossline spacing (Cross-line cell increment)

     -nclust nclust
          Specifies the number of clusters used initially. Note
          that the result might contain fewer clusters than
          specified here (if -sdelini was set to 2, i.e., if one
          deletes clusters having an insufficient number of
          members).

          default => 5

     -sclustini sclustini
          Specifies the cluster center initialization method:

          default => Option 1

          1 => Select the first nclust  feature  vectors  of  the
          "start  subset"  as cluster centers. The "start subset"
          consists of all the feature  vectors  in  the  "initial
          subset" (see -OR, -OI) corresponding to live traces.

          2 => Select the initial cluster centers  randomly  from
          the "start subset".

          3 => Select the initial cluster centers based on the
          one-pass "furthest neighborhood method". The first
          feature vector of the "start subset" serves as the
          first initial cluster center, and the feature vector
          which is furthest away from this first cluster center
          is chosen as the second cluster center. The next
          nclust-2 cluster centers are selected accordingly,
          i.e., the distance between each remaining feature
          vector and the already established cluster centers is
          obtained, and the feature vector which is furthest away
          from any of these cluster centers is chosen as the next
          cluster center. Caution: this method has a pitfall. For
          example, the third cluster center might be far away
          from cluster center 1 yet actually coincide with
          cluster center 2.

          4 => Select the initial cluster centers based on the
          "maxmin-distance algorithm" (see J.T. Tou and R.C.
          Gonzalez, Pattern Recognition Principles,
          Massachusetts, Addison-Wesley, 1974). Select the first
          and second cluster centers as in the "furthest
          neighborhood method". Calculate the distance from each
          remaining feature vector to each of the already
          existing cluster centers, and assign as the third
          cluster center the feature vector whose minimum
          distance to those centers is largest. This procedure is
          repeated to obtain the remaining cluster centers.
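
          The maxmin-distance selection of option 4 can be
          sketched as follows. This is a minimal illustration,
          not the clust3d code: the names sq_dist and
          maxmin_centers are hypothetical, squared Euclidean
          distance is assumed, and ties are broken in favor of
          the earlier feature vector.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (abs() also handles complex entries)."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def maxmin_centers(vectors, nclust):
    """Return indices of initial cluster centers chosen by the
    maxmin-distance rule (hypothetical helper)."""
    chosen = [0]  # the first feature vector is the first center
    while len(chosen) < min(nclust, len(vectors)):
        best_idx, best_val = None, -1.0
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            # distance to the nearest center selected so far
            d_min = min(sq_dist(v, vectors[c]) for c in chosen)
            if d_min > best_val:
                best_idx, best_val = i, d_min
        chosen.append(best_idx)
    return chosen


start_subset = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0], [9.0, 0.0]]
initial = maxmin_centers(start_subset, 3)
```

          With only one center selected, the nearest-center
          distance is simply the distance to that center, so the
          second pick is the vector furthest from the first, as
          in the "furthest neighborhood method". Taking the
          minimum over all centers before maximizing is what
          keeps a new center from landing next to an existing
          one.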


     -sdelthresl sdelthresl
          Specifies the lower membership threshold against which
          the number of members of each cluster is compared.

          default => 0


     -sdelthresu sdelthresu
          Specifies the upper membership threshold against which
          the number of members of each cluster is compared.

          default => 0

          sdelthresl and sdelthresu specify the lower and upper
          membership thresholds. Using these thresholds, the user
          can select one of four boundary ranges against which
          the membership number of each cluster is compared.
          Clusters whose membership number falls within the
          specified boundary will be modified according to the
          specification made in sdelini.

          range 0 => default; selected if neither sdelthresl nor
          sdelthresu is specified (or if both are set to zero
          internally). Results in no modification of any of the
          clusters.

          range 1 => selected if only sdelthresl is specified (or
          if sdelthresu is set to zero internally). Results in
          modification of all clusters whose membership number
          is <= sdelthresl.

          range 2 => selected if both sdelthresl and sdelthresu
          are specified. Results in modification of all clusters
          whose membership number is > sdelthresl and
          < sdelthresu.

          range 3 => selected if only sdelthresu is specified (or
          if sdelthresl is set to zero internally). Results in
          modification of all clusters whose membership number
          is >= sdelthresu.


          Note: If sdelthresl is specified  to  be  >  sdelthresu
          program will abort!
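
          The four boundary ranges can be expressed as a short
          Python sketch. The function names boundary_range and
          in_boundary are hypothetical, a threshold of zero is
          treated as "not specified", and the abort condition is
          assumed to apply only when both thresholds are
          specified (otherwise range 1 could never be selected).

```python
def boundary_range(sdelthresl=0, sdelthresu=0):
    """Return the range number (0..3) selected by the two thresholds."""
    if sdelthresl and sdelthresu and sdelthresl > sdelthresu:
        # stands in for the program abort described above
        raise ValueError("sdelthresl must not exceed sdelthresu")
    if not sdelthresl and not sdelthresu:
        return 0
    if sdelthresl and not sdelthresu:
        return 1
    if sdelthresl and sdelthresu:
        return 2
    return 3


def in_boundary(members, sdelthresl=0, sdelthresu=0):
    """True if a cluster with this many members would be modified."""
    rng = boundary_range(sdelthresl, sdelthresu)
    if rng == 0:
        return False
    if rng == 1:
        return members <= sdelthresl
    if rng == 2:
        return sdelthresl < members < sdelthresu
    return members >= sdelthresu


# Cluster sizes checked against "-sdelthresl 5" only (range 1)
sizes = [3, 5, 40]
flags = [in_boundary(n, sdelthresl=5) for n in sizes]
```

          Note that the comparison is inclusive in ranges 1 and 3
          but strict on both sides in range 2, so a cluster with
          exactly sdelthresl members is modified in range 1 and
          left alone in range 2.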

     -sdelini sdelini
          Specifies the modification method. Action is  taken  if
          the  membership  number of any cluster falls within the
          boundary range given by sdelthresl and sdelthresu:

          default => Option 3

          1 => Deletes all feature vectors which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu.
          The number of cluster centers stays the same. The
          feature vectors that have been deleted are given the
          value "nclust + 1" in the cluster assignment output
          file (see -OC), where nclust is the final number of
          clusters. Cluster centers whose members have been
          deleted are moved to be near a cluster center of an
          adequately (membership number within boundary)
          populated cluster. The cluster with the most members is
          selected.

          2 => Deletes all feature vectors which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu.
          The number of cluster centers is also reduced.

          3 => Deletes all cluster centers which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu.
          This will reduce the number of clusters used, but it
          will not affect the number of feature vectors used.
          Using this option, one can distribute the feature
          vectors which fall in sparsely populated clusters among
          the more highly populated clusters.

          4 => Moves all cluster centers which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu.
          The cluster centers are not deleted. They are moved to
          be near a feature vector of an adequately (membership
          number within boundary) populated cluster. The cluster
          whose center is closest (in the Euclidean sense) to the
          cluster center being moved is selected. This leads to a
          splitting of this adequately populated cluster.
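
          As a rough illustration of option 3 (the default), the
          sketch below drops the flagged cluster centers and
          hands every feature vector to the nearest remaining
          center. How clust3d actually redistributes the vectors
          is not documented here; nearest-center reassignment,
          as in an ordinary K-means cycle, is an assumption, and
          the names sq_dist and delete_centers are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def delete_centers(features, centers, flagged):
    """Remove the centers whose indices are in `flagged` and reassign
    all feature vectors to the nearest surviving center (assumed rule)."""
    kept = [c for k, c in enumerate(centers) if k not in flagged]
    if not kept:
        raise ValueError("at least one cluster center must remain")
    labels = [
        min(range(len(kept)), key=lambda k: sq_dist(v, kept[k]))
        for v in features
    ]
    return kept, labels


features = [[0.0], [1.0], [10.0], [11.0], [5.0]]
centers = [[0.5], [10.5], [5.0]]
# Cluster 2 holds a single member and is flagged for removal
kept, labels = delete_centers(features, centers, flagged={2})
```

          No feature vector is discarded: the lone member of the
          removed cluster simply joins one of the two surviving
          clusters.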

     -maxcycle maxcycle
          Specifies the maximum number of iteration cycles
          (K-means cycles) that will occur.

          default => 0; places no limit on the number of
          iteration cycles that can occur.


     -kmthres kmthres
          Specifies the threshold used for each K-means cycle. As
          soon as the number of reassignments is less than or
          equal to kmthres, convergence is assumed to be
          achieved.

          default => 0
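
          The interplay of -kmthres and -maxcycle can be seen in
          a plain K-means loop. This is a generic sketch, not the
          modified algorithm used by clust3d: the function name
          kmeans is hypothetical, the cluster modification step
          (-sdelini) is left out, and an empty cluster is assumed
          to keep its previous center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, centers, kmthres=0, maxcycle=0):
    """Plain K-means; stops when the number of reassignments in a cycle
    is <= kmthres, or after maxcycle cycles (0 means no limit)."""
    centers = [list(c) for c in centers]
    labels = [-1] * len(features)
    cycle = 0
    while True:
        cycle += 1
        reassigned = 0
        # assignment step: nearest center for every feature vector
        for i, v in enumerate(features):
            k = min(range(len(centers)), key=lambda j: sq_dist(v, centers[j]))
            if k != labels[i]:
                labels[i] = k
                reassigned += 1
        # update step: each center becomes the mean of its members
        for k in range(len(centers)):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        if reassigned <= kmthres:
            break
        if maxcycle > 0 and cycle >= maxcycle:
            break
    return centers, labels, cycle


features = [[0.0], [2.0], [10.0], [12.0]]
# Initial centers: the first two feature vectors (as in -sclustini 1)
final_centers, final_labels, cycles = kmeans(features, features[:2])
```

          A larger kmthres stops the loop while a few vectors are
          still changing cluster, trading accuracy for run time;
          maxcycle bounds the run time regardless of convergence.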


     -seed seed
          Specifies the seed value for the random generator used
          by the cluster center initialization method (see
          -sclustini).

          default => 1


     -fout
          Enter the command line argument '-fout' if you want to
          output the "initial subset" of feature vectors (see
          -OR and -OI). This gives the user the ability to use
          the same feature vectors in another clust3d session
          working with a different parameter set.


     -V   Enter the command line argument '-V' to get  additional
          printout.

     -?   Enter the command line argument '-?' to get online
          help. The program terminates after the help screen is
          printed.

BUGS

     None currently known.

REFERENCES:

     Kirlin, R.L., 1996, "Considerations for Clustering and
     Segmentation of Reflections for 3D Horizon Feature
     Enhancement", R. Lynn Kirlin, Inc.

     Kaufhold, B., Kirlin, R.L., Dizaji, R.M., 1998, "A
     Clustering Based Blind System Identification Approach",
     submitted to ICA '99, France

CONTRACT AGREEMENT

     This product is brought to you by Research  Contract  Agree-
     ment  2548 (Seismic coherency cube). Thank you for your sup-
     port.

AUTHOR

     Bertram Kaufhold (1998)

COPYRIGHT

     copyright 2001, Amoco Production Company
               All Rights Reserved
          an affiliate of BP America Inc.
