NAME

     clust3dm - Modified K-means algorithm with masking
     capabilities

SYNOPSIS

     clust3dm [ -NR ntapr ] [ -NI ntapi ] [ -NM ntapm ]
     [ -OR otapr ] [ -OI otapi ] [ -OC otapc ]
     [ -tfstart tfstart ] [ -tfskip tfskip ] [ -tfend tfend ]
     [ -ildm ildm ] [ -cldm cldm ] [ -nclust nclust ]
     [ -sclustini sclustini ] [ -sdelthresl sdelthresl ]
     [ -sdelthresu sdelthresu ] [ -sdelini sdelini ]
     [ -maxcycle maxcycle ] [ -kmthres kmthres ] [ -seed seed ]
     [ -fout ] [ -V ] [ -? ]

DESCRIPTION

     The clust3dm routine implements a modified K-means cluster
     algorithm (see J.T. Tou and R.C. Gonzalez, Pattern
     Recognition Principles, Massachusetts, Addison-Wesley,
     1974). The performance index minimized by the K-means
     algorithm is the sum of the squared distances from all
     points in a cluster domain to the cluster center (which can
     be thought of as a minimization of the overall noise power
     contained in the whole cluster domain). Initial cluster
     centers can be specified. The user can also delete clusters,
     relocate cluster centers, or delete feature vectors.
     Furthermore, if the user supplies a maskfile (which can be
     created easily using the USP routines clustedit and/or
     pickmask), a specified subset of feature vectors can be
     selected for classification.
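
     The performance index described above can be sketched in a
     few lines of Python. This is only an illustration, not the
     clust3dm source: the function names and the sample data are
     assumptions, and real-valued feature vectors are used for
     brevity.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def performance_index(vectors, centers, labels):
    """Sum of squared distances from every vector to its assigned center.

    labels[i] is the index into `centers` of the cluster that holds
    vectors[i] (0-based here, purely for illustration).
    """
    return sum(squared_distance(v, centers[k]) for v, k in zip(vectors, labels))


# Hypothetical data: two tight clusters on a line.
vectors = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centers = [(1.0, 0.0), (11.0, 0.0)]
labels = [0, 0, 1, 1]
index = performance_index(vectors, centers, labels)
```

     K-means alternates between reassigning vectors to their
     nearest center and recomputing the centers, and neither
     step can increase this index.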

USAGE

     clust3dm gets all its parameters from command line
     arguments. These arguments specify the real and imaginary
     part of the initial input data set (-NR and -NI), the
     maskfile (-NM), the real and imaginary part of the output
     ("initial subset") feature data set (-OR and -OI), the
     output file containing the final cluster assignments (-OC),
     the sample points of each initial trace to use for
     processing (-tfstart, -tfskip and -tfend), the
     in-line/cross-line spacing of the data set (-ildm and
     -cldm), the number of initial cluster centers (-nclust), the
     cluster center initialization method (-sclustini), the
     thresholds and modification method used for cluster center
     relocation, cluster center removal and feature vector
     removal (-sdelthresl, -sdelthresu and -sdelini), the maximum
     number of iterations that can occur (-maxcycle), the
     convergence threshold used within the K-means subroutine
     (-kmthres), a seed parameter for a random generator (-seed),
     and a feature output file flag (-fout).

     In the IKP environment, input #0 and input #3 are  used  for
     the  real part (-NR) and imaginary part (-NI) of the initial
     data set, respectively.  Input #6 is used for  the  maskfile
     and output #1 contains the cluster assignments (-OC). Output
     #4 and output #5 are the real (-OR) and imaginary part (-OI)
     of  the  "initial  subset".  Output  #2 is the usual process
     printfile output.
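
     Outside IKP, a command line can be assembled from the
     options listed in the SYNOPSIS. The Python sketch below only
     builds the argument list and does not run anything. The
     helper name and all file names are hypothetical, and it
     assumes each option and its value are passed as separate
     tokens, as the SYNOPSIS is laid out.

```python
import shlex


def build_clust3dm_command(real_in, cluster_out, nclust=5, sclustini=4, mask=None):
    """Assemble a clust3dm argument list (hypothetical helper).

    Only options named in the SYNOPSIS are used; the defaults for
    -nclust and -sclustini mirror the documented defaults.
    """
    args = ["clust3dm", "-NR", real_in]
    if mask is not None:
        args += ["-NM", mask]  # the maskfile is optional
    args += ["-OC", cluster_out]
    args += ["-nclust", str(nclust), "-sclustini", str(sclustini)]
    return args


# Hypothetical file names, for illustration only.
command = build_clust3dm_command(
    "/data/survey/real.usp", "clusters.usp", nclust=8, mask="mask.usp"
)
command_line = shlex.join(command)
```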

  Command line arguments
     -NR ntapr
          Enter the input file name immediately after typing  -NR
          (corresponding  to  the  real part of the initial input
          data set). The input file should include  the  complete
          path name if the file resides in a different directory.
          For this program, the data must be stored as a
          rectangular grid of regularly binned data. The number of
          samples denoted by lineheader word 'NumSmp' defines the
          number of entries used for each initial feature vector.
          The number of traces denoted by lineheader word
          'NumTrc' defines the number of traces in the
          x-direction. The number of records (seismic lines)
          denoted by lineheader word 'NumRec' defines the number
          of traces in the y-direction. Dead traces have to be
          flagged by the dead traceheader flag 'StaCor'=30000.
          Has to be specified, otherwise the program will abort.

     -NI ntapi
          Enter the input file name immediately after typing  -NI
          (corresponding  to  the  imaginary  part of the initial
          input data set). For this program,  the  data  must  be
          stored  as a rectangular grid of regularly binned data.
          The  number  of  samples  denoted  by  lineheader  word
          'NumSmp'  defines  the  number of entries used for each
          initial feature vector. The number of traces denoted by
          lineheader  word  'NumTrc' defines the number of traces
          in the x-direction.  The  number  of  records  (seismic
          lines)  denoted by lineheader word 'NumRec' defines the
          number of traces in the y-direction. Dead  traces  have
          to   be   flagged   by   the   dead   traceheader  flag
          'StaCor'=30000. Does not have to be specified.  If  not
          specified the imaginary part is set internally to zero.

     -NM ntapm
          Enter the maskfile name immediately after  typing  -NM.
          The   number  of  traces  denoted  by  lineheader  word
          'NumTrc'  defines  the  number  of  traces  in  the  x-
          direction.   The  number  of  records  (seismic  lines)
          denoted by lineheader word 'NumRec' defines the  number
          of  traces  in the y-direction. Note that the x- and y-
          dimensions of all  input  files  have  to  match.   The
          number of samples denoted by lineheader word 'NumSmp'
          is one; each sample is the masking flag for one
          x-y position. The masking flags subdivide
          traces into "active" and  "passive"  traces.  "Passive"
          traces  are  marked  by a masking flag of -1, all other
          values correspond to  "active"  traces.  If  a  passive
          trace  is  detected, StaCor will be set to 30000, i.e.,
          all traces of output files corresponding  to  "passive"
          traces  are  made dead traces. Only "active" traces are
          used for classification. Does not have to be specified.
          If not specified all traces are assumed to be "active".
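
          The masking rule can be illustrated with a small Python
          sketch. The grids are plain nested lists indexed as
          [record][trace]; the function names and sample values
          are assumptions made for illustration, not part of
          clust3dm.

```python
DEAD_TRACE_STACOR = 30000  # dead-trace flag value stated in the text
PASSIVE_FLAG = -1          # masking flag that marks a "passive" trace


def apply_mask(stacor_grid, mask_grid):
    """Return a new StaCor grid in which passive traces are flagged dead."""
    if len(stacor_grid) != len(mask_grid):
        raise ValueError("x/y dimensions of input and maskfile must match")
    result = []
    for stacor_row, mask_row in zip(stacor_grid, mask_grid):
        if len(stacor_row) != len(mask_row):
            raise ValueError("x/y dimensions of input and maskfile must match")
        result.append([
            DEAD_TRACE_STACOR if flag == PASSIVE_FLAG else stacor
            for stacor, flag in zip(stacor_row, mask_row)
        ])
    return result


def active_positions(stacor_grid):
    """List (record, trace) positions that remain live after masking."""
    return [
        (r, t)
        for r, row in enumerate(stacor_grid)
        for t, stacor in enumerate(row)
        if stacor != DEAD_TRACE_STACOR
    ]


# Hypothetical 2 x 3 grid: one trace is already dead, two are masked out.
stacor = [[0, 0, 30000], [0, 0, 0]]
mask = [[1, -1, 0], [-1, 5, 2]]
masked = apply_mask(stacor, mask)
live = active_positions(masked)
```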


     -OC otapc
          Enter the output file name immediately after typing -OC
          (contains the cluster assignments for each and every
          spatial location). Note that cluster assignments for
          dead and masked traces are set to zero and that cluster
          assignments for deleted traces are set to "number of
          clusters + 1". Has to be specified if in IKP, otherwise
          the program will abort. Will be set to outc.usp if not
          in IKP and not specified.

          Note: The number of clusters used is stored in the
          lineheader word 'MnDpIn'.
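
          A reader of the -OC file has to separate the two
          special codes from real cluster numbers. The Python
          sketch below tallies a list of assignment values. It
          assumes that valid clusters are numbered 1 through
          nclust, which the text implies but does not state; the
          function name and data are hypothetical.

```python
from collections import Counter


def summarize_assignments(assignments, nclust):
    """Count members per cluster and the two special codes.

    0          -> dead or masked trace
    nclust + 1 -> deleted trace
    1..nclust  -> cluster number (assumed numbering)
    """
    summary = Counter()
    for label in assignments:
        if label == 0:
            summary["dead_or_masked"] += 1
        elif label == nclust + 1:
            summary["deleted"] += 1
        elif 1 <= label <= nclust:
            summary[label] += 1
        else:
            raise ValueError(f"unexpected assignment value {label}")
    return dict(summary)


# Hypothetical assignments for eight spatial locations, two clusters.
counts = summarize_assignments([0, 1, 2, 2, 3, 1, 0, 2], nclust=2)
```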

     -OR otapr
          Enter the output file name immediately after typing -OR
          (contains  the  real  part  of  the "initial subset" of
          feature vectors). The "initial subset" of feature  vec-
          tors  includes  dead  and  masked traces (StaCor=30000)
          which will be taken out for cluster processing. Has  to
          be specified if in IKP and the feature output file flag
          (-fout) is specified,  otherwise  program  will  abort.
          Will  be  set  to outr.usp if not in IKP and not speci-
          fied.

     -OI otapi
          Enter the output file name immediately after typing -OI
          (contains the imaginary part of the "initial subset" of
          feature vectors). The "initial subset" of feature  vec-
          tors  includes  dead  and  masked traces (StaCor=30000)
          which will be taken out for cluster processing. Has to
          be specified if in IKP and both the feature output flag
          (-fout) and the imaginary input file (-NI) are
          specified, otherwise the program will abort. Will be set
          to outi.usp if not in IKP and not specified.

     -tfstart, -tfskip and -tfend
          These can be used to select a reduced set of sample
          points (per trace) of the whole input data set. Note
          that the times/frequencies that you specify will be
          internally adjusted to fit the time/frequency grid of
          the existing data. All times/frequencies have to be
          given, and will be set, in milliseconds/Hz,
          respectively. Note that tfend has to be greater than
          tfstart; otherwise an error message is issued.

     -tfstart tfstart
          Specifies the time/frequency of occurrence of the first
          sample point in the trace to be used for processing.
          Will be adjusted internally according to the following
          rules:

          default => TmMsFs (time/frequency of occurrence of the
          first sample point in the trace)

          if tfstart specified is less than or equal to TmMsFs =>
          set tfstart internally to TmMsFs

          if tfstart specified is greater than TmMsFs => set
          tfstart internally to the time/frequency of occurrence
          of the next available sample whose time/frequency of
          occurrence is greater than or equal to tfstart specified

          if tfstart specified is greater than the time/frequency
          of occurrence of the last sample in the trace => set
          tfstart internally to the time/frequency of occurrence
          of the last sample in the trace
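
          The tfstart rules above amount to snapping the request
          up to the sample grid and clamping it to the trace. A
          minimal Python sketch, assuming samples lie at
          TmMsFs + i * interval for i = 0 .. nsamples-1 (the
          function and parameter names are hypothetical):

```python
import math


def adjust_tfstart(tfstart, tmmsfs, interval, nsamples):
    """Snap a requested start time/frequency onto the sample grid."""
    last = tmmsfs + (nsamples - 1) * interval
    if tfstart is None or tfstart <= tmmsfs:
        return tmmsfs  # default, or request at/before the first sample
    if tfstart > last:
        return last    # request beyond the trace
    # Next available sample at or after the requested value.
    steps = math.ceil((tfstart - tmmsfs) / interval)
    return tmmsfs + steps * interval


# Hypothetical trace: 251 samples, 4 ms apart, starting at 0 ms.
start_default = adjust_tfstart(None, 0, 4, 251)
start_snapped = adjust_tfstart(10, 0, 4, 251)
start_clamped = adjust_tfstart(5000, 0, 4, 251)
```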

     -tfskip tfskip
          Specifies the time/frequency increment for the
          selection of trace sample points used for processing.
          Will be adjusted internally according to the following
          rules:

          default => one input sample interval

          if tfskip specified is less than or equal to zero =>
          set tfskip internally to one input sample interval
          (given by lineheader words SmpInt and MutVel for time
          and frequency, respectively)

          if tfskip specified is greater than zero => set tfskip
          internally to the multiple of input sample intervals it
          encloses

          if tfskip specified is greater than the difference
          between the time/frequency of occurrence of the last
          sample point in a trace and the first sample point in a
          trace => set tfskip internally to this time/frequency
          difference
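
          One possible reading of the tfskip rules, sketched in
          Python. Two points are assumptions rather than
          documented behavior: "the multiple of input sample
          intervals it encloses" is taken to mean rounding down
          to a whole number of intervals, and a positive request
          smaller than one interval is raised to one interval.
          All names are hypothetical.

```python
def adjust_tfskip(tfskip, interval, first, last):
    """Snap a requested increment to a whole number of sample intervals."""
    span = last - first
    if tfskip is None or tfskip <= 0:
        return interval  # default: one input sample interval
    if tfskip > span:
        return span      # cannot skip further than the trace is long
    # Assumption: round down, but never below one interval.
    multiples = max(1, int(tfskip // interval))
    return multiples * interval


# Hypothetical trace: samples every 4 ms from 0 ms to 1000 ms.
skip_default = adjust_tfskip(None, 4, 0, 1000)
skip_snapped = adjust_tfskip(10, 4, 0, 1000)
skip_clamped = adjust_tfskip(5000, 4, 0, 1000)
```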

     -tfend tfend
          Specifies the time/frequency of occurrence of the last
          sample point in the trace to be used for processing.
          Will be adjusted internally according to the following
          rules:

          default => time/frequency of occurrence of the last
          sample point in the trace

          if tfend specified is less than or equal to TmMsFs =>
          set tfend internally to the time/frequency of
          occurrence of the last sample in the trace

          if tfend specified is greater than TmMsFs => set tfend
          internally to the time/frequency of occurrence of the
          last available sample whose time/frequency of
          occurrence is less than or equal to tfend specified

          if tfend specified is greater than the time/frequency
          of occurrence of the last sample in the trace => set
          tfend internally to the time/frequency of occurrence
          of the last sample in the trace
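
          The tfend rules mirror those for tfstart, except that
          the request is snapped down to the grid and any
          out-of-range value falls back to the last sample. A
          Python sketch under the same assumption of a regular
          grid starting at TmMsFs (names are hypothetical):

```python
import math


def adjust_tfend(tfend, tmmsfs, interval, nsamples):
    """Snap a requested end time/frequency onto the sample grid."""
    last = tmmsfs + (nsamples - 1) * interval
    if tfend is None or tfend <= tmmsfs or tfend > last:
        return last  # default and both out-of-range cases
    # Last available sample at or before the requested value.
    steps = math.floor((tfend - tmmsfs) / interval)
    return tmmsfs + steps * interval


# Hypothetical trace: 251 samples, 4 ms apart, starting at 0 ms.
end_default = adjust_tfend(None, 0, 4, 251)
end_snapped = adjust_tfend(10, 0, 4, 251)
end_too_small = adjust_tfend(0, 0, 4, 251)
```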

     -ildm ildm
          If not found in the line header, you must  specify  the
          in-line spacing (In-line cell increment)

     -cldm cldm
          If not found in the line header, you must  specify  the
          crossline spacing (Cross-line cell increment)

     -nclust nclust
          Specifies the number of clusters used initially. Note
          that the result might contain fewer clusters than
          specified here (if -sdelini is set to 2 or 3, i.e., if
          clusters having an insufficient number of members are
          deleted).

          default => 5

     -sclustini sclustini
          Specifies the cluster center initialization method:

          default => Option 4

          1 => Select the first nclust  feature  vectors  of  the
          "start  subset"  as cluster centers. The "start subset"
          consists of all the feature  vectors  in  the  "initial
          subset" (see -OR, -OI) corresponding to live traces.

          2 => Select the initial cluster centers  randomly  from
          the "start subset".

          3 => Select the initial cluster centers based on the
          one-pass "furthest neighborhood method". The first
          feature vector of the "start subset" serves as the
          first initial cluster center, and the feature vector
          which is furthest away from this first cluster center
          is chosen as the second cluster center. The next
          nclust-2 cluster centers are selected accordingly,
          i.e., the distance between each remaining feature
          vector and the already established cluster centers is
          obtained, and the feature vector which is furthest away
          from any of these cluster centers is chosen as the
          next cluster center. Caution: this method has a
          pitfall. For example, although the third cluster center
          might be far away from cluster center 1, it might
          actually coincide with cluster center 2.

          4 => Select the initial cluster centers based on the
          "maxmin-distance algorithm" (see J.T. Tou and R.C.
          Gonzalez, Pattern Recognition Principles,
          Massachusetts, Addison-Wesley, 1974). Select the first
          and second cluster center as in the "furthest
          neighborhood method". For each remaining feature
          vector, calculate the distance to each of the already
          existing cluster centers and keep the minimum; the
          feature vector with the largest such minimum distance
          becomes the third cluster center. This procedure is
          repeated to obtain the remaining cluster centers.
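
          The difference between options 3 and 4 comes down to
          how the distances to the existing centers are combined
          for each candidate: the largest distance (option 3) or
          the smallest (option 4). The Python sketch below is an
          illustration under that reading, not the clust3dm
          source; function names and data are hypothetical, and
          ties are broken by taking the first candidate.

```python
import math


def _select_centers(vectors, nclust, combine):
    """Pick nclust indices, starting from the first vector.

    `combine` reduces a candidate's distances to the chosen centers
    to one score; the candidate with the highest score is picked next.
    """
    if not 1 <= nclust <= len(vectors):
        raise ValueError("nclust must be between 1 and the number of vectors")
    chosen = [0]
    while len(chosen) < nclust:
        best_index, best_score = None, -1.0
        for i, vector in enumerate(vectors):
            if i in chosen:
                continue
            score = combine(math.dist(vector, vectors[c]) for c in chosen)
            if score > best_score:
                best_index, best_score = i, score
        chosen.append(best_index)
    return chosen


def furthest_neighborhood(vectors, nclust):
    """Option 3: score = largest distance to any existing center."""
    return _select_centers(vectors, nclust, max)


def maxmin_distance(vectors, nclust):
    """Option 4: score = smallest distance to the existing centers."""
    return _select_centers(vectors, nclust, min)


# Hypothetical "start subset" on a line.
vectors = [(0.0, 0.0), (10.0, 0.0), (9.0, 0.0), (5.0, 0.0)]
option3 = furthest_neighborhood(vectors, 3)
option4 = maxmin_distance(vectors, 3)
```

          With this data, option 3 picks (9, 0) as the third
          center because it is far from center 1, although it
          almost coincides with center 2. Option 4 picks (5, 0),
          which is well separated from both.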


     -sdelthresl sdelthresl
          Specifies the lower membership threshold against which
          the number of members of each cluster is compared.

          default => 0


     -sdelthresu sdelthresu
          Specifies the upper membership threshold against which
          the number of members of each cluster is compared.

          default => 0

          sdelthresl and sdelthresu specify the lower and upper
          membership thresholds. Using these thresholds the user
          can select one of four boundary ranges against which
          the membership number of each cluster is compared.
          Clusters whose membership number falls within the
          selected range are modified according to the
          specification made in sdelini.

          range 0 => default; selected if neither sdelthresl nor
          sdelthresu is specified (or if both are set to zero
          internally). Results in no modification of any cluster.

          range 1 => selected if only sdelthresl is specified (or
          if sdelthresu is set to zero internally). Results in
          modification of all clusters whose membership number
          is <= sdelthresl.

          range 2 => selected if both sdelthresl and sdelthresu
          are specified. Results in modification of all clusters
          whose membership number is > sdelthresl and <
          sdelthresu.

          range 3 => selected if only sdelthresu is specified (or
          if sdelthresl is set to zero internally). Results in
          modification of all clusters whose membership number
          is >= sdelthresu.


          Note: If sdelthresl is specified to be > sdelthresu,
          the program will abort.
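
          The four ranges can be expressed as a small decision
          function. The Python sketch below treats a threshold
          of zero as "not specified", which matches the defaults
          above, and applies the abort rule only when both
          thresholds are given; both points are assumptions, as
          are the function names.

```python
def select_range(sdelthresl=0, sdelthresu=0):
    """Return the boundary range (0-3) chosen by the two thresholds."""
    if sdelthresl and sdelthresu:
        if sdelthresl > sdelthresu:
            raise ValueError("sdelthresl must not exceed sdelthresu")
        return 2
    if sdelthresl:
        return 1
    if sdelthresu:
        return 3
    return 0


def in_boundary(members, sdelthresl=0, sdelthresu=0):
    """True if a cluster with `members` members is to be modified."""
    chosen = select_range(sdelthresl, sdelthresu)
    if chosen == 1:
        return members <= sdelthresl
    if chosen == 2:
        return sdelthresl < members < sdelthresu
    if chosen == 3:
        return members >= sdelthresu
    return False  # range 0: no cluster is modified


# Hypothetical membership counts for four clusters.
membership = [3, 40, 120, 900]
small = [in_boundary(m, sdelthresl=10) for m in membership]
middle = [in_boundary(m, sdelthresl=10, sdelthresu=500) for m in membership]
large = [in_boundary(m, sdelthresu=500) for m in membership]
```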

     -sdelini sdelini
          Specifies the modification method. Action is  taken  if
          the  membership  number of any cluster falls within the
          boundary range given by sdelthresl and sdelthresu:

          default => Option 3

          1 => Deletes all feature  vectors  which  belong  to  a
          cluster whose membership number falls within the thres-
          hold boundary set by  sdelthresl and  sdelthresu.   The
          number of cluster centers stays the same. The feature
          vectors that have been deleted are given the value
          "nclust + 1" in the cluster assignment output file (see
          -OC), where nclust is the final number of clusters.
          Cluster centers whose members have been deleted are
          moved to be near a cluster center of an adequately
          (membership number within boundary) populated cluster.
          The cluster with the most members is selected.

          2 => Deletes all feature  vectors  which  belong  to  a
          cluster whose membership number falls within the thres-
          hold boundary set by  sdelthresl and  sdelthresu.   The
          number of cluster centers is also reduced.

          3 => Deletes all cluster centers which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu. This
          will reduce the number of clusters used but it will not
          affect the number of feature vectors used. Using this
          option one can distribute the feature vectors which
          fall in sparsely populated clusters among the more
          densely populated clusters.

          4 => Moves all cluster centers which belong to a
          cluster whose membership number falls within the
          threshold boundary set by sdelthresl and sdelthresu. The
          cluster centers are not deleted. They are moved to be
          near a feature vector of an adequately (membership
          number within boundary) populated cluster. The cluster
          whose cluster center is closest (in the Euclidean
          sense) to the cluster center to be moved is
          selected. This leads to a splitting of this adequately
          populated cluster.
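
          As an illustration of the default (option 3), the
          Python sketch below drops the flagged cluster centers
          and hands every feature vector to the nearest
          remaining center. How clust3dm performs the
          redistribution internally is not stated here, so this
          is only one plausible reading; names and data are
          hypothetical.

```python
import math


def delete_flagged_centers(vectors, centers, flagged):
    """Remove flagged centers and reassign all vectors (option 3 sketch).

    `flagged` holds the 0-based indices of centers whose membership
    number fell within the threshold boundary.
    """
    kept = [c for k, c in enumerate(centers) if k not in flagged]
    if not kept:
        raise ValueError("at least one cluster center must remain")
    labels = [
        min(range(len(kept)), key=lambda k: math.dist(v, kept[k]))
        for v in vectors
    ]
    return kept, labels


# Hypothetical data: the middle cluster holds a single vector.
vectors = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (4.0, 0.0), (10.0, 0.0)]
kept, labels = delete_flagged_centers(vectors, centers, flagged={1})
```

          No feature vector is removed: the lone member of the
          deleted cluster joins the nearest surviving cluster.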

     -maxcycle maxcycle
          Specifies the maximum number of iteration cycles
          (K-means cycles) that will occur.

          default => 0, which places no limit on the number of
          iteration cycles that can occur.


     -kmthres kmthres
          Specifies the threshold used for each K-means cycle. As
          soon as the number of reassignments is less than or
          equal to kmthres, convergence is assumed.

          default => 0
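
          The interplay of -kmthres and -maxcycle can be seen in
          a plain K-means loop. The Python sketch below is a
          textbook version, not the modified algorithm of
          clust3dm. It assumes that every vector counts as
          reassigned in the first cycle and that empty clusters
          keep their center; all names are hypothetical.

```python
import math


def nearest(vector, centers):
    """Index of the center closest to `vector` (first one on ties)."""
    return min(range(len(centers)), key=lambda k: math.dist(vector, centers[k]))


def kmeans(vectors, centers, kmthres=0, maxcycle=0):
    """Iterate until reassignments <= kmthres or maxcycle cycles are done.

    maxcycle == 0 means no limit on the number of cycles.
    """
    centers = [tuple(c) for c in centers]
    labels = [None] * len(vectors)
    cycle = 0
    while maxcycle == 0 or cycle < maxcycle:
        cycle += 1
        new_labels = [nearest(v, centers) for v in vectors]
        reassigned = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels
        # Move each center to the mean of its current members.
        for k in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
        if reassigned <= kmthres:
            break  # convergence is assumed
    return centers, labels, cycle


# Hypothetical data with deliberately poor initial centers.
vectors = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
start = [(0.0, 0.0), (2.0, 0.0)]
centers, labels, cycles = kmeans(vectors, start)
_, capped_labels, capped_cycles = kmeans(vectors, start, maxcycle=1)
```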


     -seed seed
          Specifies the seed value for the random number
          generator used by the random cluster initialization
          method (see -sclustini).

          default => 1


     -fout
          Enter the command line argument '-fout' if you want to
          output the "initial subset" of feature vectors (see
          -OR and -OI). This gives the user the ability to use
          the same feature vectors in another clust3dm session
          working with a different parameter set.


     -V   Enter the command line argument '-V' to get  additional
          printout.

     -?   Enter the command line argument '-?' to get online
          help. The program terminates after the help screen is
          printed.

BUGS

     None currently known.

REFERENCES:

     Kirlin, R.L., 1996, "Considerations for Clustering and
     Segmentation of Reflections for 3D Horizon Feature
     Enhancement", R. Lynn Kirlin, Inc.
     Kaufhold, B., Kirlin, R.L., Dizaji, R.M., 1998, "A
     Clustering Based Blind System Identification Approach",
     submitted to ICA '99, France

CONTRACT AGREEMENT

     This product is brought to you by Research  Contract  Agree-
     ment  2548 (Seismic coherency cube). Thank you for your sup-
     port.

AUTHOR

     Bertram Kaufhold (1998)

COPYRIGHT

     copyright 2001, Amoco Production Company
               All Rights Reserved
          an affiliate of BP America Inc.
