fluent-plugin-forest released! - たごもりすメモ

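Right now fluentd has no good way to handle tags dynamically. Concretely, if you want a configuration parameter to vary with the tag, the only option is to write it out separately for each tag. For example, if you want out_file to name its output file after the tag, you need one match section per tag.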

<match hoge>
  type file
  path /var/log/hoge.log
</match>
<match pos>
  type file
  path /var/log/pos.log
</match>
# ...and many more like these

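Even at a quick glance, this has two big problems.

With many tags, the configuration file as a whole bloats and the cost of managing it goes up (and its quality goes down)
Every time a new tag is added, the configuration file has to be updated and applied, which again raises the management cost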

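I was already suffering from this myself: the configuration file for the fluentd*1 that writes to HoopServer had grown to 417 lines for 18 tags, which is pretty much a terminal condition. Most of it is copy-and-paste, but the nasty part is that it differs in a few places here and there.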

Frankly, I would be happy if I could just write something like this and have fluentd take care of the rest.

<match *>
  type file
  path /var/log/__TAG__.log
</match>

If it does not exist, just build it: a plugin

fluent-plugin-forest

So I wrote fluent-plugin-forest. After running it for a full day it showed no apparent problems, so I released it on rubygems.org and have already deployed it to all of my own servers.

tagomoris/fluent-plugin-forest · GitHub
fluent-plugin-forest | RubyGems.org | your community gem host

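With it, the 417-line configuration file shrank to 55 lines. On top of that, adding a tag no longer requires a restart, and the per-tag differences in configuration are very easy to see. A heavenly state of affairs! Yay!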

Incidentally, the simple out_file pattern above looks like this.

<match *>
  type forest
  subtype file
  <template>
    path /var/log/__TAG__.log
  </template>
</match>
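To make the placeholder idea concrete, here is a tiny Ruby sketch of what substituting __TAG__ into the path amounts to. It is only an illustration: `PATH_TEMPLATE` and `path_for` are hypothetical names of mine, not part of the plugin.

```ruby
# Hypothetical illustration of __TAG__ substitution; not the plugin's real code.
PATH_TEMPLATE = '/var/log/__TAG__.log'

# Return the path the template resolves to for one tag.
def path_for(tag)
  PATH_TEMPLATE.gsub('__TAG__', tag)
end

%w[hoge pos].each { |tag| puts "#{tag} -> #{path_for(tag)}" }
```

For the tags hoge and pos this yields the same two paths that the hand-written match sections at the top spell out one by one.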

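How it can be used

fluent-plugin-forest works as follows.

It creates one output plugin instance per tag
- This corresponds (roughly) to adding one more <match> section to the configuration file*2
- Note that this means one instance per complete tag string, not one per tag pattern*3
The placeholder __TAG__ can be used anywhere in the configuration
- That part is replaced with the tag string before the configuration is handed to the output plugin
- Note that the tag inserted here is the one after remove_prefix / add_prefix have been evaluated
The configuration can contain template sections and case sections
- A template section holds settings applied to every tag string
- A case section can carry a pattern, as in case hoge.**, and holds settings applied only when the tag matches that pattern
For now, with output plugins that use subsections such as server or store, there is no way to replace only the contents of such a subsection
- In other words, rewriting just the one destination line of the forward plugin is something forest cannot do
- I plan to get to it eventually*4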
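The behaviour listed above can be sketched in a few lines of Ruby. This is only a rough model under assumptions of mine, not the plugin's actual code: `TEMPLATE`, `CASES`, `settings_for` and `INSTANCES` are hypothetical names, the first matching case is assumed to win, plain hashes stand in for output plugin instances, and `File.fnmatch?` stands in for fluentd's own tag pattern matcher (the two differ for tags that contain dots).

```ruby
# Hypothetical sketch of the behaviour described above; not the plugin's real code.

# Settings shared by every tag, mirroring a <template> section.
TEMPLATE = { 'hoop_server' => 'hoop.server.local:14000', 'flush_interval' => '60s' }.freeze

# Ordered [pattern, settings] pairs, mirroring <case ...> sections.
CASES = [
  ['{serviceX,serviceZ}', { 'path' => '/hoop/log/%Y%m%d/__TAG__-%Y%m%d-%H.NODENAME.log' }],
  ['*',                   { 'path' => '/hoop/log/%Y%m%d/__TAG__-%Y%m%d-%H.log' }]
].freeze

def settings_for(raw_tag, remove_prefix: 'converted')
  # The prefix is stripped first, so __TAG__ sees the shortened tag.
  tag = raw_tag.delete_prefix("#{remove_prefix}.")
  # Assumption: the first matching case wins.
  # File.fnmatch? is a stand-in for fluentd's tag matcher.
  _pattern, overrides = CASES.find do |pattern, _|
    File.fnmatch?(pattern, tag, File::FNM_EXTGLOB)
  end
  TEMPLATE.merge(overrides || {}).transform_values { |value| value.gsub('__TAG__', tag) }
end

# One entry per distinct tag string, built the first time that tag shows up.
INSTANCES = Hash.new { |cache, raw_tag| cache[raw_tag] = settings_for(raw_tag) }

puts INSTANCES['converted.serviceX']['path']
puts INSTANCES['converted.serviceY']['path']
```

Because the cache is keyed by the full tag string, it gains an entry for every distinct tag it ever sees, which is the reason for the warning in *3.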

With that understood, here is a real example from my own setup. First, the old configuration. The fine details around source and the final forward are omitted.

# original configurations
<source>
  type forward
</source>

<match converted.serviceX>
  type copy
  <store> # Writes to the Hoop server. Traffic is heavy, so files are split per node; NODENAME is actually written out per server as the name of the node fluentd runs on
    type hoop
    hoop_server hoop.server.local:14000
    path /hoop/log/%Y%m%d/serviceX-%Y%m%d-%H.NODENAME.log
    username hoopuser
    flush_interval 60s
    output_include_time false
    output_include_tag  false
    output_data_type attr:hhmmss,vhost,path,method,status,bytes,duration,referer,rhost,userlabel,agent,FLAG,status_redirection,status_errors,rhost_internal,suffix_miscfile,suffix_imagefile,agent_bot
    add_newline true
  </store>
  <store> # Sampling settings for monitoring
    type sampling_filter
    interval 100
    remove_prefix converted
    add_prefix sampled.100
  </store>
</match>

<match converted.serviceY> # Low-traffic service; differs only in that path has no NODENAME and the sampling rate is 10
  type copy
  <store>
    type hoop
    hoop_server hoop.server.local:14000
    path /hoop/log/%Y%m%d/serviceY-%Y%m%d-%H.log
    username hoopuser
    flush_interval 60s
    output_include_time false
    output_include_tag  false
    output_data_type attr:hhmmss,vhost,path,method,status,bytes,duration,referer,rhost,userlabel,agent,FLAG,status_redirection,status_errors,rhost_internal,suffix_miscfile,suffix_imagefile,agent_bot
    add_newline true
  </store>
  <store>
    type sampling_filter
    interval 10
    remove_prefix converted
    add_prefix sampled.10
  </store>
</match>

# ...repeated 16 more times (!)

<match sampled.**>
  type forward
  <server>
    host watcher01.local
  </server>
  <server>
    host watcher02.local
  </server>
</match>

Just looking at it gives you a headache!!!!! Using fluent-plugin-forest, I squeezed it down to just this.

<source>
  type forward
</source>

<match converted.*>
  type copy
  <store>
    type forest
    subtype hoop
    remove_prefix converted
    <template>
      hoop_server hoop.server.local:14000
      username hoopuser
      flush_interval 60s
      output_include_time false
      output_include_tag  false
      output_data_type attr:hhmmss,vhost,path,method,status,bytes,duration,referer,rhost,userlabel,agent,FLAG,status_redirection,status_errors,rhost_internal,suffix_miscfile,suffix_imagefile,agent_bot
      add_newline true
    </template>
    <case {serviceX,serviceZ}>
      path /hoop/log/%Y%m%d/__TAG__-%Y%m%d-%H.NODENAME.log
    </case>
    <case *>
      path /hoop/log/%Y%m%d/__TAG__-%Y%m%d-%H.log
    </case>
  </store>
  <store>
    type forest
    subtype sampling_filter
    remove_prefix converted
    <case {serviceA,serviceX}>
      interval 100
      add_prefix sampled.100
    </case>
    <case *>
      interval 10
      add_prefix sampled.10
    </case>
  </store>
</match>

<match sampled.**>
  type forward
  <server>
    host watcher01.local
  </server>
  <server>
    host watcher02.local
  </server>
</match>

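That is all of it. How about this revolutionarily easy-to-read (?) configuration! Now, when tags are added, no configuration change is needed as long as they just follow the default rules, and maintenance becomes extremely simple. Nothing more to ask for!!!!!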

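*1: While it is at it, it also does sampling for monitoring at a specific rate per service.

*2: In terms of memory usage this is exactly the case. The CPU spent on matching differs slightly, but it should hardly increase. The CPU used by the output plugin itself does not differ at all.

*3: If you feed forest something like tags with timestamps embedded in them, terrible things will happen.

*4: It needs an implementation that carefully walks the nodes of the configuration object, which is a hassle. Won't somebody write me a patch? lol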