I’ve been having trouble indexing CSV files. In particular, CSV files from Tripwire. I’ll show you the format and how I was able to index the files in Splunk
Node Name,Node Type,Policy,Parent Test Group,Test Name,Description,Element,Result Time,Result State,Actual Value
“192.168.1.1”,Linux Server,“My Policy Name”,“My Test Group”,“My Test Name”,“My Test Description”,“Some Element”,10/25/15 2:02 AM,passed,“ELEMENT=foo”
Here’s my Props.conf stanza:
[te_scm_csv]
DATETIME_CONFIG = NONE
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:")([\r\n]+)
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1
Explanation:
- DATETIME_CONFIG – I want the Splunk events’ _time field to be the index time (AKA when the CSV file was generated, not the “Result Time”)
- KV_MODE – I’m not sure if this is necessary, but I disabled it just to be sure
- NO_BINARY_CHECK – same
- SHOULD_LINEMERGE – some lines in the CSV had line breaks in the field values. Splunk was throwing away these events, so I had to override the default (see LINE BREAKER, next)…
- LINE_BREAKER – In combination with SHOULD_LINEMERGE, this causes Splunk to create a new event only when a double quote is followed by a CRLF sequence.
- INDEXED_EXTRACTIONS – explicitly tell Splunk this is a CSV
- HEADER_FIELD_LINE_NUMBER – by default, Splunk “automatically detects” this. This is not a problem for most CSV files, but for this file format, it was causing the header row to be indexed as an event.
As a verification step, I loaded the same CSV file into Excel and cross-checked my event counts with a Pivot Table.