I have to run some rather lengthy and legacy PHP scripts using cron. I want to capture the pid, output (both std and err) and log it to a file, but I would also like every line of this output to be timestamped. These scripts are quite fragile and it is not viable to properly instrument them with logging.
I came up with a bit of bash + gawk script that accomplishes this like so:
# run php in a subshell, capturing the PID and redirecting the output. (php $@ 2>&1 & echo $! >&3) 3> $PID_FILE | gawk -v php_pid=$(while ! test -s $PID_FILE; do sleep .1; done; cat $PID_FILE) '{ print "[" php_pid strftime(" %Y-%m-%d %H:%M:%S] "), $0 }' >> $LOG_FILE
This works OK, however, it’s a rather convoluted and obscure solution to an otherwise simple problem.
Is there a simpler, better solution or tool to accomplish this?
Advertisement
Answer
an actual alternative –
Let awk
handle the wait.
mkfifo pipe_name # a FIFO as a tempfile for un-timestamped data php $@ > pipe_name 2>&1 & # your program in background to the FIFO echo $! > $PID_FILE # the pid into a holding file gawk 'BEGIN { while ( 0 == getline php_pid < "pid" ) { system("sleep .1") } } { print "[" php_pid strftime(" %Y-%m-%d %H:%M:%S] "), $0 }' pipe_name >> $LOG_FILE rm -f pid pipe_name # clean up
I don’t like that this spawns a whole subshell for a tenth-second sleep, but it avoids loading another extension and having to be concerned whether this local system is above or below version 5.1… (c.f. the manual entry for details)
just a rewrite
On closer inspection, this solution is riddled with problems and I don’t recommend you use it.
First, it fails if the input has more than one string.
$: for x in one two three; do sleep 1; echo test $x; done | sed 's/^/date +"%F %T - "/e' date: extra operand ‘one’ Try 'date --help' for more information. date: extra operand ‘two’ Try 'date --help' for more information. date: extra operand ‘three’ Try 'date --help' for more information.
I can fix that by putting the whole line into the date format:
$: for x in one two three; do sleep 1; echo test $x; done | sed -E 's/^(.*)/date +"%F %T - 1"/e' 2020-12-09 09:50:15 - test one 2020-12-09 09:50:16 - test two 2020-12-09 09:50:18 - test three
but that will corrupt any data it sees as a valid formatting directive.
$: for x in one two three; do sleep 1; echo "test %c $x"; done | sed -E 's/^(.*)/date +"%F %T - 1"/e' 2020-12-09 09:51:56 - test Wed, Dec 09, 2020 9:51:56 AM one 2020-12-09 09:51:57 - test Wed, Dec 09, 2020 9:51:57 AM two 2020-12-09 09:51:58 - test Wed, Dec 09, 2020 9:51:58 AM three
This method is really much kludgier than what you already had, as it’s spawning a date
process for every line…
Adding in the PID is the only reason it becomes problematic, and your solution for that, while visually painful, really isn’t bad at all.
Taking another look… is the PID file really taking long enough to write that spawning the gawk
reads an empty? This works fine for me:
$: (./tst 2>&1 & echo $! >pid ) | awk -v php_pid=$(<pid) '{ print "[" php_pid strftime(" %Y-%m-%d %H:%M:%S] "), $0 }'
If you just want a slightly less confusing, more maintainable way to write basically the same logic –
{ ./tst 2>&1 & echo $! >pid; } | { declare -i val=0; until (( val )); do cnt=$(<pid); sleep .1; done; awk -v php_pid=$(<pid) '{print "[" php_pid strftime(" %Y-%m-%d %H:%M:%S] "), $0 }'; }
Switched to curly enclosure, works fine for this.
Or, using your values,
{ php $@ 2>&1 & echo $! > $PID_FILE; } | # run the job in bg, save PID { declare -i val=0; # initialize a gatekeeper until (( val )); do cnt=$(<$PID_FILE); sleep .1; done; # wait till pid is loaded # now process the data gawk -v php_pid=$(<$PID_FILE) '{ print "[" php_pid strftime(" %Y-%m-%d %H:%M:%S] "), $0 }' >> $LOG_FILE }
original
Pipe it through sed
with a expression substitution.
$: for x in one two three; do sleep 1; echo $x; done | sed 's/^/date +"%F %T - "/e' 2020-12-04 10:26:25 - one 2020-12-04 10:26:26 - two 2020-12-04 10:26:27 - three