

Understanding `format6` in OCaml by examples

2021-01-13

λ. 1. Background

Printing in OCaml was always a pain to me. Most of my hacked-together printing was done with print_string and string concatenation. My original motivation was to understand how to print in OCaml. Answering this question may need a series of posts (for which I have a vague plan).

This post concerns format6, the ultimate type used by all OCaml modules for reading and printing. Most of the content comes from the examples I tried in order to understand format6. I list the references in the last section; I owe the most to the paper Format Unraveled.

λ. 2. Introduction

OCaml has four public modules involving reading and printing: Scanf, Printf, Format and Stdlib. Stdlib wraps definitions from the internal module CamlinternalFormatBasics.

Scanf and Printf mirror each other, one reading and the other writing, while Format can be viewed as an enhanced Printf with pretty-printing. Scanf has several *scanf functions. Printf has *printf functions. Format has *printf and pp_* functions. Each module uses type definitions, e.g. Scanf.scanner, format and format4, that finally relate to ('a, 'b, 'c, 'd, 'e, 'f) format6.

# Scanf.scanf
- : ('a, 'b, 'c, 'd) Scanf.scanner = <fun>

# #show Scanf.scanner
type nonrec ('a, 'b, 'c, 'd) scanner =
    ('a, Scanf.Scanning.scanbuf, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c

# Printf.fprintf
- : out_channel -> ('a, out_channel, unit) format -> 'a = <fun>

# Format.fprintf
- : Format.formatter -> ('a, Format.formatter, unit) format -> 'a = <fun>

# #show format
type nonrec ('a, 'b, 'c) format = ('a, 'b, 'c, 'c) format4

# #show format4
type nonrec ('a, 'b, 'c, 'd) format4 = ('a, 'b, 'c, 'c, 'c, 'd) format6

# #show format6
type nonrec ('a, 'b, 'c, 'd, 'e, 'f) format6 =
    ('a, 'b, 'c, 'd, 'e, 'f) format6
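
As a quick sanity check of the alias chain above, here is a minimal sketch that annotates one literal with format, format4 and format6 in turn. The names f3, f4, f6 and rendered are made up for this illustration, and the concrete parameters (unit, string) are simply the ones Printf.sprintf asks for.

```ocaml
(* One literal, three views: format is format4 with its 4th parameter tied to
   the 3rd, and format4 is format6 with the 4th and 5th tied to the 3rd. *)
let f3 : (int -> string, unit, string) format = "%d"
let f4 : (int -> string, unit, string, string) format4 = f3
let f6 : (int -> string, unit, string, string, string, string) format6 = f4

(* Printf.sprintf asks for a [format]; the fully spelled-out f6 fits. *)
let rendered = Printf.sprintf f6 42
```

If the three annotations did not describe the same type, the definitions of f4 and f6 would be rejected by the type checker.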

λ. 3. Format strings

After all, what is ('a, 'b, 'c, 'd, 'e, 'f) format6? The paper Format Unraveled describes it as holding the record for the most polymorphic datatype of the entire OCaml library. This is the main question this post will answer.

# ("%d" : _ format6);;
- : (int -> 'a, 'b, 'c, 'd, 'd, 'a) format6 =
CamlinternalFormatBasics.Format
 (CamlinternalFormatBasics.Int (CamlinternalFormatBasics.Int_d,
   CamlinternalFormatBasics.No_padding,
   CamlinternalFormatBasics.No_precision,
   CamlinternalFormatBasics.End_of_format),
 "%d")

# let fmt = ("%d" : _ format6) in Printf.printf fmt 1
1
- : unit = ()

One first impression is that we can annotate a string literal as a format string and check its type. format6 is defined in the module CamlinternalFormatBasics. It’s built on a GADT, and the GADT hints at a heterogeneous list. I open this module to keep the code blocks readable. We don’t need to read the internal files to understand it. You can observe a lot just by watching examples.

# open CamlinternalFormatBasics
# ("%d" : _ format6);;
- : (int -> 'a, 'b, 'c, 'd, 'd, 'a) format6 =
Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

# let fmt = ("%d" : _ format6) in let _ = Printf.printf fmt 1 in fmt
1
- : (int -> '_weak1, '_weak2, '_weak3, '_weak4, '_weak4, '_weak1) format6 =
Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

A format string is almost a string, but with type format6. To make a format string explicitly, one can use a type annotation as above, or use format_of_string, which is just an identity function with a format6 type. Stdlib also provides ^^ to concatenate two format strings.

# #show format_of_string
external format_of_string :
  ('a, 'b, 'c, 'd, 'e, 'f) format6 -> ('a, 'b, 'c, 'd, 'e, 'f) format6
  = "%identity"

# #show (^^)
val ( ^^ ) :
  ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
  ('f, 'b, 'c, 'e, 'g, 'h) format6 -> ('a, 'b, 'c, 'd, 'g, 'h) format6

# let fmt = "%d" ^^ " %d" in Printf.printf fmt 1 2
1 2
- : unit = ()
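
The two helpers can be combined in a small sketch like the following. The names head, tail, joined and text are hypothetical; string_of_format is the Stdlib function that goes the other way and returns the literal text of a format string.

```ocaml
let joined =
  (* Two pieces built separately, then glued with ^^ before printing. *)
  let head = format_of_string "%d" in
  let tail = format_of_string " and %s" in
  Printf.sprintf (head ^^ tail) 1 "two"

(* string_of_format recovers the text the format string was written with. *)
let text = string_of_format (format_of_string "%d and %s")
```

The bindings for head and tail are kept local so that their not-yet-resolved type variables are fixed by the single sprintf call.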

λ. 4. Type parameters of format6 (at first sight)

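Both the paper Format Unraveled (section 4.2) and the official manual (stdlib#FormatStrings) describe the meaning of each type parameter in ('a, 'b, 'c, 'd, 'e, 'f) format6. I will briefly describe the types, discuss them with examples, and then revisit the material once we have a good intuition.

The types (at first sight), following the manual’s description, are

  1. 'a: for printf, the type of the parameters of the format; for scanf, the type of the values read.
  2. 'b: the type of the output target for printf, or of the input source for scanf.
  3. 'c: the result type of the %a and %t printing functions, and the type of the argument passed to the continuation of kprintf-style functions.
  4. 'd: the type of the parameters of scanf-style functions.
  5. 'e: the type of the receiver function of scanf-style functions.
  6. 'f: the final result type of a printf or scanf call.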

'c is only meaningful for printf functions, and 'd and 'e are only meaningful for scanf functions. That’s why printf uses format (it should be named format3), where the 4th, 5th and 6th parameters are just duplicates of 'c, or format4, where the 4th and 5th are. It also explains why scanf functions have an unresolved 'b in Scanf.scanner, which corresponds to the 3rd type parameter 'c of format6.

For explicitness, I will only use ordinal numbers to refer to the type parameters of format6, e.g. the 3rd type parameter of format6 (named 'b in Scanf.scanner). Otherwise, a bare name such as 'c is ambiguous.

# #show format
type nonrec ('a, 'b, 'c) format = ('a, 'b, 'c, 'c) format4

# #show format4
type nonrec ('a, 'b, 'c, 'd) format4 = ('a, 'b, 'c, 'c, 'c, 'd) format6

# #show format6
type nonrec ('a, 'b, 'c, 'd, 'e, 'f) format6 =
    Format of ('a, 'b, 'c, 'd, 'e, 'f) fmt * string

# #show Scanf.scanner
type nonrec ('a, 'b, 'c, 'd) scanner =
    ('a, Scanf.Scanning.scanbuf, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c
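
To make the positions concrete, here is a sketch that spells out all six parameters for one printing use and one scanning use of the same literal "%d". The names p, s, printed and scanned are made up, and picking unit for the 3rd parameter on the scanning side is an arbitrary choice, since Scanf.scanner leaves it free.

```ocaml
(* Printing with sprintf: the 2nd parameter is unit, and the 3rd to 6th all
   collapse to string, which is also where the 1st parameter ends. *)
let p : (int -> string, unit, string, string, string, string) format6 = "%d"
let printed = Printf.sprintf p 7

(* Scanning with sscanf: the 2nd parameter is the scan buffer and the 3rd is
   not constrained by Scanf.scanner (unit is chosen arbitrarily here). The 4th
   and 5th mention the receiver function int -> bool; the 6th is its result. *)
let s :
  (int -> bool, Scanf.Scanning.scanbuf, unit,
   (int -> bool) -> bool, (int -> bool) -> bool, bool) format6 = "%d"
let scanned = Scanf.sscanf "7" s (fun n -> n > 3)
```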

λ. 5. Source of examples

The examples come from combinations of modules {Scanf, Printf, Format}, functions {scanf, printf} and format strings {"%d", "%d %d", "%a", "%a %a", "%r", "%r %r"}.

Format strings contain literal characters and conversions that start with %, e.g. %d for an integer. In printf, %d prints an integer that the user needs to provide, while in scanf, %d reads an integer from the source. Most conversions are inherited from the C language. In OCaml, %a means a user-defined printer, and %r means a user-defined reader.
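
The two directions can be seen side by side in a small round trip. This is only a sketch; the names line, count and item are invented for the illustration.

```ocaml
(* printf direction: the caller supplies the values for %d and %s. *)
let line = Printf.sprintf "%d %s" 42 "apples"

(* scanf direction: the same conversions read the values back from the source
   and hand them to the receiver function given as the last argument. *)
let count, item = Scanf.sscanf line "%d %s" (fun n s -> (n, s))
```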

λ. 5.1 Typing format strings

Let’s first check standalone format strings.

# ("%d" : _ format6);;
- : (int -> 'a, 'b, 'c, 'd, 'd, 'a) format6 =
Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

# ("%d %d" : _ format6);;
- : (int -> int -> 'a, 'b, 'c, 'd, 'd, 'a) format6 =
Format
 (Int (Int_d, No_padding, No_precision,
   Char_literal (' ', Int (Int_d, No_padding, No_precision, End_of_format))),
 "%d %d")

# ("%a" : _ format6);;
- : (('a -> 'b -> 'c) -> 'b -> 'd, 'a, 'c, 'e, 'e, 'd) format6 =
Format (Alpha End_of_format, "%a")

# ("%a %a" : _ format6);;
- : (('a -> 'b -> 'c) -> 'b -> ('a -> 'd -> 'c) -> 'd -> 'e, 'a, 'c, 'f, 'f,
     'e)
    format6
= Format (Alpha (Char_literal (' ', Alpha End_of_format)), "%a %a")

# ("%r" : _ format6);;
- : ('a -> 'b, 'c, 'd, ('c -> 'a) -> 'e, 'e, 'b) format6 =
Format (Reader End_of_format, "%r")

# ("%r %r" : _ format6);;
- : ('a -> 'b -> 'c, 'd, 'e, ('d -> 'a) -> ('d -> 'b) -> 'f, 'f, 'c) format6
= Format (Reader (Char_literal (' ', Reader End_of_format)), "%r %r")

Recall that the 1st type parameter 'a is the type of the arguments after the format string fmt, plus an extra return type. Here we have

  "%d"      int -> 'a
  "%d %d"   int -> int -> 'a
  "%a"      ('a -> 'b -> 'c) -> 'b -> 'd
  "%a %a"   ('a -> 'b -> 'c) -> 'b -> ('a -> 'd -> 'c) -> 'd -> 'e
  "%r"      'a -> 'b
  "%r %r"   'a -> 'b -> 'c

Cases %d and %d %d are straightforward: they take one or two ints and return 'a. Case %a takes a user-defined printer 'a -> 'b -> 'c and a value 'b. A better annotation of a user-defined printer for 't is 'low_level_device -> 't -> 'printer_result. Case %a %a takes two printers and two values. Each printer must match its value: ('a -> 'b -> 'c) goes with 'b, and ('a -> 'd -> 'c) goes with 'd. Case %r is different: its 1st parameter 'a -> 'b only mentions the value read ('a), while the user-defined reader 'c -> 'a shows up in the 4th parameter. A better annotation for the reader is 'low_level_device -> 't, where 'low_level_device is the 2nd type parameter of format6.

Note that the 6th type parameter of format6 is always the return type of the 1st type parameter, even when the format string is not used in any printf or scanf function.
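
The printer template 'low_level_device -> 't -> 'printer_result can be tried out with Printf.sprintf, where the device is unit and the printer result is string. This is a minimal sketch; show_point, one and two are hypothetical names.

```ocaml
(* A printer in the template shape: device (unit) -> value -> result (string). *)
let show_point () (x, y) = Printf.sprintf "(%d, %d)" x y

(* One %a consumes a printer and then a value of the matching type. *)
let one = Printf.sprintf "%a" show_point (1, 2)

(* Two %a share the device and result types, but each printer is paired with
   its own value type (an int pair, then a bool). *)
let two =
  Printf.sprintf "%a %a" show_point (1, 2) (fun () b -> string_of_bool b) true
```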

λ. 5.2 Data type and user-defined functions

Let’s prepare two types, foo and bar, with user-defined printers and readers. As we just said in the previous section, a user-defined printer has type 'low_level_device -> 't -> 'printer_result, e.g. pp_foo and out_foo here. A user-defined reader has type 'low_level_device -> 't, e.g. scan_foo here.

# type foo = F1 | F2
type foo = F1 | F2
# let pp_foo fmt foo = Format.fprintf fmt "%s" (match foo with | F1 -> "f1" | F2 -> "f2")
val pp_foo : Format.formatter -> foo -> unit = <fun>
# let out_foo oc foo = Printf.fprintf oc "%s" (match foo with | F1 -> "f1" | F2 -> "f2")
val out_foo : out_channel -> foo -> unit = <fun>
# let scan_foo ic = Scanf.bscanf ic "%s" (function | "f1" -> F1 | "f2" -> F2 | _ -> raise (Scanf.Scan_failure "foo"))  
val scan_foo : Scanf.Scanning.scanbuf -> foo = <fun>

# type bar = B1 | B2
type bar = B1 | B2
# let pp_bar fmt bar = Format.fprintf fmt "%s" (match bar with | B1 -> "b1" | B2 -> "b2")
val pp_bar : Format.formatter -> bar -> unit = <fun>
# let out_bar oc bar = Printf.fprintf oc "%s" (match bar with | B1 -> "b1" | B2 -> "b2")
val out_bar : out_channel -> bar -> unit = <fun>
# let scan_bar ic = Scanf.bscanf ic "%s" (function | "b1" -> B1 | "b2" -> B2 | _ -> raise (Scanf.Scan_failure "bar"))  
val scan_bar : Scanf.Scanning.scanbuf -> bar = <fun>

As expected, pp_foo has type Format.formatter -> foo -> unit, an instance of the template 'low_level_device -> 't -> 'printer_result, and so does out_foo with out_channel -> foo -> unit. scan_foo has type scanbuf -> foo, an instance of the template 'low_level_device -> 't.

λ. 5.3 Examples of Scanf

# Scanf.sscanf
- : string -> ('a, 'b, 'c, 'd) Scanf.scanner = <fun>

# #show Scanf.scanner
type nonrec ('a, 'b, 'c, 'd) scanner =
    ('a, Scanf.Scanning.scanbuf, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c

sscanf scans data from a string, hence the leading string -> _ in its type. As the string is wrapped into a scanning buffer in the implementation, we get scanbuf as the 2nd type parameter of format6. The 3rd type parameter (the 'b of Scanf.scanner) is ignored by scanf and stays unresolved (_weak<#> here).

# let fmt = format_of_string "%d" in let _ = Scanf.sscanf "1" fmt (fun x -> x + 1) in fmt
- : (int -> int, Scanf.Scanning.scanbuf, '_weak5, (int -> int) -> int,
     (int -> int) -> int, int)
    format6
= Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

# let fmt = format_of_string "%d %d" in let _ = Scanf.sscanf "1 2" fmt (fun x y -> x = y) in fmt
- : (int -> int -> bool, Scanf.Scanning.scanbuf, '_weak6,
     (int -> int -> bool) -> bool, (int -> int -> bool) -> bool, bool)
    format6
=
Format
 (Int (Int_d, No_padding, No_precision,
   Char_literal (' ', Int (Int_d, No_padding, No_precision, End_of_format))),
 "%d %d")

# let fmt = format_of_string "%r" in let _ = Scanf.sscanf "f1" fmt scan_foo (fun _x -> true) in fmt
- : (foo -> bool, Scanf.Scanning.scanbuf, '_weak7,
     (Scanf.Scanning.scanbuf -> foo) -> (foo -> bool) -> bool,
     (foo -> bool) -> bool, bool)
    format6
= Format (Reader End_of_format, "%r")

# let fmt = format_of_string "%r %r" in let _ = Scanf.sscanf "f1 b1" fmt scan_foo scan_bar (fun x y -> true) in fmt
- : (foo -> bar -> bool, Scanf.Scanning.scanbuf, '_weak8,
     (Scanf.Scanning.scanbuf -> foo) ->
     (Scanf.Scanning.scanbuf -> bar) -> (foo -> bar -> bool) -> bool,
     (foo -> bar -> bool) -> bool, bool)
    format6
= Format (Reader (Char_literal (' ', Reader End_of_format)), "%r %r")

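In scanf, the 1st type parameter of format6 is the type of the receiver function. The receiver function is also the last argument of scanf. Examining the 1st type parameter in the examples we have:

  "%d"      int -> int
  "%d %d"   int -> int -> bool
  "%r"      foo -> bool
  "%r %r"   foo -> bar -> bool

The 4th type parameter of format6 is the type of all arguments after fmt. Since the receiver function is the last argument, it must end with ... -> 'a -> 'return_type_of_a. Examining the 4th type parameter we have:

  "%d"      (int -> int) -> int
  "%d %d"   (int -> int -> bool) -> bool
  "%r"      (Scanf.Scanning.scanbuf -> foo) -> (foo -> bool) -> bool
  "%r %r"   (Scanf.Scanning.scanbuf -> foo) -> (Scanf.Scanning.scanbuf -> bar) -> (foo -> bar -> bool) -> bool

Indeed, the next-to-last type in the 4th type parameter is the type of the receiver function, and its return type is also the return type of the receiver function.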

Now we come back to scanner. Observe:

  1. The scanner is ('a, Scanf.Scanning.scanbuf, 'b, 'c, 'a -> 'd, 'd) format6 -> 'c
  2. The 5th type parameter ('a -> 'd here) of format6 is the result type of the 4th type parameter ('c here) of format6.
  3. The 6th type parameter is the result type of the 1st type parameter.

Hence, we can conclude that for scanf functions:

  1. The type of 'c must end with 'a -> 'd.
  2. What’s more, if there is no user-defined reader, 'c is the same as 'a -> 'd.
  3. The 6th type parameter must be the return type of 'a. It is the return type shared by fmt and the receiver function.

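The examples match our conclusion: for "%d" and "%d %d" the 4th and 5th type parameters are identical, while for "%r" and "%r %r" the 4th type parameter starts with the readers and then ends with the 5th.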

I have no idea of the exact reasons for these duplications. However, it’s enough for us to gain impressions for scanf.
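
The argument order that the 4th type parameter describes (readers first, receiver last) can be sketched with a mixed format. The type color and the names scan_color and parsed are invented for this illustration; the reader follows the same Scanf.bscanf pattern as scan_foo.

```ocaml
type color = Red | Green

(* A reader has the shape 'low_level_device -> 't; for Scanf the device is the
   scan buffer. *)
let scan_color ib =
  Scanf.bscanf ib "%s" (function
    | "red" -> Red
    | "green" -> Green
    | other -> raise (Scanf.Scan_failure ("not a color: " ^ other)))

(* After the format come the reader for %r and then, last, the receiver. The
   receiver only sees the values read (a color and an int), never the reader. *)
let parsed = Scanf.sscanf "green 3" "%r %d" scan_color (fun c n -> (c, n))
```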

λ. 5.4 Examples of Printf

Let’s come to the printf function.

# Printf.sprintf
- : ('a, unit, string) format -> 'a = <fun>

# Printf.printf
- : ('a, out_channel, unit) format -> 'a = <fun>

# Printf.fprintf
- : out_channel -> ('a, out_channel, unit) format -> 'a = <fun>

# #show format
type nonrec ('a, 'b, 'c) format = ('a, 'b, 'c, 'c) format4
# (* = ('a, 'b, 'c, 'c, 'c, 'c) format6 *)

format has just three type parameters.

# let fmt = format_of_string "%d" in let _ = Printf.printf fmt 1 in fmt
1
- : (int -> unit, out_channel, unit, unit, unit, unit) format6 =
Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

# let fmt = format_of_string "%d %d" in let _ = Printf.printf fmt 1 2 in fmt
1 2
- : (int -> int -> unit, out_channel, unit, unit, unit, unit) format6 =
Format
 (Int (Int_d, No_padding, No_precision,
   Char_literal (' ', Int (Int_d, No_padding, No_precision, End_of_format))),
 "%d %d")

# let fmt = format_of_string "%a" in let _ = Printf.printf fmt out_foo F1 in fmt
f1
- : ((out_channel -> foo -> unit) -> foo -> unit, out_channel, unit,
     unit, unit, unit)
    format6
= Format (Alpha End_of_format, "%a")

# let fmt = format_of_string "%a %a" in let _ = Printf.printf fmt out_foo F1 out_bar B1 in fmt
f1 b1
- : ((out_channel -> foo -> unit) ->
     foo -> (out_channel -> bar -> unit) -> bar -> unit, out_channel,
     unit, unit, unit, unit)
    format6
= Format (Alpha (Char_literal (' ', Alpha End_of_format)), "%a %a")

In printf, the 2nd type parameter of format6 is always out_channel (it means stdout). It’s unit for sprintf since printing to a string doesn’t involve any low-level device. The 3rd type parameter is string when printing to (and returning) a string; otherwise it’s unit. Let’s examine the 1st type parameter:

  "%d"      int -> unit
  "%d %d"   int -> int -> unit
  "%a"      (out_channel -> foo -> unit) -> foo -> unit
  "%a %a"   (out_channel -> foo -> unit) -> foo -> (out_channel -> bar -> unit) -> bar -> unit

Each %d requires one int value, and each %a requires a user-defined printer and a corresponding value, e.g. (out_channel -> foo -> unit) and foo.

# out_foo
- : out_channel -> foo -> unit = <fun>

# Printf.sprintf "%a" out_foo F1
Line 1, characters 21-28:
Error: This expression has type out_channel -> foo -> unit
       but an expression was expected of type unit -> 'a -> string
       Type out_channel is not compatible with type unit

A user-defined printer has type 'low_level_device -> 't -> 'printer_result, so its 'low_level_device must match the one the printing function uses. We cannot print to a string directly with a printer written for out_channel.
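
One way around the mismatch is to write one printer per device. The sketch below redefines foo so that it stands alone; name, str_foo, buf_foo, via_sprintf and via_bprintf are hypothetical helpers, while Printf.bprintf and Buffer are the standard ones.

```ocaml
type foo = F1 | F2
let name = function F1 -> "f1" | F2 -> "f2"

(* Device = unit, result = string: the shape Printf.sprintf wants for %a. *)
let str_foo () foo = name foo
let via_sprintf = Printf.sprintf "%a" str_foo F1

(* Device = Buffer.t, result = unit: the shape Printf.bprintf wants for %a. *)
let buf_foo buf foo = Buffer.add_string buf (name foo)
let via_bprintf =
  let buf = Buffer.create 16 in
  Printf.bprintf buf "[%a]" buf_foo F2;
  Buffer.contents buf
```

Keeping the actual conversion in one plain function (name here) means each device-specific printer is a one-liner.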

λ. 5.5 Examples of Format

# Format.printf
- : ('a, Format.formatter, unit) format -> 'a = <fun>

# Printf.sprintf
- : ('a, unit, string) format -> 'a = <fun>

# Printf.fprintf
- : out_channel -> ('a, out_channel, unit) format -> 'a = <fun>

Format is an enhanced Printf. Format.formatter is an abstraction over low-level devices. Format.std_formatter outputs to stdout, and Format.err_formatter outputs to stderr. Most functions in Format have a variant operating on std_formatter and a generic version with the prefix pp_ operating on its formatter argument.

# let fmt = format_of_string "%d" in let _ = Format.printf fmt 1 in fmt
- : (int -> unit, Format.formatter, unit, unit, unit, unit) format6 =
Format (Int (Int_d, No_padding, No_precision, End_of_format), "%d")

# let fmt = format_of_string "%d %d" in let _ = Format.printf fmt 1 2 in fmt
- : (int -> int -> unit, Format.formatter, unit, unit, unit, unit) format6 =
Format
 (Int (Int_d, No_padding, No_precision,
   Char_literal (' ', Int (Int_d, No_padding, No_precision, End_of_format))),
 "%d %d")

# let fmt = format_of_string "%a" in let _ = Format.printf fmt pp_foo F1 in fmt
- : ((Format.formatter -> foo -> unit) -> foo -> unit, Format.formatter,
     unit, unit, unit, unit)
    format6
= Format (Alpha End_of_format, "%a")

# let fmt = format_of_string "%a %a" in let _ = Format.printf fmt pp_foo F1 pp_bar B1 in fmt
- : ((Format.formatter -> foo -> unit) ->
     foo -> (Format.formatter -> bar -> unit) -> bar -> unit,
     Format.formatter, unit, unit, unit, unit)
    format6
= Format (Alpha (Char_literal (' ', Alpha End_of_format)), "%a %a")

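The types inferred for Format.printf are similar to those for Printf.printf. The 1st type parameter also follows the same pattern as in Printf: int -> unit for "%d", and (Format.formatter -> foo -> unit) -> foo -> unit for "%a". The only difference is Format.formatter in place of out_channel.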
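
Because a Format printer only talks to a Format.formatter, the same printer can be reused with any formatter. The following is a sketch with foo redefined so that it stands alone; as_string and via_buffer are made-up names, and this pp_foo uses Format.pp_print_string rather than Format.fprintf.

```ocaml
type foo = F1 | F2
let pp_foo ppf foo =
  Format.pp_print_string ppf (match foo with F1 -> "f1" | F2 -> "f2")

(* Format.asprintf supplies its own formatter and returns a string. *)
let as_string = Format.asprintf "%a, %a" pp_foo F1 pp_foo F2

(* The same printer also works with a formatter we build over a Buffer. *)
let via_buffer =
  let buf = Buffer.create 16 in
  let ppf = Format.formatter_of_buffer buf in
  Format.fprintf ppf "%a" pp_foo F1;
  Format.pp_print_flush ppf ();
  Buffer.contents buf
```

The explicit flush matters: a formatter buffers its output, which is presumably also why the Format.printf transcripts above show no printed text before the type.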

λ. 5.6 Examples of kprintf

In the scanf examples, we covered the 1st, 2nd, 4th, 5th and 6th type parameters of format6, and in the printf examples the 1st, 2nd and 6th. The only one left is the 3rd type parameter. For this, we use kprintf. The k in kprintf means continuation: a function that accepts the printing result and carries on with the next computation.

# let string_of_foo () foo = match foo with | F1 -> "f1" | F2 -> "f2"
val string_of_foo : unit -> foo -> string = <fun>

# Printf.ksprintf
- : (string -> 'd) -> ('a, unit, string, 'd) format4 -> 'a = <fun>

# let fmt = format_of_string "%a" in let _ = Printf.ksprintf (fun s -> s, String.length) fmt string_of_foo F1 in fmt
- : ((unit -> foo -> string) -> foo -> string * (string -> int), unit,
     string, string, string, string * (string -> int))
    format6
= Format (Alpha End_of_format, "%a")

# Format.kfprintf
- : (Format.formatter -> 'a) ->
    Format.formatter -> ('b, Format.formatter, unit, 'a) format4 -> 'b
= <fun>

# let fmt = format_of_string "%a" in let _f fmter = Format.kfprintf (fun _fmt s -> s, String.length) fmter fmt pp_foo F1 in fmt
- : ((Format.formatter -> foo -> unit) ->
     foo -> '_weak9 -> '_weak9 * (string -> int), Format.formatter, unit,
     unit, unit, '_weak9 -> '_weak9 * (string -> int))
    format6
= Format (Alpha End_of_format, "%a")

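In the ksprintf example, the 2nd type parameter of format6 is unit rather than string, since a string is not really a low-level device. The type template of a user-defined printer, 'low_level_device -> 't -> 'printer_result, becomes unit -> 't -> string here. Neither of our previous printers, out_foo or pp_foo, matches this, so we need to write string_of_foo with an extra dummy unit argument.

The 3rd type parameter of format4 (format6) is the result type of the %a printers. For Printf.ksprintf it is string, which is also the type of the argument passed to the continuation. For Format.kfprintf it is unit, and the continuation receives the Format.formatter (the 2nd type parameter) instead.

The 6th type parameter matches the return type of the continuation. It is string * (string -> int) in the first example. In the second example the continuation takes an extra argument after the formatter, so the 6th type parameter is the function type '_weak9 -> '_weak9 * (string -> int).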
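
Two small uses of Printf.ksprintf show how the continuation decides the final result type. This is only a sketch; length_of, n, failf and caught are hypothetical names.

```ocaml
(* The continuation receives the finished string; its result type (int here)
   becomes the final result once all format arguments are supplied. *)
let length_of fmt = Printf.ksprintf String.length fmt
let n = length_of "%d-%s" 12 "ab"

(* A common use: format a message, then hand it to failwith. *)
let failf fmt = Printf.ksprintf failwith fmt
let caught = try failf "bad value %d" 3 with Failure msg -> msg
```

In length_of the 6th type parameter is int, while in failf it stays polymorphic because failwith never returns.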

λ. 6. Type parameters of format6 (recap)

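After the examples in the previous sections, we can recap the type parameters from section 4, this time by what we saw in each position:

  1. 1st: for printf, the types of the arguments after fmt followed by the final result; for scanf, the type of the receiver function.
  2. 2nd: the low-level device, e.g. out_channel, Format.formatter or Scanf.Scanning.scanbuf, and unit for sprintf.
  3. 3rd: the result type of user-defined printers, unit in most printf examples and string for sprintf and ksprintf; ignored by scanf.
  4. 4th: for scanf, the types of all arguments after fmt, i.e. the readers followed by the receiver function.
  5. 5th: for scanf, the receiver function type followed by the final result; it is what the 4th ends with.
  6. 6th: the final result type, which is also the return type of the 1st.

We can also reason about it from the perspective of user-defined printers and readers. For a user-defined type 't, there is a type that mentions only values of 't, e.g. the receiver function in scanf. There is also a type that mentions both values of 't and its user-defined functions, e.g. the 1st type parameter for printf and the 4th type parameter for scanf. Where each of these two forms ends up is a result of both design and legacy code, but together they give us format6.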

λ. 7. Others

The GADT-based format implementation is explained in this GitHub issue. I found it via c-cube’s blog post Format All the Data Structures. Format Unraveled explains the history of Format, the typechecking of format strings, the pretty-printing model, and semantic tags. The OCaml source code also contains some comments in camlinternalFormatBasics.ml. I presented this format6 topic at my lab’s seminar, where I am the least experienced OCaml programmer but happen to have studied this topic the most. My original examples were all in sketch.sh, but I found it doesn’t support exporting yet. Therefore, I switched to mdx, and it’s really cool.

I omit the %t conversion in this post because I think it helps little in understanding format6, though it doesn’t bring extra difficulties either.

I plan to write a series of posts on OCaml (formatting), using the same approach of learning by examples. The OCaml world needs more examples, I think.

λ. Reference

