I am trying to write some regex to match FIGI numbers.
FIGI numbers have 12 characters and have the following structure:
- A two-letter prefix, excluding (BS, BM, GG, GB, GH, KY, VG)
- G as the third character
- An eight-character alphanumeric code which does not contain the English vowels “A”, “E”, “I”, “O”, or “U”
- A single check digit (0-9)
E.g.
BBG000BLNNV0
is a valid FIGI
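Before writing the regex, it can help to have a plain, rule-by-rule version of the structure above to test patterns against. This is only a sketch in Python: the function name looks_like_figi is hypothetical, it assumes uppercase ASCII input (as the [A-Z] in the question suggests), and it checks only that the last character is a digit, not that the check digit is actually correct.

```python
import string

# Rules taken from the structure listed in the question.
EXCLUDED_PREFIXES = {"BS", "BM", "GG", "GB", "GH", "KY", "VG"}
VOWELS = set("AEIOU")
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def looks_like_figi(s: str) -> bool:
    """Hypothetical helper: checks the FIGI layout, not the check digit's value."""
    if len(s) != 12:
        return False
    prefix, third, code, check = s[:2], s[2], s[3:11], s[11]
    # Two uppercase letters that are not one of the excluded pairs
    if not all(c in UPPER for c in prefix) or prefix in EXCLUDED_PREFIXES:
        return False
    # Third character must be G
    if third != "G":
        return False
    # Eight uppercase letters or digits, with no vowels
    if not all((c in UPPER or c in DIGITS) and c not in VOWELS for c in code):
        return False
    # Final character must be a digit
    return check in DIGITS


print(looks_like_figi("BBG000BLNNV0"))  # True
print(looks_like_figi("BSG000BLNNV0"))  # False: excluded prefix
print(looks_like_figi("BBG000BLANV0"))  # False: vowel in the code
```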
So far I have:
^([A-Z]{2})(G{1})(([A-Z]|\d){8})\d{1}
But I am unsure how to add the exclusions, i.e. no vowels in the eight-character code and none of these specific two-letter prefixes: BS, BM, GG, GB, GH, KY, VG.
Does anyone have any ideas? Thank you very much!
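A quick way to see why the exclusions are needed is to run the starting pattern against a few strings. This sketch assumes Python's re module (the question doesn't name a regex flavor) and writes the d tokens as \d; the sample strings other than BBG000BLNNV0 are made up for illustration.

```python
import re

# The starting pattern from the question, with \d written out.
START = re.compile(r"^([A-Z]{2})(G{1})(([A-Z]|\d){8})\d{1}")

print(bool(START.match("BBG000BLNNV0")))  # True: valid FIGI accepted
print(bool(START.match("BSG000BLNNV0")))  # True: excluded prefix not rejected
print(bool(START.match("BBG000BLANV0")))  # True: vowel not rejected
```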
Answer
You would write the exclusions with a negative lookahead/lookbehind. Adding a simple negative lookahead for the two-letter prefix to your existing regex would look like this:
^(?!BS|BM|GG|GB|GH|KY|VG)([A-Z]{2})G(([A-Z]|\d){8})\d{1}
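Checking this intermediate pattern (again assuming Python's re module, with made-up sample strings) shows that the lookahead now rejects the excluded prefixes, while a vowel in the eight-character code still gets through, which the later steps address.

```python
import re

PREFIX_CHECK = re.compile(r"^(?!BS|BM|GG|GB|GH|KY|VG)([A-Z]{2})G(([A-Z]|\d){8})\d{1}")

for s in ["BBG000BLNNV0", "BSG000BLNNV0", "KYG000BLNNV0", "BBG000BLANV0"]:
    # Expected: True, False, False, True (the vowel case is not handled yet)
    print(s, bool(PREFIX_CHECK.match(s)))
```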
It can be shortened a little by grouping prefixes that share a first letter into character classes:
^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G(([A-Z]|\d){8})\d{1}
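Since there are only 26 × 26 = 676 possible two-letter prefixes, it is cheap to confirm exhaustively that the grouped lookahead rejects exactly the same prefixes as the longer one. A minimal sketch, assuming Python's re module and only the prefix portion of each pattern:

```python
import itertools
import re
import string

LONG = re.compile(r"^(?!BS|BM|GG|GB|GH|KY|VG)[A-Z]{2}")
SHORT = re.compile(r"^(?!B[SM]|G[GBH]|KY|VG)[A-Z]{2}")

# Every possible two-letter uppercase prefix
prefixes = ["".join(p) for p in itertools.product(string.ascii_uppercase, repeat=2)]
rejected_long = {p for p in prefixes if not LONG.match(p)}
rejected_short = {p for p in prefixes if not SHORT.match(p)}

print(rejected_long == rejected_short)  # True
print(sorted(rejected_short))  # ['BM', 'BS', 'GB', 'GG', 'GH', 'KY', 'VG']
```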
I also shortened (G{1}) to just G. The {1} has no effect at all, and the parentheses only add a capture group, so unless you need that group the two are the same thing.
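One thing to watch when dropping the parentheses: the overall match is unchanged, but the capture groups are renumbered, which matters if other code reads groups by index. A small illustration, assuming Python's re module:

```python
import re

WITH_GROUP = re.compile(r"^([A-Z]{2})(G{1})(([A-Z]|\d){8})\d{1}")
WITHOUT_GROUP = re.compile(r"^([A-Z]{2})G(([A-Z]|\d){8})\d{1}")

m1 = WITH_GROUP.match("BBG000BLNNV0")
m2 = WITHOUT_GROUP.match("BBG000BLNNV0")

print(m1.group(0) == m2.group(0))  # True: same overall match
# A repeated group keeps only its last iteration, hence the trailing 'V'
print(m1.groups())  # ('BB', 'G', '000BLNNV', 'V')
print(m2.groups())  # ('BB', '000BLNNV', 'V')
```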
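Lastly, the vowel exclusion. Again this uses a negative lookahead, this time combined with the \w metacharacter (letters, digits, and the underscore); the underscore goes into the lookahead as well, since it is not allowed in a FIGI.
^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G((?![AEIOU_])\w){8}\d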
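A quick sanity check of the final pattern, assuming Python's re module (the test strings other than BBG000BLNNV0 are made up to hit each rule):

```python
import re

# Final pattern from the answer.
FIGI = re.compile(r"^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G((?![AEIOU_])\w){8}\d")

cases = {
    "BBG000BLNNV0": True,   # valid example from the question
    "BSG000BLNNV0": False,  # excluded prefix
    "BBX000BLNNV0": False,  # third character is not G
    "BBG000BLANV0": False,  # vowel in the 8-character code
    "BBG000BLNNVX": False,  # last character is not a digit
}
for s, expected in cases.items():
    assert bool(FIGI.match(s)) == expected, s
print("all cases behave as expected")
```

Two caveats are worth checking against your data: \w also matches lowercase letters (and, for str patterns in Python 3, other Unicode word characters), and with no trailing $ a longer string that merely starts with a valid FIGI also matches. Adding $ at the end, or using re.fullmatch, should close the second gap.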
As an aside, you could replace the second negative lookahead with a positive lookahead combined with a negated character class:
^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G((?=[^AEIOU_])\w){8}\d
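The two forms accept the same characters: "not followed by a vowel or underscore" and "followed by something other than a vowel or underscore" only differ at the end of the string, where \w would fail anyway. A small sketch in Python's re module (an assumption about the flavor) checks this over every ASCII character and then compares the full patterns on a few sample strings:

```python
import re

# Per-character building blocks from the two versions of the answer
NEG = re.compile(r"(?![AEIOU_])\w")   # negative lookahead
POS = re.compile(r"(?=[^AEIOU_])\w")  # positive lookahead + negated class

# Compare them on every single ASCII character
mismatches = [
    chr(i)
    for i in range(128)
    if bool(NEG.fullmatch(chr(i))) != bool(POS.fullmatch(chr(i)))
]
print(mismatches)  # []

# The full patterns should therefore agree on whole strings too
FIGI_NEG = re.compile(r"^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G((?![AEIOU_])\w){8}\d")
FIGI_POS = re.compile(r"^(?!B[SM]|G[GBH]|KY|VG)([A-Z]{2})G((?=[^AEIOU_])\w){8}\d")
for s in ["BBG000BLNNV0", "BSG000BLNNV0", "BBG000BLANV0", "BBG000BL_NV0"]:
    print(s, bool(FIGI_NEG.match(s)), bool(FIGI_POS.match(s)))
```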