API
iobes
- class iobes.SpanFormat[source]
A description of a tag format.
The anatomy of a tag is {token-function}-{span-type}. It has two parts. The second part is the type of the span. This is specific to the downstream task and can be things like PER or LOC for NER, or things like to_city and from_city for slot filling in a dialogue manager for air travel booking. The first part is the token function. It is generic across tasks and is used when converting per-token labels into spans. For example, when a tag starts with B- we know that it is the beginning of a new span, and when a tag starts with I- we know that it is inside of a span.
Note
Some tag formats have special end and single values for tags that end a span and tags that constitute a whole span in themselves, while others don't. The span encoding formats that don't have end and single values just repeat the inside and begin attributes respectively.
- BEGIN: str | None = None
This is the token function that all tags that trigger a new span should have
- INSIDE: str | None = None
This is the token function that all tags inside of a span should have
- END: str | None = None
This is the token function that all tags at the end of a span should have
- SINGLE: str | None = None
This is the token function that all tags that constitute a span of length 1 should have
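To make the tag anatomy concrete, here is a minimal sketch that splits a tag into its token function and span type. The helper name `split_tag` is hypothetical and is not part of iobes; it assumes the separator is `-` and that a bare `O` carries no span type.

```python
from typing import Optional, Tuple


def split_tag(tag: str, sep: str = "-") -> Tuple[str, Optional[str]]:
    """Split a tag into (token function, span type).

    Hypothetical helper for illustration; not part of iobes.
    """
    # Split on the first separator only, so span types that contain
    # the separator themselves stay intact.
    function, found, span_type = tag.partition(sep)
    return function, (span_type if found else None)


print(split_tag("B-PER"))        # ('B', 'PER')
print(split_tag("I-from_city"))  # ('I', 'from_city')
print(split_tag("O"))            # ('O', None)
```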
- class iobes.TokenFunction[source]
Prefixes for tags that are used in decoding.
In general, tags can be broken into two parts. The first is the token function, which tells you how the decoding parser should act when it hits this tag, and the second is the type (PER, LOC, etc.) of the span.
- OUTSIDE = 'O'
This tag is not in any span. It is a rare one that is a whole tag, not just a prefix
- BEGIN = 'B'
This tag starts a span
- INSIDE = 'I'
This tag is in the middle of a span
- MIDDLE = 'M'
This tag is in the middle of a span
- END = 'E'
This tag ends a span
- LAST = 'L'
This tag ends a span
- SINGLE = 'S'
This tag by itself represents a span
- UNIT = 'U'
This tag by itself represents a span
- WHOLE = 'W'
This tag by itself represents a span
- GO = '<GO>'
This tag is a special tag for the beginning of a sequence
- EOS = '<EOS>'
This tag is a special tag for the end of a sequence
- class iobes.IOB[source]
The original IOB tagging format.
The first span encoding format proposed in Ramshaw and Marcus, 1995
This is the only format that is contextual. When two spans of the same type are touching, the first token of the second span is a B-, whereas when the first token does not follow (touch) another span of the same type it is an I-. So the value of the BEGIN tag isn't known without context. The same applies to the SINGLE tag. When a span is a single token, the prefix will be I- if it is preceded by no span or by a span of a different type. It will be B- if the previous span was of the same type.
- BEGIN: str | None = None
The prefix for the beginning of the span is unknown a priori.
- INSIDE: str | None = 'I'
The inside of a span is always known.
- END: str | None = 'I'
The end token is always known, it is the same as the inside token.
- SINGLE: str | None = None
Like the beginning token, the single token span is unknown without the previous span type.
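The context dependence described above can be illustrated with a minimal sketch that writes spans out as IOB tags. The function `write_iob` and the `(type, start, end)` tuples (with an exclusive `end`) are assumptions made for this example, not the library's API.

```python
from typing import List, Sequence, Tuple

# (type, start, end) with an exclusive end; a simplified stand-in for a span.
SimpleSpan = Tuple[str, int, int]


def write_iob(spans: Sequence[SimpleSpan], length: int) -> List[str]:
    """Write non-overlapping spans as IOB tags (illustrative sketch)."""
    tags = ["O"] * length
    prev_type, prev_end = None, None
    for span_type, start, end in sorted(spans, key=lambda s: s[1]):
        # B- is only used when this span touches a previous span of the
        # same type; otherwise the first token is tagged I-.
        touching = prev_type == span_type and prev_end == start
        tags[start] = f"{'B' if touching else 'I'}-{span_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{span_type}"
        prev_type, prev_end = span_type, end
    return tags


# Two adjacent PER spans followed by a separate LOC span.
print(write_iob([("PER", 0, 2), ("PER", 2, 3), ("LOC", 4, 5)], 5))
# ['I-PER', 'I-PER', 'B-PER', 'O', 'I-LOC']
```

Note that the single-token PER span at index 2 is written as B-PER only because it touches the previous PER span, while the single-token LOC span is written as I-LOC.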
- class iobes.BIO[source]
The improved BIO tagging format.
This is an improvement to the IOB format. All entities, regardless of the value of the previous span, start with a B- token. This is a context-independent format because we always know that the first token is a B-. There is no special end tag, however. Things like an O or a token of a different type trigger the end of the entity.
- BEGIN: str | None = 'B'
This is the token function that all tags that trigger a new span should have
- INSIDE: str | None = 'I'
This is the token function that all tags inside of a span should have
- END: str | None = 'I'
This is the token function that all tags at the end of a span should have
- SINGLE: str | None = 'B'
This is the token function that all tags that constitute a span of length 1 should have
- class iobes.IOBES[source]
The best tagging format.
** TODO ** flesh out
This format adds an END tag that needs to show up at the end of entities. This format has been shown to be better than IOB or BIO (Ratinov and Roth, 2009) and should be used instead.
- BEGIN: str | None = 'B'
This is the token function that all tags that trigger a new span should have
- INSIDE: str | None = 'I'
This is the token function that all tags inside of a span should have
- END: str | None = 'E'
This is the token function that all tags at the end of a span should have
- SINGLE: str | None = 'S'
This is the token function that all tags that constitute a span of length 1 should have
- class iobes.BILOU[source]
The BILOU format.
** TODO ** flesh out
This is the same as the IOBES format but we just have different values for the END and SINGLE tokens.
- BEGIN: str | None = 'B'
This is the token function that all tags that trigger a new span should have
- INSIDE: str | None = 'I'
This is the token function that all tags inside of a span should have
- END: str | None = 'L'
This is the token function that all tags at the end of a span should have
- SINGLE: str | None = 'U'
This is the token function that all tags that constitute a span of length 1 should have
- class iobes.BMEOW[source]
The BMEOW format.
** TODO ** flesh out
From Borthwick, 1999
This is the same as the IOBES format but we just have different values for the INSIDE and SINGLE tokens.
- BEGIN: str | None = 'B'
This is the token function that all tags that trigger a new span should have
- INSIDE: str | None = 'M'
This is the token function that all tags inside of a span should have
- END: str | None = 'E'
This is the token function that all tags at the end of a span should have
- SINGLE: str | None = 'W'
This is the token function that all tags that constitute a span of length 1 should have
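Since IOBES, BILOU, and BMEOW differ only in their prefix values, a single writer parameterized by those values can cover all three. The sketch below illustrates that idea under stated assumptions: `Format`, `write_tags`, and the `(type, start, end)` tuples with an exclusive `end` are hypothetical names for this example and are not iobes functions.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Format(NamedTuple):
    """Prefix values for one encoding (hypothetical stand-in for a span format)."""
    begin: str
    inside: str
    end: str
    single: str


# Prefix values copied from the class attributes documented above.
IOBES = Format("B", "I", "E", "S")
BILOU = Format("B", "I", "L", "U")
BMEOW = Format("B", "M", "E", "W")


def write_tags(spans: Sequence[Tuple[str, int, int]], length: int, fmt: Format) -> List[str]:
    """Write (type, start, end) spans, with exclusive end, as tags in `fmt`."""
    tags = ["O"] * length
    for span_type, start, end in spans:
        if end - start == 1:
            # A one-token span gets the SINGLE prefix.
            tags[start] = f"{fmt.single}-{span_type}"
            continue
        tags[start] = f"{fmt.begin}-{span_type}"
        for i in range(start + 1, end - 1):
            tags[i] = f"{fmt.inside}-{span_type}"
        tags[end - 1] = f"{fmt.end}-{span_type}"
    return tags


spans = [("PER", 0, 3), ("LOC", 4, 5)]
print(write_tags(spans, 5, IOBES))  # ['B-PER', 'I-PER', 'E-PER', 'O', 'S-LOC']
print(write_tags(spans, 5, BILOU))  # ['B-PER', 'I-PER', 'L-PER', 'O', 'U-LOC']
print(write_tags(spans, 5, BMEOW))  # ['B-PER', 'M-PER', 'E-PER', 'O', 'W-LOC']
```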
- class iobes.TOKEN[source]
A format to use when processing tokens.
In this case the tags are supposed to be for the tokens themselves instead of being converted into spans. This format makes sure that each tag is converted into a span of length 1. This lets us run metrics over individual tags without having to change our processing code. This is used for things like part-of-speech tagging.
Because there are no special prefixes that dictate the function a token plays in a span, all the class values are left as None.
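As a rough illustration of the token format, the sketch below turns every tag into its own span of length 1. The name `token_spans` and the plain `(type, start, end)` tuples are assumptions for this example; the library's own span type carries an additional `tokens` field.

```python
from typing import List, Sequence, Tuple


def token_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Turn each tag into a (type, start, end) span of length 1 (sketch)."""
    return [(tag, i, i + 1) for i, tag in enumerate(tags)]


print(token_spans(["DET", "NOUN", "VERB"]))
# [('DET', 0, 1), ('NOUN', 1, 2), ('VERB', 2, 3)]
```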
- class iobes.SpanEncoding(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
An enumeration of the kind of span encoding schemes we support processing.
- TOKEN = <class 'iobes.TOKEN'>
- IOB = <class 'iobes.IOB'>
- BIO = <class 'iobes.BIO'>
- IOBES = <class 'iobes.IOBES'>
- BILOU = <class 'iobes.BILOU'>
- BMEOW = <class 'iobes.BMEOW'>
- BMEWO = <class 'iobes.BMEOW'>
- classmethod from_string(value)[source]
Parse string into a specific span encoding format.
- Parameters:
value (str) – The string to dispatch to encoding on.
- Raises:
ValueError – If the string cannot be recognized as pointing to a specific SpanEncoding format.
- Returns:
The SpanEncoding member.
- Return type:
SpanEncoding
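One plausible way to implement string dispatch on an enumeration is sketched below. This is not the library's implementation: the class `Encoding`, its string values, and the case-insensitive lookup are all assumptions made for illustration. It does show how two member names can share one value, which is how an alias such as BMEWO for BMEOW behaves in a Python `Enum`.

```python
from enum import Enum


class Encoding(Enum):
    """Hypothetical stand-in for an enumeration of span encodings."""
    TOKEN = "token"
    IOB = "iob"
    BIO = "bio"
    IOBES = "iobes"
    BILOU = "bilou"
    BMEOW = "bmeow"
    BMEWO = "bmeow"  # Same value as BMEOW, so Enum treats it as an alias.

    @classmethod
    def from_string(cls, value: str) -> "Encoding":
        """Look up a member by name, ignoring case and surrounding whitespace."""
        try:
            return cls[value.strip().upper()]
        except KeyError:
            raise ValueError(f"Unknown span encoding: {value!r}") from None


print(Encoding.from_string("bio"))                      # Encoding.BIO
print(Encoding.from_string("BMEWO") is Encoding.BMEOW)  # True
```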
- class iobes.Span(type, start, end, tokens)[source]
Our representation of a span of text.
Note
Our end attribute of a span is one greater than the index of the final token in the span. This is so that Python list slicing works. For example, tokens[span.start : span.end] will yield the surface form of the span.
- Parameters:
type (str) – The type of the span in our downstream task, things like PER or LOC.
start (int) – The index into the tokens list where the span starts.
end (int) – The index of the last token of the span plus 1.
tokens (Tuple[int, ...]) – The indices that are part of the span.
- type: str
Alias for field number 0
- start: int
Alias for field number 1
- end: int
Alias for field number 2
- tokens: Tuple[int, ...]
Alias for field number 3
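The exclusive end convention can be checked with a small sketch. The `Span` class below is a local stand-in that mirrors the four documented fields, so the example runs without the library installed; the token list is made up for illustration.

```python
from typing import NamedTuple, Tuple


class Span(NamedTuple):
    """Local stand-in mirroring the fields documented above."""
    type: str
    start: int
    end: int
    tokens: Tuple[int, ...]


tokens = ["Barack", "Obama", "visited", "Paris"]
span = Span(type="PER", start=0, end=2, tokens=(0, 1))

# Because `end` is exclusive, a plain slice recovers the surface form.
print(tokens[span.start : span.end])  # ['Barack', 'Obama']
# The length of the span is simply end - start.
print(span.end - span.start)          # 2
```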
- class iobes.ErrorType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
- class iobes.Error(location, type, current, previous, next)[source]
An error encountered when parsing tags into spans.
- Parameters:
location (int) – The index where the error occurred
type (str) – What kind of error is it. TODO These types need to be enumerated and hammer out the specifics
current (str | None) – The tag at the index of the error
previous (str | None) – The previous tag
next (str | None) – The next tag
- location: int
Alias for field number 0
- type: str
Alias for field number 1
- current: str | None
Alias for field number 2
- previous: str | None
Alias for field number 3
- next: str | None
Alias for field number 4
iobes.parse
- iobes.parse.parse_spans(seq, span_type)[source]
Parse a sequence of labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
span_type (SpanEncoding) – The span encoding format used to encode the spans into the labels.
- Returns:
A list of spans.
- Return type:
List[Span]
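The recovery behaviour described in the first note can be sketched for BIO labels as follows. This is a simplified illustration, not the library's parser: `parse_bio` and the local `Span` stand-in are assumptions for this example, and errors are silently tolerated rather than reported.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Span(NamedTuple):
    """Local stand-in mirroring the documented span fields."""
    type: str
    start: int
    end: int  # exclusive
    tokens: Tuple[int, ...]


def parse_bio(seq: Sequence[str]) -> List[Span]:
    """Parse BIO labels into spans, recovering from malformed input (sketch)."""
    spans: List[Span] = []
    current: Optional[str] = None  # Type of the span currently being built.
    start = 0

    def close(end: int) -> None:
        # Emit the span being built, if there is one.
        if current is not None:
            spans.append(Span(current, start, end, tuple(range(start, end))))

    for i, tag in enumerate(seq):
        function, _, span_type = tag.partition("-")
        if function == "I" and span_type == current:
            continue  # Still inside the current span.
        # Anything else ends the current span.
        close(i)
        # B- starts a span; an I- of a new type also starts one, which is
        # the lenient recovery described in the note.
        current = span_type if function in ("B", "I") else None
        start = i
    close(len(seq))
    return spans


# I-ORG interrupts the PER span without a B-ORG, yielding two spans.
for span in parse_bio(["B-PER", "I-PER", "I-ORG", "O", "B-LOC"]):
    print(span.type, span.start, span.end)
# PER 0 2
# ORG 2 3
# LOC 4 5
```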
- iobes.parse.parse_spans_with_errors(seq, span_type)[source]
Parse a sequence of labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- Parameters:
seq (Sequence[str]) – The sequence of labels
span_type (SpanEncoding) – The span encoding format the spans are encoded into the labels with
- Returns:
A list of spans and a list of errors.
- Return type:
- iobes.parse.parse_spans_token(seq)[source]
Parse a sequence of labels into a list of spans.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_token_with_errors(seq)[source]
Parse a sequence of labels into a list of spans but return any violations of the encoding scheme.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_iob(seq)[source]
Parse a sequence of IOB encoded labels into a list of spans.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_iob_with_errors(seq)[source]
Parse a sequence of IOB encoded labels into a list of spans but return any violations of the encoding scheme.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_bio(seq)[source]
Parse a sequence of BIO labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_bio_with_errors(seq)[source]
Parse a sequence of BIO labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_with_end(seq, span_format)[source]
Parse a sequence of labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
This is a generic function that can parse IOBES, BILOU, and BMEWO formats.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
span_format (Type[SpanFormat]) – A description of the span encoding format.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_with_end_with_errors(seq, span_format)[source]
Parse a sequence of labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
Note
This is a generic function that can parse IOBES, BILOU, and BMEWO formats.
- Parameters:
seq (Sequence[str]) – The sequence of labels
span_format (Type[SpanFormat]) – A description of the span encoding format.
- Returns:
A list of spans and a list of errors.
- Return type:
- iobes.parse.parse_spans_iobes(seq)[source]
Parse a sequence of IOBES encoded labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_iobes_with_errors(seq)[source]
Parse a sequence of IOBES encoded labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_bilou(seq)[source]
Parse a sequence of BILOU labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_bilou_with_errors(seq)[source]
Parse a sequence of BILOU labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_bmeow(seq)[source]
Parse a sequence of BMEOW labels into a list of spans.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_bmewo(seq)[source]
Parse a sequence of BMEWO labels into a list of spans.
Note
Alias for parse_spans_bmeow()
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
- Parameters:
seq (Sequence[str]) – The sequence of labels.
- Returns:
A list of spans.
- Return type:
List[Span]
- iobes.parse.parse_spans_bmeow_with_errors(seq)[source]
Parse a sequence of BMEOW labels into a list of spans but return any violations of the encoding scheme.
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.parse_spans_bmewo_with_errors(seq)[source]
Parse a sequence of BMEWO labels into a list of spans but return any violations of the encoding scheme.
Note
Alias for parse_spans_bmeow_with_errors()
Note
In the case where labels violate the span encoding scheme, for example when the tag is a new type (like I-ORG) in the middle of a span of another type (like PER) without a proper starting token (B-ORG), we will finish the initial span and start a new one, resulting in two spans. This follows the conlleval.pl script.
Note
Spans are returned sorted by their starting location. Because spans are not allowed to overlap, there is no resolution policy for two spans that have the same starting location.
Note
Errors are returned sorted by the location where the violation occurred. In the case a single transition triggered multiple errors they are sorted lexically based on the error type.
- iobes.parse.validate_tags(tags, span_type)[source]
Check for errors in a tag scheme.
- Parameters:
tags (Sequence[str]) – The tags we are parsing.
span_type (SpanEncoding) – The span encoding scheme we have used.
- Raises:
ValueError – If the span encoding scheme isn’t recognized.
- Returns:
True if the tags don’t have any formatting errors, False otherwise.
- Return type:
bool
- iobes.parse.validate_tags_iob(tags)[source]
Check for errors in IOB tags.
- Parameters:
tags (Sequence[str]) – The IOB tags we are parsing.
- Returns:
True if the IOB tags are well-formed, False otherwise.
- Return type:
bool
- iobes.parse.validate_tags_bio(tags)[source]
Check for errors in BIO tags.
- Parameters:
tags (Sequence[str]) – The BIO tags we are parsing.
- Returns:
True if the BIO tags are well-formed, False otherwise.
- Return type:
bool
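A well-formedness check for BIO tags can be sketched as a scan over adjacent tags. The function `is_valid_bio` below is a hypothetical illustration; the exact set of violations the library checks for is not specified here, so this sketch only enforces that every I- tag continues a span of the same type.

```python
from typing import Sequence


def is_valid_bio(tags: Sequence[str]) -> bool:
    """Return True if every I- tag continues a span of the same type (sketch)."""
    prev_function, prev_type = "O", ""
    for tag in tags:
        function, _, span_type = tag.partition("-")
        if function not in ("B", "I", "O"):
            return False  # Unknown token function for BIO.
        if function == "I":
            # An I- tag must follow a B- or I- tag of the same type.
            if prev_function == "O" or prev_type != span_type:
                return False
        prev_function, prev_type = function, span_type
    return True


print(is_valid_bio(["B-PER", "I-PER", "O", "B-LOC"]))  # True
print(is_valid_bio(["O", "I-PER"]))                    # False
print(is_valid_bio(["B-PER", "I-ORG"]))                # False
```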
- iobes.parse.validate_tags_iobes(tags)[source]
Check for errors in IOBES tags.
- Parameters:
tags (Sequence[str]) – The IOBES tags we are parsing.
- Returns:
True if the IOBES tags are well-formed, False otherwise.
- Return type:
bool
- iobes.parse.validate_tags_bilou(tags)[source]
Check for errors in BILOU tags.
- Parameters:
tags (Sequence[str]) – The BILOU tags we are parsing.
- Returns:
True if the BILOU tags are well-formed, False otherwise.
- Return type:
bool
- iobes.parse.validate_tags_bmeow(tags)[source]
Check for errors in BMEOW tags.
- Parameters:
tags (Sequence[str]) – The BMEOW tags we are parsing.
- Returns:
True if the BMEOW tags are well-formed, False otherwise.
- Return type:
bool
iobes.convert
- iobes.convert.convert_tags(tags, parse_function, write_function)[source]
Convert tags from one format to another.
- Parameters:
tags (Sequence[str]) – The tags that we are converting.
parse_function (ParseWithErrorsCallable) – A function that parses tags into spans.
write_function (WriteCallable) – A function that turns spans into a list of tags.
- Raises:
ValueError – If there were errors in the tag formatting.
- Returns:
The list of tags in the new format.
- Return type:
List[str]
- iobes.convert.iob_to_bio(tags)[source]
Convert IOB tags to the BIO format.
- Parameters:
tags (Sequence[str]) – The IOB tags we are converting
- Raises:
ValueError – If there were errors in the IOB formatting of the input.
- Returns:
Tags that produce the same spans in the BIO format.
- Return type:
List[str]
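For well-formed input, the IOB to BIO conversion can also be expressed as a per-token rule, sketched below. The function `iob_to_bio_sketch` is a hypothetical illustration rather than the library's implementation; unlike the documented function it does not parse spans first and does not raise ValueError on malformed input.

```python
from typing import List, Sequence


def iob_to_bio_sketch(tags: Sequence[str]) -> List[str]:
    """Rewrite well-formed IOB tags as BIO tags (illustrative sketch)."""
    converted = []
    prev_type = None  # Span type of the previous token, None after an O.
    for tag in tags:
        function, _, span_type = tag.partition("-")
        if function == "O":
            converted.append(tag)
            prev_type = None
            continue
        # In IOB an I- tag opens a span when it does not continue one of
        # the same type; BIO marks every such opening with B-.
        if function == "I" and span_type != prev_type:
            function = "B"
        converted.append(f"{function}-{span_type}")
        prev_type = span_type
    return converted


print(iob_to_bio_sketch(["I-PER", "I-PER", "B-PER", "O", "I-LOC"]))
# ['B-PER', 'I-PER', 'B-PER', 'O', 'B-LOC']
```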
- iobes.convert.iob_to_iobes(tags)[source]
Convert IOB tags to the IOBES format.
- Parameters:
tags (Sequence[str]) – The IOB tags we are converting
- Raises:
ValueError – If there were errors in the IOB formatting of the input.
- Returns:
Tags that produce the same spans in the IOBES format.
- Return type:
List[str]
- iobes.convert.iob_to_bilou(tags)[source]
Convert IOB tags to the BILOU format.
- Parameters:
tags (Sequence[str]) – The IOB tags we are converting
- Raises:
ValueError – If there were errors in the IOB formatting of the input.
- Returns:
Tags that produce the same spans in the BILOU format.
- Return type:
List[str]
- iobes.convert.iob_to_bmeow(tags)[source]
Convert IOB tags to the BMEOW format.
- Parameters:
tags (Sequence[str]) – The IOB tags we are converting
- Raises:
ValueError – If there were errors in the IOB formatting of the input.
- Returns:
Tags that produce the same spans in the BMEOW format.
- Return type:
List[str]
- iobes.convert.iob_to_bmewo(tags)[source]
Convert IOB tags to the BMEWO format.
Note
Alias for iob_to_bmeow().
- Parameters:
tags (Sequence[str]) – The IOB tags we are converting
- Raises:
ValueError – If there were errors in the IOB formatting of the input.
- Returns:
Tags that produce the same spans in the BMEWO format.
- Return type:
List[str]
- iobes.convert.bio_to_iob(tags)[source]
Convert BIO tags to the IOB format.
- Parameters:
tags (Sequence[str]) – The BIO tags we are converting
- Raises:
ValueError – If there were errors in the BIO formatting of the input.
- Returns:
Tags that produce the same spans in the IOB format.
- Return type:
List[str]
- iobes.convert.bio_to_iobes(tags)[source]
Convert BIO tags to the IOBES format.
- Parameters:
tags (Sequence[str]) – The BIO tags we are converting
- Raises:
ValueError – If there were errors in the BIO formatting of the input.
- Returns:
Tags that produce the same spans in the IOBES format.
- Return type:
List[str]
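For well-formed input, moving from BIO to IOBES only requires looking one tag ahead to see whether the span continues. The sketch below illustrates that rule; `bio_to_iobes_sketch` is a hypothetical name, and unlike the documented function it does not validate its input or raise ValueError.

```python
from typing import List, Sequence


def bio_to_iobes_sketch(tags: Sequence[str]) -> List[str]:
    """Rewrite well-formed BIO tags as IOBES tags (illustrative sketch)."""
    converted = []
    for i, tag in enumerate(tags):
        function, _, span_type = tag.partition("-")
        if function == "O":
            converted.append(tag)
            continue
        # The span continues only if the next tag is an I- of the same type.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{span_type}"
        if function == "B":
            new_function = "B" if continues else "S"
        else:  # function == "I"
            new_function = "I" if continues else "E"
        converted.append(f"{new_function}-{span_type}")
    return converted


print(bio_to_iobes_sketch(["B-PER", "I-PER", "I-PER", "O", "B-LOC"]))
# ['B-PER', 'I-PER', 'E-PER', 'O', 'S-LOC']
```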
- iobes.convert.bio_to_bilou(tags)[source]
Convert BIO tags to the BILOU format.
- Parameters:
tags (Sequence[str]) – The BIO tags we are converting
- Raises:
ValueError – If there were errors in the BIO formatting of the input.
- Returns:
Tags that produce the same spans in the BILOU format.
- Return type:
List[str]
- iobes.convert.bio_to_bmeow(tags)[source]
Convert BIO tags to the BMEOW format.
- Parameters:
tags (Sequence[str]) – The BIO tags we are converting
- Raises:
ValueError – If there were errors in the BIO formatting of the input.
- Returns:
Tags that produce the same spans in the BMEOW format.
- Return type:
List[str]
- iobes.convert.bio_to_bmewo(tags)[source]
Convert BIO tags to the BMEWO format.
Note
Alias for bio_to_bmeow()
- Parameters:
tags (Sequence[str]) – The BIO tags we are converting
- Raises:
ValueError – If there were errors in the BIO formatting of the input.
- Returns:
Tags that produce the same spans in the BMEWO format.
- Return type:
List[str]
- iobes.convert.iobes_to_iob(tags)[source]
Convert IOBES tags to the IOB format.
- Parameters:
tags (Sequence[str]) – The IOBES tags we are converting
- Raises:
ValueError – If there were errors in the IOBES formatting of the input.
- Returns:
Tags that produce the same spans in the IOB format.
- Return type:
List[str]
- iobes.convert.iobes_to_bio(tags)[source]
Convert IOBES tags to the BIO format.
- Parameters:
tags (Sequence[str]) – The IOBES tags we are converting
- Raises:
ValueError – If there were errors in the IOBES formatting of the input.
- Returns:
Tags that produce the same spans in the BIO format.
- Return type:
List[str]
- iobes.convert.iobes_to_bilou(tags)[source]
Convert IOBES tags to the BILOU format.
- Parameters:
tags (Sequence[str]) – The IOBES tags we are converting
- Raises:
ValueError – If there were errors in the IOBES formatting of the input.
- Returns:
Tags that produce the same spans in the BILOU format.
- Return type:
List[str]
- iobes.convert.iobes_to_bmeow(tags)[source]
Convert IOBES tags to the BMEOW format.
- Parameters:
tags (Sequence[str]) – The IOBES tags we are converting
- Raises:
ValueError – If there were errors in the IOBES formatting of the input.
- Returns:
Tags that produce the same spans in the BMEOW format.
- Return type:
List[str]
- iobes.convert.iobes_to_bmewo(tags)[source]
Convert IOBES tags to the BMEWO format.
Note
Alias for iobes_to_bmeow()
- Parameters:
tags (Sequence[str]) – The IOBES tags we are converting
- Raises:
ValueError – If there were errors in the IOBES formatting of the input.
- Returns:
Tags that produce the same spans in the BMEWO format.
- Return type:
List[str]
- iobes.convert.bilou_to_iob(tags)[source]
Convert BILOU tags to the IOB format.
- Parameters:
tags (Sequence[str]) – The BILOU tags we are converting
- Raises:
ValueError – If there were errors in the BILOU formatting of the input.
- Returns:
Tags that produce the same spans in the IOB format.
- Return type:
List[str]
- iobes.convert.bilou_to_bio(tags)[source]
Convert BILOU tags to the BIO format.
- Parameters:
tags (Sequence[str]) – The BILOU tags we are converting
- Raises:
ValueError – If there were errors in the BILOU formatting of the input.
- Returns:
Tags that produce the same spans in the BIO format.
- Return type:
List[str]
- iobes.convert.bilou_to_iobes(tags)[source]
Convert BILOU tags to the IOBES format.
- Parameters:
tags (Sequence[str]) – The BILOU tags we are converting
- Raises:
ValueError – If there were errors in the BILOU formatting of the input.
- Returns:
Tags that produce the same spans in the IOBES format.
- Return type:
List[str]
- iobes.convert.bilou_to_bmeow(tags)[source]
Convert BILOU tags to the BMEOW format.
- Parameters:
tags (Sequence[str]) – The BILOU tags we are converting
- Raises:
ValueError – If there were errors in the BILOU formatting of the input.
- Returns:
Tags that produce the same spans in the BMEOW format.
- Return type:
List[str]
- iobes.convert.bilou_to_bmewo(tags)[source]
Convert BILOU tags to the BMEWO format.
Note
Alias for bilou_to_bmeow()
- Parameters:
tags (Sequence[str]) – The BILOU tags we are converting
- Raises:
ValueError – If there were errors in the BILOU formatting of the input.
- Returns:
Tags that produce the same spans in the BMEWO format.
- Return type:
List[str]
- iobes.convert.bmeow_to_iob(tags)[source]
Convert BMEOW tags to the IOB format.
- Parameters:
tags (Sequence[str]) – The BMEOW tags we are converting
- Raises:
ValueError – If there were errors in the BMEOW formatting of the input.
- Returns:
Tags that produce the same spans in the IOB format.
- Return type:
List[str]
- iobes.convert.bmeow_to_bio(tags)[source]
Convert BMEOW tags to the BIO format.
- Parameters:
tags (Sequence[str]) – The BMEOW tags we are converting
- Raises:
ValueError – If there were errors in the BMEOW formatting of the input.
- Returns:
Tags that produce the same spans in the BIO format.
- Return type:
List[str]
- iobes.convert.bmeow_to_iobes(tags)[source]
Convert BMEOW tags to the IOBES format.
- Parameters:
tags (Sequence[str]) – The BMEOW tags we are converting
- Raises:
ValueError – If there were errors in the BMEOW formatting of the input.
- Returns:
Tags that produce the same spans in the IOBES format.
- Return type:
List[str]
- iobes.convert.bmeow_to_bilou(tags)[source]
Convert BMEOW tags to the BILOU format.
- Parameters:
tags (Sequence[str]) – The BMEOW tags we are converting
- Raises:
ValueError – If there were errors in the BMEOW formatting of the input.
- Returns:
Tags that produce the same spans in the BILOU format.
- Return type:
List[str]
- iobes.convert.bmewo_to_iob(tags)[source]
Convert BMEWO tags to the IOB format.
Note
Alias for bmeow_to_iob()
- Parameters:
tags (Sequence[str]) – The BMEWO tags we are converting
- Raises:
ValueError – If there were errors in the BMEWO formatting of the input.
- Returns:
Tags that produce the same spans in the IOB format.
- Return type:
List[str]
- iobes.convert.bmewo_to_bio(tags)[source]
Convert BMEWO tags to the BIO format.
Note
Alias for bmeow_to_bio()
- Parameters:
tags (Sequence[str]) – The BMEWO tags we are converting
- Raises:
ValueError – If there were errors in the BMEWO formatting of the input.
- Returns:
Tags that produce the same spans in the BIO format.
- Return type:
List[str]
- iobes.convert.bmewo_to_iobes(tags)[source]
Convert BMEWO tags to the IOBES format.
Note
Alias for bmeow_to_iobes()
- Parameters:
tags (Sequence[str]) – The BMEWO tags we are converting
- Raises:
ValueError – If there were errors in the BMEWO formatting of the input.
- Returns:
Tags that produce the same spans in the IOBES format.
- Return type:
List[str]
- iobes.convert.bmewo_to_bilou(tags)[source]
Convert BMEWO tags to the BILOU format.
Note
Alias for bmeow_to_bilou()
- Parameters:
tags (Sequence[str]) – The BMEWO tags we are converting
- Raises:
ValueError – If there were errors in the BMEWO formatting of the input.
- Returns:
Tags that produce the same spans in the BILOU format.
- Return type:
List[str]
iobes.utils
- iobes.utils.extract_type(tag, sep='-')[source]
Extract the span type from a tag.
Tags are made of two parts. The second part is the type of the span which is specific to the downstream task. This function extracts that value from the tag.
- Parameters:
tag (str) – The tag to extract the type from.
sep (str) – The character (or string of characters) that separates the token function from the span type.
- Returns:
The span type.
- Return type:
str
- iobes.utils.extract_function(tag, sep='-')[source]
Extract the token function from a tag.
Tags are made of two parts. The first part is the role that this tag plays in a span. It is generic across datasets (but differs across span formatting options) and tells us things like whether this tag is the beginning of a span or the end of a span. This function extracts the token function from the tag.
- Parameters:
tag (str) – The tag to extract the token function from.
sep (str) – The character (or string of characters) that separates the token function from the span type.
- Returns:
The token function of this tag.
- Return type:
str
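The two extraction helpers can be pictured as splitting a tag once on the separator. The functions below are hypothetical sketches, not the library's code. In particular, returning the whole tag for a bare tag such as O that contains no separator is an assumption, as is splitting only on the first occurrence of the separator.

```python
def sketch_extract_function(tag, sep="-"):
    """Return the part of the tag before the first separator."""
    return tag.split(sep, 1)[0]


def sketch_extract_type(tag, sep="-"):
    """Return the part of the tag after the first separator.

    Assumption: a tag with no separator (like "O") is returned unchanged.
    """
    if sep not in tag:
        return tag
    return tag.split(sep, 1)[1]


function = sketch_extract_function("B-to_city")
span_type = sketch_extract_type("B-to_city")
```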
- iobes.utils.safe_get(xs, idx)[source]
Get the element at some index but return None when the index is out of bounds.
- Parameters:
xs (Sequence[T]) – The list to extract from.
idx (int) – The index to try to pull from.
- Returns:
The value at idx or None if idx is out of bounds.
- Return type:
T | None
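A helper like this is handy when looking at the tag before or after the current one without special-casing the ends of the sequence. The sketch below is a hypothetical version, not the library's code, and it assumes that negative indices count as out of bounds rather than wrapping around as they do in normal Python indexing.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sketch_safe_get(xs: Sequence[T], idx: int) -> Optional[T]:
    """Return xs[idx], or None when idx falls outside of xs."""
    # Assumption: negative indices are treated as out of bounds.
    if idx < 0 or idx >= len(xs):
        return None
    return xs[idx]


tags = ["B-PER", "I-PER", "O"]
previous_of_first = sketch_safe_get(tags, -1)
next_of_last = sketch_safe_get(tags, len(tags))
middle = sketch_safe_get(tags, 1)
```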
- iobes.utils.sort_spans(spans)[source]
Sort the list of spans.
Note
The spans are sorted by their starting location, with ties broken by their end. Such a tie should never happen because spans are not allowed to overlap.
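The ordering described in the note amounts to sorting on a (start, end) key. Here is a minimal sketch; `SketchSpan` is a hypothetical stand-in for the library's span type, with only the fields needed for the example.

```python
from typing import NamedTuple


class SketchSpan(NamedTuple):
    """Hypothetical span record used only for this example."""

    type: str
    start: int
    end: int


spans = [SketchSpan("LOC", 5, 6), SketchSpan("PER", 0, 2), SketchSpan("ORG", 3, 5)]
# Sort by where the span starts, breaking ties by where it ends.
ordered = sorted(spans, key=lambda span: (span.start, span.end))
ordered_types = [span.type for span in ordered]
```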
iobes.transition
- class iobes.transition.Transition(source, target, valid)[source]
A transition from one state to another.
This includes information about whether the transition is legal or not. The legality of a transition is dictated by the span encoding scheme used.
- Parameters:
source (str) – The state you are starting at.
target (str) – The state you are going to.
valid (bool) – Is this transition allowed by the encoding scheme?
- source: str
Alias for field number 0
- target: str
Alias for field number 1
- valid: bool
Alias for field number 2
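The "Alias for field number" entries indicate a named tuple, so a transition can be read by field name, by position, or by unpacking. The sketch below uses a local stand-in class, `SketchTransition`, so that it runs without the library installed.

```python
from typing import NamedTuple


class SketchTransition(NamedTuple):
    """Local stand-in that mirrors the documented fields."""

    source: str
    target: str
    valid: bool


transition = SketchTransition(source="O", target="I-PER", valid=False)
# Fields can be read by name, by index, or by tuple unpacking.
source, target, valid = transition
first_field = transition[0]
```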
- iobes.transition.transitions_legality(tags, span_type, start='<GO>', end='<EOS>')[source]
Get the transition legality for some SpanEncoding format.
Return a list of transitions and their legality based on the SpanEncoding schemes and the types of spans present.
This is a convenience function that dispatches to span encoding specific implementations based on the span_type.
Note
We include special tags that represent the start and end of sequences. These special values are used by downstream implementations of things like Conditional Random Fields (CRFs) (Lafferty et al., 2001) and help define constraints about which tags are allowed on the first and last token in a sequence. The general rules for the start symbol are that nothing can transition to the start token and that the legal targets of a transition from the start symbol are limited by the span encoding scheme. Similarly, the end token cannot transition into anything else, and what can transition to it is specified by the encoding scheme.
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
span_type (SpanEncoding) – The span encoding format we are trying to use. Different formats impose different rules about which transitions are legal or not.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Raises:
ValueError – If the span encoding scheme isn’t recognized.
- Returns:
The list of transitions.
- Return type:
List[Transition]
- iobes.transition.token_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing tokens.
Note
Token level annotations like Part of Speech Tagging don’t have transition constraints defined by a span encoding scheme (because there is no span encoding scheme). This means that almost every transition is allowed. The only illegal transitions are moving back to the special start token or leaving the end token.
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
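The two rules in the note can be sketched by enumerating every pair of states. This is a hypothetical illustration rather than the library's code: it returns plain tuples instead of Transition objects, and whether the real function also lists pairs such as start to end is an assumption.

```python
from itertools import product


def sketch_token_transitions(tags, start="<GO>", end="<EOS>"):
    """List (source, target, valid) for every pair of states."""
    states = sorted(tags) + [start, end]
    transitions = []
    for source, target in product(states, repeat=2):
        # Nothing may move back to the start and nothing may leave the end.
        valid = target != start and source != end
        transitions.append((source, target, valid))
    return transitions


transitions = sketch_token_transitions({"NOUN", "VERB"})
legal_count = sum(1 for _, _, valid in transitions if valid)
```

With two tags plus the two special states there are 16 pairs, of which 7 are ruled out by the two constraints.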
- iobes.transition.iob_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing IOB tags.
There are a few rules that govern IOB tagging. Spans are allowed to begin with an I-, so many of the rules that other span encoding formats have about not transitioning from O to an I- don’t apply. The main rules concern the use of the B- prefix. In IOB a tag may only start with B- when it is the start of a new span that directly follows (touches) a previous span of the same type. This translates into the rule that a B- tag can only follow a tag (either B- or I-) that has the same type.
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
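The B- rule described above can be written as a predicate over a single (source, target) pair. The function below is a hypothetical sketch of that rule plus the start and end constraints; it is not the library's implementation, which returns a full list of transitions.

```python
def sketch_iob_is_legal(source, target, start="<GO>", end="<EOS>", sep="-"):
    """Check one transition against the IOB rules described above."""
    if target == start or source == end:
        return False
    target_function, _, target_type = target.partition(sep)
    if target_function == "B" and target_type:
        # B- may only directly follow a span of the same type.
        source_function, _, source_type = source.partition(sep)
        return source_function in ("B", "I") and source_type == target_type
    # Everything else, including O -> I-, is allowed in IOB.
    return True


outside_to_inside = sketch_iob_is_legal("O", "I-PER")
outside_to_begin = sketch_iob_is_legal("O", "B-PER")
touching_same_type = sketch_iob_is_legal("I-PER", "B-PER")
touching_other_type = sketch_iob_is_legal("I-LOC", "B-PER")
```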
- iobes.transition.bio_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing BIO tags.
TODO
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
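The description for this function is still a TODO, so the following is only a sketch of the conventional BIO constraint and is not confirmed against the library: an I- tag may only follow a B- or I- tag of the same type, while O and B- tags may follow anything. The function name is hypothetical.

```python
def sketch_bio_is_legal(source, target, start="<GO>", end="<EOS>", sep="-"):
    """Check one transition against the conventional BIO constraint."""
    if target == start or source == end:
        return False
    target_function, _, target_type = target.partition(sep)
    if target_function == "I" and target_type:
        # I- must continue a span of the same type.
        source_function, _, source_type = source.partition(sep)
        return source_function in ("B", "I") and source_type == target_type
    return True


begin_then_inside = sketch_bio_is_legal("B-PER", "I-PER")
outside_then_inside = sketch_bio_is_legal("O", "I-PER")
type_mismatch = sketch_bio_is_legal("B-LOC", "I-PER")
```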
- iobes.transition.with_end_transitions_legality(tags, span_format, start='<GO>', end='<EOS>')[source]
Get transition legality when processing tags whose encoding scheme has an end token function.
Span encoding schemes that have special token prefixes for tokens at the start, middle, and end of a span (and a specific prefix for a token that represents a single token span) have quite a few more rules. These can mostly be summed up as: spans need to start with the starting prefix and end with the ending prefix. This means that inside tokens can’t follow an outside tag and can’t be followed by an outside tag. There are also rules like: a beginning token can’t be followed by an ending token of a different type.
Note
Several span formats like IOBES, BILOU, and BMEOW are the same except for the value of some of the TokenFunction prefixes (IOBES has E for the end while BILOU has L). Other than these differences they all behave the same way. This function handles all of these formats by comparing against attributes like SpanFormat.BEGIN instead of literal strings. This is the underlying implementation; the user facing function for a specific encoding scheme should be used to get its transitions.
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
span_format (Type[SpanFormat]) – The SpanFormat we are using for these tags.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
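One way to picture the shared logic is a predicate that takes the prefixes as data instead of hard-coding them. The sketch below is hypothetical and is not the library's code: `SketchFormat` stands in for SpanFormat, and the rule set is a simplified reading of the description above (an open span must be continued or closed with the same type, and a span cannot be continued when none is open).

```python
from typing import NamedTuple


class SketchFormat(NamedTuple):
    """Hypothetical stand-in holding the four prefixes of a format."""

    begin: str
    inside: str
    end: str
    single: str


IOBES = SketchFormat("B", "I", "E", "S")
BILOU = SketchFormat("B", "I", "L", "U")


def sketch_with_end_is_legal(source, target, fmt, start="<GO>", end="<EOS>", sep="-"):
    """Check one transition for a format that has end and single prefixes."""
    if target == start or source == end:
        return False
    source_function, _, source_type = source.partition(sep)
    target_function, _, target_type = target.partition(sep)
    # The source leaves a span open if it begins one or is inside one.
    source_open = bool(source_type) and source_function in (fmt.begin, fmt.inside)
    # The target continues a span if it is an inside or an end tag.
    target_continues = bool(target_type) and target_function in (fmt.inside, fmt.end)
    if source_open:
        return target_continues and source_type == target_type
    return not target_continues


iobes_close = sketch_with_end_is_legal("B-PER", "E-PER", IOBES)
bilou_close = sketch_with_end_is_legal("B-PER", "L-PER", BILOU)
wrong_type = sketch_with_end_is_legal("B-PER", "E-LOC", IOBES)
```

Passing the prefixes in as a value is what lets the same check serve IOBES and BILOU without duplicating the rules.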
- iobes.transition.iobes_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing IOBES tags.
TODO
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
- iobes.transition.bilou_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing BILOU tags.
TODO
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
- iobes.transition.bmeow_transitions_legality(tags, start='<GO>', end='<EOS>')[source]
Get transition legality when processing BMEOW tags.
TODO
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
- iobes.transition.bmewo_transitions_legality(tags, start='<GO>', end='<EOS>')
Get transition legality when processing BMEWO tags.
TODO
- Parameters:
tags (Set[str]) – The tags that we can assign to tokens.
start (str) – A special tag representing the start of all sequences.
end (str) – A special tag representing the end of all sequences.
- Returns:
The list of transitions.
- Return type:
List[Transition]
- iobes.transition.transitions_to_tuple_map(transitions)[source]
Convert the list of transitions to a dictionary keyed by the (source, target) tuple.
This data structure is useful when, given a pair of states, you want to check whether the transition is legal in O(1) time.
- Parameters:
transitions (List[Transition]) – The list of transitions.
- Returns:
A dictionary mapping (source, target) pairs to the legality of that transition.
- Return type:
Dict[Tuple[str, str], bool]
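The resulting structure can be sketched with a dictionary comprehension. The example uses plain (source, target, valid) tuples as stand-ins for Transition objects so that it runs without the library installed; the sample transitions are made up for illustration.

```python
# Illustrative transitions as (source, target, valid) tuples.
transitions = [
    ("O", "B-PER", True),
    ("O", "I-PER", False),
    ("B-PER", "I-PER", True),
]

# Key each legality flag by its (source, target) pair.
tuple_map = {(source, target): valid for source, target, valid in transitions}

# A single dictionary lookup answers "is this transition legal?".
is_legal = tuple_map[("O", "I-PER")]
```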
- iobes.transition.transitions_to_map(transitions)[source]
Convert the list of transitions into nested dictionaries keyed by the states.
The data format is a dictionary mapping source to a dictionary of target. This inner dictionary has the legality of the transition as values. For example result[src][tgt] being True means that the transition from src to tgt is valid while a False means it is illegal.
This data structure is useful when, given a state, you want to see the legality of the transitions from it to other states.
- Parameters:
transitions (List[Transition]) – The list of transitions.
- Returns:
Nested dictionaries representing the legality of transitions.
- Return type:
Dict[str, Dict[str, bool]]
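The nested layout can be sketched as follows. As before, plain (source, target, valid) tuples stand in for Transition objects and the sample data is made up for illustration.

```python
from collections import defaultdict

# Illustrative transitions as (source, target, valid) tuples.
transitions = [
    ("O", "O", True),
    ("O", "B-PER", True),
    ("O", "I-PER", False),
    ("B-PER", "I-PER", True),
]

# Outer key is the source state, inner key is the target state.
nested = defaultdict(dict)
for source, target, valid in transitions:
    nested[source][target] = valid

# All legal targets reachable from the "O" state.
legal_from_outside = [target for target, valid in nested["O"].items() if valid]
```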
- iobes.transition.transitions_to_mask(transitions, vocabulary)[source]
Convert the list of transitions into a mask.
The starting state is represented by the row index in the mask while the ending state is represented by the column index. A value of one in the mask means the transition is legal while a zero means it is illegal. For example, mask[src, tgt] == 1 means the transition from src to tgt is allowed, while mask[src, tgt] == 0 means it is not.
This data structure is useful when you have a transition matrix that represents something like the probability of transitions between states and you want to zero out the value for illegal transitions.
Note
This function has a dependency on numpy. This is an optional dependency for the iobes library and can be installed with pip install iobes[mask].
- Parameters:
transitions (List[Transition]) – The list of transitions.
vocabulary (Dict[str, int]) – A mapping of state name to index. This is used to figure out where to place the state value in the mask.
- Returns:
A mask representing the legal and illegal transitions.
- Return type:
np.ndarray
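To make the row and column convention concrete, the sketch below builds the mask as nested Python lists and uses it to zero out a score for an illegal transition. This swaps in plain lists for the np.ndarray that the documented function returns so that the example runs without numpy; the function name, vocabulary, and scores are made up for illustration.

```python
def sketch_transitions_to_mask(transitions, vocabulary):
    """Build a square 0/1 mask with rows as sources and columns as targets."""
    size = len(vocabulary)
    mask = [[0] * size for _ in range(size)]
    for source, target, valid in transitions:
        mask[vocabulary[source]][vocabulary[target]] = int(valid)
    return mask


vocabulary = {"O": 0, "B-PER": 1, "I-PER": 2}
# Every pair is legal here except O -> I-PER.
transitions = [
    (source, target, not (source == "O" and target == "I-PER"))
    for source in vocabulary
    for target in vocabulary
]
mask = sketch_transitions_to_mask(transitions, vocabulary)

# Zero out the scores of illegal transitions with an elementwise product.
scores = [[0.5, 0.3, 0.2], [0.4, 0.1, 0.5], [0.6, 0.2, 0.2]]
masked = [
    [score * keep for score, keep in zip(score_row, mask_row)]
    for score_row, mask_row in zip(scores, mask)
]
```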
iobes.write
- iobes.write.sort_spans(spans)[source]
Sort a list of spans ordered by where they start.
The idea of ordering for spans is that the earlier in the tags the span appears (where the span starts) the earlier it will appear in the sorted list of spans.
- iobes.write.make_blanks(spans, length=None, fill='O')[source]
Create a list of outside tags that we can populate with tags generated from spans.
- Parameters:
spans (Sequence[Span]) – The list of spans that will eventually be used to populate the tags.
length (int | None) – A pre-specified length for the list of empty tags.
fill (str) – The value that will be used to populate the list of tags.
- Returns:
A list of outside tags.
- Return type:
List[str]
- iobes.write.tags_length_from_spans(spans)[source]
Get the length of the list of tags that would be needed to contain all the spans.
To get a list of tags that is long enough to contain all the spans we need to find the span with the largest end index.
- Parameters:
spans (Sequence[Span]) – The list of spans we need to contain
- Returns:
The length of the tag list needed.
- Return type:
int
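The length calculation and the blank tag list fit together as sketched below. These are hypothetical stand-ins, not the library's code. `SketchSpan` carries only the fields needed here, and the sketch assumes that a span's end index is exclusive, so the largest end is itself the required length.

```python
from typing import NamedTuple


class SketchSpan(NamedTuple):
    """Hypothetical span record; end is assumed to be exclusive."""

    type: str
    start: int
    end: int


def sketch_tags_length(spans):
    """Return the shortest tag list length that holds every span."""
    return max((span.end for span in spans), default=0)


def sketch_make_blanks(spans, length=None, fill="O"):
    """Return a list of fill tags, sized from the spans unless length is given."""
    if length is None:
        length = sketch_tags_length(spans)
    return [fill] * length


spans = [SketchSpan("PER", 0, 2), SketchSpan("LOC", 4, 5)]
blanks = sketch_make_blanks(spans)
padded = sketch_make_blanks(spans, length=7)
```

An explicit length is useful when the sentence has trailing tokens after the last span, which the spans alone cannot reveal.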
- iobes.write.make_tag(token_function, span_type, delimiter='-')[source]
Create a tag from a token function and a span type.
- Parameters:
token_function (str | None) – The token function for the tag, it is the first part of the tag.
span_type (str) – The type of the span this tag is part of. It is the second part of the tag.
delimiter (str) – A separator character (or sequence of characters) that separates the token_function and the span_type.
- Returns:
The created tag.
- Return type:
str
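Tag creation is the inverse of splitting a tag into its two parts. The sketch below is a hypothetical version that joins the parts with the delimiter. The documented signature also accepts None for the token function, but what happens in that case is not described here, so the sketch assumes a string is always passed.

```python
def sketch_make_tag(token_function, span_type, delimiter="-"):
    """Join a token function and a span type into a tag.

    Assumption: token_function is always a string in this sketch.
    """
    return f"{token_function}{delimiter}{span_type}"


# Write a two-token PER span into a list of outside tags, BIO style.
tags = ["O"] * 4
tags[1] = sketch_make_tag("B", "PER")
tags[2] = sketch_make_tag("I", "PER")
```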