folia.main.AbstractSpanAnnotation

class folia.main.AbstractSpanAnnotation(doc, *args, **kwargs)

Bases: AbstractElement, AllowGenerateID, AllowCorrections

Abstract element from which all span annotation elements are derived.

Method Summary

`__init__`(doc, *args, **kwargs)
`accepts`(Class[, raiseexceptions, parentinstance])
`add`(child, *args, **kwargs)
`addable`(parent[, set, raiseexceptions])	Tests whether a new element of this class can be added to the parent.
`addidsuffix`(idsuffix[, recursive])	Appends a suffix to this element's ID, and optionally to all child IDs as well.
`addtoindex`([norecurse])	Makes sure this element (and all its subelements) is properly added to the index
`ancestor`(*Classes)	Find the most immediate ancestor of the specified type, multiple classes may be specified.
`ancestors`([Class])	Generator yielding all ancestors of this element, effectively back-tracing its path to the root element.
`annotation`(type[, set])	Will return a single annotation (even if there are multiple).
`annotations`(Class[, set])	Obtain annotations.
`annotator2processor`([annotator, ...])	Converts annotator information to processor information (FoLiA v2).
`append`(child, *args, **kwargs)	See `AbstractElement.append()`
`checkdeclaration`()	Internal method (usually no need to call this) that checks whether the element's annotation type is properly declared, raises an exception if not so, or auto-declares the annotation type if need be.
`context`(size[, placeholder, scope])	Returns this word in context, {size} words to the left, the current word, and {size} words to the right
`copy`([newdoc, idsuffix])	Make a deep copy of this element and all its children.
`copychildren`([newdoc, idsuffix])	Generator creating a deep copy of the children of this element.
`correct`(**kwargs)	Apply a correction (TODO: documentation still to be written)
`count`(Class[, set, recursive, ignore, node])	Like `AbstractElement.select()`, but instead of returning the elements, it merely counts them.
`deepvalidation`()	Perform deep validation of this element.
`depthfirstsearch`(function)	Generic depth first search algorithm using a callback function, continues as long as the callback function returns None
`description`()	Obtain the description associated with the element.
`elements`([founditems])	Returns a depth-first flat list of all elements below this element
`feat`(subset)	Obtain the feature class value of the specific subset.
`findcorrectionhandling`(cls)	Find the proper correctionhandling given a textclass by looking in the underlying corrections where it is reused
`findreplaceables`(parent[, set])	Internal method to find replaceable elements.
`generate_id`(cls)
`getindex`(child[, recursive, ignore])	Get the index at which an element occurs, recursive by default!
`getmetadata`([key])	Get the metadata that applies to this element, automatically inherited from parent elements
`gettextdelimiter`([retaintokenisation])	Return the text delimiter for this class.
`hasannotation`(Class[, set])	Returns an integer indicating whether such an annotation exists, and if so, how many.
`hasphon`([cls, strict, correctionhandling, ...])	Does this element have phonetic content (of the specified class)
`hastag`(tag)	Check whether a processing tag is present
`hastext`([cls, strict, correctionhandling, ...])	Does this element have text (of the specified class)
`incorrection`()	Is this element part of a correction? If it is, it returns the Correction element (evaluating to True), otherwise it returns None
`insert`(index, child, *args, **kwargs)
`items`([founditems])	Returns a depth-first flat list of all items below this element (not limited to AbstractElement)
`json`([attribs, recurse, ignorelist])	Serialises the FoLiA element and all its contents to a Python dictionary suitable for serialisation to JSON.
`layer`()	Return the annotation layer this annotation pertains to
`leftcontext`(size[, placeholder, scope])	Returns the left context for an element, as a list.
`move`(newdoc[, idsuffix])	Move elements of one document to another
`next`([Class, scope, reverse])	Returns the next element, if it is of the specified type and if it does not cross the boundary of the defined scope.
`originaltext`([cls])	Alias for retrieving the original, uncorrected text.
`parsecommonarguments`(doc, **kwargs)	Internal function to parse common FoLiA attributes and sets up the instance accordingly.
`parsexml`(node, doc, **kwargs)	Internal class method used for turning an XML element into an instance of the Class.
`phon`([cls, previousdelimiter, strict, ...])	Get the phonetic representation associated with this element (of the specified class)
`phoncontent`([cls, correctionhandling, hidden])	Get the phonetic content explicitly associated with this element (of the specified class).
`postappend`()	This method will be called after an element is added to another and does some checks.
`precedes`(other)	Returns a boolean indicating whether this element precedes the other element
`previous`([Class, scope])	Returns the previous element, if it is of the specified type and if it does not cross the boundary of the defined scope.
`relaxng`([includechildren, extraattribs, ...])	Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string)
`relaxng_backwards`()	internal helper function for backward compatibility
`remove`(child)	Removes the child element
`replace`(child, *args, **kwargs)	Appends a child element like `append()`, but replaces any existing child element of the same type and set.
`resolveoffsets`(begin, end[, ...])	Resolves supplied character offset information and returns tokens (non-token structures like linebreaks etc are ignored!).
`resolveword`(id)
`rightcontext`(size[, placeholder, scope])	Returns the right context for an element, as a list.
`select`(Class[, set, recursive, ignore, node])	Select child elements of the specified class.
`setdoc`(newdoc[, idsuffix])	Set a different document and handles setting an id suffix.
`setdocument`(doc)	Associate a document with this element.
`setparents`()	Correct all parent relations for elements within the scope.
`setprocessor`(processor)	Sets the processor for this element, taking care of adding an annotator in the declarations
`setspan`(*args)	Sets the span of the span element anew, erases all data inside.
`settext`(text[, cls])	Set the text for this element.
`sort`([force])	Sort children (wrefs and child spans) in order of appearance.
`speech_speaker`()	Retrieves the speaker of the audio or video file associated with the element.
`speech_src`()	Retrieves the URL/filename of the audio or video file associated with the element.
`stricttext`([cls])	Alias for `text()` with `strict=True`
`substitute`(oldchild, newchild, *args, **kwargs)	Substitutes a particular child element with another.
`tag`(tag)	Add a processing tag
`text`([cls, retaintokenisation, ...])	Get the text associated with this element (of the specified class)
`textcontent`([cls, correctionhandling, hidden])	Get the text content explicitly associated with this element (of the specified class).
`textvalidation`([warnonly, trim_spaces])	Run text validation on this element.
`toktext`([cls])	Alias for `text()` with `retaintokenisation=True`
`untag`(tag)	Remove a processing tag
`updatetext`()	Recompute textual value based on the text content of the children.
`wrefs`([index, recurse])	Returns a list of word references, these can be Words but also Morphemes or Phonemes.
`xml`([attribs, elements, skipchildren, form])	See `AbstractElement.xml()`
`xmlstring`([pretty_print, form])	Serialises this FoLiA element and all its contents to XML.
`__iter__`()	Iterate over all children of this element.
`__len__`()	Returns the number of child elements under the current element.
`__str__`()	Alias for `text()`

Class Attributes

ACCEPTED_DATA = (<class 'folia.main.AbstractInlineAnnotation'>, <class 'folia.main.Comment'>, <class 'folia.main.Description'>, <class 'folia.main.ForeignData'>, <class 'folia.main.LinkReference'>, <class 'folia.main.Metric'>, <class 'folia.main.Relation'>)

ANNOTATIONTYPE = None

AUTH = True

AUTO_GENERATE_ID = False

HIDDEN = False

IMPLICITSPACE = False

OCCURRENCES = 0

OCCURRENCES_PER_SET = 0

OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5, 8, 6, 7, 9, 10, 11, 14)

PHONCONTAINER = False

PRIMARYELEMENT = True

PRINTABLE = True

REQUIRED_ATTRIBS = None

REQUIRED_DATA = None

SETONLY = False

SPEAKABLE = True

SUBSET = None

TEXTCONTAINER = False

TEXTDELIMITER = None

WREFABLE = False

XLINK = False

XMLTAG = None

Method Details

__init__(doc, *args, **kwargs)

classmethod accepts(Class, raiseexceptions=True, parentinstance=None)

add(child, *args, **kwargs)

classmethod addable(parent, set=False, raiseexceptions=True)

Tests whether a new element of this class can be added to the parent.

This method is mostly for internal use. It uses the OCCURRENCES property, but may be overridden by subclasses for more customised behaviour.

Parameters:

parent (AbstractElement) – The element that is being added to
set (str, None, or False) – The set
raiseexceptions (bool) – Raise an exception if the element can’t be added?

Returns:

bool

Raises:

ValueError
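The OCCURRENCES check described above can be pictured with a small pure-Python model. It is a sketch under assumptions, not folia's implementation: the `Element` and `Lemma` classes are hypothetical, a limit of `0` is taken to mean "unlimited", and the per-set limit of 1 is an illustrative value.

```python
# Simplified, hypothetical model of an OCCURRENCES-based addable() check.
# This is NOT folia's implementation; the class names are illustrative only.


class Element:
    OCCURRENCES = 0          # max. instances per parent (0 = unlimited, assumed)
    OCCURRENCES_PER_SET = 0  # max. instances per set in one parent (0 = unlimited, assumed)

    def __init__(self, set=None):
        self.set = set
        self.children = []

    @classmethod
    def addable(cls, parent, set=False, raiseexceptions=True):
        """Return True if a new instance of cls may be added to parent."""
        siblings = [c for c in parent.children if isinstance(c, cls)]
        if cls.OCCURRENCES > 0 and len(siblings) >= cls.OCCURRENCES:
            return cls._refuse("too many occurrences", raiseexceptions)
        if set is not False and cls.OCCURRENCES_PER_SET > 0:
            same_set = [c for c in siblings if c.set == set]
            if len(same_set) >= cls.OCCURRENCES_PER_SET:
                return cls._refuse("too many occurrences in this set", raiseexceptions)
        return True

    @staticmethod
    def _refuse(message, raiseexceptions):
        if raiseexceptions:
            raise ValueError(message)
        return False


class Lemma(Element):
    OCCURRENCES_PER_SET = 1  # illustrative: at most one lemma per set


word = Element()
print(Lemma.addable(word, "lemma-set"))  # True: no lemma yet
word.children.append(Lemma("lemma-set"))
print(Lemma.addable(word, "lemma-set", raiseexceptions=False))  # False: set already used
print(Lemma.addable(word, "other-set"))  # True: different set
```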

addidsuffix(idsuffix, recursive=True): Appends a suffix to this element’s ID, and optionally to all child IDs as well. There is usually no need to call this directly; it is invoked implicitly by copy().

addtoindex(norecurse=None): Makes sure this element (and all its subelements) is properly added to the index.

ancestor(*Classes)

Find the most immediate ancestor of the specified type, multiple classes may be specified. Raise a NoSuchAnnotation exception if not found.

Parameters:: *Classes – The possible classes (AbstractElement or subclasses) to select from. Not instances!

Example:

paragraph = word.ancestor(folia.Paragraph)

ancestors(Class=None)

Generator yielding all ancestors of this element, effectively back-tracing its path to the root element. A tuple of multiple classes may be specified.

Parameters:: Class – The class or (tuple of) classes (AbstractElement or subclasses). Not instances!
Yields:: elements (instances derived from AbstractElement)
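A minimal model of the parent-chain walk behind ancestor() and ancestors(), assuming each element simply keeps a `parent` reference. The `Node`, `Division`, `Paragraph`, `Sentence` and `Word` classes are hypothetical stand-ins, and `LookupError` replaces folia's NoSuchAnnotation.

```python
# Minimal, hypothetical model of ancestor()/ancestors(); not folia's code.


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class Division(Node):
    pass


class Paragraph(Node):
    pass


class Sentence(Node):
    pass


class Word(Node):
    pass


def ancestors(element, Class=None):
    """Yield ancestors from the nearest one up to the root, optionally filtered.

    Class may be a single class or a tuple of classes (isinstance accepts both).
    """
    node = element.parent
    while node is not None:
        if Class is None or isinstance(node, Class):
            yield node
        node = node.parent


def ancestor(element, *Classes):
    """Return the nearest ancestor that matches any of Classes."""
    for node in ancestors(element, Classes):
        return node
    raise LookupError("no such ancestor")  # folia raises NoSuchAnnotation instead


div = Division("div.1")
par = Paragraph("p.1", div)
sent = Sentence("s.1", par)
word = Word("w.1", sent)

print([a.name for a in ancestors(word)])         # ['s.1', 'p.1', 'div.1']
print(ancestor(word, Paragraph, Division).name)  # p.1
```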

annotation(type, set=False): Will return a single annotation (even if there are multiple). Raises a NoSuchAnnotation exception if none was found

annotations(Class, set=False)

Obtain annotations. Very similar to select() but raises an error if the annotation was not found.

Parameters:

Class – The class you want to retrieve
set – The set you want to retrieve

Yields:

elements

Raises:

NoSuchAnnotation – if the specified annotation does not exist

annotator2processor(annotator=None, annotatortype=None, parentprocessor=None): Converts annotator information to processor information (FoLiA v2). Can be called with arguments to override defaults.

append(child, *args, **kwargs): See AbstractElement.append()

checkdeclaration(): Internal method (usually no need to call this) that checks whether the element’s annotation type is properly declared, raises an exception if not so, or auto-declares the annotation type if need be.

context(size, placeholder=None, scope=None): Returns this word in context, {size} words to the left, the current word, and {size} words to the right
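A sketch of the context window over a flat list of word strings, assuming that a placeholder, when given, pads the window at document edges. The functions are hypothetical simplifications: the real methods operate on Word elements and honour `scope`, which this model ignores.

```python
# Hypothetical model of context()/leftcontext()/rightcontext() on plain strings.
# Scope handling (sentence/paragraph boundaries) is deliberately left out.


def leftcontext(words, index, size, placeholder=None):
    left = words[max(0, index - size):index]
    if placeholder is not None:
        left = [placeholder] * (size - len(left)) + left  # pad at the start
    return left


def rightcontext(words, index, size, placeholder=None):
    right = words[index + 1:index + 1 + size]
    if placeholder is not None:
        right = right + [placeholder] * (size - len(right))  # pad at the end
    return right


def context(words, index, size, placeholder=None):
    return (leftcontext(words, index, size, placeholder)
            + [words[index]]
            + rightcontext(words, index, size, placeholder))


words = ["The", "cat", "sat", "down"]
print(context(words, 0, 2, placeholder="<s>"))  # ['<s>', '<s>', 'The', 'cat', 'sat']
print(context(words, 2, 1))                     # ['cat', 'sat', 'down']
```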

copy(newdoc=None, idsuffix='')

Make a deep copy of this element and all its children.

Parameters:

newdoc (Document) – The document the copy should be associated with.
idsuffix (str or bool) – If set to a string, it will be appended to the ID of the copy (this prevents duplicate IDs when making copies for the same document). If set to True, a random suffix will be generated.

Returns:

a copy of the element
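The idsuffix parameter exists to keep IDs unique when copying within one document. The sketch below models that with hypothetical `Node`, `addidsuffix` and `copy_element` helpers; the random-suffix format (`secrets.token_hex(4)`, i.e. 32 random bits) is an assumption, not folia's.

```python
# Hypothetical sketch of how an ID suffix keeps deep copies unique; it mimics
# the documented idsuffix behaviour of copy() but is not folia's implementation.
import copy
import secrets


class Node:
    def __init__(self, id, children=None):
        self.id = id
        self.children = children or []


def addidsuffix(node, idsuffix):
    """Append idsuffix to node's ID and, recursively, to all child IDs."""
    if node.id:
        node.id += idsuffix
    for child in node.children:
        addidsuffix(child, idsuffix)


def copy_element(node, idsuffix=""):
    clone = copy.deepcopy(node)  # the original tree is left untouched
    if idsuffix is True:
        # Random suffix; the exact format folia uses is an assumption here.
        idsuffix = "." + secrets.token_hex(4)
    if idsuffix:
        addidsuffix(clone, idsuffix)
    return clone


sentence = Node("s.1", [Node("s.1.w.1"), Node("s.1.w.2")])
duplicate = copy_element(sentence, ".copy")
print(duplicate.id, [w.id for w in duplicate.children])
# s.1.copy ['s.1.w.1.copy', 's.1.w.2.copy']
```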

copychildren(newdoc=None, idsuffix=''): Generator creating a deep copy of the children of this element. If idsuffix is a string, it is appended to the IDs of the copies; if set to True, a random idsuffix will be generated, including a random 32-bit hash.

correct(**kwargs): Apply a correction (TODO: documentation still to be written)

count(Class, set=False, recursive=True, ignore=True, node=None)

Like AbstractElement.select(), but instead of returning the elements, it merely counts them.

Returns:: int

deepvalidation()

Perform deep validation of this element.

Raises:: DeepValidationError

depthfirstsearch(function): Generic depth first search algorithm using a callback function, continues as long as the callback function returns None
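The callback protocol of depthfirstsearch() can be modelled as below, assuming a pre-order traversal that visits an element before its children. `Node` and `find_w2` are hypothetical; the point is that any non-None return value stops the search and is passed back to the caller.

```python
# Hypothetical model of a callback-driven depth-first search: the search stops
# at the first node for which the callback returns something other than None.


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def depthfirstsearch(node, function):
    result = function(node)  # visit the node itself first (pre-order)
    if result is not None:
        return result
    for child in node.children:
        result = depthfirstsearch(child, function)
        if result is not None:
            return result  # propagate the first hit upwards
    return None


tree = Node("s.1", [Node("w.1"), Node("quote", [Node("w.2"), Node("w.3")])])
visited = []


def find_w2(node):
    visited.append(node.name)
    return node if node.name == "w.2" else None


hit = depthfirstsearch(tree, find_w2)
print(hit.name, visited)  # w.2 ['s.1', 'w.1', 'quote', 'w.2']
```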

description()

Obtain the description associated with the element.

Raises:: NoSuchAnnotation

elements(founditems=None): Returns a depth-first flat list of all elements below this element

feat(subset)

Obtain the feature class value of the specified subset.

If a feature occurs multiple times, the values will be returned in a list.

Example:

sense = word.annotation(folia.Sense)
synset = sense.feat('synset')

Returns:: str or list
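The string-versus-list return of feat() can be modelled with a list of (subset, class) pairs. This is a hypothetical sketch; the feature values are illustrative, and `KeyError` stands in for whatever folia raises when a subset is absent.

```python
# Hypothetical model of feat(): a subset that occurs once yields a string,
# one that occurs several times yields a list of values.


def feat(features, subset):
    values = [cls for sub, cls in features if sub == subset]
    if not values:
        # Assumption: folia signals a missing feature with its own exception.
        raise KeyError(subset)
    return values[0] if len(values) == 1 else values


features = [("synset", "n-1"), ("gender", "m"), ("gender", "f")]  # illustrative data
print(feat(features, "synset"))  # n-1
print(feat(features, "gender"))  # ['m', 'f']
```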

findcorrectionhandling(cls): Find the proper correctionhandling given a textclass by looking in the underlying corrections where it is reused

classmethod findreplaceables(parent, set=False, **kwargs): Internal method to find replaceable elements. Auxiliary function used by AbstractElement.replace(). Can be overridden for more fine-grained control.

generate_id(cls)

getindex(child, recursive=True, ignore=True)

Get the index at which an element occurs, recursive by default!

Returns:: int

getmetadata(key=None): Get the metadata that applies to this element, automatically inherited from parent elements

gettextdelimiter(retaintokenisation=False)

Return the text delimiter for this class.

Uses the TEXTDELIMITER attribute but may return a customised one instead.

hasannotation(Class, set=False): Returns an integer indicating whether such an annotation exists, and if so, how many. See annotations() for a description of the parameters.

hasphon(cls='current', strict=True, correctionhandling=1, hidden=False)

Does this element have phonetic content (of the specified class)

By default, and unlike phon(), this checks strictly, i.e. the element itself must have the phonetic content and it is not inherited from its children.

Parameters:

cls (str) – The class of the phonetic content to obtain, defaults to current.
strict (bool) – Set this if you are strictly interested in the phonetic content explicitly associated with the element, without recursing into children. Defaults to True.
correctionhandling – Specifies what phonetic content to check for when corrections are encountered. The default is CorrectionHandling.CURRENT, which will retrieve the corrected/current phonetic content. You can set this to CorrectionHandling.ORIGINAL if you want the phonetic content prior to correction, and CorrectionHandling.EITHER if you don’t care.

Returns:

bool

hastag(tag): Check whether a processing tag is present

hastext(cls='current', strict=True, correctionhandling=1, hidden=False)

Does this element have text (of the specified class)

By default, and unlike text(), this checks strictly, i.e. the element itself must have the text and it is not inherited from its children.

Parameters:

cls (str) – The class of the text content to obtain, defaults to current.
strict (bool) – Set this if you are strictly interested in the text explicitly associated with the element, without recursing into children. Defaults to True.
correctionhandling – Specifies what text to check for when corrections are encountered. The default is CorrectionHandling.CURRENT, which will retrieve the corrected/current text. You can set this to CorrectionHandling.ORIGINAL if you want the text prior to correction, and CorrectionHandling.EITHER if you don’t care.

Returns:

bool

incorrection(): Is this element part of a correction? If it is, it returns the Correction element (evaluating to True), otherwise it returns None

insert(index, child, *args, **kwargs)

items(founditems=None): Returns a depth-first flat list of all items below this element (not limited to AbstractElement)

json(attribs=None, recurse=True, ignorelist=False)

Serialises the FoLiA element and all its contents to a Python dictionary suitable for serialisation to JSON.

Example:

import json
json.dumps(word.json())

Returns:: dict

layer(): Return the annotation layer this annotation pertains to

leftcontext(size, placeholder=None, scope=None): Returns the left context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope

move(newdoc, idsuffix='')

Move elements of one document to another

Parameters:

newdoc (Document) – The document the copy should be associated with.
idsuffix (str or bool) – If set to a string, it will be appended to the ID of the copy (this prevents duplicate IDs when making copies for the same document). If set to True, a random suffix will be generated.

Returns:

a copy of the element

next(Class=True, scope=True, reverse=False)

Returns the next element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no next element is found. Non-authoritative elements are never returned.

Parameters:

Class (*) – The class to select; any Python class derived from AbstractElement, or a tuple of multiple such classes. Set to True to constrain to the same class as that of the current instance, or to None to not constrain at all.
scope (*) – A list of classes which are never crossed when looking for a next element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), or to None to not constrain at all.
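How `scope` limits next() and previous() can be illustrated with tokens tagged by sentence. This hypothetical `neighbour()` helper treats the sentence as the only boundary, whereas the default scope described above covers several structure elements.

```python
# Hypothetical model of next()/previous() with a scope: tokens are
# (text, sentence_id) pairs and the sentence acts as the boundary.


def neighbour(tokens, index, scope=True, reverse=False):
    """Return the text of the next (or previous) token, or None."""
    target = index - 1 if reverse else index + 1
    if not 0 <= target < len(tokens):
        return None
    text, sentence = tokens[target]
    if scope and sentence != tokens[index][1]:
        return None  # would cross the sentence boundary
    return text


tokens = [("I", "s.1"), ("agree", "s.1"), ("Good", "s.2")]
print(neighbour(tokens, 0))                # agree
print(neighbour(tokens, 1))                # None: next token is in another sentence
print(neighbour(tokens, 1, scope=None))    # Good: unconstrained
print(neighbour(tokens, 2, reverse=True))  # None: previous token is in another sentence
```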

originaltext(cls='original')

Alias for retrieving the original, uncorrected text.

Equivalent to calling text() with correctionhandling=CorrectionHandling.ORIGINAL.
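The effect of correctionhandling can be modelled on a single correction that holds a new and an original text. The `CorrectionHandling` values below are an assumption chosen to agree with the documented default `correctionhandling=1` meaning CURRENT; the fallback order for EITHER is also assumed, and `correction_text()` is a hypothetical helper.

```python
# Hypothetical model of how correctionhandling picks a text from a correction.
from enum import IntEnum


class CorrectionHandling(IntEnum):  # assumed values, matching the default of 1
    EITHER = 0
    CURRENT = 1
    ORIGINAL = 2


def correction_text(correction, correctionhandling=CorrectionHandling.CURRENT):
    """correction is a dict with optional 'new' and 'original' texts."""
    if correctionhandling == CorrectionHandling.CURRENT:
        candidates = [correction.get("new")]
    elif correctionhandling == CorrectionHandling.ORIGINAL:
        candidates = [correction.get("original")]
    else:  # EITHER: prefer the current text, fall back to the original (assumed)
        candidates = [correction.get("new"), correction.get("original")]
    for text in candidates:
        if text is not None:
            return text
    raise LookupError("no text found")  # folia raises NoSuchText


correction = {"new": "house", "original": "hous"}
print(correction_text(correction))                               # house
print(correction_text(correction, CorrectionHandling.ORIGINAL))  # hous
```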

parsecommonarguments(doc, **kwargs): Internal function to parse common FoLiA attributes and sets up the instance accordingly. Do not invoke directly.

classmethod parsexml(node, doc, **kwargs)

Internal class method used for turning an XML element into an instance of the Class.

Parameters:

node – XML element
doc – Document

Returns:

An instance of the current Class.

phon(cls='current', previousdelimiter='', strict=False, correctionhandling=1, hidden=False)

Get the phonetic representation associated with this element (of the specified class)

The phonetic content will be constructed from child elements wherever possible, as they are more specific. If no phonetic content can be obtained from the children and the element has itself phonetic content associated with it, then that will be used.

Parameters:

cls (str) – The class of the phonetic content to obtain, defaults to current.
retaintokenisation (bool) – If set, the space attribute on words will be ignored, otherwise it will be adhered to and phonetic content will be detokenised as much as possible. Defaults to False.
previousdelimiter (str) – Can be set to the delimiter that was last output, useful when chaining calls to phon(). Defaults to an empty string.
strict (bool) – Set this if you are strictly interested in the phonetic content explicitly associated with the element, without recursing into children. Defaults to False.
correctionhandling – Specifies what phonetic content to retrieve when corrections are encountered. The default is CorrectionHandling.CURRENT, which will retrieve the corrected/current phonetic content. You can set this to CorrectionHandling.ORIGINAL if you want the phonetic content prior to correction, and CorrectionHandling.EITHER if you don’t care.
hidden (bool) – Include hidden elements, defaults to False.

Example:

word.phon()

Returns:: The phonetic content of the element (unicode instance in Python 2, str in Python 3)
Raises:: NoSuchPhon – if no phonetic content is found at all.

See also

phon() textcontent() text()

postappend()

This method will be called after an element is added to another and does some checks.

It can do extra checks and, if necessary, raise exceptions to prevent addition. By default, it makes sure the right document is associated.

This method is mostly for internal use.

precedes(other): Returns a boolean indicating whether this element precedes the other element
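One way to decide document order, sketched here as an assumption rather than folia's actual algorithm, is to compare the sequence of child indices leading from the root to each element. The `Node`, `path` and `precedes` names are hypothetical.

```python
# Hypothetical model of precedes(): compare child-index paths from the root.
# Tuples compare lexicographically, which matches document (pre-)order.


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def path(node):
    indices = []
    while node.parent is not None:
        siblings = node.parent.children
        indices.append(next(i for i, c in enumerate(siblings) if c is node))
        node = node.parent
    return tuple(reversed(indices))


def precedes(a, b):
    return path(a) < path(b)


root = Node()
s1, s2 = Node(root), Node(root)
w1, w2 = Node(s1), Node(s2)
print(precedes(w1, w2), precedes(w2, w1))  # True False
```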

previous(Class=True, scope=True)

Returns the previous element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no previous element is found. Non-authoritative elements are never returned.

Parameters:

Class (*) – The class to select; any Python class derived from AbstractElement, or a tuple of multiple such classes. Set to True to constrain to the same class as that of the current instance, or to None to not constrain at all.
scope (*) – A list of classes which are never crossed when looking for a previous element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), or to None to not constrain at all.

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None, origclass=None): Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string)

classmethod relaxng_backwards(): internal helper function for backward compatibility

remove(child): Removes the child element

replace(child, *args, **kwargs)

Appends a child element like append(), but replaces any existing child element of the same type and set. If no such child element exists, this will act the same as append()

Keyword Arguments:

alternative (bool) – If set to True, the replaced element will be made into an alternative. Simply use AbstractElement.append() if you want the added element to be an alternative.

See AbstractElement.append() for more information and all parameters.
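The replace-or-append behaviour can be modelled on a plain list of annotations. In this hypothetical sketch the old child is removed and the new one appended at the end; whether folia preserves the original position is not stated above, and the error for multiple candidates is assumed.

```python
# Hypothetical model of replace(): drop an existing child of the same type and
# set, then append the new one; with no match it behaves like append().


class Annotation:
    def __init__(self, cls, set):
        self.cls = cls
        self.set = set


class PosAnnotation(Annotation):
    pass


class LemmaAnnotation(Annotation):
    pass


def replace(children, child):
    matches = [c for c in children if type(c) is type(child) and c.set == child.set]
    if len(matches) > 1:
        raise ValueError("multiple candidates, unable to choose")  # assumed behaviour
    children[:] = [c for c in children if not any(c is m for m in matches)]
    children.append(child)


children = [PosAnnotation("N", "pos-set"), LemmaAnnotation("cat", "lemma-set")]
replace(children, PosAnnotation("V", "pos-set"))    # replaces the N tag
replace(children, PosAnnotation("NN", "tagset-2"))  # different set: appended
print([(type(c).__name__, c.cls) for c in children])
# [('LemmaAnnotation', 'cat'), ('PosAnnotation', 'V'), ('PosAnnotation', 'NN')]
```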

resolveoffsets(begin, end, retaintokenisation=True, strictend=True, cls='current'): Resolves supplied character offset information and returns tokens (non-token structures like linebreaks etc are ignored!). Note: offsets are zero-indexed and the end is non-inclusive!
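Zero-indexed, end-exclusive offsets can be illustrated on a list of (text, space_after) tokens. This hypothetical `resolveoffsets()` returns every token overlapping `[begin, end)`; `strictend` and text classes are not modelled.

```python
# Hypothetical model of resolving zero-indexed, end-exclusive character offsets
# to tokens. Tokens are (text, space_after) pairs.


def resolveoffsets(tokens, begin, end):
    result = []
    position = 0
    for text, space_after in tokens:
        token_begin, token_end = position, position + len(text)
        if token_begin < end and token_end > begin:  # overlaps [begin, end)
            result.append(text)
        position = token_end + (1 if space_after else 0)
    return result


tokens = [("Hello", False), (",", True), ("world", False), ("!", False)]
# Running text: "Hello, world!"
print(resolveoffsets(tokens, 7, 12))  # ['world']
print(resolveoffsets(tokens, 0, 5))   # ['Hello'] (offset 5 itself is excluded)
```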

resolveword(id)

rightcontext(size, placeholder=None, scope=None): Returns the right context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope

select(Class, set=False, recursive=True, ignore=True, node=None)

Select child elements of the specified class.

A further restriction can be made based on set.

Parameters:

Class (class) – The class to select; any python class (not instance) subclassed off AbstractElement
set (str) – The set to match against, only elements pertaining to this set will be returned. If set to False (default), all elements regardless of set will be returned.
recursive (bool) – Select recursively? Descending into child elements? Defaults to True.
ignore – A list of classes to ignore. If set to True instead of a list, all non-authoritative elements will be skipped (this is the default behaviour and corresponds to the following elements: Alternative, AlternativeLayers, Suggestion, and folia.Original). These elements and those contained within are never authoritative. You may also include the boolean True as a member of a list if you want to skip additional tags along with the predefined non-authoritative ones.
node (*) – Reserved for internal usage, used in recursion.

Yields:

Elements (instances derived from AbstractElement)

Example:

for sense in text.select(folia.Sense, 'cornetto', True, [folia.Original, folia.Suggestion, folia.Alternative]):
    print(sense.cls)
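The default skipping of non-authoritative elements, and count() as a counting variant of select(), can be modelled with a recursive generator. Element kinds are plain strings, and `Node`, `is_ignored`, `select` and `count` are hypothetical helpers, not folia's classes.

```python
# Hypothetical model of select()/count(): a recursive generator that skips
# non-authoritative subtrees. Kinds are strings here, not folia classes.

NON_AUTHORITATIVE = {"Alternative", "AlternativeLayers", "Suggestion", "Original"}


class Node:
    def __init__(self, kind, set=None, children=()):
        self.kind = kind
        self.set = set
        self.children = list(children)


def is_ignored(kind, ignore):
    if ignore is True:
        return kind in NON_AUTHORITATIVE
    if not ignore:
        return False
    # A list may name extra kinds and may contain True for the default ones.
    return kind in ignore or (True in ignore and kind in NON_AUTHORITATIVE)


def select(node, kind, set=False, recursive=True, ignore=True):
    for child in node.children:
        if is_ignored(child.kind, ignore):
            continue  # skip this element and everything inside it
        if child.kind == kind and (set is False or child.set == set):
            yield child
        if recursive:
            yield from select(child, kind, set, recursive, ignore)


def count(node, kind, set=False, recursive=True, ignore=True):
    return sum(1 for _ in select(node, kind, set, recursive, ignore))


word = Node("Word", children=[
    Node("Sense", "cornetto"),
    Node("Alternative", children=[Node("Sense", "cornetto")]),
    Node("Sense", "wordnet"),
])
print(count(word, "Sense"))                # 2: the alternative is skipped
print(count(word, "Sense", "cornetto"))    # 1
print(count(word, "Sense", ignore=False))  # 3
```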

setdoc(newdoc, idsuffix=''): Sets a different document and handles setting an ID suffix. There is usually no need to call this directly; it is invoked implicitly by copy().

setdocument(doc)

Associate a document with this element.

Parameters:: doc (Document) – A document

Each element must be associated with a FoLiA document.

setparents(): Correct all parent relations for elements within the scop. There is sually no need to call this directly, invoked implicitly by copy()

setprocessor(processor): Sets the processor for this element, taking care of adding an annotator in the declarations

setspan(*args)

Sets the span of the span element anew and erases all data inside.

Parameters:: *args – Instances of Word, Morpheme or Phoneme

settext(text, cls='current')

Set the text for this element.

Parameters:

text (str) – The text
cls (str) – The class of the text, defaults to current (leave this unless you know what you are doing). There may be only one text content element of each class associated with the element.

sort(force=False): Sort children (wrefs and child spans) in order of appearance. Returns True if sort is successful (or not needed), False if sort could not be performed at this stage

speech_speaker()

Retrieves the speaker of the audio or video file associated with the element.

The speaker is inherited from ancestor elements if none is specified. For this reason, always use this method rather than access the speaker attribute directly.

Returns:: str or None if not found

speech_src()

Retrieves the URL/filename of the audio or video file associated with the element.

The source is inherited from ancestor elements if none is specified. For this reason, always use this method rather than access the src attribute directly.

Returns:: str or None if not found
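The inheritance described for speech_src() and speech_speaker() amounts to walking up the ancestors until a value is found. This hypothetical sketch models elements as `Node` objects with `src` and `speaker` attributes; `inherited()` is not a folia function.

```python
# Hypothetical model of attribute inheritance as described for speech_src()
# and speech_speaker(): walk up the ancestors until a value is found.


class Node:
    def __init__(self, parent=None, src=None, speaker=None):
        self.parent = parent
        self.src = src
        self.speaker = speaker


def inherited(element, attribute):
    node = element
    while node is not None:
        value = getattr(node, attribute)
        if value:
            return value
        node = node.parent
    return None  # nothing found on the element or any ancestor


utterance = Node(src="interview.wav", speaker="A")
word = Node(parent=Node(parent=utterance))
print(inherited(word, "src"), inherited(word, "speaker"))  # interview.wav A
```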

stricttext(cls='current'): Alias for text() with strict=True

substitute(oldchild, newchild, *args, **kwargs)

Substitutes a particular child element with another. The child element can be specified like with append(). Unlike the replace() function, here you specify explicitly the old child elements, and it can be any child element.

Parameters:: oldchild – The child instance to replace

See AbstractElement.append() for more information and all parameters.

tag(tag): Add a processing tag

text(cls='current', retaintokenisation=False, previousdelimiter='', strict=False, correctionhandling=1, normalize_spaces=False, hidden=False, trim_spaces=True)

Get the text associated with this element (of the specified class)

The text will be constructed from child elements wherever possible, as they are more specific. If no text can be obtained from the children and the element has itself text associated with it, then that will be used.

Parameters:

cls (str) – The class of the text content to obtain, defaults to current.
retaintokenisation (bool) – If set, the space attribute on words will be ignored, otherwise it will be adhered to and text will be detokenised as much as possible. Defaults to False.
previousdelimiter (str) – Can be set to the delimiter that was last output, useful when chaining calls to text(). Defaults to an empty string.
strict (bool) – Set this if you are strictly interested in the text explicitly associated with the element, without recursing into children. Defaults to False.
correctionhandling – Specifies what text to retrieve when corrections are encountered. The default is CorrectionHandling.CURRENT, which will retrieve the corrected/current text. You can set this to CorrectionHandling.ORIGINAL if you want the text prior to correction, and CorrectionHandling.EITHER if you don’t care.
normalize_spaces (bool) – Return the text with multiple spaces, linebreaks, tabs normalized to single spaces
trim_spaces (bool) – Trim leading and trailing spaces, this is default behaviour since FoLiA v2.4.1 and should only be set to False for compatibility with older documents
hidden (bool) – Include hidden elements, defaults to False.

Example:

word.text()

Returns:: The text of the element (unicode instance in Python 2, str in Python 3)
Raises:: NoSuchText – if no text is found at all.
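The detokenisation rule for text() can be sketched on (form, space_after) pairs, assuming the space attribute means "a space follows this word". `join_words()` is a hypothetical helper that ignores classes, corrections and hidden elements.

```python
# Hypothetical model of how text() joins child words: the space attribute
# (space_after here) is honoured unless retaintokenisation is set.


def join_words(words, retaintokenisation=False, delimiter=" "):
    parts = []
    for i, (form, space_after) in enumerate(words):
        parts.append(form)
        is_last = i == len(words) - 1
        if not is_last and (retaintokenisation or space_after):
            parts.append(delimiter)
    return "".join(parts)


words = [("Hello", False), (",", True), ("world", False), ("!", True)]
print(join_words(words))                           # Hello, world!
print(join_words(words, retaintokenisation=True))  # Hello , world !
```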

textcontent(cls='current', correctionhandling=1, hidden=False)

Get the text content explicitly associated with this element (of the specified class).

Unlike text(), this method does not recurse into child elements (with the sole exception of the Correction/New element), and it returns the TextContent instance rather than the actual text!

Parameters:

cls (str) – The class of the text content to obtain, defaults to current.
correctionhandling – Specifies what content to retrieve when corrections are encountered. The default is CorrectionHandling.CURRENT, which will retrieve the corrected/current content. You can set this to CorrectionHandling.ORIGINAL if you want the content prior to correction, and CorrectionHandling.EITHER if you don’t care.
hidden (bool) – Include hidden elements, defaults to False.

Returns:

The text content (TextContent)

Raises:

NoSuchText