NAME
    Data::CSel - Select tree node objects using CSS Selector-like syntax

VERSION
    This document describes version 0.128 of Data::CSel (from Perl
    distribution Data-CSel), released on 2022-06-07.

SYNOPSIS
     use Data::CSel qw(csel csel_each);

     # using csel():
     my @cells = csel("Table[name=~/data/i] TCell[value != '']:first", $tree);
     for (@cells) { say $_->value }

     # using csel_each():
     csel_each { say $_->value } "Table[name=~/data/i] TCell[value != '']:first", $tree;

    Using selection object:

     # ditto, but wrap result using a Data::CSel::Selection
     my $res = csel({wrap=>1}, "Table ...", $tree);

     # call method 'foo' of each node object (works even when there are zero nodes
     # in the selection object, or when some nodes do not support the 'foo' method
     $res->foo;

DESCRIPTION
    This module lets you use a query language (hereby named CSel) that is
    similar to CSS Selector to select nodes from a tree of objects.

EXPRESSION SYNTAX
    The following is description of the CSel query expression. It is modeled
    after the CSS Selector syntax with some modification (see "Differences
    with CSS selector").

    An *expression* is a chain of one or more selectors separated by commas.

    A *selector* is a chain of one or more simple selectors separated by
    combinators.

    A *combinator* is either: whitespace (descendant combinator), ">" (child
    combinator), "~" (general sibling combinator), or "+" (adjacent sibling
    combinator). "E F", or two elements combined using descendant
    combinator, means F element descendant of an E element. "E > F" means F
    element child of E element. "E ~ F" means F element preceded by an E
    element. "E + F" means F element immediately preceded by an E element.

    A *simple selector* is either a type selector (see "Type selector") or
    universal selector (see "Universal selector") followed immediately by
    zero or more attribute selectors (see "Attribute selector" or class
    selector (see "Class selector"" in ") or ID selector (see "ID selector")
    or pseudo-classes (see "Pseudo-class"" in "), in any order. Type or
    universal selector is optional if there is at least one attribute
    selector or pseudo-class.

  Type selector
    A *type selector* is a Perl class/package name.

    Example:

     My::Class

    will match any "My::Class" object. Subclasses of "My::Class" will *not*
    be matched, use class selector for that.

  Universal selector
    A *universal selector* is "*" and matches any class/package.

    Example:

     *

    will match any object.

  Attribute selector
    An *attribute selector* filters objects based on the value of their
    attributes. The syntax is:

     [ATTR]
     [ATTR OP LITERAL]

    "[ATTR]" means to only select objects that have an attribute named
    "ATTR", for example:

     [length]

    means to select objects that respond to ("can()") "length()".

    Note: to select objects that do not have a specified attribute, you can
    use the ":not" pseudo-class (see "Pseudo-class"), for example:

     :not([length])

    "[ATTR OP LITERAL]" means to only select objects that have an attribute
    named "ATTR" that has value that matches the expression specified by
    operator "OP" and operand "LITERAL". For example:

     [length > 12]
     [is_done is true]
     [name =~ /foo/]

    Calling methods "ATTR" can also be replaced by "METH()" or
    "METH(LITERAL, ...)" to allow passing arguments to methods. Note that
    this specific syntax:

     [METH()]

    does not simply mean to select objects that respond to "METH", but
    actually:

     [METH() is true]

    For example:

     # select objects that have non-zero length
     [length()]

     # while this means to select objects that have 'length' attribute
     [length]

     # select objects for which the method call returns true
     [has_key('foo')]

    Experimental: a chain of attributes is allowed for the attribute, for
    example:

     [date.month = 12]

    will select only objects that has an attribute "date", and the value of
    "date" is an object that has an attribute "month", and the value of
    "month" is 12. When there is a failure in the chain somewhere (e.g. the
    "date" object does not have the "month" attribute), the whole expression
    evaluates to false.

   Literal
    There are several kinds of literals supported.

    Numbers. Examples:

     1
     -2.3
     4.5e-6

    Boolean:

     true
     false

    Null (undef):

     null

    String. Either single-quoted (only recognizes the escape sequences "\\"
    and "\'"):

     'this is a string'
     'this isn\'t hard'

    or double-quoted (currently recognizes the escape sequences "\\", "\"",
    "\'", "\$" [literal $], "\t" [tab character], "\n" [newline], "\r"
    [linefeed], "\f" [formfeed], "\b" [backspace], "\a" [bell], "\e"
    [escape], "\0" [null], octal escape e.g. "\033", hexadecimal escape e.g.
    "\x1b"):

     "This is a string"
     "This isn't hard"
     "Line 1\nLine 2"

    For convenience, a word string can be unquoted in expression, e.g.:

     [name = ujang]

    is equivalent to:

     [name = 'ujang']

    Regex literal. Must be delimited by "/.../" or "qr(...)", can be
    followed by zero of more regex modifier characters m, s, i):

     //
     /ab(c|d)/i
     qr(foo/bar)

    Array. Examples:

     []
     [1,2,3]
     ["foo", "bar","baz"]

   Operators
    The following are supported operators:

    *   "eq"

        String equality using Perl's "eq" operator.

        Example:

         Table[title eq "TOC"]

        selects all "Table" objects that have "title()" with the value of
        "TOC".

    *   "=" (or "==")

        Numerical equality using Perl's "==" operator.

        Example:

         TableCell[length=3]

        selects all "TableCell" objects that have "length()" with the value
        of 3.

        To avoid common trap, will switch to using Perl's "eq" operator when
        operand does not look like number, e.g.:

         Table[title = 'foo']

        is the same as:

         Table[title eq 'foo']

    *   "ne"

        String inequality using Perl's "ne" operator.

        Example:

         Table[title ne "TOC"]

        selects all "Table" objects that have "title()" with the value not
        equal to "TOC".

    *   "!=" (or "<>")

        Numerical inequality using Perl's "!=" operator.

        Example:

         TableCell[length != 3]
         TableCell[length <> 3]

        selects all "TableCell" objects that have "length()" with the value
        not equal to 3.

        To avoid common trap, will switch to using Perl's "ne" operator when
        operand does not look like number, e.g.:

         Table[title != 'foo']

        is the same as:

         Table[title ne 'foo']

    *   "gt"

        String greater-than using Perl's "gt" operator.

        Example:

         Person[first_name gt "Albert"]

        selects all "Person" objects that have "first_name()" with the value
        asciibetically greater than "Albert".

    *   ">"

        Numerical greater-than using Perl's ">" operator.

        Example:

         TableCell[length > 3]

        selects all "TableCell" objects that have "length()" with the value
        greater than 3.

        To avoid common trap, will switch to using Perl's "gt" operator when
        operand does not look like number, e.g.:

         Person[first_name > 'Albert']

        is the same as:

         Person[first_name gt "Albert"]

    *   "ge"

        String greater-than-or-equal-to using Perl's "ge" operator.

        Example:

         Person[first_name ge "Albert"]

        selects all "Person" objects that have "first_name()" with the value
        asciibetically greater than or equal to "Albert".

    *   ">="

        Numerical greater-than-or-equal-to using Perl's ">=" operator.

        Example:

         TableCell[length >= 3]

        selects all "TableCell" objects that have "length()" with the value
        greater than or equal to 3.

        To avoid common trap, will switch to using Perl's "ge" operator when
        operand does not look like number, e.g.:

         Person[first_name >= 'Albert']

        is the same as:

         Person[first_name ge "Albert"]

    *   "lt"

        String less-than using Perl's "lt" operator.

        Example:

         Person[first_name lt "Albert"]

        selects all "Person" objects that have "first_name()" with the value
        asciibetically less than "Albert".

    *   "<"

        Numerical less-than using Perl's "<" operator.

        Example:

         TableCell[length < 3]

        selects all "TableCell" objects that have "length()" with the value
        less than 3.

        To avoid common trap, will switch to using Perl's "lt" operator when
        operand does not look like number, e.g.:

         Person[first_name < 'Albert']

        is the same as:

         Person[first_name lt "Albert"]

    *   "le"

        String less-than-or-equal-to using Perl's "le" operator.

        Example:

         Person[first_name le "Albert"]

        selects all "Person" objects that have "first_name()" with the value
        asciibetically less than or equal to "Albert".

    *   "<="

        Numerical less-than-or-equal-to using Perl's "<=" operator.

        Example:

         TableCell[length <= 3]

        selects all "TableCell" objects that have "length()" with the value
        less than or equal to 3.

        To avoid common trap, will switch to using Perl's "le" operator when
        operand does not look like number, e.g.:

         Person[first_name <= 'Albert']

        is the same as:

         Person[first_name le "Albert"]

    *   "=~" and "!~"

        Filter only objects where the attribute named *attr* has the value
        matching regular expression *value*. Operand should be a regex
        literal. Regex literal must be delimited by "/.../" or "qr(...)".

        Example:

         Person[first_name =~ /^Al/]

        selects all "Person" objects that have "first_name()" with the value
        matching the regex "/^Al/".

         Person[first_name =~ qr(^al)i]

        Same as previous example except the regex is case-insensitive.

        "!~" is the opposite of "=~", just like in Perl. It checks whether
        *attr* has value that does not match regular expression.

    *   "is" and "isnt"

        Testing truth value or definedness. Value can be null or boolean
        literal.

        Example:

         DateTime[is_leap_year is true]

        will select all DateTime objects where its "is_leap_year" attribute
        has a true value.

         DateTime[is_leap_year is false]

        will select all DateTime objects where its "is_leap_year" attribute
        has a false value.

         Person[age isnt null]

        will select all Person objects where age is defined.

    *   "has" and "hasnt"

        Attribute value must be array. Will evaluate to true if one of the
        elements matches the operand.

        Examples:

         Headline[tags has "tag1"]
         Headline[tags has "tag2"][tags has "tag3"][tags hasnt "tag4"]

    *   "in" and "notin"

        Operand must be array. Will evaluate to true if one of the elements
        of array matches the attribute value.

        Examples:

         Headline[level in [1,2,3]]
         Headline[level not in [1,2]][tags notin ["old","deprecated"]]

  Class selector
    A *class selector* is a "." (dot) followed by Perl class/package name.

     .CLASSNAME

    It selects all objects that "isa()" a certain class. The difference with
    type selector is that inheritance is observed. So:

     .My::Class

    will match instances of "My::Class" as well as subclasses of it.

  ID selector
    An *ID selector* is a "#" (hash) followed by an identifier:

     #ID

    It is a special/shortcut form of attribute selector where the attribute
    is "id" and the operator is "=":

     [id = ID]

    The "csel()" function allows you to configure which attribute to use as
    the ID attribute, the default is "id".

  Pseudo-class
    A *pseudo-class* is ":" (colon) followed by pseudo-class name (a
    dash-separated word list), and optionally a list of arguments enclosed
    in parentheses.

     :PSEUDOCLASSNAME
     :PSEUDOCLASSNAME(ARG, ...)

    It filters result set based on some criteria. Currently supported
    pseudo-classes include:

    *   ":first"

        Select only the first object from the result set.

        Example:

         Person[name =~ /^a/i]:first

        selects the first person whose name starts with the letter "A".

    *   ":last"

        Select only the last item from the result set.

        Example:

         Person[name =~ /^a/i]:last

        selects the last person whose name starts with the letter "A".

    *   ":first-child"

        Select only objects that are the first child of their parent.

    *   ":last-child"

        Select only objects that are the last child of their parent.

    *   ":only-child"

        Select only objects that is the only child of their parent.

    *   ":nth-child(n)"

        Select only objects that are the *n*th child of their parent.

    *   ":nth-last-child(n)"

        Select only objects that are the *n*th last child of their parent.

    *   ":first-of-type"

        Select only objects that are the first child of their parent of
        their type. So if a parent's children is:

         id1(type=T1) id2(T2) id3(T2)

        then both "id1" and "id2" are first children of their respective
        types.

    *   ":last-of-type"

        Select only objects that are the last child of their parent of their
        type.

    *   ":only-of-type"

        Select only objects that are the only child of their parent of their
        type.

    *   ":nth-of-type(n)"

        Select only objects that are the *n*th child of their parent of
        their type.

    *   ":nth-last-of-type(n)"

        Select only objects that are the *n*th last child of their parent of
        their type.

    *   ":root"

        Select only root node(s).

    *   ":has-min-children(m)"

        Select only objects that have at least *m* direct children.

    *   ":has-max-children(n)"

        Select only objects that have at most *n* direct children.

    *   ":has-children-between(m,n)"

        Select only objects that have between *m* and *n* direct children.

    *   ":parent"

        Select the node's parent.

    *   ":empty"

        Select only leaf node(s).

        See also ":has".

    *   :not(S)

        Select all objects not matching selector "S". "S" can be a string or
        an unquoted CSel expression.

        Example:

         :not('.My::Class')
         :not(.My::Class)

        will select all objects that are not of "My::Class" type.

    *   :has(S)

        Select all objects that have a descendant matching selector "S". "S"
        can be a string or an unquoted CSel expression.

        Example:

         :has('T')
         :not(T)

        will select all objects that have a descendant of type "T".

        See also: ":parent".

  Differences with CSS selector
   Type selector can contain double colon ("::")
    Since Perl package names are separated by "::", CSel allows it in type
    selector.

   Syntax of attribute selector is a bit different
    In CSel, the syntax of attribute selector is made simpler and more
    regular.

    There are operators not supported by CSel, but CSel adds more operators
    from Perl. In particular, the whole substring matching operations like
    "[attr^=val]", "[attr$=val]", "[attr*=val]", "[attr~=val]", and
    "[attr|=val]" are replaced with the more flexible regex matching instead
    "[attr =~ /re/]".

   Different pseudo-classes supported
    Some CSS pseudo-classes only make sense for a DOM or a visual browser,
    e.g. ":link", ":visited", ":hover", so they are not supported.

    CSS selector does not sport ":parent".

   There is no concept of CSS namespaces
    CSS namespaces are used when there are foreign elements (e.g. SVG in
    addition to HTML) and one wants to use the same stylesheet for both.
    There is no need for something like this CSel, as we deal with only Perl
    objects.

VARIABLES
@Data::CSel::CLASS_PREFIXES
    Array of namespace prefixes to check when matching type in type selector
    as well as class selector. This is like PATH environment variable in
    Unix shell. For example, if @CLASS_PREFIXES is "["Foo::Bar", "Baz"]",
    then this expression:

     T

    will match class "Foo::Bar::T", or "Baz::T", or "T".

    Note that @Data::CSel::CLASS_PREFIXES is consulted after the
    "class_prefixes" opton in "csel()".

FUNCTIONS
  csel
    Usage:

     $list_or_selection_obj = csel([ \%opts , ] $expr, @tree_nodes)

    Select from tree node objects @tree_nodes using CSel expression $expr.
    Will return a list of mattching node objects (unless when "wrap" option
    is true, in which case will return a Data::CSel::Selection object
    instead). Will die on errors (e.g. syntax error in expression, objects
    not having the required methods, etc).

    A tree node object is any regular Perl object satisfying the following
    criteria: 1) it supports a "parent" method which should return a single
    parent node object, or undef if object is the root node); 2) it supports
    a "children" method which should return a list (or an arrayref) of
    children node objects (where the list/array will be empty for a leaf
    node). Note: you can use Role::TinyCommons::Tree::Node to enforce this
    requirement. Note: the "parent" and "children" method names can actually
    be customized, see options.

    Known options:

    *   class_prefixes => array of str

        Array of namespace prefixes to check when matching type in type
        selector as well as class selector. This is like PATH environment
        variable in Unix shell. For example, if "class_prefixes" is
        "["Foo::Bar", "Baz"]", then this expression:

         T

        will match class "Foo::Bar::T", or "Baz::T", or "T".

        Note that @Data::CSel::CLASS_PREFIXES is also consulted after this
        "class_prefixes" option.

    *   wrap => bool

        If set to true, instead of returning a list of matching nodes, the
        function will return a Data::CSel::Selection object instead (which
        wraps the result, for convenience). See the selection object's
        documentation for more details.

    *   get_parent_method => str

        Example:

         get_parent_method => 'get_parent'

        This option can be used if your node object uses method other than
        the default "parent" to get parent node.

    *   set_parent_method => str

        Example:

         set_parent_method => 'set_parent'

        This option can be used if your node object uses method other than
        the default "parent" to set parent node.

    *   get_children_method => str

        Example:

         get_children_method => 'get_children'

        This option can be used if your node object uses method other than
        the default "children" to get children nodes.

    *   set_children_method => str

        Example:

         set_children_method => 'set_children'

        This option can be used if your node object uses method other than
        the default "children" to set children nodes.

  csel_each
    Usage:

     csel_each { say $_[0]->value } "expr", $tree;
     csel_each { say $_->value    } {csel_opt1=>..., ...}, "expr", $tree1, $tree2;

    Execute callback for every node that matches expression. Basically
    shortcut for:

     my @nodes = csel(...);
     for (@nodes) { $callback->($_) )}

    The callback will retrieve the node either in the first element of @_ or
    in the localized $_ for convenience.

  parse_csel
    Usage:

     $hash = parse_csel($expr);

    Parse an expression. On success, will return a hash containing parsed
    information. On failure, will return undef.

FAQ
  Can I use csel() against a regular data structure (instead of a tree of objects)?
    Use Data::CSel::WrapStruct to create a tree of object from the data
    structure, then perform "csel()" on the resulting tree.

HOMEPAGE
    Please visit the project's homepage at
    <https://metacpan.org/release/Data-CSel>.

SOURCE
    Source repository is at <https://github.com/perlancar/perl-Data-CSel>.

SEE ALSO
  Related to CSS selector
    CSS4 Selectors Specification, <https://www.w3.org/TR/selectors4/>.

    These modules let you use CSS selector syntax (or its subset) to select
    nodes of an HTML document: Mojo::DOM (or DOM::Tiny), jQuery, pQuery,
    HTML::Selector::XPath (or via Web::Query). The last two modules can also
    handle XPath expression.

    CLI to select HTML elements using CSS selector syntax: html-css-sel
    (from App::html::css::sel).

  Similar query languages
    These modules let you use XPath (or XPath-like) syntax to select nodes
    of a data structure: Data::DPath. Like CSS selectors, XPath is another
    query language to select nodes of a document. XPath specification:
    <https://www.w3.org/TR/xpath/>.

    These modules let you use JSONPath syntax to select nodes of a data
    structure: JSON::Path. JSONPath is a query language to select nodes of a
    JSON document (data structure). JSONPath specification:
    <http://goessner.net/articles/JsonPath>.

  Related modules
    Data::CSel::WrapStruct

    CSel::Examples

  Modules that use CSel
    *   For data structure

        CLI to select JSON nodes using CSel: jsonsel (from App::jsonsel).

        CLI to select Perl data structure elements using CSel: ddsel (from
        App::CSelUtils).

        CLI to select YAML nodes using CSel: yamlsel (from App::yamlsel).

    *   For HTML document

        htmlsel (from App::htmlsel).

    *   For Org document

        orgsel (from App::orgsel).

    *   For POD document

        CLI to select POD::Elemental nodes using CSel: podsel (from
        App::podsel).

    *   For PPI (Perl source code tree representation) document

        CLI to select PPI nodes using CSel: ppisel (from App::ppisel).

AUTHOR
    perlancar <perlancar@cpan.org>

CONTRIBUTING
    To contribute, you can send patches by email/via RT, or send pull
    requests on GitHub.

    Most of the time, you don't need to build the distribution yourself. You
    can simply modify the code, then test via:

     % prove -l

    If you want to build the distribution (e.g. to try to install it locally
    on your system), you can install Dist::Zilla,
    Dist::Zilla::PluginBundle::Author::PERLANCAR, and sometimes one or two
    other Dist::Zilla plugin and/or Pod::Weaver::Plugin. Any additional
    steps required beyond that are considered a bug and can be reported to
    me.

COPYRIGHT AND LICENSE
    This software is copyright (c) 2022, 2021, 2020, 2019, 2016 by perlancar
    <perlancar@cpan.org>.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

BUGS
    Please report any bugs or feature requests on the bugtracker website
    <https://rt.cpan.org/Public/Dist/Display.html?Name=Data-CSel>

    When submitting a bug or request, please include a test-file or a patch
    to an existing test-file that illustrates the bug or desired feature.