npRexp plugin

About this plugin

This plugin allows you to use one or more regular expression objects to search and replace text in your publications. The plugin also provides some global actions to search, split and replace text using regular expressions, and more.

These global actions do not require a previously created Rexp object instance. By creating one, however, you get the full power of regular expression search and replace: the plugin allows you to modify the r.e. engine behaviour, get and set r.e. modifiers, use all kinds of metacharacters, etc.

The plugin provides various samples and an introduction to regular expression syntax. Regular expressions can seem difficult to novices, but once you get the hang of them you have a very powerful search and replace tool in your hands. So it's time to learn!

Third party

This plugin would not have been possible without the help of these people:

Thanks a lot!

Plugin actions index

npRexpCreate

Create a new instance of an Rexp object. The result variable stores the numeric ID of the newly created Rexp object instance. You need this ID for the other plugin actions.

npRexpDestroy

Destroy a previously created instance of an Rexp object. The result variable stores "True" if everything went well, or "False" if not. In the latter case the [LastError] variable contains information about the error.

npRexpDestroyAll

Destroy all previously created instances of Rexp objects.

npRexpStr

Specify the expression string for an Rexp object instance. This action sets the regular expression that the npRexpExec, npRexpExecPos and npRexpExecNext actions will use later. The result variable stores "True" if everything went well, or "False" if not. In the latter case the [LastError] variable contains information about the error.

npRexpExec

Execute the Rexp object instance: the regular expression engine tries to match the regular expression previously set with npRexpStr against the provided string.

The result variable stores "True" if the regular expression is found in the string, or "False" if it is not found or if an error occurs. In the latter case the [LastError] variable contains information about the error. After this action you can call npRexpExecNext repeatedly, for as long as the regular expression keeps being found in the string.

This makes it easy to iterate over the string, searching for successive occurrences of the regular expression in use. The code below is extracted from the "Exec and Exec Next.pub" plugin sample:

.Specify the regular expression we want to search for
npRexpStr "[ID]" "[Expresion]" "[Result]"

.Execute the regular expression engine for the first time
npRexpExec "[ID]" "[Text]" "[Result]"

.If the last action result is True, the regexp was found
If "[Result]" "=" "True"

  .So we iterate over all occurrences of the regexp
  While "[Result]" "=" "True"

    SetVar "[Count]" "[Count] + 1"

    .Retrieve the matched string
    npRexpMatchStr "[ID]" "0" "[Str]"
    SetVar "[Results]" "[Results][#34][Str][#34] found"

    .Retrieve the matched string position
    npRexpMatchPos "[ID]" "0" "[Pos]"
    SetVar "[Results]" "[Results] at [Pos] pos"

    .Retrieve the matched string length
    npRexpMatchLen "[ID]" "0" "[Len]"
    SetVar "[Results]" "[Results] with [Len] length[#13]"

    .Execute the regular expression engine again with the same regexp,
    .beginning from the position of the last occurrence found in the string
    npRexpExecNext "[ID]" "[Result]"

    .The While keeps looping as long as more occurrences are found.
    .When no more occurrences are found, the last action result is "False"
  EndWhile

Else

  AlertBox "Info" "No results found"
EndIf
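
For readers who want to try the same loop outside NeoBook, here is a rough sketch in Python. It is not the plugin's API: Python's `re` module stands in for the RegExpr engine, `find_all` is a hypothetical helper name, and positions are 0-based here, which may differ from what npRexpMatchPos reports.

```python
import re

def find_all(expression, text):
    """Collect (matched string, position, length) for every occurrence."""
    results = []
    # finditer plays the role of npRexpExec followed by repeated npRexpExecNext
    for match in re.finditer(expression, text):
        found = match.group(0)          # index 0 is the whole match
        results.append((found, match.start(0), len(found)))
    return results

for found, pos, length in find_all(r"\d+", "a1 b22 c333"):
    print(f'"{found}" found at {pos} pos with {length} length')
```

When nothing matches, the list is simply empty, which corresponds to the Else branch of the sample above.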

npRexpExecPos

Execute the Rexp object instance to match the expression against the provided string, beginning from the specified position. This action is similar to npRexpExec, but it begins the search from a given position in the string. You could use this action to iterate over the string looking for occurrences of a regular expression, but npRexpExecNext already does this for you.

The result variable stores "True" if the regular expression is found in the string, or "False" if it is not found or if an error occurs. In the latter case the [LastError] variable contains information about the error.

npRexpExecNext

Perform the next search on an Rexp object instance (a first search is mandatory). Before you use this action you must call npRexpExec or npRexpExecPos. This action searches for the regular expression in the string, beginning from the position of the last occurrence.

The result variable stores "True" if another occurrence of the regular expression is found, or "False" if not. The result variable also stores "False" if an error occurs. In that case the [LastError] variable contains information about the error.

npRexpMatchStr

Retrieve the matched string resulting from the last execution of the Rexp object instance. Once you have used npRexpExec, npRexpExecPos or npRexpExecNext, this action lets you get the current occurrence of the regular expression.

The index 0 (zero) can always be used to retrieve the whole matched string. When a regular expression contains subexpressions, the index is 1 for the first subexpression, 2 for the second, 3 for the third, and so on.

If you try to retrieve a matched subexpression with a wrong index (for example, when the regular expression was not found), the result variable stores an empty string. See the npRexpExec action for an example of use, take a look at the plugin samples and don't forget the Regular Expressions syntax help section.

npRexpMatchPos

Retrieve the position of the matched string resulting from the last execution of the Rexp object instance. Once you have used npRexpExec, npRexpExecPos or npRexpExecNext, this action lets you get the position of the current occurrence of the regular expression. For more information see npRexpMatchStr.

If you try to retrieve a matched string position with a wrong index (for example, when the regular expression was not found), the result variable stores the value "-1". See the npRexpExec action for an example of use, take a look at the plugin samples and don't forget the Regular Expressions syntax help section.

npRexpMatchLen

Retrieve the length of the matched string resulting from the last execution of the Rexp object instance. Once you have used npRexpExec, npRexpExecPos or npRexpExecNext, this action lets you get the length of the current occurrence of the regular expression. For more information see npRexpMatchStr.

If you try to retrieve a matched string length with a wrong index (for example, when the regular expression was not found), the result variable stores the value "-1". See the npRexpExec action for an example of use, take a look at the plugin samples and don't forget the Regular Expressions syntax help section.

npRexpSubCount

Retrieve the number of subexpressions found in the last execution of the Rexp object instance. For regular expressions that contain subexpressions, the result variable stores the number of subexpressions found. When no subexpression is found, the result variable stores "0" (zero). For more information about subexpressions, see the Regular Expressions syntax help section.

npRexpSubstitute

Return the template with its tokens replaced by the matched subexpressions. Use this action to fill a template with the values matched by the regular expression. To insert the whole occurrence of the regular expression into the template, use "$&" or "$0".

To insert a subexpression, use "$n", where "n" is the subexpression number. If you need a literal "$" symbol in your template, escape it with the "\" character, for example "\$". If raw digits must follow "$n", enclose the number in curly braces "{}". For example, "a$12bc" becomes "a(Match[12])bc", while "a${1}2bc" becomes "a(Match[1])2bc".
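
The template rules above can be sketched in a few lines of Python. This is only an illustration of the token syntax, not the plugin's implementation: `substitute` is a hypothetical helper, Python's `re` stands in for the RegExpr engine, and two behaviours are assumptions (a backslash makes any following character literal, and a subexpression number that does not exist yields an empty string).

```python
import re

# One token is either a backslash escape, "$&", "$n" or "${n}".
_TOKEN = re.compile(r"\\(.)|\$(?:&|(\d+)|\{(\d+)\})", re.DOTALL)

def substitute(template, match):
    """Fill the template with values taken from a match object."""
    def expand(token):
        if token.group(1) is not None:        # "\$" -> literal "$"
            return token.group(1)
        number = token.group(2) or token.group(3)
        index = int(number) if number else 0  # "$&" means the whole match
        if index > match.re.groups:           # assumption: unknown index -> ""
            return ""
        return match.group(index) or ""
    return _TOKEN.sub(expand, template)

m = re.search(r"(\w+)@(\w+)", "mail bob@example now")
print(substitute("$2: $1 [$&] \\$5 ${1}2", m))
```

Note how "${1}2" keeps the digit "2" as plain text, whereas "$12" would be read as subexpression number 12.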

npRexpOptions

Specify general options of an Rexp object instance. Here you can set the values of the SpaceChars, WordChars, LineSeparators and LinePairedSeparator properties of the regular expression engine. See the Regular Expressions syntax help section. Below you can view the default values for these properties (without quotes):

If you need Unix-style line separators (only \n), use LineSeparators = "#$a" and LinePairedSeparator = "".

npRexpGetMod

Get the current value of the regular expression modifiers of an Rexp object instance. By default the modifiers string is "gsr-imx", which activates the "g", "s" and "r" modifiers and deactivates "i", "m" and "x". You can set the regular expression modifiers using npRexpSetMod. For more information about the supported modifiers see the Regular Expressions syntax help section.

npRexpSetMod

Set the value of the regular expression modifiers of an Rexp object instance. For example, if you provide a modifiers string like "i", you activate the case-insensitive mode and leave the rest of the modifiers unchanged. You can get the regular expression modifiers currently in use with npRexpGetMod. For more information about the supported modifiers see the Regular Expressions syntax help section.

npRexpSplit

Split a string into parts using regular expressions. This action is similar to the native NeoBook "StrParse" action, but it allows a regular expression to separate the parts of the string. Note that you can use this action without creating any Rexp object instance. The result variable stores the string parts as a NeoBook array, and the result count variable stores the number of items in the resulting array.
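
To get a feel for splitting on a regular expression rather than on a fixed delimiter, here is a small Python sketch. It uses Python's `re.split` as a stand-in for the plugin action; `split_parts` is a hypothetical name, and details such as how empty parts are reported may differ in RegExpr.

```python
import re

def split_parts(expression, text):
    """Return the parts of text separated by matches of expression, plus their count."""
    parts = re.split(expression, text)
    return parts, len(parts)

# The separator is a comma or semicolon with optional spaces around it.
parts, count = split_parts(r"\s*[,;]\s*", "red, green;blue ,white")
print(parts, count)
```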

npRexpQuote

Replace all metacharacters in a regular expression with their safe (escaped) representation. This action is useful when the user can type the expression to be used. Given the string "abc$cd.(", this action converts it into "abc\$cd\.\(", escaping the "$", the "." and the "(".
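
The same idea can be tried in Python, where `re.escape` plays a comparable role. This is an analogy only: the exact set of characters that the plugin escapes may differ from Python's.

```python
import re

user_text = "abc$cd.("          # text typed by a user, to be matched literally
safe = re.escape(user_text)     # each metacharacter gets a leading backslash
print(safe)

# The quoted text can now be embedded in a larger expression safely.
found = re.search(safe + r"\d+", "xx abc$cd.(42 yy")
print(found.group(0))
```

Without the quoting step, the unbalanced "(" in the user text would make the expression invalid.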

npRexpSearch

Search a string using regular expressions. This action lets you find out whether or not a regular expression is found in a string. Note that you do not need to create any Rexp object before using this action. The result variable stores "True" if the regular expression is found at least once in the string, or "False" if not.

npRexpReplace

Replace strings with others using regular expressions. If the "use substitution" variable is "True", the replace string can contain a template whose tokens are replaced with the appropriate parts of each occurrence. See npRexpSubstitute. Note that you do not need to create any Rexp object before using this action. The result variable stores the source string with the occurrences of the regular expression replaced by the replace string or by the substitution template.

Regular Expressions syntax

Since the npRexp plugin uses the RegExpr library written by Andrey V. Sorokin, I also recommend that you use Regular Expression Studio to test and play with regular expressions before you try them with the plugin. This program is freeware; it helps you understand more about regular expressions and how to use them, and lets you test your own regular expressions quickly.

Please note that this regular expression introduction is practically a copy (with minor modifications, but no enhancements) of the regular expression introduction you can find on the RegExpr website. I use this introduction here with the express permission of the author, Andrey V. Sorokin.

Simple matches

Any single character matches itself, unless it is a metacharacter with a special meaning described below.

A series of characters matches that series of characters in the target string, so the pattern "bluh" would match "bluh" in the target string. Quite simple, eh?

You can make characters that normally function as metacharacters or escape sequences be interpreted literally by "escaping" them, that is, by preceding them with a backslash "\". For instance, the metacharacter "^" matches the beginning of the string, but "\^" matches the character "^", "\\" matches "\", and so on.

Examples:

foobar matches the string "foobar"

\^FooBarPtr matches "^FooBarPtr"

Escape sequences

Characters may be specified using an escape sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, "\xnn", where "nn" is a string of hexadecimal digits, matches the character whose ASCII value is "nn". If you need a wide (Unicode) character code, you can use "\x{nnnn}", where "nnnn" is one or more hexadecimal digits. Note that, for now, Unicode is supported by the engine but not allowed in the plugin, because of NeoBook's own Unicode support.

Examples:

foo\x20bar matches "foo bar" (note the space in the middle)

\tfoobar matches "foobar" preceded by a tab

Character classes

You can specify a character class, by enclosing a list of characters in "[]", which will match any one character from the list.

If the first character after the "[" is "^", the class matches any character not in the list.

foob[aeiou]r finds the strings "foobar", "foober" etc. but not "foobbr", "foobcr" etc.

foob[^aeiou]r finds the strings "foobbr", "foobcr" etc. but not "foobar", "foober" etc.

Within a list, the "-" character is used to specify a range, so that "a-z" represents all characters between "a" and "z", inclusive.

If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]" to be a member, you may place it at the start of the list or escape it with a backslash.

Examples:

[-az] matches "a", "z" and "-"

[az-] matches "a", "z" and "-"

[a\-z] matches "a", "z" and "-"

[a-z] matches all twenty-six lowercase characters from "a" to "z"

[\n-\x0D] matches any of #10, #11, #12, #13

[\d-t] matches any digit, "-" or "t"

[]-a] matches any character in the range from "]" to "a"

Metacharacters

Metacharacters are special characters which are the essence of Regular Expressions. There are different types of metacharacters, described below.

Metacharacters - Line separators

Examples:

^foobar matches the string "foobar" only if it's at the beginning of a line

foobar$ matches the string "foobar" only if it's at the end of a line

^foobar$ matches the string "foobar" only if it's the only string in the line

foob.r matches strings like "foobar", "foobbr", "foob1r" and so on

The "^" metacharacter by default is only guaranteed to match at the beginning of the input string/text, the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$".

You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m.

The \A and \Z are just like "^" and "$", except that they won't match multiple times when the modifier /m is used, while "^" and "$" will match at every internal line separator.

The "." metacharacter by default matches any character, but if you switch off the modifier /s, then "." won't match embedded line separators.
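
The effect of these two modifiers can be tried out in Python, whose `re.MULTILINE` and `re.DOTALL` flags correspond roughly to /m and /s. This is an analogy, not the RegExpr engine: in Python both flags are off by default, whereas the plugin's default modifiers string has /s switched on, and the handling of line separators differs in detail.

```python
import re

text = "one\nfoobar\ntwo"

# Without the multi-line modifier, "^" and "$" only anchor at the ends of the text.
no_m = re.search(r"^foobar$", text)
# With it, they also match around the embedded line separators.
with_m = re.search(r"^foobar$", text, re.MULTILINE)

# "^" matches at every line start, "\A" only at the very beginning of the text.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
text_start = re.findall(r"\A\w+", text, re.MULTILINE)

# "." skips line separators unless the single-line modifier is on.
dot_off = re.search(r"one.foobar", text)
dot_on = re.search(r"one.foobar", text, re.DOTALL)

print(no_m, with_m.group(0), line_starts, text_start, dot_off, dot_on.group(0))
```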

RegExpr works with line separators as recommended at www.unicode.org ( http://www.unicode.org/unicode/reports/tr18/ ):

"^" is at the beginning of an input string and, if modifier /m is on, also immediately following any occurrence of \x0D\x0A or \x0A or \x0D (if you are using the Unicode version of RegExpr, then also \x2028 or \x2029 or \x0B or \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"$" is at the end of an input string and, if modifier /m is on, also immediately preceding any occurrence of \x0D\x0A or \x0A or \x0D (if you are using the Unicode version of RegExpr, then also \x2028 or \x2029 or \x0B or \x0C or \x85). Note that there is no empty line within the sequence \x0D\x0A.

"." matches any character, but if you switch off modifier /s then "." doesn't match \x0D\x0A and \x0A and \x0D (if you are using the Unicode version of RegExpr, then also \x2028 and \x2029 and \x0B and \x0C and \x85).

Note that "^.*$" (an empty line pattern) does not match the empty string within the sequence \x0D\x0A, but matches the empty string within the sequence \x0A\x0D.

Multi-line processing can easily be tuned for your own purposes with the npRexpOptions "LineSeparators" and "LinePairedSeparator" values. You can use only Unix-style separators \n, or only DOS/Windows-style \r\n, or mix them together (as described above and used by default), or define your own line separators!

Metacharacters - Predefined classes

You may use \w, \d and \s within custom character classes.

Examples:

foob\dr matches strings like "foob1r", "foob6r" and so on but not "foobar", "foobbr" and so on

foob[\w\s]r matches strings like "foobar", "foob r", "foobbr" and so on but not "foob=r" and so on

You can use npRexpOptions to set "SpaceChars" and "WordChars", which define the character classes \w, \W, \s and \S, so you can easily redefine them.

Metacharacters - Word boundaries

A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
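
A quick way to see word boundaries at work is the following Python sketch. Python's `re` is used as a stand-in for RegExpr here, and the sample text is made up for illustration; positions are 0-based.

```python
import re

text = "foo foobar barfoo (foo)"

# "\bfoo\b" only matches "foo" when it is not glued to other word characters.
whole_words = [m.start() for m in re.finditer(r"\bfoo\b", text)]
# Without the boundaries, every "foo" is found, including those inside longer words.
anywhere = [m.start() for m in re.finditer(r"foo", text)]

print(whole_words, anywhere)
```

The first and last "foo" qualify because the start of the string, the space and the parentheses all count as \W.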

Metacharacters - Iterators

Any item of a regular expression may be followed by another type of metacharacter: an iterator. Using these metacharacters you can specify the number of occurrences of the previous character, metacharacter or subexpression.

So, digits in curly brackets of the form {n,m}, specify the minimum number of times to match the item n and the maximum m. The form {n} is equivalent to {n,n} and matches exactly n times. The form {n,} matches n or more times. There is no limit to the size of n or m, but large numbers will chew up more memory and slow down r.e. execution.

If a curly bracket occurs in any other context, it is treated as a regular character.

Examples:

foob.*r matches strings like "foobar", "foobalkjdflkj9r" and "foobr"

foob.+r matches strings like "foobar", "foobalkjdflkj9r" but not "foobr"

foob.?r matches strings like "foobar", "foobbr" and "foobr" but not "foobalkj9r"

fooba{2}r matches the string "foobaar"

fooba{2,}r matches strings like "foobaar", "foobaaar", "foobaaaar" etc.

fooba{2,3}r matches strings like "foobaar" or "foobaaar" but not "foobaaaar"

A little explanation about "greediness". "Greedy" takes as many as possible, "non-greedy" takes as few as possible. For example, "b+" and "b*" applied to string "abbbbc" return "bbbb", "b+?" returns "b", "b*?" returns empty string, "b{2,3}?" returns "bb", "b{2,3}" returns "bbb".
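
These results can be reproduced with a short Python sketch. Python's `re` stands in for RegExpr, and `take` is a hypothetical helper that applies each pattern at the position of the first "b", so that the comparison is only about how much each iterator takes.

```python
import re

def take(pattern, text="abbbbc", start=1):
    """Match pattern at a fixed position (the first "b") and return the text taken."""
    match = re.compile(pattern).match(text, start)
    return match.group(0) if match else None

for pattern in ("b+", "b*", "b+?", "b*?", "b{2,3}?", "b{2,3}"):
    print(pattern, repr(take(pattern)))
```

The greedy forms take as many "b" characters as their limits allow, while the forms ending in "?" stop at the minimum.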

You can switch all iterators into "non-greedy" mode (see below the modifier /g).

Metacharacters - Alternatives

You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will match any of "fee", "fie", or "foe" in the target string (as would f(e|i|o)e). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it's common practice to include alternatives in parentheses, to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string. (This might not seem important, but it is important when you are capturing matched text using parentheses.)

Also remember that "|" is interpreted as a literal within square brackets, so if You write [fee|fie|foe] You're really only matching [feio|].

Examples:

foo(bar|foo) matches the strings "foobar" or "foofoo".
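
The left-to-right rule and the character class pitfall can both be checked with a few lines of Python. Python's `re` is used here as a stand-in for RegExpr, and the variable names are only illustrative.

```python
import re

# Alternatives are tried from left to right: "foo" wins even though "foot" is longer.
first = re.search(r"foo|foot", "barefoot").group(0)
# Swapping the order lets the longer alternative match.
swapped = re.search(r"foot|foo", "barefoot").group(0)

# Inside square brackets "|" is just another character of the class.
in_class = re.fullmatch(r"[fee|fie|foe]", "|") is not None

# Parentheses limit where the alternatives start and end.
candidates = ("foobar", "foofoo", "foobaz")
grouped = [s for s in candidates if re.fullmatch(r"foo(bar|foo)", s)]

print(first, swapped, in_class, grouped)
```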

Metacharacters - Subexpressions

The bracketing construct (...) may also be used to define r.e. subexpressions. After parsing, you can find subexpression positions, lengths and actual values using npRexpMatchStr, npRexpMatchPos, npRexpMatchLen and npRexpSubCount, and use them in npRexpSubstitute.

Subexpressions are numbered based on the left to right order of their opening parenthesis.

The first subexpression has number "1" (the whole r.e. match has number "0"; you can substitute it using npRexpSubstitute as "$0" or "$&").

Examples:

(foobar){8,10} matches strings which contain 8, 9 or 10 instances of "foobar"

foob([0-9]|a+)r matches "foob0r", "foob1r", "foobar", "foobaar", "foobaaar" etc.

Metacharacters - Backreferences

Metacharacters \1 through \9 are interpreted as backreferences. \<n> matches previously matched subexpression #<n>.

Examples:

(.)\1+ matches "aaaa" and "cc"

(.+)\1+ also matches "abab" and "123123"

(['"]?)(\d+)\1 matches "13" (in double quotes), or '4' (in single quotes) or 77 (without quotes), etc.
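
The quoted number example can be tried in Python, which uses the same backreference syntax. This is a sketch with Python's `re` standing in for RegExpr; `number_inside` is a hypothetical helper name.

```python
import re

# Group 1 captures the opening quote (possibly empty); "\1" demands the same text again.
QUOTED_NUMBER = re.compile(r"""(['"]?)(\d+)\1""")

def number_inside(text):
    """Return the digits if text is a number with matching (or no) quotes, else None."""
    match = QUOTED_NUMBER.fullmatch(text)
    return match.group(2) if match else None

for sample in ('"13"', "'4'", "77", "\"13'"):
    print(sample, number_inside(sample))
```

The last sample is rejected because the closing quote differs from the opening one.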

Modifiers

Modifiers change the behaviour of the npRexp regular expression engine.

There are many ways to set up modifiers.

Any of these modifiers may be embedded within the regular expression itself using the (?...) construct.

You can also assign an appropriate modifiers string using npRexpSetMod, and find out which modifiers are in effect using npRexpGetMod.

The modifier /x itself needs a little more explanation. It tells the engine to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment.

Examples:

( 
  (abc) # comment 1
  |   # You can use spaces to format r.e. - RegExpr ignores it
  (efg) # comment 2
)

This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making regular expressions text more readable.
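
Python offers a comparable extended mode through its `re.VERBOSE` flag, which makes it easy to experiment with this layout style. The sketch below is an analogy with Python's `re`, not a statement about how RegExpr parses the pattern; the variable names are illustrative.

```python
import re

# Whitespace and "#" comments are layout only in extended mode.
pattern = re.compile(r"""
(
  (abc)   # comment 1
  |       # spaces are only used to format the expression
  (efg)   # comment 2
)
""", re.VERBOSE)

# A real space has to be escaped (or written as a hex escape such as \x20).
literal_space = re.compile(r"foo\ bar  # the escaped space is real", re.VERBOSE)

print(bool(pattern.fullmatch("abc")), bool(pattern.fullmatch("a b c")))
print(bool(literal_space.fullmatch("foo bar")), bool(literal_space.fullmatch("foobar")))
```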

Perl extensions

(?imsxr-imsxr)

You may use this construct inside an r.e. to change modifiers on the fly. If the construct is placed inside a subexpression, it affects only that subexpression.

Examples:

(?i)Saint-Petersburg matches "Saint-petersburg" and "Saint-Petersburg"

(?i)Saint-(?-i)Petersburg matches "Saint-Petersburg" but not "Saint-petersburg"

(?i)(Saint-)?Petersburg matches "Saint-petersburg" and "saint-petersburg"

((?i)Saint-)?Petersburg matches "saint-Petersburg", but not "saint-petersburg"

(?#text) A comment, the text is ignored. Note that RegExpr closes the comment as soon as it sees a ")", so there is no way to put a literal ")" in the comment.

Action errors subroutine

All NeoPlugins deal with errors in the same way that NeoBook does: when the plugin encounters an action error, the [LastError] variable stores information about the error, so you can check this variable after executing an action.

But all NeoPlugins also offer a more advanced way to deal with possible action errors. You can define a subroutine named OnNeoPluginActionError, which is executed whenever an action error occurs, and you can use these variables inside it:

Note that this error handling subroutine is shared by all NeoPlugins, so you do not need to define a subroutine for every plugin you use in your publication: the same subroutine is recognized and automatically used by every NeoPlugin. Below you can view a sample of this subroutine code:

:OnNeoPluginActionError
  AlertBox "NeoPlugin Error" "Error [LastError] in plugin: [PluginName]"
Return

Also note that the use of this NeoPlugins error handling subroutine is completely optional. You can continue using the [LastError] variable as usual, and even use both methods at the same time.
