<?php
/*
 * parser_ipv6.inc
 *
 * Copyright (c) 2017-2019 Anders Lind (anders.lind@gmail.com)
 * All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * List of methods used:
 * (?x) Free-spacing mode.
 *   Allows comments and lets us place white space characters without impact:
 *   both newlines and white spaces are simply ignored unless escaped
 *   intentionally.
 *   Use "\ " to indicate a space e.g. like: Hey\ there.
 *   You might want to take a look:
 *   https://www.regular-expressions.info/freespacing.html
 *
 * # Comment under Free-spacing mode.
 *   If free-spacing mode is not on one can use (?#Some comment)
 *
 * (?(DEFINE) Subpattern.
 *   Defines a subpattern that we intend to use later by reference.
 *
 * (?'hexncolon' A named group.
 *   Can be used to define the group name of a subpattern
 *   or simply to give a matching group a name that is more logical
 *   to use than a numbered group that might even change if the
 *   regular expression is changed.
 *
 * (?&hextet) Reference to use.
 *   Use/reference to the named group, which might be a subpattern.
 *
 * (?= Positive lookahead.
 *   Used when we want to make sure something is on the horizon before
 *   we start to match!
 *
 * (?! Negative lookahead.
 *   Used when we want to make sure something is NOT on the horizon before
 *   we start to match!
 *
 * () Capturing group.
 *   Normally identified by a number that corresponds to when it shows up
 *   in the regular expression.
 *
 * (?: Non-capturing group.
 *   Identifies a non-capturing group that is useful if you e.g. need to
 *   repeat a match e.g. of a compound expression ab\d, but without capturing
 *   it: (?:ab\d)
 *
 * (?> Atomic (non-capturing) group.
 *   Once it has a match it throws away all backtracking info it might have,
 *   meaning it won't try other alternatives if there e.g. is a |.
 *
 * \G Matches at the point where the previous match ended.
 *   We use \G once, as the alternative to consuming characters that cannot
 *   be part of an address. In our case below it lets a match begin at the
 *   very start of the first line so we do not miss a match.
 *
 * For now everything runs stable.
 *
 * If we want, we could make the following changes / investigations in the future:
 * * In the expression (?>(?&nohexncolonndot)+|\G) under "Start matching:",
 *   experiment with \G vs |^|\s+ vs |^ if we want to optimize for
 *   speed/results. The expression handles the match at the start as well as
 *   the following matches.
 *
 * * Make 2 versions:
 *   One for IPv6 only and another that resembles what we have today
 *   (IPv6+IPv4). That way we would have two regexes to choose from.
 *   That would include when to use hexncolonndot (IPv6+IPv4) vs hexncolon
 *   (IPv6).
 *
 * * Experiment with moving check_noclosingsinglecolon to the start, right
 *   inside of (?'MATCH', to see if we get a speed improvement (that is
 *   stable of course). My hunch is that it won't be stable and will likely
 *   require more steps in general. It also seems less useful for the +IPv4
 *   cases, so maybe experiment as well with placing it right after (?'IPV6'.
 *
 * Main capturing groups:
 * MATCH=We have a match
 *
 * Explanation of the naming of the main groups below this section:
 * C=double colon (::)
 * L=Left
 * M=Middle
 * R=Right
 * U=Unspecified address (:: alone)
 * FULL=Full address not compressed with C/double colon
 * 6=IPv6
 * 4=IPv4
 * which gives:
 *
 * IPV64
 * 2. FULL64
 * 3. CMR64
 * 4. CLU64
 *
 * IPV6
 * 5. FULL6
 * 6. CMR6
 * 7. CLU6
 */
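
/*
 * Illustrative sketch, not part of the original parser: a much smaller
 * pattern built from the same constructs listed above (free-spacing mode,
 * DEFINE subpatterns, a named group and (?&name) references), which may
 * make the full expression below easier to read.
 * The function name example_match_hextet_pair() and the group name PAIR are
 * made up for this example.
 */
function example_match_hextet_pair($text) {
	$regex = <<<'DEMO'
(?x)
# Define the building blocks. A DEFINE group never matches by itself.
(?(DEFINE)(?'hex'[\da-f]))
(?(DEFINE)(?'hextet'(?&hex){1,4}))
# Use the subpatterns by reference inside a named group.
(?'PAIR'(?&hextet):(?&hextet))
DEMO;
	// The i flag lets a-f match A-F as well.
	if (preg_match('/'.$regex.'/i', $text, $m) === 1) {
		return $m['PAIR'];
	}
	return null;
}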

const ipv6_regex = <<<'IPV6'
(?x)
# Definitions:
(?(DEFINE)(?'hex'[\da-f]))
(?(DEFINE)(?'hexncolon'[\da-f:]))
(?(DEFINE)(?'hexncolonndot'[\da-f:\.]))
(?(DEFINE)(?'hextet'(?&hex){1,4}))
# One IPv4 octet: 0 to 255.
(?(DEFINE)(?'octet'25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))
(?(DEFINE)(?'ipv4'(?>(?&octet)\.){3}(?&octet)))
(?(DEFINE)(?'unspecifiedaddr'::))
(?(DEFINE)(?'nohexncolon'[^\da-f:]))
(?(DEFINE)(?'nohexncolonndot'[^\da-f:\.]))
(?(DEFINE)(?'check_withatleastonedoublecolon'
	(?=.*(?&unspecifiedaddr))
))
(?(DEFINE)(?'check_withmostonedoublecolon'
	(?!(?&hexncolon)+(?&unspecifiedaddr)(?&hexncolon)+(?&unspecifiedaddr))
))
# 1.
(?(DEFINE)(?'check_noclosingsinglecolon'(?!.*\b:(?>(?&nohexncolon)|$))))
# Start matching:
(?>(?&nohexncolonndot)+|\G)
(?&check_withmostonedoublecolon)
(?'MATCH'
	(?'IPV64'
		(?'FULL64'
			# 2.
			(?>(?&hextet):){6}(?&ipv4)
		)
		|
		(?'CMR64'
			# 3.
			(?&check_withatleastonedoublecolon)(?>(?&hextet)(?>:{1,2})){1,5}(?&ipv4)
		)
		|
		(?'CLU64'
			# 4.
			(?&unspecifiedaddr)(?>(?&hextet):){0,5}(?&ipv4)
		)
	)
	(?!(?&hexncolonndot))
	|
	(?'IPV6'
		(?'FULL6'
			# 5.
			(?>(?&hextet)(?>:)){7}(?&hextet)
		)
		|
		(?'CMR6'
			# 6.
			(?&check_withatleastonedoublecolon)(?&check_noclosingsinglecolon)
			(?&hextet):{1,2}(?>(?&hextet)(?>:{1,2}|\b)){0,6}
		)
		|
		(?'CLU6'
			# 7.
			(?&check_noclosingsinglecolon)
			(?&unspecifiedaddr)(?>(?&hextet)(?>:|\b)){0,7}
		)
	)
	(?!(?&hexncolonndot))
)
IPV6;

/*
 * Enumerated comments/documentation
 *
 * 1.
 * check_noclosingsinglecolon uses a negative lookahead to describe what we
 * 'anti' match (remember we do not capture with negative lookahead).
 * Inside the lookahead (the inner check) it defines the opposite of
 * what we match. Therefore 'anti' match.
 * The inner check does the following. It matches a single (unwanted) colon
 *   1) that has a character after it which cannot be part of an address
 *   or
 *   2) that has nothing after it (meaning the end of 'line'/input).
 * Inner check: everything (except newlines), followed by a word boundary,
 * a colon, and then either one character that is NOT (digit, a-f, colon) or
 * the end of the input.
 * It means that AFTER the check what we MATCH
 * will be the opposite of .*\b:[some character NOT in \da-f:]
 * meaning something that does not have the same features.
 * The reason why we have the character class at the end is that it refers to
 * the characters that we use as building blocks in an IPv6 address; anything
 * else can be considered a separator between addresses. So if one of these
 * building block characters does show up, the colon is not the end of the
 * address and not a 'closing colon'.
 * If a second colon shows up right after, then it is potentially a
 * shortening of an address, meaning something else than a single colon.
 * If in fact there is a 'separator' in the inner check, then the inner check
 * succeeds, but the result is flipped around because of the negative
 * lookahead! Meaning we do not match the result afterwards!
 *
 * 2.
 * This must be the first of the 3 expressions - if it came e.g. after the
 * next group it would not match in a line with multiple addresses!
 * Matches like: 1111:2222:3333:4444:5555:6666:222.111.133.231
 *
 * 3.
 * Matches like: beef:beef::beef:beef:123.123.123.255
 *
 * 4.
 * Matches like: ::beef:beef:231.132.213.0
 *
 * 5.
 * This must be the first of the 3 expressions - if it came e.g. after the
 * next group it would not match in a line with multiple addresses!
 * Matches like: 1111:2222:3333:4444:5555:6666:7777:8888
 *
 * 6.
 * Matches like: beef:beef::beef:beef
 *
 * 7.
 * Matches like: ::beef:beef
 * If hexncolonndot is not used above when we start, our hextet will typically
 * match the last octet of an IPv4 address when an IPv6 address follows it, e.g.:
 * ::1234:1234:1234:1234:1234:1234:123.231.213.255 ::11
 * 255 gets matched by us. Therefore it is important to use hexncolonndot at -START-
 *
 */
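
/*
 * Illustrative sketch, not part of the original parser: the negative
 * lookahead from comment 1 in isolation, with the nohexncolon character
 * class written out inline. The function name
 * example_has_no_closing_single_colon() is made up for this example.
 * It returns true when the text has no colon that directly follows a word
 * character and is itself followed by a separator or by the end of the
 * input. A double colon such as the one in beef::1 is therefore accepted,
 * while the trailing colon in beef: is rejected.
 */
function example_has_no_closing_single_colon($text) {
	// Anchored at the start so the lookahead inspects the text exactly once.
	return preg_match('/^(?!.*\b:(?>[^\da-f:]|$))/i', $text) === 1;
}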

/*
 * Small wrapper function to match all the lease content.
 * Returns the matches in PREG_SET_ORDER: one array per address found, with
 * the named groups (MATCH, IPV64, IPV6, ...) available as keys.
 */
function parse_all_ipv6_to_array($content) {
	preg_match_all('/'.ipv6_regex.'/i', $content, $matches, PREG_SET_ORDER);
	return $matches;
}
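
/*
 * Illustrative sketch, not part of the original file: one way a caller might
 * turn the PREG_SET_ORDER result of parse_all_ipv6_to_array() into a flat
 * list of addresses together with the form (FULL64, CMR64, CLU64, FULL6,
 * CMR6 or CLU6) that matched. The function name
 * example_list_ipv6_addresses() and the 'address' and 'form' keys are made
 * up for this example.
 */
function example_list_ipv6_addresses($content) {
	// Innermost named groups, in the order they are documented above.
	$forms = array('FULL64', 'CMR64', 'CLU64', 'FULL6', 'CMR6', 'CLU6');
	$addresses = array();
	foreach (parse_all_ipv6_to_array($content) as $match) {
		$form = null;
		foreach ($forms as $name) {
			// Groups that did not take part in the match are empty or absent.
			if (isset($match[$name]) && $match[$name] !== '') {
				$form = $name;
				break;
			}
		}
		$addresses[] = array('address' => $match['MATCH'], 'form' => $form);
	}
	return $addresses;
}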