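[Mobile Application Development Technology] 4 C++ Boost Regular Expressions

Contents:
Offline documentation
Stripping tags from an HTML file
A regular expression checker program
Regular expression metacharacters
Anchors
Matching several letters followed by several digits
Marking: whatever is inside a pair of parentheses (); in Boost, () does not need to be escaped
?: the group is not marked and cannot be back-referenced
Repeats [greedy: match as much as possible]
? non-greedy matching [match as little as possible]
Stream mode: never goes back, once matched it stays matched; serves high performance
Back references: a marked expression must exist
Or-conditions
Word boundaries
Named expressions
Comments
Branch reset
Lookahead
Example 1: only th is matched, not ing, but ing must be present
Example 2: ing is checked but not consumed, so in can still be matched
Example 3: th matches unless it is followed by ing
Lookbehind
Recursive expressions
Operator precedence
Displaying the number of subexpressions
Boost regex: sub_match
Boost regex: the regex_replace algorithm
Boost regex: iterators
Boost regex: -1 selects the text that was not matched
Boost regex: captures (why does the official code crash?)
Boost regex: official example
Boost regex: search-based simple lexer that analyses C++ class definitions
Boost regex: iterator-based simple lexer that analyses C++ class definitions
Boost regex: converting a C++ file to an HTML file
Boost regex: grabbing all the links in a web page

Offline documentation:
boost_1_62_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Stripping tags from an HTML file:
chunli@Linux:~/workspace/Boost$ sed 's/<[\/]\?\([[:alpha:]][[:alnum:]]*[^>]*\)>//g' index.html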
A regular expression checker program:
chunli@Linux:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    if (argc != 2)
    {
        cerr << "Usage: " << argv[0] << " regex-str" << endl;
        return 1;
    }
    boost::regex e(argv[1], boost::regex::icase);
    // mark_count() returns the number of marked subexpressions in the regex,
    // that is, the parts of the expression enclosed in parentheses.
    cout << "subexpressions: " << e.mark_count() << endl;
    string line;
    while (getline(cin, line))
    {
        boost::match_results<string::const_iterator> m;
        if (boost::regex_search(line, m, e, boost::match_default))
        {
            // Print the whole match followed by each marked subexpression.
            const int n = m.size();
            for (int i = 0; i < n; ++i)
            {
                cout << m[i] << " ";
            }
            cout << endl;
        }
        else
        {
            // No match: print a row of dashes as wide as the input line.
            cout << setw(line.size()) << setfill('-') << '-' << right << endl;
        }
    }
}

Regular expression metacharacters:
.[{}()\*+?|^$

Anchors:
A '^' character shall match the start of a line.
A '$' character shall match the end of a line.

Matching several letters followed by several digits:
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out "\w+\d+"
subexpressions: 0
Hello,world2016
world2016
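
The search loop of the checker can be reduced to a small helper. The sketch below is an illustration only: it uses std::regex as a stand-in for boost::regex (the standard interface was modelled on Boost.Regex, but the substitution is an assumption of this sketch), and the name search_groups is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the whole match followed by every marked
// subexpression of the first match in `text`, or an empty vector when
// nothing matches. std::regex stands in for boost::regex here.
std::vector<std::string> search_groups(const std::string& pattern,
                                       const std::string& text)
{
    const std::regex e(pattern, std::regex::icase);
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(text, m, e))
    {
        for (std::size_t i = 0; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

Calling search_groups("\\w+\\d+", "Hello,world2016") is expected to give a single element, "world2016", in line with the transcript above.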
Marking: whatever is inside a pair of parentheses (); in Boost, () does not need to be escaped
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out "([[:alpha:]]+)[[:digit:]]+\1"
subexpressions: 1
hello123abc8888888abc
abc8888888abc abc
\1 refers back to $1. Only marked content can be back-referenced.

?: the group is not marked and cannot be back-referenced
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?:[[:alpha:]]+)[[:digit:]]+'
subexpressions: 0
abcd1234
abcd1234
11111@@
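
The difference between a marked group and a (?:...) group can also be checked programmatically. This is a minimal sketch under the assumption that std::regex behaves like boost::regex for these two patterns; marked_groups and first_group are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Number of marked (capturing) groups in `pattern`.
// std::regex is used as a stand-in for boost::regex.
unsigned marked_groups(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}

// Text captured by group 1 on the first match in `text`,
// or an empty string when there is no match or no group 1.
std::string first_group(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern);
    std::smatch m;
    if (std::regex_search(text, m, e) && m.size() > 1)
        return m[1].str();
    return std::string();
}
```

With the patterns from the transcripts, marked_groups should report 1 for "([[:alpha:]]+)[[:digit:]]+\\1" and 0 for "(?:[[:alpha:]]+)[[:digit:]]+", and first_group should return "abc" for the input hello123abc8888888abc.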
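Repeats [greedy: match as much as possible]:
*      any number of times (zero or more)
+      at least once
?      zero or one time
{n}    exactly n times
{n,}   n or more times
{n,m}  between n and m times
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'a.*b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzbbaaazzzzzzzb

? non-greedy matching [match as little as possible]: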
Non greedy repeats
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.
*?      Matches the previous atom zero or more times, while consuming as little input as possible.
+?      Matches the previous atom one or more times, while consuming as little input as possible.
??      Matches the previous atom zero or one times, while consuming as little input as possible.
{n,}?   Matches the previous atom n or more times, while consuming as little input as possible.
{n,m}?  Matches the previous atom between n and m times, while consuming as little input as possible.
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'a.*?b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzb

Stream mode: never goes back, once matched it stays matched; serves high performance:
Possessive repeats
By default when a repeated pattern does not match then the engine will backtrack until a match is found. However, this behaviour can sometimes be undesirable, so there are also "possessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.
*+      Matches the previous atom zero or more times, while giving nothing back.
++      Matches the previous atom one or more times, while giving nothing back.
?+      Matches the previous atom zero or one times, while giving nothing back.
{n,}+   Matches the previous atom n or more times, while giving nothing back.
{n,m}+  Matches the previous atom between n and m times, while giving nothing back.

Back references: a marked expression must exist
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '^(a*).*\1$'
subexpressions: 1
a66a66
a66a66
asssasss
asssasss

Or-conditions:
Alternation
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def".
Parentheses can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".
Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:
|abc is not a valid expression, but
(?:)|abc is and is equivalent, also the expression:
(?:abc)?? has exactly the same effect.
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'l(i|o)ve'
subexpressions: 1
love
love o
live
live i
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\<l(i|o)ve\>'
subexpressions: 1
love
love o
live
live i
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'abc|123|234'
subexpressions: 0
23
--
123
123
abc
abc
234
234
123456789abc
123

Word boundaries:
Word Boundaries
The following escape sequences match the boundaries of words:
\<  Matches the start of a word.
\>  Matches the end of a word.
\b  Matches a word boundary (the start or end of a word).
\B  Matches only when not at a word boundary.

Named expressions:
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\1'
subexpressions: 1
123 123
123 123 123
234 234
234 234 234
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\g{r1}'
subexpressions: 1
1234 1234
1234 1234 1234
1236 1236
1236 1236 1236

Comments:
Comments
(?# ... ) is treated as a comment, its contents are ignored.
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\d+(?#my comment)'
subexpressions: 0
hello1234
1234

Branch reset:
Branch reset
(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.
The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index.
In the following example the index of each sub-expression is shown below the expression:
# before  ---------------branch-reset----------- after
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4
chunli@Linux:~/boost$ ./a.out '( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x'
subexpressions: 4
Note: the checker program does not enable the x (extended) modifier, so the spaces and the trailing /x in this pattern are taken literally; only the subexpression count is of interest here.

Lookahead: the characters are matched but not consumed, so they remain available for whatever follows to match
Lookahead
(?=pattern) consumes zero characters, only if pattern matches.
(?!pattern) consumes zero characters, only if pattern does not match.
Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:
(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
could be used to validate the password.

Example 1: only th is matched, not ing, but ing must be present
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)'
subexpressions: 0
those
thing
th

Example 2: ing is checked by the lookahead but not consumed, so (in) can still match it
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)(in)'
subexpressions: 1
thing
thin in
those

Example 3: th matches unless it is followed by ing
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?!ing)'
subexpressions: 0
this
th
thing
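
The password rule quoted in the lookahead description can be turned into a small validator. This is a sketch only: it assumes std::regex (ECMAScript grammar, which also supports (?=...)) as a stand-in for boost::regex, and is_valid_password is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical validator built from the expression quoted above.
// Each lookahead checks one requirement without consuming input;
// .{6,} then enforces the minimum length on the whole string.
bool is_valid_password(const std::string& candidate)
{
    static const std::regex policy(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, policy);
}
```

Because the three lookaheads all start at the same position, they act as a logical AND: a candidate such as aB!def passes, while abcdef (no upper case letter or punctuation) and aB!d (too short) should be rejected.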
Lookbehind:
Lookbehind
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<=ti)mer'
subexpressions: 0
timer
mer
memer
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<!ti)mer'
subexpressions: 0
timer
hhmer
mer

Recursive expressions:
(?N) (?-N) (?+N) (?R) (?0) (?&NAME)
(?R) and (?0) recurse to the start of the entire pattern.
(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?&NAME) recurses to named sub-expression NAME.

Operator precedence:
Operator precedence
The order of precedence for operators is as follows:
1. Collation-related bracket symbols       [==] [::] [..]
2. Escaped characters                      \
3. Character set (bracket expression)      []
4. Grouping                                ()
5. Single-character-ERE duplication        * + ? {m,n}
6. Concatenation
7. Anchoring                               ^$
8. Alternation                             |

===========================================================
Boost regex API

Displaying the number of subexpressions:
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <utility>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1;
    e1 = "^[[:xdigit:]]*$";
    cout << e1.str() << endl;
    cout << e1.mark_count() << endl;

    // Unless regex::save_subexpression_location is set,
    // calling e2.subexpression(0) raises an error.
    regex e2("\\b\\w+(?=ing)\\b.{2,}?([[:alpha:]]*)$",
             regex::perl | regex::icase | regex::save_subexpression_location);
    cout << e2.str() << endl;
    cout << e2.mark_count() << endl;
    pair<regex::const_iterator, regex::const_iterator> sub1 = e2.subexpression(0);
    // Advance the end iterator by one so that the closing parenthesis is included.
    string sub1Str(sub1.first, ++sub1.second);
    cout << sub1Str << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
^[[:xdigit:]]*$
0
\b\w+(?=ing)\b.{2,}?([[:alpha:]]*)$
1
([[:alpha:]]*)
Boost regex: sub_match
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    // A word that starts with T and continues with word characters, a \b boundary,
    // a space, then a run of hexadecimal digits.
    regex e1("\\bT\\w+\\b ([[:xdigit:]]+)");  // doubled backslashes so that the regex engine sees a backslash
    string s("Time ef09,Todo 001");
    boost::smatch m;
    // bool b = boost::regex_search(s, m, e1, boost::match_all);  // with match_all only the last occurrence matches
    bool b = boost::regex_search(s, m, e1);  // by default only the first occurrence is matched
    cout << b << endl;
    const int n = m.size();
    for (int i = 0; i < n; i++)
    {
        cout << "matched:" << i << " ,position:" << m.position(i) << ", ";
        cout << "length:" << m.length(i) << " , str:" << m.str(i) << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
1
matched:0 ,position:0, length:9 , str:Time ef09
matched:1 ,position:5, length:4 , str:ef09
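
The position, length and text of every sub-match can be collected into a small record instead of being printed. The sketch below assumes std::smatch offers the same position(), length() and str() accessors as boost::smatch for this case; GroupInfo and describe_first_match are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record describing one sub-match.
struct GroupInfo
{
    std::ptrdiff_t position;  // offset from the start of the searched text
    std::ptrdiff_t length;    // number of characters matched
    std::string text;         // the matched characters
};

// Describes the whole match (index 0) and every marked subexpression
// of the first match of `e` in `text`; empty when nothing matches.
std::vector<GroupInfo> describe_first_match(const std::string& text,
                                            const std::regex& e)
{
    std::vector<GroupInfo> out;
    std::smatch m;
    if (std::regex_search(text, m, e))
    {
        for (std::size_t i = 0; i < m.size(); ++i)
            out.push_back(GroupInfo{m.position(i), m.length(i), m.str(i)});
    }
    return out;
}
```

For the text "Time ef09,Todo 001" and the pattern "\\bT\\w+\\b ([[:xdigit:]]+)", the two records should mirror the transcript above: position 0, length 9 for the whole match and position 5, length 4 for group 1.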
Boost regex: the regex_replace algorithm
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1("([TQV])|(\\*)|(@)");
    // Group 1: convert to lower case; group 2: replace with '+'; group 3: replace with '#'.
    string replaceFmt("(\\L?1$&)(?2+)(?3#)");
    string src("guTdQhV@@g*b*");  // the input string
    cout << "before replaced: " << src << endl;
    // before replaced: guTdQhV@@g*b*

    // format_all is required for the conditional (?N...) syntax.
    string newStr1 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_all);
    cout << "after replaced: " << newStr1 << endl;
    // after replaced: gutdqhv##g+b+

    // With format_default the result looks odd: the conditionals are copied out literally.
    string newStr2 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_default);
    cout << "after replaced: " << newStr2 << endl;

    // Another way: write the result through an output iterator.
    ostream_iterator<char> oi(cout);
    regex_replace(oi, src.begin(), src.end(), e1, replaceFmt, boost::match_default | boost::match_all);
    cout << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
before replaced: guTdQhV@@g*b*
after replaced: gutdqhv##g+b+
after replaced: gu(?1t)(?2+)(?3#)d(?1q)(?2+)(?3#)h(?1v)(?2+)(?3#)(?1@)(?2+)(?3#)(?1@)(?2+)(?3#)g(?1*)(?2+)(?3#)b(?1*)(?2+)(?3#)
guTdQhV@@g*b(?1*)(?2+)(?3#)
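
The conditional format string above is specific to Boost. As a point of comparison, the same transformation can be written by hand by checking which group took part in each match. This is a sketch under the assumption that std::regex is an acceptable stand-in; it does not use Boost's format syntax at all, and transform_marks is a hypothetical name.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical re-implementation of the replacement shown above without a
// format string: group 1 is lower-cased, group 2 becomes '+', group 3 '#'.
std::string transform_marks(const std::string& src)
{
    static const std::regex e("([TQV])|(\\*)|(@)");
    std::string out;
    std::string::const_iterator tail = src.begin();
    for (std::sregex_iterator it(src.begin(), src.end(), e), end; it != end; ++it)
    {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);  // copy the text between two matches
        if (m[1].matched)
            out += static_cast<char>(
                std::tolower(static_cast<unsigned char>(*m[1].first)));
        else if (m[2].matched)
            out += '+';
        else if (m[3].matched)
            out += '#';
        tail = m[0].second;            // continue after this match
    }
    out.append(tail, src.end());       // copy whatever follows the last match
    return out;
}
```

For the input guTdQhV@@g*b* this should produce gutdqhv##g+b+, the same string the format_all call prints.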
Boost regex: iterators
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e("(a+).+?", regex::icase);
    string s("ann abb aaat");
    boost::sregex_iterator it1(s.begin(), s.end(), e);
    boost::sregex_iterator it2;  // a default-constructed iterator marks the end
    for (; it1 != it2; ++it1)
    {
        boost::smatch m = *it1;
        cout << m << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
an
ab
aaat
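
The iterator loop generalises to a helper that gathers every match. This sketch assumes std::sregex_iterator as a stand-in for boost::sregex_iterator; all_matches is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text of every non-overlapping match
// of `e` in `text`, in order of appearance.
std::vector<std::string> all_matches(const std::string& text, const std::regex& e)
{
    std::vector<std::string> out;
    // A default-constructed iterator serves as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the pattern "(a+).+?" (case-insensitive) and the text "ann abb aaat" the result should be the three strings an, ab and aaat, as in the transcript above.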
Boost regex: -1 selects the text that was not matched
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    string s("this is ::a string ::of tokens");
    boost::regex re("\\s+:*");  // matches the separators: whitespace followed by any number of colons
    // Index -1 asks for the pieces between the matches rather than the matches themselves.
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;
    unsigned count = 0;
    while (i != j)
    {
        cout << *i++ << endl;
        count++;
    }
    cout << "There were " << count << " tokens found !" << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
this
is
a
string
of
tokens
There were 6 tokens found !
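
The -1 index turns the token iterator into a splitter, which can be wrapped as follows. The sketch assumes std::sregex_token_iterator as a stand-in for the Boost class of the same name; split_on is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on every match of `separator`.
// Passing -1 selects the stretches of text that the regex did NOT match.
std::vector<std::string> split_on(const std::string& text, const std::regex& separator)
{
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Splitting "this is ::a string ::of tokens" on "\\s+:*" should give the six tokens listed in the transcript above.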
Boost regex: captures (why does the official code crash?)
pi@raspberrypi:~/boost $ cat main.cpp
#include <boost/regex.hpp>
#include <iostream>

void print_captures(const std::string& regx, const std::string& text)
{
   boost::regex e(regx);
   boost::smatch what;
   std::cout << "Expression: \"" << regx << "\"\n";
   std::cout << "Text: \"" << text << "\"\n";
   if(boost::regex_match(text, what, e, boost::match_extra))
   {
      unsigned i, j;
      std::cout << "** Match found **\n Sub-Expressions:\n";
      for(i = 0; i < what.size(); ++i)
         std::cout << " $" << i << " = \"" << what[i] << "\"\n";
      std::cout << " Captures:\n";
      for(i = 0; i < what.size(); ++i)
      {
         std::cout << " $" << i << " = {";
         for(j = 0; j < what.captures(i).size(); ++j)
         {
            if(j)
               std::cout << ", ";
            else
               std::cout << " ";
            std::cout << "\"" << what.captures(i)[j] << "\"";
         }
         std::cout << " }\n";
      }
   }
   else
   {
      std::cout << "** No Match found **\n";
   }
}

int main(int , char* [])
{
   print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
   print_captures("a(b+|((c)*))+d", "abd");
   print_captures("(.*)bar|(.*)bah", "abcbar");
   print_captures("(.*)bar|(.*)bah", "abcbah");
   print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
   print_captures("^(?>(\\w+)\\W*)*$", "now is the time for all good men to come to the aid of the party");
   print_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$", "now is the time for all good men to come to the aid of the party");
   print_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$", "now is the time for all good men to come to the aid of the party");
   return 0;
}
pi@raspberrypi:~/boost $ g++ -D BOOST_REGEX_MATCH_EXTRA -l boost_regex -Wall main.cpp && ./a.out
Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** No Match found **
Bus error
A likely cause, not confirmed in these notes: the Boost documentation for captures() requires the Boost.Regex library itself to be built with BOOST_REGEX_MATCH_EXTRA defined. Defining the macro only when compiling main.cpp, while linking against a stock libboost_regex, leaves the program and the library disagreeing about the layout of sub_match.
Boost regex: official example
pi@raspberrypi:~/boost $ cat main.cpp
#include <cstdlib>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace std;
using namespace boost;

// Three groups: the numeric code, the separator ('-', a space, or end of input), and the message.
regex expression("^([0-9]+)(\\-| |$)(.*)$");

int process_ftp(const char* response, std::string* msg)
{
   cmatch what;
   if(regex_match(response, what, expression))
   {
      // what[0] contains the whole string
      // what[1] contains the response code
      // what[2] contains the separator character
      // what[3] contains the text message.
      if(msg)
         msg->assign(what[3].first, what[3].second);
      return ::atoi(what[1].first);
   }
   // failure did not match
   if(msg)
      msg->erase();
   return -1;
}

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
istream& getline(istream& is, std::string& s)
{
   s.erase();
   char c = static_cast<char>(is.get());
   while(c != '\n')
   {
      s.append(1, c);
      c = static_cast<char>(is.get());
   }
   return is;
}
#endif

int main(int argc, const char*[])
{
   std::string in, out;
   do
   {
      if(argc == 1)
      {
         cout << "enter test string" << endl;
         getline(cin, in);
         if(in == "quit")
            break;
      }
      else
         in = "100 this is an ftp message text";
      int result;
      result = process_ftp(in.c_str(), &out);
      if(result != -1)
      {
         cout << "Match found:" << endl;
         cout << "Response code: " << result << endl;
         cout << "Message text: " << out << endl;
      }
      else
      {
         cout << "Match not found" << endl;
      }
      cout << endl;
   } while(argc == 1);
   return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out
enter test string
404 not found
Match found:
Response code: 404
Message text: not found

enter test string
500 service error
Match found:
Response code: 500
Message text: service error

enter test string
^C
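
The parsing step of the official example can be isolated into one function that works on std::string. The sketch below keeps the same expression but assumes std::regex as a stand-in for boost::regex; parse_ftp_response is a hypothetical name, not part of the official example.

```cpp
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical variant of process_ftp(): returns the numeric response code,
// or -1 when `response` does not look like "<digits><separator><message>".
// When `msg` is not null it receives the message text (empty on failure).
int parse_ftp_response(const std::string& response, std::string* msg)
{
    static const std::regex e("^([0-9]+)(\\-| |$)(.*)$");
    std::smatch what;
    if (std::regex_match(response, what, e))
    {
        if (msg)
            *msg = what[3].str();                    // group 3: message text
        return std::atoi(what[1].str().c_str());     // group 1: response code
    }
    if (msg)
        msg->clear();
    return -1;
}
```

For "404 not found" the function should return 404 and store "not found", matching the session shown above; a line without a leading number yields -1.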
Boost regex: search-based simple lexer that analyses C++ class definitions
pi@raspberrypi:~/boost $ cat main.cpp
#include <string>
#include <map>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
   // possibly leading whitespace:
   "^[[:space:]]*"
   // possible template declaration:
   "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   // class or struct:
   "(class|struct)[[:space:]]*"
   // leading declspec macros etc:
   "("
      "\\<\\w+\\>"
      "("
         "[[:blank:]]*\\([^)]*\\)"
      ")?"
      "[[:space:]]*"
   ")*"
   // the class name
   "(\\<\\w*\\>)[[:space:]]*"
   // template specialisation parameters
   "(<[^;:{]+>)?[[:space:]]*"
   // terminate in { or :
   "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);

void IndexClasses(map_type& m, const std::string& file)
{
   std::string::const_iterator start, end;
   start = file.begin();
   end = file.end();
   boost::match_results<std::string::const_iterator> what;
   boost::match_flag_type flags = boost::match_default;
   while(boost::regex_search(start, end, what, expression, flags))
   {
      // what[0] contains the whole string
      // what[5] contains the class name.
      // what[6] contains the template specialisation if any.
      // add class name and position to map:
      m[std::string(what[5].first, what[5].second)
        + std::string(what[6].first, what[6].second)]
         = what[5].first - file.begin();
      // update search position:
      start = what[0].second;
      // update flags:
      flags |= boost::match_prev_avail;
      flags |= boost::match_not_bob;
   }
}

#include <iostream>
#include <fstream>

using namespace std;

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      map_type m;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      fs.close();
      IndexClasses(m, text);
      cout << m.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = m.begin();
      d = m.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
   }
   return 0;
}
pi@raspberrypi:~/boost $ cat my_class.cpp
template <class T>
struct A
{
public:
};
template <class T>
class M
{
}
;
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out my_class.cpp
Processing file my_class.cpp
2 matches found
class "A" found at index: 36
class "M" found at index: 88
Boost regex: iterator-based simple lexer that analyses C++ class definitions
pi@raspberrypi:~/boost $ cat main.cpp
#include <string>
#include <map>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>

using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
   // possibly leading whitespace:
   "^[[:space:]]*"
   // possible template declaration:
   "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
   // class or struct:
   "(class|struct)[[:space:]]*"
   // leading declspec macros etc:
   "("
      "\\<\\w+\\>"
      "("
         "[[:blank:]]*\\([^)]*\\)"
      ")?"
      "[[:space:]]*"
   ")*"
   // the class name
   "(\\<\\w*\\>)[[:space:]]*"
   // template specialisation parameters
   "(<[^;:{]+>)?[[:space:]]*"
   // terminate in { or :
   "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
   // what[0] contains the whole string
   // what[5] contains the class name.
   // what[6] contains the template specialisation if any.
   // add class name and position to map:
   class_index[what[5].str() + what[6].str()] = what.position(5);
   return true;
}

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      fs.close();
      // construct our iterators:
      boost::sregex_iterator m1(text.begin(), text.end(), expression);
      boost::sregex_iterator m2;
      std::for_each(m1, m2, &regex_callback);
      // copy results:
      cout << class_index.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = class_index.begin();
      d = class_index.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
      class_index.erase(class_index.begin(), class_index.end());
   }
   return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out main.cpp my_class.cpp
Processing file main.cpp
0 matches found
Processing file my_class.cpp
2 matches found
class "A" found at index: 23
class "B" found at index: 36
Boost regex: converting a C++ file to an HTML file
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file and transform to
// syntax highlighted code in html format

boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   if(is.bad()) return;
   s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
   char c;
   while(is.get(c))
   {
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, const char** argv)
{
   try{
   e1.assign(expression_text);
   e2.assign(pre_expression);
   for(int i = 1; i < argc; ++i)
   {
      std::cout << "Processing file " << argv[i] << std::endl;
      std::ifstream fs(argv[i]);
      std::string in;
      load_file(in, fs);
      fs.close();
      std::string out_name = std::string(argv[i]) + std::string(".htm");
      std::ofstream os(out_name.c_str());
      os << header_text;
      // strip '<' and '>' first by outputting to a
      // temporary string stream
      std::ostringstream t(std::ios::out | std::ios::binary);
      std::ostream_iterator<char> oi(t);
      boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
      // then output to final output stream
      // adding syntax highlighting:
      std::string s(t.str());
      std::ostream_iterator<char> out(os);
      boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
      os << footer_text;
      os.close();
   }
   }
   catch(...)
   { return -1; }
   return 0;
}

const char* pre_expression = "(<)|(>)|(&)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)(?3&amp;)";

const char* expression_text =
   // preprocessor directives: index 1
   "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
   // comment: index 2
   "(//[^\\n]*|/\\*.*?\\*/)|"
   // literals: index 3
   "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
   // string literals: index 4
   "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
   // keywords: index 5
   "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
   "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
   "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
   "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
   "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
   "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
   "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
   "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
   "|using|virtual|void|volatile|wchar_t|while)\\>"
   ;

const char* format_string =
   "(?1<font color=\"#008040\">$&</font>)"
(The listing is cut off at this point in the source.)