
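[Mobile Application Development Technology] 4 C++ Boost Regular Expressions

Contents:

Offline documentation
Stripping tags from an HTML file
A regular-expression test program
Regex metacharacters
Anchors
Matching several letters followed by several digits
Marking: whatever sits inside a pair of parentheses (); in Boost, () needs no escaping
(?:...): not marked, cannot be back-referenced
Repeats [greedy: match as much as possible]
Non-greedy repeats [match as little as possible]
Possessive repeats: no going back, what is matched stays matched, for the sake of performance
Back references: a marked sub-expression must exist
Alternation
Word boundaries
Named sub-expressions
Comments
Branch reset
Lookahead
Example 1: only th is matched, not ing, but ing must be present
Example 2: ing is only looked at, th is matched, then in is matched
Example 3: th matches except when ing follows
Lookbehind
Recursive expressions
Operator precedence
Showing the number of sub-expressions
Boost regex: sub_match
Boost regex: the regex_replace algorithm
Boost regex: iterators
Boost regex: -1 selects the text that was not matched
Boost regex: captures (why does the official sample crash?)
Boost regex: official example
Boost regex: a simple lexer for C++ class definitions, regex_search version
Boost regex: a simple lexer for C++ class definitions, iterator version
Boost regex: converting a C++ file to an HTML file
Boost regex: extracting all links from a web page

Offline documentation:

boost_1_62_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Stripping tags from an HTML file:

chunli@Linux:~/workspace/Boost$ sed 's/<[\/]\?\([[:alpha:]][[:alnum:]]*[^>]*\)>//g' index.html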

A regular-expression test program:

chunli@Linux:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    if (argc != 2)
    {
        cerr << "Usage: " << argv[0] << " regex-str" << endl;
        return 1;
    }
    // Compile the pattern given on the command line, ignoring case.
    boost::regex e(argv[1], boost::regex::icase);
    // mark_count() returns the number of marked sub-expressions in the regex,
    // i.e. the parts of the pattern enclosed in parentheses.
    cout << "subexpressions: " << e.mark_count() << endl;
    string line;
    while (getline(cin, line))
    {
        boost::match_results<string::const_iterator> m;
        if (boost::regex_search(line, m, e, boost::match_default))
        {
            // m[0] is the whole match, m[1]..m[n-1] are the marked sub-expressions.
            const int n = m.size();
            for (int i = 0; i < n; ++i)
            {
                cout << m[i] << " ";
            }
            cout << endl;
        }
        else
        {
            // No match: print a row of dashes as wide as the input line.
            cout << setw(line.size()) << setfill('-') << '-' << right << endl;
        }
    }
}

Regex metacharacters:

. [ { } ( ) \ * + ? | ^ $

Anchors:

A '^' character shall match the start of a line.
A '$' character shall match the end of a line.

Matching several letters followed by several digits:

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out "\w+\d+"
subexpressions: 0
Hello,world2016
world2016

Marking: whatever sits inside a pair of parentheses () is a marked sub-expression; in Boost's Perl syntax the parentheses need no escaping.

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out "([[:alpha:]]+)[[:digit:]]+\1"
subexpressions: 1
hello123abc8888888abc
abc8888888abc abc

\1 refers back to $1. Only marked content can be back-referenced.

(?:...): the group is not marked, so it cannot be back-referenced.

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?:[[:alpha:]]+)[[:digit:]]+'
subexpressions: 0
abcd1234
abcd1234
11111@@
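The difference between a marked group and a (?:...) group can also be checked from code. The following is a minimal sketch that assumes the standard library's std::regex (ECMAScript syntax) in place of Boost, so that it builds without extra libraries; the function names marked_groups and back_reference_capture are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Number of marked (capturing) groups in a pattern.
unsigned marked_groups(const std::string& pattern)
{
    const std::regex re(pattern);
    return re.mark_count();
}

// Text captured by group 1 in the first match, or an empty string
// when the pattern does not match anywhere in `text`.
std::string back_reference_capture(const std::string& pattern,
                                   const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1)
        return m.str(1);
    return std::string();
}
```

Calling marked_groups on the two patterns used above should report one group for the parenthesised form and none for the (?:...) form, which is why only the first can be followed by \1.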

Repeats [greedy: match as much as possible]:

*      zero or more times
+      at least once
?      zero or one time
{n}    exactly n times
{n,}   n or more times
{n,m}  between n and m times

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzbbaaazzzzzzzb

Non-greedy repeats [match as little as possible]:

Non greedy repeats
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.

*?      Matches the previous atom zero or more times, while consuming as little input as possible.
+?      Matches the previous atom one or more times, while consuming as little input as possible.
??      Matches the previous atom zero or one times, while consuming as little input as possible.
{n,}?   Matches the previous atom n or more times, while consuming as little input as possible.
{n,m}?  Matches the previous atom between n and m times, while consuming as little input as possible.

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*?b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzb

Possessive repeats (no going back: what has been matched stays matched, for the sake of performance):

Possessive repeats
By default when a repeated pattern does not match then the engine will backtrack until a match is found. However, this behaviour can sometimes be undesirable, so there are also "possessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.

*+      Matches the previous atom zero or more times, while giving nothing back.
++      Matches the previous atom one or more times, while giving nothing back.
?+      Matches the previous atom zero or one times, while giving nothing back.
{n,}+   Matches the previous atom n or more times, while giving nothing back.
{n,m}+  Matches the previous atom between n and m times, while giving nothing back.
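The greedy and non-greedy repeats from the two preceding sections can be compared side by side in code. This is only a sketch: it assumes std::regex from the standard library rather than Boost, and the helper name first_match is hypothetical. Possessive repeats are left out because they are an extension of the Perl-style syntax described here.

```cpp
#include <regex>
#include <string>

// Return the first substring of `text` matched by `pattern`,
// or an empty string when there is no match.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m.str();
    return std::string();
}
```

With the input azzzzzzzzzbbaaazzzzzzzb used in the transcripts, 'a.*b' runs to the last b while 'a.*?b' stops at the first one.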

Back references: a marked sub-expression must exist.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '^(a*).*\1$'
subexpressions: 1
a66a66
a66a66
asssasss
asssasss

Alternation:

Alternation
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def".
Parentheses can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".
Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:
|abc is not a valid expression, but
(?:)|abc is and is equivalent, also the expression:
(?:abc)?? has exactly the same effect.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'l(i|o)ve'
subexpressions: 1
love
love o
live
live i
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\<l(i|o)ve\>'
subexpressions: 1
love
love o
live
live i
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'abc|123|234'
subexpressions: 0
23
--
123
123
abc
abc
234
234
123456789abc
123

Word boundaries:

Word Boundaries
The following escape sequences match the boundaries of words:

\<  Matches the start of a word.
\>  Matches the end of a word.
\b  Matches a word boundary (the start or end of a word).
\B  Matches only when not at a word boundary.

Named sub-expressions:

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\1'
subexpressions: 1
123 123
123 123 123
234 234
234 234 234
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\g{r1}'
subexpressions: 1
1234 1234
1234 1234 1234
1236 1236
1236 1236 1236

Comments:

Comments
(?# ... ) is treated as a comment, its contents are ignored.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\d+(?#my comment)'
subexpressions: 0
hello1234
1234

Branch reset:

Branch reset
(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.
The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index.
In the following example the index of each sub-expression is shown below the expression:

# before  ---------------branch-reset----------- after
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4

chunli@Linux:~/boost$ ./a.out '( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x'

subexpressions: 4

Lookahead: the characters are tested but not consumed, so they stay available for the rest of the pattern to match.

Lookahead
(?=pattern) consumes zero characters, only if pattern matches.
(?!pattern) consumes zero characters, only if pattern does not match.
Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:
(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
could be used to validate the password.

Example 1: only th is matched, ing is not part of the match, but ing must follow.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)'
subexpressions: 0
those
thing
th

Example 2: the lookahead checks for ing without consuming it, th is matched, and the following (in) then matches "in".

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)(in)'
subexpressions: 1
thing
thin in
those

Example 3: th matches everywhere except where ing follows.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?!ing)'
subexpressions: 0
this
th
thing
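The password rule quoted from the Boost documentation above can be tried out as a small validator. This sketch assumes std::regex (ECMAScript syntax, which also supports (?=...) lookahead) rather than Boost, and the function name is_valid_password is invented for the example.

```cpp
#include <regex>
#include <string>

// True when `candidate` contains a lower-case letter, an upper-case letter,
// a punctuation symbol, and is at least 6 characters long.
// Each (?=...) inspects the whole string from the start without consuming
// anything, so the three conditions are effectively AND-ed together.
bool is_valid_password(const std::string& candidate)
{
    static const std::regex rule(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, rule);
}
```

regex_match is used instead of regex_search so that the length condition applies to the whole input rather than to some substring of it.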

Lookbehind:

Lookbehind
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<=ti)mer'
subexpressions: 0
timer
mer
memer
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<!ti)mer'
subexpressions: 0
timer
hhmer
mer

Recursive expressions:

(?N) (?-N) (?+N) (?R) (?0) (?&NAME)
(?R) and (?0) recurse to the start of the entire pattern.
(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?&NAME) recurses to named sub-expression NAME.

Operator precedence:

Operator precedence
The order of precedence of operators is as follows:

Collation-related bracket symbols    [==] [::] [..]
Escaped characters                   \
Character set (bracket expression)   []
Grouping                             ()
Single-character-ERE duplication     * + ? {m,n}
Concatenation
Anchoring                            ^$
Alternation                          |

===========================================================

Boost regex API

Showing the number of sub-expressions:

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <utility>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1;
    e1 = "^[[:xdigit:]]*$";
    cout << e1.str() << endl;
    cout << e1.mark_count() << endl;

    // Unless regex::save_subexpression_location is set,
    // e2.subexpression(0) reports an error.
    regex e2("\\b\\w+(?=ing)\\b.{2,}?([[:alpha:]]*)$",
             regex::perl | regex::icase | regex::save_subexpression_location);
    cout << e2.str() << endl;
    cout << e2.mark_count() << endl;

    // Position of the first marked sub-expression inside the pattern text;
    // ++sub1.second extends the range so the closing parenthesis is printed too.
    pair<regex::const_iterator, regex::const_iterator> sub1 = e2.subexpression(0);
    string sub1Str(sub1.first, ++sub1.second);
    cout << sub1Str << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
^[[:xdigit:]]*$
0
\b\w+(?=ing)\b.{2,}?([[:alpha:]]*)$
1
([[:alpha:]]*)

Boost regex: sub_match

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    // A word starting with T, a \b word boundary, a space, then hexadecimal digits.
    regex e1("\\bT\\w+\\b ([[:xdigit:]]+)"); // doubled backslashes so the regex engine sees a backslash
    string s("Time ef09,Todo 001");
    boost::smatch m;
    //bool b = boost::regex_search(s,m,e1,boost::match_all); // match_all: only the last occurrence is matched
    bool b = boost::regex_search(s, m, e1); // by default only the first occurrence is matched
    cout << b << endl;
    const int n = m.size();
    for (int i = 0; i < n; i++)
    {
        cout << "matched:" << i << " ,position:" << m.position(i) << ", ";
        cout << "length:" << m.length(i) << " , str:" << m.str(i) << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
1
matched:0 ,position:0, length:9 , str:Time ef09
matched:1 ,position:5, length:4 , str:ef09

Boost regex: the regex_replace algorithm

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1("([TQV])|(\\*)|(@)");
    string replaceFmt("(\\L?1$&)(?2+)(?3#)"); // to lower case, to +, to #
    string src("guTdQhV@@g*b*");              // the input string
    cout << "before replaced: " << src << endl;
    //before replaced: guTdQhV@@g*b*

    string newStr1 = regex_replace(src, e1, replaceFmt,
                                   boost::match_default | boost::format_all); // format_all is required
    cout << "after replaced: " << newStr1 << endl;
    //after replaced: gutdqhv##g+b+

    // Odd result: without format_all the (?N...) conditionals are copied out literally.
    string newStr2 = regex_replace(src, e1, replaceFmt,
                                   boost::match_default | boost::format_default);
    cout << "after replaced: " << newStr2 << endl;

    // Another way: write to an output iterator.
    ostream_iterator<char> oi(cout);
    regex_replace(oi, src.begin(), src.end(), e1, replaceFmt,
                  boost::match_default | boost::match_all);
    cout << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
before replaced: guTdQhV@@g*b*
after replaced: gutdqhv##g+b+
after replaced: gu(?1t)(?2+)(?3#)d(?1q)(?2+)(?3#)h(?1v)(?2+)(?3#)(?1@)(?2+)(?3#)(?1@)(?2+)(?3#)g(?1*)(?2+)(?3#)b(?1*)(?2+)(?3#)
guTdQhV@@g*b(?1*)(?2+)(?3#)
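The conditional format string (?1...)(?2...)(?3...) is a Boost extension. For comparison, here is a sketch of the same per-alternative replacement written by hand, under the assumption that only the standard library's std::regex is available; the function name translate is hypothetical and the code is not part of the original listing.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Lower-case T, Q and V, turn '*' into '+' and '@' into '#',
// deciding per match which of the three marked groups took part.
std::string translate(const std::string& src)
{
    static const std::regex re("([TQV])|(\\*)|(@)");
    std::string out;
    std::string::const_iterator tail = src.begin();
    for (std::sregex_iterator it(src.begin(), src.end(), re), end; it != end; ++it)
    {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);       // copy the unmatched text before this match
        if (m[1].matched)                   // group 1: one of T, Q, V
            out += static_cast<char>(
                std::tolower(static_cast<unsigned char>(*m[1].first)));
        else if (m[2].matched)              // group 2: '*'
            out += '+';
        else                                // group 3: '@'
            out += '#';
        tail = m[0].second;
    }
    out.append(tail, src.end());            // copy whatever follows the last match
    return out;
}
```

Checking sub_match::matched plays the role of the ?N test in the Boost format string: exactly one of the three groups takes part in each match.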

Boost regex: iterators

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e("(a+).+?", regex::icase);
    string s("ann abb aaat");
    boost::sregex_iterator it1(s.begin(), s.end(), e);
    boost::sregex_iterator it2; // a default-constructed iterator marks the end
    for (; it1 != it2; ++it1)
    {
        boost::smatch m = *it1;
        cout << m << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
an
ab
aaat
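The same loop is often wrapped in a helper that collects every match. The version below is a sketch assuming std::sregex_iterator from the standard library, whose interface mirrors the Boost iterator used above; all_matches is a name chosen for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the text of every non-overlapping match of `re` in `text`,
// in the order in which the iterator visits them.
std::vector<std::string> all_matches(const std::string& text, const std::regex& re)
{
    std::vector<std::string> found;
    // A default-constructed iterator serves as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}
```

The regex is taken by reference because the iterator keeps a pointer to it; it has to outlive the loop.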

Boost regex: -1 selects the text that was not matched

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    string s("this is ::a string ::of tokens");
    boost::regex re("\\s+:*"); // matches the separators
    // Sub-match index -1 yields the stretches between the matches.
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;
    unsigned count = 0;
    while (i != j)
    {
        cout << *i++ << endl;
        count++;
    }
    cout << "There were " << count << " tokens found !" << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
this
is
a
string
of
tokens
There were 6 tokens found !
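Turned into a reusable function, the -1 trick gives a simple string splitter. This sketch assumes std::sregex_token_iterator from the standard library, which takes the same -1 argument as the Boost iterator above; the name split is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `text` on every match of `separator`; sub-match index -1 asks the
// token iterator for the pieces between matches instead of the matches.
std::vector<std::string> split(const std::string& text, const std::regex& separator)
{
    std::vector<std::string> tokens;
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```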

Boost regex: captures. Why does the official sample crash?

A likely cause, going by the Boost documentation for captures: BOOST_REGEX_MATCH_EXTRA has to be defined for every translation unit, including the Boost.Regex library itself. Defining it only on the g++ command line while linking against a libboost_regex that was presumably built without it can leave the program and the library disagreeing about the layout of the match objects.

pi@raspberrypi:~/boost $ cat main.cpp
#include <boost/regex.hpp>
#include <iostream>

void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    if (boost::regex_match(text, what, e, boost::match_extra))
    {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        for (i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        for (i = 0; i < what.size(); ++i)
        {
            std::cout << " $" << i << " = {";
            for (j = 0; j < what.captures(i).size(); ++j)
            {
                if (j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    }
    else
    {
        std::cout << "** No Match found **\n";
    }
}

int main(int, char*[])
{
    print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
    print_captures("a(b+|((c)*))+d", "abd");
    print_captures("(.*)bar|(.*)bah", "abcbar");
    print_captures("(.*)bar|(.*)bah", "abcbah");
    print_captures("^(?:(\\w+)|(?>\\W+))*$",
                   "now is the time for all good men to come to the aid of the party");
    print_captures("^(?>(\\w+)\\W*)*$",
                   "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$",
                   "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$",
                   "now is the time for all good men to come to the aid of the party");
    return 0;
}
pi@raspberrypi:~/boost $ g++ -D BOOST_REGEX_MATCH_EXTRA -l boost_regex -Wall main.cpp && ./a.out
Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** No Match found **
Bus error

Boost regex: official example

pi@raspberrypi:~/boost $ cat main.cpp
#include <cstdlib>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
#include <iostream>
using namespace std;
using namespace boost;

// Three groups: the digits, the separator ('-', a space, or end of input), and the rest.
regex expression("^([0-9]+)(\\-| |$)(.*)$");

int process_ftp(const char* response, std::string* msg)
{
    cmatch what;
    if (regex_match(response, what, expression))
    {
        // what[0] contains the whole string
        // what[1] contains the response code
        // what[2] contains the separator character
        // what[3] contains the text message.
        if (msg)
            msg->assign(what[3].first, what[3].second);
        return ::atoi(what[1].first);
    }
    // failure did not match
    if (msg)
        msg->erase();
    return -1;
}

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
istream& getline(istream& is, std::string& s)
{
    s.erase();
    char c = static_cast<char>(is.get());
    while (c != '\n')
    {
        s.append(1, c);
        c = static_cast<char>(is.get());
    }
    return is;
}
#endif

int main(int argc, const char*[])
{
    std::string in, out;
    do
    {
        if (argc == 1)
        {
            cout << "enter test string" << endl;
            getline(cin, in);
            if (in == "quit")
                break;
        }
        else
            in = "100 this is an ftp message text";
        int result;
        result = process_ftp(in.c_str(), &out);
        if (result != -1)
        {
            cout << "Match found:" << endl;
            cout << "Response code: " << result << endl;
            cout << "Message text: " << out << endl;
        }
        else
        {
            cout << "Match not found" << endl;
        }
        cout << endl;
    } while (argc == 1);
    return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out
enter test string
404 not found
Match found:
Response code: 404
Message text: not found

enter test string
500 service error
Match found:
Response code: 500
Message text: service error

enter test string
^C
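The parsing step of this example can be isolated into a function that returns both the code and the message. The following is a sketch, not the official sample: it assumes std::regex instead of Boost, writes the separator alternative as a plain '-', and uses the made-up name parse_reply.

```cpp
#include <regex>
#include <string>
#include <utility>

// Parse a reply line of the form "<digits><separator><text>".
// Returns {code, text}; the code is -1 and the text empty when the line
// does not have that shape. Very long digit runs would make std::stoi
// throw; that case is not handled in this sketch.
std::pair<int, std::string> parse_reply(const std::string& line)
{
    static const std::regex re("^([0-9]+)(-| |$)(.*)$");
    std::smatch what;
    if (!std::regex_match(line, what, re))
        return std::make_pair(-1, std::string());
    // what[1] is the response code, what[2] the separator, what[3] the message.
    return std::make_pair(std::stoi(what[1].str()), what[3].str());
}
```

Returning a pair avoids the optional out-parameter of process_ftp; which of the two shapes is preferable depends on the calling code.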

Boost regex: a simple lexer that finds C++ class definitions, regex_search version

pi@raspberrypi:~/boost $ cat main.cpp
#include <string>
#include <map>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);

void IndexClasses(map_type& m, const std::string& file)
{
    std::string::const_iterator start, end;
    start = file.begin();
    end = file.end();
    boost::match_results<std::string::const_iterator> what;
    boost::match_flag_type flags = boost::match_default;
    while (boost::regex_search(start, end, what, expression, flags))
    {
        // what[0] contains the whole string
        // what[5] contains the class name.
        // what[6] contains the template specialisation if any.
        // add class name and position to map:
        m[std::string(what[5].first, what[5].second)
          + std::string(what[6].first, what[6].second)]
            = what[5].first - file.begin();
        // update search position:
        start = what[0].second;
        // update flags:
        flags |= boost::match_prev_avail;
        flags |= boost::match_not_bob;
    }
}

#include <iostream>
#include <fstream>
using namespace std;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad())
        return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c))
    {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for (int i = 1; i < argc; ++i)
    {
        cout << "Processing file " << argv[i] << endl;
        map_type m;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        IndexClasses(m, text);
        cout << m.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = m.begin();
        d = m.end();
        while (c != d)
        {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
    }
    return 0;
}
pi@raspberrypi:~/boost $ cat my_class.cpp
template <class T>
struct A
{
public:
};

template <class T>
class M
{
}
;
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out my_class.cpp
Processing file my_class.cpp
2 matches found
class "A" found at index: 36
class "M" found at index: 88

Boost regex: a simple lexer that finds C++ class definitions, iterator version

pi@raspberrypi:~/boost $ cat main.cpp
#include <string>
#include <map>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
    // what[0] contains the whole string
    // what[5] contains the class name.
    // what[6] contains the template specialisation if any.
    // add class name and position to map:
    class_index[what[5].str() + what[6].str()] = what.position(5);
    return true;
}

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad())
        return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c))
    {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for (int i = 1; i < argc; ++i)
    {
        cout << "Processing file " << argv[i] << endl;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        // construct our iterators:
        boost::sregex_iterator m1(text.begin(), text.end(), expression);
        boost::sregex_iterator m2;
        std::for_each(m1, m2, &regex_callback);
        // copy results:
        cout << class_index.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = class_index.begin();
        d = class_index.end();
        while (c != d)
        {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
        class_index.erase(class_index.begin(), class_index.end());
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out main.cpp my_class.cpp
Processing file main.cpp
0 matches found
Processing file my_class.cpp
2 matches found
class "A" found at index: 23
class "B" found at index: 36

Boost regex: converting a C++ file to an HTML file

pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file and transform to
// syntax highlighted code in html format

boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad())
        return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c))
    {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    try
    {
        e1.assign(expression_text);
        e2.assign(pre_expression);
        for (int i = 1; i < argc; ++i)
        {
            std::cout << "Processing file " << argv[i] << std::endl;
            std::ifstream fs(argv[i]);
            std::string in;
            load_file(in, fs);
            fs.close();
            std::string out_name = std::string(argv[i]) + std::string(".htm");
            std::ofstream os(out_name.c_str());
            os << header_text;
            // strip '<' and '>' first by outputting to a
            // temporary string stream
            std::ostringstream t(std::ios::out | std::ios::binary);
            std::ostream_iterator<char> oi(t);
            boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format,
                                 boost::match_default | boost::format_all);
            // then output to final output stream
            // adding syntax highlighting:
            std::string s(t.str());
            std::ostream_iterator<char> out(os);
            boost::regex_replace(out, s.begin(), s.end(), e1, format_string,
                                 boost::match_default | boost::format_all);
            os << footer_text;
            os.close();
        }
    }
    catch (...)
    {
        return -1;
    }
    return 0;
}

const char* pre_expression = "(<)|(>)|(&)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)(?3&amp;)";

const char* expression_text =
    // preprocessor directives: index 1
    "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
    // comment: index 2
    "(//[^\\n]*|/\\*.*?\\*/)|"
    // literals: index 3
    "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
    // string literals: index 4
    "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
    // keywords: index 5
    "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
    "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
    "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
    "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
    "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
    "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
    "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
    "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
    "|using|virtual|void|volatile|wchar_t|while)\\>"
    ;

const char* format_string =
    "(?1<font color=\"#008040\">$&</font>)"
    // (the rest of format_string and of the listing is missing from the source)
