[Mobile Application Development Technology] 4 C++ Boost Regular Expressions

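Boost regular expressions, contents:

Offline documentation
Stripping tags from an HTML file
A regex test program
Regex metacharacters
Anchors
Matching several letters followed by several digits
Marking: the content of a pair of parentheses (); in Boost the parentheses need no escaping
?: not marked, cannot be back-referenced
Repeats [greedy: match as much as possible]
? non-greedy repeats [match as little as possible]
Possessive ("stream") repeats: no backtracking, what is matched stays matched; meant for performance
Back references: a marked expression must exist
Alternation
Word boundaries
Named expressions
Comments
Branch reset
Lookahead
Example 1: only th is matched, not ing, but ing must be present
Example 2: ing is checked without being consumed, so in can still be matched after th
Example 3: th matches everywhere except in front of ing
Lookbehind
Recursive expressions
Operator precedence
Showing the number of sub-expressions
Boost regex: sub_match
Boost regex: the regex_replace algorithm
Boost regex: iterators
Boost regex: -1 selects the text that was not matched
Boost regex: captures (why does the official code hit a segmentation fault?)
Boost regex: the official example
Boost regex: a simple lexer that indexes C++ class definitions, regex_search version
Boost regex: the same lexer, iterator version
Boost regex: converting a C++ file to HTML
Boost regex: collecting every link in a web page

Offline documentation:
boost_1_62_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Stripping tags from an HTML file:
chunli@Linux:~/workspace/Boost$ sed 's/<[\/]\?\([[:alpha:]][[:alnum:]]*[^>]*\)>//g' index.html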

A regex test program:

chunli@Linux:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    if (argc != 2)
    {
        cerr << "Usage: " << argv[0] << " regex-str" << endl;
        return 1;
    }
    boost::regex e(argv[1], boost::regex::icase);
    // mark_count() returns the number of marked sub-expressions in the regex,
    // i.e. the parts of the expression enclosed in parentheses.
    cout << "subexpressions: " << e.mark_count() << endl;
    string line;
    while (getline(cin, line))
    {
        boost::match_results<string::const_iterator> m;
        if (boost::regex_search(line, m, e, boost::match_default))
        {
            // Print the whole match followed by each sub-match.
            const int n = m.size();
            for (int i = 0; i < n; ++i)
            {
                cout << m[i] << " ";
            }
            cout << endl;
        }
        else
        {
            // No match: print a row of '-' as wide as the input line.
            cout << setw(line.size()) << setfill('-') << '-' << right << endl;
        }
    }
}

Regex metacharacters:
.[{}()\*+?|^$

Anchors:
A '^' character shall match the start of a line.
A '$' character shall match the end of a line.

Matching several letters followed by several digits:
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out "\w+\d+"
subexpressions: 0
Hello,world2016
world2016
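The core of the test program (compile the pattern, call regex_search, then read the sub-matches m[0] to m[n-1]) can be sketched as a small function. This is a minimal sketch that assumes the standard library's std::regex in place of boost::regex so that it builds without linking Boost; the helper name search_groups is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the whole match followed by every sub-match,
// or an empty vector when the pattern is not found anywhere in the line.
std::vector<std::string> search_groups(const std::string& pattern,
                                       const std::string& line) {
    // Assumption: std::regex stands in for boost::regex (case-insensitive,
    // like the boost::regex::icase flag used by the test program).
    const std::regex re(pattern, std::regex::icase);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(line, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Calling search_groups("\\w+\\d+", "Hello,world2016") should give the single element world2016, the same text the session above prints.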

Marking: whatever sits inside a pair of parentheses () is marked; in Boost the parentheses do not need to be escaped.
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out "([[:alpha:]]+)[[:digit:]]+\1"
subexpressions: 1
hello123abc8888888abc
abc8888888abc abc

\1 refers back to $1. Only marked content can be back-referenced.

?: the group is not marked and cannot be back-referenced:
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?:[[:alpha:]]+)[[:digit:]]+'
subexpressions: 0
abcd1234
abcd1234
11111@@
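The difference between a marked group, a back reference to it, and an unmarked (?:...) group can be checked in code as well. A minimal sketch, assuming std::regex in place of boost::regex (both provide mark_count()); the helper names first_group and group_count are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text captured by group 1 of the first match,
// or an empty string when there is no match or the pattern has no group 1.
std::string first_group(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);  // assumption: std::regex replaces boost::regex
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return "";
}

// Hypothetical helper: number of marked (capturing) groups in a pattern.
unsigned group_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

With the patterns from the sessions above, the capturing form reports one group and captures abc, while the (?:...) form reports zero groups and leaves nothing to refer back to.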

Repeats [greedy: match as much as possible]:
*      any number of times (zero or more)
+      at least once
?      zero or one time
{n}    exactly n times
{n,}   n or more times
{n,m}  between n and m times

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzbbaaazzzzzzzb

? non-greedy repeats [match as little as possible]:
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.
*?      Matches the previous atom zero or more times, while consuming as little input as possible.
+?      Matches the previous atom one or more times, while consuming as little input as possible.
??      Matches the previous atom zero or one times, while consuming as little input as possible.
{n,}?   Matches the previous atom n or more times, while consuming as little input as possible.
{n,m}?  Matches the previous atom between n and m times, while consuming as little input as possible.

chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*?b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzb

Possessive ("stream") repeats: the engine never goes back, what is matched stays matched; meant for performance:
By default when a repeated pattern does not match then the engine will backtrack until a match is found. However, this behaviour can sometimes be undesirable, so there are also "possessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.
*+      Matches the previous atom zero or more times, while giving nothing back.
++      Matches the previous atom one or more times, while giving nothing back.
?+      Matches the previous atom zero or one times, while giving nothing back.
{n,}+   Matches the previous atom n or more times, while giving nothing back.
{n,m}+  Matches the previous atom between n and m times, while giving nothing back.
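The greedy and non-greedy repeats above can be compared side by side on the same input. A minimal sketch, assuming std::regex in place of boost::regex; possessive repeats such as *+ are not part of the std::regex grammar, so they are left out here. The helper name first_match is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of the first match, or "" when nothing matches.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);  // assumption: std::regex replaces boost::regex
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

On azzzzzzzzzbbaaazzzzzzzb the pattern a.*b runs to the last b, whereas a.*?b stops at the first one, matching the two sessions above.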

Back references: a marked expression must exist
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '^(a*).*\1$'
subexpressions: 1
a66a66
a66a66
asssasss
asssasss

Alternation:
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def".
Parentheses can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".
Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:
|abc is not a valid expression, but
(?:)|abc is and is equivalent, also the expression:
(?:abc)?? has exactly the same effect.

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'l(i|o)ve'
subexpressions: 1
love
love o
live
live i
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\<l(i|o)ve\>'
subexpressions: 1
love
love o
live
live i
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'abc|123|234'
subexpressions: 0
23
--
123
123
abc
abc
234
234
123456789abc
123

Word boundaries:
The following escape sequences match the boundaries of words:
\<  Matches the start of a word.
\>  Matches the end of a word.
\b  Matches a word boundary (the start or end of a word).
\B  Matches only when not at a word boundary.

Named expressions:
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\1'
subexpressions: 1
123 123
123 123 123
234 234
234 234 234
^C
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\g{r1}'
subexpressions: 1
1234 1234
1234 1234 1234
1236 1236
1236 1236 1236

Comments:
(?# ... ) is treated as a comment, its contents are ignored.
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '\d+(?#my comment)'
subexpressions: 0
hello1234
1234

Branch reset:
(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.
The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index.
In the following example the index of each sub-expression is shown below the expression:
# before          branch-reset                   after
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4
chunli@Linux:~/boost$ ./a.out '( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x'
subexpressions: 4

Lookahead: the characters are tested but not consumed, so the rest of the expression can still match them.
(?=pattern) consumes zero characters, only if pattern matches.
(?!pattern) consumes zero characters, only if pattern does not match.
Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:
(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
could be used to validate the password.

Example 1: only th is matched, not ing, but ing must be present
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)'
subexpressions: 0
those
thing
th

Example 2: ing is checked by the lookahead without being consumed, so in can still be matched after th
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?=ing)(in)'
subexpressions: 1
thing
thin in
those

Example 3: th matches everywhere except in front of ing
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out 'th(?!ing)'
subexpressions: 0
this
th
thing
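The password rule quoted in the lookahead section (lower case, upper case, punctuation, at least 6 characters) can be tried directly. A minimal sketch, assuming std::regex in place of boost::regex (its default grammar also supports (?=...) lookahead and the POSIX character classes); the function name is_valid_password is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the candidate satisfies all three lookahead
// conditions and is at least six characters long.
bool is_valid_password(const std::string& candidate) {
    // Each (?=...) is tested at the start of the string and consumes nothing,
    // so the three conditions are ANDed before .{6,} consumes the text.
    static const std::regex re(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return std::regex_match(candidate, re);
}
```

Because the lookaheads consume nothing, their order does not matter; only the final .{6,} decides how much text the match covers.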

Lookbehind:
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).

chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<=ti)mer'
subexpressions: 0
timer
mer
memer
chunli@Linux:~/boost$ g++ main.cpp -lboost_regex -Wall && ./a.out '(?<!ti)mer'
subexpressions: 0
timer
hhmer
mer

Recursive expressions:
(?N) (?-N) (?+N) (?R) (?0) (?&NAME)
(?R) and (?0) recurse to the start of the entire pattern.
(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?&NAME) recurses to named sub-expression NAME.

Operator precedence:
The order of precedence of operators is as follows:
1. Collation-related bracket symbols [==] [::] [..]
2. Escaped characters \
3. Character set (bracket expression) []
4. Grouping ()
5. Single-character-ERE duplication * + ? {m,n}
6. Concatenation
7. Anchoring ^$
8. Alternation |

===========================================================
Boost regex API

Showing the number of sub-expressions:
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1;
    e1 = "^[[:xdigit:]]*$";
    cout << e1.str() << endl;
    cout << e1.mark_count() << endl;

    // Unless regex::save_subexpression_location is set, e2.subexpression(0) reports an error.
    regex e2("\\b\\w+(?=ing)\\b.{2,}?([[:alpha:]]*)$",
             regex::perl | regex::icase | regex::save_subexpression_location);
    cout << e2.str() << endl;
    cout << e2.mark_count() << endl;

    // subexpression(0) returns the location of the first marked group inside the pattern text.
    pair<regex::const_iterator, regex::const_iterator> sub1 = e2.subexpression(0);
    string sub1Str(sub1.first, ++sub1.second);
    cout << sub1Str << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
^[[:xdigit:]]*$
0
\b\w+(?=ing)\b.{2,}?([[:alpha:]]*)$
1
([[:alpha:]]*)

Boost regex: sub_match
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    // A word starting with T followed by more word characters, a \b boundary,
    // a space, then a group of hexadecimal digits.
    regex e1("\\bT\\w+\\b ([[:xdigit:]]+)"); // doubled backslashes so the regex sees a single backslash
    string s("Time ef09,Todo 001");
    boost::smatch m;
    // With match_all only the last occurrence matched here (the match has to run to the end of the input):
    //bool b = boost::regex_search(s,m,e1,boost::match_all);
    bool b = boost::regex_search(s, m, e1); // by default only the first occurrence is found
    cout << b << endl;
    const int n = m.size();
    for (int i = 0; i < n; i++)
    {
        cout << "matched:" << i << " ,position:" << m.position(i) << ", ";
        cout << "length:" << m.length(i) << " , str:" << m.str(i) << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
1
matched:0 ,position:0, length:9 , str:Time ef09
matched:1 ,position:5, length:4 , str:ef09
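The position, length and text reported for each sub-match can be collected into a small record. A minimal sketch, assuming std::regex in place of boost::regex (std::smatch offers the same position(), length() and str() members); GroupInfo and describe_first_match are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record describing one sub-match of the first match.
struct GroupInfo {
    std::ptrdiff_t position;  // offset of the group from the start of the text
    std::ptrdiff_t length;    // number of characters in the group
    std::string text;         // the matched characters
};

// Hypothetical helper: one entry per sub-match (index 0 is the whole match),
// or an empty vector when the pattern is not found.
std::vector<GroupInfo> describe_first_match(const std::string& pattern,
                                            const std::string& text) {
    const std::regex re(pattern);  // assumption: std::regex replaces boost::regex
    std::smatch m;
    std::vector<GroupInfo> out;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            out.push_back({m.position(i), m.length(i), m.str(i)});
        }
    }
    return out;
}
```

For the pattern and input used above, entry 0 should describe Time ef09 at position 0 and entry 1 should describe ef09 at position 5.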

Boost regex: the regex_replace algorithm
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1("([TQV])|(\\*)|(@)");
    string replaceFmt("(\\L?1$&)(?2+)(?3#)"); // lowercase T/Q/V, turn '*' into '+', turn '@' into '#'
    string src("guTdQhV@@g*b*"); // the input string
    cout << "before replaced: " << src << endl;
    // before replaced: guTdQhV@@g*b*

    // format_all is required for the (?N...) conditionals to be interpreted
    string newStr1 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_all);
    cout << "after replaced: " << newStr1 << endl;
    // after replaced: gutdqhv##g+b+

    // Strange result: with format_default the conditionals are copied into the output literally
    string newStr2 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_default);
    cout << "after replaced: " << newStr2 << endl;

    // Another way: write the result through an output iterator
    ostream_iterator<char> oi(cout);
    regex_replace(oi, src.begin(), src.end(), e1, replaceFmt, boost::match_default | boost::match_all);
    cout << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
before replaced: guTdQhV@@g*b*
after replaced: gutdqhv##g+b+
after replaced: gu(?1t)(?2+)(?3#)d(?1q)(?2+)(?3#)h(?1v)(?2+)(?3#)(?1@)(?2+)(?3#)(?1@)(?2+)(?3#)g(?1*)(?2+)(?3#)b(?1*)(?2+)(?3#)
guTdQhV@@g*b(?1*)(?2+)(?3#)
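What the conditional format string does with format_all can be spelled out by hand: for every match, check which group took part and emit the corresponding replacement. A minimal sketch, assuming std::regex in place of boost::regex; std::regex_replace has no (?N...) conditionals, so the branching is written out explicitly. The function name transform is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: lowercases T, Q and V, turns '*' into '+' and '@' into '#'.
std::string transform(const std::string& src) {
    static const std::regex re("([TQV])|(\\*)|(@)");
    std::string out;
    auto last = src.begin();  // end of the previous match
    for (std::sregex_iterator it(src.begin(), src.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the unmatched text before this match
        if (m[1].matched) {            // group 1: one of T, Q, V
            out += static_cast<char>(
                std::tolower(static_cast<unsigned char>(*m[1].first)));
        } else if (m[2].matched) {     // group 2: '*'
            out += '+';
        } else {                       // group 3: '@'
            out += '#';
        }
        last = m[0].second;
    }
    out.append(last, src.end());       // copy the tail after the final match
    return out;
}
```

Applied to guTdQhV@@g*b*, this should reproduce the gutdqhv##g+b+ line from the format_all run above.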

Boost regex: iterators
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e("(a+).+?", regex::icase);
    string s("ann abb aaat");
    boost::sregex_iterator it1(s.begin(), s.end(), e);
    boost::sregex_iterator it2; // a default-constructed iterator marks the end of the sequence
    for (; it1 != it2; ++it1)
    {
        boost::smatch m = *it1;
        cout << m << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
an
ab
aaat
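The iterator loop can be wrapped so that it returns every match instead of printing it. A minimal sketch, assuming std::sregex_iterator in place of boost::sregex_iterator (the interface used here is the same); the helper name all_matches is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the text of every non-overlapping match, in order.
std::vector<std::string> all_matches(const std::string& pattern,
                                     const std::string& text) {
    const std::regex re(pattern, std::regex::icase);
    std::vector<std::string> found;
    // A default-constructed iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

Each step resumes the search where the previous match ended, which is why (a+).+? yields an, ab and aaat on the input above.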

Boost regex: -1 selects the text that was not matched
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    string s("this is ::a string ::of tokens");
    boost::regex re("\\s+:*"); // matches the separators
    // Sub-match index -1 asks for the text between the matches.
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;
    unsigned count = 0;
    while (i != j)
    {
        cout << *i++ << endl;
        count++;
    }
    cout << "There were " << count << " tokens found !" << endl;
    return 0;
}
pi@raspberrypi:~/boost $ g++ main.cpp -lboost_regex -Wall && ./a.out
this
is
a
string
of
tokens
There were 6 tokens found !
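Using -1 with a token iterator amounts to splitting a string on a separator pattern. A minimal sketch, assuming std::sregex_token_iterator in place of the Boost one (the -1 convention is the same); the helper name split is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the pieces of text that lie between separator matches.
std::vector<std::string> split(const std::string& text,
                               const std::string& separator_pattern) {
    const std::regex re(separator_pattern);
    std::vector<std::string> tokens;
    // Sub-match index -1 selects the unmatched stretches instead of the matches.
    for (std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Passing 0 instead of -1 would return the separators themselves, and a positive index would return that sub-match of each separator.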

Boost regex: captures. Why does the official code crash?
A likely cause, going by the Boost.Regex documentation on repeated captures: BOOST_REGEX_MATCH_EXTRA has to be defined for every translation unit, including the library's own sources. Defining it only on the command line below, while linking a stock libboost_regex, leaves the program and the library disagreeing about the layout of the match objects.
pi@raspberrypi:~/boost $ cat main.cpp
#include <boost/regex.hpp>
#include <iostream>

void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    if(boost::regex_match(text, what, e, boost::match_extra))
    {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        for(i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        for(i = 0; i < what.size(); ++i)
        {
            std::cout << " $" << i << " = {";
            for(j = 0; j < what.captures(i).size(); ++j)
            {
                if(j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    }
    else
    {
        std::cout << "** No Match found **\n";
    }
}

int main(int , char* [])
{
    print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
    print_captures("a(b+|((c)*))+d", "abd");
    print_captures("(.*)bar|(.*)bah", "abcbar");
    print_captures("(.*)bar|(.*)bah", "abcbah");
    print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(?>(\\w+)\\W*)*$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$", "now is the time for all good men to come to the aid of the party");
    return 0;
}
pi@raspberrypi:~/boost $ g++ -D BOOST_REGEX_MATCH_EXTRA -l boost_regex -Wall main.cpp && ./a.out
Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** No Match found **
Bus error
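The reason captures() exists at all is that, by default, a group placed inside a repeat reports only its final capture. A minimal sketch of that default behaviour, assuming std::regex in place of boost::regex (std::regex has no captures() member, so only the last capture can be shown); the function name last_word_capture is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns what group 1 holds after the whole text has
// been matched as a sequence of words, i.e. the last word that was captured.
std::string last_word_capture(const std::string& text) {
    // Group 1 is re-captured on every pass through the repeat; earlier
    // captures are overwritten and cannot be recovered afterwards.
    static const std::regex re("^(?:(\\w+)\\W*)*$");
    std::smatch m;
    if (std::regex_match(text, m, re)) {
        return m[1].str();
    }
    return "";
}
```

For "now is the time" the function should return time; the earlier words now, is and the are the history that what.captures(1) is designed to expose when the library is built with match_extra support.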

Boost regex: the official example
pi@raspberrypi:~/boost $ cat main.cpp
#include <cstdlib>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace std;
using namespace boost;

// Three parts: the digits 0-9, a separator ('-', a space, or end of line), then the remaining text
regex expression("^([0-9]+)(\\-| |$)(.*)$");

int process_ftp(const char* response, std::string* msg)
{
    cmatch what;
    if(regex_match(response, what, expression))
    {
        // what[0] contains the whole string
        // what[1] contains the response code
        // what[2] contains the separator character
        // what[3] contains the text message.
        if(msg)
            msg->assign(what[3].first, what[3].second);
        return ::atoi(what[1].first);
    }
    // failure did not match
    if(msg)
        msg->erase();
    return -1;
}

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
istream& getline(istream& is, std::string& s)
{
    s.erase();
    char c = static_cast<char>(is.get());
    while(c != '\n')
    {
        s.append(1, c);
        c = static_cast<char>(is.get());
    }
    return is;
}
#endif

int main(int argc, const char*[])
{
    std::string in, out;
    do
    {
        if(argc == 1)
        {
            cout << "enter test string" << endl;
            getline(cin, in);
            if(in == "quit")
                break;
        }
        else
            in = "100 this is an ftp message text";
        int result;
        result = process_ftp(in.c_str(), &out);
        if(result != -1)
        {
            cout << "Match found:" << endl;
            cout << "Response code: " << result << endl;
            cout << "Message text: " << out << endl;
        }
        else
        {
            cout << "Match not found" << endl;
        }
        cout << endl;
    } while(argc == 1);
    return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out
enter test string
404 not found
Match found:
Response code: 404
Message text: not found

enter test string
500 service error
Match found:
Response code: 500
Message text: service error

enter test string
^C
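The parsing step of the example (response code, separator, message) can be isolated into a function that is easy to test. A minimal sketch, assuming std::regex in place of boost::regex; the backslash before '-' is dropped because it is not needed outside a bracket expression, and the name parse_ftp_response is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the numeric response code and stores the
// message text in *msg (when msg is not null); returns -1 when the response
// does not have the expected shape.
int parse_ftp_response(const std::string& response, std::string* msg) {
    // Group 1: response code, group 2: separator, group 3: message text.
    static const std::regex re("^([0-9]+)(-| |$)(.*)$");
    std::smatch what;
    if (std::regex_match(response, what, re)) {
        if (msg) {
            *msg = what[3].str();
        }
        // std::stoi throws std::out_of_range for codes that do not fit in an int.
        return std::stoi(what[1].str());
    }
    if (msg) {
        msg->clear();
    }
    return -1;
}
```

Taking a std::string rather than a const char* lets the sketch use std::smatch; the original uses cmatch because it matches directly against a C string.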

Boost regex, regex_search version: a simple lexer that indexes C++ class definitions
pi@raspberrypi:~/boost $ cat main.cpp
#include <string>
#include <map>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);

void IndexClasses(map_type& m, const std::string& file)
{
    std::string::const_iterator start, end;
    start = file.begin();
    end = file.end();
    boost::match_results<std::string::const_iterator> what;
    boost::match_flag_type flags = boost::match_default;
    while(boost::regex_search(start, end, what, expression, flags))
    {
        // what[0] contains the whole string
        // what[5] contains the class name.
        // what[6] contains the template specialisation if any.
        // add class name and position to map:
        m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
            what[5].first - file.begin();
        // update search position:
        start = what[0].second;
        // update flags:
        flags |= boost::match_prev_avail;
        flags |= boost::match_not_bob;
    }
}

#include <iostream>
#include <fstream>

using namespace std;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if(is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while(is.get(c))
    {
        if(s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for(int i = 1; i < argc; ++i)
    {
        cout << "Processing file " << argv[i] << endl;
        map_type m;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        IndexClasses(m, text);
        cout << m.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = m.begin();
        d = m.end();
        while(c != d)
        {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
    }
    return 0;
}
pi@raspberrypi:~/boost $ cat my_class.cpp
template <class T>
struct A
{
public:
};
template <class T>
class M
{
}
;
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out my_class.cpp
Processing file my_class.cpp
2 matches found
class "A" found at index: 36
class "M" found at index: 88
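The search loop in IndexClasses follows a general pattern: search, record, move the start position past the match, and tell the engine that text now precedes the start. A minimal sketch of that loop, assuming std::regex in place of boost::regex and a deliberately simplified pattern (std::regex has no \< and \> word anchors, and templates and declspec macros are ignored); the name index_classes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: maps each class or struct name to the offset of that
// name within the text.
std::map<std::string, std::ptrdiff_t> index_classes(const std::string& text) {
    // Simplified pattern (assumption): keyword, name, then '{' or a base list ending in '{'.
    static const std::regex re("\\b(class|struct)\\s+(\\w+)\\s*(\\{|:[^;{()]*\\{)");
    std::map<std::string, std::ptrdiff_t> index;
    auto start = text.cbegin();
    std::smatch what;
    auto flags = std::regex_constants::match_default;
    while (std::regex_search(start, text.cend(), what, re, flags)) {
        // what[2] is the class name; record its offset from the start of the text.
        index[what[2].str()] = what[2].first - text.cbegin();
        // Resume after the current match.
        start = what[0].second;
        // From now on there is valid text before 'start', so \b may look at it.
        flags |= std::regex_constants::match_prev_avail;
    }
    return index;
}
```

The iterator-based version of the lexer does the same bookkeeping internally, which is why it needs no explicit start position or flag updates.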

Boost regex, iterator version: a simple lexer that indexes C++ class definitions
pi@raspberrypi:~/boost $ cat main.cpp
#include <algorithm>
#include <string>
#include <map>
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>

using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
    // what[0] contains the whole string
    // what[5] contains the class name.
    // what[6] contains the template specialisation if any.
    // add class name and position to map:
    class_index[what[5].str() + what[6].str()] = what.position(5);
    return true;
}

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if(is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while(is.get(c))
    {
        if(s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for(int i = 1; i < argc; ++i)
    {
        cout << "Processing file " << argv[i] << endl;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        // construct our iterators:
        boost::sregex_iterator m1(text.begin(), text.end(), expression);
        boost::sregex_iterator m2;
        std::for_each(m1, m2, &regex_callback);
        // copy results:
        cout << class_index.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = class_index.begin();
        d = class_index.end();
        while(c != d)
        {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
        class_index.erase(class_index.begin(), class_index.end());
    }
    return 0;
}
pi@raspberrypi:~/boost $ g++ -l boost_regex -Wall main.cpp && ./a.out main.cpp my_class.cpp
Processing file main.cpp
0 matches found
Processing file my_class.cpp
2 matches found
class "A" found at index: 23
class "B" found at index: 36

Boost regex: converting a C++ file to HTML
pi@raspberrypi:~/boost $ cat main.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file and transform to
// syntax highlighted code in html format

boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if(is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while(is.get(c))
    {
        if(s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    try{
        e1.assign(expression_text);
        e2.assign(pre_expression);
        for(int i = 1; i < argc; ++i)
        {
            std::cout << "Processing file " << argv[i] << std::endl;
            std::ifstream fs(argv[i]);
            std::string in;
            load_file(in, fs);
            fs.close();
            std::string out_name = std::string(argv[i]) + std::string(".htm");
            std::ofstream os(out_name.c_str());
            os << header_text;
            // strip '<' and '>' first by outputting to a
            // temporary string stream
            std::ostringstream t(std::ios::out | std::ios::binary);
            std::ostream_iterator<char> oi(t);
            boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
            // then output to final output stream
            // adding syntax highlighting:
            std::string s(t.str());
            std::ostream_iterator<char> out(os);
            boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
            os << footer_text;
            os.close();
        }
    }
    catch(...)
    {
        return -1;
    }
    return 0;
}

const char* pre_expression = "(<)|(>)|(&)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)(?3&amp;)";

const char* expression_text =
    // preprocessor directives: index 1
    "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
    // comment: index 2
    "(//[^\\n]*|/\\*.*?\\*/)|"
    // literals: index 3
    "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
    // string literals: index 4
    "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
    // keywords: index 5
    "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
    "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
    "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
    "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
    "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
    "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
    "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
    "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
    "|using|virtual|void|volatile|wchar_t|while)\\>"
    ;

const char* format_string =
    "(?1<font color=\"#008040\">$&</font>)"
