
Opportunistic Measurement:

Extracting Insight from Spurious Traffic

Martin Casado and Tal Garfinkel

Stanford University

Weidong Cui

UC Berkeley

Vern Paxson

ICSI

Stefan Savage

UC San Diego

Abstract—While network measurement techniques are continually improving, representative network measurements are increasingly scarce. The issue is fundamentally one of access: either the points of interest are hidden, are unwilling, or are sufficiently many that representa[…]. In particular, much of the Internet’s modern growth, in both size and complexity, is “protected” by NAT and firewall technologies that preclude the use of […]. Thus, while we can see the shrinking visible portion of the Internet with ever-greater fidelity, […]. We argue for a new approach to illuminate these hidden regions of the Internet: opportunistic measurement that leverages sources of “spurious” network traffic such as worms, misconfigurations, spam floods, and malicious […]. We identify a number of such sources and demonstrate their potential to provide measurement data at a far greater […]. Most importantly, these sources provide insight into portions of the network unseen […]. Finally, we discuss the challenges of bias and noise that accompany any use of spurious network traffic.

I. INTRODUCTION

Much of our insight into the current state of the Internet derives from empirical […]. Unfortunately, while the measurement techniques used in these studies are increasingly refined, the scope at which researchers can conduct such measurements is conversely shrinking. For example, the growth of network address translation (NAT) has hamstrung traditional active measurement efforts, which typically presuppose addressability. Thus even simple questions about edge network demographics are difficult to answer because researchers lack adequate […] (home users, small businesses) that determine the answer. Moreover, researchers are also heavily limited to using well-behaved network traffic […]. It would be unthinkable to conduct a large-scale measurement study of bisection bandwidth capacities by flooding the network from thousands of sources. Indeed, the increase in network-borne threats has fueled a backlash against even the most innocuous network probes: a ping packet to many hosts produces a nasty e-mail in addition to a round-trip time measurement.

Consequently, much active measurement research relies upon dedicated infrastructures (PlanetLab, NIMI) to provide data. However, such infrastructures are inherently limited, as the number of available sources is relatively small and homogeneous (e.g., 10s or hundreds of nodes associated with educational or research networks, often close to the core) and not representative of the larger Internet (millions of end-hosts in homes, small businesses, Internet cafes, often deep on the edge). Cooperative efforts to gain greater access to these sources [10], [2] have yet to see much adoption.

Similarly, passive measurement efforts are gated by the richness of the observer’s vantage. Researchers are consequently limited to research and educational networks, and typically to a limited set of the links in those settings. Obtaining network traces from a large and diverse demographic requires greater cooperation from carriers, which, for business reasons and privacy concerns, has generally been infeasible. Exceptions to this rule have been performed by the carriers themselves [9], [7], and even these still only cover a tiny fraction of Internet hosts and paths.

Consequently, while current measurement techniques can tell us more than ever before about the visible portions of the Internet, those portions of the Internet with little or no visibility remain far larger, with the gap between visible and dark likely widening.

Addressing these problems would seemingly require widespread deployment of measurement agents inside edge networks, generating regular test traffic of sufficient scale and diversity to drive general experimentation. While a straightforward implementation of this vision is both economically and socially infeasible, in this paper we argue that such traffic is already being generated and can be opportunistically […]. In particular, we propose exploiting the prodigious, yet underutilized, traffic generated by compromised or misconfigured hosts: worm probes, botnet scans, DDoS backscatter, spam floods […]. In preliminary experiments we show that such data provides a broad, diverse and viable substrate for a variety of network measurement activities and serendipitously side-steps many of the limitations of traditional methods.

In the remainder of this paper we present our case in more depth. In the next section we discuss how spurious traffic can provide improvements in scale and diversity over traditional methods, followed by examples of large scale events that generate such traffic in §III. In §IV we explore techniques for utilizing these traffic sources, and §V sketches some preliminary results from opportunistic measurement. We sketch the new challenges and limitations presented by this approach in §VI, and finally we conclude in §VII.

II. WHY SPURIOUS TRAFFIC?

Spurious traffic provides us with a number of unique characteristics that are attractive for network measurement.

1) Many Sources: Harnessing spurious events for measurement purposes can yield several orders of magnitude more traffic sources than are currently available from other active measurement sources such as PlanetLab [4] or neti@home [10]. Organized activity such as automated scanning and spam can use large bot networks of tens of […]. For example, we have recorded traffic from over 16,000 unique IP addresses from a single concerted scan. In another study, of a domain that receives large amounts of spam, we recorded 38,000 addresses.

Internet-scale events such as worm outbreaks can infect hundreds of thousands of hosts, creating an incredibly large and topologically diverse pool of traffic sources. For example, CAIDA recorded traffic from 359,000 sources for the first Code Red (CRv2) outbreak [17] and 160,000 sources for NIMDA [1]. While the peak number of traffic sources from initial infections is often relatively short lived, large numbers of infected machines continue to generate useful traffic, sometimes years after the initial infection.

Moreover, most of the sources we have examined are distinct between spurious traffic episodes. For example, comparing sets of addresses from three major traffic sources (a large, automated scan of ∼16,000 machines, long-lived Code Red II infections on ∼1,500 machines, and hosts sending email to a heavily spammed domain, ∼38,000 machines), we find only 24 addresses that appeared in more than one set. Thus, we can potentially combine the different types of sources to obtain even larger pools of sources, if the measurement we wish to perform is compatible with multiple source types.
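The overlap comparison described above can be illustrated with a minimal Python sketch. The function names and the sample addresses (taken from reserved documentation ranges) are hypothetical and are not the authors' tooling or data; the sketch only shows one way to count addresses that appear in more than one source set and to form the combined pool.

```python
from collections import Counter
from itertools import chain


def addresses_in_multiple_sets(*address_sets):
    """Return the addresses that occur in more than one of the given sets."""
    # Deduplicate within each set first so an address counts once per set.
    counts = Counter(chain.from_iterable(set(s) for s in address_sets))
    return {addr for addr, n in counts.items() if n > 1}


def combined_pool(*address_sets):
    """Return the union of all source sets (the combined measurement pool)."""
    return set().union(*address_sets)


# Hypothetical sample data standing in for scan, worm and spam sources.
scan = {"192.0.2.1", "192.0.2.2", "198.51.100.7"}
worm = {"192.0.2.2", "203.0.113.5"}
spam = {"203.0.113.9", "198.51.100.7", "203.0.113.10"}

shared = addresses_in_multiple_sets(scan, worm, spam)
pool = combined_pool(scan, worm, spam)
print(len(shared), len(pool))
```

When the shared set is small relative to the pool, as the text reports for the three observed source types, combining source types adds nearly the sum of their sizes.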

2) Great Diversity: Today’s organized measurement infrastructures are highly homogeneous, consisting primarily of sources from academic institutions in the US and Western Europe, interconnected with high-bandwidth, low-congestion links. For example, PlanetLab currently has 584 nodes representing 275 sites in a handful of countries.

In contrast, sources of spurious traffic are often Internet-wide and biased towards those machines that have been difficult for researchers to […] (e.g., private institutions and individuals). For example, the 38,000 machines originating spam to one of our domains have whois records with addresses from 159 countries, and bottleneck bandwidth (measured using the M&M tool suite [11]) ranging […]. In addition, a large fraction of the machines we have measured reside behind NAT boxes (see §V). These hosts and their networks would be invisible to traditional measurement techniques.

3) Social Acceptability: Historically, it has been taboo to consider measurement activities that would consume very large amounts of aggregate bandwidth: generating wide-scale high-volume network probes is at best considered anti-social and irresponsible, and at worst as no different than a hostile network attack. However, measuring preexisting sources that exhibit such behavior (such as the Slammer worm’s saturation of network access links [16]) raises no such concerns: the event has already happened due to someone else’s misbehavior, and there is no direct harm caused by exploiting this behavior.

Similarly, large-scale passive analysis of legitimate traffic raises very significant privacy concerns, and thus has been largely infeasible to date. However, spurious traffic is generally devoid of normal application content, rendering these issues moot. Furthermore, the unsolicited broadcast nature of this traffic provides some safe harbor even should the contents be sensitive or proprietary. In the vernacular, “the cat is already out of the bag”.

III. EXAMPLES OF TRAFFIC SOURCES FOR OPPORTUNISTIC MEASUREMENT

Opportunistic measurement requires discovering events (traffic sources) that satisfy several constraints. First, the traffic must meet the requirements of the analysis that we wish to perform: e.g., high-volume TCP flows for bottleneck bandwidth estimation, or long-term predictable traffic for path characterization. The event must also generate enough traffic to produce statistically meaningful results. Finally, the traffic must include destination addresses visible to the researcher. Regarding this last point, often spurious traffic events generate traffic viewable from any vantage point on the Internet. But in addition such traffic frequently has a non-uniform distribution, creating hot-spots or even attractors where the traffic concentrates.

We have observed several different classes of events exhibiting these properties:

1) Worms: Worms turn large numbers of hosts into traffic sources, and their code is directly available. These features make them ideal candidates for opportunistic measurement. Worms also provide two different modes useful for measurement. The initial flurry of traffic from a worm outbreak typically lasts for only a few hours, but results in traffic from a massive number of sources, sometimes numbering in the tens of thousands of machines. We refer to these singular events as supernovas. Taking best advantage of these spectacular events requires careful prior planning to ensure that we have the necessary passive measurement infrastructure in place to capture the occurrence.

However, infected machines can continue to scan for longer periods, sometimes even years, after the […], ongoing activity that can provide predictable long-term traffic sources we term pulsars. Indeed, worms released in 2001 [17], [1] continue to scan the Internet from thousands of infected hosts.¹
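The distinction between the two worm modes can be sketched in Python as a per-source classification by how long each source stays active. This is only an illustration under stated assumptions: the function name, the 30-day threshold and the sample sightings are hypothetical, and the paper does not give a numeric boundary between supernova-era and pulsar sources.

```python
from datetime import datetime, timedelta


def classify_sources(sightings, long_lived=timedelta(days=30)):
    """Label each source by its observed activity span.

    sightings: iterable of (source_ip, timestamp) pairs.
    long_lived: hypothetical threshold (an assumption, not from the paper).
    Returns a dict mapping source_ip to "pulsar" or "supernova".
    """
    first_seen, last_seen = {}, {}
    for ip, ts in sightings:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
        if ip not in last_seen or ts > last_seen[ip]:
            last_seen[ip] = ts
    return {
        ip: "pulsar" if last_seen[ip] - first_seen[ip] >= long_lived else "supernova"
        for ip in first_seen
    }


# Arbitrary example date and addresses, for illustration only.
start = datetime(2001, 1, 1)
sightings = [
    ("192.0.2.1", start),
    ("192.0.2.1", start + timedelta(hours=3)),
    ("192.0.2.2", start),
    ("192.0.2.2", start + timedelta(days=400)),
]
labels = classify_sources(sightings)
print(labels)
```

A source seen only during the first hours is labeled as part of the initial burst, while one still scanning more than a year later is labeled a pulsar.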

2) Automated Scans: Another significant source of traffic on the Internet is the ever-present “background radiation” of automated scans by attackers looking for vulnerable machines [20]. Malicious scans are often performed collaboratively by large collections of bots, sometimes numbering in the 10,000s of machines. Their scanning patterns vary widely, from unpredictable sharp bursts to slow linear probes lasting weeks.

Automated scans differ from worms in that generally they seek to derive more information about a host. Whereas a worm is typically interested only in finding the next target, a tool-driven scan may look for multiple vulnerabilities, specifics of a given protocol stack, or other remotely discernible network information. Consequently, scans can generate relatively large amounts of traffic to individual IP addresses (or subnets) within a short time period.

3) Spam: When present in large quantities, spam provides an interesting class of spurious traffic because it gives us access to relatively long-lived TCP flows. Passive measurement tools often require ample flows (e.g., 50 packets) for accurate analysis [11]. In addition, hosts used to source or relay spam often reside on types of computers not easily accessible to researchers.
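The flow-length requirement mentioned above can be illustrated with a short Python sketch that keeps only flows long enough for passive analysis. The packet record layout (dictionaries with src, sport, dst, dport keys) and the function name are assumptions made for this example; the 50-packet default mirrors the figure quoted in the text.

```python
from collections import Counter


def flows_with_enough_packets(packets, min_packets=50):
    """Count packets per flow and keep flows with at least min_packets.

    packets: iterable of dicts with hypothetical keys src, sport, dst, dport.
    Returns a dict mapping (src, sport, dst, dport) to its packet count.
    """
    counts = Counter(
        (p["src"], p["sport"], p["dst"], p["dport"]) for p in packets
    )
    return {flow: n for flow, n in counts.items() if n >= min_packets}


# Hypothetical trace: one long SMTP flow and one short one.
long_flow = {"src": "192.0.2.10", "sport": 40000, "dst": "198.51.100.25", "dport": 25}
short_flow = {"src": "192.0.2.11", "sport": 40001, "dst": "198.51.100.25", "dport": 25}
trace = [long_flow] * 60 + [short_flow] * 10

usable = flows_with_enough_packets(trace)
print(usable)
```

Only the 60-packet flow survives the filter here, which is the kind of relatively long-lived flow that the text says spam tends to supply.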

¹ Strictly speaking, our data shows that the original has died off as programmed on October 1st of each year, but new variants of the worm are released, including CodeRed.F [6], with the die-off date altered to give extended life.