Opportunistic Measurement:
Extracting Insight from Spurious Traffic
Martin Casado and Tal Garfinkel
Stanford University
Weidong Cui
UC Berkeley
Vern Paxson
ICSI
Stefan Savage
UC San Diego
Abstract—While network measurement techniques are continually improving, representative network measurements are increasingly scarce. The issue is fundamentally one of access: either the points of interest are hidden, are unwilling, or are sufficiently many that representative [...]. In particular, much of the Internet’s modern growth, in both size and complexity, is “protected” by NAT and firewall technologies that preclude the use of [...]. Thus, while we can see the shrinking visible portion of the Internet with ever-greater fidelity, [...]. We argue for a new approach to illuminate these hidden regions of the Internet: opportunistic measurement that leverages sources of “spurious” network traffic such as worms, misconfigurations, spam floods, and malicious [...]. We identify a number of such sources and demonstrate their potential to provide measurement data at a far greater [...]. Importantly, these sources provide insight into portions of the network unseen [...]. Finally, we discuss the challenges of bias and noise that accompany any use of spurious network traffic.
I. INTRODUCTION
Much of our insight into the current state of the Internet derives from empirical measurement. Unfortunately, while the measurement techniques used in these studies are increasingly refined, the scope at which researchers can conduct such measurements is conversely shrinking. For example, the growth of network address translation (NAT) has hamstrung traditional active measurement efforts, which presuppose addressability. Thus even simple questions about edge network demographics are difficult to answer, because researchers typically lack adequate access to the networks (home users, small businesses) that heavily determine the answer.

Moreover, researchers are also limited to using well-behaved network traffic sources. It would be unthinkable to conduct a study of bisection bandwidth capacities by large-scale flooding of the network from thousands of hosts. Indeed, the increase in network-borne threats has fueled a backlash against even the most innocuous network probes: a ping packet to many hosts produces a nasty e-mail in addition to the round-trip time measurement.

Consequently, much active measurement research relies upon dedicated infrastructures (e.g., PlanetLab, NIMI) to provide data. However, such infrastructures are inherently limited, as the number of available sources is relatively small (10s or hundreds of nodes), often homogeneous (nodes associated with educational or research networks, often deep in the Internet close to the core), and not representative of the larger Internet (millions of end-hosts on the edge, in homes, small businesses, and Internet cafes). Cooperative efforts to gain greater access to these sources [10], [2] have yet to see much adoption.

Similarly, passive measurement efforts are gated by the richness of the observer’s vantage point. Consequently, researchers are confined to a limited set of vantage points, typically the links of educational and research networks. Obtaining network traces from a large and diverse demographic requires greater cooperation from carriers, which, for business reasons and privacy concerns, has generally been infeasible. Exceptions to this rule have been performed by the carriers themselves [9], [7], and even these still only cover a tiny fraction of Internet hosts and paths.

Consequently, while measurement techniques can tell us more than ever before about the visible portions of the current Internet, those portions of the Internet with little or no visibility remain far larger, with the gap between visible and dark likely widening.

Addressing these problems would seemingly require wide-spread deployment of measurement agents inside edge networks, generating regular test traffic of sufficient scale and diversity to drive general experimentation. While a straightforward implementation of this vision is both economically and socially infeasible, in this paper we argue that such traffic is already being generated and can be opportunistically harnessed. In particular, we propose exploiting the prodigious, yet underutilized, traffic generated by compromised or misconfigured hosts: worm probes, botnet scans, DDoS backscatter, and spam floods. In preliminary experiments, we show that such data provides a broad, diverse, and viable substrate for a variety of network measurement activities, and serendipitously side-steps many of the limitations of traditional methods.

In the remainder of this paper we present our case in more depth. In the next section we discuss how spurious traffic can provide improvements in scale and diversity over traditional methods, followed by examples of large-scale traffic sources that generate such events in §III. In §IV we explore techniques for utilizing these events, and §V sketches some preliminary results from opportunistic measurement. We sketch the challenges and limitations presented by this new approach in §VI, and finally we conclude in §VII.
II. WHY SPURIOUS TRAFFIC?
Spurious traffic provides us with a number of unique characteristics that are attractive for network measurement.
1) Many Sources: Harnessing spurious events for measurement purposes can yield several orders of magnitude more traffic sources than are currently available from other active measurement sources such as PlanetLab [4] or neti@home [10]. Organized activity such as automated scanning and spam can use large bot networks of tens of thousands of machines. For example, we have recorded traffic from over 16,000 unique IP addresses from a single concerted scan. In another study of a domain that receives large amounts of spam, we recorded 38,000 addresses. Internet-scale events such as worm outbreaks can infect hundreds of thousands of hosts, creating an incredibly large and topologically diverse pool of traffic sources. For example, CAIDA recorded traffic from 359,000 sources for the first Code Red (CRv2) outbreak [17] and 160,000 sources for NIMDA [1]. While the peak number of traffic sources from initial infections is often relatively short-lived, large numbers of infected machines continue to generate useful traffic, sometimes years after the initial infection.

Moreover, most of the sources we have examined are distinct between automated spurious traffic episodes. For example, comparing the sets of addresses from three major traffic sources, namely a large, long-lived scan (∼16,000 machines), Code Red II infections (∼1,500 machines), and hosts sending email to a heavily spammed domain (∼38,000 machines), we find only 24 addresses that appeared in more than one set. Thus, we can potentially combine the different types of sources to obtain even larger pools of sources, if the measurement we wish to perform is compatible with multiple source types.
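The overlap comparison above reduces to set arithmetic over source addresses. The following Python sketch assumes each traffic source has already been reduced to a list of source IP address strings; the function names and the sample addresses (taken from the RFC 5737 documentation ranges) are hypothetical and are not the paper's data or tooling. It counts addresses that appear in more than one source type and forms the combined pool of distinct sources.

```python
from collections import Counter
from ipaddress import ip_address


def shared_addresses(source_sets):
    """Addresses that appear in more than one source set.

    source_sets maps a label (e.g. "scan", "codered2", "spam") to an
    iterable of IP address strings observed for that traffic source.
    """
    counts = Counter()
    for addrs in source_sets.values():
        # Deduplicate within a set so each address counts once per source type.
        counts.update({ip_address(a) for a in addrs})
    return {addr for addr, n in counts.items() if n > 1}


def combined_pool(source_sets):
    """Union of all source sets: the enlarged pool of distinct sources."""
    pool = set()
    for addrs in source_sets.values():
        pool.update(ip_address(a) for a in addrs)
    return pool


# Hypothetical example data, not measurements from the paper.
example_sources = {
    "scan": ["192.0.2.1", "192.0.2.2"],
    "codered2": ["198.51.100.7", "192.0.2.2"],
    "spam": ["203.0.113.9"],
}

overlap = shared_addresses(example_sources)  # {IPv4Address('192.0.2.2')}
pool = combined_pool(example_sources)        # 4 distinct sources
```

Normalizing through `ipaddress` means differently abbreviated spellings of the same IPv6 address are counted once. A small overlap relative to the size of the pool is what makes combining source types worthwhile.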
2) Great Diversity: Today’s organized measurement infrastructures are highly homogeneous, consisting primarily of academic institutions in the US and Western Europe interconnected with high-bandwidth, low-congestion links. For example, PlanetLab currently has 584 nodes representing 275 sites in a handful of countries. In contrast, sources of spurious traffic are often Internet-wide and biased towards machines that have been difficult for researchers to access (those of private institutions and individuals). For example, the 38,000 machines originating spam to one of our domains have addresses with whois records from 159 countries, and bottleneck bandwidths (measured using the M&M tool suite [11]) ranging [...]. In addition, a large fraction of the machines we have measured reside behind NAT boxes (see §V). These hosts and their networks would be invisible to traditional measurement techniques.
3) Social Acceptability: Historically, it has been taboo to consider measurement activities that would consume very large amounts of aggregate bandwidth: generating wide-scale, high-volume network probes is considered at best anti-social and irresponsible, and at worst no different than a network attack. However, measuring preexisting sources that exhibit such behavior (such as the Slammer worm’s saturation of network access links [16]) raises no such concerns: the hostile network event has already happened due to someone else’s misbehavior, and there is no direct harm caused by exploiting this behavior. Similarly, large-scale passive analysis of legitimate traffic raises very significant privacy concerns, and thus has been largely infeasible. Furthermore, spurious traffic is generally devoid of normal application content, rendering these issues moot. Moreover, the unsolicited broadcast nature of this traffic provides some safe harbor even should the contents be sensitive or proprietary. In the vernacular, “the cat is already out of the bag”.
III. EXAMPLES OF TRAFFIC SOURCES FOR OPPORTUNISTIC MEASUREMENT
Opportunistic measurement requires discovering events (traffic sources) that satisfy several constraints. First, the traffic must meet the requirements of the analysis we wish to perform: e.g., high-volume TCP flows for bottleneck bandwidth estimation, or long-term predictable traffic for path characterization. The event must also generate enough traffic to produce statistically meaningful results. Finally, the traffic must include destination addresses visible to the researcher. Regarding this last point, spurious traffic events often generate traffic viewable from any vantage point on the Internet. In addition, such traffic frequently has a non-uniform distribution, creating hot-spots or even attractors where the traffic concentrates. We have observed several different classes of events exhibiting these properties:
1) Worms: Worms turn large numbers of hosts into traffic sources, and their code is directly available. These features make them ideal candidates for opportunistic measurement. Worms also provide two different modes useful for measurement. The initial flurry of traffic from a worm outbreak typically lasts for only a few hours, but results in traffic from a massive number of sources, sometimes numbering in the tens of thousands of machines. We refer to these singular events as supernovas. Taking best advantage of these spectacular events requires careful prior planning to ensure that we have the necessary passive measurement infrastructure in place to capture the occurrence. However, infected machines can continue to scan for longer periods, sometimes even years afterward. This on-going activity can provide predictable long-term traffic sources that we term pulsars. Indeed, worms released in 2001 [17], [1] continue to scan the Internet from thousands of infected hosts.¹
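The supernova/pulsar distinction can be made operational in many ways. The Python sketch below shows one hypothetical heuristic, assuming traffic has already been binned into hourly counts of distinct infected sources: an event is labeled a supernova if most source-observations fall within a few hours around a large peak, and a pulsar if activity persists over a long period at a steady level. Summing per-hour distinct counts double-counts hosts seen in several hours, so the "share" is over source-observations rather than unique hosts. The function names and every threshold (`window`, `min_share`, `min_peak`, `min_active_bins`, `max_cv`) are illustrative assumptions, not values from the paper.

```python
from statistics import mean, pstdev


def burst_share(series, window):
    """Largest fraction of all source-observations inside any `window` consecutive bins."""
    total = sum(series)
    if total == 0:
        return 0.0
    best = cur = sum(series[:window])
    for i in range(window, len(series)):
        cur += series[i] - series[i - window]  # slide the window by one bin
        best = max(best, cur)
    return best / total


def classify_event(hourly_sources, window=6, min_share=0.8,
                   min_peak=1000, min_active_bins=24 * 30, max_cv=0.5):
    """Label an hourly series of distinct-source counts (hypothetical heuristic)."""
    # Supernova: a large peak, with most activity packed into a few hours.
    if (max(hourly_sources, default=0) >= min_peak
            and burst_share(hourly_sources, window) >= min_share):
        return "supernova"
    active = [n for n in hourly_sources if n > 0]
    # Pulsar: long-lived activity whose level varies little (low coefficient of variation).
    if active and len(active) >= min_active_bins:
        if pstdev(active) / mean(active) <= max_cv:
            return "pulsar"
    return "unclassified"


outbreak = [0] * 10 + [20000, 40000, 15000] + [50] * 100  # hypothetical series
steady = [1000] * (24 * 30)                                 # hypothetical series
print(classify_event(outbreak), classify_event(steady))
```

Checking for a supernova first means a series that contains both an outbreak and a long tail is labeled by its outbreak; splitting the series at the peak and classifying each part separately would recover both modes the text describes.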
2) Automated Scans: Another significant source of traffic is the ever-present “background radiation” of automated scans on the Internet. Malicious scans by attackers looking for vulnerable machines are often performed collaboratively by large collections of bots, sometimes numbering in the 10,000s of machines [20]. Their scanning patterns vary widely, from sharp, unpredictable bursts to slow linear probes lasting weeks. Automated scans differ from worms in that generally they seek to derive more information about a host. Where a worm is typically interested only in finding the next target, a tool-driven scan may look for multiple vulnerabilities, specifics of a given protocol stack, or other remotely discernible information. Consequently, scans can generate relatively large amounts of traffic to individual IP addresses (or subnets) within a short time period.
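To exploit the concentrated traffic that scans aim at individual addresses or subnets, one first has to find it. The Python sketch below is one hypothetical way to do so, assuming packet records have been reduced to `(timestamp, source, destination)` tuples with IPv4 addresses: it groups packets by source and destination /24 and reports pairs whose busiest `window_s`-second interval holds at least `min_packets` packets. The function name, the /24 aggregation, and both thresholds are illustrative assumptions.

```python
from collections import defaultdict
from ipaddress import ip_network


def dense_targets(packets, window_s=60.0, min_packets=100, prefix=24):
    """Find (source, destination subnet) pairs that receive many packets quickly.

    packets: iterable of (timestamp_seconds, src_ip, dst_ip) tuples.
    Returns {(src_ip, subnet): peak_count}, where peak_count is the largest
    number of packets seen within any window_s-second interval, keeping only
    pairs with peak_count >= min_packets.
    """
    times = defaultdict(list)
    for ts, src, dst in packets:
        # Collapse the destination to its enclosing subnet (IPv4 assumed).
        subnet = ip_network(f"{dst}/{prefix}", strict=False)
        times[(src, subnet)].append(ts)

    dense = {}
    for key, stamps in times.items():
        stamps.sort()
        best, start = 0, 0
        for end, t in enumerate(stamps):
            # Advance the left edge until the window spans at most window_s seconds.
            while t - stamps[start] > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_packets:
            dense[key] = best
    return dense
```

The two-pointer sweep keeps the per-pair cost linear after sorting, which matters when a single scan contributes traffic from tens of thousands of sources.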
3) Spam: When present in large quantities, spam provides an interesting class of spurious traffic because it gives us access to relatively long-lived TCP flows. Passive measurement tools often require long flows (for example, 50 packets) for accurate analysis [11]. In addition, hosts used to source or relay spam often reside on types of computers not easily accessible to researchers.
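Section III's constraints (traffic suited to the analysis, enough of it, and destinations the researcher can see) combine naturally with the long-lived TCP flows that spam provides. The Python sketch below filters flows on those constraints and then applies a simplified version of the well-known packet-pair technique: packet size divided by the spacing of consecutive arrivals, summarized by the median. This is a generic technique swapped in for illustration, not the M&M tool suite [11] or the authors' method; the `Flow` record, the monitored prefixes, and the 50-packet minimum (echoing the example figure in the text) are assumptions.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from statistics import median


@dataclass
class Flow:
    """A passively observed flow; the field names are illustrative."""
    src: str    # remote host, e.g. a spam relay
    dst: str    # local host inside the monitored prefixes
    proto: str  # transport protocol, e.g. "tcp"
    # (arrival_time_seconds, size_bytes) for each data packet, as seen locally
    arrivals: list = field(default_factory=list)


def usable(flow, nets, min_packets=50):
    """Check the constraints: TCP, long enough, destination visible to us."""
    return (
        flow.proto == "tcp"
        and len(flow.arrivals) >= min_packets
        and any(ip_address(flow.dst) in net for net in nets)
    )


def packet_pair_estimate(arrivals):
    """Median packet-pair bandwidth estimate in bits per second, or None."""
    ordered = sorted(arrivals)
    samples = []
    for (t0, _), (t1, size) in zip(ordered, ordered[1:]):
        gap = t1 - t0
        if gap > 0:
            # Back-to-back packets leave the bottleneck spaced by about
            # size / capacity seconds, so size / gap estimates capacity.
            samples.append(size * 8 / gap)
    return median(samples) if samples else None


def estimate_bottlenecks(flows, monitored, min_packets=50):
    """(source, bits/s) for every flow that satisfies the constraints."""
    nets = [ip_network(n) for n in monitored]
    return [
        (f.src, packet_pair_estimate(f.arrivals))
        for f in flows
        if usable(f, nets, min_packets)
    ]
```

Real flows mix back-to-back pairs with gaps caused by the sender waiting on acknowledgments; those larger gaps drag per-pair estimates down, so practical tools filter samples more carefully than a plain median does.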
¹ Strictly speaking, the original has died off as programmed on October 1st of each year, but our data shows that new variants are released, including CodeRed.F [6], with the die-off date altered to give the worm extended life.