# Discrete Mathematical Approaches to Traffic Graph Analysis


Cliff Joslyn, Wendy Cowley, Emilie Hogan, Bryan Olsen

FloCon 2015

JANUARY 2015

Outline

• The challenge for analytics on cyber network data
• Multi-scale network analysis approaches
• Analysis test environment
    • Netflow traffic analysis
    • RDB and EDA tools
    • VAST challenge data set
• Basic graph statistics
• Labeled graph degree distributions
• Time interval synchrony measurement

January 20, 2015 2

Challenge

• Asymmetric Resilient Cybersecurity Initiative (ARC), PNNL
    • Research effort on modeling formalisms for general cyber systems
• Cyber systems modeling needs unifying methodologies
    • Digital: no space, ordinal time, no energy, no conservation laws, no natural metrics (continuity, contiguity)
    • Engineered: no methods from discovery-based science
• Represent cyber systems as discrete mathematical objects interacting across hierarchically scalar levels
    • Coarse-grained and fine-grained models
    • Each distinctly validated, but interacting
    • Similar to hybrid modeling and qualitative physics
    • Coarse-grained discrete model constrains fine-grained continuous model
    • We are discrete all the way down
• Utilize discrete mathematical foundations
    • Labeled, directed graphs as a base representation of any discrete relation
    • But equipped with additional constraints and complex attributes
    • And exploiting higher-order combinatorial structures and methods

Netflow Focus


GOAL: Multi-scale network modeling

• Modeling assumption 1: Netflow for first cut
    • Inherently multi-scale: drilldown to packet level, scalar “sweet spot”?
    • Broad interest beyond ARC
    • Ample use cases
    • Both public and private test databases available
• Modeling assumption 2: VAST Challenge for test data
    • Open
    • Ground truth
    • Moderate size

Joslyn, CA; Choudhury, S; Haglin, D; Howe, B; Nickless, B; Olsen, B.: (2013) “Massive Scale Cyber Traffic Analysis: A Driver for Graph Database Research”, Proc. 1st Int. Wshop. on GRAph Data Management Experiences and Systems (GRADES 2013)

Test data sets

• Currently scaling to O(100M) edges
• Netezza TwinFin:
    • Parallel SQL database appliance
    • Unique asymmetric massively parallel processing (AMPP™) architecture
    • FPGAs for data filtering
• Tableau 8.1 for EDA
• Future: porting to PNNL’s novel high-performance graph database engine GEMS, potential scaling to O(100B-1T) graph edges

Analysis Environment


Morari, A; Castellana, V; Tumeo, A; Weaver, J; Haglin, D; Feo, J; Choudhury, S; Villa, O: (2014) “Scaling Semantic Graph Databases in Size and Performance”, IEEE Micro, 34:4, pp. 16-26

VAST Data Challenge

• Visual analytics competition co-led by PNNL since about 2005
• Co-located with Visual Analytics Science and Technology (VAST) conference
• Funded by and in the service of specific sponsors and their goals
• 2011-2013 focus on cyber challenge
• Scenario: Big Marketing Situational Awareness
    • PNNL-provided simulated netflow traffic
    • Combined with IPS and BigBrother health monitoring
• Challenge
    • Provide visualizations for situational awareness
    • Report events during the timeline
• Submissions
    • About a dozen from universities, commercial partners, individuals


http://vacommunity.org/VAST+Challenge+2013

VAST Architecture

• Three Big Marketing (BM) sites
• Mostly web traffic
• Clients and servers both inside and outside
• Simulated external users hitting internal servers
• Some I/O ambiguity on bidirectional Netflow


Ground Truth

[Figure: ground-truth timeline of scenario events. Date labels: Mar 1, Mar 15, and Apr 1 through Apr 15. Event labels: Threatening Letter, Video Conference, Network Health, Port Scans, DOS, Botnet DOS, Intrusion: Webpage Redirects, Malware Infection: Admin Infection, Firewall Compromise, Data Exfiltration, Botnet Infection, Botnet C & C. The placement of events on dates is not recoverable from the transcript.]

Legend: italics = events that are not observable in the supplied data; red = attacks with serious consequences; a second marker (not recoverable) = attack attempts blocked by IPS.

Thanks to Kirsten Whitley

Netflow: Complex Data Space

• Basic graph statistics: all with Input × Output
    • Flow count
    • IPPs
    • IPs
    • Ports
    • Times: start, finish, durations
    • Payload: # packets, # bytes
    • Transport protocol
• Tremendous initial value just with basic stats!
    • Many, many combinations; we’re cherry-picking a few to show
• To which we bring our new measures:
    • Degree distribution: dispersion, smoothness
    • Additional metrics
    • Time intervals

“Graph Cube” Contractions

Projections in directed labeled graphs provide natural scalar levels Netflow: IPs and Ports

[Figure: the IPP graph with its IP Projection and Port Projection]

Zhao, Peixiang; Li, Xiaolei; Xin, Dong; and Han, Jiawei: (2011) “Graph Cube: On Warehousing and OLAP Multidimensional Networks”, SIGMOD 2011
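The contraction from the IPP level to the IP and port levels can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an IPP is an (IP, port) pair, and the flow records, `ipp_edges`, and `project` are hypothetical names made up for the example.

```python
from collections import Counter

# Hypothetical flow records: (src_ip, src_port, dst_ip, dst_port).
flows = [
    ("10.0.0.1", 51000, "10.0.0.9", 80),
    ("10.0.0.1", 51001, "10.0.0.9", 80),
    ("10.0.0.2", 51000, "10.0.0.9", 443),
]

def ipp_edges(flows):
    """Count flows per directed edge between (IP, port) pairs."""
    return Counter(((sip, sport), (dip, dport)) for sip, sport, dip, dport in flows)

def project(edges, component):
    """Contract IPP edges onto one label component: 0 keeps the IP, 1 keeps the port."""
    projected = Counter()
    for (src, dst), count in edges.items():
        projected[(src[component], dst[component])] += count
    return projected

ipp = ipp_edges(flows)
ip_graph = project(ipp, 0)    # coarser level: nodes are IPs
port_graph = project(ipp, 1)  # coarser level: nodes are ports

print(len(ipp), len(ip_graph), len(port_graph))
```

Each projection merges nodes and sums their flow counts, so the total number of flows is the same at every level while the node and edge counts shrink.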


Basic Graph Statistics: VAST


| VAST IPP | Count | Mean flows per (or % of nodes) |
|---|---|---|
| Flows | 69,396,995 | |
| Nodes | 10,066,187 | 6.89 |
| Outs | 8,784,807 | 7.90 |
| Leaves | 1,281,380 | 12.7% |
| Ins | 2,533,742 | 27.39 |
| Roots | 7,532,445 | 74.8% |
| Internals | 1,252,362 | 12.4% |
| Pairs present | 14,387,421 | 4.82 |
| Pairs possible | 22,258,434,457,794 | 0.00000312 |
| Density | 0.0000646% | |


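| VAST IP | Count | Mean flows per (or % of nodes) |
|---|---|---|
| Flows | 69,396,995 | |
| Nodes | 1,440 | 48,192 |
| Outs | 1,424 | 48,734 |
| Leaves | 16 | 1.1% |
| Ins | 1,345 | 51,596 |
| Roots | 95 | 6.6% |
| Internals | 1,329 | 92.3% |
| Pairs present | 30,161 | 2,301 |
| Pairs possible | 1,915,280 | 36 |
| Density | 1.57% | |

Mean Ports/IP: 6,990.41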

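| VAST Port | Count | Mean flows per (or % of nodes) |
|---|---|---|
| Flows | 69,396,995 | |
| Nodes | 65,536 | 1,058.91 |
| Outs | 64,501 | 1,075.91 |
| Leaves | 1,035 | 1.6% |
| Ins | 65,536 | 1,058.91 |
| Roots | - | 0.0% |
| Internals | 64,501 | 98.4% |
| Pairs present | 986,385 | 70.35 |
| Pairs possible | 4,227,137,536 | 0.01641702 |
| Density | 0.023% | |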
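The rows of these tables can be computed from a weighted edge list. The sketch below uses definitions inferred from the arithmetic of the slide's numbers rather than stated on it: Leaves = Nodes − Outs, Roots = Nodes − Ins, Internals = nodes with both, and Pairs possible = Outs × Ins (for example, 1,424 × 1,345 = 1,915,280 in the IP table). The function name `basic_stats` and the toy edge list are hypothetical.

```python
def basic_stats(edges):
    """Summarize a directed graph given as {(src, dst): flow_count}.

    Definitions are assumptions inferred from the slide's tables.
    """
    flows = sum(edges.values())
    outs = {src for src, _ in edges}  # nodes with at least one outgoing flow
    ins = {dst for _, dst in edges}   # nodes with at least one incoming flow
    nodes = outs | ins
    pairs_possible = len(outs) * len(ins)
    return {
        "flows": flows,
        "nodes": len(nodes),
        "outs": len(outs),
        "leaves": len(nodes - outs),    # receive only
        "ins": len(ins),
        "roots": len(nodes - ins),      # send only
        "internals": len(outs & ins),   # both send and receive
        "pairs_present": len(edges),
        "pairs_possible": pairs_possible,
        "density": len(edges) / pairs_possible if pairs_possible else 0.0,
        "mean_flows_per_node": flows / len(nodes) if nodes else 0.0,
    }

# Toy graph: a -> b (3 flows), a -> c (1 flow), b -> c (2 flows).
stats = basic_stats({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2})
print(stats)
```

The same function applies unchanged at the IPP, IP, and port levels, since only the node labels differ.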

Number of Flows by IP

IPs with 0 flows in: 95; IPs with 0 flows out: 16; IPs with > 0 on both: 1328

Number of Flows by Port


Basic Payload View: Exfiltration


[Figure: plot on logarithmic axes. One axis is Out_Total_Payload, with ticks from 1 to 10,000,000,000; the other has ticks from 1 to 10,000,000 (axis name not recoverable). Highlighted point: IPADDR 10.7.5.5, TIME_HR April 6, 2013, CT_SRC_OUT_EDGES 1,675, Sum_IN_PAYLOAD 247,895,424,744. Legends: Sum_Sum_IN_PAYLOAD (0 to 247,895,424,744), PROTOCOL (1, 6, 17), IP_Group (External, Internal, Other).]

Beyond Volume for Anomaly Detection

• Packets and bytes not always sufficient to identify behavioral patterns
• IP and port behavior can tell the difference
    • E.g. port scan in figure
    • Entropy of DstIP, DstPort

A Lakhina, M Crovella, C Diot: (2005) “Mining Anomalies Using Traffic Feature Distributions”, SIGCOMM 05
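To illustrate why the entropy of destination IPs and ports can separate behaviors that look alike by volume, here is a minimal sketch. It assumes plain Shannon entropy of the empirical distribution, in bits; the slide names only "Entropy of DstIP, DstPort". The two traffic samples are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical (dst_ip, dst_port) pairs seen in one time bin.
normal = [("10.0.0.9", 80), ("10.0.0.8", 80), ("10.0.0.7", 443), ("10.0.0.6", 80)]
# Hypothetical port scan: one target host, many distinct ports.
scan = [("10.0.0.9", port) for port in range(1, 9)]

for name, sample in (("normal", normal), ("scan", scan)):
    ip_entropy = entropy([ip for ip, _ in sample])
    port_entropy = entropy([port for _, port in sample])
    print(name, ip_entropy, port_entropy)
```

In the scan sample the destination IP entropy collapses to zero while the destination port entropy is maximal for the number of flows, a signature that packet and byte counts alone would not show.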


Labeled Degree Distributions

How can we characterize relationships between IPs, Ports, etc.?

How many other IPs/ports talked to? How distributed?


• Input: C/A/D = 2/1/1
• Output: B/A/C/E = 2/1/1/1
• Joint: C/A/B/D/E = 3/2/2/1/1

• Analyze the distributions of labels
    • Incoming and outgoing IPs, Ports, IPPs
    • Labeled degree distributions
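The Input, Output, and Joint counts in the example can be reproduced by counting the labels on a node's incoming and outgoing edges. This is a minimal sketch; the two label lists are hypothetical and chosen only so that the counts match the slide's example.

```python
from collections import Counter

# Hypothetical labels on the edges touching one node, matching the slide's counts.
incoming = ["C", "C", "A", "D"]       # Input:  C/A/D   = 2/1/1
outgoing = ["B", "B", "A", "C", "E"]  # Output: B/A/C/E = 2/1/1/1

in_dist = Counter(incoming)
out_dist = Counter(outgoing)
joint_dist = in_dist + out_dist       # Joint: per-label sum of both directions

print(joint_dist.most_common())
```

In the Netflow setting the labels would be the IPs, ports, or IPPs at the other end of each flow.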

Information Measures of IP/Port Distributions


Example distributions (figure panels):

• Dispersion = 0.70, Smoothness = 0.76
• Dispersion = 0.70, Smoothness = 1.00
• Dispersion = 0.30, Smoothness = 0.97

• DISPERSION: # IPs, ports relative to # flows
    • Math: log count ratio
• SMOOTHNESS: even or lumpy distribution of IPs, ports
    • Math: normalized entropy

CA Joslyn, W Cowley, EA Hogan, B Olsen: (2014) “Discrete Mathematical Approaches to Graph-Based Traffic Analysis” 2014 Int. Wshop. on Engineering Cyber Security and Resilience (ECSaR14) http://www.ase360.org/bitstream/handle/123456789/157/ecsar2014_paper4.pdf
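One way to read the two measures is sketched below. The formulas are an assumption: dispersion is taken as log(number of distinct labels) / log(number of flows), and smoothness as Shannon entropy divided by log(number of distinct labels). The slide gives only "log count ratio" and "normalized entropy", so the cited paper should be consulted for the exact definitions. The three count vectors are hypothetical; they were picked because, under this reading, they give the same value pairs as the three panels.

```python
import math

def dispersion(counts):
    """Assumed form: log(distinct labels) / log(total flows); 0.0 when undefined."""
    total = sum(counts)
    distinct = sum(1 for c in counts if c > 0)
    if total <= 1 or distinct == 0:
        return 0.0
    return math.log(distinct) / math.log(total)

def smoothness(counts):
    """Assumed form: Shannon entropy divided by its maximum, log(distinct labels)."""
    positive = [c for c in counts if c > 0]
    if len(positive) <= 1:
        return 1.0  # convention chosen for this sketch: one label counts as even
    total = sum(positive)
    h = sum((c / total) * math.log(total / c) for c in positive)
    return h / math.log(len(positive))

# Hypothetical flow counts per label (10 flows each).
examples = {
    "lumpy": [6, 1, 1, 1, 1],
    "even": [2, 2, 2, 2, 2],
    "few labels": [6, 4],
}
for name, counts in examples.items():
    print(name, round(dispersion(counts), 2), round(smoothness(counts), 2))
```

Under these assumptions dispersion depends only on how many distinct labels appear relative to the flow count, while smoothness depends only on how evenly the flows are spread over those labels, so the two values vary independently.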

Labeled Degree Distributions

Inform