ctsTraffic

ctsTraffic is a highly scalable client/server networking tool giving detailed performance and reliability analytics.

If you would like the latest build without having to pull down the source code and build it yourself, you can download it from https://github.com/microsoft/ctsTraffic/tree/master/Releases/2.0.3.5 .

New Visualization Tool!

A great new visualization toolset has been created that can post-process the output files generated by ctsTraffic. At your convenience, please look at https://github.com/microsoft/Network-Performance-Visualization


A Practical Guide


ctsTraffic was initially developed just after Windows 7 shipped to accurately measure how our diverse network deployments scale, as well as to assess their network reliability. Since then we have added a large number of options to work within a growing number of deployments. This document reviews the 90% case that most people will likely want to start with.

Good-Put

ctsTraffic is deliberately designed and implemented to demonstrate the best-practice guidance we (Winsock) have provided app developers for designing efficient and scalable solutions. It has a "pluggable" model in which we have authored multiple different IO models -- but the default IO model is the one that will be most scalable for most network-facing applications.

As our IO models are implemented to model what we want apps and services to build, the resulting performance data is a strong reflection of what one can expect normal apps and services to see in the tested deployment. This throughput measurement of data as seen from the app is commonly referred to as "good-put" (as opposed to "through-put" which is generally measured at the hardware level in raw bits/sec).

A suggested starting point: measuring Good Put

The below set of options (using most default options) is generally a good starting point when measuring good put and reliability. These options will have clients maintain 8 TCP connections with the server, sending 1GB of data per connection. Data will be flowing unidirectionally from the client to the server ('upload' scenarios).

These options are also a good starting point to track the reliability of a network deployment. They provide data across multiple reliability pivots:

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:<server> |
| -consoleverbosity:1 | -consoleverbosity:1 |
| | -statusfilename:clientstatus.csv |
| | -connectionfilename:clientconnections.csv |

Note: if one needs to measure the other direction, the clients receiving data from servers, one should append -pattern:pull to the above commands on both the client and the server.
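As a minimal sketch, the two command lines above (and the optional -pattern:pull variant) can be assembled in a small script. The helper name `build_commands` and the host name `server01` are hypothetical and only for illustration; the flags are the ones listed above.

```python
def build_commands(server, pull=False):
    """Return (server_args, client_args) for the suggested good-put run."""
    server_args = ["ctsTraffic.exe", "-listen:*", "-consoleverbosity:1"]
    client_args = [
        "ctsTraffic.exe",
        f"-target:{server}",
        "-consoleverbosity:1",
        "-statusfilename:clientstatus.csv",
        "-connectionfilename:clientconnections.csv",
    ]
    if pull:
        # The pattern must be given on both sides so they agree on direction.
        server_args.append("-pattern:pull")
        client_args.append("-pattern:pull")
    return server_args, client_args


# "server01" is a hypothetical host name.
server_args, client_args = build_commands("server01", pull=True)
print(" ".join(server_args))
print(" ".join(client_args))
```

The argument lists can be passed directly to something like `subprocess.run` on the respective machines.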

We found the above default values to generally be an effective balance when measuring Good Put, balancing the number of connections being established to send and receive data with the number of bytes being sent per connection. We found these values scale very well across many scenarios: down to small devices with slower connections and up to reaching 10Gbit deployments. (Note: once one gets to 10Gb we recommend doubling the number of connections and moving to 1TB of data sent; increasing both again at 40Gb).

Explaining the console output

As a sample run, the below is output from a quick test run over loopback (client and server were both run on my machine). Note that the -consoleverbosity: flag controls the type and detail of what is output to the console (setting 0 turns off all output).

C:\Users\kehor\Desktop\2.0.1.7> ctsTraffic.exe -target:localhost -consoleverbosity:1 -statusfilename:clientstatus.csv -connectionfilename:clientconnections.csv

Configured Settings

    Protocol: TCP
    Options: InlineIOCP
    IO function: Iocp (WSASend/WSARecv using IOCP)
    IoPattern: Push <TCP client send/server recv>
    PrePostRecvs: 1
    PrePostSends: 1
    Level of verification: Connections & Data
    Port: 4444
    Buffer used for each IO request: 65536 [0x10000] bytes
    Total transfer per connection: 1073741824 bytes
    Connecting out to addresses:
           [::1]:4444
           127.0.0.1:4444
    Binding to local addresses for outgoing connections:
           0.0.0.0
           ::
    Connection limit (maximum established connections): 8 [0x8]
    Connection throttling rate (maximum pended connection attempts): 1000 [0x3e8]
    Total outgoing connections before exit (iterations * concurrent connections) : 0xffffffffffffffff

Legend:

* TimeSlice - (seconds) cumulative runtime
* Send & Recv Rates - bytes/sec that were transferred within the TimeSlice period
* In-Flight - count of established connections transmitting IO pattern data
* Completed - cumulative count of successfully completed IO patterns
* Network Errors - cumulative count of failed IO patterns due to Winsock errors
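* Data Errors - cumulative count of failed IO patterns due to data errors

| TimeSlice | SendBps | RecvBps | In-Flight | Completed | NetError | DataError |
|---|---|---|---|---|---|---|
| 0.001 | 0 | 0 | 8 | 0 | 0 | 0 |
| 5.002 | 2635357062 | 124 | 8 | 8 | 0 | 0 |
| 10.003 | 2519263596 | 171 | 8 | 19 | 0 | 0 |
| 15.001 | 2437002784 | 202 | 8 | 32 | 0 | 0 |
| 20.002 | 2639655364 | 171 | 8 | 43 | 0 | 0 |
| 25.002 | 2557516185 | 218 | 8 | 57 | 0 | 0 |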

Historic Connection Statistics (all connections over the complete lifetime)

SuccessfulConnections [59] NetworkErrors [0] ProtocolErrors [0]

Total Bytes Recv : 5194

Total Bytes Sent : 67358818304

Total Time : 26357 ms.

Configured Settings

The banner under "Configured Settings" shows many of the defaulted options.

-consoleverbosity:1

Setting console verbosity to 1 will output an aggregate status at each time slice. The default time slice is 5 seconds and is configurable with -statusUpdate. At each time slice, one line is output with the aggregate information described in the Legend above.

This output gives the viewer a quick assessment of what is occurring, and what has occurred, across the TCP connections that were established. The output functions the same on both the client and the server.

Options for controlling how long a test runs

ctsTraffic has a few options for controlling the amount of time for a run before it exits.

The manual approach is to just hit CTRL-C in the command-shell. ctsTraffic recognizes the key-press and will gracefully exit, ensuring data is accurately flushed to all log files.

The client can control its exit through 2 possible parameter combinations:

The server can control its exit through just 1 option -- as it is designed to accept any number of connections from any number of clients.

Explaining the generated log files

In the same sample as above, two log files were created due to the following command line options: -statusfilename:clientstatus.csv -connectionfilename:clientconnections.csv. The csv extension told ctsTraffic to write the files in a comma-separated values format (any other extension would be written as a line of text).

StatusFilename

The status file writes out the same information to a csv as is written to console with the above -consoleverbosity:1 option set. This is useful for later analysis, notably in an application like Excel. Imported into Excel, 25 seconds worth of data would look like this:

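| TimeSlice | SendBps | RecvBps | In-Flight | Completed | NetError | DataError |
|---|---|---|---|---|---|---|
| 0.001 | 2 | 0 | 8 | 0 | 0 | 0 |
| 5.002 | 2635514317 | 124 | 8 | 8 | 0 | 0 |
| 10.003 | 2519781898 | 171 | 8 | 19 | 0 | 0 |
| 15.001 | 2437029014 | 202 | 8 | 32 | 0 | 0 |
| 20.002 | 2639838829 | 171 | 8 | 43 | 0 | 0 |
| 25.002 | 2557568614 | 218 | 8 | 57 | 0 | 0 |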

This becomes useful as Excel can quickly give richer views into our data.

Immediately we can look at the last line at Completed (there were 57 successful TCP connections when the last time slice was recorded), NetError and DataError (there were 0 failed TCP connections either through network failures or data errors).

For example, if we wanted to take an average of the SendBps values starting at time slice 5, we would simply specify this in a cell: =AVERAGE(B3:B7). Similarly we can see the min and max with =MIN(B3:B7) and =MAX(B3:B7) respectively. With longer runs and large data sets, this can be notably powerful for understanding the overall performance metrics of a run.
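The same summary can be scripted outside of Excel. The following is a minimal sketch, assuming the status csv carries a header row with the column names shown in the table above (TimeSlice, SendBps, and so on); the inline sample reuses the rows from time slice 5 onward, whereas a real run would read clientstatus.csv from disk.

```python
import csv
import io
import statistics

# Rows copied from the status table above (time slice 5 onward). A real run
# would open clientstatus.csv instead of using this inline sample.
SAMPLE_STATUS_CSV = """TimeSlice,SendBps,RecvBps,In-Flight,Completed,NetError,DataError
5.002,2635514317,124,8,8,0,0
10.003,2519781898,171,8,19,0,0
15.001,2437029014,202,8,32,0,0
20.002,2639838829,171,8,43,0,0
25.002,2557568614,218,8,57,0,0
"""


def send_rate_summary(csv_text, first_time_slice=5.0):
    """Return (average, minimum, maximum) of SendBps from first_time_slice onward."""
    rows = csv.DictReader(io.StringIO(csv_text))
    rates = [
        int(row["SendBps"])
        for row in rows
        if float(row["TimeSlice"]) >= first_time_slice
    ]
    return statistics.mean(rates), min(rates), max(rates)


average, lowest, highest = send_rate_summary(SAMPLE_STATUS_CSV)
print(f"average={average:,.0f} min={lowest:,} max={highest:,}")
```

The `first_time_slice` filter plays the role of starting the Excel range at row 3, which skips the initial near-zero time slice.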

Additionally, Excel does quick graphing, which can be one of the most powerful ways to view data. With a larger dataset (generated with the same commands specified above), the final graph for SendBps on the client looked like this:

[[CHART]]{.chart}

[[CHART]]{.chart}

Bits per second was generated by adding a column and telling Excel to multiply SendBps * 8 (the result looks like a consistent ~20 Gbps).
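As a rough cross-check of that ~20 Gbps figure, the overall good-put of the short sample run can be derived from the "Total Bytes Sent" and "Total Time" values in its Historic Connection Statistics. This is only a sketch of the arithmetic, not something ctsTraffic prints itself.

```python
total_bytes_sent = 67358818304  # "Total Bytes Sent" from the sample run
total_time_ms = 26357           # "Total Time" from the sample run

seconds = total_time_ms / 1000
bytes_per_second = total_bytes_sent / seconds

# Same conversion as the Excel column: bytes/sec * 8 = bits/sec.
gigabits_per_second = bytes_per_second * 8 / 1e9

print(f"{bytes_per_second:,.0f} bytes/sec is about {gigabits_per_second:.1f} Gbps")
```

The result lands a little above 20 Gbps, in line with the per-time-slice SendBps values.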

ConnectionFilename

The connection file writes out per-connection information to a csv (this is the same as what is written to console with the "-consoleverbosity:3" option set). This is useful for later analysis when wanting to look at patterns across a long test run.

Here's an example from the above sample run of the first 10 connections:

[Image: connection log entries for the first 10 connections]

In this log file we can see individual TCP connections recorded.

As with the Status file analysis, Excel can give deeper insight into the test run. For example:

[[CHART]]{.chart}

As an example, the above 2 charts give a rich view into addressing the "fairness" question across all connections across 5 minutes (over loopback). One could do similar analysis comparing connections across different server addresses to look for issues with servers or groups of servers (e.g. behind a bad router).

A detailed network behavior of the above example

For those inclined, the below explains in more detail the network traffic generated with the above commands.

Scaling

ctsTraffic was deliberately designed to scale: scaling down (it's been used with very small IoT parts to view what good put looks like on a device with very few resources), scaling up (it's been used on large servers with 50 Gbps configurations to look at expected good put), and scaling out (it's been tested with deployments of 10s of thousands of concurrent connections; tested up to 500,000 connections).

We generally recommend scaling to match both the expected nominal deployments and workloads as well as the 90% extreme deployments and workloads. Using the options above to increase the numbers of concurrent connections, ctsTraffic will naturally scale to the resources and network pipes available.

It's useful to note that this scaling comes with the same coding model -- the same code that measures small IoT devices without overloading their CPUs also runs on servers with hundreds of cores driving 50Gbps pipes. This all comes from our recommendation: using overlapped I/O with the NT thread-pool and handling inline completions. It's a great demonstration of how the Windows OS will scale naturally.

Testing for reliability

We have added features over time which we found greatly help in measuring the reliability of a networking deployment. Below are examples of combinations of options we have found to be particularly useful in discovering issues in networking components and devices.

Looking for data corruption

While data corruption is thankfully rare, we have found one method to be particularly successful in discovering data corruption issues in hardware and software stacks. It has found data corruption issues across a variety of vendors and deployments. Interestingly, in most cases the data corruption was observed only by ctsTraffic, and only when ctsTraffic was run in this way.

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:<server> |
| -consoleverbosity:1 | -consoleverbosity:1 |
| -pattern:duplex | -pattern:duplex |

The unique bit here was running the "full duplex" pattern. This data pattern will send and receive at line rate concurrently: sends posting as quickly as they can post, receives posting as quickly as they can post, all in parallel. This often results in making software work the "hardest" as it must be tracking each TCP stream of data going in both directions, at line rate. "At line rate" was also generally required. With some 40 Gbps network devices we would only discover corruptions when running the duplex pattern at full 40 Gbps bidirectional line rate.

Note that scaling the number of connections and the transfer size of each connection as one goes above 1Gbps also helps, as it allows more time for each connection. If a network deployment continues to have trouble scaling to line rate, setting the buffer size to 1MB can help (-buffer:1048576).

Randomizing buffer sizes

If one wants to work even harder to find data corruption bugs, one can instruct ctsTraffic to randomize the buffer sizes used for each send and receive request. This will often change the buffering patterns across a networking stack, as TCP segments get created of different sizes which can influence many other TCP factors, such as packet sizes and window sizes.

The default value is 64k for all IO requests on all connections. Randomizing buffer sizes can be done by specifying a range with square brackets. The below is an example where each TCP connection would be randomly choosing a buffer size to use for that connection between 1KB and 1MB.

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:<server> |
| -consoleverbosity:1 | -consoleverbosity:1 |
| -pattern:duplex | -pattern:duplex |
| -buffer:[1024,1048576] | -buffer:[1024,1048576] |
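To illustrate what a range like -buffer:[1024,1048576] means, here is a small sketch of each connection picking its own buffer size from that range. The uniform distribution and the function name `pick_buffer_size` are assumptions for illustration; this is not taken from the ctsTraffic source.

```python
import random

MIN_BUFFER = 1024      # 1KB, lower bound from -buffer:[1024,1048576]
MAX_BUFFER = 1048576   # 1MB, upper bound from -buffer:[1024,1048576]


def pick_buffer_size(rng):
    # Assumption: a uniform pick across the inclusive range. The actual
    # distribution used by ctsTraffic is not described in this guide.
    return rng.randint(MIN_BUFFER, MAX_BUFFER)


rng = random.Random()
# One buffer size per connection, for the default 8 connections.
sizes = [pick_buffer_size(rng) for _ in range(8)]
for connection, size in enumerate(sizes, start=1):
    print(f"connection {connection}: {size} byte buffers")
```

Because every connection ends up with a different IO size, the segments and window behavior seen by the stack vary from connection to connection within the same run.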

As noted previously, adjusting numbers of connections and the total transfer size can be useful especially when working on deployments beyond 1Gbps.

Looking for connection establishment issues

If one wants to work even harder to find issues in connection establishment, there are options which can be used to force many more connections to happen over time. The key to doing so is giving a much larger value for the number of connections (-connections:) as well as giving a much smaller transfer size (-transfer:). The combination tells ctsTraffic to a) maintain a lot of concurrent connections, and b) keep each connection very short-lived.

The result will cycle through a lot of connections very quickly.

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:localhost |
| -consoleverbosity:1 | -consoleverbosity:1 |
| -transfer:64 | -transfer:64 |
| | -connections:100 |

These commands when run as a quick test created the below output:

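| TimeSlice | SendBps | RecvBps | In-Flight | Completed | NetError | DataError |
|---|---|---|---|---|---|---|
| 5.000 | 0 | 0 | 45 | 1555 | 0 | 0 |
| 10.000 | 19737 | 24055 | 0 | 3100 | 0 | 0 |
| 15.000 | 19200 | 23400 | 0 | 4600 | 0 | 0 |
| 20.000 | 19200 | 23400 | 0 | 6100 | 0 | 0 |
| 25.000 | 19200 | 23400 | 0 | 7600 | 0 | 0 |
| 30.001 | 19196 | 23395 | 0 | 9100 | 0 | 0 |
| 35.000 | 19203 | 23404 | 0 | 10600 | 0 | 0 |
| 40.000 | 19200 | 23400 | 0 | 12100 | 0 | 0 |
| 45.001 | 19196 | 23395 | 0 | 13600 | 0 | 0 |
| 50.000 | 19203 | 23404 | 0 | 15100 | 0 | 0 |
| 55.001 | 16623 | 20259 | 44 | 16372 | 154 | 0 |
| 60.001 | 9100 | 11092 | 33 | 17110 | 897 | 0 |
| 65.000 | 10037 | 12224 | 49 | 17868 | 1663 | 0 |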

In the output from the run we can see in the Completed column that we were quickly iterating through many thousands of successful connections.

One should also note that at around the 55 second mark we started seeing errors. This is because of a TCP behavior called TIME-WAIT. Because the default behavior for ctsTraffic is for the clients to issue a graceful shutdown at the end of a connection, we create a 4-way FIN to gracefully tear down that TCP connection. While this is a typical way clients and servers terminate connections, it can result in the client's tuple (its IP and port) being temporarily held in a "time-wait" state per RFC. While in this state that port cannot be reused.

This can result in exhausting available ephemeral ports that the client can choose from (even with some recent Windows TCP/IP stack fixes to work harder to potentially reuse some of these ports).
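A back-of-the-envelope sketch shows why the errors appear near the 55 second mark. It assumes the default Windows dynamic (ephemeral) port range of 49152-65535, that every connection consumes a fresh port, and that no port leaves time-wait during the run; the rate comes from the steady-state SendBps of 19200 with -transfer:64.

```python
# Assumption: default Windows dynamic port range (49152-65535); the range
# can be reconfigured on a given machine, which would change the result.
EPHEMERAL_PORTS = 65535 - 49152 + 1   # 16384 ports

send_bytes_per_second = 19200   # steady-state SendBps from the run above
transfer_bytes = 64             # -transfer:64, bytes sent per connection

# Each connection sends 64 bytes, so the send rate gives the connection rate.
connections_per_second = send_bytes_per_second / transfer_bytes

# Assumption: no port is released from time-wait during this window.
seconds_to_exhaust = EPHEMERAL_PORTS / connections_per_second

print(f"{connections_per_second:.0f} connections/sec")
print(f"ports exhausted after roughly {seconds_to_exhaust:.1f} seconds")
```

Under those assumptions the port pool runs dry a little before 55 seconds, after about 16,000 connections, which lines up with where NetError starts climbing in the output.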

We have options in ctsTraffic which can help to work around this issue: one can tell ctsTraffic how to terminate each successful connection. To avoid entering time-wait, we can tell ctsTraffic to force a RST to shut down the connection. An RST is a rude/abrupt way to end a connection but is perfectly valid. The command line with this combination would look like this:

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:localhost |
| -consoleverbosity:1 | -consoleverbosity:1 |
| -transfer:64 | -transfer:64 |
| | -connections:100 |
| | -shutdown:rude |

The -shutdown option (either 'graceful' or 'rude') will instruct the client in how to end its connections (the server will always wait for the client to initiate a closure and therefore never enter time-wait -- something we highly recommend to those building server software). As you can see in our simple example, instead of seeing failures after about 16,000 connections, we were still creating successful connections after 20,000 connections.

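| TimeSlice | SendBps | RecvBps | In-Flight | Completed | NetError | DataError |
|---|---|---|---|---|---|---|
| 0.001 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5.000 | 20087 | 24480 | 34 | 1566 | 0 | 0 |
| 10.001 | 19592 | 23879 | 0 | 3100 | 0 | 0 |
| 15.000 | 19203 | 23404 | 0 | 4600 | 0 | 0 |
| 20.000 | 19200 | 23400 | 0 | 6100 | 0 | 0 |
| 25.000 | 19200 | 23400 | 0 | 7600 | 0 | 0 |
| 30.001 | 19196 | 23395 | 0 | 9100 | 0 | 0 |
| 35.000 | 19203 | 23404 | 0 | 10600 | 0 | 0 |
| 40.000 | 19200 | 23400 | 0 | 12100 | 0 | 0 |
| 45.000 | 19200 | 23400 | 0 | 13600 | 0 | 0 |
| 50.000 | 19200 | 23400 | 0 | 15100 | 0 | 0 |
| 55.000 | 19200 | 23400 | 0 | 16600 | 0 | 0 |
| 60.000 | 19200 | 23400 | 0 | 18100 | 0 | 0 |
| 65.000 | 19200 | 23400 | 0 | 19600 | 0 | 0 |
| 70.000 | 19200 | 23400 | 0 | 21100 | 0 | 0 |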

UDP stream reliability

ctsTraffic measures UDP flows through media streaming semantics -- how most apps (especially client facing apps) use UDP datagrams. In our UDP stream implementation, every datagram is tagged by number and by time. Thus, the client receiving the stream of datagrams from the server can accurately identify every dropped datagram as well as validating the data integrity of each received datagram (the same bit-pattern analysis occurs with UDP as with TCP to check for data corruption).
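The idea of tagging every datagram with a number can be sketched as follows: given the sequence numbers a receiver actually saw and the number of frames it expected, it can count completed, dropped, and repeated frames. The function name and the 1-based numbering are assumptions for illustration, not the ctsTraffic wire format.

```python
from collections import Counter


def classify_frames(received_sequence_numbers, expected_frames):
    """Return (completed, dropped, repeated) counts for one stream.

    Assumption: frames are numbered 1..expected_frames.
    """
    counts = Counter(received_sequence_numbers)
    # A frame is completed if it arrived at least once.
    completed = sum(1 for n in range(1, expected_frames + 1) if counts[n] >= 1)
    # A frame is dropped if it never arrived.
    dropped = expected_frames - completed
    # Every arrival beyond the first one counts as a repeat.
    repeated = sum(
        count - 1
        for n, count in counts.items()
        if 1 <= n <= expected_frames and count > 1
    )
    return completed, dropped, repeated


# Frames 3, 6 and 9 never arrive; frame 2 arrives twice, frame 8 three times.
received = [1, 2, 2, 4, 5, 7, 8, 8, 8, 10]
completed, dropped, repeated = classify_frames(received, expected_frames=10)
print(f"completed={completed} dropped={dropped} repeated={repeated}")
```

Because the receiver knows how many frames to expect, it does not need any feedback from the sender to notice a loss.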

A suggested starting point: measuring a common UDP stream

It's recommended to start with current stream behaviors -- to replicate and measure those streams over time. To express this in scenario terms, Netflix is often streaming much of its 2160p (4K) content at 15.26 Mbps, though they recommend 25 Mbps availability.

We can accurately measure a deployment's ability to stream a 4K movie at these rates. We will accurately send the specified stream and upon receiving verify the data integrity of all datagrams, track all lost frames (which would translate to lost packets), and track all repeated frames (which can happen with various network topologies).

| Server | Client |
|---|---|
| ctsTraffic.exe | ctsTraffic.exe |
| -listen:* | -target:localhost |
| -protocol:udp | -protocol:udp |
| -bitspersecond:25000000 | -bitspersecond:25000000 |
| -framerate:60 | -framerate:60 |
| -bufferdepth:1 | -bufferdepth:1 |
| -streamlength:60 | -streamlength:60 |
| -consoleverbosity:1 | -consoleverbosity:1 |
| | -connections:1 |
| | -iterations:1 |
| | -statusfilename:udpclient.csv |
| | -connectionfilename:udpconnection.csv |
| | -jitterfilename:jitter.csv |

These options tell the client to send a datagram to the server to initiate a "connection" -- after which the server will send 25Mbps of data across 60 "frames" (datagrams) per second. Buffer depth is the time allowance the client gives for variance in receiving datagrams; 1 second is generally fine for most simulations.

The result of this test produces 3 log files, 2 similar to the TCP logs and one which tracks jitter by comparing time stamps within the received datagrams.

Explaining the console output

As a sample run, the below is output from a quick test run over loopback (client and server were both run on my machine). Note that the -consoleverbosity: flag controls the type and detail of what is output to the console (like with TCP, setting 0 turns off all output).

C:\Users\kehor\Desktop\2.0.1.7> ctsTraffic.exe -target:localhost -protocol:udp -bitspersecond:25000000 -framerate:60 -bufferdepth:1 -streamlength:60 -connections:1 -iterations:1 -consoleverbosity:1 -statusfilename:udpclient.csv -connectionfilename:udpconnection.csv -jitterfilename:jitter.csv

Configured Settings

    Protocol: UDP
    Options: InlineIOCP SO_RCVBUF(1048576)
    IO function: MediaStream Client
    IoPattern: MediaStream <UDP controlled stream from server to client>
    PrePostRecvs: 2
    PrePostSends: 1
    Level of verification: Connections & Data
    Port: 4444
    Buffer used for each IO request: 52083 [0xcb73] bytes
    Total transfer per connection: 187498800 bytes
    UDP Stream BitsPerSecond: 25000000 bits per second
    UDP Stream FrameRate: 60 frames per second
    UDP Stream BufferDepth: 1 seconds
    UDP Stream StreamLength: 60 seconds (3600 frames)
    UDP Stream FrameSize: 52083 bytes
    Connecting out to addresses:
           [::1]:4444
           127.0.0.1:4444
    Binding to local addresses for outgoing connections:
           0.0.0.0
           ::
    Connection limit (maximum established connections): 1 [0x1]
    Connection throttling rate (maximum pended connection attempts): 1000 [0x3e8]
    Total outgoing connections before exit (iterations * concurrent connections) : 1 [0x1]

Legend:

* TimeSlice - (seconds) cumulative runtime
* Streams - count of current number of UDP streams
* Bits/Sec - bits streamed within the TimeSlice period
* Completed Frames - count of frames successfully processed within the TimeSlice
* Dropped Frames - count of frames that were never seen within the TimeSlice
* Repeated Frames - count of frames received multiple times within the TimeSlice
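* Stream Errors - count of invalid frames or buffers within the TimeSlice

| TimeSlice | Bits/Sec | Streams | Completed | Dropped | Repeated | Errors |
|---|---|---|---|---|---|---|
| 5.000 | 290 | 1 | 240 | 0 | 0 | 0 |
| 10.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 15.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 20.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 25.000 | 25004840 | 1 | 300 | 0 | 0 | 0 |
| 30.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 35.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 40.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 45.001 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 50.000 | 25004840 | 1 | 300 | 0 | 0 | 0 |
| 55.000 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 60.000 | 25004840 | 1 | 300 | 0 | 0 | 0 |
| 61.273 | 327566 | 0 | 60 | 0 | 0 | 0 |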

Historic Connection Statistics (all connections over the complete lifetime)

SuccessfulConnections [1] NetworkErrors [0] ProtocolErrors [0]

Total Bytes Recv : 187498800

Total Successful Frames : 3600

Total Dropped Frames : 0

Total Duplicate Frames : 0

Total Error Frames : 0

Total Time : 61273 ms.

The banner under Configured Settings shows the default settings along with how the streaming parameters were turned into datagram rates.
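The numbers in that banner can be reproduced from the command line parameters. The sketch below assumes the frame size is simply the per-second byte count divided by the frame rate and truncated to a whole byte; that is inferred from the banner values, not taken from the ctsTraffic source.

```python
bits_per_second = 25_000_000   # -bitspersecond:25000000
frame_rate = 60                # -framerate:60
stream_length_seconds = 60     # -streamlength:60

bytes_per_second = bits_per_second // 8

# Assumption: integer truncation, which matches "FrameSize: 52083 bytes".
frame_size = bytes_per_second // frame_rate

total_frames = frame_rate * stream_length_seconds
total_transfer = frame_size * total_frames

# Bits actually delivered per second once the frame size is truncated.
effective_bits_per_second = frame_size * 8 * frame_rate

print(f"frame size: {frame_size} bytes")
print(f"total frames: {total_frames}")
print(f"total transfer per connection: {total_transfer} bytes")
print(f"effective rate: {effective_bits_per_second} bits/sec")
```

The truncation also explains why the steady-state Bits/Sec column reads 24999840 rather than exactly 25000000.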

-consoleverbosity:1 will output an aggregate status at each time slice. The default time slice is 5 seconds and is configurable with -statusUpdate. At each time slice, one line is output with the aggregate information described in the Legend above.

Explaining the generated log files

In the same sample as above, three log files were created due to the following command line options: "-statusfilename:udpclient.csv -connectionfilename:udpconnection.csv -jitterfilename:jitter.csv". The csv extension told ctsTraffic to write the files in a comma-separated values format (any other extension would be written as a line of text).

StatusFilename

The status file writes out the same information to a csv as is written to console with the above "-consoleverbosity:1" option set. This is useful for later analysis, notably in an application like Excel. Imported into Excel, the data from the 60-second run above would look like this:

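| TimeSlice | Bits/Sec | Streams | Completed | Dropped | Repeated | Errors |
|---|---|---|---|---|---|---|
| 5 | 290 | 1 | 240 | 0 | 0 | 0 |
| 10 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 15 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 20 | 24994841 | 1 | 300 | 0 | 0 | 0 |
| 25 | 25004840 | 1 | 300 | 0 | 0 | 0 |
| 30 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 35 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 40 | 24994841 | 1 | 300 | 0 | 0 | 0 |
| 45.001 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 50 | 25004840 | 1 | 300 | 0 | 0 | 0 |
| 55 | 24994841 | 1 | 300 | 0 | 0 | 0 |
| 60 | 24999840 | 1 | 300 | 0 | 0 | 0 |
| 61.273 | 327566 | 0 | 60 | 0 | 0 | 0 |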

[[CHART]]{.chart}

[[CHART]]{.chart}

ConnectionFilename

The connection file writes out per-connection information to a csv (this is the same as what is written to console with the "-consoleverbosity:3" option set). This is useful for later analysis when wanting to look at patterns across a long test run.

This is similar to the TCP connection view, this time with aggregate data points for each connection.

For a sample, I ran the above 25Mbps run with 10 concurrent UDP streams (-connections:10). Below is the connection output:

[Image: connection log output for the 10 concurrent UDP streams]

In this log file we can see individual UDP sessions ("connections") recorded.

As with the Status file analysis, Excel can give deeper insight into the test run. For example:

JitterFilename

When making a test run with just a single UDP connection, the client can also track jitter information. This is collected by tracking every individual datagram received and looking at the times stamped on it by the server. Even though the client and server are not time synchronized, the client can still analyze the latency deltas by using the first datagram received as its baseline and calculating gaps. Because the client and server had the same parameters specified, the client knows the number of milliseconds that the server would have been waiting between calls to send(). By subtracting the known timer value for which the server was waiting between sends, it can calculate the time between the actual send and the resulting receive.

As a trivial example, here is jitter being tracked over loopback for the first 30 datagrams received:

[Image: jitter log for the first 30 datagrams received over loopback]

The chart shows the sender and receiver QueryPerformanceCounter and QueryPerformanceFrequency values which ctsTraffic stamped in the datagram payload. This leads to being able to calculate "Estimated Received Datagram Time In Flight". You'll note that, being over loopback and given optimizations, the math resulted in the time in flight being negative.
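To make the baseline idea concrete, here is a simplified sketch of one way to turn the stamped counter values into a relative in-flight estimate. It is an interpretation of the description above, not the exact ctsTraffic calculation: both clocks are converted to milliseconds, the first datagram is treated as time zero on each side, and the difference between receiver and sender elapsed time is reported. The record layout and sample values are hypothetical.

```python
def to_milliseconds(counter, frequency):
    """Convert a QueryPerformanceCounter tick count to milliseconds."""
    return counter * 1000.0 / frequency


def relative_in_flight_ms(records):
    """Estimate in-flight time per datagram, relative to the first datagram.

    Each record is (sender_qpc, sender_qpf, receiver_qpc, receiver_qpf).
    The two clocks are not synchronized, so only deltas from the first
    datagram are compared.
    """
    first = records[0]
    base_send = to_milliseconds(first[0], first[1])
    base_recv = to_milliseconds(first[2], first[3])
    estimates = []
    for sender_qpc, sender_qpf, receiver_qpc, receiver_qpf in records:
        send_elapsed = to_milliseconds(sender_qpc, sender_qpf) - base_send
        recv_elapsed = to_milliseconds(receiver_qpc, receiver_qpf) - base_recv
        estimates.append(recv_elapsed - send_elapsed)
    return estimates


# Hypothetical samples: both clocks tick at 10 MHz, frames sent ~16.67 ms apart.
FREQUENCY = 10_000_000
samples = [
    (1_000_000, FREQUENCY, 5_000_000, FREQUENCY),
    (1_166_667, FREQUENCY, 5_170_000, FREQUENCY),  # arrives slightly late
    (1_333_334, FREQUENCY, 5_330_000, FREQUENCY),  # arrives slightly early
]
estimates = relative_in_flight_ms(samples)
for index, value in enumerate(estimates):
    print(f"datagram {index}: {value:+.4f} ms relative to the first datagram")
```

Because everything is measured against the first datagram, any datagram that travels faster than the first one shows up as a negative value, which is one way negative in-flight times can appear over loopback.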

Streaming over Wi-Fi

As a more interesting example, running the above 25Mbps stream from a small Surface laptop over Wi-Fi to another machine also connected over Wi-Fi shows more diverse data.

The status output now shows more variance in throughput as well as infrequent packet drops. Because this was a shorter run (only 60 seconds) and I wanted to look into greater detail, I set -statusUpdate:500 so I had a twice/second updated view of throughput and packet drops.

Here's a sample of the first 10 seconds:

[Image: status output for the first 10 seconds of the Wi-Fi run]

You'll notice now that we have data every ½ second (500 ms). We can see we expected to receive 30 individual frames within each ½ second and there were bursts when datagrams were dropped.

Graphing always helps.

[[CHART]]{.chart}

This is showing bits/second across the entire 60 seconds of the stream. Because this is Excel I could also quickly do =AVERAGE(), =MIN(), and =MAX() to get a slightly better view into the data:

| Function | Bits/Sec |
|---|---|
| AVERAGE | 24091960 |
| MIN | 23333184 |
| MAX | 24850735 |

Just as useful we can look at Jitter data in a more relevant scenario -- here are the first 20 frames:

[Image: jitter log for the first 20 frames of the Wi-Fi run]

We can see that variance drifted quite a bit, with larger gaps with a few negative gaps as datagrams arrived in bursts. Graphing this information gives us insightful views into the variance between datagrams received:

[[CHART]]{.chart}

With these views we can now see the variance distribution over this 60 second 25 Mbps stream of datagrams.