Monday, January 29, 2018

DigitalOcean versus Linode Performance



Background

I've been consistently using Linode since the summer of 2007 for virtual Linux systems. Their support team has been stellar when it comes to promptness and thoroughness. They also tend to hand out nice surprises in the form of disk space and memory around the time of the company's birth date (June 16.) I have been and continue to be a satisfied customer.

Of course, I've also heard of one of their competitors: DigitalOcean. I have historically come across DigitalOcean's various guides during my web sleuthing for a variety of application installation and configuration inquiries, and I have found their documentation to be very helpful. Curiosity finally got the best of me a couple of weeks back and I created an account. I was also considering giving Google Cloud a whirl. I have used AWS and Azure in the past, but the nickel-and-dime cloud pricing model has always bothered me. Easy, flat-rate pricing won for now.

Now that I have given DigitalOcean a spin, I figured I would share my impressions of it along with the results of some performance tests I've conducted in comparison to one of my Linode systems.


What I Like About DigitalOcean

Aside from their excellent collection of documentation, which I've already mentioned, their web interface is clean and easy to navigate. A virtual guest instance is referred to as a "droplet" at DigitalOcean, by the way! I was quite excited to see that they offer and support several releases of FreeBSD. This is something I had wanted Linode to do for nearly a decade without the hassle of pushing your own custom image to their site. Can you guess what OS I selected for my first droplet? :) My droplet was ready in under a minute after filling out the hostname and checking various options (IPv6, internal network support, etc.)

One of the first things I did was recompile my kernel. This operation went smoothly. It is also worth mentioning that I have maintained an SSH connection to my first droplet for over a week straight. The network seems quite stable (my droplet is hosted in the NYC3 datacenter.) One significant advantage I nearly forgot to comment on is that DigitalOcean plans currently offer more disk capacity than equivalent Linode plans. For example, the $20 plan at Linode allows for 48GB of SSD storage whereas the $20 DigitalOcean plan provides a whopping 80GB of SSD storage. This is certainly nothing to scoff at.

Their API looks to be well written, although I have not had the opportunity to use it yet. Additionally, I have not had to open a support ticket for anything at this point. I will update this post with my experience should that ever occur. Lastly, DigitalOcean recently announced the release of their object storage service called "Spaces". I plan to take advantage of their 2-month trial to see what it is all about very soon. So far, I am liking DigitalOcean quite a bit.
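Since I brought up the API: judging purely by DigitalOcean's documentation (again, I have not actually used it myself yet), it is a REST interface driven by JSON and personal access tokens. Listing your droplets with API v2 should look something like the following, where DO_API_TOKEN is my own placeholder for a token generated in the control panel:

curl -X GET -H "Content-Type: application/json" \
     -H "Authorization: Bearer $DO_API_TOKEN" \
     "https://api.digitalocean.com/v2/droplets"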


Performance Test Results

Now that the introductory stuff is out of the way, let's get to the real reason why you're probably here... Let's take a look at the specs first.

                           DigitalOcean                       Linode
Geographic Region          New York City, NY, USA             Dallas, TX, USA
Virtualization Technology  KVM                                KVM
Operating System           FreeBSD 11.1 64-bit                CentOS 7.2 64-bit
CPU Model                  Intel(R) Xeon(R) CPU E5-2650L v3   Intel(R) Xeon(R) CPU E5-2680 v2
CPU Frequency              1.80GHz                            2.80GHz
Core(s)                    1                                  1
Memory                     1GB                                1GB
Disk Capacity              25GB                               20GB
Disk Type                  SSD                                SSD
Filesystem                 ZFS                                ext4
Ingress                    ?                                  40Gbps
Egress                     ?                                  1GbE

CPU Benchmark
Up first are the CPU benchmark results using sysbench, beginning with the DigitalOcean FreeBSD system:
root@vector:/tmp # date; sysbench --test=cpu --cpu-max-prime=20000 run
Sun Jan 28 21:14:48 EST 2018
sysbench 1.0.11 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   238.70

General statistics:
    total time:                          10.0008s
    total number of events:              2388

Latency (ms):
         min:                                  3.86
         avg:                                  4.17
         max:                                  9.88
         95th percentile:                      4.91
         sum:                               9968.19

Threads fairness:
    events (avg/stddev):           2388.0000/0.00
    execution time (avg/stddev):   9.9682/0.00

And now, the Linode Linux system:
[root@stack /tmp]# date; sysbench --test=cpu --cpu-max-prime=20000 run
Sun Jan 28 21:19:11 EST 2018
sysbench 1.0.9 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   351.07

General statistics:
    total time:                          10.0008s
    total number of events:              3512

Latency (ms):
         min:                                  2.81
         avg:                                  2.85
         max:                                  3.82
         95th percentile:                      2.91
         sum:                               9992.46

Threads fairness:
    events (avg/stddev):           3512.0000/0.00
    execution time (avg/stddev):   9.9925/0.00

Despite DigitalOcean sporting a newer Haswell-based Xeon, Linode's older Ivy Bridge Xeon has a faster clock rate and clearly handles more events per second.

CPU Performance Winner: Linode

Memory Benchmark

This test was performed using a pair of 256M tmpfs RAM-based filesystems. Again, the DigitalOcean FreeBSD system is up first:
root@vector:/mnt # mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk1
root@vector:/mnt # mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk2

root@vector:/mnt # df -h | grep ramdisk
tmpfs                 256M    4.0K    256M     0%    /mnt/ramdisk1
tmpfs                 256M    4.0K    256M     0%    /mnt/ramdisk2

root@vector:/mnt # dd if=/dev/zero of=/mnt/ramdisk1/test bs=1M
dd: /mnt/ramdisk1/test: No space left on device
256+0 records in
255+0 records out
267386880 bytes transferred in 0.154986 secs (1725231326 bytes/sec)

root@vector:/mnt # time cp /mnt/ramdisk1/test /mnt/ramdisk2/test

real    0m1.174s
user    0m0.013s
sys     0m0.509s

root@vector:/mnt # time cmp /mnt/ramdisk1/test /mnt/ramdisk2/test

real    0m1.927s
user    0m0.771s
sys     0m0.135s

And next up is the Linode Linux system:
[root@stack /mnt]# mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk1
[root@stack /mnt]# mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk2

[root@stack /mnt]# df -h | grep ramdisk
tmpfs           256M     0  256M   0% /mnt/ramdisk1
tmpfs           256M     0  256M   0% /mnt/ramdisk2

[root@stack /mnt]# dd if=/dev/zero of=/mnt/ramdisk1/test bs=1M
dd: error writing ‘/mnt/ramdisk1/test’: No space left on device
257+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.137701 s, 1.9 GB/s

[root@stack /mnt]# time cp /mnt/ramdisk1/test /mnt/ramdisk2/test

real    0m0.191s
user    0m0.000s
sys     0m0.189s

[root@stack /mnt]# time cmp /mnt/ramdisk1/test /mnt/ramdisk2/test

real    0m0.469s
user    0m0.235s
sys     0m0.232s

The initial memory write times were fairly close between the two providers. However, the in-memory copy and compare operations were significantly faster on the Linode, as shown above.

Memory Performance Winner: Linode

Disk Benchmark

This particular test will use both standard "dd" and sysbench. DigitalOcean FreeBSD system:
root@vector:/tmp # dd if=/dev/zero of=/tmp/test1.img bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 1.928485 secs (543730518 bytes/sec)

root@vector:/tmp # sysbench --test=fileio --file-total-size=1G prepare
sysbench 1.0.11 (using system LuaJIT 2.0.5)

128 files, 8192Kb each, 1024Mb total
Creating files for the test...
Extra file open flags: 0
1073741824 bytes written in 7.59 seconds (134.86 MiB/sec).

And the Linode Linux system:
[root@stack /tmp]# dd if=/dev/zero of=/tmp/test1.img bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.977575 s, 1.1 GB/s

[root@stack /tmp]# sysbench --test=fileio --file-total-size=1G prepare
sysbench 1.0.9 (using system LuaJIT 2.0.4)

128 files, 8192Kb each, 1024Mb total
Creating files for the test...
Extra file open flags: 0
1073741824 bytes written in 1.73 seconds (591.77 MiB/sec).

Disk Performance Winner: Linode (substantially)

Network Benchmark
In this test, I'll be pulling down a 1GB file using wget from a competitor of both companies, Vultr. I will be using Vultr's Chicago datacenter, which is approximately equidistant from Dallas (Linode) and New York City (DigitalOcean), to test download rates. I will then sftp the same file to a server on my LAN (which is also fairly central to both test locations) to test the upstream of both providers. Here are the results from DigitalOcean:
root@vector:/tmp # wget https://il-us-ping.vultr.com/vultr.com.1000MB.bin
--2018-01-28 23:10:38--  https://il-us-ping.vultr.com/vultr.com.1000MB.bin
Resolving il-us-ping.vultr.com (il-us-ping.vultr.com)... 107.191.51.12
Connecting to il-us-ping.vultr.com (il-us-ping.vultr.com)|107.191.51.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: ‘vultr.com.1000MB.bin’

vultr.com.1000MB.bin                                        100%[========================================================================================================================================>]   1000M  48.3MB/s    in 22s

2018-01-28 23:11:00 (44.8 MB/s) - ‘vultr.com.1000MB.bin’ saved [1048576000/1048576000]
<23:13:20> (~)
[lloyd@lindev] :) > sftp testuser@vector
Connected to vector.
sftp> cd /tmp
sftp> get vultr.com.1000MB.bin
Fetching /tmp/vultr.com.1000MB.bin to vultr.com.1000MB.bin
/tmp/vultr.com.1000MB.bin                       6%   67MB   2.0MB/s   07:56 ETA

The download to my LAN capped out at an even 2MB/sec. Below are the Linode results:
[root@stack /tmp]# wget https://il-us-ping.vultr.com/vultr.com.1000MB.bin
--2018-01-28 23:10:45--  https://il-us-ping.vultr.com/vultr.com.1000MB.bin
Resolving il-us-ping.vultr.com (il-us-ping.vultr.com)... 107.191.51.12
Connecting to il-us-ping.vultr.com (il-us-ping.vultr.com)|107.191.51.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1048576000 (1000M) [application/octet-stream]
Saving to: ‘vultr.com.1000MB.bin’

100%[=================================================================================================================================================================================================>] 1,048,576,000 97.6MB/s   in 9.8s

2018-01-28 23:10:55 (102 MB/s) - ‘vultr.com.1000MB.bin’ saved [1048576000/1048576000]
<23:19:28> (~)
[lloyd@lindev] :) > sftp testuser@stack
Connected to stack.
sftp> cd /tmp
sftp> get vultr.com.1000MB.bin
Fetching /tmp/vultr.com.1000MB.bin to vultr.com.1000MB.bin
/tmp/vultr.com.1000MB.bin                       8%   89MB 978.0KB/s   15:53 ETA

The download to my LAN from Linode in this case never went above 1.1MB/sec. The download from Vultr to Linode completed in less than half the time of the DigitalOcean transfer.

Network Performance Winner (Download): Linode
Network Performance Winner (Upload): DigitalOcean

*Performance is relative to location and tubing in this context of course!


Final Thoughts

Linode seems to handily outperform DigitalOcean in practically all test cases. Obviously, there are variances within my control which could impact the outcome. Three that come to mind are the platform (FreeBSD vs. Linux), the filesystem (ZFS vs. ext4), and the sysbench version (1.0.11 vs. 1.0.9). The disk benchmark is potentially the most affected in this case given how differently ZFS and ext4 behave. However, I highly suspect that Linux on DigitalOcean would yield very similar results.

Would I still recommend and continue to use DigitalOcean? Absolutely -- for the reasons I cited at the beginning of this post. Performance certainly is not everything and DigitalOcean still provides more disk capacity for equivalent pricing along with the option to create a FreeBSD system.

Technology and services are constantly changing. DigitalOcean and Linode will undoubtedly upgrade their hardware and modify their plans again in the future. I hope that any readers found this information insightful.

Edit: I forgot to mention that you can use the following link for a $10 credit at DigitalOcean. That is enough for a single $10/month instance for one month, two $5/month instances for one month, or a single $5/month instance for two months so you can test drive the service yourself. Enjoy!: https://m.do.co/c/28ccd3d01301

Saturday, January 27, 2018

Nullifying Triglycerides

For folks looking to significantly reduce their triglycerides without fibrate and statin drugs, read on. I often read medical literature for fun. Two of the sources I frequent are Mayo Clinic and National Institutes of Health. I was facing this problem because I have a desk job, my hobbies (gaming and reading) involve sitting, and I love food. Genetics can also play a role. I am hoping that others find this information useful. The key is applying the knowledge.

Allow me to first explain what a triglyceride is for those who do not know. A triglyceride is a type of fat in the blood. The two most common causes of high triglycerides are consuming too many calories and not expending enough calories. Given the causes, we can deduce that people with high triglycerides should:

1.) Eat fewer calories per day. For non-athletic males, 1,800 calories is probably optimal. For non-athletic, non-pregnant, and non-nursing females, 1,600 calories is probably adequate. It is also interesting to note that caloric restriction has been shown in animal models and some human populations to increase longevity (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3014770/.) Going too low is also unhealthy and poses the potential risk of nutritional deficits.

2.) Move more/stop being sedentary. You do not need to run or do anything intense. Even casual walking is beneficial.

I am inclined to believe that caloric reduction alone would be sufficient to reduce triglycerides over time, even for someone who remains sedentary. However, the convergence of dietary changes and aerobic activity would likely make the goal of triglyceride reduction occur more quickly.

So, where do triglycerides come from? Aside from excess calories in any form, the largest offenders are refined carbohydrates including pizza dough, white bread, white rice, crackers, anything with sugar (honey, brown sugar, white sugar, brownies, muffins, cupcakes, cookies, etc.), pasta, and fruit juices. If these types of foods are not burned off, the liver converts the excess into triglycerides and stores the fat for later use. This is a problem as it can lead to non-alcoholic fatty liver disease (NAFLD.) Furthermore, very high levels of circulating triglycerides can induce gallstone formation and pancreatitis.

Aside from the two solutions above, you can also try 1-4g of omega-3 fatty acids per day. A prescription form is available as Lovaza which contains ethyl esters of eicosapentaenoic (EPA) and docosahexaenoic (DHA) acids. I opted for Carlson lemon-flavored fish oil which supplies 1,600mg of omega-3 fatty acids per teaspoon (800mg of EPA and 500mg of DHA) in triglyceride form. One teaspoon in the morning and one at night worked for me. This method, on average, seems to reduce plasma triglycerides by approximately 30% which is quite significant. Pharmacological effects of omega-3 fatty acids on triglycerides: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563284/

You can also supplement with 500mg to 1g of vitamin C (ascorbic acid and/or ascorbate) to help retard the oxidization of low-density lipoproteins (LDL) and lower triglycerides per this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682928/

Lastly, there is vitamin B3 or niacin in the form of nicotinic acid. It must be the nicotinic acid form, considering that neither nicotinamide/niacinamide nor inositol hexanicotinate will exert the desired effect. The prescription form is Niaspan or Niacor. I was fearful of using this compound as I experienced the infamous flushing effect in my late teenage years when I was more into supplements. I made the mistake of taking a standalone B3 supplement in the nicotinic acid form while also taking a B complex supplement which contained the same form. It was a rather uncomfortable sensation and even triggered a panic attack. I thought I was having an allergic reaction to something. My face had turned a nice shade of red and my skin was burning. My heart rate was also elevated. This is all caused by vasodilation (the opening up of capillaries, which reduces blood pressure -- the heart beats faster to compensate.) The event lasted for approximately 30 minutes. I would never again willingly put myself through that! :)

Aside from reducing triglycerides significantly, nicotinic acid also increases high-density lipoprotein (HDL -- the "good" cholesterol that transports other harmful molecules back to the liver) and reduces LDL (the bad cholesterol.) The flushing effect can be reduced by taking niacin with a meal, taking an aspirin 30 minutes prior, and using extended-release forms. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3858911/

*Disclaimer: Please note that I am NOT a physician. I am merely a hobbyist who enjoys reading medical journals and published health-related studies. Prior to taking supplements, you should consult a pharmacist and/or doctor to ensure that any supplement will not negatively impact existing health conditions or other prescriptions you take.

Sunday, October 2, 2016

The Sad State of Digital "Ownership"

I've been thinking a lot lately about how consumers of computer software have very few rights. EULAs and subscriber agreements seem to give businesses all of the power while end users get little to nothing. This is actually quite depressing and I am surprised that, in 2016, we do not have more rights as digital consumers. It seems like the EFF or someone should be advocating harder for such rights and getting more precedents set. Richard Stallman has done a decent job overall, but comes off as an extremist (no offense to Mr. Stallman as I am an avid GPL fan!) There needs to be a middle ground when it comes to commercial software. What I would ultimately like to see is the following (as it pertains to game software):

  • I should be able to lend my copy of a video game to anyone I see fit just like we still can with physical media for consoles. Steam is almost there with Steam Family Sharing, but they lock your entire library when another person is playing one of your games. The lock should be on a per-game basis. The current per-library lock is far too restrictive to make sharing practical.

  • I should be able to purchase a copy of a game ONE TIME (for the same platform at the very least.) For example, if I purchase Mass Effect 3 for the PC from a store, the license should be valid not just within Origin, but on any PC distribution system such as Steam. Something that has also irked me in this regard is how Square Enix forces you to re-purchase the PC client of Final Fantasy XIV: A Realm Reborn when you already purchased the software for a console.

  • Origin, PlayStation Network Store, Steam, Uplay, Xbox Live Marketplace, and the like can pull the plug on your account at *any* time for *any* reason. Blizzard and Steam do not even tell you why and may even refuse to assist you further. If you spent hundreds to thousands of dollars on games, you lose access to those titles. You should retain the right to use what you paid for despite losing access to the service. This opens up another can of worms: What happens when an MMORPG server goes away? Now you're left with a client that you likely paid for and now it can connect to nothing. You probably weren't refunded either (I'm looking at you, NCSoft.)

  • When I die, I would like to give my game collection to a family member. However, most of the aforementioned digital distribution services do not allow transfer of the account or titles to anyone else.

  • When you legitimately purchase a copy of a game, you should own that copy just like you own physical objects that you pay for such as automobiles, mobile devices, gardening equipment, your toothbrush, etc.

This stuff should anger and/or scare you because it totally stinks. The way things are currently, the virtual items you forked over money for can essentially disappear and you'll have nothing to show for it. I would like to point out that I love Steam. However, this is the biggest fault I find with their service... And it is a glaring one. If they resolved this issue, they would be nearly perfect. For now, I have no choice but to agree to their terms in order to continue accessing my library while having the fear in the back of my mind that my access could be revoked at any moment. I have slowed down on Steam purchases for this very reason.

I have found some hope though with a platform known as GOG. The current selection is nowhere near as large as Steam's, but more titles are being added as time progresses. I've started using their Galaxy Client which behaves similarly to Steam. Their games are not tied to the service. Once you download them, they are yours forever. You can also re-download them, although you should make backups just in case the service ever ceases to exist. Lastly, there is GOG Connect which lets you import some of your Steam games to your GOG library at no cost. You'll then have two separate copies of the game, one in each library. The selection of games on Connect is quite small at the moment, but the amount is expected to grow as GOG develops better relationships with publishers. I highly recommend GOG to anyone upset by the current state of things or if you're just a PC gamer.

Our savior?
I usually do not complain publicly, but wanted to get all of these thoughts out there in the hopes that more people express their disappointment with the present software licensing and lack of digital consumer rights situation.

Sunday, December 20, 2015

Tactfully Handling Common Java Complaints

Java Sucks! ... Or Not?

So, your friends tell you Java blows chunks. They've either heard/read it elsewhere, had a bad history with slow-loading applets in 1996, or have personally worked with the language and loathe it. I'd like to dispel some of the arguments against Java in the modern age since I think it is a decent language. Let's start with some common things we hear or see about Java...

Java is Slow!

I remember loading applets on websites in Windows 95 and absolutely detesting the experience. Of course, this was on a 486 SX operating at 25 MHz with 4M of RAM and a 14.4Kbps modem. Java was also still in its infancy. Many people had similar experiences and the rancor spread. Unfortunately, such sentiment still exists today -- 20 years later. I am even guilty. A coworker in 2006 had an O'Reilly Java book he offered to lend me. I declined the offer and poked fun at him for even suggesting such a thing. Fast forward two years and I had become a Java fan. It was not until I was forced to learn the language in college that I developed an appreciation for it. Let's review some facts:
  1. Hardware has improved since the 1990s. Processors are faster and have more cache and registers. Bus speeds, disks, memory, and network access are also faster which improve program load times.
  2. The JVM has improved since the 1990s. HotSpot/JIT (just-in-time compilation), JNI (Java Native Interface), and other features have been added. There is also an array of garbage collection algorithms to choose from depending on your application.
With the convergence of #1 and #2 above, Java performance has come a long way. However, in many (not all) cases, C and C++ programs still handily beat Java in the performance department. Have a look at the array addition performance comparison below. The code is functionally equivalent in both the C and Java programs. Two one-dimensional arrays are filled with random numbers ranging from 1 to 65,535. The sum of the elements at the current index in both arrays is stored at the current position in the first array.
#include <stdlib.h> /* rand() and srand() */
#include <time.h>   /* time_t */

void fill_rand(int *arr, int length)
{
  int i;
  for(i = 0; i < length; i++)
    arr[i] = rand() % 65535 + 1;
  return;
}

void add_arr(int *arr1, int *arr2, int length)
{
  int i;
  for(i = 0; i < length; i++)
    arr1[i] += arr2[i];
  return;
}

int main()
{
  int arr1[5000];
  int arr2[5000];
  time_t t;
  srand((unsigned)time(&t));
  fill_rand(arr1, 5000);
  fill_rand(arr2, 5000);
  add_arr(arr1, arr2, 5000);
  return 0;
}
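
Before moving on to the Java version, note that the timing figures quoted below (roughly 4ms for the C program) are not produced by the listing above as-is. Here is one way the C side could be measured -- a sketch of my own using POSIX clock_gettime(), not necessarily how the original number was captured:

#include <stdio.h>  /* printf() */
#include <stdlib.h> /* rand() and srand() */
#include <time.h>   /* time(), clock_gettime(), and struct timespec */

void fill_rand(int *arr, int length)
{
  int i;
  for(i = 0; i < length; i++)
    arr[i] = rand() % 65535 + 1;
}

void add_arr(int *arr1, int *arr2, int length)
{
  int i;
  for(i = 0; i < length; i++)
    arr1[i] += arr2[i];
}

int main()
{
  int arr1[5000];
  int arr2[5000];
  struct timespec start, end;
  srand((unsigned)time(NULL));

  clock_gettime(CLOCK_MONOTONIC, &start);
  fill_rand(arr1, 5000);
  fill_rand(arr2, 5000);
  add_arr(arr1, arr2, 5000);
  clock_gettime(CLOCK_MONOTONIC, &end);

  /* elapsed wall-clock time in milliseconds */
  printf("%ld ms\n", (end.tv_sec - start.tv_sec) * 1000
                   + (end.tv_nsec - start.tv_nsec) / 1000000);
  return 0;
}

Compile it with gcc <file>.c -o arr and run ./arr. Now here is the functionally-equivalent Java program: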

import java.util.Random;

class Arr
{
  static void fillRand(int[] arr)
  {
    Random rand = new Random();
    for(int i = 0; i < arr.length; i++)
      arr[i] = rand.nextInt(65535) + 1;
  }

  static void addArr(int[] arr1, int[] arr2)
  {
    for(int i = 0; i < arr1.length; i++)
      arr1[i] += arr2[i];
  }

  public static void main(String[] args)
  {
    int[] arr1 = new int[5000];
    int[] arr2 = new int[5000];
    fillRand(arr1);
    fillRand(arr2);
    addArr(arr1, arr2);
  }
}

The C code executes in approximately 4ms. In comparison, the Java equivalent takes about 207ms. That is over fifty times longer than the C program. Why? Well, the JVM "warm-up" time needs to be considered. If we make the following changes to the main() method to disregard warm-up time, we get a more reasonable execution time of about 6ms:
    long startTime = System.nanoTime();
    int[] arr1 = new int[5000];
    int[] arr2 = new int[5000];
    fillRand(arr1);
    fillRand(arr2);
    addArr(arr1, arr2);
    long endTime = System.nanoTime();
    System.out.println((endTime - startTime) / 1000000); // get ms

That isn't so bad. In fact, that is where JIT shines for long-running and commonly-executed code. If we arbitrarily loop over the C program 100 times and sleep 1 second between iterations, the execution time will be similar for each iteration whereas the Java program should become faster (until a peak is achieved.) Modifying main() demonstrates this:
  public static void main(String[] args) throws InterruptedException
  {
    for(int i = 0; i < 100; i++)
    {
      Thread.sleep(1000);
      long startTime = System.nanoTime();
      int[] arr1 = new int[5000];
      int[] arr2 = new int[5000];
      fillRand(arr1);
      fillRand(arr2);
      addArr(arr1, arr2);
      long endTime = System.nanoTime();
      System.out.println("Iteration " + i + ": " + (endTime - startTime) / 1000000 + "ms");
    }
  }

[lloyd@lindev ~]$ java Arr
Iteration 0: 6ms
Iteration 1: 3ms
Iteration 2: 3ms
Iteration 3: 3ms
Iteration 4: 3ms
Iteration 5: 3ms
Iteration 6: 3ms
Iteration 7: 2ms
Iteration 8: 2ms
Iteration 9: 2ms
Iteration 10: 2ms
Iteration 11: 2ms
Iteration 12: 1ms
Iteration 13: 1ms
Iteration 14: 1ms
Iteration 15: 1ms
Iteration 16: 1ms
Iteration 17: 1ms
Iteration 18: 1ms
Iteration 19: 0ms
Iteration 20: 0ms
Iteration 21: 0ms
Iteration 22: 0ms
Iteration 23: 0ms
Iteration 24: 0ms
Iteration 25: 0ms
Iteration 26: 0ms
Iteration 27: 0ms
Iteration 28: 0ms
Iteration 29: 0ms
Iteration 30: 0ms
Iteration 31: 0ms
...

The code within the loop body eventually executes faster than the equivalent C code due to JVM runtime optimizations. Even though the program reports a time of 0ms, each iteration obviously still takes some number of micro/nanoseconds to compute; the milliseconds value is simply truncated to zero.
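
If you want to watch HotSpot do this in real time, the stock JVM can log methods as it compiles them (the flag is real, though the exact output format varies between JVM versions):

java -XX:+PrintCompilation Arr

You should see entries for methods such as Arr.fillRand and Arr.addArr appear during the early iterations, which lines up with the per-iteration times dropping above.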

Java is Insecure!

As with most programs written in C/C++ (Apache HTTPD, ISC BIND, OpenSSL, etc.), vulnerabilities are periodically discovered in Java. In native code, these primarily stem from the potential dangers of inappropriate pointer use or from undersized buffers which allow overflows. The Java language itself features policies (via the security manager) you can manipulate to effectively sandbox an application. This isolation limits what the program can do with resources such as disks and network access. Another thing to consider is that the JVM is an entire platform and is fairly sizable. The JVM needs to define types, abstract networking and GUI elements, and more. There is obviously an increased risk of bugs the more lines of code a program contains. For most Windows installations, Java even nags you when updates are available or takes care of updating itself automagically. In summary, Java has admittedly had a large number of vulnerabilities over the years. Battling exploits is a part of life that IT folks must deal with. On the bright side, operating systems, web browsers, and Adobe Flash seem to have more vulnerabilities, and keeping Java up-to-date is relatively easy.
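
To make the sandboxing point concrete, here is a minimal sketch of running a program under the security manager. The policy file name and the permission granted below are placeholders of my own choosing, not anything prescribed:

java -Djava.security.manager -Djava.security.policy=app.policy MyApp

An app.policy that only allows reading files beneath a single directory might look like:

grant {
  permission java.io.FilePermission "/opt/myapp/data/-", "read";
};

Anything the policy does not grant (opening sockets, writing files, etc.) results in a SecurityException at runtime.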

Java is Bloated!

Java can handily consume a substantial amount of memory. I had 4M of RAM in 1994 or so. It is fairly common for users to have 8G or 16G these days. Of course, that is no reason to be wasteful. But, one must again consider that the JVM is an entire platform with a large set of features. The memory overhead that accompanies this should be expected. There are many things you can do to tune JVM memory usage. To put things into perspective, I am running a Tomcat instance, a Jetty instance, and one more JVM instance for a JRuby daemon on a virtual private server with 2G of memory (which is also running a plethora of other services) without breaking a sweat. Java also runs on many less-powerful mobile and embedded devices. To recap: Memory is plentiful and fairly cheap these days, the JVM can be tuned to use less memory, and don't be such a tightwad!
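
On the tuning point, heap size is the big lever and it is controlled with standard flags. The values below are arbitrary examples rather than recommendations:

java -Xms64m -Xmx256m MyApp

-Xms sets the initial heap size and -Xmx caps the maximum. Capping -Xmx on each JVM is precisely how several instances (Tomcat, Jetty, a JRuby daemon) can coexist on a 2G server without starving one another.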

Why Java is Annoying

  • Java language lawyers who believe the JLS (Java Language Specification) is the only thing that matters. To them, memory addresses do not exist...
  • Unlike Ruby, not everything is an object (primitives like byte, short, int, long, etc.)
  • No explicit pointers
  • The library is too big
  • Calls to System.gc() are only suggestions that can be ignored
  • Cannot explicitly call destructors
  • Forced to put main() method in a class
  • Syntax can be very verbose/repetitive
  • No operator overloading
  • No multiple inheritance
  • It can be a hog unless you cap the heap
  • No native way to become a daemon or service
  • Others?

Why Java is Awesome

  • It picks up after you
  • Not having to deal with explicit pointers
  • Huge library
  • No operator overloading
  • No multiple inheritance
  • Portable networking and GUI code
  • Largely portable for most other things/compile once run anywhere
  • No need for sizeof
  • It's ubiquitous
  • Multithreading
  • Others?

Thursday, December 10, 2015

64-bit Linux Assembly Tutorial Part II

Introduction

Welcome to the second installment of the 64-bit Linux assembly tutorial. If you have not yet read part one of this tutorial, you can do so here. If you have read it, I hope that you enjoyed it. We will be covering networking, debugging, optimizing, endianness, and analogous C code in this tutorial. Let us get to it!


The C Way

We are going to start by looking at how you create a network program in C. See Beej's Guide to Network Programming for more information. I am illustrating socket programming in a higher-level language to give you a better idea of the sequence of events that occur. In order to accept network connections in a C program (or assembly), you must take the following steps:
  1. Call socket() to obtain a file descriptor to be used for communication. We used file descriptors in the first tutorial (stdin/0 and stdout/1 specifically.)
  2. Call bind() to associate (or bind) the IP address of a network interface with the file descriptor returned by socket().
  3. Call listen() to make the file descriptor be receptive to incoming network connections.
  4. Call accept() to handle incoming network connections.
accept() returns a file descriptor for the client which you can use to send and receive data to and from the remote end. You can also call close() on the client file descriptor once you are done receiving or transmitting data. After putting it all together, it would look something like:
#include <stdio.h>        /* for printf() and puts() */
#include <stdlib.h>       /* for exit() and perror() */
#include <string.h>       /* for strlen() */
#include <unistd.h>       /* for close() */
#include <sys/socket.h>   /* for AF_INET, SOCK_STREAM, and socklen_t */
#include <netinet/in.h>   /* for INADDR_ANY and sockaddr_in */

#define PORT 9990         /* TCP port number to accept connections on */
#define BACKLOG 10        /* connection queue limit */

int main()
{
  /* server and connecting client file descriptors */
  int server_fd, client_fd;

  /* size of sockaddr_in structure */
  int addrlen;

  /* includes information for the server socket */
  struct sockaddr_in server_address;

  /* message we send to connecting clients */
  char *message = "Greetings!\n";

  /* socket() - returns a file descriptor we can use for our server
   * or -1 if there was a problem
   * Arguments:
   * AF_INET = address family Internet (for Internet addressing)
   * SOCK_STREAM = TCP (Transmission Control Protocol)
   * 0 = default protocol for this type of socket
   */
  server_fd = socket(AF_INET, SOCK_STREAM, 0);

  /* Check for an error */
  if(server_fd == -1)
  {
    perror("Unable to obtain a file descriptor for the server");
    exit(1);
  }

  server_address.sin_family = AF_INET;

  /* set the listen address to any/all available */
  server_address.sin_addr.s_addr = INADDR_ANY;

  /* The htons() function below deals with endian conversion which
   * we'll discuss later. This assignment sets the port number to
   * accept connections on. */
  server_address.sin_port = htons(PORT);

  /* bind() - binds the IP address to the server's file descriptor or
   * returns -1 if there was a problem */
  if(bind(server_fd, (struct sockaddr *)&server_address,
          sizeof(server_address)) == -1)
  {
    perror("Unable to bind");
    exit(1);
  }

  /* listen() - listen for incoming connections */
  if(listen(server_fd, BACKLOG) == -1)
  {
    puts("Failed to listen on server socket!");
    exit(1);
  }

  addrlen = sizeof(server_address);

  puts("Waiting for connections...");

  /* Infinite loop to accept connections forever */
  for(;;)
  {
    /* accept() - handle new client connections */
    client_fd = accept(server_fd, (struct sockaddr *)&server_address,
                       (socklen_t*)&addrlen);
    if(client_fd == -1)
    {
      perror("Unable to accept client connection");
      continue;
    }
    /* Send greeting to client and then disconnect them */
    send(client_fd, message, strlen(message), 0);
    close(client_fd);
  }

  return 0;
}

You should be able to copy and paste the above code into a text file.
Compile it with: gcc <file>.c -o network_example
After compiling the program, execute it with: ./network_example
If all went well, you should see something similar to below:
[lloyd@lindev ~]$ ./network_example
Waiting for connections...

Open another terminal and issue: telnet localhost 9990
You should see something like the following:
[lloyd@lindev ~]$ telnet localhost 9990
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Greetings!
Connection closed by foreign host.

You can read more about bind(), listen(), and accept() if you're interested. Next up, we're going to replicate the above C program in x86-64 assembly. Let's see how it looks...


The Assembly Way

[BITS 64]

; Description: 64-bit Linux TCP server
; Author: Lloyd Dilley
; Date: 04/02/2014

struc sockaddr_in
  .sin_family resw 1
  .sin_port resw 1
  .sin_address resd 1
  .sin_zero resq 1
endstruc

section .bss
  peeraddr:
    istruc sockaddr_in
      at sockaddr_in.sin_family, resw 1
      at sockaddr_in.sin_port, resw 1
      at sockaddr_in.sin_address, resd 1
      at sockaddr_in.sin_zero, resq 1
    iend

section .data
  waiting:      db 'Waiting for connections...',0x0A
  waiting_len:  equ $-waiting
  greeting:     db 'Greetings!',0x0A
  greeting_len: equ $-greeting
  error:        db 'An error was encountered!',0x0A
  error_len:    equ $-error
  addr_len:     dq 16
  sockaddr:
    istruc sockaddr_in
      ; AF_INET
      at sockaddr_in.sin_family, dw 2
      ; TCP port 9990 (network byte order)
      at sockaddr_in.sin_port, dw 0x0627
      ; 127.0.0.1 (network byte order)
      at sockaddr_in.sin_address, dd 0x0100007F
      at sockaddr_in.sin_zero, dq 0
    iend

section .text
global _start
_start:
  ; Get a file descriptor for sys_bind
  mov rax, 41           ; sys_socket
  mov rdi, 2            ; AF_INET
  mov rsi, 1            ; SOCK_STREAM
  mov rdx, 0            ; protocol
  syscall
  mov r13, rax
  push rax              ; store return value (fd)
  test rax, rax         ; check if -1 was returned
  js exit_error

  ; Bind to a socket
  mov rax, 49           ; sys_bind
  pop rdi               ; file descriptor from sys_socket
  mov rbx, rdi          ; preserve server fd (rbx is saved across calls)
  mov rsi, sockaddr
  mov rdx, 16           ; size of the sockaddr_in structure (16 bytes)
  syscall
  push rax
  test rax, rax
  js exit_error

  ; Listen for connections
  mov rax, 50           ; sys_listen
  mov rdi, rbx          ; fd
  mov rsi, 10           ; backlog
  syscall
  push rax
  test rax, rax
  js exit_error
  ; Notify user that we're ready to listen for incoming connections
  mov rax, 1            ; sys_write
  mov rdi, 1            ; file descriptor (1 is stdout)
  mov rsi, waiting
  mov rdx, waiting_len
  syscall
  call accept

accept:
  ; Accept connections
  mov rax, 43           ; sys_accept
  mov rdi, rbx          ; fd
  mov rsi, peeraddr
  lea rdx, [addr_len]
  syscall
  push rax
  test rax, rax
  js exit_error

  ; Send data
  mov rax, 1
  pop rdi               ; peer fd
  mov r15, rdi          ; preserve peer fd (r15 is saved across calls)
  mov rsi, greeting
  mov rdx, greeting_len
  syscall
  push rax
  test rax, rax
  js exit_error

  ; Close peer socket
  mov rax, 3            ; sys_close
  mov rdi, r15          ; fd
  syscall
  push rax
  test rax, rax
  js exit_error
  ;jz shutdown
  call accept           ; loop forever if preceding line is commented out

shutdown:
  ; Close server socket
  mov rax, 3
  mov rdi, rbx
  syscall
  push rax
  test rax, rax
  js exit_error

  ; Exit normally
  mov rax, 60           ; sys_exit
  xor rdi, rdi          ; return code 0
  syscall

exit_error:
  mov rax, 1
  mov rdi, 1
  mov rsi, error
  mov rdx, error_len
  syscall

  mov rax, 60
  pop rdi               ; stored error code
  syscall

Thank goodness for high-level languages, eh?
You can assemble and link just like you did from the first tutorial:
nasm -f elf64 -o network_example.o network_example.asm
ld -o network_example network_example.o

You can then execute the program and test it with telnet the same way you did with the C version. The functionality should be very similar.


Dissecting the Beast

NASM allows programmers to use structs, so we take advantage of this for better data organization. Just like in the C program, a sockaddr_in structure is defined. This is essentially a template which holds various data members. For review, the BSS section contains memory set aside for variable data during runtime. This makes sense considering it is not known what our connecting client source addresses and ports will be. And since we know what address and port to use on the server side, the information can be set in the data section as literals. I also touched on data types some in the first tutorial. The table below contains the types used in this program along with their sizes and examples.

Type     Size                                              Example
resb/db  1 byte (8 bits)                                   A keyboard character such as the letter 'c'
resw/dw  2 bytes (16 bits) -- also called a "word"         A network port with a maximum value of 65,535
resd/dd  4 bytes (32 bits) -- also called a "double word"  An IPv4 address such as 192.168.1.1
resq/dq  8 bytes (64 bits) -- also called a "quad word"    A "long long" in C/C++ or a double-precision floating-point number

An "octa word" (128 bits) is also worth mentioning, but is not used in this program. These are used for scientific calculations, graphics, IPv6 addresses, globally unique IDs (GUIDs), etc. The dX variety are initialized and the 'd' stands for "data". So, db is "data byte" and dw is "data word". The resX assortment is used for reserving space for uninitialized data. resb would be "reserve byte" and resq is "reserve quad" for example. The "at" macro gets at each field and sets it with the specified data. "struc" and "endstruc" define a structure. "istruc" and "iend" declare an instance of a structure. You can see in the code how to refer to an instance by using a label (peeraddr for example.)

In the text section (code), you should be able to get an idea of what is going on with the comments. The format is the same as the program from the first tutorial. It is all a matter of putting bullets (data) in certain positions (registers) of a revolver and then pulling the trigger with syscall. That is an analogy I like to use anyway. Again, you can refer to Ryan A. Chapman's 64-bit Linux system call table for reference. sys_bind, sys_listen, sys_accept, and other calls are all present there.
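
Incidentally, you can watch the same revolver fire from C. glibc provides a syscall() wrapper that lets you load the chambers yourself -- this is purely an illustrative aside of mine, not part of the tutorial's assembly:

#define _GNU_SOURCE
#include <unistd.h>      /* syscall() */
#include <sys/syscall.h> /* SYS_write */

int main()
{
  const char msg[] = "Waiting for connections...\n";
  /* same idea as the assembly: the call number (1 for sys_write) lands in
   * rax, and the arguments (fd, buffer, length) land in rdi, rsi, and rdx */
  syscall(SYS_write, 1, msg, sizeof(msg) - 1);
  return 0;
}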


Ten Little Endians

Endianness (name originates from Gulliver's Travels) refers to the way data is arranged in memory in the context of hardware architectures. I bring this up because we needed to call htons() (short data from host to network order) in our C program on the network port. We also needed to convert the loopback IP address and TCP port number to network byte order in the assembly program.

x86/x86-64 are considered little-endian architectures whereas SPARC is big endian. Some processors, such as PPC, can handle both modes and are referred to as bi-endian. What does this mean exactly? Well, on little-endian machines, the most-significant byte (MSB) is stored at the highest memory address. The least-significant byte (LSB) is stored at the lowest address. Big endian is the reverse of this. An example would be storing the four bytes that make up the word "BEEF" as a single 32-bit value. Using the ASCII values for each letter in hexadecimal: 'B' is 0x42, 'E' is 0x45, and 'F' is 0x46. On a big-endian system, the arrangement of bytes would appear as: 42 45 45 46. However, on a little-endian system, they would appear as: 46 45 45 42. Obviously, debugging is easier on a big-endian system since data is still easily readable by humans. Meanwhile, little endian has the advantage that a programmer can determine whether a number is even or odd by looking at the byte stored at the lowest address (the LSB).

Due to these differences, the need for a common format for data being transmitted over a network was clear. Big endian or network byte order was decided on for this purpose. How can we convert? The easiest method is to use a calculator in programmer mode. Windows calculator supports this mode. The TCP port number 9990 in decimal is 2706 in hex. Since 0x27 is the most significant part, it goes in the right-most slot. 0x06 goes on the left resulting in 0x0627. This is similar for the IP address. Each octet of 127.0.0.1 must be converted to hex. This yields 7F 00 00 01. Again, 127 or 0x7F is the most significant part, so it goes on the far right (lowest memory address.) You end up with 0x0100007F.
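
If you would rather not trust the calculator, a few lines of C can confirm both conversions (and the "BEEF" byte layout from the previous section). On an x86-64 machine, this should print 46 45 45 42 on the first line, followed by the two network-order values:

#include <stdio.h>
#include <arpa/inet.h> /* htons() and htonl() */

int main()
{
  unsigned int beef = 0x42454546; /* "BEEF" as a 32-bit value */
  unsigned char *p = (unsigned char *)&beef;

  /* dump the bytes as they sit in memory, lowest address first */
  printf("%02X %02X %02X %02X\n", p[0], p[1], p[2], p[3]);

  /* the same conversions our TCP server hardcodes */
  printf("htons(9990) = 0x%04X\n", (unsigned)htons(9990));
  printf("htonl(0x7F000001) = 0x%08X\n", (unsigned)htonl(0x7F000001));
  return 0;
}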


A Closer Look

You can use gdb or valgrind to debug of course, but this section is more about tracing program execution to demonstrate what is going on from an OS perspective with system calls. If you have strace installed, issue:
strace -f ./network_example

You can actually see each system call from the assembly program and the arguments that populate each function, such as the connecting client's source port in the peer address. Try connecting with telnet while the trace is still running and you can see write() and close() being called. Have a look:
[lloyd@lindev ~]$ strace -f ./network_example
execve("./network_example", ["./network_example"], [/* 26 vars */]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(9990), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(3, 10)                           = 0
write(1, "Waiting for connections...\n", 27Waiting for connections...
) = 27
accept(3, {sa_family=AF_INET, sin_port=htons(47944), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
write(4, "Greetings!\n", 11)            = 11
close(4)                                = 0
accept(3, {sa_family=AF_INET, sin_port=htons(47946), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
write(4, "Greetings!\n", 11)            = 11
close(4)                                = 0
accept(3, ^CProcess 27238 detached
 

You can see from above that the server is assigned a file descriptor of 3 and the client is 4. 11 is the length of the greeting sent to the client. sin_port and sin_addr from accept() contain the connecting client's source IP address and port. Pretty slick, huh?


Compacting A Compact Program

As you can see, the size difference between the assembly program and the C program is significant. The functionally-equivalent C program is over 4 times as large:
[lloyd@lindev ~]$ ls -lah network_example_*
-rwxr-xr-x. 1 lloyd linux_users 2.1K Dec 10 03:51 network_example_asm
-rwxr-xr-x. 1 lloyd linux_users 8.9K Dec 10 04:06 network_example_c

Let's see if we can squeeze both of these binaries a bit more...
[lloyd@lindev ~]$ strip -s network_example_*
[lloyd@lindev ~]$ ls -lah network_example_*
-rwxr-xr-x. 1 lloyd linux_users  888 Dec 10 04:28 network_example_asm
-rwxr-xr-x. 1 lloyd linux_users 6.2K Dec 10 04:28 network_example_c

Even after shaving off symbols from both binaries, the C program is roughly 7 times larger than the assembly program. The assembly program isn't even 1K. This is a testament to assembly's efficiency. Yay for assembly!


Sayonara

I apologize for the delay between the first tutorial and this one. Better late than never, right? I hope people still find this information useful. If you have any questions or feedback, please drop me a line in the comments and I would be happy to reply.