Behavior Analysis in Modified Cuckoo Sandbox

The goal of this project is to build an automatic dynamic malware analysis system. We use Cuckoo Sandbox as the dynamic behavior collection platform. Cuckoo Sandbox has been developed in the open-source community for over four years, and its well-structured modular architecture makes it easy to customize.

Installation

Without the VM clone, setting up Cuckoo Sandbox takes three steps: install the required packages on the host machine, install Cuckoo, and install the guest virtual machine. With the VM clone, all of these components are already set up and ready to use.

To use the VM clone, create a new 64-bit Ubuntu virtual machine in VirtualBox. At the step that creates the hard drive, choose the option “Use an existing virtual hard drive file” and select the .vdi file from the VM clone directory. Finish the remaining configuration with the default settings. You will then have a virtual machine with Cuckoo Sandbox installed and ready to use.

The usernames and passwords for the Ubuntu host:

root password: 123456

username: rui
password: 123456

The Windows guest does not have a password.

Configure Cuckoo

Before running Cuckoo, make sure it is configured correctly by editing the configuration files in the conf directory. Only the following configuration files are needed:

1. cuckoo.conf
2. virtualbox.conf
3. processing.conf
4. reporting.conf
5. auxiliary.conf

They are fully configured in our copy. If you need to change anything, refer to the detailed comments in each configuration file.

How to run Cuckoo sandbox

There are two steps:

  1. Run the Python script cuckoo.py in the application root directory. This script starts Cuckoo Sandbox. Running cuckoo.py --debug prints detailed debug information during startup. See the official manual for other command-line options.

    Note: Before running Cuckoo for the first time, start VirtualBox and the Windows 7 guest machine manually. Otherwise Cuckoo fails during startup, because it cannot bind to the virtual network adapter we installed in VirtualBox until the guest has been booted. See the appendix for more details about the network setup between the guest and the host.

  2. Submit the sample to Cuckoo by running the Python script utils/submit.py /path/to/sample. This starts the analysis of the submitted sample. Other options can be given at submission time, such as --max, which sets the maximum number of samples to add for analysis. See the official manual for other options.

Quick start commands:

#1. Start Cuckoo
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py
# with debug-level logs dumped to stdout
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py --debug

#2. Submit a sample to Cuckoo
rui@rui-VirtualBox:~/cuckoo$ ./utils/submit.py /path/to/the/sample
# example of batch analysis of samples from a directory, 10 samples
rui@rui-VirtualBox:~/cuckoo$ ./utils/submit.py --max 10 /path/to/the/sample/directory

#3. Check the statistics
rui@rui-VirtualBox:~/cuckoo$ ./utils/stats.py

#4. Remove the data generated by the analysis (db, storage, and log) from the Cuckoo directory
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py --clean

#5. Combine the multiple JSON analysis results generated in vtfeature. The purpose of
#   this is to generate a dataset as input to the existing machine learning algorithms.
rui@rui-VirtualBox:~/cuckoo$ ./utils/joinjson.py

#6. To enable or disable screenshots,
#   modify the code at cuckoo/analyzer/windows/modules/auxiliary/screenshots.py:27
#	"self.do_run = True" to enable
#	"self.do_run = False" to disable
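
For batch runs, the submission step can be scripted. The following is a minimal sketch, assuming it is run against the Cuckoo root directory used in the commands above. The helper names build_submit_command and submit are hypothetical and not part of Cuckoo; only the --max option described above is used.

```python
import subprocess


def build_submit_command(target, max_samples=None, submit_script="./utils/submit.py"):
    """Build the argument list for utils/submit.py.

    target may be a single sample or a directory of samples.
    max_samples maps to the --max option described above.
    """
    command = [submit_script]
    if max_samples is not None:
        command += ["--max", str(max_samples)]
    command.append(target)
    return command


def submit(target, max_samples=None, cuckoo_root="."):
    """Hypothetical wrapper: run the submission from the Cuckoo root directory.

    Returns the exit status of the submit script.
    """
    return subprocess.call(build_submit_command(target, max_samples), cwd=cuckoo_root)
```

Building the argument list separately from running it keeps the command easy to inspect or log before any sample is actually submitted.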

Cuckoo reports and their locations

Report generation is configured in the files processing.conf and reporting.conf. We are interested in two kinds of reports: one resides in the directory storage/rawfeature/ and the other in storage/vtfeature/. The rawfeature reports contain complete behavioral profiles, while the vtfeature reports contain tailored profiles that match the information obtained from virustotal.com. Both kinds of report are JSON files, one per analyzed sample, named after the sample's SHA256.

The script utils/joinjson.py joins all the generated JSON files in storage/vtfeature/. The joined file is written to storage/dataset/. Its name is the string dataset-, followed by a decimal number and the suffix .json. The number tells us how many sample profiles from storage/vtfeature/ were joined; for example, dataset-10.json contains 10 profiles.
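
The joining step can be sketched as follows. This is not the actual utils/joinjson.py; it is a minimal illustration that assumes the joined file is a JSON list of the individual profiles, which the description above does not specify. The function name join_json is hypothetical.

```python
import glob
import json
import os


def join_json(vtfeature_dir, dataset_dir):
    """Join every *.json profile in vtfeature_dir into one dataset file.

    Returns the path of the written file, dataset-<count>.json, where
    <count> is the number of profiles that were joined.
    """
    profiles = []
    for path in sorted(glob.glob(os.path.join(vtfeature_dir, "*.json"))):
        with open(path) as handle:
            profiles.append(json.load(handle))

    if not os.path.isdir(dataset_dir):
        os.makedirs(dataset_dir)

    out_name = "dataset-{0}.json".format(len(profiles))
    out_path = os.path.join(dataset_dir, out_name)
    with open(out_path, "w") as handle:
        # Assumption: the dataset is stored as a JSON list of profiles.
        json.dump(profiles, handle, indent=4)
    return out_path
```

Calling join_json("storage/vtfeature", "storage/dataset") with two profiles present would write storage/dataset/dataset-2.json.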

Where to find the analysis results?

The following is the directory structure of the analysis output; see the comments beside each file or directory.

storage
├── analyses
│   ├── 1
│   │   ├── analysis.log	#log from the analysis guest
│   │   ├── binary
│   │   ├── dump.pcap
│   │   ├── files			#files created by the malware are uploaded here
│   │   │   ├── 1260809960
│   │   │   │   └── WebBrowser_embedded.exe
...
│   │   │   └── 9504481298
│   │   │       └── Failed.htm
│   │   ├── logs			#logs of the hooked APIs
│   │   │   └── 1364.bson
│   │   ├── reports			#reports generated by vanilla Cuckoo
│   │   │   ├── report.html
│   │   │   └── report.json
│   │   └── shots			#screenshots of the analysis machine
│   ├── 2
│   │   ├── analysis.log
... #similar structure to the above
│   │   └── shots
│   └── latest -> /home/rui/cuckoo/storage/analyses/2
├── binaries
│   ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b
│   └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14
├── dataset					#generated by manually running ./utils/joinjson.py
│   ├── dataset-2.json		#the number after the dash is the total number of JSON files joined
├── rawfeature				#JSON files containing comprehensive behavior reports
│   ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b.json
│   └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14.json
└── vtfeature				#JSON files containing tailored reports that match VirusTotal
    ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b.json
    └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14.json
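
Because the binaries and both kinds of report are named after the sample's SHA256, the files belonging to one sample can be located directly from the layout above. The following is a small sketch under that assumption; report_paths and analyzed_samples are hypothetical helper names, not part of Cuckoo.

```python
import os


def report_paths(storage_dir, sha256):
    """Return the expected locations of the stored binary and both reports."""
    return {
        "binary": os.path.join(storage_dir, "binaries", sha256),
        "rawfeature": os.path.join(storage_dir, "rawfeature", sha256 + ".json"),
        "vtfeature": os.path.join(storage_dir, "vtfeature", sha256 + ".json"),
    }


def analyzed_samples(storage_dir):
    """List the SHA256 names that have a report in storage/vtfeature/."""
    vt_dir = os.path.join(storage_dir, "vtfeature")
    if not os.path.isdir(vt_dir):
        return []
    suffix = ".json"
    return sorted(
        name[:-len(suffix)] for name in os.listdir(vt_dir) if name.endswith(suffix)
    )
```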

The added processing module and report module

Although the reports generated by Cuckoo Sandbox are both informative and configurable, their contents are too detailed to be read easily by either humans or software, because they lack behavioral categorization and any ordering by importance. In our project, we require complete and structured behavior information in order to build the learning and detection components, which are mainly developed by my teammate Chi.

Processing module comprehensive.py

This module generates the complete behavior profile from the logged binary JSON (BSON) files. It can be found in the directory cuckoo/modules/processing/. It adds much more semantic interpretation of Windows APIs in order to extract accurate dynamic behavioral information. The behavior information we generate is divided into the following 9 groups, comprising 31 operations:

Group  System object  Operations
1      File           open, read, delete, modify, move
2      Registry       open, create, delete, enum, modify, move, query, close
3      Service        openscmanager, open, start, create, delete, modify
4      Mutex          open, create
5      Processes      started, terminated
6      Runtime DLLs   .dll files
7      Network        TCP, UDP, DNS, HTTP
8      Hooks          hooks, unhooks
9      Windows        searched windows
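
The grouping in the table can be written down as a simple mapping, which also makes the totals easy to check. This is only an illustration of the table; the name BEHAVIOR_GROUPS is hypothetical and does not appear in comprehensive.py.

```python
# Behavior groups and their operations, transcribed from the table above.
BEHAVIOR_GROUPS = {
    "File": ["open", "read", "delete", "modify", "move"],
    "Registry": ["open", "create", "delete", "enum", "modify", "move", "query", "close"],
    "Service": ["openscmanager", "open", "start", "create", "delete", "modify"],
    "Mutex": ["open", "create"],
    "Processes": ["started", "terminated"],
    "Runtime DLLs": [".dll files"],
    "Network": ["TCP", "UDP", "DNS", "HTTP"],
    "Hooks": ["hooks", "unhooks"],
    "Windows": ["searched windows"],
}

# Count the groups and the operations across all groups.
group_count = len(BEHAVIOR_GROUPS)
operation_count = sum(len(ops) for ops in BEHAVIOR_GROUPS.values())
```

With the table as given, group_count is 9 and operation_count is 31.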

Reporting module comprehensivedump.py

This module dumps the JSON object into a file in the directory cuckoo/vtfeature. The file is named after the sample's SHA256 value. The following is a template of the file created by the comprehensivedump.py reporting module.

{
  "metadata": {
    "name":"",
    "type":"PE32 executable (GUI) Intel 80386, for MS Windows",
    "size":"",
    "sha256": "",
    "md5":"",
    "machine": {
      "analysisid": 55,
      "duration": 171,
      "guest": "Win7-32",
      "manager": "VirtualBox",
      "shutdown": "2015-08-04 01:05:59",
      "started": "2015-08-04 01:03:08",
      "version": "1.3-dev"
    },
    "virustotal": {
        "date": "2015-07-16 18:22:44",
        "permalink": "https://www.virustotal.com/file/0a69bfbdefa4ae6595fb09c2de70948f0818fd18f0998419917fb8f9efd162b4/analysis/1437070964/",
        "positives": 45,
        "total": 56,
        "scans": {
            "ALYac": {
                "detected": true,
                "result": "Generic.Malware.IN!p2p!dld.F635BCD0",
                "update": "20150716",
                "version": "1.0.1.4"
            },
            ...
        }
    }
  },
  "file": {
    "open":[],
    "read":[],
    "create":[],
    "delete":[],
    "modify":[],
    "move":[]
  },
  "registry": {
    "open":[],
    "read":[],
    "create":[],
    "delete":[],
    "modify":[]
  },
  "service": {
    "open":[],
    "create":[],
    "delete":[],
    "enum":[],
    "modify":[],
    "query":[],
    "close":[]
  },
  "mutex": {
    "open":[],
    "create":[]
  },
  "network": {
    "domain":[],
    "tcp":[],
    "udp":[],
    "http":[]
  },
  "runtimeDLL": [],
  "process": [],
  "hook": [],
  "unhook":[],
  "searchedwindows":[]
}
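
A profile with this shape is easy to summarize programmatically. The sketch below counts the recorded events per group and operation. It assumes only the structure shown in the template (groups that are either a dictionary of lists or a plain list, plus a metadata entry); summarize_profile is a hypothetical helper name.

```python
def summarize_profile(profile):
    """Count recorded events per behavior group in a loaded profile dict.

    Groups that map operations to lists (e.g. "file") yield one entry per
    operation, keyed as "group.operation". Groups that are plain lists
    (e.g. "hook") yield one entry. The "metadata" entry is skipped.
    """
    summary = {}
    for group, value in profile.items():
        if group == "metadata":
            continue
        if isinstance(value, dict):
            for operation, events in value.items():
                summary["{0}.{1}".format(group, operation)] = len(events)
        else:
            summary[group] = len(value)
    return summary
```

For example, a profile with two opened files and one hook gives {"file.open": 2, "hook": 1}.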

Reporting module vtfeature.py

This is a separate reporting module designed to generate a behavioral profile that exactly matches the profile Chi collected from VirusTotal. It also generates a string that can be used as a malware classification label. The file is compact, so it is presented in full below.

# Copyright (C) 2010-2015 Cuckoo Foundation.
# This file is part of Cuckoo Sandbox - http://www.cuckoosandbox.org
# See the file 'docs/LICENSE' for copying permission.

# Note: this is Python 2 code (iteritems, json.dump with encoding).

import os
import json
import codecs
import operator
import re

from lib.cuckoo.common.abstracts import Report
from lib.cuckoo.common.exceptions import CuckooReportError
from collections import OrderedDict

class VtFeatureDump(Report):
    """Saves analysis results in JSON format."""

    def run(self, results):
        """Writes report.
        @param results: Cuckoo results dict.
        @raise CuckooReportError: if fails to write report.
        """
        indent = self.options.get("indent", 4)
        encoding = self.options.get("encoding", "utf-8")

        vtfeature = OrderedDict()
        details = OrderedDict()
        categories = ""

        # Output of the comprehensive.py processing module.
        compreobj = results["comprehensive"]

        # Count how often each word appears in the antivirus detection names.
        wordcount = {}

        for k, v in compreobj["virustotal"]["scans"].iteritems():
            if v["result"]:
                words = re.split(r'\.|/| |:|!', v["result"])
                for word in words:
                    w = word.lower()
                    if w not in wordcount:
                        wordcount[w] = 1
                    else:
                        wordcount[w] += 1

        # Sort by count, most frequent first, and keep the top 10 words
        # as the classification label string.
        sorted_wc = sorted(wordcount.iteritems(), key=operator.itemgetter(1))
        sorted_wc.reverse()
        i = 10
        for kk, vv in sorted_wc:
            if i:
                categories += "({}={})".format(kk, vv)
                i -= 1

        details = {
                "id":compreobj["sha256"],
                "categories": categories,
                "permurl":compreobj["virustotal"]["permalink"],
                "scorestr":"{0}/{1}".format(compreobj["virustotal"]["positives"],compreobj["virustotal"]["total"]),
                }

        # Map the comprehensive profile onto the VirusTotal field names.
        vtfeature = {
                "Additional details": details,
                "Read files": compreobj["file"]["read"],
                "TCP connections":compreobj["TCP connections"],
                "Hooking activity":compreobj["hooks"],
                "DNS requests":compreobj["DNS requests"],
                "HTTP requests":compreobj["HTTP requests"],
                "Opened services":compreobj["service"]["open"],
                "Written files": compreobj["file"]["modify"],
                "Deleted files": compreobj["file"]["delete"],
                "Created mutexes":compreobj["mutex"]["create"],
                "Searched windows":compreobj["searchedwindow"],
                "Opened files":compreobj["file"]["open"],
                "Replaced files":compreobj["file"]["create"],
                "Created processes":compreobj["processes"],
                "Opened mutexes":compreobj["mutex"]["open"],
                "UDP communications":compreobj["UDP connections"],
                "Runtime DLLs":compreobj["runtimedll"]
                }

        try:
            reportname = compreobj["name"]+".json"
            path = os.path.join(self.vtfeature_path, reportname)
            with codecs.open(path, "w", "utf-8") as report:
                json.dump(vtfeature, report, sort_keys=True,
                          indent=int(indent), encoding=encoding)
        except (UnicodeError, TypeError, IOError) as e:
            raise CuckooReportError("Failed to generate JSON report: %s" % e)
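
The label-generation step of this module can be tried in isolation. The following standalone sketch reproduces the same splitting and counting with collections.Counter. It is an illustration, not the module's code: label_string is a hypothetical name, and words with equal counts may come out in a different order than in the module, which sorts and then reverses a dictionary's items.

```python
import re
from collections import Counter


def label_string(results, top=10):
    """Build a "(word=count)" label from antivirus detection names.

    results is an iterable of detection strings; empty or None entries
    (engines that did not detect the sample) are skipped.
    """
    counts = Counter()
    for result in results:
        if result:
            # Same delimiters as the module: '.', '/', ' ', ':' and '!'.
            for word in re.split(r"\.|/| |:|!", result):
                counts[word.lower()] += 1
    return "".join(
        "({0}={1})".format(word, count) for word, count in counts.most_common(top)
    )
```

For the detection names "Trojan.Win32.Generic", "Win32/Trojan" and "Trojan:Agent", the two most frequent words give the label "(trojan=3)(win32=2)".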

The behavior report it generates looks like the following:

{
    "Additional details": {
        "categories": "(trojan=22)(win32=15)(temr=10)(gen=8)(zusy=8)(138428=7)(variant=7)(backdoor=3)(w32=2)(genericr-dmc=2)",
        "id": "3a3a3adbf4b6cc962d603c3f453f546dd0afdf9a5a11cb7161821523c0d733e9",
        "permurl": "https://www.virustotal.com/file/3a3a3adbf4b6cc962d603c3f453f546dd0afdf9a5a11cb7161821523c0d733e9/analysis/1437087697/",
        "scorestr": "42/56"
    },
    "Created mutexes": [],
    "Created processes": [],
    "DNS requests": [],
    "Deleted files": [],
    "HTTP requests": [],
    "Hooking activity": [],
    "Opened files": [],
    "Opened mutexes": [],
    "Opened services": [],
    "Read files": [],
    "Replaced files": [],
    "Runtime DLLs": [],
    "Searched windows": [],
    "TCP connections": [],
    "UDP communications": [],
    "Written files": []
}
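
When such a report is used as machine learning input, the "categories" and "scorestr" strings usually need to be turned back into numbers. The sketch below shows one way to do that, assuming the exact string formats seen in the report above; parse_categories and detection_ratio are hypothetical helper names.

```python
import re

# Matches one "(word=count)" pair from the "categories" string.
CATEGORY_PATTERN = re.compile(r"\(([^=()]+)=(\d+)\)")


def parse_categories(categories):
    """Turn "(trojan=22)(win32=15)" into [("trojan", 22), ("win32", 15)]."""
    return [(word, int(count)) for word, count in CATEGORY_PATTERN.findall(categories)]


def detection_ratio(scorestr):
    """Turn a "positives/total" string such as "42/56" into a float ratio."""
    positives, total = scorestr.split("/")
    return float(positives) / float(total)
```

For the report above, detection_ratio("42/56") evaluates to 0.75.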

Code modification statistics

The following are the code modification statistics relative to the original Cuckoo source:

rui@rui-VirtualBox:~/cuckoo$ git diff --stat b91ebffe 430f6434
 .gitignore                                        |   13 +
 analyzer/windows/modules/auxiliary/screenshots.py |    2 +-
 conf/cuckoo.conf                                  |    7 +-
 conf/processing.conf                              |   10 +-
 conf/reporting.conf                               |   10 +
 lib/cuckoo/common/abstracts.py                    |    3 +
 lib/cuckoo/core/scheduler.py                      |    3 +-
 lib/cuckoo/core/startup.py                        |    5 +-
 modules/processing/comprehensive.py               | 1354 +++++++++++++++++++++
 modules/reporting/comprehensivedump.py            |   32 +
 modules/reporting/mongodb.py                      |    3 +-
 modules/reporting/vtfeaturedump.py                |   86 ++
 utils/api.py                                      |   66 +-
 utils/joinjson.py                                 |   31 +
 utils/setup.sh                                    |    2 +-
 15 files changed, 1586 insertions(+), 41 deletions(-)

Appendix: Network Setup between Host and Guest

Note that the term “VirtualBox” in this section refers to the guest VirtualBox (the inner VirtualBox in our setting), because our installed “Cuckoo host” is itself a virtual machine. Our setting is a nested virtualization environment.

After installing the Windows guest in VirtualBox, follow the steps below to configure the network. This is important; otherwise Cuckoo will not run.

  1. Turn off Windows auto-update and the firewall, and enable some other Windows (guest) settings by running the following commands:
reg add "hklm\software\Microsoft\Windows NT\CurrentVersion\WinLogon" /v AutoAdminLogon /d 1 /t REG_SZ /f
reg add "hklm\system\CurrentControlSet\Control\TerminalServer" /v AllowRemoteRPC /d 0x01 /t REG_DWORD /f
reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /d 0x01 /t REG_DWORD /f

Make sure you have not set up a username and password for login. The settings above enable auto-login (which allows the agent to start upon reboot) and Remote RPC (which allows Cuckoo to reboot the sandbox using RPC).

  1. Setting up the network in VirtualBox

    1. Use two network adapters. The first is a host-only network adapter: create one (vboxnet0) from the VirtualBox menu File->Preferences->Network. This adapter enables the host-guest communication and the RPC mechanism that Cuckoo requires. After creating the host-only network adapter (i.e. vboxnet0), select it in the VM's own settings and keep the default settings (192.168.56.0/24).
    2. Configure the host-only network adapter with all default values. This is important for Cuckoo to run correctly, because the default values match the default configuration in the virtualbox.conf file.
    3. The second adapter is NAT. To set it up, enable the second adapter in the virtual machine's network configuration and select NAT in the dropdown menu.
  2. Use the vboxmanage command-line tool to create a snapshot of the VM.

    vboxmanage snapshot "[VM Name]" take "[Snapshot Name]" --pause
    vboxmanage controlvm "[VM Name]" poweroff
    vboxmanage snapshot "[VM Name]" restorecurrent

  3. Configure

    1. cuckoo.conf
      • Make sure the following configuration is set up correctly:
        machinery = virtualbox
        [resultserver]
        ip = 192.168.56.1
        
    2. virtualbox.conf
      • Make sure the following is set up correctly:
        machine = Win7-32
        platform = windows
        ip = 192.168.56.101
        snapshot = cuckoo-test
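
A common cause of startup problems is a mismatch between these addresses and the host-only network. The following sketch checks that the result server IP and the guest IP both lie on the default host-only network 192.168.56.0/24 described above. It uses the ipaddress module from the Python 3 standard library and runs independently of Cuckoo; on_host_only_network is a hypothetical helper name.

```python
import ipaddress


def on_host_only_network(resultserver_ip, guest_ip, network="192.168.56.0/24"):
    """Check that both addresses are distinct and inside the host-only network."""
    net = ipaddress.ip_network(network)
    host = ipaddress.ip_address(resultserver_ip)
    guest = ipaddress.ip_address(guest_ip)
    return host in net and guest in net and host != guest
```

With the values shown above, on_host_only_network("192.168.56.1", "192.168.56.101") returns True, while a guest address handed out by the NAT adapter, such as 10.0.2.15, would return False.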
        
