The goal of this project is to build an automatic dynamic malware analysis system. We use Cuckoo Sandbox as the dynamic behavior collection platform. Cuckoo Sandbox has been developed over four years in the open-source community, and its well-structured modular architecture makes it easy to customize.
Installation
Without the VM clone, setting up Cuckoo Sandbox involves three parts: installing the packages on the host machine, installing Cuckoo, and installing the virtual machine. With the VM clone, all of these components are already set up and ready to use.
To use the VM clone, create a new 64-bit Ubuntu virtual machine in VirtualBox. In the step that creates the hard drive, choose the option “Use an existing virtual hard drive file” and select the .vdi file from the VM clone directory. Finish the default configuration of the VM. You will then have a virtual machine with Cuckoo Sandbox installed and ready to use.
The username and password for Ubuntu host:
The Windows guest does not have a password.
Configure Cuckoo
Before running Cuckoo, we need to make sure it is configured correctly by modifying the configuration files in the conf directory. We only need the following configuration files:
1. cuckoo.conf
2. virtualbox.conf
3. processing.conf
4. reporting.conf
5. auxiliary.conf
We have them fully configured in our copy. If you need to change anything, refer to the detailed comments in each configuration file.
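Before starting Cuckoo, it can be useful to confirm that all five files are actually present. The following is a minimal sketch of such a check; the helper name `missing_conf_files` is hypothetical and is not part of Cuckoo.

```python
import os

# The five configuration files listed above.
REQUIRED_CONF_FILES = [
    "cuckoo.conf",
    "virtualbox.conf",
    "processing.conf",
    "reporting.conf",
    "auxiliary.conf",
]


def missing_conf_files(conf_dir):
    """Return the required configuration files that are absent from conf_dir.

    Hypothetical helper for illustration only; it checks for existence and
    does not validate the contents of any file.
    """
    return [
        name for name in REQUIRED_CONF_FILES
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]
```

Calling `missing_conf_files("conf")` from the application root should return an empty list when the setup is complete.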
How to run Cuckoo Sandbox
There are two steps:
- Run the Python script cuckoo.py in the application root directory. This script starts Cuckoo Sandbox. Running cuckoo.py --debug prints detailed debug information during startup. See the official manual for other command-line options.
  Note: Before running Cuckoo for the first time, start VirtualBox and the Windows 7 guest machine manually. Otherwise Cuckoo will fail during startup, because it cannot bind to the virtual network card we installed in VirtualBox until the guest has been booted. See the appendix for more details about the network setup between the guest and host.
- Submit a sample to Cuckoo by running the Python script utils/submit.py /path/to/sample. This starts the analysis of the submitted sample. Other options can be given at submission time, such as --max, which sets the maximum number of samples to add for analysis. See the official manual for other options.
Quick start commands:
#1. Start Cuckoo
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py
# with debug-level logs dumped to stdout
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py --debug

#2. Submit a sample to Cuckoo
rui@rui-VirtualBox:~/cuckoo$ ./utils/submit.py /path/to/the/sample
# example of batch analysis of samples from a directory, 10 samples
rui@rui-VirtualBox:~/cuckoo$ ./utils/submit.py --max 10 /path/to/the/sample/directory

#3. Check the statistics
rui@rui-VirtualBox:~/cuckoo$ ./utils/stats.py

#4. Remove the data generated by analyses (db, storage, and log) from the Cuckoo directory
rui@rui-VirtualBox:~/cuckoo$ ./cuckoo.py --clean

#5. Combine the multiple JSON analysis results generated in vtfeature. The purpose
# is to generate a dataset as input to the existing machine learning algorithms.
rui@rui-VirtualBox:~/cuckoo$ ./utils/joinjson.py

#6. Enable or disable screenshots:
# modify the code at cuckoo/analyzer/windows/modules/auxiliary/screenshots.py:27
#   "self.do_run = True" to enable
#   "self.do_run = False" to disable
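The submission step can also be driven from a script. The sketch below only wraps the `utils/submit.py` invocation and the `--max` option shown above; the function names `build_submit_command` and `submit` are hypothetical, and it assumes the script is executable and run from the Cuckoo root directory.

```python
import subprocess


def build_submit_command(target, max_samples=None):
    """Build the argument list for ./utils/submit.py.

    target is a sample file or a directory of samples; max_samples maps to
    the --max option described above. Hypothetical helper for illustration.
    """
    cmd = ["./utils/submit.py"]
    if max_samples is not None:
        cmd += ["--max", str(max_samples)]
    cmd.append(target)
    return cmd


def submit(target, max_samples=None, cuckoo_root="."):
    """Run the submission from the Cuckoo root and return its exit code."""
    return subprocess.call(build_submit_command(target, max_samples),
                           cwd=cuckoo_root)
```

For example, `submit("/path/to/the/sample/directory", max_samples=10, cuckoo_root="/home/rui/cuckoo")` would correspond to the batch command in step 2.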
Cuckoo reports and their locations
You can configure report generation in the files processing.conf and reporting.conf. We are interested in two kinds of reports: one resides in the directory storage/rawfeature/ and the other in storage/vtfeature/. The rawfeature reports contain complete behavioral profiles, while the vtfeature reports contain profiles tailored to match the information obtained from virustotal.com. Both kinds of report are JSON files, one per analyzed sample, named after the sample's SHA256.
The script utils/joinjson.py joins all the generated JSON files in storage/vtfeature/. The joined file is placed in storage/dataset/. Its name is the string dataset-, followed by a decimal number and the suffix .json. For example, the file name dataset-10.json tells us that the profiles of 10 samples from storage/vtfeature/ were joined.
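To make the naming scheme concrete, here is a minimal sketch of how such a join could work. It is not the actual utils/joinjson.py: the function name `join_reports` is hypothetical, and storing the joined profiles as a single JSON list is an assumption.

```python
import glob
import json
import os


def join_reports(src_dir, dst_dir):
    """Join every *.json file in src_dir into dst_dir/dataset-<count>.json.

    Assumption: the joined dataset is a JSON list with one entry per
    profile. The real script may use a different layout.
    """
    reports = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
        with open(path) as handle:
            reports.append(json.load(handle))

    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)

    # The number in the file name is the count of joined profiles.
    out_path = os.path.join(dst_dir, "dataset-{}.json".format(len(reports)))
    with open(out_path, "w") as handle:
        json.dump(reports, handle, indent=4)
    return out_path
```

With two profiles in storage/vtfeature/, `join_reports("storage/vtfeature", "storage/dataset")` would write storage/dataset/dataset-2.json.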
Where to find the analysis results?
The directory structure of the analysis output is shown below; see the comments beside each file or directory.
storage
├── analyses
│   ├── 1
│   │   ├── analysis.log    #log from the analysis guest
│   │   ├── binary
│   │   ├── dump.pcap
│   │   ├── files           #files created by the malware are uploaded here
│   │   │   ├── 1260809960
│   │   │   │   └── WebBrowser_embedded.exe
...
│   │   │   └── 9504481298
│   │   │       └── Failed.htm
│   │   ├── logs            #logs of the hooked APIs
│   │   │   └── 1364.bson
│   │   ├── reports         #reports generated by vanilla Cuckoo
│   │   │   ├── report.html
│   │   │   └── report.json
│   │   └── shots           #screenshots of the analysis machine
│   ├── 2
│   │   ├── analysis.log
...                         #same structure as above
│   │   └── shots
│   └── latest -> /home/rui/cuckoo/storage/analyses/2
├── binaries
│   ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b
│   └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14
├── dataset                 #generated by manually running ./utils/joinjson.py
│   └── dataset-2.json      #number after the dash is the total number of JSON files joined
├── rawfeature              #JSON files containing comprehensive behavior reports
│   ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b.json
│   └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14.json
└── vtfeature               #JSON files containing tailored reports that match VirusTotal
    ├── 0008e49c2d25161a78e8062ee59d6d03fdad59fd014906d2fdc0092988dc413b.json
    └── 33120cda1dd67cc13e73e83db6d7a33eea3ef2474653e90e5e153b6b473b3f14.json
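Because both report files are named after the sample's SHA256, the reports for a given sample can be located directly from the sample file. The sketch below illustrates this under the layout shown above; the function names are hypothetical and the default storage path is an assumption.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def report_paths(sample_path, storage_dir="storage"):
    """Return the expected rawfeature and vtfeature report paths for a sample.

    Hypothetical helper: it only builds the paths and does not check that
    the reports exist.
    """
    name = sha256_of(sample_path) + ".json"
    return {
        "rawfeature": os.path.join(storage_dir, "rawfeature", name),
        "vtfeature": os.path.join(storage_dir, "vtfeature", name),
    }
```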
The added processing module and report module
Although the reports generated by Cuckoo Sandbox are both informative and configurable, their contents are too detailed to be read easily by either a human or software, because they lack behavioral categorization and any ordering of behaviors by importance. In our project, we require complete and structured behavior information in order to build the learning and detection components, which are mainly developed by my teammate Chi.
Processing module comprehensive.py
This module generates the complete behavior profile from the logged binary JSON (BSON) file. It can be found under the directory cuckoo/modules/processing/. It adds much more semantic interpretation of Windows APIs to extract accurate dynamic behavioral information. The behavior information we generate is divided into the following 9 groups, covering 31 operations:
Groups | System Objects | Operations |
---|---|---|
1 | File | open, read, delete, modify, move |
2 | Registry | open, create, delete, enum, modify, move, query, close |
3 | Service | openscmanager, open, start, create, delete, modify |
4 | Mutex | open, create |
5 | Processes | started, terminated |
6 | Runtime DLLs | .dlls files |
7 | Network | TCP, UDP, DNS, HTTP |
8 | Hooks | hooks, unhooks |
9 | Windows | searchedwindows |
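The table can be written down as a small data structure, which also makes the stated totals easy to verify. This is only an illustrative transcription of the table, reading the operations column as separate names; the variable names are hypothetical and do not come from comprehensive.py.

```python
# Transcription of the table above: system object -> operations.
BEHAVIOR_GROUPS = {
    "File": ["open", "read", "delete", "modify", "move"],
    "Registry": ["open", "create", "delete", "enum", "modify", "move",
                 "query", "close"],
    "Service": ["openscmanager", "open", "start", "create", "delete",
                "modify"],
    "Mutex": ["open", "create"],
    "Processes": ["started", "terminated"],
    "Runtime DLLs": [".dlls files"],
    "Network": ["TCP", "UDP", "DNS", "HTTP"],
    "Hooks": ["hooks", "unhooks"],
    "Windows": ["searchedwindows"],
}

# Count groups and operations to compare with the figures in the text.
total_groups = len(BEHAVIOR_GROUPS)
total_operations = sum(len(ops) for ops in BEHAVIOR_GROUPS.values())
print(total_groups, total_operations)
```

The counts come out to 9 groups and 31 operations, matching the text.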
Reporting module comprehensivedump.py
This module dumps the JSON object into a file in the directory cuckoo/vtfeature. The file name is the sample's SHA256 value. The following is a template of the file created by the comprehensivedump.py reporting module.
{
    "metadata": {
        "name": "",
        "type": "PE32 executable (GUI) Intel 80386, for MS Windows",
        "size": "",
        "sha256": "",
        "md5": "",
        "machine": {
            "analysisid": 55,
            "duration": 171,
            "guest": "Win7-32",
            "manager": "VirtualBox",
            "shutdown": "2015-08-04 01:05:59",
            "started": "2015-08-04 01:03:08",
            "version": "1.3-dev"
        },
        "virustotal": {
            "date": "2015-07-16 18:22:44",
            "permalink": "https://www.virustotal.com/file/0a69bfbdefa4ae6595fb09c2de70948f0818fd18f0998419917fb8f9efd162b4/analysis/1437070964/",
            "positives": 45,
            "total": 56,
            "scans": {
                "ALYac": {
                    "detected": true,
                    "result": "Generic.Malware.IN!p2p!dld.F635BCD0",
                    "update": "20150716",
                    "version": "1.0.1.4"
                },
                ...
            }
        }
    },
    "file": {
        "open": [],
        "read": [],
        "create": [],
        "delete": [],
        "modify": [],
        "move": []
    },
    "registry": {
        "open": [],
        "read": [],
        "create": [],
        "delete": [],
        "modify": []
    },
    "service": {
        "open": [],
        "create": [],
        "delete": [],
        "enum": [],
        "modify": [],
        "query": [],
        "close": []
    },
    "mutex": {
        "open": [],
        "create": []
    },
    "network": {
        "domain": [],
        "tcp": [],
        "udp": [],
        "http": []
    },
    "runtimeDLL": [],
    "process": [],
    "hook": [],
    "unhook": [],
    "searchedwindows": []
}
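A profile in this shape is straightforward to summarize. As a hedged illustration, the sketch below counts the recorded entries per behavior group, assuming only the layout of the template above (a "metadata" object plus groups that are either lists or objects of lists). The function name `count_behaviors` is hypothetical.

```python
def count_behaviors(profile):
    """Count recorded entries per behavior group in a profile dict.

    Assumes the template layout: "metadata" is skipped, and every other
    top-level value is either a list of entries or a dict mapping an
    operation name to a list of entries.
    """
    counts = {}
    for group, value in profile.items():
        if group == "metadata":
            continue
        if isinstance(value, dict):
            counts[group] = sum(len(entries) for entries in value.values())
        else:
            counts[group] = len(value)
    return counts
```

A profile loaded with `json.load` can be passed in directly; groups with a count of zero indicate that no behavior of that kind was observed.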
Reporting module vtfeature.py
This is a separate reporting module designed to generate a behavioral profile that exactly matches the profile Chi collected from VirusTotal. It also generates a string that could be used as a malware classification label. The vtfeature.py file is compact, so I present it in full below.
# Copyright (C) 2010-2015 Cuckoo Foundation.
# This file is part of Cuckoo Sandbox - http://www.cuckoosandbox.org
# See the file 'docs/LICENSE' for copying permission.

import os
import json
import codecs
import operator
import re

from lib.cuckoo.common.abstracts import Report
from lib.cuckoo.common.exceptions import CuckooReportError
from collections import OrderedDict

class VtFeatureDump(Report):
    """Saves analysis results in JSON format."""

    def run(self, results):
        """Writes report.
        @param results: Cuckoo results dict.
        @raise CuckooReportError: if fails to write report.
        """
        indent = self.options.get("indent", 4)
        encoding = self.options.get("encoding", "utf-8")

        vtfeature = OrderedDict()
        details = OrderedDict()
        categories = ""

        # Output of the comprehensive processing module.
        compreobj = results["comprehensive"]

        # Count the words that appear in the antivirus detection names.
        wordcount = {}

        for k, v in compreobj["virustotal"]["scans"].iteritems():
            if v["result"]:
                words = re.split('\.|/| |:|!', v["result"])
                for word in words:
                    w = word.lower()
                    if w not in wordcount:
                        wordcount[w] = 1
                    else:
                        wordcount[w] += 1

        # Keep the 10 most frequent words as the category label string.
        sorted_wc = sorted(wordcount.iteritems(), key=operator.itemgetter(1))
        sorted_wc.reverse()
        i = 10
        for kk, vv in sorted_wc:
            if i:
                categories += "({}={})".format(kk, vv)
                i -= 1

        details = {
            "id":compreobj["sha256"],
            "categories": categories,
            "permurl":compreobj["virustotal"]["permalink"],
            "scorestr":"{0}/{1}".format(compreobj["virustotal"]["positives"],compreobj["virustotal"]["total"]),
        }

        # Map the comprehensive profile onto the VirusTotal field names.
        vtfeature = {
            "Additional details": details,
            "Read files": compreobj["file"]["read"],
            "TCP connections":compreobj["TCP connections"],
            "Hooking activity":compreobj["hooks"],
            "DNS requests":compreobj["DNS requests"],
            "HTTP requests":compreobj["HTTP requests"],
            "Opened services":compreobj["service"]["open"],
            "Written files": compreobj["file"]["modify"],
            "Deleted files": compreobj["file"]["delete"],
            "Created mutexes":compreobj["mutex"]["create"],
            "Searched windows":compreobj["searchedwindow"],
            "Opened files":compreobj["file"]["open"],
            "Replaced files":compreobj["file"]["create"],
            "Created processes":compreobj["processes"],
            "Opened mutexes":compreobj["mutex"]["open"],
            "UDP communications":compreobj["UDP connections"],
            "Runtime DLLs":compreobj["runtimedll"]
        }

        try:
            reportname = compreobj["name"]+".json"
            path = os.path.join(self.vtfeature_path, reportname)
            with codecs.open(path, "w", "utf-8") as report:
                json.dump(vtfeature, report, sort_keys=True,
                          indent=int(indent), encoding=encoding)
        except (UnicodeError, TypeError, IOError) as e:
            raise CuckooReportError("Failed to generate JSON report: %s" % e)
The behavior report it generates looks like this:
{
    "Additional details": {
        "categories": "(trojan=22)(win32=15)(temr=10)(gen=8)(zusy=8)(138428=7)(variant=7)(backdoor=3)(w32=2)(genericr-dmc=2)",
        "id": "3a3a3adbf4b6cc962d603c3f453f546dd0afdf9a5a11cb7161821523c0d733e9",
        "permurl": "https://www.virustotal.com/file/3a3a3adbf4b6cc962d603c3f453f546dd0afdf9a5a11cb7161821523c0d733e9/analysis/1437087697/",
        "scorestr": "42/56"
    },
    "Created mutexes": [],
    "Created processes": [],
    "DNS requests": [],
    "Deleted files": [],
    "HTTP requests": [],
    "Hooking activity": [],
    "Opened files": [],
    "Opened mutexes": [],
    "Opened services": [],
    "Read files": [],
    "Replaced files": [],
    "Runtime DLLs": [],
    "Searched windows": [],
    "TCP connections": [],
    "UDP communications": [],
    "Written files": []
}
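The "categories" string in this report is a word-frequency summary of the antivirus detection names. The sketch below restates that counting step as a standalone Python 3 function using `collections.Counter`; it is an illustrative rewrite rather than the module's own code, the function name `categories_string` is hypothetical, and the order of words with equal counts may differ from the module's output.

```python
import re
from collections import Counter


def categories_string(results, top_n=10):
    """Build a "(word=count)" label string from detection names.

    results is an iterable of detection-name strings (or None/empty for
    engines that reported nothing). Names are split on '.', '/', ' ',
    ':' and '!', lower-cased, and the top_n most frequent words are kept.
    """
    counts = Counter()
    for result in results:
        if result:
            counts.update(word.lower()
                          for word in re.split(r"[./ :!]", result))
    return "".join("({}={})".format(word, count)
                   for word, count in counts.most_common(top_n))
```

For example, the detection names "Trojan.Win32.Zusy", "trojan/win32" and "Trojan:Gen" contain "trojan" three times and "win32" twice, so with `top_n=2` the label is "(trojan=3)(win32=2)".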
Code modification statistics
The following are the code modification statistics relative to the original Cuckoo source:
rui@rui-VirtualBox:~/cuckoo$ git diff --stat b91ebffe 430f6434
 .gitignore                                        |   13 +
 analyzer/windows/modules/auxiliary/screenshots.py |    2 +-
 conf/cuckoo.conf                                  |    7 +-
 conf/processing.conf                              |   10 +-
 conf/reporting.conf                               |   10 +
 lib/cuckoo/common/abstracts.py                    |    3 +
 lib/cuckoo/core/scheduler.py                      |    3 +-
 lib/cuckoo/core/startup.py                        |    5 +-
 modules/processing/comprehensive.py               | 1354 +++++++++++++++++++++
 modules/reporting/comprehensivedump.py            |   32 +
 modules/reporting/mongodb.py                      |    3 +-
 modules/reporting/vtfeaturedump.py                |   86 ++
 utils/api.py                                      |   66 +-
 utils/joinjson.py                                 |   31 +
 utils/setup.sh                                    |    2 +-
 15 files changed, 1586 insertions(+), 41 deletions(-)
Appendix: Network Setup between Host and Guest
Note that the term “VirtualBox” in this section refers to the guest VirtualBox (the inner VirtualBox in our setting), because our installed “Cuckoo host” is itself a virtual machine. Our setting is a nested virtualization environment.
After installing the Windows guest in VirtualBox, we perform the following steps to configure the network. This is important; otherwise, Cuckoo will not run.
- Turn off Windows auto-update and the firewall, and enable some other Windows (guest) settings by running the following commands:
reg add "hklm\software\Microsoft\Windows NT\CurrentVersion\WinLogon" /v AutoAdminLogon /d 1 /t REG_SZ /f
reg add "hklm\system\CurrentControlSet\Control\TerminalServer" /v AllowRemoteRPC /d 0x01 /t REG_DWORD /f
reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /d 0x01 /t REG_DWORD /f
Make sure you have not set up a username and password for login. The settings above enable auto-login (which allows the agent to start upon reboot) and Remote RPC (which allows Cuckoo to reboot the sandbox using RPC).
-
Setting up the network in VirtualBox
- We should use two network adapters. The first is a host-only network adapter: create one such adapter (vboxnet0) from the VirtualBox menu File->Preference->Network. This adapter enables the host-guest communication and the RPC mechanism that Cuckoo requires. After creating the host-only network adapter (i.e. vboxnet0), select it in the VM's own settings, using the default settings [192.168.56/24].
- Configure the host-only network adapter as follows (all default values). This is important for Cuckoo to run correctly. The default values match the default configuration in the virtualbox.conf file.
- The other adapter we should use is NAT. To set this up, enable the second adapter in the virtual machine's network configuration and select NAT in the dropdown menu.
-
Use the vboxmanage command-line tool to create a snapshot of the VM:
vboxmanage snapshot "[VM Name]" take "[Snapshot Name]" --pause
vboxmanage controlvm "[VM Name]" poweroff
vboxmanage snapshot "[VM Name]" restorecurrent
-
Configure
- cuckoo.conf
- Make sure the following configuration is set up correctly:
- virtualbox.conf
- Make sure the following is set up correctly:
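As a quick sanity check on the host-only addressing described in this appendix, the sketch below tests whether an address falls inside the 192.168.56.0/24 range used by vboxnet0. It is a Python 3 illustration using the standard `ipaddress` module; the function name is hypothetical and the addresses in the usage note are examples, not values taken from our configuration files.

```python
import ipaddress

# Default host-only range mentioned above for vboxnet0.
HOST_ONLY_NETWORK = ipaddress.ip_network("192.168.56.0/24")


def in_host_only_network(address):
    """Return True if address lies inside the host-only network range."""
    return ipaddress.ip_address(address) in HOST_ONLY_NETWORK
```

For instance, `in_host_only_network("192.168.56.101")` is true, while an address on the NAT adapter such as "10.0.2.15" is not in the range.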