Microsoft Office files (and other file types commonly used for delivering malware, including binary files, documents, scripts, and archives) are supported in Intezer for both on-demand sandboxing and automated alert triage. Phishing attacks are one of the three primary ways attackers get access to organizations according to Verizon’s 2023 Data Breach Investigations Report… and many phishing attacks arrive via emails containing malicious attachments. A seemingly innocent Microsoft Word file, for example, can be the initial infection stage of a dangerous attack where a threat actor uses a document to deliver malware.
When handling a security breach, the incident response (IR) team collects suspicious files and evidence from the compromised endpoint in order to investigate the incident. One of the challenges IR teams face is finding all of the malicious files that were used in the attack and classifying them by malware family. Binary files are usually the main suspect: we know that malicious code was executed, so we search for suspicious binaries containing this code (looking for recently installed programs, for example). Non-binary files like Microsoft Office documents should also be carefully examined because they can be the first stage of the attack that caused the malware to execute in the first place.
Office documents are widely used by threat actors to deliver malware. Usually, the file is attached to an email that is crafted to look like a legitimate communication, and the threat actor uses social engineering techniques to persuade the victim to open the malicious attachment.
In this article, we will explain the different types of Microsoft Office file formats and how attackers abuse these documents to deliver malware. We will also present tools (both free and paid) and techniques that can help you better identify and classify malicious Microsoft Office files. At the end, we’ll look at how you can speed up the whole process by using automation to analyze a specific file or a high volume of incoming files.
When collecting files that could be related to an incident, you might notice that the files have a variety of extensions (.txt, .dotm, .zip, .docx, .pdf) that belong to different applications. For the purpose of this blog, we will focus on the file formats used by the three main Microsoft Office applications: Word, Excel, and PowerPoint. First, let’s explain the structure of these files and how they differ from one another.
Office Open XML (OOXML) was introduced in Microsoft Office 2007. It is a zipped, XML-based format developed by Microsoft and is the default format for Word, Excel, and PowerPoint files. The associated extensions include .docx, .xlsx, and .pptx. OOXML files are structured in a similar way to OLE files but there are several differences between them:
RTF is another document format developed by Microsoft. RTF files encode text and graphics in a way that makes it possible to share the file between applications. In the past, it was more difficult to open a .doc file without having Microsoft Office or even a Windows PC, so using RTF became a convenient solution.
Unlike the previous formats we talked about, RTF files consist of unformatted text, control words, groups, backslashes, and delimiters. Like the default OOXML file types (.docx, .xlsx, .pptx), RTF files don’t support macros.
For more information about OLE, OOXML, and RTF files, see Microsoft’s documentation.
In general, you should never trust the extension of a file, because attackers deliberately change it to trick victims into opening the file. Always verify the type of the file you are analyzing. You can use the file command (Linux/Mac) or the oleid utility from oletools, developed by Decalage. oleid displays useful information about the file, including its type and whether it is encrypted.
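To illustrate why the extension can't be trusted, here is a minimal sketch that identifies the container format from the first bytes of a file, in the same spirit as the file command. The function names are hypothetical and the check is deliberately coarse: a ZIP header only tells us the file might be OOXML, and malformed RTF headers that Word still accepts are not covered.

```python
# Leading bytes ("magic numbers") of the three container types discussed here.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file (.doc, .xls, .ppt)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP archive (OOXML, but also any other .zip)
RTF_MAGIC = b"{\\rtf"                          # Rich Text Format


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes, ignoring the file name entirely."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file on disk and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(16))
```

A file named invoice.docx for which sniff_file returns "ole" or "rtf" has a mismatched extension and deserves a closer look with oleid.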
There are several ways in which a document can be weaponized with malware and used to launch an attack.
This technique is documented within MITRE ATT&CK® T1137.
Macros save users time by allowing them to automate a series of commands that can be triggered by different actions. Usually, macros are written in Visual Basic for Applications (VBA), a language developed by Microsoft and supported by all Microsoft Office products. Another way to create a macro is to record it within the Microsoft Office application. Macros are a powerful tool that gives users access and permissions to resources of the local system. Attackers use macros to modify files on the system and to execute the next stage of an attack.
By default, OOXML files (.docx, .xlsx, .pptx) can’t be used to store macros. Only specific macro-enabled file types can contain VBA macros. The goal is to make it easier to detect files that have macros and to reduce the risk of attacks that use macros. Macro-enabled files use the letter m at the end of the extension, such as .dotm, .docm, .xlsm, and .pptm.
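Since an extension can be changed, it can help to look inside the archive as well. In OOXML files the VBA project is conventionally stored in a part named vbaProject.bin (for example word/vbaProject.bin or xl/vbaProject.bin). The sketch below lists such parts using only the Python standard library; the function name is hypothetical, and it assumes the input is a valid ZIP archive.

```python
import zipfile


def find_vba_parts(path_or_file) -> list:
    """Return the names of archive members that look like a VBA project.

    Accepts a file path or a binary file-like object. Raises
    zipfile.BadZipFile if the input is not a ZIP archive (for example,
    an OLE .doc file).
    """
    with zipfile.ZipFile(path_or_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

An empty result means no VBA project part was found under that naming convention. It does not prove the file is harmless.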
Because of the serious security risks that macros pose, Microsoft added several security measures to restrict their execution. The most effective way to protect the system is to disable macros entirely, but this is not always possible because macros are a handy tool for many organizations. Another option is to enable macros manually and enforce limitations on the source and integrity of the document. When a user opens a file containing macros, including OLE files such as .doc, Microsoft Office applications will show a warning message. An alternative solution is to open files in Protected View: the file is opened read-only, which prevents attackers from executing commands and manipulating the user or file. For more information, check out Microsoft’s website.
In a 2021 attack documented by Kaspersky Lab, a threat actor sent spear phishing emails luring victims to open a malicious Microsoft Excel file. The file used Excel 4.0 macros, which is an older version of macros used to automate tasks in Excel. The macros are hidden in empty cells and spreadsheets so that when the file is opened, malware is downloaded and executed.
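One quick check related to this technique is to list the sheets that are marked as hidden. The sketch below is an assumption-heavy illustration rather than a reconstruction of the attack above: it only handles OOXML workbooks (.xlsx, .xlsm) that use the standard spreadsheetml namespace, not legacy .xls files, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the standard (transitional) spreadsheetml workbook part.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def hidden_sheets(path_or_file) -> list:
    """Return (sheet name, state) pairs for hidden and very hidden sheets.

    Note: ElementTree is not hardened against malicious XML. Treat this
    as a lab sketch and run it in an isolated analysis environment.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [
        (sheet.get("name"), sheet.get("state"))
        for sheet in root.iter(MAIN_NS + "sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]
```

Sheets in the veryHidden state can't be unhidden from the normal Excel user interface, so they are worth a closer look when they show up in a document received by email.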
Another attack method is based on remote .dotm template injection. If an attacker creates a .docx file and convinces the victim to open it and click Enable Content, the file will load a malicious template from a remote location, and the template executes malware. While the .docx doesn’t contain the macro code itself, its content leads to execution of the macro.
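A remote template reference is stored in the document's relationship files (the .rels parts, typically word/_rels/settings.xml.rels) as a relationship whose type ends in attachedTemplate and whose TargetMode is External. The following sketch lists every external relationship in an OOXML file so that such references stand out. The function name is hypothetical, and benign documents also contain external relationships (ordinary hyperlinks, for example), so the output needs to be reviewed rather than treated as a verdict.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by OOXML relationship (.rels) parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(path_or_file) -> list:
    """Return (part name, relationship type, target) for external targets.

    Note: ElementTree is not hardened against malicious XML. Treat this
    as a lab sketch and run it in an isolated analysis environment.
    """
    hits = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    # Keep only the last path segment of the type URI,
                    # e.g. "attachedTemplate" or "hyperlink".
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    hits.append((name, rel_type, rel.get("Target", "")))
    return hits
```

An attachedTemplate entry that points to an http or https URL is the pattern described above.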
Let’s analyze this doc file: MD5: 167949ba90da85c8b56878d95be19c1a.
First, we can run the oleid tool as described in the previous section. Once we establish that the file contains a VBA macro, we can use the olevba utility to get more information about the VBA and view the code of the macro.
Now, we need to analyze the code of the macro to understand if the file is malicious (macros can also be used for legitimate reasons). To get the streams in the file which contain the code of the VBA macro, you can either unzip the document file and open the file that contains the macro (olevba identifies the file name), or use oledump.
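Once the macro source has been extracted to text, a first triage pass can be as simple as searching for auto-execution entry points and commonly abused calls. The sketch below is a simplified illustration and assumes the VBA code is already available as a string. The keyword lists are our own short examples, not the lists olevba uses, and a match only indicates where to start reading.

```python
import re

# Illustrative keyword lists (assumptions for this sketch, far from complete).
AUTO_EXEC = ("AutoOpen", "Auto_Open", "Document_Open", "Workbook_Open")
SUSPICIOUS = ("Shell", "CreateObject", "WScript.Shell", "URLDownloadToFile", "powershell")


def triage_vba(source: str) -> dict:
    """Report which auto-exec and suspicious keywords appear in VBA source."""

    def present(words):
        found = []
        for word in words:
            # Match the keyword only as a whole identifier; VBA is case-insensitive.
            pattern = r"(?<![A-Za-z0-9_])" + re.escape(word) + r"(?![A-Za-z0-9_])"
            if re.search(pattern, source, re.IGNORECASE):
                found.append(word)
        return found

    return {"auto_exec": present(AUTO_EXEC), "suspicious": present(SUSPICIOUS)}
```

A macro that both runs automatically when the document opens and creates a shell object is a strong candidate for manual review, although obfuscated code can easily hide these strings from a plain keyword search.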
The VBA code in malicious Microsoft Office files is frequently obfuscated, and it may look similar to the image below. Attackers will obfuscate a macro’s code to make it harder and more time-consuming for antiviruses and malware analysts to understand what the code is doing. Attackers use several techniques including: