String Theory (..for Windows)
mark January 9th, 2007
In languages that have direct memory access, string manipulation has always been tough to get right. When the dangers of buffer overflows were first realized, many applications that did any sort of string processing were found to have vulnerabilities that we would now consider obvious. Over the years, system string APIs have been updated, refined, and replaced. Interestingly, these various replacement functions have exhibited additional quirks, which can render an application vulnerable in more subtle ways than standard buffer overflows. Some of these quirks are well-documented and well-known (such as the strncpy() function not NUL-terminating), and some of them are not.

Windows provides quite a large number of string functions, each with its own subtle variations. As an auditor, it is important to know these differences so that you can accurately assess whether an application is vulnerable to memory corruption. With that in mind, I’ve put together a series of tables of quick facts about the various string functions available in a Win32 programming environment, which should be useful as a reference when auditing string manipulation routines.
To make the page more readable, I have divided the API functions over several tables. The first table we will look at covers the standard string copy and concatenation API functions available.
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| strcpy(), wcscpy(), _mbscpy(), _tcscpy() | NTDLL.DLL | | Yes | Pointer to destination string | Unbounded string copy functions; should be considered dangerous. |
| strncpy(), wcsncpy(), _mbsncpy(), _tcsncpy() | NTDLL.DLL | | No | Pointer to destination string | These functions do not guarantee NUL-termination, which can lead to out-of-bounds memory access or corruption. |
| strcat(), wcscat(), _mbscat(), _tcscat() | NTDLL.DLL, MSVCRT.DLL, MSVCR*.DLL | | Yes | Pointer to destination string | Unbounded string concatenation functions; should be considered dangerous. |
| strncat(), wcsncat(), _mbsncat(), _tcsncat() | NTDLL.DLL, MSVCRT.DLL, MSVCR*.DLL | | Yes | Pointer to destination string | The maximum length parameter indicates how much space (in characters) remains in the buffer, not the total size of the buffer. It is common for developers to accidentally supply the entire size of the buffer. The maximum length parameter also does not include the trailing NUL character, so the correct length to supply is (sizeof(buffer)/sizeof(TCHAR) - _tcslen(buffer) - 1). This calculation can lead to integer underflow conditions. |
Most of the functions in this first table have pretty well-known pitfalls, but we needed to cover them for completeness. Obviously the most dangerous functions are the unbounded ones, but each of the bounded alternatives also has its own problem: strncpy() can fail to NUL-terminate, and strncat() has a deceptive size parameter.
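Both pitfalls can be worked around with a little defensive code. The following is a minimal sketch in portable C, assuming narrow char buffers; the helper names copy_terminated() and append_bounded() are hypothetical and not part of any Windows API. The same pattern should apply to the wide and TCHAR variants if element counts are used in place of byte counts.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (dst_size bytes) with strncpy() and always NUL-terminate.
 * Returns 0 if src fit, 1 if it was truncated, -1 on bad arguments. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated ... */
    dst[dst_size - 1] = '\0';         /* ... so terminate explicitly */

    return strlen(src) >= dst_size ? 1 : 0;
}

/* Append src to dst (dst_size bytes in total) with strncat(), passing the
 * REMAINING space rather than the total size.
 * Returns 0 if src fit, 1 if it was truncated, -1 on bad arguments. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    const char *end;
    size_t used, room;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    /* Find the existing terminator without reading past the buffer. If there
     * is none, "dst_size - used - 1" would underflow, so refuse to go on. */
    end = memchr(dst, '\0', dst_size);
    if (end == NULL)
        return -1;

    used = (size_t)(end - dst);
    room = dst_size - used - 1;       /* space left, excluding the NUL */

    strncat(dst, src, room);          /* strncat() adds the NUL itself */

    return strlen(src) > room ? 1 : 0;
}
```

The memchr() check is the part that addresses the integer underflow mentioned in the table: if the destination is not already terminated inside its buffer, the subtraction is never performed.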
Moving on, let’s look at the replacement lstr* functions:
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| lstrcpyA(), lstrcpyW() | KERNEL32.DLL | | Yes | Pointer to destination buffer, or NULL if an exception occurs | Unbounded string copy functions; should be considered dangerous. Installs exception handlers: these functions will not crash on NULL-dereferences, invalid pointers, or hitting page boundaries. |
| lstrcpynA(), lstrcpynW() | KERNEL32.DLL | | Yes | Pointer to destination buffer, or NULL if an exception occurs | Installs exception handlers: these functions will not crash on NULL-dereferences, invalid pointers, or hitting page boundaries. Unlike strncpy() etc., these functions do NUL-terminate (unless an exception occurs when hitting a page boundary). |
| lstrcatA(), lstrcatW() | KERNEL32.DLL | | Yes | Pointer to destination string, or NULL if an exception occurs | Installs exception handlers: these functions will not crash on NULL-dereferences, invalid pointers, or hitting page boundaries. |
These functions can be used as alternatives to the standard string API functions. Their primary difference is that they install their own exception handler, so they can handle NULL pointers, invalid pointers, and hitting guard pages. For any of these cases to occur, a bug must already have occurred (not checking a return value, doing an unbounded string copy, etc.), but it might be interesting in cases where triggering such a bug without the application immediately crashing might lead to some sort of exploitable scenario. Also note that unlike strncpy(), lstrcpyn() does guarantee NUL-termination.
Next on the list are our string formatting functions.
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| sprintf(), swprintf(), _stprintf(), vsprintf(), vswprintf(), _vstprintf() | NTDLL.DLL, MSVCRT.DLL, MSVCR*.DLL | | Yes | Returns the number of characters written to the output stream. The vs* functions can actually return a negative value if a parameter validation handler is installed and a NULL destination buffer or format string is supplied. By default, if such an error occurs, an exception is thrown (NTSTATUS_INVALID_PARAMETER). | Unbounded string formatting functions; should be considered dangerous. |
| _snprintf(), _snwprintf(), _sntprintf(), _vsnprintf(), _vsnwprintf(), _vsntprintf() | NTDLL.DLL, MSVCRT.DLL, MSVCR*.DLL | | No | Returns the number of characters written to the output stream (not including NUL), or -1 if truncation was necessary. | Note that these functions work quite differently from the UNIX versions. Most UNIX versions guarantee NUL-termination, and return the number of characters that would have been written if there had been enough room. |
| wsprintfA(), wsprintfW(), wvsprintfA(), wvsprintfW() | USER32.DLL | | Yes | Returns the number of characters (not including NUL) it wrote successfully to the output stream. | These functions are unbounded, but won’t write more than 1024 characters + 1 trailing NUL character (for a total of 1025 characters). This is in contrast to the MSDN documentation, which says it writes a maximum of 1024 characters and does not guarantee NUL-termination. The return value doesn’t give an indication of whether truncation occurred. |
| wnsprintfA(), wnsprintfW(), wvnsprintfA(), wvnsprintfW() | SHLWAPI.DLL | | Yes | Returns the number of characters it wrote to the output stream (not including NUL). | The return value doesn’t give an indication of whether truncation occurred. |
As you can see, the various formatting functions available also have a number of interesting quirks. Some of them have similar names to the UNIX functions, yet behave differently, and the alternatives provided have their own potential problems. One problem that I haven’t noted in the table is that the wide character printf()-style functions (with the exception of the wnsprintfW() and wvnsprintfW() functions) have an additional quirk: output stops at the point where a WEOF (0xFFFF) character is encountered, so the string is truncated there and an error is returned. For example, consider the following code:
```c
WCHAR buf[MAX_SIZE];
int rc;

rc = _snwprintf(buf, MAX_SIZE, L"%s test", user_input);
/* ... more code ... */
```
If the user_input string contains the characters "AAA" + "\xFFFF" + "BBB", then buf will hold "AAA" + "\xFFFF", followed by a NUL-terminator. The "BBB" and the static " test" parts will be truncated, and rc will hold the error code -1. This could be useful in situations where a user wants to arbitrarily truncate some data when one of the printf() functions is used. Note that this behavior is a little different in the secure string functions that we will be exploring later on, specifically the *_s functions from the "Secure CRT API" and the String* functions from the "StrSafe API". When either of these is used, the entire buffer will be truncated if a WEOF character is encountered (that is, a NUL character will be placed at the beginning of the destination buffer, discarding any previously formatted data that had already been written there).
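Because the Windows and UNIX flavours of the bounded formatting functions disagree on what the return value means, a truncation check has to cover both conventions. The sketch below uses the standard C99 vsnprintf() so that it compiles anywhere; the helper name format_checked() is hypothetical. The comparison is written so that, as far as I can tell, it would also catch the -1 that the _snprintf() family returns on truncation, as well as the case where the output exactly fills the buffer and no NUL is written.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format into dst (dst_size bytes).
 * Returns 0 on success, -1 if the output was truncated or an error occurred.
 * dst is always NUL-terminated on return when dst_size > 0. */
int format_checked(char *dst, size_t dst_size, const char *fmt, ...)
{
    va_list ap;
    int rc;

    if (dst == NULL || fmt == NULL || dst_size == 0)
        return -1;

    va_start(ap, fmt);
    rc = vsnprintf(dst, dst_size, fmt, ap);
    va_end(ap);

    /* Never rely on the formatting function to terminate the buffer. */
    dst[dst_size - 1] = '\0';

    /* rc < 0          : error, or truncation in the _snprintf() convention
     * rc >= dst_size  : truncation in the C99 convention, or an exactly
     *                   full, unterminated buffer in the _snprintf() one */
    if (rc < 0 || (size_t)rc >= dst_size)
        return -1;

    return 0;
}
```

A caller that ignores the return value still gets a terminated string, but it has no way of knowing that data was cut off, which is exactly the situation an auditor should look for.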
For completeness, I’ll also add a quick table for what I’ve called the "Random Shell API". This is a collection of functions that are implemented in the ‘Shell Light-Weight API’ library (SHLWAPI.DLL), and don’t seem to belong anywhere else.
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| StrCpyA(), StrCpyW() | SHLWAPI.DLL | | Yes | Pointer to destination string | Acts just like strcpy(), except that it won’t crash if either parameter is NULL. |
| StrCpyNA(), StrCpyNW() | SHLWAPI.DLL | | Yes | Pointer to destination string | MSDN says it doesn’t guarantee NUL-termination, but it seems to in practice. Will crash if a NULL pointer is passed as either parameter. |
| StrCatA(), StrCatW() | SHLWAPI.DLL | | Yes | Pointer to destination string | Behaves the same as strcat(), except that it can handle NULL parameters without crashing. |
| StrNCatA(), StrNCatW() | SHLWAPI.DLL | | Yes | Pointer to destination string | Has a deceptive size parameter like strncat(), except that the size parameter does include the NUL-terminating byte. |
| StrCatBuffA(), StrCatBuffW() | SHLWAPI.DLL | | Yes | Pointer to destination string | Behaves exactly like StrNCat(), except that the size parameter indicates the total size of the buffer, not the remaining size of the buffer. |
The "Shell API" actually contains quite a few other string routines in addition to the ones listed here, but they aren’t in common usage. For more information about them, see the SHLWAPI string function documentation on MSDN.
Microsoft realized that many of the functions we have classified so far are quite easy to misuse, and account for quite a large number of vulnerabilities. Therefore, they decided to implement "secure" alternatives, thus lessening the chance that developers will make mistakes with them. These replacement functions (referred to collectively as the "StrSafe" API), although considerably more secure by design, have also had a negative impact in the application security space: they have forced me to write another one of these big tables.

Each function in the StrSafe API has two different versions: one that takes the size of the destination buffer in characters (denoted by "Cch" appearing in the function name) and one that takes the size of the destination buffer in bytes (denoted by "Cb" appearing in the name). These function pairs explicitly address a mistake that is commonly made with the size parameter of string functions that deal with wide characters: the size parameter for all of the wide character functions we have looked at so far indicates the destination size in wide characters, not in bytes. It is very easy to get this wrong. Consider the following example:
```c
#define MAX_SIZE 256

WCHAR buffer[MAX_SIZE];

wcsncpy(buffer, source, sizeof(buffer));
```
This code looks fairly natural, but in fact the size parameter is incorrect - the size passed to wcsncpy() should be MAX_SIZE, not sizeof(buffer) (which is MAX_SIZE * sizeof(WCHAR)). Therefore, this code is vulnerable to a buffer overflow.
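One way to avoid the mistake is to derive the element count from the array itself. The following is a minimal sketch in portable C, so it uses wchar_t where Windows code would use WCHAR; the ARRAY_COUNT macro and the copy_wide_bounded() helper are hypothetical names introduced for illustration only.

```c
#include <stddef.h>
#include <wchar.h>

#define MAX_SIZE 256

/* Number of elements in a true array (this does not work on a pointer). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Copy source into a local MAX_SIZE-element wide buffer and return the
 * length of the string that was actually stored. */
size_t copy_wide_bounded(const wchar_t *source)
{
    wchar_t buffer[MAX_SIZE];

    /* The count is in wide characters, not bytes. One element is held
     * back because wcsncpy() does not terminate on truncation. */
    wcsncpy(buffer, source, ARRAY_COUNT(buffer) - 1);
    buffer[ARRAY_COUNT(buffer) - 1] = L'\0';

    return wcslen(buffer);
}
```

When auditing, any sizeof() expression passed as the count to a wide character function deserves a second look, since it is only correct if it is divided by the element size.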
To save a bit of space, I have combined both versions of each function into a single table entry. So, without further ado, the secure string API functions:
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| StringCchCopy(), StringCbCopy(), StringCchCopyN(), StringCbCopyN() | | | Yes | Returns S_OK on success, STRSAFE_E_INVALID_PARAMETER if a parameter is invalid, or STRSAFE_E_INSUFFICIENT_BUFFER if it had to truncate data | If the length parameter is longer than STRSAFE_MAX_CCH, the function fails without touching the destination buffer. Return value checking is a lot more important with this API. |
| StringCchCat(), StringCbCat(), StringCchCatN(), StringCbCatN() | | | Yes | Returns S_OK on success, STRSAFE_E_INVALID_PARAMETER if a parameter is invalid, or STRSAFE_E_INSUFFICIENT_BUFFER if it had to truncate data | If the length parameter is longer than STRSAFE_MAX_CCH, the function fails without touching the destination buffer. Return value checking is a lot more important with this API. |
| StringCchPrintf(), StringCbPrintf(), StringCchVPrintf(), StringCbVPrintf() | | | Yes | Returns S_OK on success, STRSAFE_E_INVALID_PARAMETER if a parameter is invalid, or STRSAFE_E_INSUFFICIENT_BUFFER if it had to truncate data | If the length parameter is 0, or longer than STRSAFE_MAX_CCH, the function fails without touching the destination buffer. Return value checking is a lot more important with this API. Throws an invalid parameter exception if "%n" exists in the format string. |
Note that I make a point here about the functions immediately failing if the length parameter is longer than STRSAFE_MAX_CCH characters (0x7FFFFFFF). In fact, this is only true for the *Cch* functions. For the *Cb* functions, the length parameter can’t be longer than STRSAFE_MAX_CCH * sizeof(TCHAR), which is 0xFFFFFFFE if _UNICODE is defined. However, 0xFFFFFFFF also seems to work, so the *Cb* functions seem to accept an unrestricted length.

Another thing to note is that every function listed here also has a corresponding function ending in "Ex" (StringCchCopyEx(), StringCbCopyEx(), etc.) that takes three additional parameters: the address of a TCHAR pointer to be updated to point to the end of the destination string, a pointer to a DWORD that will be filled out with the number of remaining (unused) characters in the destination buffer, and a flags value that influences some behavioral characteristics of the function. These characteristics can affect whether a potentially vulnerable call to one of these functions is actually exploitable. The flags that the developer can supply, as described on MSDN, are:
- STRSAFE_FILL_BEHIND_NULL
  - If the function succeeds, the low byte of dwFlags (0) is used to fill the uninitialized portion of pszDest following the terminating null character.
- STRSAFE_IGNORE_NULLS
  - Treat null string pointers like empty strings (TEXT("")).
- STRSAFE_FILL_ON_FAILURE
  - If the function fails, the low byte of dwFlags (0) is used to fill the entire pszDest buffer, and the buffer is null-terminated. In the case of a STRSAFE_E_INSUFFICIENT_BUFFER failure, any truncated string returned is overwritten.
- STRSAFE_NULL_ON_FAILURE
  - If the function fails, pszDest is set to an empty string (TEXT("")). In the case of a STRSAFE_E_INSUFFICIENT_BUFFER failure, any truncated string is overwritten.
- STRSAFE_NO_TRUNCATION
  - As in the case of STRSAFE_NULL_ON_FAILURE, if the function fails, pszDest is set to an empty string (TEXT("")). In the case of a STRSAFE_E_INSUFFICIENT_BUFFER failure, any truncated string is overwritten.
The STRSAFE_FILL_BEHIND_NULL and STRSAFE_FILL_ON_FAILURE flags might be interesting in cases where an erroneous length parameter has been passed to one of these functions. The STRSAFE_NULL_ON_FAILURE and STRSAFE_NO_TRUNCATION flags could be interesting if a string is being built with several successive calls to one of the StrSafe functions, and you wanted to completely remove some of the string components from the destination buffer for some reason (such an attack would be predicated on the fact that they aren’t doing much return value checking). Finally, the STRSAFE_IGNORE_NULLS flag could be useful if allocation of a destination string has failed, and the error hasn’t been detected - keeping the program alive longer might have some interesting consequences.
Last but not least, we have the "Secure CRT" string replacement API. These functions are intended to replace many of the functions that we discussed earlier in this post (strcpy(), sprintf(), etc.). The interesting thing about these functions is that they can automatically be compiled into an application in place of the functions they are intended to replace. This can be achieved in C++ programs by using Microsoft’s Secure Template Overloads. Essentially, in cases where statically sized arrays are used as destination buffers for any of the standard CRT string routines, the standard function’s secure counterpart will be substituted into the code in its place. In cases where dynamically sized arrays are used, no substitution is made. To take advantage of this functionality, the program must define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES as 1. Consult the MSDN page on Secure Template Overloads for more information.
| Function Name | DLL | Function Parameters | NUL-Termination Guaranteed | Return Value | Comments |
|---|---|---|---|---|---|
| strcpy_s(), wcscpy_s(), _mbscpy_s(), _tcscpy_s() | MSVCR80.DLL | | Yes | 0 on success, error code on failure | If the maximum length is exceeded and no parameter validation handler is installed, the Watson crash reporting mechanism is invoked. The length parameter is not constrained as it is in the String* functions. |
| strncpy_s(), wcsncpy_s(), _mbsncpy_s(), _tcsncpy_s() | MSVCR80.DLL | | Yes | 0 on success, error code on failure | If the maximum length is exceeded and no parameter validation handler is installed, the Watson crash reporting mechanism is invoked. The length parameter is not constrained as it is in the String* functions. |
| strcat_s(), wcscat_s(), _mbscat_s(), _tcscat_s() | MSVCR80.DLL | | Yes | 0 on success, error code on failure | If the maximum length is exceeded and no parameter validation handler is installed, the Watson crash reporting mechanism is invoked. If the destination string is already longer than the length parameter, the same thing happens. The length parameter is not constrained as it is in the String* functions. |
| strncat_s(), wcsncat_s(), _mbsncat_s(), _tcsncat_s() | MSVCR80.DLL | | Yes | 0 on success, error code on failure | If the maximum length is exceeded and no parameter validation handler is installed, the Watson crash reporting mechanism is invoked. If the destination string is already longer than the length parameter, the same thing happens. The length parameter is not constrained as it is in the String* functions. |
| sprintf_s(), swprintf_s(), _stprintf_s(), vsprintf_s(), vswprintf_s(), _vstprintf_s() | MSVCR80.DLL | | Yes | The number of characters written to the output stream, or -1 on error | If the maximum length is exceeded and no parameter validation handler is installed, the Watson crash reporting mechanism is invoked. The length parameter is not constrained as it is in the String* functions. |
So, you can see that there are many different string functions available in a Windows programming environment. In fact, there are about 3 or 4 different ways to accomplish the same task. So, when auditing code, you need to be aware of the special cases for all of the functions we have talked about. Knowing about these cases will enable you to thoroughly and accurately assess a given piece of code to determine whether it is vulnerable in some way or another.
Happy Hunting!

mark is so sexy!
thats really cool man thnx
I say this only because this post was tagged “C/C++”, and because you mention functions that are supplied with VC++, but there is, of course, C++’s string library, which has two convenient properties:
1) it’s counted (no annoying null termination to mess with)
2) it takes care of ensuring that the buffer is the right size for you
I would argue that using the functions described in this post is unnecessarily complicated and that std::string/std::wstring are the only sensible string libraries that people should use. C strings are just asking for trouble. The only reasonable objection I think is the lack of a printf() equivalent, although this is not insurmountable (due to stringstreams and various parts of the Boost library).
Good point, but I should explain that our primary concern is with auditing real-world code. So, even if there are safer ways of doing things, we have to focus on what people actually do in practice.
Good post, I’m surprised you didn’t include the often misused sscanf() family.
ron: Thanks! Yeah, the scanf() family of functions might have been a good addition. I did consider it. Essentially, there are a huge number of string functions in addition to the ones I posted: *gets(), *scanf(), and many, many more if you look over MSDN. I just felt I should draw the line somewhere, because the post was getting quite large. The scanf() functions are a borderline case though; they are used with relative frequency, I suppose.
anonymous: I know it, baby!
One little gotcha with lstrcpyn(A/W) is that if the last parameter, the number of characters to be copied, is zero, then nothing at all is done: no characters are written, leaving your destination buffer untouched.
Imagine this:
```c
char szBuffer1[32];
char szBuffer2[] = "Hello world";
int nCharsToCopy = 0; /* stands in for a user-supplied value validated only as >= 0 */

lstrcpyn(szBuffer1, szBuffer2, nCharsToCopy);
```
szBuffer1 is left completely uninitialized, potentially unterminated.