<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.5" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Ranged Integers and Saturation Semantics - By Robert C. Seacord</title>
	<link>http://taossa.com/index.php/2007/01/18/ranged-integers-and-semantics/</link>
	<description>Continued ramblings on software security and code auditing</description>
	<pubDate>Tue, 07 Sep 2010 10:57:37 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.5</generator>

	<item>
		<title>by: jm</title>
		<link>http://taossa.com/index.php/2007/01/18/ranged-integers-and-semantics/#comment-76</link>
		<pubDate>Mon, 22 Jan 2007 10:42:52 +0000</pubDate>
		<guid>http://taossa.com/index.php/2007/01/18/ranged-integers-and-semantics/#comment-76</guid>
					<description>Things have been hectic this past week, so sorry I didn't get a chance to comment earlier. Hopefully, we can get some discussion going, and we'll follow up with a blog post later on.

So, there are a few general kinds of things that one would like to address: integer overflows, integer underflows, and the various type-conversion gotchas: value-changing signed/unsigned conversions, sign-extension flaws, and truncation. I might come up with a quick set of examples of integral vulnerabilities to help frame the discussion. One thing we observed when writing the book is that many integer-related vulnerabilities can be analyzed in terms of three components, which we denoted ACC: allocation, check, and copy (or more generally, write access). Generally, when arithmetic expressions can be manipulated such that any of these processes gets subverted, or desynchronized from the others, you end up with a situation where you can write outside of the intended memory boundaries. Naturally, not all integer-related vulnerabilities can be conceptualized in this way, but it works pretty well for the mainstream case. Anyway, it would probably be helpful to sketch out a handful of insecure idioms in order to help think about how language modifications could address them. I'll try to do this later on.

Thinking out loud, there are probably four goals that modifications to the language could aspire to:

1. Make it easy for someone concerned with integral flaws to write bulletproof code. e.g. ok, I need to take a size from a packet, do some math on it, allocate some memory, and then parse and populate my data structures. If there are some simple features or code idioms that will prevent me from shooting myself in the foot, I'll definitely employ them.

2. Make it easy for someone to modify existing code so that it can be made safe with a minimum of effort. e.g. I have some complicated code here to read in several bitmaps and font definitions from a file, and I can tell it's going to be vulnerable at least fifteen ways to Sunday with all the multiplications and allocations going on. Can I make a couple of changes to variable types or something equally minor that will just make it safe?

3. Make existing code safe without modification to that code. Can I set some compiler flag or pragma, or can we change C/C++ so that vulnerable code can be recompiled so that it is safe?

4. Change the language and/or make simple idioms possible that would herd developers in general towards writing secure code just by using natural constructs. In other words, make changes that non-security aware developers would want to use. Maybe something like bounds-checked arrays?

Right off the bat, I think #3 would probably be impossible, and even if it were possible, such changes wouldn't ever be approved because they'd destroy more code than they would fix. I think the set of developers that actually understand integral behavior in C is arguably vastly smaller than the set of developers that *think* they understand integral behavior in C. So, I'd argue that #1 and #2 are currently hard in C, for a coder that hasn't written a compiler or sat down with the standards (or a good book ;&gt;). If the modifications to the language for #1 and #2 were well-done, they might ultimately become idiomatic, realizing #4. 

Anyway, enough rambling. The idea of applying saturation semantics is IMHO quite insightful, as I think it would address many types of integer vulnerabilities. Here are some thoughts:

The idea of encoding range into type is interesting, but, I do see your point about it ballooning the number of definable types in the C language very quickly. The inability to write a general purpose function sort of reminds me of that one classic article on why Pascal sucks - array sizes are part of the type, thus you can't write a general purpose function that takes an array and a number of elements.

The way that current loop idioms would fail with ranged integers is an interesting observation. It almost seems like the use of ranged integers in this context is really shadow-implementing bounds-checked arrays. I obviously haven't thought this through very much, but if you did have bounds-checked pointers, you could use the old idioms. E.g. instead of turning a[b] into (*((a)+(b))), the compiler could apply saturation semantics to b as an offset of a. This would probably change how things work quite a bit, but it's another possible idea. Not sure how much it would demolish existing compilers or how much of it would be optimizable with static analysis.

Globally applying saturation semantics to signed integers might break existing code. Obviously existing code shouldn't rely on implementation-defined behavior, but one thing that springs to mind is the TCP sequence number code. It uses an idiom like: 
&lt;code&gt;#define    SEQ_LT(a,b)    ((int)((a)-(b)) &lt; 0)&lt;/code&gt;
Haven't thought it through, but that would probably be messed up.

Having support for a _Sat keyword would probably kill a lot of exploitable issues, even if it was just bounded to the normal integral boundaries. I kinda like this, but I need to think through it more.

Liudy's point is well taken - modifications would likely introduce new subtle semantics and gotchas. I guess it comes down to what you're hoping to accomplish, but if the newer semantics are more deterministic and straightforward than the older ones, it might be worth doing.

Unsigned arithmetic is defined to wrap modulo 2^N, but maybe it would be useful to have the ability to have overflow and underflow cause an implementation-defined result, including an exception. The machine support would probably be sketchy (or maybe not, most arithmetic isn't sign-sensitive so it might just require duplicating flags checking code in the compiler). The problems with exceptions are that they are necessarily run-time rather than compile-time, and that they raise the issues with intermediate results that Liudy mentioned. Definitely an interesting idea, though.

Anyway, lots to think about. I usually think about how to break things, not fix them. :&gt;</description>
		<content:encoded><![CDATA[<p>Things have been hectic this past week, so sorry I didn&#8217;t get a chance to comment earlier. Hopefully, we can get some discussion going, and we&#8217;ll follow up with a blog post later on.</p>
<p>So, there are a few general kinds of things that one would like to address: integer overflows, integer underflows, and the various type-conversion gotchas: value-changing signed/unsigned conversions, sign-extension flaws, and truncation. I might come up with a quick set of examples of integral vulnerabilities to help frame the discussion. One thing we observed when writing the book is that many integer-related vulnerabilities can be analyzed in terms of three components, which we denoted ACC: allocation, check, and copy (or more generally, write access). Generally, when arithmetic expressions can be manipulated such that any of these processes gets subverted, or desynchronized from the others, you end up with a situation where you can write outside of the intended memory boundaries. Naturally, not all integer-related vulnerabilities can be conceptualized in this way, but it works pretty well for the mainstream case. Anyway, it would probably be helpful to sketch out a handful of insecure idioms in order to help think about how language modifications could address them. I&#8217;ll try to do this later on.</p>
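<p>To make the ACC idea concrete, here is a rough sketch of a parser step where allocation, check, and copy all use one validated size, so arithmetic on attacker-controlled values cannot pull them apart. The names checked_mul_size and copy_records are made up for illustration and do not come from any real library.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: multiply two sizes, reporting overflow instead of
   silently wrapping. Returns 1 on success, 0 if a * b does not fit. */
static int checked_mul_size(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return 0;
    *out = a * b;
    return 1;
}

/* Hypothetical parser step: copy `count` records of `rec_size` bytes out of
   an untrusted buffer of `src_len` bytes. The allocation, the check, and the
   copy all use the same validated `total`. */
static void *copy_records(const unsigned char *src, size_t src_len,
                          size_t count, size_t rec_size)
{
    size_t total;
    void *dst;

    if (!checked_mul_size(count, rec_size, &total))
        return NULL;                    /* size computation would have wrapped */
    if (total > src_len)
        return NULL;                    /* check: never read past the input */
    dst = malloc(total ? total : 1);    /* allocation */
    if (dst != NULL)
        memcpy(dst, src, total);        /* copy */
    return dst;
}
```

<p>The classic bug is the same code without checked_mul_size: the multiplication wraps to a small value, the allocation is short, and the copy loop still runs on the original count.</p>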
<p>Thinking out loud, there are probably four goals that modifications to the language could aspire to:</p>
<p>1. Make it easy for someone concerned with integral flaws to write bulletproof code. e.g. ok, I need to take a size from a packet, do some math on it, allocate some memory, and then parse and populate my data structures. If there are some simple features or code idioms that will prevent me from shooting myself in the foot, I&#8217;ll definitely employ them.</p>
<p>2. Make it easy for someone to modify existing code so that it can be made safe with a minimum of effort. e.g. I have some complicated code here to read in several bitmaps and font definitions from a file, and I can tell it&#8217;s going to be vulnerable at least fifteen ways to Sunday with all the multiplications and allocations going on. Can I make a couple of changes to variable types or something equally minor that will just make it safe?</p>
<p>3. Make existing code safe without modification to that code. Can I set some compiler flag or pragma, or can we change C/C++ so that vulnerable code can be recompiled so that it is safe?</p>
<p>4. Change the language and/or make simple idioms possible that would herd developers in general towards writing secure code just by using natural constructs. In other words, make changes that non-security aware developers would want to use. Maybe something like bounds-checked arrays?</p>
<p>Right off the bat, I think #3 would probably be impossible, and even if it were possible, such changes wouldn&#8217;t ever be approved because they&#8217;d destroy more code than they would fix. I think the set of developers that actually understand integral behavior in C is arguably vastly smaller than the set of developers that *think* they understand integral behavior in C. So, I&#8217;d argue that #1 and #2 are currently hard in C, for a coder that hasn&#8217;t written a compiler or sat down with the standards (or a good book ;>). If the modifications to the language for #1 and #2 were well-done, they might ultimately become idiomatic, realizing #4. </p>
<p>Anyway, enough rambling. The idea of applying saturation semantics is IMHO quite insightful, as I think it would address many types of integer vulnerabilities. Here are some thoughts:</p>
<p>The idea of encoding range into type is interesting, but, I do see your point about it ballooning the number of definable types in the C language very quickly. The inability to write a general purpose function sort of reminds me of that one classic article on why Pascal sucks - array sizes are part of the type, thus you can&#8217;t write a general purpose function that takes an array and a number of elements.</p>
<p>The way that current loop idioms would fail with ranged integers is an interesting observation. It almost seems like the use of ranged integers in this context is really shadow-implementing bounds-checked arrays. I obviously haven&#8217;t thought this through very much, but if you did have bounds-checked pointers, you could use the old idioms. E.g. instead of turning a[b] into (*((a)+(b))), the compiler could apply saturation semantics to b as an offset of a. This would probably change how things work quite a bit, but it&#8217;s another possible idea. Not sure how much it would demolish existing compilers or how much of it would be optimizable with static analysis.</p>
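<p>A hand-written approximation of that idea, assuming a pointer that carries its element count with it. The struct bounded_ints and the function sat_at are hypothetical names for this sketch; a compiler doing this for a[b] would presumably generate something equivalent.</p>

```c
#include <stddef.h>

/* Hypothetical "fat pointer": base address plus element count. */
struct bounded_ints {
    int   *base;
    size_t len;     /* must be at least 1 for sat_at() to be usable */
};

/* Saturate the offset b into [0, len - 1] before forming the address,
   instead of computing *(a + b) unconditionally. */
static int *sat_at(struct bounded_ints a, ptrdiff_t b)
{
    size_t idx;

    if (b < 0)
        idx = 0;                /* clamp negative offsets to the first element */
    else if ((size_t)b >= a.len)
        idx = a.len - 1;        /* clamp large offsets to the last element */
    else
        idx = (size_t)b;
    return a.base + idx;
}
```

<p>An out-of-range index then reads or writes the wrong element, which is still a bug, but it stays inside the object.</p>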
<p>Globally applying saturation semantics to signed integers might break existing code. Obviously existing code shouldn&#8217;t rely on implementation-defined behavior, but one thing that springs to mind is the TCP sequence number code. It uses an idiom like:<br />
<code>#define    SEQ_LT(a,b)    ((int)((a)-(b)) < 0)</code><br />
Haven&#8217;t thought it through, but that would probably be messed up.</p>
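<p>A small sketch of why, under one assumed reading of the proposal: the unsigned subtraction still wraps, but the conversion to a signed type saturates instead of wrapping. The function names are made up, and fixed-width types stand in for the int in the macro.</p>

```c
#include <stdint.h>

/* The idiom above with fixed-width types. The subtraction wraps modulo 2^32.
   Converting an out-of-range value to int32_t is implementation-defined, but
   two's complement implementations typically yield the signed distance. */
static int seq_lt_wrap(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical saturating conversion: values above INT32_MAX clamp to it. */
static int32_t sat_to_i32(uint32_t v)
{
    return v > (uint32_t)INT32_MAX ? INT32_MAX : (int32_t)v;
}

/* The same comparison if the conversion saturated: the result can never be
   negative, so the test is always false. */
static int seq_lt_sat(uint32_t a, uint32_t b)
{
    return sat_to_i32(a - b) < 0;
}
```

<p>With wrapping, seq_lt_wrap(0xFFFFFFF0, 0x10) is true, since the first sequence number precedes the second across the wrap. With the saturating conversion the difference clamps to INT32_MAX and the comparison silently stops working.</p>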
<p>Having support for a _Sat keyword would probably kill a lot of exploitable issues, even if it was just bounded to the normal integral boundaries. I kinda like this, but I need to think through it more.</p>
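<p>For reference, this is roughly what saturation at the normal integral boundaries would compute, written out by hand. The helpers sat_add_i32 and sat_mul_u32 are hypothetical; a real _Sat qualifier would presumably let the compiler emit this for ordinary operators.</p>

```c
#include <stdint.h>

/* Signed addition that clamps to INT32_MAX / INT32_MIN instead of
   overflowing (which would be undefined behavior for plain int). */
static int32_t sat_add_i32(int32_t a, int32_t b)
{
    if (b > 0 && a > INT32_MAX - b)
        return INT32_MAX;
    if (b < 0 && a < INT32_MIN - b)
        return INT32_MIN;
    return a + b;
}

/* Unsigned multiplication that clamps to UINT32_MAX instead of wrapping. */
static uint32_t sat_mul_u32(uint32_t a, uint32_t b)
{
    if (a != 0 && b > UINT32_MAX / a)
        return UINT32_MAX;
    return a * b;
}
```

<p>The appeal for length calculations is that a saturated result stays huge, so a later bounds check against it tends to fail closed rather than pass on a small wrapped value.</p>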
<p>Liudy&#8217;s point is well taken - modifications would likely introduce new subtle semantics and gotchas. I guess it comes down to what you&#8217;re hoping to accomplish, but if the newer semantics are more deterministic and straightforward than the older ones, it might be worth doing.</p>
<p>Unsigned arithmetic is defined to wrap modulo 2^N, but maybe it would be useful to have the ability to have overflow and underflow cause an implementation-defined result, including an exception. The machine support would probably be sketchy (or maybe not, most arithmetic isn&#8217;t sign-sensitive so it might just require duplicating flags checking code in the compiler). The problems with exceptions are that they are necessarily run-time rather than compile-time, and that they raise the issues with intermediate results that Liudy mentioned. Definitely an interesting idea, though.</p>
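<p>In portable C today, the closest thing is to let the operation wrap and then test for it, which is more or less what checking the carry flag would do. A sketch with made-up helper names:</p>

```c
#include <stdint.h>

/* Add with wrap detection. Stores the wrapped sum in *out and returns 1 if
   the mathematical result fit, 0 if it wrapped. */
static int checked_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    uint32_t sum = a + b;   /* well-defined: wraps modulo 2^32 */
    *out = sum;
    return sum >= a;        /* a wrapped sum is always smaller than a */
}

/* Subtract with wrap detection, same convention. */
static int checked_sub_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    *out = a - b;           /* wraps modulo 2^32 when b > a */
    return a >= b;
}
```

<p>A caller that wanted exception-like behavior could abort or return an error whenever one of these returns 0.</p>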
<p>Anyway, lots to think about. I usually think about how to break things, not fix them. :>
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Liudvikas Bukys</title>
		<link>http://taossa.com/index.php/2007/01/18/ranged-integers-and-semantics/#comment-75</link>
		<pubDate>Fri, 19 Jan 2007 15:22:58 +0000</pubDate>
		<guid>http://taossa.com/index.php/2007/01/18/ranged-integers-and-semantics/#comment-75</guid>
					<description>Both modwrap and saturation semantics would inspire a whole new generation of "unexpected arithmetic results" vulnerabilities.

Exception on overflow would be painful but perhaps might be justified as a way to force coders to really understand what code does under any circumstance.  It's painful because there are a number of common expressions where "the right thing" happens despite overflows in intermediate results, and it would take a while for people to understand what's OK in an exception-on-overflow world.</description>
		<content:encoded><![CDATA[<p>Both modwrap and saturation semantics would inspire a whole new generation of &#8220;unexpected arithmetic results&#8221; vulnerabilities.</p>
<p>Exception on overflow would be painful but perhaps might be justified as a way to force coders to really understand what code does under any circumstance.  It&#8217;s painful because there are a number of common expressions where &#8220;the right thing&#8221; happens despite overflows in intermediate results, and it would take a while for people to understand what&#8217;s OK in an exception-on-overflow world.
</p>
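<p>One small instance of such an expression, sketched with hypothetical function names: a + b - c in 32-bit unsigned arithmetic, where a + b wraps but the final value is still correct, next to a variant that treats the intermediate wrap as an error.</p>

```c
#include <stdint.h>

/* Computes a + b - c modulo 2^32. If a + b wraps but the true result fits
   in 32 bits, the subtraction wraps back and the answer is still right. */
static uint32_t add_then_sub(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t tmp = a + b;   /* may wrap */
    return tmp - c;
}

/* The same computation in an exception-on-overflow style: returns 0 as soon
   as any intermediate step would wrap, 1 otherwise. */
static int add_then_sub_trapping(uint32_t a, uint32_t b, uint32_t c,
                                 uint32_t *out)
{
    uint32_t tmp;

    if (a > UINT32_MAX - b)
        return 0;           /* a + b would wrap */
    tmp = a + b;
    if (tmp < c)
        return 0;           /* tmp - c would wrap */
    *out = tmp - c;
    return 1;
}
```

<p>For a = 0xFFFFFFF0, b = 0x20, c = 0x30 the first function returns 0xFFFFFFE0, which is the mathematically correct result, while the second rejects the same inputs.</p>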
]]></content:encoded>
				</item>
</channel>
</rss>

