So I was wrong

In a previous post I thought the Intel-Apple talks were over XScale processors. I was wrong. As announced at WWDC this week, Apple is indeed going to be using Intel chips in their upcoming computers.

I think this is a bad move. Not because they’re using Intel chips, but because they’re using the wrong Intel chips. x86/IA32 is a dead-end road. Intel even admits that x86 is not the future. x86 is a 20-year-old instruction set architecture. No one respects it.

Going with Itanium/IA64 would have been much smarter. Intel could probably even produce a special line of Itanium chips that does microcode emulation of PowerPC, much like the line HP made that ran both IA64 and PA-RISC.

I was also incredibly wrong when I wrote previously that developers would throw fits if Apple changed things up again. I forgot that MacOS X apps are compiled with gcc. Switching targets would just mean setting a compiler flag: -arch ppc vs. -arch i386. And since Apple evolved the magnificent Bundle format for OS X, making applications “dual platform” would also be a snap.

The big unknown that’s left in my mind is: will the rumored “OS X 10.4.1 x86 Preview Edition” run on standard PC hardware, or will it require the “Developer Transition Kit?” If I could run OS X on cheap-o hardware I definitely would. I guess we’ll have to wait two weeks to find out, since that’s when it’s scheduled for release.

Actually, will Apple make OS X available for x86 on non-Apple hardware at all? Darwin x86 has been available for a long time. I predict Apple will do one of two things: a) build in some special hardware that OS X can use to identify the system as genuine Apple hardware, or b) write in some insane operating system copy protection much like Microsoft’s. Apple would basically be killing off their hardware division if they released OS X for x86 without any kind of protection… or maybe that’s the plan… ?

Terrain rendering using geometry clipmaps

Link

In a previous post I wrote about some optimizations I was trying to make to geometry clipmaps. I’ve summarized my work and posted it at the link above. There’s also a video!

I think the real contribution here is the normal blending technique that I came up with. It uses 1/4 the memory of the previous geometry clipmap work and doesn’t look 1/2 bad. 🙂

Geometry Clipmaps

I’ve been working on recreating the research of Losasso, Asirvatham, and Hoppe on GPU-based Geometry Clipmaps. One of their articles on the subject appears in GPU Gems 2. Geometry Clipmaps are a simple approach to rendering large terrains. The basic idea is you create fixed donut rings of varying LOD and then displace these rings using a heightmap in the GPU’s vertex shader. Offloading the displacement into the GPU is a significant performance improvement as it avoids continuous recreation of vertex and index buffers.
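To make the idea concrete, here’s a minimal CPU-side sketch of the displacement step, assuming a heightmap stored as a plain float array (the Heightmap and displaceRing names are mine). In the real algorithm this work happens in the vertex shader: each ring vertex keeps its fixed x,z grid position, and only the height is looked up each frame.

#include <vector>

// Hypothetical heightmap with clamped integer lookups.
struct Heightmap
{
    int width, height;
    std::vector<float> samples; // row-major, width * height entries

    float at(int x, int z) const
    {
        if (x < 0) x = 0;
        if (x >= width) x = width - 1;
        if (z < 0) z = 0;
        if (z >= height) z = height - 1;
        return samples[z * width + x];
    }
};

struct Vertex { float x, y, z; };

// Displace one LOD ring: only y changes, so the ring's vertex and
// index buffers never have to be rebuilt.
void displaceRing(std::vector<Vertex>& ring, const Heightmap& hm)
{
    for (Vertex& v : ring)
        v.y = hm.at((int)v.x, (int)v.z);
}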

This week I implemented two ideas that I thought would be improvements to the Geometry Clipmap algorithm presented in GPU Gems 2. The first I call the “view dependent” approach. In Asirvatham’s paper the clipmaps are oriented along the x,z plane and move only when the camera exceeds some threshold. Every frame, the positions of the clipmaps must be recomputed by the CPU and then passed to the GPU. The “view dependent” approach attempts to skip this step by fixing the clipmap meshes in view space.

Imagine the intersection of the view frustum with the plane of the clipmaps. It forms a large trapezoid that extends out to infinity. The view dependent approach fills this trapezoid with triangles that become denser closer to the camera and more spaced out as you move away from the camera. On the GPU, view space is transformed back into world space and then the world space coordinates are used to sample the height map. The advantage of this approach would be that the vertex and index buffers are 100% fixed and no transformations would ever need to take place on the CPU.
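A rough sketch of that transform, under my assumptions about conventions (row-vector matrices, terrain in the x,z plane): take a grid point that never moves in view space, carry it back into world space with the inverse view matrix, and use the resulting x and z to index the heightmap.

struct Mat4 { float m[4][4]; }; // row-vector convention: p' = p * M

// Map a fixed view-space grid point (vx, 0, vz, 1) back into world
// space; only world x and z are needed to sample the heightmap.
void viewToWorld(const Mat4& invView, float vx, float vz,
                 float& wx, float& wz)
{
    wx = vx * invView.m[0][0] + vz * invView.m[2][0] + invView.m[3][0];
    wz = vx * invView.m[0][2] + vz * invView.m[2][2] + invView.m[3][2];
}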

I implemented the view dependent approach and ran into some interesting problems that make it apparent why Asirvatham et al chose to orient the clipmaps in the x,z plane. When the view frustum is oriented at an even 0, 90, 180, or 270 degree angle everything looks perfect, but as you rotate the camera in between these angles you get a stair-stepping effect in the terrain. This is because the points sampled on the heightmap are no longer “even” sample points, and you get an effect similar to drawing a line without anti-aliasing.

To combat this problem I tried solving it the same way you would if you were doing anti-aliasing: super-sampling. Instead of sampling at the point given by the transformation, I sampled 4 points around that point and averaged the results. This did indeed smooth the stair-stepping effect, but just like anti-aliasing in a ray tracer, you’re only pushing the problem further back into the scene–the stair-stepping was still there, it just wasn’t as apparent.
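The supersampling looked roughly like this (a sketch, with sampleHeight standing in for whatever single-tap heightmap lookup you already have): four taps offset by half the local grid spacing, averaged into one height.

// sampleHeight is any callable performing the single heightmap lookup.
template <typename HeightFn>
float supersampleHeight(HeightFn sampleHeight, float x, float z, float h)
{
    // Average four taps placed half a grid step (h) around (x, z).
    return 0.25f * (sampleHeight(x - h, z - h) +
                    sampleHeight(x + h, z - h) +
                    sampleHeight(x - h, z + h) +
                    sampleHeight(x + h, z + h));
}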

Another way to combat the stair-stepping effect that I want to try would be to sample only at the floor of the transformed points, snapping the transformed geometry in the x,z plane to an even interval. This might get funky, but I’d like to try it.
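Sketched out, the snapping would be a one-liner (spacing here means the local grid interval for the ring, my naming):

#include <cmath>

// Snap a transformed world coordinate down to the nearest multiple of
// the local grid spacing so the taps always land on the same texels.
float snapToGrid(float worldCoord, float spacing)
{
    return std::floor(worldCoord / spacing) * spacing;
}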

The second optimization I tried with geometry clipmaps was a simpler clipmap. In Asirvatham’s paper he uses a very unusual clipmap pattern so that transitions from one LOD to the next are always half the detail, and so each LOD ring has the same width. There’s no other way to get half detail and fixed width without using an unusual spiraling pattern. (Look at the paper, you’ll see what I mean.) A simpler approach, I thought, would be to have transitions go to one third the detail and three times the width, kind of like an expanding checkerboard. Picture a Pascal’s triangle of Purina dog chow logos. This optimization worked pretty well. I believe I’m using significantly less geometry than Asirvatham’s approach. It doesn’t look quite as good, but it’s definitely functional.
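Reading that scheme literally as a quick sketch (the per-ring vertex count below is made up): each ring’s grid spacing would be 3^level, so with a fixed vertex count per ring the covered width also triples at every transition.

#include <cstdio>

int main()
{
    const int verticesPerRingSide = 64; // illustrative only
    int spacing = 1;
    for (int level = 0; level < 5; ++level)
    {
        printf("level %d: spacing %d, ring width %d\n",
               level, spacing, spacing * verticesPerRingSide);
        spacing *= 3;
    }
    return 0;
}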

Because the LOD tapers off much faster with this approach, heightmap aliasing effects become apparent much sooner, and it made me aware of a serious deficiency of geometry clipmaps that you don’t get with other terrain rendering algorithms. If you have a mountain off in the background, and that mountain has two peaks, it might look like just one wide peak depending upon how it’s sampled. Irregular mesh approaches don’t have this problem as they perform mesh optimization based on changes in slope. Geometry clipmaps just sample willy-nilly regardless of the consequences. A geometry clipmap approach that takes into account the silhouette differences of the resulting terrain might yield better results.

Once I get this project wrapped up I’ll post screen shots.

On the Intel-Apple talks

A lot of websites and news outlets have been spreading the rumor that Apple and Intel are talking. About what, no one knows. My speculation is Apple probably wants the XScale architecture for the next generation iPod. Intel’s XScale line of ARM processors has pretty much slaughtered the mobile platform market. This might all be a ruse. I wouldn’t put it past Jobs to start up the hype engine just to get some bargaining power with PortalPlayer, the maker of the current iPod processor. PortalPlayer just IPO’d last year and probably has a pretty big head. Jobs might just be putting them in their place.

It would probably be very easy for Apple to make an x86 version of MacOS X, but I just can’t see that happening; the Quark CEO would threaten again to drop the Mac platform and the rest of the developer community would flip out. Apple does a major architecture overhaul every 8-10 years (68k, PPC, OS X)… it’s not quite time yet. 🙂

One website I was reading last night (I wish I kept the link) speculated that if Apple went to Intel processors it would be the end of IBM’s microprocessor business. Please. Isn’t IBM going to make the PS3’s “Cell” processor? IBM is going to sell more Cell processors than G5s next year; I doubt losing Apple would hurt IBM that much.

I like the ruse theory. Unless Intel is desperate to get into the iPod I doubt anything will come of this.

Passing large arrays of data to vertex shaders

As a follow-on to Saturday’s post about sampling textures in the vertex shader, I thought I’d write up some quick notes on how you can pass data to a vertex shader through textures. Again, this is another one of those things that articles talk about doing but rarely show full examples of how to actually do it. Sometimes I wonder if the authors of these articles didn’t actually get it working, or if they kept it a secret because it gives them an advantage.

You can pass large arrays of data to a vertex shader by storing the data in a 1D or 2D texture. I covered sampling the texture previously: you basically use the tex2Dlod() function in HLSL (texldl is the shader assembly instruction). The trick here is: how do you get the data into a texture and then to the vertex shader?

Only floating point textures are supported by today’s vertex shader hardware, so the data needs to be stored in one of these formats:

D3DFMT_R32F: 32-bit float format using 32 bits for the red channel.
D3DFMT_A32B32G32R32F: 128-bit float format using 32 bits for each channel (alpha, blue, green, red).

Direct3D doesn’t allow you to modify textures stored in video memory, so you’ll need to create two textures. One of these textures will be stored in video memory, the other in system memory. Whenever you need to change the data passed to the vertex shader, you change it in the system memory texture and then copy it over to the video memory texture. Try not to do this every frame or you’ll kill performance. Most papers I’ve read suggest doing this update in a background thread so that you don’t affect rendering performance. It probably depends upon your application.

Here’s how you create your two textures:

hr = D3DXCreateTexture(
    m_pD3DDevice,
    width,
    height,
    D3DX_DEFAULT,       // mip levels
    0,                  // no usage flags for the system memory copy
    D3DFMT_R32F,
    D3DPOOL_SYSTEMMEM,  // system memory: lockable from the CPU
    &_systemtex);

hr = D3DXCreateTexture(
    m_pD3DDevice,
    width,
    height,
    D3DX_DEFAULT,           // mip levels
    D3DUSAGE_RENDERTARGET,
    D3DFMT_R32F,
    D3DPOOL_DEFAULT,        // video memory: what the shader samples
    &_videotex);

To update the texture, do the following:

D3DLOCKED_RECT rect;
hr = _systemtex->LockRect(0, &rect, NULL, 0);
BYTE *pbase = (BYTE *) rect.pBits;

// Walk the texture row by row using rect.Pitch -- the pitch of a locked
// texture isn't guaranteed to equal width * sizeof(FLOAT).
for(int y = 0; y < height; y++)
{
    FLOAT *prow = (FLOAT *) (pbase + y * rect.Pitch);
    for(int x = 0; x < width; x++)
    {
        prow[x] = sinf((FLOAT) x);
    }
}

hr = _systemtex->UnlockRect(0);
hr = m_pD3DDevice->UpdateTexture(_systemtex, _videotex);

These examples assume the R32F format; obviously, if you use the four-float format things will be a little different.
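Here’s a rough sketch of the update loop for the four-float case, reusing the variables from above. Each texel is now four floats, and (if I’m reading the format name the usual little-endian way) they land in memory as red, green, blue, alpha:

D3DLOCKED_RECT rect;
hr = _systemtex->LockRect(0, &rect, NULL, 0);
BYTE *pbase = (BYTE *) rect.pBits;

for(int y = 0; y < height; y++)
{
    FLOAT *prow = (FLOAT *) (pbase + y * rect.Pitch);
    for(int x = 0; x < width; x++)
    {
        prow[x * 4 + 0] = sinf((FLOAT) x); // red
        prow[x * 4 + 1] = 0.0f;            // green
        prow[x * 4 + 2] = 0.0f;            // blue
        prow[x * 4 + 3] = 1.0f;            // alpha
    }
}

hr = _systemtex->UnlockRect(0);
hr = m_pD3DDevice->UpdateTexture(_systemtex, _videotex);

The textures would of course need to be created with D3DFMT_A32B32G32R32F instead of D3DFMT_R32F.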

Vertex texture sampling notes

Vertex Shader 3.0 supports texture sampling. Several books and articles I’ve come across reference this feature and provide HLSL example code for how to do it, but none explain how you’re supposed to set up your DirectX pipeline to make it work. It’s taken me hours and hours to get things working, so I thought I’d share what I had to do to make it happen.

Why sample a texture in a vertex shader? Displacement mapping, or passing large arrays of data to a VS so you can deform geometry (something you could previously only do in a pixel shader). Why NOT sample textures in a vertex shader? The performance is much worse than it would be with a pixel shader. Beware.

The HLSL function that performs texture sampling in a VS is tex2Dlod(). It takes two arguments: your sampler and a float4 in the format (u, v, 0, mip), where the w component selects the mip level. Set z and w to 0 unless you want to sample from a coarser mip. Most of the time you’re going to call this function like this:

float4 foo = tex2Dlod(DisplaceSampler, float4(vTexCoord0.x, vTexCoord0.y, 0.0, 0.0));

When you declare your sampler in your HLSL file you don’t need to do anything special. Typically, though, you don’t want to do any blending between points in the texture, so specify your sampler with POINT sampling:

texture g_Texture;
sampler DisplaceSampler = sampler_state
{
    Texture = <g_Texture>;
    MinFilter = POINT;
    MagFilter = POINT;
    MipFilter = NONE;
    AddressU = CLAMP;
    AddressV = CLAMP;
};

Current hardware only supports floating point texture formats for vertex texture sampling. When you load the texture that you’re going to sample in the VS, pass D3DFMT_A32B32G32R32F or D3DFMT_R32F as the format.

HRESULT hres = D3DXCreateTextureFromFileEx(
    d->GetDevice(),
    name,
    D3DX_DEFAULT, D3DX_DEFAULT,   // width, height: take them from the file
    D3DX_DEFAULT,                 // mip levels
    D3DUSAGE_RENDERTARGET,
    D3DFMT_A32B32G32R32F,
    D3DPOOL_DEFAULT,
    D3DX_DEFAULT,                 // filter
    D3DX_DEFAULT,                 // mip filter
    0,                            // color key (disabled)
    NULL,                         // source info
    NULL,                         // palette
    &texture);

The docs say to use RENDERTARGET for vertex sampled textures but I’ve monkeyed with this value and can’t get it to change anything.
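One check that should settle whether a given format is vertex-samplable at all is CheckDeviceFormat with D3DUSAGE_QUERY_VERTEXTEXTURE. A quick sketch (the display format here is an assumption):

HRESULT ok = g_pD3D->CheckDeviceFormat(
    D3DADAPTER_DEFAULT,
    D3DDEVTYPE_HAL,
    D3DFMT_X8R8G8B8,              // display/adapter format, assumed
    D3DUSAGE_QUERY_VERTEXTEXTURE, // "can the vertex shader sample this?"
    D3DRTYPE_TEXTURE,
    D3DFMT_A32B32G32R32F);        // the format you plan to sample

if(FAILED(ok))
{
    // Vertex texture fetch isn't supported for this format on this device.
}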

I could only get tex2Dlod() working by using hardware vertex processing with the HAL renderer. Software vertex processing returns 0 for the texture every time, so I guess I still have some bugs to work out. I had an email exchange with Wolfgang Engel about this and he said that the HLSL compiler at one time had some bugs in how it handled the tex2Dlod() function and that they might still be in there. In the meantime I’m just doing without vertex shader debugging, I guess.

g_pD3D->CreateDevice(
    D3DADAPTER_DEFAULT,
    D3DDEVTYPE_HAL,                       // HAL renderer
    hWnd,
    D3DCREATE_HARDWARE_VERTEXPROCESSING,  // hardware vertex processing
    &g_d3dpp,
    &g_pd3dDevice);

To pass the texture to the vertex shader you give it to the ID3DXEffect object like you would any other texture:

hr = g_pEffect->SetTexture("g_Texture", texture);

That’s about it. The biggest sticking point for me was finally trying it on the hardware, because for whatever reason my REF renderer isn’t doing tex2Dlod() correctly. Usually things are the other way around, right? It works in the debugger and fails outside the debugger? Oh well. 😛

Mail 2.0 Gripes

As has been reported elsewhere, Mail 2.0 in MacOS 10.4 is pretty different. I thought I’d share what I like and don’t like about the update to what I still believe is the best email client in existence.

Before I start, you might be wondering why I think Mail is the best email client. My old TiBook 800 has been dying for the last month (overheating bad–I can’t figure it out), so I started looking for a replacement because I’ve been doing all of my email, instant messaging, and LaTeX authoring on my Mac. I seriously considered not replacing it at all and moving all of these tools to my desktop PC, but after evaluating tools for email on Windows I just couldn’t do it. Outlook, Thunderbird and Eudora suck, and there’s no way I was going back to Pine, XEmacs or mh. Outlook is slow. Thunderbird has some serious glitches with the way it handles multiple accounts. All of them look nasty, and all of them have very strange behavior when you try to edit a quote-reply. I’ve used a dozen email clients over the years, and I’ve grown to love MacOS Mail. I thought about it for a long time and decided to replace my TiBook with a 12″ G4 mainly because I couldn’t live w/out Mail, Adium and TexShop. If these programs were available on PC I’d have no problem paying $100, $50 and $50 for them respectively.

Likes:

Conversation/Thread view is nice. It’s snappier than Outlook and the color coding makes it easier to discern threads at a glance.

I think Apple is starting to understand “teh snappy“. Mail 2.0 is more responsive. It looks like they finally put some multi-threading into the mail checks; previous versions would block on certain mail I/O and I’d have to force quit if my internet connection went down during one of these operations.

Spotlight integration is great. Searching mail is super-fast and intuitive. It’s required a little retraining for me to get used to using Spotlight, but so far it’s been very useful. Spotlight isn’t a new idea, but I think Apple’s done it a bit better than the incarnations that were available for 10.3.

Junk mail filtering feels a bit smarter. It seems to be learning a lot faster than Mail on 10.3 did.

Dislikes:

Yeah, it’s ugly. I’m a little flabbergasted that the Mail designers broke the UI guidelines repeatedly. Of course, this is usually the way Apple works internally: some renegade team with Steve’s blessing makes some gross UI violation, and then a year or so later it shows up in the official UI guideline document as a sanctioned way to do things. QuickTime, iTunes, Safari, the “brushed metal” look, etc., and now Mail.

The drawer is gone! You can still make the mailbox view go away (Cmd-Shift-M), but it can only be shown on the left. I liked it on the right better. I actually think showing the mailbox hierarchy on the left is a poor UI decision. Westerners read from left to right, so important, frequently accessed information should start in the upper-left corner. I liked keeping the mailbox view on the right because I didn’t access it that much. At the very least they should have designed it so you could move it to the right side if you wanted to. With the drawer you could do that.

What happened to the status bar? I’m a geek, I want to know what’s going on at all times! Maybe there’s an option somewhere to get it to reappear, but I’m not finding it…

I’m a little disappointed that IMAPS still isn’t a first class citizen in Mail. You can enable IMAPS for a server, but only after you’ve added the server as an IMAP server first. If your mail server rejects plain IMAP connections it can be a little awkward getting the server added to Mail.

GrowlDict

GrowlDict is a MacOS X Service that interfaces with the Growl Notification Manager. It’s pretty slick. Highlight a word–any word, anywhere–hit Cmd-Shift-F, and you can see a definition of the word drawn as a sexy Growl notification. Much quicker than using the Dashboard dictionary.

Xbox development

I wrote an Xbox game over the weekend; I finally got around to posting a movie of it today. You can download the movie here. (I recommend viewing it w/ VLC.)

The game is a t3tris clone, but it does a lot more than your average clone. The blocks themselves are actually stored in their own independent data structure rather than a giant array like your typical t3tris clone. Storing the blocks in this manner allowed me to do per-piece bump mapping and some nifty little physical simulation-esque effects. I also added a particle system effect so that when lines are broken they appear to blow out of the stack. Currently the game is only one player, but it should be easy for me to make it four player… my goal is to get Tetris Worlds (R)-equivalent functionality so I can start experimenting with multiplayer t3tris game variant ideas I have. I think adaptations of t3tris that include “screw your neighbor” elements like Swordfighting in Puzzle Pirates are great party-game fun, but nothing like this exists on Xbox.

Xbox development is actually really easy. The Xbox is basically DirectX 8.1 w/ some extensions to make common game tasks easier, like controller input, saving state, etc. Xbox also includes a whole slew of debugging tools for performance profiling and shader debugging. In some ways it’s even easier than PC game development. I think if I were to write a PC video game I would actually start on Xbox and then port it back to PC. 🙂

Closed architectures and copy protection

Wonderland and BoingBoing both point out the obvious fact that the Sony PSP is a “closed” architecture: …the device will only run software that has been cryptographically signed by Sony itself.

Well… duh. Sony isn’t doing this to be “evil” or to box out would-be homebrew game developers. It’s for a much, much simpler reason: copy protection. If the PSP would run any code you handed it, people could write little hacks that circumvented the copy protection. Code signing presents a chicken-and-egg problem to hackers that is a solid first level of defense against game piracy. Since game royalties are probably expected to make up a huge portion of Sony’s profit on the PSP, it makes sense for them to do everything possible to prevent people from stealing games.

Why am I so passionate about this subject? About a year ago I released a game for Pocket PC, SmartPhone and Symbian called Vector Blaster. About a month after I released the game I started surveying the p2p networks and noticed that trading of my game was rampant. I estimate that for every one person that bought Vector Blaster there were at least one hundred other people that stole it. Another developer I know and I basically stopped making games for Pocket PC and SmartPhone because piracy was such a huge problem and Microsoft ignored our cries to help lock down the platform.

Vector Blaster isn’t even that expensive a game. On PSP, the drive to pirate is even greater. The games cost more, they’re higher quality, etc., etc.–there’s just more at stake. Sony needs to protect their platform or no one will make games for it.

The general rule in the game industry is that 90% of a game’s profits come in the first 6 weeks of sales (MMORPGs being an exception). PC game companies shoot for the 6 week mark with their copy protection: if they can delay the piracy of their game for 6 weeks then it won’t make a substantial impact on their sales. Sony is probably trying for a similar goal. I bet that if they can delay the cracking of their platform for 12 months, that will give 90% of their target market a chance to buy a PSP and a few games. If it’s cracked after 12 months it will definitely have an impact on sales, but the impact on Sony’s bottom line won’t be as significant.

In the end, someone will figure out how to get past the PSP’s copy protection, but by then Sony will have sold enough games to recoup the cost of the PSP and bring home some profits. Or not. Sony could get lucky: if it requires a modchip to crack the PSP then I doubt game piracy will have any effect at all on game sales. I just can’t see kids asking their parents, “Mom, can I take my PSP apart? … But Mommmmmm….”