AMD actually DOES use stdcall; they just seem not to have used a .def file (or something to that extent) when they created the library, which meant they exported the full decorated function names instead of just the clean names themselves, as is common practice with Microsoft and OpenGL.
However, nVidia does NOT use stdcall. This wasn't immediately apparent, since their exported symbols looked nice and clean. However, upon inspecting the actual code inside the DLL, the telltale retn NN callee stack cleanup was missing. So it's not stdcall, it's cdecl.
Who is right in this case? I don't know yet. OpenGL uses stdcall, so that would mean nVidia is wrong here as well. On the other hand, OpenAL DOES seem to use cdecl, so it's not like there's much consistency. AMD says that stdcall was decided by Khronos. In that case, nVidia is wrong, and Khronos is wrong as well, for not catching this problem during OpenCL 1.0-conformance testing (just like Khronos didn't catch the mangled naming in AMD's CPU drivers; both implementations passed their tests).
At any rate, I think they all need to go back to the drawing board.
AMD's beta4 SDK has fixed the decorated naming problem. They now have clean naming and stdcall functions, analogous to OpenGL. So I think the AMD SDK in its current form is 'correct'.
nVidia has conceded that they didn't use stdcall, however, they said it wasn't really a mistake because Khronos made the decision to use stdcall at a later time.
They have said that their next release will use stdcall. Sadly they didn't comment on the exported symbol naming problem. So at this point I cannot be sure that their next release will be fully 'correct' and fully compatible with AMD's SDK on a binary level. I think nVidia will do the right thing though.
There was no actual reply from Khronos itself on the matter though. So it looks like this problem was mainly solved between AMD, nVidia and some developers who were using OpenCL and who pointed them in the right direction.
nVidia's Cuda 3.0 beta release also fixes the calling convention/function naming problems. AMD's beta4 SDK and nVidia's Cuda 3.0 beta are now binary compatible.
I've also found that the new nVidia OpenCL release solves quite a few performance issues. OpenCL now runs very well.
On the AMD side, AMD still goofed up in beta4. I tried running the CPU implementation, and it complained about missing atical*.dll files. If you don't have an ATi card in your system, you can't install the Catalyst driver that contains those files. So you have to manually extract the files and place them in the same directory as OpenCL.dll.
But after I had done that, I could run some nVidia samples on the AMD CPU driver, and I could run some AMD samples on the nVidia driver. So the binary compatibility is a fact, and developers can now test their code on two implementations.
The WinXP x64 nVidia 189.91 drivers with OpenCL support did not make an entry for the uninstaller in Add/Remove Programs.
It was a bit of a bummer, because I installed them in the hope of being able to run the tech demo "NVIDIA's ocean demo" or "DirectX compute ocean" with them (which Anand used in the 5870 review).
However, their drivers did not seem to support it...
OpenCL and DirectCompute are completely independent APIs.
The Ocean demo is DirectCompute, which is part of DirectX 11, and has nothing to do with OpenCL. As such, it's not going to work on XP anyway. You need Vista or Windows 7.
I looked through some of the OpenCL stuff I compiled with the nVidia SDK, and it just links against a generic OpenCL.dll.
Theoretically it should just work fine with any OpenCL.dll, as long as it exports the same functions (which it should, if it's OpenCL 1.0-conformant).
So I don't think there's a binary dependency on a manufacturer there. The only 'problem' is that the binary will just link against whichever OpenCL.dll it finds first. So if you have multiple OpenCL devices installed, you'd probably have to drop your preferred OpenCL.dll into the same directory as the application, to ensure it runs on the proper device, as without an ICD, each DLL will only enumerate the devices from its manufacturer.
Care to elaborate then? As I say, if I just place a different OpenCL.dll into the directory, the same application can run on another vendor's hardware.
Hence no binary dependency. You don't need to recompile the application.
So what are you talking about, if you are so sure?
Ah, I see the problem.
For some reason AMD used cdecl exported names; they now all have leading underscores. nVidia uses stdcall, which is the default in Windows, and is also used in OpenGL32.dll under Windows.
I think AMD made a snafu there in their SDK.
As I already said, an ICD isn't going to help AMD here. Either way, they have to recompile their code to use stdcall.
Besides, for most people an ICD isn't that important. As long as they have a working OpenCL implementation for their videocard, it'd be fine. And that OpenCL DLL would just be installed automatically with the videocard drivers.
Only people who would specifically want to run OpenCL on their CPU, or who have more than one brand of videocard in their machine, would benefit from an ICD. That's not the issue here. The issue here is that there's no way you can get nVidia's and AMD's libraries to play nice. One of them needs to recompile, and I think it's going to be AMD, since you will want OpenCL to be consistent with OpenAL and OpenGL in calling convention.
>>Only people who would specifically want to run OpenCL on their CPU, or who have more than one brand of videocard in their machine, would benefit from an ICD. That's not the issue here.
Um, that would be me. Both the CPU and multiple cards.
Anyone have any experience with development tools for CUDA/OpenCL? The (Anandtech) Fermi article mentions Visual Studio integration is coming -- how about something similar on Linux? Any experience with the current NVidia OpenCL profiler, or know of something (perhaps $$$) that is even better?
I would suspect that, if everything goes well, we can expect OpenCL support in the next driver drop from AMD, which is more than likely going to be earlyish this month (I would suspect at the latest just before the Win7 launch date).
I doubt that, really.
I don't think Khronos will even have it tested by that time; they took more than a month on nVidia's drivers as well. Which means that they'd be through the tests in late October (they were sent to Khronos on the 21st of September).
Also, nVidia kept the drivers in beta another 3 months, before making it a first public release, and they haven't put them in the official driver release yet (as the article says, 190.89 supports it, but the recently released 191.07 doesn't yet).
I would think that AMD will also keep the drivers in beta for a while, even if they do pass OpenCL-conformance testing. After all, that only proves that the OpenCL portion works, it doesn't prove that the drivers as a whole work. They'd still need to be tested for regular functionality and pass WHQL. They'd also have to be merged back into the main release tree, as they'll be a few months behind regular releases by this time (just like nVidia released a new driver a few days ago still without OpenCL).
So no, I don't expect OpenCL drivers this month, probably not even next month.
I don't see why they would release them in the Catalyst drivers until there are actual applications to use OpenCL. It wasn't clear to me that you were referring to an official Catalyst release.
I think Nvidia is at pretty much the same point (but having released them earlier (about 2 weeks)) with beta drivers released. Has Nvidia put them in their official public release? I don't even know why they would do so, with no programs to use it; it would just make the download larger.
"Also, nVidia kept the drivers in beta another 3 months, before making it a first public release, and they haven't put them in the official driver release yet (as the article says, 190.89 supports it, but the recently released 191.07 doesn't yet).
I would think that AMD will also keep the drivers in beta for a while, even if they do pass OpenCL-conformance testing."
I think it was pretty clear...
And nVidia's drivers aren't beta anymore. They're a public release, just not the main release. The 190.89 drivers were beta for a number of weeks, available to registered developers, before they made them public.
It seems the same goes for AMD now. They're in the SDK beta 4, but you can only download them after you're registered as a developer.
And it's a chicken-and-egg problem. As a developer I find it very important that they release public drivers ASAP, because I want to know what drivers my software is expected to run on. Drivers first, applications later. It doesn't work the other way around.
I can't release applications if users need to register as developers with their IHV, and then download a beta SDK just to run my application. I also cannot guarantee that my application will run with future updates to the beta SDK/drivers.
Besides, don't forget that running beta drivers will also impact performance, stability and compatibility with other software. I cannot recommend end-users to run beta drivers rather than the official supported drivers.
So plenty of reasons to put them in the official release, no reasons not to (unless you want to sabotage the adoption of OpenCL).
According to people who have downloaded them, they are labeled as Catalyst 9.11 RC, which suggests they could be released in November. Thank you for clarifying; I'm not a developer, so I didn't really consider the implications for developers. Looking on Nvidia's website, I didn't see the download for the 190.89 drivers, so I thought those were a beta as well.
You don't need CPU-drivers for OpenCL. All you need is nVidia's GPU-accelerated OpenCL drivers, and you're up and running.
So the current status is like this:
- People with an nVidia card will be able to run OpenCL by installing the proper drivers (version 190.89 currently).
- People with anything other than an nVidia card can currently only run a CPU-based OpenCL implementation. AMD currently supplies a CPU implementation in their Stream beta SDK, which works on both AMD and Intel CPUs, although obviously it was aimed at AMD CPUs. Intel currently hasn't released anything, not even in beta.
There's also some OpenCL support in the works with the opensource Mesa/Gallium3D project.
"For NVIDIA GPU users with Intel CPUs, they'll be waiting on Intel for a CPU driver. Do note however that a CPU driver isn't required to use OpenCL on a GPU, and indeed we expect the first significant OpenCL applications to be intended to run solely on GPUs anyhow. So it's not a bad situation for NVIDIA, it's just one that needs to be solved sooner than later."
OpenCL has both CPU and GPU implementations, so if you want a fully utilized system with OpenCL you need to have CPU and GPU drivers. At least, that's how I understand it, and of course near-term the GPU driver is going to prove more important than the CPU driver.
I just think it's a very confusing thing to write. It shouldn't even have been mentioned in the first place.
Firstly, obviously nVidia doesn't need to provide a CPU driver, since that's the job of the CPU manufacturer. Just like nVidia doesn't need to provide a GPU driver for AMD hardware either.
Secondly, there's absolutely no need to have a "fully utilized system" with both CPU and GPU drivers. As you say yourself, the CPU driver isn't going to be important.
Thirdly, AMD never said anything about nVidia not having a CPU driver. The point with AMD (which is REALLY lame) is that they keep pretending that nVidia doesn't support OpenCL at all, and that nVidia is only pushing its proprietary C for Cuda and PhysX standards. AMD is trying to look like the 'good guy', while in reality nVidia is the one offering OpenCL support (not to mention that AMD also tried a proprietary standard first, but only nVidia actually managed to get some software support with their proprietary technology). In fact, AMD even claimed that they were the first to support DirectCompute with the launch of the HD5870 (if they added CS5.0 to the statement they would be right, but they didn't). This is also not true, since nVidia has supported DirectCompute since the first 190-release drivers in July, months before the HD5870 was launched.
"In fact, AMD even claimed that they were the first to support DirectCompute with the launch of the HD5870 (if they added CS5.0 to the statement they would be right, but they didn't). "
Most of the releases and news reports I read specifically said the first WHQL certified DirectX 11 and DirectCompute 11 driver, not that they were the first with DirectCompute.
"•AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first to support accelerated processing on the GPU through DirectCompute."
I read that to mean that they are the first with DirectX 11 GPUs that support DirectCompute. It seems that later press releases have made it more clear that they meant DirectCompute 11.
I don't see how you can read it to mean that. You *know* that's what it's supposed to mean if you are up to speed with the subject... but if you don't, there's no way you could read it like that, because a key piece of information was simply omitted from that statement.
Not really; after "first", you are assuming it should say "GPU", whereas the subject was "DirectX 11 enabled graphics processors". However, I agree it is not absolutely clear, but I don't think it was purposeful, since the subsequent press releases clarified the statement.
"AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first to support accelerated processing on the GPU through DirectCompute."
"The first" refers back to "AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors".
That part is very clear.
The problem is with this: "accelerated processing on the GPU through DirectCompute."
Your suggestion doesn't make sense...
You would get:
"AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first family of DirectX 11 enabled graphics processors to support accelerated processing on the GPU through DirectCompute."
(Note that 'first' now takes on a slightly different meaning, the function of the word in the sentence changes).
What you have now is a kind of pleonasm. Since AMD's GPUs are the first DX11 GPUs, they are obviously the first DX11 GPUs to support whatever feature.
I'm sure that's not what they meant to say. It's just too far-fetched.
Jarred already said most of what I want to say, but I will add something.
It may be a confusing thing to write, but I consider it a critical point nonetheless. From what we're seeing out of the Apple developer camp, OpenCL is going to be big on the CPU. It won't be as big as it is on the GPU (no one is going to write something in OpenCL that they only intend to run on x86 processors), but big nonetheless.
In the meantime we have this crazy situation where you need drivers from multiple sources in many cases to get a complete driver stack. And even with drivers, it's all a mess without the ICD.
My fundamental point right now is that in spite of having a complete spec and certification, the OpenCL situation is very, very screwed up on Windows and Linux. When most of my time talking to contacts is composed of them trying to answer "who is responsible for what", there's a problem.
For OpenCL to succeed there needs to be full GPU and CPU drivers for all platforms, and an ICD to tie them together. We're not there yet.
Why is that a crazy situation?
The same goes with OpenGL or Direct3D. You can have multiple devices, even from multiple vendors, and just enumerate through all of them.
You will ALWAYS have to have drivers from multiple sources, the ICD won't solve that. Even though AMD and Intel might package both their CPU and GPU drivers into a single downloadable package, they will STILL be two independent implementations, and two independent drivers. So from a technical point-of-view, it doesn't really matter whether CPU and GPU drivers come from the same manufacturer or not.
I suppose the best solution would be for Microsoft to offer CPU drivers through Windows Update. They already deliver GPU drivers through Windows Update, so eventually those will be updated to drivers with OpenCL support. If they also solve the CPU-part of the equation, the end-user doesn't even have to know about OpenCL.
Thing is that you make it sound like it's somehow nVidia's fault or responsibility to supply CPU drivers, and that's very confusing (and AMD has never said anything of the sort either).
>>You will ALWAYS have to have drivers from multiple sources, the >>ICD won't solve that.
Yeah, but right now I have to link my OpenCL/Windows program with either the ATI Stream SDK, OR the NVIDIA OpenCL SDK. I cannot do both (unless I do a plugin OpenCL driver layer for my program, and that's a PITA I don't want to deal with).
The ICD, with enumeration of installed drivers built in is CRITICAL for developers ease of use.
Technically you don't.
As long as you link to 'a' OpenCL.dll, it should be fine.
The problem here is that AMD uses a nonstandard calling convention in their OpenCL.dll. That's why linking to their stuff doesn't work for nVidia and vice versa.
nVidia uses the same standard as Microsoft uses, and also OpenGL and OpenAL use, so I think AMD is the one who made a mistake here.
If AMD had used the same standard calling convention, we wouldn't have this problem. Then all functions could just be automatically imported by name.
Besides, the ICD won't solve this problem. The calling convention that AMD uses also has caller stack cleanup, rather than callee stack cleanup. You'd get stack corruption. They just need to fix and recompile their code.
That's the only time when you'll need to have an ICD.
However, in most cases just a single device will be fine. Developers or end-users wanting to use OpenCL on a single GPU (or a set of GPUs from the same vendor) is by far the most common scenario.
First, nVidia's drivers are for the GT200 and lower, and just GPUs, so there's no reason for nVidia not to have drivers first.
Second, AMD's new GPU only came out this month, so it's obvious that they waited with the drivers so they'd have everything in one release.
Also, they have CPU+GPU support, and I think they prepared it with a later CPU and GPU merge in mind.
AMD still doesn't have GPU drivers for OpenCL at all.
And they currently only support DirectCompute on the HD5800-series. The 4000-series should also support DirectCompute CS4.1, but no drivers in sight.
There's no reason for AMD to delay 4000-series support for newer hardware. And there's even less reason for AMD not to support them even after releasing newer GPUs.
Apparently nVidia supports their existing customers much better. They too have a new GPU upcoming, but that didn't stop them from supporting all their existing customers, all the way back to the 3-year old 8800-series. AMD 'only' has to support the 4000-series, as the other hardware isn't capable anyway, but still, nothing.
Testing and certifying drivers takes quite some time. I don't see much reason to make them first for the 4000 series, then for CPUs, and then again for the whole 5000 series (which is actually quite a few more cards than just the 5870 and 5850).
It would make much more sense to do it all at once. Which nVidia also did, but with a 1-year-old GPU as their latest.
Yes, testing and certifying drivers takes time. That's no excuse though, is it? nVidia has to go through the same process, and nVidia actually supports 3 major series of GPUs (G80, G92 and GT200) in their drivers, AMD only has to do two (RV770 and RV870).
Besides, especially in the case of OpenCL, that has NOTHING to do with new GPUs. The OpenCL project started a long time ago, and AMD promised us drivers in the first half of 2009.
And even now that their new GPU is on the market, there STILL aren't any OpenCL drivers. AMD just failed to deliver on their promises.
And if I owned an HD4870, what do I care about the new GPU? Why should I wait? I just want the OpenCL support that AMD promised me. Especially when AMD shoots their mouth off about OpenCL and GPU-accelerated physics in the press. nVidia owners don't even NEED to wait for OpenCL, because they can already get accelerated physics and such through Cuda. Yet they get OpenCL before AMD owners do as well. AMD should stop marketing vapourware and start supporting their customers.
If I had to take a guess at it, you are SiliconDoc, the annoying commenter from the Fermi article, with a different name and using less caps lock... did you finally calm down?
Oh and BTW, WE DON'T CARE who came out first with the drivers, they will eventually both have it.
My understanding of AMD's lateness to the "party" is that they were busy launching the HD5xxx series cards and didn't have the same "free" time on their hands as NVIDIA has with their phantom GT300/Fermi release (I'm saying phantom because it has been talked about, but there is no final product in testing; I don't refer to the first silicon of a new product as final).
NVIDIA came first for OpenCL, but AMD came first for DirectX 11, GET OVER IT!
Scali sounds like someone who is interested in GPGPU computing, more specifically wide OpenCL support across a mature platform, which opens up the GPU for applications other than gaming; definitely not like the scathing, often misinformed comments of SiliconDoc. And you're making the argument that it doesn't matter who's first to support new technologies, after you criticize Nvidia's "phantom GT300/Fermi" announcement for being late, while at the same time sympathizing with AMD's first-to-DX11 HD5870 launch? Your comment ended up sounding a lot more like SiliconDoc than anything Scali wrote.
You are a bit naive and you obviously cannot read between the lines. I didn't praise anyone, hell if I use your argument that I praised someone, that would mean I praised AMD for their first to market DX11 cards and NVIDIA for their first to market OpenCL drivers, I did neither. I simply mentioned facts and how irrelevant this argument about AMD being late to the game is compared to NVIDIA also being late in another regard. If you don't understand that, well you're a lost cause.
As for mentioning the "phantom Fermi" comment, I am sorry you can't comprehend sarcasm. I'll stop making too much sense in the future.
But the point of my comment stands, GET OVER IT!
BTW, regarding Scali and SiliconDoc, they are both using the same type of argument and don't seem to stop arguing when everyone is trying to smarten them up, that is where I saw the similarity. As for me, I know when to stop...unlike some people.
I doubt it. Scali sounds like a developer anxious to develop OpenCL apps for whatever purpose and possibly doesn't care about DX11 at all (e.g. not developing a game/gfx acceleration, or targeting a non-Windows platform). If I were in his shoes, I too would be annoyed at AMD, since they're "blocking" access to a substantial marketshare of GPUs with their late drivers. As a developer myself, I don't give a shit about which vendor is better - I care about what features I can play with and how much hardware in the market will support those features.
By the way, the missing feature in pre-4000 series GPUs from AMD is shared memory.
Pre-4000 series GPUs also won't be able to use DirectCompute, even with CS4.x downlevel. Again nVidia's 8-series and higher will all support DirectCompute CS4.0.
Unfortunately, it's the SM5.0 profile which has the more useful things in it, such as the Interlock functions which (as I understand it) don't work on SM4.0 hardware when it comes to DirectCompute.
From a gaming dev's point of view these are pretty vital for various processes (such as single-pass luminance or single-pass deferred lighting), which imo reduces the usefulness of DirectCompute on anything pre-SM5.0, certainly in the games sphere.
I don't think so, to be honest. Why would you need interlocking for deferred lighting?
It is unfortunate though that there is no interlocking support in CS4.0, since only the original G80 series doesn't support it. G92 and later do have interlocking and various other additions which aren't exposed through OpenCL or DirectCompute.
You actually see complaints about DirectCompute in the nVidia GPU Computing SDK, such as:
// Notice: In CS5.0, we can output up to 8 RWBuffers but in CS4.x only one output buffer is allowed,
// that way we have to allocate one big buffer and manage the offsets manually. The restriction is
// not caused by NVIDIA GPUs and does not present on NVIDIA GPUs when using other computing APIs like
// CUDA and OpenCL.
For single pass, Interlock functions are used to work out the depth of a "tile" for processing and accumulate which lights are affecting that "tile" before performing the lighting resolve.
Granted, the process could be carried out without such things, it would probably require more passes however and generally be less efficient.
Depth of a tile? Are you now confusing tile-based rendering with deferred rendering, not to mention confusing compute shaders with conventional rendering techniques?
And even then, I still don't see why interlock functions would be required.
No, I'm not confusing anything; it was an idea put forward by Johan Andersson of DICE at the Siggraph 09 conference.
You do the standard deferred rendering step for creating a g-buffer
Then you dispatch compute shaders in blocks of 16 pixels aka a 'tile'
Each thread then retrieves the z depth for its pixel from a g-buffer; interlockMin and interlockMax are then used to obtain the min/max z for that block of pixels
The compute shader then goes on to calculate which lights intersect this 'tile' given the extents and min/max data (processing 16 lights at once)
The light count is increased using an interlockAdd for each intersecting light, and the light index is stored in a buffer
Finally, the compute shader goes back to pixel processing, where each thread in the group sums the lighting for its pixel and writes out the final data.
No confusion at all and a good example of how a compute shader can be used to calculate and output graphics.
It's just one very specific example. It's a gross over-generalization to say that anything lower than CS5.0 is useless for graphics based on this single example.
In fact, your entire hypothesis is wrong. You reason from "If CS5.0 can do it better, CS4.x is useless".
The correct hypothesis would of course be "If CS4.x can do better than using only conventional geometry/vertex/pixel shaders, then CS4.x is useful".
I never said CS4.x was 'useless'; I just said that due to the lack of Interlock functionality its usefulness was reduced.
Will it have some uses? Of course, and I dare say it will allow some cool things to be done. However, when compared to CS5.0 profiles with, in particular, the Interlock stuff, some things become harder to do, or indeed impossible in a single pass (see previous example).
I agree that its unfortunate that the CS4.0 profile doesn't support Interlock, but I guess they had to draw the line somewhere.
Yea well, I just think you have an odd perspective.
Obviously CS5.0 is better, and obviously DICE wanted to promote the new DirectX 11 features.
Bottom line is however that we've not had CS at ALL yet, in DirectX, even though there's a huge installed base of DX10 cards capable of CS4.0 or CS4.1. There isn't a lot of DX11 hardware out there yet.
Therefore in the short term CS4.x is going to be the more interesting one, as it allows you to implement new functionality like realtime tessellation, physics etc, or to make more efficient implementations of existing technologies like post-processing/SSAO and all that. CS4.x is just a nice shot in the arm for all that DX10 hardware out there.
On another note, you also have to put CS5.0 into perspective. Interlock makes it easy to code certain things a certain way, but it's no guarantee that it will also be efficient. That depends largely on how the hardware implements interlocking. Think back to the conditional branching that was introduced in PS3.0 for example. In many cases it was actually faster to just use a multipass algorithm using alphatest, which only required PS2.0, simply because the branching itself wasn't implemented in a very efficient way, unlike alphatesting.
So while the DICE solution looks nice and efficient, it doesn't necessarily have to be all that much faster than a more brute-force multipass algorithm. In fact, if I had to choose, I'd rather implement a CS4.0 algorithm that improves performance on all DX10 (and DX11) hardware than go for a CS5.0 algorithm that doesn't work on DX10 hardware, and may only be marginally faster than a CS4.0 algorithm on DX11 hardware (which is already the fastest hardware on the market anyway, so it's not the hardware that needs the performance increase most anyway).
If you want a nice case, look at PhysX. It even runs on the G80 architecture, which doesn't support interlocking. So there's great things you can do with compute shaders without interlocking. It would be very nice if developers would use CS4.0 for such physics effects.
The entire Linux graphics stack is being overhauled.
The current one has issues to address, and its the very reason why there's nothing like DXVA on Linux. Nvidia worked around this deficiency by coding their own approach...VDPAU. But this only works with Nvidia GF8 or newer cards and closed drivers.
Closed drivers mean Nvidia must keep up with Kernel and Xorg versions. If they don't, you are at their whim and have to wait. If they choose to drop support for a specific era of hardware, you are SOL.
Right now, Linux is gradually moving to the new stack. This is going to take time and cause pain to some users (as infrastructure changes often do)... But the benefit is that things like accelerated HD playback and OpenCL will be supported in the long run. Features won't be hardware-brand-specific like VDPAU is... So far, there's very raw code for OpenCL support in the new stack. (Someone has started something, but it's not really usable... More like an initial thing to see if it's feasible.)
As for AMD, the greatest thing they've done was releasing the documentation specs for open driver development. Right now, 2D and X-Video are done for all Radeons up to the 4xxx series, with power saving (PowerPlay) and 3D features being worked on... They "kind of work", but are really buggy.
Overall, if you need 3D acceleration now, you have no choice but to use Nvidia cards and closed drivers.
In the long run (5 yrs+ away?), Radeons may have the advantage of having open driver support. ie: Radeons will work out-of-the-box without too much fuss. One less step in setting up a Linux box.
Wow, talk about an epic failure of understanding what vdpau is.
vdpau is NOT vendor specific. It is a freely open standard that other graphics vendors are free to implement in their drivers as well, either through a native driver solution or through use of a wrapper. vdpau is an API, and it is not isolated to nvidia cards either. S3's Chrome 5 series, for example, has native vdpau support.
Saying vdpau is "vendor specific" is nothing but pure BS. You might as well say OpenCL is "vendor specific" by that definition of "vendor specific", since nvidia is the only one right now with OpenCL support in their drivers.
Scali - Thursday, October 8, 2009 - link
AMD actually DOES use stdcall, they just seem not to have used a .def file when they created the library, or something to that extent, which meant they exported the whole mangled function names, not just the clean names themselves, which is common practice with Microsoft and OpenGL.
However, nVidia does NOT use stdcall. This wasn't immediately apparent, since their exported symbols looked nice and clean. However, upon inspecting the actual code inside the DLL, the telltale retn NN callee stack cleanup was missing. So it's not stdcall, it's cdecl.
Who is right in this case? I don't know yet. OpenGL uses stdcall, so that would mean nVidia is wrong here as well. On the other hand, OpenAL DOES seem to use cdecl, so it's not like there's much consistency. AMD says that stdcall was decided by Khronos. In that case, nVidia is wrong, and Khronos is wrong as well, for not catching this problem during OpenCL 1.0 conformance testing (just like Khronos didn't catch the mangled naming in AMD's CPU drivers; they both passed their tests).
At any rate, I think they all need to go back to the drawing board.
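To make the distinction concrete, here is a minimal C sketch of how an OpenCL-style header can pin the calling convention on Windows, the way OpenGL headers do with APIENTRY. The CL_API_CALL macro and the stub implementation are illustrative only, not the actual Khronos header:

```c
#include <stddef.h>

/* Illustrative macro in the style of the Khronos headers: expand to
 * __stdcall on Windows and to nothing elsewhere (where cdecl is the
 * default), so the same declaration compiles on any platform. */
#ifdef _WIN32
#define CL_API_CALL __stdcall
#else
#define CL_API_CALL
#endif

typedef int cl_int;
typedef unsigned int cl_uint;

/* Stub standing in for the real DLL export. With stdcall the callee
 * pops its own arguments (the "retn NN" mentioned above); with cdecl
 * the caller does, which is why mixing the two corrupts the stack. */
cl_int CL_API_CALL clGetPlatformIDs(cl_uint num_entries,
                                    void **platforms,
                                    cl_uint *num_platforms)
{
    (void)num_entries;
    (void)platforms;
    if (num_platforms)
        *num_platforms = 1; /* pretend one platform is installed */
    return 0;               /* CL_SUCCESS */
}
```

On 32-bit MSVC, stdcall also decorates the exported name (e.g. _clGetPlatformIDs@12 for three 4-byte arguments) unless a .def file exports the clean name, which is exactly the naming issue described here.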
Scali - Sunday, October 18, 2009 - link
AMD's beta4 SDK has fixed the decorated naming problem. They now have clean naming and stdcall functions, analogous to OpenGL. So I think the AMD SDK in its current form is 'correct'.
nVidia has conceded that they didn't use stdcall; however, they said it wasn't really a mistake because Khronos made the decision to use stdcall at a later time.
They have said that their next release will use stdcall. Sadly they didn't comment on the exported symbol naming problem. So at this point I cannot be sure that their next release will be fully 'correct' and fully compatible with AMD's SDK on a binary level. I think nVidia will do the right thing though.
There was no actual reply from Khronos itself on the matter though. So it looks like this problem was mainly solved between AMD, nVidia and some developers who were using OpenCL and who pointed them in the right direction.
Scali - Monday, November 9, 2009 - link
nVidia's Cuda 3.0 beta release also fixes the calling convention/function naming problems. AMD's beta4 SDK and nVidia's Cuda 3.0 beta are now binary compatible.
I've also found that the new nVidia OpenCL release solves quite a few performance issues. OpenCL now runs very well.
On the AMD side, AMD still goofed up in beta4. I tried running the CPU implementation, and it complained about missing atical*.dll files. If you don't have an ATi card in your system, you can't install the Catalyst driver that contains those files. So you have to manually extract the files and place them in the same directory as OpenCL.dll.
But after I had done that, I could run some nVidia samples on the AMD CPU driver, and I could run some AMD samples on the nVidia driver. So the binary compatibility is a fact, and developers can now test their code on two implementations.
jebo - Wednesday, October 7, 2009 - link
"Finally to wrap this up, we have the catalyst of this story: drivers."
I see what you did there
Per Hansson - Wednesday, October 7, 2009 - link
The WinXP x64 nVidia 189.91 drivers with OpenCL support did not make an entry for the uninstaller in Add/Remove Programs.
It was a bit of a bummer because I installed them in the hope of being able to run the tech demo "NVIDIA's ocean demo" or "DirectX compute ocean" with them (which Anand used in the 5870 review).
However, their drivers did not seem to support it...
Scali - Wednesday, October 7, 2009 - link
OpenCL and DirectCompute are completely independent APIs.
The Ocean demo is DirectCompute, which is part of DirectX 11, and has nothing to do with OpenCL. As such, it's not going to work on XP anyway. You need Vista or Windows 7.
Per Hansson - Wednesday, October 7, 2009 - link
Err, meant 190.89
Zingam - Wednesday, October 7, 2009 - link
OpenCL is my bitch
Scali - Wednesday, October 7, 2009 - link
I looked through some of the OpenCL stuff I compiled with the nVidia SDK, and it just links against a generic OpenCL.dll.
Theoretically it should just work fine with any OpenCL.dll, as long as it exports the same functions (which it should, if it's OpenCL 1.0-conformant).
So I don't think there's a binary dependency on a manufacturer there. The only 'problem' is that the binary will just link against whichever OpenCL.dll it finds first. So if you have multiple OpenCL devices installed, you'd probably have to drop your preferred OpenCL.dll into the same directory as the application, to ensure it runs on the proper device, as without an ICD, each DLL will only enumerate the devices from its manufacturer.
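Absent an ICD, one way around the "first DLL found wins" problem is to load a specific library explicitly at run time instead of import-linking against it. A rough sketch using POSIX dlopen (on Windows the analogous calls are LoadLibrary/GetProcAddress); the entry-point name matches the real clGetPlatformIDs export, but the helper names and signatures are illustrative:

```c
#include <dlfcn.h>
#include <stddef.h>

typedef int (*clGetPlatformIDs_fn)(unsigned int, void **, unsigned int *);

/* Load a specific OpenCL library by path, so the application (not the
 * link order) decides which vendor's implementation it talks to.
 * Passing NULL falls back to the running process's own symbols. */
void *load_cl_library(const char *path)
{
    return dlopen(path, RTLD_NOW);
}

/* Look up the entry point; returns NULL if this library
 * doesn't export it. */
clGetPlatformIDs_fn find_clGetPlatformIDs(void *lib)
{
    return (clGetPlatformIDs_fn)dlsym(lib, "clGetPlatformIDs");
}
```

With this scheme an application could even load both vendors' libraries side by side, which is essentially what the ICD loader later standardized.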
Ryan Smith - Wednesday, October 7, 2009 - link
Yes, I am sure about the binary dependencies.
Scali - Wednesday, October 7, 2009 - link
Care to elaborate then? As I say, if I just place a different OpenCL.dll into the directory, the same application can run on another vendor's hardware.
Hence no binary dependency. You don't need to recompile the application.
So what are you talking about, if you are so sure?
Scali - Wednesday, October 7, 2009 - link
Ah, I see the problem.
For some reason AMD used cdecl exported names; they now all have leading underscores. nVidia uses stdcall, which is the default in Windows, and also used in OpenGL32.dll under Windows.
I think AMD made a snafu there in their SDK.
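For reference, the 32-bit MSVC decoration schemes being discussed here: a cdecl export gets a leading underscore (_clGetPlatformIDs), a stdcall export gets an underscore plus @N where N is the total argument bytes (_clGetPlatformIDs@12), and a .def file exports the clean undecorated name instead. A toy helper, purely to show the pattern:

```c
#include <stdio.h>
#include <stddef.h>

/* Build the 32-bit MSVC stdcall-decorated name: _name@N, where N is
 * the total size of the arguments in bytes. clGetPlatformIDs takes
 * three 4-byte arguments on x86, hence @12. */
void stdcall_decorated_name(char *out, size_t cap,
                            const char *name, unsigned argbytes)
{
    snprintf(out, cap, "_%s@%u", name, argbytes);
}
```

An application importing functions by clean name will not find either decorated form, which is why the export style matters for binary compatibility.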
tweakoz - Wednesday, October 7, 2009 - link
ICD!
;>
Scali - Wednesday, October 7, 2009 - link
As I already said, an ICD isn't going to help AMD here. Either way, they have to recompile their code to use stdcall.
Besides, for most people an ICD isn't that important. As long as they have an OpenCL for their videocard, which would work, it'd be fine. And that OpenCL dll would just be automatically installed with the videocard drivers.
Only people who would specifically want to run OpenCL on their CPU, or who have more than one brand of videocard in their machine, would benefit from an ICD. That's not the issue here. The issue here is that there's no way you can get nVidia's and AMD's libraries to play nice. One of them needs to recompile, and I think it's going to be AMD, since you will want OpenCL to be consistent with OpenAL and OpenGL in calling convention.
tweakoz - Saturday, January 9, 2010 - link
>>Only people who would specifically want to run OpenCL on their CPU, or who have more than one brand of videocard in their machine, would benefit from an ICD. That's not the issue here.
Um, that would be me. Both the CPU and multiple cards.
ICD.
mtm
tweakoz - Wednesday, October 7, 2009 - link
doh....;>
mfago - Tuesday, October 6, 2009 - link
Anyone have any experience with development tools for CUDA/OpenCL? The (Anandtech) Fermi article mentions Visual Studio integration is coming -- how about something similar on Linux? Any experience with the current NVidia OpenCL profiler, or know of something (perhaps $$$) that is even better?
Thanks!
- Matt
bobvodka - Tuesday, October 6, 2009 - link
I would suspect that, if everything goes well, we can expect OpenCL support in the next driver drop from AMD, which is more than likely going to be earlyish this month (I would suspect at the latest just before the Win7 launch date).
Scali - Wednesday, October 7, 2009 - link
I doubt that, really.
I don't think Khronos will even have it tested by that time; they took more than a month on nVidia's drivers as well. Which means that they'd be through the tests in late October (they were sent to Khronos on the 21st of September).
Also, nVidia kept the drivers in beta another 3 months, before making it a first public release, and they haven't put them in the official driver release yet (as the article says, 190.89 supports it, but the recently released 191.07 doesn't yet).
I would think that AMD will also keep the drivers in beta for a while, even if they do pass OpenCL-conformance testing. After all, that only proves that the OpenCL portion works, it doesn't prove that the drivers as a whole work. They'd still need to be tested for regular functionality and pass WHQL. They'd also have to be merged back into the main release tree, as they'll be a few months behind regular releases by this time (just like nVidia released a new driver a few days ago still without OpenCL).
So no, I don't expect OpenCL drivers this month, probably not even next month.
drmo - Wednesday, October 14, 2009 - link
Sorry, they were released today: Oct 13.
http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGR...
Granted, it is beta, but let us not try to make predictions when we don't really know what we are talking about.
Scali - Wednesday, October 14, 2009 - link
My prediction still stands: I don't think the OpenCL driver will be in this month's Catalyst release, or in next month's.
drmo - Wednesday, October 14, 2009 - link
I don't see why they would release them in the Catalyst drivers until there are actual applications to use OpenCL. It wasn't clear to me that you were referring to an official Catalyst release.
I think Nvidia is at pretty much the same point (but having released them earlier (about 2 weeks)) with beta drivers released. Has Nvidia put them in their official public release? I don't even know why they would do so, with no programs to use it; it would just make the download larger.
Scali - Wednesday, October 14, 2009 - link
"Also, nVidia kept the drivers in beta another 3 months, before making it a first public release, and they haven't put them in the official driver release yet (as the article says, 190.89 supports it, but the recently released 191.07 doesn't yet). I would think that AMD will also keep the drivers in beta for a while, even if they do pass OpenCL-conformance testing."
I think it was pretty clear...
And nVidia's drivers aren't beta anymore. They're a public release, just not the main release. The 190.89 drivers were beta for a number of weeks, available to registered developers, before they made them public.
It seems the same goes for AMD now. They're in the SDK beta 4, but you can only download them after you're registered as a developer.
And it's a chicken-and-egg problem. As a developer I find it very important that they release public drivers ASAP, because I want to know what drivers my software is expected to run on. Drivers first, applications later. It doesn't work the other way around.
I can't release applications if users need to register as developers with their IHV, and then download a beta SDK just to run my application. I also cannot guarantee that my application will run with future updates to the beta SDK/drivers.
Besides, don't forget that running beta drivers will also impact performance, stability and compatibility with other software. I cannot recommend that end-users run beta drivers rather than the officially supported drivers.
So plenty of reasons to put them in the official release, no reasons not to (unless you want to sabotage the adoption of OpenCL).
drmo - Wednesday, October 14, 2009 - link
According to people who have downloaded them, they are labeled as Catalyst 9.11 RC, which suggests they could be released in November. Thank you for clarifying; I'm not a developer, so I didn't really consider the implications for developers. Looking on Nvidia's website, I didn't see the download for the 190.89 drivers, so I thought those were a beta as well.
Scali - Wednesday, November 18, 2009 - link
Well, the 9.11 drivers are released, but I haven't seen any mention of OpenCL in the release notes.
bobvodka - Wednesday, October 7, 2009 - link
Hmmm, you raise good points there.
My 'this month' thing mostly depended on them submitting the OpenCL stuff for certification early last month.
Scali - Tuesday, October 6, 2009 - link
You don't need CPU-drivers for OpenCL. All you need is nVidia's GPU-accelerated OpenCL drivers, and you're up and running.
So the current status is like this:
- People with an nVidia card will be able to run OpenCL by installing the proper drivers (version 190.89 currently).
- People with anything other than an nVidia card can currently only run a CPU-based OpenCL implementation. AMD currently supplies a CPU implementation in their Stream beta SDK, which works on both AMD and Intel CPUs, although obviously it was aimed at AMD CPUs. Intel currently hasn't released anything, not even in beta.
There's also some OpenCL support in the works with the opensource Mesa/Gallium3D project.
JarredWalton - Tuesday, October 6, 2009 - link
"For NVIDIA GPU users with Intel CPUs, they'll be waiting on Intel for a CPU driver. Do note however that a CPU driver isn't required to use OpenCL on a GPU, and indeed we expect the first significant OpenCL applications to be intended to run solely on GPUs anyhow. So it's not a bad situation for NVIDIA, it's just one that needs to be solved sooner than later."
OpenCL has both CPU and GPU implementations, so if you want a fully utilized system with OpenCL you need to have CPU and GPU drivers. At least, that's how I understand it, and of course near-term the GPU driver is going to prove more important than the CPU driver.
Scali - Wednesday, October 7, 2009 - link
I just think it's a very confusing thing to write. It shouldn't even have been mentioned in the first place.
Firstly, obviously nVidia doesn't need to provide a CPU driver, since that's the job of the CPU manufacturer. Just like nVidia doesn't need to provide a GPU driver for AMD hardware either.
Secondly, there's absolutely no need to have a "fully utilized system" with both CPU and GPU drivers. As you say yourself, the CPU driver isn't going to be important.
Thirdly, AMD never said anything about nVidia not having a CPU driver. The point with AMD (which is REALLY lame) is that they keep pretending that nVidia doesn't support OpenCL at all, and that nVidia is only pushing its proprietary C for Cuda and PhysX standards, so AMD is trying to look like the 'good guy', while in reality nVidia is the one offering OpenCL support (not to mention that AMD also tried a proprietary standard first, but only nVidia actually managed to get some software support with their proprietary technology). In fact, AMD even claimed that they were the first to support DirectCompute with the launch of the HD5870 (if they added CS5.0 to the statement they would be right, but they didn't). This is also not true, since nVidia has supported DirectCompute since the first 190-release drivers in July, months before the HD5870 was launched.
drmo - Wednesday, October 7, 2009 - link
"In fact, AMD even claimed that they were the first to support DirectCompute with the launch of the HD5870 (if they added CS5.0 to the statement they would be right, but they didn't)."
Most of the releases and news reports I read specifically said the first WHQL certified DirectX 11 and DirectCompute 11 driver, not that they were the first with DirectCompute.
http://www.amd.com/us/press-releases/Pages/amd-pre...
Scali - Wednesday, October 7, 2009 - link
I'm talking about statements like here:
http://www.hpcwire.com/topic/developertools/AMD-Su...
"•AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first to support accelerated processing on the GPU through DirectCompute."
This statement was made in late September, while nVidia already released WHQL drivers with DirectCompute in July:
http://www.nvidia.com/object/win7_winvista_32bit_1...
"Supports Microsoft’s new DirectX Compute API on Windows 7."
drmo - Wednesday, October 14, 2009 - link
I read that to mean that they are the first with DirectX 11 GPUs that support DirectCompute. It seems that later press releases have made it more clear that they meant DirectCompute 11.
Scali - Wednesday, October 14, 2009 - link
I don't see how you can read it to mean that. You *know* that's what it's supposed to mean if you are up to speed with the subject... but if you don't, there's no way you could read it like that, because a key piece of information was simply omitted from that statement.
drmo - Wednesday, October 14, 2009 - link
Not really, after "first", you are assuming it should have GPU, whereas the subject was "DirectX 11 enabled graphics processors". However, I agree it is not absolutely clear, but I don't think it was purposeful, since the subsequent press releases clarified the statement.
Scali - Wednesday, October 14, 2009 - link
"AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first to support accelerated processing on the GPU through DirectCompute."
"The first" refers back to "AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors".
That part is very clear.
The problem is with this: "accelerated processing on the GPU through DirectCompute."
Your suggestion doesn't make sense...
You would get:
"AMD's upcoming next generation ATI Radeon family of DirectX 11 enabled graphics processors are expected to be the first family of DirectX 11 enabled graphics processors to support accelerated processing on the GPU through DirectCompute."
(Note that 'first' now takes on a slightly different meaning, the function of the word in the sentence changes).
What you have now is a kind of pleonasm. Since AMD's GPUs are the first DX11 GPUs, they are obviously the first DX11 GPUs to support whatever feature.
I'm sure that's not what they meant to say. It's just too far-fetched.
Ryan Smith - Wednesday, October 7, 2009 - link
Jarred already said most of what I want to say, but I will add something.
It may be a confusing thing to write, but I consider it a critical point nonetheless. From what we're seeing out of the Apple developer camp, OpenCL is going to be big on the CPU. It won't be as big as it is on the GPU (no one is going to write something in OpenCL that they only intend to run on x86 processors), but big nonetheless.
In the mean time we have this crazy situation where you need drivers from multiple sources in many cases to get a complete driver stack. And even with drivers, it's all a mess without the ICD.
My fundamental point right now is that in spite of having a complete spec and certification, the OpenCL situation is very, very screwed up on Windows and Linux. When most of my time talking to contacts is composed of them trying to answer "who is responsible for what", there's a problem.
For OpenCL to succeed there needs to be full GPU and CPU drivers for all platforms, and an ICD to tie them together. We're not there yet.
Scali - Wednesday, October 7, 2009 - link
Why is that a crazy situation?
The same goes with OpenGL or Direct3D. You can have multiple devices, even from multiple vendors, and just enumerate through all of them.
You will ALWAYS have to have drivers from multiple sources, the ICD won't solve that. Even though AMD and Intel might package both their CPU and GPU drivers into a single downloadable package, they will STILL be two independent implementations, and two independent drivers. So from a technical point-of-view, it doesn't really matter whether CPU and GPU drivers come from the same manufacturer or not.
I suppose the best solution would be for Microsoft to offer CPU drivers through Windows Update. They already deliver GPU drivers through Windows Update, so eventually those will be updated to drivers with OpenCL support. If they also solve the CPU-part of the equation, the end-user doesn't even have to know about OpenCL.
Thing is that you make it sound like it's somehow nVidia's fault or responsibility to supply CPU drivers, and that's very confusing (and AMD has never said anything of the sort either).
tweakoz - Wednesday, October 7, 2009 - link
>>You will ALWAYS have to have drivers from multiple sources, the ICD won't solve that.
Yeah, but right now I have to link my OpenCL/Windows program with either the ATI Stream SDK, OR the NVIDIA OpenCL SDK. I cannot do both (unless I do a plugin OpenCL driver layer for my program, and that's a PITA I don't want to deal with).
The ICD, with enumeration of installed drivers built in, is CRITICAL for developers' ease of use.
mtm
Scali - Wednesday, October 7, 2009 - link
Technically you don't.
As long as you link to 'a' OpenCL.dll, it should be fine.
The problem here is that AMD uses a nonstandard calling convention in their OpenCL.dll. That's why linking to their stuff doesn't work for nVidia and vice versa.
nVidia uses the same standard as Microsoft uses, and also OpenGL and OpenAL use, so I think AMD is the one who made a mistake here.
If AMD had used the same standard calling convention, we wouldn't have this problem. Then all functions could just be automatically imported by name.
Besides, the ICD won't solve this problem. The calling convention that AMD uses also has caller stack cleanup, rather than callee stack cleanup. You'd get stack corruption. They just need to fix and recompile their code.
tweakoz - Wednesday, October 21, 2009 - link
What if you are trying to use multiple devices (from different vendors) simultaneously?
mtm
Scali - Wednesday, October 21, 2009 - link
That's the only time when you'll need to have an ICD.
However, in most cases just a single device will be fine. Developers or end-users wanting to use OpenCL on a single GPU (or a set of GPUs from the same vendor) is by far the most common scenario.
Zool - Wednesday, October 7, 2009 - link
First, nVidia's drivers are for the GT200 and lower, and just GPUs, so there's no reason for nVidia not to have drivers first. Second, AMD's new GPU came out this month, so it's obvious that they waited with the drivers so they have everything in one.
Also, they have CPU+GPU support, and I think they prepared it with the CPU and GPU merge in mind later on.
Scali - Wednesday, October 7, 2009 - link
AMD still doesn't have GPU drivers for OpenCL at all.
And they currently only support DirectCompute on the HD5800-series. The 4000-series should also support DirectCompute CS4.1, but no drivers in sight.
There's no reason for AMD to delay 4000-series support for newer hardware. And there's even less reason for AMD not to support them even after releasing newer GPUs.
Apparently nVidia supports their existing customers much better. They too have a new GPU upcoming, but that didn't stop them from supporting all their existing customers, all the way back to the 3-year old 8800-series. AMD 'only' has to support the 4000-series, as the other hardware isn't capable anyway, but still, nothing.
Zool - Wednesday, October 7, 2009 - link
Testing and certifying drivers takes quite some time. I don't see too much reason to make them first for the 4K series, then CPUs, and then also for the whole 5K series (which is actually quite a few more cards than the 5870 and 5850).
It would make much more sense to do it all at once. Which nVidia also did, but with the 1-year-old GPU as their latest.
Scali - Wednesday, October 7, 2009 - link
Yes, testing and certifying drivers takes time. That's no excuse though, is it? nVidia has to go through the same process, and nVidia actually supports 3 major series of GPUs (G80, G92 and GT200) in their drivers; AMD only has to do two (RV770 and RV870).
Besides, especially in the case of OpenCL, that has NOTHING to do with new GPUs. The OpenCL project started a long time ago, and AMD promised us drivers in the first half of 2009.
And even now that their new GPU is on the market, there STILL aren't any OpenCL drivers. AMD just failed to deliver on their promises.
And if I owned an HD4870, what do I care about the new GPU? Why should I wait? I just want the OpenCL support that AMD promised me. Especially when AMD shoots their mouth off about OpenCL and GPU-accelerated physics in the press. nVidia owners don't even NEED to wait for OpenCL, because they can already get accelerated physics and such through Cuda. Yet they get OpenCL before AMD owners do as well. AMD should stop marketing vapourware and start supporting their customers.
tamalero - Wednesday, October 7, 2009 - link
Am I the only one who's seen the news that the Fermi board mockup wasn't even a real Fermi board in the first place?
Talk about vaporware o_O
Amiga500 - Wednesday, October 7, 2009 - link
I think it's a fair guess to say you are affiliated with Nvidia.
It'll come in time. I, for one, would rather wait a bit for drivers that encompass both the CPU and GPU.
Scali - Wednesday, October 7, 2009 - link
Nope, completely wrong guess.
Titanius - Wednesday, October 7, 2009 - link
If I had to take a guess at it, you are SiliconDoc, the annoying commenter from the Fermi article, with a different name and using less caps lock...did you finally calm down?
Oh and BTW, WE DON'T CARE who came out first with the drivers, they will eventually both have it.
My understanding of AMD's lateness to the "party" is because they were busy launching the HD5xxx series cards and didn't have the same "free" time on their hands as NVIDIA has with their phantom GT300/Fermi release (I'm saying phantom because it has been talked about, but there is no final product in testing (I don't refer to the first silicon of a new product as final)).
NVIDIA came first for OpenCL, but AMD came first for DirectX 11, GET OVER IT!
dragonsqrrl - Wednesday, October 7, 2009 - link
Scali sounds like someone who is interested in GPGPU computing, more specifically wide OpenCL support across a mature platform, which opens up the GPU for applications other than gaming; definitely not like the scathing, often misinformed comments of SiliconDoc.
And you're making the argument that it doesn't matter who's first to support new technologies, after you criticize Nvidia's "phantom GT300/Fermi" announcement for being late, while at the same time sympathizing with AMD's first-to-DX11 HD5870 launch? Your comment ended up sounding a lot more like SiliconDoc than anything Scali wrote.
Titanius - Thursday, October 8, 2009 - link
You are a bit naive and you obviously cannot read between the lines. I didn't praise anyone; hell, if I use your argument that I praised someone, that would mean I praised AMD for their first-to-market DX11 cards and NVIDIA for their first-to-market OpenCL drivers. I did neither. I simply mentioned facts, and how irrelevant this argument about AMD being late to the game is compared to NVIDIA also being late in another regard. If you don't understand that, well, you're a lost cause.
As for mentioning the "phantom Fermi" comment, I am sorry you can't comprehend sarcasm. I'll stop making too much sense in the future.
But the point of my comment stands, GET OVER IT!
BTW, regarding Scali and SiliconDoc, they are both using the same type of argument and don't seem to stop arguing when everyone is trying to smarten them up, that is where I saw the similarity. As for me, I know when to stop...unlike some people.
Maian - Wednesday, October 7, 2009 - link
I doubt it. Scali sounds like a developer anxious to develop OpenCL apps for whatever purpose, and possibly doesn't care about DX11 at all (e.g. not developing a game/gfx acceleration, or targeting a non-Windows platform). If I were in his shoes, I too would be annoyed at AMD, since they're "blocking" access to a substantial marketshare of GPUs with their late drivers. As a developer myself, I don't give a shit about which vendor is better - I care about what features I can play with and how much hardware in the market will support those features.
Scali - Tuesday, October 6, 2009 - link
By the way, the missing feature in pre-4000 series GPUs from AMD is shared memory.
Pre-4000 series GPUs also won't be able to use DirectCompute, even with CS4.x downlevel. Again, nVidia's 8-series and higher will all support DirectCompute CS4.0.
bobvodka - Tuesday, October 6, 2009 - link
Unfortunately, it's the SM5.0 profile which has the more useful things in it, such as the Interlock functions, which (as I understand it) don't work on SM4.0 hardware when it comes to DirectCompute.
From a game dev's point of view these are pretty vital for various processes (such as single-pass luminance or single-pass deferred lighting), which imo reduces the usefulness of DirectCompute on anything pre-SM5.0, certainly in the games sphere.
Scali - Wednesday, October 7, 2009 - link
I don't think so, to be honest. Why would you need interlocking for deferred lighting?
It is unfortunate though that there is no interlocking support in CS4.0, since only the original G80 series doesn't support it. G92 and later do have interlocking and various other additions which aren't exposed through OpenCL or DirectCompute.
You actually see complaints about DirectCompute in the nVidia GPU Computing SDK, such as:
// Notice: In CS5.0, we can output up to 8 RWBuffers but in CS4.x only one output buffer is allowed,
// that way we have to allocate one big buffer and manage the offsets manually. The restriction is
// not caused by NVIDIA GPUs and does not present on NVIDIA GPUs when using other computing APIs like
// CUDA and OpenCL.
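The workaround that SDK comment describes — packing several logical outputs into one big buffer with hand-managed offsets — looks roughly like this on the host side (a sketch; the function name and the 16-byte alignment below are assumptions, not the SDK's actual code):

```c
#include <stddef.h>

/* Suballocate n logical output buffers inside one big CS4.x output
 * buffer. Each region starts on an aligned boundary; offsets[]
 * receives the start of each region, and the return value is the
 * total size the single big buffer needs. */
size_t pack_outputs(const size_t *sizes, size_t n,
                    size_t alignment, size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        /* round the cursor up to the next alignment boundary */
        cursor = (cursor + alignment - 1) / alignment * alignment;
        offsets[i] = cursor;
        cursor += sizes[i];
    }
    return cursor;
}
```

The compute shader then indexes into the one RWBuffer using these offsets, which is the manual bookkeeping the comment complains about.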
bobvodka - Wednesday, October 7, 2009 - link
For a single pass, Interlock functions are used to work out the depth of a "tile" for processing, and to accumulate which lights are affecting that "tile", before performing the lighting resolve.
Granted, the process could be carried out without such things; it would probably require more passes, however, and generally be less efficient.
Scali - Wednesday, October 7, 2009 - link
Depth of a tile? Are you now confusing tile-based rendering with deferred rendering, not to mention confusing compute shaders with conventional rendering techniques?
And even then, I still don't see why interlock functions would be required.
bobvodka - Thursday, October 8, 2009 - link
No, I'm not confusing anything; it was an idea put forward by Johan Andersson of DICE at the SIGGRAPH 09 conference.
You do the standard deferred rendering step for creating a g-buffer
Then you dispatch compute shaders in blocks of 16 pixels aka a 'tile'
Each thread then retrieves the z depth for its pixel from a g-buffer; interlockMin and interlockMax are then used to obtain the min/max z for that block of pixels
The compute shader then goes on to calculate which lights intersect this 'tile' given the extents and min/max data (processing 16 lights at once)
The light count is increased using an interlockAdd for each intersecting light and the light index is stored in a buffer
Finally, the compute shader goes back to pixel processing, where each thread in the group sums the lighting for its pixel and writes out the final data.
No confusion at all and a good example of how a compute shader can be used to calculate and output graphics.
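The interlocked min/max step in that scheme boils down to an atomic compare-and-swap loop. A CPU-side sketch of the same pattern in C11 (illustrative only — in an actual compute shader this would be InterlockedMin/InterlockedMax on groupshared memory, with each thread contributing its own pixel):

```c
#include <stdatomic.h>

/* Atomic minimum via a CAS loop: keep trying to store `value` until
 * it succeeds, or until another thread has already stored something
 * smaller. On CAS failure, `cur` is refreshed with the latest value. */
void atomic_min_u32(_Atomic unsigned *target, unsigned value)
{
    unsigned cur = atomic_load(target);
    while (value < cur &&
           !atomic_compare_exchange_weak(target, &cur, value))
        ;
}

/* Atomic maximum is symmetric. */
void atomic_max_u32(_Atomic unsigned *target, unsigned value)
{
    unsigned cur = atomic_load(target);
    while (value > cur &&
           !atomic_compare_exchange_weak(target, &cur, value))
        ;
}

/* Per-tile depth bounds: every "thread" contributes its pixel depth
 * (serialized here; in a shader all threads run concurrently). */
void tile_depth_bounds(const unsigned *depths, int count,
                       _Atomic unsigned *tile_min,
                       _Atomic unsigned *tile_max)
{
    for (int i = 0; i < count; i++) {
        atomic_min_u32(tile_min, depths[i]);
        atomic_max_u32(tile_max, depths[i]);
    }
}
```

The min/max bounds then feed the light-vs-tile intersection test in the next step of the algorithm described above.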
Scali - Thursday, October 8, 2009 - link
It's just one very specific example. It's a gross over-generalization to say that anything lower than CS5.0 is useless for graphics based on this single example.
In fact, your entire hypothesis is wrong. You go from "If CS5.0 can do it better, CS4.x is useless".
The correct hypothesis would of course be "If CS4.x can do better than using only conventional geometry/vertex/pixelshaders, then CS4.x is useful".
bobvodka - Thursday, October 8, 2009 - link
I never said CS4.x was 'useless'; I just said that due to the lack of Interlock functionality, its usefulness was reduced.
Will it have some uses? Of course, and I dare say it will allow some cool things to be done. However, when compared to CS5.0 profiles with, in particular, the Interlock stuff, some things become harder to do, or indeed impossible in a single pass (see previous example).
I agree that it's unfortunate that the CS4.0 profile doesn't support Interlock, but I guess they had to draw the line somewhere.
Scali - Thursday, October 8, 2009 - link
Yea well, I just think you have an odd perspective. Obviously CS5.0 is better, and obviously DICE wanted to promote the new DirectX 11 features.
Bottom line, however, is that we've not had CS in DirectX at ALL yet, even though there's a huge installed base of DX10 cards capable of CS4.0 or CS4.1. There isn't a lot of DX11 hardware out there yet.
Therefore in the short term CS4.x is going to be the more interesting one, as it allows you to implement new functionality like realtime tessellation, physics etc, or to make more efficient implementations of existing technologies like post-processing/SSAO and all that. CS4.x is just a nice shot in the arm for all that DX10 hardware out there.
On another note, you also have to put CS5.0 into perspective. Interlock makes it easy to code certain things a certain way, but it's no guarantee that it will also be efficient. That depends largely on how the hardware implements interlocking. Think back to the conditional branching that was introduced in PS3.0 for example. In many cases it was actually faster to just use a multipass algorithm using alphatest, which only required PS2.0, simply because the branching itself wasn't implemented in a very efficient way, unlike alphatesting.
So while the DICE solution looks nice and efficient, it doesn't necessarily have to be all that much faster than a more brute-force multipass algorithm. In fact, if I had to choose, I'd rather implement a CS4.0 algorithm that improves performance on all DX10 (and DX11) hardware than go for a CS5.0 algorithm that doesn't work on DX10 hardware and may only be marginally faster than a CS4.0 algorithm on DX11 hardware (which is already the fastest hardware on the market, so it's not the hardware that needs the performance increase most anyway).
If you want a nice example, look at PhysX. It even runs on the G80 architecture, which doesn't support interlocking. So there are great things you can do with compute shaders without interlocking. It would be very nice if developers would use CS4.0 for such physics effects.
haplo602 - Tuesday, October 6, 2009 - link
Currently Nvidia is the only vendor with good Linux support for OpenGL/OpenCL. AMD is behind in OpenGL and very much non-existent for OpenCL.
stmok - Tuesday, October 6, 2009 - link
The entire Linux graphics stack is being overhauled. The current one has issues to address, and it's the very reason why there's nothing like DXVA on Linux. Nvidia worked around this deficiency by coding their own approach: VDPAU. But this only works with Nvidia GF8 or newer cards and closed drivers.
Closed drivers mean Nvidia must keep up with Kernel and Xorg versions. If they don't, you are at their whim and have to wait. If they choose to drop support for a specific era of hardware, you are SOL.
Right now, Linux is gradually moving to the new stack. This is going to take time and cause pain to some users (as infrastructure changes often do)... But the benefit is that things like accelerated HD playback and OpenCL will be supported in the long run. Features won't be hardware-brand specific like VDPAU is... So far, there's very raw code for OpenCL support in the new stack. (Someone has started something, but it's not really usable... more like an initial thing to see if it's feasible.)
As for AMD, the greatest thing they've done was release the documentation specs for open driver development. Right now, 2D and X-Video are done for all Radeons up to the 4xxx series, with power saving (PowerPlay) and 3D features being worked on... They "kind of work", but are really buggy.
Overall, if you need 3D acceleration now, you have no choice but to use Nvidia cards and closed drivers.
In the long run (5 yrs+ away?), Radeons may have the advantage of open driver support, ie: Radeons will work out-of-the-box without too much fuss. One less step in setting up a Linux box.
jackylman - Friday, October 9, 2009 - link
I'm using ATI's open-source 3D right now on a RadeonHD 4000 card. I don't find it buggy (runs compiz, googleearth, some games). Depending on your level of 3D need, closed-source drivers are no longer the only choice.
Deanjo - Wednesday, October 7, 2009 - link
Wow, talk about an epic failure of understanding what VDPAU is. VDPAU is NOT vendor specific. It is a freely open standard that other graphics vendors are free to implement in their drivers as well, either through a native driver solution or through the use of a wrapper. VDPAU is an API. Nor is VDPAU isolated to Nvidia cards: the S3 Chrome 5 series, for example, has native VDPAU support.
http://drivers.s3graphics.com/en/download/drivers/...
In fact it is now part of freedesktop.
http://lists.freedesktop.org/archives/xorg-announc...
Saying VDPAU is "vendor specific" is nothing but pure BS. You might as well say OpenCL is "vendor specific" by your definition, since Nvidia are the only ones right now with OpenCL support in their drivers.
stmok - Wednesday, October 7, 2009 - link
Well, if I'm wrong I apologize. But there's no need to act like a dick about it. How hard is it to politely correct someone, Deanjo? Apparently in your case, too hard.