Ok, this isn't really a complete "HOWTO" - it's just a summary of the experiences I had with getting NVidia drivers to work properly with a 64-bit Linux distribution with my current hardware (although this will equally apply to a 32-bit distro).
It seems that there are an ungodly number of things that can go wrong (when they do go wrong) with NVidia drivers on the Linux platform. Usually things work fine after installation, be it by using the supplied NVidia installer or whatever method your distribution suggests. Unfortunately, my problem was unsolvable by all means that you will find on the web. That's right: I went through everything short of replacing my entire PC to solve this problem. What was my final solution? Well, if you're looking for a solution to your own NVidia woes, I would suggest you not read the rest of this paragraph and skip to the next one where I give you some hints on what to try when troubleshooting. My solution: I flashed the video card BIOS to that of a completely different vendor. While that doesn't make sense that you'd have to do this (and you won't find anyone from NVidia that will suggest you should do it), it was what I had to resort to to get things to work.
Symptoms: When any 3D (OpenGL) application is started, the PC locks up hard - usually requiring a manual reset (meaning finger push button reset). Sometimes the PC doesn't lock up hard, but freezes and the screen becomes garbled instead; you may even be able to continue working with a garbled screen (I couldn't). To reproduce this problem, simply run 'glxinfo' in a console - that should be enough to do it.
This occurs at least with the following distributions: Gentoo Linux, Ubuntu, and Suse - regardless of whether a 32-bit or 64-bit distribution is being used.
My hardware:
Motherboard: ASUS A8N32-SLI Deluxe
Graphics card: Gainward 7800GT PCIe
HDD: 2x Maxtor 6V300F0 300GB
PSU: 650 Watts
CPU: AMD X2 4400
RAM: 2x 1GB Kingston DDR400
First of all, you have to have actually successfully installed the NVidia drivers. Unless you've disabled it, you will see an NVidia splash screen when X starts. If you don't have this part right, then there are many resources on the web to help you depending on your distribution. It's usually better to do the install using the method your distro prefers: emerge on Gentoo, apt-get for Ubuntu, or yast for Suse. If you instead try to use the supplied installer from NVidia's website, you could run into library problems later (which is the case with Gentoo).
Here is the list of things to check when you have the lockup problem (in no particular order):
1 - Passing Kernel Parameters
There are some kernel parameters that may or may not influence the performance or stability of the driver. They are:
CODE:
noapic
acpi=off
noacpi
nolapic
You should try passing these to the Linux kernel when you boot - one at a time, or in some combination. "noapic" and "acpi=off" seem to be popular.
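After rebooting, it's worth confirming that the parameters really reached the kernel before drawing any conclusions. The following is a minimal sketch, not a definitive tool: it takes the kernel command line as a string and reports which of the four parameters listed above are present. The function name and the sample command line are made up for illustration.

```python
# Parameters from the list above; nothing else is checked.
WORKAROUNDS = ("noapic", "acpi=off", "noacpi", "nolapic")


def active_workarounds(cmdline):
    """Return the workaround parameters present on a kernel command line."""
    tokens = cmdline.split()  # parameters are whitespace-separated
    return [param for param in WORKAROUNDS if param in tokens]


# Hypothetical command line, for illustration only.
print(active_workarounds("root=/dev/sda3 ro noapic acpi=off"))
```

On a running system you would pass in the real command line, for example `active_workarounds(open("/proc/cmdline").read())`.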
2 - IRQ Conflicts
Your NVidia board should have an assigned IRQ. When the nvidia driver module is loaded, run 'cat /proc/interrupts' in a console.
You should see something like this:
CODE:
           CPU0       CPU1
  0:    7226051          0    IO-APIC-edge   timer
  8:          0          0    IO-APIC-edge   rtc
  9:          0          0   IO-APIC-level   acpi
 14:      16276          0    IO-APIC-edge   ide0
 15:      88998          0    IO-APIC-edge   ide1
 50:          0          0   IO-APIC-level   libata
 58:     243948          0   IO-APIC-level   ohci_hcd:usb1
 66:     113605          0   IO-APIC-level   NVidia CK804
 74:    3148469          0   IO-APIC-level   ohci1394, sky2, nvidia
225:    3586632          0   IO-APIC-level   libata, eth0
233:     693809          0   IO-APIC-level   libata, ehci_hcd:usb2
NMI:       3559       2995
LOC:    7226318    7226296
ERR:          0
MIS:          0
Of course your output will look different. The point here is that the "nvidia" module should have an interrupt number, and that it should not share it with other peripherals. In my case you can see that it is sharing interrupt 74 with the ohci1394 (firewire) and sky2 (ethernet) drivers. Try disabling or moving the peripheral that is sharing an IRQ number with the nvidia module. I disabled the hardware that shared with the nvidia card (of course, to no avail).
If you have no interrupt assigned to your video card at all, check in your motherboard's BIOS settings that you have "Assign IRQ to Video" enabled (or something similar).
One more note: the "type" of interrupt assigned to the nvidia board should be "level", not "edge". The driver module probably won't load if it is not "level".
It has been noted that Creative boards (SoundBlaster whatever, Audigy, etc) like to conflict with NVidia boards. If you have one, move it to a different slot or take it out temporarily for testing.
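The checks in this section (does "nvidia" have an IRQ, is it level-triggered, who shares it) can be sketched in a few lines of Python. This is only a rough sketch that assumes the column layout shown in the listing above (counters per CPU, then the trigger type, then a comma-separated device list); other kernels may format /proc/interrupts differently. The function names are made up for illustration.

```python
def parse_interrupts(text):
    """Map IRQ label -> (trigger type, [device names]) from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Summary rows (NMI, LOC, ERR, MIS) carry only counters, no devices.
        if len(fields) <= ncpu:
            continue
        trigger = fields[ncpu]
        devices = [d.strip() for d in " ".join(fields[ncpu + 1:]).split(",")]
        table[label.strip()] = (trigger, devices)
    return table


def check_nvidia_irq(text, driver="nvidia"):
    """Report the driver's IRQ, its trigger type and any devices sharing it."""
    for irq, (trigger, devices) in parse_interrupts(text).items():
        if driver in devices:  # exact match, so "NVidia CK804" is not a hit
            return {
                "irq": irq,
                "level_triggered": trigger.endswith("level"),
                "shared_with": [d for d in devices if d != driver],
            }
    return None  # no IRQ assigned to the driver at all


# A few rows taken from the listing above.
SAMPLE = """\
           CPU0       CPU1
 66:     113605          0   IO-APIC-level   NVidia CK804
 74:    3148469          0   IO-APIC-level   ohci1394, sky2, nvidia
NMI:       3559       2995
ERR:          0
"""
print(check_nvidia_irq(SAMPLE))
```

To run it against a live system, pass `open("/proc/interrupts").read()` instead of SAMPLE. A result of None corresponds to the "no interrupt assigned" case described above.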
3 - Check For a Motherboard BIOS Upgrade
Normally there should be no reason to upgrade your motherboard BIOS. On occasion, something may have been fixed; but more often than not you're bound to introduce a new problem. If you start asking around for help from NVidia, they will tell you to do this, even if you're sure you don't need to (or if there is no update available for your board) - so you may as well get this item out of the way.
4 - Strange Motherboard Settings
Some BIOS settings can interfere with your NVidia card. If you have the NX/XD bit (No eXecute / eXecute Disable) disabled, you should try changing that. If you have onboard virus protection, disable it. If you have overclocking related BIOS options, turn them off. For example, my ASUS has an automatic overclocking feature - it should be disabled (later you can turn it on if you get things working, of course). Also, "PEG Link Mode" should be set to normal.
While you're at it, you can disable all onboard things you're not using: firewire, USB (if you can deal without USB mouse/keyboard), serial / parallel, audio, etc. That way you can eliminate them as possibly conflicting with your NVidia card.
5 - MEMTEST86
Ok, this should actually be #1, but I had assumed in the beginning that your system already has a known good and stable configuration. This is absolutely essential: run memtest86. Some say you should run it for 10 hours (or overnight); the only time I ever found errors on a bad RAM module, they appeared after a minute or less.
If you have an Ubuntu or SUSE boot CD, it has memtest86 as an option (it can't get easier than that). Otherwise, download this small (but excellent!) recovery CD and choose the memtest boot option.
If you have any RAM problems whatsoever, you need to fix this first - it's probably the culprit.
6 - AGP and Your xorg.conf File
There are a couple of settings that could affect your stability issue, primarily if you are using AGP (I am not). I haven't heard that these affect PCIe in any way, but I'm putting this here anyway since it seems to be a common problem among lots of AGP users. From the NVidia README:
Option "NvAGP" "integer"
- Configure AGP support. Integer argument can be one of:
  0 : disable AGP
  1 : use NVIDIA's internal AGP support, if possible
  2 : use AGPGART, if possible
  3 : use any AGP support (try AGPGART, then NVIDIA's AGP)
Please note that NVIDIA's internal AGP support cannot work if AGPGART is either statically compiled into your kernel or is built as a module and loaded into your kernel.
Try 0 and see what happens. If it works, then you have AGP issues. You might have to remove AGPGART support from your kernel and use the right NvAGP option.
Also, look for "RenderAccel" and give it the parameter "false":
Option "RenderAccel" "false"
Later you can enable this if you get things working.
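To double-check which of these options are actually set, here is a small sketch that pulls the Option entries out of the Section "Device" blocks of an xorg.conf. It is a rough line-based reader under simplifying assumptions (one entry per line, '#' starts a comment, all Device sections merged into one result), not a full xorg.conf parser. The function name and the "nvidia0" identifier in the sample are made up for illustration.

```python
import re

# Matches: Option "Name" "value"   (the value is optional in xorg.conf)
OPTION_RE = re.compile(r'^\s*Option\s+"([^"]+)"(?:\s+"([^"]*)")?', re.IGNORECASE)


def device_options(xorg_conf_text):
    """Collect Option entries found inside Section "Device" blocks."""
    options = {}
    in_device = False
    for line in xorg_conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        stripped = line.strip()
        if re.match(r'Section\s+"Device"', stripped, re.IGNORECASE):
            in_device = True
        elif re.match(r"EndSection", stripped, re.IGNORECASE):
            in_device = False
        elif in_device:
            match = OPTION_RE.match(line)
            if match:
                # Lower-case the key so lookups don't depend on spelling.
                options[match.group(1).lower()] = match.group(2)
    return options


# Hypothetical Device section using the two options discussed above.
SAMPLE_CONF = """
Section "Device"
    Identifier "nvidia0"
    Driver     "nvidia"
    Option     "NvAGP" "0"
    Option     "RenderAccel" "false"   # re-enable later
EndSection
"""
print(device_options(SAMPLE_CONF))
```

For a real check, read your own file (commonly /etc/X11/xorg.conf) and look up the "nvagp" and "renderaccel" keys in the result.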
7 - Manually Upgrade Drivers
Even though I said you should use the proper installation method depending on your distro, try uninstalling the drivers and getting the newest from NVidia's website. Note: If you have an older NVidia card, you need to use the "Legacy" drivers supplied by your distro (and by NVidia).
Be careful with updating your drivers this way, as you can end up with library problems / improper links that could lead to further problems.
8 - Incompatible Libraries
It has been noted on some distros (Gentoo) that if you install the driver using NVidia's install mechanism instead of the default "right way" (emerge, apt-get, urpmi, whatever) you'll run into problems with old libraries lying around in places they shouldn't - which end up causing conflicts and odd behavior. If you've done this, then I can't really help you - but you should probably start by uninstalling any versions you have installed and manually looking for leftover library bits and removing them. Good luck.
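As a starting point for the manual hunt, a sketch like the following lists every GL/NVidia library file under a set of directories, grouped by library name, so that duplicate copies in different places stand out. Which directories to scan and which copies are "leftovers" depends entirely on your distro, so treat the file-name prefixes and the function name as assumptions. The script only lists files; it never deletes anything.

```python
import os
from collections import defaultdict

# File-name prefixes to look for (an assumption; adjust for your driver version).
PREFIXES = ("libGL.so", "libGLcore.so", "libnvidia")


def find_gl_libraries(roots):
    """Return {library name: sorted list of paths} for files under the given roots."""
    found = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.startswith(PREFIXES):
                    library = name.split(".so")[0]  # libGL.so.1 -> libGL
                    found[library].append(os.path.join(dirpath, name))
    return {library: sorted(paths) for library, paths in found.items()}
```

A call such as `find_gl_libraries(["/usr/lib", "/usr/lib64", "/usr/X11R6/lib"])` (directories that don't exist are simply skipped) gives you a list to compare against what your package manager says it installed.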
9 - Reinstall X, Mesa, and Dependencies
Ok, this is a pain to do, but it's one of the things I did. Completely reinstall X and its dependencies - or if it's easier, just reinstall your whole distribution.
10 - Different Distributions?
Ok, perhaps a different distribution will fix your problem. Why? Well, different kernels might work differently with your board. If you don't mind trying out a different distribution, give this a shot. You won't find the real cause of your problem this way, however.
SUSE linux was the only distro I tried that worked 100% perfectly after an install (it detected my motherboard, and that I have a RAID controller, which it informed me would not be supported in RAID mode, which I already knew but no other installer chose to tell me). Ubuntu, Mandriva, and Gentoo could give you anywhere from a few to many niggles after installation.
11 - Vanilla Kernel Compilation
This is another thing that NVidia support will ask you to do: go to kernel.org, get the latest official stable release, and use that as your kernel. When you do this, I recommend only enabling the bare minimum you need (plus the options that are required for the NVidia driver - see this page for more info).
12 - Stability / Heat Issues
If you are actually able to use the drivers for 3D applications for a short period of time (and thereafter a lockup), then you don't have the same problem as I did, but you might have a stability problem due to overheating. You can monitor the temperature of your GPU in Linux, but I won't go over how to set that up. If available, I would suggest borrowing Windows from a friend, installing it and the NVidia Windows drivers, and then installing the NVidia NTune utility. It should come with a stress test that will notify you of any stability problems. While you're at it, check to see if 3D applications work properly in Windows for you (in my case, they worked flawlessly).
If you do have a heat problem, you probably should return your card. If you can't, then you could attempt to pull off the fan/heatsink contraption and apply fresh thermal paste to the offending chips. Google around for guides on how to do this.
One more thing: cheap power supplies will give you never-ending stability issues. Make sure that your power supply has enough capacity. Also, not all power supplies are the same: 500W from one brand will not "be the same" as 500W from another manufacturer. If you have a no-name cheapy power supply with borderline specs for your system, get a newer, better one (I did this unnecessarily - but it was cheaper than getting a new MB or video card).
13 - Maxtor Drives and NForce Incompatibility
The particular model drives I have (Maxtor 6V300F0) have a nasty bug that causes instability with NForce chipset motherboards. Not all drives have this bug; only certain firmware revisions do. There is a fix out there that Maxtor will not give you for some reason. Do a search to find out where to download the fix (I don't have it; I didn't need it, since my firmware revision was new enough). I don't know why Maxtor won't give out the fix. Probably they are afraid of helping a hard drive firmware hacking scene, where people simply change the firmware in drives that are identical to other models and gain hundreds of extra gigabytes. Well, that's my guess. In any case, I don't think I will be buying Maxtor again.
Last But Not Least : Video Card BIOS
None of the above items were in any way related to my problem. In the end, I took my video card (Gainward 7800GT) and flashed the BIOS with one from an eVGA 7800GT.
Initially I flashed with a slightly newer version of the Gainward 7800GT ROM, but that made absolutely no difference. I wrote the possibility of a BIOS problem off at that point. Later, short of buying a new board, I took the leap of faith and flashed with the eVGA BIOS.
So why does this work? What is the problem? Why should the BIOS affect whether or not the NVidia drivers will work on a particular platform (fine on Windows, but not on Linux)? I expect that NVidia won't answer this.
If you have the same problem as I did, you certainly won't get an RMA - your card works fine on Windows. There is no detectable issue with the card. And you are voiding your warranty by flashing it to that of a different vendor - but what other choice is there? Buy a second video card and cross your fingers?
Go to the site http://www.mvktech.net to find the right ROM and flashing utility. Of course, you shouldn't do this - but it was the only solution that worked for me. You very well might destroy your video card. Note: When flashing to a different vendor, you'll have to specify some flags to override the original vendor information.
One thing I found out from this whole debacle was that NVidia does not officially suggest or support any particular Motherboard/Video Card combination on Linux. So if you want to play it safe and buy a combination of hardware that just works - well, don't ask NVidia. If you're a home user like me and can't afford to drop $300 a pop on new video cards just because the one you bought has a non-detectable BIOS issue, well - good luck.
So as a last resort, you might want to try what I did and flash your video card BIOS. But do a little research first and make sure your card is practically the same as the one the BIOS you're flashing was made for (I just made a wild-ass guess). If you flash it with the wrong BIOS, you're gonna end up with a nice decorative circuit board for your Christmas tree.
If anyone else out there had to do this, I'd appreciate hearing the details.