AMD, like Cyrix, VIA, and NexGen (I believe that was the name), were all clone makers: they made pin-compatible processors for PCs. Their primary advantage was their lower cost. The AMD K5, designed to compete with the Pentium, was a Socket 5 processor just like the Pentium. The idea was that, as a lower-cost alternative, their processors could be used in machines instead of Intel's. AMD, specifically, excelled in integer operations, doing them a lot faster than the equivalent Intel processor. So in some cases the AMD processor was not only cheaper but also a better choice, if it was for use in applications that did a lot of integer arithmetic.
NexGen was working on a new processor to compete with the Pentium; AMD bought the company and relabelled the in-development NexGen chip the K6. The K5 and K6 are completely different processors, and not in any way the same (they were basically designed by two different companies). The later K6-2 supported a set of 3-D extensions (3DNow!, much like MMX) alongside MMX itself; as far as I know that part was AMD's own work rather than NexGen's. Its floating point performance also no longer sucked, and was very nearly comparable to Intel's offering.
Over time, all the other clone vendors died or were purchased; VIA, to my recollection, bought Cyrix, made a few processors (the VIA C3 “Samuel” being the only one I distinctly recall) and then largely retreated from the desktop processor market, focusing on their motherboards and embedded solutions. AMD became the only competitor to Intel that had any “weight”. Also, as their processors became equal to Intel's in both performance and price, they started being made with different pin layouts. I believe this was originally because Intel's Slot 1 (used by the Pentium II) was proprietary, so AMD couldn't make a compatible equivalent; at the very least, the Pentium itself was named the Pentium - rather than the 586 - because a number could not be trademarked, and a trademarked name kept other vendors from using it.
One interesting thing about the Pentium processor is that it was the first x86 (CISC instruction set) processor to be considered superscalar. This is because of its dual-pipeline architecture, which allows it, in many cases, to execute two instructions per clock cycle. The Pentium came in three revisions. The earliest versions (60MHz and 66MHz) didn't have things like MMX, and in many cases had the infamous FPU division issue (Intel Erratum #23). The second generation came in higher clock speeds (such as 90MHz, 100MHz, and 133MHz), along with any number of improvements, such as a smaller die size and an on-chip APIC. It still didn't have MMX; that was the third revision, which came in even higher clock speeds (FSB/clock: 66/166, 66/200, 66/233, and 66/266, the last being mobile only). The third revision had MMX, a still smaller die size, lower voltage requirements, and a 16KB write-back cache (compared to the earlier versions' 8KB).
The interesting thing about some Pentium boards, including those designed for slot CPUs, is that a lot of them actually had two processor slots. Usually the second one was labelled “for testing only”, but you could literally plug in another processor and have dual processors. The only downside is that you pretty much required Windows NT to use them (9x doesn't support multiple cores or processors). Heck, that wouldn't even work with XP Home, which only supports a single physical processor.
AMD's lower-cost offerings impacted Intel's market, so Intel came up with their own low-cost alternatives. That isn't too surprising given they'd been doing it for years with the 386SX and 486SX: the 386SX was a slower variant of the 386DX (with a 16-bit external data bus), whereas the 486SX was a 486DX with its FPU disabled. Installing the companion “co-processor”, the 487SX, was actually installing a full 486DX, which then took over all system operations from the SX. In addition, they created lower-cost upgrade options for the 486, since the K5 was almost at feature parity with the Pentium (and better in some ways, with 6 5-stage instruction pipelines rather than 2). To compete with this they created the Pentium “OverDrive” chip, which would be installed in a 486 board and take over all operations from the installed 486. Naturally, it was on a 486 board, so some operations would still be slow, particularly bus transfers and DMA, but it sped up processor-intensive tasks, and a lot of everyday tasks got faster because of that. Later, with the K6 and K6-2 eating into their Pentium II market share, they came up with another lower-cost segment, the Celeron.
sidebar:*technically, the first Intel 6th generation processor was not the Pentium II, but rather the Pentium Pro*
Of important note is that the first Celerons were not Pentium processors, but rather Pentium II processors; it took a generation for Intel to catch on to AMD's low-cost niche tactic and come up with a response in the Celeron. The Celeron was typically a slotted processor; at least, all the ones I've seen are. The basic difference is that it had less cache: the first models had no L2 cache at all, and later revisions had 128KB (compare to the Pentium II's standard 512KB). Ironically, the Celeron usually performed much worse than the K6 and K6-2 it was designed to compete with, not to mention the awkwardness of the slotted processor design. Even so, and particularly through partnerships with retail computer manufacturers, Intel was able to squeeze the Celeron boards into the market (the “Barbie” and “Hot Wheels” machines from mid to late 1998 are a good example of this, since they sported Celeron processors). The Celeron brand lives on, but it is still a lower-cost alternative to their other offerings, and is almost never a wise choice for a desktop machine. Many users are wooed by the higher clock speed, but with so little cache, the clock speed barely compensates.
The 6th generation gave us the above Pentium IIs, Celerons and K6-2s; the seventh gave us Pentium 4s… Wait, P4s? What about Pentium IIIs and K6-IIIs? Well, they aren't 7th gen processors, since they are based on the same core designs as the sixth gen chips (for Intel, this was the Pentium Pro; for AMD, the K6).
The original Pentium III was practically a Pentium II with SSE (at one point referred to as MMX2) and a higher clock speed. An interesting sidebar is that the P6 chips from Intel (Pentium Pro, Pentium II, and Pentium III) are only fully utilized by NT versions of Windows, since the micro-ops that the CISC instructions are broken down into are optimized for 32-bit code. Windows 9x spends a good half of its time executing 16-bit code (mostly for compatibility with older software), so you don't get the biggest improvement with it.
Intel failed miserably on their first attempt at a consumer-appealing 64-bit architecture. The Itanium was 64-bit, but it used a completely new instruction set (IA-64) rather than an extension of x86, and its execution of 32-bit x86 code went through a compatibility layer that amounted to slow emulation. It found some uses in business and servers, but its limited ability with 32-bit code hindered its adoption in the consumer sector.
AMD created its own 64-bit processors, but made it so that 64-bit was just another “mode” of the processor. In this way, 32-bit code could be run quite easily, natively and without emulation. Intel followed suit with their own extensions that implemented the same instruction set as AMD, making it compatible.
I'm not nearly as familiar with their history after around the Pentium III/Athlon XP era.
The two are practically the same now. They offer consumers a choice, but at the same time that choice is practically useless. The fact is that we've pretty much hit the architectural limit of what different die configurations can give us, and we are not easily able to shrink the process further without invoking the dangers of quantum tunnelling. The best path for the future is to add more processor cores and, even more important, to have software that is better able to extract the best performance from those cores.
My opinion is that the big problem right now is not the hardware, or the software, but rather the programming languages that are dominant in the industry today, largely C/C++. What is needed is the adoption of one of the myriad languages that have built-in support for concurrent execution of constructs; for example, some languages are able to compile a simple for loop in such a way that its iterations can execute on separate cores. This approach is particularly powerful in a stateless environment, such as a functional language; to that end, many functional languages include built-in concurrency support. What makes this particularly interesting is that most programmers hear “concurrency” and immediately think of threads; but threading is only one of the ways that concurrency can be achieved, and it is one of the least powerful as well. Erlang, for example, takes the approach of sending messages between lightweight processes that share no memory, instead of having different threads work on shared data. Since Erlang is a “functional” language, most of its constructs are largely stateless; this is as opposed to most imperative languages, which are typically state-heavy. It is the abundance of state in our “standardized” programming languages that is causing the difficulties we are seeing with concurrency, not the cores or the implementations thereof. Consider for a moment that most of the benchmarking tools being used to compare processors are written in C/C++.
In order to trust the performance results, you have to trust that the code is making the best use of the available hardware. But the fact is that imperative, stateful programming abhors concurrency; threads deadlock, and you have data synchronization issues and race conditions to deal with. So, while processor performance benchmarks might state that a Bulldozer is “worse” than another CPU, I move that that result is as much a testament to weaknesses in the program, and in the stateful imperative programming paradigm (at least with regard to its use in concurrent solutions), as it is to any weakness in the processor. This is why I have never put faith in benchmarks; any weakness being shown could easily be an oversight or problem in the software being used to test. If a benchmark tool only uses two threads, how can you trust its result when it runs on 6 cores? And even if it were to use more, you're still placing your trust in how the program was written. And while one could argue that the test will show how a lot of current software and games run on a given system, it doesn't test the actual potential of that system; a game properly written to take advantage of 6 cores would scream compared to running that same program with fewer cores.
At this point, concurrency is the answer to improving system speed, and in order to properly leverage concurrency we don't just need more cores; we need software and programming languages that provide built-in support for concurrency constructs. C/C++ simply does not offer this, and while I'm sure a library could be written that does, there are already loads of languages that provide built-in support for concurrency in any number of different ways: either through something like .NET's parallel constructs (the Task Parallel Library in .NET 4.0, followed by async/await in C# 5.0), or through the ability of functional languages to make assumptions because the code is primarily stateless and thus easier to parallelize.
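To make the message-passing idea a bit more concrete, here is a minimal sketch in Python rather than Erlang. Everything in it is made up for illustration: the names `worker`, `sum_of_squares`, `inbox` and `outbox` are hypothetical, and the work being done (a sum of squares) is just a stand-in. Workers only ever see the messages they are handed and reply with messages of their own; nothing shared is mutated, so there are no locks to get wrong. One caveat: in CPython, threads won't actually speed up CPU-bound work like this because of the global interpreter lock, so treat it as showing the shape of the approach and not as a benchmark.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (job_id, numbers) messages and reply with (job_id, partial_sum)."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel message: no more work for this worker
            break
        job_id, numbers = message
        # Pure computation on the message contents; no shared state is touched.
        outbox.put((job_id, sum(n * n for n in numbers)))


def sum_of_squares(numbers, workers=4, chunk_size=1000):
    """Split the input into chunks, mail them to workers, and add up the replies."""
    workers = max(1, workers)  # at least one worker, or nobody answers
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()

    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]
    for job_id, chunk in enumerate(chunks):
        inbox.put((job_id, chunk))
    for _ in threads:
        inbox.put(None)  # one sentinel per worker

    # Exactly one reply arrives per chunk, in whatever order workers finish.
    partials = [outbox.get() for _ in chunks]
    for thread in threads:
        thread.join()
    return sum(value for _job_id, value in partials)


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))  # prints 285
```

Because each chunk is independent and the replies are simply added up, the number of workers can change without touching the logic, which is the property that stateless code buys you.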
Personally I don't have a preference for either. I used a K6-2 for nearly 5 years, a Pentium 4 for about a year, and am now using an Intel Q8200 and a laptop with an Intel T3200 (I think). Maybe my next build will be AMD, I don't know. Either way, I'm not going to base any of my choices on how a given system performed with a piece of software. The heart of the matter is that I never trust software. I don't even trust software I wrote half the time. Software is a loose thread on a sock. If you pull out the thread, the sock is going to fall down regardless of how well formed the ankle is, and you cannot declare “this ankle sucks, because my sock keeps falling down” just as you cannot say “This hardware sucks, because this piece of software says so”.