RDNA3 - Preliminary Information for Next Year - Page 2 - Video Cards - HWzone Forums
RDNA3 - Preliminary information for next year

Recommended Posts

  • 48 comments


It does not work that way, Caveman. Processes do not "fight" for a place in the cache. The cache only serves the process that is running on the GPU at that given moment, and at any given moment only one process can run. This is similar to a CPU; I will explain in detail:

As preliminary background, a video card usually (when doing visual graphics acceleration) works on one process at a time, which gets exclusive access to it. This is especially true of games and simulators, which are real-time tasks. That is the typical mode of work in graphics acceleration.


But we are talking about a situation where this exclusive access is waived. That is, the operating system allows two (or more) applications that need the card's resources for their work; maybe it is not a game but rather a batch mathematical computation that requests the GPU. Then, similar to the division of labor on a CPU core, every time the card moves to another task, an action called a context switch occurs.


This means that if the graphics card accelerates several tasks in parallel, it gives a number of cycles to one task, then frees itself to work on the other task, and then moves on to the next in line, all according to how the operating system schedules the activity and divides the time.
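
The time-slicing described above can be sketched as a toy round-robin scheduler. This is an illustration only, with made-up task names, slice sizes, and cycle counts, not anything a real driver exposes:

```python
# Toy sketch (not real GPU or OS code): round-robin time slicing with
# context switches. Task names, slice sizes, and cycle counts are made up.

from collections import deque

def run_round_robin(tasks, slice_cycles, total_cycles):
    """Give each task a fixed slice of cycles in turn; every hand-off
    between two different tasks counts as one context switch."""
    queue = deque(tasks)              # entries: (name, cycles_remaining)
    switches = 0
    last = None
    elapsed = 0
    while queue and elapsed < total_cycles:
        name, remaining = queue.popleft()
        if last is not None and last != name:
            switches += 1             # state swapped for the incoming task
        work = min(slice_cycles, remaining)
        elapsed += work
        remaining -= work
        last = name
        if remaining > 0:
            queue.append((name, remaining))
    return switches

# Two tasks of 300 cycles each with 100-cycle slices run as
# A B A B A B, i.e. 5 hand-offs between different tasks.
print(run_round_robin([("A", 300), ("B", 300)], 100, 10_000))  # 5
```

Shrinking the slice makes the card switch more often without changing the total work done, which is exactly the overhead each context switch adds.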


On every context switch, just like on a CPU, the contents of the stack are replaced with the contents needed for the other task. That content sits in VRAM (whether it is pre-cached into VRAM depends on the application), or it is fetched on a per-call basis (as in a CPU).

If the operating system can manage both processes sitting in VRAM at the same time, i.e. both were pre-cached and both fit inside it with enough space simultaneously, then the operating system will happily do so. VRAM, in effect, serves as the GPU's big cache.


The GPU will work each time against the segment where the stack of the currently running application sits, i.e. against the segment relevant to that application. Segmentation is by address space: each application gets access only to its own segment.
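
That per-application segmentation can be sketched as a small table of base/limit pairs. The class name, PIDs, and addresses below are hypothetical, for illustration only:

```python
# Toy sketch of segmentation by address space: each process may touch only
# addresses inside its own segment. The PIDs, bases, and limits are made up.

class SegmentTable:
    def __init__(self):
        self.segments = {}                     # pid -> (base, limit)

    def assign(self, pid, base, limit):
        self.segments[pid] = (base, limit)

    def translate(self, pid, offset):
        """Turn a process-relative offset into an absolute address,
        refusing anything outside the process's own segment."""
        base, limit = self.segments[pid]
        if not 0 <= offset < limit:
            raise MemoryError(f"pid {pid}: segmentation fault at offset {offset:#x}")
        return base + offset

table = SegmentTable()
table.assign(pid=1, base=0x0000, limit=0x4000)    # app 1: 16 KiB segment
table.assign(pid=2, base=0x4000, limit=0x4000)    # app 2: 16 KiB segment

print(hex(table.translate(1, 0x10)))   # 0x10: lands in app 1's segment
print(hex(table.translate(2, 0x10)))   # 0x4010: same offset, app 2's segment
try:
    table.translate(1, 0x5000)         # app 1 reaching past its limit
except MemoryError as err:
    print(err)
```

The same offset resolves to a different physical location per application, and any access past a segment's limit is refused rather than spilling into a neighbor.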


On every access made while a process runs (the one running at that moment), the cache memory pulls in what is needed for the GPU's immediate calculations, or whatever is most frequently accessed under the paging-management method. That depends, of course, on the cache-management algorithm, since each application behaves optimally according to the type of algorithm it runs.


For the case where the application is graphics acceleration, for example games, I explained in depth how this works: read my article from November 2020 entitled "How an algorithmic cache works in the case of a game / graphic simulator".

It works differently, of course, for a different type of application that is not visual/graphics acceleration.


Also, all cards have cache; the question is what size. On cards up to RDNA2 the cache volume was on the order of 5-10MB. With RDNA2 the volume jumped to 128MB.


The cache works the same way whether the operating system runs one application, i.e. a single stack of a single application, or many applications, i.e. memory segmented so that each segment gets its own stack.


Cache overflows during a process's run are spilled to VRAM storage, which is the next tier down, using paging. Overflow from VRAM is in the same way spilled to general system RAM with paging, the next tier after it; and overflow from RAM goes, as is well known, to disk through paging, and so on and so forth. It is like this on every video card, no different in one case than another. But when there is more cache, the spill events from the on-die cache to VRAM naturally decrease in number.
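
The spill chain just described (on-die cache, then VRAM, then system RAM, then disk) can be modeled as a toy hierarchy. The capacities and per-level costs below are invented for illustration, not real hardware numbers:

```python
# Toy model of the spill chain described above: on-die cache -> VRAM ->
# system RAM -> disk. Capacities and per-level costs are invented numbers.

from collections import OrderedDict

class Level:
    def __init__(self, name, capacity, cost, lower=None):
        self.name, self.capacity, self.cost, self.lower = name, capacity, cost, lower
        self.lines = OrderedDict()             # resident lines, LRU order

    def access(self, addr):
        """Return the total access cost; a miss falls through to the next tier."""
        if addr in self.lines:
            self.lines.move_to_end(addr)       # hit: served at this level
            return self.cost
        cost = self.cost + (self.lower.access(addr) if self.lower else 0)
        self.lines[addr] = True
        if len(self.lines) > self.capacity:    # overflow: spill the LRU line
            self.lines.popitem(last=False)
        return cost

disk  = Level("disk",  10**9, cost=10_000)
ram   = Level("RAM",   10**6, cost=100, lower=disk)
vram  = Level("VRAM",  1_000, cost=10,  lower=ram)
cache = Level("cache", 8,     cost=1,   lower=vram)

first  = cache.access(0)    # cold access walks the whole chain
second = cache.access(0)    # repeat access is served from on-die cache
print(first, second)        # 10111 1
```

The repeat access costs 1 instead of 10111, which is the whole argument for a larger on-die cache: more hits at the top level, fewer walks down the chain.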


It is the same as with a CPU, if you are familiar with the field. A modern CPU also has several memory hierarchies: there is the L1 cache, followed by L2, followed by L3, followed by RAM, followed by the disk, and so on. A CPU usually juggles dozens (if not hundreds) of processes that the operating system throws at it all the time, and it context-switches all of its buffers every time it moves to handle another time slice.


Over the years, as technology learned to develop additional kinds of memory, the hierarchy has been divided into more and more levels, cache and non-cache. Today we have reached a model of up to 6 levels of hierarchy, and the industry is on its way to a model of 7 levels, probably starting in another two years.



Edited By nec_000

You basically explain how they do fight for a place in the cache (maybe not directly, but bottom line, only one process will use it at a time). For my most intense use of the video card (multiboxing: running multiple instances of the same game), it seems that an architecture that relies on cache for a big speed boost will suffer more than one that does not rely so completely on this component. In that case the larger memory bandwidth will show itself.

For that matter, run World of Warcraft 5 times, or 4 times 3.


Not true.

You did not understand; I will explain again more simply:

A process does not compete for a resource. A process cannot manage a resource; a process is passive.

The one who manages the resource is the operating system, which balances the processes. It places them one after another into the queue it builds according to the priority set for each of them. And at any given moment there is only one process in the pipeline and the stack. Just one.


Each time a process runs, in those fractions of a second when it is being processed, i.e. when it is its turn to get run time, it gets the full power of the system: the same power as if it were running alone on a single processor. All the other processes stand aside, doing nothing and taking nothing from the GPU.


I think I understand what confuses you: you mistakenly think the stack is cached. It is not. The stack sits in VRAM or in system RAM, just like with a CPU. On every context switch everything is purged, all the buffers and the cached memory as well.

In the case of graphics acceleration it is the same: the stack sits either in RAM, or in VRAM if there was room for it there. That's all.


* I will look for a good book on how these components work and the interaction between them.



Edited By nec_000

No. Not only do I understand what you are saying, I knew it before you said it...

I am trying to say that the bottom line (without starting to describe what is technically happening) is what matters, and it does not matter whether the context switch happens because of the operating system or because an old witch said hocus pocus. In the end, if several instances are running at the same time, only one of them can use the cache at any moment, and if each of them needs to update a few dozen frames per second at high resolution... good luck with that.

Edited By captaincaveman

What I am trying to explain is that whether you have a cache of only 5-10MB, as on all other cards in the world, or a 128MB cache, this extra cache can in no way hurt performance. At most it improves things, but it certainly does not harm.


Your question should be worded differently: instead of focusing on cache memory, which at most can help and cannot hurt, your question should focus only on the bandwidth of the VRAM, and be this: what happens in cases where the VRAM bandwidth is smaller, compared to the case where it is larger? Does it affect anything?


The answer is that it certainly affects things, but not in the aspect you described (multiplicity of processes); rather in the algorithmic aspect: not every algorithm benefits from cache. And you, as a database person, know that if you perform an operation on a DB of the kind that requires access to the entire DB to finish the job, what good is a cache that can hold only a small chunk of it at any given time?

For example, adding one additional value to every single cell in the DB, say another column. Here cache does not help at all: you will still need O(n) accesses to the database, and you never revisit the same cells.



The issue here is not the multiplicity of processes in your question, but the type of algorithm.



If it is games/simulators, these are precisely the type of algorithms where cache helps a lot. Because of the processing method, taking one texture at a time and "painting" it (i.e. stretching it, using a linear transformation, onto polygons), the access pattern is such that each access touches only a very small section and repeats it over and over many times. That is why cache is very helpful in games and simulators, where the working algorithm is ideal for working with a cache, and that is why they integrated it in the first place.
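
The contrast between the two workloads can be shown with a toy LRU cache. The set sizes and the one-address-per-cell model are illustrative assumptions only:

```python
# Toy LRU model contrasting the two access patterns: a game-like loop that
# keeps revisiting a small hot set, versus one full pass over a large table
# (each cell touched exactly once, O(n)). Sizes are illustrative assumptions.

from collections import OrderedDict

def hit_rate(accesses, cache_size):
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)
        else:
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)      # evict least recently used
    return hits / len(accesses)

game    = [i % 64 for i in range(10_000)]      # hammer the same 64 lines
db_scan = list(range(10_000))                  # every cell exactly once

print(f"hot-set loop: {hit_rate(game, 128):.0%}")    # 99%
print(f"full scan:    {hit_rate(db_scan, 128):.0%}") # 0%
```

The hot-set loop hits almost always once it warms up; the single full pass never reuses a line, so the cache does nothing for it and raw bandwidth dominates.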


But if it is some other mathematical processing, not graphics acceleration, the cache may not be relevant at all. Then the bandwidth to the VRAM is what will do the talking.


So try to delineate your question more precisely: did you ask about games specifically?

In the case of games (multiple instances), what I described to you is what will happen. The multiplicity of processes is irrelevant to the question, and it will behave just like a single game running alone, because the cache is indifferent to the context switch. It is not related to the subject.


But if you ask about, say, a cryptocurrency miner, I am not sure the cache will help at all, and what will speak more is the total memory bandwidth facing the core. In that case it also does not matter whether it is one mining process or the equivalent of several different ones; in every case what matters to us is the bandwidth.


** Without going into detail about the mining algorithm per se: this is just an example of something other than graphics acceleration, brought only to convey the point and nothing more. That is, an example of a workload that, as far as we know, does not perform many cyclical accesses to the same small section, so it is not clear that cache helps it at all.


I hope this manages to put things in order for you.




A. Games?

The cache rule is the same regardless of the number of games running at one time. It is effective in all cases.

B. Not games?

The cache rule is the same here too, regardless of the number of processes running simultaneously. Its efficiency depends on whether the cache is relevant to that algorithm at all or not. If it is not relevant, then chances are the bandwidth is what will be relevant; and if it is relevant to the running algorithm, then it is similar to case A above.


Edited By nec_000
Quote from captaincaveman:

What really intrigues me to know is how AMD's cache works when the card needs to run more than one graphics-processing task.

Suppose I run multiple instances of a particular game. Will the various processes fight for a place in the cache?


Here is the wording of the question you should have asked:

"What intrigues me, Caveman, to know is whether RDNA2's trade of VRAM bandwidth in favor of on-die cache affects a multiplicity of processes, in the case of games and not just games... and in general how it affects things, regardless of the multiplicity of processes."


The answer, as I described earlier, is:

A. Cache memory is not affected by the multiplicity of processes; what matters is the type of application being run.

B. If it is games, RDNA2's trade of VRAM bandwidth in favor of a cache is very clever, because this is exactly the type of algorithm that the cache fits like a glove.

C. If the running algorithm is not a game and is of a type that cache does not help, then RDNA2's trade-off is not appropriate.



Edited By nec_000

Who is responsible for filling the video card's cache? I feel there must be some connection to the process that put the information at that address in memory...

Can one process access information that another process has cached?


God forbid that one process be allowed to access another process's information. That is the point where serious failures and bugs occur, as well as the information-security issues that have been under very heavy emphasis in the last decade, in order to ensure that same strict segmentation.

This is one of the reasons a modern browser needs so much RAM: it achieves this thanks to a strict algorithm that separates tabs from one another in the memory space, so that none of them can, accidentally or intentionally, access another tab's information. The price is high memory consumption.


The on-die cache, like the registers, is purged on every context switch. In fact, unless pre-caching is done or the process forces its stack to stay in VRAM, the VRAM is purged too. In fact the VRAM is entirely a cache of the GPU: it is purged every time the program that ran on the graphics accelerator exits. By the way, it is more correct to call it a mathematical accelerator, one that knows how to do many things, graphics being just one of them. In modern times it would be right to call it an MPU and not a GPU, which is its old name.



For a process's stack to remain in VRAM in the first place, given that it is evicted all the time unless you force otherwise, the operating system must force it to stay. Because a modern operating system is smart, and because programmers take advantage of the features it allows them, if you run several processes on the GPU (that is, on the video card, which is a mini-computer in itself), it will try to keep the processes stored together in VRAM and not in system RAM, assuming there is room for all of them at the same time.

This is done so that on a context switch you will not have to upload everything into VRAM all over again. That is, it confines context-switch traffic to the level of VRAM and above, rather than the level of system RAM and above, saving time and overhead on each context switch when it occurs.
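
A back-of-the-envelope sketch of why residency matters for the switch cost; every number here is an invented assumption, not a measurement:

```python
# Back-of-the-envelope model of the point above: a context switch is cheap
# if the incoming task's working set is still resident in VRAM, and far more
# expensive if it must be re-uploaded from system RAM over the bus.
# All the numbers here are invented assumptions, not measurements.

VRAM_SWITCH_US = 50        # assumed cost to swap GPU state, data already in VRAM
BUS_GB_PER_S   = 16        # assumed host-to-VRAM transfer rate

def switch_cost_us(working_set_gb, resident_in_vram):
    cost = VRAM_SWITCH_US
    if not resident_in_vram:                   # pay to re-upload the working set
        cost += working_set_gb / BUS_GB_PER_S * 1_000_000
    return cost

print(f"resident: {switch_cost_us(2.0, True):.0f} us")    # resident: 50 us
print(f"evicted:  {switch_cost_us(2.0, False):.0f} us")   # evicted:  125050 us
```

Under these made-up numbers, a switch to an evicted 2 GB working set is thousands of times more expensive than a switch between two VRAM-resident ones, which is why the OS tries to keep both in VRAM when there is room.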


To make sure the subject is well understood: the on-die cache is too small to perform the same trick we do in VRAM, that is, to bring the applications and the stack up into it so that everything sits there the way it sits today in the giant VRAM. Applications of the type that run on a GPU cannot fit in such a small volume, even one as large as 128MB.


If at some point in the very distant future the on-die cache swells to huge dimensions of gigabytes, and can store what today sits in VRAM, then maybe it will be possible to lift the hierarchy so that the applications themselves and the stack all sit in that same on-die cache. We are probably not there for the next 20 years, at the current rate of capabilities doubling once every 3 years, unless there is a revolution that significantly accelerates the number of transistors that can be manufactured on a chip and breaks the current doubling law of once every 3 years. Such revolutions cannot be predicted or foreseen; they occur in one big boom, as a complete surprise, and they are very rare.


Edited By nec_000
Quote from captaincaveman:

Can one process access information that another process has cached?


I think I have finally gotten to the bottom of your question and understand it better. With the necessary caution, I think I see what may be confusing you: you come from the world of CPUs and tried to enter the world of GPUs, so I will explain this too:

Unlike a modern CPU, where the L3 cache allows the transfer of information **between the cores** of the processor, i.e. between, say, the first core and a fourth core, each of them running a different process, the on-die cache in a graphics accelerator is shared only among the GPU's cores, and it is not a channel for foreign processes to talk to each other, i.e. not for exchanging information between them. That is probably what you asked.


This is because the cache in a graphics accelerator sits at a place in the hierarchy that serves only the graphics accelerator's cores. And let us not confuse GPU cores with CPU cores: the graphics accelerator, at any given moment, no matter how many cores it has, can serve only a single process. The ability to partition the GPU resource so that it works in parallel on several applications simultaneously, with some of its cores handling one application and some another, has not yet been invented.


Therefore information/variables can move between different processes only at the hierarchy level that suits them: either through system RAM in the generic case, or through VRAM in the ideal case, if care was taken during development to do everything necessary so that it all happens and converges into VRAM operations. That is, all the mutually foreign processes sit together in VRAM, work in coordination with one another, and by the designer's intent exchange variables through VRAM.

Because if not, they will exchange them by way of system RAM.

Edited By nec_000

You keep saying that I am confused and then describe (at great length) exactly what I am saying... In short, the cache will not share the information it holds between different processes, even if it is the same information (from the same game, for example). In that case we will have to refill the cache every time, and that is where memory speed shows itself...


You are describing an imaginary scenario that I am not familiar with in practice.

So first, some definitions: within the same game there is no multiplicity of processes on the GPU. The game is one process that the GPU runs. The GPU runs a process of drawing the screen, and its output is the computer monitor.

The only way to force multiple processes onto a GPU, if we are talking only about games, is to run several games at once, and that is assuming the game type and the operating system permit it at all. Usually a game gets dedicated access and locks access to the GPU resource, because it is a real-time process.

If we run several games at the same time, we will experience a drastic drop in performance and stuttering in each of them, because the run is no longer real-time.


The issue of caching has nothing to do with how many applications the GPU is asked to serve, since, as explained, at any given time the accelerator runs only one of the games, and the cache is used for that specific run. Once the GPU moves on to handle the next game, the cache serves the next game, and so on round and round.

In any case, when one game talks to another, it happens at the CPU level in general and not at the graphics level, because no pixel information of the screen output, which is the product of a GPU, is transferred between the games.

Edited By nec_000

Reading comprehension...

This is running the same game multiple times, not different games.

And it is obviously clear that the GPU will process only one game's frame at a given moment and the operating system will juggle between the processes... I am trying to say that in such a case, the relative advantage that the very fast cache gives AMD is eliminated, and the slower memory of its cards causes a greater performance hit compared to the cards with GDDR6X.



The GPU runs the graphics drawn on the screen, not the game engine.

One game will transfer information to a second game at a non-graphical level, at the level of the application running on the CPU. This means the traffic goes through system RAM, by the CPU and the operating system. The GPU is not involved.

Let me try to explain from another direction: in the chain of work, the GPU sits at the very end of the chain. It only gets the end of the task, a tiny request: draw a picture. It puts the picture out as output to the screen, and that is the end of its work.

If there are several games running in parallel, then each in turn, in a distinct and separated way, will ask it to draw its picture for it. That and nothing more.

I will write this most simply and clearly: it is not the job of a GPU to transfer information between one application and another. Multiple processes exist only in a multitasking operating system, Linux and the like, which runs on a CPU. In the GPU's world only one application runs at a time, and it does not seek to talk to anyone; it only writes its output, the picture, aimed at the screen.


Edited By nec_000
