Key changes in cudaprof v2.2 with respect to v1.1:

1) Profiler counters are now also supported on the Windows Vista platform.

2) Support has been added to handle profiler data for multiple CUDA devices.
The session list now uses a tree display and devices are listed as children under each session.

3) The following new profiler counters are supported:
  a) gld_32b, gld_64b, gld_128b : Count 32-byte, 64-byte and 128-byte global memory loads.
     gst_32b, gst_64b, gst_128b : Count 32-byte, 64-byte and 128-byte global memory stores.
     These are available only for GPUs with compute capability 1.2 or higher.

  b) gld_req, gst_req : Count the number of global memory load and store requests.
     These are available only for GPUs with compute capability 1.2.

4) The summary table is enhanced to display global memory bandwidth values for each kernel. The overall application-level global memory bandwidth is also provided through the new menu option "Session->Global Memory Throughput". The global memory bandwidth calculation uses the new counter values gld_32b/64b/128b and gst_32b/64b/128b, and is available only for GPUs with compute capability 1.2 or higher.
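
As a rough illustration of how a bandwidth figure can be derived from these counters, the sketch below weights each transaction counter by its size in bytes and divides by the kernel's GPU time. The exact formula used by the profiler is not stated here, so this is an assumption; the function names and the microsecond time unit are hypothetical, and any scaling the profiler applies to raw counter values is ignored.

```python
# Assumed transaction sizes in bytes, taken from the counter names.
TRANSACTION_BYTES = {"32b": 32, "64b": 64, "128b": 128}


def global_memory_bytes(counters):
    """Sum bytes moved by global loads (gld_*) and stores (gst_*)."""
    total = 0
    for prefix in ("gld", "gst"):
        for suffix, nbytes in TRANSACTION_BYTES.items():
            # Missing counters are treated as zero.
            total += counters.get(f"{prefix}_{suffix}", 0) * nbytes
    return total


def global_memory_throughput_gbps(counters, gpu_time_us):
    """Return throughput in GB/s (1 GB = 1e9 bytes); gpu_time_us is in microseconds."""
    if gpu_time_us <= 0:
        raise ValueError("gpu_time_us must be positive")
    seconds = gpu_time_us * 1e-6
    return global_memory_bytes(counters) / seconds / 1e9
```

For example, a kernel with counters {"gld_32b": 10, "gld_64b": 5, "gld_128b": 2, "gst_32b": 1, "gst_128b": 3} moves 1312 bytes under these assumptions.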

5) The summary table is enhanced to display instruction bandwidth for each kernel. The instruction bandwidth is the ratio of the achieved instruction rate to the peak single-instruction issue rate. It uses the "instructions" profiler counter.
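
The ratio described above can be sketched as follows. The function name, the microsecond time unit and the peak_instr_per_us parameter are hypothetical; the peak issue rate depends on the device and has to be supplied by the caller.

```python
def instruction_throughput(instructions, gpu_time_us, peak_instr_per_us):
    """Ratio of the achieved instruction rate to an assumed peak issue rate.

    instructions      -- value of the "instructions" counter for the kernel
    gpu_time_us       -- kernel GPU time in microseconds
    peak_instr_per_us -- device peak issue rate (caller-supplied assumption)
    """
    if gpu_time_us <= 0 or peak_instr_per_us <= 0:
        raise ValueError("time and peak rate must be positive")
    achieved_rate = instructions / gpu_time_us
    return achieved_rate / peak_instr_per_us
```

A result close to 1.0 would indicate a kernel that is issuing instructions near the assumed peak rate.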

6) A new option "Session->Analyze Occupancy" is provided. It reports the details of the occupancy calculation for each kernel and the factor that prevents the maximum occupancy from being achieved.
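
To show the kind of calculation such an analysis involves, here is a simplified occupancy sketch. It is not the profiler's implementation: the per-multiprocessor limits in SmLimits are illustrative assumptions, resource allocation granularity is ignored, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    # Illustrative per-multiprocessor limits (assumed); check your device.
    max_warps: int = 32
    max_blocks: int = 8
    registers: int = 16384
    shared_mem_bytes: int = 16384
    warp_size: int = 32


def analyze_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                      limits=SmLimits()):
    """Return (occupancy, limiting_factor) for one launch configuration."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    # Round the block size up to a whole number of warps.
    warps_per_block = -(-threads_per_block // limits.warp_size)
    regs_per_block = regs_per_thread * warps_per_block * limits.warp_size
    # Number of resident blocks each resource would allow on its own.
    blocks_allowed = {
        "warps": limits.max_warps // warps_per_block,
        "blocks": limits.max_blocks,
        "registers": (limits.registers // regs_per_block
                      if regs_per_block else limits.max_blocks),
        "shared memory": (limits.shared_mem_bytes // smem_per_block
                          if smem_per_block else limits.max_blocks),
    }
    # The scarcest resource determines how many blocks are resident.
    limiting = min(blocks_allowed, key=blocks_allowed.get)
    active_warps = blocks_allowed[limiting] * warps_per_block
    return active_warps / limits.max_warps, limiting
```

With the assumed limits, analyze_occupancy(256, 32, 4096) returns an occupancy of 0.5 with "registers" as the limiting factor.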

7) The GPU Time Width Plot is enhanced to display data from multiple streams or from multiple devices. New options "Split On Stream" and "Split on Gpu" are provided in the "Session View Settings" dialog.

8) There is a new option "Timestamp Based Total" provided in the "Session View Settings" dialog for "Summary Plot". This uses the difference between the timestamp values of the last and first methods to compute the total GPU time, instead of using the sum of the GPU times for all methods. When this option is used, a new bar is displayed to show the "GPU Idle" time.
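
The difference between the two totals can be sketched as below. The record layout (timestamp, gputime) and the function name are hypothetical. The description only mentions the timestamp difference; adding the last method's own GPU time so that the span covers its execution is an assumption of this sketch.

```python
def timestamp_based_total(methods):
    """Compute (span, busy, idle) from (timestamp_us, gputime_us) tuples.

    span -- time from the first method's start to the last method's end
            (the last method's gputime is included by assumption)
    busy -- sum of the GPU times of all methods
    idle -- part of the span not covered by any method, never negative
    """
    if not methods:
        raise ValueError("at least one method record is required")
    ordered = sorted(methods)  # sort by timestamp
    first_ts, _ = ordered[0]
    last_ts, last_gpu = ordered[-1]
    span = (last_ts - first_ts) + last_gpu
    busy = sum(gpu for _, gpu in ordered)
    idle = max(span - busy, 0.0)
    return span, busy, idle
```

For three methods starting at 0, 25 and 40 microseconds with GPU times of 10, 5 and 10 microseconds, the span is 50, the busy time is 25 and the idle time is 25.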


9) Miscellaneous:

- Handling of large profiler data is improved.

- A busy cursor is now displayed for operations that can take a long time.

- The host machine configuration can be seen using the new menu option "Help->System Info".

- In the session settings dialog, all counter, timestamp, kernel and memory transfer options are enabled by default.

- The cputime and profiler counter columns are hidden by default in the summary table.

- CUDA Visual Profiler is now integrated into the CUDA Toolkit installer, and the version number has been changed from 1.1 directly to 2.2 to match the CUDA Toolkit version.



Key changes in cudaprof v1.1 with respect to v1.0:

1) Enhancements to profiler output:

a) New columns added for kernel methods:
  - Size of grid of blocks (grid size X, grid size Y)
  - Size of a thread block (block size X, block size Y, block size Z)
  - Register count per thread 
  - Static shared memory size per block
  - Dynamic shared memory size per block
  - StreamID of kernel launched 

b) New columns added for memcopy methods:
  - number of bytes 
  - direction of transfer (host to device or device to host) 
  - cputime 

2) New view options added:

a) Comparison summary plot : This plot can be used to compare summary profiling data for two sessions.

b) Kernel table : This lists the number of calls, grid size, block size, shared memory size per block and register count per thread for each kernel.

c) Memcopy table : This lists the number of calls, memory transfer size in bytes and memory transfer direction for each memcopy.

3) cudaprof now detects whether a CUDA-capable device is available on the system. If no CUDA device is found, the following message is displayed:
   "Unable to load cuda library. CUDA Visual Profiler device features will be disabled."
and certain options, such as the Profile menu options, are disabled.
In addition, certain options are enabled or disabled based on the device type. A new option "Profile->Device Properties" is provided to display CUDA device properties.

4) The cputime value displayed is adjusted based on whether kernel execution is asynchronous or not. 

5) The summary table has a new method display option. The user can choose between "base name", "base name with suffix" and "full name". The "base name" option is useful for combining data for different template-based kernel methods that have the same name.

6) Width plot:

a) The display with cputime enabled is changed: cputime is now shown as a separate bar below gputime.

b) A new option is added to use occupancy as the bar height in the width plot.

7) Added a height zoom option for the height plot.

8) Common improvements to plots:

a) Added a title for each plot.

b) Added an option to display plot configuration options.


9) The error reporting during program execution is changed to help identify the specific cause of an error.

10) The cudaprof user document (previously a README file) has been converted to HTML format (cudaprof.html) and can also be viewed using the "Help->Cuda Visual Profiler Help" menu option or the <F1> function key. This new help option is currently supported only on Windows.


11) The format of the cudaprof .cpj project files is changed from plain text to XML. The information for each session which was earlier in separate .csn files is now part of the .cpj file. The new format is used for any new projects created or when existing projects are updated. Existing projects in the old format can also be opened.

12) CUDA device names for all available CUDA devices are now saved for each session and they are shown in Session Properties.

13) Added the menu option "File->Delete" to delete a CUDA profiler project.

14) In the Windows version, the Microsoft Visual C++ libraries are no longer included in the cudaprof ZIP file. If you do not have Microsoft Visual C++ 2005 SP1 installed, you will need to download and install the Microsoft Visual C++ 2005 SP1 Redistributable Package.
