


@A-006 commented Aug 4, 2025

What's changed?

  • Remove redundant copies in recip2real and real2recip, improving the performance of the plane-wave basis by approximately 5%.
  • In CPU calculations with nspin=4, the veff_op operator makes poor use of the cache. Rearranging the layout of its input vector significantly improves performance.
  • Decrease the memory usage of the PW basis when nspin = 1 or 2.
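The nspin=4 change is described only in outline. The sketch below shows one plausible reading of "rearranging the input vector structure", under stated assumptions: the effective potential is stored as four real arrays (v0, vx, vy, vz) forming V(r) = v0·I + vx·σx + vy·σy + vz·σz at each grid point, and the spinor is stored either blocked (all spin-up values, then all spin-down) or interleaved (up and down of each grid point adjacent). The names Veff4, apply_pauli, apply_blocked, apply_interleaved and to_interleaved are hypothetical and not ABACUS's veff_op API; the PR's actual rearrangement may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical potential storage: four real arrays of length n, so that at each
// grid point V = v0*I + vx*sigma_x + vy*sigma_y + vz*sigma_z (a 2x2 Hermitian matrix).
struct Veff4 {
    std::vector<double> v0, vx, vy, vz;
};

// Apply [[v0+vz, vx-i*vy], [vx+i*vy, v0-vz]] to one spinor (up, dn) in place.
inline void apply_pauli(double v0, double vx, double vy, double vz, cplx& up, cplx& dn) {
    const cplx I(0.0, 1.0);
    const cplx u = up;  // copy inputs before overwriting
    const cplx d = dn;
    up = (v0 + vz) * u + (vx - I * vy) * d;
    dn = (vx + I * vy) * u + (v0 - vz) * d;
}

// Blocked layout: psi = [up_0 .. up_{n-1}, dn_0 .. dn_{n-1}].
// The two components of grid point ir are n elements apart in memory.
void apply_blocked(const Veff4& v, std::vector<cplx>& psi, std::size_t n) {
    assert(psi.size() >= 2 * n);
    for (std::size_t ir = 0; ir < n; ++ir) {
        apply_pauli(v.v0[ir], v.vx[ir], v.vy[ir], v.vz[ir], psi[ir], psi[ir + n]);
    }
}

// Interleaved layout: psi = [up_0, dn_0, up_1, dn_1, ...].
// The two components of grid point ir are adjacent in memory.
void apply_interleaved(const Veff4& v, std::vector<cplx>& psi, std::size_t n) {
    assert(psi.size() >= 2 * n);
    for (std::size_t ir = 0; ir < n; ++ir) {
        apply_pauli(v.v0[ir], v.vx[ir], v.vy[ir], v.vz[ir], psi[2 * ir], psi[2 * ir + 1]);
    }
}

// Convert a blocked spinor array of n grid points into the interleaved layout.
std::vector<cplx> to_interleaved(const std::vector<cplx>& blocked, std::size_t n) {
    std::vector<cplx> out(2 * n);
    for (std::size_t ir = 0; ir < n; ++ir) {
        out[2 * ir] = blocked[ir];
        out[2 * ir + 1] = blocked[ir + n];
    }
    return out;
}
```

Both loops compute the same result. The interleaved version touches one contiguous pair of values per grid point instead of two streams n elements apart, which is the kind of locality improvement the bullet refers to; any real speed-up depends on grid size and hardware and would need to be measured.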

@mohanchen added the Performance and Refactor labels Aug 6, 2025
@@ -16,13 +16,13 @@ template <>
void FFT_CUDA<float>::setupFFT()
{
cufftPlan3d(&c_handle, this->nx, this->ny, this->nz, CUFFT_C2C);
-resmem_cd_op()(this->c_auxr_3d, this->nx * this->ny * this->nz);
+resmem_cd_op()(this->c_auxr_3d, 2*this->nx * this->ny * this->nz);
Collaborator

Why is the factor of 2 added here?

}
template <>
void FFT_CUDA<double>::setupFFT()
{
cufftPlan3d(&z_handle, this->nx, this->ny, this->nz, CUFFT_Z2Z);
-resmem_zd_op()(this->z_auxr_3d, this->nx * this->ny * this->nz);
+resmem_zd_op()(this->z_auxr_3d, 2*this->nx * this->ny * this->nz);
Collaborator

Why is the factor of 2 added here?

Collaborator Author

The error has been fixed.
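The review question concerns the size of the 3D auxiliary buffer. A C2C/Z2Z transform over an nx × ny × nz grid operates on nx·ny·nz complex values, so, assuming the size argument of resmem_cd_op/resmem_zd_op is an element count (as the original nx * ny * nz argument suggests), the extra factor of 2 simply doubles the allocation. The quick arithmetic below is a sketch; aux_buffer_bytes is a hypothetical helper, not part of ABACUS.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: bytes needed by a dense nx*ny*nz grid of complex<T> values,
// multiplied by `factor` (1 for the original allocation, 2 for the modified one).
template <typename T>
std::size_t aux_buffer_bytes(std::size_t nx, std::size_t ny, std::size_t nz,
                             std::size_t factor = 1) {
    static_assert(sizeof(std::complex<T>) == 2 * sizeof(T),
                  "complex<T> is expected to hold exactly two T values");
    return factor * nx * ny * nz * sizeof(std::complex<T>);
}
```

For a 64 × 64 × 64 grid in double precision this gives 4 MiB for the original size and 8 MiB with the factor of 2 (half of each in single precision), memory that an in-place C2C transform of that grid does not need.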

@mohanchen
Collaborator

After discussion with @A-006, I cannot accept this modification, which introduces too many new defects. We have decided to close this PR.

@mohanchen closed this Aug 29, 2025
2 participants