Intel Community › Software › Software Development SDKs and Libraries › Intel® oneAPI Math Kernel Library

Popov__Maxim

Beginner

08-23-2021
08:17 AM

MKL pardiso performance problem if run on heavily used memory heap

MKL 2021.3 (+ TBB 2021.3)

Windows 10, Visual Studio 2017

2 x Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (20 cores total)

192 GB RAM

I have a relatively complex performance issue (or possibly several) with MKL pardiso.

Please find attached a Visual Studio project that reproduces the bug(s).

It is a purely synthetic example, but we have a similar problem (and even more) in our commercial product.

The test runs the same task 8 times. After the 4th run we create some "garbage" in memory using malloc and free (allocating 7.6 GB and freeing some of it to create heap fragmentation).

As you can see from the protocols below, mkl_2018.0 works fine (no issues). But mkl_2021.3 (I also tested 2020.1 with the same result) has a couple of problems:

**First**: the solution phase is about 3 times slower (comparing mkl_2018.0 and mkl_2021.3).

**Second**: factorization is more than 10 times slower after we create "garbage" in memory (in the 2021.3 protocol, roughly 11x for symbolic and roughly 50x for numerical factorization).

Our commercial code also has the same problem with solution time (it slows down about 5 times when run on a heavily used heap), but I can't reproduce that in the test.
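The attached project is not reproduced in this thread, so the exact allocation pattern behind the 7.6 GB of "garbage" is unknown. Below is a minimal sketch of one way to produce that kind of heap fragmentation, assuming the pattern "allocate many equally sized blocks, then free every other one". `make_heap_garbage` and `free_heap_garbage` are hypothetical names, not part of the original test.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates `count` (> 0) blocks of `block_size` bytes,
 * fills each one so its pages are actually committed, then frees every other
 * block. The surviving blocks are returned (compacted to the front of the
 * array, their number stored in *kept) so the caller can keep them alive
 * while the solver runs; the freed holes between them fragment the heap.
 * Returns NULL on allocation failure, after releasing everything. */
void **make_heap_garbage(size_t count, size_t block_size, size_t *kept)
{
    void **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL)
        return NULL;

    for (size_t i = 0; i < count; ++i) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL) {
            for (size_t j = 0; j < i; ++j)
                free(blocks[j]);
            free(blocks);
            return NULL;
        }
        memset(blocks[i], 0xAB, block_size); /* commit the pages */
    }

    /* Free the odd-indexed blocks and move the survivors to the front. */
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i % 2 == 1)
            free(blocks[i]);
        else
            blocks[n++] = blocks[i];
    }
    *kept = n;
    return blocks;
}

/* Releases the blocks returned by make_heap_garbage(). */
void free_heap_garbage(void **blocks, size_t kept)
{
    for (size_t i = 0; i < kept; ++i)
        free(blocks[i]);
    free(blocks);
}
```

For the scenario in this thread, `count * block_size` would be chosen to total about 7.6 GB (which requires a 64-bit build), and `free_heap_garbage` would only be called after the remaining runs have finished.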

**Protocols**

**mkl 2018.0**

*** Symbolic factorization = 0.521841

*** Numerical factorization = 0.003823

*** Solution = 2.82555

*** Symbolic factorization = 0.0160846

*** Numerical factorization = 0.003187

*** Solution = 2.84516

*** Symbolic factorization = 0.0159267

*** Numerical factorization = 0.0032141

*** Solution = 2.86703

*** Symbolic factorization = 0.015718

*** Numerical factorization = 0.0037508

*** Solution = 2.85035

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0148403

*** Numerical factorization = 0.002934

*** Solution = 2.81944

*** Symbolic factorization = 0.0145776

*** Numerical factorization = 0.0030027

*** Solution = 2.82286

*** Symbolic factorization = 0.0142837

*** Numerical factorization = 0.0030718

*** Solution = 2.84451

*** Symbolic factorization = 0.0138617

*** Numerical factorization = 0.0030959

*** Solution = 2.82229

**mkl 2021.3**

*** Symbolic factorization = 0.158939

*** Numerical factorization = 0.0044622

*** Solution = 8.59468

*** Symbolic factorization = 0.0150729

*** Numerical factorization = 0.0027243

*** Solution = 8.78183

*** Symbolic factorization = 0.0148563

*** Numerical factorization = 0.0026545

*** Solution = 8.57554

*** Symbolic factorization = 0.0149359

*** Numerical factorization = 0.0027301

*** Solution = 8.85421

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.166303

*** Numerical factorization = 0.131799

*** Solution = 8.84035

*** Symbolic factorization = 0.168182

*** Numerical factorization = 0.134787

*** Solution = 8.64635

*** Symbolic factorization = 0.189809

*** Numerical factorization = 0.131606

*** Solution = 8.61737

*** Symbolic factorization = 0.165271

*** Numerical factorization = 0.134852

*** Solution = 8.61592
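To put numbers on the two problems, the protocols above can be averaged. This is a small sketch under one stated assumption: the first run of each series is treated as a warm-up and excluded from the "before" factorization averages, while the solution averages use all four runs before the garbage step. `mean`, `slowdown` and the `*_slowdown` functions are illustrative names; the timings are copied from the protocols.

```c
#include <stddef.h>

/* Arithmetic mean of n (> 0) timings in seconds. */
static double mean(const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += t[i];
    return sum / (double)n;
}

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Solution phase, the four runs before the garbage step. */
static const double sol_2018[] = {2.82555, 2.84516, 2.86703, 2.85035};
static const double sol_2021[] = {8.59468, 8.78183, 8.57554, 8.85421};

/* MKL 2021.3 factorization: runs 2-4 (run 1 assumed to be warm-up)
 * versus runs 5-8, after the garbage step. */
static const double sym_before[] = {0.0150729, 0.0148563, 0.0149359};
static const double sym_after[]  = {0.166303, 0.168182, 0.189809, 0.165271};
static const double num_before[] = {0.0027243, 0.0026545, 0.0027301};
static const double num_after[]  = {0.131799, 0.134787, 0.131606, 0.134852};

/* Ratio of mean times: how many times slower `slow` is than `fast`. */
static double slowdown(const double *slow, size_t ns, const double *fast, size_t nf)
{
    return mean(slow, ns) / mean(fast, nf);
}

double solution_slowdown(void) { return slowdown(sol_2021, COUNT(sol_2021), sol_2018, COUNT(sol_2018)); }
double symbolic_slowdown(void) { return slowdown(sym_after, COUNT(sym_after), sym_before, COUNT(sym_before)); }
double numeric_slowdown(void)  { return slowdown(num_after, COUNT(num_after), num_before, COUNT(num_before)); }
```

With these values the ratios come out at roughly 3.1x for the solution phase (2021.3 versus 2018.0) and, for 2021.3 after the garbage step, about 11.5x for symbolic and about 49x for numerical factorization.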

Gennady_F_Intel

Moderator

08-23-2021
10:36 AM

Do you see this regression with the OpenMP runtime version of MKL Pardiso as well?

Popov__Maxim

Beginner

08-23-2021
11:27 AM

I haven't checked the OpenMP runtime version of MKL (and I don't really know how to do that).

We don't use OpenMP in our product anymore, so we are not interested in the OpenMP version of MKL.
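For reference, one way to switch to the OpenMP threading layer without rebuilding, assuming the application links against MKL's single dynamic library (`mkl_rt`), is to set `MKL_THREADING_LAYER=INTEL` before the first MKL call or to call `mkl_set_threading_layer(MKL_THREADING_INTEL)`. With explicit linking, the usual equivalent is linking `mkl_intel_thread` plus `libiomp5md` instead of `mkl_tbb_thread`. Setting the variable in the shell before launching the program is the most reliable route; the sketch below does it from code. `HAVE_MKL`, `set_env_var` and `select_openmp_layer` are hypothetical names.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L /* declares setenv() in strict ISO C modes */
#endif
#include <stdlib.h>

#ifdef HAVE_MKL
#include <mkl.h> /* mkl_set_threading_layer(), MKL_THREADING_INTEL */
#endif

/* Sets an environment variable for the current process; returns 0 on success. */
int set_env_var(const char *name, const char *value)
{
#ifdef _WIN32
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, 1) == 0 ? 0 : -1;
#endif
}

/* Requests the Intel OpenMP threading layer. Only meaningful with mkl_rt,
 * and only if it runs before the first MKL call in the process.
 * Returns 0 if the environment variable was set. */
int select_openmp_layer(void)
{
#ifdef HAVE_MKL
    mkl_set_threading_layer(MKL_THREADING_INTEL); /* API route (mkl_rt only) */
#endif
    return set_env_var("MKL_THREADING_LAYER", "INTEL"); /* env-var route */
}
```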

Gennady_F_Intel

Moderator

08-24-2021
02:59 AM

The reported behavior has not been reproduced on Linux (RH7) with the AVX2 and AVX-512 code paths.

Here are the logs I see with MKL versions 2018.1 and 2021.3, respectively. I only added a call to the mkl_get_version() function to report MKL's version info:

**MKL v.2018.0.1**

Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0398407

*** Numerical factorization = 0.104073

*** Solution = 3.41783

*** Symbolic factorization = 0.019124

*** Numerical factorization = 0.00386271

*** Solution = 3.38974

*** Symbolic factorization = 0.0185753

*** Numerical factorization = 0.0031121

*** Solution = 3.37459

*** Symbolic factorization = 0.0183605

*** Numerical factorization = 0.0051151

*** Solution = 3.39523

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0195702

*** Numerical factorization = 0.00301139

*** Solution = 3.44974

*** Symbolic factorization = 0.018768

*** Numerical factorization = 0.00286272

*** Solution = 3.44151

*** Symbolic factorization = 0.0177862

*** Numerical factorization = 0.00294998

*** Solution = 3.44731

*** Symbolic factorization = 0.0173954

*** Numerical factorization = 0.00292216

*** Solution = 3.4427

/****************************************************/

**MKL v.2021.0.3**

Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0315676

*** Numerical factorization = 0.0498327

*** Solution = 3.36807

*** Symbolic factorization = 0.0182586

*** Numerical factorization = 0.00408213

*** Solution = 3.37433

*** Symbolic factorization = 0.0171186

*** Numerical factorization = 0.00311218

*** Solution = 3.40193

*** Symbolic factorization = 0.0181005

*** Numerical factorization = 0.00289025

*** Solution = 3.38738

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0198418

*** Numerical factorization = 0.00326361

*** Solution = 3.44586

*** Symbolic factorization = 0.0174867

*** Numerical factorization = 0.00287047

*** Solution = 3.30897

*** Symbolic factorization = 0.016585

*** Numerical factorization = 0.00287961

*** Solution = 3.39236

*** Symbolic factorization = 0.0178846

*** Numerical factorization = 0.00268261

*** Solution = 3.43325

The AVX-512 results are very similar.
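The logs in this reply report the version through `mkl_get_version()`. A small sketch of such a version banner, assuming a `HAVE_MKL` build macro. `format_mkl_version` and `print_mkl_version` are hypothetical helpers, while `mkl_get_version()` and the `MKLVersion` fields (`MajorVersion`, `MinorVersion`, `UpdateVersion`, `Processor`) are MKL's own. The banner is formatted in the same form as the log headers above.

```c
#include <stdio.h>

#ifdef HAVE_MKL
#include <mkl.h> /* MKLVersion, mkl_get_version() */
#endif

/* Formats "MKL v.<major>.<minor>.<update>" into buf; returns snprintf's result. */
int format_mkl_version(char *buf, size_t len, int major, int minor, int update)
{
    return snprintf(buf, len, "MKL v.%d.%d.%d", major, minor, update);
}

#ifdef HAVE_MKL
/* Prints the library version and the code path MKL dispatched to. */
void print_mkl_version(void)
{
    MKLVersion v;
    char line[64];
    mkl_get_version(&v);
    format_mkl_version(line, sizeof line, v.MajorVersion, v.MinorVersion, v.UpdateVersion);
    printf("%s\nProcessor optimization: %s\n", line, v.Processor);
}
#endif
```

MKL 2021.3 reports its version as major 2021, minor 0, update 3, which is why the header reads "2021.0.3".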

Popov__Maxim

Beginner

08-24-2021
07:14 AM

Thank you for checking it on Linux!

Most probably it's a Windows-specific (or even Windows 10-specific) problem.

It looks like memory allocation on Windows slows down when relatively large blocks are allocated.

MKL has its own memory pool (according to the documentation), but it didn't help in this case. My guess is that pardiso does not use MKL's memory pool for all of its allocations, which leads to the slowdown on Windows.
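One way to check this hypothesis independently of pardiso is to time large allocations directly: once on a fresh heap, and once after the heap has been fragmented. Below is a rough sketch. `time_large_alloc` is a hypothetical name, and the 4096-byte page size is an assumption. `timespec_get` (C11) measures wall-clock time, so the numbers are only indicative.

```c
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; not monotonic). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocates a block of `bytes` bytes, writes one byte per (assumed 4 KiB) page
 * so the memory is actually committed, and frees it again, `reps` times.
 * Returns the average seconds per iteration, or -1.0 on bad input or
 * allocation failure. */
double time_large_alloc(size_t bytes, int reps)
{
    if (bytes == 0 || reps <= 0)
        return -1.0;
    double start = now_seconds();
    for (int i = 0; i < reps; ++i) {
        volatile char *p = malloc(bytes);
        if (p == NULL)
            return -1.0;
        for (size_t off = 0; off < bytes; off += 4096)
            p[off] = 1; /* volatile stores keep the touch loop from being optimized away */
        free((void *)p);
    }
    return (now_seconds() - start) / (double)reps;
}
```

Calling it with, say, 64 MB blocks before and after fragmenting the heap and comparing the two averages would indicate whether large allocations on their own account for a noticeable part of the slowdown.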

Kirill_V_Intel

Employee

08-24-2021
03:26 PM

Hi Maxim!

As a quick check while we are trying to reproduce the issue: could you set the environment variable MKL_DISABLE_FAST_MM=1 before running the test and see whether the behavior changes?

Thanks,

Kirill
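Besides setting `MKL_DISABLE_FAST_MM=1` in the shell before starting the test, MKL also documents a service function, `mkl_disable_fast_mm()`, that turns off its memory manager from code when called before any other MKL function. Below is a sketch assuming a `HAVE_MKL` build macro. `disable_mkl_fast_mm` and `set_env_var` are hypothetical helpers.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L /* declares setenv() in strict ISO C modes */
#endif
#include <stdlib.h>

#ifdef HAVE_MKL
#include <mkl.h> /* declares mkl_disable_fast_mm() */
#endif

/* Sets an environment variable for the current process; returns 0 on success. */
int set_env_var(const char *name, const char *value)
{
#ifdef _WIN32
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, 1) == 0 ? 0 : -1;
#endif
}

/* Turns off MKL's fast memory manager. Call it at the very start of main(),
 * before any other MKL function. The environment variable is set as well,
 * so child processes started later from this process inherit the setting. */
int disable_mkl_fast_mm(void)
{
#ifdef HAVE_MKL
    mkl_disable_fast_mm();
#endif
    return set_env_var("MKL_DISABLE_FAST_MM", "1");
}
```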

Popov__Maxim

Beginner

08-24-2021
09:22 PM

Hi Kirill!

Yes, MKL_DISABLE_FAST_MM=1 significantly degrades performance:

*** Symbolic factorization = 0.122878

*** Numerical factorization = 0.411002

*** Solution = 8.86472

*** Symbolic factorization = 0.0150951

*** Numerical factorization = 0.403717

*** Solution = 8.87216

*** Symbolic factorization = 0.0146924

*** Numerical factorization = 0.3977

*** Solution = 8.87921

*** Symbolic factorization = 0.0148401

*** Numerical factorization = 0.397932

*** Solution = 8.83029

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.190399

*** Numerical factorization = 0.479833

*** Solution = 8.83713

*** Symbolic factorization = 0.169229

*** Numerical factorization = 0.5047

*** Solution = 8.83924

*** Symbolic factorization = 0.176403

*** Numerical factorization = 0.482541

*** Solution = 8.83819

*** Symbolic factorization = 0.179341

*** Numerical factorization = 0.532943

*** Solution = 8.86166

Regards,

Maxim

Kirill_V_Intel

Employee

08-25-2021
08:18 AM

Thanks for the experiment!

If the times with the fast memory manager disabled had stayed stable (though, of course, slower) before and after the garbage allocation, that would have been a great hint for us. Alas, as I see it, the times go up after the garbage allocations as well, so we can't be sure that the fast memory manager is involved in the original issue.

Thanks for trying.

Best,

Kirill

Popov__Maxim

Beginner

09-22-2021
11:45 AM

Hello!

Are there any updates on the issue?

In the meantime, we redefined MKL's function pointers i_malloc, i_calloc, i_realloc and i_free to point to our own memory-pool allocation functions. After that, the problems seem to disappear, but we consider this a temporary workaround.

Regards,

Maksim
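MKL's documentation describes this mechanism: include `i_malloc.h` and assign the four pointers before the first MKL call. The thread does not show the actual memory-pool functions, so the following is only a minimal sketch of a caching allocator with malloc-compatible signatures, assuming a `HAVE_MKL` build macro. `pool_malloc`, `pool_calloc`, `pool_realloc`, `pool_free` and `install_mkl_pool` are hypothetical names. Freed blocks are parked in a small table and handed out again for requests of similar size. A spinlock guards the table because MKL may allocate from several TBB threads at once.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_MKL
#include "i_malloc.h" /* declares MKL's i_malloc, i_calloc, i_realloc, i_free */
#endif

/* Header in front of every block: stores the block's capacity; the union
 * keeps the user pointer aligned at least like long double / void *. */
typedef union {
    size_t capacity;
    long double align_ld;
    void *align_ptr;
} header_t;

#define CACHE_SLOTS 64
static header_t *cache[CACHE_SLOTS]; /* parked (freed) blocks */
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;

static void lock_cache(void)
{
    while (atomic_flag_test_and_set_explicit(&cache_lock, memory_order_acquire))
        ; /* spin */
}

static void unlock_cache(void)
{
    atomic_flag_clear_explicit(&cache_lock, memory_order_release);
}

void *pool_malloc(size_t size)
{
    /* Reuse a parked block that fits and wastes at most `size` bytes. */
    lock_cache();
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        header_t *h = cache[i];
        if (h != NULL && h->capacity >= size && h->capacity - size <= size) {
            cache[i] = NULL;
            unlock_cache();
            return h + 1;
        }
    }
    unlock_cache();

    if (size > SIZE_MAX - sizeof(header_t))
        return NULL;
    header_t *h = malloc(sizeof(header_t) + size);
    if (h == NULL)
        return NULL;
    h->capacity = size;
    return h + 1;
}

void pool_free(void *ptr)
{
    if (ptr == NULL)
        return;
    header_t *h = (header_t *)ptr - 1;
    lock_cache();
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        if (cache[i] == NULL) {
            cache[i] = h; /* park the block for reuse */
            unlock_cache();
            return;
        }
    }
    unlock_cache();
    free(h); /* table full: return it to the CRT heap */
}

void *pool_calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL; /* count * size would overflow */
    void *p = pool_malloc(count * size);
    if (p != NULL)
        memset(p, 0, count * size);
    return p;
}

void *pool_realloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return pool_malloc(size);
    header_t *h = (header_t *)ptr - 1;
    if (h->capacity >= size)
        return ptr; /* current block is already big enough */
    void *p = pool_malloc(size);
    if (p == NULL)
        return NULL; /* like realloc, the old block stays valid */
    memcpy(p, ptr, h->capacity);
    pool_free(ptr);
    return p;
}

#ifdef HAVE_MKL
/* Must run before the first MKL call in the process. */
void install_mkl_pool(void)
{
    i_malloc = pool_malloc;
    i_calloc = pool_calloc;
    i_realloc = pool_realloc;
    i_free = pool_free;
}
#endif
```

Parked blocks are only released when the table overflows, so this trades memory for fewer trips to the system heap. A production pool would more likely use size classes and per-thread caches instead of one global spinlock.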
