Email search scenario tests of using dotnet spark as reducers #205

ruihmicrosoft · 2019-08-07T18:58:39Z

This commit adds two unit tests that mimic the reducers used in email search scenarios today. They are typical examples because together they cover the two most common data transformations: taking N rows and outputting 1 row, and taking N rows and outputting M rows. We use the two unit tests to demonstrate the feasibility of migrating the existing reducers used by email search to HDI. Note that this is not yet a performance test.

The first reducer computes the top N contacts per user. It first groups by user id and then collapses each group into one row by concatenating the records. A UDF is used to apply some string operations.
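To make the N-rows-to-1-row shape concrete, here is a minimal plain-Python sketch (no Spark) of what such a reducer computes. It is not the code from this PR: the input schema `(user_id, contact, interaction_count)`, ranking by count, the `;` separator, and the trim-plus-lowercase stand-in for the UDF's string operations are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input row: (user_id, contact, interaction_count).
Row = Tuple[str, str, int]


def normalize_contact(contact: str) -> str:
    # Stand-in for the UDF's string operations (assumed: trim + lowercase).
    return contact.strip().lower()


def top_n_contacts(rows: Iterable[Row], n: int, sep: str = ";") -> Dict[str, str]:
    """N rows in, 1 row out per user."""
    # Group step: collect every (contact, count) pair under its user id.
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for user_id, contact, count in rows:
        groups[user_id].append((normalize_contact(contact), count))

    # Collapse step: keep the n highest-count contacts and join them.
    result: Dict[str, str] = {}
    for user_id, contacts in groups.items():
        # Highest count first; ties broken by contact name so output is stable.
        ranked = sorted(contacts, key=lambda c: (-c[1], c[0]))
        result[user_id] = sep.join(name for name, _ in ranked[:n])
    return result


if __name__ == "__main__":
    sample = [
        ("u1", " Alice@x.com", 5),
        ("u1", "bob@x.com", 9),
        ("u1", "carol@x.com", 1),
        ("u2", "Dave@x.com", 3),
    ]
    print(top_n_contacts(sample, 2))
    # {'u1': 'bob@x.com;alice@x.com', 'u2': 'dave@x.com'}
```

In Spark the same shape would typically be a group-by on the user id followed by an aggregation such as `collect_list`, with a UDF applied to the collected values.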
The second reducer takes N rows and outputs M rows. Given a successful search query with flattened folder and item lists, it explodes the input DataFrame by pairing the corresponding folder and item ids. If the folder and item lists contain multiple records, the reducer generates multiple rows as output.
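Similarly, a minimal plain-Python sketch of the N-rows-to-M-rows explode step, again not taken from the PR. It assumes each input row is `(query_id, succeeded, folder_ids, item_ids)`, with both lists flattened into comma-separated strings and paired by position; those names, the separator, and the pairing rule are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical input row: (query_id, succeeded, folder_ids, item_ids), where the
# folder and item lists are flattened into comma-separated strings.
InRow = Tuple[str, bool, str, str]
# Output row: (query_id, folder_id, item_id).
OutRow = Tuple[str, str, str]


def explode_folder_items(rows: Iterable[InRow], sep: str = ",") -> List[OutRow]:
    """N rows in, M rows out: one output row per (folder, item) pair."""
    out: List[OutRow] = []
    for query_id, succeeded, folders, items in rows:
        if not succeeded:
            continue  # only successful queries are exploded
        folder_ids = [f for f in folders.split(sep) if f]
        item_ids = [i for i in items.split(sep) if i]
        # Pair ids by position; zip stops at the shorter list.
        for folder_id, item_id in zip(folder_ids, item_ids):
            out.append((query_id, folder_id, item_id))
    return out


if __name__ == "__main__":
    sample = [
        ("q1", True, "f1,f2", "i1,i2"),
        ("q2", False, "f3", "i3"),
        ("q3", True, "f4", "i4"),
    ]
    # Prints three rows: q1/f1/i1, q1/f2/i2, q3/f4/i4.
    for row in explode_folder_items(sample):
        print(row)
```

Pairing by position with `zip` silently drops unmatched ids when the two lists differ in length; a production reducer might treat that as bad input instead. In Spark SQL, splitting the strings into arrays and then applying `arrays_zip` followed by `explode` expresses the same pairing.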


dnfclas · 2019-08-07T18:58:53Z

All CLA requirements met.

imback82 · 2019-11-16T18:14:47Z

@ruihmicrosoft, we have been adding examples with some documentation under the example folder: #319, #320. If you want to push this PR forward, can you please convert it to an example rather than an E2E test? Thanks.

ruihmicrosoft changed the title ~~This commit adds two unit tests which mimic the reducers used in emai…~~ Email search scenario tests of using dotnet spark as reducers Aug 7, 2019

Merge branch 'master' into master

59b80a8

Base automatically changed from master to main March 18, 2021 16:48
