With normal functions, fanning out can be done by having the function send multiple messages to a queue. However, fanning back in is much more challenging. You'd have to write code to track when the queue-triggered functions end and store function outputs. The Durable Functions extension handles this pattern with relatively simple code:
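As a rough illustration (not this sample's exact code), a fan-out/fan-in orchestrator written against the Durable Functions 2.x C# API can look like the sketch below; the activity names `F1`, `F2`, and `F3` are placeholders:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class FanOutFanInSketch
{
    [FunctionName("FanOutFanIn")]
    public static async Task Run([OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Fan out: fetch a batch of work items, then schedule one activity per item in parallel
        var workBatch = await context.CallActivityAsync<string[]>("F1", null);
        var parallelTasks = new List<Task<int>>();
        foreach (var item in workBatch)
        {
            parallelTasks.Add(context.CallActivityAsync<int>("F2", item));
        }

        // Fan in: wait for every activity, then aggregate the outputs in a final activity
        await Task.WhenAll(parallelTasks);
        var sum = parallelTasks.Sum(t => t.Result);
        await context.CallActivityAsync("F3", sum);
    }
}
```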
## 1. Serverless MapReduce on Azure ##
<p align="center">
<img src="./images/MapReduceArchitecture.png" />
</p>

The above diagram shows the architecture a MapReduce implementation generally follows.
You'll first notice there are two projects in the solution. One is a Function v2 project; this is where all our MapReduce logic lies. Let's have a look.

##### StartAsync
This method is the entry point to our Durable Orchestration. It's triggered by an HTTP request to the Function containing a single `path` query parameter, which specifies the URL of the blob storage location holding the files to process. Example:

```
POST /api/StartAsync?code=Pd459lsir2CILjc8jRAkO6TLy3pasuBDikYZMZRKAjaTgjh00OW2wg==&path=https://mystorage.blob.core.windows.net/newyorkcitytaxidata/2017/yellow_tripdata_2017 HTTP/1.1
Host: myfunction.azurewebsites.net
Cache-Control: no-cache
```

Note the format of the `path` variable. It's not only used to denote the container in which to look, but also the prefix to use when searching for the files. You can, therefore, get as arbitrary or specific as you want but *if you have the files in a subfolder you **must** specify it*.
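For instance, with the `path` value from the request above, the sample would interpret the URL roughly as follows (shown here purely for illustration):

```
path      = https://mystorage.blob.core.windows.net/newyorkcitytaxidata/2017/yellow_tripdata_2017
container = newyorkcitytaxidata
prefix    = 2017/yellow_tripdata_2017
```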
Once `StartAsync` is kicked off, it parses the container name and prefix out of the `path` parameter and starts a new orchestration with `BeginMapReduce` as the entry point.
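A minimal sketch of what such a starter function can look like, assuming the Durable Functions 2.x client API; apart from `StartAsync` and `BeginMapReduce`, the names, query-string parsing, and orchestration input shape are illustrative rather than the sample's actual code:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StartAsyncSketch
{
    [FunctionName("StartAsync")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        // 'path' points at the blob container (and optional prefix) holding the input files
        var path = System.Web.HttpUtility.ParseQueryString(req.RequestUri.Query)["path"];
        var blobUri = new Uri(path);

        // First segment after the account host is the container; everything after it is the blob prefix
        var containerName = blobUri.Segments[1].TrimEnd('/');
        var prefix = string.Concat(blobUri.Segments.Skip(2));

        // Start the orchestration and hand the standard status-query payload back to the caller
        var instanceId = await starter.StartNewAsync("BeginMapReduce", new { containerName, prefix });
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}
```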
##### BeginMapReduce

This is the *actual* orchestrator for the entire process. First, we retrieve all the blobs from the storage container which match the prefix, using the Activity function `GetFileListAsync`. We must do this as the queries to Blob Storage are asynchronous and therefore [cannot live inside an orchestrator function](https://docs.microsoft.com/en-us/azure/azure-functions/durable-functions-checkpointing-and-replay#orchestrator-code-constraints).
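In the orchestrator, that call might look roughly like the line below; `input` stands in for whatever object carries the container name and prefix, and the `string[]` return type is an assumption:

```csharp
// The activity does the (async) Blob Storage listing on the orchestrator's behalf
string[] files = await context.CallActivityAsync<string[]>("GetFileListAsync", input);
```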
After getting the list of files to include, it then spins up a mapper for each, in parallel:
```csharp
var tasks = new Task<double[]>[files.Length];
for (int i = 0; i < files.Length; i++)
{
    // Fan out: schedule one mapper activity per file. "MapperAsync" is a placeholder here;
    // the activity's real name is elided from this excerpt.
    tasks[i] = context.CallActivityAsync<double[]>(
        "MapperAsync",
        files[i]);
}
```
We add the resulting `Task<T>` objects from these calls to an array of Tasks, and we wait for them all to complete using `Task.WhenAll()`.
Once they've completed, we've got mapper output for each file and it's time to reduce it. We do this by calling out *once* to another activity function: `Reducer`. This function aggregates the average speed computed for each day of the week in each line of the files into an overall average, per day of the week, across all the files.
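Sketched as code, the fan-in plus reduce step could look like the following; the `double[][]` input and `string` output of `Reducer` are assumptions based on the description above:

```csharp
// Fan in: wait for every mapper and collect its per-weekday averages
double[][] mapResults = await Task.WhenAll(tasks);

// A single Reducer activity call aggregates all mapper outputs into one average per weekday
string result = await context.CallActivityAsync<string>("Reducer", mapResults);
```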
After this, we return the result as a string to the Orchestrator (`BeginMapReduce`), which sets it as the `output` for the entire orchestration; the caller can retrieve it by issuing an HTTP GET to the status API:

```
GET /admin/extensions/DurableTaskExtension/instances/14f9ae24aa5945759c3bc764ef074912?taskHub=DurableFunctionsHub&connection=Storage&code=ahyNuruLOooCFiF6QB7NaI6FWCHGjukAdtP/JGYXhFWD/2lxI9ozMg== HTTP/1.1
```

> Note: Give this Status API a hit while the orchestration is running and you'll get an idea of where it's at in the process, thanks to the calls to `SetCustomStatus` throughout the code.
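For reference, those progress updates are one-liners along these lines in orchestrator code (the message text here is illustrative, not taken from the sample):

```csharp
// Shows up in the status API response under "customStatus"
context.SetCustomStatus($"Mapping {files.Length} files...");
```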
After deployment:

- Visit your Function App in the Azure Portal
- Click the `StartAsync` function
- Click 'Get function URL' & copy it for usage in your favorite REST API testing program
- Issue an HTTP POST to that endpoint with the `path` parameter populated from the output of the PowerShell script you ran in [2.1](#21-copy-the-dataset-to-an-azure-blob-storage-instance), as in the example below
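An example of that request, with placeholders for your own function key and the `path` value printed by the script:

```
POST /api/StartAsync?code=<your-function-key>&path=<blob-url-from-script-output> HTTP/1.1
Host: <your-function-app>.azurewebsites.net
```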