ETL with the CMR API, Apache Spark, and the FRED S&P 500 Economics Data

Channel: Coherent Logic Limited

[1]
Hi, this is Tom with Coherent Logic, and today I will give you a quick
[5]
demonstration of the CMR data acquisition API for the Spark cluster
[9]
computing platform. We will use the CMR API to acquire S&P 500 economics data from the Federal
[14]
Reserve Bank of St. Louis, and once you understand this example, accessing the
[18]
other data sets is easy. You can see that we have Spark running; it's a single
[25]
node in this instance, nothing fancy. On the right-hand side of our screen we
[30]
have the FRED website, for Federal Reserve Economic Data. There are a number
[35]
of series here, and if we click on one of these series, for example the S&P 500, it
[40]
will take us to a page that looks like this. The S&P 500 has a series ID called
[46]
SP500, and we need that. When we invoke the Federal Reserve Bank of St.
[52]
Louis web services, the data will be returned in XML form, and that's what we
[58]
can see here. The URL up here we're going to compare in a moment, but we do
[64]
need that URL. Let's go into a little bit of code. In this case we have an
[72]
API key, and this is provided to us by the Federal Reserve Bank of St. Louis. We
[77]
have some imports, and I'm going to create a new instance of CMR; that's done
[89]
here.
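The web service call described above needs only two inputs: the series ID and the API key. As a rough illustration of what that request looks like outside the CMR API, here is a minimal Python sketch that assembles the FRED series/observations URL. The function name `build_observations_url` and the placeholder key are assumptions made for this example; they are not part of the CMR API shown in the video.

```python
from urllib.parse import urlencode

# Endpoint of the FRED series/observations web service.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(api_key: str, series_id: str) -> str:
    """Return the request URL for the observations of one FRED series.

    Hypothetical helper for illustration only; it is not part of the CMR API.
    """
    params = {"series_id": series_id, "api_key": api_key}
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; a real key is issued by the
    # Federal Reserve Bank of St. Louis.
    print(build_observations_url("YOUR_API_KEY", "SP500"))
```

The two query parameters correspond to the two values set on the query in the demonstration: the API key and the series ID.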
[93]
Now we'll take a look at the query. This is our query; we have SP500 here,
[102]
and we're going to wait for a moment while Spring wires together two
[108]
components for our API. Once that happens the first time, subsequent calls
[114]
will not be affected by the same kind of delay. So we have some data. As a quick
[121]
aside, let's take a look at the query: the URL is at the top and our query is
[127]
at the bottom. We can see that we have FRED series observations, and it
[133]
reads the same in our API: FRED series observations. We have an API key; we set
[140]
the API key in our API using a method called with API key, and we have a series
[146]
identifier, SP500, and that is set using a method called with series ID SP500.
[153]
Finally, we invoke a method called
[158]
doGetAsObservationsDataSet, passing an instance of Spark, and once that
[163]
returns we have a data set with SP500 observations. In this example, since we
[172]
have the data, we take a quick look at what's in the data, starting with the
[177]
count. We have two thousand six hundred and twenty-five results, and we
[184]
will show the first five to get an idea of what's in there. We can see that data;
[191]
we're not going to do any analysis in this case. Moving on, we have other
[198]
series that we could take a look at, and in this case we could do the
[202]
Case-Shiller U.S. National Home Price Index. I have an example already for that.
[212]
On the right-hand side of the screen we can see the S&P Case-Shiller U.S.
[218]
National Home Price Index with the series ID here, and I already have the
[225]
query constructed for this. You can see that this has five hundred and seventeen
[232]
rows, and we can also notice that it returned much quicker this time because
[237]
we no longer have the upfront wiring impact from when the API was initially
[243]
started. We could try a couple of others; I won't read them out, but they each have
[252]
their own series identifier.
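The pattern repeated for each series in the demonstration is the same: fetch the XML observations, count them, and preview the first five. A minimal Python sketch of that counting and preview step follows. It assumes the XML response consists of `observation` elements carrying `date` and `value` attributes; the `Observation` class, the `parse_observations` function, and the sample XML (whose dates and values are made up purely for illustration and are not real FRED data) are all hypothetical and not part of the CMR API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Illustrative sample only: these dates and values are invented,
# not real FRED observations.
SAMPLE_XML = """
<observations>
  <observation date="2000-01-03" value="100.0"/>
  <observation date="2000-01-04" value="101.5"/>
  <observation date="2000-01-05" value="."/>
</observations>
"""


@dataclass
class Observation:
    date: str
    value: str  # kept as text; FRED marks missing values with "."


def parse_observations(xml_text: str) -> List[Observation]:
    """Turn an observations XML document into a list of Observation records."""
    root = ET.fromstring(xml_text)
    return [
        Observation(node.get("date"), node.get("value"))
        for node in root.iter("observation")
    ]


if __name__ == "__main__":
    observations = parse_observations(SAMPLE_XML)
    # Equivalent in spirit to the count and "show the first five" steps.
    print(len(observations))
    for obs in observations[:5]:
        print(obs.date, obs.value)
```

In the video these steps run against a Spark data set rather than a plain Python list, so this sketch only mirrors the shape of the workflow, not its execution model.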
[266]
And so we can see in each case the queries had, in this case, 517 results, 217
[276]
373 rows, 2709 rows, 325 rows, 1032 rows, 855 rows, and finally
[286]
five thousand and thirty-five rows. This concludes our example of the CMR
[293]
Data Acquisition API for the Federal Reserve Bank of St. Louis. Please feel
[297]
free to leave questions and comments, as any feedback is always welcome.