ETL with the CMR API, Apache Spark, and the FRED S&P 500 Economics Data - YouTube
Channel: Coherent Logic Limited
[1]
Hi, this is Tom with Coherent Logic, and
today I will give you a quick
[5]
demonstration of the CMR Data
Acquisition API for the Spark cluster
[9]
computing platform. We will use the CMR API
to acquire S&P 500 economics data from the Federal
[14]
Reserve Bank of St. Louis, and once you
understand this example, accessing the
[18]
other data sets is easy. So you can see
that we have Spark running; it's a single
[25]
node in this instance, nothing fancy. On
the right-hand side of our screen we
[30]
have the FRED website, for Federal
Reserve Economic Data. There are a number
[35]
of series here, and if we click on one of
these series, for example the S&P 500, it
[40]
will take us to a page that looks like
this. The S&P 500 has a series ID called
[46]
SP500, and we need that. When we
invoke the Federal Reserve Bank of St.
[52]
Louis web services, the data will be
returned in XML form, and that's what we
[58]
can see here. Now, the URL up here we're
going to compare in a moment, but we do
[64]
need that URL. Let's go into a little
bit of code. In this case we have an
[72]
API key, which is provided to us by
the Federal Reserve Bank of St. Louis. We
[77]
have some imports, and I'm going to
create a new instance of CMR; that's done
[89]
here.
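As a rough sketch of the request described up to this point, the Python below builds the kind of URL the FRED web service expects when asked for a series' observations. The helper name build_observations_url and the placeholder key YOUR_API_KEY are assumptions made for illustration; the demonstration itself goes through the CMR API rather than this code.

```python
from urllib.parse import urlencode

# Base address of the FRED "series observations" web service.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key):
    """Return the request URL for one series (hypothetical helper)."""
    params = {"series_id": series_id, "api_key": api_key}
    return FRED_OBSERVATIONS_URL + "?" + urlencode(params)


# YOUR_API_KEY is a placeholder, not a working key.
url = build_observations_url("SP500", "YOUR_API_KEY")
print(url)
```

Only the series ID changes from one data set to the next, which is why the other series are easy to access once this one is understood.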
[93]
And now we'll take a look at the query.
So this is our query; we have SP500 here,
[102]
and we're going to wait for a moment
while Spring wires together two
[108]
components for our API. Once that
happens the first time, subsequent calls
[114]
will not be affected by the same kind of
delay. So we have some data. As a quick
[121]
aside, let's take a look at the query:
the URL is at the top and our query is
[127]
at the bottom. We can see that we
have FRED series observations, and it
[133]
reads the same in our API: FRED series
observations. We have an API key; we set
[140]
the API key in our API using a method
called withApiKey, and we have a series
[146]
identifier, SP500, and that is set using
a method called withSeriesId, passing SP500.
[153]
Finally, we invoke a method called do get
observations, sorry,
[158]
doGetAsObservationsDataSet, passing
an instance of Spark, and once that
[163]
returns we have a data set with SP500
observations. In this example, since we
[172]
have the data, we take a quick look at
what's in the data, starting with the
[177]
count. We have two thousand six
hundred and twenty-five results, and we
[184]
will show the first five to get an idea
of what's in there. So we can see that data;
[191]
we're not going to do any analysis in
this case. Moving on, we have other
[198]
series that we could take a look at and
in this case we could do the
[202]
Case-Shiller U.S. National Home Price
Index. I have an example already for that.
[212]
So on the right-hand side of the screen
we can see the S&P/Case-Shiller U.S.
[218]
National Home Price Index with the
series ID here, and I already have the
[225]
query constructed for this. You can see
that this has five hundred and seventeen
[232]
rows, and we can also notice that it
returned much quicker this time because
[237]
we no longer have the upfront wiring
impact from when the API was initially
[243]
started. We could try a couple of others;
I won't read them out, but they each have
[252]
their own series identifier.
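The faster second query can be illustrated with a generic one-time initialization sketch. This is not how the CMR API or Spring is implemented; get_client is a hypothetical stand-in whose artificial 0.2 second sleep plays the role of the upfront wiring, and the cache makes later calls skip it.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client():
    """Hypothetical client factory; the sleep simulates one-time setup."""
    time.sleep(0.2)
    return object()


def timed_call():
    """Return how long it takes to obtain the client."""
    start = time.perf_counter()
    get_client()
    return time.perf_counter() - start


first = timed_call()   # pays the simulated setup cost
second = timed_call()  # served from the cache
print(first > second)
```

The same reasoning applies to any later query in the session: the setup cost is paid once, not per series.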
[266]
And so we can see, in each case, the
queries had, in this case, 517 results, 217
[276]
373 rows, 2709 rows, 325 rows, one thousand
thirty-two rows, 855 rows, and finally
[286]
five thousand and thirty-five rows. So
this concludes our example of the CMR
[293]
Data Acquisition API for the Federal
Reserve Bank of St. Louis. Please feel
[297]
free to leave questions and comments as
any feedback is always welcome.
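For readers without the CMR API at hand, the count and first-five steps from the walkthrough can be approximated in plain Python. This is a minimal sketch, assuming the XML shape that FRED returns for series observations (observation elements with date and value attributes, and "." for a missing value). The sample values are made up for illustration and are not real S&P 500 data, and parse_observations is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a FRED observations response.
SAMPLE_XML = """<observations count="6">
  <observation date="2020-01-01" value="100.0"/>
  <observation date="2020-01-02" value="."/>
  <observation date="2020-01-03" value="101.5"/>
  <observation date="2020-01-06" value="102.25"/>
  <observation date="2020-01-07" value="99.75"/>
  <observation date="2020-01-08" value="103.0"/>
</observations>"""


def parse_observations(xml_text):
    """Return a list of (date, value) tuples; missing values become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for obs in root.iter("observation"):
        raw = obs.get("value")
        value = None if raw == "." else float(raw)
        rows.append((obs.get("date"), value))
    return rows


rows = parse_observations(SAMPLE_XML)
print(len(rows))       # number of observations
for row in rows[:5]:   # first five, as in the walkthrough
    print(row)
```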