AWS Analytics offerings

These are the notes I prepared for the AWS Solutions Architect – Associate exam. I cleared it on the first attempt with a good margin. I am sharing them here in the hope that they help beginners and other aspirants.

AWS Data Pipeline
  • AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data.
  • With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.
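The idea of tasks that depend on the successful completion of earlier tasks can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the AWS Data Pipeline API: the `run_pipeline` function, the task names, and the status strings are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, depends_on):
    """Run tasks in dependency order.

    A task runs only if every task it depends on finished successfully;
    otherwise it is marked as skipped. All names here are illustrative.
    """
    status = {}
    # static_order() yields each task after all of its dependencies.
    for name in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(name, ())
        if all(status.get(dep) == "FINISHED" for dep in deps):
            try:
                tasks[name]()
                status[name] = "FINISHED"
            except Exception:
                status[name] = "FAILED"
        else:
            status[name] = "SKIPPED"
    return status


def failing_transform():
    raise RuntimeError("transform failed")


tasks = {
    "copy": lambda: None,            # e.g. move the raw data
    "transform": failing_transform,  # e.g. clean the data (fails here)
    "load": lambda: None,            # e.g. load into a warehouse
}
depends_on = {"copy": [], "transform": ["copy"], "load": ["transform"]}

result = run_pipeline(tasks, depends_on)
print(result)
```

Because `transform` fails, `load` is never run, which is the behaviour a data-driven workflow is meant to give you.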
Redshift
  • Large-scale data warehousing system.
  • Based on PostgreSQL, but customized.
  • AWS data warehousing solution for business intelligence.
  • Supports
    • Single node (160 GB)
    • Multi node
      • Leader node
      • Compute node – up to 128.
  • Redshift is a column-based (columnar) database.
  • Supports compression
  • Billing
    • Charged only for compute node hours. The leader node is not charged.
    • Data transfer
    • Backup.
  • Available in only one Availability Zone (AZ).
  • Can be created within VPC
  • Redshift doesn’t provide a data interface of its own.
    • Connecting requires ODBC/JDBC connections and PostgreSQL drivers.
  • The block size of Redshift is 1 MB (1024 KB).
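To see why column-based storage and compression go together, here is a small sketch in plain Python. It is not Redshift code: the sample rows and the helper names are made up, and run-length encoding is used only as a simple example of a compression scheme, not as a statement of what Redshift applies to any given column.

```python
from itertools import groupby

# Hypothetical sample table: (country, category, quantity)
rows = [
    ("IN", "books", 10),
    ("IN", "books", 12),
    ("IN", "toys", 7),
    ("US", "toys", 7),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


country, category, quantity = to_columns(rows)

# Values of one column sit next to each other, so repeats compress well.
print(run_length_encode(country))   # [('IN', 3), ('US', 1)]
print(run_length_encode(category))  # [('books', 2), ('toys', 2)]

# An aggregate over one column only needs to read that column.
print(sum(quantity))                # 36
```

Analytical queries usually touch a few columns of many rows, which is the access pattern this layout favours.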
Elastic MapReduce (EMR)
  • Supports MapReduce and Apache Spark.
  • Part of the big data ecosystem.
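For readers new to MapReduce, the classic word count shows the three phases (map, shuffle, reduce) in miniature. This is a single-process sketch with hypothetical function names; it does not use EMR or Hadoop, which would spread the same phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big ecosystem"]))
# {'big': 2, 'data': 1, 'ecosystem': 1}
```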
Elasticsearch
  • Elasticsearch on Amazon.
  • Elasticsearch is a distributed search and analytics platform (similar to Solr).
Kinesis for real-time message processing
  • Real-time stream processing.
  • Used for
    • Real-time analytics
    • Real-time notifications
    • Complex event processing.
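A rough feel for real-time analytics and notifications: process each record as it arrives and raise an alert when a moving average crosses a threshold. The sketch below uses a plain Python iterable as the stream; it does not call the Kinesis API, and the `alerts` function, window size, and threshold are assumptions made up for illustration.

```python
from collections import deque


def alerts(readings, window=3, threshold=100.0):
    """Yield (index, average) whenever the moving average of the last
    `window` readings is above `threshold`.

    Readings are consumed one at a time, as they would be from a stream.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield index, average


stream = [90, 90, 90, 120, 150]
print(list(alerts(stream)))  # [(4, 120.0)]
```

Only a small window of recent records is held in memory, which is what lets this style of processing keep up with an unbounded stream.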
Amazon Machine Learning
  • Create predictive models.
  • Deploy models
  • Do scoring
  • A set of algorithms available
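The create, deploy, and score steps can be pictured with a tiny hand-written model. This sketch fits a one-variable linear regression by least squares in plain Python; it is not the Amazon Machine Learning API, and the `train` and `score` names and the sample data are hypothetical.

```python
def train(xs, ys):
    """Create a model: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score(model, x):
    """Score a new input with a trained model."""
    slope, intercept = model
    return slope * x + intercept


# Training data that follows y = 2x + 1 exactly.
model = train([1, 2, 3, 4], [3, 5, 7, 9])

# "Deploying" here just means keeping the model around to score new inputs.
print(score(model, 10))  # 21.0
```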
