Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python
J**L
Solid introduction to Data Engineering
Data Engineering With Python provides a solid overview of pipelining and database connections for those tasked with processing both batch and stream data flows. Not only for data miners, this book will also be useful in a CI/CD environment using Kafka and Spark. It’s very readable and contains lots of practical, illustrative examples.
Hits: solid explanations and demonstrations of Pandas, ZooKeeper, Kafka, and Spark. It also introduces Great Expectations, NiFi, Airflow, and Faker, all of which are tied together in a usable demonstration environment. Pipeline implementation is thoroughly covered as well.
Misses: the book could use a little fine-tuning as to Python 3; some of the instructions are rather downrev, and concepts like 311 / SeeClickFix are sort of dropped in without a lot of explanation. Also, there’s a heavy focus on SQL and almost no coverage of NoSQL databases.
Overall, a good addition to the bookshelf if you’re using any of these Python packages. Readable and useful for anyone supporting data analysis.
L**Z
A good conceptual foundation
I was hesitant to buy this book based on some reviews that I read, but I decided to give it a shot nonetheless.
For context, I’m currently a Software Engineer looking to make a transition over to Data Engineering in the future. I had little knowledge of the field, so I was looking for a book to give me a bit of a foundation in it.
For me, the first and second sections of the book were really good for a conceptual understanding of the techniques and jargon within DE. Especially the first section!
What threw me off a little was the constant reference to NiFi. Don’t get me wrong though, I did enjoy learning about it, and I think it was useful to understand that DEs use a variety of different tools in their day to day (which was a plus for me).
I just think the title is slightly misleading, as Python is not always referenced. I think it got more exposure in the first section, but died down a bit in the latter half of the second section and then got referenced just a bit in the last one. Also, the Python implementation used in NiFi is Jython and not the default implementation that many probably use. Also, and I could be totally mistaken, but it seems that the most recent Jython version uses Python 2.7? It seemed a bit backwards to me to reference such an outdated version, but that’s in no way the author’s fault.
I gave 4 out of 5 stars just because some of the NiFi examples simply didn’t work even though I followed the steps correctly. This could be because things have changed with the software since the release of the book, but I’m not totally sure.
In any case, I would still recommend this book to get a decent conceptual knowledge of DE principles, but I would do what others suggested: look at the documentation of the tools referenced to get a more up-to-date view of them, and work on personal projects that use them to apply the knowledge taught in this book.
P**L
Too Framework Dependent
I’m really appreciative of this author helping to teach others about data engineering. That being said, this book is too platform dependent and doesn’t cover the fundamentals in great depth. I think this book should’ve focused more on Python, SQL, and how to model databases and build ETLs from scratch without using any tools like Airflow, even if the examples were much simpler. It would have been a better way to show the process without the heavy abstraction that comes with advanced tools like Airflow and NiFi.
A**R
Returned book before finishing chapter 2
This book has a very poor flow (ironically, given what it is teaching) and is filled with errors. I wasn't even able to finish chapter 2 before I grew frustrated and requested a refund.
The author makes a very large assumption that you will be using Linux and expects you to have an understanding of how to use it. This isn't a big deal if you are familiar with Linux, except the author makes no attempt to explain the flavor of Linux they are using or what sort of setup they have. It would have been better to have the reader create a VM with a specific setup so that everyone is on the same page.
It also took me a long time to get NiFi up and running. There is a typo in one of the commands when you set the JAVA_HOME system variable. The version of NiFi is also very out of date, which is understandable considering this is a book; however, there is no correction on the publisher's site or the GitHub repo. Even after you get NiFi set up, the book jumps around the example and completely misses some steps on how to run your first flow.
It's obvious this was a rushed book and the editing was also rushed or not done at all. You're better off finding a different book or finding online training.
M**Y
Not the best editing
Already seeing typos and poor visual examples (e.g. the columnar format example) within the first 10 main pages. It seems to have some good overall content, but it's a bit discouraging to feel as though I need to reconfirm or check some of the statements in a book that’s supposed to be beginner-friendly. The saving grace is the amount and variety of content.