I gave a talk for the DSC groups of North America on how to get started studying networking and security with the cloud.
I thought we would talk about networking and security at a high level. Both are really large subjects, and we could spend a lot of time on either one, so I’ll just give you an idea of how I think about all this and where I see the field heading, and then at the end I’ll give you some things you might try if you’re interested in learning more on your own. Very broadly, we’ll look at networking at a high level, then at security, which I don’t view as a specific thing but more as a process in general, and then at how all this works out in the real world, both the theory and the practice. After that, like I said, I’ll cover some trends I’ve seen in the industry and some things I think you should be thinking about going forward.
Networking is a whole complicated subject in and of itself. Very broadly, we have switches and similar devices to actually move packets of data around. But on top of all these hardware devices, there’s a lot of really complicated software running under the hood to make everything work together. Many people think networking is hardware, but it’s very often the software that is the key part of making the hardware smart, so to speak. At a very simplistic level, we can think of simple UDP packets going over the network, and if we add some sort of handshaking on top, we’ve essentially reinvented TCP, the basic connection-oriented protocol. From there, we can stack different networking layers. If you’re interested in the field of networking, you need to know the seven layers of the OSI model and the various points at which we can run hardware, packet filters, and networking algorithms at each layer. A lot of modern technology spans these layers, so you have to interface with multiple levels of the networking stack simultaneously. Once we start building trunked networks, we can jump from there into virtual LANs (VLANs), segmenting a specific port on a switch so that only traffic belonging to a given virtual LAN is allowed over it. The next step from there is to combine multiple virtual LANs, and that’s where we reach virtual private networks (VPNs), which are really common in business and industry.
Then, increasingly, in the modern world of cloud computing we have this concept called the “virtual private cloud” (VPC). You run all your services on one small private IP network, where they can only talk to other services over clearly defined routes and patterns. By extension, we isolate a lot of our infrastructure and get a lot of security almost for free with this approach.
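To make the private-network idea a bit more concrete, here is a minimal sketch using Python’s standard ipaddress module. The 10.0.0.0/16 range, the tier names, and the allowed routes are all assumptions made up for illustration. Real clouds express this with subnets and firewall rules, not a Python set.

```python
import ipaddress

# Hypothetical private range for the VPC (an assumption for this sketch).
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC into /24 subnets, one per tier. Tier names are illustrative.
subnets = list(vpc.subnets(new_prefix=24))
tiers = {"web": subnets[0], "app": subnets[1], "db": subnets[2]}

# Clearly defined routes: only these (source, destination) pairs may talk.
ALLOWED = {("web", "app"), ("app", "db")}


def tier_of(address: str):
    """Return the tier whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in tiers.items():
        if ip in net:
            return name
    return None


def may_talk(src: str, dst: str) -> bool:
    """True only if both addresses sit in tiers joined by an allowed route."""
    return (tier_of(src), tier_of(dst)) in ALLOWED


if __name__ == "__main__":
    print(vpc.is_private)                    # the range is private address space
    print(may_talk("10.0.0.5", "10.0.1.7"))  # web -> app
    print(may_talk("10.0.0.5", "10.0.2.9"))  # web -> db, not an allowed route
```

The point of the sketch is that the web tier never gets a direct path to the database tier, which is where the isolation comes from.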
A surprising number of people don’t really know about the DNS layer: all the steps that go on under the hood, whenever you type google.com into your browser’s address bar, for that page to actually get delivered to you. At the very least, I think you need to know how to work with a DNS registrar. After buying domains, you need some tool to manage your actual DNS entries. I really like Cloudflare in this space, so you might look at them. Load balancers probably aren’t technically part of DNS, but I put them here because if you’re going to deal with heavy traffic at scale, you have to explicitly define the points where your traffic comes in and then gets sent on to your internal network. By extension, setting up HTTPS and a full encryption stack is something you very much should try on your own, just to learn how the whole certificate and signing process actually works. Running your own email server is probably outside the scope of what you all want to learn. But SPF and DKIM (DomainKeys Identified Mail) are really interesting over the long term, and I think we’ll see more of this pattern of publishing security information in DNS records in the future.
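As a small illustration of security information living in DNS, here is a sketch that splits an SPF TXT record into its terms. The record string and the example.com names are made up, and parse_spf is a hypothetical helper, not part of any library. It handles mechanisms such as ip4, include, and all, and does not treat modifiers like redirect= specially.

```python
def parse_spf(txt_record: str):
    """Split an SPF TXT record into (qualifier, mechanism, value) tuples."""
    parts = txt_record.split()
    if not parts or parts[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")

    terms = []
    for term in parts[1:]:
        # A leading +, -, ~ or ? is the qualifier; "+" (pass) is the default.
        qualifier = "+"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        # Split on the first colon only, so ip6 values keep their colons.
        name, _sep, value = term.partition(":")
        terms.append((qualifier, name, value))
    return terms


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"
    for qualifier, name, value in parse_spf(record):
        print(qualifier, name, value)
```

Reading a record this way shows the pattern: the domain owner publishes, in plain DNS, which hosts may send mail for the domain, and the final -all says everything else should fail.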
So, to go back to the network as a physical device, I think 1 Gigabit Ethernet is pretty well established, and most of you are probably familiar with networking at that level. 10 Gigabit Ethernet is starting to come down to the consumer level, so I think that’s an interesting trend over the next year or two. Wi-Fi 6 is out. This gives you about 200 megabytes a second of bandwidth for wireless devices, which I think is really interesting for building networks with lots of little devices. At the enterprise level, the story of the last year or two has been the slow transition from 40 to 100 Gigabit Ethernet, speeding up the back-end links for storage networks and the like. In the high-performance world, there’s the whole area of remote direct memory access (RDMA) interconnects, notably InfiniBand, though other interesting players are coming up and becoming important right now. PCIe 5.0 is about to make it to market, and I think it’s a really interesting technology in general for increasing the bandwidth of devices. The CXL framework is super interesting too. The idea is that we’re rethinking what a processor and a device on a bus really are and finding new ways for them to talk to each other. Flash memory is probably going to be the most near-term application of this. But we’re increasingly seeing this whole world of GPU devices and edge computing, which is to say moving our flash memory closer to where we’re going to process it or, vice versa, moving our processors closer to where we actually need to run them, so we can run AI in the field on the local network and skip a trip to the cloud, so to speak.
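To put these link speeds on one scale, here is a quick bit of arithmetic in Python. It only converts nominal line rates and ignores protocol overhead, so real throughput will be lower. The 100 GB dataset size is an arbitrary example.

```python
def gbit_to_mbytes_per_s(gbit_per_s: float) -> float:
    """Convert a nominal link rate in Gbit/s to megabytes per second."""
    return gbit_per_s * 1000 / 8  # 1 Gbit = 1000 Mbit, 8 bits per byte


def transfer_seconds(size_gbytes: float, link_gbit_per_s: float) -> float:
    """Ideal time to move size_gbytes over a link, ignoring all overhead."""
    return size_gbytes * 8 / link_gbit_per_s


if __name__ == "__main__":
    for rate in (1, 10, 40, 100):
        mb_per_s = gbit_to_mbytes_per_s(rate)
        seconds = transfer_seconds(100, rate)  # hypothetical 100 GB dataset
        print(f"{rate:>3} GbE: {mb_per_s:>7.0f} MB/s, 100 GB in {seconds:.0f} s")
```

So 1 Gigabit Ethernet tops out at 125 MB/s on paper, and the same 100 GB that takes 800 seconds there takes 8 seconds on a 100 Gigabit link.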
The cloud is kind of a vague term. To me, it’s just a fancy word for Unix. So if you want to be serious about learning the cloud, by extension you need to be serious about learning Unix. Toward this end, I think you need to pick a particular distribution of Linux. I’m a fan of the Ubuntu LTS releases, but there are certainly plenty of choices out there. Basically, I’d tell you to pick one and use it for everything. That way you’ll get to know all the little day-to-day frustrations and be able to deal with them reliably, both locally and in the cloud. Ideally, I think the long-term trend of all this cloud stuff is to move the logic closer and closer to the user. We can think of this as load balancing, running servers across different continents to reduce the round-trip time for your users. But we can also get into that whole other realm of edge hardware and AI devices coming down to the actual users’ network.
Docker is a really powerful pattern, and it has taken the industry by storm. So I think you 100% need to learn how to write Dockerfiles and understand what’s going on whenever you build a simple container image this way and then deploy it. As for the actual process of deploying Docker containers, I think it’s still not 100% clear what the proper patterns and processes are, so this is something people will keep banging on for the next few years. Google has certainly done a lot of work with Kubernetes, which is the 800-pound gorilla of this space right now. Kubernetes has a lot of mindshare, and you should at least poke around enough to understand some of its concepts, and then decide whether or not you want to go deeper down the Kubernetes rabbit hole.
Once you’ve built something, redeploying it often becomes the next stage. Historically, a lot of people would just run servers, and if something broke they would log in and update the server in place, patching things as needed. But the modern pattern, I think, is that if a server doesn’t work, we don’t repair it anymore. We throw it away, so to speak, and redeploy a new virtual server in the cloud. This gets into the whole Kubernetes idea of cattle, not pets. At some point, if you’re doing a lot of redeploying, you’re going to need a secret management tool to unify all your keys and whatnot, and likewise, once you have secrets, you probably need to start rotating them on some sort of schedule.
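Here is a minimal sketch of what rotating secrets on a schedule can look like. The 90-day period, the in-memory dictionary standing in for a secret store, and the function names are all assumptions for illustration. A real secret manager would handle storage, access control, and distribution for you.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Assumed policy for this sketch: rotate anything older than 90 days.
ROTATION_PERIOD = timedelta(days=90)


def needs_rotation(created_at, now, period=ROTATION_PERIOD) -> bool:
    """True once a secret has reached the rotation period."""
    return now - created_at >= period


def rotate(store: dict, name: str, now) -> dict:
    """Replace a secret with a fresh random value and reset its timestamp."""
    store[name] = {"value": secrets.token_urlsafe(32), "created_at": now}
    return store[name]


def rotate_due(store: dict, now) -> list:
    """Rotate every secret that is due and return the names that changed."""
    due = [name for name, entry in store.items()
           if needs_rotation(entry["created_at"], now)]
    for name in due:
        rotate(store, name, now)
    return due


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical store: one stale secret and one recent one.
    store = {
        "db-password": {"value": "old", "created_at": now - timedelta(days=120)},
        "api-key": {"value": "fresh", "created_at": now - timedelta(days=10)},
    }
    print(rotate_due(store, now))
```

Running something like rotate_due from a scheduled job is one simple way to turn a rotation policy into code instead of a calendar reminder.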
Then we have these clouds, and some people try to design applications and interfaces to be cloud-agnostic. I don’t think that’s really a good pattern, because you’re limiting yourself by not using some of the nice features of these clouds. I’ve seen people make business cases for being cloud-agnostic, but I always feel like you’re limiting your possibilities. So I would tell you to pick a particular platform and just embrace it. Terraform and Nomad are some interesting tools that are rethinking the whole Kubernetes process of rebuilding the world, you might say. I think they’re worth having on your radar for the long term, since we keep rethinking these patterns and finding better and better ways to declaratively build these large cloud clusters.
As for security, I don’t know that I can just give you a set of best practices. To me, the essence of security is minimizing the amount of surface area that you’re exposing to the world. To use a simple example, we might build a static site with nothing but HTML and CSS, and you can certainly deploy that using Apache or some web server like it. But if I had a purely static site, I would go a step further and use a pure static host like a cloud storage bucket, and then, by extension, I don’t have to worry about securing the server, because I’ve offloaded those concerns to somebody else. To me, a lot of getting into security as a mindset is building things in such a way that, hopefully, you don’t have to maintain them, you don’t have to keep them up to date, and you don’t have to hold in your head everything that’s going on someplace else. Toward this end, you need to be building and deploying as soon as possible, ideally on day one. Then, when you build security on top, you’re building on top of a real product and not defending against imaginary threats, which is a pattern I’ve seen people get into. Personally, I think that updates and iteration speed, just being able to ship more code, are much more important than being correct, because the faster you can make that loop, the more quickly you can respond, and by extension I think that will ultimately give you the resilience you need to handle these security issues.
Toward that end, once you’ve actually started shipping, I think you need to come up with some sort of process. Unify your containers, and use one operating system for everything, like I said, so you only have one set of interfaces to keep track of. Put your whole deploy process into code, and make sure that everything is literally built from scratch or from a specific git commit. This is a lot more of a pain to set up, but the flip side is that it dramatically reduces the possibility of things breaking along the way. You should also try to document what you’re doing. I think a lot of people go overboard on this and write reams of documents. I would tell you just to make something simple. In my experience, even something simple is really hard to maintain, because the system will evolve and become more complicated very quickly. So writing simple documentation and then trying to keep it up to date is a really powerful tool for understanding what the heck is going on whenever your code is running in the cloud and, by extension, for being able to reason about it. Backups are really important too, and I think they’re part of the security process as well. Many people think of security as defending against hackers, which is certainly one part of it. But to me, being able to say, “Okay, we lost all of our data. Can we rebuild the system? Can we get back to where we were?” and de-risking the whole data ingress side of things is a really important tool. So making a backup and then, by extension, checking that your backups can actually be restored or rebuilt, which shows that you have documented your build process properly, is a really powerful thing. A lot of people make backups and only discover too late that the things they thought were getting backed up were not.
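The “can we actually restore it?” check can be sketched in a few lines of Python using only the standard library. This is a toy under stated assumptions: a local directory stands in for your data, a gzip tarball stands in for the backup, and backup_and_verify and checksums are hypothetical helper names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def backup_and_verify(source: Path) -> bool:
    """Archive source, restore it elsewhere, and compare file checksums."""
    with tempfile.TemporaryDirectory() as work:
        archive = shutil.make_archive(
            str(Path(work) / "backup"), "gztar", root_dir=source
        )
        restored = Path(work) / "restored"
        shutil.unpack_archive(archive, restored)
        # The backup only counts if the restored copy matches the original.
        return checksums(source) == checksums(restored)


if __name__ == "__main__":
    # Build a small throwaway directory to back up.
    with tempfile.TemporaryDirectory() as demo:
        source = Path(demo)
        (source / "notes.txt").write_text("hello")
        (source / "data").mkdir()
        (source / "data" / "rows.csv").write_text("a,b\n1,2\n")
        print(backup_and_verify(source))
```

The design choice worth copying is that the restore step runs every time a backup is made, so a backup that silently misses files fails loudly right away.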
Part of all this is dealing with computers, but another piece is dealing with people. Steve Yegge has a blog post on this idea of accessibility versus security, and I thought it was an interesting way of framing all this. I think his point was that a product can have zero security, so to speak, and still be a successful product that people use even though it’s broken. Vice versa, you can have all the bells and whistles and the most hardened Linux distribution in the world, but if nobody is actually using your site, then all that work is for nothing. So finding some balance between having best practices and, on the flip side, not being the person who just says no all the time is, I think, really important framing for you and your career. My experience is that whenever you say “no” or “don’t do things that way,” eventually people will start bypassing you, so to speak, politically. That’s not where you want to be in the long term for your career. So you need to find a way to work with everybody and do things not necessarily your way or by your process, but within their system.
These are just some simple things I think you can do if you’re interested in learning more about the cloud, networking, and security in general. One of the simplest things you can do is buy a domain. For about $10 a year, you can buy a domain and practice pointing it at different sites, setting up HTTPS certificates, and so on. I think this is a really good practical way to learn some of the nitty-gritty of running a website yourself. Cloud servers are exceedingly cheap. You can build one of these servers in the cloud and run it for an hour for literally pennies, so spinning up servers, playing around with them a little, and then deleting them is an exceedingly cheap and really good way to get a lot of practice with these tools. So sign up for an account on Google Cloud and just start poking around. I think they give you some free credits, so try to use them in as imaginative a way as possible. For the price of a cup of coffee, you can learn a whole bunch about computers on your own, at your own pace. I’ve also had luck with old enterprise equipment. Notably with routers, you can get devices that were extremely expensive a few years ago for very cheap on eBay, and by extension you can teach yourself networking at a low level in a very self-directed, on-your-own-time sort of process. So this is something you might look at if you want to understand this field. Like I said, if you’re going to run Linux in the cloud, I think you should also be running it locally. Just find a Linux distribution that you like and use it as much as possible in order to really learn how it works. I think it’s a really powerful skill in general.
microk8s + kubeflow
So, I did a talk last year on all this, and I’m simply going to repeat the demo I did then, which is running a Kubeflow deployment on top of a microk8s cluster. I think it’s an interesting way of tying all these concepts together.
What I’ve done is install the server version of Ubuntu LTS, and then you can run

    sudo snap install microk8s --classic

and this will install the microk8s distribution on your server. Then, if you like, you can add more and more nodes. So by combining, say, a handful of computers, you can have yourself a high-availability Kubernetes cluster to play with locally. From there, we can do Helm deployments and the like, and microk8s has this enable kubeflow command right here. You run this command, and by extension it deploys Kubeflow into your little local cluster, and then you can SSH into it and look at it directly.
I did all this earlier today. So here’s my machine. It’s just a simple single-node device, and here are all the services I have running. We have 68 services running on top of our little mini Kubernetes cluster in order to run Kubeflow for us. We can log into our Kubernetes and Kubeflow dashboards, poke around the Kubernetes deployment, and run JupyterLab, which is a simple way to set up notebooks. Here I have a simple MNIST demo running on my node as an example of running machine learning in the cloud.
This whole Kubeflow thing is a big part of how Google Cloud is building the Vertex AI tool. This is where they’re pivoting or moving to this year, so if you understand how Kubeflow works, by extension I think you’ll have a solid leg up on understanding what Vertex AI is doing under the hood for you. I think this is an interesting set of complementary skills.
So, like I said, I think of security as a mindset, not a specific set of best practices or guidelines. Security is being somewhat paranoid about the world and then trying to design your whole system in such a manner that it’s not going to break whenever the first group of woodpeckers comes through, so to speak. Likewise, I think you can work with networking at the pure software level and just think of it as a software interface for linking together networks and devices. But if you actually understand how the hardware works and how data gets routed at the packet level, I think that will allow you to reason much more clearly about scale and, by extension, build larger and larger applications and ship them to more and more users. Broadly, none of this stuff is really magic, I would say, although people often try to make it seem so. So just practice and bang around a little, and over time, hopefully, these concepts will become clear to you, and you’ll be able to use the cloud to solve new and interesting problems. With that, I’ll say thanks for listening.