The growing use of domain-specific computing hardware and architectures has led to an increasing demand for unconventional computing approaches. One such approach is the Ising machine, which is designed to solve combinatorial optimization problems. Here we show that a probabilistic-bit (p-bit)-based Ising machine can be used to train deep Boltzmann networks. Using hardware-aware network topologies on field-programmable gate arrays, we train on the full Modified National Institute of Standards and Technology (MNIST) and Fashion MNIST datasets without downsampling, as well as on a reduced version of the Canadian Institute for Advanced Research, 10 classes (CIFAR-10) dataset. For the MNIST dataset, our machine, which has 4,264 nodes (p-bits) and about 30,000 parameters, achieves the same classification accuracy (90%) as an optimized software-based restricted Boltzmann machine with approximately 3.25 million parameters, roughly 100 times as many. Similar results are achieved for the Fashion MNIST and CIFAR-10 datasets. The sparse deep Boltzmann network can also generate new handwritten digits and fashion products, a task at which the software-based restricted Boltzmann machine fails. Our hybrid computer performs a measured 50 to 64 billion probabilistic flips per second and can run the contrastive divergence algorithm (CD-n) with up to n = 10 million sweeps per update, which is beyond the capabilities of existing software implementations.
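
To make the terms "sweep" and "CD-n" concrete, the following is a minimal software sketch of one contrastive divergence update on a sparse network of p-bits. It is an illustration under stated assumptions, not the hardware implementation described here: it assumes bipolar states in {-1, +1}, the commonly used p-bit update rule m_i = sgn(tanh(beta * I_i) - r) with r drawn uniformly from [-1, 1), a symmetric weight matrix with zero diagonal, and sequential rather than parallel updates. The function names (`pbit_sweep`, `mean_correlations`, `cd_n_update`), the learning rate, and the omission of a bias update are all hypothetical choices made for brevity.

```python
import numpy as np

def pbit_sweep(m, W, h, beta, rng, clamped=None):
    """Update every unclamped p-bit once, in order (one sweep), in place."""
    for i in range(m.size):
        if clamped is not None and clamped[i]:
            continue  # clamped p-bits keep their data value
        I_i = W[i] @ m + h[i]  # local input from neighbours plus bias
        # Assumed p-bit rule: m_i = sgn(tanh(beta * I_i) - r), r ~ U[-1, 1)
        m[i] = 1.0 if np.tanh(beta * I_i) > rng.uniform(-1.0, 1.0) else -1.0
    return m

def mean_correlations(m, W, h, beta, rng, n_sweeps, clamped=None):
    """Average the pairwise products m_i * m_j over n_sweeps sweeps."""
    corr = np.zeros_like(W)
    for _ in range(n_sweeps):
        pbit_sweep(m, W, h, beta, rng, clamped)
        corr += np.outer(m, m)
    return corr / n_sweeps

def cd_n_update(W, h, v_data, visible, n_sweeps, lr=0.01, beta=1.0, seed=0):
    """One CD-n weight update for a single training vector (biases left fixed)."""
    rng = np.random.default_rng(seed)
    mask = W != 0  # assumption: only existing (sparse) connections are trained
    m = rng.choice([-1.0, 1.0], size=h.size)
    m[visible] = v_data
    # Positive phase: visible p-bits clamped to the data.
    pos = mean_correlations(m, W, h, beta, rng, n_sweeps, clamped=visible)
    # Negative phase: all p-bits run freely for n sweeps.
    neg = mean_correlations(m, W, h, beta, rng, n_sweeps)
    return W + lr * (pos - neg) * mask

# Toy usage: a 4-node chain with two visible and two hidden p-bits.
W = 0.1 * np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
h = np.zeros(4)
visible = np.array([True, True, False, False])
W_new = cd_n_update(W, h, np.array([1.0, -1.0]), visible, n_sweeps=20)
```

In this sketch each sweep costs one pass over all p-bits, so a software loop becomes slow long before n reaches the millions; that cost is what the measured flip rate of the hardware addresses.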