I have a NumPy array of pixel data, something like:
0 0 0 0 0 0 0
0 1 3 4 6 1 0
0 2 3 5 2 1 0
0 1 0 0 1 0 0
0 0 0 0 0 0 0
I would like to get a new array that excludes any outer rows/columns containing zeros, so that I end up with only the non-zero values (and this should work for any given array), i.e.
1 3 4 6 1
2 3 5 2 1
So far all I've managed to get is
1 3 4 6 1
2 3 5 2 1
1 0 0 1 0
using np.argwhere to find the "min" and "max" indices of the non-zero values, but this still includes rows/columns that contain a mix of zero and non-zero values.
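For reference, a minimal sketch of that bounding-box attempt, assuming the small example above is stored in a variable called `pixels` (a name chosen here only for illustration). It reproduces the three-row result shown above, including the unwanted row that mixes zero and non-zero values:

```python
import numpy as np

# Small example array from the question (variable name is illustrative).
pixels = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 4, 6, 1, 0],
    [0, 2, 3, 5, 2, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

# (row, col) coordinates of every non-zero pixel.
coords = np.argwhere(pixels != 0)

# Smallest and largest row/column index that holds a non-zero value.
row_min, col_min = coords.min(axis=0)
row_max, col_max = coords.max(axis=0)

# Slice out the bounding box; the upper bounds are inclusive, hence the + 1.
bbox = pixels[row_min:row_max + 1, col_min:col_max + 1]
print(bbox)
```

The bounding box only trims rows and columns that are entirely zero, which is why the partially filled last row survives.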
My actual array:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765 0 0
0 0 1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756 0 0
0 0 1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792 0 0
0 0 1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817 0 0
0 0 1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754 0 0
0 0 1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864 0 0
0 0 1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824 0 0
0 0 1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876 0 0
0 0 1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754 0 0
0 0 1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928 0 0
0 0 1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798 0 0
0 0 1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875 0 0
0 0 1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825 0 0
0 0 1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813 0 0
0 0 1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857 0 0
0 0 1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857 0 0
0 0 1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818 0 0
0 0 1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898 0 0
0 0 1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822 0 0
0 0 1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834 0 0
0 0 1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819 0 0
0 0 1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907 0 0
0 0 1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861 0 0
0 0 1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881 0 0
0 0 1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873 0 0
0 0 1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842 0 0
0 0 1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888 0 0
0 0 1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873 0 0
0 0 1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796 0 0
0 0 1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755 0 0
0 0 1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816 0 0
0 0 1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855 0 0
0 0 1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787 0 0
0 0 1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842 0 0
0 0 1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738 0 0
0 0 1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829 0 0
0 0 1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776 0 0
0 0 1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721 0 0
0 0 1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844 0 0
0 0 1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887 0 0
0 0 1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800 0 0
0 0 0 1782 0 0 0 0 1879 0 0 0 0 1884 0 0 0 0 0 0 0 1893 0 1932 1909 1938 0 0 0 0 0 1928 0 0 1816 0 0 1921 1887 0 0 0 0 1876 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1907 0 0 0 0 1944 0 0 0 0 1954 0 0 0 0 0 0 0 1930 0 1875 1882 1912 0 0 0 0 0 1890 0 0 1875 0 0 1873 1872 0 0 0 0 1897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Welcome to StackOverflow!
Input:
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 1872 ... 1765 0 0]
...
[ 0 0 1850 ... 1800 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]
Input array.npy
Solution 1:
import numpy as np

np_input = np.load('array.npy')
# Drop columns that are entirely zero
np_input = np_input[:, (np_input != 0).any(axis=0)]
# Drop rows that are entirely zero
np_input = np_input[(np_input != 0).any(axis=1)]
# Convert to a list of lists
np_input = np_input.tolist()
# Drop any remaining row (sub-list) that contains a zero
np_input = [x for x in np_input if 0 not in x]
# Convert the filtered list back to a NumPy array
final_np = np.array(np_input)
print(final_np)
Solution 2:
import numpy as np
np_input = np.load('array.npy')
# Same steps as Solution 1, written as a single expression
final_np = np.array([x for x in np_input[:, (np_input != 0).any(axis=0)][(np_input != 0).any(axis=1)].tolist() if 0 not in x])
print(final_np)
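The same filtering can probably be done without the round trip through Python lists. The sketch below is an alternative, not part of the two solutions above: `crop_nonzero` is a hypothetical helper name, and it is run on the small example from the question rather than on `array.npy`. Like the solutions above, it drops every row that still contains a zero after the all-zero rows and columns are removed, so a zero in the middle of the image would also remove its row.

```python
import numpy as np

def crop_nonzero(arr):
    """Drop all-zero rows/columns, then drop rows that still contain a zero."""
    arr = np.asarray(arr)
    nonzero = arr != 0
    # Keep only columns, then rows, that hold at least one non-zero value.
    arr = arr[:, nonzero.any(axis=0)]
    arr = arr[nonzero.any(axis=1)]
    # Keep only rows in which every remaining value is non-zero.
    return arr[(arr != 0).all(axis=1)]

# Small example from the question.
example = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 4, 6, 1, 0],
    [0, 2, 3, 5, 2, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
])
cropped = crop_nonzero(example)
print(cropped)
```

Staying in NumPy keeps the array's dtype and avoids building intermediate lists, which may matter for larger images.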
Output:
[[1872 1803 1731 ... 1709 1774 1765]
[1937 1746 1790 ... 1685 1814 1756]
[1754 1895 1806 ... 1817 1885 1792]
...
[1861 1895 1819 ... 1861 1867 1844]
[1822 1867 1806 ... 1786 1919 1887]
[1850 1926 1855 ... 1861 1761 1800]]
Output array.npy
1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765
1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756
1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792
1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817
1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754
1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864
1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824
1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876
1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754
1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928
1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798
1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875
1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825
1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813
1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857
1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857
1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818
1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898
1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822
1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834
1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819
1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907
1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861
1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881
1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873
1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842
1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888
1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873
1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796
1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755
1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816
1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855
1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787
1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842
1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738
1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829
1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776
1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721
1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844
1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887
1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800
Assuming, as you say, that there are unlikely to be any zeros in the middle of the array, we can check whether a row contains any zeros with (data == 0).any(axis=1) (use axis=0 for columns), and whether a row contains only zeros with (data == 0).all(axis=1). Take your example:
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 3, 4, 6, 1, 0],
[0, 2, 3, 5, 2, 1, 0],
[0, 1, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
To start, we want to delete those rows and columns that are all zeros.
delete_rows = (data == 0).all(axis=1)
delete_cols = (data == 0).all(axis=0)
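To see why the -999 step that follows is needed, here is a minimal sketch of what these masks look like on the example array. The names is_zero, all_zero_rows, any_zero_rows and all_zero_cols are only illustrative and do not appear in the answer.

```python
import numpy as np

data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 3, 4, 6, 1, 0],
                 [0, 2, 3, 5, 2, 1, 0],
                 [0, 1, 0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0]])

is_zero = data == 0

all_zero_rows = is_zero.all(axis=1)  # True only for the first and last row
all_zero_cols = is_zero.all(axis=0)  # True only for the first and last column
any_zero_rows = is_zero.any(axis=1)  # True for every row, because of the zero border columns

print(all_zero_rows)
print(all_zero_cols)
print(any_zero_rows)
```

Because every row still touches the all-zero border columns, any(axis=1) cannot yet tell a "clean" row from a partly zero one, which is what masking the border with an out-of-range value addresses.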
For now, let's set those rows to -999 (since your data is pixel data, -999 is an invalid value that you never expect to see) so that data == 0 for the future steps isn't confused by these "border" rows/cols
data[delete_rows, :] = -999
data[:, delete_cols] = -999
Next, let's find any rows that contain any zeros and are next to a row that's going to be deleted (previous or next row is in delete_rows):
zero_rows = (data == 0).any(axis=1)
d_r = np.zeros(zero_rows.shape, dtype=bool)
d_r[1:] = d_r[1:] | delete_rows[:-1]
d_r[:-1] = d_r[:-1] | delete_rows[1:]
d_r[0] = d_r[-1] = True
delete_rows = delete_rows | (zero_rows & d_r)
data[delete_rows, :] = -999
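The two shifted assignments are the least obvious part of this step, so here is a small sketch of what they compute on the example. It assumes delete_rows already holds the all-zero rows and that zero_rows is what (data == 0).any(axis=1) returns after the -999 step; near_deleted is just a more descriptive, hypothetical name for d_r.

```python
import numpy as np

# Masks as they stand for the 5-row example after the -999 step
delete_rows = np.array([True, False, False, False, True])
zero_rows = np.array([False, False, False, True, False])

near_deleted = np.zeros(delete_rows.shape, dtype=bool)
near_deleted[1:] |= delete_rows[:-1]       # the row above is marked for deletion
near_deleted[:-1] |= delete_rows[1:]       # the row below is marked for deletion
near_deleted[0] = near_deleted[-1] = True  # the outermost rows always count as border

new_delete_rows = delete_rows | (zero_rows & near_deleted)
print(near_deleted)     # [ True  True False  True  True]
print(new_delete_rows)  # [ True False False  True  True]
```

Row 3 contains zeros and sits next to the deleted last row, so it is added to the mask; rows 1 and 2 are next to a border but contain no zeros, so they stay.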
We can repeat this until there are no more changes to delete_rows. I.e.:
del_count = sum(delete_rows)
prev_del_count = del_count + 1
while del_count != prev_del_count:
    zero_rows = (data == 0).any(axis=1)
    d_r = np.zeros(zero_rows.shape, dtype=bool)
    d_r[1:] = d_r[1:] | delete_rows[:-1]
    d_r[:-1] = d_r[:-1] | delete_rows[1:]
    d_r[0] = d_r[-1] = True # First and last rows can be deleted if they have any zeros
    delete_rows = delete_rows | (zero_rows & d_r)
    prev_del_count, del_count = del_count, sum(delete_rows)
data[delete_rows, :] = -999
Then, we can do the same for columns:
del_count = sum(delete_cols)
prev_del_count = del_count + 1
while del_count != prev_del_count:
    zero_cols = (data == 0).any(axis=0)
    d_c = np.zeros(zero_cols.shape, dtype=bool)
    d_c[1:] = d_c[1:] | delete_cols[:-1]
    d_c[:-1] = d_c[:-1] | delete_cols[1:]
    d_c[0] = d_c[-1] = True # First and last cols can be deleted if they have any zeros
    delete_cols = delete_cols | (zero_cols & d_c)
    prev_del_count, del_count = del_count, sum(delete_cols)
data[:, delete_cols] = -999
Now, we have:
delete_rows = np.array([ True, False, False, True, True])
delete_cols = np.array([ True, False, False, False, False, False, True])
And we can filter out the required rows and cols:
filtered_data = data[~delete_rows, :][:, ~delete_cols]
which gives:
array([[1, 3, 4, 6, 1],
[2, 3, 5, 2, 1]])
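As a side note on the final filtering step, here is a short sketch comparing the chained indexing used above with np.ix_, which builds the row/column grid in a single indexing operation. The masks are copied from the values shown above; chained and with_ix are illustrative names.

```python
import numpy as np

data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 3, 4, 6, 1, 0],
                 [0, 2, 3, 5, 2, 1, 0],
                 [0, 1, 0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0]])
delete_rows = np.array([True, False, False, True, True])
delete_cols = np.array([True, False, False, False, False, False, True])

# Two-step selection: keep rows first, then keep columns
chained = data[~delete_rows, :][:, ~delete_cols]

# One-step selection: np.ix_ turns the two masks into an open mesh of indices
with_ix = data[np.ix_(~delete_rows, ~delete_cols)]

print(with_ix)
print(np.array_equal(chained, with_ix))  # True
```

Passing both masks directly, as in data[~delete_rows, ~delete_cols], does not select a grid: NumPy tries to pair the row and column indices element by element, which is why one of the two forms above is needed.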
Running this on the larger array, we get the desired result:
def remove_outside_zeros(data):
    delete_rows = (data == 0).all(axis=1)
    delete_cols = (data == 0).all(axis=0)
    data[delete_rows, :] = -999
    data[:, delete_cols] = -999
    del_count = sum(delete_rows)
    prev_del_count = del_count + 1
    while del_count != prev_del_count:
        zero_rows = (data == 0).any(axis=1)
        d_r = np.zeros(zero_rows.shape, dtype=bool)
        d_r[1:] = d_r[1:] | delete_rows[:-1]
        d_r[:-1] = d_r[:-1] | delete_rows[1:]
        d_r[0] = d_r[-1] = True
        delete_rows = delete_rows | (zero_rows & d_r)
        prev_del_count, del_count = del_count, sum(delete_rows)
    data[delete_rows, :] = -999
    del_count = sum(delete_cols)
    prev_del_count = del_count + 1
    while del_count != prev_del_count:
        zero_cols = (data == 0).any(axis=0)
        d_c = np.zeros(zero_cols.shape, dtype=bool)
        d_c[1:] = d_c[1:] | delete_cols[:-1]
        d_c[:-1] = d_c[:-1] | delete_cols[1:]
        d_c[0] = d_c[-1] = True
        delete_cols = delete_cols | (zero_cols & d_c)
        prev_del_count, del_count = del_count, sum(delete_cols)
    data[:, delete_cols] = -999
    return data[~delete_rows, :][:, ~delete_cols]
arr = """0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765 0 0
0 0 1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756 0 0
0 0 1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792 0 0
0 0 1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817 0 0
0 0 1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754 0 0
0 0 1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864 0 0
0 0 1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824 0 0
0 0 1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876 0 0
0 0 1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754 0 0
0 0 1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928 0 0
0 0 1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798 0 0
0 0 1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875 0 0
0 0 1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825 0 0
0 0 1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813 0 0
0 0 1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857 0 0
0 0 1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857 0 0
0 0 1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818 0 0
0 0 1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898 0 0
0 0 1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822 0 0
0 0 1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834 0 0
0 0 1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819 0 0
0 0 1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907 0 0
0 0 1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861 0 0
0 0 1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881 0 0
0 0 1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873 0 0
0 0 1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842 0 0
0 0 1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888 0 0
0 0 1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873 0 0
0 0 1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796 0 0
0 0 1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755 0 0
0 0 1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816 0 0
0 0 1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855 0 0
0 0 1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787 0 0
0 0 1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842 0 0
0 0 1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738 0 0
0 0 1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829 0 0
0 0 1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776 0 0
0 0 1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721 0 0
0 0 1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844 0 0
0 0 1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887 0 0
0 0 1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800 0 0
0 0 0 1782 0 0 0 0 1879 0 0 0 0 1884 0 0 0 0 0 0 0 1893 0 1932 1909 1938 0 0 0 0 0 1928 0 0 1816 0 0 1921 1887 0 0 0 0 1876 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1907 0 0 0 0 1944 0 0 0 0 1954 0 0 0 0 0 0 0 1930 0 1875 1882 1912 0 0 0 0 0 1890 0 0 1875 0 0 1873 1872 0 0 0 0 1897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"""
data = np.array([row.split() for row in arr.split("\n")], dtype=int)
r = remove_outside_zeros(data)
print(r)
gives:
array([[1872, 1803, 1731, ..., 1709, 1774, 1765],
[1937, 1746, 1790, ..., 1685, 1814, 1756],
[1754, 1895, 1806, ..., 1817, 1885, 1792],
...,
[1861, 1895, 1819, ..., 1861, 1867, 1844],
[1822, 1867, 1806, ..., 1786, 1919, 1887],
[1850, 1926, 1855, ..., 1861, 1761, 1800]])
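For comparison, here is a minimal sketch of a simpler variant that avoids the -999 sentinel and does not modify the input. It is not the algorithm above: it assumes zeros only matter on the outer edges, drops all-zero rows and columns first, then peels edge rows and finally edge columns for as long as they contain a zero. The function name peel_zero_border is hypothetical.

```python
import numpy as np

def peel_zero_border(a):
    """Drop all-zero rows/columns, then trim edge rows and edge columns containing a zero."""
    a = np.asarray(a)
    zero = a == 0
    a = a[~zero.all(axis=1)][:, ~zero.all(axis=0)]

    # Trim rows from the top and bottom while they still contain a zero
    top, bottom = 0, a.shape[0]
    while top < bottom and (a[top] == 0).any():
        top += 1
    while bottom > top and (a[bottom - 1] == 0).any():
        bottom -= 1
    a = a[top:bottom]

    # Then do the same for columns from the left and right
    left, right = 0, a.shape[1]
    while left < right and (a[:, left] == 0).any():
        left += 1
    while right > left and (a[:, right - 1] == 0).any():
        right -= 1
    return a[:, left:right]

small = np.array([[0, 0, 0, 0, 0, 0, 0],
                  [0, 1, 3, 4, 6, 1, 0],
                  [0, 2, 3, 5, 2, 1, 0],
                  [0, 1, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0]])
trimmed = peel_zero_border(small)
print(trimmed)
```

On the small example this gives the same 2x5 result. It can differ from remove_outside_zeros when all-zero rows or columns occur in the interior, since it does not treat their neighbours as border.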
I'm having issues with scraping basketball-reference.com. I'm trying to access the "Team Per Game Stats" table but can't seem to target the correct div/table. I'm trying to capture the table and bring it into a dataframe using pandas.
I've tried using soup.find and soup.find_all to find all the tables, but when I search the results I do not see the ID of the table I am looking for. See below.
x = soup.find("table", id="team-stats-per_game")
import csv, time, sys, math
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
#NBA season
year = 2019
# URL of the page we will be scraping
url = "https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base".format(year)
# Basketball reference URL
html = urlopen(url)
soup = BeautifulSoup(html,'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
Result:
None
I expect the output to list the table elements, specifically tr and th tags to target and bring into a pandas df.
As Jarett mentioned above, BeautifulSoup can't parse your tag. In this case it's because it's commented out in the source.
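While this is admittedly an amateurish approach, it works for your data. It relies on html.text, so html is assumed here to be a requests response (e.g. html = requests.get(url)) rather than the object returned by urlopen.
table_src = html.text.split('<div class="overthrow table_container" id="div_team-stats-per_game">')[1].split('</table>')[0] + '</table>'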
table = BeautifulSoup(table_src, 'lxml')
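To illustrate why soup.find comes back empty when the table is commented out, here is a small sketch that uses only the standard library's html.parser rather than BeautifulSoup. The page fragment is made up (it only mimics a table wrapped in a comment), and TableFinder is a hypothetical helper.

```python
from html.parser import HTMLParser

class TableFinder(HTMLParser):
    """Collects the ids of <table> tags and the raw text of HTML comments."""

    def __init__(self):
        super().__init__()
        self.table_ids = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        self.comments.append(data)

# Made-up fragment: the table markup sits inside an HTML comment
page = """
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game"><tr><th>Team</th></tr></table>
-->
</div>
"""

outer = TableFinder()
outer.feed(page)
outer.close()
print(outer.table_ids)  # [] because the parser only sees a comment, not a table tag

# Parsing the comment text on its own exposes the table
inner = TableFinder()
for comment in outer.comments:
    inner.feed(comment)
inner.close()
print(inner.table_ids)  # ['team-stats-per_game']
```

The same idea applies with BeautifulSoup: the commented-out markup has to be extracted as text and parsed a second time before the table can be found.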
The tables are rendered afterwards by JavaScript, so you could use Selenium to let the page render, or split the raw source as mentioned above. Neither is necessary, though, as most of the tables sit inside HTML comments. You can use BeautifulSoup to pull out the comments, then search through those for the table tags.
import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd
#NBA season
year = 2019
url = 'https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base'.format(year)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
tables = []
for each in comments:
    if 'table' in each:
        try:
            tables.append(pd.read_html(each)[0])
        except:
            continue
This will return you a list of dataframes, so just pull out the table you want from wherever it is located by its index position:
Output:
print (tables[3])
Rk Team G MP FG ... STL BLK TOV PF PTS
0 1.0 Milwaukee Bucks* 82 19780 3555 ... 615 486 1137 1608 9686
1 2.0 Golden State Warriors* 82 19805 3612 ... 625 525 1169 1757 9650
2 3.0 New Orleans Pelicans 82 19755 3581 ... 610 441 1215 1732 9466
3 4.0 Philadelphia 76ers* 82 19805 3407 ... 606 432 1223 1745 9445
4 5.0 Los Angeles Clippers* 82 19830 3384 ... 561 385 1193 1913 9442
5 6.0 Portland Trail Blazers* 82 19855 3470 ... 546 413 1135 1669 9402
6 7.0 Oklahoma City Thunder* 82 19855 3497 ... 766 425 1145 1839 9387
7 8.0 Toronto Raptors* 82 19880 3460 ... 680 437 1150 1724 9384
8 9.0 Sacramento Kings 82 19730 3541 ... 679 363 1095 1751 9363
9 10.0 Washington Wizards 82 19930 3456 ... 683 379 1154 1701 9350
10 11.0 Houston Rockets* 82 19830 3218 ... 700 405 1094 1803 9341
11 12.0 Atlanta Hawks 82 19855 3392 ... 675 419 1397 1932 9294
12 13.0 Minnesota Timberwolves 82 19830 3413 ... 683 411 1074 1664 9223
13 14.0 Boston Celtics* 82 19780 3451 ... 706 435 1052 1670 9216
14 15.0 Brooklyn Nets* 82 19980 3301 ... 539 339 1236 1763 9204
15 16.0 Los Angeles Lakers 82 19780 3491 ... 618 440 1284 1701 9165
16 17.0 Utah Jazz* 82 19755 3314 ... 663 483 1240 1728 9161
17 18.0 San Antonio Spurs* 82 19805 3468 ... 501 386 992 1487 9156
18 19.0 Charlotte Hornets 82 19830 3297 ... 591 405 1001 1550 9081
19 20.0 Denver Nuggets* 82 19730 3439 ... 634 363 1102 1644 9075
20 21.0 Dallas Mavericks 82 19780 3182 ... 533 351 1167 1650 8927
21 22.0 Indiana Pacers* 82 19705 3390 ... 713 404 1122 1594 8857
22 23.0 Phoenix Suns 82 19880 3289 ... 735 418 1279 1932 8815
23 24.0 Orlando Magic* 82 19780 3316 ... 543 445 1082 1526 8800
24 25.0 Detroit Pistons* 82 19855 3185 ... 569 331 1135 1811 8778
25 26.0 Miami Heat 82 19730 3251 ... 627 448 1208 1712 8668
26 27.0 Chicago Bulls 82 19905 3266 ... 603 351 1159 1663 8605
27 28.0 New York Knicks 82 19780 3134 ... 557 422 1151 1713 8575
28 29.0 Cleveland Cavaliers 82 19755 3189 ... 534 195 1106 1642 8567
29 30.0 Memphis Grizzlies 82 19880 3113 ... 684 448 1147 1801 8490
30 NaN League Average 82 19815 3369 ... 626 406 1155 1714 9119
[31 rows x 25 columns]
As the other answers mention, this happens because part of the page content is loaded with JavaScript, and fetching the source with urlopen or requests does not load that dynamic part.
One way around it is to use Selenium to let the dynamic content load, then take the page source from the browser and search it for the table.
Here is code that gives the result you expected.
Note that you will need to set up a Selenium web driver first.
from lxml import html
from bs4 import BeautifulSoup
from time import sleep
from selenium import webdriver
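def parse(url):
    # Start a Firefox session, load the page, and give the JavaScript time to run
    driver = webdriver.Firefox()
    driver.get(url)
    sleep(3)
    source_code = driver.page_source
    driver.quit()  # close the browser once the source has been captured
    return source_code
year = 2019
soup = BeautifulSoup(parse("https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base".format(year)), 'lxml')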
x = soup.find("table", id="team-stats-per_game")
print(x)
Hope this helped you with your problem and feel free to ask any further doubts.
Happy Coding:)
I have a file that has the following:
A B C D
1 2 3 4 5
2 2 4
3 1 3 4
Note that 4 on line 2 is followed immediately by the new line.
I want to make a dictionary that looks like this
d['A']['1'] = 2, d['B']['1'] = 3, ..., d['D']['1'] = 5, d['B']['2'] = 2, etc.
The blanks should not appear in the dictionary.
What's the best way to do this in python?
The data will all be single digits, right? So each value lines up with its column header? In that case, you can do this:
it = iter(datafile)
cols = list(next(it)[2::2])
d = {}
for row in it:
    for col, val in zip(cols, row[2::2]):
        if val != ' ':
            d.setdefault(col, {})[row[0]] = int(val)
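As a worked example, here is the same slicing idea wrapped in a function and run on in-memory lines. The exact column layout of the sample is an assumption (blanks kept as spaces, so every value sits under its header, with the blank in row 3 placed under B purely for illustration), and parse_grid is a hypothetical name.

```python
def parse_grid(lines):
    """Build {column: {row_label: value}} from single-character, fixed-width rows."""
    it = iter(lines)
    cols = list(next(it)[2::2])  # header letters sit at offsets 2, 4, 6, ...
    d = {}
    for row in it:
        # row[0] is the row label; values share the header offsets
        for col, val in zip(cols, row[2::2]):
            if val != ' ':
                d.setdefault(col, {})[row[0]] = int(val)
    return d

# Assumed layout of the file, one string per line
sample = [
    "  A B C D\n",
    "1 2 3 4 5\n",
    "2   2 4\n",
    "3 1   3 4\n",
]
result = parse_grid(sample)
print(result)
# {'A': {'1': 2, '3': 1}, 'B': {'1': 3, '2': 2}, 'C': {'1': 4, '2': 4, '3': 3}, 'D': {'1': 5, '3': 4}}
```

Because zip stops at the shorter sequence, a row that ends early (like row 2, where the 4 is followed directly by the newline) simply produces no entry for the missing columns.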
Based on the data and code the author recently added, the above code clearly isn't enough. If the format of the document will always be 31 rows of data pairs for 12 months, in groups of 6, we could handle it in many ways. This is what I wrote. It's not the most elegant and probably not as efficient as it could be, but it gets the job done. This is one of the reasons why you index by row first, then column.
def process(data):
    import re
    hre = re.compile(r' +([A-Z]+)'*6)
    sre = re.compile(r' +([a-z]+) ([a-z]+)'*6)
    dre = re.compile(r'(\d{1,2}) ' + r'(.{4}) (.{4}) {,4}'*6)
    it = iter(data)
    headers = None
    result = {}
    for line in it:
        if not line: continue
        if not headers:
            # find the first header
            hmatch = hre.match(line)
            if hmatch:
                subs = iter(sre.match(next(it)).groups())
                headers = [h + next(subs)
                           for h in hmatch.groups()
                           for _ in range(2)]
                count = 0
        else:
            # fill in the data
            dmatch = dre.match(line)
            row = dmatch.group(1)
            for col, d in zip(headers, dmatch.groups()[1:]):
                if d.strip():
                    result.setdefault(col, {})[row] = int(d)
            count += 1
            if count == 31:
                headers = None
    return result
data = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
""".split('\n')
>>> d = process(data)
>>> d['DECrise']['8']
503
>>> d
{'AUGset': {'24': 1755, '25': 1755, '26': 1756, '27': 1756, '20': 1752...
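The parsed values are plain integers such as 503 or 1755, i.e. the table's HHMM entries with any leading zero dropped. As a minimal sketch, assuming you want real time objects, they can be converted like this (`hhmm_to_time` is a hypothetical helper, not part of the `process` function above):

```python
import datetime

def hhmm_to_time(value):
    """Convert an HHMM value (int such as 503, or str such as '0503') to a time."""
    # divmod splits 503 into (5, 3) and 1755 into (17, 55)
    hours, minutes = divmod(int(value), 100)
    return datetime.time(hours, minutes)

print(hhmm_to_time(503))     # the d['DECrise']['8'] value shown above
print(hhmm_to_time('1755'))
```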
For fun and interest, I came up with a totally different answer;
import datetime
import math
import ephem # PyEphem module
class SunTimes(object):
    """Helper class for finding sun rise/set times
    @param date: observation date, one of
        string, "yyyy[/mm[/dd[ hh[:mm[:ss]]]]]"
            (Unspecified pieces are assumed to be 0)
        datetime.date
    @param lat: latitude, one of
        string, "d[:mm[:ss]]" angle measured in degrees, minutes, seconds
            (Unspecified pieces are assumed to be 0)
        ephem.Angle
        numeric angle in degrees
    @param lon: longitude, same types as lat
    @param fromCity: string, city name
        If specified, overrides lat and lon
        If city is not recognized, raises KeyError
    """
    def __init__(self, *args, **kwargs):
        super(SunTimes,self).__init__()
        self.sun = ephem.Sun()
        self.date = ephem.Date(0)
        self._date = 0
        self.viewer = ephem.Observer()
        self._lat = ''
        self._lon = ''
        self._city = None
        self.dirty = True # lazy updates
        self._clean(*args, **kwargs)
    def _clean(self, date=None, lat=None, lon=None, fromCity=None):
        if date is not None and date != self._date:
            self.date = ephem.Date(date)
            self._date = date
            self.dirty = True
        if lat is not None and lat != self._lat:
            self.viewer.lat = self.getAngle(lat)
            self._lat = lat
            self.viewer.name = None
            self._city = None # an explicit latitude invalidates the cached city
            self.dirty = True
        if lon is not None and lon != self._lon:
            self.viewer.long = self.getAngle(lon)
            self._lon = lon
            self.viewer.name = None
            self._city = None # an explicit longitude invalidates the cached city
            self.dirty = True
        if fromCity is not None and fromCity != self._city:
            self.viewer = ephem.city(fromCity)
            self._city = fromCity
            self._lat = self.viewer.lat
            self._lon = self.viewer.long
            self.dirty = True
        if self.dirty:
            # recompute only when one of the inputs has changed
            self.viewer.date = self.date
            self.sun.compute(self.viewer)
            self.dirty = False
    def getAngle(self, value):
        if isinstance(value, ephem.Angle):
            return value
        elif isinstance(value, str):
            return ephem.degrees(value)
        else:
            return ephem.degrees(math.radians(value))
    def sunrise(self, *args, **kwargs):
        self._clean(*args, **kwargs)
        return self.sun.rise_time.datetime()
    def sunset(self, *args, **kwargs):
        self._clean(*args, **kwargs)
        return self.sun.set_time.datetime()
The tables given match local times in Perth, Australia very nicely:
sun = SunTimes(lat='-31.9273', lon='115.87925') # Perth
print sun.sunrise(date='2012/1/1')
>>> 2012-01-01 05:15:42.835679
print sun.sunset()
>>> 2012-01-01 19:24:23.083130
The times are not exactly identical; a comparison follows:
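A minimal sketch of such a comparison for the single date printed above. It assumes the table entries for 1 January (rise 0513, set 1925) and the two computed timestamps exactly as shown; `offset_seconds` is a hypothetical helper, and no other dates are checked here.

```python
import datetime

# Values copied from the table (1 January: rise 0513, set 1925) and from the
# printed SunTimes output for 2012/1/1.
TABULATED = {'rise': '0513', 'set': '1925'}
COMPUTED = {
    'rise': datetime.datetime(2012, 1, 1, 5, 15, 42, 835679),
    'set': datetime.datetime(2012, 1, 1, 19, 24, 23, 83130),
}

def offset_seconds(computed, hhmm):
    """Seconds by which the computed time is later than the tabulated 'HHMM' entry."""
    tabulated = computed.replace(hour=int(hhmm[:2]), minute=int(hhmm[2:]),
                                 second=0, microsecond=0)
    return (computed - tabulated).total_seconds()

offsets = {k: offset_seconds(COMPUTED[k], TABULATED[k]) for k in TABULATED}
for event in ('rise', 'set'):
    print('%s: %+.1f s' % (event, offsets[event]))
```

For this date the computed sunrise is about 2 minutes 43 seconds later than the table entry, and the computed sunset about 37 seconds earlier.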
Read each line into a string, parse it into a list with some empty elements like
(2,,2,4,)
and then convert that list into your dictionary entries. Before parsing you might want to read about the methods in the string module.
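A minimal sketch of that idea using only string methods. It assumes every cell is exactly two characters wide (a one-character value plus a separating space), so a blank cell shows up as an empty string; `parse_row`, `table_to_dict` and `SAMPLE` are hypothetical names, and the sample table is a guess at the layout of the small example.

```python
def parse_row(line, n_cols, width=2):
    """Slice a fixed-width line into n_cols cells; blank cells become ''."""
    padded = line.ljust(n_cols * width)
    return [padded[i * width:(i + 1) * width].strip() for i in range(n_cols)]

def table_to_dict(text, width=2):
    """Build {column: {row: value}} from a fixed-width table, skipping blanks."""
    lines = [ln for ln in text.split('\n') if ln.strip()]
    n_cols = (max(len(ln) for ln in lines) + width - 1) // width
    rows = [parse_row(ln, n_cols, width) for ln in lines]
    header = rows[0][1:]          # first cell of the header row is empty
    table = {name: {} for name in header}
    for row in rows[1:]:
        row_id = row[0]
        for name, cell in zip(header, row[1:]):
            if cell:              # skip blank cells
                table[name][row_id] = int(cell)
    return table

# Assumed layout: one-character values, single-space separators.
SAMPLE = "\n".join([
    "  A B C D",
    "1 2 3 4 5",
    "2   2 4",
    "3 1   3 4",
])

print(parse_row("2   2 4", 5))
print(table_to_dict(SAMPLE))
```

The fixed cell width is the weak point: a value such as 10 or 88 would no longer fit, so the width would have to be derived from the header positions instead.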
Looks like a homework problem regarding sparse matrices.
This is a possible solution, assuming text is the content of the input file:
lines = text.rstrip().split('\n')
lines = [line + ' ' * (max(map(len, lines)) - len(line)) for line in lines]
# pads lines with spaces so that all of them have the same length
# collapse each double space to one, so a blank cell becomes an empty string
rows = tuple(line.replace('  ', ' ').split(' ') for line in lines)
columns = tuple(zip(*rows)) # transpose rows matrix
table = dict()
for i, column in enumerate(columns):
    if i > 0: # skip first column
        table[rows[0][i]] = dict()
        for j, cell in enumerate(column):
            if j > 0 and cell.isdigit(): # skip header and blanks
                table[rows[0][i]][columns[0][j]] = int(cell)
print table # prints the resulting dict
This assumes that all data items are separated by a single whitespace and that 'blank' items consist of a single whitespace.
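To make that assumption concrete, here is a small sketch of the replace-then-split step on its own. `split_cells` is a hypothetical helper, and the sample lines assume the single-space layout described above.

```python
def split_cells(line):
    """Split a single-space-separated line, keeping blank cells as ''."""
    # A blank cell is one space sitting between two separator spaces
    # ("2   2"). Collapsing each double space to a single one leaves
    # "2  2", which split(' ') turns into ['2', '', '2'].
    return line.replace('  ', ' ').split(' ')

print(split_cells("  A B C D"))
print(split_cells("2   2 4"))
print(split_cells("3 1   3 4"))
```

If blanks or separators were wider than one space, the cell count would come out wrong, which is why the assumption matters.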
from collections import defaultdict
import re
testData = """
  A B C D
1 2 3 4 5
2   2 4
3 1   3 4
"""
def strToArray(s):
    item = re.compile(r'(\s|\S+)(?:\s|$)')
    return [item.findall(ln) for ln in s.split('\n') if len(ln)]
def arrayToDict(array):
    res = defaultdict(dict)
    xIds = array.pop(0)[1:]
    for row in array:
        yId = row.pop(0)
        for xId,item in zip(xIds,row):
            if item.strip():
                res[xId][yId] = int(item)
    return res
def main():
    data = arrayToDict(strToArray(testData))
    print data
if __name__=="__main__":
    main()
which results in
{'A': {'1': 2, '3': 1}, 'C': {'1': 4, '3': 3, '2': 4}, 'B': {'1': 3, '2': 2}, 'D': {'1': 5, '3': 4}}
Here's a solution using your later data:
data = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
"""
import re
import itertools
parsed = re.findall(r'''(?xm) # verbose, multiline
^ # start of line
(\d+) # the date
\s{2} # 2 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)|\s{9})? # rise/set time or 9 spaces (optional)
$ # end of line
''',data)
# transpose, throw out date line and create an iterator
# that will walk the original table column by column.
parsed = zip(*parsed)[1:]
data_gen = itertools.chain(*parsed)
sun = {}
# Date changes fastest, followed by 6 month step, then rise/set, then first 6 months.
for m in range(1,7):
    for t in ['rise','set']:
        for s in [0,6]:
            for d in range(1,32):
                data = next(data_gen)
                # handle blanks
                if data:
                    sun[m+s,d,t] = data
if __name__ == '__main__':
    print sun[11,24,'rise']
I propose to use the csv module.
The presence of blanks produces some hard problems. So I created a file containing
A B C D
1 2 3 4 5
2 8 2 4 10
3 1 88 3 4
and I wrote this code, which roughly processes this content as you wish:
f = open('gogo.txt','rb')
print f.read()
f.seek(0,0)
import csv
dodo = csv.reader(f, delimiter = ' ')
headers = dodo.next()[-4:]
print 'headers==',headers
print
d = {}
for k in headers:
    d[k] = {}
print d
print
for row in dodo:
    print row[0],row[1:]
    z = zip(headers,row[1:])
    print "z==",z
    for x,y in zip(headers,row[-4:]):
        print x,y
        d[x][row[0]] = y
    print d
    print '-----------------------------------'
print d
Result
A B C D
1 2 3 4 5
2 8 2 4 10
3 1 88 3 4
headers== ['A', 'B', 'C', 'D']
{'A': {}, 'C': {}, 'B': {}, 'D': {}}
1 ['2', '3', '4', '5']
z== [('A', '2'), ('B', '3'), ('C', '4'), ('D', '5')]
A 2
B 3
C 4
D 5
{'A': {'1': '2'}, 'C': {'1': '4'}, 'B': {'1': '3'}, 'D': {'1': '5'}}
-----------------------------------
2 ['8', '2', '4', '10']
z== [('A', '8'), ('B', '2'), ('C', '4'), ('D', '10')]
A 8
B 2
C 4
D 10
{'A': {'1': '2', '2': '8'}, 'C': {'1': '4', '2': '4'}, 'B': {'1': '3', '2': '2'}, 'D': {'1': '5', '2': '10'}}
-----------------------------------
3 ['1', '88', '3', '4']
z== [('A', '1'), ('B', '88'), ('C', '3'), ('D', '4')]
A 1
B 88
C 3
D 4
{'A': {'1': '2', '3': '1', '2': '8'}, 'C': {'1': '4', '3': '3', '2': '4'}, 'B': {'1': '3', '3': '88', '2': '2'}, 'D': {'1': '5', '3': '4', '2': '10'}}
-----------------------------------
{'A': {'1': '2', '3': '1', '2': '8'}, 'C': {'1': '4', '3': '3', '2': '4'}, 'B': {'1': '3', '3': '88', '2': '2'}, 'D': {'1': '5', '3': '4', '2': '10'}}
Since this has been described as homework, I'll leave it to you to work out how to improve this code so that it can process lines containing blanks.
Sloppy code, but I believe this does what you asked for:
jcomeau@intrepid:/tmp$ cat test.dat test.py; ./test.py
A B C D
1 2 3 4 5
2 2 4
3 1 3 4
#!/usr/bin/python
import re
input = open('test.dat')
data = input.readlines()
input.close()
pattern = '(\S+|\s?)\s'
parsed = [map(str.strip, re.compile(pattern).findall(line)) for line in data]
columns = parsed.pop(0)[1:]
rows = [r.pop(0) for r in parsed]
d = {}
for c in columns:
    if not d.has_key(c):
        d[c] = {}
    for r in rows:
        try:
            d[c][r] = int(parsed[rows.index(r)][columns.index(c)])
        except:
            pass
print d, d['A']['1'], d['B']['1'], d['D']['1'], d['B']['2']
{'A': {'1': 2, '3': 1}, 'C': {'1': 4, '3': 3, '2': 4}, 'B': {'1': 3, '2': 2}, 'D': {'1': 5, '3': 4}} 2 3 5 2
Hugh Bothwell and jcomeau_ictx, interesting ideas but not general enough. They don't work with the real data, although I'm sure you could find a way to make them work with regexps.
scoffey, thanks. I've used your idea of padding the lines to the same length.
eyquem, you must be dreaming. I never said anything about homework.
Below is my code with the real data now.
l = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
"""
l = l.split('\n')
l = filter(None, l)
f = map(lambda _: str.ljust(_[4:],78), l[7:38])
s = map(lambda _: str.ljust(_[4:],78), l[40:71])
l = map(lambda _: ''.join(_),zip(f,s))
a=[]
r = [13*i for i in xrange(13)]
for line in l:
    d = [(line[r[i]:r[i+1]]) for i in xrange(12)]
    a.append(d)
import numpy
a = numpy.transpose(a).tolist()
sun = {}
for m in xrange(12):
    a[m] = filter(lambda _: not _.isspace(), a[m])
    for d in xrange(len(a[m])):
        date = "%4d-%02d-%d" % (2012, m+1, d+1)
        sun[date] = {}
        sun[date]['rise'], sun[date]['set'] = a[m][d].split()
print sun
Andrey,
To automate the first part of the processing, I would do it like this:
l = filter(None,l.splitlines())
starts = [ i+1 for i,line in enumerate(l) if 'rise' in line and 'set' in line]
print 'starts==',starts
f = []
for line in l[starts[0]:]:
    if any(c not in ' 0123456789' for c in line):
        break
    else:
        f.append( line.partition(' ')[2] )
s = []
for line in l[starts[1]:]:
    if any(c not in ' 0123456789' for c in line):
        break
    else:
        s.append( line.partition(' ')[2] )
a= [ (x+' '+y).split(' ') for x,y in zip(f,s) ]
And improving the algorithm:
l = filter(None,l.splitlines())
starts = [ i+1 for i,line in enumerate(l) if 'rise' in line and 'set' in line]
print 'starts==',starts
nb, a = 0, []
while starts[1]+nb<len(l):
    line0 = l[starts[0]+nb].partition(' ')[2]
    line1 = l[starts[1]+nb].partition(' ')[2]
    if any(c not in ' 0123456789' for c in line0) or any(c not in ' 0123456789' for c in line1):
        break
    else:
        a.append((line0+' '+line1).split(' '))
    nb += 1
I can't go any further because I don't have numpy and I don't know the final data you want to obtain.
But one thing is evident:
for this kind of processing, it's absolutely necessary to use regexes: they will make the work several orders of magnitude more comfortable.
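As a minimal sketch of what a regex-based treatment could look like: the pattern finds each "rise set" pair, and the position of the match tells you which month column it belongs to, so blank months need no special handling. The layout (a 4-character day prefix followed by 13-character month columns) is an assumption taken from the widths used in the code earlier in this thread; `parse_line` and the rebuilt sample line are hypothetical.

```python
import re

PAIR = re.compile(r'(\d{4}) (\d{4})')   # "rise set", e.g. "0539 1920"

def parse_line(line, prefix=4, width=13):
    """Return (day, {column_index: (rise, set)}) for one table line."""
    day = int(line[:prefix])
    found = {}
    # start searching after the day prefix so the day is never matched
    for m in PAIR.finditer(line, prefix):
        column = (m.start() - prefix) // width   # 0 = first month in the block
        found[column] = (m.group(1), m.group(2))
    return day, found

# Day 30 of the JAN-JUN block, rebuilt under the assumed layout
# (the second column, FEB, is blank).
cells = ['0539 1920', '', '0626 1815', '0647 1739', '0707 1720', '0717 1722']
line = '30'.ljust(4) + ''.join(c.ljust(13) for c in cells)

day, found = parse_line(line)
print(day)
print(sorted(found.items()))
```

Because the column index comes from the match position rather than from counting fields, a missing month simply produces no entry for that index.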