How to get the body of the table using Python? - python

I am teaching myself web scraping and I am trying to get the tbody of a table with BeautifulSoup.
My attempt:
import requests
from bs4 import BeautifulSoup
url = 'https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm'
page = requests.get(url).content
soup = BeautifulSoup(page, 'lxml')
table = soup.find_all('table', class_='hover')
print(table)
That's what I get:
<table class="hover"></table>
Any hints are highly appreciated.

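The contents of the table with class_='hover' (tbody, tr, td and so on) are loaded dynamically by JavaScript, which is why requests returns the table without a tbody. You can get the rendered data by driving a real browser with Selenium and then parsing the result with pandas or bs4. I use Selenium with pandas.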
Script:
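import time
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.agrolok.pl/notowania/notowania-cen-pszenicy.htm')
driver.maximize_window()
# crude fixed wait so the JavaScript has time to fill in the table
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# newer pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(soup)))[0]
# promote the first row to column headers, then drop it from the data
d = df.rename(columns=df.iloc[0]).drop(df.index[0])
print(d)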
Output:
7/4/2022 1410 1380 343.25 4.7002 1613 1640
1 7/1/2022 1410 1300 334.50 4.7176 1578 1630
2 6/30/2022 1410 1320 350.25 4.6806 1639 1650
3 6/29/2022 1500 1380 358.50 4.6809 1678 1710
4 6/28/2022 1450 1360 356.75 4.7004 1677 1690
5 6/27/2022 1450 1360 350.00 4.6965 1644 1690
6 6/24/2022 1450 1360 357.25 4.7094 1682 1700
7 6/23/2022 1450 1360 359.00 4.7096 1691 1690
8 6/22/2022 1470 1410 370.50 4.6590 1726 1750
9 6/21/2022 1500 1370 372.50 4.6460 1731 1730
10 6/20/2022 1540 1460 388.25 4.6731 1814 1780
11 6/15/2022 1560 1460 392.75 4.6642 1832 1780
12 6/14/2022 1560 1460 392.25 4.6548 1826 1780
13 6/13/2022 1540 1460 394.50 4.6313 1827 1800
14 6/10/2022 1530 1450 391.75 4.6030 1803 1760
15 6/9/2022 1540 1500 386.25 4.5826 1770 1730
16 6/8/2022 1550 1520 381.75 4.5817 1749 1730
17 6/7/2022 1500 1540 385.50 4.5855 1768 1700
18 6/6/2022 1600 1510 397.50 4.5880 1824 1760
19 6/3/2022 1560 1490 378.25 4.5908 1736 1700
20 6/2/2022 1590 1490 382.50 4.5876 1755 1710
21 6/1/2022 1590 1490 380.50 4.5891 1746 1700
22 5/31/2022 1650 1560 392.25 4.5756 1795 1750
23 5/30/2022 1670 1590 406.75 4.5869 1866 1800
24 5/27/2022 1670 1580 414.75 4.6102 1912 1700
25 5/26/2022 1650 1580 409.50 4.6135 1889 1700
26 5/25/2022 1670 1600 404.50 4.5955 1859 1700
27 5/24/2022 1690 1630 410.50 4.6107 1893 1800
28 5/23/2022 1700 1600 426.00 4.6171 1966 1860
29 5/20/2022 1700 1630 420.75 4.6366 1951 1840
30 5/19/2022 1700 1640 422.25 4.6429 1960 1850
31 5/18/2022 1700 1640 430.50 4.6528 2003 1850
32 5/17/2022 1690 1640 438.25 4.6558 2040 1850
33 5/16/2022 1690 1640 438.25 4.6724 2048 1880
34 5/13/2022 1670 1560 416.50 4.6679 1944 1800
35 5/12/2022 1670 1540 414.25 4.6841 1940 1790
36 5/11/2022 1670 1540 403.25 4.6700 1883 1790
37 5/10/2022 1680 1560 396.50 4.6761 1854 1780
38 5/9/2022 1670 1560 394.50 4.7059 1856 1780
39 5/6/2022 1600 1580 406.25 4.6979 1909 1760
40 5/5/2022 1660 1610 401.00 4.6658 1871 1780
41 5/4/2022 1660 1630 390.50 4.6777 1827 1735
42 4/29/2022 1660 1630 400.75 4.6582 1867 1720
43 4/28/2022 1670 1640 416.50 4.6915 1954 1740
44 4/27/2022 1670 1630 418.25 4.7076 1969 1720
45 4/26/2022 1660 1640 415.25 4.6429 1928 1685
46 4/25/2022 1665 1630 408.25 4.6405 1894 1670
47 4/22/2022 1665 1650 407.00 4.6361 1887 1690
48 4/21/2022 1660 1650 405.75 4.6523 1888 1690
49 4/20/2022 1660 1660 398.50 4.6295 1845 1700
50 4/19/2022 1680 1660 399.50 4.6361 1852 1740
51 4/15/2022 1690 1660 401.00 4.6378 1860 1770
52 4/14/2022 1690 1660 401.00 4.6447 1863 1770
53 4/13/2022 1680 1630 403.00 4.6460 1872 1780
54 4/12/2022 1650 1620 399.25 4.6626 1862 1700
55 4/11/2022 1630 1590 379.50 4.6451 1763 1670
56 4/8/2022 1650 1610 372.75 4.6405 1730 1660
57 4/7/2022 1650 1610 363.75 4.6478 1691 1670
58 4/6/2022 1650 1600 364.00 4.6539 1694 1670
59 4/5/2022 1650 1620 364.50 4.6317 1688 1680
60 4/4/2022 1640 1610 363.75 4.6373 1687 1680
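Note that in the output above the first table row (7/4/2022) ended up as the column header, because the script promotes row 0 to the header. If the rendered table has no header row of its own, a minimal sketch like the following keeps every row as data instead. The helper name label_columns and the column names are placeholders made up for illustration; the meaning of each column is not stated here, and the two sample rows are simply copied from the output.

```python
import pandas as pd

# Two rows copied from the output above; in the real script this frame
# would come from pd.read_html(...)[0].
raw = pd.DataFrame([
    ["7/4/2022", 1410, 1380, 343.25, 4.7002, 1613, 1640],
    ["7/1/2022", 1410, 1300, 334.50, 4.7176, 1578, 1630],
])


def label_columns(df, names):
    """Return a copy of df with the given column names and a parsed date column."""
    out = df.copy()
    out.columns = names
    # The dates in the output look like month/day/year.
    out[names[0]] = pd.to_datetime(out[names[0]], format="%m/%d/%Y")
    return out


# Hypothetical names: the source does not say what each column means.
names = ["date", "col_1", "col_2", "col_3", "col_4", "col_5", "col_6"]
prices = label_columns(raw, names)
print(prices)
```

With named columns and real dates, sorting or filtering by date becomes straightforward.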

from bs4 import BeautifulSoup
# HTML is assumed to hold the page source as a string
soup = BeautifulSoup(HTML, 'html.parser')
# the first argument to find tells it what tag to search for;
# the second can be a dict of attr->value pairs to filter
# results that match the first tag
table = soup.find("table", {"title": "TheTitle"})
rows = list()
for row in table.find_all("tr"):
    rows.append(row)
# now rows contains each tr in the table (as a BeautifulSoup object)
# and you can search them to pull out the times

for i in table:
    tbody = i.find_all('tbody')
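To make the difference between the static and the rendered page concrete, here is a small sketch that pulls the rows out of an HTML string with BeautifulSoup. The two HTML snippets and the helper name table_rows are made up for illustration: the first mimics what requests returned in the question, the second mimics what a browser might hold after the JavaScript has run.

```python
from bs4 import BeautifulSoup


def table_rows(html, css_class="hover"):
    """Return the cell text of every <tr> in the first table with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=css_class)
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]


# Hypothetical inputs, not the real page source.
static_html = '<table class="hover"></table>'
rendered_html = (
    '<table class="hover"><tbody>'
    "<tr><td>7/4/2022</td><td>1410</td></tr>"
    "<tr><td>7/1/2022</td><td>1410</td></tr>"
    "</tbody></table>"
)

print(table_rows(static_html))    # no rows in the static HTML
print(table_rows(rendered_html))  # one list of cell strings per row
```

If the first call comes back empty for a page you fetched with requests, that is a good sign the rows are filled in by JavaScript and a browser-based approach is needed.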

Related

How do I crop a python array to maximum size with only non-zero values (largest non-zero rectangle)

I have a numpy array of pixel data, something like
0 0 0 0 0 0 0
0 1 3 4 6 1 0
0 2 3 5 2 1 0
0 1 0 0 1 0 0
0 0 0 0 0 0 0
I would like to get a new array which excludes any outer rows/columns that contain zeroes, so that I end up with only non-zero values (and this should work for any given array), i.e.
1 3 4 6 1
2 3 5 2 1
So far all I've managed to get is
1 3 4 6 1
2 3 5 2 1
1 0 0 1 0
using np.argwhere to find the "min" and "max" non-zero values, but this still includes rows/columns with zero and non-zero values in.
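For reference, the bounding-box crop described here can be sketched as follows on the small example array. This is a reconstruction of the approach, not the asker's actual code; the helper name crop_bounding_box is made up, and the sketch assumes the array has at least one non-zero value.

```python
import numpy as np


def crop_bounding_box(a):
    """Crop a 2-D array to the smallest box that contains every non-zero value."""
    coords = np.argwhere(a != 0)   # (row, col) index of each non-zero cell
    r0, c0 = coords.min(axis=0)    # top-left corner of the box
    r1, c1 = coords.max(axis=0)    # bottom-right corner of the box
    return a[r0:r1 + 1, c0:c1 + 1]


small = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 4, 6, 1, 0],
    [0, 2, 3, 5, 2, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

cropped = crop_bounding_box(small)
print(cropped)
```

The third row survives because it contains some non-zero values, which is exactly the problem described.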
My actual array:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765 0 0
0 0 1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756 0 0
0 0 1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792 0 0
0 0 1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817 0 0
0 0 1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754 0 0
0 0 1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864 0 0
0 0 1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824 0 0
0 0 1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876 0 0
0 0 1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754 0 0
0 0 1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928 0 0
0 0 1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798 0 0
0 0 1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875 0 0
0 0 1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825 0 0
0 0 1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813 0 0
0 0 1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857 0 0
0 0 1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857 0 0
0 0 1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818 0 0
0 0 1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898 0 0
0 0 1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822 0 0
0 0 1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834 0 0
0 0 1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819 0 0
0 0 1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907 0 0
0 0 1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861 0 0
0 0 1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881 0 0
0 0 1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873 0 0
0 0 1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842 0 0
0 0 1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888 0 0
0 0 1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873 0 0
0 0 1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796 0 0
0 0 1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755 0 0
0 0 1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816 0 0
0 0 1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855 0 0
0 0 1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787 0 0
0 0 1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842 0 0
0 0 1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738 0 0
0 0 1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829 0 0
0 0 1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776 0 0
0 0 1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721 0 0
0 0 1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844 0 0
0 0 1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887 0 0
0 0 1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800 0 0
0 0 0 1782 0 0 0 0 1879 0 0 0 0 1884 0 0 0 0 0 0 0 1893 0 1932 1909 1938 0 0 0 0 0 1928 0 0 1816 0 0 1921 1887 0 0 0 0 1876 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1907 0 0 0 0 1944 0 0 0 0 1954 0 0 0 0 0 0 0 1930 0 1875 1882 1912 0 0 0 0 0 1890 0 0 1875 0 0 1873 1872 0 0 0 0 1897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Welcome to StackOverflow!
Input:
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 1872 ... 1765 0 0]
...
[ 0 0 1850 ... 1800 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]
Input array.npy: the same array as posted in the question.
Solution 1:
import numpy as np
np_input = np.load('array.npy')
# Drop columns that contain only zeros
np_input = np_input[:, (np_input != 0).any(axis=0)]
# Drop rows that contain only zeros
np_input = np_input[(np_input != 0).any(axis=1)]
# Convert to a list of lists
np_input = np_input.tolist()
# Drop every row (sub-list) that still contains a zero
np_input = [x for x in np_input if 0 not in x]
# Convert the filtered rows back to a NumPy array
final_np = np.array(np_input)
print(final_np)
Solution 2:
import numpy as np
np_input = np.load('array.npy')
final_np = np.array([x for x in np_input[:, (np_input != 0).any(axis=0)][(np_input != 0).any(axis=1)].tolist() if 0 not in x])
print(final_np)
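The same filtering can also be sketched with boolean masks only, without the round trip through Python lists. This is a variant of the solutions above under the same assumption (after dropping all-zero columns, any row that still contains a zero is discarded). The name crop_nonzero_rows is made up, and the small array from the question stands in for array.npy.

```python
import numpy as np


def crop_nonzero_rows(a):
    """Drop all-zero columns, then keep only the rows that contain no zeros."""
    trimmed = a[:, (a != 0).any(axis=0)]        # columns with at least one non-zero
    return trimmed[(trimmed != 0).all(axis=1)]  # rows with no zero left in them


small = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 3, 4, 6, 1, 0],
    [0, 2, 3, 5, 2, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

result = crop_nonzero_rows(small)
print(result)
```

Like the solutions above, this discards any row with a zero in it, so it is not a general largest-rectangle search: a single zero in the middle of an otherwise full row removes that whole row.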
Output:
[[1872 1803 1731 ... 1709 1774 1765]
[1937 1746 1790 ... 1685 1814 1756]
[1754 1895 1806 ... 1817 1885 1792]
...
[1861 1895 1819 ... 1861 1867 1844]
[1822 1867 1806 ... 1786 1919 1887]
[1850 1926 1855 ... 1861 1761 1800]]
Output array.npy
1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765
1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756
1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792
1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817
1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754
1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864
1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824
1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876
1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754
1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928
1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798
1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875
1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825
1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813
1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857
1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857
1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818
1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898
1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822
1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834
1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819
1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907
1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861
1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881
1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873
1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842
1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888
1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873
1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796
1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755
1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816
1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855
1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787
1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842
1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738
1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829
1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776
1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721
1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844
1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887
1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800
If we go by your assumption that there likely won't be any zeros in the middle of the array, we can check whether a row contains any zeros with any(axis=1) (or axis=0 for columns), and whether a row consists entirely of zeros with all(axis=1). Take this small array as an example:
data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 3, 4, 6, 1, 0],
                 [0, 2, 3, 5, 2, 1, 0],
                 [0, 1, 0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0]])
To start, we want to delete those rows and columns that are all zeros.
delete_rows = (data == 0).all(axis=1)
delete_cols = (data == 0).all(axis=0)
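As a quick check, the sketch below prints the all and any masks for the small sample array. The variable names (is_zero, rows_all_zero and so on) are only illustrative. It also shows that, before anything is marked, every row reports "contains a zero" because of the border columns, so any() on its own cannot tell border rows from data rows yet.

```python
import numpy as np

data = np.array([[0, 0, 0, 0, 0, 0, 0],
                 [0, 1, 3, 4, 6, 1, 0],
                 [0, 2, 3, 5, 2, 1, 0],
                 [0, 1, 0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0]])

is_zero = data == 0
rows_all_zero = is_zero.all(axis=1)   # True only for rows made up entirely of zeros
cols_all_zero = is_zero.all(axis=0)   # same test, per column
rows_any_zero = is_zero.any(axis=1)   # True for rows with at least one zero

print(rows_all_zero)  # [ True False False False  True]
print(cols_all_zero)  # [ True False False False False False  True]
print(rows_any_zero)  # [ True  True  True  True  True]  (border columns are zero in every row)
```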
For now, let's set those rows and columns to -999 (since your data is pixel data, -999 is an invalid value that you never expect to see), so that the data == 0 checks in the later steps aren't confused by these "border" rows and columns:
data[delete_rows, :] = -999
data[:, delete_cols] = -999
Next, let's find any rows that contain any zeros and are next to a row that's going to be deleted (previous or next row is in delete_rows):
zero_rows = (data == 0).any(axis=1)
d_r = np.zeros(zero_rows.shape, dtype=bool)
d_r[1:] = d_r[1:] | delete_rows[:-1]
d_r[:-1] = d_r[:-1] | delete_rows[1:]
d_r[0] = d_r[-1] = True
delete_rows = delete_rows | (zero_rows & d_r)
data[delete_rows, :] = -999
We can repeat this until there are no more changes to delete_rows. I.e.:
del_count = sum(delete_rows)
prev_del_count = del_count + 1
while del_count != prev_del_count:
    zero_rows = (data == 0).any(axis=1)
    d_r = np.zeros(zero_rows.shape, dtype=bool)
    d_r[1:] = d_r[1:] | delete_rows[:-1]
    d_r[:-1] = d_r[:-1] | delete_rows[1:]
    d_r[0] = d_r[-1] = True # First and last rows can be deleted if they have any zeros
    delete_rows = delete_rows | (zero_rows & d_r)
    prev_del_count, del_count = del_count, sum(delete_rows)
data[delete_rows, :] = -999
Then, we can do the same for columns:
del_count = sum(delete_cols)
prev_del_count = del_count + 1
while del_count != prev_del_count:
    zero_cols = (data == 0).any(axis=0)
    d_c = np.zeros(zero_cols.shape, dtype=bool)
    d_c[1:] = d_c[1:] | delete_cols[:-1]
    d_c[:-1] = d_c[:-1] | delete_cols[1:]
    d_c[0] = d_c[-1] = True # First and last cols can be deleted if they have any zeros
    delete_cols = delete_cols | (zero_cols & d_c)
    prev_del_count, del_count = del_count, sum(delete_cols)
data[:, delete_cols] = -999
Now, we have:
delete_rows = np.array([ True, False, False, True, True])
delete_cols = np.array([ True, False, False, False, False, False, True])
And we can filter out the required rows and cols:
filtered_data = data[~delete_rows, :][:, ~delete_cols]
which gives:
array([[1, 3, 4, 6, 1],
[2, 3, 5, 2, 1]])
Running this on the larger array, we get the desired result:
def remove_outside_zeros(data):
    # Note: this overwrites the deleted rows/cols of `data` with -999 in place
    delete_rows = (data == 0).all(axis=1)
    delete_cols = (data == 0).all(axis=0)
    data[delete_rows, :] = -999
    data[:, delete_cols] = -999
    # Grow the set of deleted rows until it stops changing
    del_count = sum(delete_rows)
    prev_del_count = del_count + 1
    while del_count != prev_del_count:
        zero_rows = (data == 0).any(axis=1)
        d_r = np.zeros(zero_rows.shape, dtype=bool)
        d_r[1:] = d_r[1:] | delete_rows[:-1]
        d_r[:-1] = d_r[:-1] | delete_rows[1:]
        d_r[0] = d_r[-1] = True
        delete_rows = delete_rows | (zero_rows & d_r)
        prev_del_count, del_count = del_count, sum(delete_rows)
    data[delete_rows, :] = -999
    # Same for columns
    del_count = sum(delete_cols)
    prev_del_count = del_count + 1
    while del_count != prev_del_count:
        zero_cols = (data == 0).any(axis=0)
        d_c = np.zeros(zero_cols.shape, dtype=bool)
        d_c[1:] = d_c[1:] | delete_cols[:-1]
        d_c[:-1] = d_c[:-1] | delete_cols[1:]
        d_c[0] = d_c[-1] = True
        delete_cols = delete_cols | (zero_cols & d_c)
        prev_del_count, del_count = del_count, sum(delete_cols)
    data[:, delete_cols] = -999
    return data[~delete_rows, :][:, ~delete_cols]
arr = """0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1872 1803 1731 1766 1816 1843 1706 1768 1815 1741 1846 1857 1731 1745 1842 1720 1769 1853 1764 1776 1816 1773 1793 1767 1830 1791 1835 1823 1762 1832 1763 1762 1779 1901 1872 1819 1862 1802 1726 1788 1847 1785 1796 1773 1800 1742 1873 1830 1869 1832 1809 1861 1702 1808 1709 1774 1765 0 0
0 0 1937 1746 1790 1750 1862 1898 1770 1727 1868 1895 1761 1800 1814 1826 1836 1774 1847 1868 1837 1746 1809 1869 1818 1760 1940 1844 1845 1833 1815 1872 1773 1816 1769 1860 1841 1856 1857 1779 1779 1822 1781 1778 1858 1727 1816 1835 1835 1864 1793 1781 1908 1820 1803 1838 1685 1814 1756 0 0
0 0 1754 1895 1806 1818 1829 1733 1865 1903 1764 1850 1847 1913 1856 1757 1782 1826 1818 1875 1843 1777 1716 1825 1761 1842 1843 1925 1791 1879 1887 1873 1789 1769 1805 1915 1825 1829 1817 1840 1882 1762 1840 1878 1830 1862 1789 1884 1798 1802 1847 1875 1825 1773 1803 1850 1817 1885 1792 0 0
0 0 1773 1830 1797 1878 1758 1897 1813 1836 1835 1960 1841 1807 1788 1799 1839 1834 1792 1855 1785 1912 1824 1845 1831 1902 1879 1869 1793 1901 1801 1881 1871 1786 1851 1879 1822 1829 1951 1873 1778 1769 1941 1805 1826 1892 1869 1783 1895 1799 1800 1973 1829 1869 1903 1858 1806 1837 1817 0 0
0 0 1828 1858 1793 1833 1894 1832 1763 1892 1786 1893 1883 1846 1828 1821 1875 1864 1778 1863 1832 1801 1798 1871 1753 1899 1892 1901 1907 1877 1756 1865 1899 1874 1841 1775 1838 1817 1864 1798 1843 1803 1853 1878 1831 1855 1803 1816 1885 1818 1882 1859 1790 1892 1826 1906 1842 1831 1754 0 0
0 0 1811 1831 1837 1828 1792 1768 1818 1797 1766 1924 1849 1921 1881 1795 1883 1954 1811 1804 2006 1849 1841 1808 1867 1918 1755 1765 1881 1852 1930 1848 1807 1876 1776 1790 1849 1855 1942 1871 1908 1822 1810 1794 1889 1780 1857 1879 1845 1858 1901 1839 1744 1743 1811 1853 1841 1854 1864 0 0
0 0 1880 1888 1874 1878 1888 1868 1852 1887 1875 1874 1892 1828 1842 1822 1789 1870 1829 1841 1864 1859 1846 1776 1799 1875 1875 1811 1873 1837 1921 1917 1777 1840 1872 1816 1878 1890 1821 1925 1810 1945 1884 1845 1859 1843 1806 1894 1886 1886 1885 1931 1761 1819 1889 1765 1891 1896 1824 0 0
0 0 1856 1827 1826 1882 1786 1852 1820 1880 1912 1795 1854 1868 1899 1855 1886 1894 1891 1907 1907 1713 1800 1922 1831 1814 1894 1851 1927 1879 1881 1884 1932 1904 1807 1839 1851 1885 1889 1913 1878 1754 1930 1905 1915 1825 1901 1870 1839 1867 1897 1862 1843 1836 1774 1764 1838 1829 1876 0 0
0 0 1858 1840 1897 1884 1861 1910 1860 1879 1882 1860 1831 1828 1846 1820 1889 1830 1852 1880 1842 1917 1872 1839 1820 1888 1871 1838 1817 1939 1905 1890 1832 1925 1780 1862 1793 1887 1836 1846 1852 1939 1922 1874 1865 1890 1864 1863 1918 1819 1861 1851 1854 1886 1898 1888 1796 1917 1754 0 0
0 0 1891 1852 1926 1803 1863 1814 1849 1857 1870 1882 1979 1786 1880 1820 1812 1863 1922 1916 1851 1879 1827 1859 1913 1843 1852 1823 1812 1891 1932 1887 1883 1975 1769 1831 1859 1954 1780 1829 1853 1754 1832 1733 1886 1800 1808 1879 1821 1934 1897 1822 1941 1863 1818 1826 1883 1894 1928 0 0
0 0 1829 1820 1899 1869 1864 1863 1895 1923 1839 1804 1884 1835 1859 1872 1825 1841 1817 1817 1832 1882 1878 1854 1867 1917 1843 1928 1949 1859 1929 1938 1826 1808 1823 1872 1865 1811 1908 1848 1861 1926 1799 1825 1799 1859 1957 1848 1863 1846 1806 1934 1845 1899 1827 1881 1836 1806 1798 0 0
0 0 1794 1914 1880 1892 1849 1862 1819 1927 1873 1886 1857 1907 1840 1897 1857 1867 1925 1972 1871 1975 1854 1843 1856 1872 1875 1927 1819 1905 1948 1881 1904 1832 1863 1854 1811 1869 1797 1946 1805 1779 1824 1919 1886 1817 1845 1844 1909 1885 1900 1826 1867 1817 1833 1870 1888 1879 1875 0 0
0 0 1930 1857 1851 1862 1907 1924 1838 1833 1858 1847 1892 1788 1902 1786 1880 1818 1896 1938 1953 1952 1903 1723 1867 1955 1859 1869 1890 1830 1864 1837 1806 1827 1872 1868 1907 1977 1878 1895 1786 1892 1897 1872 1927 1807 1854 1865 1911 1957 1816 1833 1904 1897 1764 1895 1854 1800 1825 0 0
0 0 1889 1837 1887 1885 1865 1863 1779 1883 1815 1807 1856 1788 1857 1842 1812 1838 1949 1887 1909 1843 1848 1901 1812 1890 1882 1873 1835 1870 1855 1846 1811 1899 1855 1826 1916 1781 1887 1882 1887 1826 1848 1855 1804 1859 1827 1802 1884 1920 1920 1876 1839 1835 1822 1868 1844 1796 1813 0 0
0 0 1845 1883 1857 1790 1738 1915 1963 1899 1878 1890 1813 1779 1836 1832 1895 1863 1874 1899 1946 1851 1967 1816 1860 1860 1793 1852 1917 1904 1879 1911 1747 1939 1938 1849 1917 1894 1845 1895 1877 1903 1870 1868 1878 1857 1921 1858 1843 1800 1930 1820 1752 1827 1885 1927 1902 1842 1857 0 0
0 0 1916 1898 1929 1884 1981 1866 1940 1978 1848 1903 1935 1843 1817 1944 1871 1862 1917 1876 1920 1921 1789 1881 1938 1793 1906 1912 1854 1904 1855 1901 1877 1814 1894 1907 1894 1828 1839 1980 1805 1878 1861 1808 1885 1854 1958 1863 1756 1922 1898 1808 1822 1864 1916 1855 1919 1896 1857 0 0
0 0 1961 1800 1897 1857 1791 1823 1925 1827 1894 1911 1836 1826 1888 1854 1753 1841 1900 1859 1807 1910 1902 1908 1902 1920 1901 1951 1944 1920 1897 1889 1880 1873 1836 1886 1930 1856 1984 1935 1834 1926 1868 1932 1876 1891 1796 1814 1807 1824 1852 1888 1870 1911 1834 1845 1854 1863 1818 0 0
0 0 1885 1947 1836 1886 1803 1982 1901 1939 1930 1876 1832 1888 1886 1855 1845 1910 1877 1836 1910 1888 1904 1905 1859 1899 1834 1879 1893 1861 1896 1931 1855 1890 1964 1939 1798 1894 1844 1913 1906 1920 1873 1807 1875 1837 1900 1904 1919 1845 1895 1844 1793 1855 1926 1786 1917 1834 1898 0 0
0 0 1863 1856 1776 1925 1943 1875 1903 1858 1878 1865 1877 1821 1892 1914 1907 1863 1779 1879 1939 1893 1867 1846 1940 1910 1927 1920 1920 1934 1788 1851 1937 1943 1906 1853 1954 1910 1892 1857 1878 1853 1887 1876 1915 1819 1820 1933 1813 1848 1867 1866 1949 1905 1832 1876 1786 1918 1822 0 0
0 0 1897 1880 1904 1942 1886 1894 1887 1946 1881 1855 1924 1866 1905 1846 1960 1854 1878 1979 1908 1933 1868 1920 1938 1805 1882 1879 1850 1862 1889 1872 1900 1903 1856 1862 1862 1959 1886 1856 1910 1912 1847 1939 1884 1885 1798 1885 1825 1903 1837 1900 1825 1837 1845 1807 1890 1843 1834 0 0
0 0 1879 1896 1898 1980 1844 1889 2013 1938 1950 1877 1849 1916 1879 1871 1946 1916 1890 1945 1942 1934 1914 1821 1902 1938 1878 1906 1823 1927 1912 1948 1932 1927 1859 1819 1933 1927 1915 1789 1970 1930 1931 1831 1856 1890 1831 1852 1863 1884 1821 1842 1861 1843 1751 1872 1790 1852 1819 0 0
0 0 1884 1974 1825 1888 1932 1843 1911 1899 1905 1845 1847 1920 1883 1934 1879 1869 1792 2024 1882 1944 1850 1913 1899 1799 1899 1927 1849 1935 1880 1874 1888 1881 1870 1829 1908 1841 1957 1892 2001 1999 1941 1959 1917 1913 1893 1849 1908 1853 1928 1868 1784 1881 1871 1844 1754 1849 1907 0 0
0 0 1890 1898 1845 1922 1950 1938 1868 1915 1907 1858 1825 1867 1933 1921 1933 1820 1865 1851 1947 1903 1869 1871 1837 1941 1892 1833 1817 1856 1863 1884 1909 1875 1904 1943 1916 2001 1887 1858 1837 1875 1846 1824 1913 1831 1891 1901 1818 1908 1921 1864 1898 1869 1829 1733 1815 1824 1861 0 0
0 0 1902 1934 1894 1839 1894 1869 1962 1809 1891 1865 1957 1950 1926 1861 1954 1876 1782 1883 1959 1852 1849 1891 1887 1756 1861 1905 1894 1913 1831 1828 1906 1875 1981 1887 1990 1922 1825 1995 1831 1852 1864 1922 1878 1895 1897 1819 1851 1873 1799 1901 1810 1880 1922 1875 1858 1841 1881 0 0
0 0 1852 1867 1940 1858 1867 1888 1863 1839 1851 1885 1875 1928 1903 1913 1858 1838 1819 1818 1744 1850 1856 1884 1861 1846 1896 1891 1894 1946 1911 1888 1865 1849 1777 1893 2010 1931 1832 1901 1817 1900 1869 1863 1825 1848 1885 1893 1875 1843 1884 1819 1950 1899 1926 1837 1819 1876 1873 0 0
0 0 1872 1871 1884 1844 1847 1935 1859 1858 1894 1866 1930 1741 1919 1854 1855 1866 1833 1860 1875 1852 1976 1835 1811 1994 1897 1833 1891 1904 1938 1906 1802 1875 1861 1835 1939 1870 1877 1972 1949 1880 1881 1795 1792 1764 1945 1978 1875 1887 1861 1890 1832 1794 1873 1919 1797 1876 1842 0 0
0 0 1897 1884 1845 1842 1878 1918 1835 1866 1868 1858 1908 1900 1868 1756 1841 1746 1842 1891 1852 1889 1869 1886 1802 1902 1859 1935 1978 1880 1918 1865 1779 1889 1824 1781 1902 1890 1836 1833 1908 1865 1916 1916 1902 1796 1878 1858 1825 1914 1921 1829 1848 1862 1863 1847 1847 1831 1888 0 0
0 0 1856 1933 1882 1948 1882 2003 1938 1901 1856 1755 1834 1868 1861 1768 1863 1841 1814 1896 1859 1871 1860 1908 1912 1893 1896 1968 1863 1938 1920 1828 1952 1854 1867 1913 1764 1893 1876 1892 1901 1813 1890 1916 1915 1887 1836 1812 1798 1846 1867 1846 1866 1787 1915 1898 1911 1717 1873 0 0
0 0 1877 1885 1868 1858 1932 1949 1835 1849 1898 1867 1911 1902 1926 1859 1818 1941 1836 1816 1940 1908 1886 1818 1899 1948 1870 1845 1887 1925 1891 1823 1885 1844 1795 1886 1879 1865 1841 1830 1902 1946 1803 1889 1893 1856 1816 1853 1813 1851 1897 1852 1827 1918 1834 1859 1738 1808 1796 0 0
0 0 1838 1839 1997 1844 1855 1867 1953 1898 1876 1865 1882 1808 1857 1856 1850 1832 1892 1802 1858 1882 1896 1925 1840 1905 1895 1838 1865 1922 1904 1843 1958 1890 1907 1796 1858 1871 1906 1815 1888 1870 1902 1717 1868 1823 1888 1905 1821 1812 1928 1867 1787 1826 1821 1905 1839 1747 1755 0 0
0 0 1870 1868 1899 1915 1873 1841 1938 1918 1897 1902 1846 1887 1750 1868 1841 1828 1928 1852 1876 1905 1859 1838 1931 1871 1920 1779 1836 1897 1863 1937 1895 1934 1940 1872 1890 1893 1852 1874 1860 1857 1874 1903 1826 1873 1877 1833 1922 1847 1832 1874 1914 1829 1846 1863 1829 1913 1816 0 0
0 0 1887 1888 1924 1880 1818 1878 1842 1908 1947 1914 1848 1867 1868 1891 1874 1872 1900 1828 1905 1865 1925 1965 1868 1893 1864 1869 1868 1867 1863 1946 1822 1883 1863 1817 1948 1846 1843 1826 1832 1793 1825 1802 2014 1967 1832 1895 1848 1833 1914 1817 1898 1798 1910 1865 1862 1856 1855 0 0
0 0 1914 1862 1828 1924 1897 1984 1931 1925 1896 1895 1908 1933 1889 1813 1836 1921 1855 1841 1935 1917 1897 1890 1880 1904 1851 1937 1936 1920 1856 1798 1810 1819 1871 1855 1905 1832 1941 1844 1827 1855 1901 1846 1826 1762 1870 1899 1873 1853 1902 1839 1884 1841 1838 1816 1846 1860 1787 0 0
0 0 1869 1874 1867 1894 1865 1951 1865 1887 1857 1900 1839 1874 1877 1876 1845 1897 1881 1952 1832 1855 1855 1949 1889 1942 1844 1881 1937 1892 1779 1841 1893 1902 1814 1791 1858 1870 1874 1856 1814 1744 1799 1831 1839 1717 1878 1815 1846 1864 1832 1927 1808 1859 1818 1848 1828 1803 1842 0 0
0 0 1871 1884 1842 1834 1873 1884 1950 1911 1992 1847 1847 1834 1849 1809 1822 1927 1925 1835 1857 1891 1848 1833 1843 1939 1858 1871 1975 1816 1874 1915 1835 1918 1906 1902 1849 1863 1909 1798 1842 1910 1791 1843 1781 1832 1898 1889 1884 1853 1883 1855 1975 1767 1826 1761 1879 1814 1738 0 0
0 0 1886 1909 1873 1850 1908 1894 1907 1872 1837 1773 1847 1926 1884 1882 1831 1832 1942 1897 1844 1950 1886 1978 1947 1815 1843 1785 1886 1914 1911 1883 1824 1873 1934 1943 1831 1906 1813 1820 1831 1870 1824 1875 1866 1913 1800 1818 1930 1860 1808 1884 1834 1921 1717 1812 1816 1947 1829 0 0
0 0 1860 1893 1883 1843 1923 1853 1834 1858 1922 1944 1942 1839 1813 1852 1889 1945 1902 1977 1929 1881 1850 1967 1844 1877 1970 1850 1941 1897 1814 1894 1841 1837 1821 1866 1777 1805 1851 1889 1838 1843 1853 1776 1907 1909 1846 1781 1775 1876 1941 1851 1849 1854 1813 1885 1912 1887 1776 0 0
0 0 1819 1896 1911 1936 1887 1847 1874 1894 1855 1869 1843 1864 1921 1883 1875 1926 1866 1923 1886 1889 1844 1896 2002 1944 1909 1858 1927 1870 1882 1886 1899 1894 1809 1904 1786 1920 1908 1888 1901 1859 1857 1793 1880 1828 1809 1839 1905 1893 1849 1920 1837 1868 1910 1850 1873 1900 1721 0 0
0 0 1861 1895 1819 1865 1741 1797 1832 1849 1901 1869 1870 1811 1786 1910 1936 1961 1907 1899 1949 1863 1845 1885 1881 1831 1884 1937 1860 1906 1873 1838 1859 1898 1924 1863 1902 1881 1851 1880 1945 1851 1929 1846 1843 1879 1774 1826 1788 1871 1918 1780 1825 1853 1782 1852 1861 1867 1844 0 0
0 0 1822 1867 1806 1745 1942 1836 1841 1861 1787 1867 1947 1906 1826 1822 1935 1787 1879 1920 1830 1928 1879 1837 1921 1923 1855 1932 1844 1841 1917 1928 1865 1915 1873 1839 1846 1910 1896 1903 1911 1838 1857 1905 1870 1811 1899 1874 1860 1822 1935 1757 1862 1807 1856 1868 1786 1919 1887 0 0
0 0 1850 1926 1855 1766 1858 1815 1894 1861 1911 1910 1846 1861 1857 1800 1837 1784 1912 1937 1916 1942 1929 1866 1905 1916 1923 1922 1899 1838 1910 1872 1778 1849 1863 1868 1870 1828 1880 1793 1889 1937 1857 1888 1882 1946 1841 1838 1800 1819 1874 1918 1879 1895 1874 1884 1861 1761 1800 0 0
0 0 0 1782 0 0 0 0 1879 0 0 0 0 1884 0 0 0 0 0 0 0 1893 0 1932 1909 1938 0 0 0 0 0 1928 0 0 1816 0 0 1921 1887 0 0 0 0 1876 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1907 0 0 0 0 1944 0 0 0 0 1954 0 0 0 0 0 0 0 1930 0 1875 1882 1912 0 0 0 0 0 1890 0 0 1875 0 0 1873 1872 0 0 0 0 1897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"""
data = np.array([row.split() for row in arr.split("\n")], dtype=int)
r = remove_outside_zeros(data)
print(r)
gives:
array([[1872, 1803, 1731, ..., 1709, 1774, 1765],
[1937, 1746, 1790, ..., 1685, 1814, 1756],
[1754, 1895, 1806, ..., 1817, 1885, 1792],
...,
[1861, 1895, 1819, ..., 1861, 1867, 1844],
[1822, 1867, 1806, ..., 1786, 1919, 1887],
[1850, 1926, 1855, ..., 1861, 1761, 1800]])
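One thing to keep in mind is that the function writes -999 into the array you pass in. If you need the original afterwards, a variant that works on a copy might look like the sketch below. This is only a restructuring of the same idea: the names _grow_mask and remove_outside_zeros_copy are made up here, and it assumes a non-empty 2-D array with a signed integer dtype (so that -999 can be stored) in which -999 never occurs as real data.

```python
import numpy as np

def _grow_mask(has_zero, delete):
    """Extend `delete` to zero-containing lines that sit on the edge
    or next to an already deleted line, until nothing changes."""
    delete = delete.copy()
    while True:
        near = np.zeros_like(delete)
        near[1:] |= delete[:-1]      # previous line is deleted
        near[:-1] |= delete[1:]      # next line is deleted
        near[0] = near[-1] = True    # first and last lines always qualify
        grown = delete | (has_zero & near)
        if (grown == delete).all():
            return delete
        delete = grown

def remove_outside_zeros_copy(data):
    data = np.asarray(data)
    work = np.array(data, copy=True)  # scratch copy; the input stays untouched
    delete_rows = (work == 0).all(axis=1)
    delete_cols = (work == 0).all(axis=0)
    work[delete_rows, :] = -999       # assumes a signed dtype
    work[:, delete_cols] = -999
    delete_rows = _grow_mask((work == 0).any(axis=1), delete_rows)
    work[delete_rows, :] = -999
    delete_cols = _grow_mask((work == 0).any(axis=0), delete_cols)
    return data[~delete_rows, :][:, ~delete_cols]

small = np.array([[0, 0, 0, 0, 0, 0, 0],
                  [0, 1, 3, 4, 6, 1, 0],
                  [0, 2, 3, 5, 2, 1, 0],
                  [0, 1, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0]])
trimmed = remove_outside_zeros_copy(small)
print(trimmed)       # [[1 3 4 6 1]
                     #  [2 3 5 2 1]]
print(small[0, 0])   # 0, the input was not overwritten
```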

Error while training a model with keras train_function (empty Logs)

I am trying to train a new model based on a pre-trained BERT model. However, as soon as I start the training, I get an error with empty logs that seems to be related to this Keras issue (https://github.com/keras-team/keras/issues/16202). The data was prepared as follows:
Xtrain_encoded = np.array(tokenizer.batch_encode_plus(X_train.astype('str'), truncation=True, max_length = 512)['input_ids'])
ytrain_encoded = tf.keras.utils.to_categorical(y_train, num_classes=4, dtype = 'int32')
Xtest_encoded = np.array(tokenizer.batch_encode_plus(X_test.astype('str'), truncation=True, max_length = 512)['input_ids'])
ytest_encoded = tf.keras.utils.to_categorical(y_test, num_classes=4, dtype = 'int32')
tensor_with_from_dimensions = tf.ragged.constant(Xtrain_encoded, ytrain_encoded)
tensor_with_from_Xtrain_encoded = tf.ragged.constant(Xtrain_encoded)
tensor_with_from_Ytrain_encoded = tf.ragged.constant(ytrain_encoded)
tensor_with_from_Xtest_encoded = tf.ragged.constant(Xtest_encoded)
tensor_with_from_Ytest_encoded = tf.ragged.constant(ytest_encoded)
Then I created the training and test datasets:
BATCH_SIZE = 32*strategy.num_replicas_in_sync
AUTO = tf.data.experimental.AUTOTUNE
train_dataset = (
    tf.data.Dataset
    .from_tensor_slices((tensor_with_from_Xtrain_encoded, tensor_with_from_Ytrain_encoded))
    .repeat()
    .shuffle(2048)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
test_dataset = (
    tf.data.Dataset
    .from_tensor_slices(tensor_with_from_Xtest_encoded)
    .batch(BATCH_SIZE)
)
And this is where it breaks:
n_steps = tensor_with_from_Xtrain_encoded.shape[0] // BATCH_SIZE
train_history = model.fit(
    train_dataset,
    steps_per_epoch=n_steps,
    epochs=10
)
A single element of tensor_with_from_Xtrain_encoded and of tensor_with_from_Ytrain_encoded, respectively, looks like this (the full tensors are much larger):
tf.Tensor(
[ 101 11909 26136 23194 1000 1037 2329 10687 3005 2269 2003 15497
3979 1999 7128 2005 1037 3940 1997 2149 8408 20109 2003 1037
2877 8343 1999 1996 3347 25990 2278 2022 4974 2075 1997 2137
4988 2508 17106 1010 2009 2001 3936 2006 5958 1012 19935 2884
1011 16686 2098 19935 2884 3347 2100 2040 3728 1056 28394 3064
1037 6302 1997 2370 3173 2039 1037 16574 2132 2001 2426 2093
28101 2015 4453 2004 4298 2108 1996 16520 6359 2124 2004 2198
1996 3786 2571 1012 3347 2100 1010 2484 1010 2003 1996 2365
1997 2019 6811 1011 2141 16830 2040 2003 15497 3979 2006 7404
5571 5079 2000 1996 9252 2687 20109 1997 7861 22083 14625 1999
7938 1998 11959 1012 2036 2104 4812 2024 1996 2567 1997 1037
2329 3460 2320 5338 2007 15071 2048 2530 2162 11370 2015 1010
1998 1037 2280 6080 2266 2040 4991 2000 7025 1998 6158 2000
7795 1010 3725 2015 10013 3780 2988 1012 1037 6474 2137 4675
3334 29165 2964 8519 2024 3517 2000 4875 2000 1996 2866 2306
2420 2000 2393 6709 17106 2015 6359 1010 3725 2015 3679 5653
2988 1012 2280 19323 2218 2011 18301 2031 2056 2002 2003 2028
1997 2195 24815 5130 2027 9919 1996 11555 2349 2000 2037 2329
24947 1010 2007 2048 1997 2010 13675 10698 2229 3615 2000 2004
2577 1998 25589 1012 3347 2100 1010 2040 2253 2000 7795 2197
2095 2000 2954 1999 2049 6703 2942 2162 1010 2038 1037 3857
1010 3096 4309 1998 9669 2035 2714 2000 2216 1997 2198 1010
2429 2000 1996 10013 1012 2077 3352 1037 24815 2923 1010 2002
2001 2019 22344 10687 2013 2225 2414 2124 2004 1048 9743 4890
1010 3005 2189 2001 2209 2006 4035 2557 1015 1012 3347 2100
2036 2596 1999 2189 6876 6866 2006 7858 2005 2774 4159 26641
1010 3909 2152 1998 24726 1012 2021 2002 2001 7283 7490 3550
2011 8771 1997 2543 23544 5499 14512 2019 6460 2213 16480 14066
2854 1998 2939 2041 1997 2010 2155 2015 27729 2188 1999 1996
10850 2050 10380 2212 1997 2414 2197 2095 1010 3038 2002 2001
2975 2673 2005 1996 8739 1997 16455 1012 3041 2023 3204 1010
2002 2001 2464 1999 1037 6302 6866 2000 10474 4147 21356 5929
1998 1037 2304 21451 20464 12462 2096 3173 1037 16574 2132 2007
2010 2187 2192 1996 2168 2192 2198 2003 2464 2478 2000 4009
1037 5442 2408 17106 2015 3759 1999 2010 7781 2678 1012 2429
2000 1996 10013 1010 3347 2100 2003 2006 2019 2880 2862 1997
2329 24815 5130 2040 2089 2022 2198 1012 2036 2006 1996 2862
2003 10958 9759 2140 7025 1010 2538 1010 1996 3259 2758 1012
1999 2262 1010 7025 2015 2048 3080 3428 2164 2852 1012 21146
9103 2140 7025 1010 1037 6731 7522 2007 2563 2015 2120 2740
2326 2020 5338 2007 15071 2048 2530 2162 11370 2015 2379 1996
9042 3675 2007 4977 1012 2021 1996 3428 2020 2207 2197 2095
2044 4445 2329 4988 2198 2064 15286 102], shape=(512,), dtype=int32)
tf.Tensor([0 0 1 0], shape=(4,), dtype=int32)
The error itself:
ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.

Test case gives a runtime error on HackerRank but correct output on my local machine

This code gives a runtime error for this test case on HackerRank, but when I run it on my local machine with the same test case it gives me the correct output.
n = int(input())
li = [int(x) for x in input().split()]
se = set()
se1 = set()
for x in range(n):
    for y in range(x, n):
        if tuple(li[x:y+1]) not in se1:
            se.add(len(li[x:y+1]) * min(li[x:y+1]))
            se1.add(tuple(li[x:y+1]))
print(max(se))
Test Case
1000

I optimized your algorithm: I removed the sets and sped up the minimum computation over sub-ranges by keeping a rolling minimum. My algorithm's running time is O(N^2).
Your algorithm has O(N^3) complexity: there are two nested loops (each O(N)), and inside them slicing, computing the minimum, and adding to the set each take O(N) time (adding a tuple of length O(N) to a set requires hashing all of its elements, which is also O(N)). So your algorithm will probably be faster even if you just drop the sets, because adding a tuple to a set takes about as long as computing the minimum again. You also don't need to store the results in the set se; you can track the maximum value as you go.
n, l = int(input()), list(map(int, input().split()))
maxv = None
for i in range(n):
    minv, maxv = l[i], l[i] if maxv is None else maxv
    for j in range(i, n):
        minv = min(minv, l[j])
        maxv = max(maxv, (j - i + 1) * minv)
print(maxv)
Input:
1000

Output:
85175
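If O(N^2) ever becomes too slow for larger inputs, the same quantity (length of a sub-range times its minimum) can be computed in O(N) with a monotonic stack. This is the "largest rectangle in a histogram" technique, not the method used in the answer above. The function name is made up for this sketch, and it assumes all values are positive integers.

```python
def max_len_times_min(values):
    """Largest len(sub) * min(sub) over all contiguous sub-ranges.

    Assumes every value is a positive integer (an assumption made for
    this sketch; the 0 sentinel below relies on it).
    """
    vals = list(values) + [0]   # sentinel flushes the stack at the end
    stack = []                  # indices whose values are strictly increasing
    best = 0
    for i, v in enumerate(vals):
        # Each index popped here is the minimum of the widest range
        # that contains it and ends just before position i.
        while stack and vals[stack[-1]] >= v:
            height = vals[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best


print(max_len_times_min([2, 1, 5, 6, 2, 3]))  # 10, from the range [5, 6]
```

Each index is pushed and popped at most once, which is where the linear running time comes from.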

Python Beautiful Soup can't find specific table

I'm having issues with scraping basketball-reference.com. I'm trying to access the "Team Per Game Stats" table but can't seem to target the correct div/table. I'm trying to capture the table and bring it into a dataframe using pandas.
I've tried using soup.find and soup.find_all to find all the tables, but when I search the results I do not see the ID of the table I am looking for. See below.
x = soup.find("table", id="team-stats-per_game")
import csv, time, sys, math
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import urllib.request
#NBA season
year = 2019
# URL page we will scraping
url = "https://www.basketball-reference.com/leagues/NBA_2019.html#all_team-stats-base".format(year)
# Basketball reference URL
html = urlopen(url)
soup = BeautifulSoup(html,'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
Result:
None
I expect the output to list the table elements, specifically tr and th tags to target and bring into a pandas df.
As Jarett mentioned above, BeautifulSoup can't parse your tag. In this case it's because it's commented out in the source.
While this is admittedly an amateurish approach, it works for your data.
table_src = html.text.split('<div class="overthrow table_container" id="div_team-stats-per_game">')[1].split('</table>')[0] + '</table>'
table = BeautifulSoup(table_src, 'lxml')
The tables are rendered after the page loads, so you could use Selenium to let them render, or use the approach mentioned above. But that isn't necessary, as most of the tables are inside HTML comments. You can use BeautifulSoup to pull out the comments, then search through those for the table tags.
import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd
#NBA season
year = 2019
url = 'https://www.basketball-reference.com/leagues/NBA_2019.html#all_team-stats-base'.format(year)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
tables = []
for each in comments:
    if 'table' in each:
        try:
            tables.append(pd.read_html(each)[0])
        except:
            continue
This will return you a list of dataframes, so just pull out the table you want from wherever it is located by its index position:
Output:
print (tables[3])
Rk Team G MP FG ... STL BLK TOV PF PTS
0 1.0 Milwaukee Bucks* 82 19780 3555 ... 615 486 1137 1608 9686
1 2.0 Golden State Warriors* 82 19805 3612 ... 625 525 1169 1757 9650
2 3.0 New Orleans Pelicans 82 19755 3581 ... 610 441 1215 1732 9466
3 4.0 Philadelphia 76ers* 82 19805 3407 ... 606 432 1223 1745 9445
4 5.0 Los Angeles Clippers* 82 19830 3384 ... 561 385 1193 1913 9442
5 6.0 Portland Trail Blazers* 82 19855 3470 ... 546 413 1135 1669 9402
6 7.0 Oklahoma City Thunder* 82 19855 3497 ... 766 425 1145 1839 9387
7 8.0 Toronto Raptors* 82 19880 3460 ... 680 437 1150 1724 9384
8 9.0 Sacramento Kings 82 19730 3541 ... 679 363 1095 1751 9363
9 10.0 Washington Wizards 82 19930 3456 ... 683 379 1154 1701 9350
10 11.0 Houston Rockets* 82 19830 3218 ... 700 405 1094 1803 9341
11 12.0 Atlanta Hawks 82 19855 3392 ... 675 419 1397 1932 9294
12 13.0 Minnesota Timberwolves 82 19830 3413 ... 683 411 1074 1664 9223
13 14.0 Boston Celtics* 82 19780 3451 ... 706 435 1052 1670 9216
14 15.0 Brooklyn Nets* 82 19980 3301 ... 539 339 1236 1763 9204
15 16.0 Los Angeles Lakers 82 19780 3491 ... 618 440 1284 1701 9165
16 17.0 Utah Jazz* 82 19755 3314 ... 663 483 1240 1728 9161
17 18.0 San Antonio Spurs* 82 19805 3468 ... 501 386 992 1487 9156
18 19.0 Charlotte Hornets 82 19830 3297 ... 591 405 1001 1550 9081
19 20.0 Denver Nuggets* 82 19730 3439 ... 634 363 1102 1644 9075
20 21.0 Dallas Mavericks 82 19780 3182 ... 533 351 1167 1650 8927
21 22.0 Indiana Pacers* 82 19705 3390 ... 713 404 1122 1594 8857
22 23.0 Phoenix Suns 82 19880 3289 ... 735 418 1279 1932 8815
23 24.0 Orlando Magic* 82 19780 3316 ... 543 445 1082 1526 8800
24 25.0 Detroit Pistons* 82 19855 3185 ... 569 331 1135 1811 8778
25 26.0 Miami Heat 82 19730 3251 ... 627 448 1208 1712 8668
26 27.0 Chicago Bulls 82 19905 3266 ... 603 351 1159 1663 8605
27 28.0 New York Knicks 82 19780 3134 ... 557 422 1151 1713 8575
28 29.0 Cleveland Cavaliers 82 19755 3189 ... 534 195 1106 1642 8567
29 30.0 Memphis Grizzlies 82 19880 3113 ... 684 448 1147 1801 8490
30 NaN League Average 82 19815 3369 ... 626 406 1155 1714 9119
[31 rows x 25 columns]
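To see why a plain tag search comes back empty while the comment search finds the table, here is a small sketch. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and the PAGE string is a made-up stand-in for the real page, not its actual markup.

```python
from html.parser import HTMLParser

# Made-up miniature page: the table only exists inside an HTML comment.
PAGE = """
<html><body>
<div id="all_team-stats-per_game">
<!--
<table id="team-stats-per_game">
  <tr><th>Team</th><th>G</th></tr>
  <tr><td>Milwaukee Bucks</td><td>82</td></tr>
</table>
-->
</div>
</body></html>
"""


class TableIdCollector(HTMLParser):
    """Records the id of every <table> tag and the text of every comment."""

    def __init__(self):
        super().__init__()
        self.table_ids = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        self.comments.append(data)


# First pass: the parser sees a comment, not a table.
outer = TableIdCollector()
outer.feed(PAGE)
print(outer.table_ids)   # []

# Second pass: parse the comment text itself as HTML.
inner = TableIdCollector()
for comment in outer.comments:
    inner.feed(comment)
print(inner.table_ids)   # ['team-stats-per_game']
```

The BeautifulSoup version above does the same two passes: collect the Comment nodes, then hand each one to a parser (there, pd.read_html).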
As other answers mentioned, this is because the content of the page is loaded with the help of JavaScript, and fetching the source with urlopen or requests will not load that dynamic part.
A way around it is to use Selenium to let the dynamic content load, then get the page source from there and search it for the table.
Here is the code that gives the result you expected.
Note that you will need to set up a Selenium web driver first.
from lxml import html
from bs4 import BeautifulSoup
from time import sleep
from selenium import webdriver
def parse(url):
    response = webdriver.Firefox()
    response.get(url)
    sleep(3)
    sourceCode=response.page_source
    return sourceCode
year =2019
soup = BeautifulSoup(parse("https://www.basketball-reference.com/leagues/NBA_2019.html#all_team-stats-base".format(year)),'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
Hope this helped you with your problem and feel free to ask any further doubts.
Happy Coding:)

Formatted input in Python

I have a file that has the following:
  A B C D
1 2 3 4 5
2   2 4
3 1   3 4
Note that 4 on line 2 is followed immediately by the new line.
I want to make a dictionary that looks like this
d['A']['1'] = 2, d['B']['1'] = 3, ..., d['D']['1'] = 5, d['B']['2'] = 2, etc.
The blanks should not appear in the dictionary.
What's the best way to do this in python?
The data will all be single digits right? So it lines up with the column headers? In that case, you can do this:
it = iter(datafile)
cols = list(next(it)[2::2])
d = {}
for row in it:
    for col, val in zip(cols, row[2::2]):
        if val != ' ':
            d.setdefault(col, {})[row[0]] = int(val)
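As a quick check of the fixed-width slicing idea, here is a self-contained sketch that runs it on the small sample from the question. The function name and the inline sample string are made up for illustration, and it assumes the layout really is one character per cell with single-space separators, with blanks kept as spaces.

```python
# Sample file contents, with the blank cells kept as spaces.
sample = (
    "  A B C D\n"
    "1 2 3 4 5\n"
    "2   2 4\n"
    "3 1   3 4\n"
)


def parse_fixed_width(lines):
    """Build {column: {row: value}} from single-character cells."""
    it = iter(lines)
    # Column names sit at offsets 2, 4, 6, ... of the header line.
    cols = list(next(it).rstrip("\n")[2::2])
    d = {}
    for row in it:
        row = row.rstrip("\n")
        if not row:
            continue  # ignore empty lines
        # zip stops at the shorter sequence, so short rows are fine.
        for col, val in zip(cols, row[2::2]):
            if val != " ":
                d.setdefault(col, {})[row[0]] = int(val)
    return d


table = parse_fixed_width(sample.splitlines())
print(table["B"]["2"])  # 2
```

Blank cells never reach the dictionary, so table["A"] has no entry for row '2'.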
Based on the author's data and code that was recently added, the above code clearly isn't enough. If the format of the document will always be 31 pairs of data for 12 months in groups of 6, we could handle it in many ways. This is what I wrote. It's not the most elegant, and probably not as efficient as it could be, but it gets the job done. This is one of the reasons why you index by row first, then column.
def process(data):
    import re
    hre = re.compile(r' +([A-Z]+)'*6)
    sre = re.compile(r' +([a-z]+) ([a-z]+)'*6)
    dre = re.compile(r'(\d{1,2}) ' + r'(.{4}) (.{4}) {,4}'*6)
    it = iter(data)
    headers = None
    result = {}
    for line in it:
        if not line: continue
        if not headers:
            # find the first header
            hmatch = hre.match(line)
            if hmatch:
                subs = iter(sre.match(next(it)).groups())
                headers = [h + next(subs)
                           for h in hmatch.groups()
                           for _ in range(2)]
                count = 0
        else:
            # fill in the data
            dmatch = dre.match(line)
            row = dmatch.group(1)
            for col, d in zip(headers, dmatch.groups()[1:]):
                if d.strip():
                    result.setdefault(col, {})[row] = int(d)
            count += 1
            if count == 31:
                headers = None
    return result
data = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
""".split('\n')
>>> d = process(data)
>>> d['DECrise']['8']
503
>>> d
{'AUGset': {'24': 1755, '25': 1755, '26': 1756, '27': 1756, '20': 1752...
For fun and interest, I came up with a totally different answer;
import datetime
import math
import ephem    # PyEphem module
class SunTimes(object):
    """Helper class for finding sun rise/set times
    @param date: observation date, one of
        string, "yyyy[/mm[/dd[ hh[:mm[:ss]]]]]"
            (Unspecified pieces are assumed to be 0)
        datetime.date
    @param lat: latitude, one of
        string, "d[:mm[:ss]]" angle measured in degrees, minutes, seconds
            (Unspecified pieces are assumed to be 0)
        ephem.Angle
        numeric angle in degrees
    @param lon: longitude, same types as lat
    @fromCity: string, city name
        If specified, overrides lat and lon
        If city is not recognized, raises KeyError
    """
    def __init__(self, *args, **kwargs):
        super(SunTimes,self).__init__()
        self.sun = ephem.Sun()
        self.date = ephem.Date(0)
        self._date = 0
        self.viewer = ephem.Observer()
        self._lat = ''
        self._lon = ''
        self._city = None
        self.dirty = True    # lazy updates
        self._clean(*args, **kwargs)
    def _clean(self, date=None, lat=None, lon=None, fromCity=None):
        if date is not None and date != self._date:
            self.date = ephem.Date(date)
            self._date = date
            self.dirty = True
        if lat is not None and lat != self._lat:
            self.viewer.lat = self.getAngle(lat)
            self._lat = lat
            self.viewer.name = None
            self._city = None    # forget the cached city once lat is set by hand
            self.dirty = True
        if lon is not None and lon != self._lon:
            self.viewer.long = self.getAngle(lon)
            self._lon = lon
            self.viewer.name = None
            self._city = None    # forget the cached city once lon is set by hand
            self.dirty = True
        if fromCity is not None and fromCity != self._city:
            self.viewer = ephem.city(fromCity)
            self._city = fromCity
            self._lat = self.viewer.lat
            self._lon = self.viewer.long
            self.dirty = True
        if self.dirty:
            # recompute only when an input actually changed
            self.viewer.date = self.date
            self.sun.compute(self.viewer)
            self.dirty = False
    def getAngle(self, value):
        if isinstance(value, ephem.Angle):
            return value
        elif isinstance(value, str):
            return ephem.degrees(value)
        else:
            return ephem.degrees(math.radians(value))
    def sunrise(self, *args, **kwargs):
        self._clean(*args, **kwargs)
        return self.sun.rise_time.datetime()
    def sunset(self, *args, **kwargs):
        self._clean(*args, **kwargs)
        return self.sun.set_time.datetime()
The tables given match very nicely for local times in Perth, Australia.
sun = SunTimes(lat='-31.9273', lon='115.87925') # Perth
print(sun.sunrise(date='2012/1/1'))
>>> 2012-01-01 05:15:42.835679
print(sun.sunset())
>>> 2012-01-01 19:24:23.083130
The times are not exactly identical; a comparison follows:
Read each line into a string, parse it into a list with some empty elements like
(2,,2,4,)
and then convert that list into your dictionary entries. Before parsing you might want to read about the methods in the string module.
Looks like a homework problem regarding sparse matrices.
This is a possible solution, assuming text is the content of the input file:
lines = text.rstrip().split('\n')
lines = [line + ' ' * (max(map(len, lines)) - len(line)) for line in lines]
# pads lines with spaces so that all of them have the same length
rows = tuple(line.replace('  ', ' ').split(' ') for line in lines)
columns = tuple(zip(*rows)) # transpose rows matrix
table = dict()
for i, column in enumerate(columns):
    if i > 0: # skip first column
        table[rows[0][i]] = dict()
        for j, cell in enumerate(column):
            if j > 0 and cell.isdigit(): # skip header and blanks
                table[rows[0][i]][columns[0][j]] = int(cell)
print(table) # prints the resulting dict
This assumes that all data items are separated by a single whitespace and that 'blank' items consist of a single whitespace.
from collections import defaultdict
import re
testData = """
  A B C D
1 2 3 4 5
2   2 4
3 1   3 4
"""
def strToArray(s):
    item = re.compile(r'(\s|\S+)(?:\s|$)')
    return [item.findall(ln) for ln in s.split('\n') if len(ln)]
def arrayToDict(array):
    res = defaultdict(dict)
    xIds = array.pop(0)[1:]
    for row in array:
        yId = row.pop(0)
        for xId,item in zip(xIds,row):
            if item.strip():
                res[xId][yId] = int(item)
    return res
def main():
    data = arrayToDict(strToArray(testData))
    print(data)
if __name__=="__main__":
    main()
which results in
{'A': {'1': 2, '3': 1}, 'C': {'1': 4, '3': 3, '2': 4}, 'B': {'1': 3, '2': 2}, 'D': {'1': 5, '3': 4}}
Here's a solution using your later data:
data = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
"""
import re
import itertools
parsed = re.findall(r'''(?xm) # verbose, multiline
^ # start of line
(\d+) # the date
\s{2} # 2 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)\s{4}|\s{13}) # rise/set time or 13 spaces
(?:(\d+)\s(\d+)|\s{9})? # rise/set time or 9 spaces (optional)
$ # end of line
''',data)
# transpose, throw out date line and create an iterator
# that will walk the original table column by column.
parsed = list(zip(*parsed))[1:]
data_gen = itertools.chain(*parsed)
sun = {}
# Date changes fastest, followed by 6 month step, then rise/set, then first 6 months.
for m in range(1,7):
    for t in ['rise','set']:
        for s in [0,6]:
            for d in range(1,32):
                data = next(data_gen)
                # handle blanks
                if data:
                    sun[m+s,d,t] = data
if __name__ == '__main__':
    print(sun[11,24,'rise'])
I propose to use the csv module.
The presence of blanks produces some hard problems. So I created a file containing
A B C D
1 2 3 4 5
2 8 2 4 10
3 1 88 3 4
and I wrote this code, which roughly processes this content as you wish:
f = open('gogo.txt','rb')
print f.read()
f.seek(0,0)
import csv
dodo = csv.reader(f, delimiter = ' ')
headers = dodo.next()[-4:]
print 'headers==',headers
print
d = {}
for k in headers:
    d[k] = {}
print d
print
for row in dodo:
    print row[0],row[1:]
    z = zip(headers,row[1:])
    print "z==",z
    for x,y in zip(headers,row[-4:]):
        print x,y
        d[x][row[0]] = y
    print d
    print '-----------------------------------'
print d
Result
A B C D
1 2 3 4 5
2 8 2 4 10
3 1 88 3 4
headers== ['A', 'B', 'C', 'D']
{'A': {}, 'C': {}, 'B': {}, 'D': {}}
1 ['2', '3', '4', '5']
z== [('A', '2'), ('B', '3'), ('C', '4'), ('D', '5')]
A 2
B 3
C 4
D 5
{'A': {'1': '2'}, 'C': {'1': '4'}, 'B': {'1': '3'}, 'D': {'1': '5'}}
-----------------------------------
2 ['8', '2', '4', '10']
z== [('A', '8'), ('B', '2'), ('C', '4'), ('D', '10')]
A 8
B 2
C 4
D 10
{'A': {'1': '2', '2': '8'}, 'C': {'1': '4', '2': '4'}, 'B': {'1': '3', '2': '2'}, 'D': {'1': '5', '2': '10'}}
-----------------------------------
3 ['1', '88', '3', '4']
z== [('A', '1'), ('B', '88'), ('C', '3'), ('D', '4')]
A 1
B 88
C 3
D 4
{'A': {'1': '2', '3': '1', '2': '8'}, 'C': {'1': '4', '3': '3', '2': '4'}, 'B': {'1': '3', '3': '88', '2': '2'}, 'D': {'1': '5', '3': '4', '2': '10'}}
-----------------------------------
{'A': {'1': '2', '3': '1', '2': '8'}, 'C': {'1': '4', '3': '3', '2': '4'}, 'B': {'1': '3', '3': '88', '2': '2'}, 'D': {'1': '5', '3': '4', '2': '10'}}
Since it has been said that this is homework, it will be good for you to work out how to improve this code so that it can process lines containing blanks.
sloppy code, but I believe this does what you asked for
jcomeau@intrepid:/tmp$ cat test.dat test.py; ./test.py
  A B C D
1 2 3 4 5
2   2 4
3 1   3 4
#!/usr/bin/python
import re
input = open('test.dat')
data = input.readlines()
input.close()
pattern = '(\S+|\s?)\s'
parsed = [map(str.strip, re.compile(pattern).findall(line)) for line in data]
columns = parsed.pop(0)[1:]
rows = [r.pop(0) for r in parsed]
d = {}
for c in columns:
    if not d.has_key(c):
        d[c] = {}
    for r in rows:
        try:
            d[c][r] = int(parsed[rows.index(r)][columns.index(c)])
        except:
            pass
print d, d['A']['1'], d['B']['1'], d['D']['1'], d['B']['2']
{'A': {'1': 2, '3': 1}, 'C': {'1': 4, '3': 3, '2': 4}, 'B': {'1': 3, '2': 2}, 'D': {'1': 5, '3': 4}} 2 3 5 2
Hugh Bothwell and jcomeau_ictx, interesting ideas but not general enough. Doesn't work with the real data, although I'm sure you can find a way to use regexp to make it work.
scoffey, thanks. I've used your idea of padding the lines to the same length.
eyquem, you must be dreaming. I never said anything about homework.
Below is my code with the real data now.
l = """
TIMES OF SUNRISE AND SUNSET (for ideal horizon & meteorological conditions)
For the year 2012
Make corrections for daylight saving time where necessary.
------------------------------------------------------------------------------
JAN FEB MAR APR MAY JUN
rise set rise set rise set rise set rise set rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920 0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919 0627 1813 0707 1720
JUL AUG SEP OCT NOV DEC
rise set rise set rise set rise set rise set rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759 0519 1840 0513 1925
"""
l = l.split('\n')
l = filter(None, l)
f = map(lambda _: str.ljust(_[4:],78), l[7:38])
s = map(lambda _: str.ljust(_[4:],78), l[40:71])
l = map(lambda _: ''.join(_),zip(f,s))
a=[]
r = [13*i for i in xrange(13)]
for line in l:
    d = [(line[r[i]:r[i+1]]) for i in xrange(12)]
    a.append(d)
import numpy
a = numpy.transpose(a).tolist()
sun = {}
for m in xrange(12):
    a[m] = filter(lambda _: not _.isspace(), a[m])
    for d in xrange(len(a[m])):
        date = "%4d-%02d-%d" % (2012, m+1, d+1)
        sun[date] = {}
        sun[date]['rise'], sun[date]['set'] = a[m][d].split()
print sun
Andrey,
To make the start of the processing more automatic, I would do it like this:
l = filter(None,l.splitlines())
starts = [ i+1 for i,line in enumerate(l) if 'rise' in line and 'set' in line]
print 'starts==',starts
f = []
for line in l[starts[0]:]:
    if any(c not in ' 0123456789' for c in line):
        break
    else:
        f.append( line.partition(' ')[2] )
s = []
for line in l[starts[1]:]:
    if any(c not in ' 0123456789' for c in line):
        break
    else:
        s.append( line.partition(' ')[2] )
a= [ (x+' '+y).split(' ') for x,y in zip(f,s) ]
And improving the algorithm:
l = filter(None,l.splitlines())
starts = [ i+1 for i,line in enumerate(l) if 'rise' in line and 'set' in line]
print 'starts==',starts
nb, a = 0, []
while starts[1]+nb<len(l):
    line0 = l[starts[0]+nb].partition(' ')[2]
    line1 = l[starts[1]+nb].partition(' ')[2]
    if any(c not in ' 0123456789' for c in line0) or any(c not in ' 0123456789' for c in line1):
        break
    else:
        a.append((line0+' '+line1).split(' '))
    nb += 1
I can't go further because I don't have numpy and I don't know the final data you want to obtain.
But one thing is evident:
for this kind of processing, regexes are really the tool to use: they make the work several orders of magnitude more comfortable.
