From rubini@unipv.it Thu Apr  7 16:35:48 EDT 1994
Article: 7532 of comp.os.linux.development
Path: bigblue.oit.unc.edu!concert!gatech!howland.reston.ans.net!math.ohio-state.edu!jussieu.fr!univ-lyon1.fr!ghost.dsi.unimi.it!mirage.unipv.it!rubini
From: rubini@unipv.it (Alessandro Rubini)
Newsgroups: comp.os.linux.development
Subject: A boost in performance when performance weakens (patch)
Date: 7 Apr 94 11:59:57 GMT
Organization: Pavia University
Lines: 120
Message-ID: <rubini.765719997@ipvvis.unipv.it>
NNTP-Posting-Host: ipvvis.unipv.it
Keywords: performance, kernel, swap

I posted this on the mailing list but the list seems to be dead :-(

This is a _very_ tiny patch that decreases swapping when the system is
heavily loaded, at the cost of a small performance penalty under light load.

----------------------------

I noticed that all the docs state that the good thing about the elevator
algorithm is that it favours reads. Though this is surely good
behaviour if you have plenty of RAM, when the system is heavily loaded
it is undoubtedly a loss. I have only 4 megs of RAM and I'm pretty
interested in getting performance out of my box, since I sometimes need X.

Thus, I made a simple patch to /usr/src/linux/drivers/block/blk.h
in order to favor writes. What follows is the patch itself (it changes
one '<' to '>') and some timings I took on my system (no net, no other
activity).

The net result is decreased swapping, at the cost of fetching some file
pages more than once. The performance of the system _dramatically_
increases as the load rises. (Well, it still decreases, but less than
before.)

I myself have definitely switched to the new approach, as I usually
have one emacs, one compilation (or TeX) and two or three shells, and
performance in the interactive tools is terrible due to heavy
swapping. I can't provide reliable tests on multi-terminal sessions, though.

Maybe something can be done about it: first of all, someone should run
some tests on machines with plenty of memory (which I can't do).
Then, if there's interest, I can try to add a flag in the kernel in order to
switch between the two approaches on a heuristic basis (i.e. two or more
processes in the 'swapping' state).

Please keep me informed...
My data follow:

================ THE PATCH

*** blk.h.orig      Thu Mar 31 12:03:43 1994
--- blk.h       Thu Mar 31 12:04:03 1994
***************
*** 44,50 ****
   * are much more time-critical than writes.
   */
  #define IN_ORDER(s1,s2) \
! ((s1)->cmd < (s2)->cmd || ((s1)->cmd == (s2)->cmd && \
  ((s1)->dev < (s2)->dev || (((s1)->dev == (s2)->dev && \
  (s1)->sector < (s2)->sector)))))
  
--- 44,50 ----
   * are much more time-critical than writes.
   */
  #define IN_ORDER(s1,s2) \
! ((s1)->cmd > (s2)->cmd || ((s1)->cmd == (s2)->cmd && \
  ((s1)->dev < (s2)->dev || (((s1)->dev == (s2)->dev && \
  (s1)->sector < (s2)->sector)))))
  
================ THE TIMINGS - raw data, with a little hand editing.

startx, up to when the disk stops (2M available mem due to kernel profiling)

                       pag-in pag-ot swp-in swp-ot cpu-u cpu-s idle

old-before:  01:00:36  29088    722    653   1220  3933  5164  77592
old-after:   01:12:18  50058   1003  14366  15739  4612 13452 138851

new-before:  01:20:09   3261    234     59    506   475   964   9711
new-after:   01:28:19  24305    465   8374   9483  1069  6538  52252

==difference    -3:32    +74    -50  -5398  -5542   -85 -2714 -18718
                   30%


startx, up to when the disk stops (3.5M available, but hphoon run)

                       pag-in pag-ot swp-in swp-ot cpu-u cpu-s  idle

old-before:  03:44:41  002674  00207  00026  00235 00447 00955 03911
old-after:   03:47:55  011746  00418  02017  03167 01191 03811 19298

new-before:  03:53:36  002651  00217  00008  00198 00480 00896 03722
new-after:   03:56:36  012074  00434  01778  02883 01206 03692 17938

==difference    -0:14    +351     +6   -221   -247   -18   -60 -1171
                   7%

some other timings, not postprocessed.

	un-tarring a gzipped distribution:
Mar 31 02:56:26 pg=(015012,003303) sw=(01736,02512) 002154u 005505s (050443)
Mar 31 02:56:45 pg=(016136,004952) sw=(01739,02513) 002722u 006114s (051115)
Mar 31 03:26:15 pg=(012049,000483) sw=(01684,02450) 001123u 003758s (030805)
Mar 31 03:26:32 pg=(013205,001747) sw=(01687,02450) 001699u 004338s (031395)
              0:19 -> 0:17 (11% gain, not precise, though)

	a compilation:
Mar 31 02:58:59 pg=(016304,005014) sw=(01744,02513) 002793u 006473s (064090)
Mar 31 03:12:27 pg=(059729,013856) sw=(03845,04975) 037530u 019961s (096712)
Mar 31 03:29:05 pg=(013271,002143) sw=(01692,02450) 001726u 004811s (046149)
Mar 31 03:41:17 pg=(050705,010677) sw=(03573,04670) 036245u 018014s (071771)
             13:28 -> 12:12 (9.5% gained)

	a TeX run:
Mar 31 03:58:51 pg=(012503,000466) sw=(02012,02883) 001307u 004073s (030596)
Mar 31 04:01:03 pg=(013764,000920) sw=(02020,02883) 003247u 014806s (031204)
Mar 31 03:49:22 pg=(012215,000456) sw=(02274,03167) 001296u 004207s (027356)
Mar 31 03:51:37 pg=(013547,000912) sw=(02282,03167) 003272u 015213s (027920)
              2:12 -> 2:15 (2% loss for a non-swapping run)

================ THE TIMING SCRIPT (just in case)
echo `date | awk '{print $2 " " $3 " " $4}'` \
    `fgrep page /proc/stat | awk '{printf "pg=(%06d,%06d)",$2,$3}'` \
    `fgrep swap /proc/stat | awk '{printf "sw=(%05d,%05d)",$2,$3}'` \
    `head -1 /proc/stat | awk '{printf "%06du %06ds (%06d)",$2,$4,$5}'`

--
=========================
   alessandro rubini
 rubini@ipvvis.unipv.it
=========================