Discussion:
APU2 routing speed NetBSD
7***@gmx.ch
2016-12-18 14:08:35 UTC
Hi All,
I am working with an APU2 board (4 GB RAM, 3 LAN ports, and a 1 GHz AMD CPU;
you can see the specifications here: http://www.pcengines.ch/apu2.htm)
to make a router/firewall.

Before choosing the OS I want to use I have done some benchmarks.
My first benchmark is:
1) Copying a big gzip-compressed file (17 GB)
from a Windows 10 machine to a Windows 8.1 machine.
This is done using CIFS on Windows. Each machine
is on one side of the router.

The results are strange:
with OpenBSD and Linux (Alpine Linux 3.4) I get
the maximum speed (112 Mbytes/sec), while with NetBSD
the speed is limited to 70 Mbytes/sec.

All the hardware is identical and all the router OSes are installed on the APU2.
To run the tests, I simply reboot the router and choose
another OS.

Do you have an idea why there is such a big difference? Should I
tune some specific network parameters on NetBSD?

I have done a second benchmark by transferring the same file
with FTP, but this time the second machine was a Linux server
(running on the same hardware as the Windows machine used for
the CIFS transfer), and I get similar
differences (80-85 Mbytes/sec with the Linux router, 50-55 Mbytes/sec
with NetBSD).

The benchmarks were run without any firewall.


Thank you for your help,

best regards,

Alan


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Martin Husemann
2016-12-18 14:55:06 UTC
Post by 7***@gmx.ch
Hi All,
I am working with an APU2 board (4 GB RAM, 3 LAN ports, and a 1 GHz AMD CPU;
you can see the specifications here: http://www.pcengines.ch/apu2.htm)
to make a router/firewall.
Can you give a few more details, e.g. the output of dmesg, and ifconfig
for the interfaces used?

Martin

Greg Troxel
2016-12-19 00:01:42 UTC
Post by 7***@gmx.ch
I am working with an APU2 board (4 GB RAM, 3 LAN ports, and a 1 GHz AMD CPU;
you can see the specifications here: http://www.pcengines.ch/apu2.htm)
to make a router/firewall.
Before choosing the OS I want to use I have done some benchmarks.
1) Copying a big gzip-compressed file (17 GB)
from a Windows 10 machine to a Windows 8.1 machine.
This is done using CIFS on Windows. Each machine
is on one side of the router.
With OpenBSD and Linux (Alpine Linux 3.4) I get
the maximum speed (112 Mbytes/sec), while with NetBSD
the speed is limited to 70 Mbytes/sec.
There are basically two things to look at. One is the raw speed of
packet forwarding. The other is whether there is any loss and how that
interacts with congestion control. Plus there may be odd things I am
not thinking of.

NetBSD itself should be efficient. However, you are pushing close to 1
Gb/s. So, I would want to look at network counters to see if there is
any trouble. Do "netstat -s" on the router to a file, before and after,
and diff -u. Also, do the equivalent on the test machines.
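
In shell form, that before/after comparison might look like (a sketch; the file names are arbitrary):

```sh
# Snapshot the counters, run the transfer, snapshot again, compare
netstat -s > /tmp/before.txt
# ... run the file copy through the router ...
netstat -s > /tmp/after.txt
diff -u /tmp/before.txt /tmp/after.txt | less
```

Any counter that jumps between the two snapshots (drops, retransmits, errors) is a lead.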

This will slow it down, but run tcpdump, save to a file, and look
it over. If TCP, use tcpdump2xplot from xplot in pkgsrc (read the docs;
this is not really easy, but very helpful for understanding TCP behavior).
If CIFS is over UDP, look at retransmission stats on the clients.
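
A header-only capture along those lines might be (a sketch; the interface name is from this setup, the filter and snap length are illustrative):

```sh
# Capture only TCP headers on the inside interface, then read the file back
tcpdump -i wm0 -s 128 -w /tmp/transfer.pcap tcp
tcpdump -n -r /tmp/transfer.pcap | less
```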

Understand what MTU is being used with the various systems. Jumbo frames
of 8K data plus header may be faster due to amortizing per-packet
overhead and faster pack/unpack. Those may be enabled by the other OSes.
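
For instance, the wm(4) interfaces here advertise JUMBO_MTU in ec_capabilities, so a larger MTU could be tried with something like this (a sketch; every host and switch on the path must also be configured for the larger MTU):

```sh
ifconfig wm0 mtu 9000
ifconfig wm1 mtu 9000
```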

Finally, 70 MB/s is still half of GbE, and it may be that other
considerations are more important than the last bit of speed. But
still, I think this should go faster.
7***@gmx.ch
2016-12-20 08:21:59 UTC
Hi Martin,
Here is the requested info. Since my last test, I have only added the
IPv6 addresses. The first benchmark was done with IPv4.

Thanks for your help

Best regards

Alan

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:0d:b9:42:61:9c
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet 172.16.0.10 netmask 0xffff0000 broadcast 172.16.255.255
inet6 fe80::20d:b9ff:fe42:619c%wm0 prefixlen 64 scopeid 0x1
inet6 fd4f:bd4e:6d27:0:de0b:f387:afeb:7 prefixlen 64
inet6 2a02:168:c000:0:de0b:f387:afeb:7 prefixlen 64


wm1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:0d:b9:42:61:9d
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 192.168.88.2 netmask 0xffffff00 broadcast 192.168.88.255
inet6 fe80::20d:b9ff:fe42:619d%wm1 prefixlen 64 scopeid 0x2
inet6 fd4f:bd4e:6d27:2:de0b:f387:afeb:2 prefixlen 64
inet6 2a02:168:c000:2:de0b:f387:afeb:2 prefixlen 64


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.

NetBSD 7.0.2 (GENERIC.201610210724Z)
total memory = 4079 MB
avail memory = 3943 MB
kern.module.path=/stand/amd64/7.0/modules
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
PC Engines apu2 (1.0)
mainbus0 (root)
ACPI: RSDP 0xf30e0 000024 (v02 CORE )
ACPI: XSDT 0xdffb80e0 00005C (v01 CORE COREBOOT 00000000 CORE 00000000)
ACPI: FACP 0xdffb96f0 0000F4 (v04 CORE COREBOOT 00000000 CORE 00000000)
ACPI: DSDT 0xdffb8250 001496 (v02 AMD COREBOOT 00010001 INTL 20140114)
ACPI: FACS 0xdffb8210 000040
ACPI: SSDT 0xdffb97f0 000045 (v02 CORE COREBOOT 0000002A CORE 0000002A)
ACPI: APIC 0xdffb9840 00007E (v01 CORE COREBOOT 00000000 CORE 00000000)
ACPI: HEST 0xdffb98c0 0001D0 (v01 CORE COREBOOT 00000000 CORE 00000000)
ACPI: SSDT 0xdffb9a90 0048A6 (v02 AMD AGESA 00000002 MSFT 04000000)
ACPI: SSDT 0xdffbe340 0007C8 (v01 AMD AGESA 00000001 AMD 00000001)
ACPI: HPET 0xdffbeb10 000038 (v01 CORE COREBOOT 00000000 CORE 00000000)
ACPI: All ACPI Tables successfully acquired
ioapic0 at mainbus0 apid 4: pa 0xfec00000, version 0x21, 24 pins
ioapic1 at mainbus0 apid 5: pa 0xfec20000, version 0x21, 32 pins
cpu0 at mainbus0 apid 0: AMD GX-412TC SOC , id 0x730f01
cpu1 at mainbus0 apid 1: AMD GX-412TC SOC , id 0x730f01
cpu2 at mainbus0 apid 2: AMD GX-412TC SOC , id 0x730f01
cpu3 at mainbus0 apid 3: AMD GX-412TC SOC , id 0x730f01
acpi0 at mainbus0: Intel ACPICA 20131218
acpi0: X/RSDT: OemId <CORE ,COREBOOT,00000000>, AslId <CORE,00000000>
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Safe" frequency 3579545 Hz quality 900
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
LDRC (PNP0C02) at acpi0 not configured
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
COM1 (PNP0501) at acpi0 not configured
AWR0 (PNP0C02) at acpi0 not configured
ABR0 (PNP0C02) at acpi0 not configured
ABR1 (PNP0C02) at acpi0 not configured
ABR2 (PNP0C02) at acpi0 not configured
ABR3 (PNP0C02) at acpi0 not configured
ABR4 (PNP0C02) at acpi0 not configured
ACPI: Enabled 4 GPEs in block 00 to 1F
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x1022 product 0x1566 (rev. 0x00)
pchb1 at pci0 dev 2 function 0: vendor 0x1022 product 0x156b (rev. 0x00)
ppb0 at pci0 dev 2 function 2: vendor 0x1022 product 0x1439 (rev. 0x00)
ppb0: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x1 @ 5.0GT/s
ppb0: link is x1 @ 2.5GT/s
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci1 dev 0 function 0: I210 Ethernet (FLASH less) (rev. 0x03)
wm0: interrupting at ioapic1 pin 4
wm0: PCI-Express bus
wm0: 64 words iNVM
wm0: Ethernet address 00:0d:b9:42:61:9c
wm0: Copper
ukphy0 at wm0 phy 1: OUI 0x000ac2, model 0x0000, rev. 0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb1 at pci0 dev 2 function 3: vendor 0x1022 product 0x1439 (rev. 0x00)
ppb1: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x1 @ 5.0GT/s
ppb1: link is x1 @ 2.5GT/s
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
wm1 at pci2 dev 0 function 0: I210 Ethernet (FLASH less) (rev. 0x03)
wm1: interrupting at ioapic1 pin 8
wm1: PCI-Express bus
wm1: 64 words iNVM
wm1: Ethernet address 00:0d:b9:42:61:9d
wm1: Copper
ukphy1 at wm1 phy 1: OUI 0x000ac2, model 0x0000, rev. 0
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb2 at pci0 dev 2 function 4: vendor 0x1022 product 0x1439 (rev. 0x00)
ppb2: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x1 @ 5.0GT/s
ppb2: link is x1 @ 2.5GT/s
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
wm2 at pci3 dev 0 function 0: I210 Ethernet (FLASH less) (rev. 0x03)
wm2: interrupting at ioapic1 pin 12
wm2: PCI-Express bus
wm2: 64 words iNVM
wm2: Ethernet address 00:0d:b9:42:61:9e
wm2: Copper
ukphy2 at wm2 phy 1: OUI 0x000ac2, model 0x0000, rev. 0
ukphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
vendor 0x1022 product 0x1537 (miscellaneous crypto) at pci0 dev 8 function 0 not configured
vendor 0x1022 product 0x7814 (USB serial bus, xHCI, revision 0x11) at pci0 dev 16 function 0 not configured
pciide0 at pci0 dev 17 function 0: vendor 0x1022 product 0x7800 (rev. 0x40)
pciide0: bus-master DMA support present, but unused (no driver support)
pciide0: primary channel configured to native-PCI mode
pciide0: using ioapic0 pin 19 for native-PCI interrupt
atabus0 at pciide0 channel 0
pciide0: secondary channel configured to native-PCI mode
atabus1 at pciide0 channel 1
ehci0 at pci0 dev 19 function 0: vendor 0x1022 product 0x7808 (rev. 0x39)
ehci0: interrupting at ioapic0 pin 18
ehci0: EHCI version 1.0
usb0 at ehci0: USB revision 2.0
vendor 0x1022 product 0x780b (SMBus serial bus, revision 0x42) at pci0 dev 20 function 0 not configured
pcib0 at pci0 dev 20 function 3: vendor 0x1022 product 0x780e (rev. 0x11)
sdhc0 at pci0 dev 20 function 7: vendor 0x1022 product 0x7813 (rev. 0x01)
sdhc0: interrupting at ioapic0 pin 16
sdhc0: SD Host Specification 2.0, rev.16
sdhc0: using DMA transfer
sdmmc0 at sdhc0 slot 0
pchb2 at pci0 dev 24 function 0: vendor 0x1022 product 0x1580 (rev. 0x00)
pchb3 at pci0 dev 24 function 1: vendor 0x1022 product 0x1581 (rev. 0x00)
pchb4 at pci0 dev 24 function 2: vendor 0x1022 product 0x1582 (rev. 0x00)
pchb5 at pci0 dev 24 function 3: vendor 0x1022 product 0x1583 (rev. 0x00)
pchb6 at pci0 dev 24 function 4: vendor 0x1022 product 0x1584 (rev. 0x00)
pchb7 at pci0 dev 24 function 5: vendor 0x1022 product 0x1585 (rev. 0x00)
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat 0 us, pow 0 mW
acpicpu0: C2: I/O, lat 400 us, pow 0 mW
acpicpu0: P0: FFH, lat 4 us, pow 980 mW, 1000 MHz
acpicpu0: P1: FFH, lat 4 us, pow 807 mW, 800 MHz
acpicpu0: P2: FFH, lat 4 us, pow 609 mW, 600 MHz
acpicpu0: T0: I/O, lat 1 us, pow 0 mW, 100 %
acpicpu0: T1: I/O, lat 1 us, pow 0 mW, 88 %
acpicpu0: T2: I/O, lat 1 us, pow 0 mW, 76 %
acpicpu0: T3: I/O, lat 1 us, pow 0 mW, 64 %
acpicpu0: T4: I/O, lat 1 us, pow 0 mW, 52 %
acpicpu0: T5: I/O, lat 1 us, pow 0 mW, 40 %
acpicpu0: T6: I/O, lat 1 us, pow 0 mW, 28 %
acpicpu0: T7: I/O, lat 1 us, pow 0 mW, 16 %
acpicpu1 at cpu1: ACPI CPU
acpicpu2 at cpu2: ACPI CPU
acpicpu3 at cpu3: ACPI CPU
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "TSC" frequency 998187020 Hz quality 3000
uhub0 at usb0: vendor 0x1022 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
IPsec: Initialized Security Association Processing.
ld0 at sdmmc0: <0x1b:0x534d:00000:0x10:0x935c4c22:0x0e7>
ld0: 7543 MB, 3831 cyl, 64 head, 63 sec, 512 bytes/sect x 15448064 sectors
ld0: 4-bit width, bus clock 50.000 MHz
uhub1 at uhub0 port 1: vendor 0x0438 product 0x7900, class 9/0, rev 2.00/0.18, addr 2
uhub1: single transaction translator
uhub1: 4 ports with 4 removable, self powered
wd0 at atabus0 drive 0
wd0: <SATA SSD>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 15272 MB, 31029 cyl, 16 head, 63 sec, 512 bytes/sect x 31277232 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex, playback, capture
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
Post by Martin Husemann
Post by 7***@gmx.ch
Hi All,
I am working with an APU2 board (4gb, 3 lan and AMD CPu 1Ghz
you can see specifications here http://www.pcengines.ch/apu2.htm)
to make a router/firewall.
Can you give a few more details, e.g. the output of dmesg, and ifconfig
for the interfaces used?
Martin
7***@gmx.ch
2016-12-21 09:38:35 UTC
Post by Manuel Bouyer
Post by 7***@gmx.ch
Thanks for the info. I suppose that *CSUM means checksum computation of
packets by the card. How do I configure it in NetBSD?
with ifconfig
e.g.
ifconfig wm0 ip4csum tcp4csum udp4csum ...
Post by 7***@gmx.ch
Probably this should be done in "hostname.if" file, but how.
in ifconfig.wm0 and ifconfig.wm1
--
NetBSD: 26 years of experience will always make the difference
--
Hi,
Unfortunately the situation has not really changed after enabling
checksum computation on the card.

I have done a new test with NFS (between a Knoppix 7.2 client and an Ubuntu 16.04 server).
The results are the same: 114 Mbytes/sec for OpenBSD with the firewall disabled (it
is enabled by default) and 70 Mbytes/sec with NetBSD.

This is annoying because I don't understand what is happening. I still have to run some tests
with a firewall to see how much the transfer speed decreases. With the light
firewall (1 pass, two block rules) of OpenBSD, the speed was about 90 Mbytes/sec.

By the way, is the default NetBSD kernel SMP-aware?
Is a firewall enabled by default on NetBSD (like in OpenBSD)?

Would a current kernel be better?

Thanks for your help,

Best regards,

Alan

Here are the results of my test (the file copied with dd was on an NFSv4 share):
-----------------------------------------------------------------------------------
***@Microknoppix:/tmp/s/Test# time dd if=PartSave.ntfsclone.gz of=/dev/null bs=8M
2082+1 records in
2082+1 records out
17467590412 bytes (17 GB, 16 GiB) copied, 153.702 s, 114 MB/s

real 2m33.715s
user 0m0.013s
sys 0m13.807s
***@Microknoppix:/tmp/s/Test# time dd if=PartSave.ntfsclone.gz of=/dev/null bs=8M
2082+1 records in
2082+1 records out
17467590412 bytes (17 GB, 16 GiB) copied, 252.771 s, 69.1 MB/s

real 4m12.782s
user 0m0.013s
sys 0m14.010s
***@Microknoppix:/tmp/s/Test#
-----------------------------------------------------------------------------------------
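
As a sanity check, dd's reported rates follow directly from the byte count and elapsed times (decimal megabytes, as dd uses):

```python
# Recompute dd's MB/s figures from the byte count and elapsed seconds
size = 17467590412  # bytes, from the dd output above

for elapsed in (153.702, 252.771):
    rate = size / elapsed / 1e6  # MB/s
    print(f"{rate:.1f} MB/s")
```

This prints 113.6 MB/s and 69.1 MB/s, matching the 114/69.1 figures dd reports.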

Here are the configuration of the network interfaces:
-----------------------------------------------------------------------------------------
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:0d:b9:42:61:9c
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet 172.16.0.10 netmask 0xffff0000 broadcast 172.16.255.255
inet6 fe80::20d:b9ff:fe42:619c%wm0 prefixlen 64 scopeid 0x1
inet6 fd4f:bd4e:6d27:0:de0b:f387:afeb:7 prefixlen 64
inet6 2a02:168:c000:0:de0b:f387:afeb:7 prefixlen 64
wm1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:0d:b9:42:61:9d
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet 192.168.88.2 netmask 0xffffff00 broadcast 192.168.88.255
inet6 fe80::20d:b9ff:fe42:619d%wm1 prefixlen 64 scopeid 0x2
inet6 fd4f:bd4e:6d27:2:de0b:f387:afeb:2 prefixlen 64
inet6 2a02:168:c000:2:de0b:f387:afeb:2 prefixlen 64
wm2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:0d:b9:42:61:9e
media: Ethernet autoselect
status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33648
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
-----------------------------------------------------------------------------------------

Manuel Bouyer
2016-12-21 10:04:12 UTC
Post by 7***@gmx.ch
Hi,
Unfortunately the situation has not really changed after enabling
checksum computation on the card.
I have done a new test with NFS (between a Knoppix 7.2 client and an Ubuntu 16.04 server).
The results are the same: 114 Mbytes/sec for OpenBSD with the firewall disabled (it
is enabled by default) and 70 Mbytes/sec with NetBSD.
Could you check for drops? netstat -id and netstat -q
Post by 7***@gmx.ch
This is annoying because I don't understand what is happening. I still have to run some tests
with a firewall to see how much the transfer speed decreases. With the light
firewall (1 pass, two block rules) of OpenBSD, the speed was about 90 Mbytes/sec.
By the way, is the default NetBSD kernel SMP-aware?
yes
Post by 7***@gmx.ch
Is a firewall enabled by default on NetBSD (like in OpenBSD)?
no
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--

c***@SDF.ORG
2016-12-21 10:09:33 UTC
wm(4) had a bunch of work done on it since 7.0, so it's worth
trying a newer kernel. I believe most of it will be in 7.1 too.

NPF is not used unless you edit some config files, but it
shouldn't make a difference on a small workload.

7***@gmx.ch
2016-12-21 13:16:14 UTC
Post by Manuel Bouyer
Post by 7***@gmx.ch
Hi,
Unfortunately the situation has not really changed after enabling
checksum computation on the card.
I have done a new test with NFS (between a Knoppix 7.2 client and an Ubuntu 16.04 server).
The results are the same: 114 Mbytes/sec for OpenBSD with the firewall disabled (it
is enabled by default) and 70 Mbytes/sec with NetBSD.
Could you check for drops? netstat -id and netstat -q
Here are the results of both commands (as far as I can understand, I see
no drops):
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls Drops
wm0 1500 <Link> 00:0d:b9:42:61:9c 6128071 0 24129832 0 0 0
wm0 1500 fe80::/64 fe80::20d:b9ff:fe 6128071 0 24129832 0 0 0
wm0 1500 172.16/16 jalouvre 6128071 0 24129832 0 0 0
wm0 1500 fd4f:bd4e:6d2 fd4f:bd4e:6d27:0: 6128071 0 24129832 0 0 0
wm0 1500 2a02:168:c000 2a02:168:c000:0:d 6128071 0 24129832 0 0 0
wm1 1500 <Link> 00:0d:b9:42:61:9d 24135834 0 6128546 0 0 0
wm1 1500 fe80::/64 fe80::20d:b9ff:fe 24135834 0 6128546 0 0 0
wm1 1500 192.168.88/24 192.168.88.2 24135834 0 6128546 0 0 0
wm1 1500 fd4f:bd4e:6d2 fd4f:bd4e:6d27:2: 24135834 0 6128546 0 0 0
wm1 1500 2a02:168:c000 2a02:168:c000:2:d 24135834 0 6128546 0 0 0
wm2* 1500 <Link> 00:0d:b9:42:61:9e 0 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0 0
lo0 33648 127/8 localhost 0 0 0 0 0 0
lo0 33648 localhost/128 ::1 0 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0 0



arpintrq:
queue length: 0
maximum queue length: 50
packets dropped: 0
atintrq1:
queue length: 0
maximum queue length: 256
packets dropped: 0
atintrq2:
queue length: 0
maximum queue length: 256
packets dropped: 0
pppoediscinq:
queue length: 0
maximum queue length: 256
packets dropped: 0
pppoeinq:
queue length: 0
maximum queue length: 256
packets dropped: 0

Manuel Bouyer
2016-12-21 14:24:07 UTC
Post by 7***@gmx.ch
Here are the results of both commands (as far as I can understand, I see no drops):
Indeed, there's no drop at the wm0/wm1 level; that's good.
I expected to see an ipintrq entry in the netstat -q output, but it seems
to be gone in netbsd-7. I don't know how this queue is controlled now,
but it is a place where there could be drops too.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--

7***@gmx.ch
2016-12-21 16:04:46 UTC
Post by c***@SDF.ORG
wm(4) had a bunch of work done on it since 7.0, so it's worth
trying a newer kernel. I believe most of it will be in 7.1 too.
How can I do that? My first attempt was to quickly download
a kernel from "http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201612210220Z/amd64/binary/sets/"
and boot it:
boot hd0a:netbsd-current
boot: hd0a:netbsd-current: Inappropriate file type or format

Should I install a new boot program too? In what set can I find it?

best regards,

Alan

7***@gmx.ch
2016-12-30 18:38:37 UTC
Finally, I have used a new kernel and it works a little faster (90 Mbytes/sec
instead of 70 Mbytes/sec).

But after getting the new kernel to work, I ran into a lot of trouble
transferring data from Windows 10 (either by CIFS or FTP), with every
OS (Linux, OpenBSD, NetBSD) on the router.

After a lot of tests, searches, and looking around the Web, I have noticed that this
rule (for the PF firewall) solves most of my problems (by avoiding over-long packets):
match on $ext_if scrub (max-mss 1454 reassemble tcp random-id)

To continue my tests, I need to do more or less the same thing
in NPF. The only documentation I have found is here: http://www.netbsd.org/~rmind/npf/#_application_level_gateways
but it is quite sparse.

Could somebody give me some pointers on how to achieve an equivalent result with
NPF?
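
For what it's worth, NPF's rule procedures include a normalisation facility that looks like the analogue of pf's scrub. A hypothetical npf.conf sketch (syntax and parameter names should be checked against npf.conf(5) on the installed release; "wm0" stands in for the external interface):

```
$ext_if = "wm0"

# "normalise" is assumed here to be available as a rule-procedure call
procedure "norm" {
	normalise: "random-id", "max-mss" 1454
}

group default {
	# clamp the MSS on TCP traffic leaving the external interface
	pass stateful out final on $ext_if apply "norm" proto tcp all
	pass final all
}
```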

Thanks for your help,

Alan
Post by 7***@gmx.ch
Post by c***@SDF.ORG
wm(4) had a bunch of work done on it since 7.0, so it's worth
trying a newer kernel. I believe most of it will be in 7.1 too.
How can I do that? My first attempt was to quickly download
a kernel from "http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201612210220Z/amd64/binary/sets/"
and boot it:
boot hd0a:netbsd-current
boot: hd0a:netbsd-current: Inappropriate file type or format
Should I install a new boot program too? In what set can I find it?
best regards,
Alan
Manuel Bouyer
2016-12-20 08:44:20 UTC
Post by 7***@gmx.ch
Hi Martin,
Here is the requested info. Since my last test, I have only added the
IPv6 addresses. The first benchmark was done with IPv4.
Thanks for your help
Best regards
Alan
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
You want to enable the *CSUM features here. Other OSes do it by
default.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--

7***@gmx.ch
2016-12-20 09:00:59 UTC
Post by Manuel Bouyer
Post by 7***@gmx.ch
Hi Martin,
Here is the requested info. Since my last test, I have only added the
IPv6 addresses. The first benchmark was done with IPv4.
Thanks for your help
Best regards
Alan
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
You want to enable the *CSUM features here. Other OSes do it by
default.
Thanks for the info. I suppose that *CSUM means checksum computation of
packets by the card. How do I configure it in NetBSD?

Probably this should be done in the "hostname.if" file, but how?

Best regards,

Alan

Manuel Bouyer
2016-12-20 09:43:22 UTC
Post by 7***@gmx.ch
Thanks for the info. I suppose that *CSUM means checksum computation of
packets by the card. How do I configure it in NetBSD?
with ifconfig
e.g.
ifconfig wm0 ip4csum tcp4csum udp4csum ...
Post by 7***@gmx.ch
Probably this should be done in the "hostname.if" file, but how?
in ifconfig.wm0 and ifconfig.wm1
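
Spelled out, the per-interface file might look like this (a hypothetical sketch; each line of /etc/ifconfig.wm0 is passed to ifconfig at boot, and the address is the one from this thread):

```
# /etc/ifconfig.wm0 -- sketch: enable checksum offload, then configure
up
ip4csum tcp4csum udp4csum tcp6csum udp6csum
inet 172.16.0.10 netmask 255.255.0.0
```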
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--

7***@gmx.ch
2016-12-21 10:03:33 UTC
Post by Greg Troxel
Post by 7***@gmx.ch
I am working with an APU2 board (4 GB RAM, 3 LAN ports, and a 1 GHz AMD CPU;
you can see the specifications here: http://www.pcengines.ch/apu2.htm)
to make a router/firewall.
Before choosing the OS I want to use I have done some benchmarks.
1) Copying a big gzip-compressed file (17 GB)
from a Windows 10 machine to a Windows 8.1 machine.
This is done using CIFS on Windows. Each machine
is on one side of the router.
With OpenBSD and Linux (Alpine Linux 3.4) I get
the maximum speed (112 Mbytes/sec), while with NetBSD
the speed is limited to 70 Mbytes/sec.
There are basically two things to look at. One is the raw speed of
packet forwarding. The other is whether there is any loss and how that
interacts with congestion control. Plus there may be odd things I am
not thinking of.
NetBSD itself should be efficient. However, you are pushing close to 1
Gb/s. So, I would want to look at network counters to see if there is
any trouble. Do "netstat -s" on the router to a file, before and after,
and diff -u. Also, do the equivalent on the test machines.
Here is the resulting diff on the router. I have had
a look at it and I see that no "fast forward" path is used.

Could that be the reason for the slowness? I have read somewhere
that fast forwarding can increase routing speed.

Best regards,

Alan

Differences between the files produced by the netstat -s command before and
after the transfer:
----------------------------------------------------------------------------------------
--- /tmp/avantCopy.txt 2016-12-21 11:00:32.000000000 +0100
+++ /tmp/apresCopy.txt 2016-12-21 11:07:48.000000000 +0100
@@ -84,18 +84,18 @@
0 packets with ECN CE bit
0 packets ECN ECT(0) bit
udp:
- 203 datagrams received
+ 231 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
0 dropped due to no socket
- 36 broadcast/multicast datagrams dropped due to no socket
+ 45 broadcast/multicast datagrams dropped due to no socket
0 dropped due to full socket buffers
- 167 delivered
- 187 PCB hash misses
- 169 datagrams output
+ 186 delivered
+ 215 PCB hash misses
+ 188 datagrams output
ip:
- 15146914 total packets received
+ 30256038 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
@@ -110,13 +110,13 @@
0 malformed fragments dropped
0 fragments dropped after timeout
0 packets reassembled ok
- 203 packets for this host
+ 231 packets for this host
0 packets for unknown/unsupported protocol
- 15146698 packets forwarded (0 packets fast forwarded)
- 13 packets not forwardable
+ 30255791 packets forwarded (0 packets fast forwarded)
+ 16 packets not forwardable
0 redirects sent
0 packets no matching gif found
- 169 packets sent from this host
+ 188 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
0 output packets discarded due to no route
@@ -208,7 +208,7 @@
0 ipcomp input bytes
0 ipcomp output bytes
ip6:
- 44 total packets received
+ 54 total packets received
0 with size smaller than minimum
0 with data size < data length
0 with bad options
@@ -218,7 +218,7 @@
0 fragments dropped after timeout
0 fragments that exceeded limit
0 packets reassembled ok
- 32 packets for this host
+ 39 packets for this host
0 packets forwarded
0 packets fast forwarded
0 fast forward flows
@@ -227,18 +227,18 @@
20 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
- 50 output packets discarded due to no route
+ 56 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can't be fragmented
0 packets that violated scope rules
0 multicast packets which we don't join
Input packet histogram:
- UDP: 32
- ICMP6: 12
+ UDP: 39
+ ICMP6: 15
Mbuf statistics:
0 one mbufs
- 44 one ext mbufs
+ 54 one ext mbufs
0 two or more ext mbufs
0 packets whose headers are not continuous
0 tunneling packets that can't find gif
@@ -258,7 +258,7 @@
0 bad checksums
0 messages with bad length
Input packet histogram:
- router advertisement: 12
+ router advertisement: 15
Histogram of error messages to be generated:
0 no route
0 administratively prohibited
@@ -348,13 +348,13 @@
0 packets with ECN CE bit
0 packets ECN ECT(0) bit
udp6:
- 32 datagrams received
+ 39 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
0 with no checksum
0 dropped due to no socket
- 32 multicast datagrams dropped due to no socket
+ 39 multicast datagrams dropped due to no socket
0 dropped due to full socket buffers
0 delivered
0 datagrams output
@@ -459,13 +459,13 @@
0 delivered
0 datagrams output
arp:
- 86 packets sent
- 82 reply packets
+ 99 packets sent
+ 95 reply packets
4 request packets
- 103 packets received
+ 120 packets received
2 reply packets
- 101 valid request packets
- 20 broadcast/multicast packets
+ 118 valid request packets
+ 24 broadcast/multicast packets
0 packets with unknown protocol type
0 packets with bad (short) length
0 packets with null target IP address
Thor Lancelot Simon
2016-12-21 20:09:12 UTC
Post by 7***@gmx.ch
Here is the resulting diff on the router. I have had
a look at it and I see that no "fast forward" path is used.
It is indeed odd that you're not fast forwarding. Do you have
firewall rules or IPsec policies installed?

Can you try a test with more than one stream? If the results are higher,
try a test with larger socket buffers -- much larger -- on both sender and
receiver. A long time ago, I found that 'wm' forwarding performance was
more sensitive to interrupt latency than I expected, and I did a little work
tuning the interrupt mitigation settings in the driver, but I didn't really
care about single-stream performance.
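
On the NetBSD side the default socket-buffer sizes can be raised with sysctl; a sketch (the values are illustrative, and the Linux endpoints in the NFS test have analogous knobs under net.core and net.ipv4):

```sh
# Raise default TCP send/receive buffer space on a NetBSD endpoint
sysctl -w net.inet.tcp.sendspace=262144
sysctl -w net.inet.tcp.recvspace=262144
```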

Finally, if you boot with -1, I am curious whether performance goes up.

A kernel profile -- better, perhaps some data gathered with DTrace FBT --
might be helpful, as might the output of lockstat.

Thor
