DB Hub Support Forum



high load
@DarKRaveR 
#[KVIP]

Joined: 19 Jul 2007
Posts: 102
Location: Germany
Posted: 2008-02-02, 20:07   


grayich wrote:
0.451-rc4
with --enable-perl

after 12hour
Code:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
21189 dcchub      1  96    0  4220K  2032K select   9:08  2.59% dbhub
21188 dcchub      1  96    0  4696K  3104K select   0:56  0.00% dbhub
21187 dcchub      1  96    0  3616K  1312K select   0:16  0.00% dbhub


~350 users

waiting..........


Well, this still doesn't help a single bit ...

After all, no code regarding this was changed. Aside from that, you should be aware that WCPU != real CPU load ... if WCPU is 50%, it means that of all the time given to the process, approx. 50% was computation and 50% sleeping - and yes, that's not the exact calculation, but it is closer than saying it is the CPU load.

In turn this means: if the hub cannot go to sleep because there is IO on ANY socket, then the WEIGHTED CPU LOAD asymptotically goes towards 100% (in theory, that is) ... The hub currently lets itself be woken up forcefully every second, so depending on the hardware this implies a certain fixed WCPU load. We could decrease this by waking up less often and thus doing the maintenance less often. Another factor which is outside our control: the WCPU load not only increases with the number of users, but also with the number of actions each user performs. I.e. 10 users chatting actively on the mainchat will impose more CPU load than 10 users only downloading and not interacting with the hub.

What does this mean? We can optimize things to decrease the WCPU load, but many factors are outside the control of the coder ... unfortunately ...

Still, if there is a race condition, a real bug etc., I need profiling data/traces etc. to nail things down.

And in general I would appreciate profiling data from high load hubs, to see what could still be optimized.
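
The wake-up behaviour described in this post can be sketched roughly as follows. This is a minimal illustration, not dbhub's actual code: `wait_or_maintain`, `do_maintenance` and `maintenance_runs` are hypothetical names, and it assumes the forced wake-up is implemented as a `select()` timeout, which the post does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the hub's periodic housekeeping. */
static int maintenance_runs = 0;
static void do_maintenance(void) { maintenance_runs++; }

/*
 * Sleep in select() until fd is readable or wake_ms milliseconds pass.
 * On timeout, run the maintenance hook once.
 * Returns 1 if fd became readable, 0 on timeout, -1 on error.
 */
int wait_or_maintain(int fd, long wake_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = wake_ms / 1000;
    tv.tv_usec = (wake_ms % 1000) * 1000;

    /* The process is asleep (no CPU used) while select() blocks. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready == 0)
        do_maintenance();   /* forced wake-up: nothing happened on fd */
    return ready > 0 ? 1 : ready;
}
```

Calling `wait_or_maintain(fd, 1000)` in a loop wakes the process at least once per second even when idle. A larger `wake_ms` means fewer forced wake-ups and less frequent maintenance, which is the trade-off mentioned in the post.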
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2008-02-06, 06:12   

Well, under callgrind on 451 rc4 the uptime was 3 days, with a user flow of 60-90.
For now I have moved to the 451 release, running it under a regular user and with callgrind again.

Is there any success report from the man with CentOS?
 
 
@DarKRaveR 
#[KVIP]

Joined: 19 Jul 2007
Posts: 102
Location: Germany
Posted: 2008-02-06, 20:01   

Nopes, he never answered, unfortunately. I think, I will increase the sleeping period in the next release. We can then check if that improves the situation partly ... But from what I have seen there are things that should be completely reorganized and handled differently ;-).
 
 
grayich
#[VIP]

Joined: 24 Feb 2007
Posts: 57
Location: Ukraine
Posted: 2008-02-17, 13:58   

The load keeps climbing over 1-3 days, with 500-600 users.

Unfortunately I cannot see what exactly is causing the load.
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2008-02-25, 12:03   

Well, the latest stable version with debug enabled, a user flow of 70-90 and running under valgrind has been working for approx. 1 week and 5 days.
So, are there any ideas how to compromise it? :)
 
 
@DarKRaveR 
#[KVIP]

Joined: 19 Jul 2007
Posts: 102
Location: Germany
Posted: 2008-02-25, 12:57   

Humm, not really. What happens if you run it without something like callgrind (there's no purpose in running it under valgrind all the time) and without debugging?

Funnily enough: callgrind actually slows down the execution speed quite a bit, maybe that's why the CPU load doesn't climb that fast when it is running ... well, the weighted CPU load I mean.

It still is a mystery ....
 
 
SuSlayer 

Age: 27
Joined: 07 Feb 2008
Posts: 1
Location: Russia
Posted: 2008-03-02, 13:53   

I see the same after the move to 0.451 with --enable-perl, after ~12 hours of running with ~300 users ...
This is how it looks at the moment:
Quote:

 PID USER  PR NI VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
8235 dbhub 15  0 4368 2528 912 S 92.6  0.7 132:10.52 dbhub
 
 
@DarKRaveR 
#[KVIP]

Joined: 19 Jul 2007
Posts: 102
Location: Germany
Posted: 2008-03-02, 15:38   

Which of the processes is that? The controlling parent, the listener, or one of the processes embedding a script?

Does it only happen with perl support?
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2008-04-08, 08:58   

Well, it's happening again :(
I saw it even with the config from backup.

After a long time of normal operation on one of my SMP machines, I saw high load today:

pid: 17969 cpu%:100 dbhub

In the logs from when I started it:
Mar 12 12:58:07 main(): *** Started DB Hub ver 0.451 (released: 02/02/2008) ***
Mar 12 12:58:08 perl_init(): Forked new script parsing process for script /usr/local/dbhub/.dbhub/scripts/rrdcstats_v0.12.pl, childs pid is 17970 and parents pid is 17969
Mar 12 12:58:09 perl_init(): Forked new script parsing process for script /usr/local/dbhub/.dbhub/scripts/phpUserStats.pl, childs pid is 17991 and parents pid is 17969
Mar 12 12:58:09 fork_process(): Forked new process, childs pid is 18007 and parents pid is 17969

It was compiled like this: ./configure --prefix=/usr/local/dbhub --enable-switch_user --enable-perl

100% again a few minutes after restarting.


==============================

Next one, compiled like this:
./configure --prefix=/usr/local/dbhub --enable-switch_user --enable-perl --enable-debug

in top:
28203 dbhub 25 0 4416 904 640 S 100 0.0 10:32.34 dbhub

Same behaviour, eating 100% CPU. In the log:

Apr 8 10:32:09 main.c[4176]@main: *** Started DB Hub ver 0.451 (released: 02/02/2008) ***
Apr 8 10:32:09 perl_utils.c[109]@perl_init: Forked new script parsing process for script /usr/local/dbhub/.dbhub/scripts/rrdcstats_v0.12.pl, childs pid is 28204 and parents pid is 28203
Apr 8 10:32:09 perl_utils.c[109]@perl_init: Forked new script parsing process for script /usr/local/dbhub/.dbhub/scripts/phpUserStats.pl, childs pid is 28209 and parents pid is 28203
Apr 8 10:32:09 main.c[358]@fork_process: Forked new process, childs pid is 28214 and parents pid is 28203
Apr 8 10:36:42 #### Recieved signal 11
Apr 8 10:36:42 #### Backtrace showing 10 items ####
Apr 8 10:36:42 0: /usr/local/dbhub/bin/dbhub(logbacktrace+0x1f) [0x80bef7f]
Apr 8 10:36:42 1: [0xbfffe420]
Apr 8 10:36:42 2: /usr/local/dbhub/bin/dbhub(get_human_user+0x75) [0x80b60a2]
Apr 8 10:36:42 3: /usr/local/dbhub/bin/dbhub(send_myinfo+0x18a) [0x80bcf6d]
Apr 8 10:36:42 4: /usr/local/dbhub/bin/dbhub(handle_command+0x8f3) [0x80b38b1]
Apr 8 10:36:42 5: /usr/local/dbhub/bin/dbhub(socket_action+0x58e) [0x80b77db]
Apr 8 10:36:42 6: /usr/local/dbhub/bin/dbhub(get_socket_action+0x274) [0x80b9e2b]
Apr 8 10:36:42 7: /usr/local/dbhub/bin/dbhub(main+0x1177) [0x80b978c]
Apr 8 10:36:42 8: /lib/i686/libc.so.6(__libc_start_main+0xdc) [0xb7ccb75c]
Apr 8 10:36:42 9: /usr/local/dbhub/bin/dbhub [0x8054f31]
Apr 8 10:36:42 #### Backtrace ends here ####




==============================

I will try valgrind too in the next step :)
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2008-04-08, 08:59   

Removing the perl scripts doesn't help at all.
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2008-04-08, 09:15   

Well, after running valgrind --tool=callgrind ./dbhub
and waiting a few minutes:
top:
28042 dbhub 25 0 42512 17m 1092 R 100 0.9 1:08.89 callgrind

Apr 8 12:02:44 main.c[4176]@main: *** Started DB Hub ver 0.451 (released: 02/02/2008) ***
Apr 8 12:02:46 main.c[358]@fork_process: Forked new process, childs pid is 28044 and parents pid is 28042
Apr 8 12:05:59 #### Recieved signal 11
Apr 8 12:05:59 #### Backtrace showing 10 items ####
Apr 8 12:05:59 0: ./dbhub(logbacktrace+0x1f) [0x80bef7f]
Apr 8 12:05:59 1: /lib/i686/libc.so.6 [0x41c8068]
Apr 8 12:06:00 2: ./dbhub(get_human_user+0x75) [0x80b60a2]
Apr 8 12:06:00 3: ./dbhub(validate_nick+0x831) [0x8066672]
Apr 8 12:06:00 4: ./dbhub(handle_command+0x714) [0x80b36d2]
Apr 8 12:06:00 5: ./dbhub(socket_action+0x58e) [0x80b77db]
Apr 8 12:06:00 6: ./dbhub(get_socket_action+0x274) [0x80b9e2b]
Apr 8 12:06:00 7: ./dbhub(main+0x1177) [0x80b978c]
Apr 8 12:06:00 8: /lib/i686/libc.so.6(__libc_start_main+0xdc) [0x41b575c]
Apr 8 12:06:01 9: ./dbhub [0x8054f31]
Apr 8 12:06:01 #### Backtrace ends here ####


So, after that I saw these files in the dbhub directory:
callgrind.out.27973
callgrind.out.28044
vgcore.28044

Then:
callgrind_control -w /usr/local/dbhub/bin/ -d testdump

And one more file was added:
callgrind.out.28042.1

I ran: callgrind_annotate callgrind.out.28042.1

WARNING: header line 2 malformed, ignoring
line: 'creator: callgrind-3.2.0'
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
L2 cache:
Timerange: Basic block 0 - 528547919
Trigger: Dump Command: testdump
Profiled target: ./dbhub (PID 28042, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
1,338,072,077 PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
604,041,455 network.c:get_socket_action [/usr/local/dbhub/bin/dbhub]
211,009,380 ???:memset [/lib/ld-2.4.so]
182,108,177 main.c:socket_action [/usr/local/dbhub/bin/dbhub]
107,569,488 main.c:clear_user_list [/usr/local/dbhub/bin/dbhub]
82,746,141 main.c:main [/usr/local/dbhub/bin/dbhub]
74,471,202 ???:select [/lib/i686/libc-2.4.so]
45,509,306 ???:recv [/lib/i686/libc-2.4.so]
16,551,454 ???:_dl_sysinfo_int80 [/lib/ld-2.4.so]
4,300,644 lang.c:read_lang [/usr/local/dbhub/bin/dbhub]


Help! :)

Anyway, it looks like the child process is dead when it eats 100%: I saw only one dbhub process in ps ax | grep dbhub
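
One general way a `select()` loop can spin at 100% once the other end of a connection is gone is sketched below. This is only an illustration of that mechanism and is not confirmed to be the cause in dbhub; `count_eof_wakeups` and `demo_dead_peer` are hypothetical names, and a `socketpair()` stands in for whatever channel the parent and child really share.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Poll fd `rounds` times with a zero timeout WITHOUT closing it on EOF,
 * and count how often select() reports it readable while recv() returns 0.
 */
int count_eof_wakeups(int fd, int rounds)
{
    int i, wakeups = 0;
    char buf[64];

    assert(fd >= 0 && fd < FD_SETSIZE);
    for (i = 0; i < rounds; i++) {
        fd_set readfds;
        struct timeval tv = {0, 0};

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* A socket whose peer has closed is always "readable" (EOF). */
        if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 &&
            recv(fd, buf, sizeof buf, 0) == 0)
            wakeups++;
    }
    return wakeups;
}

/* Create a socket pair, close one end (the "dead" peer), poll the other. */
int demo_dead_peer(int rounds)
{
    int sv[2], n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);
    n = count_eof_wakeups(sv[0], rounds);
    close(sv[0]);
    return n;
}
```

Every round wakes up immediately, so a loop that keeps such a descriptor in its read set never gets to sleep. The usual remedy is to remove and close the descriptor as soon as `recv()` returns 0.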

===
I just removed all files and recompiled the hub; it looks like it works for now.
Looks very strange. I could understand it if my config files were damaged, but I took my old backup files and it crashed too.
 
 
grayich
#[VIP]

Joined: 24 Feb 2007
Posts: 57
Location: Ukraine
  Posted: 2008-05-26, 18:07   

More than 800 users, processor load 100%. Unfortunately I will be switching to verlihub :(
 
 
NetImperia

Hub address:
imperium.dom.lan
Age: 36
Joined: 19 Oct 2008
Posts: 1
Location: St.Petersburg
Posted: 2008-10-19, 12:03   

I had to fix it myself, like this:

In main.c before:
}
quit_program();

insert this:
#if HAVE_UNISTD_H
usleep(100);
#endif


Now the hub runs with 900 users at 40% server load.

p.s. Sorry for the poor knowledge of English
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2009-01-25, 14:23   

Well, it looks like it helps.
On my SMP system it has been running for more than 10 weeks now, which never happened before.

What is the idea behind this workaround?
 
 
Ant 

Hub address:
127.0.0.1
Age: 32
Joined: 13 Jun 2007
Posts: 17
Location: Russia
Posted: 2009-04-13, 06:09   

NetImperia,
Do you have any stats perl script like rrdstat 0.12?

It looks like after this modification there is a strange problem with the stats after a long uptime (wrong share size).
Have you seen anything wrong with it?
 
 